memory regions #70257
Replies: 35 comments 159 replies
-
Why not:
//go:region
func myFunc(buf []byte) error {
	data := new(MyBigComplexProto)
	if err := proto.Unmarshal(buf, data); err != nil {
		return err
	}
	use(data)
	return nil
}
-
I think this is worth considering in isolation.
-
I don't like it at all. Arena memory management works great in C++ because you are already manually managing memory. Bringing that to Go is a step backwards. It would be better to integrate different GC algorithms that handle some workloads better. Adding more manual memory management to Go is a bad choice. I would already guess that unsafe causes 90+% of all of Go's hard-to-debug bugs.
-
I am in favor of removing Ignore(), as personally it seems so much simpler to reason about programs using this feature without it. What happens when a CGo call happens in the middle of region.Do?
-
I wonder if this or a similar scheme could be used in the future for situations where multiple goroutine objects have the same life cycle?
-
I'm slightly concerned about the use of the stdlib package namespace. I reacted already at the
Identifiers already often conflict with stdlib package names, most commonly
-
Can't the compiler do escape analysis at the goroutine level and use this as a heuristic for guiding such an allocation optimisation? Wouldn't all self-contained goroutines benefit from this? I don't see any reason why a new API needs to be introduced for this. It would be disappointing to see yet another interesting technical optimisation pushed unnecessarily into the language through some new magical standard library package.
-
What happens with variables that internally own memory? Are there any differences in what memory is bound to the region between these examples?
// Example A: No region.
x := big.NewInt(1)
for range rand.IntN(32) {
	x.Lsh(x, 64)
}
return x

// Example B: Region for the loop.
x := big.NewInt(1)
region.Do(func() {
	for range rand.IntN(32) {
		x.Lsh(x, 64)
	}
})
return x

// Example C: Region per iteration.
x := big.NewInt(1)
for range rand.IntN(32) {
	region.Do(func() {
		x.Lsh(x, 64)
	})
}
return x

// Example D: Explicit copy out of the region.
x := new(big.Int)
region.Do(func() {
	y := big.NewInt(1)
	for range rand.IntN(32) {
		y.Lsh(y, 64)
	}
	x.Set(y)
})
return x
-
For arenas and now regions, the obvious use case is allocating request and response objects. This has far better ergonomics than arenas, so that's very good, but the design seems to be fooling itself a bit: most Go programs in the wild would have to live entirely inside regions if this plan went forward, since generally speaking 100% of all running code will be an in-flight endpoint call that needs access to the request that initiated it. Making all goroutines have regions isn't just an interesting experiment; it is an accurate description of how most production Go code will run, and if Go can't handle it, this proposal cannot succeed.
I've mentioned it in the past, and I know language changes are a serious issue, but one possibility not explored in the design doc is a type keyword that would prevent taking actions with a variable that would result in its data being promoted out of its region (enforced by the compiler), thereby allowing region-allocated data to be safely used read-only on the stack within an ignore goroutine. It's a big lift, but if it doesn't exist then everything is going to be in a region anyway, and this proposal should probably acknowledge that.
-
I am not certain why you couldn't get most of the benefit transparently with a small-object, non-copying, per-goroutine generational collector/region. Objects that are recently allocated can then be collected quickly, hopefully cleaning the region. The region could also support bump allocation if it is mostly empty, which is the expectation for a region-biased workload.
-
Why not tie the lifetime of the region to a context? Then escape analysis from the goroutine would be unnecessary AND this could be used across goroutines. Specifically, this would make it usable for the whole lifetime of an HTTP request. If we were getting really fancy, we could add a new variant of new, newCtx(ctx, type, size…), that uses the allocator associated with a context, or the default global allocator if none is associated with the context.
-
Hi, thank you for your detailed proposal and explanations. Could you answer some questions, please?
func myFunc(buf []byte, fn handleFunc) (err error) {
	region.Do(func() {
		data := new(MyBigComplexProto)
		if err = proto.Unmarshal(buf, data); err != nil {
			return
		}
		fn(data) // data passed to the function outside of the region
	})
	return
}

func myFunc(buf []byte, fn handleFunc) (err error) {
	region.Do(func() {
		data := new(MyBigComplexProto)
		if err = proto.Unmarshal(buf, data); err != nil {
			return
		}
		go process(data) // data passed to another goroutine
	})
	return
}
Thanks
-
Thanks for working on this. It sounds fantastic.
This comment (above) helps my mental picture of how regions would work.
Question: As a potential user of regions, one question in my mind is: do I need to concern myself with the (region or heap) origin of the memory (a pointer to a struct of some kind) that I send on a channel? Is there an efficiency difference if something is coming from a region because it needs to be copied to the garbage-collected "region", so to speak?
Comment: a nice side effect of any design would be to enable users to experiment readily with memory allocation strategies, to find those most appropriate for their code. I wrote an off-heap hash table for my Go code about 10 years ago, and the main pain point in use was that I had to serialize and deserialize everything manually to move objects between the heap and the manually managed memory ( https://github.com/glycerine/offheap ). Ideally I imagine that
Beside the off-heap hash table, a second use case for user customization would be when running in WASM code on a web-worker thread, and wanting to pass memory to a WasmGC implementation for other components to use. Go compiled to Wasm is probably never going to be able to use WasmGC's moving collector, because it is moving and lacks support for interior pointers, but we might well want to be able to interoperate with other languages that do.
-
Thank you for the work on this! I'm excited to see this completed. Regarding the concern in the design related to allowing regions to cover multiple goroutines, this might dovetail really well with structured/scoped concurrency approaches like https://github.com/sourcegraph/conc. If the lifetime of a goroutine is entirely within a
In terms of the implementation issue, I agree that it might be better to leave this as a "potential future enhancement". It would be reasonable for this to be higher cost and possibly even have a variant which allowed the user to specify the "low cost single goroutine" region or the "higher cost including child goroutine" regions. Something like
The paradigm I'm thinking about is something like a request/response microservice that kicks off goroutines to make additional requests along the way. Everything is bound to the top-level request, and when that is complete the entire request and all sub-requests will be destroyed. Within the requests there could be data sharing through channels or other mechanisms, but it would all be in a single region.
-
I was surprised at the last word in this sentence from the detailed design. Was it meant to be "regular heap spaces"? Is there such a thing as a "regular heapArena"? It feels like it might be a typo.
-
This (detailed design doc) sentence asserts a surprising claim without any backing rationale. If the traditional escape analysis says something can be stack allocated, how is it possible that, all of a sudden, it now cannot be stack allocated if it is under a region.Do() call?
-
The detailed design doc was very helpful in clarifying for me one main point. This is not a region design. I like it, and I think it's great. I just think it is misnamed. Arenas and regions are synonyms, and this design is neither. It is an enhancement to the garbage collector based on a user annotation or hint. So my primary feedback is about the writing and presentation, not the merits of the design.
But for the presentation: I think calling this design a region design does it a disservice. It creates confusion that could be avoided. I thought I caught the gist of what was going on from the title and the short description at the top. Only in the detailed design doc did I realize how wrong my assumptions had been. This design is not doing region management where
So I would, foremost, suggest a rename. Any of these names would be better...
This would also immediately allow the detailed design doc to distinguish the
-
Would runtime diagnostics for fades be feasible? Particularly reporting the location of the pointer write that causes the fade. I could imagine some users of regions wanting fairly rigorous enforcement of not accidentally copying memory allocated within a region. Runtime diagnostics seem like a plausible way to deliver on this.
-
I read the detailed proposal, and the papers it's inspired by. Overall, I can see myself using this in a few places, which would bring some sanity back to the codebase. I think shipping good tooling together with this proposal would actually be key to making it much more useful. If we have something like the proposed heap with lifetime analysis, it would be much easier to identify good candidates. Personally, pprof would be the preferred approach, since we have it live in production and already use it to identify bottlenecks.
-
Do I understand the proposed spec properly: a region exists only for the enclosed function, and not for functions called or goroutines created from it? If so, why is the region not "inherited"?
-
I'm sympathetic to the "I don't want to think like a garbage collector" argument, but not so much to the "I don't want to think about control flow and object availability" argument. If done right, the core infrastructure needed to support regions and fading builds on infrastructure and tools to better understand and manage object availability. The proposal lists a few such tools but doesn't articulate the overall goal of such tools, which goes well beyond servicing memory management to exploring and debugging object availability. A separate proposal to add these tools would be a great addition to the Go ecosystem.
Common Lisp and Guy Steele created some terminology that can be used in such a proposal. See 3. Scope and Extent. Briefly, objects are divided based on their availability. Availability determined by lexical scope has lexical extent. Availability that is bounded by dynamic scope has dynamic extent, and globally available objects have indefinite extent. With these three concepts, developers can reason about their objects' availability without having to think like a garbage collector. Reasoning about control flow and data availability is at the heart of concurrent programming and of debugging such programs.
The problem being worked on here is how to more efficiently manage memory for objects with dynamic extent. Currently the GC treats objects with dynamic extent and indefinite extent the same. This proposal says we can do dynamic extent better.
Since this proposal is focused on the GC aspects of the discussion, it sometimes conflates the problem with implementation terminology, which is understandable. A separate proposal talking about availability, independent of memory management, would use terms like "extent" and "scope". A proposal about memory management would build on the "extent" proposal and use region, fade, bound and so forth. More importantly, the "extent" tools proposal could be accepted even if the "region" proposal is declined for performance reasons.
-
@mknyszek regarding Diagnostics, would it be feasible to have a new kind of runtime/trace span corresponding to memory regions and to record the number/size of allocations as well as the number/size of fades?
-
I wouldn't rule out rebinding from an inner to an outer region, at least in the spec, but I would make it optional so that the initial implementation doesn't have to do so. This is kinda like how resizing a slice may - or may not - relocate the unchanged part, and code using such slices is expected to cope with both possibilities.
-
Sorry, one more (naive) question. Sorry if this is way too far for the current spec.
-
While manual
-
Just wondering: if we were able to write generic code working for all functions
Then we could perhaps enforce a region to not rely on implicit global state, but instead have state passed as an argument.
Edit: and objects which assign pointers created in the region to implicitly available global objects (via methods for instance). It's still interprocedural. Perhaps it could then become statically sound and (more) precise. Just thinking out loud, I might be forgetting some important details.
[edit2] Thinking about this again, not sure it brings much more.
-
I don't understand why the PGO "solution" isn't sufficient. The runtime tracks the total number of objects and their sizes allocated from each call point, and tracks where each object is allocated. Then, at a GC cycle, it determines the number of objects freed that were allocated from each call point down, against the number of objects still live from that call point. The functions with the highest free/live ratio are good candidates for an implicit region. You could also restrict regions to only those call sites where live ~= 0. Since the regions are "safe", even if it makes a mistake it should be no worse than the worst case of a developer adding the hint at the wrong place. If the claim is that a developer doing this manually will be 100% correct, even with library usage, then you truly are moving Go into a manual memory management scenario.
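To make the heuristic in this comment concrete, here is a minimal sketch of the ranking step only. The type siteStats, the function candidates, the call-site names and the 5% cutoff are all hypothetical; nothing here reflects an existing runtime or PGO interface, and the per-site counts are assumed to have been collected already.

```go
package main

import (
	"fmt"
	"sort"
)

// siteStats holds hypothetical per-call-site counts observed at a GC cycle.
type siteStats struct {
	site  string
	freed int // objects allocated under this call site that were found dead
	live  int // objects allocated under this call site that are still live
}

// candidates returns the call sites whose live share is at most maxLiveShare,
// ordered by the number of freed objects (largest first). This approximates
// "highest free/live ratio, restricted to live ~= 0" from the comment.
func candidates(stats []siteStats, maxLiveShare float64) []string {
	var kept []siteStats
	for _, s := range stats {
		total := s.freed + s.live
		if total == 0 {
			continue // no data for this site
		}
		if float64(s.live)/float64(total) <= maxLiveShare {
			kept = append(kept, s)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].freed > kept[j].freed })
	names := make([]string, 0, len(kept))
	for _, s := range kept {
		names = append(names, s.site)
	}
	return names
}

func main() {
	stats := []siteStats{
		{site: "handleRequest", freed: 9900, live: 100},
		{site: "buildCache", freed: 200, live: 800},
		{site: "parseHeader", freed: 500, live: 0},
	}
	fmt.Println(candidates(stats, 0.05))
}
```

With the sample numbers above, handleRequest and parseHeader pass the cutoff while buildCache does not, since most of its allocations stay live.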
-
It's not clear to me what the behavior of
In general, I'm worried about the semantics of nested
region.Do(func() {
	z := new(MyStruct)
	var y *MyStruct
	region.Do(func() {
		x := new(MyStruct)
		use(x, z) // z may be freely used within this inner region.
		y = x     // x is unbound from any region.
	})
	use(y)
})
Suppose I'm writing a library, and I want to use regions to eagerly free temporary objects, but users may be calling my code from a region. If I don't use
From reading the design doc, it wasn't clear to me if there's a reasonable way to change the nesting behavior, so objects fade only from the innermost region (and likewise for
-
I'm starting this discussion to collect early feedback on a draft design for a kind of region-based memory management in Go. There is no prototype yet, only a design and a preliminary evaluation.
Please read everything below before replying, especially the design discussion section.
(Feel free to skip the detailed design, unless you're interested.)
Background
The arena experiment adds a package consisting of a single type to the standard library: Arena. This type allows one to allocate data structures into it directly, and allows early release of the arena memory in bulk. In other words, it adds a form of region-based memory management to Go. The implementation is memory-safe insofar as use-after-frees will never result in memory corruption, only a potential crash. Arenas have achieved real performance wins, almost entirely due to earlier memory reuse and staving off GC execution.
Unfortunately, the proposal to add arenas to the standard library is on indefinite hold due to the fact that they compose poorly with the language and standard library.
For example, builtin types all need special cases in the implementation, and require explicit slice- and map-related methods. Additionally, choosing to arena-allocate a variable means that it can never be stack-allocated, not without more complexity in the compiler.
Furthermore, for an API to make use of arenas, it must accept an additional argument: the arena to allocate into. There are far too many APIs that would need to be updated to make this integrate well with the language, and it would make those APIs worse.
The text below proposes a composable replacement for arenas in the form of user-defined goroutine-local memory regions.
Goals
First and foremost, our main goal is to reduce resource costs associated with the GC. If we can't achieve that, then this proposal is pointless.
The second most important goal is composability. Specifically:
sync.Pool and unique.Handle.
Finally, whatever we implement must be relatively easy to use and intuitive for intermediate-to-advanced Go developers. We must offer tools for discovering where regions might be worth it, and where they aren't working out.
Design
The core of this design revolves around a pair of functions that behave like annotations of function calls. It's useful to think of them as annotations, because crucially, they do not affect the correctness of code, bugs notwithstanding.
The annotations indicate whether the user expects most or all the memory allocated by some function call (and its callees) to stay local to that function (and its callees), and to be unreachable by the time that function returns. If these expectations hold, then that memory is eagerly reclaimed when the function returns, bypassing the garbage collector. If these expectations do not hold for some memory, then that memory is opted out of this early reclaim; management is passed on to the garbage collector as normal.
Below is the proposed new API which explains the semantics in more detail.
For some very basic examples, see the detailed design doc, or the next section.
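As a rough mental model of the annotation semantics described above, the sketch below uses a hypothetical stand-in for the proposed API: a value named region whose Do and Ignore methods simply call their argument. The stand-in type regionAPI, the helper sumSquares and the no-op behavior are assumptions for illustration. A real implementation would additionally bind allocations made inside Do to the region and reclaim them eagerly, but because the calls are annotations, the program's result should be the same either way.

```go
package main

import "fmt"

// regionAPI is a hypothetical stand-in for the proposed region package.
// It only models the "annotation" property: calling f is all it does.
type regionAPI struct{}

// Do runs f. In the proposal, memory allocated by f and its callees would be
// bound to a region and reclaimed when f returns, unless it escapes.
func (regionAPI) Do(f func()) { f() }

// Ignore runs f. In the proposal, allocations made by f would not be bound
// to any enclosing region.
func (regionAPI) Ignore(f func()) { f() }

// region lets the example read like the proposed package-level calls.
var region regionAPI

// sumSquares builds a temporary slice inside the region and lets only a
// scalar result leave it, which is the usage pattern regions are aimed at.
func sumSquares(n int) int {
	total := 0
	region.Do(func() {
		tmp := make([]int, n) // temporary; unreachable once the closure returns
		for i := range tmp {
			tmp[i] = i * i
		}
		for _, v := range tmp {
			total += v
		}
	})
	return total
}

func main() {
	fmt.Println(sumSquares(4)) // 0 + 1 + 4 + 9
}
```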
Comparison with arenas
Where an arena might be used like...
... regions would be used like so:
You can think of a region as an implicit goroutine-local arena that lives for the duration of some function call. That goroutine-local arena is used for allocating all the memory needed by that function call and its callees (including maps, slices, structs, etc.). Thanks to some compiler and runtime magic (see below), if any of that memory would cause a use-after-free issue, it is automatically removed from the arena and handed off to the garbage collector instead.
In practice, we've found that the vast majority of arena uses tightly limit the arena's lifetime to that of a particular function, usually the one they are created in, like the example above. This fact suggests that regions will most likely be usable in most of the same circumstances as arenas.
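The function-scoped pattern described above might look roughly like the following. This is a sketch under assumptions: regionDo is a hypothetical no-op stand-in for the proposed region.Do, and encoding/json replaces the protobuf decoding used elsewhere in this thread so that the example is self-contained. The message type and countItems are illustrative names only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// regionDo is a hypothetical stand-in for the proposed region.Do. It just
// calls f; the real API would also scope f's allocations to a region.
func regionDo(f func()) { f() }

// message is an illustrative request payload.
type message struct {
	ID    int      `json:"id"`
	Items []string `json:"items"`
}

// countItems decodes buf inside the region and returns only a scalar, so the
// decoded structure is unreachable once the region's function returns. A
// decoding error is stored in a result variable, so under the proposal it
// would be expected to leave the region rather than be reclaimed with it.
func countItems(buf []byte) (n int, err error) {
	regionDo(func() {
		data := new(message)
		if err = json.Unmarshal(buf, data); err != nil {
			return
		}
		n = len(data.Items)
	})
	return n, err
}

func main() {
	n, err := countItems([]byte(`{"id":1,"items":["a","b","c"]}`))
	fmt.Println(n, err)
}
```

Note that no arena argument is threaded through countItems or json.Unmarshal, which is the composability point made in the background section.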
Summary of benefits and costs
The core benefit is the potential for reduced GC overheads. An additional, more minor benefit is the potential for more efficient memory allocation. If the application code follows the region discipline, it makes much more sense to introduce a bump-pointer allocator for that memory (something like Immix; see the detailed design).
As alluded to in the previous section, some "magic" is required to dynamically escape memory from the region to the general heap. The magic is a goroutine-local write barrier (goroutine-local because it is only enabled on that goroutine, inside the region). We believe that we have a write barrier design that is cheap enough to make this worthwhile, incurring between 1–4% worst-case overhead when enabled globally, depending on the application (so it will be less in practice, limited to the goroutines that use it). We believe that this can be easily won back and then some in GC-heavy applications, provided their memory usage patterns line up with the region's assumptions.
However, this assumes that most or all memory in a region does not escape. The cost of promoting memory is higher, approximately the same cost as reallocating that memory on the heap (that is not how it would be implemented, but it gives you a sense of the cost).
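The trade-off above can be pictured with a toy bookkeeping model. This is not how the runtime would track memory: toyRegion and its methods are hypothetical, and the model only counts which objects would be reclaimed eagerly when the region ends and which would be handed to the garbage collector because a pointer to them was published outside the region.

```go
package main

import "fmt"

// toyRegion is a hypothetical bookkeeping model of one region. It only
// counts outcomes; it does not manage any memory.
type toyRegion struct {
	allocated int          // objects allocated while the region was active
	faded     map[int]bool // IDs of objects that escaped to the general heap
}

func newToyRegion() *toyRegion {
	return &toyRegion{faded: make(map[int]bool)}
}

// alloc models a region allocation and returns the new object's ID.
func (r *toyRegion) alloc() int {
	id := r.allocated
	r.allocated++
	return id
}

// publish models the write-barrier slow path: storing a pointer to a region
// object somewhere that outlives the region makes that object escape.
func (r *toyRegion) publish(id int) {
	if id >= 0 && id < r.allocated {
		r.faded[id] = true
	}
}

// end models the region exiting: objects that never escaped are reclaimed
// eagerly, and escaped objects are left for the garbage collector.
func (r *toyRegion) end() (reclaimed, leftToGC int) {
	leftToGC = len(r.faded)
	return r.allocated - leftToGC, leftToGC
}

func main() {
	r := newToyRegion()
	for i := 0; i < 100; i++ {
		id := r.alloc()
		if i%25 == 0 { // a few objects escape, e.g. stored in a global cache
			r.publish(id)
		}
	}
	reclaimed, leftToGC := r.end()
	fmt.Println(reclaimed, leftToGC)
}
```

In this run 4 of 100 objects escape, so 96 are reclaimed at region exit; each escaped object would pay the higher promotion cost described above.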
Detailed design and implementation
For more details, please see the complete design document, which includes:
Detailed draft design.
(Note that the full design doc introduces a new term for memory 'escaping' a region ("fading") to avoid overloading with the compiler's 'escape analysis'. These mean the same thing.)
Design discussion
Below are a few discussion points that have come up often in early feedback, as well as my responses to those discussion points.
Goroutine-local region state seems problematic. Why is it OK?
Enabling region-based allocation for all variables created by a goroutine delivers a clear win if the vast majority of your memory allocated adheres to the region discipline. It's really OK if a small percentage (say, under 5%) of memory allocations escape from the region to the heap.
Also, note that the idea of implicitly opting-in memory was discarded for arenas, but that's because arenas can possibly introduce use-after-free crashes. If you use regions incorrectly, your program will not crash.
Will code owners need to consider applying region.Ignore everywhere?
One concern that was raised multiple times early in the design was whether region.Ignore would encourage tightly controlling allocations within regions so heavily that users would start pestering library owners to wrap certain portions of code in region.Ignore.
While this is something that could happen, I hope it would be rare, and I would encourage maintainers to push back on such requests if they occur. As mentioned in the previous discussion point, it's really OK if a small percentage of memory allocations escape from the region to the heap.
For example, I would explicitly advocate for not wrapping (*sync.Pool).New with region.Ignore in the standard library. Why? Because if you're using a Pool effectively, the number of steady-state allocations made should be quite small in practice, and easily overtaken by region allocations.
Given the concern however, perhaps we should remove region.Ignore from the design until we get more experience with it.
Possible extensions
Using PGO to automatically disable costly regions
If at compile time we see from a profile that the region slow paths are "hot" inside a particular region, the compiler can disable that region and potentially report that it did so. This technique has the potential to make monitoring more automatic.
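One way to picture the decision described here is as a threshold on how much of a region's profile is spent in slow paths. The sketch below is purely illustrative: regionProfile, shouldDisable, the sample counts and the 20% threshold are assumptions, and the post does not specify how the compiler would actually make this call.

```go
package main

import "fmt"

// regionProfile holds hypothetical sample counts for one region call site,
// as they might be aggregated from a CPU profile.
type regionProfile struct {
	name        string
	fastSamples int // samples attributed to the region's normal execution
	slowSamples int // samples attributed to the region's slow paths
}

// shouldDisable reports whether the slow-path share of a region's samples
// exceeds maxSlowShare, in which case the region looks too costly to keep.
func shouldDisable(p regionProfile, maxSlowShare float64) bool {
	total := p.fastSamples + p.slowSamples
	if total == 0 {
		return false // no data; leave the region as written
	}
	return float64(p.slowSamples)/float64(total) > maxSlowShare
}

func main() {
	profiles := []regionProfile{
		{name: "decodeRequest", fastSamples: 980, slowSamples: 20},
		{name: "fillCache", fastSamples: 300, slowSamples: 700},
	}
	for _, p := range profiles {
		if shouldDisable(p, 0.20) {
			fmt.Println("disable region at", p.name)
		}
	}
}
```

Reporting the disabled call sites, as the loop in main does, corresponds to the "potentially report that it did so" part of the idea.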
Provide a GOEXPERIMENT to make every goroutine implicitly a region
This GOEXPERIMENT makes it easy to quickly turn regions on and off for an entire application. I suspect the majority of performance-sensitive Go applications, such as web services, would benefit from wrapping entire requests (usually captured by a single goroutine) in a region.
This idea is equivalent to enabling the request-oriented collector, an experimental garbage collector from early in Go's life, designed by Rick Hudson and Austin Clements. The difference between that design and this one is in the details: separately managed memory, and a much cheaper write barrier fast path.
This may also combine well with dynamically disabling regions with PGO.
Provide a GODEBUG to disable all regions
This allows for quicker rollback and experimentation. We can also extend this GODEBUG to work with compile-time hash bisection to identify costly regions efficiently. This is made possible due to the fact that regions do not change the semantics of the program.
Next steps
Although fairly fleshed out, this design does not yet have a prototype. Before making such an investment, we wish to gauge interest from the community.
Once we feel that broad interest exists, we may prioritize it. This would then involve building a prototype, available as a GOEXPERIMENT, which would then be used to steer the design, possibly enough toward approval. Note that we plan to remove arenas from the standard library once this prototype is created.