-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(gossip): reduce gossip table memory usage #526
base: main
Are you sure you want to change the base?
Conversation
when overwriting, there was a bug (recently introduced) where the overwritten item is supposed to be returned to the caller, and they deinit it. instead, the *new* item was being returned as if it was the overwritten item. the caller would deinit the new item, leaving corrupted memory in the gossip table (which would eventually be double freed). the overwritten item would never be freed, so it leaked also adds a test and adds a stack trace to an error log
…ayList to save space
…usize index, just using tagged index this simplificaiton means you can no longer index with an integer over all items. this is a limitation, but this extra indexing makes the data structure use a surprising amount of memory. instead i've opted to make the data structure a simpler building block, and the user of the struct can implement a wrapper with this indexing if needed. in this case, its internal indexing was redundant with indexing that GossipMap is already doing. so by stripping out this indexing, GossipMap doesn't need any extra logic to handle it. i've just simplified the code and reduced the memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few nits - otherwise lgtm
@@ -191,6 +191,25 @@ const GossipDataTag = enum(u32) { | |||
ContactInfo, | |||
RestartLastVotedForkSlots, | |||
RestartHeaviestFork, | |||
|
|||
pub fn Value(self: GossipDataTag) type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value
lowercase methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
types are conventionally capitalized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it being capital is correct
|
||
const assert = std.debug.assert; | ||
|
||
pub const GossipMap = struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to add a doc comment about this struct with more context on why we have it/the pros of it
}; | ||
} | ||
|
||
pub const Entry = struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these nested structs make sense but also feel off and make it difficult to read - feel like most of these structs can be outside of the struct - since all the logic is already enclosed in map.zig - so the context/space is already defined okay i think
self.logger.err().logf("failed to generate push messages: {any}", .{e}); | ||
self.logger.err().logf( | ||
"failed to generate push messages: {any}\n{any}", | ||
.{ e, @errorReturnTrace() }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encountered this error message and it wasn't very helpful for debugging the issue. Adding the stack trace enabled me to figure out the problem. I'd actually like to move towards this being a standard pattern any time we log an error. I was considering a few options like:
- service manager can do this whenever any a service exits with an error
- the logger could have a method that's specifically for logging errors and capturing their stack trace (not sure if this would work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting idea. I understand how the error return trace could be helpful here, but I also find it very noisy and distracting. If this is an error message that could happen, but isn't fatal, and happens somewhat regularly (example: "NoPeers found"), I'd rather not commit code with the error return trace printing.
This is like leaving random debug prints in the code, it can be helpful for debugging, but it's annoying to others using the code. Unless I'm specifically trying to debug why this error message happened, I wouldn't like to see the trace.
@@ -64,6 +64,188 @@ pub fn RecyclingList( | |||
}; | |||
} | |||
|
|||
pub fn SplitUnionList(TaggedUnion: type) type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to add a doc comment explaining the high level of this helper
} | ||
} | ||
|
||
pub const Entry = struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe move this to be above getEntry
pub fn next(self: *Iterator) ?Entry { | ||
defer self.cursor += 1; | ||
return if (self.cursor < self.keys.len) .{ | ||
.key_ptr = &self.keys[self.cursor], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just give iterator self
and then call getEntryByIndex
?
|
||
// replace data with newly swapped value | ||
const entry = self.store.getEntryByIndex(entry_index); | ||
// gossip_data now points to the element which was swapped in and needs updating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think this comment is off
|
||
var keys = std.ArrayList(GossipKey).init(std.testing.allocator); | ||
defer keys.deinit(); | ||
// var data = std.ArrayList(GossipVersionedData).init(std.testing.allocator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need this commented out code?
for (0..num_items) |i| { | ||
const versioned_value = gossip_values[i]; | ||
var gossip_values = gossip_table.store.iterator(); | ||
for (0..num_items) |_| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not:
for (0..num_items) |_| { | |
while (gossip_values.next()) |versioned_value| |
?
@@ -46,7 +47,7 @@ pub fn AutoArrayHashSet(comptime T: type) type { | |||
/// | |||
/// Analogous to [Crds](https://github.com/solana-labs/solana/blob/e0203f22dc83cb792fa97f91dbe6e924cbd08af1/gossip/src/crds.rs#L68) | |||
pub const GossipTable = struct { | |||
store: AutoArrayHashMap(GossipKey, GossipVersionedData), | |||
store: GossipMap = .{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add a default value here and not initialize in the init
function instead? I'd rather us use the init
function if it already exists, default values are a bit of a smell.
const metadata = self.map.metadata.items[index]; | ||
if (metadata.timestamp_on_insertion >= self.minimum_insertion_timestamp) { | ||
switch (self.map.tagOfIndex(index)) { | ||
inline .ContactInfo => |tag| return self.map.getTypedPtr(tag, index), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a useless inline
keyword, same thing on the next line. I'd rather you just use getTypedPtr(.ContactInfo
, no need to be fancy as it's just harder to read.
.ContactInfo => { | ||
const did_remove = self.contact_infos.swapRemove(table_len); | ||
std.debug.assert(did_remove); | ||
self.contact_infos.put(entry_index, {}) catch unreachable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why these can't fail.
.ContactInfo => |ci| if (ci.getSocket(.gossip)) |addr| { | ||
if (addr.eql(&gossip_addr)) return try ci.clone(); | ||
switch (self.store.tagOfIndex(index)) { | ||
inline .ContactInfo => |tag| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing again, useless inline
and just complicates the code.
|
||
pub const Index = struct { | ||
tag: Tag, | ||
index: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making the sub-index a non-exhaustive enum for type safety?
@@ -77,6 +77,28 @@ pub fn ReturnType(comptime FnPtr: type) type { | |||
}; | |||
} | |||
|
|||
/// Same as std.EnumFieldStruct, except every field may be a different type | |||
pub fn EnumStruct(comptime E: type, comptime Data: fn (E) type) type { | |||
@setEvalBranchQuota(@typeInfo(E).Enum.fields.len + 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the + 5
come from?
for (&struct_fields, @typeInfo(E).Enum.fields) |*struct_field, enum_field| { | ||
const T = Data(@field(E, enum_field.name)); | ||
struct_field.* = .{ | ||
.name = enum_field.name ++ "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ++ ""
?
.type = T, | ||
.default_value = null, | ||
.is_comptime = false, | ||
.alignment = if (@sizeOf(T) > 0) @alignOf(T) else 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why special case @sizeOf(T) == 0
? I don't think align(0)
does what you think it does. It selects the default alignment for the target, which we don't care about here. (it also doesn't exist in Zig 0.14 anymore).
const Tag = @typeInfo(TaggedUnion).Union.tag_type.?; | ||
|
||
return struct { | ||
lists: sig.utils.types.EnumStruct(Tag, List), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to have an ArrayList(u8) thats bump allocated from? Where each item is struct { Tag, DataType }
(user reads Tag
first, then fieldParentPtr the struct to get the Data
ptr).
The gossip table was previously using a regular hashmap to hold up to 1 million instances of GossipData, which is a 732 byte tagged union due to ContactInfo being 728 bytes. This means the hashmap would be at least 732 MB when full. Running on mainnet I witnessed it growing to over 1 GB.
The bulk of this data is unused due to the union being so large. Most of the union fields are actually much smaller, around 50-100 bytes. There are only around ~5000 contact infos being stored, but each of the ~1 million other elements are being allocated 732 bytes since they are instances of the same union.
The solution for this is to store each union variant's unwrapped type in a separate array. So if an item is 50 bytes, it only uses 50 bytes in the array, instead of using 732 bytes as union. This is implemented in SplitUnionList, which contains a separate ArrayList for every union field. GossipMap is a new struct that wraps this, mapping from GossipKey to the items within SplitUnionList.
I still need to do more rigorous benchmarking, but initial tests are showing that the gossip table is about 20% of its previous size.