-
Notifications
You must be signed in to change notification settings - Fork 276
Compressed pointer addressing in single and 2-tier mode #188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. I'm not expecting a perf change since the boolean flag is either always true or false at runtime, but will run a sanity test to be sure.
@@ -36,7 +36,8 @@ CCacheAllocator::CCacheAllocator(MemoryAllocator& allocator, | |||
currentChunksIndex_(0) { | |||
auto& currentChunks = chunks_[currentChunksIndex_]; | |||
for (auto chunk : *object.chunks()) { | |||
currentChunks.push_back(allocator_.unCompress(CompressedPtr(chunk))); | |||
// TODO : pass multi-tier flag when compact cache supports multi-tier config | |||
currentChunks.push_back(allocator_.unCompress(CompressedPtr(chunk), false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit: false /* isMultiTier */
Can you also annotate in other places we pass in the boolean for compress and uncompress calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// We compress pointers by storing the tier index, slab index and alloc index of | ||
// the allocation inside the slab. With slab worth kNumSlabBits (22 bits) of | ||
// data, if we have the min allocation size as 64 bytes, that requires | ||
// kNumSlabBits - 6 = 16 bits for storing the alloc index. The tier id occupies | ||
// the 32nd bit only since its value cannot exceed kMaxTiers (2). This leaves | ||
// the remaining (32 - (kNumSlabBits - 6) - 1 bit for tier id) = 15 bits for | ||
// the slab index. Hence we can index 128 GiB of memory per tier in multi-tier | ||
// configuration or index 256 GiB in single-tier configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you add more details to the pointer math for both single-tier and multi-tier config?
e.g.
// Single-tier Pointer Compression
// With slab worth kNumSlabBits of data, if we
// have the min allocation size as 64 bytes, that requires kNumSlabBits - 6
// bits for storing the alloc index. This leaves the remaining (32 -
// (kNumSlabBits - 6)) bits for the slab index. Hence we can index 256 GiB
// of memory in slabs and index anything more than 64 byte allocations inside
// the slab using a 32 bit representation.
// Multi-tier Point Compression
// With slab worth kNumSlabBits (22 bits) of
// data, if we have the min allocation size as 64 bytes, that requires
// kNumSlabBits - 6 = 16 bits for storing the alloc index. The tier id occupies
// the 32nd bit only since its value cannot exceed kMaxTiers (2). This leaves
// the remaining (32 - (kNumSlabBits - 6) - 1 bit for tier id) = 15 bits for
// the slab index. Hence we can index 128 GiB of memory per tier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// Number of bits for the slab index. This will be the top 16 bits of the | ||
// compressed ptr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comments here. With multi-tier, the slab index is top 15 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
const SlabIdx slabIndex = ptr.getSlabIdx(); | ||
const uint32_t allocIdx = ptr.getAllocIdx(); | ||
const SlabIdx slabIndex = ptr.getSlabIdx(isMultiTiered); | ||
const uint32_t allocIdx = ptr.getAllocIdx(isMultiTiered); | ||
const Slab* slab = &slabMemoryStart_[slabIndex]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment/TODO here just to clarify multi-tier has no effect here in accessing slab memory since we haven't incorporated the actual multi-tier logic yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is hard to figure out if this change makes sense without the succeeding changes. Why don't you include other commits in the stack in the pull request?
: ptr_(compress(slabIdx, allocIdx)) {} | ||
CompressedPtr(uint32_t slabIdx, | ||
uint32_t allocIdx, | ||
bool isMultiTiered, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isMultiTiered
looks redundant. We are storing the tid at the MSB of the compressed ptr. In this case, we can just interpret isMultiTiered = false
if tid == 0
. Additional logic in CompressedPtr
can be simplified a lot in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bit-packing format changes for multi-tier compressed pointer. The 32nd bit is reserved for tid only for multi-tiered compressed pointer. When config is single-tiered, compressed pointer bit packing used the original design - 32nd bit is not reserved for tid.
tid ==0 means cachelib may or may not be multi-tiered.
auto noTierIdPtr = isMultiTiered ? ptr_ & ~kTierIdxMask : ptr_; | ||
return static_cast<uint32_t>(noTierIdPtr & kAllocIdxMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not needed. MSB 16 bits of kAllocIdxMask
should be 0. For the parameter isMultiTiered
, it would not be used, but looks OK if you want to have for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. I pushed up the correct version now.
void setTierId(TierId tid) noexcept { | ||
ptr_ += static_cast<uint32_t>(tid) << kNumTierIdxOffset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use of this? Is this needed for succeeding changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, It will be referred to in subsequent upstream PRs. For complete picture, please refer to intel#56
return CompressedPtr{slabIndex, allocIdx}; | ||
return CompressedPtr{slabIndex, allocIdx, isMultiTiered}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, tier id is not set here. How are you going to provide the tier id?
I think you are going to have a separate MemoryAllocator
per tier, so separate SlabAllocator
. If so, I think the tier ID could be a member variable of the SlabAllocator
?
What about the PtrCompressor
then? Would CacheAllocator
create separate PtrCompressor
per MemoryAllocator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to this commit intel@2704ac8#diff-a6542b6dbf2cfb5e03e82205ee960757ab2b50de7bc25085089a3cffba40ae87 which shows how tier Id is passed to the alloctor.compress() methods.
This change is due to be sent for upstream review soon.
d749fbd
to
14efce0
Compare
@guptask has updated the pull request. You must reimport the pull request before landing. |
14efce0
to
6a882b2
Compare
@guptask has updated the pull request. You must reimport the pull request before landing. |
6a882b2
to
c6eb25e
Compare
@guptask has updated the pull request. You must reimport the pull request before landing. |
c6eb25e
to
87f0da6
Compare
@guptask has updated the pull request. You must reimport the pull request before landing. |
87f0da6
to
88a5522
Compare
@guptask has updated the pull request. You must reimport the pull request before landing. |
Decision was made to upstream the multi-tier changes in phases. This upstream plan has been discussed with Meta as well. Please refer to intel#56 for the phased plan. |
88a5522
to
969d5c9
Compare
@guptask has updated the pull request. You must reimport the pull request before landing. |
@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@guptask: Hey I didn't expect this but I have observed about 1.5% CPU regression. (I did A/B and then B/A in our internal test setups to verify the difference exists). Have you tried any cachebench runs on your end to see if throughput is affected with this change? |
@therealgymmy I ran several sets of experiment with compressed_ptr changes (rebased to latest main) and the main branch:
In all experiments, I'm not seeing any regression in stats and CPU usage. Can you please provide me with more details on how to reproduce this regression ? |
…ng in single tier mode and use 31 bits for addressing in multi-tier mode
@guptask has updated the pull request. You must reimport the pull request before landing. |
HI @therealgymmy is there any further update on this regression issue from your end ? I didn't observe any performance regression after rebasing the upstream branch. |
@guptask: hey i haven't had a chance to re-run the A/B test. Will kick off a run today. |
@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Looks like I made a mistake of not building A/B packages close together. Last time my "control" A-side was the production package whereas "test" B-side has many more diffs including this one. This time, A and B only differ in this diff, and the cpu difference is consistent during A/B and after I flipped it to B/A. |
@therealgymmy merged this pull request in 7f66996. |
In this change, compressed pointer stores the tier index, slab index and alloc index. With slab worth 22 bits of data, if we have the min allocation size as 64 bytes, that requires 16 bits for storing the alloc index. The tier id occupies the 32nd bit only since its value cannot exceed 2. This leaves the remaining 15 bits for the slab index. Hence we can index 128 GiB of memory per tier in multi-tier configuration or index 256 GiB in single-tier configuration.
Backward compatibility has been ensured in this change. Till the multi-tier code is up or review, modular change such as compressed pointer ensures backward compatibility. This code will undergo some change when the multi-tier code is ready for review.