Compressed pointer addressing in single and 2-tier mode #188



Conversation

@guptask (Contributor) commented Jan 17, 2023

In this change, the compressed pointer stores the tier index, the slab index, and the alloc index of the allocation. With a slab spanning 22 bits of address space (4 MiB) and a minimum allocation size of 64 bytes, storing the alloc index requires 22 - 6 = 16 bits. The tier id occupies only the 32nd bit, since its value cannot exceed 2, which leaves the remaining 15 bits for the slab index. Hence we can index 128 GiB of memory per tier in a multi-tier configuration, or 256 GiB in a single-tier configuration (where all 16 remaining bits go to the slab index).
Backward compatibility has been ensured in this change: until the multi-tier code is up for review, a modular change such as this compressed-pointer change keeps the existing single-tier behavior intact. This code will undergo some changes once the multi-tier code is ready for review.
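A minimal sketch of this layout (constant names follow the kNumSlabBits / kTierIdxMask identifiers quoted later in the thread; the helper functions and exact values here are illustrative assumptions, not the PR's verbatim code):

#include <cstdint>

// Single-tier: [ slab index : 16 ][ alloc index : 16 ]
// Multi-tier:  [ tid : 1 ][ slab index : 15 ][ alloc index : 16 ]
constexpr uint32_t kNumSlabBits = 22;                            // 4 MiB slabs
constexpr uint32_t kNumAllocIdxBits = kNumSlabBits - 6;          // 16 bits for 64 B min allocs
constexpr uint32_t kAllocIdxMask = (1u << kNumAllocIdxBits) - 1; // bits 0..15
constexpr uint32_t kNumTierIdxOffset = 31;                       // tid lives in the 32nd bit
constexpr uint32_t kTierIdxMask = 1u << kNumTierIdxOffset;

inline uint32_t compress(uint32_t slabIdx, uint32_t allocIdx) {
  // Same packing in both modes; only the usable slab-index range differs
  // (15 bits in multi-tier, 16 bits in single-tier).
  return (slabIdx << kNumAllocIdxBits) | (allocIdx & kAllocIdxMask);
}

inline uint32_t getSlabIdx(uint32_t ptr, bool isMultiTiered) {
  // In multi-tier mode, strip the tier bit before extracting the slab index.
  const uint32_t p = isMultiTiered ? (ptr & ~kTierIdxMask) : ptr;
  return p >> kNumAllocIdxBits;
}

inline uint32_t getAllocIdx(uint32_t ptr) {
  return ptr & kAllocIdxMask; // the tier bit never overlaps the low 16 bits
}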

@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 17, 2023
@therealgymmy (Contributor) left a comment

looks good. I'm not expecting a perf change since the boolean flag is either always true or false at runtime, but will run a sanity test to be sure.

@@ -36,7 +36,8 @@ CCacheAllocator::CCacheAllocator(MemoryAllocator& allocator,
       currentChunksIndex_(0) {
   auto& currentChunks = chunks_[currentChunksIndex_];
   for (auto chunk : *object.chunks()) {
-    currentChunks.push_back(allocator_.unCompress(CompressedPtr(chunk)));
+    // TODO : pass multi-tier flag when compact cache supports multi-tier config
+    currentChunks.push_back(allocator_.unCompress(CompressedPtr(chunk), false));
Contributor

just a nit: false /* isMultiTier */

Can you also annotate the other places where we pass in the boolean for compress and unCompress calls?

@guptask (Contributor Author)

done

Comment on lines 35 to 42
// We compress pointers by storing the tier index, slab index and alloc index of
// the allocation inside the slab. With slab worth kNumSlabBits (22 bits) of
// data, if we have the min allocation size as 64 bytes, that requires
// kNumSlabBits - 6 = 16 bits for storing the alloc index. The tier id occupies
// the 32nd bit only since its value cannot exceed kMaxTiers (2). This leaves
// the remaining (32 - (kNumSlabBits - 6) - 1 bit for tier id) = 15 bits for
// the slab index. Hence we can index 128 GiB of memory per tier in multi-tier
// configuration or index 256 GiB in single-tier configuration.
Contributor

nit: can you add more details to the pointer math for both single-tier and multi-tier config?

e.g.

// Single-tier Pointer Compression
// With slab worth kNumSlabBits of data, if we
// have the min allocation size as 64 bytes, that requires kNumSlabBits - 6
// bits for storing the alloc index. This leaves the remaining (32 -
// (kNumSlabBits - 6)) bits for the slab index. Hence we can index 256 GiB
// of memory in slabs and index anything more than 64 byte allocations inside
// the slab using a 32 bit representation.

// Multi-tier Pointer Compression
// With slab worth kNumSlabBits (22 bits) of
// data, if we have the min allocation size as 64 bytes, that requires
// kNumSlabBits - 6 = 16 bits for storing the alloc index. The tier id occupies
// the 32nd bit only since its value cannot exceed kMaxTiers (2). This leaves
// the remaining (32 - (kNumSlabBits - 6) - 1 bit for tier id) = 15 bits for
// the slab index. Hence we can index 128 GiB of memory per tier.
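As a cross-check of the arithmetic in these comments (a standalone sketch; only the numbers come from the text above):

#include <cstdint>

constexpr uint64_t kSlabSize = 1ull << 22; // kNumSlabBits = 22 -> 4 MiB slabs

// Single-tier: 32 - 16 = 16 slab-index bits -> 2^16 slabs * 4 MiB = 256 GiB.
static_assert((1ull << 16) * kSlabSize == 256ull << 30, "single-tier capacity");

// Multi-tier: one bit goes to the tier id, leaving 15 slab-index bits
// -> 2^15 slabs * 4 MiB = 128 GiB per tier.
static_assert((1ull << 15) * kSlabSize == 128ull << 30, "multi-tier capacity");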

@guptask (Contributor Author)

done

Comment on lines 116 to 127
// Number of bits for the slab index. This will be the top 16 bits of the
// compressed ptr.
Contributor

Update the comments here. With multi-tier, the slab index is the top 15 bits.

@guptask (Contributor Author)

done

-  const SlabIdx slabIndex = ptr.getSlabIdx();
-  const uint32_t allocIdx = ptr.getAllocIdx();
+  const SlabIdx slabIndex = ptr.getSlabIdx(isMultiTiered);
+  const uint32_t allocIdx = ptr.getAllocIdx(isMultiTiered);
   const Slab* slab = &slabMemoryStart_[slabIndex];
Contributor

Can you add a comment/TODO here just to clarify that multi-tier has no effect here in accessing slab memory, since we haven't incorporated the actual multi-tier logic yet?

@guptask (Contributor Author)

done
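For illustration, the annotated lookup might read as follows (the TODO wording is assumed, not the PR's verbatim text):

// TODO: isMultiTiered does not yet affect the slab lookup below; per-tier
// slab memory selection arrives with the follow-up multi-tier PRs.
const SlabIdx slabIndex = ptr.getSlabIdx(isMultiTiered);
const uint32_t allocIdx = ptr.getAllocIdx(isMultiTiered);
const Slab* slab = &slabMemoryStart_[slabIndex];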

@facebook-github-bot: @therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jaesoo-fb (Contributor) left a comment

It is hard to figure out if this change makes sense without the succeeding changes. Why don't you include other commits in the stack in the pull request?

-      : ptr_(compress(slabIdx, allocIdx)) {}
+  CompressedPtr(uint32_t slabIdx,
+                uint32_t allocIdx,
+                bool isMultiTiered,
Contributor

isMultiTiered looks redundant. We are storing the tid at the MSB of the compressed ptr. In that case, we can just interpret isMultiTiered = false if tid == 0. The additional logic in CompressedPtr can be simplified a lot this way.

@guptask (Contributor Author) commented Feb 16, 2023

The bit-packing format changes for the multi-tier compressed pointer: the 32nd bit is reserved for the tid only in the multi-tier format. When the config is single-tiered, the compressed pointer keeps the original bit-packing design, in which the 32nd bit is not reserved for the tid. So tid == 0 is ambiguous: cachelib may or may not be multi-tiered.
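A small standalone sketch of the ambiguity being described (illustrative values only):

#include <cstdint>

constexpr uint32_t kNumAllocIdxBits = 16;
// A valid single-tier pointer to slab 0x8000 has bit 31 set, so bit 31 alone
// cannot signal "multi-tier"; likewise tid == 0 cannot distinguish
// "single-tier" from "multi-tier, tier 0".
constexpr uint32_t singleTierPtr = 0x8000u << kNumAllocIdxBits;
static_assert((singleTierPtr >> 31) == 1,
              "bit 31 is a legitimate slab-index bit in single-tier mode");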

Comment on lines 147 to 162
auto noTierIdPtr = isMultiTiered ? ptr_ & ~kTierIdxMask : ptr_;
return static_cast<uint32_t>(noTierIdPtr & kAllocIdxMask);
@jaesoo-fb (Contributor) commented Feb 3, 2023

This change is not needed. The MSB 16 bits of kAllocIdxMask should be 0. The parameter isMultiTiered would then be unused, but it looks OK to keep if you want it for consistency.

@guptask (Contributor Author)

You're right. I pushed up the corrected version now.
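A sketch of the simplified accessor this converged on, assuming kAllocIdxMask covers only the low 16 bits (a fragment in the style of the class under review, not the PR's verbatim code; the actual change keeps the isMultiTiered parameter for consistency):

uint32_t getAllocIdx() const noexcept {
  // kAllocIdxMask's upper 16 bits are zero, so the tier bit (bit 31) is
  // already discarded by the mask; no conditional stripping of
  // kTierIdxMask is needed here.
  return static_cast<uint32_t>(ptr_ & kAllocIdxMask);
}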

Comment on lines +156 to +172
void setTierId(TierId tid) noexcept {
  // Presumably called on a freshly compressed pointer whose tier bit is
  // still zero, so += behaves like |= here.
  ptr_ += static_cast<uint32_t>(tid) << kNumTierIdxOffset;
@jaesoo-fb (Contributor) commented Feb 3, 2023

What is the use of this? Is this needed for succeeding changes?

@guptask (Contributor Author)

Yes, it will be used in subsequent upstream PRs. For the complete picture, please refer to intel#56.

Comment on lines -249 to +250
-    return CompressedPtr{slabIndex, allocIdx};
+    return CompressedPtr{slabIndex, allocIdx, isMultiTiered};
Contributor

So, tier id is not set here. How are you going to provide the tier id?

I think you are going to have a separate MemoryAllocator per tier, so separate SlabAllocator. If so, I think the tier ID could be a member variable of the SlabAllocator?

What about the PtrCompressor then? Would CacheAllocator create separate PtrCompressor per MemoryAllocator?

@guptask (Contributor Author)

Please refer to this commit intel@2704ac8#diff-a6542b6dbf2cfb5e03e82205ee960757ab2b50de7bc25085089a3cffba40ae87, which shows how the tier id is passed to the allocator.compress() methods.

This change is due to be sent for upstream review soon.
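For readers without access to that commit, a purely hypothetical sketch of how a per-tier SlabAllocator might thread its tier id through compress() (names and signatures are assumptions, not the linked commit's code):

// Hypothetical: one SlabAllocator per tier holds its own tier id.
CompressedPtr compress(const void* ptr, bool isMultiTiered) const {
  auto cptr = CompressedPtr{getSlabIdx(ptr), getAllocIdx(ptr), isMultiTiered};
  if (isMultiTiered) {
    cptr.setTierId(tierId_); // tierId_: assumed member variable, as suggested above
  }
  return cptr;
}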

@facebook-github-bot: @guptask has updated the pull request. You must reimport the pull request before landing. (This notice was repeated after each of several updates.)

@guptask (Contributor Author) commented Feb 16, 2023

> It is hard to figure out if this change makes sense without the succeeding changes. Why don't you include other commits in the stack in the pull request?

The decision was made to upstream the multi-tier changes in phases; this upstream plan has been discussed with Meta as well. Please refer to intel#56 for the phased plan.

@facebook-github-bot: @guptask has updated the pull request. You must reimport the pull request before landing.

@guptask requested review from therealgymmy and jaesoo-fb and removed the previous requests for therealgymmy and jaesoo-fb, February 16, 2023 19:27
@facebook-github-bot: @therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@therealgymmy (Contributor)
@guptask: Hey, I didn't expect this, but I have observed about a 1.5% CPU regression. (I did A/B and then B/A in our internal test setups to verify the difference exists.)

Have you tried any cachebench runs on your end to see if throughput is affected with this change?

@guptask (Contributor Author) commented Mar 14, 2023

> @guptask: Hey, I didn't expect this, but I have observed about a 1.5% CPU regression. (I did A/B and then B/A in our internal test setups to verify the difference exists.)
>
> Have you tried any cachebench runs on your end to see if throughput is affected with this change?

@therealgymmy I ran several sets of experiments with the compressed_ptr changes (rebased to latest main) and the main branch:

  1. graph cache follower fbobj with opRatePerSec set to 1m.
  2. CDN workload with opRatePerSec set to 500k.
  3. benchmark-test-PtrCompressionBench with -bm_min_iters=10000

In all experiments, I'm not seeing any regression in stats and CPU usage.

Can you please provide me with more details on how to reproduce this regression?

…ng in single tier mode and use 31 bits for addressing in multi-tier mode
@facebook-github-bot: @guptask has updated the pull request. You must reimport the pull request before landing.

@guptask requested review from therealgymmy and removed the request for jaesoo-fb, March 14, 2023 23:29
@guptask (Contributor Author) commented Mar 31, 2023

> @guptask: Hey, I didn't expect this, but I have observed about a 1.5% CPU regression. (I did A/B and then B/A in our internal test setups to verify the difference exists.)
> Have you tried any cachebench runs on your end to see if throughput is affected with this change?

> @therealgymmy I ran several sets of experiments with the compressed_ptr changes (rebased to latest main) and the main branch:
>
>   1. graph cache follower fbobj with opRatePerSec set to 1m.
>   2. CDN workload with opRatePerSec set to 500k.
>   3. benchmark-test-PtrCompressionBench with -bm_min_iters=10000
>
> In all experiments, I'm not seeing any regression in stats and CPU usage.
>
> Can you please provide me with more details on how to reproduce this regression?

Hi @therealgymmy, is there any further update on this regression issue from your end? I didn't observe any performance regression after rebasing onto the upstream branch.

@therealgymmy (Contributor)

@guptask: Hey, I haven't had a chance to re-run the A/B test. I will kick off a run today.

@facebook-github-bot: @therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@therealgymmy (Contributor)
Looks like I made the mistake of not building the A/B packages close together. Last time, my "control" A-side was the production package, whereas the "test" B-side had many more diffs, including this one. This time, A and B differ only in this diff, and the CPU difference is consistent during A/B and after I flipped it to B/A.

[Screenshot: A/B CPU-utilization comparison, captured 2023-04-05 at 11:24 AM]

@facebook-github-bot: @therealgymmy merged this pull request in 7f66996.
