
Does nvmLookupLatency just calculate job scheduling time? #332

Open
james990309 opened this issue Aug 15, 2024 · 1 comment

Comments

@james990309

james990309 commented Aug 15, 2024

I am currently studying the code related to latency calculation in Cache (especially NvmCache).

I found that latency is measured via the creation/destruction of a LatencyTracker object, and there are two LatencyTrackers measuring find/lookup latency:

  1. /cachelib/cachebench/cache/Cache-inl.h
  • tracks cacheFindLatency_ in Cache::find(key)
  2. /cachelib/allocator/nvmcache/NvmCache-inl.h
  • tracks nvmLookupLatency_ in NvmCache::find(key)

Since there is an it.wait() in Cache::find(key), I can guess that cacheFindLatency_ is measured correctly: the tracker is destroyed after the internal cache_->find() call finishes.

/* Code for cacheFindLatency_ */
template <typename Allocator>
typename Cache<Allocator>::ReadHandle Cache<Allocator>::find(Key key) {
  auto findFn = [&]() {
    util::LatencyTracker tracker;
    if (FLAGS_report_api_latency) {
      tracker = util::LatencyTracker(cacheFindLatency_);
    }
    // find from cache and wait for the result to be ready.
    auto it = cache_->find(key);
    it.wait();

    if (touchValueEnabled()) {
      touchValue(it);
    }

    return it;
  };

  if (!consistencyCheckEnabled()) {
    return findFn();
  }

  auto opId = valueTracker_->beginGet(key);
  auto it = findFn();
  if (checkGet(opId, it)) {
    invalidKeys_[key.str()].store(true, std::memory_order_relaxed);
  }
  return it;
}

But I could not find an it.wait() after the call to lookup() in NvmCache::find(key). Moreover, I found that NvmCache::find(key) calls navyCache_->lookupAsync() (an asynchronous API), which confuses me even more.

template <typename C>
typename NvmCache<C>::WriteHandle NvmCache<C>::find(HashedKey hk) {
  if (!isEnabled()) {
    return WriteHandle{};
  }

  util::LatencyTracker tracker(stats().nvmLookupLatency_);

  auto shard = getShardForKey(hk);
  // invalidateToken any inflight puts for the same key since we are filling
  // from nvmcache.
  inflightPuts_[shard].invalidateToken(hk.key());

  stats().numNvmGets.inc();
  
   ...

     // create a context
    auto newCtx = std::make_unique<GetCtx>(
        *this, hk.key(), std::move(waitContext), std::move(tracker));
    auto res =
        fillMap.emplace(std::make_pair(newCtx->getKey(), std::move(newCtx)));
    XDCHECK(res.second);
    ctx = res.first->second.get();
  } // scope for fill lock

  XDCHECK(ctx);
  auto guard = folly::makeGuard([hk, this]() { removeFromFillMap(hk); });

  navyCache_->lookupAsync(
      HashedKey::precomputed(ctx->getKey(), hk.keyHash()),
      [this, ctx](navy::Status s, HashedKey k, navy::Buffer v) {
        this->onGetComplete(*ctx, s, k, v.view());
      });
  guard.dismiss();
  return hdl;
}

I dug into several internal functions starting from NvmCache::find(key) (Driver::lookupAsync(hk, cb), EnginePair::scheduleLookup(hk, cb), OrderedThreadPoolJobScheduler::enqueueWithKey(job, name, type, key)), but I could not find any code that waits for the scheduled lookup job to finish.

/* Code for navyCache_->lookupAsync(); I think navyCache_ is some kind of Driver */
// There is no it.wait(); it only calls scheduleLookup()

void Driver::lookupAsync(HashedKey hk, LookupCallback cb) {
  XDCHECK(cb);
  enginePairs_[selectEnginePair(hk)].scheduleLookup(hk, std::move(cb));
}
/* Code for EnginePair::scheduleLookup() */
// Still no it.wait(); it only enqueues the job to the job scheduler

void EnginePair::scheduleLookup(HashedKey hk, LookupCallback cb) {
  scheduler_->enqueueWithKey(
      [this, cb = std::move(cb), hk, skipLargeItemCache = false]() mutable {
        Buffer value;
        Status status = lookupInternal(hk, value, skipLargeItemCache);
        if (status == Status::Retry) {
          return JobExitCode::Reschedule;
        }
        if (cb) {
          cb(status, hk, std::move(value));
        }

        return JobExitCode::Done;
      },
      "lookup",
      JobType::Read,
      hk.keyHash());
}
/* Code for OrderedThreadPoolJobScheduler::enqueueWithKey() */
// If the job cannot be served immediately, it is just added to the pending job list and the function returns (I guess)

void OrderedThreadPoolJobScheduler::enqueueWithKey(Job job,
                                                   folly::StringPiece name,
                                                   JobType type,
                                                   uint64_t key) {
  const auto shard = key % numShards(numShardsPower_);
  JobParams params{std::move(job), type, name, key};
  std::lock_guard<std::mutex> l(mutexes_[shard]);
  if (shouldSpool_[shard]) {
    // add to the pending jobs since there is already a job for this key
    pendingJobs_[shard].emplace_back(std::move(params));
    numSpooled_.inc();
    currSpooled_.inc();
  } else {
    shouldSpool_[shard] = true;
    scheduleJobLocked(std::move(params), shard);
  }
}

Does nvmLookupLatency just measure the time it takes to schedule the job into the job scheduler? Or am I missing a puzzle piece that lets the nvmLookup latency measurement cover the whole lookup even though an asynchronous API is called internally?

Sorry, I am a newbie to coding/cachelib, so maybe this is a stupid question, but I need help understanding latency tracking inside NvmCache. Thank you in advance for your help.

@haowu14
Contributor

haowu14 commented Aug 16, 2024

nvmLookupLatency counts the time between when NvmCache::find is called and when the result is fetched.
It starts counting here: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L575
The tracker is then moved into a GetCtx (context): https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L1016
The counting stops when the context object goes out of scope, which happens at the end of this function: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L1070

This function (onGetComplete) is called by a callback that is invoked when the get is completed: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L658

If you dig further, you can see that this callback is invoked when a lookup job is picked up from the scheduler and completed. So no .wait() is needed for this to work.
