
Does nvmLookupLatency just calculate job scheduling time? #332

Open
james990309 opened this issue Aug 15, 2024 · 1 comment

Comments

@james990309

james990309 commented Aug 15, 2024

I am currently studying the code related to latency calculation in Cache (especially NvmCache).

I found that latency is measured via the creation/destruction of a LatencyTracker object, and there are two LatencyTrackers measuring find/lookup latency:

  1. /cachelib/cachebench/cache/Cache-inl.h
  • tracks cacheFindLatency_ in Cache::find(key)
  2. /cachelib/allocator/nvmcache/NvmCache-inl.h
  • tracks nvmLookupLatency_ in NvmCache::find(key)

Since there is an it.wait() in Cache::find(key), I can guess that cacheFindLatency_ is measured correctly: the tracker is destroyed after the internal cache_->find() call finishes.

/* Code for cacheFindLatency_ */
template <typename Allocator>
typename Cache<Allocator>::ReadHandle Cache<Allocator>::find(Key key) {
  auto findFn = [&]() {
    util::LatencyTracker tracker;
    if (FLAGS_report_api_latency) {
      tracker = util::LatencyTracker(cacheFindLatency_);
    }
    // find from cache and wait for the result to be ready.
    auto it = cache_->find(key);
    it.wait();

    if (touchValueEnabled()) {
      touchValue(it);
    }

    return it;
  };

  if (!consistencyCheckEnabled()) {
    return findFn();
  }

  auto opId = valueTracker_->beginGet(key);
  auto it = findFn();
  if (checkGet(opId, it)) {
    invalidKeys_[key.str()].store(true, std::memory_order_relaxed);
  }
  return it;
}

But I could not find an it.wait() after the call to lookup() in NvmCache::find(key). Moreover, I found that NvmCache::find(key) calls navyCache_->lookupAsync() (an asynchronous API), which confuses me even more.

template <typename C>
typename NvmCache<C>::WriteHandle NvmCache<C>::find(HashedKey hk) {
  if (!isEnabled()) {
    return WriteHandle{};
  }

  util::LatencyTracker tracker(stats().nvmLookupLatency_);

  auto shard = getShardForKey(hk);
  // invalidateToken any inflight puts for the same key since we are filling
  // from nvmcache.
  inflightPuts_[shard].invalidateToken(hk.key());

  stats().numNvmGets.inc();
  
   ...

     // create a context
    auto newCtx = std::make_unique<GetCtx>(
        *this, hk.key(), std::move(waitContext), std::move(tracker));
    auto res =
        fillMap.emplace(std::make_pair(newCtx->getKey(), std::move(newCtx)));
    XDCHECK(res.second);
    ctx = res.first->second.get();
  } // scope for fill lock

  XDCHECK(ctx);
  auto guard = folly::makeGuard([hk, this]() { removeFromFillMap(hk); });

  navyCache_->lookupAsync(
      HashedKey::precomputed(ctx->getKey(), hk.keyHash()),
      [this, ctx](navy::Status s, HashedKey k, navy::Buffer v) {
        this->onGetComplete(*ctx, s, k, v.view());
      });
  guard.dismiss();
  return hdl;
}

I dug into several internal functions starting from NvmCache::find(key) (Driver::lookupAsync(hk, cb), EnginePair::scheduleLookup(hk, cb), OrderedThreadPoolJobScheduler::enqueueWithKey(job, name, type, key)), but I could not find any code that waits for the scheduled lookup job to finish.

/* Code for navyCache_->lookupAsync(); I think navyCache_ is some kind of Driver */
// There is no it.wait(); it only calls scheduleLookup()

void Driver::lookupAsync(HashedKey hk, LookupCallback cb) {
  XDCHECK(cb);
  enginePairs_[selectEnginePair(hk)].scheduleLookup(hk, std::move(cb));
}
/* Code for EnginePair::scheduleLookup() */
// Still no it.wait(); it only enqueues the job to the job scheduler

void EnginePair::scheduleLookup(HashedKey hk, LookupCallback cb) {
  scheduler_->enqueueWithKey(
      [this, cb = std::move(cb), hk, skipLargeItemCache = false]() mutable {
        Buffer value;
        Status status = lookupInternal(hk, value, skipLargeItemCache);
        if (status == Status::Retry) {
          return JobExitCode::Reschedule;
        }
        if (cb) {
          cb(status, hk, std::move(value));
        }

        return JobExitCode::Done;
      },
      "lookup",
      JobType::Read,
      hk.keyHash());
}
/* Code for OrderedThreadPoolJobScheduler::enqueueWithKey() */
// If the job cannot be served immediately, it is just added to the pending job list and the function returns (I guess)

void OrderedThreadPoolJobScheduler::enqueueWithKey(Job job,
                                                   folly::StringPiece name,
                                                   JobType type,
                                                   uint64_t key) {
  const auto shard = key % numShards(numShardsPower_);
  JobParams params{std::move(job), type, name, key};
  std::lock_guard<std::mutex> l(mutexes_[shard]);
  if (shouldSpool_[shard]) {
    // add to the pending jobs since there is already a job for this key
    pendingJobs_[shard].emplace_back(std::move(params));
    numSpooled_.inc();
    currSpooled_.inc();
  } else {
    shouldSpool_[shard] = true;
    scheduleJobLocked(std::move(params), shard);
  }
}

Does nvmLookupLatency just measure the time it takes to schedule the job into the job scheduler? Or am I missing a puzzle piece that lets the nvmLookup latency measurement cover the whole lookup even though an asynchronous API is called internally?

Sorry, I am a newbie to coding/cachelib, so maybe this is a stupid question, but I need help understanding latency tracking inside NvmCache. Thank you in advance for your help.

@haowu14
Contributor

haowu14 commented Aug 16, 2024

nvmLookupLatency counts the time between when NvmCache::find is called and when the result is fetched.
It starts counting here: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L575
The tracker is then moved into a GetCtx (context): https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L1016
The counting stops when the context object goes out of scope, which happens at the end of this function: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L1070

This function (onGetComplete) is called by a callback that is invoked when the get is completed: https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/nvmcache/NvmCache.h#L658

If you dig further, you can see that this callback is invoked when a lookup job is picked up from the scheduler and completed. So no .wait() is needed for this to work.
