-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalidate empty value metastore cache entry for a non-existent table #24068
base: master
Are you sure you want to change the base?
Invalidate empty value metastore cache entry for a non-existent table #24068
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the fix @nmahadevuni. Just wondering which one of the below should be a more ideal way to solve this -
- Not put any entry in the cache when the table does not exist
- Force a refresh like this fix.
I think option 1 would involve only one get call from the cache and one fetch from metastore when retrieving the details about the table as compared to option 2 where we execute two get calls from cache and one fetch from metastore. Thoughts?
Thanks, Pratyaksh.
Another solution is we can invalidate the entry if we get a empty object, but I thought refresh would be a better way to handle this, since if any other session is trying to load the table in to cache, then this refresh would be a no-op. |
Thank you for the analysis Naveen. I am actually a bit more inclined towards adding a check for empty object as an alternative for the reasons mentioned in my previous comment. Also, the chances of concurrent sessions trying to deal with the same non existent table are pretty less I think. We can probably wait to hear from one of the committers if you think otherwise. |
Yeah, that loader call happens in the third party library, we cannot intercept that, we can intercept only in our value loader method, but cannot influence the put logic. |
My bad, on taking a closer look at the code, I get what you are saying. Even the computation of the value to be put in the cache happens inside the guava library. The current fix makes sense and looks good to me. |
@nmahadevuni Thanks for making the changes. |
// Presto creates the table, Presto keeps throwing "Table does not exist" until refresh time is reached. So, | ||
// forcing a refresh for non-existent table once here. | ||
if (!table.isPresent()) { | ||
tableCache.refresh(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Just thinking out loud, should we invalidate the entry instead of refreshing it? While this may not lead to a significant improvement, invalidating the entry—rather than caching an empty table—could save an extra call to the metastore when the table is empty the very first time. Since we plan to call the metastore for any cached empty table, it might be more efficient not to cache it at all. This approach would also provide more accurate JMX metrics. If we continue to cache non-existent tables, the count of actual cached tables may not be accurate since these empty table entries would also be counted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am aligned with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have changed it to invalidate the entry.
f287ec6
to
47d0496
Compare
If we query a table that is non-existent, we store [Key=Table, Value=Optional.Empty] cache entry. If the table is created outside Presto, we keep throwing "Table does not exist" until refresh time is reached. Invalidating the empty value to fix this.
47d0496
to
0492b34
Compare
I have added unit tests in TestInMemoryCachingHiveMetastore.java which simulates this scenario. |
@imjalpreet @agrawalreetika @pratyakshsharma Thank you. Addressed your comments. Please review. @ZacBlanco Can you please review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we just shouldn't store the reference to the table in the cache in the first place rather than invalidating it here. I think I would actually prefer throwing a TableNotFoundException
in the cache loader than just storing Optional.empty if this is the desired behavior. These semantics make the behavior of the cache non-obvious to future readers since the logic is split among two locations
Description
If we query a table that is non-existent, we store [Key=Table, Value=Optional.Empty] metastore cache entry. If the table is created outside Presto, we keep throwing "Table does not exist" until refresh time is reached. Invalidating this empty value to fix this.
Motivation and Context
Changes to tables by different tools is common in disaggregated query engines. This fix tries to solve a possible stale cache entry case when a table is created outside of Presto, after querying the same non-existent table once from Presto.
Impact
No impact
Test Plan
No new tests added.