-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Purge bad Yahoo data from requests_cache
#1204
base: dev
Are you sure you want to change the base?
Conversation
36cadc2
to
abb0da1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great with documentation of how to use session, but not enthusiastic about the extra layer of added complexity when it should be up to the user to handle the session cache.
Most stuff there with handling errors can be done by the actual owner of the cache with request hooks.
If yfinance adds this support of only caching successful requests it should probably also be done by registering request hooks (documented at requests_cache site) in order to not complicate the class.
yfinance/base.py
Outdated
@@ -147,7 +147,9 @@ def history(self, period="1mo", interval="1d", | |||
return utils.empty_df() | |||
|
|||
if end is None: | |||
end = int(_time.time()) | |||
dt_now = _pd.Timestamp(_datetime.datetime.utcnow()).tz_localize("UTC") | |||
midnight = _pd.Timestamp.combine(dt_now.tz_convert(tz).date()+_datetime.timedelta(days=1), _datetime.time(0)).tz_localize(tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would caching it as being data up to next day make the cache return old data?
For example, I do a request for daily data before the exchange close then it will cache data missing the days data.
Then later that day after close when ticker data for that day is present the old cached data will be returned as long as it is done with the same session object.
Tip: I think the easiest way to get midnight is to use Timestamps floor method:
pd.Timestamp.utcnow().tz_convert("CET").floor("D")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This problem equally applies to following: Ticker.history(period="1d")
Here user must evict old data, via requests_cache
expiration.
Good suggestion on midnight.
The pruning is targeting situations when the request itself was successful, and what failed was the subsequent processing in yfinance because Yahoo embedded incomplete/wrong/no data. This happens when long lists of tickers trigger their spam system e.g. #1087 So I'm thinking if those heavy users would be reluctant to use a session cache. |
abb0da1
to
0840b60
Compare
I've moved the checking/pruning into a hook function. |
Nice feature! |
requests_cache
67ecfa7
to
4d8ca37
Compare
Improvements to
requests_cache
session use to encourage users use.- Replace 'datetime.now()' with midnightSome financial data was being fetched using parameters derived from<- Moved to separate PR #1284datetime.now()
, so obviously will never get reused in session cache. Replace with midnightWhen Yahoo returns garbage data, then automatically remove from session cache. Garbage = parsing fails to find expected keys etc. Can disable this if e.g. debugging problem, via
yfinance.disable_prune_session_cache()
. @fredrik-corneliusson did I miss any fetches?