-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ticker.history return messy tz info #208
Comments
Hi @ericwbzhang, thanks for raising this. The bug's here, in how the dataframes for each symbol are concatenated: yahooquery/yahooquery/ticker.py Lines 1324 to 1325 in cb5b2bd
Rather than retain the timezone info of the labels being concatenated, it'll use the first label it comes across that represents the same timestamp, regardless of the timezone. In your example you'll see that the labels that seem out for the HK symbol are correct as timestamps, they are just represented in a different timezone to what would be expected. This happens wherever the label represents a timestamp that's already present in the data for the NYM symbol.
It's probably worth noting that when you request data for symbols trading in different timezones AND have the
|
Right, this is annoying, I'm not able to create the required It might be bug in pandas (I've raised it with them here) but I think it's just as likely that it can't be done given how Assuming it's not a bug in pandas, options that occur to me are:
@dpguthrie, what do you reckon? I'm happy to offer up a PR for the second option if you agree that's the way to go. As a related aside, I think the default for |
@maread99 thanks for looking into this issue. I am thinking maybe returning an extra column that states the tzinfo for each symbol if adj_timezone= True. The date col will always be utc datetime |
Hi @ericwbzhang. IMHO, and the idea I get from having followed the package's evolution, yahooquery is about providing an API to get data from Yahoo Finance and it should not be concerned with doing much more than a bare minimum of post-processing. Rather post-processing should be left to the client according to their needs (although I might be off-the-mark with this - @dpguthrie ?). The extra column you suggest could be easily added by any user who would find it useful (a symbol's timezone is available through the API). I'm not sure if you're suggesting that this column should have dtype 'str' and show the timezone (e.g. "America/New_York") or have dtype 'object' and hold |
|
Yeah agree with Doug. Simply retuning a df with column symbol and date would be great. Sent from my iPhoneOn Jul 19, 2023, at 17:38, Doug Guthrie ***@***.***> wrote:
Definitely agree that this package should be a wrapper around the API and return, for the most part, what the underlying data is and not do much more.
@maread99 Maybe we don't need a pd.MultiIndex - pretty sure people end up resetting the index anyway after getting the results.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Just to clarify, are we suggesting having the dataframe with index as the default integer index (first row 0) and adding a 'symbol' and a 'date' column, i.e. as currently retuned by I think this would certainly resolve the issue. I'm happy to offer the PR and revise the doc and affected tests - let me know. Only thing, I suspect the change would break pretty much every dependant that uses the history method. Maybe one for 2.4, even 3.0? |
@maread99 As always, appreciate the work and timely responses! I've definitely gotten feedback in the past that the MultiIndex is somewhat undesirable and that most people do in fact It is definitely a breaking change. In my opinion, it's more of a minor breaking change (so upgrading to 2.4) instead of a major one (upgrading to 3.0). Would love to hear your opinion, and anyone else's, on what the appropriate code path and semver path should be. |
Then why combine the tables at all? Just return a dict of individual tables.
My suggestion: put this change in a new major version like 3.0, but continue supporting 2.* for a good while. Let users migrate in their own time. |
I think this is a good point. At market-prices I just end up splitting the DataFrame back into individual tables. I suspect at least some users will be doing something similar. Seems to make sense to offer the rawer option and leave it to the user to decide what they want to do with it. This would also nicely solve the tz matter that started off this issue - there would be no room for conflict between 'same timestamp different timezone', rather if So there's a couple of options to resolve the tz conflicts - either as if the index were reset or offer the return as a VersioningWhichever way this is resolved it's going to break the code of almost everyone who uses's the
If it were me I'd probably go 2.4 then 2.5 - stable prior version to pin to and no responsibility to maintain a separate branch. |
Hi I notice the price history returns messy tz info when working with more than one ticker in different tz
Example:
tmp_ticker= yq.Ticker( ['CLK24.NYM', '6618.HK'])
a= tmp_ticker.history(interval='5m', start='2023-07-10').reset_index()
part of returns:
6618.HK,2023-07-13 15:45:00+08:00,55.50000,55.80000,55.50000,55.70000,147000.00000
6618.HK,2023-07-13 15:50:00+08:00,55.80000,55.90000,55.70000,55.75000,157050.00000
6618.HK,2023-07-13 15:55:00+08:00,55.80000,55.80000,55.55000,55.60000,86600.00000
6618.HK,2023-07-13 04:05:00-04:00,55.60000,55.70000,55.60000,55.70000,359100.00000
6618.HK,2023-07-14 09:30:00+08:00,55.70000,56.15000,55.35000,55.90000,0.00000
6618.HK,2023-07-14 09:35:00+08:00,55.70000,55.75000,55.10000,55.35000,80950.00000
pd.version =1.5.3
yq.version= 2.3.2
The text was updated successfully, but these errors were encountered: