What is the best way to read a pin from Posit Connect directly into polars? #233
Comments
Oh, this is a great question! There is currently no official way to read pins directly into polars. I would likely lean towards your option 2. I do see many packages moving to a DataFrame-library-agnostic approach, and I am definitely interested in what that would look like for pins 👀
Thanks for your help @isabelizimm! I am working on some examples for posit::conf, and I will update them to use option 2. I did not realize pandas was already a dependency!
Some thoughts about a design for supporting multiple DataFrame families in

I agree with @SamEdwardes that the best thing would be a new string argument like

In the longer run, something global might change the behaviour of None:

```python
import pins
pins.set_default_df_provider("polars")
```

Perhaps this would be at the board level rather than completely globally. Also, the

On the other hand, it's not too hard to run

Lastly, I think that
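A minimal sketch of the global-default idea, written as plain Python outside pins. `register_df_provider`, `set_default_df_provider`, and `pin_read_df` are hypothetical names (none of them exist in pins today). The sketch assumes each pin is a single file that `board.pin_download` can fetch and a provider-specific function can read; a real implementation inside pins would more likely dispatch on the pin's metadata `type` than on the file extension.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: provider name -> function that reads a local file path.
_READERS: Dict[str, Callable[[str], Any]] = {}
# None means "no choice made": keep pins' existing pin_read behaviour.
_default_provider: Optional[str] = None


def register_df_provider(name: str, reader: Callable[[str], Any]) -> None:
    """Make a DataFrame library available under a short string name."""
    _READERS[name] = reader


def set_default_df_provider(name: str) -> None:
    """Hypothetical global switch, e.g. set_default_df_provider("polars")."""
    global _default_provider
    if name not in _READERS:
        raise ValueError(f"unknown DataFrame provider: {name!r}")
    _default_provider = name


def pin_read_df(board: Any, name: str, df_provider: Optional[str] = None) -> Any:
    """Read a single-file pin; a per-call argument wins over the global default."""
    provider = df_provider if df_provider is not None else _default_provider
    if provider is None:
        # Nothing chosen anywhere: fall back to the board's normal pin_read.
        return board.pin_read(name)
    if provider not in _READERS:
        raise ValueError(f"unknown DataFrame provider: {provider!r}")
    (path,) = board.pin_download(name)  # assumes exactly one file in the pin
    return _READERS[provider](path)


def _read_with_polars(path: str) -> Any:
    import polars as pl  # imported lazily so polars stays an optional dependency

    return pl.read_parquet(path) if path.endswith(".parquet") else pl.read_csv(path)


register_df_provider("polars", _read_with_polars)
```

Keeping the default at `None` means existing code keeps getting whatever `pin_read` returns until someone opts in.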
Hey, just to add 2 cents here: since pins has Board classes, there could be an option for which DataFrame class constructor to use? Something like...

```python
board = BaseBoard(..., frame_cls=pd.DataFrame)
# or in the board constructors
board_s3(..., frame_cls=pd.DataFrame)
```

This way...

(Alternatively, I definitely think some kind of global option like the one @nathanjmcdougall suggested is super reasonable!)

Avoiding recoding option docstrings over and over

One downside to setting it as a parameter on things like

One option for avoiding that could be something like an

```python
BaseBoard(..., Options(frame_cls=pl.DataFrame))
```

This way, you'd just have to document a set of
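A rough sketch of the board-level `Options` idea, assuming a hypothetical `Options` dataclass and a hypothetical `OptionsBoard` wrapper around an existing board (pins' real board classes take no such argument). The wrapper reads with the board's normal `pin_read` and passes the result to `frame_cls`. For polars that works because `pl.DataFrame(...)` accepts a pandas DataFrame, but the data still goes through pandas first, so this shows the API shape rather than a performance fix.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Options:
    """Hypothetical bundle of reading options, documented once instead of per method."""

    # Callable that turns what pin_read returns (pandas for table pins) into the
    # desired frame type, e.g. pl.DataFrame. None keeps pin_read's result as-is.
    frame_cls: Optional[Callable[[Any], Any]] = None


class OptionsBoard:
    """Hypothetical wrapper that applies board-level Options to an existing board."""

    def __init__(self, board: Any, options: Optional[Options] = None) -> None:
        self._board = board
        self.options = options if options is not None else Options()

    def pin_read(self, name: str, version: Optional[str] = None) -> Any:
        obj = self._board.pin_read(name, version=version)
        if self.options.frame_cls is None:
            return obj
        return self.options.frame_cls(obj)
```

Usage would look like `OptionsBoard(board, Options(frame_cls=pl.DataFrame))`, with `board` being any existing pins board. New reading options then become new `Options` fields, documented in one place.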
What do people think about some blend of a global option (with the default being pandas) +
I think that's a good balance, and it has the advantage that it avoids adding a

One question to answer would be: what if the file type is supported by multiple DataFrame libraries, but none of them is the global default or the board default? There's still an ambiguity in that case, so maybe the API needs to let you set an ordered priority among libraries. Alternatively, the user could temporarily change the global default, but that's a bit hacky. Also, what if a user does want to work with multiple DataFrame libraries on one board? One way to handle this might be a context manager like the following, which would override any board or global configuration:

```python
with pins.force_df_provider("polars"):
    df = pin_read(...)
```
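One possible way to resolve the precedence question, sketched in plain Python. `force_df_provider` is the name proposed above, but this implementation and `resolve_df_provider` are hypothetical. The order (explicit argument, then forced context, then board default, then global default, then an ordered priority list) is an assumption, not decided behaviour. A `contextvars.ContextVar` is used so the override stays local to the current thread or asyncio task.

```python
from __future__ import annotations

import contextlib
import contextvars
from typing import Collection, Iterator, Optional, Sequence

GLOBAL_DEFAULT = "pandas"  # matches what pin_read returns today
_forced: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "forced_df_provider", default=None
)


@contextlib.contextmanager
def force_df_provider(name: str) -> Iterator[None]:
    """Temporarily override board and global defaults inside a `with` block."""
    token = _forced.set(name)
    try:
        yield
    finally:
        _forced.reset(token)


def resolve_df_provider(
    supported: Collection[str],
    *,
    call_arg: Optional[str] = None,
    board_default: Optional[str] = None,
    priority: Sequence[str] = ("pandas", "polars"),
) -> str:
    """Pick a library for a pin whose file type the `supported` libraries can read."""
    # Explicit choices are strict: asking for an unsupported library is an error.
    for strict in (call_arg, _forced.get()):
        if strict is not None:
            if strict not in supported:
                raise ValueError(f"{strict!r} cannot read this pin type")
            return strict
    # Defaults are soft: skip any that cannot read this file type.
    for candidate in (board_default, GLOBAL_DEFAULT, *priority):
        if candidate is not None and candidate in supported:
            return candidate
    raise ValueError("no listed DataFrame library can read this pin type")
```

Treating explicit choices as strict and defaults as soft means a board or global default never causes an error for a file type it can't handle; the priority list breaks the tie instead.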
I don't have a strong preference on the implementation, but one outcome that would be nice is for things like IntelliSense to work. For example, VS Code should know whether it is getting a pandas DataFrame or a polars DataFrame back, so that it can show me the correct auto-completion.
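One way to give editors that information, sketched with `typing.overload` and `Literal`: the declared return type depends on the string passed as `library`, so a type checker or language server can tell a pandas result from a polars one. `read_pin_as` is a hypothetical helper, not a pins function, and its polars branch assumes a single-file CSV or Parquet pin.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Literal, overload

if TYPE_CHECKING:  # only type checkers need these; neither import is required at runtime
    import pandas as pd
    import polars as pl


@overload
def read_pin_as(board: Any, name: str, library: Literal["pandas"] = ...) -> pd.DataFrame: ...
@overload
def read_pin_as(board: Any, name: str, library: Literal["polars"]) -> pl.DataFrame: ...


def read_pin_as(board: Any, name: str, library: str = "pandas") -> Any:
    """Read a pin into the requested DataFrame library."""
    if library == "pandas":
        return board.pin_read(name)  # pins' existing behaviour for table pins
    if library == "polars":
        import polars as pl  # lazy import keeps polars optional

        (path,) = board.pin_download(name)  # assumes exactly one file in the pin
        return pl.read_parquet(path) if path.endswith(".parquet") else pl.read_csv(path)
    raise ValueError(f"unsupported library: {library!r}")
```

With this, `read_pin_as(board, "cars", "polars")` is typed as `pl.DataFrame` in the editor, while the default call stays typed as `pd.DataFrame`.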
I have found a few options that work, but none of them are exactly what I am looking for.
(1) Use pin_download
This method is my favourite, but it requires one extra step: I can't get a polars DataFrame directly from pin_read.
(2) Use pandas
This works, but it requires me to have pandas installed. I also assume there is some performance hit, because the data is first read into pandas and then converted to polars?
(3) Use fsspec
I have not figured out how to implement this yet, but I found this example for duckdb: #193
Is there a way to do this with polars?
Related issues
Suggestions
It would be nice if pins gave me a choice about what DataFrame library is used with pin_read. For example: