h2o.H2OFrame.as_data_frame() leads to OSError #16045
Comments
Thank you for taking the time to pinpoint the issue. Unfortunately, I don't have a Windows machine, so I have just 2 untested hypotheses: If it's just (1), could you provide us with part of the h2o log? I'm interested in the log entry
If that is the only problem, you could work around it by adding something like the following to the top of your script/Jupyter notebook (just make sure the path exists).
import tempfile
tempfile.tempdir = "C:\\tmp\\"
If it's (2), we will need to fix how the temporary file is created. It should be a simple thing to fix. I think something like the following would do. cc @wendycwong
--- a/h2o-py/h2o/frame.py
+++ b/h2o-py/h2o/frame.py
@@ -1970,12 +1970,16 @@ class H2OFrame(Keyed, H2ODisplay):
         if can_use_pandas() and use_pandas:
             import pandas
             if (can_use_datatable()) or (can_use_polars() and can_use_pyarrow()): # can use multi-thread
-                with tempfile.NamedTemporaryFile(suffix=".h2oframe2Convert.csv") as exportFile:
+                exportFile = tempfile.NamedTemporaryFile(suffix=".h2oframe2Convert.csv", delete=False)
+                try:
+                    exportFile.close()
                     h2o.export_file(self, exportFile.name, force=True)
                     if can_use_datatable(): # use datatable for multi-thread by default
                         return self.convert_with_datatable(exportFile.name)
                     elif can_use_polars() and can_use_pyarrow(): # polar/pyarrow if datatable is not available
                         return self.convert_with_polars(exportFile.name)
+                finally:
+                    os.unlink(exportFile.name)
             warnings.warn("converting H2O frame to pandas dataframe using single-thread. For faster conversion using"
                           " multi-thread, install datatable (for Python 3.9 or lower), or polars and pyarrow "
                           "(for Python 3.10 or above).", H2ODependencyWarning)
You can patch your h2o library using that code, but it might get a little more involved. If it's just (2), I think we could manage to release the fix in the upcoming major release (likely within the next month). If the problem is in (1) as well, we would probably need your help in providing us with the line from the log.
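For anyone who wants to see the idea behind the patch in isolation, here is a minimal sketch that does not touch h2o at all. It relies on the limitation described in the Python `tempfile` documentation: on Windows, a `NamedTemporaryFile` that is still open generally cannot be opened a second time by name. `write_then_read` and `demo_writer` are hypothetical names; `writer` merely stands in for a call such as `h2o.export_file(...)`.

```python
import os
import tempfile


def write_then_read(writer, suffix=".csv"):
    """Create a temp file, let `writer(path)` fill it, and return its text.

    `writer` is a hypothetical stand-in for something like
    h2o.export_file(frame, path, force=True).
    """
    # delete=False keeps the file on disk after close(), so other code
    # can open the same path by name.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.close()  # release our own handle before the path is reused
        writer(tmp.name)
        with open(tmp.name, "r", encoding="utf-8") as fh:
            return fh.read()
    finally:
        os.unlink(tmp.name)  # cleanup is now our responsibility


def demo_writer(path):
    # Hypothetical writer used only for this illustration.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("a,b\n1,2\n")


content = write_then_read(demo_writer)
print(content)
```

The `try`/`finally` mirrors the diff above: the file is removed even when the writer or the reader raises.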
Hi @tomasfryda, thanks for your response. Here is that part of the logs:
Sounds like it might be (2) instead?
Thank you for the log entry. I think we'll need to fix both, or at least make sure the problem is not in (1) as well. The exception from Java has a combination of path separators (
@tomasfryda Do you have a workaround until this is fixed?
@kalaiselvan263 Not yet. I think the modification I suggested (#16045 (comment)) would work, but I don't have a Windows machine to test it on. You would need to find where the h2o package is installed and navigate to file If that doesn't work, you can change the
import random
exportFile = "C:\\tmp\\h2o_tempfile_{}.csv".format(random.randint(0, 10**8))
It's not perfect, and with this change there could be issues with multiple users trying to do the same thing at the same time, but the probability of that is pretty low (1e-8) and on Windows you'd be more likely to end up with the same error
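If a fixed directory is used anyway, a unique file name can be obtained without `random`. The following is only a sketch, assuming the chosen directory is writable; `unique_export_path` is a hypothetical helper, not part of h2o. `tempfile.mkstemp` creates the file under a name that did not exist before and returns an open descriptor, which is closed right away so that another writer can use the path.

```python
import os
import tempfile


def unique_export_path(directory=None, suffix=".csv"):
    """Return the path of a freshly created, empty, uniquely named file.

    The caller is responsible for deleting the file afterwards.
    """
    if directory is not None:
        os.makedirs(directory, exist_ok=True)  # make sure the path exists
    fd, path = tempfile.mkstemp(prefix="h2o_tempfile_", suffix=suffix, dir=directory)
    os.close(fd)  # close our descriptor so the path can be opened by name
    return path


first = unique_export_path()
second = unique_export_path()
print(first != second)  # True: each call creates a distinct file
os.unlink(first)
os.unlink(second)
```

Passing `directory=None` falls back to the default temporary directory, which is also what `tempfile.tempdir` overrides.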
I found the key to the problem: you just need to uninstall datatable : )
H2O version, Operating System and Environment
H2O 3.44.0.3, Python 3.11, Windows 10
(environment also has pyarrow 14.0.2 and polars 0.20.6)
Python Code
Directly taken from the documentation of h2o.H2OFrame.as_data_frame():
OSError
Worked before using h2o 3.42.0.3
Looking at the changelog for changes made to as_data_frame(), I've tried downgrading to version 3.42.0.3, where the above code still works fine for me.