[FEA] Accelerate cupy array creation from DataFrame.values #16483

Open
bdice opened this issue Aug 2, 2024 · 5 comments
Labels
feature request New feature or request Performance Performance related issue

Comments

@bdice
Contributor

bdice commented Aug 2, 2024

Is your feature request related to a problem? Please describe.
Users with large numerical datasets (such as a dataframe with thousands of time-series columns) would like to convert a cuDF dataframe to a cupy array as quickly as possible. Currently, this conversion is a raw Python loop that casts and assigns each column individually:

for i, col in enumerate(self._data.values()):
    # TODO: col.values may fail if there is nullable data or an
    # unsupported dtype. We may want to catch and provide a more
    # suitable error.
    matrix[:, i] = to_array(col, dtype)
return matrix

It should be possible to move this into libcudf and use a batched memcpy from CCCL's CUB (cub::DeviceMemcpy::Batched) to copy the input columns that already have the target type directly into the matrix. Some columns may require casting; that work could be done by separate kernels, possibly launched on separate streams.
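A minimal host-side sketch of why a batched copy fits this problem, assuming F-ordered (column-major) output: column i of an nrows x ncols matrix occupies the contiguous element range [i * nrows, (i + 1) * nrows), so every column whose dtype already matches the output becomes a single (source, destination offset, byte count) descriptor, the kind of input a batched memcpy such as cub::DeviceMemcpy::Batched consumes. Columns are modeled with Python's array.array rather than cuDF columns, and plan_batched_copies is a hypothetical helper, not a cuDF API.

```python
from array import array


def plan_batched_copies(columns, out_typecode):
    """Split columns into memcpy descriptors and columns that need a cast.

    Hypothetical helper: models an F-ordered (column-major) output buffer,
    where column i starts at element offset i * nrows.
    """
    nrows = len(columns[0]) if columns else 0
    itemsize = array(out_typecode).itemsize  # bytes per output element
    copies = []  # (column index, destination byte offset, byte count)
    casts = []   # indices of columns whose dtype differs from the output
    for i, col in enumerate(columns):
        if len(col) != nrows:
            raise ValueError("all columns must have the same length")
        if col.typecode == out_typecode:
            copies.append((i, i * nrows * itemsize, nrows * itemsize))
        else:
            casts.append(i)
    return copies, casts


cols = [array("d", [1.0, 2.0, 3.0]), array("i", [4, 5, 6]), array("d", [7.0, 8.0, 9.0])]
copies, casts = plan_batched_copies(cols, "d")
print(copies)  # [(0, 0, 24), (2, 48, 24)]
print(casts)   # [1]
```

Only the columns in casts need any per-element work; everything else is plain byte movement into a precomputed slot.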

Describe the solution you'd like

#include <cudf/table/table_view.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/span.hpp>
#include <algorithm>

template<typename T>
void table_to_array(cudf::table_view input, cudf::device_span<T> output) {
    CUDF_EXPECTS(std::all_of(input.begin(), input.end(), /* column is convertible to T */ ...),
                 "All input columns must be convertible to the output type",
                 cudf::data_type_error);
    // 1. Get a boolean map of which columns already match the output type
    // 2. Call a batched memcpy on all matching columns
    // 3. Use a thrust transform with custom input and output iterators to cast all other columns
    //    This is nontrivial but shouldn't be too hard. It may need some device-side type dispatch.
}
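The three steps in the outline above can be modeled on the host to make the data flow concrete. This is a sketch under stated assumptions, not libcudf code: columns are array.array objects, the caller-allocated output is a flat array.array in column-major order, slice assignment through a memoryview stands in for the batched memcpy, and a per-element loop stands in for the thrust transform. table_to_array_host and its numeric-only convertibility rule are hypothetical.

```python
from array import array

# Hypothetical convertibility rule for this model: numeric array typecodes only.
NUMERIC_TYPECODES = set("bBhHiIlLqQfd")


def table_to_array_host(columns, out):
    """Fill the caller-allocated, F-ordered flat array `out` from `columns`."""
    nrows = len(columns[0]) if columns else 0
    if any(len(col) != nrows for col in columns) or len(out) != nrows * len(columns):
        raise ValueError("output size does not match the table shape")
    # Precondition, analogous to CUDF_EXPECTS(..., cudf::data_type_error).
    if not all(col.typecode in NUMERIC_TYPECODES for col in columns):
        raise TypeError("all input columns must be convertible to the output type")

    # 1. Boolean map of which columns already match the output type.
    matches = [col.typecode == out.typecode for col in columns]

    # 2. Direct copies for matching columns (host stand-in for a batched memcpy).
    with memoryview(out) as out_view:
        for i, col in enumerate(columns):
            if matches[i]:
                out_view[i * nrows:(i + 1) * nrows] = memoryview(col)

    # 3. Elementwise cast for the remaining columns (stand-in for the thrust transform).
    convert = float if out.typecode in "fd" else int
    for i, col in enumerate(columns):
        if not matches[i]:
            for r, value in enumerate(col):
                out[i * nrows + r] = convert(value)


cols = [array("d", [1.5, 2.5]), array("i", [3, 4]), array("d", [5.5, 6.5])]
out = array("d", [0.0] * 6)  # caller-allocated, like cupy.empty(..., order="F")
table_to_array_host(cols, out)
print(out.tolist())  # [1.5, 2.5, 3.0, 4.0, 5.5, 6.5]
```

Because the output is passed in rather than returned, the allocation (and its F ordering) stays under the caller's control, which matches the preference for letting cupy allocate the matrix.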

Describe alternatives you've considered
We might be able to use cudf::contiguous_copy_column_device_views, but that requires all the types to be the same. (Edit: I was wrong; this is not what I want.) I think the best-performing solution would cast any compatible input type to the target type as it copies.

We could also make the API take a void* output and a cudf::data_type output_dtype? I'm not sure. Either way, I think it is important for the API to take an output parameter so the data can be allocated by cupy with matrix = cupy.empty(shape=(len(self), ncol), dtype=dtype, order="F"), as the current implementation already does.

@bdice bdice added the feature request New feature or request label Aug 2, 2024
@bdice
Contributor Author

bdice commented Aug 2, 2024

> We could also make the API take a void* output and a cudf::data_type output_dtype?

Yes, this is probably the right way. We ought to do host-side type dispatch to determine which kernel to call, and device-side type dispatch to handle casting the various input column types.
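One way to read the void* plus cudf::data_type suggestion, sketched on the host under stated assumptions: the caller passes an untyped byte buffer and a runtime dtype tag, the host picks a typed view once (the analog of choosing which templated kernel to launch), and each input column is converted as it is written. FORMATS and table_to_raw_buffer are hypothetical names, and memoryview.cast stands in for reinterpreting a void* as a T*.

```python
from array import array

# Hypothetical mapping from a runtime dtype tag (standing in for
# cudf::data_type) to a native element format accepted by memoryview.cast.
FORMATS = {"float64": "d", "float32": "f", "int64": "q", "int32": "i"}


def table_to_raw_buffer(columns, out_bytes, out_dtype):
    """Write F-ordered output into an untyped byte buffer (the analog of void*)."""
    # Host-side dispatch: choose the typed code path once, from the runtime dtype.
    if out_dtype not in FORMATS:
        raise TypeError(f"unsupported output dtype: {out_dtype}")
    fmt = FORMATS[out_dtype]
    convert = float if fmt in "fd" else int
    nrows = len(columns[0]) if columns else 0
    # Reinterpret the raw bytes as typed elements (like casting void* to T*).
    with memoryview(out_bytes).cast(fmt) as out:
        if len(out) != nrows * len(columns):
            raise ValueError("output size does not match the table shape")
        for i, col in enumerate(columns):
            # Python resolves each input element's type at run time; on the GPU,
            # this is where a device-side dispatch over the column dtype would go.
            for r, value in enumerate(col):
                out[i * nrows + r] = convert(value)


buf = bytearray(8 * 4)  # caller allocates 2 rows x 2 columns of float64
table_to_raw_buffer([array("i", [1, 2]), array("d", [0.5, 0.25])], buf, "float64")
print(memoryview(buf).cast("d").tolist())  # [1.0, 2.0, 0.5, 0.25]
```

The design choice this illustrates is that the set of output types is resolved exactly once per call, while the input types vary per column.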

@bdice
Contributor Author

bdice commented Aug 2, 2024

We might be able to shortcut this and just accelerate the "easy" path where all types are the same, to start out.
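A sketch of that "easy" path, assuming the type check happens before any copy is issued: when every column already has the same dtype, F-ordered output is just the column buffers placed end to end, so the whole conversion reduces to one contiguous byte copy per column. This is modeled on the host with array.array; same_type_fast_path is a hypothetical name, and returning None signals "fall back to the general path".

```python
from array import array


def same_type_fast_path(columns):
    """Return an F-ordered flat array if every column shares one dtype, else None.

    Hypothetical helper: with F (column-major) ordering the result is just the
    column buffers laid end to end, i.e. one contiguous copy per column.
    """
    if not columns:
        return None
    typecode = columns[0].typecode
    if any(col.typecode != typecode for col in columns):
        return None  # mixed types: fall back to the general path with casting
    out = array(typecode)
    for col in columns:
        out.frombytes(col.tobytes())  # byte-for-byte copy, no casting
    return out


print(same_type_fast_path([array("d", [1.0, 2.0]), array("d", [3.0, 4.0])]).tolist())
# [1.0, 2.0, 3.0, 4.0]
print(same_type_fast_path([array("d", [1.0]), array("i", [2])]))
# None
```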

@mroeschke
Contributor

xref #11648

@bdice
Contributor Author

bdice commented Aug 2, 2024

Also xref #12928. I don't think this is a duplicate, since that issue focuses on transposes and this one offers concrete implementation proposals. This might solve the same problem, but that may depend on the implementation choices.

@bdice bdice added the Performance Performance related issue label Aug 2, 2024
@bdice
Contributor Author

bdice commented Mar 14, 2025

Sharing some notes from a related offline discussion:

  • cuDF table -> cupy array:
    • cannot be a view because columns aren't contiguous
    • a batched memcpy can generate contiguous output very efficiently if we produce F-ordered output; otherwise a transpose is needed (out of scope for this issue)
    • this path is the focus of this issue
  • cupy array -> cuDF table:
    • can be a view for each column (if the input array is F-ordered)
    • typically arrays are not F-ordered unless they were made that way on purpose; C-ordered input needs a transpose.
  • cuDF lists column -> cupy array:
    • can be a view (zero-copy, produces C-ordered output)
    • Snippet (assumes all lists have the same length): ser.list.leaves.values.reshape(len(ser), -1)
  • cupy array -> cuDF lists column
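The lists-column bullet can be illustrated with a host-side model, assuming a lists column stored as an offsets array (starting at 0) plus a flat leaves buffer: if every list has the same length, the leaves can be reinterpreted as a row-major (C-ordered) (nrows, width) matrix without copying, which is what reshape(len(ser), -1) relies on. fixed_size_lists_view is a hypothetical helper, and memoryview.cast plays the role of the cupy reshape.

```python
from array import array


def fixed_size_lists_view(offsets, leaves):
    """Return a zero-copy, row-major (nrows, width) view of a lists column.

    Assumes offsets start at 0, cover all leaves, and describe equal-length lists,
    which is what reshape(len(ser), -1) on the leaves requires.
    """
    nrows = len(offsets) - 1
    if nrows <= 0 or offsets[0] != 0 or offsets[-1] != len(leaves):
        raise ValueError("expected non-empty offsets spanning all leaves")
    widths = {offsets[i + 1] - offsets[i] for i in range(nrows)}
    if len(widths) != 1:
        raise ValueError("lists have different lengths; no reshape view exists")
    width = widths.pop()
    # Reinterpret the flat leaves buffer as a C-ordered 2-D view; nothing is copied.
    return memoryview(leaves).cast("B").cast(leaves.typecode, shape=[nrows, width])


leaves = array("d", [1, 2, 3, 4, 5, 6])
view = fixed_size_lists_view([0, 3, 6], leaves)
print(view.tolist())  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
leaves[0] = 10.0
print(view[0, 0])  # 10.0, because the view shares memory with the leaves
```

The same model shows why this path yields C-ordered output: consecutive leaves of one list become consecutive elements of one row.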

Projects
Status: Todo
Development

No branches or pull requests

3 participants