Allow computer vision task models to run without tf.data #2128
Comments
Contributions are welcome here, but this is a fairly abstract problem that would need some scouting out first. We could try to leverage Keras' DataAdapter here, though I'm not sure how best to iterate over the dataset and apply the Keras layer. This is probably best prototyped for a number of vision tasks first (classification, detection, segmentation).
@mattdangerw I would love to contribute to this issue. I think that we can relax our reliance on tf.data for computer vision tasks by converting inputs to NumPy arrays and then using a simple Python generator for batching and preprocessing. This approach should allow us to efficiently support Torch and JAX backends. While converting to NumPy arrays does add a bit of overhead, it's generally efficient when working with array-like inputs, and using a generator helps mitigate any performance impact by handling data in manageable batches. Let me know how this sounds.
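To make the proposal concrete, here is a minimal sketch of the NumPy-plus-generator idea. It is not KerasHub code: `batched_preprocess` and `rescale` are hypothetical names, and `rescale` only stands in for whatever preprocessing layer a vision task would apply.

```python
import numpy as np


def batched_preprocess(x, y=None, batch_size=32, preprocess=None):
    """Yield preprocessed batches from array-like inputs (hypothetical helper)."""
    x = np.asarray(x)
    y = None if y is None else np.asarray(y)
    for start in range(0, len(x), batch_size):
        xb = x[start:start + batch_size]
        if preprocess is not None:
            # Preprocessing runs per batch, so only one batch is transformed at a time.
            xb = preprocess(xb)
        yield xb if y is None else (xb, y[start:start + batch_size])


def rescale(images):
    # Assumed stand-in for a real preprocessing layer: uint8 pixels -> floats in [0, 1].
    return images.astype("float32") / 255.0


images = np.full((5, 4, 4, 3), 255, dtype="uint8")
labels = np.arange(5)
batches = list(batched_preprocess(images, labels, batch_size=2, preprocess=rescale))
```

Note that a plain generator can only be consumed once, so anything that needs several epochs would have to recreate it or wrap it in a re-iterable object.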
Hi,
Let me know your thoughts, thank you.
I was trying to run some tests to understand more. In particular, is the plan to support any type of input, such as NumPy, TfDataset, PyDataset and so on? If so, I am thinking about monkey patching or something similar to inject these transformations on the fly. Not great, but it should work with zero overhead.
Hi @mattdangerw, what are the next steps here?
First step (of a few), to slowly relax our reliance on tf.data for preprocessing.
Our text models are more heavily reliant on tf.data because of the tf-text dependency. Our image models do not have this constraint.
We could try to allow running preprocessing without tf.data when running on the torch and jax backends. To do so, we would need to stop always converting to a tf.data.Dataset in our pipeline model helper here and find a way to still apply preprocessing to the iterator efficiently.
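One possible shape for this, sketched under assumptions rather than taken from the actual pipeline model helper: leave inputs that already expose a `.map` method (as `tf.data.Dataset` does) on their own pipeline, and lazily wrap any other iterable. The names `apply_preprocessing` and `PreprocessedIterable` are hypothetical, and the duck-typed `.map` check is an illustrative choice only.

```python
class PreprocessedIterable:
    """Re-iterable wrapper that applies `fn` lazily to each batch (hypothetical)."""

    def __init__(self, data, fn):
        self.data = data
        self.fn = fn

    def __iter__(self):
        # A fresh generator per call, so the wrapper can be iterated once per epoch
        # as long as the underlying `data` is itself re-iterable.
        return (self.fn(batch) for batch in self.data)


def apply_preprocessing(data, fn):
    # Assumption: objects with a callable `.map`, such as tf.data.Dataset,
    # keep using their own mapping instead of being wrapped.
    if callable(getattr(data, "map", None)):
        return data.map(fn)
    return PreprocessedIterable(data, fn)


# Plain Python lists stand in for batches of pixel values here.
raw_batches = [[0, 128, 255], [64, 32]]
scaled = apply_preprocessing(raw_batches, lambda batch: [v / 255 for v in batch])
first_pass = list(scaled)
second_pass = list(scaled)
```

The wrapper is an object rather than a bare generator so that it survives more than one pass over the data. Whether per-batch Python preprocessing is fast enough on the torch and jax backends is the open question from the issue and is not addressed by this sketch.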