csvclean: Declare a stable Python API #1269

Open
DeflateAwning opened this issue Nov 18, 2024 · 6 comments

@DeflateAwning

The docs (for example, https://csvkit.readthedocs.io/en/latest/scripts/csvclean.html) only show info about using the CLI tools. I'm hoping to use this project within a Python project.

Is there a way this can be included, and the CLI methods can be called with Python? If so, it would be awesome if this API's documentation was available in the readthedocs pages.

@jpmckinney
Member

@DeflateAwning
Author

Hmm that's strange; when I search "csvclean" on there, I don't see any indication that it's linked to this project at all. Maybe more docs explaining the interconnect would be good?

After searching around those docs quite a bit, I still don't see a way to use the "csvclean" functionality of this CLI tool. I stand by my point.

@jpmckinney
Member

True, csvclean is one of the utilities that does not have any corresponding features in agate.

CSV Kit is not designed to be used as a Python library. Back around version 1.0, there was a major effort to move functionality to agate, which is designed as a Python library. However, not all functionality was moved over.

The remaining functionality in CSV Kit could be organized into a stable API. If we publish the API as part of the docs, then that is a public declaration of a stable API, and we will have to accept the overhead of managing API changes, properly documenting the API, etc. However, there is not a lot of demand (or maintainer availability) to do that. So, I would prefer to leave things as is. For csvclean, you can just read the code which is 100 lines here and then another 150 lines here, and you can call it from your code.

@jpmckinney closed this as not planned Nov 19, 2024
@DeflateAwning
Author

Would it be okay if we kept this issue open to centralize discussion? Or does the issue belong over in agate?

@jpmckinney reopened this Dec 4, 2024
@jpmckinney changed the title from "Feature Request: Add the Python API to the docs" to "csvclean: Add the Python API to the docs" Dec 4, 2024
@jpmckinney changed the title from "csvclean: Add the Python API to the docs" to "csvclean: Declare a stable Python API" Dec 4, 2024
@jpmckinney
Member

You can already use from csvkit.cleanup import RowChecker and use it in the same way as it is used in https://github.com/wireservice/csvkit/blob/master/csvkit/utilities/csvclean.py

I'll leave the issue open to see if there is any interest.

@alvaro-osvaldo-tm
Contributor

I thought about that. An alternative is to create a 'Builder Pattern' layer that is separate from the utility implementation and provides a stable and friendly API. For example:

# Proposed API only: csvkit has no `builders` module today
from csvkit.builders import In2CSV

(In2CSV()
    .input('my-dataset.xls')
    .output('converted.csv')
    .addBom()
    .run())

It could even be expanded to support pipelines, as in Bash:

# Proposed API only: one builder is passed as the output of another, like a shell pipe
from csvkit.builders import In2CSV, CSVGrep

csvGrep = CSVGrep()
csvGrep.column('first-name')

(In2CSV()
    .input('my-dataset.xls')
    .output(csvGrep)
    .addBom()
    .run())

It is not a straightforward implementation, but if done well it can create a stable and friendly API and allow the utilities to evolve independently from the builders.

If done badly, it will constantly break and be a heavy burden to maintain.
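
To make the builder idea above a little more concrete, here is a minimal sketch that uses only the Python standard library. `CsvPipeline` and its `input`, `grep`, `output`, and `run` methods are hypothetical names chosen for illustration; none of them exist in csvkit, and this does not reuse any csvkit code. It only shows how a fluent interface could keep the public surface small while the filtering logic behind it stays free to change.

```python
import csv
import io
import re


class CsvPipeline:
    """Hypothetical fluent builder for a CSV filter step (not part of csvkit)."""

    def __init__(self):
        self._source = None
        self._sink = None
        self._filters = []

    def input(self, source):
        """Set the text file-like object to read CSV from."""
        self._source = source
        return self

    def grep(self, column, pattern):
        """Keep only rows whose `column` value matches the regex `pattern`."""
        regex = re.compile(pattern)
        # Missing or empty cells are treated as the empty string.
        self._filters.append(
            lambda row: regex.search(row.get(column) or "") is not None
        )
        return self

    def output(self, sink):
        """Set the text file-like object to write CSV to."""
        self._sink = sink
        return self

    def run(self):
        """Execute the configured steps and return the number of rows written."""
        if self._source is None or self._sink is None:
            raise ValueError("both input() and output() must be set before run()")
        reader = csv.DictReader(self._source)
        writer = csv.DictWriter(
            self._sink, fieldnames=reader.fieldnames or [], lineterminator="\n"
        )
        writer.writeheader()
        written = 0
        for row in reader:
            if all(keep(row) for keep in self._filters):
                writer.writerow(row)
                written += 1
        return written


# Usage: in-memory streams stand in for files.
source = io.StringIO("first-name,city\nAda,London\nGrace,New York\n")
sink = io.StringIO()
count = (
    CsvPipeline()
    .input(source)
    .grep("first-name", r"^A")
    .output(sink)
    .run()
)
print(sink.getvalue(), end="")
```

Because every configuration method returns `self`, callers only depend on the method names and their arguments. That is the part that would have to stay stable; everything inside `run()` could be replaced without breaking them.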
