Use collect-mail --url for W3C/3GPP/IEEE/etc #597

Open
laurenmarietta opened this issue Jul 12, 2023 · 1 comment

Comments

@laurenmarietta

The documentation for how to scrape datasets shows that you can use either collect-mail --url or collect-mail --file when scraping IETF mailing lists, but only collect-mail --file when scraping W3C/3GPP/IEEE/etc mailing lists.

From my (admittedly limited) poking around in the code, it seems like mailman.collect_archive_from_url could be rewritten fairly simply, using the code already in the documentation (linked above), so that the --url option works for all of the different mailing list types. I imagine that would be useful for those who come to this package without necessarily wanting to download hundreds of mailing lists in one go.

(Please forgive if there is an existing issue about this or if I've wildly misunderstood the code in mailman.py, I've just been getting acquainted with the package! 😅 )

@sbenthall
Collaborator

Thanks for this. It's right on.
It's related to an issue that's just come up, which is that it's much easier to download mbox files from the new IETF mailing list archive interface.
So we will need a mailman ingest from files very soon.

Streamlining the CLI so that it automatically recognizes whether something is a URL or a file name is a nice idea.
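
For illustration, here is a minimal sketch of how that auto-detection might work. The helper name classify_source is hypothetical and is not an existing function in this package; the sketch assumes that anything with an http or https scheme and a host should be treated as a URL, and everything else as a file name.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" if `source` looks like an http(s) URL, otherwise "file".

    Hypothetical helper: the name and return values are assumptions made
    for this sketch, not part of the existing CLI.
    """
    parsed = urlparse(source)
    # Require both a web scheme and a host, so that Windows paths such as
    # "C:\\lists.txt" (which parse with scheme "c") are not mistaken for URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"
```

With something like this, a single positional argument could be routed to the existing --url or --file code path, without the user having to pick the flag.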
