Clean, Refactor/Fix and Document updated text-reuse approach #9

Open
piconti opened this issue Jan 24, 2025 · 0 comments
piconti commented Jan 24, 2025

The text-reuse pipeline has been updated and finalized, but there was no time to fully document the process.

Several scripts need refactoring, documentation, and some adaptation to the new naming conventions that were agreed on afterwards.
This refactoring, cleaning, and documentation will be done when text-reuse detection is applied to the next iteration of the corpus for the upcoming release.

This issue tracks that work, with the following action points:

  • Fix/adapt the parts of the scripts that determine how and where data is written to S3.
  • Refactor the code where possible to make it more modular.
  • Comment the code.
  • Document the overall approach and how to run the whole pipeline end to end.
piconti self-assigned this on Jan 24, 2025