You can download 243 arXiv papers, converted to text with pdfminer, from our S3 bucket (around 4 MB).
The original PDFs are also available from our S3 bucket (around 400 MB).
The metadata and keywords of 10,000 processed papers are available in this GitHub repo, or by running:
git clone git@github.com:EtymoIO/OpenData.git
Description of who we are and what we do.
Main methods of extraction, the main problems and which one looks the best.
Main problems:
- Whitespace
- Equation conversions
- Figures
- References
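
To make the whitespace problem concrete, here is a minimal sketch of the kind of cleanup that extracted text usually needs. It is not the pipeline used for this dataset: the function name `clean_extracted_text` and the sample string are hypothetical, and it only uses Python's standard `re` module.

```python
import re

def clean_extracted_text(raw: str) -> str:
    """Tidy whitespace in PDF-extracted text (hypothetical helper)."""
    # Rejoin words split by a hyphen at a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Treat blank lines as paragraph boundaries.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, collapse every run of whitespace
    # (including single newlines from line wrapping) to one space.
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)

# Made-up sample, not taken from the dataset.
sample = "Text extrac-\ntion from   PDFs\nis messy.\n\n\nSecond  paragraph."
print(clean_extracted_text(sample))
```

Note that the hyphen rule is deliberately naive: it also merges genuine compounds that happen to break at a line end (for example "well-" followed by "known"), so a real pipeline would likely need a dictionary check or a similar safeguard.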
How to download a bundle of text conversions.
Overview of main methods
TODO: make heuristic better
Briefly explain how it works in general terms, show examples
Build up method from very simple to more complicated.
Can we use the author's name?
Deep learning? Object recognition?
How to download a bundle of PDFs. Brief look at some examples.