This project is a semantic search tool for NeurIPS papers from the past 10 years. It uses ChromaDB for vector storage, sentence-transformers for semantic embeddings, and includes a Streamlit-based UI for interaction. Additionally, a t-SNE plot helps visualize clusters of similar papers in 3D space.
- Semantic search on NeurIPS papers.
- Categorization of papers into specific fields.
- 3D t-SNE visualization of paper embeddings.
- Interactive web app using Streamlit.
.
├── chromadb # Database folder for ChromaDB
├── data
│ └── neurips_papers_last10years.json # JSON file containing the NeurIPS paper data
├── paper_categories.csv # CSV file mapping paper titles to categories
├── papers_with_tsne.csv # CSV file with embeddings and t-SNE coordinates
├── requirements.txt # Python dependencies for the project
├── scripts
│ ├── categorize_papers.py # Script for categorizing papers
│ ├── ingest_data.py # Script to ingest paper data into ChromaDB
│ ├── query_engine.py # Query engine for semantic search
│ └── tsne_visualization.py # Script for generating t-SNE plots
├── streamlit_app
│ └── app.py # Streamlit app for interactive paper search
├── tsne_3d_plot.html # Pre-generated 3D t-SNE visualization
└── setup.py # Setup script for the project
- Python 3.10+
- pip
- Clone the repository:
git clone https://github.com/siddharth899/neurips-paper-search.git
cd neurips-paper-search
- Install required dependencies:
pip install -r requirements.txt
- Download the paper data into the data folder (if not already provided).
To prepare the database for semantic search:
python scripts/ingest_data.py
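For orientation, the kind of preparation an ingestion step typically performs can be sketched as follows. This is not the code in scripts/ingest_data.py. The field names (title, abstract, year), the sample records, and the build_records helper are all hypothetical, since the schema of the JSON file is not described here. The three lists it returns mirror the ids, documents, and metadatas arguments that a ChromaDB collection accepts when adding entries.

```python
import json

# Hypothetical sample standing in for data/neurips_papers_last10years.json.
# The real file's schema may differ; these field names are assumptions.
SAMPLE_JSON = """
[
  {"title": "Example Paper One", "abstract": "An abstract about optimization.", "year": 2019},
  {"title": "Example Paper Two", "abstract": "", "year": 2021}
]
"""


def build_records(papers):
    """Turn raw paper dicts into parallel lists of ids, documents, and metadata."""
    ids, documents, metadatas = [], [], []
    for index, paper in enumerate(papers):
        title = paper.get("title", "").strip()
        abstract = paper.get("abstract", "").strip()
        if not title:
            # Skip records that have nothing to embed or display.
            continue
        # Embed title and abstract together when an abstract is available.
        text = f"{title}. {abstract}" if abstract else title
        metadata = {"title": title}
        if paper.get("year") is not None:
            metadata["year"] = paper["year"]
        ids.append(f"paper-{index}")
        documents.append(text)
        metadatas.append(metadata)
    return ids, documents, metadatas


ids, documents, metadatas = build_records(json.loads(SAMPLE_JSON))
print(ids)
print(documents)
```

In a real run, the documents would then be embedded with a sentence-transformers model and stored in the chromadb folder along with their ids and metadata.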
Categorize papers into specific fields:
python scripts/categorize_papers.py
Start the Streamlit app to perform semantic searches:
streamlit run streamlit_app/app.py
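Conceptually, a semantic search embeds the query and returns the papers whose embeddings lie closest to it. The sketch below illustrates that ranking step with cosine similarity on toy 3-dimensional vectors. It is a simplified illustration, not the code in scripts/query_engine.py: the paper titles, the vectors, and the rank_papers helper are made up, real sentence-transformers embeddings have hundreds of dimensions, and the distance metric actually used depends on how the ChromaDB collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_papers(query_vec, paper_vecs, top_k=3):
    """Return the top_k (title, score) pairs, most similar first."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in paper_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical); real ones come from an embedding model.
papers = {
    "Paper A": [0.9, 0.1, 0.0],
    "Paper B": [0.0, 1.0, 0.0],
    "Paper C": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

results = rank_papers(query, papers, top_k=2)
for title, score in results:
    print(f"{title}: {score:.3f}")
```

A vector database such as ChromaDB performs this nearest-neighbor lookup with an index rather than a full scan, which is what keeps queries fast over ten years of papers.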
To create or update the t-SNE plot:
python scripts/tsne_visualization.py
Feel free to contribute to this project:
- Fork the repository.
- Create a new branch:
git checkout -b feature-name
- Commit changes:
git commit -m "Description of changes"
- Push the branch:
git push origin feature-name
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For any inquiries, feel free to reach out.