
siddharth7113/Neurips-paper-search


NeurIPS Paper Search Project

This project is a semantic search tool for NeurIPS papers from the past 10 years. It stores vectors in ChromaDB, generates semantic embeddings with sentence-transformers, and provides a Streamlit-based UI for interaction. A 3D t-SNE plot helps visualize clusters of similar papers.

Features

  • Semantic search on NeurIPS papers.
  • Categorization of papers into specific fields.
  • 3D t-SNE visualization of paper embeddings.
  • Interactive web app using Streamlit.

Project Structure

.
├── chromadb                # Database folder for ChromaDB
├── data
│   └── neurips_papers_last10years.json  # JSON file containing the NeurIPS paper data
├── paper_categories.csv    # CSV file mapping paper titles to categories
├── papers_with_tsne.csv    # CSV file with embeddings and t-SNE coordinates
├── requirements.txt        # Python dependencies for the project
├── scripts
│   ├── categorize_papers.py       # Script for categorizing papers
│   ├── ingest_data.py             # Script to ingest paper data into ChromaDB
│   ├── query_engine.py            # Query engine for semantic search
│   └── tsne_visualization.py      # Script for generating t-SNE plots
├── streamlit_app
│   └── app.py              # Streamlit app for interactive paper search
├── tsne_3d_plot.html       # Pre-generated 3D t-SNE visualization
└── setup.py                # Setup script for the project

Getting Started

Prerequisites

  • Python 3.10+
  • pip

Installation

  1. Clone the repository:

    git clone https://github.com/siddharth7113/Neurips-paper-search.git
    cd Neurips-paper-search
  2. Install required dependencies:

    pip install -r requirements.txt
  3. If data/neurips_papers_last10years.json is not already present, download the paper data into the data folder.


Usage

Ingest Data

To prepare the database for semantic search:

python scripts/ingest_data.py
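
As a rough sketch of the preparation this step involves, the snippet below turns paper records into the parallel `ids`, `documents`, and `metadatas` lists that a ChromaDB collection's `add` method accepts. The field names (`title`, `abstract`, `year`), the ID scheme, and the helper `prepare_records` are assumptions made for illustration; the actual JSON schema and the logic in `scripts/ingest_data.py` may differ.

```python
import json


def prepare_records(papers):
    """Build parallel ids/documents/metadatas lists from paper dicts.

    Assumes (hypothetically) that each paper has 'title' and 'abstract'
    keys and optionally a 'year' key.
    """
    ids, documents, metadatas = [], [], []
    for index, paper in enumerate(papers):
        title = paper.get("title", "").strip()
        abstract = paper.get("abstract", "").strip()
        if not title:
            continue  # skip records that have no title
        metadata = {"title": title}
        if "year" in paper:
            metadata["year"] = paper["year"]
        ids.append(f"paper-{index}")  # hypothetical ID scheme
        documents.append(f"{title}. {abstract}".strip())
        metadatas.append(metadata)
    return ids, documents, metadatas


# Inline sample standing in for the contents of
# data/neurips_papers_last10years.json (schema assumed).
sample = json.loads("""[
  {"title": "Paper A", "abstract": "About kernels.", "year": 2019},
  {"title": "", "abstract": "No title.", "year": 2020},
  {"title": "Paper B", "abstract": "", "year": 2021}
]""")
ids, documents, metadatas = prepare_records(sample)
```

With ChromaDB installed, lists shaped like these could then be passed to `collection.add(ids=ids, documents=documents, metadatas=metadatas)`.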

Categorize Papers

Categorize papers into specific fields:

python scripts/categorize_papers.py
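
This README does not say how categories are assigned. The sketch below uses simple keyword matching purely as a stand-in, to show the kind of title-to-category mapping that `paper_categories.csv` holds. The category names, keyword lists, CSV column names, and both helper functions are hypothetical.

```python
import csv
import io

# Hypothetical categories and keywords; the real ones are not given here.
CATEGORY_KEYWORDS = {
    "Reinforcement Learning": ["reinforcement", "policy", "bandit"],
    "Computer Vision": ["image", "vision", "segmentation"],
    "Optimization": ["gradient", "convex", "optimization"],
}


def categorize(title):
    """Return the first category with a keyword in the title, else 'Other'."""
    lowered = title.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Other"


def write_categories(titles, handle):
    """Write a title,category CSV (column names assumed) to a file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["title", "category"])
    for title in titles:
        writer.writerow([title, categorize(title)])


# An in-memory buffer stands in for paper_categories.csv.
buffer = io.StringIO()
write_categories(
    ["Deep Policy Gradients", "Convex Image Priors", "Graph Kernels"], buffer
)
csv_text = buffer.getvalue()
```

Because the first matching category wins, the order of the dictionary entries matters in this sketch. A title that mentions both "policy" and "gradient" is filed under the first category listed.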

Run Streamlit App

Start the Streamlit app to perform semantic searches:

streamlit run streamlit_app/app.py
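
To illustrate what a semantic search does behind the UI, here is a minimal pure-Python sketch that ranks papers by cosine similarity between a query vector and stored paper vectors. In the project itself, ChromaDB performs the vector lookup and sentence-transformers produces the embeddings. The toy three-dimensional vectors, the helper names, and the choice of cosine similarity are assumptions for illustration, and the distance metric the project's collection actually uses is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, paper_vecs, k=2):
    """Return the k (title, score) pairs most similar to the query."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in paper_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real sentence-transformers vectors have hundreds of dimensions.
toy_embeddings = {
    "Paper A": [1.0, 0.0, 0.0],
    "Paper B": [0.7, 0.7, 0.0],
    "Paper C": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], toy_embeddings, k=2)
```

Here the query vector points mostly along the first axis, so "Paper A" ranks first, "Paper B" second, and "Paper C" (orthogonal to the query) is excluded from the top two.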

Generate t-SNE Visualization

To create or update the t-SNE plot:

python scripts/tsne_visualization.py

Screenshots

1. Semantic Search Interface

Semantic Search Screenshot

2. 3D t-SNE Visualization

t-SNE Embeddings


Contributing

Feel free to contribute to this project:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-name
  3. Commit changes:
    git commit -m "Description of changes"
  4. Push the branch:
    git push origin feature-name
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments


Contact

For any inquiries, feel free to reach out.
