The APIs can be accessed and tested via the Swagger UI at http://localhost:8000/docs/.
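The Swagger UI at `/docs` suggests the server is a FastAPI app; if so, the machine-readable schema should also be served at `/openapi.json` (an assumption, not confirmed by this section). A minimal sketch for listing the exposed endpoints once the server is running:

```python
# List the endpoints exposed by the running service, assuming a
# FastAPI-style /openapi.json schema (an assumption; adjust if the
# app serves its schema elsewhere).
import requests

schema = requests.get("http://localhost:8000/openapi.json", timeout=10).json()
for path, methods in schema["paths"].items():
    print(", ".join(method.upper() for method in methods), path)
```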
- Install the dependencies with `pip install -r requirements.txt`.
- Run Neo4j and update the `.env` file (a sample `.env` and a connectivity check are sketched after this list).
- Run `python add_data.py` to load the extracted scientific knowledge graph into the graph database (this takes quite a long time).
- Run `python cluster_and_drop.py` to drop some semantic/syntactic duplicates.
- Run `python gen_vocab.py` to generate a replica of the vocabulary from the graph database.
- Run `python app.py` to serve the endpoints (this also generates the set of embedding vectors from the previous step if it does not exist).
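The exact keys the scripts read from `.env` are not shown here; the following is a minimal sketch assuming the standard Neo4j connection settings. The variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` are assumptions, so check the scripts for the keys they actually read:

```env
# Hypothetical keys -- verify against the scripts before use.
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
```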
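With those values in place, a quick connectivity check can rule out configuration problems before the long-running `add_data.py` step. This sketch assumes the official `neo4j` Python driver and the `python-dotenv` package, neither of which is confirmed by this section:

```python
# Verify that the Neo4j instance is reachable with the .env credentials.
# Assumes the hypothetical NEO4J_* keys sketched above.
import os

from dotenv import load_dotenv
from neo4j import GraphDatabase

load_dotenv()  # pull the NEO4J_* settings from the .env file
driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USERNAME"], os.environ["NEO4J_PASSWORD"]),
)
driver.verify_connectivity()  # raises if the database is unreachable
driver.close()
```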
The following files contain the essential fields used for constructing the knowledge graph. You can modify the dataset and scripts to add more information to the graph.

- Metadata of the arXiv dataset retrieved from Cornell-University/arxiv, filtered to only the Computation and Language (cs.CL) category (see the filtering sketch after this list).
- Citations and references for each publication in the arXiv cs.CL dataset.
- The combination of the retrieved metadata from Cornell-University/arxiv and the additional essential fields.
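As referenced above, the cs.CL subset can be rebuilt from the raw Kaggle snapshot along these lines. The input file name `arxiv-metadata-oai-snapshot.json` is the one distributed with Cornell-University/arxiv, and the output path is hypothetical:

```python
# Filter the raw arXiv metadata snapshot down to cs.CL records.
import json

with open("arxiv-metadata-oai-snapshot.json", encoding="utf-8") as src, \
        open("arxiv-cs-cl.jsonl", "w", encoding="utf-8") as dst:
    for line in src:  # the snapshot stores one JSON record per line
        record = json.loads(line)
        # `categories` is a space-separated string, e.g. "cs.CL cs.LG"
        if "cs.CL" in record.get("categories", "").split():
            dst.write(line)
```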