An API service for academic topic classification based on OpenAlex's predictor model.
- Python 3.10+
- curl or wget for downloading model artifacts
- Download the trained model and artifacts:
wget https://zenodo.org/records/10568402/files/topic_classifier_v1_artifacts.tar.gz
- Create the model directory and extract the artifacts:
mkdir -p model
tar -xzf topic_classifier_v1_artifacts.tar.gz -C model
- Install the uv package manager:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Create and activate virtual environment:
uv venv
source .venv/bin/activate
- Install dependencies:
uv pip install -r requirements.txt --no-cache-dir
- Start the development server:
uvicorn main:app --reload --port <PORT>
- Endpoint: /health_check
- Method: GET
- Description: Checks the service and model health.
- Example Request:
curl http://localhost:<PORT>/health_check
- Example Response:
{ "status": "healthy", "model": "loaded" }
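A client script may want to wait for the service to report a loaded model before sending predictions. The following is a minimal sketch that assumes the response shape shown above; `fetch_health` and `wait_until_healthy` are hypothetical helper names, not part of this service, and the retry count and delay are arbitrary defaults.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(
    url: str,
    fetch: Callable[[str], dict] = fetch_health,
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll `url` until it reports a healthy service with a loaded model."""
    for attempt in range(attempts):
        try:
            body = fetch(url)
            if body.get("status") == "healthy" and body.get("model") == "loaded":
                return True
        except OSError:
            # Connection refused or timed out: the server may still be starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Call it with the full health check URL, substituting the port passed to uvicorn for `<PORT>`. The `fetch` parameter exists only so the polling logic can be exercised without a running server.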
-
Endpoint:
/single
-
Method:
POST
-
Description: Predicts topics for a single academic paper.
-
Input Data Format:
[
  {
    "title": "Multiplication of matrices of arbitrary shape on a data parallel computer",
    "abstract_inverted_index": {
      "Some": [0], "level-2": [1], "and": [2], "level-3": [3], "Distributed": [4],
      "Basic": [5], "Linear": [6], "Algebra": [7], "Subroutines": [8], "(DBLAS)": [9],
      "that": [10], "have": [11], "been": [12], "implemented": [13], "on": [14, 26],
      "the": [15, 27], "Connection": [16], "Machine": [17], "system": [18], "CM-200": [19],
      "are": [20], "described.": [21], "No": [22], "assumption": [23], "is": [24],
      "made": [25], "shape": [28], "or": [29], "...": [30]
    },
    "journal_display_name": "Fire Safety Science",
    "referenced_works": [
      "https://openalex.org/W183327403",
      "https://openalex.org/W1851212222",
      "https://openalex.org/W1967958850",
      "https://openalex.org/W1988425770",
      "https://openalex.org/W1991286031",
      "https://openalex.org/W2029342163",
      "https://openalex.org/W2045381439",
      "https://openalex.org/W2053280233",
      "https://openalex.org/W2071782145",
      "https://openalex.org/W2083202979",
      "https://openalex.org/W2104487100",
      "https://openalex.org/W4234919994"
    ],
    "inverted": true
  }
]
- Example Request:
import json

import requests

url = "http://localhost:<PORT>/single"
headers = {"Content-Type": "application/json"}

with open("test_samples/test_json_single.json", "r") as f:
    data = json.load(f)

response = requests.post(url, headers=headers, json=data)
print(response.json())
- Example Response:
[
  [
    { "topic_id": 10829, "topic_label": "829: Networks on Chip in System-on-Chip Design", "topic_score": 0.9978 },
    { "topic_id": 10054, "topic_label": "54: Parallel Computing and Performance Optimization", "topic_score": 0.9963 },
    { "topic_id": 11522, "topic_label": "1522: Design and Optimization of Field-Programmable Gate Arrays and Application-Specific Integrated Circuits", "topic_score": 0.991 }
  ]
]
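The `abstract_inverted_index` field in the input maps each word to the positions where it occurs in the abstract. The sketch below converts between plain text and that form. It assumes simple whitespace tokenization, which is consistent with the sample above (for instance, "described." keeps its punctuation) but is not confirmed by the service; `invert_abstract` and `restore_abstract` are hypothetical helper names.

```python
from collections import defaultdict


def invert_abstract(abstract: str) -> dict[str, list[int]]:
    """Map each whitespace-separated word to the positions where it occurs."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for position, word in enumerate(abstract.split()):
        index[word].append(position)
    return dict(index)


def restore_abstract(index: dict[str, list[int]]) -> str:
    """Rebuild the text by placing every word back at its positions."""
    positioned = [
        (position, word)
        for word, positions in index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(positioned))
```

A round trip through both functions returns the original text up to whitespace normalization, which is a quick way to sanity-check an index before posting it.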
- Endpoint: /single
- Method: POST
- Description: Predicts topics for a single academic paper whose abstract is supplied as plain text (uninverted) rather than as an inverted index.
- Input Data Format:
[
  {
    "title": "The renewable energy role in the global energy Transformations",
    "abstract": "In a comprehensive analysis of the global transition towards renewable energy, the study revealed...",
    "abstract_inverted_index": {},
    "journal_display_name": "Renewable energy focus",
    "referenced_works": [
      "https://openalex.org/W2275853436",
      "https://openalex.org/W2412247133",
      "https://openalex.org/W2545730423",
      ....,
      "https://openalex.org/W2601431494"
    ],
    "inverted": false
  }
]
- Example Request:
import json

import requests

url = "http://localhost:<PORT>/single"
headers = {"Content-Type": "application/json"}

with open("test_samples/test_json_single_not_inverted.json", "r") as f:
    data = json.load(f)

response = requests.post(url, headers=headers, json=data)
print(response.json())
- Example Response:
[
  [
    { "topic_id": 12639, "topic_label": "2639: Global Energy Transition and Fossil Fuel Depletion", "topic_score": 0.9951 },
    { "topic_id": 11185, "topic_label": "1185: Integration of Renewable Energy Systems in Power Grids", "topic_score": 0.9747 },
    { "topic_id": 12129, "topic_label": "2129: Energy Supply and Security Issues for Developed Economies", "topic_score": 0.9722 }
  ]
]
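The two `/single` input samples differ only in how the abstract is carried and in the value of `inverted`. A small builder can keep those fields consistent. This is a sketch based solely on the two samples above: `make_paper` is a hypothetical helper, and whether the service requires every field shown here is an assumption.

```python
from typing import Optional


def make_paper(
    title: str,
    journal_display_name: str,
    referenced_works: list[str],
    *,
    abstract: Optional[str] = None,
    abstract_inverted_index: Optional[dict[str, list[int]]] = None,
) -> dict:
    """Build one paper object in the shape of the samples above."""
    if (abstract is None) == (abstract_inverted_index is None):
        raise ValueError("provide exactly one of abstract or abstract_inverted_index")
    paper = {
        "title": title,
        # The plain-text sample sends an empty index alongside the abstract.
        "abstract_inverted_index": abstract_inverted_index or {},
        "journal_display_name": journal_display_name,
        "referenced_works": list(referenced_works),
        "inverted": abstract_inverted_index is not None,
    }
    if abstract is not None:
        paper["abstract"] = abstract
    return paper
```

Both samples wrap the paper object in a list, so a request body built this way would be `[make_paper(...)]`.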
- Endpoint: /batch
- Method: POST
- Description: Predicts topics for a batch of academic papers.
- Example Request:
import json

import requests

url = "http://localhost:<PORT>/batch"
headers = {"Content-Type": "application/json"}

with open("test_samples/test_json_batch.json", "r") as f:
    data = json.load(f)

response = requests.post(url, headers=headers, json=data)
print(response.json())
- Example Response:
[
  [
    { "topic_id": 10829, "topic_label": "829: Networks on Chip in System-on-Chip Design", "topic_score": 0.9978 },
    { "topic_id": 10054, "topic_label": "54: Parallel Computing and Performance Optimization", "topic_score": 0.9962 },
    { "topic_id": 11522, "topic_label": "1522: Design and Optimization of Field-Programmable Gate Arrays and Application-Specific Integrated Circuits", "topic_score": 0.9909 }
  ],
  [
    { "topic_id": 10110, "topic_label": "110: Seismicity and Tectonic Plate Interactions", "topic_score": 0.9995 },
    { "topic_id": 12157, "topic_label": "2157: Machine Learning for Mineral Prospectivity Mapping", "topic_score": 0.9933 },
    { "topic_id": 10399, "topic_label": "399: Characterization of Shale Gas Pore Structure", "topic_score": 0.991 }
  ]
]
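The response is a list with one entry per input paper, each entry being a list of topic objects. A minimal sketch for reducing that to the highest-scoring topic per paper follows. `best_topics` is a hypothetical helper, and it does not rely on the topics arriving sorted by score, since the samples suggest that ordering but do not guarantee it.

```python
from typing import Optional


def best_topics(predictions: list[list[dict]]) -> list[Optional[dict]]:
    """Return the highest-scoring topic for each paper, or None if it has none."""
    return [
        max(topics, key=lambda topic: topic["topic_score"]) if topics else None
        for topics in predictions
    ]
```

With the request example above, this would be called as `best_topics(response.json())`.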
This project uses OpenAlex's topic classification model. Please refer to their license for terms of use.