# SmartDoc AI 🤖
A self-hosted AI document summarizer and Q&A system that processes documents locally - no API keys needed.

*SmartDoc AI Demo*

## Features

- 🔎 Document upload and text extraction (PDF/TXT)
- 📝 Automatic document summarization
- ❓ Question answering system
- 🔍 Semantic search using FAISS
- 💻 Local processing with no external APIs
- ⚡ FastAPI backend ready for a React frontend

*Upload Demo*

## Tech Stack

- **Frontend:** React/Next.js, TailwindCSS
- **Backend:** FastAPI
- **AI Models:**
  - Summarization: `t5-small` (~300MB)

    ```python
    from transformers import pipeline

    summarizer = pipeline(
        "summarization",
        model="t5-small",
        tokenizer="t5-small",
        framework="pt"
    )
    ```

  - Q&A: `distilbert-base-uncased-distilled-squad` (~250MB)

    ```python
    qa_model = pipeline(
        "question-answering",
        model="distilbert-base-uncased-distilled-squad",
        framework="pt"
    )
    ```

  - Embeddings: `all-MiniLM-L6-v2` (~90MB)

    ```python
    from sentence_transformers import SentenceTransformer

    embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
    ```

  - Vector Search: FAISS (see the indexing sketch below)

Total model size: ~640MB
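
Since the other components above come with a snippet, here is a minimal sketch of how the FAISS side could look, assuming documents are already split into text chunks (the chunk list below is a placeholder, not the repo's actual data structure):

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder chunks; the backend would produce these from an uploaded document
chunks = ["First chunk of the document...", "Second chunk of the document..."]
embeddings = np.asarray(embedding_model.encode(chunks), dtype="float32")  # (n_chunks, 384)

# Exact L2 search; brute force is fine for the chunk counts of a single document
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Find the chunk closest to a question
query_vec = np.asarray(embedding_model.encode(["What is this document about?"]), dtype="float32")
distances, ids = index.search(query_vec, k=1)
best_chunk = chunks[ids[0][0]]
```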

## Setup & Installation

### Windows Environment Setup (Backend)

1. Create a virtual environment:

   ```
   python -m venv venv
   ```

2. Activate the virtual environment:

   ```
   venv\Scripts\activate
   ```

3. Install the required packages:

   ```
   pip install fastapi uvicorn python-multipart PyPDF2 transformers sentence-transformers faiss-cpu torch numpy
   ```

   or run `pip install -r requirements.txt` to install all the dependencies.

### Run the server

```
uvicorn smartdoc_backend:app --reload
```

The server will be available at http://127.0.0.1:8000. If it binds to a different address, check the CLI output: uvicorn prints the host and port it chose.

When you are done, exit the virtual environment with the `deactivate` command.

### Frontend Setup (Next.js)

1. Navigate to the frontend directory: `cd .\frontend\`
2. Install Node.js dependencies: `npm install`
3. Start the development server: `npm run dev`

The frontend is available at http://localhost:3000.

## Model Storage

Models are cached in:

- Windows: `C:\Users\<YourUsername>\.cache\huggingface\hub`
- Linux/macOS: `~/.cache/huggingface/hub`
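
This location is the Hugging Face default, not something the project configures. If you need the models cached somewhere else, setting `HF_HOME` before the libraries are imported should redirect the cache (shown here as an optional tweak, not part of the repo):

```python
import os

# Optional, not part of the repo: point the Hugging Face cache at another drive.
# Must run before transformers / sentence_transformers are imported.
os.environ["HF_HOME"] = r"D:\models\huggingface"  # hypothetical path

from transformers import pipeline  # downloads from here on use the new cache
```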

## API Endpoints

### Document Management

- `POST /upload` - Upload PDF/text documents
- `GET /documents` - List all documents
- `GET /document/{doc_id}` - Get document metadata
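
The dependency list includes PyPDF2, so the upload endpoint presumably extracts text along these lines; this is a hedged sketch with a hypothetical helper name, not the backend's actual code:

```python
from PyPDF2 import PdfReader

def extract_text(path: str) -> str:
    """Return the text of a PDF, or the raw contents of a plain-text file."""
    if path.lower().endswith(".pdf"):
        reader = PdfReader(path)
        # extract_text() can return None for image-only pages, hence the "or ''"
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    with open(path, encoding="utf-8") as f:
        return f.read()
```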

### AI Features

- `GET /document/{doc_id}/summary` - Generate document summary
- `POST /document/{doc_id}/query` - Ask questions about document content
- `GET /document/{doc_id}/chunks` - Get document chunks (debug)
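
As a rough illustration of how the query endpoint could tie retrieval and QA together: everything below (the `DOCS` store, the top-k value, the response fields) is assumed for the sketch and may not match the real `smartdoc_backend`:

```python
import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
from transformers import pipeline

app = FastAPI()
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
qa_model = pipeline("question-answering",
                    model="distilbert-base-uncased-distilled-squad",
                    framework="pt")

DOCS = {}  # hypothetical in-memory store: doc_id -> (chunks, faiss_index)

class Query(BaseModel):
    query: str

@app.post("/document/{doc_id}/query")
def query_document(doc_id: str, q: Query):
    if doc_id not in DOCS:
        raise HTTPException(status_code=404, detail="Unknown document")
    chunks, index = DOCS[doc_id]
    # Embed the question, pull the closest chunks, and let the QA model answer
    query_vec = np.asarray(embedding_model.encode([q.query]), dtype="float32")
    _, ids = index.search(query_vec, k=3)
    context = " ".join(chunks[i] for i in ids[0] if i != -1)
    result = qa_model(question=q.query, context=context)
    return {"answer": result["answer"], "score": result["score"]}
```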

## Usage Example (without the frontend)

```python
import requests

# Upload a document
with open('document.pdf', 'rb') as f:
    response = requests.post('http://127.0.0.1:8000/upload', files={'file': f})
doc_id = response.json()['doc_id']

# Get a summary
summary = requests.get(f'http://127.0.0.1:8000/document/{doc_id}/summary')

# Ask a question
query = {'query': 'What is this document about?'}
answer = requests.post(f'http://127.0.0.1:8000/document/{doc_id}/query', json=query)
```
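
The response schemas aren't documented above, so the field names in this snippet are assumptions; print the raw JSON to see what the backend actually returns:

```python
# Field names are assumptions; inspect the full JSON to check the real schema.
print(summary.json())   # e.g. {"summary": "..."}
print(answer.json())    # e.g. {"answer": "...", "score": 0.87}
```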

## License

MIT License - see LICENSE for more details.