A document QA chatbot demo built with LangChain and ChromaDB for analyzing DC housing policy documents.
This project is adapted from this tutorial and Anthropic's Contextual Retrieval Cookbook.
- Tested on macOS (Apple Silicon)
- Python 3.12.4
- Install required packages: `pip install -r requirements.txt`
- (Optional) For local embedding/LLM:
- Install Ollama (Mac users can use Homebrew): `brew install ollama`
- Pull the models: `ollama pull mxbai-embed-large` and `ollama pull llama3.2`
- API Setup:
- Copy `.env.example` to `.env`
- Add your API keys for:
- OpenAI (embeddings)
- Choice of LLM providers:
- OpenAI
- Anthropic
- Groq
- OpenRouter
Note on LLM Providers:
- Groq: Known for its ultra-fast inference speeds on open models such as Llama and Mixtral. Visit Groq to get an API key and see available models.
- OpenRouter: Acts as a middleware service that provides access to various LLMs (including GPT-4, Claude, Mixtral, etc.) through a unified API. Check OpenRouter for available models and pricing.
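The API keys are read from `.env` at startup, most likely via the python-dotenv package. As a rough illustration of what that loading step does (the function name `load_env_file` is hypothetical; the real project presumably just calls `load_dotenv()`):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv():
    read KEY=VALUE lines from a .env file into os.environ.
    Existing environment variables are not overridden."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed entries
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Whichever providers you add keys for become selectable at chat time; missing keys simply disable that provider.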
    .
    ├── data/        # PDF documents
    ├── db/          # ChromaDB vector database
    ├── models.py    # Model configurations
    ├── ingest.py    # Document processing script
    ├── chat.py      # Chat interface
    └── test_qa.md   # Test questions and answers
- Start document processing: `python ingest.py`
- Monitors the `./data` folder for new PDFs
- Processed files are marked with an `_` prefix
- Drag-and-drop knowledge base updates: just add PDFs to the folder
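The monitoring logic above can be sketched in a few lines of `pathlib` (function names here are illustrative, not the actual ones in `ingest.py`):

```python
from pathlib import Path

def find_unprocessed_pdfs(data_dir):
    """Return PDFs in data_dir that have not yet been ingested.
    Processed files are marked with a leading underscore."""
    return sorted(p for p in Path(data_dir).glob("*.pdf")
                  if not p.name.startswith("_"))

def mark_processed(pdf_path):
    """Rename a PDF with the '_' prefix once it has been ingested."""
    pdf_path = Path(pdf_path)
    return pdf_path.rename(pdf_path.with_name("_" + pdf_path.name))
```

The prefix convention keeps the pipeline idempotent: re-running `ingest.py` skips anything already renamed.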
- Start chat interface: `python chat.py`
- Choose between terminal or Gradio web interface
- Terminal: Type 'q' to quit
- Web UI: Use Ctrl+C to stop server
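The terminal mode's quit handling might look something like the sketch below (the `answer` callable stands in for the real retrieval chain, and `run_terminal_chat` is a hypothetical name, not necessarily what `chat.py` defines):

```python
def run_terminal_chat(answer, input_fn=input, output_fn=print):
    """Read questions in a loop until the user types 'q'."""
    while True:
        question = input_fn("You: ").strip()
        if question.lower() == "q":
            output_fn("Goodbye!")
            break
        # Delegate to the retrieval-augmented LLM chain
        output_fn(f"Bot: {answer(question)}")
```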
Note on Gradio: Gradio is an open-source Python library that makes it easy to create customizable web interfaces for ML models.
- By default, the UI is accessible only locally
- To create a public link, set `share=True` in `launch()`: `demo.launch(show_api=False, share=True)`
- Gradio will generate a temporary public URL (valid for 72 hours)
- This allows others to access your chatbot over the internet
    chunk_size = 1000     # Characters per chunk
    chunk_overlap = 200   # Overlap between chunks
Adjust based on your documents for optimal QA performance.
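To see how these two parameters interact, here is a deliberately naive fixed-size chunker (the project itself uses a LangChain text splitter, which additionally tries to break on sentence and paragraph boundaries):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size chunks where each chunk repeats
    the last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # how far the window advances
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

Larger overlap reduces the chance that an answer is split across a chunk boundary, at the cost of more chunks (and more embedding calls) per document.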
In `chat.py`:
- Change `USE_HYBRID` to choose between simple vector retrieval or hybrid (vector + BM25) retrieval
- Adjustable parameters:
  - `k` value (default: 10): number of retrieved documents
  - Vector/BM25 weights in hybrid retrieval (default: 0.8/0.2)
See `test_qa.md` for three types of test questions designed by Claude 3.5 Sonnet:
- Factual questions (single-source)
- Synthetic questions (multi-source)
- Analytical questions (policy recommendations)