A document QA chatbot demo built with LangChain and ChromaDB for analyzing DC housing policy documents.
This project is adapted from this tutorial and Anthropic's Contextual Retrieval Cookbook.
- Tested on macOS (Apple Silicon)
- Python 3.12.4
- Install required packages: `pip install -r requirements.txt`
- (Optional) For local embedding/LLM:
- Install Ollama (Mac users can use Homebrew): `brew install ollama`
- Pull the models: `ollama pull mxbai-embed-large` and `ollama pull llama3.2`
- API Setup:
- Copy `.env.example` to `.env`
- Add your API keys for:
- OpenAI (embeddings)
- Choice of LLM providers:
- OpenAI
- Anthropic
- Groq
- OpenRouter
Note on LLM Providers:
- Groq: Known for its ultra-fast inference speeds on open models such as Llama and Mixtral. Visit Groq to get an API key and see available models.
- OpenRouter: Acts as a middleware service that provides access to various LLMs (including GPT-4, Claude, Mixtral, etc.) through a unified API. Check OpenRouter for available models and pricing.
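The API keys are read from `.env` at startup, most likely via the python-dotenv package. As a rough illustration of what that loading step does (the function name `load_env_file` is hypothetical; the real project presumably just calls `load_dotenv()`):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv():
    read KEY=VALUE lines from a .env file into os.environ.
    Existing environment variables are not overridden."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed entries
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Whichever providers you add keys for become selectable at chat time; missing keys simply disable that provider.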
    .
    ├── data/        # PDF documents
    ├── db/          # ChromaDB vector database
    ├── models.py    # Model configurations
    ├── ingest.py    # Document processing script
    ├── chat.py      # Chat interface
    └── test_qa.md   # Test questions and answers
- Start document processing: `python ingest.py`
- Monitors the `./data` folder for new PDFs
- Processed files are marked with an `_` prefix
- Drag-and-drop knowledge base updates: just add PDFs to the folder
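The monitoring logic above can be sketched in a few lines of `pathlib` (function names here are illustrative, not the actual ones in `ingest.py`):

```python
from pathlib import Path

def find_unprocessed_pdfs(data_dir):
    """Return PDFs in data_dir that have not yet been ingested.
    Processed files are marked with a leading underscore."""
    return sorted(p for p in Path(data_dir).glob("*.pdf")
                  if not p.name.startswith("_"))

def mark_processed(pdf_path):
    """Rename a PDF with the '_' prefix once it has been ingested."""
    pdf_path = Path(pdf_path)
    return pdf_path.rename(pdf_path.with_name("_" + pdf_path.name))
```

The prefix convention keeps the pipeline idempotent: re-running `ingest.py` skips anything already renamed.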
- Start chat interface: `python chat.py`
- Choose between terminal or Gradio web interface
- Terminal: Type 'q' to quit
- Web UI: Use Ctrl+C to stop server
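The terminal mode's quit handling might look something like the sketch below (the `answer` callable stands in for the real retrieval chain, and `run_terminal_chat` is a hypothetical name, not necessarily what `chat.py` defines):

```python
def run_terminal_chat(answer, input_fn=input, output_fn=print):
    """Read questions in a loop until the user types 'q'."""
    while True:
        question = input_fn("You: ").strip()
        if question.lower() == "q":
            output_fn("Goodbye!")
            break
        # Delegate to the retrieval-augmented LLM chain
        output_fn(f"Bot: {answer(question)}")
```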
Note on Gradio: Gradio is an open-source Python library that makes it easy to create customizable web interfaces for ML models.
- By default, the UI is accessible only locally
- To create a public link, set `share=True` in `launch()`: `demo.launch(show_api=False, share=True)`
- Gradio will generate a temporary public URL (valid for 72 hours)
- This allows others to access your chatbot over the internet
    chunk_size = 1000     # Characters per chunk
    chunk_overlap = 200   # Overlap between chunks
Adjust based on your documents for optimal QA performance.
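To see how these two parameters interact, here is a deliberately naive fixed-size chunker (the project itself uses a LangChain text splitter, which additionally tries to break on sentence and paragraph boundaries):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size chunks where each chunk repeats
    the last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # how far the window advances
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

Larger overlap reduces the chance that an answer is split across a chunk boundary, at the cost of more chunks (and more embedding calls) per document.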
In `chat.py`:
- Change `USE_HYBRID` to choose between simple vector retrieval or hybrid (vector + BM25) retrieval
- Adjustable parameters:
  - `k` value (default: 10): number of retrieved documents
  - Vector/BM25 weights in hybrid retrieval (default: 0.8/0.2)
See `test_qa.md` for three types of test questions designed by Claude 3.5 Sonnet:
- Factual questions (single-source)
- Synthetic questions (multi-source)
- Analytical questions (policy recommendations)