RAG-Based Chatbot with Web Scraping

This project implements a Retrieval-Augmented Generation (RAG) chatbot that scrapes content from specified websites and uses it to answer user queries. The chatbot retrieves relevant information from the scraped data and generates responses using a language model.

Features

Web Scraping: Extracts content from specified URLs to build a knowledge base.
Retrieval-Augmented Generation: Retrieves the scraped passages most relevant to a query and passes them to a language model, so answers are grounded in the source content.
Streamlit Interface: Offers an interactive web-based interface for user interaction.
Docker Support: Containerized application for easy deployment.

Installation

Prerequisites

Python 3.9 or higher
pip
Docker (optional, for containerized deployment)

Clone the Repository

git clone https://github.com/sadavaidya/RAG_Chatbot.git
cd RAG_Chatbot

Install Dependencies

Using pip:

pip install -r requirements.txt

Usage

Running the Chatbot Locally

Start the Streamlit Application:
```
streamlit run app.py
```
Interact with the Chatbot:

Open your browser and navigate to http://localhost:8501 to start interacting with the chatbot.

Running the Chatbot with Docker

Build the Docker Image:
```
docker build -t rag_chatbot .
```
Run the Docker Container:
```
docker run --gpus all -p 8501:8501 rag_chatbot
```
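The application will be accessible at http://localhost:8501. The --gpus all flag exposes the host's NVIDIA GPUs to the container and requires the NVIDIA Container Toolkit to be installed on the host.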

Project Structure

RAG_Chatbot/
├── src/
│   ├── components/
│   │   ├── embedding.py
│   │   ├── generation.py
│   │   ├── retrieval.py
│   │   └── web_scraper.py
│   ├── pipeline/
│   │   └── rag_pipeline.py
│   └── utils.py
├── app.py
├── requirements.txt
└── Dockerfile

src/components/: Contains modules for embedding, generation, retrieval, and web scraping.
src/pipeline/: Defines the RAG pipeline that integrates the components.
app.py: Streamlit application entry point.
requirements.txt: Lists Python dependencies.
Dockerfile: Defines the Docker image configuration.

How It Works

Web Scraping: The web_scraper.py module fetches content from specified URLs and processes it into a list of documents.
Embedding: The embedding.py module encodes these documents into vector representations using a pre-trained model.
Retrieval: The retrieval.py module searches for documents relevant to the user's query based on vector similarity.
Generation: The generation.py module uses a language model to generate a response, conditioning on the retrieved documents and the user's query.
Pipeline Integration: The rag_pipeline.py module orchestrates the embedding, retrieval, and generation components to produce a final answer.
User Interface: The app.py file sets up a Streamlit web interface where users can input queries and receive responses from the chatbot.
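
The flow described above can be illustrated with a minimal, standard-library-only sketch. This is not the code in src/; every function name here (html_to_text, embed, retrieve, build_prompt) is hypothetical, and the term-count "embedding" is a toy stand-in for the pre-trained model the project uses. The generation step is left out: the sketch stops at the prompt that would be handed to a language model.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Scraping step (simplified): reduce an HTML page to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def embed(text):
    """Toy embedding (assumption): a sparse term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Retrieval step: rank documents by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Assemble the text a language model would be conditioned on."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hard-coded pages stand in for content fetched from real URLs.
pages = [
    "<html><body><h1>Pricing</h1><p>The basic plan costs 10 dollars per month.</p></body></html>",
    "<html><body><script>var x = 1;</script><p>Support is available by email on weekdays.</p></body></html>",
    "<html><body><p>The office is located in Berlin.</p></body></html>",
]
documents = [html_to_text(page) for page in pages]
query = "How much does the basic plan cost?"
top_docs = retrieve(query, documents, top_k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In a real pipeline the term-count vectors would be replaced by dense embeddings from a pre-trained model, and the prompt would be sent to the language model rather than printed. The ranking and prompt-assembly logic keeps the same shape.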

License

This project is licensed under the MIT License. See the LICENSE file for details.

Note: Ensure that the URLs specified for web scraping in the web_scraper.py module are accessible and that scraping them complies with their terms of service.
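
One machine-readable signal worth checking before scraping is a site's robots.txt. The sketch below uses Python's standard urllib.robotparser on an inline sample file. The is_allowed helper and the example.com URLs are illustrative assumptions and are not part of this repository. A robots.txt check does not replace reading a site's terms of service.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules for illustration; a real check would download the
# site's own /robots.txt first.
sample_robots = """\
User-agent: *
Disallow: /private/
"""

docs_ok = is_allowed(sample_robots, "https://example.com/docs/intro")
private_ok = is_allowed(sample_robots, "https://example.com/private/data")
print(docs_ok, private_ok)
```

A scraper could call a helper like this for each URL and skip any URL for which it returns False.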
