Large language models are prone to hallucination, which is just a fancy word for making up a response. To correctly and consistently answer questions, we need to ensure that the model has real information available to support its responses. We use the Retrieval-Augmented Generation (RAG) pattern to make this happen.
With Retrieval-Augmented Generation, we first pass a user's prompt to a data store. This might be in the form of a query to Amazon Kendra. We could also create a numerical representation of the prompt using Amazon Titan Embeddings to pass to a vector database. We then retrieve the most relevant content from the data store to support the large language model's response.
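For illustration, here is a minimal sketch of creating that numerical representation with Titan Embeddings through the langchain_aws package (the model ID and variable names are assumptions for this example, not taken from the lab code):

# Illustrative: embed a user prompt with Amazon Titan Embeddings via Amazon Bedrock.
# Assumes AWS credentials are configured and the Titan model is enabled in Bedrock.
from langchain_aws import BedrockEmbeddings

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
query_vector = embeddings.embed_query("How do I create a MemoryDB cluster?")
print(len(query_vector))  # dimensionality of the returned embedding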
In this lab, we will use Amazon MemoryDB, an in-memory database, to demonstrate the RAG pattern.
We will walk you through the steps to deploy a Python chatbot application using Streamlit on Cloud9. This is the architecture we will be implementing today.
The application is contained in the ragmm_app.py file and requires the packages listed in requirements.txt.
Before you proceed, make sure you have the following prerequisites in place:
- An AWS Cloud9 development environment set up.
- Access to Amazon Bedrock, which we will use to reach foundation models in this workshop.
- Foundation models such as Anthropic Claude enabled in Amazon Bedrock.
- Python and pip installed in your Cloud9 environment.
- Internet connectivity to download packages.
- Clone this repository to your Cloud9 environment:
git clone https://github.com/aws-samples/amazon-memorydb-for-redis-samples
cd amazon-memorydb-for-redis-samples/tutorials/memorydb-rag
- Install the required packages using pip:
pip3 install -r requirements.txt -U
- Use the LangChain vector store plugin for MemoryDB. For details, see the official LangChain AWS documentation.
from langchain_aws.vectorstores.inmemorydb import InMemoryVectorStore

# Create the index and load the document chunks into MemoryDB.
# Replace cluster_endpoint with your MemoryDB cluster endpoint; rediss:// indicates TLS.
vds = InMemoryVectorStore.from_documents(
    chunks,          # document chunks to index
    embeddings,      # embedding model, e.g. Amazon Titan Embeddings
    redis_url="rediss://cluster_endpoint:6379/ssl=True ssl_cert_reqs=none",
    vector_schema=vector_schema,
    index_name=INDEX_NAME,
)
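Once the chunks are indexed, the store can be queried directly or wrapped as a LangChain retriever. A small illustrative example (the query string is only a placeholder):

# Retrieve the chunks most similar to a question.
results = vds.similarity_search("What is MemoryDB?", k=3)

# Or expose the store as a retriever for use in a chain.
retriever = vds.as_retriever(search_kwargs={"k": 3})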
- Configure environment variables (optional):
export MEMORYDB_CLUSTER=rediss://CLUSTER_ENDPOINT:PORT
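The application can then read this value at runtime; a minimal sketch assuming the variable name exported above:

import os

# Read the MemoryDB cluster endpoint exported above (raises KeyError if it is not set).
memorydb_url = os.environ["MEMORYDB_CLUSTER"]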
- Run the application:
streamlit run 'ragmm_app.py' --server.port 8080
If the index has not yet been created and the data has not been loaded into MemoryDB, you can select this radio button to load it.
If the index has already been created, the following appears when we first load the application.
The vector database contains the MemoryDB Developer Guide.
For more detailed information, refer to the MemoryDB Developer Guide.
Here are a few sample questions we can ask:
- What is MemoryDB?
- How do you create a MemoryDB cluster?
- What are some reasons a highly regulated industry should pick MemoryDB?
LangChain provides convenient ways to incorporate modular utilities into chains. It lets us define and interact with different types of abstractions, which makes it straightforward to build powerful chatbots.
In this use case we will ask the chatbot to answer questions from an external corpus. To do this we apply a pattern called RAG (Retrieval-Augmented Generation): the idea is to index the corpus in chunks, then use semantic similarity between the chunks and the question to look up which sections of the corpus might be relevant to the answer. Finally, the most relevant chunks are aggregated and passed as context to the ConversationChain, similar to providing conversation history.
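As a rough sketch of the indexing half of this pattern, the corpus can be split into overlapping chunks before embedding; the chunk sizes below are illustrative assumptions, not the application's settings:

# Split loaded documents into overlapping chunks so each chunk can be embedded and indexed.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)  # `documents` loaded earlier, e.g. from a PDF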
We will take a PDF file and use the Titan Embeddings model to create vectors. These vectors are then stored in Amazon MemoryDB, an in-memory vector database.
When the chatbot is asked a question, we query MemoryDB with the question and retrieve the text that is semantically closest; this text is passed to the LLM as context for its answer.
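A hedged sketch of that query path, using a Bedrock chat model through langchain_aws (the model ID, prompt wording, and variable names are assumptions; the actual logic lives in ragmm_app.py):

# Retrieve the closest chunks from MemoryDB and pass them to a Bedrock chat model as context.
# Assumes `vds` is the vector store created earlier.
from langchain_aws import ChatBedrock

llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")

question = "What is MemoryDB?"
docs = vds.similarity_search(question, k=3)
context = "\n\n".join(d.page_content for d in docs)

answer = llm.invoke(
    f"Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)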
To see what the input prompt to the LLM looks like, we can execute this search directly on the document store, which runs a vector similarity search (VSS).
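For example, assuming the store exposes the standard similarity_search_with_score method, the retrieved chunks and their distances can be inspected directly:

# Run the vector similarity search directly against the document store.
for doc, score in vds.similarity_search_with_score("What is MemoryDB?", k=3):
    print(score, doc.page_content[:120])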
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.