Chat with your multimedia content using Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases
In the era of information overload, extracting meaningful insights from diverse data sources has become increasingly challenging. This is particularly difficult when businesses have terabytes of video and audio files alongside text-based data and need to quickly access specific sections or topics, summarize content, or answer targeted questions using information sourced from these diverse files, all without switching context or solutions. This unified generative AI solution transforms how users interact with their data. It integrates seamlessly with a variety of file formats, including video, audio, PDFs, and text documents, providing a single interface for knowledge extraction. Users can ask questions about their data, and the solution delivers precise answers, complete with source attribution. Responses are linked back to their origin, which may be a PDF or text document, or a video that loads at the exact timestamp for faster, more efficient reference.
This sample solution demonstrates how to use AWS AI services to:
- Process and index multi-format data at scale, including large video, audio and documents
- Rapidly summarize extensive content from various file types
- Deliver context-rich responses
- Provide a unified, intuitive user experience for seamless data exploration
(Demo video: bot-demo.mp4)
This solution uses Amazon Bedrock Data Automation for data parsing and Amazon Bedrock Knowledge Bases for chunking, embedding, retrieval and answer generation.
Amazon Bedrock Data Automation:
- Manages all content parsing
- Converts documents, images, video, and audio to text
- Processes text from common text formats, visually rich documents, and images
- Processes speech to text from audio and video files
- Processes video files without audio to produce a complete summary and event list
- Processes text that appears within videos
- Categorizes data within files for efficient search and retrieval
- Media Bucket: Secure bucket for source files
- Organized Bucket: Processed files destination
- Application Host Bucket: React frontend host
Initial Processing Lambda
- Handles S3 uploads
- Triggers Bedrock Data Automation
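The initial processing step can be sketched as a Lambda handler that reads the S3 event notification and kicks off a Bedrock Data Automation job. The event-parsing helper below follows the standard S3 notification shape; the `invoke_data_automation_async` call and its ARN values are illustrative placeholders, not this solution's actual resources.

```python
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 put-event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    # boto3 is imported lazily so the helper above stays unit-testable offline.
    import boto3

    client = boto3.client("bedrock-data-automation-runtime")
    for bucket, key in extract_s3_objects(event):
        # ARNs below are placeholders; the deployed stack wires in its own.
        client.invoke_data_automation_async(
            inputConfiguration={"s3Uri": f"s3://{bucket}/{key}"},
            outputConfiguration={"s3Uri": f"s3://{bucket}-organized/{key}"},
            dataAutomationProfileArn=(
                "arn:aws:bedrock:us-west-2:123456789012:"
                "data-automation-profile/us.data-automation-v1"
            ),
        )
```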
Output Processing Lambda
- Processes Bedrock Data Automation results
- Converts JSON to timestamped text
- Stores in organized bucket
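The JSON-to-timestamped-text conversion can be sketched as below. The segment shape (`start_ms` plus `text`) is a simplified stand-in for the actual Bedrock Data Automation output schema, which the deployed Lambda would map from.

```python
def to_timestamped_text(segments):
    """Render transcript segments as '[HH:MM:SS] text' lines.

    `segments` is assumed to be a list of dicts with a start time in
    milliseconds and the transcribed text -- an assumed, simplified
    stand-in for the real Bedrock Data Automation result format.
    """
    lines = []
    for seg in segments:
        total = seg["start_ms"] // 1000
        h, m, s = total // 3600, (total % 3600) // 60, total % 60
        lines.append(f"[{h:02d}:{m:02d}:{s:02d}] {seg['text']}")
    return "\n".join(lines)
```

Embedding timestamps directly in the indexed text is what lets the frontend later open a source video at the exact moment a cited answer came from.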
Retrieval Lambda
- Handles user queries
- Manages context retrieval and response generation
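A minimal sketch of the retrieval Lambda, using the Bedrock Knowledge Bases `RetrieveAndGenerate` API via `boto3`. The event field names (`query`, `kbId`, `modelArn`) are assumptions for illustration; the request payload builder is kept pure so it can be inspected without AWS credentials.

```python
def build_rag_request(query, kb_id, model_arn):
    """Build a RetrieveAndGenerate request body for a Bedrock knowledge base."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def handler(event, context):
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    resp = client.retrieve_and_generate(
        **build_rag_request(event["query"], event["kbId"], event["modelArn"])
    )
    # Citations carry the S3 source locations used for attribution in the UI.
    return {"answer": resp["output"]["text"], "citations": resp.get("citations", [])}
```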
Parameter | Description | Default/Constraints
---|---|---
ModelId | The Amazon Bedrock supported LLM inference profile ID used for inference. | Default: `us.anthropic.claude-3-haiku-20240307-v1:0`
EmbeddingModelId | The Amazon Bedrock supported embedding model ID used by Bedrock Knowledge Bases. | Default: `amazon.titan-embed-text-v2:0`
DataParser | Bedrock Data Automation processes visually rich documents, images, videos, and audio and converts them to text. | Default: `Bedrock Data Automation`; Allowed values: `["Bedrock Data Automation"]`
ResourceSuffix | Suffix to append to resource names (e.g., dev, test, prod). | Alphanumeric characters and hyphens only; Pattern: `^[a-zA-Z0-9-]*$`; MinLength: 1; MaxLength: 20
- Automatic media files transcription
- Support for multiple media formats
- Timestamped transcript generation
- User authentication using Amazon Cognito
- IAM roles with least privilege access
- Cognito user pool for authentication
- CloudFront resource URLs validated using AWS Lambda@Edge
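The Lambda@Edge validation can be sketched as an origin-request handler that rejects requests lacking a token. This is a deliberately simplified stand-in: the deployed function validates a Cognito JWT, whereas the sketch only checks that an `Authorization` header is present. The CloudFront event shape follows the standard Lambda@Edge request structure.

```python
def handler(event, context):
    """Lambda@Edge origin-request sketch: reject requests without a token.

    Simplified assumption: presence of an Authorization header stands in
    for full Cognito JWT validation done by the real function.
    """
    request = event["Records"][0]["cf"]["request"]
    headers = request.get("headers", {})
    if "authorization" not in headers:
        # Returning a response object short-circuits the request at the edge.
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "body": "Missing authorization token",
        }
    return request  # pass the request through to the origin
```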
- AWS CLI with credentials
- Node.js and npm
- AWS Console access
- Upload the template in the CloudFormation console
- Fill in the required parameters
- Create stack and wait for completion
aws cloudformation create-stack \
  --stack-name chatbot-react-stack \
  --template-body file://path/to/template.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters ParameterKey=,ParameterValue=
- Using the console or CLI, deploy the chatbot.yaml template first (currently only us-west-2 is supported)
- From the Outputs section of the deployed stack, copy ReactAppUserPoolId's value
- Deploy lambda-edge.yaml template (in us-east-1 only) using the Cognito User Pool ID obtained from previous step
- From the Outputs section of the lambda-edge.yaml stack, copy EdgeFunctionVersionARN's value
- Navigate to CloudFront in the AWS Management Console
- Select the distribution you want to modify
- Go to the "Behaviors" tab
- Select the default behavior (Path pattern: Default (*))
- Click "Edit" button
- Scroll down to the "Function associations" section
- For "Origin request", select "Lambda@Edge" as Function type
- Provide the EdgeFunctionVersionARN obtained from the previous step
- Scroll to the bottom and click "Save changes"
- Wait for the distribution to deploy the changes (Status will change from "In Progress" to "Deployed")
- Navigate to the chatbot-react folder
- Create .env file with the following structure:
REACT_APP_LAMBDA_FUNCTION_NAME=<ReactAppLambdaFunctionName>
REACT_APP_S3_SOURCE=<ReactAppS3Source>
REACT_APP_AWS_REGION=<chatbot.yaml_deployment_region>
REACT_APP_USER_POOL_ID=<ReactAppUserPoolId>
REACT_APP_USER_POOL_CLIENT_ID=<ReactAppUserPoolClientId>
REACT_APP_IDENTITY_POOL_ID=<ReactAppIdentityPoolId>
REACT_APP_CLOUDFRONT_DOMAIN_NAME=<ReactAppCloudfrontDomainName>
REACT_APP_DOCUMENTS_KB_ID=<ReactAppDocumentsKbId>
REACT_APP_DOCUMENTS_DS_ID=<ReactAppDocumentsDsId>
- Replace placeholder values with chatbot.yaml CloudFormation stack outputs
- Build and Deploy Frontend
- Install dependencies
npm install
- Build the application
npm run build
- Upload the contents of chatbot-react/build to Amazon S3 bucket
- Verify the CloudFront distribution is deployed and active
- Access the application using:
https://<ReactAppCloudFrontDomainName>.cloudfront.net/
- Sign up or log in with your credentials
- Use the left navigation pane to: a. Upload files b. Initiate data sync c. Monitor sync status
- Once sync is complete, start chatting with your data
- Create Guardrails from the Amazon Bedrock Console or obtain existing Guardrail ID and version
- Use the left navigation pane to select 'Guardrails' from the dropdown
- Provide the Guardrail ID and version
- Ask a question and test for blocked content
- Use the left navigation pane to select 'Inference Configuration' from the dropdown
- Provide a Bedrock supported model's inference profile ID (this solution works best with Anthropic Claude 3 Haiku; other LLMs might require prompt tuning)
- Change Temperature and TopP
- Ask a question and test the inferred answer
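The guardrail and inference settings above map onto the generation configuration of the `RetrieveAndGenerate` request. Field names below follow the Bedrock Agent Runtime API shape, but exact keys can differ by SDK version, so treat this as an illustrative sketch rather than a definitive payload.

```python
def generation_config(guardrail_id, guardrail_version, temperature, top_p):
    """Map the UI's guardrail and inference settings onto a
    RetrieveAndGenerate generationConfiguration (assumed field names)."""
    return {
        "guardrailConfiguration": {
            "guardrailId": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
        "inferenceConfig": {
            # Lower temperature -> more deterministic answers; topP trims
            # low-probability tokens from sampling.
            "textInferenceConfig": {"temperature": temperature, "topP": top_p}
        },
    }
```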
- Direct S3 Upload: Place files in the media bucket (optional)
- Web Interface : Upload through the application's UI
- Keep .env file secure and never commit it to version control
- Do not use sensitive, confidential, or critical data
- Do not process personally identifiable information (PII)
- Use only public data for testing and demonstration purposes
- CloudWatch Logs for Lambda functions and upload/sync failures
- EventBridge rules for tracking file processing
- Supports specific media file formats only (refer to the Amazon Bedrock Data Automation documentation)
- Maximum file size limitations apply based on AWS service limits
- Single document cannot exceed 20 pages
- Files must be manually deleted from the S3 buckets, and the Amazon Bedrock Knowledge Bases must be manually synced to reflect these changes
Bedrock Data Automation is currently available only in:
- us-west-2
This is a demonstration/sample solution and is not intended for production use. Please note:
- Do not use sensitive, confidential, or critical data
- Do not process personally identifiable information (PII)
- Use only public data for testing and demonstration purposes
- This solution is provided for learning and evaluation purposes only
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.