A powerful Telegram bot that transcribes and summarizes voice notes using state-of-the-art AI models. Built with Python and powered by Groq's API, using Whisper for transcription and Llama 3 for summarization.
| Transcription Example | Summary Example |
| --- | --- |
| *(screenshot)* | *(screenshot)* |
- Transcribe voice notes from forwarded messages
- Handle direct voice note recordings
- Generate accurate transcriptions using Whisper
- Provide concise summaries of the transcribed content using Llama 3
- Support for multiple audio formats
- Well-formatted, easy-to-read output
- Python 3.10 or higher
- pipenv (Python package manager)
- Telegram Bot Token (from @BotFather)
- Groq API Key (from the Groq console)
1. Clone the repository:

   ```bash
   git clone https://github.com/aviaryan/voice-transcribe-summarize-telegram-bot.git
   cd voice-transcribe-summarize-telegram-bot
   ```
2. Install dependencies using pipenv:

   ```bash
   pipenv install
   ```
3. Create a `.env` file in the root directory:

   ```bash
   cp .env.copy .env
   ```
4. Fill in your environment variables in the `.env` file:

   ```
   TELEGRAM_BOT_TOKEN=your_telegram_bot_token
   GROQ_API_KEY=your_groq_api_key
   ```
5. Configure authorized users:

   - Open `bot.py` and locate the `AUTHORIZED_USERS` array
   - Add your Telegram user ID to the array (you can get your ID by messaging @userinfobot on Telegram)

   ```python
   AUTHORIZED_USERS = [your_telegram_id]  # Add more user IDs as needed
   ```
6. Activate the virtual environment:

   ```bash
   pipenv shell
   ```
7. Start the bot:

   ```bash
   python bot.py
   ```
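The `AUTHORIZED_USERS` check from the configuration step might gate incoming messages roughly like this. This is a sketch: the `is_authorized` helper and the python-telegram-bot-style handler below are illustrative assumptions, not code copied from `bot.py`.

```python
AUTHORIZED_USERS = [123456789]  # replace with your Telegram user ID

def is_authorized(user_id: int) -> bool:
    """Return True only for whitelisted Telegram user IDs."""
    return user_id in AUTHORIZED_USERS

async def handle_voice(update, context):
    # Reject voice notes from anyone not on the whitelist.
    if not is_authorized(update.effective_user.id):
        await update.message.reply_text("You are not authorized to use this bot.")
        return
    # ...download the voice note, then transcribe and summarize...
```

Keeping the whitelist check first in every handler ensures the bot never spends API credits on requests from unknown users.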
8. In Telegram:

   - Forward any message containing a voice note to the bot
   - Record and send a voice note directly to the bot
   - Wait for the bot to process and return both transcription and summary
- The bot receives a voice note through Telegram
- Audio is processed and sent to Groq's API
- Whisper model transcribes the audio content
- A second pass through Groq's API generates a concise summary using a Llama 3 model
- Both transcription and summary are returned to the user in a well-formatted message
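The pipeline above can be sketched as follows. This is a minimal illustration, not the project's actual code: the model names, helper functions, and the Groq Python SDK's OpenAI-style interface are assumptions that `bot.py` may differ from.

```python
import os

def transcribe(audio_path: str) -> str:
    """Send the audio file to Groq's Whisper endpoint and return plain text."""
    from groq import Groq  # deferred import so the pure helpers below work without the SDK
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    with open(audio_path, "rb") as f:
        return client.audio.transcriptions.create(
            model="whisper-large-v3", file=f, response_format="text"
        )

def summarize(text: str) -> str:
    """Ask a Llama 3 model on Groq for a short summary of the transcript."""
    from groq import Groq
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    resp = client.chat.completions.create(
        model="llama3-70b-8192",  # assumed model name for illustration
        messages=[
            {"role": "system", "content": "Summarize this voice note in a few sentences."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

def format_reply(transcription: str, summary: str) -> str:
    """Combine both results into one readable Telegram message."""
    return f"📝 Transcription:\n{transcription}\n\n📄 Summary:\n{summary}"
```

Splitting transcription and summarization into separate calls keeps each prompt simple and lets the bot return a partial result if the second call fails.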
Contributions are welcome! Feel free to:
- Open issues for bugs or feature requests
- Submit pull requests
- Improve documentation
- Share feedback
This project is licensed under the MIT License - see the LICENSE file for details.