A demonstration project showing how to build a realtime multimodal application with Google's Gemini 2.0 API and Next.js. The app processes audio and video and generates transcripts in realtime.
- Realtime audio/video (image) interaction via the Gemini 2.0 Multimodal Live API (see the sketch after this list)
- Live transcription via the Gemini 1.5/2.0 GenerativeAI API
- Built with Next.js for optimal performance
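
The realtime layer speaks the Live API's bidirectional WebSocket protocol. As a rough illustration, here is a minimal sketch of opening a session from the browser; the endpoint and setup frame follow Google's published Live API reference, while `openLiveSession`, the handlers, and the model name are illustrative assumptions rather than this repo's actual client code.

```typescript
// Minimal sketch: open a Multimodal Live API session over WebSocket.
// Endpoint and setup frame follow Google's Live API reference; everything
// else here is an illustrative assumption, not this repo's client code.
const ENDPOINT =
  "wss://generativelanguage.googleapis.com/ws/" +
  "google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent";

export function openLiveSession(apiKey: string): WebSocket {
  const ws = new WebSocket(`${ENDPOINT}?key=${apiKey}`);

  ws.addEventListener("open", () => {
    // The first client frame must be a setup message selecting the model.
    ws.send(JSON.stringify({ setup: { model: "models/gemini-2.0-flash-exp" } }));
  });

  ws.addEventListener("message", async (event) => {
    // Browser WebSockets deliver binary frames as Blobs; decode to JSON.
    const raw = typeof event.data === "string" ? event.data : await event.data.text();
    console.log("server frame:", JSON.parse(raw));
  });

  return ws;
}
```

Once the server acknowledges the setup, media is streamed to it as `realtimeInput` frames.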
- Node.js 18+ installed
- An API key for the Gemini 2.0 model
- Clone the repository
```bash
git clone https://github.com/yeyu2/gemini-nextjs.git
cd gemini-nextjs
```
- Install dependencies
```bash
npm install
# or
yarn install
```
- Set up environment variables
```bash
cp .env.example .env.local
```

Add your Gemini API key to `.env.local`:

```bash
GEMINI_API_KEY=your_api_key_here
```
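
The key must stay server-side, so SDK calls belong in a route handler or server component rather than the browser bundle. Below is a minimal sketch of the transcription path, assuming a hypothetical `app/api/transcribe/route.ts`; the prompt and model name are illustrative, not this repo's actual implementation.

```typescript
// app/api/transcribe/route.ts (hypothetical path): read GEMINI_API_KEY on
// the server and ask the GenerativeAI SDK to transcribe an audio clip.
import { GoogleGenerativeAI } from "@google/generative-ai";
import { NextResponse } from "next/server";

export async function POST(request: Request) {
  // process.env keeps the key out of client-side JavaScript.
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

  // Expect base64-encoded audio plus its MIME type in the JSON body.
  const { audioBase64, mimeType } = await request.json();
  const result = await model.generateContent([
    { inlineData: { data: audioBase64, mimeType } },
    { text: "Transcribe this audio." },
  ]);

  return NextResponse.json({ transcript: result.response.text() });
}
```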
- Run the development server
```bash
npm run dev
# or
yarn dev
```
Open http://localhost:3000 in your browser to see the application.