Someone at work shared a Tweet about this project, and I decided to build a clone of it in Node.js.
Prerequisites:

- Node.js v20.6.0
- ElevenLabs API Key (https://elevenlabs.io)
- OpenAI API Key (https://platform.openai.com/api-keys)
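To confirm your Node.js version before installing, a small preflight check can help. This is a minimal sketch that is not part of the project. It assumes v20.6.0 is a minimum rather than an exact pin, and the helper names `compareVersions` and `meetsRequirement` are hypothetical.

```javascript
// Hypothetical preflight check: is the running Node.js at least the version listed above?
// Treating v20.6.0 as a minimum (rather than an exact pin) is an assumption.
const REQUIRED_NODE = "20.6.0";

// Compare two "major.minor.patch" strings numerically: returns -1, 0 or 1.
function compareVersions(a, b) {
  const pa = a.split(".").map((part) => parseInt(part, 10) || 0);
  const pb = b.split(".").map((part) => parseInt(part, 10) || 0);
  for (let i = 0; i < 3; i++) {
    // Missing components (e.g. "20.6") count as 0.
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// process.versions.node has no leading "v", e.g. "20.6.0".
function meetsRequirement(current = process.versions.node, required = REQUIRED_NODE) {
  return compareVersions(current, required) >= 0;
}

console.log(
  `Node.js ${process.versions.node}: ${meetsRequirement() ? "OK" : `older than ${REQUIRED_NODE}`}`
);
```

The comparison is numeric per component, so "20.10.0" correctly counts as newer than "20.6.0", which a plain string comparison would get wrong.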
The difference between my project and the original is that mine uses Socket.io to stream the audio to the client. Here is how it works: the client captures a frame from the webcam, converts the image to base64, and sends it to the server. The server sends the image, together with a prompt, to the ChatGPT Vision API, which returns a description of the image guided by that prompt. The server then sends the description to the ElevenLabs API to generate an audio file of it, and finally streams that audio back to the client over Socket.io.
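The server-side flow described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's code: the function names, the `gpt-4o` model name, the `OPENAI_API_KEY`/`ELEVENLABS_API_KEY` variable names, the `voiceId` parameter and the `audio-chunk`/`audio-end` event names are all assumptions. The HTTP calls use Node's built-in `fetch` against OpenAI's Chat Completions endpoint (with an `image_url` content part) and ElevenLabs' text-to-speech endpoint. The socket is represented by a plain `emit(event, payload)` function so the flow can run without Socket.io.

```javascript
// Hypothetical sketch: base64 frame -> vision description -> speech audio -> chunks to client.

// Split "data:image/jpeg;base64,AAAA..." into its MIME type and base64 payload.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64 data URL");
  return { mimeType: match[1], base64: match[2] };
}

// Chat Completions request body with a text prompt plus an image part.
// The model name and max_tokens value are assumptions.
function buildVisionRequest(prompt, dataUrl, model = "gpt-4o") {
  return {
    model,
    max_tokens: 300,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// Send the request to OpenAI and return the description text.
async function describeWithOpenAI(body) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Ask ElevenLabs to synthesize speech and return the audio bytes as a Buffer.
async function speakWithElevenLabs(text, voiceId) {
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "xi-api-key": process.env.ELEVENLABS_API_KEY,
    },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

// Emit the audio in fixed-size chunks, then a final end marker.
function emitAudioInChunks(emit, audio, chunkSize = 16 * 1024) {
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    emit("audio-chunk", audio.subarray(offset, offset + chunkSize));
  }
  emit("audio-end");
}

// One captured frame through the whole pipeline; dependencies are injected.
async function handleFrame(dataUrl, { prompt, describeImage, synthesizeSpeech, emit }) {
  parseDataUrl(dataUrl); // reject malformed payloads before spending API calls
  const description = await describeImage(buildVisionRequest(prompt, dataUrl));
  const audio = await synthesizeSpeech(description);
  emitAudioInChunks(emit, audio);
  return description;
}
```

In a real server, `describeImage` would be `describeWithOpenAI`, `synthesizeSpeech` would be `(text) => speakWithElevenLabs(text, voiceId)`, and `emit` would be `(event, payload) => socket.emit(event, payload)` inside a Socket.io `connection` handler. On the browser side, a data URL of this shape is what `canvas.toDataURL("image/jpeg")` produces after drawing a video frame onto a canvas.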
Getting started:

- Clone the repo.
- Run `npm install`.
- Create a `.env` file, using the `.env.example` file as a template.
- Run `npm run start:client` to start the client.
- Run `npm run start:server` to start the server.
- Go to http://localhost:8080 and enjoy!
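The `.env` step comes down to key/value pairs that the server reads at startup. As a minimal sketch of what that involves, the snippet below parses `.env`-style text and reports which required keys are missing. It assumes the key names `OPENAI_API_KEY` and `ELEVENLABS_API_KEY`, which are hypothetical (the real names are in `.env.example`), and the project itself may load the file differently, for example with the `dotenv` package.

```javascript
// Hypothetical key names: copy the real ones from .env.example.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY"];

// Parse KEY=VALUE lines, skipping blank lines and # comments,
// and stripping one pair of matching surrounding quotes from values.
function parseEnv(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quoted = value.length >= 2 && (value[0] === '"' || value[0] === "'") && value.at(-1) === value[0];
    if (quoted) value = value.slice(1, -1);
    result[key] = value;
  }
  return result;
}

// Names of required keys that are absent or empty.
function missingKeys(env, required = REQUIRED_KEYS) {
  return required.filter((key) => !env[key]);
}
```

Since Node v20.6.0 you can also load the file without any package via `node --env-file=.env <entry file>`, which populates `process.env`; calling `missingKeys(process.env)` at startup would then let the server fail fast with a clear message instead of erroring on the first API call.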