So I’ve been working on a project recently: an interactive avatar with real-time features such as streaming audio with lip-sync data (phonemes) and vision support, producing a digital representation of yourself (or some other character). I’ve come up with a demo that manages to do this. Because it uses AI, it can also pick a specific facial emotion (from a set of 48) for every message you send, and the avatar’s face temporarily changes to reflect that emotion while it’s still speaking.
This is mainly for use with AI applications where text to speech is involved.
It works relatively well in its current state, although there are a few tweaks I’ll probably make over the coming weeks.
You can play with the demo here: avatar lip-sync demo
How it works…
It relies on a few things. You need to provide a GLB file (this is the avatar); a default one is included. The main library that runs this requires the GLB file to follow the ReadyPlayerMe specs, as those avatars contain additional data that helps drive the movements.
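To give a rough idea of what that extra data looks like, here’s a minimal sketch (using three.js, which may or may not be what the library does internally) that loads a GLB and lists which ReadyPlayerMe-style viseme morph targets it exposes. The target names below are assumptions based on the usual ReadyPlayerMe naming, so check your own file’s morphTargetDictionary.

```ts
import { Mesh } from "three";
import { GLTFLoader } from "three/examples/jsm/loaders/GLTFLoader.js";

// Viseme morph targets that ReadyPlayerMe-style avatars typically expose
// (names assumed for illustration; verify against your own GLB).
const EXPECTED_VISEMES = ["viseme_PP", "viseme_FF", "viseme_aa", "viseme_O", "viseme_U"];

new GLTFLoader().load("/avatar.glb", (gltf) => {
  gltf.scene.traverse((obj) => {
    if (!(obj instanceof Mesh) || !obj.morphTargetDictionary) return;
    // Report which of the expected viseme shapes this mesh actually has.
    const found = EXPECTED_VISEMES.filter(
      (name) => name in obj.morphTargetDictionary!
    );
    console.log(`${obj.name}: ${found.length}/${EXPECTED_VISEMES.length} viseme targets found`);
  });
});
```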
You can use any of these utilities to create a customized avatar.
There is only one text to speech service supported right now, and that’s Cartesia, because it can return phonemes (the individual units of sound in speech) along with timestamps for when each sound occurs. These are needed to map the correct mouth positions to the spoken text.
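As an illustration of that mapping step, here’s a simplified sketch of turning timestamped phonemes into mouth-shape (viseme) weights at playback time. The data shape and the phoneme-to-viseme table are my own assumptions for the example, not Cartesia’s actual response format or the library’s real code.

```ts
// Simplified shape of timestamped phoneme data (illustrative only).
interface TimedPhoneme {
  phoneme: string; // e.g. "AA", "P", "F"
  start: number;   // seconds from the start of the audio
  end: number;
}

// Hypothetical phoneme -> viseme morph target mapping (partial).
const PHONEME_TO_VISEME: Record<string, string> = {
  AA: "viseme_aa",
  P: "viseme_PP",
  B: "viseme_PP",
  F: "viseme_FF",
  V: "viseme_FF",
  O: "viseme_O",
};

// Given the current audio playback time, return the morph target weights to apply.
function visemeWeightsAt(phonemes: TimedPhoneme[], t: number): Record<string, number> {
  const weights: Record<string, number> = {};
  for (const p of phonemes) {
    if (t < p.start || t > p.end) continue;
    const viseme = PHONEME_TO_VISEME[p.phoneme];
    if (!viseme) continue;
    // Simple triangular envelope: ramp up to the phoneme's midpoint, then back down.
    const mid = (p.start + p.end) / 2;
    const half = (p.end - p.start) / 2 || 1e-6;
    weights[viseme] = 1 - Math.abs(t - mid) / half;
  }
  return weights;
}

// Each animation frame, the returned weights would be written into the mesh's
// morphTargetInfluences using lookups in morphTargetDictionary.
```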
If you enable your camera, it will also be able to see you by analysing captured images.
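The vision side basically comes down to grabbing frames from the camera and passing them to the AI as images. Here’s a minimal sketch of the capture step using standard browser APIs; how the resulting image gets sent to the model is up to your setup.

```ts
// Capture a single frame from the user's camera as a JPEG data URL,
// which can then be attached to the next message sent to the AI service.
async function captureFrame(): Promise<string> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const video = document.createElement("video");
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  // Stop the camera once the frame has been grabbed.
  stream.getTracks().forEach((track) => track.stop());
  return canvas.toDataURL("image/jpeg", 0.8);
}
```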
Have a play; it’s not perfect, but there’s room for improvement.
Here’s the editor if you’re interested:
paul-testing-1 | Bubble Editor
Paul