[PLUGIN] - LiveKit (real-time video chat)

Hi all,

This post is to introduce a brand new audio/video/data real-time chat plugin. I’ve been sitting on this for quite some time now, and I’m glad it’s finally complete! It’s gone through nearly 4 months of development, which involved not only testing the code but also making sure it works and plays nicely across all devices. There’s a ton of features inside, like the ability to float video elements, drag them around and swap them on click (everything you get in a WhatsApp call), plus background image segmentation where you can set a background image or blur it (like Google Meet, though this is a desktop-only feature). You can send audio across a WebSocket connection if you want, maybe to integrate with some third-party system, and you can record meetings and upload them to S3, GCP or Azure (all explained in the docs).

LiveKit (Real-Time Video Chat) Plugin | Bubble

I’ve built three different layouts to choose from (gallery, grid, speaker) which give you different views. The actual UI is quite configurable, but if I’ve missed anything then please let me know. Excuse the mug shots!

Gallery layout (responsive)

Grid layout (set sizes)

Speaker layout (this changes depending on who is speaking)

Screen sharing (participants get shifted to a container which is configurable)

As part of the building process for this plugin, I’ve collated a list of what everyone has asked for in other plugins of this nature and made sure it’s all included. So you can, if you want, assign sound effects to specific events, or turn on the “knocking” feature, where specified room owners have to accept/reject participant requests before they’re allowed to join the room.


I was going to release a version of this type of plugin some months back when MUX was developing their MUX Meet feature, but sadly they decided to sunset it and, in the process of doing so, recommended LiveKit. These guys really have built an awesome infrastructure, and they give you a fair bit of data on the free plan too (50 GB every month). The plugin automatically optimizes the quality of subscribed video tracks (both bandwidth and CPU) with a setting called adaptive stream, which keeps the amount of data sent to a minimum. When video elements become visible, an appropriate resolution is set; when none of the video elements are visible, the data flow is temporarily paused until they become visible again.
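For anyone curious what adaptive stream looks like at the SDK level, here’s a minimal sketch using the livekit-client library (the URL and token are placeholder values; the plugin wires all of this up for you):

```typescript
import { Room, RoomEvent, Track } from 'livekit-client';

// Adaptive stream matches each subscribed video's resolution to the size
// and visibility of the element it's attached to; dynacast stops publishing
// simulcast layers that nobody is currently watching.
const room = new Room({
  adaptiveStream: true,
  dynacast: true,
});

// Attach each subscribed track to the page as it arrives.
room.on(RoomEvent.TrackSubscribed, (track) => {
  if (track.kind === Track.Kind.Video || track.kind === Track.Kind.Audio) {
    document.body.appendChild(track.attach());
  }
});

// Placeholder values - in the plugin these come from its settings.
const wsUrl = 'wss://your-project.livekit.cloud';
const token = '<access token generated server-side>';

await room.connect(wsUrl, token);
```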

You can even hook it up to an AI agent and have AI agents join your room. The agents use Deepgram, ChatGPT and ElevenLabs. Additional configuration for voice activity detection (VAD) and Speech to Text (STT) will be available BUT… none of this is in the current version of the plugin, not yet anyway. I’ll be adding this functionality once LiveKit have completed their JavaScript integration. For now, it only runs on Python backends.

If that’s caught your interest, you can demo it below.
LiveKit Agents Playground: KITT

Let’s move on … … …


Demo Pages

There are three demo pages and another that contains all the documentation. Every documentation section inside all the options/actions has been written up and proofread to make sure it states exactly how things work.

Standard demo page
https://paul-testing.bubbleapps.io/version-test/livekit_demo

Demo page showing the knocking feature
https://paul-testing.bubbleapps.io/version-test/livekit_demo_knocking

Demo page showing the visual effects (only supported on desktop)
https://paul-testing-4.bubbleapps.io/version-test/livekit-effects

Documentation
https://paul-testing.bubbleapps.io/version-test/livekit_instructions


Mobile views can be configured to look like the image below. The floating video element is draggable by mouse and touch, and clicking it swaps the videos.

Data can be sent between participants.

Locations of where people are (if they agree to share them) can be plotted onto a map.
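To give a rough idea of what’s happening behind the scenes with those two features, here’s a sketch against the livekit-client SDK rather than the plugin’s actual code (the message shape is made up for illustration). A location is just a small data message published to the room:

```typescript
import { Room, RoomEvent } from 'livekit-client';

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Ask the browser for a position (only if the user agrees), then
// broadcast it to the other participants as a reliable data message.
function shareLocation(room: Room) {
  navigator.geolocation.getCurrentPosition((pos) => {
    const payload = encoder.encode(JSON.stringify({
      type: 'location', // made-up message shape, for illustration only
      lat: pos.coords.latitude,
      lng: pos.coords.longitude,
    }));
    room.localParticipant.publishData(payload, { reliable: true });
  });
}

// Receivers decode the payload and can plot the point on a map.
function listenForLocations(room: Room) {
  room.on(RoomEvent.DataReceived, (payload, participant) => {
    const msg = JSON.parse(decoder.decode(payload));
    if (msg.type === 'location') {
      console.log(`${participant?.identity} is at ${msg.lat}, ${msg.lng}`);
    }
  });
}
```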

Background imaging.
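The background effects are the same idea as LiveKit’s own track processors. I can’t speak for exactly how the plugin wires it up internally, but a minimal sketch with the @livekit/track-processors package (desktop browsers only, and the image path is a placeholder) looks like this:

```typescript
import { createLocalVideoTrack } from 'livekit-client';
import { BackgroundBlur, VirtualBackground } from '@livekit/track-processors';

// Grab the camera, then run it through a segmentation processor.
const videoTrack = await createLocalVideoTrack();

// Blur the background...
await videoTrack.setProcessor(BackgroundBlur(10));

// ...or swap it for an image instead (placeholder path).
await videoTrack.setProcessor(VirtualBackground('/images/office.jpg'));
```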



As far as technology goes, LiveKit use the latest and greatest, and they have a Slack channel you can join if you ever need further information - Slack channel

There’s not much more for me to say other than I hope it works well for everyone. If there are any bugs, it’s genuinely because I’ve missed something, but I’m here for support as always and happy to include any additional options going forward. It goes without saying that if anyone does spot any bugs, I’ll jump onto them straight away.

Paul

13 Likes

This is pretty amazing! I’m still fairly new, but very curious about what you mean when you say AI agents can join the call. Do you mean you can have them follow along in a video meeting? Is that possible to do easily with the plugin?

Thanks @M1J1

Exactly that, so think of an AI agent as an additional participant (a person in the call) that’s capable of answering questions etc. It’s not in the plugin right now, but it’s being added once complete. Did you check out the demo which LiveKit themselves produced for it?

Paul

Quick update to the plugin…

A few minor issues were fixed and a new feature has been added to support avatars.

You can test these out here.

This feature uses facial detection through the camera to allow virtual avatars to come to life; they mimic your movements.

Not sure how useful it will be but it’s certainly something different to try!
This shows the default ones included, but you can create your own and style the background.


All details in the documentation page.

2 Likes

Just installed and used this because Dyte stopped supporting their plugin. Great job Paul! Would definitely recommend.

Fully customisable, so I could get it to look like my client’s Figma.

1 Like

Update v1.11.0

Deepgram speech to text integration.

Quick update… the plugin now has the ability to convert speech to text and return a transcript. It can do this in one of two ways.


First way - Live streaming transcripts

By enabling the options below, data from each participant’s microphone is sent across a WebSocket connection directly to Deepgram (in real time), transcribed, then saved within the participant’s states as a list of transcripts. There are various settings which you may want to consider to help with transcription accuracy.

These individual transcripts are seen as a list of texts contained within the participant states. When viewed in a repeating group, they’ll look like this, where each participant has its own transcript.
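For anyone wondering what the browser-to-Deepgram leg of this roughly looks like, here’s a hedged sketch (not the plugin’s actual code; the query parameters are just examples, and in the plugin the key comes from the options):

```typescript
// Placeholder key, for illustration only.
const DEEPGRAM_API_KEY = '<your key>';

// Deepgram's live API accepts the key as a WebSocket subprotocol in browsers.
const socket = new WebSocket(
  'wss://api.deepgram.com/v1/listen?punctuate=true&interim_results=false',
  ['token', DEEPGRAM_API_KEY],
);

socket.onopen = async () => {
  // Stream microphone audio to Deepgram in small chunks.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
  recorder.ondataavailable = (e) => {
    if (e.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      socket.send(e.data);
    }
  };
  recorder.start(250); // emit a chunk every 250ms
};

socket.onmessage = (msg) => {
  const data = JSON.parse(String(msg.data));
  const transcript = data.channel?.alternatives?.[0]?.transcript;
  if (transcript) console.log(transcript); // the plugin stores these as states
};
```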

This functionality is implemented on the livekit_demo page but is currently disabled. If anyone wants it enabled for testing, then just let me know and I will temporarily enable it.


Second way - Transcribing a completed recording

Because the first way produces individual transcripts per participant, it isn’t possible to use that data to combine them into a single transcript for the entirety of a meeting. So if you have completed a Room Composite Egress recording, you can send that to Deepgram instead.

Deepgram has a feature called diarization that’s capable of recognizing speaker changes and assigning speakers to the various words in the transcript. Included is a new server-side action called “LiveKit - Transcribe Recording” that will send the recording and return a list of transcripts, formatted in a way which highlights the speakers.
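Under the hood this is Deepgram’s pre-recorded endpoint with diarization switched on. A server-side sketch of the request (not the plugin’s code; the recording URL is a placeholder) looks something like this:

```typescript
// Node-side sketch of a diarized transcription request.
const response = await fetch(
  'https://api.deepgram.com/v1/listen?diarize=true&punctuate=true',
  {
    method: 'POST',
    headers: {
      Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
      'Content-Type': 'application/json',
    },
    // Point Deepgram at the recording your egress produced (placeholder URL).
    body: JSON.stringify({ url: 'https://your-bucket/recording.mp4' }),
  },
);

const result = await response.json();
// With diarize=true, each word carries a speaker index.
for (const word of result.results.channels[0].alternatives[0].words) {
  console.log(`speaker ${word.speaker}: ${word.punctuated_word ?? word.word}`);
}
```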

If you use that data in a repeating group, then it will look something like this…

Both ways require an API key from Deepgram. The first way uses the key client-side, where it’s potentially exposed. The second way is secure and uses the key from the plugin settings as shown below.

All of this has been documented in the instructions page and I hope it’s useful.
As always, any problems/issues/question… just shout.

Paul

2 Likes

Would this work with other audio streaming services that use WebSockets, like Hume.ai? Chat — Hume API

or does this plugin have to use LiveKit?

Hi Matt,

Unfortunately it has to use LiveKit and the integrations I’ve outlined. But I’ve just checked out Hume.ai as I hadn’t come across them before, and I was shocked at how good they are. I mean, at first glance and after trying their demos, they seem incredibly good.

What part did you want to integrate into this plugin exactly? What’s the use case here? They’d benefit from a plugin all of their own; I might even integrate them into my AI BOT plugin.

They’ve just released EVI 2 as their empathetic voice AI. Price-wise, Hume is competitive with Vapi, and I find the ethos appealing.

I wouldn’t know the first thing about using one of their EVI frameworks (EVI TypeScript Quickstart Guide — Hume API) to build a Bubble plugin that allows voice back and forth and access to their API endpoints in a Bubble app.

That’s ok, I wouldn’t have expected that. I’m just curious as to why you wanted to use them inside this LiveKit plugin, that’s all.

Your plugin was the only one I could find that had any mention of WSS. There aren’t many options for streaming audio in and out of Bubble.

Yeh, there’s not many. What I’m trying to understand is what functionality you want. If you can tell me the use case and what it is you’re wanting to do, then I can figure out what I need to do to make it work for you. Right now I’m unsure exactly what you’re wanting to do and whether it’s worth me changing portions of the code to make it happen. Whatever you need will probably take me a little time, but if it’s viable then I don’t mind adding additional functionality.

Ok, scrap all that above…
LiveKit have now released their Agent for JS, so at some point I’ll attempt to integrate it.

Hi Paul, I have a Bubble app which I use for training child protection professionals to develop their expertise through role-plays. Having played around in the Hume sandpit, I’m very excited by the possibilities of the Hume EVI for giving training and feedback to social workers and other talking therapists. I know there is a lot of talk about the potential of AI therapists, but in my view the better model is to develop the complex conversational skills in humans using AI as a coaching partner.

Like Matt, I would really struggle to understand how to use Hume’s API directly, but I have a really compelling use case if you were able to build a Bubble plugin. I am very confident that if I could get a prototype up and running, it would attract a load of interest and probably funding.

Hi @stephenrice1

I was planning (and had started looking at) integrating Hume into this real-time plugin, but the complexities around integrating it were just too much, and whilst I had started, LiveKit released an agent framework for JS, which I’ll probably add at some point in the future.

For now though, I can create a Hume AI plugin of its own. I’m a bit tied up at the moment, but I’ll try to make it the next project I work on.

Paul

1 Like

Hi Paul @pork1977gm ,

We are looking to integrate the OpenAI Realtime API via LiveKit:

Can we get this supported?

1 Like

Is this capable of AI agent speech-to-speech too? I see on their site they offer this service, similar to Vapi, which is 3x the price.

Hi @sniphairmail and @Timbo

I haven’t completed the agent setup yet; it’s on my list and I’m working my way through the OpenAI Realtime API. As soon as I’ve done it, I’ll update everything and report back here. The agent will handle speech-to-speech once complete.

Paul

3 Likes

Update v1.17.0

Hume AI - Expression Measurement services.

@stephenrice1, @mattblake

Hume AI has now been integrated. The standard demo page (not the knocking one) has the option to enable it (see images below). This will only be showcased for a short period of time before the key is revoked. Both voice and face expressions can be optionally enabled through the settings.

The audio stream from the microphone is run through the prosody model, whilst the video stream from the camera is run through the face model.

There are 2 new properties across all the participant states: a list of voice emotions and a list of face emotions. Each list contains 48 emotions, automatically sorted by score so the most probable is at the top.
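For reference, Hume’s streaming API is a WebSocket you send chunks of base64-encoded media to, naming the models you want run. The sketch below is paraphrased from their docs and is my assumption about the shapes, not the plugin’s code (the key and audio chunk are placeholders):

```typescript
// Placeholders - the plugin takes the key from its settings.
const HUME_API_KEY = '<your key>';
const base64AudioChunk = '<base64-encoded audio>';

const socket = new WebSocket(
  `wss://api.hume.ai/v0/stream/models?apikey=${HUME_API_KEY}`,
);

socket.onopen = () => {
  // Run the prosody model over a chunk of microphone audio.
  socket.send(JSON.stringify({
    models: { prosody: {} },
    data: base64AudioChunk,
  }));
};

socket.onmessage = (msg) => {
  const result = JSON.parse(String(msg.data));
  // Each prediction carries emotion scores; sort so the most probable is first.
  const emotions = result.prosody?.predictions?.[0]?.emotions ?? [];
  emotions.sort((a: { score: number }, b: { score: number }) => b.score - a.score);
  console.log(emotions[0]); // e.g. { name: 'Calmness', score: 0.61 }
};
```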

The instructions page has been updated so there is some additional information on there if required.

https://paul-testing.bubbleapps.io/version-test/livekit_instructions?tab=1 (scroll to bottom of page)
https://paul-testing.bubbleapps.io/version-test/livekit_instructions?tab=3
https://paul-testing.bubbleapps.io/version-test/livekit_instructions?tab=11

Paul

1 Like

This is seriously cool! My only hesitation is having to use your site to manage my Hume AI key. I’m sure this is due to a limitation with Bubble not being able to handle streaming data, but does there have to be another third-party server involved?

In this LiveKit plugin, the API key is taken directly from the plugin settings here:

It’s worth noting that this key is exposed client-side, but the connections have to be made from the browser to perform adequately. It doesn’t rely on the same system the AI BOT plugin uses to generate tokens and grab keys from a secure area, although I may add that process into this plugin at a later date.

Does that answer your question?

Paul