Video Analysis with OpenAI (extract frames)

Hi Bubble community - I am building a video library app, and I need to run a video through OpenAI to get tags for it. I know that OpenAI doesn't accept video files as input, but there's a workaround: extract frames from the video and upload those to be analyzed. OpenAI themselves refer to the article below for this use case; however, it requires Python, which goes beyond my abilities.
Processing and narrating a video with GPT's visual capabilities and the TTS API | OpenAI Cookbook
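For reference, the Cookbook approach boils down to sampling frames with OpenCV and base64-encoding them for the API. A minimal sketch of that step (the file name and sampling interval are just example values):

```python
import base64
import cv2  # pip install opencv-python

def extract_frames(video_path, every_n_seconds=5):
    """Grab one frame every N seconds and return them as base64-encoded JPEGs."""
    video = cv2.VideoCapture(video_path)
    fps = video.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS is unreadable
    frames, frame_index = [], 0
    while True:
        ok, frame = video.read()
        if not ok:
            break
        if frame_index % int(fps * every_n_seconds) == 0:
            ok, buffer = cv2.imencode(".jpg", frame)
            if ok:
                frames.append(base64.b64encode(buffer).decode("utf-8"))
        frame_index += 1
    video.release()
    return frames

frames = extract_frames("clip.mp4", every_n_seconds=5)
print(f"Extracted {len(frames)} frames")
```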

Alternatively, if someone has experience implementing such a feature with Google Cloud (Vertex AI, Gemini, Video Intelligence API, etc), any help is welcome.

Thank you!


Is your input a live stream or a video file?
What analysis do you want to run?

This can be done easily with the Azure OpenAI API using gpt-4-vision with enhancements, which supports video as an input: it picks out the most important frames from the video and passes them to the model. So you don't need to write any code yourself, just use the Azure OpenAI API.

The input is an mp4 file, around 15-30 seconds. Basically, I want to use OpenAI to generate tags (e.g. about what's in the video) that I can then store in the Bubble database to make these videos easier to search.
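For what it's worth, once a handful of base64-encoded frames exist (e.g. from the OpenCV sketch above), the tag-generation call itself is short. A sketch, where the model name and prompt are just example choices:

```python
import os
from openai import OpenAI  # pip install openai

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

frames = []  # fill with base64-encoded JPEGs, e.g. from the sketch above

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder -- use whichever vision-capable model you have
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "These are frames from one short video. Return 5-10 "
                     "comma-separated tags describing what is in the video."},
            *[{"type": "image_url",
               "image_url": {"url": f"data:image/jpeg;base64,{frame}"}}
              for frame in frames],
        ],
    }],
)
tags = response.choices[0].message.content
print(tags)
```

The returned string could then be stored on the video's thing in the Bubble database, e.g. via the API Connector and a backend workflow.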

That’s a good hint. Thank you. Let me try that.

For indexing purposes, backend processes are better suited.

I would rather suggest using a service that already meets your requirements and is built for this purpose, running at a cheaper price.

See

Is this something you have done, and could you share the setup in Azure? I haven't been able to upload video files there either.

After getting access to Azure OpenAI:

1. Deploy gpt-4-vision (gpt-4v) in the Deployments section.
2. Open the chat playground.
3. Enable Azure AI Vision under the enhancements settings.
4. Click "View code" and select the cURL snippet.
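For reference, the generated snippet is roughly equivalent to the Python below. The endpoint shape and especially the enhancements/dataSources fields come from the 2023-12 preview API and may differ in your version, so treat them as placeholders and copy the exact payload from the "View code" button:

```python
import os
import requests

endpoint = "https://YOUR-RESOURCE.openai.azure.com"  # placeholder resource
deployment = "gpt-4-vision"                          # your deployment name
url = (f"{endpoint}/openai/deployments/{deployment}/extensions/chat/completions"
       "?api-version=2023-12-01-preview")

payload = {
    "messages": [{
        "role": "user",
        "content": [{"type": "text", "text": "Describe this video with tags."}],
    }],
    # Video enhancement fields as of the 2023-12 preview -- verify against
    # the snippet the playground generates for your API version.
    "enhancements": {"video": {"enabled": True}},
    "dataSources": [{
        "type": "AzureComputerVisionVideoIndex",
        "parameters": {
            "computerVisionBaseUrl": "https://YOUR-VISION.cognitiveservices.azure.com/computervision",
            "computerVisionApiKey": os.environ["AZURE_VISION_KEY"],
            "videoIndexName": "my-video-index",  # hypothetical index name
            "videoUrls": ["https://example.com/clip.mp4"],
        },
    }],
    "max_tokens": 300,
}

response = requests.post(url, json=payload,
                         headers={"api-key": os.environ["AZURE_OPENAI_KEY"]})
print(response.json()["choices"][0]["message"]["content"])
```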

Thank you everyone for your help. Both approaches would work, but they were too complex for my use case. I needed to deploy Storage and AI Search to enable Vision, which was expensive at >$70/month. The AWS method would also work, but it extracted too many frames and was too slow.

I went with a much simpler approach: I use the Shotstack API to extract a few (3-5) frames from a video and then feed these frames into the OpenAI API. As I am not expecting high volumes for my prototype, this approach is cheap, quick, and flexible (I can now use all OpenAI features, like describing a scene, etc.).
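For anyone wanting to replicate this, the Shotstack side looks roughly like the sketch below: render the clip to one JPEG per timestamp via their Edit API. The payload shape and the jpg output format are my reading of their docs, so double-check before relying on it; the resulting frame URLs then go into an OpenAI call like the one sketched earlier in the thread.

```python
import os
import requests

# Stage environment of Shotstack's Edit API -- check the docs for the
# current base URL and for the production endpoint.
SHOTSTACK_URL = "https://api.shotstack.io/edit/stage/render"

def render_frame(video_url, at_seconds):
    """Queue a render of the frame at `at_seconds` as a JPEG (sketch)."""
    payload = {
        "timeline": {"tracks": [{"clips": [{
            # `trim` sets the clip's in-point, so the rendered image should
            # be the frame at `at_seconds` (assumption -- verify in the docs).
            "asset": {"type": "video", "src": video_url, "trim": at_seconds},
            "start": 0,
            "length": 1,
        }]}]},
        "output": {"format": "jpg"},  # image output = a single frame
    }
    resp = requests.post(SHOTSTACK_URL, json=payload,
                         headers={"x-api-key": os.environ["SHOTSTACK_KEY"]})
    return resp.json()["response"]["id"]  # poll GET .../render/{id} for the URL

# e.g. three evenly spaced frames of a 30-second clip
render_ids = [render_frame("https://example.com/clip.mp4", t) for t in (5, 15, 25)]
```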
