Hello folks,
I have another take it or leave it API that you may wish to use. This allows your website to surf the web and use the responses in ChatGPT prompts.
Demo: Try it out here
This API will scrape the provided URLs, process that content into embeddings using OpenAI’s embeddings API, and upsert those embeddings to a Pinecone index and namespace of your choice. It then queries the Pinecone index with the message you provide and returns the relevant matches. The advantage of doing this externally rather than natively in Bubble is 1. WUs and 2. speed. This API will process most requests in 3-10 seconds, depending on how many URLs you provide.
Technically there’s no limit to the number of URLs you can provide - I pass a maximum of 5 at a time with no issues. If it can’t scrape a URL, it’ll time out after 10 seconds and only return the URLs it could extract.
Anyway, I can’t be bothered to write detailed docs, so I got ChatGPT to do it (I checked over it myself though).
As I said above, the API is take it or leave it. I haven’t had any issues with it. Technically multithreading isn’t recommended, but it makes things faster, which I care about, and I haven’t had any errors from the API at all. Hope someone finds it useful.
There is a suggested Bubble implementation at the bottom of the post.
If you would like me to deploy the API for you, we can discuss a fixed fee for that, and same goes if you’d like me to do the Bubble implementation. DM me or visit Not Quite Unicorns!
Overview
This API is designed to scrape content from given websites, process that content to generate embeddings using OpenAI’s API, and then store these embeddings into Pinecone, a vector database. The API also provides a feature to query Pinecone with a given message to retrieve relevant results.
Dependencies
The API uses:
- Flask: To expose endpoints.
- requests: To make HTTP requests.
- numpy: For mathematical operations.
- BeautifulSoup: For web scraping.
- uuid: To generate unique IDs.
- threading: For multi-threading.
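If you want to run it locally before deploying, a requirements file along these lines should cover it - this is my guess, not the exact file shipped with the download (uuid and threading are part of the Python standard library, so they don’t need installing):
# requirements.txt - assumed contents; pin versions as you prefer
Flask
requests
numpy
beautifulsoup4
gunicorn  # only needed if your App Engine entrypoint uses gunicorn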
API Endpoint
/browse
Method: POST
Description: This endpoint accepts a list of URLs. For each URL, the content and title are scraped, processed to get embeddings and then upserted to Pinecone in the specified namespace. The endpoint also queries Pinecone with a given message and returns the query results.
Input:
JSON payload with the following fields:
- urls: List of URLs to be processed.
- openAIAPIkey: API key for OpenAI.
- pineconeURL: URL endpoint for Pinecone.
- pineconeAPIkey: API key for Pinecone.
- namespace: Namespace in Pinecone to differentiate between different collections.
- wordLimit: Word limit for splitting content into chunks.
- uniqueID: Unique identifier for the current process (helpful for grouping data).
- message: Message to query Pinecone with.
- topK: Number of top results to return from Pinecone.
- category (optional): If provided, will be used as metadata when storing embeddings in Pinecone and as a filter when querying Pinecone.
Output:
JSON response with the following fields:
- status: Status of the processing; typically returns “processing complete”.
- numTokens: Total number of words from the content of all websites.
- queryResults: Results from querying Pinecone after upserting the URLs you provided.
- urlData: List of processed data for each URL, including the content, title, URL, and associated vector IDs. These are the data that now live in your specified Pinecone namespace.
Example Request:
{
"urls": ["<https://example.com>", "<https://example2.com>"],
"openAIAPIkey": "YOUR_OPENAI_API_KEY",
"pineconeURL": "YOUR_PINECONE_URL",
"pineconeAPIkey": "YOUR_PINECONE_API_KEY",
"namespace": "sample-namespace",
"wordLimit": 150,
"uniqueID": "sample-id",
"message": "Looking for technology news",
"topK": 5,
"category": "tech"
}
Example Response:
{
"status": "processing complete",
"numTokens": 1200,
"queryResults": [
{
"metadata": {
"content": "Sample content from example.com",
"memoryID": "sample-id",
"url": "<https://example.com>",
"title": "Sample Title",
"category": "tech"
},
"score": 0.95
},
...
],
"urlData": [
{
"url": "<https://example.com>",
"title": "Sample Title",
"content": "Full content from example.com",
"vector_ids": ["vector-id-1", "vector-id-2"]
},
...
]
}
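If you want to test the endpoint outside Bubble, a minimal Python call might look like the sketch below. The base URL is a placeholder for wherever you deploy the API; the payload mirrors the example request above.
import requests

# Placeholder URL - replace with your own deployment
API_URL = "https://YOUR-PROJECT.appspot.com/browse"

payload = {
    "urls": ["https://example.com", "https://example2.com"],
    "openAIAPIkey": "YOUR_OPENAI_API_KEY",
    "pineconeURL": "YOUR_PINECONE_URL",
    "pineconeAPIkey": "YOUR_PINECONE_API_KEY",
    "namespace": "sample-namespace",
    "wordLimit": 150,
    "uniqueID": "sample-id",
    "message": "Looking for technology news",
    "topK": 5,
    "category": "tech",
}

response = requests.post(API_URL, json=payload, timeout=60)
response.raise_for_status()
data = response.json()

# Each match carries the scraped content plus its source title and URL
for match in data["queryResults"]:
    print(match["score"], match["metadata"]["url"])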
Deploying the API
- Create a Google Cloud Platform account and make a project. Set up a billing account (you’ll get a few hundred $ to use as a trial)
- Install the Google Cloud CLI
- Run gcloud init and configure your project
- Download the API into a folder
- Navigate to that directory in the terminal
- Run gcloud app deploy
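gcloud app deploy expects an app.yaml next to the code. Something like the following is a reasonable starting point - the file name main.py and the app object name are assumptions, so adjust them to match the actual files in the download:
# app.yaml - minimal App Engine config; module and app names here are assumed
runtime: python311
entrypoint: gunicorn -b :$PORT main:app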
Notes:
- The API uses OpenAI’s API to get embeddings for content chunks.
- The API uses Pinecone’s API to store and query embeddings.
- The API is designed for batch processing of URLs and uses a constant batch size for processing chunks of text.
- The API uses a user-agent string for making requests to avoid being blocked by websites.
- Exception handling is in place to skip problematic websites and continue processing others.
- It will not render JavaScript.
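To make those last few notes concrete, here’s a simplified sketch of the kind of fetch, parse, and chunk step described above. It’s illustrative only - the deployed API’s actual code may differ:
import requests
from bs4 import BeautifulSoup

# A browser-like user agent reduces the chance of being blocked
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; browse-api/1.0)"}

def scrape(url):
    """Fetch a URL and return (title, text), or None if it can't be scraped."""
    try:
        resp = requests.get(url, headers=HEADERS, timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        return None  # skip problematic sites and carry on with the others
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else url
    # Plain HTML only - JavaScript-rendered content won't show up here
    text = " ".join(soup.get_text(separator=" ").split())
    return title, text

def chunk_words(text, word_limit):
    """Split text into chunks of at most word_limit words (cf. the wordLimit field)."""
    words = text.split()
    return [" ".join(words[i:i + word_limit]) for i in range(0, len(words), word_limit)]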
Suggested Bubble Implementation
- When user sends message with web browsing enabled, trigger custom workflow
- Use GPT-3.5 to suggest a search query for the user’s message
- Use a SERP API to search the internet using that search query
- Make a /browse API request with the top n URLs in the SERP API’s results :merged with user’s message:extract with Regex:format as text (Content: This text:formatted as JSON-safe, Delimiter: ,). The “extract with Regex” can be a regex expression that identifies URLs in the user’s own message, e.g. ([\w+]+\:\/\/)?([\w\d-]+\.)*[\w-]+[\.\:]\w+([\/\?\=\&\#\.]?[\w-]+)*\/? does a pretty good job of extracting most URLs. So, this sends the URLs found via the SERP API, plus any URLs the user provided in their message.
- Make namespace the Current user’s unique ID. A namespace is like a folder in a Pinecone index, so ensure that when you upsert to or query Pinecone, you only do so with namespace = Current user’s unique ID.
- message is the message used to query the Pinecone index after we upsert it.
- After receiving the response, save all of the vector IDs to your database in some way so you can delete them later if needed (I schedule ‘create browsing memory’ on a list (the urlData), which creates a Memory ‘thing’ for each URL that the user can manage).
- Insert the most similar returned matches into the prompt with the content, title, and URL for each one (including the title and URL means the model can ‘credit’ the source in its response, since it knows where each chunk of text came from).
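For that last step, here’s a rough sketch of how the returned matches could be formatted into prompt context. It’s written as Python for clarity; in Bubble you’d build the same string with :format as text.
def build_context(query_results, max_matches=3):
    """Format the top Pinecone matches so the model can credit its sources."""
    blocks = []
    for match in query_results[:max_matches]:
        meta = match["metadata"]
        blocks.append(
            "Source: " + meta.get("title", "") + " (" + meta.get("url", "") + ")\n"
            + meta.get("content", "")
        )
    return "\n\n".join(blocks)

# Prepend the result to the user's message before sending the ChatGPT request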
Wow, you got this far! Thanks for reading the docs. The download link is here.