Hey Bubblers! 
We all love OpenAI, but let’s be honest: paying API costs just to check if a review is “Positive” or “Negative,” or to summarize a short text, adds up quickly. Plus, sending sensitive customer data to third-party clouds can be a privacy nightmare.
I built NoCoddo AI to solve this.
It runs powerful Machine Learning models (Transformers) directly in your user’s browser via WebAssembly.
Why use this over API calls?
- Zero Cost: You can process 1 million items, and it costs $0. The “brain” is the user’s device.
- 100% Private: Data never leaves the browser. Perfect for GDPR/HIPAA-compliant apps.
- Offline Capable: Works even if the user loses internet connection.
What can it do?
- Sentiment Analysis: Auto-tag support tickets as Positive/Negative.
- Summarization: Condense long articles into snippets.
- Zero-Shot Classification (Pro): Categorize text into your own custom labels (e.g., “Spam”, “Lead”, “Urgent”) without training.
- Text Generation: A lightweight chat model for basic tasks.
- Feature Extraction: Generate vector embeddings for semantic search directly in the browser.
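For anyone curious what this looks like under the hood: the plugin is built on Transformers.js, and plain browser-side inference is only a few lines. This is a rough sketch, not the plugin’s actual code, and the model ID shown is the public Xenova ONNX build rather than necessarily the plugin’s default:

```ts
// Minimal sketch of browser-side inference with Transformers.js.
import { pipeline } from '@xenova/transformers';

// Sentiment: returns e.g. [{ label: 'POSITIVE', score: 0.99 }]
const sentiment = await pipeline('sentiment-analysis');
console.log(await sentiment('The support team fixed my issue in minutes!'));

// Zero-shot: classify into your own labels, no training needed
const zeroShot = await pipeline('zero-shot-classification', 'Xenova/bart-large-mnli');
console.log(await zeroShot(
  'Please call me back about upgrading my plan',
  ['Spam', 'Lead', 'Urgent']
));
```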
Check the Demo & Plugin: Free Version
Check the Demo & Plugin (Pro): Pro Version
I’d love to hear your thoughts! If you have specific models you’d like to see added, let me know below. 
You should share some inference test results of a sample workload. Include the browser and hardware specs.
WASM performance varies a lot depending on hardware. For the Pro version you should allow some choice of models.
Curious, why pick those models?
Thanks for the feedback! Since this plugin runs entirely client-side via WASM (using Transformers.js), you’re absolutely right that hardware plays a huge role.
Here are the details on our choices and some initial benchmarks:
1. Why pick those specific models? The primary criteria for the “Default” models were the size-to-performance ratio and RAM usage. Since we are running in the browser, downloading a 4GB Llama model isn’t a viable user experience for most people.
We specifically chose Quantized and Distilled versions (hosted by Xenova) to ensure they load quickly even on average connections and don’t crash the browser tab:
- Sentiment: DistilBERT (standard, very lightweight).
- Summarization: DistilBART (much smaller than the full BART, good enough for paragraphs).
- Text Gen: LaMini-Flan-T5 (at ~248M params, it’s one of the few generative models that runs smoothly in-browser without massive lag).
- Embeddings: all-MiniLM-L6-v2 (the gold standard for speed/quality balance in semantic search).
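For context on the embeddings choice, here’s a minimal sketch of browser-side semantic search with all-MiniLM-L6-v2 via Transformers.js. The embed and cosineSimilarity helpers are written here purely for illustration; they are not functions the plugin exposes:

```ts
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pooled, normalized sentence embedding (384 dimensions for MiniLM-L6)
const embed = async (text: string): Promise<number[]> => {
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
};

// With normalized vectors, cosine similarity reduces to a dot product
const cosineSimilarity = (a: number[], b: number[]) =>
  a.reduce((sum, v, i) => sum + v * b[i], 0);

const query = await embed('refund my order');
const doc = await embed('How do I get my money back?');
console.log(cosineSimilarity(query, doc)); // higher = more related
```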
2. Initial benchmarks. Run on a MacBook Air 2019 (Intel Core i5, 8GB RAM) to represent a standard/lower-end device baseline:
- Initialization (cold start): ~5-8 seconds (WASM compilation is slower on older Intel chips).
- Sentiment Analysis (short sentence): ~150-200 ms (still feels instant to the user).
- Feature Extraction (embeddings): ~400-600 ms per standard paragraph.
- Summarization (500-word input): ~10-15 seconds (this task is CPU-intensive; older dual-core CPUs will throttle here).
- Text Generation (LaMini): noticeably slower. Initial latency ~2 s, then roughly 2-4 tokens/sec. Usable for short outputs, but requires patience for long texts.
Rough workload estimates on the same MacBook:
- Sentiment Analysis / Zero-Shot: 500 to 1000 records.
- Summarization / Text Generation: 20 to 50 records.
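If you want to reproduce these numbers on your own hardware, timing it yourself is straightforward. This is a rough harness rather than anything from the plugin, and the cold start will include the one-time model download unless the files are already cached:

```ts
import { pipeline } from '@xenova/transformers';

// Cold start: WASM setup + model load (+ download on the very first run)
const t0 = performance.now();
const classifier = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);
console.log(`Cold start: ${(performance.now() - t0).toFixed(0)} ms`);

// Warm inference on a short input
const t1 = performance.now();
await classifier('Short test sentence for latency measurement.');
console.log(`Inference: ${(performance.now() - t1).toFixed(0)} ms`);
```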
3. Model variation. You make a great point about allowing variation. For this V1, we hardcoded the models to ensure plug-and-play stability for no-code users who might not know which Hugging Face models are ONNX-compatible.
However, opening up a “Custom Model ID” field where users can input any Xenova-compatible model ID is definitely on our roadmap for the next update!
Thanks again for the insight!
V2 is here!
Now you can choose which AI model performs each task!
Where to find compatible models?
You cannot use just any model (like standard GPT or Llama). You need models optimized for the web (ONNX format) supported by Transformers.js.
- Go to the Hugging Face Hub: Models – Hugging Face
- Filter for Transformers.js: Search for models tagged transformers.js, or look at the Xenova collection (highly recommended).
- Copy the Model ID: Copy the text that looks like User/ModelName. Example: Xenova/distilbert-base-uncased
Recommended Custom Models by Task
If you want to change the default behavior, here are the best tested models for each task supported by this plugin.
- Sentiment Analysis: Xenova/bert-base-multilingual-uncased-sentiment
- Summarization: Xenova/bart-large-cnn
- Text Generation: Xenova/flan-t5-base
- Zero-Shot Classif.: Xenova/bart-large-mnli
- Q&A English: Xenova/roberta-base-squad2
- Q&A Multilingual: Xenova/xlm-roberta-base-squad2
- Feature Extraction: Xenova/paraphrase-multilingual-MiniLM-L12-v2
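To make this concrete, here’s roughly what happens under the hood when a custom Model ID is pointed at a task via Transformers.js. The plugin’s own field names may differ; this sketch just shows the underlying library calls with two of the models listed above:

```ts
import { pipeline } from '@xenova/transformers';

// Multilingual sentiment (returns 1-5 star labels instead of POSITIVE/NEGATIVE)
const sentiment = await pipeline(
  'sentiment-analysis',
  'Xenova/bert-base-multilingual-uncased-sentiment'
);
console.log(await sentiment('Ce produit est excellent !'));

// English Q&A over a passage you provide
const qa = await pipeline('question-answering', 'Xenova/roberta-base-squad2');
console.log(await qa(
  'Where does the model run?',
  'NoCoddo AI runs Transformer models directly in the browser via WebAssembly.'
));
```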
Important Performance Warning
Size Matters: Since these models run in the browser, the user must download them once.
- Small models (<100 MB): fast load, good for mobile.
- Large models (>500 MB): high accuracy, but may take 10-30 seconds to initialize on the first run and might crash on older mobile devices.
Always test the Custom Model ID on your target device before deploying.
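One practical tip for that first-run download: Transformers.js accepts a progress_callback option when loading a pipeline, so you can surface a loading indicator instead of a frozen page. A minimal sketch (the status element and its ID are hypothetical, and the exact shape of the progress events can vary by library version):

```ts
import { pipeline } from '@xenova/transformers';

const status = document.getElementById('model-status');

const summarizer = await pipeline('summarization', 'Xenova/bart-large-cnn', {
  // Called repeatedly while model files are fetched and initialized
  progress_callback: (info: any) => {
    if (status && info.status === 'progress') {
      status.textContent = `Downloading ${info.file}: ${Math.round(info.progress)}%`;
    }
  },
});
```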
Cheers!
Thank you @ihsanzainal84 for the suggestion!