Hey Bubblers! 
We all love OpenAI, but let’s be honest: paying API costs just to check if a review is “Positive” or “Negative,” or to summarize a short text, adds up quickly. Plus, sending sensitive customer data to third-party clouds can be a privacy nightmare.
I built NoCoddo AI to solve this.
It runs powerful Machine Learning models (Transformers) directly in your user’s browser via WebAssembly.
Why use this over API calls?
- Zero Cost: You can process 1 million items, and it costs $0. The “brain” is the user’s device.
- 100% Private: Data never leaves the browser. Perfect for GDPR/HIPAA-compliant apps.
- Offline Capable: Works even if the user loses internet connection.
What can it do?
- Sentiment Analysis: Auto-tag support tickets as Positive/Negative.
- Summarization: Condense long articles into snippets.
- Zero-Shot Classification (Pro): Categorize text into your own custom labels (e.g., “Spam”, “Lead”, “Urgent”) without training.
- Text Generation: A lightweight chat model for basic tasks.
- Feature Extraction: Generate vector embeddings for semantic search directly in the browser.
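For anyone curious what this looks like under the hood: the plugin is built on Transformers.js, and plain browser-side inference is only a few lines. This is a rough sketch, not the plugin’s actual code, and the model ID shown is the public Xenova ONNX build rather than necessarily the plugin’s default:

```ts
// Minimal sketch of browser-side inference with Transformers.js.
import { pipeline } from '@xenova/transformers';

// Sentiment: returns e.g. [{ label: 'POSITIVE', score: 0.99 }]
const sentiment = await pipeline('sentiment-analysis');
console.log(await sentiment('The support team fixed my issue in minutes!'));

// Zero-shot: classify into your own labels, no training needed
const zeroShot = await pipeline('zero-shot-classification', 'Xenova/bart-large-mnli');
console.log(await zeroShot(
  'Please call me back about upgrading my plan',
  ['Spam', 'Lead', 'Urgent']
));
```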
Check the Demo & Plugin: Free Version
Check the Demo & Plugin (Pro): Pro Version
I’d love to hear your thoughts! If you have specific models you’d like to see added, let me know below. 
You should share some inference test results of a sample workload. Include the browser and hardware specs.
WASM performance varies a lot depending on hardware. For the Pro version you should allow some choice of models.
Curious, why pick those models?
Thanks for the feedback! Since this plugin runs entirely client-side via WASM (using Transformers.js), you’re absolutely right that hardware plays a huge role.
Here are the details on our choices and some initial benchmarks:
1. Why pick those specific models? The primary criteria for the “Default” models were the size-to-performance ratio and RAM usage. Since we are running in the browser, downloading a 4GB Llama model isn’t a viable user experience for most people.
We specifically chose Quantized and Distilled versions (hosted by Xenova) to ensure they load quickly even on average connections and don’t crash the browser tab:
- Sentiment: DistilBERT (standard, very lightweight).
- Summarization: DistilBART (much smaller than the full BART, good enough for paragraphs).
- Text Gen: LaMini-Flan-T5 (at ~248M params, it’s one of the few generative models that runs smoothly in-browser without massive lag).
- Embeddings: all-MiniLM-L6-v2 (the gold standard for speed/quality balance in semantic search).
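For context on the embeddings choice, here’s a minimal sketch of browser-side semantic search with all-MiniLM-L6-v2 via Transformers.js. The embed and cosineSimilarity helpers are written here purely for illustration; they are not functions the plugin exposes:

```ts
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pooled, normalized sentence embedding (384 dimensions for MiniLM-L6)
const embed = async (text: string): Promise<number[]> => {
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
};

// With normalized vectors, cosine similarity reduces to a dot product
const cosineSimilarity = (a: number[], b: number[]) =>
  a.reduce((sum, v, i) => sum + v * b[i], 0);

const query = await embed('refund my order');
const doc = await embed('How do I get my money back?');
console.log(cosineSimilarity(query, doc)); // higher = more related
```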
2. Initial benchmarks. Run on a MacBook Air 2019 (Intel Core i5, 8GB RAM) to represent a standard/lower-end device baseline:
- Initialization (cold start): ~5-8 seconds (WASM compilation is slower on older Intel chips).
- Sentiment Analysis (short sentence): ~150-200 ms (still feels instant to the user).
- Feature Extraction (embeddings): ~400-600 ms per standard paragraph.
- Summarization (500-word input): ~10-15 seconds (this task is CPU-intensive; older dual-core CPUs will throttle here).
- Text Generation (LaMini): noticeably slower. Initial latency ~2 s, then roughly 2-4 tokens/sec. Usable for short outputs, but requires patience for long texts.
Rough workload estimates on the same MacBook:
- Sentiment Analysis / Zero-Shot: 500 to 1000 records.
- Summarization / Text Generation: 20 to 50 records.
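If you want to reproduce these numbers on your own hardware, timing it yourself is straightforward. This is a rough harness rather than anything from the plugin, and the cold start will include the one-time model download unless the files are already cached:

```ts
import { pipeline } from '@xenova/transformers';

// Cold start: WASM setup + model load (+ download on the very first run)
const t0 = performance.now();
const classifier = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);
console.log(`Cold start: ${(performance.now() - t0).toFixed(0)} ms`);

// Warm inference on a short input
const t1 = performance.now();
await classifier('Short test sentence for latency measurement.');
console.log(`Inference: ${(performance.now() - t1).toFixed(0)} ms`);
```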
3. Model variation. You make a great point about allowing variation. For this V1, we hardcoded the models to ensure plug-and-play stability for no-code users who might not know which Hugging Face models are ONNX-compatible.
However, opening up a “Custom Model ID” field where users can input any Xenova-compatible model ID is definitely on our roadmap for the next update!
Thanks again for the insight!
V2 is here!
Now you can choose which AI model performs each task!
Where to find compatible models?
You cannot use just any model (like standard GPT or Llama). You need models optimized for the web (ONNX format) supported by Transformers.js.
- Go to the Hugging Face Hub: Models – Hugging Face
- Filter for Transformers.js: Search for models tagged transformers.js, or look at the Xenova collection (highly recommended).
- Copy the Model ID: Copy the text that looks like User/ModelName. Example: Xenova/distilbert-base-uncased
Recommended Custom Models by Task
If you want to change the default behavior, here are the best tested models for each task supported by this plugin.
- Sentiment Analysis: Xenova/bert-base-multilingual-uncased-sentiment
- Summarization: Xenova/bart-large-cnn
- Text Generation: Xenova/flan-t5-base
- Zero-Shot Classif.: Xenova/bart-large-mnli
- Q&A English: Xenova/roberta-base-squad2
- Q&A Multilingual: Xenova/xlm-roberta-base-squad2
- Feature Extraction: Xenova/paraphrase-multilingual-MiniLM-L12-v2
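To make this concrete, here’s roughly what happens under the hood when a custom Model ID is pointed at a task via Transformers.js. The plugin’s own field names may differ; this sketch just shows the underlying library calls with two of the models listed above:

```ts
import { pipeline } from '@xenova/transformers';

// Multilingual sentiment (returns 1-5 star labels instead of POSITIVE/NEGATIVE)
const sentiment = await pipeline(
  'sentiment-analysis',
  'Xenova/bert-base-multilingual-uncased-sentiment'
);
console.log(await sentiment('Ce produit est excellent !'));

// English Q&A over a passage you provide
const qa = await pipeline('question-answering', 'Xenova/roberta-base-squad2');
console.log(await qa(
  'Where does the model run?',
  'NoCoddo AI runs Transformer models directly in the browser via WebAssembly.'
));
```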
Important Performance Warning
Size Matters: Since these models run in the browser, the user must download them once.
- Small models (<100 MB): fast load, good for mobile.
- Large models (>500 MB): high accuracy, but may take 10-30 seconds to initialize on the first run and might crash on older mobile devices.
Always test the Custom Model ID on your target device before deploying.
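One practical tip for that first-run download: Transformers.js accepts a progress_callback option when loading a pipeline, so you can surface a loading indicator instead of a frozen page. A minimal sketch (the status element and its ID are hypothetical, and the exact shape of the progress events can vary by library version):

```ts
import { pipeline } from '@xenova/transformers';

const status = document.getElementById('model-status');

const summarizer = await pipeline('summarization', 'Xenova/bart-large-cnn', {
  // Called repeatedly while model files are fetched and initialized
  progress_callback: (info: any) => {
    if (status && info.status === 'progress') {
      status.textContent = `Downloading ${info.file}: ${Math.round(info.progress)}%`;
    }
  },
});
```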
Cheers!
Thank you @ihsanzainal84 for the suggestion!