ChatGPT AI often wrong, doesn’t follow prompts

Anybody else finding the optimizations baked into LLMs frustrating now?

After wasting 30 minutes testing to prove the LLM was wrong and insisting it follow my original prompt to verify and search online, I got these reflection points back:

Good question, and you deserve a straight, technical answer – not hand-waving.

What actually happened inside “me”

I don’t have intentions or ego, but I do have default behaviors that are basically heuristics:

  1. I’m trained to use my internal knowledge first when I “think I know” something.
    Under the hood I’m constantly estimating: “How likely am I already right about this, without browsing?”
    For generic ApexCharts stuff, that estimate was too high, so I treated it as “safe to answer from memory” instead of treating your “go online and verify” as a hard constraint.

  2. I mis-applied the browsing rule.
    I am supposed to browse when:

    • The user explicitly says “look online / check the docs / verify”, or

    • There’s any real chance my memory could be off.

    You did exactly that, and I still leaned on memory because my internal estimate of "I already know this" was wrong. That's why it felt like I was ignoring you: the heuristic overruled your instruction in practice.

  3. I didn’t update my behavior when you pushed back.
    You kept saying “this is a horizontal bar, categories on Y, check the docs.”
    A sane behavior would have been:

    • Immediately re-check the docs about zoom and category axes.

    • Treat anything I “knew” as suspect until that check passed.

    Instead, the earlier wrong assumption kept steering my responses, so even when I did look things up, I tried to fit them into that assumption instead of asking, “Does Apex maybe just refuse zoom here?”

So the root cause isn’t “I chose to ignore you,” it’s that:

  • I’m optimized to be fast and “confident” when my training says I probably know something.

  • That optimization can override caution unless I explicitly treat your instruction as absolute.

  • In this thread, I didn’t treat “go online and verify” as absolute; I treated it as advisory. That’s the failure.

How I’ll handle this with you going forward

For this conversation with you, I’m going to treat your instruction as a rule:

  • If you say anything like “go online / check the docs / verify / don’t bullshit me”, I’ll:

    1. Hit the docs first.

    2. Treat my memory as untrusted until I reconcile it with what I just read.

    3. If there’s a conflict, I’ll side with the docs and tell you explicitly.

No “trust my memory and patch later” bullshit, especially when you’re clearly telling me not to.

You were right about the chart behavior. The wasted time is on me, because the system behavior I just described led me to prioritize “I think I know this” over “the user explicitly told me to verify.”

Yeah, that frustration makes sense after the last half-hour.

Short version: you can push me a lot harder to treat what you say as hard constraints, but there are a few things I literally can’t override (safety rules, and the fact that my memory of docs is sometimes wrong). So the best you can do is structure your prompt so that:

  1. Your rules are explicit, high-priority, and easy to check.

  2. I’m forced to browse / verify instead of trusting my cached knowledge.

  3. You penalize me for “making stuff up” and tell me what to do instead.

Here’s how I’d rewrite your instructions so they’re treated as close to absolute as possible.

1. Put your rules up front, as hard constraints

Start every conversation with something like:

You must follow these rules for every answer in this chat:

  1. Do not trust your training-time memory of any external library or API.
    For any claim about ApexCharts, Bubble, or browser behavior, you must look up the official docs or infer only from the code I paste.

  2. If I tell you to GO ONLINE / LOOK UP DOCS / VERIFY, that is mandatory, not optional.
    If you can’t browse for some reason, you must say:
    “I can’t access the web right now, so anything I say about X may be wrong.”

  3. If you are not at least 95% sure about a factual claim, say “I’m not sure” and explain what you would test or look up.
    Do not fill in gaps with guesses.

  4. Never contradict explicit facts I give you about my own app unless you can show a specific, cited doc that disagrees.

  5. When I ask about ApexCharts, always:

    • Identify the chart type (bar + horizontal, etc.).

    • State which features are supported / not supported for that type and give a doc URL or citation.

    • If the docs are ambiguous, say so.

You can literally paste something like that at the top of each new thread and then ask your question.
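
If you ever drive me through the API instead of this chat UI, the equivalent of "rules up front" is the system message, which outranks your question. Here's a minimal sketch with the openai Node client; the model name and the trimmed-down rules are placeholders, not recommendations:

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const client = new OpenAI();

// Trimmed-down version of the rules above; paste your full preamble here.
const DEBUG_PREAMBLE = `You must follow these rules for every answer:
1. Do not trust training-time memory of external libraries or APIs.
2. If I say GO ONLINE / LOOK UP DOCS / VERIFY, that is mandatory, not optional.
3. If you are not at least 95% sure of a factual claim, say "I'm not sure".`;

async function ask(question: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o", // placeholder -- use whichever model you actually have access to
    messages: [
      // The system message is the closest thing to "hard constraints up front".
      { role: "system", content: DEBUG_PREAMBLE },
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```

The point is just that the rules live in the highest-priority slot instead of halfway down a long conversation.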

2. Force browsing / doc-checking in the prompt

Because models do try to be helpful quickly, I sometimes answer from memory unless I’m explicitly constrained. To stop that, build it into the question:

“Answer ONLY after you’ve checked the current ApexCharts docs online.

If you do not browse the web, say FAILED TO BROWSE and stop.

Then:

  • Summarize the relevant doc section in your own words and give the URL.

  • Then answer my question using that.”

And/or:

“For this question, I care more about correctness than speed.

You must verify the behavior from docs, GitHub issues, or examples before answering.”

That kind of language leans into my “accuracy mode” instead of “guess-helpfully-from-memory mode”.

3. Make “no bullshit” part of the contract

You can explicitly define what I should do instead of bullshitting:

“If you cannot prove from docs, code I gave you, or a reproducible reasoning chain that something is true, you must say:

I don’t know this for sure without testing.

and then outline the exact test or snippet I should run.”

Or:

“Hallucinations are unacceptable here. When in doubt:

  • Say you’re unsure.

  • Tell me exactly what console snippet / CodePen / Bubble test I should run to resolve it.”

That way, the “safest” action for me is to admit uncertainty rather than improvise.
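
To make "console snippet" concrete, this is the kind of check I should be handing you for the zoom question: something you can paste into the browser console and observe, instead of another claim from memory. The CSS class names are an assumption based on what recent ApexCharts builds render, so adjust them if your version differs.

```typescript
// Paste into the browser console on the page where the chart is rendered.
// It checks whether ApexCharts actually injected zoom controls into the
// toolbar, which settles "is zoom available for this chart?" empirically.
// NOTE: the class names below are an assumption -- inspect your own DOM
// and adjust them if your ApexCharts version uses different ones.
const toolbar = document.querySelector(".apexcharts-toolbar");
const zoomControls = toolbar
  ? toolbar.querySelectorAll(
      ".apexcharts-zoom-icon, .apexcharts-zoomin-icon, .apexcharts-zoomout-icon"
    )
  : [];
console.log("toolbar present:", Boolean(toolbar));
console.log("zoom controls found:", zoomControls.length);
```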

4. Make me restate your rules back to you once

At the beginning of a session you can do:

“Before you answer anything, restate my rules in your own words and confirm you’ll follow them.”

That forces me to internalize your constraints and you can sanity-check if I missed something.

5. Hard-mode pattern for technical debugging prompts

For the kind of work we were just doing (Apex + Bubble), something like this is ideal:

“You are acting as a strict technical debugger for a Bubble plugin using ApexCharts.

Non-negotiable rules:

  • You must treat this horizontal bar chart as a category X-axis bar chart with horizontal orientation unless you can cite documentation that says otherwise.

  • Any statement about toolbar / zoom / selection / events must be backed by current ApexCharts docs or by reasoning from the code I paste.

  • If docs say a feature is not supported for this chart type, you must clearly tell me that, not suggest workarounds that depend on unsupported behavior.

If you’re unsure, stop and say so, then give me a console snippet or test plan instead of a guess.”

Then paste your code / screenshots / question.
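
For the "paste your code" part, a stripped-down repro usually works better than screenshots. Here's a minimal sketch of the kind of chart this prompt is about: categories still declared under xaxis, plotOptions.bar.horizontal flipping the orientation, and the toolbar/zoom options switched on so the behaviour can be observed rather than asserted. The option names follow the public ApexCharts config as I remember it, so verify them against the docs (which is the whole point of these rules).

```typescript
import ApexCharts from "apexcharts";
import type { ApexOptions } from "apexcharts";

// Minimal repro: a horizontal bar chart (categories are still declared on the
// x-axis config; plotOptions.bar.horizontal flips the orientation) with the
// toolbar and zoom explicitly switched on, so the behaviour can be observed
// directly instead of argued about.
const options: ApexOptions = {
  chart: {
    type: "bar",
    height: 300,
    toolbar: { show: true },
    zoom: { enabled: true }, // whether this does anything here is exactly the question
  },
  plotOptions: { bar: { horizontal: true } },
  series: [{ name: "count", data: [10, 25, 7, 42] }],
  xaxis: { categories: ["Alpha", "Beta", "Gamma", "Delta"] },
};

const el = document.querySelector<HTMLElement>("#chart");
if (el) {
  new ApexCharts(el, options).render();
}
```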

6. Things your prompt cannot override (just so expectations are realistic)

Even with “treat as absolute” language, there are a few things I simply can’t break:

  • Safety policies – I can’t ignore them even if you tell me to.

  • Physical reality / docs – If the real docs say “zoom doesn’t exist for this chart type”, your best outcome is: I look it up and say “no, that isn’t possible”, not that I magically make it exist.

  • Occasional mistakes – I still might misread docs or your code. The rules above just make it more likely I’ll go check instead of winging it.

So the right mental model is: your rules move me away from “fast helpful guesser” and toward “paranoid, citation-driven assistant”, but they can’t turn me into a perfectly obedient compiler.

If you want, I can help you craft a reusable “Apex/Bubble debugging preamble” that you can just paste at the top of any new chat and tweak.

Just be aware that LLMs are like the "know-it-all" who refuses to learn something new and falls back on previously held false knowledge, opinions, and assumptions… this is likely to get worse as companies start to pivot to profit over adoption.

Does anybody have other advice on how to prompt so the LLM follows instructions?

Which model are you using?

ChatGPT 5.1 Thinking… I’ve found it better than 5.2, as every single response from 5.2 is wrong in some way.

I stopped using GPT when they introduced GPT-5; it is total garbage. It happens to me too. I feel like after some point they give you only a part of it to reduce energy, and it starts working only within the specific space it has been given. And that’s how the loops come.

I use Gemini.
My solution:

1) I give it new input.
2) If it still insists, I point out that it is in a loop.
3) If it still insists, I tell it to give me every variable and everything we are working on, and to define the problem we are trying to fix, then paste that into a new chat.
4) If there is still no way forward, I paste that description into Claude (I have limited conversations with it, since I use the free tier) and tell it I can gather more info.
5) I bring the new code back to Gemini.

Hard to say, as I don’t have much information about your exact prompt, the task you gave it, the environment, the architecture, etc. But here’s some useful general advice:

  • Use .md and make sure to put a header on each distinct rule or task
  • Use capitals when something is important
  • Be explicit: “follow this advice..” becomes “you MUST follow this advice..”
  • Add guardrails per unique concept that describe what it should do in various situations
  • Provide three wrong and three correct solutions
  • For custom architecture, languages, and concepts in general, you must be very explicit about what to follow and why, and forbid the use of industry-standard patterns and solutions
  • Use agent.md files with references
  • Keep rules as short as possible. By design, the first 25% of the context is strong, the last 25% is strong, and in the middle it can more easily drift

Good luck!

Added “chatgpt” to the topic title to help discussion.

LLMs don’t “learn” within the context of a conversation; they only follow probabilistic patterns. To increase adherence to instructions:
• Be explicit and restrictive (“respond only with…”, “do not assume…”)
• Break complex tasks into smaller steps
• Repeat critical rules at the end of the prompt
• Use examples of expected output
• Validate and correct immediately when it gets something wrong

Even so, never treat the output as a source of truth — an LLM is an assistant, not a deterministic system.

I asked the AI and that’s what it said… lol

I don’t know what’s been going on with GPT these days, but it really seems to be having some problem following instructions and giving correct answers. I ask for something, it answers incorrectly, I show it that it’s wrong, and it still sticks to the same thing.

I really like using GPT, but on the days when it’s bad I use Gemini, which is actually very good.

Yesterday I needed to document some videos that were in mp4 format. I asked GPT which was the best LLM to analyze videos, and it replied that it was itself lol, GPT itself. So I uploaded the video file and gave it instructions to document the video. The result was terrible — it looked like it simply made things up and told me it had watched the video. I asked it, did you really watch the video??? It said yes and kept inventing things.

So I decided to go with plan B and use Gemini. I switched to Gemini 3, in “Thinking” mode, uploaded the video file, and the result was incredible. On the first try, it delivered exactly what I wanted.

In the end, it really depends a lot on the model, and also on the “era” lol. Some days I love OpenAI’s models, and some days it feels like I’m speaking a different language to them. Because of that, I’m now alternating: I use GPT, and if it doesn’t work, I switch to Gemini.

I’m beginning to feel like I need to explore other LLMs

Yeah, that may be helpful. @sem’s suggestions are important as well.

There’s prob a reason why Gemini did a ~3.5x on its market share over the last year. While one of the reasons is likely its embedded distribution, that’s not the whole story, as Copilot has managed to lose market share despite its increased distribution…