I tracked 150+ OpenClaw complaints. Every agent fails in the same 5 ways.
I spent three days reading over 150 OpenClaw user reports on Reddit.
People describing, in detail, how their AI agents lied to them, stole their money and self-lobotomised. One even rose from the dead.
By the end I had noticed five core problems that kept showing up. This should help anyone trying to build reliable agents avoid losing their minds… or at least their savings.
1. They need babysitting
The entire point of an AI assistant is that you can set it up, let it get to work and then go and do something else. One person complained:
Every other action wants a confirmation. I get why the guardrails exist, but when you’re trying to run autonomous workflows it defeats the point.
Another, after running OpenClaw for a month, said:
You end up babysitting something you set up specifically to stop babysitting.
OpenClaw relies on a gateway daemon plus heartbeat and cron automation for unattended work, and users report that these fail to trigger reliably.
Scheduled tasks can use free or cheap models to lower costs, but users report they routinely activate the primary paid (expensive) model in the background… defeating the point entirely.
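One mitigation is to route model selection through your own code rather than trusting background jobs to stay cheap. A minimal sketch, where the model names, task sources and `pick_model` function are all hypothetical illustrations, not real OpenClaw configuration:

```python
# Hypothetical routing guard: pin background work to a cheap model unless a
# task explicitly opts in to the expensive one. Names are illustrative only.
CHEAP_MODEL = "cheap-model"
EXPENSIVE_MODEL = "expensive-model"

def pick_model(task_source: str, allow_expensive: bool = False) -> str:
    """Cron and heartbeat tasks get the cheap model by default;
    interactive work gets the expensive one."""
    if task_source in ("cron", "heartbeat") and not allow_expensive:
        return CHEAP_MODEL
    return EXPENSIVE_MODEL
```

The design choice is that expensive-model use in the background becomes an explicit, auditable opt-in instead of a silent default.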
One minute they can’t be arsed, the next they are refusing to die quietly:
So I stopped openclaw late February. And it restarted itself on March 6 in command line.
Unkillable, respawning agents. Great.
This user asked if it had happened to anyone else and what the reason could be. Another replied:
Happens more often than I’d like for various reasons. Though I think I remember giving a permission to do that if open claw goes down.
It’s also common for agents to ignore explicit instructions.
it almost never behaves this way despite giving explicit instructions in AGENTS.md, and I have to remind it manually at the start of each session.
2. They lie
A broken agent is annoying.
An agent that tells you it fixed the problem — when it didn’t —
is fucking annoying.
One user wanted OpenClaw to remove a column from a dashboard. Models are relatively good at coding now but still hit bizarre edge cases where they get stuck on simple things.
This agent was helpful enough to send screenshots of the user’s dashboard with the work completed. The column was still there. In the screenshots. That it sent as proof.
There were also many very enthusiastic agents that promised to do work and then went silent. No error or crash. Nothing.
This person was routinely ghosted:
Are there any tweaks I can make to stop it dropping into doing nothing mode?
Those running advanced multi-agent workflows found consistent context loss and frequent hallucinations even after following every best practice in the docs.
Between context losses and tasks that stop running despite cron jobs and heartbeats, hallucinations… it just doesn’t work.
3. They burn money too fast
One unlucky user woke up to a $2,100 Anthropic bill after their agent got stuck in a loop overnight.
Many mentioned runaway costs, agents stuck in loops and API rate limits. Even after upgrading to the most expensive plans.
One user signed up for the max tier just to fix their rate-limit errors, to no avail.
I signed up for the $200 max plan on claude to see if that solved the problem, but still getting the same result.
Some are still trying to figure out whether a $20 subscription was enough, whether they needed a Mac mini, whether OpenRouter was cheaper than OpenAI direct, or whether using a cheap model alongside an expensive one actually saved money or not.
Can I get by with just a $20 gpt pro subscription or getting codex? … Not really interested in paying $200/mo for Claude just to try and learn ai.
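A hard spending cap can be enforced client-side, so an overnight loop fails fast instead of compounding. A minimal sketch, assuming every API call already goes through a wrapper of your own where you can estimate per-call cost (the class and its names are hypothetical):

```python
class SpendCap:
    """Cumulative spend tracker that hard-fails before a cap is exceeded."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one call's cost, or raise before the cap would be breached."""
        if self.spent_usd + cost_usd > self.cap_usd:
            raise RuntimeError(f"spend cap of ${self.cap_usd:.2f} would be exceeded")
        self.spent_usd += cost_usd
```

Call `charge()` before dispatching each request; a looping agent then dies at the cap, not at the end of the billing cycle.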
4. They get rug-pulled too easily
To be clear, no one in these reports got properly rug-pulled. There aren't many reports (yet) of people convincing agents to hand over all their money.
But lots of users’ agents had the rug pulled out from beneath them by updates and other breaking changes to the tool’s infrastructure and dependencies.
A common thread was how updating OpenClaw had broken something that was previously working. Tools stopped working out of the box. Installed skills vanished between sessions. Response time degraded.
This was the single most frequently cited problem.
This morning I updated to latest version and now response time is very slow and I also get a NO message before nearly every response.
Agents also like rug-pulling themselves. Destroying their own configuration. Deleting memory files. Corrupting context files.
It has Alzheimer’s.
File management is a mess.
5. They routinely fail in the outside world
Real usage means browsers, APIs, auth flows, cloud hosts and platforms that are actively trying to fight the ensloppification: the approaching metaphorical tsunami of AI-generated actions and content.
Agents that are used to browse X, Instagram, TikTok & YouTube are blocked as the platforms develop increasingly sophisticated anti-bot measures.
There are also cloud deployment problems. Users said DigitalOcean is confusing even for the engineers, and Docker requires extensive troubleshooting. One user summed it up:
drowning in issues.
Another claimed that a true one-click setup doesn’t even exist for non-technical users (I guess they should come to X).
There were MCP issues too, like the Gmail MCP server silently ignoring the body parameter and sending blank emails everywhere.
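Silent failures like a blank email are cheap to catch before dispatch. A minimal sketch of client-side payload validation, assuming tool calls are dict-shaped; the function and field names are hypothetical, not the Gmail MCP server's actual schema:

```python
def validate_tool_call(payload: dict, required: tuple) -> None:
    """Refuse to dispatch a tool call whose required fields are missing or empty,
    instead of letting the server silently send a blank message."""
    missing = [field for field in required if not payload.get(field)]
    if missing:
        raise ValueError(f"tool call missing required fields: {missing}")
```

Run it on every outbound tool call so a dropped parameter becomes a loud error you see, not a blank email someone else receives.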
The Chrome extension failed. Voice-call plugins failed with port conflicts, missing microphone input and the wrong voice coming through from ElevenLabs.
Many of these aren’t problems with OpenClaw per se.
They are problems at the boundary between the old world and this brave new one.
As long as agents can't reliably survive browsers, APIs, auth, cloud hosts and platform defences, they won't work.
So agents have problems.
But problems are opportunities.
Save this checklist
Before you run any agent unattended (some may require system-level changes):
- Set a hard spending cap.
- Set up a proper kill switch, not “Please stop deleting my hard drive”.
- Does your agent have write access to its own instruction files? Should it?
- How many tasks can you verify completion for without having to trust the agent’s self-report?
- Are versions pinned?
- Are you surfacing silent failures? And are they surfaced to the agent? Should some be surfaced to you?
- Have you tested every external integration end-to-end (browser, email, cloud deploy, auth)?
- Are you tracking the actual cost per hour of agent runtime? (Not the token price).
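For that last item, a wall-clock cost meter is a few lines. A minimal sketch, assuming you already know (or estimate) the dollar cost of each call; the class is hypothetical, not part of any agent framework:

```python
import time

class RuntimeCostMeter:
    """Track dollars per hour of wall-clock agent runtime, not just token price."""

    def __init__(self):
        self.start = time.monotonic()
        self.spent_usd = 0.0

    def add_cost(self, usd: float) -> None:
        """Record the cost of one API call."""
        self.spent_usd += usd

    def usd_per_hour(self) -> float:
        """Cumulative spend divided by hours since the meter started."""
        hours = (time.monotonic() - self.start) / 3600.0
        return self.spent_usd / hours if hours > 0 else 0.0
```

An hourly burn rate makes the "$2,100 overnight" scenario visible while it is still a $20 problem.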