Harness vs. OpenClaw: Two Very Different "Agents"

JK1982 
Created at May 30, 2026 02:28:40
Updated at May 30, 2026 02:30:20 


If you've been anywhere near AI Twitter lately, you've probably seen two open-source projects throwing off sparks: Harness (the openharness project, published on PyPI as harness-agent) and OpenClaw. Both call themselves "agents," both are open source, and both want access to your machine. But they're built for almost opposite jobs, and lumping them together does a disservice to anyone trying to pick one.


Here's an honest look at what each one actually is, how you'd install it, and how to think about which (if either) belongs in your workflow.

 

What is Harness?

Harness is a coding agent — a command-line tool plus a Python SDK that drives an LLM through a read-edit-run loop inside your codebase. You point it at a repo, give it a task ("fix the auth bug," "refactor this module," "run the tests and fix what breaks"), and it works through the problem using a set of built-in tools: reading and writing files, running shell commands, searching code, fetching web pages, and spawning sub-agents for parallel work.

Its main selling point is that it's model-agnostic. The same agent runs on Anthropic's Claude, OpenAI's GPT models, Google's Gemini, or a fully local model through Ollama — you switch with a single flag. That's a meaningful difference from agents that are welded to one provider.

A few features worth calling out: permission modes that range from "ask before every change" to a full auto-approve "bypass" mode for CI; context compaction that summarizes the conversation when you approach the model's limit; an MCP client for connecting external tools like Jira or Slack; and a skills system where you teach it workflows by dropping a Markdown file in a config folder. If that last part sounds familiar, it's because the design borrows heavily from the conventions popularized by tools like Claude Code.
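
To make the compaction idea concrete, here is a minimal sketch of the general technique, not Harness's actual implementation. The names estimate_tokens and compact, the four-characters-per-token heuristic, and the placeholder summary string are all my assumptions; a real agent would ask the model to write the summary.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def compact(messages: list[str], limit: int, keep_recent: int = 2) -> list[str]:
    """Collapse older messages into one summary line once the budget is exceeded."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages  # still under budget, or nothing old enough to fold away

    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Placeholder: a real agent would have the model summarize `old` here.
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent


history = ["x" * 400] * 5              # five messages of about 100 tokens each
print(len(compact(history, limit=200)))  # one summary plus the two newest messages
```

The trade-off is the usual one: the agent keeps working past the context limit, but whatever the summary leaves out is gone for the rest of the session.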

A note on the benchmark claims. The project's README states that Harness scores 100% on "Harness-Bench" and outperforms Claude Code, OpenCode, and pi-mono. Take this with appropriate salt: Harness-Bench is the project's own eight-task benchmark, and a tool topping the test its authors wrote isn't independent evidence of anything. The repo is also young and small at the time of writing. The architecture looks reasonable and the feature list is real — but treat "state of the art" as a marketing claim, not a measured fact, until third-party benchmarks (it does support running SWE-bench) confirm it.

 

 

What is OpenClaw?

OpenClaw is a personal assistant agent, and the framing is completely different. Instead of living in your terminal and editing code, it runs on your own machine (Mac, Windows, or Linux) and you talk to it through chat apps you already use — WhatsApp, Telegram, Discord, Slack, Signal, iMessage. You message it like a coworker and it does things: clears your inbox, manages your calendar, browses the web and fills out forms, runs shell commands, and remembers context across conversations.

It's also model-flexible (Claude, GPT, or local models) and heavily extensible through community-built "skills" and plugins, with the agent able to write its own. The project was created by Peter Steinberger and has had a genuinely viral run; its creator has since joined OpenAI while the project continues as open source.

The pitch, distilled: it's the "do things for me in the background" assistant that Siri was supposed to be, but open and running on hardware you control.

 

 

The real difference

The cleanest way to see it:

  • Harness is for builders working in a codebase. It's a developer tool. The output is committed code, fixed bugs, passing tests.
  • OpenClaw is for automating your personal and digital life. The output is a cleared inbox, a booked appointment, a daily briefing in your Telegram.

They overlap only at the edges — both can run shell commands, both use the MCP ecosystem, both let you define skills in Markdown. But choosing between them as if they compete is a bit like choosing between a power drill and a personal assistant. If you're shipping software, Harness. If you want an always-on agent handling your messages and errands, OpenClaw. Plenty of people in both communities run both, pointing OpenClaw at coding tools like Harness or Claude Code when they want code written from their phone.

 

 

Installing them (and a serious caveat first)

Both projects offer the same fashionable one-liner:

curl -fsSL <url>/install.sh | bash

I'd encourage you not to run either blindly. Piping a script straight from the internet into bash executes whatever is in it, with your permissions and no review. For tools this young, download the script and read it first, or use the package-manager path in an isolated environment.
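
One way to make "read it first" stick: once you've reviewed a downloaded copy, record its SHA-256 and refuse to run any later copy that doesn't match. This is a sketch of that habit, not something either project ships; the helper names sha256_file and matches_reviewed_copy are mine.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reviewed_copy(path: Path, reviewed_digest: str) -> bool:
    """True only if the file is byte-identical to the copy you reviewed."""
    return sha256_file(path) == reviewed_digest.lower()
```

In practice you'd save the script to disk, read it, note the digest, and run it only while matches_reviewed_copy still returns True. It won't tell you whether the script is safe, only that it's the same one you read.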

 

Harness, the safer way:

pip install harness-agent
harness connect          # pick a provider, paste your API key
harness "Fix the bug in auth.py"

It requires Python 3.12+. Your API key lands in ~/.harness/config.toml. Be especially careful with --permission bypass, which auto-approves every action including shell commands — convenient for CI, dangerous on a machine you care about.
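
Since that file holds an API key, it's worth checking that only your user can read it. A small sketch, assuming a POSIX system; the is_private helper is my own, and I'm not claiming anything about what permissions Harness sets by default.

```python
import stat
from pathlib import Path


def is_private(path: Path) -> bool:
    """True if neither group nor others have any permission on the file (POSIX)."""
    mode = path.stat().st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


config = Path.home() / ".harness" / "config.toml"
if config.exists() and not is_private(config):
    print(f"{config} is accessible to other users; consider: chmod 600 {config}")
```

If the file doesn't exist yet, the check simply does nothing.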

OpenClaw installs via npm (npm i -g openclaw, then openclaw onboard) and walks you through connecting a model and a chat app. Because it's designed to read your email, control your calendar, and act on your behalf, think hard about what you connect it to. The project's own blog has posts about where its security model is heading, which is a polite way of saying that part is still maturing. Granting an autonomous agent your credentials — or, as some enthusiastic early users have done, your credit card — is not something to do casually.

 

 

So which should you use?

Wrong question, most of the time. Decide what you're trying to automate:

  • Writing, fixing, reviewing, or refactoring code → look at Harness (and compare it honestly against Claude Code and OpenCode, which are more established).
  • Offloading email, calendar, errands, and digital chores → look at OpenClaw.

Both are early-stage, both are open source, and both ask for a lot of trust in exchange for their power. That trade-off — capability now, security still catching up — is the real story with this generation of agents, and it's worth keeping front of mind no matter which one you reach for.

 

 

What about token consumption?

This is the question everyone asks, and the honest answer is that the two tools can't be compared apples-to-apples on tokens — and you should be suspicious of anyone who claims otherwise.

The reason is simple: they do different work. Token usage for any agent is driven by the size of the task, how much context gets loaded, how much conversation history is replayed, and which model is running. A coding task in Harness ("refactor this module") and an assistant task in OpenClaw ("check my email") aren't the same unit of work, so putting their token counts side by side tells you nothing meaningful. Neither project publishes usage figures, and inventing a comparison table would be worse than useless.

 

What you can do is measure each tool's real usage yourself, within its own domain. Both expose the numbers:

Harness has built-in cost reporting. Inside the REPL, /cost shows token usage and cost for the current session, and /status shows the provider, model, session, and running cost. So for any task you give it, you can read the actual figure straight from the tool:

harness "Fix the failing tests in test_auth.py"
# ...agent works...
/cost      # → tokens used and dollar cost for this session

OpenClaw routes through your own API key, so its usage shows up in your provider's dashboard — the Anthropic Console or OpenAI usage page, depending on which model you connected. Run a task, then check the dashboard for the spend in that time window.

 

 

The "check my email" example

Worth being clear about: this example only runs on OpenClaw. Harness is a coding agent — its tools are file read/write/edit, shell, search, and web-fetch. It has no email integration, so there's nothing to measure on the Harness side of an inbox task.

On OpenClaw, a realistic way to measure it:

  1. From your chat app (say Telegram), message your assistant: "Check my inbox and summarize anything that needs a reply today."
  2. Let it run — it'll read your mail and respond in the chat.
  3. Open your provider dashboard and note the tokens consumed in that window.

What drives the number, in rough order of impact: how many emails it actually reads (and how long they are), how much of its persistent memory and prior conversation gets replayed into context, and the model you chose — a frontier model like Claude Opus or GPT-5.2 will cost far more per run than a local Ollama model, which is effectively free on tokens but trades off quality and speed.
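
To turn a dashboard token count into dollars, the arithmetic is just tokens times price. Here's a sketch; the prices below are made-up placeholders, not any provider's real rates, so substitute the figures from your provider's pricing page. The Pricing class and run_cost function are my own names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # dollars per million input tokens
    output_per_mtok: float  # dollars per million output tokens


def run_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Dollar cost of one run given token counts and per-million-token prices."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


# Placeholder prices for illustration only (assumption, not real rates).
HOSTED_PLACEHOLDER = Pricing(input_per_mtok=10.0, output_per_mtok=30.0)
LOCAL_MODEL = Pricing(input_per_mtok=0.0, output_per_mtok=0.0)

# An inbox run that read 50,000 tokens of mail and wrote a 2,000-token summary:
print(run_cost(50_000, 2_000, HOSTED_PLACEHOLDER))  # 0.56 at the placeholder prices
print(run_cost(50_000, 2_000, LOCAL_MODEL))         # 0.0
```

Note how the input side dominates: for inbox-style tasks, what the agent reads usually matters far more than what it writes.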

If you want a genuinely useful comparison, the right experiment is: pick one realistic task per tool's actual domain, run it on the same underlying model, and report the real /cost and dashboard figures. Real measured numbers from your own setup will be far more credible, and more interesting, than any generic claim.

 

 

My experience: OpenClaw runs up tokens fast

In my own use, OpenClaw has been a heavy token consumer. That tracks with how it's built: persistent memory and conversation history get replayed into context on each turn, and tasks like reading an inbox pull a lot of raw text in. If you're on a frontier model, that adds up quickly.
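
Why replay adds up so quickly: if the whole history is resent on every turn, the input tokens billed across a session grow roughly with the square of the number of turns. A back-of-the-envelope sketch, assuming no prompt caching or compaction and a constant number of new tokens per turn (both simplifications of mine, not measurements of OpenClaw):

```python
def cumulative_input_tokens(
    turns: int, tokens_per_turn: int, fixed_context: int = 0
) -> int:
    """Total input tokens across a session when the full history is resent each turn."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn          # this turn's new text joins the history
        total += fixed_context + history    # the whole thing is sent as input again
    return total


print(cumulative_input_tokens(10, 1_000))  # 55000
print(cumulative_input_tokens(20, 1_000))  # 210000
```

Doubling the conversation length here nearly quadruples the input tokens, and any fixed block of memory loaded on each turn (the fixed_context parameter) is paid for again every time.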

It's tempting to assume a coding agent like Harness would therefore be lighter — but I want to be careful here, because I haven't measured it, and the assumption doesn't really hold. Coding agents can be just as token-hungry, sometimes more: they load large files and whole codebases into context, replay them across turns, and spawn sub-agents that each carry their own context. Whether Harness uses fewer tokens than OpenClaw isn't something you can reason out from first principles — it depends entirely on the task, and the only honest way to know is to measure both.

So treat "OpenClaw uses a lot of tokens" as my lived experience, and don't read it as the conclusion that "Harness uses less." If you've measured Harness's /cost on real tasks, I'd love to hear your numbers, since that's the data that would actually settle it.


 

Both projects are independent and not affiliated with the model providers they run on. Benchmark figures cited above are self-reported by the Harness project and were not independently verified for this post.



