why use many token when few do trick
Make your AI coding agent talk like a caveman.
Same answers, 65% fewer output tokens. Brain still big. Mouth small.
See it Β· Install Β· Levels Β· What you get Β· Benchmarks Β· Ecosystem Β· Caveman 2
Caveman is a skill/plugin for Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, Copilot, and 30+ other agents. Install once. Agent drops the filler and answers in tight caveman-speak, keeping code, commands, and errors byte-for-byte exact. You save output tokens on every reply, forever.
Same fix. Third of the words. Nothing technical lost.
ββββββββββββββββββββββββββββββββββββββββββββββ
β output tokens saved βββββββββ 65% β
β input tokens saved βββββββββ 0% β
β technical accuracy βββββββββ 100% β
β vibes βββββββββ OOG β
ββββββββββββββββββββββββββββββββββββββββββββββ
Caveman no make brain smaller. Caveman make mouth smaller. Shrinks what the agent says, not what it knows.
One command. Finds every agent on your machine. Installs for each.
# macOS Β· Linux Β· WSL Β· Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash# Windows Β· PowerShell 5.1+
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex~30 seconds. Needs Node β₯18. Skips agents you no have. Safe to re-run.
Tip
Turn it on: type /caveman or say "talk like caveman". Turn it off: say "normal mode". On Claude Code, Codex, and Gemini it's already on from message one. No command needed.
Install for one agent, or any of 30+ others
Every agent has its own path (plugin, extension, rule file, or npx skills add). The full per-agent matrix, all flags, dry-run, and uninstall live in INSTALL.md. A few common ones:
# Claude Code plugin
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
# Gemini CLI extension
gemini extensions install https://github.com/JuliusBrussee/caveman
# Cursor / Windsurf / Cline / Codex / 30+ more, via the skills registry
npx skills add JuliusBrussee/caveman -a cursorInstall broke? Open your agent in this repo and say: "Read CLAUDE.md and INSTALL.md, install caveman for me." Agent read repo, agent fix own brain. Snake eat tail.
Six levels. Switch anytime with /caveman <level>. Level sticks until you change it or the session ends.
| Level | Same sentence, shrunk |
|---|---|
| normal agent | You should wrap the object in useMemo, since a new reference is created on every render. |
lite |
Wrap object in useMemo. New ref created every render. |
full (default) |
New ref each render. Wrap object in useMemo. |
ultra |
New ref/render. useMemo it. |
wenyan |
New ref every render, so wrap in useMemo β rendered in classical Chinese, shorter still. |
Note
Speak your tongue. Caveman keeps your language. Write Portuguese, caveman grunt Portuguese. Spanish, French, same. It compresses the style, never translates. wenyan mode is the exception on purpose: classical Chinese packs the most meaning per token.
| Command | What it does |
|---|---|
/caveman [lite|full|ultra|wenyan] |
Compress every reply. Level sticks for the session. |
/caveman-commit |
Conventional Commit messages, β€50-char subject. Why over what. |
/caveman-review |
One-line PR comments: L42: π΄ bug: user null. Add guard. |
/caveman-stats |
Real session token usage, lifetime savings, USD. Tweetable line with --share. |
/caveman-compress <file> |
Rewrite a memory file (like CLAUDE.md) into caveman-speak. Cuts ~46% input tokens every session after. Code, URLs, paths byte-preserved. |
caveman-shrink |
MCP middleware. Wraps any MCP server, compresses its tool descriptions. npm. |
cavecrew-* |
Caveman subagents (investigator, builder, reviewer). ~60% fewer tokens than vanilla, so main context lasts longer. |
Tip
On Claude Code the statusline shows [CAVEMAN] β 12.4k β that's your lifetime tokens saved, updated on every /caveman-stats. Silence it with CAVEMAN_STATUSLINE_SAVINGS=0.
Real token counts from the Claude API. Average 65% output reduction across 10 prompts (range 22β87%), measured against default verbose replies. Output tokens only, committed and reproducible in benchmarks/ and evals/.
| Task | Normal | Caveman | Saved |
|---|---|---|---|
| Explain React re-render bug | 1180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1042 | 290 | 72% |
| Debug PostgreSQL race condition | 1200 | 232 | 81% |
| Implement React error boundary | 3454 | 456 | 87% |
| Average | 1214 | 294 | 65% |
Important
Honest number warning. Caveman only shrinks output tokens. Input and reasoning tokens are untouched, and the skill itself adds ~1β1.5k input tokens per turn. So whole-session savings run smaller than the output number, and on already-terse workloads they can go net-negative. The real win is readability and speed. Cost savings are the bonus. When caveman wins, when it loses, and how to measure it yourself: docs/HONEST-NUMBERS.md.
Turns out short isn't just cheaper. A March 2026 paper, Brevity Constraints Reverse Performance Hierarchies in Language Models, tested 31 models and found that constraining large models to brief answers improved accuracy by ~26 points on some benchmarks. Sometimes less word = more correct.
caveman-compress receipts β real memory files, cutting input tokens forever
| File | Original | Compressed | Saved |
|---|---|---|---|
claude-md-preferences.md |
706 | 285 | 59.6% |
project-notes.md |
1145 | 535 | 53.3% |
claude-md-project.md |
1122 | 636 | 43.3% |
todo-list.md |
627 | 388 | 38.1% |
mixed-with-code.md |
888 | 560 | 36.9% |
| Average | 898 | 481 | 46% |
Every session after, that file loads ~46% smaller. Input tokens saved forever, not just one reply.
|
This skill shrinks what an agent says. caveman-code shrinks everything β a full terminal coding agent, caveman top to bottom. ~2Γ fewer tokens than Codex on identical tasks. 20+ providers, plan mode, autopilot goal loop, MIT. npm install -g @juliusbrussee/caveman-code |
Five tools, one idea: agent do more with less.
| Repo | What it shrinks |
|---|---|
| caveman (you here) | What the agent says |
| caveman-code | The whole agent, end to end |
| cavemem | What the agent remembers, across sessions |
| cavekit | The build loop β spec-driven, no guessing |
| cavegemma | The compression baked into weights (Gemma fine-tune) |
Also: five sibling skills, one install
JuliusBrussee/skills β works in Claude Code, Cursor, Gemini, Cline, Copilot, 40+ agents:
| Skill | What |
|---|---|
| caveman | This one. Speak less, say more. |
| grill-me | Agent grills your plan before you build the wrong thing. |
| interface-kit | Build UI that looks good, loads fast, works for everyone. |
| junior-to-senior | Adversarial review pass. Junior output in, senior output out. |
| loop-factory | Spec-driven task loop β inbox β active β archive. |
npx skills@latest add JuliusBrussee/skillsπ¦ Teach the lobster brevity β OpenClaw integration
OpenClaw is a self-host gateway: one box, many agents inside, wired to Slack / Discord / iMessage / Telegram. Lobster strong. Lobster smart. Lobster also talk a lot.
Same installer, scoped to one agent:
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash -s -- --only openclawTwo things happen, no more: a caveman skill lands in the workspace, and a tiny marker-fenced block is appended to SOUL.md (OpenClaw injects it every turn, so the lobster is terse from message one β no /caveman per session). Custom path? OPENCLAW_WORKSPACE=/your/path. Uninstall with the same line plus --uninstall; your other workspace content stays untouched. Lobster claw still sharp. Lobster mouth now small.
Caveman make token small. Caveman 2 make it provable.
Today's savings numbers (including /caveman-stats) are local estimates. Caveman 2 measures and verifies them across a whole team β real receipts, real dashboard, real proof the tokens went down. Building it now.
Join the waitlist β caveman.so
- Install drops a skill file into your agent.
- Skill tells agent: drop filler, keep substance, use fragments β but never touch code, commands, or errors.
- On Claude Code, a hook writes a tiny flag file each session, so the agent talks caveman from message one without
/caveman. /caveman-statsreads your session log, counts tokens saved, writes the number to your statusline./caveman-compressrewrites memory files (likeCLAUDE.md) so every future session starts with a smaller context. Save tokens forever, not just once.
Hook architecture, file ownership, and CI sync are documented for maintainers in CLAUDE.md.
Caveman no phone home. No telemetry, no analytics, no accounts, no backend. After install, zero network calls β the skill is a prompt, the hooks are local scripts, and /caveman-stats reads a log already on your disk. Install-time fetches (GitHub plus your agents' own registries) are spelled out in SECURITY.md.
Caveman free forever. Sponsors keep the rock sharp.
Atlas Cloud β full-modal AI inference platform, one API.
Want your rock here? β Sponsor caveman
Caveman save you token, save you money. Star cost zero. Fair trade. β
Docs: Install matrix Β· Honest numbers Β· Contributing Β· Maintainer guide Β· Issues
Also by Julius Brussee: Revu β local-first macOS study app with FSRS spaced repetition (revu.cards)
MIT β free like mass mammoth on open plain.
