Everyone's training AI to refuse. We trained a model to break in.
Most other tools are wrappers around a foundation model, so they inherit its guardrails and its refusals. We went the other way and post-trained our own model on a decade of capture-the-flag contests.
We have two modes that run over a CLI:
- Security scan: a read-only audit of your local codebase for vulnerabilities
- Pen test: an active adversarial mode that will try to break a live system. Coming soon, not covering it here.
To show what the scan does, I pointed it at Bank of Anthos. It found an integer overflow in the transfer path that lets you forge an account balance, plus the usual injection, auth and secrets issues. (Bank of Anthos is Google's open-source bank. It's a known app and some of it is intentionally weak, which is the point: you can clone it and re-run the scan yourself instead of trusting a screenshot.)
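For anyone who hasn't run into this bug class, here is a minimal sketch of how an integer overflow can corrupt a balance. It is a generic illustration, not the Bank of Anthos code and not the scan's actual finding: the 32-bit field width and the names `wrap_int32`, `unsafe_credit` and `safe_credit` are all assumptions made up for the example.

```python
# Generic illustration of the bug class; nothing here is taken from Bank of Anthos.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def unsafe_credit(balance_cents: int, amount_cents: int) -> int:
    # Hypothetical buggy path: the sum lands in a 32-bit field with no range check,
    # so a large enough credit silently wraps to a different balance.
    return wrap_int32(balance_cents + amount_cents)


def safe_credit(balance_cents: int, amount_cents: int) -> int:
    # Hypothetical fix: validate the amount and the result before storing it.
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    total = balance_cents + amount_cents
    if total > INT32_MAX:
        raise OverflowError("balance would exceed the 32-bit range")
    return total


if __name__ == "__main__":
    print(unsafe_credit(INT32_MAX, 1))  # wraps around to INT32_MIN
    print(safe_credit(1_000, 250))
```

The point of the contrast is that the unsafe version never errors: the stored balance simply stops matching the transactions that produced it.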
How the harness works:
We built the harness alongside the model. It runs as a multi-agent swarm: an orchestrator splits the job across subagents that run in parallel, each owning a slice, and their results are then synthesised into one report.
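The fan-out and merge pattern described here can be sketched roughly as follows. This is not the harness's code: `split_into_slices`, `subagent_scan`, `orchestrate` and the `Finding` type are hypothetical names, and the "scan" is a placeholder filename check standing in for a model call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    issue: str


def split_into_slices(paths: list[str], n: int) -> list[list[str]]:
    """Round-robin the file list into at most n non-empty slices."""
    return [paths[i::n] for i in range(n) if paths[i::n]]


def subagent_scan(slice_paths: list[str]) -> list[Finding]:
    # Placeholder for a subagent: flags files whose name hints at secrets.
    return [
        Finding(p, "possible hard-coded secret")
        for p in slice_paths
        if "secret" in p.lower()
    ]


def orchestrate(paths: list[str], workers: int = 4) -> list[Finding]:
    """Split the job, run one subagent per slice in parallel, merge into one report."""
    slices = split_into_slices(paths, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_slice = list(pool.map(subagent_scan, slices))
    # Deduplicate and sort so the report is stable regardless of completion order.
    merged = {finding for findings in per_slice for finding in findings}
    return sorted(merged, key=lambda f: (f.path, f.issue))


if __name__ == "__main__":
    files = ["a.py", "config/secrets.yaml", "b.py", "SECRET_KEY.txt"]
    for finding in orchestrate(files, workers=2):
        print(finding.path, "->", finding.issue)
```

Threads are used only to keep the sketch short; a real harness would presumably be waiting on model calls, where an async or process-based design may fit better.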
The CLI is a closed-source binary (installed via brew, curl or winget) and runs locally. Before you trust it on anything real, run it behind a firewall and tcpdump exactly what it does. Install is free; the scan runs on a $20 subscription, which covers $20 of tokens.
Where I'd like this debated: should the safety boundary for an offensive security model sit in the harness, or in the model's refusals?