
We post-trained a model for offensive security instead of teaching it to refuse

published on 2026-06-10 15:56:02 UTC by /u/greatness_only12
Content:

Everyone's training AI to refuse. We trained a model to break in.

Most other tools are wrappers around a foundation model, so they inherit its guardrails and its refusals. We went the other way and post-trained our own model on a decade of capture-the-flag contests.

We have two modes that run over a CLI:

- Security scan: a read-only audit of your local codebase for vulnerabilities

- Pen test: an active adversarial mode that will try to break a live system. Coming soon, not covering it here.

To show what the scan does, I pointed it at Bank of Anthos. It found an integer overflow in the transfer path that lets you forge an account balance, plus the usual injection, auth and secrets issues. (Bank of Anthos is Google's open-source bank. It's a known app and some of it is intentionally weak, which is the point: you can clone it and re-run the scan yourself instead of trusting a screenshot.)
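For readers unfamiliar with the bug class, here is a generic sketch of how an integer overflow can corrupt a balance. It is not the Bank of Anthos code and not the finding itself: the 32-bit width and the names `wrap_int32`, `unsafe_credit` and `safe_credit` are assumptions made purely for illustration.

```python
# Illustrative only: simulates signed 32-bit wraparound in Python,
# whose own integers never overflow. All names here are hypothetical.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(value):
    """Reduce a Python int to the value a signed 32-bit integer would hold."""
    return (value + 2**31) % 2**32 - 2**31


def unsafe_credit(balance, amount):
    """Add without a range check, so the sum silently wraps around."""
    return wrap_int32(balance + amount)


def safe_credit(balance, amount):
    """Reject any credit whose result would not fit in 32 bits."""
    total = balance + amount
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError("credit would overflow the 32-bit balance")
    return total


if __name__ == "__main__":
    print(unsafe_credit(2_147_483_000, 1_000))  # wraps to -2147483296
    print(safe_credit(100, 50))                 # 150
```

The unchecked version turns a large positive balance into a large negative one; the same wraparound in the other direction is what makes a forged balance possible in this class of bug.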

How the harness works:
Along with the model, we built a harness to support it. The harness runs a multi-agent swarm: an orchestrator splits the job across subagents running in parallel, each owning a slice, then synthesises their findings into one report.
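The fan-out and merge pattern described above could look roughly like the sketch below. It is not the actual harness, which is closed: the function names, the round-robin slicing and the toy filename check standing in for a model-backed subagent are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(files, n_slices):
    """Round-robin the file list into at most n_slices non-empty slices."""
    slices = [files[i::n_slices] for i in range(n_slices)]
    return [s for s in slices if s]


def scan_slice(slice_files):
    """Hypothetical subagent: a toy check standing in for a real audit."""
    return [
        {"file": f, "issue": "possible secret"}
        for f in slice_files
        if "secret" in f or f.endswith(".env")
    ]


def orchestrate(files, n_subagents=4):
    """Split the job, run subagents in parallel, merge into one report."""
    slices = split_into_slices(files, n_subagents)
    with ThreadPoolExecutor(max_workers=n_subagents) as pool:
        per_slice = list(pool.map(scan_slice, slices))
    findings = [finding for result in per_slice for finding in result]
    return {
        "slices": len(slices),
        "findings": sorted(findings, key=lambda f: f["file"]),
    }


if __name__ == "__main__":
    report = orchestrate(["a.py", "prod.env", "db_secret.py", "b.py"], 2)
    for finding in report["findings"]:
        print(finding["file"], "-", finding["issue"])
```

Each subagent only reads its own slice, which matches the read-only scan mode; the orchestrator is the single place where results are combined.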

The CLI is a closed binary (brew/curl/winget) and runs locally. Run it behind a firewall and tcpdump exactly what it does before you trust it on anything real. Install is free; the scan runs on a $20 subscription ($20 of tokens).

Where I'd like this debated: should the safety boundary for an offensive security model sit in the harness, or in the model's refusals?


https://www.reddit.com/r/netsec/comments/1u267yi/we_posttrained_a_model_for_offensive_security/   
Published: 2026 06 10 15:56:02
Received: 2026 06 10 16:03:24
Feed: /r/netsec - Information Security News and Discussion
Source: /r/netsec - Information Security News and Discussion
Category: Cyber Security
Topic: Cyber Security
