Welcome to our

Cyber Security News Aggregator

.

Cyber Tzar

provide a

cyber security risk management

platform; including automated penetration tests and risk assesments culminating in a "cyber risk score" out of 1,000, just like a credit score.

More Research Showing AI Breaking the Rules

published on 2025-02-24 12:08:56 UTC by Bruce Schneier
Content:

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time­making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Here’s the paper.

Article: More Research Showing AI Breaking the Rules - published about 4 hours ago.

https://www.schneier.com/blog/archives/2025/02/more-research-showing-ai-breaking-the-rules.html   
Published: 2025 02 24 12:08:56
Received: 2025 02 24 12:19:55
Feed: Schneier on Security
Source: Schneier on Security
Category: Cyber Security
Topic: Cyber Security
Views: 2

Custom HTML Block

Click to Open Code Editor