New AI security benchmark: Backbone breaker release

Check Point and Lakera have launched the b3 benchmark, an open-source evaluation that lets developers measure and improve the security of LLMs in AI agents.

Wednesday, 29th October 2025, by Aaron Sandhu

Check Point Software Technologies Ltd., a global leader in cyber security solutions, has teamed up with Lakera to launch the backbone breaker benchmark (b3). Developed in collaboration with researchers from the UK AI Security Institute, this open-source security evaluation is specifically designed to enhance the security of large language models (LLMs) within AI agents.

The b3 introduces a novel approach known as threat snapshots. Rather than simulating the entire life cycle of an AI agent, threat snapshots focus on the critical points where vulnerabilities are most likely to surface. This targeted testing strategy lets developers gauge how resilient their models are to real-world adversarial attacks without the complexity of modelling complete agent workflows.

Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera, said: "Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure and improve their security posture."

The benchmark consists of 10 representative agent "threat snapshots" and uses a dataset of over 19,000 adversarial attacks sourced from the gamified Gandalf: Agent Breaker simulator. The attacks span a range of threats, including system prompt exfiltration, phishing, and malicious code injection.

Initial assessments of 31 popular LLMs point to several findings: enhanced reasoning capabilities tend to bolster security, while model size itself does not necessarily influence security performance. Closed-source models generally demonstrate superior security, although leading open-source models are swiftly catching up.

The backbone breaker benchmark is available under an open-source license, giving developers worldwide a tool for strengthening AI security.

Originally an internal hackathon project at Lakera, the Gandalf: Agent Breaker game has evolved into the world's foremost red teaming platform. This hacking simulator challenges users to test AI agents within realistic scenarios, encouraging a deeper understanding of potential vulnerabilities in GenAI applications.

Since its inception, Gandalf has generated over 80 million data points and rapidly expanded the global red teaming community. Though it began as a playful endeavour, Gandalf has a serious mission: to raise awareness of AI-first security and the inherent challenges posed by GenAI technology.
