Risk Indexes

Specialized risk assessments for different use cases and concerns

Understanding Risk Indexes

Each risk index aggregates multiple evaluations to provide a specialized view of AI safety. Different indexes matter for different stakeholders and deployment scenarios.

Composite Risk

Overall safety assessment combining all major risk dimensions

Specialized Views

Focus on specific concerns like scheming, bias, or cyber capabilities

Use Case Specific

Tailored indexes for agents, high-stakes apps, or enterprise deployment

Core Risk Indexes

Primary indexes that compose the overall Composite Risk Index

🔥

Composite Risk Index

Weight in Composite Index: Primary

Aggregated risk score combining all specialized indexes with weighted importance.

Index Composition

Offensive Cyber Capabilities40%
Scheming & Deceptive Alignment30%
Harmful Agent Capabilities15%
Adversarial Robustness10%
Bias & Fairness5%

Interpretation: Lower scores indicate lower overall risk. Models above 0.7 require enhanced controls.

💻

Offensive Cyber Capabilities Index

Weight in Composite Index: 40%

Measures dangerous cyber capabilities through CTF challenges, exploitation tasks, and security testing.

Cybench3CB (Catastrophic Cyber Capabilities)GDM CTFInterCode SecurityCyberSecEval 2 & 3

Interpretation: High scores indicate significant dual-use risk. Critical for models with code generation or tool access.

🎭

Scheming & Deceptive Alignment Index

Weight in Composite Index: 30%

Evaluates deceptive alignment risks, self-reasoning about model situation, and stealth behavior.

MASK (Honesty under pressure)Agentic MisalignmentGDM Self-reasoningGDM StealthSycophancy

Interpretation: Addresses existential risk concerns. Models showing scheming behavior require immediate investigation.

⚗️

Harmful Agent Capabilities Index

Weight in Composite Index: 15%

Assesses potential for direct harm through hazardous knowledge and dangerous agentic capabilities.

AgentHarmWMDP (WMD knowledge)SOS BENCH (Scientific knowledge)AIR Bench

Interpretation: Measures propensity for harm. High scores indicate need for deployment restrictions.

🛡️

Adversarial Robustness Index

Weight in Composite Index: 10%

Tests resistance to jailbreaks, prompt injection attacks, and social engineering.

StrongREJECTAgentDojoMake Me Pay

Interpretation: Lower scores mean stronger defenses. Critical for public-facing deployments.

⚖️

Bias & Fairness Index

Weight in Composite Index: 5%

Measures stereotype bias, fairness issues, and calibrated refusal behavior.

BBQBOLDStereoSet

Interpretation: Addresses societal harm. Important for consumer applications and high-stakes decisions.

📊

Calibration & Honesty Index

Weight in Composite Index: Modifier

Evaluates uncertainty calibration, appropriate abstention, and honesty metrics.

SimpleQA (abstention only)AbstentionBenchXSTestMASK

Interpretation: Acts as risk modifier. Well-calibrated models with good abstention get risk-adjusted downward.

🤖

Agentic Risk Index

For models with tool access and autonomy—combines agent-specific harm benchmarks.

Target Audience:

Deploying LLM agents with API integrations or autonomous decision-making

☢️

CBRN/WMD Risk Index

Chemical, Biological, Radiological, Nuclear hazards assessment.

Target Audience:

Government regulators, national security, research institutions

🎪

Deception & Manipulation Index

Focuses specifically on lying, misleading, and manipulative behavior.

Target Audience:

Enterprise trust & safety teams, alignment researchers

🗡️

Attack Surface Index

How easily can adversaries compromise the model through various attack vectors.

Target Audience:

Security teams, red-teamers, product safety engineers

🎯

Insider Threat / Goal Misalignment Index

The 'will it betray you?' index—measures unethical insider behavior and goal misalignment.

Target Audience:

AI safety researchers, long-term risk assessment, enterprise deployment

⚖️

Dual-Use Capabilities Index

Powerful capabilities that could help or harm depending on deployment context.

Target Audience:

Enterprise risk assessment, capability vs safety tradeoff analysis

About Risk Index Methodology

Normalization: All benchmark scores are normalized to 0-1 scale where 1 represents highest risk. Different benchmarks have inverted interpretations (e.g., high accuracy on cyber evals = high risk).

Aggregation: Indexes combine multiple benchmarks using weighted averages based on benchmark relevance and reliability. Weights are adjusted as we gather more evaluation data.

Thresholds: Risk levels are categorized as Very Low (0-0.3), Low (0.3-0.5), Moderate (0.5-0.7), High (0.7-0.85), Very High (0.85-1.0). Models above 0.7 typically require deployment restrictions.

Transparency: All index calculations, benchmark weightings, and normalization procedures will be published with full documentation. ARM prioritizes reproducibility and methodological clarity.