Sitemap - 2024 - METR

Evaluating frontier AI R&D capabilities of language model agents against human experts

The Rogue Replication Threat Model

Response to Bureau of Industry and Security’s proposed AI reporting requirements

New Support Through The Audacious Project

Details about METR’s preliminary evaluation of o1-preview

Response to U.S. AISI Draft “Managing Misuse Risk for Dual-Use Foundation Models”

Common Elements of Frontier AI Safety Policies

Details about METR’s preliminary evaluation of GPT-4o

An update on our general capability evaluations

Response to NIST Draft Generative AI Profile

ML Engineers Needed for New AI R&D Evals Project

Emma Abele is METR’s new Executive Director

Autonomy Evaluation Resources

Example autonomy evaluation protocol

Guidelines for capability elicitation

Measuring the impact of post-training enhancements

Portable Evaluation Tasks via the METR Task Standard

2023 Year In Review

#nojs-banner { position: fixed; bottom: 0; left: 0; padding: 16px 16px 16px 32px; width: 100%; box-sizing: border-box; background: red; color: white; font-family: -apple-system, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 13px; line-height: 13px; } #nojs-banner a { color: inherit; text-decoration: underline; } This site requires JavaScript to run correctly. Please turn on JavaScript or unblock scripts