METR (pronounced ‘meter’) researches, develops, and runs evaluations of frontier AI systems’ ability to complete complex tasks without human input. This means our work focuses on AI agents.
METR’s evaluations assess the extent to which an AI system can autonomously carry out substantial tasks. These include general-purpose tasks, like conducting research or developing an app, as well as concerning capabilities, such as conducting cyberattacks or making itself hard to shut down. Currently, we are primarily developing evaluations that measure the capability to automate AI R&D.
METR also prototypes governance approaches that use AI systems’ measured or forecasted capabilities to determine when stronger risk mitigations are needed before further scaling. This work included prototyping the Responsible Scaling Policies approach. See how companies are using evaluations for this purpose in their frontier AI safety policies.
