Researchers from the Center for AI Safety and Scale AI have introduced "Humanity's Last Exam" (HLE), a rigorous benchmark designed to measure how close advanced AI models come to human-level expertise across more than 100 subjects.
The exam, detailed in a new Nature study, comprises 2,500 PhD-level questions vetted by more than 1,000 experts worldwide. Each question is designed to be unambiguous, verifiable, and not answerable by a simple web search.
In initial testing, top models such as OpenAI's o1 answered only 8.3% of the questions correctly. As of February 2026, Google's Gemini 3 Deep Think leads with 48.4%, still far below the 90% average achieved by human experts.
The creators emphasize that strong HLE performance is a necessary milestone but does not by itself signify artificial general intelligence (AGI), which would require broader capabilities such as autonomous research.
