
Google's Gemini 3 AI Achieves Record Scores on "Humanity's Last Exam," Signaling Major Leap in Reasoning

Google has unveiled its new AI model, Gemini 3 Pro, which scored a record 37.5% on "Humanity's Last Exam," a benchmark designed to identify artificial superintelligence, outperforming competing models. A more powerful version, Gemini 3 Deep Think, scored 41% on the same test. These models are now being integrated across Google's ecosystem, including AI Mode in Search and the Gemini app.

Google has officially unveiled its latest artificial intelligence model, Gemini 3 Pro, which scored 37.5% on "Humanity's Last Exam." AI safety researchers designed this benchmark specifically to identify artificial superintelligence, Yahoo News Canada reported on November 19, 2025. Google says the result demonstrates PhD-level reasoning capabilities.

The result places Gemini 3 Pro about six percentage points ahead of its closest competitor, OpenAI's GPT-5 Pro, which scored 31.64% on the same test, according to Yahoo News Canada. Google described Gemini 3 as a "massive jump in reasoning," capable of grasping depth and nuance in complex problems.

Google also announced Gemini 3 Deep Think, a more powerful version of the model. It scored 41% on "Humanity's Last Exam" and set new records on other benchmarks as well, Google's blog confirmed on November 18, 2025.

Google CEO Sundar Pichai said that Gemini 3 advances the state of the art, pushing the frontiers of intelligence to make AI truly helpful for everyone, Yahoo News Canada reported. According to Google, the model is built to understand requests and respond with unprecedented depth across a wide range of tasks.

The new Gemini 3 Pro model is being rolled out across Google's ecosystem, including AI Mode in Search and the Gemini app, which boasts over 650 million monthly active users, Jagran Josh stated on November 19, 2025. This widespread integration aims to make the advanced AI accessible to billions globally.

Demis Hassabis, CEO of Google DeepMind, highlighted that Gemini 3 Deep Think mode further pushes the boundaries of intelligence, delivering a step-change in reasoning and multimodal understanding. This enhanced mode is designed to tackle even more complex problems, as noted by Yahoo News Canada.

The launch of Gemini 3 marks a significant moment in the ongoing race toward artificial general intelligence (AGI), with Google positioning its new models as leaders in advanced reasoning and problem-solving. Technology Magazine reported on November 18, 2025, that the release moves the field further along the path toward AGI.

Background and Evolution of AI Benchmarks: "Humanity's Last Exam" (HLE) is a rigorous, closed-book benchmark featuring 2,500 expert-curated questions across over 100 subdomains, including mathematics, sciences, and humanities, as detailed by Emergent Mind on November 15, 2025. It was created by AI safety researchers to evaluate frontier AI capabilities beyond simple internet lookups, requiring multi-step, domain-expert problem-solving. This benchmark emerged as older tests became saturated by rapidly improving LLMs.
Technical Prowess and Reasoning Capabilities: Gemini 3 Pro's 37.5% score on HLE, achieved without tool use, signifies genuine reasoning capability, according to implicator.ai on November 18, 2025. Google's blog also stated on November 18, 2025, that Gemini 3 Pro significantly outperforms its predecessor, Gemini 2.5 Pro, on every major AI benchmark, and that it reached an Elo score of 1501 on the LMArena Leaderboard. According to Google, these results reflect an ability to solve complex problems across diverse topics with high reliability.
The Competitive Landscape and OpenAI's Position: OpenAI's GPT-5 Pro, while a strong contender, scored 31.64% on "Humanity's Last Exam," placing it behind Google's new models, Yahoo News Canada reported. OpenAI's GPT-5, launched in August 2025, also showed strong performance on benchmarks like GPQA Diamond and competitive math challenges, as detailed by Vellum AI on August 7, 2025. However, the market is highly dynamic, with continuous advancements from various AI developers.
Implications for Artificial Superintelligence (ASI): "Humanity's Last Exam" is designed to identify artificial superintelligence, as noted by Yahoo News Canada. Current AI models still score far below human experts, who exceed 98% in controlled settings. Even so, the rapid progress of models such as Gemini 3 Deep Think suggests a faster trajectory toward advanced AI capabilities. King's College London researchers also discussed new benchmarks for ASI in April 2025.
Integration and Accessibility Across Google's Ecosystem: Google is integrating Gemini 3 Pro into various products, including AI Mode in Search and the Gemini app, making its advanced capabilities widely available, Jagran Josh confirmed on November 19, 2025. This broad deployment strategy aims to leverage the new AI for billions of users, enhancing experiences from search queries to complex problem-solving. Developers can also access Gemini 3 through Google AI Studio and Vertex AI.
Expert Perspectives and Future Outlook: Although the benchmark scores are impressive, experts cited by Implicator.ai on November 18, 2025, suggest that AI with "PhD-level performance" can still show weaknesses typical of graduate students, including methodological errors. More broadly, sentisight.ai reported on August 18, 2025, that 2025 marks a fundamental shift toward agentic AI capable of autonomous planning and action. The outlet cited an IBM report that 99% of developers are exploring AI agents. This points to a future in which AI increasingly acts as a collaborator rather than just a tool.
