A fresh Apple research report has poured cold water on the AI sector's most daring aspiration: achieving Artificial General Intelligence (AGI).
The report, "The Illusion of Thinking," demonstrates that even advanced reasoning models like Claude 3.7 Sonnet and DeepSeek-R1 collapse when presented with high-complexity challenges, their accuracy dropping to zero as problems get harder.
While Apple's results have challenged the "reasoning" abilities of contemporary frontier models, Google and major game studios are going all-in on data collection, betting that behavioral data, not model size, may hold the key to AGI.
Apple researchers tested AI models on algorithmic puzzles such as Tower of Hanoi and River Crossing. The models excelled at low-complexity instances but failed completely at higher complexity, even when handed the solution algorithm explicitly.
As one of the researchers put it, the models "learn patterns well enough, but when the questions get modified or the level of complexity is increased, they fall completely apart."
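For context, Tower of Hanoi is a useful stress test precisely because its difficulty is tunable: the optimal solution length doubles with every extra disk. A minimal Python sketch of the classic recursive solver (for illustration only; this is not Apple's test harness):

```python
def hanoi(n, src="A", dst="C", aux="B", moves=None):
    """Return the optimal move list for an n-disk Tower of Hanoi."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, aux, dst, moves)   # park the n-1 smaller disks on the spare peg
    moves.append((src, dst))             # move the largest disk to its target
    hanoi(n - 1, aux, dst, src, moves)   # re-stack the smaller disks on top of it
    return moves

# The optimal solution has 2^n - 1 moves, so each extra disk doubles the work:
for n in (3, 7, 10, 15):
    print(n, "disks ->", len(hanoi(n)), "moves")  # 7, 127, 1023, 32767
```

A system that genuinely executed this short recursion would scale with it; the reported collapse at larger disk counts is what suggests pattern recall rather than algorithm execution.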
AI critic Gary Marcus framed Apple's findings as a "knockout blow" to the hype, adding that LLMs' convincing fluency is a superficial veneer that obscures the absence of genuine reasoning.
Meanwhile, Google continues to work on Project Mariner, a multi-task agent system that can surf the web, book flights, and purchase items on users' behalf. Trained on massive volumes of user-interaction data, Google's agents "learn by doing" rather than through the kind of explicit reasoning Apple's tests probe.
At Google I/O 2025, DeepMind CEO Demis Hassabis argued that "AGI isn't about passing logic puzzles. It's about navigating the real world."
Video game developers are also entering the fray as unexpected heavyweights. With 3.4 billion gamers worldwide generating $177 billion annually, gaming data is now considered a goldmine for AI training.
Every in-game decision, whether a botched parry or a heroic heal, creates clean, high-frequency samples of human cognition under stress. That data is already being used to train AI agents for logistics, medicine, and even autonomous vehicles.
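What that telemetry could look like is easy to sketch: each decision becomes a (state, action, outcome) record, exactly the shape imitation-learning pipelines consume. A minimal Python sketch; the schema and field names here are hypothetical, not any studio's actual format:

```python
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class DecisionEvent:
    """One in-game decision captured as a supervised training sample."""
    ts: float          # wall-clock timestamp of the decision
    game_state: dict   # compact snapshot the player saw (HP, cooldowns, ...)
    action: str        # what the player chose ("parry", "heal", ...)
    outcome: float     # reward signal: +1.0 for success, -1.0 for failure
    reaction_ms: int   # decision latency, i.e. cognition under time pressure

def log_event(event: DecisionEvent, sink) -> None:
    """Append one JSON line; JSONL batches feed directly into training jobs."""
    sink.write(json.dumps(asdict(event)) + "\n")

# Example: the botched parry from above, recorded as a training sample.
with open("session.jsonl", "a") as f:
    log_event(DecisionEvent(time.time(), {"hp": 12, "enemy_hp": 40},
                            "parry", -1.0, reaction_ms=180), f)
```

The reaction_ms field is what makes game data distinctive as "cognition under stress": it records not just the choice but how quickly it was made.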
But the arrival of eye-tracking headsets and pulse-reading haptics has raised privacy concerns, spurring new regulation such as the EU's AI Act alongside technical safeguards like zero-knowledge proofs to keep data transfers secure and auditable.
Apple's Reality Check: "Reasoning" AI Can't Think
Apple's findings point to a fundamental scaling limit in today's LLMs and LRMs: more data or parameters yield better pattern-matchers, not wiser thinkers. Performance disintegrates when irrelevant information or slight perturbations are introduced, and the models' "chains of thought" turn out to be statistical computation rather than genuine reasoning. That is why the AI competition is shifting from model size to data quality and variety.
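Robustness checks of that kind are simple to express: take a solvable problem, splice in an irrelevant clause, and measure how often the answer changes. A minimal sketch, with ask_model() as a hypothetical placeholder for a real LLM call (this mirrors the general distractor methodology, not the new paper's exact protocol):

```python
import random

BASE = "Alice has {a} apples and buys {b} more. How many apples does she have?"
DISTRACTORS = [
    "Her neighbor's cat is three years old.",
    "Five of the apples are slightly smaller than average.",
    "It rained twice last week.",
]

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever model is being probed."""
    raise NotImplementedError  # wire up a real API client here

def perturbation_gap(trials: int = 100) -> float:
    """Fraction of trials where an irrelevant clause flips the model's answer."""
    flips = 0
    for _ in range(trials):
        a, b = random.randint(2, 50), random.randint(2, 50)
        clean = BASE.format(a=a, b=b)
        noisy = clean + " " + random.choice(DISTRACTORS)
        if ask_model(clean) != ask_model(noisy):
            flips += 1
    return flips / trials
```

A genuine reasoner should score near zero here; a pattern-matcher that attends to the distractor text will not.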
Analysis & Industry Implications
That shift is fueling a new war for proprietary datasets. Google's Project Mariner and similar agents rely on web-scale behavioral data, while game studios pitch in-game telemetry as "AGI fuel" for training next-generation models.
The industry is also witnessing the rise of blockchain-driven data marketplaces such as Nokia's Data Marketplace, which enable secure, federated exchange of data and AI models for collaborative learning and monetization.
These platforms use blockchain and zero-knowledge proofs to provide provenance, auditability, and confidentiality, addressing regulatory as well as ethical requirements.
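A full zero-knowledge proof is beyond a short example, but the provenance half of the idea reduces to content-addressing: publish a hash of each dataset version so a buyer can later verify exactly what was delivered, without the raw data being exposed up front. A toy hash-chained ledger entry in Python; this is a deliberately simplified stand-in, not a real ZKP and not Nokia's actual design:

```python
import hashlib
import json

def fingerprint(data: bytes) -> str:
    """Content address: anyone holding the bytes can recompute and verify it."""
    return hashlib.sha256(data).hexdigest()

def provenance_entry(data: bytes, prev_hash: str, meta: dict) -> dict:
    """Chain each dataset version to its predecessor, ledger-style."""
    entry = {"data_hash": fingerprint(data), "prev": prev_hash, "meta": meta}
    entry["entry_hash"] = fingerprint(json.dumps(entry, sort_keys=True).encode())
    return entry

genesis = provenance_entry(b"raw telemetry v1", prev_hash="",
                           meta={"owner": "studio_a"})
update = provenance_entry(b"cleaned telemetry v2", genesis["entry_hash"],
                          meta={"owner": "studio_a"})

# A buyer who receives the v2 bytes can recompute the fingerprint and check
# it against the published entry, establishing provenance without trust.
assert update["data_hash"] == fingerprint(b"cleaned telemetry v2")
```

Zero-knowledge proofs extend this so a seller can additionally prove properties of the data (for example, "contains no raw biometrics") without revealing it.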
The stakes are high: with LLMs' "illusion of thinking" exposed, the next AI advance may come not from whoever builds the biggest models, but from whoever controls and structures the richest, most useful datasets.
Apple's warning may slow the hype cycle, but it is also accelerating the race among tech companies, game studios, and blockchain innovators to own the future of data.