The Deep Reason Behind the "Simple Puzzles" AI Still Can't Solve - Exploring the Astonishing Power of the Human Brain: "Seconds" for Humans, "Mazes" for AI

ARC is a benchmark in which the solver must infer a "hidden rule" from a handful of example color-grid pairs and apply it to a new grid, measuring generalization from very few examples. In an interview with Live Science, Greg Kamradt of the ARC Prize explained that humans average roughly 66% on ARC-AGI-2 while AI systems continue to struggle, stating, "As long as there are problems that humans can solve but AI cannot, it is not AGI."

OpenAI's o3 scored highly on ARC-AGI-1 (75.7% in its low-compute setting, 87.5% with high compute), triggering an "o3 shock," but many see that result as a performance spike propped up by expensive, high-computation search. The next focus is ARC-AGI-3, which shifts from a question-and-answer format to an "agent" test built around roughly 100 2D games that evaluate exploration, planning, and memory.

On Reddit, the discussion centers on terminology, with some arguing these systems should be called LLMs rather than AI, and on frustration that AI has not made everyday chores any easier. On Hacker News, the debate revolves around how human averages and benchmark scores should be interpreted. The overall takeaway is that AGI is approached not by pushing scores higher but by rethinking learning efficiency and how an agent's actions are designed.
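
To make the task format concrete, here is a minimal sketch of an ARC-style problem in Python. The grids, the candidate rule, and the names (`train_pairs`, `mirror_horizontally`, and so on) are illustrative assumptions, not actual ARC tasks or an official solver; real ARC tasks are distributed as JSON pairs of integer grids (values 0-9 encoding colors), and the transformation must be inferred from the training pairs alone.

```python
# Toy ARC-style task: infer a hidden rule from a few input/output grid pairs,
# then apply it to a new test input. Grids and rule are made up for illustration.

from typing import List

Grid = List[List[int]]

# Two training pairs; the hidden rule here is "mirror the grid left-to-right".
train_pairs = [
    ([[1, 0, 0],
      [0, 2, 0]],
     [[0, 0, 1],
      [0, 2, 0]]),
    ([[3, 3, 0],
      [0, 0, 4]],
     [[0, 3, 3],
      [4, 0, 0]]),
]

test_input: Grid = [[5, 0, 0],
                    [0, 0, 6]]


def mirror_horizontally(grid: Grid) -> Grid:
    """One candidate rule: reverse each row (a left-right mirror)."""
    return [list(reversed(row)) for row in grid]


# A solver would search over many candidate transformations and keep only those
# that reproduce every training output exactly, then apply the survivor.
if all(mirror_horizontally(inp) == out for inp, out in train_pairs):
    prediction = mirror_horizontally(test_input)
    print(prediction)  # [[0, 0, 5], [6, 0, 0]]
```

Humans typically spot a rule like this in seconds from the two examples; the difficulty for AI lies in searching the vast space of possible transformations with almost no training data, which is exactly what ARC is designed to measure.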