New ClockBench Benchmark Exposes AI's Struggle with Analog Clocks

AI models struggle to read analog clocks. Google's Gemini 2.5 Pro leads the field, yet human performance remains far ahead.

An analog clock.

A new benchmark, ClockBench, has been introduced to assess AI models' ability to read analog clocks. The top-performing model, Gemini 2.5 Pro, scores just 13.3% accuracy, far below the human baseline of 89.1%.

ClockBench, created to evaluate AI models' visual reasoning skills, consists of 180 unique analog clocks and 720 questions, four per clock. It is designed to be 'easy for humans, hard for AI'.

The test revealed that while AI models show basic visual reasoning, they struggle with the initial step of extracting visual information from the clock face. Features such as Roman numerals, circular numbers, and colorful backgrounds posed particular challenges. Even Gemini 2.5 Pro, the top performer, trailed human accuracy by 75.8 percentage points.

Grok 4 performed poorly with 0.7% accuracy, marking 63.3% of clocks as invalid. GPT-5 came in third with 8.4% accuracy, and varying its reasoning budget had little impact. When models answered incorrectly, their median error was far larger than that of humans.
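The article does not say how ClockBench measures error size, but a natural metric for clock reading is the circular distance between the predicted and the actual time, so that misreading 11:58 as 12:02 counts as a 4-minute error rather than an 11-hour one. Below is a minimal sketch of such a metric; the function name and sample data are hypothetical illustrations, not ClockBench's actual scoring code.

```python
from statistics import median

def clock_error_minutes(pred_h, pred_m, true_h, true_m):
    """Circular error between two 12-hour clock readings, in minutes.

    Times on an analog face wrap around, so the error is measured the
    shorter way around the dial (at most 6 hours = 360 minutes).
    """
    cycle = 12 * 60
    pred = (pred_h % 12) * 60 + pred_m
    true = (true_h % 12) * 60 + true_m
    diff = abs(pred - true) % cycle
    return min(diff, cycle - diff)

# Reading 11:58 as 12:02 is a 4-minute error, not an 11-hour one.
print(clock_error_minutes(11, 58, 12, 2))   # 4

# Median error over a batch of hypothetical wrong answers.
wrong = [(11, 58, 12, 2), (3, 15, 9, 15), (6, 0, 6, 30)]
print(median(clock_error_minutes(*w) for w in wrong))  # 30
```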

ClockBench is an ongoing benchmark, and a public version is available. It underscores how far AI models lag behind humans at reading analog clocks, and further research will be needed to close the gap.
