With AI models clobbering every benchmark, it's time for human evaluation

March 29, 2025, 09:25:58 AM

The latest frontier in AI research is having more humans in the loop assessing just how good the models are.

Source: With AI models clobbering every benchmark, it's time for human evaluation (http://ht**://www.zdnet.c**/article/reasoning-ai-models-are-overwhelming-the-benchmark-tests-its-time-for-human-evaluation/)