Pro@programming.dev to Technology@lemmy.world · English · 23 hours ago
ClockBench: Even the best AI models can't reliably read the clock (clockbench.ai)
8 comments · cross-posted to: Technology@programming.dev
earthworm@sh.itjust.works · English · 6 hours ago (edited)
This seems like a dumb benchmark.
"ClockBench evaluates whether models can read analog clocks - a task that is trivial for humans, but current frontier models struggle with."
What do you mean trivial? Most humans I know can’t read the most basic white-background-big-black-numbers clocks.
Someone rigged the jury to get 90% on this:
MCasq_qsaCJ_234@lemmy.zip · English · 3 hours ago
Rather, ClockBench will end up improving AI in this regard over the next few years. Developers need benchmarks like this to identify a model's strengths and weaknesses so they can improve it in future versions.