A new benchmark pitting AI against previously unseen maths problems shows systems still fall short of top human expertise.
Model choice did more work than the tools did ...