Meta Lied When Benchmark-Testing Its Llama 4 Models


Yann LeCun, Meta’s outgoing chief artificial intelligence (AI) scientist, says his employer tested its latest Llama models in a way that may have made them look better than they really were.

In a recent Financial Times interview, LeCun says Meta researchers "fudged a little bit" by using different versions of the Llama 4 Maverick and Llama 4 Scout models on different benchmarks to improve test results. Normally, researchers use a single version of a new model across all benchmarks, rather than choosing whichever variant will score best on a given benchmark.

Prior to the launch of the Llama 4 models, Meta had begun to fall behind rivals Anthropic, OpenAI and Google in state-of-the-art model development. The company was under pressure to reassert Llama’s prowess, especially in an environment where stock prices can turn on the latest model benchmarks.

Read the complete article from Fast Company.