Meta Lied When Benchmark-Testing Its Llama 4 Models


Yann LeCun, Meta’s outgoing chief artificial intelligence (AI) scientist, says his employer tested its latest Llama models in a way that may have made them look better than they really were.

In a recent Financial Times interview, LeCun says Meta researchers "fudged a little bit" by using different versions of the Llama 4 Maverick and Llama 4 Scout models on different benchmarks to improve test results. Normally, researchers use a single version of a new model across all benchmarks, rather than choosing whichever variant will score best on a given benchmark.

Prior to the launch of the Llama 4 models, Meta had begun to fall behind rivals Anthropic, OpenAI and Google in state-of-the-art model development. The company was under pressure to reassert Llama’s prowess, especially in an environment where stock prices can turn on the latest model benchmarks.

Read the complete article from Fast Company.