Meta claims Llama 3 matches GPT-4 — independent MMLU scores disagree
Meta's technical report states that Llama 3 70B achieves 82% on MMLU. However, the Hugging Face Open LLM Leaderboard v1 shows a reproducibility gap of around 3 percentage points against that figure. We need standardized eval harnesses before accepting parity claims.
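To put the 3-point gap in context, here is a rough sketch comparing it to sampling noise alone. It assumes the full MMLU test split of 14,042 questions, treats the two runs as independent binomial samples (a simplification, since both are scored on the same questions), and uses a hypothetical reproduced score of 79%, chosen only to give a 3-point gap. The function names are mine, not from any eval harness.

```python
import math

# Assumption: both scores are accuracy over the full MMLU test split.
MMLU_TEST_QUESTIONS = 14_042


def accuracy_stderr(accuracy: float, n_questions: int) -> float:
    """Binomial standard error of an accuracy measured on n_questions items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


def gap_in_stderrs(reported: float, reproduced: float, n_questions: int) -> float:
    """Size of the gap between two scores, in units of the standard error
    of their difference (treating the two runs as independent)."""
    se_diff = math.sqrt(
        accuracy_stderr(reported, n_questions) ** 2
        + accuracy_stderr(reproduced, n_questions) ** 2
    )
    return abs(reported - reproduced) / se_diff


if __name__ == "__main__":
    reported = 0.82    # figure quoted in the post
    reproduced = 0.79  # hypothetical: 3 points lower, for illustration only
    z = gap_in_stderrs(reported, reproduced, MMLU_TEST_QUESTIONS)
    print(f"gap is about {z:.1f} standard errors")
```

Under these assumptions the gap is several standard errors wide, so question sampling alone is unlikely to explain it. That points toward differences in the eval setup, such as prompt format, few-shot count, or answer extraction, which is the case for a standardized harness.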