Back to The Future: Evaluating AI Agents on Predicting Future Events (Jul 17)
CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard (Jan 9)
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard (Dec 4, 2024)
Letting Large Models Debate: The First Multilingual LLM Debate Competition (Nov 20, 2024)