Jonas Kübler (jonaskuebler)
0 followers · 1 following
AI & ML interests: None yet
Recent Activity
liked a model about 1 month ago: amazon/chronos-2
reacted to georgewritescode's post with 🚀 about 2 months ago:
Announcing Artificial Analysis Long Context Reasoning (AA-LCR), a new benchmark that evaluates long-context performance by testing reasoning capabilities across multiple long documents (~100k tokens).

The focus of AA-LCR is to replicate real knowledge work and reasoning tasks, testing capabilities critical to modern AI applications spanning document analysis, codebase understanding, and complex multi-step workflows. AA-LCR consists of 100 hard text-based questions that require reasoning across multiple real-world documents representing ~100k input tokens. Questions are designed so answers cannot be found directly but must be reasoned from multiple information sources, with human testing verifying that each question requires genuine inference rather than retrieval.

Key takeaways:
➤ Today's leading models achieve ~70% accuracy: the top three places go to OpenAI o3 (69%), xAI Grok 4 (68%), and Qwen3 235B 2507 Thinking (67%).
➤ 👀 We also already have gpt-oss results! 120B performs close to o4-mini (high), in line with OpenAI's claims regarding model performance. We will follow up shortly with an Intelligence Index for these models.
➤ 100 hard text-based questions spanning 7 categories of documents (Company Reports, Industry Reports, Government Consultations, Academia, Legal, Marketing Materials, and Survey Reports).
➤ ~100k tokens of input per question, requiring models to support at least a 128K context window to score on this benchmark.
➤ ~3M total unique input tokens across ~230 documents to run the benchmark (output tokens typically vary by model).

We're adding AA-LCR to the Artificial Analysis Intelligence Index, taking the version number to v2.2. Artificial Analysis Intelligence Index v2.2 now includes: MMLU-Pro, GPQA Diamond, AIME 2025, IFBench, LiveCodeBench, SciCode, and AA-LCR.

Link to dataset: https://huggingface.co/datasets/ArtificialAnalysis/AA-LCR
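For readers who want to inspect the benchmark data referenced in the post, the dataset is public on the Hugging Face Hub. Below is a minimal sketch of pulling it down with the `datasets` library; the configuration and split names are not stated in the post, so treat them as assumptions and check the dataset card for the actual layout.

# Minimal sketch (assumptions noted above): load the AA-LCR dataset from the Hub.
from datasets import load_dataset

# Dataset ID taken from the post; default configuration assumed.
aa_lcr = load_dataset("ArtificialAnalysis/AA-LCR")

print(aa_lcr)                      # list the splits and columns actually available
split_name = next(iter(aa_lcr))    # don't hard-code a split name; take whatever exists
print(aa_lcr[split_name][0])       # inspect one record of the benchmark

Since each question draws on roughly 100k input tokens, any model evaluated this way needs at least a 128K context window, as the post notes.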
new activity 2 months ago in Qwen/Qwen3-Coder-30B-A3B-Instruct: What does `max_window_layers` do?
Organizations
jonaskuebler's activity
liked a model about 1 month ago
amazon/chronos-2 · Time Series Forecasting · 0.1B params · Updated Nov 5 · 3.98M downloads · 109 likes
liked 11 models 12 months ago
autogluon/chronos-t5-base · Time Series Forecasting · 0.2B params · Updated Oct 30 · 17.7k downloads · 5 likes
autogluon/chronos-t5-small · Time Series Forecasting · 46.2M params · Updated Oct 30 · 1.24k downloads · 5 likes
autogluon/chronos-t5-mini · Time Series Forecasting · 20.5M params · Updated Oct 30 · 54.4k downloads · 5 likes
autogluon/chronos-t5-large · Time Series Forecasting · 0.7B params · Updated Oct 30 · 32.1k downloads · 6 likes
autogluon/chronos-t5-tiny · Time Series Forecasting · 8.39M params · Updated Oct 30 · 26.9k downloads · 12 likes
autogluon/tabpfn-mix-1.0-regressor · Tabular Regression · Updated Nov 27, 2024 · 258 downloads · 14 likes
autogluon/tabpfn-mix-1.0-classifier · Tabular Classification · Updated Nov 27, 2024 · 63k downloads · 17 likes
autogluon/chronos-bolt-base · Time Series Forecasting · 0.2B params · Updated Oct 30 · 4.49M downloads · 35 likes
autogluon/chronos-bolt-mini · Time Series Forecasting · 21.2M params · Updated Oct 30 · 336k downloads · 8 likes
autogluon/chronos-bolt-small · Time Series Forecasting · 47.7M params · Updated Oct 30 · 5.55M downloads · 20 likes
autogluon/chronos-bolt-tiny · Time Series Forecasting · 8.65M params · Updated Oct 30 · 385k downloads · 13 likes