Collections including paper arxiv:2505.00949

-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 192
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 38
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 64
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 272
-
The Leaderboard Illusion
Paper • 2504.20879 • Published • 72
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 73
LLMs for Engineering: Teaching Models to Design High Powered Rockets
Paper • 2504.19394 • Published • 14
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
Paper • 2504.19056 • Published • 18
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
Text Generation • 50B • Updated • 449k • 217
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8
Text Generation • 50B • Updated • 2.74k • 21
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
Text Generation • 253B • Updated • 112k • 339
nvidia/Llama-3_3-Nemotron-Super-49B-v1
Text Generation • 50B • Updated • 14.7k • 320
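
The entries above are text-generation checkpoints hosted on the Hugging Face Hub. As a minimal sketch (not part of the original collection), the snippet below shows how one of them could be loaded with the standard transformers AutoModelForCausalLM workflow; the dtype, device_map, and trust_remote_code settings are assumptions and may need adjusting for the custom Nemotron architecture and your hardware.

```python
# Minimal sketch: load one of the Nemotron checkpoints listed above with transformers.
# Assumptions: bfloat16 weights, automatic device placement, and trust_remote_code=True
# for the custom Nemotron architecture; adjust for your hardware and security policy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Simple chat-style generation using the model's chat template.
messages = [{"role": "user", "content": "Summarize the Nemotron model family in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```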
-
Human-like Episodic Memory for Infinite Context LLMs
Paper • 2407.09450 • Published • 62
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
Paper • 2407.09435 • Published • 23
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Paper • 2407.09121 • Published • 6
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Paper • 2407.14482 • Published • 26
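
All of the collections above were surfaced because they contain paper arxiv:2505.00949. As a hedged sketch (not from the original page), the snippet below shows how such collections could be queried programmatically with huggingface_hub; the "papers/<arxiv-id>" item-filter format is an assumption about the library's collections API, so check the documentation for your installed version.

```python
# Sketch: list Hub collections that contain a given paper, using huggingface_hub.
# Assumption: the `item` filter accepts a "papers/<arxiv id>" string; verify the
# exact format supported by your huggingface_hub version.
from huggingface_hub import list_collections

for collection in list_collections(item="papers/2505.00949", limit=10):
    print(collection.title, "-", collection.slug)
```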