Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 151
Gated Delta Networks: Improving Mamba2 with Delta Rule Paper • 2412.06464 • Published Dec 9, 2024 • 14
TinyBERT: Distilling BERT for Natural Language Understanding Paper • 1909.10351 • Published Sep 23, 2019 • 1
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6, 2024 • 68
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights Paper • 2510.04800 • Published Oct 6 • 36
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6 • 48
BERT release Collection Regroups the original BERT models released by the Google team. Except for the models marked otherwise, the checkpoints support English. • 8 items • Updated Jul 10 • 35
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use Paper • 2508.04482 • Published Aug 6 • 9
Hidden Dynamics of Massive Activations in Transformer Training Paper • 2508.03616 • Published Aug 5 • 18
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization Paper • 2508.05731 • Published Aug 7 • 25
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 192
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Article • Mar 22, 2024 • 103
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 151
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11 • 100