Building on HF

Daniel van Strien PRO

davanstrien

https://danielvanstrien.xyz/

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset 1 minute ago

data-is-better-together/fineweb-c-progress

updated a dataset about 5 hours ago

librarian-bots/dataset-columns

upvoted a collection about 9 hours ago

K2-V2

The collection for K2-V2 models. • 6 items • Updated about 6 hours ago • 8

upvoted a collection 4 days ago

Ministral 3

A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated 4 days ago • 110

upvoted an article 9 days ago

Curating datasets directly on the Hub

Article • 9 days ago • 22

upvoted a paper 9 days ago

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 137

upvoted 2 collections 9 days ago

DeepSeek-Math

DeepSeek Math series • 6 items • Updated 9 days ago • 44

INTELLECT-3

INTELLECT-3: A 100B+ MoE trained with large-scale RL • 4 items • Updated 7 days ago • 11

upvoted an article 10 days ago

A Guide to Hugging Face’s Papers Page

Article • 10 days ago • 8

upvoted 2 articles 11 days ago

OVHcloud on Hugging Face Inference Providers 🔥

Article • 12 days ago • 68

Continuous batching from first principles

Article • 11 days ago • 240

upvoted a paper 11 days ago

AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser

Paper • 2511.16397 • Published 16 days ago • 7

upvoted a paper 15 days ago

The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification

Paper • 2511.15622 • Published 17 days ago • 1

upvoted an article 15 days ago

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

Article • 15 days ago • 19

upvoted 2 papers 15 days ago

BBox DocVQA: A Large Scale Bounding Box Grounded Dataset for Enhancing Reasoning in Document Visual Question Answer

Paper • 2511.15090 • Published 17 days ago • 1

CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries

Paper • 2511.15131 • Published 17 days ago • 1

upvoted an article 15 days ago

We’re open-sourcing our text-to-image model and the process behind it

Article • 24 days ago • 73

upvoted a paper 15 days ago

EBind: a practical approach to space binding

Paper • 2511.14229 • Published 18 days ago • 6

upvoted 2 collections 15 days ago

E-MM1

Multimodal embedding model, supporting datasets, and a paper describing the process going into building both the datasets and the models 🤗 • 6 items • Updated 15 days ago • 10

Olmo 3

Artifacts for the Olmo 3 release. • 9 items • Updated 4 days ago • 140

upvoted an article 16 days ago

Introducing Cogito v2.1

Article • 17 days ago • 17

upvoted an article 17 days ago

Aligning to What? Rethinking Agent Generalization in MiniMax M2

Article • Oct 30 • 26