FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages Jul 8 • 32
Explore, Curate and Vector Search Any Hugging Face Dataset with Nomic Atlas Jan 23 • 30
FineWeb2-C: Help Build Better Language Models in Your Language Dec 23, 2024 • 21
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community +5 Dec 9, 2024 • 69
Introducing Synthetic Data Workshop: Your Gateway to Easy Synthetic Dataset Creation Jun 20, 2024 • 12
Synthetic dataset generation techniques: generating custom sentence similarity data May 23, 2024 • 16
Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? May 7, 2024 • 8
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 Mar 20, 2024 • 105
Data is better together: Enabling communities to collectively build better datasets together using Argilla and Hugging Face Spaces Mar 4, 2024 • 8
Extracting Insights from Model Cards Using Open Large Language Models Nov 27, 2023
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model +9 Aug 22, 2023 • 37