Synthetic data derived from finepdfs
MultiSynt
community
Organization Card
MultiSynt is a collaborative initiative between OpenEuroLLM and EuroLLM focused on developing high-quality multilingual synthetic datasets for language model pretraining. By combining expertise from both organizations, MultiSynt aims to advance the creation of synthetic training data for diverse European languages and thereby enable more inclusive AI development.
models
30
MultiSynt/nemotron-cc-portuguese-opus • 18
MultiSynt/nemotron-cc-basque-opus • 54
MultiSynt/nemotron-cc-polish-opus • 12
MultiSynt/nemotron-cc-polish-tower9b • 21
MultiSynt/nemotron-cc-french-opus • 14
MultiSynt/nemotron-cc-french-tower9b • 22
MultiSynt/nemotron-cc-norwegian-tower9b • 48
MultiSynt/nemotron-cc-danish-opus • 32
MultiSynt/nemotron-cc-danish-tower9b • 55
MultiSynt/nemotron-cc-romanian-opus • 26
datasets
6
MultiSynt/MT-Nemotron-CC • 15.6B • 405
MultiSynt/MT-HPLT2c • 1.76B • 365
MultiSynt/MT-Reasoning • 82M • 102
MultiSynt/MT-Reasoning-Prompts • 399M • 252
MultiSynt/nemotron-cc-spanish-opus-qe • 3.29B • 75
MultiSynt/finepdfs-summaries • 1.57B • 301 • 1