Models: 158

Active filters: GRPO

tobrun/SmolLM2-135M-GRPO

Text Generation • 0.1B • Updated Mar 15, 2025 • 6

stranger47/Qwen2.5-3B-Instruct-GRPO-NuminaMath-TIR

Text Generation • 3B • Updated Mar 16, 2025 • 3

TharunSivamani/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 16, 2025 • 4

frascuchon/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 17, 2025 • 5

bhaveshgoel07/SmolGRPO-135M

Updated Mar 18, 2025

Arushhh/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 24, 2025 • 3

hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora

Updated Mar 24, 2025 • 1

ykarout/Phi4-ThinkMode-fp16

Text Generation • 15B • Updated Mar 27, 2025 • 4

mradermacher/Phi4-ThinkMode-fp16-GGUF

15B • Updated Jul 11, 2025 • 45

czuo03/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 28, 2025 • 5

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-GGUF

1.0B • Updated Jul 11, 2025 • 32 • 1

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-i1-GGUF

1.0B • Updated Jul 11, 2025 • 37 • 1

opria123/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 6, 2025 • 3

alonsosilva/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 8, 2025 • 6

VaidikML0508/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1

Text Generation • 3B • Updated Apr 22, 2025 • 3 • 1

mradermacher/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1-GGUF

3B • Updated Jul 11, 2025 • 18

alfredcs/gemma-3-12b-grpo-firstaid

Updated Apr 24, 2025

garethpaul/SmolGRPO-135M

Text Generation • 0.1B • Updated May 8, 2025 • 5

Thabet/SmolGRPO-135M-learning

Text Generation • 0.1B • Updated May 10, 2025 • 5

jcollado/SmolGRPO-135M

Text Generation • 0.1B • Updated May 14, 2025 • 5

Brianpuz/SmolGRPO-135M

Text Generation • 0.1B • Updated May 19, 2025 • 5

yigitkucuk/tint-interact-sft-grpo

Text Generation • 0.4B • Updated May 19, 2025 • 4

koochikoo25/SmolGRPO-135M

Text Generation • 0.1B • Updated May 20, 2025 • 5

jackle33/SmolGRPO-135M

Text Generation • 0.1B • Updated May 22, 2025 • 3

TianheWu/VisualQuality-R1-7B

Reinforcement Learning • 8B • Updated Sep 19, 2025 • 14.1k • 9

pedrocurvo/llama2-grpo-lora

Text Generation • 7B • Updated May 26, 2025 • 8

mradermacher/VisualQuality-R1-7B-GGUF

8B • Updated Jul 31, 2025 • 53

HuangXinBa/GRPO

Text Generation • 0.1B • Updated May 28, 2025 • 10 • 1

Ceenen2302/Llama-3.2-1B-Instruct-GRPO-SmartLed

Feature Extraction • 1B • Updated Jun 3, 2025 • 6

alfredcs/torchrun-gemma-3-12b-grpo-icd10pcs-merged

Text Generation • 8B • Updated Jun 4, 2025 • 19