Active filters: GRPO, Text Generation
stranger47/Qwen2.5-3B-Instruct-GRPO-NuminaMath-TIR • Text Generation • 3B • Updated • 3
TharunSivamani/SmolGRPO-135M • Text Generation • 0.1B • Updated • 4
bhaveshgoel07/SmolGRPO-135M • Updated
hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora
ykarout/Phi4-ThinkMode-fp16 • Text Generation • 15B • Updated • 4
mradermacher/Phi4-ThinkMode-fp16-GGUF • 15B • Updated • 45
mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-GGUF • 1.0B • Updated • 32 • 1
mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-i1-GGUF • 1.0B • Updated • 37 • 1
alonsosilva/SmolGRPO-135M • Text Generation • 0.1B • Updated • 6
VaidikML0508/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1 • Text Generation • 3B • Updated • 3 • 1
mradermacher/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1-GGUF • 3B • Updated • 18
alfredcs/gemma-3-12b-grpo-firstaid • Updated
Thabet/SmolGRPO-135M-learning • Text Generation • 0.1B • Updated • 5
yigitkucuk/tint-interact-sft-grpo • Text Generation • 0.4B • Updated • 4
koochikoo25/SmolGRPO-135M • Text Generation • 0.1B • Updated • 5
TianheWu/VisualQuality-R1-7B • Reinforcement Learning • 8B • Updated • 14.1k • 9
pedrocurvo/llama2-grpo-lora • Text Generation • 7B • Updated • 8
mradermacher/VisualQuality-R1-7B-GGUF • 8B • Updated • 53
Ceenen2302/Llama-3.2-1B-Instruct-GRPO-SmartLed • Feature Extraction • 1B • Updated • 6
alfredcs/torchrun-gemma-3-12b-grpo-icd10pcs-merged • Text Generation • 8B • Updated • 19