Akshay Nuthanapati

a0308

AI & ML interests

Neural Networks, Large Language Models

Recent Activity

upvoted an article 7 days ago

Deriving the PPO Loss from First Principles

new activity 21 days ago

huggingface/InferenceSupport:haykgrigorian/v2mini-eval1

upvoted an article about 1 month ago

Exploring Quantization Backends in Diffusers

Organizations

None yet

upvoted an article 7 days ago

Deriving the PPO Loss from First Principles

Article • 8 days ago

upvoted 3 articles about 1 month ago

Exploring Quantization Backends in Diffusers

Article • May 21, 2025

Diffusers welcomes FLUX-2

Article • Nov 25, 2025 • 167

Continuous batching from first principles

Article • Nov 25, 2025 • 291

upvoted 6 articles 2 months ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16, 2024 • 436

Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face

Article • Feb 11, 2025

KV Cache from scratch in nanoVLM

Article • Jun 4, 2025 • 107

Proximal Policy Optimization (PPO)

Article • Aug 5, 2022

KV Caching Explained: Optimizing Transformer Inference Efficiency

Article • Jan 30, 2025 • 209

Get your VLM running in 3 simple steps on Intel CPUs

Article • Oct 15, 2025

upvoted a paper 2 months ago

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 97

upvoted 2 articles 3 months ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Article • Mar 12, 2025 • 480

Vision Language Models (Better, faster, stronger)

Article • May 12, 2025 • 580

upvoted a paper 3 months ago

D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published May 29, 2025 • 34

upvoted 6 articles 3 months ago

Introducing Würstchen: Fast Diffusion for Image Generation

Article • Sep 13, 2023

How 🤗 Accelerate runs very large models thanks to PyTorch

Article • Sep 27, 2022

From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease

Article • Oct 21, 2022

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Article • Oct 7, 2024

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7, 2025 • 267

There is no such thing as a tokenizer-free lunch

Article • Sep 25, 2025
