Shengyi Costa Huang

vwxyzjn

http://costa.sh

AI & ML interests

None yet

vwxyzjn's collections (4)

Async RLHF Paper Checkpoints

Checkpoints for "Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models" https://arxiv.org/abs/2410.18252

vwxyzjn/online_dpo_async

Updated Feb 5 • 2
vwxyzjn/online_dpo_sync

Updated Feb 5 • 3
vwxyzjn/ppo_async

Updated Feb 5 • 2
vwxyzjn/ppo_sync

Updated Feb 5 • 4

TL;DR summarization checkpoints

The checkpoints were trained as described in https://arxiv.org/abs/2403.17031 and are taken from https://wandb.ai/costa-huang/tldr_summarize/reports/Release--Vmlldzo3MT

cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 986
cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr

Text Classification • Updated May 15, 2024 • 384
cleanrl/EleutherAI_pythia-2.8b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 121
cleanrl/EleutherAI_pythia-2.8b-deduped__reward__tldr

Text Classification • Updated May 15, 2024 • 61

lm-human-preference-details

vwxyzjn/train_policy_accelerate__sentiment_offline_5k.json__seed1__1696447674

Text Generation • 0.1B • Updated Oct 4, 2023 • 5
lm-human-preference-details/train_policy_accelerate__sentiment_offline_5k.json__seed1

Text Generation • 0.1B • Updated Oct 4, 2023 • 8

RLOO / PPOv2 TL;DR summarize checkpoints

vwxyzjn/ppo_tldr

Text Generation • 1B • Updated May 24, 2024 • 6 • 1
vwxyzjn/ppo_tldr_6.9b

Text Generation • 7B • Updated Jun 7, 2024 • 3
vwxyzjn/rloo_tldr

Text Generation • 1B • Updated Jun 11, 2024 • 10
vwxyzjn/rloo_tldr_6.9b

Text Generation • 7B • Updated Jun 7, 2024 • 3
