- LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
  Paper • 2508.03694 • Published • 50
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
  Paper • 2508.05629 • Published • 180
- Improving Video Generation with Human Feedback
  Paper • 2501.13918 • Published • 52
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 122
Collections including paper arxiv:2501.13918
- One-Minute Video Generation with Test-Time Training
  Paper • 2504.05298 • Published • 110
- MoCha: Towards Movie-Grade Talking Character Synthesis
  Paper • 2503.23307 • Published • 138
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 157
- Antidistillation Sampling
  Paper • 2504.13146 • Published • 59
- tencent/HunyuanVideo
  Text-to-Video • Updated • 1.17k • 2.08k
- SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
  Paper • 2412.02687 • Published • 113
- STIV: Scalable Text and Image Conditioned Video Generation
  Paper • 2412.07730 • Published • 74
- Improving Video Generation with Human Feedback
  Paper • 2501.13918 • Published • 52
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 18
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  Paper • 2504.01990 • Published • 300
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 303
- What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
  Paper • 2503.24235 • Published • 54
- Seedream 3.0 Technical Report
  Paper • 2504.11346 • Published • 70
- Evolving Deeper LLM Thinking
  Paper • 2501.09891 • Published • 115
- PaSa: An LLM Agent for Comprehensive Academic Paper Search
  Paper • 2501.10120 • Published • 53
- Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
  Paper • 2501.09775 • Published • 33
- ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
  Paper • 2501.10132 • Published • 22
- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 30
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 132
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 41