Po Hsiang Yu
EasyMoneySniper66
Multi-modality LVM Datasets
- MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
  Paper • 2406.11833 • Published • 63
- Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
  Paper • 2406.11230 • Published • 33
- Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models
  Paper • 2406.14035 • Published • 13
- Needle In A Multimodal Haystack
  Paper • 2406.07230 • Published • 54
Long Context
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
  Paper • 2408.07055 • Published • 67
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 52
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
  Paper • 2408.15518 • Published • 42
Multi-modality LVM
- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 31
- TroL: Traversal of Layers for Large Language and Vision Models
  Paper • 2406.12246 • Published • 35
- Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
  Paper • 2406.15334 • Published • 9
- Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
  Paper • 2406.12742 • Published • 15
Multimodality Video LVM
- Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
  Paper • 2407.12679 • Published • 8
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
  Paper • 2407.15841 • Published • 40
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 52
dLLMs
- Fast-dLLM v2: Efficient Block-Diffusion LLM
  Paper • 2509.26328 • Published • 54
- Attention Is All You Need for KV Cache in Diffusion LLMs
  Paper • 2510.14973 • Published • 39
- Attention Sinks in Diffusion Language Models
  Paper • 2510.15731 • Published • 48
- Diffusion Language Models are Super Data Learners
  Paper • 2511.03276 • Published • 124