LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Paper • 2506.21862 • Published Jun 27 • 36 • 3
ViSpeak: Visual Instruction Feedback in Streaming Videos Paper • 2503.12769 • Published Mar 17 • 8 • 2