Image Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 22
Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 22
Video ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published Nov 7, 2024 • 71
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published Nov 7, 2024 • 71
Vision FeatUp: A Model-Agnostic Framework for Features at Any Resolution Paper • 2403.10516 • Published Mar 15, 2024 • 16 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 98
FeatUp: A Model-Agnostic Framework for Features at Any Resolution Paper • 2403.10516 • Published Mar 15, 2024 • 16
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 98
layout & UI LMM InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30
3d generation LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 77 CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 59 Running on Zero Featured 4.78k TRELLIS 🏢 4.78k Scalable and Versatile 3D Generation from images
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 77
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 59
nerf & 3dgs Revising Densification in Gaussian Splatting Paper • 2404.06109 • Published Apr 9, 2024 • 9
Image Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 22
Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published Dec 2, 2024 • 22
3d generation LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 77 CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 59 Running on Zero Featured 4.78k TRELLIS 🏢 4.78k Scalable and Versatile 3D Generation from images
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 77
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 59
Video ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published Nov 7, 2024 • 71
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published Nov 7, 2024 • 71
Vision FeatUp: A Model-Agnostic Framework for Features at Any Resolution Paper • 2403.10516 • Published Mar 15, 2024 • 16 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 98
FeatUp: A Model-Agnostic Framework for Features at Any Resolution Paper • 2403.10516 • Published Mar 15, 2024 • 16
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 98
nerf & 3dgs Revising Densification in Gaussian Splatting Paper • 2404.06109 • Published Apr 9, 2024 • 9
layout & UI LMM InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30