Model Zoo
Note
- For all pretraining and finetuning, we adopt sparse/uniform sampling.
- `#Frame` $=$ `#input_frame` $\times$ `#crop` $\times$ `#clip`
  - `#input_frame` is the number of frames fed to the model per inference.
  - `#crop` is the number of spatial crops (e.g., 3 for left/right/center).
  - `#clip` is the number of temporal clips (e.g., 4 means sampling four clips repeatedly with different start indices).
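As a rough illustration of the `#Frame` notation used in the tables below, the following sketch splits a value such as `8x3x4` into its three factors and computes the total number of frames processed per video. It also shows one possible way to draw sparse, uniformly spaced frame indices for several clips with different start offsets. The function names (`parse_frame_spec`, `frames_per_video`, `sparse_clip_indices`) and the evenly spaced offset scheme are assumptions made for illustration; they are not taken from the InternVideo2 code base, whose sampler may differ.

```python
from typing import List, Tuple


def parse_frame_spec(spec: str) -> Tuple[int, int, int]:
    """Split a '#Frame' string such as '8x3x4' into (input_frames, crops, clips)."""
    input_frames, crops, clips = (int(part) for part in spec.lower().split("x"))
    return input_frames, crops, clips


def frames_per_video(spec: str) -> int:
    """Total frames seen per video: input_frames * crops * clips."""
    input_frames, crops, clips = parse_frame_spec(spec)
    return input_frames * crops * clips


def sparse_clip_indices(total_frames: int, input_frames: int, clips: int) -> List[List[int]]:
    """Hypothetical sparse/uniform sampler (an assumption, not the repository's code).

    The video is divided into `input_frames` equal segments and each clip takes
    one frame per segment. Clips differ only in their offset inside a segment,
    so every clip spans the whole video but starts at a different index.
    """
    if total_frames <= 0 or input_frames <= 0 or clips <= 0:
        raise ValueError("all arguments must be positive")
    segment = total_frames / input_frames  # segment length in frames (may be fractional)
    all_clips = []
    for c in range(clips):
        # Spread the clip offsets evenly across one segment.
        offset = segment * (c + 0.5) / clips
        indices = [
            min(int(i * segment + offset), total_frames - 1)
            for i in range(input_frames)
        ]
        all_clips.append(indices)
    return all_clips


if __name__ == "__main__":
    print(frames_per_video("8x3x4"))          # 96
    print(sparse_clip_indices(64, 8, 4)[0])   # [1, 9, 17, 25, 33, 41, 49, 57]
```

Under this reading, a setting of `8x3x4` means 3 × 4 = 12 forward passes of 8 frames each, or 96 frames per video in total.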
Pretraining
| Model | Setting | Checkpoint | Shell |
|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash-1.1M 300e | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash-2M 300e | TBD | run.sh |
Finetuning
K710
| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 87.6 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 88.1 | TBD | run.sh |
K400
| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.3 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.9 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 92.1 | TBD | run.sh |
K600
| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.4 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 91.9 | TBD | run.sh |
K700
| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 85.0 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 85.4 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 85.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 85.9 | TBD | run.sh |
MiT V1
| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 50.8 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.0 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B 336↑ | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.2 | TBD | run.sh |
SthSth V1
| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 68.5 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 69.7 | TBD | run.sh |
SthSth V2
| Model | Setting | #Frame | Top-1 | Checkpoint | Shell |
|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 77.1 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 77.5 | TBD | run.sh |
ANet
| Model | Setting | #Frame | Top-1 | mAP | Checkpoint | Shell |
|---|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 95.9 | 98.2 | TBD | run.sh |
HACS
| Model | Setting | #Frame | Top-1 | mAP | Checkpoint | Shell |
|---|---|---|---|---|---|---|
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 97.0 | 98.8 | TBD | run.sh |