---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- RenlyH/CodeV-RL-Data
language:
- en
- zh
license: mit
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---
CodeV is a code-based visual agent trained with Tool-Aware Policy Optimization (TAPO) for faithful visual reasoning. This agentic vision-language model "thinks with images" by invoking image operations as code, addressing the unfaithful visual reasoning seen in prior models. CodeV achieves competitive accuracy while substantially increasing the rate of faithful tool use on visual search benchmarks, and it also performs strongly on multimodal reasoning and math benchmarks.
This model was presented in the paper [CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization](https://huggingface.co/papers/2511.19661).
| Code: https://github.com/RenlyH/CodeV |
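
The snippet below is a minimal inference sketch, assuming CodeV keeps the standard Qwen2.5-VL interface of its base model. The repo id `RenlyH/CodeV` is a placeholder for the actual checkpoint name on this page, and `qwen_vl_utils` is the vision preprocessing helper used in the Qwen2.5-VL documentation.

```python
# Minimal inference sketch (assumes the standard Qwen2.5-VL interface).
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "RenlyH/CodeV"  # placeholder repo id; substitute the real checkpoint

# Load the model and processor on available accelerators.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A single image-plus-question turn in the Qwen2.5-VL chat format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "Where is the red mug in this scene?"},
        ],
    }
]

# Build the prompt and extract the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate the model's reasoning trace, then strip the prompt tokens.
generated = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

Note that a plain `generate` call only produces the model's code-augmented reasoning text; actually executing the emitted image operations in a full agentic loop is handled by the tooling in the GitHub repository linked above.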