---
license: apache-2.0
base_model:
- mistralai/Ministral-3-3B-Instruct-2512
language:
- en
- fr
- es
- de
- it
- pt
- nl
- zh
- ja
- ko
- ar
---
# Ministral 3 3B Instruct 2512

The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.

This model is the instruct post-trained version, fine-tuned for instruction tasks, which makes it ideal for chat and instruction-based use cases.

The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware. Ministral 3 3B can even be deployed locally: it fits in 16GB of VRAM in BF16 and in less than 8GB of RAM/VRAM when quantized.

We provide a no-loss FP8 version [here](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512-FP8). You can find other formats and quantizations in the [Ministral 3 - Quants](https://huggingface.co/collections/mistralai/ministral-3-quants) collection.

## Key Features

Ministral 3 3B consists of two main architectural components:
- **3.4B Language Model**
- **0.4B Vision Encoder**
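
The memory figures quoted in the introduction can be sanity-checked from these two parameter counts. The sketch below estimates weight storage only, assuming 2 bytes per parameter for BF16, 1 byte for FP8 and half a byte for 4-bit quantization. It deliberately ignores activations, the KV cache, runtime overhead and quantization metadata (scales, zero-points), so real memory use is higher.

```python
# Rough weight-memory estimate: parameters x bits per parameter / 8.
# Assumption: weights only; activations, the KV cache, runtime overhead and
# quantization metadata (scales, zero-points) are not counted.

LANGUAGE_MODEL_PARAMS = 3_400_000_000  # 3.4B language model
VISION_ENCODER_PARAMS = 400_000_000    # 0.4B vision encoder


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param // 8


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


total_params = LANGUAGE_MODEL_PARAMS + VISION_ENCODER_PARAMS
for name, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: {to_gib(weight_bytes(total_params, bits)):.2f} GiB")
```

This prints about 7.08 GiB for BF16, 3.54 GiB for FP8 and 1.77 GiB at 4 bits: the weights alone stay well inside the 16GB (BF16) and sub-8GB (quantized) budgets, leaving headroom for activations and the cache.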

The Ministral 3 3B Instruct model offers the following capabilities:
- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- **System Prompt**: Maintains strong adherence to, and support for, system prompts.
- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON output.
- **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.
- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- **Large Context Window**: Supports a 256k context window.
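
On edge hardware, a long context is mostly a memory question, because the key/value cache grows linearly with sequence length. The function below uses the standard estimate of 2 (keys and values) × layers × KV heads × head dim × tokens × bytes per element. The configuration values in the example loop are hypothetical placeholders, not Ministral 3 3B's actual configuration; the real ones can be read from `config.text_config` (`num_hidden_layers`, `num_key_value_heads`, `head_dim`), as the ONNX example in the Usage section does. 256k is taken here as 262,144 tokens.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int, seq_len: int,
                   bytes_per_elem: int = 4, batch_size: int = 1) -> int:
    """Total key/value cache size in bytes.

    Each layer stores one key and one value tensor of shape
    [batch_size, num_kv_heads, seq_len, head_dim]. The default of 4 bytes per
    element matches the float32 cache used in the ONNX example.
    """
    elems_per_layer = 2 * batch_size * num_kv_heads * seq_len * head_dim
    return num_layers * elems_per_layer * bytes_per_elem


# HYPOTHETICAL configuration, for illustration only (not Ministral 3 3B's real values).
EXAMPLE_CONFIG = dict(num_layers=24, num_kv_heads=8, head_dim=128)

for seq_len in (4_096, 32_768, 262_144):  # 262,144 tokens = 256k
    gib = kv_cache_bytes(**EXAMPLE_CONFIG, seq_len=seq_len, bytes_per_elem=2) / 2**30
    print(f"{seq_len:>7} tokens: {gib:5.2f} GiB with a 16-bit cache")
```

With these placeholder values, a full 256k-token cache alone would take 24 GiB at 16 bits. This illustrates why, on small devices, the usable context may be bounded by cache memory rather than by the model's maximum context window.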

### Use Cases

Ideal for lightweight, real-time applications on edge or low-resource devices, such as:
- Image captioning
- Text classification
- Efficient real-time translation
- Data extraction
- Short content generation
- Fine-tuning and specialization
- And more...

Ministral 3 3B brings advanced AI capabilities to edge and distributed environments, including embedded systems.
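
For data extraction, a common pattern is to ask the model for JSON and then parse its reply defensively, because chat models sometimes wrap JSON in a Markdown code fence or surround it with prose. The helper below is a hypothetical, standard-library-only sketch of that parsing step; `extract_json` is not part of any Mistral or Transformers API.

```python
import json


def extract_json(reply: str):
    """Return the first JSON object or array found in a model reply.

    Tries the whole reply first, then scans for the first '{' or '[' from which
    json.JSONDecoder.raw_decode can parse a complete value (trailing text is
    ignored). Raises ValueError if nothing parses.
    """
    text = reply.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text[start:])
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here; keep scanning
    raise ValueError("no JSON value found in reply")


reply = 'Sure! {"invoice": "A-17", "total": 42.5} Anything else?'
print(extract_json(reply))
```

Because the scan starts at the first brace or bracket, replies that wrap the JSON in a fenced code block are handled the same way as replies with surrounding prose.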

## Ministral 3 Family

| Model Name                     | Type               | Precision | Link                                                                                     |
|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
| Ministral 3 3B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512)                |
| **Ministral 3 3B Instruct 2512**   | **Instruct post-trained** | **BF16**   | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512)            |
| Ministral 3 3B Reasoning 2512  | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512)           |
| Ministral 3 8B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512)                |
| Ministral 3 8B Instruct 2512   | Instruct post-trained | BF16    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)            |
| Ministral 3 8B Reasoning 2512  | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)           |
| Ministral 3 14B Base 2512      | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512)               |
| Ministral 3 14B Instruct 2512  | Instruct post-trained | BF16    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512)           |
| Ministral 3 14B Reasoning 2512 | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512)          |

Other formats are available in the [Ministral 3 - Quants](https://huggingface.co/collections/mistralai/ministral-3-quants) collection.

## Benchmark Results

We compare Ministral 3 to similarly sized models. Underlined values mark the best score within each size group.

### Reasoning

| Model                     | AIME25      | AIME24      | GPQA Diamond | LiveCodeBench |
|---------------------------|-------------|-------------|--------------|---------------|
| **Ministral 3 14B**       | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u>  |
| Qwen3-14B (Thinking)      | 0.737       | 0.837       | 0.663        | 0.593         |
|                           |             |             |              |               |
| **Ministral 3 8B**        | 0.787       | <u>0.860</u>| 0.668        | <u>0.616</u>  |
| Qwen3-VL-8B-Thinking      | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580         |
|                           |             |             |              |               |
| **Ministral 3 3B**        | <u>0.721</u>| <u>0.775</u>| 0.534        | <u>0.548</u>  |
| Qwen3-VL-4B-Thinking      | 0.697       | 0.729       | <u>0.601</u> | 0.513         |

### Instruct

| Model                     | Arena Hard  | WildBench  | MATH Maj@1  | MM MTBench       |
|---------------------------|-------------|------------|-------------|------------------|
| **Ministral 3 14B**       | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u>      |
| Qwen3 14B (Non-Thinking)  | 0.427       | 65.1       | 0.870       | NOT MULTIMODAL   |
| Gemma3-12B-Instruct       | 0.436       | 63.2       | 0.854       | 6.70             |
|                           |             |            |             |                  |
| **Ministral 3 8B**        | 0.509       | <u>66.8</u>| 0.876       | <u>8.08</u>      |
| Qwen3-VL-8B-Instruct      | <u>0.528</u>| 66.3       | <u>0.946</u>| 8.00             |
|                           |             |            |             |                  |
| **Ministral 3 3B**        | 0.305       | <u>56.8</u>| 0.830       | 7.83             |
| Qwen3-VL-4B-Instruct      | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u>      |
| Qwen3-VL-2B-Instruct      | 0.163       | 42.2       | 0.786       | 6.36             |
| Gemma3-4B-Instruct        | 0.318       | 49.1       | 0.759       | 5.23             |
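
As a worked check, the sketch below takes the scores for the smallest instruct group from the table above and recomputes which model has the top score in each column. Ties count as shared wins, and the result matches the underlined entries. The scores are copied verbatim from the table; the dictionary layout and the `best_in_group` helper are only for illustration.

```python
# Scores copied from the Instruct table above (smallest size group).
scores = {
    "Ministral 3 3B":       {"Arena Hard": 0.305, "WildBench": 56.8, "MATH Maj@1": 0.830, "MM MTBench": 7.83},
    "Qwen3-VL-4B-Instruct": {"Arena Hard": 0.438, "WildBench": 56.8, "MATH Maj@1": 0.900, "MM MTBench": 8.01},
    "Qwen3-VL-2B-Instruct": {"Arena Hard": 0.163, "WildBench": 42.2, "MATH Maj@1": 0.786, "MM MTBench": 6.36},
    "Gemma3-4B-Instruct":   {"Arena Hard": 0.318, "WildBench": 49.1, "MATH Maj@1": 0.759, "MM MTBench": 5.23},
}


def best_in_group(scores: dict) -> dict:
    """For each benchmark, return the sorted list of models sharing the top score."""
    benchmarks = next(iter(scores.values())).keys()
    winners = {}
    for bench in benchmarks:
        top = max(model_scores[bench] for model_scores in scores.values())
        winners[bench] = sorted(model for model, s in scores.items() if s[bench] == top)
    return winners


for bench, models in best_in_group(scores).items():
    print(f"{bench}: {', '.join(models)}")
```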

### Base

| Model               | Multilingual MMLU | MATH CoT 2-shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|
| **Ministral 3 14B** | 0.742             | <u>0.676</u>    | 0.648          | 0.820             | 0.794       | 0.749           |
| Qwen3 14B Base      | <u>0.754</u>      | 0.620           | <u>0.661</u>   | <u>0.837</u>      | <u>0.804</u>| 0.703           |
| Gemma 3 12B Base    | 0.690             | 0.487           | 0.587          | 0.766             | 0.745       | <u>0.788</u>    |
|                     |                   |                 |                |                   |             |                 |
| **Ministral 3 8B**  | <u>0.706</u>      | <u>0.626</u>    | 0.591          | 0.793             | <u>0.761</u>| <u>0.681</u>    |
| Qwen 3 8B Base      | 0.700             | 0.576           | <u>0.596</u>   | <u>0.794</u>      | 0.760       | 0.639           |
|                     |                   |                 |                |                   |             |                 |
| **Ministral 3 3B**  | 0.652             | <u>0.601</u>    | 0.511          | 0.735             | 0.707       | 0.592           |
| Qwen 3 4B Base      | <u>0.677</u>      | 0.405           | <u>0.570</u>   | <u>0.759</u>      | <u>0.713</u>| 0.530           |
| Gemma 3 4B Base     | 0.516             | 0.294           | 0.430          | 0.626             | 0.589       | <u>0.640</u>    |

## Usage

### ONNXRuntime
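
The example below downloads each ONNX graph and its external-data files by exact filename. The q4 decoder's weights, for instance, are split across `decoder_model_merged_q4.onnx_data` and `decoder_model_merged_q4.onnx_data_1`, and ONNX Runtime resolves those files relative to the `.onnx` graph, so they must be downloaded into the same folder. To avoid hard-coding the number of shards, you can select a graph's files from the repository listing instead. The helper below is a hypothetical sketch: `files_for_graph` is not a library function, and it assumes the `onnx/<name>.onnx` plus `onnx/<name>.onnx_data*` naming used in this repository.

```python
import re


def files_for_graph(repo_files: list, graph_name: str, subfolder: str = "onnx") -> list:
    """Select the .onnx graph and every external-data shard belonging to it.

    Assumes the naming scheme '<subfolder>/<graph_name>.onnx' for the graph and
    '<subfolder>/<graph_name>.onnx_data', '<subfolder>/<graph_name>.onnx_data_1', ...
    for its weights. Other graphs (even ones sharing a name prefix) are excluded.
    """
    pattern = re.compile(rf"{re.escape(subfolder)}/{re.escape(graph_name)}\.onnx(_data(_\d+)?)?")
    return sorted(f for f in repo_files if pattern.fullmatch(f))
```

The repository listing can come from `huggingface_hub.list_repo_files(model_id)`, and each selected filename (which already includes the `onnx/` folder) can be passed to `hf_hub_download(model_id, filename)`; the path returned for the `.onnx` file is the one to give to `onnxruntime.InferenceSession`.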

```py
from transformers import AutoConfig, AutoProcessor
import onnxruntime
import numpy as np
from huggingface_hub import hf_hub_download

# 1. Load config, processor, and model
model_id = "mistralai/Ministral-3-3B-Instruct-2512-ONNX"
config = AutoConfig.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
vision_model_path = hf_hub_download(model_id, "vision_encoder_q4.onnx", subfolder="onnx")         # Download vision graph
hf_hub_download(model_id, "vision_encoder_q4.onnx_data", subfolder="onnx")                        # Download vision weights
embed_model_path = hf_hub_download(model_id, "embed_tokens_fp16.onnx", subfolder="onnx")          # Download embed_tokens graph
hf_hub_download(model_id, "embed_tokens_fp16.onnx_data", subfolder="onnx")                        # Download embed_tokens weights
decoder_model_path = hf_hub_download(model_id, "decoder_model_merged_q4.onnx", subfolder="onnx")  # Download decoder graph
hf_hub_download(model_id, "decoder_model_merged_q4.onnx_data", subfolder="onnx")                  # Download decoder weights (1/2)
hf_hub_download(model_id, "decoder_model_merged_q4.onnx_data_1", subfolder="onnx")                # Download decoder weights (2/2)

## Load sessions
# The .onnx_data files are resolved relative to their .onnx graph; hf_hub_download keeps them in the same folder.
# With onnxruntime-gpu installed, ['CUDAExecutionProvider', 'CPUExecutionProvider'] can be used instead.
providers = ['CPUExecutionProvider']
vision_session = onnxruntime.InferenceSession(vision_model_path, providers=providers)
embed_session = onnxruntime.InferenceSession(embed_model_path, providers=providers)
decoder_session = onnxruntime.InferenceSession(decoder_model_path, providers=providers)

## Set config values
text_config = config.text_config
num_key_value_heads = text_config.num_key_value_heads
head_dim = text_config.head_dim
num_hidden_layers = text_config.num_hidden_layers
eos_token_id = text_config.eos_token_id  # may be a single id or a list of ids
image_token_index = config.image_token_index

# 2. Prepare inputs
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image", "url": image_url},
        ],
    },
]
# return_tensors="pt" requires PyTorch; the tensors are converted to NumPy right away
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt")
input_ids = inputs['input_ids'].numpy()
attention_mask = inputs['attention_mask'].numpy()
pixel_values = inputs['pixel_values'].numpy()
batch_size = input_ids.shape[0]
# Empty KV cache: sequence length 0 for every layer's key and value
past_key_values = {
    f'past_key_values.{layer}.{kv}': np.zeros([batch_size, num_key_value_heads, 0, head_dim], dtype=np.float32)
    for layer in range(num_hidden_layers)
    for kv in ('key', 'value')
}
# int64 explicitly, since np.arange defaults to int32 on Windows with NumPy < 2
position_ids = np.tile(np.arange(0, input_ids.shape[-1], dtype=np.int64), (batch_size, 1))

# 3. Generation loop (greedy decoding)
max_new_tokens = 1024
generated_tokens = np.zeros((batch_size, 0), dtype=np.int64)
image_features = None
for i in range(max_new_tokens):
    inputs_embeds = embed_session.run(None, {'input_ids': input_ids})[0]
    if image_features is None:
        ## Only compute vision features once, on the first (prefill) step
        image_features = vision_session.run(None, dict(
            pixel_values=pixel_values,
        ))[0]
        ## Merge text and vision embeddings: each image placeholder token receives one image feature vector
        inputs_embeds[input_ids == image_token_index] = image_features.reshape(-1, image_features.shape[-1])
    logits, *present_key_values = decoder_session.run(None, dict(
        inputs_embeds=inputs_embeds,
        attention_mask=attention_mask,
        position_ids=position_ids,
        **past_key_values,
    ))
    ## Update values for next generation loop
    input_ids = logits[:, -1].argmax(-1, keepdims=True)
    attention_mask = np.concatenate([attention_mask, np.ones((batch_size, 1), dtype=attention_mask.dtype)], axis=-1)
    position_ids = position_ids[:, -1:] + 1
    # Assumes the decoder's present.* outputs follow the same layer/key/value order as past_key_values
    for j, key in enumerate(past_key_values):
        past_key_values[key] = present_key_values[j]
    generated_tokens = np.concatenate([generated_tokens, input_ids], axis=-1)
    # np.isin handles eos_token_id being either an int or a list of ints
    if np.isin(input_ids, eos_token_id).all():
        break
    ## (Optional) Streaming
    print(processor.decode(input_ids[0]), end='', flush=True)
print()

# 4. Output result
print(processor.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
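
The loop above decodes greedily: `logits[:, -1].argmax(-1, keepdims=True)` always picks the most likely token. For more varied output, a common alternative is temperature scaling combined with top-k sampling. The function below is a minimal NumPy sketch of that technique; it is not something the ONNX export provides, and the default `temperature` and `top_k` values are arbitrary. It returns ids with the same `(batch_size, 1)` shape and int64 dtype as the `argmax` line, so it could take that line's place.

```python
from typing import Optional

import numpy as np


def sample_next_token(last_logits: np.ndarray, temperature: float = 0.7, top_k: int = 50,
                      rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Temperature + top-k sampling.

    last_logits: float array of shape (batch_size, vocab_size), e.g. logits[:, -1].
    Returns next-token ids of shape (batch_size, 1) with dtype int64.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Dividing by the temperature sharpens (< 1) or flattens (> 1) the distribution
    scaled = last_logits.astype(np.float64) / max(temperature, 1e-6)
    k = min(top_k, scaled.shape[-1])
    next_ids = np.empty((scaled.shape[0], 1), dtype=np.int64)
    for row, row_logits in enumerate(scaled):
        top = np.argpartition(row_logits, -k)[-k:]                  # indices of the k largest logits
        weights = np.exp(row_logits[top] - row_logits[top].max())  # numerically stable softmax
        next_ids[row, 0] = rng.choice(top, p=weights / weights.sum())
    return next_ids
```

In the generation loop, `input_ids = sample_next_token(logits[:, -1])` would replace the `argmax` line; passing `rng=np.random.default_rng(seed)` makes the output reproducible. With `top_k=1` the function reduces to greedy decoding.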

### Transformers.js

TODO

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*