---
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
---

<p align="center">
  <a href="https://arxiv.org/abs/2512.02924"><img src="https://img.shields.io/badge/📄%20arXiv-2512.02924-b31b1b?style=for-the-badge" alt="arXiv"></a>
  <a href="https://discord.com/invite/nexa-ai"><img src="https://img.shields.io/badge/💬%20Discord-Nexa%20AI-5865F2?style=for-the-badge" alt="Discord"></a>
  <a href="https://x.com/nexa_ai"><img src="https://img.shields.io/badge/𝕏%20Twitter-nexa__ai-000000?style=for-the-badge" alt="Twitter"></a>
</p>

<p align="center">
  <a href="https://github.com/NexaAI/nexa-sdk/edit/main/solutions/autoneural/README.md"><b>🌟 Github</b></a> |
  <a href="https://nexa.ai/solution/intelligent-cockpit"><b>📄 Webpage</b></a>
</p>

# AutoNeural-VL-1.5B

## **Introduction**

Key Features:

- **Automotive-tuned dataset** – Trained with **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
- **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
## Use Cases

AutoNeural powers real-time cockpit intelligence including **in-cabin detection**

---

## ⚡ **Benchmarks**
Validated on **Qualcomm SA8295P NPU**:
# **How to Use**

> ⚠️ **Hardware requirement:** AutoNeural is only available for **Qualcomm NPUs**.

### 1) Install Nexa-SDK

Download the SDK, then follow the installation steps provided on the model page.

---

### 2) Configure authentication

Create an access token in the Model Hub, then run:

```bash
nexa config set license '<access_token>'
```

---

### 3) Run the model

```bash
nexa infer NexaAI/AutoNeural
```

### 4) Image input

Drag and drop one or more image files into the terminal window.
Multiple images can be processed with a single query.

---

<img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
### 🔍 **MobileNetV5 Vision Encoder (300M)**
Optimized for edge hardware, with:

* **Depthwise separable convolutions** for low compute and bounded activations.
* **Local attention bottlenecks** only in late stages for efficient long-range reasoning.
* **Multi-Scale Fusion Adapter (MSFA)** producing a compact **16×16×2048** feature map.
* Stable **INT8/16** behavior with minimal post-quantization degradation.

Yields **5.8× – 14× speedups** over ViT baselines across 256–768 px inputs.
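
The compute saving behind these speedups can be sanity-checked with simple arithmetic. A minimal sketch in pure Python, using hypothetical layer dimensions (not the actual MobileNetV5 layer sizes, which are not listed here):

```python
# FLOP comparison: standard vs. depthwise separable convolution.
# All layer dimensions below are hypothetical, chosen only to show the scaling.

def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a standard k x k convolution."""
    return h * w * c_in * c_out * k * k

def dw_separable_macs(h, w, c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Hypothetical mid-network layer: 64x64 spatial, 256 -> 256 channels, 3x3 kernel.
std = conv_macs(64, 64, 256, 256, 3)
sep = dw_separable_macs(64, 64, 256, 256, 3)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

The ratio approaches `1 / (1/c_out + 1/k²)`, so wide layers benefit most; measured on-device speedups such as the range above also depend on memory layout and the NPU's kernel support.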
---

### 🧠 **Hybrid SSM–Transformer Language Backbone**

Designed for NPU memory hierarchies:

* **5:1 ratio of SSM layers to Transformer attention layers**
* **Linear-time gated convolution layers** for most steps
* **Tiny rolling state** instead of KV-cache → up to **60% lower memory bandwidth**
* **W4A16 stable quantization** across layers

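
To make the 5:1 interleaving and the memory argument concrete, here is a toy sketch. The head counts, state size, and sequence length are assumptions for illustration, not the model's actual configuration:

```python
# Toy sketch of a hybrid stack: 5 linear-time SSM/gated-conv layers per
# Transformer attention layer, plus a decode-time memory comparison:
# a KV-cache grows with sequence length, a rolling SSM state does not.
# All sizes below are hypothetical.

def build_stack(n_blocks):
    """One block = five SSM layers followed by one attention layer."""
    layers = []
    for _ in range(n_blocks):
        layers += ["ssm"] * 5 + ["attention"]
    return layers

def kv_cache_bytes(seq_len, n_attn_layers, n_heads=8, head_dim=64, bytes_per=2):
    """fp16 K and V tensors cached per token, per attention layer."""
    return seq_len * n_attn_layers * n_heads * head_dim * 2 * bytes_per

def rolling_state_bytes(n_ssm_layers, state_dim=1024, bytes_per=2):
    """Fixed-size recurrent state: independent of sequence length."""
    return n_ssm_layers * state_dim * bytes_per

stack = build_stack(4)                       # 24 layers: 20 SSM + 4 attention
kv = kv_cache_bytes(4096, stack.count("attention"))
state = rolling_state_bytes(stack.count("ssm"))
print(f"{stack.count('ssm')}:{stack.count('attention')} layers, "
      f"KV-cache {kv / 1e6:.1f} MB vs rolling state {state / 1e3:.1f} KB")
```

The cache term scales with `seq_len` while the rolling state is constant, which is where a bandwidth saving of the kind claimed above comes from on long sequences.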
---

### 🔗 **Normalization-Free Vision–Language Connector**

A compact 2-layer MLP using **SiLU**, deliberately **removing RMSNorm** to avoid unstable activation ranges during static quantization.
Ensures reliable deployment on W8A16/W4A16 pipelines.
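
A connector of this shape is small enough to sketch directly. A minimal pure-Python version with toy sizes and weights (the real connector's dimensions, which match the vision encoder's output and the language model's embedding width, are assumptions here):

```python
import math

# Minimal sketch of a 2-layer MLP connector with SiLU and no normalization.
# Weights and sizes are toy values for illustration only.

def silu(x):
    """SiLU: x * sigmoid(x). Smooth and bounded below, which keeps
    activation ranges predictable for static quantization."""
    return x * (1.0 / (1.0 + math.exp(-x)))

def linear(x, weights, bias):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def connector(x, w1, b1, w2, b2):
    # fc1 -> SiLU -> fc2, with no RMSNorm anywhere in between.
    hidden = [silu(v) for v in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

# Toy 3 -> 4 -> 2 projection.
w1 = [[0.1 * (i + j + 1) for j in range(3)] for i in range(4)]
b1 = [0.0, 0.0, 0.0, 0.0]
w2 = [[0.05 * (i + 1) for _ in range(4)] for i in range(2)]
b2 = [0.0, 0.0]
print(connector([1.0, -0.5, 0.25], w1, b1, w2, b2))
```

Dropping the norm trades some training-time stability for a fixed, predictable activation range, which is what static W8A16/W4A16 calibration needs.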

---

### 🚗 **Automotive-Grade Multimodal Intelligence**

Trained on **10M Infinity-MM samples** plus **200k automotive cockpit samples**, covering:

* AI Sentinel (vehicle security)
* AI Greeter (identity recognition)
* Car Finder (parking localization)
* Passenger safety monitoring

Ensures robust performance across lighting, demographics, weather, and motion scenarios.
---

## **License**

The AutoNeural model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license.