Image-Text-to-Text
nexaml committed on
Commit 0b0ad6e · verified · 1 Parent(s): 0305d13

Update README.md

  1. README.md +20 -78
README.md CHANGED
@@ -1,3 +1,18 @@
1
  # AutoNeural-VL-1.5B
2
 
3
  ## **Introduction**
@@ -11,39 +26,6 @@ Key Features:
11
  - **Automotive-tuned dataset** – Trained with **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
12
  - **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
13
 
14
- # **How to Use**
15
-
16
- > ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.
17
-
18
- ### 1) Install Nexa-SDK
19
-
20
- Download the SDK and follow the installation steps provided on the model page.
21
-
22
- ---
23
-
24
- ### 2) Configure authentication
25
-
26
- Create an access token in the Model Hub, then run:
27
-
28
- ```bash
29
- nexa config set license '<access_token>'
30
- ```
31
-
32
- ---
33
-
34
- ### 3) Run the model
35
-
36
- ```bash
37
- nexa infer NexaAI/AutoNeural
38
- ```
39
-
40
- ### Image input
41
-
42
- Drag and drop one or more image files into the terminal window.
43
- Multiple images can be processed with a single query.
44
-
45
-
46
- ---
47
 
48
  ## Use Cases
49
 
@@ -53,7 +35,7 @@ AutoNeural powers real-time cockpit intelligence including **in-cabin detection*
53
 
54
  ---
55
 
56
- ## ⚡ **Benchmarks on NPU**
57
 
58
  Validated on **Qualcomm SA8295P NPU**:
59
 
@@ -71,13 +53,12 @@ Validated on **Qualcomm SA8295P NPU**:
71
 
72
  # **How to Use**
73
 
74
- > ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.
75
 
76
  ### 1) Install Nexa-SDK
77
 
78
  Download the SDK and follow the installation steps provided on the model page.
79
 
80
- ---
81
 
82
  ### 2) Configure authentication
83
 
@@ -87,15 +68,13 @@ Create an access token in the Model Hub, then run:
87
  nexa config set license '<access_token>'
88
  ```
89
 
90
- ---
91
-
92
  ### 3) Run the model
93
 
94
  ```bash
95
  nexa infer NexaAI/AutoNeural
96
  ```
97
 
98
- ### Image input
99
 
100
  Drag and drop one or more image files into the terminal window.
101
  Multiple images can be processed with a single query.
@@ -106,52 +85,15 @@ Multiple images can be processed with a single query.
106
 
107
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
108
 
109
- ### 🔍 **MobileNetV5 Vision Encoder (300M)**
110
-
111
- Optimized for edge hardware, with:
112
-
113
- * **Depthwise separable convolutions** for low compute and bounded activations.
114
- * **Local attention bottlenecks** only in late stages for efficient long-range reasoning.
115
- * **Multi-Scale Fusion Adapter (MSFA)** producing a compact **16×16×2048** feature map.
116
- * Stable **INT8/16** behavior with minimal post-quantization degradation.
117
 
118
- Yields **5.8× – 14× speedups** over ViT baselines across 256–768 px inputs.
119
 
120
  ---
121
 
122
- ### 🧠 **Hybrid Transformer-SSM Language Backbone (1.2B)**
123
 
124
- Designed for NPU memory hierarchies:
125
 
126
- * **5:1 ratio of SSM layers to Transformer attention layers**
127
- * **Linear-time gated convolution layers** for most steps
128
- * **Tiny rolling state** instead of KV-cache → up to **60% lower memory bandwidth**
129
- * **W4A16 stable quantization** across layers
130
-
131
- ---
132
-
133
- ### 🔗 **Normalization-Free Vision–Language Connector**
134
-
135
- A compact 2-layer MLP using **SiLU**, deliberately **removing RMSNorm** to avoid unstable activation ranges during static quantization.
136
-
137
- Ensures reliable deployment on W8A16/W4A16 pipelines.
138
-
139
- ---
140
-
141
- ### 🚗 **Automotive-Grade Multimodal Intelligence**
142
-
143
- Trained on **10M Infinity-MM samples** plus **200k automotive cockpit samples**, covering:
144
-
145
- * AI Sentinel (vehicle security)
146
- * AI Greeter (identity recognition)
147
- * Car Finder (parking localization)
148
- * Passenger safety monitoring
149
-
150
- Ensures robust performance across lighting, demographics, weather, and motion scenarios.
151
-
152
- ---
153
 
154
- # **License**
155
 
156
  The AutoNeural model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license.
157
 
 
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ pipeline_tag: image-text-to-text
4
+ ---
5
+ <p align="center">
6
+ <a href="https://arxiv.org/abs/2512.02924"><img src="https://img.shields.io/badge/📄%20arXiv-2512.02924-b31b1b?style=for-the-badge" alt="arXiv"></a>
7
+ <a href="https://discord.com/invite/nexa-ai"><img src="https://img.shields.io/badge/💬%20Discord-Nexa%20AI-5865F2?style=for-the-badge" alt="Discord"></a>
8
+ <a href="https://x.com/nexa_ai"><img src="https://img.shields.io/badge/𝕏%20Twitter-nexa__ai-000000?style=for-the-badge" alt="Twitter"></a>
9
+ </p>
10
+
11
+ <p align="center">
12
+ <a href="https://github.com/NexaAI/nexa-sdk/edit/main/solutions/autoneural/README.md"><b>🌟 GitHub</b></a> |
13
+ <a href="https://nexa.ai/solution/intelligent-cockpit"><b>📄 Webpage</b></a>
14
+ </p>
15
+
16
  # AutoNeural-VL-1.5B
17
 
18
  ## **Introduction**
 
26
  - **Automotive-tuned dataset** – Trained with **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
27
  - **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
28
 
29
 
30
  ## Use Cases
31
 
 
35
 
36
  ---
37
 
38
+ ## ⚡ **Benchmarks**
39
 
40
  Validated on **Qualcomm SA8295P NPU**:
41
 
 
53
 
54
  # **How to Use**
55
 
56
+ > ⚠️ **Hardware requirement:** AutoNeural is only available for **Qualcomm NPUs**.
57
 
58
  ### 1) Install Nexa-SDK
59
 
60
  Download the SDK and follow the installation steps provided on the model page.
61
 
 
62
 
63
  ### 2) Configure authentication
64
 
 
68
  nexa config set license '<access_token>'
69
  ```
70
 
 
 
71
  ### 3) Run the model
72
 
73
  ```bash
74
  nexa infer NexaAI/AutoNeural
75
  ```
76
 
77
+ ### 4) Image input
78
 
79
  Drag and drop one or more image files into the terminal window.
80
  Multiple images can be processed with a single query.
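
Steps 2 and 3 above can be chained in a small wrapper script. The following is a minimal sketch, not part of Nexa-SDK: the function names `run` and `setup_and_infer` and the environment variables `NEXA_ACCESS_TOKEN` and `DRY_RUN` are hypothetical names chosen for illustration. Only the two `nexa` commands themselves come from this README. It assumes `nexa` is already installed and on `PATH`, and that the machine has a supported Qualcomm NPU.

```bash
#!/usr/bin/env bash
# Sketch only: wraps the two CLI calls documented in this README.
# NEXA_ACCESS_TOKEN and DRY_RUN are hypothetical variables, not SDK settings.

run() {
  # In dry-run mode, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_and_infer() {
  # Abort with a message if the access token is missing or empty.
  local token="${NEXA_ACCESS_TOKEN:?set NEXA_ACCESS_TOKEN to your Model Hub access token}"

  # Step 2: configure authentication.
  run nexa config set license "$token"

  # Step 3: start the model session (images are then dropped into the terminal).
  run nexa infer NexaAI/AutoNeural
}
```

Calling `setup_and_infer` with `DRY_RUN=1` set prints the two commands without running them, which is a quick way to check the script on a machine without the SDK. Note that dry-run output includes the token, so avoid it in shared logs.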
 
85
 
86
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
87
 
88
 
 
89
 
90
  ---
91
 
92
+ ## **Training**
93
 
 
94
 
95
 
96
+ ## **License**
97
 
98
  The AutoNeural model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license.
99