NexaAI/AutoNeural

nexaml committed 9 days ago

Commit 0b0ad6e (verified) · 1 parent: 0305d13

Update README.md

Files changed (1)

README.md +20 -78


@@ -1,3 +1,18 @@
 # AutoNeural-VL-1.5B
 ## **Introduction**
@@ -11,39 +26,6 @@ Key Features:
 - **Automotive-tuned dataset** – Trained with **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
 - **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
-# **How to Use**
-> ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.
-### 1) Install Nexa-SDK
-Download the SDK，follow the installation steps provided on the model page.
----
-### 2) Configure authentication
-Create an access token in the Model Hub, then run:
-```bash
-nexa config set license '<access_token>'
-```
----
-### 3) Run the model
-```bash
-nexa infer NexaAI/AutoNeural
-```
-### Image input
-Drag and drop one or more image files into the terminal window.
-Multiple images can be processed with a single query.
----
 ## Use Cases
@@ -53,7 +35,7 @@ AutoNeural powers real-time cockpit intelligence including **in-cabin detection*
 ---
-## ⚡ **Benchmarks on NPU**
 Validated on **Qualcomm SA8295P NPU**:
@@ -71,13 +53,12 @@ Validated on **Qualcomm SA8295P NPU**:
 # **How to Use**
-> ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.
 ### 1) Install Nexa-SDK
 Download the SDK，follow the installation steps provided on the model page.
----
 ### 2) Configure authentication
@@ -87,15 +68,13 @@ Create an access token in the Model Hub, then run:
 nexa config set license '<access_token>'
 ```
----
 ### 3) Run the model
 ```bash
 nexa infer NexaAI/AutoNeural
 ```
-### Image input
 Drag and drop one or more image files into the terminal window.
 Multiple images can be processed with a single query.
@@ -106,52 +85,15 @@ Multiple images can be processed with a single query.
 <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
-### 🔍 **MobileNetV5 Vision Encoder (300M)**
-Optimized for edge hardware, with:
-* **Depthwise separable convolutions** for low compute and bounded activations.
-* **Local attention bottlenecks** only in late stages for efficient long-range reasoning.
-* **Multi-Scale Fusion Adapter (MSFA)** producing a compact **16×16×2048** feature map.
-* Stable **INT8/16** behavior with minimal post-quantization degradation.
-Yields **5.8× – 14× speedups** over ViT baselines across 256–768 px inputs.
 ---
-### 🧠 **Hybrid Transformer-SSM Language Backbone (1.2B)**
-Designed for NPU memory hierarchies:
-* **5:1 ratio of SSM layers to Transformer attention layers**
-* **Linear-time gated convolution layers** for most steps
-* **Tiny rolling state** instead of KV-cache → up to **60% lower memory bandwidth**
-* **W4A16 stable quantization** across layers
----
-### 🔗 **Normalization-Free Vision–Language Connector**
-A compact 2-layer MLP using **SiLU**, deliberately **removing RMSNorm** to avoid unstable activation ranges during static quantization.
-Ensures reliable deployment on W8A16/W4A16 pipelines.
----
-### 🚗 **Automotive-Grade Multimodal Intelligence**
-Trained on **10M Infinity-MM samples** plus **200k automotive cockpit samples**, covering:
-* AI Sentinel (vehicle security)
-* AI Greeter (identity recognition)
-* Car Finder (parking localization)
-* Passenger safety monitoring
-Ensures robust performance across lighting, demographics, weather, and motion scenarios.
----
-# **License**
 The AutoNeural model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license.

+---
+license: cc-by-nc-4.0
+pipeline_tag: image-text-to-text
+---
+<p align="center">
+  <a href="https://arxiv.org/abs/2512.02924"><img src="https://img.shields.io/badge/📄%20arXiv-2512.02924-b31b1b?style=for-the-badge" alt="arXiv"></a>
+  <a href="https://discord.com/invite/nexa-ai"><img src="https://img.shields.io/badge/💬%20Discord-Nexa%20AI-5865F2?style=for-the-badge" alt="Discord"></a>
+  <a href="https://x.com/nexa_ai"><img src="https://img.shields.io/badge/𝕏%20Twitter-nexa__ai-000000?style=for-the-badge" alt="Twitter"></a>
+</p>
+<p align="center">
+  <a href="https://github.com/NexaAI/nexa-sdk/edit/main/solutions/autoneural/README.md"><b>🌟 Github</b></a> |
+  <a href="https://nexa.ai/solution/intelligent-cockpit"><b>📄 Webpage</b></a>
+</p>
 # AutoNeural-VL-1.5B
 ## **Introduction**
 - **Automotive-tuned dataset** – Trained with **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
 - **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
 ## Use Cases
 ---
+## ⚡ **Benchmarks**
 Validated on **Qualcomm SA8295P NPU**:
 # **How to Use**
+> ⚠️ **Hardware requirement:** AutoNeural is only available for **Qualcomm NPUs**.
 ### 1) Install Nexa-SDK
 Download the SDK，follow the installation steps provided on the model page.
 ### 2) Configure authentication
 nexa config set license '<access_token>'
 ```
 ### 3) Run the model
 ```bash
 nexa infer NexaAI/AutoNeural
 ```
+### 4) Image input
 Drag and drop one or more image files into the terminal window.
 Multiple images can be processed with a single query.
 <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
 ---
+## **Training**
+## **License**
 The AutoNeural model is released under the **Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0)** license.
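
For readers following the "How to Use" steps in this diff, steps 2 and 3 can be chained in a small wrapper. This is only a sketch: the function name `run_autoneural` and the environment variable `NEXA_ACCESS_TOKEN` are hypothetical names chosen for illustration and are not part of Nexa-SDK. The only `nexa` invocations used are the two that appear in the README itself.

```bash
#!/usr/bin/env bash
# Sketch: chain the README's "Configure authentication" and "Run the model" steps.
# run_autoneural and NEXA_ACCESS_TOKEN are hypothetical names, not part of Nexa-SDK.

run_autoneural() {
  # Step 1 must already be done: the nexa CLI has to be on PATH.
  if ! command -v nexa >/dev/null 2>&1; then
    echo "nexa CLI not found; install Nexa-SDK first (step 1)" >&2
    return 1
  fi

  # Step 2 needs an access token created in the Model Hub.
  if [ -z "${NEXA_ACCESS_TOKEN:-}" ]; then
    echo "NEXA_ACCESS_TOKEN is not set; create an access token first (step 2)" >&2
    return 1
  fi

  # Step 2: register the license token (command taken verbatim from the README).
  nexa config set license "$NEXA_ACCESS_TOKEN" || return 1

  # Step 3: start inference (command taken verbatim from the README).
  nexa infer NexaAI/AutoNeural
}
```

After sourcing the file, export the token and call `run_autoneural`. The hardware note still applies: per the updated README, AutoNeural is only available for Qualcomm NPUs, so the wrapper is expected to work only on such a device.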