## **Introduction**

**AutoNeural** is an NPU-native vision–language model for in-car assistants, co-designed with a MobileNetV5 encoder and a hybrid Liquid AI 1.2B backbone to deliver **real-time multimodal understanding on the Qualcomm SA8295P NPU**. It processes 768×768 images, cuts end-to-end latency by up to **14×**, and reduces quantization error by **7×** compared with ViT–Transformer baselines on the same hardware.
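As a rough illustration of why a hybrid Transformer–SSM backbone helps on-device: a pure-Transformer decoder keeps a KV cache in every layer, and that cache grows linearly with context length, while SSM layers carry constant-size recurrent state. The back-of-envelope sketch below uses hypothetical layer counts and head sizes, not AutoNeural's actual configuration:

```python
# Illustrative KV-cache comparison. All hyperparameters below are
# hypothetical and chosen only to show the scaling, not taken from AutoNeural.
def kv_cache_bytes(attn_layers, heads, head_dim, context, bytes_per_elem=2):
    # K and V each store (heads * head_dim) values per token per attention layer.
    return 2 * attn_layers * heads * head_dim * context * bytes_per_elem

# A pure-Transformer baseline caches KV state in every one of its layers...
full = kv_cache_bytes(attn_layers=24, heads=16, head_dim=64, context=4096)

# ...while a hybrid Transformer-SSM stack caches it only in its few
# attention layers; the SSM layers need no per-token cache at all.
hybrid = kv_cache_bytes(attn_layers=4, heads=16, head_dim=64, context=4096)

print(full // hybrid)  # → 6: cache shrinks in proportion to attention-layer count
```

Under these toy numbers the cache drops from ~384 MiB to ~64 MiB at a 4096-token context, which is the kind of headroom that makes longer contexts feasible on an embedded NPU.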

Key Features:

- **NPU-native co-design** – MobileNet-based vision encoder + hybrid Transformer–SSM backbone, built for INT4/8/16 and NPU operator sets.
- **Real-time cockpit performance** – Up to **14× lower TTFT**, ~3× faster decode, and 4× longer context (4096 vs 1024) on the Qualcomm SA8295P NPU.
- **High-resolution multimodal perception** – Supports **768×768** images with ~45 dB SQNR under mixed-precision quantization (W8A16 vision, W4A16 language).
- **Automotive-tuned dataset** – Trained with **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
- **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
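The ~45 dB SQNR figure can be sanity-checked with a minimal quantization sketch. The snippet below implements a generic symmetric per-tensor int8 quantizer on synthetic Gaussian weights; it is illustrative only (the function name and parameters are hypothetical, and AutoNeural's actual quantization pipeline is not shown here):

```python
import numpy as np

def sqnr_db(x, n_bits=8):
    # Symmetric per-tensor quantization: scale chosen so max|x| maps to qmax.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
    # SQNR = signal power over quantization-noise power, in dB.
    noise = x - q
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
print(round(sqnr_db(w), 1))  # roughly 40-50 dB for well-behaved weight distributions
```

Higher SQNR means the quantized tensor tracks the float original more closely; values in the mid-40s dB are typical for 8-bit weights with a benign distribution, which is consistent with the figure quoted above.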
## **How to Use**

> ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.

### 1) Install Nexa-SDK

Download the SDK and follow the installation steps provided on the model page.

---

### 2) Configure authentication

Create an access token in the Model Hub, then run:

```bash
nexa config set license '<access_token>'
```

---

### 3) Run the model

```bash
nexa infer NexaAI/AutoNeural
```

### Image input

Drag and drop one or more image files into the terminal window.
Multiple images can be processed with a single query.

---

## **Model Architecture**

<img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>