Image-Text-to-Text
nexaml committed on
Commit
0305d13
verified
1 Parent(s): 738ed1f

Update README.md

Files changed (1): README.md +39 -7
README.md CHANGED
@@ -2,14 +2,46 @@

 ## **Introduction**

- **AutoNeural** is an **NPU-native multimodal vision–language model** co-designed from the ground up for real-time, on-device inference on NPU. Instead of adapting GPU-first architectures, AutoNeural redesigns both **vision encoding** and **language modeling** for the constraints and capabilities of NPUs—achieving **14× faster latency**, **3× higher input resolution**, **7× lower quantization error**, and **real-time automotive performance** even under aggressive low-precision settings.

- AutoNeural integrates:

- * A **MobileNetV5-based vision encoder** with depthwise separable convolutions.
- * A **hybrid Transformer-SSM language backbone** that dramatically reduces KV-cache overhead.
- * A **normalization-free MLP connector** tailored for quantization stability.
- * Mixed-precision **W8A16 (vision)** and **W4A16 (language)** inference validated on real Qualcomm NPUs.

 ---

@@ -70,7 +102,7 @@ Multiple images can be processed with a single query.

 ---

- ## **Key Features**

 <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
 
 
 ## **Introduction**

+ **AutoNeural** is an NPU-native vision–language model for in-car assistants, co-designed with a MobileNetV5 encoder and a hybrid Liquid AI 1.2B backbone to deliver **real-time multimodal understanding on the Qualcomm SA8295P NPU**. It processes 768×768 images, cuts end-to-end latency by up to **14×**, and reduces quantization error by **7×** compared with ViT–Transformer baselines on the same hardware.
+
+ Key Features:
+ - **NPU-native co-design** – MobileNet-based vision encoder + hybrid Transformer–SSM backbone, built for INT4/8/16 and NPU operator sets.
+ - **Real-time cockpit performance** – Up to **14× lower time-to-first-token (TTFT)**, ~3× faster decode, and 4× longer context (4096 vs. 1024) on the Qualcomm SA8295P NPU.
+ - **High-resolution multimodal perception** – Supports **768×768** images with ~45 dB SQNR under mixed-precision quantization (W8A16 vision, W4A16 language).
+ - **Automotive-tuned dataset** – Trained on **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
+ - **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
+
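The ~45 dB SQNR figure quoted above can be understood with a small, self-contained experiment. The sketch below implements generic uniform symmetric quantization and the standard SQNR metric on synthetic data; it is illustrative only and is not AutoNeural's actual quantization pipeline, whose scales and clipping are unknown here.

```python
import numpy as np

def sqnr_db(x, x_q):
    # Signal-to-quantization-noise ratio in decibels:
    # 10 * log10(signal_power / noise_power)
    noise = x - x_q
    return 10.0 * np.log10(np.sum(x**2) / np.sum(noise**2))

def quantize_symmetric(x, bits):
    # Generic uniform symmetric quantization to `bits` bits,
    # with the scale set by the tensor's max absolute value.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantize back to float for error measurement

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float64)
for bits in (4, 8, 16):
    print(f"{bits}-bit: SQNR = {sqnr_db(x, quantize_symmetric(x, bits)):.1f} dB")
```

Each extra bit buys roughly 6 dB of SQNR, which is why the W4A16/W8A16 precision split matters: the activations keep 16-bit headroom while weights absorb the aggressive compression.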
+ ## **How to Use**
+
+ > ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.
+
+ ### 1) Install Nexa-SDK
+
+ Download the SDK, then follow the installation steps provided on the model page.
+
+ ---
+
+ ### 2) Configure authentication
+
+ Create an access token in the Model Hub, then run:
+
+ ```bash
+ nexa config set license '<access_token>'
+ ```
+
+ ---
+
+ ### 3) Run the model
+
+ ```bash
+ nexa infer NexaAI/AutoNeural
+ ```
+
+ ### Image input
+
+ Drag and drop one or more image files into the terminal window.
+ Multiple images can be processed with a single query.
 
 
 
 
 
45
 
46
  ---
47
 
 
102
 
103
  ---
104
 
105
+ ## **Model Architecture**
106
 
107
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
108