Image-Text-to-Text
nexaml committed on
Commit
0305d13
verified
1 Parent(s): 738ed1f

Update README.md

Files changed (1): README.md +39 -7
README.md CHANGED
@@ -2,14 +2,46 @@

 ## **Introduction**

- **AutoNeural** is an **NPU-native multimodal vision–language model** co-designed from the ground up for real-time, on-device inference on NPU. Instead of adapting GPU-first architectures, AutoNeural redesigns both **vision encoding** and **language modeling** for the constraints and capabilities of NPUs—achieving **14× faster latency**, **3× higher input resolution**, **7× lower quantization error**, and **real-time automotive performance** even under aggressive low-precision settings.

- AutoNeural integrates:

- * A **MobileNetV5-based vision encoder** with depthwise separable convolutions.
- * A **hybrid Transformer-SSM language backbone** that dramatically reduces KV-cache overhead.
- * A **normalization-free MLP connector** tailored for quantization stability.
- * Mixed-precision **W8A16 (vision)** and **W4A16 (language)** inference validated on real Qualcomm NPUs.

 ---

@@ -70,7 +102,7 @@ Multiple images can be processed with a single query.

 ---

- ## **Key Features**

 <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
 
 
 ## **Introduction**

+ **AutoNeural** is an NPU-native vision–language model for in-car assistants, co-designed with a MobileNetV5 encoder and a hybrid Liquid AI 1.2B backbone to deliver **real-time multimodal understanding on the Qualcomm SA8295P NPU**. It processes 768×768 images, cuts end-to-end latency by up to **14×**, and reduces quantization error by **7×** compared with ViT–Transformer baselines on the same hardware.
+
+ Key Features:
+ - **NPU-native co-design** – MobileNet-based vision encoder + hybrid Transformer–SSM backbone, built for INT4/8/16 and NPU operator sets.
+ - **Real-time cockpit performance** – Up to **14× lower time-to-first-token (TTFT)**, ~3× faster decode, and 4× longer context (4096 vs. 1024) on the Qualcomm SA8295P NPU.
+ - **High-resolution multimodal perception** – Supports **768×768** images with ~45 dB SQNR under mixed-precision quantization (W8A16 vision, W4A16 language).
+ - **Automotive-tuned dataset** – Trained on **200k** proprietary cockpit samples (AI Sentinel, Greeter, Car Finder, Safety) plus large-scale Infinity-MM instruction data.
+ - **Production-focused** – Designed for always-on, low-power, privacy-preserving deployment in real vehicles.
+
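The ~45 dB SQNR figure quoted above can be understood with a small, self-contained experiment. The sketch below implements generic uniform symmetric quantization and the standard SQNR metric on synthetic data; it is illustrative only and is not AutoNeural's actual quantization pipeline, whose scales and clipping are unknown here.

```python
import numpy as np

def sqnr_db(x, x_q):
    # Signal-to-quantization-noise ratio in decibels:
    # 10 * log10(signal_power / noise_power)
    noise = x - x_q
    return 10.0 * np.log10(np.sum(x**2) / np.sum(noise**2))

def quantize_symmetric(x, bits):
    # Generic uniform symmetric quantization to `bits` bits,
    # with the scale set by the tensor's max absolute value.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantize back to float for error measurement

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float64)
for bits in (4, 8, 16):
    print(f"{bits}-bit: SQNR = {sqnr_db(x, quantize_symmetric(x, bits)):.1f} dB")
```

Each extra bit buys roughly 6 dB of SQNR, which is why the W4A16/W8A16 precision split matters: the activations keep 16-bit headroom while weights absorb the aggressive compression.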
+ ## **How to Use**
+
+ > ⚠️ **Hardware requirement:** AutoNeural is optimized for **Qualcomm NPUs**.
+
+ ### 1) Install Nexa-SDK
+
+ Download the SDK, then follow the installation steps provided on the model page.
+
+ ---
+
+ ### 2) Configure authentication
+
+ Create an access token in the Model Hub, then run:
+
+ ```bash
+ nexa config set license '<access_token>'
+ ```
+
+ ---
+
+ ### 3) Run the model
+
+ ```bash
+ nexa infer NexaAI/AutoNeural
+ ```
+
+ ### Image input
+
+ Drag and drop one or more image files into the terminal window.
+ Multiple images can be processed with a single query.
 
 
 
 
 
45
 
46
  ---
47
 
 
102
 
103
  ---
104
 
105
+ ## **Model Architecture**
106
 
107
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6851901ea43b4824f79e27a9/eHNdopWWaoir2IP3Cu_AF.png" alt="Model Architecture" style="width:700px;"/>
108