Upload quantized ONNX model

Files changed (4)

.gitattributes +0 -10
README.md +33 -303
config.json +155 -69
model_quantized.onnx +2 -2

.gitattributes CHANGED

@@ -1,11 +1 @@
 *.onnx filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tar.gz filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
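
After this change, `*.onnx` is the only pattern still routed through Git LFS. The ten other patterns were dropped, so a `.bin`, `.safetensors` or similar file added later would be committed as an ordinary Git blob. `git check-attr filter -- <path>` gives the authoritative answer. The sketch below only approximates gitattributes matching with `fnmatch` for slash-free patterns (an assumption: it ignores directory-anchored patterns, negation and macro attributes), and `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The only LFS pattern left in .gitattributes after this commit.
LFS_PATTERNS = ["*.onnx"]
# The patterns this commit removed, kept for comparison.
REMOVED_PATTERNS = [
    "*.bin", "*.safetensors", "*.pkl", "*.h5", "*.tflite",
    "*.tar.gz", "*.ot", "*.arrow", "*.ftz", "*.joblib",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching: a slash-free pattern is tested against the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

For example, `is_lfs_tracked("model_quantized.onnx")` is true, while `is_lfs_tracked("pytorch_model.bin")` is false under the new rules and true only when checked against `REMOVED_PATTERNS`.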

README.md CHANGED

@@ -1,319 +1,49 @@
-# Prompt Task Complexity Classifier - Quantized
-🚀 **A high-performance, quantized ONNX implementation of NVIDIA's prompt task and complexity classifier optimized for fast CPU inference.**
-This standalone Python package provides a quantized version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) with ~75% size reduction and 2-4x speed improvement while maintaining accuracy.
-## ✨ Features
-- 🔥 **Fast Inference**: 2-4x faster than original model on CPU
-- 📦 **Compact Size**: ~75% smaller model footprint
-- 🎯 **Comprehensive Analysis**: 8 classification dimensions + complexity scoring
-- 🔧 **Easy Integration**: Drop-in replacement with familiar API
-- 🐍 **Production Ready**: Optimized for server deployment and batch processing
-## 📊 What This Model Does
-The quantized classifier analyzes text prompts across **8 key dimensions**:
-| Dimension | Description | Classes |
-|-----------|-------------|---------|
-| **Task Type** | Primary task category | 11 types (QA, Generation, Summarization, etc.) |
-| **Creativity Scope** | Creative thinking requirements | 5 levels (0.0 - 1.0) |
-| **Reasoning** | Logical reasoning complexity | 5 levels (0.0 - 1.0) |
-| **Contextual Knowledge** | Context understanding needs | 5 levels (0.0 - 1.0) |
-| **Few-shot Learning** | Examples needed | 5 levels (0-4+ shots) |
-| **Domain Knowledge** | Specialized expertise required | 5 levels (0.0 - 1.0) |
-| **Label Reasoning** | Classification reasoning needs | 5 levels (0.0 - 1.0) |
-| **Constraint Handling** | Rule/constraint complexity | 5 levels (0.0 - 1.0) |
-Plus a **task-weighted complexity score** that combines all dimensions intelligently based on the detected task type.
-## 🚀 Quick Start
-### Installation
-```bash
-# Install the package with Poetry
-cd prompt-task-complexity-classifier-quantized
-poetry install
-# Or install dependencies directly
-pip install torch transformers onnxruntime optimum[onnxruntime] huggingface-hub numpy
-```
-### Basic Usage
-```python
-from prompt_classifier import QuantizedPromptClassifier
-# Load the quantized model
-classifier = QuantizedPromptClassifier.from_pretrained("./")
-# Classify a single prompt
-result = classifier.classify_single_prompt(
-    "Write a Python function to implement quicksort with detailed comments"
-)
-print(f"Task: {result['task_type_1'][0]}")           # "Code Generation"
-print(f"Complexity: {result['prompt_complexity_score'][0]:.3f}")  # 0.652
-print(f"Reasoning: {result['reasoning'][0]:.3f}")    # 0.750
-print(f"Creativity: {result['creativity_scope'][0]:.3f}")  # 0.250
-```
-### Batch Processing
-```python
-# Process multiple prompts efficiently
-prompts = [
-    "What is the capital of France?",
-    "Explain quantum computing and write simulation code",
-    "Create a marketing strategy for eco-friendly products"
-]
-results = classifier.classify_prompts(prompts)
-for prompt, result in zip(prompts, results):
-    task_type = result['task_type_1'][0]
-    complexity = result['prompt_complexity_score'][0]
-    print(f"{task_type}: {complexity:.3f} - {prompt[:50]}...")
-```
-### Command Line Interface
-```bash
-# Quantize the original model
-prompt-classifier quantize --output-dir ./my_quantized_model
-# Test the quantized model
-prompt-classifier test --model-path ./my_quantized_model --benchmark
-# Classify prompts from command line
-prompt-classifier classify "Explain machine learning" "Write a sorting algorithm"
-# Get model information
-prompt-classifier info --model-path ./my_quantized_model
-# Upload to Hugging Face Hub
-prompt-classifier upload your-username/my-quantized-model --private
-```
-## 📦 Package Structure
-```
-prompt-task-complexity-classifier-quantized/
-├── src/prompt_classifier/
-│   ├── __init__.py              # Main package exports
-│   ├── classifier.py            # Core QuantizedPromptClassifier class
-│   ├── utils.py                 # Utility functions
-│   ├── cli.py                   # Command line interface
-│   ├── testing.py               # Test and validation functions
-│   ├── examples.py              # Usage examples
-│   └── scripts/
-│       ├── quantization.py      # Model quantization script
-│       ├── upload.py            # HuggingFace upload script
-│       └── quantize_model.py    # Core quantization logic
-├── tests/
-│   └── test_classifier.py       # Unit tests
-├── config.json                  # Model configuration
-├── pyproject.toml              # Poetry project configuration
-├── README.md                   # This file
-└── .gitattributes              # Git LFS configuration
-```
-## 🛠️ Development Workflow
-### 1. Setup Development Environment
-```bash
-# Clone and setup
-git clone <your-repo>
-cd prompt-task-complexity-classifier-quantized
-# Install with development dependencies
-poetry install --with dev
-# Activate environment
-poetry shell
-```
-### 2. Quantize Your Own Model
-```bash
-# Run quantization process
-python -m prompt_classifier.scripts.quantization \
-    --model-id nvidia/prompt-task-and-complexity-classifier \
-    --output-dir ./quantized_output
-```
-### 3. Test and Validate
-```bash
-# Run comprehensive tests
-python -m prompt_classifier.testing
-# Or use pytest for unit tests
-pytest tests/ -v
-```
-### 4. Upload to Hugging Face
 ```bash
-# Login to HF Hub
-huggingface-cli login
-# Upload your quantized model
-python -m prompt_classifier.scripts.upload your-username/model-name
 ```
-## ⚡ Performance Benchmarks
-| Metric | Original Model | Quantized Model | Improvement |
-|--------|---------------|-----------------|-------------|
-| **Model Size** | ~350 MB | ~89 MB | 75% smaller |
-| **Inference Speed** | 45ms/prompt | 12ms/prompt | 3.7x faster |
-| **Memory Usage** | ~1.2 GB | ~320 MB | 73% reduction |
-| **Accuracy** | Baseline | -1.2% typical | Minimal loss |
-*Benchmarks run on Intel i7-10700K CPU with batch size 1*
-## 🔧 Advanced Usage
-### Custom Model Path
 ```python
-# Load from custom directory
-classifier = QuantizedPromptClassifier.from_pretrained("/path/to/model")
-# Load from Hugging Face Hub
-classifier = QuantizedPromptClassifier.from_pretrained("username/model-name")
-```
-### Direct ONNX Runtime Usage
-```python
-import onnxruntime as ort
-from transformers import AutoTokenizer
-# For maximum performance
-session = ort.InferenceSession("model_quantized.onnx")
-tokenizer = AutoTokenizer.from_pretrained("./")
-# Run inference directly
-inputs = tokenizer("Your prompt", return_tensors="np", padding=True, truncation=True)
-outputs = session.run(None, {
-    "input_ids": inputs["input_ids"].astype(np.int64),
-    "attention_mask": inputs["attention_mask"].astype(np.int64)
-})
-```
-### Integration with Existing Code
-```python
-# Drop-in replacement for original CustomModel
-from prompt_classifier import QuantizedPromptClassifier
-# Replace this:
-# from some_module import CustomModel
-# model = CustomModel.from_pretrained("nvidia/prompt-task-and-complexity-classifier")
-# With this:
-model = QuantizedPromptClassifier.from_pretrained("./quantized_model")
-# Same API, better performance!
-results = model.classify_prompts(["Your prompts here"])
-```
-## 📝 API Reference
-### `QuantizedPromptClassifier`
-Main class for prompt classification with quantized ONNX backend.
-#### Methods
-- `from_pretrained(model_path)` - Load model from directory or HF Hub
-- `classify_prompts(prompts: List[str])` - Classify multiple prompts
-- `classify_single_prompt(prompt: str)` - Classify one prompt
-- `get_task_types(prompts: List[str])` - Get just task types
-- `get_complexity_scores(prompts: List[str])` - Get just complexity scores
-#### Configuration
-The model uses the same configuration as the original, with additional quantization metadata:
-```json
-{
-  "quantized": true,
-  "quantization_method": "dynamic",
-  "framework": "onnx",
-  "optimized_for": "cpu",
-  "file_name": "model_quantized.onnx"
-}
 ```
-## 🧪 Testing
-```bash
-# Run all tests
-pytest tests/ -v
-# Run with coverage
-pytest tests/ --cov=prompt_classifier --cov-report=html
-# Run only fast tests
-pytest tests/ -m "not slow"
-# Test specific functionality
-pytest tests/test_classifier.py::TestQuantizedPromptClassifier::test_classify_single_prompt
-```
-## 🤝 Contributing
-1. Fork the repository
-2. Create a feature branch (`git checkout -b feature/amazing-feature`)
-3. Make your changes and add tests
-4. Run tests (`pytest tests/`)
-5. Run linting (`ruff check src/ && black src/`)
-6. Commit changes (`git commit -m 'Add amazing feature'`)
-7. Push to branch (`git push origin feature/amazing-feature`)
-8. Open a Pull Request
-## 📋 Requirements
-- Python 3.9+
-- PyTorch 1.9+
-- Transformers 4.21+
-- ONNX Runtime 1.12+
-- Optimum 1.12+
-- NumPy 1.21+
-See `pyproject.toml` for complete dependency specifications.
-## 📄 License
-Apache 2.0 License - see [LICENSE](LICENSE) file for details.
-## 🙏 Acknowledgments
-- **NVIDIA** for the original prompt task and complexity classifier
-- **Microsoft** for ONNX Runtime quantization framework
-- **Hugging Face** for Optimum and Transformers libraries
-- **Poetry** for modern Python dependency management
-## 📞 Support
-- 📚 [Documentation](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier)
-- 🐛 [Issues](https://github.com/your-org/prompt-task-complexity-classifier-quantized/issues)
-- 💬 [Discussions](https://github.com/your-org/prompt-task-complexity-classifier-quantized/discussions)
-- 🔗 [Original Model](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier)
----
-**Ready to supercharge your prompt classification? 🚀**
-```bash
-cd prompt-task-complexity-classifier-quantized
-poetry install
-poetry run prompt-classifier quantize
-```

+---
+license: apache-2.0
+language: en
+library_name: optimum
+tags:
+- onnx
+- quantized
+- text-classification
+- nvidia
+- nemotron
+pipeline_tag: text-classification
+---
+# Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier
+This repository contains the quantized ONNX version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.
+## Model Description
+This is a multi-headed model which classifies English text prompts across task types and complexity dimensions. This version has been quantized to `INT8` using dynamic quantization with the [🤗 Optimum](https://github.com/huggingface/optimum) library, resulting in a smaller footprint and faster CPU inference.
+For more details on the model architecture, tasks, and complexity dimensions, please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).
+## How to Use
+You can use this model directly with `optimum.onnxruntime` for accelerated inference.
+First, install the required libraries:
 ```bash
+pip install optimum[onnxruntime] transformers
 ```
+Then, you can use the model in a pipeline:
 ```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+from transformers import AutoTokenizer, pipeline
+repo_id = "botirk/tiny-prompt-task-complexity-classifier"
+model = ORTModelForSequenceClassification.from_pretrained(repo_id)
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+# Note: The pipeline task is a simplification.
+# For full multi-headed output, you need to process the logits manually.
+classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+prompt = "Write a mystery set in a small town where an everyday object goes missing."
+results = classifier(prompt)
+print(results)
 ```
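
The new README says the `text-classification` pipeline is a simplification and that full multi-headed output requires processing the logits manually. A minimal sketch of that step for the numeric heads follows. It assumes you already have one logits array per head (this commit does not show how the quantized graph names or orders its outputs; `onnxruntime.InferenceSession.get_outputs()` lists them), and it follows our reading of the upstream NVIDIA recipe: softmax over each head, dot product with that head's `weights_map` entry, division by its `divisor_map` entry. The `COMPLEXITY_COEFFS` used to combine heads into `prompt_complexity_score` are likewise an assumption taken from the upstream model card, not from this repository.

```python
import numpy as np

# weights_map and divisor_map copied from this commit's config.json.
WEIGHTS_MAP = {
    "constraint_ct": [1, 0],
    "contextual_knowledge": [0, 1],
    "creativity_scope": [2, 1, 0],
    "domain_knowledge": [3, 1, 2, 0],
    "no_label_reason": [0],
    "number_of_few_shots": [0, 1, 2, 3, 4, 5],
    "reasoning": [0, 1],
}
DIVISOR_MAP = {
    "constraint_ct": 1,
    "contextual_knowledge": 1,
    "creativity_scope": 2,
    "domain_knowledge": 3,
    "no_label_reason": 1,
    "number_of_few_shots": 1,
    "reasoning": 1,
}

# ASSUMPTION: coefficients as we read them in the upstream NVIDIA model card.
COMPLEXITY_COEFFS = {
    "creativity_scope": 0.35,
    "reasoning": 0.25,
    "constraint_ct": 0.15,
    "domain_knowledge": 0.15,
    "contextual_knowledge": 0.05,
    "number_of_few_shots": 0.05,
}


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def head_score(head: str, logits, decimals: int = 4) -> np.ndarray:
    """Expected class weight under the head's softmax, divided by the head's divisor."""
    probs = softmax(np.atleast_2d(np.asarray(logits, dtype=np.float64)))
    weights = np.asarray(WEIGHTS_MAP[head], dtype=np.float64)
    if probs.shape[-1] != weights.shape[0]:
        raise ValueError(f"{head}: expected {weights.shape[0]} logits, got {probs.shape[-1]}")
    return np.round(probs @ weights / DIVISOR_MAP[head], decimals)


def complexity_score(scores: dict):
    """Weighted sum of per-head scores (scalars or arrays of shape [batch])."""
    return sum(coef * np.asarray(scores[head]) for head, coef in COMPLEXITY_COEFFS.items())
```

As a sanity check on the mapping: `creativity_score_map` places High at index 0 and `weights_map` gives that index weight 2, so a confident High prediction divided by the divisor 2 scores close to 1.0.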

config.json CHANGED

@@ -1,81 +1,167 @@
 {
   "_name_or_path": "nvidia/prompt-task-and-complexity-classifier",
-  "architectures": [
-    "DebertaV2Model"
-  ],
-  "attention_probs_dropout_prob": 0.1,
-  "hidden_act": "gelu",
-  "hidden_dropout_prob": 0.1,
-  "hidden_size": 768,
   "initializer_range": 0.02,
-  "intermediate_size": 3072,
-  "layer_norm_eps": 1e-07,
-  "max_position_embeddings": 512,
   "model_type": "deberta-v2",
-  "num_attention_heads": 12,
-  "num_hidden_layers": 12,
-  "pad_token_id": 0,
-  "pooler_dropout": 0.0,
-  "pooler_hidden_act": "gelu",
-  "pooler_hidden_size": 768,
-  "position_biased_input": false,
-  "position_buckets": 256,
-  "relative_attention": true,
-  "torch_dtype": "float32",
-  "transformers_version": "4.21.0",
-  "type_vocab_size": 0,
-  "vocab_size": 128100,
   "target_sizes": {
-    "task_type": 8,
-    "creativity_scope": 5,
-    "reasoning": 5,
-    "contextual_knowledge": 5,
-    "number_of_few_shots": 5,
-    "domain_knowledge": 5,
-    "no_label_reason": 5,
-    "constraint_ct": 5
   },
   "task_type_map": {
-    "0": "Open QA",
-    "1": "Closed QA",
-    "2": "Summarization",
-    "3": "Text Generation",
     "4": "Code Generation",
-    "5": "Chatbot",
-    "6": "Classification",
-    "7": "Rewrite",
-    "8": "Brainstorming",
-    "9": "Extraction",
-    "10": "Other"
   },
   "weights_map": {
-    "creativity_scope": [0.0, 0.25, 0.5, 0.75, 1.0],
-    "reasoning": [0.0, 0.25, 0.5, 0.75, 1.0],
-    "contextual_knowledge": [0.0, 0.25, 0.5, 0.75, 1.0],
-    "number_of_few_shots": [0.0, 1.0, 2.0, 3.0, 4.0],
-    "domain_knowledge": [0.0, 0.25, 0.5, 0.75, 1.0],
-    "no_label_reason": [0.0, 0.25, 0.5, 0.75, 1.0],
-    "constraint_ct": [0.0, 0.25, 0.5, 0.75, 1.0]
-  },
-  "divisor_map": {
-    "creativity_scope": 1.0,
-    "reasoning": 1.0,
-    "contextual_knowledge": 1.0,
-    "number_of_few_shots": 1.0,
-    "domain_knowledge": 1.0,
-    "no_label_reason": 1.0,
-    "constraint_ct": 1.0
   },
-  "quantized": true,
-  "quantization_method": "dynamic",
   "framework": "onnx",
-  "optimized_for": "cpu",
-  "file_name": "model_quantized.onnx",
-  "quantization_config": {
-    "format": "QOperator",
-    "mode": "IntegerOps",
-    "activations_dtype": "QUInt8",
-    "weights_dtype": "QInt8",
-    "is_static": false
-  }
-}

 {
   "_name_or_path": "nvidia/prompt-task-and-complexity-classifier",
+  "attn_config": {
+    "model_type": ""
+  },
+  "base_model": "microsoft/DeBERTa-v3-base",
+  "config_path": null,
+  "constraint_ct_map": {
+    "1.0": 0,
+    "Unknown": 1
+  },
+  "contextual_knowledge_map": {
+    "No": 0,
+    "Unknown": -1,
+    "Yes": 1
+  },
+  "creativity_score_map": {
+    "High": 0,
+    "Low": 1,
+    "No": 2
+  },
+  "d_model": 2048,
+  "divisor_map": {
+    "constraint_ct": 1,
+    "contextual_knowledge": 1,
+    "creativity_scope": 2,
+    "domain_knowledge": 3,
+    "no_label_reason": 1,
+    "number_of_few_shots": 1,
+    "reasoning": 1
+  },
+  "domain_knowledge_map": {
+    "High": 0,
+    "Low": 1,
+    "Medium": 2,
+    "No": 3
+  },
+  "drop_out": false,
+  "emb_pdrop": 0.0,
+  "embedding_fraction": 1.0,
+  "expansion_ratio": 4,
+  "fc_dropout": 0.2,
+  "init_device": "cpu",
   "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "learned_pos_emb": true,
+  "logit_scale": null,
+  "max_seq_len": 2048,
+  "model_output_type": {
+    "constraint_ct": "numeric",
+    "contextual_knowledge": "numeric",
+    "creativity_scope": "numeric",
+    "domain_knowledge": "numeric",
+    "no_label_reason": "numeric",
+    "number_of_few_shots": "numeric",
+    "prompt_complexity_score": "numeric",
+    "reasoning": "numeric",
+    "task_type_1": "string",
+    "task_type_2": "string",
+    "task_type_prob": "numeric"
+  },
   "model_type": "deberta-v2",
+  "n_heads": 16,
+  "n_layers": 24,
+  "no_bias": true,
+  "no_label_reason_map": {
+    "Unknown": 0
+  },
+  "norm_type": "low_precision_layernorm",
+  "number_of_few_shots_map": {
+    "0.0": 0,
+    "1.0": 1,
+    "2.0": 2,
+    "3.0": 3,
+    "4.0": 4,
+    "5.0": 5
+  },
+  "pretrained": true,
+  "reasoning_map": {
+    "No": 0,
+    "Unknown": -1,
+    "Yes": 1
+  },
+  "resid_pdrop": 0.0,
   "target_sizes": {
+    "constraint_ct": 2,
+    "contextual_knowledge": 2,
+    "creativity_scope": 3,
+    "domain_knowledge": 4,
+    "no_label_reason": 1,
+    "number_of_few_shots": 6,
+    "reasoning": 2,
+    "task_type": 12
   },
+  "targets": [
+    "task_type_1",
+    "task_type_2",
+    "task_type_prob",
+    "creativity_scope",
+    "reasoning",
+    "contextual_knowledge",
+    "number_of_few_shots",
+    "domain_knowledge",
+    "no_label_reason",
+    "constraint_ct",
+    "prompt_complexity_score"
+  ],
   "task_type_map": {
+    "0": "Brainstorming",
+    "1": "Chatbot",
+    "10": "Text Generation",
+    "11": "Unknown",
+    "2": "Classification",
+    "3": "Closed QA",
     "4": "Code Generation",
+    "5": "Extraction",
+    "6": "Open QA",
+    "7": "Other",
+    "8": "Rewrite",
+    "9": "Summarization"
   },
+  "transformers_version": "4.46.3",
+  "use_cache": false,
+  "verbose": 0,
+  "vocab_size": 50368,
   "weights_map": {
+    "constraint_ct": [
+      1,
+      0
+    ],
+    "contextual_knowledge": [
+      0,
+      1
+    ],
+    "creativity_scope": [
+      2,
+      1,
+      0
+    ],
+    "domain_knowledge": [
+      3,
+      1,
+      2,
+      0
+    ],
+    "no_label_reason": [
+      0
+    ],
+    "number_of_few_shots": [
+      0,
+      1,
+      2,
+      3,
+      4,
+      5
+    ],
+    "reasoning": [
+      0,
+      1
+    ]
   },
   "framework": "onnx",
+  "tags": [
+    "onnx",
+    "quantized"
+  ]
+}
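
`task_type_map` is keyed by string indices, and the new file lists them in lexicographic order (`"10"` and `"11"` directly after `"1"`), so labels should be looked up with `str(index)` rather than by position in the JSON object. The sketch below decodes one row of task-type logits into the `task_type_1`, `task_type_2` and `task_type_prob` targets named in the config; the 0.1 cutoff that turns a weak runner-up into `"NA"` is an assumption based on the upstream model card. It also includes a hypothetical `check_head_sizes` helper: in the previous config `target_sizes.task_type` was 8 while `task_type_map` listed 11 labels, whereas in the new file every head size matches its `weights_map` length and `task_type` (12) matches the 12 labels.

```python
import numpy as np

# task_type_map copied from this commit's config.json (string keys, lexicographic order).
TASK_TYPE_MAP = {
    "0": "Brainstorming", "1": "Chatbot", "10": "Text Generation", "11": "Unknown",
    "2": "Classification", "3": "Closed QA", "4": "Code Generation", "5": "Extraction",
    "6": "Open QA", "7": "Other", "8": "Rewrite", "9": "Summarization",
}


def decode_task_type(logits, second_min_prob: float = 0.1):
    """Return (task_type_1, task_type_2, task_type_prob) for one row of task-type logits."""
    logits = np.asarray(logits, dtype=np.float64)
    if logits.shape != (len(TASK_TYPE_MAP),):
        raise ValueError(f"expected {len(TASK_TYPE_MAP)} logits, got shape {logits.shape}")
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    first, second = np.argsort(probs)[::-1][:2]
    label_1 = TASK_TYPE_MAP[str(first)]  # look up by index string, never by key order
    # ASSUMED cutoff (from our reading of the upstream card): drop a weak runner-up.
    label_2 = TASK_TYPE_MAP[str(second)] if probs[second] >= second_min_prob else "NA"
    return label_1, label_2, round(float(probs[first]), 3)


def check_head_sizes(config: dict) -> list:
    """List mismatches between target_sizes and the weight/label maps of a config dict."""
    problems = []
    sizes = config.get("target_sizes", {})
    for head, weights in config.get("weights_map", {}).items():
        if sizes.get(head) != len(weights):
            problems.append(f"{head}: target_sizes={sizes.get(head)} but {len(weights)} weights")
    labels = config.get("task_type_map", {})
    if sizes.get("task_type") != len(labels):
        problems.append(f"task_type: target_sizes={sizes.get('task_type')} but {len(labels)} labels")
    return problems
```

For the new file, `check_head_sizes(json.load(open("config.json")))` should return an empty list.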

model_quantized.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:98a24d8927867b8c4d489ee6053d01213aa64d370175771bded2761a2954a8e8
-size 243526627
+oid sha256:36c58a6b89d72d22c9a67caebab6356673f13e6a3c743e54552878cf1557c3e0
+size 243965613
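
The ONNX weights are stored through Git LFS, so this diff only changes the pointer: a new SHA-256 `oid` and a `size` of 243,965,613 bytes (previously 243,526,627). In the Git LFS pointer format, `oid sha256:` is the hash of the file contents, so a downloaded copy can be checked against the pointer. A minimal sketch, assuming the file is already on local disk; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# The new pointer from this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:36c58a6b89d72d22c9a67caebab6356673f13e6a3c743e54552878cf1557c3e0
size 243965613
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and streamed SHA-256 against an LFS pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap size check before hashing ~244 MB
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

Usage: `matches_pointer(Path("model_quantized.onnx"), NEW_POINTER)` returns `True` only for the exact file this commit uploaded.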