# gannima/Orchestrator-8B-GGUF
This repository contains GGUF-quantized versions of nvidia/Orchestrator-8B, optimized for local inference with llama.cpp-compatible runtimes.
## License
This model is governed by the NVIDIA Open Model License.
## Available Quantizations

| Filename | Quant Type | Size | Use Case |
|---|---|---|---|
| nvidia-Orchestrator-8B-q8_0.gguf | Q8_0 | 8.2 GB | High accuracy |
| nvidia-Orchestrator-8B-q6_k.gguf | Q6_K | 6.3 GB | Balanced quality and size |
| nvidia-Orchestrator-8B-q4_k_m.gguf | Q4_K_M | 4.7 GB | Fast / low VRAM |
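You don't need to clone the whole repository to use a single quantization. As a sketch (assuming you want the Q6_K file and the current directory as the destination), the `huggingface-cli` tool from the `huggingface_hub` package can fetch one file at a time:

```bash
# Install the Hugging Face CLI (ships with huggingface_hub)
pip install -U "huggingface_hub[cli]"

# Download only the Q6_K quant into the current directory
huggingface-cli download gannima/Orchestrator-8B-GGUF \
  nvidia-Orchestrator-8B-q6_k.gguf --local-dir .
```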
## Usage with llama.cpp

### 1. Install

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make -j
```
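Note: recent llama.cpp releases have replaced the Makefile with a CMake build, so `make -j` may fail on a current checkout. In that case, a build along these lines should work (the `-DGGML_CUDA=ON` flag for NVIDIA GPU offload is optional):

```bash
# Configure and build; add -DGGML_CUDA=ON to the first command for CUDA support
cmake -B build
cmake --build build --config Release -j

# Binaries such as llama-server land in build/bin/
```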
### 2. Run Server

Example for the Q6_K model:

```bash
./llama-server \
  -m nvidia-Orchestrator-8B-q6_k.gguf \
  -c 32768 \
  -ngl 99 \
  --port 8080 \
  --chat-template chatml \
  --temp 0.6 --top-p 0.9 --min-p 0.05
```
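Once the server is running, it exposes an OpenAI-compatible HTTP API. A minimal smoke test with `curl` (the port matches the command above; the prompt and the `model` name are placeholders, as `llama-server` serves whatever model it was started with):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Orchestrator-8B",
    "messages": [
      {"role": "user", "content": "List three uses of an orchestrator model."}
    ],
    "temperature": 0.6
  }'
```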