Quantized version of ai-sage/GigaChat3-10B-A1.8B-bf16

This model requires llama.cpp with PR#17420 applied so that the architecture is properly detected as a DeepSeek Lite (deepseek2) model.
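
If you need a build of llama.cpp with that PR, a minimal sketch (the `pull/17420/head` fetch ref is standard GitHub behavior; the CUDA flag is an assumption, omit it for a CPU-only build):

```bash
# Fetch the PR branch and build llama.cpp
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/17420/head:pr-17420
git checkout pr-17420
cmake -B build -DGGML_CUDA=ON   # assumption: CUDA build
cmake --build build --config Release -j
```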

It also requires the chat_template.jinja file from this repository, passed via --chat-template-file in the Quick Start below.
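
One way to fetch the model and template into the ./output directory used below, a sketch assuming the repo's filenames match those in the Quick Start:

```bash
# Download the GGUF and chat template from this repo
# (filenames are assumptions; check the repo's file list)
huggingface-cli download evilfreelancer/GigaChat3-10B-A1.8B-GGUF \
  GigaChat3-10B-A1.8B-f16.gguf chat_template.jinja \
  --local-dir ./output
```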

Quick Start

```bash
export CUDA_VISIBLE_DEVICES=0
export model="./output/GigaChat3-10B-A1.8B-f16.gguf"

# -cmoe keeps the MoE expert weights on the CPU; --jinja enables the
# Jinja chat template passed via --chat-template-file
./build/bin/llama-server \
  --model "$model" \
  --ctx-size 2000 \
  --parallel 1 \
  --threads 8 \
  --host 0.0.0.0 \
  --port 8088 \
  -cmoe \
  --jinja \
  --chat-template-file ./output/chat_template.jinja
```
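
Once the server is up, you can smoke-test it through llama-server's OpenAI-compatible chat endpoint (the port matches --port 8088 above):

```bash
curl http://localhost:8088/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```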

Model details

- Format: GGUF
- Architecture: deepseek2
- Model size: 11B params
- Available quantizations: 8-bit, 16-bit