Quantized GGUF version of ai-sage/GigaChat3-10B-A1.8B-bf16
This model requires llama.cpp PR#17420 to be properly detected as a DeepSeek-Lite model, together with the chat_template.jinja shipped in this repository (passed to the server via `--chat-template-file` below).
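A minimal sketch of fetching and building llama.cpp with that PR applied (assumes a GitHub checkout and a CUDA build; adjust the CMake flags to your hardware):

```bash
# Fetch the PR branch directly from GitHub and build llama-server.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/17420/head:pr-17420
git checkout pr-17420
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```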
Quick Start
```bash
export CUDA_VISIBLE_DEVICES=0   # pin the server to a single GPU
export model="./output/GigaChat3-10B-A1.8B-f16.gguf"

# -cmoe keeps the MoE expert weights in system RAM (CPU offload);
# --jinja enables Jinja rendering for the chat template below.
./build/bin/llama-server \
    --model "$model" \
    --ctx-size 2000 \
    --parallel 1 \
    --threads 8 \
    --host 0.0.0.0 \
    --port 8088 \
    -cmoe \
    --jinja \
    --chat-template-file ./output/chat_template.jinja
```
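Once the server is up, it exposes an OpenAI-compatible API on the configured port; a quick smoke test (the prompt and token limit are illustrative):

```bash
# Send a chat request to the OpenAI-compatible endpoint.
curl http://localhost:8088/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Hello! Who are you?"}],
          "max_tokens": 128
        }'
```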