Can you make GGUFs of both of the Qwen3-Next models? I cannot find them anywhere, and no one else seems to have asked for them.

#1449
by Hanspeter420 - opened

I would especially love 4-bit and 3-bit GGUFs of these. Both the Thinking and Instruct variants would be nice. Thanks in advance :)

I don't think it's worth providing GGUFs with the current issues still present in https://github.com/ggml-org/llama.cpp/pull/16095. If you really have a use case where 139 tokens of context is enough, I could provide private quants, but for my own use case I need at least 2048 tokens of context, so the model breaking down after 139 tokens is devastating. I recommend you closely monitor https://github.com/ggml-org/llama.cpp/pull/16095 and let us know once the llama.cpp implementation is in a state you consider usable for your use case, and I might do private quants. Official mradermacher quants we will provide as soon as https://github.com/ggml-org/llama.cpp/pull/16095 is merged.
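If you want to check the breakdown yourself once a quant exists, a minimal sketch along these lines would do, assuming llama-cpp-python is installed and using a placeholder GGUF file name (no official quants are published yet):

```python
# Hedged sketch: probe whether a Qwen3-Next GGUF stays coherent past
# ~139 tokens of context. The model_path below is a placeholder, not
# a released quant.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-Next-80B-A3B-Instruct.Q4_K_M.gguf",  # hypothetical file
    n_ctx=2048,  # the minimum context size discussed above
)

# Build a prompt well past 139 tokens, then inspect the continuation
# for degeneration (repetition, gibberish) beyond that point.
prompt = "Summarize the following text:\n" + (
    "The quick brown fox jumps over the lazy dog. " * 40
)
out = llm(prompt, max_tokens=256)
print(out["choices"][0]["text"])
```

If the implementation in the PR is still broken, the output should visibly degenerate once the processed context crosses the ~139-token mark.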

Oh, I didn't know about the llama.cpp backend issues. Thanks for the clarification. I thought hybrid attention support might have landed faster, given DSA in DeepSeek V3.2 and models like Phi-4-mini-flash-reasoning, but those may work entirely differently (I don't know). I also definitely need at least 2048 or 4096 tokens of context. Thanks for the offer, though.
