Can you make GGUFs of both of the Qwen3-Next models? I cannot find them anywhere, and no one else seems to have asked for them.

#1449
by Hanspeter420 - opened

I would especially love 4-bit and 3-bit GGUFs of these. Both the Thinking and Instruct variants would be nice. Thanks in advance :)

I don't think it's worth providing GGUFs with the current issues still present in https://github.com/ggml-org/llama.cpp/pull/16095. If you really have a use case where 139 tokens of context is enough, I could provide private quants, but for my own use case I need at least 2048 tokens of context, so the model breaking down after 139 tokens is devastating. I recommend you closely monitor https://github.com/ggml-org/llama.cpp/pull/16095 and let us know once the llama.cpp implementation is in a state you consider usable for your use case, and I might do private quants. Official mradermacher quants we will provide as soon as https://github.com/ggml-org/llama.cpp/pull/16095 is merged.
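If you want to check the breakdown yourself once a quant exists, a minimal sketch along these lines would do, assuming llama-cpp-python is installed and using a placeholder GGUF file name (no official quants are published yet):

```python
# Hedged sketch: probe whether a Qwen3-Next GGUF stays coherent past
# ~139 tokens of context. The model_path below is a placeholder, not
# a released quant.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-Next-80B-A3B-Instruct.Q4_K_M.gguf",  # hypothetical file
    n_ctx=2048,  # the minimum context size discussed above
)

# Build a prompt well past 139 tokens, then inspect the continuation
# for degeneration (repetition, gibberish) beyond that point.
prompt = "Summarize the following text:\n" + (
    "The quick brown fox jumps over the lazy dog. " * 40
)
out = llm(prompt, max_tokens=256)
print(out["choices"][0]["text"])
```

If the implementation in the PR is still broken, the output should visibly degenerate once the processed context crosses the ~139-token mark.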

Oh, I didn't know about the llama.cpp backend issues. Thanks for the clarification. I thought hybrid attention support might have landed faster, given DSA in DeepSeek V3.2 and models like Phi-4-mini-flash-reasoning, but those may work entirely differently (I don't know). I also definitely need at least 2048 or 4096 tokens of context. Thanks for the offer, though.
