Missing tensor 'blk.92.nextn.embed_tokens.weight' error
Hey all, with LM Studio I get an error when trying to load Q3 variant models giving the following error below from LM Studio:
🥲 Failed to load the model
Failed to load model
error loading model: missing tensor 'blk.92.nextn.embed_tokens.weight'
I'm on the latest LM Studio release with runtime v1.52.0, and I'm wondering if it's just due to needing the latest llama.cpp, which isn't available in LM Studio yet?
Wow, no updates. I'm using the latest KoboldCPP and it won't load - same error for the Q2 quant I used.
I had the same issue running a two-week-old llama.cpp Docker image ("ghcr.io/ggml-org/llama.cpp:server-cuda").
After updating to the latest release it now loads fine (running release version b6673).
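For anyone else on the Docker image, the fix is just pulling the current tag and re-running the server. A minimal sketch; the mount path, port, and model filename below are placeholders, not from this thread:

```shell
# Pull the current server image so the bundled llama.cpp has GLM-4.6 support
docker pull ghcr.io/ggml-org/llama.cpp:server-cuda

# Re-run against your GGUF; paths and port are example values, adjust to your setup
docker run --gpus all -p 8080:8080 -v /path/to/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda -m /models/your-glm-4.6-quant.gguf
```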
Same for me. I have tried with both Q6 and Q8 quants.
It's the llama.cpp runtime: GLM-4.6 support was added in release b6653, but LM Studio's beta and stable channels are currently at b6651. We'll just need to wait for an update to go out for now.
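To make the version gap concrete: llama.cpp release tags are just "b" plus a build number, so checking whether a runtime is new enough is a plain numeric comparison. A small illustration using the build numbers from this thread; the helper name `supports_glm46` is made up:

```python
# GLM-4.6 GGUFs need llama.cpp build b6653 or newer; older runtimes fail
# with the missing-tensor error above.
REQUIRED_BUILD = 6653  # first release with GLM-4.6 support (per this thread)

def supports_glm46(build_tag: str) -> bool:
    """Return True if a llama.cpp build tag like 'b6651' is new enough."""
    return int(build_tag.lstrip("b")) >= REQUIRED_BUILD

print(supports_glm46("b6651"))  # LM Studio's current runtime -> False
print(supports_glm46("b6673"))  # the updated Docker image build -> True
```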
Thanks for the hints!
I've been able to load the GLM-4.6 models in text-generation-webui on Windows by running this command:
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
(takes around 5-10 minutes)
Then I had warnings about some extensions like coqui-tts and gradio; I went into the Session tab, unchecked every checkbox, saved, and restarted everything. After that my GLM worked perfectly. Most likely, as MadManDan said, llama.cpp needs to be updated to the latest version.
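If you want to confirm the force-reinstall actually took, you can check the installed package version from Python. A small sketch; note the distribution name is llama-cpp-python even though the import name is llama_cpp:

```python
# Query the installed distribution version without importing the package itself
from importlib import metadata

try:
    print(metadata.version("llama-cpp-python"))
except metadata.PackageNotFoundError:
    print("llama-cpp-python is not installed in this environment")
```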