Missing tensor 'blk.92.nextn.embed_tokens.weight' error
Hey all, with LM Studio I get an error when trying to load Q3 variant models giving the following error below from LM Studio:
🥲 Failed to load the model
Failed to load model
error loading model: missing tensor 'blk.92.nextn.embed_tokens.weight'
I'm on the latest LM Studio release with runtime v1.52.0, and I'm wondering if it's just due to needing the latest llama.cpp, which isn't available in LM Studio yet?
Wow, no updates. I'm using the latest KoboldCPP and it won't load - same error for the Q2 quant I used.
I had the same issue running a two-week-old llama.cpp Docker image ("ghcr.io/ggml-org/llama.cpp:server-cuda").
After updating to the latest release it now loads fine (running release version b6673).
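For anyone else on the Docker image, the fix is just pulling the current tag and re-running the server. A minimal sketch; the mount path, port, and model filename below are placeholders, not from this thread:

```shell
# Pull the current server image so the bundled llama.cpp has GLM-4.6 support
docker pull ghcr.io/ggml-org/llama.cpp:server-cuda

# Re-run against your GGUF; paths and port are example values, adjust to your setup
docker run --gpus all -p 8080:8080 -v /path/to/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda -m /models/your-glm-4.6-quant.gguf
```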
Same for me. I have tried with both Q6 and Q8 quants.
It's the llama.cpp runtime: GLM-4.6 support was added in release b6653, but LM Studio's beta and stable channels are currently at b6651. We'll just need to wait for an update to go out for now.
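To make the version gap concrete: llama.cpp release tags are just "b" plus a build number, so checking whether a runtime is new enough is a plain numeric comparison. A small illustration using the build numbers from this thread; the helper name `supports_glm46` is made up:

```python
# GLM-4.6 GGUFs need llama.cpp build b6653 or newer; older runtimes fail
# with the missing-tensor error above.
REQUIRED_BUILD = 6653  # first release with GLM-4.6 support (per this thread)

def supports_glm46(build_tag: str) -> bool:
    """Return True if a llama.cpp build tag like 'b6651' is new enough."""
    return int(build_tag.lstrip("b")) >= REQUIRED_BUILD

print(supports_glm46("b6651"))  # LM Studio's current runtime -> False
print(supports_glm46("b6673"))  # the updated Docker image build -> True
```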
Thanks for the hints!
I've been able to load the GLM-4.6 models in text-generation-webui on Windows by running this command:
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
(takes around 5-10 minutes)
Then I had warnings about some extensions like coqui-tts and gradio; I went into the Session tab, unchecked every checkbox, saved, and restarted everything. After that my GLM worked perfectly. Most likely, as MadManDan said, llama.cpp needs to be updated to the latest version.
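If you want to confirm the force-reinstall actually took, you can check the installed package version from Python. A small sketch; note the distribution name is llama-cpp-python even though the import name is llama_cpp:

```python
# Query the installed distribution version without importing the package itself
from importlib import metadata

try:
    print(metadata.version("llama-cpp-python"))
except metadata.PackageNotFoundError:
    print("llama-cpp-python is not installed in this environment")
```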