LFM2-VL doesn't process large images right in llama-cpp?

#1
by octopusmegalopod - opened

I noticed that LFM2-VL hallucinates outputs for large images in llama.cpp, which makes it very hard to use for OCR, where we need to read long lines of text. It works fine in PyTorch/Transformers, though. Is all of the preprocessing logic for large images implemented in llama.cpp?

More info in the issue: https://github.com/ggml-org/llama.cpp/issues/17290
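
For reference, here is a minimal sketch of how one might inspect the Transformers-side preprocessing of a large image, to see the resizing/tiling that llama.cpp would need to match. The checkpoint ID `LiquidAI/LFM2-VL-1.6B` and the file name below are placeholders for illustration, not taken from this report.

```python
from PIL import Image
from transformers import AutoProcessor

# Placeholder checkpoint and file name - substitute the actual model and image.
processor = AutoProcessor.from_pretrained("LiquidAI/LFM2-VL-1.6B")
image = Image.open("wide_scan.png")  # e.g. a very wide image with one long line of text

# Run only the image side of the preprocessing (assumes the processor exposes a
# standard `image_processor`) and print the resulting tensor shapes. For large
# inputs this shows how the image is resized/tiled, which is the behaviour the
# llama.cpp preprocessing would have to reproduce.
features = processor.image_processor(images=image, return_tensors="pt")
for name, value in features.items():
    print(name, getattr(value, "shape", value))
```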

BTW: Thanks for releasing this model family. LFM2-VL is otherwise the best vision model available for edge devices; nothing else comes close to it. It can even pick up new languages quickly.

octopusmegalopod changed discussion title from LFM2-VL doesn't preprocess large images right in llama-cpp? to LFM2-VL doesn't process large images right in llama-cpp?
Liquid AI org

Thanks for reporting the issue, @octopusmegalopod, and for the details on how to reproduce it.
The root cause was identified in the llama.cpp issue (see comment) and will be fixed soon.
