LFM2-VL doesn't process large images right in llama-cpp?
#1 opened by octopusmegalopod
I noticed that LFM2-VL hallucinates outputs for large images in llama.cpp. This makes it very hard to use for OCR, where we need to read long lines of text. It works fine in PyTorch/Transformers, though. Is all of the preprocessing logic for large images implemented in llama.cpp?
More info in the issue: https://github.com/ggml-org/llama.cpp/issues/17290
BTW: Thanks for releasing this model family. LFM2-VL is otherwise the best vision model available for edge devices; nothing else comes close to it. This model can even pick up new languages quickly.
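For context on the kind of divergence being reported: vision-language preprocessors commonly split a large image into a grid of fixed-size tiles (or scale it down to fit a tile budget) before it reaches the encoder, and a runtime reimplementation must reproduce that step exactly. The sketch below illustrates this tiling logic in general terms; `TILE_SIZE` and `MAX_TILES` are assumed values for illustration, not LFM2-VL's actual configuration.

```python
import math

# Assumed, illustrative parameters; NOT LFM2-VL's real config.
TILE_SIZE = 512
MAX_TILES = 10

def plan_tiles(width: int, height: int) -> dict:
    """Decide how a large image would be split into encoder tiles.

    Small images fit in a single tile; large images are split into a
    grid of TILE_SIZE x TILE_SIZE tiles, capped at MAX_TILES. This is
    the step that has to match between the reference preprocessor
    (Transformers) and any reimplementation (e.g. llama.cpp).
    """
    cols = math.ceil(width / TILE_SIZE)
    rows = math.ceil(height / TILE_SIZE)
    # If the grid exceeds the cap, scale the image down until it fits.
    while cols * rows > MAX_TILES:
        scale = math.sqrt(MAX_TILES / (cols * rows))
        width = int(width * scale)
        height = int(height * scale)
        cols = math.ceil(width / TILE_SIZE)
        rows = math.ceil(height / TILE_SIZE)
    return {"cols": cols, "rows": rows, "tiles": cols * rows}

# A wide OCR-style image yields several tiles; if a runtime instead
# squashes it into a single tile, fine text becomes unreadable to the
# encoder and the model is prone to hallucinating the transcription.
print(plan_tiles(2048, 512))  # wide scanned line of text -> 4x1 grid
print(plan_tiles(384, 384))   # small image -> single tile
```

Under these assumptions, a 2048x512 scan becomes a 4x1 grid of tiles, while a small image passes through as one tile, which is consistent with the report that only large images misbehave.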
octopusmegalopod changed discussion title from "LFM2-VL doesn't preprocess large images right in llama-cpp?" to "LFM2-VL doesn't process large images right in llama-cpp?"
Thanks for reporting the issue @octopusmegalopod, and for the details on how to reproduce it.
The root cause was identified in the llama.cpp issue (see comment) and will be fixed soon.