LFM2-VL doesn't process large images right in llama-cpp?
#1 opened by octopusmegalopod
I noticed that LFM2-VL hallucinates outputs for large images in llama.cpp. This makes it very hard to use for OCR, where we need to read long lines of text. It works fine in PyTorch/Transformers, though. Is all of the preprocessing logic for large images implemented in llama.cpp?
More info in the issue: https://github.com/ggml-org/llama.cpp/issues/17290
BTW: Thanks for releasing this model family. LFM2-VL is otherwise the best vision model available for edge devices; nothing else comes close to it. This model can even pick up new languages quickly.
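For context on the kind of divergence being reported: vision-language preprocessors commonly split a large image into a grid of fixed-size tiles (or scale it down to fit a tile budget) before it reaches the encoder, and a runtime reimplementation must reproduce that step exactly. The sketch below illustrates this tiling logic in general terms; `TILE_SIZE` and `MAX_TILES` are assumed values for illustration, not LFM2-VL's actual configuration.

```python
import math

# Assumed, illustrative parameters; NOT LFM2-VL's real config.
TILE_SIZE = 512
MAX_TILES = 10

def plan_tiles(width: int, height: int) -> dict:
    """Decide how a large image would be split into encoder tiles.

    Small images fit in a single tile; large images are split into a
    grid of TILE_SIZE x TILE_SIZE tiles, capped at MAX_TILES. This is
    the step that has to match between the reference preprocessor
    (Transformers) and any reimplementation (e.g. llama.cpp).
    """
    cols = math.ceil(width / TILE_SIZE)
    rows = math.ceil(height / TILE_SIZE)
    # If the grid exceeds the cap, scale the image down until it fits.
    while cols * rows > MAX_TILES:
        scale = math.sqrt(MAX_TILES / (cols * rows))
        width = int(width * scale)
        height = int(height * scale)
        cols = math.ceil(width / TILE_SIZE)
        rows = math.ceil(height / TILE_SIZE)
    return {"cols": cols, "rows": rows, "tiles": cols * rows}

# A wide OCR-style image yields several tiles; if a runtime instead
# squashes it into a single tile, fine text becomes unreadable to the
# encoder and the model is prone to hallucinating the transcription.
print(plan_tiles(2048, 512))  # wide scanned line of text -> 4x1 grid
print(plan_tiles(384, 384))   # small image -> single tile
```

Under these assumptions, a 2048x512 scan becomes a 4x1 grid of tiles, while a small image passes through as one tile, which is consistent with the report that only large images misbehave.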
octopusmegalopod changed discussion title from "LFM2-VL doesn't preprocess large images right in llama-cpp?" to "LFM2-VL doesn't process large images right in llama-cpp?"
Thanks for reporting the issue @octopusmegalopod, and for the details on how to reproduce it.
The root cause was identified in the llama.cpp issue (see comment) and will be fixed soon.