PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16 • 109
Perception Encoder Collection OpenCLIP (PE Core image + text) and timm PE Core, Spatial, Lang (ViT only) weights. NOTE: These weights do not work with original modeling code. • 19 items • Updated Sep 19 • 6