Structured Extraction from Business Process Diagrams Using Vision-Language Models
Abstract
Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. Although BPMN diagrams are often exchanged as images, existing methods primarily rely on XML representations for computational analysis. In this work, we present a pipeline that leverages Vision-Language Models (VLMs) to extract structured JSON representations of BPMN diagrams directly from images, without requiring source model files or textual annotations. We also incorporate optical character recognition (OCR) for textual enrichment and evaluate the generated element lists against ground truth data derived from the source XML files. Our approach enables robust component extraction in scenarios where the original source files are unavailable. We benchmark multiple VLMs and observe performance improvements in several models when OCR is used for text enrichment. In addition, we conduct extensive statistical analyses of OCR-based enrichment methods and prompt ablation studies, providing a clearer understanding of their impact on model performance.
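As an illustration of the evaluation step described above, the following Python sketch derives a ground-truth element list from a BPMN 2.0 XML string and scores a predicted JSON element list against it. It is a minimal sketch under stated assumptions, not the paper's implementation: the JSON schema (an "elements" list of objects with "type" and "name"), the set of element types in ELEMENT_TYPES, the case- and whitespace-insensitive exact name matching, and all function names are hypothetical choices made for this example.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# Standard BPMN 2.0 model namespace.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Assumption: a small subset of element types to score (not taken from the paper).
ELEMENT_TYPES = {
    "task", "userTask", "serviceTask",
    "startEvent", "endEvent",
    "exclusiveGateway", "parallelGateway",
}


def normalize(text):
    """Lowercase and collapse whitespace so trivial label differences still match."""
    return " ".join(text.lower().split())


def ground_truth_elements(bpmn_xml):
    """Collect (type, name) pairs from BPMN XML as a multiset."""
    root = ET.fromstring(bpmn_xml)
    prefix = "{" + BPMN_NS + "}"
    items = Counter()
    for node in root.iter():
        if not node.tag.startswith(prefix):
            continue
        local_name = node.tag[len(prefix):]
        if local_name in ELEMENT_TYPES:
            items[(local_name, normalize(node.get("name", "")))] += 1
    return items


def predicted_elements(json_text):
    """Collect (type, name) pairs from a model's JSON output (hypothetical schema)."""
    data = json.loads(json_text)
    return Counter(
        (e["type"], normalize(e.get("name", ""))) for e in data["elements"]
    )


def score(gold, pred):
    """Return precision, recall, and F1 over exact (type, name) matches."""
    true_positives = sum((gold & pred).values())  # multiset intersection
    n_pred, n_gold = sum(pred.values()), sum(gold.values())
    precision = true_positives / n_pred if n_pred else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sample data, for illustration only.
SAMPLE_XML = f"""
<definitions xmlns="{BPMN_NS}">
  <process id="p1">
    <startEvent id="s1" name="Start"/>
    <task id="t1" name="Check Order"/>
    <endEvent id="e1" name="End"/>
  </process>
</definitions>
"""

SAMPLE_JSON = json.dumps({"elements": [
    {"type": "startEvent", "name": "start"},
    {"type": "task", "name": "check  order"},
    {"type": "task", "name": "Ship"},
]})

precision, recall, f1 = score(
    ground_truth_elements(SAMPLE_XML), predicted_elements(SAMPLE_JSON)
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching on normalized names is the strictest reasonable choice here. A fuzzy string comparison could be swapped in where OCR noise in labels is expected, at the cost of needing a similarity threshold.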
Community
Our recent work on using VLMs to extract information from BPMN diagrams