Model Card for MELP

MELP (Multi-scale ECG-Language Pretraining) is the model introduced in the paper "From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining". It uses hierarchical supervision from ECG-text pairs to align ECG signals with textual reports, reducing the reliance on large-scale manual annotation that limits conventional ECG analysis.

Model Details

Model Description

Electrocardiograms (ECGs) play a vital role in monitoring cardiac health and diagnosing heart diseases. However, traditional deep learning approaches for ECG analysis rely heavily on large-scale manual annotations, which are both time-consuming and resource-intensive to obtain. To overcome this limitation, self-supervised learning (SSL) has emerged as a promising alternative, enabling the extraction of robust ECG representations that can be efficiently transferred to various downstream tasks.

MELP (Multi-scale ECG-Language Pretraining) fully leverages hierarchical supervision from ECG-text pairs. It first pretrains a cardiology-specific language model to improve its understanding of clinical text. It then applies three levels of cross-modal supervision (token, beat, and rhythm) to align ECG signals with textual reports, capturing structured information across different time scales. According to the paper, MELP outperforms existing SSL methods in its experiments, which suggests it adapts well to diverse clinical applications.
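To make the idea of aligning ECG signals with reports more concrete, the sketch below shows a generic CLIP-style symmetric contrastive loss over paired embeddings. This is only an illustration of one common cross-modal alignment objective. It is not taken from the paper and does not reproduce MELP's token-, beat-, or rhythm-level losses. The function names, the temperature value, and the toy embeddings are all assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(ecg_embs, text_embs, temperature=0.07):
    """Generic CLIP-style loss: the i-th ECG should match the i-th report.

    Illustrative only; the temperature of 0.07 is an assumed value.
    """
    ecg = [l2_normalize(v) for v in ecg_embs]
    txt = [l2_normalize(v) for v in text_embs]
    n = len(ecg)
    # Cosine similarity between every ECG and every report, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in txt]
              for e in ecg]
    # ECG -> text direction: each row should peak on its diagonal entry.
    ecg_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> ECG direction: same idea over the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_ecg = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (ecg_to_text + text_to_ecg)

# Toy 2-D embeddings (hypothetical values, not model outputs).
ecg_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_reports = [[0.9, 0.1], [0.1, 0.9]]
mismatched_reports = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = symmetric_contrastive_loss(ecg_batch, matched_reports)
loss_mismatched = symmetric_contrastive_loss(ecg_batch, mismatched_reports)
```

In this toy batch the loss is lower when each ECG embedding sits close to its own report than when the pairs are swapped, which is the behavior an alignment objective of this kind is meant to reward.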

Developed by: The authors of the paper.
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Model type: Multimodal ECG-Language Pretraining Model
Language(s) (NLP): English (clinical text)
License: Apache-2.0
Finetuned from model [optional]: [More Information Needed]

Model Sources [optional]

Repository: The code is available as mentioned in the paper's abstract. Please refer to the paper for the exact URL.
Paper [optional]: From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining
Demo [optional]: [More Information Needed]

Uses

Direct Use

MELP can be used directly to extract ECG representations learned through self-supervised pretraining. These representations transfer to downstream tasks such as zero-shot ECG classification, linear probing, and other transfer learning applications on ECG data.
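As a rough illustration of zero-shot ECG classification, the sketch below picks the label whose text embedding is most similar to an ECG embedding. It assumes the ECG and the label prompts have already been encoded into a shared space. The helper names, label prompts, and embedding values are hypothetical, and the actual MELP inference code may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(ecg_embedding, label_embeddings):
    """Return the best-matching label and the similarity score of every label."""
    scores = {label: cosine_similarity(ecg_embedding, emb)
              for label, emb in label_embeddings.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores

# Hypothetical text embeddings for two label prompts (not real model outputs).
label_embeddings = {
    "normal sinus rhythm": [0.9, 0.1, 0.0],
    "atrial fibrillation": [0.1, 0.9, 0.2],
}
# Hypothetical ECG embedding in the same space.
ecg_embedding = [0.2, 0.8, 0.1]

predicted_label, scores = zero_shot_classify(ecg_embedding, label_embeddings)
```

No labeled training data is needed for this step: new classes can be added by encoding another text prompt and including it in `label_embeddings`.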

Downstream Use [optional]

The model can be fine-tuned for diverse clinical applications, including but not limited to tasks that require aligning ECG signals with textual reports, thereby assisting in cardiac health monitoring and heart disease diagnosis.

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoConfig, AutoModel

# This is a placeholder for the actual model ID on the Hugging Face Hub.
# Replace "your_model_id" with the correct model identifier.
model_id = "your_model_id" # e.g., "org/melp-model"

# Load configuration
config = AutoConfig.from_pretrained(model_id)

# Load model
model = AutoModel.from_pretrained(model_id, config=config)

# For detailed usage instructions and examples, please refer to the paper's
# official code repository mentioned in the abstract.

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]
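Once the fields above are known, a first-order estimate multiplies energy consumed by the carbon intensity of the grid. The sketch below is a simplified approximation in that spirit, not the calculator's actual implementation. The optional `pue` factor (data-center overhead) and every numeric input are hypothetical placeholders, not values for MELP.

```python
def estimate_co2_kg(power_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """First-order CO2 estimate: energy (kWh) times grid carbon intensity.

    pue is an assumed data-center overhead multiplier (1.0 means no overhead).
    """
    energy_kwh = power_watts / 1000.0 * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh

# Placeholder inputs only; the real hardware, hours, and region are not reported.
example_kg = estimate_co2_kg(
    power_watts=300,                  # hypothetical average device power draw
    hours=10,                         # hypothetical training time
    carbon_intensity_kg_per_kwh=0.4,  # hypothetical regional grid intensity
)
```

With these placeholder inputs the estimate comes to roughly 1.2 kg of CO2. The real figure depends on the values that are still missing from this card.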

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

@article{zhou2025token,
  title={From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining},
  author={Zhou, Zijian and Liu, Shikun and Han, Xiao and Liu, Haozhe and Ng, Kam Woh and Xie, Tian and Cong, Yuren and Li, Hang and Xu, Mengmeng and P{\'e}rez-R{\'u}a, Juan-Manuel and Patel, Aditya and Xiang, Tao and Shi, Miaojing and He, Sen},
  journal={arXiv preprint arXiv:2506.21803},
  year={2025}
}

APA:

Zhou, Z., Liu, S., Han, X., Liu, H., Ng, K. W., Xie, T., Cong, Y., Li, H., Xu, M., Pérez-Rúa, J.-M., Patel, A., Xiang, T., Shi, M., & He, S. (2025). From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining. arXiv preprint arXiv:2506.21803.

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Model size: 65.6M params (Safetensors)
Tensor type: BF16
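The parameter count and tensor type listed above give a rough lower bound on the memory needed just to hold the weights. The sketch below does that arithmetic. It ignores activations, buffers, and framework overhead, and the helper name and bytes-per-parameter table are illustrative assumptions.

```python
# Bytes per parameter for a few common tensor types.
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}

def weight_size_mb(num_params, tensor_type):
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 1e6

# 65.6M parameters, as listed on this card.
size_bf16_mb = weight_size_mb(65_600_000, "BF16")
# Same weights if upcast to FP32 at load time.
size_fp32_mb = weight_size_mb(65_600_000, "FP32")
```

Stored in BF16, the weights take about 131 MB. Upcasting to FP32 roughly doubles that.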
