Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

This repository contains the source code for the feature engineering step of the Can-SAVE method.

Installation

git clone https://huggingface.co/ai-lab/Can-SAVE
cd Can-SAVE
pip install -r requirements.txt

requirements.txt

pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
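
Because every dependency is pinned to an exact version, it can help to confirm that the active environment matches the pins before running the scripts. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_pins` are hypothetical and are not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# The pins listed in requirements.txt above, copied verbatim.
PINNED = """
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
"""


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, _, pinned = line.partition('==')
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return {package: (pinned, installed or None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Report every package whose installed version differs from its pin.
for name, (pinned, installed) in check_pins(parse_pins(PINNED.splitlines())).items():
    print(f'{name}: pinned {pinned}, installed {installed}')
```

An empty report means the environment matches the pinned versions.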

Repository Structure

Can-SAVE/: Core implementation
EHR/: Simulated sample of EHR data
survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model)

Can-SAVE/
├── EHR/
│   └── id_26.csv
├── survival_models/
│   ├── kaplan_meier_both.pkl
│   ├── kaplan_meier_males.pkl
│   ├── kaplan_meier_females.pkl
│   └── aft.pkl
├── CanSave.py
├── Example_How_To_Train_Survival_Models.py
├── KaplanMeierEstimator.py
├── CONFIG_CanSave.yaml
├── icd10_groups.xlsx
├── requirements.txt
├── LICENSE
└── README.md

Quick Start

1) How to Train Survival Models

$ python Example_How_To_Train_Survival_Models.py
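
The script's output directory holds Kaplan-Meier estimators and an AFT model. To illustrate what a Kaplan-Meier estimator computes, here is a minimal pure-Python sketch of the product-limit formula. It is not the code in `KaplanMeierEstimator.py`; the function name `kaplan_meier` and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    durations: follow-up time of each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)       # events and censorings leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (made up): five subjects, two of them censored.
durations = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(durations, events):
    print(f't={t}: S(t)={s:.2f}')
```

Censored subjects contribute to the risk set until they leave it but never trigger a drop in the curve, which is why the estimate changes only at observed event times.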

2) How to Do Feature Engineering for Can-SAVE

Terminal

$ python CanSave.py

Python

# required libraries
import numpy as np
import pandas as pd

from CanSave import CanSave

# entry point
if __name__ == '__main__':
    # Create the feature-engineering object
    config_path = './CONFIG_CanSave.yaml'
    cs = CanSave(CONFIG_PATH=config_path)
    help(cs)  # help() prints the documentation itself and returns None

    # Load the patient's EHR
    path_ehr = './EHR/id_26.csv'
    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
    sex = ehr['sex'].iloc[0]
    birth_date = ehr['birth_date'].iloc[0]

    # Run feature engineering for the risk prediction
    features = cs.feature_engineering(
        sex         = sex,              # sex of the patient
        birth_date  = birth_date,       # birth date of the patient
        ehr         = ehr,              # Electronic Health Records of the patient
        date_pred   = '2022-01-01',     # date of the risk estimation
        deep_weeks  = 108               # depth of the EHR history to use (in weeks)
    )
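
To make the `date_pred` and `deep_weeks` arguments concrete, the sketch below computes the span of history they would cover. It assumes that `deep_weeks` defines a lookback window ending at `date_pred`, that both boundaries are inclusive, and that dates are ISO-formatted strings; the README does not confirm these details. The helpers `lookback_window` and `records_in_window`, the `date` key, and the sample records are all hypothetical.

```python
from datetime import date, timedelta


def lookback_window(date_pred, deep_weeks):
    """Return (start, end) of the history window ending at date_pred."""
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return start, end


def records_in_window(records, date_pred, deep_weeks):
    """Keep records dated inside the window (boundaries assumed inclusive)."""
    start, end = lookback_window(date_pred, deep_weeks)
    return [r for r in records if start <= date.fromisoformat(r['date']) <= end]


# Made-up records for illustration; not taken from EHR/id_26.csv.
records = [
    {'date': '2019-06-01', 'icd10': 'I10'},    # before the window
    {'date': '2020-03-15', 'icd10': 'E11.9'},  # inside the window
    {'date': '2022-02-01', 'icd10': 'J06.9'},  # after the prediction date
]

start, end = lookback_window('2022-01-01', 108)
print(start, end)
print(records_in_window(records, '2022-01-01', 108))
```

With the values from the example above, 108 weeks before 2022-01-01 is 2019-12-07, so under these assumptions only records between those two dates would feed the features.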

Citation

If you find this work useful, please cite it:

@misc{philonenko2025,
      title={Can-SAVE: Deploying Low-Cost and Population-Scale Cancer 
      Screening via Survival Analysis Variables and EHR}, 
      author={Petr Philonenko and Vladimir Kokh and Pavel Blinov},
      year={2025},
      eprint={2309.15039},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2309.15039}, 
}
