Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

arXiv KDD 2026 Python 3.10 License: Apache 2.0

This repository contains the source code for the feature engineering step of the Can-SAVE method.

Installation

git clone https://huggingface.co/ai-lab/Can-SAVE
cd Can-SAVE
pip install -r requirements.txt

requirements.txt

pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
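
Because the versions above are pinned exactly, it can help to confirm that the active environment matches them before running the scripts. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_environment` are hypothetical and not part of this repository.

```python
from importlib import metadata

# The pins listed in requirements.txt above
PINS = """
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
"""

def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins

def check_environment(pins):
    """Return {name: (wanted, installed, matches)} for each pinned package."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report

pins = parse_pins(PINS)
for name, (wanted, installed, ok) in check_environment(pins).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, installed {installed} [{status}]")
```

A mismatch does not necessarily mean the code will fail, but the pinned versions are the ones listed by the authors.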

Repository Structure

  • Can-SAVE/: Core implementation
  • EHR/: Simulated sample of EHR data
  • survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model)
Can-SAVE/
├── EHR/
│   └── id_26.csv
├── survival_models/
│   ├── kaplan_meier_both.pkl
│   ├── kaplan_meier_males.pkl
│   ├── kaplan_meier_females.pkl
│   └── aft.pkl
├── CanSave.py
├── Example_How_To_Train_Survival_Models.py
├── KaplanMeierEstimator.py
├── CONFIG_CanSave.yaml
├── icd10_groups.xlsx
├── requirements.txt
├── LICENSE
└── README.md

Quick Start

1) How to Train Survival Models

$ python Example_How_To_Train_Survival_Models.py
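
The fitted models include Kaplan-Meier estimators. For readers unfamiliar with the technique, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is an illustration only and is not the code in `KaplanMeierEstimator.py`; the function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed follow-up time for each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # drop events and censored subjects alike
    return curve

# Toy data (made up for illustration): follow-up in months, 1 = event, 0 = censored
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.4f}")
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later event times.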

2) How to Do Feature Engineering for Can-SAVE

Terminal

$ python CanSave.py

Python

# required libraries
import numpy as np
import pandas as pd

from CanSave import CanSave

# entry point
if __name__ == '__main__':
    # Create a new object for feature engineering
    config_path = './CONFIG_CanSave.yaml'
    cs = CanSave(CONFIG_PATH=config_path)
    help(cs)  # help() prints the documentation itself and returns None

    # Load the patient's EHR
    path_ehr = './EHR/id_26.csv'
    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
    sex = ehr['sex'].iloc[0]
    birth_date = ehr['birth_date'].iloc[0]

    # Run feature engineering for the risk prediction
    features = cs.feature_engineering(
        sex         = sex,              # sex of the patient
        birth_date  = birth_date,       # birth date of the patient
        ehr         = ehr,              # Electronic Health Records of the patient
        date_pred   = '2022-01-01',     # date of the risk estimation
        deep_weeks  = 108               # depth of the EHR history to use (in weeks)
    )
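
To make the `date_pred` and `deep_weeks` arguments more concrete, the sketch below shows one plausible way a lookback window could be derived from them. This is an assumption about the semantics, not the logic inside `CanSave.feature_engineering`: the half-open window, the helper names, and the record fields `date` and `icd10` are all hypothetical.

```python
from datetime import date, timedelta

def history_window(date_pred, deep_weeks):
    """Return (start, end) of the lookback window ending at the prediction date."""
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return start, end

def records_in_window(records, date_pred, deep_weeks):
    """Keep records dated within [start, end); the interval choice is an assumption."""
    start, end = history_window(date_pred, deep_weeks)
    return [r for r in records if start <= date.fromisoformat(r["date"]) < end]

# Made-up records for illustration; the field names are hypothetical
records = [
    {"date": "2019-06-01", "icd10": "J06"},
    {"date": "2020-03-15", "icd10": "K29"},
    {"date": "2021-11-30", "icd10": "I10"},
    {"date": "2022-02-01", "icd10": "E11"},
]

start, end = history_window("2022-01-01", 108)
selected = records_in_window(records, "2022-01-01", 108)
print(start, end)
print([r["icd10"] for r in selected])
```

Under this reading, 108 weeks before 2022-01-01 is 2019-12-07, so records older than that date, or dated on or after the prediction date, would not contribute to the features.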

Citation

If you find this work useful, please cite it:

@misc{philonenko2025,
      title={Can-SAVE: Deploying Low-Cost and Population-Scale Cancer 
      Screening via Survival Analysis Variables and EHR}, 
      author={Petr Philonenko and Vladimir Kokh and Pavel Blinov},
      year={2025},
      eprint={2309.15039},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2309.15039}, 
}