---
title: MAPSS Multi Source Audio Perceptual Separation Scores
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
license: mit
---

# MAPSS: Manifold-based Assessment of Perceptual Source Separation

Granular evaluation of speech and music source separation with the MAPSS measures:

- **Perceptual Matching (PM)**: Measures how closely an output perceptually aligns with its reference. Range: 0-1, higher is better.
- **Perceptual Similarity (PS)**: Measures how well an output is separated from its interfering references. Range: 0-1, higher is better.

## Input Format

Upload a ZIP file containing:

```
your_mixture.zip
├── references/          # Original clean sources
│   ├── speaker1.wav
│   ├── speaker2.wav
│   └── ...
└── outputs/             # Separated outputs from your algorithm
    ├── separated1.wav
    ├── separated2.wav
    └── ...
```

A Python sketch for packaging this layout appears under Examples below.

### Audio Requirements

- Format: WAV files
- Sample rate: any (automatically resampled to 16 kHz)
- Channels: mono or stereo (converted to mono)
- Number of files: equal numbers of references and outputs

## Output Format

The tool generates a ZIP file containing:

- `ps_scores_{model}.csv`: PS scores for each speaker/source
- `pm_scores_{model}.csv`: PM scores for each speaker/source
- `params.json`: Experiment parameters used
- `manifest_canonical.json`: File mapping and processing details

A sketch for unpacking and reading these files appears under Examples below.

## Available Models

| Model | Description | Default Layer | Use Case |
|-------|-------------|---------------|----------|
| `raw` | Raw waveform features | N/A | Baseline comparison |
| `wavlm` | WavLM Large | 24 | Best overall performance |
| `wav2vec2` | Wav2Vec2 Large | 24 | Strong performance |
| `hubert` | HuBERT Large | 24 | Good for speech |
| `wavlm_base` | WavLM Base | 12 | Faster, good quality |
| `wav2vec2_base` | Wav2Vec2 Base | 12 | Faster processing |
| `hubert_base` | HuBERT Base | 12 | Faster for speech |
| `wav2vec2_xlsr` | Wav2Vec2 XLSR-53 | 24 | Multilingual |
| `ast` | Audio Spectrogram Transformer | 12 | General audio |

## Parameters

- **Model**: The embedding model used for feature extraction.
- **Layer**: The transformer layer to extract features from (auto-selected by default).
- **Alpha**: Diffusion-maps normalization parameter, in the range 0.0-1.0 (default: 1.0):
  - 0.0 = no normalization
  - 1.0 = full normalization (recommended)

## Citation

If you use MAPSS in your research, please cite:

```bibtex
@article{Ivry2025MAPSS,
  title   = {MAPSS: Manifold-based Assessment of Perceptual Source Separation},
  author  = {Ivry, Amir and Cornell, Samuele and Watanabe, Shinji},
  journal = {arXiv preprint arXiv:2509.09212},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.09212}
}
```

## Limitations

- Processing time scales with the number of sources, the audio length, and the model size.

## License

- Code: MIT License
- Paper: CC-BY-4.0

## Support

For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/amir-ivry/MAPSS-measures).
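
## Examples

The sketches below are not part of the tool itself; file names and paths in them are illustrative placeholders.

### Packaging an input ZIP

A minimal Python sketch that writes reference and separated WAV files into the folder layout described under Input Format, assuming the files already exist on disk:

```python
import zipfile
from pathlib import Path

def build_mapss_zip(ref_paths, out_paths, zip_path="my_mixture.zip"):
    """Package references and separated outputs in the layout MAPSS expects."""
    # MAPSS requires equal numbers of references and outputs.
    assert len(ref_paths) == len(out_paths), "reference/output counts must match"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in ref_paths:
            zf.write(p, arcname=f"references/{Path(p).name}")
        for p in out_paths:
            zf.write(p, arcname=f"outputs/{Path(p).name}")
    return zip_path

# Hypothetical file names; substitute your own recordings.
build_mapss_zip(
    ref_paths=["speaker1.wav", "speaker2.wav"],
    out_paths=["separated1.wav", "separated2.wav"],
)
```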
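
### Pre-normalizing audio (optional)

The Space resamples to 16 kHz and downmixes to mono automatically, so this step is not required; the sketch below mirrors that conversion locally with `librosa` and `soundfile`, in case you want to listen to audio in the same form that gets scored:

```python
import librosa
import soundfile as sf

def normalize_wav(in_path, out_path, target_sr=16000):
    """Resample to 16 kHz and downmix to mono, mirroring the Space's preprocessing."""
    # librosa.load performs both the resampling and the stereo-to-mono downmix.
    y, _ = librosa.load(in_path, sr=target_sr, mono=True)
    sf.write(out_path, y, target_sr)

# Hypothetical input file; substitute your own.
normalize_wav("speaker1_48k_stereo.wav", "speaker1.wav")
```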
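
### Reading the result ZIP

A sketch that unpacks the returned archive and prints one of the score files. Here `results.zip` and the `wavlm` model name are assumptions (the CSV name follows the `ps_scores_{model}.csv` pattern from Output Format), and the exact columns are defined by the tool:

```python
import csv
import zipfile

# Unpack the results archive returned by the tool.
with zipfile.ZipFile("results.zip") as zf:
    zf.extractall("mapss_results")

# Print the PS scores row by row; the column layout comes from the tool's CSV schema.
with open("mapss_results/ps_scores_wavlm.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)
```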