Papers
arxiv:2508.20885

SincQDR-VAD: A Noise-Robust Voice Activity Detection Framework Leveraging Learnable Filters and Ranking-Aware Optimization

Published on Aug 28
Authors:
,
,
,
,

Abstract

A SincQDR-VAD framework enhances voice activity detection in noisy settings by using learnable bandpass filters and a quadratic disparity ranking loss to improve AUROC and F2-Score with reduced computational resources.

AI-generated summary

Voice activity detection (VAD) is essential for speech-driven applications, but remains far from perfect in noisy and resource-limited environments. Existing methods often lack robustness to noise, and their frame-wise classification losses are only loosely coupled with the evaluation metric of VAD. To address these challenges, we propose SincQDR-VAD, a compact and robust framework that combines a Sinc-extractor front-end with a novel quadratic disparity ranking loss. The Sinc-extractor uses learnable bandpass filters to capture noise-resistant spectral features, while the ranking loss optimizes the pairwise score order between speech and non-speech frames to improve the area under the receiver operating characteristic curve (AUROC). A series of experiments conducted on representative benchmark datasets show that our framework considerably improves both AUROC and F2-Score, while using only 69% of the parameters compared to prior arts, confirming its efficiency and practical viability.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2508.20885 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2508.20885 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.