Whisper Small Fine-tuned on Nepali (OpenSLR 54)

This model is a fine-tuned version of openai/whisper-small on the OpenSLR 54 (Nepali Speech Corpus) dataset. Trained on roughly 154 hours of Nepali speech, it achieves state-of-the-art results among open-source models of this size on this dataset, reaching a word error rate (WER) of 26.69% on the held-out test split.

Model Details

Model Description

  • Model architecture: Whisper Small (244M Parameters)
  • Language: Nepali (ne)
  • Task: Automatic Speech Recognition (Transcription)
  • Dataset: OpenSLR 54 (~157,000 utterances)
  • Fine-tuning Hardware: NVIDIA A100 80GB

Usage

from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="fnawaraj/whisper-small-nepali-openslr")

# Transcribe an audio file
transcription = transcriber("path_to_nepali_audio.mp3")

print(transcription["text"])
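
For clips longer than 30 seconds, or to force the decoding language and task explicitly, the pipeline also accepts a chunk length and Whisper generation arguments. A minimal sketch, assuming the same repository id as above (the exact language identifier accepted may vary slightly across transformers versions):

from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="fnawaraj/whisper-small-nepali-openslr",
    chunk_length_s=30,  # split long audio into 30-second chunks
)

# Force Nepali transcription instead of relying on automatic language detection
transcription = transcriber(
    "path_to_nepali_audio.mp3",
    generate_kwargs={"language": "nepali", "task": "transcribe"},
)

print(transcription["text"])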

Training Data

The model was trained on the OpenSLR 54 (Nepali Speech Corpus):

  • Total audio duration: ~154 hours
  • Total utterances: 157,905
  • Sampling rate: 16 kHz
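
Whisper expects 16 kHz mono input, so clips at other rates need resampling before feature extraction. A minimal preprocessing sketch, assuming the corpus has been loaded as a Hugging Face datasets object with an "audio" column and a "sentence" transcript column (the loading call and column names are assumptions, not part of this card):

from datasets import Audio, load_dataset
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="nepali", task="transcribe")

# Hypothetical local layout; replace with however OpenSLR 54 is stored on disk.
dataset = load_dataset("audiofolder", data_dir="path_to_openslr54")

# Resample every clip to the 16 kHz rate Whisper expects.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel spectrogram features for the encoder
    batch["input_features"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    # Tokenized transcript used as decoder labels
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

dataset = dataset.map(prepare, remove_columns=dataset["train"].column_names)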

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

  • Learning rate: 1e-05
  • Train batch size: 8
  • Eval batch size: 8
  • Gradient accumulation steps: 4 (effective batch size: 32)
  • Optimizer: AdamW
  • LR scheduler: linear decay with 500 warmup steps
  • Training steps: 10,000
  • Mixed precision: FP16
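
These settings map directly onto the Seq2SeqTrainingArguments used in the standard Hugging Face Whisper fine-tuning recipe. A minimal sketch reproducing the listed values (the output directory is a placeholder, and AdamW is the Trainer's default optimizer so it needs no explicit flag):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-nepali",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,        # effective batch size of 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10000,
    fp16=True,
    predict_with_generate=True,           # generate transcripts during eval so WER can be computed
)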

Evaluation Results

The model was evaluated on the held-out test split of the OpenSLR 54 dataset (1,580 samples).

  • Word Error Rate (WER): 26.69%
  • Validation Loss: 0.210
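
The reported WER can be reproduced with the evaluate library. A minimal sketch, assuming lists of reference transcripts and model predictions are already in hand (the example strings below are placeholders):

import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcripts; in practice these come from the test split and the model's output.
references = ["नेपाल एक सुन्दर देश हो"]
predictions = ["नेपाल एक सुन्दर देश हो"]

wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2f}%")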

Limitations

The model performs best on high-quality read speech.

It may struggle with very fast conversational speech or heavy background noise compared to models trained on noisier, more diverse data.

Spelling variations may occur between forms that sound identical in spoken Nepali (e.g., short vs. long vowels).