whisper-turbo-multilingual-fleurs / README.md

metadata

library_name: transformers
license: apache-2.0
tags:
  - automatic-speech-recognition
  - whisper
  - audio
  - speech
  - generated_from_trainer
  - peft
  - lora
datasets:
  - google/fleurs
  - fsicoli/common_voice_22_0
  - edinburghcstr/edacc
language:
  - en
  - id
metrics:
  - wer
base_model:
  - openai/whisper-large-v3-turbo
model-index:
  - name: Whisper Turbo Multilingual (Fleurs + CV + EdAcc)
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Combined Test Set (Fleurs + CV + EdAcc)
          type: mixed
        metrics:
          - type: wer
            value: 9.09
            name: WER (English - Combined)
          - type: wer
            value: 6.97
            name: WER (Indonesian - Combined)

Whisper Turbo Fine-Tuned on FLEURS, Common Voice & EdAcc (Indonesian & English)

This model is a fine-tuned version of openai/whisper-large-v3-turbo, trained on a combination of Google FLEURS, Common Voice 22.0, and the Edinburgh International Accents of English Corpus (EdAcc).

Training targets two languages only: Indonesian (id_id) and English (en_us). EdAcc is included specifically to improve recognition of Indonesian-accented English.

Developed by: Dafis Nadhif Saputra
Model type: Automatic Speech Recognition (ASR)
Language(s): Indonesian (id), English (en)
License: Apache-2.0
Finetuned from model: openai/whisper-large-v3-turbo

Evaluation Results

The model was evaluated using two different schemes:

1. Internal Training Validation

Measured during the training process on a mixed validation set (all datasets combined).

Epoch	Validation Loss	WER (%)
1	0.2717	7.42%
2	0.2638	7.33%

2. Final Standalone Evaluation

Measured once training was complete, on the full concatenated test sets for each language.

Language	Dataset Source	WER (%)
English	Fleurs + Common Voice + EdAcc	9.09%
Indonesian	Fleurs + Common Voice	6.97%
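For readers unfamiliar with the metric, the sketch below shows how a word error rate (WER) figure like the ones above is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. This card does not say which text normalization or scoring library was used, so the lowercasing step, the function name and the example sentences are illustrative assumptions rather than the actual evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    # Assumption: simple lowercasing and whitespace tokenization stand in for
    # whatever normalization the real evaluation applied.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one deleted word ("ke") and one inserted word ("hari")
# against a six-word reference.
wer = word_error_rate("saya pergi ke pasar pagi ini", "saya pergi pasar pagi hari ini")
print(f"{wer:.2%}")  # prints 33.33%
```

Because WER is normalized by the reference length, scores computed on differently normalized text (punctuation, casing, number formatting) are not directly comparable.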

Training Details

Data Overview

The model was trained on approximately 15,000 samples combining:

Google FLEURS (Indonesian & English)
Common Voice 22.0 (Indonesian & English)
EdAcc (English with Indonesian Accent)

Hyperparameters (Summary)

The model was fine-tuned with LoRA adapters via PEFT, which trains a small set of added low-rank weights instead of updating the full model.

Learning Rate: 5e-5
Batch Size: 32 (Effective)
Epochs: 2
Precision: FP16
Optimizer: AdamW
LoRA Rank: 32
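As a rough illustration of why LoRA is cheap at rank 32, the arithmetic below counts the weights that a rank-32 adapter adds to a single square projection matrix. The hidden size of 1280 comes from the base model's published configuration, not from this card, and the card does not state which modules were adapted or what LoRA alpha was used, so this is a per-matrix sketch under those assumptions, not a total for this model.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


D_MODEL = 1280  # assumption: hidden size of openai/whisper-large-v3-turbo
RANK = 32       # LoRA rank listed above

full = D_MODEL * D_MODEL                           # weights in one frozen projection
added = lora_added_params(D_MODEL, D_MODEL, RANK)  # trainable adapter weights

print(added, full, f"{added / full:.1%}")  # prints 81920 1638400 5.0%
```

Under these assumptions each adapted projection trains about 5% as many weights as the frozen matrix it sits beside.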

How to Get Started with the Model

The pipeline API from the transformers library is the quickest way to transcribe audio with this model.

from transformers import pipeline
import torch

# Hub ID of this model
model_id = "Dafisns/whisper-turbo-multilingual-fleurs"

# Use FP16 on GPU; fall back to FP32 on CPU, where half precision is poorly supported
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Initialize the pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    device=device,
    torch_dtype=torch_dtype,
)

# Transcribe an audio file
# Pass the language name ("indonesian" or "english") for better accuracy;
# if it is omitted, Whisper tries to detect the language automatically

# Example for Indonesian audio:
result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs={"language": "indonesian"})
print(result["text"])

# Example for English audio:
result_en = pipe("path_to_your_english_audio.mp3", generate_kwargs={"language": "english"})
print(result_en["text"])
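Since the model targets only two languages, it can help to validate the language before calling the pipeline. The sketch below builds the generate_kwargs dictionary from either a two-letter code or a full name. The helper build_generate_kwargs and the LANGUAGE_NAMES table are hypothetical and not part of transformers. The "language" and "task" keys are standard Whisper generation arguments, and forcing the task to "transcribe" is an assumption about the intended use.

```python
# Hypothetical helper, not part of transformers: maps the two supported
# language codes to the names used in the examples above.
LANGUAGE_NAMES = {"id": "indonesian", "en": "english"}


def build_generate_kwargs(language: str) -> dict:
    """Return generate_kwargs for a supported language code or name."""
    key = language.strip().lower()
    name = LANGUAGE_NAMES.get(key, key)  # accept "id" as well as "indonesian"
    if name not in LANGUAGE_NAMES.values():
        supported = ", ".join(sorted(LANGUAGE_NAMES.values()))
        raise ValueError(f"unsupported language {language!r}; expected one of: {supported}")
    # Assumption: plain transcription is wanted, not translation into English.
    return {"language": name, "task": "transcribe"}


print(build_generate_kwargs("id"))
# With the pipeline from the example above (not created in this snippet):
# result = pipe("path_to_your_audio.mp3", generate_kwargs=build_generate_kwargs("id"))
```

Rejecting other languages early avoids silently running the model on input it was not fine-tuned for.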