library_name: transformers
license: apache-2.0
tags:
  - automatic-speech-recognition
  - whisper
  - audio
  - speech
  - generated_from_trainer
  - peft
  - lora
datasets:
  - google/fleurs
  - fsicoli/common_voice_22_0
  - edinburghcstr/edacc
language:
  - en
  - id
metrics:
  - wer
base_model:
  - openai/whisper-large-v3-turbo
model-index:
  - name: Whisper Turbo Multilingual (Fleurs + CV + EdAcc)
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Combined Test Set (Fleurs + CV + EdAcc)
          type: mixed
        metrics:
          - type: wer
            value: 9.09
            name: WER (English - Combined)
          - type: wer
            value: 6.97
            name: WER (Indonesian - Combined)

Whisper Turbo Fine-Tuned on FLEURS, Common Voice & EdAcc (Indonesian & English)

This model is a fine-tuned version of openai/whisper-large-v3-turbo. It was trained on a combination of three datasets: Google FLEURS, Common Voice 22.0, and the Edinburgh International Accents of English Corpus (EdAcc).

Training targets two languages only: Indonesian (id_id) and English (en_us). EdAcc is included specifically to improve recognition of Indonesian-accented English.

  • Developed by: Dafis Nadhif Saputra
  • Model type: Automatic Speech Recognition (ASR)
  • Language(s): Indonesian (id), English (en)
  • License: Apache-2.0
  • Finetuned from model: openai/whisper-large-v3-turbo


Evaluation Results

The model was evaluated using two different schemes:

1. Internal Training Validation

Measured during training on a mixed validation set (all datasets combined).

| Epoch | Validation Loss | WER (%) |
|-------|-----------------|---------|
| 1     | 0.2717          | 7.42    |
| 2     | 0.2638          | 7.33    |

2. Final Standalone Evaluation

Measured after training was complete, on the concatenated test sets for each language.

| Language   | Dataset Source                | WER (%) |
|------------|-------------------------------|---------|
| English    | FLEURS + Common Voice + EdAcc | 9.09    |
| Indonesian | FLEURS + Common Voice         | 6.97    |
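For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration, not the evaluation script used for this model. The function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are all assumptions made for the example. The card does not state which text normalization was applied, and corpus-level scores are normally computed by summing errors over all utterances rather than averaging per-utterance values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumed normalization for illustration only: lowercase + whitespace split.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the hypothesis drops one of six reference words.
reference = "saya pergi ke pasar pagi ini"
hypothesis = "saya pergi pasar pagi ini"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}")
```

In practice a maintained library with a Whisper-style text normalizer is the safer choice for reporting numbers, since punctuation and casing differences otherwise count as errors.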

Training Details

Data Overview

The model was trained on approximately 15,000 samples combining:

  • Google FLEURS (Indonesian & English)
  • Common Voice 22.0 (Indonesian & English)
  • EdAcc (English with Indonesian Accent)
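To give a sense of the training budget, the sample count above can be combined with the effective batch size (32) and epoch count (2) listed under Hyperparameters. The arithmetic below is a rough sketch: the sample count is approximate, and the split of the effective batch into per-device batch size, gradient accumulation steps, and device count is hypothetical, since the card only reports the effective value.

```python
import math

num_samples = 15_000        # approximate training set size stated in the card
effective_batch_size = 32   # stated in the card
num_epochs = 2              # stated in the card

# Hypothetical decomposition of the effective batch; the card does not give it.
per_device_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 1
assert (
    per_device_batch_size * gradient_accumulation_steps * num_devices
    == effective_batch_size
)

# One optimizer step consumes one effective batch; the last batch may be partial.
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs
print(f"{steps_per_epoch} steps per epoch, {total_steps} optimizer steps in total")
```

Under these assumptions the run amounts to fewer than a thousand optimizer steps.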

Hyperparameters (Summary)

The model was trained using PEFT (LoRA) to efficiently adapt the weights.

  • Learning Rate: 5e-5
  • Batch Size: 32 (Effective)
  • Epochs: 2
  • Precision: FP16
  • Optimizer: AdamW
  • LoRA Rank: 32
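The efficiency claim can be made concrete with a small calculation. LoRA freezes a weight matrix W and learns a low-rank update B·A, where A has shape (rank × d_in) and B has shape (d_out × rank). The sketch below counts the trainable parameters for a single square projection at rank 32. It assumes a hidden size of 1280 (the Whisper large family) and does not reflect the actual set of adapted modules, which the card does not list; `lora_param_count` is a hypothetical helper name.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by LoRA to one (d_out x d_in) weight matrix."""
    # A contributes rank * d_in parameters, B contributes d_out * rank.
    return rank * (d_in + d_out)


d_model = 1280  # assumed hidden size of the Whisper large family
rank = 32       # LoRA rank stated in the card

full_params = d_model * d_model                        # frozen projection weights
lora_params = lora_param_count(d_model, d_model, rank)  # trainable adapter weights
fraction = lora_params / full_params

print(f"full: {full_params:,}  lora: {lora_params:,}  ratio: {fraction:.1%}")
```

For each adapted projection, only the A and B matrices receive gradients, which is why the adapter is much cheaper to train and store than a full fine-tune.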

How to Get Started with the Model

Use the pipeline API from the transformers library to transcribe audio.

from transformers import pipeline
import torch

# Replace with your model ID
model_id = "Dafisns/whisper-turbo-multilingual-fleurs"

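# Initialize the pipeline. Use FP16 only on GPU: half precision on CPU is slow
# and not supported by every operator, so fall back to FP32 there.
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    device=device,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)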

# Transcribe an audio file.
# Pass the language name ('indonesian' or 'english') so the model does not have
# to auto-detect it, which generally gives more reliable output.

# Example for Indonesian audio:
result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs={"language": "indonesian"})
print(result["text"])

# Example for English audio:
result_en = pipe("path_to_your_english_audio.mp3", generate_kwargs={"language": "english"})
print(result_en["text"])
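Since the model was fine-tuned for two languages only, it can help to validate the language before calling the pipeline. The sketch below maps the ISO codes from the model metadata (`id`, `en`) to the language names used in the snippet above and builds the `generate_kwargs` dictionary. `SUPPORTED_LANGUAGES` and `build_generate_kwargs` are hypothetical names introduced for this example and are not part of transformers. Setting `"task": "transcribe"` is an added assumption that keeps Whisper in transcription mode rather than translation.

```python
# Hypothetical helper: not part of the transformers library.
SUPPORTED_LANGUAGES = {"id": "indonesian", "en": "english"}


def build_generate_kwargs(language_code: str) -> dict:
    """Return Whisper generate kwargs for one of the fine-tuned languages."""
    code = language_code.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language '{language_code}'; expected one of: {supported}"
        )
    return {"language": SUPPORTED_LANGUAGES[code], "task": "transcribe"}


kwargs_id = build_generate_kwargs("id")
kwargs_en = build_generate_kwargs("EN")
print(kwargs_id)
print(kwargs_en)

# With the `pipe` object created earlier, the result would be used as:
# result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs=kwargs_id)
```

Rejecting other languages up front avoids silently running the model on input it was not adapted for.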