Dafisns committed
Commit ed9e8c2 · verified · 1 Parent(s): b7efd0e

Update README.md

Files changed (1): README.md (+49, -47)
README.md CHANGED
@@ -12,6 +12,7 @@ tags:
  datasets:
  - google/fleurs
  - fsicoli/common_voice_22_0
+ - edinburghcstr/edacc
  language:
  - en
  - id
@@ -20,41 +21,28 @@ metrics:
  base_model:
  - openai/whisper-large-v3-turbo
  model-index:
- - name: Whisper Turbo Multilingual Fleurs + CV
+ - name: Whisper Turbo Multilingual (Fleurs + CV + EdAcc)
  results:
  - task:
  type: automatic-speech-recognition
  name: Automatic Speech Recognition
  dataset:
- name: Google FLEURS (Indonesian & English)
- type: google/fleurs
+ name: Combined Test Set (Fleurs + CV + EdAcc)
+ type: mixed
  metrics:
  - type: wer
- value: 5.95
- name: WER (English)
+ value: 9.09
+ name: WER (English - Combined)
  - type: wer
- value: 7.57
- name: WER (Indonesian)
- - task:
- type: automatic-speech-recognition
- name: Automatic Speech Recognition
- dataset:
- name: Common Voice 22.0 (Indonesian & English)
- type: fsicoli/common_voice_22_0
- metrics:
- - type: wer
- value: 16.69
- name: WER (English)
- - type: wer
- value: 11.41
- name: WER (Indonesian)
+ value: 6.97
+ name: WER (Indonesian - Combined)
  ---

- # Whisper Turbo Fine-Tuned on FLEURS & Common Voice (Indonesian & English)
+ # Whisper Turbo Fine-Tuned on FLEURS, Common Voice & EdAcc (Indonesian & English)

- This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo). It was trained on a combination of **Google FLEURS** and **Common Voice 22.0** datasets, focusing specifically on **Indonesian (`id_id`)** and **English (`en_us`)** languages.
+ This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo). It was trained on a combination of **Google FLEURS**, **Common Voice 22.0**, and **Edinburgh International Accents (EdAcc)** datasets.

- This model leverages **PEFT (LoRA)** to efficiently adapt the Whisper Large V3 Turbo architecture to better handle Indonesian and English speech, achieving competitive Word Error Rates (WER) across different benchmarks.
+ The training focuses specifically on **Indonesian (`id_id`)** and **English (`en_us`)**. A unique feature of this model is the inclusion of the EdAcc dataset to improve performance on **Indonesian-accented English**.

  - **Developed by:** Dafis Nadhif Saputra
  - **Model type:** Automatic Speech Recognition (ASR)
@@ -66,33 +54,41 @@ This model leverages **PEFT (LoRA)** to efficiently adapt the Whisper Large V3 T

  ## Evaluation Results

- The model was evaluated on the test splits of both FLEURS and Common Voice. Below is the breakdown of the Word Error Rate (WER):
+ The model was evaluated using two different schemes:
+
+ ### 1. Internal Training Validation
+ Measured during the training process on a mixed validation set (all datasets combined).

- | Dataset | Language | WER (%) |
+ | Epoch | Validation Loss | WER (%) |
  | :--- | :--- | :--- |
- | **Google FLEURS** | English | **5.95%** |
- | **Google FLEURS** | Indonesian | **7.57%** |
- | **Common Voice 22.0** | English | 16.69% |
- | **Common Voice 22.0** | Indonesian | 11.41% |
+ | 1 | 0.2717 | 7.42% |
+ | **2** | **0.2638** | **7.33%** |
+
+ ### 2. Final Standalone Evaluation
+ Measured after training on the full concatenated test sets for each language.
+
+ | Language | Dataset Source | WER (%) |
+ | :--- | :--- | :--- |
+ | **English** | Fleurs + Common Voice + EdAcc | **9.09%** |
+ | **Indonesian** | Fleurs + Common Voice | **6.97%** |

  ## Training Details

- ### Data Preparation
- The training dataset consisted of:
- - **Google FLEURS:** ~70% of English Train Split + 100% of Indonesian Train Split.
- - **Common Voice 22.0:** 5,000 English samples + 4,000 Indonesian samples (Streaming).
-
- ### Hyperparameters
- - **Method:** LoRA (Low-Rank Adaptation) via PEFT
- - **Rank (r):** 32
- - **Alpha:** 64
- - **Dropout:** 0.05
- - **Target Modules:** `q_proj`, `v_proj`, `k_proj`, `out_proj`
- - **Learning Rate:** 1e-5
- - **Batch Size:** 16 (with Gradient Accumulation = 2)
- - **Optimizer:** AdamW
- - **Scheduler:** Cosine (50 warmup steps)
- - **Epochs:** 3
+ ### Data Overview
+ The model was trained on approximately **15,000 samples** combining:
+ * **Google FLEURS** (Indonesian & English)
+ * **Common Voice 22.0** (Indonesian & English)
+ * **EdAcc** (English with Indonesian Accent)
+
+ ### Hyperparameters (Summary)
+ The model was trained using **PEFT (LoRA)** to efficiently adapt the weights.
+
+ * **Learning Rate:** 5e-5
+ * **Batch Size:** 32 (Effective)
+ * **Epochs:** 2
+ * **Precision:** FP16
+ * **Optimizer:** AdamW
+ * **LoRA Rank:** 32

  ## How to Get Started with the Model

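The WER values in the evaluation tables above are plain word error rates. For reference, a minimal sketch of how such figures are commonly computed with the `evaluate` library (backed by `jiwer`); the transcript lists below are placeholders, not data from this card:

```python
# Minimal WER computation sketch (pip install evaluate jiwer).
# `references` and `predictions` are placeholder transcript lists.
import evaluate

wer_metric = evaluate.load("wer")

references = ["selamat pagi semuanya", "good morning everyone"]
predictions = ["selamat pagi semua", "good morning everyone"]

# compute() returns a fraction; multiply by 100 to match the percentages reported above.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%")
```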
 
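The Data Preparation / Data Overview sections describe mixing FLEURS and Common Voice 22.0 subsets (plus EdAcc in the updated card). A rough sketch of assembling such a mix with the `datasets` library, using the split sizes given in the card; the Common Voice config names are assumptions, and EdAcc and all audio preprocessing are omitted:

```python
# Sketch of the FLEURS + Common Voice training mix described in the card.
# The Common Voice 22.0 config names ("en", "id") are assumed, not taken from the card.
from datasets import load_dataset, Audio, concatenate_datasets

# FLEURS: 100% of the Indonesian train split, ~70% of the English one.
fleurs_id = load_dataset("google/fleurs", "id_id", split="train")
fleurs_en = load_dataset("google/fleurs", "en_us", split="train[:70%]")
fleurs = concatenate_datasets([fleurs_en, fleurs_id])

# Whisper expects 16 kHz audio.
fleurs = fleurs.cast_column("audio", Audio(sampling_rate=16_000))

# Common Voice 22.0 in streaming mode, taking fixed sample counts as in the card.
cv_en = load_dataset("fsicoli/common_voice_22_0", "en", split="train", streaming=True).take(5000)
cv_id = load_dataset("fsicoli/common_voice_22_0", "id", split="train", streaming=True).take(4000)

# Before training, the streaming subsets would still need their columns harmonized with FLEURS
# (e.g. keeping only the audio and transcript fields) so all sources share one schema.
```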
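The hyperparameter lists map directly onto a `peft` + `transformers` setup. Below is a minimal sketch, not the author's actual training script: the LoRA settings (rank 32, alpha 64, dropout 0.05, attention projections) and the cosine schedule with 50 warmup steps come from the earlier card revision, the learning rate 5e-5, effective batch size 32, 2 epochs and FP16 from the updated one, and the data pipeline, collator and metric hooks are omitted:

```python
# Sketch of the PEFT (LoRA) configuration described in the card; the numeric values are quoted
# from the card, everything else (output_dir, omitted data handling) is illustrative.
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

# LoRA on the attention projections: rank 32, alpha 64, dropout 0.05.
lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

# LR 5e-5, effective batch 32 (16 x 2), 2 epochs, FP16; AdamW is the Trainer default.
args = Seq2SeqTrainingArguments(
    output_dir="whisper-turbo-lora",   # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    warmup_steps=50,
    lr_scheduler_type="cosine",
    fp16=True,
)
```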
@@ -100,18 +96,24 @@ You can use the `pipeline` from the `transformers` library to easily transcribe

  ```python
  from transformers import pipeline
+ import torch

  # Replace with your model ID
  model_id = "Dafisns/whisper-turbo-multilingual-fleurs"

  # Initialize the pipeline
- pipe = pipeline("automatic-speech-recognition", model=model_id, device="cuda")
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model_id,
+     device="cuda" if torch.cuda.is_available() else "cpu",
+     torch_dtype=torch.float16
+ )

  # Transcribe an audio file
  # Ensure you specify the language code ('indonesian' or 'english') for better accuracy
+
  # Example for Indonesian audio:
  result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs={"language": "indonesian"})
-
  print(result["text"])

  # Example for English audio:
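If the repository hosts LoRA adapter weights rather than fully merged weights (the card does not state which), the adapter can also be attached to the base checkpoint explicitly with `peft` before building the pipeline. A sketch under that assumption, with a placeholder audio path:

```python
# Alternative loading path: attach the (assumed) LoRA adapter to the base Whisper checkpoint.
# If the repository already contains merged weights, the pipeline example above is sufficient.
import torch
from peft import PeftModel
from transformers import AutoProcessor, WhisperForConditionalGeneration, pipeline

base_id = "openai/whisper-large-v3-turbo"
adapter_id = "Dafisns/whisper-turbo-multilingual-fleurs"  # assumed to hold the LoRA adapter

processor = AutoProcessor.from_pretrained(base_id)
base = WhisperForConditionalGeneration.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()  # fold LoRA into the base weights

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# "sample.mp3" is a placeholder path.
print(pipe("sample.mp3", generate_kwargs={"language": "indonesian"})["text"])
```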