Dafisns committed
Commit ed9e8c2 · verified · 1 Parent(s): b7efd0e

Update README.md

Files changed (1): README.md (+49, -47)
README.md CHANGED
@@ -12,6 +12,7 @@ tags:
  datasets:
  - google/fleurs
  - fsicoli/common_voice_22_0
+ - edinburghcstr/edacc
  language:
  - en
  - id
@@ -20,41 +21,28 @@ metrics:
  base_model:
  - openai/whisper-large-v3-turbo
  model-index:
- - name: Whisper Turbo Multilingual Fleurs + CV
+ - name: Whisper Turbo Multilingual (Fleurs + CV + EdAcc)
  results:
  - task:
  type: automatic-speech-recognition
  name: Automatic Speech Recognition
  dataset:
- name: Google FLEURS (Indonesian & English)
- type: google/fleurs
+ name: Combined Test Set (Fleurs + CV + EdAcc)
+ type: mixed
  metrics:
  - type: wer
- value: 5.95
- name: WER (English)
+ value: 9.09
+ name: WER (English - Combined)
  - type: wer
- value: 7.57
- name: WER (Indonesian)
- - task:
- type: automatic-speech-recognition
- name: Automatic Speech Recognition
- dataset:
- name: Common Voice 22.0 (Indonesian & English)
- type: fsicoli/common_voice_22_0
- metrics:
- - type: wer
- value: 16.69
- name: WER (English)
- - type: wer
- value: 11.41
- name: WER (Indonesian)
+ value: 6.97
+ name: WER (Indonesian - Combined)
  ---

- # Whisper Turbo Fine-Tuned on FLEURS & Common Voice (Indonesian & English)
+ # Whisper Turbo Fine-Tuned on FLEURS, Common Voice & EdAcc (Indonesian & English)

- This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo). It was trained on a combination of **Google FLEURS** and **Common Voice 22.0** datasets, focusing specifically on **Indonesian (`id_id`)** and **English (`en_us`)** languages.
+ This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo). It was trained on a combination of **Google FLEURS**, **Common Voice 22.0**, and **Edinburgh International Accents (EdAcc)** datasets.

- This model leverages **PEFT (LoRA)** to efficiently adapt the Whisper Large V3 Turbo architecture to better handle Indonesian and English speech, achieving competitive Word Error Rates (WER) across different benchmarks.
+ The training focuses specifically on **Indonesian (`id_id`)** and **English (`en_us`)**. A unique feature of this model is the inclusion of the EdAcc dataset to improve performance on **Indonesian-accented English**.

  - **Developed by:** Dafis Nadhif Saputra
  - **Model type:** Automatic Speech Recognition (ASR)
@@ -66,33 +54,41 @@ This model leverages **PEFT (LoRA)** to efficiently adapt the Whisper Large V3 T

  ## Evaluation Results

- The model was evaluated on the test splits of both FLEURS and Common Voice. Below is the breakdown of the Word Error Rate (WER):
+ The model was evaluated using two different schemes:
+
+ ### 1. Internal Training Validation
+ Measured during the training process on a mixed validation set (all datasets combined).

- | Dataset | Language | WER (%) |
+ | Epoch | Validation Loss | WER (%) |
  | :--- | :--- | :--- |
- | **Google FLEURS** | English | **5.95%** |
- | **Google FLEURS** | Indonesian | **7.57%** |
- | **Common Voice 22.0** | English | 16.69% |
- | **Common Voice 22.0** | Indonesian | 11.41% |
+ | 1 | 0.2717 | 7.42% |
+ | **2** | **0.2638** | **7.33%** |
+
+ ### 2. Final Standalone Evaluation
+ Measured after training on the full concatenated test sets for each language.
+
+ | Language | Dataset Source | WER (%) |
+ | :--- | :--- | :--- |
+ | **English** | Fleurs + Common Voice + EdAcc | **9.09%** |
+ | **Indonesian** | Fleurs + Common Voice | **6.97%** |

  ## Training Details

- ### Data Preparation
- The training dataset consisted of:
- - **Google FLEURS:** ~70% of English Train Split + 100% of Indonesian Train Split.
- - **Common Voice 22.0:** 5,000 English samples + 4,000 Indonesian samples (Streaming).
-
- ### Hyperparameters
- - **Method:** LoRA (Low-Rank Adaptation) via PEFT
- - **Rank (r):** 32
- - **Alpha:** 64
- - **Dropout:** 0.05
- - **Target Modules:** `q_proj`, `v_proj`, `k_proj`, `out_proj`
- - **Learning Rate:** 1e-5
- - **Batch Size:** 16 (with Gradient Accumulation = 2)
- - **Optimizer:** AdamW
- - **Scheduler:** Cosine (50 warmup steps)
- - **Epochs:** 3
+ ### Data Overview
+ The model was trained on approximately **15,000 samples** combining:
+ * **Google FLEURS** (Indonesian & English)
+ * **Common Voice 22.0** (Indonesian & English)
+ * **EdAcc** (English with Indonesian Accent)
+
+ ### Hyperparameters (Summary)
+ The model was trained using **PEFT (LoRA)** to efficiently adapt the weights.
+
+ * **Learning Rate:** 5e-5
+ * **Batch Size:** 32 (Effective)
+ * **Epochs:** 2
+ * **Precision:** FP16
+ * **Optimizer:** AdamW
+ * **LoRA Rank:** 32

  ## How to Get Started with the Model

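The WER values in the evaluation tables above are plain word error rates. For reference, a minimal sketch of how such figures are commonly computed with the `evaluate` library (backed by `jiwer`); the transcript lists below are placeholders, not data from this card:

```python
# Minimal WER computation sketch (pip install evaluate jiwer).
# `references` and `predictions` are placeholder transcript lists.
import evaluate

wer_metric = evaluate.load("wer")

references = ["selamat pagi semuanya", "good morning everyone"]
predictions = ["selamat pagi semua", "good morning everyone"]

# compute() returns a fraction; multiply by 100 to match the percentages reported above.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%")
```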
 
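The Data Preparation / Data Overview sections describe mixing FLEURS and Common Voice 22.0 subsets (plus EdAcc in the updated card). A rough sketch of assembling such a mix with the `datasets` library, using the split sizes given in the card; the Common Voice config names are assumptions, and EdAcc and all audio preprocessing are omitted:

```python
# Sketch of the FLEURS + Common Voice training mix described in the card.
# The Common Voice 22.0 config names ("en", "id") are assumed, not taken from the card.
from datasets import load_dataset, Audio, concatenate_datasets

# FLEURS: 100% of the Indonesian train split, ~70% of the English one.
fleurs_id = load_dataset("google/fleurs", "id_id", split="train")
fleurs_en = load_dataset("google/fleurs", "en_us", split="train[:70%]")
fleurs = concatenate_datasets([fleurs_en, fleurs_id])

# Whisper expects 16 kHz audio.
fleurs = fleurs.cast_column("audio", Audio(sampling_rate=16_000))

# Common Voice 22.0 in streaming mode, taking fixed sample counts as in the card.
cv_en = load_dataset("fsicoli/common_voice_22_0", "en", split="train", streaming=True).take(5000)
cv_id = load_dataset("fsicoli/common_voice_22_0", "id", split="train", streaming=True).take(4000)

# Before training, the streaming subsets would still need their columns harmonized with FLEURS
# (e.g. keeping only the audio and transcript fields) so all sources share one schema.
```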
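The hyperparameter lists map directly onto a `peft` + `transformers` setup. Below is a minimal sketch, not the author's actual training script: the LoRA settings (rank 32, alpha 64, dropout 0.05, attention projections) and the cosine schedule with 50 warmup steps come from the earlier card revision, the learning rate 5e-5, effective batch size 32, 2 epochs and FP16 from the updated one, and the data pipeline, collator and metric hooks are omitted:

```python
# Sketch of the PEFT (LoRA) configuration described in the card; the numeric values are quoted
# from the card, everything else (output_dir, omitted data handling) is illustrative.
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

# LoRA on the attention projections: rank 32, alpha 64, dropout 0.05.
lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

# LR 5e-5, effective batch 32 (16 x 2), 2 epochs, FP16; AdamW is the Trainer default.
args = Seq2SeqTrainingArguments(
    output_dir="whisper-turbo-lora",   # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    warmup_steps=50,
    lr_scheduler_type="cosine",
    fp16=True,
)
```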
@@ -100,18 +96,24 @@ You can use the `pipeline` from the `transformers` library to easily transcribe

  ```python
  from transformers import pipeline
+ import torch

  # Replace with your model ID
  model_id = "Dafisns/whisper-turbo-multilingual-fleurs"

  # Initialize the pipeline
- pipe = pipeline("automatic-speech-recognition", model=model_id, device="cuda")
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model_id,
+     device="cuda" if torch.cuda.is_available() else "cpu",
+     torch_dtype=torch.float16
+ )

  # Transcribe an audio file
  # Ensure you specify the language code ('indonesian' or 'english') for better accuracy
+
  # Example for Indonesian audio:
  result = pipe("path_to_your_indonesian_audio.mp3", generate_kwargs={"language": "indonesian"})
-
  print(result["text"])

  # Example for English audio:
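If the repository hosts LoRA adapter weights rather than fully merged weights (the card does not state which), the adapter can also be attached to the base checkpoint explicitly with `peft` before building the pipeline. A sketch under that assumption, with a placeholder audio path:

```python
# Alternative loading path: attach the (assumed) LoRA adapter to the base Whisper checkpoint.
# If the repository already contains merged weights, the pipeline example above is sufficient.
import torch
from peft import PeftModel
from transformers import AutoProcessor, WhisperForConditionalGeneration, pipeline

base_id = "openai/whisper-large-v3-turbo"
adapter_id = "Dafisns/whisper-turbo-multilingual-fleurs"  # assumed to hold the LoRA adapter

processor = AutoProcessor.from_pretrained(base_id)
base = WhisperForConditionalGeneration.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()  # fold LoRA into the base weights

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# "sample.mp3" is a placeholder path.
print(pipe("sample.mp3", generate_kwargs={"language": "indonesian"})["text"])
```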