Update README.md
README.md
CHANGED
@@ -25,8 +25,9 @@ following the [LLM teacher-student framework](https://ieeexplore.ieee.org/docume
 Evaluation of the GPT model has shown that its annotation performance is
 comparable to those of human annotators.
 
-The fine-tuned ParlaCAP model achieves 0.
-
+The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
+0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and .. in macro-F1 on a Bosnian test set
+(880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
 
 An additional evaluation on smaller samples from Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
 that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
@@ -35,18 +36,20 @@ For end use scenarios, we recommend filtering out predictions based on the model
 
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
 
-With this approach, we annotate as Mix
-
-
+With this approach, we annotate as Mix 8.6% of instances in the English test set,
+11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian test set.
+
 
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 
 | | micro-F1 | macro-F1 | accuracy |
 |:---|-----------:|-----------:|-----------:|
 | EN | 0.780 | 0.779 | 0.779 |
+| RS | 0.749 | 0.743 | 0.749 |
 | HR | 0.724 | 0.726 | 0.724 |
 
 
+
 ## Use
 
 To use the model:
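
The confidence filtering described in the README text above (predictions with confidence below 0.60 are annotated as "Mix") could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `(label, confidence)` input format, the function names `relabel_low_confidence` and `mix_share`, and the example topic labels are all assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple

MIX_LABEL = "Mix"
CONFIDENCE_THRESHOLD = 0.60  # threshold stated in the README


def relabel_low_confidence(
    predictions: Sequence[Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[str]:
    """Return one label per prediction, replacing every label whose
    confidence is strictly below `threshold` with "Mix".

    `predictions` is assumed to be a sequence of (label, confidence) pairs.
    """
    return [
        label if confidence >= threshold else MIX_LABEL
        for label, confidence in predictions
    ]


def mix_share(labels: Sequence[str]) -> float:
    """Fraction of instances that ended up annotated as "Mix"."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == MIX_LABEL) / len(labels)


# Hypothetical predictions, for illustration only.
example = [("Health", 0.91), ("Education", 0.42), ("Defense", 0.60)]
labels = relabel_low_confidence(example)
print(labels)
print(mix_share(labels))
```

Only the instances that keep their original label would then enter the evaluation reported in the table of the second hunk; the share returned by `mix_share` corresponds to the per-language percentages quoted in the added lines.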