Description: 

- Trained on text classification of the type of net zero target. The text is from company ESG reports; the data is labelled by Net Zero Tracker.
- Text was truncated to 128 tokens before tokenization.
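
To get a rough sense of how much text a 128-token limit cuts off, the share of examples longer than the limit can be estimated. The sketch below is only an approximation: `share_over_limit` is a hypothetical helper, and it counts whitespace-separated words as a stand-in for the model's subword tokens, so real token counts will usually be higher.

```python
def share_over_limit(texts, max_tokens=128, tokenize=str.split):
    """Return the fraction of texts whose token count exceeds max_tokens.

    `tokenize` defaults to whitespace splitting, which is an assumption;
    pass the real tokenizer's tokenize function for exact counts.
    """
    if not texts:
        return 0.0
    over = sum(1 for text in texts if len(tokenize(text)) > max_tokens)
    return over / len(texts)


# Illustrative inputs only, not taken from the dataset.
examples = ["target " * 200, "Net zero by 2050."]
print(share_over_limit(examples))
```

A high share would suggest that a longer maximum length, or truncating from a different part of the text, is worth trying.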



Problems: 
- The model keeps outputting the same label regardless of input.
- The text column is quite unstructured: entries vary in length, some include a URL and some do not, some include excerpts from the ESG report, etc.
- Truncation might have resulted in loss of data.
- Should try a text generation task instead.
- Too many labels make the model behave poorly.
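
One way to examine the first and last problems together is to compare how dominant the most common label is in the training data and in the model's predictions. This is a minimal sketch: `majority_share` is a hypothetical helper and the label strings are placeholders, not the actual label set.

```python
from collections import Counter


def majority_share(labels):
    """Return (most common label, fraction of items carrying it)."""
    if not labels:
        raise ValueError("labels must not be empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Placeholder values for illustration only.
train_labels = ["label_a", "label_a", "label_b", "label_c"]
predictions = ["label_a", "label_a", "label_a", "label_a"]

print(majority_share(train_labels))
print(majority_share(predictions))
```

If the predicted majority share is close to 1.0 and the predicted label matches the most common training label, the model may simply have collapsed onto the majority class, which would point toward class imbalance rather than the task formulation.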



Moving Forward:

- Better text preprocessing: remove URLs, etc.
- Change the task to text generation, which might perform better. (This means ClimateBert cannot be used as the base model.)
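
The URL removal step could look roughly like the sketch below. The pattern and the `clean_text` name are assumptions made for illustration; the actual text column may need additional rules.

```python
import re

# Assumed pattern: matches http(s):// links and bare www. links
# up to the next whitespace character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def clean_text(text: str) -> str:
    """Remove URLs and collapse runs of whitespace into single spaces."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())


# Illustrative input only, not taken from the dataset.
print(clean_text("Net zero by 2050. Source: https://example.com/esg.pdf  (p. 4)"))
```

Note that this pattern also removes punctuation attached directly to the end of a URL, which may or may not be desirable.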