| Description: | |
| - trained for text classification of the type of net zero target. The text comes from company ESG reports; the labels come from Net Zero Tracker. | |
| - text was truncated to 128 tokens during tokenization. | |
| - | |
| Problems: | |
| - keeps outputting the same label regardless of input | |
| - The text column is quite unstructured: entries vary in length, some include URLs and others don't, some include excerpts from the ESG report, etc. | |
| - truncation to 128 tokens might have cut off relevant text | |
| - should try a text generation task instead | |
| - too many labels make the model behave poorly. | |
| Moving Forward: | |
| - better text preprocessing (remove URLs, etc.) | |
| - change the task to text generation, which might perform better. (This means ClimateBert, an encoder-only model, cannot be used as the base model.) | |
| - |
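
The URL-removal step mentioned under "Moving Forward" could look roughly like the sketch below. It is only an illustration, not the preprocessing actually used for this model: the helper name `clean_text` and the URL pattern are assumptions, and the pattern is deliberately simple (it also swallows punctuation attached to the end of a link).

```python
import re

# Assumed pattern: matches http(s):// and www. links up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs and collapse runs of whitespace (hypothetical helper)."""
    without_urls = URL_PATTERN.sub(" ", text)
    # Collapse newlines, tabs and repeated spaces left behind by the removal.
    return WHITESPACE_PATTERN.sub(" ", without_urls).strip()


if __name__ == "__main__":
    # Made-up sample row, not taken from the dataset.
    sample = "We target net zero by 2050.\n\nSee https://example.com/esg-report.pdf for details."
    print(clean_text(sample))
```

Cleaning the text before tokenization means fewer of the 128 tokens are spent on links, which may leave more room for the sentences that describe the target.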