Description:

- Trained for text classification of the type of net zero target. The text comes from company ESG reports, and the data is labelled by Net Zero Tracker.
- Text was truncated to 128 tokens before tokenization.

Problems:

- The model keeps outputting the same label regardless of input.
- The text column is quite unstructured: entries vary in length, some include URLs and some don't, some include excerpts from the ESG report, etc.
- Truncation might have resulted in loss of data.
- Should try a text generation task instead.
- Too many labels make the model behave poorly.

Moving Forward:

- Better text preprocessing: remove URLs, etc.
- Change the task to text generation, which might perform better. (This means ClimateBert cannot be used as the base model.)
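
The preprocessing step could start from something like the minimal sketch below. It is not the pipeline used for this model. The names `URL_PATTERN`, `clean_text`, and `exceeds_limit` are hypothetical, the URL regex is a simple assumption that only catches `http(s)://` and `www.` links, and the length check counts whitespace-separated words as a rough stand-in for model tokens.

```python
import re

# Assumed pattern: matches http(s):// links and bare www. links.
# It also swallows punctuation attached to the end of a link.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def clean_text(text: str) -> str:
    """Remove URLs, then collapse runs of whitespace into single spaces."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())


def exceeds_limit(text: str, max_words: int = 128) -> bool:
    """Rough proxy for truncation risk: counts words, not model tokens."""
    return len(text.split()) > max_words


if __name__ == "__main__":
    sample = "See https://example.com/report.pdf for our  net zero target"
    print(clean_text(sample))
    print(exceeds_limit(sample))
```

Running `exceeds_limit` over the cleaned text column gives a quick estimate of how many rows are likely to lose content at the 128-token cutoff. Because subword tokenizers usually produce more tokens than words, the real number of truncated rows is probably higher.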