A multi-class, multi-label NLP network that assigns each piece of text one or more of ~160 categories, most of them relating to the African continent.
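Because the task is multi-label, each category is scored independently rather than competing for a single slot. The following is a minimal sketch of that decision rule. It assumes the model emits one raw logit per category, and that a sigmoid with a 0.5 cut-off decides membership. The threshold, the helper names and the toy category names are hypothetical and are not taken from this repository.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1) without overflowing."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(
    logits: Sequence[float],
    categories: Sequence[str],
    threshold: float = 0.5,  # hypothetical cut-off; tune on validation data
) -> list[str]:
    """Return every category whose independent probability clears the threshold.

    Unlike softmax, which forces a single winner, each category gets its own
    sigmoid, so zero, one or several categories can be assigned to one text.
    """
    if len(logits) != len(categories):
        raise ValueError("need exactly one logit per category")
    return [
        name
        for name, logit in zip(categories, logits)
        if sigmoid(logit) >= threshold
    ]


# Toy example with 4 made-up categories instead of the real ~160.
cats = ["Kenya", "Agriculture", "Elections", "Health"]
print(predict_labels([2.1, 0.3, -1.5, -0.2], cats))  # ['Kenya', 'Agriculture']
```

A per-category threshold, rather than a single global one, is a common refinement when some categories are much rarer than others.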

The training dataset is a proprietary dataset from allAfrica.com. It consists of stories that have been manually categorised according to AllAfrica's in-house categorisation scheme.
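For training, each story's set of manually assigned categories must be turned into a fixed-length target vector. The following is a minimal sketch of multi-hot encoding that assumes the labels arrive as plain lists of category names. The category names and helper functions are hypothetical, and the repository's actual data pipeline may differ.

```python
from typing import Iterable


def build_index(categories: Iterable[str]) -> dict[str, int]:
    """Assign each distinct category a stable column, in sorted order."""
    return {name: i for i, name in enumerate(sorted(set(categories)))}


def multi_hot(labels: Iterable[str], index: dict[str, int]) -> list[float]:
    """Encode one story's categories as a 0/1 vector with one slot per category."""
    vec = [0.0] * len(index)
    for name in labels:
        # A KeyError here flags a category that is missing from the scheme.
        vec[index[name]] = 1.0
    return vec


# Made-up stories; the real scheme has ~160 categories.
stories = [
    ["Kenya", "Elections"],
    ["Nigeria", "Health", "Agriculture"],
    ["Kenya"],
]
index = build_index(c for story in stories for c in story)
for story in stories:
    print(story, multi_hot(story, index))
```

Targets in this form pair naturally with a per-category binary cross-entropy loss (for example PyTorch's `BCEWithLogitsLoss`), which matches a sigmoid-per-category decision rule rather than softmax.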

The trained model is freely available, as is the training and evaluation code.