tim/africat: A multi-class, multi-label NLP network that categorises text into one of ~160 categories, mostly relating to the African continent. - africat - Gitea on treehouse.org.za


Latest commit 54db72fd89 by Timothy Allen (2023-12-30 15:24:16 +02:00): Switch to SentencePiece for tokenisation and Roberta for the model


README.md

This is a multi-class, multi-label NLP network that assigns each piece of text one or more of ~160 categories, mostly relating to the African continent.
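"Multi-label" means a single story can carry several categories at once, so predictions are usually decoded per category rather than by picking one winner. The sketch below is a minimal, pure-Python illustration of that idea, assuming one independent sigmoid per category and a fixed threshold of 0.5; it is not taken from this repository, and the category names and logit values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label decoding: every category gets its own sigmoid probability,
    and all categories at or above the (assumed) threshold are returned."""
    probs = {label: sigmoid(score) for label, score in logits.items()}
    return sorted(label for label, p in probs.items() if p >= threshold)


def predict_single(logits: dict[str, float]) -> str:
    """Single-label decoding, for contrast: argmax over the logits,
    which picks the same category as argmax over a softmax."""
    return max(logits, key=logits.get)


# Hypothetical raw scores for three illustrative categories.
logits = {"Kenya": 2.1, "Agriculture": 0.4, "Sport": -3.0}
print(predict_labels(logits))  # ['Agriculture', 'Kenya']
print(predict_single(logits))  # Kenya
```

With independent sigmoids, a story about Kenyan farming can receive both "Kenya" and "Agriculture"; a softmax argmax would force a single choice. The threshold trades precision against recall and could, as a design choice, be tuned per category.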

The training dataset is a proprietary dataset from allAfrica.com, consisting of stories that have been manually categorised according to AllAfrica's in-house categorisation scheme.
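To train a multi-label network on manually categorised stories, each story's set of categories is typically turned into a multi-hot target vector and scored with a per-category binary cross-entropy. The following is a minimal sketch under that assumption, not the repository's training code; the helper names `multi_hot` and `bce_with_logits` and the category list are hypothetical. The loss uses the numerically stable form that `torch.nn.BCEWithLogitsLoss` also computes with its default mean reduction.

```python
import math


def multi_hot(assigned: list[str], categories: list[str]) -> list[float]:
    """Turn a story's manually assigned categories into a 0/1 target vector
    aligned with a fixed category ordering (hypothetical helper)."""
    index = {name: i for i, name in enumerate(categories)}
    target = [0.0] * len(categories)
    for name in assigned:
        target[index[name]] = 1.0  # raises KeyError for a label outside the scheme
    return target


def bce_with_logits(logits: list[float], target: list[float]) -> float:
    """Mean binary cross-entropy over categories, computed from raw logits as
    max(x, 0) - x*y + log(1 + exp(-|x|)) to avoid overflow."""
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, target)
    ]
    return sum(losses) / len(losses)


categories = ["Agriculture", "Kenya", "Sport"]  # hypothetical subset of the ~160
target = multi_hot(["Kenya", "Agriculture"], categories)
print(target)  # [1.0, 1.0, 0.0]
print(bce_with_logits([0.0, 0.0, 0.0], target))  # ln 2, about 0.6931
```

An untrained model that outputs zero logits scores ln 2 per category; confident, correct logits drive the loss towards zero.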

The trained model is freely available, as is the training and evaluation code.