19 Commits

Author SHA1 Message Date
0c199256dd Add extra debugging 2025-03-06 19:20:04 +02:00
c1e3ffdc0b Switch to SentencePiece for tokenisation and Roberta for the model 2025-03-06 19:20:04 +02:00
9052767750 Convert to a multi-hot index in the CSV, to simplify our DataSets and DataLoaders 2025-03-06 19:20:04 +02:00
0752eefaaa Add original title to story text 2025-03-06 19:20:04 +02:00
d2a5bb1717 Cleanup, and device-aware training 2025-03-06 19:20:04 +02:00
d46a5baebe Fix evaluation, as well as progress reporting. 2025-03-06 19:20:04 +02:00
6864e43ce4 Metadata 2025-03-06 19:20:04 +02:00
tim 22df0a0ba0 First working model 2025-03-06 19:20:01 +02:00
tim b96c920d33 Get model working (basically) 2025-03-06 19:19:52 +02:00
tim 58edb72e6a Add reminder about old categories 2025-03-06 19:19:52 +02:00
tim 6c46404234 Format for poetry and add debugging 2025-03-06 19:19:52 +02:00
tim 06512d71d5 Add dependencies 2025-03-06 19:19:49 +02:00
tim 327367bbea Move to new location 2025-03-06 19:15:49 +02:00
tim b02fa3c9b0 Clean up some minor issues (like iterating over the DataSet) & simplify 2025-03-06 19:15:49 +02:00
tim 31319bab0c Add possible split between training and validation data 2025-03-06 19:15:49 +02:00
tim 7108652756 First pass at imbibing a CSV of data and turning it into a dataset, and thence into a dataloader 2025-03-06 19:15:49 +02:00
tim 60f8afefea Convert a bunch of XML files into a CSV dataset 2025-03-06 19:15:49 +02:00
tim 3fcd445a83 v0.1.1 2025-03-06 19:15:47 +02:00
tim 3c912c4171 v0.1.0 2025-03-06 19:15:43 +02:00