Commit graph

4 commits

Author SHA1 Message Date
3332fa598a
Add new recompress subcommand
also fix typos, CLI definitions
2022-07-25 17:54:23 +01:00
82e826fd69
Fix bugs in remainder of rainfallwrangler:uniq :D
2022-07-22 18:05:03 +01:00
a966cdff35
uniq: bugfix a lot, but it's not working right just yet
There's still a bug in the file line deleter
2022-07-08 19:54:24 +01:00
1a657bd653
add new uniq subcommand
It deduplicates lines in the files, with the potential to add the ability to filter on a specific property later.
The reasoning for this is thus:
1. There will naturally be periods of time where nothing happens
2. Too many duplicates will interfere with and confuse the contrastive learning algorithm, as each batch will have less variance in its samples

This is especially important because contrastive learning compares every item in each batch with every other item in the batch.
2022-07-04 19:46:06 +01:00
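
The line deduplication described in commit 1a657bd653 could look roughly like the sketch below. It is an illustration only, not the rainfallwrangler implementation, which is not shown in this log. The function name unique_lines and the choice to compare lines by their SHA-256 digest are assumptions made for the example.

```python
import hashlib
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, keeping the first occurrence and its order.

    Hypothetical helper: lines are compared by the SHA-256 digest of their
    stripped content, so only fixed-size digests are held in memory.
    """
    seen = set()
    for line in lines:
        key = hashlib.sha256(line.strip().encode("utf-8")).digest()
        if key in seen:
            continue  # duplicate of an earlier line, drop it
        seen.add(key)
        yield line


if __name__ == "__main__":
    sample = ["0.0,0.0\n", "0.0,0.0\n", "1.2,0.4\n", "0.0,0.0\n"]
    for line in unique_lines(sample):
        print(line, end="")
```

Keeping only the first occurrence preserves the original ordering of the data while removing the repeated "nothing happens" samples that would otherwise reduce variance within a batch.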