Commit graph

1a657bd653
add new uniq subcommand
It deduplicates lines in the input files. The ability to filter on a specific property could be added later.
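A minimal sketch of order-preserving deduplication follows. It assumes a repeated line is dropped wherever it reappears, not only when adjacent as with Unix `uniq`. The name `uniq_lines` and the optional `key` parameter are hypothetical. `key` only suggests how a later "filter on a specific property" option might plug in.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional


def uniq_lines(
    lines: Iterable[str],
    key: Optional[Callable[[str], Hashable]] = None,
) -> Iterator[str]:
    """Yield each line the first time its key is seen, preserving input order.

    Hypothetical sketch: by default the whole line (minus a trailing newline)
    is the key; `key` is an assumed hook for deduplicating on one property.
    """
    seen: set = set()
    for line in lines:
        k = key(line) if key is not None else line.rstrip("\n")
        if k in seen:
            continue  # drop every repeat, adjacent or not
        seen.add(k)
        yield line
```

For example, `uniq_lines(["a\n", "b\n", "a\n", "c\n"])` yields `"a\n"`, `"b\n"`, `"c\n"`. The `seen` set keeps memory proportional to the number of distinct keys, not the number of input lines.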
The reasoning for this is as follows:
1. There will naturally be periods of time during which nothing happens, which produce runs of identical lines.
2. Too many duplicates will interfere with and confuse the contrastive learning algorithm, because each batch will have less variance among its samples.

This is especially important because contrastive learning compares every item in a batch with every other item in that batch.
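The sketch below shows how duplicates affect in-batch comparisons. It counts the unordered pairs a batch produces and how many of those pairs compare two identical samples. It assumes the common setup where the other items in a batch act as negatives. Under that assumption, an identical pair is a misleading negative. The function name `batch_pair_stats` is hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def batch_pair_stats(batch: Sequence[str]) -> Tuple[int, int]:
    """Return (total_pairs, duplicate_pairs) for one batch.

    Comparing every item with every other item gives n * (n - 1) // 2
    unordered pairs. A sample occurring c times contributes c * (c - 1) // 2
    pairs in which both sides are identical.
    """
    n = len(batch)
    total_pairs = n * (n - 1) // 2
    duplicate_pairs = sum(c * (c - 1) // 2 for c in Counter(batch).values())
    return total_pairs, duplicate_pairs
```

For a batch `["idle", "idle", "idle", "open door"]` this returns `(6, 3)`: half of the comparisons are between copies of the same "nothing happens" sample. A batch of four distinct samples returns `(6, 0)`.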
2022-07-04 19:46:06 +01:00