Commit graph

12 commits

Author SHA1 Message Date
b5310304bd
wrangler: update dependencies 2024-08-29 19:38:02 +01:00
844d8e6dd4
rw: update dependencies 2023-11-30 17:01:13 +00:00
f3652edf82
fixup 2022-08-05 19:10:40 +01:00
9399d1d8f5
Create (untested) JS interface to Python jsonl→tfrecord converter
also test Python .jsonl.gz → .tfrecord.gz
2022-08-05 19:10:28 +01:00
1a657bd653
add new uniq subcommand
It deduplicates lines in the files, with the potential to add the ability to filter on a specific property later.
The reasoning for this is thus:
1. There will naturally be periods of time where nothing happens
2. Too many duplicates will interfere with and confuse the contrastive learning algorithm, as each batch will have less variance in its samples

This is especially important because contrastive learning causes it to compare every item in each batch with every other item in the batch.
2022-07-04 19:46:06 +01:00
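
The deduplication described in the uniq commit above can be sketched roughly as follows. This is a minimal illustration in Python, not the repository's actual implementation: the names `uniq_lines` and `by_property` are hypothetical, and the per-property key is only a guess at how the "filter on a specific property" idea might look.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def uniq_lines(
    lines: Iterable[str],
    key: Optional[Callable[[str], object]] = None,
) -> Iterator[str]:
    """Yield each line the first time its key is seen, preserving order."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Without a key function, the whole line is the identity.
        k = key(line) if key is not None else line
        if k in seen:
            continue  # duplicate: drop it
        seen.add(k)
        yield line


def by_property(name: str) -> Callable[[str], object]:
    """Hypothetical key: compare JSON lines on a single property only."""
    def key(line: str) -> object:
        # Serialise the value so that lists and dicts are hashable.
        return json.dumps(json.loads(line).get(name), sort_keys=True)
    return key
```

Calling `uniq_lines(handle)` on an open file drops exact duplicate lines, while `uniq_lines(handle, key=by_property("depth"))` (with `"depth"` as an assumed field name) keeps only the first line for each distinct value of that property.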
1297f41105
.tfrecord files are too much hassle
let's go with a standard of .jsonl.gz instead
2022-07-01 18:28:39 +01:00
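
For reference, a .jsonl.gz file is just gzip-compressed text with one JSON object per line, so it can be read and written with the Python standard library alone. The sketch below assumes that layout; the helper names `write_jsonl_gz` and `read_jsonl_gz` are illustrative and do not come from the repository.

```python
import gzip
import json
from typing import Iterable, Iterator


def write_jsonl_gz(path: str, records: Iterable[dict]) -> None:
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Lazily yield the JSON objects stored in a .jsonl.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

Because the reader is a generator, large files can be streamed one record at a time without being loaded into memory in full.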
f5f267c6b6
Update dependencies 2022-07-01 16:56:51 +01:00
e030e6c2d5
Fix remaining(?) crashes in our code 2022-05-19 19:13:28 +01:00
bb018c53f6
Fix many bugs
Many bugs remain though
2022-05-19 17:54:14 +01:00
cc5efbae8a
Implement tfrecodify subcommand.
It's all still untested, but that's the next step
2022-05-19 17:15:15 +01:00
9411ad3218
tweak licence 2022-05-13 19:08:04 +01:00
8a9cd6c1c0
Lay out some basic scaffolding
I *really* hope this works. This is the 3rd major revision of this
model. I've learnt a ton of stuff between now and my last attempt, so
here's hoping that all goes well :D

The basic idea behind this attempt is *Contrastive Learning*. If we
don't get anything useful with this approach, then we can assume that
it's not really possible / feasible.

Something we need to watch out for is the variance (or rather lack
thereof) in the dataset. We have 1.5M timesteps, but not a whole lot
will be happening in most of those....

We may need to analyse the variance of the water depth data and extract
a subsample that's more balanced.
2022-05-13 19:06:15 +01:00
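
One way the variance-based subsampling mentioned in the scaffolding commit could look is sketched below. It is purely illustrative: the function name, the threshold, and the idea of keeping a random fraction of low-variance timesteps are all assumptions, and each timestep is assumed to be a flat sequence of water depth values.

```python
import random
import statistics
from typing import List, Sequence


def balanced_subsample(
    timesteps: Sequence[Sequence[float]],
    threshold: float,
    keep_quiet: float,
    seed: int = 0,
) -> List[int]:
    """Return the indices of timesteps to keep.

    Timesteps whose water depth variance exceeds `threshold` are always
    kept; the remaining "quiet" ones are kept with probability `keep_quiet`.
    """
    rng = random.Random(seed)  # seeded for reproducible subsamples
    kept = []
    for index, depths in enumerate(timesteps):
        variance = statistics.pvariance(depths)
        if variance > threshold or rng.random() < keep_quiet:
            kept.append(index)
    return kept
```

With `keep_quiet=0.0` only the timesteps above the threshold survive; raising it lets a controlled share of the quiet timesteps back in, so the model still sees some examples where nothing is happening.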