Just why, TensorFlow?!
tf.data.TextLineDataset looks almost too good to be true... and it is, because despite supporting gzip decompression(!) it doesn't look like we can convince it to parse JSON :-/
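In theory the gzip support is still usable if we do the JSON parsing ourselves with tf.py_function. A minimal sketch, assuming one JSON object per line (the filename and the depth field are my own placeholders, not from the real dataset):

```python
import json

import tensorflow as tf


def parse_line(line: tf.Tensor) -> float:
    # tf.py_function hands us an eager string tensor, so we can decode it and
    # parse the JSON ourselves - TextLineDataset only yields raw lines.
    record = json.loads(line.numpy().decode("utf-8"))
    return record["depth"]  # "depth" is a made-up field name


dataset = (
    tf.data.TextLineDataset("timesteps.jsonl.gz", compression_type="GZIP")
    .map(lambda line: tf.py_function(parse_line, [line], tf.float32))
)
```

The catch is that tf.py_function drops back into the Python interpreter on every element, so it won't be anywhere near as fast as a native op.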
• Node.js not exiting at all
• Node.js exiting early when end_safe-ing a stream.Writable (?????)
• Incomplete files - "unexpected end of file" errors and invalid JSON
It deduplicates lines in the files (sketched below), with the potential to later add filtering on a specific property.
The reasoning for this is thus:
1. There will naturally be periods of time where nothing happens
2. Too many duplicates will interfere with and confuse the contrastive learning algorithm, as each batch will have less variance between samples
This is especially important because contrastive learning compares every item in each batch with every other item in the batch.
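For illustration, here's roughly what the dedup boils down to, sketched in Python rather than the actual Node.js implementation (the filenames are hypothetical):

```python
import gzip


def dedupe_lines(path_in: str, path_out: str) -> None:
    """Write each distinct line of a gzipped JSONL file exactly once."""
    seen: set[str] = set()
    with gzip.open(path_in, "rt") as fin, gzip.open(path_out, "wt") as fout:
        for line in fin:
            if line not in seen:
                seen.add(line)
                fout.write(line)


dedupe_lines("timesteps.jsonl.gz", "timesteps-deduped.jsonl.gz")
```

The later property filter would slot in as a json.loads and a key check just before the `seen` test.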
I *really* hope this works. This is the 3rd major revision of this
model. I've learnt a ton of stuff since my last attempt, so here's
hoping that it all goes well :D
The basic idea behind this attempt is *Contrastive Learning*. If we
don't get anything useful with this approach, then we can assume that
it's not really feasible.
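For reference, the batch-wise comparison mentioned earlier looks something like this in a SimCLR-style NT-Xent loss (a standard formulation, not necessarily the exact loss this model will end up using):

```python
import tensorflow as tf


def nt_xent_loss(z_a: tf.Tensor, z_b: tf.Tensor, temperature: float = 0.5) -> tf.Tensor:
    """SimCLR-style NT-Xent loss over two views of the same batch of items."""
    batch = tf.shape(z_a)[0]
    # Stack both views and L2-normalise, so dot products are cosine similarities.
    z = tf.math.l2_normalize(tf.concat([z_a, z_b], axis=0), axis=1)  # [2B, dim]
    sim = tf.matmul(z, z, transpose_b=True) / temperature            # [2B, 2B]
    # An item must never count itself as its own positive.
    sim -= tf.eye(2 * batch) * 1e9
    # Row i's positive is the other view of the same item.
    labels = tf.concat([tf.range(batch) + batch, tf.range(batch)], axis=0)
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, sim, from_logits=True)
    )
```

Every embedding gets scored against the other 2B - 1 items in the batch, which is exactly why duplicate samples dilute the contrastive signal.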
Something we need to watch out for is the variance (or rather the lack
thereof) in the dataset. We have 1.5M timesteps, but not a whole lot
will be happening in most of them...
We may need to analyse the variance of the water depth data and extract
a subsample that's more balanced.
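Something along these lines is what I have in mind (the array shape and filename are assumptions on my part, and the 75th-percentile threshold and 10% quiet-frame keep rate are arbitrary):

```python
import numpy as np

# Hypothetical: water depth as a [timesteps, height, width] array.
depths = np.load("water_depths.npy")
per_step_var = depths.var(axis=(1, 2))  # how much is happening in each frame

# Keep every "busy" timestep, plus a random 10% of the quiet ones, so batches
# aren't dominated by frames where nothing is happening.
threshold = np.percentile(per_step_var, 75)
busy = np.flatnonzero(per_step_var >= threshold)
quiet = np.flatnonzero(per_step_var < threshold)

rng = np.random.default_rng(42)
keep = np.concatenate([busy, rng.choice(quiet, size=len(quiet) // 10, replace=False)])
balanced = depths[np.sort(keep)]
```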