Commit graph

369 commits

Author SHA1 Message Date
c52a9f961c
and another 2022-08-31 17:37:28 +01:00
58dbabd561
fix another crash 2022-08-31 17:33:07 +01:00
5fc0725917
slurm: chmod +x 2022-08-31 16:33:07 +01:00
dbf929325a
typo; add pretrain slurm job file 2022-08-31 16:32:17 +01:00
e0162bc70b
requirements.txt: add missing dependencies 2022-08-31 16:25:47 +01:00
f2312c1184
fix crash 2022-08-31 16:25:27 +01:00
15a3519107
ai: the best thing about implementing a model is that you don't have to test it on the same day :P 2022-08-11 18:26:28 +01:00
c0a9cb12d8
ai: start creating initial model implementation.
It's not hooked up to the CLI yet, though.
The focus is still on ensuring the dataset is in the right format.
2022-08-10 19:03:25 +01:00
b52c7f89a7
Move dataset parsing function to the right place 2022-08-10 17:24:55 +01:00
222a6146ec
write glue for .jsonl.gz → .tfrecord.gz converter 2022-08-08 15:33:59 +01:00
28a3f578d5
update .gitignore 2022-08-04 16:49:53 +01:00
323d708692
dataset: add todo
Just why, TensorFlow?!
tf.data.TextLineDataset looks almost too good to be true... and it is: despite supporting gzip decompression(!), it doesn't look like we can convince it to parse JSON :-/
2022-07-26 19:53:18 +01:00
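The todo above notes that `tf.data.TextLineDataset` can decompress gzip but won't parse JSON. Below is a plain-Python sketch of the parsing half. It assumes each line of the `.jsonl.gz` file holds one JSON object. `iter_jsonl_gz` is a hypothetical helper and does not come from the repository.

```python
import gzip
import json
from typing import Any, BinaryIO, Dict, Iterator, Union


def iter_jsonl_gz(source: Union[str, BinaryIO]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSON Lines file.

    `source` may be a filesystem path or an already-open binary file object.
    """
    # "rt" decompresses the gzip stream and decodes it as UTF-8 text, line by line.
    with gzip.open(source, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing empty line
                yield json.loads(line)
```

Usage: `for record in iter_jsonl_gz("data.jsonl.gz"): ...`. One possible (untested) way to bring this back into TensorFlow is to wrap the generator with `tf.data.Dataset.from_generator`. The cost is that the JSON parsing then runs in Python rather than in the TF graph.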
b53c77a2cb
index.py: call static function name run 2022-07-26 19:51:28 +01:00
a7ed58fc03
ai: move requirements.txt to the right place 2022-07-26 19:25:11 +01:00
e93a95f1b3
ai dataset: add `if __name__ == "__main__"` guard 2022-07-26 19:24:40 +01:00
de4c3dab17
typo 2022-07-26 19:14:55 +01:00
18a7d3674b
ai: create (untested) dataset 2022-07-26 19:14:10 +01:00
dac6919fcd
ai: start creating initial scaffolding 2022-07-25 19:01:10 +01:00
8a9cd6c1c0
Lay out some basic scaffolding
I *really* hope this works. This is the 3rd major revision of this
model. I've learnt a ton of stuff since my last attempt, so
here's hoping that all goes well :D

The basic idea behind this attempt is *Contrastive Learning*. If we
don't get anything useful with this approach, then we can assume that
it's not really possible / feasible.

Something we need to watch out for is the variance (or rather lack
thereof) in the dataset. We have 1.5M timesteps, but not much
will be happening in most of them...

We may need to analyse the variance of the water depth data and extract
a subsample that's more balanced.
2022-05-13 19:06:15 +01:00
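The last paragraph of this commit message suggests extracting a more balanced subsample. Below is a minimal sketch of one way to do that. It assumes each timestep is available as a flat sequence of water-depth readings. The function name `balanced_subsample` and its `threshold` and `keep_low_fraction` parameters are hypothetical and are not taken from the repository.

```python
import random
import statistics
from typing import List, Sequence


def balanced_subsample(
    timesteps: Sequence[Sequence[float]],
    threshold: float,
    keep_low_fraction: float = 0.1,
    seed: int = 0,
) -> List[int]:
    """Return indices of a subsample that favours high-variance timesteps.

    Every timestep whose water-depth variance exceeds `threshold` is kept;
    each remaining "quiet" timestep is kept with probability `keep_low_fraction`.
    """
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    kept = []
    for index, depths in enumerate(timesteps):
        # Population variance of the depth readings for this timestep.
        variance = statistics.pvariance(depths)
        if variance > threshold or rng.random() < keep_low_fraction:
            kept.append(index)
    return kept
```

This sketch keeps a random fraction of quiet timesteps instead of dropping them all. That way the model still sees some examples of the common "nothing happening" case, but those examples no longer dominate the dataset.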