92fc34ebb1
rw/child_process:json2tfrecord: Add RAINFALL_MAX_NUMBER env var
To use this, you'd need to export RAINFALL_MAX_NUMBER=some_integer_value, since it is used by a subprocess rather than by the main Node.js process itself.
2023-11-30 16:44:43 +00:00
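This is a minimal sketch of why the variable has to be exported. It is written in Python because the consuming subprocess is presumably the Python json2tfrecord child. It assumes three things: RAINFALL_MAX_NUMBER holds a plain integer, the Node.js wrapper spawns the child with its default (inherited) environment, and the helper names `read_rainfall_max_number` and `child_sees_variable` are hypothetical, not functions from the repo. Child processes inherit the parent's environment, so a value exported in the shell reaches the child even though the parent never reads it.

```python
import os
import subprocess
import sys


def read_rainfall_max_number(environ=None):
    # Hypothetical helper: parse RAINFALL_MAX_NUMBER as an integer, failing
    # loudly if it is missing or malformed. Defaults to this process's env.
    env = os.environ if environ is None else environ
    raw = env.get("RAINFALL_MAX_NUMBER")
    if raw is None:
        raise RuntimeError("RAINFALL_MAX_NUMBER is not set; export it before launching")
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"RAINFALL_MAX_NUMBER must be an integer, got {raw!r}") from None


def child_sees_variable(value):
    # Put the variable into a copy of this process's environment (what
    # `export` does in a shell) and spawn a child with it: the child can read
    # it even though this parent process never looks at it.
    env = dict(os.environ, RAINFALL_MAX_NUMBER=str(value))
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['RAINFALL_MAX_NUMBER'])"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

In practice that just means running `export RAINFALL_MAX_NUMBER=some_integer_value` in the shell before starting the wrangler.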
4b2e418ddc
finish comment
...oops
2022-11-11 18:55:35 +00:00
e519b0adb3
GzipChildProcess: spawn-stream is buggy IIRC
2022-11-09 16:43:05 +00:00
784b8ed35c
recordify: catch NaN --count-file
2022-11-01 19:53:21 +00:00
91152ebb1c
wrangler:recordify update cli help
We only output .jsonl.gz files to a DIRECTORY, so update the CLI help to reflect this.
2022-11-01 18:29:47 +00:00
0c11ddca4b
rainfallwrangler does NOT mess up the ordering of the data
2022-10-18 19:07:14 +01:00
9edda1f397
rainfallwrangler json2tfrecord.py: normalise data
2022-09-01 19:03:15 +01:00
3e4128c0a8
resize rainfall to 1/2 of its current size
2022-09-01 18:47:07 +01:00
6cdf2b2389
wrangler python child: explicitly close stdout+stderr.
Hopefully this will avoid any more hanging issues.
2022-08-10 18:51:30 +01:00
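This is a rough illustration of the stdout+stderr change, assuming a Python child whose output the parent reads through pipes. We never found the actual cause of the hang, so this is just one common pattern, not a confirmed fix. The parent here is also Python (via subprocess) rather than the real Node.js wrapper, and `CHILD_SOURCE` / `run_child` are made-up names. The child flushes and closes both streams before exiting. On the parent side, `communicate()` drains both pipes at once, which avoids the classic hang where a child blocks on a full stderr pipe that nobody is reading.

```python
import subprocess
import sys

# Hypothetical child: writes some output, then explicitly flushes and closes
# both standard streams before exiting. The messages are placeholders.
CHILD_SOURCE = """
import sys
sys.stdout.write("converted 3 records\\n")
sys.stderr.write("progress: done\\n")
sys.stdout.flush()
sys.stderr.flush()
sys.stdout.close()
sys.stderr.close()
"""


def run_child():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() reads stdout and stderr concurrently until EOF, then waits
    # for the child to exit, so neither pipe can fill up and stall the child.
    out, err = proc.communicate(timeout=30)
    return proc.returncode, out, err
```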
5880bf9020
wrangler: add current date to process indicator.
There's a bug that causes it to hang, but we don't know why
2022-08-10 18:50:57 +01:00
231c832888
wrangler bugfix: crashes; logging output
2022-08-10 17:33:10 +01:00
b52c7f89a7
Move dataset parsing function to the right place
2022-08-10 17:24:55 +01:00
50f214450f
wrangler: fix crash
2022-08-10 17:05:01 +01:00
222a6146ec
write glue for .jsonl.gz → .tfrecord.gz converter
2022-08-08 15:33:59 +01:00
9399d1d8f5
Create (untested) JS interface to Python jsonl→tfrecord converter
also test Python .jsonl.gz → .tfrecord.gz
2022-08-05 19:10:28 +01:00
a02c3436ab
get Python bridge working to convert .jsonl.gz → .tfrecord.gz
2022-08-05 18:07:04 +01:00
2ccc1be414
json2tfrecord: write (untested) Python to convert .jsonl → .tfrecord
2022-07-28 19:48:25 +01:00
927c30e189
recompress files in the right order
2022-07-25 18:44:23 +01:00
3332fa598a
Add new recompress subcommand
also fix typos, CLI definitions
2022-07-25 17:54:23 +01:00
03e398504a
Bugfix: fix crash when target dir isn't specified
2022-07-22 18:36:00 +01:00
82e826fd69
Fix bugs in remainder of rainfallwrangler:uniq :D
2022-07-22 18:05:03 +01:00
31bd7899b6
Merge branch 'main' of git.starbeamrainbowlabs.com:sbrl/PhD-Rainfall-Radar
2022-07-22 17:10:52 +01:00
ce303814d6
Bugfix: don't make 1 group for each duplicate....
2022-07-22 17:06:02 +01:00
38a0bd0942
uniq: bugfix a lot, but it's not working right just yet
There's still a bug in the file line deleter
2022-07-09 00:31:32 +01:00
a966cdff35
uniq: bugfix a lot, but it's not working right just yet
There's still a bug in the file line deleter
2022-07-08 19:54:24 +01:00
3b2715c6cd
recordify: fix process exiting and incomplete file issues
• Node.js not exiting at all
• Node.js exiting when calling end_safe on a stream.Writable (?????)
• Incomplete files - "unexpected end of file" errors and invalid JSON
2022-07-08 18:54:00 +01:00
cb922ae8c8
fixup
2022-07-08 16:52:19 +01:00
b9a018f9a9
properly close all the streams
2022-07-08 16:51:17 +01:00
1a657bd653
add new uniq subcommand
It deduplicates lines in the files, with the potential to add the ability to filter on a specific property later.
The reasoning for this is as follows:
1. There will naturally be periods of time where nothing happens
2. Too many duplicates will interfere with and confuse the contrastive learning algorithm, as each batch will have less variance in its samples
This is especially important because contrastive learning compares every item in each batch with every other item in the batch.
2022-07-04 19:46:06 +01:00
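This is a minimal sketch of the deduplication idea, in Python rather than the wrangler's Node.js. It assumes exact-match duplicate lines within a single .jsonl.gz file. It keeps the first occurrence of each line, so the ordering of the data is preserved. The real subcommand operates on a set of files and may later filter on a specific property, and `dedupe_lines` / `dedupe_jsonl_gz` are hypothetical names.

```python
import gzip
import hashlib
from typing import Iterable, Iterator


def dedupe_lines(lines: Iterable[str]) -> Iterator[str]:
    # Yield each line the first time it is seen, preserving the original order.
    # A fixed-size SHA-256 digest is stored per unique line instead of the line
    # itself, which keeps memory use down on large files (a sketch-only choice).
    seen = set()
    for line in lines:
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


def dedupe_jsonl_gz(src_path: str, dst_path: str) -> int:
    # Copy one .jsonl.gz file to another, dropping exact duplicate lines.
    # Returns the number of lines kept.
    kept = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as fin, \
            gzip.open(dst_path, "wt", encoding="utf-8") as fout:
        for line in dedupe_lines(raw.rstrip("\n") for raw in fin):
            fout.write(line + "\n")
            kept += 1
    return kept
```

Matching on the whole line is the simplest option. Filtering on one property would mean hashing just that field instead.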
234e2b7978
Write \n end of line character
we actually forgot this, wow....
2022-07-04 17:05:05 +01:00
920cc3feaf
Properly close last writer
otherwise Node.js doesn't quit
2022-07-04 17:04:11 +01:00
588ee87b83
Bugfix: fix end-of-file
2022-07-01 19:34:26 +01:00
5b2d71f41f
it works
.....I think
2022-07-01 19:08:36 +01:00
1297f41105
.tfrecord files are too much hassle
let's go with a standard of .jsonl.gz instead
2022-07-01 18:28:39 +01:00
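This sketches what the .jsonl.gz convention amounts to, in Python. Each record is one JSON object on its own line, terminated by a "\n" (the character 234e2b7978 had forgotten), and the whole stream is gzip-compressed. `encode_jsonl_gz` / `decode_jsonl_gz` are illustrative names that work on in-memory bytes. For big files, the same format can be streamed with `gzip.open(path, "wt", encoding="utf-8")`.

```python
import gzip
import json
from typing import Iterable, List


def encode_jsonl_gz(records: Iterable[dict]) -> bytes:
    # One compact JSON object per line, each terminated by "\n", then gzipped.
    text = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    return gzip.compress(text.encode("utf-8"))


def decode_jsonl_gz(data: bytes) -> List[dict]:
    # Split strictly on "\n" (str.splitlines would also split on other Unicode
    # line separators) and skip the empty string after the final "\n".
    text = gzip.decompress(data).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```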
ba258fbba0
Remove debug logging
2022-05-19 19:25:44 +01:00
e030e6c2d5
Fix remaining(?) crashes in our code
2022-05-19 19:13:28 +01:00
3cb7e42505
it doesn't crash as much now, but it still isn't behaving.
2022-05-19 18:52:15 +01:00
bb018c53f6
Fix many bugs
Many bugs remain though
2022-05-19 17:54:14 +01:00
cc5efbae8a
Implement tfrecordify subcommand.
It's all still untested, but that's the next step
2022-05-19 17:15:15 +01:00
0fa7ae9d6a
Implement plumbing, but it's all untested
2022-05-18 17:47:02 +01:00
bf4866bdbc
Add data readers
2022-05-18 17:04:11 +01:00
8a9cd6c1c0
Lay out some basic scaffolding
I *really* hope this works. This is the 3rd major revision of this
model. I've learnt a ton of stuff between now and my last attempt, so
here's hoping that all goes well :D
The basic idea behind this attempt is *Contrastive Learning*. If we
don't get anything useful with this approach, then we can assume that
it's not really possible / feasible.
Something we need to watch out for is the variance (or rather lack
thereof) in the dataset. We have 1.5M timesteps, but not a whole lot
will be happening in most of those....
We may need to analyse the variance of the water depth data and extract
a subsample that's more balanced.
2022-05-13 19:06:15 +01:00
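This is one possible way to do the variance analysis and rebalancing mentioned above, sketched in pure Python under loud assumptions. Each timestep's water depth is treated as a flat list of cell values. A timestep counts as "active" when the spatial variance of those values exceeds a threshold. Quiet timesteps are randomly thinned. `balanced_subsample`, `threshold` and `quiet_keep_fraction` are all hypothetical; the commit doesn't say how the subsample would actually be chosen.

```python
import random
import statistics
from typing import List, Sequence


def balanced_subsample(
    depth_grids: Sequence[Sequence[float]],
    threshold: float,
    quiet_keep_fraction: float,
    seed: int = 0,
) -> List[int]:
    # Return the indices of the timesteps to keep. Every "active" timestep
    # (spatial variance of water depth above `threshold`) is kept; each quiet
    # one is kept with probability `quiet_keep_fraction`. Seeded, so it is
    # reproducible.
    rng = random.Random(seed)
    keep = []
    for i, grid in enumerate(depth_grids):
        variance = statistics.pvariance(grid)
        # rng.random() is only drawn for quiet timesteps (short-circuit `or`).
        if variance > threshold or rng.random() < quiet_keep_fraction:
            keep.append(i)
    return keep
```

A sensible threshold would have to come from actually looking at the distribution of per-timestep variances across the 1.5M timesteps, for example via a histogram.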