
64 commits

Author SHA1 Message Date
Starbeamrainbowlabs 4b2e418ddc
finish comment
...oops
2022-11-11 18:55:35 +00:00
Starbeamrainbowlabs ce194d9227
slurm: customise log file names 2022-11-10 21:09:34 +00:00
Starbeamrainbowlabs e519b0adb3
GzipChildProcess: spawn-stream is buggy IIRC 2022-11-09 16:43:05 +00:00
Starbeamrainbowlabs ddbf2cb734
slurm-process: -n28 → --exclusive 2022-11-04 17:42:50 +00:00
Starbeamrainbowlabs a762664063
slurm-process: -n28 for uniq call 2022-11-04 17:23:20 +00:00
Starbeamrainbowlabs 0166b4d09e
slurm-process: change log file names 2022-11-04 17:11:10 +00:00
Starbeamrainbowlabs 441ad92b12
slurm: fixup 2022-11-01 19:57:15 +00:00
Starbeamrainbowlabs bc0e5f05a8
slurm: fixup 2022-11-01 19:55:04 +00:00
Starbeamrainbowlabs 784b8ed35c
recordify: catch NaN --count-file 2022-11-01 19:53:21 +00:00
Starbeamrainbowlabs c17a4ca05a
slurm: fix sanity logic 2022-11-01 19:38:04 +00:00
Starbeamrainbowlabs 79b231198f
slurm-process: check input files are readable 2022-11-01 19:03:37 +00:00
Starbeamrainbowlabs a69fa9f0f3
slurm: rename 2022-11-01 18:59:55 +00:00
Starbeamrainbowlabs f8341e7d89
slurm: add .log 2022-11-01 18:59:15 +00:00
Starbeamrainbowlabs fecc63b6a2
wrangler: write high-level job file 2022-11-01 18:56:27 +00:00
Starbeamrainbowlabs 91152ebb1c
wrangler:recordify update cli help
we only output .jsonl.gz to a DIRECTORY, so update cli help to reflect this
2022-11-01 18:29:47 +00:00
Starbeamrainbowlabs 0c11ddca4b
rainfallwrangler does NOT mess up the ordering of the data 2022-10-18 19:07:14 +01:00
Starbeamrainbowlabs 9edda1f397
rainfallwrangler json2tfrecord.py: normalise data 2022-09-01 19:03:15 +01:00
Starbeamrainbowlabs 3e4128c0a8
resize rainfall to be 1/2 size of current 2022-09-01 18:47:07 +01:00
Starbeamrainbowlabs 6cdf2b2389
wrangler python child: explicitly close stdout+stderr.
Hopefully this will avoid any more hanging issues.
2022-08-10 18:51:30 +01:00
Starbeamrainbowlabs 5880bf9020
wrangler: add current date to process indicator.
There's a bug that causes it to hang, but we don't know why
2022-08-10 18:50:57 +01:00
Starbeamrainbowlabs 231c832888
wrangler bugfix: crashes; logging output 2022-08-10 17:33:10 +01:00
Starbeamrainbowlabs b52c7f89a7
Move dataset parsing function to the right place 2022-08-10 17:24:55 +01:00
Starbeamrainbowlabs 50f214450f
wrangler: fix crash 2022-08-10 17:05:01 +01:00
Starbeamrainbowlabs 0bac8c8c0c
fixup 2022-08-08 17:23:24 +01:00
Starbeamrainbowlabs 405f1a0bb0
fixup 2022-08-08 17:22:31 +01:00
Starbeamrainbowlabs 5e1356513c
slurm: use compute, because 28 tf processes in parallel is too much for the GPU memory 2022-08-08 17:22:18 +01:00
Starbeamrainbowlabs 133ef59af3
fixup 2022-08-08 16:33:05 +01:00
Starbeamrainbowlabs 80e1a33ee2
slurm-jsonl2tfrecord.job: auto install dependencies 2022-08-08 16:31:49 +01:00
Starbeamrainbowlabs 1442d20524
slurm: request gpu 2022-08-08 15:56:46 +01:00
Starbeamrainbowlabs f6f2e3694c
json2tfrecord: write slurm job file 2022-08-08 15:53:32 +01:00
Starbeamrainbowlabs 222a6146ec
write glue for .jsonl.gz → .tfrecord.gz converter 2022-08-08 15:33:59 +01:00
Starbeamrainbowlabs f3652edf82
fixup 2022-08-05 19:10:40 +01:00
Starbeamrainbowlabs 9399d1d8f5
Create (untested) JS interface to Python jsonl→tfrecord converter
also test Python .jsonl.gz → .tfrecord.gz
2022-08-05 19:10:28 +01:00
Starbeamrainbowlabs a02c3436ab
get python bridge working to convert .jsonl.gz → .tfrecord.gz 2022-08-05 18:07:04 +01:00
Starbeamrainbowlabs 2ccc1be414
json2tfrecord: write (untested) python to convert .jsonl → .tfrecord 2022-07-28 19:48:25 +01:00
Starbeamrainbowlabs 927c30e189
recompress files in the right order 2022-07-25 18:44:23 +01:00
Starbeamrainbowlabs 3332fa598a
Add new recompress subcommand
also fix typos, CLI definitions
2022-07-25 17:54:23 +01:00
Starbeamrainbowlabs 593dc2d5ce
fixup 2022-07-22 18:51:29 +01:00
Starbeamrainbowlabs a593077d46
add slurm job file for uniq 2022-07-22 18:46:05 +01:00
Starbeamrainbowlabs 03e398504a
Bugfix: fix crash when target dir isn't specified 2022-07-22 18:36:00 +01:00
Starbeamrainbowlabs 82e826fd69
Fix bugs in remainder of rainfallwrangler:uniq :D 2022-07-22 18:05:03 +01:00
Starbeamrainbowlabs 31bd7899b6
Merge branch 'main' of git.starbeamrainbowlabs.com:sbrl/PhD-Rainfall-Radar 2022-07-22 17:10:52 +01:00
Starbeamrainbowlabs ce303814d6
Bugfix: don't make 1 group for each duplicate.... 2022-07-22 17:06:02 +01:00
Starbeamrainbowlabs 38a0bd0942
uniq: bugfix a lot, but it's not working right just yet
There's still a bug in the file line deleter
2022-07-09 00:31:32 +01:00
Starbeamrainbowlabs a966cdff35
uniq: bugfix a lot, but it's not working right just yet
There's still a bug in the file line deleter
2022-07-08 19:54:24 +01:00
Starbeamrainbowlabs 3b2715c6cd
recordify: fix process exiting and incomplete file issues
• Node.js not exiting at all
• Node.js exiting when calling end_safe on a stream.Writable (?????)
• Incomplete files - "unexpected end of file" errors and invalid JSON
2022-07-08 18:54:00 +01:00
Starbeamrainbowlabs cb922ae8c8
fixup 2022-07-08 16:52:19 +01:00
Starbeamrainbowlabs b9a018f9a9
properly close all the streams 2022-07-08 16:51:17 +01:00
Starbeamrainbowlabs 1a657bd653
add new uniq subcommand
It deduplicates lines in the files, with the potential to add the ability to filter on a specific property later.
The reasoning for this is as follows:
1. There will naturally be periods of time where nothing happens
2. Too many duplicates will interfere with and confuse the contrastive learning algorithm, as each batch will have less variance in its samples

This is especially important because contrastive learning compares every item in each batch with every other item in the batch.
2022-07-04 19:46:06 +01:00
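The uniq commit above describes deduplicating the lines of the data files. As a rough illustration only, here is a minimal Python sketch of order-preserving line deduplication. `uniq_lines` is a hypothetical helper and not rainfallwrangler's actual implementation, whose commit history mentions duplicate groups and deleting lines from files. The sketch assumes one record per line (as in .jsonl) and that the set of distinct line hashes fits in memory.

```python
import hashlib
from typing import Iterable, Iterator


def uniq_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield each distinct line once, in first-seen order."""
    seen: set = set()
    for line in lines:
        # Store a fixed-size SHA-256 digest instead of the line itself, so
        # memory grows with the number of distinct lines, not their length.
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest in seen:
            continue  # duplicate record: skip it
        seen.add(digest)
        yield line
```

For example, `list(uniq_lines(['{"a": 1}', '{"a": 1}', '{"a": 2}']))` returns `['{"a": 1}', '{"a": 2}']`. Keeping only the first occurrence means long runs of identical "nothing happens" records collapse to one sample, which is the variance problem the commit message describes.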
Starbeamrainbowlabs 234e2b7978
Write \n end of line character
we actually forgot this, wow....
2022-07-04 17:05:05 +01:00