Commit graph

355 commits

Author SHA1 Message Date
b52c7f89a7
Move dataset parsing function to the right place 2022-08-10 17:24:55 +01:00
50f214450f
wrangler: fix crash 2022-08-10 17:05:01 +01:00
0bac8c8c0c
fixup 2022-08-08 17:23:24 +01:00
405f1a0bb0
fixup 2022-08-08 17:22:31 +01:00
5e1356513c
slurm: use compute, because 28 tf processes in parallel is too much for the GPU memory 2022-08-08 17:22:18 +01:00
133ef59af3
fixup 2022-08-08 16:33:05 +01:00
80e1a33ee2
slurm-jsonl2tfrecord.job: auto install dependencies 2022-08-08 16:31:49 +01:00
1442d20524
slurm: request gpu 2022-08-08 15:56:46 +01:00
f6f2e3694c
json2tfrecord: write slurm job file 2022-08-08 15:53:32 +01:00
222a6146ec
write glue for .jsonl.gz → .tfrecord.gz converter 2022-08-08 15:33:59 +01:00
f3652edf82
fixup 2022-08-05 19:10:40 +01:00
9399d1d8f5
Create (untested) JS interface to Python jsonl→tfrecord converter
also test Python .jsonl.gz → .tfrecord.gz
2022-08-05 19:10:28 +01:00
a02c3436ab
get python bridge working to convert .jsonl.gz → .tfrecord.gz 2022-08-05 18:07:04 +01:00
28a3f578d5
update .gitignore 2022-08-04 16:49:53 +01:00
2ccc1be414
json2tfrecord: write (untested) python to convert .jsonl → .tfrecord 2022-07-28 19:48:25 +01:00
323d708692
dataset: add todo
just why, Tensorflow?!
tf.data.TextLineDataset looks almost too good to be true..... and it is, as despite supporting decompressing via gzip(!) it doesn't look like we can convince it to parse JSON :-/
2022-07-26 19:53:18 +01:00
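The TODO above notes that tf.data.TextLineDataset can decompress gzip but does not parse JSON. One possible workaround, sketched here using only the Python standard library, is to do the decompression and parsing in plain Python. The function name `read_jsonl_gz` is hypothetical and is not taken from the repository; a generator like this could then be handed to something like tf.data.Dataset.from_generator, though whether the project took that route is not stated in the log.

```python
import gzip
import json


def read_jsonl_gz(filepath):
    """Yield one parsed JSON object per non-empty line of a .jsonl.gz file.

    Hypothetical helper for illustration only; not the repository's code.
    """
    # "rt" opens the gzip stream in text mode so we iterate over str lines
    with gzip.open(filepath, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

Because it is a generator, only one line is held in memory at a time, which matters for large datasets.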
b53c77a2cb
index.py: call static function name run 2022-07-26 19:51:28 +01:00
a7ed58fc03
ai: move requirements.txt to the right place 2022-07-26 19:25:11 +01:00
e93a95f1b3
ai dataset: add if main == main 2022-07-26 19:24:40 +01:00
de4c3dab17
typo 2022-07-26 19:14:55 +01:00
18a7d3674b
ai: create (untested) dataset 2022-07-26 19:14:10 +01:00
dac6919fcd
ai: start creating initial scaffolding 2022-07-25 19:01:10 +01:00
1ec502daea
Remove rogue package*.json files 2022-07-25 19:00:21 +01:00
927c30e189
recompress files in the right order 2022-07-25 18:44:23 +01:00
3332fa598a
Add new recompress subcommand
also fix typos, CLI definitions
2022-07-25 17:54:23 +01:00
d9b9a4f9fc
note to self 2022-07-22 19:04:41 +01:00
593dc2d5ce
fixup 2022-07-22 18:51:29 +01:00
a593077d46
add slurm job file for uniq 2022-07-22 18:46:05 +01:00
03e398504a
Bugfix: fix crash when target dir isn't specified 2022-07-22 18:36:00 +01:00
82e826fd69
Fix bugs in remainder of rainfallwrangler:uniq :D 2022-07-22 18:05:03 +01:00
31bd7899b6
Merge branch 'main' of git.starbeamrainbowlabs.com:sbrl/PhD-Rainfall-Radar 2022-07-22 17:10:52 +01:00
ce303814d6
Bugfix: don't make 1 group for each duplicate.... 2022-07-22 17:06:02 +01:00
38a0bd0942
uniq: bugfix a lot, but it's not working right just yet
There's still a bug in the file line deletor
2022-07-09 00:31:32 +01:00
a966cdff35
uniq: bugfix a lot, but it's not working right just yet
There's still a bug in the file line deletor
2022-07-08 19:54:24 +01:00
3b2715c6cd
recordify: fix process exiting and incomplete files issues
• Node.js not exiting at all
• Node.js exiting on end_safe-ing stream.Writable (?????)
• Incomplete files - "unexpected end of file" errors and invalid JSON
2022-07-08 18:54:00 +01:00
cb922ae8c8
fixup 2022-07-08 16:52:19 +01:00
b9a018f9a9
properly close all the streams 2022-07-08 16:51:17 +01:00
1a657bd653
add new uniq subcommand
It deduplicates lines in the files, with the potential to add the ability to filter on a specific property later.
The reasoning for this is thus:
1. There will naturally be periods of time where nothing happens
2. Too many duplicates will interfere with and confuse the contrastive learning algorithm, as each batch will have less variance in its samples

This is especially important because contrastive learning compares every item in each batch with every other item in the batch.
2022-07-04 19:46:06 +01:00
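As a rough illustration of the line deduplication described in the commit above, here is a minimal Python sketch. It assumes input files in the .jsonl.gz format and treats two lines as duplicates only when they are byte-for-byte identical. The names `uniq_lines` and `uniq_file` are hypothetical, and the real uniq subcommand (part of the Node.js rainfallwrangler tool) may work quite differently.

```python
import gzip
import hashlib


def uniq_lines(lines, seen=None):
    """Yield each distinct line once, in first-seen order.

    `seen` is a set of line digests; pass the same set across several
    calls to deduplicate across multiple files. Hypothetical helper.
    """
    seen = set() if seen is None else seen
    for line in lines:
        # Store a fixed-size digest rather than the (possibly long) line
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield line


def uniq_file(path_in, path_out, seen=None):
    """Copy a .jsonl.gz file to path_out, dropping repeated lines."""
    with gzip.open(path_in, "rt", encoding="utf-8") as src, \
            gzip.open(path_out, "wt", encoding="utf-8") as dst:
        stripped = (raw.rstrip("\n") for raw in src)
        for line in uniq_lines(stripped, seen):
            dst.write(line + "\n")
```

Hashing whole lines keeps the sketch simple; filtering on a specific property, which the commit message mentions as a possible later addition, would instead hash only that property's value.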
234e2b7978
Write \n end of line character
we actually forgot this, wow....
2022-07-04 17:05:05 +01:00
920cc3feaf
Properly close last writer
otherwise Node.js doesn't quit
2022-07-04 17:04:11 +01:00
588ee87b83
Bugfix: fix end-of-file 2022-07-01 19:34:26 +01:00
5b2d71f41f
it works
.....I think
2022-07-01 19:08:36 +01:00
1297f41105
.tfrecord files are too much hassle
let's go with a standard of .jsonl.gz instead
2022-07-01 18:28:39 +01:00
f5f267c6b6
Update dependencies 2022-07-01 16:56:51 +01:00
ba258fbba0
Remove debug logging 2022-05-19 19:25:44 +01:00
e030e6c2d5
Fix remaining(?) crashes in our code 2022-05-19 19:13:28 +01:00
3cb7e42505
it doesn't crash as much now, but it still isn't behaving. 2022-05-19 18:52:15 +01:00
bb018c53f6
Fix many bugs
Many bugs remain though
2022-05-19 17:54:14 +01:00
cc5efbae8a
Implement tfrecodify subcommand.
It's all still untested, but that's the next step
2022-05-19 17:15:15 +01:00
0fa7ae9d6a
Implement plumbing, but it's all untested 2022-05-18 17:47:02 +01:00