rainfallwrangler
Wrangles rainfall radar and water depth data into something sensible.
This Node.js-based tool wrangles rainfall, heightmap, and water depth data into a format that the image semantic segmentation model at the heart of this repository can understand.
The reason for this is efficiency: nothing less than a set of .tfrecord files that can be read in parallel is sufficient if one wants the model to train in a reasonable length of time.
System requirements
- Linux (Windows may work but is untested. You will probably have a bad day if you use Windows)
- Node.js v16+
- Python 3.8+ (for encoding .tfrecord files, as all existing npm packages for doing this suck)
- Experience with the terminal
- Lots of time and patience
Getting started
This tool, unlike nimrod-data-downloader and terrain50-cli, is not published to npm. This is because of the rather niche use-case this tool has.
To get started, first clone this git repository:
git clone git@github.com:sbrl/research-rainfallradar.git;
cd research-rainfallradar/rainfallwrangler;
Then, install dependencies:
npm install
pip3 install --user -r requirements.txt
The entrypoint for the tool is at src/index.mjs. Call it like so:
src/index.mjs --help
It has 4 subcommands:
- recordify: Converts a .asc heightmap, a concatenated .asc water depths file (output from HAIL-CAESAR), and a nimrod-data-downloader rainfall radar directory into an intermediate .jsonl.gz dataset. Defaults to putting 4096 samples per file.
- uniq: Deduplicates samples across an entire .jsonl.gz dataset. Basically hashes all samples with SHA256, marks duplicate hashes for deletion, and then runs through all files in the dataset to remove those slated for deletion.
- recompress: Recompresses a .jsonl.gz dataset to ensure that a fixed number of samples (by default, 4096) are in each file. Needed after uniq, since uniq can leave different numbers of records in each file.
- jsonl2tfrecord: Converts the aforementioned .jsonl.gz dataset into a .tfrecord dataset that the DeepLabV3+ model can understand.
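To make the uniq description more concrete, here is a minimal sketch of the two-pass idea (hash everything, mark duplicates, then rewrite each file). This is not the tool's actual code: the function names, the in-memory dataset object, and the assumption that one sample is one JSONL line are all hypothetical, chosen purely for illustration.

```javascript
const { createHash } = require("node:crypto");

// Hash one sample (assumed here to be a single JSONL line) with SHA256.
function hashSample(line) {
	return createHash("sha256").update(line).digest("hex");
}

// Pass 1: walk every file, remember each hash the first time it is seen,
// and mark every later occurrence of the same hash for deletion.
function findDuplicates(files) {
	const seen = new Set();
	const toDelete = new Map(); // filename -> Set of line indices to drop
	for (const [filename, lines] of Object.entries(files)) {
		lines.forEach((line, index) => {
			const hash = hashSample(line);
			if (!seen.has(hash)) {
				seen.add(hash);
				return;
			}
			if (!toDelete.has(filename)) toDelete.set(filename, new Set());
			toDelete.get(filename).add(index);
		});
	}
	return toDelete;
}

// Pass 2: go through every file again and drop the marked lines.
function removeDuplicates(files, toDelete) {
	const result = {};
	for (const [filename, lines] of Object.entries(files)) {
		const doomed = toDelete.get(filename) ?? new Set();
		result[filename] = lines.filter((_, index) => !doomed.has(index));
	}
	return result;
}

// Hypothetical in-memory stand-in for a directory of .jsonl.gz files
const dataset = {
	"0.jsonl": ['{"a":1}', '{"a":2}', '{"a":1}'],
	"1.jsonl": ['{"a":2}', '{"a":3}'],
};
const deduped = removeDuplicates(dataset, findDuplicates(dataset));
console.log(deduped);
```

Note that after deduplication the two example files no longer hold the same number of samples, which is exactly the situation recompress exists to fix.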
All of these subcommands, where possible, operate in parallel. The general workflow is:
recordify
uniq
recompress
jsonl2tfrecord
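To illustrate why recompress sits between uniq and jsonl2tfrecord in the workflow above, here is a rough sketch of regrouping unevenly-sized files into fixed-size ones. It is an assumption-laden illustration, not the tool's implementation: the rechunk function is hypothetical, samples are held in plain arrays rather than gzipped files, and a tiny chunk size stands in for the default of 4096.

```javascript
// Regroup samples so that every output file holds exactly `perFile` samples.
// Only the final file may hold fewer.
function rechunk(files, perFile = 4096) {
	const out = [];
	let current = [];
	for (const samples of files) {
		for (const sample of samples) {
			current.push(sample);
			if (current.length === perFile) {
				out.push(current);
				current = [];
			}
		}
	}
	// Flush whatever is left over into a final, possibly shorter, file
	if (current.length > 0) out.push(current);
	return out;
}

// Hypothetical uneven dataset, such as uniq might leave behind
const uneven = [[1, 2, 3], [4], [5, 6, 7, 8, 9]];
const even = rechunk(uneven, 4);
console.log(even.map((file) => file.length)); // [ 4, 4, 1 ]
```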
Full help for each subcommand is available by passing --help:
src/index.mjs --help # Show general help for everything
src/index.mjs recordify --help # Show specific help for the recordify subcommand
Contributing
Contributions are very welcome - both issues and pull requests! Please mention in any pull requests that you release your work under the AGPL-3 (see below).
Licence
Same as that of the main repository. All the code in this repository is released under the GNU Affero General Public License 3.0 unless otherwise specified. The full licence text is included in the LICENSE.md file in this repository. GNU have a great summary of the licence which I strongly recommend reading before using this software.