From df621fd7d2ab55ab5217c0ca860f062b23ae5598 Mon Sep 17 00:00:00 2001
From: Starbeamrainbowlabs
Date: Wed, 29 Nov 2023 17:31:47 +0000
Subject: [PATCH] README: Continue filling out, but we're not there yet.

---
 README.md                  | 37 +++++++++++++++++++++++++++++++------
 rainfallwrangler/README.md | 24 ++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 6 deletions(-)
 create mode 100644 rainfallwrangler/README.md

diff --git a/README.md b/README.md
index d6ff4c8..e23bf1d 100644
--- a/README.md
+++ b/README.md
@@ -36,15 +36,13 @@ The process of using this model is as as illustrated:
 
 ![Flowchart illustrating the data flow for using the code in this repository to make predictions water depth](./research-rainfallradar%20overview.png)
 
-TODO fix this flowchart.
-
 More fully:
 
 1. Apply for access to [CEDA's 1km rainfall radar dataset](https://catalogue.ceda.ac.uk/uuid/27dd6ffba67f667a18c62de5c3456350)
 2. Download 1km rainfall radar data (use [`nimrod-data-downloader`](https://www.npmjs.com/package/nimrod-data-downloader))
 3. Obtain a heightmap (or *Digital Elevation Model*, as it's sometimes known) from the Ordnance Survey (can't remember the link, please PR to add this)
 4. Use [`terrain50-cli`](https://www.npmjs.com/package/terrain50-cli) to slice the the output from steps #2 and #3 to be exactly the same size [TODO: Preprocess to extract just a single river basin from the data]
-5. Push through [HAIL-CAESAR](*https://github.com/sbrl/HAIL-CAESAR) (this fork has the ability to handle streams of .asc files rather than each time step having it's own filename)
+5. Push through [HAIL-CAESAR](https://github.com/sbrl/HAIL-CAESAR) (this fork has the ability to handle streams of .asc files rather than each time step having its own filename)
 6. Use `rainfallwrangler` in this repository (finally!) to convert the output to .json.gz then .tfrecord files
 7. Train a DeepLabV3+ prediction model
 
@@ -63,12 +61,28 @@ This tool was also written me, [@sbrl](https://starbeamrainbowlabs.com/) - the p
 
 Full documentation on this tool is available at the above link.
 
-<------ WRITING HERE
+**Heightmap:** Anything will do, but I used the [Ordnance Survey Terrain50](https://www.ordnancesurvey.co.uk/products/os-terrain-50) heightmap, since it is in the OS National Grid format (eww >_<), the same as the aforementioned rainfall radar data.
 
+### Running the simulation
+Once you have your data, ensure it is in a format that the HAIL-CAESAR model will understand. For the rainfall radar data, this is done using the `radar2caesar` command of `nimrod-data-downloader`, as mentioned above.
 
-TODO document the next steps.
+Before running the simulation, the heightmap and rainfall radar data need to be cropped to match one another. The tool [`terrain50-cli`](https://www.npmjs.com/package/terrain50-cli) was developed for this purpose.
+
+Once this is done, the next step is to run HAIL-CAESAR. Details on this can be found here:
+
+
+
+Unfortunately, due to the way HAIL-CAESAR is programmed, it reads *all* the rainfall radar data into memory before running the simulation. From memory, for data from 2006 to 2020 it used approximately 350-450 GiB of RAM.
+
+Replacing this simulation with a better one is on the agenda for moving forwards with this research project, especially since I need to re-run a hydrological simulation anyway when attempting a tile-based approach.
+
+### Preparing to train the model
+Once the simulation has run to completion, all three pieces (rainfall radar, heightmap, and water depth data) are in place to prepare to train an AI model. The training process requires the data to be stored in `.tfrecord` files for efficiency, given the very large size of the dataset in question.
+
+This is done using the `rainfallwrangler` tool in the eponymous directory in this repository. Full documentation on `rainfallwrangler` can be found in the README in that directory:
+
+[rainfallwrangler README](./rainfallwrangler/README.md)
 
-## rainfallwrangler
 `rainfallwrangler` is a Node.js application to wrangle the dataset into something more appropriate for training an AI efficiently. The rainfall radar and water depth data are considered temporally to be regular time steps. Here's a diagram explaining the terminology:
 
 ```
@@ -90,5 +104,16 @@ Note to self: 150.12 hashes/sec on i7-4770 4c8t, ???.?? hashes/sec on Viper comp
 
 After double checking, rainfallwrangler does NOT mess with the ordering of the data.
 
+
+### Training the model
+Once all of the above steps are complete, a model can be trained.
+
+The current state of the art (as presented in the above paper!) is based on DeepLabV3+. A note of caution: this repository also contains some older models, so it is easy to mix them up. Hence this documentation :-)
+
+<------ WRITING HERE
+
+TODO: Continue the guide here.
+
+
 ## License
 All the code in this repository is released under the GNU Affero General Public License unless otherwise specified. The full license text is included in the [`LICENSE.md` file](./LICENSE.md) in this repository. GNU [have a great summary of the licence](https://www.gnu.org/licenses/#AGPL) which I strongly recommend reading before using this software.
diff --git a/rainfallwrangler/README.md b/rainfallwrangler/README.md
new file mode 100644
index 0000000..bed227e
--- /dev/null
+++ b/rainfallwrangler/README.md
@@ -0,0 +1,24 @@
+# rainfallwrangler
+
+> Wrangles rainfall radar and water depth data into something sensible.
+
+
+This Node.js-based tool is designed for wrangling rainfall, heightmap, and water depth data into a form that the image semantic segmentation model (the main feature of this repository) can understand.
+
+The reason for this is efficiency: nothing less than a set of `.tfrecord` files for reading in parallel is sufficient if one wants the model to train in a reasonable length of time.
+
+
+TODO: Write a guide for this tool here.
+
+## System requirements
+
+
+## Getting started
+
+
+
+## Contributing
+
+
+## Licence
+Same as that of the main repository. TODO expand on this.
\ No newline at end of file
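
The patch has `terrain50-cli` crop the heightmap and rainfall radar grids so they match one another. The sketch below illustrates what "matching" means for two ESRI ASCII (`.asc`) grids. It is not terrain50-cli's actual implementation, and `GridHeader`, `parse_header` and `overlap` are hypothetical names. It reads each grid's header and computes the extent both grids cover. It assumes corner-referenced headers (`xllcorner`/`yllcorner`, not the `xllcenter` variant) and assumes both grids use the same OS National Grid coordinates.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical helper types for illustration; not part of terrain50-cli.
@dataclass(frozen=True)
class GridHeader:
    """Georeferencing fields from an ESRI ASCII (.asc) grid header."""

    ncols: int
    nrows: int
    xllcorner: float  # x of the grid's lower-left corner (assumed corner-referenced)
    yllcorner: float  # y of the grid's lower-left corner
    cellsize: float

    @property
    def xmax(self) -> float:
        return self.xllcorner + self.ncols * self.cellsize

    @property
    def ymax(self) -> float:
        return self.yllcorner + self.nrows * self.cellsize


def parse_header(text: str) -> GridHeader:
    """Read the 'key value' header lines that precede the cell values."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        # Header lines are exactly two tokens starting with a letter;
        # the first row of (numeric) cell values ends the header.
        if len(parts) != 2 or not parts[0][0].isalpha():
            break
        fields[parts[0].lower()] = parts[1]
    return GridHeader(
        ncols=int(fields["ncols"]),
        nrows=int(fields["nrows"]),
        xllcorner=float(fields["xllcorner"]),
        yllcorner=float(fields["yllcorner"]),
        cellsize=float(fields["cellsize"]),
    )


def overlap(a: GridHeader, b: GridHeader) -> tuple[float, float, float, float] | None:
    """Return the (xmin, ymin, xmax, ymax) extent covered by both grids, or None."""
    xmin = max(a.xllcorner, b.xllcorner)
    ymin = max(a.yllcorner, b.yllcorner)
    xmax = min(a.xmax, b.xmax)
    ymax = min(a.ymax, b.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None  # the grids do not intersect
    return (xmin, ymin, xmax, ymax)


if __name__ == "__main__":
    # Hypothetical headers: a 50 m heightmap and a 1 km radar grid.
    dem = parse_header("ncols 400\nnrows 300\nxllcorner 300000\nyllcorner 400000\ncellsize 50\n")
    radar = parse_header("ncols 20\nnrows 20\nxllcorner 305000\nyllcorner 395000\ncellsize 1000\n")
    print(overlap(dem, radar))  # (305000.0, 400000.0, 320000.0, 415000.0)
```

A full crop would also snap this shared extent to the coarser grid's cell boundaries (the 1 km radar cells) and then cut both grids to it. That way both outputs cover exactly the same cells, as step 4 of the list requires.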
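
The patch explains that training needs the data in `.tfrecord` files so that it can be read in parallel efficiently. For reference, below is a minimal Python sketch of the TFRecord container's on-disk framing. Each record consists of a little-endian uint64 length, a masked CRC32C of that length, the payload bytes, and a masked CRC32C of the payload. The sketch only frames opaque byte strings. It is not how `rainfallwrangler` (a Node.js tool) writes its files, and the payload schema it uses (typically a serialized `tf.train.Example` in TFRecord files) is not shown in this patch. `frame_record` and `read_records` are hypothetical names.

```python
import struct

# Reflected form of the CRC-32C (Castagnoli) polynomial used by TFRecord.
_CRC32C_POLY = 0x82F63B78


def _make_table() -> list:
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C, the checksum TFRecord framing uses."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord stores each CRC rotated right by 15 bits plus a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one opaque payload in TFRecord framing (hypothetical helper)."""
    length = struct.pack("<Q", len(payload))  # uint64, little-endian
    return (
        length
        + struct.pack("<I", masked_crc(length))
        + payload
        + struct.pack("<I", masked_crc(payload))
    )


def read_records(blob: bytes):
    """Yield each payload from concatenated TFRecord frames, checking both CRCs."""
    offset = 0
    while offset < len(blob):
        header = blob[offset:offset + 8]
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", blob[offset + 8:offset + 12])
        if length_crc != masked_crc(header):
            raise ValueError(f"corrupt length field at byte {offset}")
        start = offset + 12
        payload = blob[start:start + length]
        (data_crc,) = struct.unpack("<I", blob[start + length:start + length + 4])
        if data_crc != masked_crc(payload):
            raise ValueError(f"corrupt payload at byte {start}")
        yield payload
        offset = start + length + 4
```

For example, `list(read_records(frame_record(b"a") + frame_record(b"bc")))` returns `[b"a", b"bc"]`. Because every record is self-delimiting and checksummed, a dataset can be split across many such files. TensorFlow's `tf.data.TFRecordDataset` can then read those files concurrently via its `num_parallel_reads` argument.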