README: Continue filling out, but we're not there yet.

This commit is contained in:
Starbeamrainbowlabs 2023-11-29 17:31:47 +00:00
parent 3eec01a28c
commit df621fd7d2
Signed by: sbrl
GPG key ID: 1BE5172E637709C2
2 changed files with 55 additions and 6 deletions

View file

@ -36,15 +36,13 @@ The process of using this model is as as illustrated:
![Flowchart illustrating the data flow for using the code in this repository to make predictions water depth](./research-rainfallradar%20overview.png) ![Flowchart illustrating the data flow for using the code in this repository to make predictions water depth](./research-rainfallradar%20overview.png)
TODO fix this flowchart.
More fully: More fully:
1. Apply for access to [CEDA's 1km rainfall radar dataset](https://catalogue.ceda.ac.uk/uuid/27dd6ffba67f667a18c62de5c3456350) 1. Apply for access to [CEDA's 1km rainfall radar dataset](https://catalogue.ceda.ac.uk/uuid/27dd6ffba67f667a18c62de5c3456350)
2. Download 1km rainfall radar data (use [`nimrod-data-downloader`](https://www.npmjs.com/package/nimrod-data-downloader)) 2. Download 1km rainfall radar data (use [`nimrod-data-downloader`](https://www.npmjs.com/package/nimrod-data-downloader))
3. Obtain a heightmap (or *Digital Elevation Model*, as it's sometimes known) from the Ordnance Survey (can't remember the link, please PR to add this) 3. Obtain a heightmap (or *Digital Elevation Model*, as it's sometimes known) from the Ordnance Survey (can't remember the link, please PR to add this)
4. Use [`terrain50-cli`](https://www.npmjs.com/package/terrain50-cli) to slice the the output from steps #2 and #3 to be exactly the same size [TODO: Preprocess to extract just a single river basin from the data] 4. Use [`terrain50-cli`](https://www.npmjs.com/package/terrain50-cli) to slice the the output from steps #2 and #3 to be exactly the same size [TODO: Preprocess to extract just a single river basin from the data]
5. Push through [HAIL-CAESAR](*https://github.com/sbrl/HAIL-CAESAR) (this fork has the ability to handle streams of .asc files rather than each time step having it's own filename) 5. Push through [HAIL-CAESAR](https://github.com/sbrl/HAIL-CAESAR) (this fork has the ability to handle streams of .asc files rather than each time step having it's own filename)
6. Use `rainfallwrangler` in this repository (finally!) to convert the output to .json.gz then .tfrecord files 6. Use `rainfallwrangler` in this repository (finally!) to convert the output to .json.gz then .tfrecord files
7. Train a DeepLabV3+ prediction model 7. Train a DeepLabV3+ prediction model
@ -63,12 +61,28 @@ This tool was also written me, [@sbrl](https://starbeamrainbowlabs.com/) - the p
Full documentation on this tool is available at the above link. Full documentation on this tool is available at the above link.
<------ WRITING HERE **Heightmap:** Anything will do, but I used the [Ordnance Survey Terrain50](https://www.ordnancesurvey.co.uk/products/os-terrain-50) heightmap, since it is in the OS National Grid format (eww >_<), same as the aforementioned rainfall radar data.
### Running the simulation
Once you have your data, ensure it is in a format that the HAIL-CAESAR model will understand. For the rainfall radar data, this is done using the `radar2caesar` command of `nimrod-data-downloader`, as mentioned above.
TODO document the next steps. before running the simulation, the heightmap and rainfall radar will need cropping to match one another. For this the tool [`terrain50-cli`](https://www.npmjs.com/package/terrain50-cli) was developed.
Once this is done, the next step is to run HAIL-CAESAR. Details on this can be found here:
<https://github.com/sbrl/HAIL-CAESAR/>
....unfortunately, due to the way HAIL-CAESAR is programmed, it reads *all* the rainfall radar data into memory first before running the simulation. From memory for data from 2006 to 2020 it used approximately 350GiB - 450GiB RAM.
Replacing this simulation with a better one is on the agenda for moving forwards with this research project - especially since I need to re-run a hydrological simulation anyway when attempting a tile-based approach.
### Preparing to train the model
Once the simulation has run to completion, all 3 pieces are now in place to prepare to train an AI model. The AI model training process requires that data is stored in `.tfrecord` files for efficiency given the very large size of the dataset in question.
This is done using the `rainfallwrangler` tool in the eponymous directory in this repository. Full documentation on `rainfallwrangler` can be found in the README in that directory:
[rainfallwrangler README](./rainfallwrangler/README.md)
## rainfallwrangler
`rainfallwrangler` is a Node.js application to wrangle the dataset into something more appropriate for training an AI efficiently. The rainfall radar and water depth data are considered temporally to be regular time steps. Here's a diagram explaining the terminology: `rainfallwrangler` is a Node.js application to wrangle the dataset into something more appropriate for training an AI efficiently. The rainfall radar and water depth data are considered temporally to be regular time steps. Here's a diagram explaining the terminology:
``` ```
@ -90,5 +104,16 @@ Note to self: 150.12 hashes/sec on i7-4770 4c8t, ???.?? hashes/sec on Viper comp
After double checking, rainfallwrangler does NOT mess with the ordering of the data. After double checking, rainfallwrangler does NOT mess with the ordering of the data.
### Training the model
After all of the above steps are completed, a model can now be trained.
The current state of the art (that was presented in the above paper!) is based on DeepLabV3+. A note of caution: this repository contains some older models, so it can be easy to mix them up. Hence this documentation :-)
<------ WRITING HERE
TODO: Continue the guide here.
## License ## License
All the code in this repository is released under the GNU Affero General Public License unless otherwise specified. The full license text is included in the [`LICENSE.md` file](./LICENSE.md) in this repository. GNU [have a great summary of the licence](https://www.gnu.org/licenses/#AGPL) which I strongly recommend reading before using this software. All the code in this repository is released under the GNU Affero General Public License unless otherwise specified. The full license text is included in the [`LICENSE.md` file](./LICENSE.md) in this repository. GNU [have a great summary of the licence](https://www.gnu.org/licenses/#AGPL) which I strongly recommend reading before using this software.

View file

@ -0,0 +1,24 @@
# rainfallwrangler
> Wrangles rainfall radar and water depth data into something sensible.
This Node.js-based tool is designed for wrangling rainfall, heightmap, and water depth data into something that the image semantic segmentation model that is the main feature of this repository can understand.
The reason for this is efficiency: nothing less than a set of `.tfrecord` files for reading in parallel is sufficient if one wants the model to train in a reasonable length of time.
TODO: Write a guide for this tool here.
## System requirements
## Getting started
## Contributing
## Licence
Same as that of the main repository. TODO expand on this.