From 60674fb6b3481a822ccd59312c24e53af39e01c5 Mon Sep 17 00:00:00 2001
From: Starbeamrainbowlabs <sbrl@starbeamrainbowlabs.com>
Date: Thu, 30 Nov 2023 16:32:46 +0000
Subject: [PATCH] README: finish filling out

---
 README.md | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index e23bf1d..edec03f 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,9 @@
 
 This is the 3rd major version of this model.
 
-Unfortunately using this model is rather complicated and involves a large number of steps. There is no way around this. This README (will) explain it the best I can though.
+Unfortunately using this model is rather complicated and involves a large number of steps. There is no way around this. This README explains it the best I can though.
+
+Should anything be unclear, please [open an issue](https://github.com/sbrl/research-rainfallradar/issues/new)
 
 > [!WARNING]
 > This README is currently under construction!
@@ -31,6 +33,11 @@ By modelling the task as an image segmentation problem, an alternative approach
  - 1TiB disk space free
  - Lots of time and patience
 
+> [!NOTE]
+> The format that HAIL-CAESAR accepts data in results in a ~450GiB rainfall radar file O.o
+> 
+> Thankfully the format that `nimrod-data-downloader` downloads in is only a couple of GiB in the end, and the `.tfrecord` files that the model accepts is only ~70GiB.
+
 ## Overview
 The process of using this model is as as illustrated:
 
@@ -110,10 +117,72 @@ After all of the above steps are completed, a model can now be trained.
 
 The current state of the art (that was presented in the above paper!) is based on DeepLabV3+. A note of caution: this repository contains some older models, so it can be easy to mix them up. Hence this documentation :-)
 
-<------ WRITING HERE
+This model is located in the file [`aimodel/src/deeplabv3_plus_test_rainfall.py`](./aimodel/src/deeplabv3_plus_test_rainfall.py), and is controlled via a system of environment variables. Before using it, you must first install any dependencies you're missing:
 
-TODO: Continue the guide here.
+```bash
+pip3 install --user -r aimodel/requirements.txt
+```
+
+The model should work with any recent version of Tensorflow. See the [version table](https://www.tensorflow.org/install/source#gpu) if you are having trouble with CUDA and/or CuDNN.
+
+With requirements installed, we can train a model. The general form this is done is like so:
+
+```bash
+cd aimodel
+[ENVIRONMENT_VARIABLES_HERE] src/deeplabv3_plus_test_rainfall.py
+```
+
+This model has mainly been tested and trained on the [University of Hull's Viper HPC](), which runs [Slurm](). As such, a Slurm job file is available in [`aimodel/slurm-TEST-deeplabv3p-rainfall.job`](./aimodel/slurm-TEST-deeplabv3p-rainfall.job), which wraps the aforementioned script.
+
+The following environment variables are supported:
+
+Environment Variable		| Meaning
+----------------------------|-------------------------------------------------
+IMAGE_SIZE=128				| Optional. Sets the size of the 'images' that the DeepLabV3+ model will work with.
+BATCH_SIZE=64				| Optional. Sets the batch size to train the model with.
+DIR_RAINFALLWATER			| The path to the directory the .tfrecord files containing the rainfall radar / water depth data.
+PATH_HEIGHTMAP 				| The path to the heightmap jsonl file to read in.
+PATH_COLOURMAP 				| The path to the colourmap for predictive purposes.
+DIR_OUTPUT					| The directory to write output files to. Automatically calculated in the Slurm job files unless manually set. See POSTFIX to alter DIR_OUTPUT without disrupting the automatic calculation. If you are calling `slurm-TEST-deeplabv3p-rainfall.job` directly then you MUST set this environment variable manually.
+PARALLEL_READS				| Multiplier for the number of files to read in parallel. 1 = number of CPU cores available. Very useful on high-read-latency systems (e.g. HPC like Viper) to avoid starving the GPU of data. WILL MANGLE THE ORDERING OF DATA. Set to 0 to disable and read data sequentially. WILL ONLY NOT MANGLE DATA IF PREDICT_AS_ONE IS SET. Defaults to 1.5.
+STEPS_PER_EPOCH				| The number of steps to consider an epoch. Defaults to None, which means use the entire dataset.
+NO_REMOVE_ISOLATED_PIXELS	| Set to any value to avoid the engine from removing isolated pixels - that is, water pixels with no other surrounding pixels, either side to side to diagonally.
+EPOCHS=50					| The number of epochs to train for.
+LOSS="cross-entropy"		| The loss function to use. Default: cross-entropy (possible values: cross-entropy, cross-entropy-dice).
+DICE_LOG_COSH				| When in cross-entropy-dice mode, in addition do loss = cel + log(cosh(dice_loss)) instead of just loss = cel + dice_loss. Default: unset
+WATER_THRESHOLD=0.1			| The threshold to cut water off at when training, in metres. Default: 0.1
+PATH_CHECKPOINT				| The path to a checkpoint to load. If specified, a model will be loaded instead of being trained.
+LEARNING_RATE=0.001			| The learning rate to use. Default: 0.001.
+UPSAMPLE=2					| How much to upsample by at the beginning of the model. A value of disables upscaling. Default: 2.
+STEPS_PER_EXECUTION=1		| How many steps to perform before surfacing from the GPU to e.g. do callbacks. Default: 1.
+RANDSEED					| The random seed to use when shuffling filepaths. Default: unset, which means use a random value.
+JIT_COMPILE					| Set to any value to compile the model with XLA. Defaults to unset; set to any value to enable.
+PREDICT_COUNT=25			| The number of items from the (SCRAMBLED) dataset to make a prediction for.
+PREDICT_AS_ONE				| [prediction only] Set to any value to avoid splitting the input dataset into training/validation and instead treat it as a single dataset. Default: False (treat it as training/validation)
+POSTFIX						| Postfix to append to the output directory name (primarily auto calculated if DIR_OUTPUT is not specified, but this allows adjustments to be made without setting DIR_OUTPUT).
+ARGS						| Optional. Any additional arguments to pass to the python program.
+
+> [!IMPORTANT]
+> It is strongly advised that all filepaths do **NOT** contain spaces.
+
+
+**Making predictions:** Set `PATH_CHECKPOINT` to point to a checkpoint file to make predictions with an existing model that you trained earlier instead of training a new one. Data is pulled from the given dataset, same as during training. The first `PREDICT_COUNT` items in the dataset are picked to make a prediction. 
+
+> [!NOTE] The dataset pipeline is naturally non-deterministic with respect to the order in which samples are read. Ensuring the ordering of samples is not mangled is only possible when making predictions, and requires a number of environment variables to be set:
+> 
+> - **`PREDICT_AS_ONE`:** Set to any value to disable the training / validation split
+> - **`PARALLEL_READS`:** Set to `0` to reading input files sequentially.
+
+## Contributing
+Contributions are very welcome - both issues and pull requests! Please mention in any pull requests that you release your work under the AGPL-3 (see below).
 
 
 ## License
-All the code in this repository is released under the GNU Affero General Public License unless otherwise specified. The full license text is included in the [`LICENSE.md` file](./LICENSE.md) in this repository. GNU [have a great summary of the licence](https://www.gnu.org/licenses/#AGPL) which I strongly recommend reading before using this software.
+All the code in this repository is released under the GNU Affero General Public License 3.0 unless otherwise specified. The full license text is included in the [`LICENSE.md` file](./LICENSE.md) in this repository. GNU [have a great summary of the licence](https://www.gnu.org/licenses/#AGPL) which I strongly recommend reading before using this software.
+
+> [!NOTE] AGPL 3.0 was chosen for a number of reasons. The code in this repository has taken a very large amount of effort to put together, and to this end it is my greatest wish that this code and all derivatives be open-source. Open-source AI models enable the benefits thereof to be distributed and shared to all, and ensure transparency surrounding methodology, process, and limitations.
+> 
+> You may contact me to negotiate a different licence, but do not hold out hope.
+> 
+> --Starbeamrainbowlabs, aka Lydia Bryan-Smith  
+> Primary author
\ No newline at end of file