Pool model trainer
Fine-tune a computer vision model to detect swimming pools in residential areas using overhead aerial or satellite photography.
Quick start
# 1. Enter the Nix dev shell (pulls PyTorch, OpenCV, rasterio, etc.)
nix develop
# 2. Install the one pip-only dependency
pip install segmentation-models-pytorch
# 3. Prepare your data (see "Data format" below)
# → images in data/images/
# → masks in data/masks/
# 4. Train
python scripts/train.py --config configs/default.yaml
# 5. Watch progress
tensorboard --logdir runs/
# 6. Run on a new image
python -m src.inference.predict \
--checkpoint runs/<run-name>/checkpoints/best.pth \
--image path/to/new_tile.png
Setup without Nix (bare metal / virtual env)
If you don't use Nix — for example on a GPU server where CUDA drivers and
toolkits are pre-installed — you can create a standard Python virtual
environment using the requirements/ files.
1. System dependencies
Install the native libraries needed by OpenCV, PyTorch, and compiled Python packages:
# Debian / Ubuntu
sudo apt-get update && sudo apt-get install -y \
libgl1-mesa-glx libglib2.0-0 libgomp1 gcc g++ make
# RHEL / Fedora / CentOS
sudo dnf install -y \
mesa-libGL glib2 libgomp gcc gcc-c++ make
2. Create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
3. Install PyTorch
Pick the index URL that matches your hardware:
# CPU only
pip install -r requirements/torch.txt --index-url https://download.pytorch.org/whl/cpu
# CUDA 11.8
pip install -r requirements/torch.txt --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.4
pip install -r requirements/torch.txt --index-url https://download.pytorch.org/whl/cu124
4. Install project dependencies
# Core training dependencies (image/CV, data wrangling, training utils)
pip install -r requirements/base.txt
# Optional dev tooling (Jupyter, Label Studio SDK, etc.)
pip install -r requirements/dev.txt
5. Verify
python -c "import torch; print(torch.cuda.is_available())"
python -c "import segmentation_models_pytorch as smp; print(smp.__version__)"
Then follow the Training and Inference sections below.
Data format
The training pipeline expects this directory layout:
data/
├── images/ # aerial / satellite tiles
│ ├── tile_001.png
│ ├── tile_002.png
│ ├── ...
│ └── tile_999.png
├── masks/ # label masks — one per image, same stem name
│ ├── tile_001.png
│ ├── tile_002.png
│ └── ...
├── train.txt # (optional) newline-separated stem names for training
└── val.txt # (optional) newline-separated stem names for validation
Image files
- Format: PNG is preferred. JPEG and GeoTIFF (.tif/.tiff) also work.
- Size: The model resizes everything to image_size × image_size (default 256×256) during loading, so source images can be any resolution. Larger tiles give the model more context at the expense of GPU memory.
- Channels: RGB (3 channels). If your imagery has a near-infrared band, set model.in_channels: 4 in the config.
Mask files
- Format: Single-channel (grayscale) PNG.
- Pixel values: Integer class labels.
| Value | Meaning |
|---|---|
| 0 | Background (not a pool) |
| 1 | Pool |
- Same dimensions as the image. The model resizes masks together with images, so spatial alignment is preserved; masks are resized with nearest-neighbor interpolation so class labels are not blended.
- Filenames must match the image they annotate. If tile_001.png is the image, the mask must be named tile_001.png as well (just in a different folder).
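The stem-matching rule above can be checked before training. The following is a minimal sketch, not part of the repository: `find_unpaired` and `IMAGE_EXTS` are hypothetical names, and the extension list is an assumption based on the formats mentioned in this section.

```python
from pathlib import Path

# Assumed set of accepted extensions (PNG, JPEG, GeoTIFF).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_unpaired(data_dir):
    """Return (images_without_mask, masks_without_image) as sorted stem lists."""
    root = Path(data_dir)
    image_stems = {p.stem for p in (root / "images").iterdir()
                   if p.suffix.lower() in IMAGE_EXTS}
    mask_stems = {p.stem for p in (root / "masks").iterdir()
                  if p.suffix.lower() in IMAGE_EXTS}
    return sorted(image_stems - mask_stems), sorted(mask_stems - image_stems)
```

Calling `find_unpaired("data/")` on a clean dataset should return two empty lists; anything else names the stems that need attention.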
Split files (train.txt / val.txt)
Plain text files listing which samples go into each split, one per line, without the file extension:
# data/train.txt
tile_001
tile_003
tile_007
...
If you omit these files, the pipeline splits the data automatically using
data.val_fraction (default 15%).
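To illustrate what an automatic split by fraction involves, here is a small sketch. It is an assumption about the general technique, not the pipeline's actual code: `split_stems` is a hypothetical helper, and the seed and rounding behavior are illustrative choices.

```python
import random


def split_stems(stems, val_fraction=0.15, seed=42):
    """Shuffle stems deterministically and split off a validation share."""
    shuffled = sorted(stems)               # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)  # seeded RNG keeps the split reproducible
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, val)
```

Writing the two returned lists to train.txt and val.txt, one stem per line, would pin the split so later runs compare against the same validation set.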
Exporting from Label Studio
Label Studio is a widely used open-source tool for image annotation. This section covers how to get your annotations out of Label Studio and into the format above.
There are two workflows:
- SDK-based (recommended) — export directly via the Label Studio API, then convert. No manual download step.
- Manual JSON export — download the JSON from the Label Studio UI, then convert locally.
Step 1 — Set up your Label Studio project
- Create a new project with Labeling Setup → Computer Vision → Semantic Segmentation.
- Under Labeling Interface, add a Brush with nested Labels tag:
<View>
<Image name="image" value="$image"/>
<Brush name="pool" toName="image">
<Labels name="labels" toName="image">
<Label value="Swimming Pool" />
</Labels>
</Brush>
</View>
- Import your aerial/satellite tiles through the Label Studio UI.
- Annotate pools by painting over them with the brush tool.
- Use a brush size appropriate to your image resolution.
- Be consistent — label pool water only (or define a convention like "pool water + visible coping" and stick to it across all annotators).
Step 2 — Export from Label Studio
Option A — SDK export (recommended)
Use the Label Studio SDK to export directly from the API. This avoids manual download steps and can export + convert in a single command.
# Set credentials
export LABEL_STUDIO_URL=https://labelstudio.example.com
export LABEL_STUDIO_API_KEY=your_access_token_here
# Export only (saves JSON for inspection / reuse)
python scripts/export_label_studio_sdk.py \
--project-id 1 \
--output project-export.json
# Export + convert to training format in one step
python scripts/export_label_studio_sdk.py \
--project-id 1 \
--translate \
--image-root /path/to/original/images/ \
--output-dir data/
Credentials can be passed via --url / --api-key CLI flags or the
LABEL_STUDIO_URL / LABEL_STUDIO_API_KEY environment variables.
Install the SDK:
pip install label-studio-sdk
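The flag-or-environment credential lookup described above could be wired up roughly as follows. This is a sketch under assumptions: `parse_credentials` is a hypothetical function, and the precedence (a CLI flag overrides the environment variable) is assumed rather than confirmed by the script.

```python
import argparse
import os


def parse_credentials(argv=None, env=None):
    """Resolve URL and API key: a CLI flag wins, else the environment variable."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", default=env.get("LABEL_STUDIO_URL"))
    parser.add_argument("--api-key", default=env.get("LABEL_STUDIO_API_KEY"))
    args = parser.parse_args(argv)
    if not args.url or not args.api_key:
        parser.error("set --url/--api-key or LABEL_STUDIO_URL/LABEL_STUDIO_API_KEY")
    return args.url, args.api_key
```

Failing early with a clear message when neither source provides a value avoids a confusing authentication error later in the export.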
Option B — Manual JSON export
- Go to your project page and click Export.
- Choose JSON as the export format.
- Click Export to download a file like
project-1-at-2026-06-19.json.
Step 2a — Inspect export statistics
Before converting, verify your export looks right:
python scripts/stats_label_studio.py project-1-at-2026-06-19.json
Outputs:
- Total tasks in the export
- Completed vs incomplete tasks
- How many tasks show pool areas vs no pools
Step 3 — Convert to training format
Run the conversion script. Point it at either a local image directory or an S3-compatible bucket (AWS S3, Garage, MinIO, etc.).
Local images:
python scripts/convert_label_studio.py \
--input project-1-at-2026-06-19.json \
--image-root /path/to/original/images/ \
--output-dir data/
S3 images (e.g. self-hosted Garage):
python scripts/convert_label_studio.py \
--input project-1-at-2026-06-19.json \
--s3-bucket pool-tiles \
--s3-prefix tiles/ \
--s3-endpoint https://garage.example.com \
--s3-access-key GK... \
--s3-secret-key ... \
--output-dir data/
Credentials can also be set via the standard environment variables
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY instead of CLI flags.
What this does:
- Downloads (or copies/symlinks) each annotated image into data/images/
- Decodes the RLE masks (or rasterizes polygons) into data/masks/, using the annotation canvas dimensions (original_width/original_height) from the Label Studio export so masks are correctly aligned with the image
- Images annotated as "no pools" get an all-zeros mask (treated as background)
- Writes data/train.txt and data/val.txt with an 85/15 split
- Skips images that have never been annotated (truly unlabeled)
- Skips duplicate stems (same image appearing in multiple tasks)
- Prints a summary of what was exported
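The canvas dimensions matter because Label Studio stores polygon points as percentages of the image size rather than as pixels. The sketch below shows that conversion only; `polygon_to_pixels` is a hypothetical helper and not the conversion script's own code, which may handle rounding and rasterization differently.

```python
def polygon_to_pixels(points_pct, original_width, original_height):
    """Convert Label Studio polygon points (percent units) to pixel coordinates."""
    return [
        (x_pct / 100.0 * original_width, y_pct / 100.0 * original_height)
        for x_pct, y_pct in points_pct
    ]
```

Using the wrong width or height here shifts or stretches every polygon, which is the misalignment the converter avoids by reading original_width and original_height from the export.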
Full usage:
python scripts/convert_label_studio.py \
--input export.json \
--image-root /original/tiles/ \
--output-dir data/ \
--val-fraction 0.15 \
--image-format png \
--mask-format png \
--copy \
--seed 42
# For S3 sources, replace --image-root with:
#   --s3-bucket my-bucket                     S3 bucket
#   --s3-endpoint https://s3.example.com      S3 endpoint (for non-AWS)
#   --s3-prefix tiles/                        optional key prefix
#   --s3-access-key ... --s3-secret-key ...   S3 credentials
# --image-format accepts png, jpg, or tif.
# --copy copies files; use --symlink to symlink instead.
Note: S3 support requires the boto3 package. RLE decoding requires label-studio-converter. Install both with:
pip install boto3 label-studio-converter
Step 4 — Verify the conversion
Spot-check samples with the built-in web viewer:
python scripts/visualize_data.py --data-dir data/ --port 8080
This starts a local web server showing each sample's source image, mask, and a red-tinted overlay side by side. Navigate with the on-screen buttons or the left/right arrow keys.
If you're running on a remote server, create an SSH tunnel:
ssh -L 8080:localhost:8080 your-server
Then open http://localhost:8080 in your browser.
Training
Before you start
The pretrained encoder weights are downloaded from Hugging Face Hub. To avoid rate-limiting and enable faster downloads, set a Hugging Face token:
export HF_TOKEN=hf_your_token_here
You can get a free token at https://huggingface.co/settings/tokens. Without one, training still works, but downloads may be slower.
First run
python scripts/train.py --config configs/default.yaml
This will:
- Read images + masks from data/
- Build a UNet with a ResNet-34 encoder (pretrained on ImageNet)
- Train for 50 epochs, validating after each epoch
- Log loss, IoU, and Dice to TensorBoard
- Save the best checkpoint (by validation IoU) to runs/<timestamp>/checkpoints/best.pth
Override config from the command line
python scripts/train.py --config configs/default.yaml \
training.num_epochs=100 \
training.batch_size=4 \
model.encoder_name=resnet50
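Dotted `key=value` overrides like these are typically merged into a nested config. The sketch below shows one way that could work; `apply_overrides` and `_parse_scalar` are hypothetical names, and the value parsing (int, then float, then string, with no boolean handling) is an assumption rather than the project's actual behavior.

```python
def _parse_scalar(raw):
    """Interpret a value as int, then float, then fall back to the raw string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config, overrides):
    """Apply 'dotted.key=value' strings to a nested dict, in place."""
    for item in overrides:
        dotted_key, raw = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections on the way down
        node[leaf] = _parse_scalar(raw)
    return config
```

For example, `apply_overrides({"training": {"num_epochs": 50}}, ["training.num_epochs=100"])` replaces the epoch count and leaves the rest of the config untouched.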
Resume from a checkpoint
python scripts/train.py --config configs/default.yaml \
training.resume_from=runs/20260619_120000/checkpoints/best.pth
Monitor with TensorBoard
tensorboard --logdir runs/ --port 6006
# Open http://localhost:6006 in a browser
Key metrics to watch:
- val/iou_mean: your primary metric. Above 0.7 is good, above 0.85 is excellent.
- train/loss: should decrease smoothly. Spikes may mean your learning rate is too high.
- val/dice_class_1: Dice score for the pool class only (ignores background).
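For readers unfamiliar with these metrics, the sketch below computes IoU and Dice for one class from plain nested lists. It uses the standard definitions (IoU = intersection / union, Dice = 2 × intersection / sum of areas). It is not the code in src/training/metrics.py: `iou_and_dice` is a hypothetical name, and returning 1.0 when both masks are empty is an assumed convention.

```python
def iou_and_dice(pred, target, cls=1):
    """Compute IoU and Dice for one class from two equally sized 2-D label grids."""
    inter = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_area += p == cls
            target_area += t == cls
            inter += (p == cls) and (t == cls)
    union = pred_area + target_area - inter
    if union == 0:
        return 1.0, 1.0  # assumed convention: two empty masks count as a perfect match
    return inter / union, 2 * inter / (pred_area + target_area)
```

Dice is never lower than IoU for the same prediction, so val/dice_class_1 will usually read higher than the pool-class IoU.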
Training outputs
Each training run creates a timestamped directory under runs/. Here's what
ends up on disk:
runs/
└── 20260621_143052/ # auto-generated experiment name (timestamp)
├── tensorboard/ # TensorBoard event files
│ └── events.out.tfevents...
└── checkpoints/
├── best.pth # checkpoint with highest val/iou_mean
├── epoch_0005.pth # periodic snapshot (every save_every epochs)
├── epoch_0010.pth
├── ...
└── last.pth # checkpoint from the final epoch
Checkpoint contents — each .pth file is a standard PyTorch checkpoint
dictionary:
| Key | Contents |
|---|---|
| epoch | Integer: which epoch this was saved from |
| model_state_dict | Model weights (loadable with load_state_dict) |
| optimizer_state_dict | Optimizer state (for resuming training) |
| metrics | Dict with val/loss, val/iou_mean, etc. |
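A quick way to sanity-check a checkpoint is to summarize the keys listed above. The sketch below works on the loaded dictionary and assumes only those four keys; `describe_checkpoint` is a hypothetical helper, not a function in the repository.

```python
def describe_checkpoint(checkpoint):
    """Summarize a checkpoint dict using the documented keys."""
    metrics = checkpoint.get("metrics", {})
    return {
        "epoch": checkpoint["epoch"],
        "num_tensors": len(checkpoint["model_state_dict"]),
        "has_optimizer": "optimizer_state_dict" in checkpoint,
        "val_iou_mean": metrics.get("val/iou_mean"),
    }
```

With PyTorch installed, the dictionary would typically come from `torch.load(path, map_location="cpu")`, which loads the file without requiring a GPU.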
Which checkpoint should I use?
- best.pth: use this for inference / predictions. It's the model with the highest validation IoU across all epochs.
- last.pth: the final model state. Useful if you want to resume training later.
- epoch_NNNN.pth: periodic snapshots. Handy if you notice the model started overfitting and want to pick an earlier epoch.
Inference
# Single image
python -m src.inference.predict \
--checkpoint runs/run_name/checkpoints/best.pth \
--image data/images/tile_042.png \
--output predictions/
# All images in a directory
python -m src.inference.predict \
--checkpoint runs/run_name/checkpoints/best.pth \
--dir data/images/ \
--output predictions/
# On GPU
python -m src.inference.predict \
--checkpoint runs/run_name/checkpoints/best.pth \
--image tile.png --device cuda
Output masks are written as grayscale PNGs where white (255) = predicted pool.
Using the model with Label Studio (pre-annotation)
Once you have a trained model, you can use it to pre-annotate tasks in Label Studio. The model takes a first pass at detecting pools on every image; annotators then review and fix the predictions instead of drawing every pool from scratch. This can substantially increase labeling throughput.
How it works
The script scripts/predict_label_studio.py:
- Connects to your Label Studio instance via the SDK.
- Lists all tasks in the target project.
- Downloads each image, runs the model, and converts the output mask into Label Studio's RLE brush-label format.
- Pushes the mask as a prediction on the task. When an annotator opens the task, the prediction appears as a pre-filled brush region — they can accept it, adjust the boundaries, or erase it entirely.
Quick start
# Install the SDK (one-time)
pip install label-studio-sdk
# Set credentials
export LABEL_STUDIO_URL=https://labelstudio.example.com
export LABEL_STUDIO_API_KEY=your_access_token_here
# Push predictions for all tasks in project 1
python scripts/predict_label_studio.py \
--checkpoint runs/20260621_143052/checkpoints/best.pth \
--project-id 1
Common workflows
Dry-run first — simulate without uploading to confirm the model produces sane output on your imagery:
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --dry-run
Target specific tasks — only pre-annotate tasks 42, 43, and 44:
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --task-ids 42 43 44
Skip already-predicted tasks — safe to re-run after adding new images to the project; tasks that already have predictions are left alone:
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --skip-existing
Use GPU for faster inference on projects with many images:
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --device cuda
Tune batch processing for higher throughput on large projects:
# Increase GPU batch size (default 8 — try 16 or 24 on GPUs with ≥16 GB VRAM)
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --device cuda \
--inference-batch-size 16
# Speed up task listing with page size (default 100)
# (Not exposed as a flag — edit scripts/predict_label_studio.py if needed)
The pipeline processes images in batches for maximum GPU utilisation:
- Downloads a batch of images from S3 in parallel (up to 8 concurrent requests)
- Runs GPU inference on the entire batch at once
- Uploads predictions to Label Studio in parallel (up to 8 concurrent requests)
This keeps the GPU busy and minimises idle time waiting for I/O. Progress is reported every 10 seconds with tasks/second, success/error counts, and ETA.
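The download, infer, upload pattern described above can be sketched with a thread pool. This is an illustration under assumptions, not the script's implementation: `run_batches` and its `download`, `infer_batch`, and `upload` callables are hypothetical, and unlike a fully pipelined version this sketch does not overlap the next batch's downloads with the current inference.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batches(tasks, download, infer_batch, upload, batch_size=8, io_workers=8):
    """Download in parallel, infer one batch at a time, then upload in parallel."""
    uploaded = 0
    with ThreadPoolExecutor(max_workers=io_workers) as pool:
        for start in range(0, len(tasks), batch_size):
            batch = tasks[start:start + batch_size]
            images = list(pool.map(download, batch))  # parallel I/O, order preserved
            masks = infer_batch(images)               # one model call per batch
            # pool.map pairs each task with its mask and uploads concurrently.
            uploaded += sum(1 for _ in pool.map(upload, batch, masks))
    return uploaded
```

Threads suit this workload because downloads and uploads spend most of their time waiting on the network rather than executing Python code.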
Task list caching
The task list is cached locally to avoid re-downloading it on every run
(default TTL: 60 minutes). The cache lives at
.cache/tasks_<project_id>.json.
Important: If predictions are created (by any process) while a cached task list is still valid, a subsequent run will use the stale cache and may skip the anti-duplicate check, potentially creating duplicate predictions on tasks that received predictions after the cache was written. To force a fresh task list:
# Delete the cache file before running
rm .cache/tasks_1.json
# Or disable the cache entirely (always re-fetches)
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --device cuda \
--cache-ttl-min 0
# Or extend TTL for long-running production runs
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --device cuda \
--cache-ttl-min 1440 # 24 hours
Image downloads from S3 are never cached — only the task metadata list is.
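A time-to-live check of the kind described in this section can be sketched in a few lines. The freshness test here is based on the cache file's modification time, which is an assumption about the approach; `is_cache_fresh` is a hypothetical helper and not the script's own function.

```python
import time
from pathlib import Path


def is_cache_fresh(cache_path, ttl_min, now=None):
    """Return True if the cache file exists and is younger than ttl_min minutes."""
    path = Path(cache_path)
    if ttl_min <= 0 or not path.exists():
        return False  # a TTL of 0 disables the cache entirely
    now = time.time() if now is None else now
    age_seconds = now - path.stat().st_mtime
    return age_seconds < ttl_min * 60
```

A check like this only knows how old the file is, not whether the project changed in the meantime, which is why a stale-but-valid cache can miss newly created predictions.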
Use a different label name if your labeling config doesn't use "Swimming Pool":
python scripts/predict_label_studio.py \
--checkpoint runs/run_name/checkpoints/best.pth \
--project-id 1 --label-name "Pool" \
--from-name "labels" --to-name "image"
What annotators see
When a task has a prediction, Label Studio displays the pre-filled mask alongside the image. Annotators can:
- Press Ctrl+Enter to accept the prediction as-is (it becomes the annotation).
- Use the brush/eraser tools to refine the mask.
- Delete the prediction and draw from scratch if the model was wrong.
Iterative improvement (active learning)
This workflow works particularly well in a loop:
- Label a small initial set of images in Label Studio (50–100 tiles).
- Export the annotations and train a first model (following the sections above).
- Pre-annotate the remaining unlabeled tiles with the model.
- Review and correct the predictions — much faster than labeling from scratch.
- Retrain with the expanded dataset (original labels + corrected predictions).
- Repeat until the model is good enough that you only need to spot-check.
Each iteration should improve the model, which in turn produces better pre-annotations, which makes each labeling pass faster.
Labeling config requirements
Your Label Studio project must use a Brush with nested Labels tag (semantic segmentation with a brush tool). The default tag names match the setup described in Exporting from Label Studio:
<View>
<Image name="image" value="$image"/>
<Brush name="pool" toName="image">
<Labels name="labels" toName="image">
<Label value="Swimming Pool" />
</Labels>
</Brush>
</View>
If your config uses different tag names, pass --from-name, --to-name, and
--label-name to match.
Multi-model training
The project supports multiple model architectures registered through a ModelSpec
abstraction. Two families are included:
| Model Type | Architecture | Task | Output |
|---|---|---|---|
| segmentation | SMP UNet / DeepLabV3+ / FPN / MANet | Semantic segmentation | Per-pixel class mask |
| detection | Faster R-CNN (ResNet-50 FPN) | Object detection | Bounding boxes + scores |
Training different model types
# Train a segmentation model (default)
python scripts/train.py --config configs/default.yaml
# Train a detection model (bboxes derived from masks automatically)
python scripts/train.py --config configs/default.yaml \
model.model_type=detection \
training.batch_size=4 \
training.learning_rate=0.0005
Both models use the same data/ directory layout (images + masks). The detection
model derives bounding boxes from masks via connected-component labeling — no
separate annotation format needed.
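Deriving boxes from a mask by connected-component labeling can be illustrated with a small flood fill. The sketch below is pure Python for clarity and is not the code in src/data/dataset.py: `mask_to_boxes` is a hypothetical name, 4-connectivity is an assumed choice, and the boxes use inclusive pixel indices.

```python
from collections import deque


def mask_to_boxes(mask, cls=1):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected blob of cls."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != cls or seen[y0][x0]:
                continue
            # Flood-fill this blob, tracking its extent as we go.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] == cls and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

Each separate pool in a mask becomes its own box, so two pools that touch in the mask would be merged into a single detection target under this scheme.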
Analysing specific Label Studio tasks
Download images from specific LS tasks, run inference with multiple models, and save predictions for visual comparison:
# Credentials (all support environment variables)
export LABEL_STUDIO_URL=https://labelstudio.example.com
export LABEL_STUDIO_API_KEY=your_token
export S3_ENDPOINT=https://s3.example.com
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export S3_REGION=garage # or us-east-1 for AWS
python scripts/analyze_tasks.py \
--project-id 8 \
--task-ids 122497 122498 122499 \
--checkpoints runs/seg/best.pth runs/det/best.pth \
--labels "Segmentation" "Detection" \
--output analysis/run1 \
--include-ground-truth \
--device cuda
The output directory layout:
analysis/run1/
├── images/ # source images downloaded from S3
│ ├── task_122497.png
│ └── ...
├── masks/ # ground truth masks (if --include-ground-truth)
│ ├── task_122497.png
│ └── ...
├── Segmentation/ # predictions from model A
│ ├── task_122497_mask.png
│ └── ...
├── Detection/ # predictions from model B
│ ├── task_122497_mask.png
│ └── ...
└── models.json # metadata about the models used
Interactive comparison viewer
python scripts/visualize_data.py --comparison-dir analysis/run1/
# Open http://localhost:8080 in a browser
# Remote server: ssh -L 8080:localhost:8080 your-server
The comparison viewer shows the source image, ground truth mask, and one colour-coded prediction panel per model. Use the left/right arrow keys or the on-screen buttons to navigate.
Static comparison report
For batch metrics across an entire data directory:
python scripts/compare_models.py \
--checkpoints runs/seg/best.pth runs/det/best.pth \
--labels "UNet" "Faster-RCNN" \
--data-dir data/ \
--output comparisons/run1 \
--max-samples 50 \
--device cuda
Generates per-image visualizations in comparisons/run1/visualizations/ and
a metrics summary JSON at comparisons/run1/metrics.json.
Archiving and reusing models
Once you have a well-performing model, you can package it into a portable archive for transfer between inference systems or long-term storage.
Archive a run
# Package a training run (best + last checkpoints, tensorboard, summary)
python scripts/archive_run.py runs/stadia_seg_v1
# Custom output path
python scripts/archive_run.py runs/stadia_seg_v1 -o models/pool-detector-v2.tar.gz
The archive contains only the essential outputs — no input data or intermediate epoch snapshots:
stadia_seg_v1/
├── checkpoints/
│ ├── best.pth # best validation checkpoint
│ └── last.pth # final epoch (resume-capable)
├── tensorboard/ # metric history
│ └── events.out.tfevents...
└── summary.json # metrics, model config, parameter count
Inspect an archive
python scripts/archive_run.py --list models/stadia_seg_v1.tar.gz
Prints model type, best metrics, parameter count, and file listing.
Transfer and reuse
Copy the archive to another system and extract:
scp models/stadia_seg_v1.tar.gz gpu-server:/home/user/models/
ssh gpu-server
tar xzf models/stadia_seg_v1.tar.gz
The extracted directory works directly with inference and Label Studio scripts:
# Single-image inference
python -m src.inference.predict \
--checkpoint stadia_seg_v1/checkpoints/best.pth \
--image tile.png --device cuda
# Label Studio pre-annotation
python scripts/predict_label_studio.py \
--checkpoint stadia_seg_v1/checkpoints/best.pth \
--project-id 1 --device cuda
# Resume training from the last checkpoint
python scripts/train.py --config configs/default.yaml \
training.resume_from=stadia_seg_v1/checkpoints/last.pth
The checkpoint format is self-contained — it embeds all model type and architecture metadata, so no config file is needed to reconstruct the model for inference.
Project structure
src/
├── data/
│ ├── dataset.py # PoolDataset + DataLoader builder + bbox utilities
│ └── augmentations.py # Shared augmentation pipelines
├── models/
│ ├── spec.py # ModelSpec base class / protocol
│ ├── registry.py # Global model registry (lazy discovery)
│ ├── segmentation.py # SegmentationSpec (SMP UNet/DeepLabV3+/FPN/MANet)
│ ├── detection.py # DetectionSpec (Faster R-CNN)
│ └── factory.py # Low-level create_model() + count_parameters()
├── training/
│ ├── trainer.py # Model-type-agnostic training loop
│ ├── metrics.py # IoU, Dice, pixel accuracy, box IoU/F1
│ └── losses.py # DiceLoss, CombinedLoss (CE + Dice)
├── inference/
│ └── predict.py # CLI for running a trained model on new images
└── utils/
└── config.py # Typed configuration dataclass
configs/
└── default.yaml # All hyperparameters in one place
scripts/
├── train.py # Training entry point
├── archive_run.py # Package training runs into portable archives
├── export_label_studio_sdk.py # Export Label Studio project via SDK
├── fast_export_ls.py # Fast export: fetch annotated tasks via API
├── convert_label_studio.py # Label Studio JSON → training data
├── stats_label_studio.py # Print high-level stats from an export
├── predict_label_studio.py # Push model predictions onto Label Studio tasks
├── analyze_tasks.py # Download LS tasks & run inference with multiple models
├── compare_models.py # Multi-model comparison: metrics + visualizations
└── visualize_data.py # Web server for visually inspecting training data & predictions
notebooks/
└── 01_explore_data.ipynb # Interactive data exploration
tests/
└── test_metrics.py # Unit tests for metrics and losses
Adding a GPU
When you have access to an NVIDIA GPU:
- In flake.nix, change the torch line:
# From:
torch = pythonPackages.torch;
# To:
pythonPackages = pkgs.python3Packages; # or pkgs.cudaPackages.python3Packages
torch = pythonPackages.torchWithCuda;
Then run nix develop again.
- In configs/default.yaml, set:
training:
  use_amp: true
  device: auto
- Bump up the batch size. The GPU can handle larger batches.
Common issues
"Floating point exception" or "Illegal instruction" when importing torchvision/rasterio
→ Your CPU lacks AVX2 instructions. The pre-built Nix binaries are compiled for newer
CPUs. Rebuild locally with nix develop --option builders '' (slow, but works) or
move to a GPU server where the binaries match the hardware.
ModuleNotFoundError: segmentation_models_pytorch
→ Run pip install segmentation-models-pytorch inside the nix shell.
It installs to .venv/ which is auto-added to PYTHONPATH.
CUDA out of memory
→ Reduce training.batch_size or data.image_size. Start with batch_size: 4, image_size: 224.
"unable to allocate shared memory" when running in Docker / Podman
→ PyTorch DataLoader workers use /dev/shm for inter-process communication,
which defaults to 64 MB inside containers. Add --shm-size=2g to your
docker run / podman run command, or set training.num_workers: 0 in
the config.