scientific software development for NIWA forecasting operations

an overview of what i worked on at earth sciences new zealand (formerly niwa)

Summary

I built scientific Python software for operational meteorology during my industry R&D internship at NIWA Forecasting Services. The work covered ensemble forecast ingestion, data visualisation, legacy code migration, documentation infrastructure, and exploratory machine learning, all within a real forecasting environment where correctness and reproducibility matter. The source code is not public, but I'm happy to discuss the work in detail.

Project Metadata

Organisation: NIWA / Earth Sciences New Zealand
Role: R&D Intern (Scientific Software Development)
Domain: Operational meteorology / numerical weather prediction
Stack: Python · xarray · cfgrib · pandas · NumPy · scikit-learn · matplotlib · cartopy · Jupyter · Pixi · MkDocs · Git / GitLab · cron · HPC Linux
Timeline: 2025
Project type: Industry internship / scientific software

Background: Why This Problem Matters

Operational meteorology is time-sensitive. Forecast teams work on fixed cycles — model runs drop at set hours, products must be published, and downstream systems depend on consistent outputs. The software supporting this can't just work in a notebook: it needs to run reliably, be maintained by people who aren't always software engineers, and produce outputs that forecasters can trust.

NIWA's workflows relied on large numerical weather prediction datasets (in formats such as GRIB and NetCDF), HPC infrastructure, and tooling built up over many years. Some of that tooling was written in languages like NCL that have since fallen out of mainstream use. Operational software lives in the gap between "code that produces a result" and "code that can be trusted to produce results repeatedly".

A few things worth understanding before working in this domain:

ECMWF's Open Data API structure and what fields it exposes
How cfgrib maps GRIB keys to xarray dimensions (it has quirks)
How ensemble spread communicates forecast uncertainty to operational forecasters
How Pixi manages reproducible Python environments across machines

My Role

I was the sole intern on this project. I was responsible for prototyping and implementing software across four main workstreams (data pipeline development, visualisation, legacy code migration, and documentation), alongside an exploratory machine learning analysis. I owned the technical implementation end-to-end within each workstream. The meteorology and software teams provided guidance on domain requirements and operational constraints.

Requirements and Constraints

All software needed to run unattended on Linux HPC infrastructure (scheduled with cron during prototyping, with Cylc as the operational target)
Environments had to be fully reproducible across developer machines and HPC nodes
GRIB decoding needed to fail loudly on unexpected fields rather than silently producing wrong output
Outputs needed to be interpretable by forecasters, not just developers
The migration of NCL code had to preserve coordinate handling semantics that were implicit in the original
Documentation had to remain useful after the person who wrote the code was no longer around

Development Environment

The work ran on NIWA's Linux HPC environment, with local development on a standard machine. Python environments were managed with Pixi — a conda-compatible package manager that pins both Python packages and system-level dependencies — ensuring the environment was portable and reproducible without "works on my machine" problems.

[project]
name = "niwa-ens"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
python = ">=3.11"
xarray = "*"
cfgrib = "*"
eccodes = "*"
cartopy = "*"

Git and GitLab were used for version control and code review. Jupyter notebooks were used for exploratory analysis before code was promoted to scripts.

Architecture and Approach

The work wasn't a single system — it was a set of discrete scientific software components: a data ingestion pipeline, visualisation utilities, a clustering analysis, a migration of legacy plotting code, and a documentation framework. Each component was designed to be maintainable by the meteorology team, not just by the person who wrote it.

The pipeline architecture followed a standard ETL shape: download ECMWF Open Data fields, decode GRIB to xarray datasets, apply domain-specific transformations, and stage outputs for downstream processing or visualisation. Cylc was identified as the operational workflow manager target for eventual handoff.
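The download, decode, transform, and stage flow can be sketched as a small class whose stages are injected as plain functions. This is a minimal illustration, not the NIWA code. The `Pipeline` and `StageError` names, the stage signatures, and the cycle string are all assumptions. Wrapping each stage means that a failure in an unattended run is reported together with the stage and cycle that broke.

```python
from dataclasses import dataclass
from typing import Any, Callable


class StageError(RuntimeError):
    """A pipeline stage failed; the message names the stage and forecast cycle."""


def _call(name: str, cycle: str, func: Callable[..., Any], *args: Any) -> Any:
    try:
        return func(*args)
    except Exception as exc:
        # Re-raise with context so an unattended run's log says which stage broke
        raise StageError(f"{name} failed for cycle {cycle}: {exc}") from exc


@dataclass
class Pipeline:
    # Stages are injected as functions so each one can be tested on its own.
    download: Callable[[str], str]     # cycle -> local GRIB path
    decode: Callable[[str], Any]       # GRIB path -> e.g. an xarray Dataset
    transform: Callable[[Any], Any]    # domain-specific processing
    stage: Callable[[Any, str], None]  # write the result for downstream use

    def run(self, cycle: str) -> Any:
        path = _call("download", cycle, self.download, cycle)
        data = _call("decode", cycle, self.decode, path)
        result = _call("transform", cycle, self.transform, data)
        _call("stage", cycle, self.stage, result, cycle)
        return result
```

Keeping the stages as separate functions also makes a later move to Cylc simpler, because each stage can become its own task without rewriting the logic.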

Engineering Process

The early weeks were orientation: understanding forecast cycle timing, how ECMWF Open Data is structured, and how cfgrib decodes GRIB to xarray. From there, work proceeded iteratively within each workstream — prototype in Jupyter, identify edge cases (especially around GRIB field quirks and coordinate handling), harden into scripts, and document.

Code was reviewed and iterated on with the team. The migration work required close review of existing NCL scripts to surface implicit assumptions before translating them. A wrong assumption about coordinate ordering would produce output that looks plausible but is incorrect.

The Work

ECMWF ensemble data ingestion

The main data pipeline downloaded ECMWF Open Data ensemble forecast fields, decoded GRIB to xarray datasets, and staged outputs for downstream processing.

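import xarray as xr


def load_ensemble_field(path: str, variable: str) -> xr.Dataset:
    """Open one GRIB variable (by ecCodes shortName) as an xarray Dataset.

    Requires cfgrib to be installed; xarray discovers the "cfgrib" engine itself.
    """
    ds = xr.open_dataset(
        path,
        engine="cfgrib",
        # Only decode messages whose shortName matches, e.g. "tp" or "2t"
        backend_kwargs={"filter_by_keys": {"shortName": variable}},
    )
    # Fail loudly if the filter matched nothing, rather than returning an empty Dataset
    if not ds.data_vars:
        raise ValueError(f"no GRIB messages with shortName={variable!r} in {path}")
    return ds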

GRIB decoding has quirks. Fields with different grid definitions or ensemble numbering conventions cannot be merged into a single hypercube. In that case cfgrib either raises an error or, through cfgrib.open_datasets, returns them as separate datasets. Handling this cleanly required understanding how cfgrib maps GRIB keys to xarray dimensions. It also meant failing loudly on unexpected inputs rather than silently producing wrong output. The pipeline was prototyped with cron scheduling, with Cylc identified as the operational target.
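One concrete case of the ensemble numbering quirk involves the two message types in ECMWF ensemble GRIB files. The control forecast (`dataType` `cf`) and the perturbed members (`pf`) are stored separately, so they are typically opened with separate `filter_by_keys` filters. cfgrib generally exposes the control run's `number` as a scalar coordinate rather than as a dimension. The sketch below assumes both parts are already loaded as xarray Datasets on the same grid with the same variables. It stacks them along `number` and raises if member numbers collide. The function name is hypothetical.

```python
import numpy as np
import xarray as xr


def combine_ensemble(control: xr.Dataset, perturbed: xr.Dataset) -> xr.Dataset:
    """Stack the control member with the perturbed members along "number"."""
    if "number" not in control.coords or "number" not in perturbed.dims:
        raise ValueError("control needs a 'number' coordinate and perturbed a 'number' dimension")
    if "number" not in control.dims:
        # Promote the control run's scalar `number` (usually 0) to a length-1 dimension
        control = control.expand_dims("number")
    combined = xr.concat([control, perturbed], dim="number")
    numbers = combined["number"].values
    # Fail loudly on duplicated member numbers rather than carrying them into statistics
    if np.unique(numbers).size != numbers.size:
        raise ValueError(f"duplicate ensemble member numbers: {numbers.tolist()}")
    return combined.sortby("number")
```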

Forecast uncertainty visualisation

A key output was a set of prototype heatmap visualisations for multi-model ensemble interpretation. They helped forecasters understand the spread across ensemble members rather than relying on a single deterministic run.

[placeholder — ensemble heatmap prototype]

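import matplotlib
matplotlib.use("Agg")  # non-interactive backend for unattended cron/HPC runs
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# ds: ensemble Dataset (e.g. from load_ensemble_field) for one forecast step,
# so tp has dims (number, latitude, longitude); spread = std across members
spread = ds["tp"].std(dim="number")

fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()})
filled = ax.contourf(ds.longitude, ds.latitude, spread, transform=ccrs.PlateCarree())
ax.coastlines()
fig.colorbar(filled, ax=ax, label="tp ensemble spread (m)")
fig.savefig("tp_spread.png", dpi=150)  # illustrative output path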

The visualisation work used matplotlib and cartopy for map projections, with xarray handling dimensional selection and reduction.
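A single member says nothing about uncertainty. The usual reductions for an ensemble are:

- the mean
- the spread (standard deviation across members)
- the probability of exceeding a threshold, estimated as the fraction of members above it

Below is a minimal xarray sketch of these reductions. It assumes the members sit on a `number` dimension (as cfgrib names it) and that the threshold is given in the field's own units (ECMWF `tp` is accumulated in metres). `ensemble_summary` is an illustrative name, not taken from the original code.

```python
import xarray as xr


def ensemble_summary(field: xr.DataArray, threshold: float) -> xr.Dataset:
    """Reduce an ensemble field over its "number" (member) dimension."""
    if "number" not in field.dims:
        raise ValueError("expected an ensemble field with a 'number' dimension")
    return xr.Dataset(
        {
            "mean": field.mean(dim="number"),
            # Population standard deviation (xarray's default ddof=0)
            "spread": field.std(dim="number"),
            # Fraction of members above the threshold, i.e. a probability in [0, 1]
            "p_exceed": (field > threshold).mean(dim="number"),
        }
    )
```

Each variable in the result is a 2D map on a lat/lon grid, so any of them can be passed to the same `contourf` call used for a single member.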

Wellington weather regime clustering

The exploratory ML component applied k-means clustering to historical Wellington observations, investigating whether recurring local weather regimes could be identified from observational data.

[placeholder — clustering output]

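from sklearn.cluster import KMeans

# features: 2D array (n_observations, n_variables) built from historical
# Wellington observations; each row is one observation time
kmeans = KMeans(n_clusters=5, random_state=42)
labels = kmeans.fit_predict(features)  # one regime label per observation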

Whether the clusters were meteorologically meaningful required interpretation alongside the meteorology team — the software produced the output, but domain knowledge shaped what the output meant.
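Two things tend to help when handing k-means output to meteorologists:

- Standardise the variables first. k-means works on Euclidean distance, so variables with larger numeric ranges would otherwise dominate.
- Report the centroids back in physical units, for example the mean wind speed and pressure of each regime.

The sketch below does both and picks the number of clusters by silhouette score. It is an assumption about how such an analysis could be set up, not a description of the original, which used a fixed `n_clusters=5`. The `fit_regimes` name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def fit_regimes(features: np.ndarray, k_values=range(2, 9), seed: int = 42):
    """Choose k by silhouette score; return (k, labels, centroids in original units)."""
    scaler = StandardScaler()
    scaled = scaler.fit_transform(features)
    best = None
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled)
        score = silhouette_score(scaled, model.labels_)
        if best is None or score > best[0]:
            best = (score, k, model)
    _, k, model = best
    # Undo the scaling so each centroid is expressed in physical units
    centroids = scaler.inverse_transform(model.cluster_centers_)
    return k, model.labels_, centroids
```

A silhouette score only measures how well separated the clusters are geometrically. It does not replace the meteorological interpretation described above.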

NCL to Python migration

A portion of the work translated NCAR Command Language (NCL) visualisation scripts into Python. NCL is no longer actively developed, and the team wanted its plotting routines in a language that future developers could maintain.

The migration was mostly straightforward, but NCL's implicit coordinate handling required care — cartopy and xarray make coordinate systems explicit, which meant surfacing assumptions that had been implicit in the original code and deciding how to handle them correctly rather than just carrying them forward.
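Latitude/longitude layout is a typical convention that has to be made explicit. Global GRIB grids are often stored north-to-south, and longitudes may arrive as -180 to 180 or 0 to 360 depending on the product. On a descending latitude axis, `sel(latitude=slice(-50, -30))` silently returns an empty selection. The hypothetical helper below is not the migrated code. It wraps longitudes into [0, 360) so that a New Zealand domain, which crosses 180° once the Chatham Islands are included, stays contiguous. It also sorts both axes ascending so that label-based slicing behaves the same for every input. For global plots, cartopy's `cartopy.util.add_cyclic_point` covers what NCL's `gsnAddCyclic` resource did by default.

```python
import xarray as xr


def normalise_latlon(ds: xr.Dataset) -> xr.Dataset:
    """Make coordinate conventions explicit: longitudes in [0, 360), both axes ascending."""
    # Wrap longitudes so a domain straddling 180 degrees is contiguous
    ds = ds.assign_coords(longitude=(ds["longitude"] % 360))
    # Sort north-to-south grids (and wrapped longitudes) into ascending order
    return ds.sortby(["latitude", "longitude"])
```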

MkDocs documentation framework

I built a standardised MkDocs framework for Python projects across the team — intended to make scientific software understandable to both developers and meteorologists who hadn't written the original code.

The structure followed a consistent pattern: project overview, installation, API reference generated from docstrings via mkdocstrings, and usage examples. The goal was documentation that would still be useful when the person who wrote the code was no longer around.
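mkdocstrings' Python handler builds API pages from docstrings in Google, NumPy, or Sphinx style, selected with its `docstring_style` option. A Markdown page pulls a module in with a `::: package.module` line. Under that setup, a docstring aimed at both developers and meteorologists might look like the sketch below. The function is a hypothetical NumPy-style example. It records the physical convention (ECMWF `tp` is accumulated from the start of the forecast, in metres) next to the code that depends on it.

```python
def accumulation_to_rate(tp_start: float, tp_end: float, hours: float) -> float:
    """Convert two accumulated precipitation values into a mean rate.

    Parameters
    ----------
    tp_start, tp_end : float
        Total precipitation accumulated since forecast start, in metres,
        at the beginning and end of the window.
    hours : float
        Length of the window in hours.

    Returns
    -------
    float
        Mean precipitation rate over the window, in mm/h.

    Raises
    ------
    ValueError
        If ``hours`` is not positive.

    Examples
    --------
    >>> accumulation_to_rate(0.0, 0.003, 3.0)
    1.0
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    # Difference of accumulations, converted from metres to millimetres
    return (tp_end - tp_start) * 1000.0 / hours
```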

Outcome

A reproducible ECMWF ensemble data pipeline, runnable unattended on HPC, with explicit failure handling for GRIB decoding edge cases
Prototype ensemble spread visualisations ready for operational evaluation
A clustering analysis of Wellington weather regimes with meteorologist-interpretable output
NCL plotting routines translated to maintained Python equivalents
A MkDocs documentation framework standardised across team projects

What I Would Do Differently

The pipeline was prototyped with cron before Cylc was fully scoped. If I were starting again, I'd invest earlier in understanding the Cylc workflow definition format — even at the prototype stage — so that the operational handoff required less rework.

I'd also be more systematic earlier about which GRIB keys needed explicit handling versus which could be left to cfgrib defaults. The edge cases I encountered with grid definitions and ensemble numbering were individually manageable, but a more structured inventory at the start would have saved repeated debugging later.
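A structured inventory could be as simple as a mapping from each expected variable to the dimensions it should decode with, checked against every dataset before it moves further down the pipeline. Below is a minimal sketch in which the inventory contents and the `check_inventory` name are hypothetical. Note that `ds.data_vars` holds cfgrib's variable names, which can differ from the GRIB shortName (for example `t2m` for `2t`).

```python
import xarray as xr


def check_inventory(ds: xr.Dataset, expected: dict[str, tuple[str, ...]]) -> None:
    """Raise ValueError if ds does not match the expected variables and dimensions."""
    found = {name: tuple(da.dims) for name, da in ds.data_vars.items()}
    problems = []
    for name in sorted(expected.keys() - found.keys()):
        problems.append(f"missing variable {name!r}")
    for name in sorted(found.keys() - expected.keys()):
        problems.append(f"unexpected variable {name!r}")
    for name in sorted(expected.keys() & found.keys()):
        if found[name] != expected[name]:
            problems.append(f"{name!r} has dims {found[name]}, expected {expected[name]}")
    if problems:
        # Report every mismatch at once, so one run surfaces all of them
        raise ValueError("; ".join(problems))
```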