AMOCatlas conversion & compliance checker
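This notebook demonstrates how AMOCatlas converts a dataset to the AC1 format, which follows OceanSITES conventions, and how it checks the result for compliance.
The demo is organised in three steps:
Step 1: Loading and plotting a sample dataset
Step 2: Converting one dataset to a standard format
Step 3: Checking the converted file for compliance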
Note that when you submit a pull request, you should clear all outputs from your Python notebook for a cleaner merge.
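Outputs can be cleared from the Jupyter interface or with `jupyter nbconvert --clear-output --inplace` followed by the notebook path. As a minimal sketch of what clearing amounts to, assuming the standard nbformat 4 JSON layout, the hypothetical helper `clear_outputs` below empties the outputs and execution counts of every code cell in a notebook dictionary:

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy of JSON-compatible data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Tiny in-memory notebook, used only for illustration
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        },
    ],
}

cleaned = clear_outputs(example)
print(cleaned["cells"][1]["outputs"])
```

The helper works on a copy, so the original dictionary is left untouched; writing the result back to disk with `json.dump` is left out of the sketch.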
[1]:
import pathlib
import sys
script_dir = pathlib.Path().parent.absolute()
parent_dir = script_dir.parents[0]
sys.path.append(str(parent_dir))
import importlib
import xarray as xr
import os
from amocatlas import readers, plotters, standardise, utilities
[2]:
# Specify the path for writing datafiles
data_path = os.path.join(parent_dir, "data")
Step 1: Load RAPID 26°N
[3]:
# Load data from data/moc_transports (Quick start)
ds_rapid = readers.load_sample_dataset()
ds_rapid = standardise.standardise_rapid(ds_rapid, ds_rapid.attrs["source_file"])
# Load data from data/moc_transports (Full dataset)
datasetsRAPID = readers.load_dataset("rapid", transport_only=True)
standardRAPID = [
    standardise.standardise_rapid(ds, ds.attrs["source_file"]) for ds in datasetsRAPID
]
Summary for array 'rapid':
Total datasets loaded: 1
Dataset 1:
Source file: moc_transports.nc
Dimensions:
- time: 14599
Variables:
- t_therm10: shape (14599,)
- t_aiw10: shape (14599,)
- t_ud10: shape (14599,)
- t_ld10: shape (14599,)
- t_bw10: shape (14599,)
- t_gs10: shape (14599,)
- t_ek10: shape (14599,)
- t_umo10: shape (14599,)
- moc_mar_hc10: shape (14599,)
Summary for array 'rapid':
Total datasets loaded: 1
Dataset 1:
Source file: moc_transports.nc
Dimensions:
- time: 14599
Variables:
- t_therm10: shape (14599,)
- t_aiw10: shape (14599,)
- t_ud10: shape (14599,)
- t_ld10: shape (14599,)
- t_bw10: shape (14599,)
- t_gs10: shape (14599,)
- t_ek10: shape (14599,)
- t_umo10: shape (14599,)
- moc_mar_hc10: shape (14599,)
[4]:
# Plot RAPID timeseries
plotters.plot_amoc_timeseries(
    data=[standardRAPID[0]],
    varnames=["moc_mar_hc10"],
    labels=[""],
    resample_monthly=True,
    plot_raw=True,
    title="RAPID 26°N"
)
[4]:
[Figure: "RAPID 26°N"; x-axis: Time; y-axis: Transport [Sv]]
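The `resample_monthly=True` option in the plotting call asks for a monthly series alongside the raw one. How `plotters.plot_amoc_timeseries` computes it is not shown in this notebook; as a rough standard-library sketch of the idea, assuming a simple calendar-month mean (the function name `monthly_mean` and the sample values are made up for illustration):

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def monthly_mean(times, values):
    """Average values by calendar month; returns {(year, month): mean}."""
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        buckets[(t.year, t.month)].append(v)
    return {key: fmean(vals) for key, vals in sorted(buckets.items())}


# Four made-up 12-hourly samples straddling a month boundary
start = datetime(2004, 4, 30, 0, 0)
times = [start + timedelta(hours=12 * i) for i in range(4)]
values = [16.0, 18.0, 20.0, 22.0]  # illustrative transports in Sv

means = monthly_mean(times, values)
print(means)
```

The two April samples and the two May samples are averaged separately, which is the smoothing effect a monthly overlay adds to a 12-hourly series.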
Step 2: Convert to AC1 Format
The next step is to convert the standardised dataset to AC1 format, which follows OceanSITES conventions.
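Note: This cell was written to show a failure caused by the standardise.py step not adding units to the TIME coordinate, which illustrates the architectural principle that convert.py validates units rather than assigning them. In the output recorded below, the conversion itself succeeds; the cell then fails with a KeyError because the converted dataset has no 'suggested_filename' attribute.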
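The principle that a converter validates units rather than assigning them can be sketched in a few lines. The function below is a hypothetical illustration, not the code in convert.py: it works on a plain attribute dictionary and raises an error when units are missing instead of filling in a default.

```python
def require_units(name: str, attrs: dict) -> str:
    """Return the units of a variable, raising if they are missing or empty.

    Hypothetical helper: it never writes a default into ``attrs``.
    """
    units = attrs.get("units", "")
    if not isinstance(units, str) or not units.strip():
        raise ValueError(f"{name} has no 'units' attribute; add it during standardisation")
    return units


# A coordinate without units is rejected rather than silently patched
time_attrs = {"long_name": "time"}
try:
    require_units("TIME", time_attrs)
    outcome = "accepted"
except ValueError as err:
    outcome = f"rejected: {err}"
print(outcome)

# Once units are supplied upstream, the same check passes
time_attrs_fixed = {**time_attrs, "units": "seconds since 1970-01-01T00:00:00Z"}
print(require_units("TIME", time_attrs_fixed))
```

Keeping the check free of side effects means a missing unit surfaces as an error at conversion time, so the fix has to be made where the metadata is defined.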
[5]:
from amocatlas import convert, writers, compliance_checker
# Attempt to convert standardised data to AC1 format
print("🔄 Attempting to convert RAPID data to AC1 format...")
try:
    ac1_datasets = convert.to_AC1(standardRAPID[0])
    ac1_ds = ac1_datasets[0]
    print("✅ Conversion successful!")
    # In the recorded run this line raises KeyError: no 'suggested_filename' attribute
    print(f" Suggested filename: {ac1_ds.attrs['suggested_filename']}")
    print(f" Dimensions: {dict(ac1_ds.sizes)}")
    print(f" Variables: {list(ac1_ds.data_vars.keys())}")
    # Save the dataset
    output_file = os.path.join(data_path, ac1_ds.attrs['suggested_filename'])
    success = writers.save_dataset(ac1_ds, output_file)
    if success:
        print(f"💾 Saved AC1 file: {output_file}")
        # Run compliance check
        print("\n🔍 Running compliance check...")
        result = compliance_checker.validate_ac1_file(output_file)
        print(f"Status: {'✅ PASS' if result.passed else '❌ FAIL'}")
        print(f"Errors: {len(result.errors)}")
        print(f"Warnings: {len(result.warnings)}")
        if result.errors:
            print("\nFirst few errors:")
            for i, error in enumerate(result.errors[:3], 1):
                print(f" {i}. {error}")
except Exception as e:
    print(f"❌ Conversion failed: {e}")
    print("\nThis is expected because standardise.py needs to be updated to provide proper units.")
    print("The convert.py module validates that units are present rather than assigning them.")
🔄 Attempting to convert RAPID data to AC1 format...
✅ Conversion successful!
❌ Conversion failed: 'suggested_filename'

This is expected because standardise.py needs to be updated to provide proper units.
The convert.py module validates that units are present rather than assigning them.
[6]:
plotters.show_attributes(ac1_ds)
information is based on xarray Dataset
[6]:
| | Attribute | Value | DType |
|---|---|---|---|
| 0 | Conventions | CF-1.8, OceanSITES-1.4, ACDD-1.3 | str |
| 1 | format_version | 1.4 | str |
| 2 | data_type | OceanSITES time-series data | str |
| 3 | featureType | timeSeries | str |
| 4 | data_mode | D | str |
| 5 | title | RAPID Atlantic Meridional Overturning Circulat... | str |
| 6 | summary | Component transport time series from the RAPID... | str |
| 7 | source | RAPID moored array observations | str |
| 8 | site_code | RAPID | str |
| 9 | array | RAPID | str |
| 10 | geospatial_lat_min | 26.5 | float |
| 11 | geospatial_lat_max | 26.5 | float |
| 12 | geospatial_lon_min | -79.0 | float |
| 13 | geospatial_lon_max | -13.0 | float |
| 14 | platform_code | RAPID26N | str |
| 15 | time_coverage_start | 20040402T000000 | str |
| 16 | time_coverage_end | 20240327T235959 | str |
| 17 | contributor_name | Ben Moat, Ben Moat | str |
| 18 | contributor_email | ben.moat@noc.ac.uk, ben.moat@noc.ac.uk | str |
| 19 | contributor_id | https://orcid.org/0000-0001-8676-7779, https:/... | str |
| 20 | contributor_role | creator, PI | str |
| 21 | contributing_institutions | National Oceanography Centre (Southampton) (UK) | str |
| 22 | contributing_institutions_vocabulary | https://edmo.seadatanet.org/report/17 | str |
| 23 | contributing_institutions_role | | str |
| 24 | contributing_institutions_role_vocabulary | | str |
| 25 | contributor_role_vocabulary | https://vocab.nerc.ac.uk/collection/W08/current/ | str |
| 26 | source_acknowledgement | Data from the RAPID AMOC observing project is ... | str |
| 27 | license | CC-BY 4.0 | str |
| 28 | doi | doi: 10.5285/3f24651e-2d44-dee3-e063-7086abc0395e | str |
| 29 | date_created | 20251216T150348 | str |
| 30 | processing_level | Data verified against model or other contextua... | str |
| 31 | comment | Converted to AC1 format from moc_transports.nc... | str |
| 32 | naming_authority | AMOCatlas | str |
| 33 | id | OS_RAPID_20040402-20240327_DPR_transports_T12H | str |
| 34 | cdm_data_type | TimeSeries | str |
| 35 | QC_indicator | excellent | str |
| 36 | institution | AMOCatlas Community | str |
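The `time_coverage_start` and `time_coverage_end` values in the table use a compact timestamp layout (`YYYYMMDDThhmmss`). As a small sketch using only the standard library, and assuming the values are in UTC (the table does not state a time zone), they can be parsed and compared like this:

```python
from datetime import datetime, timezone

COMPACT_FORMAT = "%Y%m%dT%H%M%S"  # layout seen in the attribute table


def parse_compact(stamp: str) -> datetime:
    """Parse a compact timestamp such as '20040402T000000' (UTC assumed)."""
    return datetime.strptime(stamp, COMPACT_FORMAT).replace(tzinfo=timezone.utc)


start = parse_compact("20040402T000000")
end = parse_compact("20240327T235959")
coverage = end - start

print(start.isoformat(), end.isoformat())
print(f"Coverage: {coverage.days} days and {coverage.seconds} seconds")
```

Parsing the attributes into `datetime` objects makes it straightforward to confirm that the start precedes the end and to compare the covered period with the length of the time axis.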
Demonstration: Working conversion with manual units fix
To demonstrate what a successful conversion would look like, let’s temporarily fix the TIME units and run the complete workflow:
[7]:
# Temporarily fix the TIME units to demonstrate successful conversion
# (This would normally be done in standardise.py)
demo_ds = standardRAPID[0].copy()
demo_ds['TIME'].attrs['units'] = 'seconds since 1970-01-01T00:00:00Z'
print("🔄 Converting RAPID data to AC1 format (with TIME units fixed)...")
try:
    ac1_datasets = convert.to_AC1(demo_ds)
    ac1_ds = ac1_datasets[0]
    print("✅ Conversion successful!")
    print(f" Suggested filename: {ac1_ds.attrs['id']}.nc")
    print(f" Dimensions: {dict(ac1_ds.sizes)}")
    print(f" Variables: {list(ac1_ds.data_vars.keys())}")
    print(f" TIME units: {ac1_ds.TIME.attrs.get('units')}")
    print(f" TRANSPORT units: {ac1_ds.TRANSPORT.attrs.get('units')}")
    # Inspect the structure
    print("\n📊 Dataset structure:")
    print(f" TRANSPORT shape: {ac1_ds.TRANSPORT.shape}")
    print(f" Component names: {list(ac1_ds.TRANSPORT_NAME.values)}")
    print(f" Global attributes: {len(ac1_ds.attrs)} attributes")
    # Save the dataset using the writers module
    output_file = os.path.join(data_path, ac1_ds.attrs['id'] + ".nc")
    print(f"\n💾 Saving to: {output_file}")
    success = writers.save_dataset(ac1_ds, output_file)
    if success:
        print("✅ Successfully saved AC1 file!")
        # File size check
        file_size = os.path.getsize(output_file)
        print(f" File size: {file_size:,} bytes")
    else:
        print("❌ Failed to save file")
except Exception as e:
    print(f"❌ Conversion failed: {e}")
    import traceback
    traceback.print_exc()
🔄 Converting RAPID data to AC1 format (with TIME units fixed)...
✅ Conversion successful!
Suggested filename: OS_RAPID_20040402-20240327_DPR_transports_T12H.nc
Dimensions: {'TIME': 14599, 'LATITUDE': 1, 'N_COMPONENT': 8}
Variables: ['TRANSPORT', 'MOC_TRANSPORT', 'TRANSPORT_NAME', 'TRANSPORT_DESCRIPTION']
TIME units: seconds since 1970-01-01T00:00:00Z
TRANSPORT units: sverdrup

📊 Dataset structure:
TRANSPORT shape: (8, 14599)
Component names: [np.str_('Florida Straits'), np.str_('Ekman'), np.str_('Upper Mid-Ocean'), np.str_('Thermocline'), np.str_('Intermediate Water'), np.str_('Upper NADW'), np.str_('Lower NADW'), np.str_('AABW')]
Global attributes: 37 attributes

💾 Saving to: /home/runner/work/AMOCatlas/AMOCatlas/data/OS_RAPID_20040402-20240327_DPR_transports_T12H.nc
✅ Successfully saved AC1 file!
File size: 937,465 bytes
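The units string set in this cell, `seconds since 1970-01-01T00:00:00Z`, means that each TIME value counts seconds from the Unix epoch. As a quick standard-library illustration (the sample date is the `time_coverage_start` value from the attribute table; how AMOCatlas encodes TIME on disk is not shown in this notebook):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(moment: datetime) -> float:
    """Encode a timezone-aware datetime as seconds since 1970-01-01T00:00:00Z."""
    return (moment - EPOCH).total_seconds()


def from_epoch_seconds(seconds: float) -> datetime:
    """Decode seconds since the epoch back into a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


first = datetime(2004, 4, 2, tzinfo=timezone.utc)
encoded = to_epoch_seconds(first)
print(encoded)  # 1080864000.0
print(from_epoch_seconds(encoded + 12 * 3600).isoformat())  # one 12-hour step later
```

Stating the reference epoch in the units attribute is what lets a reader of the file turn the stored numbers back into calendar dates without guessing.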
Step 3: Compliance Checking
Run the AC1 compliance checker to validate the converted file against the specification:
[8]:
# Run compliance check on the created file
if 'output_file' in locals() and os.path.exists(output_file):
    print("🔍 Running AC1 compliance check...")
    result = compliance_checker.validate_ac1_file(output_file)
    print("\n📊 Compliance Results:")
    print(f" Status: {'✅ PASS' if result.passed else '❌ FAIL'}")
    print(f" File Type: {result.file_type}")
    print(f" Errors: {len(result.errors)}")
    print(f" Warnings: {len(result.warnings)}")
    if result.errors:
        print(f"\n❌ Errors ({len(result.errors)} total):")
        for i, error in enumerate(result.errors[:5], 1):
            print(f" {i}. {error}")
        if len(result.errors) > 5:
            print(f" ... and {len(result.errors) - 5} more errors")
    if result.warnings:
        print(f"\n⚠️ Warnings ({len(result.warnings)} total):")
        for i, warning in enumerate(result.warnings[:3], 1):
            print(f" {i}. {warning}")
        if len(result.warnings) > 3:
            print(f" ... and {len(result.warnings) - 3} more warnings")
    # Show validation categories
    print("\n🔧 What the compliance checker validates:")
    print(" ✓ Filename pattern (OceanSITES conventions)")
    print(" ✓ Required dimensions and variables")
    print(" ✓ Variable attributes (units, standard_name, vocabulary)")
    print(" ✓ Global attributes (conventions, metadata)")
    print(" ✓ Data value ranges (coordinates, valid_min/max)")
    print(" ✓ CF convention compliance (dimension ordering)")
else:
    print("❌ No AC1 file available for compliance checking")
    print("Please ensure the conversion step above succeeded first.")
🔍 Running AC1 compliance check...

📊 Compliance Results:
Status: ✅ PASS
File Type: component_transports
Errors: 0
Warnings: 0

🔧 What the compliance checker validates:
✓ Filename pattern (OceanSITES conventions)
✓ Required dimensions and variables
✓ Variable attributes (units, standard_name, vocabulary)
✓ Global attributes (conventions, metadata)
✓ Data value ranges (coordinates, valid_min/max)
✓ CF convention compliance (dimension ordering)
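Of the checks listed above, the filename pattern is the easiest to illustrate. The regular expression below is an assumption inferred from the single filename that appears in this notebook, not the rule used by `compliance_checker`; the hypothetical `check_filename` only extracts the date range and confirms that both dates are valid and in order.

```python
import re
from datetime import datetime

# Assumed layout: OS_<site>_<YYYYMMDD>-<YYYYMMDD>_<remaining fields>.nc
FILENAME_RE = re.compile(
    r"^OS_(?P<site>[A-Za-z0-9]+)_(?P<start>\d{8})-(?P<end>\d{8})_(?P<rest>.+)\.nc$"
)


def check_filename(filename: str) -> list[str]:
    """Return a list of problems found in the filename (empty if none)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return ["filename does not match the assumed OS_<site>_<start>-<end>_... layout"]
    try:
        start = datetime.strptime(match["start"], "%Y%m%d")
        end = datetime.strptime(match["end"], "%Y%m%d")
    except ValueError:
        return ["date range contains an invalid calendar date"]
    return [] if start <= end else ["start date is after end date"]


print(check_filename("OS_RAPID_20040402-20240327_DPR_transports_T12H.nc"))
print(check_filename("moc_transports.nc"))
```

Returning a list of messages rather than a boolean mirrors the way the checker output above reports errors and warnings as counts.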