standardise

Functions to apply naming conventions, units, and metadata standards to datasets.

Standardisation functions for AMOC observing array datasets.

These functions take raw loaded datasets and:

  • Rename variables to standard names

  • Add variable-level metadata

  • Add or update global attributes

  • Prepare datasets for downstream analysis

Currently implemented:

  • SAMBA

amocatlas.standardise.clean_metadata(attrs: dict, preferred_keys: dict = None) → dict

Clean up a metadata dictionary.

  • Normalize key casing

  • Merge aliases with identical values

  • Apply standard naming (via preferred_keys mapping)

amocatlas.standardise.get_dynamic_version() → str

Get the actual software version using multiple detection methods.

Priority:

  1. Git describe (for development in a git repo)

  2. Installed package version (for pip/conda installs)

  3. Fallback to the __version__ file

Returns:

Software version string

Return type:

str
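
The priority order above can be sketched as a simple fallback chain. This is an illustrative sketch only, not the library's code: the function name detect_version, the git flags, and the "unknown" fallback string are assumptions, and the git call runs in the current working directory rather than the package directory.

```python
import subprocess
from importlib import metadata


def detect_version(package: str = "amocatlas", fallback: str = "unknown") -> str:
    """Return a version string using a three-step fallback chain (sketch)."""
    # 1. Ask git to describe the current checkout (assumed flags).
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always", "--dirty"],
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
        described = result.stdout.strip()
        if described:
            return described
    except (OSError, subprocess.SubprocessError):
        pass  # git missing, not a repository, or the call timed out

    # 2. Ask the installed distribution metadata (pip/conda installs).
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        pass

    # 3. Static fallback, standing in for a value read from a __version__ file.
    return fallback
```

Each step only runs if the previous one failed, so a development checkout reports its git description while an installed package reports its released version.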

amocatlas.standardise.merge_metadata_aliases(attrs: dict, preferred_keys: dict) → dict

Consolidate and rename metadata keys case‑insensitively (except featureType), using preferred_keys to map aliases to canonical names.

Parameters:
  • attrs (dict) – Metadata dictionary with potential duplicates.

  • preferred_keys (dict) – Mapping of lowercase alias keys to preferred canonical keys.

Returns:

Metadata dictionary with duplicates merged and keys renamed.

Return type:

dict
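
A minimal sketch of this kind of alias merging follows. It is not the library's implementation: the name merge_aliases, the "first value wins" policy for duplicates, and the choice to pass featureType through untouched are assumptions made for illustration.

```python
def merge_aliases(attrs: dict, preferred_keys: dict) -> dict:
    """Merge case-insensitive duplicate keys and rename aliases (sketch)."""
    merged = {}
    seen = {}  # lowercase key -> canonical spelling already stored
    for key, value in attrs.items():
        if key == "featureType":
            # Assumed reading of "except featureType": keep the exact spelling.
            merged.setdefault(key, value)
            continue
        lower = key.lower()
        # Prefer the mapped canonical name, then any spelling seen earlier.
        canonical = preferred_keys.get(lower) or seen.get(lower, key)
        seen.setdefault(canonical.lower(), canonical)
        # Assumed policy: the first value seen for a canonical key is kept.
        merged.setdefault(canonical, value)
    return merged


example = merge_aliases(
    {"Title": "A", "title": "B", "PI": "x", "featureType": "timeSeries"},
    {"pi": "principal_investigator"},
)
```

Here "Title" and "title" collapse into a single entry, and "PI" is renamed through the lowercase alias table.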

amocatlas.standardise.normalize_and_add_vocabulary(attrs: dict, normalizations: dict[str, tuple[dict[str, str], str]]) → dict

For each (attr, (value_map, vocab_url)) in normalizations:

  • If attr exists in attrs:
    • Map attrs[attr] using value_map (or leave it if unmapped)

    • Add attrs[f"{attr}_vocabulary"] = vocab_url

Parameters:
  • attrs (dict) – Metadata attributes, already cleaned & renamed.

  • normalizations (dict) – Keys are canonical attr names (e.g. “platform”), values are (value_map, vocabulary_url) tuples.

Returns:

attrs with normalized values and added <attr>_vocabulary entries.

Return type:

dict
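
The loop described above can be sketched as follows. The function name normalize_with_vocabulary and the example URL are hypothetical, and whether the real function copies attrs or modifies it in place is not stated here; this sketch works on a copy.

```python
def normalize_with_vocabulary(attrs: dict, normalizations: dict) -> dict:
    """Map attribute values and record the vocabulary URL used (sketch)."""
    out = dict(attrs)  # assumption: leave the caller's dict untouched
    for attr, (value_map, vocab_url) in normalizations.items():
        if attr not in out:
            continue  # nothing to normalize, so no vocabulary entry either
        current = out[attr]
        # Unmapped values are kept as they are.
        out[attr] = value_map.get(current, current)
        out[f"{attr}_vocabulary"] = vocab_url
    return out


normalized = normalize_with_vocabulary(
    {"platform": "moorings", "title": "demo"},
    {"platform": ({"moorings": "mooring"}, "https://example.org/vocab/platform")},
)
```

Only attributes that are present gain a matching <attr>_vocabulary entry; absent attributes are skipped entirely.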

amocatlas.standardise.reorder_metadata(attrs: dict) → dict

Return a new dict with keys ordered according to the OG1.0 global‐attribute list. Any attrs not in the spec list are appended at the end, in their original order.
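
As an illustration of this ordering rule, here is a short sketch. The SPEC_ORDER list is a hypothetical stand-in for the OG1.0 global-attribute list, and reorder is an illustrative name, not the library function.

```python
# Hypothetical, shortened stand-in for the OG1.0 global-attribute order.
SPEC_ORDER = ["title", "platform", "id"]


def reorder(attrs: dict, spec_order: list = SPEC_ORDER) -> dict:
    """Return a new dict: spec keys first, remaining keys in original order."""
    ordered = {key: attrs[key] for key in spec_order if key in attrs}
    # Append everything not in the spec list, preserving insertion order.
    ordered.update((k, v) for k, v in attrs.items() if k not in ordered)
    return ordered


reordered = reorder({"z": 1, "id": 2, "title": 3, "a": 4})
```

Because Python dicts preserve insertion order, building the result in two passes is enough to get the documented ordering.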

amocatlas.standardise.resolve_metadata_conflict(key: str, existing_value: str, new_value: str, existing_source: str = 'unknown', new_source: str = 'unknown') → str

Resolve metadata conflicts using consistent logic with detailed warnings.

Resolution rules:

  1. If values are identical, return without warning

  2. If one is empty/whitespace and the other isn't, use the non-empty value

  3. Otherwise, use the longer value and warn about the conflict

Parameters:
  • key (str) – Metadata key name

  • existing_value (str) – Current value

  • new_value (str) – New value attempting to override

  • existing_source (str) – Description of where existing value came from

  • new_source (str) – Description of where new value came from

Returns:

The resolved value to use

Return type:

str
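
The three resolution rules can be sketched as below. This is an assumption-laden illustration: the name resolve_conflict, the use of warnings.warn, the warning text, and keeping the existing value when both have equal length are choices made here, not documented behaviour.

```python
import warnings


def resolve_conflict(
    key: str,
    existing_value: str,
    new_value: str,
    existing_source: str = "unknown",
    new_source: str = "unknown",
) -> str:
    """Pick one of two conflicting metadata values (sketch)."""
    # Rule 1: identical values need no decision and no warning.
    if existing_value == new_value:
        return existing_value

    # Rule 2: a blank value loses to a non-blank one.
    existing_blank = not existing_value.strip()
    new_blank = not new_value.strip()
    if existing_blank != new_blank:
        return new_value if existing_blank else existing_value

    # Rule 3: keep the longer value and report the conflict.
    # Assumption: on equal length the existing value is kept.
    chosen = new_value if len(new_value) > len(existing_value) else existing_value
    warnings.warn(
        f"Conflict for '{key}': {existing_value!r} ({existing_source}) vs "
        f"{new_value!r} ({new_source}); using {chosen!r}"
    )
    return chosen
```

The source labels only appear in the warning text, which is what makes a conflict traceable back to the file or YAML entry that supplied each value.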

amocatlas.standardise.standardise_41n(ds: Dataset, file_name: str) → Dataset

Standardise 41N array dataset to a consistent format.

amocatlas.standardise.standardise_47n(ds: Dataset, file_name: str) → Dataset

Standardise 47N array dataset to a consistent format.

Parameters:
  • ds (xr.Dataset) – Raw 47N array dataset to standardise.

  • file_name (str) – Original filename associated with the dataset, used for metadata.

Returns:

Standardised dataset with consistent metadata and formatting for the 47N array.

Return type:

xr.Dataset

amocatlas.standardise.standardise_arcticgateway(ds: Dataset, file_name: str) → Dataset

Standardise Arctic Gateway array dataset to a consistent format.

amocatlas.standardise.standardise_array(ds: Dataset, file_name: str) → Dataset

Standardise a mooring array dataset using YAML-based metadata.

Deprecated: use standardise_data() instead.

Parameters:
  • ds (xr.Dataset) – Raw dataset loaded from a reader with amocatlas_datasource metadata.

  • file_name (str) – Filename (e.g., 'moc_transports.nc') expected to match ds.attrs["source_file"].

Returns:

Standardised dataset with renamed variables and enriched metadata.

Return type:

xr.Dataset

amocatlas.standardise.standardise_calafat2025(ds: Dataset, file_name: str) → Dataset

Standardise CALAFAT2025 array dataset to a consistent format.

amocatlas.standardise.standardise_data(ds: Dataset, file_name: str) → Dataset

Standardise a dataset using YAML-based metadata.

Parameters:
  • ds (xr.Dataset) – Raw dataset loaded from a reader with amocatlas_datasource metadata.

  • file_name (str) – Filename (e.g., 'moc_transports.nc') expected to match ds.attrs["source_file"].

Returns:

Standardised dataset with renamed variables and enriched metadata.

Return type:

xr.Dataset

Raises:
  • ValueError – If file_name does not match ds.attrs["source_file"].

  • ValueError – If amocatlas_datasource is not found in dataset metadata.
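
The two documented error conditions can be pictured with a small sketch that works on a plain attribute dictionary instead of an xr.Dataset. The helper name check_preconditions, the order of the checks, the error messages, and the assumption that amocatlas_datasource is stored as a global attribute are all illustrative.

```python
def check_preconditions(attrs: dict, file_name: str) -> str:
    """Validate global attributes before standardising (sketch).

    Returns the datasource identifier when both checks pass.
    """
    source_file = attrs.get("source_file")
    if source_file != file_name:
        raise ValueError(
            f"file_name {file_name!r} does not match source_file {source_file!r}"
        )
    # Assumption: the reader stores the datasource id as a global attribute.
    datasource = attrs.get("amocatlas_datasource")
    if not datasource:
        raise ValueError("amocatlas_datasource not found in dataset metadata")
    return datasource


datasource = check_preconditions(
    {"source_file": "moc_transports.nc", "amocatlas_datasource": "rapid"},
    "moc_transports.nc",
)
```

With a real dataset the same checks would read from ds.attrs.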

amocatlas.standardise.standardise_dso(ds: Dataset, file_name: str) → Dataset

Standardise DSO array dataset to a consistent format.

amocatlas.standardise.standardise_fbc(ds: Dataset, file_name: str) → Dataset

Standardise FBC array dataset to a consistent format.

amocatlas.standardise.standardise_fw2015(ds: Dataset, file_name: str) → Dataset

Standardise FW2015 array dataset to a consistent format.

amocatlas.standardise.standardise_mocha(ds: Dataset, file_name: str) → Dataset

Standardise MOCHA array dataset to a consistent format.

amocatlas.standardise.standardise_move(ds: Dataset, file_name: str) → Dataset

Standardise MOVE array dataset to a consistent format.

Parameters:
  • ds (xr.Dataset) – Raw MOVE dataset to standardise.

  • file_name (str) – Original filename for metadata.

Returns:

Standardised dataset with consistent metadata and formatting.

Return type:

xr.Dataset

amocatlas.standardise.standardise_osnap(ds: Dataset, file_name: str) → Dataset

Standardise OSNAP array dataset to a consistent format.

amocatlas.standardise.standardise_rapid(ds: Dataset, file_name: str) → Dataset

Standardise RAPID array dataset to a consistent format.

Deprecated: use standardise_data() instead.

Parameters:
  • ds (xr.Dataset) – Raw RAPID dataset to standardise.

  • file_name (str) – Original filename for metadata.

Returns:

Standardised dataset with consistent metadata and formatting.

Return type:

xr.Dataset

amocatlas.standardise.standardise_samba(ds: Dataset, file_name: str) → Dataset

Standardise SAMBA array dataset to a consistent format.

Deprecated: use standardise_data() instead.

Parameters:
  • ds (xr.Dataset) – Raw SAMBA dataset to standardise.

  • file_name (str) – Original filename for metadata.

Returns:

Standardised dataset with consistent metadata and formatting.

Return type:

xr.Dataset

amocatlas.standardise.standardise_zheng2024(ds: Dataset, file_name: str) → Dataset

Standardise ZHENG2024 array dataset to a consistent format.

amocatlas.standardise.standardize_depth_coordinate(ds: Dataset) → Dataset

Standardize DEPTH coordinate to comply with AMOCatlas specifications.

All datasets with a DEPTH coordinate should have standardized attributes:

  • data type: double

  • long_name: "Depth below surface of the water"

  • standard_name: "depth"

  • units: "meters"

Parameters:

ds (xr.Dataset) – Dataset to standardize DEPTH coordinate for.

Returns:

Dataset with standardized DEPTH coordinate attributes.

Return type:

xr.Dataset

amocatlas.standardise.standardize_latitude_coordinate(ds: Dataset) → Dataset

Standardize LATITUDE coordinate to comply with AMOCatlas specifications.

All datasets with a LATITUDE coordinate should have standardized attributes:

  • data type: double

  • long_name: "Latitude north (WGS84)"

  • standard_name: "latitude"

  • units: "degree_north"

Parameters:

ds (xr.Dataset) – Dataset to standardize LATITUDE coordinate for.

Returns:

Dataset with standardized LATITUDE coordinate attributes.

Return type:

xr.Dataset

amocatlas.standardise.standardize_longitude_coordinate(ds: Dataset) → Dataset

Standardize LONGITUDE coordinate to comply with AMOCatlas specifications.

All datasets with a LONGITUDE coordinate should have standardized attributes:

  • data type: double

  • long_name: "longitude east (WGS84)"

  • standard_name: "longitude"

  • units: "degree_east"

Parameters:

ds (xr.Dataset) – Dataset to standardize LONGITUDE coordinate for.

Returns:

Dataset with standardized LONGITUDE coordinate attributes.

Return type:

xr.Dataset

amocatlas.standardise.standardize_sigma0_coordinate(ds: Dataset) → Dataset

Standardize SIGMA0 coordinate to comply with AMOCatlas specifications.

All datasets with a SIGMA0 coordinate should have standardized attributes:

  • data type: double

  • long_name: "Potential density anomaly to 1000 kg/m3, surface reference"

  • standard_name: "sea_water_sigma_theta"

  • units: "kg m-3"

Parameters:

ds (xr.Dataset) – Dataset to standardize SIGMA0 coordinate for.

Returns:

Dataset with standardized SIGMA0 coordinate attributes.

Return type:

xr.Dataset

amocatlas.standardise.standardize_time_coordinate(ds: Dataset) → Dataset

Standardize TIME coordinate to comply with AMOCatlas specifications.

All datasets with a TIME coordinate should have standardized attributes:

  • data type: datetime64[ns]

  • long_name: "Time elapsed since 1970-01-01T00:00:00Z"

  • standard_name: "time"

  • calendar: "gregorian"

  • units: "seconds since 1970-01-01T00:00:00Z"

  • vocabulary: "http://vocab.nerc.ac.uk/collection/OG1/current/TIME/"

Parameters:

ds (xr.Dataset) – Dataset to standardize TIME coordinate for.

Returns:

Dataset with standardized TIME coordinate attributes.

Return type:

xr.Dataset
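
The attribute values listed for the DEPTH, LATITUDE, LONGITUDE, SIGMA0 and TIME coordinates can be collected into a lookup table and used to check a coordinate's attributes. The sketch below is not part of amocatlas: the names EXPECTED_ATTRS and attribute_mismatches are hypothetical, it works on plain dictionaries rather than xr.Dataset objects, and it does not check data types.

```python
# Expected attributes per coordinate, copied from the descriptions above.
EXPECTED_ATTRS = {
    "DEPTH": {
        "long_name": "Depth below surface of the water",
        "standard_name": "depth",
        "units": "meters",
    },
    "LATITUDE": {
        "long_name": "Latitude north (WGS84)",
        "standard_name": "latitude",
        "units": "degree_north",
    },
    "LONGITUDE": {
        "long_name": "longitude east (WGS84)",
        "standard_name": "longitude",
        "units": "degree_east",
    },
    "SIGMA0": {
        "long_name": "Potential density anomaly to 1000 kg/m3, surface reference",
        "standard_name": "sea_water_sigma_theta",
        "units": "kg m-3",
    },
    "TIME": {
        "long_name": "Time elapsed since 1970-01-01T00:00:00Z",
        "standard_name": "time",
        "calendar": "gregorian",
        "units": "seconds since 1970-01-01T00:00:00Z",
        "vocabulary": "http://vocab.nerc.ac.uk/collection/OG1/current/TIME/",
    },
}


def attribute_mismatches(name: str, attrs: dict) -> dict:
    """Return {attribute: (found, expected)} for every non-matching attribute."""
    expected = EXPECTED_ATTRS[name]
    return {
        key: (attrs.get(key), want)
        for key, want in expected.items()
        if attrs.get(key) != want
    }


problems = attribute_mismatches(
    "DEPTH",
    {
        "long_name": "Depth below surface of the water",
        "standard_name": "depth",
        "units": "m",
    },
)
```

An empty result means the coordinate's attributes already match the listed values; with a real dataset the attrs argument would come from something like ds["DEPTH"].attrs.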

amocatlas.standardise.standardize_units(ds: Dataset) → Dataset

Standardize variable units throughout the dataset.

Uses the comprehensive unit mapping from utilities module.

Parameters:

ds (xr.Dataset) – Dataset to standardize units for.

Returns:

Dataset with standardized variable units.

Return type:

xr.Dataset
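
To illustrate the idea of a unit mapping, the sketch below rewrites the units attribute of each variable using a small alias table. The table is hypothetical (the real mapping lives in the utilities module and is not reproduced here), the function name standardize_unit_attrs is invented for this example, and variables are modelled as plain dictionaries of attributes.

```python
# Hypothetical alias table; target spellings follow the coordinate
# conventions documented on this page ("meters", "degree_north", ...).
UNIT_ALIASES = {
    "m": "meters",
    "metres": "meters",
    "degrees_north": "degree_north",
    "degrees_east": "degree_east",
}


def standardize_unit_attrs(variables: dict) -> dict:
    """Return a copy of {variable: attrs} with known unit aliases replaced."""
    result = {}
    for name, attrs in variables.items():
        new_attrs = dict(attrs)  # do not modify the caller's dictionaries
        unit = new_attrs.get("units")
        if unit in UNIT_ALIASES:
            new_attrs["units"] = UNIT_ALIASES[unit]
        result[name] = new_attrs
    return result


converted = standardize_unit_attrs(
    {
        "DEPTH": {"units": "m"},
        "LATITUDE": {"units": "degrees_north"},
        "MOC": {"units": "Sv"},
        "FLAG": {},
    }
)
```

Units that are not in the table, and variables without a units attribute, pass through unchanged.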