standardise
Functions to apply naming conventions, units, and metadata standards to datasets.
Standardisation functions for AMOC observing array datasets.
These functions take raw loaded datasets and:
- Rename variables to standard names
- Add variable-level metadata
- Add or update global attributes
- Prepare datasets for downstream analysis
Currently implemented:
- SAMBA
- amocatlas.standardise.clean_metadata(attrs: dict, preferred_keys: dict = None) → dict
Clean up a metadata dictionary:
- Normalize key casing
- Merge aliases with identical values
- Apply standard naming (via the preferred_keys mapping)
- amocatlas.standardise.get_dynamic_version() → str
Get the actual software version using multiple detection methods.
Priority:
1. Git describe (for development in a git repo)
2. Installed package version (for pip/conda installs)
3. Fallback to the __version__ file
- Returns:
Software version string
- Return type:
str
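The priority order above amounts to a chain of fallbacks. The sketch below illustrates that pattern under stated assumptions and is not the library's implementation: the `git describe --tags --always` flags, the distribution name `amocatlas`, and the helper names `version_from_git`, `version_from_installed_package` and `first_available_version` are hypothetical, and a plain `fallback` string stands in for reading the `__version__` file.

```python
import subprocess
from importlib import metadata
from typing import Callable, Optional, Sequence


def version_from_git(cwd: Optional[str] = None) -> Optional[str]:
    """Return `git describe` output, or None when git or a repository is unavailable."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always"],  # assumed flags
            capture_output=True,
            text=True,
            check=True,
            cwd=cwd,
        )
    except (OSError, subprocess.CalledProcessError):
        # git not installed (OSError) or not inside a repository (non-zero exit)
        return None
    return result.stdout.strip() or None


def version_from_installed_package(dist_name: str = "amocatlas") -> Optional[str]:
    """Return the installed distribution's version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def first_available_version(
    detectors: Sequence[Callable[[], Optional[str]]], fallback: str
) -> str:
    """Try each detector in priority order and return the first non-empty result."""
    for detect in detectors:
        version = detect()
        if version:
            return version
    # Hypothetical stand-in for the value read from a __version__ file
    return fallback
```

For example, `first_available_version([version_from_git, version_from_installed_package], "0.0.0")` would return the git description when run inside a checkout and the installed package version elsewhere. Keeping each detector as a separate callable makes the priority order explicit and easy to test with stubs.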
- amocatlas.standardise.merge_metadata_aliases(attrs: dict, preferred_keys: dict) → dict
Consolidate and rename metadata keys case-insensitively (except featureType), using preferred_keys to map aliases to canonical names.
- Parameters:
attrs (dict) – Metadata dictionary with potential duplicates.
preferred_keys (dict) – Mapping of lowercase alias keys to preferred canonical keys.
- Returns:
Metadata dictionary with duplicates merged and keys renamed.
- Return type:
dict
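To make the case-insensitive consolidation concrete, here is a minimal sketch on plain dicts. It is a hypothetical re-implementation, not the library's code: the spelling chosen for keys with no preferred name, and which value wins when aliases disagree, are assumptions (here the first spelling and the first value are kept).

```python
def merge_metadata_aliases_sketch(attrs: dict, preferred_keys: dict) -> dict:
    """Hypothetical sketch of case-insensitive alias merging."""
    merged: dict = {}
    first_spelling: dict = {}  # lower-cased key -> spelling used when no preferred name exists
    for key, value in attrs.items():
        if key == "featureType":
            # Documented exception: featureType is left exactly as written.
            canonical = key
        else:
            lowered = key.lower()
            # Prefer the canonical name; otherwise reuse the first spelling seen for this key.
            canonical = preferred_keys.get(lowered) or first_spelling.setdefault(lowered, key)
        # Assumption: when aliases collide, the first value encountered is kept.
        merged.setdefault(canonical, value)
    return merged


attrs = {"Title": "RAPID 26N", "title": "RAPID 26N", "Creator": "Jane Doe", "featureType": "timeSeries"}
preferred = {"title": "title", "creator": "creator_name"}
print(merge_metadata_aliases_sketch(attrs, preferred))
# {'title': 'RAPID 26N', 'creator_name': 'Jane Doe', 'featureType': 'timeSeries'}
```

A real implementation would likely route disagreeing values through a conflict-resolution step rather than silently keeping the first one.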
- amocatlas.standardise.normalize_and_add_vocabulary(attrs: dict, normalizations: dict[str, tuple[dict[str, str], str]]) → dict
For each (attr, (value_map, vocab_url)) in normalizations, if attr exists in attrs:
- Map attrs[attr] using value_map (leaving it unchanged if the value is unmapped)
- Add attrs[f"{attr}_vocabulary"] = vocab_url
- Parameters:
attrs (dict) – Metadata attributes, already cleaned & renamed.
normalizations (dict) – Keys are canonical attr names (e.g. “platform”), values are (value_map, vocabulary_url) tuples.
- Returns:
attrs with normalized values and added <attr>_vocabulary entries.
- Return type:
dict
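The loop described above translates almost line for line into Python. In this sketch the function name is hypothetical, it returns a copy instead of possibly modifying attrs in place, and the value map and `https://example.org/...` URL are placeholders rather than real vocabulary entries.

```python
def normalize_and_add_vocabulary_sketch(
    attrs: dict, normalizations: dict[str, tuple[dict[str, str], str]]
) -> dict:
    """Sketch of the documented behaviour; works on a copy of attrs."""
    result = dict(attrs)
    for attr, (value_map, vocab_url) in normalizations.items():
        if attr in result:
            # Map the value if it has a normalized form, otherwise leave it unchanged.
            result[attr] = value_map.get(result[attr], result[attr])
            result[f"{attr}_vocabulary"] = vocab_url
    return result


# Hypothetical value map and placeholder vocabulary URL, for illustration only.
normalizations = {
    "platform": ({"moorings": "mooring"}, "https://example.org/vocab/platform"),
}
print(normalize_and_add_vocabulary_sketch({"platform": "moorings"}, normalizations))
# {'platform': 'mooring', 'platform_vocabulary': 'https://example.org/vocab/platform'}
```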
- amocatlas.standardise.reorder_metadata(attrs: dict) → dict
Return a new dict with keys ordered according to the OG1.0 global-attribute list. Any attributes not in the spec list are appended at the end, in their original order.
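A minimal sketch of this ordering rule, assuming a short hypothetical `SPEC_ORDER` list in place of the real OG1.0 global-attribute list. Python dicts preserve insertion order, so inserting the spec keys first and then appending the rest gives the documented result.

```python
# Hypothetical, shortened ordering list; the real OG1.0 global-attribute list is longer.
SPEC_ORDER = ["title", "platform", "id", "contributor_name"]


def reorder_metadata_sketch(attrs: dict, spec_order: list[str]) -> dict:
    """Spec keys first (in spec order), then all remaining keys in their original order."""
    spec_keys = set(spec_order)
    ordered = {key: attrs[key] for key in spec_order if key in attrs}
    ordered.update((key, value) for key, value in attrs.items() if key not in spec_keys)
    return ordered


attrs = {"summary": "s", "platform": "mooring", "title": "t", "comment": "c"}
print(list(reorder_metadata_sketch(attrs, SPEC_ORDER)))
# ['title', 'platform', 'summary', 'comment']
```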
- amocatlas.standardise.resolve_metadata_conflict(key: str, existing_value: str, new_value: str, existing_source: str = 'unknown', new_source: str = 'unknown') → str
Resolve metadata conflicts using consistent logic with detailed warnings.
Resolution rules:
1. If the values are identical, return the value without a warning.
2. If one value is empty or whitespace-only and the other is not, use the non-empty value.
3. Otherwise, use the longer value and warn about the conflict.
- Parameters:
key (str) – Metadata key name
existing_value (str) – Current value
new_value (str) – New value attempting to override
existing_source (str) – Description of where existing value came from
new_source (str) – Description of where new value came from
- Returns:
The resolved value to use
- Return type:
str
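The three resolution rules can be sketched as follows. This is an illustration rather than the library's code: the warning text, the use of `warnings.warn`, and the tie-break when both values have the same length (keep the existing value) are assumptions.

```python
import warnings


def resolve_metadata_conflict_sketch(
    key: str,
    existing_value: str,
    new_value: str,
    existing_source: str = "unknown",
    new_source: str = "unknown",
) -> str:
    """Sketch of the three documented resolution rules."""
    # Rule 1: identical values need no resolution and no warning.
    if existing_value == new_value:
        return existing_value
    existing_blank = not existing_value.strip()
    new_blank = not new_value.strip()
    # Rule 2: exactly one side is empty/whitespace, so take the other one.
    if existing_blank != new_blank:
        return new_value if existing_blank else existing_value
    # Rule 3: keep the longer value and warn (assumed tie-break: keep the existing value).
    chosen = new_value if len(new_value) > len(existing_value) else existing_value
    warnings.warn(
        f"Metadata conflict for {key!r}: {existing_value!r} (from {existing_source}) "
        f"vs {new_value!r} (from {new_source}); using {chosen!r}",
        stacklevel=2,
    )
    return chosen
```

Passing the source descriptions through to the warning is what makes a conflict traceable back to, for example, the YAML metadata versus the original file attributes.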
- amocatlas.standardise.standardise_41n(ds: Dataset, file_name: str) → Dataset
Standardise 41N array dataset to a consistent format.
- amocatlas.standardise.standardise_47n(ds: Dataset, file_name: str) → Dataset
Standardise 47N array dataset to a consistent format.
- Parameters:
ds (xr.Dataset) – Raw 47N array dataset to standardise.
file_name (str) – Original filename associated with the dataset, used for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting for the 47N array.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_arcticgateway(ds: Dataset, file_name: str) → Dataset
Standardise Arctic Gateway array dataset to a consistent format.
- amocatlas.standardise.standardise_array(ds: Dataset, file_name: str) → Dataset
Standardise a mooring array dataset using YAML-based metadata.
Deprecated: this function is deprecated. Use standardise_data() instead.
- Parameters:
ds (xr.Dataset) – Raw dataset loaded from a reader with amocatlas_datasource metadata.
file_name (str) – Filename (e.g., 'moc_transports.nc') expected to match ds.attrs["source_file"].
- Returns:
Standardised dataset with renamed variables and enriched metadata.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_calafat2025(ds: Dataset, file_name: str) → Dataset
Standardise CALAFAT2025 array dataset to a consistent format.
- amocatlas.standardise.standardise_data(ds: Dataset, file_name: str) → Dataset
Standardise a dataset using YAML-based metadata.
- Parameters:
ds (xr.Dataset) – Raw dataset loaded from a reader with amocatlas_datasource metadata.
file_name (str) – Filename (e.g., 'moc_transports.nc') expected to match ds.attrs["source_file"].
- Returns:
Standardised dataset with renamed variables and enriched metadata.
- Return type:
xr.Dataset
- Raises:
ValueError – If file_name does not match ds.attrs["source_file"].
ValueError – If amocatlas_datasource is not found in dataset metadata.
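The two documented `ValueError` conditions can be checked up front, as in this hypothetical helper that works on `ds.attrs` (a plain dict in xarray). It assumes `amocatlas_datasource` is stored as a global attribute; the helper name and the `example_source` identifier are illustrative only.

```python
def check_standardise_inputs(attrs: dict, file_name: str) -> str:
    """Hypothetical helper mirroring the two documented ValueError conditions."""
    source_file = attrs.get("source_file")
    if source_file != file_name:
        raise ValueError(
            f"file_name {file_name!r} does not match ds.attrs['source_file'] ({source_file!r})"
        )
    # Assumption: the datasource identifier is stored as a global attribute.
    datasource = attrs.get("amocatlas_datasource")
    if datasource is None:
        raise ValueError("amocatlas_datasource not found in dataset metadata")
    return datasource


print(check_standardise_inputs(
    {"source_file": "moc_transports.nc", "amocatlas_datasource": "example_source"},
    "moc_transports.nc",
))
# example_source
```

With a loaded dataset, `check_standardise_inputs(ds.attrs, "moc_transports.nc")` would either return the datasource identifier or raise before any renaming is attempted.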
- amocatlas.standardise.standardise_dso(ds: Dataset, file_name: str) → Dataset
Standardise DSO array dataset to a consistent format.
- amocatlas.standardise.standardise_fbc(ds: Dataset, file_name: str) → Dataset
Standardise FBC array dataset to a consistent format.
- amocatlas.standardise.standardise_fw2015(ds: Dataset, file_name: str) → Dataset
Standardise FW2015 array dataset to a consistent format.
- amocatlas.standardise.standardise_mocha(ds: Dataset, file_name: str) → Dataset
Standardise MOCHA array dataset to a consistent format.
- amocatlas.standardise.standardise_move(ds: Dataset, file_name: str) → Dataset
Standardise MOVE array dataset to a consistent format.
- Parameters:
ds (xr.Dataset) – Raw MOVE dataset to standardise.
file_name (str) – Original filename for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_osnap(ds: Dataset, file_name: str) → Dataset
Standardise OSNAP array dataset to a consistent format.
- amocatlas.standardise.standardise_rapid(ds: Dataset, file_name: str) → Dataset
Standardise RAPID array dataset to a consistent format.
Deprecated: this function is deprecated. Use standardise_data() instead.
- Parameters:
ds (xr.Dataset) – Raw RAPID dataset to standardise.
file_name (str) – Original filename for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_samba(ds: Dataset, file_name: str) → Dataset
Standardise SAMBA array dataset to a consistent format.
Deprecated: this function is deprecated. Use standardise_data() instead.
- Parameters:
ds (xr.Dataset) – Raw SAMBA dataset to standardise.
file_name (str) – Original filename for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_zheng2024(ds: Dataset, file_name: str) → Dataset
Standardise ZHENG2024 array dataset to a consistent format.
- amocatlas.standardise.standardize_depth_coordinate(ds: Dataset) → Dataset
Standardize DEPTH coordinate to comply with AMOCatlas specifications.
All datasets with a DEPTH coordinate should have standardized attributes:
- data type: double
- long_name: "Depth below surface of the water"
- standard_name: "depth"
- units: "meters"
- Parameters:
ds (xr.Dataset) – Dataset to standardize DEPTH coordinate for.
- Returns:
Dataset with standardized DEPTH coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_latitude_coordinate(ds: Dataset) → Dataset
Standardize LATITUDE coordinate to comply with AMOCatlas specifications.
All datasets with a LATITUDE coordinate should have standardized attributes:
- data type: double
- long_name: "Latitude north (WGS84)"
- standard_name: "latitude"
- units: "degree_north"
- Parameters:
ds (xr.Dataset) – Dataset to standardize LATITUDE coordinate for.
- Returns:
Dataset with standardized LATITUDE coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_longitude_coordinate(ds: Dataset) → Dataset
Standardize LONGITUDE coordinate to comply with AMOCatlas specifications.
All datasets with a LONGITUDE coordinate should have standardized attributes:
- data type: double
- long_name: "longitude east (WGS84)"
- standard_name: "longitude"
- units: "degree_east"
- Parameters:
ds (xr.Dataset) – Dataset to standardize LONGITUDE coordinate for.
- Returns:
Dataset with standardized LONGITUDE coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_sigma0_coordinate(ds: Dataset) → Dataset
Standardize SIGMA0 coordinate to comply with AMOCatlas specifications.
All datasets with a SIGMA0 coordinate should have standardized attributes:
- data type: double
- long_name: "Potential density anomaly to 1000 kg/m3, surface reference"
- standard_name: "sea_water_sigma_theta"
- units: "kg m-3"
- Parameters:
ds (xr.Dataset) – Dataset to standardize SIGMA0 coordinate for.
- Returns:
Dataset with standardized SIGMA0 coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_time_coordinate(ds: Dataset) → Dataset
Standardize TIME coordinate to comply with AMOCatlas specifications.
All datasets with a TIME coordinate should have standardized attributes:
- data type: datetime64[ns]
- long_name: "Time elapsed since 1970-01-01T00:00:00Z"
- standard_name: "time"
- calendar: "gregorian"
- units: "seconds since 1970-01-01T00:00:00Z"
- vocabulary: "http://vocab.nerc.ac.uk/collection/OG1/current/TIME/"
- Parameters:
ds (xr.Dataset) – Dataset to standardize TIME coordinate for.
- Returns:
Dataset with standardized TIME coordinate attributes.
- Return type:
xr.Dataset
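The DEPTH, LATITUDE, LONGITUDE, SIGMA0 and TIME specifications above share one pattern: a fixed set of attributes per coordinate name. A minimal sketch of that pattern as a lookup table over plain attribute dicts follows. It only sets attributes (casting values to double or datetime64[ns] is not shown), and letting spec values override existing attributes is an assumption.

```python
# Attribute values copied from the coordinate specifications above.
COORDINATE_ATTRS = {
    "DEPTH": {"long_name": "Depth below surface of the water",
              "standard_name": "depth", "units": "meters"},
    "LATITUDE": {"long_name": "Latitude north (WGS84)",
                 "standard_name": "latitude", "units": "degree_north"},
    "LONGITUDE": {"long_name": "longitude east (WGS84)",
                  "standard_name": "longitude", "units": "degree_east"},
    "SIGMA0": {"long_name": "Potential density anomaly to 1000 kg/m3, surface reference",
               "standard_name": "sea_water_sigma_theta", "units": "kg m-3"},
    "TIME": {"long_name": "Time elapsed since 1970-01-01T00:00:00Z",
             "standard_name": "time", "calendar": "gregorian",
             "units": "seconds since 1970-01-01T00:00:00Z",
             "vocabulary": "http://vocab.nerc.ac.uk/collection/OG1/current/TIME/"},
}


def standardized_coordinate_attrs(name: str, attrs: dict) -> dict:
    """Return attrs with the spec for `name` applied; unknown names are copied unchanged."""
    spec = COORDINATE_ATTRS.get(name, {})
    # Extra existing attributes are kept; spec values override conflicting ones (assumption).
    return {**attrs, **spec}
```

With xarray, the returned dict could then be applied to each matching coordinate's `.attrs`; keeping the specifications in one table means a single place to update when the conventions change.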
- amocatlas.standardise.standardize_units(ds: Dataset) → Dataset
Standardize variable units throughout the dataset.
Uses the comprehensive unit mapping from the utilities module.
- Parameters:
ds (xr.Dataset) – Dataset to standardize units for.
- Returns:
Dataset with standardized variable units.
- Return type:
xr.Dataset
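The unit mapping itself lives in the utilities module and is not documented here, so this sketch uses a small hypothetical `EXAMPLE_UNIT_MAP` to show the general idea: look up each variable's `units` string (case-insensitively, an assumption) and leave unknown units untouched. Variables are modelled as a dict of attribute dicts rather than an `xr.Dataset`.

```python
# Hypothetical alias table; the actual mapping lives in the amocatlas utilities module.
EXAMPLE_UNIT_MAP = {
    "sverdrup": "Sv",
    "sverdrups": "Sv",
    "deg c": "degree_Celsius",
    "m/s": "m s-1",
}


def standardize_units_sketch(
    variable_attrs: dict[str, dict], unit_map: dict[str, str]
) -> dict[str, dict]:
    """Map each variable's `units` attribute through unit_map (case-insensitive lookup)."""
    result = {}
    for name, attrs in variable_attrs.items():
        new_attrs = dict(attrs)
        units = new_attrs.get("units")
        if isinstance(units, str):
            # Unknown unit strings are left exactly as they were.
            new_attrs["units"] = unit_map.get(units.strip().lower(), units)
        result[name] = new_attrs
    return result
```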