`amocarray API`

Load and process transport estimates from major AMOC observing arrays.

readers

Shared utilities and base classes for AMOC readers.

amocarray.readers.load_dataset(array_name: str, source: str = None, file_list: str | list[str] = None, transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) → list[Dataset][source]

Load raw datasets from a selected AMOC observing array.

Parameters:

array_name (str) – The name of the observing array to load. Options are:
- ‘move’ : MOVE 16N array
- ‘rapid’ : RAPID 26N array
- ‘osnap’ : OSNAP array
- ‘samba’ : SAMBA 34S array
- ‘fw2015’ : FW2015 array
- ‘41n’ : 41N array
- ‘dso’ : DSO array
source (str, optional) – URL or local path to the data source. If None, the reader-specific default source will be used.
file_list (str or list of str, optional) – Filename or list of filenames to process. If None, the reader-specific default files will be used.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, optional) – Local directory for downloaded files.
redownload (bool, optional) – If True, force redownload of the data.

Returns:

List of datasets loaded from the specified array.

Return type:

list of xarray.Dataset

Raises:

ValueError – If an unknown array name is provided.
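
The name-based dispatch and the ValueError described above can be pictured with a small sketch. The registry and the stub readers below are hypothetical stand-ins for illustration, not amocarray's internals; the real readers return lists of xarray.Dataset objects rather than strings.

```python
from typing import Callable

# Hypothetical stub readers; the real ones return lists of xarray.Dataset.
def _read_rapid_stub() -> list[str]:
    return ["rapid-dataset"]

def _read_move_stub() -> list[str]:
    return ["move-dataset"]

# Hypothetical registry mapping array names to reader callables.
READERS: dict[str, Callable[[], list[str]]] = {
    "rapid": _read_rapid_stub,
    "move": _read_move_stub,
}

def load_dataset_sketch(array_name: str) -> list[str]:
    """Look up a reader by name and call it, or raise ValueError."""
    reader = READERS.get(array_name)
    if reader is None:
        options = ", ".join(sorted(READERS))
        raise ValueError(f"Unknown array name {array_name!r}; options are: {options}")
    return reader()
```

Under these assumptions, load_dataset_sketch("rapid") returns the stub list, and any name missing from the registry raises ValueError with the available options in the message.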

amocarray.readers.load_sample_dataset(array_name: str = 'rapid') → Dataset[source]

Load a sample dataset for quick testing.

Currently supports:
- ‘rapid’ : loads the ‘RAPID_26N_TRANSPORT.nc’ file

Parameters:

array_name (str, optional) – The name of the observing array to load. Default is ‘rapid’.

Returns:

A single xarray Dataset from the sample file.

Return type:

xr.Dataset

Raises:

ValueError – If the array_name is not recognised.

Submodules

read_rapid

Reader for RAPID-MOCHA-WBTS array data at 26°N.

Load the RAPID transport dataset from a URL or local file path into an xarray.Dataset.

Parameters:

source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the RAPID data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. If None, will attempt to list files in the source directory.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.

Returns:

The loaded xarray dataset with basic inline metadata.

Return type:

xr.Dataset

Raises:

ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If no valid NetCDF files are found in the provided file list.

read_osnap

Reader for OSNAP (Overturning in the Subpolar North Atlantic Program) data.

amocarray.read_osnap.read_osnap(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) → list[Dataset][source]

Load the OSNAP transport datasets from a URL or local file path into xarray Datasets.

Parameters:

source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to OSNAP_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.

Returns:

List of loaded xarray datasets with basic inline and file-specific metadata.

Return type:

list of xr.Dataset

Raises:

ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.

read_move

Reader for MOVE (Meridional Overturning Variability Experiment) data.

amocarray.read_move.read_move(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) → list[Dataset][source]

Load the MOVE transport dataset from a URL or local file path into xarray Datasets.

Parameters:

source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the MOVE data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to MOVE_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.

Returns:

List of loaded xarray datasets with basic inline and file-specific metadata.

Return type:

list of xr.Dataset

Raises:

ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.

read_samba

Reader for SAMBA (South Atlantic MOC Basin-wide Array) data.

Load the SAMBA transport datasets from remote URL or local file path into xarray Datasets.

Parameters:

source (str, optional) – URL or local path to the dataset directory. If None, will use predefined URLs per file.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to SAMBA_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.

Returns:

List of loaded xarray datasets with basic inline and file-specific metadata.

Return type:

list of xr.Dataset

Raises:

ValueError – If no source is provided for a file and no default URL mapping found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.

standardise

Functions to apply naming conventions, units, and metadata standards to datasets.

Standardisation functions for AMOC observing array datasets.

These functions take raw loaded datasets and:
- Rename variables to standard names
- Add variable-level metadata
- Add or update global attributes
- Prepare datasets for downstream analysis

Currently implemented:
- SAMBA

amocarray.standardise.clean_metadata(attrs: dict, preferred_keys: dict = None) → dict[source]

Clean up a metadata dictionary:
- Normalize key casing
- Merge aliases with identical values
- Apply standard naming (via preferred_keys mapping)

amocarray.standardise.merge_metadata_aliases(attrs: dict, preferred_keys: dict) → dict[source]

Consolidate and rename metadata keys case‑insensitively (except featureType), using preferred_keys to map aliases to canonical names.

Parameters:

attrs (dict) – Metadata dictionary with potential duplicates.
preferred_keys (dict) – Mapping of lowercase alias keys to preferred canonical keys.

Returns:

Metadata dictionary with duplicates merged and keys renamed.

Return type:

dict
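
A minimal sketch of the alias merging described above, operating on plain dictionaries. Two details are assumptions made for the illustration and are not stated in the docs: unmapped keys are lowercased, and the first value seen for a canonical key wins. The function name and the sample attributes are hypothetical.

```python
def merge_aliases_sketch(attrs: dict, preferred_keys: dict) -> dict:
    """Merge aliases case-insensitively, leaving featureType untouched."""
    merged: dict = {}
    for key, value in attrs.items():
        if key == "featureType":
            # featureType is the documented exception: keep its exact casing.
            canonical = key
        else:
            # Assumption: unmapped keys fall back to their lowercase form.
            canonical = preferred_keys.get(key.lower(), key.lower())
        # Assumption: the first value seen for a canonical key is kept.
        merged.setdefault(canonical, value)
    return merged

# Made-up attributes with one alias pair ("PI" / "principal_investigator").
attrs = {
    "Title": "Example title",
    "PI": "Example Person",
    "principal_investigator": "Example Person",
    "featureType": "timeSeries",
}
preferred = {"pi": "principal_investigator"}
merged_attrs = merge_aliases_sketch(attrs, preferred)
```

Here the two investigator keys collapse into a single principal_investigator entry, Title becomes title, and featureType keeps its casing.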

amocarray.standardise.normalize_and_add_vocabulary(attrs: dict, normalizations: dict[str, tuple[dict[str, str], str]]) → dict[source]

For each (attr, (value_map, vocab_url)) in normalizations:

If attr exists in attrs:
- Map attrs[attr] using value_map (or leave it if unmapped)
- Add attrs[f"{attr}_vocabulary"] = vocab_url

Parameters:

attrs (dict) – Metadata attributes, already cleaned & renamed.
normalizations (dict) – Keys are canonical attr names (e.g. “platform”), values are (value_map, vocabulary_url) tuples.

Returns:

attrs with normalized values and added <attr>_vocabulary entries.

Return type:

dict
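
The two steps listed in the description translate almost directly into code. The following is a sketch on plain dictionaries rather than the library's implementation; the function name, the sample value map, and the example.org vocabulary URL are made up for illustration.

```python
def normalize_sketch(attrs: dict, normalizations: dict) -> dict:
    """Map attribute values and record the vocabulary URL used for each."""
    out = dict(attrs)  # work on a copy so the caller's dict is untouched
    for attr, (value_map, vocab_url) in normalizations.items():
        if attr in out:
            # Use the mapped value when one exists, else keep the original.
            out[attr] = value_map.get(out[attr], out[attr])
            out[f"{attr}_vocabulary"] = vocab_url
    return out

# Hypothetical inputs: one attribute present, one absent.
sample_attrs = {"platform": "moorings", "title": "Example title"}
sample_norms = {
    "platform": ({"moorings": "mooring"}, "https://example.org/vocab/platform"),
    "featureType": ({"timeseries": "timeSeries"}, "https://example.org/vocab/feature"),
}
normalized = normalize_sketch(sample_attrs, sample_norms)
```

Only platform is rewritten and given a platform_vocabulary companion; featureType is absent from the input, so no featureType_vocabulary entry is added.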

amocarray.standardise.reorder_metadata(attrs: dict) → dict[source]

Return a new dict with keys ordered according to the OG1.0 global-attribute list. Any attrs not in the spec list are appended at the end, in their original order.
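
The ordering rule can be sketched as follows. SPEC_ORDER is a short hypothetical stand-in; the actual OG1.0 global-attribute list is longer and is not reproduced here.

```python
# Hypothetical, shortened stand-in for the OG1.0 global-attribute order.
SPEC_ORDER = ["title", "platform", "institution"]

def reorder_sketch(attrs: dict, spec_order: list[str] = SPEC_ORDER) -> dict:
    """Return a new dict: spec keys first, then the rest in original order."""
    ordered = {key: attrs[key] for key in spec_order if key in attrs}
    for key, value in attrs.items():
        if key not in ordered:
            ordered[key] = value
    return ordered

reordered = reorder_sketch(
    {"zeta": 1, "institution": "Example Institute", "alpha": 2, "title": "Example"}
)
```

With this input the keys come out as title, institution, zeta, alpha: spec keys in spec order, followed by the remaining keys in the order they were supplied.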

amocarray.standardise.standardise_41n(ds: Dataset, file_name: str) → Dataset[source]

amocarray.standardise.standardise_array(ds: Dataset, file_name: str, array_name: str) → Dataset[source]

Standardise a mooring array dataset using YAML-based metadata.

Parameters:

ds (xr.Dataset) – Raw dataset loaded from a reader.
file_name (str) – Filename (e.g., ‘moc_transports.nc’) expected to match ds.attrs[“source_file”].
array_name (str) – Name of the mooring array (e.g., ‘samba’, ‘rapid’, ‘move’, ‘osnap’, ‘fw2015’, ‘mocha’).

Returns:

Standardised dataset with renamed variables and enriched metadata.

Return type:

xr.Dataset

Raises:

ValueError – If file_name does not match ds.attrs[“source_file”].

amocarray.standardise.standardise_dso(ds: Dataset, file_name: str) → Dataset[source]

amocarray.standardise.standardise_fw2015(ds: Dataset, file_name: str) → Dataset[source]

amocarray.standardise.standardise_mocha(ds: Dataset, file_name: str) → Dataset[source]

amocarray.standardise.standardise_move(ds: Dataset, file_name: str) → Dataset[source]

amocarray.standardise.standardise_osnap(ds: Dataset, file_name: str) → Dataset[source]

amocarray.standardise.standardise_rapid(ds: Dataset, file_name: str) → Dataset[source]

amocarray.standardise.standardise_samba(ds: Dataset, file_name: str) → Dataset[source]

plotters

Tools for visualising AMOC time series and transport data.

amocarray.plotters.monthly_resample(da: DataArray) → DataArray[source]

Resample to monthly mean if data is not already monthly.

amocarray.plotters.plot_amoc_timeseries(data, varnames=None, labels=None, colors=None, title='AMOC Time Series', ylabel=None, time_limits=None, ylim=None, figsize=(10, 3), resample_monthly=True, plot_raw=True)[source]

Plot original and optionally monthly-averaged AMOC time series for one or more datasets.

Parameters:

data (list of xarray.Dataset or xarray.DataArray) – List of datasets or DataArrays to plot.
varnames (list of str, optional) – List of variable names to extract from each dataset. Not needed if DataArrays are passed.
labels (list of str, optional) – Labels for the legend.
colors (list of str, optional) – Colors for monthly-averaged plots.
title (str) – Title of the plot.
ylabel (str, optional) – Label for the y-axis. If None, inferred from attributes.
time_limits (tuple of str or pd.Timestamp, optional) – X-axis time limits (start, end).
ylim (tuple of float, optional) – Y-axis limits (min, max).
figsize (tuple) – Size of the figure.
resample_monthly (bool) – If True, monthly averages are computed and plotted.
plot_raw (bool) – If True, raw data is plotted.

amocarray.plotters.plot_monthly_anomalies(**kwargs) → tuple[Figure, list[Axes]][source]

Plot the monthly anomalies for various datasets. Pass keyword arguments in pairs of the form <name>_data and <name>_label. For example:

osnap_data = standardOSNAP[0]["MOC_all"], osnap_label = "OSNAP" …
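
One way such paired keyword arguments could be grouped is sketched below. This is a guess at the pattern, not the function's actual parsing logic; the helper name and the fallback to the prefix when no label is supplied are assumptions.

```python
def pair_kwargs_sketch(**kwargs) -> dict:
    """Group <name>_data / <name>_label keyword arguments by their prefix."""
    pairs = {}
    for key, value in kwargs.items():
        if key.endswith("_data"):
            name = key[: -len("_data")]
            # Assumption: use the prefix itself when no matching label is given.
            pairs[name] = (value, kwargs.get(f"{name}_label", name))
    return pairs

# Plain lists stand in for the DataArrays a real call would pass.
grouped = pair_kwargs_sketch(osnap_data=[1, 2], osnap_label="OSNAP", rapid_data=[3])
```

Each prefix yields one (data, label) tuple, so a plotting loop can iterate over grouped.values() without knowing the array names in advance.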

amocarray.plotters.show_attributes(data: str | Dataset) → DataFrame[source]

Processes an xarray Dataset or a netCDF file, extracts attribute information, and returns a DataFrame with details about the attributes.

Parameters:

data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.

Returns:

A DataFrame containing the following columns:
- Attribute: The name of the attribute.
- Value: The value of the attribute.
- DType: The data type of the attribute.

Return type:

pandas.DataFrame

Raises:

TypeError – If the input data is not a file path (str) or an xarray Dataset.

amocarray.plotters.show_contents(data: str | Dataset, content_type: str = 'variables') → Styler | DataFrame[source]

Wrapper function to show contents of an xarray Dataset or a netCDF file.

Parameters:

data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
content_type (str, optional) – The type of content to show, either ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’). Default is ‘variables’.

Returns:

A styled DataFrame with details about the variables or attributes.

Return type:

pandas.io.formats.style.Styler or pandas.DataFrame

Raises:

TypeError – If the input data is not a file path (str) or an xarray Dataset.
ValueError – If the content_type is not ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’).

amocarray.plotters.show_variables(data: str | Dataset) → Styler[source]

Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables.

Parameters:

data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.

Returns:

A styled DataFrame containing the following columns:
- dims: The dimension of the variable (or “string” if it is a string type).
- name: The name of the variable.
- units: The units of the variable (if available).
- comment: Any additional comments about the variable (if available).
- standard_name: The standard name of the variable (if available).
- dtype: The data type of the variable.

Return type:

pd.io.formats.style.Styler

Raises:

TypeError – If the input data is not a file path (str) or an xarray Dataset.

amocarray.plotters.show_variables_by_dimension(data: str | Dataset, dimension_name: str = 'trajectory') → Styler[source]

Extracts variable information from an xarray Dataset or a netCDF file and returns a styled DataFrame with details about the variables filtered by a specific dimension.

Parameters:

data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
dimension_name (str, optional) – The name of the dimension to filter variables by, by default “trajectory”.

Returns:

A styled DataFrame containing the following columns:
- dims: The dimension of the variable (or “string” if it is a string type).
- name: The name of the variable.
- units: The units of the variable (if available).
- comment: Any additional comments about the variable (if available).

Return type:

pandas.io.formats.style.Styler

Raises:

TypeError – If the input data is not a file path (str) or an xarray Dataset.

writers

amocarray.writers.save_dataset(ds: Dataset, output_file: str = '../test.nc') → bool[source]

Attempts to save the dataset to a NetCDF file. If a TypeError occurs due to invalid attribute values, it converts the invalid attributes to strings and retries the save operation.

Parameters:

ds (xarray.Dataset) – The dataset to be saved.
output_file (str, optional) – The path to the output NetCDF file. Defaults to ‘../test.nc’.

Returns:

True if the dataset was saved successfully, False otherwise.

Return type:

bool

Notes

This function is based on a workaround for issues with saving datasets containing attributes of unsupported types. See: https://github.com/pydata/xarray/issues/3743

tools

Helper functions for data manipulation, unit conversion, and clean-up.

amocarray.tools.convert_units_var(var_values: ndarray | float, current_unit: str, new_unit: str, unit_conversion: dict[str, dict[str, float]] = {'Celsius': {'degrees_Celsius': 1.0}, 'Pa': {'dbar': 0.0001}, 'S/m': {'mS/cm': 0.1}, 'Sv': {'Sverdrup': 1.0}, 'Sverdrup': {'Sv': 1}, 'cm': {'m': 0.01}, 'cm s-1': {'m s-1': 0.01}, 'cm/s': {'m/s': 0.01}, 'dbar': {'Pa': 10000, 'kPa': 10}, 'degrees_Celsius': {'Celsius': 1}, 'g m-3': {'kg m-3': 0.001}, 'kPa': {'dbar': 0.1}, 'kg m-3': {'g m-3': 1000.0}, 'km': {'m': 1000.0}, 'm': {'cm': 100, 'km': 0.001}, 'm s-1': {'cm s-1': 100.0}, 'm/s': {'cm/s': 100.0}, 'mS/cm': {'S/m': 10.0}}) → ndarray | float[source]

Converts variable values from one unit to another using a predefined conversion factor.

Parameters:

var_values (numpy.ndarray or float) – The values to be converted.
current_unit (str) – The current unit of the variable values.
new_unit (str) – The target unit to which the variable values should be converted.
unit_conversion (dict of {str: dict of {str: float}}, optional) – A dictionary containing conversion factors between units. The default is unit_conversion.

Returns:

The converted variable values. If no conversion factor is found, the original values are returned.

Return type:

numpy.ndarray or float

Raises:

KeyError – If the conversion factor for the specified units is not found in the unit_conversion dictionary.

Notes

If the conversion factor for the specified units is not available, a message is printed, and the original values are returned without any conversion.
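
A minimal sketch of the table lookup, using a few entries copied from the default unit_conversion shown in the signature. It follows the behaviour described in the Notes (print a message and return the input unchanged when no factor is found) and handles plain floats only; the function name is hypothetical.

```python
# A few entries copied from the default table in the signature above.
UNIT_CONVERSION = {
    "cm": {"m": 0.01},
    "m": {"cm": 100, "km": 0.001},
    "dbar": {"Pa": 10000, "kPa": 10},
}

def convert_sketch(value: float, current_unit: str, new_unit: str,
                   table: dict = UNIT_CONVERSION) -> float:
    """Multiply by the tabulated factor, or return the value unchanged."""
    try:
        factor = table[current_unit][new_unit]
    except KeyError:
        # Mirrors the Notes: report the gap and leave the values as they are.
        print(f"No conversion from {current_unit} to {new_unit}; values unchanged.")
        return value
    return value * factor

pressure_kpa = convert_sketch(12.5, "dbar", "kPa")
unchanged = convert_sketch(3.0, "m", "furlong")
```

The nested-dictionary layout keeps each lookup to two key accesses, and a missing source or target unit is handled by the same KeyError branch.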

amocarray.tools.find_best_dtype(var_name: str, da: DataArray) → dtype[source]

Determines the most suitable data type for a given variable.

Parameters:

var_name (str) – The name of the variable.
da (xarray.DataArray) – The data array containing the variable’s values.

Returns:

The optimal data type for the variable based on its name and values.

Return type:

numpy.dtype

amocarray.tools.generate_reverse_conversions(forward_conversions: dict[str, dict[str, float]]) → dict[str, dict[str, float]][source]

Create a unit conversion dictionary with both forward and reverse conversions.

Parameters:

forward_conversions (dict of {str: dict of {str: float}}) – Mapping of source units to target units and conversion factors. Example: {“m”: {“cm”: 100, “km”: 0.001}}

Returns:

Complete mapping of units including reverse conversions. Example: {“cm”: {“m”: 0.01}, “km”: {“m”: 1000}}

Return type:

dict of {str: dict of {str: float}}

Notes

If a conversion factor is zero, a warning is printed, and the reverse conversion is skipped.
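
The reverse-table construction can be sketched as below, using the example mapping from the parameter description. The function name is hypothetical, and the choice not to overwrite an entry that already exists in the forward table is an assumption.

```python
def reverse_sketch(forward: dict) -> dict:
    """Return the forward table plus reciprocal entries for each conversion."""
    complete = {src: dict(targets) for src, targets in forward.items()}
    for src, targets in forward.items():
        for dst, factor in targets.items():
            if factor == 0:
                # A zero factor has no reciprocal, so warn and skip it.
                print(f"Warning: zero factor for {src} -> {dst}; reverse skipped.")
                continue
            # Assumption: an explicitly listed entry is never overwritten.
            complete.setdefault(dst, {}).setdefault(src, 1 / factor)
    return complete

full_table = reverse_sketch({"m": {"cm": 100, "km": 0.001}})
```

The result keeps the original m entries and adds cm to m and km to m with reciprocal factors, matching the example in the Returns description.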

amocarray.tools.reformat_units_var(ds: Dataset, var_name: str, unit_format: dict[str, str] = {'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'm/s': 'm s-1', 'meters': 'm'}) → str[source]

Reformat the units of a variable in the dataset based on a provided mapping.

Parameters:

ds (xarray.Dataset) – The input dataset containing variables with units to be reformatted.
var_name (str) – The name of the variable whose units need to be reformatted.
unit_format (dict of {str: str}, optional) – A dictionary mapping old unit strings to new formatted unit strings. Defaults to unit_str_format.

Returns:

The reformatted unit string. If the old unit is not found in unit_format, the original unit string is returned.

Return type:

str

amocarray.tools.set_best_dtype(ds: Dataset) → Dataset[source]

Adjust the data types of variables in a dataset to optimize memory usage.

Parameters:

ds (xarray.Dataset) – The input dataset whose variables’ data types will be adjusted.

Returns:

The dataset with updated data types for its variables, potentially saving memory.

Return type:

xarray.Dataset

Notes

The function determines the best data type for each variable using find_best_dtype.
Attributes like valid_min and valid_max are updated to match the new data type.
If the new data type is integer-based, NaN values are replaced with a fill value.
Logs the percentage of memory saved after the data type adjustments.

amocarray.tools.set_fill_value(new_dtype: dtype) → int[source]

Calculate the fill value for a given data type.

Parameters:

new_dtype (numpy.dtype) – The data type for which the fill value is to be calculated.

Returns:

The calculated fill value based on the bit-width of the data type.

Return type:

int
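
The docs state only that the fill value is derived from the bit width. One plausible reading, assumed in this sketch, is the largest signed integer representable in that width. The function below takes a dtype name as a string rather than a numpy.dtype so that it needs no third-party imports; the name and the formula are assumptions.

```python
import re

def fill_value_sketch(dtype_name: str) -> int:
    """Largest signed integer for the bit width found in the dtype name."""
    match = re.search(r"\d+", dtype_name)
    if match is None:
        raise ValueError(f"No bit width found in dtype name {dtype_name!r}")
    bits = int(match.group())
    # Assumed formula: 2**(bits - 1) - 1, the maximum of a signed integer type.
    return 2 ** (bits - 1) - 1
```

Under that assumption, "int8" maps to 127 and "int16" to 32767.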

utilities

Shared utilities for downloading, reading, and parsing data files.

amocarray.utilities.apply_defaults(default_source: str, default_files: List[str]) → Callable[source]

Decorator to apply default values for ‘source’ and ‘file_list’ parameters if they are None.

Parameters:

default_source (str) – Default source URL or path.
default_files (list of str) – Default list of filenames.

Returns:

A wrapped function with defaults applied.

Return type:

Callable
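
A decorator of this kind might look like the sketch below. It is an illustration of the pattern rather than the library's code; the function names, the example.org URL, and the file name are hypothetical.

```python
import functools
from typing import Callable

def apply_defaults_sketch(default_source: str, default_files: list[str]) -> Callable:
    """Build a decorator that fills in source and file_list when they are None."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(source=None, file_list=None, *args, **kwargs):
            if source is None:
                source = default_source
            if file_list is None:
                file_list = default_files
            return func(source, file_list, *args, **kwargs)
        return wrapper
    return decorator

# Hypothetical defaults applied to a stub reader.
@apply_defaults_sketch("https://example.org/data/", ["transports.nc"])
def read_stub(source, file_list, transport_only=True):
    return source, file_list, transport_only
```

A pattern like this would allow a reader to be called with no arguments even though its signature declares source and file_list without defaults.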

amocarray.utilities.download_file(url: str, dest_folder: str, redownload: bool = False, filename: str = None) → str[source]

Download a file from HTTP(S) or FTP to the specified destination folder.

Parameters:

url (str) – The URL of the file to download.
dest_folder (str) – Local folder to save the downloaded file.
redownload (bool, optional) – If True, force re-download of the file even if it exists.
filename (str, optional) – Optional filename to save the file as. If not given, uses the name from the URL.

Returns:

The full path to the downloaded file.

Return type:

str

Raises:

ValueError – If the URL scheme is unsupported.

amocarray.utilities.get_default_data_dir() → Path[source]

amocarray.utilities.get_project_root() → Path[source]

Return the absolute path to the project root directory.

amocarray.utilities.load_array_metadata(array_name: str) → dict[source]

Load metadata YAML for a given mooring array.

Parameters:

array_name (str) – Name of the mooring array (e.g., ‘samba’).

Returns:

Dictionary containing the parsed YAML metadata.

Return type:

dict

amocarray.utilities.normalize_whitespace(attrs: dict) → dict[source]

Replace non-breaking and other unusual whitespace in every string attr value with a normal ASCII space, and collapse runs of whitespace down to one space.
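
The whitespace clean-up can be sketched with a single regular expression. This is an illustrative version on a plain dictionary, not the library's code; the function name and sample values are made up.

```python
import re

def normalize_whitespace_sketch(attrs: dict) -> dict:
    """Collapse any run of whitespace in string values to one ASCII space."""
    cleaned = {}
    for key, value in attrs.items():
        if isinstance(value, str):
            # In str patterns, \s matches Unicode whitespace such as U+00A0.
            value = re.sub(r"\s+", " ", value)
        cleaned[key] = value
    return cleaned

tidy = normalize_whitespace_sketch({"title": "RAPID\u00a0 26N\t\tarray", "count": 3})
```

Non-string values pass through untouched, and the no-break space plus the tab run in the title each collapse to a single space.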

amocarray.utilities.parse_ascii_header(file_path: str, comment_char: str = '%') → Tuple[List[str], int][source]

Parse the header of an ASCII file to extract column names and the number of header lines.

Header lines are identified by the given comment character (default: ‘%’). Columns are defined in lines like: ‘<comment_char> Column 1: <column_name>’.

Parameters:

file_path (str) – Path to the ASCII file.
comment_char (str, optional) – Character used to identify header lines. Defaults to ‘%’.

Returns:

A tuple containing:
- A list of column names extracted from the header.
- The number of header lines to skip.

Return type:

tuple of (list of str, int)
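
A sketch of header parsing for the line format described above. It assumes that the header is the leading block of comment lines and that parsing stops at the first data line; the function name and the sample file contents are made up.

```python
import re
import tempfile
from pathlib import Path

def parse_header_sketch(file_path: str, comment_char: str = "%") -> tuple[list[str], int]:
    """Collect column names from '<comment_char> Column N: <name>' lines."""
    columns: list[str] = []
    header_lines = 0
    pattern = re.compile(rf"^{re.escape(comment_char)}\s*Column\s+\d+:\s*(.+)$")
    with open(file_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith(comment_char):
                break  # assumption: the header ends at the first data line
            header_lines += 1
            match = pattern.match(line.strip())
            if match:
                columns.append(match.group(1).strip())
    return columns, header_lines

# Made-up sample file: three header lines followed by one data row.
sample = "% Example file\n% Column 1: time\n% Column 2: transport\n0.0 1.0\n"
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"
    sample_path.write_text(sample, encoding="utf-8")
    columns, n_header = parse_header_sketch(str(sample_path))
```

For the sample, columns holds the two names and n_header counts all three comment lines, so the count can be passed on as the number of rows to skip when reading the data.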

amocarray.utilities.read_ascii_file(file_path: str, comment_char: str = '#') → DataFrame[source]

Read an ASCII file into a pandas DataFrame, skipping lines starting with a specified comment character.

Parameters:

file_path (str) – Path to the ASCII file.
comment_char (str, optional) – Character denoting comment lines. Defaults to ‘#’.

Returns:

The loaded data as a pandas DataFrame.

Return type:

pd.DataFrame

amocarray.utilities.resolve_file_path(file_name: str, source: str | Path | None, download_url: str | None, local_data_dir: Path, redownload: bool = False) → Path[source]

Resolve the path to a data file, using local source, cache, or downloading if necessary.

Parameters:

file_name (str) – The name of the file to resolve.
source (str or Path or None) – Optional local source directory.
download_url (str or None) – URL to download the file if needed.
local_data_dir (Path) – Directory where downloaded files are stored.
redownload (bool, optional) – If True, force redownload even if cached file exists.

Returns:

Path to the resolved file.

Return type:

Path

amocarray.utilities.safe_update_attrs(ds: Dataset, new_attrs: Dict[str, str], overwrite: bool = False, verbose: bool = True) → Dataset[source]

Safely update attributes of an xarray Dataset without overwriting existing keys, unless explicitly allowed.

Parameters:

ds (xr.Dataset) – The xarray Dataset whose attributes will be updated.
new_attrs (dict of str) – Dictionary of new attributes to add.
overwrite (bool, optional) – If True, allow overwriting existing attributes. Defaults to False.
verbose (bool, optional) – If True, emit a warning when skipping existing attributes. Defaults to True.

Returns:

The dataset with updated attributes.

Return type:

xr.Dataset
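
The overwrite and verbose flags can be illustrated on a plain attribute dictionary. This sketch returns a copy rather than modifying an xarray Dataset, which is a simplification; the function name and the wording of the warning are assumptions.

```python
import warnings

def safe_update_sketch(attrs: dict, new_attrs: dict,
                       overwrite: bool = False, verbose: bool = True) -> dict:
    """Add new attributes, skipping existing keys unless overwrite is True."""
    updated = dict(attrs)
    for key, value in new_attrs.items():
        if key in updated and not overwrite:
            if verbose:
                warnings.warn(f"Attribute {key!r} already exists; skipping.")
            continue
        updated[key] = value
    return updated
```

With the defaults, an existing title is preserved and a warning is emitted; passing overwrite=True replaces it silently.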

amocarray.utilities.validate_array_yaml(array_name: str, verbose: bool = True) → bool[source]

Validate the structure and required fields of an array-level metadata YAML.

Parameters:

array_name (str) – The array name (e.g., ‘samba’).
verbose (bool) – If True, print detailed validation messages.

Returns:

True if validation passes, False otherwise.

Return type:

bool
