amocarray API
Load and process transport estimates from major AMOC observing arrays.
readers
Shared utilities and base classes for AMOC readers.
- amocarray.readers.load_dataset(array_name: str, source: str = None, file_list: str | list[str] = None, transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) -> list[Dataset]
Load raw datasets from a selected AMOC observing array.
- Parameters:
array_name (str) – The name of the observing array to load. Options are:
  - ‘move’ : MOVE 16N array
  - ‘rapid’ : RAPID 26N array
  - ‘osnap’ : OSNAP array
  - ‘samba’ : SAMBA 34S array
  - ‘fw2015’ : FW2015 array
  - ‘41n’ : 41N array
  - ‘dso’ : DSO array
source (str, optional) – URL or local path to the data source. If None, the reader-specific default source will be used.
file_list (str or list of str, optional) – Filename or list of filenames to process. If None, the reader-specific default files will be used.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Local directory for downloaded files.
redownload (bool, optional) – If True, force redownload of the data.
- Returns:
List of datasets loaded from the specified array.
- Return type:
list of xarray.Dataset
- Raises:
ValueError – If an unknown array name is provided.
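The name-based dispatch and the ValueError for unknown names can be pictured roughly as follows. This is only a sketch: the READERS table, make_stub_reader and load_dataset_sketch are hypothetical names, and the stub readers return placeholder strings rather than xarray Datasets.

```python
from typing import Callable

# Array names as listed in the parameter description above.
ARRAY_NAMES = ("move", "rapid", "osnap", "samba", "fw2015", "41n", "dso")


def make_stub_reader(name: str) -> Callable[..., list]:
    """Build a placeholder reader (hypothetical stand-in for a real reader)."""
    def reader(**kwargs) -> list:
        return [f"<dataset from {name}>"]
    return reader


# Hypothetical lookup table from array name to reader callable.
READERS = {name: make_stub_reader(name) for name in ARRAY_NAMES}


def load_dataset_sketch(array_name: str, **kwargs) -> list:
    """Dispatch to the reader for array_name, rejecting unknown names."""
    if array_name not in READERS:
        raise ValueError(f"Unknown array name: {array_name!r}")
    return READERS[array_name](**kwargs)
```

A table keyed by name keeps the list of supported arrays in one place, so adding an array means adding one entry.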
- amocarray.readers.load_sample_dataset(array_name: str = 'rapid') -> Dataset
Load a sample dataset for quick testing.
Currently supports:
  - ‘rapid’ : loads the ‘RAPID_26N_TRANSPORT.nc’ file
- Parameters:
array_name (str, optional) – The name of the observing array to load. Default is ‘rapid’.
- Returns:
A single xarray Dataset from the sample file.
- Return type:
xr.Dataset
- Raises:
ValueError – If the array_name is not recognised.
Submodules
read_rapid
Reader for RAPID-MOCHA-WBTS array data at 26°N.
- amocarray.read_rapid.read_rapid(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) -> list[Dataset]
Load the RAPID transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the RAPID data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. If None, will attempt to list files in the source directory.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
- Returns:
List of loaded xarray datasets with basic inline metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If no valid NetCDF files are found in the provided file list.
read_osnap
Reader for OSNAP (Overturning in the Subpolar North Atlantic Program) data.
- amocarray.read_osnap.read_osnap(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) -> list[Dataset]
Load the OSNAP transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to OSNAP_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
read_move
Reader for MOVE (Meridional Overturning Variability Experiment) data.
- amocarray.read_move.read_move(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) -> list[Dataset]
Load the MOVE transport dataset from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the MOVE data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to MOVE_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
read_samba
Reader for SAMBA (South Atlantic MOC Basin-wide Array) data.
- amocarray.read_samba.read_samba(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) -> list[Dataset]
Load the SAMBA transport datasets from remote URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the dataset directory. If None, will use predefined URLs per file.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to SAMBA_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
standardise
Functions to apply naming conventions, units, and metadata standards to datasets.
Standardisation functions for AMOC observing array datasets.
These functions take raw loaded datasets and:
  - Rename variables to standard names
  - Add variable-level metadata
  - Add or update global attributes
  - Prepare datasets for downstream analysis
Currently implemented:
  - SAMBA
- amocarray.standardise.clean_metadata(attrs: dict, preferred_keys: dict = None) -> dict
Clean up a metadata dictionary:
  - Normalize key casing
  - Merge aliases with identical values
  - Apply standard naming (via preferred_keys mapping)
- amocarray.standardise.merge_metadata_aliases(attrs: dict, preferred_keys: dict) -> dict
Consolidate and rename metadata keys case‑insensitively (except featureType), using preferred_keys to map aliases to canonical names.
- Parameters:
attrs (dict) – Metadata dictionary with potential duplicates.
preferred_keys (dict) – Mapping of lowercase alias keys to preferred canonical keys.
- Returns:
Metadata dictionary with duplicates merged and keys renamed.
- Return type:
dict
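One way to picture the case-insensitive consolidation is the sketch below. It is not the package's implementation: merge_aliases_sketch is a hypothetical name, and it assumes that unmapped keys are lowercased and that the first value seen for a canonical key is kept.

```python
def merge_aliases_sketch(attrs: dict, preferred_keys: dict) -> dict:
    """Merge keys that differ only by case and rename known aliases."""
    merged = {}
    for key, value in attrs.items():
        if key == "featureType":
            # featureType is the documented exception: its casing is kept.
            canonical = key
        else:
            low = key.lower()
            # Assumption: keys without a preferred name are simply lowercased.
            canonical = preferred_keys.get(low, low)
        # Assumption: the first value seen for a canonical key wins.
        merged.setdefault(canonical, value)
    return merged


example = merge_aliases_sketch(
    {"Title": "MOC", "title": "MOC", "PI": "A. Person", "featureType": "timeSeries"},
    {"pi": "principal_investigator"},
)
```

Here "Title" and "title" collapse into one entry, "PI" is renamed through the mapping, and "featureType" passes through unchanged.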
- amocarray.standardise.normalize_and_add_vocabulary(attrs: dict, normalizations: dict[str, tuple[dict[str, str], str]]) -> dict
For each (attr, (value_map, vocab_url)) in normalizations, if attr exists in attrs:
  - Map attrs[attr] using value_map (or leave it unchanged if unmapped).
  - Add attrs[f"{attr}_vocabulary"] = vocab_url.
- Parameters:
attrs (dict) – Metadata attributes, already cleaned & renamed.
normalizations (dict) – Keys are canonical attr names (e.g. “platform”), values are (value_map, vocabulary_url) tuples.
- Returns:
attrs with normalized values and added <attr>_vocabulary entries.
- Return type:
dict
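The loop described for normalize_and_add_vocabulary can be written out in a few lines. The sketch below follows the documented steps; the function name, the example.org URLs and the decision to return a copy rather than modify attrs in place are assumptions made for illustration.

```python
def normalize_and_add_vocabulary_sketch(attrs: dict, normalizations: dict) -> dict:
    """Map attribute values through value_map and record the vocabulary URL."""
    out = dict(attrs)  # assumption: work on a copy instead of mutating the input
    for attr, (value_map, vocab_url) in normalizations.items():
        if attr in out:
            # Unmapped values are left unchanged.
            out[attr] = value_map.get(out[attr], out[attr])
            out[f"{attr}_vocabulary"] = vocab_url
    return out


result = normalize_and_add_vocabulary_sketch(
    {"platform": "Mooring", "title": "x"},
    {
        "platform": ({"Mooring": "mooring"}, "https://example.org/vocab/platform"),
        "instrument": ({}, "https://example.org/vocab/instrument"),
    },
)
```

In this example only "platform" gains a companion "platform_vocabulary" entry, because "instrument" is absent from attrs.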
- amocarray.standardise.reorder_metadata(attrs: dict) -> dict
Return a new dict with keys ordered according to the OG1.0 global‐attribute list. Any attrs not in the spec list are appended at the end, in their original order.
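The ordering rule can be sketched as below. SPEC_ORDER is a short placeholder list, not the actual OG1.0 attribute list, and reorder_sketch is a hypothetical name.

```python
# Placeholder ordering; the real OG1.0 global-attribute list is longer.
SPEC_ORDER = ["title", "platform", "id"]


def reorder_sketch(attrs: dict, spec_order=SPEC_ORDER) -> dict:
    """Return a new dict: spec keys first, then the rest in original order."""
    ordered = {key: attrs[key] for key in spec_order if key in attrs}
    for key, value in attrs.items():
        if key not in ordered:
            ordered[key] = value
    return ordered


reordered = reorder_sketch({"zeta": 1, "id": "a", "title": "t", "alpha": 2})
```

This relies on Python dicts preserving insertion order, so no separate ordered container is needed.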
- amocarray.standardise.standardise_array(ds: Dataset, file_name: str, array_name: str) -> Dataset
Standardise a mooring array dataset using YAML-based metadata.
- Parameters:
ds (xr.Dataset) – Raw dataset loaded from a reader.
file_name (str) – Filename (e.g., ‘moc_transports.nc’) expected to match ds.attrs["source_file"].
array_name (str) – Name of the mooring array (e.g., ‘samba’, ‘rapid’, ‘move’, ‘osnap’, ‘fw2015’, ‘mocha’).
- Returns:
Standardised dataset with renamed variables and enriched metadata.
- Return type:
xr.Dataset
- Raises:
ValueError – If file_name does not match ds.attrs["source_file"].
plotters
Tools for visualising AMOC time series and transport data.
- amocarray.plotters.monthly_resample(da: DataArray) -> DataArray
Resample to monthly mean if data is not already monthly.
- amocarray.plotters.plot_amoc_timeseries(data, varnames=None, labels=None, colors=None, title='AMOC Time Series', ylabel=None, time_limits=None, ylim=None, figsize=(10, 3), resample_monthly=True, plot_raw=True)
Plot original and optionally monthly-averaged AMOC time series for one or more datasets.
- Parameters:
data (list of xarray.Dataset or xarray.DataArray) – List of datasets or DataArrays to plot.
varnames (list of str, optional) – List of variable names to extract from each dataset. Not needed if DataArrays are passed.
labels (list of str, optional) – Labels for the legend.
colors (list of str, optional) – Colors for monthly-averaged plots.
title (str) – Title of the plot.
ylabel (str, optional) – Label for the y-axis. If None, inferred from attributes.
time_limits (tuple of str or pd.Timestamp, optional) – X-axis time limits (start, end).
ylim (tuple of float, optional) – Y-axis limits (min, max).
figsize (tuple) – Size of the figure.
resample_monthly (bool) – If True, monthly averages are computed and plotted.
plot_raw (bool) – If True, raw data is plotted.
- amocarray.plotters.plot_monthly_anomalies(**kwargs) -> tuple[Figure, list[Axes]]
Plot the monthly anomalies for various datasets. Pass keyword arguments in the form: label_name_data, label_name_label. For example:
osnap_data = standardOSNAP[0]["MOC_all"], osnap_label = "OSNAP" …
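The paired-keyword convention can be illustrated with a small parser. This is a sketch of how such pairs might be collected, not the function's actual logic; pair_kwargs_sketch is a hypothetical helper, and falling back to the prefix when no matching label is supplied is an assumption.

```python
def pair_kwargs_sketch(**kwargs) -> dict:
    """Group <prefix>_data / <prefix>_label keyword arguments by prefix."""
    pairs = {}
    for key, value in kwargs.items():
        if key.endswith("_data"):
            prefix = key[: -len("_data")]
            # Assumption: use the prefix itself when no label is given.
            label = kwargs.get(f"{prefix}_label", prefix)
            pairs[prefix] = (label, value)
    return pairs


pairs = pair_kwargs_sketch(osnap_data=[1, 2], osnap_label="OSNAP", rapid_data=[3])
```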
- amocarray.plotters.show_attributes(data: str | Dataset) -> DataFrame
Processes an xarray Dataset or a netCDF file, extracts attribute information, and returns a DataFrame with details about the attributes.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
- Returns:
A DataFrame containing the following columns:
  - Attribute: The name of the attribute.
  - Value: The value of the attribute.
  - DType: The data type of the attribute.
- Return type:
pandas.DataFrame
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
- amocarray.plotters.show_contents(data: str | Dataset, content_type: str = 'variables') -> Styler | DataFrame
Wrapper function to show contents of an xarray Dataset or a netCDF file.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
content_type (str, optional) – The type of content to show, either ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’). Default is ‘variables’.
- Returns:
A styled DataFrame with details about the variables or attributes.
- Return type:
pandas.io.formats.style.Styler or pandas.DataFrame
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
ValueError – If the content_type is not ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’).
- amocarray.plotters.show_variables(data: str | Dataset) -> Styler
Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
- Returns:
A styled DataFrame containing the following columns:
  - dims: The dimension of the variable (or “string” if it is a string type).
  - name: The name of the variable.
  - units: The units of the variable (if available).
  - comment: Any additional comments about the variable (if available).
  - standard_name: The standard name of the variable (if available).
  - dtype: The data type of the variable.
- Return type:
pandas.io.formats.style.Styler
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
- amocarray.plotters.show_variables_by_dimension(data: str | Dataset, dimension_name: str = 'trajectory') -> Styler
Extracts variable information from an xarray Dataset or a netCDF file and returns a styled DataFrame with details about the variables filtered by a specific dimension.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
dimension_name (str, optional) – The name of the dimension to filter variables by, by default “trajectory”.
- Returns:
A styled DataFrame containing the following columns:
  - dims: The dimension of the variable (or “string” if it is a string type).
  - name: The name of the variable.
  - units: The units of the variable (if available).
  - comment: Any additional comments about the variable (if available).
- Return type:
pandas.io.formats.style.Styler
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
writers
- amocarray.writers.save_dataset(ds: Dataset, output_file: str = '../test.nc') -> bool
Attempts to save the dataset to a NetCDF file. If a TypeError occurs due to invalid attribute values, it converts the invalid attributes to strings and retries the save operation.
- Parameters:
ds (xarray.Dataset) – The dataset to be saved.
output_file (str, optional) – The path to the output NetCDF file. Defaults to ‘../test.nc’.
- Returns:
True if the dataset was saved successfully, False otherwise.
- Return type:
bool
Notes
This function is based on a workaround for issues with saving datasets containing attributes of unsupported types. See: https://github.com/pydata/xarray/issues/3743
tools
Helper functions for data manipulation, unit conversion, and clean-up.
- amocarray.tools.convert_units_var(var_values: ndarray | float, current_unit: str, new_unit: str, unit_conversion: dict[str, dict[str, float]] = {'Celsius': {'degrees_Celsius': 1.0}, 'Pa': {'dbar': 0.0001}, 'S/m': {'mS/cm': 0.1}, 'Sv': {'Sverdrup': 1.0}, 'Sverdrup': {'Sv': 1}, 'cm': {'m': 0.01}, 'cm s-1': {'m s-1': 0.01}, 'cm/s': {'m/s': 0.01}, 'dbar': {'Pa': 10000, 'kPa': 10}, 'degrees_Celsius': {'Celsius': 1}, 'g m-3': {'kg m-3': 0.001}, 'kPa': {'dbar': 0.1}, 'kg m-3': {'g m-3': 1000.0}, 'km': {'m': 1000.0}, 'm': {'cm': 100, 'km': 0.001}, 'm s-1': {'cm s-1': 100.0}, 'm/s': {'cm/s': 100.0}, 'mS/cm': {'S/m': 10.0}}) -> ndarray | float
Converts variable values from one unit to another using a predefined conversion factor.
- Parameters:
var_values (numpy.ndarray or float) – The values to be converted.
current_unit (str) – The current unit of the variable values.
new_unit (str) – The target unit to which the variable values should be converted.
unit_conversion (dict of {str: dict of {str: float}}, optional) – A dictionary containing conversion factors between units. The default is unit_conversion.
- Returns:
The converted variable values. If no conversion factor is found, the original values are returned.
- Return type:
numpy.ndarray or float
- Raises:
KeyError – If the conversion factor for the specified units is not found in the unit_conversion dictionary.
Notes
If the conversion factor for the specified units is not available, a message is printed, and the original values are returned without any conversion.
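The lookup-and-multiply step can be sketched as follows. The sketch uses a three-entry excerpt of the default table shown in the signature and follows the behaviour described under Returns and Notes (print a message and return the input unchanged); convert_units_sketch and the exact message text are hypothetical.

```python
# Excerpt of the default conversion table shown in the signature above.
UNIT_CONVERSION = {
    "cm": {"m": 0.01},
    "m": {"cm": 100, "km": 0.001},
    "dbar": {"Pa": 10000, "kPa": 10},
}


def convert_units_sketch(values, current_unit, new_unit, table=UNIT_CONVERSION):
    """Multiply values by the factor for current_unit -> new_unit, if known."""
    try:
        factor = table[current_unit][new_unit]
    except KeyError:
        # Assumption: fall back to the unconverted values, as the Notes describe.
        print(f"No conversion available from {current_unit} to {new_unit}.")
        return values
    return values * factor


pressure_kpa = convert_units_sketch(2.0, "dbar", "kPa")
unchanged = convert_units_sketch(5.0, "m", "Sv")
```

Because the conversion is a single multiplication, the same code would also work elementwise if values were a NumPy array.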
- amocarray.tools.find_best_dtype(var_name: str, da: DataArray) -> dtype
Determines the most suitable data type for a given variable.
- Parameters:
var_name (str) – The name of the variable.
da (xarray.DataArray) – The data array containing the variable’s values.
- Returns:
The optimal data type for the variable based on its name and values.
- Return type:
numpy.dtype
- amocarray.tools.generate_reverse_conversions(forward_conversions: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]
Create a unit conversion dictionary with both forward and reverse conversions.
- Parameters:
forward_conversions (dict of {str: dict of {str: float}}) – Mapping of source units to target units and conversion factors. Example: {"m": {"cm": 100, "km": 0.001}}
- Returns:
Complete mapping of units including reverse conversions. Example: {"cm": {"m": 0.01}, "km": {"m": 1000}}
- Return type:
dict of {str: dict of {str: float}}
Notes
If a conversion factor is zero, a warning is printed, and the reverse conversion is skipped.
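A minimal sketch of building the reverse entries is given below. It follows the documented behaviour (invert each factor, skip zeros with a warning); the function name, the warning text, and the choice not to overwrite a conversion that was supplied explicitly are assumptions.

```python
def generate_reverse_conversions_sketch(forward: dict) -> dict:
    """Return the forward table extended with reciprocal reverse entries."""
    complete = {src: dict(targets) for src, targets in forward.items()}
    for src, targets in forward.items():
        for dst, factor in targets.items():
            if factor == 0:
                print(f"Warning: zero factor for {src} -> {dst}; reverse skipped.")
                continue
            # Assumption: an explicitly supplied conversion is never overwritten.
            complete.setdefault(dst, {}).setdefault(src, 1 / factor)
    return complete


table = generate_reverse_conversions_sketch({"m": {"cm": 100, "km": 0.001}})
```

With the documented example input, the result keeps the original "m" entries and adds "cm" and "km" entries pointing back to "m".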
- amocarray.tools.reformat_units_var(ds: Dataset, var_name: str, unit_format: dict[str, str] = {'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'm/s': 'm s-1', 'meters': 'm'}) -> str
Reformat the units of a variable in the dataset based on a provided mapping.
- Parameters:
ds (xarray.Dataset) – The input dataset containing variables with units to be reformatted.
var_name (str) – The name of the variable whose units need to be reformatted.
unit_format (dict of {str: str}, optional) – A dictionary mapping old unit strings to new formatted unit strings. Defaults to unit_str_format.
- Returns:
The reformatted unit string. If the old unit is not found in unit_format, the original unit string is returned.
- Return type:
str
- amocarray.tools.set_best_dtype(ds: Dataset) -> Dataset
Adjust the data types of variables in a dataset to optimize memory usage.
- Parameters:
ds (xarray.Dataset) – The input dataset whose variables’ data types will be adjusted.
- Returns:
The dataset with updated data types for its variables, potentially saving memory.
- Return type:
xarray.Dataset
Notes
The function determines the best data type for each variable using find_best_dtype.
Attributes like valid_min and valid_max are updated to match the new data type.
If the new data type is integer-based, NaN values are replaced with a fill value.
Logs the percentage of memory saved after the data type adjustments.
- amocarray.tools.set_fill_value(new_dtype: dtype) -> int
Calculate the fill value for a given data type.
- Parameters:
new_dtype (numpy.dtype) – The data type for which the fill value is to be calculated.
- Returns:
The calculated fill value based on the bit-width of the data type.
- Return type:
int
utilities
Shared utilities for downloading, reading, and parsing data files.
- amocarray.utilities.apply_defaults(default_source: str, default_files: List[str]) -> Callable
Decorator to apply default values for ‘source’ and ‘file_list’ parameters if they are None.
- Parameters:
default_source (str) – Default source URL or path.
default_files (list of str) – Default list of filenames.
- Returns:
A wrapped function with defaults applied.
- Return type:
Callable
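A decorator of this kind might look like the sketch below. It is an illustration under assumptions, not the package's code: apply_defaults_sketch, read_example and the example.org URL are hypothetical, and the wrapped function is assumed to take source and file_list as its first two parameters.

```python
import functools


def apply_defaults_sketch(default_source, default_files):
    """Return a decorator that fills in source and file_list when None."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(source=None, file_list=None, *args, **kwargs):
            if source is None:
                source = default_source
            if file_list is None:
                file_list = default_files
            return func(source, file_list, *args, **kwargs)
        return wrapper
    return decorator


@apply_defaults_sketch("https://example.org/data/", ["transports.nc"])
def read_example(source, file_list, transport_only=True):
    """Hypothetical reader that just echoes its resolved arguments."""
    return source, file_list, transport_only
```

Keeping the defaults in the decorator lets each reader declare its own source and file list once, next to its definition.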
- amocarray.utilities.download_file(url: str, dest_folder: str, redownload: bool = False, filename: str = None) -> str
Download a file from HTTP(S) or FTP to the specified destination folder.
- Parameters:
url (str) – The URL of the file to download.
dest_folder (str) – Local folder to save the downloaded file.
redownload (bool, optional) – If True, force re-download of the file even if it exists.
filename (str, optional) – Optional filename to save the file as. If not given, uses the name from the URL.
- Returns:
The full path to the downloaded file.
- Return type:
str
- Raises:
ValueError – If the URL scheme is unsupported.
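The scheme check and the filename fallback can be sketched without any network access. The helper below only computes the destination path; plan_download_sketch is a hypothetical name and the actual download step is omitted.

```python
import os
import posixpath
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"http", "https", "ftp"}


def plan_download_sketch(url: str, dest_folder: str, filename: str = None) -> str:
    """Validate the URL scheme and return the local path a download would use."""
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    # Fall back to the last component of the URL path when no name is given.
    name = filename or posixpath.basename(parsed.path)
    return os.path.join(dest_folder, name)
```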
- amocarray.utilities.get_project_root() -> Path
Return the absolute path to the project root directory.
- amocarray.utilities.load_array_metadata(array_name: str) -> dict
Load metadata YAML for a given mooring array.
- Parameters:
array_name (str) – Name of the mooring array (e.g., ‘samba’).
- Returns:
Dictionary containing the parsed YAML metadata.
- Return type:
dict
- amocarray.utilities.normalize_whitespace(attrs: dict) -> dict
Replace non-breaking and other unusual whitespace in every string attribute value with a normal ASCII space, and collapse runs of whitespace to a single space.
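A regular expression is one straightforward way to do this. The sketch below is an assumption about the approach, not the package's code; normalize_whitespace_sketch is a hypothetical name, and non-string values are passed through untouched.

```python
import re


def normalize_whitespace_sketch(attrs: dict) -> dict:
    """Collapse any run of whitespace in string values to one ASCII space."""
    cleaned = {}
    for key, value in attrs.items():
        if isinstance(value, str):
            # In str patterns, \s also matches Unicode whitespace such as U+00A0.
            value = re.sub(r"\s+", " ", value)
        cleaned[key] = value
    return cleaned


cleaned = normalize_whitespace_sketch({"title": "RAPID\u00a0 26N\t\tarray", "n": 3})
```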
- amocarray.utilities.parse_ascii_header(file_path: str, comment_char: str = '%') -> Tuple[List[str], int]
Parse the header of an ASCII file to extract column names and the number of header lines.
Header lines are identified by the given comment character (default: ‘%’). Columns are defined in lines like: ‘<comment_char> Column 1: <column_name>’.
- Parameters:
file_path (str) – Path to the ASCII file.
comment_char (str, optional) – Character used to identify header lines. Defaults to ‘%’.
- Returns:
A tuple containing:
  - A list of column names extracted from the header.
  - The number of header lines to skip.
- Return type:
tuple of (list of str, int)
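The header format described above can be parsed with a short regular expression. The sketch works on a list of lines instead of a file path so that it stays self-contained; parse_header_lines_sketch is a hypothetical name, and it assumes the header is a contiguous block at the top of the file.

```python
import re


def parse_header_lines_sketch(lines, comment_char="%"):
    """Return (column names, number of header lines) for a commented header."""
    pattern = re.compile(
        rf"^{re.escape(comment_char)}\s*Column\s+\d+:\s*(.+?)\s*$"
    )
    columns, n_header = [], 0
    for line in lines:
        if not line.startswith(comment_char):
            break  # assumption: the header ends at the first data line
        n_header += 1
        match = pattern.match(line)
        if match:
            columns.append(match.group(1))
    return columns, n_header


header = parse_header_lines_sketch(
    ["% Example file\n", "% Column 1: time\n", "% Column 2: moc_sv\n", "1.0 2.0\n"]
)
```

The returned line count can then be passed to a reader as the number of rows to skip.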
- amocarray.utilities.read_ascii_file(file_path: str, comment_char: str = '#') -> DataFrame
Read an ASCII file into a pandas DataFrame, skipping lines starting with a specified comment character.
- Parameters:
file_path (str) – Path to the ASCII file.
comment_char (str, optional) – Character denoting comment lines. Defaults to ‘#’.
- Returns:
The loaded data as a pandas DataFrame.
- Return type:
pd.DataFrame
- amocarray.utilities.resolve_file_path(file_name: str, source: str | Path | None, download_url: str | None, local_data_dir: Path, redownload: bool = False) -> Path
Resolve the path to a data file, using local source, cache, or downloading if necessary.
- Parameters:
file_name (str) – The name of the file to resolve.
source (str or Path or None) – Optional local source directory.
download_url (str or None) – URL to download the file if needed.
local_data_dir (Path) – Directory where downloaded files are stored.
redownload (bool, optional) – If True, force redownload even if cached file exists.
- Returns:
Path to the resolved file.
- Return type:
Path
- amocarray.utilities.safe_update_attrs(ds: Dataset, new_attrs: Dict[str, str], overwrite: bool = False, verbose: bool = True) -> Dataset
Safely update attributes of an xarray Dataset without overwriting existing keys, unless explicitly allowed.
- Parameters:
ds (xr.Dataset) – The xarray Dataset whose attributes will be updated.
new_attrs (dict of {str: str}) – Dictionary of new attributes to add.
overwrite (bool, optional) – If True, allow overwriting existing attributes. Defaults to False.
verbose (bool, optional) – If True, emit a warning when skipping existing attributes. Defaults to True.
- Returns:
The dataset with updated attributes.
- Return type:
xr.Dataset
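The overwrite and verbose flags interact as in the sketch below, which uses a plain dict in place of ds.attrs so that it runs without xarray. safe_update_sketch and the warning text are hypothetical.

```python
import warnings


def safe_update_sketch(attrs: dict, new_attrs: dict,
                       overwrite: bool = False, verbose: bool = True) -> dict:
    """Add new_attrs to attrs, skipping existing keys unless overwrite is set."""
    for key, value in new_attrs.items():
        if key in attrs and not overwrite:
            if verbose:
                warnings.warn(f"Attribute {key!r} already exists; skipping.")
            continue
        attrs[key] = value
    return attrs


kept = safe_update_sketch({"title": "old"}, {"title": "new", "source": "x"}, verbose=False)
replaced = safe_update_sketch({"title": "old"}, {"title": "new"}, overwrite=True)
```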
- amocarray.utilities.validate_array_yaml(array_name: str, verbose: bool = True) -> bool
Validate the structure and required fields of an array-level metadata YAML.
- Parameters:
array_name (str) – The array name (e.g., ‘samba’).
verbose (bool) – If True, print detailed validation messages.
- Returns:
True if validation passes, False otherwise.
- Return type:
bool