amocatlas API
API Reference
- read module
- readers (Legacy API)
- data_sources package
- plotters
  format_units_for_plotting(), format_variable_name_for_plotting(), monthly_resample(), plot_all_moc_overlaid_pygmt(), plot_all_moc_pygmt(), plot_amoc_2d_data(), plot_amoc_timeseries(), plot_bryden2005_pygmt(), plot_moc_timeseries_pygmt(), plot_monthly_anomalies(), plot_osnap_components_pygmt(), plot_rapid_components_pygmt(), show_attributes(), show_contents(), show_variables(), show_variables_by_dimension()
- writers
- tools
- standardise
  clean_metadata(), get_dynamic_version(), merge_metadata_aliases(), normalize_and_add_vocabulary(), reorder_metadata(), resolve_metadata_conflict(), standardise_41n(), standardise_47n(), standardise_arcticgateway(), standardise_array(), standardise_calafat2025(), standardise_data(), standardise_dso(), standardise_fbc(), standardise_fw2015(), standardise_mocha(), standardise_move(), standardise_osnap(), standardise_rapid(), standardise_samba(), standardise_zheng2024(), standardize_depth_coordinate(), standardize_latitude_coordinate(), standardize_longitude_coordinate(), standardize_sigma0_coordinate(), standardize_time_coordinate(), standardize_units()
- utilities
  apply_defaults(), apply_unit_standardization_after_metadata(), download_file(), find_data_start(), get_default_data_dir(), get_project_root(), get_standard_unit_mappings(), is_valid_url(), load_array_metadata(), mask_invalid_values(), normalize_whitespace(), parse_ascii_header(), read_ascii_file(), resolve_file_path(), safe_update_attrs(), sanitize_variable_name(), standardize_dataset_units(), validate_array_yaml()
Load and process transport estimates from major AMOC observing arrays.
New Intuitive API (v0.2.0+)
The recommended API that returns standardized, analysis-ready data by default.
Intuitive namespace API for AMOCatlas data readers.
This module provides a more user-friendly API for accessing AMOC array data with discoverable function names and consistent return types. Each array gets its own function with IDE autocompletion support.
Key improvements over readers.load_dataset():
- Single dataset returned by default (the most common use case)
- all_files=True parameter for power users who need multiple files
- Array-specific parameters feel natural (e.g., version for OSNAP)
- IDE autocompletion works for array names
Examples
- Basic usage (single dataset):
>>> from amocatlas import read
>>> data = read.rapid()  # Single transport dataset
>>> osnap = read.osnap(version="2025")  # Latest OSNAP data
>>> arctic = read.arcticgateway()  # Arctic gateway transports
- Power user access (multiple datasets):
>>> all_rapid = read.rapid(all_files=True)  # List of all RAPID files
>>> all_osnap = read.osnap(all_files=True)  # List of all OSNAP files
- Custom parameters:
>>> rapid_custom = read.rapid(
...     source="https://my-mirror.com/rapid/",
...     transport_only=False,
...     redownload=True,
... )
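Because each reader returns either a single dataset (default) or a list of datasets (all_files=True), code that should handle both cases can normalize the result first. The helper below is a hypothetical convenience function, not part of amocatlas; it relies only on the documented return type Dataset | List[Dataset].

```python
from typing import Any, List


def as_dataset_list(result: Any) -> List[Any]:
    """Return reader output as a list, whether it held one dataset or many.

    Hypothetical helper (not part of amocatlas): read.<array>() returns a
    single xr.Dataset by default and a list of datasets when all_files=True.
    """
    # A list or tuple means all_files=True was used; copy it into a new list.
    if isinstance(result, (list, tuple)):
        return list(result)
    # Anything else is treated as a single dataset and wrapped in a list.
    return [result]
```

With this in place, a loop such as `for ds in as_dataset_list(read.rapid(all_files=True)):` behaves the same whether or not all_files is set.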
- amocatlas.read.arcticgateway(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load Arctic Gateway array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
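The track_added_attrs description says calling code should extract and remove the temporary ‘_amocatlas_metadata_changes’ attribute. A minimal sketch of that step follows. The function name pop_metadata_changes is hypothetical, and the storage form is an assumption: the docs only show the attribute's contents, so the sketch accepts either a dict or a JSON string in ds.attrs. It works on any object with an attrs mapping, such as an xr.Dataset.

```python
import json
from typing import Any, Dict, List

CHANGES_KEY = "_amocatlas_metadata_changes"


def pop_metadata_changes(ds: Any) -> Dict[str, List[str]]:
    """Remove the temporary change-tracking attribute and return its contents.

    Hypothetical helper: assumes ds.attrs[CHANGES_KEY] holds either a dict or
    a JSON string of the form {"added": [...], "modified": [...]}.
    """
    # Pop the attribute so it does not leak into saved files or reports.
    raw = ds.attrs.pop(CHANGES_KEY, None)
    if raw is None:
        # Nothing was tracked (e.g. the dataset was loaded with track_added_attrs=False).
        return {"added": [], "modified": []}
    changes = json.loads(raw) if isinstance(raw, str) else dict(raw)
    # Normalize to the two documented keys, defaulting to empty lists.
    return {
        "added": list(changes.get("added", [])),
        "modified": list(changes.get("modified", [])),
    }
```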
- amocatlas.read.calafat2025(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load the Calafat et al. (2025) dataset.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.dso(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load Denmark Strait Overflow array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.fbc(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load Faroe Bank Channel array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.fw2015(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load the Frajka-Williams (2015) dataset.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.mocha(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load MOCHA array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.move(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load MOVE 16°N array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.noac47n(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load 47°N array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.osnap(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load OSNAP array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version to load (e.g., "2025"). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.rapid(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load RAPID 26°N array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.samba(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load SAMBA 34.5°S array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.wh41n(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load 41°N array data.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
- amocatlas.read.zheng2024(source: str | Path | None = None, file_list: str | List[str] | None = None, transport_only: bool = True, all_files: bool = False, raw: bool = False, data_dir: str | Path | None = None, redownload: bool = False, version: str = None, track_added_attrs: bool = False) -> Dataset | List[Dataset]
Load the Zheng et al. (2024) dataset.
By default, returns standardized, analysis-ready data with consistent variable names, metadata, and units following oceanographic conventions. Use raw=True to get data in original format from the source files.
- Parameters:
source (str, Path, or None, optional) – URL or local path to the data source.
file_list (str, list of str, or None, optional) – Specific files to load. Defaults to transport files.
transport_only (bool, optional) – If True, load only transport data. Default: True.
all_files (bool, optional) – If True, return list of all datasets. If False, return single dataset. Default: False.
raw (bool, optional) – If True, return data in original format without standardization. If False (default), apply standardization for analysis-ready data.
data_dir (str, Path, or None, optional) – Local directory for data storage.
redownload (bool, optional) – Force redownload of data. Default: False.
version (str, optional) – Dataset version (ignored for this array). Default: None.
track_added_attrs (bool, optional) – INTERNAL USE ONLY - Track which attributes were added during metadata enrichment. When True, embeds a temporary ‘_amocatlas_metadata_changes’ attribute in each returned dataset containing {“added”: […], “modified”: […]}. This attribute should be extracted and removed by calling code (e.g., report generation). Not intended for end users. Default: False.
- Returns:
Standardized dataset (default) or raw dataset if raw=True. Single dataset by default, or list of datasets if all_files=True.
- Return type:
xr.Dataset or list of xr.Dataset
Notes
Standardization includes:
- Consistent variable names across arrays
- Proper CF-compliant metadata and attributes
- Standardized units following oceanographic conventions
- Additional quality control and formatting
readers (Legacy API)
Legacy API that returns raw data. Still supported for backwards compatibility.
AMOCatlas data readers: unified interface for AMOC observing arrays.
This module provides the main interface for loading data from multiple Atlantic Meridional Overturning Circulation (AMOC) observing arrays. It serves as the orchestrator that routes requests to specific array readers and provides both sample and full dataset loading capabilities.
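The module supports data from:
- RAPID (26°N)
- MOVE (16°N)
- OSNAP (Subpolar North Atlantic)
- SAMBA (34.5°S)
- MOCHA, 41°N, 47°N, DSO, FBC (Faroe Bank Channel), Arctic Gateway, and FW2015
- The Calafat et al. (2025) and Zheng et al. (2024) datasets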
Main functions:
- load_dataset(): Load full datasets from any supported array
- load_sample_dataset(): Load small sample datasets for testing
- amocatlas.readers.load_dataset(array_name: str, source: str = None, file_list: str | list[str] = None, transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) -> list[Dataset]
Load raw datasets from a selected AMOC observing array.
Deprecated: this function will be removed in a future version. Use the new intuitive API instead: amocatlas.read (e.g., amocatlas.read.rapid()).
- Parameters:
array_name (str) – The name of the observing array to load. Options are:
  - ‘move’ : MOVE 16N array
  - ‘rapid’ : RAPID 26N array
  - ‘osnap’ : OSNAP array (2014-2022, configurable version via main reader)
  - ‘osnap_2025’ : OSNAP array (2014-2022, dedicated 2025 reader function)
  - ‘samba’ : SAMBA 34S array
  - ‘fw2015’ : FW2015 array
  - ‘41n’ : 41N array
  - ‘dso’ : DSO array
  - ‘calafat2025’ : CALAFAT2025 array
  - ‘zheng2024’ : ZHENG2024 array
  - ‘47n’ : 47N array
  - ‘fbc’ : Faroe Bank Channel overflow array
  - ‘arcticgateway’ : ARCTIC Gateway array
source (str, optional) – URL or local path to the data source. If None, the reader-specific default source will be used.
file_list (str or list of str, optional) – Filename or list of filenames to process. If None, the reader-specific default files will be used.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Local directory for downloaded files.
redownload (bool, optional) – If True, force redownload of the data.
- Returns:
List of datasets loaded from the specified array.
- Return type:
list of xarray.Dataset
- Raises:
ValueError – If an unknown array name is provided.
- amocatlas.readers.load_sample_dataset(array_name: str = 'rapid') Dataset[source]
Load a sample dataset for quick testing.
Deprecated: This function is deprecated and will be removed in a future version. Use the new intuitive API instead: amocatlas.read (e.g., amocatlas.read.rapid()).
Currently supports: ‘rapid’, which loads the ‘RAPID_26N_TRANSPORT.nc’ file.
- Parameters:
array_name (str, optional) – The name of the observing array to load. Default is ‘rapid’.
- Returns:
A single xarray Dataset from the sample file.
- Return type:
xr.Dataset
- Raises:
ValueError – If the array_name is not recognised.
data_sources
Individual data source readers organized by array/dataset.
AMOCatlas data source readers.
This package contains individual reader modules for each AMOC data source, including observing arrays, datasets, and overflow measurements.
Each module provides a read function for accessing its specific data source with consistent interfaces and error handling.
Module naming convention:
- Arrays include latitude: rapid26n, move16n, osnap55n, samba34s
- Special locations: wh41n (Willis & Hobbs), noac47n (North Atlantic Ocean Current)
- Datasets by author/year: fw2015, calafat2025, zheng2024
- Overflow locations: dso (Denmark Strait), fbc (Faroe Bank Channel)
- amocatlas.data_sources.read_41n(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the 41N transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to 41N_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_47n(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the 47N transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to 47N_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_arcticgateway(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the ARCTIC Gateway transport dataset from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the ARCTIC data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to ARCTIC_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_calafat2025(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the CALAFAT2025 transport dataset from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the CALAFAT2025 data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to CALAFAT2025_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_dso(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the Denmark Strait Overflow (DSO) datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to DSO_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
Notes
The original DSO_transport_hourly_1996_2021.nc file contains an invalid DEPTH coordinate value (9.97e+36, which appears to be the netCDF default fill value for floats). This function detects this value, replaces the DEPTH with NaN, and records the correction in the dataset’s history attribute.
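The DEPTH correction described in the read_dso() Notes can be sketched without xarray as a tolerance check against a fill value followed by a history note. This is a minimal sketch, not the AMOCatlas implementation: it assumes the printed 9.97e+36 is the netCDF default float fill value (9.969209968386869e+36), and mask_fill_values() and append_history() are hypothetical helpers.

```python
import math

# Assumption: the bad DEPTH value is the netCDF default float fill value,
# which prints as 9.97e+36 when rounded to three significant figures.
NC_FILL_FLOAT = 9.969209968386869e36


def mask_fill_values(values, fill=NC_FILL_FLOAT, rel_tol=1e-6):
    """Return (cleaned, n_replaced), with values close to `fill` set to NaN."""
    cleaned = []
    n_replaced = 0
    for value in values:
        if math.isclose(value, fill, rel_tol=rel_tol):
            cleaned.append(math.nan)
            n_replaced += 1
        else:
            cleaned.append(value)
    return cleaned, n_replaced


def append_history(history, message):
    """Append a one-line note to a CF-style `history` attribute string."""
    return f"{history}\n{message}" if history else message


# Hypothetical single-valued DEPTH coordinate holding the fill value.
depth, n = mask_fill_values([NC_FILL_FLOAT])
history = append_history("", f"Replaced {n} fill-valued DEPTH entries with NaN")
```

A relative tolerance is used rather than exact equality so that the check still matches after a float32 to float64 round trip.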
- amocatlas.data_sources.read_fbc(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the FBC (Faroe Bank Channel) transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to FBC_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_fw2015(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the FW2015 transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to FW2015_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_mocha(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the MOCHA transport dataset from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the MOCHA data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to MOCHA_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_move(source: str, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the MOVE transport dataset from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the MOVE data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to MOVE_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added by AMOCatlas processing. Returns tuple (datasets, added_attrs_per_dataset) when enabled.
- Returns:
If track_added_attrs=False: List of loaded xarray datasets. If track_added_attrs=True: Tuple of (datasets, added_attrs_per_dataset) where added_attrs_per_dataset is a list of dictionaries containing ‘added’ and ‘modified’ attribute tracking information.
- Return type:
list of xr.Dataset or tuple
- Raises:
ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_osnap(source: str = None, file_list: str | list[str] = None, transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, version: str = '2025', track_added_attrs: bool = False) list[Dataset][source]
Load the OSNAP transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults depend on version: OSNAP_2025_DEFAULT_FILES for “2025”, OSNAP_DEFAULT_FILES for “2020”.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
version (str, optional) – Dataset version to use (“2025” for 2014-2022 data, “2020” for 2014-2020 data). Defaults to “2025” (latest version).
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If an invalid version is specified.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_osnap_2025(source: str = None, file_list: str | list[str] = None, transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False) list[Dataset][source]
Load the OSNAP 2025 datasets (2014-2022 coverage) from a URL or local file path.
This is a convenience function that calls read_osnap with version="2025".
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to OSNAP_2025_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- amocatlas.data_sources.read_rapid(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset] | tuple[list[Dataset], list[list[str]]][source]
Load the RAPID transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the NetCDF file(s). Defaults to the RAPID data repository URL.
file_list (str or list of str, optional) – Filename or list of filenames to process. If None, will attempt to list files in the source directory.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, return tuple of (datasets, list_of_metadata_changes_per_dataset). If False, return only datasets. Default is False.
- Returns:
If track_added_attrs=False: List of loaded datasets with metadata. If track_added_attrs=True: Tuple of (datasets, list of metadata changes per dataset).
- Return type:
list[xr.Dataset] or tuple[list[xr.Dataset], list[dict]]
- Raises:
ValueError – If the source is neither a valid URL nor a directory path.
FileNotFoundError – If no valid NetCDF files are found in the provided file list.
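Because readers such as read_rapid() and read_move() change their return type with track_added_attrs (a list of datasets, or a (datasets, changes) tuple), calling code may want to normalise the result in one place. The helper below, split_reader_result(), is hypothetical and not part of AMOCatlas; it relies only on the return shapes stated in the docstrings, and plain strings stand in for xr.Dataset objects in the test.

```python
from typing import Any, Optional


def split_reader_result(result: Any) -> tuple[list, Optional[list]]:
    """Return (datasets, metadata_changes); changes is None if untracked.

    Hypothetical helper: per the reader docstrings, a list of datasets is
    returned when track_added_attrs=False and a (datasets, changes) tuple
    when track_added_attrs=True.
    """
    if isinstance(result, tuple):
        datasets, changes = result
        return list(datasets), list(changes)
    return list(result), None
```

Checking for a tuple is sufficient here because the untracked return value is documented as a list.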
- amocatlas.data_sources.read_samba(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the SAMBA transport datasets from a remote URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – URL or local path to the dataset directory. If None, will use predefined URLs per file.
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to SAMBA_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
- amocatlas.data_sources.read_zheng2024(source: str | Path | None, file_list: str | list[str], transport_only: bool = True, data_dir: str | Path | None = None, redownload: bool = False, track_added_attrs: bool = False) list[Dataset][source]
Load the ZHENG2024 transport datasets from a URL or local file path into xarray Datasets.
- Parameters:
source (str, optional) – Local path to the data directory (remote source is handled per-file).
file_list (str or list of str, optional) – Filename or list of filenames to process. Defaults to ZHENG2024_DEFAULT_FILES.
transport_only (bool, optional) – If True, restrict to transport files only.
data_dir (str, Path or None, optional) – Optional local data directory.
redownload (bool, optional) – If True, force redownload of the data.
track_added_attrs (bool, optional) – If True, track which attributes were added during metadata enrichment.
- Returns:
List of loaded xarray datasets with basic inline and file-specific metadata.
- Return type:
list of xr.Dataset
- Raises:
ValueError – If no source is provided for a file and no default URL mapping is found.
FileNotFoundError – If the file cannot be downloaded or does not exist locally.
standardise
Functions to apply naming conventions, units, and metadata standards to datasets.
Standardisation functions for AMOC observing array datasets.
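These functions take raw loaded datasets and:
- Rename variables to standard names
- Add variable-level metadata
- Add or update global attributes
- Prepare datasets for downstream analysis
Array-specific functions are provided for 41N, 47N, Arctic Gateway, CALAFAT2025, DSO, FBC, FW2015, MOCHA, MOVE, OSNAP, RAPID, SAMBA and ZHENG2024; standardise_array(), standardise_rapid() and standardise_samba() are deprecated in favour of the YAML-based standardise_data().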
- amocatlas.standardise.clean_metadata(attrs: dict, preferred_keys: dict = None) dict[source]
Clean up a metadata dictionary by:
- Normalizing key casing
- Merging aliases with identical values
- Applying standard naming (via the preferred_keys mapping)
- amocatlas.standardise.get_dynamic_version() str[source]
Get the actual software version using multiple detection methods.
Priority:
1. Git describe (for development in a git repository)
2. Installed package version (for pip/conda installs)
3. Fallback to the __version__ file
- Returns:
Software version string
- Return type:
str
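The three-step priority of get_dynamic_version() can be sketched with the standard library alone. This is a minimal sketch, assuming `git describe --tags --always --dirty` as the git step; detect_version() and its fallback argument are hypothetical, and the fallback string stands in for reading a __version__ file.

```python
import importlib.metadata
import subprocess


def detect_version(package: str, repo_dir: str = ".", fallback: str = "0+unknown") -> str:
    """Return a version string from git, then package metadata, then a fallback."""
    # 1. `git describe` works only inside a git checkout with git on PATH.
    try:
        proc = subprocess.run(
            ["git", "describe", "--tags", "--always", "--dirty"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
        if proc.stdout.strip():
            return proc.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        pass  # not a repository, git missing, bad cwd, or git failed
    # 2. Version recorded by pip/conda for an installed distribution.
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        pass
    # 3. Stand-in for reading a __version__ file.
    return fallback
```

Each step swallows only the errors that mean "this source is unavailable", so the next step is always tried.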
- amocatlas.standardise.merge_metadata_aliases(attrs: dict, preferred_keys: dict) dict[source]
Consolidate and rename metadata keys case‑insensitively (except featureType), using preferred_keys to map aliases to canonical names.
- Parameters:
attrs (dict) – Metadata dictionary with potential duplicates.
preferred_keys (dict) – Mapping of lowercase alias keys to preferred canonical keys.
- Returns:
Metadata dictionary with duplicates merged and keys renamed.
- Return type:
dict
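Case-insensitive alias merging, as described for merge_metadata_aliases(), can be sketched as grouping keys by their lowercase form, with preferred_keys naming the canonical key and featureType left untouched. This is one plausible reading, not the library's code: merge_aliases() is hypothetical, unmapped keys keep their first-seen spelling, and duplicates simply keep the first value (the real function may resolve conflicts differently, e.g. via resolve_metadata_conflict()).

```python
def merge_aliases(attrs: dict, preferred_keys: dict) -> dict:
    """Merge case-insensitive duplicates and rename aliases to canonical keys."""
    merged: dict = {}
    seen_groups: set = set()
    for key, value in attrs.items():
        if key == "featureType":
            group, name = key, key  # exempt from case folding
        else:
            lower = key.lower()
            # Aliases share a group under their canonical name.
            group = preferred_keys.get(lower, lower)
            name = preferred_keys.get(lower, key)
        if group in seen_groups:
            continue  # assumption: the first value seen wins
        seen_groups.add(group)
        merged[name] = value
    return merged
```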
- amocatlas.standardise.normalize_and_add_vocabulary(attrs: dict, normalizations: dict[str, tuple[dict[str, str], str]]) dict[source]
For each (attr, (value_map, vocab_url)) in normalizations, if attr exists in attrs:
map attrs[attr] through value_map (leaving it unchanged if unmapped), and
add attrs[f"{attr}_vocabulary"] = vocab_url.
- Parameters:
attrs (dict) – Metadata attributes, already cleaned & renamed.
normalizations (dict) – Keys are canonical attr names (e.g. “platform”), values are (value_map, vocabulary_url) tuples.
- Returns:
attrs with normalized values and added <attr>_vocabulary entries.
- Return type:
dict
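The normalization loop described for normalize_and_add_vocabulary() can be sketched as follows. This is a minimal sketch: normalize_with_vocab() is a hypothetical name, it returns a copy rather than mutating its input (the real function's behaviour here is not stated), and the example.org URLs are placeholders.

```python
def normalize_with_vocab(attrs: dict, normalizations: dict) -> dict:
    """Map known attribute values and add `<attr>_vocabulary` entries."""
    out = dict(attrs)  # copy, so the caller's dict is left unchanged
    for attr, (value_map, vocab_url) in normalizations.items():
        if attr in out:
            # Unmapped values pass through unchanged.
            out[attr] = value_map.get(out[attr], out[attr])
            out[f"{attr}_vocabulary"] = vocab_url
    return out
```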
- amocatlas.standardise.reorder_metadata(attrs: dict) dict[source]
Return a new dict with keys ordered according to the OG1.0 global‐attribute list. Any attrs not in the spec list are appended at the end, in their original order.
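The ordering rule of reorder_metadata() (spec keys first, remaining keys appended in their original order) is short enough to sketch directly. reorder_by_spec() is hypothetical, and the spec list in the test is an illustrative excerpt, not the actual OG1.0 global-attribute list.

```python
def reorder_by_spec(attrs: dict, spec_order: list) -> dict:
    """Return a new dict: spec keys first (in spec order), then the rest."""
    ordered = {key: attrs[key] for key in spec_order if key in attrs}
    for key, value in attrs.items():
        if key not in ordered:
            ordered[key] = value  # non-spec keys keep their original order
    return ordered
```

This relies on Python dicts preserving insertion order (guaranteed since Python 3.7).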
- amocatlas.standardise.resolve_metadata_conflict(key: str, existing_value: str, new_value: str, existing_source: str = 'unknown', new_source: str = 'unknown') str[source]
Resolve metadata conflicts using consistent logic with detailed warnings.
Resolution rules:
1. If the values are identical, return the value without a warning.
2. If one value is empty or whitespace-only and the other is not, use the non-empty value.
3. Otherwise, use the longer value and warn about the conflict.
- Parameters:
key (str) – Metadata key name
existing_value (str) – Current value
new_value (str) – New value attempting to override
existing_source (str) – Description of where existing value came from
new_source (str) – Description of where new value came from
- Returns:
The resolved value to use
- Return type:
str
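The three resolution rules of resolve_metadata_conflict() translate directly into code. This is a minimal sketch with a hypothetical name, resolve_conflict(); the documented rules do not say how equal-length conflicts are decided, so this version keeps the existing value as an assumption.

```python
import warnings


def resolve_conflict(key, existing_value, new_value,
                     existing_source="unknown", new_source="unknown"):
    """Pick a value for `key` following the three documented rules."""
    # Rule 1: identical values need no resolution.
    if existing_value == new_value:
        return existing_value
    existing_blank = not existing_value.strip()
    new_blank = not new_value.strip()
    # Rule 2: if exactly one side is blank, prefer the non-blank value.
    if existing_blank != new_blank:
        return existing_value if new_blank else new_value
    # Rule 3: keep the longer value (ties keep the existing one, an assumption).
    chosen = new_value if len(new_value) > len(existing_value) else existing_value
    warnings.warn(
        f"Metadata conflict for {key!r}: {existing_value!r} ({existing_source}) "
        f"vs {new_value!r} ({new_source}); using {chosen!r}"
    )
    return chosen
```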
- amocatlas.standardise.standardise_41n(ds: Dataset, file_name: str) Dataset[source]
Standardise 41N array dataset to a consistent format.
- amocatlas.standardise.standardise_47n(ds: Dataset, file_name: str) Dataset[source]
Standardise 47N array dataset to a consistent format.
- Parameters:
ds (xr.Dataset) – Raw 47N array dataset to standardise.
file_name (str) – Original filename associated with the dataset, used for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting for the 47N array.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_arcticgateway(ds: Dataset, file_name: str) Dataset[source]
Standardise Arctic Gateway array dataset to a consistent format.
- amocatlas.standardise.standardise_array(ds: Dataset, file_name: str) Dataset[source]
Standardise a mooring array dataset using YAML-based metadata.
Deprecated: This function is deprecated. Use standardise_data() instead.
- Parameters:
ds (xr.Dataset) – Raw dataset loaded from a reader with amocatlas_datasource metadata.
file_name (str) – Filename (e.g., ‘moc_transports.nc’) expected to match ds.attrs[“source_file”].
- Returns:
Standardised dataset with renamed variables and enriched metadata.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_calafat2025(ds: Dataset, file_name: str) Dataset[source]
Standardise CALAFAT2025 array dataset to a consistent format.
- amocatlas.standardise.standardise_data(ds: Dataset, file_name: str) Dataset[source]
Standardise a dataset using YAML-based metadata.
- Parameters:
ds (xr.Dataset) – Raw dataset loaded from a reader with amocatlas_datasource metadata.
file_name (str) – Filename (e.g., ‘moc_transports.nc’) expected to match ds.attrs[“source_file”].
- Returns:
Standardised dataset with renamed variables and enriched metadata.
- Return type:
xr.Dataset
- Raises:
ValueError – If file_name does not match ds.attrs[“source_file”].
ValueError – If amocatlas_datasource is not found in dataset metadata.
- amocatlas.standardise.standardise_dso(ds: Dataset, file_name: str) Dataset[source]
Standardise DSO array dataset to a consistent format.
- amocatlas.standardise.standardise_fbc(ds: Dataset, file_name: str) Dataset[source]
Standardise FBC array dataset to a consistent format.
- amocatlas.standardise.standardise_fw2015(ds: Dataset, file_name: str) Dataset[source]
Standardise FW2015 array dataset to a consistent format.
- amocatlas.standardise.standardise_mocha(ds: Dataset, file_name: str) Dataset[source]
Standardise MOCHA array dataset to a consistent format.
- amocatlas.standardise.standardise_move(ds: Dataset, file_name: str) Dataset[source]
Standardise MOVE array dataset to a consistent format.
- Parameters:
ds (xr.Dataset) – Raw MOVE dataset to standardise.
file_name (str) – Original filename for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_osnap(ds: Dataset, file_name: str) Dataset[source]
Standardise OSNAP array dataset to a consistent format.
- amocatlas.standardise.standardise_rapid(ds: Dataset, file_name: str) Dataset[source]
Standardise RAPID array dataset to a consistent format.
Deprecated: This function is deprecated. Use standardise_data() instead.
- Parameters:
ds (xr.Dataset) – Raw RAPID dataset to standardise.
file_name (str) – Original filename for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_samba(ds: Dataset, file_name: str) Dataset[source]
Standardise SAMBA array dataset to a consistent format.
Deprecated: This function is deprecated. Use standardise_data() instead.
- Parameters:
ds (xr.Dataset) – Raw SAMBA dataset to standardise.
file_name (str) – Original filename for metadata.
- Returns:
Standardised dataset with consistent metadata and formatting.
- Return type:
xr.Dataset
- amocatlas.standardise.standardise_zheng2024(ds: Dataset, file_name: str) Dataset[source]
Standardise ZHENG2024 array dataset to a consistent format.
- amocatlas.standardise.standardize_depth_coordinate(ds: Dataset) Dataset[source]
Standardize DEPTH coordinate to comply with AMOCatlas specifications.
All datasets with a DEPTH coordinate should have standardized attributes:
- data type: double
- long_name: “Depth below surface of the water”
- standard_name: “depth”
- units: “meters”
- Parameters:
ds (xr.Dataset) – Dataset to standardize DEPTH coordinate for.
- Returns:
Dataset with standardized DEPTH coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_latitude_coordinate(ds: Dataset) Dataset[source]
Standardize LATITUDE coordinate to comply with AMOCatlas specifications.
All datasets with a LATITUDE coordinate should have standardized attributes:
- data type: double
- long_name: “Latitude north (WGS84)”
- standard_name: “latitude”
- units: “degree_north”
- Parameters:
ds (xr.Dataset) – Dataset to standardize LATITUDE coordinate for.
- Returns:
Dataset with standardized LATITUDE coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_longitude_coordinate(ds: Dataset) Dataset[source]
Standardize LONGITUDE coordinate to comply with AMOCatlas specifications.
All datasets with a LONGITUDE coordinate should have standardized attributes:
- data type: double
- long_name: “longitude east (WGS84)”
- standard_name: “longitude”
- units: “degree_east”
- Parameters:
ds (xr.Dataset) – Dataset to standardize LONGITUDE coordinate for.
- Returns:
Dataset with standardized LONGITUDE coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_sigma0_coordinate(ds: Dataset) Dataset[source]
Standardize SIGMA0 coordinate to comply with AMOCatlas specifications.
All datasets with a SIGMA0 coordinate should have standardized attributes:
- data type: double
- long_name: “Potential density anomaly to 1000 kg/m3, surface reference”
- standard_name: “sea_water_sigma_theta”
- units: “kg m-3”
- Parameters:
ds (xr.Dataset) – Dataset to standardize SIGMA0 coordinate for.
- Returns:
Dataset with standardized SIGMA0 coordinate attributes.
- Return type:
xr.Dataset
- amocatlas.standardise.standardize_time_coordinate(ds: Dataset) Dataset[source]
Standardize TIME coordinate to comply with AMOCatlas specifications.
All datasets with a TIME coordinate should have standardized attributes:
- data type: datetime64[ns]
- long_name: “Time elapsed since 1970-01-01T00:00:00Z”
- standard_name: “time”
- calendar: “gregorian”
- units: “seconds since 1970-01-01T00:00:00Z”
- vocabulary: “http://vocab.nerc.ac.uk/collection/OG1/current/TIME/”
- Parameters:
ds (xr.Dataset) – Dataset to standardize TIME coordinate for.
- Returns:
Dataset with standardized TIME coordinate attributes.
- Return type:
xr.Dataset
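The standardize_*_coordinate functions above share one pattern: cast the values to the required type and enforce a fixed set of attributes. The sketch below shows that pattern for DEPTH, LATITUDE and LONGITUDE using a minimal Coord stand-in instead of an xarray coordinate. Coord, COORD_SPECS and standardize_coord() are hypothetical; the attribute values are copied from the docstrings, and keeping any extra attributes is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Coord:
    """Minimal stand-in for a coordinate: values plus an attrs dict."""
    values: list
    attrs: dict = field(default_factory=dict)


# Attribute values copied from the docstrings; other attrs are kept as-is.
COORD_SPECS = {
    "DEPTH": {"long_name": "Depth below surface of the water",
              "standard_name": "depth", "units": "meters"},
    "LATITUDE": {"long_name": "Latitude north (WGS84)",
                 "standard_name": "latitude", "units": "degree_north"},
    "LONGITUDE": {"long_name": "longitude east (WGS84)",
                  "standard_name": "longitude", "units": "degree_east"},
}


def standardize_coord(coord: Coord, name: str) -> Coord:
    """Cast values to float (double) and enforce the spec attributes."""
    spec = COORD_SPECS[name]
    return Coord(values=[float(v) for v in coord.values],
                 attrs={**coord.attrs, **spec})
```

In xarray, the analogous steps would be DataArray.astype("float64") followed by updating the coordinate's .attrs dictionary; whether AMOCatlas does exactly this is not stated here.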
- amocatlas.standardise.standardize_units(ds: Dataset) Dataset[source]
Standardize variable units throughout the dataset.
Uses the standard unit mapping defined in the utilities module.
- Parameters:
ds (xr.Dataset) – Dataset to standardize units for.
- Returns:
Dataset with standardized variable units.
- Return type:
xr.Dataset
plotters
Tools for visualising AMOC time series and transport data.
AMOCatlas plotting functions for visualization and publication figures.
- amocatlas.plotters.format_units_for_plotting(units: str) str[source]
Convert verbose units to concise plotting format.
Translates full unit names to standard abbreviations commonly used in oceanographic plots and publications.
- Parameters:
units (str) – Full unit string (e.g., from netCDF attributes).
- Returns:
Abbreviated unit string suitable for plot labels.
- Return type:
str
Examples
>>> format_units_for_plotting("Sverdrup")
'Sv'
>>> format_units_for_plotting("degrees_north")
'°N'
>>> format_units_for_plotting("degrees_Celsius")
'°C'
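A lookup table with a pass-through default is one simple way to get the behaviour shown in the examples for format_units_for_plotting(). This is a minimal sketch: abbreviate_units() and UNIT_ABBREVIATIONS are hypothetical, and the table holds only the three documented mappings; the real function's full mapping is not listed here.

```python
# Only the three mappings shown in the docstring examples.
UNIT_ABBREVIATIONS = {
    "Sverdrup": "Sv",
    "degrees_north": "°N",
    "degrees_Celsius": "°C",
}


def abbreviate_units(units: str) -> str:
    """Return a short plot label for `units`, or `units` unchanged if unknown."""
    return UNIT_ABBREVIATIONS.get(units.strip(), units)
```

Passing unknown units through unchanged means a missing table entry produces a verbose label rather than an error.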
- amocatlas.plotters.format_variable_name_for_plotting(name: str) str[source]
Convert variable names with subscripts to matplotlib LaTeX format.
This function translates variable naming patterns that include Greek letters and other subscripts into proper matplotlib LaTeX syntax for publication-quality plots.
- Parameters:
name (str) – Variable name that may contain subscript patterns.
- Returns:
Variable name formatted with matplotlib LaTeX syntax for subscripts.
- Return type:
str
Examples
>>> format_variable_name_for_plotting("MOC_sigma0")
'MOC$_{\\sigma_0}$'
>>> format_variable_name_for_plotting("MOC_z")
'MOC$_{z}$'
>>> format_variable_name_for_plotting("density_theta")
'density$_{\\theta}$'
>>> format_variable_name_for_plotting("temp_ref")
'temp$_{ref}$'
Notes
The function converts patterns with underscores to LaTeX subscripts:
- Single letters: MOC_z → MOC$_{z}$
- Greek patterns: MOC_sigma → MOC$_{\sigma}$
- Numbers: MOC_sigma0 → MOC$_{\sigma_0}$
- Multiple parts: only the first underscore pattern is converted
Matplotlib subscript syntax: $_{text}$
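The conversion rules in the Notes for format_variable_name_for_plotting() can be sketched with a split on the first underscore and a regular expression for Greek names with an optional numeric suffix. This is a minimal sketch, not the library's code: latex_subscript() and the GREEK_LETTERS set are hypothetical, and leaving any text after a second underscore outside the math block is one reading of "only the first underscore pattern is converted".

```python
import re

# Assumed subset of Greek letter names that get a LaTeX backslash.
GREEK_LETTERS = {"alpha", "beta", "gamma", "delta", "theta", "rho", "sigma", "psi"}


def latex_subscript(name: str) -> str:
    """Turn `base_sub` into `base$_{sub}$`, rendering Greek names as LaTeX."""
    if "_" not in name:
        return name
    base, sub, *rest = name.split("_", 2)
    match = re.fullmatch(r"([A-Za-z]+)(\d*)", sub)
    if match and match.group(1) in GREEK_LETTERS:
        letter, digits = match.groups()
        # e.g. sigma0 -> \sigma_0, theta -> \theta
        sub = f"\\{letter}_{digits}" if digits else f"\\{letter}"
    tail = "_" + rest[0] if rest else ""  # later parts stay as plain text
    return f"{base}$_{{{sub}}}${tail}"
```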
- amocatlas.plotters.monthly_resample(da: DataArray) DataArray[source]
Resample to monthly means if the time coordinate is datetime-like and its spacing is approximately monthly. If time is given as float years, the data are passed through unchanged (no interpolation).
- amocatlas.plotters.plot_all_moc_overlaid_pygmt(osnap_df: DataFrame, rapid_df: DataFrame, move_df: DataFrame, samba_df: DataFrame, filtered: bool = False) pygmt.Figure[source]
Plot all MOC time series overlaid using separate coordinate systems.
This creates overlaid plots with different y-ranges for MOC data vs SAMBA anomaly, similar to the original moc_tseries_pygmt notebook with shiftflag=False.
- Parameters:
osnap_df (pandas.DataFrame) – OSNAP MOC data with ‘time_num’ and ‘moc’/’moc_filtered’.
rapid_df (pandas.DataFrame) – RAPID MOC data with ‘time_num’ and ‘moc’/’moc_filtered’.
move_df (pandas.DataFrame) – MOVE MOC data with ‘time_num’ and ‘moc’/’moc_filtered’.
samba_df (pandas.DataFrame) – SAMBA MOC data with ‘time_num’ and ‘moc’/’moc_filtered’.
filtered (bool, default False) – Whether to plot filtered data (True) or original data (False).
- Returns:
PyGMT figure object.
- Return type:
pygmt.Figure
- Raises:
ImportError – If PyGMT is not installed.
- amocatlas.plotters.plot_all_moc_pygmt(osnap_df: DataFrame, rapid_df: DataFrame, move_df: DataFrame, samba_df: DataFrame, filtered: bool = False) → pygmt.Figure
Plot all MOC time series (OSNAP, RAPID, MOVE, SAMBA) in a stacked PyGMT figure.
- Parameters:
osnap_df (pandas.DataFrame) – OSNAP MOC data with columns ‘time_num’ and ‘moc’ or ‘moc_filtered’.
rapid_df (pandas.DataFrame) – RAPID MOC data with columns ‘time_num’ and ‘moc’ or ‘moc_filtered’.
move_df (pandas.DataFrame) – MOVE MOC data with columns ‘time_num’ and ‘moc’ or ‘moc_filtered’.
samba_df (pandas.DataFrame) – SAMBA MOC data with columns ‘time_num’ and ‘moc’ or ‘moc_filtered’.
filtered (bool, default False) – Whether to plot filtered data (True) or original data (False).
- Returns:
PyGMT figure object.
- Return type:
pygmt.Figure
- Raises:
ImportError – If PyGMT is not installed.
- amocatlas.plotters.plot_amoc_2d_data(data: Dataset | DataArray, varname: str | None = None, title: str = 'AMOC 2D Data', ylabel: str | None = None, time_limits: tuple[str | Timestamp, str | Timestamp] | None = None, ylim: tuple[float, float] | None = None, figsize: tuple[float, float] = (12, 6), colormap: str = 'RdBu_r', vmin: float | None = None, vmax: float | None = None) → tuple[Figure, Axes]
Plot 2D AMOC data with time on x-axis and depth/other coordinate on y-axis.
This function creates a color-filled contour plot suitable for visualizing 2D oceanographic data such as MOC streamfunction vs depth and time, or temperature profiles over time.
- Parameters:
data (xr.Dataset or xr.DataArray) – Dataset or DataArray containing 2D data to plot.
varname (str, optional) – Variable name to extract from dataset. Not needed if DataArray is passed.
title (str) – Title of the plot.
ylabel (str, optional) – Label for the y-axis (vertical coordinate). If None, inferred from data attributes.
time_limits (tuple of str or pd.Timestamp, optional) – X-axis time limits (start, end).
ylim (tuple of float, optional) – Y-axis limits (min, max).
figsize (tuple) – Figure size in inches (width, height).
colormap (str) – Colormap for the 2D data. Default is ‘RdBu_r’ (a diverging colormap).
vmin (float, optional) – Minimum value for color scale. If None, inferred from data.
vmax (float, optional) – Maximum value for color scale. If None, inferred from data.
- Returns:
The matplotlib figure and axes objects.
- Return type:
tuple[plt.Figure, plt.Axes]
- Raises:
ValueError – If data doesn’t have the required dimensions or if TIME coordinate is missing.
- amocatlas.plotters.plot_amoc_timeseries(data: list[Dataset | DataArray] | Dataset | DataArray, varnames: list[str] | None = None, labels: list[str] | None = None, colors: list[str] | None = None, title: str = 'AMOC Time Series', ylabel: str | None = None, time_limits: tuple[str | Timestamp, str | Timestamp] | None = None, ylim: tuple[float, float] | None = None, figsize: tuple[float, float] = (10, 3), resample_monthly: bool = True, plot_raw: bool = True, lat_idx: int | None = None, region_idx: int | None = None, posterior_stat: str = 'mean') → tuple[Figure, Axes]
Plot original and optionally monthly-averaged AMOC time series for one or more datasets.
- Parameters:
data (xarray.Dataset, xarray.DataArray, or list of these) – A single Dataset or DataArray, or a list of them, to plot.
varnames (list of str, optional) – List of variable names to extract from each dataset. Not needed if DataArrays are passed.
labels (list of str, optional) – Labels for the legend.
colors (list of str, optional) – Colors for monthly-averaged plots.
title (str) – Title of the plot.
ylabel (str, optional) – Label for the y-axis. If None, inferred from attributes.
time_limits (tuple of str or pd.Timestamp, optional) – X-axis time limits (start, end).
ylim (tuple of float, optional) – Y-axis limits (min, max).
figsize (tuple) – Figure size in inches (width, height).
resample_monthly (bool) – If True, monthly averages are computed and plotted.
plot_raw (bool) – If True, raw data is plotted.
lat_idx (int, optional) – Latitude index to select when dataset has a ‘lat’ dimension. Required if dataset contains ‘lat’ dimension with posterior samples.
region_idx (int, optional) – Region index to select when dataset has a ‘number_regions’ dimension. Required if dataset contains ‘number_regions’ dimension with posterior samples.
posterior_stat (str, default "mean") – Statistic used to collapse the posterior-samples dimension. Options are “mean” or “median”.
- Returns:
The matplotlib figure and axes objects.
- Return type:
tuple[plt.Figure, plt.Axes]
- amocatlas.plotters.plot_bryden2005_pygmt() → pygmt.Figure
Plot Bryden et al. 2005 historical AMOC estimates using PyGMT.
Creates a plot of the historical AMOC estimates from Bryden et al. (2005) showing the decline from 1957 to 2004. This provides historical context for modern observational time series.
- Returns:
PyGMT figure object.
- Return type:
pygmt.Figure
- Raises:
ImportError – If PyGMT is not installed.
References
Bryden, H. L., Longworth, H. R., & Cunningham, S. A. (2005). Slowing of the Atlantic meridional overturning circulation at 25°N. Nature, 438(7068), 655-657.
- amocatlas.plotters.plot_moc_timeseries_pygmt(df: DataFrame, column: str = 'moc', label: str = 'MOC [Sv]') → pygmt.Figure
Plot MOC time series using PyGMT with publication-quality styling.
- Parameters:
df (pandas.DataFrame) – DataFrame with ‘time_num’ (decimal years) and data columns.
column (str, default "moc") – Name of the column to plot.
label (str, default "MOC [Sv]") – Y-axis label for the plot.
- Returns:
PyGMT figure object.
- Return type:
pygmt.Figure
- Raises:
ImportError – If PyGMT is not installed.
- amocatlas.plotters.plot_monthly_anomalies(**kwargs) → tuple[Figure, list[Axes]]
Plot the monthly anomalies for several datasets.
Pass keyword arguments in pairs of the form <name>_data and <name>_label.
- For example:
osnap_data=standardOSNAP[0]["MOC_all"], osnap_label="OSNAP", …
- Returns:
The matplotlib figure and a list of axes.
- Return type:
tuple[plt.Figure, list[plt.Axes]]
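One way to read the `<name>_data` / `<name>_label` convention is as pairs matched on their shared prefix. The sketch below shows that matching. It is an illustration only: the helper `pair_dataset_kwargs` is hypothetical and not part of amocatlas, and falling back to the prefix when no label is passed is an assumption.

```python
from typing import Any, List, Tuple


def pair_dataset_kwargs(**kwargs: Any) -> List[Tuple[str, Any]]:
    """Hypothetical helper: group '<name>_data' / '<name>_label' keyword pairs."""
    pairs = []
    for key, value in kwargs.items():
        if not key.endswith("_data"):
            continue  # labels are looked up from their matching data key
        name = key[: -len("_data")]
        # Assumption: use the prefix itself if no matching label was passed.
        label = kwargs.get(f"{name}_label", name)
        pairs.append((label, value))
    return pairs


pairs = pair_dataset_kwargs(
    osnap_data=[1.0, 2.0], osnap_label="OSNAP",
    rapid_data=[3.0], rapid_label="RAPID",
)
print(pairs)  # [('OSNAP', [1.0, 2.0]), ('RAPID', [3.0])]
```

Keyword arguments keep their call order in Python 3.7+, so the panels would follow the order in which the datasets are passed.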
- amocatlas.plotters.plot_osnap_components_pygmt(data: DataFrame | Dict) → pygmt.Figure
Plot OSNAP MOC components with shaded error bands using PyGMT.
- Parameters:
data (pandas.DataFrame or dict) – Must contain: time_num (decimal years); MOC_SIGMA0, MOC_EAST_SIGMA0 and MOC_WEST_SIGMA0 (or the legacy MOC_ALL, MOC_EAST, MOC_WEST); MOC_EAST_SIGMA0_ERR and MOC_WEST_SIGMA0_ERR (or the legacy MOC_EAST_ERR, MOC_WEST_ERR).
- Returns:
PyGMT figure object.
- Return type:
pygmt.Figure
- Raises:
ImportError – If PyGMT is not installed.
- amocatlas.plotters.plot_rapid_components_pygmt(df: DataFrame) → pygmt.Figure
Plot RAPID MOC and component transports using PyGMT.
- Parameters:
df (pandas.DataFrame) – Must include the columns ‘time_num’, ‘moc_mar_hc10’ (total MOC), ‘t_gs10’ (Florida Current), ‘t_ek10’ (Ekman) and ‘t_umo10’ (upper mid-ocean).
- Returns:
PyGMT figure object.
- Return type:
pygmt.Figure
- Raises:
ImportError – If PyGMT is not installed.
- amocatlas.plotters.show_attributes(data: str | Dataset) → DataFrame
Processes an xarray Dataset or a netCDF file, extracts attribute information, and returns a DataFrame with details about the attributes.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
- Returns:
A DataFrame with the following columns: Attribute (the attribute name), Value (the attribute value) and DType (the data type of the value).
- Return type:
pandas.DataFrame
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
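For a Dataset that is already in memory, a table like the one described above can be approximated from its `attrs` dictionary alone. This is a minimal sketch, not the show_attributes code. The helper name `attributes_table` is hypothetical, the attribute values are made-up example inputs, and it assumes DType is the Python type name of each value, which may differ from what show_attributes reports.

```python
import pandas as pd


def attributes_table(attrs: dict) -> pd.DataFrame:
    """Hypothetical helper: tabulate global attributes as Attribute / Value / DType."""
    rows = [
        {"Attribute": key, "Value": value, "DType": type(value).__name__}
        for key, value in attrs.items()
    ]
    return pd.DataFrame(rows, columns=["Attribute", "Value", "DType"])


# For an xarray Dataset `ds`, the input would be ds.attrs.
table = attributes_table({"title": "RAPID MOC", "geospatial_lat_min": 26.5, "version": 2})
print(table)
```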
- amocatlas.plotters.show_contents(data: str | Dataset, content_type: str = 'variables') → Styler | DataFrame
Wrapper function to show contents of an xarray Dataset or a netCDF file.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
content_type (str, optional) – The type of content to show, either ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’). Default is ‘variables’.
- Returns:
A styled DataFrame with details about the variables, or a DataFrame with details about the attributes.
- Return type:
pandas.io.formats.style.Styler or pandas.DataFrame
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
ValueError – If the content_type is not ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’).
- amocatlas.plotters.show_variables(data: str | Dataset) → Styler
Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
- Returns:
A styled DataFrame with the following columns: dims (the variable’s dimensions, or “string” for string-typed variables), name (the variable name), units (if available), comment (if available), standard_name (if available) and dtype (the data type).
- Return type:
pandas.io.formats.style.Styler
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
- amocatlas.plotters.show_variables_by_dimension(data: str | Dataset, dimension_name: str = 'trajectory') → Styler
Extracts variable information from an xarray Dataset or a netCDF file and returns a styled DataFrame with details about the variables filtered by a specific dimension.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
dimension_name (str, optional) – The name of the dimension to filter variables by, by default “trajectory”.
- Returns:
A styled DataFrame with the following columns: dims (the variable’s dimensions, or “string” for string-typed variables), name (the variable name), units (if available) and comment (if available).
- Return type:
pandas.io.formats.style.Styler
- Raises:
TypeError – If the input data is not a file path (str) or an xarray Dataset.
writers
Data writing utilities for AMOCatlas.
This module provides functions for writing and exporting AMOCatlas datasets to various formats, with special handling for NetCDF export, attribute sanitization, and datetime encoding. Includes functions to save datasets with proper compression and metadata formatting.
- amocatlas.writers.save_AC1_dataset(ds: Dataset, data_dir: str | Path) → Path
Save AC1 dataset to netCDF using the OceanSITES ‘id’ attribute.
- Parameters:
ds (xarray.Dataset) – Dataset with AC1-compliant global attributes including ‘id’.
data_dir (str or pathlib.Path) – Directory to save the netCDF file.
- Returns:
Full path to the saved NetCDF file.
- Return type:
Path
- Raises:
ValueError – If ‘id’ global attribute is not found.
- amocatlas.writers.save_dataset(ds: Dataset, output_file: str = '../test.nc') → bool
Attempts to save the dataset to a NetCDF file. If a TypeError occurs due to invalid attribute values, it converts the invalid attributes to strings and retries the save operation.
- Parameters:
ds (xarray.Dataset) – The dataset to be saved.
output_file (str, optional) – The path to the output NetCDF file. Defaults to ‘../test.nc’.
- Returns:
True if the dataset was saved successfully, False otherwise.
- Return type:
bool
Notes
This function is based on a workaround for issues with saving datasets containing attributes of unsupported types. See: https://github.com/pydata/xarray/issues/3743
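The retry strategy described above amounts to turning problematic attribute values into strings. Below is a minimal sketch of that conversion step on a plain dictionary. It is not the rule that amocatlas or xarray applies. The helper name `stringify_unsafe_attrs` is hypothetical, and the choice of "safe" types (strings, numbers, and homogeneous numeric or string sequences) is an assumption. Booleans are stringified because netCDF has no boolean attribute type.

```python
import numpy as np

_NUMERIC = (int, float, np.number)


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, _NUMERIC) and not isinstance(value, (bool, np.bool_))


def _is_homogeneous_sequence(values) -> bool:
    items = list(values)
    return all(_is_number(v) for v in items) or all(isinstance(v, str) for v in items)


def stringify_unsafe_attrs(attrs: dict) -> dict:
    """Hypothetical helper: convert attribute values netCDF cannot store into strings."""
    clean = {}
    for key, value in attrs.items():
        if isinstance(value, str) or _is_number(value):
            clean[key] = value
        elif isinstance(value, (list, tuple, np.ndarray)) and _is_homogeneous_sequence(value):
            clean[key] = value
        else:
            # Booleans, None, dicts, datetimes, mixed lists, ... become their text form.
            clean[key] = str(value)
    return clean


clean = stringify_unsafe_attrs(
    {"title": "MOVE 16N", "valid_min": -50.0, "qc_done": True,
     "extra": {"a": 1}, "missing": None, "levels": [100, 200]}
)
print(clean)
```

With xarray, the cleaned dictionary could be assigned back with `ds.attrs = stringify_unsafe_attrs(ds.attrs)` before calling `ds.to_netcdf(...)`.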
tools
Helper functions for data manipulation, unit conversion, and clean-up.
AMOCatlas analysis tools for data processing, filtering, and calculations.
- amocatlas.tools.apply_tukey_filter(df: DataFrame, column: str, window_months: int = 6, samples_per_day: float = 0.2, alpha: float = 0.5, add_back_mean: bool = False, output_column: str | None = None) → DataFrame
Apply a Tukey filter using NumPy convolution (safely handles NaN values).
The function takes a pandas DataFrame and convolves the selected column with a Tukey window using NumPy. For this filtering approach, this offers more flexibility than xarray’s built-in rolling operations.
- Parameters:
df (pandas.DataFrame) – Input DataFrame containing the column to filter.
column (str) – Name of the column to apply the filter to.
window_months (int, default 6) – Filter window size in months.
samples_per_day (float, default 0.2) – Expected number of samples per day in the data.
alpha (float, default 0.5) – Tukey window parameter (0=rectangular, 1=Hann).
add_back_mean (bool, default False) – Whether to subtract the overall mean before filtering and add it back afterwards.
output_column (str, optional) – Name for the filtered output column. If None, uses “{column}_filtered”.
- Returns:
Copy of input DataFrame with filtered column added.
- Return type:
pandas.DataFrame
Notes
Uses pandas DataFrame rather than xarray Dataset because pandas provides better access to convolution operations with custom window functions.
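The NaN-safe convolution idea can be sketched with NumPy alone. You convolve the zero-filled data and the validity mask with the same Tukey window, then divide the two. This is an illustration under stated assumptions, not the amocatlas code:

- The names `tukey_window` and `nan_tukey_filter` are hypothetical.
- Converting window_months to samples (about 30.44 days per month, rounded to an odd length) is an assumption.
- add_back_mean is not modelled.

```python
import numpy as np


def tukey_window(n: int, alpha: float = 0.5) -> np.ndarray:
    """Symmetric Tukey (tapered cosine) window; alpha=0 is rectangular, alpha=1 is Hann."""
    if n < 1:
        raise ValueError("window length must be positive")
    if n == 1 or alpha <= 0:
        return np.ones(n)
    alpha = min(alpha, 1.0)
    x = np.linspace(0.0, 1.0, n)
    w = np.ones(n)
    left = x < alpha / 2
    right = x > 1 - alpha / 2
    w[left] = 0.5 * (1 + np.cos(np.pi * (2 * x[left] / alpha - 1)))
    w[right] = 0.5 * (1 + np.cos(np.pi * (2 * x[right] / alpha - 2 / alpha + 1)))
    return w


def nan_tukey_filter(values, window_months=6, samples_per_day=0.2, alpha=0.5):
    """Low-pass filter that ignores NaNs by normalising with the convolved data mask."""
    values = np.asarray(values, dtype=float)
    # Assumption: one month ~ 30.44 days; force an odd length so the window is centred.
    n = max(1, int(round(window_months * 30.44 * samples_per_day)))
    if n % 2 == 0:
        n += 1
    w = tukey_window(n, alpha)
    valid = np.isfinite(values)
    # Full convolution, then slice out the centred part (works even if n > len(values)).
    num = np.convolve(np.where(valid, values, 0.0), w, mode="full")
    den = np.convolve(valid.astype(float), w, mode="full")
    start = (n - 1) // 2
    num = num[start:start + values.size]
    den = den[start:start + values.size]
    out = np.full(values.size, np.nan)
    ok = den > 1e-12  # positions with no valid data inside the window stay NaN
    out[ok] = num[ok] / den[ok]
    return out
```

With a DataFrame `df`, the result could be stored as `df["moc_filtered"] = nan_tukey_filter(df["moc"].to_numpy())`. Dividing by the convolved mask re-weights each output point by the valid samples actually present in its window. This is why isolated NaNs do not drag the filtered values towards zero.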
- amocatlas.tools.bin_average_5day(df: DataFrame, time_column: str = 'time', value_column: str = 'moc') → DataFrame
Bin-average a time series into 5-day means.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with time and value columns.
time_column (str, default "time") – Name of the datetime column.
value_column (str, default "moc") – Name of the data column to average.
- Returns:
DataFrame with 5-day averaged time and values.
- Return type:
pandas.DataFrame
- amocatlas.tools.bin_average_monthly(df: DataFrame, time_column: str = 'time') → DataFrame
Bin-average a time series into monthly means.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with time column.
time_column (str, default "time") – Name of the datetime column.
- Returns:
DataFrame with monthly averaged data.
- Return type:
pandas.DataFrame
- amocatlas.tools.check_and_bin(df: DataFrame, time_column: str = 'time') → DataFrame
Check temporal resolution and bin to monthly if needed.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with time column.
time_column (str, default "time") – Name of the datetime column.
- Returns:
Original DataFrame if already monthly, or monthly-binned version.
- Return type:
pandas.DataFrame
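The three binning helpers above can be approximated with pandas `resample`. The sketch below is not the amocatlas code:

- It bins numeric columns into 5-day or month-start ("MS") bins.
- It decides whether data are already monthly from the median sample spacing.
- The 27-day threshold and the names `bin_mean` and `check_and_bin_sketch` are hypothetical.

```python
import pandas as pd


def bin_mean(df: pd.DataFrame, freq: str, time_column: str = "time") -> pd.DataFrame:
    """Average all numeric columns into bins of a pandas frequency such as '5D' or 'MS'."""
    numeric = df.set_index(time_column).select_dtypes("number")
    return numeric.resample(freq).mean().reset_index()


def check_and_bin_sketch(df: pd.DataFrame, time_column: str = "time",
                         monthly_threshold_days: int = 27) -> pd.DataFrame:
    """Return df unchanged if samples are at least ~monthly apart, else monthly means."""
    spacing = df[time_column].sort_values().diff().median()
    if spacing >= pd.Timedelta(days=monthly_threshold_days):
        return df  # already monthly (or coarser)
    return bin_mean(df, "MS", time_column)


daily = pd.DataFrame({
    "time": pd.date_range("2020-01-01", periods=10, freq="D"),
    "moc": [float(i) for i in range(10)],
})
print(bin_mean(daily, "5D"))  # two 5-day bins: means 2.0 and 7.0
```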
- amocatlas.tools.convert_units_var(var_values: ndarray | float, current_unit: str, new_unit: str, unit_conversion: dict[str, dict[str, float]] = {'PW': {'W': 1000000000000000.0}, 'Pa': {'dbar': 0.0001}, 'S/m': {'mS/cm': 0.1}, 'Sv': {'sverdrup': 1.0}, 'W': {'PW': 1e-15}, 'cm': {'m': 0.01}, 'cm s-1': {'m s-1': 0.01}, 'cm/s': {'m/s': 0.01}, 'dbar': {'Pa': 10000, 'kPa': 10}, 'degree_Celsius': {'degrees_Celsius': 1.0}, 'degrees_Celsius': {'degree_Celsius': 1}, 'g m-3': {'kg m-3': 0.001}, 'kPa': {'dbar': 0.1}, 'kg m-3': {'g m-3': 1000.0}, 'km': {'m': 1000.0}, 'm': {'cm': 100, 'km': 0.001}, 'm s-1': {'cm s-1': 100.0}, 'm/s': {'cm/s': 100.0}, 'mS/cm': {'S/m': 10.0}, 'sverdrup': {'Sv': 1}}) → ndarray | float
Converts variable values from one unit to another using a predefined conversion factor.
- Parameters:
var_values (numpy.ndarray or float) – The values to be converted.
current_unit (str) – The current unit of the variable values.
new_unit (str) – The target unit to which the variable values should be converted.
unit_conversion (dict of {str: dict of {str: float}}, optional) – A dictionary containing conversion factors between units. The default is unit_conversion.
- Returns:
The converted variable values. If no conversion factor is found, the original values are returned.
- Return type:
numpy.ndarray or float
- Raises:
KeyError – If the conversion factor for the specified units is not found in the unit_conversion dictionary.
Notes
If the conversion factor for the specified units is not available, a message is printed, and the original values are returned without any conversion.
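The behaviour described in Returns and Notes can be sketched as follows: look up a factor, multiply by it, and otherwise leave the values alone. The conversion table is a small illustrative subset, and the name `convert_units_sketch` is hypothetical. The sketch follows the Notes (print a message and return the input) rather than raising KeyError.

```python
import numpy as np

# Small illustrative subset of conversion factors: value_in_new = value * factor.
CONVERSIONS = {
    "km": {"m": 1000.0},
    "m": {"cm": 100.0, "km": 0.001},
    "dbar": {"Pa": 10000.0, "kPa": 10.0},
}


def convert_units_sketch(values, current_unit, new_unit, table=CONVERSIONS):
    """Multiply by the tabulated factor; return the input unchanged if none exists."""
    try:
        factor = table[current_unit][new_unit]
    except KeyError:
        print(f"No conversion factor from {current_unit!r} to {new_unit!r}; values unchanged.")
        return values
    # Works for Python floats and NumPy arrays alike.
    return values * factor


print(convert_units_sketch(2.5, "km", "m"))  # 2500.0
print(convert_units_sketch(np.array([1.0, 2.0]), "dbar", "kPa"))  # [10. 20.]
```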
- amocatlas.tools.extract_time_and_time_num(ds: Dataset, time_var: str = 'TIME') → DataFrame
Extract time coordinates from xarray Dataset and convert to pandas DataFrame.
- Parameters:
ds (xarray.Dataset) – Dataset containing time coordinate.
time_var (str, default "TIME") – Name of the time variable in the dataset.
- Returns:
DataFrame with ‘time’ (datetime) and ‘time_num’ (decimal year) columns.
- Return type:
pandas.DataFrame
- amocatlas.tools.find_best_dtype(var_name: str, da: DataArray) → dtype
Determines the most suitable data type for a given variable.
- Parameters:
var_name (str) – The name of the variable.
da (xarray.DataArray) – The data array containing the variable’s values.
- Returns:
The optimal data type for the variable based on its name and values.
- Return type:
numpy.dtype
- amocatlas.tools.generate_reverse_conversions(forward_conversions: dict[str, dict[str, float]]) → dict[str, dict[str, float]]
Create a unit conversion dictionary with both forward and reverse conversions.
- Parameters:
forward_conversions (dict of {str: dict of {str: float}}) – Mapping of source units to target units and conversion factors. Example: {"m": {"cm": 100, "km": 0.001}}
- Returns:
Complete mapping of units, including the reverse conversions. For the example above, the added reverse entries are {"cm": {"m": 0.01}, "km": {"m": 1000}}.
- Return type:
dict of {str: dict of {str: float}}
Notes
If a conversion factor is zero, a warning is printed, and the reverse conversion is skipped.
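A minimal sketch of adding reverse entries as `1 / factor` follows, skipping zero factors with a printed message as the Notes describe. The name `with_reverse_conversions` is hypothetical. The sketch also assumes that an explicitly given forward entry should not be overwritten by a computed reverse one, which may differ from amocatlas.

```python
def with_reverse_conversions(forward):
    """Hypothetical helper: copy the forward mapping and add 1/factor reverse entries."""
    complete = {src: dict(targets) for src, targets in forward.items()}
    for src, targets in forward.items():
        for dst, factor in targets.items():
            if factor == 0:
                print(f"Warning: zero factor for {src} -> {dst}; reverse conversion skipped.")
                continue
            # setdefault keeps any explicitly given entry for dst -> src.
            complete.setdefault(dst, {}).setdefault(src, 1.0 / factor)
    return complete


table = with_reverse_conversions({"m": {"cm": 100, "km": 0.001}})
print(table)
```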
- amocatlas.tools.handle_samba_gaps(df: DataFrame, time_column: str = 'time') → DataFrame
Handle temporal gaps in SAMBA MOC data to prevent plotting artifacts.
SAMBA data has significant gaps (e.g., 2011-2014) that cause plotting functions to draw connecting lines across missing periods. This function creates a regular monthly grid and masks interpolation to only occur within existing data periods, preventing spurious connections across large gaps.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with time and MOC columns.
time_column (str, default "time") – Name of the datetime column.
- Returns:
DataFrame with regular monthly grid and gap-aware data masking.
- Return type:
pandas.DataFrame
Notes
PyGMT and other plotting functions connect all valid (non-NaN) data points regardless of temporal gaps. This function prevents such artifacts by:
1. Creating a regular monthly time grid.
2. Preserving NaN values where no original data existed.
3. Interpolating only within continuous data segments.
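The three steps listed above can be sketched with pandas. This is not the amocatlas implementation. The function name, the value column name, and the rule that gaps longer than max_gap_months stay empty are all hypothetical. The threshold stands in for whatever criterion defines a "continuous data segment".

```python
import pandas as pd


def gap_aware_monthly(df, time_column="time", value_column="moc", max_gap_months=2):
    """Hypothetical sketch: monthly grid, interpolate short gaps, keep long gaps as NaN."""
    # 1. Regular monthly grid: months without data become NaN.
    monthly = df.set_index(time_column)[value_column].resample("MS").mean()
    # 2. Label runs of consecutive missing months and measure their length.
    missing = monthly.isna()
    run_id = (missing != missing.shift(fill_value=False)).cumsum()
    run_length = missing.groupby(run_id).transform("sum")
    long_gap = missing & (run_length > max_gap_months)
    # 3. Interpolate interior gaps, then restore NaN across the long ones.
    filled = monthly.interpolate(limit_area="inside").mask(long_gap)
    return filled.reset_index()


samples = pd.DataFrame({
    "time": pd.to_datetime(["2011-01-15", "2011-02-15", "2011-04-15", "2014-01-15", "2014-02-15"]),
    "moc": [10.0, 12.0, 16.0, 20.0, 22.0],
})
result = gap_aware_monthly(samples)
# March 2011 (a one-month gap) is interpolated to 14.0; May 2011 to Dec 2013 stays NaN.
```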
- amocatlas.tools.reformat_units_var(ds: Dataset, var_name: str, unit_format: dict[str, str] = {'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degrees_Celsius': 'degree_Celsius', 'g/m^3': 'g m-3', 'm/s': 'm s-1', 'meters': 'm'}) → str
Reformat the units of a variable in the dataset based on a provided mapping.
- Parameters:
ds (xarray.Dataset) – The input dataset containing variables with units to be reformatted.
var_name (str) – The name of the variable whose units need to be reformatted.
unit_format (dict of {str: str}, optional) – A dictionary mapping old unit strings to new formatted unit strings. Defaults to unit_str_format.
- Returns:
The reformatted unit string. If the old unit is not found in unit_format, the original unit string is returned.
- Return type:
str
- amocatlas.tools.set_best_dtype(ds: Dataset) → Dataset
Adjust the data types of variables in a dataset to optimize memory usage.
- Parameters:
ds (xarray.Dataset) – The input dataset whose variables’ data types will be adjusted.
- Returns:
The dataset with updated data types for its variables, potentially saving memory.
- Return type:
xarray.Dataset
Notes
The function determines the best data type for each variable using find_best_dtype.
Attributes like valid_min and valid_max are updated to match the new data type.
If the new data type is integer-based, NaN values are replaced with a fill value.
Logs the percentage of memory saved after the data type adjustments.
- amocatlas.tools.set_fill_value(new_dtype: dtype) → int
Calculate the fill value for a given data type.
- Parameters:
new_dtype (numpy.dtype) – The data type for which the fill value is to be calculated.
- Returns:
The calculated fill value based on the bit-width of the data type.
- Return type:
int
- amocatlas.tools.to_decimal_year(dates: Series) → Series
Convert datetime series to decimal years, handling NaN values safely.
- Parameters:
dates (pandas.Series or pandas.DatetimeIndex) – Series or Index of datetime objects to convert.
- Returns:
Series of decimal years with NaN preserved for invalid dates.
- Return type:
pandas.Series
Examples
>>> import pandas as pd
>>> dates = pd.Series(['2020-01-01', '2020-07-01', '2021-01-01'])
>>> dates = pd.to_datetime(dates)
>>> decimal_years = to_decimal_year(dates)
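A minimal pandas sketch of a decimal-year conversion follows. It assumes the convention "year plus the elapsed fraction of that calendar year", counting 365 or 366 days as appropriate. to_decimal_year may use a different convention (for example a fixed 365.25-day year), and `decimal_year_sketch` is a hypothetical name.

```python
import pandas as pd


def decimal_year_sketch(dates) -> pd.Series:
    """Convert datetimes to decimal years; unparseable or missing dates become NaN."""
    dates = pd.to_datetime(pd.Series(dates), errors="coerce")
    # Fraction of the current day already elapsed (0.0 at midnight).
    day_fraction = (dates - dates.dt.normalize()) / pd.Timedelta(days=1)
    days_in_year = 365 + dates.dt.is_leap_year.astype(int)
    return dates.dt.year + (dates.dt.dayofyear - 1 + day_fraction) / days_in_year


years = decimal_year_sketch(["2020-01-01", "2020-07-01", "2021-01-01", None])
print(years)  # 2020.0, ~2020.4973 (182/366 of the leap year), 2021.0, NaN
```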
utilities
Shared utilities for downloading, reading, and parsing data files.
Utility functions for AMOCatlas package.
This module provides shared utility functions for file download and caching, data directory management, URL and path validation, metadata loading and validation, and decorators that supply default parameters.
- amocatlas.utilities.apply_defaults(default_source: str, default_files: List[str]) → Callable
Decorator to apply default values for ‘source’ and ‘file_list’ parameters if they are None.
- Parameters:
default_source (str) – Default source URL or path.
default_files (list of str) – Default list of filenames.
- Returns:
A wrapped function with defaults applied.
- Return type:
Callable
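A decorator with the behaviour described above can be sketched with `inspect.signature`, so that it works whether the arguments are passed by position or by keyword. This is a hypothetical re-creation, not the amocatlas source. The names `apply_defaults_sketch` and `read_example` are illustrative, and the example.org URL is a placeholder.

```python
import functools
import inspect
from typing import Callable, List


def apply_defaults_sketch(default_source: str, default_files: List[str]) -> Callable:
    """Hypothetical re-creation: fill 'source' and 'file_list' when they are None."""
    def decorator(func: Callable) -> Callable:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind positional and keyword arguments to parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            if bound.arguments.get("source") is None:
                bound.arguments["source"] = default_source
            if bound.arguments.get("file_list") is None:
                bound.arguments["file_list"] = list(default_files)
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@apply_defaults_sketch("https://example.org/amoc/", ["transports.nc"])
def read_example(source=None, file_list=None, data_dir="data"):
    return source, file_list


print(read_example())  # ('https://example.org/amoc/', ['transports.nc'])
```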
- amocatlas.utilities.apply_unit_standardization_after_metadata(ds: Dataset) → Dataset
Apply unit standardization with high priority to override YAML metadata.
This function is designed to be called after metadata enrichment to ensure that standardized units take precedence over any units specified in YAML metadata files.
- Parameters:
ds (xr.Dataset) – Dataset that may have had units overwritten by metadata processing.
- Returns:
Dataset with units re-standardized.
- Return type:
xr.Dataset
Notes
This addresses the issue where YAML metadata files contain “Sv” units that override the standardized “Sverdrup” units. This function should be called as the final step in standardization.
Examples
>>> # In standardization pipeline
>>> ds = apply_metadata_from_yaml(ds)  # This might set units: Sv
>>> ds = apply_unit_standardization_after_metadata(ds)  # This fixes it
- amocatlas.utilities.download_file(url: str, dest_folder: str, redownload: bool = False, filename: str = None) → str
Download a file from HTTP(S) or FTP to the specified destination folder.
- Parameters:
url (str) – The URL of the file to download.
dest_folder (str) – Local folder to save the downloaded file.
redownload (bool, optional) – If True, force re-download of the file even if it exists.
filename (str, optional) – Optional filename to save the file as. If not given, uses the name from the URL.
- Returns:
The full path to the downloaded file.
- Return type:
str
- Raises:
ValueError – If the URL scheme is unsupported.
- amocatlas.utilities.find_data_start(file_path: str) → int
Locate the first line of numerical data in a legacy ASCII file.
This function scans an ASCII text file line by line and returns the zero-based line index of the first row that appears to contain data. A data row is identified as a non-empty line whose first non-whitespace character is a digit. This is useful for files with long, human-readable headers (titles, references, separators) preceding the actual data table.
- Parameters:
file_path (str) – Path to the ASCII file to be scanned.
- Returns:
Zero-based line index at which the numerical data table begins.
- Return type:
int
- Raises:
ValueError – If no data-like lines are found in the file.
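The detection rule stated above translates directly into a short loop. In the sketch below, the rule is factored out to work on any iterable of lines so that it can be tried without a file. Both function names are hypothetical. Under this rule, a data row that begins with a minus sign (a negative first value) is not recognised.

```python
from typing import Iterable


def find_data_start_in_lines(lines: Iterable[str]) -> int:
    """Return the zero-based index of the first line whose first non-blank char is a digit."""
    for index, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped and stripped[0].isdigit():
            return index
    raise ValueError("No data-like lines found.")


def find_data_start_sketch(file_path: str) -> int:
    """File-based wrapper around the line-based rule."""
    with open(file_path, encoding="utf-8", errors="replace") as handle:
        return find_data_start_in_lines(handle)


header_then_data = [
    "Historical AMOC estimates\n",
    "Ref: Bryden et al.\n",
    "-----------------\n",
    "\n",
    "  1957.5  22.9\n",
    "1981.5 18.7\n",
]
print(find_data_start_in_lines(header_then_data))  # 4
```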
- amocatlas.utilities.get_default_data_dir() → Path
Get the default data directory path for AMOCatlas.
- amocatlas.utilities.get_project_root() → Path
Return the absolute path to the project root directory.
- amocatlas.utilities.get_standard_unit_mappings() → Dict[str, str]
Get the comprehensive mapping of unit variations to standard units.
Uses defaults.PREFERRED_UNITS as target values for standardization.
- Returns:
Dictionary mapping various unit forms to their standard equivalents.
- Return type:
Dict[str, str]
Notes
This centralizes all unit standardization rules for consistency across the AMOCatlas package. Add new unit mappings here as needed. Target values come from defaults.PREFERRED_UNITS to ensure consistency.
Examples
>>> mappings = get_standard_unit_mappings()
>>> print(mappings["Sv"])  # "Sverdrup"
>>> print(mappings["deg C"])  # "degree_C"
- amocatlas.utilities.is_valid_url(url: str) → bool
Validate if a given string is a valid URL with supported schemes.
- Parameters:
url (str) – The URL string to validate.
- Returns:
True if the URL is valid and uses a supported scheme (‘http’, ‘https’, ‘ftp’), otherwise False.
- Return type:
bool
- amocatlas.utilities.load_array_metadata(datasource_id: str) → dict
Load metadata YAML for a given data source.
- Parameters:
datasource_id (str) – Datasource identifier (e.g., ‘rapid26n’, ‘samba34s’).
- Returns:
Dictionary containing the parsed YAML metadata.
- Return type:
dict
- amocatlas.utilities.mask_invalid_values(ds: Dataset) → Dataset
Mask values outside valid_min/valid_max ranges as NaN.
Many netCDF files contain valid_min and valid_max attributes that define the valid range for variables. Values outside this range should be treated as missing data but are often not automatically masked by xarray.
- Parameters:
ds (xr.Dataset) – Dataset to check for invalid values.
- Returns:
Dataset with values outside valid ranges masked as NaN.
- Return type:
xr.Dataset
Examples
>>> # Variable has valid_min=-100, valid_max=100 but contains 9.97e+36
>>> ds_clean = mask_invalid_values(ds)
>>> # Now extreme values are masked as NaN
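The masking rule can be illustrated on a single NumPy array together with its attribute dictionary. This is a minimal sketch, and `mask_outside_valid_range` is a hypothetical name. It applies only the valid_min/valid_max rule described above, not any other cleaning mask_invalid_values may do.

```python
import numpy as np


def mask_outside_valid_range(values, attrs: dict) -> np.ndarray:
    """Return a float copy with values outside [valid_min, valid_max] set to NaN."""
    data = np.array(values, dtype=float)  # copy, so the input is not modified
    valid_min = attrs.get("valid_min")
    valid_max = attrs.get("valid_max")
    if valid_min is not None:
        data[data < valid_min] = np.nan
    if valid_max is not None:
        data[data > valid_max] = np.nan
    return data


masked = mask_outside_valid_range([12.3, 9.97e36, -15.0, -200.0], {"valid_min": -100, "valid_max": 100})
print(masked)  # [ 12.3  nan -15.   nan]
```

With xarray, the same rule could be written per variable as `da.where((da >= vmin) & (da <= vmax))`, which leaves NaN wherever the condition is False.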
- amocatlas.utilities.normalize_whitespace(attrs: dict) → dict
Replace non-breaking & other unusual whitespace in every string attr value with a normal ASCII space, and collapse runs of whitespace down to one space.
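In Python, both steps (replacing unusual whitespace and collapsing runs) can be done with a single regular expression. For `str` patterns, `\s` matches Unicode whitespace, including non-breaking spaces. This is a minimal sketch with the hypothetical name `normalize_whitespace_sketch`. Non-string values pass through untouched.

```python
import re


def normalize_whitespace_sketch(attrs: dict) -> dict:
    """Collapse any run of Unicode whitespace in string values to a single ASCII space."""
    return {
        key: re.sub(r"\s+", " ", value) if isinstance(value, str) else value
        for key, value in attrs.items()
    }


cleaned = normalize_whitespace_sketch({"title": "RAPID\u00a0\u00a0MOC\t transports", "version": 2})
print(cleaned)  # {'title': 'RAPID MOC transports', 'version': 2}
```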
- amocatlas.utilities.parse_ascii_header(file_path: str, comment_char: str = '%') → Tuple[List[str], int]
Parse the header of an ASCII file to extract column names and the number of header lines.
Header lines are identified by the given comment character (default: ‘%’). Columns are defined in lines like: ‘<comment_char> Column 1: <column_name>’.
- Parameters:
file_path (str) – Path to the ASCII file.
comment_char (str, optional) – Character used to identify header lines. Defaults to ‘%’.
- Returns:
A tuple containing the list of column names extracted from the header and the number of header lines to skip.
- Return type:
tuple of (list of str, int)
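Below is a line-based sketch of the header convention described above. It assumes the header is the contiguous block of comment lines at the top of the file, which is an assumption, and `parse_header_lines` is a hypothetical name. The regular expression accepts optional whitespace around `Column N:`.

```python
import re
from typing import Iterable, List, Tuple


def parse_header_lines(lines: Iterable[str], comment_char: str = "%") -> Tuple[List[str], int]:
    """Collect '<comment_char> Column N: name' entries and count leading comment lines."""
    pattern = re.compile(rf"^{re.escape(comment_char)}\s*Column\s+\d+:\s*(.+?)\s*$")
    columns: List[str] = []
    header_lines = 0
    for line in lines:
        if not line.startswith(comment_char):
            break  # the first non-comment line ends the header
        header_lines += 1
        match = pattern.match(line)
        if match:
            columns.append(match.group(1))
    return columns, header_lines


example = [
    "% RAPID MOC transports\n",
    "% Column 1: time\n",
    "% Column 2: moc_mar_hc10\n",
    "2004.25 18.1\n",
]
print(parse_header_lines(example))  # (['time', 'moc_mar_hc10'], 3)
```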
- amocatlas.utilities.read_ascii_file(file_path: str, comment_char: str = '#') → DataFrame
Read an ASCII file into a pandas DataFrame, skipping lines starting with a specified comment character.
- Parameters:
file_path (str) – Path to the ASCII file.
comment_char (str, optional) – Character denoting comment lines. Defaults to ‘#’.
- Returns:
The loaded data as a pandas DataFrame.
- Return type:
pd.DataFrame
- amocatlas.utilities.resolve_file_path(file_name: str, source: str | Path | None, download_url: str | None, local_data_dir: Path, redownload: bool = False) → Path
Resolve the path to a data file, using local source, cache, or downloading if necessary.
- Parameters:
file_name (str) – The name of the file to resolve.
source (str or Path or None) – Optional local source directory.
download_url (str or None) – URL to download the file if needed.
local_data_dir (Path) – Directory where downloaded files are stored.
redownload (bool, optional) – If True, force redownload even if cached file exists.
- Returns:
Path to the resolved file.
- Return type:
Path
- amocatlas.utilities.safe_update_attrs(ds: Dataset, new_attrs: Dict[str, str], overwrite: bool = False, verbose: bool = True) → Dataset
Safely update attributes of an xarray Dataset without overwriting existing keys, unless explicitly allowed.
- Parameters:
ds (xr.Dataset) – The xarray Dataset whose attributes will be updated.
new_attrs (dict of {str: str}) – Dictionary of new attributes to add.
overwrite (bool, optional) – If True, allow overwriting existing attributes. Defaults to False.
verbose (bool, optional) – If True, emit a warning when skipping existing attributes. Defaults to True.
- Returns:
The dataset with updated attributes.
- Return type:
xr.Dataset
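The no-overwrite merge rule can be sketched on a plain attribute dictionary. `safe_update` is a hypothetical name, and emitting a `warnings.warn` message for skipped keys is an assumption about how the warning is reported.

```python
import warnings


def safe_update(attrs: dict, new_attrs: dict, overwrite: bool = False, verbose: bool = True) -> dict:
    """Return a copy of attrs with new_attrs merged in, keeping existing keys unless overwrite=True."""
    updated = dict(attrs)
    for key, value in new_attrs.items():
        if key in updated and not overwrite:
            if verbose:
                warnings.warn(f"Attribute {key!r} already exists; not overwriting.")
            continue
        updated[key] = value
    return updated


kept = safe_update({"title": "A"}, {"title": "B", "source": "x"}, verbose=False)
replaced = safe_update({"title": "A"}, {"title": "B", "source": "x"}, overwrite=True)
print(kept, replaced)  # {'title': 'A', 'source': 'x'} {'title': 'B', 'source': 'x'}
```

For an xarray Dataset, the result could be applied with `ds.attrs = safe_update(ds.attrs, new_attrs)`.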
- amocatlas.utilities.sanitize_variable_name(name: str) → str
Sanitize variable names to create valid Python identifiers.
Replaces illegal Python identifier characters (spaces, parentheses, periods, hyphens, etc.) with underscores and collapses repeated underscores into single ones.
- Parameters:
name (str) – The original variable name that may contain illegal characters
- Returns:
A sanitized variable name that is a valid Python identifier
- Return type:
str
Examples
>>> sanitize_variable_name("Total MOC anomaly (relative to record-length average of 14.7 Sv)")
'Total_MOC_anomaly__relative_to_record_length_average_of_14_7_Sv'
>>> sanitize_variable_name("Upper-cell volume transport anomaly")
'Upper_cell_volume_transport_anomaly'
- amocatlas.utilities.standardize_dataset_units(ds: Dataset, mapping: Dict[str, str] | None = None, log_changes: bool = True) → Dataset
Standardize units throughout a dataset using comprehensive mapping rules.
- Parameters:
ds (xr.Dataset) – Dataset to standardize units for.
mapping (Dict[str, str], optional) – Custom unit mapping. If None, uses get_standard_unit_mappings().
log_changes (bool, optional) – Whether to log unit changes. Default is True.
- Returns:
Dataset with standardized units.
- Return type:
xr.Dataset
Notes
This function applies unit standardization to all variables and coordinates in the dataset. It’s designed to be the central unit standardization function for AMOCatlas, replacing the simpler standardize_units function.
Examples
>>> ds_std = standardize_dataset_units(ds)
>>> # Check if Sv was converted to Sverdrup
>>> print(ds_std['transport'].attrs['units'])  # "Sverdrup"
- amocatlas.utilities.validate_array_yaml(datasource_id: str, verbose: bool = True) → bool
Validate the structure and required fields of a datasource metadata YAML.
- Parameters:
datasource_id (str) – The datasource identifier (e.g., ‘rapid26n’, ‘samba34s’).
verbose (bool) – If True, print detailed validation messages.
- Returns:
True if validation passes, False otherwise.
- Return type:
bool