OceanSITES Format for AMOC Arrays
This document describes how RAPID, MOVE, OSNAP, and SAMBA array products are mapped into OceanSITES-compliant NetCDF structures within the amocarray package. It summarizes conventions from the OceanSITES format reference manual (v1.4, 2020) and documents the decisions made for the AMOC-specific implementation.
File naming
According to OceanSITES, the file-naming convention for individual instrument deployments is: OS_[PlatformCode]_[DeploymentCode]_[DataMode]_[PARTX].nc
OceanSITES: Gridded and derived data
OceanSITES defines several higher-level data products:
merged: A “long time series” version that may concatenate multiple deployments (with some homogenization). Not used in amocarray.
gridded (GRD): A “gridded” version that interpolates to a space-time grid different from the native instrumental resolution (this is what OSNAP and RAPID provide for their TEMPERATURE and SALINITY fields).
derived (DPR): A “derived” data product (e.g., the “overturning circulation” or “meridional heat transport”)
File naming for array data products
OS_[PSPANCode]_[StartEndCode]_[ContentType]_[PARTX].nc
PSPANCode: the [PlatformCode] can be replaced with an appropriate choice of site, project, array or network, which can be taken from the global attributes of the underlying source data. For amocarray, we use the array global attribute (e.g. RAPID, OSNAP, MOVE, SAMBA, 11South).
StartEndCode: the [DeploymentCode] can be replaced with a time range appropriate for the data in the file. For amocarray, this is the time range covered by the data, preferably formatted as e.g. “20050301-20190831” to indicate data from March 2005 through August 2019.
ContentType: the [DataMode] can be replaced with a three-letter code that describes the content of the file (distinguished from the deployment files, which have a one-letter code here), one of:
LTS (not used in amocarray): The data are “long time series” data that are essentially at the native instrumental resolution in space and time. The primary difference from the deployment-by-deployment files is that a single file contains merged data from multiple deployments.
GRD: The data are “gridded”, meaning that some sort of binning, averaging, interpolating has been done to format the data onto a space-time grid that is different from the native resolution, and more than a simple concatenation like the “LTS” option. This is what OSNAP and RAPID provide for their TEMPERATURE and SALINITY fields.
DPR: The data are a “derived product”, which means that there are data that were derived from multiple sites or some other higher-order processing that the data provider distinguishes from the lower-level data. This is the case for the overturning transports and component transports and streamfunctions.
[PARTX] - An optional user-defined field for additional identification or explanation of data. For gridded data, this could include the record interval as subfields of ISO 8601 (PnYnMnDTnHnMnS), e.g. P1M for monthly data, T30M for 30 minutes, T1H for hourly. For amocarray, this will be a short code corresponding to the types of data in the file:
RAPID (PSPANCode = RAPID):
- ts_gridded.nc has gridded data at individual mooring locations (timeSeriesProfile). The PARTX for OceanSITES will be gridded_mooring, with ContentType = GRD.
- moc_vertical.nc and moc_transports.nc have the streamfunction and time series of component transports at 12-hour intervals. Both are combined into one file with PARTX transports_T12H and ContentType = DPR.
- 2d_gridded.nc has the 2D gridded data. The PARTX for OceanSITES will be sections_T10D for monthly, with ContentType = GRD.
- meridional_transports.nc has the MOC transport in depth and sigma coordinates, as well as MHT and MFT, on a 10-day grid. The PARTX for OceanSITES will be transports_T10D, with ContentType = DPR.
OSNAP (PSPANCode = OSNAP):
- OSNAP_MOC_MHT_MFT_TimeSeries_201408_202006_2023.nc has the MOC, MHT and MFT on a monthly grid, and OSNAP_Streamfunction_*nc has the streamfunction for west, east and all on a monthly and sigma grid. The PARTX for OceanSITES will be transports_T1M, with ContentType = DPR.
- OSNAP_2D_Gridded_Temperature_Salinity_Velocity_201408_202006.nc has the 2D gridded data for TEMP, SAL and VELO on TIME, LONGITUDE, LATITUDE and DEPTH. The PARTX for OceanSITES will be sections_T1M, with ContentType = GRD.
MOVE (PSPANCode = MOVE):
- OS_MOVE_TRANSPORTS.nc has the total transport in the 1200-4950 dbar layer and the internal, offset and boundary transports. The PARTX for OceanSITES will be transports_T1M, with ContentType = DPR.
SAMBA (PSPANCode = SAMBA):
- Upper_Abyssal_Transport_Anomalies.txt has the upper and abyssal transport anomalies. The PARTX for OceanSITES will be Kersale_transports_T1M, with ContentType = DPR.
- MOC_TotalAnomaly_and_constituents.asc has the MOC total anomaly and its constituents/components. The PARTX for OceanSITES will be Meinen_transports_T1M, with ContentType = DPR.
It is still unclear whether the “Meinen” product should be called “2site” and the “Kersale” product “9site”.
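The naming rules above can be sketched as a small helper. This is a minimal illustration, not the amocarray API: `make_oceansites_filename` is a hypothetical name, and it assumes that PSPANCode contains no underscores and that PARTX is passed without the `.nc` suffix.

```python
from datetime import date

# Three-letter ContentType codes for array products (LTS is not used in amocarray).
CONTENT_TYPES = {"LTS", "GRD", "DPR"}


def make_oceansites_filename(pspan_code, start, end, content_type, partx=""):
    """Build OS_[PSPANCode]_[StartEndCode]_[ContentType]_[PARTX].nc.

    Hypothetical helper for illustration only.
    """
    if not pspan_code or "_" in pspan_code:
        raise ValueError("PSPANCode must be non-empty and contain no underscores")
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown ContentType {content_type!r}")
    if end < start:
        raise ValueError("end date precedes start date")
    # StartEndCode, e.g. 20050301-20190831
    start_end = f"{start:%Y%m%d}-{end:%Y%m%d}"
    parts = ["OS", pspan_code, start_end, content_type]
    if partx:  # PARTX is optional
        parts.append(partx)
    return "_".join(parts) + ".nc"


print(make_oceansites_filename("RAPID", date(2005, 3, 1), date(2019, 8, 31),
                               "DPR", "transports_T12H"))
# OS_RAPID_20050301-20190831_DPR_transports_T12H.nc
```

The same string without `.nc` could serve as the dataset `id` described in the Discovery table.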
Feature Types
The following OceanSITES featureTypes are used:
| Feature Type | Used For | Description |
|---|---|---|
| | MOC, MHT, MFT, Ekman | 1D time series at a single or derived location |
| | T/S profiles at fixed mooring locations | Includes depth and time as dimensions |
| | Interpolated sections from OSNAP, RAPID, SAMBA | Regular grids in depth/longitude or density/longitude |
| | Streamfunction, MOC decompositions | Derived products, not raw observations |
Global Metadata
The following global attributes are recommended for OceanSITES-compliant NetCDF files. The RS column indicates the requirement status:
M = Mandatory (required for compliance or by GDAC)
HD = Highly Desired (strongly recommended)
S = Suggested (optional but useful)
Attributes marked (ACDD) follow the Unidata Attribute Convention for Data Discovery (ACDD). See [here](https://www.esipfed.org/what-is-acdd/).
Additional metadata attributes from the deployment-by-deployment files (as specified in the OceanSITES reference manual) are possible and welcome, as long as they make sense for the data product in question.
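The RS codes lend themselves to an automated completeness check. The sketch below is hypothetical (not part of amocarray): the requirement table lists only attribute names that appear elsewhere in this document, and it treats “M (for GDAC)” as plain M.

```python
# Hypothetical requirement table: attribute name -> RS code (M, HD or S).
# Only names mentioned elsewhere in this document are included.
REQUIREMENTS = {
    "array": "M",
    "principal_investigator": "M",
    "date_created": "M",
    "geospatial_lon_min": "M",  # "M (for GDAC)" treated as M here
    "geospatial_vertical_positive": "S",
    "history": "S",
}


def missing_attributes(attrs, level="M", requirements=REQUIREMENTS):
    """Return sorted names of attributes at the given RS level that are absent or empty."""
    return sorted(
        name
        for name, rs in requirements.items()
        if rs == level and attrs.get(name) in (None, "")
    )


attrs = {
    "array": "RAPID",
    "principal_investigator": "Alice Juarez, John Smith",
    "date_created": "2016-04-11T08:35:00Z",
}
print(missing_attributes(attrs))       # ['geospatial_lon_min']
print(missing_attributes(attrs, "S"))  # ['geospatial_vertical_positive', 'history']
```

With xarray, the same check could be applied to `ds.attrs`, which is a plain dictionary.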
1. Discovery and Identification
The following global attributes are recommended for inclusion in all OceanSITES-compliant NetCDF files. This table includes both required and suggested metadata fields relevant for data discovery, attribution, and catalog integration.
| Attribute Name | Definition | Example | RS |
|---|---|---|---|
| | Name of the OceanSITES site. Technically, site codes should be approved by the OceanSITES Project Office to avoid duplication. | | M (for GDAC) |
| | Indicates if data are real-time ( | | M (for GDAC) |
| | Short, human-readable phrase or sentence describing the dataset. | | HD |
| | List of OceanSITES theme areas to which this dataset belongs (comma-separated; see the reference manual for options). Omitted for datasets not derived from moored observations. | ex.: “Transport Moored Arrays” | S |
| | A unique name that identifies the institution or organisation that provided the id. ACDD-1.3 recommends using reverse-DNS naming. | | S |
| | OceanSITES array grouping based on scientific rationale. Note that this will be part of the | ex.: “RAPID” | M |
| | Unique dataset ID, often the filename without .nc, i.e. “OS_<array>_<YYYYMMDD>-<YYYYMMDD>_<GRD/DPR>_<PARTX>”, where <array> is one of “RAPID”, “MOVE”, etc.; the date strings are the start and end dates of the dataset; GRD denotes gridded data and DPR derived products; and PARTX is a unique combination such as “transports_T1M” or “sections_T10D”. | | M |
| | Longer free-format text describing the dataset. This attribute should allow data discovery for a human reader. A paragraph of up to 100 words is appropriate. (ACDD) | “Oceanographic mooring data from the RAPID array at 26°N in the Atlantic since 2004. Measured properties: temperature, salinity at 20 dbar intervals and 10-day intervals.” | S |
| | Use a term from the SeaVoX Platform Categories vocabulary (L06) list, usually one of the following: “moored surface buoy”, “subsurface mooring”, “ship” (CF) | | HD |
| | Name of the person responsible for the scientific project. Multiple PIs are separated by commas. | ex.: “Alice Juarez, John Smith” | M |
| | Email address of the PI. | ex.: “ajuarez@whoi.edu, john.smith@noc.ac.uk” | S |
| | ORCiD or other persistent ID for the PI. | | M |
| | Name of the person (or group) who created the dataset. Multiple creators are separated by commas. | ex.: “Alice Juarez” | S |
| | Email address of the creator. | ex.: “ajuarez@whoi.edu” | S |
| | ORCiD or other persistent ID for the creator. | | S |
| | Describes the creator entity: | ex.: “institution” | S |
| | Institution associated with the creator. | ex.: “WHOI” | S |
| | Vocabulary source for keywords, e.g. GCMD Science Keywords. | ex.: “GCMD Science Keywords” | S |
| | Comma-separated list of terms that will aid in discovery of the dataset. (ACDD) | ex.: “EARTH SCIENCE > Oceans > Ocean Circulation > Thermohaline Circulation” | S |
| | Miscellaneous information about the data or methods used to produce it. Any free-format text is appropriate. (CF) | ex.: “Preliminary version; subject to revision” | S |
| | A unique platform code. This code is assigned either by the site PI (see principal_investigator) or by the data provider. | Note that this is required for OceanSITES for GDAC, but it is not implemented in the current version of the amocarray package. | M (for GDAC) |
| | Web URL for the PI. | | S |
| | Web profile for the creator. | | S |
| | A grouping of sites based on common shore-based logistics, funding, or infrastructure. | ex.: “EuroSITES” | S |
2. Geo-spatial-temporal Metadata
The following attributes describe the spatial and temporal coverage of the data and are recommended for inclusion in all OceanSITES-compliant NetCDF files. The table includes both required and suggested fields.
| Attribute Name | Definition | Example | RS |
|---|---|---|---|
| | Geographical coverage. SeaVoX Water Body Gazetteer vocabulary (C19) | | S |
| | The southernmost latitude, a value between -90 and 90 degrees; may be string or numeric. (ACDD, GDAC) | | M (for GDAC) |
| | The northernmost latitude, a value between -90 and 90 degrees; may be string or numeric. (ACDD, GDAC) | | M (for GDAC) |
| | Must conform to udunits. If not specified then “degrees_north” is assumed. (ACDD) | ex.: geospatial_lat_units = “degrees_north” | S |
| | The westernmost longitude, a value between -180 and 180 degrees. (ACDD, GDAC) | ex.: geospatial_lon_min = -80.0 | M (for GDAC) |
| | The easternmost longitude, a value between -180 and 180 degrees. (ACDD, GDAC) | | M (for GDAC) |
| | Must conform to udunits. If not specified then “degrees_east” is assumed. (ACDD) | ex.: geospatial_lon_units = “degrees_east” | S |
| | The minimum depth or height of the data, a value between -10000 and 10000 meters. Describes the numerically smaller vertical limit. (ACDD) | | M (for GDAC) |
| | The maximum depth or height of the data, a value between -10000 and 10000 meters. Describes the numerically larger vertical limit. (ACDD) | | M (for GDAC) |
| | Indicates which direction is positive: “up” means that z represents height, while “down” means that z represents pressure or depth. If not specified then “down” is assumed. (ACDD) | ex.: geospatial_vertical_positive = “down” | S |
| | Units of depth, pressure, or height. If not specified then “meter” is assumed. (ACDD) | ex.: geospatial_vertical_units = “m” | S |
| | Datetime of the first measurement in this dataset, in ISO 8601 format. (ACDD) | | M (for GDAC) |
| | Datetime of the last measurement in this dataset, in ISO 8601 format. (ACDD) | | M (for GDAC) |
| | Use the ISO 8601 ‘duration’ convention. (ACDD) | | S |
| | The time interval between records, using ISO 8601 (PnYnMnDTnHnMnS). (ACDD) | | S |
| | For files using the Discrete Sampling Geometry, available in CF-1.5 and later. See CF documents. For OceanSITES, this should be one of: | | M |
| | From Reference table 1: OceanSITES specific. (GDAC) | | M |
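Most of these coverage attributes can be derived from the coordinate values rather than typed by hand. The sketch below assumes that the blank attribute-name cells above correspond to the standard ACDD names (`geospatial_lat_min`, `time_coverage_start`, `time_coverage_resolution`, and so on); `coverage_attributes` and `resolution_code` are hypothetical helpers, and mapping roughly 28 to 31 day spacing to `P1M` is a simplifying choice.

```python
import statistics
from datetime import datetime, timedelta, timezone


def iso_utc(dt):
    """Format as ISO 8601 UTC; naive datetimes are assumed to already be UTC."""
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def resolution_code(seconds):
    """Approximate ISO 8601 duration for a typical record interval."""
    hours = seconds / 3600
    days = hours / 24
    if hours < 1:
        return f"PT{round(seconds / 60)}M"
    if hours < 24:
        return f"PT{round(hours)}H"
    if 28 <= days <= 31:
        return "P1M"  # calendar months vary in length
    return f"P{round(days)}D"


def coverage_attributes(times, lats, lons, depths):
    """Derive ACDD coverage attributes; times sorted, depths positive down in m."""
    if len(times) < 2:
        raise ValueError("need at least two times to estimate the resolution")
    steps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "geospatial_lat_min": min(lats),
        "geospatial_lat_max": max(lats),
        "geospatial_lon_min": min(lons),
        "geospatial_lon_max": max(lons),
        "geospatial_vertical_min": min(depths),
        "geospatial_vertical_max": max(depths),
        "geospatial_vertical_positive": "down",
        "geospatial_vertical_units": "m",
        "time_coverage_start": iso_utc(times[0]),
        "time_coverage_end": iso_utc(times[-1]),
        # Whole days only (rounded down) in this sketch.
        "time_coverage_duration": f"P{(times[-1] - times[0]).days}D",
        "time_coverage_resolution": resolution_code(statistics.median(steps)),
    }


# Illustrative 12-hourly record over 10 days (made-up coordinates).
t = [datetime(2004, 4, 1, tzinfo=timezone.utc) + timedelta(hours=12 * i) for i in range(21)]
print(coverage_attributes(t, [26.0, 27.0], [-77.0, -13.0], [0.0, 4800.0]))
```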
3. Conventions
| Attribute Name | Definition | Example | RS |
|---|---|---|---|
| | OceanSITES format version | ex.: 1.4 | M |
| | Name of the conventions used in the dataset. | ex.: “CF-1.7, OceanSITES-1.4, ACDD-1.2” | S |
4. Publication Information
| Attribute Name | Definition | Example | RS |
|---|---|---|---|
| | Name of the person responsible for metadata and formatting | | S |
| | Web address of the institution or data publisher | | S |
| | Published or web-based references that describe the data or methods used to produce it. Include a reference to OceanSITES and a project-specific reference if appropriate. | | S |
| | A statement describing the data distribution policy; it may be a project- or DAC-specific statement, but must allow free use of data. OceanSITES has adopted the CLIVAR data policy, which explicitly calls for free and unrestricted data exchange. Details at: http://www.clivar.org/resources/data/data-policy (ACDD) | | S |
| | The citation to be used in publications using the dataset; should include a reference to OceanSITES, the name of the PI, the site name, platform code, data access date, time, and URL, and, if available, the DOI of the dataset. | ex.: “These data were collected and made freely available by the OceanSITES program and the national programs that contribute to it.” | S |
| | A place to acknowledge various types of support for the project that produced this data. (ACDD) | | S |
5. Provenance
| Attribute Name | Definition | Example | RS |
|---|---|---|---|
| | The date on which this file was created. Version date and time for the data contained in the file. See the note on time formats in the OceanSITES reference manual. (ACDD) | ex.: date_created = “2016-04-11T08:35:00Z” | M |
| | The date on which this file was last modified. (ACDD) | ex.: date_modified = “2016-04-11T08:35:00Z” | S |
| | Provides an audit trail for modifications to the original data. It should contain a separate line for each modification, with each line beginning with a timestamp and including user name, modification name, and modification arguments. The timestamp should follow the note on time formats in the OceanSITES reference manual. (NUG) | ex.: history = “2012-04-11T08:35:00Z data collected, A. Meyer; | S |
| | Level of processing and quality control applied to data. Preferred values are listed in reference table 3. | processing_level = “Data verified against model or other contextual information” (OceanSITES specific) | S |
| | A value valid for the whole dataset | | S |
| | A semicolon-separated list of names of any individuals or institutions that contributed to the collection, editing or publication of the data in the file. (ACDD) | ex.: “Alice Juarez; John Smith” | S |
| | The roles of any individuals or institutions that contributed to the creation of this data, separated by semicolons. (ACDD) | ex.: “data collector; data editor” | S |
| | The email addresses of any individuals or institutions that contributed to the creation of this data, separated by semicolons. (ACDD) | | S |
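The provenance attributes share one timestamp style. A minimal sketch, assuming timezone-aware datetimes and one history line per modification in the “timestamp action, user” form of the example above; `utc_timestamp` and `append_history` are hypothetical helpers, not amocarray functions, and updating `date_modified` at the same time is a design choice of this sketch.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """ISO 8601 UTC timestamp such as 2016-04-11T08:35:00Z (now must be tz-aware)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def append_history(attrs, action, user, now=None):
    """Append a 'timestamp action, user' line to history and bump date_modified."""
    stamp = utc_timestamp(now)
    line = f"{stamp} {action}, {user}"
    previous = attrs.get("history", "")
    attrs["history"] = f"{previous}\n{line}" if previous else line
    attrs["date_modified"] = stamp
    return attrs


attrs = {"date_created": utc_timestamp(datetime(2016, 4, 11, 8, 35, tzinfo=timezone.utc))}
append_history(attrs, "data collected", "A. Meyer",
               datetime(2016, 4, 11, 8, 35, tzinfo=timezone.utc))
print(attrs["history"])  # 2016-04-11T08:35:00Z data collected, A. Meyer
```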
Dimension and definition
OceanSITES recommends that coordinate variables carry an “axis” attribute identifying them as the X, Y, Z or T axis, and that dimensions appear in the relative order T, Z, Y, X. The coordinates are named TIME, LATITUDE, LONGITUDE, and DEPTH (note that this departs from the OSNAP data files). OceanSITES appears to strongly prefer “depth” over “pressure” as the vertical coordinate.
| Dimension | Definition | Comment |
|---|---|---|
| | unlimited, axis="T" | Time coordinate in days since 1950-01-01 |
| | vertical, axis="Z" | Positive downward, in meters |
| | horizontal, axis="Y", axis="X" | In degrees north/east |
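The TIME convention (days since 1950-01-01) can be illustrated with the standard library alone. This is a sketch under the assumption of timezone-aware UTC datetimes; the exact units string in `TIME_ATTRS` is an assumption to check against the reference manual, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1950, 1, 1, tzinfo=timezone.utc)

# Attributes a TIME coordinate might carry (assumed units string; verify in the manual).
TIME_ATTRS = {"standard_name": "time", "units": "days since 1950-01-01T00:00:00Z", "axis": "T"}


def to_days_since_1950(dt):
    """Fractional days since 1950-01-01 UTC (dt must be timezone-aware)."""
    return (dt - EPOCH) / timedelta(days=1)


def from_days_since_1950(days):
    """Inverse of to_days_since_1950."""
    return EPOCH + timedelta(days=days)


print(to_days_since_1950(datetime(2000, 1, 1, tzinfo=timezone.utc)))  # 18262.0
```

In practice, xarray or netCDF4 would usually handle this encoding once the `units` attribute is set; the explicit arithmetic only shows what the stored numbers mean.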
Coordinates
| Coordinate name | Coordinate attributes | RS |
|---|---|---|
| | | M |
| | | S |
| | | S |
| | | S |
By default, TIME represents the center of the data sample or averaging period. This is not consistent with the native OSNAP format.
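If a source file stamps monthly means at some other point in the month (for example the first day; check the native convention of each array), the time could be moved to the center of the averaging period. A minimal sketch with hypothetical helper names, assuming calendar-month averaging periods in UTC:

```python
from datetime import datetime, timezone


def period_center(start, end):
    """Midpoint of an averaging period [start, end)."""
    return start + (end - start) / 2


def month_center(year, month):
    """Center of a calendar month in UTC."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First day of the following month; December rolls over into January.
    end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return period_center(start, end)


print(month_center(2005, 1))  # 2005-01-16 12:00:00+00:00
```

Because months differ in length, the center falls at 12:00 on the 16th for 31-day months but at 00:00 on the 15th for a 28-day February.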
Geophysical variables
All variables must follow CF and OceanSITES standard_name rules (lowercase, with underscores). Use standard_name where defined; otherwise, include a descriptive long_name and appropriate units.
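The standard_name rule can be enforced when variable attributes are assembled. This sketch only checks the syntax (CF standard names consist of lowercase letters, digits and underscores and begin with a letter); it does not check membership in the CF standard name table. `variable_attributes` is a hypothetical helper, and requiring `units` in every case is a choice of this sketch.

```python
import re

# Syntax of a CF standard name: starts with a lowercase letter, then a-z, 0-9 or _.
STANDARD_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def variable_attributes(units, standard_name=None, long_name=None):
    """Assemble standard_name/long_name/units, requiring long_name if no standard_name."""
    if standard_name is None and not long_name:
        raise ValueError("a long_name is required when no standard_name is defined")
    attrs = {}
    if standard_name is not None:
        if not STANDARD_NAME_RE.fullmatch(standard_name):
            raise ValueError(f"invalid standard_name syntax: {standard_name!r}")
        attrs["standard_name"] = standard_name
    if long_name:
        attrs["long_name"] = long_name
    attrs["units"] = units
    return attrs


print(variable_attributes("degree_Celsius", standard_name="sea_water_temperature"))
# {'standard_name': 'sea_water_temperature', 'units': 'degree_Celsius'}
```

A name such as Transport_anomaly (used in the SAMBA files) would be rejected here because of the capital letter.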
| VARIABLE NAME | Variable attributes | Example | RS |
|---|---|---|---|
| | | | S |
Flags and QC
Flags are stored in a companion <PARAM>_QC variable whose attributes define the standard values flag_values = 0, 1, 2, 3, 4, 7, 8, 9 and flag_meanings = “unknown good_data probably_good_data potentially_correctable_bad_data bad_data nominal_value interpolated_value missing_value”. There is also an optional <PARAM>_UNCERTAINTY variable, whose “technique_title” attribute gives the “Title of the document that describes the technique that was applied to estimate the uncertainty of the data”. I’m not sure whether either of these applies to the “_FLAG” variables in RAPID or the “_ERR” variables in OSNAP. OSNAP does, however, have the “QC_indicator” and “Processing_level” attributes. QC_indicator is OceanSITES specific (see reference table 2), and “processing_level” is described in reference table 3.
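The flag_values and flag_meanings pair above can be generated and decoded from a single table, which keeps the two lists aligned. A minimal sketch with hypothetical helper names; in a NetCDF file, CF expects flag_values to have the same data type as the QC variable itself, so the plain Python list here would normally be cast (e.g. to int8).

```python
# OceanSITES QC flags: values and meanings must stay in the same order.
FLAG_VALUES = [0, 1, 2, 3, 4, 7, 8, 9]
FLAG_MEANINGS = [
    "unknown", "good_data", "probably_good_data",
    "potentially_correctable_bad_data", "bad_data",
    "nominal_value", "interpolated_value", "missing_value",
]


def qc_variable_attributes():
    """Attributes for a <PARAM>_QC variable."""
    return {"flag_values": list(FLAG_VALUES), "flag_meanings": " ".join(FLAG_MEANINGS)}


def decode_flags(flags):
    """Translate numeric QC flags into their meanings."""
    lookup = dict(zip(FLAG_VALUES, FLAG_MEANINGS))
    try:
        return [lookup[int(f)] for f in flags]
    except KeyError as err:
        raise ValueError(f"undefined QC flag {err.args[0]}") from None


print(decode_flags([1, 4, 9]))  # ['good_data', 'bad_data', 'missing_value']
```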
QC_indicator values (reference table 2) are used in the <PARAM>_QC variable to describe the quality of each measurement; I’m not sure this is how OSNAP uses them. Processing-level options apply to all measurements of a variable and are given as an overall indicator in the attributes of each variable:
Raw instrument data
Instrument data that has been converted to geophysical values
Post-recovery calibrations have been applied
Data has been scaled using contextual information
Known bad data has been replaced with null values
Known bad data has been replaced with values based on surrounding data
Ranges applied, bad data flagged
Data interpolated
Data manually reviewed
Data verified against model or other contextual information
Other QC process applied
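Because processing_level is a free-text attribute with preferred values, a small validator can catch typos before files are written. A sketch, assuming that matching should ignore case and surrounding whitespace; `validate_processing_level` is a hypothetical helper.

```python
# Preferred processing_level values (reference table 3), as listed above.
PROCESSING_LEVELS = (
    "Raw instrument data",
    "Instrument data that has been converted to geophysical values",
    "Post-recovery calibrations have been applied",
    "Data has been scaled using contextual information",
    "Known bad data has been replaced with null values",
    "Known bad data has been replaced with values based on surrounding data",
    "Ranges applied, bad data flagged",
    "Data interpolated",
    "Data manually reviewed",
    "Data verified against model or other contextual information",
    "Other QC process applied",
)


def validate_processing_level(value):
    """Return the canonical spelling, ignoring case and surrounding whitespace."""
    lookup = {level.lower(): level for level in PROCESSING_LEVELS}
    try:
        return lookup[value.strip().lower()]
    except KeyError:
        raise ValueError(f"not a reference table 3 processing level: {value!r}") from None


print(validate_processing_level("data interpolated "))  # Data interpolated
```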
AMOC array data
In the 12-hourly RAPID data files, the dimensions are depth and time, but the coordinate is pressure. In the 10-day data, the dimensions are time, longitude, depth and also sigma0, with coordinates of pressure and sigma0 (to be verified). The dimension order does not follow CF conventions, so arrays will need to be transposed. The axis attribute is not specified and needs to be added.
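Reordering dimensions to T, Z, Y, X can be driven by a name-to-axis mapping, which can also supply the missing axis attributes. The mapping below is hypothetical (including the assumption that sigma0 plays the role of a Z coordinate), and the helpers are not part of amocarray.

```python
# Hypothetical mapping from RAPID-style dimension names to OceanSITES axes.
AXIS_OF = {
    "time": "T",
    "depth": "Z", "pressure": "Z", "sigma0": "Z",  # sigma0 -> Z is an assumption
    "latitude": "Y",
    "longitude": "X",
}
TZYX = "TZYX"


def oceansites_order(dims, axis_of=AXIS_OF):
    """Return dims reordered into the relative order T, Z, Y, X."""
    unknown = [d for d in dims if d not in axis_of]
    if unknown:
        raise ValueError(f"no axis known for dimensions {unknown}")
    # sorted() is stable, so two dims on the same axis keep their original order.
    return sorted(dims, key=lambda d: TZYX.index(axis_of[d]))


def transpose_axes(dims, axis_of=AXIS_OF):
    """Axis permutation that turns the original dim order into T, Z, Y, X."""
    dims = list(dims)
    return [dims.index(d) for d in oceansites_order(dims, axis_of)]


def axis_attributes(dims, axis_of=AXIS_OF):
    """The axis attribute each dimension's coordinate should carry."""
    return {d: {"axis": axis_of[d]} for d in dims}


print(oceansites_order(("depth", "time")))                # ['time', 'depth']
print(transpose_axes(("longitude", "depth", "time")))     # [2, 1, 0]
```

The permutation could be passed to `numpy.transpose(values, axes)`, or the reordered names to xarray's `DataArray.transpose(*dims)`.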
OSNAP data files use dimensions of TIME, LEVEL, LATITUDE and LONGITUDE, and sometimes also DEPTH. The axis attribute is specified, and the order of dimensions is consistent with OceanSITES. Standard names are missing for some variables (e.g., T_ALL), and sea_water_velocity does not seem to be a CF standard name, probably because a variant such as sea_water_velocity_across_line would be needed. We can use ocean_meridional_overturning_streamfunction for the streamfunction, and perhaps ocean_volume_transport_across_line, which is a CF standard name and is what MOVE uses.
MOVE data files use the TIME dimension only. Standard names are missing for some variables (e.g., transport_component_internal, transport_component_internal_offset and transport_component_boundary). The CF standard name table does include baroclinic_northward_sea_water_velocity, so perhaps we can use baroclinic_transport_across_line.
SAMBA data files also use the TIME dimension only. Every variable has the standard name Transport_anomaly. CF conventions allow names ending in _anomaly, but the name should then be something like ocean_volume_transport_anomaly_across_line.
References
Relevant references:
OceanSITES data format reference manual, which describes OceanSITES version 1.4 but additionally attempts to specify vocabularies. Note: if the link to the PDF is broken, here is a version downloaded in 2025:
oceansites_data_format_reference_manual.pdf
Vocabularies are primarily CF standard names. See AC1_standard_names.
CF conventions has a number of relevant sections, including: