Developer Guide for AMOCatlas
Welcome to the AMOCatlas Developer Guide! This comprehensive guide will help you contribute effectively to the project, whether you’re fixing bugs, adding new data readers, or improving documentation.
Quick Navigation:
Quickstart - Get contributing in 5 minutes
Development Environment - Setup and tools
Adding New Features - Core contribution patterns
Code Standards - Style and quality guidelines
Git Workflow - Fork, branch, and PR process
Testing - Running tests locally and in CI
Specialized Guides - Links to detailed references
Related Documentation:
Git for Beginners: Workflow Guide - Step-by-step Git workflow for beginners
Project Maintenance Checklist - Maintenance tasks
Extra GitHub features: Actions & Pages - CI/CD and release process
Quickstart
Get started contributing in 5 minutes:
Fork and clone the repository:
# Fork on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/amocatlas.git
cd amocatlas
Set up development environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements-dev.txt
pip install -e .
Create a feature branch:
git checkout -b yourname-patch-1
Make your changes and test them:
pytest # Run tests
pre-commit run --all-files # Check formatting and linting
Push and create a pull request:
git add .
git commit -m "feat: your descriptive commit message"
git push origin yourname-patch-1
# Then create PR on GitHub
Project Overview
AMOCatlas is a Python package for accessing and analyzing data from Atlantic Meridional Overturning Circulation (AMOC) observing arrays. The project aims to provide:
Unified data access across multiple AMOC arrays (RAPID, OSNAP, MOVE, SAMBA, etc.)
Consistent data formats and standardized metadata
Visualization tools including publication-quality PyGMT figures
Analysis functions for filtering, processing, and comparing datasets
Core Architecture
amocatlas/
├── readers.py # Main interface - load_dataset(), load_sample_dataset()
├── read_*.py # Individual array readers (rapid, osnap, move, etc.)
├── utilities.py # Shared functions (downloads, file handling)
├── tools.py # Analysis functions (filtering, unit conversion)
├── plotters.py # Visualization (matplotlib + PyGMT)
├── standardise.py # Data format standardization
├── writers.py # Data export functionality
└── logger.py # Structured logging system
Data Flow: User calls readers.load_dataset("rapid") → readers.py routes to read_rapid.py → downloads data → standardizes format → returns xarray Dataset(s)
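The data flow above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual readers.py implementation: the registry READERS, the stub read_demo, and standardise_record are hypothetical names, and plain dictionaries stand in for xarray Datasets so the sketch runs without any dependencies.

```python
from typing import Callable


# Hypothetical stand-in for a per-array reader such as read_rapid.py.
# A real reader would download files and return xarray Datasets.
def read_demo() -> list[dict]:
    return [{"name": "demo", "transport": [17.2, 16.8], "units": "Sv"}]


def standardise_record(record: dict) -> dict:
    """Rename the data key to ALL_CAPS; a stand-in for standardization."""
    return {
        "name": record["name"],
        "TRANSPORT": record["transport"],
        "units": record["units"],
    }


# Hypothetical registry mapping array names to reader callables.
READERS: dict[str, Callable[[], list[dict]]] = {"demo": read_demo}


def load_dataset(array_name: str) -> list[dict]:
    """Route to the matching reader, then standardise each result."""
    key = array_name.lower()
    if key not in READERS:
        raise ValueError(
            f"Unknown array '{array_name}'. Available: {sorted(READERS)}"
        )
    return [standardise_record(record) for record in READERS[key]()]


datasets = load_dataset("demo")
print(datasets[0]["TRANSPORT"])
```

Keeping the routing in one registry means a new array only needs a new entry, and unknown names fail early with a message listing the valid options.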
Package-Level Imports
We import modules in __init__.py instead of at the top of each individual module:
# In amocatlas/__init__.py
from . import (
    readers,
    plotters,
    compliance_checker,
    convert,
    # etc.
)
This means:
Tests can do from amocatlas import compliance_checker
Without this, tests would need from amocatlas.compliance_checker import ...
Module attributes such as plotters.HAS_PYGMT are accessible because plotters is imported at package level
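The effect of a package-level import can be demonstrated with two throwaway packages written to a temporary directory. The package names demo_with_import and demo_without_import are made up for this sketch, and the HAS_PYGMT value is arbitrary; nothing here touches the real amocatlas package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root: Path, name: str, init_source: str) -> None:
    """Write a tiny package with one submodule, plotters.py, under root."""
    pkg_dir = root / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(init_source)
    (pkg_dir / "plotters.py").write_text("HAS_PYGMT = True\n")


root = Path(tempfile.mkdtemp())
make_package(root, "demo_with_import", "from . import plotters\n")
make_package(root, "demo_without_import", "")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

with_import = importlib.import_module("demo_with_import")
without_import = importlib.import_module("demo_without_import")

# The submodule is reachable as an attribute only if __init__.py imports it.
print(with_import.plotters.HAS_PYGMT)
print(hasattr(without_import, "plotters"))
```

In the second package the submodule still exists on disk, but it only becomes an attribute of the package once something imports it explicitly.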
Development Environment
Prerequisites
Python 3.9 or higher
Git for version control
Optional: PyGMT for publication figures (see installation notes below)
Setup Steps
Clone the repository:
git clone https://github.com/AMOCcommunity/amocatlas.git
cd amocatlas
Create virtual environment:
python3 -m venv venv
source venv/bin/activate && micromamba deactivate # Safeguard if using micromamba
Install dependencies:
pip install -r requirements-dev.txt # Includes runtime + development tools
pip install -e . # Install amocatlas in editable mode
Test your setup:
pytest # Run tests
python -c "import amocatlas; print('Success!')"
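If the setup test fails, a small script can narrow down whether the interpreter version or the installation is at fault. The sketch below uses only the standard library; the helper names meets_minimum and is_importable are illustrative and not part of AMOCatlas.

```python
import importlib.util
import sys

MINIMUM_PYTHON = (3, 9)  # From the prerequisites listed in this guide


def meets_minimum(version: tuple, minimum: tuple = MINIMUM_PYTHON) -> bool:
    """Return True if the (major, minor, ...) version satisfies the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("Python OK:", meets_minimum(sys.version_info))
print("amocatlas importable:", is_importable("amocatlas"))
```

If the second line prints False inside an activated environment, rerunning pip install -e . from the repository root is usually the first thing to try.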
Development Tools
Black: Code formatting (88 character line length)
Ruff: Linting and import sorting
pytest: Testing framework with coverage reporting
pre-commit: Automated code quality checks (run manually)
Sphinx: Documentation generation
PyGMT Installation Notes
PyGMT is an optional dependency for publication-quality figures but can be challenging to install:
# Try conda/mamba first (recommended):
conda install pygmt -c conda-forge
# Or pip (may require GMT to be installed separately):
pip install pygmt
See the PyGMT installation guide for platform-specific instructions.
Adding New Features
Adding a New Data Reader
This is the most common contribution type. Here’s the step-by-step process:
Create the reader module amocatlas/read_newarray.py:
"""Reader for NEWARRAY data."""
import xarray as xr
from amocatlas.utilities import download_file
from amocatlas.logger import log_info
def read_newarray(source: str = None, **kwargs) -> list[xr.Dataset]:
    """Read NEWARRAY data and return standardized datasets.
    Parameters
    ----------
    source : str, optional
        Data source URL or path.
    **kwargs
        Additional parameters passed to data loading.
    Returns
    -------
    list[xr.Dataset]
        List of standardized xarray datasets.
    """
    log_info("Loading NEWARRAY data...")
    # Implementation here
    return [dataset]
Add to the main readers interface in amocatlas/readers.py:
# Add to AVAILABLE_ARRAYS
AVAILABLE_ARRAYS = {
    # ... existing arrays
    "newarray": "amocatlas.read_newarray",
}
Create tests in tests/test_read_newarray.py:
import pytest
from amocatlas.read_newarray import read_newarray
def test_read_newarray():
    """Test basic functionality of NEWARRAY reader."""
    datasets = read_newarray()
    assert isinstance(datasets, list)
    assert len(datasets) > 0
Document the original format in docs/source/format_orig_newarray.rst:
NEWARRAY Original Format
========================
Description of the native NEWARRAY data format, including:
- File structure and naming conventions
- Variable names and units in original format
- Metadata structure
- Any format-specific considerations
Add sample data (if needed) and update other documentation.
Adding Visualization Functions
Add new plotting functions to amocatlas/plotters.py. Choose between matplotlib (default) and PyGMT (publication quality):
def plot_new_visualization(data, **kwargs):
    """Create a new type of visualization.
    Parameters
    ----------
    data : pandas.DataFrame or xarray.Dataset
        Input data to plot.
    **kwargs
        Plotting options.
    Returns
    -------
    matplotlib.figure.Figure or pygmt.Figure
        Generated plot.
    """
    # Implementation
    return fig
For PyGMT functions, include the availability check:
def plot_new_pygmt_viz(data, **kwargs):
    """Create publication-quality plot using PyGMT."""
    _check_pygmt() # This function handles missing PyGMT gracefully
    # Implementation
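The guide does not show how _check_pygmt() works internally, so the following is only a sketch of one common way to guard an optional dependency. The helper require_optional and the function plot_demo_pygmt are hypothetical names; the only PyGMT call used is pygmt.Figure(), and it is reached only when PyGMT is installed.

```python
import importlib.util


def require_optional(module_name: str, purpose: str) -> None:
    """Raise an informative ImportError if an optional module is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name} is required for {purpose} but is not installed. "
            "Install it (for example with conda or pip) or use a "
            "matplotlib-based plot instead."
        )


def plot_demo_pygmt(data):
    """Hypothetical plotting function guarded by the availability check."""
    require_optional("pygmt", "publication-quality figures")
    import pygmt  # Imported lazily, only after the check has passed

    fig = pygmt.Figure()
    # Plotting calls using `data` would go here
    return fig
```

Checking first and importing lazily keeps the rest of the module importable on machines without PyGMT, and the error message tells the user what to do next.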
Code Standards
Python Style
Type hints: Use for all function parameters and return values
Docstrings: NumPy-style docstrings for all public functions
Naming: snake_case for functions and variables, ALL_CAPS for xarray variables
Line length: 88 characters (Black default)
Import order: Standard library → Third party → Local imports (handled by Ruff)
Example Function:
def convert_units_var(
    var_values: xr.DataArray,
    current_unit: str,
    new_unit: str,
    unit_conversion: dict = None,
) -> xr.DataArray:
    """Convert variable values from one unit to another.
    Parameters
    ----------
    var_values : xr.DataArray
        The numerical values to convert.
    current_unit : str
        Unit of the original values.
    new_unit : str
        Desired unit for the output values.
    unit_conversion : dict, optional
        Dictionary containing conversion factors between units.
    Returns
    -------
    xr.DataArray
        Converted values in the desired unit.
    """
    # Implementation
    return converted_values
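The body of the example function is left open above. One possible shape for it is sketched here, under loud assumptions: plain lists of floats replace xr.DataArray so the code runs without xarray, and the conversion table keyed by (from_unit, to_unit) tuples is a guess at a workable structure, not the format AMOCatlas actually uses.

```python
from typing import Optional

# Hypothetical conversion table: (from_unit, to_unit) -> multiplicative factor
DEFAULT_CONVERSIONS = {
    ("cm/s", "m/s"): 0.01,
    ("m/s", "cm/s"): 100.0,
    ("m3/s", "Sv"): 1e-6,
    ("Sv", "m3/s"): 1e6,
}


def convert_units(
    values: list[float],
    current_unit: str,
    new_unit: str,
    unit_conversion: Optional[dict] = None,
) -> list[float]:
    """Convert values between units using multiplicative factors."""
    if current_unit == new_unit:
        return list(values)
    table = DEFAULT_CONVERSIONS if unit_conversion is None else unit_conversion
    try:
        factor = table[(current_unit, new_unit)]
    except KeyError:
        raise ValueError(
            f"No conversion defined from '{current_unit}' to '{new_unit}'"
        ) from None
    return [value * factor for value in values]


print(convert_units([17.5], "Sv", "m3/s"))
```

Raising on an unknown unit pair, rather than returning the input unchanged, avoids silently mislabelled data.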
Data Standards
xarray variables: ALL_CAPS (e.g., TRANSPORT, TIME, DEPTH)
Attributes: lowercase_with_underscores following OceanGliders OG1 format
Units: Always include units in variable attributes, never in variable names
Missing values: Handle NaN values consistently across functions
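The naming and units rules above lend themselves to an automated check. The sketch below is illustrative only: check_conventions is a hypothetical helper, and it works on a plain mapping of variable names to attribute dictionaries rather than on a real xarray Dataset.

```python
import re

ALL_CAPS = re.compile(r"[A-Z][A-Z0-9_]*")
LOWER_SNAKE = re.compile(r"[a-z][a-z0-9_]*")


def check_conventions(variables: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of readable problems; an empty list means all passed.

    `variables` maps each variable name to its attribute dictionary.
    """
    problems = []
    for name, attrs in variables.items():
        if not ALL_CAPS.fullmatch(name):
            problems.append(f"{name}: variable name is not ALL_CAPS")
        if "units" not in attrs:
            problems.append(f"{name}: missing 'units' attribute")
        for key in attrs:
            if not LOWER_SNAKE.fullmatch(key):
                problems.append(f"{name}: attribute '{key}' is not lowercase")
    return problems


example = {
    "TRANSPORT": {"units": "Sv", "long_name": "Overturning transport"},
    "transport_sv": {"Long_Name": "Transport"},
}
for problem in check_conventions(example):
    print(problem)
```

With a real Dataset, the same mapping could be built from each variable's name and attrs before calling the checker.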
Quality Checks
Run these before committing:
black amocatlas/ tests/ # Format code
ruff check amocatlas/ tests/ # Lint code
pytest --cov=amocatlas # Run tests with coverage
pre-commit run --all-files # Run all quality checks
Git Workflow
Basic Workflow
Keep your fork synced:
# Add upstream remote (one time only)
git remote add upstream https://github.com/AMOCcommunity/amocatlas.git
# Sync your fork
git checkout main
git fetch upstream
git merge upstream/main
git push origin main
Create feature branches:
git checkout main
git pull origin main
git checkout -b yourname-patch-1 # Or descriptive name like: fix-osnap-metadata
Make commits with clear messages:
git add .
git commit -m "feat: add support for NEWARRAY dataset"
# or
git commit -m "fix: handle missing timestamps in RAPID data"
Commit Message Format
Use conventional commits for consistency:
[type]: brief description of change
Types:
- feat: new feature
- fix: bug fix
- docs: documentation changes
- style: formatting changes (no logic change)
- refactor: code restructuring (no behavior change)
- test: adding or updating tests
- chore: maintenance tasks
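The format can be checked mechanically, for example in a local script. The sketch below assumes the convention is exactly "type: description" with one of the seven types listed above, a colon, a single space, and a non-empty description; the function name is_valid_commit_message is illustrative and is not an existing project tool.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
COMMIT_PATTERN = re.compile("(?:" + "|".join(COMMIT_TYPES) + r"): \S.*")


def is_valid_commit_message(message: str) -> bool:
    """Check the first line of a commit message against 'type: description'."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PATTERN.fullmatch(first_line) is not None


for candidate in (
    "feat: add support for NEWARRAY dataset",
    "added some stuff",
):
    print(candidate, "->", is_valid_commit_message(candidate))
```

Only the first line is checked, so a longer commit body after a blank line does not affect the result.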
Pull Request Process
Push your branch:
git push origin yourname-patch-1
Create PR on GitHub targeting AMOCcommunity/amocatlas:main
Address feedback and update your branch as needed
Merge once approved (you can merge your own PR after approval)
Testing
Running Tests Locally
# Run all tests
pytest
# Run with coverage report
pytest --cov=amocatlas --cov-report term-missing
# Run specific test file
pytest tests/test_readers.py
# Run specific test
pytest tests/test_readers.py::test_load_sample_dataset_rapid
Writing Tests
Place tests in the tests/ directory, following the naming convention test_*.py:
import pytest
import numpy as np
from amocatlas import readers
def test_load_sample_dataset():
    """Test that sample datasets load correctly."""
    ds = readers.load_sample_dataset("rapid")
    assert ds is not None
    assert "TRANSPORT" in ds.variables
def test_data_processing():
    """Test data processing functions."""
    # Use sample data for testing
    data = np.array([1, 2, 3, 4, 5])
    result = your_function(data)
    expected = np.array([2, 4, 6, 8, 10])
    np.testing.assert_array_equal(result, expected)
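In the template above, your_function is a placeholder. A complete version of the same pattern might look like the sketch below, where double_values is a made-up function standing in for the code under test and plain lists replace NumPy arrays so the example has no dependencies.

```python
def double_values(values: list[float]) -> list[float]:
    """Hypothetical function under test: multiply every element by two."""
    return [2 * value for value in values]


def test_double_values():
    """pytest collects functions named test_*; plain asserts are enough."""
    assert double_values([1, 2, 3, 4, 5]) == [2, 4, 6, 8, 10]


def test_double_values_empty():
    """Edge case: an empty input should give an empty output."""
    assert double_values([]) == []
```

Saved as a test_*.py file under tests/, both functions would be picked up by a plain pytest run.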
GitHub Actions CI
Tests run automatically on:
Pull requests
Pushes to main branch
CI runs the tests on multiple platforms (Windows, macOS, Linux) and Python versions. Check the “Actions” tab on GitHub to see test results.
Pre-commit Checks
Before submitting PRs, ensure these pass:
pre-commit run --all-files
This runs:
Black code formatting
Ruff linting and import sorting
Basic pytest tests (on modified files)
PyGMT Development
Note: PyGMT development is advanced and not expected for most contributors.
PyGMT functions in amocatlas/plotters.py follow these patterns:
All PyGMT functions include _check_pygmt() for graceful fallback
Functions return pygmt.Figure objects
Include AMOCatlas timestamp: _add_amocatlas_timestamp(fig)
Handle optional dependency gracefully with informative error messages
For detailed PyGMT development, see the existing PyGMT functions in plotters.py and refer to PyGMT documentation.
Troubleshooting
Common Issues
Pre-commit not running?
pre-commit run --all-files # Run manually
Tests failing locally but passing in CI?
Check your virtual environment is activated
Ensure you have the latest requirements-dev.txt installed
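Whether the virtual environment is really active can be confirmed from inside Python. This sketch relies on the standard-library behaviour that sys.prefix differs from sys.base_prefix inside an environment created with python -m venv; it may not detect conda or micromamba environments, and the function name is illustrative.

```python
import sys


def in_virtual_environment(
    prefix: str = sys.prefix, base_prefix: str = sys.base_prefix
) -> bool:
    """Return True when running inside a venv-style virtual environment."""
    return prefix != base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

If the interpreter path printed does not point into the project's venv directory, the tests are probably running against a different set of installed packages.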
Import errors after installing in editable mode?
pip install -e . --force-reinstall
PyGMT installation issues?
Try conda/mamba: conda install pygmt -c conda-forge
Check PyGMT installation guide
PyGMT is optional - all other functionality works without it
VSCode not recognizing virtual environment?
Ensure Python interpreter is set to ./venv/bin/python
Reload VSCode window after activating environment
Specialized Guides
For detailed information on specific topics, see these dedicated guides:
Git for Beginners: Workflow Guide - Step-by-step Git workflow with screenshots for beginners
Project Maintenance Checklist - Maintenance and dependency management
Extra GitHub features: Actions & Pages - CI/CD workflows and release process
Resources
This developer guide incorporates best practices from the Python scientific computing community and is designed to grow with the project.