Developer Guide for amocarray

Welcome to the amocarray Developer Guide!

This guide will help you set up your local development environment, understand the project structure, and contribute effectively to the project. Whether you’re fixing bugs, adding new readers, or improving documentation, this guide is your starting point.


Table of Contents

  1. Quickstart

  2. Project Overview

  3. Project Structure

  4. Setting Up Development Environment

  5. Development Workflow

  6. .gitignore vs .git/info/exclude

  7. Commit Message Style Guide

  8. Logging and Debugging

  9. Troubleshooting

  10. Further Resources


1. Quickstart: First Contribution

  1. Fork the repository

  2. Clone the upstream repository:

git clone https://github.com/AMOCcommunity/amocarray.git
cd amocarray

  3. Create a virtual environment and install dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

  4. Make your changes (update a doc, fix a function!)

  5. Run tests and pre-commit checks:

pytest
pre-commit run --all-files

  6. Push to your fork

  7. Open a pull request 🚀


2. Project Overview

amocarray is a Python package to process and analyse data from AMOC observing arrays.
It is designed to support researchers and data users by providing tools to read, standardise, and work with multiple datasets.

Core goals:

  • Consistent handling of multiple AMOC arrays

  • Easy integration of new data sources

  • High code quality and reproducibility


3. Project Structure

amocarray/
├── amocarray/               # Core modules (readers, utilities, standardisation)
│   ├── readers.py           # High-level interface for loading datasets
│   ├── read_move.py         # Reader for MOVE data
│   ├── read_rapid.py        # Reader for RAPID data
│   ├── read_osnap.py        # Reader for OSNAP data
│   ├── read_samba.py        # Reader for SAMBA data
│   ├── utilities.py         # Helper functions (file handling, downloads, etc.)
│   ├── tools.py             # Unit conversions and data cleaning
│   ├── standardise.py       # Functions for dataset standardisation
│   ├── plotters.py          # Plotting utilities
│   ├── writers.py           # Data writing utilities
│   └── logger.py            # Project-wide structured logging
├── tests/                   # Unit tests
├── data/                    # Local data storage (downloads etc.)
├── docs/                    # Documentation sources (built with Sphinx)
├── notebooks/               # Jupyter notebooks for exploration and demos
├── .github/                 # GitHub workflows and actions
├── pyproject.toml           # Project metadata and build system config
├── CITATION.cff             # Citation file for this project
├── CONTRIBUTING.md          # Contribution guidelines
├── README.md                # Project overview and installation instructions
├── requirements.txt         # Runtime dependencies
├── requirements-dev.txt     # Development dependencies
└── .pre-commit-config.yaml  # Pre-commit hooks configuration

Project Management and Configuration Files

  • pyproject.toml: Project metadata and build system configuration.

  • CITATION.cff: Citation information for the project.

  • CONTRIBUTING.md: Guidelines for contributors.

  • README.md: Project overview, installation, and usage instructions.

  • .pre-commit-config.yaml: Pre-commit hook configurations.

Core Modules

  • readers.py: Clean, high-level interface for loading datasets.

  • read_move.py, read_rapid.py, etc.: Specific reader modules for MOVE, RAPID, OSNAP, SAMBA.

  • utilities.py: Shared helper functions.

  • tools.py: Unit conversions and data cleaning.

  • standardise.py: Dataset standardisation functions.

  • logger.py: Project-wide structured logging.

  • plotters.py: Plotting utilities.

  • writers.py: Data writing utilities.


4. Setting Up Development Environment

Step 1: Clone the repository

git clone https://github.com/AMOCcommunity/amocarray.git
cd amocarray

Step 2: Set up a virtual environment

In a terminal window, at the root of the repository (next to the LICENSE file), run

python3 -m venv venv
source venv/bin/activate && micromamba deactivate

Note the && micromamba deactivate appended to source venv/bin/activate: it is a safeguard in case you sometimes use micromamba, and ensures that any micromamba environment active in this terminal is deactivated. If you do not use micromamba, omit it and run source venv/bin/activate on its own.

Step 3: Install dependencies

pip install -r requirements.txt
pip install -r requirements-dev.txt

If you have added or changed entries in these files and want to make sure you have a clean install, run

pip install -r requirements-dev.txt --force-reinstall

which reinstalls every listed package, even those that are already installed and up to date.

Step 4: (Optional) Install pre-commit hooks manually

We recommend running the pre-commit checks, which fix formatting and run the tests, before opening a pull request (or even before committing). They catch problems that would otherwise only surface when GitHub Actions runs the tests on your PR.

You can run pre-commit manually:

pre-commit run --all-files

Advanced (optional): to have the hooks run automatically on every git commit, install them with:

pre-commit install

Step 5: Build the documentation (optional)

cd docs
make html

5. Development Workflow

Branching Model

  • Work on feature branches from main.

  • No enforced naming convention, but commonly we use: Eleanor-patch-X, where X increments for each patch.

Fork & Pull Requests

  • Fork the repository on GitHub.

  • Push your changes to your fork.

  • Open a pull request to AMOCcommunity/amocarray.

See: Git collaboration guide

Keeping Your Fork Up To Date

git remote add upstream https://github.com/AMOCcommunity/amocarray.git
git fetch upstream
git merge upstream/main

6. Ignoring Local Files: .gitignore vs .git/info/exclude

When working with local files that should not be tracked by Git, you have two main options:

.gitignore

  • Lives in the root of the repository.

  • Changes are shared with all contributors.

  • Best for files or patterns that should be ignored project-wide (e.g., temporary build files, virtual environments).

Example entries:

__pycache__/
venv/
data/

.git/info/exclude

  • Personal, local ignores specific to your environment.

  • Behaves like .gitignore but is never committed.

  • Use for local files you want to ignore without affecting the shared project settings.

Example usage:

my_temp_outputs/
notes.txt

You can edit .git/info/exclude manually at any time.

Best Practice

  • Use .gitignore for project-wide ignores.

  • Use .git/info/exclude for personal, local excludes: there is no risk of accidentally committing changes to shared ignore patterns!
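
Personal excludes can also be added from a script. The following is a minimal sketch, not part of amocarray: the helper name add_local_exclude is hypothetical, and it assumes you pass the path to the repository root.

```python
from pathlib import Path


def add_local_exclude(repo_root, pattern):
    """Append `pattern` to .git/info/exclude unless it is already listed.

    Hypothetical helper for illustration only; not part of amocarray.
    Returns True if the pattern was added, False if it was already present.
    """
    exclude_file = Path(repo_root) / ".git" / "info" / "exclude"
    exclude_file.parent.mkdir(parents=True, exist_ok=True)
    text = exclude_file.read_text() if exclude_file.exists() else ""
    if pattern in text.splitlines():
        return False
    # Start on a fresh line in case the file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    exclude_file.write_text(text + separator + pattern + "\n")
    return True
```

Calling add_local_exclude(".", "notes.txt") from the repository root adds the pattern once; repeating the call leaves the file unchanged.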


7. Commit Message Style Guide

We use clear, consistent commit messages to make our history readable and to support changelog automation in the future.

Format

[type] short description of the change
  • Use lowercase for the description (except proper nouns).

  • Keep it concise but descriptive (ideally under 72 characters).

  • Use the imperative mood: “fix bug” not “fixed bug” or “fixes bug”.

Types

Tag         Purpose
feat        New feature
fix         Bug fix
docs        Documentation-only changes
style       Code style changes (formatting, no logic)
refactor    Code improvements without behavior change
test        Adding or improving tests
ci          Changes to CI/CD pipelines
chore       Maintenance or auxiliary tooling changes
cleanup     Removing old code or housekeeping

Examples

fix osnap reader dimension handling
feat add metadata support for samba reader
docs update README with installation steps
test add coverage for utilities module
ci add pre-commit config for linting
cleanup remove deprecated functions from tools.py
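
These conventions can be checked mechanically. The following is a minimal sketch, not part of amocarray: the function name check_commit_message is hypothetical, and it assumes (following the examples above) that the tag is written without brackets as the first word of the message.

```python
# Tags taken from the Types list above.
COMMIT_TYPES = {
    "feat", "fix", "docs", "style", "refactor",
    "test", "ci", "chore", "cleanup",
}
MAX_LENGTH = 72  # "ideally under 72 characters"


def check_commit_message(message):
    """Return a list of problems found in the first line of a commit message."""
    lines = message.strip().splitlines()
    if not lines:
        return ["empty commit message"]
    first_line = lines[0]
    problems = []
    # Assumption: the tag is the first word, the rest is the description.
    tag, _, description = first_line.partition(" ")
    if tag not in COMMIT_TYPES:
        problems.append(f"unknown type {tag!r}")
    if not description.strip():
        problems.append("missing description")
    if len(first_line) >= MAX_LENGTH:
        problems.append(f"first line is not under {MAX_LENGTH} characters")
    return problems
```

An empty list means the message follows the format. The check does not attempt to verify the imperative mood or the lowercase rule, which need human judgement (proper nouns are allowed).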

Why this matters

  • ✅ Easier to read history

  • ✅ Easier changelog generation (future automation-ready!)

  • ✅ Helps reviewers quickly understand the purpose of commits

When in doubt, keep your commits small and focused!


8. Logging and Debugging

Structured logging was introduced to amocarray in PR #25.

Logs track steps during data reading and, in the future, will also report changes during dataset standardisation.

How logging works

Logging is handled in logger.py using:

setup_logger(array_name, output_dir="logs")
  • Creates a log file per array (MOVE, RAPID, OSNAP, etc.)

  • Timestamped as: YYYYMMDDTHH

  • Currently appends the string “read”; this may evolve to include other processes like standardisation.
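
To make the timestamp format concrete, here is a minimal sketch of how such a file name could be assembled. The function example_log_path, the order of the name parts, and the .log extension are assumptions for illustration; check logger.py for the layout setup_logger actually uses.

```python
from datetime import datetime
from pathlib import Path


def example_log_path(array_name, output_dir="logs", now=None):
    """Build an illustrative log file path (assumed layout, not amocarray's own)."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%dT%H")  # YYYYMMDDTHH: date, literal "T", hour
    return Path(output_dir) / f"{array_name}_{timestamp}_read.log"
```

Because the timestamp stops at the hour, two runs for the same array within the same hour would map to the same file under this scheme.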

Enabling and disabling logging

Logging is controlled by the global variable LOGGING_ENABLED in logger.py.

You can toggle logging dynamically:

from amocarray import logger
logger.enable_logging()
logger.disable_logging()

Writing logs in modules

We wrap standard Python logging calls to allow toggling:

from amocarray.logger import log_info, log_warning, log_error, log_debug

Then, in your code:

log_info("Dataset successfully loaded.")
log_warning("Missing metadata detected.")
log_error("File not found.")
log_debug("Variable dimensions: %s", dims)

Note: This departs from typical imports (from amocarray import logger) to keep calls clean and familiar: log_info(...) rather than logger.log.info(...).
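
To illustrate how wrapped, toggleable logging calls can work, here is a minimal self-contained sketch. It is not the code in logger.py: the names mirror those used above (LOGGING_ENABLED, enable_logging, disable_logging, log_info), but the function bodies and the logger name "sketch" are assumptions.

```python
import logging

LOGGING_ENABLED = True
log = logging.getLogger("sketch")  # assumed logger name, for illustration only


def enable_logging():
    """Turn the wrapped logging calls on."""
    global LOGGING_ENABLED
    LOGGING_ENABLED = True


def disable_logging():
    """Turn the wrapped logging calls off."""
    global LOGGING_ENABLED
    LOGGING_ENABLED = False


def log_info(msg, *args):
    """Forward to log.info only while logging is enabled."""
    if LOGGING_ENABLED:
        log.info(msg, *args)
```

The other wrappers (log_warning, log_error, log_debug) would follow the same pattern, each forwarding to the matching method of the standard logger.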

Log levels

We use standard Python logging levels, with our most common being:

  • log_error: Critical failures or exceptions.

  • log_warning: Potential issues that do not stop execution.

  • log_info: Useful process steps and confirmations.

  • log_debug: Detailed diagnostic output.

All levels are currently captured:

log.setLevel(logging.DEBUG)  # capture everything; handlers filter later
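
The comment "handlers filter later" refers to a standard pattern in Python's logging module: the logger accepts every level, and each handler applies its own threshold. A minimal sketch, with an assumed logger name and handler setup (not necessarily what logger.py configures):

```python
import io
import logging

log = logging.getLogger("levels_sketch")  # assumed name, for illustration only
log.setLevel(logging.DEBUG)  # the logger itself lets every level through
log.propagate = False  # keep this example's records away from the root logger

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)  # this handler keeps WARNING and above
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
log.addHandler(handler)

log.debug("Variable dimensions: %s", ("time", "depth"))  # dropped by the handler
log.warning("Missing metadata detected.")  # written to the stream

captured = stream.getvalue()
```

Here captured holds only the warning line. A second handler with a lower threshold (for example a file handler at DEBUG) could be attached to the same logger to keep the full detail elsewhere.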

Best practices

  • ✅ Use logging to track important steps in your code.

  • ✅ Log warnings for unusual but non-breaking behaviour.

  • ✅ Use log_debug for rich details useful in debugging.

  • ✅ Avoid excessive logging inside tight loops.

As amocarray expands, logs will play an increasing role in transparency and reproducibility.


9. Troubleshooting

Pre-commit not running?

Run manually: pre-commit run --all-files

VSCode virtualenv not recognised?

Ensure VSCode Python interpreter is set to ./venv/bin/python.

Tests failing due to missing data?

Check your data directory is correctly set.

Pre-commit pytest hook fails but pytest passes manually?

Ensure your virtual environment is activated in your VSCode terminal settings.
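
For the two virtual-environment issues above, a quick check is to ask Python which interpreter is running. This is a minimal sketch using only the standard library; the function name in_virtualenv is hypothetical, and the check applies to environments created with python3 -m venv as in this guide.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # In an environment created with `python -m venv`, sys.prefix points at the
    # environment while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

Run it in the VSCode terminal: if the interpreter path is not inside ./venv, or the second line reports False, the environment is not active in that terminal.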

My commits are blocked by pre-commit errors?

Fix all reported issues (linting, formatting, etc.) then try committing again.


10. Further Resources


This developer guide was prepared based on interactions with callumGPT and with ChatGPT to help structure and clarify.