Developer Guide for amocarray
Welcome to the amocarray Developer Guide!
This guide will help you set up your local development environment, understand the project structure, and contribute effectively to the project. Whether you're fixing bugs, adding new readers, or improving documentation, this guide is your starting point.
Related resources:
1. Quickstart: First Contribution
Fork the repository
Clone the upstream repository:
git clone https://github.com/AMOCcommunity/amocarray.git
cd amocarray
Create a virtual environment and install dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
Make your changes (update a doc, fix a function!)
Run tests and pre-commit checks:
pytest
pre-commit run --all-files
Push to your fork
Open a pull request
2. Project Overview
amocarray is a Python package to process and analyse data from AMOC observing arrays.
It is designed to support researchers and data users by providing tools to read, standardise, and work with multiple datasets.
Core goals:
Consistent handling of multiple AMOC arrays
Easy integration of new data sources
High code quality and reproducibility
3. Project Structure
amocarray/
├── amocarray/                 # Core modules (readers, utilities, standardisation)
│   ├── readers.py             # High-level interface for loading datasets
│   ├── read_move.py           # Reader for MOVE data
│   ├── read_rapid.py          # Reader for RAPID data
│   ├── read_osnap.py          # Reader for OSNAP data
│   ├── read_samba.py          # Reader for SAMBA data
│   ├── utilities.py           # Helper functions (file handling, downloads, etc.)
│   ├── tools.py               # Unit conversions and data cleaning
│   ├── standardise.py         # Functions for dataset standardisation
│   ├── plotters.py            # Plotting utilities
│   ├── writers.py             # Data writing utilities
│   └── logger.py              # Project-wide structured logging
├── tests/                     # Unit tests
├── data/                      # Local data storage (downloads etc.)
├── docs/                      # Documentation sources (built with Sphinx)
├── notebooks/                 # Jupyter notebooks for exploration and demos
├── .github/                   # GitHub workflows and actions
├── pyproject.toml             # Project metadata and build system config
├── CITATION.cff               # Citation file for this project
├── CONTRIBUTING.md            # Contribution guidelines
├── README.md                  # Project overview and installation instructions
├── requirements.txt           # Runtime dependencies
├── requirements-dev.txt       # Development dependencies
└── .pre-commit-config.yaml    # Pre-commit hooks configuration
Project Management and Configuration Files
pyproject.toml: Project metadata and build system configuration.
CITATION.cff: Citation information for the project.
CONTRIBUTING.md: Guidelines for contributors.
README.md: Project overview, installation, and usage instructions.
.pre-commit-config.yaml: Pre-commit hook configurations.
Core Modules
readers.py: Clean, high-level interface for loading datasets.
read_move.py, read_rapid.py, etc.: Specific reader modules for MOVE, RAPID, OSNAP, SAMBA.
utilities.py: Shared helper functions.
tools.py: Unit conversions and data cleaning.
standardise.py: Dataset standardisation functions.
logger.py: Project-wide structured logging.
plotters.py: Plotting utilities.
writers.py: Data writing utilities.
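To illustrate how a high-level interface such as readers.py can sit on top of array-specific readers, here is a minimal dispatch-registry sketch. It is not the actual amocarray API: the names READERS and load_dataset, and the stand-in read_move/read_rapid functions that return plain dicts instead of real datasets, are hypothetical.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the array-specific reader modules
# (read_move.py, read_rapid.py, ...); real readers would return datasets.
def read_move(source: str) -> dict:
    return {"array": "MOVE", "source": source}


def read_rapid(source: str) -> dict:
    return {"array": "RAPID", "source": source}


# Registry mapping lower-case array names to reader functions.
# Adding a new data source means writing one reader and registering it here.
READERS: Dict[str, Callable[[str], dict]] = {
    "move": read_move,
    "rapid": read_rapid,
}


def load_dataset(array_name: str, source: str) -> dict:
    """Dispatch to the reader registered for array_name (case-insensitive)."""
    try:
        reader = READERS[array_name.lower()]
    except KeyError:
        known = ", ".join(sorted(READERS))
        raise ValueError(
            f"Unknown array {array_name!r}; expected one of: {known}"
        ) from None
    return reader(source)
```

Under this assumed design, supporting another array (say OSNAP or SAMBA) means writing one reader function and adding one registry entry, which fits the project goal of easy integration of new data sources.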
4. Setting Up Development Environment
Step 1: Clone the repository
git clone https://github.com/AMOCcommunity/amocarray.git
cd amocarray
Step 2: Set up a virtual environment
In a terminal window, at the root of the repository (next to the LICENSE file), run:
python3 -m venv venv
source venv/bin/activate && micromamba deactivate
Note the addition to the source venv/bin/activate line: the && micromamba deactivate part is a safeguard in case you sometimes use micromamba. It ensures that you've deactivated any micromamba environments in this terminal.
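To confirm that the activation worked (and that no micromamba environment is still active), a small Python check like the sketch below can help. The function name environment_report is hypothetical. The sketch relies on two standard facts: inside a venv, sys.prefix differs from sys.base_prefix, and venv's activate script sets VIRTUAL_ENV. It also assumes that micromamba, like conda, exports CONDA_PREFIX while an environment is active.

```python
import os
import sys


def environment_report() -> dict:
    """Summarise which Python environment is active (sketch)."""
    return {
        # True inside a venv: sys.prefix points at the venv,
        # sys.base_prefix at the interpreter it was created from
        "in_venv": sys.prefix != sys.base_prefix,
        # Path of the interpreter actually running this code
        "python": sys.executable,
        # Set by venv's activate script
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
        # Assumption: conda/micromamba export CONDA_PREFIX while an env is active
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:>13}: {value}")
```

Saved as, for example, check_env.py (a hypothetical filename) and run with python check_env.py, an activated venv with micromamba deactivated should report in_venv: True, a python path inside ./venv, and conda_prefix: None. The same check is handy when VSCode does not seem to pick up the virtual environment.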
Step 3: Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
If you have added or changed dependencies and want to make sure you have a clean install, you can run:
pip install -r requirements-dev.txt --force-reinstall
This reinstalls every listed package, even those already up to date, at the newest versions the requirements file allows.
Step 4: (Optional) Install pre-commit hooks manually
We recommend running the pre-commit hooks, which fix formatting and run tests, before making a pull request (or even before committing). They help you catch problems that you would otherwise only discover when the GitHub Actions workflows run the tests on your PR.
You can run pre-commit manually:
pre-commit run --all-files
Advanced (optional): to install the hooks so that they run automatically on every git commit, run:
pre-commit install
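If you prefer one command that runs the tests and the pre-commit hooks in the same order as the quickstart, a small Python script can chain them and stop at the first failure. This is a hypothetical helper (run_checks is not part of amocarray); it only assumes that pytest and pre-commit are installed in the active environment.

```python
import subprocess
import sys
from typing import Sequence

# Default checks, in quickstart order: tests first, then all pre-commit hooks
DEFAULT_CHECKS = (
    ("pytest",),
    ("pre-commit", "run", "--all-files"),
)


def run_checks(commands: Sequence[Sequence[str]] = DEFAULT_CHECKS) -> int:
    """Run each command in turn; return the first non-zero exit code, else 0."""
    for cmd in commands:
        print("Running:", " ".join(cmd), flush=True)
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_checks())
```

The exit code of the failing tool is passed through, so the script fails in the same way the underlying command would.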
Step 5: Build the documentation (optional)
cd docs
make html
5. Development Workflow
Branching Model
Work on feature branches from main. There is no enforced naming convention, but we commonly use Eleanor-patch-X, where X increments for each patch.
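If you follow the <name>-patch-X pattern, the next branch name can be derived from your existing branches. The helper below is a hypothetical sketch, not project tooling; it expects plain local branch names, such as those printed by git branch --format="%(refname:short)".

```python
import re
from typing import Iterable


def next_patch_branch(author: str, existing: Iterable[str]) -> str:
    """Return '<author>-patch-<N+1>', where N is the highest existing patch number."""
    # Only exact '<author>-patch-<digits>' names count
    pattern = re.compile(rf"^{re.escape(author)}-patch-(\d+)$")
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := pattern.match(name))
    ]
    # Start at 1 when no matching branch exists yet
    return f"{author}-patch-{max(numbers, default=0) + 1}"
```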
Fork & Pull Requests
Fork the repository on GitHub.
Push your changes to your fork.
Open a pull request to AMOCcommunity/amocarray.
Keeping Your Fork Up To Date
git remote add upstream https://github.com/AMOCcommunity/amocarray.git
git fetch upstream
git merge upstream/main
6. Ignoring Local Files: .gitignore vs .git/info/exclude
When working with local files that should not be tracked by Git, you have two main options:
.gitignore
Lives in the root of the repository.
Changes are shared with all contributors.
Best for files or patterns that should be ignored project-wide (e.g., temporary build files, virtual environments).
Example entries:
__pycache__/
venv/
data/
.git/info/exclude
Personal, local ignores specific to your environment.
Behaves like .gitignore but is never committed.
Use it for local files you want to ignore without affecting the shared project settings.
Example usage:
my_temp_outputs/
notes.txt
You can edit .git/info/exclude manually at any time.
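Because .git/info/exclude is an ordinary text file, adding a personal ignore pattern can also be scripted. The sketch below uses a hypothetical helper, add_local_exclude, that appends a pattern only if it is not already listed. It assumes repo_root is the top of a normal clone where .git is a directory (in linked worktrees and submodules .git is a file instead).

```python
from pathlib import Path


def add_local_exclude(repo_root: str, pattern: str) -> bool:
    """Append pattern to .git/info/exclude unless already present.

    Returns True if the file was changed.
    """
    exclude = Path(repo_root) / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    text = exclude.read_text(encoding="utf-8") if exclude.exists() else ""
    if pattern in text.splitlines():
        return False  # already ignored locally
    # Ensure the new pattern starts on its own line
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with exclude.open("a", encoding="utf-8") as fh:
        fh.write(prefix + pattern + "\n")
    return True
```

For example, add_local_exclude(".", "notes.txt") run from the repository root adds the notes.txt entry shown above.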
Best Practice
Use .gitignore for project-wide ignores.
Use .git/info/exclude for personal, local excludes: there is no risk of accidentally committing changes to shared ignore patterns!
7. Commit Message Style Guide
We use clear, consistent commit messages to make our history readable and to support changelog automation in the future.
Format
[type] short description of the change
Use lowercase for the description (except proper nouns).
Keep it concise but descriptive (ideally under 72 characters).
Use the imperative mood: "fix bug", not "fixed bug" or "fixes bug".
Types
Tag | Purpose
---|---
feat | New feature
fix | Bug fix
docs | Documentation only changes
style | Code style changes (formatting, no logic)
refactor | Code improvements without behavior change
test | Adding or improving tests
ci | Changes to CI/CD pipelines
chore | Maintenance or auxiliary tooling changes
cleanup | Removing old code or housekeeping
Examples
fix osnap reader dimension handling
feat add metadata support for samba reader
docs update README with installation steps
test add coverage for utilities module
ci add pre-commit config for linting
cleanup remove deprecated functions from tools.py
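The format rules above are simple enough to check mechanically, for example from a local commit-msg hook. The sketch below is hypothetical and not part of the project's tooling. Its tag set is an assumption: feat, fix, docs, test, ci and cleanup come from the examples, while style, refactor and chore are conventional names assumed for the remaining rows of the table. It checks the tag, the presence of a description and the 72-character limit, but not lowercase or imperative mood, which need human judgement (proper nouns, for instance, are allowed).

```python
from typing import List

# Assumed tag set (see lead-in); adjust to the project's actual list
ALLOWED_TAGS = {
    "feat", "fix", "docs", "test", "ci", "cleanup",
    "style", "refactor", "chore",
}
MAX_LENGTH = 72


def check_commit_message(message: str) -> List[str]:
    """Return problems found in the first line of a commit message (empty if OK)."""
    problems = []
    subject = message.splitlines()[0] if message else ""
    # Expected shape: '<tag> <short description>'
    tag, _, description = subject.partition(" ")
    if tag not in ALLOWED_TAGS:
        problems.append(f"unknown tag {tag!r}")
    if not description.strip():
        problems.append("missing description")
    if len(subject) > MAX_LENGTH:
        problems.append(f"subject is {len(subject)} characters (limit {MAX_LENGTH})")
    return problems
```

For instance, check_commit_message("fix osnap reader dimension handling") returns an empty list, while "fixed stuff" is reported as an unknown tag.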
Why this matters
Easier to read history
Easier changelog generation (future automation-ready!)
Helps reviewers quickly understand the purpose of commits
When in doubt, keep your commits small and focused!
8. Logging and Debugging
PR #25 introduced structured logging to amocarray.
Logs track steps during data reading and, in the future, will also report changes during dataset standardisation.
How logging works
Logging is handled in logger.py using:
setup_logger(array_name, output_dir="logs")
Creates a log file per array (MOVE, RAPID, OSNAP, etc.).
Timestamped as: YYYYMMDDTHH
Currently appends the string "read"; this may evolve to include other processes such as standardisation.
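A minimal sketch of what a per-array, timestamped file logger along these lines could look like, using only the standard library. It is not the actual setup_logger implementation: the function name setup_array_logger, the logger name and the exact filename layout (<ARRAY>_<YYYYMMDDTHH>_read.log) are assumptions.

```python
import logging
import os
from datetime import datetime


def setup_array_logger(array_name: str, output_dir: str = "logs") -> logging.Logger:
    """Hypothetical sketch: one timestamped log file per array."""
    os.makedirs(output_dir, exist_ok=True)
    # Timestamp in the YYYYMMDDTHH style, e.g. 20250101T09
    timestamp = datetime.now().strftime("%Y%m%dT%H")
    # Assumed filename layout: <ARRAY>_<timestamp>_read.log
    filename = os.path.join(output_dir, f"{array_name}_{timestamp}_read.log")

    log = logging.getLogger(f"amocarray_sketch.{array_name}")
    log.setLevel(logging.DEBUG)  # capture everything; handlers filter later
    # Drop handlers from earlier calls so lines are not written twice
    for handler in list(log.handlers):
        log.removeHandler(handler)
        handler.close()
    file_handler = logging.FileHandler(filename, encoding="utf-8")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    log.addHandler(file_handler)
    return log
```

Removing existing handlers before adding a new one avoids duplicated lines when the function is called more than once in the same session (for example, in a notebook).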
Enabling and disabling logging
Logging is controlled by the global variable LOGGING_ENABLED in logger.py.
You can toggle logging dynamically:
from amocarray import logger
logger.enable_logging()
logger.disable_logging()
Writing logs in modules
We wrap standard Python logging calls to allow toggling:
from amocarray.logger import log_info, log_warning, log_error, log_debug
Then, in your code:
log_info("Dataset successfully loaded.")
log_warning("Missing metadata detected.")
log_error("File not found.")
log_debug("Variable dimensions: %s", dims)
Note: This departs from the typical import (from amocarray import logger) to keep calls clean and familiar: log_info(...) rather than logger.log.info(...).
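The wrapper approach described above could be implemented roughly as follows. This is a sketch under assumptions, not the contents of logger.py: the names LOGGING_ENABLED, enable_logging, disable_logging and log_info/log_warning/log_error/log_debug come from this guide, but their bodies, the helper _emit and the logger name are hypothetical.

```python
import logging

# Module-level switch mirroring the documented LOGGING_ENABLED flag
LOGGING_ENABLED = True

log = logging.getLogger("amocarray_toggle_sketch")
log.setLevel(logging.DEBUG)  # capture everything; handlers filter later


def enable_logging() -> None:
    global LOGGING_ENABLED
    LOGGING_ENABLED = True


def disable_logging() -> None:
    global LOGGING_ENABLED
    LOGGING_ENABLED = False


def _emit(level: int, msg: str, *args, **kwargs) -> None:
    # The flag is looked up at call time, not import time
    if LOGGING_ENABLED:
        log.log(level, msg, *args, **kwargs)


def log_debug(msg: str, *args, **kwargs) -> None:
    _emit(logging.DEBUG, msg, *args, **kwargs)


def log_info(msg: str, *args, **kwargs) -> None:
    _emit(logging.INFO, msg, *args, **kwargs)


def log_warning(msg: str, *args, **kwargs) -> None:
    _emit(logging.WARNING, msg, *args, **kwargs)


def log_error(msg: str, *args, **kwargs) -> None:
    _emit(logging.ERROR, msg, *args, **kwargs)
```

In this sketch the wrappers read LOGGING_ENABLED when they are called, so a module that imported log_info by name still respects a later disable_logging() call. Importing the flag itself by name (from ... import LOGGING_ENABLED) would instead copy its value at import time and miss later toggles.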
Log levels
We use standard Python logging levels; the most common are:
log_error: Critical failures or exceptions.
log_warning: Potential issues that do not stop execution.
log_info: Useful process steps and confirmations.
log_debug: Detailed diagnostic output.
All levels are currently captured:
log.setLevel(logging.DEBUG) # capture everything; handlers filter later
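The comment "handlers filter later" refers to standard Python logging behaviour: a record must pass the logger's level and then each handler's own level. The sketch below uses hypothetical names, with in-memory handlers standing in for a log file and the console, to show a logger at DEBUG feeding one handler that keeps everything and one that keeps only warnings and above.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory (stand-in for a file or console handler)."""

    def __init__(self, level: int):
        super().__init__(level)
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


log = logging.getLogger("amocarray_levels_sketch")
log.setLevel(logging.DEBUG)  # capture everything; handlers filter later
log.propagate = False  # keep the demo out of the root logger

detailed = ListHandler(logging.DEBUG)   # e.g. a per-array log file
concise = ListHandler(logging.WARNING)  # e.g. the console
log.addHandler(detailed)
log.addHandler(concise)

log.debug("Variable dimensions: %s", (10, 4))
log.info("Dataset successfully loaded.")
log.warning("Missing metadata detected.")
```

After these three calls, detailed holds all three messages while concise holds only "WARNING: Missing metadata detected.".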
Best practices
Use logging to track important steps in your code.
Log warnings for unusual but non-breaking behaviour.
Use log_debug for rich details useful in debugging.
Avoid excessive logging inside tight loops.
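As an illustration of the last point, the sketch below counts problems inside a loop and logs a single summary afterwards, instead of emitting one warning per value. The function count_missing and the logger name are hypothetical.

```python
import logging
import math
from typing import Sequence

log = logging.getLogger("amocarray_loop_sketch")


def count_missing(values: Sequence[float]) -> int:
    """Count NaN values and log one summary line instead of one per value."""
    n_missing = 0
    for v in values:
        if math.isnan(v):
            n_missing += 1  # no logging inside the loop
    if n_missing:
        log.warning("Found %d missing values out of %d.", n_missing, len(values))
    return n_missing
```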
As amocarray expands, logs will play an increasing role in transparency and reproducibility.
9. Troubleshooting
Pre-commit not running?
Run manually:
pre-commit run --all-files
VSCode virtualenv not recognised?
Ensure the VSCode Python interpreter is set to ./venv/bin/python.
Tests failing due to missing data?
Check that your data directory is correctly set.
Pre-commit pytest hook fails but pytest passes manually?
Ensure your virtual environment is activated in your VSCode terminal settings.
My commits are blocked by pre-commit errors?
Fix all reported issues (linting, formatting, etc.) then try committing again.
10. Further Resources
This developer guide was prepared based on interactions with callumGPT and ChatGPT, which helped to structure and clarify it.