In this repository, you can find the Pangeo Forge project roadmap. The roadmap is where you can learn about the Pangeo Forge project, its subprojects (e.g. pangeo-smithy), how they fit together, and the road ahead. Pangeo Forge is just getting started, so please open issues to ask questions or to propose changes or additions to the roadmap itself.
The idea of Pangeo Forge is to copy the very successful pattern of Conda Forge for crowdsourcing the curation of an analysis-ready data library. In Conda Forge, a maintainer contributes a recipe, which is used to generate a conda package from a source code tarball. Behind the scenes, CI downloads the source code, builds the package, and uploads it to a repository. In Pangeo Forge, a maintainer contributes a recipe, which is used to generate an analysis-ready, cloud-based copy of a dataset in a cloud-optimized format like Zarr. Behind the scenes, CI downloads the original files from their source (e.g. FTP, HTTP, or OPeNDAP), combines them using xarray, writes out the Zarr store, and uploads it to cloud storage. Pangeo Forge has grown out of the Pangeo Project, an open source community promoting open, reproducible, and scalable science.
Pangeo Forge brings together a number of smaller subprojects to enable the automatic production and publication of cloud-optimized datasets. Those subprojects are described briefly below:
Pangeo-forge provides a central workflow manager and API for the production of cloud-optimized datasets. It is being designed to include a high-level Pipeline API (built on top of Prefect) that will be useful both inside and outside of pangeo-forge infrastructure. Read about the pangeo-forge roadmap here.
Pangeo-smithy is a tool for managing pangeo-forge feedstocks. It combines pangeo-forge recipes with Continuous Integration and Continuous Deployment (CI/CD) services. Read about the pangeo-smithy roadmap here.
Staged-recipes is a GitHub repository that manages the submission of new pangeo-forge recipes. You can think of this as a holding area for new feedstocks. Read about the staged-recipes roadmap here.
Pangeo-forge is just getting started. There's lots of work to do and lots of room for contributors to engage. Here are a few ways you may consider getting involved:
- Document an example recipe
- Contribute to any of the subprojects above. At the time of writing (8/11/2020), the pangeo-forge API is the most active area of development.
- Comment on the project roadmap in this repository.
- Pipeline: A Python object that defines the steps to acquire, convert, and publish a dataset.
- Feedstock: A GitHub repository in the pangeo-forge GitHub organization that is managed by pangeo-smithy.
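To make the Pipeline definition more concrete, here is a minimal pure-Python sketch of an object that chains the acquire, convert, and publish steps, each step receiving the previous step's output. The `Pipeline` class and the `acquire`, `convert`, and `publish` functions are hypothetical. They are not the pangeo-forge API, which is being built on top of Prefect. The steps are placeholders that pass strings and dictionaries along instead of downloading files, combining them with xarray, and uploading a Zarr store.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

# Hypothetical sketch: these names are illustrative only and are NOT the pangeo-forge API.
Step = Callable[[Any], Any]


@dataclass
class Pipeline:
    """An ordered list of steps; each step receives the previous step's output."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, source: Any) -> Any:
        result = source
        for step in self.steps:
            result = step(result)
        return result


def acquire(urls):
    # Placeholder: a real pipeline would download each file (e.g. over FTP, HTTP, or OPeNDAP).
    return [f"local copy of {u}" for u in urls]


def convert(files):
    # Placeholder: a real pipeline would combine the files with xarray and write a Zarr store.
    return {"format": "zarr", "inputs": files}


def publish(store):
    # Placeholder: a real pipeline would upload the Zarr store to cloud storage.
    return {**store, "published": True}


pipeline = Pipeline("example-dataset", [acquire, convert, publish])
result = pipeline.run(["https://example.com/a.nc", "https://example.com/b.nc"])
print(result["format"], result["published"], len(result["inputs"]))
```

Keeping each step a plain function with one input and one output is just one way to express "a sequence of steps". A workflow engine such as Prefect would add scheduling, retries, and parallel execution on top of the same idea.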
This work is licensed under CC BY 4.0.