leebardon/roadmap

 
 


Pangeo Forge public roadmap

In this repository, you can find the Pangeo Forge project roadmap. The roadmap is where you can learn about the Pangeo Forge project, its subprojects (e.g. pangeo-smithy) and how they fit together, and the road ahead. Pangeo Forge is just getting started, so please open issues to ask questions or to propose changes or additions to the roadmap itself.

Background

The idea of Pangeo Forge is to copy the very successful pattern of Conda Forge for crowdsourcing the curation of an analysis-ready data library. In Conda Forge, a maintainer contributes a recipe which is used to generate a conda package from a source code tarball. Behind the scenes, CI downloads the source code, builds the package, and uploads it to a repository. In Pangeo Forge, a maintainer contributes a recipe which is used to generate an analysis-ready, cloud-based copy of a dataset in a cloud-optimized format like Zarr. Behind the scenes, CI downloads the original files from their source (e.g. FTP, HTTP, or OPeNDAP), combines them using xarray, writes out the Zarr store, and uploads it to cloud storage. Pangeo Forge has grown out of the Pangeo Project, an open source community promoting open, reproducible, and scalable science.

Subprojects

Pangeo Forge brings together a number of smaller subprojects to enable the automatic production and publication of cloud-optimized datasets. Those subprojects are described briefly below:

pangeo-forge

Pangeo-forge provides a central workflow manager and API for the production of cloud-optimized datasets. It is being designed to include a high-level Pipeline API (built on top of Prefect) that will be useful both inside and outside of the pangeo-forge infrastructure. Read about the pangeo-forge roadmap here.

pangeo-smithy

Pangeo-smithy is a tool for managing pangeo-forge feedstocks. It combines pangeo-forge recipes with Continuous Integration and Continuous Deployment (CI/CD) services. Read about the pangeo-smithy roadmap here.

staged-recipes

Staged-recipes is a GitHub repository that manages the submission of new pangeo-forge recipes. You can think of this as a holding area for new feedstocks. Read about the staged-recipes roadmap here.

Contributing

Pangeo-forge is just getting started. There's lots of work to do and lots of room for contributors to engage. Here are a few ways you may consider getting involved:

  1. Document an example recipe.
  2. Contribute to any of the subprojects above. At the time of writing (8/11/2020), the pangeo-forge API is the most active area of development.
  3. Comment on the project roadmap in this repository.

Definitions

  • Pipeline: A Python object that defines the steps to acquire, convert, and publish a dataset.
  • Feedstock: A GitHub repository in the pangeo-forge GitHub organization that is managed by pangeo-smithy.
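
As an illustration of the Pipeline definition above, the following sketch shows one way a Python object could hold acquire, convert, and publish steps and run them in order. It is a hypothetical stand-in written with the standard library only; it does not use Prefect and is not the pangeo-forge Pipeline API. The class, the `step` decorator, the example URLs, and the `published` dictionary (standing in for cloud storage) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Step = Callable[[Any], Any]


@dataclass
class Pipeline:
    """Hypothetical pipeline: named steps run in order, each fed the previous result."""

    name: str
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def step(self, label: str) -> Callable[[Step], Step]:
        """Decorator that registers a function as the next step."""

        def register(func: Step) -> Step:
            self.steps.append((label, func))
            return func

        return register

    def run(self, source: Any) -> Any:
        """Pass `source` through every registered step in registration order."""
        value = source
        for _label, func in self.steps:
            value = func(value)
        return value


published = {}  # stands in for cloud storage in this sketch
pipeline = Pipeline(name="example-dataset")


@pipeline.step("acquire")
def acquire(urls):
    # A real step would download each file; here we only fabricate placeholders.
    return [f"contents of {url}" for url in urls]


@pipeline.step("convert")
def convert(files):
    # A real step would combine the files and write a cloud-optimized format.
    return {"n_files": len(files), "data": files}


@pipeline.step("publish")
def publish(dataset):
    # A real step would upload to cloud storage; here we store in a dict.
    published[pipeline.name] = dataset
    return pipeline.name


key = pipeline.run(["ftp://example.org/a.nc", "ftp://example.org/b.nc"])
print(key, published[key]["n_files"])
```

Keeping each step a plain function makes it easy to test in isolation, which is one reason a workflow engine could later wrap the same functions as tasks.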

This work is licensed under CC BY 4.0.
