Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate providers into sub-projects #44511

Closed
potiuk opened this issue Nov 30, 2024 · 4 comments
Closed

Separate providers into sub-projects #44511

potiuk opened this issue Nov 30, 2024 · 4 comments
Assignees

Comments

@potiuk
Copy link
Member

potiuk commented Nov 30, 2024

Each provider should have it's own independed sub-project in our mono-repo.

They should be fully "standalone" so that you can not only develop them completely independently from airflow core, but also that all the dependencies of thoseshould be stored in their own pyproject.toml. And uv workspace feature should be used to bind together all the provider sub-projects so that you can continue the development where you have airflow, task-sdk, providers and all the other sub-projects of Airlflow monorepo together in a single editable environment.

This move has been attempted earlier in #28292 and #28291 - but we did not have a good "workspace" solution and Airflow 2 namespace approach prevented us from making it good environment for provider development. With Airflow 3 and uv workspace feature that has been added - largely with our input so that Airlfow's provider structure could benefit from the uv workspace functionality, it's now entirely possible to do.

This means that dependencies should be moved from providrer.yaml files to pyproject.toml

The ideal setup there is to have this kind of structure (details to be worked out):

providers/
         providers-amazon/
                        src
                        tests-integration
                        tests-system
                        tests
                        docs
                        pyproject.toml
                        ...

Some important properties of the solution:

  • Airflow "core" projects should not rely on providers being installed
  • It should be possible to install all airflow core packages and providers and synchronize/resolves wit `uv sync
  • it should be possible to install provider in --editable mode treating it as separate project from the workspace
  • It should be possible to install provider with GitHub URL
  • docs/ all kinds of tests, images etc. should all work independently (though thanks to monorepo, we can keep the code to run those in breeze as we do currently. Some of the current doc code will need to be moved to breeze as well for that likely
  • we should be able to apply pyproject.toml changes for all providers automatically (might be semi-automated or with pre-commit). Quite often we make "global" changes there that affect all providers - and currently it is done via modifying breeze and templates for dynamically generated pyproject.toml file
  • we need to keep reproducibility of provider packages intact - which likely means that they should be still generated with breeze - with all the "extra" stuff such as making sure we have controlled package build environment.
  • we will need to change building of packages in CI in docker container environment - while currently we use flit as build backend and this comes from generated pyproject.toml that is placed inside breeze, if we keep pyproject.toml files in the repository, incoming PRs from forks might change build backends and thus inject any code in our build process

The most likely way to implement it is to:

  1. manually convert one / few representative but not biggest providers first (POC) - and make a few releases with those - while updating our breeze automation to work in both cases - that will allow to iron out some teething problems
  2. develop automation for converting the providers - similar to Add script to move providers to a new directory structure #28291
  3. perform the test if the uv workspace feature is usable at scale of 100+ projects bound together (and work with uv team to fix it if not)
  4. apply - rather quickly, but incrementally - the automation to all the providers of ours - while letting all the in-progress contributors about the changes upfront and explaing what needs to be done
@SuccessMoses
Copy link
Contributor

sounds like a good idea

@uranusjr
Copy link
Member

uranusjr commented Dec 9, 2024

Sounds good to me to. At this point I wonder if we should also move core Airflow into a subdirectory.

@potiuk
Copy link
Member Author

potiuk commented Dec 9, 2024

Sounds good to me to. At this point I wonder if we should also move core Airflow into a subdirectory.

Absolutely.

potiuk added a commit to potiuk/airflow that referenced this issue Dec 21, 2024
When providers system tests were modified in provider tests were
modified, selective checks had two bugs that compounded led to provider tests
being skipped.

1) Modification of system tests lead to "all providers affected"
   condition (wrongly) - because "TESTS" were checked as root path
   before "SYSTEM TESTS" - and since system tests were subfolder
   of tests, the code that checked if system tests path belongs
   to provider never found it.

2) "All Providers affected" in such case (when it was not caused by
   another condition lead to "skip-provider-tests" set to true due
   to missing "else" in one of the conditions.

This PR fixes both cases and simplifies the conditions so that they are
easier to understand and modify. Those conditions will be further
simplified when we separate providers to separate projects apache#44511
because there we will not have to check separately several sub-folders
of airflow, but for now it's good enough.

Some of those conditions are simplified as well because we are in
Python 3.9+ and "is_relative_to" is now available in Path.
potiuk added a commit to potiuk/airflow that referenced this issue Dec 21, 2024
When providers system tests were modified in provider tests were
modified, selective checks had two bugs that compounded led to provider tests
being skipped.

1) Modification of system tests lead to "all providers affected"
   condition (wrongly) - because "TESTS" were checked as root path
   before "SYSTEM TESTS" - and since system tests were subfolder
   of tests, the code that checked if system tests path belongs
   to provider never found it.

2) "All Providers affected" in such case (when it was not caused by
   another condition lead to "skip-provider-tests" set to true due
   to missing "else" in one of the conditions.

This PR fixes both cases and simplifies the conditions so that they are
easier to understand and modify. Those conditions will be further
simplified when we separate providers to separate projects apache#44511
because there we will not have to check separately several sub-folders
of airflow, but for now it's good enough.

Some of those conditions are simplified as well because we are in
Python 3.9+ and "is_relative_to" is now available in Path.
potiuk added a commit to potiuk/airflow that referenced this issue Dec 21, 2024
When providers system tests were modified in provider tests were
modified, selective checks had two bugs that compounded led to provider tests
being skipped.

1) Modification of system tests lead to "all providers affected"
   condition (wrongly) - because "TESTS" were checked as root path
   before "SYSTEM TESTS" - and since system tests were subfolder
   of tests, the code that checked if system tests path belongs
   to provider never found it.

2) "All Providers affected" in such case (when it was not caused by
   another condition lead to "skip-provider-tests" set to true due
   to missing "else" in one of the conditions.

This PR fixes both cases and simplifies the conditions so that they are
easier to understand and modify. Those conditions will be further
simplified when we separate providers to separate projects apache#44511
because there we will not have to check separately several sub-folders
of airflow, but for now it's good enough.

Some of those conditions are simplified as well because we are in
Python 3.9+ and "is_relative_to" is now available in Path.
potiuk added a commit that referenced this issue Dec 21, 2024
When providers system tests were modified in provider tests were
modified, selective checks had two bugs that compounded led to provider tests
being skipped.

1) Modification of system tests lead to "all providers affected"
   condition (wrongly) - because "TESTS" were checked as root path
   before "SYSTEM TESTS" - and since system tests were subfolder
   of tests, the code that checked if system tests path belongs
   to provider never found it.

2) "All Providers affected" in such case (when it was not caused by
   another condition lead to "skip-provider-tests" set to true due
   to missing "else" in one of the conditions.

This PR fixes both cases and simplifies the conditions so that they are
easier to understand and modify. Those conditions will be further
simplified when we separate providers to separate projects #44511
because there we will not have to check separately several sub-folders
of airflow, but for now it's good enough.

Some of those conditions are simplified as well because we are in
Python 3.9+ and "is_relative_to" is now available in Path.
@potiuk potiuk moved this from Ready to In progress in CI / DEV ENV planned work Dec 24, 2024
@potiuk potiuk self-assigned this Dec 24, 2024
LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this issue Jan 5, 2025
When providers system tests were modified in provider tests were
modified, selective checks had two bugs that compounded led to provider tests
being skipped.

1) Modification of system tests lead to "all providers affected"
   condition (wrongly) - because "TESTS" were checked as root path
   before "SYSTEM TESTS" - and since system tests were subfolder
   of tests, the code that checked if system tests path belongs
   to provider never found it.

2) "All Providers affected" in such case (when it was not caused by
   another condition lead to "skip-provider-tests" set to true due
   to missing "else" in one of the conditions.

This PR fixes both cases and simplifies the conditions so that they are
easier to understand and modify. Those conditions will be further
simplified when we separate providers to separate projects apache#44511
because there we will not have to check separately several sub-folders
of airflow, but for now it's good enough.

Some of those conditions are simplified as well because we are in
Python 3.9+ and "is_relative_to" is now available in Path.
agupta01 pushed a commit to agupta01/airflow that referenced this issue Jan 6, 2025
When providers system tests were modified in provider tests were
modified, selective checks had two bugs that compounded led to provider tests
being skipped.

1) Modification of system tests lead to "all providers affected"
   condition (wrongly) - because "TESTS" were checked as root path
   before "SYSTEM TESTS" - and since system tests were subfolder
   of tests, the code that checked if system tests path belongs
   to provider never found it.

2) "All Providers affected" in such case (when it was not caused by
   another condition lead to "skip-provider-tests" set to true due
   to missing "else" in one of the conditions.

This PR fixes both cases and simplifies the conditions so that they are
easier to understand and modify. Those conditions will be further
simplified when we separate providers to separate projects apache#44511
because there we will not have to check separately several sub-folders
of airflow, but for now it's good enough.

Some of those conditions are simplified as well because we are in
Python 3.9+ and "is_relative_to" is now available in Path.
got686-yandex pushed a commit to got686-yandex/airflow that referenced this issue Jan 30, 2025
When providers system tests were modified in provider tests were
modified, selective checks had two bugs that compounded led to provider tests
being skipped.

1) Modification of system tests lead to "all providers affected"
   condition (wrongly) - because "TESTS" were checked as root path
   before "SYSTEM TESTS" - and since system tests were subfolder
   of tests, the code that checked if system tests path belongs
   to provider never found it.

2) "All Providers affected" in such case (when it was not caused by
   another condition lead to "skip-provider-tests" set to true due
   to missing "else" in one of the conditions.

This PR fixes both cases and simplifies the conditions so that they are
easier to understand and modify. Those conditions will be further
simplified when we separate providers to separate projects apache#44511
because there we will not have to check separately several sub-folders
of airflow, but for now it's good enough.

Some of those conditions are simplified as well because we are in
Python 3.9+ and "is_relative_to" is now available in Path.
@potiuk potiuk closed this as completed Feb 18, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in CI / DEV ENV planned work Feb 18, 2025
@potiuk
Copy link
Member Author

potiuk commented Feb 18, 2025

This is officially done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

3 participants