
[Feature Request] Move carbon cost calculations to backend #17044

Open
nuwang opened this issue Nov 17, 2023 · 19 comments

@nuwang (Member) commented Nov 17, 2023

The carbon cost calculations are very nicely done. Recently, I was trying to calculate the carbon cost of some jobs and realized that the calculations are done on the front end. I managed to extract the code with some effort, but I think it would be really great to move this to the backend, and also to include a gxadmin query that can be used to determine the carbon cost of a job.

Feature request

  • Calculate the carbon/energy costs whenever a job is completed, maybe in a separate table?
  • Enable carbon/energy cost per job to be interrogated from the server
  • Add a gxadmin query for fetching job carbon/energy costs as well as aggregate carbon/energy costs
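
To make the gxadmin idea concrete, here is a rough Python sketch of the aggregation such a query could feed. The row shape `(job_id, runtime_seconds, cores, memory_mb)` and the power, memory, and PUE parameters are illustrative assumptions, not Galaxy's actual schema or any agreed defaults:

```python
def total_energy_kwh(rows, tdp_per_core_w=12.0, mem_w_per_gb=0.375, pue=1.67):
    """Aggregate energy use (kWh) over per-job metric rows.

    `rows` are hypothetical (job_id, runtime_seconds, cores, memory_mb)
    tuples, as a gxadmin-style query might return them.
    """
    total = 0.0
    for _job_id, runtime_s, cores, memory_mb in rows:
        # Power draw: per-core TDP share plus a per-GB memory estimate.
        power_w = cores * tdp_per_core_w + (memory_mb / 1024) * mem_w_per_gb
        # Energy = power x time, scaled by data-centre overhead (PUE).
        total += (runtime_s / 3600) * power_w * pue / 1000
    return total
```

A real gxadmin query would pull these columns from the job metric tables; the point is only that per-job rows are enough to derive both per-job and aggregate energy figures.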

Related issues: #15046

@bgruening (Member)

@Renni771 is currently looking into exactly that as part of his thesis :-)

@mvdbeek (Member) commented Nov 17, 2023

I don't think we should persist this on the backend; this is something you can easily compute on demand, and it has nothing to do with Galaxy. If this is part of some client tooling we ship with Galaxy, that is fine, but otherwise we should only record raw data.

@bgruening (Member) commented Nov 17, 2023

We will record raw data and try to persist it in the DB, e.g. overall CPU runtime, overall memory consumption, etc. @Renni771, maybe you can lay out your plan here once you have it, instead of creating a new issue. Thanks.

@Renni771 (Contributor)

@mvdbeek, @bgruening the preliminary plan is as follows:

The thesis' focus is on raising awareness of green computing. The main motivation for moving the carbon emissions estimation logic to the backend is that we are considering storing an "all-time" carbon emissions rating for a user. Additional features, like calculating the carbon emissions of a history or an entire workflow, are also planned. These totals are something we're considering persisting in the DB so they don't always need to be recalculated "from history" on the fly, most particularly in the case of workflows.

Constantly calculating emission values on demand isn't an issue for single jobs or histories until we get to the point where we consider emissions for entire workflows. That relies on runtime metrics which, as far as I can tell, aren't available until someone has actually run the job at least once, giving us access to data like CPU usage and runtime.

I understand that this logic doesn't necessarily need to live in the backend, since it can be run on demand. In the case of histories, we don't need to move the logic at all: the job metrics on which the emission estimates are based are already on the Galaxy client. This may be a motivation to encapsulate the logic in an API endpoint so that any user can calculate estimates for "anything", really. That would decouple the logic from both the backend and the client, so we wouldn't have to ship the feature as part of Galaxy, and it would solve the issue of users not having access to the logic, which currently lives on the client.
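For concreteness, the kind of per-job estimate under discussion can be sketched as below. This follows a Green Algorithms-style power model; the default TDP, memory power, PUE, and carbon intensity values here are illustrative assumptions, not the constants the Galaxy client actually ships with:

```python
def estimate_co2_g(runtime_h, cores, memory_gb,
                   tdp_per_core_w=12.0,         # assumed per-core TDP
                   mem_w_per_gb=0.375,          # assumed memory power draw
                   pue=1.67,                    # data-centre overhead factor
                   intensity_g_per_kwh=475.0):  # assumed grid carbon intensity
    """Estimate a job's CO2 footprint in grams from its runtime metrics."""
    # Total power draw of the allocated resources.
    power_w = cores * tdp_per_core_w + memory_gb * mem_w_per_gb
    # Convert to energy over the job's runtime, including facility overhead.
    energy_kwh = runtime_h * power_w * pue / 1000
    # Scale energy by the grid's carbon intensity.
    return energy_kwh * intensity_g_per_kwh
```

A pure function like this is cheap enough to run on demand per job; the question in this thread is where the aggregation over many jobs, histories, or workflows should live.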

@mvdbeek (Member) commented Nov 21, 2023

This is a little vague. Do you agree this can be a separate service?

@Renni771 (Contributor)

@mvdbeek Yes, as I understand it, exposing the logic via an API endpoint is, in that sense, providing it as a service. Unless there's a misunderstanding here, what exactly do you mean?

@mvdbeek (Member) commented Nov 21, 2023

I mean this doesn't have to be part of core Galaxy; it can be an external script you run.

@Renni771 (Contributor)

Forgive me, as I'm not familiar with the entire Galaxy architecture, but yes, I agree that we can provide the carbon estimation logic as a service in an external script. Where would this actually live in the codebase?

@mvdbeek (Member) commented Nov 21, 2023

You can create a new project on GitHub; I don't think this needs to live in the Galaxy codebase either.

@bgruening (Member)

As @Renni771 said, it's about creating awareness, so we think this needs to be exposed in a user's account.

Independent of the carbon emissions or carbon cost, we think we need aggregated numbers for CPU hours, memory usage, storage usage ... others? Those metrics are meaningful for users who need to estimate future resources, and also for admins and PIs.
We think those aggregated numbers need to be stored somewhere (in the DB) per user and should not be calculated on demand, as this is expensive.

On top of those aggregated numbers, we can then do some interesting things - carbon costs is just one of them.

@mvdbeek (Member) commented Nov 21, 2023

I am happy if you want to improve the API and services, recording and management of resource data, etc. I don't know that I fundamentally agree with

should not be calculated on-demand, as this is expensive.

and I am skeptical that we want to store aggregate data in Galaxy's database; this doesn't seem like the best fit.

gxadmin query that could be used to figure out the carbon cost of a job.

seems like a good start that can feed any sort of external service.

@nuwang (Member, Author) commented Nov 21, 2023

I'm partial to Marius' suggestion. This could be an "add-on" to Galaxy, which lives in a separate repository like TPV does. The main difference is that it would also have some database tables, which could technically be created in the same database as Galaxy, with its own migration scripts etc., but the code for it doesn't need to live in the same repo.

If it does, is it OK to have referential integrity with the existing Job table, for example? I've not really looked into how that kind of thing is generally modelled in SQLAlchemy. This could also be a nice opportunity to explore these kinds of add-ons in general.

@ElectronicBlueberry (Member)

Regardless of performance, consumption information can only be calculated on demand if jobs are never deleted. Otherwise an aggregate is necessary for the information to be reliable.

@mvdbeek (Member) commented Nov 21, 2023

Jobs are never deleted at this point, and we don't do aggregates at all in the current app, so that's something to figure out if one were to go in that direction.

@davelopez (Contributor)

So, if I understand this correctly, the actual calculation of carbon emissions, AWS costs, or any other processed metric should be an external service (an API outside Galaxy), and what Galaxy provides is just the raw computation data (CPU time, memory, cores, etc.) that it already exposes (or an improved version of it, if that makes sense).

Then on the client, you can have "plugins" that would request the raw computation metrics from Galaxy and pass them to those external services to get a result and show it to the user.

I guess there is also value in not aggregating the raw values in any database, and instead allowing queries over particular date ranges, etc., aggregating them on demand.

@nuwang (Member, Author) commented Nov 21, 2023

Then on the client, you can have "plugins" that would request the raw computation metrics from Galaxy and pass them to those external services to get a result and show it to the user.

I'm not sure I understood this; I thought this could be done in the backend somehow? Having the client deal with it would mean that no other system can query the same information without duplicating effort. Some separate, independent backend could do it, and how that service obtains information from Galaxy and populates its internal data (or not) would be an implementation decision, but I think it should just offer a REST API that anyone can consume.

@davelopez (Contributor) commented Nov 21, 2023

The raw data is what you get from Galaxy in the client, for example, all the metrics of all the jobs a particular user ran in the last month, year, etc.

This raw data is now in the client store. You can then aggregate it and store the aggregation in the store too. That data can be used to query different external services, such as the carbon emissions and AWS cost estimators, to name a few (notice there is no duplication of effort beyond each service consuming the same aggregated data from the store and returning a different result), and the results can then be rendered in the UI.

Just an idea, it might not be the ideal solution.

@Renni771 (Contributor) commented Nov 24, 2023

This raw data is now in the client store. Now you can aggregate it and store the aggregation in the store too. Then you can use this data to query different external services like the carbon emissions and the AWS costs to name some (notice there is no duplication of effort other than each service needs to use the "same aggregated data" from the store and return a different result) and then render in the UI the result.

I understand this idea, since the current carbon emissions implementation does exactly this on the client, just like the AWS estimates.

The general data flow is:

Job metrics plugin --> job metrics store (client) --> carbon emissions component
        ^                                              (CO2 emissions logic lives here)
        |
        v
Job metrics endpoint (backend)

I'm actually not opposed to the idea of querying and processing raw metrics data for histories and workflows from stores, as this is what Galaxy currently does: the metrics data is already on the client in a store. This almost suggests that keeping the logic on the client is the way to go. What do you think, @davelopez, @bgruening?
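As a sketch of the first hop in that flow, assuming the job metrics endpoint returns a list of entries with `name` and `raw_value` keys (both the payload shape and the metric names `runtime_seconds`, `galaxy_slots`, and `galaxy_memory_mb` used below are assumptions, not a guaranteed set), mapping it to the inputs the emissions logic needs might look like:

```python
def extract_emission_inputs(metrics):
    """Map a job-metrics payload to the inputs an emissions estimate needs.

    `metrics` is assumed to be a list of {"name": ..., "raw_value": ...}
    dicts, as a job metrics endpoint might return them.
    """
    # Index metrics by name; raw values arrive as strings.
    by_name = {m["name"]: float(m["raw_value"]) for m in metrics}
    return {
        "runtime_h": by_name.get("runtime_seconds", 0.0) / 3600,
        "cores": int(by_name.get("galaxy_slots", 1)),
        "memory_gb": by_name.get("galaxy_memory_mb", 0.0) / 1024,
    }
```

Whether this mapping runs in the client store, in a backend endpoint, or in an external service is exactly the question being discussed; the transformation itself is the same in all three places.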

Then on the client, you can have "plugins" that would request the raw computation metrics from Galaxy and pass them to those external services to get a result and show it to the user.

Would we agree that encapsulating the carbon emissions logic in an external service outside of the client (since it has nothing to do with Galaxy core) is the preferred alternative? If so, what benefit does this give us, besides being able to run carbon emission estimates outside of client environments?

@davelopez (Contributor)

This almost suggests that keeping this logic on the client is the way to go.

IMHO for now it's fine to keep the logic in the client, but we should strive to make it an external service if possible. Maybe a tiny FastAPI application that admins can run as a micro-service would be enough?

Would we agree that encapsulating the carbon emissions logic as an external service outside of the client (since it has nothing to do with galaxy core) is an alternative preferred approach? If so, what benefit does this give us, besides being able to run carbon emission estimates outside of client environments?

Exactly those two benefits 😄

  • No code unrelated to Galaxy in Galaxy core, and no added maintenance burden
  • It addresses this feature request by making the carbon emission calculations reusable elsewhere, which can still include a gxadmin query that uses the external service to calculate the costs

Projects: Status Triage/Discuss
Development: No branches or pull requests
6 participants