
Optimize analyze_outcomes.py #8560

Conversation


@lpy4105 lpy4105 commented Nov 23, 2023

Description

Fix: #8423 (Optimize ref-vs-driver in analyze_outcomes.py)

Improve the performance of analyze_outcomes.py.

Tested locally with outcomes.csv: 1m37s vs 0m11s.

PR checklist

  • changelog not required, test script change.
  • backport not required, see comment.
  • tests not required

@lpy4105 lpy4105 added enhancement, needs-review, needs-reviewer, size-s, component-test, priority-high labels Nov 23, 2023

@mpg mpg left a comment


Thank you for the impressive performance improvement! I also really like the readability improvements. I only have a couple of minor points and one suggestion for further improvement.


mpg commented Nov 27, 2023

backport done, or not required

I'm on the fence here. On one hand, 2.28 does not have driver_vs_reference analysis and only has one task, so all the sharing between tasks and the restructuring to simplify driver_vs_reference is irrelevant there. On the other hand, I think we are going to evolve analyze_coverage in 2.28 (see #2691), so perhaps for that work it would be useful if 2.28 and 3.x+ used the same data structure?

@gilles-peskine-arm do you have an opinion here (as you're involved in #2691)?


@gilles-peskine-arm gilles-peskine-arm left a comment


I like the performance improvement, but I find the logic hard to follow. This is partly preexisting, but it's getting worse.

On this code in tests/scripts/analyze_outcomes.py:

    setup = ';'.join([platform, config])
    if key not in outcomes:
        outcomes[key] = TestCaseOutcomes()
    (_platform, config, suite, case, result, _cause) = line.split(';')
@gilles-peskine-arm gilles-peskine-arm commented:

config is not unique: when we run the same all.sh component on Linux and FreeBSD, the entries only differ in the platform column.

I'm not sure if this requires some change to the behavior of the code, or a comment to explain the limitations that this implies.

If a test case passes on Linux and fails on FreeBSD, it'll end up in both the successes set and the failures set. How does this affect the rest of the script?

@lpy4105 lpy4105 (author) replied:

config is not unique

Yes, but I think currently we don't have a task that cares about platform. That's why I don't use the platform field at present.

How does this affect the rest of the script?

analyze_coverage only cares about the presence of test cases; it doesn't care whether a test case is in successes or failures or both.
analyze_driver_vs_reference could be affected if the components were run on multiple platforms, but that is not the case in the current CI?

@mpg mpg replied:

I agree that currently we don't have any tasks that care about platform, or for which it would be a problem to have a case that's both in successes and failures.

However, before reading Gilles's comment, I had not realised that config is not unique and that this could result in a test case being in both the successes and failures sets, so I agree that this deserves at least a comment.

@mpg mpg commented:

Btw, not directly related, but this made me think: what both functions care about is not really the distinction between "success" and "failure", but the distinction between "executed" and "skipped". For analyze_coverage that's pretty clear; for analyze_driver_vs_reference, even though the docstring currently says "passing", what this is really about is that we don't want tests being skipped in the driver component unless we know it's expected and justified.

Generally speaking, failures will be found and reported at earlier stages in the CI, so I think this script is really about detecting unexpected skips, not about failures.

@gilles-peskine-arm gilles-peskine-arm replied:

what both functions care about is not really the distinction between "success" and "failure", but the distinction between "executed" and "skipped"

I agree. But I consider this optional here.

When I wrote the original script, I recorded the pass/fail information because it was very easy and I didn't know how the script would evolve beyond the initial feature of test case coverage. After a few years, it turns out we don't need the pass/fail information, but it doesn't really hurt to keep it (it isn't really increasing the complexity).

@mpg mpg commented Nov 29, 2023

I agree it's probably out of scope here - and this PR already went quite beyond its original scope, so we don't want to keep growing it. But I think this would make things simpler: if we only care about the distinction between executed and skipped, then we only need to record one of those sets (because a test is skipped iff it isn't executed), and ComponentOutcomes can go from a namedtuple of two sets to just a set - probably with a new name then.

Let's not do it here as there's more value in getting this PR merged quickly. But if you agree with the general approach, I'll create a follow-up task about it.

@gilles-peskine-arm gilles-peskine-arm added needs-work and removed needs-review, needs-reviewer labels Nov 27, 2023
@gilles-peskine-arm gilles-peskine-arm commented:

Regarding backporting: I don't currently expect that this script will change much in 2.28, other than (hopefully, if we finally manage to finish #2691) switching non-coverage from a warning to an error, and expanding the non-coverage allowlist. So it shouldn't hurt if the data structures to keep track of outcomes are different in 2.28 and 3.x+.

We don't care about the number of hits of the test cases,
so break out of the iteration as soon as a case is hit.

Signed-off-by: Pengyu Lv <[email protected]>
@lpy4105 lpy4105 force-pushed the issue/8423/optimize-analyze_outcomes_py branch from c7c35b4 to 18908ec Compare November 28, 2023 05:04
@lpy4105 lpy4105 added needs-review and removed needs-work labels Nov 28, 2023

@gilles-peskine-arm gilles-peskine-arm left a comment


LGTM apart from a couple of minor points

Also fix a typo in the comments.

Signed-off-by: Pengyu Lv <[email protected]>

@mpg mpg left a comment


Looks pretty good to me. Thanks for improving the script so much - not just the performance, but also making the code clearer and cleaner!

Just one minor thing about the boolean, and one suggestion for slightly more compact code in one place.

Signed-off-by: Pengyu Lv <[email protected]>

@mpg mpg left a comment


LGTM, thanks!


@gilles-peskine-arm gilles-peskine-arm left a comment


LGTM

@gilles-peskine-arm gilles-peskine-arm added approved and removed needs-review labels Nov 29, 2023
@gilles-peskine-arm gilles-peskine-arm added this pull request to the merge queue Nov 29, 2023
Merged via the queue into Mbed-TLS:development with commit 18eab98 Nov 29, 2023