Skip to content

Commit

Permalink
docs: *-regressions.rst: explain how quickly issues should be handled
Browse files Browse the repository at this point in the history
Add a section with a few rules of thumb about how
quickly developers should address regressions to
Documentation/process/handling-regressions.rst; additionally,
add a short paragraph about this to the companion document
Documentation/admin-guide/reporting-regressions.rst as well.

The rules of thumb were written after studying the quotes from Linus
found in handling-regressions.rst and especially influenced by
statements like "Users are literally the _only_ thing that matters" and
"without users, your program is not a program, it's a pointless piece of
code that you might as well throw away". The author interpreted those in
perspective to how the various Linux kernel series are maintained
currently and what those practices might mean for users running into a
regression on a small or big kernel update.

That for example lead to the paragraph starting with "Aim to get fixes
for regressions mainlined within one week after identifying the culprit,
if the regression was introduced in a stable/longterm release or the
devel cycle for the latest mainline release". Some might see this as
pretty high bar, but on the other hand something like that is needed to
not leave users out in the cold for too long -- which can quickly happen
when updating to the latest stable series, as the previous one is
normally stamped "End of Life" about three or four weeks after a new
mainline release. This makes a lot of users switch during this
timeframe. Any of them thus risk running into regressions not promptly
fixed; even worse, once the previous stable series is EOLed for real,
users that face a regression might be left with only three options:

 (1) continue running an outdated and thus potentially insecure kernel
     version from an abandoned stable series

 (2) run the kernel with the regression

 (3) downgrade to an earlier longterm series still supported

This is better avoided, as (1) puts users and their data in danger, (2)
will only be possible if it's a minor regression that doesn't interfere
with booting or serious usage, and (3) might be regression itself or
impossible on the particular machine, as the users might require drivers
or features only introduced after the latest longterm series branched
of.

In the end this lead to the aforementioned "Aim to fix regression within
one week" part. It's also the reason for the "Try to resolve any
regressions introduced in the current development cycle before its
end.".

Signed-off-by: Thorsten Leemhuis <[email protected]>
CC: Linus Torvalds <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Lukas Bulwahn <[email protected]>
Link: https://lore.kernel.org/r/a7b717b52c0d54cdec9b6daf56ed6669feddee2c.1644994117.git.linux@leemhuis.info
Signed-off-by: Jonathan Corbet <[email protected]>
  • Loading branch information
knurd authored and Jonathan Corbet committed Feb 24, 2022
1 parent 1ecf393 commit d2b40ba
Show file tree
Hide file tree
Showing 2 changed files with 99 additions and 0 deletions.
12 changes: 12 additions & 0 deletions Documentation/admin-guide/reporting-regressions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,18 @@ your report on the radar of these people by CCing or forwarding each report to
the regressions mailing list, ideally with a "regzbot command" in your mail to
get it tracked immediately.

How quickly are regressions normally fixed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Developers should fix any reported regression as quickly as possible, to provide
affected users with a solution in a timely manner and prevent more users from
running into the issue; nevertheless developers need to take enough time and
care to ensure regression fixes do not cause additional damage.

The answer thus depends on various factors like the impact of a regression, its
age, or the Linux series in which it occurs. In the end though, most regressions
should be fixed within two weeks.

Is it a regression, if the issue can be avoided by updating some software?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
87 changes: 87 additions & 0 deletions Documentation/process/handling-regressions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,10 @@ The important bits (aka "The TL;DR")
mandated by Documentation/process/submitting-patches.rst and
:ref:`Documentation/process/5.Posting.rst <development_posting>`.

#. Try to fix regressions quickly once the culprit has been identified; fixes
for most regressions should be merged within two weeks, but some need to be
resolved within two or three days.


All the details on Linux kernel regressions relevant for developers
===================================================================
Expand Down Expand Up @@ -125,6 +129,89 @@ tools and scripts used by other kernel developers or Linux distributions; one of
these tools is regzbot, which heavily relies on the "Link:" tags to associate
reports for regression with changes resolving them.

Prioritize work on fixing regressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You should fix any reported regression as quickly as possible, to provide
affected users with a solution in a timely manner and prevent more users from
running into the issue; nevertheless developers need to take enough time and
care to ensure regression fixes do not cause additional damage.

In the end though, developers should give their best to prevent users from
running into situations where a regression leaves them only three options: "run
a kernel with a regression that seriously impacts usage", "continue running an
outdated and thus potentially insecure kernel version for more than two weeks
after a regression's culprit was identified", and "downgrade to a still
supported kernel series that lack required features".

How to realize this depends a lot on the situation. Here are a few rules of
thumb for you, in order or importance:

* Prioritize work on handling regression reports and fixing regression over all
other Linux kernel work, unless the latter concerns acute security issues or
bugs causing data loss or damage.

* Always consider reverting the culprit commits and reapplying them later
together with necessary fixes, as this might be the least dangerous and
quickest way to fix a regression.

* Developers should handle regressions in all supported kernel series, but are
free to delegate the work to the stable team, if the issue probably at no
point in time occurred with mainline.

* Try to resolve any regressions introduced in the current development before
its end. If you fear a fix might be too risky to apply only days before a new
mainline release, let Linus decide: submit the fix separately to him as soon
as possible with the explanation of the situation. He then can make a call
and postpone the release if necessary, for example if multiple such changes
show up in his inbox.

* Address regressions in stable, longterm, or proper mainline releases with
more urgency than regressions in mainline pre-releases. That changes after
the release of the fifth pre-release, aka "-rc5": mainline then becomes as
important, to ensure all the improvements and fixes are ideally tested
together for at least one week before Linus releases a new mainline version.

* Fix regressions within two or three days, if they are critical for some
reason -- for example, if the issue is likely to affect many users of the
kernel series in question on all or certain architectures. Note, this
includes mainline, as issues like compile errors otherwise might prevent many
testers or continuous integration systems from testing the series.

* Aim to fix regressions within one week after the culprit was identified, if
the issue was introduced in either:

* a recent stable/longterm release

* the development cycle of the latest proper mainline release

In the latter case (say Linux v5.14), try to address regressions even
quicker, if the stable series for the predecessor (v5.13) will be abandoned
soon or already was stamped "End-of-Life" (EOL) -- this usually happens about
three to four weeks after a new mainline release.

* Try to fix all other regressions within two weeks after the culprit was
found. Two or three additional weeks are acceptable for performance
regressions and other issues which are annoying, but don't prevent anyone
from running Linux (unless it's an issue in the current development cycle,
as those should ideally be addressed before the release). A few weeks in
total are acceptable if a regression can only be fixed with a risky change
and at the same time is affecting only a few users; as much time is
also okay if the regression is already present in the second newest longterm
kernel series.

Note: The aforementioned time frames for resolving regressions are meant to
include getting the fix tested, reviewed, and merged into mainline, ideally with
the fix being in linux-next at least briefly. This leads to delays you need to
account for.

Subsystem maintainers are expected to assist in reaching those periods by doing
timely reviews and quick handling of accepted patches. They thus might have to
send git-pull requests earlier or more often than usual; depending on the fix,
it might even be acceptable to skip testing in linux-next. Especially fixes for
regressions in stable and longterm kernels need to be handled quickly, as fixes
need to be merged in mainline before they can be backported to older series.


More aspects regarding regressions developers should be aware of
----------------------------------------------------------------
Expand Down

0 comments on commit d2b40ba

Please sign in to comment.