forked from torvalds/linux
Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
Pull compat_ioctl cleanup from Arnd. Here's his description:

This series concludes the work I did for linux-5.5 on the compat_ioctl() cleanup, killing off fs/compat_ioctl.c and block/compat_ioctl.c by moving everything into drivers.

Overall this would be a reduction both in complexity and line count, but as I'm also adding documentation the overall number of lines increases in the end.

My plan was originally to keep the SCSI and block parts separate. This did not work easily because of interdependencies: I cannot do the final SCSI cleanup in a good way without first addressing the CDROM ioctls, so this is one series that I hope could be merged through either the block or the scsi git trees, or possibly both if you can pull in the same branch.

The series comes in these steps:

1. clean up the sg v3 interface as suggested by Linus. I have talked about this with Doug Gilbert as well, and he would rebase his sg v4 patches on top of "compat: scsi: sg: fix v3 compat read/write interface"

2. Actually moving handlers out of block/compat_ioctl.c and block/scsi_ioctl.c into drivers, mixed in with cleanup patches

3. Document how to do this right. I keep getting asked about this, and it helps to point to some documentation file.

The branch is based on another one that fixes a couple of bugs found during the creation of this series.

Changes since v3: https://lore.kernel.org/lkml/[email protected]/
- Move sr_compat_ioctl fixup to correct patch (Ben Hutchings)
- Add Reviewed-by tags

Changes since v2: https://lore.kernel.org/lkml/[email protected]/
- Rebase to v5.5-rc4, which contains the earlier bugfixes
- Fix sr_block_compat_ioctl() error handling bug found by Ben Hutchings
- Fix idecd_locked_compat_ioctl() compat_ptr() bug
- Don't try to handle HDIO_DRIVE_TASKFILE in drivers/ide
- More documentation improvements

Changes since v1: https://lore.kernel.org/lkml/[email protected]/
- move out the bugfixes into a branch for itself
- clean up scsi sg driver further as suggested by Christoph Hellwig
- avoid some ifdefs by moving compat_ptr() out of asm/compat.h
- split out the blkdev_compat_ptr_ioctl function; bug spotted by Ben Hutchings
- Improve formatting of documentation

Signed-off-by: Martin K. Petersen <[email protected]>
Showing 685 changed files with 8,480 additions and 4,607 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -152,6 +152,7 @@ Linus Lüssing <[email protected]> <[email protected]> | |
Linus Lüssing <[email protected]> <[email protected]> | ||
Li Yang <[email protected]> <[email protected]> | ||
Li Yang <[email protected]> <[email protected]> | ||
Lukasz Luba <[email protected]> <[email protected]> | ||
Maciej W. Rozycki <[email protected]> <[email protected]> | ||
Marc Zyngier <[email protected]> <[email protected]> | ||
Marcin Nowakowski <[email protected]> <[email protected]> | ||
|
@@ -265,6 +266,7 @@ Vinod Koul <[email protected]> <[email protected]> | |
Viresh Kumar <[email protected]> <[email protected]> | ||
Viresh Kumar <[email protected]> <[email protected]> | ||
Viresh Kumar <[email protected]> <[email protected]> | ||
Vivien Didelot <[email protected]> <[email protected]> | ||
Vlad Dogaru <[email protected]> <[email protected]> | ||
Vladimir Davydov <[email protected]> <[email protected]> | ||
Vladimir Davydov <[email protected]> <[email protected]> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
What: /sys/bus/platform/devices/MLNXBF04:00/driver/lifecycle_state | ||
What: /sys/bus/platform/devices/MLNXBF04:00/lifecycle_state | ||
Date: Oct 2019 | ||
KernelVersion: 5.5 | ||
Contact: "Liming Sun <[email protected]>" | ||
|
@@ -10,7 +10,7 @@ Description: | |
GA Non-Secured - Non-Secure chip and not able to change state | ||
RMA - Return Merchandise Authorization | ||
|
||
What: /sys/bus/platform/devices/MLNXBF04:00/driver/post_reset_wdog | ||
What: /sys/bus/platform/devices/MLNXBF04:00/post_reset_wdog | ||
Date: Oct 2019 | ||
KernelVersion: 5.5 | ||
Contact: "Liming Sun <[email protected]>" | ||
|
@@ -19,7 +19,7 @@ Description: | |
to reboot the chip and recover it to the old state if the new | ||
boot partition fails. | ||
|
||
What: /sys/bus/platform/devices/MLNXBF04:00/driver/reset_action | ||
What: /sys/bus/platform/devices/MLNXBF04:00/reset_action | ||
Date: Oct 2019 | ||
KernelVersion: 5.5 | ||
Contact: "Liming Sun <[email protected]>" | ||
|
@@ -30,7 +30,7 @@ Description: | |
emmc - boot from the onchip eMMC | ||
emmc_legacy - boot from the onchip eMMC in legacy (slow) mode | ||
|
||
What: /sys/bus/platform/devices/MLNXBF04:00/driver/second_reset_action | ||
What: /sys/bus/platform/devices/MLNXBF04:00/second_reset_action | ||
Date: Oct 2019 | ||
KernelVersion: 5.5 | ||
Contact: "Liming Sun <[email protected]>" | ||
|
@@ -44,7 +44,7 @@ Description: | |
swap_emmc - swap the primary / secondary boot partition | ||
none - cancel the action | ||
|
||
What: /sys/bus/platform/devices/MLNXBF04:00/driver/secure_boot_fuse_state | ||
What: /sys/bus/platform/devices/MLNXBF04:00/secure_boot_fuse_state | ||
Date: Oct 2019 | ||
KernelVersion: 5.5 | ||
Contact: "Liming Sun <[email protected]>" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,6 +39,7 @@ Core utilities | |
../RCU/index | ||
gcc-plugins | ||
symbol-namespaces | ||
ioctl | ||
|
||
|
||
Interfaces for kernel debugging | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,253 @@ | ||
====================== | ||
ioctl based interfaces | ||
====================== | ||
|
||
ioctl() is the most common way for applications to interface | ||
with device drivers. It is flexible and easily extended by adding new | ||
commands and can be passed through character devices, block devices as | ||
well as sockets and other special file descriptors. | ||
|
||
However, it is also very easy to get ioctl command definitions wrong, | ||
and hard to fix them later without breaking existing applications, | ||
so this documentation tries to help developers get it right. | ||
|
||
Command number definitions | ||
========================== | ||
|
||
The command number, or request number, is the second argument passed to | ||
the ioctl system call. While this can be any 32-bit number that uniquely | ||
identifies an action for a particular driver, there are a number of | ||
conventions around defining them. | ||
|
||
``include/uapi/asm-generic/ioctl.h`` provides four macros for defining | ||
ioctl commands that follow modern conventions: ``_IO``, ``_IOR``, | ||
``_IOW``, and ``_IOWR``. These should be used for all new commands, | ||
with the correct parameters: | ||
|
||
_IO/_IOR/_IOW/_IOWR | ||
The macro name specifies how the argument will be used. It may be a | ||
pointer to data to be passed into the kernel (_IOW), out of the kernel | ||
(_IOR), or both (_IOWR). _IO can indicate either commands with no | ||
argument or those passing an integer value instead of a pointer. | ||
It is recommended to only use _IO for commands without arguments, | ||
and use pointers for passing data. | ||
|
||
type | ||
An 8-bit number, often a character literal, specific to a subsystem | ||
or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number` | ||
|
||
nr | ||
An 8-bit number identifying the specific command, unique for a given | ||
value of 'type' | ||
|
||
data_type | ||
The name of the data type pointed to by the argument. The command number | ||
encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer, | ||
leading to a limit of 8191 bytes for the maximum size of the argument. | ||
Note: do not pass sizeof(data_type) into _IOR/_IOW/_IOWR, as that | ||
will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t). | ||
_IO does not have a data_type parameter. | ||
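
As a minimal sketch of these conventions, the user-space C fragment below defines a small command set with the four macros. The ``example_config`` structure, the ``0xE7`` type value and all command names are hypothetical and chosen only for illustration; the ``_IOC_*`` decoding macros come from the same header.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument structure; all members are naturally aligned. */
struct example_config {
	__u32 rate;
	__u32 flags;
};

/* Hypothetical 'type' value; a real driver would pick and list its own. */
#define EXAMPLE_IOC_TYPE	0xE7

#define EXAMPLE_RESET		_IO(EXAMPLE_IOC_TYPE, 0)
#define EXAMPLE_SET_CONFIG	_IOW(EXAMPLE_IOC_TYPE, 1, struct example_config)
#define EXAMPLE_GET_CONFIG	_IOR(EXAMPLE_IOC_TYPE, 2, struct example_config)
#define EXAMPLE_XCHG_CONFIG	_IOWR(EXAMPLE_IOC_TYPE, 3, struct example_config)

/* The data type itself, not sizeof(data type), is passed to the macro. */
static_assert(_IOC_SIZE(EXAMPLE_SET_CONFIG) == sizeof(struct example_config),
	      "command number encodes the argument size");
```

The direction, type, number and size can be recovered from a command with ``_IOC_DIR()``, ``_IOC_TYPE()``, ``_IOC_NR()`` and ``_IOC_SIZE()``, which is a convenient way to check a definition.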
|
||
|
||
Interface versions | ||
================== | ||
|
||
Some subsystems use version numbers in data structures to overload | ||
commands with different interpretations of the argument. | ||
|
||
This is generally a bad idea, since changes to existing commands tend | ||
to break existing applications. | ||
|
||
A better approach is to add a new ioctl command with a new number. The | ||
old command still needs to be implemented in the kernel for compatibility, | ||
but this can be a wrapper around the new implementation. | ||
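
The wrapper approach might look roughly like the following user-space model. The structures, command names and the ``example_device`` state are hypothetical, and the copy from user space that a real handler would perform is left out.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical old and new argument layouts. */
struct example_params_v1 {
	__u32 rate;
};

struct example_params_v2 {
	__u32 rate;
	__u32 flags;
};

/* The new layout gets a new command number; the old one stays valid. */
#define EXAMPLE_SET_PARAMS	_IOW(0xE7, 4, struct example_params_v1)
#define EXAMPLE_SET_PARAMS2	_IOW(0xE7, 5, struct example_params_v2)

static_assert(EXAMPLE_SET_PARAMS != EXAMPLE_SET_PARAMS2,
	      "old and new commands must be distinguishable");

/* Hypothetical driver state. */
struct example_device {
	__u32 rate;
	__u32 flags;
};

/* New implementation. */
static long example_set_params2(struct example_device *dev,
				const struct example_params_v2 *p)
{
	dev->rate = p->rate;
	dev->flags = p->flags;
	return 0;
}

/* Old command: convert the argument and call the new implementation. */
static long example_set_params(struct example_device *dev,
			       const struct example_params_v1 *old)
{
	struct example_params_v2 p = { .rate = old->rate, .flags = 0 };

	return example_set_params2(dev, &p);
}
```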
|
||
Return code | ||
=========== | ||
|
||
ioctl commands can return negative error codes as documented in errno(3); | ||
these get turned into errno values in user space. On success, the return | ||
code should be zero. It is also possible but not recommended to return | ||
a positive 'long' value. | ||
|
||
When the ioctl callback is called with an unknown command number, the | ||
handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in | ||
-ENOTTY being returned from the system call. Some subsystems return | ||
-ENOSYS or -EINVAL here for historic reasons, but this is wrong. | ||
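
A minimal user-space model of such a handler is sketched below. The commands, the ``example_counter`` structure and the plain ``void *`` argument (standing in for a user pointer) are assumptions made for illustration, not kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical device state, also used as the GET argument. */
struct example_counter {
	__u32 value;
};

#define EXAMPLE_CNT_RESET	_IO(0xE7, 8)
#define EXAMPLE_CNT_GET		_IOR(0xE7, 9, struct example_counter)

/* Returns zero on success and a negative error code otherwise. */
static long example_ioctl(struct example_counter *dev, unsigned int cmd,
			  void *arg)
{
	switch (cmd) {
	case EXAMPLE_CNT_RESET:
		dev->value = 0;
		return 0;
	case EXAMPLE_CNT_GET:
		if (!arg)
			return -EFAULT;
		*(struct example_counter *)arg = *dev;
		return 0;
	default:
		/* Unknown command: -ENOTTY, not -EINVAL or -ENOSYS. */
		return -ENOTTY;
	}
}
```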
|
||
Prior to Linux 5.5, compat_ioctl handlers were required to return | ||
-ENOIOCTLCMD in order to use the fallback conversion into native | ||
commands. As all subsystems are now responsible for handling compat | ||
mode themselves, this is no longer needed, but it may be important to | ||
consider when backporting bug fixes to older kernels. | ||
|
||
Timestamps | ||
========== | ||
|
||
Traditionally, timestamps and timeout values are passed as ``struct | ||
timespec`` or ``struct timeval``, but these are problematic because of | ||
incompatible definitions of these structures in user space after the | ||
move to 64-bit time_t. | ||
|
||
The ``struct __kernel_timespec`` type can be used instead to be embedded | ||
in other data structures when separate second/nanosecond values are | ||
desired, or passed to user space directly. This is still not ideal though, | ||
as the structure matches neither the kernel's timespec64 nor the user | ||
space timespec exactly. The get_timespec64() and put_timespec64() helper | ||
functions can be used to ensure that the layout remains compatible with | ||
user space and the padding is treated correctly. | ||
|
||
As it is cheap to convert seconds to nanoseconds, but the opposite | ||
requires an expensive 64-bit division, a simple __u64 nanosecond value | ||
can be simpler and more efficient. | ||
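
The asymmetry can be seen in a small user-space sketch. The ``example_timespec`` type and both helper names are hypothetical stand-ins, not kernel interfaces.

```c
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC	1000000000ULL

/* Hypothetical stand-in for a seconds/nanoseconds pair. */
struct example_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Seconds to nanoseconds: one multiplication and one addition. */
static uint64_t example_timespec_to_ns(struct example_timespec ts)
{
	return (uint64_t)ts.tv_sec * EXAMPLE_NSEC_PER_SEC +
	       (uint64_t)ts.tv_nsec;
}

/* Nanoseconds to seconds: needs a 64-bit division and remainder. */
static struct example_timespec example_ns_to_timespec(uint64_t ns)
{
	struct example_timespec ts = {
		.tv_sec = (int64_t)(ns / EXAMPLE_NSEC_PER_SEC),
		.tv_nsec = (int64_t)(ns % EXAMPLE_NSEC_PER_SEC),
	};

	return ts;
}
```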
|
||
Timeout values and timestamps should ideally use CLOCK_MONOTONIC time, | ||
as returned by ktime_get_ns() or ktime_get_ts64(). Unlike | ||
CLOCK_REALTIME, this makes the timestamps immune from jumping backwards | ||
or forwards due to leap second adjustments and clock_settime() calls. | ||
|
||
ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that | ||
need to be persistent across a reboot or between multiple machines. | ||
|
||
32-bit compat mode | ||
================== | ||
|
||
In order to support 32-bit user space running on a 64-bit machine, each | ||
subsystem or driver that implements an ioctl callback handler must also | ||
implement the corresponding compat_ioctl handler. | ||
|
||
As long as all the rules for data structures are followed, this is as | ||
easy as setting the .compat_ioctl pointer to a helper function such as | ||
compat_ptr_ioctl() or blkdev_compat_ptr_ioctl(). | ||
|
||
compat_ptr() | ||
------------ | ||
|
||
On the s390 architecture, 31-bit user space has ambiguous representations | ||
for data pointers, with the upper bit being ignored. When running such | ||
a process in compat mode, the compat_ptr() helper must be used to | ||
clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit | ||
pointer. On other architectures, this macro only performs a cast to a | ||
``void __user *`` pointer. | ||
|
||
In a compat_ioctl() callback, the last argument is an unsigned long, | ||
which can be interpreted as either a pointer or a scalar depending on | ||
the command. If it is a scalar, then compat_ptr() must not be used, to | ||
ensure that the 64-bit kernel behaves the same way as a 32-bit kernel | ||
for arguments with the upper bit set. | ||
|
||
The compat_ptr_ioctl() helper can be used in place of a custom | ||
compat_ioctl file operation for drivers that only take arguments that | ||
are pointers to compatible data structures. | ||
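
The effect of compat_ptr() described above can be modelled in user space as follows. This is only an illustrative sketch: the function and type names are hypothetical, and the result is a plain integer rather than a ``void __user *`` pointer.

```c
#include <stdint.h>

/* Hypothetical stand-in for compat_uptr_t. */
typedef uint32_t example_compat_uptr_t;

/* Model of the s390 case: the upper bit carries no address information. */
static uint64_t example_compat_ptr_s390(example_compat_uptr_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffu);
}

/* Model of the other architectures: a plain zero-extending conversion. */
static uint64_t example_compat_ptr_generic(example_compat_uptr_t uptr)
{
	return (uint64_t)uptr;
}
```

Applying the s390 variant to a scalar argument would silently drop its upper bit, which is why the helper must be reserved for arguments that really are pointers.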
|
||
Structure layout | ||
---------------- | ||
|
||
Compatible data structures have the same layout on all architectures, | ||
avoiding all problematic members: | ||
|
||
* ``long`` and ``unsigned long`` are the size of a register, so | ||
they can be either 32-bit or 64-bit wide and cannot be used in portable | ||
data structures. Fixed-length replacements are ``__s32``, ``__u32``, | ||
``__s64`` and ``__u64``. | ||
|
||
* Pointers have the same problem, in addition to requiring the | ||
use of compat_ptr(). The best workaround is to use ``__u64`` | ||
in place of pointers, which requires a cast to ``uintptr_t`` in user | ||
space, and the use of u64_to_user_ptr() in the kernel to convert | ||
it back into a user pointer. | ||
|
||
* On the x86-32 (i386) architecture, the alignment of 64-bit variables | ||
is only 32-bit, but they are naturally aligned on most other | ||
architectures including x86-64. This means a structure like:: | ||
|
||
struct foo { | ||
__u32 a; | ||
__u64 b; | ||
__u32 c; | ||
}; | ||
|
||
has four bytes of padding between a and b on x86-64, plus another four | ||
bytes of padding at the end, but no padding on i386, and it needs a | ||
compat_ioctl conversion handler to translate between the two formats. | ||
|
||
To avoid this problem, all structures should have their members | ||
naturally aligned, or explicit reserved fields added in place of the | ||
implicit padding. The ``pahole`` tool can be used for checking the | ||
alignment. | ||
|
||
* On ARM OABI user space, structures are padded to multiples of 32-bit, | ||
making some structs incompatible with modern EABI kernels if they | ||
do not end on a 32-bit boundary. | ||
|
||
* On the m68k architecture, struct members are not guaranteed to have an | ||
alignment greater than 16-bit, which is a problem when relying on | ||
implicit padding. | ||
|
||
* Bitfields and enums generally work as one would expect them to, | ||
but some properties of them are implementation-defined, so it is better | ||
to avoid them completely in ioctl interfaces. | ||
|
||
* ``char`` members can be either signed or unsigned, depending on | ||
the architecture, so the __u8 and __s8 types should be used for 8-bit | ||
integer values, though char arrays are clearer for fixed-length strings. | ||
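
As a sketch of the "explicit reserved fields" advice, the ``struct foo`` example above can be rewritten so that it has the same layout on i386 and x86-64. The structure and field names below are hypothetical, and the static assertions only spell out the layout that ``pahole`` would report.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/types.h>

/* Same members as struct foo, with the implicit padding made explicit. */
struct example_foo_padded {
	__u32 a;
	__u32 reserved1;	/* replaces the hole before b on x86-64 */
	__u64 b;
	__u32 c;
	__u32 reserved2;	/* replaces the tail padding on x86-64 */
};

/* With every member naturally aligned, the layout no longer varies. */
static_assert(offsetof(struct example_foo_padded, b) == 8,
	      "b is naturally aligned");
static_assert(offsetof(struct example_foo_padded, c) == 16,
	      "c follows b directly");
static_assert(sizeof(struct example_foo_padded) == 24,
	      "no implicit tail padding");
```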
|
||
Information leaks | ||
================= | ||
|
||
Uninitialized data must not be copied back to user space, as this can | ||
cause an information leak, which can be used to defeat kernel address | ||
space layout randomization (KASLR), helping in an attack. | ||
|
||
For this reason (and for compat support) it is best to avoid any | ||
implicit padding in data structures. Where there is implicit padding | ||
in an existing structure, kernel drivers must be careful to fully | ||
initialize an instance of the structure before copying it to user | ||
space. This is usually done by calling memset() before assigning to | ||
individual members. | ||
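
The memset() pattern can be sketched as below. The ``example_info`` structure and the fill helper are hypothetical, and the copy to user space that would follow in a driver is not shown.

```c
#include <string.h>
#include <linux/types.h>

/*
 * Hypothetical structure with implicit padding after 'a' and after 'c'
 * on ABIs that align __u64 to 64 bits.
 */
struct example_info {
	__u32 a;
	__u64 b;
	__u32 c;
};

/* Clear the whole object first, then assign the individual members. */
static void example_fill_info(struct example_info *info,
			      __u32 a, __u64 b, __u32 c)
{
	memset(info, 0, sizeof(*info));
	info->a = a;
	info->b = b;
	info->c = c;
}
```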
|
||
Subsystem abstractions | ||
====================== | ||
|
||
While some device drivers implement their own ioctl function, most | ||
subsystems implement the same command for multiple drivers. Ideally the | ||
subsystem has an .ioctl() handler that copies the arguments from and | ||
to user space, passing them into subsystem specific callback functions | ||
through normal kernel pointers. | ||
|
||
This helps in various ways: | ||
|
||
* Applications written for one driver are more likely to work for | ||
another one in the same subsystem if there are no subtle differences | ||
in the user space ABI. | ||
|
||
* The complexity of user space access and data structure layout is handled | ||
in one place, reducing the potential for implementation bugs. | ||
|
||
* It is more likely to be reviewed by experienced developers | ||
that can spot problems in the interface when the ioctl is shared | ||
between multiple drivers than when it is only used in a single driver. | ||
|
||
Alternatives to ioctl | ||
===================== | ||
|
||
There are many cases in which ioctl is not the best solution for a | ||
problem. Alternatives include: | ||
|
||
* System calls are a better choice for a system-wide feature that | ||
is not tied to a physical device or constrained by the file system | ||
permissions of a character device node. | ||
|
||
* netlink is the preferred way of configuring any network related | ||
objects through sockets. | ||
|
||
* debugfs is used for ad-hoc interfaces for debugging functionality | ||
that does not need to be exposed as a stable interface to applications. | ||
|
||
* sysfs is a good way to expose the state of an in-kernel object | ||
that is not tied to a file descriptor. | ||
|
||
* configfs can be used for more complex configuration than sysfs. | ||
|
||
* A custom file system can provide extra flexibility with a simple | ||
user interface but adds a lot of complexity to the implementation. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ KUnit - Unit Testing for the Linux Kernel | |
|
||
start | ||
usage | ||
kunit-tool | ||
api/index | ||
faq | ||
|
||
|