Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity updates from Eric Biggers:
 "Fix the longstanding implementation limitation that fsverity was only
  supported when the Merkle tree block size, filesystem block size, and
  PAGE_SIZE were all equal.

  Specifically, add support for Merkle tree block sizes less than
  PAGE_SIZE, and make ext4 support fsverity on filesystems where the
  filesystem block size is less than PAGE_SIZE.

  Effectively, this means that fsverity can now be used on systems with
  non-4K pages, at least on ext4. These changes have been tested using
  the verity group of xfstests, newly updated to cover the new code
  paths.

  Also update fs/verity/ to support verifying data from large folios.

  There's also a similar patch for fs/crypto/, to support decrypting
  data from large folios, which I'm including in here to avoid a merge
  conflict between the fscrypt and fsverity branches"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fscrypt: support decrypting data from large folios
  fsverity: support verifying data from large folios
  fsverity.rst: update git repo URL for fsverity-utils
  ext4: allow verity with fs block size < PAGE_SIZE
  fs/buffer.c: support fsverity in block_read_full_folio()
  f2fs: simplify f2fs_readpage_limit()
  ext4: simplify ext4_readpage_limit()
  fsverity: support enabling with tree block size < PAGE_SIZE
  fsverity: support verification with tree block size < PAGE_SIZE
  fsverity: replace fsverity_hash_page() with fsverity_hash_block()
  fsverity: use EFBIG for file too large to enable verity
  fsverity: store log2(digest_size) precomputed
  fsverity: simplify Merkle tree readahead size calculation
  fsverity: use unsigned long for level_start
  fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG
  fsverity: pass pos and size to ->write_merkle_tree_block
  fsverity: optimize fsverity_cleanup_inode() on non-verity files
  fsverity: optimize fsverity_prepare_setattr() on non-verity files
  fsverity: optimize fsverity_file_open() on non-verity files
torvalds committed Feb 20, 2023
2 parents f18f984 + 51e4e31 commit 6639c3c
Showing 22 changed files with 699 additions and 500 deletions.
4 changes: 2 additions & 2 deletions Documentation/filesystems/fscrypt.rst
@@ -1277,8 +1277,8 @@ the file contents themselves, as described below:

For the read path (->read_folio()) of regular files, filesystems can
read the ciphertext into the page cache and decrypt it in-place. The
page lock must be held until decryption has finished, to prevent the
page from becoming visible to userspace prematurely.
folio lock must be held until decryption has finished, to prevent the
folio from becoming visible to userspace prematurely.

For the write path (->writepage()) of regular files, filesystems
cannot encrypt data in-place in the page cache, since the cached
96 changes: 47 additions & 49 deletions Documentation/filesystems/fsverity.rst
@@ -118,10 +118,11 @@ as follows:
- ``hash_algorithm`` must be the identifier for the hash algorithm to
use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See
``include/uapi/linux/fsverity.h`` for the list of possible values.
- ``block_size`` must be the Merkle tree block size. Currently, this
must be equal to the system page size, which is usually 4096 bytes.
Other sizes may be supported in the future. This value is not
necessarily the same as the filesystem block size.
- ``block_size`` is the Merkle tree block size, in bytes. In Linux
v6.3 and later, this can be any power of 2 between (inclusively)
1024 and the minimum of the system page size and the filesystem
block size. In earlier versions, the page size was the only allowed
value.
- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
provided. The salt is a value that is prepended to every hashed
block; it can be used to personalize the hashing for a particular
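
The v6.3 rule for ``block_size`` described in this hunk can be sketched as a small userspace check. This is only an illustration: `merkle_block_size_ok()` and `is_power_of_2_u32()` are hypothetical helpers, not kernel functions, and the page size and filesystem block size are passed in as plain parameters rather than queried from the system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if x is a nonzero power of two. */
bool is_power_of_2_u32(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: applies the documented v6.3+ rule that the
 * Merkle tree block size must be a power of 2 between 1024 and
 * min(page size, filesystem block size), inclusive.
 */
bool merkle_block_size_ok(uint32_t block_size, uint32_t page_size,
			  uint32_t fs_block_size)
{
	uint32_t max = page_size < fs_block_size ? page_size : fs_block_size;

	return is_power_of_2_u32(block_size) &&
	       block_size >= 1024 && block_size <= max;
}
```

For example, on a 64K-page system with a 4K-block filesystem, 1024, 2048 and 4096 would pass this check, while 8192 would not.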
@@ -161,6 +162,7 @@ FS_IOC_ENABLE_VERITY can fail with the following errors:
- ``EBUSY``: this ioctl is already running on the file
- ``EEXIST``: the file already has verity enabled
- ``EFAULT``: the caller provided inaccessible memory
- ``EFBIG``: the file is too large to enable verity on
- ``EINTR``: the operation was interrupted by a fatal signal
- ``EINVAL``: unsupported version, hash algorithm, or block size; or
reserved bits are set; or the file descriptor refers to neither a
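
As a rough illustration of where these error codes surface, the following userspace sketch calls FS_IOC_ENABLE_VERITY with SHA-256, no salt and no signature. `enable_verity()` is a hypothetical wrapper name; the struct, ioctl number and hash constant come from ``include/uapi/linux/fsverity.h``. The file descriptor is assumed to have been opened read-only by the caller.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

/*
 * Hypothetical wrapper around FS_IOC_ENABLE_VERITY.
 * Returns 0 on success or a negative errno value such as -EFBIG,
 * -EEXIST or -EINVAL.
 */
int enable_verity(int fd, unsigned int block_size)
{
	struct fsverity_enable_arg arg;

	/* Zero everything so that the reserved fields are not set. */
	memset(&arg, 0, sizeof(arg));
	arg.version = 1;
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	arg.block_size = block_size;

	if (ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) != 0)
		return -errno;
	return 0;
}
```

A caller could compare the return value against `-EFBIG` to tell "file too large" apart from the other failures listed above.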
@@ -495,9 +497,11 @@ To create verity files on an ext4 filesystem, the filesystem must have
been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
it. "verity" is an RO_COMPAT filesystem feature, so once set, old
kernels will only be able to mount the filesystem readonly, and old
versions of e2fsck will be unable to check the filesystem. Moreover,
currently ext4 only supports mounting a filesystem with the "verity"
feature when its block size is equal to PAGE_SIZE (often 4096 bytes).
versions of e2fsck will be unable to check the filesystem.

Originally, an ext4 filesystem with the "verity" feature could only be
mounted when its block size was equal to the system page size
(typically 4096 bytes). In Linux v6.3, this limitation was removed.

ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It
can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.
@@ -518,9 +522,7 @@ support paging multi-gigabyte xattrs into memory, and to support
encrypting xattrs. Note that the verity metadata *must* be encrypted
when the file is, since it contains hashes of the plaintext data.

Currently, ext4 verity only supports the case where the Merkle tree
block size, filesystem block size, and page size are all the same. It
also only supports extent-based files.
ext4 only allows verity on extent-based files.

f2fs
----
@@ -538,11 +540,10 @@ Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first
64K boundary beyond i_size. See explanation for ext4 above.
Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
which wouldn't be enough for even a single Merkle tree block.
which usually wouldn't be enough for even a single Merkle tree block.

Currently, f2fs verity only supports a Merkle tree block size of 4096.
Also, f2fs doesn't support enabling verity on files that currently
have atomic or volatile writes pending.
f2fs doesn't support enabling verity on files that currently have
atomic or volatile writes pending.

btrfs
-----
@@ -567,51 +568,48 @@ Pagecache
~~~~~~~~~

For filesystems using Linux's pagecache, the ``->read_folio()`` and
``->readahead()`` methods must be modified to verify pages before they
are marked Uptodate. Merely hooking ``->read_iter()`` would be
``->readahead()`` methods must be modified to verify folios before
they are marked Uptodate. Merely hooking ``->read_iter()`` would be
insufficient, since ``->read_iter()`` is not used for memory maps.

Therefore, fs/verity/ provides a function fsverity_verify_page() which
verifies a page that has been read into the pagecache of a verity
inode, but is still locked and not Uptodate, so it's not yet readable
by userspace. As needed to do the verification,
fsverity_verify_page() will call back into the filesystem to read
Merkle tree pages via fsverity_operations::read_merkle_tree_page().
Therefore, fs/verity/ provides the function fsverity_verify_blocks()
which verifies data that has been read into the pagecache of a verity
inode. The containing folio must still be locked and not Uptodate, so
it's not yet readable by userspace. As needed to do the verification,
fsverity_verify_blocks() will call back into the filesystem to read
hash blocks via fsverity_operations::read_merkle_tree_page().

fsverity_verify_page() returns false if verification failed; in this
case, the filesystem must not set the page Uptodate. Following this,
fsverity_verify_blocks() returns false if verification failed; in this
case, the filesystem must not set the folio Uptodate. Following this,
as per the usual Linux pagecache behavior, attempts by userspace to
read() from the part of the file containing the page will fail with
EIO, and accesses to the page within a memory map will raise SIGBUS.

fsverity_verify_page() currently only supports the case where the
Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).
read() from the part of the file containing the folio will fail with
EIO, and accesses to the folio within a memory map will raise SIGBUS.

In principle, fsverity_verify_page() verifies the entire path in the
Merkle tree from the data page to the root hash. However, for
efficiency the filesystem may cache the hash pages. Therefore,
fsverity_verify_page() only ascends the tree reading hash pages until
an already-verified hash page is seen, as indicated by the PageChecked
bit being set. It then verifies the path to that page.
In principle, verifying a data block requires verifying the entire
path in the Merkle tree from the data block to the root hash.
However, for efficiency the filesystem may cache the hash blocks.
Therefore, fsverity_verify_blocks() only ascends the tree reading hash
blocks until an already-verified hash block is seen. It then verifies
the path to that block.
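
How far that ascent can go depends on the height of the tree. The sketch below estimates the number of hash levels for a given file size. It assumes the usual fs-verity layout in which each hash block holds ``block_size / digest_size`` hashes and a file of at most one data block needs no hash blocks at all. `merkle_tree_levels()` is a hypothetical name, not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of Merkle tree levels (hash block
 * levels) needed to cover file_size bytes of data.
 */
unsigned int merkle_tree_levels(uint64_t file_size, uint32_t block_size,
				uint32_t digest_size)
{
	uint64_t hashes_per_block = block_size / digest_size;
	/* Number of data blocks, rounding the last partial block up. */
	uint64_t blocks = (file_size + block_size - 1) / block_size;
	unsigned int levels = 0;

	/* Each level hashes the blocks of the level below it. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		levels++;
	}
	return levels;
}
```

Under these assumptions, a 1 GiB file with 4K blocks and SHA-256 has three levels, so an uncached read verifies at most three hash blocks on the way to the root.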

This optimization, which is also used by dm-verity, results in
excellent sequential read performance. This is because usually (e.g.
127 in 128 times for 4K blocks and SHA-256) the hash page from the
127 in 128 times for 4K blocks and SHA-256) the hash block from the
bottom level of the tree will already be cached and checked from
reading a previous data page. However, random reads perform worse.
reading a previous data block. However, random reads perform worse.
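
The "127 in 128" figure follows from how many hashes fit in one hash block. A minimal sketch of that arithmetic, with `hashes_per_block()` as a hypothetical helper name:

```c
#include <stdint.h>

/* Number of hashes that fit in one Merkle tree block. */
uint32_t hashes_per_block(uint32_t block_size, uint32_t digest_size)
{
	return block_size / digest_size;
}
```

With 4096-byte blocks and 32-byte SHA-256 digests this gives 128, so during a sequential read only one data block in every 128 needs a bottom-level hash block that has not already been read and checked. With a 1024-byte tree block the same calculation gives 32.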

Block device based filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
the pagecache, so the above subsection applies too. However, they
also usually read many pages from a file at once, grouped into a
also usually read many data blocks from a file at once, grouped into a
structure called a "bio". To make it easier for these types of
filesystems to support fs-verity, fs/verity/ also provides a function
fsverity_verify_bio() which verifies all pages in a bio.
fsverity_verify_bio() which verifies all data blocks in a bio.

ext4 and f2fs also support encryption. If a verity file is also
encrypted, the pages must be decrypted before being verified. To
encrypted, the data must be decrypted before being verified. To
support this, these filesystems allocate a "post-read context" for
each bio and store it in ``->bi_private``::

@@ -626,14 +624,14 @@ each bio and store it in ``->bi_private``::
verity, or both is enabled. After the bio completes, for each needed
postprocessing step the filesystem enqueues the bio_post_read_ctx on a
workqueue, and then the workqueue work does the decryption or
verification. Finally, pages where no decryption or verity error
occurred are marked Uptodate, and the pages are unlocked.
verification. Finally, folios where no decryption or verity error
occurred are marked Uptodate, and the folios are unlocked.

On many filesystems, files can contain holes. Normally,
``->readahead()`` simply zeroes holes and sets the corresponding pages
Uptodate; no bios are issued. To prevent this case from bypassing
fs-verity, these filesystems use fsverity_verify_page() to verify hole
pages.
``->readahead()`` simply zeroes hole blocks and considers the
corresponding data to be up-to-date; no bios are issued. To prevent
this case from bypassing fs-verity, filesystems use
fsverity_verify_blocks() to verify hole blocks.

Filesystems also disable direct I/O on verity files, since otherwise
direct I/O would bypass fs-verity.
@@ -644,7 +642,7 @@ Userspace utility
This document focuses on the kernel, but a userspace utility for
fs-verity can be found at:

https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git
https://git.kernel.org/pub/scm/fs/fsverity/fsverity-utils.git

See the README.md file in the fsverity-utils source tree for details,
including examples of setting up fs-verity protected files.
@@ -793,9 +791,9 @@ weren't already directly answered in other parts of this document.
:A: There are many reasons why this is not possible or would be very
difficult, including the following:

- To prevent bypassing verification, pages must not be marked
- To prevent bypassing verification, folios must not be marked
Uptodate until they've been verified. Currently, each
filesystem is responsible for marking pages Uptodate via
filesystem is responsible for marking folios Uptodate via
``->readahead()``. Therefore, currently it's not possible for
the VFS to do the verification on its own. Changing this would
require significant changes to the VFS and all filesystems.
19 changes: 7 additions & 12 deletions fs/btrfs/verity.c
@@ -783,30 +783,25 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
/*
* fsverity op that writes a Merkle tree block into the btree.
*
* @inode: inode to write a Merkle tree block for
* @buf: Merkle tree data block to write
* @index: index of the block in the Merkle tree
* @log_blocksize: log base 2 of the Merkle tree block size
*
* Note that the block size could be different from the page size, so it is not
* safe to assume that index is a page index.
* @inode: inode to write a Merkle tree block for
* @buf: Merkle tree block to write
* @pos: the position of the block in the Merkle tree (in bytes)
* @size: the Merkle tree block size (in bytes)
*
* Returns 0 on success or negative error code on failure
*/
static int btrfs_write_merkle_tree_block(struct inode *inode, const void *buf,
u64 index, int log_blocksize)
u64 pos, unsigned int size)
{
u64 off = index << log_blocksize;
u64 len = 1ULL << log_blocksize;
loff_t merkle_pos = merkle_file_pos(inode);

if (merkle_pos < 0)
return merkle_pos;
if (merkle_pos > inode->i_sb->s_maxbytes - off - len)
if (merkle_pos > inode->i_sb->s_maxbytes - pos - size)
return -EFBIG;

return write_key_bytes(BTRFS_I(inode), BTRFS_VERITY_MERKLE_ITEM_KEY,
off, buf, len);
pos, buf, size);
}
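
The interface change in this hunk moves the index-to-byte conversion out of the filesystem callback: the old signature passed ``index`` and ``log_blocksize``, and the callback derived ``off`` and ``len`` itself. A standalone sketch of that conversion, using a hypothetical struct and function name, looks like this:

```c
#include <stdint.h>

/* Hypothetical container for the byte range of one Merkle tree block. */
struct merkle_block_range {
	uint64_t pos;	/* byte position of the block in the tree */
	uint32_t size;	/* block size in bytes */
};

/*
 * Converts the old (index, log_blocksize) pair into the new
 * (pos, size) pair, as the removed lines did inline.
 */
struct merkle_block_range merkle_block_to_range(uint64_t index,
						unsigned int log_blocksize)
{
	struct merkle_block_range r = {
		.pos = index << log_blocksize,
		.size = 1U << log_blocksize,
	};

	return r;
}
```

For instance, block index 3 with a 4096-byte tree block corresponds to ``pos = 12288`` and ``size = 4096``, which are the values the new callback receives directly.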

const struct fsverity_operations btrfs_verityops = {
72 changes: 60 additions & 12 deletions fs/buffer.c
@@ -48,6 +48,7 @@
#include <linux/sched/mm.h>
#include <trace/events/block.h>
#include <linux/fscrypt.h>
#include <linux/fsverity.h>

#include "internal.h"

@@ -295,20 +296,53 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
return;
}

struct decrypt_bh_ctx {
struct postprocess_bh_ctx {
struct work_struct work;
struct buffer_head *bh;
};

static void verify_bh(struct work_struct *work)
{
struct postprocess_bh_ctx *ctx =
container_of(work, struct postprocess_bh_ctx, work);
struct buffer_head *bh = ctx->bh;
bool valid;

valid = fsverity_verify_blocks(page_folio(bh->b_page), bh->b_size,
bh_offset(bh));
end_buffer_async_read(bh, valid);
kfree(ctx);
}

static bool need_fsverity(struct buffer_head *bh)
{
struct page *page = bh->b_page;
struct inode *inode = page->mapping->host;

return fsverity_active(inode) &&
/* needed by ext4 */
page->index < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
}

static void decrypt_bh(struct work_struct *work)
{
struct decrypt_bh_ctx *ctx =
container_of(work, struct decrypt_bh_ctx, work);
struct postprocess_bh_ctx *ctx =
container_of(work, struct postprocess_bh_ctx, work);
struct buffer_head *bh = ctx->bh;
int err;

err = fscrypt_decrypt_pagecache_blocks(bh->b_page, bh->b_size,
bh_offset(bh));
err = fscrypt_decrypt_pagecache_blocks(page_folio(bh->b_page),
bh->b_size, bh_offset(bh));
if (err == 0 && need_fsverity(bh)) {
/*
* We use different work queues for decryption and for verity
* because verity may require reading metadata pages that need
* decryption, and we shouldn't recurse to the same workqueue.
*/
INIT_WORK(&ctx->work, verify_bh);
fsverity_enqueue_verify_work(&ctx->work);
return;
}
end_buffer_async_read(bh, err == 0);
kfree(ctx);
}
@@ -319,15 +353,24 @@ static void decrypt_bh(struct work_struct *work)
*/
static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
{
/* Decrypt if needed */
if (uptodate &&
fscrypt_inode_uses_fs_layer_crypto(bh->b_page->mapping->host)) {
struct decrypt_bh_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
struct inode *inode = bh->b_page->mapping->host;
bool decrypt = fscrypt_inode_uses_fs_layer_crypto(inode);
bool verify = need_fsverity(bh);

/* Decrypt (with fscrypt) and/or verify (with fsverity) if needed. */
if (uptodate && (decrypt || verify)) {
struct postprocess_bh_ctx *ctx =
kmalloc(sizeof(*ctx), GFP_ATOMIC);

if (ctx) {
INIT_WORK(&ctx->work, decrypt_bh);
ctx->bh = bh;
fscrypt_enqueue_decrypt_work(&ctx->work);
if (decrypt) {
INIT_WORK(&ctx->work, decrypt_bh);
fscrypt_enqueue_decrypt_work(&ctx->work);
} else {
INIT_WORK(&ctx->work, verify_bh);
fsverity_enqueue_verify_work(&ctx->work);
}
return;
}
uptodate = 0;
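
The ordering of the two postprocessing steps in these fs/buffer.c hunks can be summarized with a small userspace model. It is a simplification under stated assumptions: every step succeeds, the context allocation does not fail, and `postprocess_steps()` with its `STEP_*` flags is a hypothetical construct, not kernel code.

```c
#include <stdbool.h>

#define STEP_DECRYPT 0x1u
#define STEP_VERIFY  0x2u

/*
 * Hypothetical model: returns the set of postprocessing steps that
 * run for one buffer. Decryption, when needed, always runs first and
 * then chains to verification on a separate workqueue.
 */
unsigned int postprocess_steps(bool uptodate, bool decrypt, bool verify)
{
	unsigned int steps = 0;

	/* A failed read is completed immediately, with no postprocessing. */
	if (!uptodate)
		return 0;
	if (decrypt)
		steps |= STEP_DECRYPT;
	if (verify)
		steps |= STEP_VERIFY;
	return steps;
}
```

In the real code the buffer is only passed to end_buffer_async_read() after the last step in this set has finished, so a buffer never becomes Uptodate before both decryption and verification are done.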
@@ -2245,6 +2288,11 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
int nr, i;
int fully_mapped = 1;
bool page_error = false;
loff_t limit = i_size_read(inode);

/* This is needed for ext4. */
if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
limit = inode->i_sb->s_maxbytes;

VM_BUG_ON_FOLIO(folio_test_large(folio), folio);

@@ -2253,7 +2301,7 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
bbits = block_size_bits(blocksize);

iblock = (sector_t)folio->index << (PAGE_SHIFT - bbits);
lblock = (i_size_read(inode)+blocksize-1) >> bbits;
lblock = (limit+blocksize-1) >> bbits;
bh = head;
nr = 0;
i = 0;
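
The two block-number calculations in this hunk can be tried out in isolation. The sketch below assumes 4K pages via a hypothetical `SKETCH_PAGE_SHIFT` constant, and the function names are illustrative only. ``limit`` stands for either ``i_size`` or, for verity inodes, ``s_maxbytes``; the latter presumably lets blocks past ``i_size`` be mapped instead of being treated as beyond end-of-file, since ext4 keeps the verity metadata there.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */

/* First filesystem block covered by the page at page_index (iblock). */
uint64_t first_block_of_page(uint64_t page_index, unsigned int bbits)
{
	return page_index << (SKETCH_PAGE_SHIFT - bbits);
}

/* Number of blocks needed to cover limit bytes, rounded up (lblock). */
uint64_t block_limit(uint64_t limit, unsigned int bbits)
{
	uint64_t blocksize = 1ULL << bbits;

	return (limit + blocksize - 1) >> bbits;
}
```

With 1K blocks (``bbits = 10``), page 3 starts at block 12, and a 5000-byte limit covers blocks 0 through 4, so ``block_limit()`` returns 5.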
10 changes: 4 additions & 6 deletions fs/crypto/bio.c
@@ -30,13 +30,11 @@
*/
bool fscrypt_decrypt_bio(struct bio *bio)
{
struct bio_vec *bv;
struct bvec_iter_all iter_all;
struct folio_iter fi;

bio_for_each_segment_all(bv, bio, iter_all) {
struct page *page = bv->bv_page;
int err = fscrypt_decrypt_pagecache_blocks(page, bv->bv_len,
bv->bv_offset);
bio_for_each_folio_all(fi, bio) {
int err = fscrypt_decrypt_pagecache_blocks(fi.folio, fi.length,
fi.offset);

if (err) {
bio->bi_status = errno_to_blk_status(err);