Merge tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "We don't have any big feature updates this time, there are lots of
  small enhacements or fixes. A highlight perhaps is the parallel fsync
  performance improvements, numbers below.

  Regarding the dio/iomap conversion that was reverted last time, the
  required API changes are likely to land in the upcoming cycle; the
  btrfs part will be updated afterwards.

  User visible changes:

   - new mount option rescue= to group all recovery-related mount
     options, so we don't accumulate many specific options; for now it
     only introduces aliases for existing options, and future
     extensions in development will allow read-only mounting of
     partially damaged structures (a mount(2) sketch follows this
     list):
      - usebackuproot is an alias for rescue=usebackuproot
      - nologreplay is an alias for rescue=nologreplay

   - start deprecation of mount option inode_cache, with removal
     scheduled for v5.11

   - removed deprecated mount options alloc_start and subvolrootid

   - device stats corruption counter gets incremented when a checksum
     mismatch is found (the ioctl sketch after this list shows how to
     read the counters)

   - qgroup information exported in /sys/fs/btrfs/<UUID>/qgroups/<id>
     using sysfs

   - add link /sys/fs/btrfs/<UUID>/bdi pointing to the associated
     backing dev info

   - FS_INFO ioctl enhancements (see the ioctl sketch after this
     list):
      - add flags to request/describe newly added items
      - new item: numeric checksum type and checksum size
      - new item: generation
      - new item: metadata_uuid

   - seed device: when one new read-write device is added, print the
     new device information in /proc/mounts

   - balance: detect cancellation by Ctrl-C in existing cancellation
     points
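
  A minimal mount(2) sketch of the new rescue= aliases (not from the
  pull request; the device and mount point paths are made up):

    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
            /* rescue=nologreplay requires a read-only mount, hence
             * MS_RDONLY; equivalent to the old 'nologreplay' option */
            if (mount("/dev/sdb1", "/mnt/rescue", "btrfs", MS_RDONLY,
                      "rescue=nologreplay") != 0) {
                    perror("mount");
                    return 1;
            }
            return 0;
    }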
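
  Both the extended FS_INFO ioctl and the per-device error counters
  can be read with a short program against the 5.9 uapi header; a
  hedged sketch (not from the pull request), with the mount path and
  devid made up:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/btrfs.h>

    int main(void)
    {
            struct btrfs_ioctl_fs_info_args fi;
            struct btrfs_ioctl_get_dev_stats ds;
            int fd = open("/mnt", O_RDONLY);  /* any path inside the fs */

            if (fd < 0)
                    return 1;

            /* Request the newly added items; the kernel keeps the
             * flag bits set for the items it filled in. */
            memset(&fi, 0, sizeof(fi));
            fi.flags = BTRFS_FS_INFO_FLAG_CSUM_INFO |
                       BTRFS_FS_INFO_FLAG_GENERATION;
            if (ioctl(fd, BTRFS_IOC_FS_INFO, &fi) == 0)
                    printf("csum type %u size %u generation %llu\n",
                           fi.csum_type, fi.csum_size,
                           (unsigned long long)fi.generation);

            /* Per-device counters, including the corruption counter
             * that is now bumped on checksum mismatches. */
            memset(&ds, 0, sizeof(ds));
            ds.devid = 1;                     /* assumed device id */
            ds.nr_items = BTRFS_DEV_STAT_VALUES_MAX;
            if (ioctl(fd, BTRFS_IOC_GET_DEV_STATS, &ds) == 0)
                    printf("corruption errors: %llu\n",
                           (unsigned long long)ds.values[BTRFS_DEV_STAT_CORRUPTION_ERRS]);
            return 0;
    }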

  Performance improvements:

   - optimized versions of various helpers on little-endian
     architectures, where we don't have to do LE/BE conversion from the
     on-disk format (illustrated by the sketch after this list)

   - tree-log/fsync optimizations leading to lower max latency reported
     by dbench, reduced by about 12%

   - all chunk tree leaves are prefetched at mount time, which can
     improve mount time on large (terabyte-sized) filesystems

   - speed up parallel fsync of files with reflinked/deduped extents:
     with 16 to 1024 jobs, throughput improves by roughly 50% on
     average and runtime decreases by roughly 30% on average; a notable
     outlier is 128 jobs, with +121.2% throughput and -54.6% runtime

   - another speedup of parallel fsync, reducing the number of
     checksum tree lookups and contention: the improvements start at 2
     tasks, with +20% throughput and -16% runtime, and scale up to 64
     tasks, with +200% throughput and -66% runtime
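
  The idea behind the little-endian helpers, as a userspace analogue
  (not the kernel's actual code): skip the byte swap when the host
  order already matches the on-disk (LE) format:

    #include <endian.h>
    #include <stdint.h>
    #include <string.h>

    /* Read a little-endian u64 from an on-disk buffer. */
    static inline uint64_t disk_get_u64(const void *p)
    {
            uint64_t v;

            memcpy(&v, p, sizeof(v));  /* unaligned-safe load */
    #if __BYTE_ORDER == __LITTLE_ENDIAN
            return v;                  /* already in host order */
    #else
            return le64toh(v);         /* big-endian hosts must swap */
    #endif
    }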

  Core:

   - umount-time qgroup leak checker

   - qgroups
      - add a way to unreserve partial range after failure, avoiding
        some EDQUOT errors
      - improved flushing logic when EDQUOT is hit

   - possible EINTR interruption caused by failed reservations after
     transaction start is better handled and documented

   - transaction abort errors are unified to EROFS in case it's not
     the original reason for the abort or we have no other way to
     determine the reason

  Fixes:

   - make truncate succeed on a NOCOW file even if data space is
     exhausted

   - fix cancelling balance on filesystem with exhausted metadata space

   - anon block device:
      - preallocate anon bdev when subvolume is created to report
        failure early
      - shorten time the anon bdev id is allocated
      - don't allocate anon bdev for internal roots

   - minor memory leak in ref-verify

   - refuse invalid combinations of compression and NOCOW file flags
     (see the sketch after this list)

   - lockdep fixes, updating the device locks

   - remove obsolete fallback logic for block group profile adjustments
     when switching from 1 to more devices, which caused allocation of
     unwanted block groups
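
  The NOCOW/compression fix can be observed through the generic file
  flags ioctl; a sketch (not from the pull request), with a made-up
  path, and the exact errno (likely EINVAL) is an assumption:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int main(void)
    {
            int fd = open("/mnt/file", O_RDONLY);  /* hypothetical path */
            int flags;

            if (fd < 0 || ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0)
                    return 1;

            /* NOCOW and compression are mutually exclusive on btrfs;
             * the kernel now refuses the combination instead of
             * accepting an inconsistent inode state. */
            flags |= FS_NOCOW_FL | FS_COMPR_FL;
            if (ioctl(fd, FS_IOC_SETFLAGS, &flags) != 0)
                    printf("rejected: %s\n", strerror(errno));
            return 0;
    }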

  Other cleanups, refactoring, simplifications:

   - conversions from struct inode to struct btrfs_inode in internal
     functions

   - removal of unused struct members"

* tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (151 commits)
  btrfs: do not set the full sync flag on the inode during page release
  btrfs: release old extent maps during page release
  btrfs: fix race between page release and a fast fsync
  btrfs: open-code remount flag setting in btrfs_remount
  btrfs: if we're restriping, use the target restripe profile
  btrfs: don't adjust bg flags and use default allocation profiles
  btrfs: fix lockdep splat from btrfs_dump_space_info
  btrfs: move the chunk_mutex in btrfs_read_chunk_tree
  btrfs: open device without device_list_mutex
  btrfs: sysfs: use NOFS for device creation
  btrfs: return EROFS for BTRFS_FS_STATE_ERROR cases
  btrfs: document special case error codes for fs errors
  btrfs: don't WARN if we abort a transaction with EROFS
  btrfs: reduce contention on log trees when logging checksums
  btrfs: remove done label in writepage_delalloc
  btrfs: add comments for btrfs_reserve_flush_enum
  btrfs: relocation: review the call sites which can be interrupted by signal
  btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree
  btrfs: relocation: allow signal to cancel balance
  btrfs: raid56: remove out label in __raid56_parity_recover
  ...
torvalds committed Aug 3, 2020
2 parents 92b7e49 + 5e548b3 commit 6dec9f4
Showing 47 changed files with 1,909 additions and 1,221 deletions.
fs/btrfs/block-group.c: 211 changes (73 additions & 138 deletions)
@@ -65,11 +65,8 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags)
 	spin_lock(&fs_info->balance_lock);
 	target = get_restripe_target(fs_info, flags);
 	if (target) {
-		/* Pick target profile only if it's already available */
-		if ((flags & target) & BTRFS_EXTENDED_PROFILE_MASK) {
-			spin_unlock(&fs_info->balance_lock);
-			return extended_to_chunk(target);
-		}
+		spin_unlock(&fs_info->balance_lock);
+		return extended_to_chunk(target);
 	}
 	spin_unlock(&fs_info->balance_lock);

@@ -118,12 +115,12 @@ u64 btrfs_get_alloc_profile(struct btrfs_fs_info *fs_info, u64 orig_flags)

 void btrfs_get_block_group(struct btrfs_block_group *cache)
 {
-	atomic_inc(&cache->count);
+	refcount_inc(&cache->refs);
 }
 
 void btrfs_put_block_group(struct btrfs_block_group *cache)
 {
-	if (atomic_dec_and_test(&cache->count)) {
+	if (refcount_dec_and_test(&cache->refs)) {
 		WARN_ON(cache->pinned > 0);
 		WARN_ON(cache->reserved > 0);

@@ -1111,7 +1108,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	if (ret < 0)
 		goto out;
 
-	mutex_lock(&fs_info->chunk_mutex);
 	spin_lock(&block_group->lock);
 	block_group->removed = 1;
 	/*
@@ -1143,8 +1139,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	remove_em = (atomic_read(&block_group->frozen) == 0);
 	spin_unlock(&block_group->lock);
 
-	mutex_unlock(&fs_info->chunk_mutex);
-
 	if (remove_em) {
 		struct extent_map_tree *em_tree;

@@ -1532,21 +1526,70 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
 	spin_unlock(&fs_info->unused_bgs_lock);
 }
 
+static int read_bg_from_eb(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
+			   struct btrfs_path *path)
+{
+	struct extent_map_tree *em_tree;
+	struct extent_map *em;
+	struct btrfs_block_group_item bg;
+	struct extent_buffer *leaf;
+	int slot;
+	u64 flags;
+	int ret = 0;
+
+	slot = path->slots[0];
+	leaf = path->nodes[0];
+
+	em_tree = &fs_info->mapping_tree;
+	read_lock(&em_tree->lock);
+	em = lookup_extent_mapping(em_tree, key->objectid, key->offset);
+	read_unlock(&em_tree->lock);
+	if (!em) {
+		btrfs_err(fs_info,
+			  "logical %llu len %llu found bg but no related chunk",
+			  key->objectid, key->offset);
+		return -ENOENT;
+	}
+
+	if (em->start != key->objectid || em->len != key->offset) {
+		btrfs_err(fs_info,
+			  "block group %llu len %llu mismatch with chunk %llu len %llu",
+			  key->objectid, key->offset, em->start, em->len);
+		ret = -EUCLEAN;
+		goto out_free_em;
+	}
+
+	read_extent_buffer(leaf, &bg, btrfs_item_ptr_offset(leaf, slot),
+			   sizeof(bg));
+	flags = btrfs_stack_block_group_flags(&bg) &
+		BTRFS_BLOCK_GROUP_TYPE_MASK;
+
+	if (flags != (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+		btrfs_err(fs_info,
+			  "block group %llu len %llu type flags 0x%llx mismatch with chunk type flags 0x%llx",
+			  key->objectid, key->offset, flags,
+			  (BTRFS_BLOCK_GROUP_TYPE_MASK & em->map_lookup->type));
+		ret = -EUCLEAN;
+	}
+
+out_free_em:
+	free_extent_map(em);
+	return ret;
+}
+
 static int find_first_block_group(struct btrfs_fs_info *fs_info,
 				  struct btrfs_path *path,
 				  struct btrfs_key *key)
 {
 	struct btrfs_root *root = fs_info->extent_root;
-	int ret = 0;
+	int ret;
 	struct btrfs_key found_key;
 	struct extent_buffer *leaf;
-	struct btrfs_block_group_item bg;
-	u64 flags;
 	int slot;
 
 	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
 	if (ret < 0)
-		goto out;
+		return ret;
 
 	while (1) {
 		slot = path->slots[0];
@@ -1563,49 +1606,10 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info,

 		if (found_key.objectid >= key->objectid &&
 		    found_key.type == BTRFS_BLOCK_GROUP_ITEM_KEY) {
-			struct extent_map_tree *em_tree;
-			struct extent_map *em;
-
-			em_tree = &root->fs_info->mapping_tree;
-			read_lock(&em_tree->lock);
-			em = lookup_extent_mapping(em_tree, found_key.objectid,
-						   found_key.offset);
-			read_unlock(&em_tree->lock);
-			if (!em) {
-				btrfs_err(fs_info,
-			"logical %llu len %llu found bg but no related chunk",
-					  found_key.objectid, found_key.offset);
-				ret = -ENOENT;
-			} else if (em->start != found_key.objectid ||
-				   em->len != found_key.offset) {
-				btrfs_err(fs_info,
-		"block group %llu len %llu mismatch with chunk %llu len %llu",
-					  found_key.objectid, found_key.offset,
-					  em->start, em->len);
-				ret = -EUCLEAN;
-			} else {
-				read_extent_buffer(leaf, &bg,
-					btrfs_item_ptr_offset(leaf, slot),
-					sizeof(bg));
-				flags = btrfs_stack_block_group_flags(&bg) &
-					BTRFS_BLOCK_GROUP_TYPE_MASK;
-
-				if (flags != (em->map_lookup->type &
-					      BTRFS_BLOCK_GROUP_TYPE_MASK)) {
-					btrfs_err(fs_info,
-"block group %llu len %llu type flags 0x%llx mismatch with chunk type flags 0x%llx",
-						found_key.objectid,
-						found_key.offset, flags,
-						(BTRFS_BLOCK_GROUP_TYPE_MASK &
-						 em->map_lookup->type));
-					ret = -EUCLEAN;
-				} else {
-					ret = 0;
-				}
-			}
-			free_extent_map(em);
-			goto out;
+			ret = read_bg_from_eb(fs_info, &found_key, path);
+			break;
 		}
 
 		path->slots[0]++;
 	}
 out:
@@ -1657,19 +1661,12 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 		return -EIO;
 
 	map = em->map_lookup;
-	data_stripe_length = em->len;
+	data_stripe_length = em->orig_block_len;
 	io_stripe_size = map->stripe_len;
 
-	if (map->type & BTRFS_BLOCK_GROUP_RAID10)
-		data_stripe_length = div_u64(data_stripe_length,
-					     map->num_stripes / map->sub_stripes);
-	else if (map->type & BTRFS_BLOCK_GROUP_RAID0)
-		data_stripe_length = div_u64(data_stripe_length, map->num_stripes);
-	else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-		data_stripe_length = div_u64(data_stripe_length,
-					     nr_data_stripes(map));
-		io_stripe_size = map->stripe_len * nr_data_stripes(map);
-	}
+	/* For RAID5/6 adjust to a full IO stripe length */
+	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
+		io_stripe_size = map->stripe_len * nr_data_stripes(map);
 
 	buf = kcalloc(map->num_stripes, sizeof(u64), GFP_NOFS);
 	if (!buf) {
@@ -1748,25 +1745,12 @@ static int exclude_super_stripes(struct btrfs_block_group *cache)
 		return ret;
 
 	while (nr--) {
-		u64 start, len;
-
-		if (logical[nr] > cache->start + cache->length)
-			continue;
-
-		if (logical[nr] + stripe_len <= cache->start)
-			continue;
-
-		start = logical[nr];
-		if (start < cache->start) {
-			start = cache->start;
-			len = (logical[nr] + stripe_len) - start;
-		} else {
-			len = min_t(u64, stripe_len,
-				    cache->start + cache->length - start);
-		}
+		u64 len = min_t(u64, stripe_len,
+				cache->start + cache->length - logical[nr]);
 
 		cache->bytes_super += len;
-		ret = btrfs_add_excluded_extent(fs_info, start, len);
+		ret = btrfs_add_excluded_extent(fs_info, logical[nr],
+						len);
 		if (ret) {
 			kfree(logical);
 			return ret;
@@ -1818,7 +1802,7 @@ static struct btrfs_block_group *btrfs_create_block_group_cache(

 	cache->discard_index = BTRFS_DISCARD_INDEX_UNUSED;
 
-	atomic_set(&cache->count, 1);
+	refcount_set(&cache->refs, 1);
 	spin_lock_init(&cache->lock);
 	init_rwsem(&cache->data_rwsem);
 	INIT_LIST_HEAD(&cache->list);
@@ -2207,54 +2191,6 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 	return 0;
 }
 
-static u64 update_block_group_flags(struct btrfs_fs_info *fs_info, u64 flags)
-{
-	u64 num_devices;
-	u64 stripped;
-
-	/*
-	 * if restripe for this chunk_type is on pick target profile and
-	 * return, otherwise do the usual balance
-	 */
-	stripped = get_restripe_target(fs_info, flags);
-	if (stripped)
-		return extended_to_chunk(stripped);
-
-	num_devices = fs_info->fs_devices->rw_devices;
-
-	stripped = BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID56_MASK |
-		BTRFS_BLOCK_GROUP_RAID1_MASK | BTRFS_BLOCK_GROUP_RAID10;
-
-	if (num_devices == 1) {
-		stripped |= BTRFS_BLOCK_GROUP_DUP;
-		stripped = flags & ~stripped;
-
-		/* turn raid0 into single device chunks */
-		if (flags & BTRFS_BLOCK_GROUP_RAID0)
-			return stripped;
-
-		/* turn mirroring into duplication */
-		if (flags & (BTRFS_BLOCK_GROUP_RAID1_MASK |
-			     BTRFS_BLOCK_GROUP_RAID10))
-			return stripped | BTRFS_BLOCK_GROUP_DUP;
-	} else {
-		/* they already had raid on here, just return */
-		if (flags & stripped)
-			return flags;
-
-		stripped |= BTRFS_BLOCK_GROUP_DUP;
-		stripped = flags & ~stripped;
-
-		/* switch duplicated blocks with raid1 */
-		if (flags & BTRFS_BLOCK_GROUP_DUP)
-			return stripped | BTRFS_BLOCK_GROUP_RAID1;
-
-		/* this is drive concat, leave it alone */
-	}
-
-	return flags;
-}
-
 /*
  * Mark one block group RO, can be called several times for the same block
  * group.
@@ -2300,7 +2236,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 	 * If we are changing raid levels, try to allocate a
 	 * corresponding block group with the new raid level.
 	 */
-	alloc_flags = update_block_group_flags(fs_info, cache->flags);
+	alloc_flags = btrfs_get_alloc_profile(fs_info, cache->flags);
 	if (alloc_flags != cache->flags) {
 		ret = btrfs_chunk_alloc(trans, alloc_flags,
 					CHUNK_ALLOC_FORCE);
@@ -2327,7 +2263,7 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 	ret = inc_block_group_ro(cache, 0);
 out:
 	if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
-		alloc_flags = update_block_group_flags(fs_info, cache->flags);
+		alloc_flags = btrfs_get_alloc_profile(fs_info, cache->flags);
 		mutex_lock(&fs_info->chunk_mutex);
 		check_system_chunk(trans, alloc_flags);
 		mutex_unlock(&fs_info->chunk_mutex);
@@ -2521,7 +2457,8 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
 	num_pages *= 16;
 	num_pages *= PAGE_SIZE;
 
-	ret = btrfs_check_data_free_space(inode, &data_reserved, 0, num_pages);
+	ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, 0,
+					  num_pages);
 	if (ret)
 		goto out_put;

@@ -3392,7 +3329,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 		ASSERT(list_empty(&block_group->dirty_list));
 		ASSERT(list_empty(&block_group->io_list));
 		ASSERT(list_empty(&block_group->bg_list));
-		ASSERT(atomic_read(&block_group->count) == 1);
+		ASSERT(refcount_read(&block_group->refs) == 1);
 		btrfs_put_block_group(block_group);
 
 		spin_lock(&info->block_group_cache_lock);
@@ -3447,15 +3384,13 @@ void btrfs_unfreeze_block_group(struct btrfs_block_group *block_group)
 	spin_unlock(&block_group->lock);
 
 	if (cleanup) {
-		mutex_lock(&fs_info->chunk_mutex);
 		em_tree = &fs_info->mapping_tree;
 		write_lock(&em_tree->lock);
 		em = lookup_extent_mapping(em_tree, block_group->start,
 					   1);
 		BUG_ON(!em); /* logic error, can't happen */
 		remove_extent_mapping(em_tree, em);
 		write_unlock(&em_tree->lock);
-		mutex_unlock(&fs_info->chunk_mutex);
 
 		/* once for us and once for the tree */
 		free_extent_map(em);
fs/btrfs/block-group.h: 3 changes (1 addition & 2 deletions)
@@ -114,8 +114,7 @@ struct btrfs_block_group {
 	/* For block groups in the same raid type */
 	struct list_head list;
 
-	/* Usage count */
-	atomic_t count;
+	refcount_t refs;
 
 	/*
 	 * List of struct btrfs_free_clusters for this block group.
fs/btrfs/btrfs_inode.h: 11 changes (11 additions & 0 deletions)
@@ -151,6 +151,17 @@ struct btrfs_inode {
 	 */
 	u64 last_unlink_trans;
 
+	/*
+	 * The id/generation of the last transaction where this inode was
+	 * either the source or the destination of a clone/dedupe operation.
+	 * Used when logging an inode to know if there are shared extents that
+	 * need special care when logging checksum items, to avoid duplicate
+	 * checksum items in a log (which can lead to a corruption where we end
+	 * up with missing checksum ranges after log replay).
+	 * Protected by the vfs inode lock.
+	 */
+	u64 last_reflink_trans;
+
 	/*
 	 * Number of bytes outstanding that are going to need csums. This is
 	 * used in ENOSPC accounting.