Merge tag 'for-linus-20190726' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - Several io_uring fixes/improvements:
     - Blocking fix for O_DIRECT (me)
     - Latter page slowness for registered buffers (me)
     - Fix poll hang under certain conditions (me)
     - Defer sequence check fix for wrapped rings (Zhengyuan)
     - Mismatch in async inc/dec accounting (Zhengyuan)
     - Memory ordering issue that could cause stall (Zhengyuan)
     - Track sequential defer in bytes, not pages (Zhengyuan)

 - NVMe pull request from Christoph

 - Set of hang fixes for wbt (Josef)

 - Redundant error message kill for libahci (Ding)

 - Remove unused blk_mq_sched_started_request() and related ops (Marcos)

 - drbd dynamic alloc shash descriptor to reduce stack use (Arnd)

 - blkcg ->pd_stat() non-debug print (Tejun)

 - bcache memory leak fix (Wei)

 - Comment fix (Akinobu)

 - BFQ perf regression fix (Paolo)

* tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits)
  io_uring: ensure ->list is initialized for poll commands
  Revert "nvme-pci: don't create a read hctx mapping without read queues"
  nvme: fix multipath crash when ANA is deactivated
  nvme: fix memory leak caused by incorrect subsystem free
  nvme: ignore subnqn for ADATA SX6000LNP
  drbd: dynamically allocate shash descriptor
  block: blk-mq: Remove blk_mq_sched_started_request and started_request
  bcache: fix possible memory leak in bch_cached_dev_run()
  io_uring: track io length in async_list based on bytes
  io_uring: don't use iov_iter_advance() for fixed buffers
  block: properly handle IOCB_NOWAIT for async O_DIRECT IO
  blk-mq: allow REQ_NOWAIT to return an error inline
  io_uring: add a memory barrier before atomic_read
  rq-qos: use a mb for got_token
  rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
  rq-qos: don't reset has_sleepers on spurious wakeups
  rq-qos: fix missed wake-ups in rq_qos_throttle
  wait: add wq_has_single_sleeper helper
  block, bfq: check also in-flight I/O in dispatch plugging
  block: fix sysfs module parameters directory path in comment
  ...
torvalds committed Jul 26, 2019
2 parents 750c930 + 9c0b259 commit 0441281
Showing 20 changed files with 224 additions and 92 deletions.
67 changes: 43 additions & 24 deletions block/bfq-iosched.c
@@ -3354,38 +3354,57 @@ static void bfq_dispatch_remove(struct request_queue *q, struct request *rq)
  * there is no active group, then the primary expectation for
  * this device is probably a high throughput.
  *
- * We are now left only with explaining the additional
- * compound condition that is checked below for deciding
- * whether the scenario is asymmetric. To explain this
- * compound condition, we need to add that the function
+ * We are now left only with explaining the two sub-conditions in the
+ * additional compound condition that is checked below for deciding
+ * whether the scenario is asymmetric. To explain the first
+ * sub-condition, we need to add that the function
  * bfq_asymmetric_scenario checks the weights of only
- * non-weight-raised queues, for efficiency reasons (see
- * comments on bfq_weights_tree_add()). Then the fact that
- * bfqq is weight-raised is checked explicitly here. More
- * precisely, the compound condition below takes into account
- * also the fact that, even if bfqq is being weight-raised,
- * the scenario is still symmetric if all queues with requests
- * waiting for completion happen to be
- * weight-raised. Actually, we should be even more precise
- * here, and differentiate between interactive weight raising
- * and soft real-time weight raising.
+ * non-weight-raised queues, for efficiency reasons (see comments on
+ * bfq_weights_tree_add()). Then the fact that bfqq is weight-raised
+ * is checked explicitly here. More precisely, the compound condition
+ * below takes into account also the fact that, even if bfqq is being
+ * weight-raised, the scenario is still symmetric if all queues with
+ * requests waiting for completion happen to be
+ * weight-raised. Actually, we should be even more precise here, and
+ * differentiate between interactive weight raising and soft real-time
+ * weight raising.
+ *
+ * The second sub-condition checked in the compound condition is
+ * whether there is a fair amount of already in-flight I/O not
+ * belonging to bfqq. If so, I/O dispatching is to be plugged, for the
+ * following reason. The drive may decide to serve in-flight
+ * non-bfqq's I/O requests before bfqq's ones, thereby delaying the
+ * arrival of new I/O requests for bfqq (recall that bfqq is sync). If
+ * I/O-dispatching is not plugged, then, while bfqq remains empty, a
+ * basically uncontrolled amount of I/O from other queues may be
+ * dispatched too, possibly causing the service of bfqq's I/O to be
+ * delayed even longer in the drive. This problem gets more and more
+ * serious as the speed and the queue depth of the drive grow,
+ * because, as these two quantities grow, the probability to find no
+ * queue busy but many requests in flight grows too. By contrast,
+ * plugging I/O dispatching minimizes the delay induced by already
+ * in-flight I/O, and enables bfqq to recover the bandwidth it may
+ * lose because of this delay.
  *
  * As a side note, it is worth considering that the above
- * device-idling countermeasures may however fail in the
- * following unlucky scenario: if idling is (correctly)
- * disabled in a time period during which all symmetry
- * sub-conditions hold, and hence the device is allowed to
- * enqueue many requests, but at some later point in time some
- * sub-condition stops to hold, then it may become impossible
- * to let requests be served in the desired order until all
- * the requests already queued in the device have been served.
+ * device-idling countermeasures may however fail in the following
+ * unlucky scenario: if I/O-dispatch plugging is (correctly) disabled
+ * in a time period during which all symmetry sub-conditions hold, and
+ * therefore the device is allowed to enqueue many requests, but at
+ * some later point in time some sub-condition stops to hold, then it
+ * may become impossible to make requests be served in the desired
+ * order until all the requests already queued in the device have been
+ * served. The last sub-condition commented above somewhat mitigates
+ * this problem for weight-raised queues.
  */
 static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd,
                                                  struct bfq_queue *bfqq)
 {
         return (bfqq->wr_coeff > 1 &&
-                bfqd->wr_busy_queues <
-                bfq_tot_busy_queues(bfqd)) ||
+                (bfqd->wr_busy_queues <
+                 bfq_tot_busy_queues(bfqd) ||
+                 bfqd->rq_in_driver >=
+                 bfqq->dispatched + 4)) ||
                bfq_asymmetric_scenario(bfqd, bfqq);
 }
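To make the new in-flight sub-condition concrete, here is a small stand-alone sketch (user-space C with made-up numbers, not kernel code) of the check bfqd->rq_in_driver >= bfqq->dispatched + 4 that the hunk above adds:

/* Toy illustration of the in-flight sub-condition: dispatch is plugged
 * when the drive already holds a fair amount of I/O that does not
 * belong to bfqq -- here, at least 4 requests more than bfqq itself
 * has dispatched. */
#include <stdbool.h>
#include <stdio.h>

static bool inflight_subcondition(int rq_in_driver, int dispatched)
{
        return rq_in_driver >= dispatched + 4;
}

int main(void)
{
        /* 6 requests in the drive, 2 of them from bfqq: plug dispatch */
        printf("%d\n", inflight_subcondition(6, 2));    /* prints 1 */
        /* 4 in the drive, 2 from bfqq: below the threshold, do not plug */
        printf("%d\n", inflight_subcondition(4, 2));    /* prints 0 */
        return 0;
}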

9 changes: 3 additions & 6 deletions block/blk-cgroup.c
@@ -54,7 +54,7 @@ static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS];
 
 static LIST_HEAD(all_blkcgs);           /* protected by blkcg_pol_mutex */
 
-static bool blkcg_debug_stats = false;
+bool blkcg_debug_stats = false;
 static struct workqueue_struct *blkcg_punt_bio_wq;
 
 static bool blkcg_policy_enabled(struct request_queue *q,
@@ -944,10 +944,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
                          dbytes, dios);
                 }
 
-                if (!blkcg_debug_stats)
-                        goto next;
-
-                if (atomic_read(&blkg->use_delay)) {
+                if (blkcg_debug_stats && atomic_read(&blkg->use_delay)) {
                         has_stats = true;
                         off += scnprintf(buf+off, size-off,
                                          " use_delay=%d delay_nsec=%llu",
@@ -967,7 +964,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
                         has_stats = true;
                         off += written;
                 }
-next:
+
                 if (has_stats) {
                         if (off < size - 1) {
                                 off += scnprintf(buf+off, size-off, "\n");
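Dropping the static storage class is what lets another policy file test the flag; presumably the matching declaration lives in the blk-cgroup header, along the lines of:

/* assumed shared declaration (blk-cgroup header, not shown in this diff) */
extern bool blkcg_debug_stats;

blk-iolatency.c below is the first user of the now-shared flag.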
3 changes: 3 additions & 0 deletions block/blk-iolatency.c
@@ -917,6 +917,9 @@ static size_t iolatency_pd_stat(struct blkg_policy_data *pd, char *buf,
         unsigned long long avg_lat;
         unsigned long long cur_win;
 
+        if (!blkcg_debug_stats)
+                return 0;
+
         if (iolat->ssd)
                 return iolatency_ssd_stat(iolat, buf, size);
 
9 changes: 0 additions & 9 deletions block/blk-mq-sched.h
@@ -61,15 +61,6 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
                 e->type->ops.completed_request(rq, now);
 }
 
-static inline void blk_mq_sched_started_request(struct request *rq)
-{
-        struct request_queue *q = rq->q;
-        struct elevator_queue *e = q->elevator;
-
-        if (e && e->type->ops.started_request)
-                e->type->ops.started_request(rq);
-}
-
 static inline void blk_mq_sched_requeue_request(struct request *rq)
 {
         struct request_queue *q = rq->q;
10 changes: 6 additions & 4 deletions block/blk-mq.c
@@ -669,8 +669,6 @@ void blk_mq_start_request(struct request *rq)
 {
         struct request_queue *q = rq->q;
 
-        blk_mq_sched_started_request(rq);
-
         trace_block_rq_issue(q, rq);
 
         if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
@@ -1960,9 +1958,13 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
         rq = blk_mq_get_request(q, bio, &data);
         if (unlikely(!rq)) {
                 rq_qos_cleanup(q, bio);
-                if (bio->bi_opf & REQ_NOWAIT)
+
+                cookie = BLK_QC_T_NONE;
+                if (bio->bi_opf & REQ_NOWAIT_INLINE)
+                        cookie = BLK_QC_T_EAGAIN;
+                else if (bio->bi_opf & REQ_NOWAIT)
                         bio_wouldblock_error(bio);
-                return BLK_QC_T_NONE;
+                return cookie;
         }
 
         trace_block_getrq(q, bio, bio->bi_opf);
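The new REQ_NOWAIT_INLINE branch reports an allocation failure through the returned cookie instead of completing the bio with an error, so an async O_DIRECT submitter can fail the iocb with -EAGAIN before anything is queued. A hedged caller-side sketch (modeled on the companion O_DIRECT fix in this pull; the error label is illustrative):

/* sketch: submit with inline-failure semantics and check the cookie
 * rather than waiting for a bio completion */
bio->bi_opf |= REQ_NOWAIT_INLINE;
qc = submit_bio(bio);
if (qc == BLK_QC_T_EAGAIN) {
        ret = -EAGAIN;          /* nothing was queued; safe to bail out */
        goto out;               /* hypothetical cleanup label */
}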
7 changes: 6 additions & 1 deletion block/blk-rq-qos.c
@@ -202,6 +202,7 @@ static int rq_qos_wake_function(struct wait_queue_entry *curr,
                 return -1;
 
         data->got_token = true;
+        smp_wmb();
         list_del_init(&curr->entry);
         wake_up_process(data->task);
         return 1;
@@ -244,7 +245,9 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
                 return;
 
         prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
+        has_sleeper = !wq_has_single_sleeper(&rqw->wait);
         do {
+                /* The memory barrier in set_task_state saves us here. */
                 if (data.got_token)
                         break;
                 if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
@@ -255,12 +258,14 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
                          * which means we now have two. Put our local token
                          * and wake anyone else potentially waiting for one.
                          */
+                        smp_rmb();
                         if (data.got_token)
                                 cleanup_cb(rqw, private_data);
                         break;
                 }
                 io_schedule();
-                has_sleeper = false;
+                has_sleeper = true;
+                set_current_state(TASK_UNINTERRUPTIBLE);
         } while (1);
         finish_wait(&rqw->wait, &data.wq);
 }
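The smp_wmb() in rq_qos_wake_function() pairs with the smp_rmb() added above: the waker publishes got_token before issuing the wakeup, and the woken task orders its read of got_token after waking, while the new wq_has_single_sleeper() helper (added in this same series) lets the first waiter keep competing for the inflight counter. A minimal runnable user-space analogue of that pairing, using C11 release/acquire in place of the kernel barriers:

/* Minimal user-space analogue (C11 atomics) of the got_token handshake.
 * The release store plays the role of smp_wmb() in rq_qos_wake_function();
 * the acquire load plays the role of the reader-side smp_rmb(). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;                     /* written before the flag */
static atomic_bool got_token;

static void *waker(void *arg)
{
        payload = 42;
        atomic_store_explicit(&got_token, 1, memory_order_release);
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, waker, NULL);
        while (!atomic_load_explicit(&got_token, memory_order_acquire))
                ;                       /* spin; the kernel code sleeps instead */
        /* release/acquire guarantees we observe payload == 42 here */
        printf("payload=%d\n", payload);
        pthread_join(t, NULL);
        return 0;
}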
2 changes: 1 addition & 1 deletion block/genhd.c
@@ -1969,7 +1969,7 @@ static const struct attribute *disk_events_attrs[] = {
  * The default polling interval can be specified by the kernel
  * parameter block.events_dfl_poll_msecs which defaults to 0
  * (disable). This can also be modified runtime by writing to
- * /sys/module/block/events_dfl_poll_msecs.
+ * /sys/module/block/parameters/events_dfl_poll_msecs.
  */
 static int disk_events_set_dfl_poll_msecs(const char *val,
                                           const struct kernel_param *kp)
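The comment fix matches how module parameters are actually exposed: module_param() publishes a parameter under /sys/module/<owner>/parameters/, not directly under /sys/module/<owner>/. A generic sketch (hypothetical parameter name, not the genhd code):

#include <linux/moduleparam.h>

static int my_poll_msecs;       /* hypothetical parameter */
module_param(my_poll_msecs, int, 0644);
/* readable/writable at /sys/module/<owner>/parameters/my_poll_msecs */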
1 change: 0 additions & 1 deletion drivers/ata/libahci_platform.c
@@ -408,7 +408,6 @@ struct ahci_host_priv *ahci_platform_get_resources(struct platform_device *pdev,
         hpriv->mmio = devm_ioremap_resource(dev,
                               platform_get_resource(pdev, IORESOURCE_MEM, 0));
         if (IS_ERR(hpriv->mmio)) {
-                dev_err(dev, "no mmio space\n");
                 rc = PTR_ERR(hpriv->mmio);
                 goto err_out;
         }
14 changes: 12 additions & 2 deletions drivers/block/drbd/drbd_receiver.c
@@ -5417,7 +5417,7 @@ static int drbd_do_auth(struct drbd_connection *connection)
         unsigned int key_len;
         char secret[SHARED_SECRET_MAX]; /* 64 byte */
         unsigned int resp_size;
-        SHASH_DESC_ON_STACK(desc, connection->cram_hmac_tfm);
+        struct shash_desc *desc;
         struct packet_info pi;
         struct net_conf *nc;
         int err, rv;
@@ -5430,6 +5430,13 @@ static int drbd_do_auth(struct drbd_connection *connection)
         memcpy(secret, nc->shared_secret, key_len);
         rcu_read_unlock();
 
+        desc = kmalloc(sizeof(struct shash_desc) +
+                       crypto_shash_descsize(connection->cram_hmac_tfm),
+                       GFP_KERNEL);
+        if (!desc) {
+                rv = -1;
+                goto fail;
+        }
         desc->tfm = connection->cram_hmac_tfm;
 
         rv = crypto_shash_setkey(connection->cram_hmac_tfm, (u8 *)secret, key_len);
@@ -5571,7 +5578,10 @@ static int drbd_do_auth(struct drbd_connection *connection)
         kfree(peers_ch);
         kfree(response);
         kfree(right_response);
-        shash_desc_zero(desc);
+        if (desc) {
+                shash_desc_zero(desc);
+                kfree(desc);
+        }
 
         return rv;
 }
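SHASH_DESC_ON_STACK() reserves the transform-dependent descriptor on the stack, which can be large for some algorithms; the fix above moves it to the heap. The general pattern, as a sketch under the same assumptions (a struct crypto_shash *tfm already allocated):

/* sketch: heap-allocated shash descriptor sized for the tfm */
struct shash_desc *desc;

desc = kmalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
if (!desc)
        return -ENOMEM;
desc->tfm = tfm;
/* ... crypto_shash_setkey() / crypto_shash_digest() as in the code above ... */
shash_desc_zero(desc);          /* wipe potential key material before freeing */
kfree(desc);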
3 changes: 3 additions & 0 deletions drivers/md/bcache/super.c
@@ -931,6 +931,9 @@ int bch_cached_dev_run(struct cached_dev *dc)
         if (dc->io_disable) {
                 pr_err("I/O disabled on cached dev %s",
                        dc->backing_dev_name);
+                kfree(env[1]);
+                kfree(env[2]);
+                kfree(buf);
                 return -EIO;
         }
 
12 changes: 5 additions & 7 deletions drivers/nvme/host/core.c
@@ -2311,17 +2311,15 @@ static void nvme_init_subnqn(struct nvme_subsystem *subsys, struct nvme_ctrl *ct
         memset(subsys->subnqn + off, 0, sizeof(subsys->subnqn) - off);
 }
 
-static void __nvme_release_subsystem(struct nvme_subsystem *subsys)
+static void nvme_release_subsystem(struct device *dev)
 {
+        struct nvme_subsystem *subsys =
+                container_of(dev, struct nvme_subsystem, dev);
+
         ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
         kfree(subsys);
 }
 
-static void nvme_release_subsystem(struct device *dev)
-{
-        __nvme_release_subsystem(container_of(dev, struct nvme_subsystem, dev));
-}
-
 static void nvme_destroy_subsystem(struct kref *ref)
 {
         struct nvme_subsystem *subsys =
@@ -2477,7 +2475,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
         mutex_lock(&nvme_subsystems_lock);
         found = __nvme_find_get_subsystem(subsys->subnqn);
         if (found) {
-                __nvme_release_subsystem(subsys);
+                put_device(&subsys->dev);
                 subsys = found;
 
                 if (!nvme_validate_cntlid(subsys, ctrl, id)) {
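The leak fix follows the driver-core ownership rule: once device_initialize() has run on subsys->dev, the object may only be freed through its release callback (now nvme_release_subsystem() itself), so error paths must drop the reference with put_device() rather than kfree() the containing structure directly. A sketch of the rule as applied here (abridged; the initialization lines come from context not shown in this hunk):

device_initialize(&subsys->dev);
subsys->dev.release = nvme_release_subsystem;   /* frees ida slot + subsys */
/* ... */
if (found) {
        put_device(&subsys->dev);       /* ends in nvme_release_subsystem() */
        subsys = found;
}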
8 changes: 2 additions & 6 deletions drivers/nvme/host/multipath.c
@@ -12,11 +12,6 @@ module_param(multipath, bool, 0444);
 MODULE_PARM_DESC(multipath,
         "turn on native support for multiple controllers per subsystem");
 
-inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
-{
-        return multipath && ctrl->subsys && (ctrl->subsys->cmic & (1 << 3));
-}
-
 /*
  * If multipathing is enabled we need to always use the subsystem instance
  * number for numbering our devices to avoid conflicts between subsystems that
@@ -622,7 +617,8 @@ int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 {
         int error;
 
-        if (!nvme_ctrl_use_ana(ctrl))
+        /* check if multipath is enabled and we have the capability */
+        if (!multipath || !ctrl->subsys || !(ctrl->subsys->cmic & (1 << 3)))
                 return 0;
 
         ctrl->anacap = id->anacap;
6 changes: 5 additions & 1 deletion drivers/nvme/host/nvme.h
@@ -485,7 +485,11 @@ extern const struct attribute_group *nvme_ns_id_attr_groups[];
 extern const struct block_device_operations nvme_ns_head_ops;
 
 #ifdef CONFIG_NVME_MULTIPATH
-bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl);
+static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
+{
+        return ctrl->ana_log_buf != NULL;
+}
+
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
                         struct nvme_ctrl *ctrl, int *flags);
 void nvme_failover_req(struct request *req);
6 changes: 3 additions & 3 deletions drivers/nvme/host/pci.c
@@ -2254,9 +2254,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
         if (!dev->ctrl.tagset) {
                 dev->tagset.ops = &nvme_mq_ops;
                 dev->tagset.nr_hw_queues = dev->online_queues - 1;
-                dev->tagset.nr_maps = 1; /* default */
-                if (dev->io_queues[HCTX_TYPE_READ])
-                        dev->tagset.nr_maps++;
+                dev->tagset.nr_maps = 2; /* default + read */
                 if (dev->io_queues[HCTX_TYPE_POLL])
                         dev->tagset.nr_maps++;
                 dev->tagset.timeout = NVME_IO_TIMEOUT;
@@ -3029,6 +3027,8 @@ static const struct pci_device_id nvme_id_table[] = {
                 .driver_data = NVME_QUIRK_LIGHTNVM, },
         { PCI_DEVICE(0x1d1d, 0x2601),   /* CNEX Granby */
                 .driver_data = NVME_QUIRK_LIGHTNVM, },
+        { PCI_DEVICE(0x10ec, 0x5762),   /* ADATA SX6000LNP */
+                .driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, },
         { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001) },
         { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },