Skip to content

Commit

Permalink
xprtrdma: Fix corner cases when handling device removal
Browse files Browse the repository at this point in the history
Michal Kalderon has found some corner cases around device unload
with active NFS mounts that I didn't have the imagination to test
when xprtrdma device removal was added last year.

- The ULP device removal handler is responsible for deallocating
  the PD. That wasn't clear to me initially, and my own testing
  suggested it was not necessary, but that is incorrect.

- The transport destruction path can no longer assume that there
  is a valid ID.

- When destroying a transport, ensure that ib_free_cq() is not
  invoked on a CQ that was already released.

Reported-by: Michal Kalderon <[email protected]>
Fixes: bebd031 ("xprtrdma: Support unplugging an HCA from ...")
Signed-off-by: Chuck Lever <[email protected]>
Cc: [email protected] # v4.12+
Signed-off-by: Anna Schumaker <[email protected]>
  • Loading branch information
chucklever authored and amschuma-ntap committed Apr 10, 2018
1 parent 5717459 commit 2552428
Showing 1 changed file with 9 additions and 4 deletions.
13 changes: 9 additions & 4 deletions net/sunrpc/xprtrdma/verbs.c
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,6 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
wait_for_completion(&ia->ri_remove_done);

ia->ri_id = NULL;
ia->ri_pd = NULL;
ia->ri_device = NULL;
/* Return 1 to ensure the core destroys the id. */
return 1;
Expand Down Expand Up @@ -447,7 +446,9 @@ rpcrdma_ia_remove(struct rpcrdma_ia *ia)
ia->ri_id->qp = NULL;
}
ib_free_cq(ep->rep_attr.recv_cq);
ep->rep_attr.recv_cq = NULL;
ib_free_cq(ep->rep_attr.send_cq);
ep->rep_attr.send_cq = NULL;

/* The ULP is responsible for ensuring all DMA
* mappings and MRs are gone.
Expand All @@ -460,6 +461,8 @@ rpcrdma_ia_remove(struct rpcrdma_ia *ia)
rpcrdma_dma_unmap_regbuf(req->rl_recvbuf);
}
rpcrdma_mrs_destroy(buf);
ib_dealloc_pd(ia->ri_pd);
ia->ri_pd = NULL;

/* Allow waiters to continue */
complete(&ia->ri_remove_done);
Expand Down Expand Up @@ -627,14 +630,16 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
{
cancel_delayed_work_sync(&ep->rep_connect_worker);

if (ia->ri_id->qp) {
if (ia->ri_id && ia->ri_id->qp) {
rpcrdma_ep_disconnect(ep, ia);
rdma_destroy_qp(ia->ri_id);
ia->ri_id->qp = NULL;
}

ib_free_cq(ep->rep_attr.recv_cq);
ib_free_cq(ep->rep_attr.send_cq);
if (ep->rep_attr.recv_cq)
ib_free_cq(ep->rep_attr.recv_cq);
if (ep->rep_attr.send_cq)
ib_free_cq(ep->rep_attr.send_cq);
}

/* Re-establish a connection after a device removal event.
Expand Down

0 comments on commit 2552428

Please sign in to comment.