Better recheck of dead upstream servers.
Previously nginx marked a backend as live again as soon as fail_timeout
(10s by default) had passed since the last failure.  Detecting a dead
backend, on the other hand, takes up to 60s (proxy_connect_timeout) in the
typical situation where the backend is down and doesn't respond to any
packets.  This resulted in suboptimal behaviour in that situation (up to
23% of requests were directed to the dead backend with default settings).

A more detailed description of the problem may be found here (in Russian):
http://mailman.nginx.org/pipermail/nginx-ru/2011-August/042172.html

The fix is to allow only one request after fail_timeout passes, and to
mark the backend as "live" only if this request succeeds.

Note that with the new code a backend will not be marked "live" until the
"check" request has completed, and this may take a while in some specific
workloads (e.g. streaming).  This is believed to be acceptable.
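
The check-request logic described above can be sketched in isolation. The following is a simplified model, not the nginx code itself: the type `peer_t` and the functions `peer_try` and `peer_done` are hypothetical names introduced for illustration, while the field names (`fails`, `accessed`, `checked`, `max_fails`, `fail_timeout`) follow the diff. It assumes a peer counts as failed once `fails` reaches `max_fails`.

```c
#include <time.h>

/* Hypothetical, simplified peer state; field names follow the diff. */
typedef struct {
    unsigned  fails;         /* failures counted so far */
    time_t    accessed;      /* time of the last failure */
    time_t    checked;       /* time of the last failure or check grant */
    unsigned  max_fails;     /* 0 disables failure accounting */
    time_t    fail_timeout;
} peer_t;

/* Returns 1 if a request may be sent to the peer at time `now`. */
static int
peer_try(peer_t *p, time_t now)
{
    if (p->max_fails == 0 || p->fails < p->max_fails) {
        return 1;                   /* peer is considered live */
    }

    if (now - p->checked > p->fail_timeout) {
        p->checked = now;           /* admit exactly one "check" request */
        return 1;
    }

    return 0;                       /* peer is still considered dead */
}

/* Records the outcome of a request that finished at time `now`. */
static void
peer_done(peer_t *p, time_t now, int failed)
{
    if (failed) {
        p->fails++;
        p->accessed = now;
        p->checked = now;
        return;
    }

    /* mark peer live if check passed */
    if (p->accessed < p->checked) {
        p->fails = 0;
    }
}
```

In this model, updating `checked` when the check request is admitted is what keeps further requests away from the peer until either the check succeeds (resetting `fails`) or another `fail_timeout` elapses.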
mdounin committed Oct 12, 2011
1 parent 72df0f4 commit b713e48
Showing 3 changed files with 18 additions and 8 deletions.
4 changes: 2 additions & 2 deletions src/http/modules/ngx_http_upstream_ip_hash_module.c
@@ -185,8 +185,8 @@ ngx_http_upstream_get_ip_hash_peer(ngx_peer_connection_t *pc, void *data)
break;
}

-if (now - peer->accessed > peer->fail_timeout) {
-peer->fails = 0;
+if (now - peer->checked > peer->fail_timeout) {
+peer->checked = now;
break;
}
}
21 changes: 15 additions & 6 deletions src/http/ngx_http_upstream_round_robin.c
@@ -443,8 +443,8 @@ ngx_http_upstream_get_round_robin_peer(ngx_peer_connection_t *pc, void *data)
break;
}

-if (now - peer->accessed > peer->fail_timeout) {
-peer->fails = 0;
+if (now - peer->checked > peer->fail_timeout) {
+peer->checked = now;
break;
}

@@ -491,8 +491,8 @@ ngx_http_upstream_get_round_robin_peer(ngx_peer_connection_t *pc, void *data)
break;
}

-if (now - peer->accessed > peer->fail_timeout) {
-peer->fails = 0;
+if (now - peer->checked > peer->fail_timeout) {
+peer->checked = now;
break;
}

@@ -663,15 +663,16 @@ ngx_http_upstream_free_round_robin_peer(ngx_peer_connection_t *pc, void *data,
return;
}

+peer = &rrp->peers->peer[rrp->current];
+
if (state & NGX_PEER_FAILED) {
now = ngx_time();

-peer = &rrp->peers->peer[rrp->current];
-
/* ngx_lock_mutex(rrp->peers->mutex); */

peer->fails++;
peer->accessed = now;
+peer->checked = now;

if (peer->max_fails) {
peer->current_weight -= peer->weight / peer->max_fails;
@@ -686,6 +687,14 @@ ngx_http_upstream_free_round_robin_peer(ngx_peer_connection_t *pc, void *data,
}

/* ngx_unlock_mutex(rrp->peers->mutex); */
+
+} else {
+
+/* mark peer live if check passed */
+
+if (peer->accessed < peer->checked) {
+peer->fails = 0;
+}
}

rrp->current++;
1 change: 1 addition & 0 deletions src/http/ngx_http_upstream_round_robin.h
@@ -23,6 +23,7 @@ typedef struct {

ngx_uint_t fails;
time_t accessed;
+time_t checked;

ngx_uint_t max_fails;
time_t fail_timeout;