Feature request: load monitoring #21

Open
bvdeenen opened this issue Feb 2, 2012 · 7 comments

@bvdeenen
Collaborator

bvdeenen commented Feb 2, 2012

Hi all

Since I'm on a roll with emysql, I'd like to talk about a feature we could use. We can probably implement it here (at spilgames.com), but I'd like your thoughts on it.

What I'd like is some kind of emysql load monitor figure, so that I can slow down front-end nodes if they start hammering emysql too hard, and also tell the money guys to buy new hardware when required. I know that during testing I can hit the connection_lock_timeout quite easily (just spawn 50k requests at the same time :-). We plan to use emysql in a production environment with dozens of servers running probably hundreds of Erlang nodes. I need some sort of load figure in emysql so that I can control the total flow through our cluster.

I was thinking of hooking something to emysql_conn_mgr:wait_for_connection/1 to count the number of requests that are waiting for a connection, and create an interface function emysql:install_load_callback that makes it possible to do something with it.
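
To make that concrete, here is a rough sketch of the kind of thing I have in mind. The load_monitor module and all of its function names are placeholders, nothing like this exists in emysql today:

```erlang
-module(load_monitor).
-export([start_link/0, install_callback/1, enter_wait/0, leave_wait/0]).

%% Keep a counter of requests currently waiting for a connection and
%% invoke a user-supplied callback whenever it changes. The idea would
%% be to call enter_wait/0 and leave_wait/0 from around
%% emysql_conn_mgr:wait_for_connection/1.
start_link() ->
    Pid = spawn_link(fun() -> loop(0, fun(_N) -> ok end) end),
    register(?MODULE, Pid),
    {ok, Pid}.

install_callback(Fun) when is_function(Fun, 1) ->
    ?MODULE ! {install, Fun},
    ok.

enter_wait() -> ?MODULE ! {delta, +1}, ok.
leave_wait() -> ?MODULE ! {delta, -1}, ok.

loop(N, Callback) ->
    receive
        {install, Fun} -> loop(N, Fun);
        {delta, D}     -> N1 = N + D, Callback(N1), loop(N1, Callback)
    end.
```

The callback would then be the place where an application decides to throttle, log, or alert.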

Is this in your opinion the best way/place to do it? If the code is good, would you like to include it in the eonblast/emysql repository? Any other suggestions?

Bart

@Eonblast
Owner

Eonblast commented Feb 2, 2012

Hi Bart,

sounds very good to me, certainly interesting to merge into the package if you get it to work.

To get a better picture, how could you slow down the front-end nodes? By telling users to wait and come back later? Or by slowing down game update rates?

In other words: what remedy for overload do you see concretely? Like, what could your callback achieve? If you monitor or limit the queues, should an execute() call simply return an error message? Or do you see a scenario where the problem could actually be alleviated (like firing up more machines in the cloud), and the queues thus be allowed to grow further until the extra power arrives?

Maybe wait times would be more important to look at than the actual length of the queues.

The easiest thing I can think of is measuring the time spent in the queue (both wallclock and CPU), checking the length of the queue, setting hard limits, and returning errors instead of results when the thresholds are crossed.
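
A minimal sketch of that hard-limit idea, assuming some function eventually exposes the wait-queue length; queue_len/1 below is only a stand-in, and emysql_guard is not part of emysql:

```erlang
-module(emysql_guard).
-export([execute/2]).

-define(MAX_QUEUE_LEN, 200).

%% Placeholder for however the wait-queue length ends up being exposed.
queue_len(_PoolId) -> 0.

%% Refuse work up front when the wait queue is already too long, instead
%% of letting callers pile up until connection_lock_timeout hits, and
%% record the wallclock time a call took (queueing plus query).
execute(PoolId, Sql) ->
    case queue_len(PoolId) of
        Len when Len > ?MAX_QUEUE_LEN ->
            {error, overloaded};
        _ ->
            {Micros, Result} = timer:tc(emysql, execute, [PoolId, Sql]),
            error_logger:info_msg("pool ~p query took ~p us~n",
                                  [PoolId, Micros]),
            Result
    end.
```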

As for monitoring, you'd have those three values exposed as functions and poll them now and then. Or they could be dumped into a revolving log.

Henning

@bvdeenen
Collaborator Author

bvdeenen commented Feb 2, 2012

Hi Henning

my front-end nodes are channeled through what I call pipeline processes, which provide atomic interactions with parts of the databases. These calls are blocking, and I could easily add a timer:sleep() call there before I let them indirectly call emysql:execute. There are a limited number of pipeline_factory processes that direct the front-end nodes to the pipelines, and I can slow things down easily there. But I also really need some sort of load figure, so that we can see how busy it is and know when to hook up more hardware.
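
Roughly what I mean by slowing down, assuming emysql exposed some load figure; load_figure/1 and the whole module are only placeholders:

```erlang
-module(pipeline_throttle).
-export([maybe_throttle/1]).

%% Placeholder for whatever load figure emysql ends up exposing,
%% here assumed to be a float between 0.0 (idle) and 1.0 (saturated).
load_figure(_PoolId) -> 0.0.

%% Called in the pipeline process right before it (indirectly) lets a
%% front-end request reach emysql:execute: back off proportionally to
%% how busy the pool is.
maybe_throttle(PoolId) ->
    case load_figure(PoolId) of
        Load when Load > 0.8 -> timer:sleep(trunc(Load * 500));  % milliseconds
        _ -> ok
    end.
```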

So you agree that emysql_conn_mgr:wait_for_connection/1 is a good place to start working on this?

I'll give it a shot in a few weeks; I'm too busy with something else at the moment.

@Eonblast
Owner

Eonblast commented Feb 2, 2012

Yes, maybe one level deeper, in handle_call({start_wait, PoolId}, {From, _Mref}, State): that is where the actual wait queue comes into play, instead of an execute() call getting a connection right away.

But for your purpose it may be better to check the lengths of the queues every now and then instead and take action. Otherwise you have the callback triggered for every queued process once the queue reaches a threshold size. That's probably overhead you want to avoid in a situation where resources are getting scarce in the first place.

Getting the lengths of the queues of course is trivial.
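
A sketch of that periodic check, with the actual length lookup left abstract since the accessor doesn't exist yet; the poller module itself is only an illustration:

```erlang
-module(queue_poller).
-export([start_link/2]).

%% Sample a queue-length function every second and warn when it crosses
%% the threshold. LenFun stands in for whatever accessor emysql ends up
%% providing for the length of a pool's wait queue.
start_link(LenFun, Threshold) when is_function(LenFun, 0) ->
    {ok, spawn_link(fun() -> loop(LenFun, Threshold) end)}.

loop(LenFun, Threshold) ->
    case LenFun() of
        Len when Len > Threshold ->
            error_logger:warning_msg("emysql wait queue length is ~p~n", [Len]);
        _ ->
            ok
    end,
    timer:sleep(1000),
    loop(LenFun, Threshold).
```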

If you put your pipelines to sleep, who is in turn waiting for the pipelines? I still wonder what the effective solution is in your case. Somebody has to wait or come back later. Who is eventually asked to be patient in your setup?

See you around,
Henning

@bvdeenen
Collaborator Author

bvdeenen commented Feb 2, 2012

I haven't thought about it in detail, but my stuff is being called from front-end servers that handle HTTP traffic from browsers and Flash games. The front-end servers are stateless, so I think that by just delaying the handling of the requests, I'm making the browsers and Flash games slow down their requests.
But I have to be careful not to let them wait too long, or some clever guy will put a retry mechanism in their JavaScript and I'll get hammered at double the speed as soon as I start slowing down.

We're still in the discussion stage here, and it's not yet a priority. I'd just like to get some hooks into emysql to see how busy it is.

@hdiedrich
Collaborator

I have worked something out for this. Any progress on your side?

Best,
Henning

@jlouis
Collaborator

jlouis commented Jul 2, 2013

Hey guys, how is this being handled here, six months later? Do we have a fix, or is this still in the to-do stage?

I am monitoring the pool status and outputting something when my pools are running low. But I think it makes a lot of sense to invoke alarm_handler when the pool is exhausted. This would let you know when the system becomes overloaded, and you could then piggy-back on that in your own system and dampen the aggression on the driver.
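
A minimal sketch of that alarm idea, using the standard SASL alarm_handler; pool_free/1 is only a stand-in for however the pool state gets exposed:

```erlang
-module(pool_alarm).
-export([check/1]).

%% Placeholder for whatever exposes the number of free connections.
pool_free(_PoolId) -> 0.

%% Raise a SASL alarm while the pool is exhausted and clear it again
%% once connections free up; other parts of the system can subscribe
%% to alarm_handler and dampen their load accordingly.
check(PoolId) ->
    case pool_free(PoolId) of
        0 -> alarm_handler:set_alarm({{emysql_pool_exhausted, PoolId},
                                      "no free connections"});
        _ -> alarm_handler:clear_alarm({emysql_pool_exhausted, PoolId})
    end.
```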

@jlouis
Collaborator

jlouis commented Feb 26, 2014

I still think we need a way to query the pool state, and a way for the pool to set alarms when it is exhausted and the wait queue is very long.

@jlouis jlouis added the Enhance label Feb 27, 2014