Feature request: load monitoring #21

Open
bvdeenen opened this issue Feb 2, 2012 · 7 comments

@bvdeenen
Collaborator

bvdeenen commented Feb 2, 2012

Hi all

Since I'm on a roll with emysql, I'd like to talk about a feature we could use. We can probably implement it here (at spilgames.com), but I'd like your thoughts on it.

What I'd like is some kind of emysql load monitor figure, so that I can slow down front-end nodes if they start hammering emysql too hard, and also tell the money guys to buy new hardware when required. I know that during testing I can hit the connection_lock_timeout quite easily (just spawn 50k requests at the same time :-). We plan to use emysql in a production environment with dozens of servers running probably hundreds of Erlang nodes. I need some sort of load figure in emysql so that I can control the total flow through our cluster.

I was thinking of hooking something to emysql_conn_mgr:wait_for_connection/1 to count the number of requests that are waiting for a connection, and create an interface function emysql:install_load_callback that makes it possible to do something with it.
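
To make that concrete, here is a rough sketch of the kind of thing I have in mind. The load_monitor module and all of its function names are placeholders, nothing like this exists in emysql today:

```erlang
-module(load_monitor).
-export([start_link/0, install_callback/1, enter_wait/0, leave_wait/0]).

%% Keep a counter of requests currently waiting for a connection and
%% invoke a user-supplied callback whenever it changes. The idea would
%% be to call enter_wait/0 and leave_wait/0 from around
%% emysql_conn_mgr:wait_for_connection/1.
start_link() ->
    Pid = spawn_link(fun() -> loop(0, fun(_N) -> ok end) end),
    register(?MODULE, Pid),
    {ok, Pid}.

install_callback(Fun) when is_function(Fun, 1) ->
    ?MODULE ! {install, Fun},
    ok.

enter_wait() -> ?MODULE ! {delta, +1}, ok.
leave_wait() -> ?MODULE ! {delta, -1}, ok.

loop(N, Callback) ->
    receive
        {install, Fun} -> loop(N, Fun);
        {delta, D}     -> N1 = N + D, Callback(N1), loop(N1, Callback)
    end.
```

The callback would then be the place where an application decides to throttle, log, or alert.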

Is this in your opinion the best way/place to do it? If the code is good, would you like to include it in the eonblast/emysql repository? Any other suggestions?

Bart

@Eonblast
Owner

Eonblast commented Feb 2, 2012

Hi Bart,

sounds very good to me, certainly interesting to merge into the package if you get it to work.

To get a better picture, how could you slow down the front-end nodes? By telling users to wait and come back later? Or by slowing down game update rates?

In other words: what remedy for overload do you see concretely? Like, what could your callback achieve? If you monitor or limit the queues, should an execute() call simply return an error message? Or do you see a scenario where the problem could actually be alleviated (like firing up more machines in the cloud), and the queues thus be allowed to grow further until the extra power arrives?

Maybe wait times would be more important to look at than the actual length of the queues.

The easiest thing I can think of is measuring the time spent in the queue (both wallclock and CPU), checking the length of the queue, setting hard limits, and returning errors instead of results when the thresholds are crossed.
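
A minimal sketch of that hard-limit idea, assuming some function eventually exposes the wait-queue length; queue_len/1 below is only a stand-in, and emysql_guard is not part of emysql:

```erlang
-module(emysql_guard).
-export([execute/2]).

-define(MAX_QUEUE_LEN, 200).

%% Placeholder for however the wait-queue length ends up being exposed.
queue_len(_PoolId) -> 0.

%% Refuse work up front when the wait queue is already too long, instead
%% of letting callers pile up until connection_lock_timeout hits, and
%% record the wallclock time a call took (queueing plus query).
execute(PoolId, Sql) ->
    case queue_len(PoolId) of
        Len when Len > ?MAX_QUEUE_LEN ->
            {error, overloaded};
        _ ->
            {Micros, Result} = timer:tc(emysql, execute, [PoolId, Sql]),
            error_logger:info_msg("pool ~p query took ~p us~n",
                                  [PoolId, Micros]),
            Result
    end.
```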

As for monitoring, you'd have those three values exposed as functions and poll them now and then. Or they could be dumped into a revolving log.

Henning

@bvdeenen
Collaborator Author

bvdeenen commented Feb 2, 2012

Hi Henning

my front-end nodes are channeled through what I call pipeline processes, which provide atomic interactions with parts of the databases. These calls are blocking, and I could easily add a timer:sleep() call there before I let them indirectly call emysql:execute. There are a limited number of pipeline_factory processes that direct the front-end nodes to the pipelines, and I can slow things down easily there. But I also really need some sort of load figure, so that we can see how busy it is and know when to hook up more hardware.
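
Roughly what I mean by slowing down, assuming emysql exposed some load figure; load_figure/1 and the whole module are only placeholders:

```erlang
-module(pipeline_throttle).
-export([maybe_throttle/1]).

%% Placeholder for whatever load figure emysql ends up exposing,
%% here assumed to be a float between 0.0 (idle) and 1.0 (saturated).
load_figure(_PoolId) -> 0.0.

%% Called in the pipeline process right before it (indirectly) lets a
%% front-end request reach emysql:execute: back off proportionally to
%% how busy the pool is.
maybe_throttle(PoolId) ->
    case load_figure(PoolId) of
        Load when Load > 0.8 -> timer:sleep(trunc(Load * 500));  % milliseconds
        _ -> ok
    end.
```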

So you agree that emysql_conn_mgr:wait_for_connection/1 is a good place to start working on this?

I'll give it a shot in a few weeks; I'm too busy with something else at the moment.

@Eonblast
Owner

Eonblast commented Feb 2, 2012

Yes, maybe one level deeper, in handle_call({start_wait, PoolId}, {From, _Mref}, State): that is where the actual wait queue comes into play, instead of an execute() call getting a connection right away.

But for your purpose it may be better to check the lengths of the queues every now and then instead and take action. Otherwise you have the callback triggered for every queued process once the queue reaches a threshold size. That's probably overhead you want to avoid in a situation where resources are getting scarce in the first place.

Getting the lengths of the queues of course is trivial.
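
A sketch of that periodic check, with the actual length lookup left abstract since the accessor doesn't exist yet; the poller module itself is only an illustration:

```erlang
-module(queue_poller).
-export([start_link/2]).

%% Sample a queue-length function every second and warn when it crosses
%% the threshold. LenFun stands in for whatever accessor emysql ends up
%% providing for the length of a pool's wait queue.
start_link(LenFun, Threshold) when is_function(LenFun, 0) ->
    {ok, spawn_link(fun() -> loop(LenFun, Threshold) end)}.

loop(LenFun, Threshold) ->
    case LenFun() of
        Len when Len > Threshold ->
            error_logger:warning_msg("emysql wait queue length is ~p~n", [Len]);
        _ ->
            ok
    end,
    timer:sleep(1000),
    loop(LenFun, Threshold).
```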

If you put your pipelines to sleep, who is in turn waiting for the pipelines? I still wonder what the effective solution is in your case. Somebody has to wait or come back later. Who is eventually asked to be patient in your setup?

See you around,
Henning

@bvdeenen
Collaborator Author

bvdeenen commented Feb 2, 2012

I haven't thought about it in detail, but my stuff is being called from front-end servers that handle HTTP traffic from browsers and Flash games. The front-end servers are stateless, so I think that by just delaying the handling of the requests, I'm making the browsers and Flash games slow down their requests.
But I have to be careful not to let them wait too long, or some clever guy will put a retry mechanism in their JavaScript and I'll get hammered at double the speed as soon as I start slowing down.

We're still in the discussion stage here, and it's not yet a priority. I'd just like to get some hooks into emysql to see how busy it is.

@hdiedrich
Collaborator

I have worked something out for this. Any progress on your side?

Best,
Henning

@jlouis
Collaborator

jlouis commented Jul 2, 2013

Hey guys, how is this being handled here, six months later? Do we have a fix, or is this still in the to-do stage?

I am monitoring the pool status and outputting something when my pools are running low. But I think it makes a lot of sense to invoke alarm_handler when the pool is exhausted. This would let you know when the system becomes overloaded, and you could then piggy-back on that in your own system and dampen the aggression on the driver.
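
A minimal sketch of that alarm idea, using the standard SASL alarm_handler; pool_free/1 is only a stand-in for however the pool state gets exposed:

```erlang
-module(pool_alarm).
-export([check/1]).

%% Placeholder for whatever exposes the number of free connections.
pool_free(_PoolId) -> 0.

%% Raise a SASL alarm while the pool is exhausted and clear it again
%% once connections free up; other parts of the system can subscribe
%% to alarm_handler and dampen their load accordingly.
check(PoolId) ->
    case pool_free(PoolId) of
        0 -> alarm_handler:set_alarm({{emysql_pool_exhausted, PoolId},
                                      "no free connections"});
        _ -> alarm_handler:clear_alarm({emysql_pool_exhausted, PoolId})
    end.
```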

@jlouis
Collaborator

jlouis commented Feb 26, 2014

I still think we need a way to query the pool state, and a way for the pool to set alarms when it is exhausted and the wait queue is very long.

@jlouis jlouis added the Enhance label Feb 27, 2014