-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calling RServe using ThreadPool #30
Comments
Sorry. I am fairly certain that I have fixed this issue. I adapted this example for C# and it seems to work: Hopefully this helps anyone else who has a similar issue. It might also be worthwhile to provide a sample code of something like multithreaded requests. If you would like to do this, I might be able to contribute some of my demo example code to yours (mine uses a threadpool). Closing for now and let me know what you think. Thanks, |
I think there are two issues here:
Of course, you could just launch multiple servers with one RConnection to each server and it would definitely be thread-safe. But then you'd incur the startup cost of each server and the additional memory overhead because you'll have multiple R processes running. |
So does this mean that this can not be used in an C# MVC environment? I was planning on using this library in webservice layer. Calls to the webservice would make calls to an RServer. |
This has nothing to do with C# MVC. MVC is a framework. Like all things .net you can make many choices. If you need to talk to R in a multi-threaded way then you need to select a path that won't break. I've outlined 3 options above. the first is a fix I an make in the code. the second I would need to research. the third is guaranteed thread-safe but requires more overhead. It's up to you to decide which approach is appropriate for your situation. If you tell me more about what you're trying to do in MVC perhaps I can help if it's simple/quick for me to understand. |
Sure. I am going to have a simple WebApi Controller/Endpoint. Something like:
Just pseudo code, but I think you get the idea. I apologize for not knowing more. I'm trying to learn more about MVC and how it handles multiple requests. It seems that MVC requests use thread pools. When a new request comes in, a thread is grabbed from the thread pool. This thread is used to complete the request and then the thread is returned to the pool. So if multiple requests come in, multiple RConnections (on different threads) would be created. My understanding was that RServe (on unlix/linux) would fork a process for each connection. I've done some simple testing and things seems ok....but I'm still learning about the pitfalls. Any advice on which approach you think would be appropriate would be great. Thanks! |
It looks like when I run my sample code, I am seeing this: It looked like RServe would launch multiple processes and the answers that I was receiving back were not garbage. But I need to confirm with more test code. I was running a long running R function and sending it the thread number (to R). so that way, if it had an answer from the calculation and the thread number looked correct, I could be fairly certain it was spawing multiple servers. I also ran Linux "top" command and was able to see multiple RServe(s) entering and exiting. The problems I am now seeing are related to this message: *** stack smashing detected ***: /usr/lib64/R/library/Rserve/libs//Rserve terminated |
I have a sample code of this actually working. I have it down to as simple as I believe is possible at this point. I am launching 4 threads on a 4 core machine using ThreadPool and ManualResetEvent. Suraj, would you be interested in me emailing you the solution? If so, my email is at [email protected] I think this could help you a bit. Then, you can look in the "top" command output and see the RServe processes, etc. I'm still getting a (different) failure (at around 30 seconds in): I'm wondering if it is my hosting provider (Linode and DigitalOcean) but not yet sure. This seems to happen on and off now and again. EDIT: Right now I do not see this error on DigitalOcean (using the same code). I DO see it on Linode. I'm not sure what is causing it to happen yet. echo 30> /proc/sys/net/ipv4/tcp_keepalive_intvl *I've also seen keep.alive option here, but not sure if that is the issue and cannot get it to work yet... |
I think if this is working on your box and on DigitalOcean, then perhaps Linode doesn't provide enough resources to make this scenario work. I don't know enough about these clouds to be helpful. If I were trying to do what you are doing, I wanted to maximize performance for the end-user and you are OK creating multiple RServers (so you can afford the memory/CPU resources), then I would do the following:
something like this (I just quickly blasted out this code, so obviously you need to think some more about it make sure it's correct)...
|
Nice! Thanks for providing the code. I will look at your code soon here as well to see how I can also improve mine. Sorry for the confusion with regard to my Linux boxes. That was not a "real issue," I was able to get mine to work: the issue was that I was attempting to launch too many threads for the number of cores that I have. With my Linux boxes and the threadpool, it looks like I can safely add n and n+1 threads (4 core machine can have up to 5 threads). I can now do about 390 runs per minute of a code that takes one second each time to run with 7 threads. When I have been doing this successfully, I have been creating an individual RServe server for each dedicated thread number. Without that, I was experiencing many more issues. Thank you for your help here. I'll keep you up to date; I can potentially share code here in the future if you would like. |
closing this out and adding: #32 |
I tried this and it doesn't work. The second connection blocks to the same server blocks on receive in
I'm experimenting with this and if successful, would like to submit a PR. Is this something you'd be interested in and would have the bandwidth to review? |
Hi @atifaziz - Something like this should be built on top of RServeCLI2, not included in the framework. I suggest creating a new project that takes a dependency on this one. Which brings up a good point - I need to create a NuGet package for this. Will open a bug. |
oh...seems like there already is one: #34 |
nuget package is now published to the public Nuget gallery |
@SurajGupta I'm not quite sure that can be done without perhaps reviewing some of the guts. It's not a simple case of pooled connections and sever process management. I'm considering using this from a web front-end and that to do that in an efficient and robust manner requires non-blocking communications as well as streaming of results where possible. I think all those additions would benefit this project. If we go with the build-on-top argument then frankly you can also break up this project into SEXP structures, QAP1 and connection management, all of which can be layered on top of each other, managed and improved separately. Yet they are all interconnected in one project here. What I want to avoid is forking and going off at a tangent because that's easy to do but not necessarily something that benefits in the long-run. I would prefer collaborating on an existing project that's already in good shape and building out where necessary. It also helps to have more than one pair of eyeballs on a problem. Still not interested? Thoughts? |
What would you change about the project? What do you need exposed that is not currently exposed? We've already established that the multi-threaded approach connecting to a single R server won't work. To use multiple R servers, you'll need multiple clients connections and a manager sitting on top. RServeCLI's purpose is to be very good at letting you launch a server/client combo. Its hard for me to see why you would change that or how that hinders the manager. Maybe you can tell me more about your proposed design. |
Unless I missed something, I don't think you can launch today because
Async communications. On a server, you don't want to tie threads down waiting to receive results on the socket while the computation is taking place in R. I would use TPL +
From a structural perspective, I would like to see it broken up and more layered: SEXP ⬅️ QAP1 ⬅️ connection management ⬅️ multiplexing. This would not be a primary goal but would facilitate spiking, experimentation & evolving the layers independently. It would also enable different implementations of those layers to be fine-tuned for desktop and server scenarios. This may sound a lot but I imagine doing it in baby steps and only as it makes sense. |
Yes - I misspoke. RServeCLI is supposed to be a very good client for an already running R Server. The Server is included in the test project simply to test the client.
You mean on the client you don't want to tie threads down, not the server, right? Right now RServe is 100% synchronous. I agree it would be useful to have async versions of all methods on RConnection, like I do want to note that async isn't necessary for your web-site to support multiple R Servers. You could write a thin wrapper around RConnection and make the calls into each RConnection async. Your "manager" would talk to the wrappers, and not directly to RConnections. It would not be 100% efficient, but I'm betting you wouldn't really notice the perf difference and it would work really well.
Those are all great suggestions. This project could certainly use more layering and I would happily accept contributions to that end. But again, I want to be clear - async + better layering is not justification for adding a feature to support multiple servers. That feature would certainly benefit from async/better layering. This project has always been about creating an awesome, easy-to-use, expressive, stable client. I think it hasn't been very extensible and we should fix that. If we do a better job in that department, then I think that adding a multi-server feature would be easy to write on top of the project, instead of as a part of it. But we can also pause that debate and wait until we get there? We are both agreed that async and better layering would be a precursor to any multi-server work? |
@atifaziz - thoughts on my last comment? want to keep the momentum up on this one. |
I meant server but in the end it will benefit both really. On a client, you can keep the UI responsive during I/O. On the server, you don't want to tie down threads blocking on I/O and stress/starve your thread pool. The thread can be returned to the pool to serve other requests while I/O is pending or computation is taking place.
It's possible to maintain binary compatibility. What it would change is the minimum required version of .NET from 3.5 to 4.5 (or even 4.5.2). So for someone who was already on 4.5.x, there would be no change provided all the public synchronous API is maintained. Another way to tackle this is to consider the current v1 release as targeting & supporting the 3.5 base but have v2 moving up to the latest in the interest of productivity. This is F/OSS so people can still submit features and fixes against v1. And if v2 has lots going for and someone needed it working on .NET 3.5 then they could try and back-port the features. With the layering I mentioned, one can certainly also imagine a mixed approach. For example SEXP data types and some of the QAP1 protocol handling shouldn't need the latest and greatest runtime and framework. In fact, they could even be housed in a PCL. The higher layers that want to rely on more recent features from the platform could then target more specifically.
It can become a nightmare-ish or a drag. It depends on how big is the user base, which is always hard to tell with F/OSS projects unless you have download statistics (which aren't really there since the NuGet package was just published recently) or people knocking on your issue tracker for a release.
Anyone else here? 😄 Wonder how you would get a consensus on that.
Yes, one could do that but it would require your own thread pooling and management, which is a lot of infrastructure to build and especially get right given that it's also hard to test. Async on the other hand is easy to program, reason about (sequential to read without callbacks if you're using
It's not so much about performance as it is about efficient use of resources.
Right, it would be beneficial anyhow. However I don't want to drive & add a feature without having a real use-case driving it.
My sentiments exactly.
Absolutely on both accounts!
Cool, so how do we go about this? First of, can we say v1 is done and break off to v2? Then, are you okay to move to VS 2015, C# 6 (both of which can target .NET 3.5 should things go bad)? Finally, what about targeting .NET 4.5 to get best of |
I think we've established that you can't have two clients connected to the same server at the same time? My original proposal for multi-client support was to just launch multiple servers and only allow one server to serve a single client at any given time. So each individual server is only ever managing a single thread? See my comment below - I think we may be talking about two different things.
I think this project is ready for an upgrade and I say we keep it simple - lets just update to VS2015, .net 4.5 and if anyone complains we can address it at that point. As you mention, it's hard to tell how many users there are and it's open source so anyone can fork/branch/and compile to whatever they want. For now, I'd say lets not maintain a separate PCL library, lets just upgrade the whole thing to 4.5/VS2015. Agreed?
I have a sense that we may be talking about two different things. What I'm saying is that I don't think any given RConnection should be choosing between multiple servers. I don't think that a multi-server feature should be built-into the project. I DO think an RConnection should be thread safe and usable by multiple client threads and that RConnection should do all the magic to make sure that only one thread is talking to the server at one time. Is that what you are saying? If it's not, then as we've agreed, we can debate that once we've done some cleanup =)
We're agreed to use conventional spacing and remove regions. Also, I'm adamant that we maintain "Treat Warnings as Errors" when building. I think Warnings are a weird concept - if the compiler guys are warning you, you should be worried. Almost every .net project that I've worked on in the past decade has had that setting enabled and I've never regretted it. So now that I've said all of that, I should also say that I'm completely slammed at work and I don't have a ton of time to devote to this. My proposal is that I own getting the project updated to VS2015/4.5, I get StyleCop and Code Analysis running and building in AppVeyor, and then hand it off to you to do the cleanup we've discussed. I would also add you as a owner of the project. |
Looks that way.
Yes!
No and I think this is where we might be running into a subtle difference. You're speaking in terms of threads whereas I'm thinking thread-agnostic. A client using the connection could send a command over one thread and then return that thread to the pool while waiting for a reply. Once the results are ready, another thread from the pool could then service the reply. There is no thread-safety needed here. The one client-connection-server relation is something that needs to be managed through an object pool. Servers are launched and connections are added to a pool. Clients then borrow connections from the pool and return when done. A connection should not serve any other client until returned to the pool. While a connection is allocated to one client, it can be used over any thread. Anything else should be undefined behavior. Are we still on two different line of thoughts?
Excellent! 10 x 👍
I'm fine with this except when missing XML doc summaries become errors! However, this thread is already getting long so I suggest we discuss that as a separate issue.
I understand and honestly I can't promise how much time I'll be able to devote either but I'd like to definitely see this project support server scenarios. And in case that goal cannot be achieved fully, I don't think the foundations that will have been laid in support of it will do the project harm.
Thanks & appreciate your trust! |
I think we may still have differing thoughts on this - but lets let the code speak for itself. In general I agree that it would be cool to use a pooled approach so that threads can be released when unused and that the client could send a request on one thread and get a response on another thread. That would allow multiple clients to talk to the same pool. I think allowing multiple servers to be registered with the pool is probably a natural extension of this idea. But this approach makes the pool and not the RConnection the primary object of this project. What we have to watch out for is that the simple scenario remains simple - one client one server. If the re-design makes that scenario more complicated for the consumer, then I would consider it a smell. So maybe the pool is a separate and parallel concept to RConnection. Lets see how it turns out!
Sorry it's taken me so long to reply. You have been added as a contributor - feel free to tackle anything we've discussed. I'm getting less optimistic about my time - I think i'll be underwater until after the new year. Feel free to go ahead with the VS2015/.net upgrade. |
We are completely aligned on how and where the pooling should take place. I also agree that the simplicity of the programming model as it exist today should remain so instead of being convoluted by any pooling requirements. The |
I have been looking this over and I cannot seem to figure out if this is possible:
1.) Can I call RServe rapidly with different threads from C# ThreadPool? When I try to do it, I seem to be getting this error result:
"Additional information: Recieved an unknown command parameter type from the server:16."
2.) Is there another way I should do this? Should I attempt to launch a few RServe daemons and then send multiple requests to each?
I'm attempting to send multiple lists of a window of 200 numbers to do the same calculation. Basically I save 210 numbers, and then want to pass data with these indices to perform the same calculation (sliding 200 length window of data points): I send these "in parallel"-
[0:199]
[1:200]
[2:201]
[3:202]
etc.
*IF I do these calculations "in line" where the first blocks the rest (without the ThreadPool or using Threading of some sort) my calculation might be too slow. I appreciate your help.
Thanks,
Mike P. Sopko
The text was updated successfully, but these errors were encountered: