-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Epoch and lfc become unusable for programs with large (>500) banks and multiports #433
Comments
In looking at the graph classes that we are using, e.g. for detecting cycles, there are quite a few inefficiencies there. For example, to obtain a list of nodes in the graph, the code currently creates a new list on the heap and copies two lists into it to make one list. It does this fresh each time you obtain the list. Also, the graph instances duplicate information that is already in the ReactorInstance, ReactionInstance, etc. classes. I saw quite a few other potential problems. I'm not sure that is the culprit, but graphs of size 1000 are not big, so I think we have a problem if we can't handle those. |
Do we need to expand the graph in order to do the cycle analysis and other steps? I find it interesting that this works fine for |
Interesting conjecture. Unfortunately, below is a counterexample. As given below, there is no cycle. However, if you change this line:
to
then you get a cycle. This counterexample also shows a bug in the validator, which fails to find the cycle. Instead, the cycle gets found later and an internal error is thrown (an exception that a user should never see).
This counterexample also shows a downside of allowing target code to specify widths. It means that cycles will not be detected at compile time. |
Interesting example! Perhaps we should simply forbid such programs (where there could be a cycle). The example would be much clearer if I am not 100% sure, but I think we could do the cycle analysis simply on the graph shown in the diagram, where all multiport connections are treated as a single line, and all banks as a singe instance. Then the logic would detect a cycle in your first example, even though it is correct under our current rules. However, I don't think we would lose generality by imposing this additional restriction, as cycles could be resolved by introducing additional ports and being clearer about the individual connections. |
Forbidding such examples seems dangerous. I suspect there is a much
simpler version of this example that would illustrate the same
problem. Also, it could get particularly difficult to reason about
which programs are allowed and which are not, particularly once we
introduce interleaved connections.
In any case, a graph with 1000 nodes is small. We have a serious
problem if we can’t handle that. We just mask the problem if we forbid
such programs.
|
My point wasn't so much about forbidding programs with a large number of nodes. It was more to simplify the cycle analysis such that it works with an arbitrary number of nodes (could be even > 1,000,000). But I agree, we need to handle this carefully. By the way, the C++ runtime implements its own dependency analysis. This analysis works fine for 1000 nodes, although it introduces a small but noticeable delay to the program startup. This indicates that the cycle analysis generally is possible with a reasonable overhead for a graph of this size. However, I am still a bit concerned about the initialization delay that the cycle detection introduces. While it works for 1,000 nodes, I don't think it will work for 1,000,000 nodes. So we probably also need to consider ways to simplify the analysis. |
So I've fiddled with [VisualVM] for a bit and got to test the example program in the OP. First I have to mention that the graph doesn't have 1000 nodes. We have 1000 nodes which each have 1000 inputs and 1000 outputs so we have 2 million ports total. Each instance is a node in TopologyGraph. I don't think this is a small or even regular size graph, though I could be wrong. I'm mentioning this because it's not clear from the discussion that you're aware of this. Here's a heap sample during the IDE freeze: LinkedHashSet is implemented with LinkedHashMap internally so there are only around 1 million LinkedHashMaps, but there are 3 million LinkedHashSets! The 2 million PortInstance are expected, and as we can see, they're not really a problem for the heap (they add up to 12% usage). The problem is the HashSets. Does PortInstance use LinkedHashSet somehow? This should account for 2 million of the sets. Also ports are stored into a TopologyGraph instance. Looking into this graph hierarchy I find that the graph maintains set of successors and a set of predecessors for the entire graph. This means that each edge between ports creates 2 sets in that data structure... One glaring problem here is that each of these sets has only few elements (maybe most of them have 1). But even so, since we use LinkedHashSet everywhere, each corresponds to at least
This holds for LinkedHashSets with a single element, which is basically the worst case. We should consider using more specialized set implementations for small sets, and only inflate Sets to a LinkedHashSet when we need to add many elements. The entire codebase uses LinkedHashSet as a type everywhere... This is really bad practice, we should refactor everything to just require Set. Anyway I'll start refactoring and measure the difference. Full data, loadable into your own VVM instance: ide-vm-snapshot.zip Also I think there's a memory leak since I've seen the number of PortInstance double when editing the file then saving. Possibly, the topology graph is stored in some field somewhere which is not cleaned up. I still have to investigate this... |
This sounds like a good start! Did you push your changes somewhere? |
The reason for using I have a bit of a disagreement with @lhstrh on the way we are currently handling construction and analysis of graphs. The way that I would do it would be to define graph algorithms to operate on interfaces, and then to make ReactorInstance and related classes implement those interfaces. Instead, what we do now is to replicate the information in ReactorInstance and related classes in separate graph data structures. There are some algorithms where this is a reasonable way to do it (particularly algorithms that are based on removing nodes or edges from the graph), but I suspect these algorithms can be replaced with non-destructive ones. |
I understand that, and the "specialized" collections I'm referring to also preserve iteration ordering (they're those Unfortunately in Java, the collections framework works a lot by contract. The only reasonable common denominator between collection types is very general interfaces like Set or List or Map. Whether they are modifiable or read-only, whether they preserve insertion order or not, whether they are thread-safe or not, all of that is up to the programmer to guarantee (or leave unspecified) and document. There are no interfaces for these more precise characteristics. (Side note, any List of course preserves insertion order, and using LinkedList over ArrayList is wasteful in nearly all cases). In my experience though this doesn't cause any problem. We already have the discipline never to create a HashSet over a LinkedHashSet, so iteration order will be reproducible. But there's nothing wrong with creating immutable singleton sets or empty sets, especially when they save so much memory. The only way to make this work is to only couple to interfaces like Set or Map. We can talk about this during the code review meeting I hope
We could have a middle ground by keeping separate graph classes, but having the logic to implement an edge be abstract. So the directed graph here wouldn't have directly sets to store succ and pred. Only abstract methods like Our implementation of This way we avoid duplicating the graph, we keep graph algorithms tidy and outside our domain classes, and we don't introduce interfaces that are only rarely needed. (This is basically an adapter that adapts our domain classes to what the graph classes expect, but we would still program to interfaces) |
#442 helped with the issue, but I am not sure if we can consider it resolved just yet. Compiling the Banking benchmark with 1000 accounts is now possible without error, but it still takes about 2 minutes on my machine (spend in lfc, not the target compiler) and consumes more than 3GB of memory. |
Apparently this got worse again... with current master I get a |
Actually I spoke too soon... Since gcc also takes a considerable amount of memory and most of the memory allocated in our compiler is not freed before invoking the target compiler, I can not compile this on a machine with 16GB of memory (+Swap). |
As it turns out, even 32GB of memory are not enough. Due to #635, gcc also consumes a huge amount of memory. |
As @Soroosh129 suspected yesterday, the compilation indeed works for the C++ banking benchmark (consuming about 3GB of memory) and does not work for the C version (consuming more than 8GB). This explains why I was so confused about the compilation working or not; I simply used the C++ version when I tried before. So it seems that the C code generator has a significantly larger memory footprint than the C++ generator. This is likely in part due to the use of graph data structures for generating in C, which we don't use in C++. However, I think the instance graph is created nonetheless in GeneratorBase. So there might be another issue lurking in the C generator apart from the graphs. |
Looks like we can finally close this (#875). |
While experimenting with the banking benchmark, which uses by default 1000 instances of a reactor, I noticed that the IDE becomes unusable for such a large number of reactor instances. The IDE simply hangs and after a while shows an error indicating that there is no more memory. lfc shows the same behavior... I assume that this originates from the fact that we unroll the complete instantiation graph and create an object for every single reactor instance, but I am still wondering why the memory footprint is so high. Maybe there is a memory leak of some sort.
The error can be reproduced using the following simple program:
Note that depending on your available memory, you might need to increase
num_nodes
to trigger the error. While experimenting, I had the impression that the memory footprint does not scale linearly with the number of nodes.The text was updated successfully, but these errors were encountered: