Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is module DFS really needed? #232

Open
ivanperez-keera opened this issue Apr 27, 2023 · 2 comments
Open

Is module DFS really needed? #232

ivanperez-keera opened this issue Apr 27, 2023 · 2 comments

Comments

@ivanperez-keera
Copy link
Contributor

ivanperez-keera commented Apr 27, 2023

The module DFS implements graph algorithms. From that module, only the functions out and t_close are used externally (in NFA.hs). The functions list_tree, mk_graph, edges, rev_edges, reverse_graph, scc, top_sort and dff, and the type Edge, are not used.

However, there's something else that strikes me from that module, and points to a potential cleaning opportunity:

  • The top comment reads "This module is a portable version of [...] [an] encoding of [...] linear graph algorithms. This module uses balanced binary trees instead of mutable arrays to implement the depth-first search so the complexity of the algorithms is n.log(n) instead of linear." However, isn't linear better than n times log(n)??? Why go for the latter if you can opt for the former?

  • There is an implementation of graphs in Data.Graph in containers (and containers is already a dependency of alex), although I haven't checked if that was there already in the earliest version of GHC supported by alex, which I think is GHC 7 (?). The two functions used externally, out and t_close, could be defined, if I understand their purpose correctly, as:

out :: Graph -> Int -> [Int]
out = Data.Array.(!)

t_close :: Graph -> Graph
t_close g = buildG (bounds g) [ (v1, v2) | v1 <- vertices g, v2 <- reachable g v1 ]

List comprehensions are generally slower than map so there's probably a faster way of building t_close if performance is an issue here.

My questions are:

  1. Should this module be replaced by a use of Data.Graph?
  2. If not, should the unused functions mentioned above be removed?
@andreasabel
Copy link
Member

Wise man said: Never change a running system.
So, I am a bit skeptic.

Since this code has been subjected to 30 years of testing in practice, it is likely bug-free. So, what incentive is there to change it?
If one could demonstrate a performance gain using some other implementation, via realistic benchmarks, this would be an argument to change things.

Concerning Data.Graph, does it have an efficient transitive-closure algorithm?
The one you suggest looks like it would not have the best performance (it is not the Warshall algorithm).

[ (v1, v2) | v1 <- vertices g, v2 <- reachable g v1 ]

Concerning removal of unused functions: Wouldn't this make the little graph library DFS less complete?
Unused functions will get removed by the linker automatically, wouldn't they?

@andreasabel andreasabel changed the title Is DFS really needed? Is module DFS really needed? Apr 27, 2023
@ivanperez-keera
Copy link
Contributor Author

ivanperez-keera commented Apr 27, 2023

Since this code has been subjected to 30 years of testing in practice, it is likely bug-free. So, what incentive is there to change it?

Reducing the lines of code in a system is always a good incentive. And since we are relying on a standard Haskell library, we can make similar assertions about the quality of the code we are relying on based on usage (this module was there in containers-0.1). Note that both modules are based on the same idea (both cite the Launchbury/King paper).

Concerning Data.Graph, does it have an efficient transitive-closure algorithm?

The function reachable is advertised as 𝑂(𝑉+𝐸), and traversing over vertices is 𝑂(𝑉), so the overall function should be doable in 𝑂(𝑉+𝐸) * 𝑂(𝑉) with a concatMap. Floyd-Warshall is 𝑂(𝑉^3) .

Note that if we cannot implement this in comparable or better time than Sort's for similarly sparse/dense graphs, it suggests that this implementation might be faster than Data.Graph and so containers should incorporate it.

Wouldn't this make the little graph library DFS less complete?

But will anyone ever use that? Otherwise, alex contains the perfect DFS library... that nobody will use.

Also worth noting is that only the functions that have effectively been used from the Sort module have been subjected to such long-term "tests". The ones that have not been used have been subjected to (near) zero tests. For all we know, they could be 100% wrong. Keeping them there would incorrectly signal that this code has been subjected to 30 years of testing in the wild, when maybe only 2 functions have.

And why stop there? Why not add more potentially useful (but actually unused) functions, if that makes it somehow "better"?

There's very clear criteria when it comes to the code that should be added in the system: is it needed? If not, then it's out. (And we may even be able to tell whether any of it was ever used, demonstrating even more the "practical uselessness" of keeping it around.)

Unused functions will get removed by the linker automatically, wouldn't they?

They should (that also takes compilation time, btw, even if little). But the executable size is not the biggest concern here.

A bigger concern is that many maintenance and QA actions are directly and strongly related to the number of lines of code in a system. More lines, more code, more cost. More to clean. More to test. More to maintain. More to adjust and adapt. More to understand when you are new to a system. More to document. More to read to figure out if / where / how it is used and why it's there if it's not used.

There is an action that reduces that cost dramatically: removing unused code.

We can help alex, happy and other systems "live" for another 30 years by making the jobs of current and future maintainers easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants