max_cardinality_matching heuristic
Hi all, in which sense is the algorithm in graph_tool.topology.max_cardinality_matching with the heuristic=True switch a heuristic and not exact? As I understand, it might not return a maximum-cardinality matching even if there is one. But what about the edge weights? Does it always return a minimum-weight matching (respectively maximum-weight with minimize=False)? Or is this not guaranteed? Note that there are pitfalls to this question: eg. if all edge weights are equal to one, then the optimal min-weight matching is empty if I do not insist on max-cardinality. Can someone enlighten me what the heuristic actually optimizes? Cardinality or weights? Both (in which sense) or none (what's it good for, then)? And does it always find the optimum for one of those two criteria? Best, Daniel -- Daniel Müllner Stanford University Department of Mathematics 450 Serra Mall, Building 380 Stanford, CA 94305 USA e-mail: muellner@math.stanford.edu http://math.stanford.edu/~muellner
Hi, On 12/18/2012 01:10 AM, Daniel Müllner wrote:
Hi all,
in which sense is the algorithm in graph_tool.topology.max_cardinality_matching with the heuristic=True switch a heuristic and not exact? As I understand, it might not return a maximum-cardinality matching even if there is one. But what about the edge weights? Does it always return a minimum-weight matching (respectively maximum-weight with minimize=False)? Or is this not guaranteed?
Indeed this is a bit confusing. If heuristic=False the maximum should be always guaranteed, and the is_maximal return value should also be one. Why is this value then returned? This is explained in the boost documentation: http://www.boost.org/doc/libs/1_52_0/libs/graph/doc/maximum_matching.html Why is a verification algorithm needed? Edmonds' algorithm is fairly complex, and it's nearly impossible for a human without a few days of spare time to figure out if the matching produced by edmonds_matching on a graph with, say, 100 vertices and 500 edges is indeed a maximum cardinality matching. A verification algorithm can do this mechanically, and it's much easier to verify by inspection that the verification algorithm has been implemented correctly than it is to verify by inspection that Edmonds' algorithm has been implemented correctly. The Boost Graph library makes it incredibly simple to perform the subroutines needed by the verifier (such as finding all the connected components of odd cardinality in a graph, or creating the induced graph on all vertices with a certain label) in just a few lines of code. So, in reality, as far as the algorithm is correct, this test is not necessary and could have been omitted. I decided to keep it. Maybe a better option would be to optionally activate it, since in most cases it is not desired. If heuristic=True, a much simpler algorithm is run, which does not guarantee that optimum is found, and also performs no checking.
Note that there are pitfalls to this question: eg. if all edge weights are equal to one, then the optimal min-weight matching is empty if I do not insist on max-cardinality. Can someone enlighten me what the heuristic actually optimizes? Cardinality or weights? Both (in which sense) or none (what's it good for, then)? And does it always find the optimum for one of those two criteria? Best, Daniel
The weighted version optimizes the sum of the weights. Thus, if the weights are one (or simply equal), it reduces to the cardinality. The heuristic _does not_ guarantee that the optimum is found, it only guarantees that it runs really fast in linear time. In fact, in many cases the maximum will _not_ be found. This algorithm is used for the sfdp_layout() algorithm, which needs more to be fast, rather than exact. There are exact algorithms for the weighted version, but I have yet to implement them. I intend to implement them at some point, and thus make the 'weight' parameter effective also when heuristic=False. I hope this clarifies. Cheers, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
Hi Tiago and everyone else,
So, in reality, as far as the algorithm is correct, this test is not necessary and could have been omitted. I decided to keep it. Maybe a better option would be to optionally activate it, since in most cases it is not desired.
In case the validation does not take much time (say, less than 20% of the time to find the matching), I'd suggest to keep it but to not output the result. I'd instead raise an exception instead if validation failed (which should never happen unless there is a bug!). Alternatively: Isn't the validation much more valuable when the heuristic is used, ie. when it indeed can happen that the matching does not have maximal cardinality? Is it possible to include the validation for the heuristic=True setting? In this case, it would make more sense to keep the test result as an output value, but then for both "heuristic" settings! Anyway that's just a suggestion.
The weighted version optimizes the sum of the weights. Thus, if the weights are one (or simply equal), it reduces to the cardinality. The heuristic _does not_ guarantee that the optimum is found, it only guarantees that it runs really fast in linear time.
It's still not clear to me what the heuristic optimizes since the two criteria "find a matching with maximal cardinality" and "minimize the sum of edge weights" are contrary goals. Let's assume unitary edge weights for simplicity. If someone told me to write a matching algorithm that does not necessarily yield a maximum cardinality matching but optimizes for minimal sum of weights, I'd play dumb and output an empty matching. That has weight zero, is thus "better" than any other partial matching and is generated in constant time. I'm sure the algorithm does a cleverer thing - but what? I also suspect that I can extract the answer if I study the paper that Tiago gave as a reference in the documentation, but in case you remember what the algorithm really aims for, can you spell it out for everyone, Tiago? I'd indeed be very interested in a refined max_cardinality_matching that has an exact algorithm for the weighted problem. I find the way networkx approaches the problem very good; the only issue there is speed. In networkx, the user chooses whether an overall maximum-weight matching is searched for (which might not be maximum-cardinality) or whether a maximum-weight matching is chosen among all maximum-cardinality matchings. Best, Daniel
On 12/19/2012 06:37 AM, Daniel Müllner wrote:
Hi Tiago and everyone else,
So, in reality, as far as the algorithm is correct, this test is not necessary and could have been omitted. I decided to keep it. Maybe a better option would be to optionally activate it, since in most cases it is not desired.
In case the validation does not take much time (say, less than 20% of the time to find the matching), I'd suggest to keep it but to not output the result. I'd instead raise an exception instead if validation failed (which should never happen unless there is a bug!).
This makes a lot of sense. It is much better than the current state. The verification step is much faster O(N + E), while the matching algorithm itself is O(NE), so it should not be a problem. I'll make this modification.
Alternatively: Isn't the validation much more valuable when the heuristic is used, ie. when it indeed can happen that the matching does not have maximal cardinality? Is it possible to include the validation for the heuristic=True setting? In this case, it would make more sense to keep the test result as an output value, but then for both "heuristic" settings!
Yes, it is possible to include the verification if heuristic=True, but only for the unweighted version. This sounds like a meaningful modification as well.
The weighted version optimizes the sum of the weights. Thus, if the weights are one (or simply equal), it reduces to the cardinality. The heuristic _does not_ guarantee that the optimum is found, it only guarantees that it runs really fast in linear time.
It's still not clear to me what the heuristic optimizes since the two criteria "find a matching with maximal cardinality" and "minimize the sum of edge weights" are contrary goals. Let's assume unitary edge weights for simplicity. If someone told me to write a matching algorithm that does not necessarily yield a maximum cardinality matching but optimizes for minimal sum of weights, I'd play dumb and output an empty matching. That has weight zero, is thus "better" than any other partial matching and is generated in constant time. I'm sure the algorithm does a cleverer thing - but what? I also suspect that I can extract the answer if I study the paper that Tiago gave as a reference in the documentation, but in case you remember what the algorithm really aims for, can you spell it out for everyone, Tiago?
Maximum cardinality is always implied in all cases, even when the weights are minimized. So returning an empty match is not meaningful.
I'd indeed be very interested in a refined max_cardinality_matching that has an exact algorithm for the weighted problem. I find the way networkx approaches the problem very good; the only issue there is speed. In networkx, the user chooses whether an overall maximum-weight matching is searched for (which might not be maximum-cardinality) or whether a maximum-weight matching is chosen among all maximum-cardinality matchings.
The heuristic in graph-tool performs the second version. I think it is clear that the documentation right now is not as good as it could be, and algorithm could also be improved to cover more general cases. I'll keep this in the TODO list. If you wish, you can open a ticket for it, so that I can be reminded of it, and you can keep track of the progress. Cheers, Tiago -- Tiago de Paula Peixoto <tiago@skewed.de>
participants (2)
-
Daniel Müllner -
Tiago de Paula Peixoto