On 12/19/2012 06:37 AM, Daniel Müllner wrote:
Hi Tiago and everyone else,
So, in reality, as long as the algorithm is correct, this test is not necessary and could have been omitted. I decided to keep it. Maybe a better option would be to make it optional, since in most cases it is not desired.
In case the validation does not take much time (say, less than 20% of the time to find the matching), I'd suggest keeping it but not outputting the result. I'd raise an exception instead if the validation failed (which should never happen unless there is a bug!).
This makes a lot of sense, and it is much better than the current state. The verification step is much faster, O(N + E), while the matching algorithm itself is O(NE), so it should not be a problem. I'll make this modification.
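For illustration, here is a minimal sketch of such a linear-time, O(N + E), check: it verifies that the result is a valid matching and that it is maximal (a necessary condition for maximum cardinality, not the full certificate an exact verifier would build), and it raises an exception rather than returning a flag. The adjacency-dict representation and the verify_matching name are only assumptions for this example; this is not graph-tool's actual verification code.

def verify_matching(graph, match):
    # graph: adjacency dict {vertex: iterable of neighbours}
    # match: maps every matched vertex to its partner
    # Validity: the partner relation must be symmetric.
    for v, w in match.items():
        if match.get(w) != v:
            raise RuntimeError("invalid matching: %r and %r disagree" % (v, w))
    # Maximality: no edge may join two unmatched vertices, otherwise that
    # edge could simply be added to the matching.
    for v, neighbours in graph.items():
        if v in match:
            continue
        for w in neighbours:
            if w not in match:
                raise RuntimeError("matching not maximal: edge (%r, %r)" % (v, w))

# Example on the path 0-1-2-3: {1: 2, 2: 1} is valid and maximal, so no exception.
verify_matching({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, {1: 2, 2: 1})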
Alternatively: Isn't the validation much more valuable when the heuristic is used, i.e., when it can indeed happen that the matching does not have maximum cardinality? Is it possible to include the validation for the heuristic=True setting? In that case, it would make more sense to keep the test result as an output value, but then for both "heuristic" settings!
Yes, it is possible to include the verification if heuristic=True, but only for the unweighted version. This sounds like a meaningful modification as well.
The weighted version optimizes the sum of the weights; thus, if the weights are all one (or simply equal), it reduces to the cardinality. The heuristic _does not_ guarantee that the optimum is found; it only guarantees that it runs very fast, in linear time.
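To make that concrete, here is a toy greedy matcher (a generic illustration, not graph-tool's actual heuristic): depending on the order in which it scans the edges of a four-vertex path, it returns a maximal but not maximum matching, which is exactly the kind of shortfall an optional verification for heuristic=True would catch.

def greedy_matching(edges):
    # Scan edges in the given order; take any edge whose endpoints are both free.
    matched = set()
    match = {}
    for v, w in edges:
        if v not in matched and w not in matched:
            match[v], match[w] = w, v
            matched.update((v, w))
    return match

# Path 0-1-2-3, two different edge orders:
print(greedy_matching([(1, 2), (0, 1), (2, 3)]))  # one edge only: {1: 2, 2: 1}
print(greedy_matching([(0, 1), (2, 3), (1, 2)]))  # maximum: {0: 1, 1: 0, 2: 3, 3: 2}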
It's still not clear to me what the heuristic optimizes, since the two criteria "find a matching with maximum cardinality" and "minimize the sum of edge weights" are conflicting goals. Let's assume unit edge weights for simplicity. If someone told me to write a matching algorithm that does not necessarily yield a maximum cardinality matching but optimizes for the minimal sum of weights, I'd play dumb and output an empty matching: it has weight zero, is thus "better" than any other partial matching, and is generated in constant time. I'm sure the algorithm does something cleverer, but what? I suspect I could extract the answer by studying the paper that Tiago gave as a reference in the documentation, but in case you remember what the algorithm really aims for, can you spell it out for everyone, Tiago?
Maximum cardinality is always implied, even when the weights are minimized, so returning an empty matching is not meaningful.
I'd indeed be very interested in a refined max_cardinality_matching that has an exact algorithm for the weighted problem. I find the way networkx approaches the problem very good; the only issue there is speed. In networkx, the user chooses whether an overall maximum-weight matching is searched for (which might not be maximum-cardinality) or whether a maximum-weight matching is chosen among all maximum-cardinality matchings.
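For reference, the two networkx modes described above can be reproduced with something like the following (the graph and weights are made up for illustration; the exact return format of max_weight_matching depends on the networkx version):

import networkx as nx

# A graph where the two modes differ: the single heavy edge between 1 and 2
# outweighs any larger matching, but using it leaves vertices 0 and 3 exposed.
G = nx.Graph()
G.add_edge(0, 1, weight=1)
G.add_edge(1, 2, weight=5)
G.add_edge(2, 3, weight=1)

# Overall maximum-weight matching (not necessarily maximum cardinality):
print(nx.max_weight_matching(G, maxcardinality=False))  # only the edge between 1 and 2

# Maximum-weight matching among all maximum-cardinality matchings:
print(nx.max_weight_matching(G, maxcardinality=True))   # the edges (0, 1) and (2, 3)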
The heuristic in graph-tool performs the second version. I think it is clear that the documentation right now is not as good as it could be, and the algorithm could also be improved to cover more general cases. I'll keep this on the TODO list. If you wish, you can open a ticket for it, so that I can be reminded of it and you can keep track of the progress.

Cheers,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>