Rank
Aggregation: Together We're Strong
By

Dr.
Frans Schalekamp
Postdoctoral
Researcher, Institute for Theoretical Computer
Science
Tsinghua
University, Beijing

Date:
Feb 16, 2009 (Monday) 
Time:
4:30p.m  5:30p.m 
Venue:
Rm. 121, Ho Sin Hang Engineering Building, CUHK 
Abstract
:
We
consider the problem of finding a ranking of a set
of elements that is "closest to" a given set of input
rankings of the elements; more precisely, we want
to find a permutation that minimizes the Kendalltau
distance to the input rankings, where the Kendalltau
distance is defined as the sum over all input rankings
of the number of pairs of elements that are in a different
order in the input ranking than in the output ranking.
If the input rankings are permutations, this problem
is known as the Kemeny rank aggregation problem. This
problem arises for example in building metasearch
engines for Web search, aggregating viewers' rankings
of movies, or giving recommendations to a user based
on several different criteria, where we can think
of having one ranking of the alternatives for each
criterion. Many of the approximation algorithms and
heuristics that have been proposed in the literature
are either positional, comparison sort or local search
algorithms. The rank aggregation problem is a special
case of the (weighted) feedback arc set problem, but
in the feedback arc set problem we use only information
about the preferred relative ordering of pairs of
elements to find a ranking of the elements, whereas
in the case of the rank aggregation problem, we have
additional information in the form of the complete
input rankings. The positional methods are the only
algorithms that use this additional information. Since
the rank aggregation problem is NPhard, none of these
algorithms is guaranteed to find the optimal solution,
and different algorithms will provide different solutions.
We give theoretical and practical evidence that a
combination of these different approaches gives algorithms
that are superior to the individual algorithms. Theoretically,
we give lower bounds on the performance for many of
the "pure" methods. Practically, we perform an extensive
evaluation of the "pure" algorithms and combinations
of different approaches. We give three recommendations
for which (combination of) methods to use based on
whether a user wants to have a very fast, fast or
reasonably fast algorithm.
(joint
work with Anke van Zuylen)
Biography
:
Frans
Schalekamp (PhD 2007, Cornell University) is a Postdoctoral
Researcher at the Institute for Theoretical Computer
Science at Tsinghua University, Beijing. During the
final stages of writing and after finishing his thesis
under the supervision of David Shmoys, he worked at
a Computational Biology startup company, Nature Source
Genetics (NSG) in Ithaca, of which he was the first
employee. At NSG his work focused on the design of
algorithms to determine populations with good properties
for commercial breeding programs. His research interests
are in Combinatorial Optimization, Algorithms, Computational
Biology, Applied Optimization and Simulation. 