Mergeable Summaries

Ke Yi, HKUST, Hong Kong

Abstract:
This talk is about the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two data sets, there is a way to merge the two summaries into a single summary on the two data sets combined together, while preserving the error and size guarantees. This property means that the summaries can be merged in a way akin to other algebraic operators such as sum and max, which is especially useful for computing summaries on massive distributed data. Several data summaries are trivially mergeable by construction, most notably all the sketches that are linear functions of the data sets. But some other fundamental ones like those for heavy hitters and quantiles, are not (known to be) mergeable. In this talk, we will see how these summaries are indeed mergeable or can be made mergeable after appropriate modiļ¬cations.

Biography:
Ke Yi is an Associate Professor at Hong Kong University of Sciecne and Technology. He belongs to both the Theoretical Computer Science group and the Database group at HKUST.He received his B.E. from Tsinghua University and Ph.D. from Duke University, in 2001 and 2006 respectively, both in computer science. His research interests include Algorithms on big data; databases; data summarization; algorithms on distributed data; data streams; external memory algorithms; data structures; computational geometry.