On the I/O Complexity of Dynamic Element Distinctness

Yufei Tao, CUHK, Hong Kong

Abstract:
In this talk, we consider the dynamic version of the {\em element distinctness problem} in external memory. The objective is to maintain a multi-set S of integers under insertions, such that at any moment the following query can be answered efficiently: are all the elements in S distinct? The problem admits two standard solutions. The first one maintains a hash structure on S, and supports both an insertion and a query in O(1) expected I/Os. The second one simply stores S in a linked list, and thus, supports an insertion in O(1/B) amortized I/Os; a query can be answered in O((N/B) \log_{M/B} (N/B)) I/Os by sorting, where N = |S|, B is the block size, and M is the memory size. We show that both of these naive solutions are already optimal within a poly-logarithmic factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/\log B), then it must incur \Omega(\N / (B \log B)) expected I/Os answering a query in the worst case. This means that the problem is repugnant to update buffering: the achievable query cost jumps from the best possible dramatically to almost the sorting cost as soon as the insertion cost drops slightly below \Omega(1).

Biography:
Yufei Tao is a Professor in the Department of Computer Science and Engineering, Chinese University of Hong Kong (CUHK). Before joining CUHK in 2006, he was a Visiting Scientist at the Carnegie Mellon University during 2002-2003, and an Assistant Professor at the City University of Hong Kong during 2003-2006. He received a Hong Kong Young Scientist Award in 2002 from the Hong Kong Institution of Science. He obtained his PhD degree from Hong Kong University of Science and Technology in 2002, proudly under the supervision of Prof. Dimitris Papadias. He also holds a position of Visiting Professor, under the World Class University (WCU) program of the Korean government, in the Division of Web Science and Technology, Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea. Yufei has worked extensively on indexing and query optimization in spatial and temporal databases. His current research aims to leverage the theory of data structures and computational geometry to develop practical algorithms with non-trivial theoretical guarantees, particularly in dealing with massive datasets that do not fit in memory.