通知公告
网站首页 >  通知公告
讲座信息:《大数据中的近似搜索技术研究》

发布日期:2014-12-23  访问量:

标题: Locality Sensitive Hashing for Big Data


报告时间:2014 年 12月 26日 下午三点到四点半 信息楼四楼 学术报告厅


摘要:

    Finding (approximate) nearest neighbor is a fundamental problem in computational geometry, and plays an important role in many database, multimedia, and machinelearning problems, as real-world objects are usually modelled or represented as one or several high-dimensional feature vectors. The problem is notoriously hard due to the phenomenon named "curse of dimensionality", and it becomes even more challenging in the big data era, calling for new ideas and methodologies.


    While there are numerous heuristic methods to solve this problem, methods based on Locality Sensitive Hashing (LSH) are arguble the most popular and promising approach as they have both rigorous theoretical guarantees on space and time complexities and excellent query performance in practice. However, to achieve such performance, LSH-based methods require huge amount of index. Recent work reduces the index size from O((nd)^{1.5}) to O(n log(n)) (n is the number of data points and d is the dimensionality), but the index is still typically an order of magnitude larger than the size of the data.


    In this talk, we present our recent breakthrough in this problem. We propose a novel method to solve the problem with theoretical guarantees and a tiny index. Furthermore, our method supports a variety of functionalities, such as finding the exact nearest neighbors with guaranteed probabilities. In the experiment, our methods demonstrate superior performance against the state-of-the-art LSH-based methods, and scale up well to a billion 128-dimensional points on a single commodity PC.


个人介绍:

    Dr. Wei Wang is an Associate Professor in the School of Computer Science and Engineering, The University of New South Wales, Australia. His current research interests include keyword search on (semi-)structured data, similarity query processing, high dimensional indexing, and spatial databases. He has published over ninety research papers, with many in premier database journals (TODS, VLDBJ, and TKDE) and conferences (SIGMOD, VLDB, ICDE, and WWW). He is currently  serving on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE).