Videos

Active Clustering and Ranking

Presenter
September 26, 2011
Keywords:
  • Clustering
MSC:
  • 91C20
Abstract
Clustering and ranking is often based on pairwise similarities (metric data) or comparisons (ordinal data). Most methods assume that the entire collection of all possible pairwise similarities or comparisons are known, but in high-dimensional settings there may be missing data and/or the costs of collecting this information may be prohibitive. The existence of clusters or other intrinsic low-dimensional structure can induce redundancy in these data, and therefore it may be possible to robustly cluster or rank based on a small subset of the data. I will discuss two results that demonstrate this phenomenom. First, I will describe a clustering procedure that generates a hierarchical clustering of N objects using only N log N similarities, instead of all N(N-1)/2 similarities. The method can recover all clusters of size larger than log N, even in the presence of a limited fraction of arbitrarily corrupted or noisy similarities. Second, if N objects are embedded in d-dimensions (d