Bulk Johnson-Lindenstrauss Lemmas
64th ISI World Statistics Congress - Ottawa, Canada
Format: CPS Abstract
Keywords: dimensionreduction, high-dimensional
Session: CPS 59 - Statistical methodology VI
Tuesday 18 July 5:30 p.m. - 6:30 p.m. (Canada/Eastern)
For a set X of N points in D-dimensional space, the Johnson-Lindenstrauss lemma provides random linear maps that approximately preserve all pairwise distances in X -- up to relative error epsilon with high probability -- using a target dimension of O(log(N)/epsilon^2). Certain known point sets actually require a target dimension this large -- any smaller dimension forces at least one distance to be stretched or compressed too much. What happens to the remaining distances? If we only allow a fraction eta of the distances to be distorted beyond relative error epsilon, we show a target dimension of O(log(4e/eta)log(N)/(epsilon^2 R)) is sufficient for the remaining distances. With the stable rank of a matrix as the squared ratio of the Frobenius norm to the operator norm, the parameter R is the minimal stable rank over certain log(N) sized subsets of the difference set X-X or its unit normalized version, involving each point of X exactly once. The linear maps may be taken as random matrices with i.i.d. zero-mean unit-variance sub-gaussian entries. When the data is sampled i.i.d. as a given random vector xi, refined statements are provided; the most improvement happens when the unit normalized hat(xi-xi') is isotropic, with xi' an independent copy of xi, and includes the case of i.i.d. coordinates.