Distances between points in high-dimensional spaces

See page 39 of the CIML book (Chapter 3). This notebook builds on those experiments.

What is a typical distance between two random points?

Simple experiment: we'll generate 200 random n-dimensional points, with each coordinate drawn uniformly from [0, 1), and compute the distance between every pair of points.

Question 1: How many unique pairs of points are there?
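
One way to check your answer to Question 1, as a minimal sketch: with m points, every unordered pair is counted once, giving m(m-1)/2 pairs. This is also the length of the condensed vector that `pdist` returns for m rows. The variable names below are illustrative and not part of the original notebook; m = 200 matches the number of points generated in the experiment.

```python
from itertools import combinations
from math import comb

m = 200  # number of random points used in the experiment

# Closed form: "m choose 2" unordered pairs
n_pairs = comb(m, 2)

# Brute-force check: enumerate every unordered pair of point indices
n_pairs_enumerated = sum(1 for _ in combinations(range(m), 2))

print(n_pairs, n_pairs_enumerated, m * (m - 1) // 2)
```

The three printed values should agree, so each histogram in this notebook summarizes that many distances.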

In [11]:
import numpy as np
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt
%matplotlib inline

def pair_distances_randompoints(n, metric):
    """Generate 200 random n-dimensional points and plot a histogram
    of the distances between all pairs of points."""
    # 200 random points as row vectors, coordinates uniform in [0, 1)
    data = np.random.rand(200, n)
    # pdist returns one distance per unique pair of rows
    pairwise = pdist(data, metric)
    mean = np.mean(pairwise)
    plt.figure()
    plt.hist(pairwise, 50)  # 50 bins
    plt.xlabel('Distance')
    plt.ylabel('Number of Pairs')
    plt.title('{0} distances in {1}-dim space. Mean={2:.2f}. Variance/Mean={3:.3f}'.format(
        metric, n, mean, np.var(pairwise) / mean))


for n in [1, 2, 3, 4, 10, 10000]:
    pair_distances_randompoints(n, 'euclidean')
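
A rough sanity check for the Euclidean histograms, offered as a sketch rather than as part of the original experiment: for two independent coordinates x and y drawn uniformly from [0, 1), E[(x - y)^2] = 1/6, so the expected squared distance in n dimensions is n/6. By Jensen's inequality, sqrt(n/6) is an upper bound on the mean distance, and the gap shrinks relative to the mean as n grows. The helper `mean_distance_uniform`, its `num_pairs` parameter, and the fixed seed are hypothetical names introduced here for illustration. Note that it samples independent pairs of points instead of taking all pairs among 200 points, so its numbers will differ slightly from the plot titles.

```python
import numpy as np


def mean_distance_uniform(n, num_pairs=200, seed=0):
    """Estimate the mean Euclidean distance between two independent
    points drawn uniformly from the unit cube [0, 1)^n."""
    rng = np.random.default_rng(seed)
    x = rng.random((num_pairs, n))  # first point of each pair
    y = rng.random((num_pairs, n))  # second point of each pair
    return np.linalg.norm(x - y, axis=1).mean()


for n in [1, 2, 3, 4, 10, 10000]:
    estimate = mean_distance_uniform(n)
    bound = np.sqrt(n / 6)  # square root of the expected squared distance
    print('n={0}: sampled mean={1:.3f}, sqrt(n/6)={2:.3f}'.format(n, estimate, bound))
```

For small n the sampled mean sits noticeably below sqrt(n/6); for large n the two should nearly coincide, which is consistent with distances concentrating around a single typical value.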