The Curse of Dimensionality, Part 2

Sublime Curiosity

A long time ago, I wrote a post about the “curse of dimensionality”, which is a big problem for statisticians and machine-learning scientists. The basic problem is this (see this page for a more detailed explanation by somebody who actually knows what they’re talking about): Say you’re analyzing a population of people, and you want to see if there are any correlations between their height, weight, and age. Those are three separate variables, so your data is three-dimensional. Let’s say you measure height to the nearest centimeter, weight to the nearest kilogram, and age to the nearest year. If you use reasonable ranges for the parameters (height between 0 and 200 cm, weight between 0 and 200 kg, and age between 0 and 120 years), there are a lot of possible combinations of values: 200 × 200 × 120 = 4,800,000, to be exact. But, even with the parameters varying wildly, it’s not too hard to sample…
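The 4,800,000 figure quoted above is just the product of the number of possible values along each axis, and it can be checked with a quick sketch. The bin counts below (200, 200, and 120) are an assumption: they treat each range as one bin per unit step, which is what reproduces the quoted total. The names `bins` and `grid_cells` are made up for illustration.

```python
from itertools import accumulate
from math import prod
from operator import mul

# Assumed number of distinct values per variable: one bin per unit step.
# These counts are chosen because they reproduce the 4,800,000 figure.
bins = {
    "height_cm": 200,
    "weight_kg": 200,
    "age_years": 120,
}


def grid_cells(bins_per_axis):
    """Total number of cells in a grid with the given bin count on each axis."""
    return prod(bins_per_axis)


total = grid_cells(bins.values())
print(total)  # 4800000

# Running product: how the number of cells grows as each variable is added.
growth = list(accumulate(bins.values(), mul))
print(growth)  # [200, 40000, 4800000]
```

The running product is the part worth staring at: every extra variable multiplies the number of cells instead of adding to it.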

