One of the cool things about big data is that it lets you see every trend, whether you expect it or not. I remember the old Hunch reports that would show interesting anomalies, like the fact that dog owners are 16% more likely to consider Paul McCartney their favorite Beatle. These are facts: we may not know why they are, but we know that they are. In this way, a good statistical data set presents the world the way it “is,” not the way it “ought to be.” Even if owning a dog “should” be unrelated to your favorite Beatle, it “is” correlated.
That’s why I was really interested to find out that Google is dumping the dictionary as the basis for auto-correct and relying instead on trends in how people search and how people spell words on websites. This is interesting because spelling has always been something where “should” rules “is”. The old men in long robes (at least that’s how I think of them) at Webster’s and American Heritage tell us how we SHOULD spell words. Think of the possibilities with Google’s switch. What if I started a campaign tomorrow to get everyone to spell tomorrow with one r, “tomorow”? If I could talk a large number of influential websites and a bunch of searchers into doing this, we could change the spelling of the word (at least according to Google). We could do this even if the entire staff of Webster’s and American Heritage disagreed with us.
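For fellow stat nerds, here is a toy sketch of the idea in Python. It is not how Google actually does it; it just assumes that the "right" spelling is whichever variant shows up most often in the text you have collected. The function name `preferred_spelling` and both word lists are made up for illustration.

```python
from collections import Counter


def preferred_spelling(variants, corpus_words):
    """Return whichever candidate spelling appears most often in the corpus.

    This is an illustrative assumption about usage-driven spelling,
    not a description of any real search engine's method.
    """
    counts = Counter(word.lower() for word in corpus_words)
    # Counter returns 0 for spellings that never appear, so max() is safe.
    return max(variants, key=lambda variant: counts[variant])


# Hypothetical usage data: today the dictionary spelling dominates.
today = ["tomorrow"] * 95 + ["tomorow"] * 5

# Hypothetical usage data after a successful one-r campaign.
after_campaign = ["tomorrow"] * 40 + ["tomorow"] * 60

print(preferred_spelling(["tomorrow", "tomorow"], today))
print(preferred_spelling(["tomorrow", "tomorow"], after_campaign))
```

With the first list the two-r spelling wins, and with the second the one-r spelling wins. No dictionary is consulted anywhere; the answer changes only because the usage counts changed.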
Maybe it’s just because I’m a stat nerd, but I think that’s really interesting. Call it the democratization of all things.