Monday, April 7, 2014

Big Data Done Wrong?

An Op-Ed piece in today's New York Times by Gary Marcus and Ernest Davis presents Who's Bigger as the seventh of eight (or nine) problems with Big Data, specifically "giving scientific-sounding solutions to hopelessly imprecise questions". They acknowledge that we get many things right, but complain about "egregious errors".

But guys: given a 379-page book with thousands of rankings to pick from, your killer example is that we ranked Francis Scott Key at position 19 on the poets list? If you don't have a complaint until position 19 on one of several dozen tables in our book, well, we must be doing pretty darn good.

But their chosen example is illuminating, because it gets to the heart of what our rankings are and are not designed to do. Our book carefully claims to measure "historical significance" or "meme strength", not "importance", which is how they insist on misrepresenting it in the article.

So how historically durable will the Francis Scott Key meme be, say, 100 years from now? If there is still a United States stuck with the same national anthem (I'd take that bet), then we can be pretty certain that the Marcus and Davis great-great-great-grandchildren will learn Key's words and the story behind his work.

"Oh say can you see?"  Only if you are willing to look at what data is actually trying to tell you.