Being put in the Smithsonian Institution is perhaps the most prestigious destination possible for any material object. The Star-Spangled Banner flag is in the Smithsonian. The Hope Diamond is in the Smithsonian. The ruby slippers Judy Garland wore in "The Wizard of Oz" are in the Smithsonian.
Now I can proudly say that I, too, am in the Smithsonian.
In particular, Smithsonian magazine is running a special issue on "The 100 Most Significant Americans of All Time", with its rankings powered by our book Who's Bigger. Coming up with these rankings was an interesting exercise, because it is a challenge to define exactly who is an American. Did they have to be born here? Live most of their life here? Die here? Become a citizen?
The editor of this special issue (Tom Frail) broke our rankings into ten different subdomains, and provided a nice capsule biography and often-surprising picture for each of the chosen people (was Ronald Reagan really ever that young?). It is a fun read and an easy entrée into the Who's Bigger universe.
Monday, November 17, 2014
Thursday, November 13, 2014
Avery Fisher and his Hall
Today's news has an interesting story about how the family of businessman and donor Avery Fisher reached a settlement with the management of Lincoln Center to take his name off the New York Philharmonic's Avery Fisher Hall, freeing it up for a new, presumably much larger, donor.
This story resonates with me for two reasons. First, attaching your name to an important building is an excellent way to retain historical significance. Our rankings put Avery Fisher at 199,082, meaning he ranks among the top quarter of Wikipedia figures. The company where he made his money (Fisher Radio) has long since been absorbed, and his name is no longer that of an active brand. Giving up his name on the building will condemn his fame to decline with time, like that of other mortals.
Second, Avery Fisher is important to me because my father Morris Skiena worked for him as a radio repairman early on, at a time when Fisher had only three employees. Indeed, in this 1946 Fortune Magazine article about Fisher, my father is the guy at the bench with his back to you on the lower left of page 161.
My father knew the future was television, not radio. So, employing the business sense that Skienas are famous for, Dad left the company before Fisher hit it big to become a television repairman. I get reminded of this story every time I pass Avery Fisher Hall. I still call other city landmarks by their old, honest names: the PanAm Building, the Triborough Bridge, and the RCA Building in Rockefeller Center. So I suspect it will always be Avery Fisher Hall to me, regardless of which swell ultimately coughs up the dough to choose its name.
Thursday, September 18, 2014
I was a Rebel at the Wisest Place on Earth
On Labor Day I gave my Who's Bigger talk to students and faculty at the University of Virginia's College at Wise, in Wise, Virginia, as part of their "Digital Rebel" series. Thanks to Daniel Ray and the rest of their faculty for their very gracious hosting. It was an interesting experience, and from their questions I can safely assert that I have never spoken to a Wiser audience.
This visit was particularly meaningful to me because I received my undergraduate degree at the University of Virginia in Charlottesville, which serves as the mother ship to Wise. It was nice to see the connections: in the names of the sports teams (the Cavaliers), in the school logo (a representation of Thomas Jefferson [10]'s Rotunda building at UVa), and even in an architectural remnant left after the 1895 Rotunda fire.
Wise is a small town in the Appalachian Mountains, not far from the corner where Virginia meets Tennessee and Kentucky. Life there appears quite different from life in Manhattan, but it has its own advantages. Driving around town, I noticed a sign proclaiming actor George C. Scott [12170] as a local product. Our rankings mark him as the biggest of the Wise men.
Monday, July 28, 2014
Star Sighting: Who's Bigger at Cafe Boulud?
My wife and I took advantage of a beautiful New York evening, with the kids away at camp, to dine al fresco tonight at Cafe Boulud, one of the nicest restaurants in the city. We tried to take advantage of Restaurant Week prices, only to learn that they apply just to lunch, but that is not the story I am trying to tell.
As we finished our entrées, we were surprised to see television talk show host Charlie Rose sit down at the table next to us. He was then joined by William Goldman, the Academy Award-winning screenwriter of Butch Cassidy and the Sundance Kid and countless other prominent films. They chatted together like old friends, as perhaps they are.
This kind of thing doesn't happen to us much, but I couldn't help wondering: Who's Bigger? Through the miracle of cell phone technology I looked it up. They are amazingly similar. Our algorithms rank Charlie Rose at 13,760, a bit ahead of William Goldman at 15,213.
I would like to report that I told them their rankings, which they found fascinating, and that I am now booked to appear on the next Charlie Rose Show. The truth is, I behaved like a proper New Yorker and let them eat.
I guess I am better at network algorithms like PageRank than networking. At least I hope so.
The Biggest Americans (The Atlantic)
Identifying the most historically significant figures in American history is a natural question for our analysis methods. Indeed, our rankings will be used to fuel a special issue of Smithsonian magazine this fall on the top figures in American history. Look for details in a future post.
But here we react to a special issue on the 100 most influential figures in American history, produced under the aegis of Ross Douthat, which appeared in The Atlantic Monthly in December 2006. Their methodology was based on a poll of historians, in which rankings from ten historians were combined into a single consensus ranking. This raises the obvious question of how our top 100 Americans compare to the Atlantic's choices.
To proceed, one must move past the definitional issues of who qualifies as American. Is an explorer like Columbus American? A naturalized citizen like Albert Einstein? Someone born in the U.S. who established themselves elsewhere, like the poet T.S. Eliot? Our answers are, respectively, no, yes, and no, to be broadly consistent with the Atlantic.
How did the historians do? Pretty well, since there is great overlap between our rankings and theirs. Fully fifty of The Atlantic's top 100 rank in our top 100, with another twenty in our second hundred. Our top three figures are exactly their top three (Lincoln, Washington, and Jefferson), in the same order. The rank correlation between their ordering of their top 100 and ours is 0.654, as shown by the dot plot below:
Perhaps the most important difference between our two rankings is how we treat the Presidents of the United States. Our rankings put 41 of the 43 men who have served as president among our 100 most significant Americans (our apologies to Jimmy Carter and Chester Alan Arthur, who just got nosed out). By contrast, only 17 presidents made the Atlantic's top 100 Americans.
We think this reflects a clear editorial judgement on their part: a list seems less interesting to readers when half of it is stuffed with presidents. Ten of their top 20 Americans were presidents, yet only two of the men ranked 51 to 100 were (including Richard Nixon, sitting provocatively at 99). One can hear the call for diversity and controversy (Ralph Nader?) overriding the historians' better judgement. My guess is that the historians were confronted with a pre-selected group of figures, which they generally ordered in a sensible manner.
Here are the ten Atlantic figures with the weakest Who's Bigger rankings. Three are journalist/media figures (Gallup, Bennett, and Lippmann), while three others are scientists (Salk, Watson, and Mead):
Name | Atlantic Ranking | WB Ranking | Normalized Atlantic Rank | Normalized WB Rank |
George_Gallup | 83 | 29736 | 82 | 100 |
James_Gordon_Bennett,_Sr. | 69 | 19473 | 68 | 99 |
Benjamin_Spock | 88 | 8638 | 87 | 98 |
Walter_Lippmann | 90 | 5854 | 89 | 97 |
Betty_Friedan | 78 | 5553 | 77 | 96 |
Lyman_Beecher | 92 | 5203 | 91 | 95 |
Sam_Walton | 73 | 4923 | 72 | 94 |
Jonas_Salk | 34 | 3775 | 33 | 93 |
James_D._Watson | 68 | 3619 | 67 | 92 |
Margaret_Mead | 82 | 3025 | 81 | 91 |
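The 0.654 rank correlation quoted above could plausibly be computed like this, and the normalized-rank columns in this table could come from the same steps. Restrict both rankings to the names they share, re-rank each list from 1 to n, and apply Spearman's formula. The sketch below assumes there are no ties. The function names are illustrative, and it is not necessarily how our numbers were produced. The four-name sample uses rows from the table. Its correlation says nothing about the full list.

```python
def normalized_ranks(ranking):
    """Re-rank a {name: rank} dict to 1..n, keeping its order (assumes no ties)."""
    ordered = sorted(ranking, key=ranking.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman_rho(rank_a, rank_b):
    """Spearman's rho over the names both rankings share (no-ties formula)."""
    common = set(rank_a) & set(rank_b)
    n = len(common)
    if n < 2:
        raise ValueError("need at least two shared names")
    norm_a = normalized_ranks({k: rank_a[k] for k in common})
    norm_b = normalized_ranks({k: rank_b[k] for k in common})
    d_squared = sum((norm_a[k] - norm_b[k]) ** 2 for k in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Sample rows from the table above: Atlantic rank and raw Who's Bigger rank.
atlantic = {"Jonas_Salk": 34, "James_D._Watson": 68,
            "Margaret_Mead": 82, "George_Gallup": 83}
wb = {"Jonas_Salk": 3775, "James_D._Watson": 3619,
      "Margaret_Mead": 3025, "George_Gallup": 29736}
print(normalized_ranks(wb))
print(spearman_rho(atlantic, wb))  # about 0.2 for this tiny sample
```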
By contrast, here are the ten highest ranking Americans missing from the Atlantic:
Name | WB Ranking |
George_W._Bush | 36 | (our algorithm's most-regretted ranking)
Edgar_Allan_Poe | 54 |
John_F._Kennedy | 71 |
Nikola_Tesla | 93 |
Grover_Cleveland | 98 |
Andrew_Johnson | 105 |
Barack_Obama | 111 | (admittedly, elected after the Atlantic article)
Bill_Clinton | 115 |
Madonna | 121 |
Bob_Dylan | 130 |
These might not all be my personal choices for the ten most historically significant missing Americans. But I have no doubt that I would put our team ahead of the Atlantic's in any game of Who's Bigger.
Wednesday, April 23, 2014
Re-ranking the Pantheon
At the suggestion of Cesar Hidalgo, the leader of the Pantheon project, we repeated our previous analysis restricted to the top 1000 people in the Pantheon rankings. This better captures the people their rankings think are important, so differences in our relative rankings become more meaningful.
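The first number in each row of the tables here is a difference: Pantheon's rank minus ours. Positive values therefore mark figures we place higher, as in the original Pantheon comparison. Below is a small sketch of that computation, including the restriction to Pantheon's top 1000. It assumes both rankings are available as name-to-rank dicts with names already matched. The function and variable names are hypothetical, and the sample values are copied from rows in these posts.

```python
def rank_differences(pantheon, whos_bigger, top_k=None):
    """Return (name, diff, pantheon_rank, wb_rank) rows, largest diff first.

    diff = Pantheon rank - Who's Bigger rank, so positive values are figures
    that Who's Bigger ranks higher (a smaller rank number) than Pantheon does.
    If top_k is given, only names within Pantheon's top_k are kept.
    """
    rows = []
    for name, p_rank in pantheon.items():
        if top_k is not None and p_rank > top_k:
            continue
        wb_rank = whos_bigger.get(name)
        if wb_rank is None:  # names without a match are skipped
            continue
        rows.append((name, p_rank - wb_rank, p_rank, wb_rank))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


pantheon_ranks = {"Woodrow Wilson": 907, "Charles Dickens": 563,
                  "Justin Bieber": 673, "John Marshall": 10521}
wb_ranks = {"Woodrow Wilson": 47, "Charles Dickens": 33,
            "Justin Bieber": 8633, "John Marshall": 401}

for row in rank_differences(pantheon_ranks, wb_ranks, top_k=1000):
    print(row)
```

Dropping `top_k` brings John Marshall (Pantheon rank 10521) back into the comparison at the top of the list.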
First we look at the people from this pool whom our methods rank higher than Pantheon does. By definition, all of these people are highly regarded by both of our rankings. It is clear that we rank American and British leaders higher than they do, because we analyze only the English Wikipedia:
Diff Panrank BigRank Name Who's Dat?
860 907 47 Woodrow Wilson U.S. President
841 996 155 Edward I of England British King
776 961 185 Leonhard Euler Mathematician
674 697 23 Theodore Roosevelt U.S. President
634 799 165 John Milton British Poet/Philosopher
600 985 385 Alexander II of Russia Russian Czar
583 789 206 Edward VI of England British King
556 666 110 Dwight D. Eisenhower U.S. President
553 970 417 John Dewey American Educator
550 954 404 Alexander I of Russia Russian Czar
542 636 94 Harry S. Truman U.S. President
539 654 115 Bill Clinton U.S. President
538 889 351 Francis I of France French King
536 936 400 Soren Kierkegaard Danish Philosopher
530 563 33 Charles Dickens British Writer
524 594 70 William the Conqueror British King
509 815 306 Jacques Cartier French explorer of America
505 742 237 Henry IV of France French King
503 677 174 Geoffrey Chaucer British Writer
498 616 118 Lewis Carroll British Writer
495 762 267 Alfred the Great British King
486 962 476 Eleanor of Aquitaine French/British Queen Consort
446 809 363 George H. W. Bush U.S. President
442 983 541 Archduke Franz Ferdinand Proximate cause of WWI
441 900 459 John Wayne U.S. actor and "Duke"
439 545 106 Alexander Graham Bell Inventor of the telephone
Still, these figures are generally quite familiar to me: I've heard of all of them, although I would not be confident in my ability to tell one Alexander from the other. By contrast, among the figures they rank much higher than we do are several I could not place at all, or could place only as celebrities rather than historical figures:
Diff Panrank BigRank Name Who's Dat?
-7960 673 8633 Justin Bieber Teenaged popular singer
-8008 943 8951 Haruki Murakami Japanese novelist
-8460 850 9310 Carus Short-ruling Roman Emperor
-8463 765 9228 Antisthenes Greek Philosopher
-8601 880 9481 Jenna Jameson American porn star
-8630 734 9364 Anacreon Greek Poet
-8746 363 9109 Anaximenes of Miletus Greek Philosopher
-8836 352 9188 James son of Alphaeus One of Jesus' twelve apostles
-8932 919 9851 Polykleitos Greek sculptor
-9008 934 9942 Lysippos Greek sculptor
-9674 851 10525 Carinus Roman Emperor with Carus (above)
-9866 671 10537 Hor-Aha Egyptian Pharaoh
-10628 920 11548 Kaka Brazilian soccer player
-10696 775 11471 Orhan Pamuk Turkish novelist
-11153 839 11992 Abu Nuwas Classical Arabic poet
-11722 906 12628 Trebonianus Gallus Short-ruling Roman Emperor
-11771 560 12331 Praxiteles Greek sculptor
-11834 368 12202 Vitellius Very short-ruling Roman Emperor
-13291 607 13898 Gaius Maecenas Roman political advisor
-14507 701 15208 Milan Kundera Contemporary Czech novelist
-14571 843 15414 Emir Kusturica Bosnian filmmaker
-16783 610 17393 Paulo Coelho Brazilian novelist
-19060 820 19880 Monica Bellucci Italian actress and model
-21652 737 22389 Francois Villon French poet of the Middle Ages
-22604 974 23578 Pedro Almodovar Spanish Film director
-22754 935 23689 Quintillus Short-lived Roman Emperor
-26427 963 27390 Jean Reno French actor
This roster makes clear the differences in our models for aging historical reputations. About half of these historically-overvalued people are relatively minor figures from ancient times: short-lived Emperors and second-tier philosophers/poets/artists. Many of the rest are contemporary celebrities who don't really belong in anyone's top thousand historical figures, like porn star Jenna Jameson.
There are also a few international artists of real stature (including Orhan Pamuk, Milan Kundera, and Pedro Almodóvar) who might be undervalued by the English Wikipedia relative to international editions. Still, I think our rankings place them in the right order of magnitude.
Wednesday, April 16, 2014
Ranking the Pantheon
A previous post described the MIT Pantheon, another project which used Wikipedia data to rank historical figures. We (meaning Charles, of course) extracted their rankings and matched them to our historical significance rankings, so we could compare them. There is some subtlety in algorithmic name matching, such as determining whether our "Jesus" is the same person as their "Jesus Christ", but we succeeded in matching 10,116 of the Pantheon names to our Who's Bigger rankings. This is roughly 90% of the total, providing a reasonable basis for comparison.
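Here is a minimal sketch of the kind of name matching described above. It assumes that a normalized key, plus a small hand-built alias table, is enough for the easy cases. The key is built by removing case, accents, punctuation, and underscores. `ALIASES`, `normalize`, and `match_names` are hypothetical illustrations, not the code actually used. Real matching also has to handle distinct people whose names normalize to the same key. Here the later name silently wins in the index.

```python
import unicodedata

# Hypothetical alias table mapping normalized keys to a canonical key.
ALIASES = {"jesus christ": "jesus"}


def normalize(name):
    """Lowercase, strip accents, turn punctuation/underscores into spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    key = " ".join(cleaned.split())
    return ALIASES.get(key, key)


def match_names(ours, theirs):
    """Map each name in `theirs` to a name in `ours` sharing its normalized key."""
    index = {normalize(name): name for name in ours}
    matches = {}
    for name in theirs:
        key = normalize(name)
        if key in index:
            matches[name] = index[key]
    return matches


ours = ["Jesus", "Pedro_Almodóvar", "Abraham_Lincoln"]
theirs = ["Jesus Christ", "Pedro Almodovar", "Serge Haroche"]
matched = match_names(ours, theirs)
print(matched)
print(f"matched {len(matched)} of {len(theirs)}")
```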
First off: there is clearly substantial agreement between our placements of historical figures, with a Spearman rank correlation of 0.65 between us and them. Both sets of rankings incorporate aging as part of the methodology, so much of this agreement rests on our shared preference for the tried and true. The Who's Bigger rankings of these figures have a rank correlation of 0.58 with year of birth (older historical figures being more highly ranked), while the comparable figure for Pantheon is 0.53.
More revealing is to look at the extremes: the figures to whom we assign very different ranks than they do. In particular, we computed the difference between our ranks (Pantheon's rank minus ours) and present the figures with the largest and smallest differences. This is not a perfect statistic, since Pantheon ranks fewer than 12,000 people while our numbers go well above 800,000. But it is revealing nonetheless.
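Birth years tie often, so a rank correlation with year of birth needs a tie-aware Spearman. One common approach is sketched here. Tied values get the average of their positions, and the result is the Pearson correlation of those ranks. With rank 1 meaning most significant, "older figures ranked more highly" shows up as a positive correlation between birth year and rank number. The function names are illustrative. This is not claimed to be the exact procedure behind the 0.58 and 0.53 figures.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_with_ties(xs, ys):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when one input is constant")
    return cov / (sx * sy)


print(average_ranks([10, 20, 20, 30]))  # ties share rank 2.5
```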
Diff Panrank BigRank Name Who's Dat?
10120 10521 401 'John Marshall' Chief Justice of the US Supreme Court
10058 11184 1126 'Donald Bradman' Great Cricket champion
10027 10823 796 'William H. Seward' U.S. Secretary of State (bought Alaska)
9963 11077 1114 'Gough Whitlam' Australian Prime Minister
9933 10812 879 'John Churchill 1st Duke of Marlborough' English Statesman
9915 10802 887 'George Washington Carver' African-American Inventor
9886 10405 519 'Tipu Sultan' Ruler of the Kingdom of Mysore
9735 10146 411 'John Jay' Early U.S. Statesman
9536 9935 399 'John C. Calhoun' U.S. Senator /VP (nullification)
9454 9886 432 'Susan B. Anthony' U.S. Suffragist (women's right to vote)
9439 11243 1804 'Alexander Mackenzie' Second Prime Minister of Canada
9243 10064 821 'Abigail Adams' Wife of President John Adams
9215 10729 1514 'Robert Menzies' Longest serving Australian Prime Min.
9207 10917 1710 'Robert Byrd' Long-serving U.S. Senator
9175 10406 1231 'Sojourner Truth' African-American abolitionist
9171 10562 1391 'Lucille Ball' TV Comedian (I Love Lucy)
9171 9330 159 'John A. Macdonald' First Prime Minister of Canada
9165 10466 1301 'Edmund Barton' First Prime Minister of Australia
9130 10318 1188 'Mary Todd Lincoln' Wife of President Abraham Lincoln
9008 10086 1078 'Svetlana Kuznetsova' Russian tennis star
Almost all of these figures are from the English-speaking world: the United States, Canada, Australia, and Great Britain. It is no surprise that our methods (which analyze only the English-language Wikipedia) generally rank these people higher than Pantheon (which analyzes editions in all languages). I personally recognize 14 of the twenty names here, and think they are generally quite Big, although I cringe a bit where some of our rankings are clearly too high (particularly Tipu Sultan and Kuznetsova).
The major American figures here are generally from the 19th century, which makes sense given the difference between our aging model and the one employed in Pantheon (full disclosure: Pantheon has recently changed its rankings, and what we have here may not reflect their current rankings). In particular, our rankings have fully discounted a historical figure by 160 years after birth, while theirs continue historical discounting arbitrarily far into the past. Thus 19th-century figures have generally reached steady state in our analysis, so we value them relatively higher than Pantheon would.
On the other side of the coin are the people whom Pantheon ranks very much higher than we do. The figures below all ranked in the bottom half of Wikipedia figures by our analysis, yet Pantheon identified them among the 12,000 most interesting figures for analysis:
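A toy sketch can show why capping the age adjustment at 160 years favors 19th-century figures relative to an uncapped scheme. The linear form, `RATE`, the default year 2014, and the function names are all hypothetical. The only detail taken from the post is the 160-year cutoff. The sketch says nothing about the direction or size of either project's real adjustment. It only shows that beyond the cap, every older figure is treated alike.

```python
# Hypothetical linear age adjustment; only the 160-year cap comes from the post.
RATE = 0.01
CAP_YEARS = 160


def capped_adjustment(birth_year, current_year=2014):
    """Adjustment that stops growing once a figure is CAP_YEARS past birth."""
    age = max(0, current_year - birth_year)
    return RATE * min(age, CAP_YEARS)


def uncapped_adjustment(birth_year, current_year=2014):
    """Adjustment that keeps growing with age, arbitrarily far into the past."""
    age = max(0, current_year - birth_year)
    return RATE * age


for year in (1954, 1814, 14):
    print(year, capped_adjustment(year), uncapped_adjustment(year))
```

Under the capped version, a figure born in 1814 and one born in the year 14 receive the same adjustment. The uncapped version keeps separating them. This is the "steady state" the paragraph above describes.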
Diff Panrank BigRank Name Who's Dat?
-472241 8052 480293 'Alexandra Stan' Romanian singer and model
-484757 11086 495843 'Serge Haroche' French Nobel Prize winner in Physics, 2012
-493874 9471 503345 'Lola Pagnani' Italian actress
-495688 11148 506836 'Stephane Lannoy' French soccer referee
-497360 10133 507493 'Olivier Giroud' French soccer player
-517354 11160 528514 'Wouter Weylandt' Belgian professional cyclist killed in 2011
-525449 9576 535025 'Nathalia Dill' Brazilian television actress
-525475 10601 536076 'Milos Zeman' Current president of the Czech Republic
-525633 11232 536865 'David J. Wineland' Nobel Prize winner in Physics, 2012
-526148 10774 536922 'Gianluca Ramazzotti' Italian singer-songwriter
-555909 11029 566938 'Linda Maria Baros' Contemporary French poet
-558970 10942 569912 'Jules A. Hoffmann' French Nobel Prize winner in Medicine, 2011
-573789 11144 584933 'Pastora Soler' Spanish Eurovision singer
-581161 11286 592447 'Sun Yang' Chinese Olympic swimmer
-601660 10310 611970 'Kevin Großkreutz' German soccer player
-607491 11318 618809 'Missy Franklin' American Olympic Swimmer, 2012
-613223 11278 624501 'Brian Kobilka' American Nobel Prize winner in Chemistry, 2012
-632278 11224 643502 'Lobsang Sangay' Prime minister in exile for Tibet
-685152 10556 695708 'Berenice Bejo' French-Argentine actress
-689256 11296 700552 'Vaclav Pilar' Czech soccer player
-693543 9577 703120 'Raphael Varane' French soccer player
-717448 11231 728679 'Ludmilla Radchenko' Russian model and actress
-751460 10907 762367 'Anton Lamazares' Contemporary Spanish painter
-803441 11270 814711 'Petr Jiracek' Czech soccer player
These people are generally Europeans, who have the easiest time rising to the Pantheon Wikipedia language threshold. They are also all very contemporary figures, many of whom achieved their greatest renown after the Wikipedia edition we analyzed for our rankings (October 11, 2010), so presumably they would rank somewhat higher if we reran our analysis today.
However, I personally recognized only one name here, and it required some prompting. Bérénice Bejo was the lead actress in "The Artist", which, by the way, was a wonderful picture. These people would generally not be among my 12,000 most significant (or famous) historical figures, but Pantheon's objectives are somewhat different from ours. My guess is that both groups are content with our ranking differences, given our different motivations.
Monday, April 7, 2014
Big Data Done Wrong?
An Op-Ed piece in today's New York Times by Gary Marcus and Ernest Davis presents Who's Bigger as the seventh of eight (or nine) problems with Big Data, specifically "giving scientific-sounding solutions to hopelessly imprecise questions". They acknowledge that we get many things right, but complain about "egregious errors".
But guys: given a 379-page book with thousands of rankings to pick from, your killer example is that we ranked Francis Scott Key at position 19 on the poets list? If they don't have a complaint until position 19 on one of several dozen tables in our book, well, we must be doing pretty darn well.
But their chosen example is illuminating, because it gets to the heart of what our rankings are and are not designed to do. Our book carefully claims to measure "historical significance" or "meme strength", not "importance", as the article insists on misrepresenting it.
So how historically durable will the Francis Scott Key meme be, say 100 years from now? If there is still a United States stuck with the same national anthem (I'd take that bet), then we can be pretty certain the Marcus and Davis great-great-great-grandchildren will learn Key's words and the story behind his work.
"Oh say can you see?" Only if you are willing to look at what data is actually trying to tell you.
Thursday, March 20, 2014
Time Magazine's College Rankings
Time Magazine has launched an interactive feature ranking colleges by the prominence of the Wikipedia pages of their living graduates. Harvard appears to be the top dog by this measure, just edging past Stony Brook (which again failed to make its way into the NCAA basketball tournament, the event which inspired Time's feature).
Their ranking methodology includes certain Wikipedia variables analogous to what we have used, including length and links in/out of the page -- which serves as a poor man's version of PageRank. But PageRank is much better for meaningful notions of importance: links into a page only matter if they are from prominent individuals, and links out have little obvious meaning except that they should be correlated strongly with article length.
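The PageRank point above, that an inbound link counts for more when it comes from a prominent page, can be seen in a few lines of power iteration. This is a minimal textbook sketch, not the implementation behind our rankings or Time's; the page names and link graph are hypothetical, and the damping factor of 0.85 is just the conventional default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links out to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" baseline.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its out-links,
                # so a link from a high-scoring page passes along more weight.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: X and Y each have exactly one inbound link,
# but X's comes from a heavily linked "Hub" while Y's comes from an obscure page E.
links = {
    "A": ["Hub"], "B": ["Hub"], "C": ["Hub"], "D": ["Hub"],
    "Hub": ["X"],
    "E": ["Y"],
}
scores = pagerank(links)
print(scores["X"] > scores["Y"])  # True
```

Counting raw inbound links would score X and Y identically; PageRank separates them because Hub's score, earned from its own inbound links, flows on to X. A page's out-links affect only how its score is shared, not the page's own score.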
The other aspect of such an analysis is properly attributing alumni to schools. The Wikipedia categories give fairly unreliable annotations, although after checking I can confirm that Pat Benatar in fact did attend Stony Brook for a year before dropping out. I guess we "hit her with our best shot".
Tuesday, March 18, 2014
The Pantheon
I was reading my Sunday New York Times when my heart skipped a beat. There in the magazine was an article ``Who's More Famous than Jesus?'' which had to, just had to, be about our Who's Bigger rankings.
Well, it wasn't. A project at MIT called Pantheon was the source of the article. Pantheon also uses analysis of Wikipedia data to rank the fame of historical figures. I will confess to a little sense of Schadenfreude in reading the comments complaining about their rankings, including:
- Their bias towards Americans in particular and the Western world in general.
- That they contain too few women in highly ranked places.
- Gleefully pointing out occasional mechanical misclassifications of individuals (particularly problematic was identifying John Wayne Gacy as a comedian instead of a serial killer)
- Making too big a deal of small differences between rankings of closely matched people
- Complaining that Wikipedia is not a reliable source to analyze world culture.
This all sounded very familiar, because these comments have been made about our rankings as well.
It seems worthwhile to compare our rankings and methodology with that underlying Pantheon. There are several differences between our approaches to using Wikipedia as a resource:
- Languages -- Pantheon makes use of the multiplicity of Wikipedia language editions in its analysis. To be ranked as truly famous one must appear in at least 25 different language editions. This would make the rankings more inclusive of world opinion than our English-only analysis, although reader comments still complain about the Anglo-centric bias of the results.
- Variables -- Of the Wikipedia variables we employ in our rankings (two forms of PageRank, hits, edits, and article lengths), Pantheon only employs page hits. Thus their notion of Fame is more akin to our notion of Celebrity (which loads heavily on hits). Gravitas is the other component of historical significance, which we found loading most heavily on PageRank. Thus we would expect their rankings to emphasize popular culture more heavily than ours do.
- Corrections for Time -- Pantheon employs an exponential decay model of fame in an attempt to correct for the recency bias of fame. This overcompensates for the passage of time: six of the Pantheon top ten were ancient Greeks, with three others (Jesus, Confucius, and Julius Caesar) living 2,000 or more years ago. The most recent member of the Pantheon top ten only gets us to the Renaissance (Leonardo da Vinci). Our aging model is more sophisticated, and calibrated to appearances of names in 200 years of scanned books / Google Ngrams.
- Validation -- Their website includes an analysis of how their rankings compare to performance in three sports domains: Formula 1 racing, tennis, and swimming. Our book discusses how our rankings compare to sports statistics (particularly with respect to baseball), but we also perform a more general set of validation tests, including correlations against 35 published rankings, prices of collectibles including paintings and autographs, and public opinion polls.
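Why a single exponential correction for time can overcompensate is easiest to see with a deliberately simplified sketch: assume observed fame fades by a factor of exp(-rate * age) and multiply that factor back out. This is not Pantheon's published formula; the decay rate and hit counts below are hypothetical, chosen only to show how such a correction behaves.

```python
import math


def decay_corrected(hits, years_since_death, decay_rate):
    """Undo an assumed exponential fading of fame: hits * exp(rate * age)."""
    return hits * math.exp(decay_rate * years_since_death)


# Hypothetical decay rate (0.2% per year) and hypothetical hit counts.
RATE = 0.002
ancient = decay_corrected(hits=1_000, years_since_death=2_400, decay_rate=RATE)
modern = decay_corrected(hits=50_000, years_since_death=50, decay_rate=RATE)

# The ancient figure ends up ahead despite 50 times fewer raw hits.
print(ancient > modern)  # True
```

At this assumed rate, a figure dead 2,400 years gets a multiplier of e^4.8 (about 120), while one dead 50 years gets about 1.1. Because the multiplier grows without bound, any single fixed rate tends to favor the very old, which is consistent with a top ten dominated by ancient Greeks.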
To their credit, their website is fun to play with and features a host of interesting visualizations.
But how good are the rankings? It is easy to cherry-pick any set of rankings for things that look weird. They name Rasmus Lerdorf (developer of the programming language PHP, whom, frankly, I had never heard of) among their top 11,000 people, on the strength of being in more than 25 Wikipedia editions (he is actually in 31). By comparison, we have him as the 51,670th most significant figure. They rank Justin Bieber at 671 to our 8633, and Johnny Depp at 203 to our 2739, suggesting an over-emphasis of celebrity at the expense of gravitas.
But the right way to compare rankings is through validation measures. This takes work, but I hope we can do such a study soon. We will report our results here when we do.
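One standard validation measure for comparing two rankings is Spearman's rank correlation. Below is a minimal sketch that assumes no tied positions and compares only people present in both lists; the names and positions are hypothetical placeholders, not figures from either project.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation between two rankings of the same people.

    ranks_a, ranks_b: dicts mapping a name to its position (1 = most significant).
    Only names present in both rankings are compared. Assumes no tied positions.
    """
    shared = sorted(ranks_a.keys() & ranks_b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two people in common")

    def rerank(ranks):
        # Convert raw positions (e.g. 671, 8633) into 1..n within the shared set.
        ordered = sorted(shared, key=lambda name: ranks[name])
        return {name: pos for pos, name in enumerate(ordered, start=1)}

    ra, rb = rerank(ranks_a), rerank(ranks_b)
    d_squared = sum((ra[name] - rb[name]) ** 2 for name in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical positions for four placeholder people in two rankings.
ours = {"P1": 120, "P2": 950, "P3": 4_000, "P4": 30_000}
theirs = {"P1": 15, "P2": 2_100, "P3": 800, "P4": 11_000}
print(spearman_rho(ours, theirs))
```

Here the two lists agree except for swapping one adjacent pair, which gives rho = 0.8; identical orderings give 1 and fully reversed orderings give -1. A real study would also need a policy for ties, which `scipy.stats.spearmanr` handles by assigning average ranks.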
Saturday, February 15, 2014
First Ladies: Siena rank vs. Skiena rank?
As part of a collaboration with C-SPAN, Siena Research Institute has just presented the results of its latest historian poll ranking the top American First Ladies, i.e. the wives of the presidents. They have conducted five such rankings over the past 31 years, by asking experts to rate each first lady in such categories as Background, Value to the Country, Leadership, Being her own Woman, Accomplishments, and Courage.
We constructed our own rankings of First Ladies in Who's Bigger, through Wikipedia analysis, so it is an interesting exercise to compare our rankings. Bottom line -- we come off quite well.
We agree with the poll's selection of Eleanor Roosevelt as the top first lady. In fact, six of our top ten appear among the top ten in the Siena Poll. All of our top ten rank in the top half of the 38 first ladies ranked by Siena, except for one. We regard Mary Todd Lincoln as the fifth most significant first lady, whereas they rank her 30th. There is no contradiction here: the meme associated with Mary Todd Lincoln is of a needy, crazy woman tormenting her husband when he really had other things to deal with. She was indeed historically significant, but not in a favorable sense.
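Both comparisons in the paragraph above (how many of our top ten land in Siena's top ten, and which of our top ten fall outside the top half of their list) are simple list operations. A sketch, assuming each ranking is a Python list ordered from most to least significant; the letters are placeholders, not actual First Ladies.

```python
def top_k_overlap(ours, theirs, k=10):
    """How many of the first k names in `ours` also appear in the first k of `theirs`."""
    return len(set(ours[:k]) & set(theirs[:k]))


def outside_top_half(ours, theirs, k=10):
    """Names in our top k that fall outside the top half of `theirs` (or are absent from it)."""
    top_half = set(theirs[: len(theirs) // 2])
    return [name for name in ours[:k] if name not in top_half]


# Placeholder letters, not actual First Ladies; lists run from most to least significant.
ours = ["A", "B", "C", "D", "E", "F"]
theirs = ["B", "A", "D", "E", "F", "C"]
print(top_k_overlap(ours, theirs, k=3))     # 2
print(outside_top_half(ours, theirs, k=3))  # ['C']
```

With Siena's 38 ranked first ladies, the top half cutoff in `outside_top_half` is the first 19 names.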
Our ranking of the ten least significant First Ladies included three Siena didn't bother to rank. Chester Arthur and Martin Van Buren were widowers when they entered the White House, so it is questionable whether we should have considered their spouses at all. William Henry Harrison died after a month in office, barely leaving his wife with time to unpack. Our remaining seven slots are filled with four from Siena's bottom ten (the wives of Taylor, Pierce, Fillmore, and McKinley), with the other three all ranking in the bottom half of the Siena poll.
These results demonstrate the ability of our ranking methods to tease apart significance even of relatively minor historical figures (the average first lady ranks somewhere around 15,000 to 20,000). My suspicion is that Wikipedia-based rankings do particularly well at this task because the expert panelists probably snuck peeks at the encyclopedia to help answer the poll! I expect very few historians could keep straight the accomplishments of all the first ladies without a refresher.
Tuesday, January 21, 2014
A Moment of Wikipedia Glory!
Charles and I were surprised and flattered to discover that Who's Bigger has officially been granted its very own Wikipedia page, in English. By repeating our computational analysis in the future, we will now be able to rigorously determine whether we are bigger than, say, the Bible. OK, maybe this is somewhat aspirational, but as the first of my five books to earn its own Wikipedia page, Who's Bigger already becomes my biggest book by default.
The coolest thing is that for eight hours starting 08:00, 21 January 2014 (UTC) our book held pride of place under Did you know? on the front page of Wikipedia! Such placement matters. Statistics show that our page has been accessed 7596 times over the past thirty days, exactly 5144 of which came on January 21.
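Put as a fraction, the two figures quoted above place roughly two-thirds of the thirty-day total on that single day:

```latex
\frac{5144}{7596} \approx 0.677
```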
Thursday, January 16, 2014
Professor, R.I.P.
What one is exposed to as a youth can have a tremendous impact on future life paths. I have spent my full working life as a college professor, but did not come from an academic family. During my youth, there was only one professor I was really aware of, and his model no doubt influenced my choice of career in ways that I am not fully aware of.
I feel moved to acknowledge the influence of Russell Johnson, the Professor on the TV show ``Gilligan's Island'', who passed away today. He never seemed constrained by disciplinary boundaries; a generalist with deep knowledge of every subject, and a flair for creating high technology items out of coconut shells. His model of the nerdy academic wandering his tropical island paradise was so compelling that maybe it helped turn me into a nerdy academic wandering my not-so-tropical, Long Island not-so-paradise...
This seems an appropriate opportunity to rank the seven stars of Gilligan's Island by their historical significance:
- Jim Backus (Mr. Howell) 15,374.
- Alan Hale, Jr. (The Captain) 23,841.
- Bob Denver (Gilligan) 26,025.
- Tina Louise (Ginger) 28,365.
- Dawn Wells (Mary Ann) 35,815.
- Natalie Schafer (Mrs. Howell) 43,321.
- Russell Johnson (The Professor) 48,384.
These actor rankings broadly reflect my sense of reality. Jim Backus was a genuine movie star (remember him as the father in ``Rebel without a Cause''?) who achieved his greatest cultural role as the voice of Mr. Magoo. I would have ranked Bob Denver ahead of Alan Hale, but the captain appeared in several movies prior to his role on the show. None of the four supporting actors ever had any really significant roles outside the show, a typecasting fate which seemed to strike many of the television actors of the era.
Thursday, January 2, 2014
Putting "Who's Bigger" in its Proper Place
Not all reviewers of our book "Who's Bigger" have been fully appreciative of our work. Don't worry: we will look at these in a future blog post.
But the most effective job of putting "Who's Bigger" into proper perspective was done by one Abby Skiena, age 10, the daughter of mine to whom the book was dedicated.
On the official publication date I presented both of my children with signed copies of the book for them to forever treasure. Abby was excited enough to bring hers to school the next day for show and tell.
"Abby, did your classmates think the book was cool?" I asked when I got home.
"Kinda," she answered without enthusiasm. "But you see, I went after Caroline."
"Oh. What did Caroline have to show?"
"She just got back from Harry Potter World, with lots of souvenirs. She even passed out Every Flavor Jelly Beans for us to eat. One kid got vomit flavor..."