Wednesday, April 16, 2014

Ranking the Pantheon

A previous post described the MIT Pantheon, another project that uses Wikipedia data to rank historical figures. We (meaning Charles, of course) extracted their rankings and matched them to our historical significance rankings so we could compare the two. There is some subtlety in algorithmic name matching, such as determining whether our "Jesus" is the same person as their "Jesus Christ", but we succeeded in matching 10,116 of the Pantheon names to our Who's Bigger rankings. This is roughly 90% of the total, providing a reasonable basis for comparison.
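
For readers curious what this kind of name matching might look like, here is a minimal sketch. It is not the code we actually ran: the function names, the normalization rules, and the tiny alias table are all assumptions made up for illustration. The idea is simply to reduce each name to a normalized key and use a hand-built alias table for cases like "Jesus Christ".

```python
import re
import unicodedata

# Hypothetical alias table: normalized variant -> canonical key.
# A real one would need many more hand-checked entries.
ALIASES = {"jesus christ": "jesus"}


def normalize(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    key = " ".join(cleaned.split())
    return ALIASES.get(key, key)


def match_names(pantheon_names, our_names):
    """Return (matches, unmatched); matches maps a Pantheon name to our name."""
    index = {normalize(n): n for n in our_names}
    matches, unmatched = {}, []
    for name in pantheon_names:
        ours = index.get(normalize(name))
        if ours is None:
            unmatched.append(name)
        else:
            matches[name] = ours
    return matches, unmatched


matches, unmatched = match_names(
    ["Jesus Christ", "Stéphane Lannoy", "Nobody Here"],
    ["Jesus", "Stephane Lannoy"],
)
```

Exact matching on a normalized key is deliberately conservative: it leaves some names unmatched rather than risk pairing two different people who happen to share a similar name.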

First off: there is clearly substantial agreement between our placements of historical figures and theirs, with a Spearman rank correlation of 0.65 between the two rankings. Both sets of rankings incorporate aging as part of the methodology, so much of this agreement rests on a shared preference for the tried and true. The Who's Bigger rankings of these figures have a rank correlation of 0.58 with year of birth (older historical figures being more highly ranked), while the comparable number for Pantheon is 0.53.
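
As a reminder of what the Spearman statistic measures, here is a small self-contained sketch. It assumes the two lists are paired and contain no ties, which lets us use the textbook formula 1 - 6*sum(d^2) / (n*(n^2 - 1)). The function names are ours, chosen for illustration, and the three sample pairs are taken from the table further down.

```python
def to_ranks(values):
    """Rank values 1..n, smallest first (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation for two paired, tie-free lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired lists with at least two items")
    rx, ry = to_ranks(xs), to_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Pantheon rank and our rank for Marshall, Bradman, and Macdonald.
pantheon = [10521, 11184, 9330]
ours = [401, 1126, 159]
rho = spearman(pantheon, ours)
```

Re-ranking within the matched set matters here, because the raw ranks come from lists of very different lengths. With ties, one would instead compute the Pearson correlation of average ranks.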

More revealing is to look at the extremes: the figures to whom we assign very different ranks than they do. In particular, we computed the difference between the two ranks (Pantheon minus ours) and present the figures with the largest and smallest differences. This is not a perfect statistic, since Pantheon ranks fewer than 12,000 people while our ranks run well above 800,000. But it is revealing nonetheless.

Diff    Panrank  BigRank  Name                                       Who's Dat?
10120   10521    401      'John Marshall'                            Chief Justice of the US Supreme Court
10058   11184    1126     'Donald Bradman'                           Great cricket champion
10027   10823    796      'William H. Seward'                        U.S. Secretary of State (bought Alaska)
9963    11077    1114     'Gough Whitlam'                            Australian Prime Minister
9933    10812    879      'John Churchill 1st Duke of Marlborough'  English Statesman
9915    10802    887      'George Washington Carver'                 African-American Inventor
9886    10405    519      'Tipu Sultan'                              Ruler of the Kingdom of Mysore
9735    10146    411      'John Jay'                                 Early U.S. Statesman
9536    9935     399      'John C. Calhoun'                          U.S. Senator/VP (nullification)
9454    9886     432      'Susan B. Anthony'                         U.S. Suffragist (women's right to vote)
9439    11243    1804     'Alexander Mackenzie'                      Second Prime Minister of Canada
9243    10064    821      'Abigail Adams'                            Wife of President John Adams
9215    10729    1514     'Robert Menzies'                           Longest-serving Australian Prime Minister
9207    10917    1710     'Robert Byrd'                              Long-serving U.S. Senator
9175    10406    1231     'Sojourner Truth'                          African-American abolitionist
9171    10562    1391     'Lucille Ball'                             TV Comedian (I Love Lucy)
9171    9330     159      'John A. Macdonald'                        First Prime Minister of Canada
9165    10466    1301     'Edmund Barton'                            First Prime Minister of Australia
9130    10318    1188     'Mary Todd Lincoln'                        Wife of President Abraham Lincoln
9008    10086    1078     'Svetlana Kuznetsova'                      Russian tennis star
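
The Diff column above is just the Pantheon rank minus our rank, sorted from largest to smallest. A minimal sketch of that computation follows, using three rows from the table. The function name and the tuple layout are assumptions for illustration, not our actual code.

```python
def rank_diffs(rows):
    """Return (diff, name) pairs, largest difference first.

    Each row is (name, pantheon_rank, our_rank); diff = pantheon - ours,
    so a large positive diff means we rank the person much higher.
    """
    diffs = [(pan - big, name) for name, pan, big in rows]
    return sorted(diffs, reverse=True)


rows = [
    ("Svetlana Kuznetsova", 10086, 1078),
    ("John Marshall", 10521, 401),
    ("Donald Bradman", 11184, 1126),
]
ranked = rank_diffs(rows)
```

Taking the first few entries of the sorted list gives the people we rank far above Pantheon, and the last few give the people Pantheon ranks far above us.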

Almost all of these figures are from the English-speaking world: United States, Canada, Australia, Great Britain. It is no surprise that our methods (which only analyze the English-language Wikipedia) generally rank these people higher than Pantheon (which analyzes editions from all languages). I personally recognize 14 of the 20 names here, and think they are generally quite Big, although I cringe a bit where some of our rankings are clearly too high (particularly Tipu Sultan and Kuznetsova).

The major American figures here are generally from the 19th century, which makes sense given the difference between our aging model and the one employed in Pantheon (full disclosure: Pantheon has recently changed its rankings, and what we have here may not be their current rankings). In particular, our rankings have fully discounted a historical figure 160 years after birth, while theirs continue historical discounting arbitrarily far into the past. Thus 19th-century figures have generally reached steady state by our analysis, so we rank them relatively higher than Pantheon does.

The other side of the coin is the people whom Pantheon ranks very much higher than we do. The figures below all rank in the bottom half of Wikipedia figures by our analysis, yet were identified by Pantheon among the 12,000 most interesting figures for analysis:

Diff     Panrank  BigRank  Name                   Who's Dat?
-472241  8052     480293   'Alexandra Stan'       Romanian singer and model
-484757  11086    495843   'Serge Haroche'        French Nobel Prize winner in Physics, 2012
-493874  9471     503345   'Lola Pagnani'         Italian actress
-495688  11148    506836   'Stephane Lannoy'      French soccer referee
-497360  10133    507493   'Olivier Giroud'       French soccer player
-517354  11160    528514   'Wouter Weylandt'      Belgian professional cyclist killed in 2011
-525449  9576     535025   'Nathalia Dill'        Brazilian television actress
-525475  10601    536076   'Milos Zeman'          Current president of the Czech Republic
-525633  11232    536865   'David J. Wineland'    Nobel Prize winner in Physics, 2012
-526148  10774    536922   'Gianluca Ramazzotti'  Italian actor
-555909  11029    566938   'Linda Maria Baros'    Contemporary French poet
-558970  10942    569912   'Jules A. Hoffmann'    French Nobel Prize winner in Medicine, 2011
-573789  11144    584933   'Pastora Soler'        Spanish Eurovision singer
-581161  11286    592447   'Sun Yang'             Chinese Olympic swimmer
-601660  10310    611970   'Kevin Grosskreutz'    German soccer player
-607491  11318    618809   'Missy Franklin'       American Olympic swimmer, 2012
-613223  11278    624501   'Brian Kobilka'        American Nobel Prize winner in Chemistry, 2012
-632278  11224    643502   'Lobsang Sangay'       Prime minister in exile for Tibet
-685152  10556    695708   'Berenice Bejo'        French-Argentine actress
-689256  11296    700552   'Vaclav Pilar'         Czech soccer player
-693543  9577     703120   'Raphael Varane'       French soccer player
-717448  11231    728679   'Ludmilla Radchenko'   Russian model and actress
-751460  10907    762367   'Anton Lamazares'      Contemporary Spanish painter
-803441  11270    814711   'Petr Jiracek'         Czech soccer player

These people are generally Europeans, who have the easiest time rising to Pantheon's Wikipedia language threshold. They are also all very contemporary figures, many of whom achieved their greatest renown for achievements occurring after the Wikipedia edition we analyzed in our rankings (October 11, 2010), so presumably they would be ranked somewhat higher if we reran our analysis today.

However, I personally recognized only one name here, and it required some prompting. Berenice Bejo was the lead actress in "The Artist" which, by the way, was a wonderful picture. These people would generally not be in my 12,000 most significant (or famous) historical figures, but Pantheon's objectives are somewhat different from ours. My guess is that both groups are content with our ranking differences given our different motivations.