Monday, December 16, 2013

Other People's PageRanks

Two recent studies, both using PageRank on Wikipedia to analyze historical figures, have been brought to our attention. We were previously unaware of them, and it seems worthwhile to report how they relate to our own work.

In Biographical Social Networks on Wikipedia - A cross-cultural study of links that made history by Aragón, Kaltenbrunner, Laniado, and Volkovich, the authors study several graph properties (including in-degree, PageRank, and betweenness) for a large set of biographical pages. Particularly interesting is that they performed their analysis in 15 major languages, providing a test of how the top-ranked figures vary across languages.

These rosters "reveal remarkable similarities between distinct groups of language Wikipedias", which is important in countering a frequent criticism that our English-only analysis suffers from a grave cultural bias. The figures ranked differently across languages are quite interesting, but the take-home lesson is that English-only rankings like ours are more stable than is generally appreciated. The authors also observed that women are apparently underrepresented in Wikipedia: see the article in MIT Technology Review.

The second paper, Highlighting Entanglement of Cultures via Ranking of Multilingual Wikipedia Articles by Eom and Shepelyansky, was published on October 3, 2013, well after our book went to press. They analyze Wikipedia in nine languages, using three measures of network centrality: PageRank (based on incoming links), CheiRank (based on outgoing links, i.e., PageRank computed with every link reversed), and 2DRank (based on both). PageRank generally gave the most informative results.
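To make the difference between these measures concrete, here is a minimal sketch of PageRank by power iteration, with CheiRank obtained by running the same computation on the graph with every link reversed. The damping factor of 0.85, the function names, and the four-page toy graph are our own illustrative assumptions, not Eom and Shepelyansky's code or data; 2DRank, which merges the two orderings, is not shown.

```python
def pagerank(nodes, edges, alpha=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on a directed graph.

    edges is a list of (source, target) links. With probability alpha the
    random surfer follows an out-link; otherwise it jumps to a uniformly random
    page. Pages with no out-links spread their score evenly over all pages.
    """
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Teleportation mass plus redistributed mass from dangling pages.
        new = {v: (1.0 - alpha + alpha * dangling) / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new[dst] += alpha * rank[src] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def cheirank(nodes, edges, **kwargs):
    """CheiRank: PageRank on the same graph with every link reversed."""
    return pagerank(nodes, [(dst, src) for src, dst in edges], **kwargs)


# Hypothetical four-page graph (not real Wikipedia data): a list-style page
# links out to everyone, while one well-known person collects most in-links.
nodes = ["list_page", "famous_person", "minor_figure_1", "minor_figure_2"]
edges = [
    ("list_page", "famous_person"),
    ("list_page", "minor_figure_1"),
    ("list_page", "minor_figure_2"),
    ("minor_figure_1", "famous_person"),
    ("minor_figure_2", "famous_person"),
    ("famous_person", "list_page"),
]
pr = pagerank(nodes, edges)
cr = cheirank(nodes, edges)
print(max(pr, key=pr.get))  # famous_person
print(max(cr, key=cr.get))  # list_page
```

In this toy graph the page that many others link to tops PageRank, while the list-style page that links out widely tops CheiRank.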

Eom and Shepelyansky are interested in how different cultures evaluate people. By looking at the 30 highest-ranked figures in each language, they can identify which historical figures are of global interest and which are local to particular editions. Generally speaking, political figures like kings and presidents rank as local heroes. By taking a consensus of the figures across the nine language editions of Wikipedia, they obtain a global hero ranking. Their top ten are shown below, along with where each appears in our historical ranking:
  1. Napoleon (2)
  2. Jesus (1)
  3. Carl Linnaeus (31)
  4. Aristotle (8)
  5. Adolf Hitler (7)
  6. Julius Caesar (15)
  7. Plato (25)
  8. Charlemagne (22)
  9. William Shakespeare (4)
  10. Pope John Paul II (91)
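One simple way to build a consensus from per-language top-30 lists is a Borda-style point count, sketched below under our own assumptions: a figure at position r in an edition's top-k list earns k + 1 - r points, and points are summed over editions. This is a hypothetical illustration that may differ in detail from the scheme Eom and Shepelyansky actually use; the edition labels and names are made up.

```python
from collections import defaultdict


def consensus_ranking(top_lists, k=30):
    """Borda-style consensus over per-edition top-k lists (an assumed scheme).

    top_lists maps an edition label to its names ordered best-first. A name at
    1-based position r earns k + 1 - r points; names missing from an edition's
    list earn nothing from it. Returns (name, score) pairs, highest first,
    with ties broken alphabetically.
    """
    scores = defaultdict(int)
    for ranked in top_lists.values():
        for r, name in enumerate(ranked[:k], start=1):
            scores[name] += k + 1 - r
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 lists for three made-up editions.
top_lists = {
    "edition_x": ["Local King X", "Scientist", "Philosopher"],
    "edition_y": ["Local King Y", "Philosopher", "Scientist"],
    "edition_z": ["Scientist", "Local King Z", "Philosopher"],
}
result = consensus_ranking(top_lists, k=3)
print(result)
# [('Scientist', 6), ('Philosopher', 4), ('Local King X', 3), ('Local King Y', 3), ('Local King Z', 2)]
```

In the toy run, two of the three editions put their own local king first, yet the scientist and philosopher who appear in every list come out on top overall, mirroring the local-versus-global pattern described above.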
Another just-published article of theirs, Time Evolution of Wikipedia Network Ranking, tracks changes in PageRank and other centrality measures in Wikipedia over time. Finally, in earlier work, they prepared rankings of universities, companies, and several groups of people, including comparisons of PageRank against Hart's "The 100".

Obviously, the high-PageRank figures both teams found in English were the same ones we found, modulo minor differences in Wikipedia version and technical decisions about which pages and links to include. This results in a bit of déjà vu as people like Linnaeus, Napoleon, and Elizabeth II rise to uncomfortably high places.

We see the major contributions of our work as:
  • Integrating other sources of information, like hits, article length, and page edits.
  • Isolating the distinct factors of celebrity and gravitas underlying all these variables.
  • Developing a reputation decay model to permit fair comparisons of contemporary and distant historical figures.
  • Evaluating the resulting rankings against a variety of gold standards and independent metrics, including other published rankings, public opinion polls, frequency in published books, sports statistics, and the prices of autographs and paintings. Our combined significance score significantly outperformed PageRank on these metrics; see page 37 of "Who's Bigger" for details.
  • Finally, using these rankings to perform a systematic study of issues like who belongs in children's American history textbooks, the effectiveness of human decision processes (like recognizing the most appropriate members of a Hall of Fame), and the underrepresentation of women in the historical record.
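As a small illustration of comparing one ranking against another, the sketch below computes Spearman's rank correlation between Eom and Shepelyansky's global top ten and the positions of those same ten figures in our ranking, using the numbers from the top-ten list earlier in this post. This is our own back-of-the-envelope example, not the evaluation procedure from the book, and ten names taken from the top of one list is far too small and selective a sample to say much about how the full rankings agree.

```python
def to_ranks(values):
    """Replace each value by its 1-based position in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid when there are no ties."""
    rx, ry = to_ranks(xs), to_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Global-hero positions 1..10 and the corresponding positions in our ranking
# (Napoleon, Jesus, Linnaeus, Aristotle, Hitler, Caesar, Plato, Charlemagne,
# Shakespeare, John Paul II).
their_rank = list(range(1, 11))
our_rank = [2, 1, 31, 8, 7, 15, 25, 22, 4, 91]

rho = spearman_rho(their_rank, our_rank)
print(round(rho, 3))  # 0.527
```

Within these ten names, Linnaeus and Shakespeare account for most of the disagreement: each contributes a rank difference of 6.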