Thoughts from the Physics Chick: July 2011

Thursday, July 21, 2011

Superhero Metrics, Part V: Applications

There are times when you want to work with a binary category (one that defines items as either in the category or out, with no gradations), but there are other times when you want to be able to rank items in a category by popularity, prototypicality, etc.

Want to try out a new author? Start with their most popular work. (If you don’t like that, then you probably won’t like their work in general.) Can’t remember why an actor’s face looks so familiar? Go to IMDb and sort their filmography by votes. (Their most popular works, which you’re more likely to have seen, will sort to the top.) Want to become more knowledgeable about opera or any other topic? Start by learning about the most popular stuff so that you have a foundation in the subject.

(I admit, I’m having some trouble writing this section, because wanting to be able to sort items in a category by popularity or some other metric is second nature to me. However, others may have entirely different information needs and searching habits.)

In my next and final post, I'll write a summary and conclusion.

Tuesday, July 12, 2011

Superhero Metrics, Part IV: Disadvantages

The WIMCS is not without its own biases and shortcomings. One of the biggest flaws I can see is that it measures only current popularity, without regard to historical popularity or influence.

For example, I’d guess that Iron Man only comes in at such a high ranking because of the success of the 2008 and 2010 films. (Indeed, the page for the 2008 film has a slightly higher WIMCS ranking than the page for Iron Man, himself, suggesting that the film is driving interest in the character, rather than the other way around.)

In the case of Iron Man, specifically, the films were released after Wikipedia came into existence, which means that we can look back in time at the page history to see how the release of the films affected its WIMCS ranking.

However, in the case of such cultural touchstones as Superman, the institutional memory of Wikipedia is far too short to remember a time when Superman was the biggest superhero by far, instead of tied for second with Spider-Man. (Still, if you’ve got an article about you in the Old English Wikipedia, there has to be a sense in which you’ve culturally “arrived.”*)

Another big flaw in this methodology is testability. The WIMCS was developed as a more precise alternative to keyword searches, but in order to verify its accuracy, it needs to be checked against some other metric, which brings us back to keyword searches or some other technique.

For the moment, the only metric I’ve tested it against is my own gut reaction. So, if Bizarro had a higher WIMCS ranking than Superman, I’d know the methodology was flawed, because I know that Superman is more culturally significant than Bizarro. However, if Superman had been slightly ahead of or behind Spider-Man, I don’t think that I would have been surprised. (Indeed, WIMCS rankings may not be fine-tuned enough for it to matter if one article ranks slightly ahead of or behind another one, although I still believe that large differences in rankings should be considered significant.)

Yet another disadvantage is that this technique can measure a topic’s overall popularity, but not its popularity within a certain class. So, Beethoven and Bizet are both members of the class “Opera composers,” and Beethoven’s WIMCS rating is 2.5 times Bizet’s, so Beethoven is the more popular opera composer, right? Well, no. Although Beethoven did write one opera (Fidelio), he’s much better known for his compositions in other forms, while Bizet’s Carmen is a staple of opera companies. (And, indeed, the WIMCS ratings for Fidelio and Carmen bear out this distinction.)

And the last disadvantage is fairly straightforward: This technique doesn’t work for topics that don’t have their own Wikipedia page.

In Part V, I’ll discuss potential applications for this ranking technique.

Monday, July 04, 2011

Superhero Metrics, Part III: Advantages

One of the biggest advantages of this methodology is precision of meaning, the lack of which is the main disadvantage of keyword searches. So, Wikipedia neatly separates out the pages for “wolverine,” the animal (46 links); Wolverine, the X-Man (37 links); and the Wolverines who are the mascot of Utah Valley University (1 link, to a page in Farsi, of all languages).
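As a small illustration of that separation, here is a minimal Python sketch that ranks the three "wolverine" pages by the link counts quoted above. The dictionary labels and the function name rank_by_links are made up for this example, and using the raw link count as the ranking score is an assumption for illustration rather than a full statement of the methodology.

```python
# Link counts quoted in the post for the three "wolverine" pages.
# The labels are informal descriptions, not exact Wikipedia page titles.
# Assumption: the raw link count is used directly as the ranking score.
link_counts = {
    "wolverine (animal)": 46,
    "Wolverine (X-Man)": 37,
    "Wolverines (Utah Valley University mascot)": 1,
}


def rank_by_links(counts):
    """Return (label, count) pairs sorted from most links to fewest."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_by_links(link_counts)
for label, count in ranking:
    print(f"{count:3d}  {label}")
```

Because each sense of the word has its own page, each gets its own count, which a plain keyword search on "wolverine" could not give you.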

Another advantage is that the initial connections between wikis are made by humans, which prevents two words that look the same but have different meanings from being matched up.

Third, even though the Wikipedias of the world are freely editable, these results would be fairly difficult to fake, because you’d have to be able to write about a subject in many different languages.

Lastly, you can use this methodology to get results for any topic that has an entry in the English Wikipedia (although it will obviously be less useful for more obscure topics).
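For anyone who wants to try this on a topic of their own, here is a rough Python sketch of one way to look up a page's language links. It assumes the MediaWiki API's langlinks query on the English Wikipedia endpoint, which is not something the posts themselves describe, and the function names and the sample response are hypothetical. It only builds the request URL and counts links in an already-parsed response; it does not make the network call, and it does not handle continuation for pages with more links than one response returns.

```python
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint, not named in the original posts.
API_URL = "https://en.wikipedia.org/w/api.php"


def langlinks_url(title):
    """Build a query URL asking for a page's interlanguage links."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def count_langlinks(response):
    """Count the language links in a parsed JSON response (a dict)."""
    pages = response.get("query", {}).get("pages", {})
    return sum(len(page.get("langlinks", [])) for page in pages.values())


# Hand-written stand-in for a response, with placeholder language codes.
# This is NOT real Wikipedia data; it only shows the expected shape.
sample_response = {
    "query": {
        "pages": {
            "1": {
                "title": "Example",
                "langlinks": [
                    {"lang": "xx", "*": "Example"},
                    {"lang": "yy", "*": "Example"},
                ],
            }
        }
    }
}

print(langlinks_url("Iron Man"))
print(count_langlinks(sample_response))
```

A topic without its own page comes back with no language links at all, so the count is zero.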

However, there are also a number of disadvantages to this methodology, which I'll cover in Part IV.