Last Monday, Slate Magazine ran an article called “Awkward Suggestions: Let's Have Fun with the Google Search Box.” The article takes Google's predictive search feature (the feature that suggests “Michelle Obama” if you start typing “michelle o”) and compares what “less intelligent” vs. “more intelligent” people are searching for, by comparing queries that start with “less intelligent” and “more intelligent” phrases, such as “how 2” vs. “how might one.”
I agree that it's interesting that typing in “how 2” prompts phrases such as “how 2 kiss” and “how 2 get a six pack,” while typing in “how might one” prompts phrases such as “how might one treat poisoning from curare” and “how might one discover a new piece of music.” However, I take issue with the notion that one search is “less intelligent” and one is “more intelligent.” They're both bad searches.
Imagine the search query “how to grow bananas.” Now imagine two web pages, one titled “how to kill ninjas” and one titled “tips on growing bananas.” The second one is clearly more relevant to the search, but the first title is actually closer to the search query, because it matches 2/4 words, while the second one matches only 1/4 words. Our smart human brains know that “bananas” is more important to the query than “how” and “to,” but computers don't know this unless programmers tell them.
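To make the arithmetic concrete, here is a minimal sketch of that naive matching, assuming a search engine that does nothing smarter than count shared words. The function name `overlap` is made up for illustration; no real search engine is this simple.

```python
def overlap(query: str, title: str) -> tuple[int, int]:
    """Return (matching words, total query words) for a naive word match."""
    query_words = query.lower().split()
    title_words = set(title.lower().split())
    # Count each query word that also appears somewhere in the title.
    matches = sum(1 for word in query_words if word in title_words)
    return matches, len(query_words)


query = "how to grow bananas"
print(overlap(query, "how to kill ninjas"))       # (2, 4)
print(overlap(query, "tips on growing bananas"))  # (1, 4)
```

The ninja page wins on raw word count, which is exactly the problem.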
What search engine programmers do, then, is give the computer a list of “stop words,” or words to ignore when matching documents to searches. The actual stop words will vary from program to program, but they typically include pronouns, prepositions, conjunctions, modals, and some adverbs. Here is a good sample list.
So, now that our search engine is armed with a stop list, it knows to ignore the words “how” and “to,” which means that “kill ninjas” has 0/2 words in common with the original query, while “growing bananas” is a better match at 1/2. (The reason it's still only 1/2 is that computers also don't know that the words “grow” and “growing” are close enough that someone who searches for one is probably also interested in pages containing the other. Introducing a “synonym ring” which equates “grow” and “growing” allows a search engine to match queries containing one term with results containing the other.)
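Here is a rough sketch of both ideas, the stop list and the synonym ring, applied to the banana example. The tiny `STOP_WORDS` set and the `SYNONYM_RING` mapping are invented for illustration; a real stop list is much longer, and real engines handle word variants in more sophisticated ways.

```python
# Illustrative only: a real stop list contains far more words.
STOP_WORDS = {"how", "to", "on", "a", "might", "one"}

# Illustrative synonym ring: every variant maps to one shared term.
SYNONYM_RING = {"growing": "grow", "grows": "grow"}


def content_words(text, use_ring=False):
    """Lowercase the text, drop stop words, optionally apply the synonym ring."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    if use_ring:
        words = [SYNONYM_RING.get(w, w) for w in words]
    return words


def score(query, title, use_ring=False):
    """Return (matching content words, total content words in the query)."""
    query_words = content_words(query, use_ring)
    title_words = set(content_words(title, use_ring))
    matches = sum(1 for w in query_words if w in title_words)
    return matches, len(query_words)


query = "how to grow bananas"
print(score(query, "how to kill ninjas"))                      # (0, 2)
print(score(query, "tips on growing bananas"))                 # (1, 2)
print(score(query, "tips on growing bananas", use_ring=True))  # (2, 2)
```

With the stop list alone, the banana page already beats the ninja page; with the synonym ring switched on, it becomes a perfect match.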
My point (and I do have one) is that all of the search strings compared in the article are composed entirely of stop words. The results for “how 2 tie a tie,” “how might one tie a tie” and “tie a tie” are virtually identical, because including stop words in a Google search query has almost no effect on the search results. Slate Magazine isn't comparing “more intelligent” and “less intelligent” queries; it's just comparing casually worded bad queries and hifalutin bad queries.
This is a Basque carol with English lyrics written by Sabine Baring-Gould. I found a version sung in the original Basque on YouTube, but it was absolutely terrible, so I'm embedding an English-language version instead.
Just to remind you of what a truly funky language Basque is, I'm copying out the first verse and English gloss:
Oi Betleem! Ala egun zoure gloriak, Oi Betleem! Hanitch beitu distiatzen! Zoure ganik heltu argiak, Bethatzen tu bazter guziak, Oi Betleem!
O Bethlehem! Ah! how your glory today shines out brightly! The light that comes from you fills every corner.
Unknown choir:
Honorable Mention:
Like I said, I couldn't find a good Basque-language version on YouTube, but I did find this version by The American Boychoir at Yahoo Music.
Over the last several days, I've had a couple of friends ask me to vote for them in online contests. One of them lets you cast a vote once a day, and as I've seen my friend's entry rise in the stats, I've realized that success in these types of contests is due more to an ability to mobilize masses of friends than it is to innate talent. Don't get me wrong — I wouldn't support either of these people if I didn't think they had talent and deserved to succeed — but I'm not actually invested in either competition and I suspect that most of the people who are voting are in a similar position.
I got to thinking about how I might try and fix such a system so that more people were voting for people besides their friends. I think I would set up the voting system so that everyone who wanted to vote for someone would also be presented with four other randomly selected entries and required to cast a "second place vote" for one of them. For the sake of honesty, the second place vote would probably need to count less than a "first place" vote, but since the second place votes would presumably be made without personal bias, they could end up determining the winner.
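Here is a rough sketch of how such a tally might work. The weight of 0.5 for a second place vote is purely my assumption (all I've decided is that it should count for less than a first place vote), and the function names are made up for illustration.

```python
import random
from collections import Counter

# Assumption: a second place vote is worth half a first place vote.
SECOND_PLACE_WEIGHT = 0.5


def second_place_options(entries, first_choice, k=4, rng=random):
    """Pick up to k random entries, excluding the voter's first choice."""
    others = [e for e in entries if e != first_choice]
    return rng.sample(others, min(k, len(others)))


def tally(ballots, second_weight=SECOND_PLACE_WEIGHT):
    """Sum weighted scores from (first_choice, second_choice) ballots."""
    scores = Counter()
    for first, second in ballots:
        scores[first] += 1.0
        scores[second] += second_weight
    return scores


# Entry C has no friends voting for it, but every voter picks it second.
ballots = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"), ("D", "C")]
scores = tally(ballots)
print(scores.most_common(1))  # [('C', 2.5)]
```

In this made-up example, A and B each mobilize two friends and finish with 2 points apiece, while C wins with 2.5 points earned entirely from second place votes.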