Wednesday, November 04, 2009

"Smart" searches in Google

Last Monday, Slate Magazine ran an article called “Awkward Suggestions: Let's Have Fun with the Google Search Box.” The article takes Google's predictive search feature (the feature that suggests “Michelle Obama” if you start typing “michelle o”) and compares what “less intelligent” vs. “more intelligent” people are searching for, by comparing queries that start with “less intelligent” and “more intelligent” phrases, such as “how 2” vs. “how might one.”

I agree that it's interesting that typing in “how 2” prompts phrases such as “how 2 kiss” and “how 2 get a six pack,” while typing in “how might one” prompts phrases such as “how might one treat poisoning from curare” and “how might one discover a new piece of music.” However, I take issue with the notion that one search is “less intelligent” and one is “more intelligent.” They're both bad searches.

Imagine the search query “how to grow bananas.” Now imagine two web pages, one titled “how to kill ninjas” and one titled “tips on growing bananas.” The second one is clearly more relevant to the search, but the first title is actually closer to the search query, because it matches 2/4 words, while the second one matches only 1/4 words. Our smart human brains know that “bananas” is more important to the query than “how” and “to,” but computers don't know this unless programmers tell them.
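To make that arithmetic concrete, here is a minimal sketch of the naive word-for-word matching described above. The function name `overlap` is my own invention for illustration, and no real search engine is this simple.

```python
def overlap(query, title):
    """Return (matched, total): how many query words also appear in the title."""
    query_words = query.lower().split()
    title_words = set(title.lower().split())
    matched = sum(1 for word in query_words if word in title_words)
    return matched, len(query_words)


query = "how to grow bananas"
print(overlap(query, "how to kill ninjas"))       # (2, 4)
print(overlap(query, "tips on growing bananas"))  # (1, 4)
```

The ninja page "wins" purely on the strength of “how” and “to,” which is exactly the problem.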

What search engine programmers do, then, is give the computer a list of “stop words,” or words to ignore when matching documents to searches. The actual stop words will vary from program to program, but they typically include pronouns, prepositions, conjunctions, modals, and some adverbs. Here is a good sample list.

So, now that our search engine is armed with a stop list, it knows to ignore the words “how” and “to,” which means that “kill ninjas” has 0/2 words in common with the original query, while “growing bananas” is a better match at 1/2. (The reason it's still only 1/2 is that computers also don't know that the words “grow” and “growing” are close enough that someone who searches for one is probably also interested in pages containing the other. Introducing a “synonym ring” which equates “grow” and “growing” allows a search engine to match queries containing one term with results containing the other.)
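Here is a rough sketch of the same comparison with a stop list and a synonym ring added. The tiny `STOP_WORDS` set and the one-entry `SYNONYM_RING` are made up for this example; a real stop list is much longer, and a real synonym ring would be maintained separately.

```python
# Assumed, deliberately tiny stop list (illustration only).
STOP_WORDS = {"how", "to", "on", "a", "the"}

# Assumed synonym ring: maps each variant to one shared form.
SYNONYM_RING = {"growing": "grow"}


def key_terms(text, ring=None):
    """Lowercase the text, drop stop words, and apply the synonym ring if given."""
    ring = ring or {}
    return [ring.get(word, word)
            for word in text.lower().split()
            if word not in STOP_WORDS]


def score(query, title, ring=None):
    """Return (matched, total) counted over the non-stop-word query terms."""
    query_terms = key_terms(query, ring)
    title_terms = set(key_terms(title, ring))
    matched = sum(1 for term in query_terms if term in title_terms)
    return matched, len(query_terms)


query = "how to grow bananas"
print(score(query, "how to kill ninjas"))                     # (0, 2)
print(score(query, "tips on growing bananas"))                # (1, 2)
print(score(query, "tips on growing bananas", SYNONYM_RING))  # (2, 2)
```

With the stop list alone the banana page pulls ahead at 1/2, and with the synonym ring it becomes a full 2/2 match.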

My point (and I do have one) is that all of the search strings compared in the article are composed entirely of stop words. The results for “how 2 tie a tie,” “how might one tie a tie” and “tie a tie” are virtually identical, because including stop words in a Google search query has almost no effect on the search results. Slate Magazine isn't comparing “more intelligent” and “less intelligent” queries; it's comparing casually worded bad queries and hifalutin bad queries.
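As a toy illustration of that last point, the sketch below strips stop words from all three queries. The stop list here is an assumption on my part (I am treating “2,” “might,” and “one” as stop words for the sake of the example), and it is not Google's actual list or behavior.

```python
# Assumed stop list for illustration only; not Google's real list.
QUERY_STOP_WORDS = {"how", "2", "to", "might", "one", "a"}


def strip_stop_words(query):
    """Return the query's words with the assumed stop words removed."""
    return [word for word in query.lower().split()
            if word not in QUERY_STOP_WORDS]


queries = ["how 2 tie a tie", "how might one tie a tie", "tie a tie"]
for q in queries:
    print(q, "->", strip_stop_words(q))
```

Under these assumptions, all three queries boil down to the same two words, so there is nothing left to tell the “smart” phrasing from the “dumb” one.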


At November 06, 2009 5:17 AM, Blogger Trevor said...

Damn straight.

At November 07, 2009 7:56 AM, Blogger Jill said...

Your banner is really pretty. How did you get that?

At November 07, 2009 11:00 AM, Blogger Katya said...

Jill - It's an illustration from the master's thesis of a friend of mine.

At November 12, 2009 4:53 PM, Blogger Saule Cogneur said...

cleva girl!


