Wednesday, January 16, 2013

Facebook's New Search Engine and the Law of Small Numbers

Facebook has announced its new search engine, and it will seek to capture a large share of Google's market and advertising revenue. Will Facebook succeed? The Zuck is making a big deal about "personalized searches" based on your friends' collective wisdom. The Zuck needs to take a statistics class, because he has forgotten about the curse of dimensionality: the more traits a recommendation has to match, the fewer people match all of them. On Facebook I have roughly 200 friends, and many of them (I apologize) are not good friends! Suppose I rely on this crew to tell me about "France and good restaurants in Paris". I would guess that 5% of my friends have been to Paris in the last 5 years, so 200 * 0.05 = 10. Of these ten people, do I trust that their taste in restaurants is the same as mine? Do they have the same income I do? Of these 10, perhaps 2 have posted something on Facebook about the restaurants they liked in Paris. So, from these 2 data points, do I really learn anything about finding a good restaurant for me in Paris? Unfortunately, I think the answer is "no".

Google averages over a much larger sample of strangers. Google can also provide more data by pointing me to sites that show reviews, so I can cherry-pick reviewers who seem similar to me in sophistication and priorities. For example, I don't care what the restaurant looks like, but I don't like smoking, and I like good food even if it is expensive. I don't see how FB addresses this. They don't have enough data to tailor results. The law of small numbers is nasty!
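To make the small-numbers arithmetic concrete, here is a minimal Python sketch. The 200 friends, the 5% visit rate, and the roughly 2 posters come from the paragraphs above. The rating standard deviation (1.0 on a hypothetical 5-point scale) and the 2,000 stranger reviews are assumptions chosen only for illustration. The sketch uses the textbook standard error of a sample mean, sd / sqrt(n), which shrinks only with the square root of the sample size.

```python
import math

# Figures from the post: 200 friends, ~5% visited Paris recently, ~2 of those posted.
FRIENDS = 200
VISIT_RATE = 0.05
POSTERS = 2

# Hypothetical assumptions (not from the post): ratings on a 5-point scale
# with a standard deviation of 1.0, and 2,000 reviews written by strangers.
RATING_SD = 1.0
STRANGER_REVIEWS = 2000


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


visitors = round(FRIENDS * VISIT_RATE)  # 200 * 0.05 = 10
se_friends = standard_error(RATING_SD, POSTERS)
se_strangers = standard_error(RATING_SD, STRANGER_REVIEWS)

print(f"friends who visited Paris: {visitors}")
print(f"SE of the mean from {POSTERS} friend reviews: {se_friends:.3f}")
print(f"SE of the mean from {STRANGER_REVIEWS} stranger reviews: {se_strangers:.3f}")
```

Under these assumptions the two friend reviews leave an uncertainty of about 0.7 points on a 5-point scale, roughly thirty times larger than the stranger average. That is the sense in which the 2 data points teach me very little.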

UPDATE: FB may respond to this challenge by providing search results from strangers who have posted their views about Paris restaurants. In that case, FB's final search product will be a weighted average of my few friends' views and a Google substitute. I'm not convinced that this is a major contribution.

The interesting statistics issue here is the tradeoff between customized information (my friends' views) and the collective wisdom of a larger sample of strangers.
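One standard way to formalize this tradeoff is an inverse-variance (precision-weighted) average, the same idea behind shrinkage estimators. This is a textbook technique swapped in for illustration, not Facebook's actual ranking method. The sketch assumes friends share my taste, so their only error is sampling noise, while strangers carry an extra taste-mismatch variance (the hypothetical `taste_gap_var`) that does not shrink no matter how many of them post. All names and numbers below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Source:
    """A group of reviewers: its mean rating and its variance as an estimate of MY taste."""
    mean: float
    variance: float


def friends_source(ratings: list[float], rating_var: float) -> Source:
    # Assumption: friends share my taste, so only sampling noise (var / n) remains.
    n = len(ratings)
    return Source(sum(ratings) / n, rating_var / n)


def strangers_source(mean: float, n: int, rating_var: float, taste_gap_var: float) -> Source:
    # Assumption: strangers add a taste-mismatch variance that does not shrink with n.
    return Source(mean, rating_var / n + taste_gap_var)


def precision_weighted(a: Source, b: Source) -> tuple[float, float]:
    """Inverse-variance weighted average of two sources; returns (estimate, weight on a)."""
    precision_a = 1.0 / a.variance
    precision_b = 1.0 / b.variance
    weight_a = precision_a / (precision_a + precision_b)
    return weight_a * a.mean + (1.0 - weight_a) * b.mean, weight_a


# Hypothetical example: 2 friend reviews versus 2,000 stranger reviews.
friends = friends_source([5.0, 3.0], rating_var=1.0)
strangers = strangers_source(3.5, n=2000, rating_var=1.0, taste_gap_var=0.25)
estimate, friend_weight = precision_weighted(friends, strangers)
print(f"blended rating: {estimate:.2f}, weight on friends: {friend_weight:.2f}")
```

With these made-up inputs the two friends get roughly a third of the weight. Shrinking `taste_gap_var` toward zero hands nearly all the weight to the large stranger sample, while a large taste gap shifts it back toward the friends. Which side wins is an empirical question about how different strangers' tastes really are.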