Search

May 12, 2008

Powerset crosses the Uncanny Valley

In case you haven't heard the news, Powerset launched today.  Since I can't sleep due to launch adrenaline, I suppose it's time to weigh in on Powerset unofficially.  It's fun to write a press release, but it's more fun to philosophize on my own time!

Last year, I came across the concept of the Uncanny Valley through an employee of PARC.  The basic concept runs something like this: as you make a robot more like a human being, people tend to respond with more empathy.  However, there's a dip in the empathy curve when the robot looks enough like a human, but has behavior incongruent with what we'd expect from a real human being.  We tend to forgive weird mannerisms (jerkiness, oddities in speech, etc.) when a robot looks like a robot.  When a robot looks like a human, we expect it to act like a human.

As an example, cartoon characters are usually more believable than computer generated characters, unless the CG is so perfect as to be indistinguishable from reality.

Keyword search engines sit somewhere on the left of the curve.  We're willing to accept the choppy "keywordese" language of a search engine because we appreciate the results that come back and we can make excuses when it goes awry: "It's just a computer.  It doesn't really understand me."  As with most technology products, the utility curve and the expectation curve are aligned exactly because we treat keyword search engines as an unfeeling technology.

Oddly enough, when a system is endowed with an understanding of meaning (call it "semantics"), we suddenly demand that the system act, think, infer, and comprehend just like another human.  Instead of treating it as just a cold piece of technology, we begin to credit the system with agency and want it to act like other agents (i.e., humans).

The question shouldn't (and can't) be about releasing a system that has a perfect understanding of language, because that perfect understanding is many years down the road.  How is it that companies like Radar, TrueKnowledge, and Powerset believe that we can create products based on imperfect technology?

There are two ways to stay out of the Uncanny Valley.  First, create unique features that are so compelling that users are willing to deal with imperfection.  Keyword search hasn't survived because we always get the right answer - plenty of searches yield nothing.  But, though we may get wrong answers, we often enough get answers that would never have been possible before keyword search.  Second, access to semantic features shouldn't require a significant change in user behavior.  No matter how good (or "natural") a feature is, if a user has trouble finding it, it won't get used.

People often think that NL search is doomed to failure both because computers don't understand enough and, even if they did, users won't start entering natural language queries.

I think Powerset has scored on both counts.  As an example, for topical queries like moses or machiavelli, Powerset assembles a summary of Factz based on our deep linguistic analysis of each sentence.  This kind of automatic aggregation of content just wasn't possible before Powerset.  But users don't have to do anything special.  They just get Factz "for free" while sticking with their familiar keyword search habits.

I'm really proud of Powerset's first product.  The debate on whether or not we've crossed the valley will continue, but one thing is certain: a better, more natural way of interacting with technology is coming. (Of course, whether we sit to the left or the right of the Uncanny Valley is the subject for another post.)

Bed time, finally!

March 05, 2008

The Ask Lesson: Don't Go Head-to-Head with Google

Ask.com Seeks Makeover as Woman's Site is the real title of an article and the real strategy of a public company.  Wow!  Talk about a shift in focus!

To me, this move indicates that Google, Yahoo, and Microsoft (GYM) have complete market domination.  The capex of building a search engine is too high, not to mention the marketing problem of getting users to change their internet front door.  If IAC-backed Ask couldn't compete with innovative features, comparable result quality, and a big marketing campaign, then cash-strapped startups aren't going to fare much better.

I'm certainly not suggesting that Google will reign forever, considering I'm now at my second search startup (Kosmix, then Powerset).  Rather, I now realize that going head-to-head with Google in search is as ill-fated as going head-to-head with Microsoft in operating systems.

What, then, are the conditions for success?  I obviously don't have the answer, but here are some lessons I've gathered from my experience.

  • All of the core components to build a search engine are readily available.  Only a radical shift in technology built on top of these components will drive a significantly different user experience.  Just creating a better SERP based on the same technologies won't be noticeable to users and is easy for GYM to copy.  But, who knows what will herald this Kuhnian shift?
  • Users won't remember your vertical focus.  Consumers have trouble remembering Kayak for travel, Trulia for real estate, Indeed for jobs, and dozens of other engines.  The default is to type a problem into GYM.  Therefore, use search to drive traffic.  GYM owns front door traffic to the Web.  Don't fight; play along in the ecosystem.  If you can get some repeat users along the way, then good for you.
  • Plan a series of interim products.  A Google index is probably hundreds, if not thousands, of times as large as the biggest startup index.  Oh, and it's lightning fast.  And it never stops working.  Only after years will all of your components congeal into a final product.  Better be prepared with a long runway and some good interim steps.
     

I suspect most search startups will find their niche in something related to search, much like software companies began playing in the Microsoft ecosystem instead of building competing operating systems.  For example, Kosmix is using its technology to build a homepage for every page on the Web.  There are definitely still holdouts like Hakia and Mahalo.  Hakia is headed for a battle against the big boys: their experience still isn't even equivalent to Google's.  Mahalo might be able to market its user-generated pages in Google, but then it's sounding like a Squidoo.

Ask?  Sounds like they're quitting the search engine business and turning into a social network/content play.  Oh well.  Glad that things didn't work out when I interviewed there last year.

October 02, 2007

Advantages of Kosmix's KFS vs. HDFS

I was excited to learn last week that my friends at Kosmix have decided to open source a project long in the works: the Kosmix Distributed File System, or KFS (see the official blog post).  A number of people have commented on this release, including Ethan Stock of zVents, who plans to use KFS along with their HyperTable clone of BigTable, and Rich Skrenta, who gives an excellent list of features of KFS.

Now, as a dumb product manager, my biggest questions were about KFS vs. HDFS, which is the distributed file system built by the Hadoop project.  Powerset already makes extensive use of the Hadoop stack, including HDFS.  So, I asked Sriram Rao, the lead engineer of KFS, if he could explain to me what the difference is between HDFS and KFS.  Here are some of his answers, which I think give more insight into why Kosmix chose to build KFS.

  • So why did Kosmix build KFS instead of using HDFS?  Apparently, KFS and HDFS were developed in parallel.  The KFS implementation was done from 2006-2007 and now Kosmix feels it's in a releasable state.  One of the reasons to stick with KFS over HDFS is that HDFS is written in Java and Kosmix's back-end is written in C++, and they were worried about the speed of the JNI interface.
  • File writing - HDFS files are written once and read many times.  When writing to a file, you have to write from the start to the end and that is it.  Conversely, in KFS you can write to a file as many times as you want, write anywhere in the file (i.e., seek and write), and append to an existing file.  I've heard that Yahoo is working to fix this problem in HDFS, but it still isn't implemented.
  • Data integrity - Currently, with HDFS, after you write to a file, the data becomes “visible” to other apps only when the application closes the file.  So, if the process were to crash before closing, the data written is lost.  With KFS, the data becomes visible when it gets pushed out to the chunkservers.  For performance, clients cache data; when the cache is full or when the application chooses, data gets flushed out.
  • Data rebalancing - KFS has rudimentary support for automatic rebalancing.  When you add new nodes or space utilization shifts among nodes, the system may migrate chunks from over-utilized nodes to under-utilized nodes.  HDFS doesn’t have such support now.
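To make the file-writing difference above concrete, here is a toy, in-memory Python sketch.  It is not the KFS or HDFS API: the class names (WriteOnceFile, RandomWriteFile) and methods (write_at, append) are made up purely for illustration, and the buffers live in local memory rather than on chunkservers.  It only models the semantics as I understood them: one file that is written start-to-end and then sealed, and one that allows seek-and-write and append.

```python
import io


class WriteOnceFile:
    """Hypothetical model of write-once semantics: sequential writes, then sealed."""

    def __init__(self):
        self._buf = io.BytesIO()
        self._closed = False

    def write(self, data: bytes) -> None:
        # Writes always land at the current end; there is no seek.
        if self._closed:
            raise OSError("file is sealed; cannot write after close")
        self._buf.write(data)

    def close(self) -> None:
        # After close, the contents can only be read.
        self._closed = True

    def read(self) -> bytes:
        return self._buf.getvalue()


class RandomWriteFile:
    """Hypothetical model of seek-and-write semantics with append."""

    def __init__(self):
        self._buf = io.BytesIO()

    def write_at(self, offset: int, data: bytes) -> None:
        # Seek to an arbitrary offset and overwrite in place.
        self._buf.seek(offset)
        self._buf.write(data)

    def append(self, data: bytes) -> None:
        # Always write at the current end of the file.
        self._buf.seek(0, io.SEEK_END)
        self._buf.write(data)

    def read(self) -> bytes:
        return self._buf.getvalue()


if __name__ == "__main__":
    sealed = WriteOnceFile()
    sealed.write(b"hello world")
    sealed.close()
    try:
        sealed.write(b"!")
    except OSError as err:
        print("write-once:", err)

    flexible = RandomWriteFile()
    flexible.append(b"hello world")
    flexible.write_at(6, b"WORLD")  # overwrite in the middle
    flexible.append(b"!")           # append to the existing file
    print("random-write:", flexible.read())
```

The point of the contrast: with the write-once model, fixing a few bytes in the middle means rewriting the whole file, whereas the seek-and-write model lets you patch in place.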

Hopefully I transcribed these accurately!  Definitely check out the KFS project, as the more people contributing, the better.  Powerset will be evaluating KFS in the coming weeks to see if it has any features that can propel us ahead of using HDFS.

April 03, 2007

Kosmix appoints Jonathan Miller to the Board

Kosmix, my former employer, today announced that Jonathan Miller, former CEO of AOL, is joining their Board of Directors.  Miller is credited with saving AOL from extinction, in the sense that he jettisoned the paid subscription dinosaur in favor of AOL.com as a destination and a healthy advertising business.  With his background in media and advertising, he'll bring a healthy perspective to the BoD and play an active role in the company.  Note that he is the first member of the BoD who's neither a founder nor an investor.

When I tell people that I worked at Kosmix, many still have the misconception that Kosmix is a health search engine.  Kosmix actually crawls the entire Web and health is just one of six verticals.  There are very few small companies, if any, that have the basic components of a search engine sitting around their server rooms.  And Kosmix has done a fabulous job taking its categorization technology and innovating on the user experience.  Check out this example SERP for chlamydia.  Through a combination of pulling in 3rd-party content (e.g., Wikipedia), segmenting results, and surfacing topics, Kosmix has created a rich user experience that looks more like content than a results page.  Note that the topics at the top, like related drugs, are derived from the intelligence of the Web: these are not hand-picked results.

Kosmix tends to fly under the radar, but it's a product of their humility and focus.  The technology is there and I'm excited to see what advances they make in 2007.  They are making me a proud shareholder!
