Semantic Web

May 21, 2008

The Rising Stars of the Semantic Web (SemTech2008 Keynote Panel)

Here's my verbatim-ish transcription of the keynote at the Semantic Technology Conference in San Jose: The Rising Stars of the Semantic Web.

Powerset (Dr. Barney Pell, founder & CTO)

Powerset is organizing and indexing content from Wikipedia and Freebase.  First query is Henry VIII.  For topical queries, Powerset provides a dossier of Factz: automatically extracted triples from across Wikipedia articles.  For a famous person like Henry VIII, there are assertions about him all over Wikipedia.  Second query: what did the fda approve?  Powerset doesn’t rely on any special domain knowledge (English history or medicine) to pull this information out of text.  Search results take advantage of both semantic features and traditional search features.  Now off to articles: Tim Berners-Lee.  Powerset republishes Wikipedia articles with some additional features, like the Article Outline that follows you as you browse.  By looking at Explore Factz, you can see all of the concepts in the page and their relations.  Clicking on a concept shows all of its Factz in the Article Outline.  Final query: what did tom cruise star in?  The NL query is translated into a Freebase query, which shows all of the movies that Tom Cruise starred in.
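
The "dossier" idea is easy to sketch once facts are treated as subject-relation-object triples: gather every triple whose subject is the topic, then group by relation. This toy version assumes the triples have already been extracted (the hard part, Powerset's linguistic analysis, is not shown); the `Fact` type, `build_dossier`, and the sample triples are hypothetical illustrations, not Powerset's code or data.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """Hypothetical subject-relation-object triple pulled from one sentence."""
    subject: str
    relation: str
    obj: str
    source: str  # title of the article the sentence came from


def build_dossier(facts: list[Fact], topic: str) -> dict[str, list[str]]:
    """Collect every fact about `topic` from all articles, grouped by relation.

    Objects repeated across articles are kept once. Matching the subject
    case-insensitively is a stand-in for real entity resolution.
    """
    dossier: dict[str, list[str]] = defaultdict(list)
    for fact in facts:
        if fact.subject.lower() != topic.lower():
            continue  # fact is about some other subject
        if fact.obj not in dossier[fact.relation]:
            dossier[fact.relation].append(fact.obj)
    return dict(dossier)


# Hypothetical triples, as if extracted from several different articles.
SAMPLE_FACTS = [
    Fact("Henry VIII", "married", "Catherine of Aragon", "Henry VIII"),
    Fact("Henry VIII", "married", "Anne Boleyn", "Anne Boleyn"),
    Fact("Henry VIII", "married", "Anne Boleyn", "Henry VIII"),  # same fact, another article
    Fact("Henry VIII", "succeeded", "Henry VII", "Henry VII"),
    Fact("Thomas More", "served", "Henry VIII", "Thomas More"),  # different subject
]

if __name__ == "__main__":
    # prints {'married': ['Catherine of Aragon', 'Anne Boleyn'], 'succeeded': ['Henry VII']}
    print(build_dossier(SAMPLE_FACTS, "henry viii"))
```

Grouping by relation is what turns scattered sentences into a single dossier; a real system would also have to decide when "Henry VIII" and "the king" name the same entity.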

AdaptiveBlue (Alex Iskold, founder & CEO)

A network on top of the existing Web.  Basic bits of the web are understood by AB’s technology, e.g. books, stocks, addresses, etc.  Stop visualizing the Web as a collection of pages and start seeing it as a collection of things.  For example, "The Kite Runner" may show up on both Amazon and Barnes&Noble.com, but both pages refer to the same book.  Why is this different?  It brings semantics to the Web as it exists now instead of creating a new semantic Web, which allows the content to evolve with the Web itself.  The two primary products are BlueOrganizer (a browser plug-in) and SmartLinks (semantic improvements to links on your blog).
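
The "things, not pages" idea can be sketched as simple entity resolution: two product pages on different stores collapse into one thing when they share a stable identifier. The talk didn't say how AdaptiveBlue actually recognizes things, so this toy version simply assumes an ISBN has already been scraped from each page; the `Page` type, `group_by_thing`, the example.com URLs, and the ISBN values are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Page(NamedTuple):
    """A web page that a hypothetical extractor has recognized as describing a book."""
    url: str
    title: str
    isbn: str  # as printed on the page; formatting varies between sites


def normalize_isbn(raw: str) -> str:
    """Canonicalize an ISBN: drop hyphens and spaces, upper-case any 'x' check digit."""
    return raw.replace("-", "").replace(" ", "").upper()


def group_by_thing(pages: list[Page]) -> dict[str, list[str]]:
    """Map each canonical book ID to every page URL that refers to that book."""
    things: dict[str, list[str]] = defaultdict(list)
    for page in pages:
        things["isbn:" + normalize_isbn(page.isbn)].append(page.url)
    return dict(things)


# Hypothetical pages: the first two differ in URL and title but describe one book.
SAMPLE_PAGES = [
    Page("https://bookstore-a.example/kite-runner", "The Kite Runner", "978-1-2345-6789-7"),
    Page("https://bookstore-b.example/p/12345", "The Kite Runner (Paperback)", "9781234567897"),
    Page("https://bookstore-b.example/p/67890", "Some Other Book", "978-0-0000-0000-2"),
]

if __name__ == "__main__":
    for thing, urls in group_by_thing(SAMPLE_PAGES).items():
        print(thing, urls)
```

Keying on an identifier rather than on the page title is what lets "The Kite Runner" and "The Kite Runner (Paperback)" resolve to the same thing.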

Twine by Radar Networks (Nova Spivack, founder & CEO)


Actually, this was a video made by a high school student who is now an intern at Radar Networks.  Set to the tune of "Radar Love," it's a fast-paced tour through Twine.  Unfortunately, there was so much going on that my limited human brain couldn't absorb it all: making new Twines, e-mailing things to a Twine, adding Google videos to Twine, history of video games, oh my!  Kids these days.  They just click them mice and keyboardz so fast.  The video ends with "Twine: How have we lived without it?"

Talis (Ian Davis, CTO and creator of RSS 1.0)

A privately owned, well-established, and innovative software company that's been around for 40 years, longer than all of the other companies on the panel combined.  Talis wants to be critical infrastructure for the semantic Web, with applications for libraries, education, learning, and community.

Stealth-Company.com (Tom Gruber, CTO and founder)

No particular product to pitch ('cause they're still a stealth company).  The killer app isn't search, maps, e-mail, videos, etc.  It's intelligent design that brings together all of that cool stuff.  The iPhone is an example of this.  The killer app is your life (online): applying the best of the internet technologies (intelligently) to your life.  It's closer than we think (for heaven's sake!).  Paradigm shift.  Paradigm 1 is the hyperlink (user finds links, computer connects).  Paradigm 2 is the portal (user chooses channels, computer delivers).  Paradigm 3 is the search engine (user states the query, computer finds relevant documents).  But why do we need to find the "magic" words to get what we want?  The next level will anticipate our needs.  I'm anticipating Stealth Company!

Q&A Session
  • "What is the best opportunity that would get you a quote from Carla Thompson in the New York Times?"
    • Nova - unsexy = a store of trillions of triples; sexy = an x-rated "sementic" application
    • Alex - intersection of iPhone and the semantic Web, i.e. an application that understands your contexts as you move around
    • Tom - "the interface": companies need to address basic human needs.
    • Ian - similar answer to above.  Concrete example: something around travel.
    • Barney - search advertising and publishing.
  • "Once you get Carla's attention, what do you do next?  How do you get Gartner, Jupiter, and Forrester to notice?"
    • Alex - 2008 is the year of the semantic web, with or without Gartner.
    • Barney - "I think it's perfectly adequate that Guidewire is here: we don't need Gartner."  A big question is: what is the category that we're creating?  For example, what happens when we're able to join different pieces of content. . . new interface paradigms. . . new ways of consuming information. 2 or 3 categories will be created by the time of this conference next year.  "This conference next year will be twice the size and half of the companies will be consumer companies."
    • Nova - these firms don't think of "semantic web" as a space.  They just want to know how it's going to help the categories that they already track.  Maybe the name "semantic web" isn't really a good term anyway, since there are so many different companies, products, and underlying technologies.  Big analysts are playing catch-up.  The people in this room should define the category and educate the market.
Whew!  That was fun :)

Report from SemTech 2008 & the SD Forum Semantic Web SIG

Carla already did a great writeup of the panel I was on yesterday, so I won't bother typing out my disjointed thoughts.  However, I was at the SD Forum Semantic Web SIG panel last night (hosted at the Semantic Technology Conference) entitled "Will Semantics Give Web Search a Face Lift?"

A common theme at this conference seems to be the semantics of "semantic," and I found the problem to be especially apparent on this panel.  Oddly enough, there were no demos, but here's a summary of the panelists' opinions.  Consider the following:
  • Healthline (Dr. AJ Chen) - since Dr. Chen was moderating, his brief slides focused on what a semantic search solution looks like under the hood.  Specifically, Healthline has health-specific ontologies that they're using for refinement, improved results, and improved ads.  Though I think Healthline's solution is very interesting, this is pretty much the standard ontology implementation for vertical search companies. Semantic web = vertical ontologies used in a vertical domain.
  • Google (Dr. Fernando Pereira) - Dr. Pereira seemed to have been sufficiently brainwashed by Google's Borg culture.  Though he admitted that Google is looking into next-generation technologies, his slides were a laundry list of why-not reasons: lexical information always lags behind real world usage, ontologies are messy, categorization schemes are difficult (impossible?) to create, annotations are noisy and potential spam channels. He cited utilitarian philosophy as Google's mantra: the greatest good for the greatest number of people.  Semantic web = interesting ideas, but I don't care unless it shows a marked improvement in horizontal results and can scale immediately to the entire Web.
  • Yahoo (Dr. Peter Mika) - this talk was the closest to the standard W3C version of the Semantic Web.  Through SearchMonkey, Yahoo is enabling content publishers to produce rich abstracts in their search results.  The idea is to provide a better user experience in search results, allowing users to get from "do to done faster."  Semantic Web = encouraging publishers to use standards and using them to improve search result listings.
  • Hakia (Dr. Christian Hempelmann) - Dr. Hempelmann's slides began with a sigh about needing beer.  Fair enough.  His slides then talked about what he doesn't consider semantics: proximity, syntax, 80% solutions, small incremental improvements.  In contrast, he argued that Hakia's extensive ontology was the only way to get to true semantics. Semantic Web = our way or the highway.
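
The Healthline approach in the first bullet (a vertical ontology used for query refinement) can be sketched with a tiny is-a hierarchy: normalize the query to a concept, then offer its narrower concepts as refinements and its broader concepts as context. The concept lists, synonym table, `Refinement` type, and `refine` function are hypothetical illustrations; Dr. Chen's slides did not show Healthline's actual ontology or code.

```python
from dataclasses import dataclass, field

# Hypothetical slice of a health ontology: concept -> narrower concepts.
NARROWER: dict[str, list[str]] = {
    "headache": ["migraine", "tension headache", "cluster headache"],
    "migraine": ["migraine with aura", "migraine without aura"],
}

# Hypothetical surface forms that should resolve to a canonical concept.
SYNONYMS: dict[str, str] = {"head ache": "headache", "migraines": "migraine"}


@dataclass
class Refinement:
    concept: str
    narrower: list[str] = field(default_factory=list)  # candidate drill-downs
    broader: list[str] = field(default_factory=list)   # parent concepts for context


def refine(query: str) -> Refinement:
    """Map a query to an ontology concept and list its neighbors in the hierarchy."""
    term = query.strip().lower()
    concept = SYNONYMS.get(term, term)
    parents = [parent for parent, children in NARROWER.items() if concept in children]
    return Refinement(concept, list(NARROWER.get(concept, [])), parents)


if __name__ == "__main__":
    # prints Refinement(concept='migraine', narrower=['migraine with aura',
    # 'migraine without aura'], broader=['headache'])  (on one line)
    print(refine("Migraines"))
```

The sketch also shows why Dr. Pereira was skeptical: refinement only reaches as far as someone has hand-built the hierarchy, which is manageable for one vertical and much harder for the whole Web.
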
Wow!  Four people on a panel, essentially talking past each other.  Luckily, Dr. Ron Kaplan, our beloved Chief Science Officer from Powerset, made some important comments.  First, he noted that his (admittedly biased) version of the Semantic Web could be called the "Syntactic Web," i.e., that structural relationships are a necessary condition for finding meaning.  He worried that some of the panelists were saying "Search is good enough. Semantics can't get us to God's Own search engine.  Therefore, Semantics isn't worthwhile."  Ron pointed out that if semantics can get us better search results, the real question is whether it's good enough for users to appreciate the difference without them rebelling because it isn't perfect.  My favorite quote related to this point was: "We know that stupidity scales.  That's not the problem.  The question is: can we do better than that?" (thanks to Uldis for capturing this on Twitter)

To recap the spectrum: on one end, Google seems to pooh-pooh semantics and NLP generally. On the other end, Hakia seemed to be promoting a dogmatic view of what semantics is supposed to be.  Yahoo sits somewhere in the middle, trying to use approved technologies to improve results incrementally.  Healthline uses targeted ontologies to improve their vertical, but with little chance of improving Web search generally.

My Powerset-influenced opinion is that either extreme is wrong; that a general solution, rather than incremental vertical improvements, is key to overhauling Web search; that being open is better than being dogmatic; that a "semantic" web search will include semantics, syntax, statistics, and god knows what else; and that none of the major players seems to be outwardly focused on this direction.

*Update* - Reposted on AltSearchEngines.

May 12, 2008

Powerset crosses the Uncanny Valley

In case you haven't heard the news, Powerset launched today.  Since I can't sleep due to launch adrenaline, I suppose it's time to weigh in on Powerset unofficially.  It's fun to write a press release, but it's more fun to philosophize on my own time!

Last year, I came across the concept of the Uncanny Valley through an employee of PARC.  The basic concept runs something like this: as you make a robot more like a human being, people tend to respond with more empathy.  However, there's a dip in the empathy curve when the robot looks enough like a human, but has behavior incongruent with what we'd expect from a real human being.  We tend to forgive weird mannerisms (jerkiness, oddities in speech, etc.) when a robot looks like a robot.  When a robot looks like a human, we expect it to act like a human.

As an example, cartoon characters are usually more believable than computer-generated characters, unless the CG is so perfect as to be indistinguishable from reality.

Keyword search engines sit somewhere on the left of the curve.  We're willing to accept the choppy "keywordese" language of a search engine because we appreciate the results that come back, and we can make excuses when it goes awry: "It's just a computer.  It doesn't really understand me."  As with most technology products, the utility curve and the expectation curve line up, precisely because we treat keyword search engines as unfeeling technology.

Oddly enough, when a system is endowed with an understanding of meaning (call it "semantics"), we suddenly demand that the system act, think, infer, and comprehend just like another human.  Instead of treating it as just a cold piece of technology, we begin to treat the system as having agency and want it to act like other agents (i.e., humans).

The question shouldn't (and can't) be about releasing a system that has a perfect understanding of language, because that perfect understanding is many years down the road.  How is it that companies like Radar, TrueKnowledge, and Powerset believe that we can create products based on imperfect technology?

There are two ways to stay out of the Uncanny Valley.  First, create unique features that are so compelling that users are willing to deal with imperfection.  Keyword search hasn't survived because we always get the right answer - plenty of searches yield nothing.  But, though we may get wrong answers, we often enough get answers that would never have been possible before keyword search.  Second, access to semantic features shouldn't require a significant change in user behavior.  No matter how good (or "natural") a feature is, if a user has trouble finding it, it won't get used.

People often think that NL search is doomed to failure both because computers don't understand enough and, even if they did, users won't start entering natural language queries.

I think Powerset has scored on both counts.  As an example, for topical queries like moses or machiavelli, Powerset assembles a summary of Factz based on our deep linguistic analysis of each sentence.  This kind of automatic aggregation of content just wasn't possible before Powerset.  But users don't have to do anything special.  They get Factz "for free" while sticking with the keyword habits they already have.

I'm really proud of Powerset's first product.  The debate on whether or not we've crossed the valley will continue, but one thing is certain: a better, more natural way of interacting with technology is coming. (Of course, whether we sit to the left or the right of the Uncanny Valley is the subject for another post.)

Bed time, finally!

March 08, 2008

Biding my Twine

Damn it, I want in!  I haven't found a good program to manage all my life's text 'cause I'm too lazy, e.g. my anemic del.icio.us account.  My 'puter should just know why I tagged a page.  I sure as hell shouldn't be the one picking a word to place the article on the correct branch of my "folksonomy," because that graph is going to be an unholy orgy of Familienähnlichkeit (Wittgenstein's "family resemblance").

Chatter about the Semantic Web will hopefully increase (Google Trends indicates a decrease) as more companies, like Powerset and Radar, launch cool, useful applications.  Once users realize what's possible, they'll demand more from those indolent hunks of silicon sitting on their laps.