Today I was at SES San Jose and represented Bing on a panel called “Don't Call it a Comeback: Semantic Technology and Search” hosted by Dana Todd. The format was interesting: the four major search engines acted as the panel and three startups – Twine, WolframAlpha, and HubPages – gave presentations.
SES is a very pragmatic conference focused on publisher issues, whereas semantic search discussions are typically forward-looking or, perhaps less generously, speculative. Luckily for our audience, our conversation was peppered with practical examples.
Nova Spivack of Twine started off by arguing that the Web is now too big for keyword search. He reasoned both that users need more tools to organize the millions of results they get back for a query and that they want to ask more complex questions of that data. He suggested that the next steps in this third decade of the Web (since Tim Berners-Lee wrote his seminal paper in 1989) will be natural language search and semantic search (defined by Nova as “parameterized search”), which will eventually give rise (in Web 4.0) to applications that are able to make inferences and reason.
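To make the “parameterized search” idea a little more concrete, here is a minimal sketch in Python. It is only my own reading of the phrase, not anything Twine showed: the `listings` data, the field names, and both functions are hypothetical. It contrasts matching words in free text with filtering on named fields.

```python
# Hypothetical documents with both free text and structured fields.
listings = [
    {"text": "Cozy two bedroom apartment near downtown San Jose",
     "city": "San Jose", "bedrooms": 2, "rent": 1800},
    {"text": "San Jose studio, two blocks from the light rail",
     "city": "San Jose", "bedrooms": 0, "rent": 1200},
    {"text": "Two bedroom house in Palo Alto",
     "city": "Palo Alto", "bedrooms": 2, "rent": 3200},
]


def keyword_search(query, docs):
    """Return docs whose text contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["text"].lower() for t in terms)]


def parameterized_search(docs, **params):
    """Return docs whose fields satisfy every parameter.

    Each parameter is either an exact value or a predicate function.
    """
    def matches(doc):
        for field, condition in params.items():
            if field not in doc:
                return False
            value = doc[field]
            if callable(condition):
                if not condition(value):
                    return False
            elif value != condition:
                return False
        return True

    return [d for d in docs if matches(d)]


# Keyword search cannot tell "two bedroom" from "two blocks".
keyword_hits = keyword_search("san jose two", listings)

# Parameterized search states the intent explicitly.
param_hits = parameterized_search(
    listings, city="San Jose", bedrooms=2, rent=lambda r: r <= 2000
)
```

In this toy data the keyword query also returns the studio, because the word “two” appears in its description, while the parameterized query returns only the two bedroom apartment.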
The most exciting portion of his presentation was his announcement of Twine 2.0: semantic search at Web scale. He humbly admitted that Twine is a small company and that Twine 2.0 is an incredibly ambitious project (indeed!). In a thinly veiled plea to the search engines on the panel, he suggested that he'd like to partner with one of us. I hope to get a demo soon!
Next in line was Lane Soelberg from CoopVentures, who is helping to bring WolframAlpha to market (I wrote a post about WolframAlpha marketing prior to their launch). Instead of just rehashing what WA does, he gave some compelling examples of how publishers could create mashups using the forthcoming WolframAlpha API and their own data. One example was a hypothetical insurance company that combined their own internal accident database with WA weather, traffic, and holiday data to create an Accident Predictor.
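The shape of such a mashup can be sketched in a few lines of Python. Everything here is made up for illustration: the claim counts, the `conditions` lookup, and the function name are my assumptions, and the external data is hard-coded rather than fetched, since the WolframAlpha API was not yet available. The point is only the join on date between internal records and outside context.

```python
from collections import defaultdict

# Hypothetical internal accident records: (date, number of claims filed).
accidents = [
    ("2009-07-03", 4),
    ("2009-07-04", 9),
    ("2009-07-05", 3),
    ("2009-07-06", 2),
]

# Hypothetical external context keyed by date. In the scenario described,
# this would come from an outside data source instead of a literal dict.
conditions = {
    "2009-07-03": {"weather": "rain", "holiday": False},
    "2009-07-04": {"weather": "clear", "holiday": True},
    "2009-07-05": {"weather": "clear", "holiday": False},
    "2009-07-06": {"weather": "rain", "holiday": False},
}


def average_claims_by(factor, accidents, conditions):
    """Join accident records to external context by date, then average
    the daily claim count for each value of the chosen factor."""
    totals = defaultdict(int)
    days = defaultdict(int)
    for date, claims in accidents:
        context = conditions.get(date)
        if context is None:
            continue  # no external data for this day; skip it
        key = context[factor]
        totals[key] += claims
        days[key] += 1
    return {key: totals[key] / days[key] for key in totals}


by_holiday = average_claims_by("holiday", accidents, conditions)
by_weather = average_claims_by("weather", accidents, conditions)
```

A real predictor would need far more data and a proper model; averages per factor are just the simplest thing that shows why combining the two data sets is useful.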
Finally, Jason Menayan from HubPages gave a very practical example of how semantic technology can be used in SEO. HubPages is a UGC site that generates over 60MM page views a month. HubPages tried two experiments. First, they semantically generated content suggestions for a few pages using Digger. Though the test covered only 3 pages, none of which had shown up in Google results yet, the results were positive. The content suggestions broadened the topics on the pages and caused them to be ranked highly within a few weeks. HubPages is in the process of experimenting with more pages to get a more statistically significant sample. The second test was to present both semantic tags from Relevad and regularly generated tags as suggestions to the author of the page. The idea behind the semantic keywords is that they sometimes surface relationships that are less likely to be found with a typical keyword approach. Both sets of keywords seemed to help traffic equally well up to a point; the curve started to fall at around 35 tags per page. Not a clear win for semantics, but keep in mind that "folks" were still choosing the tags for the folksonomy. Maybe automatic (semantic?) tag generation would be more efficient?
Given the fascinating presentations, there wasn’t a lot of time for the folks from the search engines to comment. We did get a few chances to respond to questions from Dana and the audience.
Oddly enough, the representatives from Google, Yahoo, and Bing (me!) were all people focused on creating the next generation of rich captions and had great statistics to share. According to Kevin Haas from Yahoo, “lots and lots” of searches contain SearchMonkey results (try this). After a bit of prying, he conceded that the number was over 50%, but his demeanor suggested that the real figure was much higher. Personally, I’d be more interested in the number of searches that have a SearchMonkey result in the top 10 or top 3 results, as a better measure.
The first audience question came from a concerned publisher who worried that search engines could gain too much control of content once the content was marked up. Othar Hansson from Google assured the publisher that, after testing Rich Snippets for months, enhanced captions actually promoted clicks to the vendor sites. He also stated that Google focuses on informing the user's click and that they don’t believe many user queries can be answered on the SERP itself. I humbly disagree, but will leave the judgment ultimately up to the users.
Another question to the search engines was: what are you doing in semantic search today? In typical search engine style, we were all a bit cagey with our responses, partially because it’s hard to succinctly describe the technology that goes into any given search. My response went something like this: given that Bing is a decision engine, it’s critical that we approach semantics from both the query side and the document side. On the query side, a user should be able to type in a query in a way that’s most natural to her vocabulary, not the specialized vocabulary of a search engine. Therefore, it’s incumbent upon us to translate the normal language of the user into the language of a search engine (if necessary). However, the document side is just as important. The more a search engine knows about a document, the better we can rank the document in search results, the better snippet we can generate, and the more information we can present on the SERP. Bing is certainly working to make strides on both fronts. Usually these advances are opaque to you, the end-user, and so much the better! We’re all about solving user problems, not about touting our technical prowess.
I’d love to hear comments from readers, both on my impressions of the content of this panel and on what you’d like to hear about on future panels. No doubt there will be more semantic search discussions at upcoming conferences: what sorts of issues would you like addressed and what kind of burning questions do you have? Some questions that never got answered (courtesy of Dana Todd):
- Why has semantics struggled so hard for an identity, and what’s necessary to help it get market traction and mass appeal?
- What are the factors that limit/enable success for a semantic project or technology?
- We typically think of semantics in terms of words; WolframAlpha is now attempting to redefine it to include numbers and pure data. How does that change things?
