Today I was at SES San Jose and represented Bing on a panel called “Don't Call it a Comeback: Semantic Technology and Search” hosted by Dana Todd. The format was interesting: the four major search engines acted as the panel and three startups – Twine, WolframAlpha, and HubPages – gave presentations.
SES is a very pragmatic conference focused on publisher issues, whereas semantic search discussions typically are forward-looking or, perhaps less generously, speculative. Luckily for our audience, our conversation was peppered with practical examples.
Nova Spivack of Twine started off by arguing that the Web is now too big for keyword search. He reasoned both that users need more tools to organize the millions of results they get back for a query and that they want to ask more complex questions of that data. He suggested that the next steps in this third decade of the Web (since Tim Berners-Lee wrote his seminal paper in 1989) will be natural language search and semantic search (defined by Nova as “parameterized search”), which will eventually give rise (in Web 4.0) to applications that are able to make inferences and reason.
The most exciting portion of his presentation was his announcement of Twine 2.0: semantic search at Web scale. He humbly admitted that Twine is a small company and that Twine 2.0 is an incredibly ambitious project (indeed!). In a thinly veiled plea to the search engines on the panel, he suggested that he'd like to partner with one of us. I hope to get a demo soon!
Next in line was Lane Soelberg from CoopVentures, who is helping to bring WolframAlpha to market (I wrote a post about WolframAlpha marketing prior to their launch). Instead of just rehashing what WA does, he gave some compelling examples of how publishers could create mashups using the forthcoming WolframAlpha API and their own data. One example was a hypothetical insurance company that combined their own internal accident database with WA weather, traffic, and holiday data to create an Accident Predictor.
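To make the mashup idea a bit more concrete, here is a minimal Python sketch of that kind of join. It is not something shown in the presentation, and everything in it is hypothetical: the accident dates, the hard-coded daily weather and holiday table (standing in for data a publisher might pull from an external source), and the function name. No real API is called. It only joins the two datasets on date and computes average accidents per day for each condition.

```python
from collections import defaultdict
from datetime import date

# Hypothetical internal data: one entry per recorded accident.
ACCIDENTS = [
    date(2009, 7, 3), date(2009, 7, 3), date(2009, 7, 3),
    date(2009, 7, 4), date(2009, 7, 4),
    date(2009, 7, 6),
]

# Hypothetical external data, hard-coded in place of any real API call:
# one row per day with a weather label and a holiday flag.
DAILY_CONDITIONS = {
    date(2009, 7, 3): {"weather": "rain", "holiday": False},
    date(2009, 7, 4): {"weather": "clear", "holiday": True},
    date(2009, 7, 5): {"weather": "clear", "holiday": False},
    date(2009, 7, 6): {"weather": "clear", "holiday": False},
}


def accidents_per_day_by(field, accidents, conditions):
    """Average accidents per day, grouped by one condition field."""
    # Count accidents on each date.
    counts = defaultdict(int)
    for day in accidents:
        counts[day] += 1

    # Join on date: accumulate accident totals and day counts per condition.
    totals = defaultdict(int)
    days = defaultdict(int)
    for day, row in conditions.items():
        key = row[field]
        totals[key] += counts.get(day, 0)
        days[key] += 1

    return {key: totals[key] / days[key] for key in days}


if __name__ == "__main__":
    print(accidents_per_day_by("weather", ACCIDENTS, DAILY_CONDITIONS))
    print(accidents_per_day_by("holiday", ACCIDENTS, DAILY_CONDITIONS))
```

A real predictor would need far more data and a proper statistical model; the sketch only illustrates the step of combining a publisher's own records with external data.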
Finally, Jason Menayan from HubPages gave a very practical example of how semantic technology can be used in SEO. HubPages is a UGC site that generates over 60MM page views a month. HubPages tried two experiments. First, they semantically generated content suggestions for a few pages using Digger. Though this was only done with 3 pages that hadn’t shown up in Google results yet, the results were positive. The content suggestions broadened the topics on the pages and caused them to be ranked highly within a few weeks. HubPages is in the process of experimenting with more pages to get a more statistically significant set. The second test was to present both semantic tags from Relevad and regularly generated tags as suggestions for the author of the page. The idea behind the semantic keywords is that they sometimes surface relationships that are less likely to be found with a typical keyword approach. Both sets of keywords seemed to help traffic equally well up to a point -- the curve started to fall around 35 tags per page. Not a clear win for semantics, but keep in mind that "folks" were still choosing the tags for the folksonomy. Maybe automatic (semantic?) tag generation would be more efficient?
Given the fascinating presentations, there wasn’t a lot of time for the folks from the search engines to comment. We did get a few chances to respond to questions from Dana and the audience.
Oddly enough, the representatives from Google, Yahoo, and Bing (me!) were all people focused on creating the next generation of rich captions and had great statistics to share. According to Kevin Haas from Yahoo, “lots and lots” of searches contain SearchMonkey results (try this). After a bit of prying, he conceded that the number was over 50%, but his demeanor suggested that the real number was much higher. Personally, I’d be more interested in the number of searches that have a SearchMonkey result in the top 10 or top 3 results, as a better measure.
The first question from the audience came from a concerned publisher who worried that search engines could gain too much control of content once the content was marked up. Othar Hansson from Google assured the publisher that, after testing Rich Snippets for months, enhanced captions actually promoted clicks to the vendor sites. He also stated that Google focuses on informing the user's click and that they don’t believe many user queries can be answered on the SERP itself. I humbly disagree, but will leave the judgment ultimately up to the users.
Another question to the search engines was: what are you doing in semantic search today? In typical search engine style, we were all a bit cagey with our responses, partially because it’s hard to succinctly describe the technology that goes into any given search. My response went something like this: given that Bing is a decision engine, it’s critical that we approach semantics from both the query side and the document side. On the query side, a user should be able to type in a query in a way that’s most natural to her vocabulary, not the specialized vocabulary of a search engine. Therefore, it’s incumbent upon us to translate the normal language of the user into the language of a search engine (if necessary). However, the document side is just as important. The more a search engine knows about a document, the better we can rank the document in search results, the better snippet we can generate, and the more information we can present on the SERP. Bing is certainly working to make strides on both fronts. Usually these advances are opaque to you, the end-user, and so much the better! We’re all about solving user problems, not about touting our technical prowess.
I’d love to hear comments from readers, both on my impressions of the content of this panel and what you’d like to hear about on panels in the future. No doubt there will be more semantic search discussions at conferences in the future: what sorts of issues would you like addressed and what kind of burning questions do you have? Some questions that never got answered (courtesy of Dana Todd):
- Why has semantics struggled so hard for an identity, and what’s necessary to help it get market traction and mass appeal?
- What are the factors that limit/enable success for a semantic project or technology?
- We typically think of semantics in terms of words; WolframAlpha is now attempting to redefine it to include numbers and pure data. How does that change things?

Great writeup, Mark. I'll add one more question that we talked about, but I didn't feel I got a solid answer on:
How can semantics improve ad targeting and effectiveness? Missing from our (already overpopulated) panel were LucidMedia and Hakia, both of which are promoting their technology as improving ad relevancy in a contextual environment. The biggest complaint most advertisers have about contextual advertising is the haphazard relevancy matching.
Is semantic more effective than plain-old contextual, and how is it really different? And, does behavioral/demographic trump the need for semantic matching?
Thanks again to all of the panelists, who really brought their A-game to the discussion. I wish we could have had 2 more hours!
Best,
Dana Todd, CMO
Newsforce, Inc.
Posted by: DanaTodd | August 12, 2009 at 10:00 AM
Dana,
Thanks for the mention. We have spent ten years advancing the use of semantics, most recently applying our relevance engine to online ad targeting and optimization. We have shown through numerous campaigns that it is highly effective in terms of both advertiser performance and return on spend. But your last point about behavioral trumping the need for semantics is a white-hot issue right now with all the privacy self-regulation discussions going on in Washington. While past studies from Marketing Sherpa and AdAge have shown the two approaches to be closely matched and often complementary, the recent privacy issues may mean more trouble for the behavioral camps. More and more advertisers are looking to the less intrusive forms of contextual for their ad placements. The FTC recognized this when they specifically excluded contextual advertising from the privacy discussion. This may become the single most important driver of semantic technologies in ad placement over the next decade and one we are all watching very closely.
Christopher Weiss
LucidMedia
Posted by: Christopher Weiss | August 12, 2009 at 11:26 AM
Terrific writeup, Mark. It was a great panel; lots of important insight into the opportunities that semantic technology has in the realm of SEO, advertising, and information aggregation in the coming years. An interesting time to be alive. :)
Regarding your first question at the end: I think grasping what semantics really is (I think Nova mentioned that everyone struggles to describe what it really is, even those who understand it) has been a bit of an impediment. A thorough think-through about what it means for the Web landscape and how information retrieval happens has been another. Practical application to refine contextual targeting, disambiguate searches, and the like will go a long way in getting past the hurdle.
Cheers,
Jason Menayan
YieldBuild
Posted by: Jason Menayan @ YieldBuild | August 13, 2009 at 04:26 PM
This may sound like a dumb question, Mark, but how does one visually scan a page of search results and know what is semantically influenced vs. whatever-else influenced?
You gave a link to a Yahoo search for "kelly shoes" for example. I look at that page and I don't automatically see anything jumping out at me as "semantic". What should we look for, to see clues of semantic search influence?
Thanks!
Dana
Posted by: twitter.com/danatodd | September 15, 2009 at 12:38 PM
Can someone tell me what a semantic-based website is? How does it function, and what is the benefit of using semantic technology?
I did some searching on the internet and became more confused, as "semantic" has a lot of different meanings across different subjects.
Posted by: Viagra Online | December 23, 2009 at 08:21 AM