01 July 2007

let. to ed. re. "A Smarter Web" -- Tech Review

Here's a letter to Technology Review that was published:
We read with interest John Borland's piece on the Semantic Web ("A Smarter Web," March/April 2007). We agree that this is an exciting time in the Semantic Web's development, yet we want to point out that its great degree of structure has drawbacks. As the article noted, Semantic Web users must learn complex ontology languages and structure their information and data using them. This difficulty inhibits the growth of the Semantic Web. It is thus arguable whether the Semantic Web can approach the scale of the standard Web, where anyone can easily create and publish content.
Ideally, we should combine the strengths of the Semantic Web and the normal Web. Search would be a good place to start. Today, global free-text search is the primary means of querying the whole Web, but it provides only coarse-grained access to documents. In contrast, the Semantic Web allows much more precise queries across multiple information sources (say, querying for a particular attribute, such as "street address"). However, it is on a much smaller scale, involving far fewer documents. We could imagine combining normal and Semantic Web queries--for instance, to search the free text of all real-estate Web pages written by women in Boston during the last week for the word "Jacuzzi." Taking this further, the few structured relationships currently in the Semantic Web could be used to refine the results of mainstream search engines.
Finally, as so much activity in the life sciences is focused on large-scale interoperation on the Web (as found in drug discovery), we feel that biological research could serve as a useful guide and driving force for the development of Web 3.0.
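
To make the hybrid-query idea above more concrete, here is a minimal, self-contained Python sketch. It is not part of the letter; the field names (author_gender, city, published, text), the example listings, and the hybrid_search function are all hypothetical, standing in for structured attributes that Semantic Web annotations might supply alongside a page's free text.

    from datetime import date, timedelta

    # Hypothetical pages: structured metadata (as Semantic Web annotations might
    # supply it) paired with ordinary free text.
    pages = [
        {"url": "http://example.org/listing1", "author_gender": "female",
         "city": "Boston", "published": date(2007, 6, 28),
         "text": "Sunny two-bedroom condo with a Jacuzzi and roof deck."},
        {"url": "http://example.org/listing2", "author_gender": "male",
         "city": "Boston", "published": date(2007, 6, 29),
         "text": "Studio apartment near the river, hardwood floors."},
    ]

    def hybrid_search(pages, keyword, today=date(2007, 7, 1), **attrs):
        """Combine a structured-attribute filter with a free-text keyword search."""
        one_week_ago = today - timedelta(days=7)
        for page in pages:
            if (all(page.get(k) == v for k, v in attrs.items())
                    and page["published"] >= one_week_ago
                    and keyword.lower() in page["text"].lower()):
                yield page["url"]

    # "Real-estate pages written by women in Boston during the last week whose
    #  free text mentions 'Jacuzzi'."
    print(list(hybrid_search(pages, "Jacuzzi", author_gender="female", city="Boston")))

In a real system the structured constraints would presumably be posed against RDF data (for example, in a query language such as SPARQL) and the keyword constraint handled by a conventional full-text index; the sketch only shows how the two kinds of constraints compose.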


Citation of Letter
http://www.technologyreview.com/Infotech/18851/page2/
The Semantic Web
July/August Issue of Technology Review
Mark Gerstein and Andrew Smith
Computational Biology and Bioinformatics Program
Yale University
New Haven, CT


Letter in response to:
http://www.technologyreview.com/Infotech/18395/
Monday, March 19, 2007
Part I: A Smarter Web
New technologies will make online search more intelligent--and may even lead to a "Web 3.0."
By John Borland
Last year, Eric Miller, an MIT-affiliated computer scientist, stood on a beach in southern France, watching the sun set, studying a document he'd printed earlier that afternoon. A March rain had begun to fall, and the ink was beginning to smear....

Original Letter Text (before edit by magazine)

We read with great interest John Borland's March/April 2007 article "A Smarter
Web." We agree that this is an exciting time in the development of the semantic
web (or Web 3.0), and that it is on the cusp of more widespread acceptance and
use. A problem with the semantic web, however, is that it is not as flexible as
the free-text publishing supported by the standard web. As the article noted,
users must learn the semantic web's ontology languages and structure their
information and data using them. This learning curve inhibits the growth and
spread of semantic web data. It is thus arguable whether the semantic web can
approach the huge size of the standard web, where almost anyone can easily
create and publish web pages. The standard web will
likely still be the primary web most users see and use for the foreseeable
future, while the semantic web could remain a niche.

We thus feel that a practical direction is to investigate ways that the semantic
web and standard web can work together and leverage each other in a kind of
symbiosis. Keyword-based web search à la Google is the primary way of mining
the web for information today, but it provides only coarse-grained topical
access to documents, and there are many kinds of information requests it cannot
handle. For example, queries that combine general relational information about
pages (such as that provided by the semantic web) with keyword-based searches
are not supported. Furthermore, one would like to leverage small amounts of
highly structured information (as in the semantic web) as "training sets" to
better enable querying and clustering of the large bodies of unstructured,
free-text information on the web; that is, the small amount of highly
structured information could be used, via data mining, to bootstrap the
automated organization of the much larger body of unstructured information and
so support better querying. Since searching is widely perceived to be a crucial
web application, the
semantic web's ability to improve it could be of high practical value and an
important driving force to help more fully realize the vision of the semantic
web. An important part of Web 3.0 should thus be to enumerate the kinds of
information requests that could be fruitfully made, and the kinds of information
infrastructure and data mining techniques needed to fulfill them.

Finally, there is much activity and excitement within biological research
towards the goal of truly large-scale integration and interoperation of its
vast data, e.g., to aid in more efficient drug discovery. The life sciences
could thus be a useful guide, test case, and driving force for Web 3.0.
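
As a rough illustration of the "training sets" idea above (not part of the letter), the sketch below uses a handful of labeled records, standing in for a small core of highly structured semantic-web data, to train a simple text classifier that then organizes unlabeled free-text documents. It assumes Python with scikit-learn available; the categories, seed records, and documents are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Small "training set" drawn from structured records whose metadata already
    # carries a category label.
    seed_texts = [
        "inhibitor binds kinase active site, reducing tumor growth",
        "phase II clinical trial shows improved patient response",
        "three-bedroom house with garden listed in Cambridge",
        "waterfront condo for sale, open house this weekend",
    ]
    seed_labels = ["drug_discovery", "drug_discovery", "real_estate", "real_estate"]

    # Much larger body of unstructured free text (only two documents shown).
    unlabeled_docs = [
        "screening a new kinase inhibitor for tumor response",
        "open house this weekend for a condo with a garden",
    ]

    # Bootstrap: learn from the structured seeds, then organize the free text.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(seed_texts, seed_labels)
    print(list(zip(unlabeled_docs, model.predict(unlabeled_docs))))

Here the seed labels play the role the letter assigns to the semantic web's highly structured data: a small, precise core that data-mining techniques can use to impose useful organization on a much larger unstructured collection.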