Storobia Archive

The Semantic Web

The World Wide Web is a huge information resource. Most of what we want is out there... somewhere. With all that information scattered around in a disorganised jumble, how do we - or our AI assistant - actually find what we want? Current search engines still have difficulty with the ambiguity of human language and the ad hoc nature of web documents.

What we need are tools that can help us locate relevant web documents more efficiently and extract information from them automatically. For this to happen, those tools need a way to better "understand" a web page. Enter the semantic web.

What Is The Semantic Web?

Tim Berners-Lee is credited with having coined the term "semantic web", which he defined as:

"a web of data that can be processed directly and indirectly by machines"

The idea is that all sorts of documents - not just web pages - can be analysed automatically and relevant data extracted. This would allow greater personal productivity, internet data mining, better communication between suppliers and customers, etc. Semantic blog software could even make the blogosphere less of a labyrinth!

One way of achieving this would be to extend existing data and systems via some form of "metadata": data about data.

Standards and Tools

To achieve the goal of a truly semantic web a number of tools have been proposed. The most significant of these are XML, RDF and OWL.
  • XML
    XML stands for eXtensible Markup Language. It's a generalised document markup language that defines the structure of a document without defining its meaning.
  • RDF
    RDF stands for Resource Description Framework. It's essentially metadata, commonly written in XML, that makes simple statements about a resource (subject, property, value) and so can be used to say what the data is.
  • OWL
    OWL is the Web Ontology Language. It allows statements about the relationships between data items. When combined with RDF, OWL is the beginning of a way of defining meaning.
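
To make the three layers a little more concrete, here is a minimal sketch in plain Python. It is not a real RDF or OWL toolkit: the "ex:" identifiers, the sample blog post and the types_of helper are all invented for illustration, and the subclass relation is written with the rdfs:subClassOf term that OWL ontologies build on. The XML gives only structure, the triples say what the data is, and the subclass statement lets a program infer something nobody wrote down explicitly.

```python
import xml.etree.ElementTree as ET

# XML layer: structure only. A parser sees elements, not meaning.
# The document below is a made-up example.
doc = ET.fromstring("<post><title>Hello</title><author>alice</author></post>")
title = doc.find("title").text

# RDF-style layer: (subject, predicate, object) statements.
# All "ex:" names are hypothetical identifiers, not a published vocabulary.
triples = {
    ("ex:post1", "rdf:type", "ex:BlogPost"),
    ("ex:post1", "ex:title", title),
    ("ex:post1", "ex:author", "ex:alice"),
    # Ontology-style layer: a relationship between classes.
    ("ex:BlogPost", "rdfs:subClassOf", "ex:Document"),
}

def types_of(resource, triples):
    """Return the declared types of a resource plus every superclass
    reachable by following rdfs:subClassOf statements."""
    found = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    frontier = set(found)
    while frontier:
        parents = {o for s, p, o in triples
                   if p == "rdfs:subClassOf" and s in frontier}
        frontier = parents - found   # only follow classes not yet seen
        found |= parents
    return found

inferred = types_of("ex:post1", triples)
```

Here inferred contains both ex:BlogPost and ex:Document, even though no statement says directly that ex:post1 is a document. A real system would use a proper RDF library and reasoner rather than a set of tuples.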

Top Down or Bottom Up?

Key to making the semantic web a reality is interoperability, not just at the XML level but at the higher levels of OWL and ontology. So where should this be defined?

Many people, especially those in the academic world, favour a centralised definition of a standard ontology to be used by everyone. That would be technically desirable and make the realisation of the semantic web far simpler.

Unfortunately it's also impractical. Most of the information on the web today is not produced by academics or technologists. The idea of bloggers using a centrally defined ontology is unrealistic.

There's also the issue that the web already contains a huge amount of useful semantic data in the form of user-defined "tags". This "folksonomy" is unstructured and riddled with contradictions and holes, yet also extremely powerful. It would be foolish to throw it away.

So the challenge facing the semantic web is to somehow bridge the gap between the informal folksonomy of internet users and the more rigorous ontology required for automated information processing. That's not going to be an easy thing to do. If we do achieve it then it might really be "Web 3.0".
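
As a rough illustration of what bridging that gap might involve, the sketch below maps free-form tags onto a handful of ontology concepts. Everything in it is an assumption made for the example: the TAG_TO_CONCEPT table, the "ex:" concept names and the normalisation rules. A real bridge would have to cope with ambiguity, misspellings and tags that mean different things to different people.

```python
# Hypothetical lookup from normalised tags to ontology concepts.
TAG_TO_CONCEPT = {
    "semweb": "ex:SemanticWeb",
    "semantic-web": "ex:SemanticWeb",
    "semanticweb": "ex:SemanticWeb",
    "rdf": "ex:RDF",
}

def normalise(tag):
    """Lower-case a tag and turn spaces and underscores into hyphens."""
    return tag.strip().lower().replace("_", "-").replace(" ", "-")

def map_tags(tags):
    """Split user tags into recognised concepts and unmapped leftovers."""
    concepts, unmapped = set(), []
    for tag in tags:
        key = normalise(tag)
        if key in TAG_TO_CONCEPT:
            concepts.add(TAG_TO_CONCEPT[key])
        else:
            unmapped.append(tag)   # keep the original spelling for review
    return concepts, unmapped

concepts, unmapped = map_tags(["SemWeb", "Semantic Web", "RDF", "misc"])
```

In this run the three spellings collapse onto two concepts and "misc" is left unmapped. The unmapped list is the interesting part: it is where the informal folksonomy says something the ontology does not yet cover.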
