Tim Berners-Lee, the creator of the World Wide Web, is trying to take his innovative concepts to the next level. TBL was at MIT this week promoting his new idea, the “Semantic Web.” He is claiming it will mean as significant a paradigm shift in how we access knowledge as the original WWW was from what went before. There was an article about it in the January 29 Tech section of the Washington Post.
From what I can gather, it will involve adding several new tags to Web pages that will interconnect their knowledge contents in new ways. How this will change the way they are accessed is not clear to me. Some naysayers are predicting that most web page authors will not want to bother adding the extra tags, but the whole thing depends on everyone massively using these tags. <digression> You know what this reminds me of? XML. It’s another protocol that involves upgrading HTML documents with spiffy new tags that spiff up the information retrieval process, or something like that. I remember 3 years ago it was being said that XML was the next big thing. I even went and took a class in it. Uh… when was the last time anyone heard any mention of XML? What happened to it? </digression>
Anyway, Tim Berners-Lee is one serious techie, not someone I’d sneer at a priori. If he has major new ideas, they just may be as momentous as he says they will be. I’d like to understand more how the Semantic Web is supposed to revolutionize our information connectivity.
My take on it is that it will indeed be a lot of extra work, but then so are Java and JavaScript, and look how much of that’s on the present World Wide Web. Those who are making web pages that will be improved by semantic linking will put in the extra work, and lots of other people won’t. Eventually, though, it could be quite common. But don’t take my word for it: HTML is too much bother for me most of the time, so I probably won’t be one of the early adopters.
Well, first of all XML may no longer be quite the buzzword that it once was, but it is still growing - especially because of web services (a buzzword in its own right). XML is not anything like the semantic web, because XML does not define any of the languages built on top of XML (and XML does not necessarily have anything to do with HTML, except that they are both cousins of SGML and are sometimes used together).
The purpose of the semantic web is to infuse documents with explicit meanings that are standardized from the outset. It’s sort of like taking HTML meta tags to the next level. Most people probably will not adopt it quickly, because it won’t be terribly important to them. But as Google and other search engines provide more successful search results against semantic-enabled content, sites that want increased visibility will invest the effort.
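To make "explicit meanings" a bit more concrete, here is a rough sketch of the idea, not anything from TBL's actual proposal. It assumes page metadata is expressed as subject/predicate/object statements (the general shape RDF uses) with predicates drawn from a shared vocabulary. The example.org pages, the author names, and the pages_with helper are all made up for illustration; only the Dublin Core namespace is a real vocabulary.

```python
# Shared vocabulary prefix (Dublin Core element set); everything else here is hypothetical.
DC = "http://purl.org/dc/elements/1.1/"

# Each statement is (subject, predicate, object). Because the predicates come
# from a standardized vocabulary, a search tool does not have to guess what
# an author meant by a free-form meta keyword.
triples = [
    ("http://example.org/page1", DC + "creator", "Alice Example"),
    ("http://example.org/page1", DC + "subject", "XML"),
    ("http://example.org/page2", DC + "creator", "Bob Example"),
    ("http://example.org/page2", DC + "subject", "Semantic Web"),
]

def pages_with(predicate, value):
    """Return the pages (subjects) that carry the given predicate/value pair."""
    return sorted(s for s, p, o in triples if p == predicate and o == value)

result = pages_with(DC + "subject", "XML")
print(result)
```

The point of the sketch is only that the query matches on an agreed predicate rather than on whatever keywords an author happened to type, which is roughly where this differs from plain meta tags.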
Also, there will be tools to help insert the semantic tags - a utility could predict what tags are likely and the human chooses among these quickly and easily - or adds their own if the AI can’t predict it.
I hate to spend too much time on this digression and Cooper already handled it, but I’ll reiterate that XML is alive and well. It is definitely in use in the real world as part of communication and data storage protocols. Among other things, a lot of web services (another buzzword which the media thinks has faded but which is doing quite well getting real work done) depend on XML, and Microsoft is apparently going to use XML as the basic file storage format in the next versions of Office. XML has been a tremendous help in writing middleware between legacy apps and replacing horrible formats like EDI. I’ve also written XML schemas for a lot of internal apps to handle network data in cases where simple delimited data might have been used in the past. XML is much easier to parse robustly because of the explicit tags, so it’s the best option in almost every case where you have one machine talking to another.
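For anyone wondering what "easier to parse robustly" looks like in practice, here is a minimal sketch in Python using the standard library's ElementTree. The order record and its field names are invented for illustration and are not from any real app.

```python
import xml.etree.ElementTree as ET

# Hypothetical record with three intended fields: id, customer, quantity.
# The comma inside the customer name breaks a naive delimited parse.
delimited = "1042,Acme, Inc.,2"
fields = delimited.split(",")
# fields now has four entries, so position 2 is " Inc." rather than the quantity.

# The same record with explicit tags: each value is located by name, so
# punctuation inside a value cannot shift the other fields.
xml_record = (
    "<order>"
    "<id>1042</id>"
    "<customer>Acme, Inc.</customer>"
    "<quantity>2</quantity>"
    "</order>"
)
order = ET.fromstring(xml_record)
customer = order.findtext("customer")
quantity = int(order.findtext("quantity"))

print(len(fields), customer, quantity)
```

Delimited formats can of course be rescued with quoting and escaping rules, but both machines then have to agree on those rules; with XML the tags carry that agreement.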
I differ from Cooper only in the relationship between XML and HTML. HTML is essentially an XML schema designed for content markup. I realize that’s a retroactive definition since HTML existed first, but if you can understand why HTML is a good way for servers and browsers to communicate content markup, then you should understand how valuable XML is to other situations.
I can’t provide too much input on the main point of the OP. TBL’s proposals look good and he’s a tremendously smart guy so I tend to take him seriously, but I can’t see how author-defined additions to content are going to help us search or navigate too well. We learned from meta-tags that most web authors can’t be trusted to accurately describe their content.
I’d second (or is it third?) the previous mentions about XML - I use it for some pretty nifty Flash menus for clients (that allow them to customise the menus without ever having to touch the .swf) as well as other bits and bobs.