Is XML the future of the internet?

I am interested in learning a little more about the difference between HTML and XML.

As far as I understand it, most web sites currently run on HTML, because the browsers most people use (Netscape and Explorer) read this language most easily.

XML on the other hand is more powerful, has more applications, etc., but currently browsers don’t understand (much) of it — is this correct?

The reason browsers can’t understand XML is that it is too plastic: there are too many different ways it can be presented. (Is this correct?)

If this question is a little off-base, please be understanding as my current knowledge of markup is limited - but I seek to learn!

So, is XML the future of the internet - or is HTML here to stay?

Sidenote: is HTML/XML easy to learn?

HTML isn’t a programming language, it’s a markup language. All it does is tell a browser how to display the content of the page: in bold, in italics, with a picture here or a table there. It’s very, very easy to learn; 99% of it is simple pairs of ‘tags’. The ‘opening tag’ tells the browser software to start displaying something in a particular way (say, to display text in bold or italics) and the ‘closing tag’ tells it to stop. These tags are shown as follows:

<?> indicates an opening tag
</?> indicates a closing tag

(where ? is the command that’s being started and stopped)

So, for example, to display text in bold you’d use:

<B> to start the bold passage
</B> to stop the bold passage

XML isn’t quite the same. It doesn’t tell the browser how to display the content of the page, but is used to describe what the content between the tags is. So, for example, you could use <ADDRESS> and </ADDRESS> to mark a particular block of text as an address. It doesn’t make any difference to how the browser displays that text; it makes a difference in that the software now ‘understands’ what’s between those tags.

Here’s where it gets tricky. As far as I know, there is no universal standard for XML tags. It’s absolutely customisable, the idea being that users can set it up to reflect their needs. In theory it can be used to identify data across a number of sites. Say, for example, you were a retail fishmonger, and you wanted to compare prices across a number of sites. Using XML it would be possible to query a range of sites to check the goods available, since they would be using consistent tags to describe the goods - <SPECIES>, <WEIGHT> and so on.
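To make the fishmonger idea a bit more concrete, here is a rough Python sketch of what such a price comparison might look like. The two shop feeds, the <CATALOGUE>, <ITEM> and <PRICE> tags, and the prices_for function are all made up for illustration; only <SPECIES> and <WEIGHT> come from the example above. It assumes every shop has already agreed on the same tag names, which is exactly the hard part.

```python
import xml.etree.ElementTree as ET

# Two hypothetical shop feeds that happen to agree on tag names.
SHOP_A = """
<CATALOGUE>
  <ITEM><SPECIES>cod</SPECIES><WEIGHT>1.0</WEIGHT><PRICE>8.50</PRICE></ITEM>
  <ITEM><SPECIES>haddock</SPECIES><WEIGHT>1.0</WEIGHT><PRICE>7.25</PRICE></ITEM>
</CATALOGUE>
""".strip()

SHOP_B = """
<CATALOGUE>
  <ITEM><SPECIES>cod</SPECIES><WEIGHT>1.0</WEIGHT><PRICE>7.90</PRICE></ITEM>
  <ITEM><SPECIES>plaice</SPECIES><WEIGHT>1.0</WEIGHT><PRICE>6.00</PRICE></ITEM>
</CATALOGUE>
""".strip()


def prices_for(species, feeds):
    """Return {shop_name: price} for every shop that lists the species."""
    found = {}
    for shop_name, xml_text in feeds.items():
        root = ET.fromstring(xml_text)
        for item in root.iter("ITEM"):
            # The same query works on every feed because the tags are shared.
            if item.findtext("SPECIES") == species:
                found[shop_name] = float(item.findtext("PRICE"))
    return found


cod_prices = prices_for("cod", {"shop_a": SHOP_A, "shop_b": SHOP_B})
print(cod_prices)
```

If one shop called its tag <FISH> instead of <SPECIES>, the same code would silently find nothing there, which is why the agreement on tags matters so much.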

Of course, that relies on all of those websites agreeing in advance to use the same set of XML tags. That’s why it hasn’t really taken off, in my opinion: there’s not enough mileage in designing or rewriting websites to share consistent XML tags. Companies see the cost but not the benefit.

I’m sure other dopers can enlighten you and me both on issues of technical feasibility…!

When I read this article in Scientific American two years ago, I understood XML a lot better than I do now.

Unfortunately, that’s not the whole story any longer. The year 1999 was a prophetic one for SciAm with their report on the sophisticated hypersearching techniques that became Google (or, as it is colloquially known on this board, “Googol”–hee, you’re a legend, tsunamisurfer!).

However, I haven’t seen XML take the Web by storm as Googol did. Maybe it’s just not noticeable to a layman like me. Perhaps our scientists can give us an update on what has happened with XML since May of 1999.

Also relevant here is XHTML. It’s a reformulation of HTML so that it is valid XML. XML is a bit more rigorous in its definitions (mainly for ease of parsing), which requires small changes like writing a line break as “<br />” instead of “<br>”.

That’s probably the real future of the web.

I took a class in XML last year, and was told that the next generation of browsers will be XHTML-enabled. Web page designers will need to sharpen up their tag placements: whereas now you could get away with sloppy tag placement, for example <B><I><U>hello</I></B></U>, now you will have to close embedded tags in the exact reverse order that you opened them in: <b><i><u>hello</u></i></b>. Another thing XHTML will require is all lowercase tags. Right now, browsers don’t care if you use upper or lower case or any mixture of the two. So from now on get in the habit of properly embedding your tags and using all lowercase.
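A quick way to see the nesting rule in action is to feed both versions to a strict XML parser. This is only a sketch using Python's standard xml.etree.ElementTree module as a stand-in for an XHTML-aware browser; the helper name is_well_formed is my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


sloppy = "<B><I><U>hello</I></B></U>"   # tags closed in the wrong order
tidy = "<b><i><u>hello</u></i></b>"     # closed in exact reverse order

print(is_well_formed(sloppy))
print(is_well_formed(tidy))
```

Note that a plain XML parser only checks the nesting here. The all-lowercase rule is an XHTML requirement layered on top, so this check alone would not catch properly nested uppercase tags.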

Thanks for the input thus far, dopers.

So is XHTML easy to learn or not? Would someone who knows it be able to teach me for instance, or is it something one should learn in a classroom setting? (I’m fairly computer literate.) What’s the timeline on learning?

(In case you are wondering why all the questions, I’m thinking of changing careers.)

Also, is it still a lucrative skill to have? Are web publishers still in high demand? Is this something one can do via freelance work? Just wondering, thanks.

The real benefit (and problem) with XML is that it describes the semantic content of the document. Ideally, this makes a wide range of automated information extraction and processing applications possible. Unfortunately, marking things up in XML can be very labor intensive. This is particularly true for content of a non-regular nature. It would be fairly easy to automatically mark up an annual report of a company. It would be much harder to mark up, for example, an essay on economic and geopolitical developments in Indonesia.

You’ve gotten some very good information, but I thought I could add a bit of detail to sum it all up for you.

SGML (Standard Generalized Markup Language) is the daddy of them all. It is a “meta-language”, which basically means that it is used to describe other languages. HTML is one of those other languages. HTML isn’t written in SGML in a programming sense. Instead it’s “defined” using SGML.

If you look at XHTML as cleaned-up (the official term is “well-formed”) HTML, you cover about 99% of the bases. One of the main goals was to make HTML more portable. Browsers, for those who haven’t downloaded one lately, are getting rather massive. This makes them a bit difficult to implement well on cell phones, PDAs, etc. One thing that adds to the complexity of browsers is the ability to read badly formed HTML documents. XHTML is an attempt to counter this.

All start tags in XHTML must have end tags. If you want to start a paragraph, you damn well better end it somewhere. This makes the parsing code much smaller. When the code runs across a <p>, it can safely keep parsing the text as a paragraph until it runs across a </p>. It doesn’t have to look for numerous other tags, some of which would also indicate that the designer had intended for the paragraph to end. As stated earlier, all elements and attributes must also be lower case. You don’t have to include logic to handle both <B> and <b>; you know it will always be the latter. To anyone with programming experience, this will start looking very helpful.
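For the programmers in the audience, here is a small sketch of how unforgiving a strict parser is about these rules. It uses Python's standard xml.etree.ElementTree module as a generic XML parser (not an actual XHTML browser), and the sample snippets and the parses_as_xml helper are my own inventions.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


samples = {
    "unclosed paragraph": "<div><p>one<p>two</div>",
    "closed paragraphs": "<div><p>one</p><p>two</p></div>",
    "mixed-case pair": "<div><B>bold</b></div>",
    "bare line break": "<p>one<br>two</p>",
    "self-closed line break": "<p>one<br />two</p>",
}

# Map each label to whether the parser accepted it.
results = {label: parses_as_xml(markup) for label, markup in samples.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```

The mixed-case pair is rejected because XML names are case sensitive, so <B> and </b> count as two different tags. A generic XML parser would still accept <B>...</B>; insisting on lowercase is XHTML's own rule.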

The easiest way to describe XML is as an SGML replacement that is still understandable by a layman. I do this kind of stuff for a living, but SGML is still scary as hell. XML lets you write a grammar that defines a type of document. You could even define HTML in XML (which is basically what XHTML is doing). mattk is correct when stating that there are not a lot of standard definitions out there yet, but they are slowly being created. Once this happens, you can expect XML to take off like a rocket.

An example of how this might work in the real world:

If the real estate industry had a standard that defined several tags, such as [numbedrooms][/numbedrooms], [stories][/stories], [imgfront][/imgfront], etc., any application that wanted to display home information could do so, regardless of which realtor’s database contained the actual listing.

Currently, most users (displayers) of foreign data must deal with the conversion process on their end. If the source changes, tough luck. XML standardization puts the onus on the source of the data. If someone doesn’t comply, software applications in that industry will just ignore them.
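As a rough illustration of that last point, here is a Python sketch of a display application that simply skips any listing that doesn't follow the agreed tags. The enclosing <listing> element, the sample data, and the describe_listing function are all hypothetical; only the three tag names come from the example above.

```python
import xml.etree.ElementTree as ET

# Tags every compliant listing is assumed to carry (from the example above).
REQUIRED_TAGS = ("numbedrooms", "stories", "imgfront")


def describe_listing(xml_text):
    """Return a display string, or None if the listing is not compliant."""
    try:
        listing = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not even well-formed XML
    values = {tag: listing.findtext(tag) for tag in REQUIRED_TAGS}
    if any(value is None for value in values.values()):
        return None  # a required tag is missing, so ignore this source
    return "{numbedrooms} bedrooms, {stories} stories, photo: {imgfront}".format(**values)


compliant = (
    "<listing><numbedrooms>3</numbedrooms><stories>2</stories>"
    "<imgfront>front.jpg</imgfront></listing>"
)
noncompliant = "<listing><beds>3</beds><floors>2</floors></listing>"

print(describe_listing(compliant))
print(describe_listing(noncompliant))
```

The application contains no per-realtor conversion code at all. A source that invents its own tags just drops out of the results.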

As for learning XHTML, if you already know HTML, then you just have to unlearn and learn a few things and you’re done. If you don’t know HTML, you can still become fairly adept in a few weeks, using any online tutorial or book. There are many tricks that you can pick up over time, but the basics are a breeze.

As for lucrative skills in web publishing, a combination of (X)HTML, Java, and XML will definitely work. Other possibilities are Cold Fusion, ASP, JavaScript, etc. The market for those who only have knowledge of HTML is a bit flooded, so it’s not as lucrative as it once was.

One thing people should be careful of in this discussion is that XML is not an analogue of HTML. The correct way to view things is that XML is a metaspecification for defining languages - part of an XML document, either explicitly included or implied, is a DTD (document type definition) specification, which defines what tags are legal, what types of content and attributes they may have, and what kinds of tags may appear in other tags.

The way to view HTML is as one particular implied DTD, with a syntax that is a good bit looser than what XML will strictly allow - hence the XHTML spec, which formalizes much of the current HTML as an XML DTD. As noted, this will require you to close content-less tags with />, explicitly double quote all attributes, etc.

As an example, I could create a document for the purpose of listing SDMB members which would have an enclosing <sdmb> tag, containing <member> tags with a NAME attribute, and a number of optional tags for other pieces of information about the member inside it. My DTD would describe this structure, and the following might be an example document:

<sdmb>
<member NAME="yabob">
<description>Some guy pontificating about XML</description>
</member>
<member NAME="Jomo Mojo" />
</sdmb>
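To show that a document like this really is just structured data, here is a minimal Python sketch that reads it with the standard xml.etree.ElementTree module. The attribute values are typed with plain ASCII double quotes, which is what an XML parser expects. The variable names are mine, and note that this parser does not check the document against a DTD; it only reads the structure.

```python
import xml.etree.ElementTree as ET

SDMB_DOC = """<sdmb>
<member NAME="yabob">
<description>Some guy pontificating about XML</description>
</member>
<member NAME="Jomo Mojo" />
</sdmb>"""

root = ET.fromstring(SDMB_DOC)

# Collect (name, description) pairs; description is None when the
# optional <description> tag is absent.
members = [
    (member.get("NAME"), member.findtext("description"))
    for member in root.findall("member")
]

for name, description in members:
    print(name, "-", description or "(no description)")
```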

Now WHY is this such a great idea? Well basically, it provides a way of separating document data from presentation, and it provides an easily understood way of transferring structured data between loosely coupled components.

In that latter regard, I can assure you that XML is VERY important, but is not something you are going to see as an end user of the web. The various ecommerce components behind the storefront you are interacting with will be exchanging information via XML based messages.

There are currently several emerging standards as to how this messaging is to work, i.e. what DTD will be used.

For presentation, something else you may hear of is XSL - the extensible stylesheet language. This is what you are intended to use, for instance, to convert an XML document into presentation. It may also be used to convert from one structured form (DTD) into another. This use is often referred to as XSLT.

Example - note my <sdmb> document above. It implies structured data INDEPENDENT of presentation - to use it, you would have to decide how you wanted to present it. XSL provides the mechanism for converting between the structured data and, say, a <TABLE> containing users and selected information about them, or a <UL> based list of users. The key point is that the presentation metaphor hasn’t polluted the data.
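To give a feel for that conversion step, here is a sketch that turns the <sdmb> document into a <ul> list. To be clear, this is NOT XSL: it is plain Python using the standard xml.etree.ElementTree module, standing in for what a stylesheet would do. The function name members_to_ul and the "name: description" layout are my own choices.

```python
import xml.etree.ElementTree as ET

SDMB_DOC = """<sdmb>
<member NAME="yabob">
<description>Some guy pontificating about XML</description>
</member>
<member NAME="Jomo Mojo" />
</sdmb>"""


def members_to_ul(xml_text):
    """Build a <ul> list, one <li> per member, from an <sdmb> document."""
    root = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for member in root.findall("member"):
        li = ET.SubElement(ul, "li")
        name = member.get("NAME")
        description = member.findtext("description")
        # Presentation decisions live here, not in the data.
        li.text = name if description is None else f"{name}: {description}"
    return ET.tostring(ul, encoding="unicode")


html_list = members_to_ul(SDMB_DOC)
print(html_list)
```

Swapping the <ul> for a <TABLE> would mean changing only this function. The <sdmb> document itself stays exactly as it was.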

I could go on at length, but I will stop here.