The different kinds of tags

So, everyone knows that there are (at least) three different kinds of tags used in Discourse. There are Markup tags, which use things like asterisks and underscores (which apparently do the same thing; I’d have thought that underscores would underline, but never mind that). There are BBCode tags, enclosed in [square brackets], like [b][/b] and [u][/u]. And there are HTML-like tags, enclosed in <angle brackets>, like <del></del> and <sup></sup>.

I had thought that these were all implemented in the same way, and just coexisted with each other for… reasons. But I’ve noticed that this isn’t the case.

Try typing some of those tags, and watch the preview pane to the right. When you use Markup or BBCode tags, the tags don’t do anything until you close the tag. If I type **bold , then I get two asterisks and the word “bold”. It’s only when I put in the two closing asterisks that the word turns boldface.

By contrast, if I start typing <sup>superscript, then the moment I type the greater-than sign that completes the tag, the tag disappears from the preview pane, and everything I type after that is superscripted, even before I type the closing tag. In fact, I’m not typing a closing tag at all right now, just submitting the post to see how it looks.

And indeed, the superscript shows up in the final post, too, even without ever having a closing tag. This means that the HTML tags are implemented differently from the Markup or BBCode tags. Which, in turn, worries me, because… are the HTML tags even actually implemented in Discourse at all? Or are they just leaking directly through to a lower level of abstraction? If Discourse is in fact just allowing HTML tags to leak through, that’s a huge problem, because leaks like that can cause big problems'); DROP TABLE Posts;--

I think the HTML tags are better considered to be a type of markup that just happens to use some of the same elements. It’s not really HTML.

In this case, Discourse closed the <sup> tag for you. It also inserted <p></p> tags around your paragraphs.

I’ve seen lots of forum/publishing software leak tags, so it’s not an unheard-of problem. But Discourse only allows a very limited set of HTML-like tags in the first place, so it only has to worry about closing a very limited set of things. That’s easier to get right than the general problem.
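To make the "limited set of tags" idea concrete, here’s a rough sketch of how an allowlist-plus-auto-close step could work. This is not Discourse’s actual code: the ALLOWED set, the AllowlistCloser class, and the sanitize function are all made up for illustration, and it only uses Python’s standard html.parser module. Allowed tags pass through (minus attributes), anything else is escaped into visible text, and whatever is still open at the end of the post gets closed.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; Discourse's real list differs.
ALLOWED = {"sup", "sub", "del", "b", "i", "u"}


class AllowlistCloser(HTMLParser):
    """Pass allowed tags through, escape the rest, close what is left open."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the sanitized output
        self.open_tags = []  # stack of allowed tags currently open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose
            self.open_tags.append(tag)
        else:
            # Unknown tag: show it as literal text instead of letting it leak.
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything opened after the matching tag, then the tag itself.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data))

    def finish(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(text):
    parser = AllowlistCloser()
    parser.feed(text)
    return parser.finish()


print(sanitize("x<sup>2"))    # x<sup>2</sup>
print(sanitize("<blink>hi"))  # &lt;blink&gt;hi
```

With only a handful of allowed tags, the stack of open tags stays tiny, which is why this is so much easier than cleaning up arbitrary HTML.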

Happy birthday!

Still, there’s some reason why Discourse auto-closes one kind of tag and not another. I won’t say there’s a GOOD reason for it, but there’s surely a reason.

The Markdown spec doesn’t seem to specify what happens with unclosed tags (though maybe I just didn’t find it). But in any case, other Markdown viewers had the same behavior as here. For example:

Type in “**bold” and it doesn’t actually make it bold.
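One way to picture why that happens: emphasis is a paired construct, so the renderer has nothing to convert until it sees both delimiters. Here’s a deliberately oversimplified sketch using a regular expression. Real Markdown renderers use a much more involved delimiter algorithm, and the render_bold name is just something I made up for this example.

```python
import re

# Matches an opening ** and the nearest closing ** on the same line.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def render_bold(text):
    """Convert only complete **...** pairs; lone delimiters stay literal."""
    return BOLD.sub(r"<strong>\1</strong>", text)


print(render_bold("**bold"))             # **bold
print(render_bold("**bold**"))           # <strong>bold</strong>
print(render_bold("**bold** and **more"))  # <strong>bold</strong> and **more
```

Since the pattern needs both ends to match, an unclosed "**bold" just falls through as plain text, which is the behavior seen in the preview pane.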

Unclosed HTML tags (with some specific exceptions) are invalid HTML. Browsers try to be resilient, but ultimately it’s their choice what to do with these tags: ignore them completely, auto-close them at a reasonable stopping point, etc. And software like Discourse can do something different yet, since this isn’t real HTML and it wants to avoid tag leaks.

HTML tags are considered part of Markdown, oddly enough. So it is the same renderer that is handling both the symbols and the HTML tags.

https://daringfireball.net/projects/markdown/syntax#html