So is every website at its core nothing but an HTML file?

When I was a teenager in the late 1990s, I was playing around a little creating a website and putting it on Geocities. Of course, being a kid, I didn’t have much content to offer, and the pages looked terrible, but it was the cool thing to do.

So I’m looking a little into web programming today, and it seems it’s still basically HTML files. You can beef it up with CSS and JavaScript, but those are integrated into an HTML file. I suppose the biggest innovation, conceptually, compared to what was going on back then is the prevalence of server-side scripting languages. But as I understand it, what these do is generate HTML code in real time which is then sent to the client and displayed by the browser just as it would display a static HTML file.
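To make "generate HTML code in real time" concrete, here is a minimal sketch in Python of what a server-side script does on each request. The function name `render_page`, the visitor name, and the page layout are all made up for illustration; a real site would normally use a template engine rather than string formatting.

```python
from datetime import datetime
from html import escape


def render_page(visitor_name, now):
    """Build an HTML page as a string, the way a server-side script might."""
    # Escape user-supplied text so it cannot inject markup into the page.
    safe_name = escape(visitor_name)
    stamp = now.strftime("%Y-%m-%d %H:%M")
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>Welcome</title></head>\n"
        f"<body><h1>Hello, {safe_name}!</h1>\n"
        f"<p>This page was generated at {stamp}.</p></body></html>"
    )


page = render_page("<Geo> fan", datetime(1999, 1, 1, 12, 0))
print(page)
```

The browser receiving `page` cannot tell it apart from a static file with the same contents.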

So am I correct in assuming that no matter what you do or see on the web, it’s always HTML, either static or (more likely) dynamically generated? Or are there web technologies that run independently of HTML?

This is where most of the magic lies. While they are technically integrated (although the CSS is often a linked external file), they’re not HTML. The <script> element is HTML, but an HTML parser has no clue what is happening inside.
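A small sketch of that last point, using Python's standard `html.parser` as a stand-in for a browser's HTML parser (an assumption; real browsers use their own parsers). The class name `TagLogger` and the sample page are invented for the example. The parser reports the `<script>` tag itself, but everything inside it is passed through as plain text, including the `<b>` that looks like markup.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record the tags the parser sees and the raw text inside <script>."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script contents arrive here as opaque text, possibly in several pieces.
        if self._in_script:
            self.script_text.append(data)


source = '<p>Hello</p><script>if (1 < 2) { show("<b>bold</b>"); }</script>'
parser = TagLogger()
parser.feed(source)
parser.close()

print(parser.start_tags)
print("".join(parser.script_text))
```

Only `p` and `script` show up as tags; the `<b>` inside the script never does.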

Well, yes and no. What you see rendered in your browser is of course HTML. But the days of writing static HTML are all but gone. For most interactive sites nowadays, what you’re actually loading up in your browser is an app authored entirely in Javascript (or a derivative thereof, like TypeScript). That JS app will generate the HTML that formats your browser view, and it will make HTTP requests in the background to fetch dynamically generated data from a back end REST API. The data generally will come in JSON or XML format, and your JS app will parse it and integrate it into the dynamically generated HTML.
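Roughly, the "parse the data and integrate it into HTML" step looks like the sketch below. It is written in Python rather than JavaScript just to keep it runnable outside a browser, and the payload and its field names (`user`, `posts`, `title`) are hypothetical, standing in for whatever a real API returns.

```python
import json
from html import escape

# Hypothetical payload, standing in for the body of an API response.
payload = '{"user": "alice", "posts": [{"title": "Hello <world>"}, {"title": "Second post"}]}'

data = json.loads(payload)

# Build an HTML fragment from the parsed data, escaping text as we go.
items = "".join(f"<li>{escape(post['title'])}</li>" for post in data["posts"])
fragment = f"<h2>Posts by {escape(data['user'])}</h2><ul>{items}</ul>"

print(fragment)
```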

I’ve hand-written quite a bit of CSS and JS in the past 10 years or so. Being able to read HTML is really important, but I can’t even remember the last time I sat down with a text editor and wrote HTML from scratch. Even the JavaScript authoring is reduced/abstracted away by frameworks like React or Angular.

If you’re fishing for a suggestion on where to start with all this newfangled web stuff, I would suggest running through a React.js tutorial. This is not a magic job-seeking bullet, but it will at least get you up to speed with one very common way of doing things.

Your browser is rendering HTML, but that HTML might have been created by a script running locally within your browser. And things you do on the page might change that HTML code, without the server necessarily even knowing that you did it.

There used to be technologies that ran independently, like ActiveX, Java components, Flash… thank the Web Lords they are gone and dead.

I’d prefer to think of it as the fundamental web display technology is a DOM tree and HTML has become just the standard way of serializing a single state of a DOM tree.

When you download an HTML file, the browser uses that HTML file to construct an initial DOM tree, and from then on any updates (by javascript, for example) are made on the tree using tree-like operations (e.g. “find a node with property X, and then add a new node underneath that containing Y”) rather than generating new/additional HTML to be parsed.
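Here is a rough sketch of that idea using Python's `xml.etree.ElementTree` as a stand-in for the browser DOM. That is an assumption made for the sake of a runnable example: the sample document is deliberately well-formed so an XML parser can read it, whereas a browser uses a much more forgiving HTML parser. The `menu` id and the list items are invented.

```python
import xml.etree.ElementTree as ET

# A small, well-formed document to parse into a tree (the "initial DOM").
html = '<html><body><ul id="menu"><li>Home</li></ul></body></html>'
root = ET.fromstring(html)

# "Find a node with property X": here, the element whose id is "menu".
menu = root.find(".//*[@id='menu']")

# "...and then add a new node underneath that containing Y".
new_item = ET.SubElement(menu, "li")
new_item.text = "About"

# Serializing the tree again gives one snapshot of its current state.
snapshot = ET.tostring(root, encoding="unicode")
print(snapshot)
```

Note that no new HTML was parsed to add the second list item; the markup only reappears when the tree is serialized.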

(There are also RPC methods that send and receive HTTP requests/responses like a web page, and the payload of the response is JSON or something else that is not HTML, but those are not normally meant to be displayed by browsers, so probably don’t qualify as “websites”.)

Technically…

There is a thing called WebAssembly that allows you to treat the webpage as a canvas and which is not considered to be an embedded rectangle for a plugin the way that applets, VRML, etc. are.

At the moment, I’m not sure that it is used much. Maybe something like Unity is compiling out to it for web? But, in theory, there’s nothing to stop a person from creating a webpage purely in Wasm.

This is my understanding. If there are any mistakes, please correct me.

Originally the DOM tree was defined using an HTML file downloaded from the webserver; the browser built the DOM tree and rendered that into the webpage displayed to the user. The DOM tree is the main data structure: it describes nodes and subnodes. CSS was added to create a separate notation to describe the styles associated with each node and subnode. Javascript was added to introduce behaviour to a webpage. Javascript programs can manipulate the DOM, adding and deleting nodes on the fly. The result is still rendered by the browser, but it adds dynamic behaviour that can be triggered by mouse clicks, and this animates a webpage and makes it responsive to the user. The original design was a bit clunky: download HTML, render it, and if a user clicks on a menu item, download another HTML file and render it. This cycle could be slow. Having Javascript edit the DOM within the browser and re-render small bits of the DOM based on a user mouse click made webpages seem much faster and perform like a regular desktop app, despite being dependent on data coming across a sometimes slow Internet connection.

So the server needs only to provide a minimal HTML document and Javascript can build out all the detail. Moreover Javascript can also get data from the server in the JSON serialisation format. It takes this data, turns it into variables and then uses this data to build the DOM. The JSON files can be static or created on the server by a scripting language and a database.
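As a sketch of the "scripting language and a database" route, the Python below builds a JSON response from a database query. The `products` table, its rows, and the function name `products_json` are all made up for illustration, and an in-memory SQLite database stands in for whatever database a real server would use.

```python
import json
import sqlite3

# In-memory database with a made-up "products" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("widget", 9.5), ("gadget", 20.0)],
)


def products_json(connection):
    """Return all products as a JSON string, as an API endpoint might."""
    rows = connection.execute("SELECT name, price FROM products ORDER BY name")
    return json.dumps([{"name": n, "price": p} for n, p in rows])


body = products_json(conn)
print(body)
```

The script in the browser would receive `body`, parse it, and build DOM nodes from the result.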

There is a nice division of labour here. Server-side developers use some grown-up language like Java, Python, Go…or whatever. Then there are front-end developers, the Javascript jockeys, who are concerned with the complexities of running Javascript efficiently in the web browser in all its variations. Server-side developers have an easier time because their server environment is much more predictable, but they have to deal with scaling up to handle lots of browser conversations with the web app.

In the past there were other languages that would also run in a browser environment and for a time these did a far better job than Javascript. Flash was the most widely used. But there were also Java applets, which were attractive to larger companies that used Java as their main development language. So there you had an example of a server-side language being used in the browser. Interestingly, as Javascript developed, the same process happened in the other direction. Javascript could be used as a server-side language: Node.js. So train all front-end and server-side developers in Javascript and call them ‘full stack’ developers. What could possibly go wrong? Well, the browser and the server environments are very different and there is no one language that has the features to deal with all the challenges and remain efficient. There has been a procession of languages on the server side with libraries and enhancements to deal with scaling and other challenges of running a server. This has not been the case in the browser environment. There have been various incarnations of Javascript, and it took a long time to mature as a language. But the browser is also a moving target. Ensuring Javascript programs work with all versions of the various competing browsers out there is a daunting task and introduces a lot of extra ‘tooling’. Every year or so a new Javascript framework comes out and another set of tools to help develop the code. Moreover, browsers now run on all kinds of devices from smartphones to tablets to desktops with a huge range of screen sizes, processing power and memory. It is a messy dog’s dinner.

Recently alternative languages that will run in the browser are staging a comeback to get away from all the Javascript complexity by simply bypassing it. Browsers are similar to a software virtual machine. They abstract a processor that can execute programs that are written in ‘byte code’. This is WASM. This is not a new idea. Java works like that with its Java Virtual Machine that must run on each server for Java to run.

So if a regular language can generate browser bytecode, a program can run anywhere there is a browser. This is an attractively large number of machines worldwide. Some languages are compiled and can be configured to generate code that can execute on several different computer architectures. Browser-based byte code simply adds another to the list. Wiki tells me there are about 40 languages that support WASM now.

For larger organisations that have some control over the computers in use, and that use a regular language with battle-tested tools and testing, it may make a lot of sense to include a browser as a target environment for running an app. For Internet startups whose customers are the great unwashed out there in the jungle that is the Internet…maybe, maybe not.

Note that there are plenty of other document models out there intended for use with apps other than browsers. Microsoft Word is an example. So is Adobe PDF. These are oriented toward print media and they don’t have the dynamic features of Javascript that can create an interactive experience for the end user.

Berners-Lee designed HTML to be a simplified markup language to try to help organise the multiplicity of formats on the computers at CERN in the 1990s. Markup languages have been around ever since there has been a printing business. Many big industries had their own versions to try to put some kind of standard and order in their documentation. He created a version that was orientated towards rendering the document not on paper, but on screen. So HTML included simple headings, a bit of styling, rudimentary forms and the ability to handle images. He also added the magic hyperlink, which gave us the facility to surf from one page to another transparently despite being on different servers. It solved a problem. Then it evolved as people found new applications and computers and their screens became more capable.

There were other markup languages that came after HTML. XML in all its incarnations. This did not fare so well, which is why we are still stuck with a dominance of proprietary formats like PDF and various Microsoft formats and a handful of open document standards. The brave new world of virtual reality is probably the next file format bun fight.

Simple HTML still serves a quite useful purpose. It is understood by even the simplest and lowest-power devices and requires little bandwidth. If you want to read a simple web page full of useful information, you don’t need all the bells and whistles that go with Javascript to animate the experience. There are also some key advantages. It is very easy to create static HTML webpages from a more human-friendly markup language such as…Markdown.
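To give a feel for how little is involved, here is a toy converter in Python that handles only `#` and `##` headings and plain paragraphs. It is a sketch of the idea and not a real Markdown implementation; the function name `tiny_markdown_to_html` and the sample text are invented, and an actual static site generator would use a proper Markdown library.

```python
from html import escape


def tiny_markdown_to_html(text):
    """Convert '# ' / '## ' headings and plain paragraphs to HTML."""
    blocks = []
    # Blank lines separate blocks, as in Markdown.
    for block in text.strip().split("\n\n"):
        line = " ".join(block.split())  # collapse line breaks and extra spaces
        if line.startswith("## "):
            blocks.append(f"<h2>{escape(line[3:])}</h2>")
        elif line.startswith("# "):
            blocks.append(f"<h1>{escape(line[2:])}</h1>")
        else:
            blocks.append(f"<p>{escape(line)}</p>")
    return "\n".join(blocks)


source = "# My page\n\nSome useful information,\nspread over two lines.\n\n## Details\n\nFish & chips."
html_out = tiny_markdown_to_html(source)
print(html_out)
```

Run once over a folder of text files, something like this produces a static site with no server-side code at all.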

Static websites are definitely a thing.

So you are right, every website at its core is nothing but HTML.

While it is still HTML on the web page, how it is being managed and generated has improved greatly. Rarely does anyone directly code an HTML web page; they use a CMS.

I’m not sure that I would call Markdown user-friendly. It makes it easy to add formatting to a piece of writing, but it makes it too easy: You often end up with formatting where you didn’t intend it. Like, how do you spell the name of that Korean War sitcom from the 70s? MAS*H, of course.
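The mangled title above is what you get from a rule like the one sketched here in Python. This is a deliberately naive stand-in for Markdown's emphasis handling (real parsers have more elaborate rules), and the function name `naive_emphasis` is made up: any pair of asterisks is treated as italics, so the first two in the title vanish into markup and the third is left dangling.

```python
import re


def naive_emphasis(text):
    """Turn *text* into <em>text</em>, with no special cases at all."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


rendered = naive_emphasis("M*A*S*H")
print(rendered)
```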

I wish to clarify one thing: while, in theory, every part of a webpage could be sent in a single HTML file, in practice the different types of content are split up. You have the HTML file, the JavaScript files, the CSS files, the image files, the video files, and so on. The HTML file will usually contain the URLs of the other files, though now the protocol can automatically send the files along with the HTML file.
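As a sketch of how the HTML file points at everything else, the Python below walks a made-up page and collects the URLs a browser would go on to request. The class name `ResourceCollector`, the sample page, and its file names are hypothetical, and only a handful of tags are covered.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect the URLs of stylesheets, scripts, images and videos."""

    # Which attribute holds the URL for each tag we care about.
    URL_ATTRS = {"link": "href", "script": "src", "img": "src", "video": "src"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


page = """<html><head>
<link rel="stylesheet" href="style.css">
<script src="app.js"></script>
</head><body>
<img src="logo.png" alt="logo">
<video src="intro.mp4"></video>
</body></html>"""

collector = ResourceCollector()
collector.feed(page)
collector.close()
print(collector.urls)
```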

Furthermore, once the JavaScript is loaded, it can send and receive data. This data can be in many different file formats.

I also note that many of the files are sent in a compressed form, and nearly always encrypted these days.
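A quick sketch of the compression half of that, in Python. gzip is one of the encodings HTTP allows for response bodies; the repetitive sample page here is made up, so the exact savings will not match any real site.

```python
import gzip

# A repetitive, made-up HTML document.
html_bytes = ("<ul>" + "<li>list item</li>" * 200 + "</ul>").encode("utf-8")

compressed = gzip.compress(html_bytes)   # roughly what the server would send
restored = gzip.decompress(compressed)   # what the browser does on receipt

print(len(html_bytes), "bytes before,", len(compressed), "bytes after")
```

The browser undoes the compression before parsing, so the page it ends up with is byte-for-byte the original.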