Data URI parsing in favicon images

It seems like the software is not properly reading the data URI scheme in site icons when it indexes sites.

Example:

The small image to the left of the name “The Daily Beast” shows a broken image on my computer. When I check the image source here on Discourse, I get

https://www.thedailybeast.com/the-naacp-lawsuit-against-trump-is-delayed-after-some-guy-named-ricky-took-the-paperwork/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAABPUlEQVQoz6WSv6rCMBTGOymCIL6BizqIgvgHBRdxcPYJxNkHEDd1EhUclE6uPoKzu4OggoKP4CC3JqZNbJqbGKk3XVq4IcPJB7/DOV8+DdbrP6GQvEYk8kgm8WLBKFX0aBTkcni5ZIxp/M3cQyk9n2GthnVd0V8vejjAcpms13+A+53ZtqD2e5DPu7pzu/FGgtpsnq3WF4CNhtXviwpjIxZzdZDJ4NlMNDoeQamkAKjTkbUc3QXMdyN6OoFCIRBgDQZyVFip+APcAHq98oLoOmq3/YHPQQhWq9Zo5A/Y260wSu6QSgXbodd7ozb/2WAudbuu/g/g2Wzi8VhUpmnE48pIw6EMCA+V5nWDj7rbgWLxqyP0CRpfOptVAULk75DVyhvKy4XnF8/nmifeIJ3G0ylzHEUPhx+JBJ5MuFG/5/Ynabxw9yEAAAAASUVORK5CYII=

But the URL that I found in the header on the page is

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAABPUlEQVQoz6WSv6rCMBTGOymCIL6BizqIgvgHBRdxcPYJxNkHEDd1EhUclE6uPoKzu4OggoKP4CC3JqZNbJqbGKk3XVq4IcPJB7/DOV8+DdbrP6GQvEYk8kgm8WLBKFX0aBTkcni5ZIxp/M3cQyk9n2GthnVd0V8vejjAcpms13+A+53ZtqD2e5DPu7pzu/FGgtpsnq3WF4CNhtXviwpjIxZzdZDJ4NlMNDoeQamkAKjTkbUc3QXMdyN6OoFCIRBgDQZyVFip+APcAHq98oLoOmq3/YHPQQhWq9Zo5A/Y260wSu6QSgXbodd7ozb/2WAudbuu/g/g2Wzi8VhUpmnE48pIw6EMCA+V5nWDj7rbgWLxqyP0CRpfOptVAULk75DVyhvKy4XnF8/nmifeIJ3G0ylzHEUPhx+JBJ5MuFG/5/Ynabxw9yEAAAAASUVORK5CYII=

Image of what it currently looks like:

Image of what it should look like:

Huh, good catch. It definitely seems like there’s some code along the lines of “if the favicon URL doesn’t start with http(s), prefix it with the site URL”. Which fixes relative links but doesn’t work for data URIs, clearly.
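Here is a minimal sketch of that suspected failure mode. It is in JavaScript to match the snippets quoted in this thread. Both function names and the example page URL are hypothetical, and the naive version is only a guess at what the real code does. The second version lets the WHATWG `URL` constructor (available in browsers and Node) decide. It leaves any absolute URL unchanged, `data:` included, and only resolves genuinely relative paths against the page URL.

```javascript
// Hypothetical naive resolver: anything not starting with "http" is assumed
// to be relative and is glued onto the page URL. A data: URI then becomes
// something like "https://example.com/story/data:image/png;base64,...".
function naiveResolveIcon(href, pageUrl) {
  if (/^http/i.test(href)) return href;
  return pageUrl.replace(/\/$/, "") + "/" + href;
}

// Hypothetical safer resolver: the URL parser returns absolute URLs of any
// scheme (http:, https:, data:) unchanged and resolves relative ones
// against the page URL.
function resolveIcon(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

const page = "https://www.thedailybeast.com/some-story"; // hypothetical page URL
const icon = "data:image/png;base64,iVBORw0KGgo=";
console.log(naiveResolveIcon(icon, page)); // https://www.thedailybeast.com/some-story/data:image/png;base64,iVBORw0KGgo=
console.log(resolveIcon(icon, page)); // data:image/png;base64,iVBORw0KGgo=
console.log(resolveIcon("/favicon.ico", page)); // https://www.thedailybeast.com/favicon.ico
```

The naive output has the same shape as the broken image source quoted at the top of the thread. The `URL`-based version needs no scheme list at all.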

I quickly perused the Discourse source code, and while I wasn’t able to find where the code deals with the favicon (also called site-icon and some others) inside Oneboxing, it does seem like there’s some marginally dodgy code elsewhere, for instance:
// edge case … what if this is not http or protocoless?
if (!/^http|^\/\//i.test(href)) {
  continue;
}

So it wouldn’t surprise me if there’s a similar regex matching on /^http/ here. Pinging @codinghorror.
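The quoted check could be widened without more hand-written regexes. This sketch is not Discourse code, and the `classifyHref` name and its labels are hypothetical. It treats a leading `//` as protocol-relative and lets the `URL` constructor decide whether anything else is absolute. It then reports the scheme, so a caller can choose to accept `data:` alongside `http:` and `https:`. For comparison, the regex quoted above returns `false` for a `data:` href, so that loop would skip such a link.

```javascript
// Hypothetical helper: classify an href found in scraped markup.
function classifyHref(href) {
  const trimmed = href.trim();
  // "//cdn.example.com/x.png" inherits the page's scheme.
  if (trimmed.startsWith("//")) return "protocol-relative";
  let url;
  try {
    // Without a base URL, this only succeeds for absolute URLs.
    url = new URL(trimmed);
  } catch {
    return "relative";
  }
  if (url.protocol === "http:" || url.protocol === "https:") return "http";
  if (url.protocol === "data:") return "data";
  return "other"; // e.g. javascript:, ftp:
}

// The check quoted above, for comparison: it rejects data: URIs.
const quotedCheck = (href) => /^http|^\/\//i.test(href);
console.log(quotedCheck("data:image/png;base64,AAAA")); // false
console.log(classifyHref("data:image/png;base64,AAAA")); // data
```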

Oh, so the daily beast isn’t using a link to a PNG for favicon, like a normal website, but instead has decided to use bizarre encoded image data. That’s… dumb?

<link rel="icon" type="image/png" sizes="16x16" href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAABPUlEQVQoz6WSv6rCMBTGOymCIL6BizqIgvgHBRdxcPYJxNkHEDd1EhUclE6uPoKzu4OggoKP4CC3JqZNbJqbGKk3XVq4IcPJB7/DOV8+DdbrP6GQvEYk8kgm8WLBKFX0aBTkcni5ZIxp/M3cQyk9n2GthnVd0V8vejjAcpms13+A+53ZtqD2e5DPu7pzu/FGgtpsnq3WF4CNhtXviwpjIxZzdZDJ4NlMNDoeQamkAKjTkbUc3QXMdyN6OoFCIRBgDQZyVFip+APcAHq98oLoOmq3/YHPQQhWq9Zo5A/Y260wSu6QSgXbodd7ozb/2WAudbuu/g/g2Wzi8VhUpmnE48pIw6EMCA+V5nWDj7rbgWLxqyP0CRpfOptVAULk75DVyhvKy4XnF8/nmifeIJ3G0ylzHEUPhx+JBJ5MuFG/5/Ynabxw9yEAAAAASUVORK5CYII="/>
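Pulling that `href` apart takes only a few lines. This sketch runs on Node.js because it uses `Buffer`, and both function names are hypothetical. It follows the `data:[<mediatype>][;base64],<data>` shape from RFC 2397, then reads the width and height from the PNG's IHDR chunk, which always immediately follows the 8-byte PNG signature. It covers only the common cases. For example, it does not interpret `charset` or other media-type parameters, and it assumes non-base64 payloads are percent-encoded UTF-8 text.

```javascript
// Hypothetical helper: split a data URI into media type and raw bytes.
function parseDataUri(uri) {
  const match = /^data:([^,]*?)(;base64)?,(.*)$/is.exec(uri.trim());
  if (!match) throw new Error("not a data URI");
  const mediaType = match[1] || "text/plain;charset=US-ASCII"; // RFC 2397 default
  const bytes = match[2]
    ? Buffer.from(match[3], "base64")
    : Buffer.from(decodeURIComponent(match[3]), "utf8"); // assumed UTF-8 text
  return { mediaType, bytes };
}

// Hypothetical helper: read PNG dimensions from the IHDR chunk.
// Width is at byte offset 16 and height at 20, both big-endian.
function pngDimensions(bytes) {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length < 24 || !signature.every((b, i) => bytes[i] === b)) return null;
  return { width: bytes.readUInt32BE(16), height: bytes.readUInt32BE(20) };
}

// Only the first 44 base64 characters of the icon above, enough for the header.
const { mediaType, bytes } = parseDataUri(
  "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2"
);
console.log(mediaType, pngDimensions(bytes)); // image/png { width: 16, height: 16 }
```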

Why? This is so inefficient; it’s sending a whole copy of the data down every time rather than a link to the image, which could be cached. I mean I guess it would make sense if the favicon was changing with every page load somehow…

Are there other sites taking this stupid approach, in your experience?

It’s dumber than that. It uses that same URI multiple times, setting it as different-sized icons. And then it actually has a link to a standard ICO favicon listed under rel="shortcut icon", though that is admittedly a different image.

I don’t know. I’ve definitely seen other infoboxes with broken favicons, but this is the first one I actually checked to see what the underlying problem was. (I was troubleshooting an issue on my computer at the time, and wanted to make sure it wasn’t an issue on my end.)

But, as you say, it would make sense for designers to use this on sites where the favicon is intended to be different on each page. Plus, using data URIs is entirely standards-compliant, as they can be used anywhere a normal URL can be used. As such, I would suggest that Discourse support them when parsing data scraped from other sites.

I agree it’s laughably inefficient, but not every designer writes the most efficient code. I’ve never managed to get most designers to optimize their PNGs; not even Randall Munroe does it, and his xkcd website is one of the leanest modern sites I’ve ever seen.

It’s only 522 bytes. That’s much less than an entire fetch would cost, given HTTP headers and other overhead. It could be cached, as you note, but whether that’s a net win depends on the cache expiration time, which they might not want to set very high. If the expiration time were a day, inlining is probably working in their favor, since I doubt they get more than a few visits per day per person.

Also, 522 bytes is completely negligible compared to all the other crap in most modern sites…

It seems to me that it would always be a win. Presumably most users view more than one page. And, as I said, they use the same data URI three times, for three different favicon sizes. That would add up.

As I understand them, the advantage of data URIs is rendering speed. Since they are included in the page, they are downloaded as soon as that part of the page is downloaded. For a favicon, however, this is not generally all that useful: my testing has shown that most browsers do not update the favicon until the entire page has loaded.

It does seem like a generally bad idea, except where each page has its own favicon. That’s especially true when it’s done three times, since a single declaration without an icon size would be sufficient. It seems to be cruft that has built up over time.

I agree that the larger sizes push it into never-a-win territory. But I took a look at the headers on the site, and the request and response are ~500 bytes each. That’s comparable to the 16x16 encoded icon, so if that were the only one, you’d get around 3 page loads before hitting breakeven.
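The breakeven estimate can be written down directly. This sketch uses a hypothetical function name and the rough figures from the posts above: about 1,000 bytes of request plus response headers and a 522-byte icon. It deliberately ignores HTTP compression and conditional revalidation. It also ignores that a separately served PNG would be smaller than its base64 text, so treat the results as ballpark numbers only.

```javascript
// Rough breakeven sketch (hypothetical helper).
// Inline: the icon's bytes ride along in every HTML response, once per copy.
// Linked: one request/response (headers + icon) per cache period; later page
// loads in that period are assumed to hit the browser cache for free.
function breakevenPageLoads({ headerBytes, iconBytes, copiesPerPage = 1 }) {
  const linkedCost = headerBytes + iconBytes; // paid once per cache period
  const inlineCostPerLoad = iconBytes * copiesPerPage; // paid on every page load
  return linkedCost / inlineCostPerLoad;
}

const single = breakevenPageLoads({ headerBytes: 1000, iconBytes: 522 });
const triple = breakevenPageLoads({ headerBytes: 1000, iconBytes: 522, copiesPerPage: 3 });
console.log(single.toFixed(1), triple.toFixed(1)); // 2.9 1.0
```

With one inlined copy, linking pays off after about three page loads per cache period. With the three copies described earlier, it pays off after roughly one.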

Incidentally, Discourse itself is using a data URI for an image:
https://sea1.discourse-cdn.com/straightdope/assets/_application-ab183eecc34d73e9a6ef9b150738f306f61a1163ff7bcd3b337183834aa3b28f.js

var MINIMUM_SIZE = 150;
var hiddenData = new WeakMap();
var LOADING_DATA = "data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="; // We hide an image by replacing it with a transparent gif

function hide(image) {
  // … (rest of the function omitted in this excerpt)
}

That one’s a bit more legitimate, though (just a 1x1 transparent gif).

Definitely a good one to know about – thanks for the heads up. Keep an eye out for any others.

I notice a lot of broken favicons, actually. But I’d been forgetting to check why they were broken.

Here is a broken one for Esquire magazine:

Markup creating the issue:

<link href="data:;base64,=" rel="shortcut icon">
<link href="/sites/esquire/assets/images/favicon.ico" rel="icon">
<link href="/sites/esquire/assets/images/favicon.ico" rel="shortcut icon">

Browsers seem to take the last declaration as canonical, rather than the first one. They may also prioritize a plain rel="icon" over the older IE-style rel="shortcut icon".
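A scraper could approximate that observed behavior. This is a hypothetical heuristic based only on the observations above, not the algorithm any particular browser or Discourse uses. It keeps only links whose `rel` contains the `icon` token and drops placeholder data URIs with an empty payload such as `data:;base64,=`. It then prefers a plain `rel="icon"` and takes the last match.

```javascript
// Hypothetical favicon picker for scraped <link> elements.
function pickFavicon(links) {
  const candidates = links
    .map(({ rel, href }) => ({ tokens: rel.trim().toLowerCase().split(/\s+/), href: href.trim() }))
    .filter(({ tokens }) => tokens.includes("icon"))
    // Skip data URIs with no real payload, e.g. "data:;base64,=".
    .filter(({ href }) => !/^data:[^,]*,=*$/i.test(href));
  // Prefer a plain rel="icon" over rel="shortcut icon".
  const plain = candidates.filter(({ tokens }) => tokens.length === 1);
  const pool = plain.length > 0 ? plain : candidates;
  // Later declarations win.
  return pool.length > 0 ? pool[pool.length - 1].href : null;
}

// The Esquire markup quoted above.
const esquire = [
  { rel: "shortcut icon", href: "data:;base64,=" },
  { rel: "icon", href: "/sites/esquire/assets/images/favicon.ico" },
  { rel: "shortcut icon", href: "/sites/esquire/assets/images/favicon.ico" },
];
console.log(pickFavicon(esquire)); // /sites/esquire/assets/images/favicon.ico
```

The returned `href` may still be relative, so it would need resolving against the page URL before it is fetched or displayed.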