Loooong URLs

What’s the point of extra long URLs? I was posting to another forum I frequent and was trying to provide a link to a product. I was a bit surprised when it displayed as:

If I put that into TinyURL, it gives a much shorter URL linking to the same place. Mathematically the extra length isn’t necessary, so why the long URLs?

ETA: since the SDMB shortens them, I posted a screen capture here (about half of it): https://s3.amazonaws.com/gs-geo-images/6f59d8af-c667-461c-9c0b-fd2cc0873081.png

Because that URL has all kinds of tracking information in it.
TinyURL just gives you a short URL that then redirects to the much longer one.

Something else fun to do with URLs:

https://www.irs.gov%2FLucas%20Jackson%20is%20delinquent@boards.straightdope.com/sdmb/showpost.php?p=21122457&postcount=1&prosecution=Y

There! I’ve linked your OP to your delinquent account at the IRS. :smiley:
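For anyone wondering why that link trips a phishing filter: everything before the @ is treated as login information ("userinfo"), not as the site being visited. Here is a minimal sketch using Python's standard urllib.parse to show which part is the real host:

```python
from urllib.parse import urlsplit

url = ("https://www.irs.gov%2FLucas%20Jackson%20is%20delinquent"
       "@boards.straightdope.com/sdmb/showpost.php"
       "?p=21122457&postcount=1&prosecution=Y")

parts = urlsplit(url)

# Text before the "@" is the userinfo field, not the destination.
print("userinfo:", parts.username)
# The host actually contacted is whatever follows the "@".
print("host:    ", parts.hostname)
```

The part that looks like irs.gov ends up in the userinfo field, and the host is the message board.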

Wow! My antivirus hated that link! Says it’s a phishing attempt.

I think the limit is about 2200 characters. I often have to pass strings for my web site that exceed that: lots of 16-digit numbers, each one representing a different property. When I exceed the 2200 limit, I have to do it in batches.
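Batching like that might look roughly like the sketch below. The base URL, the "ids" parameter name, and the 2000-character margin are all assumptions for illustration, not anything from a real site:

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # assumed safety margin under the limit discussed here


def batch_urls(base_url, ids, param="ids", max_len=MAX_URL_LENGTH):
    """Split ids across several URLs so none is longer than max_len."""
    urls, current = [], []
    for item in ids:
        candidate = current + [item]
        url = base_url + "?" + urlencode({param: ",".join(candidate)})
        if len(url) > max_len and current:
            # Adding this id would overflow: close the batch, start a new one.
            urls.append(base_url + "?" + urlencode({param: ",".join(current)}))
            current = [item]
        else:
            current = candidate
    if current:
        urls.append(base_url + "?" + urlencode({param: ",".join(current)}))
    return urls


# 300 made-up 16-digit numbers standing in for property ids.
ids = [f"{n:016d}" for n in range(300)]
urls = batch_urls("https://example.com/properties", ids)
print(len(urls), "URLs with lengths", [len(u) for u in urls])
```

Each URL is built and measured after encoding, since the commas between ids grow to three characters each once they are percent-encoded.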

Uh… the user never sees these long strings.

Yeah! Antivirus working as it should. It would be better if browsers themselves identified this as a phishing attempt.

Note that a lot of people, self included, hate the TinyURL-type shorteners. You have no idea where the link is really going. So I avoid them whenever possible.

All the parameters in the long URL are needed by the destination web site. As noted above, TinyURL doesn’t eliminate the long URL. It just provides a short URL for you, the human, to use, and then redirects to the long one. The destination web site (eBay) still sees the long URL with all the parameters.

There’s no standard: each browser and each web server has its own limit. The Apache server normally has a limit of about 8 KB (8190 bytes by default for the request line), but you can recompile to increase that limit (I’ve had to do this in the past). Internet Explorer has a limit of 2083 characters, and due to Microsoft’s dominance, that’s usually considered the global limit, at least for pages that will be accessed from a browser.

If you need to pass more parameters than will fit in a URL, it’s pretty straightforward to use POST rather than GET. The parameters then travel in the body of the request instead of the URL, so the URL length limit no longer applies to them.
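To make the GET versus POST difference concrete, here is a small sketch with Python's standard library. The endpoint and the "ids" parameter are made up, and nothing is actually sent over the network:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter name, for illustration only.
endpoint = "https://example.com/properties"
params = {"ids": ",".join(f"{n:016d}" for n in range(500))}

# GET: the parameters become part of the URL, so the URL grows with them.
get_url = endpoint + "?" + urlencode(params)

# POST: the same encoded parameters go in the request body instead.
post_request = Request(endpoint, data=urlencode(params).encode("ascii"), method="POST")

print("GET URL length: ", len(get_url))
print("POST URL length:", len(post_request.full_url))
print("POST body bytes:", len(post_request.data))
# urllib.request.urlopen(post_request) would send it; omitted here.
```

The POST URL stays the same short length no matter how many ids are included, because only the body grows.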

Most of that URL is superfluous. Everything after the &ul_ref= looks like an encoded list of other URLs, probably the list of other pages on eBay that you visited on the way to that one, a breadcrumb of sorts. You actually need only the stuff before the first question mark. If you know how, you can confirm that easily: open the page’s source (the HTML) and look for either the tag <link rel="canonical"> or <meta property="og:url">. The first lets search engines distinguish between links to unique pages, and the second is part of the Open Graph standard used by Facebook.
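If you would rather not read the page source by eye, a rough sketch with Python's built-in HTML parser can pull out both tags. The sample page below is invented; a real page would be fetched and fed in the same way:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the canonical and og:url values from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.og_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        # Open Graph normally uses property=, but some pages use name=.
        elif tag == "meta" and "og:url" in (attrs.get("property"), attrs.get("name")):
            self.og_url = attrs.get("content")


# Made-up page source; a real page would be much longer.
html = """
<html><head>
<link rel="canonical" href="https://www.example.com/p/12345">
<meta property="og:url" content="https://www.example.com/p/12345">
</head><body></body></html>
"""

finder = CanonicalFinder()
finder.feed(html)
print(finder.canonical)
print(finder.og_url)
```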

Another thing to try: if the page has one of those “share” widgets with buttons to email/tweet/pin/like, the URL in those should be the same as the canonical one. Easiest is to click on the “email to a friend” button, which takes you to a compose window in your email program, then copy the URL from inside the auto-generated email body.

Links, especially to products or things you Googled, often include lots of information about how you got there. Useful for those doing marketing, but not needed by the end user. I look carefully at anything following an ampersand, and often find it can be eliminated. Of course, different programmers take different approaches to building the long strings, but often it’s not too difficult to figure out what’s indispensable.
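Trimming by hand works, but the same idea can be sketched in a few lines of Python. The URL here is made up: "ul_ref" is the parameter mentioned upthread, while "_trk" and "size" are invented names for the example:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_params(url, unwanted):
    """Return url with the named query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in unwanted]
    return urlunsplit(parts._replace(query=urlencode(kept)))


long_url = ("https://www.example.com/p/12345"
            "?ul_ref=https%3A%2F%2Fexample.com%2Fa&_trk=xyz&size=large")
print(drop_params(long_url, {"ul_ref", "_trk"}))
```

Which parameters are safe to drop still has to be worked out per site; the code only does the deleting.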

Just in case anyone is unaware of common URL structure, there’s often a set of variables or parameters passed to the script on the server, like this:

http://somesite.com/page?var1=value1&var2=value2&var3=value3

e.g. the URL for posting this reply is:

https://boards.straightdope.com/sdmb/newreply.php?do=newreply&noquote=1&p=21124865

You can often see what it’s doing and alter the URL yourself, deleting any unnecessary parts.
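As a quick illustration of that structure, Python's standard urllib.parse can split the reply URL above into its pieces:

```python
from urllib.parse import parse_qs, urlsplit

url = "https://boards.straightdope.com/sdmb/newreply.php?do=newreply&noquote=1&p=21124865"
parts = urlsplit(url)

print("host:  ", parts.netloc)   # which server to contact
print("script:", parts.path)     # which page or script on that server
# Everything after "?" is a list of name=value pairs joined by "&".
for name, values in parse_qs(parts.query).items():
    print(f"  {name} = {values[0]}")
```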

They aren’t necessary.

Here’s a valid URL for that item: Stamping & Embossing Ink & Pads for sale | eBay … which is
eBay, then /p/1250745730

This is probably more information than you were looking for but to give a more technical answer than the others have given you…

Websites operate in what’s called a “stateless” way.

Most of your applications, like a computer game, keep track of what you’re doing and what you’ve done before. Everything the application needs to know is stored somewhere in memory or on the hard drive, and all of that feeds directly into everything you see. This is called being “stateful”: it knows what the current state is.

The server that is generating a webpage (by default) has no idea if it has ever seen you before. It doesn’t care how you got there. After it finishes generating the HTML for the page you’re going to view, it forgets everything about you. This is stateless handling.

The purpose of this is that it makes it easy to create an application that can handle millions of users. Functionally, you can’t accomplish that with a single machine. You need to split the users across a number of machines in order to handle that volume. It’s like at a bank: you can’t just have one teller for everyone. You have to divide the workload.

So now we could try to ensure that every time your browser comes back to request a new page, it goes to the same server, which could then keep information about who you are and what you were doing in memory and on the hard drive. That doesn’t work very well in practice. The machine might crash, or the number of users might grow so large that the system needs to migrate some of them to a new server. There’s also the problem that the data for all of the users might simply be too large. People can sit on the same page for 30 minutes before clicking to the next one. A single server might be serving a million customers who are taking their time between requests. Each request takes very little of the server’s time, so it can handle a whole bunch of them, but if it has to keep everyone’s information in memory, it will still get swamped.

Overall, we want to be able to freely move you around, on every single request, to a new server if at all possible. Maybe we will, maybe we won’t, but if we assume that we might have to do that switch at any moment, then it’s easiest to just run stateless.

So, for example, say you’re looking for events around town. Page 1 might ask you to choose some dates. When you pick the dates, that creates a URL with a start and end date. The server that sees the URL doesn’t need to know who you are or what you were doing before; it just knows that you’ll be happy so long as it generates a page with information between those two dates. No matter which server you go to, so long as you hand it that URL, it will work.
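Here is a toy version of that events example in Python. The event list, the host name, and the "start" and "end" parameter names are all invented; the point is only that the handler works from the URL and nothing else:

```python
from datetime import date
from urllib.parse import parse_qs, urlencode, urlsplit

# Made-up event data that every server is assumed to have access to.
EVENTS = [
    ("Farmers market", date(2024, 6, 1)),
    ("Jazz night", date(2024, 6, 8)),
    ("Street fair", date(2024, 7, 4)),
]


def handle_request(url):
    """Build the page from the URL alone; nothing is remembered between calls."""
    query = parse_qs(urlsplit(url).query)
    start = date.fromisoformat(query["start"][0])
    end = date.fromisoformat(query["end"][0])
    return [name for name, day in EVENTS if start <= day <= end]


# Page 1 turns the chosen dates into a URL.
url = "https://example.com/events?" + urlencode({"start": "2024-06-01", "end": "2024-06-30"})

# Two calls stand in for two different servers; both give the same answer.
print(handle_request(url))
print(handle_request(url))
```

Because handle_request keeps nothing between calls, it makes no difference which machine runs it.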

It’s like the rack of forms at the Post Office or other government building. Pick the form, fill it out, and the person at the counter will be able to process the request, no matter which one you get to. If they need to send you to some other agency, they can give you a form for them, and any one of the tellers over there will be able to handle that if it’s one of their forms.

When you sign up at the library, they hand you an ID card. From then on, you carry that with you. It has an expiration date and some information and styling that allows any person who works at the library to ascertain that it’s really one of their issued IDs. They don’t need to keep a master librarian who memorizes you and every other member of the library and has to come out and wave you through every time you visit. They just issue a card to you, and then anyone can handle your requests.

Sign-in at a website works in the same way. When you first visit and sign in, an identifier is issued to your browser. On every subsequent request, the ID is passed along (not through the URL, but through a hidden part of the request, typically a cookie) and the server can quickly know that you’re the real deal without having to know who you are.
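One way to build such a "library card" is a signed token, sketched below with Python's standard hmac module. This is only one possible scheme, not how any particular site does it; many sites instead hand out a random ID and look it up in a shared store. The secret key and the token layout are assumptions for the sketch:

```python
import hashlib
import hmac
import time

# Assumed: a secret shared by all of the site's servers (never sent to the browser).
SECRET_KEY = b"example-secret-shared-by-all-servers"


def issue_token(user_id, lifetime_seconds=3600, now=None):
    """Create the 'library card': user id, expiry time, and a signature."""
    expires = int((now if now is not None else time.time()) + lifetime_seconds)
    payload = f"{user_id}|{expires}"
    signature = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def check_token(token, now=None):
    """Return the user id if the card is genuine and unexpired, else None."""
    try:
        user_id, expires, signature = token.rsplit("|", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # not even shaped like one of our cards
    expected = hmac.new(SECRET_KEY, f"{user_id}|{expires}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # forged or altered card
    if (now if now is not None else time.time()) > expires_at:
        return None  # expired card
    return user_id


token = issue_token("lucas")
print(check_token(token))        # any server holding the key can verify it
print(check_token(token + "x"))  # a tampered card is rejected
```

Any server that holds the key can check the card on its own, which is what lets requests land on whichever machine happens to be free.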

This is all stateless handling.

The URL is like those forms I talked about. Each page needs certain information in order to know what you want, and the URL supplies it.

But also like modern day forms, it can have some junk information like, “Hey, you should put your email address in here so we can sign you up to our spam list!”

In this case, it’s usually going to be more innocuous than that, though. Usually a lot of this is more along the lines of information about the page or pages that you just came from. The people who make the website want to make the website better. If they’ve just changed something to make it easier for people to find their way through the site, they want to know if their change actually worked. If it makes it worse, then they’ll want to reset it to how it was and come up with a better technique. So each page might insert an identifier for itself into the URL for the next page.

Often a lot of it is also necessary. The URL in the OP contains a second URL inside of it, which has been encoded so that it doesn’t confuse the browser.

Say that I’m sending you to PayPal. PayPal can handle payments for a whole variety of businesses. I need to send them my identifiers, so they know who to send the money to, once you’ve sent it to them, but they also need to know how to send you back to my website. So I’ll stick a URL in the URL for their website.

On my site, you see a page listing your order, then you see PayPal, and then you see a page back on my website that says, “Hey! Thanks! Your order will be ready to ship in 2 days.” To accomplish that, we had to send the “Thanks” page URL to PayPal. That thanks page needs to be able to look your order up, so the URL that we sent to PayPal also needs to include your order ID.

Overall, the URL for PayPal needs to contain:

The site: paypal.com
The function: /send-money
Business ID: business-id=1234567890
Return URL:

  • The site: mysite.com
  • The function: /checkout-thanks
  • Order ID: order-id=ABCDEFG
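Putting that outline together, a URL-inside-a-URL could be assembled like the sketch below. The paths and values come from the outline above, which is itself just an illustration; the "return-url" parameter name is made up, and none of this is PayPal's actual interface:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Inner URL: where the shopper should land after paying.
return_url = "https://mysite.com/checkout-thanks?" + urlencode({"order-id": "ABCDEFG"})

# Outer URL: urlencode percent-encodes the inner URL so its "?", "=", "/" and ":"
# are not mistaken for parts of the outer one.
paypal_url = "https://paypal.com/send-money?" + urlencode({
    "business-id": "1234567890",
    "return-url": return_url,  # parameter name is made up for this sketch
})
print(paypal_url)

# The receiving site decodes the parameter to get the inner URL back unchanged.
recovered = parse_qs(urlsplit(paypal_url).query)["return-url"][0]
print(recovered)
```

The percent-encoded inner URL is what makes these links look so long and scrambled, even though it decodes back to something short and readable.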

There can also be a whole bunch of extra stuff for encryption and security, and the previous stuff for tracking, and so on. The tracking information is junk and can be discarded, but most of the rest can’t.

Thanks!