I’ll often come across sites, usually on Facebook, that I’ve learned are terminally buggy, ad-ridden, clickbait, wastes of my time, etc. to open, typically labelled “promoted” or something of the sort, and I was wondering why someone couldn’t extract the content of these sites and render them in the usable form that people want them to be in. (Of course this person would need to be much better than I am at navigating websites, copying material, and above all at preventing malware and such from infecting his computer.) If this is possible (I’m not sure it’s as easy as I think), is it legal? If so, it would be great to open these buggy wastes of time, and peruse a bunch of these websites safely. Can you tell me why this is an unfeasible idea? Thanks. I’m thinking it would be a popular website and a service to those of us who know better than to open up one of them.
There are adblockers for browsers that already block ads, pop-ups, and other content you don’t want to see. Try uBlock Origin.
There are also a number of browser add-ons specifically for Facebook, Twitter, etc. that will allow you to customize the content.
Why do you want to open these sites again? They’re full of spam, threats, or crap.
“Clickbait” by definition has no useful content. Why would I want to extract worthless content? I’d rather block it.
There’s a subreddit dedicated to terrible clickbait called Saved You A Click that basically does the same thing. Spoiler alert: It’s all garbage.
Well, you can start here for the extraction part:
The legal issues are fraught. Facebook and other social media sites make the information in posts or pages public, but it is still protected by the End User Agreement (as is their agglomeration and sifting of data for personal information to be sold to third parties). If you are surreptitiously scraping data and republishing it, you are almost certainly violating copyrights even if copyright information is not explicitly stated.
As for “rendering them in a usable form”, that is trivial, requiring no coding, scraping, sorting, or filtering: You’re welcome,
I’m not sure what your idea even is. For most “real content” you can find on bad websites, there already are good websites with the same content. Do you have any examples of what you would like to see available on a better site?
The kind of content that you can exclusively find on terrible websites is usually of dubious legality and/or something people aren’t willing to pay for. A lot of the brokenness of those pages is about trying to make a buck out of the visitors.
Aside from what everyone has already mentioned - namely that the content is garbage - the content is not usually the original source, either. I don’t generally click through to click-bait type links, but I do occasionally click on stories that are posted on random US-based local news stations. Most of the time, there’s a lot of popups and auto-play videos, and it turns out the source is something like the AP or NYT or another city’s local news.
I’ve used professional scraping services before, for a legit reason. It was not cheap, nor was it fast and easy. I would hate to spend my scraping credits on shit stories that are re-writes of free and legit sources.
Also once the sites figure out you’re scraping…they’ll block your scraper service.
Usually gossipy celebrity garbage that I get interested in seeing (“Which movie stars own gigantic ranches in Wyoming?” or “Famous singers who have never been married,” that sort of crap) that I might click on, but I see it’s “promoted” and I’ve been there before and wanted those five minutes back. The content might be summed up in a list, or a few paragraphs at most, but I know those sites are full of crap, in every sense of the term, so I don’t click, but I think sometimes that an aggregator of these sorts of sites, with cleaned up delivery, might be successful. Not for me (I don’t have the skills or the interest), but I wondered why someone with those skills hadn’t tried it.
I get it. Sometimes I do want to see “25 images of bra-less TV actresses from the '70s”, but I don’t want to click through 25 pages with a single image each and 25 pages of just ads to see them all.
Exactly. I was wondering if there were legal or technical obstacles to someone aggregating such sites.
If you really want to know why you’re supposed to cover your side view mirrors with plastic bags or pour salt down the drain and don’t want to scroll through a ton of clickbait, try posing the question on Google. Many times a site like Snopes will provide the answer.
Yes. Copyright law is the primary legal obstacle.
This. But often, even if you find the info on a “good” website, it turns out to not be worth the time and energy.
I’ve occasionally been tempted to click on a clickbait “story”, so instead I googled it in hopes of finding a non-slideshow/non-ad barrage/non-trashy version of it.
But then I realize that I don’t need to know about one weird trick, or celebrities’ private lives… Hey, I’ve got better things to do, and it’s just because I’m lonely and bored and tired that I’m even clicking on drivel. (And it’s those times that I lose the ability to discern drivel from good stuff).
Just say no to drivel.
I have to admit, I’m really curious about the “Once famous star now works in Placerville” one. I mean, I’m only a little way from there, I could go say hi!
But I know better than to click on that stuff.
You wouldn’t have to copy it, just hot-link to the heart of the content. That’s sort of a gray area for IP rights, IANAL.
Yeah, me too. Once in a while. On a slow day. But I don’t.
Is it really illegal to start a website that purports only to cite clickbait sites as sources (“According to getchermalwareheah!.com, Tom Cruise and Penelope Cruz own Carnival Cruiselines”)?
The SDMB used to regularly be cloned.
This website removes clutter for news sites.
There exists a site called
which does something similar to what the OP suggests.
You cut-and-paste a webpage address, and it shows you a text-only version of that page.
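For anyone curious what a text-only view involves under the hood, here is a rough sketch in Python using only the standard library. I have no idea how that site actually does it; the names `TextOnly`, `text_only`, `SKIP_TAGS`, and `BLOCK_TAGS` are made up for illustration, and the tag lists are just my guess at a reasonable starting point. It throws away scripts and styles and keeps whatever text is left.

```python
from html.parser import HTMLParser

# Assumption for this sketch: contents of these tags are never reader-visible text.
SKIP_TAGS = {"script", "style", "noscript", "iframe"}
# Assumption: these tags should start or end a line in the text output.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}


class TextOnly(HTMLParser):
    """Collects visible text and ignores anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def text_only(html):
    """Return the visible text of an HTML string, one non-empty line per block."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop blank lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


page = (
    "<html><head><style>p{color:red}</style><script>alert('ad')</script></head>"
    "<body><h1>Stars with ranches</h1><p>Item  one</p><p>Item two</p></body></html>"
)
print(text_only(page))
```

To try it on a live page you would first have to fetch the HTML yourself (for example with `urllib.request.urlopen`), and a real clickbait page would likely need a lot more filtering than this to get rid of navigation and ad text.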
Try https://deslide.clusterfake.net/ for getting all images together on one page from a slideshow-like presentation.
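The general idea of pulling a slideshow onto one page can be sketched like this. To be clear, this is not how that site works (I don't know its internals); it assumes you already have the HTML of each slide page as a string, and `ImgCollector` and `one_page` are names I invented for the example. It collects every `<img>` source in order, keeps each one once, and writes a single bare-bones page.

```python
from html import escape
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Records the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def one_page(slide_htmls):
    """Build one HTML page holding each distinct image from all slide pages."""
    seen = []
    for html in slide_htmls:
        collector = ImgCollector()
        collector.feed(html)
        collector.close()
        for src in collector.sources:
            # An image repeated on every slide (a logo, say) is kept only once.
            if src not in seen:
                seen.append(src)
    body = "\n".join(f'<img src="{escape(src, quote=True)}">' for src in seen)
    return f"<html><body>\n{body}\n</body></html>"


slides = [
    '<div><img src="a.jpg"><img src="logo.gif"></div>',
    '<div><img src="b.jpg"><img src="logo.gif"></div>',
]
print(one_page(slides))
```

A real slideshow would also need the "next" links followed to find each slide page, and some way to tell the actual content images from ads, which is the hard part.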