AI affecting Stack Overflow and Reddit

Dr.Strangelove · June 8, 2023, 7:36pm

You brought up the Getty case as a response to what I said about the Facebook ToS. And I’m saying that’s a completely different thing because Stability AI never had a license.

I don’t know what Stability will argue in court, but the experts that journalists have reached out to seem to think it’ll rest on fair-use claims:

“The complaint is technically more accurate than the class action lawsuit,” said Guadamaz. “The case will likely rest on the [copyright] infringement claim, and the defendants are likely to argue fair use. Could go either way.”

Yes, and Facebook (or anyone with similar terms) can sell a license to a third party so that they can use your work for AI training. Without any explicit permission on your part aside from what you already agreed to in the ToS. And this isn’t hyopthetical; this is exactly what Shutterstock is already doing with OpenAI, with a very similar ToS. They have sublicensed their contributors’ work for use with AI training.

“such as” is doing a lot of heavy lifting in Facebook’s statement.

Rights to “publicly perform or display” as well as “create derivative works” are the bread and butter of copyright. Huge swaths of cases hinge on one or both of those. But you’re just giving that to Facebook when you put content on their site.

Like Stability AI, Clearview was scraping their data. Again: no license. No user consent at all, even in some legalese sense.

The Cambridge Analytica case is more interesting, since Facebook ostensibly controlled that information. But it ended in a settlement, not a legal judgment. So not much to learn in a legal sense.

The details matter a lot. If it’s not possible to extract personal data from the trained blob, then there’s no real violation, even if private data was used to train it. But that would have to be argued in court.

Banquet_Bear · June 8, 2023, 10:22pm

…so a human author will process information differently than an AI that is processing 5000x as much information?

Thats exactly what I said.

What we expect future authors to do is to live their lives. They will wake up every morning. Go for a walk in the rain. Fall in love. Get their heart broken. Have kids. Get a job. Dance. Get drunk.

Future authors will also read books. But reading books isn’t the only way that humans learn. They learn by doing things, by experiencing things, by doing things right, by making mistakes. Life is a great big chaotic mess.

And that great big chaotic mess is what distinguishes a human writer from an AI.

That would be because humans and software are two different things. And there is a difference between a human reading and experiencing a book and software processing data.

Banquet_Bear · June 8, 2023, 11:31pm

…no they can’t.

The terms said sub-licence.

It doesn’t say sell a sub-licence.

You licence your work to Facebook every single time you upload content to the platform. But that licensing arrangement doesn’t involve any exchange of money.

Facebook can’t sell your photos or your content. That isn’t in the terms and conditions.

No. No no no no no no no. Just no.

Facebook is a social media company.

Shutterstock is a business that sells stock photography licences.

They aren’t the same thing.

Shutterstock have a standard terms of usage which you’ve read. But they also have the contributors terms of service. And its the contributor terms of service that allow Shutterstock to be able to licence images on behalf of their client in exchange for money.

Shutterstock contributors terms of service:

By submitting any Content to Shutterstock, you grant to Shutterstock a worldwide, sublicensable, non-exclusive right and license to index, analyze, categorize, archive reproduce, prepare derivative works incorporating, publicly display, sell, advertise and market, any Content uploaded by you and accepted by Shutterstock for any reasonable business purpose, including but not limited to the distribution of your Content to Shutterstock customers, to optimize the performance and operation of Shuttestock’s platform and services, and to develop new features and products. You also give permission to Shutterstock to add, modify or remove information related to your Content in order to manage and license such Content.

…

However, by submitting Content to Shutterstock, you expressly waive any artists’ authorship rights or any droit moral that you would otherwise have under the laws of the State of New York, United States Copyright Act or similar laws of any jurisdiction, so that customers may use your Content in accordance with the Licenses issued by Shutterstock.

…

https://submit.shutterstock.com/legal/terms

Do you see the word “sell” there?

Thats something that isn’t in any of the Facebook terms of service. The Shutterstock terms include a license to index, analyze, categorize, publicly display, sell, advertise and market, any content uploaded by the user.

My understanding is that Shutterstock are working in partnership with Stability AI. It goes further than just “sublicencing contributors work.” And that includes compensation to works that ended up in the dataset.

https://support.submit.shutterstock.com/s/article/Shutterstock-ai-and-Computer-Vision-Contributor-FAQ?language=en_US

Facebook and Shutterstock are not the same. Not even in the same ballpark. Not even on the same planet.

Nah. Its exactly the same as it says on the tin.

Do you know what “create a derivative work” allows people to do on Facebook?

It allows them to slap on a black-and-white filter. That’s a derivative work. Thats it. You wouldn’t be able to do that unless you give Facebook permission.

And what is it, exactly do you think “publicly perform or display” means in this context?

If you don’t grant Facebook a licence to display your content, they can’t display your content! They won’t be able to show your photos, or your videos, or anything else that you upload. How were you imagining that this worked?

What we learn from these cases is that there obviously is a “privacy angle here”, even in the United States of America where there is no GDPR.

This is largely incorrect. For example, under the GDPR the subject has the right to know what information has been collected about them, the right of rectification, the right to erasure. Even if that personal data is part of a “trained blob”, there will be an actual real violation if data protection laws are not followed.

And that includes paying more than lip services to the possibility that people might be able to “extract personal data from the trained blob”.

Dr.Strangelove · June 9, 2023, 12:28am

I didn’t say that. I said they can sell a license. A specific license to use your photos for AI training. It’s a sublicense of the license you already granted to FB when you uploaded your photos.

Shutterstock can sell your work. They can also sell a more limited license, say for AI training. And it’s the latter that’s at play here.

Circling back to the OP, we have Reddit’s user agreement. The relevant part:

When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

Largely the same thing. Doesn’t say “sell” anywhere, but is clear about sublicensing, and that they might make it available to third parties that “partner with” (aka: pay money to) Reddit. And as we know, Reddit plans on making money from their “premium” API service.

Even though I think the API deal is largely about advertising, it’s nevertheless still true that Reddit wants to license their user content to AI companies. They’re pretty explicit about that.

That tin has a giant weasel on the lid.

Of course. But when you handed them the right to put a stupid filter on your photo, you also gave them the right to take out a full-page ad in the NYT with a photoshopped picture of Trump taking a dump on your photo. Derivative works cover a whole lotta ground.

I don’t know what you’re going on about. Of course that’s how it works. That’s what pays for the service in the first place: monetizing your data. And all those rights you signed over enables them to make money in a new way now, by using it to train AI.

As far as Facebook specifically is concerned, like I already said, I don’t actually anticipate them selling to third parties. Because they have a lot to gain by keeping it to themselves and using it to train their own systems.

The smaller sites, like Reddit, don’t have quite the same options. But they can sell the data to others.

We’re talking here about media that the user has contributed. Photos and posts, mainly. So “what information has been collected about them” is obvious; it’s exactly the set of stuff they posted. I’m not talking about making subvert audio recordings or whatever.

As for the right of “erasure”; well, that rather depends on if the post-training data is considered “personal” or not. You can certainly ask for your data to be removed from the training set. But from the trained blob? It’s not really personal data at that point, but it remains to be seen how the courts interpret that.

Banquet_Bear · June 9, 2023, 1:57am

…they. Can’t. Sell. A. Specific. Licence. To. Use. Your. Photos. For. AI. Training.

That isn’t allowable under the terms and conditions. I’ve bolded the specific word that is relevant here.

Shutterstock can sell your work.

Facebook cannot sell your work unless you are using the Facebook Marketplace, then those specific terms and conditions then apply.

The relevant part in relation to the API is on the API terms of service:

https://www.reddit.com/wiki/api-terms/

Reddit plans on making money from their premium API service by charging lots of money for the premium API service. It doesn’t own the content that users upload and API users cannot do anything except display that content.

Reddit can’t licence content that it doesn’t own, that it doesn’t have the rights too. Shutterstock and Reddit are not the same.

Except this is what has actually happened. And you’ve bought nothing of substance to the table.

Except Facebook aren’t taking out full-page ads in the NTY with a photoshopped picture of Trump taking a dump in your photo.

And thats kinda the point here. The TOS allows plenty of scope here to allow for different types of filters, to allow Stories and Reels and different ways to edit/display and remix uploaded content to be displayed. It has to be general. Because they are constantly adding new features. Derivative works have to cover a whole lotta ground here.

But the minute they tried anything outside of that scope, like for example " take out a full-page ad in the NYT with a photoshopped picture of Trump taking a dump on your photo", they are breaching their own TOS in mulitple ways. They can’t use your content in this way.

What the rights to “publicly perform or display” means is that you’ve granted Facebook the right to publicly perform or display your work until you remove your work from the platform.

No we are talking about data. Data includes things that users contribute, but it also includes thing that users have not. There is nothing obvious at all about what information is contained in a “trained blob.”

The tech-bros running the AI companies don’t get to decide what post-training data is considered personal or not. Transparency is the issue here. in the EU people have the right to know what information has been collected about them. The AI companies cannot just point to a “trained blob”, shrug their shoulders and expect that to be the end of it.

Yes, if the trained blob, in any way, contains any information relating to an identified or identifiable natural person, then its personal data and subject to the GDPR.

I_Love_Me_Vol.I · June 11, 2023, 12:00pm

Read it more closely. It says “share it with others such as Meta Products…”

“Such as” which means Meta is just an example but they could also license it to Google, Burger King, and the volleyball
team at Freshwater State University. All without you even knowing about it, much less authorizing it.

In terms of making money off of someone else’s work, licensing brings in green just like a sale would. Profit-wise its the same difference as leasing vs. selling a car is to a Ford dealer.

Well they’re not going to give it away so… yeah. They sell it.

Banquet_Bear · June 11, 2023, 12:19pm

…trust me: I’ve read it very very very closely.

But they aren’t.

I’m a former commercial photographer that keeps their ear to the ground on copyright issues and if Facebook were doing this: we would know about it.

They are not secretly licensing content for profit. It isn’t happening.

They can’t licence for money content they don’t own without explicit permission of the copyright holder: and the TOS don’t allow them to do this.

They don’t sell it.

I_Love_Me_Vol.I · June 11, 2023, 12:25pm

Did you catch the “such as”?

Banquet_Bear · June 11, 2023, 12:43pm

…yep.

I_Love_Me_Vol.I · June 12, 2023, 12:18am

Well ok.

IANAL but it seems the wording in the TOS specifically gives Facebook/Meta the ability to license user content to pretty much anyone they want. Even if FB/Meta have never done so, FB users have nonetheless agreed that FB can do so at any time.

I wouldnt mind a Doper lawyer-type offering an opinion on this but I thought it was kinda common knowlege that while a user still holds the copyright they can’t stop (or even be aware of) FB selling usage rights of one’s
content to third parties.

Sam_Stone · June 12, 2023, 1:02am

Seems pretty straightforward:

Simply put, you own the content you post to social media, but you’ve given each
platform a license to use it as spelled out in their terms and conditions. These licenses are slightly different from each other, but all of them grant the social media platform the right to use your copyrighted work in whatever way they see fit.

Theoretically they could exploit it commercially and even sell or sublicense their license to a third party to use, and, as each one specifies that the license is “royalty free”, you would never receive any part of the profit. Now, this is by no means a reason to panic; the likelihood Facebook will start selling copies of your photos of your niece’s dance recital is very low. What this does mean, however, is that if one day one of your photos pops up on TV in a Facebook commercial, you can’t really do anything about it.

That site has relevant sections of the ToS for the major social media sites. They all retain the right to use your content any way they want, short of literally selling away your ownership rights.

Banquet_Bear · June 12, 2023, 1:29pm

…no they don’t retain that right.

Again:

FB TOS:

However, to provide our services we need you to give us some legal permissions (known as a “License”) to use this content. This is solely for the purposes of providing and improving our Products and services as described in Section 1 above.

Specifically, when you share, post or upload content that is covered by intellectual property rights on or in connection with our Products, you grant us a non-exclusive, transferable, sub-licensable, royalty-free and worldwide licence to host, use, distribute, modify, run, copy, publicly perform or display, translate and create derivative works of your content (consistent with your privacy and application settings). This means, for example, that if you share a photo on Facebook, you give us permission to store, copy and share it with others (again, consistent with your settings) such as Meta Products or service providers that support those products and services. This licence will end when your content is deleted from our systems.

https://www.facebook.com/legal/terms?paipv=0&eav=AfYF8LGsfSb7FK2cME_A_b337JMWUxYf2k_tbPYB7tBntDAr3YLOEQEMBER8B9wBt1w&_rdr

I’ve bolded the relevant part.

The licence is solely for providing and improving Facebook’s products and services. Thats it. This is a standard usage agreement. This isn’t a licence to use your content any way they want. The TOS explicitly states how the content will be used. Its very limited in scope. And its written in plain english.

But there are other issues that prevent Facebook from selling/licencing to third parties.

The Facebook TOS don’t limit the user to only uploading content that they own the IP for. For example your uncle might take a photo of you. They send you the photo, you upload that photo to Facebook. Who owns that photo? In the United States, it would be the uncle. When they send it to you, they’ve granted you an implicit licence to share that image.

But you don’t own the rights to that image. So Facebook couldn’t sell a licence to that image to a third party. Because the uploader didn’t own the rights.

It would be the same if the uploader shared a photo of Muhammad Ali to their Facebook page. The user may have uploaded it for the purposes of critique. Does that now mean Facebook have the right to sell that image due to the TOS? No they don’t. Firstly, they would require permission from the copyright holder of that image, whether that is the estate of Muhammad Ali, or the original photographer. But the right of publicity also comes into play. You can’t just sell an image that contains an identifiable person for commercial use. You could potentially sell it for editorial use. But to a third party for training an AI dataset? That hasn’t been tested, however the Facebook TOS, as it stands, don’t allow it.

You can see the differences by comparing how Shutterstock handles image uploads to Facebook.

Every time you upload to Shutterstock, you agree that you own the rights to the content you are uploading. If you are making the images available for commercial use, you must include signed model releases for every identifiable person in that photo, and where applicable, a property release as well. If you are making images available for editorial use, they may check your credentials.

As an example of this: here are a couple of photos of the late-great Jonah Lomu I sold through an agency to the Daily Mail.

(The second and third images down the page, credited to me/ @ Demotix)

I can sell that photo editorially just fine. But I’d get myself into big trouble if I tried to sell those images commercially. The stock agencies wouldn’t allow it.

But Facebook doesn’t require any of this information when you upload a photo. You just upload a photo.

Which is why Facebook can’t sell licences to your content to third parties. For starters there is no system in place for uploaders to confirm they own the rights to content that they upload. Secondly, any identifiable people in those images have rights as well.

Princhester · June 13, 2023, 4:29am

@Banquet_Bear, I suspect facebook could drive a truck through those terms and conditions - they say they only have a licence to use your stuff for providing their products and services, but the definitions of their products and services are drafted so broadly that they can do almost anything with your stuff and claim it’s all part of providing products and services.

I’m reminded of Milo Minderbender who takes everyone’s parachute for MM Enterprises and sells it but when Yossarian complains, Milo points out that it’s for Yossarian’s own good since “everyone has a share” in MM Enterprises.

I haven’t read this whole detailed debate but frankly I can’t take seriously any claim not to be aware that all the stuff we are typing into or posting to the internet for everyone see is fair game (either legally, or gray area legally or illegally) for re-use by anyone who cared to use it.

Nitpicks about not being fully aware of all consequences or possible uses don’t interest me at all.

I don’t say this as a booster for the proposition but simply as something that seems obvious at a practical level.

Banquet_Bear · June 13, 2023, 4:45am

…which is why Facebook has been selling user content every day since 2004…oh, hold on, they haven’t been. Because the TOS doesn’t allow it. Because copyright law doesn’t allow it. Because the right of publicity wouldn’t allow it. It right there in black and white.

This is incorrect.

Princhester · June 13, 2023, 5:13am

As I understand it the focus of this thread isn’t so much about the direct sale of photographs or unamended text but about usage of that material – particularly for training AI.

Facebook can’t sell such material direct to a complete third-party. However, even a superficial review of their terms shows that they are permitted to use that material (including by themselves, suppliers and subcontractors) for the purpose of providing Facebook’s services.

So Facebook can - either in-house, or through a third party supplier or subcontractor - use users’ material for the purpose of training an AI tool to provide Facebook’s services. Once that occurs, Facebook then has an AI tool which can be sold because as I understand it the material once digested has been re-worked and re-ordered such that that it no longer contains the copyrighted material in original form.

This then means that Facebook or the Facebook supplier can then sell the AI tool without infringing copyright.

I only had to glance at Facebook’s terms, as a lawyer, to see that they are designed to seem to be all about loving and serving their customers, while actually being written to allow them rights from here to the moon and back.

BigT · June 13, 2023, 5:25am

The specific context though is not Facebook creating its own AI. It’s Reddit selling its data to AI developers through its API pricing.

As I’ve said all along, they can just scrape the website, and bypass the API. The only people it hurts are those who legitimately use the API, because they can’t so easily switch to scraping–at least, not in a short time frame.

That’s why I don’t accept that this has to do with AI.

Princhester · June 13, 2023, 5:33am

Sadly, in the world of PR this isn’t how it works. Although I’m a lawyer, I have worked with high-powered PR consultants. They all say similar things – if the public doesn’t like you they will either not listen to or nitpick any explanation you give, and by saying anything you are just keeping the brouhaha alive. The only approach that works is to apologise and/or say the absolute minimum (or nothing) until the issue goes away.

This strategy is at its most important when the explanation you could give - while practical - is unpopular.

I read and post on reddit quite a bit. A substantial proportion of redditors quite literally use the word “profit” as synonym for “evil”. If reddit’s explanation for their current actions amounts to “we want to do this to make a profit” their PR people will have told them not to say that, no matter how reasonable or necessary, on pain of death.

This will be the strategy that reddit is following.

In short, you simply can’t take a large well advised company’s silence as meaning anything other than “we have to be silent because anything we say will get us in trouble with the stupids”. It annoys the hell out of me too but here we are.

Banquet_Bear · June 13, 2023, 8:58am

…this tangent was started because someone claimed “if you post your data to social media–well, they just own all of it anyway.” This isn’t correct.

Bolding mine.

This is far from settled. The Getty lawsuit includes multiple examples of AI generated images that still included the Getty copyright watermark. Sudowrite, an AI writing tool admits on its front page that, if you enter the right prompt, it will plagiarize, word for word. And you’ve got obvious examples like this:

This is a minefield. Facebook have already dug themselves a deep hole with the failure of the “Metaverse.” Creating then selling an AI Tool based on the dubious legal premise that the material “no longer contains the copyrighted material in original form” isn’t one that I even think Facebook legal advisors would entertain.

And that also doesn’t take into account the fact that multiple jurisdictions are looking at putting in place new laws and regulations that will force AI companies to disclose what copyright materials are included in a dataset, and may restrict their use.

EU Law to Force AI Imagers to Disclose Copyrighted Photos in Dataset | PetaPixel.

It would be much simpler just to change the TOS.

The Facebook TOS don’t allow them rights from here to the moon and back.

Jophiel · June 13, 2023, 4:17pm

I’m not sure what that’s supposed to show/prove. It looks like someone just did an IMG2IMG on those pictures, basically editing the image rather than creating an image. As others mention in the replies, it’s basically like running an image through filters. You’re inputting the files, saying “make minor changes” and, unsurprisingly, getting the slightly modified version back out.

The tweet-poster also links to someone else “proving” how Stable Diffusion just steals and recreates images but he does so by making a four image LoRA ( basically a tiny mini model) then running it full weight (which you don’t typically do) and prompting for the exact text he tagged into the images before making the LoRA. This is like me training a model with only the information that Dickens wrote “It was the best of times, it was the worst of times” in Tale of Two Cities, asking it to write something identical to Tale of Two Cities and then calling it proof of plagiarism when “Iyt wus the beist of ttimies…” comes out. I set it up to do exactly that.

It’s the difference between putting four marbles in a bag and being surprised when you pull a fully formed marble back out versus reaching into a sack of sand and pulling out a marble’s worth of mass; each time is going to be different.

LoRAs aren’t full models, they’re not trained the same and not utilized the same way. And someone who knows enough to train a LoRA ought to know the difference – which makes it “odd” that someone would train a LoRA then use it as “proof” that Stable Diffusion is doing such-and-such. No one would actually use a LoRA like that; it’s a waste of time since all it can do is make poor quality knock-offs.

If someone actually wanted to make this point, it’s pretty simple: Film yourself grabbing a random image off HaveIBeenTrained then recreate it on the spot using a base Stable Diffusion model. You’ll find it’s a lot harder than turning to Page 1 in a one page book you wrote yourself to prove how easy it is to find Page 1.

Banquet_Bear · June 13, 2023, 4:35pm

…it in part demonstrates that AI is a legal minefield, and that “material once digested has been re-worked and re-ordered such that that it no longer contains the copyrighted material in original form” is in no way a settled legal position, and that content creators have every right to be skeptical about AI claims.

Topic		Replies	Views
Is Reddit Really Going to Have A 'Blackout'? Miscellaneous and Personal Stuff I Must Share	135	4409	June 21, 2023
Reddit to charge for API Access Miscellaneous and Personal Stuff I Must Share ai	9	351	April 20, 2023
Request: don't put ChatGPT or other AI-generated content in non-AI threads About This Message Board ai	201	8213	June 23, 2024
Note for using AI About This Message Board ai	42	380	February 3, 2025
Anyone Seen AI submissions? In My Humble Opinion ai	62	2152	May 17, 2024

AI affecting Stack Overflow and Reddit

Related topics