Facebook does use AI for image recognition. I wonder if they trained anything on their own data?
Who cares what the intent is? The clause is still there. And it still gives them permission to sell your posts and photos to any third party they wish for AI training. Not to mention doing their own training.
Your best hope is that Meta sees their content library as too valuable a “secret sauce” to license to others. But the same won’t be true of others, such as:
Shutterstock isn’t exactly a social media platform, but their ToS looks remarkably similar:
By submitting any Content to Shutterstock, you grant to Shutterstock a worldwide, sublicensable, non-exclusive right and license to index, analyze, categorize, archive, reproduce, prepare derivative works …
That’s another thing about AI training data. If anyone ends up having to get paid for it, there is a 99.99% chance it will be the social media sites and other businesses (such as Getty) that get paid, and a 0.01% chance any of that money will be redistributed to the users.
It’s possible that some sites will make that a selling point. I.e., host your images here; our license doesn’t allow use for AI training. Or have an opt-in with a cut of the profits.
But such sites are destined to remain small. There’s a lot of good money to be made as a source of training data, and most people don’t care, so Meta and the like will outcompete those other sites.
…because both the intent and the actions of the social media companies match up.
What the clause doesn’t allow them to do:
- resell your posts
- resell your photos
- resell your posts or your photos to any third party
None of that is there. None of that is allowable under the terms and conditions.
There is also nothing about ownership. Which was the “nitpick” that started all of this.
Meta doesn’t own a “content library” of user contributed images. The uploader retains ownership rights.
Shutterstock ALSO does not own any of the work that gets uploaded to it.
And yet they cut a deal with OpenAI to allow their library to be used for training data.
I don’t know or care what your bizarrely narrow definition of “ownership” is, but the relevant right in this context is whether the creator can disallow its use as training data. And, surprise: they can’t. At least not without deleting it from the platform, which rather defeats the point of putting it on the platform in the first place.
The ToS of all these companies, whether social media or stock imagery or otherwise, allows them to both train their own AIs and to license the content for others to train their AIs. It’s all there in the ToS and it’s happening already.
…that’s a separate issue to ownership. (And one that many contributors to Shutterstock are up in arms about.)
It isn’t bizarre. It’s the entire basis of the intellectual property rights system. I’ve been a commercial photographer for over eleven years and I’ve been negotiating image licensing rights across borders the entire time. Ownership here is pretty clear. Facebook and Twitter and Instagram don’t own your images. They never have, they never will. They can’t resell your images, and if they even tried they would get sued into oblivion.
I think more correctly, this hasn’t been tested in law yet.
There is no denying that it’s “happening already.” Whether or not the TOS allows this, and whether or not the law will continue to let it happen, are things that we won’t know for the next couple of years.
There are two separate issues here. First, whether these ToS allow companies to train their AIs on user-submitted data (or sub-license the rights to a third party). This seems like really straightforward contract law and I expect the sites to be 100% in the clear. It’s just a really straightforward reading of the terms.
And second, whether fair-use rights allow anyone with access to the images to train their models on them. The argument in favor being that it’s not much different from other established uses like search, and the counterargument being that AI is something new and fair use is a pretty fuzzy thing anyway, so it’s not at all settled.
I think the first one is crystal clear and isn’t going to change at all. I’ll acknowledge that the second is not established, even if personally I think it’s more likely to go in one direction.
…nothing is ever straightforward. Which is why things like the Getty lawsuit will take a long time to sort out. The terms and conditions on the Getty website are pretty clear. So the infringement should be pretty obvious here right? 100%?
Fair use has limitations, and if fair use is the doctrine used in defence of monetizing AI training datasets, then you are setting yourself up for a fight. This isn’t parody, news reporting, or criticism, and the amount used will vary from potential infringement to potential infringement.
The Getty lawsuit falls in the second category. Stability AI has no license, but is claiming fair use. Like I said, I don’t think that case is established. Could go either way.
I don’t know of any cases where users are suing platforms where the ToS gave the platform a permissive license. That is the case where I expect the platforms to have an easy win.
Even training is a muddy concept. There is no guarantee that any content ingested by an AI will affect any of its output. No copies are made, or were ever made.
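A toy sketch of the “no copies” point (my own illustration, not anything from the actual lawsuits; real model training is vastly more complex, but the principle of parameters-as-aggregates is the same):

```python
# Toy illustration: "training" a one-parameter model on several inputs.
# The fitted parameter is an aggregate statistic of the data; no
# individual training example is stored in, or recoverable from, it.
data = [3.0, 5.0, 7.0, 9.0]      # stand-in for ingested content
mean = sum(data) / len(data)     # the "model" after training
print(mean)                      # 6.0 -- one number, not a copy of the data
```

Whether courts will accept that framing for billion-parameter models is, of course, exactly what’s in dispute.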
It’s kind of like saying that since an author read your book, and their style now looks vaguely like yours even though it’s a completely different book, he somehow owes you money because the book would have been different if he hadn’t read yours first.
And how is it different than an artist learning techniques from studying other artists? One of the artists often mentioned in the ‘AI is stealing other art’ conversations is Greg Rutkowski. Here are some of his images:
https://www.artstation.com/rutkowski
But Rutkowski himself says that his work is heavily influenced by other artists:
Bolding mine. Should Rutkowski have to pay money to those people? If not, why should an AI, if it has learned to paint in exactly the same way?
Greg Rutkowski on Rethinking his Approach to Art
The first time I saw a Rutkowski work, I thought it was Frank Frazetta:
I’m sure he was an influence. Both on Rutkowski and on Stable Diffusion.
…Getty was an example of a TOS “not being straightforward contract law” and of expecting “sites to be 100% in the clear.” It also isn’t just a “really straightforward reading of the terms.” If it was, then these things wouldn’t have happened.
…but it isn’t like an “author reading your book.”
It’s software that is processing thousands of books, millions of words of texts. It is doing something that no human on the earth is capable of doing. The analogy doesn’t hold.
I’ve no idea what you’re claiming here. Getty holds the rights to their imagery, not Stability AI. Stability’s argument rests entirely on fair use and nothing that exists in any ToS.
If the lawsuit was about Getty training their own AI, or about them licensing their work to some third party, then it would be a different story.
…TOS’s are not straightforward.
For example: some people think that the Facebook terms and conditions allows them to resell the images to a third party. Even though the terms don’t allow them to do this, people still think that they can. Some people think the Facebook TOS gives ownership to Facebook. But it doesn’t.
TOS’s are not as straightforward as you think, and a “plain reading of the terms” can mean different things to different people.
It’s just my opinion, so we’ll have to see. But given that big platforms have been doing this for many years with no pushback, I’m not expecting much.
The Getty lawsuit demonstrates nothing here. Again, Stability AI does not have a license to the data by any reading of a ToS. They’re pinning their lawsuit on fair use.
“Some people”. You know, you don’t have to go through these circumlocutions to call me out. You can just say things directly.
Of course, in context, it was clear that I was talking about licensing the data for use with AI training, and I can find only one place where I inadvertently said “sell… your photos” instead of “a license”. A more generous reading on your part would have made that obvious.
…the big platforms have “been pushed back on” and have been settling lawsuits for decades now. This is just yet-another-thing.
This was about a “plain reading of a TOS”, and the plain reading of the Getty TOS makes it quite clear that fair use isn’t a defence.
And yet, you claim Stability AI will be “pinning their lawsuit on fair use.” But that would be a losing strategy here, because fair use would most probably be decided per infraction. There isn’t a global “fair use defence.” Fair use isn’t a “get out of jail free” card.
Licensing what data? To whom? To do what?
These are the important questions here. That data isn’t for Facebook to do with what they please. Another case:
You said this:
Facebook cannot resell a licence to your work to others without your permission. Facebook don’t have the ability to do almost anything they want with your work. Facebook don’t have ownership of anything you upload. These are things that are simply incorrect.
And? Yeah, they’re in a weak position. Because they didn’t license the data.
If Getty had actually sublicensed their data to Stability AI, there wouldn’t be any lawsuit. But that’s expensive, and Stability would rather get the stuff for free. They might succeed if they can argue that AI training is like search and should be held to the same rules. But it’s not a slam dunk like if they had just licensed the data.
The ToS clearly states they have sub-licensing rights. I don’t know why you’re implying otherwise.
I linked to an actual law firm that says as much. It may be 11 years old, but the Facebook ToS haven’t changed substantially in that time. So between you and them, I’m going to believe them.
The law always supersedes a ToS. And Europe certainly has reasonably strong privacy laws.
Is there a privacy angle here? Could be, I guess, but probably not in the US where we have no GDPR. Except, kinda, in a few states. It remains to be seen how enforceable these laws are.
But even then, it seems like there would have to be an actual privacy violation, which is unlikely given how the systems are trained.
…again, I was talking specifically about fair use which has a very specific meaning, and really wouldn’t apply in this case.
They can’t resell a licence to your work to others without your permission. Do you see the word “resell” there? It means “sell something to someone else.”
Facebook can’t do that. They don’t have ownership of your work, they can’t resell it, licence or otherwise unless you’ve granted them permission, and there is nothing in the TOS that suggests otherwise.
You linked to an opinion. But there are a lot of things on that page that are incorrect, overblown, and hyperbolic. It reads like a clickbait article. You are free to believe them if you like. But if you are ever in need of an IP attorney in the States, I would shop around first.
Or you could believe what it says in the Facebook TOS.
The sub-licencing? Is to other Meta Products like Instagram or service providers that are used to host and share the work.
You don’t have to believe me. But you shouldn’t be trusting a decade-old article that doesn’t actually dive deep into the law and is at odds with almost every other intellectual property expert I follow.
Facebook can’t resell your work. Facebook doesn’t sell your work. The TOS doesn’t allow them to do that. This is long settled stuff. It isn’t controversial.
Yes, there is clearly a privacy angle. Clearview AI got dinged big time on this. And yeah, it can happen in the States.
If there is personal data of any sort in AI datasets, then yes, there are going to be issues, no matter “how the systems are trained”. Every tech-adjacent company needs to have robust data-protection policies in place, and people need to know what they are.
This makes the analogy even stronger. A human author reading someone else’s book is going to be more influenced by that text than an AI which is processing 5,000x as much information. Yet we accept without concern the influence of pre-existing works on the writings produced by human authors. We expect that future authors will read lots of books and learn from them in order to become better authors. It would be ridiculous to expect a human author to request permission to read and be influenced by another author’s books.