I'm missing something about AI training

Cheesesteak · June 26, 2025, 4:44pm

I’m pretty sure MJ created and distributed pictures of Mickey Mouse:

to MJ’s customers
for money
lots and lots of money, like half a billion a year kind of money

It doesn’t matter how the training works: if you hand out pictures of Mickey Mouse to your customers for lots of money, Disney is going to sue you, and win.

Disney has no interest in chasing down some nobody user who received a picture of MM and maybe didn’t even use it.

griffin1977 · June 26, 2025, 7:00pm

So the training is copying. If the training was just unobjectionable “analysis” then why would the training inputs need to be filtered?

Jophiel · June 26, 2025, 7:04pm

What? They’re filtered for nudity, explicit gore, etc to try to steer the model away from being able to replicate those things. If, for some reason, you hated ducks, you might want to train a model on everything except ducks so it would just digitally shrug when anyone asked for a duck.

griffin1977 · June 26, 2025, 11:59pm

Yes, in order to prevent the system from replicating or copying (the words are synonyms) porn, it filters it out before it’s encoded by the training process.

Disney’s point is that they should do the same to stop it copying their copyrighted works. The process of encoding the work via training is where the copying happens; that’s why the filter has to happen there. If it wasn’t, and the training was just unobjectionable analysis, they would be asking for filtering in the prompt or the output image.

griffin1977 · June 27, 2025, 12:04am

Disney disagrees:

As explained below, Midjourney is able to reproduce, publicly display, and
distribute these copies because Midjourney selected and copied Plaintiffs’ Copyrighted
Works as part of the training process for its Image Service.

Jophiel · June 27, 2025, 2:56am

Only in the same sense that every dinosaur I draw is a copy of previous paleo-art (having never seen a dinosaur in real life). We’ve gone over this in the previous 285 posts.

No, they were saying that the act of identifying and collating images to include or remove from the dataset is a process that requires having/making copies of the original images being trained on. This is before any training even happens.

That is EXACTLY what Disney was asking for:

And other AI image- and video-generating services have instituted copyright protection measures that recognize and protect the rights of content creators like Disney and Universal. These readily available measures can be implemented in at least two ways: first by rejecting prompts that request the display or download of Plaintiffs’ copyrighted characters, and second by using technology to screen for infringing image outputs.
11. Before taking legal action, Plaintiffs asked Midjourney to stop its theft of their intellectual property, including by implementing these simple measures.

(Emphasis mine)

However, the system is unlikely to output concepts it hasn’t been trained on so limiting the amount of nudity in the model is a pre-step taken to try to avoid it. MJ has filters at the prompt level (“This prompt does not meet community standards…”) and output (“We cannot display the image…”) but they still try to prevent the model from being filled with naked people to make it harder for MJ to figure out what naked people look like.

griffin1977 · June 27, 2025, 7:44pm

You think that, but Disney (and their lawsuit) disagrees. This lawsuit is explicitly based on the claim that the training IS copying it. Not in an “all art is inspired by other art” way but a “you have violated our copyright and owe us millions of dollars” way.

To be clear, Midjourney had to copy Plaintiffs’ Copyrighted Works in order
for it to be able to subsequently disseminate reproductions and derivatives of Plaintiffs’
Copyrighted Works as outputs. Midjourney’s unauthorized copying of Plaintiffs’
Copyrighted Works to train its Image Service infringes Plaintiffs’ copyrights in their
Copyrighted Works.

Jophiel · June 27, 2025, 7:53pm

No, that is a small part of it and much of THAT is based on the pre-training procedures. Read lines 115-117 above your quoted portion. 115c is the only part of that actually about the actual model training and even THAT has a big “We dunno what actually happens here” included in it. It’s a very minor part of the lawsuit compared to the output portion or even the pre-training data scraping and collating.

Hinging the lawsuit on convincing a judge that the act of training is a violation would be a huge, stupid mistake on the part of Disney’s lawyers. Luckily (for them) they’re not that stupid and instead base the case around the output portion, where a violation is much easier to show, and the pre-training, where it is much easier to prove a standard ole “Copying my stuff” copyright violation.

I notice you ignored the whole portion about Disney expressly asking Midjourney for prompt/output filters as a solution to their issues.

griffin1977 · June 27, 2025, 8:35pm

It’s such a small part of it that it’s the first thing they mention in their claim for relief, before mentioning the output images…

FIRST CLAIM FOR RELIEF
(Direct Copyright Infringement)
.
.
Midjourney has directly infringed Plaintiffs’ Copyrighted Works by
unlawfully reproducing, publicly displaying, distributing, and making derivative works
based on Plaintiffs’ Copyrighted Works both in developing and training its Image Service
and in the output Midjourney generates for its subscribers.

Again, the output images figure prominently because they are such a compelling way to show their copyright was infringed without getting into a technical discussion of how diffusion models work. But the crux of this case is the training, and the copying and encoding that happens there.

Jophiel · June 27, 2025, 8:40pm

I don’t know what you think that means or how it proves anything. Again, most of the “training” complaint spelled out applies to the pre-training handling of files during collation and filtering. Yes, it mentions the actual training. Briefly. And with a caveat that they don’t actually know how the training works. Also it’s pretty funny to take the line “developing and training its Image Service and in the output” and get from that “Obviously training was the most important”

Why you want to cling to the idea that this is the crux of the case is beyond me but, sure, whatever. We already have a precedent that training is Fair Use so I guess Disney’s lawyers are idiots to make that the “crux” rather than the much easier fly balls of the collation and output/display.

Once again, you’ve failed to address how Disney’s own remedy to this was prompt/output filters. Something that you said would actually show that it wasn’t about the training.

griffin1977 · June 27, 2025, 8:54pm

It says nothing of the sort. The lawsuit repeatedly says the process of training is illegally copying their works. Whenever it summarizes their complaint it makes that the focus of what MJ is doing wrong and what they want restitution for…

As explained below, Midjourney is able to reproduce, publicly display, and
distribute these copies because Midjourney selected and copied Plaintiffs’ Copyrighted
Works as part of the training process for its Image Service.

Midjourney’s ability to repeatedly access the data stored in its software system and reproduce, publicly display, and distribute further copies of Plaintiffs’ Copyrighted Works
for its subscribers demonstrates that Midjourney’s training of its generative AI model
involved the fixation of copies of Plaintiffs’ Copyrighted Works in a tangible medium from
which the work can be perceived, reproduced, or otherwise communicated with the aid of
a machine or device.

Once the training process is complete, due to Midjourney’s massive copying
of Plaintiffs’ Copyrighted Works, and as a direct and intentional result of Midjourney’s
development and training, Midjourney’s Image Service generates reproductions and
derivatives of Plaintiffs’ Copyrighted Works.

That means not reproducing, displaying, distributing, or otherwise making
available Disney’s copyrighted works without an express license. Midjourney—through its
training and outputs—has been violating each of these rights.

Etc, etc, etc …

Jophiel · June 27, 2025, 8:56pm

Did you read 115-117? Just doing a word search for “training” isn’t enough. The “training process” explicitly includes the scraping, collation and filtering/cleaning of images prior to actual training.

griffin1977 · June 27, 2025, 9:03pm

Yeah they made the same point I made above. They don’t understand all the exact details of the training process as they aren’t public (but intend to find out in discovery) but…

Just like I say above, it’s how computers work, how information theory works, and just common sense. If a computer system ingests a picture of Mickey Mouse and then spits out a picture of Mickey Mouse, then it has copied and stored a digital representation of that picture. Whether it’s stored as a JPG or in an ML model, that’s provably what’s going on, absent magical information fairies.

Jophiel · June 27, 2025, 9:07pm

“They don’t know what’s going on but they made it the center of their multi-billion dollar legal case” is one heck of a take but I don’t see this going a whole lot further. Uh, good luck to Disney with that.

For real though...

Actually, Disney will be fine since that’s not the center of their case and I suspect they’ll arrive at a settlement about prompt/output filtering long before this gets resolved in a courtroom, especially with a legal precedent now that training itself is protected by Fair Use.

griffin1977 · June 27, 2025, 9:32pm

“Here’s a copyrighted image of Mickey Mouse that was scanned in by Midjourney and stored in their computer systems without our permission. How do we know it was copied and stored? Because their computer system later spat out this almost identical image of Mickey Mouse that is clearly a copy of the original.” That’s a pretty damn good case, unless MJ have some very compelling evidence for the existence of magical information fairies.

Cheesesteak · June 27, 2025, 10:44pm

AKA “directions”

Jophiel · July 2, 2025, 1:01am

I’d been hearing about some pseudo-group that has been uploading AI generated music[1] onto Spotify & related platforms and claiming to be a real band. When they were called out on it, they insisted that they were a real band of real people playing real music and never used AI. Anyway, here’s their totally real and legit gig “photo”:

That is the most AI looking image to possibly use to try to convince people that you’re not just using AI.

[1] I know there are multiples of these uploading AI music, not just these “guys”.

Darren_Garrison · July 2, 2025, 4:47am

They are normal humans musicing with their normal hands.

Der_Trihs · July 2, 2025, 5:08am

By that logic simply looking at a picture of Mickey Mouse is a copyright violation, since your brain does in fact copy and store the data.

And considering that the pro-copyright people periodically push the idea that just looking at copyrighted material without paying is a copyright violation, I can’t dismiss the possibility of that being on purpose rather than a logical flaw.

griffin1977 · July 2, 2025, 12:15pm

Yes, if the court finds that computer programs have the same rights as people, MJ could get away with that argument.

My JPG compression C code has the same right to copy and store data as you or I dammit!! Freedom!!
