I'm missing something about AI training

It’s just regular ol’ mixing software; it’s expensive AF and has some bells and whistles (autotune, effects, etc.), but it works exactly like I describe above.

It’s a computer program, made of a series of assembly-language instructions. It takes as input some songs someone else wrote (they don’t have to be songs someone else wrote, but I have zero musical talent, so that’s all I got) encoded as 1s and 0s, and some parameters provided by me. The program runs by executing one dumb assembly-language instruction after another; each instruction takes some bits from the input data and does something very small and predictable, like load a few bits of input data, do some arithmetic, go to another instruction, etc. When the program completes, a new piece of audio (encoded as 1s and 0s) has been created.
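To make that concrete, here’s a toy sketch of the same idea in Python (the function, its parameters, and the particular transform are all invented for illustration): a program that produces new output purely by applying small, deterministic arithmetic steps to input bytes and user-supplied parameters.

```python
# Toy illustration: like any program, this "mixer" just applies small,
# deterministic arithmetic steps to input bits. The transform itself
# is made up; the point is that nothing magical happens.

def mix(samples: bytes, gain: int, offset: int) -> bytes:
    """Produce new 'audio' bytes from input bytes plus user parameters."""
    out = []
    for b in samples:                    # load a few bits of input data
        v = (b * gain + offset) % 256    # do some arithmetic
        out.append(v)                    # store result, go to next instruction
    return bytes(out)

song = bytes([10, 20, 30, 40])        # input "song", encoded as numbers
remix = mix(song, gain=2, offset=5)   # parameters provided by the user
```

Scale that up by a few billion instructions and you have a mixing program, or any program at all.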

This is exactly what the other program, the AI model for creating audio, does. The encoding of the audio files is a lot more complicated, but that’s all it is. It cannot be anything else, because that program (and every program ever written) works exactly as I describe above. There are no AI fairies.

Scarce enough to cost a hundred bucks, and some people still resent having to pay that much to a human.

Eh, I said “under” a hundred (I’ve had stuff done for $20-$30 in the past) but, in any event, we obviously have different opinions on what counts as “scarce”.

You insist on stating that the “dumb” code is just doing a glorified recording and regurgitation of input music.

Code that can take notes, compare them to neighboring notes, record the intervals, map them against known key patterns, and then note previous instances where it came upon such patterns, “noting” that a particular artist uses a pattern more often than others, is now able, when queried, to use those patterns to “write” in the style of said artist. This is light years different from recording these patterns and just stringing them together, or taking the entire work note for note and tweaking it in place, such as your mixing code might do. Using source material in an analog process, note for note, is not the same as discovering higher-level patterns and using those patterns.
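The pattern-mining idea can be sketched in a few lines of Python. Everything here is invented for illustration (toy note lists, two-step interval patterns), but it shows the distinction: the program stores which interval patterns an artist favors, not the recordings themselves, and then reuses those patterns to generate new material.

```python
# Hypothetical sketch of interval-pattern mining: extract intervals
# between neighboring notes, count which patterns an artist favors,
# then reuse a favored pattern to "write" a new line in that style.
from collections import Counter

def intervals(notes):
    """Intervals between neighboring notes, in semitones."""
    return tuple(b - a for a, b in zip(notes, notes[1:]))

def favored_patterns(solos, length=2):
    """Count interval patterns of a given length across an artist's solos."""
    counts = Counter()
    for notes in solos:
        iv = intervals(notes)
        for i in range(len(iv) - length + 1):
            counts[iv[i:i + length]] += 1
    return counts

def write_in_style(start, pattern, repeats=2):
    """Generate a new line by applying a favored pattern, not copying notes."""
    notes = [start]
    for _ in range(repeats):
        for step in pattern:
            notes.append(notes[-1] + step)
    return notes

solos = [[60, 62, 64, 62, 60], [67, 69, 71, 69, 67]]   # invented "solos"
counts = favored_patterns(solos)
best = counts.most_common(1)[0][0]    # the most-used interval pattern
new_line = write_in_style(55, best)   # new notes, same stylistic pattern
```

Note that `new_line` starts on a note that appears in no source solo; only the abstracted interval pattern was reused.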

I wish I could find an article I once read where someone analyzed a large amount of Jerry Garcia’s solos, noting keys, preferred modes, etc, and then proceeded to use those instructions to produce a solo that sounded like Jerry but was in no way a copy of any known recorded piece.

Analyze something to such detail and patterns emerge, and using those patterns is not the same as copying known works.

An analogous situation in fashion would be like this: Chez Moi is known for using floral designs in their styles. The flowers are typically flattened so that they appear polygonal versus naturally varying. The colors tend not to vary much.

A human copycat takes Chez Moi dress style 1 and changes the colors and adds more copies of the floral arrangement.
AI takes a simple flower layout it has learned by being shown 10000 different flowers, then morphs the edges into a polygon (taking care to use different polygons from those it has been shown), then generates the final pattern. It uses polygons because the human who fired off the process told it to use the style of Chez Moi.

If you don’t see a fundamental difference in these two approaches, then we are at an impasse.

Back to AIs “understanding” concepts, I recently made a set of images in Dall-E 3 via Bing involving Big Bird, and DE3 kept drawing the Bird with several white feathers in his brow. I thought it might be another example of a mistaken concept that DE3 had baked in, but it turns out that I’m the one that wasn’t that observant and Big Bird has had the white feathers all along. Here’s a grid of six images involving Big Bird I’ve made at different times, all including the white feathers.

I decided to do a test of several AIs using a new prompt. The prompt is:

several 1980s polaroid photos of jack skellington kissing big bird from sesame street in central perk from friends. grainy disposable camera snapshot with bokeh, motion blur, shallow dof and forced perspective. front and back views.

And the results. (Each image is a single output image from the AI, not a composite grid):

Dall-E 3, via Bing. Clearly the winner in understanding the three concepts of Big Bird, Jack Skellington, and a group of Polaroid photos:

Dall-E 2, the first AI model to make widespread news with the general public and to “shock the world” with how well it could do some things, is worlds behind Dall-E 3 in this:

Kandinsky 2.1, a Russian AI roughly the same age as Dall-E 2. It is pretty good with some things but appears to have no clue about Big Bird or Jack Skellington.

The recent release Flux Schnell has a pretty good understanding of Jack Skellington. Some Big Birds are much worse than others, none quite as good as Dall-E 3. It didn’t produce any “several” images:

Ideogram 1.0. Gets Jack, doesn’t get Bird, and makes multiples of the same photo:

Ideogram 2a Turbo. Ditto from Ideogram 1.0:

Google Imagen 3.0 Fast. Good understanding of Jack and the Bird, bad understanding of multiple Polaroids:

Stable Diffusion 1.5, an antique. The first popular, “impressive” release of SD:

Realism Engine, a third-party extension of Stable Diffusion 2.1:

SDXL 1.0, the most recent release of Stable Diffusion:

Dreamshaper XL, a third-party extension of SDXL 1.0:

And, just for fun, a try in the positively ancient CLIP + Guided Diffusion, aka “Coherent” over at Night Cafe:

The advantage of SDXL or Flux is that you can just put in a Lora for what you need. Civitai has a bunch of Jack Skellington and Big Bird loras available for people who want to use those figures (or Polaroid frame & photo quality loras if you want that effect).

Even SD1.5 is still very capable once you get off the base model (bleah) and add some extensions. Those systems really work best when you can tinker with them versus out of the box. SD2.1 is still butts though :smiley:

So OpenAI’s Sam Altman has urged Trump to intervene in this very matter, using the ol’ Red China bogeyman card (“if you don’t let us copy all these people’s work, then China will take over!”).

OpenAI is hoping that Donald Trump’s AI Action Plan, due out this July, will settle copyright debates by declaring AI training fair use—paving the way for AI companies’ unfettered access to training data that OpenAI claims is critical to defeat China in the AI race.

Currently, courts are mulling whether AI training is fair use

I’m sure the Trump administration, de facto run as is by an oligarch with billions upon billions invested in AI models trained in this way (including OpenAI itself), will come to a fair and equitable conclusion on this matter :roll_eyes:

Another bit on AI “understanding” cultural items. I was playing around in Bing and tried for a snake eating cake by a lake (along with a set of style instructions I often use, “with bokeh, shallow dof and forced perspective”). Then I changed that to Jake eating cake by a lake, and discovered that Bing/Dall-E 3 associates “Jake” with the character from Adventure Time. So I tried a bunch more prompts. Here are one example each from Jake eating cake by a lake, Jake and the fat man, Jake and the flat man, Jake and the batman, Jake and the cat man, Jake and the splat man, Jake and the fit man, Jake and the foot man, and Jake and the hand man.

It tends to create plasticy figures, including articulated or Lego figures, such as Jakes and Batmans

but occasionally tries realism, like this Jake eating by a lake.

It doesn’t work for every pairing: it did for the majority I tried, but some results contain nothing Jake-like. And the weirdest quirk is that it only makes the Jakes when I include the style description. With a simple “Jake and x”, it has no clue and creates entirely random things.

With style description:

Without style description:

Bumping this thread because I’ve been advised it’s possibly the most relevant. The mouse’s lawyers are turning up the heat:

“End of AI?” is, of course, clickbait. I don’t really blame LegalEagle for that – it’s the game of YouTube – but there’s no putting the smoke back in the bottle at this point.

The case is mainly about outputs rather than inputs. Disney’s major claim is that people are able to use Midjourney to make derivative works that infringe its copyright. This could be addressed either through strict filtering (even if not 100% effective, MJ should at least show a good-faith effort to prevent this) or through licensing deals with Disney and other major entities. I don’t know how practical the second one is, since I have no idea how much money MJ is making vs. its costs, but other major generative AI services have stricter filtering than MJ does. For the most part, MJ’s filtering is against gore and nudity (with mixed results) and against generating realistic depictions of celebrities.
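For the filtering option, even a crude sketch shows the idea. The blocklist and function here are invented; real services use trained classifiers rather than substring matching, but the shape of the check is the same: screen the prompt before generation.

```python
# Minimal sketch of prompt filtering for protected characters.
# The blocked terms are invented examples, not any real service's list.
BLOCKED_TERMS = {"mickey mouse", "darth vader", "minions"}

def is_allowed(prompt: str) -> bool:
    """Reject prompts that mention any blocked term (case-insensitive)."""
    text = prompt.lower()
    return not any(term in text for term in BLOCKED_TERMS)

is_allowed("a snake eating cake by a lake")   # allowed
is_allowed("Mickey Mouse on a skateboard")    # blocked
```

Substring matching like this is trivially evaded (“M1ckey M0use”), which is why “good-faith effort” rather than perfection is probably the realistic bar.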

But, again, even if Disney won a billion-thousand dollars and MJ had to shut down and David Holz had to live under a bridge and eat mice for the rest of his impoverished days, it wouldn’t be the “end of AI” in any sense. It would perhaps be the restriction of publicly available generative art AI services but companies like Microsoft would be harder to beat up in court and there’s a huge thriving community of local users who don’t use the public services at all and can easily trade models and LoRAs through the internet. As Napster was mentioned in the video, it’s a good comparison for this: You can maybe take down one of the most casually available means of distributing music but pirated music was still trivial to find for anyone willing to work harder than double-clicking Napster.exe

A conspiracy by Mickey to keep his brethren down?

This case will be valuable precedent for Disney et al. Once they win this, they’ll be able to put the hammer down on anybody’s commercial service whose filters are less than perfect.

I suspect Disney will be less interested in a legal fight with Microsoft or Google than with Midjourney, which is a stand-alone business that lacks billions of dollars. As long as those services show a good-faith effort to prevent the sort of spot-on images from the court filings, I think they’ll just reach a détente.

On the other hand, Midjourney’s laissez faire attitude towards IP infringement combined with their smaller dog status (plus their superior photorealistic rendering in many cases) really works in Disney’s favor to establish a precedent.

A federal judge has ruled that training AI models on copyrighted works is a transformative use and thus falls under Fair Use of the works.

This does not apply to the output – an image of Darth Vader can still be considered IP infringement – but the use of copyrighted materials to train an AI model has been found to be legal in federal court.

[Edit: I’ll note that I’m going off the conclusion of the guy who posted this to Bluesky. Another guy disagrees and other people disagree with the disagreeing guy, etc. I haven’t had time to go through it myself nor am I a lawyer]

But if the result goes against MJ, then it would be a finding that the training is copying and not fair use.

The suit focuses on the outputs because a picture of Mickey Mouse generated by MJ next to a Disney copyrighted image is a hell of a lot more compelling than an in-depth discussion of how diffusion models work.

Those images didn’t magically appear; they were generated from the information stored in the system, and that information comes from the training data.

Nah, more likely the judge would just concentrate on the output and, if finding MJ in violation, it would be for making IP infringing works. How it happened on the input side is irrelevant; it’s not as though it’s legal for me to make bootleg Star Wars movies and Minions posters provided I don’t use AI to do it.

This could, instead, be because generating and selling pictures of Mickey Mouse without Disney’s permission is copyright violation, while analyzing legally obtained pictures of Mickey Mouse isn’t.

No; if that were the case, Midjourney would not be the target of the case. It would be the people who used MJ to create and distribute pictures of Mickey Mouse who would be defending this lawsuit.

This lawsuit is saying Midjourney copied these images, and the mechanism it used was to encode these images via an ML model, just as Napster encoded songs via MP3.

If that’s what the lawsuit is saying, then I expect it will fail, because that’s just factually untrue.

Nah, the crux of the lawsuit is about Midjourney’s generation and display of the images. They even talk about how they asked MJ to implement filters to prevent the generation of the images:

Midjourney could easily stop its theft and exploitation of Plaintiffs’ intellectual property. Midjourney controls what copyrighted content it selects, copies, and includes in its Image Service, and it has the means to implement protection measures to prevent the ongoing copying, public display, and distribution of Plaintiffs’ works.
Midjourney already has in place technological measures to prevent its distribution and public display of certain images and artwork such as violence or nudity. And other AI image- and video-generating services have instituted copyright protection measures that recognize and protect the rights of content creators like Disney and Universal.
[…]
Before taking legal action, Plaintiffs asked Midjourney to stop its theft of their intellectual property, including by implementing these simple measures.

Midjourney has directly infringed Plaintiffs’ Copyrighted Works by unlawfully reproducing, publicly displaying, distributing, and making derivative works based on Plaintiffs’ Copyrighted Works both in developing and training its Image Service and in the output Midjourney generates for its subscribers.
[…]
As alleged above, the unauthorized reproduction, public display, distribution, and creation of derivatives of Plaintiffs’ Copyrighted Works through Midjourney’s output infringes Plaintiffs’ exclusive copyrights under Section 106 of the Copyright Act.

They do talk about the training process but, even there, most of the emphasis is on how MJ needed to collect and clean/filter images for the training set, an act that would require making copies of the images. Very little is spent on the whole “our jpegs are in the magic brain” aspect of it. Which is a smart move because, even if another judge rules that the training is a valid Fair Use case as the previous one just did, that doesn’t protect MJ from other copyright claims.

The recent case I linked a few days ago set the precedent: training an AI model on copyrighted media is fair use, but the collection of that media for training must be legal. The defendant there won the part of the case about the actual training but still has to answer for collecting its library via copyright-infringing methods. Much like it’s perfectly legal to buy shoes, but it’s not legal to steal money to buy shoes with.