I'm missing something about AI training

I guess at this point I’d recommend you do some research into how generative AI works since you’re not going to learn or accept it from me.

I’ve lost what points either of you are trying to make.

For me, a product is art if a human is trying to express something. That expression can be through a tool that turns words into sounds, or it can be through a tool that changes pre-existing sounds into new sounds.

Art doesn’t have to be new. Art doesn’t have to be effective. Art doesn’t have to be something I appreciate. Art doesn’t have to be copyrightable or trademarkable.

Hah, I don’t blame you. My original point was simply that AI output is not automatically “derivative”, especially in a legal sense. This was backed up by the court when they dismissed that argument saying that, for something to be derivative, you would have to be able to show copied work.

Everything since then has been a round-robin of “But what if I have a box and it has a million things and a fairy and some machine code that does a thing…”

The discussion about “Is it art?” is interesting. The other part, not so much.

AI is not taking existing tracks and combining them with random noise and maths. AI takes a large sample of existing tracks, analyzes them to create rules, and then uses those rules to create something that fits into the requested category.

For example, an AI can analyze thousands of things called “haiku”, and when asked to produce a “haiku about underwear” it will give you a verse with the correct lines and syllables for a haiku, using words frequently found in discussions of underwear, with the result being recognizably a haiku about underwear.

Is the AI copying existing haiku, or taking an existing haiku and combining it with underwear words?

I don’t believe so. I’d say it’s creating a set of rules by analyzing language. From haiku, the verse structure, with heightened imagery and language. From general text, words used in conjunction with the word underwear and general grammar and sentence structure.
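That “creating a set of rules by analyzing” can be sketched as a toy program. This is not how a real language model works; the corpus, the word list, and the “rules” below are all invented just to show the shape of the idea: derive structure from one kind of text, derive topic vocabulary from another, then generate something new from the rules rather than copying any training example.

```python
import random

# Toy "training data" — invented stand-ins for a real corpus.
HAIKU_CORPUS = [
    ["an old silent pond", "a frog jumps into the pond", "splash silence again"],
    ["winter seclusion", "listening that evening", "to the rain and wind"],
]
PROSE_CORPUS = [
    "the cotton underwear dried slowly on the line",
    "he folded the soft underwear into the drawer",
]

# Rule extraction, not copying: derive a structural rule from the haiku examples...
lines_per_verse = len(HAIKU_CORPUS[0])  # learned rule: a verse has 3 lines

# ...and a topic vocabulary from prose that co-occurs with "underwear".
topic_words = set()
for sentence in PROSE_CORPUS:
    words = sentence.split()
    if "underwear" in words:
        topic_words.update(words)

def generate_verse(rng):
    """Compose a new verse using the learned structure and vocabulary."""
    vocab = sorted(topic_words)
    return [" ".join(rng.choice(vocab) for _ in range(4))
            for _ in range(lines_per_verse)]

verse = generate_verse(random.Random(0))
```

The output follows the learned rules and vocabulary but matches no training sentence, which is the distinction being argued about: rules derived from data versus chunks of data reassembled.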

But there are two types of art.

  • Original art that is wholly owned by the creator. It might be (in fact, will be) inspired by other pieces of art, but it’s unique and original. E.g. I just sat down, wrote and recorded a song on my guitar.
  • Derivative art that may have added some original elements but is fundamentally based on someone else’s art, and the original creator has some ownership of it. E.g. I remixed a piece of music.

The OP appears to be saying that art created by AI software is the former: completely unique original art that was simply inspired by the training data but not derived from it.

If that’s the case, what stops art created by other, non-AI, software based on other artists’ work from being original art?

It’s all fundamentally the same operation. You take someone else’s art in digital form, give it to a piece of software, which then processes it in a very complex way and spits out a “new” bit of work based on your instructions. Where is the cut-off between a bit of software that can produce an original work and one that can’t?

Thank you for explaining. That distinction isn’t relevant to how I appreciate art, but it is something that needs to be hashed out in the courts.

That’s combining them with random noise and maths. The rules are based entirely on the training data, so the end result is just a very clever combination of the training data. There is nothing magic going on; it’s just a series of dumb machine language instructions that take the training data (again, a bunch of other people’s art) and then spit out a new piece of art based on them. Plenty of traditional non-AI algorithms involve building a set of rules based on the input and then using them to produce an output.
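The classic pre-AI example of “build rules from the input, then generate from the rules” is a Markov chain. Here is a minimal bigram version (the training text is made up); the “rules” are nothing more than a table of which word has followed which:

```python
import random

# Tiny made-up training text.
training_text = (
    "the cat sat on the mat the dog sat on the rug "
    "the cat chased the dog across the rug"
)

# Build the rule table from the training data: word -> words seen after it.
words = training_text.split()
follows = {}
for a, b in zip(words, words[1:]):
    follows.setdefault(a, []).append(b)

def generate(start, length, rng):
    """Produce new text by repeatedly applying the learned rules."""
    out = [start]
    for _ in range(length - 1):
        nxt = follows.get(out[-1])
        if not nxt:
            break  # dead end: no rule for this word
        out.append(rng.choice(nxt))
    return out

sample = generate("the", 8, random.Random(1))
```

Every adjacent word pair in the output occurred somewhere in the training text, which is exactly the “clever combination of the training data” point: the rules contain nothing the input didn’t put there, even though the generated sequence as a whole may be new.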

What’s the line between one computer program that is producing a unique piece of art from other people’s art and one that is producing derivative work?

I spend an embarrassing amount of time “stress testing” AI image models with weird requests. You seem to be implying that AI output is just a sophisticated collage of bits and pieces of images from the training data. That can sometimes be true if the training data is very limited. But the models also create new imagery based on concepts the AI has derived that are very unlikely to be directly from the training data.

For instance, here are samples from a series of images of 70s-80s pop culture icons I made in Dall-E 3. In the “copying” column, requests for images involving Jaws often (but not always) output reproductions of the image of Jaws rising from the ocean used in the classic poster, sometimes copying it very closely. That image probably exists in the training database hundreds or thousands of times, so it has a huge weight in the AI’s concept of “Jaws”. Images of The Godfather also seem to be largely reproducing highly-reproduced images.

But then look at the images of ET. Dall-E 3 has no idea how to deal with the very stubby, barely visible legs of ET from the movie, so it invents long legs for him that are detailed and smoothly consistent with the body and are in no way derived from the source images of ET in the training data. They come from the AI “understanding” the concept of legs, going by the general principle that legs tend to be long, and inventing novel long legs for ET.

This type of thing isn’t uncommon in AI image generation: it looks at the training data, derives an inaccurate concept of what an x is, then draws instances of x according to that inaccurate concept, inconsistent with the training images.

Here’s another broad example. When SDXL was in open beta, I tried a lot of prompts of “Leon and Matilda from The Professional” in style x, never describing what Leon and Matilda should look like. But SDXL already knew: it had examined the training data and learned that Leon was an older balding man with a beard and glasses and that Matilda was a younger woman/girl with short dark hair. But very few of the images show signs of being copies of details from source images; they are entirely new images that adhere to that general derived concept (some more closely than others).

I can absolutely visually demonstrate what is happening in a simple neural network, such as one for handwritten digit identification. I’ve yet to see a visual demonstration of how a brain does the same thing for a task like that. Large NNs are well understood, and we could absolutely visualize what is going on; it would just take trillions or more outputs to do so.
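At the smallest scale, that demonstrability looks like this: a single artificial neuron (a perceptron) trained to tell two tiny made-up 3x3 “digit” patterns apart. Everything is inspectable — the learned weights are just nine numbers you could print as a grid and look at. The patterns and training loop are invented for illustration, not taken from any real digit dataset:

```python
# Two made-up 3x3 pixel patterns, flattened row by row.
PATTERN_A = [0, 1, 0,
             0, 1, 0,
             0, 1, 0]   # a rough "1"
PATTERN_B = [1, 1, 1,
             1, 0, 1,
             1, 1, 1]   # a rough "0"

weights = [0.0] * 9
bias = 0.0

def predict(pixels):
    """Weighted sum of inputs, thresholded: 1 means 'it's a one'."""
    s = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if s > 0 else 0

# Perceptron learning rule: nudge the weights toward or away from each
# input until both examples are classified correctly.
examples = [(PATTERN_A, 1), (PATTERN_B, 0)]
for _ in range(20):
    for pixels, target in examples:
        error = target - predict(pixels)
        if error:
            weights = [w + error * p for w, p in zip(weights, pixels)]
            bias += error
```

After training, you can lay the nine weights out as a 3x3 grid and literally see which pixels the neuron has learned to treat as evidence for a “1” versus a “0” — the kind of visual demonstration that, at brain scale, nobody can produce.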

The brain is a massively larger system with multiple parts involved in handwriting recognition. But there is no doubt that, at the most reductionist level, it involves weighting different inputs and outputs to create patterns of activity. We can see those inputs and outputs in simple learning models, even years back in Aplysia. In human brains the scale is huge, with systems embedded in systems too complex to map for even simple learning paradigms, but … so?

This is the part that I question, while acknowledging that I have little expertise in how the human brain works. I’ve never seen anything that proves that the brain works in the manner I’ve described for NNs, so if you have such proof I’d certainly love to see it.

Oh, there are woo people who claim some quantum crap, and ghost-in-the-machine beliefs persist in some places … but the brain is, at its base, a huge number of neuronal connections and their supporting cells and milieu, taking in inputs and firing off outputs that are usually but not always digitized, within a context of inputs from both outside and within the body. It’s very well modeled at some basic levels and in some simple organisms. There is a reason the computer models are called neural networks… they were very literally inspired by the simple level of human brain function we understand. They (mostly) made that basic level really big. The nesting levels and interplay between the levels, how they reverberate information top down and bottom up and sideways… there is no clear consensus on how the brain does it or how that creates the emergent properties of human experience, and no attempt to replicate the little we do understand of that in these models.

I have no idea what you’d count as proof. We can trigger firing in certain regions and cause sensations. We can show that various synapses change shape in response to learning and that their numbers change. We see the expansion and pruning of dendritic networks over the course of development.

Imagine we existed in an alternate universe where AI could accomplish everything you asked of it, but also Drake and Kendrick Lamar did not exist. You ask an AI to generate 1000 rap songs, and the exact music and lyrics to Not Like Us are spat out as 1 of the 1000. Would you notice it? Would you clock its significance?

Like, it’s a fine song, catchy, competently made, etc., but that’s not why it won Grammys and Kendrick performed at the Super Bowl halftime show. Without the entire context of Drake and Kendrick’s beef, the accusations of inappropriate relationships, the storyline of the beef up until that point, the timing of the songs dropping, Drake’s lawsuit against UMG, etc., it’s just a series of notes and lyrics.

Art is in conversation with Art. What makes it art is how that song exists in context. Kendrick could have conceivably used AI as a tool to create Not Like Us (he almost certainly didn’t) but AI could not have created Not Like Us even if it could have created the notes and the lyrics because it cannot create the context. It can’t feel petty, it can’t have been hurt by Drake, it can’t have developed an opinion and point of view based on being black in society, it intrinsically can’t be those things.

What you go to fine dining for is not deliciousness. Will it be delicious? Yeah, that’s table stakes. The reason you go to fine dining instead of having the most expertly crafted burger or lobster or A5 steak is to experience a point of view. It’s to be able to ask: why is this chef, in this moment, placing this plate of food in front of me? What do they want me to experience? The same plate of food, lacking this context, and crucially, the ability to interrogate context, is just another plate of food.

We answered this question back in the 80s. The fad back then was “high level languages” (like COBOL or FORTRAN) that would allow business people to write software and obviate the need for programmers.

Business people felt like they had such a clear idea of what programs they wanted in their heads, surely they could just describe those programs in roughly natural language and out the other side would pop a bug free program.

What we eventually figured out is that the act of describing a program is programming, and knowing what you want to build is the hard bit of programming. Pinning down what was in a business person’s head was the primary challenge, because the actual details, the things that make a program a program, were just yadda-yadda’d away in the conceptualization stage. Pinning them down and making a decision required actual thought and research (e.g. I’m making a pet tracking app: do we expect to allow people to own more than 12 pets? If so, this needs to be a list that can expand, and maybe a search/sort function to let them find their pet; otherwise, it can just be 12 fixed slots).
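That pet-slot question, written out as code, shows how small the “conceptual” difference is and how different the programs are. Both classes and all their names are invented purely to illustrate the two designs:

```python
# Design 1: twelve fixed slots — simple, but a 13th pet is impossible.
class FixedPetTracker:
    MAX_PETS = 12

    def __init__(self):
        self.slots = [None] * self.MAX_PETS

    def add(self, name):
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = name
                return True
        return False  # all twelve slots taken

# Design 2: an expandable list plus search, because "more than 12"
# turned out to matter — more code, different UI, different bugs.
class ListPetTracker:
    def __init__(self):
        self.pets = []

    def add(self, name):
        self.pets.append(name)
        return True

    def search(self, prefix):
        return [p for p in self.pets if p.startswith(prefix)]
```

The business person’s description (“track my pets”) is satisfied by either; which one is right is precisely the decision that was yadda-yadda’d away, and somebody has to make it.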

This is the same eternal conversation, only now in the field of art. Art is the choices. “Refining the prompt” to get to your artistic intent requires you to make so many choices that by the time you’ve gotten there, it’s not appreciably less difficult than just doing the art in the first place. Like, yes, it sometimes saves time on the execution, but if you’re a skilled artist, the execution wasn’t what you were spending the most time on; it was the choices you had to make along the way to get to your artistic intent.

And if you’re not a skilled artist, yes, what you’re unskilled at is the craft of making art, but you’re also unskilled at making artistic choices. You will find far more people with technical skill and no eye than artists with a talented eye and no skill, because the only way to get that eye is through the diligent application of skill.

Just because it’s not always a literal collage of bits of one image stuck together with bits of another doesn’t mean it’s not a clever combination of the training data. For starters, that first image is absolutely a collage of the training data: I could literally have created it by cutting out pictures of Jaws and Star Wars from a magazine and gluing them to some poster paper.

But the second image is still just as clearly a combination of some training data. It took some stills from the movie Leon and some pictures someone had taken of some bobbleheads and combined them. That’s not detracting from the technical achievement of what this code is doing; it’s absolutely incredible sci-fi stuff. But it’s still some code that takes someone else’s images and combines them to make a new image, no matter how clever that combination is. It’s not being “inspired” by the movie The Matrix and the concept of bobbleheads; it’s taking the literal 0s and 1s that represent those images, using them as input to a computer program, and presenting the output to you as a new image.

No. It’s taking existing tracks, using them to figure out the patterns that make one set of noises “music” and another set “white noise”, and then transforming a random datascape so that it’s slightly closer to the patterns noticed in “music”, and repeating the process thousands or millions of times until you have a new and unique song.

It’s not really like how humans think, but it’s also nothing like copying.
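The iterative process described above can be sketched in a few lines. This is a drastically simplified caricature, not a real diffusion model: the “learned statistics” here are just a per-position average of some invented training tracks, and generation is a loop that nudges random noise toward that pattern:

```python
import random

rng = random.Random(42)

# Invented "training tracks": 8 tracks of 16 samples each.
TRAINING_TRACKS = [[rng.uniform(-1, 1) + 0.5 for _ in range(16)]
                   for _ in range(8)]

# "Training": the learned pattern is just the per-position mean.
pattern = [sum(t[i] for t in TRAINING_TRACKS) / len(TRAINING_TRACKS)
           for i in range(16)]

# "Generation": start from pure noise and repeatedly transform it to be
# slightly closer to the learned pattern, thousands of times.
track = [rng.uniform(-1, 1) for _ in range(16)]
initial_distance = sum((a - b) ** 2 for a, b in zip(track, pattern))
for _ in range(1000):
    track = [a + 0.01 * (b - a) for a, b in zip(track, pattern)]
final_distance = sum((a - b) ** 2 for a, b in zip(track, pattern))
```

The result ends up close to the learned statistics but is not equal to any training track, which is the distinction the post is drawing: the loop never touches an individual training example during generation, only the rules extracted from all of them.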

This deserves emphasis.

New technologies come round and the sets of choices change. Photography created the art of photography and pushed painting from the realistic Academy style to Impressionism. The nature of the choices changed. The context changed.

For now artists and hacks are playing with this tool. Not every pretty smear of paint is art. Not every photograph is art. Not everything created using AI is.

I do not believe we are at the point where the AI is creating art independently. But we are at the point where it is a new medium for artists to play in. Maybe we will get to the former someday.

Again, you just described “taking existing tracks and combining them with random noise and clever maths to produce a new track based on the parameters you provided”.

I’m not downplaying how incredibly clever the maths is or how impressive the results are. But it’s still the same fundamental operation: you take the literal 1s and 0s that represent someone else’s music, use them as input to a computer program (again, a series of dumb deterministic machine instructions), which then spits out a new series of 0s and 1s based on parameters provided by the user.

“Clever maths” is like the “Yadda yadda” episode of Seinfeld. You take the most important part, handwave it away and say “See, so they’re exactly the same!”

I’d argue that’s exactly what you are doing. You are just arm-waving and saying this maths is so clever that it means this computer program is capable of creating unique, original, non-derivative work. Why? It’s just clever! It uses rules! And noise!