I'm missing something about AI training

Okay, I would like someone to show me the “important” distinctions between an AI learning by analyzing existing music and a human who listens to the same music and gets inspired to sound like their heroes.

Surely it can’t be the speed with which AI will learn vs the speed a human picks up on stylistic nuances of artists?

The difference is that once trained, a generative ‘AI’ can produce ‘music’ at a prodigious rate, around the clock, without any coddling or contract negotiations with a human artist, and at low cost (provided that investors are still willing to dump billions of dollars into training and ‘compute’ with no conceivable profit or clear business case)…assuming, of course, that you aren’t concerned about innovation, originality, distinctiveness, et cetera.

Stranger

So it can produce more than a human.

Okay, so back in the 1980s Todd Rundgren’s Utopia made a tremendously Beatles-influenced album. Clearly the band learned from the masters. But no one then (or now) was screaming, “Hey, pay the Beatles for teaching you their style” (assuming that Utopia purchased Beatles albums). So assuming an AI group buys an artist’s catalogue and teaches it that artist’s style, why would that be any different from the Utopia scenario?

Are we afraid that that is all that will ever be done?

Are you just specifically asking about the learning/training phase, or are you also including the part where the AI uses what it has “learned” to produce “new” material?

Either way, I’m not sure we know enough about how the human mind and creative faculties work to be able to tell what, and how much, is different between that and how an AI does similar things.

One clear distinction between how a human learns and how a generative AI is trained is that the human experiences ‘data’—be it music, text, still images, video, et cetera—by interpreting it through their embodied senses. This means that they generally have to be exposed to the information repeatedly to gain the ability to replicate it, usually through commercial channels (in the modern era, at least), and train through emulation, eventually producing a style that is some combination of syncretism and inspiration.

A generative AI, on the other hand, is fed the ‘data’ directly in digital format as part of a massive data set, breaks it into tokenized elements, and formulates ‘associations’ in its artificial neural network to replicate it in a generalized form. Generative AI models like large language models have no actual semantic definitions other than these associations, which is why they ‘hallucinate’ (a poor term for the fact that the algorithm is really just producing garbage), and have no real ability to create something genuinely unique or novel. This is why all LLM responses basically read like Wikipedia entries, sounding authoritative even when they are producing unadulterated nonsense.

It also explains why it takes petabytes of data and tens of millions of kilowatt-hours to train a sophisticated LLM, while your 25-watt brain can equal its learning capability (if not its capacity for generating results) with less than 10,000 kWh (25 watts running continuously for about 45 years comes to roughly 9,900 kWh), even though it spends much of its time figuring out how to keep itself fueled, sheltered, and pleasured, and convincing another 25-watt brain to join it in an act of reproduction, notwithstanding how much of art and literature involves describing the experience of these things through firsthand experience that a generative AI will never have.
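To make ‘tokenized elements’ and ‘associations’ concrete, here is a toy sketch of the idea (a simple bigram counter; the corpus is made up, and a real transformer pipeline is vastly more elaborate):

```python
from collections import Counter, defaultdict
import random

# Toy corpus standing in for the "massive data set" (hypothetical text).
corpus = "the band played the song and the crowd sang the song again".split()

# 1. "Tokenize": here each word is a token; real models use subword tokenizers.
# 2. Build "associations": count which token follows which. This count table
#    is a drastic simplification of a neural network's learned weights.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=8):
    """Sample tokens one at a time from the learned co-occurrence statistics.
    There is no semantic model here, only association strengths, which is
    the point about output that is fluent yet meaningless."""
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        tokens, counts = zip(*options.items())
        out.append(random.choices(tokens, weights=counts)[0])
    return " ".join(out)

print(generate("the"))
```

A real LLM replaces the count table with billions of learned parameters, but the training objective is the same kind of thing: predict the next token from statistical associations, with nothing semantic behind them.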

Stranger

The difference is that AI is new and therefore scary because everything new is always scary.

I’m a programmer, and right now AI is the Big Thing. I can’t say that I’ve dipped my toe in the AI programming sea. More like I’m at the beach looking at the water.

AI is basically statistical analysis of the “training” dataset. At some level, it’s basically averaging up the input.
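In the most literal sense that holds for the simplest possible case: a model trained to minimize average error over its dataset converges to the dataset’s mean. A toy sketch (hypothetical numbers, nothing to do with any real training pipeline):

```python
import numpy as np

# Hypothetical "training set" of scalar observations.
data = np.array([1.0, 2.0, 4.0, 7.0])

# Fit a single parameter by gradient descent on mean squared error --
# the simplest possible "training loop".
w = 0.0
for _ in range(1000):
    grad = 2 * np.mean(w - data)  # gradient of MSE with respect to w
    w -= 0.01 * grad

# The loss-minimizing parameter is literally the average of the input.
print(w, data.mean())  # both ~3.5
```

Real generative models fit whole conditional distributions rather than a single number, so “averaging” is a loose metaphor, but the thing being optimized is still a statistic of the training data.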

The difference between AI and people, in re the original question, is that AI can only interpolate, whereas people can extrapolate. So, the person making music can take a song and say, what happens if I do this?

The AI can’t ask questions, only remix its information. It can’t see possibilities outside of its database.

Which means that IMO, the music created by AI is going to end up bland or dissonant.
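The “interpolation only” failure mode is at least easy to demonstrate for a simple model (toy example with made-up numbers; whether modern generative models are really this limited is exactly what gets argued below):

```python
import numpy as np

# Hypothetical training data: y = x^2, sampled only on the interval [0, 3].
x_train = np.linspace(0, 3, 50)
y_train = x_train ** 2

def knn_predict(x, k=3):
    """Nearest-neighbour 'model': answers by blending the closest training points."""
    idx = np.argsort(np.abs(x_train - x))[:k]
    return y_train[idx].mean()

print(knn_predict(1.5))   # ~2.2 -- interpolation inside the data range works fine
print(knn_predict(10.0))  # ~8.6 -- asked to extrapolate, it just echoes the edge
                          #        of its data (the true value is 100)
```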

I would encourage everyone to go make a picture with Dall-E or similar. I think you’ll find that the images are Hallmark card bland.

I think the fundamental question is what is the closest analogy to using AI trained on existing music:

  • Me, a human, listening to a bunch of music and being inspired to write songs that are a little bit like Led Zeppelin, but also a bit like Taylor Swift and Cannibal Corpse, but not directly copying the words and music of any of those. Which is totally ok and how creativity works.
  • Taking short clips of Stairway to Heaven, Shake It Off, and Shredded Human, combining them, and passing it off as my own song. Which is not ok, and will get you sued.

It will be up to the courts to decide this.

That’s a very reductive way of looking at it. The advanced image generation AIs have strengths and weaknesses, but are capable of creating images in a vast number of styles if you know how to ask for it with detailed, descriptive prompts. Whether or not the output is useful depends on what you want to use it for. Here is a sampling of images in different styles, all created using Dall-E 3 via Bing, the majority of them in the past couple of weeks. Like them or don’t like them but they have nothing to do with Hallmark cards.

But I don’t think AI is doing what your second bullet point says.
I guess some folks are convinced that AI is merely a copier.
I was of the impression that it notices patterns not consciously obvious to the human mind and uses them.

But is that so, that AI cannot extrapolate? Isn’t this the crux of the argument?

This reminds me of the Star Trek episode “The Ultimate Computer”: people refusing to believe the machine “thinks” rather than just mimics. I don’t think it’s fully understood how AI does what it does, and some of our fragile egos refuse to consider the possibility that AI can “extrapolate”.

Dismissive one liners are nearly always incorrect and provide no value.

The current legal system is designed around the idea that art is difficult to make and that we need to incentivize artists with copyright to help create more for the public good. That is the sole reason copyright exists: the rationale is spelled out literally in the US Constitution.

AI copies. It does not create. It doesn’t have ideas that it then builds something around, like a human does. It fundamentally works differently. Humans always put in the ideas, and then the AI gives back a bunch of stuff that doesn’t quite fit until the human finds the good stuff.

Which is possibly valid if the human actually does that, but most AI image stuff is for content farms, which provide negative value.

That doesn’t make it wrong per se, but it does mean that people have a reason to think “maybe I don’t want my art used for this. It’s not being used to help other artists make more content, but to make stuff that doesn’t improve the public domain and will overwhelm the ability of other artists to do their work.”

The people who actually know what is going on disagree. The fictional version of AI is not the reality.

There are things we don’t understand, but the idea that the experts are wrong because of hubris is one of those things that makes for compelling drama while rarely being the case, because other experts would show them they are wrong.

It plays on a human desire to be one who proves the smart people wrong and on human fears about ourselves.

I’m not a huge fan of Dall-E as it tends to have a “look” that I associate with AI art, especially the kind dismissively called Facebook slop. That’s not to say all Dall-E images are slop or bad, just that Dall-E puts a sort of sheen on them that I notice right away.

Here’s a couple shots of the current Midjourney “popular” gallery. I think it does a fairly good job of showing the range possible with AI.



Personally, I think there are some very nice images in there. Perfectly fine for illustrative or “Oh, that’s pretty”/“Oh, neat” purposes. I don’t need my AI images to capture the human condition any more than I need my book covers and wall posters to do so.

This might be true if you just told an AI “Make me a hundred songs, see ya later”. But I think most people who use AI for artistic purposes then use their squishy human brains to guide it through the process of “But what if we did this?”. So there’s no real reason for the end result to be bland or dissonant aside from the human prompting it accepting a bland or dissonant result.

This is really the key point to answering the OP. And from the little I understand, we aren’t even sure we understand how gen AI actually arrives at what it produces.

I think we can make a confident guess that they work differently but it is just a guess.

This seems to me to be a poor distinction. Human conscious experience, qualia, is a top-level thing, but under the hood our ‘data’ is tokenized elements too: neurons integrating inputs and firing or not, in patterns reverberating about in nested levels. Our experience is one way of thinking about our creative process, but it is just as much the neuronal data: not quite digitized, but closer to that than not.
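As a toy illustration of “neurons integrating inputs and firing or not” (made-up weights and threshold; real neural dynamics are far messier):

```python
import numpy as np

def neuron(inputs, weights, threshold=1.0):
    """Toy unit: integrate the weighted inputs, then fire (1) or stay silent (0)."""
    return 1 if np.dot(inputs, weights) >= threshold else 0

# Hypothetical spikes from three upstream neurons and their synaptic weights.
print(neuron([1, 0, 1], [0.6, 0.9, 0.5]))  # 1.1 >= 1.0, so it fires
print(neuron([0, 1, 0], [0.6, 0.9, 0.5]))  # 0.9 <  1.0, so it stays silent
```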

The part that matters is how those patterns of data processing are set up. I doubt AI replicates the patterns that we have, but since our understanding of each is so incomplete, who can say?

That seems trivial to address. Tell it to ask itself the questions about this sort of data that a human scientist or artist would ask, and to answer them.
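A minimal sketch of that idea, assuming a hypothetical llm() helper that wraps whatever chat API you use (none of the names below are real library calls):

```python
# Hypothetical stand-in for a chat-completion API; wire up to taste.
def llm(prompt: str) -> str:
    raise NotImplementedError("connect this to your model of choice")

def self_questioning(task: str, rounds: int = 3) -> str:
    """Have the model draft, interrogate its own draft, and revise -- the
    'ask itself the questions a human scientist or artist would ask' loop."""
    draft = llm(f"Produce a first attempt at this task:\n{task}")
    for _ in range(rounds):
        # Ask the model to play the curious human...
        question = llm(
            "What is the most interesting 'what happens if I do this?' "
            f"question to ask about the following draft?\n{draft}"
        )
        # ...then answer its own question by revising the draft.
        draft = llm(
            f"Revise the draft to explore this question.\n"
            f"Question: {question}\n\nDraft:\n{draft}"
        )
    return draft
```

Whether that loop produces anything a human would call extrapolation is, of course, the question this whole thread is arguing about.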

The people who actually know know they don’t know and generally say so. Mostly though they are too busy figuring out ways to stay ahead in the race and finding best ways to apply the new tools to boost productivity. To profit the most.

These paragraphs seem contradictory. Does the training database consist of bland or dissonant music? It seems to me that bland and dissonant music would be the AI extrapolating from its training set.

A large amount of AI image output is generic or badly flawed, but a large amount of it isn’t, and people have to put some effort into “pixel peeping” in order to find some imperfection that they can insist makes the image horrible.

I think a lot of the hatred of AI comes from the same impulse that leads to supporting labor unions. Periodically some group goes on strike and suddenly a large swath of the general public appears to really care if Chicago cab drivers continue to be provided with free swizzle sticks in their break room. Using AI seems like “scabbing” to them, and is probably more offensive as quality improves, not less. They’re singing “Look for the human label…”.

True. And you could say the same for most of the stuff on, say, DeviantArt. I think more AI stuff makes it out into the wild because of basic novelty. Even if it’s visually bad, someone is still amused that they made “Joe Biden Eating Cake While Riding A Unicorn” and feels the need to share that with the world. So the world gets to see an AI eyesore of Biden on a unicorn and thinks “This is what AI art looks like”.

It’s not exactly that, obviously. But IMO it’s a closer analogy to a human being inspired by existing music to create new music.

It is fundamentally copying the training data. Training throws the data into a black box, and the algorithm then pulls it back out and gives it to you, combined in clever ways based on your prompt, but it’s not creating anything new. It’s not “inspired” by the training data; it’s literally just giving it back to you, as demonstrated by things like the “ghost” watermarks image generators will sometimes produce when an image is based on watermarked training data.
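For what it’s worth, here is a toy caricature of the “it just gives it back to you” claim (random stand-in ‘images’, nothing like a real diffusion model): a ‘generator’ that blends its nearest training examples, so that a watermark shared across the training set survives into the “new” output:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.random((10, 8, 8))  # ten toy 8x8 "training images"
train[:, 0, :3] = 1.0           # a shared "watermark" patch in every image

def generate(prompt_vec, k=3):
    """Caricature 'generator': score each training image against the prompt
    and return a blend of the top-k matches -- retrieval, not creation."""
    scores = train.reshape(10, -1) @ prompt_vec
    top = np.argsort(scores)[-k:]
    return train[top].mean(axis=0)

out = generate(rng.random(64))
print(out[0, :3])  # [1. 1. 1.] -- the watermark "ghosts" through into the output
```

Real image models don’t store their training sets like this, but the ghost-watermark observation is the argument that, functionally, something similar is going on.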

Another analogy might be taking those tracks and combining them in a clever way with sophisticated audio-mixing software. That’s still a derivative work, and if you pass it off as your own without giving credit to the original artists you will get sued.

It obviously can, but why is that significantly different from me “extrapolating” by taking two tracks made by someone else, running them through Auto-Tune, changing their speed, and then mixing them together?

In that case I’ve made a derivative work, and I need to get permission from the original artists before I sell my “new” track.

The way the AI combines the training data to make the new track is more complicated and cleverer, but it’s fundamentally the same operation IMO.
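For concreteness, the manual version of that operation is only a few lines with standard audio tooling (librosa and soundfile are real libraries; the file names here are hypothetical):

```python
import librosa
import numpy as np
import soundfile as sf

# Hypothetical input files standing in for the two borrowed tracks.
y1, sr = librosa.load("track_a.wav", sr=22050)
y2, _  = librosa.load("track_b.wav", sr=22050)

# "Extrapolate": speed one track up, pitch-shift the other...
y1 = librosa.effects.time_stretch(y1, rate=1.25)        # 25% faster
y2 = librosa.effects.pitch_shift(y2, sr=sr, n_steps=2)  # up a whole tone

# ...then mix them together and normalize.
n = min(len(y1), len(y2))
mix = y1[:n] + y2[:n]
mix /= np.max(np.abs(mix))

sf.write("derivative_work.wav", mix, sr)  # still a derivative work, however clever
```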