I'm missing something about AI training

It has nothing to do with “rights”; it’s about copyright. Duplicating a copyrighted image by hand is also a copyright violation; being human or machine makes no difference.

You might think that, but centuries of legal precedent disagree. Humans have a fundamental right to “copy and store” data with their brains. Even for highly classified or illegal content that would send you to prison for a long time if you copied it with a machine, you are not committing a crime if you simply look at it, even though doing that absolutely stores it in your brain.

Yup, and it’s the act of copying with a machine, whether it’s a pencil, a JPG encoder, or ML model training code, that is the copyright violation.

And that’s something the pro-copyright people have long wanted to overturn. These are people who think that physical media should be banned and all electronic displays fitted with eye trackers to make sure that not a single person sees copyrighted material without paying for it. They think that reading over somebody’s shoulder should be illegal.

And I’m worried that the legal effort against so-called “AI plagiarism” could be the lever they need to push us into the dystopia they want. Especially in the present extremely pro-corporate political environment.

My favorite is their hamburger lie. So much wrong in that pic.

https://www.instagram.com/p/DLdBvCQgGjB/

“Got burgers to celebrate our first 2 albums being so well received :hamburger: :guitar:”

I’ve a feeling they’re all having a bit of a laugh. Or trying to make a point about AI and society.

LLMs are programmed to be very confident about the bullshit they’re generating.
(They take after their parents.)

All the “AIs” that tried were trounced by an Atari 2600 chess game.

So with almost unlimited computing power, LLMs are playing chess at the level of a 12-year-old.
People are just hardwired to be gullible.
AI doesn’t exist. These things are Google without ads, pretending to be smart.

After a few days of being called out, their “spokesman” says it was all a brilliant “art hoax.”

The Velvet Sundown, an obviously fictional “band” that’s gone viral after somehow racking up more than 500,000 monthly listeners on Spotify out of nowhere, used the generative-AI platform Suno in the creation of their songs, and consider themselves an “art hoax,” a band spokesperson reveals to Rolling Stone.

I’m not convinced this wasn’t just “we can make lazy money this way” with a lot of ass-covering after being called out, but if it was some planned gotcha, it feels pretty boring. Probably because it was so obvious. I never listened to the music, though; I just saw the obviously AI-generated images they were trying to pass off as photographs.

From the “spokesman” (can you have a band spokesman if there is no band?):

People have this idea that you have to please everybody and you have to follow the rules. And that’s not how music and culture progress. Music and culture progressed by people doing weird experiments and sometimes they work and sometimes they don’t.

Weird rules-pushing experiments like generating generic ’70s soft rock. I’d be more curious about it as an “experiment” if some established artist were to set this up, do it more competently (at least use a better AI image generator for the photos), then pull the cover off saying, “Ah HA! See how Spotify just lets anyone make fake music to get rich while real working musicians struggle?” or similar.

AI music, at least the stuff I’ve encountered and messed with, does have a signature sound to it. Kind of like that glossy sheen of AI-generated imagery. Hard to explain. I haven’t listened to this band’s music.

If you haven’t played with Suno, though, and are curious, give it a whirl. I was absolutely gobsmacked at how good its music generation is. Is it a bit formulaic? I suppose it has to be, by definition, but it strings together a song more quickly than I ever could. And, honestly, it does sometimes surprise me. Great for mining ideas for one’s own human-played music.

I mean, you’re welcome to believe what you want to. I’m guessing most of your statement hinges on what “AI” is, and I suspect we have different definitions. I’m definitely not talking AGI. What we have here today is mind-bogglingly insane. And I am starting to get a little worried about it, despite my boosterism. Part of me almost wants to go back to pre-November 2022, before ChatGPT 3.5, despite all the real-world usefulness I get from LLMs.

I played with it briefly but felt like I personally didn’t have a good enough foundation of music to do anything really interesting beyond “Write me an accordion folk song about frogs riding elephants”. Unlike images where I can formulate a prompt full of visual concepts, music has me saying “Uhhh… a love song? And, make it sad, I guess?” and probably getting the musical equivalent of the glossy sepia uncanny valley Velvet Sundown band photos.

It was an extremely stupid contest, and all the laughter over it was also extremely stupid. That a chess program beats a language model at playing chess is exactly as surprising and significant as a toaster beating a harmonica at cooking toast.

If I use a pencil to draw a copyrighted image, that’s a violation.

If I use a pencil to notate all of the information I would need to replicate the copyrighted image (using my considerable skill as an artist), is that a violation? I haven’t drawn it, but I could do so trivially any time I want. (Hold onto this; I’ll get back to it.)

In the computer world, we don’t share images; we share digital information that is used to create the image on a screen or a printout. The fact that what’s copied is a string of digits and not a picture doesn’t make it less of a copy. However, when Disney’s servers send an image of Mickey to me, my computer must store a copy somewhere, and I’m under no obligation to delete that copy.

Just like when I buy a DisneyWorld ticket with Mickey’s image on it, I don’t have to throw it out; that copy is mine, the copyright holder gave it to me.

Where their rights come back into play is if I want to take the copy they gave me and make more copies of it.

I can’t photocopy my ticket and sell the image of Mickey. I can’t copy the image by hand and sell it. My computer can’t send copies of Mickey.jpg to anyone. My computer can’t be instructed to create a file that is definitely NOT Mickey.jpg but generates an image that looks an awful lot like Mickey, and give it to anyone.

Now back up to my second question. If it’s OK for me to notate how to draw Mickey, is it OK for a computer to do the same? They are not creating a cohesive Mickey.llm file that can be converted back into an image; they are identifying what makes Mickey… Mickey. The circle ears, the white-gloved three-fingered hands, etc. Is THAT information a copyright violation?

Yup, trivially. If you write down the Huffman encoding of the Mickey Mouse image on a piece of paper, that’s mathematically identical to saving it as a JPG. That is absolutely “copying and storing” as defined by copyright law, whether you are using a piece of paper or a computer.
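For anyone who wants to see why that counts as storing: Huffman coding is fully reversible, so the bitstring (plus its code table) is the file in different clothes. Below is a minimal toy sketch in Python - not any real JPG encoder (real JPEG adds lossy transform steps before the Huffman stage), and the “image” here is just placeholder bytes standing in for a hypothetical mickey.jpg - showing the round trip is lossless:

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict[int, str]:
    # Build a prefix-free (Huffman) code table for the byte values in data.
    freq = Counter(data)
    if len(freq) == 1:                        # degenerate case: one distinct byte
        return {next(iter(freq)): "0"}
    heap = [[count, [sym, ""]] for sym, count in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]           # left branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]           # right branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

def encode(data: bytes, codes: dict[int, str]) -> str:
    return "".join(codes[b] for b in data)

def decode(bits: str, codes: dict[int, str]) -> bytes:
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = bytearray(), ""
    for bit in bits:
        buf += bit
        if buf in inverse:                    # prefix-free: first match is a symbol
            out.append(inverse[buf])
            buf = ""
    return bytes(out)

# Placeholder payload standing in for the bytes of a hypothetical mickey.jpg.
image = b"pretend these are the bytes of mickey.jpg" * 100
codes = huffman_codes(image)
bits = encode(image, codes)   # a long run of 0s and 1s you could copy onto paper
assert decode(bits, codes) == image           # reconstructed bit-for-bit
```

To decode you also need the code table written down next to the bits, which is exactly the kind of thing a real encoder stores in the file header. Same information, different medium.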

No, not writing out a JPG, writing out in human words how to draw Mickey.

If me writing a paragraph of notes defining what a picture of Mickey looks like is a copyright violation, there’s no point having a conversation about protecting copyright, because I want that right to go away, and copyright holders can go cry to someone else.

But that does NOT contain all the information needed to draw Mickey Mouse. Yeah, a human can draw an exact copy of Mickey Mouse from the description “an anthropomorphic mouse with black ears and nose, wearing shorts,” but only a human who has seen Mickey Mouse; someone who has never seen Mickey Mouse could not.

I agree the act of seeing Mickey Mouse is “copying and storing” it in the brain (in a very complex way we do not fully understand, but it’s copying and storing nonetheless), but it’s a very special type of copying and storing that is explicitly protected in law. If we are going to extend that protection to computer programs, which computer programs get those rights?

Sure, THAT description doesn’t contain all the information.

This description has a bit more meat to it. Add in some more specific instructions so it doesn’t rely on you seeing a photo of each step, and Bob’s your uncle. This is a navel-gazing theoretical exercise.

Which you came up with!

There is a bunch of case law stating in detail when just describing a work becomes copying a work, and when a work being inspired by another becomes a derivative work. I suggest you look at it if you want to answer your theoretical exercise.

That’s irrelevant to my point (Disney’s point in that lawsuit): if I write a computer program that scans in a copyrighted work and encodes it, I have definitely copied and stored it.

I disagree.
It shows that all the hype is just that.
It can produce what looks like a coherent sentence, but by the time it gets to the end of the paragraph, it forgets what the question was.
That shows it cannot handle any query more complex than “show me a recipe for apple pie, but gluten-free.”
If you add “I don’t like raisins, and please find me a recipe with less sugar,” you’re just as likely to get a recipe for cherry pie.
That means it performs slightly worse than a de-shittified Google.

All the rest is just showing how easily we are impressed.

Is it a nice search interface?
Sure

Does it translate stuff particularly well?
Meh

Is it a good spell check?
Sure

The only people who should be worried are people writing marketing fluff. It can really puke that out like nothing else. I haven’t seen it produce 2 lines of code that make sense - an old-school Stack Overflow search will point you in the right direction a lot faster. It reproduces Wikipedia in a nice format but is factually weak and cannot (on a conceptual level) recognize discrepancies or mistakes.
The images generated are all cartoonish. After you see 100 pictures or so, you start to recognize the soulless aspect and the stupid mistakes. (I am pretty sure I can spot any number of AI or heavily Photoshopped pictures in a stack of real photos.)

By way of analogy, please consider this scenario about humans:

  • A human person invents a thing - it’s a fine something that everyone needs, or at least wants, and it performs its intended task beautifully and efficiently.
  • The inventor goes into business to produce the thing and sell it - people want it and so far everything is looking great.
  • While that process is ongoing, a group of other human people quickly gear up to independently make their own version of the thing; they have access to the resources to do this much faster, and they have copious marketing exposure; they flood the space with ads (using images and product descriptions that are clearly just slight modifications of the ad copy and detailed spec from the original inventor); because they have the resources (and didn’t need to expend much effort on inventing and writing, etc.), they are able to undercut the original thing in price, capturing most of the market.
  • Without sufficient orders, the original inventor of the thing never quite gets the product to market, or does, but fails to break even because everyone is buying the cheaper and better-known version of the thing.
  • Also, it turns out the mass-produced knockoff of the thing is… well, just a bit shit. This version lacks some of the promised features, it isn’t very durable, it has sharp edges and very occasionally it explodes, injuring the user.
  • People are disappointed with the thing, but there’s no high-quality original version available, because the person who invented the original version is bankrupt.
  • The consumers lose; the inventor loses. The only people who gain are the people who don’t care at all about the product.

This isn’t really even a fictional scenario - this sort of thing already happens with a lot of products - and I think most people think it is bad that this happens, and would desire some way of restraining it from happening like this.

Right? I mean, is there anyone who would look at the above scenario and say “YES! That is what I want! I want a shitty knockoff! I want a world in which shitty knockoffs are the norm!”?

Also, even if it could code well, the fact that it doesn’t understand what it’s doing and can’t provide clear comments on what it’s coding means that what you’d end up with is black-box code that’s extremely difficult to understand, much less debug or modify.

Even if it were as good as a human programmer, that would mean you’d end up with buggy code that you can’t actually debug for all practical purposes. It’s not like human programmers can write complex programs without errors, after all; the difference is that they can go back and fix things. And they understand the purpose of the program, so they can actually tell if something is a bug in the first place.