Is Artificial Intelligence (AI) plagiarizing?

I think that is also the reason we see more test charts with watermarks of the “www.Al128.com” type …

for even the most blatant rip-off artist (who might work out of Mongolia, Namibia, or Belize, and thus be hard to stop legally)

See this from upthread:

Sometimes quantity has a quality all its own. That’s the difference between you learning to write better prose by carefully reading the whole NYT every day, and the AI learning to sell thinly disguised regurgitations of NYT’s work as its own.

OK, I have to say I read half of this thread and became very excited. Very interesting. So I can’t help myself from asking (before reading the last 30 posts): does AI know what is copyrighted and what is not? Isn’t that hard to figure out (if it even attempts to) for an AI? Is it looking for tags in the text about copyright, or how does it treat this?

Everything someone wrote is copyrighted by default (maybe with some exceptions I will leave to the lawyers here).

Your post is copyrighted (although I suspect the SDMB terms mean they own what you wrote…not sure, I have not read their ToS in a long time).

The original author does get some credit, at least, from the copy of the article on the author’s own site (which must exist, or the AI couldn’t have scraped it).

But with any listing of the top 5 products of some sort, everyone already knows that you can’t just look at any random listing, because there are obviously a lot of folks out there who have a strong incentive to make dishonest lists. You need to find your lists from a source that already has a good reputation. And while an AI scrapebot might steal the content of your site, it’s going to be a lot harder for them to steal your reputation.

Last I read (and this is fairly standard for internet forums and the like), you retain your copyright, but you grant the site a license to use it (including in a published book-- Some of the Straight Dope books contain excerpts from the early days of the SDMB).

I’m coming into this conversation late and waking up a somewhat old thread, but I’ve been thinking about this subject a lot recently.

Regarding the idea that a machine vacuuming up others’ work in order to learn is somehow plagiarism: search engines do this too, and in a much more plagiaristic way, since they retain and sometimes display snippets of the material, often without the authors’ consent. LLMs, by contrast, store only the statistical relations of tokens (words or parts of words).
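To make the “statistical relations of tokens” point concrete, here is a deliberately toy sketch (real LLMs learn weighted associations across billions of parameters, not literal pair counts, so take this as illustration only): a bigram model retains statistics about which token follows which, while the original sentence itself is not stored anywhere.

```python
from collections import Counter

# Toy corpus: whitespace tokenization for simplicity.
corpus = "the cat sat on the mat".split()

# Count each (token, next_token) pair observed in the corpus.
bigrams = Counter(zip(corpus, corpus[1:]))

# The "model" is just these counts; the original text is gone.
print(bigrams[("the", "cat")])   # 1
print(("sat", "on") in bigrams)  # True
print(("mat", "the") in bigrams) # False — this pair never occurred
```

You can recover plausible continuations from such statistics, but not (in general) the verbatim source document, which is roughly the distinction the post above is drawing.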

If copyright offices, courts, and legislators make rules against LLMs because of their scanning of copyrighted material, will that then make Google, Bing, etc. illegal operations?

Regarding AI art: are AIs really copying other people’s work or are they simply developing the concept of “cat”, “dog”, “tree”, etc.? Are they, in a way, discovering things comparable to Plato’s perfect forms? Is that plagiarism?

As a practical matter, the development of AI seems to me to be the next logical step in humankind’s technological evolution. And, considering the massive problems the human species currently faces, we may need a bit of “artificial super intelligence” to solve some of these problems.

Such an intelligence requires, almost by definition, a massive amount of data, such as the huge repository of information known as the internet. In fact the existence of something like the internet may have been a necessary stepping stone.

Also as a practical matter, if western democracies make rulings or pass laws that hinder the development of ASI, China, North Korea, and Russia won’t give a fuck about our copyright laws and could leapfrog beyond us technologically.

Humankind seems, for better or worse, to be on the verge of massive changes in nearly everything, whether we like it or not. We may be witnessing something bigger than the industrial revolution or the invention of agriculture. IP rights may become an unenforceable hindrance. I’m not at all sure that I like that, but it is what it is.

They are developing a concept. And they are able to take those concepts and portray them in combinations, poses, and media unlikely to match any specific image in the training database. Some AIs learn more accurate concepts for specific items than others. For example, I once tried “gorn wedding” in Meta’s Imagine AI, and it knew that a Gorn was something bipedal, green, and reptilian, but it didn’t know the specific details of what an actual Gorn looks like (in any of the Star Trek incarnations).

And it can take those concepts and put them into different poses, situations, and media from any likely images in the training set. For example, these reasonably accurate Bojack Horseman images that aren’t in the simple drawing style of the cartoon. (He was the only character Bing recognized by name; other characters had to be made from descriptions.)

Or these realistic-ish Garfield and Shrek images.

Or these various other images putting recognizable characters in novel situations–none are copy/paste jobs like working with clip art.

If it’s plagiarism then all human beings are plagiarists. If merely looking at something is plagiarism then functioning in the real world makes everyone a plagiarist, short of somebody kept in a dark closet since infancy.

Also, I find the constant attempts to make copyright ever more intrusive and oppressive much more frightening and destructive than (so-called) AI. People declare that AI will destroy art, but it doesn’t need to, because copyright is already doing so. Copyright is a blowtorch burning away human culture; a hundred years from now, nearly everything of modern culture will have been destroyed by it, AI or no AI.

Heck, if anything, copyright makes AI likely to be more pervasive, since so much of the human-made art that might compete with it is locked away by copyright, rotting away somewhere.

Of course. My question was rhetorical.

I would argue that copyright was a necessary evil. Sure, there are a few people who would create works for the love of it and are able to spend time doing so, but most people need to be paid for their work in order to sustain themselves.

Congress should never have allowed themselves to be lobbied into increasing the length of copyrights in order to protect Mickey Mouse. That’s what led to the “rotting away” that you described.

Consider the movie Frozen. Then consider a billion knockoffs of the movie and its characters and concepts, changed just enough to avoid whatever the copyright laws are. That’s an unusable world from the standpoint of a consumer. Hooray for you that you can command AI to do these things. So can everyone else, so can effectively industrial processes, arguably AI itself. It’s an unusable world of junk creations.

Gatekeeping is going to be absolutely essential in our future. Look, the AI spewed out this study with readable charts and text, but what is the data that went into the study? I don’t know; there are a billion other studies with slightly different or vastly different outcomes. Maybe I should read every last one?

We’re absolutely going to need to gatekeep in the sea of junk, and that could lead to totalitarianism or an Encyclopedia Britannica approach. But that’s quickly going to become more important than “look what the AI put out.”

Even without generative AI fraudulent studies are a growing problem that has not been well addressed.

With generative AI faked data is going to be harder for reviewers to detect, and completely faked articles available online may be impossible to distinguish from the real things.

We already have a real problem with the public trust of the scientific community and the scientific process. An increase in fake science getting reported as news, which I cannot see not happening, is going to kneecap public confidence even more.

I have no doubt that AI will be a great tool for scientific discovery AND the problematic issues need to be dealt with proactively as best as possible.

Gatekeeping may be considered essential but it may also be impossible. Like nuclear weapons, the cat is out of the bag.

Maybe we can use AI to find a solution, lol.

Oh, I don’t think it’s out of the bag at all. Someone will figure it out. Whoever can get people to listen. Which may be demagogues.

If you have a goal, it’s on you to figure out how to reach people. I think if it’s between cult-like things, demagogues, and simpler answers on one hand, or a sea of lies and junk on the other, you’re going to find people choosing the former. You can adapt, or believe that just because AI can spit out a bunch of junk, people have to consume it.

Who do you think can figure it out? I’m not anywhere near that certain. I do think the cat is out of the bag. Someone will use it, and it won’t be just to spew out fake research papers.

Trusted sources. Reputations. They’ve worked in the past.

Anyone is going to be able to fart out something with charts and words. That, in and of itself, is going to be worthless. You will need to have someone check your work. And we can’t check all the work, all of the crap that AI and AI-aided humanity can fart out by the oceanload each and every day. So you will have to establish, or re-establish, trusted sources and reputations and do, yes, gatekeeping.

Or we can just fart out crap with no quality control, and have the demagogues take over.

I just looked up the number of self-published books. In 2021 there were 2.5 million to 3 million, as compared to 500,000 to 1 million traditionally published books. Gatekeepers - libraries and book stores - help in filtering them for the average consumer. AI is going to make this worse, but beyond flooding Amazon (unless they put in a filter), it won’t change the market very much.
I suspect that before long there are going to be policies banning AI-generated papers. There is a thread here about the ones I have found already.

I think it was a necessary evil that was implemented almost entirely wrong, leaving all the evil and not much of the necessity. It’s overwhelmingly used to deny people the opportunity to partake of copyrighted things, which is the opposite of its supposed purpose.

And it’s applied in a very draconian and plutocratic way which is very relevant to the discussion of “What to do about AI”, since you can take it as a given that any attempt to “control” the use of AI by copyright will be heavily slanted to favor the rich and written in a positively dystopian way. Just like copyright law in general.

As a slight aside…

My YouTube recommendations are now showing new channels every day (channels with almost no views). Not on my subscription list. Maybe related to some interests I have. But almost all of the ones I have bothered to check out are clearly AI-produced. The person may have compiled the video, but the dialog and even the narrating voice are AI.

It has become so easy to do this that there is almost no quality in these videos. I constantly tell YouTube not to recommend them, but the next day there are ten more new ones.