Straight Dope 1/27/2023: Could artificial intelligence replace journalists?

Never. We would need totally new laws to cover that. If we are making determinations based on how the law is written, there is nothing that would grant it rights. That seems pretty unambiguous.

That can change in the future, but we can only speculate about that.

Businesses will have rights before that ever happens.

To see what I’m getting at, imagine you publish a newspaper. I invent a wonderful machine which can synthesize an original newspaper article, but in order to do so it must physically consume ten print articles on the same subject, five times each. I start a business where the model is to subscribe to ten newspapers, Xerox each of them four times, then feed it all to my machine. Then my machine spits out type for me to send to my editor. The resultant newspaper can undercut your paper because rather than pay a columnist for coverage, I need only subscribe to 10 newspapers and create forty copies.

Is my copying of your newspaper in the process of creating a competing product fair use? Or is it a violation of your copyright?

~Max

Some of this agreeableness can be reduced with the right prompt. For instance:

What is the etymology of OK? Please be as fact-based as possible and not engage in speculation.

The origin of the word “OK” is somewhat uncertain, but it is generally thought to have come from the expression “oll korrect,” which was used as a playful way of suggesting that something was not quite right but was still acceptable. This expression was popularized in the 1830s by a group of people known as the “OK Club” in Boston, Massachusetts. It is also believed that it was used as an abbreviation of “Orl Korrect” or “Oll Korrect” which was used as a humorous misspelling of “all correct” in the 1830s and 1840s in the United States. Another theory is that it came from the Choctaw word “okeh” which means “it is so.” However, this is not well supported by historical evidence.

Still a little wishy-washy, but it only threw out one alternative theory and said it was not well-supported.

I think the agreeableness problem is not fundamental. OpenAI trained their ChatGPT to be this way, but AIs built to meet the demand for more factual and less friendly answers will be trained differently.

You just described Huffington post without the AI.

I don’t know. I would expect an AI to need to train at least partially off of usage, to see if it’s giving results that people value. Every other AI I’ve seen does this. And people have an inherent bias towards statements that agree with what they already think.

That said, it’s not really the agreeability on its own that I see as the problem. It’s just one of the ways that the AI can appear to be correct when it is not. The current system may not answer your questions directly, but incentivizes you to check the sources to get the answer. Something directly answering your question wouldn’t do that, even if it includes its sources.

I’m wary of this idea of using this technology to get correct answers, instead of just to help act as social lubricant.

Heck, we have that problem with real people. Even here, sometimes. Can you imagine how much ignorance would disappear if people didn’t blindly trust what they read online, or what they see on TV?

~Max

I can’t find where I’ve read this before. But I also can’t find any talk about copyright violations that doesn’t seem to assume distribution. When I read up on fair use doctrine, it only seems to talk about distribution. When I read up on cases of copyright infringement, I only see cases that have to do with distribution.

For example:

How much of someone else’s work can I use without getting permission?
Under the fair use doctrine of the U.S. copyright statute, it is permissible to use limited portions of a work including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports. There are no legal rules permitting the use of a specific number of words, a certain number of musical notes, or percentage of a work. Whether a particular use qualifies as fair use depends on all the circumstances. See, Fair Use Index, and Circular 21, Reproductions of Copyrighted Works by Educators and Librarians.

Even when I find something that lists the other parts of copyright beyond distribution rights, all of the other rights talk about things that are only illegal if distributed.

You clearly can make notes or translate something for personal use. You can alter your own personal copies of a book.

So I turn it around on you. Can you provide citations that personal use is an aspect of fair use? Can you provide examples (beyond decryption) where a copyright infringement did not involve distribution?

Meanwhile, I type the same query (cut & pasted from yours), and it spits out the correct etymology, but is even more wishy-washy. You ask it the same question a few times, and sometimes it gives you the right answer, sometimes half-right, sometimes totally wrong. (Which is to be expected, as I assume there is some random seed in its response generation.)

What is the etymology of OK? Please be as fact-based as possible and not engage in speculation.

The origin of the word “OK” is uncertain, but it is likely a shortened version of “oll korrect,” which was a playful misspelling of “all correct” popularized in the 1830s. It may also have originated from the Choctaw Indian word “okeh,” which means “it is so.” Another theory suggests that it comes from the German word “auch kein” meaning “also no” or “nothing either”. These theories are all considered plausible, but there is no conclusive evidence to prove the true origin of the word.

I asked it another time and it mentioned a Scottish derivation, as well.

As your cite notes, reproduction is one of the statutory rights afforded exclusively to the copyright holder in the general case. Fair use is a statutory exception that covers pretty much all of the personal uses you mention.

Fair use.

You can also sell a book which you have purchased, and thus redistribute it, despite the fact that copyright owners have the exclusive right to distribute their works. Copyright attaches to the work, which is intangible, not the copy, which is a particular material object a work has been fixed in. Property rights attach to a copy.

That was the main holding of the Betamax case.

I make a copy of a CD for personal use, then resell the original CD.

~Max

You’ll know AI journalists have surpassed human journalists the moment you scroll to an article with the byline Ima Botski and think, alrighty, this will be a good read, as opposed to seeing the byline Billy Manski and thinking, f$&k, I’m not reading this crap.

ChatGPT and others are “fine tuned” with human input. That’s why it takes effort to make it say racist/bigoted stuff–it’s not like there’s any shortage of training material available, but it went through a fine-tuning process to downplay that aspect.

I think it’s clearly been tuned to be agreeable, too. That’s what you want when you ask it to come up with a story or poem. And open-ended questions, or matters of opinion, etc. That’s usually not desirable for specific factual questions, but ChatGPT doesn’t seem to have been created with that in mind.

Even with fine-tuning, I’d agree that the current architecture isn’t great for factual questions, since it seems to have no means of citing its sources, etc. But it absolutely could do a better job at simply refusing to answer when it doesn’t know something, or being less likely to engage in speculation.

Interesting. Yes, there is definitely randomness–generally called “temperature” in this context–where it sometimes selects a word that’s not quite at the top of its list of predictions. I’m somewhat surprised at the variation in its wishy-washiness, though.
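To make the “temperature” idea concrete, here is a minimal sketch of generic temperature sampling. It is not ChatGPT’s actual code or settings; the candidate words, their scores, and the function names `temperature_probs` and `sample_next` are all made up for illustration. Low temperature concentrates nearly all the probability on the top word, while high temperature spreads it out so lower-ranked words get picked more often.

```python
import math
import random


def temperature_probs(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_next(words, scores, temperature, rng):
    """Pick the next word; temperature 0 is treated as 'always take the top word'."""
    if temperature == 0:
        return words[scores.index(max(scores))]
    probs = temperature_probs(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Made-up candidate words and scores, for illustration only.
words = ["Two", "Only", "Vermont"]
scores = [2.0, 1.5, 1.0]

rng = random.Random(0)  # fixed seed so repeated runs pick the same words
for t in (0.2, 1.0, 2.0):
    probs = temperature_probs(scores, t)
    picks = [sample_next(words, scores, t, rng) for _ in range(5)]
    print(t, [round(p, 2) for p in probs], picks)
```

With an unseeded generator, each run can pick different words, which is the same kind of variation you get from hitting “regenerate answer.”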

This seems to be the part of your post that actually gets at what I asked for. Your other examples all involved distribution, and I was already fully aware that my links talked about reproduction rights. The problem was that they only talked about them in reference to distribution. They did the same for fair use, not discussing private uses at all.

But, yes, it does seem that the Betamax case, the one that coined the term “fair use,” talks about personal use, albeit only in a very limited situation of time-shifting over-the-air broadcasts. Still, that personal use exception does not seem to be mentioned anywhere in fair use doctrine that I can find. You’d think it would be under the first prong, the one about the character of the use. Why isn’t “was it only for personal, private use” one of the tests?

I still contend that, since the copies that are used in these machines were legally obtained, and no further copies are made that were illegally obtained, there is no copyright infringement. The idea that the output of the AI is a derivative work doesn’t hold up for me, because none of the copyrightable features of the original work are present in the output.

And, no, I don’t see how your example with the newspaper would be an infringement, either. Not unless the competing newspaper contained some of the copyrightable material from the other newspapers.

Middle-ish in the other ChatGPT thread, I had repeated someone’s query about what presidents were born in Vermont. Coolidge and Arthur is the answer. The first time it answered:

Vermont has not had a President born in the state. The only U.S. President with ties to Vermont is Chester A. Arthur, who was born in Fairfield, Vermont, but moved to New York as a child.

Then I hit “regenerate answer”:

Only one U.S. president, Calvin Coolidge, was born in Vermont. He served as the 30th president from 1923-1929.

But third time’s a charm:

Two presidents were born in Vermont: Chester A. Arthur and Calvin Coolidge.

After it selects a word from its likely list, the next word it picks will be based on all the previous words in its output, and most strongly on the word it just chose. Once it starts down a less-than-good path, forever will it dominate that output’s destiny. Consume it, it will.

Sometimes I’ll watch ChatGPT carefully as it generates, and when I see it pick a definitely incorrect word, stop its output right there. I know it’s unlikely to recover and give me what I want after that.
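A toy sketch of that path dependence, under heavy assumptions: the `NEXT_WORD` table and the `generate` function are invented for illustration, and this toy looks only at the previous word, whereas a real model conditions on everything written so far. The point is just that once the first word is chosen, the rest of the answer is locked in.

```python
import random

# Hypothetical next-word table: previous word -> candidate next words and weights.
NEXT_WORD = {
    "<start>": {"Two": 0.5, "Only": 0.5},
    "Two": {"presidents:": 1.0},
    "presidents:": {"Arthur and Coolidge.": 1.0},
    "Only": {"one:": 1.0},
    "one:": {"Coolidge.": 1.0},
}


def generate(rng, forced_first=None):
    """Build an answer word by word; each pick depends on the word before it."""
    words = []
    current = "<start>"
    while current in NEXT_WORD:
        options = NEXT_WORD[current]
        if current == "<start>" and forced_first is not None:
            current = forced_first  # pin the opening word to show its effect
        else:
            current = rng.choices(
                list(options), weights=list(options.values()), k=1
            )[0]
        words.append(current)
    return " ".join(words)


rng = random.Random()
print(generate(rng, forced_first="Two"))
print(generate(rng, forced_first="Only"))
```

In this toy, an answer that opens with “Only” can never reach the two-president answer, which is why stopping the output at the first bad word and regenerating is a reasonable habit.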

Good article about ChatGPT in the classroom:

https://alperovitch.sais.jhu.edu/five-days-in-class-with-chatgpt/

Long story short: Having students in class keep a ChatGPT window open to answer questions during the lecture was a big help to everyone.

We have already reduced employees dramatically on factory floors. There is still room for some reduction with AI, but not all. You need people there to handle it when the machines break down, to handle abnormal situations, etc.

For an example of why you can’t get rid of people, I recently worked on a factory automation process to handle ‘exception’ flows. For example, a line makes multiple products. Currently it is producing product A, but a machine necessary for A breaks down. You can’t afford to idle the line while the machine is down, so humans have to go in, remove all the partially finished ‘A’ products, load the bins with material for ‘B’, and start producing ‘B’. This causes all sorts of issues with MES systems: counters that were counting ‘A’ have to be reset, inventory systems tracking ‘A’ have to adapt, etc. But most importantly, there is physical one-off work that has to be done, unique to each situation.

The ‘dark’ factory consisting of nothing but automation is the holy grail of manufacturing execution. Elon Musk thought he could do it at Tesla, and failed. There are just too many little nooks and crannies that require human, physical intervention. It will remain that way for a long time.

Even if an AI could spot all the problems and knew what to do with them, factory machinery is not up to the task. Factories are incredibly hard to create, and once one is working they tend to not get changed for a long time because even small changes can cause problems in complex environments. There are factories out there still using machinery built in the 1800s. That’s the real limiter: the time it takes to change over the physical infrastructure to be compatible with AI.

Years for sure. After that, it gets a lot fuzzier. But when it comes to changing over infrastructure, it has to be decades because this stuff just takes a long time. I worked on a project just to digitize the information in a factory so it would be ready to automate (getting rid of all paper processes, finding the ad-hoc processes and formalizing them, capturing information residing only in the heads of senior factory workers, etc.). That project had been ongoing for a couple of years when we joined it, we worked on it for a couple more, and it was still ongoing when I left.

People like me will get hit this time around. You still need the blue collar guys to physically maintain the plant and handle exceptions, but the people who do statistical process control engineering, analysts, inventory specialists, and other white collar workers are much easier to replace with AI.

How is that any different than a writer for a small paper reading four stories in the major papers then writing his own based on them? That happens every day. Everyone leans on everyone else’s work, all the time.

If I’m an aspiring painter and intensively study Van Gogh right down to his brushwork, and therefore paint stuff in a way that is evocative of Van Gogh’s style, how is that any different than when an AI does it? There are plenty of painters out there who are recognized as following the styles of others, often very closely. This is sometimes lauded: “He’s a great painter. He studied under Baudin, and you can see the influence in his work.”

Copyright shouldn’t be about the process used to create something, it should be about whether the work is original or copied from someone else. The real problem artists are facing is that the AI can do this fast and at scale. But the process and output are no different than if a human uses the style of another artist to produce a unique work.

There are enough ChatGPT users here now that we could try a fun project: A group writing project in which we take turns generating the next section of the story or article by designing a prompt and feeding it to ChatGPT to write the next section.

Anyone interested? I’m thinking we make up a list of participants, put them in order, then when it’s a person’s turn they go to ChatGPT, prompt it to write the next section. Part of the prompt might be to feed it what was already written so it knows what to do next. It would be like an improv - when it’s your turn you get to decide where the plot goes, and the next person has to follow your lead. Or, we could plot out a story or article first, perhaps with the help of ChatGPT, build a character list, then part of the input to ChatGPT for each section would include the plot summary and character summary and then what you want your section to be about.

I would also suggest we include the prompt we used, hidden in a spoiler box. That way others can learn from it and we can try to improve the output by improving the prompts.

If someone doesn’t have access to ChatGPT, they could message the prompt to me, I’ll run it and send the output back in a response message so you can post it.

Anyone interested?

A couple of things. I’m not talking about having no one at all inside the building, but I am talking about having no one on the factory floor for the purposes of building things. As you pointed out, it will be a good while before breakdowns and exceptions can be handled, but I don’t think it will be as long as you do.

You can also make AI compatible with the existing infrastructure.

It takes a long time when there isn’t much incentive to change. If there is an incentive to change, whether that be due to labor shortage or just labor getting expensive, change happens much more rapidly.

If someone comes out with a robot that has humanoid arms and hands, and the dexterity to use them, that’s going to make it much easier to convert. You won’t need to make much accommodation for them, since they will be using the same tools and processes as the humans were.

For a while longer, but not that long, certainly not forever.

Yeah, most of them are toast. In fact, most things that require a college degree will probably be the next tranche of jobs that are eliminated, as once you have learned to give the knowledge and skills to an AI, you can replicate that as much as you want, and they can work 24/7 without making mistakes.

But inside of a century for sure, pretty much all of us will be superfluous; there will be nothing that a human can do that an AI can’t do cheaper and better.