Many of the companies producing those systems, however, train them on text taken from the internet. That may lead to a feedback loop in which the same AI systems being used to produce that text are then trained on it.
I just searched for the expected wait time for a US governmental vetting process. The automatic AI summary at the top of the search results says “30 days.” That answer appears to derive from statements on third-party sites, many of which are grossly outdated. The actual answer, from the actual governmental site, is 12-24 months.
Yeah, that’s probably a worse case than the OP’s. With the OP’s question, they knew they hadn’t gotten an appropriate answer, and had the choice of digging deeper to find a good one, trying some other resource, or giving up. Your example, on the other hand, looks like a perfectly reasonable answer, which could easily mislead someone into thinking it’s correct and not looking any further at all, and that could have disastrous consequences if they need that vetting done by some deadline and miss it.
Those AI summaries sometimes make me so mad. They are usually either obvious or erroneous. And they’re everywhere now! I just saw they are now the top response on Quora (yes, I know). In the one I just looked at, the AI answer was not only completely wrong but actually contradicted by several of the real answers below it.
For no other reason than that this was a fairly long post, I asked ChatGPT to summarize it. This is what I got:
Large language models (LLMs) like ChatGPT aren’t truly intelligent in the way humans think of intelligence. They are sophisticated versions of technology that allow visual AI to identify patterns, such as recognizing clouds from a set of images. LLMs are specifically designed to handle language data, enabling them to process a prompt and produce coherent and grammatically correct responses.
However, LLMs lack the ability to evaluate which response to a prompt is better or more valid. They generate answers that fit the prompt based on their training data but don’t inherently judge or validate the information. Similarly, they don’t draw new conclusions; they can only reflect relationships or conclusions present in their training data.
The quality and nature of their output depend entirely on the data they’re trained with. For instance, an LLM trained solely on a specific political ideology will produce responses consistent with that ideology. If trained on diverse viewpoints, it may present both perspectives without favoring one over the other.
LLMs aren’t capable of understanding deeper relationships or concepts; they merely process and regurgitate information in a structured way. When asked about European cities at the same latitude as Portland, Oregon, an LLM can provide a sensible answer about latitudes and weather but doesn’t truly understand the underlying relationships between these concepts. This limitation is due to their design, which focuses on language processing rather than true comprehension or reasoning.
That seemed fairly accurate, but verbose, so I asked it to be more brief, and it produced:
Large language models (LLMs) like ChatGPT aren’t genuinely intelligent but are advanced pattern-recognition tools designed to handle language data. They generate coherent responses based on their training data but lack the ability to evaluate the quality or validity of those responses. Their output reflects the information they’re trained on, without forming new conclusions or opinions. LLMs process prompts and provide structured answers without truly understanding the underlying relationships between concepts.
Seems to be rather ironic that it produced an accurate summary of a diatribe claiming that AIs can’t summarize worth shit.
I’m well aware of the many limitations of LLMs and the reasons for them, and I’ve run into many myself. I just think it’s a remarkable advance in natural language processing and interactive information retrieval, one that’s been improving very rapidly over timespans of mere months. We’ve come a very long way from when AIs couldn’t understand idiomatic expressions or real-world contexts; now they not only interpret them correctly but can explain them.
Here’s a good example of both sides of this argument. ChatGPT actually did an impressive job overall of summarizing this abstract from a technical paper. On the negative side, it made a couple of mistakes – but they’re very much the kind of mistakes that a human might have made.
Original:
The globally integrated sea-air anthropogenic carbon dioxide (CO2) flux from 1990 to 2009 is determined from models and data-based approaches as part of the Regional Carbon Cycle Assessment and Processes (RECCAP) project. Numerical methods include ocean inverse models, atmospheric inverse models, and ocean general circulation models with parameterized biogeochemistry (OBGCMs). The median value of different approaches shows good agreement in average uptake. The best estimate of anthropogenic CO2 uptake for the time period based on a compilation of approaches is −2.0 Pg C yr−1. The interannual variability in the sea–air flux is largely driven by large-scale climate re-organizations and is estimated at 0.2 Pg C yr−1 for the two decades with some systematic differences between approaches. The largest differences between approaches are seen in the decadal trends. The trends range from −0.13 (Pg C yr−1) decade−1 to −0.50 (Pg C yr−1) decade−1 for the two decades under investigation. The OBGCMs and the data-based sea–air CO2 flux estimates show appreciably smaller decadal trends than estimates based on changes in carbon inventory suggesting that methods capable of resolving shorter timescales are showing a slowing of the rate of ocean CO2 uptake. RECCAP model outputs for five decades show similar differences in trends between approaches.
ChatGPT summary:
From 1990 to 2009, scientists studied how much human-caused carbon dioxide (CO2) is absorbed by the ocean using various models and methods. They found that, on average, the ocean takes in about 2.0 billion tons of CO2 per year. This absorption rate can change yearly due to large climate changes, with a variability of about 0.2 billion tons per year. However, different methods showed varying trends over the decades, with some showing a decrease in the rate of CO2 uptake. Models focusing on shorter timescales suggest that the ocean’s CO2 absorption rate is slowing down.
I thought it was generally a pretty competent summary for a lay reader who may not have understood the original, but there were two mistakes. I like the way it understood that “−2.0 Pg C yr−1” meant 2 petagrams per year and translated that to tons, but 2 petagrams is 2 billion metric tonnes, whereas “tons” would conventionally be interpreted to mean US tons, so 2 Pg would equal about 2.2 billion tons. And the “C” refers to the carbon component of that CO2, not the total mass of the CO2 itself, an understandable mistake that a human might have made, too.
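Just to spell out the arithmetic behind those two tweaks, here’s a minimal Python sketch; the conversion factors (metric tonne, US short ton, and the ~3.66 CO2-to-carbon mass ratio) are standard values I’m supplying, not figures from the paper:

```python
# Sanity check of the unit conversions discussed above.
# Standard conversion factors, not taken from the paper itself:
PG_IN_KG = 1e12              # 1 petagram = 1e15 g = 1e12 kg
TONNE_IN_KG = 1_000.0        # metric tonne
SHORT_TON_IN_KG = 907.18474  # US (short) ton
CO2_PER_C = 44.01 / 12.011   # mass of CO2 per unit mass of carbon (~3.66)

flux_pg_c = 2.0  # the abstract's -2.0 Pg C per year, taken as a magnitude

kg_c = flux_pg_c * PG_IN_KG
print(f"{kg_c / TONNE_IN_KG / 1e9:.1f} billion metric tonnes of carbon")    # 2.0
print(f"{kg_c / SHORT_TON_IN_KG / 1e9:.1f} billion US tons of carbon")      # ~2.2
print(f"{kg_c * CO2_PER_C / TONNE_IN_KG / 1e9:.1f} billion tonnes of CO2")  # ~7.3
```

So ChatGPT’s “about 2.0 billion tons of CO2 per year” was off on both counts: the figure is 2 billion metric tonnes (about 2.2 billion US tons) of carbon, which corresponds to roughly 7.3 billion tonnes of CO2.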
So, a useful summary that needed two tweaks. Where that falls on the spectrum between a perfect summary and gibberish is a matter of judgment, but it would be pretty hard to dismiss as “gibberish”. If I had asked a general-studies student to summarize that paragraph in their own words, I’d probably have downgraded them to a “B” or “B-” for those two mistakes.
And if I had been looking for a summary for a general audience, the ChatGPT output would have been useful – especially if the original text was quite long – as it’s much easier to tweak some minor mistakes than to write something from scratch.