Are AIs capable of using the scientific method?

Do I have to observe with my own eyeballs? Or can I just read the results from other observations? If so, can’t ChatGPT? I’m not getting your ‘observation’ limitation. It would be trivial to give ChatGPT access to all the data from an observatory, or even allow it to control the observatory.

Human scientists generally do not look through telescope eyepieces anymore. They use digital cameras and computers. ChatGPT can also use digital sensors, and it’s already a computer. Where’s the distinction?

I think you’re giving ChatGPT a lot more capabilities than it has. I get it though, new stuff — especially when it’s in the news — is exciting.

What is this supposed to mean? Is this the same strange suggestion that AIs cannot observe?

And the hypothesis that the star formed in an area already rich in heavy elements, with a test to look at the metallicity of stars that would have been nearby at the time of formation, or at supernova remnants in the path of the star, is not novel. I’m pretty sure it more or less cribbed this from one or more papers on the topic from Physical Review Letters, Annals of Astronomy and Astrophysics, International Journal of Astronomy and Astrophysics, and other online journals and conference papers, or maybe just extracted it from any number of pop science sites like LiveScience or New Scientist. So that makes it about as ‘intelligent’ and informed as the typical college sophomore in a “Science of the Universe” liberal arts elective looking to puff out an essay at 11:30 PM the night before it is due, which is to say not very.

Stranger

No. Not at all. I’m saying that coming up with hypotheses isn’t enough to claim one is doing “science”.

Can I imagine AI doing science some day? Sure. I can imagine tons of things; it’s theoretically possible. In fact, intelligence might be defined as the capacity to do science-like stuff.

Is ChatGPT capable of doing science? No.

How exactly does that work? You know ChatGPT is not a general AI, but a chatbot - that is, it imitates human writing using the statistical relationships in its training texts. Why would you give it control of an observatory or anything else? It wasn’t designed for and isn’t the tool for controlling anything, really. It’s a tool made to talk to humans using the accumulated text of 8 million documents and some training to exclude some taboos and include the type of responses people regard as useful text.

It might be helpful to revisit some of ChatGPT’s limitations, from OpenAI itself:

Up-to-date Knowledge

ChatGPT’s training data cuts off in 2021. This means that it is completely unaware of current events, trends, or anything that happened after its training.

It will not be able to respond appropriately to questions or topics that require up-to-date knowledge or information. For example, it may not know who is the president of the United States, what is the latest viral meme, or what day it is.

Verifying Facts

ChatGPT has no external capabilities and cannot complete lookups. This means that it cannot access the internet, search engines, databases, or any other sources of information outside of its own model. It cannot verify facts, provide references, or perform calculations or translations. It can only generate responses based on its own internal knowledge and logic.

Theoretical physicists, say, don’t observe directly. They take data from the observations of others, observations which these days are in digital form.
An AI would have to be trained to interpret these observations, but so would you and I.

As for generating hypotheses, it could look for correlations in data sets. Now, it is a common fallacy to consider these as proof of a real relationship, but they can be used as hypotheses, and a test for each hypothesis using new data can be generated. It might not be as creative in generating hypotheses as a human, but it could look over a lot more data.
This is done already - IIRC Walmart found a real correlation between sales of snow shovels and hot chocolate.
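
Just to make that concrete, here’s a toy sketch in Python of the “mine for correlations, keep them only as candidate hypotheses” idea. The sales table, column names, and threshold are all made up for illustration; this isn’t anyone’s real pipeline.

```python
# A minimal sketch (made-up data) of scanning a sales table for strongly
# correlated column pairs and treating each as a candidate hypothesis
# to be tested later on data that was not used to find it.
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 365
# Hypothetical daily sales; a "cold snap" drives both shovel and cocoa sales.
cold_snap = rng.random(n) < 0.2
sales = pd.DataFrame({
    "snow_shovels": rng.poisson(5, n) + 20 * cold_snap,
    "hot_chocolate": rng.poisson(30, n) + 50 * cold_snap,
    "sunscreen": rng.poisson(10, n),
})

candidate_hypotheses = []
for a, b in combinations(sales.columns, 2):
    r = sales[a].corr(sales[b])        # Pearson correlation
    if abs(r) > 0.5:                   # arbitrary screening threshold
        candidate_hypotheses.append((a, b, round(r, 2)))

# These are only leads, not conclusions; each still needs a test
# against new data before anyone claims a real relationship.
print(candidate_hypotheses)
```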

It would be trivial to fine-tune an LLM with data from the observatory archives. I’ll bet it’s already being done. They can also use public APIs to access external data. Bing’s new chat is essentially ChatGPT with the ability to search the web in real time.
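
For what it’s worth, a bare-bones version of “fine-tune a language model on a pile of archive text” can be sketched with the Hugging Face transformers and datasets libraries. The file name and the choice of GPT-2 as a stand-in model are purely hypothetical; a real effort would use a much bigger model and far more careful data preparation.

```python
# A minimal, hypothetical sketch of fine-tuning a small causal language model
# on a plain-text file of observatory notes ("observatory_notes.txt" is an
# invented file name used only for illustration).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # small stand-in for a production-scale LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "observatory_notes.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="obs-finetune", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```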

As for why you might want to…

I predict that we are going to see a lot of research using LLMs in the future.

New observatories like the Vera Rubin Observatory are going to be creating terabytes of data per day - far more than humans can look at. AIs can sift through this data looking for everything from comets to exoplanets. There’s no reason they can’t be trained to go through each day’s new data looking for anything out of the ordinary, then form a hypothesis and check previous data to validate it. This is actually something AI is really good at - spotting anomalous patterns in huge amounts of data.
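
As a toy illustration of that “flag anything out of the ordinary” step, here’s a sketch using scikit-learn’s IsolationForest on invented per-detection features. Real survey pipelines are vastly more sophisticated; this just shows the shape of the idea.

```python
# Toy sketch: fit an anomaly detector on features from previous nights,
# then score tonight's detections. Feature values and thresholds here are
# entirely made up for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical per-detection features: brightness change, position drift, color.
previous_nights = rng.normal(0.0, 1.0, size=(50_000, 3))
tonight = np.vstack([
    rng.normal(0.0, 1.0, size=(1_000, 3)),  # ordinary detections
    rng.normal(6.0, 1.0, size=(5, 3)),      # a few injected oddballs
])

detector = IsolationForest(contamination=0.001, random_state=0)
detector.fit(previous_nights)

flags = detector.predict(tonight)           # -1 = anomaly, 1 = normal
anomalous_rows = np.where(flags == -1)[0]
print(f"{len(anomalous_rows)} detections flagged for follow-up:", anomalous_rows)
```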

Correlation is not causation.

Observation is what makes science science. Otherwise it’s just masturbation (and that’s coming from someone with a PhD and years working in theoretical physics).

ChatGPT, specifically, wouldn’t be able to go through an observatory’s data in any meaningful way, because it’s not the type of data that ChatGPT was trained on. But another, similar AI could be trained on data of that sort.

I thought I had more or less said that. Data mining and being convinced that correlated items have a causal relationship is a fallacy, since there are so many factors that two of them will inevitably look correlated in any data set.
Using that as the hypothesis for an experiment is not a fallacy, and I’m sure you are aware that we can measure the probability that a correlation seen in a new experiment is due to chance.
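
To put numbers on that: the p-value from a standard correlation test is exactly that kind of measure - the probability of seeing a correlation at least that strong in the new data if the two quantities were actually unrelated. Here’s a small illustration with made-up data.

```python
# Illustration with invented measurements: quantify how likely a correlation
# in *new* data would be under pure chance, via the p-value from a
# Pearson correlation test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Pretend these come from a new experiment, collected after the correlation
# was proposed as a hypothesis.
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)   # weak real relationship plus noise

r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.2e}")
# A small p-value means a correlation this strong would rarely appear by
# chance if the quantities were unrelated; it still says nothing about
# which one causes the other.
```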

I think you’ve misunderstood - you’re using the specific name of an AI chatbot to refer to an entire type of AIs in general. Saying that ChatGPT should run an observatory is like saying people should use Microsoft PowerPoint to do their taxes. Sure, you’ve got the category right, and people should use a computer program as a tool to do their taxes, but it’s not the same kind of tool.

I was responding to this:

I’m aware that you won’t literally use ChatGPT, but you seem to be saying that it would take a general intelligence, and I’m saying that fine-tuning an LLM would seem to be fine. You dismiss them as just devices for spitting out statistical relationships between words, but that’s really describing the transformer. The neural network is much more general and can work fine with images or any other data - with fine-tuning you can get the capability to do much more analysis, while using the LLM to give you natural language programming.

Again, you have misunderstood. Chatbots are a way of spitting out statistical relationships between words. That’s why the first part of the name is “chat”.

And more to the point, the only knowledge a ‘chatbot’ trained in this way can have is of the relationships between words. That is to say, when it formulates a response to a question, it is not doing so by producing some internal conceptual understanding of the meaning of the question in the context of world experience (of which it has none), but by looking up how other people describe a zebra, or images that are labeled ‘zebra’, and deriving that the typical response is “black and white”. A human has a mental construct of what a zebra is, and can recognize it from many different orientations, lighting conditions, et cetera, even if they have not seen a zebra in a similar condition; a machine learning system only knows what it has been trained on, which is why they have such problems recognizing common objects presented in unfamiliar ways and have significant predictive biases depending on the training dataset.

What they do not have is the context that even a small child has about objects in the real world; they are purely operating on the semantics of words as presented in their datasets, not the context of the world that they exist in. If I said something like, “That car is a real dog!” a chatbot would almost certainly interpret that statement literally, as the car either being a kind of dog or ‘like’ a dog in some way, rather than understanding that I mean that it is a very slow car.

As for the “scientific method”, there is a wide array of interpretations of what that really means, and there is a narrow subset of those that could sort of describe how a machine learning system works, insofar as it takes in data and formulates a stochastic model of how likely another object or reference is to be similar or different in measurable or quantifiable ways. However, more expansive interpretations of the scientific method include the experimenter or observer using their experience to conceptualize a mechanism, and from that to formulate a hypothesis and falsifiable criteria, and I don’t think a credible argument can be made that machine learning algorithms do anything of that sort. They are building networks of statistical weighting without any real conception or spontaneous generation of hypotheses, so at best it is a kind of brute-force design-of-experiments system that can easily produce incorrect and often incomprehensible results, in the same way someone learning a new language by rote can use legitimate words in a way that is meaningless or completely contrary to the intent of the speaker.

This is not to say that a more powerful ‘true’ machine intelligence cannot develop more sophisticated models of the world and be able to make spontaneous theories and deductions based upon incomplete data (what we consider intuition), but that isn’t the way current machine learning systems function, and that would represent a vast revolutionary advance in machine cognition. The current machine learning systems are built from vast amounts of data, producing statistical connections between words or images, and it doesn’t take much deliberate testing to demonstrate that they lack the ability to intuit concepts that even small children can deduce with a fraction of the amount of ‘data’ or training.

Stranger

It actually does interpret that as a metaphor, though it thinks it’s more a metaphor for its poor performance or condition in general, not specifically slowness.

“Amazon had an experimental hiring tool that taught itself that male candidates were preferable, and penalized resumes that included the word ‘women’s’, and downgraded graduates of two all-women’s colleges. Meanwhile, another company discovered that its hiring algorithm had found two factors to be most indicative of job performance: whether an applicant’s name was Jared, and whether they played high school lacrosse.”

So, basically a hiring algorithm that replicates the thinking process of a frat bro.

Stranger

There is a LOT more going on than that. There is a huge ‘black box’ between the input and output that we only vaguely understand.

A problem with the ‘it’s only a statistical word generator’ argument is that these capabilities emerged. If ChatGPT were just a sophisticated lookup table implementation, we should have seen its abilities start from zero and get slowly better over time as more data is ingested. But that’s not what happened at all.

For example, GPT-3 could not do 2-digit arithmetic. They trained it up to 10^18 FLOPs of compute, and nothing. Same at 10^20 and 10^22. Every attempt at arithmetic was random noise. But suddenly, just over 10^22 FLOPs of training, the ability to do arithmetic just emerged. No one planned to have the thing do math, or even expected it. The same happened with the ability to translate, understand words in context, code, do poetry, and pass “Theory of Mind” tests. These capabilities all emerged at different scales after showing no capability at all below those scales, and they exist inside the black box that is the 175-billion-parameter neural net.

Focusing on the mechanism for spitting out the words seems wrong to me. Yes, the transformer that produces the output gets a list of probabilities for the next word from the black box. It chooses one of the probable next words with a bit of randomness, then the black box gives another set of words with probabilities for the next word. Repeat. Focusing on that process misses the fact that the probabilities themselves are generated in an incredibly complex black box we do not understand. Think of the transformer as the speech center, an I/O system attached to the LLM model. The model itself is doing something very complex and opaque to generate those word lists, and the abilities to do so are not part of some algorithm humans cooked up, but emergent properties of a very complex network trained on huge amounts of data using reward models and reinforcement learning from human feedback.
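
For anyone who hasn’t seen it spelled out, the sampling loop being described looks roughly like this toy sketch. The fake_model function is a stand-in I made up; in a real LLM those probabilities come out of the trained network, which is the part nobody fully understands.

```python
# Toy version of the loop: the "black box" hands back a score for each
# possible next word, we sample one with a bit of randomness (temperature),
# append it, and repeat. fake_model is a stand-in, not a real network.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def fake_model(context):
    """Stand-in for the network: return unnormalized scores for each word."""
    bias = 2.0 * (np.arange(len(VOCAB)) == len(context) % len(VOCAB))
    return rng.random(len(VOCAB)) + bias

def sample_next(scores, temperature=0.8):
    # Softmax with temperature: lower temperature means less randomness.
    probs = np.exp(scores / temperature)
    probs /= probs.sum()
    return rng.choice(len(VOCAB), p=probs)

context = ["the"]
for _ in range(8):
    idx = sample_next(fake_model(context))
    context.append(VOCAB[idx])

print(" ".join(context))
```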

So let me get this straight - you see Liang & Chi (the authors of the piece linked to, for those reading who haven’t clicked on it) ask the question:

Will new real-world applications of language models become unlocked when certain abilities emerge?

And you believe you know the answer regarding ChatGPT specifically (rather than language models generally), and that it is “yes”?

I also don’t see any mention of a “theory of mind” test passed in your link; searching that page for “mind” just turns up a reference to DeepMind. Is it in there somewhere?

Seems like a lot of work for you to go to, when I thought you just typed ChatGPT where you should have typed an LLM.

We have no idea what capabilities will emerge at larger scales. That’s the nature of emergence. It could be that these LLMs cap out in ability at a certain size, or maybe there are a lot more surprises in store. We have no idea when, or if, new capability will emerge.

I am using ChatGPT and LLMs somewhat interchangeably. Sloppy perhaps, but some people who have used ChatGPT don’t know the term LLM, and the underlying concept is the same for all of them. They differ in detail, but in general principle they are the same.

Here’s a paper describing LLMs developing theory of mind: