How do AI "programs" work?

Labeling is a big problem. AI companies are working on ways to make it easier.

Most self-driving systems, Tesla’s current version included, have as a first stage a system that converts raw (or close to raw) imagery into a high-level description of what it sees. There is a small car at these coordinates, a semi truck at these other coordinates, road lines here, a speed limit sign there, and so on.

Training the net requires a human to first perform the labeling manually: viewing the image, drawing boxes around things, and describing them (in terms the computer can understand). This is then fed into the net so that it learns to reproduce the same description.

This is incredibly tedious and error-prone work, so they’ve improved the approach in one way: once you get the AI partially trained, it can give you a best guess at what it sees. The human then fixes up any incorrect labels, and then the corrected data (if it needed correction) is fed back into the system. This reduces the workload substantially.
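
In rough code terms, that loop looks something like the sketch below. It’s purely illustrative: the function names are made up (there’s no public Tesla labeling API), but the shape of the workflow is the point.

```python
# Illustrative sketch of one human-in-the-loop labeling pass (made-up function
# names, not any real vendor's tooling): the partially trained model proposes
# labels, a person corrects only the wrong ones, and the corrected labels go
# back into the training set.
def labeling_round(propose_labels, human_review, unlabeled_images, training_set):
    for image in unlabeled_images:
        proposal = propose_labels(image)            # model's best guess: boxes + classes
        corrected = human_review(image, proposal)   # human fixes only the mistakes
        training_set.append((image, corrected))     # corrected data is fed back in
    return training_set
```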

But it’s still a lot of work, and worse, it’s not what you really want anyway. The labels only contain things that humans already identified as important. But that’s not necessarily what is important. As said above, really we want the computer to learn from patterns that we aren’t even perceiving. And even aside from that, the labels are pretty low-fidelity. It’s just not all the data that you’d want.

So Tesla’s latest (unreleased) system is end-to-end–there’s no intermediate labeling step; instead it takes video input and outputs the vehicle controls (steering, throttle, brake, etc.). Somewhere deep inside the AI, it must have something like the labeling–it’s still distinguishing between different elements of the scene, else it wouldn’t work at all–but it’s difficult to know what exactly it’s doing. And no human is involved for that step.

The only labeling, as it were, is just that they have millions of examples of how humans behaved in the same circumstances. One funny consequence that came up is that humans apparently only come to a complete stop at stop signs <0.5% of the time. “California stops” are, apparently, near universal. Their FSD system learned to behave the same way, so they had to explicitly feed it extra examples of people coming to complete stops for it to behave that way.

At a very high level, DPRK is right; these systems are “universal approximators”. There is some function that we want to be solved, which takes some input (whether video, text, or otherwise), churns on it, and produces some desired output. We give it examples of the input/output we want and the internal weights get adjusted to fit those examples.
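
To make the “universal approximator” idea concrete, here’s a minimal sketch in plain NumPy (a toy, nothing Tesla- or GPT-specific): a one-hidden-layer network whose weights get adjusted by gradient descent until its outputs fit example input/output pairs, in this case points on a sine curve.

```python
import numpy as np

# A tiny "universal approximator": one hidden layer of tanh units, trained by
# gradient descent to match example (input, output) pairs; here the function
# being approximated is y = sin(x).
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)   # example inputs
y = np.sin(x)                                        # desired outputs

hidden = 32
W1 = rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
lr = 0.1

for step in range(10000):
    h = np.tanh(x @ W1 + b1)            # hidden activations
    pred = h @ W2 + b2                  # network output
    err = pred - y                      # how far off the examples we are
    # Nudge every weight a little in the direction that reduces the error.
    grad_W2 = h.T @ err / len(x); grad_b2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h**2)
    grad_W1 = x.T @ dh / len(x); grad_b1 = dh.mean(0)
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

# Should end up far below the ~0.5 you would get by always guessing zero.
print("mean squared error:", float((err**2).mean()))
```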

What’s remarkable though is how it generalizes. It wouldn’t be surprising if it could reproduce the training set exactly–you could do that with a lookup table, in principle. But somehow it usually manages to do “the right thing” even with novel inputs. It suggests, a little distressingly, that all of human intellect and creativity is not much different than interpolating points on a curve–just a “dumb” process of filling in the blanks.

We might see something similar to biological evolution happen here. The most complicated structures repurposed components that evolved for other reasons. Bird flight wouldn’t have happened without feathers, but feathers could not have evolved for the sake of flight–it’s just too big a step. Instead, they had already evolved for something else, like insulation, and thus were already available for use with flight.

So perhaps something similar could take place here–train a net on something that more directly learns an FFT, and then when it is trained on image data later, it has the FFT available for use already. If the FFT is useful for that purpose, the image training will refine it further, possibly opening it up for yet more applications.

Put another way, all AI training is just some form of gradient descent. But for some training data, there is just no path from the current position to some deeper minima far out. Other training data may however unlock a smooth path from here to there. Once there, the data that was previously stuck now has more options available.
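
A toy picture of that “stuck in a shallow minimum” problem, using nothing but plain gradient descent on a made-up 1-D loss with two valleys: where you start is where you get stuck.

```python
# Gradient descent on a made-up 1-D loss with two valleys: a shallow one near
# x = +2.1 and a deeper one near x = -2.4. Plain descent only slides downhill,
# so the starting point decides which minimum you get stuck in.
def loss(x):
    return 0.05 * x**4 - 0.5 * x**2 + 0.3 * x

def grad(x):
    return 0.2 * x**3 - x + 0.3

def descend(x, lr=0.05, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

for start in (2.5, -2.5):
    end = descend(start)
    print(f"start {start:+.1f} -> stops at x = {end:+.2f}, loss = {loss(end):.3f}")
```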

Here’s a wonderful video that does a great job of visualizing how machine learning and neural nets work. Watch this first.

Watched it? OK, so you can see that MarI/O is dealing with a small set of inputs, and those inputs map to nodes, and those nodes lead to outputs. Given the same set of inputs, it will always provide the same output. But why those inputs lead to those outputs doesn’t make sense, even at this very small scale. It’s just the result of its training, and the program has no rationale for any of it.

Obviously something like a ChatGPT or Tesla FSD has exponentially more inputs and an incomprehensible number of nodes, so following a logic path becomes even more of a fool’s errand.

Now here’s the crazy thing that’s tough to get our heads around. Recent research into how humans make decisions suggests that we operate no differently. Our brains are just bundles of neurons reacting to inputs. When we’re asked to explain why we did something, made a decision for instance, the research suggests that the answer we give is our brain trying to justify its own actions after the fact. In the example CaveMike gave above, when trying to explain why he thinks Customer X ordered what they ordered, all that stuff about gender, age, clothing, and time of day is just his brain grasping at observations to explain something that it itself doesn’t understand.

This video is my go-to for an easy to understand explanation. (Now if only I could embed it, why is this still a problem?)

https://www.youtube.com/watch?v=R9OHn5ZF4Uo&pp=ygUVaG93IGFpIHdvcmtzIGNncCBncmV5

Because nobody is home at SDMB IT administration, and our oddball settings about pix & video uploads are an edge case that Discourse’s development staff isn’t prioritizing to fix.

There is a well-explained and well-understood workaround that is 100% effective. Use that. Some searching in the Site Feedback category will find it.

No, their ‘reverse engineering’ simply exposed a natural property of their subject. Had they reverse engineered the long term tide tables they would have found the same result. That’s what Fourier did.

Also, I have been unable to get GPT to deal with circuits. Perhaps you can share some.

Where, though, does ChatGPT get its data? The internet, I suppose.

From things similar to this.

Interesting and totally agree about labelling. Is this data collected from normal Tesla drivers when driving manually?

Good point. From my experience, if you treat training as a black box, it doesn’t work out well. As you say, preparing the inputs in a form that is closer to the desired solution can be a big boost. It’s why you might feed in an RGB image instead of a .JPG. However, theoretically, with enough data and enough model complexity, the model should be able to learn an FFT (or a variation that it needs).
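
As a toy check of that “learnable in principle” claim (a sketch, not anyone’s production pipeline): the DFT is a linear map, so even the simplest possible model, a single weight matrix fit by least squares to (signal, transform) example pairs, recovers it essentially exactly.

```python
import numpy as np

# The DFT of a length-N signal is just multiplication by a fixed matrix, so a
# single linear "layer" can learn it from (input, output) examples alone.
rng = np.random.default_rng(1)
N, examples = 16, 2000

X = rng.normal(size=(examples, N))                  # random training signals
Y = np.fft.fft(X, axis=1)                           # "labels": their DFTs
W, *_ = np.linalg.lstsq(X.astype(complex), Y, rcond=None)   # fit one weight matrix

test = rng.normal(size=N)
learned = test @ W                                  # the fitted model's answer
exact = np.fft.fft(test)                            # ground truth
print("max error:", np.abs(learned - exact).max())  # tiny: the DFT was learned
```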

Indeed. This is the crux of a huge amount of what nowadays passes for AI and also optimisation work.

In an evolutionary view we get to the blind watchmaker. We get over that hump by throwing time at the problem.

A huge part of all the gradient descent problems is local minima, then edges and holes. A lot of the classical techniques, like genetic algorithms, simulated annealing, etc., work in part by using stochastic mechanisms to leap out of constraints - with a range of successes.

If an ML system could hold onto old tactics and randomly turn them on and off, you might get better traction. This turns into a hybrid GA/ML system, and we are basically back to emulating evolution. Lots of work has been done on just this. But it still requires domain knowledge.

Maybe. Learning systems, and indeed any of the optimisers, can get caught in local minima or loops.
Given arbitrary time and a mechanism to jump out of traps, a system might eventually find an advanced algorithm. The problem with, say, a neural net learning to use an FFT is that using one isn’t the same as behaving as just an FFT. To make use of it, the FFT is buried within the rest of the net, and we are dependent on the backpropagation both finding the FFT and working out how to use it well. Which is a very big ask.

The DFT and its cousins are interesting. They are not a huge step for some ML systems to learn to mimic if you give them the basic input and transformed output to work with. Indeed, on the inside an FFT butterfly looks a lot like a neural net. Image compression like JPEG uses the DCT as the basis of its operation, so for at least some tasks using JPEG images as input might actually work better. RGB is, curiously, not necessarily a good start. You may be better off with, say, HLS or even Lab. An ML system could, in principle, work out how to perform the transformation itself, but it requires more layers and a corresponding step up in time and data.
In the real world these are all limited resources.
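
On the colour-space point, a tiny illustration using only Python’s standard-library colorsys module (Lab isn’t in the standard library, so YIQ stands in here as another space that separates brightness out; no claim that any real vision stack does exactly this): the same pixel re-expressed so that lightness gets its own channel, which is the kind of re-representation that can make downstream learning cheaper.

```python
import colorsys

# The same pixel in three representations. RGB entangles brightness and colour;
# HLS and YIQ pull brightness out into its own channel, which can be a
# friendlier input representation for a learner.
r, g, b = 0.8, 0.4, 0.2          # an orange-ish pixel, channels in [0, 1]

h, l, s = colorsys.rgb_to_hls(r, g, b)
y, i, q = colorsys.rgb_to_yiq(r, g, b)

print(f"RGB: ({r}, {g}, {b})")
print(f"HLS: hue={h:.2f} lightness={l:.2f} saturation={s:.2f}")
print(f"YIQ: luma={y:.2f} i={i:.2f} q={q:.2f}")
```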

One imagines there are a lot of smart people at Tesla, with more experience than anyone in getting an AI to drive a car. But it does strike me as a pretty big leap to suggest that one huge deep ML system can do what is required. Learning from humans is likely to lead to the need to unlearn some poor human habits.

Right now, I remain unconvinced by the use case versus costs of a full self drive. You need to imagine that there are enough people who really want to relinquish control, and want it enough to pay for it. Sure, when I reach my dotage, and they confiscate my driver’s license, I’ll probably be all for it. But Elon’s weird reality of cars spending their day earning their owners money as autonomous taxis has more than a few problems. Elon clearly thinks he is the guy to usher in the dawn of the age of The Culture for humankind.

I don’t disagree at all with what you said. I was mostly just trying to scheme up a simple and clear example of just how and why we don’t know how AI makes its determinations.

Many (too many?) people assume that because people programmed them, we can just do something akin to a stack trace and see exactly what’s going on, even with something like an AI. They assume that AI is just an absurdly intricate and complicated deterministic program, which is not the case.

To me, the concept is that it’s like when we *know* that something is wrong or off about a person or a situation, without actually being able to intellectually identify what, or even articulate why. Nonetheless, we’re perceiving some combination of inputs from our environment that don’t add up somehow, and our brain is telling us that.

AI strikes me as a lot like that - we know what it’s telling us, we know what it’s been trained on, but we don’t know how or why it makes its own decisions about the data it sees.

I have this sort of… feeling(?) that one of the real killer applications for this sort of AI isn’t going to be the ChatGPT sort of things, but rather in automating processes that are currently some combination of ridiculously manual AND don’t lend themselves to traditional pattern matching approaches. Like say… sorting garbage for recyclables.

I also think that at some point, we’re going to see some kind of distributed AI or AIs that share their learning data, and that’s going to be something that’s going to improve their accuracy a LOT. I mean, if you are constantly correcting your garbage sorting bot in Seattle, that’s great, but it would seem that if you shared that sort of information freely with most other cities, you’d collectively end up with MUCH better garbage sorting AIs than if you all did it in your own shut-off bubbles.

AI is not deterministic?

No, it isn’t; at least, not those driven by stochastic machine learning algorithms, e.g. large language models, diffusion probabilistic models, and similar generative systems.

Stranger

I would refine that to say that during training, AI/ML is often intentionally not deterministic and during inference it is often deterministic.

And even something that is deterministic can still behave as if it’s not, if it’s complicated enough that it can’t be predicted.

Even in inference it still isn’t deterministic. Giving it the same prompt twice may produce significantly different answers, and even minor changes to the wording or organization of a prompt may give radically divergent responses. In any case, with an LLM or other generative model of any useful degree of complexity, it is just not possible to trace through the neural network and predict a response ab initio; the heuristic complexities of how such models ‘learn’ are very much an active area of investigation and have been the primary reason that it has taken so long to get to generative models that can work with any degree of reliability or stability.

Stranger

Inference can be easily made deterministic. There are two main reasons it might not be in practice:

  1. Temperature. Temperature is just the idea of occasionally not picking the most probable next symbol, and instead picking something further down the list (using a weighted random number). This helps make the output less repetitive. But it’s not necessary, and can be turned down to 0 (see the sketch after this list).
  2. Floating-point math is not associative. Because (a+b)+c != a+(b+c) in FP math, the answers depend on the order that you do the math. Parallel processing exacerbates this since it tends to perform a bunch of sub-calculations and then merges them together as they come in–which might happen non-deterministically (say, because one GPU is running a little faster than another). But again, this is fixable with some performance loss. You just ensure that all calculations get merged together in the same order, regardless of when they arrived.
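
Both points fit in a few lines of NumPy (a toy sampler, not any particular model’s): at temperature 0 the sampler collapses to always taking the most probable symbol, and floating-point addition really does depend on grouping.

```python
import numpy as np

# 1. Temperature: scale the scores before turning them into probabilities.
#    As the temperature drops to 0 the sampler collapses onto the single most
#    probable symbol, which makes the choice deterministic.
def sample(logits, temperature, rng):
    if temperature == 0:
        return int(np.argmax(logits))               # greedy: always the same pick
    p = np.exp(logits / temperature)
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))

logits = np.array([2.0, 1.5, 0.3])
rng = np.random.default_rng(0)
print([sample(logits, 1.0, rng) for _ in range(5)])  # picks vary from sample to sample
print([sample(logits, 0.0, rng) for _ in range(5)])  # always symbol 0

# 2. Floating-point addition is not associative, so summation order matters.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))        # False
```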

The basic tech, as I understand it, is:

Our brain is a huge mess of neurons which interweave like a bowl of spaghetti. (Except, consider the scale of cells and the size of our brain…) Where the neurons touch one another we have synapses, which can transmit signals. Which neurons trigger which other neurons determines our thought patterns. As we learn things, certain synapse connections become stronger and others weaker, so certain pathways become “familiar”.

AI simulates this behaviour with crossed matrices - think of a neural connection - a synapse - as the intersection of a row and column in a matrix. The value of that cell is the conductivity, and therefore the strength, of that connection. (Hence the matrices in the earlier post.)
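
That “matrix of connection strengths” picture maps directly onto code. A minimal sketch, just NumPy: each cell of the weight matrix plays the role of a synapse, and firing one layer of the net is a single matrix-vector multiply pushed through a squashing function.

```python
import numpy as np

# One layer of a neural net as a matrix: the cell at row i, column j holds the
# "synapse strength" from input neuron j to output neuron i. Firing the layer
# is one matrix-vector product pushed through a squashing function.
rng = np.random.default_rng(0)
inputs = rng.random(4)                # activations of 4 input neurons
weights = rng.normal(size=(3, 4))     # 3 output neurons x 4 input neurons

outputs = np.tanh(weights @ inputs)   # stronger connections contribute more
print(outputs)                        # activations of the 3 output neurons
```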

Build enough of these and you basically have a massive n-by-n matrix. Show it a picture and identify the car in the picture, and have an algorithm that changes the values of the matrix when it sees this input - i.e. pixels to output… it will then recognize pictures that are “almost” the same. Do this with thousands of pictures of similar objects from various angles, and it will have a good chance of recognizing any view of these objects.

Also, once you are reasonably confident of the tech, perhaps it can train itself - i.e. if it can tell a face 90% of the time, use the AI itself to circle faces and feed them back into the face recognizer to fine-tune it.

(Tesla supposedly has dozens (hundreds?) of people employed all day long going through photos, circling the cars, traffic signs, etc., and adding them to the Self-Drive learning matrix/database.)

These are then combined with “rules” in standard computing - go forward at the speed limit. Stop for stop signs and traffic lights. If you see the lines on the road, stay between them. Keep to the left/right. Watch out for cars ahead, or coming from the side. That’s a freeway exit (does it match what the map says?), don’t take it unless the destination is that-a-way. That’s a speed limit sign, but this is just a suggested slow-down-to-30mph-for-curve sign. And so on. These rules are just as important to completing a task, but they need to be based on valid inputs or it’s Garbage-In-Garbage-Out. (One of the earlier Tesla accidents involved not discerning a white semi across the highway against a white cloudy sky.)
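
Purely as an illustration of that mix of learned perception and hand-written rules (a made-up toy in Python, not how Tesla or anyone else actually structures things): the net’s high-level description of the scene feeds a plain, readable rule layer, and if the perception stage hands it garbage, the rules can’t save it.

```python
# Toy sketch of hand-written driving rules sitting on top of what a perception
# net reports about the scene (entirely made up; not any real system's design).
def choose_action(scene):
    if scene.get("stop_sign_distance_m", 999) < 30:
        return "brake"                               # stop for stop signs
    if scene.get("lead_car_distance_m", 999) < 20:
        return "brake"                               # watch out for cars ahead
    if scene.get("speed_mph", 0) < scene.get("speed_limit_mph", 30):
        return "accelerate"                          # go forward at the speed limit
    return "hold_speed"

# "scene" stands in for the labeling/perception stage's output. Garbage in,
# garbage out: if perception misses the stopped semi, no rule ever fires.
print(choose_action({"speed_mph": 25, "speed_limit_mph": 45,
                     "lead_car_distance_m": 80, "stop_sign_distance_m": 200}))
```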

Given a million relatively varied examples of “X” and asked if a random input is an “X”, the AI has a good chance of getting it right.

I think it is a question of where the system boundaries are drawn.

For inference to be non-deterministic either:

  • State has to be maintained between inferences
  • The architecture has to acquire a random seed either as an input or by having access to an RNG source

If we draw the system boundaries at the point of the user, these appear non-deterministic.
If we draw the system boundaries at the point of the architecture of the model, these are deterministic. The state and/or seed are inputs to the model. Running the model with the same inputs will produce the same outputs.

If I feed Llama2 the same prompt, I get the same result each time. I can build a unit-test around it. However, if I have a chat application based on Llama2, the app can save state (previous prompts and/or previous results) and feed it back in during inference.
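
One way to see the boundary-drawing point in code (a toy stand-in, not Llama2 itself): treat the conversation state and the seed as explicit inputs and the same inputs always give the same output, so you can unit-test it; let the application carry hidden state between calls and, from the user’s side, it looks non-deterministic.

```python
import hashlib

# Toy stand-in for inference (not an actual language model): the output depends
# only on the explicit inputs (prompt, prior state, seed), so the same inputs
# always give the same output.
def infer(prompt, state, seed):
    digest = hashlib.sha256(f"{seed}|{state}|{prompt}".encode()).hexdigest()
    return f"response-{digest[:8]}"

# Deterministic at the model boundary: same inputs, same output, unit-testable.
assert infer("hello", state="", seed=42) == infer("hello", state="", seed=42)

# A chat app that silently carries state between calls looks non-deterministic
# from the user's side, even though the underlying function never changed.
state = ""
for _ in range(2):
    reply = infer("hello", state, seed=42)
    print(reply)          # the second reply differs because the hidden state grew
    state += reply
```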

On the other hand, the notion of “deterministic” only even makes sense if it’s possible to feed the system the exact same input twice. For a chatbot, this might be possible: Just type in the same thing (assuming that it can’t pick up on things like typing speed and pauses in the typing as part of the input, which it might). But for a self-driving car with a bunch of cameras and sensors, the input will never be exactly the same. One input might be extremely similar to another, but maybe that slight difference is, this time, significant, and so it does something different. Or maybe it just thinks the slight difference is significant.