A neural network, regardless of its topology (deep learning, for example, is really just the name for one class of topologies), is what is called a “universal function approximator”.
So given some set of inputs I and a desired set of outputs O, we want to find a function F such that F(I) = O. If we think about a function in a general mathematical way, it can have the following form for the jth output.
O[j] = ax + by + cz + … where a, b, c are constants and x, y, z are variables. Let’s say for the moment that x, y, z are members of I. Then we would have O[j] = aI[1] + bI[2] + cI[3] + … However, this is too simple to capture some functional relationships. By layering and other changes to the topology (recurrence, for example), as mentioned by @Sam_Stone above, neural networks can make the functional relationship arbitrarily complex. If we call the result of layer 1 L1, the result of layer 2 L2, and so on, then the overall function becomes incredibly complex. It would take me some time to write it out for even a couple of layers, since the input to every element of each layer is the output of every element of the previous layer… and that’s in the simplest neural network topology.
In other words, it has a form something like O[j] = aL2[1](a’L1[1], b’L1[2], c’L1[3], …), where L2[1] is itself a function of the weighted layer-1 outputs, each of which is in turn a function of the weighted inputs. And again, with different topologies you can create polynomial functions.
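To make that concrete, here’s a toy version in Python. The layer sizes, the random weights, and the logistic squashing function are all just things I made up for illustration, not anything taken from a real network.

```python
import numpy as np

# A toy illustration of the "function of a function" idea above.
# The sizes, random weights, and logistic activation are arbitrary
# choices made up for this example.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

I = np.array([0.2, 0.7, 0.1])   # the inputs I[1], I[2], I[3]

W1 = np.random.randn(4, 3)      # the a', b', c', ... constants feeding layer 1
W2 = np.random.randn(2, 4)      # the a, b, ... constants feeding layer 2

L1 = sigmoid(W1 @ I)            # each L1[k] is a squashed weighted sum of every input
L2 = sigmoid(W2 @ L1)           # each L2[j] is a squashed weighted sum of every L1[k]

O = L2                          # so each O[j] is already a function of functions of the inputs
print(O)
```

Even with two tiny layers, writing O[1] out by hand as a single expression in the inputs is a mess; stacking more layers just nests the mess deeper.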
So where am I going with this? The billions of parameters are, in some sense, those constants; however, most of them are useless. They don’t really contribute to the result because, of course, we don’t really expect the output to be some complex function of the function of the function of the function of EVERY input. That would be silly. And that is part of why neural networks are regarded as black-box AI algorithms: you cannot really determine which parts of the function are actually important and which are not. And they are easily fooled by deceptive data.
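To put a rough number on how fast those constants pile up, here’s a back-of-the-envelope count for a small, completely made-up stack of fully connected layers on a 512x512 input (the widths are invented, not taken from any real model):

```python
# Every connection between consecutive layers is one of those constants,
# plus one bias constant per unit.  These layer widths are hypothetical.
layer_sizes = [512 * 512, 1024, 1024, 1]

params = sum(n_in * n_out + n_out
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(f"{params:,} parameters")   # roughly 270 million for even this modest stack
```

And that’s nowhere near the billions in modern models; the constants multiply absurdly fast.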
Consider if you had 1,000,000 tumor-screening images: 500,000 have a tumor and 500,000 do not. Suppose the image size is 512x512 (it wouldn’t be, but that’s OK). Suppose further that by pure chance every image in the positive set has a white pixel at position (25,25). The neural network will learn this, even though it is irrelevant. This was a real problem in early tumor-detection algorithms: all of the positive images happened to have notes written on them by the doctors, and very few of the negative images had notes on them. The AI would learn that notes meant cancer, leading to INCREDIBLY high success rates that failed the moment it left the lab and was given plain, unannotated imagery.
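Here’s a scaled-down, made-up sketch of that failure mode. It uses plain logistic regression on random-noise “images” instead of a real network on real scans, but the shortcut is the same: the classifier latches onto the giveaway pixel and falls apart on clean data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A scaled-down stand-in for the "notes mean cancer" failure above.
# Real scans would be 512x512 and the model a real neural network;
# 32x32 noise images and logistic regression show the same shortcut.
rng = np.random.default_rng(0)
n, side = 2000, 32
X = rng.random((n, side * side))     # "images" that are pure noise
y = np.repeat([1, 0], n // 2)        # first half "tumor", second half not

# The accidental giveaway: every positive image gets a bright pixel at
# (25,25) (standing in for the doctor's note); the negatives get none.
X[y == 1, 25 * side + 25] = 1.0
X[y == 0, 25 * side + 25] = 0.0

model = LogisticRegression(max_iter=1000).fit(X, y)

# In the lab: looks amazing, because the shortcut pixel is always there.
print("lab accuracy:", model.score(X, y))

# In the wild: the same kind of images, minus the giveaway -> roughly a coin flip.
X_wild = rng.random((n, side * side))
print("real-world accuracy:", model.score(X_wild, y))
```

The exact numbers don’t matter; the point is that nothing in the training data forces the model to learn anything about tumors when a note (or a stray pixel) predicts the label perfectly.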
I’m sure I had a point when I started, but I think I forget what it is so I’m going to hit reply now and hope you found that at least interesting.