Can anyone explain how a GARCH model forecasts volatility in layman's terms?

I am not a mathematician, just a trader, but would love to know how a GARCH model goes about predicting volatility. Any attempt to get an explanation leads to mathematical equations with one too many Greek letters for my mind to handle. Can anyone explain how GARCH predicts volatility in layman’s terms? Kudos to anyone who can.

Let’s try this:
G = generalized
AR = autoregressive
CH = Conditionally heteroskedastic.

I’ll come back to generalized if you can understand ARCH first.

Autoregressive essentially means regressed on itself: in this case you regress your variable on past values of itself. Suppose we have a time series we’d like to estimate: x[sub]t[/sub]

(1) x[sub]t[/sub] = a + x[sub]t-1[/sub] + e[sub]t[/sub].

If the error terms e[sub]t[/sub] are uncorrelated (and, as always, have zero expected value), then our best estimate of x[sub]t[/sub] is a + x[sub]t-1[/sub]. This is a random walk with an expected change (or drift) of a. But if the changes tend to persist, that is, if an increase in x tends to be followed by another increase, then x is an autoregressive process. In this case the error term can be broken into two parts: what you expect after seeing the last error, and the new part you can’t predict.

(2) e[sub]t[/sub] = r e[sub]t-1[/sub] + u[sub]t[/sub]

r (usually denoted by the Greek letter rho which looks a bit like a lower case p) is the autoregression coefficient.

So now the best way to forecast x[sub]t[/sub] is to use (1) and (2) like this

(3) x[sub]t[/sub] = a + x[sub]t-1[/sub] + e[sub]t[/sub] = a + x[sub]t-1[/sub] + r e[sub]t-1[/sub] + u[sub]t[/sub] = a + x[sub]t-1[/sub] + r ( x[sub]t-1[/sub] - x[sub]t-2[/sub] - a ) + u[sub]t[/sub]

This can now be estimated nicely since we know x[sub]t-1[/sub] and x[sub]t-2[/sub], and u[sub]t[/sub] is uncorrelated. This is an AR-1 model: the error is autoregressive with one lag. Collecting terms, it can also be written as

(4) x[sub]t[/sub] = a + b x[sub]t-1[/sub] + c x[sub]t-2[/sub] + u[sub]t[/sub]

with b = 1 + r and c = -r (the constant a here absorbs the extra - r a term).
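To make (3) concrete, here is a small Python sketch of the one-step forecast. It assumes a and r have already been estimated somehow, works out the last error by rearranging (1) as e[sub]t-1[/sub] = x[sub]t-1[/sub] - x[sub]t-2[/sub] - a, and sets the unpredictable part u[sub]t[/sub] to its expected value of zero. The function name and the numbers are made up for illustration.

```python
def forecast_next(x_prev, x_prev2, a, r):
    """One-step forecast from equation (3), with u_t replaced by zero."""
    # e_{t-1}: the part of the last change that the drift did not explain
    last_error = x_prev - x_prev2 - a
    # drift + last value + the share of the last surprise expected to persist
    return a + x_prev + r * last_error


# Hypothetical inputs: drift 0.5, persistence 0.4, last two values 100 then 102.
print(forecast_next(102.0, 100.0, a=0.5, r=0.4))
```

With r = 0 the forecast collapses to a + x[sub]t-1[/sub], the plain random walk with drift. A positive r pushes the forecast further in the direction of the last surprise.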

More generally, a process that has longer memory is an AR-q model with q lags:

(5) x[sub]t[/sub] = a + b x[sub]t-1[/sub] + c x[sub]t-2[/sub] + … + d x[sub]t-q-1[/sub] + u[sub]t[/sub]

Hit post instead of preview sorry.

In the autoregressive (AR-q) model the variance of the error terms is assumed to be constant. In an ARCH model, it is assumed to be changing. Now write

(6) e[sub]t[/sub] = s[sub]t[/sub]v[sub]t[/sub]

The error v[sub]t[/sub] has a constant variance (and standard deviation) of 1, while e has a changing standard deviation of s[sub]t[/sub]. A rough estimate of s[sub]t[/sub][sup]2[/sup] at any moment is just e[sub]t[/sub][sup]2[/sup]. If s[sub]t[/sub] is changing in an autoregressive fashion as well, then just as with the x series
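Here is a quick Python check of why e[sub]t[/sub][sup]2[/sup] works as an estimate of s[sub]t[/sub][sup]2[/sup]. It is only a sketch: it holds s fixed, assumes v is drawn from a standard normal distribution (the explanation above only requires variance 1), and uses a made-up function name. Any single e[sup]2[/sup] is a very noisy guess, but on average it lands on s[sup]2[/sup].

```python
import random


def average_squared_shock(s, n=100_000, seed=0):
    """Average of e^2 over n draws of e = s * v, with v standard normal."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)  # unit-variance shock, as in equation (6)
        e = s * v                # scaled shock with standard deviation s
        total += e * e
    return total / n


# With s = 2 the average of e^2 should come out close to s^2 = 4.
print(average_squared_shock(2.0))
```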

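(7) s[sub]t[/sub][sup]2[/sup] = A + B s[sub]t-1[/sub][sup]2[/sup] + … + C s[sub]t-r-1[/sub][sup]2[/sup] + w[sub]t[/sub]
= A + B e[sub]t-1[/sub][sup]2[/sup] + … + C e[sub]t-r-1[/sub][sup]2[/sup] + w[sub]t[/sub]

where the second line replaces each past s[sup]2[/sup] with its estimate e[sup]2[/sup]. This is an ARCH model: the variance is forecast from recent squared errors. (This r is just a lag count and has nothing to do with the autoregressive coefficient in my previous post.) The generalized version, GARCH(p,q) in the common notation, keeps lagged values of both s[sup]2[/sup] and e[sup]2[/sup] on the right-hand side, with p and q counting the lags of each. There are all kinds of more complicated GARCH models: NGARCH is nonlinear, IGARCH is integrated, EGARCH is exponential.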
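For a feel of how the forecast behaves, here is a minimal Python sketch of the simplest common case, GARCH(1,1): the next variance is a constant, plus a weight on the last squared surprise, plus a weight on the last variance forecast. The weights A, B and C, the starting variance and the series of surprises are all hypothetical numbers chosen for illustration, not fitted to any market.

```python
import math


def garch11_next_variance(prev_variance, prev_error, A, B, C):
    """Next variance = constant + B * last squared surprise + C * last variance."""
    return A + B * prev_error ** 2 + C * prev_variance


def forecast_path(errors, A, B, C, start_variance):
    """Roll the one-step update forward through a series of surprises."""
    variances = [start_variance]
    for e in errors:
        variances.append(garch11_next_variance(variances[-1], e, A, B, C))
    return variances


# Hypothetical daily surprises in percent; the third one is a big shock.
errors = [0.1, -0.2, 3.0, 0.5, -0.1]
path = forecast_path(errors, A=0.05, B=0.10, C=0.85, start_variance=1.0)
for variance in path:
    print(round(math.sqrt(variance), 3))  # forecast volatility (standard deviation)
```

The forecast volatility drifts down through the quiet days, jumps after the big surprise, and then fades only gradually because most of yesterday's forecast is carried into today's. That slow fade after a shock is the volatility clustering the model is built to capture.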

I appreciate your attempt but that’s still too mathy for me :). I was hoping for metaphors.