So I was thinking about how, if you want to prove that two variables are causally connected (say the death penalty and the murder rate), you need to see in the data a clear correlation that actually reflects causation.
Let me give a market example. Suppose you notice that the price of bananas and the price of Big Macs are correlated. As Big Macs go up in price, so do bananas.
To show causation, you might check whether there is a predictable phase lag (say 3 months) between a change in the Big Mac price and a change in the banana price. If every time the Big Mac changed in price, the banana changed in price 3 months later, and no changes in banana prices happened while the Big Mac price was constant, you have fairly solid evidence of causation. The more times this has happened in your data, the more likely it is that Big Mac price changes really do predict banana price changes (and that some underlying causal mechanism is involved).
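To make the lag idea concrete, here is a rough sketch of a lagged correlation check on made-up data. Everything in it is an assumption for illustration: the series are synthetic, the 3 month lag is planted on purpose, and the names (`big_mac_changes`, `banana_changes`, `lagged_correlations`) are hypothetical. It works on price changes rather than price levels, since two series that both drift upward will correlate at every lag.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def lagged_correlations(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for each lag in 0..max_lag."""
    out = {}
    for lag in range(max_lag + 1):
        n = len(leader) - lag
        out[lag] = pearson(leader[:n], follower[lag:lag + n])
    return out

# Synthetic monthly price changes (assumed data, not real prices).
rng = random.Random(0)
MONTHS = 120
TRUE_LAG = 3  # planted lag, in months

big_mac_changes = [rng.gauss(0, 1) for _ in range(MONTHS)]
banana_changes = [
    (0.8 * big_mac_changes[t - TRUE_LAG] if t >= TRUE_LAG else 0.0)
    + rng.gauss(0, 0.3)  # unrelated noise in banana prices
    for t in range(MONTHS)
]

corrs = lagged_correlations(big_mac_changes, banana_changes, max_lag=6)
best_lag = max(corrs, key=lambda lag: abs(corrs[lag]))

for lag, r in corrs.items():
    print(f"lag {lag} months: r = {r:+.2f}")
print("strongest lag:", best_lag)
```

A sharp peak at one lag and near-zero correlation elsewhere is the pattern described above. On its own it still only shows that one series leads the other; a hidden common driver that hits Big Macs first would produce the same picture.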
Well, some people argue that the death penalty decreases homicides. In principle it would be straightforward to test this: toggle the death penalty on and off like a step function, at different frequencies, and see whether the murder rate goes up and down at the same frequency, after some phase lag.
So for a 5 year period, anyone convicted of murder in that state (better to do it nationwide, but you know) will be executed. For the next 5 years, they get life in prison. Do this for 1-2 cycles. Then make the period 2.5 years and do the same thing.
This would create clear evidence one way or the other: if the hypothesis that “executions reduce homicides significantly” is correct, stopping and starting executions should produce a clear pattern in the data. There should be a significant increase in murders during the 5 year “only life in a cage” period, and a decrease during the “to death row you go” period. The periods of increase and decrease should each be 5 years long in the first set of cycles, and 2.5 years long in the second.
Is there a formalized mathematical way to quantify, based on the number of cycles you ran the experiment for and the relative size of the effect, how *certain* you are of your conclusion?
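One candidate answer is a randomization (permutation) test over the on/off blocks. The sketch below is only that, a sketch: the murder rates are simulated, the baseline, effect size and noise level are made-up numbers, and it assumes that under "no effect" the on/off labels could just as well have landed on any 4 of the 8 blocks. It also ignores the phase lag and any long-term trend, which a real analysis would have to handle.

```python
import random
from itertools import combinations

rng = random.Random(1)

# Hypothetical schedule from the thought experiment, as (months, penalty_on):
# two 5 year cycles, then two 2.5 year cycles.
schedule = [(60, True), (60, False), (60, True), (60, False),
            (30, True), (30, False), (30, True), (30, False)]

# Assumed numbers, purely for illustration (murders per 100k per year).
BASELINE, EFFECT, NOISE = 5.0, 0.5, 0.4

def simulate_block(months, penalty_on):
    """Monthly rates for one block; rate is higher when the penalty is off."""
    level = BASELINE + (0.0 if penalty_on else EFFECT)
    return [level + rng.gauss(0, NOISE) for _ in range(months)]

block_means = []
for months, penalty_on in schedule:
    rates = simulate_block(months, penalty_on)
    block_means.append(sum(rates) / len(rates))

def off_minus_on(on_blocks):
    """Mean rate of 'off' blocks minus mean rate of 'on' blocks."""
    on = [m for i, m in enumerate(block_means) if i in on_blocks]
    off = [m for i, m in enumerate(block_means) if i not in on_blocks]
    return sum(off) / len(off) - sum(on) / len(on)

actual_on = frozenset(i for i, (_, on) in enumerate(schedule) if on)
observed = off_minus_on(actual_on)

# Every way of labelling 4 of the 8 blocks as "on".
assignments = [frozenset(c)
               for c in combinations(range(len(schedule)), len(actual_on))]
as_extreme = sum(off_minus_on(a) >= observed for a in assignments)
p_value = as_extreme / len(assignments)

print(f"observed off-minus-on difference: {observed:.3f}")
print(f"one-sided p-value: {as_extreme}/{len(assignments)} = {p_value:.4f}")
```

With 8 blocks there are only 70 possible labellings, so the smallest p-value this test can ever report is 1/70, about 0.014, no matter how large the effect is. That is one way the number of cycles directly caps how certain you can be: more toggles give more possible labellings and a finer scale of certainty.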
Back to the banana example: if you used automated software to discover this relationship between Big Macs and bananas, you could make a nice profit trading banana futures. But first you need to calculate how likely it is that your model is correct, based on N observed correlated Big Mac -> banana events, M relative price swings in banana prices, and O uncorrelated banana price swings. That way you know whether you should bet real money on this.
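One simple way to put a number on this is a binomial tail probability: if banana prices swing on their own at some base rate, how likely is it that N or more of the Big Mac changes would have been followed by a banana swing purely by chance? The counts below are invented, the variable names are hypothetical, and the base rate is estimated from the O uncorrelated swings and then treated as if it were known exactly. The sizes of the swings (M) are not used at all here.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Invented counts, for illustration only.
big_mac_events = 20    # Big Mac price changes in the data
hits = 16              # N: changes followed by a banana swing about 3 months later
quiet_windows = 100    # comparable windows with no Big Mac change beforehand
spurious_swings = 15   # O: banana swings that happened in those quiet windows

# Chance that a banana swing shows up in a window with no Big Mac trigger.
base_rate = spurious_swings / quiet_windows

# Probability of at least `hits` coincidental swings out of `big_mac_events`.
p_value = binomial_tail(hits, big_mac_events, base_rate)

print(f"base rate of unrelated banana swings: {base_rate:.2f}")
print(f"chance of >= {hits}/{big_mac_events} hits by luck: {p_value:.3g}")
```

A tiny tail probability says the pattern is unlikely to be coincidence under these assumptions; it does not say the relationship will persist. If the software found this pair by scanning thousands of candidate pairs, the threshold for "surprising" has to be much stricter, because some pairs will line up by luck alone.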