First off, I’m an interested amateur, not a mathematician or statistician, so I apologize in advance for any mistakes in terminology. I’d appreciate it if replies could be kept as simple as possible. I’m also having a hard time expressing the problem, so please bear with me.
A few years ago, I went down the sabermetric rabbit hole, to the point of creating my own run estimator based on BaseRuns, only much, much more complicated.
The gist of my problem is this:
Let singles = S, doubles = D, triples = T, and home runs = HR, and suppose that:
RUNS = (S * X) + (D * Y) + (T * Z) + (HR * W)
Previously, to optimize the constants (X, Y, Z, W), I would tweak one of them, then use root mean square error (RMSE) to check whether the tweak made the formula more or less accurate at predicting the actual RUNS. That’s simple enough in one spreadsheet, but when I have to change the formula and record the results for multiple leagues across multiple years in multiple spreadsheets, it quickly becomes very inefficient.
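For reference, the RMSE check described above could be scripted roughly as follows. This is a minimal sketch, assuming each data set (one league-year, say) has been exported as a list of (S, D, T, HR, RUNS) tuples; that layout and the function names are hypothetical. Because squared errors and row counts simply add up, the pooled RMSE across all data sets can be computed without merging them.

```python
import math

def estimated_runs(row, weights):
    """Linear estimate: S*X + D*Y + T*Z + HR*W."""
    s, d, t, hr = row
    x, y, z, w = weights
    return s * x + d * y + t * z + hr * w

def pooled_rmse(datasets, weights):
    """RMSE over every row of every data set, without merging them.

    datasets: list of data sets; each is a list of (S, D, T, HR, RUNS)
    tuples (a hypothetical layout assumed for this sketch).
    """
    squared_error_sum = 0.0
    count = 0
    for rows in datasets:
        for s, d, t, hr, runs in rows:
            error = estimated_runs((s, d, t, hr), weights) - runs
            squared_error_sum += error * error  # these sums just add up
            count += 1
    if count == 0:
        raise ValueError("no data")
    return math.sqrt(squared_error_sum / count)
```

This still leaves the "tweak and re-check" loop, though; it only removes the copying between spreadsheets.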
So, how does one go about optimizing each of these constants efficiently across multiple data sets, without resorting to the brute-force approach of ‘make a change, record the result, repeat two dozen times’ that I’ve used so far?
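One standard way to frame this mathematically is ordinary least squares, i.e. linear regression with no intercept term. Minimizing RMSE is the same as minimizing the sum of squared errors (SSE), because RMSE = sqrt(SSE / n) and the square root only increases. Writing data set k's rows as a matrix A_k (columns S, D, T, HR) and its actual runs as a vector r_k, the best weights solve the "normal equations", and the sums involved are plain sums over data sets:

```latex
\mathrm{SSE}(X, Y, Z, W) = \sum_{k} \sum_{i \in k}
  \bigl( \mathrm{RUNS}_i - (S_i X + D_i Y + T_i Z + HR_i W) \bigr)^2

\beta = \begin{pmatrix} X \\ Y \\ Z \\ W \end{pmatrix}, \qquad
\Bigl( \sum_{k} A_k^{\top} A_k \Bigr) \beta = \sum_{k} A_k^{\top} r_k
```

So each data set only needs to contribute a 4×4 table of sums of products (A_k^T A_k) and four more sums (A_k^T r_k). Adding those across data sets gives exactly the same answer as stacking every row into one sheet, and solving the system finds all four weights at once rather than one at a time. This works because the formula is linear in the weights.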
Put another way, is there a way to calculate the value of ‘X’ that is the most accurate across all the data sets combined (though not necessarily the most accurate for any individual data set), without physically combining the data into one spreadsheet?
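A minimal Python sketch of that least-squares idea, again assuming the hypothetical (S, D, T, HR, RUNS) tuple layout, might look like this. `dataset_summary` computes one data set's sums, `combine` adds them, and `solve` is a small hand-written Gaussian elimination so the example needs no third-party libraries (NumPy's `numpy.linalg.solve` would normally handle that step). All function names are illustrative.

```python
N = 4  # number of weights: X, Y, Z, W

def dataset_summary(rows):
    """Per-data-set sums: ata[j][k] = sum x_j*x_k, atr[j] = sum x_j*RUNS."""
    ata = [[0.0] * N for _ in range(N)]
    atr = [0.0] * N
    for *x, runs in rows:  # x = [S, D, T, HR]
        for j in range(N):
            atr[j] += x[j] * runs
            for k in range(N):
                ata[j][k] += x[j] * x[k]
    return ata, atr

def combine(summaries):
    """Add per-data-set sums; equivalent to stacking all rows together."""
    ata = [[0.0] * N for _ in range(N)]
    atr = [0.0] * N
    for s_ata, s_atr in summaries:
        for j in range(N):
            atr[j] += s_atr[j]
            for k in range(N):
                ata[j][k] += s_ata[j][k]
    return ata, atr

def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: data not varied enough")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        total = m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = total / m[r][r]
    return beta

def fit_weights(datasets):
    """Least-squares (X, Y, Z, W) over all data sets combined."""
    ata, atr = combine(dataset_summary(rows) for rows in datasets)
    return solve(ata, atr)
```

In a spreadsheet, each of those sums could be computed per sheet with SUMPRODUCT (e.g. the sum of S×D over all rows), so only 14 numbers (the 10 distinct entries of the symmetric 4×4 table plus 4) need to be collected from each sheet. Least squares minimizes the pooled squared error, so the result is not guaranteed to be the best fit for any single league-year, which matches what is being asked for here.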
Again, apologies for the lack of clarity. I’ll explain further as best I can.