There is pretty good evidence that musical scales, although different, mostly derive from the same underlying mathematics. The key thing, however, is that the mathematics is not perfect, and that there are clear historical lineages for what a society considers tonal.
There are really two competing ways of generating the scales. In the first, you start with a note (really any frequency at all) and apply a set of simple ratios to it: 3:2 gets you the fifth, 4:3 a fourth, 5:4 a major third, 6:5 a minor third, 16:15 a semitone, 9:8 a tone, 15:8 a major seventh, 16:9 a minor seventh, 8:5 a minor sixth, and 5:3 a major sixth. However, you can already buy an argument about more than a few of these.
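As a minimal sketch of that first method, the ratios listed above can be applied to a starting frequency. The 440 Hz starting note and the names `JUST_RATIOS` and `just_frequencies` are my own choices for illustration, not anything standard.

```python
from fractions import Fraction

# Just ratios as listed in the text, relative to the starting note.
JUST_RATIOS = {
    "semitone": Fraction(16, 15),
    "tone": Fraction(9, 8),
    "minor third": Fraction(6, 5),
    "major third": Fraction(5, 4),
    "fourth": Fraction(4, 3),
    "fifth": Fraction(3, 2),
    "minor sixth": Fraction(8, 5),
    "major sixth": Fraction(5, 3),
    "minor seventh": Fraction(16, 9),
    "major seventh": Fraction(15, 8),
}


def just_frequencies(base_hz):
    """Apply each ratio to the starting frequency (in Hz)."""
    return {name: float(base_hz * ratio) for name, ratio in JUST_RATIOS.items()}


# 440 Hz is an arbitrary starting note, assumed here only for illustration.
for name, hz in just_frequencies(440).items():
    print(f"{name:14s} {hz:8.2f} Hz")
```

Any other starting frequency works the same way, since only the ratios matter.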
The other way of getting them all is via the circle of fifths, where you just keep multiplying by 3/2 and halving whenever the ratio reaches 2 or more, so that every note stays within a single octave. This scheme gets you close to the same set of ratios as before, but not quite. (The unique prime factorisation theorem bites hard here: you can only have 2 and 3 as factors, so no factors of 5.) But it does get you a scheme where you can transpose.
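The procedure can be sketched in a few lines. This assumes we start from a ratio of 1 and take twelve notes. The function name `stack_fifths` is hypothetical. Exact fractions are used so the "only 2 and 3 as factors" point is visible in the output.

```python
from fractions import Fraction


def stack_fifths(count):
    """Return `count` notes made by repeated 3/2 steps, folded into one octave."""
    notes = []
    ratio = Fraction(1)
    for _ in range(count):
        notes.append(ratio)
        ratio *= Fraction(3, 2)
        while ratio >= 2:  # fold back into the octave [1, 2)
            ratio /= 2
    return sorted(notes)


scale = stack_fifths(12)
pythagorean_third = Fraction(81, 64)  # four fifths up, two octaves down
just_third = Fraction(5, 4)

print([str(r) for r in scale])
print(pythagorean_third in scale, just_third in scale)
print(pythagorean_third / just_third)  # the gap between the two thirds
```

The stacked-fifths major third comes out as 81/64 rather than 5/4, and the gap between the two is 81/80. That is the "close, but not quite" in concrete form.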
The circle of fifths is interesting in that it can never get you back to the start: no power of 3/2 is ever exactly a power of 2. You just stop when the next note you get is within an acceptable range of your starting note. Doing so naturally gets us three different scales quite quickly: a pentatonic, the western 12 tone, and a microtonal one (the next good closures come at 41 and 53 notes). The larger scales allow us to get closer to the “proper” values, but never exactly. Then you can look at just intonation, and it rapidly gets ever more complicated. What matters here is that the number of notes isn’t an arbitrary choice.
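One way to check which note counts close the circle acceptably is to compute the mismatch directly. The sketch below measures, in cents (1200 per octave), how far n stacked fifths land from a whole number of octaves. It keeps each n that beats every smaller n. The function names and the cutoff of 60 are assumptions for illustration.

```python
import math

FIFTH_CENTS = 1200 * math.log2(3 / 2)  # about 701.96 cents


def closure_error_cents(n):
    """Signed distance, in cents, of n stacked fifths from the nearest whole octave."""
    total = n * FIFTH_CENTS
    return total - 1200 * round(total / 1200)


def best_closures(max_notes):
    """Note counts whose closure error beats every smaller count."""
    best, found = float("inf"), []
    for n in range(1, max_notes + 1):
        err = abs(closure_error_cents(n))
        if err < best:
            best = err
            found.append(n)
    return found


for n in best_closures(60):
    print(n, round(closure_error_cents(n), 2))
```

The 12-note entry misses by about 23.5 cents, which is the Pythagorean comma. What counts as "acceptable" is still a judgment call that the arithmetic does not make for you.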
There is also a huge underlying set of technical questions about the nature of harmony here. The easy introductory way of looking at harmony takes these ratios and looks for combinations that produce more low-order ratios (while avoiding higher-order ones). This is the point where we can think of music as messing about with mathematical rules. The rules of harmony can be derived in a reasonably regular manner, and the rules for simple melody come out soon after.
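A rough sketch of that introductory view: take a chord, work out the ratio between every pair of notes, and see how simple those ratios are. The `complexity` measure here (numerator plus denominator) is a crude stand-in I am assuming for illustration, not an established rule of harmony.

```python
from fractions import Fraction
from itertools import combinations


def pairwise_ratios(chord):
    """Interval ratio between every pair of notes, lower note first."""
    return [hi / lo for lo, hi in combinations(sorted(chord), 2)]


def complexity(ratio):
    """Crude, assumed measure: numerator plus denominator of the reduced ratio."""
    return ratio.numerator + ratio.denominator


major_triad = [Fraction(1), Fraction(5, 4), Fraction(3, 2)]
cluster = [Fraction(1), Fraction(16, 15), Fraction(9, 8)]

for name, chord in [("major triad", major_triad), ("cluster", cluster)]:
    ratios = pairwise_ratios(chord)
    print(name, [str(r) for r in ratios], max(complexity(r) for r in ratios))
```

The triad contains only 5/4, 3/2 and 6/5, while the cluster of a semitone and a tone throws up 135/128 between its upper two notes.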
But that would only get us a short distance into human music. Nothing here tells you about the nuances of harmony, or why the different scales sound different and have different emotional impact. Why does moving a single note in a scale (say, major to minor) utterly change the impact? Bach’s six-part ricercar from his Musical Offering might be a most fabulous example of ingenuity, and the subject of much mathematical analysis, but it is also quite beautiful.
Once you get into the last century of music, all this goes out the window. Dissonance started to make inroads into western music about 150 years ago, but was present in other cultures long before (and was appropriated from there in some cases). The increasing ability of music to express complex things is clearly tied to the increasing richness of harmonic theory as much as anything else, and we are past any easy sort of mathematical structure to provide guidance on how to make such music. (I attended a performance of Rach 2 and Shostakovich’s 5th a couple of days ago. Magnificent. I challenge anyone to work out how to explain either piece to an alien species.)
The point?
Some of this comes down to the specific manner in which our ears work, and the limitations of our ability to discern frequencies. Our ear can discern individual tones very well, but masking effects mean that as the music becomes more harmonically rich, small-scale shifts in frequency are not discernible, which is really why we get away with using the equal-tempered scale despite its clear flaws. Western music puts up with the limitations of the diatonic scale, whereas some others (Indian especially) have harmonic nuances westerners can barely, if at all, register. But all of us are limited by the construction of our ears. Another species might be expected to have different frequency resolution capabilities. They might hear with a totally differently constructed sensory system, and that would mean a potentially different set of rules for harmony, and thus for pretty much all music. They might have a huge frequency response, say hearing sound up to 1 MHz, but have barely any ability to resolve frequency differences with ratios of less than about 3:2. They might have exquisite frequency resolution, down to being able to pick out dissonance in harmony of about one cent. Both might have highly developed music. Neither would be able to understand much about our music, nor we theirs, although we might all agree that each was making music.
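To put rough numbers on the flaws of the equal-tempered scale mentioned above, here is a small sketch comparing a few simple ratios with the nearest step of the 12-tone equal-tempered scale, where every semitone is exactly 100 cents. The function names are my own, and the choice of intervals is just for illustration.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def equal_tempered_error(ratio):
    """Cents by which the nearest 12-tone equal-tempered step misses `ratio`."""
    just = cents(ratio)
    nearest_step = round(just / 100) * 100  # each equal-tempered semitone is 100 cents
    return nearest_step - just


intervals = [
    ("fifth", 3 / 2),
    ("fourth", 4 / 3),
    ("major third", 5 / 4),
    ("minor third", 6 / 5),
]
for name, ratio in intervals:
    print(f"{name:12s} {equal_tempered_error(ratio):+6.2f} cents")
```

The fifth and fourth are within about two cents of the simple ratios, while the thirds miss by roughly 14 to 16 cents. A listener able to resolve one cent would presumably find the thirds much harder to put up with than we do.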