It’s just regular ol’ mixing software. It’s expensive AF and has some bells and whistles (autotune, effects, etc.), but it works exactly like I describe above.
It’s a computer program, made of a series of assembly language instructions. It takes as input some songs someone else wrote (they don’t have to be songs someone else wrote, but I have zero musical talent, so that’s all I got) encoded as 1s and 0s, and some parameters provided by me. The program runs by executing one dumb assembly language instruction after another. Each instruction takes some bits from the input data and does something with them (a very small, predictable thing, like load a few bits of input data, do some arithmetic, go to another instruction, etc.). When the program completes, a new piece of audio (encoded as 1s and 0s) has been created.
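To make that concrete, here’s a rough sketch in Python of what “songs plus parameters in, new audio out” can look like. This is not how any real mixing software is implemented. The function name `mix` and the parameters `gain_a` and `gain_b` are made up for illustration, and I’m assuming each track is just a list of signed 16-bit samples.

```python
def mix(track_a, track_b, gain_a=0.5, gain_b=0.5):
    """Mix two tracks (lists of signed 16-bit samples) into one new track.

    gain_a and gain_b are the "parameters provided by me": how loud
    each input should be in the result. All names here are hypothetical.
    """
    length = max(len(track_a), len(track_b))
    out = []
    for i in range(length):
        # Load one sample from each input (silence if a track has ended).
        a = track_a[i] if i < len(track_a) else 0
        b = track_b[i] if i < len(track_b) else 0
        # Do some arithmetic: scale each sample and add them together.
        sample = int(a * gain_a + b * gain_b)
        # Clamp to the 16-bit range so the result is still valid audio.
        sample = max(-32768, min(32767, sample))
        out.append(sample)
    return out


if __name__ == "__main__":
    song_one = [1000, 2000, -3000]
    song_two = [3000, -1000]
    print(mix(song_one, song_two))
```

Every step in that loop is one of the small predictable things from above: load a number, multiply, add, compare, go around again. The output is new audio that didn’t exist before, and nothing more mysterious than arithmetic produced it.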
This is exactly what the other program, the one that is an AI model for creating audio, does. The encoding of the audio files is a lot more complicated, but that’s all it is. It cannot be anything else, because that program (and every program ever written) works exactly as I describe above. There are no AI fairies.