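Thank you very much for the answers. I just learnt that CUDA is compiled in two steps: first CUDA code to PTX, then PTX to machine code for the specific GPU (SASS). The first step has an open-source implementation (the NVPTX backend in LLVM), while the second is closed because it targets specific hardware. This second compiler is available offline as ptxas in the CUDA toolkit, and a JIT version of it is part of the GPU driver.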
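For anyone who wants to look at the two stages themselves, here is a minimal sketch. The file name `saxpy.cu`, the kernel, the helper `run_saxpy`, and the target `sm_80` are all my own assumptions for illustration; substitute the architecture of your own GPU.

```cuda
// saxpy.cu: a tiny program for inspecting the two compilation stages.
//
// Stage 1 (CUDA C++ -> PTX):      nvcc -arch=sm_80 -ptx saxpy.cu -o saxpy.ptx
// Stage 2 (PTX -> machine code):  ptxas -arch=sm_80 saxpy.ptx -o saxpy.cubin
// Look at the resulting SASS:     cuobjdump -sass saxpy.cubin
//
// sm_80 is an assumed target, not something from the discussion above.
#include <cstdio>
#include <cuda_runtime.h>

// Computes y[i] = a * x[i] + y[i] for every i < n.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper: runs the kernel on n elements and checks the result.
bool run_saxpy(int n) {
    float* x = nullptr;
    float* y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return false;
    }

    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);

    // Wait for the kernel and pick up any launch or execution error.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    // Every element should now be 2 * 1 + 2 = 4.
    for (int i = 0; ok && i < n; ++i) {
        if (y[i] != 4.0f) ok = false;
    }

    cudaFree(x);
    cudaFree(y);
    return ok;
}

int main() {
    bool ok = run_saxpy(1024);
    std::printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

The `.ptx` file from stage 1 is plain text, so you can open it in an editor. If an application ships only PTX for a given GPU, the driver does stage 2 at load time instead of `ptxas`.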