We can see that the CUDA version takes only 6 seconds, while the original version would take more than an hour, a speedup of more than 600×.

