Parallel portions of an application are executed on the device as kernels (for example, add<<<1, 1>>>). In this basic model, one kernel executes at a time, although newer GPUs can also run kernels launched into different streams concurrently. Each kernel is executed by an array of many threads, and all of these threads run the same code. Each thread has an ID that it uses to compute memory addresses and make control decisions.
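The following is a minimal sketch of these ideas, assuming a simple element-wise vector addition. The source only names the kernel `add`. Its argument list, the host helper `vector_add_check`, and the launch sizes are hypothetical choices made for illustration. Every thread runs the same `add` code. Each thread combines `blockIdx.x`, `blockDim.x`, and `threadIdx.x` into a unique ID, which it uses to pick a memory address and to decide (via the bounds check) whether to do any work at all. The source's `add<<<1, 1>>>` launches one block containing a single thread. This sketch instead launches enough blocks of 256 threads to cover every element.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: every thread executes this same code.
// The thread ID drives both the memory address (which element to add)
// and a control decision (skip threads past the end of the array).
__global__ void add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread ID
    if (i < n) {                                    // control decision
        c[i] = a[i] + b[i];                         // address computed from ID
    }
}

// Hypothetical host helper (not from the source): fills a[i] = i and
// b[i] = 2 * i, runs the kernel, and returns how many results differ
// from the expected value 3 * i.
int vector_add_check(int n) {
    const size_t bytes = n * sizeof(int);
    std::vector<int> h_a(n), h_b(n), h_c(n, -1);
    for (int i = 0; i < n; ++i) { h_a[i] = i; h_b[i] = 2 * i; }

    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    // Execution configuration: <<<number of blocks, threads per block>>>.
    // add<<<1, 1>>> would run a single thread; here the grid is rounded up
    // so that at least n threads exist, and the bounds check discards extras.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    add<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_c[i] != 3 * i) ++mismatches;
    }
    return mismatches;
}

int main() {
    int mismatches = vector_add_check(1000);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Rounding the grid size up and guarding with `if (i < n)` is a common pattern. It lets the same kernel handle any `n`, even when `n` is not a multiple of the block size.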
CUDA (3.2.16)
Video credit: Udacity.