/ˈkuː-də/
n. “A parallel computing platform and programming model for NVIDIA GPUs.”
CUDA, short for Compute Unified Device Architecture, is a proprietary parallel computing platform and application programming interface (API) developed by NVIDIA. It enables software developers to harness the massive parallel processing power of NVIDIA GPUs for general-purpose computing tasks beyond graphics, such as scientific simulations, deep learning, and data analytics.
Unlike traditional CPU programming, CUDA allows developers to write programs that can execute thousands of lightweight threads simultaneously on GPU cores. It provides a C/C++-like programming environment with extensions for managing memory, threads, and device execution.
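For instance, GPU functions (kernels) are marked with the __global__ qualifier and launched from host code with a triple-angle-bracket syntax that specifies how many thread blocks, and how many threads per block, to run. A minimal sketch, where scale is a hypothetical kernel and d_x an already-allocated GPU buffer of 1024 floats:

__global__ void scale(float *x, float s) {          // __global__: runs on the GPU, callable from the CPU
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // built-in coordinates give each thread its own index
    x[i] *= s;                                      // each of the 1024 threads scales one element
}

scale<<<4, 256>>>(d_x, 2.0f);  // host-side launch: 4 blocks x 256 threads = 1024 parallel threads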
Key characteristics of CUDA include:
- Massive Parallelism: Exploits hundreds or thousands of GPU cores for data-parallel workloads.
- GPU-Accelerated Computation: Offloads heavy numeric or matrix computations from the CPU to the GPU.
- Developer-Friendly APIs: Provides C, C++, Python (via libraries like PyCUDA), and Fortran support.
- Memory Management: Allows explicit allocation of GPU memory and transfer between CPU and GPU memory (see the sketch after this list).
- Wide Adoption: Common in AI, machine learning, scientific research, video processing, and high-performance computing.
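As a sketch of that explicit memory management (the buffer names h_data and d_data are illustrative, and error checking is omitted for brevity):

float *h_data = (float *)malloc(n * sizeof(float));   // buffer in CPU (host) memory
float *d_data;
cudaMalloc(&d_data, n * sizeof(float));               // buffer in GPU (device) memory
cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);  // CPU -> GPU
// ... launch kernels that operate on d_data ...
cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);  // GPU -> CPU
cudaFree(d_data);
free(h_data);

Because every byte the GPU touches must cross this host-device boundary, minimizing transfers is a central concern when optimizing CUDA programs.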
Conceptual example of CUDA usage:
// CUDA kernel: each thread handles one element of the vectors
__global__ void vectorAdd(float *a, float *b, float *c, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;  // this thread's global index
    if (i < n) c[i] = a[i] + b[i];                  // bounds check, then add one pair
}
Host allocates arrays a, b, c
Copy a and b to GPU memory
Launch vectorAdd kernel on GPU
Copy results back to host memory

Conceptually, CUDA is like giving a GPU thousands of small, specialized workers that can all perform the same operation at once, vastly speeding up data-intensive tasks compared to using a CPU alone.
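In real CUDA C, those four host-side steps map onto runtime API calls roughly as follows. This is a minimal sketch pairing with the vectorAdd kernel above; the array size and the 256-thread block size are illustrative choices, and error checking is omitted:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1 << 20;                         // one million elements
    size_t bytes = n * sizeof(float);

    // Host allocates arrays a, b, c
    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes), *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // Copy a and b to GPU memory
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // Launch vectorAdd kernel on GPU: enough blocks to cover all n elements
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // Copy results back to host memory
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);             // prints 3.000000

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}

Compiled with NVIDIA's nvcc compiler (for example, nvcc vector_add.cu -o vector_add), this launches 4,096 blocks of 256 threads each, one thread per array element.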
In essence, CUDA transforms NVIDIA GPUs into powerful general-purpose parallel processors, enabling researchers, engineers, and developers to tackle workloads that require enormous computational throughput.