/paɪ-ˈkuː-də/
n. “A Python library that lets developers access CUDA from Python programs.”
PyCUDA is a Python wrapper for NVIDIA CUDA, enabling developers to write high-performance parallel programs for GPUs directly from Python. It combines Python’s ease of use with the computational power of CUDA, allowing rapid development, experimentation, and integration with scientific or AI workflows.
PyCUDA provides direct access to GPU memory management, kernel execution, and asynchronous computation, all through familiar, intuitive Python syntax. It also automates resource cleanup and integrates smoothly with NumPy arrays, making it highly practical for numerical computing and machine learning.
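For instance, the pycuda.gpuarray module wraps device memory in a NumPy-like array type whose storage is released automatically when the object is garbage-collected. A minimal sketch (the array values are illustrative):

import pycuda.autoinit          # initializes the CUDA context and handles cleanup
import pycuda.gpuarray as gpuarray
import numpy as np

a_gpu = gpuarray.to_gpu(np.array([1, 2, 3, 4], dtype=np.float32))  # host -> device
b_gpu = 2 * a_gpu               # elementwise arithmetic executes on the GPU
print(b_gpu.get())              # device -> host: [2. 4. 6. 8.]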
Key characteristics of PyCUDA include:
- Python Integration: Write GPU kernels and manage memory using Python code.
- Kernel Execution: Launch CUDA kernels from Python with minimal boilerplate.
- Memory Management: Automatic cleanup while supporting explicit control over GPU memory.
- NumPy Interoperability: Transfer arrays between host and GPU efficiently.
- Rapid Prototyping: Ideal for research, AI experiments, and GPU-accelerated computations (see the ElementwiseKernel sketch after this list).
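As an illustration of that rapid-prototyping style, PyCUDA's ElementwiseKernel helper generates and compiles a kernel from a single C expression, so simple operations need no hand-written CUDA source. A minimal sketch (the kernel name and expression are illustrative):

import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy as np
from pycuda.elementwise import ElementwiseKernel

double_them = ElementwiseKernel(
    "float *a",      # C-style argument list
    "a[i] *= 2",     # operation applied at every index i
    "double_them")   # name for the generated kernel
a_gpu = gpuarray.to_gpu(np.arange(4, dtype=np.float32))
double_them(a_gpu)   # compiles on first use, then launches over all elements
print(a_gpu.get())   # [0. 2. 4. 6.]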
Conceptual example of PyCUDA usage:
import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void double_elements(float *a) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    a[idx] *= 2;
}
""")
a = np.array([1, 2, 3, 4], dtype=np.float32)
a_gpu = drv.mem_alloc(a.nbytes)             # allocate device memory
drv.memcpy_htod(a_gpu, a)                   # copy the host array to the GPU
func = mod.get_function("double_elements")
func(a_gpu, block=(4, 1, 1), grid=(1, 1))   # one thread per element
drv.memcpy_dtoh(a, a_gpu)                   # copy the result back to the host
print(a)  # Output: [2. 4. 6. 8.]
Conceptually, PyCUDA allows Python developers to “speak GPU” directly, turning high-level Python code into massively parallel operations on GPU cores. It bridges the gap between prototyping and high-performance computation, making GPUs accessible without leaving the comfort of Python.
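The asynchronous computation mentioned above can be sketched with CUDA streams and page-locked (pinned) host memory, which let copies and kernel launches be queued without blocking the CPU. This sketch reuses a_gpu and func from the example above:

strm = drv.Stream()                      # an independent execution queue
a = drv.pagelocked_empty(4, np.float32)  # pinned memory is required for async copies
a[:] = [1, 2, 3, 4]
drv.memcpy_htod_async(a_gpu, a, strm)    # queue host -> device copy
func(a_gpu, block=(4, 1, 1), grid=(1, 1), stream=strm)  # queue the kernel launch
drv.memcpy_dtoh_async(a, a_gpu, strm)    # queue device -> host copy
strm.synchronize()                       # block until all queued work finishes
print(a)                                 # [2. 4. 6. 8.]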