DirectCompute
/dəˈrɛkt-kəmˈpjuːt/
n. “A Microsoft API for performing general-purpose computing on GPUs.”
DirectCompute is part of the Microsoft DirectX family and provides an API that allows developers to leverage GPU computing for tasks beyond graphics rendering. It enables general-purpose parallel computations on compatible GPUs using the same hardware acceleration that powers 3D graphics.
cuDNN
/ˌsiː-juː-diː-ɛn-ɛn/
n. “A GPU-accelerated library for deep neural networks developed by NVIDIA.”
cuDNN, short for CUDA Deep Neural Network library, is a GPU-accelerated library created by NVIDIA that provides highly optimized implementations of standard routines used in deep learning. It is designed to work with CUDA-enabled GPUs and is commonly integrated into frameworks such as TensorFlow, PyTorch, and MXNet to accelerate training and inference of neural networks.
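In practice, cuDNN is usually reached through a framework rather than called directly. A minimal sketch, assuming a CUDA-enabled PyTorch installation, of how PyTorch exposes its cuDNN backend:

```python
# Sketch: reaching cuDNN indirectly through PyTorch.
# Assumes PyTorch built with CUDA/cuDNN support and an NVIDIA GPU.
import torch
import torch.nn as nn

# torch.backends.cudnn controls PyTorch's use of the cuDNN library.
torch.backends.cudnn.enabled = True      # use cuDNN kernels when available
torch.backends.cudnn.benchmark = True    # let cuDNN autotune convolution algorithms

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 3, 224, 224, device="cuda")
y = conv(x)       # this convolution is dispatched to a cuDNN routine
print(y.shape)    # torch.Size([8, 16, 224, 224])
```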
TensorRT
/ˈtɛnsər-ɑːr-ti/
n. “A high-performance deep learning inference library for NVIDIA GPUs.”
TensorRT is a platform developed by NVIDIA that optimizes and accelerates the inference of neural networks on GPUs. Unlike training-focused frameworks, TensorRT is designed specifically for deploying pre-trained deep learning models efficiently, minimizing latency and maximizing throughput in production environments.
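A hedged sketch of the typical TensorRT workflow, building an optimized inference engine from a pre-trained ONNX model with the `tensorrt` Python bindings; the exact builder flags vary between TensorRT versions, and `model.onnx` is a placeholder for your own network:

```python
# Sketch: building a TensorRT engine from an ONNX model.
# Assumes the tensorrt Python package (TensorRT 8.x-era API shown).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:      # placeholder path
    parser.parse(f.read())

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)
# `engine` holds the serialized, optimized network ready for deployment.
```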
Digital Signal Processor
/ˌdiː-ɛs-ˈpiː/
n. “A specialized microprocessor designed to efficiently perform digital signal processing tasks.”
DSP, short for Digital Signal Processor, is a type of processor optimized for real-time numerical computations on signals such as audio, video, communications, and sensor data. Unlike general-purpose CPUs, DSPs include specialized hardware features like multiply-accumulate units, circular buffers, and hardware loops to accelerate mathematical operations commonly used in signal processing algorithms.
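The multiply-accumulate (MAC) pattern those units exist to speed up is easy to see in a finite impulse response (FIR) filter; a minimal illustrative sketch in Python, written as an explicit loop for clarity:

```python
# Sketch: the multiply-accumulate (MAC) pattern at the heart of DSP work.
import numpy as np

def fir_filter(signal, taps):
    """y[n] = sum_k taps[k] * signal[n - k]  (zero-padded at the edges)."""
    out = np.zeros(len(signal))
    for n in range(len(signal)):
        acc = 0.0                                # the accumulator a DSP keeps in hardware
        for k in range(len(taps)):
            if n - k >= 0:
                acc += taps[k] * signal[n - k]   # one multiply-accumulate
        out[n] = acc
    return out

# 4-tap moving-average filter smoothing a noisy signal
noisy = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.2 * np.random.randn(64)
smoothed = fir_filter(noisy, taps=np.ones(4) / 4)
```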
NVIDIA
/ɛnˈvɪdiə/
n. “An American technology company specializing in GPUs and AI computing platforms.”
NVIDIA is a leading technology company known primarily for designing graphics processing units (GPUs) for gaming, professional visualization, and data centers. Founded in 1993, NVIDIA has expanded its focus to include high-performance computing, artificial intelligence, deep learning, and autonomous vehicle technologies.
PyCUDA
/paɪ-ˈkuː-də/
n. “A Python library that lets developers access CUDA from Python programs.”
PyCUDA is a Python wrapper for NVIDIA CUDA, enabling developers to write high-performance parallel programs for GPUs directly from Python. It combines Python’s ease of use with the computational power of CUDA, allowing rapid development, experimentation, and integration with scientific or AI workflows.
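A minimal sketch, close to PyCUDA's own introductory example, in which a CUDA kernel is compiled from source at runtime and launched from Python (assumes an NVIDIA GPU with the CUDA toolkit and pycuda installed):

```python
# Sketch: compiling and launching a CUDA kernel from Python with PyCUDA.
import numpy as np
import pycuda.autoinit                 # initializes a CUDA context
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")

a = np.random.randn(400).astype(np.float32)
b = np.random.randn(400).astype(np.float32)
dest = np.zeros_like(a)

# drv.In/drv.Out handle the host<->device copies automatically.
multiply_them(drv.Out(dest), drv.In(a), drv.In(b),
              block=(400, 1, 1), grid=(1, 1))
assert np.allclose(dest, a * b)
```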
CUDA
/ˈkuː-də/
n. “A parallel computing platform and programming model for NVIDIA GPUs.”
CUDA, short for Compute Unified Device Architecture, is a proprietary parallel computing platform and application programming interface (API) developed by NVIDIA. It enables software developers to harness the massive parallel processing power of NVIDIA GPUs for general-purpose computing tasks beyond graphics, such as scientific simulations, deep learning, and data analytics.
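CUDA kernels are normally written in C/C++, but the same grid/block/thread execution model can be sketched from Python using the third-party Numba library (an assumption here; Numba is not part of CUDA itself):

```python
# Sketch: CUDA's thread/block/grid model via Numba (third-party library).
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # global thread index across the whole grid
    if i < out.size:          # guard threads past the end of the array
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # kernel launch
assert np.allclose(out, a + b)
```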
GPGPU
/ˌdʒiː-piː-ˌdʒiː-piː-ˈjuː/
n. “The use of a graphics processing unit to perform general-purpose computation.”
GPGPU, short for General-Purpose computing on Graphics Processing Units, refers to using a GPU to perform computations that are not limited to graphics rendering. While GPUs were originally designed to accelerate drawing pixels and polygons, their massively parallel architecture makes them exceptionally good at handling large-scale numerical and data-parallel workloads.
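A sketch of GPGPU in practice using the third-party CuPy library (one of several possible stacks): a large matrix multiply, a workload with nothing to do with rendering, runs entirely on the GPU:

```python
# Sketch: a general-purpose numeric workload on the GPU with CuPy.
# Assumes an NVIDIA GPU and the cupy package installed.
import cupy as cp

a = cp.random.rand(4096, 4096, dtype=cp.float32)  # allocated in GPU memory
b = cp.random.rand(4096, 4096, dtype=cp.float32)
c = a @ b                      # executed by thousands of GPU threads
print(float(c.sum()))          # copy the scalar result back to the host
```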
Graphics Processing Unit
/ˌdʒiː-piː-ˈjuː/
n. “The processor built for crunching graphics and parallel tasks.”
GPU, short for Graphics Processing Unit, is a specialized processor designed to accelerate rendering of images, video, and animations for display on a computer screen. Beyond graphics, modern GPUs are also used for parallel computation in fields like machine learning, scientific simulations, and cryptocurrency mining.
Key characteristics of GPUs include:
- A massively parallel architecture with hundreds to thousands of cores
- High memory bandwidth for streaming large arrays of data
- A throughput-oriented (SIMD/SIMT) design that favors many concurrent threads over fast single-thread performance
TPU
/ˌtiː-piː-ˈjuː/
n. “Silicon designed to think fast.”
TPU, or Tensor Processing Unit, is Google’s custom-built hardware accelerator specifically crafted to handle the heavy lifting of machine learning workloads. Unlike general-purpose CPUs or even GPUs, TPUs are optimized for tensor operations — the core mathematical constructs behind neural networks, deep learning models, and AI frameworks such as TensorFlow.
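A hedged sketch of targeting a TPU from Python with JAX, one common route alongside TensorFlow; it assumes a TPU runtime such as a Cloud TPU VM:

```python
# Sketch: running tensor operations on a TPU via JAX.
import jax
import jax.numpy as jnp

print(jax.devices())           # on a TPU host, lists TpuDevice entries

@jax.jit                       # compiled via XLA for the available backend
def predict(w, x):
    return jnp.tanh(x @ w)     # the tensor ops TPUs are built to accelerate

x = jnp.ones((1024, 512))
w = jnp.ones((512, 256))
y = predict(w, x)              # runs on the TPU when one is present
print(y.shape)                 # (1024, 256)
```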