/ˈpærəˌlɛlɪzəm/

noun … “Doing multiple computations at the same time.”

Parallelism is a computing model in which multiple computations or operations are executed simultaneously, using more than one processing resource. Its purpose is to reduce total execution time by dividing work into independent or partially independent units that can run at the same time. Parallelism is a core technique in modern computing, driven by the physical limits of single-core performance and the widespread availability of multicore processors, accelerators, and distributed systems.

At a technical level, parallelism exploits hardware that can execute multiple instruction streams simultaneously. This includes multicore CPUs, many-core GPUs, and clusters of machines connected by high-speed networks. Each processing unit works on a portion of the overall problem, and the partial results are combined to produce the final outcome. The effectiveness of parallelism depends on how well a problem can be decomposed and how much coordination is required between tasks.
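
The decompose/compute/combine pattern can be shown with a minimal sketch in Python's standard concurrent.futures module. The summing workload and the worker count are illustrative assumptions, not part of the definition above.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each worker handles one portion of the overall problem.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Decompose the input into roughly equal, independent chunks.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks)   # run chunks in parallel
    return sum(partials)                           # combine partial results

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000_000))))
```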

A key distinction is between parallelism and Concurrency. Concurrency describes the structure of a program that can make progress on multiple tasks at overlapping times, while parallelism specifically refers to those tasks running at the same instant on different hardware resources. A concurrent program may or may not be parallel, but parallel execution always implies some degree of concurrency.
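
One way to see the distinction is a small sketch in CPython, where the global interpreter lock keeps CPU-bound threads from running simultaneously even though the program is structured concurrently. The busy-work function and pool sizes are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n):
    # Pure computation: benefits from parallelism, not just concurrency.
    return sum(i * i for i in range(n))

def timed(pool_cls, label):
    start = time.perf_counter()
    with pool_cls(max_workers=4) as pool:
        list(pool.map(cpu_bound, [2_000_000] * 4))
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    # Threads: concurrent structure, but CPU-bound work is serialized by the GIL.
    timed(ThreadPoolExecutor, "threads (concurrent, not parallel for CPU work)")
    # Processes: the same structure actually executes in parallel on multiple cores.
    timed(ProcessPoolExecutor, "processes (parallel)")
```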

There are several common forms of parallelism. Data parallelism applies the same operation to many elements of a dataset simultaneously, such as processing pixels in an image or rows in a matrix. Task parallelism assigns different operations or functions to run in parallel, often coordinating through shared data or messages. Pipeline parallelism structures computation as stages, where different stages process different inputs concurrently. Each form has different synchronization, memory, and performance characteristics.
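
A compact sketch of the first two forms, again using concurrent.futures; the worker functions and inputs are illustrative assumptions. Pipeline parallelism would instead connect stages with queues so each stage processes a different input at the same time.

```python
from concurrent.futures import ProcessPoolExecutor

def brighten(pixel):          # data parallelism: same operation per element
    return min(pixel + 40, 255)

def load_config():            # task parallelism: different operations in parallel
    return {"threshold": 0.5}

def build_index():
    return sorted(range(10), reverse=True)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Data parallelism: one operation applied to many elements at once.
        pixels = list(pool.map(brighten, [10, 200, 240, 90]))

        # Task parallelism: unrelated functions submitted to run side by side.
        config_future = pool.submit(load_config)
        index_future = pool.submit(build_index)
        print(pixels, config_future.result(), index_future.result())
```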

In practice, implementing parallelism requires careful coordination. Tasks must be scheduled, data must be shared or partitioned safely, and results must be synchronized. Overheads such as communication, locking, and cache coherence can reduce or eliminate performance gains if not managed properly. Concepts like load balancing, minimizing contention, and maximizing locality are central to effective parallel design.
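
A minimal sketch of one such concern, minimizing contention: instead of every worker updating a single shared counter under a lock, each worker accumulates locally and the results are combined once at the end. The function names and work split are illustrative assumptions.

```python
import threading

def count_evens_contended(data, workers=4):
    total = 0
    lock = threading.Lock()
    def worker(chunk):
        nonlocal total
        for x in chunk:
            if x % 2 == 0:
                with lock:          # every update competes for the same lock
                    total += 1
    threads = [threading.Thread(target=worker, args=(data[i::workers],))
               for i in range(workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return total

def count_evens_local(data, workers=4):
    results = [0] * workers
    def worker(i, chunk):
        results[i] = sum(1 for x in chunk if x % 2 == 0)  # no shared state
    threads = [threading.Thread(target=worker, args=(i, data[i::workers]))
               for i in range(workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(results)             # single combination step at the end

if __name__ == "__main__":
    data = list(range(100_000))
    assert count_evens_contended(data) == count_evens_local(data) == 50_000
```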

A typical workflow example is numerical simulation. A large grid is divided into subregions, each assigned to a different core or node. All regions are updated in parallel for each simulation step, then boundary values are exchanged before the next step begins. This approach allows simulations that would take days on a single processor to complete in hours when parallelized effectively.
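
The grid-and-boundary-exchange pattern can be sketched in miniature with a one-dimensional averaging update standing in for a real simulation kernel. The kernel, grid size, and worker count are illustrative assumptions.

```python
from concurrent.futures import ProcessPoolExecutor

def update_region(region_with_halo):
    # Update interior points using left/right neighbors (the halo cells).
    r = region_with_halo
    return [(r[i - 1] + r[i] + r[i + 1]) / 3.0 for i in range(1, len(r) - 1)]

def simulate(grid, steps=10, workers=4):
    n = len(grid)
    size = n // workers
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # Partition the grid and attach one halo cell on each side.
            regions = []
            for w in range(workers):
                lo, hi = w * size, n if w == workers - 1 else (w + 1) * size
                left = grid[lo - 1] if lo > 0 else grid[lo]
                right = grid[hi] if hi < n else grid[hi - 1]
                regions.append([left] + grid[lo:hi] + [right])
            # All regions are updated in parallel for this step ...
            updated = pool.map(update_region, regions)
            # ... then boundary values are exchanged by reassembling the grid.
            grid = [x for region in updated for x in region]
    return grid

if __name__ == "__main__":
    print(simulate([0.0] * 32 + [100.0] + [0.0] * 31, steps=5)[28:36])
```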

Parallelism also underlies many high-level programming models and systems. Thread-based models distribute work across cores within a single process. Process-based models use multiple address spaces for isolation. Distributed systems extend parallelism across machines, often using message passing. Programming models and runtimes such as OpenMP, CUDA, and actor-based systems provide abstractions that expose parallelism while attempting to reduce complexity.
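
A minimal sketch of the message-passing style used by process-based and distributed models, with multiprocessing queues standing in for a network transport. The worker logic and message shapes are illustrative assumptions.

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Each worker has its own address space; state is shared only via messages.
    while True:
        msg = inbox.get()
        if msg is None:              # sentinel: shut down
            break
        outbox.put(msg ** 2)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(2)]
    for p in procs:
        p.start()
    for n in range(8):               # send work as messages
        inbox.put(n)
    results = sorted(outbox.get() for _ in range(8))
    for _ in procs:                  # one sentinel per worker
        inbox.put(None)
    for p in procs:
        p.join()
    print(results)
```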

Conceptually, parallelism is like assigning many builders to construct different parts of a structure at the same time. Progress accelerates dramatically when tasks are independent and well-coordinated, but slows when workers constantly need to stop and synchronize.

See Concurrency, Threading, Multiprocessing, Distributed Systems, GPU.