QA - Section 1.03. Parallel

Posted Apr 16, 2026

3 min read

1. How do you decide what to parallelize?

You should parallelize parts of the code that are computationally expensive and independent. Typically, large loops or operations over big datasets are good candidates.

Example (Good Candidate)

  
#include <thread>
#include <vector>

void process(std::vector<int>& v, int start, int end) 
{
    for (int i = start; i < end; i++)
        v[i] *= 2;
}

int main() {
    std::vector<int> v(1000000, 1);

    std::thread t1(process, std::ref(v), 0, 500000);
    std::thread t2(process, std::ref(v), 500000, 1000000);

    t1.join();
    t2.join();
}

2. Why can parallelization make things slower?

If the workload is too small, the overhead of threading dominates.

Example (Too Fine-Grained)

  
for (int i = 0; i < 1000; i++) 
{
    std::thread([] {
        // very small work
    }).detach();
}

Thread creation cost > actual work

3. How do you decide granularity?

You must balance between too small and too large tasks.

Example (Better Granularity)

  
int chunk = v.size() / 4;

for (int i = 0; i < 4; i++) 
    threads.emplace_back(process, std::ref(v), i * chunk, (i + 1) * chunk);

Each thread has enough work

4. How do you reduce workload imbalance?

Split work evenly or dynamically assign tasks.

Example (Static Balanced Split)

  
int mid = v.size() / 2;

std::thread t1(process, std::ref(v), 0, mid);
std::thread t2(process, std::ref(v), mid, v.size());

Equal work → better utilization

Multiple threads update variables on the same cache line.

  
struct Data 
{
    int a;
    int b;
};

Data d;

void t1() { for (int i = 0; i < 1e7; i++) d.a++; }
void t2() { for (int i = 0; i < 1e7; i++) d.b++; }

a and b may share the same cache line → performance drop

Fix (Padding)

  
struct Data 
{
    alignas(64) int a;
    alignas(64) int b;
};

6. Why is cache locality important?

Accessing contiguous memory improves performance.

Example (Good Locality)

  
for (int i = 0; i < n; i++)
    arr[i] += 1;

Bad Locality

  
for (int i = 0; i < n; i++)
    arr[i * 1024] += 1;

Cache miss increases

7. Amdahl’s Law

Even if you parallelize part of the code, the serial part limits speedup.

Example

// 90% parallel, 10% serial
// Max speedup ≈ 10x even with infinite threads

8. What are parallelization overheads?

Thread creation
Synchronization
Context switching

Example

  
std::thread t1(func);
t1.join(); // synchronization overhead

9. What is lock contention?

Multiple threads compete for the same lock.

Example

  
#include <mutex>

std::mutex m;
int counter = 0;

void inc() 
{
    std::lock_guard<std::mutex> lock(m);
    counter++;
}

Many threads → waiting → performance drop

10. Why can memory allocation become a bottleneck?

Allocators are often shared and require synchronization.

Example

  
void work() 
{
    for (int i = 0; i < 1000000; i++) 
    {
        int* p = new int(i);
        delete p;
    }
}

Multiple threads doing this → allocator contention

In a multithreaded environment, new and delete themselves do not directly cause threads to wait. Instead, they rely on an internal memory allocator that must be thread-safe. Because multiple threads may access the same memory pool at the same time, the allocator uses synchronization mechanisms such as locks or atomic operations. As a result, when one thread is allocating or freeing memory, other threads may have to wait, leading to contention and potential performance overhead.

Better Approach (Object Reuse)

  
std::vector<int> buffer(1000000);

for (int i = 0; i < buffer.size(); i++)
    buffer[i] = i;

QA, QA - Selection 1

QA QA - Selection 1

This post is licensed under CC BY 4.0 by the author.

QA - Section 1.03. Parallel

1. How do you decide what to parallelize?

Example (Good Candidate)

2. Why can parallelization make things slower?

Example (Too Fine-Grained)

3. How do you decide granularity?

Example (Better Granularity)

4. How do you reduce workload imbalance?

Example (Static Balanced Split)

5. What is False Sharing?

Example (False Sharing)

Fix (Padding)

6. Why is cache locality important?

Example (Good Locality)

Bad Locality

7. Amdahl’s Law

Example

8. What are parallelization overheads?

Example

9. What is lock contention?

Example

10. Why can memory allocation become a bottleneck?

Example

Better Approach (Object Reuse)

Trending Tags