QA - Section 1.06. Debugging & Optimizing Parallel Code


1. Why is debugging parallel code difficult?

Parallel programs are non-deterministic unless every interaction is explicitly synchronized: the scheduler may interleave threads differently on every run. This leads to bugs that are:

  • Hard to reproduce
  • Timing-dependent
  • Sensitive to small changes (e.g., adding logs)

Additionally, shared state, synchronization, and memory visibility introduce complexity that does not exist in single-threaded code.

The same code can behave differently across runs due to scheduling.

2. How do you find race conditions?

A race condition occurs when multiple threads access the same shared data without proper synchronization and at least one of them writes to it.

Strategies
  • Compare results against a single-threaded run
  • Use tools (ThreadSanitizer via -fsanitize=thread in GCC/Clang, Helgrind via valgrind --tool=helgrind)
  • Add logging (carefully, as it may change timing)
  • Minimize shared state
  • Use assertions and invariants
Example (Race Condition)
int counter = 0;

void inc() {
    counter++; // unsafe: the read-modify-write is not atomic, so increments can be lost
}
Fix
#include <atomic>

std::atomic<int> counter{0}; // brace-init also compiles before C++17

void inc() { counter++; } // atomic read-modify-write
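
To see the fix end to end, here is a minimal sketch that increments an atomic counter from several threads and returns the final value. The helper name `count_in_parallel` and its parameters are illustrative and not part of the original notes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: each of `num_threads` threads increments a shared
// atomic counter `increments_per_thread` times; returns the final value.
int count_in_parallel(int num_threads, int increments_per_thread)
{
    assert(num_threads > 0 && increments_per_thread >= 0);

    std::atomic<int> counter{0};
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
            {
                // Atomic read-modify-write: no increment is lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    for (auto& th : threads)
    {
        th.join(); // join also makes all increments visible to this thread
    }
    return counter.load();
}
```

Calling `count_in_parallel(4, 10000)` should always return 40000. With a plain `int` counter the same loop would be a data race and could return less.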

3. How do you detect deadlocks?

A deadlock occurs when two or more threads each hold a resource that another one needs, so all of them wait indefinitely.

Strategies
  • Analyze lock order
  • Use tools that report lock-order inversions (ThreadSanitizer and Helgrind both do)
  • Add timeouts or watchdogs
  • Log lock acquisition/release
Prevention
  • Always lock in consistent order
  • Use std::scoped_lock (C++17) to acquire several mutexes at once
std::scoped_lock lock(m1, m2); // locks both with a deadlock-avoidance algorithm
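
As a fuller illustration, the sketch below runs transfers between two accounts in opposite directions, a pattern that can deadlock if each thread locks the two mutexes one at a time in its own order. The `Account` type and both function names are made up for this example, and C++17 is assumed.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type for illustration.
struct Account
{
    std::mutex m;
    long balance = 0;
};

// Locks both accounts together. std::scoped_lock uses a deadlock-avoidance
// algorithm, so the order of the arguments does not matter.
void transfer(Account& from, Account& to, long amount)
{
    assert(&from != &to); // passing the same mutex twice would be an error
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in opposite directions concurrently and returns the
// combined balance, which must be unchanged.
long run_opposing_transfers(int rounds)
{
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;

    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();

    return a.balance + b.balance;
}
```

The combined balance stays at 2000 regardless of how the two threads interleave, and neither thread can end up holding one mutex while waiting for the other.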

4. How do you measure performance?

You should measure both execution time and resource usage.

Basic Timing
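#include <chrono>

auto start = std::chrono::steady_clock::now(); // monotonic, suited to intervals

// work

auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed = end - start; // elapsed.count() is in seconds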
Metrics
  • Execution time
  • CPU utilization
  • Throughput (tasks/sec)
  • Latency

5. What do you compare before and after parallelization?

  • Total execution time
  • CPU usage
  • Scalability (speedup vs number of threads)
  • Efficiency (speedup / thread count)
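
The last two items are simple ratios of measured times. A small sketch, with function names chosen here for illustration:

```cpp
#include <cassert>

// Speedup: how many times faster the parallel run is than the serial run.
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

// Efficiency: speedup divided by thread count (1.0 means perfect scaling).
double efficiency(double serial_seconds, double parallel_seconds, int threads)
{
    assert(threads > 0);
    return speedup(serial_seconds, parallel_seconds) / threads;
}
```

For example, a serial run of 8.0 s and a 4-thread run of 2.5 s give a speedup of 3.2 and an efficiency of 0.8, meaning about 20% of the added capacity is lost to overhead or idle time.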

6. How do you handle non-reproducible bugs?

Non-reproducible bugs (often called heisenbugs) are common in multithreaded systems.

Approach
  • Reduce concurrency (force single thread)
  • Add controlled delays
  • Use record-and-replay or deterministic scheduling tools (e.g., rr)
  • Narrow down shared state

7. False Sharing

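False sharing happens when threads modify different variables that sit in the same cache line. No data is logically shared, but every write invalidates the line in the other cores' caches, so the line bounces between cores.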

Example
struct Data 
{
    int a;
    int b;
};

Fix by aligning each member to its own cache line (64 bytes is assumed here; it is typical but not universal):

struct Data 
{
    alignas(64) int a;
    alignas(64) int b;
};
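
The same idea applies to arrays of per-thread counters, where neighbouring slots would otherwise share a line. The following is a sketch under two assumptions: a 64-byte cache line and C++17 (so that `std::vector` honours the over-aligned type). `PaddedCounter` and `count_with_padded_slots` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size; 64 bytes is common on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so threads updating
// neighbouring slots do not invalidate each other's lines.
struct alignas(kCacheLine) PaddedCounter
{
    long value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Each thread increments only its own slot; slots are summed after join.
long count_with_padded_slots(int num_threads, long increments_per_thread)
{
    assert(num_threads > 0);
    std::vector<PaddedCounter> slots(num_threads);
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&slots, t, increments_per_thread] {
            for (long i = 0; i < increments_per_thread; ++i)
            {
                slots[t].value++; // no other thread touches this slot
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }

    long total = 0;
    for (const auto& s : slots)
    {
        total += s.value;
    }
    return total;
}
```

The result is the same with or without the padding. Only the running time differs, so the effect has to be confirmed by measurement.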

8. Load Balancing

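Load balancing distributes work so that all threads finish at about the same time. The example below is deliberately unbalanced: t1 runs 100,000 times more iterations than t2, so the total time is set by t1 alone and the second thread adds almost nothing.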

#include <thread>
#include <vector>

void heavy(int n)
{
    for (int i = 0; i < n; i++) 
    {
        // heavy work
    }
}

int main() 
{
    std::thread t1(heavy, 100000000);
    std::thread t2(heavy, 1000);

    t1.join();
    t2.join();
}
Strategies
  • Equal partitioning
  • Dynamic scheduling
  • Work stealing
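
Dynamic scheduling can be sketched with a shared atomic cursor: instead of receiving a fixed share up front, each thread claims the next unprocessed item when it is ready for more work. The function `square_all_dynamic` is a hypothetical example, and the per-item work (squaring) stands in for tasks of uneven cost.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: squares every element of `data` in place, with
// threads claiming the next unprocessed index from a shared atomic cursor.
void square_all_dynamic(std::vector<long>& data, int num_threads)
{
    assert(num_threads > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&data, &next] {
            while (true)
            {
                // fetch_add returns the previous value, so every index
                // is handed to exactly one thread.
                std::size_t i = next.fetch_add(1);
                if (i >= data.size())
                {
                    break; // nothing left to claim
                }
                data[i] *= data[i];
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
}
```

A thread that draws cheap items simply comes back for more, so slow items no longer pin the whole run to one thread. Claiming one item at a time makes the cursor itself a contention point, which is where task granularity comes in.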

9. Task Granularity

Granularity is the size of each task.

  • Too small → scheduling and synchronization overhead dominates
  • Too large → too few tasks to spread evenly, so some threads sit idle

Balance is key.
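
In practice granularity is often controlled with a single "grain size" parameter that decides how many items go into one task. A minimal sketch, where `make_chunks` and `grain` are names assumed for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, total) into half-open ranges
// [begin, end) of at most `grain` items each.
std::vector<std::pair<std::size_t, std::size_t>>
make_chunks(std::size_t total, std::size_t grain)
{
    assert(grain > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t begin = 0; begin < total; begin += grain)
    {
        // The last chunk may be shorter than `grain`.
        chunks.emplace_back(begin, std::min(begin + grain, total));
    }
    return chunks;
}
```

With 10 items and a grain of 4 this yields [0, 4), [4, 8) and [8, 10). A suitable grain size depends on the cost per item and is best found by measuring a few values.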

10. Lock Contention

Lock contention occurs when multiple threads compete for the same lock. Waiting threads do no useful work, so throughput drops as more threads are added.

Example
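#include <mutex>
#include <thread>

std::mutex m;

void work() 
{
    std::lock_guard<std::mutex> lock(m); // all three threads serialize here
}

int main()
{
    std::thread t1(work);
    std::thread t2(work);
    std::thread t3(work);
    t1.join(); // a joinable std::thread must be joined before destruction
    t2.join();
    t3.join();
}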

11. Deadlock vs Starvation vs Livelock vs Spinlock

  • Deadlock: threads wait on each other forever, so none of them can proceed
  • Starvation: a thread is repeatedly denied a resource while other threads keep progressing
  • Livelock: threads are active but make no progress
  • Spinlock: a locking mechanism where a thread continuously checks (busy-waits) until the lock becomes available
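
Unlike the other three, a spinlock is a tool rather than a failure mode. A minimal sketch built on `std::atomic_flag` is shown below. The class `SpinLock` and the driver `count_with_spinlock` are written for this example, and the `yield()` call is one possible way to soften the busy-wait, not a requirement.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal spinlock sketch built on std::atomic_flag.
class SpinLock
{
public:
    void lock()
    {
        // test_and_set returns the previous value: true means another
        // thread holds the lock, so keep spinning.
        while (flag_.test_and_set(std::memory_order_acquire))
        {
            std::this_thread::yield(); // give other threads a chance to run
        }
    }

    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical driver: protects a plain counter with the spinlock.
long count_with_spinlock(int num_threads, long increments_per_thread)
{
    assert(num_threads > 0);
    SpinLock spin;
    long counter = 0;
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&] {
            for (long i = 0; i < increments_per_thread; ++i)
            {
                // SpinLock has lock()/unlock(), so std::lock_guard works.
                std::lock_guard<SpinLock> guard(spin);
                ++counter;
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
    return counter;
}
```

Spinning only pays off when the critical section is very short. For longer waits a `std::mutex`, which can put the waiting thread to sleep, is usually the better choice.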

12. Thread Pool

Thread pools reuse threads instead of creating new ones.

Benefits:

  • Reduced overhead (threads are created once, not once per task)
  • Better control of concurrency
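
The sketch below shows one common shape for a pool: a fixed set of worker threads pulling tasks from a queue guarded by a mutex and a condition variable. The `ThreadPool` class and the `run_tasks_on_pool` driver are illustrative only. The sketch assumes tasks do not throw and leaves out futures and return values.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool
{
public:
    explicit ThreadPool(std::size_t num_threads)
    {
        assert(num_threads > 0);
        for (std::size_t i = 0; i < num_threads; ++i)
        {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers finish all queued tasks, then joins them.
    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_)
        {
            w.join();
        }
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop()
    {
        while (true)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty())
                {
                    return; // stopping and nothing left to run
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // run outside the lock so other workers can dequeue
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Hypothetical driver: runs `num_tasks` trivial tasks, returns how many ran.
int run_tasks_on_pool(int num_tasks, std::size_t num_threads)
{
    std::atomic<int> done{0};
    {
        ThreadPool pool(num_threads);
        for (int i = 0; i < num_tasks; ++i)
        {
            pool.submit([&done] { done.fetch_add(1); });
        }
    } // destructor drains the queue and joins the workers
    return done.load();
}
```

Here 100 tasks can run on 3 threads without creating 100 threads, and the pool size caps how much work runs concurrently.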

13. Local Accumulation + Merge

Avoid shared writes by using per-thread local data.

Example

int local = 0;
// compute locally

// merge later
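
Filled in, the pattern might look like the following parallel sum. Each thread adds up its own slice in a local variable and touches the shared total exactly once. The function name `parallel_sum` and the slicing scheme are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `num_threads` threads.
long parallel_sum(const std::vector<int>& data, std::size_t num_threads)
{
    assert(num_threads > 0);
    long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> threads;

    // Slice size, rounded up so that every element is covered.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t)
    {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());

        threads.emplace_back([&, begin, end] {
            long local = 0; // private to this thread, no synchronization
            for (std::size_t i = begin; i < end; ++i)
            {
                local += data[i];
            }
            // Merge: one locked write per thread instead of one per element.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
    return total;
}
```

The lock is taken once per thread rather than once per element, so contention stays negligible however large the input is.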

14. Profiling First, Then Parallelize

Always identify bottlenecks before adding threads.

Do not parallelize blindly.

This post is licensed under CC BY 4.0 by the author.