Avoiding High-Latency Operations

Avoiding High-Latency Operations in C++ (Division Optimization)


Prerequisites


1. Why Division is Expensive

Not all CPU operations cost the same.

Approximate latency (cycles):
Operation     Latency
Add / Sub     ~1 cycle
Multiply      ~3–5 cycles
Divide        ~10–30+ cycles

Division is significantly slower than other arithmetic operations. Avoid division (/) in performance-critical paths whenever possible.
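
One way to check these numbers on a particular machine is a small latency microbenchmark. The sketch below is an illustration rather than a rigorous benchmark: the helper names (time_chain_ns, divide_chain_ns, multiply_chain_ns) are hypothetical, and the results depend on the CPU, the compiler, and the optimization flags. Each iteration depends on the previous result, so the loop measures latency rather than throughput.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: average time per operation (in nanoseconds) for a
// chain of `iterations` dependent float operations.
template <typename Op>
double time_chain_ns(Op op, int iterations) {
    assert(iterations > 0);
    // Reading through a volatile keeps the compiler from folding the
    // whole loop into a constant at compile time.
    volatile float seed = 1.000001f;
    float x = seed;
    const float k = seed;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        x = op(x, k);  // each step needs the previous result
    }
    const auto stop = std::chrono::steady_clock::now();

    // Writing the result to a volatile keeps the loop from being removed.
    volatile float sink = x;
    (void)sink;

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}

double divide_chain_ns(int iterations) {
    return time_chain_ns([](float a, float b) { return a / b; }, iterations);
}

double multiply_chain_ns(int iterations) {
    return time_chain_ns([](float a, float b) { return a * b; }, iterations);
}
```

Calling divide_chain_ns(1000000) and multiply_chain_ns(1000000) a few times and comparing the two values gives a rough picture of the gap on the machine at hand.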

2. Replace Division with Multiplication

Loop Optimization

❌ Bad
for (int i = 0; i < N; i++) 
{
    arr[i] = data[i] / scale;
}
✔ Better
float inv_scale = 1.0f / scale;

for (int i = 0; i < N; i++) 
    arr[i] = data[i] * inv_scale;
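
For floating-point data the compiler generally will not make this substitution by itself unless relaxed math flags (such as -ffast-math) are enabled, because multiplying by a rounded reciprocal can differ from a true division in the last bit. The following is a minimal, self-contained sketch of both versions plus a helper that reports how far apart they are; the function names and the use of std::vector are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference version: one division per element.
std::vector<float> scale_by_division(const std::vector<float>& data, float scale) {
    assert(scale != 0.0f);
    std::vector<float> out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = data[i] / scale;
    }
    return out;
}

// Optimized version: one division up front, one multiplication per element.
std::vector<float> scale_by_reciprocal(const std::vector<float>& data, float scale) {
    assert(scale != 0.0f);
    const float inv_scale = 1.0f / scale;
    std::vector<float> out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = data[i] * inv_scale;
    }
    return out;
}

// Largest relative difference between two result arrays of equal length.
// Elements where the reference value is zero are skipped.
float max_relative_difference(const std::vector<float>& reference,
                              const std::vector<float>& candidate) {
    assert(reference.size() == candidate.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (reference[i] == 0.0f) continue;
        const float diff = std::fabs(candidate[i] - reference[i]) / std::fabs(reference[i]);
        worst = std::max(worst, diff);
    }
    return worst;
}
```

Running max_relative_difference on the two outputs for representative data is a quick way to confirm that the precision loss is acceptable before switching a hot loop to the reciprocal form.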

Integer Division Optimization

✔ Division by constant
int x = value / 8;

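For unsigned or known non-negative values, this can be replaced with a bit shift. For negative values the two differ: division truncates toward zero, while an arithmetic shift rounds toward negative infinity (-1 / 8 is 0, but -1 >> 3 is -1).

int x = value >> 3;  // same as value / 8 only when value >= 0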

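In practice the compiler usually applies this optimization on its own, with two caveats:

  • It only applies when the divisor is a compile-time constant
  • For signed types it must emit extra fix-up instructions to keep round-toward-zero semantics, so the generated code is not always optimal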
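
The difference between signed division and shifting can be sketched as follows. The function names are hypothetical, and the signed variants assume that >> on a negative int is an arithmetic shift, which is guaranteed from C++20 onward and is what mainstream compilers do for earlier standards.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned: dividing by 2^k and shifting right by k are the same operation.
std::uint32_t div_pow2_unsigned(std::uint32_t value, unsigned k) {
    assert(k < 32);
    return value >> k;
}

// Signed division truncates toward zero: -9 / 8 == -1.
int div8_signed(int value) {
    return value / 8;
}

// A plain arithmetic shift rounds toward negative infinity: -9 >> 3 == -2.
int shift3_signed(int value) {
    return value >> 3;
}

// Shift with a bias for negative inputs, which restores truncation toward
// zero. This is the kind of fix-up a compiler adds for signed types.
int div8_signed_via_shift(int value) {
    const int bias = (value < 0) ? 7 : 0;  // 7 == 8 - 1
    return (value + bias) >> 3;
}
```

Declaring a value as unsigned when it can never be negative lets both the programmer and the compiler use the plain shift without the bias.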

Approximation Techniques

✔ Fast inverse (approximate)

The exact reciprocal costs one division:

float inv = 1.0f / x;

Common ways to approximate it instead:

  • Lookup table
  • Newton-Raphson iteration
  • SIMD intrinsic (e.g., _mm_rcp_ps)
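
Combining a rough estimate with one refinement step:

float inv = approx_inverse(x);   // rough initial estimate from one of the methods above
inv = inv * (2.0f - x * inv);    // one Newton-Raphson step roughly doubles the number of correct bits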

This is faster than a true division only in some cases, so benchmark it on the target hardware before committing to it.
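
As an illustration, here is one possible portable stand-in for approx_inverse, which the post leaves undefined. It is a sketch under assumptions, not a tuned implementation: it uses an integer bit trick with a magic constant of the kind commonly quoted for this purpose, it is only meant for positive, normal, moderately sized inputs, and fast_inverse is a hypothetical wrapper that applies a chosen number of Newton-Raphson steps. Each step squares the relative error, so an estimate that is off by a few percent reaches near-float precision after about three steps.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical approx_inverse(): crude reciprocal estimate for positive,
// normal floats, obtained by subtracting the bit pattern from a magic
// constant. The constant is an assumption; expect errors of several percent.
float approx_inverse(float x) {
    assert(x > 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);        // reinterpret float as integer
    bits = 0x7EF311C7u - bits;                  // roughly negates the exponent
    float estimate;
    std::memcpy(&estimate, &bits, sizeof estimate);
    return estimate;
}

// Hypothetical wrapper: initial estimate followed by `refinements`
// Newton-Raphson steps, inv = inv * (2 - x * inv).
float fast_inverse(float x, int refinements) {
    assert(refinements >= 0);
    float inv = approx_inverse(x);
    for (int i = 0; i < refinements; ++i) {
        inv = inv * (2.0f - x * inv);           // error e becomes roughly e * e
    }
    return inv;
}
```

Each refinement step costs two multiplications and a subtraction, so the number of steps is the knob that trades accuracy against speed.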

Trade-offs

Approach              Pros        Cons
Division              Accurate    Slow
Multiply by inverse   Faster      Slight precision loss
Approximation         Very fast   Less accurate

❗ Precision matters:

  • Financial / critical systems → avoid approximation
  • Graphics / simulation → approximation acceptable

✔ DO

  • Precompute reciprocal values
  • Use multiplication inside loops
  • Replace division by a power-of-two constant with a shift (unsigned or non-negative values only)

❌ DON’T

  • Use division inside tight loops
  • Ignore precision requirements
  • Assume compiler always optimizes
This post is licensed under CC BY 4.0 by the author.