Post

Image File Formats

Image File Formats

🖼️ Why Image Formats Matter in Computer Vision

Image file formats define:

  • How pixels are stored
  • Whether data is compressed
  • Whether information is lost
  • How images are read and written

To process images correctly, you must understand what happens before pixels become a matrix.


📦 Common Image Formats Overview

FormatCompressionLossyAlphaTypical Use
BMPNoneRaw storage, debugging
PNGLosslessMasks, UI, screenshots
JPEGLossyPhotos, datasets

🟦 BMP (Bitmap Image)

Concept

BMP stores raw pixel values with almost no processing.

  • Very simple structure
  • Large file size
  • Easy to parse manually

BMP File Structure

1
2
3
4
[ Bitmap File Header ]
[ DIB Header ]
[ Color Table ] (optional)
[ Pixel Array ]

Bitmap File Header (14 bytes)

  • Signature: ‘BM’
  • File size
  • Pixel data offset

DIB Header (40 bytes typical)

  • Image width / height
  • Bits per pixel (8 / 24 / 32)
  • Compression (usually none)

Pixel Storage

  • Stored bottom-up by default
  • Each row padded to 4-byte alignment
  • Common format: BGR (not RGB)

When to Use BMP

  • Debugging
  • Teaching
  • Raw inspection

Not suitable for large-scale CV pipelines.


🟩 PNG (Portable Network Graphics)

Concept

PNG uses lossless compression.

  • Pixel-perfect reconstruction
  • Supports alpha channel
  • More complex structure

PNG File Structure

1
2
3
4
5
[ Signature ]
[ IHDR ]
[ PLTE ] (optional)
[ IDAT ] (compressed data)
[ IEND ]

Each block is called a chunk.


Key Chunks

IHDR

  • Width, Height
  • Bit depth (8 / 16)
  • Color type (Gray, RGB, RGBA)

IDAT

  • zlib-compressed pixel data

PNG Filtering

Before compression, each row is filtered:

  • Sub
  • Up
  • Average
  • Paeth

Purpose:

  • Improve compression efficiency

When to Use PNG

  • Binary masks
  • Labels
  • Screenshots
  • Lossless datasets

🟥 JPEG (JPG)

Concept

JPEG uses lossy compression based on human perception.

  • Much smaller files
  • Information is lost
  • No alpha channel

JPEG Compression Pipeline

1
2
3
4
5
6
7
8
9
10
11
RGB
 ↓
YCbCr conversion
 ↓
Block splitting (8×8)
 ↓
DCT
 ↓
Quantization (lossy)
 ↓
Entropy coding

Key Idea: Quantization

High-frequency components are discarded.

Result:

  • Compression
  • Blocking artifacts

When to Use JPEG

  • Natural images
  • Large datasets
  • When small size matters

Avoid JPEG for:

  • Masks
  • Depth images
  • Scientific measurements

🧠 Reading Images (Conceptual)

Reading means:

  1. Parse header
  2. Decode compression (if any)
  3. Reconstruct pixel array
  4. Convert to matrix

Result: \(I \in \mathbb{R}^{H \times W \times C}\)


✍️ Writing Images (Conceptual)

Writing means:

  1. Prepare pixel array
  2. Apply format-specific encoding
  3. Write headers + data

Key decision:

  • Lossy vs lossless
  • Bit depth
  • Channel order

⚠️ Practical CV Pitfalls

  • JPEG introduces non-linear noise
  • PNG preserves exact values
  • BMP row padding causes bugs
  • Channel order differs (RGB vs BGR)

🧩 Summary: How to Choose

NeedBest Choice
Exact pixel valuesPNG / BMP
Small file sizeJPEG
Binary maskPNG
DebuggingBMP
Training imagesJPEG / PNG

🎯 Takeaway

Image formats are not just containers.

They define:

  • Pixel integrity
  • Numerical accuracy
  • Pipeline correctness

Understanding BMP / PNG / JPEG is essential to build reliable computer vision systems 🚀

This post is licensed under CC BY 4.0 by the author.