17. SIFT
SIFT
Prerequisites
1
python
1. SIFT
SIFT (Scale-Invariant Feature Transform) is a feature extraction algorithm used in computer vision to detect and describe local features in an image. It is designed to be robust against image scaling, rotation, illumination changes, and partial viewpoint variations.
The first step of SIFT is keypoint detection. SIFT searches for stable feature points across multiple image scales using a Difference of Gaussian (DoG) pyramid. The image is repeatedly blurred and downsampled to create scale-space representations. Then, adjacent Gaussian-blurred images are subtracted to detect strong local extrema.
After detecting candidate keypoints, SIFT computes gradient information around each point. The gradients are calculated using image derivatives:
$dx = I(x + 1, y) - I(x - 1, y)$ $dy = I(x, y + 1) - I(x, y - 1)$
Using these values, the gradient magnitude and orientation are calculated:
magnitude = $sqrt(dx² + dy²)$ orientation = $atan2(dy, dx)$
The dominant orientation of the local region is assigned to the keypoint. This makes SIFT rotation invariant because descriptors are generated relative to the keypoint orientation.
Next, SIFT creates a descriptor around the keypoint. A local image patch is divided into a 4 × 4 grid. For each cell, an orientation histogram with 8 bins is computed using gradient directions. Since there are:
4 × 4 cells × 8 orientation bins = 128 values
the final SIFT descriptor becomes a 128-dimensional feature vector.
These descriptors are highly distinctive and can be used for feature matching between images. Similar descriptors usually represent the same physical point in different images.
Unlike Harris Corner Detection, SIFT not only detects feature points but also provides scale and orientation information, making it much more robust for real-world applications such as object recognition, panorama stitching, SLAM, and image matching.
SIFT (Scale-Invariant Feature Transform) Important Logic Flow
- Scale-Space Construction
- Difference of Gaussian (DoG)
- 3D Local Extrema Detection
- Remove Unstable Keypoints
- Orientation Assignment
- Descriptor Generation
- Feature Matching
Core Ideas of SIFT
Scale Invariant: Features are detected across multiple scales.
Rotation Invariant: Descriptors are aligned using dominant orientation.
Robust Descriptor: Gradient histograms represent local structure rather than raw pixel values.
Matching: Descriptors can match the same object even if:
- resized
- rotated
- partially illuminated differently
2. SIFT Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
if __name__ == "__main__":
cap = cv.VideoCapture(0)
if not cap.isOpened():
exit()
ret, img = cap.read()
viewer = view.MultiImageViewer.from_images(img, img, sync_view=False)
while True:
ret, img = cap.read()
if not ret:
break
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
imgSift = cv.drawKeypoints(img, keypoints, None, color=(0,255,0))
if ret:
viewer.update_images(img, imgSift)
viewer.draw()
key = cv.waitKey(1) & 0xFF
if key == ord('q'):
break