20. Pose Estimation
Prerequisites: Python
1. Pose Estimation
Pose estimation is the process of finding the position and orientation of an object relative to a camera. In computer vision, this is commonly done using known 3D points and their corresponding 2D image points.
For example, when detecting a chessboard pattern, we already know the real-world 3D coordinates of each corner:
```
(0,0,0)
(1,0,0)
(2,0,0)
...
```
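These 3D coordinates do not have to be measured by hand; they follow from the board layout. A minimal NumPy sketch, assuming a board with 7 × 6 inner corners (the size used in the code later in this post). The function name `chessboard_object_points` and the `square_size` parameter are illustrative, not part of OpenCV:

```python
import numpy as np

def chessboard_object_points(pattern_size=(7, 6), square_size=1.0):
    """Return an (N, 3) array of corner coordinates on the Z = 0 plane.

    pattern_size: inner corners per row and column (cols, rows).
    square_size: hypothetical edge length of one square; 1.0 keeps grid units.
    """
    cols, rows = pattern_size
    objp = np.zeros((cols * rows, 3), np.float32)
    # mgrid gives every (x, y) grid index; .T.reshape orders them row by row:
    # (0,0), (1,0), ..., (cols-1,0), (0,1), ...
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
    return objp * square_size

objp = chessboard_object_points()
print(objp[:3])
```

The translation later recovered by `cv.solvePnP` is expressed in the same units as these points, so passing a real square size (for example 25.0 for 25 mm squares) would yield a translation in millimetres.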
At the same time, we detect the same corners in the image and obtain their pixel coordinates:
```
(523,221)
(540,220)
...
```
Using these 3D–2D correspondences together with the camera intrinsic matrix, the algorithm estimates the camera pose by minimizing the reprojection error between the known 3D object points and their detected 2D image points.
The algorithm first assumes an initial rotation and translation, then projects the 3D points onto the image plane using the camera intrinsic matrix. After projection, it compares the predicted 2D positions with the actual detected image points.
The difference between the predicted and the detected points is called the reprojection error. The optimizer repeatedly updates the rotation and translation to reduce this error until it stops improving.
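The loop described above can be sketched in plain NumPy. This is a minimal Levenberg-Marquardt illustration under stated assumptions: no lens distortion, a numerically estimated Jacobian, and synthetic noise-free data. The helper names `rodrigues`, `project`, `residuals` and `refine_pose` are hypothetical. It is not OpenCV's implementation, although OpenCV's documentation describes the default `SOLVEPNP_ITERATIVE` method of `cv.solvePnP` as based on Levenberg-Marquardt optimization.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (unit axis * angle in radians) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    kx, ky, kz = rvec / theta
    Kx = np.array([[0, -kz, ky], [kz, 0, -kx], [-ky, kx, 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)

def project(params, obj_pts, K):
    """Project object points using params = [rvec (3), tvec (3)]; ignores distortion."""
    R = rodrigues(params[:3])
    cam = obj_pts @ R.T + params[3:]      # object frame -> camera frame
    xy = cam[:, :2] / cam[:, 2:3]         # normalized coordinates X/Z, Y/Z
    return xy @ K[:2, :2].T + K[:2, 2]    # u = fx*x + cx, v = fy*y + cy

def residuals(params, obj_pts, img_pts, K):
    """Stacked reprojection errors (predicted - detected), shape (2N,)."""
    return (project(params, obj_pts, K) - img_pts).ravel()

def refine_pose(obj_pts, img_pts, K, params0, iters=100, eps=1e-6):
    """Levenberg-Marquardt minimization of the squared reprojection error."""
    params = np.asarray(params0, dtype=float).copy()
    r = residuals(params, obj_pts, img_pts, K)
    cost, lam = r @ r, 1e-3
    for _ in range(iters):
        # Numerical Jacobian (central differences) w.r.t. the 6 pose parameters
        J = np.empty((r.size, 6))
        for i in range(6):
            d = np.zeros(6)
            d[i] = eps
            J[:, i] = (residuals(params + d, obj_pts, img_pts, K)
                       - residuals(params - d, obj_pts, img_pts, K)) / (2 * eps)
        H = J.T @ J
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -J.T @ r)
        r_new = residuals(params + step, obj_pts, img_pts, K)
        cost_new = r_new @ r_new
        if cost_new < cost:    # error went down: accept, behave more like Gauss-Newton
            params, r, cost = params + step, r_new, cost_new
            lam *= 0.1
        else:                  # error went up (or NaN): reject, damp more heavily
            lam *= 10.0
    return params, np.sqrt(cost / len(obj_pts))   # pose and RMS error in pixels

# Synthetic check: 7x6 board corners, a known pose, noise-free "detections"
objp = np.zeros((42, 3))
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
true_pose = np.array([0.2, -0.1, 0.05, -3.0, -2.5, 15.0])
img_pts = project(true_pose, objp, K)

initial = np.array([0.0, 0.0, 0.0, -1.0, -1.0, 12.0])   # rough starting guess
pose, rms = refine_pose(objp, img_pts, K, initial)
print(pose.round(4), rms)
```

The damping factor `lam` is the design choice that makes this Levenberg-Marquardt rather than plain Gauss-Newton: a step that increases the error is rejected and the next step is made smaller and more gradient-like, so the error never goes up.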
The output is the pose [R|T]:
- R: the rotation, which cv.solvePnP returns as a 3×1 rotation vector (rvec)
- T: the translation vector (tvec)
This means the algorithm estimates how the object is rotated and where it is located relative to the camera.
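The rotation vector is compact: its direction is the rotation axis and its length is the angle in radians. `cv.Rodrigues(rvec)` converts it into the 3×3 matrix R used in [R|T]. The sketch below reimplements that conversion in NumPy for illustration only (the name `rvec_to_matrix` and the example pose are hypothetical) and shows that T is where the object's origin lies in camera coordinates:

```python
import numpy as np

def rvec_to_matrix(rvec):
    """Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2."""
    rvec = np.asarray(rvec, dtype=float).ravel()
    theta = np.linalg.norm(rvec)              # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)                      # no rotation
    kx, ky, kz = rvec / theta                 # unit rotation axis
    K = np.array([[0, -kz, ky], [kz, 0, -kx], [-ky, kx, 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Example pose: 90 degrees about the camera's Z axis, object origin 10 units ahead
rvec = np.array([0.0, 0.0, np.pi / 2])
tvec = np.array([0.0, 0.0, 10.0])

R = rvec_to_matrix(rvec)
Rt = np.hstack([R, tvec.reshape(3, 1)])       # the 3x4 [R|T] matrix

# Map an object point into camera coordinates: X_cam = R @ X_obj + T
X_obj = np.array([1.0, 0.0, 0.0, 1.0])        # homogeneous object point
X_cam = Rt @ X_obj
print(X_cam.round(6))
```

In the webcam code below, `cv.Rodrigues(rvec)[0]` would give the same matrix directly.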
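Once the pose is known, any 3D point can be projected onto the image: [R|T] first maps it from object coordinates into camera coordinates, and the intrinsic matrix then maps it to pixel coordinates.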
For example, consider a point that is already expressed in camera coordinates (that is, after [R|T] has been applied):
```
(2,1,10)
```
This means:
- 2 units to the right
- 1 unit downward (in OpenCV's camera frame, +Y points down in the image)
- 10 units in front of the camera
To project this point onto the image plane:
\[x = \frac{X}{Z} = \frac{2}{10} = 0.2, \qquad y = \frac{Y}{Z} = \frac{1}{10} = 0.1\]
These normalized coordinates are then converted into pixel coordinates using the intrinsic matrix:
\[u = f_x x + c_x, \qquad v = f_y y + c_y\]
With the example intrinsics:
```
fx = fy = 1000
cx = 640
cy = 360
```
Then u = 1000 × 0.2 + 640 = 840 and v = 1000 × 0.1 + 360 = 460, so the 3D point (2,1,10) appears at pixel position (840,460).
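The same arithmetic as a short NumPy sketch, assuming no lens distortion and a point already in camera coordinates. It also shows the equivalent matrix form, where multiplying by the intrinsic matrix and dividing by the last component gives the same pixel:

```python
import numpy as np

# Intrinsic matrix built from the example values fx = fy = 1000, cx = 640, cy = 360
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

P_cam = np.array([2.0, 1.0, 10.0])                # point in camera coordinates (X, Y, Z)

# Step by step: normalize by depth, then apply focal length and principal point
x, y = P_cam[0] / P_cam[2], P_cam[1] / P_cam[2]   # 0.2, 0.1
u = K[0, 0] * x + K[0, 2]
v = K[1, 1] * y + K[1, 2]

# Matrix form: K @ [X, Y, Z] = [u*Z, v*Z, Z], then divide by the last component
uvw = K @ P_cam
u2, v2 = uvw[0] / uvw[2], uvw[1] / uvw[2]

print(u, v, u2, v2)
```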
This is the core idea behind pose estimation and 3D point projection in computer vision.
2. Pose Estimation Code
```python
import cv2 as cv
import numpy as np


def pose_estimation_webcam():
    # Number of inner corners per chessboard row and column
    CHECKERBOARD = (7, 6)

    # Camera matrix (from a prior calibration of this specific camera)
    camera_matrix = np.array([
        [1.30458050e+03, 0.00000000e+00, 2.80160438e+02],
        [0.00000000e+00, 1.32052587e+03, 3.11461051e+02],
        [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]
    ], dtype=np.float32)

    # Distortion coefficients (k1, k2, p1, p2, k3) from the same calibration
    dist_coeffs = np.array([
        [-1.83011423e+00,
         1.28165170e+02,
         5.68135052e-02,
         3.31047683e-02,
         -2.27203441e+03]
    ], dtype=np.float32)

    # Stop sub-pixel refinement after 30 iterations or when corners move less than 0.001 px
    criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)

    # 3D object points of the chessboard corners on the Z = 0 plane,
    # in units of one chessboard square: (0,0,0), (1,0,0), (2,0,0), ...
    objp = np.zeros((CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)

    # Axis endpoints, 3 squares long; negative Z points from the board toward the camera
    axis = np.float32([[3, 0, 0], [0, 3, 0], [0, 0, -3]])

    cap = cv.VideoCapture(0)
    if not cap.isOpened():
        print("Cannot open webcam")
        return

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
        found, corners = cv.findChessboardCorners(gray, CHECKERBOARD, None)

        if found:
            print("Chessboard corners found")
            # Refine corner locations to sub-pixel accuracy
            corners_refined = cv.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)

            # Estimate the board pose: rvec (Rodrigues rotation vector) and tvec
            success, rvec, tvec = cv.solvePnP(objp, corners_refined, camera_matrix, dist_coeffs)

            if success:
                # Project the 3D axis endpoints onto the image
                imgpts, _ = cv.projectPoints(axis, rvec, tvec, camera_matrix, dist_coeffs)

                # cv.line needs plain Python int coordinates
                corner = tuple(corners_refined[0].ravel().astype(int).tolist())
                imgpts = imgpts.reshape(-1, 2).astype(int).tolist()

                # Colors are BGR
                cv.line(frame, corner, tuple(imgpts[0]), (255, 0, 0), 4)  # X axis - blue
                cv.line(frame, corner, tuple(imgpts[1]), (0, 255, 0), 4)  # Y axis - green
                cv.line(frame, corner, tuple(imgpts[2]), (0, 0, 255), 4)  # Z axis - red

                # tvec: board origin in camera coordinates, in chessboard-square units
                cv.putText(frame,
                           f"tvec: x={tvec[0][0]:.2f}, y={tvec[1][0]:.2f}, z={tvec[2][0]:.2f}",
                           (20, 40), cv.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)
        else:
            print("Chessboard corners not found")

        cv.imshow("Pose Estimation", frame)
        key = cv.waitKey(1) & 0xFF
        if key == ord("q") or key == 27:  # quit with 'q' or Esc
            break

    cap.release()
    cv.destroyAllWindows()


if __name__ == "__main__":
    pose_estimation_webcam()
```
