In this post we learn how to detect lane pixels and fit them to display the lane boundaries on the screen. We also learn how to estimate the curvature of the lane and determine the position of the vehicle with respect to the center of the lane. This would be helpful for lane keeping in a self-driving car, or could be used as part of an Advanced Driver Assistance System (ADAS).

## Camera Calibration

### Computing the camera matrix and distortion coefficients

Camera lenses introduce distortion that warps the captured image: straight lines in the real world do not appear straight in the camera image. For stereo applications, these distortions need to be corrected first. To find the correction parameters, we provide several sample images of a well-defined pattern (e.g., a chessboard) and detect specific points in it (the square corners of the chessboard). We know the coordinates of these points in real-world space and we know their coordinates in the image. With these correspondences, we can compute the distortion coefficients with the help of some mathematical equations. An in-depth discussion on camera calibration can be found in the OpenCV documentation.

I used 17 images of the chessboard pattern for finding the calibration and distortion coefficients. I started by preparing object points, which were the (x, y, z) coordinates of the chessboard corners in the world. I assumed that the chessboard was fixed on the (x, y) plane at z=0, such that the object points were the same for each calibration image. The variable objp is just a replicated array of coordinates, and objpoints were appended with a copy of it every time I successfully detected all chessboard corners in a test image. imgpoints was appended with the (x, y) pixel position of each of the corners in the image plane with each successful chessboard detection.
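The object-point preparation described above can be sketched as follows. This is a minimal example: the 9x6 inner-corner grid size is an assumption, and the per-image corner detection with cv2.findChessboardCorners is indicated only in comments.

```python
import numpy as np

# Prepare the object points for an assumed 9x6 inner-corner chessboard:
# (0,0,0), (1,0,0), ..., (8,5,0). The same grid is reused for every image
# since the board is assumed fixed on the (x, y) plane at z=0.
nx, ny = 9, 6
objp = np.zeros((nx * ny, 3), np.float32)
objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)

objpoints = []  # 3D points in real-world space
imgpoints = []  # 2D points in the image plane

# For each grayscale calibration image, detect the corners and,
# on success, append the matching point sets:
#     ret, corners = cv2.findChessboardCorners(gray, (nx, ny), None)
#     if ret:
#         objpoints.append(objp)
#         imgpoints.append(corners)
```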

I then used the output objpoints and imgpoints to compute the camera calibration and distortion coefficients using the cv2.calibrateCamera() function. I applied this distortion correction to the test image using the cv2.undistort() function and obtained this result: The code for this step is given below:

```python
def cal_undistort(img, gray, objpoints, imgpoints, plot=True):
    # Calibrate the camera
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

    # Undistort the image
    undist = cv2.undistort(img, mtx, dist, None, mtx)

    if plot:
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
        f.tight_layout()
        ax1.imshow(img)
        ax1.set_title('Original Image', fontsize=30)
        ax2.imshow(undist)
        ax2.set_title('Undistorted Image', fontsize=30)
        plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
        plt.savefig('./examples/undist_chess_board.png')

    return mtx, dist

gray_test_chess = cv2.cvtColor(test_chess, cv2.COLOR_RGB2GRAY)
mtx, dist = cal_undistort(test_chess, gray_test_chess, objpoints, imgpoints, plot=True)
```


## Pipeline for a single image

### Distortion correction

To remove this distortion we make use of the distortion coefficients obtained during camera calibration. After distortion correction, the output obtained is shown below.

### Thresholded binary image

I used a combination of color and gradient thresholds to generate a binary image. After a lot of trials with different color spaces, I found that the L channel from the LUV color space, with a minimum threshold of 225 and a maximum threshold of 255, did a very good job of identifying the white lane lines while ignoring the yellow lines.

The B channel from the Lab color space, with a minimum threshold of 155 and a maximum threshold of 200, did a better job than the S channel in identifying the yellow lines and ignored the white lines.

I also applied gradient thresholds in the x and y directions, but they did not add much information and hence I did not include them in my pipeline.

```python
def abs_sobel_thresh(img, orient, sx_thresh=(0, 200)):
    # 1) Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # 2) Take the derivative in x or y given orient = 'x' or 'y'
    if orient == 'x':
        sobel = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    elif orient == 'y':
        sobel = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    # 3) Take the absolute value of the derivative or gradient
    abs_sobel = np.absolute(sobel)
    # 4) Scale to 8-bit (0 - 255) then convert to type = np.uint8
    scaled_sobel = np.uint8(255*abs_sobel/np.max(abs_sobel))
    # 5) Create a mask of 1's where the scaled gradient magnitude
    #    is within the threshold range
    binary_output = np.zeros_like(scaled_sobel)
    binary_output[(scaled_sobel >= sx_thresh[0]) & (scaled_sobel <= sx_thresh[1])] = 1
    # 6) Return this mask as your binary_output image
    return binary_output

def dir_threshold(img, sobel_kernel=3, sdir_thresh=(0, np.pi/2)):
    # 1) Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # 2) Take the gradient in x and y separately
    sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=sobel_kernel)
    sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=sobel_kernel)
    # 3) Take the absolute value of the x and y gradients
    abs_sobelx, abs_sobely = np.absolute(sobelx), np.absolute(sobely)
    # 4) Use np.arctan2(abs_sobely, abs_sobelx) to calculate the direction of the gradient
    dirn = np.arctan2(abs_sobely, abs_sobelx)
    # 5) Create a binary mask where direction thresholds are met
    binary_output = np.zeros_like(dirn)
    binary_output[(dirn >= sdir_thresh[0]) & (dirn <= sdir_thresh[1])] = 1
    # 6) Return this mask as your binary_output image
    return binary_output
```

```python
# Color transforms and gradients to create a thresholded binary image
def threshold_binary(img, plot=True):
    #s_channel = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)[:,:,2]

    l_channel = cv2.cvtColor(img, cv2.COLOR_RGB2LUV)[:,:,0]

    b_channel = cv2.cvtColor(img, cv2.COLOR_RGB2Lab)[:,:,2]

    # Threshold the B channel (yellow lines)
    b_thresh_min = 155
    b_thresh_max = 200
    b_binary = np.zeros_like(b_channel)
    b_binary[(b_channel >= b_thresh_min) & (b_channel <= b_thresh_max)] = 1

    # Threshold the L channel (white lines)
    l_thresh_min = 225
    l_thresh_max = 255
    l_binary = np.zeros_like(l_channel)
    l_binary[(l_channel >= l_thresh_min) & (l_channel <= l_thresh_max)] = 1

    combined_binary = np.zeros_like(b_binary)
    combined_binary[(l_binary == 1) | (b_binary == 1)] = 1

    kernel = np.ones((3, 3), np.uint8)
    dilated_combined_binary = cv2.dilate(combined_binary, kernel, iterations=1)

    if plot:
        # Plot the result
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
        f.tight_layout()

        ax1.imshow(img)
        ax1.set_title('Original image', fontsize=20)

        ax2.imshow(dilated_combined_binary, cmap='gray')
        ax2.set_title('Combined L and B channel thresholds', fontsize=20)
        plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
        plt.savefig('examples/thresholded_binary_image.png')

    return dilated_combined_binary

test_images = glob.glob('test_images/straight_lines1.jpg')

for fname in test_images:
    test_img = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2RGB)
    undistorted = cv2.undistort(test_img, mtx, dist, None, mtx)
    combined = threshold_binary(undistorted)
```


Here’s an example of my output for this step.

### Perspective transform

The code for my perspective transform includes a function called warp(). The warp() function takes an image (img) as input; the source (src) and destination (dst) points are set inside the function.

```python
def warp(img, plot=True):
    img_size = (img.shape[1], img.shape[0])
    src = np.float32([[(img_size[0] / 2) - 55, img_size[1] / 2 + 100],
                      [((img_size[0] / 5) - 50), img_size[1]],
                      [(img_size[0] * 5 / 6) + 35, img_size[1]],
                      [(img_size[0] / 2 + 55), img_size[1] / 2 + 100]])

    dst = np.float32([[(img_size[0] / 4), 0],
                      [(img_size[0] / 4), img_size[1]],
                      [(img_size[0] * 3 / 4), img_size[1]],
                      [(img_size[0] * 3 / 4), 0]])

    # Compute the perspective transform, M
    M = cv2.getPerspectiveTransform(src, dst)

    # Compute the inverse transform for projecting back later
    M_inv = cv2.getPerspectiveTransform(dst, src)

    # Create the warped image
    warped = cv2.warpPerspective(img, M, img_size, flags=cv2.INTER_LINEAR)

    warped_copy, im_copy = np.copy(warped), np.copy(img)

    # Plotting
    if plot:
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 9))
        f.tight_layout()
        pts = np.int32(src).reshape((-1, 1, 2))
        cv2.polylines(im_copy, [pts], True, (255, 0, 0), 5)
        ax1.imshow(im_copy)
        ax1.set_title('Source Image', fontsize=30)
        cv2.rectangle(warped_copy, (img_size[0] // 4, 0), ((img_size[0] * 3) // 4, img_size[1]), (255, 0, 0), 5)
        ax2.imshow(warped_copy)
        ax2.set_title('Warped Image', fontsize=30)
        plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
        plt.savefig('examples/warped_image.png')

    return warped, M_inv

warped_image, M_inv = warp(undistorted, plot=True)
```


I chose to hardcode the source and destination points in the following manner:

```python
src = np.float32([[(img_size[0] / 2) - 55, img_size[1] / 2 + 100],
                  [((img_size[0] / 5) - 50), img_size[1]],
                  [(img_size[0] * 5 / 6) + 35, img_size[1]],
                  [(img_size[0] / 2 + 55), img_size[1] / 2 + 100]])

dst = np.float32([[(img_size[0] / 4), 0],
                  [(img_size[0] / 4), img_size[1]],
                  [(img_size[0] * 3 / 4), img_size[1]],
                  [(img_size[0] * 3 / 4), 0]])
```


This resulted in the following source and destination points:

| Source    | Destination |
|-----------|-------------|
| 585, 460  | 320, 0      |
| 206, 720  | 320, 720    |
| 1101, 720 | 960, 720    |
| 695, 460  | 960, 0      |
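These values can be reproduced numerically from the src/dst formulas, assuming the project video's 1280x720 frame size:

```python
import numpy as np

# Assumed frame size (width, height) of the project video
img_size = (1280, 720)

src = np.float32([[(img_size[0] / 2) - 55, img_size[1] / 2 + 100],
                  [((img_size[0] / 5) - 50), img_size[1]],
                  [(img_size[0] * 5 / 6) + 35, img_size[1]],
                  [(img_size[0] / 2 + 55), img_size[1] / 2 + 100]])

dst = np.float32([[(img_size[0] / 4), 0],
                  [(img_size[0] / 4), img_size[1]],
                  [(img_size[0] * 3 / 4), img_size[1]],
                  [(img_size[0] * 3 / 4), 0]])

print(np.int32(src))  # source corners, rounded to whole pixels
print(np.int32(dst))  # destination corners
```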

I verified that my perspective transform was working as expected by drawing the src and dst points onto a test image and its warped counterpart and checking that the lane lines appear parallel in the warped image.

### Fitting the positions of lane lines with a polynomial

I obtained a perspective transform of the camera images converted into binary images using the warp() function. The lane lines can be easily separated, as shown in the figure below. Then I performed the following steps to draw the lane lines:

• Plot the histogram of the binary warped image, which gives the frequency distribution of the non-zero pixels across the image columns. The histogram provides information regarding the position of the lane lines: the x positions where the lane lines are present are identified and stored in leftx_base and rightx_base.

```python
import numpy as np
histogram = np.sum(binary_warped[binary_warped.shape[0]//2:,:], axis=0)
plt.plot(histogram)
plt.savefig('examples/histogram.png')
```

• Follow a sliding window search strategy to identify the x and y positions of all non-zero pixels in each window, searching within a margin around the leftx_base and rightx_base positions. 9 sliding windows are used to search along the height of the image.
```python
def sliding_window_search(binary_warped, plot=True):
    # Create the output image for visualization
    out_img = np.dstack((binary_warped, binary_warped, binary_warped))*255
    histogram = np.sum(binary_warped[binary_warped.shape[0]//2:,:], axis=0)
    # Find the peak of the left and right halves of the histogram
    # These will be the starting point for the left and right lines
    midpoint = histogram.shape[0]//2
    leftx_base = np.argmax(histogram[:midpoint])
    rightx_base = np.argmax(histogram[midpoint:]) + midpoint

    # Choose the number of sliding windows
    nwindows = 9
    # Set height of windows
    window_height = binary_warped.shape[0]//nwindows
    # Identify the x and y positions of all nonzero pixels in the image
    nonzero = binary_warped.nonzero()
    nonzeroy = np.array(nonzero[0])
    nonzerox = np.array(nonzero[1])
    # Current positions to be updated for each window
    leftx_current = leftx_base
    rightx_current = rightx_base
    # Set the width of the windows +/- margin
    margin = 100
    # Set minimum number of pixels found to recenter window
    minpix = 50
    # Create empty lists to receive left and right lane pixel indices
    left_lane_inds = []
    right_lane_inds = []

    # Step through the windows one by one
    for window in range(nwindows):
        # Identify window boundaries in x and y (and right and left)
        win_y_low = binary_warped.shape[0] - (window+1)*window_height
        win_y_high = binary_warped.shape[0] - window*window_height
        win_xleft_low = leftx_current - margin
        win_xleft_high = leftx_current + margin
        win_xright_low = rightx_current - margin
        win_xright_high = rightx_current + margin
        # Draw the windows on the visualization image
        if plot:
            cv2.rectangle(out_img, (win_xleft_low, win_y_low), (win_xleft_high, win_y_high), (0, 255, 0), 4)
            cv2.rectangle(out_img, (win_xright_low, win_y_low), (win_xright_high, win_y_high), (0, 255, 0), 4)
        # Identify the nonzero pixels in x and y within the window
        good_left_inds = ((nonzeroy >= win_y_low) & (nonzeroy < win_y_high) &
                          (nonzerox >= win_xleft_low) & (nonzerox < win_xleft_high)).nonzero()[0]
        good_right_inds = ((nonzeroy >= win_y_low) & (nonzeroy < win_y_high) &
                           (nonzerox >= win_xright_low) & (nonzerox < win_xright_high)).nonzero()[0]
        # Append these indices to the lists
        left_lane_inds.append(good_left_inds)
        right_lane_inds.append(good_right_inds)
        # If you found > minpix pixels, recenter next window on their mean position
        if len(good_left_inds) > minpix:
            leftx_current = int(np.mean(nonzerox[good_left_inds]))
        if len(good_right_inds) > minpix:
            rightx_current = int(np.mean(nonzerox[good_right_inds]))

    # Concatenate the arrays of indices
    left_lane_inds = np.concatenate(left_lane_inds)
    right_lane_inds = np.concatenate(right_lane_inds)

    # Extract left and right line pixel positions
    leftx = nonzerox[left_lane_inds]
    lefty = nonzeroy[left_lane_inds]
    rightx = nonzerox[right_lane_inds]
    righty = nonzeroy[right_lane_inds]

    # Fit a second order polynomial to each
    left_fit = np.polyfit(lefty, leftx, 2)
    right_fit = np.polyfit(righty, rightx, 2)

    if plot:
        # Generate x and y values for plotting
        ploty = np.linspace(0, binary_warped.shape[0]-1, binary_warped.shape[0])
        left_fitx = left_fit[0]*ploty**2 + left_fit[1]*ploty + left_fit[2]
        right_fitx = right_fit[0]*ploty**2 + right_fit[1]*ploty + right_fit[2]

        out_img[lefty, leftx] = [255, 0, 0]
        out_img[righty, rightx] = [0, 0, 255]
        plt.imshow(out_img)

        plt.plot(left_fitx, ploty, color='yellow')
        plt.plot(right_fitx, ploty, color='yellow')
        plt.xlim(0, 1280)
        plt.ylim(720, 0)
        plt.savefig('examples/sliding_windows.png')

    return left_fit, right_fit

left_fit, right_fit = sliding_window_search(binary_warped, plot=True)
```

• A margin search applies the window within a margin of +/- 100 pixels around the lane-line polynomials fitted in the previous step. This assumption is practical since lane lines are not found randomly across the image. I stored the positions of the left lane line points and the right lane line points and fit them using a 2nd degree polynomial. This is achieved inside the margin_search() function.
```python
def margin_search(binary_warped, left_fit, right_fit, plot=True):
    nonzero = binary_warped.nonzero()
    nonzeroy = np.array(nonzero[0])
    nonzerox = np.array(nonzero[1])
    margin = 100
    # Search within +/- margin of the previous polynomial fits
    left_lane_inds = ((nonzerox > (left_fit[0]*(nonzeroy**2) + left_fit[1]*nonzeroy + left_fit[2] - margin)) &
                      (nonzerox < (left_fit[0]*(nonzeroy**2) + left_fit[1]*nonzeroy + left_fit[2] + margin)))
    right_lane_inds = ((nonzerox > (right_fit[0]*(nonzeroy**2) + right_fit[1]*nonzeroy + right_fit[2] - margin)) &
                       (nonzerox < (right_fit[0]*(nonzeroy**2) + right_fit[1]*nonzeroy + right_fit[2] + margin)))

    # Again, extract left and right line pixel positions
    leftx = nonzerox[left_lane_inds]
    lefty = nonzeroy[left_lane_inds]
    rightx = nonzerox[right_lane_inds]
    righty = nonzeroy[right_lane_inds]
    # Fit a second order polynomial to each
    left_fit = np.polyfit(lefty, leftx, 2)
    right_fit = np.polyfit(righty, rightx, 2)
    # Generate x and y values for plotting
    ploty = np.linspace(0, binary_warped.shape[0]-1, binary_warped.shape[0])
    left_fitx = left_fit[0]*ploty**2 + left_fit[1]*ploty + left_fit[2]
    right_fitx = right_fit[0]*ploty**2 + right_fit[1]*ploty + right_fit[2]

    if plot:
        # Create an image to draw on and an image to show the selection window
        out_img = np.dstack((binary_warped, binary_warped, binary_warped))*255
        window_img = np.zeros_like(out_img)
        # Color in left and right line pixels
        out_img[lefty, leftx] = [255, 0, 0]  # Red
        out_img[righty, rightx] = [0, 0, 255]  # Blue

        # Generate a polygon to illustrate the search window area
        # and recast the x and y points into usable format for cv2.fillPoly()
        left_line_window1 = np.array([np.transpose(np.vstack([left_fitx-margin, ploty]))])
        left_line_window2 = np.array([np.flipud(np.transpose(np.vstack([left_fitx+margin, ploty])))])
        left_line_pts = np.hstack((left_line_window1, left_line_window2))
        right_line_window1 = np.array([np.transpose(np.vstack([right_fitx-margin, ploty]))])
        right_line_window2 = np.array([np.flipud(np.transpose(np.vstack([right_fitx+margin, ploty])))])
        right_line_pts = np.hstack((right_line_window1, right_line_window2))

        # Draw the search area onto the warped blank image
        cv2.fillPoly(window_img, np.int_([left_line_pts]), (0, 255, 0))
        cv2.fillPoly(window_img, np.int_([right_line_pts]), (0, 255, 0))
        result = cv2.addWeighted(out_img, 1, window_img, 0.3, 0)

        plt.imshow(result)
        plt.plot(left_fitx, ploty, color='yellow')
        plt.plot(right_fitx, ploty, color='yellow')
        plt.xlim(0, 1280)
        plt.ylim(720, 0)
        plt.savefig('examples/margin_search.png')

    return left_fit, right_fit

left_fit, right_fit = margin_search(binary_warped, left_fit, right_fit, plot=True)
```

### Calculating the radius of curvature of the lane

For calculating the radius of curvature it is necessary to find a relationship between the distance between two pixels in the image and the actual distance between two points on the road. Assuming the meters per pixel in the y and x directions to be 30/720 and 3.7/700 respectively, the lane line points were converted to road coordinates. The points were fit with a 2nd degree polynomial and the radius of curvature was calculated using the standard formula.
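For reference, for a second-order fit $x = Ay^2 + By + C$, the standard radius-of-curvature formula at a point $y$ is:

$$
R_{curve} = \frac{\left(1 + (2Ay + B)^2\right)^{3/2}}{\left|2A\right|}
$$

This quantity is evaluated at the bottom of the image (y_eval), after refitting the points in world-space meters.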

```python
def radius_curvature(ploty, left_fitx, right_fitx):
    # Define y-value where we want radius of curvature
    # I'll choose the maximum y-value, corresponding to the bottom of the image
    y_eval = np.max(ploty)

    # Define conversions in x and y from pixel space to meters
    ym_per_pix = 30/720   # meters per pixel in y dimension
    xm_per_pix = 3.7/700  # meters per pixel in x dimension

    # Fit new polynomials to x,y in world space
    left_fit_cr = np.polyfit(ploty*ym_per_pix, left_fitx*xm_per_pix, 2)
    right_fit_cr = np.polyfit(ploty*ym_per_pix, right_fitx*xm_per_pix, 2)

    # Calculate the new radii of curvature
    left_curverad = ((1 + (2*left_fit_cr[0]*y_eval*ym_per_pix + left_fit_cr[1])**2)**1.5) / np.absolute(2*left_fit_cr[0])
    right_curverad = ((1 + (2*right_fit_cr[0]*y_eval*ym_per_pix + right_fit_cr[1])**2)**1.5) / np.absolute(2*right_fit_cr[0])

    return left_curverad, right_curverad
```


### Calculating the position of the vehicle with respect to center

The position of the vehicle with respect to the center was calculated by subtracting the mid-point of the image width from the mid-point of the x-values of the left and right lane lines. This is achieved in the center_offset() function.

```python
def center_offset(left_fitx, right_fitx):
    # Calculate the position of the vehicle: the lane center at the bottom
    # of the image vs. the image center (640 px for a 1280-pixel-wide frame)
    lane_center = (right_fitx[-1] + left_fitx[-1])/2
    xm_per_pix = 3.7/700  # meters per pixel in x dimension
    center_offset_pixels = abs(640 - lane_center)
    center_offset_m = xm_per_pix*center_offset_pixels
    return center_offset_m
```


### Sample output

The region between the 2 lane lines was colored in green so that the lane area was identified clearly. Here is an example of my result on the test image, followed by the output on a few more test images.

## Pipeline for the video

Here’s a sample implementation of the pipeline run on the project video. Here’s the link to my video result on YouTube.

## Limitations of the pipeline

I faced a few issues when I hadn’t averaged the lane lines obtained over a few frames. In a few cases the lane lines jumped around a bit and didn’t look stable. After averaging over 10 frames, the lane lines looked fairly stable.
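The smoothing described above can be sketched as follows. This is a minimal example, not the exact code from my pipeline: the Line class name and the choice of averaging the polynomial coefficients of the last 10 fits are assumptions.

```python
import numpy as np
from collections import deque

class Line:
    """Keeps a rolling history of polynomial fits and returns their average."""
    def __init__(self, n=10):
        # Last n coefficient triples; older fits are discarded automatically
        self.recent_fits = deque(maxlen=n)

    def add_fit(self, fit):
        self.recent_fits.append(np.asarray(fit, dtype=float))

    def best_fit(self):
        # Average the coefficients of the stored fits
        return np.mean(self.recent_fits, axis=0)

# Usage: feed in the per-frame fits and draw using the averaged one
left_line = Line(n=10)
left_line.add_fit([1e-4, -0.05, 320.0])    # hypothetical frame 1 fit
left_line.add_fit([1.2e-4, -0.06, 322.0])  # hypothetical frame 2 fit
smoothed = left_line.best_fit()
```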

My pipeline would suffer if there is occlusion of the lane lines, that is, if a car in front moves onto them. It would also be difficult to track the lanes if the car changes lanes. As I found out, my pipeline fails in the harder challenge video, where the lane lines have many more bends; the assumption of fitting the lane lines with a quadratic polynomial is not robust there. My pipeline also suffers when there are shadows or bright sunlight on the road, as it cannot distinguish between the white lane lines and white reflections from the road surface. To summarize, I have learned that it is relatively easy to fine-tune a pipeline to work well in ideal road and weather conditions, but it is really challenging to find a single combination that works robustly in all conditions.

I feel that a two-stream ConvNet suitable for videos, where one stream identifies the lane lines and the other takes advantage of the temporal information, could be useful. Also, since videos are sequential data, Recurrent Neural Networks could be used, as they work well in sequence analysis. I would love to continue working on the project and explore these techniques.