Lane keeping for self-driving cars


In this post we learn how to detect lane pixels, fit a polynomial to them, and display the lane boundaries on the screen. We also learn how to estimate the curvature of the lane and determine the position of the vehicle with respect to the center of the lane. This would be helpful for lane keeping in a self-driving car or could be used as part of an Advanced Driver Assistance System (ADAS).

Camera Calibration

Computing the camera matrix and distortion coefficients

Camera lenses introduce distortion which warps the captured image: straight lines in the real world do not look straight in the camera image. For applications that rely on geometric accuracy, such as measuring lane curvature, these distortions need to be corrected first. To find the correction parameters, we provide some sample images of a well-defined pattern (e.g., a chessboard) and find specific points in it (the square corners of the chessboard). Since we know the coordinates of these points both in real-world space and in the image, we can solve for the camera matrix and distortion coefficients with the help of some mathematical equations. An in-depth discussion on camera calibration can be found here.

I used 17 images of the chessboard pattern for finding the calibration and distortion coefficients. I started by preparing object points, which were the (x, y, z) coordinates of the chessboard corners in the world. I assumed that the chessboard was fixed on the (x, y) plane at z=0, such that the object points were the same for each calibration image. The variable objp is just a replicated array of coordinates, and objpoints were appended with a copy of it every time I successfully detected all chessboard corners in a test image. imgpoints was appended with the (x, y) pixel position of each of the corners in the image plane with each successful chessboard detection.
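The corner-detection loop that builds objpoints and imgpoints is not listed here, but it looked roughly like the sketch below (a minimal version, assuming the calibration images live in camera_cal/ and the board has 9x6 inner corners):

import glob
import numpy as np
import cv2

# Object points for one chessboard view: (0,0,0), (1,0,0), ..., (8,5,0)
# (the 9x6 inner-corner grid is an assumption about the calibration pattern)
objp = np.zeros((9*6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

objpoints = []  # 3D points in real-world space
imgpoints = []  # 2D points in the image plane

for fname in glob.glob('camera_cal/calibration*.jpg'):
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chessboard corners; skip images where detection fails
    found, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)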

I then used the output objpoints and imgpoints to compute the camera calibration and distortion coefficients using the cv2.calibrateCamera() function. I applied this distortion correction to the test image using the cv2.undistort() function and obtained this result:

[Figure: original vs. undistorted chessboard image]

The code for this step is given below:

def cal_undistort(img, gray, objpoints, imgpoints, plot=True):
    # Calibrate the camera
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

    # undistort the image
    undist = cv2.undistort(img, mtx, dist, None, mtx)
    
    if plot:
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
        f.tight_layout()
        ax1.imshow(img)
        ax1.set_title('Original Image', fontsize=30)
        ax2.imshow(undist)
        ax2.set_title('Undistorted Image', fontsize=30)
        plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
        plt.savefig('./examples/undist_chess_board.png')
    
    return mtx, dist

test_chess = mpimg.imread('camera_cal/calibration2.jpg')
gray_test_chess = cv2.cvtColor(test_chess, cv2.COLOR_RGB2GRAY)
mtx, dist = cal_undistort(test_chess, gray_test_chess, objpoints, imgpoints, plot=True)

Pipeline for a single image

Distortion correction

To remove this distortion from the road images we use the camera matrix and distortion coefficients obtained during calibration. After distortion correction the output obtained is shown below:

[Figure: original vs. undistorted road image]

Thresholded binary image

I experimented with a combination of color and gradient thresholds to generate a binary image. After a lot of trial with different color spaces I found that the L channel from the LUV color space, with a minimum threshold of 225 and a maximum threshold of 255, did a very good job of identifying the white lane lines while ignoring the yellow lines.

The B channel from the Lab color space, with a minimum threshold of 155 and a maximum threshold of 200, did a better job than the S channel (from HLS) at identifying the yellow lines while ignoring the white lines.

I also applied gradient thresholds in the x and y directions, but they did not add much information, so I did not include them in my final pipeline.

def abs_sobel_thresh(img, orient, sx_thresh=(0, 200)):
    # Apply the following steps to img
    # 1) Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # 2) Take the derivative in x or y given orient = 'x' or 'y'
    if orient == 'x':
        sobel = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    elif orient == 'y':
        sobel = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    # 3) Take the absolute value of the derivative or gradient
    abs_sobel = np.absolute(sobel)
    # 4) Scale to 8-bit (0 - 255) then convert to type = np.uint8
    scaled_sobel = np.uint8(255*abs_sobel/np.max(abs_sobel))
    # 5) Create a mask of 1's where the scaled gradient magnitude 
            # is > thresh_min and < thresh_max
    binary_output = np.zeros_like(scaled_sobel)
    # 6) Return this mask as your binary_output image
    binary_output[(scaled_sobel >= sx_thresh[0])&(scaled_sobel <= sx_thresh[1])] = 1
    return binary_output


def dir_threshold(img, sobel_kernel=3, sdir_thresh=(0, np.pi/2)):
    # Apply the following steps to img
    # 1) Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # 2) Take the gradient in x and y separately
    sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=sobel_kernel)
    sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=sobel_kernel)
    # 3) Take the absolute value of the x and y gradients
    abs_sobelx, abs_sobely = np.absolute(sobelx), np.absolute(sobely)
    # 4) Use np.arctan2(abs_sobely, abs_sobelx) to calculate the direction of the gradient
    dirn = np.arctan2(abs_sobely, abs_sobelx) 
    # 5) Create a binary mask where direction thresholds are met
    binary_output = np.zeros_like(dirn)
    # 6) Return this mask as your binary_output image
    binary_output[(dirn >= sdir_thresh[0])&(dirn <= sdir_thresh[1])] = 1
    return binary_output


# Color transforms, gradients to create  thresholded binary image
def threshold_binary(img, plot=True):
    #s_channel = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)[:,:,2]
    
    l_channel = cv2.cvtColor(img, cv2.COLOR_RGB2LUV)[:,:,0]

    b_channel = cv2.cvtColor(img, cv2.COLOR_RGB2Lab)[:,:,2]   

    # Threshold color channel
    b_thresh_min = 155
    b_thresh_max = 200
    b_binary = np.zeros_like(b_channel)
    b_binary[(b_channel >= b_thresh_min) & (b_channel <= b_thresh_max)] = 1
    
    l_thresh_min = 225
    l_thresh_max = 255
    l_binary = np.zeros_like(l_channel)
    l_binary[(l_channel >= l_thresh_min) & (l_channel <= l_thresh_max)] = 1
    
    combined_binary = np.zeros_like(b_binary)
    combined_binary[(l_binary == 1) | (b_binary == 1)] = 1
    
    kernel = np.ones((3,3),np.uint8)
    dilated_combined_binary = cv2.dilate(combined_binary,kernel,iterations = 1)
    
    if plot:
        # Plot the result
        f, (ax1, ax2) = plt.subplots(1,2, figsize=(15, 5))
        f.tight_layout()

        ax1.imshow(img)
        ax1.set_title('Original image', fontsize=20)

        ax2.imshow(dilated_combined_binary, cmap = 'gray')
        ax2.set_title('Combined L and B channel thresholds', fontsize=20)
        plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
        plt.savefig('examples/thresholded_binary_image.png')
    #return dilation
    return dilated_combined_binary

test_images = glob.glob('test_images/straight_lines1.jpg')

for img_num in test_images:
    test_img = mpimg.imread(img_num)
    undistorted = cv2.undistort(test_img, mtx, dist, None, mtx)
    combined = threshold_binary(undistorted)

Here’s an example of my output for this step.

[Figure: thresholded binary image]

Perspective transform

The code for my perspective transform is wrapped in a function called warp(). The warp() function takes an image (img) as input; the source (src) and destination (dst) points are hardcoded inside the function.

def warp(img, plot=True):
    img_size = (img.shape[1], img.shape[0])   
    src = np.float32([[(img_size[0] / 2) - 55, img_size[1] / 2 + 100],
                     [((img_size[0] / 5) - 50), img_size[1]],
                     [(img_size[0] * 5 / 6) + 35, img_size[1]],
                     [(img_size[0] / 2 + 55), img_size[1] / 2 + 100]])
    
    dst = np.float32([[(img_size[0] / 4), 0],
                     [(img_size[0] / 4), img_size[1]],
                     [(img_size[0] * 3 / 4), img_size[1]],
                     [(img_size[0] * 3 / 4), 0]])
    
    # Compute the perspective transform, M
    M = cv2.getPerspectiveTransform(src, dst)
    
    M_inv = cv2.getPerspectiveTransform(dst, src)
    
    # Create warped image
    warped = cv2.warpPerspective(img, M, img_size, flags=cv2.INTER_LINEAR)
    
    warped_copy, im_copy = np.copy(warped), np.copy(img)
    
    # Plotting
    if plot:
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 9))
        f.tight_layout()
        pts = np.array([[(img_size[0] / 2) - 55, img_size[1] / 2 + 100],
                     [((img_size[0] / 5)) - 50, img_size[1]],
                     [(img_size[0] * 5 / 6) + 35, img_size[1]],
                     [(img_size[0] / 2 + 55), img_size[1] / 2 + 100]], np.int32)
        pts = pts.reshape((-1,1,2))
        cv2.polylines(im_copy, [pts], True, (255,0,0), 5)
        ax1.imshow(im_copy)
        ax1.set_title('Source Image', fontsize=30)
        cv2.rectangle(warped_copy,(img_size[0] // 4, 0),((img_size[0] * 3 )// 4, img_size[1]),(255,0,0),5)
        ax2.imshow(warped_copy)
        ax2.set_title('Warped Image', fontsize=30)
        plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
        plt.savefig('examples/warped_image.png')
        
    
    return warped, M_inv

warped_image, M_inv = warp(undistorted, plot=True)

I chose to hardcode the source and destination points in the following manner:

src = np.float32([[(img_size[0] / 2) - 55, img_size[1] / 2 + 100],
                 [((img_size[0] / 5) - 50), img_size[1]],
                 [(img_size[0] * 5 / 6) + 35, img_size[1]],
                 [(img_size[0] / 2 + 55), img_size[1] / 2 + 100]])
    
dst = np.float32([[(img_size[0] / 4), 0],
                 [(img_size[0] / 4), img_size[1]],
                 [(img_size[0] * 3 / 4), img_size[1]],
                 [(img_size[0] * 3 / 4), 0]])

This resulted in the following source and destination points:

Source        Destination
585, 460      320, 0
206, 720      320, 720
1101, 720     960, 720
695, 460      960, 0

I verified that my perspective transform was working as expected by drawing the src and dst regions onto a test image and its warped counterpart and checking that the lane lines appear parallel in the warped image.

[Figure: source region on the test image and the corresponding warped image]

Fitting the positions of lane lines with a polynomial

I applied the perspective transform to the thresholded binary images using the warp() function. The lane lines can be easily separated, as shown in the figure below.

[Figure: perspective-transformed binary image]

Then I did the following steps to draw the lane-lines:

  • Plot the histogram of the lower half of the binary warped image, which gives the frequency distribution of the non-zero pixels along the x axis. The histogram peaks indicate where the lane lines are; the x positions of the two peaks are identified and stored in leftx_base and rightx_base.
import numpy as np
histogram = np.sum(binary_warped[binary_warped.shape[0]//2:,:], axis=0)
plt.plot(histogram)
plt.savefig('examples/histogram.png')

[Figure: histogram of non-zero pixels in the lower half of the binary warped image]

  • Follow a sliding window search strategy: identify the x and y positions of all non-zero pixels that fall within a margin around the current line positions, starting from leftx_base and rightx_base. 9 sliding windows are stacked along the height of the image, and each window is re-centered on the mean x position of the pixels found in the previous window.
def sliding_window_search(binary_warped, plot=True):
    # create the output image for visualization
    out_img = np.dstack((binary_warped, binary_warped, binary_warped))*255
    histogram = np.sum(binary_warped[binary_warped.shape[0]//2:,:], axis=0)
    # Find the peak of the left and right halves of the histogram
    # These will be the starting point for the left and right lines
    midpoint = int(histogram.shape[0]/2)
    leftx_base = np.argmax(histogram[:midpoint])
    rightx_base = np.argmax(histogram[midpoint:]) + midpoint

    # Choose the number of sliding windows
    nwindows = 9
    # Set height of windows
    window_height = int(binary_warped.shape[0]/nwindows)
    # Identify the x and y positions of all nonzero pixels in the image
    nonzero = binary_warped.nonzero()
    nonzeroy = np.array(nonzero[0])
    nonzerox = np.array(nonzero[1])
    # Current positions to be updated for each window
    leftx_current = leftx_base
    rightx_current = rightx_base
    # Set the width of the windows +/- margin
    margin = 100
    # Set minimum number of pixels found to recenter window
    minpix = 50
    # Create empty lists to receive left and right lane pixel indices
    left_lane_inds = []
    right_lane_inds = []

    # Step through the windows one by one
    for window in range(nwindows):
        # Identify window boundaries in x and y (and right and left)
        win_y_low = binary_warped.shape[0] - (window+1)*window_height
        win_y_high = binary_warped.shape[0] - window*window_height
        win_xleft_low = leftx_current - margin
        win_xleft_high = leftx_current + margin
        win_xright_low = rightx_current - margin
        win_xright_high = rightx_current + margin
        ## Draw the windows on the visualization image
        if plot:
            cv2.rectangle(out_img,(win_xleft_low,win_y_low),(win_xleft_high,win_y_high),(0,255,0), 4) 
            cv2.rectangle(out_img,(win_xright_low,win_y_low),(win_xright_high,win_y_high),(0,255,0), 4) 
        # Identify the nonzero pixels in x and y within the window
        good_left_inds = ((nonzeroy >= win_y_low) & (nonzeroy < win_y_high) & (nonzerox >= win_xleft_low) & (nonzerox < win_xleft_high)).nonzero()[0]
        good_right_inds = ((nonzeroy >= win_y_low) & (nonzeroy < win_y_high) & (nonzerox >= win_xright_low) & (nonzerox < win_xright_high)).nonzero()[0]
        # Append these indices to the lists
        left_lane_inds.append(good_left_inds)
        right_lane_inds.append(good_right_inds)
        # If you found > minpix pixels, recenter next window on their mean position
        if len(good_left_inds) > minpix:
            leftx_current = int(np.mean(nonzerox[good_left_inds]))
        if len(good_right_inds) > minpix:        
            rightx_current = int(np.mean(nonzerox[good_right_inds]))

    # Concatenate the arrays of indices
    left_lane_inds = np.concatenate(left_lane_inds)
    right_lane_inds = np.concatenate(right_lane_inds)

    # Extract left and right line pixel positions
    leftx = nonzerox[left_lane_inds]
    lefty = nonzeroy[left_lane_inds] 
    rightx = nonzerox[right_lane_inds]
    righty = nonzeroy[right_lane_inds] 
    
    # Fit a second order polynomial to each
    left_fit = np.polyfit(lefty, leftx, 2)
    right_fit = np.polyfit(righty, rightx, 2)
    
    if plot:
        # Generate x and y values for plotting
        ploty = np.linspace(0, binary_warped.shape[0]-1, binary_warped.shape[0] )
        left_fitx = left_fit[0]*ploty**2 + left_fit[1]*ploty + left_fit[2]
        right_fitx = right_fit[0]*ploty**2 + right_fit[1]*ploty + right_fit[2]

        out_img[lefty, leftx] = [255, 0, 0]
        out_img[righty, rightx] = [0, 0, 255]
        plt.imshow(out_img)

        plt.plot(left_fitx, ploty, color='yellow')
        plt.plot(right_fitx, ploty, color='yellow')
        plt.xlim(0, 1280)
        plt.ylim(720, 0)
        plt.savefig('examples/sliding_windows.png')
        
    return left_fit, right_fit

left_fit, right_fit = sliding_window_search(binary_warped, plot=True)

[Figure: sliding window search with fitted polynomial lines]

  • A margin search is then used to refine the fit: instead of re-running the sliding windows, the non-zero pixels are searched within a margin of +/- 100 pixels around the previously fitted polynomial lines. This assumption is practical since lane lines do not appear randomly across the image (and, in a video, do not move much between frames). I stored the positions of the left and right lane line pixels and fit them with a 2nd degree polynomial. This is achieved inside the margin_search() function.
def margin_search(binary_warped, left_fit, right_fit, plot=True):
    nonzero = binary_warped.nonzero()
    nonzeroy = np.array(nonzero[0])
    nonzerox = np.array(nonzero[1])
    margin = 100
    left_lane_inds = ((nonzerox > (left_fit[0]*(nonzeroy**2) + left_fit[1]*nonzeroy + left_fit[2] - margin)) & (nonzerox < (left_fit[0]*(nonzeroy**2) + left_fit[1]*nonzeroy + left_fit[2] + margin))) 
    right_lane_inds = ((nonzerox > (right_fit[0]*(nonzeroy**2) + right_fit[1]*nonzeroy + right_fit[2] - margin)) & (nonzerox < (right_fit[0]*(nonzeroy**2) + right_fit[1]*nonzeroy + right_fit[2] + margin)))  

    # Again, extract left and right line pixel positions
    leftx = nonzerox[left_lane_inds]
    lefty = nonzeroy[left_lane_inds] 
    rightx = nonzerox[right_lane_inds]
    righty = nonzeroy[right_lane_inds]
    # Fit a second order polynomial to each
    left_fit = np.polyfit(lefty, leftx, 2)
    right_fit = np.polyfit(righty, rightx, 2)
    # Generate x and y values for plotting
    ploty = np.linspace(0, binary_warped.shape[0]-1, binary_warped.shape[0] )
    left_fitx = left_fit[0]*ploty**2 + left_fit[1]*ploty + left_fit[2]
    right_fitx = right_fit[0]*ploty**2 + right_fit[1]*ploty + right_fit[2]
    
    if plot:
        # Create an image to draw on and an image to show the selection window
        out_img = np.dstack((binary_warped, binary_warped, binary_warped))*255
        window_img = np.zeros_like(out_img)
        # Color in left and right line pixels
        out_img[lefty, leftx] = [255, 0, 0] # Red
        out_img[righty, rightx] = [0, 0, 255] # Blue

        # Generate a polygon to illustrate the search window area
        # And recast the x and y points into usable format for cv2.fillPoly()
        left_line_window1 = np.array([np.transpose(np.vstack([left_fitx-margin, ploty]))])
        left_line_window2 = np.array([np.flipud(np.transpose(np.vstack([left_fitx+margin, ploty])))])
        left_line_pts = np.hstack((left_line_window1, left_line_window2))
        right_line_window1 = np.array([np.transpose(np.vstack([right_fitx-margin, ploty]))])
        right_line_window2 = np.array([np.flipud(np.transpose(np.vstack([right_fitx+margin, ploty])))])
        right_line_pts = np.hstack((right_line_window1, right_line_window2))

        # Draw the lane onto the warped blank image
        cv2.fillPoly(window_img, np.int_([left_line_pts]), (0,255, 0))
        cv2.fillPoly(window_img, np.int_([right_line_pts]), (0,255, 0))
        result = cv2.addWeighted(out_img, 1, window_img, 0.3, 0)

        plt.imshow(result)
        plt.plot(left_fitx, ploty, color='yellow')
        plt.plot(right_fitx, ploty, color='yellow')
        plt.xlim(0, 1280)
        plt.ylim(720, 0)
        plt.savefig('examples/margin_search.png')
        
    return left_fit, right_fit

left_fit, right_fit = margin_search(binary_warped, left_fit, right_fit, plot=True)

[Figure: margin search around the previous polynomial fits]

Calculating the radius of curvature of the lane

To calculate the radius of curvature it is necessary to relate distances in pixels to actual distances on the road. Assuming 30/720 meters per pixel in the y direction and 3.7/700 meters per pixel in the x direction, the lane line points were converted to road coordinates. The points were then fit with a 2nd degree polynomial and the radius of curvature was calculated using the formula mentioned here.
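For a second-order fit x = A*y^2 + B*y + C (with x and y already converted to meters), the radius of curvature at a point y is

\[ R_{curve} = \frac{\left(1 + (2Ay + B)^2\right)^{3/2}}{\left|2A\right|} \]

which is what the code below evaluates at the bottom of the image (y_eval, converted to meters with ym_per_pix).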

def radius_curvature(ploty, left_fitx, right_fitx):
    # Define y-value where we want radius of curvature
    # I'll choose the maximum y-value, corresponding to the bottom of the image
    y_eval = np.max(ploty)
    
    # Define conversions in x and y from pixels space to meters
    ym_per_pix = 30/720 # meters per pixel in y dimension
    xm_per_pix = 3.7/700 # meters per pixel in x dimension

    # Fit new polynomials to x,y in world space
    left_fit_cr = np.polyfit(ploty*ym_per_pix, left_fitx*xm_per_pix, 2)
    right_fit_cr = np.polyfit(ploty*ym_per_pix, right_fitx*xm_per_pix, 2)
    
    # Calculate the new radii of curvature
    left_curverad = ((1 + (2*left_fit_cr[0]*y_eval*ym_per_pix + left_fit_cr[1])**2)**1.5) / np.absolute(2*left_fit_cr[0])
    right_curverad = ((1 + (2*right_fit_cr[0]*y_eval*ym_per_pix + right_fit_cr[1])**2)**1.5) / np.absolute(2*right_fit_cr[0])
    return left_curverad, right_curverad, left_fit_cr, right_fit_cr

Calculating the position of the vehicle with respect to center

The position of the vehicle with respect to the lane center was calculated as the difference between the horizontal mid-point of the image (assumed to be where the camera, and hence the car, is centered) and the mid-point between the x values of the left and right lane lines at the bottom of the image. This is achieved in the center_offset() function.

def center_offset(left_fitx, right_fitx):
    # Lane center at the bottom of the image (row 719 for a 720-pixel-high image)
    lane_center = (right_fitx[719] + left_fitx[719])/2
    xm_per_pix = 3.7/700 # meters per pixel in x dimension
    # 640 is the horizontal mid-point of the 1280-pixel-wide image,
    # assumed to be where the camera (and hence the car) is centered
    center_offset_pixels = abs(640 - lane_center)
    center_offset_m = xm_per_pix*center_offset_pixels
    return center_offset_m

Sample output

The region between the two lane lines was colored green so that the lane area is clearly identified; a sketch of this drawing step is given after the example images below. Here is an example of my result on a test image:

[Figure: detected lane area drawn on a test image]

Here is the output on a few more test images:

[Figure: lane detection results on additional test images]
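The code for this final drawing step is not listed above; a minimal sketch of how it can be done is given below. The function name draw_lane is mine, and it reuses left_fitx, right_fitx, ploty and M_inv from the earlier steps:

def draw_lane(undistorted, binary_warped, left_fitx, right_fitx, ploty, M_inv):
    # Blank canvas in the warped (top-down) space
    warp_zero = np.zeros_like(binary_warped).astype(np.uint8)
    color_warp = np.dstack((warp_zero, warp_zero, warp_zero))

    # Build the lane polygon from the left and right fitted lines and fill it green
    pts_left = np.array([np.transpose(np.vstack([left_fitx, ploty]))])
    pts_right = np.array([np.flipud(np.transpose(np.vstack([right_fitx, ploty])))])
    pts = np.hstack((pts_left, pts_right))
    cv2.fillPoly(color_warp, np.int_([pts]), (0, 255, 0))

    # Warp the polygon back to the original perspective and overlay it
    img_size = (undistorted.shape[1], undistorted.shape[0])
    newwarp = cv2.warpPerspective(color_warp, M_inv, img_size)
    return cv2.addWeighted(undistorted, 1, newwarp, 0.3, 0)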

Pipeline for the video

Here’s a sample implementation of the pipeline run on the project video.
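A condensed sketch of the per-frame processing is shown below; it simply chains the functions defined above (using the draw_lane sketch from the previous section) and assumes moviepy and the file name project_video.mp4. The actual video pipeline also smooths the fits over frames and switches between sliding_window_search and margin_search, which is omitted here for brevity:

from moviepy.editor import VideoFileClip

def process_frame(frame):
    # Undistort, threshold, warp, fit the lane lines and draw the lane area
    undistorted = cv2.undistort(frame, mtx, dist, None, mtx)
    binary = threshold_binary(undistorted, plot=False)
    binary_warped, M_inv = warp(binary, plot=False)
    left_fit, right_fit = sliding_window_search(binary_warped, plot=False)
    ploty = np.linspace(0, binary_warped.shape[0]-1, binary_warped.shape[0])
    left_fitx = left_fit[0]*ploty**2 + left_fit[1]*ploty + left_fit[2]
    right_fitx = right_fit[0]*ploty**2 + right_fit[1]*ploty + right_fit[2]
    return draw_lane(undistorted, binary_warped, left_fitx, right_fitx, ploty, M_inv)

clip = VideoFileClip('project_video.mp4')
clip.fl_image(process_frame).write_videofile('project_video_output.mp4', audio=False)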

Here’s the link to my video result on YouTube.

Limitations of the pipeline

I faced a few issues when I hadn’t averaged the lane lines obtained over a few frames. In a few cases the lane lines jumped around a bit and didn’t look stable. After averaging over 10 frames, the lane lines looked fairly stable.
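The smoothing itself can be as simple as keeping the last few fits in a deque and averaging them; a minimal sketch of that idea (not the exact implementation used here) is:

from collections import deque

class FitSmoother:
    # Keep the last n polynomial fits and expose their element-wise average
    def __init__(self, n=10):
        self.fits = deque(maxlen=n)

    def update(self, new_fit):
        self.fits.append(new_fit)
        return np.mean(self.fits, axis=0)

# One smoother per lane line, updated with the fit obtained for every frame
left_smoother, right_smoother = FitSmoother(10), FitSmoother(10)
# left_fit = left_smoother.update(left_fit)
# right_fit = right_smoother.update(right_fit)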

My pipeline would suffer if the lane lines are occluded, for example if a car in front drives over them. It would also be difficult to track the lanes while the car is changing lanes. As I found out, my pipeline fails in the harder challenge video where the lane lines have much sharper bends; the assumption that the lane lines can be fit with a quadratic polynomial is not robust there. My pipeline also suffers when there are shadows or bright sunlight on the road, as it can't distinguish between the white lane lines and white reflections from the road surface. To summarize, I have learned that it is relatively easy to fine-tune a pipeline to work well in ideal road and weather conditions, but it is really challenging to find a single combination of parameters that works robustly in all conditions.

I feel that a two-stream ConvNet suited for videos, where one stream identifies the lane lines and the other exploits temporal information, could be useful. Also, since videos are sequential data, Recurrent Neural Networks could be used as they work well for sequence analysis. I would love to continue working on the project and explore these techniques.
