With the recent launch of self-driving cars and trucks, the field of autonomous navigation has never been more exciting. What were once research projects in laboratories are now commercially available products. One of the main tasks any such vehicle must perform well is following the rules of the road, and identifying traffic lights amid everything else in the scene is one of the most important parts of that. Thankfully, recent advances in Deep Learning, together with frameworks like Caffe and TensorFlow that can harness the immense power of GPUs to speed up computation, have made this task far more approachable. In this article I will show how anyone can train their own model for traffic light detection and classification using openly available data-sets and tools. I used Udacity's openly available data-sets: Udacity's Self Driving Car Engineer Nanodegree provides a simulator and some ROS bag files. The model described here was part of my final capstone project submission, in which we first had to pass the tests in the simulator and then pass a test driving around an actual track in a real vehicle.
Step 1: Gather the data
As with any machine learning exercise, we first need to gather our data on which we will train the model. The simulator images look something like this:
Data the simulator’s camera captures
While the actual images from the track look something like this:
Step 2: Label and annotate the images
The next step is to manually annotate the images for the network. There are many open-source tools available for this, such as LabelImg and Sloth. The annotation tool produces a YAML file that looks something like this:
Output after manual image annotations
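To give a rough idea of what such a file contains, here is a hypothetical excerpt (key names and layout vary from tool to tool; these field names are my own illustration, not the exact schema of any one annotator):

```yaml
- filename: sim_images/left0003.jpg
  annotations:
    - {class: Green, x_min: 100, y_min: 200, x_max: 140, y_max: 280}
    - {class: Green, x_min: 320, y_min: 198, x_max: 358, y_max: 280}
- filename: sim_images/left0021.jpg
  annotations:
    - {class: Red, x_min: 112, y_min: 195, x_max: 150, y_max: 277}
```

Each entry ties an image file to a list of labeled bounding boxes in pixel coordinates.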
This step was the most time consuming of all. When I started, it took me almost 3 hours to understand how the tools work, install the dependencies, and annotate the simulator data-set. Luckily, one of the main advantages of the Nanodegree is the immense amount of support you get from discussions with your peers around the world. One of my peers, Anthony Sarkis, has graciously made his annotated data-set openly available for all to use. Thank you, Anthony Sarkis 🙂
Step 3: Training the Model
For training the model with the API, we first need to convert our data into the TFRecord format. This format takes your images and the YAML file of annotations and combines them into a single file that can be fed in as training input. Starter code is provided on TensorFlow's GitHub page.
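The core of that conversion is flattening each image's annotations into per-coordinate lists, with pixel boxes normalized to [0, 1], before they are packed into a `tf.train.Example`. Here is a minimal sketch of that normalization step; the `LABEL_MAP` ids and the annotation field names are my own assumptions matching the illustration above, not a fixed schema:

```python
# Hypothetical class-name -> label-id mapping; ids must match your label map.
LABEL_MAP = {'Green': 1, 'Red': 2, 'Yellow': 3, 'off': 4}

def to_feature_lists(boxes, width, height):
    """Flatten one image's annotations into the normalized coordinate
    lists that a tf.train.Example's features expect."""
    xmins, xmaxs, ymins, ymaxs, classes = [], [], [], [], []
    for box in boxes:
        xmins.append(box['x_min'] / float(width))    # normalize to [0, 1]
        xmaxs.append(box['x_max'] / float(width))
        ymins.append(box['y_min'] / float(height))
        ymaxs.append(box['y_max'] / float(height))
        classes.append(LABEL_MAP[box['class']])
    return {'xmins': xmins, 'xmaxs': xmaxs,
            'ymins': ymins, 'ymaxs': ymaxs, 'classes': classes}

# Example: one green light in an 800x600 simulator frame.
features = to_feature_lists(
    [{'class': 'Green', 'x_min': 100, 'x_max': 140,
      'y_min': 200, 'y_max': 280}],
    width=800, height=600)
```

In the actual starter code these lists populate the `FloatList` and `Int64List` features of a `tf.train.Example`, which is then serialized into the TFRecord file.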
Next, we need to set up an object detection pipeline. The TensorFlow team also provides sample config files in their repo. For my training I used two models, ssd_inception_v2_coco and faster_rcnn_resnet101_coco. These models can be downloaded from here.
I needed to adjust `num_classes` to 4 and also set the paths (`PATH_TO_BE_CONFIGURED`) for the model checkpoint, the train and test data files, and the label map. I also reduced the number of region proposals from the authors' original suggestion: from 300 to 10 for faster_rcnn, and from 100 to 50 for ssd_inception. For other settings such as the learning rate and batch size, I kept the defaults. (Note: `second_stage_batch_size` must be less than or equal to `max_total_detections`, so for faster_rcnn I reduced that to 10 as well, or else training throws an error.)
The `data_augmentation_options` setting is very interesting if your data-set doesn't have much variability, such as different scales and poses. A full list of options can be found here.
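Taken together, those edits land in a pipeline config that looks roughly like this. This is heavily abbreviated, paths are placeholders, and many required fields are omitted; the field names themselves come from the sample faster_rcnn configs in the TensorFlow models repo:

```
model {
  faster_rcnn {
    num_classes: 4
    first_stage_max_proposals: 10        # down from 300
    second_stage_post_processing {
      batch_non_max_suppression {
        max_total_detections: 10
      }
    }
    second_stage_batch_size: 10          # must be <= max_total_detections
  }
}
train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
}
train_input_reader {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}
```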
We also need to create a label map that assigns an id to each class. Examples of how to create label maps can be found here. In my case it looked something like this:
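For four traffic-light classes, the label map is a small pbtxt file along these lines (the class names here are my choice for illustration; note that ids start at 1, since 0 is reserved for the background):

```
item {
  id: 1
  name: 'Green'
}
item {
  id: 2
  name: 'Red'
}
item {
  id: 3
  name: 'Yellow'
}
item {
  id: 4
  name: 'off'
}
```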
So in the end, we need the following things before going to the next step:
- COCO pre-trained network models
- The TFRecord files we created earlier
- The label_map file with our classes
- The image data-set
- The TensorFlow model API
The next steps are pretty straightforward. You need access to a GPU for training; I used an AWS p2.xlarge instance with the udacity-carnd-advanced-deep-learning AMI, which has dependencies like TensorFlow and Anaconda pre-installed. In total I trained 4 different models: two with faster_rcnn (one each for simulator images and real images) and two with ssd_inception.
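Kicking off a run then comes down to pointing the API's training script at your config. A sketch of the invocation, assuming the `models/research` directory of the TensorFlow models repo is on your path and with hypothetical file names:

```shell
python object_detection/train.py \
    --logtostderr \
    --pipeline_config_path=config/faster_rcnn_resnet101_sim.config \
    --train_dir=train_dir/faster_rcnn_sim
```

Once training finishes, the checkpoint in `train_dir` can be exported to a frozen graph for inference.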
The output of the model inference looks something like this:
Detection on the simulator images
Detection on the real images
The detections and classifications were really good with both models, though the ssd_inception-trained model made a few minor errors, like the one in the image below, which the faster_rcnn model classified correctly.
Wrong Classification by SSD Inception model
However, the plus point of the ssd_inception model was speed: it ran almost 3 times faster than the faster_rcnn model on the simulator images and almost 5–6 times faster on the real images.
You can see my code and the results on my GitHub repository. It also contains the link to the data-sets and the annotations.
Good luck with your own models 🙂