Lab for intelligent & safe automobiles, UC San Diego, California

A collaboration between University of California, San Diego, USA and Aalborg University, Denmark

Our collaboration

It all started back in 1995 when Henrik Birk, Jens Sørensen and Thomas Moeslund from Aalborg University visited Mohan Trivedi’s lab in Knoxville, Tennessee. During the stay Mohan moved to UCSD to start a new lab and asked whether the three Danes wanted to come along. Knoxville is a very nice place, but the opportunity to visit California quickly convinced the three Danes to pack their bags. They actually arrived in the lab before Mohan did and the Danes can therefore rightfully say that they founded the lab in San Diego.

After returning from San Diego, Thomas finished his master’s degree and started working at Aalborg University. He and Mohan stayed in contact, which opened the door for the next generations of Aalborg University students who wanted a study or research stay abroad. Below is a list of the traveling students and the work that has emerged from the collaboration, spanning multiple master’s theses as well as Ph.D. work.

Traveling students

Jacob Dueholm
10th semester (Master thesis) - Spring 2016
Miklas S. Kristoffersen
10th semester (Master thesis) - Spring 2016
Mark P. Philipsen
10th semester (Master thesis) - Spring 2015
Morten B. Jensen
10th semester (Master thesis) - Spring 2015
Andreas Møgelmose
10th semester (Master thesis) - Spring 2012 and Ph.D. stay - Spring 2014
Carsten Høilund
10th semester (Master thesis) - Spring 2009
Dennis Hansen
10th semester (Master thesis) - Spring 2007
Poul Duizer
10th semester (Master thesis) - Spring 2007
Christian R. Andersen
10th semester (Master thesis) - Spring 2006
Claus R. Pedersen
10th semester (Master thesis) - Spring 2006
Preben Fihl
10th semester (Master thesis) - Spring 2005
Rasmus Colin
10th semester (Master thesis) - Spring 2005
Jens Pedersen
10th semester (Master thesis) - Spring 2004
Daniel Nørgaard
10th semester (Master thesis) - Spring 2004
Jens S. Sørensen
9th semester - Spring 1996
Henrik Birk
9th semester - Spring 1996
Thomas B. Moeslund
9th semester - Spring 1996

Projects

Surround Vehicle Analysis

Jacob Dueholm & Miklas S. Kristoffersen
Master's Thesis - Spring 2016

Keeping an overview of surrounding vehicles is a nearly impossible task using only human senses. This motivates automatic tracking of surrounding vehicles using cameras, with applications in both passive and active safety. This study presents the development of a novel framework for tracking vehicles in full surround using computer vision techniques. The framework consists of a vehicle detector, a modified tracker optimized for multi-perspective tracking, and an association of tracks in real-world coordinates to achieve consistent trajectories in full surround. A vision-based dataset with more than 4000 annotated vehicles is collected using four GoPro cameras and is used in the evaluation of the detector, tracker, and multi-perspective tracker. A trajectory dataset is collected from 50 sequences using the developed framework, on which a trajectory analysis is made using machine learning to demonstrate its uses in naturalistic driving studies and advanced driver assistance systems.

Computer Vision at Intersections: Explorations in driver assistance systems and data reduction for naturalistic driving studies

Mark P. Philipsen & Morten B. Jensen
Master's Thesis - Spring 2015
This report constitutes a long master thesis in Vision, Graphics, and Interactive Systems. It details the work done during almost two semesters abroad at UC San Diego. The work has been research oriented, so the report is structured in separate parts rather than following a standard linear product development flow. The scope of the research has primarily been traffic light detection, but also includes the development of a vehicle detector at intersections for use in naturalistic driving studies. A comprehensive survey of traffic light recognition (TLR) systems, including both academic and industrial research, has been done. From this it is clear that TLR research lacks challenging public datasets and a standardized evaluation methodology. Thus, the largest public dataset in the world, with around 110,000 annotated traffic lights, has been created and made available. Finally, a comparative analysis of learning-based versus heuristic model-based approaches to traffic light detection is done.

Traffic Sign Detection

Andreas Møgelmose
Ph.D. research stay - Spring 2014

Traffic Sign Detection using Computer Vision

Andreas Møgelmose
Master's Thesis - Spring 2012

This report is a master thesis in Vision, Graphics, and Interactive Systems. It details the work done during two semesters abroad at UC San Diego. The work has been research oriented, so the report is structured as 5 separate chapters instead of a linear product development flow. The work has primarily been on US traffic sign detection, but includes a chapter on pedestrian detection as well. A comprehensive survey of traffic sign detection systems has been made, and it shows a lack of work on US signs and a lack of public databases for them. Thus, a publicly available dataset with nearly 8000 annotated signs has been created. The dataset is unique, not only because it contains US signs, but also because it includes videos. This report also details investigations of using synthetic training data for traffic sign detectors, but concludes that synthetic images are no match for real-world training images. A purely model-based detection system based solely on shapes is also presented as a building block for a full detection system. Finally, a two-stage pedestrian detection system has been developed and documented. The system extends a previous system and produces better detection with fewer false positives.

Free Space computation from Stochastic Occupancy Grids Based on Iconic Kalman Filtered Disparity Maps

Carsten Høilund
Master's Thesis - Spring 2009

The objective of this project is to determine the free space in a scene as viewed by a camera on a vehicle.
Free space is defined as the area where navigation without collision is possible. The starting point for determining this free space is disparity maps, obtained with a stereo camera. The accuracy and precision of the stereo camera are evaluated to determine the distribution of the noise in the measurements, which is necessary for filtering the disparity maps.
The disparity maps are filtered by an iconic Kalman filter, operating on each pixel individually. Applying ego-motion, the previous disparity map is predicted to correspond to the current disparity map. The two, ideally identical, disparity maps are merged by the Kalman filter yielding an optimal estimation of the true state, reducing variance, and increasing the density of the filtered disparity map.
The stochastic occupancy grids are calculated from these disparity maps, providing a top-down view of the scene where the uncertainty of the disparity measurements is taken into account. A pixel from the disparity map can thus affect several cells with varying likelihood.
These occupancy grids are segmented to indicate a maximum depth free of obstacles, enabling the marking of free space in the accompanying intensity image.
The tests show successful marking of free space in the evaluated scenarios, in addition to a significant improvement in disparity map quality.
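
The per-pixel filtering step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name fuse_disparity, the use of None for invalid pixels, and the single shared measurement variance are assumptions made for illustration. It treats each pixel as an independent scalar Kalman update between the ego-motion-predicted disparity and the newly measured one.

```python
def fuse_disparity(pred_d, pred_var, meas_d, meas_var):
    """Fuse a predicted and a measured disparity map pixel by pixel.

    pred_d, pred_var: predicted disparity and variance per pixel (None = no value)
    meas_d: measured disparity per pixel (None = no measurement)
    meas_var: measurement noise variance, assumed equal for all pixels
    """
    fused_d, fused_var = [], []
    for row_pd, row_pv, row_md in zip(pred_d, pred_var, meas_d):
        out_d, out_v = [], []
        for pd, pv, md in zip(row_pd, row_pv, row_md):
            if pd is None and md is None:
                out_d.append(None)          # nothing known about this pixel
                out_v.append(None)
            elif md is None:
                out_d.append(pd)            # keep prediction: density increases
                out_v.append(pv)
            elif pd is None:
                out_d.append(md)            # start a new estimate from the measurement
                out_v.append(meas_var)
            else:
                gain = pv / (pv + meas_var)  # scalar Kalman gain
                out_d.append(pd + gain * (md - pd))
                out_v.append((1.0 - gain) * pv)  # variance always shrinks
        fused_d.append(out_d)
        fused_var.append(out_v)
    return fused_d, fused_var
```

With equal prediction and measurement variances, the fused disparity is the average of the two values and the variance is halved, which is the variance reduction the abstract refers to. Pixels that are valid in only one of the two maps are carried over, which is how the filtered map becomes denser.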

Multi-View Video Surveillance of Outdoor Traffic

Dennis Hansen & Poul Duizer
Master's Thesis - Spring 2007

One of the goals for performing traffic monitoring is to avoid traffic accidents. It would not be feasible to use a human operator to monitor a traffic scene because accidents are rare events. There is a growing interest in automating this process, and visual surveillance systems are paid much attention due to their non-intrusive nature.
This thesis addresses the tracking issue, which is a cornerstone of all visual surveillance systems. The overall goal is to use the tracking information to detect potential traffic accidents before they occur. A requirement is thus that the system must be able to track both vehicles and humans reliably. There is only a limited amount of work reported on tracking of both vehicles and humans.
The developed system is a multi-view tracking system based on the planar homography. Foreground segmentation for each view is performed using the codebook method, which is capable of adapting to illumination changes. The tracking of objects is performed in each view using bounding box overlap, and occlusion situations are resolved by probabilistic appearance models. The following correspondence of tracks between views is carried out by combining and modifying prominent methods for humans and vehicles. In the human case, the principal axis method is extended to handle groups. In the vehicle case, the footage region is applied, and special attention has been put on solving occlusion situations.
Due to the use of multiple views and the correspondence of tracks, it is possible to calculate an accurate view-invariant representation of the objects. This representation is suitable for performing event recognition and assessing the danger level of the situation. The goal is to detect an accident before it occurs, and an alarm is raised as a first step in preventing the accident. The developed system is tested on several hours of unconstrained data at different times of the day, under different illuminations and with different camera configurations. The system gives a solid foundation for tracking objects, and demonstrations using analysis based on the view-invariant representation of objects show that the system is able to detect dangerous situations, e.g. near collisions between vehicles and humans.
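
As a rough illustration of the per-view tracking by bounding box overlap mentioned above, the sketch below associates existing tracks with new detections. The overlap measure (intersection over union), the greedy matching order, the threshold value, and the names iou and match_tracks are all assumptions for this example; the thesis may use a different overlap criterion.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_tracks(tracks, detections, min_iou=0.3):
    """Greedily pair track boxes with detection boxes by descending overlap.

    Returns a dict mapping track index to detection index. Tracks or
    detections without a sufficiently overlapping partner stay unmatched.
    """
    candidates = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, ti, di in candidates:
        if score < min_iou:
            break                       # remaining candidates overlap even less
        if ti in matches or di in used:
            continue                    # each box is used at most once
        matches[ti] = di
        used.add(di)
    return matches
```

Unmatched detections would typically start new tracks, and tracks that compete for the same detection point to an occlusion situation, which the system above resolves with probabilistic appearance models rather than with overlap alone.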

Laser-based People Tracking

Christian R. Andersen & Claus R. Pedersen
Master's Thesis - Spring 2006

In recent years, the use of laser range scanners for tracking objects has increased in popularity. Laser scanners have shown their capabilities especially in robot navigation and collision avoidance systems for automobiles.
This thesis considers some of the fundamental challenges in tracking people with laser range systems. This includes segmentation, an underlying tracking, and a tracking at a higher level using multiple laser scanners.
To segment people, two different segmentation algorithms were combined for optimal performance, and the output was tracked using flow optimization techniques. This information was combined into a three-dimensional model for tracking each person at a higher level with a particle filter. Tracking with the three-dimensional model could not be established, which is considered to be due to a malfunction in the implementation of the particle filter. Although testing of the whole system has not been possible, the underlying tracking and the initialization of the model have been evaluated. The underlying tracking could robustly separate multiple persons, whereas the estimated pose and especially the gait of persons were less accurate.
The poor performance of the gait estimation is considered to be related to the noise of the motion model and the input to this model. Also, the knee is not modeled in the motion model of the person, which is regarded as an important factor in the poor performance.

Tracking of interacting people and their body parts for outdoor surveillance

Preben Fihl & Rasmus Colin
Master's Thesis - Spring 2005

All over the world, the interest in surveillance of people has increased in recent years due to events such as large-scale terror attacks. In the wake of this, visual surveillance systems and computer systems that can do automatic surveillance have received much attention from governments, industry, and the academic community. The topic of this thesis has been to deal with some of the fundamental challenges in visual surveillance, namely segmentation in dynamic outdoor environments and detailed tracking of people as they interact and occlude each other.
For the segmentation, a multi-modal codebook background subtraction algorithm has been used as a basis and extended to adapt to illumination changes, as well as to add extra layers of background if objects are placed in the scene. For the tracking of people, an algorithm is used that maintains a hierarchical body model by tracking blobs of similarly colored and spatially connected pixels. To support this modeling, a silhouette-based tracking is carried out to make a coarse estimate of the position of people.
Testing of the background subtraction algorithm on a ten-hour sequence in an unconstrained outdoor environment has shown that it is able to adjust to the changes during a day and still produce a good segmentation of the foreground. The tests of the tracking have shown that it is able to track people as they walk alone or under mild occlusions. Under moderate and heavy occlusions it did not provide good results. The major problem has been identified as being related to the color appearance of people, since in the unconstrained outdoor environment it is difficult to get a good distinction between the different colors.

A Two-Level Head Pose Estimation Framework using Majority Voting of Gabor Wavelets and Bunch Graph Analysis

Daniel Nørgaard & Jens M. Pedersen
Master's Thesis - Spring 2004

In the future, today's active command-based communication with computers will be replaced with a more natural way of communicating. Computers will learn to interpret mood and intentions by analyzing faces in much the same way people do. That type of system requires advanced computer vision based sensing and interpretation techniques, which are still to be developed. Today, computer vision based systems are mainly being used in industry, but other more intelligent applications are rapidly emerging.
The topic of this thesis makes up a cornerstone of such systems, both today's and tomorrow's, namely determining the focus of attention of a person or, more specifically, determining the orientation of the head. Using the human visual system as a role model, a head pose estimation system is designed based on a coarse-to-fine conceptual model.
First, the face is segmented using a cascade of classifiers optimized through adaptive boosting. An appearance-based method is used for the initial estimate, investigating both linear and non-linear feature space transformations. The estimate is made using a nearest prototype classifier based on generic pose templates and a majority voting scheme. A refinement of the estimate is made using a feature-based method by optimizing a matching cost function for fitting pose-specific face bunch graphs. For both methods, the features are formed by convolution with a filter bank generated from a Gabor wavelet. The purpose of the system is to maximize the pose estimation accuracy.

Tracking Multiple Objects using Multiple Cues

Henrik Birk, Thomas B. Moeslund & Jens S. Sørensen
9th semester project, 1995-1996

In many computer vision systems there is a need for determining the motion of one or more objects in a sequence of 2D images using tracking. A traditional approach to the tracking problem in many systems has been to use a method based only on a single tracking cue.
This project investigates the performance of a tracking system when multiple tracking cues interact with the purpose of tracking multiple objects. The two cues that are chosen and implemented are token matching and a correlation-based template matching. The cues are chosen with the intention that the disadvantages of one cue are compensated by the advantages of the other, so that in situations where the primary cue fails, the secondary cue can improve the performance.
The token matching is based on matching measured feature vectors with the predicted feature vectors. If that fails, a template matching method based on the normalized cross-correlation coefficient takes over. A Kalman filter was chosen for the prediction of the feature vectors.
Results: The system has been implemented in ANSI C, and tested in situations where an interaction between the two implemented cues was needed. The system was able to track both single and multiple objects in the designed test situations with interactions between the two cues.
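
The secondary cue, template matching with the normalized cross-correlation coefficient, can be sketched as follows. The original system was written in ANSI C; this Python version is only an illustration, and the function names ncc and best_match, the exhaustive search over the whole image, and the list-of-lists image format are assumptions.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation coefficient of two equally sized 2D arrays."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) *
                    sum((b - mean_t) ** 2 for b in t))
    return num / den if den > 0 else 0.0   # flat patches carry no information


def best_match(image, template):
    """Slide the template over the image and return (score, (row, col)) of the best fit."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None      # coefficient lies in [-1, 1]
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_score, best_pos
```

Because the coefficient subtracts the mean and divides by the spread of both patches, it is insensitive to uniform brightness and contrast changes. In a tracker, the search would normally be restricted to a window around the predicted object position rather than run over the whole image.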

For more information, contact

Professor Thomas B. Moeslund
Aalborg University, Denmark
tbm@create.aau.dk