ENGN 2560 - Computer Vision

Projects
01 - Building a Security Surveillance System with Pan-Tilt-Zoom (PTZ) Cameras

In this project, a student will be responsible for putting together a surveillance system that detects motion in a scene and then zooms in and focuses on the motion with a secondary camera to obtain a better-quality, higher-resolution image of the target. A proof of concept exists with stationary cameras and a basic background modeling method based on a mixture-of-Gaussians model. The student will develop this into a more professional system by replacing the stationary cameras with PTZ cameras and/or improving the motion detection stage.
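To give a feel for the motion detection stage, the background model can be sketched with a single Gaussian per pixel, a simplified stand-in for the mixture-of-Gaussians model the proof of concept uses (plain NumPy; all parameter values here are illustrative, not taken from the existing system):

```python
import numpy as np

class RunningGaussianBackground:
    """Single-Gaussian-per-pixel background model: a simplified stand-in
    for the mixture-of-Gaussians model used in the proof of concept."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=30.0 ** 2):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, init_var)
        self.alpha = alpha   # learning rate for the background update
        self.k = k           # foreground threshold, in standard deviations

    def apply(self, frame):
        """Return a boolean foreground mask and update the model."""
        frame = frame.astype(np.float64)
        dist2 = (frame - self.mean) ** 2
        foreground = dist2 > (self.k ** 2) * self.var
        # update only background pixels so moving objects are not absorbed
        bg = ~foreground
        self.mean[bg] += self.alpha * (frame[bg] - self.mean[bg])
        self.var[bg] += self.alpha * (dist2[bg] - self.var[bg])
        return foreground
```

A motion event would then be any sufficiently large connected region in the returned mask, which is what the PTZ camera would be steered toward.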

Reference documents

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation

 

02 - PatchMatch-based Content Completion

PatchMatch is a fast algorithm for computing dense approximate nearest-neighbor correspondences between patches of two image regions. It was first proposed for interactive image editing operations such as image retargeting, image completion and image reshuffling. An approach that builds on the original PatchMatch algorithm performs content completion in stereo RGB-D image pairs and won the best paper award at 3DimPVT 2012.

Source code exists for both the original and generalized PatchMatch algorithms. A student can use these to implement the aforementioned paper, then find online or capture RGB-D stereo pairs and test the implementation. If partial or complete source code for the paper's algorithm can be located, the focus can shift from implementation to a comparative evaluation against other content completion methods.
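For reference, the core of the original PatchMatch algorithm (random initialization of the nearest-neighbor field, then alternating propagation and random search) can be sketched as follows; this is a minimal grayscale NumPy version for illustration, not the optimized reference implementation linked below:

```python
import numpy as np

def patch_dist(A, B, ax, ay, bx, by, p):
    """SSD between the p x p patches whose top-left corners are
    (ay, ax) in A and (by, bx) in B."""
    d = A[ay:ay + p, ax:ax + p] - B[by:by + p, bx:bx + p]
    return float(np.sum(d * d))

def patchmatch(A, B, p=3, iters=4, rng=None):
    """Approximate nearest-neighbor field (NNF) from patches of A to
    patches of B, following the PatchMatch iteration structure."""
    rng = np.random.default_rng(rng)
    ah, aw = A.shape[0] - p + 1, A.shape[1] - p + 1  # valid corners in A
    bh, bw = B.shape[0] - p + 1, B.shape[1] - p + 1  # valid corners in B
    nnf = np.stack([rng.integers(0, bh, (ah, aw)),            # matched row
                    rng.integers(0, bw, (ah, aw))], axis=-1)  # matched col
    cost = np.empty((ah, aw))
    for y in range(ah):
        for x in range(aw):
            cost[y, x] = patch_dist(A, B, x, y, nnf[y, x, 1], nnf[y, x, 0], p)
    for it in range(iters):
        step = 1 if it % 2 == 0 else -1              # alternate scan order
        ys = range(ah) if step == 1 else range(ah - 1, -1, -1)
        xs = range(aw) if step == 1 else range(aw - 1, -1, -1)
        for y in ys:
            for x in xs:
                # propagation: shift the already-visited neighbors' matches
                for dy, dx in ((0, -step), (-step, 0)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < ah and 0 <= nx < aw:
                        cy, cx = nnf[ny, nx, 0] - dy, nnf[ny, nx, 1] - dx
                        if 0 <= cy < bh and 0 <= cx < bw:
                            d = patch_dist(A, B, x, y, cx, cy, p)
                            if d < cost[y, x]:
                                nnf[y, x], cost[y, x] = (cy, cx), d
                # random search around the current match at shrinking radii
                r = max(bh, bw)
                while r >= 1:
                    cy = nnf[y, x, 0] + rng.integers(-r, r + 1)
                    cx = nnf[y, x, 1] + rng.integers(-r, r + 1)
                    if 0 <= cy < bh and 0 <= cx < bw:
                        d = patch_dist(A, B, x, y, cx, cy, p)
                        if d < cost[y, x]:
                            nnf[y, x], cost[y, x] = (cy, cx), d
                    r //= 2
    return nnf, cost
```

The content completion application then repeatedly fills the missing region by voting with the patches that the NNF maps onto it.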

Reference papers

PatchMatch Algorithm Webpage

Generalized PatchMatch Algorithm Webpage

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation

 

03 - Evaluation of 3D Reconstruction Algorithms Using an Image Sequence with Dense Point Correspondence Ground Truth

Quantitative evaluation of 3D reconstruction algorithms is a challenge for multiple reasons. Existing approaches typically use laser scanners to capture a ground truth 3D reconstruction, against which query reconstructions can be registered to determine which points are captured and which are missing. The fundamental problem with this approach is the ambiguity of point-to-point correspondences between the two reconstructions; in other words, the reliability of the evaluation depends heavily on the accuracy of the preceding registration step.

The goal of this project is to test a newly formulated approach for evaluating 3D reconstructions using dense point correspondence data on the image sequence. A student will be given this data, the details of the evaluation procedure, and source code/binaries for as many 3D reconstruction algorithms as possible. The goal, then, is to evaluate all these algorithms and comment on the reliability of the evaluation scheme, making modifications to it whenever necessary.
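One way such an evaluation can be structured is to reproject each reconstructed point into every frame and check it against the ground-truth correspondence there. The sketch below assumes a hypothetical data layout (one reconstructed point per ground-truth track, known projection matrices); the actual evaluation data and procedure handed to the student may differ:

```python
import numpy as np

def evaluate_reconstruction(points3d, cameras, gt_tracks, tol=2.0):
    """Score a reconstruction against dense correspondence ground truth.

    points3d  : (N, 3) reconstructed points, one per ground-truth track
    cameras   : list of 3x4 projection matrices, one per frame
    gt_tracks : (N, F, 2) ground-truth image position of each track per frame
    tol       : reprojection-error threshold in pixels

    Returns the fraction of tracks whose point reprojects within `tol`
    pixels of its ground-truth correspondence in every frame.
    (Hypothetical data layout, for illustration only.)
    """
    N = points3d.shape[0]
    homog = np.hstack([points3d, np.ones((N, 1))])   # homogeneous (N, 4)
    ok = np.ones(N, dtype=bool)
    for f, P in enumerate(cameras):
        proj = homog @ P.T                           # project into frame f
        px = proj[:, :2] / proj[:, 2:3]              # pixel coordinates
        err = np.linalg.norm(px - gt_tracks[:, f, :], axis=1)
        ok &= err <= tol
    return float(ok.mean())
```

Because the correspondences are attached to the images rather than to a scanned model, no 3D-to-3D registration step is needed, which is the point of the proposed scheme.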

Reference papers

 

04 - Photometric Stereo Using Spherical Harmonic Representation of Lighting

The goal of this project is to implement the photometric stereo approach developed by Basri et al., which builds on their general illumination model based on spherical harmonics. Any function defined on a sphere can be analyzed in a frequency domain, analogously to Fourier analysis: any such function can be written as a linear combination of basis functions, the spherical harmonics. Building on this, the paper argues that general illumination can be modeled as a function on a sphere, with the illuminated scene at the center of that sphere. As long as the surfaces in the scene are Lambertian, only nine spherical harmonic coefficients are needed to model a general illumination with over 99% accuracy.

This model can be used in many computer vision applications, such as object detection and recognition. Here we are primarily interested in photometric stereo under general, unknown illumination. A student will be given all the relevant source code available and will be asked to put together a photometric stereo system using this harmonic representation of lighting.
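For orientation, the nine-term basis consists of the low-order spherical harmonics evaluated at the surface normals. The sketch below shows the structure of that basis and a least-squares lighting fit for known normals; normalization constants are omitted here, so only the functional form (not the exact scaling) matches the Basri-Jacobs basis:

```python
import numpy as np

def harmonic_basis(normals):
    """First nine spherical-harmonic basis values for a Lambertian surface,
    evaluated at unit normals (N, 3). Normalization constants are omitted;
    see Basri and Jacobs for the exactly scaled basis."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.ones_like(x),          # order 0: constant term
        x, y, z,                  # order 1: linear terms
        x * y, x * z, y * z,      # order 2: quadratic terms
        x * x - y * y,
        3.0 * z * z - 1.0,
    ], axis=1)

def fit_lighting(normals, intensities):
    """Least-squares estimate of the nine lighting coefficients given
    known normals and observed intensities."""
    H = harmonic_basis(normals)
    coeffs, *_ = np.linalg.lstsq(H, intensities, rcond=None)
    return coeffs
```

Photometric stereo inverts this relationship: given intensities under several unknown lightings, both the per-pixel harmonic basis (hence the normals) and the lighting coefficients are recovered, typically by a factorization of the intensity matrix.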

Reference papers

 

05 - Visual Odometry

Visual odometry is the process of estimating the trajectory of a moving object or person by analyzing the images taken by one or more onboard cameras. The intuition is that motion induces structural changes in the sequence of images captured by moving cameras, and that this cue is especially strong under continuous camera motion. The 3D structure of the scenes the cameras pass through can also be inferred, but this is a secondary goal. Camera motion is typically estimated by detecting and matching features across input frames and then using these constraints to solve for the camera motion between two frames.

The goal of this project is to put together a visual odometry system that estimates a motion trajectory with reasonable accuracy. The student is expected to make informed choices regarding each stage in the pipeline, i.e., the image matching algorithm, the number of cameras to be mounted, the camera calibration method, and so on.

Reference papers

Visual odometry tutorial

LIBVISO2: Cross-platform, C++ library for visual odometry

Photoconsistency-based visual odometry library

An implementation of monocular odometry using a regular webcam


06 - Inverted Index Compression for Scalable Image Matching

Image matching is a crucial problem in many computer vision applications, such as image search, image retrieval and visual odometry. To be useful in a real-life application, any algorithm that tackles this problem must scale to millions of images, so the emphasis is often on scalability. The goal of this project is to implement and evaluate the DCC 2010 paper by Chen et al., which focuses on compressing the inverted index of a vocabulary tree in order to reduce memory usage in large-scale image matching tasks. Aside from implementation and evaluation, an important component of the project will be obtaining a ground truth dataset that is suitable for testing the scalability of an image matching algorithm.
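To illustrate the kind of compression involved: an inverted index maps each visual word to a sorted list of image IDs, and a standard trick is to store the gaps between consecutive IDs with a variable-byte code, since gaps are small and compress well. A minimal sketch in plain Python; the paper's actual scheme differs in its details:

```python
def varbyte_encode(n):
    """Variable-byte code for one non-negative integer: 7 payload bits
    per byte, high bit set on the final byte of the integer."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128                   # mark the terminating byte
    return bytes(out)

def varbyte_decode(data):
    """Decode a concatenation of variable-byte codes back to integers."""
    nums, n = [], 0
    for b in data:
        if b < 128:
            n = 128 * n + b          # continuation byte
        else:
            nums.append(128 * n + (b - 128))
            n = 0
    return nums

def compress_postings(image_ids):
    """Delta-encode a posting list (sorted image IDs) and pack the gaps."""
    ids = sorted(image_ids)
    gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return b"".join(varbyte_encode(g) for g in gaps)

def decompress_postings(data):
    """Invert compress_postings: decode the gaps, then take running sums."""
    ids, total = [], 0
    for g in varbyte_decode(data):
        total += g
        ids.append(total)
    return ids
```

For example, the posting list [3, 7, 42, 1000, 1001] becomes the gaps [3, 4, 35, 958, 1], which pack into six bytes instead of twenty with 32-bit integers.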

Reference paper

The Stanford Mobile Visual Search Data Set (reference paper)

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation

Link for BlindFind Git


07 - "Improved" SIFT Descriptor

Over the years, SIFT has proved to be a very robust keypoint detector/descriptor and has been useful in many computer vision applications such as object recognition, camera calibration, 3D modeling, video tracking, etc. This project aims to explore potential improvements to the descriptor by considering alternative uses of image intensities and intensity gradients. More concretely, SIFT describes keypoints using weighted histograms of ∇I / |∇I| computed around each detected keypoint, and the goal here is to experiment with alternative ways of constructing such an image gradient-based histogram while keeping the rest of the SIFT machinery intact.

The starting point is a past class project that explored replacing ∇I / |∇I| with |∇I| / I in an attempt to construct a descriptor that takes into account image intensities as well as the gradient. This new descriptor was tested for robustness under rotation, viewpoint changes and illumination changes. It was also used in the context of image classification, and the results were shown to be comparable to the original SIFT. A student is expected to take over this project and i) experiment with different histograms and suggest improvements to the new descriptor, and ii) expand the evaluation of the new descriptor to scale changes, image blur, image compression, etc. Well-known keypoint evaluation methods have been included among the reference papers.
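The object being experimented with in both variants is a magnitude-weighted gradient orientation histogram, which can be sketched as follows (NumPy; SIFT's Gaussian weighting, spatial subdivision into subregions, and trilinear interpolation are omitted for clarity):

```python
import numpy as np

def orientation_histogram(patch, bins=8):
    """Gradient-magnitude-weighted orientation histogram of an image
    patch: the basic building block of the SIFT descriptor, simplified
    (no Gaussian weighting, spatial subdivision, or interpolation)."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)                    # gradient magnitude |grad I|
    ang = np.arctan2(gy, gx)                  # orientation in (-pi, pi]
    idx = ((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), mag.ravel())
    return hist / (hist.sum() + 1e-12)        # normalize the histogram
```

Experimenting with the descriptor then amounts to swapping the weights (`mag` above) or the binned quantity for alternatives such as |∇I| / I, while the surrounding SIFT machinery stays unchanged.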

Reference papers

Reference documents

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation

 

08 - Chains Model for Detecting Object Parts by Their Context

A project consisting of the implementation of the named paper, followed by an evaluation of the results. The student is encouraged to try to locate existing source code and/or binaries for this project. If the student ends up implementing a substantial portion of the pipeline, a small-scale qualitative evaluation will be sufficient. If existing source code is found, the student is required to perform a more thorough quantitative evaluation involving multiple datasets, comparison against competing methods, ROC/PR curves, etc.
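For the quantitative-evaluation option, an ROC curve can be computed directly from detection scores and ground-truth labels; a small illustrative NumPy helper (real evaluations would also need the dataset-specific matching criterion that decides which detections count as true positives):

```python
import numpy as np

def roc_curve_points(scores, labels):
    """ROC curve points (FPR, TPR after each detection, ranked by score)
    for binary labels: scores are detector confidences, labels are 0/1."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)                 # most confident first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                      # true positives so far
    fp = np.cumsum(1 - labels)                  # false positives so far
    tpr = tp / max(labels.sum(), 1)
    fpr = fp / max((1 - labels).sum(), 1)
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
```

A perfect ranking (all positives scored above all negatives) gives an AUC of 1.0, and an inverted ranking gives 0.0; precision-recall curves follow the same cumulative-count pattern.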

Reference papers

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation

 

09 - Using Linking Features in Learning Non-Parametric Part Models

A project consisting of the implementation of the named paper, followed by an evaluation of the results. The student is encouraged to try to locate existing source code and/or binaries for this project. If the student ends up implementing a substantial portion of the pipeline, a small-scale qualitative evaluation will be sufficient. If existing source code is found, the student is required to perform a more thorough quantitative evaluation involving multiple datasets, comparison against competing methods, ROC/PR curves, etc.

Reference papers

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation

 

10 - Shape Matching and Classification Using Height Functions

A project consisting of the implementation of the named paper, followed by an evaluation of the results. The student is encouraged to try to locate existing source code and/or binaries for this project. If the student ends up implementing a substantial portion of the pipeline, a small-scale qualitative evaluation will be sufficient. If existing source code is found, the student is required to perform a more thorough quantitative evaluation involving multiple datasets, comparison against competing methods, ROC/PR curves, etc.

Reference papers

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation

 

11 - Learning to Match Images in Large Scale Collections

A project consisting of the implementation of the named paper, followed by an evaluation of the results. The student is encouraged to try to locate existing source code and/or binaries for this project. If the student ends up implementing a substantial portion of the pipeline, a small-scale qualitative evaluation will be sufficient. If existing source code is found, the student is required to perform a more thorough quantitative evaluation involving multiple datasets, comparison against competing methods, ROC/PR curves, etc.

Reference papers

Project source code

Initial Presentation

Mid-Project Presentation

Final Presentation
