Last semester at MIT I took the class “6.111 – Introductory Digital Systems Lab”. This class revolves almost entirely around using FPGAs to create digital systems, including a car alarm and a sound recorder/player. After all of the labs were complete, we had to build a final project, which took about a third of the semester. (A full report can be found at the end.)

For my final project I chose, with my team partner Patrick Yang, to build a Hardware-Based Image Perspective Correction System. This basically meant a system that allowed you to take an image of a document on a table and then correct the perspective of the image such that it appeared you were looking at the document straight on.

An example of this can be seen in the following image:

[Image: a playing card shown before and after perspective correction]
An example of perspective correction using the OpenCV library. Source.

The main difference between the above example and our system is that our system is entirely hardware based – it uses an FPGA to do all the processing – while this example is done in software on a computer with a CPU.

The idea for the system was this: When prompted by the user, the system would use an NTSC camera to capture an image of a document on the table. The system would then automatically detect the corners of the document in the image and highlight them with a cross. If the user was satisfied with the proposed corners, she could progress to the image correction stage, otherwise she could manually correct the corners before continuing. Finally, the system would take the image and corner locations and transform the image to produce an image on screen which would show the document from a direct perspective.

The system we produced came very close to meeting this specification. Unfortunately we had some difficulties with the automatic corner detection, which I will expand upon after the video.

We successfully interfaced with the camera to capture an image. The manual corner correction and image transformation were also successfully implemented, as demonstrated in the video. Unfortunately, we ran into difficulties with the automatic corner detection.

[Image: me, captured by the camera and shown on the display]
We managed to interface with the camera and display early on

The automatic corner detection consisted of four main stages: a Gaussian blur, Canny edge detection, a Hough transform, and lines-to-corners.

The Gaussian blur stage applied a 5×5 Gaussian kernel with a standard deviation of 2 to the luminosity component of each pixel. This produced quite a strong blur, but it was necessary because the image contained a high level of noise which had to be removed for effective edge detection. This stage also discarded the chrominance components of the image, which were not needed by the later processing stages.
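To illustrate what this stage computes, here is a software sketch in Python (our actual implementation was in hardware, and the clamped-edge handling below is an illustrative assumption, not necessarily what we did at the image borders):

```python
import math

def gaussian_kernel(size=5, sigma=2.0):
    """Build a normalised size x size Gaussian kernel."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def blur(image, kernel):
    """Convolve a 2-D list of luma values with the kernel (edges clamped)."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    py = min(max(y + ky, 0), h - 1)
                    px = min(max(x + kx, 0), w - 1)
                    acc += image[py][px] * kernel[ky + half][kx + half]
            out[y][x] = acc
    return out
```

Because the kernel is normalised, a uniform region of the image passes through unchanged; only areas with detail (and noise) are smoothed.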

The Canny edge detection stage applied the Sobel operator to the blurred image. This allowed the gradient of each pixel to be calculated and thresholded: pixels with a high gradient magnitude were marked as edges, while pixels with a low gradient were discarded. By selecting different thresholds, edges of different strengths could be detected.

The Hough transform was where we had the most issues. This algorithm is used to resolve the most prominent lines in the image by finding which lines pass through the greatest number of “edge pixels”. The problem arose because the detected edges were very thick. As you can see in the right-hand image above, the edges around my hair and face are not one pixel wide. As a result, the Hough transform would select four nearly parallel lines clustered around a single edge of the document, instead of the four lines representing the document’s four edges. This could’ve been fixed by running a line-thinning algorithm on the image before the Hough transform, however we only discovered this issue approximately 1 hour before the lab closed on the night of the deadline.
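For reference, the voting scheme behind the transform looks like this in software (a sketch only; the bin counts here are arbitrary choices, not the parameters we used on the FPGA):

```python
import math

def hough_accumulate(edges, n_theta=180, n_rho=100):
    """Vote in (theta, rho) space: each edge pixel votes for every line
    through it, parameterised as rho = x*cos(theta) + y*sin(theta)."""
    h, w = len(edges), len(edges[0])
    rho_max = math.hypot(h, w)                 # largest possible |rho|
    acc = [[0] * n_rho for _ in range(n_theta)]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            for t in range(n_theta):
                theta = t * math.pi / n_theta
                rho = x * math.cos(theta) + y * math.sin(theta)
                r = int(round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)))
                acc[t][r] += 1
    return acc

def strongest_line(acc):
    """Return the (theta_index, rho_index) bin with the most votes."""
    best = max((v, t, r) for t, row in enumerate(acc)
                         for r, v in enumerate(row))
    return best[1], best[2]
```

The sketch also shows why thick edges caused trouble: a band of edge pixels several pixels wide produces several adjacent (theta, rho) bins with near-maximal votes, so naively taking the top four peaks returns four near-parallel lines along the same edge.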

The lines-to-corners stage took the equations of the four lines found by the Hough transform and computed their intersections, which represented the four corners of the document. While this stage was functional, it never had valid data to operate on, as the Hough transform always selected four parallel lines along one edge of the document.
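The intersection itself is a small 2×2 linear solve: in (rho, theta) form each line satisfies x·cos(theta) + y·sin(theta) = rho, so two lines meet where both equations hold. A Python sketch of the maths (not our hardware implementation):

```python
import math

def intersect(rho1, theta1, rho2, theta2):
    """Solve the 2x2 system
        x*cos(t1) + y*sin(t1) = rho1
        x*cos(t2) + y*sin(t2) = rho2
    by Cramer's rule; returns None for (near-)parallel lines."""
    a1, b1 = math.cos(theta1), math.sin(theta1)
    a2, b2 = math.cos(theta2), math.sin(theta2)
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        return None            # parallel lines never meet
    x = (rho1 * b2 - rho2 * b1) / det
    y = (a1 * rho2 - a2 * rho1) / det
    return x, y
```

The parallel-lines branch is exactly the failure mode we hit: with four near-parallel lines from the Hough stage, the determinant is close to zero and no sensible corners come out.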

While this is frustrating, we are generally very pleased with the result of our project given the time limitations, and believe that given another week we could’ve completed the specification.

Full details of our design can be found in our final report.
