The goal is to automate the grading of scanned test forms as part of a cheat-proof test-taking system with a unique test booklet for every student. Given a scan of an answer sheet, the program should identify the answers the student marked as accurately and robustly as possible and compare them against a printed code that encodes the correct answers in a format students cannot easily decipher.
- Hough transform -- plays a major role in all three Python scripts
- Canny edge detection -- built on the Sobel operator for gradient-based edge detection
- Convolution operation
- Sharpening and blurring of the image
- Inject.py
- Extract.py
- Grade.py
- Convolution_preprocessing.py
- Edge_detection.py
The idea for inject.py was to print a barcode-like mark that is unique to each answer key and would be very difficult for a human to decipher, especially under a time limit. I decided to use relative bar lengths to encode the answers, with the added complexity of randomizing their order on the page. I randomized the order of all 85 questions and saved the key in this script, as well as in Extract.py, so the answers can be read back into the proper order. The bar lengths were a bit tricky to settle on, since I wanted to make sure that, even with noise and poorly scanned test forms, there would be enough separation between the possible lengths to tell which length belongs to which answer.
Location of barcode: the key to implementing the bar lengths in a robust way was to print a thick-bordered box around the area where the answers are printed. By using the box I could make the bar heights and widths relative to the box dimensions. This came in handy when we needed to move the box up slightly to assist with grading; we were able to do so just by updating one variable, without affecting performance.
Barcode lengths: the bar length percentages were chosen so that all 5 possible answers to a question can be correct and the bar still takes up only 75% of the box height. This helps keep the border box easy to find later, once the form is scanned in. The width of each bar was determined by dividing the available width of the box by 2x the number of questions + 1, which leaves room for every bar and the spaces between them.
The order in which multiple correct answers for one question are stacked within a single vertical bar was also randomized, to make it harder for a human to crack the code. Overall this approach works very well at printing the correct answers, and a student would need all of the secret variables (the random question order and the per-answer bar-length percentages) to decode it quickly.
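A minimal sketch of this printing scheme is shown below. The function name, the exact per-answer height percentages, the gap size, and the seed are illustrative assumptions, not the project's actual values; only the overall layout (bar width = box width / (2 x questions + 1), randomized question order, randomized segment order within a bar, and an all-correct bar staying under 75% of the box height) follows the description above.

```python
import random

from PIL import Image, ImageDraw

N_QUESTIONS = 85
# Illustrative per-answer segment heights (fractions of the box height); the real
# percentages were tuned so that five segments plus gaps stay within 75% of the box.
ANSWER_HEIGHTS = {"A": 0.05, "B": 0.08, "C": 0.11, "D": 0.14, "E": 0.17}
GAP = 0.03  # vertical gap between stacked segments


def draw_barcode(image, answers, box, seed=42):
    """Draw one bar column per question inside a thick-bordered box.

    `answers` maps question number -> string of correct letters (e.g. "AC").
    `box` is (left, top, right, bottom) in pixels. The shuffled question order
    and the height table act as the shared secrets between inject.py and extract.py.
    """
    draw = ImageDraw.Draw(image)
    left, top, right, bottom = box
    box_w, box_h = right - left, bottom - top

    # Column width: the box width split into 85 bars plus the gaps between them.
    bar_w = box_w // (2 * N_QUESTIONS + 1)

    rng = random.Random(seed)
    order = list(range(1, N_QUESTIONS + 1))
    rng.shuffle(order)                                # secret question order

    draw.rectangle(box, outline="black", width=8)     # thick border, used later to locate the box
    for slot, question in enumerate(order):
        letters = list(answers.get(question, ""))
        rng.shuffle(letters)                          # randomize segment order within the column
        x = left + bar_w * (2 * slot + 1)             # leave a blank column before every bar
        y = bottom - 10                               # start just above the bottom border
        for letter in letters:
            seg_h = int(ANSWER_HEIGHTS[letter] * box_h)
            draw.rectangle([x, y - seg_h, x + bar_w, y], fill="black")
            y -= seg_h + int(GAP * box_h)
    return image


# Example: form = Image.open("form.jpg").convert("L")
#          draw_barcode(form, {1: "A", 2: "BD"}, (100, 50, 1600, 350))
```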
The process for reading back the barcode generated by inject.py depends heavily on straightening the image and correctly locating the box that contains the barcode; after that the bars can be read in and converted to answers by the logic in the code.
Straightening: we decided to use the Hough transform to straighten the form image. I divided the 0-180 degree range into 5 increments per degree; this resolution can easily be increased or decreased, but 5 provided a good balance between accuracy and efficiency. I converted each of these units into radians and built my Hough space. For the rho axis I used the Pythagorean theorem to get the length of the image diagonal and allotted twice that number (since rho can be negative), which had to be accounted for later when converting back. I added every pixel over 100 in the converted grayscale image to the Hough space using the equation rho = i*cos(theta) + j*sin(theta), where i is the horizontal and j the vertical pixel location. From the Hough space I took the top 6 lines, which testing showed would be strong lines close to 90 or 180 degrees, and averaged their deviation from those straightened angles to find how much to rotate the image. This works very well with the chosen parameters and straightened the image enough for the barcode to be read correctly. We were able to speed this up by only accumulating angles between 80-100 and 170-190 degrees (equivalently, 0-10 and 170-180).
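As a rough illustration of this straightening pass, the sketch below builds the restricted Hough accumulator and averages the deviation of the strongest lines from 0/90/180 degrees. The function name and the peak handling are assumptions; the 5 bins per degree, the 100 intensity threshold, the doubled-diagonal rho range, and the top-6-lines rule come from the description above.

```python
import numpy as np

def estimate_skew(gray, bins_per_degree=5, intensity_thresh=100, n_lines=6):
    """Estimate how far the scanned form is rotated, using a Hough accumulator.

    `gray` is a 2-D uint8 array. This is a sketch of the approach described
    above; variable names and the exact peak handling are illustrative.
    """
    h, w = gray.shape
    diag = int(np.ceil(np.hypot(h, w)))

    # Only accumulate angles near 90 and near 0/180 degrees (the speed-up noted above).
    degs = np.concatenate([
        np.arange(80, 100, 1 / bins_per_degree),
        np.arange(0, 10, 1 / bins_per_degree),
        np.arange(170, 180, 1 / bins_per_degree),
    ])
    thetas = np.deg2rad(degs)

    # rho = i*cos(theta) + j*sin(theta) can be negative, so the rho axis spans twice the diagonal.
    acc = np.zeros((2 * diag, len(thetas)), dtype=np.int32)
    js, is_ = np.nonzero(gray > intensity_thresh)          # voting pixels (j = row, i = column)
    for t, theta in enumerate(thetas):
        rho = np.round(is_ * np.cos(theta) + js * np.sin(theta)).astype(int) + diag
        np.add.at(acc[:, t], rho, 1)

    # Average how far the strongest lines deviate from a perfect 0/90/180 degrees.
    strongest = np.argsort(acc, axis=None)[-n_lines:]
    _, theta_idx = np.unravel_index(strongest, acc.shape)
    line_degs = degs[theta_idx]
    return float(np.mean(line_degs - np.round(line_degs / 90) * 90))
```

The returned angle can then be applied with PIL's Image.rotate before the box-detection pass.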
Box detection: because the barcode is drawn inside a large box with a strong border, a good way to find the box coordinates was again the Hough transform. I decided the best approach was to run the Hough algorithm again, limited even further to detect only lines within 1 degree of 90 or 180, since the image had already been straightened to roughly 0.2-degree precision. Instead of building an entirely new Hough function I added some separate logic to the one built for straightening; it returns both the image and a list of the strongest lines. Using this list I narrow down to the top 2 distinct horizontal lines and the top 2 distinct vertical lines and can then crop the box to the correct dimensions. This works very well, even on poorly printed, noisy forms.
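A rough sketch of how the strongest lines from this second Hough pass could be turned into a crop. The (rho, theta) input format, the separation threshold, and the helper names are assumptions; the idea of keeping the top 2 distinct horizontal and vertical lines follows the description above.

```python
def crop_to_box(gray, lines, min_separation=50):
    """Crop `gray` (a 2-D array) to the barcode box given strong Hough lines.

    `lines` is an iterable of (rho, theta_degrees) pairs, already limited to
    within ~1 degree of 0, 90, or 180 as described above.
    """
    horiz, vert = [], []
    for rho, theta in lines:
        if abs(theta - 90) <= 1:
            horiz.append(rho)          # theta near 90: rho is the row of a horizontal line
        else:
            vert.append(abs(rho))      # theta near 0/180: |rho| is the column of a vertical line

    def outermost_pair(rhos, min_sep=min_separation):
        """Keep the two most distinct values (the box's opposite borders)."""
        rhos = sorted(rhos)
        distinct = [rhos[0]]
        for r in rhos[1:]:
            if r - distinct[-1] > min_sep:
                distinct.append(r)
        return distinct[0], distinct[-1]

    top, bottom = outermost_pair(horiz)
    left, right = outermost_pair(vert)
    return gray[int(top):int(bottom), int(left):int(right)]
```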
Barcode reading and output: once we have the box we can follow ideas similar to those used to print the barcode in inject.py, but with implementation differences to allow for robustness. After cropping down to the box there will often still be some part of the black border left, which we would like to remove if it exists. Since we leave space between the border and the bars, we can clean up the image by detecting the edges and their thickness and removing them: if a black line at a border spans at least 50% of the width (for horizontal edges) or 80% of the height (for vertical edges), it is removed. These numbers were determined by trial and error, and the 80% had to be larger than 75% because a bar with all 5 answers correct can take up that much of the height. After removing these borders there can still be some noise, so an additional 3 pixels are trimmed from those edges without affecting accuracy. To read in the answers we move across the image from left to right, checking all pixels in each vertical column to detect groups of dark pixels that start and stop while moving from top to bottom, recording their lengths, and converting them to percentages of the vertical space. We adjust for noise by averaging pixel values across multiple columns and only recording runs of at least 2.5% of the vertical space.
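The column-scanning step could look roughly like the sketch below; the darkness threshold and averaging window are illustrative, while the 2.5% minimum run length comes from the description above.

```python
import numpy as np

def read_bar_segments(box, dark_thresh=128, window=3, min_frac=0.025):
    """Scan the cropped box left to right and record dark runs in each column.

    `box` is a 2-D grayscale array with the borders already stripped. Returns a
    list of (column index, [segment length fractions]) for columns that contain
    at least one run longer than `min_frac` of the box height.
    """
    h, w = box.shape
    results = []
    for x in range(w - window + 1):
        # Average a few neighbouring columns to smooth scanner noise, then mark dark pixels.
        dark = box[:, x:x + window].mean(axis=1) < dark_thresh
        segments, run = [], 0
        for is_dark in dark:
            if is_dark:
                run += 1
            elif run:
                segments.append(run / h)
                run = 0
        if run:                                              # a run touching the bottom edge
            segments.append(run / h)
        segments = [s for s in segments if s >= min_frac]    # drop specks under 2.5% of the height
        if segments:
            results.append((x, segments))
    return results
```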
The output of this logic is a list of each scanned column and its results. The list contains natural groups of columns (list indices) that belong to one bar, and the middle of each group gives the best predicted segment lengths for that question. We determine the answers by comparing each percentage to the percentage assigned to each letter. Once the answers are recorded we revert them to the correct order using question_order_key, the randomized order of the 85 questions from inject.py. With the answers in the correct order we write them to a new text file, making sure that any question with multiple answers lists them in alphabetical order.
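A sketch of this final decoding step, reusing the same illustrative height table as the inject.py sketch above: each measured fraction is mapped to the nearest letter, the answers are restored to their original order with question_order_key, and multiple answers are written in alphabetical order. The output file format shown is an assumption.

```python
# Hypothetical final step for extract.py.
ANSWER_HEIGHTS = {"A": 0.05, "B": 0.08, "C": 0.11, "D": 0.14, "E": 0.17}   # must match inject.py


def decode_answers(per_bar_segments, question_order_key):
    """`per_bar_segments[k]` holds the segment fractions read from the k-th printed
    bar; `question_order_key[k]` is the original question number that bar encodes."""
    answers = {}
    for slot, segments in enumerate(per_bar_segments):
        letters = []
        for frac in segments:
            # Choose the letter whose expected height is closest to the measured fraction.
            closest = min(ANSWER_HEIGHTS, key=lambda letter: abs(ANSWER_HEIGHTS[letter] - frac))
            letters.append(closest)
        answers[question_order_key[slot]] = "".join(sorted(letters))   # alphabetical order
    return answers


def write_answers(answers, path):
    """One illustrative output format: question number followed by its letters."""
    with open(path, "w") as f:
        for q in sorted(answers):
            f.write(f"{q} {answers[q]}\n")
```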
The idea behind the algorithm implemented in grade.py is to form a grid around the answer boxes in order to identify and distinguish marked from unmarked answers, as follows:
- Reading the image using the PIL library, then:
- Converting to grayscale regardless of whether the image is already in a grayscale format
- Checking whether the background of the image is white (255) or black (0); if it is white, inverting the pixel values
- Removing the pixels before position 600, since all the answer boxes appear after that point (found to be a common value across all the provided test images, with a buffer zone included)
- Running sharpening to make the image features stand out more before running the canny edge detection algorithm
- Running the Canny edge detection algorithm on the normalized image:
- Using the standard Sobel filter, perform Sobel edge detection to obtain gradient magnitudes and directions (theta).
- Pass the image through non-maximum suppression, which thins the edges using the gradient directions (theta) computed in the Sobel step.
- Then follow up with double thresholding to classify pixels into strong, weak, and suppressed edges using a high and a low threshold, and finally return the image.
- After passing the image through Canny, I removed the answers marked by students before forming the grid, because they were adding noise to the votes cast by pixels in Hough space. Using a 20x20 window and a threshold of 20000 (the sum of pixel intensities around each pixel), I zeroed out the pixels that crossed this threshold.
- After identifying the edges, I used the Hough transform to have each pixel cast votes for horizontal and vertical lines.
- Since our goal was to form a grid around the boxes, which should consist only of horizontal and vertical lines, I made each pixel vote only for a 0-degree theta and a 90-degree theta.
- Each pixel cast its votes in the polar (rho, theta) coordinate system
- The rhos with the most votes were selected
- Then, using the pixel spacing between consecutive candidate lines, the best 30 vertical lines and the best 58 horizontal lines were kept, forming a grid around the boxes.
- Using a for loop, we iterated through the questions, which are laid out in 3 columns on the form.
- In each iteration, we looped through the 58 horizontal lines to find the 2 lines between which the question's boxes lie, and then used pairs of vertical lines to get the 5 answer boxes.
- Using a dynamically chosen threshold on the summed pixel intensities of each box (found by averaging the intensities over the first 29 questions) to separate marked from unmarked boxes, we selected the answers; see the sketch after this list.
- Finally, outputting the answers to the text file specified in the arguments.
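To make the last few steps concrete, here is a minimal sketch of the per-question scoring. The helper names and the exact line-pairing convention are assumptions; only the idea of summing pixel intensities inside each grid cell and comparing against a threshold averaged over the first 29 questions follows the steps above.

```python
import numpy as np

def dynamic_threshold(box_sums):
    """Rough cut between marked and unmarked boxes: the mean of the per-box
    intensity sums collected over the first 29 questions, as described above."""
    return float(np.mean(box_sums))


def score_question(edge_img, top, bottom, vlines, threshold):
    """Return the letters whose answer box looks filled for one question.

    `top` and `bottom` are the two horizontal grid lines bounding the question's
    row; consecutive pairs in `vlines` delimit the five answer boxes A-E.
    """
    letters = "ABCDE"
    marked = []
    for k in range(5):
        left, right = vlines[2 * k], vlines[2 * k + 1]
        if edge_img[top:bottom, left:right].sum() > threshold:
            marked.append(letters[k])
    return "".join(marked)
```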
python3 inject.py form.jpg answers.txt injected.jpg
python3 extract.py injected.jpg output.txt
python3 grade.py b-13.jpg outputs/b-13_output.txt
- The Canny edge detection sometimes failed to detect edges of very light intensity, so we applied a sharpening filter to enhance the image features before running the edge detection algorithm.
- The marked boxes produced a lot of false horizontal lines, which prevented a proper grid from forming. To avoid that, we used a 20x20 filter to remove the box fillings as much as possible.
- Having each pixel vote for every possible line in the Hough transform would have been ideal, but it took too long and exceeded the allotted time, so we limited the Hough transform's angle range based on where it was being used.
- Identify the box - Use Canny edge detection to get the vertical and horizontal lines that make up the boxes (the intersections of these lines are the box locations), then extract and identify the boxes.
- Recognize the answer in a box - Loop through all the boxes formed by the grid of vertical and horizontal lines, find the marked boxes based on the threshold, and output them.
- Recognize the answer in a box - Use classification algorithms (such as KNN, SVM, Naive Bayes, Random Forest, ANN, and so on).
- Use feature matching or the Fourier transform to distinguish marked boxes from unmarked boxes.
- Vectorize the many matrix operations in the scripts to make them run faster.
- Run a noise reduction algorithm so that noise does not affect the Hough transform voting mechanism.
- Deploy a segmentation algorithm for separating the different kinds of boxes.
University project -- in collaboration with jcbarney and mzha