Friday, October 14, 2011

XVIII

  

This last experiment is about image processing using the frames of a video-captured scene. A video is merely a sequence of still images captured at a certain frequency. This capture frequency is still governed by the Nyquist criterion if the video is to appear non-aliased to our eyes, and it is better known as the frame rate, or frames per second (fps), in digital electronics.

With that in mind, this activity taps into the capability of video to record time-dependent phenomena. Our experiment is a basic one: capture a colored ball as it falls to the floor and, using video analysis, experimentally determine g. To do this, we first have to extract the individual still frames from the video. We used Stoik Video Converter 2.0 to adjust the initial video (i.e., decrease the frame size, standardize the fps, remove the audio) before chopping it into still frames with VirtualDub 1.6.19.
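For illustration, the same frame-chopping step can be sketched in a few lines of Python with OpenCV; the actual work used the two GUI tools above, and the filename here is hypothetical.

    import cv2  # OpenCV; assumed available, not part of the original workflow

    cap = cv2.VideoCapture("ball_drop.avi")      # hypothetical input video
    fps = cap.get(cv2.CAP_PROP_FPS)              # frame rate, needed later as the time step
    n = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite("frame_%03d.png" % n, frame) # dump each still frame to disk
        n += 1
    cap.release()
    print("%d frames extracted at %.1f fps" % (n, fps))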

Our selected time frame consisted of 14 still images. 

Figure 1. Sample of a still frame from our video

Fig. 1 shows a sample frame. We used an orange ball against a white wall to make the color segmentation easier later on. The drop height used was ~1 m.

Finally, these 14 images were processed in Scilab 4.1.2 using the Non-Parametric segmentation method we used for Activity 14 - Color Segmentation. The colored ball was now identifiable, and by averaging the coordinates of the segmented pixels we find the approximate center of the ball (its centroid).
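A minimal sketch of that centroid step, assuming the segmentation returns a binary mask (Python/NumPy here; the original processing was done in Scilab):

    import numpy as np

    def ball_centroid(mask):
        """Average the (row, col) coordinates of all pixels flagged by the
        color segmentation to approximate the center of the ball."""
        rows, cols = np.nonzero(mask)
        return rows.mean(), cols.mean()

Applied to each of the 14 segmented frames, this yields the 14 centroids shown in Fig. 2.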

Figure 2. Centroid locations extracted from the 14 images

Plotting these in a single image gives us Fig. 2. The distance between consecutive points increases with time because of the gravitational acceleration. Converting the coordinates (specifically y) to actual lengths through the method we used in Activity 1, we can now plot the height vs. time curve of our experiment.

Figure 3. Height vs. time plot of the experiment

Fig. 3 shows the parabolic trend of our data. The time interval between images is simply obtained from the frame rate of our video, which is 30 fps; this gives a gap of 1/30 ≈ 0.033 s per frame. The quadratic coefficient of the trend line, 5.767, equals half of our experimental g. This follows from the simple kinematic equation:

$s = v_0 t + \frac{1}{2} a t^2$

Since the ball is released with no initial velocity, only the second term survives, relating the distance (s) and time (t) to the acceleration (a) of the system. The negative sign accounts for the downward motion. The expected value of the coefficient is therefore around 4.903 (half of g). With our experimental value of 5.767, this yields a ~17.6% experimental error.
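The fit itself reduces to a quadratic regression. A small NumPy sketch with synthetic data (the measured, calibrated heights would replace the fabricated y below):

    import numpy as np

    t = np.arange(14) / 30.0             # 14 frames at 30 fps
    y = 1.0 - 0.5 * 9.81 * t**2          # synthetic drop from ~1 m, stand-in for the measured heights
    a2, a1, a0 = np.polyfit(t, y, 2)     # fit y = a2*t^2 + a1*t + a0
    g_exp = -2.0 * a2                    # the quadratic coefficient is -g/2
    print("experimental g = %.2f m/s^2" % g_exp)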

In summary, we have successfully used a video to extract still images and perform color segmentation. This enabled us to obtain the coordinates of our colored ball and to experimentally measure g by converting these to actual lengths and using the frame rate of the video as the time step. The large error can be attributed to parallax in the recording. The small drop height (~1 m) could also limit the resolution of this experiment.

Self-Assessment: 10/10

XVI

This activity deals with probabilistic classification. We use a method derived from Bayes' theorem, the Linear Discriminant Analysis (LDA) discriminant:

$f_i = \mu_i C^{-1} x_k^T - \frac{1}{2}\,\mu_i C^{-1}\mu_i^T + \ln(p_i)$

Using the definitions given in http://people.revoledu.com/kardi/tutorial/LDA/LDA.html (μ_i is the mean feature vector of class i, C is the pooled covariance matrix, x_k is the feature vector of a test image, and p_i is the prior probability of class i), a test image is assigned to the class with the maximum f.
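A compact sketch of that assignment rule (Python/NumPy; the variable names are mine, and the pooled covariance and priors follow the definitions in the tutorial linked above):

    import numpy as np

    def lda_classify(test_x, class_means, pooled_cov, priors):
        """Assign each test feature vector (row of test_x) to the class with
        the largest f_i = mu_i C^-1 x^T - 0.5 mu_i C^-1 mu_i^T + ln(p_i)."""
        Cinv = np.linalg.inv(pooled_cov)
        scores = [test_x @ Cinv @ mu - 0.5 * mu @ Cinv @ mu + np.log(p)
                  for mu, p in zip(class_means, priors)]
        return np.argmax(np.stack(scores, axis=1), axis=1)

With two classes (cards and coins) and color features, the returned index tells which class each test image is assigned to.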


Using the images from Activity 15, I chose the sets of sample and test images of playing cards and 5-peso coins. With 5 sample and 5 test images, I applied LDA to the test images. The feature I chose is color, so that classification would be straightforward.

Figure 1. Result of Linear Discriminant Analysis

Fig. 1 shows the results of LDA. The 1st set of f values corresponds to the cards, while the 2nd set corresponds to the 5-peso coin test images. The results show a 100% rate of correctly identifying the test images. This may be due to my decision to use color as the feature, since the 5-peso coins are a tarnished gold while the cards are red.

Self-Assessment: 10/10

Thursday, September 22, 2011

XIV

In this activity on color segmentation, the use of Normalized Chromaticity Coordinates (NCC) gives us an advantage since it separates brightness from chromaticity.

$r = \frac{R}{I}, \quad g = \frac{G}{I}, \quad b = \frac{B}{I}, \quad I = R + G + B$   (1)


Eq. 1 shows the normalization, where I is the sum of all the channels. Since r + g + b = 1, we can write b = 1 - r - g, and thus the mapping is reduced to two dimensions.
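In code, the conversion of Eq. 1 is a one-liner per channel (a NumPy sketch; the original work used Scilab):

    import numpy as np

    def rgb_to_ncc(img):
        """Normalized chromaticity coordinates: divide each channel by
        I = R + G + B (Eq. 1); b = 1 - r - g is redundant."""
        I = img.sum(axis=2).astype(float) + 1e-10   # guard against division by zero on black pixels
        return img[..., 0] / I, img[..., 1] / I     # r, g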
Figure 2. NCC space, x is r and y is g

Fig. 2 shows the reduced space. Notice that blue corresponds to the corner where r and g are both zero.



Figure 3. Top: Reference image. Bottom:  Region of Interest (ROI)

Fig. 3 shows the images that will be used in the next activities. 

Parametric Probability Distribution Estimation
This method takes the NCC values of the image's ROI and fits them with a Gaussian distribution to estimate the probability that a pixel belongs to the ROI.
$p(r) = \frac{1}{\sigma_r \sqrt{2\pi}} \exp\!\left(-\frac{(r-\mu_r)^2}{2\sigma_r^2}\right)$   (2)

Here μ_r and σ_r are the mean and standard deviation of r over the ROI, and the analogous expression holds for g.
The probability actually used depends on both channels; treating them as independent, we take the joint probability p(r)·p(g). We then search the whole image (not only the ROI) for pixels with a high joint probability of belonging to the ROI.
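A sketch of this parametric test, assuming the rgb_to_ncc() helper above has already produced the r and g planes of both the whole image and the ROI:

    import numpy as np

    def parametric_probability(r_img, g_img, r_roi, g_roi):
        """Fit a Gaussian (Eq. 2) to the ROI chromaticities and evaluate
        the joint probability p(r)*p(g) for every pixel of the image."""
        def gauss(x, mu, sigma):
            return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        p_r = gauss(r_img, r_roi.mean(), r_roi.std())
        p_g = gauss(g_img, g_roi.mean(), g_roi.std())
        return p_r * p_g     # threshold or rescale this map to get the segmented image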

Figure 4. Resulting image of the Parametric method

Non-Parametric Probability Distribution Estimation
This method involves backprojecting the known 2D histogram of the ROI onto the whole image. The algorithm is partly similar to the parametric method, but instead of evaluating a joint probability, each pixel of the image is assigned the value of the ROI's 2D (r, g) histogram bin that its chromaticity falls into.
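A sketch of the backprojection, again assuming the r and g planes from Eq. 1; the bin count is an arbitrary choice of mine:

    import numpy as np

    def backproject(r_img, g_img, r_roi, g_roi, bins=32):
        """Look up every pixel's (r, g) pair in the ROI's 2D histogram;
        pixels whose chromaticity is common in the ROI get high values."""
        hist, r_edges, g_edges = np.histogram2d(r_roi.ravel(), g_roi.ravel(),
                                                bins=bins, range=[[0, 1], [0, 1]])
        ri = np.clip(np.digitize(r_img, r_edges) - 1, 0, bins - 1)
        gi = np.clip(np.digitize(g_img, g_edges) - 1, 0, bins - 1)
        return hist[ri, gi]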
Figure 5. Resulting image of the Non-Parametric method


From Figs. 4 and 5, we notice that the parametric method produced a more even and connected result, since its PDF is a smooth Gaussian, while the non-parametric method used only direct histogram backprojection (ROI histogram onto the whole image) to decide whether a pixel belongs to the ROI.


Self-Assessment: 9/10   


XIII

Image compression is a vital tool for striking a compromise between the quality and file size of our images. This activity uses Principal Component Analysis (PCA) to represent an image as a superposition of weighted eigenvectors.

Figure 1. Original Image

Fig. 1 shows the original image that will be used for demonstrating image compression. To simplify things, I will use the grayscale version of the image to flatten the hypermatrix to a normal matrix.

Now, the method works by cutting the image into 10x10 px blocks and flattening each block into 100 elements. Doing this for all blocks and stacking them gives an n x p matrix, where n is the number of blocks and p = 100 is the number of elements per block. We then apply Scilab's pca() function to this matrix.
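The block decomposition can be sketched as follows (Python/NumPy; the original used Scilab's pca() on the resulting matrix):

    import numpy as np

    def image_to_blocks(gray, block=10):
        """Cut a grayscale image into block x block patches and flatten each
        patch into one row, giving the n x p matrix (p = 100 here)."""
        h = gray.shape[0] - gray.shape[0] % block    # trim to a multiple of the block size
        w = gray.shape[1] - gray.shape[1] % block
        return (gray[:h, :w]
                .reshape(h // block, block, w // block, block)
                .swapaxes(1, 2)
                .reshape(-1, block * block))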



Figure 2. Top-Bottom: Correlation Circle, Eigenvalue distribution and Eigenimages

The output of pca() contains the eigenvectors and the principal components. By adjusting the number of eigenvector-principal component pairs used in the reconstruction, we vary the degree of compression. The reconstructed blocks are then reassembled back to the original size of our image.
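A sketch of the truncated reconstruction via SVD, equivalent in spirit to keeping only k eigenvector-principal component pairs from pca():

    import numpy as np

    def pca_reconstruct(blocks, k):
        """Keep only the first k component pairs; k = 100 reproduces the
        blocks exactly, smaller k gives stronger compression."""
        mean = blocks.mean(axis=0)
        U, s, Vt = np.linalg.svd(blocks - mean, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k] + mean    # rows are then reshaped back into 10x10 blocks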

Figure 3. Images a-i correspond to the original, then 2, 5, 10, 15, 20, 25, 50, and 70 eigenvector-principal component pairs

Fig. 3 shows the resulting image quality from the compression technique used in this work. There are 100 available eigenvectors in total, so using all 100 pairs reproduces the original image.

Figure 4. Graph of file size vs eigenvector-principal component pairs used

Fig. 4 illustrates that the steepest drop in file size occurs in the 0-20 pair region. File-size-wise, this is therefore the optimal operating range, since a sharp decrease is obtained with only a few retained pairs.

Self-Assessment: 10/10 

Tuesday, September 6, 2011

XII

Preprocessing text is one real-world application for the image processing techniques that we have previously learned. For this activity, I have binarized handwritten text and performed template matching to find words within the document.

Figure 1. Original image

Figure 2. Region of Interest

Figure 3. 2D Fourier Transform of Fig. 2


Figure 4. Filtered Fourier Transform of Fig. 3

Figs. 2-4 show my attempt at removing the ruled lines from the image. Due to the low resolution and noise of the original image, filtering in the frequency domain did not fully remove the lines. To proceed with the binarization of the text, I manually removed the remaining grayish parts.

To facilitate the binarization, morphological transformations were applied through the close and skel operations.
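A rough equivalent of those two operations in Python (the original used Scilab's close and skel; scipy and scikit-image stand in here, and the 3x3 structuring element is my own choice):

    import numpy as np
    from scipy.ndimage import binary_closing
    from skimage.morphology import skeletonize

    def clean_and_thin(binary_text):
        """Close small gaps in the strokes, then thin them to one-pixel-wide
        skeletons, mirroring the close/skel steps above."""
        closed = binary_closing(binary_text, structure=np.ones((3, 3)))
        return skeletonize(closed)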

Figure 5. After operating close()

Figure 6. After operating skel() on Fig. 5

Fig. 6 shows the final preprocessed image. The "M" and "O" cannot be identified properly, but the "D", "E", and the "III" are still legible. The quality of the binarization of course depends on the quality of the original handwriting; because of their thin strokes, the letters "M" and "O" were damaged by the processing.


Figure 7. Template

To end this, I finally performed template correlation. Using imcorrcoef(), I tried to find other instances of the word "DESCRIPTION" in the image (Fig. 1). However, the only match found was the very spot my template came from.
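A comparable template search in Python, using scikit-image's normalized cross-correlation in place of Scilab's imcorrcoef() (the function and variable names are mine):

    import numpy as np
    from skimage.feature import match_template

    def find_word(page, template):
        """Correlate the template against the whole page; the brightest
        peak of the correlation map marks the best match."""
        corr = match_template(page, template, pad_input=True)   # map has the same size as the page
        row, col = np.unravel_index(np.argmax(corr), corr.shape)
        return corr, (row, col)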

Figure 8. imcorrcoef() result

Fig. 8 shows the result. Notice the white dot on the right side; this is the location of the template match returned by imcorrcoef().


  Self-Assessment: 7/10   


Thursday, August 25, 2011

XI

Let's be sonically active. This activity challenges us to combine all the image processing tricks we have learned to convert an image of sheet music into actual sound.

Figure 1. First 2 measures of "London Bridge Is Falling Down"

Fig. 1 shows the sheet excerpt used for this activity. These two bars contain four elements: the G-clef, the time signature, quarter notes, and quarter rests. We need not concern ourselves with the first two, since I will only be extracting a simple monophonic melody and not a full-blown sound complete with timbre and accent. To distinguish the quarter notes from the rests, I used the template matching by correlation that we tackled in Activity 6.

Figure 2. Thresholded image after correlation with a quarter note image

Figure 3. Thresholded image after correlation with a quarter rest image


Figure 4. Combined image of the quarter notes and rests positions

After the correlation, I thresholded the resulting images so that only the brightest spots remain, i.e., the regions most correlated with my pattern image. Figs. 2 and 3 show the results for the quarter notes and rests, and Fig. 4 shows the combined image of the note and rest positions. The sequence in Fig. 4 is reversed, which means it has to be rotated by 180 degrees to match the orientation of Fig. 1; this is due to the fftshift(). That fact is taken into account in all the sorting done in this work.

From this, I have properly identified the coordinates of the spots corresponding to quarter notes and quarter rests, showcasing the ability of the code to distinguish the entities found in the music sheet. In assigning the specific notes, I loop over the various vertical ranges (heights in the image), since the height of a note head identifies its pitch. For the rests, the height range does not matter since they carry no frequency value, only a time value: their left-right placement matters, but not their height, unlike the notes, which need both. To simulate a pause, I use a 44 kHz tone, since this is well beyond the range of human hearing. Sequencing the results finally generates the melody:
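The sequencing step amounts to concatenating one sine tone per detected note (with the rests handled as described above). A rough Python sketch, where the listed frequencies are only illustrative and not my actual sorted output:

    import numpy as np
    from scipy.io import wavfile

    def tone(freq_hz, dur_s=0.4, rate=44100):
        """Render one quarter note as a plain sine wave."""
        t = np.arange(int(dur_s * rate)) / rate
        return np.sin(2 * np.pi * freq_hz * t)

    freqs = [392, 440, 392, 349, 330, 349, 392]              # illustrative pitches in Hz
    melody = np.concatenate([tone(f) for f in freqs])
    wavfile.write("melody.wav", 44100, (melody * 32767).astype(np.int16))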


It is now easy to improve this code since its identification capabilities are already generalized. Further work is needed in automating the selection of the vertical ranges for the note values.

Self-Assessment: 10/10   

Saturday, August 13, 2011

X

Binary operations can be used for size estimation because they do not depend on fine details. Regions of interest (ROIs) are separated by edge detection methods, and morphological operations are used to enhance the binarized image. This improves the information obtained from the data.

This activity brings together various techniques that we have learned before for the area estimation of "cells", enabling us to sort out possible "cancerous cells" via image manipulation.

Figure 1. Image of sample "cells"

Fig. 1 shows our test image, a snapshot of paper cut into circles. The main idea is to binarize this using its histogram and then perform the opening and closing transforms.

$A \bullet B = (A \oplus B) \ominus B$   (1)

$A \circ B = (A \ominus B) \oplus B$   (2)

Closing (Eq. 1) is done by dilating matrix A with the pattern (structuring element) B and then eroding the result with B. Reversing the order, erosion followed by dilation, constitutes opening (Eq. 2).
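A quick Python rendering of Eqs. 1-2 with a circular structuring element (scipy stands in for Scilab's erode()/dilate(); the binarized input image is assumed to be given):

    import numpy as np
    from scipy.ndimage import binary_opening, binary_closing

    def disk(radius):
        """Circular pattern B used for the erosion/dilation steps."""
        y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
        return x**2 + y**2 <= radius**2

    B = disk(10)
    # binary_img = thresholded version of Fig. 1 (assumed given)
    # closed = binary_closing(binary_img, structure=B)   # Eq. 1: dilation then erosion
    # opened = binary_opening(binary_img, structure=B)   # Eq. 2: erosion then dilation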

Operating these two methods on Fig. 1 yielded Fig. 2:

Figure 2. Result of performing opening and closing of the image in Fig. 1.
Note that the image was divided into 7 segments to obtain different ROIs.


Fig. 2 was generated using a circle (r = 10 px) as the erode()/dilate() pattern. It shows that opening is the more viable operation for our image, since we want to widen the gaps and get a better view of the stacked cells. Opening also removed some deformed blobs, because erosion is its first operator. Closing, on the other hand, could possibly repair cut segments. To find the average area of the cells, I selected the opened images in rows 1, 3, and 4, since in those the separation between the cells is mostly well defined. Using bwlabel() to separate connected clusters of data, the average value obtained is 520 px per cell.
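The labeling and averaging step looks roughly like this in Python, with scipy.ndimage.label() playing the role of bwlabel():

    import numpy as np
    from scipy.ndimage import label

    def mean_cell_area(opened_img):
        """Label the connected blobs and average their pixel counts to
        estimate the area of a single cell."""
        labels, n = label(opened_img)
        areas = np.bincount(labels.ravel())[1:]     # drop label 0, the background
        return areas.mean(), areas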


Finally, we tackle the 2nd image for this activity. This time, our image has 5 cells that are bigger than the rest. To separate these, we implement this process:

  1. Convert image to black and white
  2. Perform opening transform
  3. Using bwlabel(), obtain the average cell size
  4. Those clusters that exceed the average value are zeroed out.
This method removes the oversized clusters from the image, so comparing the result with the original readily identifies the "cancerous" cells (a sketch of this filtering step is shown below).
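A sketch of that size filter (step 4), again with scipy's label() in place of bwlabel(); the cutoff would be the average area found earlier:

    import numpy as np
    from scipy.ndimage import label

    def remove_oversized(opened_img, cutoff_area):
        """Zero out every blob whose pixel count exceeds the cutoff; the
        blobs removed here are the 'cancerous' candidates."""
        labels, n = label(opened_img)
        out = opened_img.copy()
        for lab in range(1, n + 1):
            blob = labels == lab
            if blob.sum() > cutoff_area:
                out[blob] = 0
        return out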

Figure 3. Image a) after and b) before operating open. The pattern was a circle with a radius of 14 px.


Fig. 3 shows the transformed image. The opening actually cleaned up the black-and-white converted image, since there were some remnant white blots left over from the thresholding.

Figure 4. a) Before and b) after filtering the opened image.

Fig. 4 shows the result of the filtering. The colored blobs in Fig. 4a mark those that were removed, as can be seen in Fig. 4b. The red ones are the supposed "cancerous" cells, while the green ones were removed because they overlap. The filtering was a partial success: it sorted out all 5 of the target "big, cancerous" cells, but the overlapping blobs were also removed. In the future, edge detection could be used to discriminate individual cells within a collection of overlapping ones.

  Self-Assessment: 9/10