Friday, October 14, 2011

XVIII

  

This last experiment is about image processing using the frames of a video-captured scene. A video is merely a sequence of still images captured at a fixed frequency. That frequency is still governed by the Nyquist criterion, so that our eyes perceive the video as a non-aliased signal. It is better known as the frame rate, or frames per second (fps), in digital electronics.

With that in mind, this activity taps into the capability of video to record time-dependent phenomena. Our experiment is a basic one: capture a colored ball as it falls to the floor and, using video analysis, experimentally determine g. To do this, we first have to extract the individual still frames of the video. We used Stoik Video Converter 2.0 to adjust the initial video (i.e., decrease the frame size, standardize the fps, remove the audio) before splitting it into still frames using VirtualDub 1.6.19.

Our selected time frame consisted of 14 still images. 

Figure 1. Sample of a still frame from our video

Fig. 1 shows a sample image. We used an orange ball against a white wall to make the color segmentation easier later. The drop height was ~1 m.

Finally, these 14 images were processed in Scilab 4.1.2 using the non-parametric segmentation method we used for Activity 14 - Color Segmentation. With the colored ball segmented, averaging the coordinates of its pixels gives the approximate location of the center of the ball (the centroid).
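The averaging step can be sketched as follows. This is a minimal illustration in plain Python rather than the Scilab code used for the activity; the function name centroid and the small mask are hypothetical, and the mask is assumed to be a binary image in which the segmented ball pixels are 1.

```python
def centroid(mask):
    """Return the (row, col) mean position of all nonzero pixels in a 2D mask."""
    row_sum = col_sum = count = 0
    for i, line in enumerate(mask):
        for j, value in enumerate(line):
            if value:
                row_sum += i
                col_sum += j
                count += 1
    if count == 0:
        raise ValueError("mask contains no segmented pixels")
    return row_sum / count, col_sum / count


# Hypothetical 5x5 segmentation result: a 3x3 "ball" centered at row 2, column 3.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
ball_center = centroid(mask)
```

Repeating this for each of the 14 frames would give one (row, col) pair per time step.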

Figure 2. Centroid locations extracted from the 14 images

Plotting these in a single image gives us Fig. 2. The distance between consecutive points increases with time due to the gravitational acceleration. Converting the coordinates (specifically y) to actual lengths through the method we previously used in Activity 1, we can now plot height vs. time for our experiment.

Figure 3. Height vs. time plot of the experiment

Fig. 3 shows the quadratic (parabolic) trend of our data. The time interval between images is obtained from the frame rate of our video, 30 fps, which gives a gap of 1/30 s ≈ 0.033 s per frame. The trend line's coefficient, 5.767, corresponds to half of our experimental g. This follows from the simple kinematic equation:

s = v0*t + (1/2)*a*t^2

Since no initial velocity was applied to the ball (v0 = 0), we are left with the 2nd term, which relates the distance (s) and time (t) to the acceleration (a) of the system. The negative sign accounts for the downward movement. The expected coefficient is therefore g/2 ≈ 4.903. Compared with our experimental value of 5.767, this yields an experimental error of about 17.6%.
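The arithmetic behind these numbers can be checked with a short sketch. It is written in plain Python, not the Scilab or spreadsheet fit used for the plot. The helper fit_quadratic_coefficient is a hypothetical name, the reference value g = 9.806 m/s^2 is assumed from the expected coefficient 4.903, and the height data are synthetic, generated only to show that a one-parameter fit of s = c*t^2 recovers c = g/2.

```python
def fit_quadratic_coefficient(times, distances):
    """Least-squares estimate of c in the one-parameter model s = c * t**2."""
    numerator = sum(s * t ** 2 for t, s in zip(times, distances))
    denominator = sum(t ** 4 for t in times)
    return numerator / denominator


FPS = 30.0
dt = 1.0 / FPS            # time step between frames, about 0.033 s
G_REFERENCE = 9.806       # assumed reference value of g in m/s^2

# Synthetic fall distances for 14 frames (not the measured data).
times = [k * dt for k in range(14)]
distances = [0.5 * G_REFERENCE * t ** 2 for t in times]
c_fit = fit_quadratic_coefficient(times, distances)
g_from_fit = 2.0 * c_fit  # the t^2 coefficient is g/2

# Values quoted in the post.
reported_c = 5.767
expected_c = G_REFERENCE / 2.0
percent_error = abs(reported_c - expected_c) / expected_c * 100.0
```

With the reported coefficient, the implied experimental g is 2 x 5.767 = 11.534 m/s^2.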

In summary, we have successfully used a video to extract still images and perform color segmentation. This enabled us to obtain the coordinates of our colored ball and experimentally measure g by converting these to actual lengths and using the frame rate of the video as the time step. The large error can be attributed to parallax in the recording. The short drop distance (1 m) may also have limited the resolution of this experiment.

Self-Assessment: 10/10

XVI

This activity deals with probabilistic classification. We use a derivative of Bayes' theorem, the Linear Discriminant Analysis (LDA) formula:

f_i = mu_i C^-1 x_k^T - (1/2) mu_i C^-1 mu_i^T + ln(p_i)

Using the definitions given in http://people.revoledu.com/kardi/tutorial/LDA/LDA.html, a test image is assigned to the class with the maximum f.
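A minimal sketch of this decision rule for two color features is given below, in plain Python rather than Scilab. It assumes the discriminant has the usual linear form f_i = mu_i C^-1 x^T - (1/2) mu_i C^-1 mu_i^T + ln(p_i), with C estimated as a pooled covariance around each class mean (the tutorial's exact estimator may differ). All function names and the chromaticity values for the two classes are hypothetical, not the measured data.

```python
import math


def mean_vector(samples):
    """Mean of a list of 2-feature samples."""
    n = len(samples)
    return [sum(s[k] for s in samples) / n for k in range(2)]


def pooled_covariance(groups):
    """2x2 covariance pooled over all groups, each centered on its own mean."""
    total = sum(len(g) for g in groups)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for group in groups:
        mu = mean_vector(group)
        for s in group:
            d = (s[0] - mu[0], s[1] - mu[1])
            for a in range(2):
                for b in range(2):
                    cov[a][b] += d[a] * d[b] / total
    return cov


def invert_2x2(m):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def discriminant(x, mu, cov_inv, prior):
    """f = mu C^-1 x^T - 0.5 mu C^-1 mu^T + ln(prior)."""
    w = [cov_inv[0][0] * mu[0] + cov_inv[0][1] * mu[1],
         cov_inv[1][0] * mu[0] + cov_inv[1][1] * mu[1]]
    return (w[0] * x[0] + w[1] * x[1]
            - 0.5 * (w[0] * mu[0] + w[1] * mu[1])
            + math.log(prior))


def classify(x, groups):
    """Return the index of the group with the largest discriminant f."""
    total = sum(len(g) for g in groups)
    cov_inv = invert_2x2(pooled_covariance(groups))
    scores = [discriminant(x, mean_vector(g), cov_inv, len(g) / total)
              for g in groups]
    return scores.index(max(scores))


# Hypothetical (r, g) chromaticity features: class 0 = cards, class 1 = coins.
cards = [(0.60, 0.20), (0.62, 0.18), (0.58, 0.22), (0.61, 0.21), (0.59, 0.19)]
coins = [(0.45, 0.40), (0.47, 0.38), (0.43, 0.42), (0.46, 0.41), (0.44, 0.39)]
predicted = classify((0.57, 0.24), [cards, coins])
```

When the two classes are as well separated in color as in this made-up data, every test point falls on the correct side, which matches the kind of 100% result reported for color features.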


From the images of Activity 15, I chose the sample and test images of playing cards and 5-peso coins. With 5 sample images and 5 test images, I applied LDA to the test images. The feature I chose is color, so that it would be easy to test.

Figure 1. Result of Linear Discriminant Analysis

Fig. 1 shows the results of LDA. The 1st set of f values corresponds to the card test images, while the 2nd set corresponds to the 5-peso coin test images. The test images were identified correctly 100% of the time. This may be due to my decision to use color as the feature, because the five-peso coins are tarnished gold while the cards are red.

Self-Assessment: 10/10

Thursday, September 22, 2011

XIV

In this activity, the use of Normalized Chromaticity Coordinates (NCC) gives us an advantage since it can separate brightness and chromaticity. 
r = R/I, g = G/I, b = B/I, where I = R + G + B (1)

Eq. 1 shows the normalization, where I is the sum of all channels. Since r + g + b = 1, we can write b = 1 - r - g, and thus the mapping is reduced to 2 dimensions.
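The normalization can be illustrated with a few lines of plain Python (a sketch, not the Scilab code used in the activity). The function name to_ncc is hypothetical, and returning (0, 0) for a black pixel is an assumption made here to avoid dividing by zero.

```python
def to_ncc(R, G, B):
    """Convert one RGB pixel to normalized chromaticity coordinates (r, g)."""
    I = R + G + B
    if I == 0:
        return 0.0, 0.0  # assumption: chromaticity of a black pixel set to (0, 0)
    return R / I, G / I


# The same color at two brightness levels maps to the same (r, g) point.
bright = to_ncc(200, 100, 100)
dim = to_ncc(100, 50, 50)
b_component = 1.0 - bright[0] - bright[1]
```

Both pixels land on (0.5, 0.25), which is the sense in which NCC separates chromaticity from brightness.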
Figure 2. NCC space, x is r and y is g

Fig. 2 shows the reduced space. Notice that pure blue lies at the origin, where r and g are both zero.



Figure 3. Top: Reference image. Bottom:  Region of Interest (ROI)

Fig. 3 shows the images that will be used in the next activities. 

Parametric Probability Distribution Estimation
This method takes the NCC of the image's ROI and fits a Gaussian distribution to each channel to determine the probability that a pixel belongs to the ROI.
p(r) = 1/(sigma_r*sqrt(2*pi)) * exp(-(r - mu_r)^2 / (2*sigma_r^2)) (2)
The probability actually used depends on both channels: it is the joint probability, p(r)*p(g), where p(g) has the same form as Eq. 2. Knowing this, we search the whole image (not only the ROI) for pixels with a high joint probability of belonging to the ROI.
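A minimal sketch of the parametric estimate, in plain Python rather than Scilab, is shown below. It assumes independent Gaussians for r and g with the mean and standard deviation taken from the ROI; the function names and the ROI values are hypothetical.

```python
import math


def fit_gaussian(values):
    """Return (mean, standard deviation) of a list of chromaticity values."""
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(variance)


def gaussian(x, mu, sigma):
    """Gaussian probability density evaluated at x."""
    return (math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
            / (sigma * math.sqrt(2.0 * math.pi)))


def joint_probability(r, g, r_stats, g_stats):
    """Joint probability p(r) * p(g) that a pixel belongs to the ROI."""
    return gaussian(r, *r_stats) * gaussian(g, *g_stats)


# Hypothetical NCC values sampled from the ROI.
roi_r = [0.58, 0.60, 0.62, 0.61, 0.59]
roi_g = [0.19, 0.20, 0.21, 0.22, 0.18]
r_stats = fit_gaussian(roi_r)
g_stats = fit_gaussian(roi_g)

p_inside = joint_probability(0.60, 0.20, r_stats, g_stats)   # ROI-like pixel
p_outside = joint_probability(0.45, 0.40, r_stats, g_stats)  # unrelated pixel
```

Evaluating joint_probability for every pixel of the whole image and displaying the result as intensity would give a segmentation map of the kind shown for the parametric method.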

Figure 3. Resulting image of the Parametric method

Non-Parametric Probability Distribution Estimation
This method backprojects the known histogram of the ROI onto the image. It is partly similar to the parametric method, but instead of computing a joint probability, each pixel of the image is looked up in the known 2D (r, g) histogram of the ROI, and the histogram value found there is written to the same position in an initially blank matrix.
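Histogram backprojection can be sketched as follows in plain Python (not the Scilab implementation). The bin count, the function names, and the sample values are assumptions chosen to keep the example small.

```python
def bin_index(value, bins):
    """Map a chromaticity value in [0, 1] to a histogram bin index."""
    return min(int(value * bins), bins - 1)


def roi_histogram(roi_rg, bins):
    """Normalized 2D (r, g) histogram of the ROI pixels."""
    hist = [[0.0] * bins for _ in range(bins)]
    for r, g in roi_rg:
        hist[bin_index(r, bins)][bin_index(g, bins)] += 1.0
    total = len(roi_rg)
    return [[count / total for count in row] for row in hist]


def backproject(image_rg, hist):
    """Replace every pixel by the ROI histogram value at its (r, g) bin."""
    bins = len(hist)
    return [[hist[bin_index(r, bins)][bin_index(g, bins)] for r, g in row]
            for row in image_rg]


# Hypothetical ROI pixels and a 1x2 test image, both already in NCC.
roi = [(0.63, 0.23), (0.65, 0.25), (0.64, 0.24)]
image = [[(0.64, 0.24), (0.15, 0.15)]]
result = backproject(image, roi_histogram(roi, bins=10))
```

Because the lookup uses raw histogram counts with no fitted curve, colors that never appear in the ROI map to exactly zero, which is one way to read the patchier result of the non-parametric method.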
Figure 4. Resulting image of the Non-Parametric method


From Figs. 3 and 4, we notice that the parametric method produced a more even and connected image, since it uses a Gaussian distribution for the PDF, while the non-parametric method used only direct backprojection of the histogram (ROI to whole image) to estimate whether a certain pixel belongs to the ROI.


Self-Assessment: 9/10   


XIII

Image compression is a vital tool for striking a compromise between the quality and the file size of our images. This activity uses the results of Principal Component Analysis to represent images as a superposition of weighted eigenvectors.

Figure 1. Original Image

Fig. 1 shows the original image that will be used for demonstrating image compression. To simplify things, I will use the grayscale version of the image to flatten the hypermatrix to a normal matrix.

Now, the method works by cutting the image into 10x10 px segments and converting each one into a single column, so we end up with a 100-element column for every 10x10 segment. We do this for all segments, so we end up with an n x p matrix, where n is the number of segments and p is the number of elements per block. After that, we apply the pca() function to this matrix.
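The block bookkeeping can be sketched in plain Python (the PCA step itself is left to pca() and is not reproduced here). The function names are hypothetical, and a 4x4 image with 2x2 blocks stands in for the real image with 10x10 blocks.

```python
def image_to_blocks(image, size):
    """Cut a 2D image into size x size blocks, one flattened block per row."""
    rows, cols = len(image), len(image[0])
    blocks = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            blocks.append([image[top + i][left + j]
                           for i in range(size) for j in range(size)])
    return blocks


def blocks_to_image(blocks, rows, cols, size):
    """Reassemble flattened blocks into an image of the given dimensions."""
    image = [[0] * cols for _ in range(rows)]
    index = 0
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            block = blocks[index]
            for i in range(size):
                for j in range(size):
                    image[top + i][left + j] = block[i * size + j]
            index += 1
    return image


# Toy 4x4 grayscale image with pixel values 0..15.
image = [[4 * i + j for j in range(4)] for i in range(4)]
blocks = image_to_blocks(image, 2)          # n = 4 blocks, p = 4 elements
restored = blocks_to_image(blocks, 4, 4, 2)
```

In the compression itself, each row of the block matrix would be replaced by its approximation from a limited number of eigenvectors before reassembly.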



Figure 2. Top-Bottom: Correlation Circle, Eigenvalue distribution and Eigenimages

The output of pca() contains the eigenvectors and the principal components. By adjusting the number of eigenvector and principal component pairs that are multiplied together, we vary the amount of compression. The result is then reassembled back to the matrix size of our image.

Figure 3. Images a-i correspond to the original, then 2, 5, 10, 15, 20, 25, 50, and 70 eigenvector and principal component pairs

Fig. 3 shows the resulting quality of the images from the compression technique used in this work. A total of 100 eigenvectors are available, so the original image is equivalent to using all 100 of those pairs.

Figure 4. Graph of file size vs eigenvector-principal component pairs used

Fig. 4 illustrates that the slope is steepest in the 0-20 eigenvector region. File-size-wise, this is the best range to work in, since it is where the file size falls off most sharply.

Self-Assessment: 10/10 

Tuesday, September 6, 2011

XII

Preprocessing text is one real-world application for the image processing techniques that we have previously learned. For this activity, I have binarized handwritten text and performed template matching to find words within the document.

Figure 1. Original image

Figure 2. Region of Interest

Figure 3. 2D Fourier Transform of Fig. 2


Figure 4. Filtered Fourier Transform of Fig. 3

Figs. 2-3 show my attempt at removing the lines from the image. Due to the low resolution and noise of the original image, filtering in the frequency domain did not fully remove the lines. To continue on to the binarization of the text, I manually removed the remaining grayish parts.

To facilitate the binarization, morphological transformation was implemented through the close and skel operations.

Figure 4. After operating close

Figure 5.  After operating skel() on Fig. 4

Fig. 5 shows the final preprocessed image. The "M" and "O" cannot be identified properly, but the "D", "E" and the "III" are still good. The quality of binarization is of course dependent on the quality of the original handwritten text. Due to the thinning strokes, the letters "M" and "O" were damaged by the processing.


Figure 6. Template

Finally, I performed template matching by correlation. Using imcorrcoef(), I tried to find other instances of the word "DESCRIPTION" in the image (Fig. 1). However, I was only able to find the same spot where my template came from.

Figure 7. imcorrcoef() result. 

Fig. 7 shows the result. Notice the white dot on the right side. This is the approximate location where imcorrcoef() matched the template.


  Self-Assessment: 7/10   


Thursday, August 25, 2011

XI

Let's be sonically active. This activity challenges us to combine all the image processing tricks we have learned to convert an image of sheet music into actual sound.

Figure 1. First 2 measures of "London Bridge Is Falling Down"

Fig. 1 shows the sheet music excerpt that was used for this activity. These two bars contain 4 elements: the G-clef, the time signature symbol, the quarter notes, and the quarter rests. We need not concern ourselves with the first two, since I will only be extracting a simple monotonic sound and not a full-blown sound complete with timbre and accent. To distinguish between the quarter notes and rests, I used the template matching by correlation that we tackled in Activity 6.

Figure 2. Thresholded image after correlation with a quarter note image

Figure 3. Thresholded image after correlation with a quarter rest image


Figure 4. Combined image of the quarter notes and rests positions

After the correlation, I thresholded the resulting images so that only the brightest spots remain. This means that I am selecting the regions that are most correlated with my pattern image. Figs. 2 and 3 show the results for the quarter notes and rests. Fig. 4 shows the combined image for notes and rests. The sequence in Fig. 4 is reversed, which means I have to rotate it by 180 degrees to align it with Fig. 1. This is due to the fftshift(). That fact is taken into account in all the sorting that I have done in this work.

From this, I have identified the coordinates of the spots that correspond to quarter notes and quarter rests. This showcases the ability of the code to distinguish the entities found in the sheet music. To assign the specific notes, I use a for loop over the various x ranges (height in the image), because the height identifies the note value of each quarter note. For the rests, the height range is not important, since they have no frequency value, only a time value. A rest therefore depends only on its L-R placement and not on its height, unlike the notes, which need both. To simulate a pause, I use 44 kHz, since this is well beyond the range of human hearing. Sequencing the results finally generates the melody:


It is now easy to improve this code, since I have already generalized its identification capabilities. Further work is needed to automate the x range selection for the note values.
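The sequencing logic can be sketched as follows in plain Python, not the Scilab code used for the activity. The row ranges, the detection list, the 8000 Hz sample rate, and all function names are hypothetical. Note one deliberate swap: a rest is rendered here as zero-valued samples rather than as an inaudible 44 kHz tone. The note frequencies are the standard equal-tempered values.

```python
import math

SAMPLE_RATE = 8000  # assumed samples per second

NOTE_FREQ = {"E4": 329.63, "F4": 349.23, "G4": 392.00, "A4": 440.00}

# Hypothetical row (height) ranges in the image for each pitch.
ROW_RANGES = {"A4": (20, 24), "G4": (25, 29), "F4": (30, 34), "E4": (35, 39)}


def note_for_row(row):
    """Look up the pitch whose row range contains the detected spot."""
    for name, (low, high) in ROW_RANGES.items():
        if low <= row <= high:
            return name
    raise ValueError("row %d does not match any note range" % row)


def tone(freq, duration):
    """Sine wave samples for one note."""
    n = int(duration * SAMPLE_RATE)
    return [math.sin(2.0 * math.pi * freq * k / SAMPLE_RATE) for k in range(n)]


def rest(duration):
    """Silence for one rest (swapped in for the 44 kHz tone)."""
    return [0.0] * int(duration * SAMPLE_RATE)


def render(detections, beat=0.25):
    """Order detections left to right and concatenate their samples."""
    samples = []
    for row, col, kind in sorted(detections, key=lambda d: d[1]):
        if kind == "rest":
            samples += rest(beat)
        else:
            samples += tone(NOTE_FREQ[note_for_row(row)], beat)
    return samples


# Hypothetical (row, column, kind) detections from the correlation step.
detections = [(27, 40, "note"), (22, 60, "note"), (0, 80, "rest"), (32, 100, "note")]
melody = render(detections)
```

Only the column decides the playing order, while the row decides the pitch of a note and is ignored for a rest, which mirrors the dependence described above.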

Self-Assessment: 10/10   

Saturday, August 13, 2011

X

Binary operations can be used for size estimation because they do not depend on fine details. Regions of interest (ROI) are separated by edge detection methods, and morphological operations are used to enhance the binarized image. This improves the information obtained from the data.

This activity combines various techniques that we have learned before for the area estimation of "cells". This enables us to sort out possible "cancerous cells" via image manipulation.

Figure 1. Image of sample "cells"

Fig. 1 shows our test image. This is a snapshot of circles cut from paper. The main idea is to binarize this using its histogram, and then perform opening and closing transforms.

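A • B = (A ⊕ B) ⊖ B (1)

A ∘ B = (A ⊖ B) ⊕ B (2)
Here ⊕ denotes dilation and ⊖ denotes erosion. Closing (eq. 1) is done by dilating matrix A with pattern B and then eroding the resulting image with pattern B. Reversing the order, erosion followed by dilation, constitutes opening (eq. 2).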

Operating these two methods on Fig. 1 yielded Fig. 2:

Figure 2. Result of performing opening and closing of the image in Fig. 1.
Note that the image was divided into 7 segments to have different ROIs.


Fig. 2 was generated using a circle (r = 10 px) as the erode()/dilate() pattern. Fig. 2 shows that opening is more suitable for our image, since we want to widen the gaps and get a better view of the stacked cells. It also removed some deformed blobs, because erosion is the first operator in opening. Closing, though, could possibly repair cut segments. To find the average area of the cells, I selected rows 1, 3 and 4 of the opened images, since the separation between the cells is best defined there. Using bwlabel() to separate the connected clusters of data, the average area is 520 px per cell.


Finally, we tackle the 2nd image for this activity. This time, our image has 5 cells that are bigger than the rest. To separate these, we implement this process:

  1. Convert image to black and white
  2. Perform opening transform
  3. Using bwlabel(), obtain the average cell size
  4. Those clusters that exceed the average value are zeroed out.
This method isolates the oversized "cancerous" cells, which can then be easily identified by comparing the result against the original image.
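Steps 3 and 4 can be sketched in plain Python as below. This is not the Scilab bwlabel() call: label_blobs is a hypothetical stand-in that uses a 4-connected flood fill, and the tiny test image and the use of the plain mean as threshold are assumptions for illustration.

```python
def label_blobs(binary):
    """Return the connected blobs of a binary image as lists of (row, col)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                stack = [(r, c)]
                seen[r][c] = True
                blob = []
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                blobs.append(blob)
    return blobs


def split_by_area(binary, threshold):
    """Return (kept, removed): blobs up to the threshold and blobs above it."""
    rows, cols = len(binary), len(binary[0])
    kept = [[0] * cols for _ in range(rows)]
    removed = [[0] * cols for _ in range(rows)]
    for blob in label_blobs(binary):
        target = removed if len(blob) > threshold else kept
        for y, x in blob:
            target[y][x] = 1
    return kept, removed


# Toy image: two "normal" cells of area 2 and one "big" cell of area 6.
cells = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
]
areas = sorted(len(blob) for blob in label_blobs(cells))
average_area = sum(areas) / len(areas)
kept, removed = split_by_area(cells, average_area)
```

Overlapping cells merge into a single blob under this labeling, so they also exceed the threshold, which is the same side effect seen in the filtered result.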

Figure 3. Image a) after and b) before operating open. The pattern was a circle with a radius of 14 px.


Fig. 3 shows the transformed image. The transformation actually cleaned the black and white converted image, because there were some remnant white blots due to the thresholding.

Figure 4. a) Before and b) after filtering the opened image.

Fig. 4 shows the result of the filtering. The colored blobs in Fig. 4a indicate those that are removed, as can be observed in Fig. 4b. The red ones are the supposed "cancerous" cells, while the green ones are those that are removed due to overlapping. The filtering was a partial success: it was able to sort out all 5 of the target "big, cancerous" cells, but the overlapping blobs were also removed. I suggest that, in the future, edge detection be used to discriminate individual cells among a collection of overlapping ones.

  Self-Assessment: 9/10 

Saturday, July 30, 2011

IX

Morphological operations use set theory to manipulate matrices. Since we now know that images are just matrices of values with layers of channels, it is natural that algorithms can be devised to perform morphological operations on pictures.

This activity deals primarily with some basic shape alteration and recognition. For basic operations, binary "flattened" versions of multi-channeled images are used. To start off, let's examine how the 2 basic operations, erosion and dilation, work.

Note: Since the images are in their binary form, zeros are considered as background while ones are the object. The operations ignore the background, which is useful since the operations only work if the matrices being compared have the same dimensions.

First, matrix A (the one containing the original pattern) is scanned with another matrix B (the one containing the mask). Then, a new zero matrix C (the transformed image) with the same dimensions as matrices A & B is filled in depending on the operation used:


  • erosion: Every anchor-point coordinate at which the mask is entirely enclosed by the object is set to one in matrix C.

Figure 1. Erosion. The anchor point is the center of matrix B.


  • dilation: Every anchor-point coordinate at which at least one element of the mask intersects the object is set to one.

Figure 2. Dilation. The anchor point is the center of matrix B.

Thus, the anchor element of the mask in matrix B determines how the transformed image will be shifted in matrix C with respect to matrix A.
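The two rules above can be written out directly. The sketch below is plain Python, not Scilab's erode() and dilate(), and it follows the wording of the definitions in this post (mask fully enclosed for erosion, at least one overlap for dilation). The function names and test patterns are hypothetical, and pixels outside the image are treated as background.

```python
def mask_offsets(mask, anchor):
    """Offsets of the mask's ones relative to its anchor point."""
    anchor_row, anchor_col = anchor
    return [(i - anchor_row, j - anchor_col)
            for i, row in enumerate(mask)
            for j, value in enumerate(row) if value]


def overlaps(image, r, c, offsets):
    """Yield True for each mask element that lands on an object pixel."""
    rows, cols = len(image), len(image[0])
    for dr, dc in offsets:
        y, x = r + dr, c + dc
        yield 0 <= y < rows and 0 <= x < cols and image[y][x] == 1


def erode(image, mask, anchor):
    """Set the anchor position to 1 where the whole mask fits in the object."""
    offsets = mask_offsets(mask, anchor)
    return [[1 if all(overlaps(image, r, c, offsets)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def dilate(image, mask, anchor):
    """Set the anchor position to 1 where any mask element hits the object."""
    offsets = mask_offsets(mask, anchor)
    return [[1 if any(overlaps(image, r, c, offsets)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


# 5x5 square object inside a 7x7 frame, and a 5x5 cross with 1-pixel arms.
square = [[1 if 1 <= i <= 5 and 1 <= j <= 5 else 0 for j in range(7)]
          for i in range(7)]
cross = [[1 if i == 2 or j == 2 else 0 for j in range(5)] for i in range(5)]
mask_2x2 = [[1, 1], [1, 1]]

eroded_square = erode(square, mask_2x2, anchor=(0, 0))
dilated_square = dilate(square, mask_2x2, anchor=(0, 0))
eroded_cross = erode(cross, mask_2x2, anchor=(0, 0))
```

With the top-left anchor, the 5x5 square erodes to a 4x4 block and dilates to a 6x6 block, both shifted toward the top-left, while the thin cross erodes to a blank image because a 2x2 mask never fits inside it.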

Using 4 original patterns and 5 masks, erosion and dilation were applied and their effects were observed. The anchor points for the masks are as follows:
  1. 2x2 square: Top-left pixel
  2. 2x1: Top pixel
  3. 1x2: Left pixel
  4. 3x3 cross: Top-most pixel 


Figure 3. Original patterns: 5x5 square, 3x4 right triangle, 10x10 square annulus 2 pixels thick, 5x5 cross
Masks: 2x2 square, 2x1, 1x2, 3x3 cross and a 2-pixel-long diagonal.



erode()

Figure 4. My hand drawn predictions for erosion.


Figure 5. Scilab's erode() operator results.


My prediction would've been perfect if not for my careless error on the diagonal mask of the square annulus. However, I now fully understand how erosion works. Since erosion "trims" by fitting the mask entirely, it is possible to have completely blank images. We can see this with the cross mask for the 3x4 triangle and the 2x2 mask for the 5x5 cross patterns.

dilate()
Figure 6. My hand-drawn predictions for dilation.

Figure 7. Scilab's dilate() operator results.

Again, like with my erosion predictions, this would've been perfect if not for the 1x2 mask of the square annulus.


thin() & skel()

Lastly, I examine the thin() and skel() operators of Scilab. These are more complex than the erode() and dilate() operators. From the help file, applying thin() to an image of text produces:

Figure 8. Above: Original image. Below: thin() results

thin() seems to trace lines and curves by "thinning" them until they are only one pixel wide. The deviations from straight lines are a consequence of not using a perfectly binary image. The image above was simply converted using im2bw(), and that conversion left some lines not quite straight after thin() was applied.

As we can see, this may be problematic when we have lots of line nodes in our image. For this, we use the more complex skel(). 

Figure 9. L-R: Original image, result of skel() superimposed with the original and the distance transform.

skel() successfully traced a fairly rounded and thick image. These characteristics would have resulted in a poor trace had we used thin() on the image. skel() seems to average the whole network of lines and deduce the "skeletal" frame of the image. As such, it also has a distance mapping output. This seems to be a pixel population distribution map of the image with respect to skel()'s traced path.

This was a good introductory activity for morphological transforms of images.


  Self-Assessment: 9/10   

Saturday, July 23, 2011

VIII

We can enhance or filter out unwanted frequencies of an image by removing them in the Fourier map and then reapplying the transform to obtain the final image.

Note: The reconstructed images are rotated 180 degrees (with respect to the original) due to the FT.

8A. Convolution Theorem

Figure 1. L-R: Two-dot binary image and its FT.

Fig. 1 is an example of how taking the FT of the FT of an image brings back the original image. If we swap the image labels (i.e., if we take the right image as the original one), recall that the 2 dots represent the frequency present in that image.



Figure 2. R-L: Dots were replaced with circles of increasing radii. Top-Bottom: Original image and its FT.

Fig. 2 shows that as the circles grow, the overall size of their FT diminishes. Why? This is because of the now 2D nature of our image. As the circles increase in size, the image becomes, in a Fourier sense, more constant and less repetitive. We can note that the black lines in the FT are remnants of the 1D layout (the centers of the circles are still on the x-axis) and the concentric light bands are the 2D components of the circles.

Figure 3. R-L: Dots were replaced with squares of increasing areas. Top-Bottom: Original image and its FT.


Fig. 3 shows behavior similar to Fig. 2. The shape of the squares is reflected in their FT.

Figure 4. R-L: FT of circles with Gaussian intensity distribution (increasing variance)

Just as with Fig. 2, as the size of the Gaussian circles increases, the radius of the resulting FT pattern decreases. However, due to the distribution of the intensity, the resulting FT also has less distinct concentric rings.

Figure 5. Convolution of 10 randomly placed dots and a random 3x3 matrix

Fig. 5 uses convolution. The result is not very different from the original image; the only difference is that the dots became broader. The 3x3 matrix appears to have been copied onto each dot location, as expected from the convolution (written here as *) of a function f(t) with a Dirac delta:

f(t) * δ(t - t0) = f(t - t0) (1)

Eq. 1 shows that the convolution causes f(t) to appear at the location of the Dirac delta. Fig. 5 has white dots (1 pixel in size) that can be considered Dirac deltas, so the result of imconv() reproduces the 3x3 mask at the location of each of the 10 random dots.
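This behavior is easy to verify on a small example. The sketch below uses a direct 2D convolution in plain Python as a stand-in for imconv(); the function name convolve2d, the zero padding at the borders, and the kernel values are assumptions for illustration.

```python
def convolve2d(image, kernel):
    """Direct 2D convolution, same output size, zero outside the image."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    center_r, center_c = krows // 2, kcols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if image[y][x] == 0:
                continue  # background pixels contribute nothing
            # Each nonzero pixel stamps a scaled copy of the kernel around itself.
            for i in range(krows):
                for j in range(kcols):
                    ty, tx = y + i - center_r, x + j - center_c
                    if 0 <= ty < rows and 0 <= tx < cols:
                        out[ty][tx] += image[y][x] * kernel[i][j]
    return out


# A single white pixel (a discrete Dirac delta) at row 3, column 3.
image = [[0] * 7 for _ in range(7)]
image[3][3] = 1
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

result = convolve2d(image, kernel)
patch = [row[2:5] for row in result[2:5]]  # 3x3 neighborhood of the dot
```

The 3x3 neighborhood around the dot ends up equal to the kernel itself, and with several dots each one receives its own copy.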

Figure 6. L-R: FT of equally spaced white pixels. (5, 10, 50, 100 & 200 pixel separation on both x and y axis, respectively)


Fig. 6 further reinforces our first-hand experience that the FT lives in frequency space. As the "wavelength" is increased, the frequency decreases, so the separation of the dots in the corresponding FT decreases, too.
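The same inverse relation shows up in a 1D analogue, sketched below in plain Python with a direct discrete Fourier transform (slow but dependency-free). The signal length of 40 samples, the spacings, and the function names are assumptions, and the figure itself is of course 2D.

```python
import cmath


def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform, computed directly."""
    n = len(signal)
    return [abs(sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n)
                    for k in range(n)))
            for f in range(n)]


def comb(n, spacing):
    """Signal of length n with a unit impulse every `spacing` samples."""
    return [1.0 if k % spacing == 0 else 0.0 for k in range(n)]


def peak_positions(magnitude):
    """Frequency bins whose magnitude is at least half of the maximum."""
    top = max(magnitude)
    return [f for f, m in enumerate(magnitude) if m > 0.5 * top]


peaks_spacing_5 = peak_positions(dft_magnitude(comb(40, 5)))
peaks_spacing_10 = peak_positions(dft_magnitude(comb(40, 10)))
```

Impulses 5 samples apart give peaks every 8 bins, while impulses 10 samples apart give peaks every 4 bins: doubling the spacing in the signal halves the spacing in its FT.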

8B. Ridge Enhancement

This time, I will do a more practical application of Fourier map knowledge. Since fingerprints have a repetitive structure, filtering in the Fourier map may enhance their images.
Figure 7. L-R: My own fingerprint and its FT


From Fig. 7, we can see that there is a lot of noise at the radial extremes of the FT. A prominent halo can be distinguished in the middle (with the DC term at the origin). I tried to black out the noisy parts and retain the middle parts, and this is what I got:
 
Figure 7. L-R: Filtered FT (top) and their respective reconstructed images (bottom)

Fig. 7 shows how filtering the signal in the Fourier map affects the quality of the reconstructed image. When the mask covers the "halo" signal, the reconstructed image suffers in quality; the ridges become indistinguishable. The leftmost filter works well, although the clarity could have been better had I removed the DC component (the middle spot).

8C. Line Removal

Now, let's try some more basic filtering: line removal.

Figure 7. Top: FTs of the original image (left) and its filtered form (right). Bottom: Corresponding reconstructed images


Fig. 7 shows the results of line removal. The lines ultimately became less prominent. The position of the mask is based on my previous work with FTs: if a pattern persists along a certain dimension, it also shows up in the Fourier map. Since the lines were along the image's x-axis, the FT had reflections on its x-axis, too.

8D. Weave Removal

Finally, let's do masking for 2D signals.


Figure 8. Top: FTs of the original image (left) and its filtered form (right). Bottom: Corresponding reconstructed images


Similar to Fig. 7, filtering removed the blotch patterns of the original image. Note that peaks in the FT that have both x & y components signify angled patterns. These can be thought of as the combination of the individual x & y patterns viewed at a certain angle. Comparing this with Fig. 7, we see that since the weave pattern is repetitive in 2D, there are bright spots on both the x & y axes of the FT image. Thus, blocking these signals, and those that have both x & y components, improves our reconstructed image.

To further illustrate that these are indeed the FT components of the weave pattern, I have reconstructed the mask using FT:


Figure 9. Top-Bottom: Mask and its FT

Fig. 9 shows the inverted mask and its FT. The mask was inverted because the original was meant to block out the signals. Inverting it approximates an FT that has peaks at the formerly "masked" areas. The FT clearly reveals that it is indeed the weave pattern seen in Fig. 8.

This activity has again increased my insight on the nature of FTs and their possible applications.

Self-Assessment: 10/10