To get better acquainted with file types, I collected images from the web for each of the image classifications in Activity 3 and inspected their properties using both Windows and SciLab's imfinfo().
Figure 1. Binary image properties
Figure 2. Grayscale image properties
Figure 3. True Color image properties
Figure 4. Indexed image properties
Figure 5. HDR image properties
There are some discrepancies between the bit depths reported by Windows and by SciLab. This could be because I'm using an older version of SciLab: the way image headers are read may have changed since, which would explain why SciLab differs from what Windows shows.
The next figures show further image manipulation in SciLab. Fig. 6 shows my earlier True Color image (Fig. 3) converted to grayscale.
Figure 6. Converted image using im2gray() and gray_imread(). Both yield similar results.
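As a rough illustration of what a True Color to grayscale conversion does, here is a minimal Python sketch. It assumes the common BT.601 luma weights (0.299, 0.587, 0.114); I have not confirmed which weights SciLab's im2gray() or gray_imread() actually use, and the names rgb_to_gray and image_to_gray are made up for this example.

```python
# Sketch: convert RGB pixels to grayscale with a weighted sum of the channels.
# ASSUMPTION: BT.601 luma weights; SciLab's im2gray() may use different ones.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_gray(pixel):
    """Map an (r, g, b) pixel with channels in [0, 1] to one intensity in [0, 1]."""
    r, g, b = pixel
    wr, wg, wb = LUMA_WEIGHTS
    return wr * r + wg * g + wb * b


def image_to_gray(image):
    """Apply rgb_to_gray to every pixel of a row-major list-of-lists image."""
    return [[rgb_to_gray(px) for px in row] for row in image]


# Hypothetical 1x3 true color image: pure red, pure green, pure blue.
tiny = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]]
gray = image_to_gray(tiny)
```

With these weights, green contributes the most to the final intensity and blue the least, so three channels collapse into a single one per pixel.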
Figure 7. Converted image using im2bw()
This command needs a threshold argument: pixels with intensities below it are zeroed out (set to black). Notice the 0.3 value. Knowing the histogram could give insight into a better value, but for my image I simply picked 0.3 because it gave a satisfactory black and white image.
To give some insight into the application of image processing in data analysis, I tried a background subtraction on my A1 graph. However, my graph is already clean, so the effect is not pronounced.
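The thresholding idea can be sketched in a few lines of Python. This is only my illustration of the concept, not SciLab's implementation: the function name to_black_and_white and the sample values are hypothetical, and I am assuming that a pixel exactly equal to the threshold goes to black, which im2bw() may handle differently.

```python
def to_black_and_white(gray, threshold):
    """Return a binary image: 1 (white) where intensity > threshold, else 0 (black).

    ASSUMPTION: intensities are floats in [0, 1] and the comparison is strict;
    the boundary behavior of SciLab's im2bw() is not confirmed here.
    """
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# Hypothetical 2x3 grayscale image with intensities in [0, 1].
sample = [
    [0.10, 0.25, 0.30],
    [0.45, 0.80, 0.95],
]

bw = to_black_and_white(sample, 0.3)
```

With the 0.3 threshold, the first row (all values at or below 0.3) turns black and the second row turns white.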
Figure 8. Grayscale converted image [source]
Figure 9. histplot() result of above image
We see from Fig. 8 that the visual change from the original is negligible. I then obtained the histogram using histplot() (Fig. 9) and studied the grayscale distribution. The aim is to isolate certain parts of the image histogram and threshold their values to become either white or black. However, since my image is particularly "smooth", as we can see in its histogram, there is little change in the final result.
Figure 10. Final black and white converted image
Figure 11. Top to bottom: Original, im2bw() transforms with 0.8, 0.7, 0.6 and 0.5 thresholds, respectively.
Figure 12. histplot() of the original image
As we can see, lowering the threshold improves the clarity of the actual data. Fig. 12 shows that most of the pixel values lie between 0.8 and 0.95 in intensity. By examining the behavior of the graphs, I further understood that im2bw() works by setting pixels with intensity values (as read from the histplot()) larger than the threshold to 1 (white) and those with smaller values to 0 (black), hence producing the black and white image. Thus, by moving the threshold farther below the 0.8 to 0.95 range, we white out most of the pixels in the picture, which are actually noise. We retain as black only the small number of pixels (the area under the curve from 0 to ~0.65) that make up our actual data lines.
Note that we also incur losses, since some pixels in our data graph have values that get whited out by the thresholding. One might also wonder about the darkening of the image produced with the 0.8 threshold. Converting the scanned image to a format with fewer channels than the original produced this effect. Lastly, in scanned images, the actual ink marks are not necessarily the darkest pixels, as we can deduce from Fig. 12.
Finally, to further illustrate the differences between file formats, I did a side-by-side comparison of one image saved in various formats.
Figure 13. Top to bottom: Original [PNG, source], zoomed-in shots in PNG, JPG, GIF and BMP formats, respectively.
From Fig. 13, I can barely notice any difference between the PNG and BMP formats, except in file size: the PNG file was less than half the size of its BMP counterpart. Of the other two formats, where information is lost, JPG is significantly better in quality than GIF and marginally smaller in file size. (Strictly speaking, GIF's compression is itself lossless; its loss comes from reducing the image to a palette of at most 256 colors.)
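To make the link between the histogram and the threshold more concrete, here is a small Python sketch. The pixel values are invented to mimic a scan whose paper background sits mostly between 0.8 and 0.95, and the helper names histogram and black_fraction are hypothetical; none of this is taken from my actual image or from SciLab's histplot().

```python
def histogram(values, bins=10):
    """Count how many intensities in [0, 1] fall into each of `bins` equal bins."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)  # clamp v == 1.0 into the last bin
        counts[index] += 1
    return counts


def black_fraction(values, threshold):
    """Fraction of pixels that a threshold would turn black (value <= threshold)."""
    black = sum(1 for v in values if v <= threshold)
    return black / len(values)


# Hypothetical scan: three dark ink pixels, two gray smudges, five bright paper pixels.
pixels = [0.22, 0.35, 0.55, 0.75, 0.78, 0.85, 0.88, 0.91, 0.92, 0.94]

counts = histogram(pixels)
fractions = {t: black_fraction(pixels, t) for t in (0.8, 0.7, 0.6, 0.5)}
```

In this toy data, a 0.8 threshold still keeps the two smudge pixels black along with the ink, while 0.7 and below keep only the ink pixels; going down to 0.5 starts to drop a lighter ink pixel as well, which mirrors the losses noted for the real graph.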
From this brief introduction to image formats, it seems that lossless formats only optimize how the image data is stored in order to minimize size. Lossy formats, on the other hand, employ various methods and algorithms to remove some data (with specific considerations, of course) so that the resulting file is smaller. The amount of data removed, the method used, and the impact on the image's appearance vary with the lossy format.
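The distinction can be sketched in Python with the standard zlib module, which implements DEFLATE (the compression scheme PNG uses). The lossy half is only a stand-in: coarse quantization of intensity levels is my own simplification and is not how JPG works. The pixel pattern and the function names are likewise made up for the example.

```python
import zlib


def lossless_roundtrip(raw):
    """Compress bytes with DEFLATE and decompress them again."""
    packed = zlib.compress(raw, 9)
    return packed, zlib.decompress(packed)


def quantize(raw, levels):
    """Lossy stand-in: snap each 8-bit value down to one of `levels` coarse steps.

    ASSUMPTION: this is a simplification for illustration, not JPG's method.
    """
    step = 256 // levels
    return bytes((b // step) * step for b in raw)


# Hypothetical 64x64 8-bit grayscale image with a varying gradient pattern.
raw = bytes((x * 3 + y * 5 + (x * y) % 7) % 256
            for y in range(64) for x in range(64))

packed, restored = lossless_roundtrip(raw)  # storage optimized, data intact
coarse = quantize(raw, 8)                   # some data deliberately thrown away
packed_coarse = zlib.compress(coarse, 9)    # less information, smaller file
```

The lossless path gives back exactly the bytes that went in, whereas the quantized version compresses to fewer bytes but can never be turned back into the original pixels.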
Self-Assessment: 10/10