Experiments with concepts of Image Processing

Car Pic in Matplotlib

I was using MacOS’s Preview to sign a PDF document and came across this amazing feature that lets you create a digital version of your signature by clicking a photo of it through your webcam. This intrigued me towards thinking what kind of technology goes into building a nifty feature like that. I haven’t been able to get all my answers yet but the following are a list of few things that I have learnt till now.

Setting up the Environment

I have primarily used Python 2.7.x for the experimentation. The following libraries are required :

- mahotas
- scikit-image
- matplotlib
- numpy

These libraries can be found on pip or any such python package manager.

Reading an image into memory

Image can be read by using mahotas.

1
2
3
import mahotas as mh
img = mh.imread('image.jpg')

The image is stored in the form of a multidimensional numpy array of pixels. This is useful when further processing is done on the image.

Displaying the image using pyplot

Displaying an image through various stages of processing can be useful and give insights into how the process affects the image.

1
2
import matplotlib.pyplot as plt
plt.imshow(img)

Converting an image from RGB to Gray scale

It’s difficult to handle a colour image during processing because the image becomes a 3-Dimensional array, this is usually solved by converting the image to a gray scale (“removing” the third dimension) where varying intensity of the gray colour represents the image.

1
2
from skimage import rgb2gray
img_gray = rgb2gray(img)

This function returns a numpy array of dtype float64. Care should be taken to convert it back into integer before performing tasks that require integer type values.

Converting an image array from float64 to uint8

Through the stages of processing, data type casting is a much used technique because of the varied kind of algorithms used.

1
2
import numpy as np
img_gray_int = (img_gray * 255).astype(np.uint8)
  • np.uint8 : Unsigned integer (0 to 255)
  • float64 : Double precision float: sign bit, 11 bits exponent, 52 bits mantissa

Experiment 1 : Object Detection

In order to detect an object, its borders should be detected first. For that purpose we use Gradient Vectors of pixels of an image. Roughly speaking, gradient vector is the change in x and y direction” of a pixel. Chris McCormick explains it beautifully in his blog.

Gradient vector can be calculated by using numpy’s gradient() method. This method returns two vectors of change in gradient x and gradient y respectively.

1
2
3
4
5
6
#X and Y gradients
gx, gy = np.gradient(img_gray)
#Calculating magnitude of the gradient vector
g = (gx ** 2 + gy ** 2) ** .5
#plotting g
plt.imshow(g)

Since gradient vector is the "change in x and y direction", the change will be maximum around borders in an image
The following was the outcome of the experiment:

Original Car Picture

Car in Gray Scale

Car Image after plotting g ~ magnitude of gradient vector shows borders

As can be seen we extract useful features of the image, step by step removing/ignoring the information we don’t need.

We followed the following steps :

  • Removed the colours by converting the image to gray scale
  • Replaced the pixel values by calculating pixel gradient
  • Calculated the gradient vector’s magnitude by the formula :
    g = (gx ** 2 + gy ** 2) ** .5
  • On plotting this magnitude, we can an image with borders clearly marked.