Introduction
When retrieved from the Internet, digital images take a considerable amount of time to download and use a large amount of computer memory. The Haar wavelet transform that we will discuss in this application is one way of compressing digital images so they take less space when stored and transmitted. As we will see later, the word "wavelet" refers to an orthogonal basis of a certain vector space. The basic idea behind this method of compression is to treat a digital image as an array of numbers, i.e., a matrix. Each image consists of a fairly large number of little squares called pixels (picture elements). The matrix corresponding to a digital image assigns a whole number to each pixel. For example, a 256x256 pixel grayscale image is stored as a 256x256 matrix, with each element of the matrix being a whole number ranging from 0 (for black) to 255 (for white). The JPEG compression technique divides an image into 8x8 blocks and assigns a matrix to each block. One can use linear algebra techniques to maximize compression of the image while maintaining a suitable level of detail.
These will form the first four entries of the next step vector r1.
3. Subtract each average from the first entry of the corresponding pair to get the detail coefficients:
-130, -130, -80, 0.
These will form the last four entries of the next step vector r1.
4. Form the new vector:
r1 = (550, 578, 1340, 1600, -130, -130, -80, 0).
Note that the vector r1 can be obtained from r by multiplying r on the right by the matrix:

W1 =
[ 1/2   0     0     0    1/2    0     0     0  ]
[ 1/2   0     0     0   -1/2    0     0     0  ]
[  0   1/2    0     0     0    1/2    0     0  ]
[  0   1/2    0     0     0   -1/2    0     0  ]
[  0    0    1/2    0     0     0    1/2    0  ]
[  0    0    1/2    0     0     0   -1/2    0  ]
[  0    0     0    1/2    0     0     0    1/2 ]
[  0    0     0    1/2    0     0     0   -1/2 ]

The first four entries of r1 are called the approximation coefficients and the last four entries are called the detail coefficients.
For our next step, we treat the first four entries of r1 as two pairs and take their averages, as in step 1 above. This gives the first two entries of the new vector r2: 564 and 1470. These are our new approximation coefficients. The third and fourth entries of r2 are obtained by subtracting these averages from the first element of each pair. This results in the new detail coefficients: -14 and -130. The last four entries of r2 are the same as the detail coefficients of r1:

r2 = (564, 1470, -14, -130, -130, -130, -80, 0).

Here the vector r2 can be obtained from r1 by multiplying r1 on the right by the matrix:

W2 =
[ 1/2   0    1/2    0    0  0  0  0 ]
[ 1/2   0   -1/2    0    0  0  0  0 ]
[  0   1/2    0    1/2   0  0  0  0 ]
[  0   1/2    0   -1/2   0  0  0  0 ]
[  0    0     0     0    1  0  0  0 ]
[  0    0     0     0    0  1  0  0 ]
[  0    0     0     0    0  0  1  0 ]
[  0    0     0     0    0  0  0  1 ]
For the last step, average the first two entries of r2 and, as before, subtract the answer from the first entry. This results in the following vector:

r3 = (1017, -453, -14, -130, -130, -130, -80, 0).

Here r3 = r2·W3, where the matrix W3 averages and differences the first pair of entries and leaves the remaining six entries unchanged.
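The three averaging-and-differencing steps above are easy to carry out in code. The following sketch (in C++; the helper name haarStep is ours, not the article's) performs one step on a row vector, placing the pair averages first, the details after them, and carrying any earlier detail coefficients along unchanged:

```cpp
#include <vector>

// One averaging/differencing step (hypothetical helper name): the first
// `half` pairs are replaced by their averages, followed by the details
// (first entry of each pair minus the average); entries beyond position
// 2*half are carried over unchanged.
std::vector<double> haarStep(const std::vector<double>& v, int half) {
    std::vector<double> out(v.size());
    for (int i = 0; i < half; ++i) {
        double avg = (v[2*i] + v[2*i + 1]) / 2.0;
        out[i] = avg;                 // approximation coefficient
        out[half + i] = v[2*i] - avg; // detail coefficient
    }
    for (std::size_t i = 2 * half; i < v.size(); ++i)
        out[i] = v[i];
    return out;
}
```

Calling it with half = 4, then 2, then 1 reproduces the coefficients computed above: the averages 564 and 1470 and the details -14 and -130 at the second step.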
Let W = W1W2W3.
The columns of the matrix W1 form an orthogonal subset of R8 (the vector space of dimension 8 over R); that is, these columns are pairwise orthogonal (try their dot products). Therefore, they form a basis of R8. As a consequence, W1 is invertible. The same is true for W2 and W3.
As a product of invertible matrices, W is also invertible and its columns form an orthogonal basis of R8. The inverse of W is given by:
W^(-1) = W3^(-1) W2^(-1) W1^(-1).
The fact that W is invertible allows us to retrieve our image from the compressed form using the relation
r = r3 W^(-1).
Suppose that A is the matrix corresponding to a certain image. The Haar transform is
carried out by performing the above operations on each row of the matrix A and
then by repeating the same operations on the columns of the resulting matrix. The
row-transformed matrix is AW. Transforming the columns of AW is obtained by
multiplying AW on the left by the matrix WT (the transpose of W). Thus, the Haar
transform takes the matrix A and stores it as WTAW. Let S denote the transformed matrix:
S = WT A W.
Using the properties of the inverse matrix, we can retrieve our original matrix:
A = (WT)^(-1) S W^(-1).
This allows us to see the original image (decompressing the compressed image).
Let us try an example.
Example Suppose we have an 8x8 image represented by a matrix A of grayscale values.
The point of doing the Haar wavelet transform is that areas of the original matrix that contain little variation will end up as zero elements in the transformed matrix. A matrix is considered sparse if it has a high proportion of zero entries. Sparse matrices take much less memory to store. Since we cannot expect the transformed matrices always to be sparse, we decide on a non-negative threshold value known as ε, and then we let any entry in the transformed matrix whose absolute value is less than ε be reset to zero. This will leave us with a kind of sparse matrix. If ε is zero, we will not modify any of the elements.
Every time you click on an image to download it from the Internet, the source
computer recalls the Haar transformed matrix from its memory. It first sends the
overall approximation coefficients and larger detail coefficients and a bit later the
smaller detail coefficients. As your computer receives the information, it begins
reconstructing in progressively greater detail until the original image is fully
reconstructed.
Linear algebra can make the compression process faster and more efficient.
Let us first recall that an n×n square matrix A is called orthogonal if its columns form an orthonormal basis of Rn, that is, the columns of A are pairwise orthogonal and the length of each column vector is 1. Equivalently, A is orthogonal if its inverse is equal to its transpose. The latter property makes retrieving the transformed image via the equation
A = W S WT
much faster.
If A is orthogonal, then for any vector v we have (Av)·(Av) = v·(ATAv) = v·v. This in turn shows that ||Av|| = ||v||. Also, the angle is preserved when the transformation is by orthogonal matrices: recall that the cosine of the angle θ between two vectors u and v is given by
cos θ = (u·v)/(||u|| ||v||),
so, if A is an orthogonal matrix and θ is the angle between the two vectors Au and Av, then
cos θ = (Au·Av)/(||Au|| ||Av||) = (u·v)/(||u|| ||v||).
Since both magnitude and angle are preserved, there is significantly less distortion produced in the rebuilt image when an orthogonal matrix is used.
Remark If you look closely at the process we described above, you will notice that the matrix W is nothing but a change of basis for R8. In other words, the columns of W form a new basis (a very nice one) of R8. So when you multiply a vector v (written in the standard basis) of R8 by W, what you get is the coordinates of v in this new basis. Some of these coordinates can be neglected using our threshold, and this is what allows the transformed matrix to be stored more easily and transmitted more quickly.
Compression ratio If we choose our threshold value ε to be positive (i.e., greater than zero), then some entries of the transformed matrix will be reset to zero and therefore some detail will be lost when the image is decompressed. The key issue is then to choose ε wisely so that the compression is done effectively with minimum damage to the picture. Note that the compression ratio is defined as the ratio of the number of nonzero entries in the transformed matrix (S = WTAW) to the number of nonzero entries in the compressed matrix obtained from S by applying the threshold ε.
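As a small sketch of the thresholding step and the compression ratio just defined (in C++; the function name compressWithThreshold is ours, not the article's):

```cpp
#include <vector>
#include <cmath>

// Zero out entries of the transformed matrix S whose absolute value is
// below the threshold eps, and return the compression ratio:
// (number of nonzero entries before) / (number of nonzero entries after).
double compressWithThreshold(std::vector<std::vector<double>>& S, double eps) {
    int before = 0, after = 0;
    for (auto& row : S)
        for (double& x : row) {
            if (x != 0.0) ++before;
            if (std::fabs(x) < eps) x = 0.0; // apply the threshold
            if (x != 0.0) ++after;
        }
    return after > 0 ? static_cast<double>(before) / after : 0.0;
}
```

For instance, thresholding the 2x2 matrix {{100, 2}, {-3, 50}} with ε = 10 zeroes the entries 2 and -3, giving a compression ratio of 4/2 = 2.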
http://aix1.uottawa.ca/~jkhoury/haar.htm
http://www.whydomath.org/node/wavlets/hwt.html
Image Compression:
How Math Led to the JPEG2000 Standard
(a, b) → ( (b + a)/2, (b − a)/2 )
We will call the first output the average and the second output the difference.
So why would we consider sending (150,47,20,3 | 50, 3, 0, -1) instead of (100,200,44,50,20,20,4,2)? Two reasons quickly come to
mind. The differences in the transformed list tell us about the trends in the data - big differences indicate large jumps between
values while small values tell us that there is relatively little change in that portion of the input. Also, if we are interested in lossy
compression, then small differences can be converted to zero and in this way we can improve the efficiency of the coder. Suppose we converted the last two values of the transformation to zero. Then we would transmit (150, 47, 20, 3 | 50, 3, 0, 0). The recipient could invert the process and obtain the list
(150-50, 150+50, 47-3, 47+3, 20-0, 20+0, 3-0, 3+0) = (100, 200, 44, 50, 20, 20, 3, 3)
The "compressed" list is very similar to the original list!
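The forward pair transform and its inverse can be sketched in a few lines of C++ (the function names haarForward and haarInverse are ours):

```cpp
#include <vector>

// Forward Haar step: (a, b) -> ((a + b)/2, (b - a)/2).
// Averages fill the first half of the output, differences the second half.
std::vector<double> haarForward(const std::vector<double>& v) {
    std::size_t h = v.size() / 2;
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < h; ++i) {
        out[i]     = (v[2*i] + v[2*i + 1]) / 2.0; // average
        out[h + i] = (v[2*i + 1] - v[2*i]) / 2.0; // difference
    }
    return out;
}

// Inverse step: (s, d) -> (s - d, s + d).
std::vector<double> haarInverse(const std::vector<double>& y) {
    std::size_t h = y.size() / 2;
    std::vector<double> v(y.size());
    for (std::size_t i = 0; i < h; ++i) {
        v[2*i]     = y[i] - y[h + i];
        v[2*i + 1] = y[i] + y[h + i];
    }
    return v;
}
```

Running haarForward on (100, 200, 44, 50, 20, 20, 4, 2) yields (150, 47, 20, 3 | 50, 3, 0, -1), and inverting the thresholded list reproduces the near-original data shown above.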
Matrix Formulation
For an even-length list (vector) of numbers, we can also form a matrix product that computes this transformation. For the sake of
illustration, let's assume our list (vector) is length 8. If we put the averages as the first half of the output and differences as the
second half of the output, then we have the following matrix product:
W8 v =

[ 1/2  1/2   0    0    0    0    0    0  ] [ v1 ]   [ (v1+v2)/2 ]
[  0    0   1/2  1/2   0    0    0    0  ] [ v2 ]   [ (v3+v4)/2 ]
[  0    0    0    0   1/2  1/2   0    0  ] [ v3 ]   [ (v5+v6)/2 ]
[  0    0    0    0    0    0   1/2  1/2 ] [ v4 ] = [ (v7+v8)/2 ] = y
[ -1/2 1/2   0    0    0    0    0    0  ] [ v5 ]   [ (v2-v1)/2 ]
[  0    0  -1/2  1/2   0    0    0    0  ] [ v6 ]   [ (v4-v3)/2 ]
[  0    0    0    0  -1/2  1/2   0    0  ] [ v7 ]   [ (v6-v5)/2 ]
[  0    0    0    0    0    0  -1/2  1/2 ] [ v8 ]   [ (v8-v7)/2 ]

The inverse transform recovers v from y = (y1, ..., y8):

W8^(-1) y =

[ 1  0  0  0  -1  0  0  0 ] [ y1 ]   [ v1 ]
[ 1  0  0  0   1  0  0  0 ] [ y2 ]   [ v2 ]
[ 0  1  0  0   0 -1  0  0 ] [ y3 ]   [ v3 ]
[ 0  1  0  0   0  1  0  0 ] [ y4 ] = [ v4 ]
[ 0  0  1  0   0  0 -1  0 ] [ y5 ]   [ v5 ]
[ 0  0  1  0   0  0  1  0 ] [ y6 ]   [ v6 ]
[ 0  0  0  1   0  0  0 -1 ] [ y7 ]   [ v7 ]
[ 0  0  0  1   0  0  0  1 ] [ y8 ]   [ v8 ]
The matrix W8 satisfies another interesting property - we can compute the inverse by doubling the transpose! That is,
W8^(-1) = 2 W8^T
For those of you who have taken a linear algebra course, you may remember that orthogonal matrices U satisfy U^(-1) = U^T. We almost have that with our transformation. Indeed, if we rescale and construct √2·W8, we have
(√2 W8)^(-1) = (1/√2) W8^(-1) = (1/√2)(2 W8^T) = √2 W8^T = (√2 W8)^T,
so the rescaled matrix √2·W8 is orthogonal.
WN =

[ √2/2   √2/2     0      0    ...    0      0   ]
[   0      0    √2/2   √2/2   ...    0      0   ]
[                     ...                       ]
[   0      0      0      0    ...  √2/2   √2/2  ]
[ -√2/2  √2/2     0      0    ...    0      0   ]
[   0      0   -√2/2   √2/2   ...    0      0   ]
[                     ...                       ]
[   0      0      0      0    ... -√2/2   √2/2  ]

The first N/2 rows compute weighted averages of consecutive pairs and the last N/2 rows compute weighted differences; every nonzero entry is ±√2/2.
We define the Haar filter as the numbers used to form the first row of the transform matrix. That is, the Haar filter is h = (h0, h1) = (√2/2, √2/2). This filter is also called a lowpass filter - since it averages pairs of numbers, it tends to reproduce (modulo the factor √2) two values that are similar and send to 0 two numbers that are (near) opposites of each other. Note also that the sum of the filter values is √2.
We call the filter that is used to build the bottom half of the HWT a highpass filter. In this case, we have g = (g0, g1) = (-√2/2, √2/2). Highpass filters process data exactly opposite to lowpass filters. If two numbers are near in value, the highpass filter will return a value near zero. If two numbers are (near) opposites of each other, then the highpass filter will return a weighted version of one of the two numbers.
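The complementary behavior of the two filters can be seen directly by applying them to a pair of values (a C++ sketch; the function names are ours):

```cpp
#include <cmath>

// The Haar lowpass filter h = (sqrt(2)/2, sqrt(2)/2) applied to a pair:
// a weighted average.
double haarLowpass(double a, double b) {
    const double s = std::sqrt(2.0) / 2.0;
    return s * a + s * b;
}

// The Haar highpass filter g = (-sqrt(2)/2, sqrt(2)/2) applied to a pair:
// a weighted difference.
double haarHighpass(double a, double b) {
    const double s = std::sqrt(2.0) / 2.0;
    return -s * a + s * b;
}
```

Equal values give a highpass output of exactly zero, while opposite values give a lowpass output of zero - precisely the behavior described above.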
Fourier Series From the Filters
An important tool for constructing filters for discrete wavelet transformations is Fourier series. To analyze a given filter h = (h0, h1, h2, ..., hL), engineers will use the coefficients to form the Fourier series
H(ω) = h0 + h1·e^(iω) + h2·e^(2iω) + ... + hL·e^(Liω)
and then plot the absolute value of this series. It turns out that we can identify lowpass filters and highpass filters from these graphs. The series for the HWT filters are H(ω) = √2/2 + (√2/2)e^(iω) and G(ω) = -√2/2 + (√2/2)e^(iω).
[Plots of |H(ω)| and |G(ω)| omitted.]
The graph of the lowpass filter attains the value √2 at ω = 0 and H(π) = 0. The graph for the highpass filter is just the opposite - G(0) = 0 and |G(π)| = √2. This is typical of lowpass and highpass filters. We can also put other conditions on these graphs, and that is often how more sophisticated lowpass/highpass filter pairs for the DWT are defined.
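The values |H(0)| = √2, H(π) = 0, G(0) = 0, and |G(π)| = √2 can be checked numerically by evaluating the series with complex arithmetic (a C++ sketch; the function name filterMagnitude is ours):

```cpp
#include <complex>
#include <cmath>
#include <vector>

// Evaluate |H(w)| for a filter h = (h0, ..., hL), where
// H(w) = h0 + h1 e^{iw} + ... + hL e^{iLw}.
double filterMagnitude(const std::vector<double>& h, double w) {
    std::complex<double> H(0.0, 0.0);
    for (std::size_t k = 0; k < h.size(); ++k)
        H += h[k] * std::exp(std::complex<double>(0.0, k * w));
    return std::abs(H);
}
```

Evaluating the Haar lowpass filter at ω = 0 and ω = π gives (up to rounding) √2 and 0 respectively, and the highpass filter gives the opposite pair.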
HWT and Digital Images
How do we apply the HWT to a digital grayscale image? If the image is stored in matrix A with even dimensions M x N, then the
natural thing to try is to compute WMA. We can view this matrix multiplication as WM applied to each column of A so the output
should be an M x N matrix where each column is M/2 weighted averages followed by M/2 weighted differences. The plots below
illustrate the process:
We have used the Haar matrix to process the columns of image matrix A. It is desirable to process the rows of the image as well. We proceed by multiplying WMA on the right by WN^T. Transposing the wavelet matrix puts the filter coefficients in the columns, and multiplication on the right by WN^T means that we will be dotting the rows of WMA with the columns of WN^T (the rows of WN). So the two-dimensional HWT is defined as:
B = WM A WN^T
The process is illustrated below.
B = W A W^T = [ H ] A [ H^T  G^T ] = [ HA ] [ H^T  G^T ] = [ HAH^T   HAG^T ]
              [ G ]                  [ GA ]                [ GAH^T   GAG^T ]

We now see why there are four blocks in the wavelet transform. Let's look at each block individually. Note that the matrix H is constructed from the lowpass Haar filter and computes weighted averages, while G computes weighted differences.
The upper left-hand block is HAHT - HA averages columns of A and the rows of this product are averaged by multiplication with HT.
Thus the upper left-hand corner is an approximation of the entire image. In fact, it can be shown that elements in the upper left-hand
corner of the HWT can be constructed by computing weighted averages of each 2 x 2 block of the input matrix. Mathematically, the
mapping is
[ a  b ]
[ c  d ]   →   2(a + b + c + d)/4
The upper right-hand block is HAG^T - HA averages columns of A, and the rows of this product are differenced by multiplication with G^T. Thus the upper right-hand corner holds information about vertical changes in the image - large values indicate a large vertical change as we move across the image, and small values indicate little vertical change. Mathematically, the mapping is
[ a  b ]
[ c  d ]   →   2(b + d − a − c)/4
The lower left-hand block is GAH^T - GA differences columns of A, and the rows of this product are averaged by multiplication with H^T. Thus the lower left-hand corner holds information about horizontal changes in the image - large values indicate a large horizontal change as we move down the image, and small values indicate little horizontal change. Mathematically, the mapping is
[ a  b ]
[ c  d ]   →   2(c + d − a − b)/4
The lower right-hand block is GAG^T - it differences across both columns and rows, and the result is a bit harder to see. It turns out that this product measures changes along 45-degree lines: these are diagonal differences. Mathematically, the mapping is
[ a  b ]
[ c  d ]   →   2(b + c − a − d)/4
To summarize, the HWT of a digital image produces four blocks. The upper-left hand corner is an approximation or blur of the
original image. The upper-right, lower-left, and lower-right blocks measure the differences in the vertical, horizontal, and diagonal
directions, respectively.
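The four mappings above can be bundled into a single function acting on one 2x2 block (a C++ sketch; the struct and function names are ours):

```cpp
// The four HWT coefficients produced from one 2x2 block
// [[a, b], [c, d]] of the image.
struct BlockCoeffs { double approx, vert, horiz, diag; };

BlockCoeffs haarBlock(double a, double b, double c, double d) {
    BlockCoeffs r;
    r.approx = 2.0 * (a + b + c + d) / 4.0; // blur (upper left)
    r.vert   = 2.0 * (b + d - a - c) / 4.0; // vertical changes (upper right)
    r.horiz  = 2.0 * (c + d - a - b) / 4.0; // horizontal changes (lower left)
    r.diag   = 2.0 * (b + c - a - d) / 4.0; // diagonal changes (lower right)
    return r;
}
```

A constant block produces only an approximation coefficient, while a block with a vertical edge (e.g., a = c = 0, b = d = 8) lights up only the vertical-difference coefficient.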
The iterated HWT is an effective tool for conserving the energy of a digital image. The plot below shows the energy distribution for
the original image (green), one iteration of the HWT (brown), and three iterations of the HWT (orange). The horizontal scale is pixels
(there are 38,400 pixels in the thumbnail of the image). For a given pixel value p, the height represents the percentage of energy
stored in the largest p pixels of the image. Note that the HWT gets to 1 (100% of the energy) much faster than the original image
and the iterated HWT is much better than either the HWT or the original image.
Summary
The HWT is a wonderful tool for understanding how a discrete wavelet transformation works. It is not desirable in practice because the filters are too short - since each filter is length two, the HWT decouples the data to create values of the transform. In particular, each value of the transform is created from a 2x2 block of the original input. If there is a large change between, say, row 6 and row 7, the HWT will not detect it. The HWT also sends integers to irrational numbers, and for lossless image compression it is crucial that the transform send integers to integers. For these reasons, researchers developed more sophisticated filters. Be sure to check out the other subsections to learn more about other types of wavelet filters.
http://www.cs.ucf.edu/~mali/haar/
An Introduction to Wavelets and the Haar Transform
by Musawir Ali
Let's get into the details of how this dynamic set of basis functions is chosen and how the input function is transformed into wavelets.
<1, 0, 0, 0>
<0, 1, 0, 0>
<0, 0, 1, 0>
<0, 0, 0, 1>
But as you would suspect, this is not the best way of doing things. Can we do
better? The trick is to choose a basis that represents our data efficiently and
in a very compact fashion. Notice that our data is pretty uniform; in fact it is
just a constant signal of 2. We would like to exploit this uniformity. If we
choose the basis vector <1, 1, 1, 1>, we can represent our data by just one
number! We would only have to send the number 2 over the network, and our
entire data string could be reconstructed by just multiplying (or weighting)
with the basis vector <1, 1, 1, 1>. This is great, but we still need three more
basis vectors to complete our basis since the space in our example is 4
dimensional. Remember that all basis vectors have to be orthogonal (or
perpendicular). This means that if you take the dot (or scalar) product of any
two basis vectors, the result should be zero. So our task is to find a vector
that is orthogonal to <1, 1, 1, 1>. One such vector is <1, 1, -1, -1>. If you
take the dot product of these two vectors, the result is indeed zero.
Graphically, these two vectors look like this:
<1,1,1,1>
<1,1,-1,-1>
Notice that graphically these basis vectors look like waves, hence the name
wavelets. Now that we have two basis vectors, we need two more. Haar
constructed the remaining basis vectors by a process of dilation and shifting.
Dilation basically means squeezing; therefore the remaining basis vectors
were constructed by squeezing and shifting. If we squeeze the vector <1, 1, -1, -1>, we get <1, -1, 0, 0>. The 1, 1 pair gets squeezed into a single 1, and similarly the -1, -1 pair becomes a single -1. Next, we perform a shift on the resultant basis vector and get <0, 0, 1, -1>, which is our final basis vector.
Graphically, these two vectors look like this:
<1,-1,0,0>
<0,0,1,-1>
We now have a complete basis for our four dimensional space, comprised of
the following basis vectors or wavelets.
<1, 1, 1, 1>
<1, 1, -1, -1>
<1, -1, 0, 0>
<0, 0, 1, -1>
Take time to convince yourself that all four of these vectors are perpendicular
to each other (take the dot product and see if it is zero). Even though these
basis vectors are orthogonal, they are not orthonormal. However, we can
easily normalize them by calculating the magnitude of each of these vectors
and then dividing their components by that magnitude.
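Normalization is a one-liner per component: divide by the vector's magnitude. A C++ sketch (the function name normalize is ours):

```cpp
#include <vector>
#include <cmath>

// Normalize a vector: compute its magnitude (square root of the sum of
// squared components) and divide each component by it.
std::vector<double> normalize(const std::vector<double>& v) {
    double mag = 0.0;
    for (double x : v) mag += x * x;
    mag = std::sqrt(mag);
    std::vector<double> out;
    out.reserve(v.size());
    for (double x : v) out.push_back(x / mag);
    return out;
}
```

For example, <1, 1, 1, 1> normalizes to <1/2, 1/2, 1/2, 1/2> and <1, -1, 0, 0> to <1/√2, -1/√2, 0, 0>.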
Now that we have our basis, let us look at an example of how we can project an input vector onto it.
1D Haar Transform
Suppose our input vector is <4, 2, 5, 5>. To project this into wavelets, we simply take a dot product of the input vector with each of the normalized basis vectors:
dot(<4, 2, 5, 5>, <1/2, 1/2, 1/2, 1/2>) = 8
dot(<4, 2, 5, 5>, <1/2, 1/2, -1/2, -1/2>) = -2
dot(<4, 2, 5, 5>, <1/√2, -1/√2, 0, 0>) = √2
dot(<4, 2, 5, 5>, <0, 0, 1/√2, -1/√2>) = 0
Thus the input vector got transformed into <8, -2, √2, 0>. Notice the 4th component is 0! This means that we do not need the 4th basis vector; we can reconstruct our original input vector with just the first three basis vectors. In other words, we dynamically chose 3 basis vectors from a possible 4 according to our input.
 8 · <1/2, 1/2, 1/2, 1/2>    = <4, 4, 4, 4>
-2 · <1/2, 1/2, -1/2, -1/2>  = <-1, -1, 1, 1>
√2 · <1/√2, -1/√2, 0, 0>     = <1, -1, 0, 0>
adding the vectors           = <4, 2, 5, 5>
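The projection step can be written out directly as dot products with the four normalized basis vectors (a C++ sketch; the function names dot and haarProject are ours):

```cpp
#include <vector>
#include <cmath>

// Plain dot product of two equal-length vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Project a length-4 input vector onto the four normalized Haar basis
// vectors; returns the wavelet coefficients.
std::vector<double> haarProject(const std::vector<double>& v) {
    const double r = 1.0 / std::sqrt(2.0);
    const std::vector<std::vector<double>> basis = {
        {0.5, 0.5, 0.5, 0.5},
        {0.5, 0.5, -0.5, -0.5},
        {r, -r, 0.0, 0.0},
        {0.0, 0.0, r, -r}
    };
    std::vector<double> coeffs;
    for (const auto& b : basis) coeffs.push_back(dot(v, b));
    return coeffs;
}
```

Projecting <4, 2, 5, 5> reproduces the coefficients <8, -2, √2, 0> from the example above.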
no. We can use the smaller basis that we already have. In fact, we can use the simplest wavelet basis, which consists of <1/√2, 1/√2> and <1/√2, -1/√2>. These are the smallest wavelets; notice you cannot squeeze them any further. However, in choosing these smaller basis vectors for larger input, we can no longer do the Haar wavelet transform in one pass as we did earlier. We will have to recursively transform the input vector until we get to our final result. As an example, let us use the simple, 2-component basis to transform the 4-component input vector that we had in our previous example. The algorithm is outlined below, and our example is traced alongside.
Example
Input vector: <4, 2, 5, 5>
1. Split the input vector into pairs: <4, 2> and <5, 5>.
2. Dot each pair with <1/√2, 1/√2> to get the sums:
dot(<4, 2>, <1/√2, 1/√2>) = 6/√2
dot(<5, 5>, <1/√2, 1/√2>) = 10/√2
3. Dot each pair with <1/√2, -1/√2> to get the differences: √2 and 0; set them aside for the final result.
4. Collect the sums into a new vector: <6/√2, 10/√2>.
5. Go to 1.
Repeating the process on <6/√2, 10/√2> gives the sum <8> and the difference <-2>, so the final transformed vector is <8, -2, √2, 0>, as before.
This algorithm is very simple. If you think about it, all it does is take the sums and differences of every pair of numbers in the input vector and divides them by the square root of 2. Then, the process is repeated on the resultant vector of the summed terms. Following is an implementation of the 1D Haar Transform in C++:

#include <cmath>

void haar1d(double *vec, int n)
{
    int i, w = n;
    double *vecp = new double[n];

    while (w > 1)
    {
        w /= 2;
        for (i = 0; i < w; i++)
        {
            vecp[i]     = (vec[2*i] + vec[2*i + 1]) / sqrt(2.0);
            vecp[i + w] = (vec[2*i] - vec[2*i + 1]) / sqrt(2.0);
        }

        for (i = 0; i < (w*2); i++)
            vec[i] = vecp[i];
    }

    delete [] vecp;
}
2D Haar Transform
The 1D Haar Transform can be easily extended to 2D. In the 2D case, we
operate on an input matrix instead of an input vector. To transform the input
matrix, we first apply the 1D Haar transform on each row. We take the
resultant matrix, and then apply the 1D Haar transform on each column. This
gives us the final transformed matrix. The source code for both the 1D and 2D
Haar transform can be downloaded here. The 2D Haar transform is used
extensively in image compression.
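The row-then-column procedure just described can be sketched compactly (in C++; the function names haar1dLevel and haar2d are ours, and only one level of the transform is applied):

```cpp
#include <vector>
#include <cmath>

using Matrix = std::vector<std::vector<double>>;

// One level of the 1D Haar transform: sums and differences of pairs,
// each divided by sqrt(2).
std::vector<double> haar1dLevel(const std::vector<double>& v) {
    std::size_t h = v.size() / 2;
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < h; ++i) {
        out[i]     = (v[2*i] + v[2*i + 1]) / std::sqrt(2.0);
        out[h + i] = (v[2*i] - v[2*i + 1]) / std::sqrt(2.0);
    }
    return out;
}

// 2D transform: apply the 1D transform to every row, then to every
// column of the row-transformed matrix.
Matrix haar2d(Matrix A) {
    for (auto& row : A) row = haar1dLevel(row);
    std::size_t rows = A.size(), cols = A[0].size();
    for (std::size_t j = 0; j < cols; ++j) {
        std::vector<double> col(rows);
        for (std::size_t i = 0; i < rows; ++i) col[i] = A[i][j];
        col = haar1dLevel(col);
        for (std::size_t i = 0; i < rows; ++i) A[i][j] = col[i];
    }
    return A;
}
```

A constant 2x2 input transforms to a matrix whose only nonzero entry is the approximation coefficient in the upper-left corner, exactly the sparsity that makes the transform useful for compression.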