Abstract: This paper proposes a new data compression technique based on Lempel-Ziv-Welch (LZW) coding and block sorting. Block sorting is first performed on the input data to produce permuted data. LZW coding is then applied to the permuted data to produce a more compressed output. Although block sorting takes some extra time (a negligible amount), it improves the performance of LZW compression. The proposed model is compared with plain LZW compression.

Keywords: Block sorting, cyclic rotation, permuted string, reverse block sorting, LZW.

I. INTRODUCTION

Data compression algorithms are developed to reduce redundancy in data representation in order to decrease data storage requirements. They are broadly divided into two categories: lossless and lossy. LZW compression is very popular in the field of lossless data compression. In this paper a modification of LZW compression is devised through block sorting [3], which is a technique for producing permuted data. The paper is organized as follows: we first describe block sorting and then the LZW coding technique. The proposed model and the experimental results follow in succession and show the superiority of the proposed technique over LZW compression.

II. LITERATURE SURVEY

A. Basic Concepts about Block Sorting

Block sorting [3] is a technique for making text files more amenable to compression by permuting the input file so that characters that appear in similar contexts are clustered together in the output.

Figure 1 shows how block sorting would be performed on the short text mississippi. First, all the cyclic rotations of the text are produced. Next, the characters are sorted into the order of their contexts. Finally, the permuted characters are transmitted. For example, the complete permuted text for the string mississippi is pssmipissii (Figure 1(c)).

It is also necessary to transmit the position in the permuted text of the first character of the original text. In the case of the example in Figure 1, we transmit the number four, since the letter m, which is the first character of the input, appears at position four in the permuted string. Therefore, the output of the block sorting for the string mississippi is the pair {pssmipissii, 4}.

Remarkably, it is possible, given only this output, to reconstruct the original text. This is possible because the matrix in Figure 1 is constructed using cyclic rotations; that is, the characters in the last column of Figure 1(b) cyclically precede those in the first column. Figure 1(d) shows this relationship more clearly: the characters in the left-hand column (which are taken from the last column of the sorted matrix) immediately precede those in the right-hand column (the first column of the sorted matrix). If we refer to the first and last columns of the sorted matrix as F and L respectively, then L is the output from the block sorting, and F can be obtained by the decoder simply by sorting the characters of L. Then we see that each character in L is followed by the corresponding character in F.

In particular, we know that in this case L[4] is the first character of the text; therefore, F[4] (the letter i) must be the second character. We must now consider the problem of locating the next character to decode. We know that each character in F must correspond to a character in L, since the two strings are merely permutations of one another. However, the letter i occurs four times in L. Which one corresponds to the i that has just been decoded (the fourth i in F)? To answer this question, we must observe that each group of characters in F occurs in the same order in L. For example, the third s in F corresponds to the third s in L, and so on. Since we are dealing in this instance with the fourth i in F, this corresponds to the fourth i in L (L[11]), which, we then discover, is followed by an s, and so forth. Decoding the rest of the string in a similar manner gives the order 4, 11, 9, 3, 10, 8, 2, 7, 6, 1, 5.
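The forward and reverse block sorting described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names `block_sort` and `reverse_block_sort` are hypothetical, the rotations are sorted naively (so memory use grows quadratically with the input length), and the input is assumed to be a non-empty string. Positions are 1-based to match the text.

```python
def block_sort(text):
    """Return (L, pos): the permuted string and the 1-based position
    in L of the first character of `text` (assumed non-empty)."""
    n = len(text)
    # All cyclic rotations, sorted lexicographically (Figure 1(a) -> 1(b)).
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    # L is the last column of the sorted matrix.
    last = "".join(row[-1] for row in rotations)
    # The row that starts at text[1] ends with text[0].
    pos = rotations.index(text[1:] + text[:1]) + 1
    return last, pos


def reverse_block_sort(last, pos):
    """Rebuild the original text from L and the transmitted position."""
    n = len(last)
    # Stable sort of the positions of L by character: order[j] is the
    # position in L of the character found at position j of F, so the
    # k-th occurrence of a letter in F maps to its k-th occurrence in L.
    order = sorted(range(n), key=lambda i: last[i])
    j = pos - 1                      # switch to 0-based indexing
    out = []
    for _ in range(n):
        out.append(last[j])          # L[j] is the next decoded character
        j = order[j]                 # F[j] follows it; find F[j] in L
    return "".join(out)


permuted, pos = block_sort("mississippi")
print(permuted, pos)                        # pssmipissii 4
print(reverse_block_sort(permuted, pos))    # mississippi
```

The decoder never builds the sorted matrix; it only needs L, the column F implied by sorting L, and the transmitted position, which is the property the text relies on.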
        (a)                        (b)
                              F              L
  1  mississippi          1   i  mississip   p
  2  ississippim          2   i  ppimissis   s
  3  ssissippimi          3   i  ssippimis   s
  4  sissippimis          4   i  ssissippi   m
  5  issippimiss          5   m  ississipp   i
  6  ssippimissi          6   p  imississi   p
  7  sippimissis          7   p  pimississ   i
  8  ippimississ          8   s  ippimissi   s
  9  ppimississi          9   s  issippimi   s
 10  pimississip         10   s  sippimiss   i
 11  imississipp         11   s  sissippim   i

        (c)                        (d)
      L                        L     F
  1   p                    1   p     i
  2   s                    2   s     i
  3   s                    3   s     i
  4   m*                   4   m*    i
  5   i                    5   i     m
  6   p                    6   p     p
  7   i                    7   i     p
  8   s                    8   s     s
  9   s                    9   s     s
 10   i                   10   i     s
 11   i                   11   i     s

Figure 1: Block sorting of the string mississippi: (a) rotations of the string; (b) sorted matrix; (c) permuted string (last column of the sorted matrix); (d) permuted string and sorted string.

B. Basic Concepts about LZW Compression

In the LZW compression algorithm [1, 2, 4], the input file to be compressed is read character by character, and the characters are combined to form strings. The process continues until the end of the file is reached. Every new string is assigned a code and stored in the Code table, so that the string can be referred to by that code when it is repeated. The codes are assigned from 256 onwards, since the ASCII character set already occupies the 256 values 0-255. The decompression algorithm expands the compressed file: the file created during compression is read and expanded. This decompression process does not require the Code table built during the compression.

The 1st and 2nd characters are combined to form a string, which is stored in the Code table. The code 256 (100h) is assigned to this first new string. Then the 2nd and 3rd characters are combined, and if that string is not available in the Code table, it is assigned a new code and stored in the Code table. Thus we build the Code table with every new string. When the same string is read again, the code already stored in the table is used. Compression occurs when a single code is output instead of a set of characters [1, 5].

Extended ASCII holds only 256 characters (0 to 255) [4, 5], and it requires just 8 bits to store each character. For building the Code table, however, we have to extend the 8 bits by a few more bits to hold 256 (100h) and above. If we extend it to 12 bits, then we can store up to 4096 elements in the table. So each element that we store is converted to a 12-bit number.

For example, to store A (dec 65, hex 41), T (dec 84, hex 54), O (dec 79, hex 4F) and Z (dec 90, hex 5A), we have to store them in bytes as 04, 10, 54, 04, F0, 5A, because 12 bits are allotted to each character. Consider the string ATOZOFC. It takes 7 x 8 = 56 bits. If a code such as 400 (190h) is assigned to it, it will take only 12 bits instead of 56 bits.

Example
Input string: ATOZOFCATOZOFCATOZOFC

Characters   String Stored /   Process in Table          In file
Read         Retrieved
A                                                        Store
T            AT                Store                     Store
O            TO                Store                     Store
Z            OZ                Store                     Store
O            ZO                Store                     Store
F            OF                Store                     Store
C            FC                Store                     Store
A            CA                Store                     -
T            AT                Retrieve Relevant Code    Store
O            ATO               Store                     -
Z            OZ                Retrieve Relevant Code    Store
O            OZO               Store                     -
F            OF                Retrieve Relevant Code    Store
C            OFC               Store                     -
A            CA                Retrieve Relevant Code    Store
T            CAT               Store                     -
O            TO                Retrieve Relevant Code    Store
Z            TOZ               Store                     -
O            ZO                Retrieve Relevant Code    Store
F            ZOF               Store                     -
C            FC                Retrieve Relevant Code    Store

In this example string, the first character A is read and then the second character T. The two characters are concatenated as AT and a code is assigned to the result. The code is stored in the Code table. Since this is the first string that
is new to the table, it is assigned 256 (100h). Then the second and the third characters are concatenated to form another new string, TO. This string is also new to the Code table, so the table expands to accommodate it and it is assigned the next code, 257 (101h). Thus, whenever a new string is formed by concatenation, it is assigned a code and the Code table is built. The table expands until the code reaches 4096 (since we have assigned 12 bits) or the end of the file is reached. When a set of characters that is already stored in the table is read again, the code stored in the Code table is used for it. The output code is stored according to the number of bits specified by the program. In other words, if we have extended the width from 8 to 12 bits, then a character that is stored in 8 bits has to be adjusted so that it is stored in 12-bit format.

C. Basic Concepts about LZW Decompression

The compressed file is read byte by byte [1, 2]. The bytes are concatenated according to the number of bits we have specified. For example, since we have used 12 bits for storing the elements, we have to read the first 2 bytes and take the first 12 bits from those 16 bits. Using these bits, the Code table is built again without the Code table previously created during the compression. The remaining 4 bits from the previous 2 bytes and the next byte are used to form the next code in the string table. Thus we can build the Code table and use it for decompression. The decompression algorithm builds its own Code table, and it will be the same as the table created during the compression. The decompression algorithm refers to this newly created Code table and not to the Code table created during the compression. This is the main advantage of the algorithm.

[...] is also built concurrently when each new string is read. When we read 100, 102, etc., we can refer to the relevant code in the table and output the corresponding string to the file. For example, when we reach the 4th set of characters and read 04, 31 and 00, they must be converted to the 12-bit form 043 and 100, which refer to codes in the table and output the strings C and AT respectively. Thus we can recover all the characters without knowing the previous Code table.

III. A NEW MODEL FOR COMPRESSING DATA

Plain LZW works directly on the input data to produce compressed output. In the proposed model, LZW works on block-sorted data to produce a more compressed output. The proposed model is given in Figure 2.

(a) Input data -> Block sorting -> LZW compression -> Compressed data
(b) Compressed data -> LZW decompression -> Reverse block sorting -> Source data

Figure 2: Proposed model: (a) compression; (b) decompression.
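The LZW stage of the model (Sections II-B and II-C) can be sketched in Python as follows. It is a minimal sketch under stated assumptions rather than the authors' code: the names `lzw_compress`, `lzw_decompress`, `pack12` and `unpack12` are hypothetical, the table simply stops growing at 4096 entries, and 12-bit codes are packed most significant bits first, which is the layout implied by the byte sequence 04, 10, 54, 04, F0, 5A in Section II-B. In the proposed model the input to `lzw_compress` would be the block-sorted string rather than the raw data.

```python
def lzw_compress(data, max_codes=4096):
    """Return the list of LZW codes for a bytes object."""
    table = {bytes([i]): i for i in range(256)}   # codes 0-255: single bytes
    codes, w = [], b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                        # keep extending the current string
        else:
            codes.append(table[w])        # emit code of longest known string
            if len(table) < max_codes:
                table[wc] = len(table)    # new strings get 256, 257, ...
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decompress(codes, max_codes=4096):
    """Rebuild the bytes, recreating the Code table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):          # code for the entry being defined
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        if len(table) < max_codes:
            table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


def pack12(codes):
    """Pack 12-bit codes into bytes: two codes per three bytes."""
    out = bytearray()
    for i in range(0, len(codes) - 1, 2):
        a, b = codes[i], codes[i + 1]
        out += bytes([a >> 4, ((a & 0xF) << 4) | (b >> 8), b & 0xFF])
    if len(codes) % 2:                    # odd count: last code uses 2 bytes
        a = codes[-1]
        out += bytes([a >> 4, (a & 0xF) << 4])
    return bytes(out)


def unpack12(packed):
    """Inverse of pack12."""
    codes = []
    for i in range(0, len(packed) - 2, 3):
        b0, b1, b2 = packed[i:i + 3]
        codes += [(b0 << 4) | (b1 >> 4), ((b1 & 0xF) << 8) | b2]
    if len(packed) % 3 == 2:
        b0, b1 = packed[-2:]
        codes.append((b0 << 4) | (b1 >> 4))
    return codes


data = b"ATOZOFCATOZOFCATOZOFC"
codes = lzw_compress(data)
packed = pack12(codes)
print([hex(c) for c in codes[7:9]])   # ['0x100', '0x102']
print(packed[9:12].hex())             # 043100
assert lzw_decompress(unpack12(packed)) == data
```

The decoder receives only the packed codes; it rebuilds the same table one step behind the encoder, which is why the Code table from compression does not need to be transmitted.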
V. CONCLUSION

The proposed model modifies LZW compression [1, 2] by first performing block sorting [3] on the input data and then performing all the operations as in [2]. The results have shown that the method achieves a better compression ratio at the cost of an increase in compression and decompression time. This increase in compression and decompression time is negligible considering the improvement in compression ratio.

REFERENCES

[1] Welch, T. A., A technique for high-performance data compression, Computer, Vol. 17, pp. 8-19, June 1984.
[2] Jacob Ziv and Abraham Lempel, Compression of Individual Sequences Via Variable-Rate Coding, IEEE Transactions on Information Theory, September 1978.
[3] Burrows, M. and Wheeler, D. J. (1994), A Block-sorting Lossless Data Compression Algorithm, SRC Research Report 124, Digital Systems Research Center, Palo Alto, CA, gatekeeper.doc.com, /pub/DEC/SRC/research-reports/SRC-124.ps.Z.
[4] http://marknelson.us/1989/10/01/lzw-data-compression
[5] http://en.wikipedia.org/wiki/LZW#References