Abstract: There are a large number of compression techniques available which are capable of application-specific large compression ratios. Here we limit our scope to the LZW algorithm, which can be considered the industry standard for lossless data compression. This paper presents a modified approach to the LZW algorithm, which incorporates dynamic restructuring of the number of bits and a feedback mechanism. The level of compression depends on these factors, which are crucial to the working of LZW. LZW compression excels when confronted with data streams that contain any type of repeated strings; because of this, it does extremely well when compressing English text. Compression levels of at least 50% are typically achieved. Likewise, compressing saved screens and displays will generally show very good results.

Keywords: LZW, Feedback mechanism, Dynamic Restructure of Bits, Compression Level.

1. Introduction

Data compression [5, 8] is the process of converting an input data stream (the source stream, or the original raw data) into another data stream (the output, or the compressed stream) that has a smaller size [10, 11]. A stream is either a file or a buffer in memory. A simple characterization of data compression is that it involves transforming a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Data compression has important applications in the areas of data transmission and data storage. Many data processing applications require storage of large volumes of data, and the number of such applications is constantly increasing as the use of computers extends to new disciplines. At the same time, the proliferation of computer communication networks is resulting in massive transfers of data over communication links. Compressing data to be stored or transmitted reduces storage and/or communication costs. When the amount of data to be transmitted is reduced, the effect is that of increasing the capacity of the communication channel. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher, and thus faster, level of the storage hierarchy and reduce the load on the input/output channels of the computer system. In order to discuss the relative merits of data compression techniques, a framework for comparison must be established. There are two dimensions along which each of the schemes discussed here may be measured: algorithm complexity and amount of compression. When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends upon the number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble. In a data storage application, although the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in order for the scheme to be practical.

2. LZW Coding Technique

The original Lempel-Ziv approach to data compression was first published in 1977. Terry Welch's refinements to the algorithm were published in 1984. The algorithm is surprisingly simple. In a nutshell, LZW compression replaces strings of characters with single codes [4, 13, 14]. It does not do any analysis of the incoming text. Instead, it just adds every new string of characters it sees to a table of strings. Compression occurs when a single code is output instead of a string of characters. The code that the LZW algorithm outputs can be of any arbitrary length, but it must have more bits in it than a single character. The first 256 codes (when using eight-bit characters) are by default assigned to the standard character set. The remaining codes are assigned to strings as the algorithm proceeds. The sample program runs as shown with 12-bit codes. This means codes 0-255 refer to individual bytes, while codes 256-4095 refer to substrings.

2.1 Compression

The LZW compression algorithm in its simplest form is described below. A quick examination of the algorithm shows that LZW is always trying to output codes for strings that are already known, and each time a new code is output, a new string is added to the string table [4].
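As a concrete illustration, the table-building loop of Section 2.1 can be sketched in Python under the paper's 12-bit assumption (codes 0-255 for single bytes, 256-4095 for learned substrings). The function name and the list-of-integers output are illustrative choices, not part of the original program.

```python
# Minimal sketch of the basic LZW compression loop (fixed 12-bit code space).
def lzw_compress(data: bytes, max_codes: int = 4096) -> list[int]:
    # Codes 0-255 are pre-assigned to the single-byte character set.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    phrase = b""
    output = []
    for b in data:
        candidate = phrase + bytes([b])
        if candidate in table:
            # Longest-match search: keep extending the known string.
            phrase = candidate
        else:
            # Emit the code for the longest known string, then remember
            # the new string (until the 4096-entry table fills up).
            output.append(table[phrase])
            if next_code < max_codes:
                table[candidate] = next_code
                next_code += 1
            phrase = bytes([b])
    if phrase:
        output.append(table[phrase])
    return output
```

For example, `lzw_compress(b"ABABABA")` emits the literal codes for "A" and "B", then reuses the learned entries for "AB" and "ABA", producing four codes for seven input bytes.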
(IJCNS) International Journal of Computer and Network Security, 23
Vol. 1, No. 1, October 2009
immediately following, which could lead to an increase in the compression ratio.

4. The Proposed Compression Method

Based on the above-discussed parameters, we propose a modified approach which incorporates the following:
* The dynamic restructuring of the number of bits (based on the absolute value of the sequence number of the current output).
* A feedback mechanism.

4.1 Incorporating the Dynamic Restructuring of the Number of Bits

If restructuring of the number of bits is applied on the basis of the absolute value of the index of the current entry in the dictionary, then a significant number of bits is saved. For example, suppose we completely fill a 4096-entry dictionary. This would mean that we use 8 bits up to index 255, 9 bits up to 511, 10 bits up to 1023, 11 bits up to 2047, and 12 bits only for the range 2048-4095. Thus, in effect, the total number of bits saved as compared to the approach without dynamic restructuring of the number of bits will be 12*4096 - (8*256 + 9*256 + 10*512 + 11*1024 + 12*2048), which is equal to 3840 bits, quite an achievement. Generally the benefit will be even greater, since the input to be scanned will usually not be such that its contents exactly saturate the dictionary with no input left to scan. For several hundred kilobytes of information we can even increase the dictionary size to 15 bits. To let the decompression program know when the bit size of the output code is going to change, a special BUMP_CODE is used. This code tells the decompression program to increase the bit size immediately. Another variant on the LZW compression method is to build a phrase by concatenating the current phrase and the next character of data. This causes a quicker buildup of longer strings at the cost of a more complex data dictionary. An alternative method would be to keep track of how frequently strings are used, and to periodically flush values that are rarely used. An adaptive technique like this may be too difficult to implement in a reasonably sized program. One final technique for compressing the data is to take the LZW codes and run them through an adaptive Huffman coding filter. This will generally exploit a few more percentage points of compression, but at the cost of considerably more complexity in the code, as well as quite a bit more run time.

4.2 Using Feedback

The standard LZW uses a single dictionary, i.e. the dictionary is created only once. If it gets filled, then no more patterns can be formed, and hence the only replaceable patterns are the ones which already exist in the dictionary. In other words, once the dictionary is full, the input read is compared to the patterns within the dictionary; if matched, they are replaced by the appropriate code word, otherwise the characters are output as such. This approach, followed by the standard LZW, at times leads to a reduction in the compression ratio. To overcome the above-stated drawback we suggest dynamic feedback to the algorithm. The feedback is in the sense that we keep monitoring the compression ratio at suitable intervals; after each such interval the compression ratio is compared against a threshold value. If the compression ratio is below the threshold, the dictionary is flushed out (a FLUSH_CODE is used here) and a new one is created, which helps us to improve the compression ratio.

5. Application Constraints

The application constraints which are applicable to LZW are also applicable to our modified version. It is particularly suitable for text files, and the performance cannot be guaranteed for image files. The compression ratio becomes better with an increase in the size of the source file.

6. Results

It is somewhat difficult to characterize the results of any data compression technique. The level of compression achieved varies quite a bit depending on several factors. LZW compression excels when confronted with data streams that have any type of repeated strings; because of this, it does extremely well when compressing English text. Compression levels of at least 50% or better should be expected. Likewise, compressing saved screens and displays will generally show very good results. We applied this approach to various types of files, such as text, image, and sound, and the results are very good with respect to compression ratio. We present our experimental results in the form of a table.

Table 1: The compression effect on the size of Text files

  Type of file    Starting size (kb)    Compressed size (kb)    Compression ratio (%)
  Sample1.doc     120                   67.4                    44
  Sample2.doc     121                   64                      47
  Sample3.doc     883                   453                     49
  Sample4.doc     46.5                  20.8                    56
  Sample5.doc     103                   43.9                    58
  Sample6.doc     37.5                  9.56                    75

Compression Graph of Text File:
[Figure 1 here: bar chart comparing starting size (kb) and compressed size (kb) for the sample text files; y-axis 0-1000 kb.]

Figure 1. Effect of the size on compression for Text file
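To make Sections 4.1 and 4.2 concrete, the sketch below combines the two proposed mechanisms in one encoder: the code width grows from 9 to 12 bits as the dictionary fills (signalled with a BUMP_CODE), and the running compression ratio is checked at intervals, with the dictionary flushed via a FLUSH_CODE when the ratio drops below a threshold. The control-code values, the monitoring interval, and the threshold here are illustrative assumptions, not the paper's exact constants.

```python
# Hedged sketch of the proposed modifications: dynamic bit-width plus
# ratio-driven dictionary flushing. All constants are illustrative.
BUMP_CODE = 256    # tells the decoder to widen codes by one bit
FLUSH_CODE = 257   # tells the decoder to discard its dictionary
FIRST_FREE = 258   # first code available for learned strings

def compress_modified(data: bytes, max_bits: int = 12,
                      check_every: int = 4096, threshold: float = 0.25):
    """Return (code, width) pairs; a real implementation packs them into bits."""
    def fresh():
        return {bytes([i]): i for i in range(256)}, FIRST_FREE

    table, next_code = fresh()
    width = 9                      # 256 literals + 2 control codes need 9 bits
    phrase = b""
    out, bits_out, consumed = [], 0, 0
    for b in data:
        consumed += 1
        candidate = phrase + bytes([b])
        if candidate in table:
            phrase = candidate
            continue
        out.append((table[phrase], width)); bits_out += width
        if next_code < (1 << max_bits):
            table[candidate] = next_code
            next_code += 1
            # Section 4.1: widen the output codes only when the next
            # dictionary index no longer fits in the current width.
            if next_code == (1 << width) and width < max_bits:
                out.append((BUMP_CODE, width)); bits_out += width
                width += 1
        phrase = bytes([b])
        # Section 4.2: at phrase boundaries near each interval, compare the
        # running savings against the threshold and rebuild the dictionary
        # when compression has become too weak.
        if consumed % check_every == 0:
            savings = 1.0 - bits_out / (8.0 * consumed)
            if savings < threshold:
                out.append((FLUSH_CODE, width)); bits_out += width
                table, next_code = fresh()
                width = 9
    if phrase:
        out.append((table[phrase], width))
    return out
```

One design note: the flush check is deliberately placed at phrase boundaries so that the pending phrase never refers to a dictionary entry that has just been discarded; a bit-exact implementation would also have to agree with the decoder on exactly when each BUMP_CODE takes effect.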