
Web Pages Tamper-Proof Method Using Virus-Based Watermarking

Cong Jin, Hongfeng Xu, Xiaoliang Zhang


Department of Computer Science, Central China Normal University, Wuhan 430079, P.R.China
E-mail: jincong@mail.ccnu.edu.cn

978-1-4244-1724-7/08/$25.00 ©2008 IEEE, ICALIP2008

Abstract

A novel tamper-proof model for web pages using virus-based watermarking is proposed in this paper. The model judges whether a web page has been tampered with, with good security and accuracy. A virus-based classification theory is applied when the watermark is embedded and extracted. The proposed scheme applies to all kinds of HTML or XML files, and not just to English letters but to all other characters as well. More importantly, the file can be completely restored to the original when the watermark is extracted. Therefore the proposed scheme, combined with the 3rd generation of tamper-proof technology for web pages, exhibits good real-time performance and security. Experimental results show that it outperforms existing tamper-proof schemes in that it does not increase the file size and does not consume as much computing time as cryptography.

1. Introduction

Websites play an important role in the development of information technology and have spread all over the world. The large number of websites and web users has laid a good foundation for the rapid development of the information age, and the importance of web pages keeps increasing. A web page serves as the home of an enterprise or a government on the Internet. However, hacker intrusion and homepage tampering happen constantly through system vulnerabilities caused by the complexity and diversity of application systems, even though safeguards such as firewalls and intrusion detection have been deployed. Over 30,000 web pages were tampered with during May 2007. In other words, about one website was attacked every 1.5 minutes on average. So it is extremely important to develop new schemes that protect websites against tampering.

Tamper-proof techniques have developed to a 3rd generation, which combines file filter driving with event-triggered technology. The techniques of time polling and cryptography fall far short of what we need. A primary drawback of the Hash algorithm [1] for website tamper-proofing is that it requires extra storage and an extra channel to transmit the Hash value. Recently developed watermark techniques provide alternatives for integrity protection of digital documents [2]. Katzenbeisser et al. [3] proposed a watermark-based method that adds spaces and tags to the source code of web pages. However, it expands the file size. The watermark scheme based on PCA [4] requires a great deal of computing, though it does not expand the file size. The information hiding technology based on web page tags [5] inserts the information at predicted positions, and the tags may be executed by the browser.

This paper provides a novel watermark scheme that can be combined with the 3rd generation technique. It overcomes these defects and exhibits good real-time performance and security.

2. Theory and application

2.1. Virus-based theory

A computer virus typically divides itself into many pieces that hide anywhere in a file [6]. The virus can replicate and reconstruct itself as long as not all of the pieces are deleted. However, when the order of reconstruction is changed, the virus loses its vitality. Accordingly, the embedded watermark information is a sequence phrase. Once the web page is tampered with, the watermark information can still be extracted, but it comes out as a changed sequence phrase. So we can compare the extracted watermark information with the embedded watermark information to check the security of the web page in a timely manner.

2.2. Application of watermark in tamper-proof

The key tamper-detection program is installed on the web server by means of file filter driving, and automatic detection will then be done by
means of event triggering. The watermark information is extracted from all the files in the folder, which have been sorted by a fast algorithm, and is promptly compared with the information embedded in advance. If they don’t match, the corresponding backup file content is copied to the location of the tampered file. The copy is performed in a non-protocol, pure-text way, so it is highly secure.

Besides, the process lasts only milliseconds, so the running performance and real-time detection reach a relatively high standard. When a user wants to browse the web page, the request is sent to the web server. Once the server responds, it calls the program to extract the watermark from the corresponding file. The file is then restored to the original one, which is sent to the user. Please see Fig.1.

3. Watermark embedded and tamper detection

3.1. Watermark embedded and extracted

Fig.2 and Fig.3 show the methods of watermark embedding and extraction respectively.

3.2. Watermark embedded algorithm

We select a character that occurs with high frequency as the key. The key acts as the dividing point, and the text, such as HTML or XML, is segmented into many sections. We call each section an element. All the elements are divided into 32 classes by Hash classification, so one class may contain several elements. Meanwhile, a 32-bit random sequence of ASCII values is produced from a seed, where each ASCII value ranges from 00 to 31.

The 32 pieces of information to be embedded correspond to the 32-bit sequence in a certain way. Then some spaces in each class are replaced by the ASCII value according to the characteristic statistics of the spaces. A relative table, which contains the class, the embedded character and the significant letters, is created for extracting the significant information.

The key issue is how to define the embedding location. In this paper, the location is defined according to the statistics of “<” and “>” in each aggregate. If the aggregate doesn’t contain any “<” or “>”, all its spaces are replaced by the corresponding ASCII value. Otherwise, the spaces that are in front of “<” and behind “>” are replaced by the corresponding ASCII value.
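The embedding procedure of Section 3.2 can be illustrated with a minimal Python sketch. Several details are assumptions of this sketch and are not specified in the paper: the Hash classification is taken to be the element length modulo 32 (the paper only states that the watermark is related to the length of each item), the location rule is read as "spaces directly before “<” or directly after “>”", and tab, line feed and carriage return are excluded from the symbol set so that line breaks are not confused with embedded marks. The function names are hypothetical.

```python
import random
import re

NUM_CLASSES = 32
# Candidate symbols: control codes below 32. Tab, LF and CR are excluded
# (an assumption of this sketch) so that line structure is left alone.
SYMBOLS = [chr(c) for c in range(1, 32) if c not in (9, 10, 13)]

def class_symbols(seed: int) -> list:
    """Seeded pseudo-random sequence: one control character per class."""
    rng = random.Random(seed)
    return [rng.choice(SYMBOLS) for _ in range(NUM_CLASSES)]

def classify(element: str) -> int:
    """Assumed hash: element length modulo 32 (unchanged by embedding)."""
    return len(element) % NUM_CLASSES

def embed_element(element: str, symbol: str) -> str:
    """Apply the location rule to a single element."""
    if "<" not in element and ">" not in element:
        # No tag delimiters: every space carries the symbol.
        return element.replace(" ", symbol)
    # Assumed reading: only spaces directly before "<" or directly after ">".
    return re.sub(r" (?=<)|(?<=>) ", lambda m: symbol, element)

def embed(text: str, key: str, seed: int) -> str:
    """Segment by the key character, classify, and mark each element."""
    symbols = class_symbols(seed)
    elements = text.split(key)
    return key.join(embed_element(e, symbols[classify(e)]) for e in elements)

page = "<html> <body> <p>some plain text here</p> </body> </html>"
marked = embed(page, key="e", seed=2008)
```

Because each space is replaced by exactly one control character, `marked` has the same length as `page`, which matches the claim that the file size does not grow.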
[Diagram labels: Source file, Key, Segment by Key, Classified by Hash, Characteristic statistic, Random sequence, Information embedded]
Fig.2 Watermark embedded model

[Diagram labels: Secure file, Key, Segment by Key, Classified by Hash, Random sequence, Information extracted]
Fig.3 Watermark extracted model

3.3. Tamper detection

The program of tamper detection consists of two steps:

1) Firstly, the watermark is extracted. The basic classification step is the same as in watermark embedding, and the information in each aggregate is then extracted. In theory, one kind of information is extracted from each aggregate. Therefore, if more than one kind of information is found, the one that appears least frequently is extracted. Then a sequence phrase is produced according to the relative table.

2) Secondly, the watermarks are compared. The extracted sequence phrase is compared with the embedded phrase. If they do not match, we can conclude that the file has been tampered with. The corresponding file is then copied from the backup files on the bottom layer of the OS as quickly as possible.

3.4. Original file restored

When the user requests the web page, the program that restores the original file runs. It traverses the whole file in sequence and replaces each character whose ASCII value is less than 32 with a space. Thus
the original file runs on the server and is sent to the user.

4. Experiments and analysis

In the experiment, we choose a simple HTML file as the original file; its source code is also called the cover file. Before the information is embedded, the cover file is read as a .txt file. The experiment and analysis are as follows.

Fig.4 Original cover file

Fig.5 Watermark embedded file

Fig.6 Watermark

Fig.7 Watermark tampered

In the experiment, the detection program is simulated. The program reads the source code of the web page shown in Fig.4. The secure file is obtained once the watermark information is embedded. It is this file that is saved in the path on the server and becomes the target of hacker attacks. Fig.5 presents the watermark embedded file. We can see that there is no distinct difference between the original file and the watermark embedded file. Because the embedded information consists of invisible characters whose ASCII values are less than 32, the editor cannot display them. Fig.6 shows the complete extracted watermark information, which consists of meaningful phrase information. It can not only indicate copyright but, more importantly, is used to detect by watermark matching whether the web page file has been tampered with.

Fig.7 shows the watermark information extracted from a file that was tampered with simply by adding the tags <td> and </td>. Obviously three information bits have changed, so the watermarks don’t match, and a warning is given that the file has been tampered with. In fact, the watermark information is related to the length of each item in the aggregate, so the watermark will differ if code is added or deleted.

5. Conclusions

The paper provides a new watermark scheme for web page tamper-proofing. It has the following good properties.

(1) It requires less computing than the cryptography that is always used in the second generation technique [7]. It can also judge accurately whether the file has been tampered with, which cryptography does not do. Table 1 presents the running time of both algorithms.

Table 1 Comparison of two algorithms on running time

Algorithm       Cryptography    Proposed
Running time    2.163s          1.872s

(2) The size of every embedded file doesn’t expand; it remains the same as before the watermark was embedded.

(3) The scheme, combined with the 3rd generation technique, provides good security for the website. The user can’t view the tampered web page, because the restored file is not sent to the user when the extracted watermark doesn’t match the embedded watermark. Besides, the program can check the secure file in time and, once tampering is detected, copy the file from the bottom layer over the tampered file. In fact, information travels over the Internet so fast that it places high requirements on security.

On the contrary, without the 3rd generation technique, the program would have to match the extracted information against the embedded one whenever the server responds to the user, which would certainly add time and slow down browsing of the web page.

(4) It accomplishes blind detection, which decreases the overhead and resource consumption on the OS.

One point to mention is that the new scheme uses a fragile watermark, which gives the tamper detection good robustness and security. However, it demands a server with good performance and high speed. In addition, the database linked to the file should be copied and modified in a timely manner.

6. Acknowledgements

This work was supported by the Natural Science Foundation of Hubei (China) under Grant No.2007ABA119.

7. References

[1] W. Stallings. Cryptography and network security: principles and practice. Prentice-Hall, Englewood Cliffs, NJ, 1999

[2] Guorui Feng, Lingge Jiang and Chen He. Orthogonal transformation to enhance the security of the still image watermarking system. IEICE Trans. Fundamentals, E87-A(4): 949-951, 2004

[3] S. Katzenbeisser, F.A.P. Petitcolas. Information hiding techniques for steganography and digital watermarking. Boston, Artech House, 2000

[4] Qijun Zhao and Hongtao Lu. A PCA-based watermarking scheme for tamper-proof of web pages. Pattern Recognition, 38: 1321-1323, 2005

[5] Changzheng Wang and Jianhui Liu. Research and implementation of the information hiding technology based on web page tags. 2007

[6] Haiyan Zhou, Fengsong Hu and Can Chen. English text digital watermarking algorithm based on idea of virus. Computer Engineering and Applications, 43(7): 78-80, 2007

[7] Liu Gu. Research and implementation of information hiding based on web page. Microcomputer Information, 22: 186-187, 2006

[Flow diagram labels: User, Request, The file of information embedded, Watermark extracted, Tamper detection, Watermark match, N, Copy, Backup Files, Original file, Respond, End]

Fig.1 Application of watermark in tamper-proof
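The request-time flow of Fig.1, together with the extraction rule of Section 3.3 and the restoration step of Section 3.4, might look roughly like the following Python sketch. It is not the authors' implementation. The assumptions are: the Hash classification is the element length modulo 32, tab, line feed and carriage return are not used as watermark symbols, files are modeled as in-memory strings, and the recovered backup copy is served after a mismatch. The watermarked sample is built by hand, with `\x01`, `\x02` and `\x03` standing in for values of the seeded random sequence, and all function names are hypothetical.

```python
from collections import Counter

NUM_CLASSES = 32
# Assumed symbol set: control codes below 32 except tab, LF and CR.
SYMBOLS = {chr(c) for c in range(1, 32) if c not in (9, 10, 13)}

def extract(text: str, key_char: str) -> tuple:
    """Return one symbol (or None) per class, following step 1 of Section 3.3."""
    seen = [Counter() for _ in range(NUM_CLASSES)]
    for element in text.split(key_char):
        cls = len(element) % NUM_CLASSES      # assumed hash: length mod 32
        seen[cls].update(ch for ch in element if ch in SYMBOLS)
    # If a class holds several kinds of symbol, keep the least frequent one.
    return tuple(min(c, key=c.get) if c else None for c in seen)

def restore(text: str) -> str:
    """Section 3.4: turn every embedded control character back into a space."""
    return "".join(" " if ch in SYMBOLS else ch for ch in text)

def handle_request(stored: str, backup: str, expected: tuple, key_char: str):
    """Return (file kept on the server, page sent to the user)."""
    if extract(stored, key_char) != expected:
        stored = backup                       # overwrite the tampered file
    return stored, restore(stored)

# Hand-built watermarked file: spaces already replaced by control characters.
marked = "<p>web\x01page\x02one</p>\x03<p>two</p>"
expected = extract(marked, "e")               # stored at embedding time

# Tampering as in Fig.7: extra tags change an element length, hence its class.
tampered = marked + "<td></td>"
on_disk, response = handle_request(tampered, marked, expected, "e")
```

In this sample the appended tags lengthen the last element from 15 to 24 characters, so its symbol moves to a different class and the comparison fails. A modification that keeps every element length unchanged would not be caught by this particular length-based hash, which is a limitation of the sketch's assumption rather than a statement about the paper's Hash classification.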
