
Steganalysis:

A Steganography Intrusion Detection System

Angela D. Orebaugh

George Mason University

Abstract

Steganography is the art of information hiding. In today's digital age, messages can be hidden in
images, sound files, text, and other digital objects. To a casual observer, these messages are
invisible. The extent of steganography use on public networks, such as the Internet, is unknown due to
its stealthy nature. Unless it is being actively looked for, one would not know that it is there.
For example, pop-up ads, photos on eBay, and other recreational sites all have the potential of
containing hidden messages. Although some groups have taken on the vast responsibility of
searching large web sites and news group areas for potential steganographic images, this is not
something an average organization would do. Most organizations can and do, however, monitor
network traffic that is entering and exiting the local area network. This paper presents a
detection framework that includes tools to detect, retrieve, and analyze images for
steganographic content as they enter and exit a monitored network. The framework is comprised
of the Steg_IDS engine that operates in the UNIX environment. Steg_IDS combines both custom
written and third party software and processes to deliver a purpose-built steganography
intrusion detection system.

1. Introduction

Steganography is not only the art of information hiding, but also the art and science of hiding the
fact that communication is even taking place. The real message is hidden in a cover medium so
that, in general, an observer cannot tell that a secret message is being sent. Information can be hidden
in a variety of formats including images (BMP, GIF, JPG), Microsoft Word documents, text
documents, etc. Steganography differs from cryptography in that it conceals the existence of the data
being sent, not just its content. When a cryptographic message is steganographically concealed, it is indecipherable
and undetectable. Steganography requires a host file (cover medium) and hidden file. The host
file conceals the data of the hidden file. For the purpose of this research we will be considering
steganographic messages to be embedded in files; however, messages can also be hidden in IP
packets, data streams, etc.

The Steg_IDS engine is a combination of custom written code and existing third party products,
such as Snort, Crawl, and Stegdetect. Crawl also requires the libevent library module. Currently
Steg_IDS only operates with JPG images due to the Stegdetect constraints. Steg_IDS and its
components are a combination of UNIX shell scripting and C code. The development system
consisted of RedHat 8.0 with Snort v1.9.0, Crawl v0.3, and Stegdetect v0.5.

The remainder of this paper is organized as follows: Section 2 discusses current steganographic
methods and how they are used, Section 3 discusses current steganalysis methods used to detect
steganography, Section 4 discusses the various tools used for steganography, Section 5 discusses
the architecture and modules of the Steg_IDS engine, Section 6 discusses the false alarm rate
study of the Stegdetect tool, Section 7 discusses the results of the steganography research, and
Section 8 concludes with the limitations and future enhancements of the Steg_IDS engine. In
addition, several Appendices are included that contain the source code, configuration files, and
log files of the Steg_IDS engine.

2. Steganographic Methods

Steganographic methods include injection, substitution, and the generation of a new file.
Injection involves hiding the message in parts of a file that will be ignored by the application.
This can include comment tags, hidden form elements, and other holes in the data. Substitution
involves replacing the insignificant data in the host file with the hidden message. An example of
substitution is least significant bit replacement. Steganography can also be accomplished
without the need for a host file, by using the secret message to generate a new file. An example
would be to use the secret file as input to generate fractals [SANS]. Steganography today can
use audio, video, and digital images as cover media. The focus of this project will be on digital
images only.

Some of the common approaches of hiding information in digital images include:


Least significant bit insertion
Masking and filtering
Algorithms and transformations.

Least Significant Bit Substitution

The most common way to embed data in an image is to replace the least significant bits (LSB).
For example, in an 8-bit image, each pixel is represented by 8 bits, such as 01101010. The most
significant bits (MSB) are the ones to the left, and the least significant bits are the ones to the
right. Changing the MSBs will have a noticeable impact on the color; however, changing the
LSBs will not be noticeable to the human eye. This results in a high number of near-duplicate
colors. Most humans can only detect around 6 or 7 bits of color, and women see more color than
men. Radiologists are trained to see incredibly subtle difference in gray on X-ray films and can
detect 8 or more bits of color. In general though 01101010 could be changed to 01101011,
01101000, or 01101001 and would go unnoticed by the casual observer. The last two bits could
be used to embed data, allowing a message of 10110001 [SANS]. The image formats typically
used in the LSB substitution are lossless and the data can be directly manipulated and recovered.
One significant advantage of this method is that it is simple to implement. It also can achieve
both high capacity and low perceptibility since every cover bit can contain one bit of the secret
message. However, because an attacker may know that the data are hidden in the LSBs, LSB
methods are vulnerable to extraction and to attacks such as compression and cropping.
Steganography tools that make use of the LSB technique include S-Tools, EzStego, White Noise
Storm, Steganos, StegoDos, etc.
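
The substitution described above can be illustrated with a minimal Python sketch. It assumes the cover is a flat sequence of 8-bit samples and uses a single LSB per sample; the function names embed_lsb and extract_lsb are illustrative and are not taken from any of the tools named in this section.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytearray:
    """Hide the message bits in the least significant bit of each cover byte."""
    # Unpack the message into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("message too large for cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Recover `length` message bytes from the LSBs of the stego bytes."""
    out = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


cover = bytes([0b01101010] * 16)   # sixteen samples with the value used in the text
stego = embed_lsb(cover, b"Hi")
print(extract_lsb(stego, 2))       # b'Hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))  # 1: no sample moves by more than one level
```

Because the extraction routine needs nothing but the stego bytes and the message length, anyone who suspects LSB embedding can run it, which is the vulnerability to extraction noted above.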

Masking and Filtering


These technologies hide information by marking the cover image in a similar way to paper
watermarks. The message is embedded in significant areas of the cover image in such a way that
the hidden message becomes an integral part of it, making the message less vulnerable to image
processing such as compression and cropping.

Perceptual masking refers to any situation where information in certain regions of an image is
occluded by perceptually more prominent information in another part of the scene. Thus, to
create a watermarked image, one can increase the luminance of the masked area by a small
percentage such that it can be used to hide plaintext or encoded information. By covering, or
masking, a faint signal with another so that the first signal is non-perceptible, one ensures that the human visual
system cannot detect the slight changes to the image [Sellers]. Masking techniques are more suitable
for use in digital watermarking, as seen in Figure 1.

Figure 1 - Digital Watermarking

Algorithms and transformations

Other more robust methods of hiding information in images include applications that involve
manipulation of mathematical functions and image transforms. The widely used transformation
functions include Discrete Cosine Transformation (DCT), Fast Fourier Transform (FFT), and
Wavelet Transformation. The basic approach to hiding information with DCT, FFT, or Wavelet is
to transform the cover image, tweak the coefficients, and then invert the transformation. If the
choice of coefficients is good and the size of the changes manageable, then the result is very
close to the original. These methods allow more data to be hidden in a cover image, and many of
the tools available can hide a file approximately 30% the size of the cover. [Blake]

DCT is a mechanism used in the JPEG compression algorithm to transform successive 8x8-pixel
blocks of the image from spatial domain to 64 DCT coefficients each in frequency domain. The
least significant bits of the quantized DCT coefficients are used as redundant bits into which the
hidden message is embedded. The modification of a single DCT coefficient affects all 64 image
pixels. Because this modification happens in the frequency domain and not the spatial domain,
there are no noticeable visual differences. [Provos] The advantage DCT has over other
transforms is the ability to minimize the block-like appearance resulting when the boundaries
between the 8x8 sub-images become visible (known as blocking artifact) [Johnson]. The
statistical properties of the JPEG files are also preserved [Hetzl]. The disadvantage is that this
method only works on JPEG files since it assumes a certain statistical distribution of the cover
data that is commonly found in JPEG files.
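
The statement that a single DCT coefficient affects all 64 pixels of a block can be checked numerically. The following Python sketch applies the standard 8x8 inverse DCT to a block before and after one coefficient is changed by one unit. It ignores quantization tables and entropy coding, so the unit change is only a stand-in for flipping the LSB of a quantized coefficient; the choice of coefficient (1, 2) and the flat gray starting block are arbitrary assumptions.

```python
import math

N = 8  # JPEG block size


def idct_2d(coeffs):
    """Inverse 8x8 DCT with the normalization used by JPEG."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    block = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            block[x][y] = total
    return block


coeffs = [[0.0] * N for _ in range(N)]
coeffs[0][0] = 800.0          # DC term only: a flat gray block
before = idct_2d(coeffs)
coeffs[1][2] += 1.0           # change one AC coefficient by one unit
after = idct_2d(coeffs)

deltas = [abs(after[x][y] - before[x][y]) for x in range(N) for y in range(N)]
changed = sum(d > 1e-9 for d in deltas)
print(changed, "of 64 pixels changed; largest change:", max(deltas))
```

Every pixel in the block moves, but each by well under one gray level, which is why the modification is spread out and hard to see.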

FFT is another useful transform for steganography since it is based on an algorithm optimized for
the digital data files that are often used to hide information today. Many of these files use FFT to
analyze data, find the dominant characteristics, and then use this information to enhance the data
or perhaps compress it. It provides a simple way to embed several signals together in a larger
one. The basic idea is to take a mathematical function f and represent it as the weighted sum of
another set of functions. FFT provides ideal ways to mix signals and hide information by
changing the coefficients. Current proposals offer a novel way to hide information in
an image by tweaking the k largest coefficients of an FFT [Wayner]. The largest coefficients
correspond to the most significant parts of the data stream. They are the frequencies that have
the most energy or do the most for carrying the information about the final image. This is
better than hiding data in noise that corresponds to the smallest coefficients, since noise is most
likely to be modified by compression, printing, or using a less than perfect conversion process.
The most significant parts of the signal are less likely to be damaged without damaging the entire
signal.

Wavelet-based steganography is a new idea in the application of wavelets. However, the
standard technique of storing in the least significant bits (LSB) of a pixel still applies. The only
difference is that the information is stored in the wavelet coefficients of an image, instead of
changing bits of the actual pixels. The idea is that storing in the least important coefficients of
each 4 x 4 Haar transformed block will not perceptually degrade the image [Wayner]. While this
thought process is inherent in most steganographic techniques, the difference here is that by
storing information in the wavelet coefficients, the change in the intensities in images will be
imperceptible.

Another way to embed a message is by altering the statistics of the luminance of the pixels of the
cover image. This modification is typically small to take advantage of the human eye's insensitivity to
luminance variation. The Patchwork algorithm and other similar techniques use redundant pattern
encoding to randomly select multiple areas (or patches) to repeatedly scatter hidden information
throughout the cover image. This works basically the same as painting a small message over
an image many times. In the Patchwork algorithm, a pseudo-random generator is used to select
n pairs of pixels (a, b) to hold the message. The brightness value of a is increased by a constant
while the brightness value of b is decreased by the same amount. This leaves the total amplitude
of the image (and therefore the average amplitude) unchanged, making it hard for human eyes to
detect the change. By repetitively doing this, the data is more robust against attacks. An
advantage of this method is that it can withstand cropping as the hidden information is
painted in multiple patches. However, there is a trade-off between message size and
robustness. For example, a small message may be repeatedly painted across the cover image and
there is a high probability that the message (watermark) can still be read after the image has been
cropped. A large message on the other hand may be painted only once across the cover image
and will therefore be vulnerable to cropping. To embed more, one can first split the image into
pieces and then apply the embedding to each of them. [Johnson]
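
The pair-based brightness adjustment can be sketched in a few lines of Python. This is a simplified illustration, not the published Patchwork implementation: it assumes a flat list of grayscale values, a string key used to seed the pseudo-random generator, and a constant delta of 2. The names select_pairs, patchwork_embed, and patchwork_score are hypothetical.

```python
import random


def select_pairs(num_pixels, n, key):
    """Pick n disjoint pixel pairs (a, b) from a key-seeded pseudo-random generator."""
    rng = random.Random(key)
    indices = rng.sample(range(num_pixels), 2 * n)  # distinct pixel positions
    return list(zip(indices[:n], indices[n:]))


def patchwork_embed(pixels, n, key, delta=2):
    """Brighten each a and darken each b by the same constant."""
    marked = list(pixels)
    for a, b in select_pairs(len(pixels), n, key):
        marked[a] = min(255, marked[a] + delta)
        marked[b] = max(0, marked[b] - delta)
    return marked


def patchwork_score(pixels, n, key):
    """Mean brightness difference a - b over the keyed pairs."""
    pairs = select_pairs(len(pixels), n, key)
    return sum(pixels[a] - pixels[b] for a, b in pairs) / n


rng = random.Random(0)
image = [rng.randint(50, 200) for _ in range(10_000)]   # synthetic grayscale pixels
marked = patchwork_embed(image, n=1000, key="stego-key")

print(sum(image) == sum(marked))   # True: total brightness is unchanged
shift = patchwork_score(marked, 1000, "stego-key") - patchwork_score(image, 1000, "stego-key")
print(shift)                       # close to 2 * delta for the correct key
```

Only a party that knows the key can regenerate the same pairs and observe the shift in the mean difference; for any other set of pairs the expected difference stays near zero.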

Spread spectrum technology can also be used in steganography. Spread spectrum steganography
scatters an encrypted message throughout an image (not just in the LSBs). It is based on spread
spectrum radio techniques which have been developed for military applications since the mid
1940s because of their anti-jamming and low-probability of intercept properties. This can be
accomplished by modulating the narrowband waveform with a wideband waveform, such as
white noise [Petitcolas]. After spreading, the energy of the narrowband signal in any frequency
band is low and therefore difficult to detect. Spread Spectrum Image Steganography (SSIS) uses
this concept to embed a message for image steganography. The fundamental concept of SSIS is
the embedding of the hidden information with noise, which is then added to the digital image.
This noise is typical of the noise inherent in the image acquisition process and, if kept at low
levels, is not perceptible to the human eye or detectable by computer analysis without access to
the cover image. SSIS works by modulating the hidden message so that it looks like Gaussian
noise. The resulting signal, perceived as noise, is then
added to the cover image to produce the stego-image. Since the power of the embedded signal is
low compared to the power of the cover image, the SNR is also low, thereby indicating lower
perceptibility and providing low probability of detection by an observer. Consequently, an
observer will be unable to visually distinguish the original image from the stego-image with the
embedded signal [Boncelet]. The encoded data will appear like noise to an outsider, but a
legitimate receiver, furnished with an appropriate key, can recognize it. To decode the message,
the recipient needs the algorithm, crypto-key and stego-key. This method helps protect against
hidden message extraction but is still vulnerable to destruction from compression and image
processing.

3. Steganalysis - Detecting Steganography

By analogy with a cryptosystem, a steganographic system can be defined. The model of a
steganographic system in Figure 2 (also referred to as a stegosystem) defines the data and the
processes involved as well as the relationships among them.

[Diagram: the sender applies fE to cover C, emb E, and key K to produce stego C'; the recipient applies fE-1 to stego C' and key K to recover emb E.]

Figure 2 - Stegosystem

The Stegosystem has the following components:


emb - The message to be embedded.

cover - The data in which emb will be embedded.

stego - A modified version of cover that contains the embedded message emb.

key - Additional secret data that is needed for the embedding and extracting processes and
must be known to both the sender and the recipient.

fE - A steganographic function that has cover, emb and key as parameters and produces stego
as output.

fE-1 - A steganographic function that has stego and key as parameters and produces emb as
output. fE-1 is the inverse function of fE in the sense that the result of the extracting
process fE-1 is identical to the input E of the embedding process fE.

The embedding process fE embeds the secret message E in the cover data C. The exact
position(s) where E will be embedded is dependent on the key K. The result of the embedding
function is (usually) a slightly modified version of C: the stego data C'. After the recipient has
received C' he starts the extracting process fE-1 with the stego data C' and the key K as
parameters. If the key that is supplied by the recipient is the same as the key used by the sender
to embed the secret message and if the stego data the recipient uses as input is the same data the
sender has produced (i.e. it has not been modified by an adversary) then the extracting function
will produce the original secret message E. [Hetzel]
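
As a rough illustration of this model, the Python sketch below implements a toy fE and fE-1 in which the key K seeds a pseudo-random generator that selects the embedding positions. The use of LSB replacement as the embedding operation, the string key, and the names f_embed and f_extract are assumptions made for the example; the recipient is also assumed to know the message length.

```python
import random


def _positions(cover_len, n_bits, key):
    """Key-dependent, non-repeating positions inside the cover."""
    return random.Random(key).sample(range(cover_len), n_bits)


def f_embed(cover: bytes, emb: bytes, key) -> bytes:
    """fE: produce stego C' from cover C, message E, and key K."""
    bits = [(byte >> s) & 1 for byte in emb for s in range(7, -1, -1)]
    stego = bytearray(cover)
    for pos, bit in zip(_positions(len(cover), len(bits), key), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def f_extract(stego: bytes, emb_len: int, key) -> bytes:
    """fE-1: recover E from stego C' and key K."""
    positions = _positions(len(stego), emb_len * 8, key)
    bits = [stego[pos] & 1 for pos in positions]
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )


cover = bytes(range(256)) * 4                 # 1024 bytes of stand-in cover data
stego = f_embed(cover, b"secret", key="K")
print(f_extract(stego, 6, key="K"))           # b'secret'
```

Extraction succeeds only when both conditions stated above hold: the same key regenerates the same positions, and the stego data has not been altered in transit.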

As with cryptography and cryptanalysis, steganalysis is defined as the art and science of
breaking the security of steganographic systems. As the goal of steganography is to conceal the
existence of a secret message, a successful attack on a steganographic system consists of
detecting that a certain file contains embedded data. As in cryptanalysis, it is assumed that the
steganographic system is known to the attacker and that the security of the
steganographic system therefore rests solely on the fact that the secret key is not known to the attacker.
[Hetzel]

The stegosystem can be extended to include scenarios for different attacks similar to the attacks
on cryptographic systems. Figure 3 shows a diagram for such an extended stegosystem. The
circles denote spots at which a potential attacker can have access to the steganographic system.
An attacker can have access to one or more of these spots resulting in different kinds of attacks.
[Hetzel]

An important distinction must be made between passive attacks where the attacker is only able to
intercept the data and active attacks where the attacker is also able to manipulate the data. In the
diagram in Figure 3 a filled circle means that the attacker has sufficient access to the data to
perform an active attack. If a circle is not filled, the attacker is only able to perform a passive
attack, i.e. to intercept the data. [Hetzel]

[Diagram: the stegosystem of Figure 2 annotated with attack points. Passive attacks: stego-only-attack/stego*-attack, cover-stego-attack, emb-stego-attack, cover-emb-stego-attack. Active attacks: manipulating stego, manipulating cover.]

Figure 3 - Extended Stegosystem

The following attacks are possible in this model of the stegosystem:

stego-only-attack
The attacker has intercepted the stego data and is able to analyze it.

stego*-attack
The sender has used the same cover repeatedly to embed data. The attacker possesses
different stego files that originate from the same cover file. In each of these stego files a
different message is embedded.

cover-stego-attack
The attacker has intercepted the stego file and knows which cover file was used to create
this stego file. This provides an advantage over the stego-only-attack for the attacker.

cover-emb-stego-attack
The attacker has intercepted the stego file and knows not only which cover was used to
create this stego file but also the message that is embedded in this stego file.

manipulating the stego data
The attacker has the ability to manipulate the stego data. If the attacker only wants to
determine if a message is hidden in this stego file this usually does not provide an
advantage but being able to manipulate the stego data usually means that the attacker is
capable of removing the secret message in the stego data (if there is one).

manipulating the cover data
The attacker can manipulate the cover data and intercept the resulting stego data. This
can make the task of determining whether the stego data contains a hidden message easier
for the attacker. [Hetzel]

The stego*-attack and the cover-stego attack can be prevented if the user of the steganographic
system acts with caution. A user should never use the same cover twice to embed a secret
message and should also not use files as cover that are widely available, for example the
Windows startup sound or standard backgrounds that come with Windows. It might even be
risky to use an image or a sound file that can easily be found on the web. [Hetzel]

The stego-only-attack is the most important attack against steganographic systems because it will
occur most frequently in practice. Different methods have been developed to determine whether
a certain stego file contains hidden data. Two different approaches can be distinguished: visual
attacks which rely on the capabilities of the human visual system and statistical attacks which
perform statistical tests on the stego file. [Hetzel]

The visual attack is a stego-only-attack that exploits the assumption of most authors of
steganography programs that the least significant bits of a cover file are random. This is done by
relying on a human to judge if an image presented by a filtering algorithm contains hidden data.
The filtering algorithm removes the parts of the image that are covering the message. The output
of the filtering algorithm is an image that consists only of the bits that potentially could have
been used to embed data. The filtering of the potential stego image is dependent on the
steganographic embedding function that is analyzed. However, as most of the embedding
functions are similar in most cases only small changes are necessary to adapt an existing filtering
algorithm to another steganographic embedding function. [Hetzel]

Visual attacks have two important drawbacks. If many images must be analyzed, visual attacks are very
slow or very costly because every image must be filtered, displayed and looked at by a human.
The other important drawback is that some (unmodified) images might contain random-looking
data in their least significant bits. If such an image is used as cover file the visual attack will fail.
[Hetzel]

Statistical attacks exploit, similar to visual attacks, the fact that most steganography programs
treat the least significant bits of the cover file as random data and therefore assume that they can
overwrite these bits with other random data (the encrypted secret message). However, as the
visual attacks have shown, the least significant bits of an image are not random. When a
steganography program embeds a bit through overwriting the least significant bit of a pixel in the
cover file, the color value of this pixel is changed to an adjacent color value in the palette (or in
the RGB cube if the cover file is a true-color image). Now look at two adjacent color values (a
pair of values, also referred to as PoV), where adjacent means identical except for the least
significant bit. When overwriting the least significant bits of all occurrences of one of these
color values with a bit from the secret message, the frequencies of these two color values will
essentially be the same. This happens because the data that is embedded is encrypted and
therefore equally distributed. [Hetzel]

The idea of the statistical attack is to compare the frequency distribution of the colors of a
potential stego file with the theoretically expected frequency distribution for a stego file. The
theoretically expected frequency distribution is calculated as follows: Under the assumption that
only the least significant bits are overwritten and that the embedded data is equally distributed
the expected frequency distribution is that for each PoV the frequencies of the two colors are the
same. Due to the fact that the sum of the occurrences of the two colors in a PoV is not changed
by the embedding process, the expected frequency can be calculated as the arithmetic mean of the
frequencies of a PoV in the potential stego file. The degree of similarity of the frequencies in the
potential stego file and the theoretically expected frequencies is a measure for the probability that
the analyzed file contains a hidden message. This statistical attack is implemented using a chi-
square test. [Hetzel]
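
The comparison described above can be sketched as follows. The Python function computes only the chi-square statistic and the degrees of freedom over the pairs of values; converting the statistic into a probability would additionally require the chi-square distribution function, which is omitted here. The expected frequency of each PoV is taken as half the sum of its two observed frequencies. The function name pov_chi_square and the toy sample data are assumptions for illustration.

```python
from collections import Counter


def pov_chi_square(values, levels=256):
    """Chi-square statistic of observed PoV frequencies against the expected ones."""
    counts = Counter(values)
    chi2 = 0.0
    categories = 0
    for even in range(0, levels, 2):
        n_even, n_odd = counts[even], counts[even + 1]
        expected = (n_even + n_odd) / 2      # unchanged by LSB embedding
        if expected > 0:
            chi2 += (n_even - expected) ** 2 / expected
            categories += 1
    return chi2, max(categories - 1, 0)      # statistic, degrees of freedom


# Toy data: two PoVs, (10, 11) and (20, 21).
clean = [10] * 300 + [11] * 100 + [20] * 50 + [21] * 250   # unbalanced pairs
stego = [10] * 200 + [11] * 200 + [20] * 150 + [21] * 150  # LSBs fully overwritten

chi2_clean, dof = pov_chi_square(clean)
chi2_stego, _ = pov_chi_square(stego)
print(chi2_clean, chi2_stego, dof)
```

A statistic near zero means the observed frequencies match the distribution expected of a stego file, so a small value (a high probability under the chi-square test) points toward embedded data, while a large value is typical of an unmodified image.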

Methods to preserve the color frequency are also proposed. Assuming that enough redundant
bits remain unused to provide the possibility to correct the statistical distortions a correction
algorithm can be run after the embedding is finished. This correction algorithm changes color
values in the redundant bits that do not contain embedded data in a way that the color
frequencies of the stego image equal those of the cover image. [Hetzel]

Another possibility to preserve the color frequencies arises if the embedding algorithm is
designed with the frequency distribution of the cover files in mind. Such a frequency
distribution cannot always be assumed, but this is possible, for example, for JPEG files. The
algorithm F5 works on JPEG files and preserves the color frequencies by using the decrementation
of the absolute value of a DCT coefficient as its embedding operation. [Hetzel]

A statistical test has been developed that tests higher-order statistics of "natural" images to
determine if a secret message has been hidden in an image. This test uses a training set of
images with and without a hidden message and the goal is to sufficiently train the test to
recognize whether a new image contains a hidden message by comparing this new image's
statistics to the statistics of the images from the training set. [Hetzel]

Steganography is an effective means of hiding data, thereby protecting the data from
unauthorized or unwanted viewing. But stego is simply one of many ways to protect the
confidentiality of data. It is probably best used in conjunction with another data-hiding method.
When used in combination, these methods can all be a part of a layered security approach. Some
good complementary methods include:

Encryption - Encryption is the process of passing data or plaintext through a series of
mathematical operations that generate an alternate form of the original data known as
ciphertext. The encrypted data can only be read by parties who have been given the
necessary key to decrypt the ciphertext back into its original plaintext form. Encryption
doesn't hide data, but it does make it hard to read!

Hidden directories (Windows) - Windows offers this feature, which allows users to hide
files. Using this feature is as easy as changing the properties of a directory to "hidden",
and hoping that no one displays all types of files in their explorer.

Hidden directories (Unix) - Files can be hidden in existing directories that have a lot of files, such as the
/dev directory on a Unix implementation, or in a newly made directory whose name starts with three dots
(...) versus the normal single or double dot.

Covert channels - Some tools can be used to transmit valuable data in seemingly normal
network traffic. One such tool is Loki. Loki is a tool that hides data in ICMP traffic (like
ping). [Westphal]

Due to their invasive nature, steganographic systems often leave detectable traces within a
medium's characteristics. This allows the detection of modified media, revealing that secret
communication is taking place. Although the secret content is not exposed, because it typically
relies on a secret key, its existence is revealed, which defeats the main purpose of steganography.
The modification of redundant bits can change the statistical properties of the cover medium. As
a result, statistical analysis may reveal that there is hidden content. Statistical tests can reveal
that an image has been modified by steganography by determining that an image's statistical
properties deviate from a norm. Some tests are independent of the data format and just measure
the entropy of the redundant data. We expect images with hidden data to have a higher entropy
than those without. For example, a JPG image can be tested by measuring the frequency of the
DCT coefficients. [Provos]
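
A format-independent entropy measurement of the kind mentioned above might look like the following Python sketch. It computes the Shannon entropy, in bits per symbol, of a sequence of redundant bits. How the redundant bits are obtained from a given image format is outside the sketch, and the two sample sequences are invented for illustration.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits per symbol of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


biased_bits = [0] * 90 + [1] * 10     # redundant bits with visible structure
balanced_bits = [0] * 50 + [1] * 50   # bits overwritten by encrypted data

print(shannon_entropy(biased_bits))    # below 1 bit per symbol
print(shannon_entropy(balanced_bits))  # 1.0, the maximum for binary symbols
```

Encrypted payloads are close to uniformly distributed, so overwriting redundant bits with them pushes the measured entropy toward its maximum.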

Stegdetect is an automated tool for detecting steganographic content in images. It is capable of
detecting several different steganographic methods to embed hidden information in JPG images.
Stegdetect currently detects images that have content hidden with JPHide, JSteg, Camouflage,
Invisible Secrets, F5, AppendX, and Outguess 0.13b. The output from Stegdetect lists the
steganographic systems found in each image, or negative if no steganographic content could be
detected. Stegdetect expresses the level of confidence of the detection with one to three stars.
The statistical tests used to find steganographic content in images indicate nothing more than a
likelihood that content has been embedded. Because of that, Stegdetect cannot guarantee the
existence of a hidden message. Stegbreak is used to launch a dictionary attack against JPG files
with potential steganographic content. The current version supports JSteg-Shell, JPHide and
OutGuess 0.13b. Stegbreak needs to run on a distributed cluster of systems, and may still be too
slow to process all images that Stegdetect finds. Stegbreak is beyond the scope of this paper and
is not included in Steg_IDS v.1. More information can be obtained from www.outguess.org.
[Provos]

4. Steganography Tools

There are several steganographic systems that embed hidden messages into JPG images. Listed
below are some of the specific tools that Stegdetect is written to detect. Statistical distortions are
characteristic for each system, and can be used to identify signatures of the various tools.

JSteg is a free UNIX based program that hides files in JPG images. It uses a form of least
significant bit embedding. The DCT coefficients are modified continuously from the beginning.
It does not support encryption. JSteg-Shell, the Windows user interface to JSteg, supports
encryption and compression of the hidden content before embedding. It uses the RC4 stream
cipher with a 40-bit key.

JPHide is a free UNIX and Windows based program that hides files in JPG images. It uses a
form of least significant bit embedding. It uses the Blowfish algorithm for encryption and
supports compression of the hidden content before embedding. The DCT coefficients are not
selected continuously from the beginning, making it more difficult to detect. However, it does
use a fixed table to define classes of DCT coefficients, and other distinct methods that give this
program its signature. It not only modifies the least significant bits of the DCT coefficient, it can
also switch to a mode where the second least significant bits are modified.

Outguess 0.13b is a UNIX program that allows the insertion of hidden information into the
redundant bits of data sources in JPG images. It can be detected with statistical analyses;
however, the latest version, 0.2, preserves statistical properties in a way that cannot currently be
detected by Stegdetect. Outguess 0.13b chooses the DCT coefficients with a pseudo-random
number generator. It also uses the RC4 stream cipher for encryption.

Camouflage v1.2.1 is a free Windows based program that hides files by scrambling them and
attaching them to the host file. This is an older steganography program and is no longer
supported or developed.

Invisible Secrets v3.2 is a shareware Windows based program that encrypts and hides files in
JPG, PNG, BMP, HTML and WAV files. It performs data compression before the encrypt/hide
process and uses strong encryption methods including Blowfish, Twofish, RC4, Cast128, GOST,
AES, Diamond 2, and Sapphire II. It also incorporates a Shredder to destroy files and folders
beyond recovery.

F5 0.12beta is a DOS and Windows based program that embeds files into true color BMP, GIF,
or JPEG images. The F5 algorithm works on JPEG files and preserves the color frequencies by
using the decrementation of the absolute value of a DCT coefficient as embedding operation.

AppendX is a free Perl program used to embed data in PNG, JPG, and GIF files. It will work on
UNIX and Windows with a Perl interpreter. AppendX can be detected in hex-mode so it is
typically used with a PGP style program. From version 0.1 on, AppendX supports PGP header
stripping (-s) so that PGP data just looks like rubbish. From version 0.11 on, it also runs under Windows.
From version 0.2 on, it includes the ability to restore (-r) a file, meaning that the appended
data will be stripped off.

5. Steg_IDS

The Steg_IDS main engine consists of 4 modules: Snort, Parser, Crawler, and Stegdetect.
Figure 4 represents the process flow and interfaces between the modules.
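
[Diagram: STEG_IDS controls four modules. SNORT reads snort.conf and local.rules and writes steg_log; PARSER reads steg_log; CRAWLER produces the downloaded JPEGs; STEGDETECT analyzes them and writes steg_results.]

Figure 4 - Steg_IDS Framework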

The Steg_IDS engine is the main program that controls all other modules. It is started by
typing steg_ids at the command prompt. Steg_IDS then starts Snort in NIDS mode, which
reads the /etc/snort/snort.conf and /etc/snort/local.rules configuration files. Snort is
started with command line options that collect only the first 200 bytes of the payload data
(for performance) and log only the ASCII portion (no hexadecimal). The local.rules file
contains custom signatures that monitor and log incoming and outgoing traffic that contains
JPG images. For performance, the signatures only match traffic with the ACK and PUSH flags
(AP) set. This traffic is logged to the steg_log file in the default /var/log/snort directory,
which is used as input to the Parser.

The Parser reads data from the continuously updated steg_log and extracts the destination IP
address, the directory path, and the JPG file name. This information is stored as variables within
the program and is used as input to the Crawler and Stegdetect modules.
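
The extraction step can be sketched as follows, assuming log lines shaped like the samples in Appendix D. The function names parse_dest and parse_url are hypothetical, and this sketch uses awk where the script in Appendix A uses grep and cut.

```sh
# Hypothetical helpers mirroring the Parser's field extraction.

# Print the destination IP from a Snort header line such as
# "04/07-10:19:27.982516 156.80.45.184:33092 -> 66.218.76.71:80".
# Prints nothing for lines that are not header lines.
parse_dest() {
    printf '%s\n' "$1" | awk '$3 == "->" { split($4, a, ":"); print a[1] }'
}

# Print the requested path (directory path plus JPG file name) from a
# payload line such as "GET /pict/3125705302.jpg HTTP/1.1..Host: ...".
# Prints nothing for lines that do not start with GET.
parse_url() {
    printf '%s\n' "$1" | awk '$1 == "GET" { print $2 }'
}
```

Because the IP address and the path appear on different lines of the same log entry, a caller would need to remember the most recent destination until the matching GET line arrives.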

The Crawler concatenates the destination IP address, directory path, and JPG filename into one
dest_url string and checks to see if the image already exists in the log directory. If it does not
exist, Crawler performs an HTTP GET and downloads the image to the /crawl/logs directory.
These images are then used as input to the Stegdetect module. If the image has been previously
downloaded, the program continues on to process the next steg_log entry.
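
The duplicate check can be sketched as below. The function name crawl_decision and its arguments are hypothetical. The sketch only reports the decision and does not perform the HTTP GET, so it can be tried without the crawl tool installed.

```sh
# Hypothetical helper: decide whether an image still has to be fetched.
# $1 = download directory, $2 = destination IP, $3 = URL path
crawl_decision() {
    dest_url="http://$2$3"
    # The image is assumed to be stored under <directory>/<IP><path>,
    # matching the paths shown in the sample steg_results.
    if [ -f "$1/$2$3" ]; then
        echo "skip $dest_url"
    else
        echo "fetch $dest_url"
    fi
}
```

For example, crawl_decision /crawl/logs 66.135.209.133 /pict/3125705302.jpg prints a "fetch" line the first time and a "skip" line once the file exists in the download directory.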

Stegdetect uses the dest_url string as input and performs various steganalysis methods on the
JPG image. It uses statistical tests to determine if steganographic content is present, and also
tries to determine the system that has been used to embed the hidden information. It uses the
default sensitivity level of 1. The result of the analysis is then logged to the
/crawl/logs/steg_results file. This file contains the filename, results, and confidence level of each
analysis.
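
A results file in this format is easy to filter. The sketch below lists only the entries that Stegdetect neither marked "negative" nor skipped, assuming one "path : result" entry per line as in the Appendix D sample. The function name flagged_results is hypothetical and is not part of Steg_IDS.

```sh
# Hypothetical helper: print the entries of a steg_results-style file
# whose result is neither "negative" nor "skipped".
flagged_results() {
    # Split each line on " : " into the file path ($1) and the result ($2).
    awk -F' : ' '$2 != "" && $2 !~ /^(negative|skipped)/ { print $0 }' "$1"
}
```

Running flagged_results /crawl/logs/steg_results would then show only the images that deserve a closer look, such as those reported as jphide with one or more asterisks.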

6. Stegdetect and False Alarm Rate

Stegdetect was tested using control images in order to analyze the false alarm rate. One hundred
images from a digital camera were used to test the Stegdetect steganalysis results. Of the 100
clean images, Stegdetect reported 6 as containing JPHide content, with varying degrees of
confidence, a false alarm rate of 6%.

False Positives on Clean Images

105-0580_IMG.JPG : jphide(*)
105-0583_IMG.JPG : jphide(**)
105-0592_IMG.JPG : jphide(**)
110-1086_IMG.JPG : jphide(*)
110-1088_IMG.JPG : jphide(**)
111-1138_IMG.JPG : jphide(**)

Figure 5 - Stegdetect False Alarm Rate (pie chart: Positive 6%, Negative 94%)

Next, each of the 100 images was embedded with a hidden Microsoft Word document, using
JPHide, Camouflage, Invisible Secrets, and JSteg respectively. One hundred percent of the JPGs
with embedded content were detected; however, content embedded with JSteg was reported as
coming from the Outguess tool:

JPHide:
105-0568_IMG.JPG : jphide(***)

Camouflage:
105-0567_IMG.JPG : appended(2288)<[random][data][.........{D....d]>

Invisible Secrets:
105-0569_IMG.JPG : invisible[50536](***)

JSteg:
105-0570_IMG.JPG : outguess(old)(***)

7. Steganography and the Internet

The Steg_IDS engine was used in an operational environment for two weeks. It monitored daily
web traffic for a single workstation between the hours of 8am and 5pm. The activity consisted of
normal Internet usage including web research and occasional Ebay searches. During this test
period the Steg_IDS engine collected and processed 585 JPG images. Sixteen of these images
were flagged as containing JPHide content. These images were processed with Stegbreak
using the L0phtcrack dictionary, but none were cracked. Most likely these are false
positives, giving a false alarm rate of 2.7% (16 of 585). This is considerably lower than the 6%
rate in the control image study. The difference could result from many properties of the photos,
including size and color complexity. The control set contained very large images with a lot of
detail, such as landscapes, whereas the web images tended to be smaller with less detail.
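
The two false alarm rates can be reproduced with a few lines of shell. The function name false_alarm_rate is hypothetical; the counts are the ones reported in this paper.

```sh
# Hypothetical helper: false alarm rate as a percentage with one decimal.
# $1 = number of images flagged, $2 = number of images examined
false_alarm_rate() {
    awk -v f="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * f / n }'
}

false_alarm_rate 16 585   # operational web traffic test
false_alarm_rate 6 100    # control images from the digital camera
```

This prints 2.7 for the web traffic test and 6.0 for the control images.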

8. Steg_IDS Future Enhancements

Steg_IDS is currently in its first release at v0.1. Although it has been tested in an operational
environment, several enhancements are planned for future releases. These include:

Log rotation
Better error checking
Migrating the scripts to a more efficient language
Increased speed of processing
The ability to use Steg_IDS in a distributed architecture
Support for other types of images, such as GIF or BMP
Incorporation of Stegbreak for analysis of detected images
APPENDIX A STEG_IDS

#################################################
# steg_ids 04-2003 #
# #
# Steganography Detection Engine #
# #
# Modules: Snort, Parser, Crawler, Stegdetect #
# #
# #
# Angela Orebaugh #
#################################################

### Snort Module
### This module starts the Snort IDS with the proper
### parameters and config file

snort -i eth0 -P 200 -d -C -c /etc/snort/snort.conf &

### Parser Module
### This module parses the Snort log data

tail -f --retry /var/log/snort/steg_log |
while read -r line
do
    # Remember the destination IP from the most recent packet header line.
    # "$line" is quoted so that entries such as ***AP*** are not glob-expanded.
    dest_tmp=$(echo "$line" | grep -e '->' | cut -d" " -f4 | cut -d: -f1)
    if [ "$dest_tmp" != "" ]
    then
        dest="$dest_tmp"
    fi

    # Extract the requested path from the GET line of the payload
    url=$(echo "$line" | grep 'GET' | cut -d" " -f2)

    if [ "$url" != "" ]
    then
        # Only continue if the image is not already in the crawl directory;
        # duplicates are silently skipped
        if [ ! -f /crawl/logs/"$dest""$url" ]
        then
            ### Crawl Module
            ### This module imports the parsed data to the web
            ### crawler and downloads the files

            crawl -R -t 1 -d /crawl/logs http://"$dest""$url"

            ### Stegdetect Module
            ### This module performs steganography detection
            ### on the downloaded files and saves the results to a file

            stegdetect /crawl/logs/"$dest""$url" >> /crawl/logs/steg_results
        fi
    fi
done
APPENDIX B SNORT.CONF

#--------------------------------------------------
# http://www.snort.org Snort 1.9.0 Ruleset
# Contact: snort-sigs@lists.sourceforge.net
#--------------------------------------------------
# NOTE:This ruleset only works for 1.9.0 and later
#--------------------------------------------------
# $Id: snort.conf,v 1.110 2002/08/14 03:17:58 chrisgreen Exp $
#
###################################################
# This file contains a sample snort configuration.
# You can take the following steps to create your
# own custom configuration:
#
# 1) Set the network variables for your network
# 2) Configure preprocessors
# 3) Configure output plugins
# 4) Customize your rule set
#
###################################################

<-configuration file truncated all default configuration used->

#=========================================
# Include all relevant rulesets here
#
# shellcode, policy, info, backdoor, and virus rulesets are
# disabled by default. These require tuning and maintance.
# Please read the included specific file for more information.
#=========================================

#include $RULE_PATH/bad-traffic.rules
#include $RULE_PATH/exploit.rules
#include $RULE_PATH/scan.rules
#include $RULE_PATH/finger.rules
#include $RULE_PATH/ftp.rules
#include $RULE_PATH/telnet.rules
#include $RULE_PATH/rpc.rules
#include $RULE_PATH/rservices.rules
#include $RULE_PATH/dos.rules
#include $RULE_PATH/ddos.rules
#include $RULE_PATH/dns.rules
#include $RULE_PATH/tftp.rules

#include $RULE_PATH/web-cgi.rules
#include $RULE_PATH/web-coldfusion.rules
#include $RULE_PATH/web-iis.rules
#include $RULE_PATH/web-frontpage.rules
#include $RULE_PATH/web-misc.rules
#include $RULE_PATH/web-client.rules
#include $RULE_PATH/web-php.rules

#include $RULE_PATH/sql.rules
#include $RULE_PATH/x11.rules
#include $RULE_PATH/icmp.rules
#include $RULE_PATH/netbios.rules
#include $RULE_PATH/misc.rules
#include $RULE_PATH/attack-responses.rules
#include $RULE_PATH/oracle.rules
#include $RULE_PATH/mysql.rules
#include $RULE_PATH/snmp.rules

#include $RULE_PATH/smtp.rules
#include $RULE_PATH/imap.rules
#include $RULE_PATH/pop3.rules

#include $RULE_PATH/nntp.rules
#include $RULE_PATH/other-ids.rules
#include $RULE_PATH/web-attacks.rules
#include $RULE_PATH/backdoor.rules
#include $RULE_PATH/shellcode.rules
#include $RULE_PATH/policy.rules
#include $RULE_PATH/porn.rules
#include $RULE_PATH/info.rules
#include $RULE_PATH/icmp-info.rules
#include $RULE_PATH/virus.rules
#include $RULE_PATH/chat.rules
#include $RULE_PATH/multimedia.rules
#include $RULE_PATH/p2p.rules
#include $RULE_PATH/experimental.rules
include $RULE_PATH/local.rules
APPENDIX C LOCAL.RULES

# $Id: local.rules,v 1.5 2001/12/19 18:40:05 cazz Exp $


# ----------------
# LOCAL RULES
# ----------------
# This file intentionally does not come with signatures. Put your local
# additions here.

log tcp any any <> any any (content: "GET"; content: ".jpg"; \
    content: !"libcrawl"; nocase; msg:"jpg"; flags: AP; logto: "steg_log";)
APPENDIX D SAMPLE LOGS

Steg_log

[**] jpg [**]


04/07-10:19:27.982516 156.80.45.184:33092 -> 66.218.76.71:80
TCP TTL:64 TOS:0x0 ID:55493 IpLen:20 DgmLen:723 DF
***AP*** Seq: 0x7ADECB1B Ack: 0x2A6F4D68 Win: 0x16D0 TcpLen: 32
TCP Options (3) => NOP NOP TS: 1577985 237565446
GET /sansmentor/SANS_title.jpg HTTP/1.1..Host: www.geocities.com
..User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2)
Gecko/_
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

[**] jpg [**]


04/07-10:22:27.939738 156.80.45.184:33117 -> 66.135.209.133:80
TCP TTL:64 TOS:0x0 ID:18025 IpLen:20 DgmLen:836 DF
***AP*** Seq: 0x86333DDD Ack: 0x4CFF557B Win: 0x16D0 TcpLen: 32
TCP Options (3) => NOP NOP TS: 1670123 481072009
GET /pict/3125705302.jpg HTTP/1.1..Host: thumbs.ebay.com..User-A
gent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20
021120_
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

[**] jpg [**]


04/07-10:22:28.037844 156.80.45.184:33117 -> 66.135.209.133:80
TCP TTL:64 TOS:0x0 ID:18029 IpLen:20 DgmLen:836 DF
***AP*** Seq: 0x863340ED Ack: 0x4CFF6112 Win: 0x2D40 TcpLen: 32
TCP Options (3) => NOP NOP TS: 1670173 481072019
GET /pict/3125033215.jpg HTTP/1.1..Host: thumbs.ebay.com..User-A
gent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20
021120_
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

Steg_results
/crawl/logs/66.135.209.133/pict/3602019586.jpg : jphide(*)
/crawl/logs/66.135.209.133/pict/3602019796.jpg : negative
/crawl/logs/66.135.209.133/pict/3601537611.jpg : negative
/crawl/logs/66.135.209.133/pict/3602646798.jpg : jphide(***)
/crawl/logs/12.162.67.95/lukebacta.jpg : skipped (false positive likely)
/crawl/logs/12.162.67.95/potjcarbon.jpg : negative
/crawl/logs/12.162.67.95/atcpic.jpg : negative
/crawl/logs/12.162.67.95/potjatst.jpg : negative
References

Provos, Niels and Honeyman, Peter. Detecting Steganographic Content on the Internet.
http://www.citi.umich.edu/u/provos/papers/detecting.pdf

Outguess. http://www.outguess.org/

SANS GSEC curriculum material. http://www.sans.org

Gonzalez, Fernando C. Counter Terrorist Steganography Search Engine

Invisible Secrets 2002 Neobyte, http://www.invisiblesecrets.com/

JP Hide and Seek http://linux01.gwdg.de/~alatham/stego.html

JSteg http://packetstormsecurity.nl/crypt/stego/jpeg-steg/

JSteg Shell v2.0 http://members.tripod.com/steganography/stego/jstegshella.zip

Camouflage v1.2.1 http://camouflage.unfiction.com/

F5 http://wwwrn.inf.tu-dresden.de/~westfeld/f5.html

Snort. www.snort.org

http://www.jjtc.com/ihws98/jjgmu.html

http://www.pipo.com/guillermito/camouflage/

D. Sellers, An Introduction to Steganography,
http://www.cs.uct.ac.za/courses/CS400W/NIS/papers99/dsellars/stego.html

T. Blake, Steganalysis Or Is Ralph in Marketing Selling Company Secrets on Our Web Page?,
http://www.sans.org/rr/encryption/steganalysis.php

N. Provos and P. Honeyman, Detecting Steganographic Content on the Internet,
http://niels.xtdnet.nl/papers/detecting.pdf

S. Hetzl, A Survey of Steganography, http://steghide.souceforge.net/steganography/survey

P. Wayner, Disappearing Cryptography, http://www.wayner.org

N. Johnson and S. Jajodia, Exploring Steganography: Seeing the Unseen,
http://www.jjtc.com/pub/r2026.pdf

F. Petitcolas, R. Anderson and M. Kuhn, Information Hiding: A Survey,
http://debut.cis.nctu.edu.tw/~yklee/Research/DataHiding/REF/ProcIEEE1999-7-1.pdf

L. Marvel, C. Boncelet, and C. Retter, Reliable Blind Information Hiding for Images,
http://debut.cis.nctu.edu.tw/~yklee/Research/Steganography/Lisa_M_Marvel/IPIEEE.pdf

Stefan Hetzl, A Survey of Steganography,
http://steghide.sourceforge.net/steganography/survey/steganography.html

Kristy Westphal, Steganography Revealed,
http://www.securityfocus.com/infocus/1684
