
George Washington University

Web Application Vulnerabilities
An Overview

Michael Corsello
11/1/2008

Introduction
Web applications are the primary business-enabling technology in use today. The concept of doing business solely on the internet, outsourcing computing to “compute service” organizations, is referred to as “cloud computing” and is gaining momentum in industry. Peer computing is also growing in value, as shown by the growth in use of mobile ad-hoc networks (MANETs). All manner of devices are now web-enabled, opening new vulnerability surfaces: each device connected to the internet can potentially be attacked via that network interface or, once compromised, used to attack other devices on the internet. These devices include browser-enabled smart phones, smart devices such as cars using online services (e.g., OnStar), portable media players such as iPods that either have built-in internet capabilities or synchronize when connected to another device, streaming media devices such as TiVo and other set-top boxes, and so on.

New technologies are emerging at an alarming rate with ever-growing capabilities, each of which presents new areas for vulnerabilities. Massive data storage, such as single 1.5 TB hard drives and effectively unlimited storage available through storage area networks (SANs), makes locating malicious code within the overall data stores an increasing challenge. Multi-core CPUs are the norm, with quad-core currently commonplace and 60-core CPUs having been demonstrated. Distributed and high-performance computing (HPC) solutions are gaining traction in industry for applications such as offloaders and appliances, with data mining being a primary area for widespread use of distributed and HPC technologies.

Issues
In today’s environment, data is the most valuable commodity of any organization. IT workers are increasingly hard to find, with skill sets declining overall as demand outpaces supply. While highly skilled personnel are still out there, they are rare relative to the total pool of available workers. The primary focus in industry is on quick turnaround and keeping costs low at each step of the IT lifecycle. Hardware has become cheap, leaving labor as the primary cost of IT systems and creating strong pressure to minimize labor costs. IT tools now emphasize “drag and drop” development and the “simplification” of management applications. Overall, this lowers the barrier to entry for becoming an IT worker to meet the demand, while greatly diminishing the average skill level of IT workers in the field. Unfortunately, there are few corresponding efforts to improve the overall quality of the products being produced.

Education
Software and system developers are not trained to account for security. Authentication and authorization are generally the only security concerns addressed by IT workers in software application development. Many security needs are “pushed” to administrative COTS products (e.g., logging, backup, physical security) with no strong interfaces between connected application layers. The emphasis in industry today is “buy” rather than “build” for applications, without real regard to the potential implications of connecting components that were not designed to work together. Open source applications and components create a broader surface of internal knowledge that presents both benefits and risks to the security of applications built with these technologies.

High-Level Anatomy
Current web applications are becoming increasingly complex and distributed. The three primary areas are the client browser, the internet itself, and the application hosting site. Each of these areas has vulnerabilities that can be exploited to launch further attacks on the other areas. In addition to these main areas, there are certificate authority (CA) servers and service hosts. The combination of multiple services into a single user display has become a hallmark of “Web 2.0” applications known as “mashups”. These mashups increase both the security vulnerability surface and the computational complexity of an application.

Figure 1. Anatomy of a web application pipeline

Each node within the chain of communication for a web application may be exploited in a variety of ways to compromise the larger whole. The irony of this distributed form of application is that the freedom of harnessing loosely coupled computational resources comes at the cost of trusting the security and intent of each resource in the application. Each computational resource may be compromised without any other resource being aware of the vulnerability or resultant compromise.


Web Request Anatomy


A web application functions as a collection of request / response pairs between a client and a server, initiated solely by the client. In this form of application, the server is formally defined as the party that listens for and responds to requests, and the client as the party that makes requests. The server can only respond to requests from the client and has no mechanism for initiating communication with the client. Furthermore, the server is expected to respond to numerous requests from a myriad of concurrent clients. Each client is distinct and each request is atomic, in that the server is not expected to know the specific client making a request. This is the overall architecture of the hypertext transfer protocol (HTTP) used in web applications.

A specific request / response pair is structured as follows (a minimal code sketch of one full cycle follows the list):

• The client user makes a web request via their web browser
  o The client browser formats the request as an HTTP request
  o The HTTP request is sent over a TCP connection to the web host
• A TCP connection is opened to the server
  o The “web URL” is resolved to an IP address via DNS resolution
  o The HTTP request is sent to the server at the resolved IP address
• An SSL tunnel may be opened using a certificate that is signed by a “registered” CA
  o A standard HTTP request is passed through this secure tunnel
• The web server generates a response
  o The server processes the request
  o The server performs processing, including I/O, to generate a response
  o The server formats an HTTP response
  o The server sends the HTTP response over the existing TCP connection to the web client
  o The server terminates the TCP connection
• The client browser evaluates the HTTP response
  o The browser processes the response
  o The browser generates the document content (layout)
  o The browser executes script code
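
To make the cycle concrete, the following minimal Python sketch performs one such request / response pair by hand over plain HTTP (no SSL tunnel). The host name is illustrative and error handling is omitted; this is a sketch of the protocol steps above, not production code.

import socket

HOST = "www.example.com"  # illustrative host name
PORT = 80

# DNS resolution: the "web URL" host name becomes an IP address
ip_address = socket.gethostbyname(HOST)

# A TCP connection is opened to the server at the resolved address
with socket.create_connection((ip_address, PORT), timeout=10) as conn:
    # The browser-equivalent step: format the request as an HTTP request
    request = ("GET / HTTP/1.1\r\n"
               "Host: " + HOST + "\r\n"
               "Connection: close\r\n"
               "\r\n")
    conn.sendall(request.encode("ascii"))

    # The server processes the request and returns an HTTP response over
    # the same TCP connection, then terminates it
    response = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        response += chunk

# The client would now evaluate the response (layout, script execution)
print(response.split(b"\r\n", 1)[0].decode("ascii"))  # the status line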

The overall pattern of this request / response pair is divided into network regions or enclaves that
separate the logical networks participating in this transaction. Each of these enclaves are secured
separately and are therefore independently subject to attack.

Client Enclave
The entire boundary from the client to its connection to the internet is defined here as the client enclave. This includes the user, their computer, the connection to a local network, that network itself, and its ultimate connection to an internet service provider (ISP).

In this environment, the user is vulnerable to any number of exploits that compromise their machine. The most common and effective are social attacks that play upon human nature to entice the user to provide an adversary with access to their computer without being aware of having done so. This is commonly accomplished through physical access to the machine: installing key loggers or other covert software or hardware devices to fully capture the user’s actions on the computer. Additionally, social engineering attacks are quite effective, in which the user is presented with a false image of a trusted information source, such as a faked web application or email, and responds as though it were genuine.

Once a client machine is compromised, it may be used to attack any other application on any network that machine has access to. The machine can further be used to record information on private networks and later replay that information over a public network, a particular risk for roving devices such as laptops.

Finally, an unauthorized user can gain seemingly legitimate access to a web application by presenting compromised credentials. This form of attack, or identity theft, poses a significant risk, especially where the compromised account has elevated privileges. Not only can a compromised account be exploited to access business information in the compromised application, but that access can also serve as a vector to insert malicious code, providing further access to secured resources and compromising other, more highly secured applications and hardware.

Figure 2. User Client Enclave.

Beyond the user computer itself, the client enclave can be compromised through network attacks and
eavesdropping. Wireless networks are highly susceptible due to the nature of open transmissions that
can be passively intercepted without detection. While cabled networks can also be tapped, the physical location of many cables makes such an attack logistically more difficult.


ISP Enclave and the Internet


Each ISP connection that a client uses is subject to attack. ISP organizations generally track connections and provide some logging functionality; these logging facilities can be modified to capture information that can be directly exploited. Further, ISP hardware is generally not physically secured well enough to prevent direct exploitation. Overall, every aspect of communication from the client machine to the internet itself is subject to various forms of network attack.

Once the ISP network is reached, the content is transferred to the internet backbone at a point of presence (POP) for the ISP carrier. These POPs are the only places where traffic is allowed to switch from one carrier to another, due to the proprietary nature of the telecommunications infrastructures. They are also the points at which traffic is transferred from an IP-style switched network to a non-IP-aware synchronous optical networking (SONET) or other form of infrastructure.

These telecommunications provider networks are extremely high speed and generally poorly secured. While it is extremely difficult to pick out a single communication stream from these backbones (as they are multiplexed and disjointly routed), it is quite possible to gain physical access to the hardware. Where accessed directly, it is possible to spoof large sections of the internet by circumventing some of these connections and routing schemes. These types of attacks are known to be possible but have not yet been seen on any scale. This form of attack is a primary focus of critical infrastructure protection groups such as DHS under the National Infrastructure Protection Plan (NIPP).

Figure 3. ISP and Internet Enclave

Certificate Authority and DNS Enclaves


Performing the request / response cycles of a web application relies on supporting infrastructure servers.


Domain Name Service


The primary external service is the Domain Name Service, or DNS. DNS performs the name resolution that looks up an internet protocol (IP) address from a uniform resource locator (URL) such as www.microsoft.com. The DNS system is a networked set of servers throughout the world that replicate portions of the total name resolution caches among each other. No single DNS server houses the entire resolution dictionary; instead, each server may be the source of truth (start of authority, or SOA) for a small portion of the total dictionary based upon a fixed starting domain name, such as sun.com. The DNS server that is the SOA for the sun.com domain has authority over all subordinate records for that domain, such as www.sun.com or commerce.sun.com. Each SOA server can also delegate authority for a portion of its domain to another server. As a result, the resolution of a domain name to an IP address is a serious attack vector.

This vector can permit an attacker to direct a request for a web application at a given URL to the IP address of an attacker-controlled machine. The attacker may then simply intercept all traffic and “pass it through” to the actual server, or place an alternative application at the attacker’s location. These attacks can be performed in several ways, including:

• DNS cache poisoning, where a legitimate DNS server is loaded with bad data from a replication partner or registering client
• DNS spoofing, where a false DNS server is established within the enclave of the client to return a false IP address for the domain name
• DNS circumvention, where a client is compromised and provided with a local “hosts” resolution that bypasses external DNS requests

DNS resolution as an external mechanism is a source of concern for all web applications that use “friendly” URLs rather than raw IP addresses. Furthermore, there is no way to determine at the server whether these attacks are happening, as they are local to the client’s enclave. Reverse DNS hacks and secondary server-side attacks are also a business issue for servers, as lookup records can be altered or removed to prevent clients from ever reaching the server.
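
To illustrate how little the client can verify, this short Python sketch performs the same resolution step a browser would. A poisoned cache, a spoofed DNS server, or a local “hosts” entry would each change the output with no visible difference to the caller; the host name is illustrative.

import socket

# Resolve a host name exactly as a client would before opening a connection.
# Cache poisoning, a spoofed DNS server, or a local "hosts" file entry all
# alter this answer, and the client cannot tell the difference.
for info in socket.getaddrinfo("www.sun.com", 80, proto=socket.IPPROTO_TCP):
    family, _, _, _, sockaddr = info
    print(family.name, sockaddr[0])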

Certificate Authorities
Certificate Authorities (CAs) are “self-promoting” sources of “truth” for security certificate authentication. There is no guarantee that a CA is trustworthy, as any server can be configured to be a CA. SSL/TLS connections that use CA certificates simply validate digital signatures against the pre-installed root certificates in the browser’s security cache, and these certificates are rarely checked for revocation. While PKI is used for the signature, the scheme is still prone to attack at the end-points: a man-in-the-middle attack can still be conducted, since SSL is only a point-to-point secure protocol. Further, proxying an SSL connection can break the security, as the proxy may consider itself the terminal point of the point-to-point connection. The use of SSL accelerators and offloaders makes this of increasing significance within the server-side enclave.
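
As a sketch of where this trust actually lives, the Python fragment below opens a TLS connection with default verification against the client’s pre-installed root store. The commented-out lines show how little it takes to disable verification entirely, which is in effect what a user does when accepting an unverified certificate. The host name is illustrative.

import socket
import ssl

HOST = "www.example.com"  # illustrative server name

# The default context validates the server certificate against the
# pre-installed root certificates -- the sole anchor of trust here.
# Revocation is typically not checked.
context = ssl.create_default_context()

# Uncommenting these two lines accepts ANY certificate:
# context.check_hostname = False
# context.verify_mode = ssl.CERT_NONE

with socket.create_connection((HOST, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print(tls.version(), tls.getpeercert()["subject"])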


While an SSL connection will display the signature result of an SSL certificate, users will often accept the risk of non-verified certificates issued by CAs that the browser is unable to verify. Overall, while the cryptographic capabilities of modern SSL/TLS are fairly robust (early SSL versions were not secure), social engineering and technical workarounds such as CA spoofing coupled with DNS spoofing can render SSL moot. As in many cases, the security is limited and mostly a façade.

Figure 4. CA or DNS Enclaves

Web Server Enclave


The heart of any web application lies within the server enclave of the hosting organization. In most cases, the web server is physically isolated from the data storage server, and often from the processing servers that actually provide the data and computation (respectively) for any request. Since the nature of the server enclave is to process requests from many users concurrently in successive request / response cycles, the server enclave must focus on throughput: quick handling of each user request provides responsiveness to a client already burdened by network latencies in both directions.

Within the server enclave, a firewall (or series of firewalls) generally separates the internet from the web servers. The web servers are contained in a demilitarized zone (DMZ), where a partially hostile environment is expected and which acts as a buffer to the protected internal network. The compute and data servers generally sit beyond another firewall inside the protected internal network and are accessed by the web server via a trusted connection (secured in any number of ways). Each of these layers can be attacked, compromised, or merely sniffed to compromise the data traffic passing to and through them.

The application and database servers are of particular interest, as they host the actual information and processing power representing the business capability to which the web application serves as a user interface. By hacking or compromising these nodes, the entire enterprise becomes vulnerable to exploitation and, more importantly, the intellectual property of the host is put at risk. In modern web applications, the database is backed by a storage area network (SAN), which can host nearly unlimited data volumes. The SAN itself is a private network relying on TCP/IP underpinnings with data-specific higher protocols (Fibre Channel, iSCSI, etc.) to enable greater performance for disk I/O style operations. Given that the SAN is simply another network, it is susceptible to many of the same attacks as a conventional network, yet is rarely secured in the same fashion as the commodity network.


Figure 5. Web Server Enclave

Service Provider Enclave


The addition of services as a mechanism for distributing computational and data resources has added another facet to web applications. While the concept of services has been around for many years (CORBA, RMI, RPC, PVM, DCOM, etc.), the practical use of distributed services has been limited. Once internet technologies expanded to the common use of XML, web-based services using XML as the sole exchange format became established. These “web services” have been touted as the primary vehicle for implementing a “service oriented architecture” (SOA), in which computation is distributed to nodes hosting coarse-grained services that are later merged to form a coherent application. While this is largely just another form of distributed component reuse, web services have gained a level of use that no other such technology has seen. In an SOA-based environment, each part of an application can be encapsulated and distributed as a service or set of services. Due to the nature of the transport and encoding (XML over HTTP), the entire communication is text based and requires structured parsing. This has created a need to enhance service performance, which has been addressed through various other technologies.
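
As a small illustration of that cost, the sketch below parses a hypothetical XML service payload with Python’s standard library. Every request to a web service must pass through structured text parsing of this kind before any business logic can run.

import xml.etree.ElementTree as ET

# A hypothetical XML payload of the kind a "web service" exchanges.
payload = """<order xmlns="urn:example:orders">
  <item sku="A-100" quantity="2"/>
  <item sku="B-205" quantity="1"/>
</order>"""

# The request is text, not a binary structure, so it must be parsed
# into a tree before the service can act on it.
root = ET.fromstring(payload)
for item in root.findall("{urn:example:orders}item"):
    print(item.get("sku"), item.get("quantity"))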

Within a service provider enclave there may be several specialized server devices that are less likely to be seen in other enclaves. The SSL offloader is a primary example of a dedicated server within the DMZ of an enclave that increases performance while constituting a direct security risk. An SSL offloader handles all incoming HTTPS requests and simply serves as the SSL termination point. From that server, requests are forwarded to another web server that handles the actual processing. The requests sent onward by the SSL offloader are entirely in the clear as HTTP requests; if the DMZ is compromised, the offloader’s outbound traffic is a primary means of acquiring full access to all traffic in the clear, as the sketch below illustrates.
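
The following Python sketch is a schematic of the offloader’s role, not any vendor’s implementation; addresses, ports, and certificate files are hypothetical. TLS terminates at the appliance, and the forwarded leg toward the internal web server is plaintext HTTP.

import socket
import ssl

BACKEND = ("10.0.0.20", 80)  # hypothetical internal web server

# Terminate TLS at the "offloader" using a hypothetical key pair.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain("offloader.crt", "offloader.key")

with socket.create_server(("", 443)) as listener:
    while True:
        raw, _ = listener.accept()
        with context.wrap_socket(raw, server_side=True) as client:
            request = client.recv(65536)       # decrypted at this point
            with socket.create_connection(BACKEND) as backend:
                backend.sendall(request)       # forwarded entirely in the clear
                client.sendall(backend.recv(65536))

A single receive and send per connection keeps the sketch short; a real offloader streams both directions concurrently, but the plaintext forwarded leg is the same.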

Figure 6. Service Host Enclave

In addition to the SSL offloader, there may be XML processing appliances that validate, transform, and enforce encoding rules on the XML payloads of requests. These appliances must have direct access to the XML content of all requests. Since they are proprietary servers, they are subject to attacks based upon knowledge of how they are implemented and of their host operating systems. These appliances are often not open to local administrators to patch directly and instead must be patched by the manufacturers (generally online), which can delay the application of OS security patches while vendors incorporate them into their environments. Beyond these vulnerabilities, each host enclave will generally be very similar to the web server enclave and therefore carries all the same vulnerabilities. Because these enclaves are physically separated by the internet, there are additional vectors for attack between them. Additionally, each location is generally secured differently by different organizations that may not communicate effectively with one another; this can result in many secondary vulnerabilities on top of the increased attack surface area.

Application Code Vulnerability


The primary topic of interest to the author is application vulnerabilities rooted in the architecture of applications themselves. Each server that hosts code is subject to many forms of tampering, including reverse engineering of code libraries, injection of malicious code into existing libraries or into dynamically loaded code, configuration parameter manipulation, host environment compromise, OS and driver intercepts, and so on. Each of these forms of application tampering may compromise the entire server and all applications it hosts. Once compromised, all data is potentially at risk, as are the users and hosting organizations.

Code-Level Security
Each library contains executable code that is compiled for use. This includes virtual-machine languages such as Java, .NET, and Python as well as traditional languages such as C, C++, Delphi, and Fortran. Each library may be reverse engineered to run additional malicious code. Even though some virtualized languages such as Java and .NET provide a signing mechanism and runtime version checking, these are often not used by developers and can be bypassed. In a dynamic environment it is often desirable to design an application for extensibility using dynamic late binding via a mechanism known as reflection. In such cases there is no a priori knowledge of the library at all; if this is not planned for, a malicious library may easily be injected onto the server and run, as the sketch below illustrates.
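
A minimal Python sketch of the reflection pattern; the module name here is a stand-in (the standard json module), but in practice it would come from configuration, giving an attacker who can write to the import path or to the configuration a direct route to code execution.

import importlib

# Dynamic late binding: the name would normally come from a configuration
# file or database. Nothing below verifies the origin, signature, or intent
# of the module; whatever is found on the import path runs with the
# application's privileges.
plugin_name = "json"  # stand-in for a configured plugin name
plugin = importlib.import_module(plugin_name)
handler = getattr(plugin, "dumps")  # entry point looked up by name at runtime
print(handler({"loaded": plugin_name}))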

Scripting languages are especially risky, as the code runs directly as text with no real distinction between code and data. These languages expose the possibility of code injection wherever data and code meet (e.g., SQL injection, JavaScript injection). Web application user interfaces are a perfect example of this issue: the web browser renders a combination of hypertext markup language (HTML) and JavaScript transported from the server to the client in an HTTP response (and, in the case of AJAX, in asynchronous responses). An adversary posing as a user can receive a response from the web server, then manipulate any portion of the content in that response to perform any action desired. If the server places undue trust in client-side content, the entire application may be compromised by such client-side fiddling.
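
A self-contained Python sketch of this code/data collision, using the built-in sqlite3 module and a hypothetical table. The concatenated query lets crafted input rewrite the statement; the parameterized form keeps the same input strictly as data.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

user_input = "x' OR '1'='1"  # attacker-supplied value

# Vulnerable: the quote characters in user_input become part of the SQL
# statement itself, turning data into code.
rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'").fetchall()
print("concatenated:", rows)    # returns every row

# Safe: a parameterized query keeps user_input strictly as data.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print("parameterized:", rows)   # returns nothing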

Configuration information is usually stored as textual data in an XML or other flat file on the web server. These files are open plain text, easily read and modified by an adversary. Editing them can gravely impact the execution of an application, to the extent of rerouting traffic to a compromised server (such as a database man in the middle). Further, these files often contain sensitive information, such as login credentials for a database server or third party service, that may be directly exploited.
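
A sketch of the exposure, with hypothetical file contents: the same standard-library parser the application uses to read such a file serves an adversary equally well, and nothing protects the credentials inside.

import xml.etree.ElementTree as ET

# A hypothetical plain-text configuration file of the kind described above.
config_xml = """<configuration>
  <connectionString>Server=db01;Database=orders;User Id=app;Password=P@ssw0rd;</connectionString>
</configuration>"""

# Anyone who can read this file obtains the database credentials; anyone
# who can write it can silently repoint the application at a hostile server.
config = ET.fromstring(config_xml)
print(config.findtext("connectionString"))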

Conclusions
Web applications are the primary vehicle for computing and are incredibly vulnerable to attack. The attack surface of a web application alone is immense and growing. Web services are far from immune, with new attacks being devised and few yet seen in the wild.

Authentication and authorization are the primary security focus of developers writing applications. There is little incentive to develop robust applications, as quick turnaround and low cost are the primary drivers in industry.

Security in the tail is a great opportunity for the development of new forms of protection and new forms of exploits. Data integrity is a hot topic for regulation and a primary area of active exploitation. All applications exist to act upon, and in general store, data. Data is the commodity to be protected at all costs.

The education levels of technologists and the time spent in analysis and design are huge issues for implementation. Overall, analysis and design are skipped as often as testing is.

Services are the next area of concern for security. Secure services generally protect only the data, not the equipment. Binary services such as those offered by CORBA and RMI/IIOP perform better and are more inherently secure, but have fallen out of favor.

More resources need to be dedicated to the design and development of openly architected components
designed to be generically integrated in a secure fashion.
