
Applying Web Service and Windows Clustering for High Volume Risk Analysis

Sivadon Chaisiri, Juta Pichitlamken, Putchong Uthayopas, Thanapol Rojanapanpat, Suphachan Phakhawirotkul, and Theewara Vorakosit
High Performance Computing and Networking Center, Faculty of Engineering, Kasetsart University, 50 Phaholyothin Road, Bangkok 10900, Thailand
{g4665304, juta.p, pu, g4765402, g4565065, g4685034}@ku.ac.th

Abstract

We present the development of a distributed system to calculate the Value at Risk (VaR) measure when a large number of users are present. A scalable architecture based on Windows clustering and web services is proposed. In addition, we develop a load balancing algorithm to distribute the workload among the compute nodes in the Windows cluster. The experimental results show that our system can substantially speed up the VaR calculation and that it offers good scalability. This work provides an example of how to deploy standard web service and Windows clustering technology as a cost-effective and scalable solution for demanding financial applications in practice.

1. Introduction

Many commercial organizations routinely perform computing-intensive calculations. For example, credit card companies employ data-mining techniques to detect credit card fraud, and commercial banks must compute value-at-risk measures to comply with the Basel II Capital Accord. These calculations generally require a large amount of historical data and thus high computing power to analyze it. Deploying such applications on a centralized server is one way to get the results, but it has some drawbacks.

In a typical commercial information system, software is implemented as a server-side application that executes some business logic, manipulates data, and is hosted on a single centralized server. Many users and other applications may request the services that this server provides. If those applications consume a great deal of computing power, the overall performance of the server degrades, and the server may become temporarily unavailable when there are too many requests. This limitation depends on many factors, such as the server's specifications (e.g., processor capability and available memory) and the runtime software infrastructure (e.g., operating system, application server, and database software). In addition, a single centralized server poses the risk of a single point of failure: when the server fails, the whole system goes down with it.

To get more computing power, one can buy a new high-end server. However, the single point of failure remains, and the total cost of ownership (TCO) also rises: not only the purchase price of the new server, but also the cost of upgrades, maintenance, service, and repair.

Another high performance computing (HPC) solution is clustering. A cluster is a group of computers connected via a high speed network. Computers inside the cluster share resources such as storage, data, software, and computing power. An application running on a cluster can harness the shared computing resources as a unified pool and thereby obtain a much greater speedup than is possible on a single server. Furthermore, a cluster can be built from commodity personal computers that are much less expensive than a high-end server. The single point of failure is also avoided because many computers inside the cluster provide the same functionality; if one computer fails, others can take over its functions.

In this paper, our cluster is built on the Microsoft Windows platform, which we choose because the operating system is robust and easy to configure. The application that we test in this work is a real-world application developed for the Thai Bond Dealing Centre (ThaiBDC; see www.thaibdc.or.th).
The application calculates the Value-at-Risk (VaR) of a portfolio of stocks traded on the Stock Exchange of Thailand. VaR is an estimate, with a user-defined degree of confidence, of how much one can lose from one's portfolio over a given time horizon [1]. It aims at making a statement of the form "We are c percent certain not to lose more than V dollars in the next m days." ThaiBDC already provides the VaR calculation service commercially, but on different platforms. This application is therefore used concurrently by a large number of users, and each request generally requires substantial computing power because the calculation involves manipulating large two-dimensional matrices.

To achieve the cluster environment, we implement the VaR application as a web service that is deployed on every computer inside the cluster. The cluster is thus able to handle many incoming requests simultaneously. In addition to the application, we also need a load-balancing mechanism to distribute users' requests to the computers inside the cluster. We propose an architecture for a cluster with a load balancer that is designed to handle a large number of requests. Our experiments show that the cluster and software maintain a good response time as the number of requests increases. We expect that the proposed architecture can be applied efficiently to other business applications as well.

This paper is organized as follows: Section 2 details related work. We describe our design in Section 3. The experimental results are in Section 4, and we conclude in Section 5.

2. Related work

Currently, most applications are designed to run on a single processor; this is usually termed sequential programming. To efficiently utilize the computing resources shared in a cluster, applications must instead be implemented with parallel programming techniques, in which the processing is distributed over multiple processors in the cluster and the processors communicate with each other and exchange data. Widely used tools for building parallel applications are Application Programming Interfaces (APIs) such as the Message Passing Interface (MPI) [2][3] and the Parallel Virtual Machine (PVM) [4]. A major drawback of parallel/distributed programming is that it takes more time and effort to design and implement applications than sequential programming. However, modern development frameworks (e.g., J2EE [5] and the .NET framework [6]) are appealing because they provide many tools and APIs that support business application development. By applying such a framework to parallel program development, a programmer can build business applications that exploit parallel computing power much faster than before.

When we build a cluster for parallel programming, we generally choose a single operating system that provides the runtime infrastructure for resource sharing and communication. Although it is possible to build a cluster with heterogeneous operating systems, a single operating system is much simpler to maintain. Two types of operating system are widely used for clustering:

1) Linux cluster: This type of cluster is based on commodity hardware, open-source software, and Linux, and is widely known as a Beowulf cluster [7]. NPACI Rocks and OSCAR [8] are examples of software for building Linux clusters. The main advantages of this type of cluster are its highly configurable architecture, high scalability, and low software licensing cost. However, skill with the Linux operating system is required to maintain the cluster. Hence, this type of cluster is used mostly in technical and academic environments, where the required skills and manpower are available cheaply.

2) Windows cluster: Microsoft Corporation provides a cluster solution via the Microsoft Windows operating system. Much pioneering work on how to build and use large-scale Windows clusters has been done by researchers from the Cornell Theory Center (CTC) [9]. Although building a Windows cluster means paying some software cost, it may be more suitable for commercial environments for a number of reasons. First, many companies already have a great deal of Windows expertise in-house. Second, building and maintaining a Windows cluster is easier than a Linux one, thanks to a rich set of GUI-based tools. Third, there is a rich set of robust, standard tools (such as the Java Development Kit and Microsoft Visual Studio) that enable programmers to quickly develop applications in this environment. Finally, the standard technologies involved (e.g., web services and Java) are very robust, having been tested for years in mission-critical commercial environments.

Currently, there is a rapidly increasing need for organizations to exchange information with each other. This so-called business-to-business (B2B) communication allows a much faster and more automated flow of information, speeding up processing in applications such as package tracking and customer relationship management (CRM). Hence, there is a need for a standard technology that allows applications to communicate with each other over the Internet regardless of platform differences. Web services (WS) [10] are a standard technology defined by the W3C for this purpose. A series of standards covers web service technology, including naming, the WS description language, WS messaging, and WS security. The advantage of web services is that they form a well-supported standard with a rich set of development tools and environments. In this work, web services are the key technology applied to provide parallel/distributed execution of the VaR calculation.

Although web services can serve as a common communication infrastructure, parallel execution of the application still requires a mechanism to distribute work to each compute node inside the cluster.
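Two classical baseline policies that the discussion below reviews, random balancing and round-robin balancing, can be sketched in a few lines. This is an illustrative Python sketch with hypothetical node URLs; the system described in this paper is implemented on .NET, and this code is not part of it.

```python
import random
from itertools import cycle

# Hypothetical compute-node web service URLs
nodes = ["http://node1/var", "http://node2/var", "http://node3/var"]

def random_balancer(nodes):
    """Random balancing: pick a node uniformly at random for each request."""
    return random.choice(nodes)

def make_round_robin(nodes):
    """Round-robin balancing: hand out nodes from a circular queue."""
    ring = cycle(nodes)
    return lambda: next(ring)

rr = make_round_robin(nodes)
assignments = [rr() for _ in range(6)]  # each node is chosen exactly twice
```

Round-robin prevents starvation, but as noted below it ignores differences in node speed; the algorithm of Section 3.4 addresses that by tracking the load actually assigned to each node.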
Balancing the execution load among the compute nodes is one of the determining factors of overall system performance. For example, a poor load balancing algorithm may cause some computers to bear a large load while other computers take up small loads or sit idle. Load balancing is a field of active research; the random and round-robin balancing algorithms are two well-known and widely used methods.

The random balancing algorithm randomly chooses a computer to handle each new incoming load. Its appeal lies in its simplicity and its efficiency on a large cluster. However, it generally performs poorly on a small cluster, and a bad randomization algorithm can cause serious load imbalance. The round-robin algorithm has a strategy to prevent starvation, a situation in which some computers are idle for a long time: it keeps a circular queue of computers and distributes incoming loads to them sequentially and circularly. Although the round-robin algorithm is efficient, it may overload some computers when the cluster consists of processors with different speeds. Hence, we use an algorithm that assigns each incoming load to the computer that currently has the least load. We describe our algorithm in more detail in the next section.

3. Design and implementation

3.1. Risk analysis process

The process used in this work for the VaR calculation is shown in Figure 1.

Figure 1. Scalable VaR calculation process

In the VaR calculation tool, users input the portfolio information (the portfolio details, the holding period, and the confidence level) through a user interface module. In this work, there are two user interface modules: one is a web application developed using ASP.NET, and the other is an Excel worksheet that connects to the VaR calculation system through a web service. After this information becomes available, the VaR calculation starts. The algorithm used for the VaR calculation is the Single-Factor Capital Asset Pricing Model (CAPM) (cf. [11]). The calculation needs historical data from the Stock Exchange of Thailand (SET), such as the daily SET index and closing prices. This data is available in a local database kept at the ThaiBDC site; an updated version of the data is transferred daily from SET to ThaiBDC.

3.2. System architecture

The architecture of our system is depicted in Figure 2. The system consists of three components: a front-end node, compute nodes, and client nodes.

Front-end node: The front-end node performs several functions. First, it acts as the security and access control point for the cluster. Second, it acts as a web portal and web server for web-based applications that use the cluster. Finally, it acts as the load balancer for the VaR application; the load balancer is a service that distributes the incoming requests among the compute nodes.
Figure 2. System configuration

Figure 3. Sequence diagram for scalable VaR calculation
Compute nodes: The compute nodes are a collection of PCs that contribute computing resources to the cluster. An application component or function runs on each compute node and performs a part of the computation required by the application. In this work, the function that runs on each compute node is implemented as a web service, and a node's resources are available only through that web service. Implementing web services is convenient with modern development tools such as Microsoft Visual Studio .NET and J2EE.

Client nodes: Computers located outside the cluster are called client nodes. A client node functions as a user interface for the application.

3.3. Software architecture

Figure 3 shows the scalable VaR calculation process. Each number indicates a step in handling an incoming request:

1) The client node requests from the load balancer, running on the front-end node, the URL of a web service located on one of the compute nodes.
2) The load balancer runs the load balancing algorithm and returns the URL of the chosen compute node to the requesting client node.
3) The client node uses the returned URL to connect to and invoke the web service on that compute node.
4) The compute node executes the service using the inputs from the incoming request.
5) The result of the computation is returned to the client node.

The application is implemented as a web service on the compute nodes, and the load balancer is responsible for distributing the incoming requests among the deployed web services. In the next section, we explain our load balancing algorithm.

3.4. Load balancing algorithm

Our load balancing algorithm is called the Least Weighted-Workload First algorithm. It works as follows. The system supports several different VaR calculation functions, and each VaR calculation function i is assigned a weight Wi according to the increasing order of its execution time: the fastest function is assigned a weight of 1, the next fastest a weight of 2, and so on. The value of Wi can therefore be found simply by running each function and measuring its execution time. Once assigned, the weights are kept in a static table for future use.

The load balancer also keeps track of the workload on each machine. This tracking relies on a machine workload table (see Figure 4). The table is a priority queue with two fields: the first field is the URL of the web service on a compute node, and the second is the workload already assigned to that node.
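As a concrete illustration of these two data structures, the sketch below keeps a static weight table and a machine workload table, and always assigns a request to the least-loaded node. The function names, URLs, and weights are invented for the example; the paper's implementation uses a built-in .NET priority queue, while here a simple dictionary scan stands in for it.

```python
# Static weight table: VaR function name -> weight Wi
# (ordered by measured execution time; names are hypothetical)
WEIGHTS = {"var_fast": 1, "var_medium": 2, "var_slow": 3}

class WorkloadTable:
    """Machine workload table: node URL -> workload currently assigned."""

    def __init__(self, node_urls):
        self.load = {url: 0 for url in node_urls}

    def assign(self, func):
        """Choose the least-loaded node and add the function's weight."""
        url = min(self.load, key=self.load.get)
        self.load[url] += WEIGHTS[func]
        return url  # this URL is what the balancer returns to the client

    def notify_done(self, url, func):
        """Completion notification: subtract the finished function's weight."""
        self.load[url] -= WEIGHTS[func]

table = WorkloadTable(["http://n1/var", "http://n2/var"])
first = table.assign("var_slow")   # both nodes idle; the first one wins
second = table.assign("var_fast")  # n2 is now the least loaded
table.notify_done(first, "var_slow")
```

After the completion notification, n1 is idle again, so the next request would go back to n1 rather than piling onto n2, which is exactly the behavior described in the text.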
Figure 4. Machine workload table

Using these two data structures, load assignment is very simple. When a request for a VaR calculation arrives, the load balancer can immediately assign the task to the node at the top of the machine workload table: because the table is maintained as a priority queue, that node always has the least execution load. After the selection, the workload of that node is increased by the pre-assigned weight of the function chosen by the user, the machine workload table is updated automatically, and the URL of the chosen web service is passed back to the client node.

When the execution on a compute node finishes, a web service function notifies the load balancer that the function has terminated. The load balancer then updates the machine workload table by subtracting the weight of the finished function from that machine's workload. Hence, the load balancer always selects the next compute node based on the most up-to-date information. In this work, the notification from the web service to the load balancer is handled by a separate web service component on the load balancer machine. A problem may arise when the load balancing and notification web services try to access the machine workload table at the same time; we solve this by locking the machine workload table using the .NET application locking mechanism.

3.5. User interface choice

To simplify the user experience, the front end of the system that a user encounters must be simple. We offer two design alternatives. First, we build the VaR calculation as a web-based application using ASP.NET, running under the web server on the front-end machine. A user interacts with the system through a standard web browser and a web form interface (see Figure 5). A drawback of this approach is that users have to move data back and forth from their usual computing environment and tools, such as Microsoft Excel.

Figure 5. Web based user interface

The other user interface is through Microsoft Excel. Microsoft Office 2003 supports web service invocation inside Office applications (see Figure 6). We use Microsoft Visual Studio Tools for the Microsoft Office System 2003 to link an Excel application to the VaR computation platform. Users can load their data into the Excel worksheet that we provide, connect their computer to the Internet, and perform the computation on our system.
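To make the per-request computation concrete before turning to the experiments: Section 3.1 identifies the Single-Factor CAPM [11] as the VaR algorithm but gives no formulas, so the following is only a generic parametric sketch of a single-factor VaR estimate, with made-up portfolio numbers; it is not the ThaiBDC implementation.

```python
from math import sqrt

def single_factor_var(value, weights, betas, resid_sd, market_sd,
                      z=1.645, horizon_days=1):
    """Parametric VaR under a generic single-factor model (illustrative).

    Daily portfolio variance = (sum_i w_i*beta_i * market_sd)^2
                               + sum_i (w_i * resid_sd_i)^2,
    and VaR = z * portfolio_sd * sqrt(horizon) * portfolio value.
    """
    beta_p = sum(w * b for w, b in zip(weights, betas))
    var_daily = (beta_p * market_sd) ** 2 + sum(
        (w * s) ** 2 for w, s in zip(weights, resid_sd))
    return z * sqrt(var_daily) * sqrt(horizon_days) * value

# Made-up two-stock portfolio, 95% confidence (z ~ 1.645), 10-day horizon
loss = single_factor_var(1_000_000, weights=[0.6, 0.4], betas=[1.2, 0.8],
                         resid_sd=[0.01, 0.02], market_sd=0.012,
                         horizon_days=10)
```

This reads directly as the statement quoted in the introduction: with the numbers above, "we are 95 percent certain not to lose more than roughly 83,000 (out of 1,000,000) in the next 10 days."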
Figure 6. Excel-based user interface

4. Experiments

4.1. Experimental configuration

The test system has one front-end node with a Pentium IV 3.0 GHz processor (with HyperThreading), 512 MB RAM, and a 120 GB hard disk. There are 8 compute nodes in the system, each a Pentium III 750 MHz with 128 MB RAM and a 40 GB disk. Both the front-end and the compute nodes run Windows Server 2003. The test client is a PC with an AthlonXP 1600+ (1.4 GHz) processor, 256 MB RAM, and a 40 GB hard disk, running Windows XP Professional. All machines are connected through a Fast Ethernet switch.

In the test problem, we consider all stocks currently listed on the Stock Exchange of Thailand (405 equities), and we use the actual historical prices from January 2, 2002 to December 30, 2003.

4.2. Results and discussion

We test our system by using a simulated client program to send multiple simultaneous requests to the scalable VaR calculation system. We record the time spent to finish 10, 50, 100, 500, and 1000 simultaneous requests. Each experiment was repeated three times, and the average values are presented in Table 1. The speedup and efficiency of the computation are presented in Tables 2 and 3, respectively, and plotted in Figures 7 and 8.

Table 1. Processing time (seconds) used to process the requests on 1-, 2-, 4-, and 8-node systems.

No. of            Number of compute nodes
requests         1         2         4         8
10           59.93     43.21     22.10     16.03
50          295.79    205.50    101.45     62.42
100         538.04    409.51    208.86    120.49
500        2669.70   2030.44   1037.47    567.00
1000       5326.22   4056.75   2071.00   1113.39

Table 2. Speedup computed from Table 1.

No. of            Number of compute nodes
requests         1         2         4         8
10            1.00      1.39      2.71      3.74
50            1.00      1.44      2.92      4.74
100           1.00      1.31      2.58      4.47
500           1.00      1.31      2.57      4.71
1000          1.00      1.31      2.57      4.78

Table 3. Efficiency of the processing computed from Table 2.

No. of            Number of compute nodes
requests         1         2         4         8
10            1.00      0.69      0.68      0.47
50            1.00      0.72      0.73      0.59
100           1.00      0.66      0.64      0.56
500           1.00      0.66      0.64      0.59
1000          1.00      0.66      0.64      0.60

From the experimental results, it can be seen that our system delivers very good performance: using 8 nodes, the total calculation time is reduced by a factor of 4.8. For a fixed workload, the speedup of the system increases with the number of compute nodes, because the work is distributed throughout the system. However, this speedup increase eventually levels off due to the overhead inherent in the system; if more machines were available for testing, we could determine this limit.

We also see that when the number of compute nodes is fixed, speedup tends to increase as the workload increases. This might not be obvious for a small number of nodes (such as 2) and a small number of requests (such as 10, 50, and 100), because the startup and fixed-cost overhead is high.

In addition, when we consider the system's efficiency (defined as the speedup divided by the number of compute nodes), the efficiency tends to increase as the workload grows (see Table 3). Thus, the system scales up well and functions more efficiently with more workload, which is an appealing characteristic for a high-throughput application.
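The speedup and efficiency values in Tables 2 and 3 follow mechanically from the timings in Table 1: the speedup on n nodes is the one-node time divided by the n-node time, and the efficiency is the speedup divided by n. A few lines reproduce the reported numbers (only two rows of Table 1 are shown):

```python
# Average processing times in seconds, copied from Table 1
TIMES = {
    10:   {1: 59.93,   2: 43.21,   4: 22.10,   8: 16.03},
    1000: {1: 5326.22, 2: 4056.75, 4: 2071.00, 8: 1113.39},
}

def speedup(requests, n_nodes):
    """Speedup relative to the single-node run (Table 2)."""
    return TIMES[requests][1] / TIMES[requests][n_nodes]

def efficiency(requests, n_nodes):
    """Efficiency = speedup / number of nodes (Table 3)."""
    return speedup(requests, n_nodes) / n_nodes

# e.g. 1000 requests on 8 nodes: speedup ~ 4.78, efficiency ~ 0.60
```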
Figure 7. Speedup plot of the test results

Figure 8. Efficiency plot of the results

5. Conclusion

In this work, we apply Windows clusters to a high volume risk analysis application. We show that standard technologies, such as web services on the common Windows-based platform, enable us to build a scalable system for demanding needs. During the course of the development, we found Windows cluster development to be simple, since the development tools, such as Visual Studio and the C# language, are powerful. Moreover, we consider web services a well-developed and robust technology, which provides a good reason for adopting Windows cluster technology for business applications.

The proposed architecture, although initially designed to work with the Microsoft Windows platform, can be applied to other cluster platforms and technologies, e.g., NPACI Rocks and Beowulf clusters.

6. References

[1] P. Jorion, Value at Risk, 2nd Edition, McGraw-Hill, 2001.
[2] M. Snir, S. Otto, S. Huss-Lederman, D. Walker and J. Dongarra, MPI: The Complete Reference Volume 1 - The MPI Core, 2nd Edition, MIT Press, 1998.
[3] W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir and M. Snir, MPI: The Complete Reference Volume 2 - The MPI-2 Extensions, MIT Press, 1998.
[4] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek and V. S. Sunderam, PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Network Parallel Computing, MIT Press, 1994.
[5] J. Gosling, B. Joy, G. Steele and G. Bracha, The Java Language Specification, 3rd Edition, Addison-Wesley Professional, 2005.
[6] D. S. Platt, Introducing Microsoft .NET, 3rd Edition, Microsoft Press, 2003.
[7] W. Gropp, E. Lusk and T. Sterling, Beowulf Cluster Computing with Linux, 2nd Edition, MIT Press, 2003.
[8] J. D. Sloan, High Performance Linux Clusters: With OSCAR, Rocks, openMosix, and MPI, O'Reilly & Associates, 2004.
[9] D. A. Lifka, "High Performance Computing with Microsoft Windows 2000", Technical White Paper, http://webserver.tc.cornell.edu/hpc/papers/cluster20012.pdf.
[10] T. Erl, Service-Oriented Architecture: A Field Guide to Integrating XML and Web Services, Prentice Hall PTR, 2004.
[11] E. J. Elton, M. J. Gruber, S. J. Brown and W. N. Goetzmann, Modern Portfolio Theory and Investment Analysis, 6th Edition, John Wiley & Sons, 2003.
