
Abstract

Oracle Student Learning (OSL) is an enterprise-class software solution for K-12 schools designed to support contemporary, 21st-century paradigms of learning. A deployment of OSL may need to support thousands of concurrent users, so the deployment architecture must be able to sustain acceptable response times as the number of concurrent users grows. This paper is based on a scalability and performance case study that tested OSL with tens of thousands of concurrent users. The testing focused mainly on the performance and scalability of the WebLogic middleware tier in the deployment architecture, and on how to configure and deploy the middleware. The principal purpose of this paper is to present deployment recommendations for scaling up to support more than 10,000 concurrent OSL users, along with the characteristics of representative configurations as they grow to service more users and more transactions. The results show that as the number of users and WebLogic nodes increased, the overall transactions per second also increased in a near-linear fashion, handling 98,040 requests per minute from 12,800 concurrent users, with room to scale to even larger configurations.

1. Introduction
Performance and scalability are essential for your e-business systems: as your business grows, you need more capacity on your e-business site to support more customers. The Oracle Student Learning application is built on Oracle Database 11g and WebLogic Server 11g (WLS). When serving thousands of concurrent users, OSL requires multiple middle-tier nodes as well as multiple database nodes. There are many ways to deploy and configure applications so that users see current information and can update information without blocking each other. The testing evaluated caching of data, coordination of information between middle-tier nodes, optimization of memory allocations in the Java virtual machines (JVMs), and other aspects of the middle-tier deployment and configuration. The conclusions and recommendations reached from this testing are described in detail below, starting with an overview of the deployment architecture, followed by a description of the test cases, and then the JVM, caching, and garbage collection parameters for the middle tier. We conclude with some recommendations for the database.

2. Measurement Methodologies and Results


The OSL WebLogic cluster was deployed on servers with 4-core Intel Xeon Processor 5600 Series CPUs using the Sun JVM. Each WebLogic node (VM) was allocated 4 cores and configured with 24GB of memory. The JVMs used Sun JDK 1.6.0_21 and were configured with 2GB of heap space. To accurately measure scalability, a baseline was first established to determine the maximum number of clients and transactions per second that could be achieved using a single WebLogic server. Based on previous benchmarks and feedback from customer test cases, an average think time of 7 seconds per transaction was used. Given this think time, the single-node tests determined that the best throughput was achieved when simulating 1,600 users with 400 HTTP threads and 20 database connections. The baseline measurements were then compared against results at 2, 4, 6, and 8 WebLogic nodes. As additional WebLogic nodes were added, the 1,600 users per node and the 7-second think time were kept constant, allowing accurate measurement of transactions per second as an indicator of overall system throughput. The following sections present the performance and scalability results of these measurements.
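For context (our arithmetic, using the figures reported in this paper and the standard closed-workload approximation, throughput ≈ users / (think time + average response time)): the single-node result of 1,600 users at 216 transactions per second implies an average response time of about 1,600 / 216 - 7 ≈ 0.4 seconds, while the 8-node result of 12,800 users at 1,634 requests per second implies about 12,800 / 1,634 - 7 ≈ 0.8 seconds. This rise in response time under load is why the scaling reported below is near-linear rather than perfectly linear.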

Testing started with 2,000 virtual users, scaled up to 16,000 virtual users, and finished with more than 25,000 concurrent virtual users. All tests ran for multiple hours to ensure that they represented ongoing load and not just initial load.

Result: Linear Scalability
Oracle Student Learning was found to scale linearly. The middle-tier nodes were found to comfortably support 2,000 concurrent users each, with no noticeable degradation as nodes were added. Key to this deployment was the configuration of the WebLogic nodes (VMs), including garbage collection within the JVMs and caching in the middle tier.

3. WebLogic Node Allocation


This section provides recommendations and observations for configuring middle-tier nodes and JVMs for a large-scale OSL deployment.

Recommendations
To deploy Oracle Student Learning WebLogic nodes in a large-scale environment, we recommend the following:
- Allocate at least 4 cores to each OSL WebLogic cluster node.
- Allocate at least 1 node (of 4 cores) per 2,000 users.
- If you are using VMs:
  - Allocate sufficient resources to the VMs running each OSL WebLogic cluster node so that there is no resource contention between the VMs.
  - Isolate the OSL VMs and do not over-allocate the hardware; the VMs should not be assigned more resources than the physical hardware actually provides.
A quick sizing check follows.
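As a worked example of these rules (our arithmetic): supporting the 12,800 concurrent users of the largest test in this study at 2,000 users per node would require 12,800 / 2,000 = 6.4, rounded up to 7 four-core nodes. The study actually ran 8 nodes at 1,600 users each, leaving headroom on every node.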

4. JVM Configuration
This section lists the recommendations and observations on JVM configuration.

Recommendations
- Use the Sun JVM rather than Oracle JRockit. OSL response times when using Sun JVM 1.6.0_22 were observed to be 15-20% faster than when using JRockit 4.0.0-1.6.0.
- Allocate 2GB of JVM heap on each OSL WebLogic node.
- Garbage collection: use the concurrent mark-sweep (CMS) garbage collector, set the survivor ratio to approximately 1:8, and aggressively clean the soft references in the JVM. These choices map onto startup flags as sketched below.
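Taken together, and following the startup-argument style of the Section 6.3 example, these recommendations correspond to a HotSpot flag set along the following lines. This is a minimal sketch: -XX:SoftRefLRUPolicyMSPerMB=1 is our assumed flag for "aggressively clean the soft references", since the paper does not name a specific option.

USER_MEM_ARGS="-server -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:SoftRefLRUPolicyMSPerMB=1"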

5. Analysis
An excellent result was achieved by clustering with in-memory replication in WebLogic Server. This configuration is capable of processing 98,040 requests per minute (1,634 requests/second) using 8 WebLogic Server nodes, supporting 12,800 users with a 7-second think time. Creating a clustered environment with session affinity (sticky time) in WebLogic Server provides very good results for both scalability and performance, because the session information is already available in the WebLogic session cache of the node that handles each request. Without configuring the system in this manner, performance may not scale as well as you add nodes to the cluster.

6. Performance Tuning on WebLogic Server


6.1. Execute Queues
The main purpose of Execute Queues is to handle the internal and external requests related to WebLogic Server. External requests include requests to applications deployed inside WebLogic Server and redirection requests from WebLogic Server to the database; internal requests include communication between managed servers, or between a managed server and the admin server. These requests are handled by threads, so Execute Queues are essentially a thread-management mechanism, implemented with different queues such as weblogic.kernel.Default, weblogic.admin.HTTP, and weblogic.admin.RMI.

Work Managers
As new versions of WebLogic Server were released, thread management was enhanced as well: from WebLogic Server 9.x, Execute Queues were replaced by Work Managers. With Work Managers, WebLogic Server prioritizes work and allocates threads based on an execution model, governed by administrator-defined parameters set after observing actual run-time performance, throughput, and monitoring. This behaviour can be defined at different levels - for an individual application, a group of applications, and so on - as required. A Work Manager has one of three scopes:
1) The default Work Manager
2) Global Work Managers
3) Application-scoped Work Managers

Default Work Manager: used to handle thread management and perform self-tuning. This Work Manager is used by an application when no other Work Manager is specified in the application's deployment descriptors. An application-scoped Work Manager can be declared as sketched below.
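For example, an application-scoped Work Manager with a maximum-threads constraint might be declared in the application's weblogic.xml along these lines (a sketch; the names OSLWorkManager and OSLMaxThreads and the count of 400 are hypothetical, chosen to echo the 400 HTTP threads per node used in this study):

<work-manager>
    <name>OSLWorkManager</name>
    <max-threads-constraint>
        <name>OSLMaxThreads</name>
        <count>400</count>
    </max-threads-constraint>
</work-manager>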

In many situations, the default Work Manager is sufficient for most application requirements. WebLogic Server's thread-handling algorithms assign each application its own fair share by default; applications are given equal priority for threads and are prevented from monopolizing them. If you do not explicitly assign a Work Manager to an application, it uses the default Work Manager.

Although Execute Queues are no longer the default from WebLogic Server 9.x onward, you can still enable them for backward compatibility in the following ways:
1) Using the command-line option: -Dweblogic.Use81StyleExecuteQueues=true
2) Setting the Use81StyleExecuteQueues property via the Kernel MBean in config.xml:

<execute-queue>
    <name>weblogic.kernel.Default</name>
    <queue-length>256</queue-length>
    <thread-count>100</thread-count>
    <threads-increase>0</threads-increase>
    <threads-maximum>100</threads-maximum>
    <threads-minimum>100</threads-minimum>
</execute-queue>
<use81-style-execute-queues>true</use81-style-execute-queues>

After enabling, Work Managers are converted to Execute Queues based on the following rules:
1) If the Work Manager implements a minimum or maximum threads constraint, an Execute Queue is created with the same name as the Work Manager. The thread count of the Execute Queue is based on the value defined in the constraint.
2) If the Work Manager does not implement any constraints, then the global default Execute Queue is used.
For more information, see: http://download.oracle.com/docs/cd/E11035_01/wls100/config_wls/self_tuned.html#wp1067159

Thread Count Scenarios
- Thread count < number of CPUs: the CPUs are under-utilized, so increase the thread count.
- Thread count == number of CPUs: theoretically ideal, but in practice the CPUs are still under-utilized because threads spend part of their time blocked (for example, on I/O), so increase the thread count.
- Thread count > number of CPUs: practically ideal, resulting in a moderate amount of context switching and a high CPU utilization rate.

Assigning Applications to Execute Queues/Work Managers
Although the default execute queue can be configured to supply the optimal number of threads for all WebLogic Server applications, configuring multiple execute queues can provide additional control for key applications. By using multiple execute queues, we can guarantee that selected applications have access to a fixed number of execute threads, regardless of the load on WebLogic Server. Default WebLogic Server installations are configured with a default execute queue, which is used by all applications running on the server instance. You may want to configure additional queues to:
- Optimize the performance of critical applications
- Throttle the performance of nonessential applications
- Remedy deadlocked thread usage
To use user-defined execute queues in a WebLogic Server 9.0 domain, you need to include the use81-style-execute-queues sub-element of the server element in config.xml and reboot. An application is assigned to a queue or Work Manager via its dispatch policy, as sketched below.
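As a sketch, a web application could be pinned to the Work Manager from the earlier example by referencing it from the application's weblogic.xml (the Work Manager name is the same hypothetical one used above):

<weblogic-web-app>
    <wl-dispatch-policy>OSLWorkManager</wl-dispatch-policy>
</weblogic-web-app>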

6.2. Connection Pools Enhance Performance
Establishing a JDBC connection with a DBMS can be very slow. If the application requires database connections that are repeatedly opened and closed, this can become a significant performance issue. WebLogic connection pools offer an efficient solution to this problem. When WebLogic Server starts, connections from the connection pools are opened and are available to all clients. When a client closes a connection from a connection pool, the connection is returned to the pool and becomes available for other clients; the connection itself is not closed. There is little cost to opening and closing pool connections.

Prepared Statement Cache
The prepared statement cache keeps compiled SQL statements in memory, thus avoiding a round-trip to the database when the same statement is used later. In the console, set the statement cache type to LRU from the drop-down box (Services > JDBC > Data Source > Connection Pool > Statement Cache Type).

Tuning JDBC Connection Pool Initial Capacity
During development, it is convenient to set the value of the InitialCapacity attribute to a low number; this helps the server start up faster. In production systems, consider setting InitialCapacity equal to MaxCapacity so that all database connections are acquired during server start-up. If InitialCapacity is less than MaxCapacity, the server needs to create additional database connections when its load increases. When the server is under load, all resources should be working to complete requests as fast as possible, rather than creating new database connections.

Tuning JDBC Connection Pool Maximum Size
The MaxCapacity attribute of the JDBCConnectionPool element sets the maximum number of physical database connections that a connection pool can contain. Different JDBC drivers and database servers might limit the number of possible physical connections. In production, it is advisable that the number of connections in the pool equal the number of concurrent client sessions that require JDBC connections. The pool capacity is independent of the number of execute threads in the server; there may be many more ongoing user sessions than there are execute threads. A descriptor sketch follows at the end of this subsection.

Tuning Parameters for Starting WebLogic Server
Clearly, the JVM's performance also affects WebLogic's performance. Most aspects of JVM tuning relate to the efficient management of the memory heap, an efficient garbage collection scheme, and the choice between two Unix threading models: green and native threads.
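Pulling the pool recommendations together, a JDBC data source module might carry pool parameters along these lines (a sketch; the data source name is hypothetical, the capacity of 20 mirrors the 20 database connections per node used in the baseline tests, and the element names follow the WebLogic jdbc-data-source descriptor):

<jdbc-data-source>
    <name>OSLDataSource</name>
    <jdbc-connection-pool-params>
        <!-- acquire all connections at start-up: initial == max -->
        <initial-capacity>20</initial-capacity>
        <max-capacity>20</max-capacity>
        <!-- keep compiled statements in memory, evicting least recently used -->
        <statement-cache-type>LRU</statement-cache-type>
        <statement-cache-size>10</statement-cache-size>
    </jdbc-connection-pool-params>
</jdbc-data-source>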

6.3. Tuning Java Virtual Machines
Garbage collection is the first thing to address when tuning the JVM. Garbage collection is the VM process of de-allocating unused Java objects in the Java heap. Tuning garbage collection focuses on two things: the JVM heap size and generational garbage collection.

JVM Heap Size
The Java heap is a repository for live objects, dead objects, and free memory. The JVM heap size determines how often and for how long the VM spends collecting garbage (dead objects). The larger the heap, the slower but less frequent full garbage collections are; the smaller the heap, the faster but more frequent they are.

Goal for Tuning Heap Size
Minimize the time spent doing garbage collection while maximizing the number of clients that can be handled at a time. To ensure maximum performance during benchmarking, set the heap size so that garbage collection does not occur during the entire run of the benchmark.

Determine Optimal Heap Size
Use the -verbose:gc option to turn on verbose garbage collection output for your JVM and redirect both the standard error and standard output to a log file. Make sure that the heap size is not larger than the available free RAM on the system. Analyze the following data points:
- How often is garbage collection taking place? In the weblogic.log file, compare the time stamps around the garbage collection.
- How long is garbage collection taking? Full garbage collection should not take longer than 3 to 5 seconds; lower the heap if major GC time is greater.
- What is your average memory footprint? In other words, what does the heap settle back down to after each full garbage collection? If the heap always settles to 85 percent free, you might set the heap size smaller.

You might see the following Java error if you are running out of heap space:
java.lang.OutOfMemoryError <<no stack trace available>>
Exception in thread "main" java.lang.OutOfMemoryError

Turning on Verbose Garbage Collection
Turn on verbose garbage collection output for the Java VM when WebLogic Server is started, as shown in the example below. Redirect both the standard error and standard output to a log file; this places thread dump information in the proper context with WebLogic Server informational and error messages, and provides a more useful log for diagnostic purposes. Example:
USER_MEM_ARGS="-server -Xms2g -Xmx2g -XX:PermSize=256m -XX:MaxPermSize=256m XX:NewSize=800m -XX:MaxNewSize=800m -XX:SurvivorRatio=8 -Xnoclassgc -XX:+DisableExplicitGC verbose:gc -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:UseBiasedLocking -XX:ParallelGCThreads=12 -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"

Specifying Heap Size Values
Java heap size values are specified when starting the WebLogic Administration Server from the Java command line. The default unit for these values is bytes; append k or K for kilobytes, m or M for megabytes, and g or G for gigabytes.

Java Heap Size Options
- -XX:NewSize: sets the New generation Java heap size. Set this value to a multiple of 1024 that is greater than 1MB, and increase it as the number of processors increases: memory allocation can be parallel, but garbage collection is not.
- -XX:MaxNewSize: sets the maximum New generation Java heap size. Set this value to a multiple of 1024 that is greater than 1MB.
- -XX:SurvivorRatio: configures the ratio of the Eden/survivor space size. Try setting this value to 8 and then monitor your garbage collection.
- -Xms: sets the minimum size of the memory allocation pool. Set this value to a multiple of 1024 (2g in this study). As a general rule, set the minimum heap size (-Xms) equal to the maximum heap size (-Xmx).
- -Xmx: sets the maximum Java heap size. Set this value to a multiple of 1024 (2g in this study).

Control the Permanent Generation size explicitly, and note the following guidelines:
- Set the initial heap size equal to the maximum heap size (-Xms2g -Xmx2g).
- Size the Permanent Generation explicitly: -XX:PermSize=256m -XX:MaxPermSize=256m.
- Use less than ~80% of the available RAM on the machine; the 32-bit limit is 2GB, the 64-bit limit 16GB.

6.4. Clustered Configurations
Clusters greatly improve efficiency and failover. Customers using clustering should not see any noticeable performance degradation. A number of WebLogic Server deployments in production involve placing a cluster of WebLogic Server instances on a single multiprocessor server. Large clusters performing in-memory replication of session data for Enterprise JavaBeans (EJBs) or servlets require more bandwidth than smaller clusters, so consider the size of the session data and the size of the cluster.

There are three general categories of traditional clustering architectures, based on how each server in the cluster accesses memory and disks, and whether servers share a copy of the operating system and the I/O subsystem:
- Shared-memory: all servers in the cluster use the same primary memory, through which all traffic to the cluster is routed. The servers also share a single copy of the operating system and the I/O subsystem.
- Shared-disk: each server has its own memory, but the cluster shares common disks. Since every server can concurrently access every disk, a distributed lock manager is required.
- Shared-nothing: every server has its own memory and its own disks. Systems based on disk mirroring often use the shared-nothing model.
In addition, there is the hybrid common-disk shared-nothing model, which uses a shared-nothing architecture on top of shared-disk hardware: only one server at a time can access a disk, but if that server fails, another can provide uninterrupted service.

There are other common attributes that help define how a cluster operates. In an active/active cluster, each server runs its own workload and can assume responsibility for another cluster member in the event of a failure; commonly this means that cluster servers are paired, although it may work more generally. In a cluster that provides failover/failback, the workload of a failed server is automatically transferred to another server until the first server recovers, at which time its workload is automatically transferred back. A cluster may use IP switchover to allow clients to find the replacement for a failed server with a minimum of service disruption: the replacement server changes its IP address to match that of the failed server, which requires support for DHCP (Dynamic Host Configuration Protocol) and ARP (Address Resolution Protocol) to dynamically register the IP address change and then update the physical network address translation caches of other systems attached to the cluster subnet. Like switchover, IP impersonation allows clients to find a replacement server; instead of dynamically assigning a new IP address, however, IP impersonation reroutes network traffic intended for the failed server to the replacement server.
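The clustered result in Section 5 relied on in-memory replication of HTTP session state. As a sketch of how that is enabled per web application in weblogic.xml (element names follow the WebLogic session-descriptor; replicated_if_clustered is one of its documented store types and falls back to in-memory storage on a non-clustered server):

<session-descriptor>
    <persistent-store-type>replicated_if_clustered</persistent-store-type>
</session-descriptor>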

6.5. Operating System Monitoring
UNIX provides many tools for monitoring the system:
- sar: system activity
- mpstat: per-processor statistics
- vmstat: virtual memory statistics
- netstat: network statistics
- iostat: input/output statistics

HTTP Sessions
In terms of pure performance, in-memory session persistence is a better overall choice than JDBC-based persistence for session state. If you use a single large attribute that contains all of the session data and only 10% of that data changes, the entire attribute has to be replicated, causing unnecessary serialization/deserialization and network overhead. Move the 10% of the session data that changes into a separate attribute, and keep this in mind during application development (see the sketch at the end of this section, just before Section 7). In a clustered environment, make sure that all objects placed in the session are serializable.

Performance Terms and Definitions
Performance is how the system's response time and throughput are affected by adding load. Capacity is the maximum threshold for the system under a given set of conditions. Scalability is how well the system responds to increasing load by adding additional resources. Most commonly, the response time and throughput of the system are used as measurements for these terms.
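A minimal servlet-API sketch of the attribute-splitting advice above, with hypothetical OSL-flavoured names (CourseCatalog, draftText): the frequently changing value lives in its own small attribute, while the large, stable value is written once. WebLogic's in-memory replication propagates updates per attribute on each setAttribute call, which is what makes this split effective.

import java.io.Serializable;
import javax.servlet.http.HttpSession;

public class SessionUsageSketch {

    // Large, rarely changing session state; Serializable so the cluster
    // can replicate it (everything placed in the session must be).
    static class CourseCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Called on each request, e.g. from a servlet's doPost.
    void onRequest(HttpSession session, String draftText) {
        // Written once per session; replicated only when it is (re)set.
        if (session.getAttribute("catalog") == null) {
            session.setAttribute("catalog", new CourseCatalog());
        }
        // The ~10% that changes on every request lives in its own small
        // attribute, so each setAttribute replicates only this value,
        // not the whole catalog.
        session.setAttribute("draftText", draftText);
    }
}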
7. Additional Recommendations: Database Configuration
This section discusses additional observations and recommendations regarding the database configuration. The database configuration was not specifically targeted during the testing described in this paper; however, there are some pertinent points of interest. Running OSL against a single 8-core database node (Intel Xeon Processor 5600 Series) should be able to support 10,000 to 11,000 concurrent users, and a database node running on a 4-core box around 5,000 concurrent users. The following are our recommendations:
- Allocate sufficient database nodes to support the queries and transactions from the OSL nodes. Allocate approximately one database core for every two WebLogic Server cores running OSL; the ratio may be altered based on database load.
- Deploy database nodes on bare machines (not on VMs), tuned to support the maximum number of queries and transactions.
- Use Oracle Real Application Clusters (RAC) when deploying multiple database nodes.
A worked sizing example follows.
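As a worked example of the core ratio (our arithmetic, not a measured configuration): the 8-node WebLogic cluster in this study comprised 8 x 4 = 32 WebLogic cores, which under the one-database-core-per-two-WebLogic-cores guideline calls for about 16 database cores, for example two 8-core RAC nodes. The per-node estimate above (10,000 to 11,000 users on an 8-core node) suggests the 12,800-user workload could run on somewhat fewer cores; the core ratio deliberately leaves headroom and, as noted, may be altered based on observed database load.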

8. Conclusion
This effort provides compelling evidence that Oracle's WebLogic provides the scalability and integration required to deliver predictable scalability for demanding e-business workloads. The result from the configuration using WebLogic Server's sticky time highlights the near-linear scalability obtained as the environment scaled up. Running against a single WebLogic Server node, the configuration supported 1,600 users driving 216 transactions per second. As additional WebLogic nodes were added, the number of users per node was scaled linearly to enable accurate comparisons of overall transactions per second. The results show that as the number of users and WebLogic nodes increased, the overall transactions per second also increased in a near-linear fashion, handling 98,040 requests per minute from 12,800 concurrent users. WebLogic Server provides very good load balancing when using the sticky time parameter.

Finally, this effort shows that building an e-infrastructure doesn't have to be an exercise filled with uncertainty and doubt. Building on a hardware and software platform that offers predictable levels of performance today and in the future will help alleviate many problems. The results of this scalability study provide compelling evidence that WebLogic on the servers tested delivers a highly scalable solution for building an infrastructure that will support demanding e-business workloads, with reliable predictability.

The above are only some of the various ways that the server can be tuned. Bear in mind, however, that a poorly designed, poorly written application will usually have poor performance, regardless of tuning. Performance must always be a key consideration throughout the stages of the application development life cycle - from design to deployment. It happens too often that performance takes a back seat to functionality, and problems are found later that are difficult to fix.
