
E-Guide

Scalability strategies: Managing growing Oracle databases

Although Oracle offers a wide array of tools and techniques for
database scaling, IT managers are consistently challenged to apply
these tools at the proper times to ensure seamless growth, while
keeping hardware investments to a minimum. Read this E-Guide and
learn more about key considerations, best practices and strategies for
managing Oracle database growth. Get expert advice for scaling Oracle
databases using clustering or other approaches.

Sponsored By:
SearchOracle.com E-Guide
Scalability strategies: Managing growing Oracle databases

Table Of Contents
Scaling an Oracle database: What is the best strategy for you?

Understanding Oracle Real Application Clusters (RAC) best practices

Resources from Dell, Inc. and Intel


Scaling an Oracle database: What is the best strategy for you?
By Don Burleson, Contributor, SearchOracle.com

One of the reasons Oracle has become a leader in database technology is the flexibility of
its tools and utilities, which allow a database to grow from one used by a small department
to one suitable for a giant multinational. The company offers a wide array of tools and
techniques for database scaling, but the IT manager has always faced the challenge of
applying these tools at the proper times to ensure seamless growth while keeping hardware
investments to a minimum.

Even though hardware prices fall by an order of magnitude every decade, the investment in
computing resources remains a large, critical expense that requires careful management.
Hardware depreciates regardless of use, so a too-much, too-soon approach can be wasteful.
On the other hand, waiting until the system experiences stress-related response time delays
is also bad, especially since today's end-user community has very little tolerance for
sluggish response time.

Goals for Oracle scalability management

The overall goals for any IT manager are twofold: maximize end-user satisfaction while
minimizing expenses.

• Monitor end-user satisfaction -- The primary objective of any information system
is end-user happiness and, assuming that a user's data is correct and complete, the
number one factor is response time. The IT manager must put end-user monitors in
place and carefully ensure that end-user access speeds remain fast.
• Manage economic resources -- Hardware is expensive, and an IT manager must
devise a plan to add new hardware only when it is needed. Advance planning with
Oracle tool standards can also reduce the DBA costs involved in growth.

Given these goals, there are several important tools and techniques that will come into play as
you design a scalable architecture for an Oracle database.
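The article leaves the choice of end-user monitor open. As a minimal sketch, assuming response times are already being captured in milliseconds by some collector, a monitor can compare a high percentile of those samples against a service-level threshold. The 1,000 ms threshold, the 95th percentile and the function names are illustrative assumptions, not Oracle features.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Nearest-rank percentile of response-time samples (milliseconds)."""
    if not samples_ms:
        raise ValueError("no samples collected")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    # Multiplying before dividing keeps the rank exact for integer percentages.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def response_time_breached(samples_ms, threshold_ms=1000.0, pct=95):
    """True when the chosen percentile of end-user response time exceeds the threshold."""
    return percentile_nearest_rank(samples_ms, pct) > threshold_ms


# Usage: 90 fast requests and 10 slow ones push the 95th percentile over 1 second.
samples = [200.0] * 90 + [5000.0] * 10
if response_time_breached(samples):
    print("ALERT: 95th-percentile response time above 1000 ms")
```

A percentile is used rather than an average so that a minority of badly delayed users is not hidden behind many fast requests.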


Techniques for Oracle scalability management

There are many proven techniques and approaches for ensuring your success in a rapidly
growing database:

• Enforce standards -- Oracle has many standard conventions that facilitate
seamless growth. These include disk standards like SAME (Stripe And Mirror
Everywhere), which is Oracle's "RAID-10" standard for data file layouts on disk. Also,
be sure to follow the Oracle Optimal Flexible Architecture (OFA) for all external files
and directories. Following these standards will make it much easier to grow the
database when the time comes.
• Perform capacity planning -- Growth monitoring and planning are critical to
seamless growth. As with any project, you must know exactly how long it takes to have
resources added to your infrastructure and plan accordingly. For example, if it takes
a vendor 72 hours to install more disks, your predictive monitoring must alert you
more than 72 hours before you reach a disk-full condition. A good IT manager will
install capacity planning and monitoring tools that give alerts well in advance of any
resource shortage. In a rapidly growing database, the idea is to fix the problem
before it cripples the database and affects end-user response time.
• Don't skimp on resource quality -- When choosing a hardware vendor and DBA
staff, don't look solely at cost. Top-quality resources are expensive, and a penny-wise,
pound-foolish approach can backfire. For example, a DBA with 10 years of
experience who charges $300/hr is often a better value than a $75/hr DBA because
the experienced DBA can work many times faster. With hardware, choosing from the top-tier
vendors, such as HP, IBM and Sun Microsystems, is always a wise approach. You will pay more
up front, but you should get what you pay for.
• Use the right tools for the job -- Because Oracle is the world's most flexible
database, there are many tools that do similar jobs. For example, high availability can
be accomplished with Real Application Clusters (RAC), Oracle Streams or Oracle
Data Guard. The savvy IT manager will hire Oracle experts for advice on the
right tool to match their specific requirements.
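The 72-hour disk example in the capacity planning bullet can be turned into a predictive check. This sketch assumes disk usage is sampled as (hour, gigabytes used) pairs in chronological order and fits a straight-line trend with ordinary least squares. Real growth is often not linear, and the 2x safety factor and function names are hypothetical.

```python
import math


def growth_rate_gb_per_hour(samples):
    """Least-squares slope of disk usage over time; samples are (hour, used_gb) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate growth")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    return sxy / sxx


def hours_until_full(samples, capacity_gb):
    """Projected hours until the volume is full, starting from the latest sample."""
    rate = growth_rate_gb_per_hour(samples)
    if rate <= 0:
        return math.inf  # usage is flat or shrinking
    _, used_now = samples[-1]  # assumes samples are in chronological order
    return (capacity_gb - used_now) / rate


def order_disks_now(samples, capacity_gb, lead_time_hours=72.0, safety_factor=2.0):
    """Alert while there is still time to react: the projected time to full
    must exceed the vendor lead time by a safety margin."""
    return hours_until_full(samples, capacity_gb) <= lead_time_hours * safety_factor


# Usage: 10 GB/day of growth measured over two days.
usage = [(0, 100.0), (24, 110.0), (48, 120.0)]
print(round(hours_until_full(usage, 200.0)))  # 192
print(order_disks_now(usage, 150.0))          # True: about 72 hours left
```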


While this may seem self-evident, we must remember that the stakes are high:
hundreds of millions of dollars are riding on an IT manager making the right decisions for
any mission-critical database.

When designing Oracle systems, it's important to keep a few very important realities
in mind:

• Hardware depreciates rapidly -- Hardware becomes worthless quickly,
depreciating as a function of age, not as a function of usage. All CPUs, disks and RAM
rapidly depreciate to worthless in just a few years, regardless of how much they are
used.
• Today's servers allow for internal expansion -- Many companies offer servers
that can quickly accept additional RAM and CPUs, so that you can scale up within a
single server environment.
• Human costs now exceed hardware costs -- With hardware costs falling rapidly,
Oracle DBA costs will frequently exceed hardware costs. For example, instead of
paying a DBA $200,000 to tune the I/O for a large database, you may choose to
deploy solid-state disks for $100,000.
• Independent advisors are more reliable -- Hardware vendors have a built-in bias,
as do Oracle's own consultants, each pushing their own hidden agendas. It's not hard
to find independent experts with a proven track record of success in
architecting scalable systems.
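To illustrate why hardware loses value with age rather than use, here is a sketch using straight-line depreciation over an assumed three-year useful life with no salvage value. Actual depreciation schedules depend on your accounting rules; the dollar figures and function names are hypothetical.

```python
def book_value(purchase_price, age_years, useful_life_years=3.0, salvage_value=0.0):
    """Straight-line book value: depends only on age, never on utilization."""
    if age_years <= 0:
        return purchase_price
    if age_years >= useful_life_years:
        return salvage_value
    depreciable = purchase_price - salvage_value
    return purchase_price - depreciable * (age_years / useful_life_years)


def cost_of_buying_early(purchase_price, idle_months, useful_life_years=3.0):
    """Value lost while hardware sits idle because it was bought before it was needed."""
    return purchase_price - book_value(purchase_price, idle_months / 12, useful_life_years)


# Usage: a $360,000 server bought six months before the workload needs it.
print(round(cost_of_buying_early(360_000, 6)))  # 60000
```

The loss is the same whether the idle server ran at 0% or 100% utilization, which is why a too-much, too-soon purchase is wasteful.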

Now that we have covered the basic concepts, let's examine a proven approach to scalability:
the scale-up, scale-out approach to virtually unlimited growth.

Oracle scalability solutions

While Oracle has a host of tools that facilitate scalability (such as online reorganization
tools), Oracle RAC is the one most commonly associated with scalable Oracle solutions.

RAC is marketed for two purposes: scalability and continuous availability. While RAC is a
superb 24x7 availability solution (when used in conjunction with other redundant
components), the use of RAC for scalability is widely misunderstood.


It's important to know that RAC only prevents outages caused by an Oracle instance
failure; a complete high-availability solution also requires redundant disks and other
hardware components.

Where RAC shines is in its ability to quickly add an entire server to a cluster, adding
horsepower without affecting end-user response time. Oracle Grid Control is not just for
adding database resources. By using preloaded servers, you can use Grid Control to
add servers to any layer in the architecture, whether the Web server, application
server or database server layer (Figure 1):

Figure 1 – Using Oracle grid control to add servers

Using the scale-up, scale-out approach, RAC only comes into the picture when you have
saturated a single server. Let's take a closer look at how the scale-up, scale-out approach
works.

Scale-up, scale-out

To achieve seamless growth, you need to start with a server environment in which
additional computing resources can be added without service interruption. You
must be able to add RAM, CPUs and disk as the workload grows. Eventually, you will
saturate the capacity of even the largest single server, and then you start the scale-out
process, adding servers to accommodate your growing workloads. Let's start by
understanding how scale-up (vertical scaling) works with Oracle systems.

Scale-up (vertical scaling)

Oracle hardware vendors promise on-demand computing resources, lower TCO and easy
scalability. Their huge servers offer savings from CPU and RAM consolidation, far lower
human management costs and seamless allocation of resources.

In the "scale-up" approach, server resources (CPU, RAM, disk) are added to a single,
monolithic server, which can have slots for up to 64 CPUs and over 256 gigabytes of RAM.
Examples include the HP Superdome (64 CPUs), the Unisys ES-7000 Series (32 processors),
the Sun Microsystems SunFire and the IBM X and Regatta class servers.

Adding resources to a single server frame is simple and effective because machine
resources (especially CPU) are instantly available to the growing application. The scale-up
approach is simple and yields immediate benefits, without the complexity of the scale-out
approach:

• On-demand resource allocation by sharing CPU and RAM among many workloads.
• Less maintenance, and fewer human resources required to manage fewer servers.
• Optimal utilization of RAM and CPU resources.
• High availability through fault-tolerant components.
• No need for the expense and management of RAC.

But we cannot scale up forever, and as our processing demands grow, we need to look at
scale-out: the "horizontal" integration of many large servers in a RAC cluster
environment.

Scale-out (horizontal scaling)

Grid vendors offer solutions where server blades can be added to Oracle as processing
demand increases. While grid computing offers infinite scalability, no central point of failure
and the use of fast, cheap server blades, it does not have the same in-the-box parallelism
that is found within a monolithic server. Unlike the scale-up approach, Oracle 10g Grid
computing is not automatic and requires additional costs and training, as well as
sophisticated monitoring and management software.

The "scale out" approach is designed for super large Oracle databases that support many
thousands of concurrent users. Unless the system has a need to support more than 10,000
transactions per second, it is likely that the system will benefit more from a scale up
approach.
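The scale-up-first rule described in this section can be expressed as a simple decision function. This is a sketch, not a sizing tool: it assumes CPU busy percentage is the only load signal, treats 80% as "saturated", and uses a hypothetical ServerFrame record for the slot limits of a single server.

```python
from dataclasses import dataclass


@dataclass
class ServerFrame:
    """Current and maximum (slot-limited) resources of a single server."""
    cpus: int
    max_cpus: int
    ram_gb: int
    max_ram_gb: int

    def has_headroom(self) -> bool:
        # Any empty CPU or RAM slot means the frame can still grow vertically.
        return self.cpus < self.max_cpus or self.ram_gb < self.max_ram_gb


def next_step(frame: ServerFrame, cpu_busy_pct: float, saturation_pct: float = 80.0) -> str:
    """Scale up while the frame has empty slots; scale out only once it is full."""
    if cpu_busy_pct < saturation_pct:
        return "no change"
    if frame.has_headroom():
        return "scale up: add CPU/RAM to the existing server"
    return "scale out: add a RAC node (itself a large, vertically scalable server)"


print(next_step(ServerFrame(16, 64, 128, 256), cpu_busy_pct=90))
```

A busy frame with empty CPU or RAM slots is still told to scale up; only a full frame triggers scale-out, matching the advice that RAC enters the picture once a single server is saturated.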

Combining horizontal scalability and vertical scalability

In the real world, savvy corporations combine vertical scalability and horizontal scalability.
They start with a large vertical architecture server, adding resources as needed. If
continuous availability is also required, they may have a mirrored server using long-distance
RAC or Oracle Streams.

When the single server is approaching capacity, it is time to "scale out" with horizontal
scalability, employing RAC and adding servers, each with a vertical scaling
architecture.

For the huge shops that rely on on-demand server allocation with Oracle Grid Control,
new servers can be brought into the cluster on an as-needed basis.

Conclusions

The scale-up and scale-out approaches are simple in concept but can be difficult to deploy in
practice. A common misconception is that everyone will eventually need to scale out.
However, in the real world, very few applications have workloads that saturate the capacity
of million-dollar servers with dozens of CPUs and hundreds of gigabytes of RAM.


The scale-out approach using RAC and Grid Control is designed for super large Oracle
databases that support many thousands of concurrent users. Unless a system has a need to
support more than 10,000 transactions per second, it is likely that the scale-up approach
will be more than adequate.

Amazon is an excellent example of a scale-out Oracle shop. Amazon announced plans to
move its 14-trillion-byte (14 TB) Oracle database to Oracle RAC on Linux, and it uses
load-balanced Linux Web servers to horizontally scale its Web presence to millions of
connected users.

Remember, large-scale RAC databases use large servers, each with 32 or 64 processors and
over a hundred gigabytes of RAM. As the capacities of the large servers are exceeded, a
new server is added to the RAC cluster.


Understanding Oracle Real Application Clusters (RAC) best practices
By Don Burleson, Contributor, SearchOracle.com

With end users getting accustomed to instantaneous response times, Oracle, more than
ever, is challenged to provide continuous availability to its database products. An important
tool the folks at Redwood Shores have to help them accomplish that is Oracle Real
Application Clusters (RAC).

What is RAC? In a nutshell, it is software that allows a single database to be accessed
by many Oracle instances running on separate servers. If one server fails, transactions can
be redirected to another live server with a minimum of downtime.

Oracle advertises RAC as a cure for many ailments. IT shops can be misled by such
marketing hype, however, and fail to recognize the costs and benefits of using RAC in a high
availability (HA) environment.

Let’s explore some Oracle RAC best practices and in the process shed some light on
common mistakes users make when using this cluster-based technology. In this Oracle RAC
guide, we’ll take a look at:

• RAC planning best practices
• RAC implementation best practices
• RAC infrastructure considerations
• Hardware architecture and RAC performance
• RAC backup and recovery best practices
• Performance and tuning best practices

One of the most common mistakes with Oracle RAC is misunderstanding its functions and
limitations. Oracle Real Application Clusters is used as part of a comprehensive capacity
planning strategy, but the technology’s strengths and limitations are not always understood.
Here is a list of the most common misperceptions about the technology.


Oracle RAC is ideal for scalability

Even though Oracle Corporation wants you to buy tiny “blade servers” and use their grid
computing solution for “horizontal scaling,” that’s not how most shops use RAC. Keep in
mind that RAC is only a legitimate scalability option for very large IT shops that need more
horsepower than a single server can deliver.

Instead, it’s an Oracle best practice to scale up first, building up within a single server
through “vertical scaling.” Only after you have saturated a large server do you need to use
RAC to “scale out” the application across multiple servers. Today, a single server’s memory
and CPU horsepower can be expanded far beyond what was possible just several years ago,
making it easier to add resources than to plunk a new server into the RAC environment. In
real-world environments, a single server can handle thousands of transactions per second.
Only the world’s largest Oracle databases need to scale out using RAC nodes.

Oracle RAC is a standalone high-availability solution

Remember that RAC only protects you against instance failure, and that’s only one of many
components that can cause an unplanned outage. For true continuous availability, we must
deploy triple-mirrored disks (with a mean-time-to-failure rate expressed in centuries) and
redundant network components.

For complete availability on each RAC server node, you will want multiple host bus adapters,
multiple network cards and multiple power sources. Just as we have failover at the instance
layer, you need to purchase software to allow the multiple host bus adapter cards to
automatically failover and issue a notification that one has failed.

RAC systems require a cluster interconnect in order to accommodate RAM-to-RAM transfers
of data blocks in the RAC cache fusion layer. This interconnect must be very fast, with high
bandwidth and low latency. Interconnects include:


• Dark fiber: Dense Wavelength Division Multiplexing (DWDM) technology
• Infiniband
• Myrinet

This cache fusion bottleneck is another reason why RAC scale-out, or horizontal scalability,
is problematic. If your cluster interconnect cannot handle the traffic, extra servers will
actually degrade your performance instead of helping it. The only way around this problem
is to change your entire application to accommodate RAC, or to purchase faster storage
such as Solid State Disk.

Oracle RAC ensures fast response time

Response time for transactions is always important, but it’s especially important for RAC
databases. This is because of the connection wait-time that is used to detect whether a RAC
node, or server, has failed. Consequently, you must plan to ensure that new transactions
are serviced in less than one second wall-clock time so that you can set a failover time of
two seconds.

Oracle RAC does not need a disaster recovery component

Except in the rare cases where you can deploy Dense Wavelength Division Multiplexing
(DWDM) technology, known as dark fiber, you still need to create a disaster recovery
solution. Because RAC nodes are normally located within a few miles of each other, a
natural disaster like a hurricane would still cause a global outage. So it has become a RAC
best practice to also deploy a fast-failover geographical solution like Data Guard -- or better
still, n-way Streams.

Now that we understand the planning aspect of best practices, let’s take a closer look at
the RAC best practices that apply after we have implemented our new database.

Oracle RAC implementation best practices

Operational RAC databases follow many of the same best practices as any Oracle database,
but there are some that are unique to Oracle RAC systems. First, it’s an important best
practice to plan RAC servers in a way that minimizes the geographical distance between the
RAC nodes while still keeping them separate, in order to avoid a failure of all nodes.

For reference, take a look at what I have written elsewhere on RAC implementation
guidelines.

In a busy RAC database, the speed of the server interconnect is critical for fast response
times. It’s a commonly accepted best practice to use the fastest possible interconnect,
typically a fiber optics solution like dark fiber.

Some shops will place RAC nodes in separate buildings in the same neighborhood, but with
the advent of the superfast dark fiber interconnect, you can use “Extended RAC” and place
RAC nodes up to 100 miles apart. This allows you to combine high availability with disaster
recovery.

Dark fiber is rather expensive, however. To reduce costs, most shops adopt a best practice
where they combine RAC with disaster recovery solutions like n-way Streams replication.

The whole point of RAC is to make end users automatically reconnect to a surviving server
when one server fails. This is done either at the Web-server level or with the Oracle
Transparent Application Failover (TAF) option. Whichever tool you choose, you should wait
about three seconds before assuming that the server is dead and retrying on another RAC
server.
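Whether reconnection happens at the Web-server level or through TAF, the underlying logic is a timed retry against the next surviving node. The sketch below is a generic application-level loop, not TAF itself (which is configured through Oracle Net). It assumes a caller-supplied attempt function that enforces the roughly three-second timeout; the node names and the AllNodesFailed exception are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class AllNodesFailed(Exception):
    """Raised when no RAC node answered within the failover timeout."""


def run_with_failover(
    nodes: Iterable[str],
    attempt: Callable[[str, float], T],
    timeout_s: float = 3.0,
) -> T:
    """Try each node in order; a node that times out or refuses the connection
    is treated as dead and the next node is tried."""
    errors: dict[str, Exception] = {}
    for node in nodes:
        try:
            # `attempt` is caller-supplied and must enforce timeout_s itself,
            # e.g. through a socket or driver connect timeout.
            return attempt(node, timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            errors[node] = exc
    raise AllNodesFailed(f"no node responded: {sorted(errors)}")


# Usage with a simulated outage on the first node.
def fake_attempt(node: str, timeout_s: float) -> str:
    if node == "rac1":
        raise TimeoutError(f"{node} did not answer within {timeout_s}s")
    return f"served by {node}"


print(run_with_failover(["rac1", "rac2"], fake_attempt))  # served by rac2
```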

Next, let’s take a closer look at specific RAC technical best practices.

Oracle RAC interconnect best practices

Since RAC is a method in which many instances share the same database, shared data
blocks are transferred between the servers over a high-speed interconnect, using a
mechanism called “cache fusion.” In order to keep performance fast, it’s critical that you pay
close attention to the interconnect layer and remember these points:

• RAC likes small block sizes.
• The interconnect must have extremely fast network hardware.
• RAC load balancing is critical to performance.

Oracle RAC node load balancing best practices

I disagree with Oracle’s practice of load balancing using a least-loaded approach because of
the overhead it lays on top of the cache fusion layer. In the real world, like-minded end
users are directed to the same RAC server. If we have a RAC system with different types of
end users, we would want to load balance according to their data needs. For example,
customer processing might be on node one, order processing on node two, and product
processing on node three. Grouping RAC end users by data needs ensures that cache fusion
overhead is minimized.
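The following sketch models routing by data needs: each workload type has a preferred node, and users of that type fall back to a single surviving node only when the preferred one is down. In Oracle RAC, database services with preferred and available instances are one real way to express this kind of partitioning; the mapping, node names and route function here are hypothetical.

```python
# Hypothetical mapping of workload type to preferred RAC node.
PREFERRED_NODE = {
    "customer": "rac_node1",
    "order": "rac_node2",
    "product": "rac_node3",
}


def route(workload: str, live_nodes: set[str]) -> str:
    """Send like-minded users to the same node so their hot blocks stay in one
    instance's cache; fall back to a surviving node only when necessary."""
    if not live_nodes:
        raise RuntimeError("no RAC nodes are available")
    preferred = PREFERRED_NODE.get(workload)
    if preferred in live_nodes:
        return preferred
    # Preferred node is down (or the workload is unknown): choose deterministically
    # so displaced users of one type still land together on a single node.
    return min(live_nodes)


all_nodes = {"rac_node1", "rac_node2", "rac_node3"}
print(route("order", all_nodes))                  # rac_node2
print(route("order", all_nodes - {"rac_node2"}))  # rac_node1
```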

Oracle RAC disk storage management best practices

In order to implement a RAC system, you must use a shared storage device because many
servers must have concurrent access to the disks. A single-instance database can use
Direct Attached Storage (DAS), which is an array of inexpensive disks connected to a
single server, but with RAC you must use what is known as a Storage Area Network (SAN). A
SAN, which is more expensive and complex, is a disk array capable of connecting to many
servers, usually through Fibre Channel. This requires a unique set of hardware, ranging
from host bus adapters to the SAN itself. It’s important that your DBA have complete
knowledge of the internals of the data storage layer.

Oracle RAC block size best practices

It has become a best practice in RAC to use a small 2 kilobyte block size in order to
minimize the “baggage” shipped across the cache fusion layer. Because the block size is the
unit of work, the smaller the block size, the higher the granularity of data being transferred,
with less overhead. If you have long rows (greater than 2 kilobytes), then you will want to
move to a 4 kilobyte block size.
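This sketch encodes the block-size rule of thumb above, under the simplifying assumption that scattered OLTP requests touch one row per block, so each requested row costs one full block on the interconnect. Note that in Oracle the standard block size (DB_BLOCK_SIZE) is fixed when the database is created, so this choice must be made up front. The function names are hypothetical.

```python
BLOCK_2K = 2048
BLOCK_4K = 4096


def suggested_block_size(avg_row_bytes: int) -> int:
    """Rule of thumb from the text: 2 KB blocks, or 4 KB when rows exceed 2 KB."""
    return BLOCK_2K if avg_row_bytes <= BLOCK_2K else BLOCK_4K


def bytes_shipped(rows_requested: int, block_size: int) -> int:
    """Worst case for scattered access: every requested row lives in a
    different block, so one whole block crosses the interconnect per row."""
    return rows_requested * block_size


print(suggested_block_size(300))                               # 2048
print(bytes_shipped(1000, 8192) // bytes_shipped(1000, 2048))  # 4
```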


The implementation of a RAC cluster is only the beginning, and it’s critical to constantly
monitor the health of your RAC clusters so that you can spot and fix impending problems
before you inconvenience your end users.

Oracle RAC monitoring best practices

To ensure that a RAC node never experiences a global problem, a proper monitoring
infrastructure is an absolute requirement. RAC databases rarely fail without warning. If the
DBA understands the proper metrics to watch, he can create an alert system that notifies
him of a looming problem so that he can fix it before the instance crashes.

The DBA must monitor the cluster, the shared disk setup, ASM (or OCFS), the database
instance, listeners, and more in-depth metrics such as cache coherency, interconnect
latency, disk times from multiple systems, and a range of other things.

While higher-cost performance monitoring tools such as Oracle Grid Control can help
perform rudimentary RAC monitoring for beginners, a RAC DBA should have the coding
skills to build his own RAC monitoring infrastructure using dictionary queries,
dbms_scheduler and email alert mechanisms.
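As a sketch of the alerting half of such a home-grown monitor, the code below evaluates collected metrics against warning and critical thresholds and builds an alert message. It assumes the metrics were already gathered elsewhere (for example by dictionary queries run from a dbms_scheduler job, as suggested above). The metric names and limits are hypothetical, and actually sending the email is left to whatever mechanism the shop uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    warn: float
    crit: float  # higher values are worse for every metric below


# Hypothetical metric names and limits; tune them to your own baseline.
THRESHOLDS = (
    Threshold("interconnect_latency_ms", warn=2.0, crit=5.0),
    Threshold("disk_read_ms", warn=10.0, crit=20.0),
    Threshold("asm_used_pct", warn=85.0, crit=95.0),
)


def evaluate(metrics: dict[str, float], thresholds=THRESHOLDS):
    """Return (metric, level, value) for every metric that needs attention."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            # A collector that stopped reporting is itself a warning sign.
            alerts.append((t.metric, "MISSING", None))
        elif value >= t.crit:
            alerts.append((t.metric, "CRITICAL", value))
        elif value >= t.warn:
            alerts.append((t.metric, "WARNING", value))
    return alerts


def alert_body(alerts) -> str:
    """Plain-text body for whatever email mechanism is already in place."""
    return "\n".join(f"{level}: {metric} = {value}" for metric, level, value in alerts)


sample = {"interconnect_latency_ms": 6.0, "disk_read_ms": 12.0, "asm_used_pct": 50.0}
print(alert_body(evaluate(sample)))
```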

Wrapping up the discussion of Oracle RAC best practices, let’s focus on the best way to
define job roles for a RAC database.

Oracle RAC staffing best practices

One best practice for RAC databases is to always hire an experienced RAC DBA to manage
your cluster, avoiding people who have had the RAC training but have no job experience.

It’s important to recognize that human resource costs are the most expensive part of an
Oracle shop. Over the decades, hardware costs have steadily fallen while manpower costs
have remained the same.

It’s important to note that Oracle professionals with RAC skills command a hefty premium
over an ordinary DBA. A recent Oracle salary survey notes that an average DBA earns about
$97,000 per year, whereas RAC experts commonly earn $140,000 a year. Those who
manage multi-billion-dollar RAC databases typically command upwards of $250,000 per
year.

Sadly, there is no easy way to “grow your own” RAC DBA. The training courses are very
expensive, and there is no substitute for real-world experience. And training your own DBA
in RAC may make him more marketable. It’s not uncommon to spend tens of thousands of
dollars teaching RAC to your DBA only to lose him to a better job offer.

Oracle RAC job role best practices

There is a perpetual conflict between systems administrators (SAs), who traditionally
manage servers and disks, and the RAC DBAs who are responsible for managing the RAC
database. There are also clearly defined job roles for network administrators, who are
especially challenged in a RAC database environment to manage the cluster interconnect
and packet shipping between servers.

If your DBA is going to be held responsible for the performance of the RAC database, then
it’s only fair that he be given root access to the servers and disk storage subsystem.
However, not every DBA will have the required computer science skills to manage a
complex server and SAN environment, so each shop makes this decision on a case-by-case
basis.

Oracle RAC training best practices

One of the sure ways to set your company up for an unplanned outage is to fail to train your
SA, DBA and network administrator properly. SAN environments like EMC, TagmaStore and
NetApp have complex architectures, and they frequently require training classes.

Disk configuration is also challenging, and RAC will function only when using specific disk
setups such as ASM, OCFS, RAW, or a third-party cluster file system. These tools require
training classes.


Network administrators must also receive training on how to work with the cluster
interconnect, as well as specialized interconnects such as Infiniband and DWDM.

Of all those on a RAC staff, DBAs will have the greatest learning curve. They will have to
understand how to set up and administer all of the complex RAC components, including the
clusterware and file system storage.

Conclusion

In summary, while RAC offers continuous availability, it’s not magic. There is a lot of work
required to ensure that a RAC database is always available. Every RAC database has some
unique properties, but there are some well-known perils and pitfalls as well. Using Oracle
RAC best practices from other shops is a must for ensuring success. The vast majority of
the best practices with RAC relate to properly planning the infrastructure and configuring
and deploying the RAC database.

About the author:

Donald K. Burleson is a leading Oracle expert, with more than 25 years of DBA experience.
He has authored more than 30 Oracle books, including five officially authorized Oracle Press
books on Oracle tuning. Burleson also manages a popular DBA website, www.dba-oracle.com.


Resources from Dell, Inc. and Intel

Reduced Costs Increase Oracle Database OLTP Workload Service Levels

Dell Services - IT Consulting for Oracle Databases

Consolidating Multiple Oracle OLTP Workloads on the Dell PowerEdge R710 Server

About Dell, Inc. and Intel

Dell and Intel are strategic partners in delivering innovative hardware solutions to solve
your most challenging IT problems. Together we are delivering virtualization optimized
solutions to maximize the benefits of any virtualization project.
