
Welcome to IT862: High Availability Administration & System Design

Complimentary Wireless SSID: IT862


High Availability Administration & System Design
Nickolas Tong
Higher Education C.S.A.
Apple HiEd Gulf States
with special guests Brian Pitts and Benjamin Wald
January 12th, 2007
What is High Availability (HA)?
Introduction
Myth of the Nines

Nines only measure that which can be modeled

Nines are only an average

Nines reflect a single system view of the world.
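The arithmetic behind the "nines" can be sketched as follows. This is a minimal illustration, not a model from the talk: the function names and the three-component chain are assumptions chosen for the example. It shows why a per-system figure says little about a service that depends on several systems in series.

```python
# Minutes in an average year (365.25 days), used to turn an
# availability fraction into expected downtime.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def availability_for_nines(nines: int) -> float:
    """Availability fraction for a given number of nines (3 -> 0.999)."""
    return 1.0 - 10.0 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    """Average yearly downtime implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, which is itself a modeling
    simplification.
    """
    result = 1.0
    for a in components:
        result *= a
    return result


if __name__ == "__main__":
    for n in range(2, 6):
        a = availability_for_nines(n)
        print(f"{n} nines: {downtime_minutes_per_year(a):.1f} min/year on average")
    # Hypothetical chain: server, storage, and network at three nines each.
    chain = serial_availability(0.999, 0.999, 0.999)
    print(f"three-component chain: {chain:.6f}")
```

The yearly figure is only an average: it does not say whether the downtime arrives as many short blips or one long outage.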
Unplanned Downtime
What do you blame?
• Hardware Failure?
• Human Error?
• Software Error / Bug?
• Viruses / Spyware?
• Natural Disaster?
Unplanned Downtime
Categorized
• Hardware Failure (44%)
• Human Error (32%)
• Software Error / Bug (14%)
• Viruses / Spyware (7%)
• Natural Disaster (3%)
Modes of Failure
Most Common Failures
• Hardware
• Environmental
• Network
• File and Print
• Database
• Security Breaches
Blazing Pace of Innovation
• Cheetah: Mac OS X Server v10.0 (2001)
• Puma: Mac OS X Server v10.1 (2001)
• Jaguar: Mac OS X Server v10.2 (2002)
• Panther: Mac OS X Server v10.3 (2003)
• Tiger: Mac OS X Server v10.4 (2005)
• Tiger on Intel: Mac OS X Server v10.4.4 (2006)


HA Defined
in Mac OS X Server v10.5

2007: The year of high availability


Mail Server
Standards-based email
• Active/active clustering with Xsan
• 64-bit mail services
– SMTP (Postfix)
– IMAP and POP (Cyrus)
• Vacation messages!!!
Active/Active Mail Clustering
• Mail Servers
• Directory Server
• Xsan Metadata Controller
• Clients: Mail, Outlook, Evolution, Entourage, MS Mail, KMail
iCal Server
Calendar sharing and scheduling
• Schedule group meetings and events
• iCal 3, Sunbird, Chandler, Outlook clients
• Active/active clustering with Xsan
• No client access licenses
• Uses standard CalDAV protocol
• Open source Darwin Calendar Server
Active/Active Clustering with Xsan

• iCal Server
• Directory Server
• Clients: iCal 3, Evolution, Outlook, Sunbird
Open Directory 4
Directory services and network authentication
• LDAP proxy
• Cross-domain authorization
• Cascading replication
• Replica sets
• RADIUS authentication
Leopard Server

Shipping Spring 2007


Monitoring: The First Step
Monitoring
with Apple’s best-of-breed tools

• Server Monitor (Hardware)
• Server Admin (Services)
• RAID Admin (Array Health)
• ARD (Remote Administration)
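Alongside the tools above, a basic reachability probe can be scripted. The sketch below is an assumption-laden illustration and not part of any Apple tool: `service_is_up` and `DEFAULT_PORTS` are hypothetical names. It only checks that a TCP port accepts connections, which is a much weaker signal than the service-level health Server Admin reports.

```python
import socket

# Well-known TCP ports for services mentioned in this session.
DEFAULT_PORTS = {"smtp": 25, "imap": 143, "afp": 548, "http": 80}


def service_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

A caller might loop over `DEFAULT_PORTS.items()` for each server and raise an alert when any check returns `False`.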
Third-Party Tools
to remotely monitor and administer your system

Hardware Software
Laying the Foundation for
High Availability
Basic Infrastructure
what you need to ensure availability

Power; Management of Operable Climate


Storage Infrastructure
Advancing your storage systems for availability
• RAID
– Don’t leave home without it

– Levels 0 and 5
– Other levels, applications
• Types of storage
– NAS
– SAN
• Products
– Xsan
– StorNext
Storage Infrastructure
Redundant Array of Inexpensive Disks (RAID)

• Level 0
– What it does: Striping
– Hardware requirements: Two (2) drives
– Advantages: Fast read and write
– Disadvantages: No redundancy
• Level 1
– What it does: Mirroring
– Hardware requirements: Two (2) drives
– Advantages: Fast read, survives all but one disk failing
– Disadvantages: Capacity overhead
• Level 3
– What it does: Byte level striping with dedicated parity disk
– Hardware requirements: Xserve RAID, controller and minimum of three (3) drives
– Advantages: Survives one drive failure
– Disadvantages: Slower than level 5 because the dedicated parity drive is a performance bottleneck
• Level 5
– What it does: Block level striping with distributed parity
– Hardware requirements: Minimum of three (3) drives, four (4) recommended
– Advantages: Survives one drive failure. Good mix of performance and reliability
– Disadvantages: Not ideal for applications requiring fast writes
• Level 0+1
– What it does: Mirror of stripes
– Hardware requirements: Four (4) drives
– Advantages: Speed, survives one drive failure
– Disadvantages: Capacity intensive
• Z
– What it does: Similar to level 5
– Hardware requirements: Presumably part of 10.5 ZFS implementation
– Advantages: Copy on write, checksums, flexibility
– Disadvantages: Unproven in OS X environments
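The capacity trade-offs in the RAID comparison above can be made concrete with a small calculation. This is a rough sketch assuming identical drives and the textbook layouts; `usable_capacity` is a hypothetical helper, and real arrays such as an Xserve RAID reserve additional space that it ignores. Level Z is left out because its layout is not described here.

```python
def usable_capacity(level: str, drives: int, drive_size_gb: float) -> float:
    """Approximate usable capacity in GB for a set of identical drives."""
    # Minimum drive counts as listed in the comparison.
    minimums = {"0": 2, "1": 2, "3": 3, "5": 3, "0+1": 4}
    if level not in minimums:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimums[level]:
        raise ValueError(f"RAID {level} needs at least {minimums[level]} drives")

    if level == "0":
        # Pure striping: every drive holds data, none holds redundancy.
        return drives * drive_size_gb
    if level == "1":
        # Mirroring: every drive holds the same data.
        return drive_size_gb
    if level in ("3", "5"):
        # One drive's worth of space goes to parity, whether the parity
        # is dedicated (level 3) or distributed (level 5).
        return (drives - 1) * drive_size_gb
    # Level 0+1: two striped sets mirrored against each other.
    if drives % 2:
        raise ValueError("RAID 0+1 needs an even number of drives")
    return (drives // 2) * drive_size_gb


if __name__ == "__main__":
    for lvl, n in [("0", 4), ("1", 2), ("3", 4), ("5", 4), ("0+1", 4)]:
        print(f"RAID {lvl} with {n} x 500 GB: {usable_capacity(lvl, n, 500):.0f} GB")
```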

Storage Infrastructure
Centralizing Storage

Better Hardware

Server/Data Independence

Clustering

SAN vs. NAS


Preventing Problems from
Becoming Problems for your Users
Step One
Always have good backup data
• Backup Solutions - Commercial, Open-Source
• Address 4 issues with backups:
– Hardware Failure (no external factors)
– Hardware Failure (external factors)
– Data Corruption (no external factors)
– Data Deletion (external factors / user)
• Online, Nearline, Offline storage & media
• Hardware
Step Two
Consider these Issues
• Schedule - Service Needs & System Load
• Snapshot (rsnapshot, dirvish)
• Conventional (Bacula, Retrospect)
• Special Tools for Databases
• Means of Storage (DDT, etc.)
• Physical / Geographic Location
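The snapshot approach listed above keeps a fixed number of dated copies and rotates them on each run. The sketch below shows one way such a rotation could be planned. It is a simplified illustration, not rsnapshot's or dirvish's actual code: `rotation_plan` and the `daily.N` naming are assumptions for the example, and the hard-linking of unchanged files that makes snapshots cheap on disk is not shown.

```python
def rotation_plan(existing: list[int], keep: int, prefix: str = "daily") -> list[tuple[str, ...]]:
    """Plan one rotation of numbered snapshots, where .0 is the newest.

    `existing` holds the snapshot indices currently on disk. The plan
    removes anything that would fall outside `keep`, shifts the rest up
    by one, and then makes room for a fresh snapshot at index 0.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    plan: list[tuple[str, ...]] = []
    # Work from oldest to newest so a rename never overwrites a snapshot.
    for i in sorted(existing, reverse=True):
        name = f"{prefix}.{i}"
        if i + 1 >= keep:
            plan.append(("delete", name))
        else:
            plan.append(("rename", name, f"{prefix}.{i + 1}"))
    plan.append(("create", f"{prefix}.0"))
    return plan


if __name__ == "__main__":
    for step in rotation_plan([0, 1, 2], keep=3):
        print(step)
```

Returning a plan instead of touching the filesystem keeps the logic easy to test before it is wired to real backup directories.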
Step Three
Implement Failure Tolerance Mechanisms
• System Level: 802.3ad Ethernet Link Aggregation
• IP Failover
– Setting it up
– Problem of auto-restart in IP Failover
• Synced Nodes: SAN vs Replication
• Load Balancing: OD + DNS replication
• Performance Tuning: Specifying the right values for your system and application
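The decision logic behind IP failover can be sketched as a small state machine on the backup node. This is a generic illustration, not the mechanism Mac OS X Server uses: `FailoverMonitor`, `max_missed`, and the returned action names are all hypothetical. The backup acquires the shared address after several missed heartbeats and, in this sketch, releases it as soon as the primary is heard again.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks heartbeats from a primary and decides when to take its IP."""

    max_missed: int = 3       # consecutive misses tolerated before takeover
    missed: int = 0           # current run of missed heartbeats
    holding_ip: bool = False  # True while this node owns the shared address

    def observe(self, heartbeat_received: bool) -> str:
        """Record one heartbeat interval and return the action to take."""
        if heartbeat_received:
            self.missed = 0
            if self.holding_ip:
                # Automatic failback: hand the address back immediately.
                self.holding_ip = False
                return "release"
            return "standby"
        self.missed += 1
        if not self.holding_ip and self.missed >= self.max_missed:
            self.holding_ip = True
            return "acquire"
        return "holding" if self.holding_ip else "waiting"


if __name__ == "__main__":
    monitor = FailoverMonitor(max_missed=3)
    for beat in [True, False, False, False, False, True]:
        print(beat, monitor.observe(beat))
```

Automatic failback is a design choice: a primary that restarts, reclaims the address, and fails again can cause the address to flap, so some sites prefer a manual step before the primary takes over again.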
Step Four
Importance of Performance Tuning
• Handle load efficiently
• Performance tuning = load balancing
• Spread out services
• AFP
• Web Server
Step Five
AFP Tuning
• Again, AFP Really Doesn’t Scale!
• Tuning on clients
– run: defaults read -g com.apple.AppleShareClientCore

– afp_wan_threshold
– afp_wan_quantum

• Tuning on server
– RAM
– Threads
– Set in /Library/Preferences/com.apple.AppleFileServer.plist
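Settings like these live in property lists, so they can be inspected from a script before anything is changed. The sketch below uses Python's standard `plistlib` to pull selected keys out of plist data. `read_settings` is a hypothetical helper and the values in the sample are illustrative only, not recommendations. How the keys are laid out in a given preference file is not covered by the slides, so treat the flat structure here as an assumption.

```python
import plistlib


def read_settings(plist_bytes: bytes, keys: list[str]) -> dict:
    """Return the requested keys from a plist whose top level is a dict.

    Keys that are absent map to None so the caller can see which
    settings are still at their defaults.
    """
    data = plistlib.loads(plist_bytes)
    if not isinstance(data, dict):
        raise ValueError("expected a dictionary at the top level of the plist")
    return {key: data.get(key) for key in keys}


if __name__ == "__main__":
    # Illustrative values only; the key names come from the client
    # tuning bullets above.
    sample = plistlib.dumps({"afp_wan_threshold": 1000, "afp_wan_quantum": 131072})
    print(read_settings(sample, ["afp_wan_threshold", "afp_wan_quantum"]))
```

Recording the values before and after a tuning change makes it easier to roll back if performance gets worse.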
Step Six
OS X Web Services with Apache
• From within Server Admin
• Persistent connections
• Connection timeout interval
• Proxy
• Extra modules
Recovery: For when it happens
Recovering
knowing where to look for help during recovery
• Resources for IT staff
• Procedures Manual: Quite Valuable.
• Wiki
• WO System
• Spare hardware, cold boxes,
system images, etc. Necessary.
Get this updated preso at:
http://wheel.emory.edu/macworld

User: macit
Password: IT862

(user and password are case sensitive)


Q&A
Thank you for attending IT862: High Availability Administration & System Design

See you next year at Macworld Expo 2008!

MACWORLD EXPO 2007
