
HA-OSCAR 2.0: An Open Source HA-Enabling Framework for Mission-Critical Systems
COMPLETE TUTORIAL

RAJAN SHARMA

An open source project under Louisiana Tech research foundation copyright and a GNU public license

Contents
1. Introduction to HAOSCAR 2.0
2. Step by step Installation of HAOSCAR 2.0
3. Configuring HAOSCAR 2.0
4. Troubleshooting HAOSCAR 2.0

HAOSCAR 2.0 Introduction

Goals of HA-OSCAR 2.0

Improve flexibility and provide an open solution.
Free HA-OSCAR from its dependence on OSCAR.
Support most IT infrastructures based on Linux operating systems, such as web servers and clusters.
Incorporate a self-healing mechanism, failure detection, automatic synchronization, fail-over and fail-back.
Provide cross-platform support.

HAOSCAR 2.0 Overview

HA-OSCAR provides a High Availability solution.
Cross-platform support, verified with Ubuntu 9.10 Server Edition and CentOS 5.6.
Clones the system during installation to keep data and software consistent.
Features a fail-over system that allows the clone to take over for the head node during a failure.
Monitors services to ensure maximum uptime.
Provides data synchronization between the primary and secondary systems.

Hardware Architecture 1/2

HA-OSCAR (2.0 release) is an active/hot-standby architecture with automatic fail-over, fail-back and automatic data synchronization.

HA-OSCAR major components:
Primary server
Standby server
Switches
Multiple clients

Hardware Architecture 2/2

HA-OSCAR consists of the following major system components:

Primary Server
Responsible for receiving and distributing requests.
Two networks, one connected to the Internet and one connected to a local LAN.

Standby Secondary Server
Monitors the primary server and takes over in the event a failure is detected.

Local LAN Switches
Provide local connectivity among head and client nodes.

Cluster (optional)

Software Architecture 1/2

The HA-OSCAR 2.0 software architecture consists of three main components:
Heartbeat: handles failure detection, fail-over, and fail-back.
Monit: monitors system-critical services.
rsync: copies files to the secondary node.

Software Architecture 2/2

IP Monitoring (Heartbeat)
A service designed to detect failure of the physical components.
Initiates fail-over in the event of a primary node failure.

Service Monitoring (Monit)
Attempts to restart services that fail.
Monit can induce fail-over if critical services cannot be started.

Data Synchronization (rsync)
Monitors the files that are necessary for services.
Copies changes to the secondary node.
rsync copies changes 2 minutes (by default) after the first file change occurs.
The user can adjust this interval to suit the application.

Project Management

Code is hosted on GitHub, which provides an issue tracker and Subversion-based collaborative source code hosting.
A cluster of four different Linux workstations is used for development and testing.
Microsoft Office Live Space and Google Wave are used for more coordinated team collaboration.

HAOSCAR 2.0 File Hierarchy

HA-OSCAR 2.0 Download

http://hpci.latech.edu/blog/?page_id=64

HA-OSCAR Future Release Plans

HAOSCAR API
A new addition that lets developers and administrators extend the functionality of HA-OSCAR using provided hooks.
Allows creating event notification services and powerful rule-based systems.
Can be used to determine the state of the monitored services.

Integration of virtual machine management capabilities
Extension of HA-OSCAR to include failure prediction technologies
Compute node redundancy
Many more

Applications for HAOSCAR 2.0

Web Applications
Factors that make a web application unavailable:
Hardware/software failure
Power issues
Routine maintenance

Patient Data Exchange Server
Exchanges patient data between hospitals.

Cybertools Petashare
Petashare storage is used for sharing data across networks and can be HA-enabled with HAOSCAR.

HAOSCAR 2.0 Installation

HAOSCAR 2.0 Beta Installation

Limitations of HAOSCAR 2.0

GUID Partition Table (GPT) is not supported.
The ext4 file system is not supported.
LVM is not supported for partitions other than swap.
A swap partition must be present, and it must be in a logical volume.
All .gvfs mounts must be unmounted on both the primary and secondary head nodes.
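Some of these limitations can be checked before installation. The sketch below is a hypothetical helper, not part of HA-OSCAR: it scans an fstab-style file for ext4 entries, for non-swap entries on a /dev/mapper/ device (an assumed way of spotting LVM volumes), and for a missing swap entry.

```shell
#!/bin/sh
# check_fstab_limits FILE
# Hypothetical pre-flight helper, not part of HA-OSCAR. It scans an
# fstab-style file and reports entries that conflict with the limitations.
check_fstab_limits() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
        $3 == "ext4" { print "unsupported ext4: " $2; bad = 1 }
        # Assumption: LVM volumes are recognised only by a /dev/mapper/ path.
        $1 ~ /^\/dev\/mapper\// && $3 != "swap" { print "unsupported LVM: " $2; bad = 1 }
        $3 == "swap" { swap = 1 }
        END {
            if (!swap) { print "missing swap partition"; bad = 1 }
            exit bad + 0                   # non-zero when any problem was found
        }
    ' "$1"
}

# Usage: check_fstab_limits /etc/fstab && echo "fstab looks compatible"
```

The partition table type does not appear in fstab, so the GPT limitation has to be checked separately, for example by reading the output of parted -l.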

Pre-Installation Configuration

Services
Specify a virtual IP for all services

System Files
/etc/fstab
/etc/hosts

Network Interfaces
Configure primary and secondary local networks

System Imager
Install System Imager

Service Configuration

Install all desired services.
Specify that all HA services use the reserved virtual IP.
Configure the cluster, if applicable.
Communication between nodes should happen through the virtual IP as well.

System Files Configuration 1/2


Note: Run all remaining installation steps as root.

/etc/fstab
Replace all UUIDs with the full path of the device file
Before:

After:
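The UUID replacement can be previewed before editing the real file. This is a minimal sketch: uuid_to_device is a hypothetical helper, and the UUID and device values passed to it are supplied by the user. It prints the rewritten content to standard output and leaves the file untouched.

```shell
#!/bin/sh
# uuid_to_device FSTAB UUID DEVICE
# Hypothetical helper: prints FSTAB with the entry "UUID=<UUID>" rewritten
# to DEVICE. Only the first field of a matching line is changed.
uuid_to_device() {
    awk -v uuid="$2" -v dev="$3" '
        $1 == "UUID=" uuid { sub(/^[^ \t]+/, dev) }   # swap the first field only
        { print }
    ' "$1"
}

# Usage sketch: look up the device for a UUID with blkid, then preview the result.
#   dev=$(blkid -U "$uuid") && uuid_to_device /etc/fstab "$uuid" "$dev"
```

Once the preview looks right, the output can be redirected to a new file and moved into place after backing up the original /etc/fstab.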

System Files Configuration 2/2

/etc/hosts
Replace the 127.0.1.1 entry for the host name with the static IP of the Primary node
Before:

After:
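The same change can be sketched as a small filter. set_static_host_ip is a hypothetical helper, and the address passed to it is whatever static IP was chosen for the Primary node. It writes the result to standard output so it can be reviewed first.

```shell
#!/bin/sh
# set_static_host_ip HOSTS_FILE STATIC_IP
# Hypothetical helper: prints HOSTS_FILE with the 127.0.1.1 entry
# re-pointed at STATIC_IP. All other lines pass through unchanged.
set_static_host_ip() {
    awk -v ip="$2" '
        $1 == "127.0.1.1" { sub(/^127\.0\.1\.1/, ip) }
        { print }
    ' "$1"
}

# Usage: set_static_host_ip /etc/hosts 192.168.0.10
```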

Network Interface Configuration 1/5

For Debian systems, in /etc/network/interfaces:
Configure your primary and secondary local network interfaces.
Both local interfaces must be static.
Use lines:

auto NIC
iface NIC inet static
address IP
netmask MASK

Execute: sudo /etc/init.d/networking restart
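As an illustration, the four lines can be generated for a given interface. static_stanza is a hypothetical helper, and the interface name and addresses in the example call are made-up values.

```shell
#!/bin/sh
# static_stanza NIC IP MASK
# Hypothetical helper: prints a static stanza for /etc/network/interfaces.
static_stanza() {
    printf 'auto %s\niface %s inet static\n    address %s\n    netmask %s\n' \
        "$1" "$1" "$2" "$3"
}

# Example with made-up values for a second local interface.
static_stanza eth1 192.168.1.10 255.255.255.0
```

The output can be appended to /etc/network/interfaces once per local interface before restarting networking.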

Network Interface Configuration 2/5

The /etc/network/interfaces file will look like this:
Before:

After:

Network Interface Configuration 3/5

For Red Hat systems, in /etc/sysconfig/network-scripts/ifcfg-NIC:
Configure your primary and secondary local network interfaces.
Both local interfaces must be static.
Use lines:

DEVICE=NIC
BOOTPROTO=static
IPADDR=IP
NETMASK=MASK
ONBOOT=yes

Execute: sudo /etc/init.d/network restart

Network Interface Configuration 4/5

The /etc/sysconfig/network-scripts/ifcfg-NIC file will look like this:
Before:

After:

Network Interface Configuration 5/5

System Imager Installation

Download the System Imager packages from
http://hpci.latech.edu/haoscarpackages/systemimager-packages/
and execute the command

sudo ./packages-install.sh

This installs the prerequisites for System Imager via aptitude.
The user should log in as root before starting the installation wizard.

HAOSCAR 2.0 Installation

Download the HAOSCAR 2.0 package from http://hpci.latech.edu/haoscarpackages/ and execute the following commands:

sudo dpkg -i *.deb
Installs all .deb files in the current directory.

sudo apt-get install -f
Installs dependencies for the packages installed above.

Configuring HAOSCAR 1/2

Run the command:

sudo haoscar_configure

Gathers system configuration data for the primary and secondary nodes.
Sets up the HATCI programs: Heartbeat, rsync, and Monit.
Starts system cloning, net-boots the secondary node, and handles image transfer.

Running the Installation Script

sudo haoscar_configure prints this on the screen

Gathering System Configuration Information 1/2

Primary Server

During the configuration process the user is asked to enter configuration data for the primary server:
The folders to synchronize with the secondary server (in addition to the home directory)
The local network interface
Virtual IPs that will be used for Network Address Takeover (one per subnet)

Gathering System Configuration Information 2/2

This is the screen for this step

Database Initialization 1/2

The installer creates the sqlite3 database from the schema specified in /usr/share/haoscar.

Secondary server
During this step the user is asked to input configuration data for the secondary server:
The host name of the secondary server
The static IP address of the secondary server

Database Initialization 2/2

This is the screen for this step

High Availability Tools Configuration and Installation (HATCI)

Responsible for the installation and configuration of the tools necessary to provide high availability to the system.
The current implementation of HATCI is divided into three subcomponents:

Node Redundancy

Service Redundancy

Data Replication Services

HATCI Architecture

Node Redundancy

Node redundancy makes use of world-class monitoring and syncing tools such as Linux-HA's Heartbeat.
The Heartbeat subsystem ensures timely and reliable fail-over and fail-back.

HATCI SETUP 1/5

Heartbeat Configuration

The Heartbeat service is configured on the primary server, which links to the secondary server after the net-boot to provide IP fail-over and fail-back.

Note: This step requires you to enter a password for Heartbeat. Your systems use it to identify each other.
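In Heartbeat, a shared secret of this kind normally lives in /etc/ha.d/authkeys, which must be readable by root only. The sketch below shows the usual layout of that file. Whether haoscar_configure writes exactly this content is an assumption, and write_authkeys is a hypothetical helper.

```shell
#!/bin/sh
# write_authkeys FILE PASSWORD
# Hypothetical helper: writes a Heartbeat authkeys file that uses key 1
# with sha1 signing, then restricts the file to its owner.
write_authkeys() {
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$2" > "$1" ) && chmod 600 "$1"
}

# Usage (as root, same password on both nodes):
#   write_authkeys /etc/ha.d/authkeys 'YourSharedPassword'
```

Both nodes need identical authkeys content, otherwise they will not accept each other's heartbeats.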

HATCI SETUP 2/5

This is the screen for the Heartbeat configuration step

Service Redundancy

Service redundancy, on the other hand, is responsible for the configuration and installation of the tools used to provide highly available cluster services.

HATCI SETUP 3/5

Monit Configuration

Adds default apache, sshd and syslog configurations to Monit and starts monitoring these services at a 2-minute interval.
Monit uses this configuration to automatically maintain and repair these services in error situations.
The configuration file for Monit can be found in /etc/monit/monitrc.

HATCI SETUP 4/5

This is the screen for the Monit configuration step

Data Replication

Reliable data replication is achieved with rsync, keeping both head nodes constantly synchronized.

Rsync Configuration

Automatically backs up any previous rsync configuration file and creates a new configuration file at /etc/rsyncd.conf.
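A backup step of this kind can be sketched as follows. backup_config is a hypothetical helper, and the .bak.N naming is an assumption borrowed from the dhcpd.conf.bak.1 file that appears later in this tutorial. The installer's actual backup name for rsyncd.conf is not stated here.

```shell
#!/bin/sh
# backup_config FILE
# Hypothetical helper: copies FILE to the first unused FILE.bak.N and
# prints the backup path. Does nothing if FILE does not exist.
backup_config() {
    [ -f "$1" ] || return 0          # nothing to back up
    n=1
    while [ -e "$1.bak.$n" ]; do     # find the first free suffix
        n=$((n + 1))
    done
    cp -p "$1" "$1.bak.$n" && printf '%s\n' "$1.bak.$n"
}

# Usage: backup_config /etc/rsyncd.conf
```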

HATCI SETUP 5/5

This is the screen for the Rsync configuration step

Configuring HAOSCAR 2/2

System Cloning

Creates an image of the primary server.
Uses the provided information to allow a net-boot of the secondary server.
Copies the created image to the secondary server, changing the host name and IP address.
After successful completion of this step, the HAOSCAR installation is complete. Please net-boot your secondary server.

System Image Creation

This is the screen of this step

Secondary Server Installation Process

Netboot and Image Transfer to Secondary server

One of the secondary server's network interfaces, eth0, is connected to the private LAN and requests an IP address (DHCP) during its network boot.
The secondary server then boots PXELINUX from the primary server.
From PXELINUX, the secondary server uses rsync to retrieve the system image from the primary server and write it to its hard drive.
After the primary server is successfully cloned, the secondary server is rebooted from its hard drive.

Configuring the Secondary Server 1/3

Now that the secondary server has been created, we must configure it:

In /etc/network/interfaces, set up the static IPs for the secondary server (one per subnet specified by the virtual IPs).
Authenticate the primary server's ssh key with the secondary server.
Restore the primary DHCP configuration file.

Configuring the Secondary Server 2/3

Configuring the network interfaces
Same as the process for setting up the static IPs for the primary server, though the last number of each address should be different.

Authenticating ssh keys
Done from the primary server:
ssh-keygen (The passphrase must be left empty.)
ssh-copy-id -i ~/.ssh/id_rsa.pub Secondary_Server

Configuring the Secondary Server 3/3

Restoring the primary DHCP configuration file

For Debian systems:
mv /etc/dhcp3/dhcpd.conf.bak.1 /etc/dhcp3/dhcpd.conf
service dhcp3-server restart

For Red Hat systems:
mv /etc/dhcpd.conf.bak.1 /etc/dhcpd.conf
/etc/init.d/dhcpd restart
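The mv step fails if the backup is missing, so a guarded version can be useful. This is a minimal sketch: restore_dhcp_conf is a hypothetical helper that takes the configuration path for your distribution and only moves the .bak.1 file when it exists.

```shell
#!/bin/sh
# restore_dhcp_conf CONF
# Hypothetical helper: moves CONF.bak.1 back over CONF if the backup exists,
# and reports an error otherwise.
restore_dhcp_conf() {
    if [ -f "$1.bak.1" ]; then
        mv "$1.bak.1" "$1"
    else
        echo "no backup found for $1" >&2
        return 1
    fi
}

# Usage (Debian path shown; restart the DHCP service afterwards):
#   restore_dhcp_conf /etc/dhcp3/dhcpd.conf
```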

Restart both systems to enable HA.

HAOSCAR 2.0 Configuration

Optional Customization
HAOSCAR 2.0 can be configured by the user to suit the needs of the application. Some of the available configuration areas are:

Service Monitoring (Monit) configuration
IP Availability (Heartbeat) configuration
Data Synchronization (HA-OSCAR filemon) configuration

Monit Configuration
Debian: /etc/monit/monitrc
set daemon SECONDS
set httpd port PORT and use address localhost
    allow localhost
    allow USER:PASSWORD

http://mmonit.com/wiki/Monit/ConfigurationExamples

Monit Configuration Example 1/4

Case 1: Mail Service

Postfix (mail server)

check process postfix with pidfile /var/spool/postfix/pid/master.pid
    group mail
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout
check file postfixpid with path /var/spool/postfix/pid/master.pid
    if changed timestamp for 5 cycles then exec "/bin/sh /usr/bin/fail-over"

Proof of Case 1

This is the monit status report for the Postfix mail server when we run the command monit status

Monit Configuration Example 2/4

Case 2: System Service

Syslogd (system logfile daemon)

check process syslogd with pidfile /var/run/rsyslogd.pid
    start program = "/etc/init.d/rsyslog start"
    stop program = "/etc/init.d/rsyslog stop"
    if 5 restarts within 5 cycles then timeout
check file syslogd_file with path /var/log/syslog
    if timestamp > 65 minutes then alert

Proof of Case 2

This is the monit status report for the system logfile daemon when we run the command monit status

Monit Configuration Example 3/4


Case 3: WWW Service

Apache (web server)

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if 5 restarts within 5 cycles then timeout
check file apachepid with path /var/run/apache2.pid
    if changed timestamp for 4 cycles then exec "/bin/sh /usr/bin/fail-over"

Proof of Case 3
This is the monit status report for the Apache web server when we run the command monit status

Monit Configuration Example 4/4


Case 4: Login Service

SSHD

check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    if 5 restarts within 5 cycles then timeout
    if failed port 22 protocol ssh then restart
check file sshdpid with path /var/run/sshd.pid
    if changed timestamp for 5 cycles then exec "/bin/sh /usr/bin/fail-over"

Proof of Case 4
This is the monit status report for SSHD when we run the command monit status
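The Postfix, Apache and SSHD examples share one pattern: a process check with start and stop programs and a restart limit, followed by a pidfile check that triggers fail-over. A sketch of a generator for that pattern is shown below. monit_stanza is a hypothetical helper, and the service name, pidfile and init script are whatever applies to the service being added.

```shell
#!/bin/sh
# monit_stanza NAME PIDFILE INITSCRIPT [CYCLES]
# Hypothetical helper: prints a Monit stanza in the style of the examples
# above. CYCLES defaults to 5.
monit_stanza() {
    name=$1 pidfile=$2 init=$3 cycles=${4:-5}
    cat <<EOF
check process $name with pidfile $pidfile
    start program = "$init start"
    stop program = "$init stop"
    if 5 restarts within 5 cycles then timeout
check file ${name}pid with path $pidfile
    if changed timestamp for $cycles cycles then exec "/bin/sh /usr/bin/fail-over"
EOF
}

# Example: reproduce the Apache stanza with a 4-cycle threshold.
monit_stanza apache /var/run/apache2.pid /etc/init.d/apache2 4
```

After appending a generated stanza to the Monit control file, monit -t checks the file's syntax before the daemon is reloaded.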

Heartbeat Configuration 1/3

In /etc/ha.d/ha.cf:
logfile /var/log/haoscar/heartbeat.log
udpport 694
keepalive 2
deadtime 30
initdead 120
bcast eth0

Heartbeat Configuration 2/3

udpport defines which port the primary and secondary servers communicate across.
keepalive specifies the interval, in seconds, at which the primary server reasserts itself to the secondary server.
deadtime is how long the secondary server waits without reassertion from the primary server before taking over the IPs.
initdead is the deadtime used when first bringing the system online.
bcast defines which NIC Heartbeat uses to communicate between nodes.
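The timing values relate by simple arithmetic: dividing deadtime by keepalive gives the number of consecutive heartbeats that must be missed before the secondary server takes over. The sketch below works this out for the values in ha.cf. missed_beats is a hypothetical helper name.

```shell
#!/bin/sh
keepalive=2    # seconds between heartbeats
deadtime=30    # seconds of silence before takeover
initdead=120   # deadtime applied when the system first comes online

# missed_beats TIMEOUT INTERVAL
# Number of whole heartbeat intervals that fit into the timeout.
missed_beats() { echo $(( $1 / $2 )); }

echo "missed heartbeats before takeover: $(missed_beats "$deadtime" "$keepalive")"
echo "missed heartbeats tolerated at startup: $(missed_beats "$initdead" "$keepalive")"
```

With these settings the secondary tolerates 15 missed heartbeats in normal operation and 60 during startup. Lowering deadtime speeds up fail-over but raises the risk of a false takeover on a briefly congested link.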

Heartbeat Configuration 3/3

In /etc/ha.d/haresources:

Primary-Server 192.168.0.9 192.168.1.9

Provides the virtual IPs to be used by Heartbeat and specifies which system they belong to by default.

Note: Both files, /etc/ha.d/ha.cf and /etc/ha.d/haresources, must be kept synchronized between servers.
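One way to verify that the two files match is to fetch the other server's copies into a scratch directory and compare them byte for byte. This is a minimal sketch: check_ha_sync is a hypothetical helper, and the scratch directory in the usage line is an arbitrary choice.

```shell
#!/bin/sh
# check_ha_sync DIR_A DIR_B
# Hypothetical helper: compares ha.cf and haresources in two directories,
# prints the names of files that differ, and returns non-zero if any do.
check_ha_sync() {
    status=0
    for f in ha.cf haresources; do
        if ! cmp -s "$1/$f" "$2/$f"; then
            echo "out of sync: $f"
            status=1
        fi
    done
    return $status
}

# Usage sketch, run on the primary server:
#   mkdir -p /tmp/ha.secondary
#   scp Secondary_Server:/etc/ha.d/ha.cf Secondary_Server:/etc/ha.d/haresources /tmp/ha.secondary/
#   check_ha_sync /etc/ha.d /tmp/ha.secondary
```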

HAOSCAR Filemon Configuration

/etc/init.d/ha-oscar-filemon
start-stop-daemon --start --pidfile $PID_FILE --background \
    --make-pidfile --exec $DAEMON -- --recursive --period SECONDS \
    --primary=$PRIMARY --secondary=$SECONDARY $watch_dirs

period defines how long rsync waits before transmitting a new set of changes to the secondary server.
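Because the period is given in seconds, a delay expressed in minutes has to be converted first. A small sketch of that conversion follows. filemon_period_arg is a hypothetical helper, and the 2-minute value is the default mentioned earlier in this tutorial.

```shell
#!/bin/sh
# filemon_period_arg MINUTES
# Hypothetical helper: prints the --period option with MINUTES converted
# to seconds.
filemon_period_arg() {
    echo "--period $(( $1 * 60 ))"
}

filemon_period_arg 2   # prints: --period 120
```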

Troubleshooting

Problem: System Imager fails with a complaint about .gvfs
Possible solution: Unmount .gvfs

Problem: System Imager fails during System Imager installation
Possible solution: Update the package manager repositories (e.g. apt-get update)

Problem: System Imager fails after PXE loads, before cloning starts
Possible solution: Verify that the partition table is not GPT

Problem: Heartbeat fails (no fail-over/fail-back)
Possible solution: Restart the heartbeat service on each machine (e.g. sudo service heartbeat restart)

Problem: Monit fails (a stopped HA service stays stopped)
Possible solution: Verify that apache2 and ssh are installed
