Author: Shaheer Fardan
Date of Creation: 16th March 2006
Email id: shaheer.fardan@wipro.com
Setting up Basic VCS ( LLT and GAB) on new Clusters using VCS 4.0
Table of content
6. A Quick Recap
10. Appendix
4. LLT: LLT is the acronym for Low Latency Transport. LLT is responsible for
tracking network heartbeat status between the cluster nodes. It also determines whether
the heartbeat connection to each other node is reliable or unreliable and communicates
this to GAB. The connection to a node is considered reliable when LLT is able
to send and receive LLT packets on two or more heartbeat links. The connection to a
node is considered unreliable when LLT is able to send and receive heartbeat
packets on only one heartbeat link.
5. Agents: had tracks and maintains the status of the resources configured in service
groups through agents. had on its own has no means of communicating with a
resource. It requires an agent to act as an intermediary between itself and the resource. A
resource can be an IP address, a network interface, a diskgroup, a volume etc. VCS
requires an agent to be running for each resource type configured in the cluster.
For example, to configure an IP address and track whether it is configured, VCS
requires the IP agent to be running. Whenever an IP resource has to be onlined, VCS
asks the corresponding agent to configure the IP address and bring it up. VCS
also passes the agent the parameters necessary for bringing the
interface up. VCS carries out all operations on a resource (onlining, offlining,
monitoring, enabling and disabling) through agents.
set-cluster 200
set-cluster is the llt directive and 200 is the value defined for it; i.e., the cluster id is 200.
Each entry in /etc/llthosts has the following format:
node_id node_name
For example:
1 veritas-1
1 is the node id and veritas-1 is the hostname that has this node id assigned to it.
All the nodes in a given cluster must have a unique node id. If there is a duplicate
node id in a given cluster, the resulting behavior depends on the VCS version the cluster
is running:
VCS 4.0:
In 4.0, llt is shut down on the links where duplicate ids are detected.
VCS 3.5:
In 3.5, on the first node where the duplication is detected, the llt
module detects it and informs GAB. GAB then checks whether the duplicate was detected on
heartbeats configured on private links or on the public link. If it was detected on a
private link, GAB reboots the server.
VCS 2.0:
In 2.0, on the first node where the duplication is detected, the llt
module detects it and informs GAB. GAB then reboots the server
irrespective of the type of link it was detected on.
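Since duplicate node ids lead to link shutdowns or reboots, it can help to check an llthosts file before starting llt. The following is a minimal sketch, not part of VCS: the function name find_duplicate_node_ids and the scratch file are hypothetical, and the file is assumed to contain one "node_id node_name" pair per line.

```shell
#!/bin/sh
# Print every node id that appears more than once in an llthosts-style file.
# Each non-empty line is expected to look like: <node_id> <node_name>
find_duplicate_node_ids() {
    awk 'NF >= 2 { count[$1]++ }
         END { for (id in count) if (count[id] > 1) print id }' "$1" | sort -n
}

# Demo on a scratch copy; on a cluster node the file would be /etc/llthosts.
demo_file=$(mktemp)
printf '1 veritas-1\n2 veritas-2\n2 veritas-3\n' > "$demo_file"
find_duplicate_node_ids "$demo_file"   # prints 2
rm -f "$demo_file"
```

An empty result means no node id is repeated within that one file. It does not detect a conflict with a node whose llthosts file differs, so the file should still be kept identical on all nodes.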
Step 1.
This step is required.
Set local-mac-address? to true if it is not already set.
If the system is in multi-user mode, do the following:
# eeprom local-mac-address?=true
# sync
# sync
# init 6
If the system is at the ok prompt:
ok> setenv local-mac-address? true
ok> reset-all
Note:
a) By default, Sun systems use the ethernet address coded in the PROM for all
interfaces. If the private link and the public interface (with IP configured) are connected
to the same switch, even on different VLANs, the switch may forward llt packets
to the public interface and IP packets to the private llt link, which is not desirable.
b) This step must also be completed if you are going to use a MultiNICB resource
along with an IPMultiNICB resource, in either base mode or mpathd mode, for local interface
failover.
Step 2.
Once the system is in multi-user or single-user mode, identify the interfaces that are
to be used for heartbeats on the private links.
The two heartbeats should go on separate and distinct VLANs or networks. E.g., if
heartbeat 1 is on vlan1 then the second should go on vlan2. All the cluster nodes are
required to use the same VLANs for the corresponding heartbeats. E.g., if node1 has its first
heartbeat on vlan1 and its second on vlan2, then all the nodes should have their first
heartbeat on vlan1 and their second heartbeat on vlan2.
Step 3.
Check that you are able to see network packets on the interfaces to be configured for heartbeats.
Check that the private interfaces that will exchange heartbeats are hard set at 100Mbps
Full Duplex with auto-negotiation disabled, using either ce.conf or qfe.conf depending on
the type of interface used for heartbeats (file location is /platform/sun4u/kernel/drv).
If the heartbeat interfaces are not set at 100Mbps FDX, set them up as described
above. Please note that this requires a server reboot. After creating the
ce.conf/qfe.conf/bge.conf file, always reboot the system to verify that the settings specified
in the conf files are correct. Use the following entries to force all the interfaces to 100Mbps
FDX with auto-negotiation disabled:
adv_autoneg_cap=0 adv_1000fdx_cap=0 adv_1000hdx_cap=0 adv_100T4_cap=0
adv_100fdx_cap=1 adv_100hdx_cap=0 adv_10fdx_cap=0 adv_10hdx_cap=0;
After the server reboot, verify the settings for the heartbeat channels. The example below
assumes that the interface used is ce and the instance is 1.
ndd -set /dev/ce instance 1
ndd /dev/ce adv_autoneg_cap (this should report 0)
ndd /dev/ce adv_100fdx_cap (this should report 1, meaning 100Mbps full duplex is advertised)
ndd /dev/ce link_status (this should report 1, meaning the link is up)
ndd /dev/ce link_mode (this should report 1, meaning the link is full duplex)
ndd /dev/ce link_speed (this should report 1, indicating the link is at 100Mbps)
Do the same for the second heartbeat as well.
After this, verify that you are able to see network traffic (layer 2 frames) on the heartbeat
channels using the snoop command.
If the heartbeats are on ce1 and ce3 then use
snoop -d ce1
snoop -d ce3
You should be able to see layer 2 frames of the following kind on the interfaces
corresponding to the heartbeat channels:
? -> (multicast)
bytes
In case you are not able to see network packets on the heartbeat interfaces, check the interface
settings and troubleshoot. Make sure that all parameters except adv_100fdx_cap are
set to 0 (zero).
Step 4.
If multiple clusters share the same heartbeat infrastructure, choose a cluster id that is
not already in use in that environment. The number must be between 0 and 255.
Avoid using 0 as the cluster id.
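The range check and the search for an unused id can be sketched in shell. This is only an illustration: the function names are hypothetical, and the inventory file (one in-use cluster id per line) is an assumption of the sketch, since VCS itself does not maintain such a list.

```shell
#!/bin/sh
# Return 0 if the argument is a whole number in 0-255, non-zero otherwise.
is_valid_cluster_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 255 ]
}

# Print the lowest id in 1-255 that is not listed in the file given as $1.
# The file is a hypothetical site inventory with one in-use id per line.
# The search starts at 1 because 0 is best avoided as a cluster id.
first_free_cluster_id() {
    id=1
    while [ "$id" -le 255 ]; do
        grep -qx "$id" "$1" || { echo "$id"; return 0; }
        id=$((id + 1))
    done
    return 1
}
```

For example, is_valid_cluster_id 61 succeeds, while is_valid_cluster_id 256 fails.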
Step 5.
Once you have identified a cluster id that is not in use, proceed with the following steps:
i) Verify that had/hashadow is not running. Verify that the /etc/llttab, /etc/llthosts and
/etc/gabtab files do not exist.
ps -ef | grep ha
ls /etc/llt*
ls /etc/gabtab
Although it is assumed that we are configuring nodes in a new cluster which does not have
any functional node, the above checks guard against any mistake caused by typing an
incorrect hostname.
ii) Assuming that bge2 carries the first heartbeat and bge3 the second heartbeat, create
the following entries in the /etc/llttab file. The example uses veritas-1 and veritas-2, which
form a two node cluster.
echo "set-cluster 61
set-node /etc/VRTSvcs/conf/sysname
link bge2 /dev/bge:2 - ether - -
link bge3 /dev/bge:3 - ether - -" > /etc/llttab
echo `uname -n` > /etc/VRTSvcs/conf/sysname
echo "1 veritas-1
2 veritas-2" > /etc/llthosts
echo "/sbin/gabconfig -c -n2" > /etc/gabtab
Once the first host has its llt and gab config files set up, do the same on the other
cluster members, as described in Steps 1 through 5.
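Because the same files have to be created on every node, the commands in Step 5 can be wrapped in a small helper. The sketch below is hypothetical (write_vcs_comm_files is not a VCS tool). It writes into a staging directory rather than /etc so the result can be reviewed first, assumes exactly two heartbeat links, and derives the device path from the interface name (bge2 becomes /dev/bge:2).

```shell
#!/bin/sh
# Write llttab, sysname, llthosts and gabtab for one node into directory $1.
# Arguments: dir cluster_id local_hostname link1 link2 id:host [id:host ...]
write_vcs_comm_files() {
    dir=$1; cluster_id=$2; local_host=$3; link1=$4; link2=$5
    shift 5                      # the remaining arguments are id:host pairs
    {
        echo "set-cluster $cluster_id"
        echo "set-node /etc/VRTSvcs/conf/sysname"
        for nic in "$link1" "$link2"; do
            # Insert a colon before the instance number: bge2 -> bge:2
            dev=$(echo "$nic" | sed 's/\([0-9][0-9]*\)$/:\1/')
            echo "link $nic /dev/$dev - ether - -"
        done
    } > "$dir/llttab"
    echo "$local_host" > "$dir/sysname"
    : > "$dir/llthosts"
    for pair in "$@"; do
        echo "${pair%%:*} ${pair#*:}" >> "$dir/llthosts"
    done
    # Seed number = number of cluster nodes passed in
    echo "/sbin/gabconfig -c -n$#" > "$dir/gabtab"
}

staging=$(mktemp -d)
write_vcs_comm_files "$staging" 61 veritas-1 bge2 bge3 1:veritas-1 2:veritas-2
cat "$staging/llttab"
```

After review, the staged files would be copied to /etc/llttab, /etc/llthosts, /etc/gabtab and /etc/VRTSvcs/conf/sysname on the node in question.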
Once all the nodes have their llt config files set up, run the following on all the nodes:
# /sbin/lltconfig -c
This reads /etc/llttab and /etc/llthosts and sets up the llt communication channels as
specified in the config files.
To check whether llt has started on the local node, run the following command:
# /sbin/lltconfig
LLT is running
Wipro Technologies TIS
The lltstat -nvv command reports the status of the heartbeat links as configured in llttab. By
default it shows the status for all 32 nodes (the maximum number of nodes allowed), both
existing and non-existing. Entries that show a name, such as 1 veritas-1, are existing nodes;
entries that do not show a hostname are non-existent nodes. To stop lltstat -nvv from reporting
status for all 32 nodes, use the exclude directive in the llttab file. The procedure to
configure this is provided at the end of the document.
The entry starting with a star (*) marks the node from which the command was
run.
By default, until gab is started, it shows the state and status of all the links
as IDLE and DOWN.
After llt has been started on all the hosts, run the following command on all the nodes:
/sbin/gabconfig -c -n #
where # specifies the number of cluster nodes that must be communicating by way of
GAB before gab membership can be established and had can start. This is also called the
seed number.
VCS best practices recommend setting the seed number equal to the number of nodes in
the cluster. For example, if a cluster has five nodes then the seed number should
be 5. For a two node cluster the seed number should be 2. Setting the seed number this way
reduces the chance of creating a split brain if a network partition already exists when the
cluster is rebooted.
Assuming a five node cluster, the command should be run as
/sbin/gabconfig -c -n5
on all five cluster nodes.
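Since the recommended seed number equals the number of cluster nodes, it can be derived from the llthosts file instead of being typed by hand. A minimal sketch, assuming one "node_id node_name" entry per line; the function name gab_seed_command is hypothetical, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Count the nodes listed in an llthosts-style file and print the matching
# gabconfig command line, with the seed number equal to the node count.
gab_seed_command() {
    nodes=$(awk 'NF >= 2 { n++ } END { print n + 0 }' "$1")
    echo "/sbin/gabconfig -c -n$nodes"
}

# Demo with a five node llthosts file in a scratch location.
demo_hosts=$(mktemp)
printf '0 node0\n1 node1\n2 node2\n3 node3\n4 node4\n' > "$demo_hosts"
gab_seed_command "$demo_hosts"   # prints /sbin/gabconfig -c -n5
rm -f "$demo_hosts"
```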
Once it has been run on all the nodes, you may see messages like the following if you are on
the console:
root@node0 # /sbin/gabconfig -c -n5
Feb 17 02:56:58 node0 gab: GAB:20026: Port a registration waiting for seed port
membership
root@node0 # /sbin/gabconfig -a
GAB Port Memberships
===============================================================
root@node0 # Feb 17 02:57:11 node0 llt: LLT:10024: link 0 (ce1) node 1 active
Feb 17 02:57:11 node0 llt: LLT:10024: link 1 (ce7) node 1 active
Feb 17 02:57:24 node0 llt: LLT:10024: link 0 (ce1) node 2 active
Feb 17 02:57:24 node0 llt: LLT:10024: link 1 (ce7) node 2 active
Feb 17 02:57:30 node0 llt: LLT:10024: link 1 (ce7) node 3 active
Feb 17 02:57:30 node0 llt: LLT:10024: link 0 (ce1) node 3 active
Feb 17 02:57:36 node0 llt: LLT:10024: link 1 (ce7) node 4 active
Feb 17 02:57:36 node0 llt: LLT:10024: link 0 (ce1) node 4 active
Feb 17 02:57:41 node0 gab: GAB:20036: Port a gen 4e4f01 membership 01234
root@node0 # /sbin/gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 4e4f01 membership 01234
Explanation of the output of the gabconfig -a command
Port a indicates that gab has seeded and gab membership has been established.
4e4f01 is a randomly generated number (shown after gen). This number changes every time the
membership changes. In a stable configuration in which all the nodes are up and
communicating by way of GAB, this number should be the same when the gabconfig -a
command is run from different cluster nodes.
01234 indicates that the nodes with nodeids 0, 1, 2, 3 and 4 are up and
communicating by way of gab.
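To compare the generation number and membership across nodes, the "Port a" line can be picked out of the gabconfig -a output with awk. This is a sketch under the assumption that the line has exactly the layout shown in the sample above; the function name gab_port_a_membership is hypothetical, and the demo feeds it the sample text rather than live command output.

```shell
#!/bin/sh
# Read `gabconfig -a` style output on stdin and print "<gen> <membership>"
# for port a. Prints nothing and returns 1 if no port a membership line
# is present (i.e. gab has not seeded).
gab_port_a_membership() {
    awk '$1 == "Port" && $2 == "a" && $3 == "gen" && $5 == "membership" {
             print $4, $6
             found = 1
         }
         END { if (!found) exit 1 }'
}

# Demo using the sample output from this document.
printf 'GAB Port Memberships\nPort a gen 4e4f01 membership 01234\n' |
    gab_port_a_membership        # prints 4e4f01 01234
```

On a cluster node the input would come from /sbin/gabconfig -a. Running it on every node and comparing the printed pairs is one way to confirm that all nodes agree on the membership.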
If gab is not seeded, gabconfig -a shows only the header, as follows:
root@node0 # /sbin/gabconfig -a
GAB Port Memberships
===============================================================
Once the gab membership has been established, run the lltstat -nvv command on all the
cluster nodes. It should show the State of every node as OPEN and the Status of every link as UP.
For proper functioning, make sure that the State field shows OPEN and the Status field
shows UP. If the State field shows CONNWAIT/IDLE
or the Status field shows DOWN for existing hosts, either the config files have
not been set up correctly or there is a problem with physical connectivity.
The cluster id should be specified correctly, should have a value between 0 and 255 (both
inclusive) and should not be in use by another cluster. All nodes should have the same cluster id.
If the value specified against set-node is a number, there should be an
entry in the /etc/llthosts file that maps this nodeid to the nodename by which the local cluster
node will be identified in the cluster.
If the value specified against set-node is a nodename, there should be a
corresponding entry in /etc/llthosts that maps the nodename to a nodeid.
In either case the nodeid should be between 0 and 31 (both inclusive), with each
node in the cluster having a unique nodeid (nodeids in a given cluster should not
conflict).
The links should be specified correctly. E.g., /dev/ce:1 cannot be specified as
/dev/c:1.
3. If llt is running then check the llt connectivity status using the lltstat -nvv command. The
State field should show OPEN for every node and the Status field should show UP for every link.
If the State field shows CONNWAIT for any host, or the Status field shows
DOWN, there is a physical connectivity issue.
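Some of the llttab checks listed above can be automated. The sketch below is a hypothetical helper (check_llttab is not a VCS command) that verifies only the syntax: the set-cluster range, the range of a numeric set-node value, and the general /dev/<driver>:<instance> shape of each link device. It cannot tell whether the driver name itself is mistyped (for example /dev/c:1 instead of /dev/ce:1) or whether the cluster id is already in use elsewhere.

```shell
#!/bin/sh
# Check an llttab-style file and print one line per problem found.
# Returns 0 when no problems were found, 1 otherwise.
check_llttab() {
    awk '
        $1 == "set-cluster" {
            seen_cluster = 1
            if ($2 !~ /^[0-9]+$/ || $2 + 0 > 255)
                print "set-cluster value must be 0-255: " $2
        }
        $1 == "set-node" {
            seen_node = 1
            # Only numeric values are range-checked; names and file
            # paths are accepted as they are.
            if ($2 ~ /^[0-9]+$/ && $2 + 0 > 31)
                print "numeric set-node value must be 0-31: " $2
        }
        $1 == "link" {
            if ($3 !~ /^\/dev\/[a-z][a-z0-9]*:[0-9]+$/)
                print "unexpected device for link " $2 ": " $3
        }
        END {
            if (!seen_cluster) print "missing set-cluster directive"
            if (!seen_node)    print "missing set-node directive"
        }
    ' "$1" | awk '{ print } END { if (NR > 0) exit 1 }'
}
```

A typical use would be check_llttab /etc/llttab && echo "llttab looks sane" before running lltconfig -c.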
A QUICK RECAP
1. Identify the hosts that are going to be part of the new cluster. Set local-mac-address? to
true on all the hosts. Check that all the heartbeat channels are hard set at 100Mbps
FDX with auto-negotiation disabled.
2. Create the /etc/VRTSvcs/conf/sysname file on each cluster node with the local nodename as
its content.
E.g., on node0 the contents of /etc/VRTSvcs/conf/sysname should be the same as the hostname,
i.e., node0.
On node1 the contents of /etc/VRTSvcs/conf/sysname should be the same as the hostname, i.e.,
node1.
Do the same on all the cluster nodes.
3. Assign a unique nodeid to each node in the new cluster.
4. Map the nodeids to the nodenames as entered in the sysname files and enter this mapping in the
/etc/llthosts file of all the nodes. The /etc/llthosts file should be the same on all the nodes in a
cluster.
5. Take an unused cluster id and configure the llttab file as explained in the document.
6. Create /etc/gabtab with the following content:
/sbin/gabconfig -c -n #
where # is the seed number.
Do this step only when the cluster is ready to be configured at layer 3 (had).
7. Start llt on all the hosts using the command
/sbin/lltconfig -c
8. Start gab on all the hosts using the command
/sbin/gabconfig -c -n #
9. Verify that llt and gab are running:
/sbin/lltconfig
/sbin/lltstat -nvv
/sbin/gabconfig -a
Appendix
Explanation of the output of the lltstat -nvv command
# /sbin/lltstat -nvv
LLT node information:
    Node        State    Link   Status   Address
  * 0 node0     OPEN
                         ce9    UP       00:03:BA:E6:33:CA
                         ce2    UP       00:03:BA:DA:FE:90
    1 node1     OPEN
                         ce9    UP       00:03:BA:E4:D8:72
                         ce2    UP       00:03:BA:DA:B8:04
1. The Node field shows the nodeid and nodename as configured in the llthosts file.
2. State shows the state of the peer node. It should be OPEN for proper communication
to take place.
3. Link shows the tag name specified in the llttab file.
E.g., in "link ce9 /dev/ce:9 - ether - -", ce9 is the tag name specified for the 9th instance of
the ce interface, and it is displayed under the Link field in the lltstat -nvv output.
4. Status shows whether the link is up or not. It should be UP for proper communication to
take place.
5. The Address field shows the MAC address of the interface identified by the tag name
in the Link field on the respective host.
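The State and Status checks can be scripted so that only the problems are printed. The sketch below assumes the layout described above: one line per node (optional star, nodeid, nodename, State) followed by one line per link (tag, Status, address). The function name llt_problems is hypothetical, and lines for unused nodeids, which carry no hostname, are skipped.

```shell
#!/bin/sh
# Read `lltstat -nvv` style output on stdin and report every configured node
# whose State is not OPEN and every link whose Status is not UP.
llt_problems() {
    awk '
        { sub(/^[ \t]*\*/, "") }                  # drop the local-node marker
        $1 ~ /^[0-9]+$/ && NF >= 3 {              # node line: id name state
            node = $2
            if ($3 != "OPEN") print "node " node " state " $3
            next
        }
        $1 ~ /^[0-9]+$/ { node = ""; next }       # unused nodeid, no hostname
        node != "" && NF >= 2 && $2 ~ /^(UP|DOWN)$/ {   # link line
            if ($2 != "UP") print "node " node " link " $1 " status " $2
        }
    '
}
```

On a cluster node this would be run as /sbin/lltstat -nvv | llt_problems; no output means every configured node is OPEN and every link is UP.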
name of the file containing the hostname by which this host will be recognized in the
cluster. It can be any file, e.g., /etc/nodename or /etc/VRTSvcs/conf/sysname.
The first field, link, specifies that the entry corresponds to a high priority heartbeat
channel.
The second field, ce9, specifies the tag name by which the interface will be
recognized by commands like lltstat, lltconfig etc.
The third field, /dev/ce:9, specifies the device file corresponding to the interface and
the instance number. If ce9 is the interface to be used for the heartbeat then it
is entered as /dev/ce:9.
The fourth field stands for node-range. It should be left at the default value of -
The fifth field stands for the type of network. It always takes the value ether.
The sixth field stands for the SAP value used by Ethernet for heartbeat messages.
When we specify - in this field it takes the default value of 0xCAFE in hex. The
value should not be changed and should be left at the default value -
The seventh field stands for the MTU (Maximum Transmission Unit) on the network.
It should also be left at the default value by specifying - in the seventh field.
Fourth Entry: link ce2 /dev/ce:2 - ether - - specifies the second high priority heartbeat
channel.
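The seven fields of a link entry can be made visible by splitting the line and labelling each part. A small sketch; describe_link_line is a hypothetical helper, and the labels are taken from the explanation above rather than from any VCS output.

```shell
#!/bin/sh
# Print each of the seven fields of an llttab link line with a label.
# A value of "-" means that the default is used for that field.
describe_link_line() {
    printf '%s\n' "$1" | awk '
        NF != 7 { print "expected 7 fields, got " NF; exit 1 }
        {
            print "directive:  " $1
            print "tag:        " $2
            print "device:     " $3
            print "node-range: " $4
            print "link-type:  " $5
            print "SAP:        " $6
            print "MTU:        " $7
        }'
}

describe_link_line "link ce9 /dev/ce:9 - ether - -"
```

A line with fewer or more than seven fields is reported and the function returns a non-zero status, which makes it usable as a quick sanity check on hand-edited entries.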
VCS best practices recommend using the /etc/VRTSvcs/conf/sysname file, and that this file
contain the real hostname (as specified in /etc/nodename).
hostname
The node id should take a value between 0 and 31, and every host in a given cluster should have
a unique nodeid. In our example node0 has a node id of 0 and node1 has a node id of 1.
************************************************************************