Basic concepts of VCS
What is Veritas Cluster?
VCS stands for Veritas Cluster Server, the high-availability cluster software
developed by Veritas Technologies. It is available for UNIX, Linux, and Windows,
and provides application clustering capabilities to systems running applications
and databases.
------------------------------------------------------------------------------------------------
VCS Terminologies
Service Groups: A service group is a container that holds a set of resources
working together to make an application service available to clients. VCS
performs operations on these resources (starting, stopping, restarting, and
monitoring) at the service group level.
------------------------------------------------------------------------------------------------
SG types
Failover: The service group runs on only one system at any one time (one node
is active and the other nodes are passive).
Parallel: The service group can run simultaneously on more than one system at
any time.
Hybrid: A hybrid service group is a combination of the failover and parallel
service groups, used in VCS 4.0.
------------------------------------------------------------------------------------------------
Resources
Resources are hardware or software objects that are required to bring up the
services. VCS controls these resources by starting, stopping, and monitoring them.
Two types of resources:
Non-persistent: Can be controlled (brought online and taken offline) by VCS
Persistent: Cannot be brought online or taken offline by VCS, only monitored
(Ex: NIC cards) (Persistent resources cannot be a parent resource)
Resource dependencies: These determine the sequence in which resources are
brought online and taken offline.
Resource types: A resource type defines the kind of resource and its
characteristics. (Ex: NIC is the network card resource, IP is the IP address
resource, Mount is the mount point resource)
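
The way resource dependencies fix the online/offline sequence can be sketched as a small dependency graph. This is only an illustration in Python, not VCS code: the resource names (app, ip, mount, nic, diskgroup) and both helper functions are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical resources. Each key is a parent resource and the value is
# the set of child resources that must be online before it can start.
dependencies = {
    "app": {"ip", "mount"},
    "ip": {"nic"},
    "mount": {"diskgroup"},
    "nic": set(),
    "diskgroup": set(),
}

def online_order(deps):
    """Children come online before the parents that depend on them."""
    return list(TopologicalSorter(deps).static_order())

def offline_order(deps):
    """Taking resources offline walks the same sequence in reverse."""
    return list(reversed(online_order(deps)))

if __name__ == "__main__":
    print("online :", online_order(dependencies))
    print("offline:", offline_order(dependencies))
```

In this sketch the application always comes online last and goes offline first, because every other resource sits below it in the graph.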
------------------------------------------------------------------------------------------------
VCS concepts and components:
LLT - Low Latency Transport
GAB - Group Membership Services and Atomic Broadcast
HAD - High Availability Daemon
LLT
- It is a layer 2 protocol and operates at a very low level.
- LLT is the transport mechanism that VCS uses to carry information from one
node to another.
- It is the very first component that comes up during VCS startup.
- Private and public links: the interconnecting links between the cluster nodes
are called private links (hipri), and the other link is a public link (lowpri).
- A maximum of 8 links is supported.
It has two primary functions:
1) Heartbeats: LLT heartbeats constantly, sending a small packet from each node
to all the other nodes that says "hey, I am still here".
2) Cluster communication traffic: If someone makes a change to the VCS
configuration on one node, all the other nodes need to know about it. LLT
carries that information across all nodes.
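
The heartbeat idea can be sketched in a few lines of Python. This is a toy model, not how LLT is implemented: the HeartbeatTracker class, the node and link names, and the 16-second timeout are all assumptions made for illustration.

```python
class HeartbeatTracker:
    """Remembers when each peer was last heard on each link."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated (assumed value)
        self.last_seen = {}      # (node, link) -> time of last heartbeat

    def record(self, node, link, now):
        """Note a heartbeat from `node` received over `link` at time `now`."""
        self.last_seen[(node, link)] = now

    def live_links(self, node, now):
        """Links on which `node` has been heard within the timeout."""
        return sorted(
            link
            for (peer, link), seen in self.last_seen.items()
            if peer == node and now - seen <= self.timeout
        )

    def is_alive(self, node, now):
        """A peer counts as alive while at least one link still carries heartbeats."""
        return bool(self.live_links(node, now))

if __name__ == "__main__":
    tracker = HeartbeatTracker(timeout=16)
    tracker.record("nodeB", "link1", now=0)
    tracker.record("nodeB", "link2", now=0)
    tracker.record("nodeB", "link1", now=10)   # link2 has gone silent
    print(tracker.live_links("nodeB", now=20))
    print(tracker.is_alive("nodeB", now=30))
```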
------------------------------------------------------------------------------------------------
GAB (Group Membership Services and Atomic Broadcast)
It is the second key component to come up when VCS starts.
GAB is responsible for maintaining the overall cluster membership. Heartbeats
are used to determine whether a system is an active member, joining, or leaving
the cluster.
Atomic broadcast: Cluster configuration and status information is distributed
dynamically to all the systems within the cluster using GAB's atomic broadcast
feature. (Atomic means that either all the systems receive an update or, if one
fails to receive it, the update is rolled back on all of them.)
Cluster membership: GAB manages the cluster membership. (If node A crashes in a
three-node cluster, LLT immediately detects it, and GAB is responsible for
reforming the cluster with the remaining 2 nodes.)
Cluster comm.: GAB is the mechanism through which one node talks to the other
nodes, using the GAB ports.
GAB ports: GAB / I/O fencing / HAD -> if a configuration change is made on
node A, the rest of the nodes learn about the change through the GAB, I/O
fencing, and HAD ports.
VCS seed: Only seeded nodes can run VCS. VCS does not start the service groups
until the cluster is seeded.
Manual seed: One of the nodes may fail to come up when all the other nodes in
the cluster are started. Because of the "minimum seed requirement" safety check
that GAB enforces at this stage, the cluster does not seed on its own and human
intervention is required: first confirm that the missing node is really down and
is not running its own mini-cluster, then manually seed the GAB membership by
running # gabconfig -cx
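
The seeding rule reduces to a single check. The sketch below is a simplified model in Python, not GAB's actual logic: can_seed, seed_count, and manual_override are hypothetical names, with manual_override standing in for the operator running the manual seed command.

```python
def can_seed(nodes_up, seed_count, manual_override=False):
    """Return True when the cluster membership may be seeded.

    nodes_up        -- set of node names currently visible to each other
    seed_count      -- number of nodes expected before automatic seeding
                       (assumed to equal the cluster size)
    manual_override -- True once an operator has confirmed the missing
                       node is down and has seeded the membership by hand
    """
    return manual_override or len(nodes_up) >= seed_count

if __name__ == "__main__":
    print(can_seed({"nodeA", "nodeB", "nodeC"}, seed_count=3))
    print(can_seed({"nodeA", "nodeB"}, seed_count=3))
    print(can_seed({"nodeA", "nodeB"}, seed_count=3, manual_override=True))
```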
------------------------------------------------------------------------------------------------
HAD
HAD maintains the cluster state information and tracks all changes to the
cluster configuration and resource status by communicating with GAB.
HAD is monitored by hashadow. If HAD fails, hashadow attempts to restart it;
likewise, if the hashadow daemon dies, HAD restarts it.
HAD uses the main.cf file to build the cluster configuration in memory, and it
is also responsible for updating that in-memory configuration.
------------------------------------------------------------------------------------------------
VCS architecture for LLT/GAB/HAD
- Agents monitor the resources on each system and report status to HAD on the
local system.
- HAD on each system sends status information to GAB.
- GAB broadcasts configuration information to all cluster members.
- LLT transports all cluster communications to all cluster nodes.
- HAD on each node takes corrective action, such as failover, when necessary.
------------------------------------------------------------------------------------------------
VCS syndromes
Jeopardy syndrome:
If a node is running with only one heartbeat link, it is in the state of
jeopardy. If the last heartbeat from a node in jeopardy is then lost, VCS does
not restart its applications on another node. (Disabling failover in this state
is a safety mechanism that prevents data corruption.)
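
The jeopardy rule can be expressed as a small decision function. This is a minimal sketch in Python with hypothetical names (link_state, failover_allowed); it models only the behaviour described above, not the real VCS state machine.

```python
def link_state(links_up):
    """Classify a peer by how many heartbeat links still reach it."""
    if links_up >= 2:
        return "normal"
    if links_up == 1:
        return "jeopardy"
    return "lost"

def failover_allowed(state_before_loss):
    """Decide whether to restart a lost peer's applications elsewhere.

    A peer that disappears straight from the normal state is treated as
    a failed system. A peer that was already in jeopardy may only have
    lost its last link, so failover is withheld to protect the data.
    """
    return state_before_loss == "normal"

if __name__ == "__main__":
    for links in (2, 1):
        before = link_state(links)
        print(before, "-> lost, failover allowed:", failover_allowed(before))
```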
------------------------------------------------------------------------------------------------
Split-brain syndrome
Split-brain syndrome occurs when all the LLT links fail simultaneously. The
nodes in the cluster cannot tell whether this is a system failure or an
interconnect failure. The cluster splits into mini-clusters, and each
mini-cluster thinks that it is the only one active at the moment and tries to
start the service groups of the other mini-cluster, which it believes is down.
This may lead to simultaneous access to storage and can cause data corruption.
------------------------------------------------------------------------------------------------
How to prevent these syndromes
I/O fencing
VCS implements an I/O fencing mechanism to avoid a possible split-brain
condition. It ensures data integrity and data protection. The I/O fencing
driver uses SCSI-3 PGR (Persistent Group Reservation) to fence off the data in
case of a possible split-brain scenario.
Coordinator disks
Coordinator disks store the key of each host, which is used to determine which
node stays in the cluster in a possible split-brain scenario.
In a split-brain scenario, the fencing driver uses the coordinator disks to
ensure that only one mini-cluster survives. When the heartbeats fail, both
nodes try to reach the coordinator disks. The node that reaches the coordinator
disks first starts the services and becomes active.
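
The race for the coordinator disks can be sketched as follows. This is an illustrative Python model, not the fencing driver: fencing_race is a hypothetical function, and the rule that the winner must reach a majority of the disks first is an assumption added here, since the text above only says that the first node to reach the disks wins.

```python
from collections import Counter

def fencing_race(arrivals_per_disk):
    """Pick the surviving node after all heartbeats are lost.

    arrivals_per_disk -- one list per coordinator disk, giving the nodes
                         in the order they reached that disk.
    Returns the node that got to a majority of the disks first (an
    assumed rule), or None when no node holds a majority.
    """
    first_arrivals = Counter(order[0] for order in arrivals_per_disk if order)
    if not first_arrivals:
        return None
    node, disks_won = first_arrivals.most_common(1)[0]
    if disks_won > len(arrivals_per_disk) // 2:
        return node
    return None

if __name__ == "__main__":
    # Three coordinator disks; nodeA reaches two of them first.
    race = [["nodeA", "nodeB"], ["nodeA", "nodeB"], ["nodeB", "nodeA"]]
    print("survivor:", fencing_race(race))
```

Using an odd number of disks in this model means two competing nodes can never tie.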
------------------------------------------------------------------------------------------------
Auto start policy
The auto start policy kicks in whenever hastart is run.
The auto start list comes into the picture for failover service groups.
The system list and the auto start list are considered the same for a parallel SG.
Auto start policy: Three types (Order, Priority, and Load)
Order - follows the auto start list from left to right.
Priority - the system list holds the servers with their priorities (the lower
the number, the higher the priority).
Load - very rarely used, as it is based on a static load (not a dynamic one).
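
The three policies can be compared side by side with a short Python sketch. It is a simplified model, not VCS code: pick_autostart_system and its parameters are hypothetical, and treating Load as "pick the system with the most spare capacity" is an assumption.

```python
def pick_autostart_system(policy, auto_start_list, system_list, running,
                          capacity=None):
    """Choose the system on which a failover service group should start.

    auto_start_list -- system names in their configured order
    system_list     -- dict of system name -> priority (lower wins)
    running         -- set of systems that are currently up
    capacity        -- dict of system name -> spare capacity (Load only)
    """
    candidates = [s for s in auto_start_list if s in running]
    if not candidates:
        return None
    if policy == "Order":
        return candidates[0]                      # left to right
    if policy == "Priority":
        return min(candidates, key=lambda s: system_list[s])
    if policy == "Load":
        if capacity is None:
            raise ValueError("Load policy needs capacity figures")
        return max(candidates, key=lambda s: capacity[s])
    raise ValueError(f"unknown policy: {policy}")

if __name__ == "__main__":
    auto_start = ["sys2", "sys1", "sys3"]
    priorities = {"sys1": 0, "sys2": 1, "sys3": 2}
    up = {"sys1", "sys2", "sys3"}
    spare = {"sys1": 10, "sys2": 20, "sys3": 70}
    for policy in ("Order", "Priority", "Load"):
        print(policy, "->",
              pick_autostart_system(policy, auto_start, priorities, up, spare))
```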
------------------------------------------------------------------------------------------------
Application agent
The Application agent gives protection to the application.
Attributes:
Start program: executable/script to start the application
Stop program: executable/script to stop the application
Monitor program: path to a program/script that monitors the state of the
application once it has started
Monitor processes: list of processes to monitor, as they appear in the output
of the ps -ef command
PID files: some applications create files containing their process ID when they
start up; the agent can use these files to monitor them.
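
PID-file monitoring can be sketched as below. This is a rough Python illustration of the idea for Unix-like systems, not the Application agent's real monitor routine: pid_file_online and the example path are hypothetical.

```python
import os

def pid_file_online(pid_file):
    """Return True if the process named in `pid_file` is still running."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False              # file missing, unreadable, or not a number
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)           # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False              # no such process
    except PermissionError:
        return True               # process exists but belongs to another user
    return True

if __name__ == "__main__":
    # Hypothetical path, used only to show the call.
    print(pid_file_online("/var/run/myapp.pid"))
```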