Saturday, November 30, 2019

How to start / stop / remove the wpar?




To list WPARs:           # lswpar
To start a WPAR:         # startwpar <wpar_name>   (or)   # startwpar -v <wpar_name>
To shut down a WPAR:     # stopwpar -F <wpar_name>   (or)   # stopwpar -Fv <wpar_name>
To reboot a WPAR:        # stopwpar -F -r <wpar_name>
To remove a WPAR:        # rmwpar <wpar_name>
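
For reference, an illustrative lswpar listing and a stop/start cycle (the WPAR name testwpar is made up, and the exact columns can vary slightly by AIX level):

# lswpar
Name      State  Type  Hostname  Directory        RootVG WPAR
-------------------------------------------------------------
testwpar  A      S     testwpar  /wpars/testwpar  no

# stopwpar -Fv testwpar
# startwpar -v testwpar

In the State column, "A" means Active and "D" means Defined (stopped).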



How to identify the Global Environment in AIX?


Execute the "uname -W" command on the server; if the output is zero (0), you are in the Global Environment.


root@testserver:/ # uname -W
0
root@testserver:/ #


Note: if a non-zero value is displayed, you are inside a WPAR (the value shown is that WPAR's ID).


How to differentiate Global Environment processes from WPAR processes?



ps -ef@                -> displays all processes, both Global Environment and WPAR processes

ps -ef@ Global         -> displays only the Global Environment processes

ps -ef@ <wpar_name>    -> displays only that WPAR's processes
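
An illustrative (made-up) snippet of the extra WPAR column; PIDs, times, and the WPAR name are examples only:

# ps -ef@
    WPAR     UID     PID    PPID   C    STIME    TTY  TIME CMD
  Global    root       1       0   0   Nov 01      -  0:05 /etc/init
testwpar    root  262352  180414   0   Nov 01      -  0:00 /usr/sbin/srcmstr
(output trimmed)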


How to back up a Versioned WPAR (VWPAR) in AIX?


The below command can be used to take a backup of a versioned WPAR (VWPAR) in AIX:


savewpar -f /vwp01/<vwpar_name>/backup/<vwpar_name>.vwbck -i -e -x <vwpar_name>

Consider <vwpar_name> as testvwpar:


#savewpar -f /vwp01/testvwpar/backup/testvwpar.vwbck -i -e -x testvwpar


creating list of files to back up.

Backing up 69591 files ......

69591 of 69591 files (100%)
0512-038 savewpar: Backup completed successfully.
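
If the versioned WPAR ever needs to be restored from this image, the restwpar command is used; a minimal hedged sketch (check the restwpar man page on your AIX level for the flags your scenario needs, for example to rename or recreate the WPAR):

# restwpar -f /vwp01/testvwpar/backup/testvwpar.vwbck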



What are the limitations of WPAR and VWPAR?


Limitations of WPAR:

  1. EtherChannel is not available inside a WPAR.
  2. AutoFS must not be used in a WPAR because it might prevent the WPAR from stopping cleanly.
  3. Kernel tuning is not available within a WPAR.
  4. Resource control: up to 8192 resource-controlled workload partitions are supported, depending on the Global Environment's memory and disk space resources.


Limitations of VWPAR:

  1. File systems cannot be shared with other WPARs.
  2. Adapters cannot be exported to a versioned WPAR.


What is the configuration file for the VWPAR?

What is the configuration file for the VWPAR?


Configuration file:   /vwp01/<vwpar_name>/etc

Location of devmap:  ls -l /etc/wpars/devmap


How to find out the EMC LUN details in AIX?


We can easily find the EMC LUN details using the inq utility in AIX, as shown below.

       cd /usr/lpp/E*/S*/bin
       ./inq.aix64_51
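
The output (trimmed and purely illustrative here; device names, serial numbers, and sizes are made up) lists each device with its vendor, product, revision, serial number, and capacity:

# ./inq.aix64_51
DEVICE           :VEND    :PROD            :REV   :SER NUM    :CAP(kb)
------------------------------------------------------------------------
/dev/rhdisk2     :EMC     :SYMMETRIX       :5876  :1234567000 :104857600
/dev/rhdisk3     :EMC     :SYMMETRIX       :5876  :1234568000 :104857600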


Saturday, August 3, 2019

Basic concepts of VCS


What is Veritas Cluster Server?

VCS (Veritas Cluster Server) is high-availability cluster software developed by Veritas Technologies and available for UNIX, Linux, and Windows. It provides application clustering capabilities to systems running applications and databases.
 ------------------------------------------------------------------------------------------------

VCS Terminologies

Service Groups: A service group is a container that holds a set of resources working together to make application services available to clients. VCS performs operations on resources, such as starting/stopping/restarting/monitoring, at the service group level.
 ------------------------------------------------------------------------------------------------

SG types

Failover:   The service group runs on only one system at any one time (one node is active and the other nodes are passive).

Parallel:   The service group can run simultaneously on more than one system at any time.

Hybrid:     A hybrid service group is a combination of failover and parallel service groups, introduced in VCS 4.0.

------------------------------------------------------------------------------------------------


Resources

Resources are hardware or software objects required to bring up the services; VCS controls these resources by starting/stopping/monitoring them.

Two types of resources: 

Non-persistent: Can be controlled by VCS

Persistent: Cannot be controlled by VCS (e.g. NIC cards). A persistent resource cannot be a parent resource.

Resource dependencies: Determine the sequence in which resources are brought online/offline.

Resource types: Define the characteristics of a class of resources (e.g. NIC is the network card resource type, IP is the IP address resource type, Mount is the mount point resource type).

 ------------------------------------------------------------------------------------------------

VCS concepts and components:


LLT - Low Latency Transport
GAB - Group membership services and atomic broadcast
HAD - (High Availability Daemon)


LLT

  - LLT is a very low-level, Layer 2 protocol.
  - LLT is the transport mechanism used by VCS to get information from one node to another.
  - It is the very first thing that comes up during VCS startup.
  - Private and public links: the dedicated interconnect links between cluster nodes are the private links (hipri), and a link over the public network is a low-priority link (lowpri).
  - A maximum of 8 links is supported.

It has two primary functions:

1) Heartbeats:  LLT constantly heartbeats, sending a small packet from each node to all the other nodes, essentially saying "hey, I am still here".

2) Cluster communication traffic:  If someone makes a change to the VCS configuration on one node, all the other nodes need to know about it; LLT carries that information across the nodes.
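
As a quick, hedged illustration (node names, the cluster ID, adapter names, and MAC addresses below are made up, and the exact device paths in /etc/llttab vary by platform and adapter), the LLT links are defined in /etc/llttab and their health can be checked with lltstat:

# cat /etc/llttab
set-node node01
set-cluster 100
link en1 /dev/dlpi/en:1 - ether - -
link en2 /dev/dlpi/en:2 - ether - -
link-lowpri en0 /dev/dlpi/en:0 - ether - -

# lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
   * 0 node01            OPEN
                                  en1   UP      XX:XX:XX:XX:0E:34
                                  en2   UP      XX:XX:XX:XX:0E:38
     1 node02            OPEN
                                  en1   UP      XX:XX:XX:XX:D1:F2
                                  en2   UP      XX:XX:XX:XX:D1:F6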

 ------------------------------------------------------------------------------------------------

GAB  (Group membership services and atomic broadcast)

GAB is the second key component that comes up during VCS startup.

GAB is responsible for maintaining the overall cluster membership. Heartbeats are used to determine whether a system is an active member of the cluster, or is joining or leaving the cluster.

Atomic broadcast: Cluster configuration and status information is distributed dynamically to all systems within the cluster using GAB's atomic broadcast feature. (Atomic means that either all systems receive the update, or the update is rolled back if any system fails to receive it.)

Cluster membership: GAB manages the cluster membership (if node A crashes, LLT immediately detects it, and GAB is responsible for re-forming the cluster with the remaining nodes).

Cluster communication: GAB is the mechanism through which one node talks to the other nodes, using the GAB ports.

GAB ports: GAB membership, I/O fencing, and HAD each communicate over their own GAB port (ports a, b, and h respectively); if a configuration change is made on node A, the rest of the nodes learn about it over these ports.
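
These ports can be seen with gabconfig; an illustrative (made-up) two-node example, where "membership 01" means nodes 0 and 1 are both members:

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   a36e0003 membership 01
Port b gen   a36e0006 membership 01
Port h gen   fd570002 membership 01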


VCS seed: Only seeded nodes can run VCS. VCS does not start service groups on a node until it is seeded.

Manual seed: It is possible that one of the nodes does not come up when all the other nodes in the cluster are started, because of the "minimum seed requirement" safety enforced by GAB. In this situation, human intervention is required to seed the mini-cluster manually, after confirming that the missing node is in fact not running its own mini-cluster. To manually seed the GAB membership, run:  # gabconfig -cx

 ------------------------------------------------------------------------------------------------

HAD

HAD maintains the cluster state information and tracks all changes to the cluster configuration and resource status by communicating with GAB.

HAD is monitored by the hashadow daemon. If HAD fails, hashadow attempts to restart it; likewise, if the hashadow daemon dies, HAD restarts it.

HAD uses the main.cf file to build the cluster configuration in memory, and it is also responsible for keeping that in-memory configuration updated.
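
The overall cluster and service group state that HAD maintains can be checked with hastatus; the output below is illustrative only (node and group names are made up) for a two-node cluster with one failover service group:

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen
A  node01               RUNNING              0
A  node02               RUNNING              0

-- GROUP STATE
-- Group           System          Probed     AutoDisabled    State
B  appSG           node01          Y          N               ONLINE
B  appSG           node02          Y          N               OFFLINE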

 ------------------------------------------------------------------------------------------------

VCS architecture for LLT/GAB/HAD

  - Agents monitor the resources on each system and provide status to HAD on the local system.
  - HAD on each system sends status information to GAB.
  - GAB broadcasts configuration information to all cluster members.
  - LLT transports all cluster communications to all cluster nodes.
  - HAD on each node takes corrective action, such as failover, when necessary.

 ------------------------------------------------------------------------------------------------


VCS syndromes

Jeopardy syndrome: 

     If a node is running with only one heartbeat link, that node is in the jeopardy state. VCS does not restart that node's applications on another node while it is running with only one heartbeat. (Disabling failover in this way is a safety mechanism that prevents data corruption.)

 ------------------------------------------------------------------------------------------------

Split-brain syndrome

Split-brain syndrome occurs when all the LLT links fail simultaneously.

The nodes in the cluster cannot tell whether it is a system failure or an interconnect failure. Mini-clusters are formed, and each mini-cluster believes it is the only active cluster and tries to start the service groups that it thinks are down on the other mini-cluster; the other mini-cluster does the same. This can lead to simultaneous access to the storage and cause data corruption.

 ------------------------------------------------------------------------------------------------

How to prevent these syndromes

I/O fencing

VCS implements an I/O fencing mechanism to avoid a possible split-brain condition and to ensure data integrity and data protection. The I/O fencing driver uses SCSI-3 Persistent Group Reservations (PGR) to fence off the data in a possible split-brain scenario.

Coordinator disks

Coordinator disks store the registration key of each host and are used to determine which node stays in the cluster in a possible split-brain scenario.

In a split-brain scenario, the fencing driver uses the coordinator disks to ensure that only one mini-cluster survives: when the heartbeats fail, both sides race for the coordinator disks, and the side that wins the race keeps running the services and remains active.
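
A quick way to confirm that fencing is configured and which nodes are registered is vxfenadm; the output below is trimmed and illustrative only (node names are made up, and the exact fields vary by VCS version):

# vxfenadm -d

I/O Fencing Cluster Information:
================================
 Fencing Protocol Version: 201
 Fencing Mode: SCSI3
 Fencing SCSI3 Disk Policy: dmp
 Cluster Members:
        * 0 (node01)
          1 (node02)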


 ------------------------------------------------------------------------------------------------

Auto start policy

The auto-start policy kicks in whenever hastart is run (initiated).
The auto-start list comes into the picture for failover service groups.
For a parallel SG, the system list and the auto-start list are effectively the same.

Auto-start policy: three types (Order, Priority, and Load); see the main.cf sketch after this list.

Order - follows the auto-start list from left to right.

Priority - the system list assigns each server a priority (the lower the number, the higher the priority).

Load - very rarely used, as it is based on a static load value (not a dynamic one).
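
A minimal, hedged main.cf sketch showing where these attributes live (the group name appSG and the node names node01/node02 are hypothetical; only the auto-start related attributes are shown):

group appSG (
        SystemList = { node01 = 0, node02 = 1 }
        AutoStartList = { node01, node02 }
        AutoStartPolicy = Order
        )

With Order, VCS walks the AutoStartList from left to right; with Priority, it would instead pick the available system that has the lowest SystemList value.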

 ------------------------------------------------------------------------------------------------

Application agent


The Application agent provides protection for a generic application (see the main.cf sketch after the attribute list below).

Attributes:
Start program: executable/script to start the application

Stop program: executable/script to stop the application

Monitor program: path to a program/script that monitors the state of the application.

Monitor processes: list of processes that must be running, checked against the ps -ef output.
PID files: some applications create files containing their process ID when they start; the agent can monitor those PIDs.
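
A minimal, hedged sketch of an Application resource in main.cf (the resource name, paths, and process name are hypothetical; only the attributes described above are shown):

Application app_res (
        StartProgram = "/opt/myapp/bin/start.sh"
        StopProgram = "/opt/myapp/bin/stop.sh"
        MonitorProcesses = { "/opt/myapp/bin/myappd" }
        PidFiles = { "/var/run/myapp.pid" }
        )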








Thursday, April 25, 2019

0514-086 Cannot perform the requested function because the specified device is not supported on this platform in aix?



What happened:


We had a disk failure on the server, so we raised a case with the vendor, and they confirmed the disk was faulty. We then tried to replace the disk; however, after the disk was replaced we got the below error (meaning we could not configure the newly replaced disk).


Error:


# cfgmgr

0514-086 Cannot perform the requested function because the specified device is not supported on this platform


Root cause / Fix :


The field engineer had tried to replace the disk with a 15K drive, which is not supported on this server. We later replaced it with a 10K drive, ran cfgmgr again, and the server detected the disk without any issue.



Saturday, March 16, 2019

How to migrate an AIX server from AIX 6.1 to 7.1 easily?




There are different ways to migrate your system from one AIX version to another AIX version:


Migration by using NIM / CD-DVD / mksysb / alternate disk migration.


Sometimes you will be in a position where you need to migrate a NIM master from AIX 6.1 to AIX 7.1; migrating a NIM server using another NIM server can be difficult.


Sometimes your server sits behind a firewall (in a DMZ) where you cannot enable rsh/ssh/ping to it from outside; in that case the method described below will help you migrate it easily.


The steps below make it really easy to perform the AIX migration from 6.1 to 7.1, with very few prerequisites.


Take a mksysb of the target server
server1# mksysb -ieX /backup/server1_aix61.mksysb



SCP the mksysb over to your NIM master and define it as a NIM mksysb resource
server1# scp /backup/server1_aix61.mksysb root@nimserver:/backupfs/.



Make sure you have the lpp_source and SPOT built from the base AIX ISO images at the level you want to upgrade the system to. If not, create them first.

nimserver# lsnim -l aix71_TL05_SP02_spot
nimserver# lsnim -l aix71_TL05_SP02_lppsource



Define the mksysb resource using the mksysb image taken from the LPAR:

nimserver# nim -o define -t 'mksysb' -a server=master -a location=/backupfs/server1_aix61.mksysb server1_AIX61_mksysb



Use the 'nimadm' command to migrate the existing mksysb to a new, upgraded mksysb image:

nimadm -s <spot> -l <lpp_source> -j <volume group for cache> -Y -T <old_mksysb_resource> -O <new mksysb resource file pathname> -N <new_mksysb_resource>

nimserver# nimadm -s aix71_TL05_SP02_spot -l aix71_TL05_SP02_lppsource -j nimadmvg -Y -T server1_AIX61_mksysb -O /backupfs/server1_7.1_mksysb -N server1_AIX71_mksysb




Once that completes, take the new upgraded mksysb file and scp it back to the LPAR it was taken from:

nimserver# scp /backupfs/server1_7.1_mksysb root@server1:/backup/



Install the "bos.alt_disk_install.rte" and "bos.alt_disk_install.boot_images" filesets on that system, at the same TL/SP level as the level you upgraded the mksysb to:

server1# mount nimserver:/aix71_TL05_SP02_lppsource  /mnt    -> where we hold the AIX 7.1 package

server1# smitty installp  -> install the "bos.alt_disk_install.rte" and "bos.alt_disk_install.boot_images"
(or)
          server1# installp -acgXYd /mnt bos.alt_disk_install.rte bos.alt_disk_install.boot_images

server1# alt_disk_mksysb -m /backup/server1_7.1_mksysb -k -d hdisk2

Note: -k specifies that the mksysb device configuration be kept (formerly the ALT_KEEP_MDEV variable). This option lets the customized network-related settings from the old image be carried over with the mksysb to the new image.

If you do not use the "-k" option, there is a chance of losing network connectivity after the reboot.
           


Boot up from the upgraded disk

Server1# bootlist -m normal -o hdisk2
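
Then reboot the LPAR so it comes up from the upgraded disk (a typical command, assuming you can take the outage at this point):

server1# shutdown -Fr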


The first reboot will take a long time because the server is booting from the upgraded OS, so please be patient. Once the server has rebooted, check the OS version and perform the other post-checks.

Server1# oslevel -s; lppchk -vm3; instfix -i|egrep -i 'tl|sp'; oslevel -rl7100-05

Server1# oslevel -s
7100.05.02.1134



Now your server has been successfully migrated to AIX 7.1.