Wednesday, September 21, 2011

Procedure to replace an HBA card on AIX MPIO

HBA Overview:

HBA is short for Host Bus Adapter. It is an interface card that connects a host to your SAN or tape devices: an electronic circuit board that handles input/output operations and provides the physical connectivity between a server and its storage or tape drives.

At present HBAs are most frequently used as Fibre Channel interface cards. Every HBA (each port, on a multi-port card) has a unique number called a World Wide Name (WWN).

Sometimes we end up with a bad HBA that has to be replaced with a new one. This post gives an idea of how to replace a bad HBA. A failing HBA can usually be identified by checking errpt.

The procedure here is for MPIO; it will differ for EMC PowerPath and Veritas DMP.

Replacing HBA in MPIO pathing:
1. Before replacing any part, take a backup of your system configuration, especially the hardware configuration. Here we need to back up the configuration of the Fibre Channel adapter, the fscsi devices connected to it, and all the devices connected to those.


Make a note of your device WWNs for rezoning after the replacement:
iostat -a | grep -i fcs; lsdev -C | grep -i ; lsattr -El ; lscfg -vl ;
lsdev -C | grep -i ; lsattr -El ; lsattr -vl ; lspath | grep -i ...
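To keep these outputs together, the commands can be wrapped in a small function that writes everything to one file before the card is pulled. This is only a sketch: the function name `save_hba_config` and the output file are hypothetical, and you pass in your own adapter and fscsi names.

```shell
# Hypothetical helper: save HBA-related configuration to one file.
# $1 = output file, $2 = adapter (e.g. fcs5), $3 = fscsi device (e.g. fscsi5)
save_hba_config() {
    out=$1
    fcs=$2
    fscsi=$3
    {
        echo "== iostat -a (fcs adapters) =="; iostat -a | grep -i fcs
        echo "== lsattr -El $fcs ==";          lsattr -El "$fcs"
        echo "== lscfg -vl $fcs ==";           lscfg -vl "$fcs"
        echo "== lsattr -El $fscsi ==";        lsattr -El "$fscsi"
        echo "== lspath for $fscsi ==";        lspath | grep -i "$fscsi"
    } > "$out"
}
```

Usage: `save_hba_config /tmp/hba_before.txt fcs5 fscsi5`. The `lscfg -vl` section holds the Network Address (WWN) you will need for rezoning.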

2. Find the parent device of the bad fscsi device with the following command. Suppose fscsi5 is the bad one:
#lsdev -C -l fscsi5 -F parent
fcs5
This means the parent device of fscsi5 is fcs5.
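If you are not sure which fscsi device sits on which card, the same `lsdev -F parent` query can be run for every fscsi device. A minimal sketch (the function name is hypothetical); it only uses `lsdev -C -F name` and `lsdev -C -l <dev> -F parent`:

```shell
# Hypothetical helper: print each fscsi device with its parent fcs adapter.
list_fscsi_parents() {
    for dev in $(lsdev -C -F name | grep '^fscsi'); do
        printf '%s -> %s\n' "$dev" "$(lsdev -C -l "$dev" -F parent)"
    done
}
```

On the system in this example it would print a line such as `fscsi5 -> fcs5` for each protocol device.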


3. Now check whether fcs5 is a dual-port or a single-port HBA:
#lsslot -c pci | grep -i fcs5
U1.5-P1-I1 PCI 64 bit, 66MHz, 3.3 volt slot fcs5
In this case the HBA is single-port, so nothing apart from fscsi5 has to be brought down. If it is a dual-port HBA, the other FC port must be brought down as well.
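The dual-port check can be scripted by counting the fcs names on the slot's line. This sketch assumes, as in the output above, that lsslot lists every port of a card as a separate word on that slot's line; the function name is hypothetical.

```shell
# Hypothetical helper: count fcs ports that lsslot reports in the same slot
# as the given adapter. Assumes all ports of a card appear on one lsslot line.
ports_in_slot() {
    lsslot -c pci | grep -w "$1" | tr ' ' '\n' | grep -c '^fcs[0-9][0-9]*$'
}
```

A result of 1 means a single-port card; anything higher means the other port has to be handled too.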

4. If needed, verify the parent device of fcs5 from the command line:
#lsdev -C -l fcs5 -F parent
pci37
Here the parent device of fcs5 is pci37, the PCI device for that slot.
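Steps 2 and 4 are the same query applied twice, so they can be combined into a walk up the device tree that stops when a device reports no parent. A sketch with a hypothetical function name:

```shell
# Hypothetical helper: follow lsdev "parent" links upward from a device.
parent_chain() {
    dev=$1
    chain=$dev
    while parent=$(lsdev -C -l "$dev" -F parent) && [ -n "$parent" ]; do
        chain="$chain -> $parent"
        dev=$parent
    done
    echo "$chain"
}
```

For this example it would start with `fscsi5 -> fcs5 -> pci37`, followed by whatever buses sit above pci37 on your system.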


5. Disable and remove the paths connected to the bad fscsi device with a simple script:
#lspath | grep -i fail | while read LINE
do
set -- $LINE
chpath -l $2 -s disable -p $3
done
This disables every failed path that lspath reports. lspath prints the status, device name and parent, so $2 is the hdisk and $3 is the fscsi device.
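Because `grep -i fail` also catches failed paths on other adapters, a slightly safer variant limits lspath to the bad parent with `lspath -p`. This is a sketch: the function name and the `DRY_RUN` switch are hypothetical additions, and by default it only prints the chpath commands.

```shell
# Hypothetical helper: disable failed paths of one fscsi parent only.
# Prints the commands unless DRY_RUN=no is set.
disable_failed_paths() {
    parent=$1
    lspath -p "$parent" | grep -i fail | while read -r status disk pdev; do
        if [ "${DRY_RUN:-yes}" = no ]; then
            chpath -l "$disk" -s disable -p "$pdev"
        else
            echo "chpath -l $disk -s disable -p $pdev"
        fi
    done
}
```

Run `disable_failed_paths fscsi5` first to review the list, then `DRY_RUN=no disable_failed_paths fscsi5` to apply it.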

Now remove the disabled paths (without -d, rmpath leaves them in the Defined state):
#lspath | grep -i disable | while read LINE
do
set -- $LINE
rmpath -l $2 -p $3
done
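The removal loop can be restricted to the same parent in the same way. Again a sketch with a hypothetical function name and `DRY_RUN` switch; it never passes `-d`, so the paths stay defined in the ODM.

```shell
# Hypothetical helper: unconfigure disabled paths of one fscsi parent.
# Prints the commands unless DRY_RUN=no is set; no -d, so ODM entries remain.
remove_disabled_paths() {
    parent=$1
    lspath -p "$parent" | grep -i disable | while read -r status disk pdev; do
        if [ "${DRY_RUN:-yes}" = no ]; then
            rmpath -l "$disk" -p "$pdev"
        else
            echo "rmpath -l $disk -p $pdev"
        fi
    done
}
```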


6. Remove the device from the server. We do not need to remove it from the ODM, because the IBM engineer will do that from diag; if he does not have access to the server, we may need to remove the ODM entry ourselves. Before that, you can run diag against fcs5 to check the HBA status and see what went wrong:
diag ---> Task Selection ---> Run Diagnostics ---> select the appropriate FCS device
#rmdev -Rl fcs5 (removes the device and its children, but keeps the ODM entry on the server)
#rmdev -Rdl fcs5 (also removes the ODM entry)
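To confirm what rmdev actually did, you can read the device state back with `lsdev -F status` right afterwards. A minimal sketch; the function name and the `delete` argument are hypothetical.

```shell
# Hypothetical helper: unconfigure an adapter and its children, then report
# its state. Pass "delete" as $2 to remove the ODM entry as well (-d).
remove_adapter() {
    adapter=$1
    if [ "$2" = delete ]; then
        rmdev -R -d -l "$adapter"
    else
        rmdev -R -l "$adapter"
    fi
    state=$(lsdev -C -l "$adapter" -F status)
    echo "${state:-not in ODM}"
}
```

After `remove_adapter fcs5` you would expect `Defined`; after `remove_adapter fcs5 delete` the device should no longer be listed.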


7. Once you have deleted the device, inform your IBM engineer so he can install the new HBA. He will replace it and provide the new WWN for rezoning; you can also read the WWN yourself after running cfgmgr.


8. Before running cfgmgr or enabling paths after the HBA replacement, change the fscsi settings as below:
#chdev -a dyntrk=yes -l fscsi5
#chdev -a fc_err_recov=fast_fail -l fscsi5
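Both attributes can be set in one chdev call and then read back with `lsattr -a <attr> -F value`. This is a sketch with a hypothetical function name; if the device is busy, chdev may refuse the change, and `-P` (record in the ODM only, applied at the next reboot) would be needed instead.

```shell
# Hypothetical helper: set dyntrk and fc_err_recov on an fscsi device,
# then print the values lsattr reports.
set_fscsi_attrs() {
    fscsi=$1
    chdev -l "$fscsi" -a dyntrk=yes -a fc_err_recov=fast_fail
    for attr in dyntrk fc_err_recov; do
        printf '%s=%s\n' "$attr" "$(lsattr -El "$fscsi" -a "$attr" -F value)"
    done
}
```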


9. Run cfgmgr against a parent device of fcs5/fscsi5. Here I am running it against pci37, the parent of fcs5:
#cfgmgr -vl pci37
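After cfgmgr, a quick check that the new adapter and its fscsi device came up as Available saves time before you chase paths. A sketch with a hypothetical function name; it returns non-zero if any device is not Available.

```shell
# Hypothetical helper: report each device's state; fail if any is not Available.
check_available() {
    rc=0
    for dev in "$@"; do
        state=$(lsdev -C -l "$dev" -F status)
        echo "$dev: ${state:-missing}"
        [ "$state" = Available ] || rc=1
    done
    return $rc
}
```

Usage: `check_available fcs5 fscsi5`.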


10. Get the new WWN and ask your storage team to rezone your SAN (if you have not received it from your engineer, run the command below):
#lscfg -vl fcs5 | egrep "Net|FRU|Part"
Here the Network Address value is your new WWN.
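Storage teams often want the WWN in colon-separated form. A small filter can pull the Network Address out of the lscfg output and reformat it. This sketch assumes the value follows the run of dots on the `Network Address` line, as lscfg prints it; the function name is hypothetical.

```shell
# Hypothetical filter: read lscfg -vl output on stdin, print the Network
# Address as lower-case hex pairs separated by colons.
wwpn_from_lscfg() {
    sed -n 's/^ *Network Address\.*//p' | head -n 1 |
        tr 'A-F' 'a-f' | sed 's/../&:/g; s/:$//'
}
```

Usage: `lscfg -vl fcs5 | wwpn_from_lscfg` turns `10000000C9A1B2C3` into `10:00:00:00:c9:a1:b2:c3`.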


11. Check lspath; if it still shows failed or disabled paths, enable them:
#lspath | grep -i fail | while read LINE
do
set -- $LINE
chpath -l $2 -s enable -p $3
done

or

#lspath | grep -i disable | while read LINE
do
set -- $LINE
chpath -l $2 -s enable -p $3
done
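To confirm the result, counting paths per state for the replaced adapter shows at a glance whether anything is still Failed or Disabled. A sketch with a hypothetical function name, assuming lspath's default first column is the path status:

```shell
# Hypothetical helper: count paths per status for one fscsi parent.
path_summary() {
    lspath -p "$1" | awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Ideally `path_summary fscsi5` prints only an `Enabled` line.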


Check errpt: you should see the repair action logged on your server. Keep monitoring errpt for a couple of hours.
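While monitoring, it helps to filter errpt down to the devices you touched. This sketch assumes the default errpt summary layout, where RESOURCE_NAME is the fifth column after a header line; the function name is hypothetical.

```shell
# Hypothetical filter: read errpt summary output on stdin and keep only
# entries whose RESOURCE_NAME (5th column) is one of the given devices.
errpt_for() {
    awk -v devs="$*" '
        BEGIN { n = split(devs, d, " "); for (i = 1; i <= n; i++) want[d[i]] = 1 }
        NR > 1 && ($5 in want)'
}
```

Usage: `errpt | errpt_for fcs5 fscsi5`.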
That's it, the HBA replacement is done.


Monday, September 5, 2011

Moving Memory from One LPAR to Another

This post gives the procedure to move memory from one LPAR to another. It takes advantage of the hardware's dynamic capabilities, provided the LPARs support dynamic operations (DLPAR).
Using DLPAR we can move physical resources such as memory, CPU and I/O devices from one LPAR to another.

Here we look at how to move one of those physical resources (memory) from one LPAR to another.
The memory settings (min, desired, max, in MB) are 1024, 10240, 10240 for lpar1 and 1024, 6144, 6144 for lpar2. We are moving 2048 MB from lpar1 to lpar2.

Note: Both LPARs must be on the same hardware, i.e. on the same managed system.

Step 1: Save the following outputs on both LPARs for backup purposes:
lparstat -i, lsattr -El sys0 -a realmem, ifconfig -a, df -m, lsvg, lsvg -o, and a backup of the LPAR profiles on the HMC.
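The LPAR-side outputs can be captured in one go with a short loop. A sketch only: the function name and the default file path are hypothetical, and the HMC profile backup still has to be taken on the HMC.

```shell
# Hypothetical helper: save the Step 1 outputs to one file ($1, optional).
save_lpar_config() {
    out=${1:-/tmp/$(hostname)_before_mem_move.txt}
    for cmd in "lparstat -i" "lsattr -El sys0 -a realmem" "ifconfig -a" \
               "df -m" "lsvg" "lsvg -o"; do
        echo "== $cmd =="
        $cmd    # unquoted on purpose so the string splits into command + args
    done > "$out" 2>&1
    echo "$out"
}
```

Run it on both LPARs before and after the move and compare the realmem lines.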

Step 2: Verify that both LPARs are capable of DLPAR operations, using the command below.
From the HMC: #/opt/csm/bin/lsnodes -a status
lpar1 1
lpar2 1
dbprd 0
Here lpar1 and lpar2 are both DLPAR capable and dbprd is not.
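With many partitions, a one-line filter picks out the ones that are not DLPAR capable. This assumes the two-column `name status` format shown above; the function name is hypothetical.

```shell
# Hypothetical filter: read "lsnodes -a status" style output on stdin and
# print the partitions whose status is not 1 (not DLPAR capable).
not_dlpar_capable() {
    awk 'NF >= 2 && $2 != 1 { print $1 }'
}
```

Usage: `/opt/csm/bin/lsnodes -a status | not_dlpar_capable` would print `dbprd` for the output above.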

(OR)
#lspartition -dlpar
If you are facing any issues with DLPAR, verify that the following filesets are installed on the LPARs:
#lslpp -l rsct.core*
#lslpp -l csm.client
and verify that the following subsystems are active:
#lssrc -a | grep rsct
Subsystem         Group      PID      Status
ctrmc             rsct       21044    Active
ctcas             rsct       21045    Active
IBM.CSMAgentRM    rsct_rm    21045    Active
IBM.ServiceRM     rsct_rm    11836    Active
IBM.DRM           rsct_rm    20011    Active
IBM.HostRM        rsct_rm    20012    Active
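Rather than eyeballing the table, you can have a filter report which of these subsystems are missing or not active. A sketch with a hypothetical function name; it reads `lssrc -a` output on stdin and treats the last column as the status, which also covers inoperative subsystems that have no PID.

```shell
# Hypothetical filter: print required RSCT subsystems that are not active.
missing_rsct_subsystems() {
    awk 'BEGIN {
             split("ctrmc ctcas IBM.CSMAgentRM IBM.ServiceRM IBM.DRM IBM.HostRM", r, " ")
             for (i in r) need[r[i]] = 1
         }
         ($1 in need) && tolower($NF) == "active" { ok[$1] = 1 }
         END { for (s in need) if (!(s in ok)) print s }' | sort
}
```

Usage: `lssrc -a | missing_rsct_subsystems`; empty output means all of them are active.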

Step 3: Change the memory settings in the LPAR profiles. This can also be done from the HMC GUI.
#chsyscfg -r prof -m -i "name=lpar1_normal,lpar_name=lpar1,min_mem=1024,desired_mem=8192,max_mem=8192"
Here we have reduced the desired and max memory of lpar1 by 2048 MB; the same amount has to be added to the lpar2 profile:
#chsyscfg -r prof -m -i "name=lpar2_normal,lpar_name=lpar2,min_mem=1024,desired_mem=8192,max_mem=8192"
This adds 2048 MB to the lpar2 profile (6144 + 2048 = 8192).
Note: This has to be done for every profile defined for each LPAR.
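Since the same change has to be repeated for every profile, it is easy to get one number wrong. A small helper can compute the new desired and max values and print the `-i` attribute string. A sketch only: the function name is hypothetical, and the delta is in MB (negative for the LPAR giving memory away).

```shell
# Hypothetical helper: print the chsyscfg -i string after adding delta MB
# to desired_mem and max_mem. Args: profile lpar min desired max delta
profile_attrs() {
    prof=$1 lpar=$2 min=$3 desired=$4 max=$5 delta=$6
    echo "name=$prof,lpar_name=$lpar,min_mem=$min,desired_mem=$((desired + delta)),max_mem=$((max + delta))"
}
```

`profile_attrs lpar1_normal lpar1 1024 10240 10240 -2048` reproduces the lpar1 string above, and `profile_attrs lpar2_normal lpar2 1024 6144 6144 2048` the lpar2 one.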

Step 4: Now we are ready to move memory from lpar1 to lpar2. Memory is moved in logical memory blocks (LMBs). To get the LMB size from the HMC:
#lshwres -m -p lpar1 -r mem -F lmb_size
Here the size is 256 MB.

Step 5: We are moving 2048 MB, which is 8 LMBs (2048 / 256 = 8). To move 2048 MB from lpar1 to lpar2:
#chhwres -r mem -m -o m -p lpar1 -t lpar2 -q 8
-r --- resource type
-o --- operation (add/remove/move)
-p --- source partition
-t --- target partition
-q --- quantity of memory, in LMBs
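The LMB arithmetic from Step 5 can be checked with a tiny helper that also refuses amounts that are not a whole number of LMBs. The function name is hypothetical; it follows this post's description of `-q` as a count of LMBs.

```shell
# Hypothetical helper: convert MB to a whole number of LMBs.
# Args: amount_in_MB lmb_size_in_MB
lmb_count() {
    mb=$1 lmb=$2
    if [ $((mb % lmb)) -ne 0 ]; then
        echo "error: $mb MB is not a multiple of the $lmb MB LMB size" >&2
        return 1
    fi
    echo $((mb / lmb))
}
```

`lmb_count 2048 256` gives the 8 used with `-q` above; `lmb_count 1000 256` fails because 1000 MB is not a whole number of 256 MB blocks.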

Note: If your hardware does not support DLPAR operations, change the profiles, shut down the LPARs, and activate them again from the HMC.