Wednesday, September 21, 2011

Procedure to replace HBA card on AIX MPIO

HBA Overview:

HBA is short for Host Bus Adapter. It is an interface card that connects a host to your SAN or tape devices. It is an electronic circuit board that handles input/output operations and the physical connectivity between a server and its storage or tape drives.

At present the term HBA is most frequently used for Fibre Channel interface cards. Every HBA has a unique number called the World Wide Name (WWN).

Sometimes we wind up with a bad HBA and need to replace it with a new one. This post gives an idea of how to replace the bad HBA. A bad HBA can be identified by checking errpt.

The procedure here is defined for MPIO. It will vary for EMC PowerPath and Veritas DMP.

Replacing HBA in MPIO pathing:
1. Before any part replacement, take a backup of your system configuration, especially the hardware configuration. Here we need to record the Fibre Channel (fcs) adapters, the fscsi devices connected to them, and all the devices connected to those.


Make a note of your device WWN for rezoning after the replacement.
iostat -a | grep -i fcs; lsdev -C | grep -i ; lsattr -El ; lscfg -vl ;
lsdev -C | grep -i ; lsattr -El ; lscfg -vl ; lspath | grep -i ...

2. Find the parent device of the bad fscsi device with the following command. Suppose fscsi5 is the bad device.
#lsdev -C -l fscsi5 -F parent
fcs5
This means the parent device of fscsi5 is fcs5.


3. Now verify whether fcs5 is a dual-port or single-port HBA.
#lsslot -c pci |grep -i fcs5
U1.5-P1-I1 PCI 64 bit, 66MHz, 3.3 volt slot fcs5
In this case the HBA is single-port, so there is no need to bring down anything apart from fscsi5. If it is dual-port, we also need to bring down the other FC port.
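As a rough sketch of the same check, the helper below reads `lsslot -c pci` output and prints any other fcs devices that share a slot line with the given adapter. It assumes a multi-port card shows all of its fcs devices on one slot line, which you should confirm on your own system. `slot_siblings` is a hypothetical name, not an AIX command.

```shell
# slot_siblings is a hypothetical helper: reads `lsslot -c pci` output on
# stdin and prints the other fcs devices listed on the same slot line as $1.
# Assumption: a multi-port HBA lists all of its ports on one slot line.
slot_siblings() {
    awk -v a="$1" '{
        found = 0
        # Exact field match, so fcs5 does not also match fcs50.
        for (i = 1; i <= NF; i++) if ($i == a) found = 1
        if (found)
            for (i = 1; i <= NF; i++)
                if ($i ~ /^fcs[0-9]+$/ && $i != a) print $i
    }'
}

# Assumed usage on the AIX host:
#   lsslot -c pci | slot_siblings fcs5
```

Empty output suggests a single-port card. Any name printed is another port that would go down with the card.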

4. If we want, we can find the parent device of fcs5 from the command line.
#lsdev -C -l fcs5 -F parent
pci37
Here the parent device of fcs5 is pci37, on the PCI slot.


5. Disable and remove the paths connected to the bad fscsi device by using a simple script.
#lspath |grep -i fail | while read LINE
do
set -- $LINE
chpath -l $2 -s disable -p $3
done
Here we are disabling all the failed paths from the fscsi device.

Now remove the disabled paths.
#lspath |grep -i disable | while read LINE
do
set -- $LINE
rmpath -l $2 -p $3
done
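The loops above act on every failed path that lspath reports, whichever adapter it belongs to. A more cautious variant, sketched here, filters on a single parent such as fscsi5 and only prints the commands so they can be reviewed before running. `gen_path_cmds` is a hypothetical helper name, and the sketch assumes the three-column lspath output (status, device, parent).

```shell
# gen_path_cmds is a hypothetical helper: reads `lspath` output on stdin
# (columns assumed to be: status device parent) and prints, without running,
# the chpath/rmpath commands for failed paths under the parent given as $1.
gen_path_cmds() {
    awk -v p="$1" '
        tolower($1) ~ /fail/ && $3 == p {
            printf "chpath -l %s -s disable -p %s\n", $2, $3
            printf "rmpath -l %s -p %s\n", $2, $3
        }'
}

# Assumed usage on the AIX host:
#   lspath | gen_path_cmds fscsi5          # review the commands
#   lspath | gen_path_cmds fscsi5 | sh     # then run them
```

Printing first keeps paths on healthy adapters from being disabled by mistake.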


6. Remove the device from the server. We do not need to remove it from the ODM, since the IBM engineer will do that from diag. If he does not have access to the server, we may need to remove the device from the ODM ourselves. Before that, you can run diag on fcs5 to verify the HBA status and see what went wrong.
diag ---> Task Selection ---> Run Diagnostics ---> select the appropriate FCS device
#rmdev -Rl fcs5 (removes the device but keeps its ODM entry on the server)
#rmdev -Rdl fcs5 (removes the device and its ODM entry as well)


7. Once you have deleted the device, you can ask your IBM engineer to replace the HBA. He will replace it and provide the new WWN for rezoning, or you can get it yourself once you run cfgmgr.


8. Before running cfgmgr or enabling paths after the HBA replacement, change the settings for fscsi as below.
#chdev -a dyntrk=yes -l fscsi5
#chdev -a fc_err_recov=fast_fail -l fscsi5
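To confirm that both settings took effect, something like the sketch below can be fed the output of `lsattr -El fscsi5`. `check_fscsi_attrs` is a hypothetical helper name, and the sketch assumes the usual lsattr layout with the attribute name in the first column and its value in the second.

```shell
# check_fscsi_attrs is a hypothetical helper: reads `lsattr -El fscsiN`
# output on stdin (assumed columns: attribute value description ...) and
# returns non-zero unless dyntrk=yes and fc_err_recov=fast_fail.
check_fscsi_attrs() {
    awk '
        $1 == "dyntrk"       { dyn = $2 }
        $1 == "fc_err_recov" { rec = $2 }
        END {
            if (dyn != "yes")       { print "dyntrk is " dyn; bad = 1 }
            if (rec != "fast_fail") { print "fc_err_recov is " rec; bad = 1 }
            exit bad
        }'
}

# Assumed usage on the AIX host:
#   lsattr -El fscsi5 | check_fscsi_attrs && echo "fscsi5 settings OK"
```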


9. Run cfgmgr against a parent device of fcs5 or fscsi5. Here I am running it against pci37, the parent of fcs5.
#cfgmgr -vl pci37


10. Get the WWN and ask your storage team to rezone your SAN (if you have not received it from your engineer, run the command below).
#lscfg -vl fcs5| egrep "Net|FRU|Part"
Here the Net value (Network Address) is your new WWN.
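Storage teams often ask for the WWN as colon-separated byte pairs. The sketch below pulls the Network Address line out of `lscfg -vl` output and reformats it that way. `format_wwn` is a hypothetical helper name, and the WWN in the usage comment is a made-up sample value.

```shell
# format_wwn is a hypothetical helper: reads `lscfg -vl fcsN` output on stdin
# and prints the Network Address as lowercase colon-separated byte pairs.
format_wwn() {
    awk '/Network Address/ {
        sub(/.*Network Address\.*/, "")   # drop the label and its dot leader
        gsub(/[^0-9A-Fa-f]/, "")          # keep hex digits only
        out = ""
        for (i = 1; i <= length($0); i += 2)
            out = out (i > 1 ? ":" : "") substr($0, i, 2)
        print tolower(out)
    }'
}

# Assumed usage on the AIX host:
#   lscfg -vl fcs5 | format_wwn
# A line such as "Network Address.............10000000C9487B04" (sample value)
# would be printed as 10:00:00:00:c9:48:7b:04
```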


11. If lspath still shows failed or disabled paths, enable them.
#lspath |grep -i fail | while read LINE
do
set -- $LINE
chpath -l $2 -s enable -p $3
done

or

#lspath |grep -i disable | while read LINE
do
set -- $LINE
chpath -l $2 -s enable -p $3
done


Verify errpt, where you should see the repair action logged on your server, and monitor errpt for a couple of hours.
That's it. We are done with the HBA replacement.

1 comment:

  1. I have received a "Link Error" on both HBA cards. Both HBA cards are online, and I have checked that the GBIC and fabric are fine.

    The dual-port HBAs are fine:

    U.01.-P1-C4 PCI-X 64 bit, 266MHz slot fcs
    U.01.-P1-C5 PCI-X 64 bit, 266MHz slot fcs

    How do I fix the link error? Please advise.
