I have an MD3000i with dual controllers.
One of my controllers (Slot 0) currently has a loss of communication (the component reporting the problem is "Thermal Sensor"), and all data has failed over to the controller in Slot 1, so some Virtual Disks are not on their preferred path.
All cabling and switches are good. I cannot ping the management Ethernet port, but I can ping both iSCSI RAID ports on the controller. I have not yet had the opportunity to power cycle the controllers, but I anticipate that the controller in Slot 0 may have failed and will need to be replaced.
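For anyone wanting to repeat the reachability check above in a script, here is a minimal Python sketch. It does not send ICMP pings; it attempts a TCP connection to the iSCSI ports instead, which also confirms that the target service is listening (3260 is the standard iSCSI port). The function names are my own, and the addresses in the usage note are placeholders, not my real configuration.

```python
import socket

ISCSI_PORT = 3260  # standard iSCSI target port


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(targets, timeout=2.0):
    """Check every (host, port) pair in targets.

    targets maps a label (hypothetical, chosen by the caller) to a
    (host, port) tuple. Returns a dict of label -> True/False.
    """
    return {
        label: tcp_reachable(host, port, timeout)
        for label, (host, port) in targets.items()
    }
```

Example use, with placeholder addresses: `check_ports({"slot0-iscsi0": ("192.0.2.10", ISCSI_PORT), "slot0-iscsi1": ("192.0.2.11", ISCSI_PORT)})`. A False result only says the TCP connection failed; it does not say why.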
Once I have a full backup of all data on the MD3000i, I intend to stop all I/O and restart the device. If the problem still exists, I then plan on replacing the controller (I have a backup controller on hand). This is the part I want to make sure I get right! The following is what I plan to do, and I want to confirm that 1) these are the correct steps and 2) the sequence I have them in is correct...
Upgrade Firmware:
1. Back up array.
2. Gather support bundle.
3. Stop all I/O to array.
4. Update RAID controller firmware (on the existing, good controller to the latest version; my spare controller has firmware version 07.35.39.64, which is newer than the version I have running on my existing controllers).
5. Update hard drive firmware.
6. Power down server (using MDSM).
7. Power down MD3000i and leave for several minutes.
8. Power up MD3000i.
9. Power up server.
10. Verify the controller firmware and check status.
11. Gather a support bundle.
12. If the controller in Slot 0 is still in a failed state, proceed to Replacing Controller.
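As a sanity check for step 4, the firmware versions can be compared field by field rather than by eye. This is only a sketch: it assumes the dotted version string (such as 07.35.39.64) can be compared numerically from left to right, which I have not confirmed against Dell's versioning rules. The running version in the usage note is a made-up placeholder.

```python
def parse_version(version):
    """Split a dotted firmware version such as '07.35.39.64' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_upgrade(running, spare):
    """Return True if the running firmware is older than the spare's.

    Assumption: fields compare numerically, most significant first.
    """
    return parse_version(running) < parse_version(spare)
```

For example, `needs_upgrade("07.35.22.61", "07.35.39.64")` compares a hypothetical running version against the version on my spare controller.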
Replacing Controller:
1. Place RAID controller in Slot 0 offline.
2. Wait several minutes.
3. Remove RAID controller in Slot 0 and insert replacement.
4. Set static IP addresses on the management and iSCSI ports to the same addresses that were configured on the old controller.
5. Place RAID controller in Slot 0 online.
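For step 4 of the replacement, one way to avoid a typo is to write down the old controller's addresses beforehand and compare them with what is set on the replacement. A minimal sketch follows; the port labels and addresses are placeholders I made up, and the values would have to be entered by hand from MDSM rather than read from the array.

```python
def diff_settings(expected, actual):
    """Compare two {port_label: ip_address} dicts.

    Returns {port_label: (expected_ip, actual_ip)} for every port whose
    address differs or is missing on either side (missing shows as None).
    """
    mismatches = {}
    for port in sorted(set(expected) | set(actual)):
        want = expected.get(port)
        have = actual.get(port)
        if want != have:
            mismatches[port] = (want, have)
    return mismatches


# Placeholder values recorded from the old controller (hypothetical).
old_controller = {
    "management": "192.0.2.5",
    "iscsi0": "192.0.2.10",
    "iscsi1": "192.0.2.11",
}
```

An empty result from `diff_settings(old_controller, new_controller)` would mean every recorded port matches on the replacement.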
Notes: I am managing the controller using MDSM V3.35.G6.45 (I believe I cannot upgrade this to V4.1 or 5 unless I have SCOM or SCE) on a Windows 2008 R2 Server.
Any help or advice on a suitable plan for full recovery will be greatly appreciated! Thank you