LSI 9285-8e 和 Supermicro SC837E26-RJBOD1 重複的機箱 ID 和插槽號
我正在使用連接到 Supermicro 1U 主機中的單個 LSI 9285-8e 卡的 2 個 Supermicro SC837E26-RJBOD1 機箱。每個機箱中有 28 個驅動器,總共 56 個驅動器在 28 個 RAID1 鏡像中。
我遇到的問題是 2 個機箱有重複的插槽(插槽列表兩次,僅從 0 到 27)。所有驅動器還顯示相同的機箱 ID (ID 36)。但是,MegaCLI -encinfo 正確列出了 2 個機箱(ID 36 和 ID 65)。
我的問題是,為什麼會發生這種情況?我是否缺少有效使用 2 個機箱的選項?
這阻止了我重建插槽 11 中出現故障的驅動器,因為我只能將機箱和插槽指定為參數來更換驅動器。當我這樣做時,它選擇了錯誤的插槽 11(設備 ID 46 而不是設備 ID 19)。
適配器 #1 是 LSI 9285-8e,適配器 #0(由於空間限制,我將其刪除)是板載 LSI。
適配器資訊:
Adapter #1 ============================================================================== Versions ================ Product Name : LSI MegaRAID SAS 9285-8e Serial No : SV12704804 FW Package Build: 23.1.1-0004 Mfg. Data ================ Mfg. Date : 06/30/11 Rework Date : 00/00/00 Revision No : 00A Battery FRU : N/A Image Versions in Flash: ================ BIOS Version : 5.25.00_4.11.05.00_0x05040000 WebBIOS Version : 6.1-20-e_20-Rel Preboot CLI Version: 05.01-04:#%00001 FW Version : 3.140.15-1320 NVDATA Version : 2.1106.03-0051 Boot Block Version : 2.04.00.00-0001 BOOT Version : 06.253.57.219 Pending Images in Flash ================ None PCI Info ================ Vendor Id : 1000 Device Id : 005b SubVendorId : 1000 SubDeviceId : 9285 Host Interface : PCIE ChipRevision : B0 Number of Frontend Port: 0 Device Interface : PCIE Number of Backend Port: 8 Port : Address 0 5003048000ee8e7f 1 5003048000ee8a7f 2 0000000000000000 3 0000000000000000 4 0000000000000000 5 0000000000000000 6 0000000000000000 7 0000000000000000 HW Configuration ================ SAS Address : 500605b0038f9210 BBU : Present Alarm : Present NVRAM : Present Serial Debugger : Present Memory : Present Flash : Present Memory Size : 1024MB TPM : Absent On board Expander: Absent Upgrade Key : Absent Temperature sensor for ROC : Present Temperature sensor for controller : Absent ROC temperature : 70 degree Celcius Settings ================ Current Time : 18:24:36 3/13, 2012 Predictive Fail Poll Interval : 300sec Interrupt Throttle Active Count : 16 Interrupt Throttle Completion : 50us Rebuild Rate : 30% PR Rate : 30% BGI Rate : 30% Check Consistency Rate : 30% Reconstruction Rate : 30% Cache Flush Interval : 4s Max Drives to Spinup at One Time : 2 Delay Among Spinup Groups : 12s Physical Drive Coercion Mode : Disabled Cluster Mode : Disabled Alarm : Enabled Auto Rebuild : Enabled Battery Warning : Enabled Ecc Bucket Size : 15 Ecc Bucket Leak Rate : 1440 Minutes Restore HotSpare on Insertion : Disabled Expose Enclosure Devices : Enabled Maintain PD Fail History : Enabled Host Request Reordering : Enabled Auto Detect BackPlane Enabled : SGPIO/i2c SEP Load Balance Mode : Auto Use FDE Only : No Security Key Assigned : No Security Key Failed : No Security Key Not Backedup : No Default LD PowerSave Policy : Controller Defined Maximum number of direct attached drives to spin up in 1 min : 10 Any Offline VD Cache Preserved : No Allow Boot with Preserved Cache : No Disable Online Controller Reset : No PFK in NVRAM : No Use disk activity for locate : No Capabilities ================ RAID Level Supported : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span Supported Drives : SAS, SATA Allowed Mixing: Mix in Enclosure Allowed Mix of SAS/SATA of HDD type in VD Allowed Status ================ ECC Bucket Count : 0 Limitations ================ Max Arms Per VD : 32 Max Spans Per VD : 8 Max Arrays : 128 Max Number of VDs : 64 Max Parallel Commands : 1008 Max SGE Count : 60 Max Data Transfer Size : 8192 sectors Max Strips PerIO : 42 Max LD per array : 16 Min Strip Size : 8 KB Max Strip Size : 1.0 MB Max Configurable CacheCade Size: 0 GB Current Size of CacheCade : 0 GB Current Size of FW Cache : 887 MB Device Present ================ Virtual Drives : 28 Degraded : 0 Offline : 0 Physical Devices : 59 Disks : 56 Critical Disks : 0 Failed Disks : 0 Supported Adapter Operations ================ Rebuild Rate : Yes CC Rate : Yes BGI Rate : Yes Reconstruct Rate : Yes Patrol Read Rate : Yes Alarm Control : Yes Cluster Support : No BBU : No Spanning : Yes Dedicated Hot Spare : Yes Revertible Hot Spares : Yes Foreign Config Import : Yes Self Diagnostic : Yes Allow Mixed Redundancy on Array : No Global Hot Spares : Yes Deny SCSI Passthrough : No Deny SMP Passthrough : No Deny STP Passthrough : No Support Security : No Snapshot Enabled : No Support the OCE without adding drives : Yes Support PFK : Yes Support PI : No Support Boot Time PFK Change : Yes Disable Online PFK Change : No PFK TrailTime Remaining : 0 days 0 hours Support Shield State : Yes Block SSD Write Disk Cache Change: Yes Supported VD Operations ================ Read Policy : Yes Write Policy : Yes IO Policy : Yes Access Policy : Yes Disk Cache Policy : Yes Reconstruction : Yes Deny Locate : No Deny CC : No Allow Ctrl Encryption: No Enable LDBBM : No Support Breakmirror : No Power Savings : Yes Supported PD Operations ================ Force Online : Yes Force Offline : Yes Force Rebuild : Yes Deny Force Failed : No Deny Force Good/Bad : No Deny Missing Replace : No Deny Clear : No Deny Locate : No Support Temperature : Yes Disable Copyback : No Enable JBOD : No Enable Copyback on SMART : No Enable Copyback to SSD on SMART Error : Yes Enable SSD Patrol Read : No PR Correct Unconfigured Areas : Yes Enable Spin Down of UnConfigured Drives : Yes Disable Spin Down of hot spares : No Spin Down time : 30 T10 Power State : Yes Error Counters ================ Memory Correctable Errors : 0 Memory Uncorrectable Errors : 0 Cluster Information ================ Cluster Permitted : No Cluster Active : No Default Settings ================ Phy Polarity : 0 Phy PolaritySplit : 0 Background Rate : 30 Strip Size : 64kB Flush Time : 4 seconds Write Policy : WB Read Policy : Adaptive Cache When BBU Bad : Disabled Cached IO : No SMART Mode : Mode 6 Alarm Disable : Yes Coercion Mode : None ZCR Config : Unknown Dirty LED Shows Drive Activity : No BIOS Continue on Error : No Spin Down Mode : None Allowed Device Type : SAS/SATA Mix Allow Mix in Enclosure : Yes Allow HDD SAS/SATA Mix in VD : Yes Allow SSD SAS/SATA Mix in VD : No Allow HDD/SSD Mix in VD : No Allow SATA in Cluster : No Max Chained Enclosures : 16 Disable Ctrl-R : Yes Enable Web BIOS : Yes Direct PD Mapping : No BIOS Enumerate VDs : Yes Restore Hot Spare on Insertion : No Expose Enclosure Devices : Yes Maintain PD Fail History : Yes Disable Puncturing : No Zero Based Enclosure Enumeration : No PreBoot CLI Enabled : Yes LED Show Drive Activity : Yes Cluster Disable : Yes SAS Disable : No Auto Detect BackPlane Enable : SGPIO/i2c SEP Use FDE Only : No Enable Led Header : No Delay during POST : 0 EnableCrashDump : No Disable Online Controller Reset : No EnableLDBBM : No Un-Certified Hard Disk Drives : Allow Treat Single span R1E as R10 : No Max LD per array : 16 Power Saving option : Don't Auto spin down Configured Drives Max power savings option is not allowed for LDs. Only T10 power conditions are to be used. Default spin down time in minutes: 30 Enable JBOD : No TTY Log In Flash : No Auto Enhanced Import : No BreakMirror RAID Support : No Disable Join Mirror : No Enable Shield State : Yes Time taken to detect CME : 60s Exit Code: 0x00
機箱資訊:
# /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a1 Number of enclosures on adapter 1 -- 3 Enclosure 0: Device ID : 36 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port B Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 65 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11820 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 48 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 1: Device ID : 65 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port A Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 36 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11760 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 47 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 2: Device ID : 252 Number of Slots : 8 Number of Power Supplies : 0 Number of Fans : 0 Number of Temperature Sensors : 0 Number of Alarms : 0 Number of SIM Modules : 1 Number of Physical Drives : 0 Status : Normal Position : 1 Connector Name : Unavailable Enclosure type : SGPIO Failed in first Inquiry commnad FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : Unavailable Inquiry data : Vendor Identification : LSI Product Identification : SGPIO Product Revision Level : N/A Vendor Specific : Exit Code: 0x00
現在,請注意每個插槽 11 設備顯示的機箱 ID 為 36,我認為這就是發生差異的地方。一個應該是 36。但另一個應該在外殼 65 上。
插槽 11 中的驅動器:
Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 5, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 48 WWN: Sequence Number: 11 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : YES Device Firmware Level: A5C0 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8a53 Connected Port Number: 1(path0) Inquiry Data: MJ1311YNG6YYXAHitachi HDS5C3030ALA630 MEAOA5C0 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 19, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 19 WWN: Sequence Number: 4 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : NO Device Firmware Level: A580 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8e53 Connected Port Number: 0(path0) Inquiry Data: MJ1313YNG1VA5CHitachi HDS5C3030ALA630 MEAOA580 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No
2012 年 6 月 28 日更新:
我終於有了一些關於(我們認為)這個問題的根本原因的新資訊,所以我想我會分享。
在與一位知識淵博的 Supermicro 技術人員取得聯繫後,他們為我們提供了一個名為 Xflash 的工具(在他們的 FTP 上似乎並不容易獲得)。當我們使用這個實用程序收集一些資訊時,我的同事發現了一些非常奇怪的東西:
root@mogile2 test]# ./xflash.dat -i getavail
Initializing Interface. Expander: SAS2X36 (SAS2x36) 1) SAS2X36 (SAS2x36) (50030480:00EE917F) (0.0.0.0) 2) SAS2X36 (SAS2x36) (50030480:00E9D67F) (0.0.0.0) 3) SAS2X36 (SAS2x36) (50030480:0112D97F) (0.0.0.0)
這列出了連接的機箱。您會看到 3 個已連接(我們已經添加了第 3 個和第 4 個尚未顯示)及其各自的 SAS 地址/WWN (50030480:00EE917F)。現在我們可以使用這個地址來獲取各個機箱的資訊:
[root@mogile2 test]# ./xflash.dat -i 5003048000EE917F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00EE917F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 5003048000E9D67F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00E9D67F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 500304800112D97F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:0112D97F Enclosure Logical Id: 50030480:0112D97F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05
你懂了嗎?前 2 個機箱邏輯 ID 被部分屏蔽,而第三個(具有正確的唯一機箱 ID)則沒有。我們向 Supermicro 指出了這一點,並且能夠確認該地址應該在製造過程中設置,並且某些批次的這些機箱存在問題,其中沒有設置邏輯 ID。
我們認為 RAID 控制器根據邏輯 ID 確定 ID,並且由於我們的前 2 個機箱具有相同的邏輯 ID,因此它們獲得相同的機箱 ID。我們還確認0000007F是來自 LSI 作為 ID 的預設值。
有助於確認這可能是執行 JBOD 的製造問題的下一個指標是,所有 6 個存在此問題的機箱都以00E開頭。我相信在 00E8 和 00EE 之間,Supermicro 忘記了正確程式邏輯 ID,並且忽略了在製作後召回或修復問題。
對我們來說幸運的是,有一個工具可以管理來自 Supermicro 的設備的 WWN 和邏輯 ID:ftp: //ftp.supermicro.com/utility/ExpanderXtools_Lite/。我們的下一步是安排這些 JBOD 的關閉(在數據遷移之後)並重新程式邏輯 ID 並查看它是否解決了問題。
2012 年 6 月 28 日更新 #2:
我剛剛在 Supermicro 上發現了這個常見問題解答,同時 Google 搜尋“lsi 0000007f”:http ://www.supermicro.com/support/faqs/faq.cfm?faq=11805 。我仍然不明白為什麼,在我們最近幾次聯繫 Supermicro 時,他們從未將我們定向到這篇文章:\
我們設法最終解決了這個問題。最終原因和解決方法?似乎是製造過程錯誤導致一些從 Supermicro 發貨的 JBOD 帶有預設的燒錄邏輯 ID(0000007F)。這個地址實際上應該預設匹配 SAS 地址。
為了解決這個問題,我們必須執行一個名為 ExpanderXtools Lite ( <ftp://supermicro.com/utility/ExpanderXtools_Lite/> ) 的工具。您執行 SMC 二進製文件並會彈出一個 X 視窗(您需要安裝 X,或者如果您沒有像我們這樣的伺服器上執行 X,則筆記型電腦上的本地 X 伺服器通過 SSH 轉發)。在 SMC 程序中,您選擇 COM 菜單並點擊帶內。
現在,您可以進入 WWN 菜單並選擇 WWN。一個新的彈出視窗將顯示您的 JBOD 主要和次要(如果您有 E26 型號)控制器。您需要在關閉視窗之前同時更新兩個控制器。更新並關閉視窗後,關閉陣列電源一段時間然後重新打開電源。再次使用 SMC 二進製文件來驗證邏輯地址是否正確顯示。
最大的痛點是必須關閉陣列。可以線上執行此操作並使用您的 RAID 卡重新掃描。但最好安全地玩。磁碟的埠 ID 將更改。對我們來說,我們的 LSI 卡能夠在更改後拾取陣列。你的旅費可能會改變。