Linux
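mpt3sas_scsih_issue_tm: timeout while running a ZFS scrub
Overview
We are currently scrubbing a ZFS pool with 12 RAID-Z1 vdevs of 12 drives each. Each vdev corresponds to one enclosure. The hardware is a Dell PowerEdge 730xd with two Dell 12Gbps SAS (LSI SAS3008) controllers and 12 Dell MD1400 enclosures. The operating system is CentOS 7.6.1810.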
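We cannot scrub the pool successfully because, after a while, drives become FAULTED in ZFS and we have to run zpool clear to continue. The drives that become FAULTED appear to be random, and smartctl reports their SMART status as OK. The only thing they have in common is that, before the drives are marked FAULTED, the error message mpt3sas_scsih_issue_tm: timeout appears in dmesg, followed by a controller reset and a flood of ZED errors and read errors. I am currently stuck on the following:
- Is this a software problem or a hardware problem?
- If it is software, is there a configuration change or patch that would prevent the errors?
- If it is hardware, how can I narrow down the problem?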
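What we have tried
We have tried the following:
- Increasing the timeout value for each disk in /sys/block/*/device/timeout
- Replacing all SAS cables
- Upgrading all firmware
- Running a SMART background long test on the FAULTED disks
- Rebooting (3 times so far)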
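To double-check that a raised timeout really applies to every disk (and survived the reboots), the per-disk values can be read back from sysfs. The following is a minimal Python sketch, not part of any existing tool; the function name read_scsi_timeouts and its sys_block parameter are made up for illustration.

```python
import glob
import os


def read_scsi_timeouts(sys_block="/sys/block"):
    """Return {disk_name: timeout_in_seconds} for disks exposing device/timeout."""
    timeouts = {}
    pattern = os.path.join(sys_block, "*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        # path looks like <sys_block>/<disk>/device/timeout
        disk = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as fh:
                timeouts[disk] = int(fh.read().strip())
        except (OSError, ValueError):
            # Skip entries that vanish or do not contain a plain integer.
            continue
    return timeouts


if __name__ == "__main__":
    for disk, seconds in read_scsi_timeouts().items():
        print(f"{disk}: {seconds}s")
```

Any disk that still shows the default value after a controller reset or a reboot would suggest the setting was not reapplied.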
I also looked at this answer, but it did not help.
Details
Here is the journalctl output from when the activity starts:
Apr 12 04:42:07 kernel: sd 5:0:18:0: attempting task abort! scmd(ffff8d36c295a4c0)
Apr 12 04:42:07 kernel: sd 5:0:4:0: attempting task abort! scmd(ffff8d3745b20540)
Apr 12 04:42:07 kernel: sd 5:0:4:0: [sdac] CDB: Read(32)
Apr 12 04:42:07 kernel: sd 5:0:4:0: [sdac] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Apr 12 04:42:07 kernel: sd 5:0:4:0: [sdac] CDB[10]: 60 2a b8 c8 60 2a b8 c8 00 00 00 00 00 00 00 08
Apr 12 04:42:07 kernel: scsi target5:0:4: handle(0x000e), sas_address(0x5000c500a6bb846e), phy(4)
Apr 12 04:42:07 kernel: scsi target5:0:4: enclosure logical id(0x5204747299f56500), slot(4)
Apr 12 04:42:07 kernel: scsi target5:0:4: enclosure level(0x0000), connector name( 1 )
Apr 12 04:42:07 kernel: sd 5:0:18:0: [sdap] CDB: Read(32)
Apr 12 04:42:07 kernel: sd 5:0:18:0: [sdap] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Apr 12 04:42:07 kernel: sd 5:0:18:0: [sdap] CDB[10]: 60 2b f7 f8 60 2b f7 f8 00 00 00 00 00 00 00 08
Apr 12 04:42:07 kernel: scsi target5:0:18: handle(0x001d), sas_address(0x5000c500a6bb68ce), phy(5)
Apr 12 04:42:07 kernel: scsi target5:0:18: enclosure logical id(0x5204747299f5dd00), slot(0)
Apr 12 04:42:07 kernel: scsi target5:0:18: enclosure level(0x0001), connector name( 1 )
Apr 12 04:42:37 kernel: mpt3sas_cm1: mpt3sas_scsih_issue_tm: timeout
Apr 12 04:42:37 kernel: mf:
Apr 12 04:42:37 kernel: 0100000e
Apr 12 04:42:37 kernel: 00000100
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel:
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 000000b6
Apr 12 04:42:37 kernel:
Apr 12 04:42:47 kernel: mpt3sas_cm1: sending diag reset !!
Apr 12 04:42:48 kernel: mpt3sas_cm1: diag reset: SUCCESS
Apr 12 04:42:48 kernel: mpt3sas_cm1: LSISAS3008: FWVersion(16.00.04.00), ChipRevision(0x02), BiosVersion(18.00.00.00)
Apr 12 04:42:48 kernel: mpt3sas_cm1: Protocol=(
Apr 12 04:42:48 kernel: Initiator
Apr 12 04:42:48 kernel: ,Target
Apr 12 04:42:48 kernel: ),
Apr 12 04:42:48 kernel: Capabilities=(
Apr 12 04:42:48 kernel: TLR
Apr 12 04:42:48 kernel: ,EEDP
Apr 12 04:42:48 kernel: ,Snapshot Buffer
Apr 12 04:42:48 kernel: ,Diag Trace Buffer
Apr 12 04:42:48 kernel: ,Task Set Full
Apr 12 04:42:48 kernel: ,NCQ
Apr 12 04:42:48 kernel: )
Apr 12 04:42:48 kernel: mpt3sas_cm1: sending port enable !!
Apr 12 04:42:55 kernel: mpt3sas_cm1: port enable: SUCCESS
Apr 12 04:42:55 kernel: mpt3sas_cm1: search for end-devices: start
Apr 12 04:42:55 kernel: scsi target5:0:0: handle(0x000a), sas_addr(0x5000c500a6bc5ef6)
Apr 12 04:42:55 kernel: scsi target5:0:0: enclosure logical id(0x5204747299f56500), slot(9)
Apr 12 04:42:55 kernel: scsi target5:0:1: handle(0x000b), sas_addr(0x5000c500a6bc6e66)
Apr 12 04:42:55 kernel: scsi target5:0:1: enclosure logical id(0x5204747299f56500), slot(5)
Apr 12 04:42:55 kernel: scsi target5:0:2: handle(0x000c), sas_addr(0x5000c500a6bbd86e)
Apr 12 04:42:55 kernel: scsi target5:0:2: enclosure logical id(0x5204747299f56500), slot(1)
The handle and enclosure lines repeat for every drive attached to the controller.
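Because each drive gets a handle line and an enclosure line, those pairs can be turned into a target-to-enclosure map, which helps when checking whether the timed-out drives share an enclosure or expander. This is a rough Python sketch assuming the line format shown in the log; the names map_targets, HANDLE_RE and ENCL_RE are illustrative only.

```python
import re

# Matches both "sas_addr(...)" (rediscovery) and "sas_address(...)" (task abort).
HANDLE_RE = re.compile(
    r"scsi target(?P<target>\d+:\d+:\d+): handle\((?P<handle>0x[0-9a-f]+)\), "
    r"sas_addr(?:ess)?\((?P<sas>0x[0-9a-f]+)\)"
)
ENCL_RE = re.compile(
    r"scsi target(?P<target>\d+:\d+:\d+): "
    r"enclosure logical id\((?P<encl>0x[0-9a-f]+)\), slot\((?P<slot>\d+)\)"
)


def map_targets(lines):
    """Return {target: {handle, sas_addr, enclosure, slot}} from kernel log lines."""
    targets = {}
    for line in lines:
        m = HANDLE_RE.search(line)
        if m:
            targets.setdefault(m["target"], {}).update(
                handle=m["handle"], sas_addr=m["sas"]
            )
            continue
        m = ENCL_RE.search(line)
        if m:
            targets.setdefault(m["target"], {}).update(
                enclosure=m["encl"], slot=int(m["slot"])
            )
    return targets


# Four lines copied from the log in this question.
SAMPLE = """\
Apr 12 04:42:55 kernel: scsi target5:0:0: handle(0x000a), sas_addr(0x5000c500a6bc5ef6)
Apr 12 04:42:55 kernel: scsi target5:0:0: enclosure logical id(0x5204747299f56500), slot(9)
Apr 12 04:42:07 kernel: scsi target5:0:18: handle(0x001d), sas_address(0x5000c500a6bb68ce), phy(5)
Apr 12 04:42:07 kernel: scsi target5:0:18: enclosure logical id(0x5204747299f5dd00), slot(0)
""".splitlines()

for target, info in map_targets(SAMPLE).items():
    print(target, info["enclosure"], "slot", info["slot"])
```

Feeding it the full journalctl output and then looking up the targets that hit "attempting task abort!" should show quickly whether the failures cluster on one enclosure logical id.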
After that, the log continues with:
Apr 12 04:42:57 kernel: mpt3sas_cm1: search for end-devices: complete
Apr 12 04:42:57 kernel: mpt3sas_cm1: search for expanders: start
Apr 12 04:42:57 kernel: expander present: handle(0x0009), sas_addr(0x5204747299f565ff)
Apr 12 04:42:57 kernel: expander present: handle(0x0016), sas_addr(0x5204747299f5ddff)
Apr 12 04:42:57 kernel: expander present: handle(0x0024), sas_addr(0x520474729a0a68ff)
Apr 12 04:42:57 kernel: expander present: handle(0x0032), sas_addr(0x520474729a0b61ff)
Apr 12 04:42:57 kernel: expander present: handle(0x0040), sas_addr(0x520474729a09f1ff)
Apr 12 04:42:57 kernel: mpt3sas_cm1: search for expanders: complete
Apr 12 04:42:57 kernel: sd 5:0:4:0: task abort: SUCCESS scmd(ffff8d3745b20540)
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: start
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: end-devices
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: expanders
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: complete
Apr 12 04:42:57 kernel: mpt3sas_cm1: scan devices: start
Apr 12 04:42:57 kernel: sd 5:0:18:0: task abort: SUCCESS scmd(ffff8d36c295a4c0)
Apr 12 04:42:57 kernel: scsi_io_completion: 13 callbacks suppressed
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] CDB: Read(32)
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] CDB[10]: 60 2b f7 f8 60 2b f7 f8 00 00 00 00 00 00 00 08
Apr 12 04:42:57 kernel: blk_update_request: 13 callbacks suppressed
Apr 12 04:42:57 kernel: blk_update_request: I/O error, dev sdap, sector 1613494264
Apr 12 04:42:57 kernel: sd 5:0:21:0: attempting task abort! scmd(ffff8d3acfef0540)
Apr 12 04:42:57 kernel: sd 5:0:21:0: [sdas] CDB: Read(32)
Apr 12 04:42:57 kernel: sd 5:0:21:0: [sdas] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 03
Apr 12 04:42:57 kernel: sd 5:0:21:0: [sdas] CDB[10]: 01 af 8c b0 01 af 8c b0 00 00 00 00 00 00 00 08
Apr 12 04:42:57 kernel: scsi target5:0:21: handle(0x0020), sas_address(0x5000c500a6bc5f82), phy(8)
This is followed by more read timeouts. Then we see a lot of zed errors:
Apr 12 04:42:57 zed[137074]: eid=2425 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137076]: eid=2426 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137078]: eid=2427 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137080]: eid=2428 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137082]: eid=2429 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137084]: eid=2430 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137086]: eid=2431 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137088]: eid=2432 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137090]: eid=2433 class=io pool_guid=0x3317CEBDDE480DA0
Apr 12 04:42:57 zed[137092]: eid=2434 class=io pool_guid=0x3317CEBDDE480DA0
Apr 12 04:42:57 zed[137094]: eid=2435 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137096]: eid=2436 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137098]: eid=2437 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137100]: eid=2438 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137102]: eid=2439 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bb68cf-part1
Apr 12 04:42:57 zed[137104]: eid=2440 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bb68cf-part1
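To see which vdevs the ZED events pile up on, the zed lines can be tallied per device and event class. The sketch below is only an illustration in Python and assumes the exact field order shown in these journal lines; count_zed_events and ZED_RE are hypothetical names.

```python
import re
from collections import Counter

# eid, class and pool_guid are always present; vdev_path is sometimes missing.
ZED_RE = re.compile(
    r"zed\[\d+\]: eid=(\d+) class=(\S+) pool_guid=(\S+)(?: vdev_path=(\S+))?"
)


def count_zed_events(lines):
    """Return a Counter keyed by (vdev_path, event_class)."""
    counts = Counter()
    for line in lines:
        m = ZED_RE.search(line)
        if m:
            _eid, event_class, _guid, vdev_path = m.groups()
            counts[(vdev_path or "(no vdev_path)", event_class)] += 1
    return counts


# Five lines copied from the zed output in this question.
SAMPLE_ZED = """\
Apr 12 04:42:57 zed[137074]: eid=2425 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137076]: eid=2426 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137078]: eid=2427 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137090]: eid=2433 class=io pool_guid=0x3317CEBDDE480DA0
Apr 12 04:42:57 zed[137104]: eid=2440 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bb68cf-part1
""".splitlines()

for (vdev, event_class), n in sorted(count_zed_events(SAMPLE_ZED).items()):
    print(f"{n:3d}  {event_class:5s}  {vdev}")
```

Events without a vdev_path are kept in their own bucket rather than dropped, since they may refer to the pool or a parent vdev.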
After this, the drives are marked DEGRADED or FAULTED. I will also include some more information that may be useful.
Here is the zpool status output for the two vdevs with FAULTED devices:
raidz1-4                    DEGRADED  0  0  0
  scsi-35000cca2513f78b8    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca25157bfd0    ONLINE    0  0  0  (repairing)
  scsi-35000cca251597aa4    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca2515de7b0    FAULTED   0  0  0  too many errors
  scsi-35000cca2516278c8    DEGRADED  0  0  0  too many errors
  scsi-35000cca25163ea64    ONLINE    0  0  0  (repairing)
  scsi-35000cca251644664    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca2516576a0    DEGRADED  0  0  0  too many errors
  scsi-35000cca251699f68    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca25169bd10    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca25169be5c    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca25169c09c    DEGRADED  0  0  0  too many errors  (repairing)
raidz1-5                    DEGRADED  0  0  0
  scsi-35000cca2516bc234    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca2516bc26c    ONLINE    0  0  0
  scsi-35000cca2516c8e78    ONLINE    0  0  0
  scsi-35000cca2516ca244    ONLINE    0  0  0
  scsi-35000cca2516ca334    ONLINE    0  0  0  (repairing)
  scsi-35000cca2516ca848    ONLINE    0  0  0  (repairing)
  scsi-35000cca2516cb3e0    ONLINE    0  0  0  (repairing)
  scsi-35000cca2516cb420    DEGRADED  0  0  0  too many errors  (repairing)
  scsi-35000cca2516cc210    ONLINE    0  0  0
  scsi-35000cca2516ce390    FAULTED   0  0  0  too many errors  (repairing)
  scsi-35000cca2516ce8e4    ONLINE    0  0  0
  scsi-35000cca2516cf224    ONLINE    0  0  0
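With 144 drives it is easier to compare vdevs from a per-vdev tally of device states than from the raw listing. Below is a small Python sketch that does this for rows like the ones above. It assumes every top-level vdev name starts with "raidz" (true for this pool, not in general), and summarize_vdevs is a made-up helper name.

```python
from collections import Counter


def summarize_vdevs(text):
    """Return {vdev_name: Counter(device_state -> count)} from zpool status rows."""
    summary = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        # Expect at least: name, state, read, write, cksum.
        if len(fields) < 5:
            continue
        name, state = fields[0], fields[1]
        if name.startswith("raidz"):
            # Assumption: a new raidz row starts a new group of member disks.
            current = summary.setdefault(name, Counter())
        elif current is not None:
            current[state] += 1
    return summary


# Excerpt of the raidz1-5 rows shown in this question.
SAMPLE_STATUS = """\
raidz1-5 DEGRADED 0 0 0
  scsi-35000cca2516bc234 DEGRADED 0 0 0 too many errors (repairing)
  scsi-35000cca2516bc26c ONLINE 0 0 0
  scsi-35000cca2516c8e78 ONLINE 0 0 0
  scsi-35000cca2516ce390 FAULTED 0 0 0 too many errors (repairing)
"""

for vdev, states in summarize_vdevs(SAMPLE_STATUS).items():
    print(vdev, dict(states))
```

Run against the full zpool status output, this makes it obvious whether the DEGRADED and FAULTED devices cluster in the vdevs (and therefore enclosures) behind one controller.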
Here is the smartctl -a output for the FAULTED drive in raidz1-4:
=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH721010AL5200
Revision:             LS15
Compliance:           SPC-4
User Capacity:        9,796,820,402,176 bytes [9.79 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Formatted with type 2 protection
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2515de7b0
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri Apr 12 13:40:57 2019 CDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     29 C
Drive Trip Temperature:        50 C

Manufactured in week 02 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  889
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 30677043943309312

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       40         0       294   10394513     118610.223           0
write:         0        0         0         0     239773      43528.082           0
verify:        0        0         0         0      18403        101.563           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                  96   18243                 - [-   -    -]
# 2  Background short  Completed                  96   16753                 - [-   -    -]
# 3  Reserved(7)       Completed                  64       2                 - [-   -    -]

Long (extended) Self Test duration: 64033 seconds [1067.2 minutes]
Output of sysctl -a | grep -v 'net.' | grep -v 'kernel.sched_domain.':
abi.vsyscall32 = 1
crypto.fips_enabled = 0
debug.exception-trace = 1
debug.kprobes-optimization = 1
debug.panic_on_rcu_stall = 0
dev.hpet.max-user-freq = 64
dev.mac_hid.mouse_button2_keycode = 97
dev.mac_hid.mouse_button3_keycode = 100
dev.mac_hid.mouse_button_emulation = 0
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
dev.scsi.logging_level = 0
fs.aio-max-nr = 65536
fs.aio-nr = 0
fs.binfmt_misc.status = enabled
fs.dentry-state = 235028 190450 45 0 0 0
fs.dir-notify-enable = 1
fs.epoll.max_user_watches = 108185722
fs.file-max = 52384239
fs.file-nr = 2080 0 52384239
fs.inode-nr = 102807 662
fs.inode-state = 102807 662 0 0 0 0 0
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
fs.lease-break-time = 45
fs.leases-enable = 1
fs.may_detach_mounts = 0
fs.mount-max = 100000
fs.mqueue.msg_default = 10
fs.mqueue.msg_max = 10
fs.mqueue.msgsize_default = 8192
fs.mqueue.msgsize_max = 8192
fs.mqueue.queues_max = 256
fs.nfs.nlm_grace_period = 0
fs.nfs.nlm_tcpport = 0
fs.nfs.nlm_timeout = 10
fs.nfs.nlm_udpport = 0
fs.nfs.nsm_local_state = 3
fs.nfs.nsm_use_hostnames = 0
fs.nr_open = 1048576
fs.overflowgid = 65534
fs.overflowuid = 65534
fs.pipe-max-size = 1048576
fs.pipe-user-pages-hard = 0
fs.pipe-user-pages-soft = 16384
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
fs.quota.allocated_dquots = 0
fs.quota.cache_hits = 0
fs.quota.drops = 0
fs.quota.free_dquots = 0
fs.quota.lookups = 0
fs.quota.reads = 0
fs.quota.syncs = 0
fs.quota.warnings = 1
fs.quota.writes = 0
fs.suid_dumpable = 0
fs.xfs.age_buffer_centisecs = 1500
fs.xfs.error_level = 3
fs.xfs.filestream_centisecs = 3000
fs.xfs.inherit_noatime = 1
fs.xfs.inherit_nodefrag = 1
fs.xfs.inherit_nodump = 1
fs.xfs.inherit_nosymlinks = 0
fs.xfs.inherit_sync = 1
fs.xfs.irix_sgid_inherit = 0
fs.xfs.irix_symlink_mode = 0
fs.xfs.panic_mask = 0
fs.xfs.rotorstep = 1
fs.xfs.speculative_prealloc_lifetime = 300
fs.xfs.stats_clear = 0
fs.xfs.xfsbufd_centisecs = 100
fs.xfs.xfssyncd_centisecs = 3000
kernel.acct = 4 2 30
kernel.acpi_video_flags = 0
kernel.auto_msgmni = 1
kernel.bootloader_type = 114
kernel.bootloader_version = 2
kernel.cad_pid = 1
kernel.cap_last_cap = 36
kernel.compat-log = 1
kernel.core_pattern = core
kernel.core_pipe_limit = 0
kernel.core_uses_pid = 1
kernel.ctrl-alt-del = 0
kernel.dmesg_restrict = 0
kernel.domainname = (none)
kernel.ftrace_dump_on_oops = 0
kernel.ftrace_enabled = 1
kernel.hardlockup_all_cpu_backtrace = 0
kernel.hardlockup_panic = 1
kernel.hostname = htc-sblock-node197
kernel.hotplug =
kernel.hung_task_check_count = 4194304
kernel.hung_task_panic = 0
kernel.hung_task_timeout_secs = 120
kernel.hung_task_warnings = 0
kernel.io_delay_type = 0
kernel.kexec_load_disabled = 0
kernel.keys.gc_delay = 300
kernel.keys.maxbytes = 20000
kernel.keys.maxkeys = 200
kernel.keys.persistent_keyring_expiry = 259200
kernel.keys.root_maxbytes = 25000000
kernel.keys.root_maxkeys = 1000000
kernel.kptr_restrict = 0
kernel.max_lock_depth = 1024
kernel.modprobe = /sbin/modprobe
kernel.modules_disabled = 0
kernel.msg_next_id = -1
kernel.msgmax = 8192
kernel.msgmnb = 16384
kernel.msgmni = 32768
kernel.ngroups_max = 65536
kernel.nmi_watchdog = 1
kernel.ns_last_pid = 176562
kernel.numa_balancing = 1
kernel.numa_balancing_scan_delay_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_size_mb = 256
kernel.numa_balancing_settle_count = 4
kernel.osrelease = 3.10.0-957.5.1.el7.x86_64
kernel.ostype = Linux
kernel.overflowgid = 65534
kernel.overflowuid = 65534
kernel.panic = 0
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 1
kernel.panic_on_stackoverflow = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_warn = 0
kernel.perf_cpu_time_max_percent = 25
kernel.perf_event_max_sample_rate = 32000
kernel.perf_event_mlock_kb = 516
kernel.perf_event_paranoid = 2
kernel.pid_max = 196608
kernel.poweroff_cmd = /sbin/poweroff
kernel.print-fatal-signals = 0
kernel.printk = 7 4 1 7
kernel.printk_delay = 0
kernel.printk_ratelimit = 5
kernel.printk_ratelimit_burst = 10
kernel.pty.max = 4096
kernel.pty.nr = 4
kernel.pty.reserve = 1024
kernel.random.boot_id = 5bd2b4ab-221e-4157-98ad-fe4a81da7784
kernel.random.entropy_avail = 4034
kernel.random.poolsize = 4096
kernel.random.read_wakeup_threshold = 64
kernel.random.urandom_min_reseed_secs = 60
kernel.random.uuid = 4f4a6d22-d974-452d-b550-0e19b7a3c74e
kernel.random.write_wakeup_threshold = 896
kernel.randomize_va_space = 2
kernel.real-root-dev = 0
kernel.sched_autogroup_enabled = 0
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_latency_ns = 24000000
kernel.sched_migration_cost_ns = 500000
kernel.sched_min_granularity_ns = 3000000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 100
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_schedstats = 0
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 4000000
kernel.seccomp.actions_avail = kill trap errno trace allow
kernel.seccomp.actions_logged = kill trap errno trace
kernel.sem = 250 32000 32 128
kernel.sem_next_id = -1
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
kernel.softlockup_all_cpu_backtrace = 0
kernel.softlockup_panic = 0
kernel.spl.hostid = 0
kernel.spl.kmem.slab_kmem_alloc = 0
kernel.spl.kmem.slab_kmem_max = 0
kernel.spl.kmem.slab_kmem_total = 0
kernel.spl.kmem.slab_vmem_alloc = 305947392
kernel.spl.kmem.slab_vmem_max = 732324608
kernel.spl.kmem.slab_vmem_total = 347979264
kernel.spl.version = SPL v0.7.12-1
kernel.stack_tracer_enabled = 0
kernel.sysctl_writes_strict = 1
kernel.sysrq = 16
kernel.tainted = 12289
kernel.threads-max = 4126958
kernel.timer_migration = 1
kernel.traceoff_on_warning = 0
kernel.unknown_nmi_panic = 0
kernel.usermodehelper.bset = 4294967295 31
kernel.usermodehelper.inheritable = 4294967295 31
kernel.version = #1 SMP Fri Feb 1 14:54:57 UTC 2019
kernel.watchdog = 1
kernel.watchdog_cpumask = 0-191
kernel.watchdog_thresh = 10
kernel.yama.ptrace_scope = 0
sunrpc.max_resvport = 1023
sunrpc.min_resvport = 665
sunrpc.nfs_debug = 0x0000
sunrpc.nfsd_debug = 0x0000
sunrpc.nlm_debug = 0x0000
sunrpc.rpc_debug = 0x0000
sunrpc.tcp_fin_timeout = 15
sunrpc.tcp_max_slot_table_entries = 65536
sunrpc.tcp_slot_table_entries = 2
sunrpc.transports = tcp 1048576
sunrpc.transports = udp 32768
sunrpc.transports = tcp-bc 1048576
sunrpc.udp_slot_table_entries = 16
user.max_ipc_namespaces = 2063479
user.max_mnt_namespaces = 2063479
user.max_pid_namespaces = 2063479
user.max_user_namespaces = 0
user.max_uts_namespaces = 2063479
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 90112
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.mmap_rnd_bits = 28
vm.mmap_rnd_compat_bits = 8
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0
Let me know if there is anything else useful I can include.
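This is a freebie, since I think the scope of work extends into paid ZFS consulting:
- How are your enclosures cabled?
- You have 12 external JBODs, but there is no indication that multipath is enabled
- Consider how the offlined disks relate to the enclosures and the zpool
- I always advocate a SAS cabling ring topology when using this many enclosures
- If that is not in place, I would work toward it
- In that case, your pool should also consist of multipath /dev/mapper devices
- Can you show your /etc/modprobe.d/zfs.conf?
- Are all of the disks SAS?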
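For the zfs.conf question, the useful part is the set of options lines for the zfs module. As a rough illustration (not anything shipped with ZFS), this Python sketch pulls those key/value pairs out of a modprobe.d-style file. parse_module_options is a hypothetical helper, and the sample values are invented for the example, not taken from the asker's system.

```python
def parse_module_options(text, module="zfs"):
    """Return {parameter: value} from 'options <module> key=value ...' lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 3 or fields[0] != "options" or fields[1] != module:
            continue  # some other directive or some other module
        for item in fields[2:]:
            key, sep, value = item.partition("=")
            if sep:
                options[key] = value
    return options


# Hypothetical file contents; the values are placeholders for illustration.
SAMPLE_CONF = """\
# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592
options zfs zfs_vdev_scrub_max_active=2
options spl spl_hostid=1
"""

for key, value in parse_module_options(SAMPLE_CONF).items():
    print(f"{key} = {value}")
```

In practice you would pass it the contents of /etc/modprobe.d/zfs.conf and post the result along with the answers to the cabling questions.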
SAS multipath cabling example: