Centos5

Linux CentOS: The server throws strange error messages. Can you guess the cause?

  • January 20, 2015

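One of my company's servers is part of our production environment, and an ActiveMQ server runs on it. I logged in to the ActiveMQ UI and tried to create a new queue. When I did, I got this message: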

HTTP ERROR: 500

/workspace/development/org/apache/activemq/5.1.0/data/kr-store/data/data-container-roots-2 (Read-only file system)
RequestURI=/admin/createDestination.action

Caused by:

java.io.FileNotFoundException: /workspace/development/org/apache/activemq/5.1.0/data/kr-store/data/data-container-roots-2 (Read-only file system)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
   at org.apache.activemq.kaha.impl.data.DataFile.getRandomAccessFile(DataFile.java:51)
   at org.apache.activemq.kaha.impl.data.SyncDataFileWriter.storeItem(SyncDataFileWriter.java:71)

I'm aware of the "file not found" message, but it doesn't seem to be directly related to the problem.

To troubleshoot, I logged in to the server and ran a few tests. Some of the basic commands I tried failed with the same error:

[root@ctrl3 kr-store]# touch 1
touch: cannot touch `1': Read-only file system
[root@ctrl3 /]# chgrp users /workspace
chgrp: changing group of `/workspace': Read-only file system
[root@ctrl3 kr-store]# chown peeradmin.users /workspace
chown: changing ownership of `/workspace': Read-only file system
[root@ctrl3 kr-store]# ls -ld data
drwxrwxr-x 2 peeradmin users 4096 AUG 12 12:27 data
[root@ctrl3 kr-store]# chmod o+w data/
chmod: changing permissions of `data/': Read-only file system

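If I remember correctly, the last time we hit an error like this it turned out that the disk had I/O problems. If that is not the case here, what else could it be?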

Edit #1:

[root@ctrl3 kr-store]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 ro,data=ordered 0 0
/dev /dev tmpfs rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/sda7 /tmp ext3 rw,data=ordered 0 0
/dev/VolGroup00/LogVol00 /workspace ext3 ro,data=ordered 0 0
/dev/sda5 /usr ext3 rw,data=ordered 0 0
/dev/sda3 /var ext3 rw,data=ordered 0 0
/dev/sda1 /boot ext3 rw,data=ordered 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
/etc/auto.misc /misc autofs rw,fd=7,pgrp=3795,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,fd=13,pgrp=3795,timeout=300,minproto=5,maxproto=5,indirect 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1 /nfs1 nfs rw,noatime,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1/NIS/home /home nfs rw,noatime,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1 /nfs1 nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0
atlas.sj.company.com:/volumes/atlas_vol/NFS1/NIS/home /home nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=atlas.sj.company.com 0 0

Sven: the logs show nothing:

[root@ctrl3 kr-store]# cat /var/log/messages |grep -v [xinetd\|snmpd]
[root@ctrl3 kr-store]#
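
A side note on the grep above, offered as a likely explanation rather than a certainty: [xinetd\|snmpd] is a bracket expression, so it matches any single character from that set, and -v then discards nearly every line. That alone could produce the empty output. A minimal sketch of the filter that was probably intended follows; filter_noise is a hypothetical helper name, and the log path is an optional argument.

```shell
#!/bin/sh
# filter_noise is a hypothetical helper name, not part of any tool.
# Print every log line that does NOT mention xinetd or snmpd.
# -E enables extended regular expressions so that | means alternation;
# -v inverts the match.
filter_noise() {
    grep -Ev 'xinetd|snmpd' "${1:-/var/log/messages}"
}

# Typical use on the affected host:
#   filter_noise | tail -n 50
```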

Also, if nothing can be written to disk, I guess the logs can't be updated either.

Edit #2: So it looks like the file system has somehow become corrupted... am I right?

SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 1:0:0:0: SCSI error: return code = 0x06000000
end_request: I/O error, dev sdb, sector 745962211
printk: 215 messages suppressed.
Buffer I/O error on device dm-0, logical block 51773423
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 51773424
lost page write due to I/O error on dm-0
Buffer I/O error on device dm-0, logical block 51773425
lost page write due to I/O error on dm-0

Thanks in advance,

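Your file system appears to be mounted read-only. You can verify this with cat /proc/mounts. A file system that gets remounted read-only is usually the result of a file system error. The underlying cause may be a disk problem, so you should check your disks (SMART values, controller status in the case of hardware RAID, and so on).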

Edit #1: Your mount output shows that it is indeed mounted read-only:

/dev/VolGroup00/LogVol00 /workspace ext3 ro,data=ordered 0 0
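
On a long mount table it can be easier to filter for read-only entries than to read the whole listing. The following is a minimal sketch, assuming the usual /proc/mounts field order (device, mount point, type, options); list_ro_mounts is a hypothetical helper name.

```shell
#!/bin/sh
# list_ro_mounts is a hypothetical helper name.
# Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
# Print "mountpoint device" for entries whose option list contains "ro"
# as a whole option (so "minproto=5" or "proto=tcp" do not match).
list_ro_mounts() {
    awk '$4 ~ /(^|,)ro(,|$)/ { print $2, $1 }' "${1:-/proc/mounts}"
}

# Typical use on a live system:
#   list_ro_mounts
```

Applied to the listing posted in the question, this would report both / (/dev/root) and /workspace (/dev/VolGroup00/LogVol00) as read-only.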

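You can try to remount the volume read-write, but I would not recommend doing so until you have found out why it was remounted read-only in the first place; otherwise you risk losing data: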

mount -o remount,rw /workspace

In any case, you should first check the output of dmesg and check the disk with smartctl.
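
Kernel logs are noisy, so a filter helps when looking for storage trouble. This is a rough sketch only: filter_disk_errors is a hypothetical helper name, and the patterns are simply the kinds of messages the kernel prints for I/O failures and ext3 journal aborts, so they may need extending for other drivers or file systems.

```shell
#!/bin/sh
# filter_disk_errors is a hypothetical helper name.
# Read kernel log text on stdin and keep only lines that hint at storage
# problems: block-layer I/O errors, SCSI errors, ext3 journal aborts, and
# the notice that a file system was remounted read-only.
filter_disk_errors() {
    grep -Ei 'I/O error|SCSI error|ext3_abort|aborted journal|remounting filesystem read-only'
}

# Typical use on a live system:
#   dmesg | filter_disk_errors
```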

Edit #2:

It seems that sdb is the physical problem here:

end_request: I/O error, dev sdb, sector 745962211

Check the output of

smartctl -a /dev/sdb
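
The full smartctl -a report is long. As a hedged starting point, the sketch below keeps only the overall health verdict and a few attributes commonly associated with failing sectors on ATA disks. smart_summary is a hypothetical helper name, and attribute names vary by vendor, so a missing line does not prove the disk is healthy.

```shell
#!/bin/sh
# smart_summary is a hypothetical helper name.
# Read `smartctl -a` output on stdin and keep the overall health line plus
# attributes that often rise when a disk is developing bad sectors.
# Attribute names differ between vendors; treat this list as an assumption.
smart_summary() {
    grep -E 'overall-health|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# Typical use (needs root and the smartmontools package):
#   smartctl -a /dev/sdb | smart_summary
```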

Source: https://serverfault.com/questions/660899