What could cause mysqlcheck to wrongly report tables as not corrupted?
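We manage a MySQL server for one of our customers, who has more than 100 databases with around 50 tables each, many of them InnoDB. The server crashed, and I am trying to find the culprit. When I restart with innodb_force_recovery = 2, I can connect and see no errors in error.log. What's more, mysqlcheck --all-databases reports "OK" for every table. But when I remove innodb_force_recovery, the server crashes again, writes a stack trace to error.log, and can only be stopped with kill -9. In this situation, how can I find the problematic database, and what could cause mysqlcheck to miss a corrupted table? Please don't tell me to ignore it and restore all databases from dumps. That might be acceptable for one or two databases, and if it happened only once in a blue moon, but I have run into this problem on the same server more than once, and restoring everything from dumps takes too much time and manual work to do every time.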
The server version is 5.5.46 and innodb_file_per_table is enabled. Excerpt from error.log, as requested (does "The tablespace free space info is corrupt" mean that the error is not in a specific table and cannot be corrected?):

    180222 17:13:48 mysqld_safe Starting mysqld daemon with databases from /home/mysql
    180222 17:13:48 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
    180222 17:13:48 [Note] /usr/libexec/mysqld (mysqld 5.5.46) starting as process 26242 ...
    180222 17:13:48 [Note] Plugin 'FEDERATED' is disabled.
    180222 17:13:48 InnoDB: The InnoDB memory heap is disabled
    180222 17:13:48 InnoDB: Mutexes and rw_locks use InnoDB's own implementation
    180222 17:13:48 InnoDB: Compressed tables use zlib 1.2.3
    180222 17:13:48 InnoDB: Using Linux native AIO
    180222 17:13:48 InnoDB: Initializing buffer pool, size = 128.0M
    180222 17:13:49 InnoDB: Completed initialization of buffer pool
    180222 17:13:49 InnoDB: highest supported file format is Barracuda.
    180222 17:13:49 InnoDB: Waiting for the background threads to start
    180222 17:13:50 InnoDB: 5.5.46 started; log sequence number 1632912830888
    180222 17:13:50 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
    180222 17:13:50 [Note] - '0.0.0.0' resolves to '0.0.0.0';
    180222 17:13:50 [Note] Server socket created on IP: '0.0.0.0'.
    180222 17:13:50 [Note] Event Scheduler: Loaded 0 events
    180222 17:13:50 [Note] /usr/libexec/mysqld: ready for connections.
    Version: '5.5.46' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL) by Remi
    InnoDB: Dump of the tablespace extent descriptor: len 40; hex 000000000000000200000000061600000000126e00000004ffffffffffffffffffffffffffffbfaa; asc n ;
    InnoDB: Serious error! InnoDB is trying to free page 512
    InnoDB: though it is already marked as free in the tablespace!
    InnoDB: The tablespace free space info is corrupt.
    InnoDB: You may need to dump your InnoDB tables and recreate the whole
    InnoDB: database!
    InnoDB: Please refer to
    InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
    InnoDB: about forcing recovery.
    180222 17:13:50 InnoDB: Assertion failure in thread 2499464080 in file fsp0fsp.c line 3309
    InnoDB: We intentionally generate a memory trap.
    InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
    InnoDB: If you get repeated assertion failures or crashes, even
    InnoDB: immediately after the mysqld startup, there may be
    InnoDB: corruption in the InnoDB tablespace. Please refer to
    InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
    InnoDB: about forcing recovery.
    16:13:50 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed,
    something is definitely wrong and this may fail.
    key_buffer_size=268435456
    read_buffer_size=1048576
    max_used_connections=0
    max_threads=512
    thread_count=0
    connection_count=0
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1314506 K bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.
    Thread pointer: 0x0
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0 thread_stack 0x30000
    /usr/libexec/mysqld(my_print_stacktrace+0x33)[0x842a1f3]
    /usr/libexec/mysqld(handle_fatal_signal+0x42b)[0x82d9d3b]
    [0x7bc420]
    [0x7bc410]
    /lib/libc.so.6(gsignal+0x50)[0x626b10]
    /lib/libc.so.6(abort+0x101)[0x628421]
    /usr/libexec/mysqld[0x85012e7]
    /usr/libexec/mysqld[0x850147e]
    /usr/libexec/mysqld[0x849c0b1]
    /usr/libexec/mysqld[0x84a8a61]
    /usr/libexec/mysqld[0x8561fef]
    /usr/libexec/mysqld[0x85570a9]
    /usr/libexec/mysqld[0x847b082]
    /usr/libexec/mysqld[0x846bf04]
    /usr/libexec/mysqld[0x846dad4]
    /lib/libpthread.so.0[0x50d912]
    /lib/libc.so.6(clone+0x5e)[0x6d347e]
    The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
    information that should help you find out what is causing the crash.
    180222 17:13:50 mysqld_safe Number of processes running now: 0
    180222 17:13:50 mysqld_safe mysqld restarted

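As suggested, dumping everything and restoring from those dumps is how I ended up recovering all databases. Fortunately, innodb_force_recovery = 2 let me dump everything without errors, so I did not have to fall back on the dumps from our backups. Of course, I would rather find the real cause of the error, but MySQL offers no help beyond the "tablespace free space info is corrupt" message in the error log. Without identifying and eliminating the cause, I expect it will eventually happen again, and our customer may be even more annoyed than I am. Bad hardware might be the culprit, but the SMART data of all disks in the system looks fine, and /var/log/messages contains nothing suspicious around the time of the crash. There was also no unexpected power loss or reboot.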
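The error message itself tells you why verifying table data does not catch the problem: mysqlcheck examines each table's data and indexes, not the tablespace's free-space bookkeeping. InnoDB is trying to free a page (presumably one belonging to a table or index), but that page is already marked as free. In other words, one of your tables or indexes is using a page that InnoDB considers available to hand out to another table or index. And, obviously, if InnoDB does not know which pages are actually free, bad things will happen.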
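To make the "already marked as free" claim concrete, the 40-byte extent descriptor dumped in the error log can be decoded by hand. The following is only a sketch: the field layout (8-byte segment id, 12-byte list node, 4-byte state, 16-byte bitmap with two bits per page, free bit first), the state names, and the 64-pages-per-extent figure are assumptions taken from my reading of the InnoDB 5.5 source (fsp0fsp.c) with the default 16 KB page size. The function name `decode_xdes` is made up for illustration.

```python
import struct

# Assumed layout of a 40-byte InnoDB extent descriptor (5.5, 16 KB pages).
# These offsets and names are assumptions, not taken from the original post.
XDES_SIZE = 40
XDES_STATE_OFFSET = 20     # after 8-byte segment id and 12-byte list node
XDES_BITMAP_OFFSET = 24
BITS_PER_PAGE = 2          # bit 0 of each pair = "free", bit 1 = "clean"
PAGES_PER_EXTENT = 64
STATES = {1: "FREE", 2: "FREE_FRAG", 3: "FULL_FRAG", 4: "FSEG"}


def decode_xdes(hex_dump):
    """Decode the hex string printed after 'Dump of the tablespace extent descriptor'."""
    raw = bytes.fromhex(hex_dump)
    if len(raw) != XDES_SIZE:
        raise ValueError("expected a 40-byte extent descriptor")
    (segment_id,) = struct.unpack(">Q", raw[0:8])
    (state,) = struct.unpack(">I", raw[XDES_STATE_OFFSET:XDES_BITMAP_OFFSET])
    bitmap = raw[XDES_BITMAP_OFFSET:]
    free = []
    for page in range(PAGES_PER_EXTENT):
        bit = page * BITS_PER_PAGE            # index of this page's "free" bit
        free.append(bool((bitmap[bit // 8] >> (bit % 8)) & 1))
    return {
        "segment_id": segment_id,
        "state": STATES.get(state, "UNKNOWN"),
        "free": free,
    }


# The descriptor from the error log, split into 8-byte groups for readability.
dump = (
    "0000000000000002" "0000000006160000" "0000126e00000004"
    "ffffffffffffffff" "ffffffffffffbfaa"
)
info = decode_xdes(dump)
used = [i for i, is_free in enumerate(info["free"]) if not is_free]
print("segment:", info["segment_id"], "state:", info["state"])
print("offset of page 512 in its extent:", 512 % PAGES_PER_EXTENT)
print("page at offset 0 marked free:", info["free"][0])
print("offsets marked used:", used)
```

Under these assumptions the extent belongs to a segment, only the last five page slots are marked used, and the slot for page 512 (offset 0 in its extent) already has its free bit set, which is exactly the inconsistency the assertion reports.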
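Dumping all table and index data and reloading it gives InnoDB the chance to rebuild its set of free pages. Ideally, you would do this on a fresh database. Why? Well, you should never assume there is only one, isolated corruption error. If you move the data to a fresh installation, you do not have to worry about other, undetected corruption.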
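With more than 100 databases, the dump step is easier to manage one database at a time, which also shows which database (if any) fails to dump. A minimal sketch, assuming the server is running with innodb_force_recovery set, mysqldump is on the PATH, and credentials come from an option file such as ~/.my.cnf. The helper names and the output directory are hypothetical.

```python
import subprocess
from pathlib import Path

# Virtual schemas that are not restored from a dump.
SKIPPED_SCHEMAS = {"information_schema", "performance_schema"}


def build_dump_command(database, out_dir):
    """Build a mysqldump invocation that writes one .sql file per database."""
    out_file = Path(out_dir) / f"{database}.sql"
    return [
        "mysqldump",
        "--routines",
        "--triggers",
        "--databases", database,
        f"--result-file={out_file}",
    ]


def dump_each_database(databases, out_dir, run=subprocess.run):
    """Dump every database separately; return the names whose dump failed."""
    failed = []
    for database in databases:
        if database in SKIPPED_SCHEMAS:
            continue
        result = run(build_dump_command(database, out_dir))
        if result.returncode != 0:
            failed.append(database)
    return failed
```

Calling `dump_each_database(["shop", "blog"], "/backup/dumps")` (both names and the path are placeholders) returns the databases that could not be dumped. An empty list means every dump succeeded and the files can be loaded into the fresh installation.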