What can cause mysqlcheck to wrongly report tables as not corrupt?

March 24, 2021

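We manage a MySQL server for one of our customers, who has more than 100 databases with about 50 tables each, many of them InnoDB tables. The server crashed, and I am trying to find the culprit. When I restart it with innodb_force_recovery = 2, I can connect and see no errors in error.log. What's more, mysqlcheck --all-databases reports "OK" for every table. But when I remove innodb_force_recovery, the server crashes again, writes a stack trace to error.log, and can only be stopped with kill -9.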

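Under these circumstances, how should I go about finding the offending database, and what could cause mysqlcheck to miss corrupted tables? Please don't tell me to ignore it and restore all databases from dumps. That might be acceptable for one or two databases, and if it happened only once in a blue moon, but I have run into problems on this same server more than once, and restoring everything from dumps takes too much time and manual work to do every time.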

The server version is 5.5.46 and innodb_file_per_table is active.

Excerpt from error.log, as requested (does "The tablespace free space info is corrupt" mean that the error is not in a specific table and cannot be corrected?):

180222 17:13:48 mysqld_safe Starting mysqld daemon with databases from /home/mysql
180222 17:13:48 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
180222 17:13:48 [Note] /usr/libexec/mysqld (mysqld 5.5.46) starting as process 26242 ...
180222 17:13:48 [Note] Plugin 'FEDERATED' is disabled.
180222 17:13:48 InnoDB: The InnoDB memory heap is disabled
180222 17:13:48 InnoDB: Mutexes and rw_locks use InnoDB's own implementation
180222 17:13:48 InnoDB: Compressed tables use zlib 1.2.3
180222 17:13:48 InnoDB: Using Linux native AIO
180222 17:13:48 InnoDB: Initializing buffer pool, size = 128.0M
180222 17:13:49 InnoDB: Completed initialization of buffer pool
180222 17:13:49 InnoDB: highest supported file format is Barracuda.
180222 17:13:49  InnoDB: Waiting for the background threads to start
180222 17:13:50 InnoDB: 5.5.46 started; log sequence number 1632912830888
180222 17:13:50 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
180222 17:13:50 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
180222 17:13:50 [Note] Server socket created on IP: '0.0.0.0'.
180222 17:13:50 [Note] Event Scheduler: Loaded 0 events
180222 17:13:50 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.46'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL) by Remi
InnoDB: Dump of the tablespace extent descriptor:  len 40; hex 000000000000000200000000061600000000126e00000004ffffffffffffffffffffffffffffbfaa; asc                    n                    ;
InnoDB: Serious error! InnoDB is trying to free page 512
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
InnoDB: You may need to dump your InnoDB tables and recreate the whole
InnoDB: database!
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180222 17:13:50  InnoDB: Assertion failure in thread 2499464080 in file fsp0fsp.c line 3309
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:13:50 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=0
max_threads=512
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1314506 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/libexec/mysqld(my_print_stacktrace+0x33)[0x842a1f3]
/usr/libexec/mysqld(handle_fatal_signal+0x42b)[0x82d9d3b]
[0x7bc420]
[0x7bc410]
/lib/libc.so.6(gsignal+0x50)[0x626b10]
/lib/libc.so.6(abort+0x101)[0x628421]
/usr/libexec/mysqld[0x85012e7]
/usr/libexec/mysqld[0x850147e]
/usr/libexec/mysqld[0x849c0b1]
/usr/libexec/mysqld[0x84a8a61]
/usr/libexec/mysqld[0x8561fef]
/usr/libexec/mysqld[0x85570a9]
/usr/libexec/mysqld[0x847b082]
/usr/libexec/mysqld[0x846bf04]
/usr/libexec/mysqld[0x846dad4]
/lib/libpthread.so.0[0x50d912]
/lib/libc.so.6(clone+0x5e)[0x6d347e]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
180222 17:13:50 mysqld_safe Number of processes running now: 0
180222 17:13:50 mysqld_safe mysqld restarted

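As suggested, dumping and restoring from those dumps is how I finally chose to recover all databases. Fortunately, innodb_force_recovery = 2 allowed me to dump everything without errors, so I did not have to use the dumps from the backup. Of course, I would rather have found the real cause of the error, but MySQL offers no help beyond reporting that "the tablespace free space info is corrupt". Without identifying and eliminating the cause, I expect it to happen again eventually, and our customer will probably be even more annoyed than I am.
Maybe bad hardware is the culprit, but the SMART data for all disks in the system looks fine, and /var/log/messages contains nothing suspicious around the time of the crash. There was no unexpected power loss or reboot either.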

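The error message itself tells you why verifying table data does not catch the problem. InnoDB was trying to free a page (presumably from a table or index), but that page was already marked as free. In other words, one of your tables or indexes was using a page that InnoDB considered available to hand out to another table or index. And, clearly, bad things will happen if InnoDB does not know which pages are actually free.
Dumping all table and index data and reloading it gives InnoDB the chance to rebuild its collection of free pages. Ideally, you would do this on a fresh database installation. Why? Well, you should never assume that there is only one single, isolated corruption error. If you move the data to a brand-new installation, you do not have to worry about whether other, undetected corruption exists.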
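
The dump step could look roughly like the following sketch, which writes one file per database so that each can be reloaded separately into the fresh installation. It assumes the server is currently running with innodb_force_recovery set and that client credentials come from ~/.my.cnf. DUMP_DIR and the function names are hypothetical and not from the original post, and database names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Sketch only: dump each database to its own file so that databases can be
# reloaded one at a time into a fresh MySQL installation.
# Assumptions (not from the original post): the server is running with
# innodb_force_recovery set, credentials come from ~/.my.cnf, and DUMP_DIR
# is a hypothetical target directory.

DUMP_DIR="${DUMP_DIR:-/var/backups/mysql-recovery}"

# Read database names on stdin and drop the two virtual schemas.
# The mysql system schema is kept; remove it from the list by hand if
# accounts will be recreated manually on the new server.
filter_user_databases() {
    grep -vxF -e information_schema -e performance_schema
}

# Print the dump file path for one database.
dump_path() {
    printf '%s/%s.sql\n' "$DUMP_DIR" "$1"
}

# Dump every remaining database and stop at the first failure, so that a
# database that cannot be read is easy to spot.
# Assumes database names contain no whitespace.
dump_all() {
    mkdir -p "$DUMP_DIR" || return 1
    all_dbs=$(mysql --batch --skip-column-names -e 'SHOW DATABASES') || return 1
    for db in $(printf '%s\n' "$all_dbs" | filter_user_databases); do
        echo "dumping $db" >&2
        mysqldump --routines --databases "$db" > "$(dump_path "$db")" || {
            echo "dump of $db failed" >&2
            return 1
        }
    done
}
```

After sourcing the file, calling dump_all writes one .sql file per database into DUMP_DIR. Each file can then be loaded into the new instance with the mysql client, and a dump that fails points at a database worth a closer look.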

Source: https://serverfault.com/questions/898439
