Mysql

MariaDB 記憶體峰值和崩潰

  • February 8, 2021

我們在 GKE 上執行 MariaDB 10.5.8 伺服器,RAM 為 16Gb。伺服器每天有多次意外的記憶體使用峰值導致伺服器崩潰

1天記憶體使用圖 (橙色線是 k8s 請求的 ram)

一些額外的細節

  • 伺服器有 13.4 GB 的可用記憶體(不包括 mysql)
  • 即使在安靜的日子裡也會發生(比如今天)
  • QPS:~150(5% 更新,3% 插入)
  • 平均連接數 50-150
  • 沒有異常的網路流量
  • slow_query_log沒有顯示任何有用的東西

我在這裡想念什麼?伺服器記憶體不足怎麼辦?

下一步將啟用general_log並嘗試查看我是否可以捕捉到崩潰前發生的情況。

配置

[mysqld]
skip-name-resolve
explicit_defaults_for_timestamp
character-set-server=UTF8
collation-server=utf8_general_ci
sql_mode=TRADITIONAL

innodb_buffer_pool_size=4G

tmp_table_size=32M
max_heap_table_size=32M

net_read_timeout=1800
net_write_timeout=1800

max_connections=300
open_files_limit=8192

預期的最大記憶體使用量

SELECT @@innodb_buffer_pool_size/1024/1024 as cur_buf, ROUND(
   ( @@GLOBAL.key_buffer_size                     
    + @@GLOBAL.query_cache_size 
    + @@GLOBAL.tmp_table_size 
    + @@GLOBAL.innodb_buffer_pool_size
    + @@GLOBAL.innodb_log_buffer_size 
    + @@GLOBAL.max_connections * ( 
        @@GLOBAL.sort_buffer_size
      + @@GLOBAL.read_buffer_size 
      + @@GLOBAL.read_rnd_buffer_size 
      + @@GLOBAL.join_buffer_size 
      + @@GLOBAL.thread_stack 
      + @@GLOBAL.binlog_cache_size)
   ) / 1024 / 1024, 1) `total MB`;


#cur_buf: 4096.00000000
# total MB: 5155.4

目前總索引大小

SELECT sum( ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2)) size_in_mb 
FROM mysql.innodb_index_stats 
WHERE stat_name = 'size' AND index_name != 'PRIMARY' ORDER BY `size_in_mb` DESC

# size_in_mb  6471.11

編輯

更新狀態 2021-02-08

……哦,哎呀!有一些以前沒有的問題!…

don't see a command prompt, try pressing enter.
[--] Status: +ARCHIVE +Aria +BLACKHOLE +CSV +InnoDB +MEMORY +MRG_MyISAM +MyISAM +PERFORMANCE_SCHEMA +SEQUENCE
[--] Data in InnoDB tables: 20.6G (Tables: 1680)
[OK] Total fragmented tables: 0

-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.

-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[--] There are 620 basic passwords in the list.

-------- CVE Security Recommendations --------------------------------------------------------------
[OK] NO SECURITY CVE FOUND FOR YOUR VERSION

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 10h 14m 26s (4M q [128.720 qps], 295K conn, TX: 97G, RX: 1G)
[--] Reads / Writes: 89% / 11%
[--] Binary logging is disabled
[--] Physical Memory     : 13.7G
[--] Max MySQL memory    : 8.8G
[--] Other process memory: 0B
[--] Total buffers: 3.3G global + 18.9M per thread (300 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 5.6G (41.01% of installed RAM)
[OK] Maximum possible memory usage: 8.8G (64.64% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (19/4M)
[OK] Highest usage of available connections: 41% (125/300)
[OK] Aborted connections: 0.00%  (3/295567)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (6 temp sorts / 359K sorts)
[!!] Joins performed without indexes: 1244
[!!] Temporary tables created on disk: 54% (76K on disk / 140K total)
[OK] Thread cache hit rate: 99% (125 created / 295K connections)
[OK] Table cache hit rate: 27% (1K open / 6K opened)
[!!] table_definition_cache(400) is lower than number of tables(1882)
[OK] Open file limit used: 0% (16/32K)
[OK] Table locks acquired immediately: 100% (7K immediate / 7K locks)

-------- Performance schema ------------------------------------------------------------------------
[--] Performance schema is disabled.
[--] Memory used by P_S: 0B
[--] Sys schema isn't installed.

-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 4 thread(s).
[--] Using default value is good enough for your version (10.5.8-MariaDB)

-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 18.2% (24M used / 134M cache)
[OK] Key buffer size / total MyISAM indexes: 128.0M/4.0K

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 3.0G/20.6G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (3.125 %): 96.0M * 1/3.0G should be equal to 25%
[--] Number of InnoDB Buffer Pool Chunk : 24 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 99.99% (11893153063 hits/ 11894346836 total)
[!!] InnoDB Write Log efficiency: 21.89% (50454 hits/ 230456 total)
[OK] InnoDB log waits: 0.00% (0 waits / 280910 writes)

-------- Aria Metrics ------------------------------------------------------------------------------
[--] Aria Storage Engine is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/2.4M
[!!] Aria pagecache hit rate: 93.2% (1M cached / 77K reads)

-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.

-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.

-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.

-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: MIXED
[--] XA support enabled: ON
[--] Semi synchronous replication Master: OFF
[--] Semi synchronous replication Slave: OFF
[--] This is a standalone server

-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
   MySQL was started within the last 24 hours - recommendations may be inaccurate
   We will suggest raising the 'join_buffer_size' until JOINs not using indexes are found.
            See https://dev.mysql.com/doc/internals/en/join-buffer-size.html
            (specially the conclusions at the bottom of the page).
   When making adjustments, make tmp_table_size/max_heap_table_size equal
   Reduce your SELECT DISTINCT queries which have no LIMIT clause
   Performance schema should be activated for better diagnostics
   Consider installing Sys schema from https://github.com/mysql/mysql-sys for MySQL
   Consider installing Sys schema from https://github.com/FromDual/mariadb-sys for MariaDB
   Before changing innodb_log_file_size and/or innodb_log_files_in_group read 
Variables to adjust:
   join_buffer_size (> 256.0K, or always use indexes with JOINs)
   tmp_table_size (> 32M)
   max_heap_table_size (> 32M)
   table_definition_cache(400) > 1882 or -1 (autosizing if supported)
   performance_schema = ON enable PFS
   innodb_buffer_pool_size (>= 20.6G) if possible.
   innodb_log_file_size should be (=768M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.

=====================================
2021-02-08 13:58:40 0x7ff6b3d11700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 11 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 20876 srv_active, 0 srv_shutdown, 16149 srv_idle
srv_master_thread log flush and writes: 37025
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 10455
OS WAIT ARRAY INFO: signal count 11626
RW-shared spins 2112, rounds 12450, OS waits 102
RW-excl spins 2416, rounds 5720, OS waits 122
RW-sx spins 146, rounds 1352, OS waits 17
Spin rounds per wait: 5.89 RW-shared, 2.37 RW-excl, 9.26 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 1707308132
Purge done for trx's n:o < 1707308131 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 422180334491048, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334538304, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334516824, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334534008, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334512528, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334503936, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334525416, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334508232, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334478160, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334529712, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334521120, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334499640, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334495344, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334486752, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334482456, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334473864, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334469568, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334465272, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: (null) ((null))
I/O thread 1 state: (null) ((null))
I/O thread 2 state: (null) ((null))
I/O thread 3 state: (null) ((null))
I/O thread 4 state: (null) ((null))
I/O thread 5 state: (null) ((null))
I/O thread 6 state: (null) ((null))
I/O thread 7 state: (null) ((null))
I/O thread 8 state: (null) ((null))
I/O thread 9 state: (null) ((null))
Pending normal aio reads:
Pending flushes (fsync) log: 0; buffer pool: 0
1196036 OS file reads, 362093 OS file writes, 291004 OS fsyncs
0.27 reads/s, 16384 avg bytes/read, 6.73 writes/s, 6.73 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 817, free list len 41154, seg size 41972, 2293 merges
merged operations:
insert 9206, delete mark 105324, delete 196
discarded operations:
insert 0, delete mark 0, delete 0
0.00 hash searches/s, 111476.96 non-hash searches/s
---
LOG
---
Log sequence number 966315111123
Log flushed up to   966315111015
Pages flushed up to 966313325596
Last checkpoint at  966304251703
0 pending log flushes, 0 pending chkp writes
283193 log i/o's done, 6.73 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 3254779904
Dictionary memory allocated 30676992
Buffer pool size   193560
Free buffers       88
Database pages     193472
Old database pages 71422
Modified db pages  688
Percent of dirty pages(LRU & free pages): 0.355
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 2028651, not young 55580837
0.00 youngs/s, 0.55 non-youngs/s
Pages read 1194330, created 25139, written 78328
0.27 reads/s, 3.00 creates/s, 0.00 writes/s
Buffer pool hit rate 999 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 193472, unzip_LRU len: 0
I/O sum[1443]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 118319, updated 165195, deleted 169952, read 9369336877
3.64 inserts/s, 2.55 updates/s, 0.18 deletes/s, 172245.89 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 28810
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

好吧,我想我偶然發現了這個問題……thread_cache_size

[OK] Thread cache hit rate: 99% (125 created / 295K connections)

預設thread_cache_size設置為256,在我們的例子中,MariaDB 處理兩種類型的負載,來自應用程序和客戶端的正常 SQL,以及偶爾的深度和復雜的分析查詢。

似乎自從連接完成後,執行緒將被保留以cache供將來使用,這導致執行緒不釋放它們的記憶體。

我不知道是否有另一種方式向 MariaDB 表明它應該更頻繁地回收其執行緒,但與此同時,降低thread_cache_size 解決了這個問題

引用自:https://serverfault.com/questions/1052679