This was one of the rarer issues I have encountered: the MySQL server on one of our client's servers was restarting quite frequently, at random intervals. The log entries for each restart were nearly identical, showing a forced process termination, and in most cases the server recovered on its own. This is quite different from a crash or filesystem error, where automatic recovery is uncommon. Let us see how to deal with this frequent MySQL restart issue.
The corresponding log entries (e.g., from the latest restart) read as follows:
131113 14:20:50 mysqld_safe Number of processes running now: 0
131113 14:20:50 mysqld_safe mysqld restarted
131113 14:20:50 [Note] Plugin 'FEDERATED' is disabled.
131113 14:20:50 InnoDB: Initializing buffer pool, size = 4.0G
131113 14:20:51 InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn 4142899941524
131113 14:20:51 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files…
The error log message "mysqld_safe Number of processes running now: 0" means that mysqld_safe found no running mysqld process and restarted it; in other words, the mysqld process had been terminated, which usually points to a resource crunch.
Free memory on the server was neither abundant nor alarmingly low. The MySQL server was poorly optimized, so I went ahead and tuned it for better resource usage. This reduced the restart frequency, and the server reported an uptime of more than 24 hours, a significant improvement over the earlier uptimes of 30-40 minutes.
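The exact tuning depends on the workload; as a rough sketch only (the values below are illustrative, not the ones applied on this server), typical MySQL 5.1-era parameters to revisit in /etc/my.cnf include:

```ini
[mysqld]
# Illustrative values only -- size these from the server's actual RAM and workload
innodb_buffer_pool_size = 4G     # matches the 4.0G pool seen in the startup log
table_open_cache        = 2048
tmp_table_size          = 64M
max_heap_table_size     = 64M
max_connections         = 200
```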
During the monitoring phase, we noted that mysqld was consuming a great deal of CPU, with usage crossing 500% at times. Meanwhile, MySQL queries were stacking up in the process list with long sleep times.
The slow query log /var/lib/mysql/db-slow.log contained a large number of entries, indicating that some databases as well as individual queries needed optimization.
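mysqldumpslow, which ships with MySQL, can summarize this file (e.g. `mysqldumpslow -s t /var/lib/mysql/db-slow.log` to sort by query time). A dependency-free sketch of the same grouping idea, run here against a fabricated two-entry sample rather than the real log:

```shell
# Fabricated sample in the (simplified) 5.1 slow-log format
cat > /tmp/db-slow.sample <<'EOF'
# Time: 131113 14:15:02
# Query_time: 12.4  Lock_time: 0.0  Rows_sent: 1  Rows_examined: 900000
SELECT * FROM orders WHERE status = 'open';
# Time: 131113 14:16:10
# Query_time: 8.1  Lock_time: 0.0  Rows_sent: 5  Rows_examined: 700000
SELECT * FROM orders WHERE status = 'open';
EOF

# Group identical statements and count them, most frequent first
grep -v '^#' /tmp/db-slow.sample | sort | uniq -c | sort -rn
```

This coarse exact-text grouping is enough to spot the dominant offenders; mysqldumpslow additionally abstracts literal values so near-identical queries group together.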
To identify a resource crunch, or a service termination by the Out Of Memory (OOM) killer, the syslog was checked:
egrep -i "oom|kill|mysql" /var/log/messages
Nov 13 12:52:49 db1 prm(303104): process 280704 exceeded resource limits, killed.
Nov 13 12:52:50 db1 prm(303104): check /usr/local/prm/killed/280704 for process specific information.
Nov 13 13:32:48 db1 prm(375464): process 314307 exceeded resource limits, killed.
Nov 13 13:32:48 db1 prm(375464): check /usr/local/prm/killed/314307 for process specific information.
Nov 13 14:20:49 db1 prm(450579): process 387718 exceeded resource limits, killed.
Nov 13 14:20:49 db1 prm(450579): check /usr/local/prm/killed/387718 for process specific information.
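These kill messages follow a fixed pattern, so the kill time and victim PID can be pulled out in one awk pass; the sample file below simply reproduces the lines above:

```shell
# Reproduce the PRM kill lines from the syslog excerpt
cat > /tmp/prm.sample <<'EOF'
Nov 13 12:52:49 db1 prm(303104): process 280704 exceeded resource limits, killed.
Nov 13 13:32:48 db1 prm(375464): process 314307 exceeded resource limits, killed.
Nov 13 14:20:49 db1 prm(450579): process 387718 exceeded resource limits, killed.
EOF

# Field 3 is the timestamp, field 7 is the killed PID
awk '/exceeded resource limits/ {print $3, "PID", $7}' /tmp/prm.sample
```

Each extracted timestamp lines up with one of the MySQL restarts, including the 14:20:49 kill immediately preceding the 14:20:50 restart in the MySQL error log.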
[~]# mysqladmin version
mysqladmin  Ver 8.42 Distrib 5.1.70, for unknown-linux-gnu on x86_64
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.
Server version 5.1.70-cll
Protocol version 10
Connection Localhost via UNIX socket
UNIX socket /var/lib/mysql/mysql.sock
Uptime: 4 hours 11 min 32 sec
Threads: 10 Questions: 6679219 Slow queries: 166 Opens: 380202 Flush tables: 1 Open tables: 2048 Queries per second avg: 442.566
[~]# date
Wed Nov 13 18:32:27 PST 2013
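The uptime and the wall clock line up with the last restart: subtracting 4 h 11 min 32 s from 18:32:27 lands within seconds of the 14:20:50 restart in the MySQL log. In shell arithmetic:

```shell
# Seconds since midnight at the "date" output, minus the reported uptime
now=$(( 18*3600 + 32*60 + 27 ))   # 18:32:27
up=$((   4*3600 + 11*60 + 32 ))   # 4 hours 11 min 32 sec
start=$(( now - up ))
printf '%02d:%02d:%02d\n' $((start/3600)) $((start%3600/60)) $((start%60))
# prints 14:20:55, a few seconds after PRM's kill at 14:20:49
```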
Nov 13 14:20:49 db1 prm(450579): process 387718 exceeded resource limits, killed.
Nov 13 14:20:49 db1 prm(450579): check /usr/local/prm/killed/387718 for process specific information.
# cat /usr/local/prm/killed/387718
USER:  mysql
PID:   387718
CMD:   /usr/sbin/mysqld
CPU%:  407 (limit: 85)
MEM%:  10 (limit: 55)
PROCS: 1 (limit: 75)
This confirms that the killed process was mysqld and that the kill was executed by PRM (Process Resource Monitor): the MySQL service was terminated for exceeding the CPU limit (407% against a limit of 85%). The MySQL uptime is perfectly in sync with the termination time logged by PRM, i.e. Nov 13 14:20:49. From this data it is evident that PRM kills any process that exceeds its configured thresholds, and with limits this low for a busy database server, the kills themselves become the bottleneck, repeatedly restarting MySQL.
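The fix, therefore, is on the PRM side rather than in MySQL: raise the CPU threshold to something a busy database server can legitimately reach, or exempt mysqld from PRM altogether. The file name below is an assumption based on a typical rfxn PRM install (only the /usr/local/prm path is confirmed by the logs above); verify against the actual installation before changing anything.

```shell
# Hypothetical: inspect PRM's main configuration for the CPU/MEM/PROC
# thresholds that produced "limit: 85" in the kill report above.
cat /usr/local/prm/conf.main
# Then either raise the CPU ceiling for this host, or add mysqld to PRM's
# ignore/whitelist mechanism (if the installed version provides one) so
# that legitimate database load is no longer killed.
```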