If you monitor large environments, you may be using Mod_Gearman to spread the execution of checks across multiple worker nodes.
Unfortunately, I ran into a problem: the Gearman Job Server consumed 100% of my CPU time as soon as more than 450 workers tried to connect to it, and gearadmin --status hung and no longer returned any information.
In the log file /var/log/gearman-job-server/gearman.log I found the following message:
ERROR 2015-04-14 22:02:54.000000 [ main ] accept(Too many open files) -> libgearman-server/gearmand.cc:788
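By default, Linux typically limits each process to 1024 open files, which is too low for busy MySQL servers or the Gearman Job Server. Every network connection counts as an open file, so a server with many clients eventually cannot accept new connections.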
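Before changing anything, it can help to see how many file descriptors a process actually uses compared to its limit. The following is only a minimal sketch: the helper name open_fd_count is made up for this example, it relies on the Linux /proc filesystem, and you need root (or the same user) to inspect another user's process.

```shell
#!/bin/sh
# Count the file descriptors a process currently has open.
# (Hypothetical helper for illustration; reads the Linux /proc filesystem.)
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Soft and hard open-file limits of the current shell.
echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"

# Example: inspect this shell itself; replace $$ with the PID of gearmand.
pid=$$
echo "PID $pid has $(open_fd_count "$pid") open file descriptors"
```

If the count for the Gearman Job Server is close to the soft limit, the "Too many open files" error above is the expected result.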
To fix this issue, you need to increase this limit.
If your system uses SysV init scripts, edit the file /etc/init.d/gearman-job-server like this:
# Description: Enable gearman job server
### END INIT INFO
ulimit -n 16384 # <--- Add this line
prefix=/usr
exec_prefix=${prefix}
And restart: /etc/init.d/gearman-job-server restart
If your system uses Upstart, edit the file /etc/init/gearman-job-server.conf like this:
respawn
limit nofile 16384 16384 # <--- Add this line
exec start-stop-daemon --start --chuid gearman --exec ...
And restart: service gearman-job-server restart
If your system uses systemd, edit the file /etc/systemd/system/multi-user.target.wants/gearman-job-server.service like this:
PIDFile=/run/gearman/server.pid
LimitNOFILE=16384
ExecStart=/usr/sbin/gearmand --listen=127.0.0.1 ...
And restart:
systemctl daemon-reload
systemctl restart gearman-job-server
To make sure that the new limit is active, check the file /proc/$PID$/limits, where $PID$ is the process ID of the running Gearman Job Server:
root@ubuntu-dev:~# cat /proc/2945/limits | grep -i 'max open files'
Max open files 16384 16384 files
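If you want to automate this check, for example after a package upgrade has replaced the init script, something like the following could work. It is a rough sketch, not a tested monitoring plugin: the function name check_nofile is invented for this example, and it assumes the usual layout of the limits file, where the soft limit is the fourth field of the "Max open files" line.

```shell
#!/bin/sh
# check_nofile LIMITS_FILE MINIMUM
# Succeeds if the soft "Max open files" value in LIMITS_FILE is at least MINIMUM.
# (Hypothetical helper; assumes the standard /proc/<pid>/limits layout.)
check_nofile() {
    soft=$(awk '/^Max open files/ { print $4 }' "$1")
    # "unlimited" always satisfies the minimum.
    [ "$soft" = "unlimited" ] && return 0
    # Fail if the line was missing, otherwise compare numerically.
    [ -n "$soft" ] && [ "$soft" -ge "$2" ]
}

# Example usage (PID 2945 is the one from the output above):
# check_nofile /proc/2945/limits 16384 && echo "limit OK"
```

The function takes the path to the limits file instead of a PID, so it can also be tried against a saved copy of the file.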
If you are interested, here is a screenshot of the system: