Linux Watchdog - Automatic Recovery for Unresponsive Systems

Introduction

Any Linux system — from a small edge device to a large server — can unexpectedly freeze. When that happens on a remote machine, the only “fix” might be a manual reboot, which isn’t much help if you’re nowhere near the hardware.

The Linux Watchdog is a built-in safety mechanism that can automatically restart an unresponsive system, reducing downtime and eliminating the need for someone to physically press the reset button.

1. How the Watchdog Works

A watchdog is essentially a countdown timer that must be regularly reset (“kicked”) by the system. If the timer expires without a kick, it assumes the system has crashed and performs a reboot.
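The countdown logic can be sketched in a few lines of shell. This is only an illustration of the idea, not how the kernel implements it; the function name `watchdog_expired` and its arguments are hypothetical.

```shell
# Illustrative only: decide whether a watchdog would have fired.
# usage: watchdog_expired LAST_KICK NOW TIMEOUT   (all values in seconds)
watchdog_expired() {
    # The timer fires once TIMEOUT seconds have passed since the last kick.
    [ $(( $2 - $1 )) -ge "$3" ]
}

# Example: last kick at t=100, timeout of 60 seconds.
# At t=130 the timer is still running; at t=160 it has expired.
```

Every kick simply moves `LAST_KICK` forward, which is why a healthy system never reaches the timeout.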

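Two main types exist: hardware watchdogs, which are dedicated timer circuits that can reset the machine even when the kernel itself has locked up, and software watchdogs (such as the kernel's softdog module), which run inside the kernel and therefore cannot recover from a complete kernel hang.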


2. Identifying Your Watchdog Devices

Watchdog devices appear in /dev. To check:

ls -l /dev | grep watchdog

#Example output:
crw-------  1 root root  10, 130 Aug 14 12:34 watchdog
crw-------  1 root root 252,   0 Aug 14 12:34 watchdog0

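What these mean:

/dev/watchdog is the legacy device node (a misc device with major 10, minor 130). /dev/watchdog0 is the first device registered with the kernel's watchdog framework. On most systems both nodes refer to the same timer.

You cannot read these files with cat, and that is expected: they are character-device control interfaces, not text files.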


3. Manually Using the Watchdog

To start the watchdog timer (run as root; opening the device arms it):

echo 1 > /dev/watchdog

To keep it alive, feed it at intervals:

while true; do
    echo 1 > /dev/watchdog
    sleep 10
done
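A variation on this loop keeps the device open on one file descriptor instead of reopening it for every kick, and writes the magic character V before closing. The sketch below assumes a hypothetical helper named `feed_watchdog` that takes the device path, the interval and a kick count, so it can be tried against an ordinary file before pointing it at /dev/watchdog.

```shell
# Sketch: feed a watchdog through one open file descriptor.
# usage: feed_watchdog DEVICE INTERVAL_SECONDS KICKS
feed_watchdog() {
    dev=$1
    interval=$2
    kicks=$3

    # Open the device once on fd 3; for a real watchdog this arms the timer.
    exec 3>"$dev" || return 1

    i=0
    while [ "$i" -lt "$kicks" ]; do
        printf '.' >&3          # any write counts as a keep-alive
        sleep "$interval"
        i=$((i + 1))
    done

    # Magic close: 'V' as the last write asks the driver to disarm the timer.
    printf 'V' >&3
    exec 3>&-
}

# Example (as root, on a test machine): six kicks, 10 seconds apart.
# feed_watchdog /dev/watchdog 10 6
```

Whether the final V actually stops the timer depends on the driver; with the nowayout option enabled the watchdog keeps running after the close.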

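To stop it before the timeout expires, write the magic character V (this has no effect if the driver was built or loaded with the nowayout option):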

echo V > /dev/watchdog

⚠️ If you start the watchdog and do not feed it, your system will reboot when the timer runs out.


4. Automating with the Watchdog Daemon

Linux provides a ready-to-use watchdog daemon that can monitor multiple health indicators. On Debian or Ubuntu, install it with:

sudo apt update
sudo apt install watchdog

Then enable and start the service:

sudo systemctl enable watchdog
sudo systemctl start watchdog

5. Configuring /etc/watchdog.conf

Edit the configuration file:

sudo nano /etc/watchdog.conf

Example:


# Watchdog device to open
watchdog-device = /dev/watchdog

# Seconds between keep-alive writes (must be shorter than the device timeout)
interval = 10

# Reboot if the 1-minute load average exceeds this threshold
max-load-1 = 30

# Minimum free virtual memory, in pages (not MB)
min-memory = 2

# Check a specific network interface for incoming traffic
interface = enp1s0

# Lock the daemon into memory and give it real-time scheduling priority
realtime = yes
priority = 1
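Because the daemon must write to the device well before the timer expires, it can help to sanity-check the interval against the device timeout. The sketch below is one way to do that; the helpers `conf_get` and `interval_is_safe` are hypothetical, and the "interval at most half the timeout" rule is a conservative rule of thumb assumed here, not something the daemon enforces.

```shell
# Sketch: read a "key = value" setting from a watchdog.conf-style file.
# usage: conf_get KEY FILE
conf_get() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                      # skip comment lines
        { k = $1; gsub(/[ \t]/, "", k) }         # strip spaces from the key
        k == key {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim the value
            val = v                              # last occurrence wins
        }
        END { if (val != "") print val }
    ' "$2"
}

# Assumed rule of thumb: kick at least twice per timeout period.
# usage: interval_is_safe INTERVAL TIMEOUT   (both in seconds)
interval_is_safe() {
    [ "$1" -gt 0 ] && [ $(( $1 * 2 )) -le "$2" ]
}

# Example:
# interval=$(conf_get interval /etc/watchdog.conf)
# interval_is_safe "$interval" 60 || echo "interval too close to timeout"
```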

After changes:

sudo systemctl restart watchdog

6. Watchdog in /sys/class/watchdog/

The watchdog device exposes runtime settings here:

ls /sys/class/watchdog/

Example output:

watchdog0

Check the current timeout:

cat /sys/class/watchdog/watchdog0/timeout

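In mainline kernels the sysfs timeout attribute is read-only, so it cannot be changed by writing to this file. To set a new timeout (e.g., 45 seconds) when using the watchdog daemon, add this line to /etc/watchdog.conf and restart the service:

watchdog-timeout = 45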
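To inspect several attributes at once, a small helper can print whichever ones are present. This is a sketch: the function name `wd_summary` is hypothetical, and which attributes exist depends on the kernel configuration and the driver, so missing files are skipped silently.

```shell
# Sketch: print the readable attributes of one watchdog sysfs directory.
# usage: wd_summary DIR   (e.g. /sys/class/watchdog/watchdog0)
wd_summary() {
    dir=$1
    for attr in identity state timeout timeleft nowayout; do
        # Not every driver or kernel exposes every attribute.
        if [ -r "$dir/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
        fi
    done
}

# Example:
# wd_summary /sys/class/watchdog/watchdog0
```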

7. Testing the Watchdog

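You can simulate a system hang with a fork bomb, which exhausts process slots and CPU time. Run it only on a disposable test machine, because it will make the system unusable until it reboots: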

:(){ :|:& };:

Once the system becomes unresponsive, the watchdog should trigger a reboot.

8. Service-Level Watchdog with systemd

You can also monitor specific services so they are restarted automatically if they stop responding.

Example service unit:

[Unit]
Description=My Background Service

[Service]
ExecStart=/usr/local/bin/my-service
Restart=on-failure
WatchdogSec=20s

[Install]
WantedBy=multi-user.target

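WatchdogSec tells systemd to expect a keep-alive message (sd_notify with WATCHDOG=1) from the service at least every 20 seconds. If none arrives in time, systemd terminates the service and treats it as failed; Restart=on-failure then starts it again.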
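For a service written as a shell script, the keep-alive can be sent with the systemd-notify command. The sketch below assumes systemd has exported WATCHDOG_USEC to the service (it does so when WatchdogSec is set) and pings at half that period. The function names are hypothetical, and because systemd-notify runs as a child process, the unit may also need NotifyAccess=all for the messages to be accepted.

```shell
# Sketch: derive the ping period (in seconds) from a timeout in microseconds.
# usage: keepalive_seconds WATCHDOG_USEC
keepalive_seconds() {
    usec=${1:-0}
    secs=$(( usec / 2000000 ))      # half the timeout, converted to seconds
    [ "$secs" -ge 1 ] || secs=1     # floor at 1 s; too slow for sub-2 s timeouts
    echo "$secs"
}

# Send WATCHDOG=1 until systemd-notify fails; a no-op outside a watchdog unit.
watchdog_ping_loop() {
    [ -n "${WATCHDOG_USEC:-}" ] || return 0
    pause=$(keepalive_seconds "$WATCHDOG_USEC")
    while systemd-notify WATCHDOG=1; do
        sleep "$pause"
    done
}

# Example: run the pinger in the background next to the real work.
# watchdog_ping_loop &
```

A real service should only send the keep-alive while its main work is actually making progress; a background loop like this one proves only that the script is still alive.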


9. Best Practices

Prefer hardware watchdogs if available. Choose reasonable timeouts — too short can cause false triggers, too long delays recovery.

Monitor logs with:

journalctl -u watchdog

or

# Requires configuring the daemon to log to syslog
grep watchdog /var/log/syslog

Treat watchdogs as a last line of defense, not a substitute for fixing bugs.

Conclusion

The Linux watchdog acts like an invisible hand that presses the reset button when your system freezes. By configuring it properly, you can ensure that even if something goes wrong, recovery happens automatically and with minimal downtime. For remote servers, headless machines, or critical applications, enabling the watchdog can turn a risky single point of failure into a self-healing system.


Thanks for reading!

Guneycan Sanli
