Validator Healthcheck Alerts
TLDR
Simple and lightweight health checker service for DVT operators or home stakers running >10 validator keys. It monitors client-level health and sends alerts via Telegram when a client is unhealthy or unreachable.
Healthchecks.io (Free!) notifies users if the health checker service itself is offline.
Follow me on Twitter or subscribe to my newsletter if you find this useful!
Special thanks to Eridian for sharing about healthchecks.io with me! He also has a more scalable monitoring/alerting setup which you can check out here.
How it works
HTTP & CURL Requests: The service periodically makes an HTTP GET or CURL request to the specified IP address, port, and health or metrics endpoint, e.g. http://127.0.0.1:8008 (CL health endpoint). This asks the client to respond with its "health" status.
Concurrent "check-in" with Healthchecks.io: Each time the service runs successfully, it sends a message to Healthchecks.io, informing it of the successful run.
Service Response: For the health checker to consider the consensus or execution client "up," the client at that IP address and port must respond to the HTTP request. This typically means the client's API or health check endpoint returns an HTTP status code of 200 OK, indicating that the client is operational and can handle requests.
Timeout Handling: The service includes a timeout (e.g., 5 seconds), ensuring that if the client doesn't respond within a reasonable timeframe, it's considered "down." This helps differentiate between an unresponsive service and one that's simply slow to reply.
Send Alert Message: Finally, the script sends a message to the designated chat group for each checked endpoint that is "down" or "cannot be reached".
Self-Reporting: Healthchecks.io notifies the user if the alerting service itself is down.
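The HTTP request, 200 OK, and timeout steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than the project's actual code: the function name check_http and the injectable opener parameter are hypothetical and exist only to keep the example easy to exercise offline.

```python
import http.client
import urllib.error
import urllib.request


def check_http(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return "up" if `url` answers with HTTP 200 within `timeout` seconds.

    Anything else (non-200 status, refused connection, timeout) is "down".
    `opener` is a hypothetical hook so the function can be tested offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return "up" if response.status == 200 else "down"
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # URLError covers refused connections and HTTP error statuses,
        # OSError covers socket timeouts, ValueError covers malformed URLs.
        return "down"
```

Calling check_http("http://127.0.0.1:8008/health") would then return "up" only when the consensus client answers 200 within five seconds.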
How to use
Preparation
Telegram bot token & Chat ID
Open Telegram and search for @BotFather
Start a chat with BotFather by clicking the Start button
Send the command:
/newbot
Follow the prompts:
Name your bot (e.g., MyCoolBot).
Choose a username that ends with bot (e.g., my_cool_bot).
Once created, BotFather will send you a message with your bot token:
123456789:ABCDefGhIJKLMnoPQRStUvWxYZ
Save this token somewhere safe.
Go to your newly created bot on Telegram (search for its username, e.g., @my_cool_bot) and start the bot by clicking the Start button.
Create a new group & add your bot to the group
Send a message in the group (e.g., "Hello, bot!").
Open up a browser and enter the following (Replace YOUR_BOT_TOKEN with your bot's token)
https://api.telegram.org/botYOUR_BOT_TOKEN/getUpdates
Look for the chat object in the response. The id under chat is your group Chat ID (e.g., -987654321). For example:
{
  "ok": true,
  "result": [
    {
      "update_id": 123456789,
      "message": {
        "chat": {
          "id": -987654321,
          "title": "MyGroup",
          "type": "group"
        },
        "text": "Hello, bot!"
      }
    }
  ]
}
You now have your BOT_TOKEN and CHAT_ID.
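If the getUpdates response is long, picking out the chat ID by eye can be error-prone. The following is a small sketch, not part of this repository, that lists every chat found in a response shaped like the example above; the function name extract_chats and the SAMPLE string are made up for illustration.

```python
import json

# Sample response in the same shape as the getUpdates example in this guide.
SAMPLE = """
{"ok": true, "result": [
  {"update_id": 123456789,
   "message": {"chat": {"id": -987654321, "title": "MyGroup", "type": "group"},
               "text": "Hello, bot!"}}
]}
"""


def extract_chats(get_updates_json):
    """Return {chat_id: label} for every chat seen in a getUpdates response."""
    payload = json.loads(get_updates_json)
    chats = {}
    for update in payload.get("result", []):
        chat = update.get("message", {}).get("chat")
        if chat and "id" in chat:
            # Private chats carry no "title", so fall back to the chat type.
            chats[chat["id"]] = chat.get("title", chat.get("type", ""))
    return chats
```

With the sample above, the only entry returned is the group whose id you would use as CHAT_ID.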
Healthchecks.io URL
Go to Healthchecks.io
Sign-up/Log-in with your preferred method (email, GitHub, or Google)
Click on "Add Check" on the dashboard. Select 5 minutes for the Period and 10 minutes for the Grace Time
On the check's configuration page, you’ll find a "Ping URL". It looks like this:
https://hc-ping.com/your-unique-id
Go to your Integrations page and select the Telegram integration: Profile>>Integrations>>Telegram
Enter your Telegram Bot Token and Chat ID
Add the Healthchecks.io bot into your Telegram chat group:
@Healthchecks_io_bot
You now have your HEALTHCHECK_URL and have both your own Telegram bot and the Healthchecks.io bot in your Telegram group.
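To illustrate what a "check-in" against the Ping URL looks like, here is a minimal sketch using the standard library. It assumes the plain Ping URL signals success and that appending /fail signals a failure, as described in the Healthchecks.io pinging documentation; whether this project ever sends the /fail variant is not stated, and the function names are hypothetical.

```python
import urllib.request


def ping_target(healthcheck_url, failed=False):
    """Build the URL to request: the bare Ping URL, or ".../fail" on failure."""
    base = healthcheck_url.rstrip("/")
    return base + "/fail" if failed else base


def send_ping(healthcheck_url, failed=False, timeout=10.0):
    """Send the check-in; return True if the ping endpoint answered HTTP 200."""
    try:
        target = ping_target(healthcheck_url, failed)
        with urllib.request.urlopen(target, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return False
```

If no ping arrives within the Period plus Grace Time configured above, Healthchecks.io treats the check as down and notifies the Telegram group.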
Make sure your monitoring endpoints are accessible
All Users:
SSV node: Enable the health endpoint by adding SSVAPIPort: 16000 into the config.yaml file.
SSV DKG & Obol Charon: No changes needed as we will query the P2P ports for these.
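Querying a P2P port can be as simple as attempting a TCP connection and seeing whether it opens. The sketch below assumes that is what the check amounts to (the source does not spell out the mechanism); tcp_reachable and its connect parameter are hypothetical names chosen for this example.

```python
import socket


def tcp_reachable(host_port, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to "host:port" opens within `timeout`.

    `connect` is a hypothetical hook so the function can be tested offline.
    """
    host, _, port = host_port.rpartition(":")
    try:
        with connect((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: refused, unreachable, or timed out. ValueError: bad port.
        return False
```

For example, tcp_reachable("127.0.0.1:3610") would probe the Obol Charon P2P port used later in the .env file.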
Systemd & EthPillar Users:
Execution clients: Add the --Metrics.Enabled true and --Metrics.ExposePort 6060 or equivalent flags
Consensus client: Add the --metrics and --metrics-port=8008 or equivalent flags
Validator client: Add the --metrics and --metrics-port=8009 or equivalent flags
Notes:
I prefer not to expose the EL's HTTP (8545) or CL's RPC (5052) ports to this service to reduce the risk of malicious code injection
Nevertheless, do not expose the metrics ports to the public internet either
Eth Docker Users:
Map the monitoring ports of your EL, CL, & VC to the host
nano ~/eth-docker/cl-shared.yml
Replace with the following for all CLs:
services:
  consensus:
    ports:
      - ${SHARE_IP:-}:${CL_REST_PORT:-5052}:${CL_REST_PORT:-5052}/tcp
      - ${SHARE_IP:-}:8008:8008/tcp
  validator:
    ports:
      - ${SHARE_IP:-}:8009:8009/tcp
nano ~/eth-docker/el-shared.yml
Replace with the following for Geth EL:
# To be used in conjunction with erigon.yml, nethermind.yml, besu.yml or geth.yml
services:
  execution:
    ports:
      - ${SHARE_IP:-}:${EL_RPC_PORT}:${EL_RPC_PORT:-8545}/tcp
      - ${SHARE_IP:-}:${EL_WS_PORT}:${EL_WS_PORT:-8546}/tcp
      - ${SHARE_IP:-}:6061:6061/tcp
Replace with the following for all other ELs:
# To be used in conjunction with erigon.yml, nethermind.yml, besu.yml or geth.yml
services:
  execution:
    ports:
      - ${SHARE_IP:-}:${EL_RPC_PORT}:${EL_RPC_PORT:-8545}/tcp
      - ${SHARE_IP:-}:${EL_WS_PORT}:${EL_WS_PORT:-8546}/tcp
      - ${SHARE_IP:-}:6060:6060/tcp
Edit the .env file.
nano ~/eth-docker/.env
Append :el-shared.yml:cl-shared.yml to the COMPOSE_FILE= line. Additionally, for Geth EL, add --metrics.port 6061 to the EL_EXTRAS= line.
Restart Eth Docker.
ethd up
Notes:
You may need to make this edit every time you update your Eth Docker repository
Do not expose these ports to the public internet
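Because the COMPOSE_FILE= edit may need repeating after each Eth Docker update, it can help to see the change as a small, repeatable transformation. The sketch below is illustrative only and not part of Eth Docker or this repository; append_compose_files is a hypothetical helper, and the file names in the usage note are sample values.

```python
def append_compose_files(env_line, extras=("el-shared.yml", "cl-shared.yml")):
    """Return the COMPOSE_FILE= line with `extras` appended, skipping duplicates."""
    key, sep, value = env_line.strip().partition("=")
    if key != "COMPOSE_FILE" or not sep:
        raise ValueError("expected a COMPOSE_FILE= line")
    # Compose files are separated by colons; drop empty segments.
    files = [name for name in value.split(":") if name]
    for name in extras:
        if name not in files:
            files.append(name)
    return "COMPOSE_FILE=" + ":".join(files)
```

Given a sample line such as COMPOSE_FILE=teku.yml:geth.yml, the helper returns the same line with :el-shared.yml:cl-shared.yml appended, and running it a second time changes nothing.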
Setup
Install dependencies - curl, wget, nano, git, docker
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget nano git
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo groupadd docker
sudo usermod -aG docker $USER
Log out and then back in again for the new user group settings to take effect.
exit
Clone this git repository
cd && git clone https://github.com/samuelclk/validator-healthchecks.git
Pull requests welcome at https://github.com/samuelclk/validator-healthchecks
Create a .env file.
nano ~/validator-healthchecks/.env
Paste the following content into your .env file and replace BOT_TOKEN=TELEGRAM_BOT_TOKEN, CHAT_ID=GROUP_CHAT_ID, & HEALTHCHECK_URL=https://hc-ping.com/your-unique-id with your actual credentials.
BOT_TOKEN=TELEGRAM_BOT_TOKEN
CHAT_ID=GROUP_CHAT_ID
HEALTHCHECK_URL=https://hc-ping.com/your-unique-id
# HTTP-based endpoints monitored via HTTP GET requests for EL, CL, & SSV nodes
HTTP_CONSENSUS_CLIENT=http://127.0.0.1:8008/health
HTTP_SSV_NODE=http://127.0.0.1:16000/v1/node/health
## Add more CL or SSV node endpoints here as needed in this format: "HTTP_XX_XX". "HTTP" must be the prefix.
# Command-based monitoring for validator clients
COMMAND_VALIDATOR_CLIENT=curl -s http://127.0.0.1:8009/metrics | grep -E -q '(get_validators_liveness|beacon_attestation_included_total|lighthouse_validator_beacon_node_requests_total|nimbus_validator_attestations_total)' && echo "200 OK" || echo "500 ERROR"
## For Geth EL
COMMAND_GETH_EL=curl -s http://127.0.0.1:6061/debug/metrics | tr -d "\n" | grep -E -q "\"chain/head/block\":[[:space:]]*([0-9]+).*\"chain/head/header\":[[:space:]]*\\1" && echo "200 OK" || echo "500 ERROR"
## For other ELs
COMMAND_EXECUTION_CLIENT=curl -s http://127.0.0.1:6060/metrics | tr -d '\n' | grep -E -q 'ethereum_blockchain_height.*} [0-9]+.*ethereum_best_known_block_number.*} [0-9]+' && echo "200 OK" || echo "500 ERROR"
## Add more VC or EL endpoints here as needed in this format: "COMMAND_XX_XX=". "COMMAND" must be the prefix. Change only the "127.0.0.1:8009" part of the variable
# P2P-based nodes for Obol Charon & SSV DKG (host:port format)
P2P_OBOL_CHARON=127.0.0.1:3610
P2P_SSV_DKG=127.0.0.1:3030
## Add more Obol Charon or SSV DKG endpoints here as needed in this format: "P2P_XX_XX=". "P2P" must be the prefix.
Save & exit with CTRL+O, ENTER, CTRL+X.
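The HTTP, COMMAND, and P2P prefixes in the .env file determine how each entry is checked. As a rough sketch of how such a file could be sorted into those three groups (an assumption about the approach, not the repository's actual parser), consider the hypothetical group_endpoints function below. Turning SSV_NODE into "Ssv Node" is a guess based on the log output shown later in this guide.

```python
def group_endpoints(env_text):
    """Group KEY=VALUE lines by their HTTP_, COMMAND_, or P2P_ prefix."""
    groups = {"HTTP": {}, "COMMAND": {}, "P2P": {}}
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        # Split on the first "=" only, since commands may contain "=" too.
        key, _, value = line.partition("=")
        prefix, _, name = key.partition("_")
        if prefix in groups and name:
            # "CONSENSUS_CLIENT" -> "Consensus Client"
            groups[prefix][name.replace("_", " ").title()] = value
    return groups
```

Entries such as BOT_TOKEN or HEALTHCHECK_URL carry none of the three prefixes, so this sketch leaves them out of the groups.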
Some context:
My first priority is the "health" endpoints that do not require exposing the HTTP or RPC ports where possible
If those are not available, I query the "metrics" endpoints for a proxy of "health"
If neither is available, I query the P2P endpoint directly to ask those clients "are you there?"
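To make the "metrics as a proxy for health" idea concrete, here is a Python rendering of what the COMMAND_VALIDATOR_CLIENT line does with grep: if any of the listed validator metric names appears in the metrics output, the client is treated as alive. This is a sketch written for this guide, not code from the repository; the function name is hypothetical, while the metric names and the "200 OK" / "500 ERROR" strings are copied from the .env example.

```python
import re

# Same alternatives as the grep -E pattern in COMMAND_VALIDATOR_CLIENT.
VC_METRIC_PATTERN = re.compile(
    r"get_validators_liveness"
    r"|beacon_attestation_included_total"
    r"|lighthouse_validator_beacon_node_requests_total"
    r"|nimbus_validator_attestations_total"
)


def validator_metrics_look_alive(metrics_text):
    """Return "200 OK" if any known validator metric appears, else "500 ERROR"."""
    return "200 OK" if VC_METRIC_PATTERN.search(metrics_text) else "500 ERROR"
```

Note that this only shows the metric is being exported, which is a weaker signal than a dedicated health endpoint.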
Build the docker image.
cd ~/validator-healthchecks
docker compose build
Run the tool
Run the health checker service as a docker service.
docker compose up -d
Check for errors.
docker logs validator-healthchecks -f --tail 20
You should see the following:
Execution Client is healthy
Consensus Client is healthy.
Ssv Node is healthy.
Validator Client is healthy.
Obol Charon is up.
Ssv Dkg is up.
Healthcheck ping successful.
This health checker service runs once every 5 minutes and sends a message to Healthchecks.io immediately after.
If any of your monitored services is unhealthy or unreachable, it will send a Telegram message to your designated group chat via your Telegram bot.
If Healthchecks.io does not receive a message from your health checker service, it will send a message to your Telegram chat group to notify that your health checker service itself is offline.
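The cycle described above (run every check, alert on failures, then check in) can be sketched as a single function. This is a simplified model under stated assumptions, not the repository's implementation: run_cycle, the alert wording, and the injected send_alert and send_ping callables are all hypothetical.

```python
def run_cycle(checks, send_alert, send_ping):
    """Run every check once, alert on each failure, then check in.

    `checks` maps a display name to a zero-argument callable that returns
    True when the service is healthy. `send_alert` and `send_ping` are
    injected so the sketch does not depend on Telegram or Healthchecks.io.
    """
    failed = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            healthy = False  # a check that crashes counts as a failure
        if not healthy:
            failed.append(name)
            send_alert(f"{name} is down or cannot be reached.")
    # The ping reports that the checker itself ran, regardless of results.
    send_ping()
    return failed
```

A scheduler would call run_cycle once every 5 minutes. If the container stops, the pings stop as well, which is what lets Healthchecks.io detect that the checker is offline.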
Restarting after editing the .env file
You will need to rebuild the docker image every time you edit the .env file.
cd ~/validator-healthchecks
docker compose down
docker compose up -d --build
More Context
Most solo/home stakers rely on beaconcha.in watchlists to notify them when their validators are missing attestations.
However, in some scenarios, this method no longer works well for solo stakers. e.g.,
Running many validator keys via Lido CSM or similar platforms: It's not useful to receive 10 to 100 notifications every 6.4 minutes (one epoch of 32 slots at 12 seconds each) from beaconcha.in when the issue could lie with your EL, CL, or VC directly.
Running DVTs: Your node could be offline without causing missed attestations, because a cluster of X nodes is jointly responsible for operating Y validator keyshares.
This problem is especially pronounced for DVT operators as they may only act when the cluster fails to achieve consensus as a whole, leading to free-rider problems.
Hence, I needed a way of being notified when each of my core services is unhealthy instead of relying solely on onchain alerts. I also want to be alerted when my health checker service itself fails.
I settled on a custom Python script running in a docker container that periodically queries the health, metrics, or P2P endpoints of the consensus, execution, validator, & DVT clients in your setup and sends a Telegram message to you or a chat group if any of them fails. Healthchecks.io alerts you if your health checker service itself fails.
It's a lightweight solution ideal for scenarios where detailed metrics and alerts are not required.
Disclaimer: This is more of a fun project for home stakers and does not replace professional monitoring tools.