Validator Healthcheck Alerts
TLDR
A simple, lightweight health checker service for DVT operators or home stakers running >10 validator keys. It monitors client-level health and sends alerts via Telegram when a client is unhealthy or unreachable.
Healthchecks.io (Free!) notifies users if the health checker service itself is offline.
Follow me on Twitter or subscribe to my newsletter if you find this useful for you!
Special thanks to Eridian for sharing about healthchecks.io with me! He also has a more scalable monitoring/alerting setup which you can check out here.
How it works
HTTP & CURL Requests: The service periodically makes an HTTP GET (or curl) request to the health or metrics endpoint at each specified IP address and port, e.g. http://127.0.0.1:8008 (CL health endpoint). This asks the client to respond with its "health" status.
Concurrent "check-in" with Healthchecks.io: Each time the service runs successfully, it sends a message to Healthchecks.io, informing it of the successful run.
Service Response: For the service to consider the consensus or execution client to be "up," the client at that IP address and port must respond to the HTTP request. This typically means the client's API or health check endpoint returns an HTTP status code of 200 OK, indicating that the client is operational and can handle requests.
Timeout Handling: The service includes a timeout (e.g., 5 seconds), ensuring that if the client doesn't respond within a reasonable timeframe, it's considered "down." This helps differentiate between an unresponsive service and one that's simply slow to reply.
Send Alert Message: Finally, for each checked endpoint that is "down" or "cannot be reached", the script sends a message to the chat group designated for that endpoint.
Self-Reporting: Healthchecks.io notifies the user if the alerting service itself is down.
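The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the function name is_up and the injectable opener parameter are assumptions made for the example, while the 5-second timeout and the "200 OK means up" rule come from the description above.

```python
import http.client
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if `url` answers with HTTP 200 within `timeout` seconds.

    `opener` is a hypothetical hook so the function can be exercised
    without a live client; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, DNS failures, timeouts, and non-2xx
        # responses (urllib raises HTTPError, an OSError subclass)
        # are all treated as "down".
        return False
```

Calling is_up("http://127.0.0.1:8008") would then report whether the endpoint from the example above answered in time.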
How to use
Preparation
Telegram bot token & Chat ID
Open Telegram and search for @BotFather
Start a chat with BotFather by clicking the Start button
Send the command /newbot
Follow the prompts:
Name your bot (e.g., MyCoolBot).
Choose a username that ends with bot (e.g., my_cool_bot).
Once created, BotFather will send you a message with your bot token.
Save this token somewhere safe.
Go to your newly created bot on Telegram (search for its username, e.g., @my_cool_bot) and start the bot by clicking the Start button.
Create a new group & add your bot to the group
Send a message in the group (e.g., "Hello, bot!").
Open up a browser and enter the following URL (replace YOUR_BOT_TOKEN with your bot's token): https://api.telegram.org/botYOUR_BOT_TOKEN/getUpdates
Look for the chat object in the response. The id under chat is your group Chat ID (e.g., -987654321).
You now have your BOT_TOKEN and CHAT_ID.
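If you would rather not read the raw response by eye, the chat ID can be pulled out with a short script. The sketch below assumes the response comes from the Telegram Bot API's getUpdates method; SAMPLE_RESPONSE is a made-up payload in that general shape, and extract_chat_ids is a hypothetical helper name.

```python
import json

# Made-up payload in the general shape returned by getUpdates.
SAMPLE_RESPONSE = """
{
  "ok": true,
  "result": [
    {
      "update_id": 100000001,
      "message": {
        "message_id": 1,
        "chat": {"id": -987654321, "title": "Validator Alerts", "type": "group"},
        "text": "Hello, bot!"
      }
    }
  ]
}
"""


def extract_chat_ids(get_updates_json):
    """Map every chat id found in a getUpdates response to its title."""
    payload = json.loads(get_updates_json)
    chats = {}
    for update in payload.get("result", []):
        # Each update carries one event object (e.g. "message") that
        # may contain a "chat" field; collect all of them.
        for value in update.values():
            if isinstance(value, dict) and "chat" in value:
                chat = value["chat"]
                chats[chat["id"]] = chat.get("title", "")
    return chats


print(extract_chat_ids(SAMPLE_RESPONSE))  # {-987654321: 'Validator Alerts'}
```

Group chat IDs are negative numbers, so keep the leading minus sign when copying the value into CHAT_ID.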
Healthchecks.io URL
Go to Healthchecks.io
Sign-up/Log-in with your preferred method (email, GitHub, or Google)
Click on "Add Check" on the dashboard. Select 5 minutes for the Period and 10 minutes for the Grace Time
On the check's configuration page, you'll find a "Ping URL". It looks like this: https://hc-ping.com/your-unique-id
Go to your Integrations page and select the Telegram integration: Profile>>Integrations>>Telegram
Enter your Telegram Bot Token and Chat ID
Add the Healthchecks.io bot into your Telegram chat group:
@Healthchecks_io_bot
You now have your HEALTHCHECK_URL, and both your own Telegram bot and the Healthchecks.io bot are in your Telegram group.
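To get a feel for what the 5-minute Period and 10-minute Grace Time mean in practice, here is a small worked example. It assumes, as I understand the Healthchecks.io semantics, that a check is marked "down" once Period plus Grace Time has passed since the last ping; down_alert_time is a hypothetical helper name.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=5)   # expected time between check-ins
GRACE = timedelta(minutes=10)   # extra slack before the check is marked down


def down_alert_time(last_ping, period=PERIOD, grace=GRACE):
    """Earliest time the check would be considered down after `last_ping`."""
    return last_ping + period + grace


last_ping = datetime(2024, 1, 1, 12, 0)
print(down_alert_time(last_ping))  # 2024-01-01 12:15:00
```

Under that assumption, the health checker can miss two consecutive 5-minute runs before Healthchecks.io raises an alert, which avoids noise from a single slow or skipped run.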
Make sure your monitoring endpoints are accessible
All Users:
SSV node: Enable the health endpoint by adding SSVAPIPort: 16000 into the config.yaml file.
SSV DKG & Obol Charon: No changes needed as we will query the P2P ports for these.
Systemd & EthPillar Users:
Execution clients: Add the --Metrics.Enabled true and --Metrics.ExposePort 6060 or equivalent flags
Consensus client: Add the --metrics and --metrics-port=8008 or equivalent flags
Validator client: Add the --metrics and --metrics-port=8009 or equivalent flags
Notes:
I prefer not to expose the EL's HTTP (8545) or CL's RPC (5052) ports to this service to reduce the risk of malicious code injection
Nevertheless, do not expose these ports to the public internet
Eth Docker Users:
Map the monitoring ports of your EL, CL, & VC to the host
Replace with the following:
Replace with the following:
Edit the .env file.
Append :el-shared.yml:cl-shared.yml into the COMPOSE_FILE= line.
Restart Eth Docker.
Notes:
You may need to make this edit every time you update your Eth Docker repository
Do not expose these ports to the public internet
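Since the COMPOSE_FILE edit may need repeating after updates, it can help to script it. The sketch below only transforms the text of a .env file and does not touch the file system; append_compose_files is a hypothetical helper, and the lighthouse.yml:geth.yml value is purely illustrative.

```python
def append_compose_files(env_text, extras=("el-shared.yml", "cl-shared.yml")):
    """Return `env_text` with `extras` appended to the COMPOSE_FILE= line.

    Entries already present are left alone, so running this twice
    gives the same result as running it once.
    """
    prefix = "COMPOSE_FILE="
    lines = []
    for line in env_text.splitlines():
        if line.startswith(prefix):
            files = [name for name in line[len(prefix):].split(":") if name]
            for extra in extras:
                if extra not in files:
                    files.append(extra)
            line = prefix + ":".join(files)
        lines.append(line)
    return "\n".join(lines) + "\n"


# Illustrative input; your own COMPOSE_FILE line will differ.
print(append_compose_files("COMPOSE_FILE=lighthouse.yml:geth.yml\n"), end="")
# COMPOSE_FILE=lighthouse.yml:geth.yml:el-shared.yml:cl-shared.yml
```

Back up your .env before writing the result to disk, and review the changed line by hand the first time.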
Setup
Install dependencies - curl, wget, nano, git, docker
Log out and then back in again for the new user group settings to take effect.
Clone this git repository
Pull requests welcome at https://github.com/samuelclk/validator-healthchecks
Create a .env file
Paste the following content in your .env file and replace BOT_TOKEN=TELEGRAM_BOT_TOKEN, CHAT_ID=GROUP_CHAT_ID, & HEALTHCHECK_URL=https://hc-ping.com/your-unique-id with your actual credentials.
Save & exit with CTRL+O, ENTER, CTRL+X.
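A common slip at this step is leaving one of the placeholder values in place. The sketch below shows one way a script could guard against that, assuming the three values reach it as environment variables; load_settings is a hypothetical helper and not necessarily how the repository reads its configuration.

```python
import os

# Placeholder values from the .env template; none of these should survive.
PLACEHOLDERS = {
    "BOT_TOKEN": "TELEGRAM_BOT_TOKEN",
    "CHAT_ID": "GROUP_CHAT_ID",
    "HEALTHCHECK_URL": "https://hc-ping.com/your-unique-id",
}


def load_settings(env=None):
    """Return the three settings, refusing missing or placeholder values."""
    env = os.environ if env is None else env
    settings = {}
    for key, placeholder in PLACEHOLDERS.items():
        value = (env.get(key) or "").strip()
        if not value or value == placeholder:
            raise ValueError(f"{key} is missing or still set to its placeholder")
        settings[key] = value
    return settings
```

Failing fast here is a deliberate choice: a checker that starts with a placeholder token would run quietly and never deliver an alert.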
Some context:
My first priority is the "health" endpoints, which do not require exposing the HTTP or RPC ports where possible
If those are not available, I query the "metrics" endpoints as a proxy for "health"
If neither is available, I query the P2P endpoint directly to ask those clients "are you there?"
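The priority order above can be sketched as follows. This is an illustration under assumptions rather than the repository's code: pick_probe and tcp_port_open are hypothetical names, and the P2P check assumes the client's P2P port accepts plain TCP connections.

```python
import socket


def pick_probe(endpoints):
    """Pick the best available probe: health first, then metrics, then p2p.

    `endpoints` maps a probe kind to its target; missing or empty
    entries are skipped. Returns (kind, target) or None.
    """
    for kind in ("health", "metrics", "p2p"):
        target = endpoints.get(kind)
        if target:
            return kind, target
    return None


def tcp_port_open(host, port, timeout=5.0):
    """The "are you there?" check: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

A client that exposes only a P2P port would be described as {"p2p": (host, port)} and checked with tcp_port_open, while clients with a health or metrics URL would go through an HTTP request instead.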
Build the docker image.
Run the tool
Run the health checker service as a docker service.
Check for errors.
You should see the following:
This health checker service runs once every 5 minutes and sends a message to Healthchecks.io immediately after.
If any of your monitored services are unhealthy or unreachable, it will send a Telegram message to your designated group chat via your Telegram bot.
If Healthchecks.io does not receive a message from your health checker service, it will send a message to your Telegram chat group to notify that your health checker service itself is offline.
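One run of that cycle might look like the sketch below. The names build_telegram_request and run_cycle are hypothetical, the alert wording is invented for the example, and the request targets the Telegram Bot API's sendMessage method; the real service may structure this differently.

```python
import urllib.parse
import urllib.request


def build_telegram_request(bot_token, chat_id, text):
    """Build (but do not send) a sendMessage request for the Bot API."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=body, method="POST")


def run_cycle(results, send_alert, check_in):
    """Alert on every failed check, then check in with Healthchecks.io.

    `results` maps a service name to True (up) or False (down);
    `send_alert` and `check_in` are callables supplied by the caller.
    Returns the list of services that were down.
    """
    down = [name for name, ok in results.items() if not ok]
    for name in down:
        send_alert(f"{name} is down or cannot be reached")
    # The check-in reports that the checker itself ran, regardless of
    # whether the monitored clients were healthy.
    check_in()
    return down
```

In a real loop, send_alert would pass the built request to urllib.request.urlopen with a timeout, check_in would issue a GET to HEALTHCHECK_URL, and the whole cycle would repeat every 5 minutes.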
More Context
Most solo/home stakers rely on beaconcha.in watchlists to notify them when their validators are missing attestations.
However, in some scenarios, this method no longer works well for solo stakers. e.g.,
Running many validator keys via Lido CSM or similar platforms: It's not useful to receive 10 to 100 notifications every epoch (about 6.4 minutes) from beaconcha.in when the issue could lie with your EL, CL, or VC directly.
Running DVTs: Your node could be offline without causing any missed attestations, because a cluster of X nodes shares responsibility for operating Y validator keyshares and can keep attesting as long as enough of the other nodes stay online.
This problem is especially pronounced for DVT operators as they may only act when the cluster fails to achieve consensus as a whole, leading to free-rider problems.
Hence, I needed a way of being notified when each of my core services is unhealthy instead of relying solely on onchain alerts. I also want to be alerted when my health checker service itself fails.
I settled on a custom Python script running in a docker container. It periodically queries the health, metrics, or p2p endpoints of the consensus, execution, validator, & DVT clients in your setup and sends a Telegram message to yourself or a chat group if any check fails. Healthchecks.io alerts you if your health checker service itself fails.
It's a lightweight solution ideal for scenarios where detailed metrics and alerts are not required.
Disclaimer: This is more of a fun project for home stakers and does not replace professional monitoring tools.