ThreeFold Node Status Bot Updates
86 subscribers
Hey all, thanks for checking out the bot. Looks like it has an issue and is not responding to status checks. If you've subscribed to nodes, updates may or may not be working. I'll debug as soon as I can and send another update
Issue appears to be resolved. Please let me know if you see any other problems 🙂
Hey everyone, I've tuned the bot a bit to help reduce the number of offline reports when your node is in fact up. Let me know if you observe more reliable behavior after this change
Bot is now waiting for two successive failed ping attempts before notifying that a node is down. This seems to produce much more useful output. If you still feel like you're getting too many notifications, please let me know and I'll prioritize an update to give you some control over this
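For readers curious how this kind of rule works, here is a minimal sketch of the "two successive failures" idea. It is not the bot's actual code: `NodeState`, `record_ping`, and the recovery ("up") notice are hypothetical names and behavior chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# From the update above: alert only after two successive failed pings.
FAILURES_BEFORE_ALERT = 2


@dataclass
class NodeState:
    """Per-node bookkeeping (hypothetical structure)."""
    consecutive_failures: int = 0
    reported_down: bool = False


def record_ping(state: NodeState, ping_ok: bool) -> Optional[str]:
    """Record one ping result and return "down", "up", or None.

    A single failed ping is ignored. An alert is only produced once the
    failure count reaches the threshold, and only once per outage.
    """
    if ping_ok:
        state.consecutive_failures = 0
        if state.reported_down:
            # Assumption: the node is announced as back up on the first success.
            state.reported_down = False
            return "up"
        return None

    state.consecutive_failures += 1
    if state.consecutive_failures >= FAILURES_BEFORE_ALERT and not state.reported_down:
        state.reported_down = True
        return "down"
    return None
```

With this approach, one dropped ping between two good ones never produces a notification, which is the behavior described above.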

An issue with fetching new node details was reported and resolved today. This should not have affected any existing subscriptions
Please note that the bot is currently offline and not sending alerts. The testnet VM hosting the bot was decommissioned for reasons I am still investigating and unfortunately all subscription data appears to be lost. Bot will come back online once I have a plan to make it more resilient with some kind of backup system

Thanks for your patience and for being an early tester 🙏🏻
I am pleased to announce that the bot is back online. Please resubscribe to your nodes to start receiving alerts again. Quick recap of issue and fix:

1. Billing issue on testnet caused bot's VM to get decommissioned and all data was lost
2. Bot now uses an external backup that syncs data on every update
3. I'll experiment with running a second copy as fallback in case of outages
New release today removes the bot's dependency on Yggdrasil and uses the Grid Proxy server to check status instead. This should mean that status reports and alerts match what's seen in the explorer. Please report any issues you see with the new version to @scottyeager. Thanks!
Thanks for all the reports that subscriptions have not been reliable since the switch to Grid Proxy. Changes deployed today should correct this.

If you prefer the old behavior, the bot also now supports a /ping command to check node status with pings over Yggdrasil. Offering ping-based subscriptions again with a separate command is an option I'm considering. Let me know if this is of interest to you 🤖
A few updates over the last months:

* The status and ping commands now report the outcome for all subscribed nodes, if no node id is given

* The bot was printing excessive logs and filled up its (rather limited) available disk space. This interrupted operation briefly but was quickly resolved

* A bug affecting some early users of the bot who had not interacted with the bot for a while has been fixed. The bot will respond to these users again

* As of today, the bot now behaves properly when multiple users subscribe to the same node. Previously, only one user would get the alert (not necessarily the same user each time)

* Requesting the status of an invalid node id now gives an error rather than just not responding
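
To illustrate the multiple-subscriber fix in the list above, here is one way subscriptions can be stored so that every subscriber of a node receives the alert. This is a sketch under assumptions, not the bot's real data model: the `Subscriptions` class, `fan_out` helper, and the chat and node ids are all hypothetical.

```python
from collections import defaultdict


class Subscriptions:
    """Maps each node id to the set of chat ids subscribed to it (hypothetical)."""

    def __init__(self):
        self._by_node = defaultdict(set)

    def subscribe(self, chat_id: int, node_id: int) -> None:
        self._by_node[node_id].add(chat_id)

    def unsubscribe(self, chat_id: int, node_id: int) -> None:
        self._by_node[node_id].discard(chat_id)

    def subscribers(self, node_id: int) -> list:
        # .get avoids creating empty entries for nodes nobody subscribed to.
        return sorted(self._by_node.get(node_id, ()))


def fan_out(subs: Subscriptions, node_id: int, text: str) -> list:
    """Build one (chat_id, text) message per subscriber of the node."""
    return [(chat_id, text) for chat_id in subs.subscribers(node_id)]
```

Keying by node rather than by user means a single status change naturally produces one message per subscriber, instead of only one message overall.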

Thanks to everyone who reported issues and asked questions about expected operation. I couldn't identify and fix the problems without you!

Roadmap

Ideas for development, as feasible:

* Begin querying (and storing) uptime events from GraphQL for subscribed nodes. Evaluate data as a replacement or fallback for grid proxy

* Provide the ability to check cumulative uptime for a given period

* Add alerts for capacity changes, in case of failed hardware (suggestion from a community member, thx)

* ...?

Suggestions welcome -> @scottyeager. Thanks for tuning in 📡
Update to the bot today, released just a bit ago, corrects a bug that was blocking subscription alerts for some users. This means you may have received some delayed alerts. Please disregard any alerts that are no longer relevant.

Some other changes:

* When using /ping or /status without a node id, to query all subscribed nodes, the lists come back separated by which nodes are up/down and sorted by descending node id

* Bot now refreshes the node's Planetary Network IP before each ping attempt, since it can sometimes change

* Better error handling, so any issues that come up while checking nodes are isolated to a single node

* Bot now sends admin alerts when errors appear in the log file. This should help with catching issues in the future
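
As an illustration of two items in the list above (grouping results by up/down with descending node ids, and isolating errors to a single node), here is a minimal sketch. The function name `check_all` and the `check` callback are hypothetical. The real bot may be structured differently.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def check_all(
    node_ids: Iterable[int],
    check: Callable[[int], bool],
) -> Tuple[List[int], List[int], Dict[int, str]]:
    """Check every node and return (up, down, errors).

    `check` is a caller-supplied function returning True when a node is up.
    An exception raised for one node is recorded for that node only, so the
    remaining nodes are still checked.
    """
    up: List[int] = []
    down: List[int] = []
    errors: Dict[int, str] = {}

    for node_id in node_ids:
        try:
            (up if check(node_id) else down).append(node_id)
        except Exception as exc:  # isolate the failure to this node
            errors[node_id] = str(exc)

    # Sort each group by descending node id, as described above.
    up.sort(reverse=True)
    down.sort(reverse=True)
    return up, down, errors
```

Catching the exception inside the loop, rather than around it, is what keeps one problematic node from interrupting the checks for everyone else.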

Thanks to everyone who's reported on their experience with the bot. If you notice any issues, please send a DM to @scottyeager or reply to a message on this channel.
Hi everyone, with the 3.9 release coming to mainnet today, there are changes that will interfere with the bot's ability to function until it gets an update

In summary:

* The bot will not alert you if a node you are subscribed to goes down (or comes back up)
* Ping function will not work
* You can still fetch the node's status using the /status command

Furthermore:

* The power saving (Wake-on-LAN) feature complicates the notion of a node being "down" or "up"
* I already wanted to overhaul the bot's code and make it nice enough to open source

Therefore, the bot will remain in its current state until I have the time and energy to bring it up to speed with the changes to the Grid. While the bot has never been an officially supported piece of ThreeFold software, I understand that many farmers depend on it and I'd like to continue maintaining it for the time being.

Thanks for your patience until the new version is ready 🙏🏻
Hi again,

Today there's an interim release with two small changes:

* Subscription alerts are now triggered based only on the node's uptime reporting, without any dependence on ping. That means alerts should work again, with the limitation that nodes only report uptime every 40 minutes so it will take at least that long to detect that a node is offline

* The ping command is disabled and displays a message to that effect
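
To show why detection takes at least 40 minutes when relying on uptime reports alone, here is a small sketch of a staleness check. It is an illustration only: `looks_offline` and the 5-minute `GRACE` margin are assumptions, not details of the bot.

```python
from datetime import datetime, timedelta, timezone

# From the update above: nodes report uptime every 40 minutes.
REPORT_INTERVAL = timedelta(minutes=40)
# Hypothetical margin for late reports; not a value taken from the bot.
GRACE = timedelta(minutes=5)


def looks_offline(
    last_report: datetime,
    now: datetime,
    interval: timedelta = REPORT_INTERVAL,
    grace: timedelta = GRACE,
) -> bool:
    """Return True when the latest uptime report is older than expected.

    A node that went down right after reporting still looks healthy until
    the next report is overdue, hence the detection delay.
    """
    return now - last_report > interval + grace


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(looks_offline(now - timedelta(minutes=30), now))  # recent report
    print(looks_offline(now - timedelta(minutes=50), now))  # overdue report
```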

Work continues on the new bot, but since its release date is unknown, these changes should help a bit in the meantime.
Hello,

After receiving reports that the bot was not responding, I identified and resolved an issue related to network performance on the system where the bot was running. If you see any further issues, please don't hesitate to reach out
Hi everyone,

I had to redeploy the node status bot after some reports that it wasn't working. It's back online and should be functioning normally

Since the bot data was recovered from a backup, you might see some extra alerts based on an outdated state

Please do check that your node subscriptions are as you expect, by running the /sub command with no input. If you made any recent changes they may have been lost
Hi all,

The node status bot is back after ~10 hours of outage due to the node it was hosted on going offline. This should be a 1-1 recovery of the bot as it was before the outage, but it wouldn't hurt to double check that you are still subscribed to all nodes as you expect. Just run the /sub command with no input and the bot will reply with your currently subscribed nodes

Thanks for your support and understanding 🙏🏻
Hi,

I'm pleased to announce that version 2.0 of the node status bot has been released to the bot's main handle, @tfnodestatusbot

This update is mainly focused on giving farmers who use the farmerbot visibility into whether their nodes have incurred any violations for failing to wake up within 30 minutes. These alerts will begin automatically for any subscribed nodes, and you can also use the /violations command to generate a report

There are also some other bug fixes and quality of life improvements included. Thanks to everyone who has reported issues and tested on the staging version of the bot

If you have any questions or suggestions, please feel free to reach out (@scottyeager)

Cheers 🙏🏻
Hello farmers,

Due to today's Grid update, there were some incorrect alerts sent by the node status bot about nodes going offline or failing to wake up within 24 hours of being put to sleep by the farmerbot. If you received a message that your node failed to wake up within 24 hours but this doesn't look correct based on when the node went to sleep, most likely the alert was an error and can be safely ignored.

If you noticed anything else that seems off or have further questions, feel free to reach out to me directly (@scottyeager) or contact the ThreeFold support team. Thanks 🙏🏻