
Owner’s Corner: Monitoring Script

We've never had a great monitoring solution. If something goes really wrong and I'm not there to deal with it, I usually just wait for someone to text me. I've long wanted automatic notification of downtime, and I tried using something like Nagios to do the monitoring, but it ended up being overkill for monitoring one service on one server. So I decided to write my own solution.

I believe there may be some logical complications here, so I wanna talk out my logic and ask you, the readers, to poke holes in my thinking. This script isn't done yet, by far.

The server lives as a node on my home network, which is behind my own firewall, which is behind my router. First, we need to monitor whether a computer elsewhere on the internet can contact my home network. Second, if we can get to my home network, we need to make sure that the server the game runs on is online...to do that, we monitor it from my always-on home server (home desktop assistant, or HDA). Finally, assuming the server is running, we need to make sure that the Minecraft server application is running and that it's communicating properly (and not hung).
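The outside-in order of those three checks can be sketched roughly as below. This is not the actual script, just a minimal illustration of the idea: the function name `first_failing_layer` and the layer names are made up for the example, and each check is passed in as a plain function so the real ping or process test can be swapped in later.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each layer is a (name, check) pair. check() returns True when that layer is healthy.
# The names used with this function are hypothetical labels, not part of any real tool.
Layer = Tuple[str, Callable[[], bool]]


def first_failing_layer(layers: Sequence[Layer]) -> Optional[str]:
    """Run the checks from the outside in and stop at the first failure.

    Returns the name of the first unhealthy layer, or None if every layer passes.
    Stopping early matters: if the home network is unreachable, the results of
    the inner checks would not mean anything anyway.
    """
    for name, check in layers:
        try:
            healthy = check()
        except Exception:
            # A check that crashes is treated the same as a check that fails.
            healthy = False
        if not healthy:
            return name
    return None
```

For example, `first_failing_layer([("home network", lambda: True), ("game server", lambda: False), ("minecraft app", lambda: True)])` returns `"game server"`, which tells you where in the chain to start looking.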

I have some simple solutions for these tasks, but I'm not convinced they're logically sound. To avoid repeating myself, I've outlined some of my concerns in the comments of the script. To test that you can reach my home network, I have a ping going to my DynDNS hostname...this just checks that my router can be reached, which should be an indicator that the internet is up at my house. My home server auto-boots if the power goes out, so I'm operating under the assumption that if the network is reachable, the HDA is up and running its checks. The HDA then pings the Minecraft server to make sure it's running. That machine also auto-boots, so if it's up, its checks should be running too. Finally, the on-server script first checks that a Java process is running...if it is, it sends a command to the server to produce a log entry, then immediately pulls a system timestamp. If the command succeeded, indicating the server is running properly, the timestamp on the new log entry should match the system timestamp. If they don't match, the log isn't being written to, and something is wrong even though the process is running.
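One possible weak spot in the timestamp comparison is that an exact match can fail if the clock ticks over between the log write and the system call. A rough sketch of a more forgiving comparison is below. It is only an illustration, not the real script: it assumes each log line starts with a timestamp like `2011-05-01 12:00:03`, and the function name `log_is_fresh` and the 5-second tolerance are invented for the example.

```python
from datetime import datetime, timedelta

# Assumption: every log line begins with a 19-character timestamp in this format.
# Adjust this to whatever the server log actually writes.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def log_is_fresh(last_log_line: str, now: datetime,
                 tolerance: timedelta = timedelta(seconds=5)) -> bool:
    """Return True if the newest log line was written within `tolerance` of `now`.

    An unparseable line counts as stale, since it means the expected
    log entry is not there.
    """
    stamp_text = last_log_line[:19]  # length of "YYYY-MM-DD HH:MM:SS"
    try:
        stamp = datetime.strptime(stamp_text, LOG_TIME_FORMAT)
    except ValueError:
        return False
    return abs(now - stamp) <= tolerance
```

The caller would read the last line of the log right after sending the command and pass in `datetime.now()`. A small tolerance still catches a hung server, because a log that has stopped being written falls further behind with every check.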

So that's the deal. Like I said, there are a few things I'm concerned about, but I think this should really help us stay on top of downtime. I really need feedback, error checking, and thoughts on improving this one.


    One Response to “Owner’s Corner: Monitoring Script”

  1. Morate Says:

    Just an idea: You can set up an e-mail address that will post whatever is sent to it directly on the blog. (I can’t recall if WordPress has this built in or not, but here’s how to do it: http://codex.wordpress.org/Post_to_your_blog_using_email) You can create a separate error message that is easily understandable by the less tech-savvy on the server, and e-mail this public error message to that e-mail account. The end result would be something like this:

    Subject: The Server is Down!

    Message: At about 3:51 PST today, the teh3l3m3nts Minecraft server started misbehaving. The owners have been notified, and the server will be disciplined as soon as possible. Sorry for the inconvenience!

    Also, my sister has a computer that she’s not using and probably won’t use in the future. If you want, I can convince her to give it to me for whatever purpose you may need. (Such as having an extra check in the loop or maybe as a backup host.)
    Good luck! :D
