In our previous posts, we shared a detailed overview of Logicify’s software monitoring toolset, with Graylog and Grafana as its core components. In this article, we’d like you to meet our open source Healthcheck Bot, an application we developed for proactive software check-ups.
We have already emphasized how closely we monitor the performance of web applications developed for our clients. At Logicify, software monitoring means more than scanning through logs and reacting to errors and exceptions. It is about collecting detailed data, from multiple levels and sources, about an app’s performance at any given point in time, preferably in a single data store for easier analysis and troubleshooting.
Proactive application health-checks are crucial to the monitoring process. Neither application logs nor infrastructure data will reliably signal that your app is “dead”; systematic health-checks will.
What Are Application Health-Checks?
Broadly, software health-checks verify whether an app is a) up and running; b) working as expected. The definition of “health” varies from one application to another, depending on features and functionality, so every application has its own set of tests and metrics that prove it is fully functional. A few common things tracked for any application are URL accessibility and server resource usage (CPU, memory, disk space). In addition, application-specific checks target particular pieces of functionality. In an e-commerce application, for example, you might want to regularly run a scripted test order to verify that all critical parts of the functionality perform as expected.
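As a rough illustration of the two common checks mentioned above, here is a minimal Python sketch that uses only the standard library. The function names, the 5-second timeout, and the 90% disk threshold are our assumptions for the example, not values taken from any particular tool.

```python
import shutil
import urllib.error
import urllib.request


def check_url(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or an HTTP 4xx/5xx error.
        return False


def check_disk(path="/", max_used_fraction=0.9):
    """Return True if the disk holding `path` is below the usage threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total < max_used_fraction
```

Calling `check_url("https://example.com/")` and `check_disk("/")` on a schedule already covers the “up and running” half of the definition; the `opener` parameter exists only so the function can be exercised without a network connection.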
The ultimate goal of health-checks is to detect a high-level, business-affecting malfunction as early as possible (read: before a customer does). But because apps are complex, there are countless things that could fail, so detailed, multi-level health-checks are the most effective.
Here’s a brief step-by-step recipe to tune health-checks for your app:
- Define pieces of functionality to target, along with metrics for their healthy vs. poor performance and frequency of check-ups
- Configure your health-check tools. This could be dedicated software or just a combination of custom scripts. Consider adding alerts for reported malfunctions
- Define a data storage for health-check outcomes
- Write out a set of actionable steps to fix the issue, or at least mitigate its consequences for end-users. If these steps can be automated, adjust the system logic for this
- After any issue found during a health-check is fixed, run a retrospective analysis: why did it happen? What can be done to prevent it in the future?
- Think of a disaster recovery strategy for critical malfunctions.
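The first few steps of this recipe can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `Check` fields, the `run_due_checks` function, and the shape of the outcome record are hypothetical names, and a plain list stands in for the data storage.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Check:
    name: str                    # piece of functionality to target
    probe: Callable[[], float]   # measures the metric, e.g. response time
    max_healthy: float           # boundary between healthy and poor
    interval_seconds: float      # frequency of check-ups


def run_due_checks(checks: List[Check], last_run: Dict[str, float],
                   store: List[dict], alert: Callable[[dict], None],
                   now: Optional[float] = None) -> None:
    """Run each check whose interval has elapsed, store the outcome, alert on failure."""
    now = time.time() if now is None else now
    for check in checks:
        if now - last_run.get(check.name, float("-inf")) < check.interval_seconds:
            continue  # not due yet
        last_run[check.name] = now
        try:
            value = check.probe()
            healthy = value <= check.max_healthy
        except Exception:
            # A probe that crashes is reported as unhealthy, not swallowed.
            value, healthy = None, False
        outcome = {"check": check.name, "value": value,
                   "healthy": healthy, "timestamp": now}
        store.append(outcome)      # data storage for outcomes
        if not healthy:
            alert(outcome)         # hook for notifications or automated fixes
```

Calling `run_due_checks` in a loop (or from cron) with `alert` set to a function that sends a notification covers the targeting, scheduling, storage, and alerting steps; the retrospective and disaster recovery steps remain a human task.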
Tools for Application Health-Checks
Health-check content varies from one application to another; so do the means and monitoring tools. Industry-standard software for application, network, and infrastructure monitoring (e.g. Nagios or Zabbix) does not always fit an existing setup. Plus, some of these tools come with a high price tag and quite limited flexibility. So dev teams usually end up with a heterogeneous mix of scripts and solutions for their application monitoring. Over time, this becomes too tedious and complicated to manage. With this in mind, and with a passion for software monitoring, Logicify developed an open source bot for regular proactive software health-checks.
Meet Logicify Healthcheck Bot
Logicify Healthcheck Bot is a simple yet highly configurable standalone application for proactive software monitoring. It is licensed under GPLv3. The user is free to configure target functionality and system metrics for health-checks as well as the storage for check outcomes. Moreover, the bot supports custom assertions written in Python for the most sophisticated check scenarios.
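To give a feel for what a custom Python assertion for an application-specific check might look like, here is a hypothetical sketch based on the test-order example. It does not reproduce the bot’s actual interface (the GitHub repository is the reference for that); `assert_test_order_succeeds`, `run_assertion`, the order fields, and the outcome shape are all names we made up for illustration.

```python
from typing import Callable, Dict


def assert_test_order_succeeds(place_order: Callable[[Dict], Dict]) -> None:
    """Hypothetical assertion: place a scripted test order and verify the result."""
    order = {"sku": "TEST-SKU", "quantity": 1, "test": True}
    result = place_order(order)
    # Explicit raises keep the checks active even when Python runs with -O.
    if result.get("status") != "confirmed":
        raise AssertionError(f"unexpected status: {result.get('status')!r}")
    if not result.get("total", 0) > 0:
        raise AssertionError("order total should be positive")


def run_assertion(assertion: Callable[[], None]) -> Dict:
    """Convert an assertion into a plain outcome record."""
    try:
        assertion()
    except AssertionError as exc:
        return {"healthy": False, "detail": str(exc)}
    except Exception as exc:
        # Crashes (e.g. a payment gateway being unreachable) are failures too.
        return {"healthy": False, "detail": f"{type(exc).__name__}: {exc}"}
    return {"healthy": True, "detail": ""}
```

In a real setup, `place_order` would call the application’s own order-placement logic against a test account, which is what makes this kind of check valuable: it exercises the same code path a customer would.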
Find out more about the bot on GitHub.
Advantages of the Bot
- The application is highly configurable and very flexible. You can define app components and functionality to target, boundary values for metrics, the content of health-checks, and time intervals for their execution. Apart from pre-configured watchers, you can add custom, app-specific ones.
- The bot lets you define a data storage for check outcomes that fits your existing setup. This decreases reaction time and allows consistent alerting in case your app malfunctions. It also makes it easy to trace the system state back to any given moment in the past.
- Healthcheck Bot supports even sophisticated custom-coded test scenarios and can use entire pieces of app logic, written in Python, for health-checks. This means literally any piece of functionality could be “in the crosshairs.” Say goodbye to the “zoo” of monitoring scripts you used to manage :)
How the Bot Fits Logicify Monitoring System
We build smart ecosystems for our apps. We know their “healthy” behavior metrics, so it is easy to determine when they deviate, and Healthcheck Bot perfectly fits our orchestrated double-sided system of user behavior and system state monitoring.
In short, we use Graylog as a single data store for all monitoring data received from multiple sources: application logs and structured analytical data, infrastructure telemetry, and outcomes of health-checks done by the bot. The data can be easily managed for various dev and support purposes, such as troubleshooting, investigating, and debugging. Our Graylog-based setup is the main reason we decided against Nagios as a monitoring tool: though Nagios is powerful for software health-checks, it is not designed for data and system performance analysis.
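For illustration, here is how a health-check outcome could be shipped to Graylog as a GELF 1.1 message over UDP. This is a sketch under our own assumptions, not a description of how Healthcheck Bot does it: it assumes a GELF UDP input is configured on the Graylog server (12201 is the conventional port), and the outcome dictionary and the underscore-prefixed custom field names are hypothetical.

```python
import json
import socket
import time


def build_gelf(outcome, host, now=None):
    """Map a health-check outcome onto a GELF 1.1 message."""
    healthy = outcome["healthy"]
    message = {
        "version": "1.1",
        "host": host,
        "short_message": f"healthcheck {outcome['check']}: "
                         f"{'ok' if healthy else 'FAILED'}",
        "timestamp": time.time() if now is None else now,
        "level": 6 if healthy else 3,  # syslog severity: 6 = info, 3 = error
        # Custom fields must be prefixed with an underscore.
        "_check": outcome["check"],
        "_healthy": 1 if healthy else 0,
    }
    if outcome.get("value") is not None:
        message["_value"] = outcome["value"]
    return message


def send_gelf_udp(message, server, port=12201):
    """Send one uncompressed GELF message; it must fit in a single datagram."""
    payload = json.dumps(message).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
```

Because the outcome arrives in Graylog as structured fields rather than free text, it can be searched and charted next to application logs and infrastructure telemetry.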
We paired Graylog with Grafana, which used to be a tool purely for data visualization. Starting with v4.0, Grafana supports alerting: it lets you attach rules to dashboard panels and notifies users every time a certain metric deviates. So, we have a consistent system of rules and alerts in place for all monitoring metrics, including the ones collected by Healthcheck Bot.
Interested in Healthcheck Bot for your app monitoring? We'll help you configure it. Contact us today.
The last thing you want to see is a user-reported software malfunction. Early issue detection is paramount for any application. Be proactive in tracking your app’s performance and conduct regular health-checks. The main goal of such check-ups is to prove that the application is “alive” and that its main components are functioning.
If, for some reason, dedicated software monitoring tools do not fit your existing setup, you could use Logicify Healthcheck Bot to write, configure, and launch even very detailed and sophisticated health-checks. The bot also lets you choose where in your app infrastructure the check outcomes are ultimately stored.
If you are interested in learning more about our monitoring system or Healthcheck Bot, feel free to reach out via the form below. We’ll be glad to share our experience!