Datawise Recipe for Effective Software Monitoring: App Logs, Infrastructure Telemetry, Health-Check Reports

Daryna Tkachenko

We at Logicify have a powerful & flexible toolset for software monitoring - both for apps we develop for our clients and the ones we use internally. Let's take a closer look at the types and sources of data that form the basis of this monitoring solution.


We <3 Monitoring

At Logicify, we are proud to be software monitoring geeks. We love to monitor both the apps we develop and the ones we use internally. Not because they are sloppy. Not because we don’t trust our code. But because we love to keep abreast of events, control performance and eliminate the risks of an error. Monitoring helps us be proactive and avert issues before real users are affected.

We already gave an overview of our double-sided system of user behavior and app condition monitoring. In short, its main components are Graylog, used as a single data storage for logs and other data about the web app, and Grafana, a powerful data visualization tool. Combined and wisely configured, these two tools give an objective picture of the app’s performance at all times.

For comprehensive snapshots of system behavior and, more importantly for apps in production, for proactive fixes before trouble escalates, we need several layers of monitoring data. App-specific metrics should be complemented by other analytics to give us a broader picture of system state and performance. Here’s our datawise recipe for effective monitoring.

Correct Data — the Ground for Monitoring

Unsurprisingly, even the mightiest monitoring solution is useless when fed poor data. Keeping track of every single metric is time-consuming and rarely pays off, while poor or incorrect data is misleading and can make you miss an error. So before tuning any monitoring system, be sure to determine the metrics worth tracking. They should provide an objective picture of an app’s behavior in a given time interval, focus on the bottlenecks you are aware of, and give clues about where to look for the context of an error if one occurs.

There are some generic metrics worth tracking for any web application, e.g. server load or hard drive space, and some specific metrics that depend on the app’s logic and functionality. Think objectively: what exactly do you need to monitor? Would these metrics be meaningful and indicative of system state?

Need help defining the key metrics to be monitored in your application? We are here to help. Contact us via email or using the form below.

Data Collected for Our Comprehensive System Monitoring

Combined metrics and data from the following sources form the ground of our software monitoring system.

Scheme with Different Data Types for a Monitoring Toolset.

Application logs, analytical data about app performance

Application logs are the primary and most informative source for tracking performance and spotting possible errors. They also allow devs to troubleshoot whatever issues occur in the system.

Though not obligatory to collect, structured analytical data about app performance is useful for monitoring purposes. It is context-rich and thus very helpful for investigation, debugging, and troubleshooting. In our case, this data is generated by a piece of logic in the app that takes system-specific use cases into account. One case in point is threading through all records for a given request ID: tracing the sequential actions of a given user and finding out which issues they experienced and why. Imagine a client who failed to complete a check-out on your eCommerce platform because of a server glitch. With this app-specific analytics, the devs can isolate the case. Your customer support can then find the customer's contact info in the system and reach out via phone or email, apologizing for the technical issues and reminding them about the abandoned cart.
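To make the request-ID idea concrete, here is a minimal sketch using only Python's standard `logging` and `json` modules. It is not the Django module mentioned in this article: the field names (`request_id`, `user_id`), the logger name, and the helper names are all assumptions made for illustration, and the records are kept in an in-memory list rather than pushed to Graylog.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical context fields, passed in via `extra=`.
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(payload)


class ListHandler(logging.Handler):
    """Collect formatted records in memory (a stand-in for a log store)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def trace_request(lines, request_id):
    """Return the messages logged for one request, in the order emitted."""
    events = (json.loads(line) for line in lines)
    return [e["message"] for e in events if e["request_id"] == request_id]


def demo():
    """Log two interleaved requests and return the structured lines."""
    logger = logging.getLogger("shop.checkout")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    ctx_a = {"request_id": "req-a", "user_id": 42}
    ctx_b = {"request_id": "req-b", "user_id": 7}
    logger.info("cart opened", extra=ctx_a)
    logger.info("cart opened", extra=ctx_b)
    logger.error("payment gateway timeout", extra=ctx_a)
    logger.removeHandler(handler)
    return handler.lines
```

Calling `trace_request(demo(), "req-a")` yields only the events of the failed check-out, which is the kind of isolation described above. In a real setup the same filter would typically be a search query in the log store rather than Python code.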

Read more about our custom module for Django used to generate this structured data and to push it to Graylog.

Infrastructure metrics — both generic and service-specific

Apps run atop infrastructure: physical or virtual resources, such as servers, network resources, and transmission media, centralized or spread across multiple data centers. For a web application to stay healthy, the technical state of the physical infrastructure should be constantly monitored. If your app’s infrastructure resides in a cloud, IaaS providers (Amazon for AWS, Microsoft for Azure and so on) usually take care of this for you.

However, you still need to keep track of certain app-critical metrics, for instance, available storage limits per your tier. You may not even be aware of poor user experience or performance issues in the app if you fail to monitor these generic server metrics:

  • CPU load
  • remaining hard drive space
  • used vs remaining memory (RAM).
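As a rough illustration of how these three readings can be taken, the sketch below uses only the Python standard library. It assumes a Unix-like host for the load average and a Linux host for `/proc/meminfo`; the function names and the keys of the returned dictionary are made up for this example. Dedicated monitoring tools collect the same data far more thoroughly.

```python
import os
import shutil


def read_memory_kb(meminfo_path="/proc/meminfo"):
    """Return (total_kb, available_kb) on Linux, or None if unavailable."""
    try:
        fields = {}
        with open(meminfo_path) as fh:
            for line in fh:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts:
                    fields[key] = int(parts[0])
        return fields["MemTotal"], fields["MemAvailable"]
    except (OSError, KeyError, ValueError):
        return None


def collect_server_metrics(path="/"):
    """Take one snapshot of disk, CPU load and memory readings."""
    disk = shutil.disk_usage(path)
    used_percent = round(100 * disk.used / disk.total, 1) if disk.total else 0.0
    metrics = {
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
        "disk_used_percent": used_percent,
    }
    # os.getloadavg() exists only on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            load_1m = os.getloadavg()[0]
            metrics["load_1m"] = load_1m
            # Load divided by core count is easier to compare across hosts.
            metrics["load_per_cpu"] = load_1m / (os.cpu_count() or 1)
        except OSError:
            pass
    memory = read_memory_kb()
    if memory is not None:
        total_kb, available_kb = memory
        metrics["mem_total_kb"] = total_kb
        metrics["mem_available_kb"] = available_kb
    return metrics
```

A snapshot like this would normally be taken on a schedule and shipped to the same data storage as the application logs, so that a slow request can be lined up against the server's state at that moment.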

At the same time, it also makes sense to keep track of service-specific metrics: for a database, average interaction time; for a cache, the number of elements in it and their utilization; and so forth.

To get this telemetry, you can use dedicated software applications, e.g. Nagios, an industry-standard tool for IT infrastructure monitoring, or Goss, an open-source tool for validating a server’s configuration. Most IaaS providers offer monitoring and management services too, e.g. Amazon CloudWatch or Azure Monitor, which help you understand how your application is performing and optimize resource utilization. Another benefit of these software products is their alerting feature.

Outcomes of regular software health-checks

Neither application logs nor infrastructure parameters cover the cases when your app is simply down. Indirect indicators, such as website traffic or number of requests, are not always indicative of system state, especially in the early stages after an app’s public release.

For a complex multi-component app, it makes sense to conduct regular proactive health-checks. Again, the specific test cases and metrics to be checked depend on the application logic and architecture. For instance, for service workers, you can schedule a few regular test tasks and check their execution, or make sure test messages appear in the message queue. Such health-checks allow proactive monitoring of all the app’s components, targeting both real-time and background processes.
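The worker check described above can be sketched as a round trip: put a test task on the queue and verify that it comes back processed within a time limit. The example below is only an illustration built on Python's in-process `queue` and `threading` modules. A real deployment would talk to its actual message broker, and the function names and report fields here are assumptions.

```python
import queue
import threading
import time


def worker(tasks, results):
    """Stand-in background worker: reports every task ID back as done."""
    while True:
        task_id = tasks.get()
        if task_id is None:  # sentinel value: stop the worker
            break
        results.put(task_id)


def check_worker_round_trip(tasks, results, timeout=2.0):
    """Enqueue a test task and report whether it was processed in time."""
    probe_id = f"healthcheck-{time.monotonic_ns()}"
    started = time.monotonic()
    tasks.put(probe_id)
    try:
        done_id = results.get(timeout=timeout)
    except queue.Empty:
        # Nobody picked the task up: the worker is down or stuck.
        return {"check": "worker_round_trip", "ok": False, "detail": "timed out"}
    elapsed = time.monotonic() - started
    return {
        "check": "worker_round_trip",
        "ok": done_id == probe_id,
        "elapsed_s": round(elapsed, 3),
    }


def demo():
    """Run the check once against a live worker thread."""
    tasks, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(tasks, results), daemon=True)
    thread.start()
    report = check_worker_round_trip(tasks, results)
    tasks.put(None)  # ask the worker to stop
    thread.join(timeout=2.0)
    return report
```

The timeout is the important design choice: a worker that is alive but hopelessly backlogged fails the check just like a dead one, which is usually the behavior you want from a proactive probe.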

One way to set this up is through custom scripts for specific system components. However, as the system's functionality grows and new features are added, it may become tiresome and counter-productive to keep track of the list of scripts, their execution, and their outcomes, especially if you lack a single data storage for monitoring data.

As a solution to this, Logicify recently developed a highly configurable HealthCheck bot to aid with this kind of proactive monitoring. With it, developers can easily choose which metrics to monitor and how often. For us, the main advantage of the HealthCheck bot is the ability to write custom assertions in Python for even more sophisticated check scenarios. The bot also lets you define where to store the outcome of these health-checks, which is very helpful if you have already built a monitoring system around a single data storage. The bot was released under the GPLv3 license. Health-check reports are loaded into the monitoring data storage, along with other information about system state, and further used for analysis.
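To show the general shape of assertion-based checks with a configurable destination for their outcomes, here is a small generic sketch. It is not the HealthCheck bot's API or configuration format, which this article does not describe. `run_checks`, the two sample checks, their hard-coded readings, and the report fields are all hypothetical.

```python
import time


def run_checks(checks, sink):
    """Run each named check and pass one report record per check to `sink`."""
    reports = []
    for name, check in checks.items():
        started = time.time()
        try:
            check()  # a check passes unless it raises
            status, detail = "pass", ""
        except AssertionError as exc:
            status, detail = "fail", str(exc)
        except Exception as exc:  # the check itself is broken
            status, detail = "error", f"{type(exc).__name__}: {exc}"
        report = {
            "check": name,
            "status": status,
            "detail": detail,
            "timestamp": started,
        }
        sink(report)  # e.g. append to a list, write to a file, send to a log store
        reports.append(report)
    return reports


# Sample checks with hard-coded, hypothetical readings.
# Note: plain `assert` statements are skipped when Python runs with -O.
def check_disk_quota():
    used_percent = 71
    assert used_percent < 90, f"disk is {used_percent}% full"


def check_cache_populated():
    cached_items = 0
    assert cached_items > 0, "cache is empty"


SAMPLE_CHECKS = {
    "disk_quota": check_disk_quota,
    "cache_populated": check_cache_populated,
}
```

Passing `stored.append` as the sink collects the reports in a list, while a different callable could forward them to whatever storage the rest of the monitoring data lives in. Keeping the sink separate from the checks is what makes the destination configurable.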

We can set up a similar monitoring toolset with multiple-layer data from your application being the basis for it. Contact us today if you are interested.

Bottom Line

Before you tune a system to monitor your web app’s performance, define the critical metrics to keep an eye on and the sources to receive data from. For a comprehensive picture of your app’s behavior over time, the data should come from multiple sources, such as the app’s logs, on-premise or cloud infrastructure, and dedicated monitoring tools. Run regular proactive checkups of all system components to catch possible errors not reflected in application logs. To make investigation and troubleshooting convenient and fast, it is advisable to have a single data storage for all monitoring findings.

At Logicify, we build a powerful, comprehensive software monitoring system with consistent and timely alerts and a single depot for application logs and structured analytics, infrastructure telemetry, and the outcomes of regular health-checks run by the Logicify bot.

If you are interested in learning more about our monitoring system or have questions about it, feel free to reach out via the form below. We’ll be glad to share our experience!


