At Logicify, we have a powerful and flexible toolset for software monitoring, both for the apps we develop for our clients and for the ones we use internally. Let’s take a closer look at the types and sources of data that form the basis of this monitoring solution.
We <3 Monitoring
At Logicify, we are proud to be software monitoring geeks. We love to monitor both the apps we develop and the ones we use internally. Not because they are sloppy. Not because we don’t trust our code. But because we love to keep abreast of events, control performance and eliminate the risks of an error. Monitoring helps us be proactive and avert issues before real users are affected.
We have already given an overview of our double-sided system of user behavior and app condition monitoring. In short, its main components are Graylog, used as a single data store for logs and other data about the web app, and Grafana, a powerful data visualization tool. Combined and carefully configured, these two tools give an objective picture of the app’s performance at all times.
To get comprehensive snapshots of system behavior and, more importantly for apps in production, to fix problems before they escalate, we need several layers of monitoring data. App-specific metrics should be complemented by other analytics to give a broader picture of system state and performance. Here’s our data recipe for effective monitoring.
Correct Data — the Ground for Monitoring
Unsurprisingly, even the mightiest monitoring solution is useless when fed poor data. Tracking every single metric is time-consuming and buries the signal in noise, while poor or incorrect data is misleading and makes it easy to miss an error. So before tuning any monitoring system, determine which metrics are worth tracking. They should provide an objective picture of the app’s behavior over a given time interval, focus on the bottlenecks you are aware of, and give clues about where to look for the context of an error if one occurs.
Some generic metrics are worth tracking for any web application, e.g. server load or hard drive space, while others are specific to the app’s logic and functionality. Think objectively: what exactly do you need to monitor? Would these metrics be meaningful and indicative of system state?
Data Collected for Our Comprehensive System Monitoring
Metrics and data from the following sources, combined, form the basis of our software monitoring system.
Application logs, analytical data about app performance
Application logs are the primary and most informative source for tracking performance and detecting possible errors. They are also what lets developers troubleshoot whatever issues occur in the system.
Though not obligatory to collect, structured analytical data about app performance is useful for monitoring purposes. It is rich in context and thus very helpful for investigation, debugging, and troubleshooting. In our case, this data is generated by a piece of logic in the app that takes system-specific use cases into account. One case in point is threading through all records for a given request ID: tracing the sequential actions of a given user and finding out which issues they experienced and why. Imagine a client who failed to complete a check-out on your eCommerce platform because of a server glitch. With this app-specific analytics, the developers can isolate the case. Your customer support can then find the customer’s contact info in the system and reach out by phone or email, apologizing for the technical issues and reminding them about the abandoned cart.
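To make the request ID idea concrete, here is a minimal sketch using only the Python standard library. It is not our Django module: the field names `request_id` and `user_id`, the logger name, and the helper `trace_request` are assumptions made for illustration. Each log record is written as one JSON object, and all records of a single request can then be pulled out in order.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical context fields, passed through `extra=` below.
            "request_id": getattr(record, "request_id", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(payload)


def trace_request(lines, request_id):
    """Return the parsed records that belong to one request, in log order."""
    records = (json.loads(line) for line in lines if line.strip())
    return [r for r in records if r["request_id"] == request_id]


# Log into an in-memory buffer so the sketch runs without any log server.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("shop.checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("cart opened", extra={"request_id": "req-1", "user_id": 42})
logger.info("page viewed", extra={"request_id": "req-2", "user_id": 7})
logger.error("payment gateway timeout", extra={"request_id": "req-1", "user_id": 42})

# Everything that happened within the failed check-out request.
failed_checkout = trace_request(buffer.getvalue().splitlines(), "req-1")
for record in failed_checkout:
    print(record["level"], record["message"], "user", record["user_id"])
```

In a real setup the records would go to the central log store instead of a buffer, and the filtering would be a search query there. The point is only that a shared ID on every record turns scattered log lines into one readable story.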
Read more about our custom module for Django used to generate this structured data and to push it to Graylog.
Infrastructure metrics — both generic and service-specific
Apps run on top of infrastructure: physical or virtual resources, such as servers, network resources, and transmission media, either centralized or spread across multiple data centers. For a web application to stay healthy, the technical state of the underlying infrastructure must be monitored constantly. If your app’s infrastructure resides in a cloud, the IaaS provider (Amazon for AWS, Microsoft for Azure, and so on) usually takes care of the physical layer for you.
However, you still need to keep track of certain app-critical metrics, for instance, available storage limits per your tier. You may not even be aware of poor user experience or performance issues in the app if you fail to monitor these generic server metrics:
- CPU load
- remaining hard drive space
- used vs. remaining memory (RAM).
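As a rough illustration of the three metrics listed above, the sketch below reads them with the Python standard library alone. It is a simplification, not a replacement for a dedicated monitoring agent: the load average call exists only on Unix-like systems, the memory figure is read from `/proc/meminfo` and so works on Linux only, and the threshold values are arbitrary assumptions.

```python
import os
import shutil


def disk_used_percent(path="/"):
    """Share of the disk holding `path` that is already used."""
    usage = shutil.disk_usage(path)
    return round(usage.used / usage.total * 100, 1)


def load_per_cpu():
    """1-minute load average per CPU core; None where unavailable."""
    try:
        one_minute = os.getloadavg()[0]
    except (AttributeError, OSError):
        return None
    return round(one_minute / (os.cpu_count() or 1), 2)


def memory_used_percent(meminfo_path="/proc/meminfo"):
    """Used memory based on MemTotal and MemAvailable; Linux only."""
    try:
        fields = {}
        with open(meminfo_path) as meminfo:
            for line in meminfo:
                key, _, rest = line.partition(":")
                fields[key] = int(rest.split()[0])
    except (OSError, ValueError, IndexError):
        return None
    total = fields.get("MemTotal")
    available = fields.get("MemAvailable")
    if not total or available is None:
        return None
    return round((total - available) / total * 100, 1)


def collect():
    """Gather all generic server metrics into one dictionary."""
    return {
        "disk_used_percent": disk_used_percent(),
        "load_per_cpu": load_per_cpu(),
        "memory_used_percent": memory_used_percent(),
    }


# Arbitrary example limits; tune them to your own servers.
THRESHOLDS = {
    "disk_used_percent": 90.0,
    "load_per_cpu": 1.5,
    "memory_used_percent": 90.0,
}


def breaches(metrics, thresholds=THRESHOLDS):
    """Names of metrics that are known and exceed their limit."""
    return [
        name
        for name, limit in thresholds.items()
        if metrics.get(name) is not None and metrics[name] > limit
    ]


metrics = collect()
print(metrics, "over limit:", breaches(metrics))
```

Metrics that cannot be read on the current platform come back as `None` and are skipped by `breaches`, so a missing reading never raises a false alarm.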
At the same time, it also makes sense to keep track of service-specific metrics: for a database, the average interaction time; for a cache, the number of elements it holds and how they are used; and so forth.
To get this telemetry, you can use dedicated software, e.g. Nagios, an industry-standard tool for IT infrastructure monitoring, or Goss, an open-source tool for validating a server’s configuration. Most IaaS providers offer monitoring and management services too, e.g. Amazon CloudWatch or Azure Monitor, which help you understand how your application is performing and optimize resource utilization. Another benefit of these products is their alerting feature.
Outcomes of regular software health-checks
Neither application logs nor infrastructure parameters cover the cases when your app is simply down. Indirect indicators, such as website traffic or the number of requests, are not always indicative of system state, especially in the early stages after an app’s public release.
For a complex multi-component app, it makes sense to run regular proactive health-checks. Again, the specific test cases and metrics to check depend on the application’s logic and architecture. For instance, for service workers, you can schedule a few regular test tasks and check that they were executed, or make sure test messages appear in the message queue. Such health-checks allow proactive monitoring of all of the app’s components, covering both real-time and background processes.
One way to set this up is through custom scripts for specific system components. However, as the system’s functionality grows and new features are added, keeping track of the list of scripts, their execution, and their outcomes may become tiresome and counter-productive, especially if you lack a single data store for monitoring data.
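The kind of checks described above can be sketched as a small runner in which every check is a plain function that passes if it returns without raising. This is a simplified illustration, not production code: the in-process `queue.Queue` stands in for a real message broker, and the worker check uses a hard-coded, hypothetical age of the last test task so that the example shows one failure.

```python
import queue
import time
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str
    duration_ms: float


def run_checks(checks):
    """Run every check and record its outcome instead of stopping on failure."""
    results = []
    for name, check in checks.items():
        started = time.perf_counter()
        try:
            check()
            ok, detail = True, "passed"
        except AssertionError as exc:
            ok, detail = False, f"assertion failed: {exc}"
        except Exception as exc:  # a crashed check is a failed check
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        duration_ms = round((time.perf_counter() - started) * 1000, 2)
        results.append(CheckResult(name, ok, detail, duration_ms))
    return results


# Stand-in for a real message queue (assumption for this sketch).
test_queue = queue.Queue()


def check_queue_round_trip():
    """Publish a test message and make sure it can be consumed again."""
    test_queue.put("health-check-ping")
    received = test_queue.get(timeout=1)
    assert received == "health-check-ping", "test message was lost"


def check_worker_ran_test_task():
    """Fail if the scheduled test task has not run recently."""
    last_run_age_seconds = 900  # hypothetical value; read it from your worker
    assert last_run_age_seconds < 600, "test task has not run in the last 10 minutes"


results = run_checks({
    "queue_round_trip": check_queue_round_trip,
    "worker_test_task": check_worker_ran_test_task,
})
for result in results:
    print(result.name, "OK" if result.ok else "FAILED", "-", result.detail)
```

Running such a script on a schedule (cron, for example) gives you a heartbeat for components that produce no logs when they are silently stuck.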
To solve this, Logicify recently developed a highly configurable HealthCheck bot to help with this kind of proactive monitoring, and released it under the GPLv3 license. With it, developers can easily choose which metrics to monitor and how often. For us, the bot’s main advantage is the ability to write custom assertions in Python for more sophisticated check scenarios. It also lets you define where the outcomes of the health-checks are stored, which is very helpful if you have already built a monitoring system around a single data store. Health-check reports are loaded into that store along with other information about system state and then used for analysis.
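To show what loading a health-check report into a central store can look like, here is a hedged sketch that wraps one check outcome in a GELF 1.1 message, the JSON log format Graylog accepts. It does not reproduce the HealthCheck bot’s actual output format: the additional field names (`_check_name`, `_check_ok`, `_detail`) and the URL in the usage comment are assumptions, and `send_gelf` presumes a GELF HTTP input has been configured on the Graylog side.

```python
import json
import socket
import time
import urllib.request


def to_gelf(check_name, ok, detail, host=None):
    """Wrap one health-check outcome in a GELF 1.1 message."""
    return {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": f"health-check {check_name}: {'ok' if ok else 'FAILED'}",
        "timestamp": time.time(),
        "level": 6 if ok else 3,  # syslog levels: 6 = info, 3 = error
        # Additional fields must start with an underscore; names are hypothetical.
        "_check_name": check_name,
        "_check_ok": 1 if ok else 0,
        "_detail": detail,
    }


def send_gelf(message, url, timeout=5):
    """POST the message to a GELF HTTP input and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


message = to_gelf("queue_round_trip", False, "test message was lost", host="web-1")
print(json.dumps(message, indent=2))

# Placeholder URL; uncomment only with a reachable GELF HTTP input.
# send_gelf(message, "http://graylog.example.org:12201/gelf")
```

Because health-check outcomes arrive in the same store and format as application logs, a failed check can be viewed side by side with the log records from the same time window.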
We can set up a similar monitoring toolset with multiple-layer data from your application being the basis for it. Contact us today if you are interested.
Before you tune a system to monitor your web app’s performance, define the critical metrics to keep an eye on and the sources to receive data from. For a comprehensive picture of your app’s behavior over time, the data should come from multiple sources, such as the app’s logs, on-premise or cloud infrastructure, and dedicated monitoring tools. Run regular proactive check-ups of all system components to catch possible errors not reflected in application logs. To make investigation and troubleshooting convenient and fast, keep all monitoring findings in a single data store.
At Logicify, we build a powerful, comprehensive software monitoring system with consistent and timely alerts and a single store for application logs, structured analytics, infrastructure telemetry, and the outcomes of regular health-checks run by the Logicify bot.
If you are interested in learning more about our monitoring system or have questions about it, feel free to reach out via the form below. We’ll be glad to share our experience!
- Grafana as Yet Another Tool for Technical Monitoring of Software Products
- Graylog as a Tool for Technical Monitoring of Software Products
- Logicify Double-Sided System of User Behavior and System Condition Monitoring