Escalations: typical scenarios for a website monitoring service

Published: 2014-06-12

I was woken up by an SMS at three a.m.
My site went down for three minutes and came back up on its own.
But I could not get back to sleep.

True-life story

As many people know, HostTracker is a website availability monitoring system. One of its main functions is to notify the user of any problems promptly. Both the effectiveness of the notifications and an acceptable level of detail are important. If you send an alert at every “sneeze”, the recipient will not be able to pick the important information out of the flood.

We provide several mechanisms that help the right people get the notifications they need:

  • Separating notifications into several groups according to their criticality;
  • Sending no notifications for short-term failures;
  • Reporting a problem to the manager promptly;
  • Reporting a prolonged failure to the administration;
  • Using the free alert channels first (email, gtalk), and only then the paid ones (SMS or phone call);
  • At the contact level, setting the working hours during which the contact should receive alerts.

There are three types of notifications:

  • The website has “dropped”;
  • The website is still “down”;
  • The website “rose”.

The “dropped” and “rose” notifications are self-explanatory. “Site is still down” notifications are sent after each failed check, but only for confirmed outages. The failure confirmation algorithm is described in the article “False alerts exclusion”.
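The relationship between the three notification types can be sketched as a small state machine. This is only an illustration, not HostTracker's implementation: the function name `notification_events` is hypothetical, and the input is assumed to be one boolean per check that is already the result of the confirmation step (`True` means a confirmed outage).

```python
from typing import List, Optional


def notification_events(confirmed_down: List[bool]) -> List[Optional[str]]:
    """Map a sequence of check results to notification types.

    Each input value is assumed to be the outcome of one check AFTER
    failure confirmation: True = confirmed outage, False = site is up.
    """
    events: List[Optional[str]] = []
    was_down = False
    for is_down in confirmed_down:
        if is_down and not was_down:
            events.append("dropped")      # first confirmed failure
        elif is_down and was_down:
            events.append("still down")   # every further failed check
        elif was_down:
            events.append("rose")         # first successful check after an outage
        else:
            events.append(None)           # site was up and stays up: nothing to send
        was_down = is_down
    return events
```

For example, the check sequence up, down, down, up, up would produce one “dropped”, one “still down” and one “rose” notification, with nothing sent for the first and last checks.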

For each site-contact pair you can enable or disable each notification type. The setting is available in the contact properties as well as in the general “matrix” on the “Notifications subscription” page.
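One way to picture this “matrix” is as a mapping from a (site, contact) pair to the set of enabled notification types. The sketch below is an assumed data model for illustration only; the site name, contact labels and the `is_subscribed` helper are hypothetical and do not reflect how HostTracker stores its settings.

```python
from typing import Dict, Set, Tuple

# Key: (site, contact); value: notification types enabled for that pair.
SubscriptionMatrix = Dict[Tuple[str, str], Set[str]]

# Hypothetical example data.
matrix: SubscriptionMatrix = {
    ("example.com", "admin-email"): {"dropped", "still down", "rose"},
    ("example.com", "admin-sms"): {"dropped", "rose"},
}


def is_subscribed(m: SubscriptionMatrix, site: str, contact: str, event: str) -> bool:
    """Return True if this notification type is enabled for the site-contact pair."""
    # A pair missing from the matrix is treated as having nothing enabled.
    return event in m.get((site, contact), set())
```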

Escalation and the notification detail level

Suppose two people are responsible for the site:

  • Administrator
  • Manager

Let's try to implement the following scenario:

  • When the site “drops”, we want to send an email to the administrator immediately;
  • If the site does not come back up within 15 minutes, we send an SMS to the administrator;
  • If the site is “down” for more than an hour, we send an SMS to the manager.

Add the contacts for these users. While adding them, pay attention to the “Notification Delay” window.

We end up with three contacts with the following delays:

  • Administrator (email) – no delay;
  • Administrator (SMS) – 15-minute delay;
  • Manager (SMS) – 1-hour delay.
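The three delays above amount to a simple rule: a contact becomes eligible for a notification once the outage has lasted at least as long as that contact's delay. Here is a minimal Python sketch of that rule. The `Contact` class and the `contacts_to_notify` function are hypothetical names used for illustration, not part of HostTracker, and treating the threshold as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class Contact:
    owner: str          # who receives the alert
    channel: str        # "email" or "SMS"
    delay: timedelta    # the "Notification Delay" configured for this contact


CONTACTS = [
    Contact("Administrator", "email", timedelta(0)),
    Contact("Administrator", "SMS", timedelta(minutes=15)),
    Contact("Manager", "SMS", timedelta(hours=1)),
]


def contacts_to_notify(contacts: List[Contact], downtime: timedelta) -> List[Contact]:
    """Return the contacts whose delay has already elapsed for this outage."""
    # Assumption: a contact is eligible as soon as downtime reaches its delay.
    return [c for c in contacts if downtime >= c.delay]
```

With these values, a 5-minute outage reaches only the administrator's email, a 20-minute outage also reaches the administrator's SMS, and a 90-minute outage reaches all three contacts.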

With this configuration the administrator will receive every failure notification by email, but SMS notifications will be sent only if the site is “down” for more than 15 minutes. The manager will receive an SMS only about major failures lasting more than an hour.

Setting up the contact working schedule

Suppose one administrator cannot cope, so we hire a second one. The first works during the first half of the week, the second during the second half. Accordingly, notifications should be sent to the administrator who is “on duty”. To set up this scenario, use the “Set the contact working hours” window in the contact settings.

In this case the first administrator will receive SMS notifications from Monday to Thursday inclusive. You can also split notifications between employees by time of day, for example by appointing day and night administrators.
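The working-hours check can be sketched as follows. This is an illustrative model under stated assumptions, not HostTracker's implementation: `WorkingHours` and `on_duty` are hypothetical names, the second administrator is assumed to cover Friday to Sunday, and shifts that cross midnight are not handled.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet


@dataclass(frozen=True)
class WorkingHours:
    weekdays: FrozenSet[int]   # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start: time = time.min     # default: from the start of the day
    end: time = time.max       # default: to the end of the day


# First administrator: Monday to Thursday inclusive (from the article).
FIRST_ADMIN = WorkingHours(frozenset({0, 1, 2, 3}))
# Second administrator: assumed to cover Friday to Sunday.
SECOND_ADMIN = WorkingHours(frozenset({4, 5, 6}))


def on_duty(schedule: WorkingHours, moment: datetime) -> bool:
    """Return True if the contact should receive alerts at this moment."""
    return (
        moment.weekday() in schedule.weekdays
        and schedule.start <= moment.time() <= schedule.end
    )
```

A day and night split could be modelled the same way by giving two contacts the same weekdays but different `start` and `end` times.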

Conclusion: with relatively simple mechanisms we can cover most user scenarios for fine-tuning notifications.

Tags: usecase