EventsManager - Alert Threshold Settings Should Limit Number of Notifications Sent
EventsManager - Alert Threshold Settings currently do not allow the end user to specify the number of notifications sent once an alert is triggered. By default, EventsManager will send one notification for each event processed and accounted for as part of the threshold. This can pose issues with E-mail or SMS congestion if a critical process or event occurs.
Example A) EventsManager is configured to send E-mail alerts on a failed system sign-on event - if 10 or more of these events are logged within 60 seconds, send a notification. Currently, if a constantly running process encounters a password reset or expiration, the Windows Event Log will generate an event every 500ms. This generates 120+ entries within a 60 second window, and EventsManager will consequently send 120 notifications, one for each event. Consider the case where the event triggers at 3:00 AM local time and is sent to multiple users within a particular environment, and you can quickly see the impact this can have on a local Exchange server.
Example B) EventsManager is configured to alert on a custom log import, notifying system administrators when 500 or more of a particular error are logged within a 60 second window. Assume this custom log comes from a web application with several thousand users concurrently connected. Consider the case where the application server fails and each session is immediately terminated. EventsManager will parse several thousand events from the application crash, meet the threshold, and send out several thousand notifications regarding the event. The impact on an E-mail server in this case is easy to see.
Proposed Solution: Include an Alert Threshold toggle setting with the following options (for both method 1 and method 2):
A) Send a notification for each occurrence of an event (current default behavior of Threshold Alerting)
B) Send a single notification and include the number of similar events (e.g., pull the event details from the instance which triggered the alert and indicate the total number of similar events which were logged within the specified alert time frame). This should be fairly easy to implement, as I imagine EventsManager already counts the occurrences of an EventID from a particular log to determine if an alert threshold has been met. Option A (above) is provided for users who wish to retain the current notification settings in cases where it is necessary to see each event's details (e.g., multiple account logon failures within the specified time frame, whose individual details would otherwise not be displayed using Option B).
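To illustrate the difference between the two options, here is a minimal Python sketch covering a single alert window. It is not EventsManager code and does not reflect its internals; the names Event, build_notifications, and summarize are hypothetical and exist only for this illustration. It assumes the alert window starts at the first matching event and that the "triggering" instance is the event that brings the count up to the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical stand-in for one parsed log entry."""
    timestamp: float  # seconds since some reference point
    event_id: int
    details: str


def build_notifications(events: List[Event], threshold: int,
                        window_seconds: int, summarize: bool) -> List[str]:
    """Return the notification texts for one alert window.

    summarize=False models Option A (one notification per event).
    summarize=True models Option B (one notification with a count).
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    if not ordered:
        return []

    # Assumption: the window opens at the first matching event.
    start = ordered[0].timestamp
    in_window = [e for e in ordered if e.timestamp - start < window_seconds]

    # Threshold not met: nothing is sent under either option.
    if len(in_window) < threshold:
        return []

    if not summarize:
        # Option A: every event counted toward the threshold is notified.
        return [f"Event {e.event_id}: {e.details}" for e in in_window]

    # Option B: details from the event that met the threshold, plus the total.
    trigger = in_window[threshold - 1]
    return [
        f"Event {trigger.event_id}: {trigger.details} "
        f"({len(in_window)} similar events within {window_seconds}s)"
    ]


# Example A from above: one failed sign-on every 500ms for 60 seconds.
# The event ID and text are placeholders, not real Windows event data.
failed_logons = [Event(i * 0.5, 1000, "failed sign-on") for i in range(120)]

option_a = build_notifications(failed_logons, threshold=10,
                               window_seconds=60, summarize=False)
option_b = build_notifications(failed_logons, threshold=10,
                               window_seconds=60, summarize=True)
print(len(option_a), len(option_b))
```

With the numbers from Example A, Option A yields 120 messages and Option B yields one message carrying the count of 120. A real implementation would also need to decide when the window closes and whether to send the summary immediately or at the end of the time frame; this sketch leaves that open.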