SLIC for Steelcentral – Application Service Level Monitoring

An IT monitoring system constantly delivers huge volumes of data: IP addresses, applications, TCP ports, metrics, values and more, in a quantity and depth that can be confusing and overwhelming. Often this data is used only for troubleshooting workflows, but the same complex numbers can also feed high-quality SLA reports and service-evaluation dashboards. This is the job of SLIC.

Availability & Performance - Management Metrics

Reporting the quality of an IT service usually requires two parameters: availability and performance. If both parameters exist for all service elements, they can be extrapolated to the whole IT service. SLIC is the tool that creates such values, combining availability and performance in one number, for an application, a server, a service group, a department, or the IT service as a whole.
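
The idea of one number per element, rolled up to a whole service, can be illustrated with a minimal Python sketch. It does not show how SLIC itself calculates the value: the product of availability and performance, the weighted mean used for the roll-up, and all names (Element, element_score, service_score, weight) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    availability: float  # fraction of the period the element was reachable, 0..1
    performance: float   # fraction of measurements within target, 0..1
    weight: float = 1.0  # hypothetical importance of the element


def element_score(e: Element) -> float:
    """Combine both parameters into one number (assumed here: a simple product)."""
    return e.availability * e.performance


def service_score(elements: list[Element]) -> float:
    """Roll element scores up to a service as a weighted mean (an assumption)."""
    total_weight = sum(e.weight for e in elements)
    if total_weight == 0:
        raise ValueError("at least one element with non-zero weight is required")
    return sum(element_score(e) * e.weight for e in elements) / total_weight


web = Element("web-frontend", availability=1.0, performance=0.9)
db = Element("database", availability=0.5, performance=1.0)
print(round(service_score([web, db]), 3))
```

The same roll-up could be repeated one level higher, for example from services to a department, by treating each service score as an element.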

Service-oriented Dashboards with Top-Down Workflows

SLIC presents objects, metrics and values in a clear top-down approach: it starts from the end-user experience, shows the metrics and values causing an incident, and finally leads back to the root systems and the parameters responsible for the incidents.

More is less - information instead of data

Showing bytes per minute in a graph for a single element over one day creates 1440 data points on the screen. How, then, could one screen show thousands of elements? In SLIC we avoid overburdening users with millions of data points that are not relevant to them, and focus our dashboards on symptoms and incidents instead. This reduction makes it possible to display a large number of elements and categories in a single graph.
Figure 1: SLA Dashboard
Figure 2: Application Incident Heat Chart

For the selected application, SLIC shows all metrics that caused incidents; metrics and points in time without incidents are not displayed.
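
A rough Python sketch of this kind of reduction follows. It is not SLIC's implementation: the incident records, the metric names and the hourly grouping are hypothetical, chosen only to show how a grid of per-minute data points shrinks to the few cells that actually contain incidents.

```python
from collections import defaultdict

# Hypothetical incident records: (metric name, minute of day 0..1439)
incidents = [
    ("server_response_time", 545),
    ("server_response_time", 546),
    ("tcp_retransmissions", 610),
]
all_metrics = ["server_response_time", "tcp_retransmissions", "throughput", "connections"]


def heat_chart_cells(incidents):
    """Count incidents per (metric, hour); combinations without incidents never appear."""
    cells = defaultdict(int)
    for metric, minute in incidents:
        cells[(metric, minute // 60)] += 1
    return dict(cells)


cells = heat_chart_cells(incidents)
full_grid = len(all_metrics) * 1440  # one data point per metric and minute
print(f"{len(cells)} cells instead of {full_grid} data points")
for (metric, hour), count in sorted(cells.items()):
    print(f"{metric} {hour:02d}:00 -> {count} incident(s)")
```

Metrics such as throughput and connections, which had no incidents in this made-up data, do not appear in the result at all.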

Revisable Service Level Reports

SLIC imports data from various data sources, correlates them and displays the data at different levels of visibility, e.g.:
• System / Server by SNMP / TXT file
• Network & application performance data from ARX / NetFlow / others
• Synthetic monitoring systems via API/DB Import
• Business Analytics via API
• Or many others
The reports that SLIC creates are cleanly formatted and available as HTML or PDF. Users can add comments as well as maintenance windows; the latter are excluded from the calculations. Reports are created automatically, without user interaction, and forwarded to a defined email list.
Figure 3: SLA / BSC Week report
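
How excluding maintenance time changes a reported value can be shown with a small Python sketch. It is an illustration under stated assumptions, not SLIC's calculation: the function name availability, the minute-based (start, end) pairs and the rule that maintenance windows do not overlap each other are all hypothetical.

```python
def availability(period, outages, maintenance):
    """Availability over `period`, ignoring time inside maintenance windows.

    All arguments use (start, end) pairs in minutes. Maintenance windows are
    assumed not to overlap each other (a simplification for this sketch).
    """
    def overlap(a, b):
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    excluded = sum(overlap(period, m) for m in maintenance)
    measured = (period[1] - period[0]) - excluded
    if measured <= 0:
        raise ValueError("no measured time left after excluding maintenance")

    downtime = 0
    for o in outages:
        clipped = (max(o[0], period[0]), min(o[1], period[1]))
        in_period = overlap(period, o)
        in_maintenance = sum(overlap(clipped, m) for m in maintenance)
        downtime += in_period - in_maintenance  # only unplanned downtime counts
    return 1 - downtime / measured


period = (0, 1000)
outages = [(100, 150), (400, 500)]
print(availability(period, outages, maintenance=[]))
print(availability(period, outages, maintenance=[(400, 600)]))
```

In this made-up example the second outage falls entirely inside a maintenance window, so it lowers neither the measured time base unfairly nor the resulting availability.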

Multi-Tier-Level Service Discovery

Application problems are often caused by a backend element in the tier chain, but are noticed at the receiving front-end server, e.g. a web server. Without knowledge of the service architecture at the time of the incident, the search for the cause is time-consuming and difficult. SLIC Service health runs a daily automated service discovery: front-end tiers and three levels of backend tiers are recognized, metrics are collected, and incidents are calculated. For TCP applications, SLIC uses its own baseline algorithm, which calculates incidents based on same-time/same-day histories for the particular element.
Figure 4: Service Architecture Discovery
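
The general idea of a same-time/same-day baseline can be sketched in Python as follows. This is not SLIC's own baseline algorithm, whose details are not described here; the hourly slots, the mean plus k standard deviations rule, and the names build_baseline, is_incident, k and min_spread are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def build_baseline(history):
    """Group historical samples by (weekday, hour) and store mean and std deviation."""
    buckets = defaultdict(list)
    for timestamp, value in history:
        buckets[(timestamp.weekday(), timestamp.hour)].append(value)
    return {slot: (mean(values), pstdev(values)) for slot, values in buckets.items()}


def is_incident(baseline, timestamp, value, k=3.0, min_spread=1.0):
    """Flag a value that exceeds the baseline of its own weekday/hour slot."""
    slot = (timestamp.weekday(), timestamp.hour)
    if slot not in baseline:
        return False  # no history for this slot yet, so nothing to compare against
    avg, spread = baseline[slot]
    # min_spread (hypothetical) avoids a zero-width band when history is constant
    return value > avg + k * max(spread, min_spread)


# Hypothetical response times (ms) of one element on four Mondays at 09:00
history = [
    (datetime(2024, 1, 1, 9, 0), 100.0),
    (datetime(2024, 1, 8, 9, 0), 110.0),
    (datetime(2024, 1, 15, 9, 0), 90.0),
    (datetime(2024, 1, 22, 9, 0), 100.0),
]
baseline = build_baseline(history)
print(is_incident(baseline, datetime(2024, 1, 29, 9, 30), 180.0))
print(is_incident(baseline, datetime(2024, 1, 29, 9, 30), 115.0))
```

Comparing each value only with the history of its own weekday and hour means that a regular Monday-morning peak is treated as normal, while the same value at an unusual time could still stand out.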

Multi-Technologies

Because SLIC's data structure is modular, all kinds of technologies can be imported and processed, whether Industry 4.0 objects, application metrics, network device counters or security events. Processing them together in SLIC allows a time-based correlation of all incidents across a wide technical landscape. Business analytics or user-experience data can be added to understand the impact of technical issues on business success.

All-Data Import

SLIC can import data from many different sources; the only requirement is an identifiable data structure. Databases, APIs and TXT files are all valid data sources.

Capture-trace-based Incidents

SLIC can use packet traces for incident generation. Thousands of trace files, created either by capture appliances or by Wireshark/tcpdump agents, can be used for monitoring. SLIC-TraceMonitor, a module for permanent trace analysis, is responsible for defining incidents based on deep packet capture conditions such as bytes, codes, flags or timing issues. The chart below shows the statistics over one day for a defined scenario, clearly identifying which traces are critical, who is responsible (network or application team) and which metrics are causing the incidents.
Figure 5: Trace Analysis Dashboard
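
Condition-based incident generation of this kind can be pictured with a short Python sketch. It is not the SLIC-TraceMonitor rule engine: the per-trace summary fields, the thresholds, the rule list RULES and the assignment of metrics to teams are hypothetical examples.

```python
from collections import Counter

# Hypothetical per-trace summaries, e.g. exported from a packet analysis step
traces = [
    {"file": "t1.pcap", "retransmissions": 42, "http_5xx": 0, "server_time_ms": 80},
    {"file": "t2.pcap", "retransmissions": 1, "http_5xx": 7, "server_time_ms": 950},
    {"file": "t3.pcap", "retransmissions": 0, "http_5xx": 0, "server_time_ms": 60},
]

# Hypothetical rules: (metric, threshold, responsible team)
RULES = [
    ("retransmissions", 10, "network"),
    ("http_5xx", 0, "application"),
    ("server_time_ms", 500, "application"),
]


def evaluate(trace):
    """Return one incident per rule whose threshold is exceeded by the trace."""
    return [
        {"file": trace["file"], "metric": metric, "team": team}
        for metric, limit, team in RULES
        if trace[metric] > limit
    ]


incidents = [incident for trace in traces for incident in evaluate(trace)]
critical_files = sorted({incident["file"] for incident in incidents})
per_team = Counter(incident["team"] for incident in incidents)
print(critical_files)
print(dict(per_team))
```

Each incident keeps the trace file, the metric and the team, which are the three questions the dashboard answers: which traces are critical, who is responsible, and which metrics cause the incidents.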

Service Quality

SLIC extends the view of IT services. It provides a single-digit Service Quality number for a whole service or for its individual contributing elements, allowing C-level management to understand the quality of service at a glance.

Intelligence

By including multiple technologies from multiple data sources and, last but not least, through an intelligent data processing system that focuses on incidents, the complete service infrastructure and its success for the end user can be displayed and understood.

Round Table

SLIC makes it possible for C-level management and different departments to sit together at one table over a single report and discuss the service quality and its parameters.