This is the first of a three-part AIOps discussion with ScienceLogic’s Public Sector Principal Solutions Architect, Lee Koepping, and Swish’s CTO, Sean Applegate.
Sean:
Welcome to our three-part AIOps series. I’m Sean Applegate, Swish’s Chief Technology Officer (CTO). First, let’s define AIOps. Gartner defines AIOps as “the application of machine learning and data science to IT operations problems. AIOps platforms combine big data and machine learning functionality to enhance, and partially replace, all primary IT operations functions, including availability and performance monitoring, event correlation and analysis, and IT service management and automation. AIOps platforms consume and analyze the ever-increasing volume, variety and velocity of data generated and present it in a useful way.” For the context of our discussion with ScienceLogic, we’ll focus on use cases that span various IT operational roles and teams, as well as business owners. Applicable organizations include Enterprise Operations Centers, Network Operations Centers, Site Reliability Engineering, and DevOps, as well as the business executives and mission-centric lines of business. It is my pleasure to introduce Lee Koepping from ScienceLogic. Lee, please tell us a little bit about yourself.
Lee:
Thanks, Sean. I am the Public Sector Principal Solutions Architect with ScienceLogic. Throughout my career, I’ve been a CTO, such as yourself, at multiple companies, and I’ve worked with ISPs and MSPs and around the operational space. AIOps is a subject I’m passionate about because of the value that it can bring organizations.
Sean:
I appreciate you joining today. Much like yourself, I’ve been around the observability space for a long time. I’ve spent time at performance engineering OEMs and have had the honor of being a director whose team coordinated Wireshark SharkFest globally. As a performance junkie, I think this is going to be a fun conversation. Let’s jump into our first question. When ScienceLogic says AIOps, what specifically does ScienceLogic mean?
Lee:
That’s a great question. AIOps is a relatively new term. From personal experience it is really an evolution of operations. It’s something we were always ‘trying’ to do. If you look far enough back, anyone involved in an operational capacity has always had this utopian destination in mind and finally, or at least currently, it has a name – AIOps. The Gartner definition that you provided is great. There are a lot of words around what it is, why it is, and what it does for somebody. In a nutshell it’s really automation. I mean, that’s what we’ve always been after. A long time ago you could get your arms around the data. As time went on you could sort of do that, and you might have to bounce it against a couple of systems and have some intermediary middleware to help you figure that out. And then this whole revolution of analytics comes into play with machine learning and artificial algorithms. Things that are tunable without data scientist involvement, which is still relatively new. In general, I think the culmination of data, which has always been there, and the evolution of algorithmic analysis is part of AIOps, but really the goal is streamlined IT automation. So those three things together, the culmination of the data, the analytics, and the automation, are what AIOps is today.
Sean:
That makes a lot of sense. One of the simplest ways I’ve seen ScienceLogic describe AIOps is see, contextualize and act. It’s very easy to remember whether you’re a business executive, an application owner, or you’re the network engineer fixing the plumbing that always seems to be guilty before proven innocent. Give us a short story that speaks to some of those roles, a real-world use case for our readers.
Lee:
A very large customer of ours was bent on the goal of automation and was able to leverage our platform. This drove a lot of our initial approach and development, but the genesis behind it was that we have all this data and are asking ‘how do we better present it?’ Especially to executives. There’s tribal knowledge amongst technical people that allows them to intuitively just know things. You know server XYZ, router ABC. When those have a problem, you just intuitively know what is happening. But management, leadership, or anybody outside of that core IT operations realm wouldn’t know what that meant.
So, part of it is simply structuring data in a way that is intimately familiar to anybody that’s going to look at it. There may be a couple of views on the same data. The concept of business services is an important part of our approach. That drives a lot of context in a platform like ScienceLogic. We derive context through discovery, but allowing you to define context creates a better data lake, which ultimately leads to automation. So, to come full circle, this particular customer had a lot of data that they were beginning to visualize in a very operational way. The next logical conclusion is leveraging knowledge management from their ITSM system to see patterns, things that were repetitive that could be automated. Some of that started in troubleshooting. If a human with the right access and the right knowledge set would pull additional information, you can automate that to the point where, seconds after the event is recognized, all that same information is present in the event and, ultimately, in the incident that gets created. That is a huge use case.
It started on a very large scale with a manufacturing customer in the IT space and is now permeating into large government agencies. Being able to drive automation of troubleshooting and triage is probably the lowest-hanging fruit when it comes to an example of AIOps.
Sean:
That’s great. From a value perspective we all want to have our IT staff do higher value functions and push complex tasks down to lower cost employees that we have more of, such as Tier One or Two Support Analysts at a service desk.
Lee:
There’s obviously value in being able to drive down to a lower cost execution. To give them more bandwidth at the lower level you’ve got to identify repetitive tasks and eliminate them. If I can eliminate the repetitive tasks for Tier One or Tier Two staff, they now have the bandwidth to consume what we can consolidate from the complex tasks above them. You must look at it as a chain to derive real value across the entire operations team.
Sean:
That’s a very good point. The old 80/20 rule. If you can identify the greatest quantity of tickets that are easiest to address, you can reduce the constraint on key resources. This frees them up for higher-value activities.
Lee:
Another point worth making is AIOps, as much as it is a concept, is absolutely a journey. There is no AIOps product. There are products that have AIOps capabilities or AIOps functionality, but there’s no AIOps product. It is not a SKU. It is not a thing. It’s not a module. It is a journey. It’s a bit of a framework.
It’s being able to do multiple things, as I just mentioned: taking the complex and making it easier, eliminating the repetitive, and combining visualization and automation so human decisions benefit from a clearer presentation. Speed of operations and reductions in MTTR result from the automation. It’s decision making and automation.
Sean:
In complex federal enterprises, what we’ve seen is a need to coordinate across different silos. Whether that’s an application developer, an operations support engineer, a network engineer, or a cloud engineer, it’s very important that they collaborate to understand the dependencies in a complex system by seeing the visualizations and data enhanced by AIOps insights.
Lee:
The key is the data lake. There is a resurgence in organizations trying to create a common data lake that DevOps teams can rely on and pull data from for their purposes. This is somewhat different than infrastructure operations, which is also different than executive leadership. Organizations have multiple audiences, but to the extent that they can pull from the same data, they can look through different lenses at the same situation. Yes, you’re driving automation and that’s great, but decision-making humans still must analyze data to some extent, and each respective group is going to analyze it differently. So how do I tailor the presentation of that data in a native way? That’s very much an AIOps concept as well.
Sean:
A lot of that gets back to culture. How do we build a generative, collaborative culture that is focused on performance goals and objectives? Not a bureaucratic one, with more silos and walls that prevent teams from working well together. We’re seeing solutions like ScienceLogic allow those teams to not only consolidate their views and see more things together but work better together. In some cases, consolidating costs and doing tool rationalization can be good for the organization’s budget. Let’s circle back to financial value. If I’m a business executive doing tool rationalization, and I’m looking at something like an AIOps framework, what are some of the ways I can learn about the business value or estimate potential savings when ScienceLogic is one of the AIOps solutions being considered?
Lee:
There are great resources available from the analysts. For example, the Forrester Total Economic Impact studies present the economics and benefits of technologies like AIOps. Early-stage prospects can use ScienceLogic’s AIOps Business Case Calculator to rapidly assess their Return on Investment (ROI) or payback. When they’re ready to tune it or do a deep dive, an organization can work with partners like Swish, or with ScienceLogic directly, to further refine the accuracy of a financial and operational business case.
Often, we input a customer’s incident information, data about MTTR, the number of tools they own, or the number of different silos they have. We can crunch that and present the potential cost savings, the payback or ROI, and the efficiencies both in dollars and resources (i.e., man hours).
Sean:
ROI analysis is well understood by Swish, especially within the Public Sector budget approval process. Swish only deals with federal organizations, and having that business conversation around operational savings is very important for achieving the desired outcomes. Those savings come not just from buying a tool, but from implementing it at scale, adopting it across various teams, and realizing the value. A couple of things stuck out to me in the Forrester TEI study. The first was the average IT analyst’s time savings. They realized 15-minute savings, or avoided 15 minutes of work, on the average ticket. As an individual that’s worked at a service desk, that’s a lot of time savings. That’s real value in employee satisfaction and fast, data-driven decisions. The other eye-opening finding was they were able to reduce their number of events by two thirds. They went from 50,000 events to 17,000 events. That’s huge savings! When dealing with one third as many incidents, your staff can provide a lot more value and get answers to the executives quicker. In large government agencies where they’re dealing with tens of thousands of employees, or in some cases, hundreds of thousands of employees using a critical application, reducing a major outage’s mean time to repair (MTTR) by a significant number of hours is worth millions of dollars in savings per hour. That’s a very powerful return on investment.
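As a rough illustration of the arithmetic behind these figures, here is a minimal Python sketch. The event counts and the 15-minute figure come from the discussion above. The function names and the 10,000-ticket volume are hypothetical, chosen only for illustration, and are not taken from the Forrester study.

```python
def event_reduction(before: int, after: int) -> float:
    """Return the fraction of events eliminated."""
    return (before - after) / before


def hours_saved(tickets: int, minutes_per_ticket: float = 15.0) -> float:
    """Return analyst hours saved for a given ticket volume.

    The 15-minute default is the per-ticket saving quoted in the discussion.
    """
    return tickets * minutes_per_ticket / 60.0


# Event counts quoted in the discussion: 50,000 down to 17,000.
reduction = event_reduction(50_000, 17_000)
print(f"Event reduction: {reduction:.0%}")  # 66%, roughly two thirds

# Hypothetical ticket volume, for illustration only.
print(f"Hours saved on 10,000 tickets: {hours_saved(10_000):,.0f}")
```

Under that assumed volume, a 15-minute saving per ticket adds up to 2,500 analyst hours.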
Lee:
Cisco is a large customer of ScienceLogic’s that experienced effectiveness gains at massive scale. They realized a 5900% MTTR improvement. They took 175,000 tickets, automated triage, and roughly 80,000 were auto-resolved. So, huge savings in both dollars and staff. AIOps savings scale well and efficiency scales well. The larger your scale, the more efficiency you want. That’s obviously the outcome of an AIOps methodology and journey.
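To put the ticket numbers in proportion, a short Python sketch follows. It uses only the two figures quoted above and treats the "roughly 80,000" as exact for the sake of the calculation. The function name is hypothetical.

```python
def auto_resolution_rate(total_tickets: int, auto_resolved: int) -> float:
    """Return the share of tickets resolved without human handling."""
    return auto_resolved / total_tickets


total, auto = 175_000, 80_000  # figures quoted in the discussion
rate = auto_resolution_rate(total, auto)
remaining = total - auto

print(f"Auto-resolved: {rate:.1%}")           # 45.7%
print(f"Left for staff: {remaining:,} tickets")  # 95,000
```

In other words, a little under half of the tickets never needed a person to work them.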
Sean:
Absolutely! Well, Lee, let’s wrap it up and put a bow on today’s insights on AIOps value. We hope the readers join us for part two of our discussion, which is going to focus on data, decisions, and automation best practices. I’d like to thank everybody for reading. Below are links to additional resources we thought could be useful.
Resources