About SLAs (Service Level Agreements) & BSM


What are SLAs? 
SLAs are agreements between a service provider (often the IT organization) and the business units
or customers it serves. They define the level of service expected so that operations stay
reliable, and in effect they document the business's or customer's performance and availability
expectations for an application or service.


3 Major Types of SLAs
SLAs mean different things depending on the context, so let’s define three common types:

  1. Contractual: this kind of agreement is negotiated between two companies, for example between a customer and an outsourcer (IBM Global Services or a smaller managed service provider), an application provider (such as salesforce.com or Amazon’s AWS), or a service provider (such as Dropbox or Evernote).
  2. Service-desk based: mainly for ticket-based systems, where activities involve people and processes and may include complex workflows (such as deploying a system or onboarding a new employee).
  3. Infrastructure based: mainly handled by IT service management systems, where infrastructure is actively monitored and application outages are tracked in real time.
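For infrastructure-based SLAs, an availability percentage is easier to reason about once it is converted into the downtime it allows. The Python sketch below does that conversion; the targets and the 30-day period are illustrative assumptions, not figures from the webinar.

```python
# Minimal sketch: translate an availability target (e.g. 99.9%) into the
# downtime it allows over a reporting period. The targets and the 30-day
# period used below are illustrative assumptions.
from datetime import timedelta


def allowed_downtime(target_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted by target_pct over period."""
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")
    # timedelta supports multiplication by a float (rounded to microseconds).
    return period * (1 - target_pct / 100)


if __name__ == "__main__":
    month = timedelta(days=30)
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime(target, month)
        print(f"{target}% over 30 days -> {budget.total_seconds() / 60:.1f} minutes")
```

Each extra "nine" cuts the allowed downtime by a factor of ten: 99.9% over 30 days leaves roughly 43.2 minutes, while 99.99% leaves about 4.3 minutes.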



BSM – what is it?
Business Service Management
A business-centric view of the services IT provides. BSM is an overarching approach that sits above
ITSM (IT Service Management): IT must shift from a technology view to a customer view, and must
understand the business workflows and the business logic of its applications.


What is it we’re actually trying to achieve?
IT and the business need to speak the same language. For example, from an IT perspective everything may be operational, but can you answer: “Did the user receive what they paid for, or what they actually requested?”

We need to understand the complex workflows that typical ITSM tooling may not actively monitor.


Is it achievable?
Becoming BSM-aware requires a lot of work and transparency on both the IT side and the business
side. Start with a candid dialogue: Are we providing adequate service now? Where should we go?
How can we improve? How are you going to help us improve?

TIP: Can you back-test your application availability and performance? Go back through your
history, see how the business is trending, and determine whether you could actually meet the
targets your SLAs would commit you to.

Decomposing an application
You can’t properly monitor an application and understand its key business drivers without first breaking it down.
  1. What does the application do? How do people interact with it?
    1. What are the key business metrics?  Dollars per hour?  Transactions per minute?  Claims per day?
    2. How is the application built?
    3. Are there key dependencies shared across many applications?  Are you depending on third parties or cloud-based services?
    4. Are you using custom, insular components?  They might be hard to monitor or measure.
  2. What infrastructure does the application sit on?
    1. Will one component of your infrastructure affect many applications?  Do you know how to prioritize fixing outages and map them to key applications?
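The infrastructure questions above amount to reading a dependency map in reverse: given a failing component, which applications does it affect, and which of those matter most? A minimal sketch, in which the component names, applications, and priority tiers are all hypothetical:

```python
# Minimal sketch: invert an application -> components inventory so a failing
# component can be mapped to the applications it affects, ranked by business
# priority. All names and priority tiers here are hypothetical.
from collections import defaultdict

# Hypothetical dependency inventory: application -> components it relies on.
DEPENDENCIES = {
    "online-store": {"db-cluster-1", "payment-gateway", "lb-east"},
    "claims-portal": {"db-cluster-1", "file-server-2"},
    "intranet-wiki": {"file-server-2"},
}

# Hypothetical business priority: lower number = more critical.
PRIORITY = {"online-store": 1, "claims-portal": 2, "intranet-wiki": 3}


def build_impact_map(dependencies):
    """Invert app -> components into component -> set of affected apps."""
    impact = defaultdict(set)
    for app, components in dependencies.items():
        for component in components:
            impact[component].add(app)
    return impact


def affected_apps(component, impact, priority):
    """Return apps affected by component, most business-critical first."""
    apps = impact.get(component, set())
    # Unknown apps sort last; ties are broken alphabetically.
    return sorted(apps, key=lambda app: (priority.get(app, float("inf")), app))


def shared_components(impact, min_apps=2):
    """Return components that at least min_apps applications depend on."""
    return sorted(c for c, apps in impact.items() if len(apps) >= min_apps)


if __name__ == "__main__":
    impact = build_impact_map(DEPENDENCIES)
    print(affected_apps("db-cluster-1", impact, PRIORITY))
    # prints ['online-store', 'claims-portal']
    print(shared_components(impact))
    # prints ['db-cluster-1', 'file-server-2']
```

Here db-cluster-1 and file-server-2 each back two applications, so an outage on either is a candidate to fix ahead of a component that serves only one.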
SLA Routines
Try to shift your view from infrastructure-centric to business-centric. Are you delivering the right data within the right amount of time?  What SLA routines can you put in place?
Building SLAs that work
  1. What are the critical tests for an application?  Can it be actively monitored or is it a complex workflow?
  2. Prioritize your applications: revenue-critical or customer-critical applications should always come first.
  3. How can proactively watching SLAs help?
    1. Catch issues before they become critical! Make sure you can proactively monitor your environment and be alerted before an outage occurs or an SLA is violated.
    2. Tiger Teams
      • Daily ops review: get daily reports (are you trending toward an SLA violation?).
      • Prioritize what to work on (root-cause analysis of problems that impact key applications).
      • Identify the high-priority problems that cause outages. Determine which services impact different applications within your environment, or different components of an application.
      • If something keeps causing outages and headaches – re-architect it!
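A daily ops review can start with a simple trend check. The sketch below linearly extrapolates month-to-date downtime and flags the month as at risk if the projection exceeds the downtime the SLA allows. The linear projection, the 99.9% target, and the example figures are assumptions; a real review would draw on your monitoring data and likely a more careful forecast.

```python
# Minimal sketch of a daily "are we trending toward an SLA violation?" check.
# It assumes downtime accrues at a constant daily rate (linear projection);
# the target and example numbers are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class TrendReport:
    budget_min: float     # downtime the SLA allows this month, in minutes
    used_min: float       # downtime recorded so far this month, in minutes
    projected_min: float  # projected month-end downtime, in minutes
    at_risk: bool         # True if the projection exceeds the budget


def daily_trend(target_pct, days_in_month, day_of_month, downtime_so_far_min):
    """Project month-end downtime from the month-to-date rate.

    day_of_month is treated as the number of days elapsed so far.
    """
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must fall within the month")
    budget = days_in_month * 24 * 60 * (1 - target_pct / 100)
    projected = downtime_so_far_min / day_of_month * days_in_month
    return TrendReport(budget, downtime_so_far_min, projected, projected > budget)


if __name__ == "__main__":
    report = daily_trend(99.9, days_in_month=30, day_of_month=10,
                         downtime_so_far_min=20)
    print(report)
```

For day 10 of a 30-day month with 20 minutes of downtime, the projection is 60 minutes against a budget of roughly 43.2 minutes, so the month is flagged as at risk well before the SLA is actually breached.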

Respect – how do you start?
  1. Take care of the small stuff first
    1. Trying to monitor business metrics when you keep running out of file system capacity is going to be a lost cause. Make sure your systems are as available as possible.
  2. Don’t aim too high at the beginning
    1. Understand what you’re committing to – see if you can back-test SLAs before you sign up for them.
    2. Make this an iterative approach – aim for incremental improvements and be able to demonstrate them.
  3. Overcome the fear of transparency
    1. Over-communication of outages and how they occur might be cringe-worthy at the beginning, but it ultimately helps your team prioritize and improve.
    2. Yes, it sucks to have outages, but communal betterment is… better!

(Webinar SLA Uptime)