Monday 25 May 2015

ITIL - Availability Management

Availability Management is the practice of identifying levels of IT Service availability for use in Service Level Reviews with Customers.
All areas of a service must be measurable and defined within the Service Level Agreement (SLA).
To measure service availability, the following areas are usually covered in the SLA:
  • Agreement statistics – such as what is included within the agreed service.
  • Availability – agreed service times, response times, etc.
  • Help Desk Calls – number of incidents raised, response times, resolution times.
  • Contingency – agreed contingency details, location of documentation, contingency site, 3rd party involvement, etc.
  • Capacity – performance timings for online transactions, report production, numbers of users, etc.
  • Costing Details – charges for the service, and any penalties should service levels not be met.

Availability is usually calculated based on a model involving the Availability Ratio and techniques such as Fault Tree Analysis, and includes the following elements:
  • Serviceability – where a service is provided by a 3rd party organisation, this is the expected availability of a component.
  • Reliability – the time for which a component can be expected to perform under specific conditions without failure.
  • Recoverability – the time it should take to restore a component back to its operational state after a failure.
  • Maintainability – the ease with which a component can be maintained; maintenance can be either remedial or preventative.
  • Resilience – the ability to withstand failure.
  • Security – the ability of components to withstand breaches of security.



Some availability measurements that may be included in the SLA:

  • Mean-Time-Between-Failures (MTBF): the average elapsed time from when a service is restored until it next fails. It represents reliability.
  • Mean-Time-To-Repair (MTTR): the average elapsed time to repair a configuration item or IT service after a failure.
  • Mean-Time-Between-System-Incidents (MTBSI): the average elapsed time between the detection of two consecutive incidents.
  • Mean-Time-To-Restore-Service (MTRS): the average elapsed time from the detection of an incident until the service is restored. It represents maintainability.

MTBSI = MTBF + MTRS

Availability = uptime / (uptime + downtime) = MTBF / (MTBF + MTTR)
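
As a rough illustration of the two formulas above, here is a minimal Python sketch. The function names and the sample figures (600 observed hours, 3 failures, 6 hours of total downtime) are made up for this example, and it assumes that MTTR and MTRS are both simply the average downtime per failure, which will not always hold in practice.

```python
def availability_from_times(uptime, downtime):
    """Availability = uptime / (uptime + downtime)."""
    return uptime / (uptime + downtime)


def availability_from_means(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def mtbsi(mtbf, mtrs):
    """MTBSI = MTBF + MTRS."""
    return mtbf + mtrs


# Hypothetical figures for one reporting period (all values in hours).
observed_hours = 600.0
total_downtime = 6.0
failures = 3

uptime = observed_hours - total_downtime

# Assumption: averages are taken per failure over the period.
mtbf = uptime / failures          # average uptime between failures
mttr = total_downtime / failures  # average downtime per failure

print(f"MTBF:  {mtbf:.1f} h")
print(f"MTTR:  {mttr:.1f} h")
print(f"MTBSI: {mtbsi(mtbf, mttr):.1f} h")
print(f"Availability (times): {availability_from_times(uptime, total_downtime):.2%}")
print(f"Availability (means): {availability_from_means(mtbf, mttr):.2%}")
```

Both availability calculations give the same percentage here because the averages are derived from the same uptime and downtime totals.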





