Tuesday, February 6, 2018

Gaming Business Process Outsourcing (BPO) Contracts

Introduction

Business Process Outsourcing (BPO) contracts are governed by a number of Service Level Agreements (SLAs) that client and vendor establish so that the vendor's performance matches the client's requirements and expectations. In a help desk context, one of the most basic measures represented in an SLA is the Mean Time To Resolve (MTTR) for an incident, which is the average time it takes to provide a service or resolve an issue of standard complexity, assuming an agent with standard training and skill. Expected times to close are typically set for a variety of incident/task types and categories, and each incident that takes longer than its target time to resolve is marked as in breach of SLA. Most contracts have targets for monthly breaches, expressed as a percentage of total incidents, and penalties can be attached when the targets are not met. If SLAs are tight, agents start to look for innovative ways to game the system. This article explores a few well-known ways to game the typical service contract. The objective here isn't to encourage cheating; rather, the goal is to help improve contract structure by adding some Key Performance Indicators (KPIs) that help spot and discourage these behaviors.
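To make the two basic measures concrete, here is a minimal sketch of how MTTR and the breach percentage might be computed. The records and the `(hours_to_resolve, sla_target_hours)` layout are hypothetical and not taken from any particular service management tool.

```python
from statistics import mean

# Hypothetical incident records: (hours_to_resolve, sla_target_hours)
incidents = [(2.0, 4.0), (5.5, 4.0), (1.0, 8.0), (9.0, 8.0), (3.0, 4.0)]

def mttr(records):
    """Mean Time To Resolve, in hours."""
    return mean(hours for hours, _ in records)

def breach_rate(records):
    """Fraction of incidents whose resolve time exceeded the target."""
    breaches = sum(1 for hours, target in records if hours > target)
    return breaches / len(records)

print(f"MTTR: {mttr(incidents):.2f} h")
print(f"Breach rate: {breach_rate(incidents):.0%}")
```

A contract target such as "no more than X% of incidents in breach per month" would then be checked against `breach_rate` for that month's incidents.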

Nomenclature

Gaming service desk SLAs can apply to both Incidents and Requests, so these terms are used interchangeably in this article. They are very different service delivery tasks with different objectives, but a discussion of the differences is beyond the scope of this article. Similarly, "caller" and "requester" are used interchangeably here, although many service management systems make a distinction between the "caller", being the person having the need or issue, and the "requester", being the person submitting the request or incident on behalf of the caller. Lastly, "resolver" refers to the person who is responsible for resolving the issue or completing the request. "Agent" is another commonly used term for the same role.

The Sweep: Cancel the incident, ideally before breach

Canceled incidents and requests are generally those that resolve by themselves or are resolved by the caller, become irrelevant due to caller reassignment/termination, are superseded by a new incident, or otherwise require no work due to a change of status or new information. These often don't count towards SLAs because no work is assumed to have been done if the incident was canceled, and if work was done, it probably did not produce a satisfactory result for the caller, or certainly not one worth paying for. Knowing this, a resolver can cancel incidents that are difficult, aging, or from a hostile caller, knowing that if the issue persists, a new incident can be opened with a fresh SLA timer. Of course, if the caller objects to the cancellation, a plausible response is that the incident was canceled prematurely, or by accident. Sweeps can occur right before a report cycle to clean up an aging backlog, and in many cases, there might not be many complaints since aging incidents tend to be stale anyway. Still, the correct procedure is to reach out to the caller to confirm resolution.
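One way to spot possible sweeps is to look for canceled incidents that are followed shortly afterwards by a new incident from the same caller in the same category. The sketch below assumes hypothetical field names (`caller`, `category`, `state`, `opened`, `closed`) and an arbitrary seven-day window; both would need adjusting to the actual data.

```python
from datetime import datetime, timedelta

# Hypothetical records; field names are assumptions, not any tool's schema.
incidents = [
    {"id": 1, "caller": "ann", "category": "email", "state": "canceled",
     "opened": datetime(2018, 1, 2), "closed": datetime(2018, 1, 9)},
    {"id": 2, "caller": "ann", "category": "email", "state": "resolved",
     "opened": datetime(2018, 1, 10), "closed": datetime(2018, 1, 11)},
    {"id": 3, "caller": "bob", "category": "vpn", "state": "canceled",
     "opened": datetime(2018, 1, 3), "closed": datetime(2018, 1, 4)},
]

def suspected_sweeps(records, window=timedelta(days=7)):
    """Return ids of canceled incidents followed by a new incident from the
    same caller and category within `window` of the cancellation."""
    flagged = []
    for canceled in records:
        if canceled["state"] != "canceled":
            continue
        for other in records:
            if (other["id"] != canceled["id"]
                    and other["caller"] == canceled["caller"]
                    and other["category"] == canceled["category"]
                    and canceled["closed"] <= other["opened"]
                    <= canceled["closed"] + window):
                flagged.append(canceled["id"])
                break
    return flagged

print(suspected_sweeps(incidents))
```

A flagged incident is not proof of gaming, only a candidate for review. Tracking the cancellation rate per resolver and per reporting period is a simpler complementary measure.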

The Punt: escalate or assign to another resolver group

In a single-vendor environment, the punt is often premature escalation. This can happen when the first-level agent's knowledge or skill is insufficient to resolve the issue, the Knowledge Base (KB) articles are incomplete, or training has not been provided or completed for the subject area. In a multi-vendor environment, each vendor might have its own SLAs, so that when an incident gets punted to another vendor's queue, the clock on the previous vendor stops and the next vendor's clock starts. This provides a handy way for less-than-honorable vendors to reassign undesirable incidents or requests to another resolver group, thereby stopping the SLA clock and allowing the resolver to work on something more fruitful. This is one of the reasons reassignment count is a key measure to monitor. However, it is unusual to see an SLA on reassignment counts, because the client and caller typically only care that the incident was resolved within the expected time.
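Reassignment count is straightforward to derive if the assignment history is available. The following sketch assumes a hypothetical, time-ordered log of `(incident_id, assignment_group)` pairs; real systems usually expose this through an audit or history table with a different shape.

```python
from collections import Counter

# Hypothetical assignment log: (incident_id, assignment_group), in time order.
assignment_log = [
    ("INC1", "service_desk"), ("INC1", "messaging"),
    ("INC2", "service_desk"),
    ("INC3", "service_desk"), ("INC3", "mobile"), ("INC3", "messaging"),
    ("INC3", "mobile"),
]

def reassignment_counts(log):
    """Count group changes per incident; the first assignment is not counted."""
    counts = Counter()
    last_group = {}
    for incident_id, group in log:
        if incident_id in last_group and last_group[incident_id] != group:
            counts[incident_id] += 1
        counts.setdefault(incident_id, 0)  # keep incidents with no reassignments
        last_group[incident_id] = group
    return dict(counts)

counts = reassignment_counts(assignment_log)
# Assumed threshold: three or more hops marks an incident for review.
ping_pong = [incident_id for incident_id, n in counts.items() if n >= 3]
print(counts, ping_pong)
```

Incidents that bounce back to a group that already held them, like `INC3` here, are usually the most interesting ones to review.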

The Push Back: Push a task to the Caller


SLAs typically exclude "Waiting For Customer" time because a customer response is outside of an agent's control. Knowing this, an agent can ask the caller for some piece of information that will take them some time to obtain, thereby stopping the SLA clock. Instead of using remote desktop, for example, an agent can refer to a Knowledge Base (KB) article and instruct the user to follow the steps and report back. This will stop the SLA clock, allowing the agent to work on other things, or to prepare a more thoughtful analysis.
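A simple KPI for spotting the push back is the share of total elapsed time that incidents spend waiting for the customer, broken down by agent. The sketch below assumes hypothetical per-incident totals in hours; an agent whose share is far above the team's may deserve a closer look.

```python
# Hypothetical per-incident totals in hours; field names are assumptions.
incidents = [
    {"id": "INC1", "agent": "a1", "elapsed": 10.0, "waiting_for_customer": 1.0},
    {"id": "INC2", "agent": "a1", "elapsed": 20.0, "waiting_for_customer": 2.0},
    {"id": "INC3", "agent": "a2", "elapsed": 10.0, "waiting_for_customer": 8.0},
]

def waiting_share_by_agent(records):
    """Fraction of elapsed time spent in 'Waiting For Customer', per agent."""
    totals = {}
    for r in records:
        elapsed, waiting = totals.get(r["agent"], (0.0, 0.0))
        totals[r["agent"]] = (elapsed + r["elapsed"],
                              waiting + r["waiting_for_customer"])
    return {agent: waiting / elapsed
            for agent, (elapsed, waiting) in totals.items() if elapsed > 0}

for agent, share in waiting_share_by_agent(incidents).items():
    print(f"{agent}: {share:.0%} of elapsed time waiting for customer")
```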

The Close: Close the incident right before breach

The close is similar to the sweep, except that a closed incident counts towards SLA calculations and results in a survey being sent to the caller, so if the incident was closed within the SLA time frame, credit is given for it. Since most contracts include a provision to achieve X% of closed incidents within SLA for a given billing period, it helps to stuff a few winners in the queue. The risk, of course, is that the incident is closed unresolved, in which case the resolver risks receiving a poor Customer Satisfaction (CSAT) or Net Promoter Score (NPS) survey response. Since response rates can be low, depending on the organization, this risk can be worth taking.
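Closures clustered just under the target time are one visible symptom of this behavior. As a rough sketch, the share of incidents closed in the last few percent of their SLA window can be tracked over time. The `(hours_to_close, sla_target_hours)` records and the 5% band are assumptions for illustration.

```python
# Hypothetical closed incidents: (hours_to_close, sla_target_hours)
closed = [(3.9, 4.0), (1.0, 4.0), (7.9, 8.0), (4.5, 4.0)]

def near_breach_share(records, band=0.05):
    """Share of closed incidents that closed within the last `band`
    (as a fraction of the target) before the SLA target was reached."""
    near = [1 for hours, target in records
            if (1 - band) <= hours / target <= 1]
    return len(near) / len(records)

print(f"Closed in the final 5% of the SLA window: {near_breach_share(closed):.0%}")
```

Pairing this with a reopen rate, or with the number of new incidents raised by the same caller soon after closure, helps separate genuine last-minute fixes from incidents closed unresolved.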

Obfuscation

Obfuscation hides incidents by associating them with incorrect Configuration Items (CIs), applications, or business services. Incorrect associations are more commonly the result of mistakes or laziness, but there can be motivation to avoid correcting mistaken associations through the life of an incident or request. For example, a mobile user having issues with email might request assistance from the mobile device team when it is, in fact, the messaging team that fat-fingered a configuration change impacting all mobile users. If the messaging team is a different vendor and is inundated, they might be inclined to let the mobile device team take the hit for the incident and close it out with the incorrect associations. Obfuscation can have a bigger impact when reporting by application or business service. Creating accurate attributions to an application or business service is not always easy, since it requires root cause identification and documentation, and that takes time away from resolving new incidents. In addition, it is not common for a data analyst to create post-mortem attributions unless directed to do so. Of course, separating the data would not be an impossible task, though the obfuscation makes it more difficult, and someone would have to know how to identify the most likely causes from the incident data. One way to do this is with word clouds, described in another article.

The Nuke: Wipe and Load (Examples: re-image a desktop, reboot a server, re-install software, failover to redundant system, etc.)

The nuke is the most disruptive of all options to the requester in terms of time and inconvenience, but generally has a high success rate because it eliminates many variables. Nuclear resolutions should generally be last-resort options, but when time and/or skill are in short supply, these options have known times to implement, thereby reducing the risk of a breach. Everyone on a desktop support team, for example, knows how to re-image a desktop, and how long it takes. Sadly, few companies track "nuclear" resolutions as a KPI, but they are generally easy to tease out of the resolver notes and can yield some interesting results.
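Teasing nuclear resolutions out of resolver notes can start with a simple keyword search. The sketch below uses an assumed keyword list built from the examples in the heading above. The sample notes are invented, and the pattern would need tuning to the vocabulary resolvers actually use.

```python
import re

# Assumed keyword list; tune it to the wording found in real resolver notes.
NUKE_PATTERN = re.compile(
    r"\b(re-?imag(?:e|ed|ing)|re-?install(?:ed|ing)?|reboot(?:ed|ing)?"
    r"|fail(?:ed)?[ -]?over|wiped?)\b",
    re.IGNORECASE,
)

# Invented sample notes for illustration only.
resolver_notes = [
    "Re-imaged laptop after profile corruption.",
    "Reset password and confirmed login.",
    "Rebooted server, service restored.",
    "Updated mail profile settings.",
]

def nuclear_rate(notes):
    """Fraction of resolutions whose notes mention a wipe-and-load style fix."""
    hits = [note for note in notes if NUKE_PATTERN.search(note)]
    return len(hits) / len(notes)

print(f"Nuclear resolution rate: {nuclear_rate(resolver_notes):.0%}")
```

Breaking this rate down by resolver group, category, or week before the reporting deadline is where the interesting patterns tend to show up.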

Prime the Customer Satisfaction (CSAT) or Net Promoter Score (NPS) Survey

After an incident has been resolved, it's important to measure how happy the requester is with the service they received. The two best-known scoring mechanisms are CSAT and NPS. A discussion of the differences between these scoring methods is outside the scope of this article. The data for these scores is collected by survey after an incident or service request has been resolved. Agents can "prime" the caller's responses by suggesting that they would appreciate a high score for work well done. In practice, this tends to increase both the number of responses and the scores received. Most contracts include SLAs and penalties for CSAT or NPS scores, but they should also include clear guidelines on what communication is acceptable prior to sending out feedback requests.

Conclusion

Business Process Outsourcing (BPO) reporting is a data-intensive activity due to many Service Level Agreements (SLAs) that define the performance expectations of the service. Any measure can be gamed, and this article explores a few ways that service delivery teams can game their contracts to make their performance measures look better than they really are. Knowing what to look for helps us create better contracts by enhancing SLAs with Key Performance Indicators (KPIs).