The Apple iOS 8.0.1 Debacle: Whom to blame?

Marc Andreessen drew my attention to a Bloomberg article that laid out what it purported to be “links” between this debacle and the failed Maps launch. @pmarca was properly skeptical of the article.

And indeed, the piece starts in on the leader of the quality assurance effort, noting that:

The same person at Apple was in charge of catching problems before both products were released. Josh Williams, the mid-level manager overseeing quality assurance for Apple’s iOS mobile-software group, was also in charge of quality control for maps, according to people familiar with Apple’s management structure.

If you didn’t read any further, you’d think the problem was solved. Some guy wasn’t doing his job. Case closed.

But are quality problems ever so simple? After all, isn’t quality supposed to be built into a product? If this guy was the problem, then why was Apple leaning so heavily on him to lead its bug-finding QA group?

Well, reading on is rewarding, for it becomes clear that the quality problems at Apple run deeper than a bad QA leader. For example, turf wars and secrecy within Apple guarantee as much:

Another challenge is that the engineers who test the newest software versions often don’t get their hands on the latest iPhones until the same time that they arrive with customers, resulting in updates that may not get tested as much on the latest handsets. Cook has clamped down on the use of unreleased iPhones and only senior managers are allowed access to the products without special permission, two people said.

Even worse, integration testing is not routinely done before an OS feature gets to QA:

Teams responsible for testing cellular and Wi-Fi connectivity will sometimes sign off on a product release, then Williams’ team will discover later that it’s not compatible with another feature, the person said.

So all you Apple fans, just remember the joke we used to make late in a project: “What’s another name for the release milestone? User Acceptance Testing begins!”

The Incentive/Behavior Nexus

Steve Kerr uses a General Motors cautionary tale to show us that it isn’t enough to have incentives that appear to reward desired behavior. In this HBR blog post, he notes that:

Although managers’ bonuses are based partly on vehicle-quality improvements, and safety is supposed to be paramount, cost is “everything” at GM, and the company’s atmosphere probably discouraged individuals from raising safety concerns. Earlier this summer, a former GM manager described a workplace in which the mention of any problems was unacceptable.

Kerr’s critical insight is that while GM could point to formal quality incentives, these incentives didn’t have the required impact on its managers’ behavior. The money quote for me is this:

In order to properly align its incentives to support its mission and objectives, a company must determine what managers and employees believe they are being encouraged to do and not do.

Why personal behaviors impact testing

My last post used testing to illustrate how questionable personal behavior plays out in a business situation.  Quality is susceptible to personal and professional gaps that interact and amplify each other’s effects.

Why is that so?  Let’s start with the examples I used.  Recall that business process owners simply copied the unit tests of the developers to serve as user acceptance tests.   I characterized this approach as a failure of accountability: the process owners didn’t believe it was their “real” job, even though they knew they would have to certify the system was fit for use.  Less charitably, one could have called it laziness.  More charitably, one could have called it efficiency. 

And indeed, an appeal to efficiency underlay the rationalizations of these owners: “Why should I create a new test when the developer — who knows the system better than I do — has already created one?”  How would you answer this question?  As a leader, do you know the snares such testing practices lay in your path?  Off the top…

  1. Perpetuating confirmation bias:  By the time someone presents work product for formal, published testing, he or she has strong incentives to conduct testing that proves the work product is complete.  After all, who doesn’t want his work to be accepted or her beliefs confirmed?  This issue is well known in the research field, so one should expect that even the most diligent developer will tend to select tests that confirm the work is complete.  An example is what on one project we called the “magic material number”: a material used by all supply chain testers to confirm their unit and integration tests.  And the process always worked…until we tried another part number (see the sketch after this list).
  2. Misunderstanding replicability:  “Leveraging” test scripts can be made to sound like one is replicating the developer’s result.  I have had testers justify this shortcut by appealing to the concept of replicability.  Replicability is an important part of the scientific process, but it is often misunderstood or misapplied.  In the case of copied test plans, the error is simple: one is indeed following the test process exactly — a good thing — but applying it to the same test subject (e.g., same part, same distribution center, etc.).  A true replication varies the subjects; this technique instead runs the test against what may be “a convenient subset” of the data.
  3. Impeding falsifiability: A test that cannot fail sounds like a good thing, but isn’t.  In fact, the truth of a theory — in this case, that the process as configured and coded conforms to requirements — is determined by its “ability to withstand rigorous falsification tests” (Uncontrolled, Jim Manzi, p. 18).  Recall the problem with engaging users in certain functions?  These users’ ability to falsify tests makes their disengagement a risk to projects.  Strong process experts, especially those who are not members of the project team, are often highly motivated to “shake down” the system.  Even weaker process players can find gaps when encouraged to “do their jobs” in the new technology with whatever parts, customers, vendors, etc. they see fit.
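To make the contrast concrete, here’s a minimal sketch in Python (pytest style).  The check_availability stub, the part numbers, and the sites are hypothetical stand-ins, not any real system; the point is simply that an acceptance test spreads over subjects, including some built to fail, instead of re-running the developer’s “magic” row forever.

```python
# Minimal pytest sketch.  check_availability() stands in for the configured
# supply-chain process being certified; a real UAT run would exercise the
# actual system, not a stub.
import pytest


def check_availability(part_number: str, distribution_center: str) -> bool:
    """Hypothetical stub for the process under test."""
    return part_number.startswith("P") and distribution_center in {"DC1", "DC2"}


# The developer's unit test effectively ran a single row: the "magic" happy
# path.  An acceptance test adds subjects the developer never touched.
@pytest.mark.parametrize("part_number, dc, expected", [
    ("P-1000", "DC1", True),   # the developer's magic material number
    ("P-2044", "DC2", True),   # a different part at a different site
    ("Q-0001", "DC1", False),  # a part family outside the configured scope
    ("P-1000", "DC9", False),  # a site outside the configured scope
])
def test_availability_across_subjects(part_number, dc, expected):
    assert check_availability(part_number, dc) is expected
```

Three of those four rows would never appear in a copied unit test, and they are precisely the rows that give a passing run its meaning.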

I hope this example shows how a personal failing damages one’s professional perspective.  No one in this example was ignorant of the scientific method; in fact, several had advanced hard science or engineering degrees.  Nonetheless, disagreement over who owned verifying fitness for use led to rationalizations of fundamental breaches in testing.

Can personal shortcomings undermine recovery? (Mini Case Part 1)

I concede that projects can recover — at least for a time — without sustainable personal and professional behaviors in place. Heroic measures to catch up on accumulated technical debt, more testers to ensure all tests are completed, and new resources that specialize in turnarounds can and do work… again, for a time.

But what happens when the “hero” team needs to take a week or three of down time? What happens when those additional testers go back to their “real” jobs? What happens when the turnaround team leaves? What happens is that the project risks a slide back into the abyss.

Even one gap can be problematic. For example, I was on a troubled transformation program that needed all three of these approaches: extraordinary effort, additional testers, and experienced recovery resources. And indeed, the heroic measures created deliverables that were fit for use, the technical debt was repaid, and the development team was staffed up to support the remainder of the program. The turnaround specialists put a set of program governance practices in place; even better, the program office continued to execute them effectively.  Quality assurance and testing were other matters entirely….

More on stage gates and project reviews

Not stage gate experts...


Per an earlier post (here), it is important to ask how to ensure that stage gates — and project reviews for that matter — are relevant to the project at hand.  It’s pretty simple IMO.

  1. Make sure that the stage gates match the project phase.  It is amazing how many gate reviews are conducted with a single set of questions.  There should be a general set of questions as well as a phase-specific set (a simple sketch follows this list).  The questions in a gate review must match the expected deliverables for that gate.
  2. Structure the gate to include sessions that focus on the key capabilities and their associated deliverables.  This approach ties the review more tightly to the expected benefits of the project/program.
  3. Have subject matter experts involved during these capability-focused sessions.  We often pair these SMEs with another PM who leads the review.  They jointly review and prepare the questions.  Then the project reviewers ask most of the questions, while the SME jumps in on follow-ups or asks any technically-advanced questions.
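A quick sketch of point 1, for those who like to see the structure.  The phase names and questions below are illustrative assumptions, not a standard; the point is that a gate’s agenda is assembled from a general set plus a phase-specific set, so no two gates run the same generic checklist.

```python
# Sketch: assemble a gate-review agenda from general questions plus questions
# specific to the project phase.  Phases and questions are illustrative only.
GENERAL_QUESTIONS = [
    "Are scope, schedule, and budget baselines current?",
    "Are the top risks owned and actively mitigated?",
]

PHASE_QUESTIONS = {
    "design": [
        "Do the designs trace back to approved requirements?",
        "Have the key capabilities been decomposed into deliverables?",
    ],
    "build": [
        "Is unit test coverage tracked against the build plan?",
        "Is technical debt being logged and burned down?",
    ],
    "deploy": [
        "Have the process owners signed off on user acceptance testing?",
        "Has the cutover and rollback plan been rehearsed?",
    ],
}


def gate_agenda(phase: str) -> list[str]:
    """Return the question set for a gate review in the given phase."""
    if phase not in PHASE_QUESTIONS:
        raise ValueError(f"No gate defined for phase {phase!r}")
    return GENERAL_QUESTIONS + PHASE_QUESTIONS[phase]


# gate_agenda("build") yields the general questions plus the build-specific
# ones: the agenda varies with the gate's expected deliverables.
```

Whether you keep this in a spreadsheet, a wiki, or code, the structural point is the same: the questions must vary with the deliverables expected at that gate.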

Heresy on Stage Gates

Just say you love Q-Gates and we'll stop...

Old school Q-Gate

Glen Alleman tells us why he doesn’t like stage gates (here) and a number of readers respond in the comments. 

Per my comment, “gates” (we call them Q-Gates) are pointless if they aren’t closely tied to the status, progress, and forecast of the deliverables in question.  Back in the day, too many “gates” didn’t vary enough during the project lifecycle…too many “gates” were simply an occasion to run through a generic checklist.  More of a CYA exercise, in other words.

Anyway, check out the post and comments.  It will make you look twice at whether your stage or Q-gates are there only for show.

Still looking down on checklists?

Just saw this article on the power of checklists in medicine (here), which reminded me that I had forgotten to include a link in an earlier post (here).  Atul Gawande wrote a long piece in the New Yorker a little more than a year ago simply entitled “The Checklist”.  It is far too rich to summarize effectively — please read the whole article — but below are a few snippets:

[I]t’s far from obvious that something as simple as a checklist could be of much help in medical care. Sick people are phenomenally more various than airplanes… Mapping out the proper steps for each is not possible, and physicians have been skeptical that a piece of paper with a bunch of little boxes would improve matters much.

In 2001, though, a critical-care specialist at Johns Hopkins Hospital named Peter Pronovost decided to give it a try. He didn’t attempt to make the checklist cover everything; he designed it to tackle just one problem, the one that nearly killed Anthony DeFilippo: line infections…. These steps are no-brainers; they have been known and taught for years. So it seemed silly to make a checklist just for them. Still… [i]n more than a third of patients, [doctors] skipped at least one. 

The results were so dramatic that they weren’t sure whether to believe them: the ten-day line-infection rate went from eleven per cent to zero. So they followed patients for fifteen more months. Only two line infections occurred during the entire period. They calculated that, in this one hospital, the checklist had prevented forty-three infections and eight deaths, and saved two million dollars in costs.

Still think your project is too complicated to benefit from a checklist or two?

PM Quote of the Day — Pauline Kael

[T]he critic is the only independent source of information. The rest is advertising.

This quote came to mind as we’ve been going through an internal program’s requirements.  One of my colleagues regularly refers to, and insists on, using the “Four Eyes” principle.  In other words, one should always involve a second set of eyes to verify and validate work product.  If the project team is the only arbiter of its own progress, then, per Kael, everything it reports is advertising.

Too often, the four eyes principle only gets honored in quality control — e.g., during post-build testing processes.  As my colleague’s insistence suggests, we find that an independent opinion is most valuable early in an initiative.  For example, wouldn’t it be useful to have an independent validation during planning that the requirements and deliverables really represent (and will make real) the capabilities the project or program is intended to put in place?

It is tough enough to re-work a deliverable that doesn’t conform to requirements.  It is worse to have to re-build a deliverable that had conformed to requirements, but find that those requirements were never valid in the first place.
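For what it’s worth, the principle is simple to encode anywhere work product gets signed off.  Below is a minimal sketch in Python; the Deliverable shape and the names are assumptions for illustration, not anyone’s actual workflow.

```python
# Sketch of a "Four Eyes" guard: no deliverable is approved by its own author.
from dataclasses import dataclass


@dataclass
class Deliverable:
    name: str
    author: str
    reviewer: str | None = None  # set only by an independent approval


def approve(item: Deliverable, reviewer: str) -> Deliverable:
    """Record an approval, rejecting any self-review."""
    if reviewer == item.author:
        raise ValueError(
            f"Four-eyes violation: {reviewer} cannot approve their own "
            f"work on {item.name!r}"
        )
    item.reviewer = reviewer
    return item


# approve(Deliverable("requirements-spec", author="alice"), reviewer="alice")
# raises; reviewer="bob" records the independent second set of eyes.
```

Mature tooling enforces the same guard (think required reviewers on a pull request), and the logic echoes Kael: the author’s own sign-off is advertising, not verification.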

PM Quote of the Day — Vince Lombardi

Practice does not make perfect. Only perfect practice makes perfect.

When I was younger, I never got the point of practice.  Sure, I knew that it would get me in shape and knock the rust off.  However, I never got the idea that practice would help me perform better under pressure.  Too many times I found myself over-thinking a situation because I hadn’t practiced enough to make it automatic.  I finally started to realize that realistic practice in all sorts of endeavors — in particular, public speaking and presenting — helped take the edge off along with the rust.


Practice?... We're talkin' about practice?

Lombardi’s point also applies to how we test our processes and systems.  Too often I’ve seen customers and consultants assume away difficulties in their desire to save testing time and money.  Even worse, this saving “spasm” usually comes towards the end of the project, just as testing is about to get serious.

The best testing practice (so to speak) I’ve seen came at a global firm that does dirty and dangerous work.  As you might imagine, that company is very conscious of safety and quality.  That firm called its last round of testing not integration or user acceptance, but “business simulation.”  Business simulation didn’t simply involve folks following a script.  We brought the system, interfaces, and data up as if it were go-live, then encouraged the users to go “do their jobs” and call support if something went wrong.

Sure, such an approach is expensive.  But how much is the peace of mind worth that comes from a no-holds-barred validation that the system and its support conform to requirements?
