The Apple iOS 8.0.1 Debacle: Whom to blame?

Marc Andreessen drew my attention to a Bloomberg article that laid out what it purported to be “links” between the iOS 8.0.1 failure and the failed Maps launch. @pmarca was properly skeptical of the article.

And indeed, the piece starts in on the leader of the quality assurance effort, noting that:

The same person at Apple was in charge of catching problems before both products were released. Josh Williams, the mid-level manager overseeing quality assurance for Apple’s iOS mobile-software group, was also in charge of quality control for maps, according to people familiar with Apple’s management structure.

If you didn’t read any further, you’d think the problem was solved. Some guy wasn’t doing his job. Case closed.

But are quality problems ever so simple? After all, isn’t quality supposed to be built into a product? If this guy was the problem, then why was Apple leaning so heavily on him to lead its bug-finding QA group?

Well, reading on is rewarding, for it becomes clear that the quality problems at Apple run deeper than a bad QA leader. For example, turf wars and secrecy within Apple lead to situations like this one:

Another challenge is that the engineers who test the newest software versions often don’t get their hands on the latest iPhones until the same time that they arrive with customers, resulting in updates that may not get tested as much on the latest handsets. Cook has clamped down on the use of unreleased iPhones and only senior managers are allowed access to the products without special permission, two people said.

Even worse, integration testing is not routinely done before an OS feature gets to QA:

Teams responsible for testing cellular and Wi-Fi connectivity will sometimes sign off on a product release, then Williams’ team will discover later that it’s not compatible with another feature, the person said.

So all you Apple fans, just remember the joke we used to make late in a project: “What’s another name for the release milestone? User Acceptance Testing begins!”

Why personal behaviors impact testing

My last post used testing to illustrate the consequences of questionable personal behavior in a business situation. Quality is susceptible to personal and professional gaps that interact to amplify each other’s effects.

Why is that so?  Let’s start with the examples I used.  Recall that business process owners simply copied the unit tests of the developers to serve as user acceptance tests.   I characterized this approach as a failure of accountability: the process owners didn’t believe it was their “real” job, even though they knew they would have to certify the system was fit for use.  Less charitably, one could have called it laziness.  More charitably, one could have called it efficiency. 

And indeed, an appeal to efficiency underlay the rationalizations of these owners: “Why should I create a new test when the developer, who knows the system better than I do, has already created one?” How would you answer this question? As a leader, do you know the snares such testing practices lay in your path? Off the top of my head:

  1. Perpetuating confirmation bias: By the time someone presents work product for formal, published testing, he or she has strong incentives to conduct testing that proves the work product is complete. After all, who doesn’t want his work to be accepted or her beliefs confirmed? This issue is well known in research, so one should expect that even the most diligent developer will tend to select tests that confirm the work is complete. An example is what on one project we called the “magic material number”, a material that all supply chain testers used to confirm their unit and integration tests. And the process always worked…until we tried another part number.
  2. Misunderstanding replicability: “Leveraging” test scripts can be made to sound like one is replicating the developer’s result. I have had testers justify this shortcut by appealing to the concept of replicability. Replicability is an important part of the scientific process. However, it is a part that is often misunderstood or misapplied. In the case of copied test plans, the error is simple. One is indeed following the test script exactly (a good thing) but applying it to the same test subject (e.g., same part, same distribution center, etc.). This technique means that the test is only applied against what may be “a convenient subset” of the data.
  3. Impeding falsifiability: This sounds like a good thing, but isn’t. In fact, the truth of a theory (in this case, that the process as configured and coded conforms to requirements) is determined by its “ability to withstand rigorous falsification tests” (Uncontrolled, Jim Manzi, p. 18). Recall the problem with engaging users in certain functions? Because these users are the ones best able to falsify tests, their disengagement is a risk to projects. Strong process experts, especially those who are not members of the project team, are often highly motivated to “shake down” the system. Even weaker process players can find gaps when encouraged to “do their jobs” in the new system with whatever parts, customers, vendors, etc. they see fit.
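
To make the “magic material number” trap concrete, here is a minimal Python sketch. Everything in it is hypothetical: the material numbers, the lead-time table, and the `planned_delivery_day` step are invented for illustration and are not taken from the project. The same script passes when it is rerun against the developer’s one convenient material and fails as soon as independent testers feed it other materials.

```python
# Hypothetical master data: only the "magic" material was fully configured.
LEAD_TIMES_DAYS = {"MAT-1000": 5}


def planned_delivery_day(material: str, order_day: int) -> int:
    """Hypothetical process step: order day plus the material's lead time."""
    return order_day + LEAD_TIMES_DAYS[material]


def run_script(materials: list[str]) -> list[tuple[str, str]]:
    """Run the same test script against each material and collect failures."""
    failures = []
    for material in materials:
        try:
            result = planned_delivery_day(material, order_day=10)
            if result <= 10:
                failures.append((material, "delivery not after order"))
        except KeyError:
            # Missing configuration only shows up for materials nobody tested.
            failures.append((material, "no lead time configured"))
    return failures


# Copied UAT: the developer's script rerun on the developer's data.
copied_uat = run_script(["MAT-1000"])
# Independent UAT: the same script, but with materials the testers chose.
independent_uat = run_script(["MAT-1000", "MAT-2000", "MAT-3000"])

print(copied_uat)       # []
print(independent_uat)  # [('MAT-2000', 'no lead time configured'), ('MAT-3000', 'no lead time configured')]
```

The script itself never changes; only the test data does. That is the difference between replicating a test and merely repeating it on the same convenient subset.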

I hope this example shows how a personal failing damages one’s professional perspective. No one in this example was ignorant of the scientific method; in fact, several had advanced degrees in hard science or engineering. Nonetheless, disagreement over who owned verifying fitness for use led to rationalizations for fundamental breaches in testing.

How personal shortcomings undermine recovery (Mini Case Part 2)

Unfortunately, our quality control processes didn’t fare so well. We did get sufficient testing resources for the first rollout, but a couple of process owners only delivered under protest. For you see, they believed that testing of their processes — even user acceptance testing (UAT) — was not their job. To put it another way, they did not hold themselves accountable to ensure that the technical solution conformed to their processes’ requirements.

This personal shortcoming — an unwillingness to be accountable — triggered a chain of events that put the program right back in a hole:

  • Because it wasn’t their “real” job, some process owners did not create their own user acceptance tests. They simply copied the tests the developers used for unit or integration testing. Therefore, UAT did not provide an independent verification of the system’s fitness for use; it simply confirmed the results of the first test.
  • This approach also allowed process gaps to persist. Missing functionality went unnoticed, even though test plans designed to ensure process coverage would have caught it.
  • Resources for testing were provided only grudgingly and were often second-rate. The testers assigned often did not know enough about the system and the process to run the scripts, never mind verify the solution or notice process gaps.

To say it was a challenging cutover and start-up would be an understatement.  Yawning process gaps remained open because they had never been tested.  For sure, we had a stack of deliverable acceptance documents, all formally signed off.  What we didn’t have was a process that was enabled and fit for use.  One example: 

  • Documents remained stuck in limbo for weeks after go live, all because a key approval workflow scenario had not even been developed.
  • And because it hadn’t been developed, the developers hadn’t created and executed a test script.
  • And because the process owners were so focused on doing only their “real” job, they missed a gap that made us do business by hand for nearly two months.
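
A simple coverage check can surface this kind of gap before go-live. The Python sketch below is hypothetical: the scenario names and script IDs are invented, and it assumes the process owners maintain their own list of required scenarios rather than deriving it from what the developers built. Any required scenario with no test script is, by definition, untested.

```python
# Hypothetical process scenarios the business requires (owned by process owners).
required_scenarios = {
    "create purchase document",
    "approve document under limit",
    "approve document over limit (second approver)",
    "reject document",
}

# Hypothetical test scripts, keyed by ID, mapped to the scenario each exercises.
# Copied developer scripts only cover what was actually built.
test_scripts = {
    "UT-001": "create purchase document",
    "UT-002": "approve document under limit",
    "UT-003": "reject document",
}


def untested_scenarios(required: set[str], scripts: dict[str, str]) -> set[str]:
    """Return the required scenarios that no test script exercises."""
    covered = set(scripts.values())
    return required - covered


gaps = untested_scenarios(required_scenarios, test_scripts)
print(sorted(gaps))  # ['approve document over limit (second approver)']
```

The check only works if the required list comes from the process owners. If it is copied from the developers’ build list, the gap disappears from the report along with the functionality.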

Can personal shortcomings undermine recovery? (Mini Case Part 1)

I concede that projects can recover — at least for a time — without sustainable personal and professional behaviors in place. Heroic measures to catch up on accumulated technical debt, more testers to ensure all tests are completed, new resources that specialize in turnarounds can and do work… again, for a time.

But what happens when the “hero” team needs to take a week or three of down time? What happens when those additional testers go back to their “real” jobs? What happens when the turnaround team leaves? What happens is that the project risks a slide back into the abyss.

Even one gap can be problematic. For example, I was on a troubled transformation program that needed to use all three of these approaches: extraordinary effort, additional testers, and experienced recovery resources. And indeed, the heroic measures did create deliverables that were fit for use, the technical debt was repaid, and the development team was staffed up to support the remainder of the program. The turnaround specialists put a set of program governance practices in place; even better, the program office continued to execute them effectively. Quality assurance and testing were other matters entirely…

Story points and the meaning of done

Dan Ackerson at Boost Agile (@boostagile, HT Craig Brown @brown_note) posted on what he calls “cross-dysfunctional” development teams. He focuses on an all-too-common case: developers want to keep moving on to new features, but support and QA types want the bugs cleaned up. IMO, Dan’s team is falling victim to a false choice here:

[Support and QA] don’t care about new user features – they want the bugs fixed. Because the project team only gives points to new features, the support colleagues feel left out and forgotten. Their job is seemingly made harder by our “overlooking” bug fixing. 

The choice isn’t between new features and bugfixes; IMO, the choice is between accepting a new feature only when it’s clean or doing constant rework when a bug-ridden story is called “done”. How can a development team get “credit” for new feature story points if a feature doesn’t perform as designed?
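
One way to encode “credit only for clean features” is to make the definition of done an explicit gate on story points. The Python sketch below is an assumption, not Dan’s team’s practice: the `Story` fields and the done criteria (all acceptance tests passing, no open defects) are hypothetical and would be whatever the team agrees on before work starts.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    points: int
    tests_passed: bool  # hypothetical: all acceptance tests for the story pass
    open_defects: int   # hypothetical: known bugs still open against the story


def is_done(story: Story) -> bool:
    """A story counts as done only when its tests pass and no defects remain open."""
    return story.tests_passed and story.open_defects == 0


def credited_points(stories: list[Story]) -> int:
    """Sum story points only for stories that meet the definition of done."""
    return sum(s.points for s in stories if is_done(s))


sprint = [
    Story("search filter", 5, tests_passed=True, open_defects=0),
    Story("export to CSV", 3, tests_passed=True, open_defects=2),
    Story("bulk edit", 8, tests_passed=False, open_defects=1),
]
print(credited_points(sprint))  # 5
```

Under this rule, the 3-point story with two open defects earns nothing until its bugs are fixed, so bug fixing stops competing with feature work for credit.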

My take: this is another demonstration of why agile development has to be more disciplined than waterfall, not less. Dan is on the right track when he suggests that “writing more tests” will speed his team up. His team should also take one step back and ensure they know what done looks like before writing code!

Why SAP ramp-ups are hard to get into


Another note from the Dave Rosenberg post on cloud services that I linked to in my last post on software engineering. I was often asked why SAP made it so hard to enter ramp-ups for new or upgraded solutions. The answer is pretty simple, and Dave put the reason succinctly in his post:

… it’s important to know what you are getting into if you want to use these [cloud] services now. As with any nascent technology, early adopters will benefit in some ways and suffer in others (emphasis mine).

It still surprises me how many technology professionals are themselves surprised that new solutions aren’t risk-free.   SAP wanted to make sure that ramp-up customers understood that, in the enterprise space especially, there is no way to test all the data constellations and system modifications a new solution will come up against. In fact, that’s one of the benefits of going into ramp-up: your bugs get worked first.

Good testers = crap code?

Kevin Rutherford at Silk and Spinach has noted a fascinating phenomenon. As he explains in his original post on the subject, a good tester (or testing team) can tempt developers to outsource all quality management to the testers.

However, because this is described as an “agile” project, I’m curious: what flavor of agile was used? It sounds like the “bad code” developers were throwing code over the wall to their tester, which seems like a parody of a waterfall life cycle.

The lack of details leads me to wonder how well the poorer-performing team was organized and led.  Per Kevin Schlabach’s comment, I’d expect that testers would be in the sprint team already.  That sounds like pretty standard practice.  I wonder about the team dynamics as well.  Lots of questions on that point:

  1. Were there interpersonal issues?
  2. How short were the effective iterations?  In other words, did the poorer-performing team behave more like a waterfall team, waiting until the last possible moment to turn over code (or did the tester only accept code at that last possible moment)?
  3. Implied in Kevin S’s comment are questions about the relative strengths of the sprint/iteration teams. If the business team was working closely with developers, how did the tester end up testing so much crap code?
