Distributed Mind: google

Thursday, March 24, 2011

By James Whittaker

Instead of distinguishing between code, integration and system testing, Google uses the language of small, medium and large tests emphasizing scope over form. Small tests cover small amounts of code and so on. Each of the three engineering roles may execute any of these types of tests and they may be performed as automated or manual tests.

Small Tests are mostly (but not always) automated and exercise the code within a single function or module. They are most likely written by a SWE or an SET and may require mocks and faked environments to run but TEs often pick these tests up when they are trying to diagnose a particular failure. For small tests the focus is on typical functional issues such as data corruption, error conditions and off by one errors. The question a small test attempts to answer is does this code do what it is supposed to do?

Medium Tests can be automated or manual and involve two or more features and specifically cover the interaction between those features. I've heard any number of SETs describe this as "testing a function and its nearest neighbors." SETs drive the development of these tests early in the product cycle as individual features are completed and SWEs are heavily involved in writing, debugging and maintaining the actual tests. If a test fails or breaks, the developer takes care of it autonomously. Later in the development cycle TEs may perform medium tests either manually (in the event the test is difficult or prohibitively expensive to automate) or with automation. The question a medium test attempts to answer is does a set of near neighbor functions interoperate with each other the way they are supposed to?

Large Tests cover three or more (usually more) features and represent real user scenarios to the extent possible. There is some concern with overall integration of the features but large tests tend to be more results driven, i.e., did the software do what the user expects? All three roles are involved in writing large tests and everything from automation to exploratory testing can be the vehicle to accomplish accomplish it. The question a large test attempts to answer is does the product operate the way a user would expect?

The actual language of small, medium and large isn’t important. Call them whatever you want. The important thing is that Google testers share a common language to talk about what is getting tested and how those tests are scoped. When some enterprising testers began talking about a fourth class they dubbed enormous every other tester in the company could imagine a system-wide test covering nearly every feature and that ran for a very long time. No additional explanation was necessary.

The primary driver of what gets tested and how much is a very dynamic process and varies wildly from product to product. Google prefers to release often and leans toward getting a product out to users so we can get feedback and iterate. The general idea is that if we have developed some product or a new feature of an existing product we want to get it out to users as early as possible so they may benefit from it. This requires that we involve users and external developers early in the process so we have a good handle on whether what we are delivering is hitting the mark.

Finally, the mix between automated and manual testing definitely favors the former for all three sizes of tests. If it can be automated and the problem doesn’t require human cleverness and intuition, then it should be automated. Only those problems, in any of the above categories, which specifically require human judgment, such as the beauty of a user interface or whether exposing some piece of data constitutes a privacy concern, should remain in the realm of manual testing.

Having said that, it is important to note that Google performs a great deal of manual testing, both scripted and exploratory, but even this testing is done under the watchful eye of automation. Industry leading recording technology converts manual tests to automated tests to be re-executed build after build to ensure minimal regressions and to keep manual testers always focusing on new issues. We also automate the submission of bug reports and the routing of manual testing tasks. For example, if an automated test breaks, the system determines the last code change that is the most likely culprit, sends email to its authors and files a bug. The ongoing effort to automate to within the “last inch of the human mind” is currently the design spec for the next generation of test engineering tools Google is building.

Those tools will be highlighted in future posts. However, my next target is going to revolve around The Life of an SET. I hope you keep reading.

Monday, March 14, 2011

Lots of questions in the comments to the last two posts. I am not ignoring them. Hopefully many of them will be answered here and in following posts. I am just getting started on this topic.

At Google, quality is not equal to test. Yes I am sure that is true elsewhere too. “Quality cannot be tested in” is so cliché it has to be true. From automobiles to software if it isn’t built right in the first place then it is never going to be right. Ask any car company that has ever had to do a mass recall how expensive it is to bolt on quality after-the-fact.

However, this is neither as simple nor as accurate as it sounds. While it is true that quality cannot be tested in, it is equally evident that without testing it is impossible to develop anything of quality. How does one decide if what you built is high quality without testing it?

The simple solution to this conundrum is to stop treating development and test as separate disciplines. Testing and development go hand in hand. Code a little and test what you built. Then code some more and test some more. Better yet, plan the tests while you code or even before. Test isn’t a separate practice, it’s part and parcel of the development process itself. Quality is not equal to test; it is achieved by putting development and testing into a blender and mixing them until one is indistinguishable from the other.

At Google this is exactly our goal: to merge development and testing so that you cannot do one without the other. Build a little and then test it. Build some more and test some more. The key here is who is doing the testing. Since the number of actual dedicated testers at Google is so disproportionately low, the only possible answer has to be the developer. Who better to do all that testing than the people doing the actual coding? Who better to find the bug than the person who wrote it? Who is more incentivized to avoid writing the bug in the first place? The reason Google can get by with so few dedicated testers is because developers own quality. In fact, teams that insist on having a large testing presence are generally assumed to be doing something wrong. Having too large a test team is a very strong sign that the code/test mix is out of balance. Adding more testers is not going to solve anything.

This means that quality is more an act of prevention than it is detection. Quality is a development issue, not a testing issue. To the extent that we are able to embed testing practice inside development, we have created a process that is hyper incremental where mistakes can be rolled back if any one increment turns out to be too buggy. We’ve not only prevented a lot of customer issues, we have greatly reduced the number of testers necessary to ensure the absence of recall-class bugs. At Google, testing is aimed at determining how well this prevention method is working. TEs are constantly on the lookout for evidence that the SWE-SET combination of bug writers/preventers are screwed toward the latter and TEs raise alarms when that process seems out of whack.

Manifestations of this blending of development and testing are all over the place from code review notes asking ‘where are your tests?’ to posters in the bathrooms reminding developers about best testing practices, our infamous Testing On The Toilet guides. Testing must be an unavoidable aspect of development and the marriage of development and testing is where quality is achieved. SWEs are testers, SETs are testers and TEs are testers.

If your organization is also doing this blending, please share your successes and challenges with the rest of us. If not, then here is a change you can help your organization make: get developers fully vested in the quality equation. You know the old saying that chickens are happy to contribute to a bacon and egg breakfast but the pig is fully committed? Well, it's true...go oink at one of your developer and see if they oink back. If they start clucking, you have a problem.

Distributed Mind

How Google Tests Software - Part Five

By James Whittaker

How Google Tests Software - Part Three