Veeder-Root and CARB must agree on certification tests that demonstrate compliance: a sequence of failure-mode tests to determine whether the diagnostic system meets CARB's first requirement, namely that it catch 95 percent of all true failures (the 1 percent false-alarm level is tested differently). Engineers, statisticians and technicians from Veeder-Root and CARB work together to develop and perform these certification tests. A CARB technician visits a test station, sets up a sequence of simulated failures (for example, by blocking the vapor-recovery hose), and then checks whether the diagnostic system catches each of these "true failures." Industrial engineers and statisticians from both organizations must agree on how many tests are sufficient to certify a system as accurate.
There are two types of error to consider in this problem:
- fail a system that is actually good and
- pass a system that is actually bad.
Veeder-Root is most concerned about limiting the chances of the first type of error. CARB is most concerned about limiting the chances of the second type.
- The Veeder-Root engineers argue that a sequence of three tests is sufficient. The statisticians at CARB argue that this number is not sufficient to support the company's claim of 95 percent accuracy. What would the statisticians have to show to prove their point? What is the probability that a bad system, one working at only 80 percent accuracy, would pass this test?
- The statisticians from CARB want to require that the system catch seven out of seven failures in a test. The Veeder-Root statisticians argue that this requirement is too stringent. What can Veeder-Root show to demonstrate its point? What is the probability that a good system will fail the CARB test?
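Both questions can be answered with one line of arithmetic each. The sketch below assumes (this is not stated in the case) that simulated failures are independent and that a system of accuracy p catches each one with probability p, so an n-out-of-n test passes with probability p raised to the n. The helper name `prob_pass_all` is hypothetical.

```python
# Sketch under an assumed model: independent trials, each true failure caught
# with probability p. An n-out-of-n test passes only if every failure is caught.

def prob_pass_all(p: float, n: int) -> float:
    """Probability that a system with catch rate p catches all n simulated failures."""
    return p ** n

# Question 1: a bad system (80 percent accuracy) facing Veeder-Root's 3-test sequence.
bad_passes_3 = prob_pass_all(0.80, 3)

# Question 2: a good system (95 percent accuracy) facing CARB's 7-of-7 requirement.
good_fails_7 = 1 - prob_pass_all(0.95, 7)

print(f"P(80% system passes 3 of 3) = {bad_passes_3:.4f}")
print(f"P(95% system fails 7 of 7)  = {good_fails_7:.4f}")
```

Under these assumptions a bad 80-percent system passes the three-test sequence with probability 0.512, roughly a coin flip, which is the kind of evidence CARB's statisticians would present. A good 95-percent system fails the seven-test requirement with probability about 0.30, which supports Veeder-Root's objection.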
The engineer or statistician would use a spreadsheet to analyze many different types of tests. For example, the probability that a system with a 95 percent success rate will catch X = 3 out of three simulated failures is

$$P(X = 3) = \binom{3}{3}(0.95)^3(0.05)^0 = (0.95)^3 \approx 0.857$$
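The spreadsheet calculation can be sketched in a few lines of Python. This is an illustration, not the case's own worksheet: the accuracy levels in `ACCURACIES` and the test lengths in `TEST_LENGTHS` are hypothetical choices, and the model again assumes independent trials, so each entry is simply p raised to the n.

```python
# Hypothetical grid: these accuracy levels and test lengths are illustrative
# choices, not the values used in the original table.
ACCURACIES = [0.80, 0.85, 0.90, 0.95, 0.99]
TEST_LENGTHS = range(1, 11)

def pass_table(accuracies, lengths):
    """Map each test length n to [p**n for p in accuracies]: P(pass) for an n-of-n test."""
    return {n: [p ** n for p in accuracies] for n in lengths}

table = pass_table(ACCURACIES, TEST_LENGTHS)

# Print a fixed-width grid: one row per n, one column per accuracy level p.
print("n    " + "".join(f"p={p:<8.2f}" for p in ACCURACIES))
for n, row in table.items():
    print(f"{n:<5}" + "".join(f"{prob:<10.4f}" for prob in row))
```

The row for n = 3 under p = 0.95 reproduces the 0.857 computed above; reading down a column shows how quickly longer n-out-of-n tests penalize even a good system.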
The following table records the probability of passing a system that operates at accuracy level p when the certification test requires n successes in n trials. Each entry equals $p^n$.

Once the research is complete, Veeder-Root must submit a written report and make an oral presentation of its findings and analysis to CARB for its new diagnostic system to be certified.
The full certification problem is much more complicated. For example, the diagnostic system actually must be able to detect as many as eight different types of failure. The company would like to argue that one "random" error in the failure-mode test is not sufficient to fail the entire system. How many trials are sufficient if the system is allowed one error in n trials?
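Allowing one error changes the pass probability from $p^n$ to $p^n + n\,p^{n-1}(1-p)$, the chance of at most one miss. The sketch below generalizes this to any number of allowed misses and searches for the shortest test that holds a bad system's pass probability below a target. The 80 percent "bad" accuracy and the 5 percent target are hypothetical inputs chosen for illustration, as are the function names; the model again assumes independent trials.

```python
from math import comb

def prob_pass_with_errors(p: float, n: int, allowed_misses: int = 1) -> float:
    """P(at most `allowed_misses` missed failures in n independent trials).

    Each trial misses with probability 1 - p, so this is the binomial CDF of misses.
    """
    q = 1 - p
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(allowed_misses + 1))

def smallest_n(p_bad: float, max_bad_pass: float, allowed_misses: int = 1, n_max: int = 200):
    """Shortest test in which a bad system passes with probability <= max_bad_pass.

    Returns None if no test length up to n_max meets the target.
    """
    for n in range(allowed_misses + 1, n_max + 1):
        if prob_pass_with_errors(p_bad, n, allowed_misses) <= max_bad_pass:
            return n
    return None

# Hypothetical targets: a bad system is 80 percent accurate and should pass at most 5% of the time.
n_needed = smallest_n(p_bad=0.80, max_bad_pass=0.05, allowed_misses=1)
good_fail = 1 - prob_pass_with_errors(0.95, n_needed, allowed_misses=1)
print(f"Trials needed with one miss allowed: {n_needed}")
print(f"P(95% system fails that test): {good_fail:.3f}")
```

With these illustrative targets the search returns 22 trials, and a good 95-percent system would still fail such a test roughly 30 percent of the time, so allowing one "random" error does not by itself resolve the tension between the two types of error.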
Perhaps the main difficulty in this problem lies in clearly defining a "bad" system. In particular, it is very difficult to develop a test that will distinguish a "bad" system working at 94 percent accuracy from a "good" system working at 95 percent accuracy. The most important lesson learned in the mathematical analysis here could be the value of providing better definitions of good and bad systems.
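A short calculation makes the 94-versus-95 difficulty concrete. Assuming an n-out-of-n test with independent trials (the same assumed model as above), the sketch below searches test lengths from 1 to 200 (an arbitrary range) for the one that maximizes the gap in pass probability between a 95-percent and a 94-percent system. The name `pass_gap` is hypothetical.

```python
GOOD, BAD = 0.95, 0.94  # accuracy levels taken from the text

def pass_gap(n: int) -> float:
    """Difference in P(pass) between a 95% and a 94% system on an n-of-n test."""
    return GOOD ** n - BAD ** n

# Search test lengths for the one that best separates the two systems.
best_n = max(range(1, 201), key=pass_gap)
max_gap = pass_gap(best_n)
print(f"Best n-of-n test: n={best_n}, gap in pass probability = {max_gap:.3f}")
```

Even the most discriminating test of this form (around n = 18) gives the good system a pass probability less than 0.07 higher than the bad one, which is why sharper definitions of "good" and "bad" matter more than any particular choice of n.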