Measurement systems analysis (MSA) is something everyone in the medical device industry needs to be aware of.
Most of us in the medical device industry are familiar with the process capability metrics Cp, Cpk, Pp, and Ppk. A small subset of us uses these metrics to generate process validation reports claiming to meet our customers’ capability requirements. But how many OEMs and suppliers have gone to the next level by providing the empirical evidence necessary to prove that the measured data and the calculated capability indices are trustworthy? Why aren't we equally familiar with terms like “percent study variation” and “precision to tolerance”?
Medical device suppliers and OEMs alike will benefit by discovering one of the lesser-known keys to world-class manufacturing—measurement systems analysis (MSA). Before we start discussing the benefits of MSA, let's establish a basic understanding of histograms and capability metrics. We’ll use a hypothetical process and measurement system to help explain the concepts. The Acme Company produces a portable medical device that displays remaining battery voltage to the end user. The fuel gauge chip that reads the voltage must be calibrated to within +/- 6 mV of the actual voltage (fuel gauge reading versus the input voltage) before it’s assembled into the product. Acme has established a special process for initial calibration, and their quality management system requires this process to demonstrate Cpk ≥ 1.33 before being transferred to production.
Figure 1 is a histogram of 100 readings taken in Acme's QA lab by a single technician. The technician uses an array of lab equipment to inspect the incoming components for calibration accuracy. If the components are not shown to be within the calibration limits, they must be re-calibrated prior to being assembled into the device.
1. The histogram was generated by a statistical software package that added a "fitted line" to the graph as a prediction of what the distribution would look like if additional data were collected from the same process affected by the same variables.
Figure 2 is a process capability graph of the same data set used to make the histogram in Figure 1. As shown, the process is centered, making Cp ≈ Cpk. The process spread fits between the upper and lower spec limits exactly once, making Cp ≈ 1, which corresponds to a 3-sigma process. If the process were to remain centered and the standard deviation were reduced from about 2 mV to 1 mV, the process spread would be cut in half and its width would fit within the specification limits nearly twice, making it a 6-sigma process where Cp = Cpk = 2.
2. Shown is a process capability graph of the same data set used to make the histogram in Figure 1.
Note that Cp is the short-term potential capability because it uses limited amounts of data and only compares the process width to the width between the spec limits. Cpk is the short-term actual capability because it compares both the process width and its location relative to the targeted value. If the process is not centered around the target, Cpk will be less than Cp.
Pp and Ppk have similar definitions except that they account for long-term process variation: the longer a process runs, the higher the probability that additional sources of variation will influence it for the worse. With these measurements, it appears that the process used to calibrate Acme's fuel gauges will require improvements to bring its capability up from 1.06 to the required minimum of 1.33 before the product is transferred to production. But does the Acme team really know how much improvement needs to be made? Is it possible that their measurements have misled them, and the short-term capability is actually much lower?
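For readers who like to see the arithmetic, the short-term indices reduce to a few lines of code. This is a generic sketch with made-up offset readings, not Acme's actual data:

```python
import statistics

def cp_cpk(readings, lsl, usl):
    """Short-term capability indices from a sample of readings."""
    mean = statistics.mean(readings)
    sigma = statistics.stdev(readings)  # sample standard deviation
    cp = (usl - lsl) / (6 * sigma)                    # spread vs. tolerance only
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)   # also penalizes off-center
    return cp, cpk

# Hypothetical calibration offsets (mV) checked against the +/-6-mV limits
readings = [-0.5, 1.2, 0.3, -1.8, 0.9, -0.2, 1.5, -1.1, 0.4, 0.6]
cp, cpk = cp_cpk(readings, lsl=-6.0, usl=6.0)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Because the minimum distance to a spec limit can never exceed half the tolerance window, Cpk can never exceed Cp, which is why an off-center process shows Cpk < Cp.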
Let's go back to the measurements taken in the lab. Would we get the same readings if the lab were to measure each part multiple times? Or if the lab had used more than one technician to measure each part? Or if the lab had utilized several different pieces of equipment to take the readings?
Suppose we took five calibration measurements on the same fuel gauge and obtained the offsets from the input voltage shown in Figure 3.
3. Fuel-gauge measurements must fall within the tolerance window.
Should this fuel gauge's calibration pass or fail? Which reading should be used to answer that question? With this level of uncertainty about the measurements relative to the specification limits (a.k.a. precision to tolerance), the calculated Cpk pictured in Figure 2 must be questioned.
For the sake of working through this example, let's assume that the Acme team identified a +2-mV bias in the measurement system that was affecting every reading. Subtracting 2 mV from each of the 100 readings in an attempt to display a more accurate picture of performance will result in a negative 2-mV shift of the capability histogram and lower the Cpk value from 1.06 to 0.71, as depicted in Figure 4.
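The arithmetic behind that drop can be checked directly: subtracting a constant bias leaves the spread (and therefore Cp) unchanged but moves the mean toward the lower limit. The sigma and mean below are illustrative values chosen to roughly reproduce the figures, not Acme's raw data:

```python
def cp(sigma, lsl, usl):
    return (usl - lsl) / (6 * sigma)

def cpk(mean, sigma, lsl, usl):
    return min(usl - mean, mean - lsl) / (3 * sigma)

LSL, USL = -6.0, 6.0   # mV calibration limits
sigma = 1.87           # illustrative standard deviation, mV
mean_measured = 0.05   # roughly centered, as in Figure 2

print(cp(sigma, LSL, USL))                         # Cp is unaffected by a mean shift
print(cpk(mean_measured, sigma, LSL, USL))         # before the bias correction
print(cpk(mean_measured - 2.0, sigma, LSL, USL))   # after removing the +2-mV bias
```

With these numbers, Cp stays near 1.07 while Cpk falls from about 1.06 to roughly 0.72, matching the pattern the article describes.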
4. The figure shows how the Cp value remains at 1.07, but the Cpk drops to 0.71 due to the process mean shifting away from the target.
Let's consider another way in which measurement-system variation can impact your ability to monitor and control your process. Suppose the Acme team has implemented process changes aimed at shifting the average calibration delta from +4 to 0 mV.
In Figure 5, let the black dots represent five measurements taken on a single part from the old process and the blue dots represent five measurements taken from a single part produced by the new process. With the measurement variation of a single part taking up more than half of the overall process variation, our level of confidence about the targeted improvement is very low. We can’t be certain as to which measurements from each process should be used for comparison.
5. Unfortunately, measurements depict a low level of confidence between the old and new processes.
These examples are intended to demonstrate the inherent risk likely to be found in all forms of measurement. Capability metrics should never be referenced without first establishing a statistically valid measurement system.
FDA regulations regarding MSA
As a Six Sigma Black Belt who has recently moved from the automotive industry to the medical device industry, I’ve been surprised by the relatively relaxed level of effort applied to validating the measurement systems tied to manufacturing processes. I recently attended a medical-device manufacturing conference and participated in two days of sessions dedicated to process validation, hoping to find evidence that MSA theories and practices were being presented. Unfortunately, they were not. However, the presentations did expose me to the maze of regulatory acronyms and requirements, which helped me understand why the industry's maturity level on this topic appears to be sub-par.
Here's the abbreviated version of how MSA ties to FDA regulations. If your company produces medical devices that require FDA approval via the 510(k) application, then your company and its suppliers are subject to the FDA Quality System Regulation and must comply with Title 21 of the Code of Federal Regulations (CFR), Part 820, also referred to as the FDA GMP or Good Manufacturing Practice. Per FDA guidance, “The QS regulation requires you to validate processes whenever the results of a process cannot be fully verified by subsequent inspection and test methods.” This is the technical definition of a “special process.”
Enter stage left the Global Harmonization Task Force (GHTF) Study Group 3 guidelines on process validation, which are directly referenced as additional guidance on process validation, including Installation, Operational, and Performance Qualifications (IQ, OQ, and PQ). Unfortunately, the task force barely scratches the surface with this single sentence on the topic: “When studying variation, good measurements are required. Many times an evaluation of the measurement system should be performed using a gauge R&R or similar study.”
I'm astonished to see such a fundamental building block of process validation receive so little attention in an industry that’s so heavily regulated.
Measurement System Analysis
Let’s now focus on the terms and tools used to evaluate the validity of measurement systems. The most common source for MSA definitions, standards, and practical theory comes from a handbook titled MSA Fourth Edition by the Automotive Industry Action Group (AIAG). While the book was written to satisfy the specific needs of the automotive industry, I can’t find an equivalent written specifically for the medical device industry, nor do I find the handbook lacking in any critical areas.
By definition, a measurement system is the collection of instruments or gauges, standards, operations, methods, fixtures, software, personnel, environment, and assumptions used to quantify a unit of measure. In other words, it’s the complete process used to obtain measurements. In theory, a perfect measurement system would return the same answer for repeated measurements. Real-world measurement systems seldom behave that way, due to the abundance of variables involved.
Figure 6 divides the most common causes of measurement error into two basic categories. For the sake of brevity, our discussion about accuracy issues will be limited as they are typically controlled through proper gauge selection and robust calibration procedures. Precision, on the other hand, is very dependent upon human factors, peripheral equipment such as holding fixtures, and the nature of the device under inspection.
6. The most common causes of measurement error can be divided into two basic categories—accuracy and precision.
In a good measurement system, the total measurement variation will be stable and small compared to both the specification window and the process variation. The first comparison is captured by the precision-to-tolerance ratio (P/T), and the second by percent study variation (the gauge R&R percentage). Modern statistical software packages can make all of these calculations.
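For reference, the two headline precision metrics can be computed by hand. Precision to tolerance compares the measurement-system spread to the tolerance window, while percent study variation compares it to the total observed variation. A sketch using the common 6-sigma spread convention (some references use 5.15) and illustrative sigmas, not values from a real study:

```python
def precision_to_tolerance(sigma_ms, lsl, usl, k=6.0):
    """P/T: share of the tolerance window consumed by measurement error."""
    return k * sigma_ms / (usl - lsl) * 100.0

def percent_study_variation(sigma_ms, sigma_total):
    """%SV (gauge R&R %): measurement spread vs. total observed spread."""
    return sigma_ms / sigma_total * 100.0

# Illustrative numbers for the +/-6-mV fuel-gauge tolerance
pt = precision_to_tolerance(sigma_ms=0.5, lsl=-6.0, usl=6.0)
sv = percent_study_variation(sigma_ms=0.5, sigma_total=2.0)
print(f"P/T = {pt:.0f}%, %SV = {sv:.0f}%")
```

Both results would then be judged against the 10%/30% rules of thumb discussed next.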
Now that we have metrics to characterize the precision of our measurement system, the question always arises as to how much measurement error is acceptable. While the final acceptance criteria ultimately depend on the measurement system's purpose and customer requirements, the general rule of thumb for both P/T and R&R says that error below 10% is acceptable with little to no concern. Measurement error between 10% and 30% may be acceptable depending on the application, the customer's requirements, and business considerations such as cost and risk. Error above 30% is considered unacceptable.
Gauge R&R vs. MSA caution
Keep in mind that Gauge R&R is a subset of MSA and by itself can’t satisfy requirements to validate an entire measurement system. For example, a drop gauge may perform with less than 10% error when measuring the height of a rigid part resting on a granite slab, resulting in an acceptable Gauge R&R. However, this doesn’t mean the same gauge will consistently produce less than 10% measurement error when measuring other part types since elements such as holding fixtures can introduce new sources of variation.
The first MSA step is to set operational definitions. Form a core team of process experts, then decide and document which key characteristic of the product or process needs to be measured. Confirm the specification limits for the chosen attribute and gain consensus on what percentage of measurement error will be acceptable. For discrete piece parts, these definitions can usually be taken straight from the engineering drawing, provided it is properly dimensioned and has key characteristics identified.
Determining these definitions when measuring a process is not as obvious. Consider the ultrasonic welding (USW) of two halves of a plastic enclosure, which requires melting one part's energy director into the valley of the mating part. When done properly, this process results in a strong bond around the perimeter of the two parts and typically reduces the post-weld assembly height by the height of the energy director (the collapse distance). Correlation studies relating collapse distance to drop tests can establish a lower specification limit for the minimum collapse height that will ensure a bond strength meeting the customer's product requirement. To obtain reliable collapse-height readings, we must first prove that we can get reliable height measurements both before and after the assembly is welded. As this example demonstrates, arriving at the operational definitions can be a science experiment in and of itself.
The second step is to identify the data type, of which there are two. Variable or continuous data is obtained with instrumentation and usually includes decimal places (e.g., temperature and distance). Attribute or discrete data usually comes from counting or grouping results, such as the number or types of defects. This is a very important distinction when it comes to MSA because measurement systems using attribute data require a very specific set of tools. The measurement system’s precision is still paramount, but the evaluation of P/T and R&R is replaced with calculations of percent agreement within appraisers, between appraisers, and between appraisers and the standard. The USW process measurement is in thousandths of an inch and is therefore variable data.
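As a rough illustration of the attribute-data toolkit, percent agreement within an appraiser is simply the fraction of parts on which that appraiser's repeated calls match; agreement between appraisers, and against a known standard, follows the same pattern. The pass/fail calls below are hypothetical:

```python
def within_appraiser_agreement(trials):
    """trials: one appraiser's repeated ratings, one list per part."""
    agreed = sum(1 for ratings in trials if len(set(ratings)) == 1)
    return agreed * 100.0 / len(trials)

# One appraiser rates 10 parts twice (hypothetical pass/fail data)
appraiser_a = [
    ["pass", "pass"], ["fail", "fail"], ["pass", "pass"], ["pass", "fail"],
    ["fail", "fail"], ["pass", "pass"], ["pass", "pass"], ["fail", "pass"],
    ["pass", "pass"], ["fail", "fail"],
]
print(within_appraiser_agreement(appraiser_a))  # agrees on 8 of 10 parts -> 80.0
```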
Step three identifies the inputs. Consider all inputs to the measurement system in an effort to minimize all potential contributors to measurement variation. An Ishikawa (fishbone) diagram is helpful for brainstorming. Next, take action on critical inputs. Examples for the USW process may include acquiring a new drop gauge with increased resolution or building a holding fixture to ensure alignment of the assembly when taking pre-weld measurements. Next, check for accuracy. Every instrument used in the measurement process should be checked for discrimination, bias, linearity, stability and calibration.
If your measurements result in the destruction of the specimen, then you’ll need to apply additional steps to your MSA to overcome the obvious problem of not being able to measure the same part multiple times. Ultrasonic welding is a process commonly measured in a substandard manner. According to one of the leading manufacturers of USW equipment, many of their customers use attribute data combined with destructive drop testing to validate their USW process. Aside from being expensive, this method can be misleading and provides little or no warning of potential drifts in capability.
Data collection for variable data studies typically includes 10 parts, three operators, and at least two repetitions. It’s also highly recommended that the measurements be made in random order to help eliminate human bias from operators who consciously or subconsciously try to get the same reading on each part. This is where statistical software packages become very helpful. They allow you to enter the part numbers, operator names, and number of intended replications. The software will automatically create a randomized worksheet to be followed while executing the MSA.
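If you don't have a statistical package handy, a randomized worksheet is easy to generate yourself. The part labels, operator names, and seed below are illustrative:

```python
import random

def grr_worksheet(parts, operators, reps, seed=None):
    """Build a randomized measurement run order for a gauge R&R study."""
    runs = [(op, part) for op in operators for part in parts for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(runs)  # random order counters operators chasing prior readings
    return runs

worksheet = grr_worksheet(
    parts=[f"Part{i}" for i in range(1, 11)],  # 10 parts
    operators=["Op A", "Op B", "Op C"],        # 3 operators
    reps=2,                                    # 2 repetitions each
    seed=42,                                   # fixed seed for a repeatable sheet
)
for run_no, (op, part) in enumerate(worksheet[:5], start=1):
    print(run_no, op, part)  # first few rows of the 60-run worksheet
```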
It’s critical for the team to specify an exact set of process steps and requirements to be followed while taking the measurements. Nothing should be left to assumption. For example, does the gauge need to be zeroed out between each measurement or is it sufficient to do it once at the beginning of the trials? And is your measurement process subject to ambient temperature and relative humidity? If so, you should specify the allowable range prior to initiating the sequence of measurements.
The core team that develops the measurement system is often composed of engineers or other non-production oriented people. These are not the operators who should be validating the measurement system. It’s important that those who will be using the gauges and equipment long term be the ones who take the measurements. If these individuals weren’t part of the development team, they’ll require training and time to practice.
Note that a core team member or two should be present while the operators are taking their measurements. One member should record the readings while the other silently observes the process, looking for sources of variation in case the resulting percentage of measurement error exceeds the allowable limit. In such an event, the observation notes will be critical when determining if possible measurement process changes are needed to reduce the variation to an acceptable level.
Depending on the tools used to calculate the results, a series of categories will be made available to provide insight as to which parts of the measurement system have the greatest portion of error. These results are usually made available in both text and graphs. The gauge run chart in particular is helpful in getting inexperienced or non-technical staff to understand any significant problems with the MSA. Beyond that, more comprehensive output forms are available but require additional training to interpret.
Finally, you must determine whether the MSA acceptance criteria are met. If not, you’ll appreciate that you’ve taken good observation notes and maintained a solid core team of process experts capable of identifying and eliminating any sources of variation. The MSA study results will assess whether you are getting credible measurement data. But it can't tell you how to improve the measurement process if the data isn't within an acceptable range of error.
In my quest for medical-device-specific MSA instruction at the aforementioned conference, I asked the instructors why MSA was not included (not even a single slide) in any of the presentations. They responded with the expectation that MSA is a "given" when performing process validation, as if they assumed everyone was familiar with the concepts and tools. When I queried conference participants about their exposure to and experience with MSA, I received a mixed bag of responses. Those who understood the question usually confirmed the need for MSA and gauge R&R but admitted their company was very lax about it, and they could rarely define the acceptable limits of measurement error.
John Rokus is the vice-president of continuous improvement at Micro Power Electronics, responsible for training and coaching Micro Power employees in lean deployment, and applying Six Sigma problem-solving methodologies. He is accredited by the American Society for Quality as a certified Six Sigma Black Belt. Rokus earned his BS degree in Industrial/Manufacturing Engineering from Oregon State University.