The success or failure of biomedical start-ups depends on regulatory approval. The government, rightly, wants to make sure that these products solve the problems that they are purported to solve and don’t cause harm to the people who use them.
But the statistical analysis that is used to show how well a new biomedical product works, and therefore, whether it is worthy of approval, has some interesting wrinkles.
Take, for example, the case of Boston Scientific’s new Taxus Liberte heart stent. The marketplace section of the August 14 Wall Street Journal had a story about a “flaw” in a Boston Scientific study of its new stent.
Two things matter a lot in studies of performance for new biomedical products: how big the effect is and how certain we are that the effect is real and not just a lucky draw. The discussion here is not about the size of the effect of Boston Scientific’s Taxus Liberte stent. The study it submitted to the FDA showed that the new stent was just as good at avoiding clogging as its old stent.
The question is how certain we are that the researchers’ finding is not wrong.
The Wall Street Journal article explained, “Medical studies define success or failure in testing a hypothesis by calculating a degree of certainty, known as the p-value. The p-value must be less than 5% for the results to be considered significant.” It goes on to say that there are a variety of ways to calculate the p-value and they generate slightly different results.
Using a statistic called the Wald test, the Boston Scientific researchers said that there was only a 4.874% chance that they were wrong about the effect. But if they used NCSS LLC’s exact double binomial test, the chance that they were wrong was 5.47%.
That is, one statistical test showed a chance of being wrong that was 0.596 percentage points smaller than the other test showed. The problem is that the Wald test said that the chance that they were wrong was less than 5% and the NCSS test said that the chance that they were wrong was more than 5%.
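The gap described here is easy to reproduce. Below is a minimal sketch in Python, using made-up numbers (114 successes in 200 trials against a null of 50% — not Boston Scientific’s actual trial data, which compared two stents), showing how a Wald-style normal-approximation test and an exact binomial test of the very same data can land on opposite sides of the 5% line.

```python
from math import comb, erf, sqrt

def wald_p(k, n, p0):
    """Two-sided p-value from the Wald (normal-approximation) z-test."""
    phat = k / n
    se = sqrt(phat * (1 - phat) / n)      # standard error using the observed proportion
    z = abs(phat - p0) / se
    # Normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

def exact_binom_p(k, n):
    """Two-sided exact binomial p-value for the symmetric null p0 = 0.5."""
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n
    return min(1.0, 2 * upper_tail)       # symmetric null, so double one tail

k, n = 114, 200                           # hypothetical data, chosen to straddle 5%
print(f"Wald  p-value: {wald_p(k, n, 0.5):.4f}")   # falls just below 0.05
print(f"Exact p-value: {exact_binom_p(k, n):.4f}") # falls just above 0.05
```

Same data, two defensible tests, and a result that is “significant” under one and not the other — exactly the situation the Wall Street Journal article describes.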
That difference matters because 5% is a magic number. If the researchers had found that the Wald test had shown a p-value of 4.278% and NCSS LLC’s exact double binomial test had shown a p-value of 4.874%, also a gap of 0.596 percentage points between the two tests, there would be no issue because both p-values would fall below the 5% threshold.
The success of a new biomedical product can ride on which side of 5% a 0.596-percentage-point gap between different statistical tools happens to fall.
The problem is that 5% is just a convention. The world of scientific research could have developed the convention that the level of certainty that we need is 4% or 6% or something else.
Now Boston Scientific is a big company and will probably survive no matter what happens to this product. But suppose we were talking about a start-up here. Most biomedical start-ups initially try to develop a single new product. So their success or failure as companies depends on the approval of that product. If the product doesn’t get approved, they often go out of business and don’t get a chance to develop a second version of the product or a different product.
Essentially, we evaluate the efficacy of biomedical products, and stake the success or failure of biomedical start-ups, on whether a particular statistical tool puts our confidence in a finding slightly above or slightly below a level of certainty that happens to be a convention researchers have developed.