Business school researchers have made a fundamental error in their efforts to understand entrepreneurship. They have incorrectly assumed that most outcomes of interest in the startup world are normally distributed, when in fact those outcomes generally follow a power law distribution, Chris Crawford and his colleagues find in a new paper in the Journal of Business Venturing.
Social scientists generally assume that the phenomena they are seeking to explain follow a normal distribution. That assumption works pretty well for explaining a lot of things in this world, like the height of adult men or grocery prices, but it works rather poorly for explaining the performance of startups.
Crawford and others, like Jerry Neumann, report that key indicators of the performance of new companies (including revenue and employment growth, firm valuations, and angel and venture capital returns) follow a power law distribution. With a power law distribution, a few extreme cases account for almost all of the results, whether what you are measuring is the fraction of Y Combinator’s returns that come from its investment in Airbnb, the source of profits in Sequoia Capital’s latest fund or the jobs created by American industry.
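To make that concentration concrete, here is a minimal Python sketch that compares how much of the total comes from the top 1 percent of cases under each kind of distribution. The numbers are simulated, not taken from Crawford's paper: the normal sample's mean and standard deviation, the Pareto tail exponent of 1.1, the sample size and the `top_share` helper are all assumptions chosen for illustration.

```python
import random


def top_share(values, fraction=0.01):
    """Share of the total contributed by the largest `fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100_000              # assumed sample size

# Hypothetical normally distributed outcome: mean 100, standard deviation 15.
normal_sample = [rng.gauss(100, 15) for _ in range(n)]

# Hypothetical power-law outcome: Pareto with an assumed tail exponent of 1.1.
pareto_sample = [rng.paretovariate(1.1) for _ in range(n)]

normal_top = top_share(normal_sample)
pareto_top = top_share(pareto_sample)

print(f"top 1% share of total, normal:    {normal_top:.1%}")
print(f"top 1% share of total, power law: {pareto_top:.1%}")
```

In the normal sample the top 1 percent contributes only slightly more than 1 percent of the total. In the power-law sample the same sliver of cases contributes a large fraction of it, which is the pattern described above.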
Crawford and his colleagues make a bold claim in the abstract of their paper. They say, “our results call for the development of new theory to explain and predict the mechanisms that generate these distributions and the outliers therein.”
To understand why they are right, let me highlight three implications of their findings:
• The statistical assumption behind the vast majority of entrepreneurship research conducted today is incorrect, which makes that research's findings suspect. Take, for example, this line from a scholarly article by Johan Wiklund of Syracuse University and Dean Shepherd of Indiana University, who write (2011:927), “in any sample of firms it can reasonably be assumed that performance will vary normally around a mean.”
This assumption about the distribution of firm performance leads researchers like Wiklund and Shepherd to use inferential statistics based on normal distributions. But Crawford and colleagues show that the data on start-up firm performance isn’t normally distributed; it follows a power law distribution. As the figure that I borrowed from their paper shows, normal distributions and power law distributions are very different animals. If you assume that the data follows one pattern when it actually follows another, your statistical analyses will be wrong.
• Researchers’ efforts to ensure that their data “fit” the assumptions of normality lead them to throw away the very data that contains the most information about entrepreneurship. Statistical analyses that depend on the assumption of a normal distribution are very sensitive to outliers, like Uber’s latest valuation or Facebook’s market capitalization. To avoid the “bias” that will come from trying to include outliers in analyses that rely on normal distributions, researchers typically eliminate them. But when what you are measuring follows a power law distribution, that approach is akin to throwing the baby out instead of the bath water.
• Policy makers’ concerns about people’s privacy make it very difficult for researchers to accurately use government data to explain entrepreneurship. Most government databases, like those provided by the Census Bureau or the Federal Reserve, routinely “top code” — or remove the very highest performers — in public versions of their data sets to prevent users from identifying the study participants. That very effort to protect privacy undermines accurate measurement of entrepreneurship when the key variables researchers are predicting follow a power law distribution. The most important pieces of information in the database are the very numbers that are hidden from analysis.
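The last two implications can be illustrated with a short simulation. The Python sketch below draws hypothetical firm outcomes from a Pareto distribution, then measures how much of the total survives when the top 1 percent of firms is dropped (outlier removal) or capped at the 99th percentile value (one common way top coding is implemented, assumed here). The tail exponent, sample size and cutoff are illustrative assumptions, not figures from the paper or from any government data set.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical firm outcomes; the Pareto tail exponent 1.1 and the sample
# size are assumptions chosen for illustration only.
outcomes = sorted(rng.paretovariate(1.1) for _ in range(100_000))

cutoff_index = int(len(outcomes) * 0.99)
threshold = outcomes[cutoff_index]  # smallest value in the top 1 percent

# Outlier removal: drop the top 1 percent of firms entirely.
trimmed = outcomes[:cutoff_index]

# Top coding (assumed variant): keep every firm but cap values at the threshold.
top_coded = [min(x, threshold) for x in outcomes]

total = sum(outcomes)
kept_after_trim = sum(trimmed) / total
kept_after_top_code = sum(top_coded) / total

print(f"share of total kept after dropping top 1%: {kept_after_trim:.1%}")
print(f"share of total kept after top coding:      {kept_after_top_code:.1%}")
```

Either way, a sizable part of the total disappears from the data set, even though only 1 percent of the firms were touched. With normally distributed data the same treatment would barely change the total.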