Averages Considered Harmful
The arithmetic mean (aka the average) is often a misleading number. One reason is that the mean is sensitive to outliers: a single very large or very small value can greatly influence it. In those situations a better measure of center is the median (the 50th percentile). But there is a second huge pitfall awaiting anyone who uses the average for estimating or benchmarking: software size.
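As a quick illustration of the outlier sensitivity described above, here is a minimal sketch using Python's standard `statistics` module. The duration values are made up for illustration and are not taken from any real project sample.

```python
from statistics import mean, median

# Hypothetical project durations in months (illustrative values only).
# Six projects cluster around 6-11 months; one outlier took 60 months.
durations = [6, 7, 8, 9, 10, 11, 60]

mean_duration = mean(durations)      # pulled upward by the single outlier
median_duration = median(durations)  # the middle value, unaffected by the outlier

print(round(mean_duration, 1))  # 15.9
print(median_duration)          # 9
```

In this made-up sample the mean is larger than every duration except the outlier itself, while the median stays in the middle of the cluster.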
Even though we know that software size has a major influence on the key metrics (e.g., effort, duration, productivity, defects), many people insist on reporting, comparing, and using the average value. Let’s look at an example. Consider a sample of 45 completed telecommunications application projects. Picking one of the key metrics already mentioned, the duration of phase 3, we can generate a histogram and calculate the mean. The average duration is 27.5 months. Does this tell us anything useful?
The histogram of durations shows a skewed distribution (many projects have a short duration, few have a long one), so we would need some sort of normalization before the average becomes a meaningful measure of center. And even then, what about size? In a typical SLIM scatterplot of duration versus size for these projects, we can see that, in general, larger projects take longer than smaller ones.
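To make the size effect concrete, the following sketch fits a simple trend of duration against size and compares it with the overall average. Everything in it is an assumption for illustration: the (size, duration) pairs are invented, the size unit is unspecified, and the fit is an ordinary least-squares line on log-transformed values, which is one common way to handle skewed data and not necessarily how SLIM computes its trend lines.

```python
import math
from statistics import mean

# Hypothetical (size, duration in months) pairs; illustrative values only.
projects = [(10, 6.0), (20, 8.0), (50, 11.0), (100, 15.0), (400, 26.0), (1000, 40.0)]

log_size = [math.log10(size) for size, _ in projects]
log_duration = [math.log10(duration) for _, duration in projects]

# Ordinary least squares on the logs: log10(duration) = a + b * log10(size)
mx, my = mean(log_size), mean(log_duration)
b = (sum((x - mx) * (y - my) for x, y in zip(log_size, log_duration))
     / sum((x - mx) ** 2 for x in log_size))
a = my - b * mx


def expected_duration(size):
    """Duration (months) predicted by the fitted trend for a given size."""
    return 10 ** (a + b * math.log10(size))


overall_average = mean(duration for _, duration in projects)

for size in (10, 1000):
    print(f"size {size}: trend {expected_duration(size):.1f} months, "
          f"average {overall_average:.1f} months")
```

With these invented numbers, the single average overstates the expected duration of the smallest project and understates that of the largest, which is the problem with quoting one average for projects of very different sizes.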