Practical Software Estimation Measurement

Q&A Highlights from "Maximizing Value Using the Relationship between Software Size, Productivity, and Reliability"

During the webinar I recently presented, "Maximizing Value Using the Relationship between Software Size, Productivity, and Reliability," I received quite a few interesting questions. Here are the highlights:

Do you see the same behaviors in Agile projects as those you presented in this webinar?

In the work for my presentation, I did not look at Agile projects separately.  I was looking at overall trends, breaking things down by application type rather than by development methodology. 

However, Don Beckett recently gave a conference presentation on Agile called “Beyond the Hype.” Don looked at duration, effort, staff, and productivity for Agile projects. There is a nice table where he compares the performance of a typical Agile project to that of a typical IT project.

Don’s presentation summarizes it well: staffing is a little higher on Agile projects, and duration and effort are a little lower, but the basic relationships between the metrics and size are similar.

Does the language an application development project is written in have any impact on the data? In other words, when looked at independently, do mainframe COBOL projects look different than .Net projects? 

I answered part of this during the webinar, but in general, the direction of the trend lines is the same regardless of the programming language.  If we think about a graph of productivity versus size, the sloping lines will be more or less parallel for different languages.  However, the vertical position of the lines will differ because programming language does have an impact on productivity.  For example, mainframe COBOL projects would typically have lower overall productivity than more modern languages and platforms.

When importing your own historical projects into SLIM-Estimate, you can select individual projects to include or exclude. It is a good idea to use the historical projects most similar to the one being developed.

I did not include this in the presentation, but in the past I have created box plots where the vertical scale was productivity and the individual boxes represented language families (groups of projects that used similar languages). When I do this, some of the language-family boxes sit higher than others; again, this is because some languages are more productive than others.
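A minimal matplotlib sketch of that kind of chart, with hypothetical productivity figures for each language family (the groupings and numbers here are illustrative, not data from the webinar):

```python
import matplotlib.pyplot as plt

# Hypothetical productivity samples (e.g., SLOC per person-month)
# for projects grouped into language families.
samples = {
    "COBOL/mainframe": [80, 95, 110, 70, 105],
    "Java/C#":         [150, 180, 165, 140, 200],
    "Scripting":       [220, 260, 240, 190, 280],
}

fig, ax = plt.subplots()
ax.boxplot(list(samples.values()))  # boxes default to positions 1..n
ax.set_xticks(range(1, len(samples) + 1), labels=list(samples.keys()))
ax.set_ylabel("Productivity (SLOC per person-month)")
ax.set_title("Productivity by language family")
plt.show()
```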

Lots of factors influence productivity. The programming language is one of them.

Another question related to using measures of physical software size: With code optimization and refactoring, we see LOC decrease within a project. How can we use that effectively to measure productivity?

I mentioned that the projects I was analyzing were primarily creating new code and changing the functionality of existing code, so I had no projects in the sample where the application size decreased.

However, my suggestion would be to record the size (lines of code, function points, etc.) of the software that was deleted, modified, and added. If you have projects that did only one of those three, then you could graph size versus effort and create a regression trend line (similar to the graphs I was using). You could do this in Excel or in a SLIM tool.
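For those without either tool handy, here is a minimal Python sketch of that single-variable approach; the sample sizes and effort figures are hypothetical:

```python
import numpy as np

# Hypothetical historical data: effective size (new plus modified lines
# of code) and effort in person-hours for five completed projects.
size   = np.array([12_000, 30_000, 55_000, 80_000, 150_000])
effort = np.array([2_100, 5_200, 8_800, 14_000, 26_000])

# Fit a power-law trend line, effort = a * size**b, by regressing
# in log-log space (the same transform used for the webinar graphs).
b, log_a = np.polyfit(np.log(size), np.log(effort), 1)
a = np.exp(log_a)

print(f"effort ~= {a:.3f} * size^{b:.2f}")

# Predicted (typical) effort for a new 40,000-line project:
print(f"predicted effort: {a * 40_000**b:,.0f} person-hours")
```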

If you had a good sample of projects that did two or three of the above (e.g., deleted some code, modified some code, and added some code), you could do a multivariate regression to see what the relationships are and which of the three (deleted, modified, added) is most closely related to effort.
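A sketch of that multivariate version, again with made-up figures; it fits effort as a linear function of the three size components so their coefficients can be compared:

```python
import numpy as np

# Hypothetical per-project size components (lines of code) and effort.
deleted  = np.array([1_000, 0, 4_000, 2_500, 500, 8_000], dtype=float)
modified = np.array([5_000, 8_000, 2_000, 9_000, 3_000, 12_000], dtype=float)
added    = np.array([20_000, 35_000, 10_000, 40_000, 15_000, 60_000], dtype=float)
effort   = np.array([3_400, 5_600, 2_100, 6_800, 2_500, 10_400], dtype=float)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(effort), deleted, modified, added])

# Ordinary least squares: effort ~ intercept + deleted + modified + added.
coef, *_ = np.linalg.lstsq(X, effort, rcond=None)
for name, c in zip(["intercept", "deleted", "modified", "added"], coef):
    print(f"{name:>9}: {c:10.4f}")
```

The per-line coefficients (hours per deleted, modified, or added line) suggest which size component most closely tracks effort, though with a real sample you would also want to check the fit statistics.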

Once you have a relationship between size and effort, you can check whether an individual project had higher or lower productivity than the trend would predict, and use that for benchmarking.
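Continuing the single-variable sketch above, the benchmarking step amounts to comparing a project's actual effort to the trend-line prediction (the figures are again hypothetical):

```python
# Suppose the fitted trend line was effort = 0.45 * size**0.95
# (hypothetical coefficients from a fit like the one above).
a, b = 0.45, 0.95

# A project that delivered 40,000 lines using 9,500 person-hours.
actual_effort = 9_500
predicted_effort = a * 40_000**b

# Ratio < 1.0: the project beat the trend line (higher productivity);
# ratio > 1.0: it needed more effort than a typical project its size.
print(f"actual / predicted effort = {actual_effort / predicted_effort:.2f}")
```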

When you used the size of the team, did you consider the expertise level of staff?

In the analysis I conducted for this presentation, I did not. The graphs with staff counted only the number of people. The conclusions I was drawing apply to all projects.

Expertise level can be a driver of productivity and quality. On some of the graphs, we would expect projects with higher-expertise staff to plot on the higher-productivity side of the trend line, and those with lower-expertise staff to plot on the lower side.

In SLIM, there are some qualitative factors that can be adjusted and that impact the PI (Productivity Index). These factors can likewise be recorded in DataManager for later analysis.

My colleague, Don Beckett, recently gave a presentation on Agile that also compared the best and worst projects. On slide 8 he shows that people, communication, and knowledge factors are important differentiators between the best and the worst projects.

In some other research I have been doing (for a future conference presentation), I am data mining for factors that influence project duration. I found that whereas staff skill level alone does not impact project duration, team motivation and management effectiveness do. This finding concerns duration rather than productivity, but productivity may show similar relationships.

Is "Schedule" on the Performance Benchmark Tables the industry group mean or median?

It is the height of the trend line at the given size point, so think of it as a typical value. When creating the trend lines, we use log-log transformed data, which makes the distribution approximately normal. Since the transformed data fits a normal distribution, the mean and median will be very close to the same value. Mean values of skewed distributions can be misleading, so always take care to analyze the data properly.
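A quick numerical illustration of that point, using synthetic log-normally distributed data (the distribution parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic skewed data, e.g., project durations in months.
durations = rng.lognormal(mean=2.0, sigma=0.8, size=10_000)

# On the raw (skewed) scale the mean is pulled well above the median.
print(f"raw    mean={durations.mean():6.2f}  median={np.median(durations):6.2f}")

# After the log transform the distribution is ~normal, so the two agree.
logs = np.log(durations)
print(f"logged mean={logs.mean():6.2f}  median={np.median(logs):6.2f}")
```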

What is "fair" for stating delivery rates to a client when a portion of the code is "reused" code (which is a smarter way of designing your application)?

In today’s presentation, I didn’t consider reused code. I was using effective size, so I was including only new and modified code in my project size.

We are seeing more projects that include reused code. In SLIM-Estimate, the way to deal with reused code is to include it in the size measures, and the tool will adjust the PI accordingly. What is fair to state depends on how much reused code there is. If the project is a development project that happens to have some reused code (and the glue code needed to integrate it), then I would use SLIM to estimate it as normal.

If the project is really a COTS implementation that is only developing a little glue code, then you may want to watch the replay of the recent webinar by Keith Ciocco on estimating package implementations.

At any rate, if you have your own historical data, you can use that as a benchmark.

If we measured productivity in terms of value-add to customers, would that change the relationship between size and productivity?

In a partial sense, a functional metric (function points, use cases, requirements, etc.) is a measure of value provided, or at least a measure of the size of the functionality delivered. Of course, one underlying assumption would be that all of the functionality provided is of value, and that is not always true. Nonetheless, I think that since the “productivity paradox” exists for functional size as well as physical size, it would also exist for any other measure of value we could implement, at least given a large enough sample of projects. Certainly there would be outliers: projects that, for example, provided no value at all.

View the replay of this webinar here.
