Discovery Costs of Evolving Software Systems

Report of the Workshop on Discovery Costs at OOPSLA '02 (Seattle, November 4, 2002)

1. Overview of Discovery Costs

Discovery is the process of learning what you need to know to do the job you have to do.

Discovery costs are often ignored or underestimated in the planning and management of software systems.

Discovery is mostly an organizational/cultural problem, but tools and technologies can have an impact.

2. Practical advice for software development staff

Both managers and developers need to consider discovery costs in doing their jobs.

Advice to managers
  Acknowledge discovery costs
    • plan for discovery time in project schedules
  Encourage cross-team collaboration
    • formal and informal
  Tangential learning is important: do discovery gradually and early
    • it's part of a continuous learning curve

Advice to developers
  Talk to maintainers and end users
    • they often know what behavior is needed
  Focus on the problem to be solved
    • "extra stuff" increases discovery costs
  Keep the initial hurdles low so new staff can be productive
    • modular design
    • good documentation

2.1 Advice to Managers

Acknowledge discovery costs:  Managers need to take discovery costs into account when they create project schedules and monitor the progress of development.  They should not be surprised when the development effort for a project that contains 50% reused code is only 10% to 20% less than for a similar-sized project with no reuse.  (The most significant benefits of reuse are usually realized in integration, testing, and maintenance.)

Encourage cross-team collaboration:  Discovery is much more costly when done alone, so it is better when managers allow developers to work in teams to do discovery.  Managers can speed up discovery by sponsoring formal design reviews and code walkthroughs.  In addition, informal meetings between experienced and inexperienced team members can convey a lot of practical information in a short time.  Some developers like to form ad-hoc "user groups" that meet regularly to share information.

Tangential learning:  In many cases, the most important discovery activities occur at the same time as other software development tasks.  It is impossible to understand all of the internal design details of a large body of legacy code, but fortunately it is possible to build on top of legacy code without a complete understanding of every design detail.  Developers use "wrapping" and "refactoring" techniques that let them temporarily ignore many design details.  But developers need to keep learning about the code they reuse: its data structures, algorithms, interfaces, and side effects, any of which may become important later.
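As a rough illustration of "wrapping", here is a minimal sketch, assuming a hypothetical legacy routine (legacy_calc) whose behavior depends on a magic mode code and a bit-flag argument.  The hypothetical PriceService class gives callers a narrow, named interface, so newcomers can use the legacy code without first discovering what the mode and flag values mean.  None of these names come from the workshop; they are invented for the example.

```python
"""Sketch: wrapping a legacy routine so callers can ignore its internals.

All names here (legacy_calc, PriceService) are hypothetical.
"""


def legacy_calc(mode, a, b, flags):
    # Stand-in for an obscure legacy routine: its behavior depends on a
    # magic mode code and a bit-flag argument that a newcomer would
    # otherwise have to "discover" by reading the code.
    if mode == 3:
        total = a * b
        if flags & 0x2:
            total = total * 0.9  # undocumented 10% discount
        return round(total, 2)
    raise ValueError("unsupported mode")


class PriceService:
    """Narrow, documented facade over legacy_calc."""

    # The magic values are recorded once, here, instead of at every call site.
    _LINE_TOTAL_MODE = 3
    _DISCOUNT_FLAG = 0x2

    def line_total(self, unit_price: float, quantity: int,
                   discounted: bool = False) -> float:
        """Return unit_price * quantity, optionally with the 10% discount."""
        flags = self._DISCOUNT_FLAG if discounted else 0
        return legacy_calc(self._LINE_TOTAL_MODE, unit_price, quantity, flags)


if __name__ == "__main__":
    service = PriceService()
    print(service.line_total(10.0, 3))
    print(service.line_total(10.0, 3, discounted=True))
```

The wrapper does not change the legacy code at all.  It only records the knowledge already discovered, so the remaining details of legacy_calc can be learned gradually, when they matter.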

2.2 Advice to Developers

Talk to maintainers and end users:  The best sources of information on the structure of software are the people who have been living with the code:  the maintenance developers who read and change the code, and the end users who use the software.  The maintainers usually know "where the skeletons are":  the parts of the system that are most complex, most troublesome, and most rapidly changing.  This is useful information for anyone who is trying to understand an existing design.  Talking with the end users is a good way to find out which "scenarios" in the existing system are used most often.

Focus on the problem to be solved:  Developers are sometimes too creative.  They add new functionality that no one actually needs.  The extra code is a burden to the maintainers of the system, and it is also a hindrance to anyone trying to "discover" the design of the code.  The standard software engineering advice ("implement the documented requirements and nothing but the requirements") and the newer XP advice ("you aren't going to need it") are both good ideas, and following them will reduce downstream discovery costs.

Keep the initial hurdles low:  It is a good idea to keep reusable libraries and software components simple, to reduce the learning curve for others who will use them.  Large software project teams are always made up of people with widely differing levels of skill and experience.  Even if you are a real pro, you are not doing anyone a favor by writing complex module interfaces that only you can understand.  For big systems and open-source projects in particular, it is a good idea to include a few simple examples that show how to use your library to do something useful.  Such "tutorial examples" are one of the best ways to shrink discovery costs for novices on a project.
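One way to keep tutorial examples from going stale is to make them executable.  The sketch below assumes a hypothetical library function, word_counts, and uses Python's standard doctest module (a technique chosen for this illustration, not one named by the workshop) to check the usage example embedded in the docstring.

```python
"""Sketch: a small library function whose docstring is a runnable tutorial.

The function name word_counts is hypothetical; doctest is Python's
standard-library tool for checking examples embedded in docstrings.
"""
import doctest


def word_counts(text: str) -> dict[str, int]:
    """Count how often each lowercase word appears in ``text``.

    Tutorial example for newcomers:

    >>> counts = word_counts("the cat saw the dog")
    >>> counts["the"]
    2
    >>> sorted(counts)
    ['cat', 'dog', 'saw', 'the']
    """
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    # Reports a failure if the tutorial example no longer matches the code.
    print(doctest.testmod())
```

Because the example is tested along with the code, a novice who copies it can trust that it still works.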

3. Discovery cost good practices

Keep code modular:  The standard software engineering advice of "minimizing coupling" and "maximizing cohesion" is a good way to keep discovery costs under control.  Modularity allows newcomers to focus their design work on selected parts of the system without needing to understand everything.  Things to avoid include complex global data structures, "god classes", and changing encapsulated data by side effect.
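The contrast between changing shared data by side effect and keeping it encapsulated can be sketched briefly.  In this minimal example, assuming hypothetical names (SETTINGS, apply_discount_global, Order), the first style forces a reader to trace every caller of the global, while the second confines state changes to one small, named interface.

```python
"""Sketch: hidden side effects on shared data vs. an encapsulated alternative.

All names (SETTINGS, apply_discount_global, Order) are hypothetical.
"""

# Harder to discover: a function silently mutates a global structure,
# so a reader must trace every caller to know the current state.
SETTINGS = {"discount": 0.0}


def apply_discount_global(rate: float) -> None:
    SETTINGS["discount"] = rate  # side effect on shared global data


# Easier to discover: state lives in one object and changes only
# through a small, named interface.
class Order:
    def __init__(self, subtotal: float) -> None:
        self._subtotal = subtotal
        self._discount = 0.0

    def with_discount(self, rate: float) -> "Order":
        """Return a new Order with the discount, leaving this one unchanged."""
        if not 0.0 <= rate < 1.0:
            raise ValueError("rate must be in [0, 1)")
        order = Order(self._subtotal)
        order._discount = rate
        return order

    def total(self) -> float:
        return round(self._subtotal * (1 - self._discount), 2)


if __name__ == "__main__":
    base = Order(100.0)
    discounted = base.with_discount(0.25)
    print(base.total(), discounted.total())
```

Returning a new Order rather than mutating the existing one is one design choice among several; the point is that a newcomer can learn everything about how an order's total changes by reading one class.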

Up-to-date system documentation:  Good documentation is part of a plan for keeping a software product "living" for a longer period of time.  Even if the original developers are no longer available, the key design secrets can be explained to newcomers, and the initial "architectural vision" can be maintained or adapted.  Of course, documentation can be expensive to write and to maintain.  It is much less important for a small "agile" team writing financial trading software that will live for only a few months, and much more important for a large, geographically distributed team or an open-source development project.

Use pair programming and pair design:  Software development is not like going to school -- you are encouraged to work with others.  Pair programming is a particularly good practice for performing discovery and for reducing future discovery costs.  As for performing discovery, some of the pair programming ideas can be applied to doing discovery work on an existing legacy system.  Two developers have a better chance of finding some of the undocumented design tricks that lurk within the code.  And in order to reduce future discovery costs, doing design and coding work in pairs reduces the complexity and improves the clarity of both code and design documentation.  If you program with a partner, the partner is likely to object to attempts to write code that is obscure or clever.

Keeping code healthy:  Ideally, software development takes place within a "continuous learning culture".  Discovery is just one part of an iterative software lifecycle that supports the development of "self-sustaining" software systems.  New requirements and new functionality offer opportunities to refactor and renew parts of the software.  And by moving forward, the development team keeps its skills and knowledge from becoming out of date.

4. Tools

The workshop didn't spend much time on tools, but two of the participants had some good ideas for tools that can help manage or reduce discovery costs in practical situations.

Some tools to use:

5. Special challenges

Periodic design improvement in XP:  XP calls for frequent investment in refactoring the design.  Because XP projects evolve in rapid iterations, the structure of the source code tends to grow more complex over time.  It is necessary to schedule some time to "remove the gunk" every once in a while, in order to reduce the discovery costs for parts of the software that have had many new features bolted onto them.

Open source:  A healthy open source project has many novices contributing to the effort.  These novices rely (mostly) on written documentation to do "discovery", because it is usually impractical to get detailed coaching from the central development team.

Experiences in adding new staff and dealing with personnel turnover:

The discovery cost/efficiency curve is similar, even for new Web-based applications -- the time to go from 20-30% efficiency to 50-60% efficiency is about 6 months.  Again, the efficiency of a developer reaches a maximum of about 75-80% after about a year.  The reasons why discovery cost doesn't go to zero are:

6. Patterns

The new book Object-Oriented Reengineering Patterns (by Serge Demeyer, Stephane Ducasse, and Oscar Nierstrasz) contains a lot of good advice for reengineering existing systems, including many ideas on how to lower the initial "discovery cost" hurdles when doing reengineering.  Linda Rising has a set of patterns on mentoring, meetings, contact lists, and external collaborations.

7. More information

For more information, see the workshop position papers and other information from the workshop at:
http://csc.noctrl.edu/f/opdyke/OOPSLA2002/discovery_index.html.