Discovery Costs of Evolving Software Systems
Report of the Workshop on Discovery Costs at OOPSLA '02 (Seattle,
November 4, 2002)
1. Overview of Discovery Costs
Discovery is the process of learning what you need to know to do
the job you have to do.
Discovery costs are often ignored or underestimated in the planning
and management of software systems.
- Reuse is a two-edged sword. Many people expend extra effort trying
  to reuse existing "legacy" code when developing new software
  applications. Every form of reuse carries some unavoidable "learning
  curve" overhead, and some kinds of reuse carry more overhead than others.
- It is hard work to fit a new system into an existing context.
  Even when you develop a new system without reusing old code, there are
  often "legacy requirements" to understand: the requirements of the
  software that the new system replaces. Discovery time is needed to learn
  all of these legacy requirements, many of which are not formally written
  down.
Discovery is mostly an organizational/cultural problem, but tools
and technologies can have an impact.
2. Practical advice for software development staff
Both managers and developers need to consider discovery costs in doing
their jobs.
Advice to managers:
• Acknowledge discovery costs: plan for discovery time in project schedules.
• Encourage cross-team collaboration, both formal and informal.
• Tangential learning is important: do discovery gradually and early, as
  part of a continuous learning curve.
Advice to developers:
• Talk to maintainers and end users: they often know what behavior is needed.
• Focus on the problem to be solved: "extra stuff" increases discovery costs.
• Keep the initial hurdles low for new staff to be productive: modular
  design and good documentation.
2.1 Advice to Managers
Acknowledge Discovery Costs: Managers need to take discovery
costs into account as they create project schedules and monitor the progress
of development. They should not be surprised when the development
effort in a project that contains 50% reused code is only 10% or 20% less
than for a similar-sized project with no reuse. (The most significant
benefits of reuse are usually realized in integration, testing, and maintenance.)
Encourage cross-team collaboration: Discovery is much more costly
when done alone, so managers should let developers work in teams to do
it. Managers can speed up discovery by sponsoring formal design reviews
and code walkthroughs. In addition, informal meetings between experienced
and inexperienced team members can convey a lot of practical information
in a short time. Some developers like to form ad-hoc "user groups" that
meet regularly to share information.
Tangential learning: In many cases, the most important
discovery activities occur at the same time as other software development
tasks. It is impossible to understand all of the internal design details
of a large body of legacy code, but fortunately it is possible to build on
top of legacy code without a complete understanding of every design detail.
Developers use "wrapping" and "refactoring" techniques to temporarily
ignore many design details. But developers need to keep learning about the
code they reuse: the data structures, algorithms, interfaces, and side
effects that may become important later.
2.2 Advice to Developers
Talk to maintainers and end users: The best sources of information
on the structure of software are the people who have been living with the
code: the maintenance developers who read and change the code and
the end users who use the software. The maintainers usually know
"where the skeletons are" -- the parts of the system that are most complex,
most troublesome, most rapidly changing. This is useful information
to anyone who is attempting to understand an existing design. And
talking with the end users is a good way to find out which "scenarios"
in the existing system are most often used.
Focus on the problem to be solved: Developers are sometimes
too creative, adding new functionality that no one actually needs.
The extra code is a burden to the maintainers of the system, and it is
also a hindrance to anyone trying to "discover" the design of the code.
The standard software engineering advice ("just implement the documented
requirements and nothing but the requirements") and the new-fangled XP
advice ("you aren't going to need it") are good ideas -- following this
advice will reduce downstream discovery costs.
Keep the initial hurdles low: It is a good idea to keep
reusable libraries and software components simple -- to reduce the learning
curve for others who will use them. Large software project teams
are always made up of people with widely differing levels of skill and
experience. Even if you are a real pro, you are not doing anyone
a favor by writing complex module interfaces that only you can understand.
In fact, for big systems and for open source projects, it is a good idea
to include a few simple examples of how to use your library to do something
useful. Such "tutorial examples" are one of the best ways to shrink the
discovery costs for novices on a project.
3. Discovery cost good practices
Keep code modular: The standard software engineering advice
of "minimizing coupling" and "maximizing cohesion" is a good way to keep
discovery costs under control. Modularity allows newcomers to focus
their design work on selected parts of the system without needing to understand
everything. Things to avoid include complex global data structures,
"god classes", and changing encapsulated data by "side effect".
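The difference between side-effect changes to shared data and encapsulated state can be shown in a few lines. A sketch with hypothetical names: in the first version any function anywhere may mutate the global `CONFIG`, so a newcomer must find every caller to know its value; in the second, `RetryPolicy` keeps its state private and produces a new object instead of mutating an existing one.

```python
# Harder to discover: any module may change this global by side effect.
CONFIG = {"retries": 3}


def risky_tune_retries(n: int) -> None:
    CONFIG["retries"] = n  # silently changes behavior for every other user


# Easier to discover: state is private and changes only through named methods.
class RetryPolicy:
    def __init__(self, retries: int = 3) -> None:
        if retries < 0:
            raise ValueError("retries must be non-negative")
        self._retries = retries

    @property
    def retries(self) -> int:
        return self._retries

    def with_retries(self, n: int) -> "RetryPolicy":
        # Returns a new policy rather than mutating this one, so code that
        # already holds this policy is unaffected.
        return RetryPolicy(n)
```

Usage: `p = RetryPolicy(); q = p.with_retries(5)` leaves `p.retries` at 3 and gives `q.retries` of 5; a reader only needs to look at `RetryPolicy` to understand how retries can change.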
Up-to-date system documentation: Good documentation is
part of a plan for keeping a software product "living" for a longer period
of time. Even if the original developers are no longer available,
the key design secrets can be explained to newcomers, and the initial
"architectural vision" can be maintained or adapted. Of course, documentation
can be expensive to write and expensive to maintain, so it matters much less
for a small "agile" team writing financial trading software with a lifetime
of a few months than for a large geographically distributed team or an
open-source development project.
Use pair programming and pair design: Software development
is not like going to school: you are encouraged to work with others.
Pair programming is a particularly good practice both for performing
discovery and for reducing future discovery costs. For performing
discovery, pair programming ideas can be applied to exploring an existing
legacy system: two developers have a better chance of finding the
undocumented design tricks that lurk within the code. For reducing future
discovery costs, doing design and coding work in pairs reduces complexity
and improves the clarity of both code and design documentation. If you
program with a partner, the partner is likely to object to code that is
obscure or overly clever.
Keeping code healthy: In an ideal world, software development
takes place within a "continuous learning culture". Discovery is just one
part of an iterative software lifecycle that supports the development of
"self-sustaining" software systems. New requirements and new functionality
offer opportunities to refactor and renew parts of the software. And by
continuing to move the software forward, the development team keeps its
skills and knowledge from getting out of date.
4. Tools
The workshop didn't spend much time on tools, but two of the participants
had some good ideas for tools that can help manage or reduce discovery
costs in practical situations.
Some tools to use:
- a searchable archive of all email messages to the project's mailing list
  (especially useful for open source projects)
- tools for rewriting legacy code
5. Special challenges
- XP
- open source
- adding new staff later in a project
Periodic design improvement in XP: XP calls for frequent investment
in refactoring the design. Because XP projects develop in rapid
iterations, the structure of the source code tends to grow more complex
over time. It is necessary to schedule some time to "remove the gunk"
every once in a while, to reduce the discovery costs for parts of the
software that have had many new features bolted onto them.
Open source: A healthy open source project has many novices
contributing to the effort. These novices rely (mostly) on written
documentation to do "discovery", because it is usually impractical to get
detailed coaching from the central development team.
Experiences in adding new staff and dealing with personnel turnover:
- A study of software development projects at AT&T and Lucent Technologies
  found that new developers worked at only about 20-30% efficiency in their
  first 6-12 months on a big project. "Efficiency" here is the amount of
  productive programming work compared to total effort; the remaining
  70-80% is mainly discovery cost.
- In the same study, after being involved in one product release, a
  programmer's efficiency improved to about 50-60%, and by their third
  release it reached about 75-80%. Discovery costs remained a factor, at a
  20-25% level even for experienced staff, due to changes in the
  development and testing environment, evolution of the requirements, and
  new technologies introduced in the project.
- So, discovery costs shrink over time, but they never go to zero.
  The discovery cost/efficiency curve is similar even for new Web-based
  applications: the time to go from 20-30% efficiency to 50-60% efficiency
  is about 6 months, and a developer's efficiency again reaches a maximum
  of about 75-80% after about a year. Discovery cost doesn't go to zero
  because:
  - the domain is usually growing and changing
  - newer releases may use new Web-based development tools
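The efficiency figures above can be turned into a rough planning estimate. A sketch under stated assumptions: it uses the midpoints of the reported ranges, treats all non-productive effort as discovery (the study says that share is "mainly" discovery, so this is an upper bound), and the stage labels and 1,000-hour budget are hypothetical.

```python
# Midpoints of the efficiency ranges reported from the AT&T/Lucent study.
EFFICIENCY_BY_STAGE = {
    "first 6-12 months": 0.25,  # reported 20-30%
    "after one release": 0.55,  # reported 50-60%
    "third release": 0.775,     # reported 75-80%
}


def split_effort(total_hours: float, efficiency: float) -> tuple[float, float]:
    """Return (productive_hours, approx_discovery_hours).

    efficiency = productive programming work / total effort, as defined in
    the study; all remaining effort is treated as discovery (an upper bound).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    productive = total_hours * efficiency
    return productive, total_hours - productive


if __name__ == "__main__":
    for stage, eff in EFFICIENCY_BY_STAGE.items():
        productive, discovery = split_effort(1000.0, eff)
        print(f"{stage:>18}: {productive:6.1f} h productive, {discovery:6.1f} h discovery")
```

Under these assumptions, a newcomer's first 1,000 hours include roughly 750 hours of discovery, compared with about 225 hours by the third release: smaller, but not zero.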
6. Patterns
The new book Object-Oriented Reengineering Patterns (by Serge Demeyer,
Stephane Ducasse, and Oscar Nierstrasz) contains a lot of good advice for
reengineering existing systems, including information about how to lower
the initial "discovery cost" hurdles when doing reengineering.
- First Contact: You learn the most important facts about the structure
  of a legacy system by quick code reading and by interviewing the
  developers and maintainers. These people may have a good understanding
  of the behavior that needs to be preserved over time.
Linda Rising has a set of patterns on mentoring, meetings, contact lists,
and external collaborations.
- Mentoring: When choosing a mentor to assign to a new staff member,
  consider choosing a team member who is less experienced, rather than the
  most experienced team member. The most experienced team member is often
  too busy answering the phone or attending meetings to be an effective
  mentor, and a less-experienced person can often give answers that are
  more understandable to a novice.
- Contact lists: A simple list of "who has the expertise about each
  topic" can be the most important piece of paper for a new project team
  member. It may be a good idea to create and maintain such a list
  throughout a project.
- External collaborations: Linda calls this the "Outsiders Are
  Our Friends" pattern: try to break down the walls that often exist
  between different organizations. Instead of waiting for another group to
  deliver the product or document that they have promised, try to get
  involved early.
7. More information
For more information, see the workshop position papers and other information
from the workshop at:
http://csc.noctrl.edu/f/opdyke/OOPSLA2002/discovery_index.html.