Beyond Green-Field Software Development:

Techniques for Reengineering and Evolution

There are many languages, tools, and design methodologies in the software community that are aimed at the creation of new software.  But a lot of valuable software is the product of evolution, reuse, and reengineering.  Some software is too expensive to “throw away and start over”.  A skilled software team will have an arsenal of techniques at its disposal for adapting, evolving, and refactoring existing code and designs.

The workshop participants discussed a wide range of technologies and experiences.  The most important practices and techniques were distilled from the participants’ discussion and divided into four major categories:

1. ISSUES FOR TRADITIONAL COMMERCIAL SOFTWARE DEVELOPMENT

 The following software evolution issues assume that the environment is “traditional” commercial software development.  There are different economic tradeoffs and strategies for “open source” software development – see section 2 for more details.
 

1.1. People

Trust
Trust is an important issue in any legacy-based development process.  Team members are constantly judging how far the legacy software can be trusted, and those judgments shape the design process.

Reengineer using a small team, where everyone knows the roles of others
Any collaborative work, especially reengineering and evolution work, is easier with a team that works smoothly together, where each member knows whom to ask about both the domain and the technology.

Use a small, smart team for exploration and a ready and willing team for development, with supportive management
This has been a common formula for successful reengineering work.

The team should have some expertise in the legacy system
Although it isn’t absolutely necessary for everyone on the team to have expertise in the legacy system, it is extremely useful to have even one part-time team member who was involved in the design decisions for the existing software.  This person can resolve a lot of questions and help preserve the best parts of the legacy design.

Focus the smart people on critical areas
The assignment of staff to different areas of a large reengineering project should not be arbitrary.  The best results are achieved when the sharpest team members take charge of the critical parts of the system.

Understand and leverage people’s area(s) of expertise
Some ongoing investment is often needed to grow that expertise.

Management should know what motivates the team and keep working to supply that motivation
There are many pitfalls here.  If a company or organization gives the greatest financial rewards and the highest prestige to the developers working on new green-field projects, the “maintenance developers” will have little motivation to do a good job.
 

1.2. Processes

Know when to give up
One story from the workshop:  A system that needed to be enhanced had a very complex three-tier architecture, with a lot of complex communications functionality in the middle tier.  Unfortunately, all of the “business value” was actually in the user interface and the database layer.  In effect, this was a two-tier system masquerading as a three-tier system.  The team in charge of reengineering it invested a great deal of effort in understanding the system and in looking for a clean way to reengineer it, but no clean way existed.  Finally, they decided to give up, which was the right choice.  The lesson:  watch out for impossible reengineering tasks like this one, and know when to declare them impossible.

Make the business case obvious
Whatever you decide to do in reengineering and evolution, you are much more likely to get management support if you help managers understand the business value of the work.

Ask if there is an alternative
Before committing to a reengineering effort, ask whether there are alternatives that should be considered.  Asking this question should be a routine part of your business process.

Measure business value of the legacy system
This prevents unnecessary work on a legacy system that no longer has adequate business value.  If the legacy system has only limited business value, that may also constrain your choices about how to do the reengineering.

Make sure the team has the expertise needed
Another story from the workshop:  A large project (Boeing 777) decided to use the Ada language, but one computer in its architecture had no Ada compiler.  The project set up a small team (with inadequate compiler experience) to develop an Ada compiler as a “side project”.  The work languished with no real progress, and the project finally had to bring in a team of outside experts to build the compiler in time to meet the schedule.

Use retrospectives
Retrospectives are sometimes called “post-mortems”.  They are quite popular in the agile development and XP communities.  The best source of information is the book Project Retrospectives by Norm Kerth.  Retrospectives often bring up serious issues concerning the usability and flexibility of the system.

Incremental application of technologies
Don’t try to apply too many technologies at once in a reengineering effort.

Correctly estimate a team’s abilities
Don’t assign a novice team to do work that is beyond them.

Create and maintain accurate models
This might use techniques such as “round trip engineering” if the tools support it.

Use source code control for different lifecycle needs
Good source code control is especially important if development and testing are overlapping.  You might want to continue making changes to the software while an earlier cycle is being tested.

Methodology support for maintenance and enhancement

Understand and document the conventions and patterns in the development process

Perform legacy data migration planning
In addition to transforming code, some significant changes may need to occur in existing data files and databases.  It is good to start planning this early, especially if the legacy system has a lot of “crufty” data that needs to be restructured in the reengineering process.

Code reading
This practice is underused by many software professionals and underrated by most managers.  Code reading skills are critical to effective software evolution, because they are what allow developers to capture and preserve the basic design intent of existing software as it evolves.
 

1.3. Tools and Technologies

Rule-based code transformation tool
Several workshop participants remembered a Practitioner Report at OOPSLA 2002, in which Will Loew-Blosser described a transformation tool created at Cargill to perform a massive refactoring of the database interface software in a large Smalltalk-based application.  Doing this refactoring in small steps would have been impossible to manage: it would have taken so long that new software development would have been frozen for many weeks.  The transformation tool let the developer performing the refactoring define rules that pattern-match against the “parse tree” of the Smalltalk code, with replacement code specified for each pattern and substituted wherever that pattern matched.

They created a set of transformations using the tool, and then did careful testing on snapshots of the Smalltalk application’s image to make sure that the refactorings were correct.  After they were sure things would work, they were able to apply the complete large refactoring in a single weekend afternoon.
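The report does not describe the Cargill tool’s rule language, so the following is only an illustration of the general idea, written in Python rather than Smalltalk: a rule matches a pattern in the parse tree and substitutes replacement code wherever it matches.  The sketch uses the standard `ast` module (`ast.unparse` needs Python 3.9 or later); the names `db.fetch` and `session.query` are hypothetical stand-ins for an old and a new database interface.

```python
import ast


class RenameDbCall(ast.NodeTransformer):
    """One rewrite rule: match calls of the form OLD_RECEIVER.OLD_METHOD(...)
    and substitute NEW_RECEIVER.NEW_METHOD(...) with the same arguments.
    All four names passed in are hypothetical examples."""

    def __init__(self, old_receiver, old_method, new_receiver, new_method):
        self.old = (old_receiver, old_method)
        self.new = (new_receiver, new_method)

    def visit_Call(self, node):
        # Rewrite nested calls (e.g. calls inside the arguments) first.
        self.generic_visit(node)
        func = node.func
        # Pattern: attribute access on a plain name, such as db.fetch
        if (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and (func.value.id, func.attr) == self.old):
            new_func = ast.Attribute(
                value=ast.Name(id=self.new[0], ctx=ast.Load()),
                attr=self.new[1],
                ctx=ast.Load(),
            )
            replacement = ast.Call(func=new_func, args=node.args,
                                   keywords=node.keywords)
            return ast.copy_location(replacement, node)
        return node


def apply_rules(source, rules):
    """Parse the source, apply each rule in turn, and regenerate code."""
    tree = ast.parse(source)
    for rule in rules:
        tree = rule.visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


legacy = """
rows = db.fetch('customers', limit=10)
total = len(db.fetch('orders'))
"""

print(apply_rules(legacy, [RenameDbCall("db", "fetch", "session", "query")]))
```

Because each rule is written once and applied to every match, a change that touches thousands of call sites becomes a single reviewable rule.  The transformed code still has to be tested, as the Cargill team did with image snapshots.  Note also that `ast.unparse` discards comments and original formatting, which a production transformation tool would need to preserve.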

Modeling
Developers can use models as part of the thinking process about how to reengineer their code.  It is a good way to keep organized – using both a use-case model to give an overall description of the system behavior that should stay unchanged, and a class model to lay out the details of the transformation.

Code analysis tool with a trained user
Several workshop participants reported significant success in using code analysis tools as part of the reengineering process.  They can save a lot of work.  But one cautionary note:  make sure that there is at least one trained user of the tool on the team.

Languages/architectures for extreme change
Richard Gabriel told the story of the original Yahoo Store software, which was an application written in Common Lisp.  It provided two capabilities.  First, it generated and displayed a web-based questionnaire that a prospective online store owner filled out to define the store (products, prices, pictures, charging options, and so on).  Second, it was a multi-threaded web server that actually served the requests from customers.

The architecture was clearly designed to support change.  Every once in a while, a store would crash because of a bug in the software, but the crash was confined to that store’s thread; all of the other stores kept running.  And the developer could patch and reinstall the code without taking the system down.
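The report gives no implementation details, and the original system was written in Common Lisp.  As a loose analogy only, the Python sketch below shows the two properties described: each store’s request runs in its own thread, so a bug hit by one store is contained there, and handlers are looked up in a table at call time, so a fixed version can be installed without restarting.  The handler table, `serve`, `run_requests`, and the store names are all hypothetical.

```python
import threading

# Hypothetical handler table. Requests look up their handler by name at call
# time, so replacing an entry "patches" the running server without a restart.
HANDLERS = {}
_results_lock = threading.Lock()


def handler(name):
    """Decorator that registers (or re-registers) a handler under `name`."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


@handler("checkout")
def checkout(store, request):
    # Deliberate bug for the demo: raises IndexError on an empty cart.
    return f"{store}: first item is {request['cart'][0]}"


def serve(store, name, request, results):
    """Serve one store's request in its own thread. A failure is recorded
    for that store only; it does not stop the other threads."""
    try:
        outcome = HANDLERS[name](store, request)
    except Exception as exc:
        outcome = f"{store}: error ({type(exc).__name__})"
    with _results_lock:
        results[store] = outcome


def run_requests(requests):
    """Start one thread per request, wait for all, return per-store results."""
    results = {}
    threads = [
        threading.Thread(target=serve, args=(store, name, req, results))
        for store, name, req in requests
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


sample_requests = [
    ("store-a", "checkout", {"cart": ["lamp"]}),
    ("store-b", "checkout", {"cart": []}),  # hits the bug
]
print(run_requests(sample_requests))  # store-b reports an error; store-a is fine


# Hot patch: install a fixed handler while the program keeps running.
@handler("checkout")
def checkout_fixed(store, request):
    if not request["cart"]:
        return f"{store}: cart is empty"
    return f"{store}: first item is {request['cart'][0]}"


print(run_requests(sample_requests))  # both stores now succeed
```

Looking handlers up by name on every call, rather than binding them once at startup, is what lets the patch take effect immediately.  Lisp systems commonly get a similar effect by redefining functions in the running image.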

Design history
When a developer works on the evolution of a piece of legacy code, any information about the design intent is useful.  The developer will normally try to make changes that preserve the original design intent, because such changes are less likely to introduce new bugs through unforeseen interactions.  Design history might be obtained by talking with people, reading documents, and extracting information from the code and the change management system.

Decrufting data
Tools are useful for automating the process of transforming the legacy data at the point that the newly-reengineered system is ready to cut over.
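The report does not say which tools were used for this.  A minimal sketch, assuming hypothetical legacy records with a free-form name field and dates stored in several historical formats: each record is normalized where possible, and anything that cannot be cleaned is set aside for manual review rather than silently dropped or guessed at.

```python
from datetime import datetime

# Hypothetical date formats that accumulated over the legacy system's lifetime.
LEGACY_DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%y")


def parse_legacy_date(text):
    """Try each known legacy format; return an ISO date string, or None."""
    text = text.strip()
    for fmt in LEGACY_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def decruft(records):
    """Split legacy records into cleaned rows and rejects for manual review."""
    clean, rejects = [], []
    for rec in records:
        # Collapse runs of whitespace and normalize capitalization.
        name = " ".join(rec.get("name", "").split()).title()
        opened = parse_legacy_date(rec.get("opened", ""))
        if not name or opened is None:
            rejects.append(rec)  # keep the original record untouched
            continue
        clean.append({"name": name, "opened": opened})
    return clean, rejects


legacy_rows = [
    {"name": "  ACME   supply ", "opened": "03/15/1998"},
    {"name": "Bolt Works", "opened": "1999-07-01"},
    {"name": "Cog & Co", "opened": "12-Jan-01"},
    {"name": "", "opened": "02/30/2000"},  # no name and an impossible date
]
clean, rejects = decruft(legacy_rows)
print(clean)
print(rejects)
```

Running the same script repeatedly against snapshots of the legacy data, and reviewing the rejects each time, lets the team shrink the list of problem records well before the actual cutover.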

1.4. Stories about farm animals

Economic versus emotional choices
Don Roberts told a couple of stories based on his wife’s experiences as a veterinarian.  She is often asked to treat a farmer’s sick cow, and at times she has to explain to the farmer that the treatment will be very expensive.  Most farmers react by asking her to shoot the cow: a purely economic choice, because treating the cow would cost more than the expected value of the milk it will produce in the future.  On the other hand, she is often asked to treat seriously ill pets (dogs, cats) whose owners are very emotionally attached to them.

Both economic and emotional choices are perfectly valid when it comes to animals.  But emotional attachment to a piece of legacy software can be quite irrational.

How long do you keep milking the dying cow?
The question is: what are the economics of continuing to evolve a software system that is no longer economical to maintain and transform?  At some point, management needs to make a decision without being clouded by emotion.
 

2. ISSUES FOR OPEN SOURCE SOFTWARE DEVELOPMENT

Open source software development has some different economic and structural characteristics – which create a different environment for teams, different choices for tools, and different issues for software evolution.

For this reason, the sections below give different lists of issues and strategies for open source development.  These sections use the same basic titles as in section 1.
 

2.1. People for Open Source

Trust
This is just as important in open source as in conventional development.  Sometimes, unskilled people who do a poor job of contributing to an open source project get labeled by the project’s leaders as “losers”.

Open source team process
Developing open source software on a legacy base can generally support a larger team than conventional development.  Even so, there are still important structural rules for an open team that must be followed.

Expertise in legacy system in core architecture team
Expertise in the legacy system is still needed, just like in conventional development.

Focus of experts on critical areas
It is still necessary to have experts to call on for the most critical parts of the software.  This is actually an easier thing to manage in open source development – because the experts tend to be drawn to the critical areas where they have their specialized knowledge and skill.

Emergent understanding of people’s expertise
The expertise of team members in an open source project is not always known early in development, but the team’s knowledge of the skills of its members will increase over time.  If someone is a “loser” it will be apparent relatively quickly.

Motivation (a wide spectrum of motivational drivers)
There are just as many different motivators for open source development as for conventional development – including monetary motivations.
 

2.2. Processes for open source

Fork
Open source projects sometimes split into two projects, building two different (“forked”) versions of the software.  Forking is strongly discouraged in the open source community, but it happens.  Many forces push the teams and the software to “rejoin” at a later point.

Modeling
Open source projects don’t do a lot of conventional modeling.  Most of their “design model” is actually in the source code – plus the comments, README files, HOWTOs, and the O’Reilly books.

Conventions
Conventions (design conventions, coding rules, and so on) are just as important in open source development as in conventional development.

Source code control
Just about every open source project uses source code control as a central part of the development environment – usually CVS.

Code reading
Open source project participants do a lot of code reading – probably as much code reading as the rest of the development world should do.
 

2.3. Tools and techniques for Open Source

Design history
The most important repository of design history is the set of email logs maintained for the project.  One interesting fact about open source email logs is that many of the emails follow standard conventions and formats: bug reports, new feature requests, and descriptions of new code submissions.

Other sources of design history include code review results and the change history information in block comments at the beginning of each module.

Low-tech power tools
Open source project members generally don’t use fancy software development tools.  Most of the software is written either in C or in newer interpreted scripting languages, to maximize portability.  The developers rarely use an integrated development environment (IDE); most of them write code in text editors such as Emacs.
 

3. INITIAL SET OF TERMS TO DEFINE THE PROBLEM AREA

The following list is a set of terms to consider in the area of reengineering and evolution.  This list was a product of the initial brainstorming activity in the workshop:
  • Black-box (software modules that a team can use but whose internal structure it cannot see)
  • Domain knowledge (techniques for capturing and evolving)
  • Flex point (a “planned variation”) and Levers (one way to change things at a flex point – a flex point may have many different levers)
  • Reverse engineering
  • Model
  • Readability of code (code reading skills)
  • Component (something that can be plug-and-play)
  • Living requirements (requirements that are updated and adapted throughout the software’s lifecycle)
  • Savant (a person who is an intuitive problem solver – this is also the original meaning of the word “hacker”)
  • Separation of concerns
  • Tribal knowledge (and the tribe’s blind spots)
  • Working system (very important in agile development)
  • Literate programming (techniques to write down design decisions – also see Donald Knuth’s book of the same title)
  • Methodology and tool support
  • Managing customer expectations (helping them understand which changes are easy and which are hard)
  • Domain architecture
  • Continuous integration
  • Nuancing
  • Missing documentation (underlying knowledge)
  • Rewriting / transforming
  • Continuous redesign
  • Abundance (a characteristic of most open source projects)
  • Matching (business models to specs)
  • Legacy data
  • Discovery
  • Requirements languages
  • Teams (especially in the context of refactoring)
  • Resistance to change
  • Organizational requirements
  • “Implementors rule” (the concept that whatever the architecture documents and design models say, the architecture and design information in the code is the most important thing for future evolution)
  • Differences in languages
  • Anticipating changes in technology
  • Liability
  • Pattern mining
  • Conventions / style riffs
