Escaped from the Lab: Workshop final report

This workshop took place at OOPSLA 2006 in Portland on Monday Oct. 23, 2006. We had 13 participants from a number of companies and universities.

The main purpose of this workshop was to discuss issues relating to the transition of research prototype software into a production environment. The workshop started with an initial brainstorming on the issues that we all thought were most important to success, as well as the issues where we all had questions.

Issues discussed in the workshop:

Issue 1: System evolution

Software transferred from research to development is just like any "legacy code" situation -- there will be some need to modify and evolve the original system.

There were two main questions asked here:

In the Refactoring literature, there are three categories of techniques for "extending a code base" -- each item in the list is more involved than the one before:

Time-to-market is as important as cost.

Refactoring is viewed as overhead -- so it is not as popular as it should be among developers and project managers. In many projects, there is too much pressure to work on "new" functionality, rather than going back and cleaning up existing designs that need improvement.

This is somewhat shortsighted, but it is tough to run controlled experiments that "prove" the positive benefits of refactoring.

One problem we discussed relating to refactoring:

Issue 2: Quality (including testing)

We noted that Quality is not always "Job #1" in many software development organizations.

Another question we mentioned: Do we do quality from a "testing view", or is it as important to include techniques like design reviews, code inspections, automated inspection tools, and similar techniques?


Within the overall issue of software quality -- testing is the issue that is most often ignored in making the transition from research to development. We discussed a set of ideas that might help:

One person asked about the effectiveness of Microsoft's logging and problem reporting system (the "crash analysis" feature in recent versions of the Windows operating system). Participants said that it has been helpful in quickly finding and resolving device driver problems. This approach might be useful for diagnosing problems with research-derived products.

One problem with logging functions and automated bug reporting: Too much logging and online reporting can cross the line into "spyware" -- surveillance of what the users are doing on the system.
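One way to stay on the right side of that line is to report only what is needed to diagnose the failure. The following is a minimal sketch in Python, not a description of Microsoft's system: the function name `build_crash_report` and the report fields are hypothetical, and the sketch only builds the report without sending it anywhere.

```python
import os
import platform
import traceback


def build_crash_report(exc: BaseException, app_version: str) -> dict:
    """Collect only what is needed to diagnose the failure.

    Deliberately omitted: user names, directory paths, command-line
    arguments, and the exception message (which may embed user data).
    """
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "app_version": app_version,
        "os": platform.system(),
        "exception_type": type(exc).__name__,
        # Base file name, line number, and function name for each frame.
        "stack": [(os.path.basename(f.filename), f.lineno, f.name) for f in frames],
    }


def risky_operation():
    # Hypothetical failure whose message contains user-specific data.
    raise ValueError("could not parse /home/alice/secret.txt")


try:
    risky_operation()
except ValueError as err:
    report = build_crash_report(err, app_version="0.1-prototype")
```

The report names the failing function and exception type, but the message text with the user's file path is left out.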

Quality issues are really hard when making the transition from research prototype to real-world product. This is inevitable because of the different values for researchers and product developers:

It may sometimes be possible to add things that are needed in a production environment "around" the research prototype -- but more invasive changes are often needed.

Issue 3: Software Security

Most "escaped from the lab" software hasn't been developed with concern for security requirements, security architecture, or secure coding practices. There will be some work to do -- if we hope to get the research prototype into production condition:

What kinds of things need to be added?

A good source of information on Secure Coding (with some notes about security requirements and security architecture): the book Secure Coding by Mark Graff and Kenneth Van Wyk.

First time use of a research product: it is always interesting, because the developers are "speculating" on how the system will be used

Many problems may be found in the interactions with other systems

A set of security questions that should be asked for any research prototype that is going to be turned into a product:

Issue 4: Tactics for Rebuilding / Reengineering Existing Software

We called this "Rebuilding a house while you are living in it" -- tactics for doing effective reengineering.

This part of the workshop went over a set of common reengineering techniques -- found elsewhere in the literature...

Wrap and Convert: a good way to modify both the interface and internals of an existing module (which might be a class, a class cluster, or some kind of component)

Four major steps in this process:

See Wrapper Tutorial below for a more complete description of this process.

Another popular design evolution technique is to introduce the Strategy Pattern into the design:
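A minimal sketch of what introducing the Strategy Pattern can look like, in Python. The names (`Formatter`, `PlainFormatter`, `CsvFormatter`, `ReportGenerator`) are hypothetical and do not come from the workshop; the point is only that a choice hard-coded in prototype code becomes an interchangeable object.

```python
from abc import ABC, abstractmethod


class Formatter(ABC):
    """Strategy interface: one way of rendering a row of results."""

    @abstractmethod
    def format(self, row: list[str]) -> str:
        ...


class PlainFormatter(Formatter):
    def format(self, row: list[str]) -> str:
        return " ".join(row)


class CsvFormatter(Formatter):
    def format(self, row: list[str]) -> str:
        return ",".join(row)


class ReportGenerator:
    """Context: delegates formatting to whichever strategy it was given."""

    def __init__(self, formatter: Formatter) -> None:
        self._formatter = formatter

    def render(self, rows: list[list[str]]) -> str:
        return "\n".join(self._formatter.format(r) for r in rows)


rows = [["a", "b"], ["c", "d"]]
plain_report = ReportGenerator(PlainFormatter()).render(rows)
csv_report = ReportGenerator(CsvFormatter()).render(rows)
```

A new output format is then added as a new `Formatter` subclass, without touching `ReportGenerator`.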

One final design evolution technique is "merging similar code from several separate places into a single reusable component". (Some call this "mining existing legacy for components".)

This is a three step process:

See Mining existing legacy system below for a more complete description of this process.

Issue 5: Documentation

Good design documentation is rare.

The "user stories" of XP actually make good concise documentation.

There was some excellent discussion around several questions about design documentation practices in R&D.

Who writes the documentation?

In development groups, the session participants talked about three situations:

In research groups:

When the research code is really difficult, and the design documentation is non-existent, what happens?

What kinds of documentation are useful?

In increasing value to the team making a conversion of a research prototype to a product:

How much documentation is useful?

We need the documents to be organized as a "story" -- not just one huge class diagram.

Think of the elements of a story....

Make sure the documentation tells "a story" -- better than a disorganized set of APIs.

Research -- are researchers just as good as development team members at writing documentation? Maybe, but the overall story is bad. Some really conscientious researchers want their software to be used, so they invest more effort in good tutorial documentation.

Some folks try to document a large undocumented code base by using Doxygen or Javadoc -- putting in small comments in each class and method. In a way, Doxygen/Javadoc can be both too much and too little...

Other issues:

Scott Ambler's book Agile Modeling has good advice on UML usage -- use a whiteboard and digital camera instead of a CASE tool

Visualization is an important part of effective design documentation -- many users find it easier to understand and trust a design where they have some pictures

Issue 6: Collaboration

Most corporate developers are weak at collaboration

Sometimes we need a catalyst -- someone who can help get the necessary people talking with each other

Some tools and processes to use:

It is possible to consider using practices like "pair programming" and "pair design" to get people to work together and share their expertise.

One big problem in collaboration: code ownership. Developers are generally averse to sharing.

One thing that helps is to establish a "code review" or "code inspection" process. This can check code for readability and conformance to coding standards, find subtle bugs, and generally improve everyone's understanding of the design and interfaces.

Another important concept related to collaboration: a software system ought to have *two* architectures:

Many projects have trouble because their "testing architecture" is non-existent -- it is whatever the tester slaps together at the last minute. In the transition from research prototype to product, it is worth thinking early about what test scenarios and testing architecture will be needed.

One last collaboration topic: Build environment. Research prototypes are often just built on a researcher's PC or workstation -- but a real commercial product needs to have a repeatable process for compiling and shipping the product to the customer.

Some important source books

During the course of the workshop, we talked about several books that have useful process information...

Appendix A: Wrapper Tutorial

This is a summary description of the "wrap and convert" process described in Issue 4:

Suppose that there is an existing module in the system called "Mod1", with an interface and implementation that you would like to evolve (to a simpler, more efficient, or more featureful module). The new module will be called "Mod1A" -- and it will be introduced into the system in a four step process.

Step 1. Compile and test the existing unadorned software -- make sure it works

Step 2. Define a new "wrapper" (Mod1A) -- the wrapper is defined but not wired in yet

Step 3. Partial refactor -- piece by piece, change deprecated calls to Mod1 to use the functions in Mod1A

Step 4. Inversion: write a reverse wrapper (Mod1B has the same interface as Mod1, but it is a wrapper for Mod1A)

At the same time (or shortly thereafter) you can do step 4a

Step 4a. Rewrite Mod1A as a new standalone class. (with whatever needs to change -- more efficient or new data structure)

Note: Many folks get stuck at step 3 because they don't know about inversion -- and they aren't clear about how to get rid of the last calls to the old legacy interface...
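The steps above can be sketched in Python as follows. This is only an illustration under assumptions: the method names (`put_item`, `get_item`, `store`, `lookup`) are hypothetical, and `Mod1AStandalone` is a made-up name used so that the wrapper stage and the rewritten stage of Mod1A can be shown side by side.

```python
class Mod1:
    """Legacy module (Step 1: compile and test it as-is)."""

    def __init__(self):
        self._items = []

    def put_item(self, key, value):
        self._items.append((key, value))

    def get_item(self, key):
        # Linear scan, newest entry first; None when the key is missing.
        for k, v in reversed(self._items):
            if k == key:
                return v
        return None


class Mod1A:
    """Step 2: the new interface, at first only a wrapper around Mod1."""

    def __init__(self):
        self._old = Mod1()

    def store(self, key, value):
        self._old.put_item(key, value)

    def lookup(self, key, default=None):
        found = self._old.get_item(key)
        return default if found is None else found


class Mod1AStandalone:
    """Step 4a: same interface as Mod1A, rewritten on a new data structure."""

    def __init__(self):
        self._items = {}

    def store(self, key, value):
        self._items[key] = value

    def lookup(self, key, default=None):
        found = self._items.get(key)
        return default if found is None else found


class Mod1B:
    """Step 4 (inversion): the old Mod1 interface as a wrapper for the new code."""

    def __init__(self, new_impl):
        self._new = new_impl

    def put_item(self, key, value):
        self._new.store(key, value)

    def get_item(self, key):
        return self._new.lookup(key)


# Step 3 callers use the new interface directly.
converted = Mod1A()
converted.store("k", "v")

# Callers that could not be converted keep the old interface through Mod1B.
legacy_view = Mod1B(Mod1AStandalone())
legacy_view.put_item("x", 1)
```

Once Mod1B is in place, the original Mod1 can be removed even though a few callers still use the old interface.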

Appendix B: Mining existing legacy system for components

This is a more involved process than normal refactoring...

There are two important steps:

Step 1: in 2 hours or less, do a CRC card modeling exercise

The result is an initial model of the design of the new common component, but we aren't done yet...

Step 2: using the CRC card model as a basis, write a simple one-page requirements/architecture document.

One useful format for this kind of document is the old AT&T "problem statement" document -- this is a document that describes the problem to be solved in terms of four "dimensions". Each of these sections is a brief description of what characteristics are needed from a design solution:

At this point, you can take the problem statement to management, and the problem statement plus the CRC design to other development team members -- as the basis of a refactoring plan to develop and introduce the new common component.

In a way -- this two-step process is a form of "pair design": the participants in Step 1 can be one researcher and one development team member, and Step 2 can be written by one person with feedback from the other.

Last modified: October 24, 2006