I have watched a large rewrite fail and cost an engineering manager his job. The next manager, perhaps learning from his fallen comrade, did something that worked spectacularly well: a gradual, component-focused rewrite. With each release, they would carve out a part of the system and rewrite only that chunk. For anyone contemplating the big rewrite, I would suggest this as an alternative.
I've noticed this as well. You simply can't start from scratch.
The first thing you do is make sure you have good tests built around your old code base. Then you slowly start refactoring/rewriting pieces out. After every small refactoring round, run your tests and make sure everything is working. The key is to break the rewrite down into small steps and make sure you have a fully functioning product at each step. This might even mean writing code that will be removed after a couple of refactoring iterations.
Wish I could vote you up a few more times on this. Michael Feathers called them
characterization tests. Sure, run them as unit tests, but get them under CI
right away too. Test everything that moves, all the time.
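The idea behind a characterization test is just to pin down whatever the legacy code currently does, correct or not, so refactoring can't silently change it. A minimal sketch in Python — the function and its numbers are hypothetical stand-ins for real legacy code:

```python
# Characterization test: record the legacy code's CURRENT behavior,
# without judging whether that behavior is "correct".

def legacy_price(quantity):
    # Imagine this is tangled legacy logic we dare not touch yet.
    total = quantity * 9.99
    if quantity > 10:
        total = total * 0.9  # undocumented bulk discount
    return round(total, 2)

def test_characterize_legacy_price():
    # These expected values were captured by RUNNING the code,
    # not derived from a spec -- that is the whole point.
    assert legacy_price(1) == 9.99
    assert legacy_price(10) == 99.9
    assert legacy_price(11) == 98.9  # the surprise discount, pinned down

test_characterize_legacy_price()
```

Once a net of tests like this is green under CI, you can start swapping out the internals with some confidence.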
This type of work, and this approach, appeals to a limited set of
people, though. It's painstaking, detailed work. The other problem is
that businesses don't understand its value and don't want to
pay for it, in my experience. I've seen two companies go down from not paying attention in
this area, and two more that are currently dying.
Of course, if the thing had been under test to start with, things would be
so much simpler ;)
This is what Michael Feathers calls 'seams' in his book, Working Effectively with Legacy Code. Often you have to do exploratory testing: you don't really know the requirements, but you write tests that the current code passes. Then you can refactor it, knowing the current behavior won't be changed.
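One common kind of seam, sketched in Python with hypothetical names: a hard-wired dependency is turned into a parameter with the old behavior as its default, so production code is unchanged but tests can substitute their own version:

```python
# A "seam" is a place where tests can alter behavior without
# editing the code under test. Here the hard-wired clock becomes
# an injectable dependency (all names are illustrative).

from datetime import datetime

# Before: no seam -- the time source is hard-wired.
def is_business_hours_legacy():
    now = datetime.now()
    return 9 <= now.hour < 17

# After: the time source is a parameter whose default preserves
# the old behavior, so callers need not change.
def is_business_hours(now_fn=datetime.now):
    now = now_fn()
    return 9 <= now.hour < 17

# In a test we exploit the seam by injecting a fixed timestamp:
assert is_business_hours(lambda: datetime(2024, 1, 8, 10, 0)) is True
assert is_business_hours(lambda: datetime(2024, 1, 8, 20, 0)) is False
```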
Very good read, if you need to deal with legacy code and you don't know where to start.
This is exactly what we just did to ww.com, and while we're still on the fence about whether it was the right thing to do, the result is that we did not have big continuity issues and we now have a completely fresh codebase.
Where I see rewrites fail is when the company has one huge monolithic and interconnected application which really has to be rewritten all at once or not at all.
Dividing your application up into logical libraries and services goes a long way towards making it easy to rewrite. Essentially, you want to obey the single responsibility principle. This means that each component should not need to change if a change is made in another component. From the Wikipedia article:
> Martin defines a responsibility as a reason to change, and concludes that a class or module should have one, and only one, reason to change. As an example, consider a module that compiles and prints a report. Such a module can be changed for two reasons. First, the content of the report can change. Second, the format of the report can change. These two things change for very different causes; one substantive, and one cosmetic. The single responsibility principle says that these two aspects of the problem are really two separate responsibilities, and should therefore be in separate classes or modules. It would be a bad design to couple two things that change for different reasons at different times.[1]
For example, some of your core logic might be written as a library or a stand-alone server. This separates it from the GUI logic and the database logic. So if you need to rewrite the interface, you do that in the GUI code. If you want to refactor the business logic, you rewrite that component. If you want to change the database, you alter your data-access layer.
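The report example from the quoted Wikipedia passage can be sketched directly — content and formatting change for different reasons, so they live in separate classes (names here are illustrative):

```python
# Single responsibility principle, using the report example:
# WHAT the report says and HOW it is rendered are separate concerns.

class ReportData:
    """Owns the content -- changes only when the content changes."""
    def __init__(self, title, rows):
        self.title = title
        self.rows = rows

class PlainTextFormatter:
    """Owns the presentation -- changes only when the format changes."""
    def render(self, report):
        lines = [report.title, "-" * len(report.title)]
        lines.extend(str(row) for row in report.rows)
        return "\n".join(lines)

report = ReportData("Q3 Sales", ["North: 120", "South: 80"])
print(PlainTextFormatter().render(report))
```

A new output format (HTML, CSV) becomes a new formatter class; ReportData never has to change, which is exactly what makes the component safe to rewrite in isolation.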
> one huge monolithic and interconnected application which really has to be rewritten all at once or not at all
I disagree. Any program can always be refactored and compartmentalized. It may be a slow process, but it is possible to do a rewrite one chunk at a time.
Absolutely -- a total rewrite is only necessary if broad architectural decisions are all wrong. Otherwise, working to decouple the monolithic system is the crucial step which will allow the chunk-by-chunk rewrites.
The thing that popped into my head while reading the piece is that sparing one star developer might be pretty cheap; how about putting him on a Skunkworks rewrite for 6 months and see how it goes? If it's looking good, give him whatever resources are necessary to finish. Do you think that would work?
I think that's a good way to minimize risk, but also consider the morale issue. This would be the equivalent of giving one developer a corner window office while the rest stay in their cubes.
Perhaps you could apply Google's 20% rule? Every Friday the entire staff breaks up into teams and works on skunkworks rewrite projects. I think this would boost overall morale, and you might find your developers staying late on Friday nights :)
The advantage to the component based rewrite is that it doesn't cost you your head if it fails. You can still push out new features in each version, and you have a fallback plan if the component rewrite fails or is delayed (just use the old one).
The disasters were driven mainly by an attempt to create a manageable codebase while keeping the end user experience fairly similar.
The successes were things that built on existing systems but focused on delivering things that were actually radical improvements in functionality, with the rewrites being driven by this, not an end in themselves.
I feel like the mental stumbling block that stops people has something to do with our ideas of purity and cleanliness. Maybe people intuitively feel contact with the old system would make the new code unclean.
It is slow and painful and requires you to pay lots and lots of careful attention to the behavior of the existing enterprise system that you may very well hate.
Greenfield development sings a siren song. You get to scribble on a gloriously empty page in your imagination, free of such mundane concerns as cash flow, near-term customer demands, day-to-day stability, vitally important edge cases, and hard-won but crufty bug fixes.
Part of the reason could be technical analysis paralysis. I've worked on projects rewriting PowerBuilder components in C#. Getting the two to talk to each other is non-trivial, so determining where to slice off chunks to rewrite is an anxiety-inducing prospect.
We've been doing a variant of this for the last year. We rewrote the core and one large component first and have been gradually moving over the other components to the new architecture. It's definitely a safer way to go, though implementation time is longer.