Reverse Engineering or Refactoring

Inheriting a complex codebase is never fun. Having to modify one is, well, even less fun. About a year ago I was working with a customer who found themselves in exactly that situation: they had inherited a mathematical system and needed to change it. Our team took a look at it and decided to dedicate one person to figuring it out. I drew the short straw.

A few constraints to note before we begin:

  • The legacy system had to be converted to iOS code.
  • The system was recursive: the values generated in one round were used as inputs to the next round, along with other user-provided values.
  • The code had to produce a number accurate to the hundred-thousandths place (five decimal places).
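
As a rough illustration of the second constraint, here is a minimal Python sketch of that round-by-round structure (Python is used for brevity in these examples; the actual port targeted iOS). The formula in next_round and the input values are hypothetical stand-ins, since the real math is not shown here.

```python
from typing import Iterable


def next_round(previous: float, user_value: float) -> float:
    # Hypothetical per-round formula; the real one is not shown in this post.
    return previous * 1.05 + user_value / 3.0


def run_rounds(initial: float, user_values: Iterable[float]) -> float:
    # Each round's output becomes an input to the next round,
    # together with that round's user-provided value.
    value = initial
    for user_value in user_values:
        value = next_round(value, user_value)
    return value


# Only the hundred-thousandths place (5 decimals) had to match.
print(f"{run_rounds(100.0, [1.0, 2.0, 3.0]):.5f}")
```

Because every round consumes the previous result, any error introduced in one round is carried into all the rounds after it, which makes the five-decimal requirement harder to meet than it looks.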

For the purposes of this post, I won't go into the nitty-gritty details; instead, I'll give a high-level view of the approach. First, here are some things I wish I had kept in mind, and will be thinking about in the future:

  • If the code is mathematical, take note of where it uses floats, doubles, ints, and longs. The precision may matter; it did for me.
  • Gather as much data as you can about which inputs produce which outputs. If possible, write acceptance tests around those values.
  • Assumptions will be made. Note them, and share them with the customer often.
  • In the end, when the code doesn't work perfectly, don't freak out. Revisit the assumptions, even the ones provided by the customer.
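
To see why precision matters in a recursive calculation, the sketch below runs the same hypothetical formula for 1,000 rounds twice: once in double precision, and once rounding to single precision (float32) after every operation, emulated with Python's struct module. In Swift this is the difference between Float and Double. The formula, the rate of 1.0001, the 0.1 increment, and the 1e-5 tolerance are assumptions for illustration, not the customer's system.

```python
import struct


def to_float32(x: float) -> float:
    # Round a Python float (an IEEE 754 double) to the nearest single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]


def run_rounds(initial: float, rate: float, increment: float, rounds: int,
               single_precision: bool) -> float:
    # Hypothetical recursive formula: value = value * rate + increment.
    if single_precision:
        initial, rate, increment = to_float32(initial), to_float32(rate), to_float32(increment)
    value = initial
    for _ in range(rounds):
        if single_precision:
            # Round after each operation, as float32 arithmetic would.
            value = to_float32(to_float32(value * rate) + increment)
        else:
            value = value * rate + increment
    return value


TOLERANCE = 1e-5  # assumption: "accurate to the hundred-thousandths place"

as_double = run_rounds(100.0, 1.0001, 0.1, 1000, single_precision=False)
as_float = run_rounds(100.0, 1.0001, 0.1, 1000, single_precision=True)
drift = abs(as_double - as_float)
print(f"double={as_double:.5f} float={as_float:.5f} drift={drift:.6f}")
print("within tolerance" if drift <= TOLERANCE else "outside tolerance")
```

Because each round feeds the next, the single-precision rounding errors accumulate round after round, and here the float32 run ends up outside the 1e-5 tolerance. A check like `drift <= TOLERANCE`, run against outputs gathered from the legacy system, is the kind of acceptance test the second point suggests.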

OK, let's get started.

Step One: Get Your Bearings

I think the best tool when rewriting code is a complete understanding of what the code does in its current state. For this piece of code, I was unfamiliar with both the language and the purpose of the code. So I began by listing every variable I could find in a notebook, even the unused ones. Then I went down the list one by one and wrote, in my own words, what I observed each variable being used for. Obviously this doesn't need to be done on paper, but I liked not having to break my concentration by switching between programs on screen.
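
The notebook step is manual by design, but if a parser exists for the legacy language, a script can produce a first draft of the variable list, including names that are assigned but never read. As a hypothetical sketch, here is the idea applied to a small Python stand-in using the standard ast module; the legacy snippet and its variable names are invented for illustration.

```python
import ast
import builtins

# Hypothetical stand-in for the legacy code; the real system was not Python.
LEGACY_SOURCE = """
rate = 0.05
temp = 0
scratch = 99
total = principal * (1 + rate)
temp = total - principal
result = round(temp, 5)
"""


def inventory(source: str) -> dict[str, dict[str, int]]:
    # Count how often each variable name is assigned (Store) and read (Load).
    counts: dict[str, dict[str, int]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Name) or hasattr(builtins, node.id):
            continue  # skip non-names and built-ins such as round()
        entry = counts.setdefault(node.id, {"writes": 0, "reads": 0})
        if isinstance(node.ctx, ast.Store):
            entry["writes"] += 1
        elif isinstance(node.ctx, ast.Load):
            entry["reads"] += 1
    return counts


for name, usage in sorted(inventory(LEGACY_SOURCE).items()):
    notes = []
    if usage["reads"] == 0:
        notes.append("never read")
    if usage["writes"] == 0:
        notes.append("never assigned (an input?)")
    print(f"{name:10} writes={usage['writes']} reads={usage['reads']} {', '.join(notes)}")
```

A script like this only finds where names appear; the "what is it used for" column still has to come from reading the code.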

With the list of variables in hand, I looked for common themes among them. Like many legacy codebases, this one had no methods, just one long stream of code (at least there were no gotos). Once you see which variables belong together, methods start to present themselves, and I noted these as well. By the end of these steps you should have a pretty good idea of what the code does, and a picture of what it will eventually look like should begin to form.

Step Two: Organize The Code

Up until now I had not changed the code in any way. This was deliberate: I didn't want to change something on a whim just because I thought it was unused or simply bad. Likewise, at this stage I didn't delete or rewrite any code; I only reorganized it. Starting with the variables, I moved each one closer to the code that actually used it. After that, I grouped related variables and code so they sat together logically within the file.

Step Three: Commence The Refactor

It may seem tedious to wait this long before refactoring. If you have quality tests around the code in question, then you can confidently skip the previous two steps. However, most legacy code does not have quality tests, or any tests at all. On top of this, if you have been asked to reverse engineer something, you can almost guarantee that no one knows all of the edge cases of how the system behaves. That is why I spent so much time learning the system: once I started changing it, there would be no good way to know whether I had missed a minute detail.
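
When quality tests don't exist, one common way to get a safety net before refactoring is a characterization (or "golden master") test: record what the legacy code outputs today for a set of inputs, then check every later version against those recordings. A minimal Python sketch follows; legacy(), the input cases, and the 1e-5 tolerance are hypothetical. In practice the recorded outputs would come from running the original system, possibly in another language.

```python
import json
import math
from typing import Callable

Formula = Callable[[float, float], float]


def legacy(a: float, b: float) -> float:
    # Hypothetical stand-in for the untouched legacy calculation.
    x = a * 1.05
    x = x + b / 3.0
    return x


def record_golden(formula: Formula, cases: list[tuple[float, float]]) -> str:
    # Capture what the code does today, right or wrong, before changing anything.
    return json.dumps([{"inputs": list(case), "output": formula(*case)} for case in cases])


def mismatches(candidate: Formula, golden_json: str, tolerance: float = 1e-5) -> list:
    # Return every recorded case where the candidate differs by more than the tolerance.
    failures = []
    for record in json.loads(golden_json):
        actual = candidate(*record["inputs"])
        if not math.isclose(actual, record["output"], rel_tol=0.0, abs_tol=tolerance):
            failures.append((record["inputs"], record["output"], actual))
    return failures


CASES = [(0.0, 0.0), (100.0, 1.0), (-50.0, 2.5), (1e6, -3.0)]
golden = record_golden(legacy, CASES)

print(mismatches(lambda a, b: a * 1.05 + b / 3.0, golden))   # faithful rewrite
print(mismatches(lambda a, b: a * 1.05 + b // 3.0, golden))  # floor-division bug
```

The faithful rewrite produces no mismatches, while the floor-division bug is flagged on the inputs where it diverges. The recordings only protect the edge cases that the input set actually covers, so inputs gathered from real usage are worth more than invented ones.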

Warnings aside, it was time to start refactoring. I started by deleting the really obvious things: unused variables, if statements that were always true, code that literally could never run, the kinds of things that make you want to cry. Then I renamed variables so their names made sense, and finally broke the code out into methods.
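
To make those three moves concrete, here is a hypothetical before-and-after in Python. The variable names, the formula, and the idea that it computes a balance are invented for illustration; the point is that deleting dead code, renaming, and extracting a method leave the arithmetic, and therefore the output, unchanged.

```python
import random


def legacy_block(p, r, n, q):
    # Before: cryptic names, an unused variable, and a branch that is always taken.
    t = 0
    z = 42  # assigned but never read
    t = p
    i = 0
    while i < n:
        if 1 == 1:  # always true
            t = t * (1 + r) + q
        i = i + 1
    return t


def apply_round(balance: float, rate: float, deposit: float) -> float:
    # One round: grow the previous result, then add the user-provided value.
    return balance * (1 + rate) + deposit


def project_balance(start: float, rate: float, rounds: int, deposit: float) -> float:
    # After: dead code removed, descriptive names, loop body extracted into a method.
    balance = start
    for _ in range(rounds):
        balance = apply_round(balance, rate, deposit)
    return balance


# Same operations in the same order, so the results should be identical, not just close.
rng = random.Random(0)
for _ in range(1000):
    args = (rng.uniform(-1e3, 1e3), rng.uniform(0, 0.1), rng.randint(0, 50), rng.uniform(-10, 10))
    assert legacy_block(*args) == project_balance(*args)
print("refactor preserved behavior on 1000 random inputs")
```

Keeping the operation order identical is a deliberate choice here: with floating point, even an algebraically equivalent rearrangement can change the last digits, which matters when five decimal places have to match.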

At this point the code is legible and, most importantly, you should have a deep understanding of it.

Step Four: Transfer To New Codebase

Ready for the easy part? Actually write the code in the new codebase. After struggling so hard to understand and refactor the old codebase, this felt like a relief. I understood the scope and I understood the gotchas, which is rarely the case for me.

Step Five: Expand And Have Fun

What was once a daunting task is now a piece of code that you can hopefully be proud of; I know I was. And when the customer wants to expand the feature set, you now have the knowledge to do it easily.

It is important to note that while these steps seem straightforward, there was a lot of frustration involved. The first stage took nearly a week. Yes, a full week of writing things down in a notebook. Each subsequent step took less time, because I understood the code better.

Originally, I called this process reverse engineering by refactoring. In the end, I realized it is just patient refactoring, which removes the need to reverse engineer at all.