Of (unit-)tests and code
"To me, legacy code is simply code without tests."
-- Michael C. Feathers
I want you to take a few moments and think about this quote. Read it five times, look out the window, and think about it.
Done? Alright, let's continue! If the exercise above made you a little uneasy, you are not alone. I'll be the first to admit that I have written some legacy code in the past. Most probably you have too. Many (let's call that number N) people are writing legacy code as you read these lines. The main goal of this article is to make N smaller. Instead of attempting to write an exhaustive article, my strategy to convince you will rely on personal experiences. Along the way, I will also point to some references.
I was first exposed to the quote above in 2011, when I was just starting to get excited about C++. Enthusiastic about the programming language I had just decided to learn, I started looking for books that would teach me what good C++ code looks like. Although it is not C++-specific, Michael's book *Working Effectively with Legacy Code* was one of the first recommendations I found. I spent many evenings reading through it, and it seemed to make sense. The only problem was that I was not in the right situation to truly appreciate it -- instead of having to work with existing code, I was (luckily) in a situation where I could start something new. All I had were small scripts and fluid dynamics simulations, which were not too difficult to understand. On the other hand, there were many incentives to increase the number of features of my software, and I was somewhat aware of a corresponding increase in complexity. However, I had gotten used to manually testing my software (occasionally), and adding unit-tests seemed like extra work (and who has time for that, when there are so many other fun things to learn and try?). I sometimes hear similar attitudes expressed by software developers at earlier stages of their careers. From a certain point of view, it makes perfect sense -- when everything is new, you want to explore as much as possible, to get a better feeling for the possibilities available.
# The growing elephant in the room

Many software projects remain small. Perhaps you want to analyze some data and make some plots. Fire up a Jupyter notebook, iterate until you obtain your figures, and you are done. If you are like me (and can afford the time), maybe you go back to do some refactoring, in case you feel that you might need to use that code in the future. After the paper is published, you probably let that code sit on your hard drive until the drive fails1.
However, once in a while, a small project begins to grow. You start re-using that code over and over, so you might decide to create a library2. You show it to other people, and some of them like it. They tell you that they would like it more if you only added features X and Y. You go ahead and do that (while also adding features Z and W, because you can). Without you even noticing it, the project has grown. Not too much, but enough to no longer fit in your head. Only that you may not be aware of that yet.
The feature-requests keep coming, and you do your best to keep up. However, at some point, you might notice a few strange things:
- You no longer understand all of the code.
- Adding features becomes harder and harder.
- You might start getting a little anxious whenever you need to work on that code.
So what happened? This was a project you loved, and you did your best. However, despite your best intentions, it turned into an unwieldy mess.
I have been in that situation more than once. Initially, I thought the problem was mainly in the code. I told myself that, if only I had strived to make the code clearer, things would have been more manageable. Surely, design patterns would have helped, as well as a better thought-out architecture. However, how do you draw a clean architecture when the software is just growing organically?
# The architecture of the code is often evolving (and you should accept that)

In the case of software which grows organically, I would argue that you should expect architectural changes. If you want your software to survive, you need to be prepared for that fact. And (as you probably guessed by now), the best way to prepare is to write tests.
So, if I had the chance to go back in time and talk to my younger self, I would say (with the wisest tone I could muster) that the problem is not the code, but the tests. Automated tests, which you run (or have run automatically) before every commit. And, unfortunately, I had none. Michael was right all along.
# How do tests help (even when you are in a hurry)?

One of my mother's favourite quotes is "grăbește-te încet!". In English, it means "hurry up slowly!". Applied to software, this could mean that, as much as you might like "hacking" things until they work, you should first clarify to yourself what you are trying to achieve. If you are writing a new function, what would be the output(s) for some specific input(s)? If it's not a trivial task, write them down somewhere. Think of more pairs of inputs and outputs, to cover several cases. OK, done that? Great! Now take what you just wrote down and convert it into unit-tests. Run the tests, to make sure they fail. But don't worry -- now you may start writing your function. You put your coding hat on, and you get the job done. All tests pass, and you are a happy camper again.
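As a minimal sketch of this workflow (in Python, with a hypothetical `slugify` task invented for illustration), the input-output pairs become tests first, and the implementation comes last:

```python
# Hypothetical task: turn a title into a URL slug.
# Step 1: write the input/output pairs down as tests *before* implementing.

def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Unit tests!  ") == "unit-tests"
    assert slugify("") == ""

# Step 2: run the tests and watch them fail (slugify does not exist yet).
# Step 3: put the coding hat on and implement until they pass.

def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens,
    dropping punctuation and extra whitespace."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in title)
    return "-".join(word.lower() for word in cleaned.split())

test_slugify()  # now passes
```

The point is not the particular function, but the order of operations: the expected behaviour is written down before any implementation exists.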
So why is this worth the hassle?
# Unit-tests tell you when your task is done

If you were reasonably thorough when you identified the input-output pairs, the moment your tests pass is the moment you know that you have something working already. Most programming environments will give you a green dot/check-mark. In time, you will become addicted to seeing that green dot, and it will give you a small dopamine "hit" whenever it appears (better than scrolling on social media, in my opinion).
# Unit-tests allow you to refactor with confidence

OK, tests are passing, but maybe you don't like the look of your initial code. You could make it nicer, but how can you be sure that you will not break your precious solution? Well, actually you should not worry, because the tests are still there, acting as a safety net for you. If you break something while refactoring, you will get to know about it!
# Unit-tests allow you to change your architecture

Assuming that you already have a good number of unit-tests under your belt, perhaps you discover that your software would be easier to understand, faster, etc., if you just re-organized parts of the code differently. But can you be sure that the features you need will still work after the re-organization? If you did a good job with your unit-tests, you bet you can!
# How unit-tests allowed a computational physicist to write a compiler

Without any doubt, one of the most complex pieces of software I have worked on so far is the compiler for the GeLB domain-specific language (DSL). I will write more about that in a future post, when I announce the public release. For now, I just want to tell you how unit-testing made this effort possible.
Unfortunately, I discovered the amazing world of programming language design later than others in this field. My first steps along this path came when I was trying to test different data layouts for a simulation, to optimize the cache utilization. Instead of manually generating all the possible combinations, I learned just enough M4 macro-programming to become dangerous, and quickly got something working. Now I know that M4 was not the best tool for the job, but that doesn't matter. What it did bring me was an "aha!" moment, when I realized that a mortal like me could actually write code which wrote other code...which is probably a trivial fact for CS majors, but not so obvious for young people working in other fields.
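For readers who have never generated code with code: the idea can be sketched in a few lines. This is an illustrative Python script (not the original M4 macros, and the field names are invented) that emits C declarations for two classic data layouts, so that each variant can be compiled and benchmarked without writing it by hand:

```python
# Hypothetical simulation fields; the real project had many more,
# which is exactly why generating the variants by hand gets tedious.
fields = [("double", "rho"), ("double", "ux"), ("double", "uy")]

def array_of_structs(name, n):
    """Emit C code for the array-of-structs layout: cells[i].rho"""
    body = "\n".join(f"    {ctype} {field};" for ctype, field in fields)
    return f"typedef struct {{\n{body}\n}} {name};\n{name} cells[{n}];"

def struct_of_arrays(name, n):
    """Emit C code for the struct-of-arrays layout: cells.rho[i]"""
    body = "\n".join(f"    {ctype} {field}[{n}];" for ctype, field in fields)
    return f"typedef struct {{\n{body}\n}} {name};\n{name} cells;"

print(array_of_structs("Cell", 1024))
print(struct_of_arrays("Cells", 1024))
```

The two layouts access memory in very different patterns, which is what makes generating and benchmarking both worthwhile.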
But let's get back to our topic: these initial efforts eventually evolved into C++ libraries, using (probably too much) template meta-programming. Eventually, I hit a wall (about which I will write later), and realized3 that the best tool for the job would have been a new programming language. And, because I already had a lot of experience in that specific domain, I had a pretty good idea of what that language could look like.
Now...every programming language needs an interpreter or a compiler. Writing such a tool is hard enough when the language you implement is already well-specified. However, it becomes a fiendishly complicated task when you are defining the language in parallel.
I am confident that I would have had no chance of making any progress with that project had I not remembered about unit-tests. So I broke the big, hairy problem of implementing a compiler into very small chunks, focusing on questions like which elements are valid in expressions, which data-types to support, etc. And, for each of these small chunks, I wrote unit-tests. Although I had a well-defined high-level idea of what I was trying to build, the devil is always in the details, and many parts of the code had to be re-designed along the way. Unit-tests helped me avoid many unwanted side-effects of these re-designs, and allowed me to sleep much better overall.
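To give a flavour of what "one small chunk" means here, below is an illustrative Python sketch (it has nothing to do with the actual GeLB compiler): a tiny tokenizer for arithmetic expressions, tested in complete isolation before any parser or code generator exists.

```python
import re

# One small, well-delimited chunk of a language implementation:
# splitting source text into tokens. Nothing else is needed yet.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    """Split an arithmetic expression into number and operator tokens."""
    tokens = []
    for number, other in TOKEN.findall(src):
        if number:
            tokens.append(("NUM", int(number)))
        elif other.strip():  # skip whitespace captured by the fallback
            tokens.append(("OP", other))
    return tokens

def test_tokenize():
    assert tokenize("1 + 23") == [("NUM", 1), ("OP", "+"), ("NUM", 23)]
    assert tokenize("") == []

test_tokenize()
```

With this chunk pinned down by tests, the next chunk (say, parsing) can be built on top of it without worrying that a later re-design of the tokenizer will silently change its behaviour.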
And, on the few occasions when I did skip the unit-test phase, I eventually discovered some unexpected interactions, which led me to add unit-tests anyway at a later stage (after unnecessarily wasting time on avoidable debugging sessions).
So, I learned how to "hurry up slowly", and I think you should too :).
# Where to go next

This first blog post is already quite long, and it's time for me to get back to work. Obviously, there is much more to be said about this topic. However, instead of repeating information from others, I will end with some references:
- Working Effectively with Legacy Code (by Michael C. Feathers) -- a good general reference, which is especially valuable if you start contributing to existing projects.
- Unit Testing Principles, Practices, and Patterns (by Vladimir Khorikov) -- a more recent book, which focuses more specifically on unit-tests.
- Modern C++ Programming with Test-Driven Development (by Jeff Langr) -- if you want to focus on C++.
See you next time!