Migrating a Large Codebase to Python 3 in a Few Days
Python 2’s end-of-life announcement meant that we had to move a very large codebase to Python 3. Here’s how we successfully did it in a matter of days, and when our approach may work for you for similarly large projects.
We love Python. We use it for a wide range of tasks, from automation control, to high-throughput biological design work, to our scripting and API libraries, to data science. Python 2’s end-of-life announcement meant that we had to move large codebases to Python 3. Pain and suffering inbound, right? Wrong! We got one of our biggest codebases up and running on Python 3 within a few days. This post describes how we did it and when we like to solve problems this way.
How Did We Do It?
The Python Software Foundation’s official recommendations informed our initial plans — we would gradually convert the codebase to backwards-compatible Python 3 code while running Python 2, and when the entire codebase was backwards-compatible, we’d update the Python version to 3.6.
We encountered a couple of issues with this approach. First, it’s very slow. You have to go through the entire codebase, making changes that don’t need to be there in the long term, purely for the sake of backwards compatibility in case Python 3 goes horribly wrong. We knew, however, that going back to Python 2 was rapidly becoming a non-viable option. Why keep writing code that works with Python 2 when we are never going to use it again? Second, we found it surprisingly difficult to estimate the size and complexity of this task. We didn’t know when or whether we would run into real, difficult issues in keeping the code version-compatible. The unpredictable nature of this work made it hard to fit in alongside our main stream of work.
In short, we weren’t happy with the set of potential outcomes that this approach provided us: lots of unproductive work, an unclear estimate of the total work required, and less time for each of us to advance scientific discovery.
So How Do We Do This?
We decided to try a different method: upgrading within a work week, wherein the entire team would dedicate themselves to making everything work in Python 3 and release it in “big-bang,” Python 2-incompatible form. Our approach had four steps:
- A brave engineer goes ahead to set up the playing field.
- The entire team makes the entire codebase Python 3-compatible.
- Deploy the Python 3 codebase.
- Fix any unforeseen bugs.
Each of these steps had hard, self-imposed deadlines. Projects like these, ones where we can’t see the road ahead of us, have a tendency to explode in complexity. At any point, if we slipped behind, we wanted to make sure to stop and go back to a safer, iterative approach. For example, if step 1 had not set the team up adequately for step 2 after one week of prep work, this would be a red flag for the approach and a signal to return to incremental work.
Step 1: A brave engineer goes ahead to set up the playing field
While the rest of us continued with regular work, one of us took to the codebase for one week to see what kinds of problems would take the most time to solve. This meant analyzing the necessary code changes that 2to3 and other existing automated tools wouldn’t handle, estimating which parts of the work would require the most attention, and identifying the riskier parts that were liable to explode in complexity.
In addition to this risk analysis, they undertook the task of laying the foundation for our work. This included branching and setting up continuous integration so we could watch tests fail in Python 3; signaling to other teams that we would be solely focused on this upgrade; giving us advance notice on potential trouble spots; and getting some easy wins like using 2to3 to do automatic code migration. By doing all this, our engineer had done two things. First, they had given us the tools needed to come in to work on Monday and get cracking on the problem. Second, our engineer had done a safety assessment of sorts. Our brave engineer now knew where this effort was risky, where it was likely to fail, and had a sense of whether this work was fit for our entire team to spend the next week working on.
At this point, we had collected the relevant information and had a decision to make. If this problem had looked like it would sideline our team for weeks, we could have made the decision to stop and revert to the iterative, slow method that we initially attempted. This choice was crucial and was front-of-mind throughout this process. Keeping an eye on the scope was important because we couldn’t know how much it would balloon until we were actually making the changes and could see what new problems came up.
Step 2: The entire team makes the entire codebase Python 3-compatible
On Monday of week 2, we got our coffee and got to work. Since our first engineer had made us a git branch with working Jenkins test pipelines the week before, we knew exactly which tests were failing. We each signed up for a section of the failing tests. Most of the failures were easy fixes: updating our requirements to Python 3-compatible versions and adopting the new functions those libraries expose, plus iterator changes, dict changes, and the like. Each time we dropped the number of failing tests, we celebrated and got back to knocking more out. This process took us about a day and left us with about ⅓ of the tests still failing, for more complex reasons.
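To give a flavor of these easy fixes, here is a minimal sketch of a few well-known Python 2 to Python 3 changes in the iterator and dict categories. The `inventory` data and variable names are hypothetical and not taken from our codebase. The Python 2 form appears in each comment and the live code is the Python 3 form.

```python
# Hypothetical data used only for illustration.
inventory = {"plates": 3, "tips": 0}

# Python 2: inventory.has_key("plates")  (dict.has_key was removed in Python 3)
has_plates = "plates" in inventory

# Python 2: inventory.iteritems()  (iteritems/iterkeys/itervalues were removed)
in_stock = [name for name, count in inventory.items() if count > 0]

# Python 2: map() returned a list. In Python 3 it returns a one-shot iterator,
# so code that indexes or re-iterates the result needs an explicit list().
doubled = list(map(lambda count: count * 2, inventory.values()))

# Python 2: 7 / 2 == 3 for ints. In Python 3 "/" is true division,
# so integer arithmetic needs "//".
half = 7 // 2

print(has_plates, in_stock, doubled, half)
```

Each of these is mechanical once spotted, which is why a failing test that points straight at the offending line makes the work go quickly.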
Let’s take a second to talk about what enabled us here. We invest in having a well-tested codebase, which is an essential requirement for this approach. At time of writing, the codebase has just over 100,000 lines of Python code, with 99% of all classes and 94% of all lines being tested by around 3,900 unit and integration tests. With this comprehensive suite of automated tests, we had a built-in progress metric and a way to stay on track. All of our fixes were focused on getting tests to pass. This kept us from straying into any other kinds of work.
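One way to turn a test suite into this kind of progress metric is to summarize each CI run as a passing fraction plus a breakdown of failures by exception type, so people can sign up for a category. The sketch below assumes the test runner emits a JUnit-style XML report. That format is an assumption for illustration, and the sample report, test names, and `summarize` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical JUnit-style report; a real one would be read from the CI workspace.
SAMPLE_REPORT = """
<testsuite name="example" tests="4">
  <testcase classname="tests.test_models" name="test_load"/>
  <testcase classname="tests.test_models" name="test_copy">
    <error type="TypeError" message="a bytes-like object is required"/>
  </testcase>
  <testcase classname="tests.test_cache" name="test_hit">
    <failure type="AssertionError" message="1 != 2"/>
  </testcase>
  <testcase classname="tests.test_cache" name="test_miss">
    <error type="TypeError" message="unhashable type"/>
  </testcase>
</testsuite>
"""


def summarize(report_xml):
    """Return (passing, total, Counter of unsuccessful tests by exception type)."""
    root = ET.fromstring(report_xml)
    total = 0
    by_type = Counter()
    for case in root.iter("testcase"):
        total += 1
        # A test case is unsuccessful if it has a <failure> or <error> child.
        # Compare against None explicitly: childless elements are falsy.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            by_type[problem.get("type", "unknown")] += 1
    passing = total - sum(by_type.values())
    return passing, total, by_type


passing, total, by_type = summarize(SAMPLE_REPORT)
print(f"{passing}/{total} passing ({passing / total:.0%})")
for exc_type, count in by_type.most_common():
    print(f"{count:3d}  {exc_type}")
```

Grouping by exception type is a design choice: many Python 3 failures share a root cause, so the largest group is usually the best place to send the next free pair of hands.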
We focused solely on this task, at the expense of developing features and fixing unrelated bugs. This is a real downside of the “stop the world” approach. In this case, we felt the trade-off was worth it. Slowing down to handle other tasks can drag the upgrade out indefinitely. If you split time or dedicate only some of your developers to the cause, you can get bogged down by the introduction of bugs unrelated to the upgrade, constant merge conflicts between your branch and master, and people continuing to write incompatible code.
This approach also requires regular communication to make sure work is not being duplicated. We created a Slack channel to document any changes we were working on. This was absolutely critical to avoid getting lost in circles or doing duplicate work.
Feeling good that ⅔ of our tests were already passing after the first day, we moved into the common problems that were keeping us from addressing the harder cases. We had some issues with commonly-used Zymergen data models, some caching issues tripped up a number of tests, and dict copy behavior was failing us in some areas as well. We convened to discuss the best approaches to each of these, and split up the work again. Some of these solutions resulted in more tests failing — like we expected, solving some issues exposed new problems that needed to be solved. Every time this happened, we outlined the new issue in Slack and someone took on the work of figuring out the new issue and solving it.
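The details of our dict issues are specific to our code, but one well-known Python 3 difference in this area is that `dict.keys()` returns a live view rather than a copied list. The sketch below is an illustration of that general behavior under hypothetical data, not a reconstruction of the bug we hit.

```python
# Hypothetical cache contents used only for illustration.
cache = {"a": 1, "b": 2, "c": 3}

# Python 3: keys() is a live view of the dict, not a copied list.
keys_view = cache.keys()
keys_snapshot = list(cache)  # explicit copy, like Python 2's keys()

cache["d"] = 4
print(len(keys_view), len(keys_snapshot))  # the view sees "d", the snapshot does not

# Deleting entries while iterating over the live view raises RuntimeError.
try:
    for key in cache.keys():
        if cache[key] % 2 == 0:
            del cache[key]
    mutation_error = None
except RuntimeError as exc:
    mutation_error = str(exc)

# Iterating over a snapshot is safe.
for key in list(cache):
    if cache[key] % 2 == 0:
        del cache[key]

print(mutation_error, cache)
```

Code written for Python 2 often relied on the implicit copy, so this class of failure tends to show up only at runtime, in whichever tests exercise the mutation.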
Just like our brave engineer from step 1 had the responsibility of aborting an exploding project, we had to look at each failure and decide if it was solvable within the week. We didn’t have any way to know what issues would pop up as we killed bugs — if new issues cropped up and it looked like something we were unlikely to solve within the week, we had the obligation to stop the all-hands-on-deck approach and go back to the safe path. We can’t be away from our work for too long, and if this approach is going to take a long time anyway, it doesn’t provide much benefit over the iterative approach suggested by the PSF.
Step 3: Deploy the Python 3 codebase
Luckily, we did not run into issues that needed herculean work, and we had all our unit and integration tests passing in Python 3 by Thursday. This was our green light to merge our working branch to our master branch, perform additional manual checks, and deploy using our regular release tools. In less than a week, we moved a large codebase to Python 3.
Step 4: Fix any unforeseen bugs
Automated tests can cover a lot, but there will always be something unforeseen. We wanted to be sure Python 3 didn’t keep our scientists from doing groundbreaking work, and so we made ourselves exceptionally available over the next few weeks in case bugs came up in their use of our tools. This meant hotfixing production as needed, with bugs taking priority over our usual work. One deepcopy bug came up, but it didn’t take too long to figure out. We don’t think this specific issue would’ve been detected in testing regardless of our approach to the Python 3 upgrade.
Due to the speed with which we pushed this out, we expected more production issues than we normally budget for. We felt that we could afford this time. We didn’t feel any burden in being extra available for support and squashing the last of the bugs because we saved so much time in not trying to make and maintain code that straddles both Python versions in production environments. We still spent much less time overall on this project than we would have if we had done it any other way. This allowed us to get back to our usual duties ASAP.
Taking a Step Back
Normally, we plan out small, incremental changes that can be delivered as independently, speedily, and safely as possible. This allows us to make frequent commits that keep builds green and tests passing. It is quite out of the ordinary for us to plan our work around a timed checkpoint, and even more out of the ordinary for our Continuous Integration tests to be failing for more than an hour, let alone several days.
But this time was different, and it worked out in our favor. It is therefore worth asking what the distinguishing characteristics of this approach were, and why this problem made such an approach appropriate. After all, there are a lot of problems out there that look like language upgrades, and it would be a shame if our Python migration were an isolated success.
What made this approach different
We identified four hallmarks of this approach.
Being OK with Continuous Integration being broken.
Perhaps the most notable difference from an incremental approach is that everyone is working off of a broken branch for several days. For example, we did not invest time in some intermediate state that supported Python 2 and 3 simultaneously. As mentioned earlier, the official porting recommendations explain how to keep things running while porting one’s code, but we gave up any notion of backwards compatibility and accepted that our Continuous Integration pipeline would fail builds for a little while.
Favoring real time communication over upfront planning.
If a team is going to work on a broken version of the code, the next question is “how will things be broken?” We got a lot more mileage from keeping communications channels open and directing resources towards breakages as they occurred rather than trying to anticipate and plan for those breakages ahead of time.
Time-boxing and checkpoints.
In lieu of upfront planning, a team still needs a way to predict and control how much time gets dedicated to a task. For this approach, a team should time-box their work and hold a checkpoint at the end of each time-box to make decisions about the project. The questions a team should ask at these checkpoints are: “Should the team invest more engineering effort?”, “Should they invest less?”, and “Should the team bail on the endeavour altogether?”
Making resourcing decisions based on the uncertainty of the work remaining.
Checkpoints are best viewed from the perspective of uncertainty — being honest about how much you understand about the work remaining. If the uncertainty is high, don’t scale up. If the uncertainty is coming down and a path to success is starting to crystallize, do scale up. If the uncertainty is coming down and failure looks certain, stop.
We started with one brave engineer when the amount of effort required was nebulous. As that engineer learned more about the work required and a path forward became clear, we added additional team members. If we had realized at some checkpoint that it was too ambitious to accomplish the migration in a week, we were prepared to bail.
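The checkpoint rule above is simple enough to write down as a small decision function. This is only a sketch of the reasoning: the `Decision` enum, the `checkpoint_decision` name, and the two boolean inputs are hypothetical simplifications, and in practice both inputs are judgment calls rather than measurements.

```python
from enum import Enum


class Decision(Enum):
    HOLD = "keep the team small and keep exploring"
    SCALE_UP = "add engineers"
    STOP = "return to the incremental approach"


def checkpoint_decision(uncertainty_falling: bool, success_likely: bool) -> Decision:
    """Map the state of a checkpoint to a resourcing decision."""
    if not uncertainty_falling:
        # Uncertainty is still high: do not scale up yet.
        return Decision.HOLD
    # Uncertainty is coming down: either a path to success is visible, or failure is.
    return Decision.SCALE_UP if success_likely else Decision.STOP


print(checkpoint_decision(uncertainty_falling=True, success_likely=True).value)
```

Note that the outlook only matters once uncertainty is falling. While it is still high, an optimistic guess is not a reason to commit more people.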
When is it appropriate
This approach is well-suited for situations where the high degree of uncertainty and all-or-none nature of the problem make the cost of investing in incrementalism outweigh the cost of having your working branch be in shambles for a little while.
When a problem has a lot of uncertainty (i.e. it is unclear how much work will be involved), it makes planning for incrementalism a high-effort, low-reward activity. With the Python 3 migration, regardless of how we split the work — whether it be by module, by the kind of update, or by covered tests — there was just no way to predict all the ways our code would break from any given change. Therefore, any effort to plan our increments ultimately proved wasted effort and we quickly realized that our time was better spent communicating and fixing those breakages than trying to anticipate them.
Similarly, when a problem is all-or-none, an undue amount of investment is required to support intermediate states. In our case, supporting Python 2 and 3 simultaneously would have meant changing our Continuous Integration pipeline, writing version-agnostic (or even version-aware) code, and carefully managing multiple versions of dependencies, all of which would have needed to be undone once we fully dropped support for Python 2.
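For readers who have not written straddling code, here is a minimal sketch of what version-agnostic and version-aware code tends to look like. The `hostname` helper and `text_type` alias are hypothetical examples, not code from our codebase. Every line marked as Python 2 support is scaffolding that would have to be removed again later.

```python
import sys

# Version-agnostic import: the module moved between Python 2 and Python 3.
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse  # Python 2 location

# Version-aware branch: the text type has a different name in each version.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name only exists on Python 2)


def hostname(url):
    """Return the host part of a URL given as text or UTF-8 bytes."""
    if not isinstance(url, text_type):
        url = url.decode("utf-8")
    return urlparse(url).hostname


print(hostname("https://example.com/path"))
```

On Python 3 alone, all of this collapses to a single import and a two-line function, which is the cleanup cost the big-bang approach avoids.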
In short, for all-or-none problems, it might be true that there are structures one could put in place to allow for incrementalism — to keep everything working while slowly performing some task — but those structures may be expensive to create, maintain, and ultimately unwind.
Going without those structures might sound like a bad position to be in, but if a team can get their system back up and running in a reasonable period of time, they have accomplished the same goal as the incremental approach (making some high-impact change) without the additional investment needed to make incrementalism possible…and that’s a clear win.
What made it possible
In some sense, being able to forgo incrementalism was a luxury. Just because we were faced with an all-or-none problem with poor predictability didn’t necessarily mean that our approach would have worked. We had two very important things in place before we started that made this all possible: clear signals of success, and a short, well-defined timeline with a motivated and available team.
Clear signals of success
If you want to know how much progress you have made on a problem, the formula is (amount done) / (total amount of work). Problems like the Python 3 migration are scary because they are both high-uncertainty and all-or-none. In other words, the “total amount of work” is not well understood, and the work delivers no value unless the “amount done” equals the “total amount of work.” The “total amount of work” will become clearer as time goes on, but if the “amount done” is also unclear…abandon hope.
We were successful because we had clear signals of success. For example, we could check that:
- Our code could start up without error
- We had confidence in our test coverage, and all tests were passing
If we had relied on more ambiguous signals of success, e.g. “everyone is happy with the changes” or “the tests pass but we don’t have confidence in our tests,” it would have been impossible to realistically and responsibly gauge when we were done.
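The first signal, code that starts up without error, can be approximated with an import smoke test that tries to load every module in a package. The sketch below uses only the standard library; the `import_failures` helper is a hypothetical name, and this is one possible way to run such a check rather than the one we used.

```python
import importlib
import pkgutil


def import_failures(package_name):
    """Import a package and every submodule; return {module name: error text}."""
    failures = {}
    try:
        package = importlib.import_module(package_name)
    except Exception as exc:
        return {package_name: repr(exc)}

    def record(name):
        # Called by walk_packages when it cannot import a subpackage itself.
        failures.setdefault(name, "failed during package walk")

    # Plain modules have no __path__, so there is nothing further to walk.
    search_path = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(search_path, package_name + ".", onerror=record):
        try:
            importlib.import_module(info.name)
        except Exception as exc:  # SyntaxError, ImportError, NameError, ...
            failures[info.name] = repr(exc)
    return failures


if __name__ == "__main__":
    # The standard-library json package stands in for a real application package.
    print(import_failures("json") or "all modules imported")
```

A check like this catches Python 2-only syntax such as `print "x"` in modules that no test happens to import, which a passing test suite alone would not reveal.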
A motivated team with dedicated time
The migration to Python 3 was disruptive: it consumed the time of the people working on the migration, and it blocked anyone else from contributing to the codebase while the work was going on. Therefore, it was crucial that we had a team with a small, dedicated amount of time to tackle the problem — for us, less than a week. The most important word here is dedicated. That meant:
- Our team was allowed to completely focus on the problem, and
- It was “pencils down” when our time ran out — success or failure.
This fit nicely into the “time-boxed” nature of our approach and had a few key benefits:
- It kept the team motivated throughout the process. It is easy to burn out if you spend too long on what seems like an endless task.
- It established clear expectations with the people waiting for us to finish. If you say “come hell or high water, things will return to normal in a week,” people can plan accordingly.
- It let the team become wholly consumed with completing the task.
- It let us fail fast. If we failed to complete the task in the allotted time, so what? It was a relatively small amount of time and we would have learned an enormous amount for the next time we attempted the problem.
Looking Back and Looking Forward
We spent one week on prep work, during which one engineer assessed the situation in front of us, and then ramped up to one week of work from our whole team of 10. This allowed us to upgrade to Python 3 in just under two weeks. We changed ~7000 lines of code across ~400 files with all ~3000 of our tests passing, and with only a few bugs that needed to be fixed as follow-up. We consider this an absolute win! We didn’t need to spend countless hours making backwards-compatible code, we didn’t need to deal with the bugs associated with such code, and we didn’t need to take months to roll this out.
Migrating to Python 3 was far less painful than we anticipated, and we kept the impact to users minimal-to-invisible. It went well!
Additionally, we added another tool to Zymergen’s toolbox for delivering the best software to our scientists. We learned from our experience, generalized our approach, and stand ready to tackle future problems of similar flavor…or inspire others to do the same.
We were successful by being flexible and open to different techniques, and you can be too, now that your Python 2 codebase has gone EOL.
Alex McFadden is a Bioinformatics Software Engineer and Zach Palchick is a Staff Bioinformatics Software Engineer at Zymergen. They both work on Computational Biology at Zymergen.