May 15, 2018

Software Engineering at Zymergen


A robotic arm stacks plates in Zymergen’s lab.

The Zymergen Technology organization is composed of Automation (the robots), Software (the code), Product Management (the plan), and IT (the gear). We invest in developing robust and flexible software to allow us to design new strains, track and guide high-throughput experimentation, collect and analyze process and performance data, and improve the overall accuracy and efficiency of all of our operations.

The goal of this blog post is to give you a big-picture overview of what the Software Engineering team does and how we go about doing it, as well as to preview the kinds of content we will be posting over time.

A Very Brief Detour into Bioengineering

Let’s introduce some biology terms for the benefit of those of us who don’t have a bio background (those with a bio background may cringe at these descriptions—we are omitting a fair amount of complexity and nuance in the interest of brevity).

Zymergen makes microbes—single-celled organisms—better at making useful molecules. We do this by conducting an efficient, large-scale search for beneficial genetic changes to their DNA, and we use software, ML, robotics, and, of course, biology and chemistry to make it happen.

Conceptually, we start with a microbe. If we let it grow and replicate, it will make many copies of itself, creating a colony. All of the microbes in the colony will have the same genome and belong to the same strain. If we then modify this microbial genome in some way, we create a new strain. If we do this 1000 times, introducing a different change every time, we make 1000 new strains.
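
To make that lineage idea concrete, here is a minimal sketch in Python (with hypothetical names; an illustration only, not Zymergen’s actual data model) of strains as genome changes applied to a parent:

```python
# A minimal sketch of strain lineage (hypothetical names, not Zymergen's
# actual data model): a strain is a genome change applied to a parent.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Strain:
    """A named strain, with the parent it was derived from and its edits."""
    name: str
    parent: Optional["Strain"] = None
    modifications: Tuple[str, ...] = ()  # e.g., ("delete geneX",)

# Start with a base microbe...
base = Strain(name="base-001")

# ...and derive 1000 new strains, each carrying a different single change.
candidates = [
    Strain(name=f"strain-{i:04d}", parent=base, modifications=(f"edit-{i}",))
    for i in range(1000)
]
```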

Modifying the genome is complicated; you can’t just reach into a cell and replace a few base pairs (the As, Gs, Cs, and Ts). One way to change a genome is to hijack the cell replication process. During replication, a cell makes a copy of its DNA. We hack that copying so that it takes a little detour, and picks up some extra DNA bits we want to insert, or skips copying DNA bits we want to delete. To achieve this, a bunch of other genetic sequences are involved—new on- and off-ramps for the copying process to follow, markers to help us identify which of the resulting cells is the modified copy and which is the original, and more. Collectively, all these DNA sequences are called parts. All of these steps come with quality control (QC) processes designed to validate that the parts are right and the reactions are happening as they should be.

To further complicate matters, we do all this in a high-throughput way, building and testing many new strains at once, using lab automation in a facility we refer to as “the factory.” To build and test so many strains well, we must rigorously define the specific processes and protocols involved in every step.

Software at Zymergen

The key framework around which Zymergen is organized is the DBTAL cycle: Design, Build, Test, Analyze, Learn. We create software to facilitate all parts of this cycle, illustrated in Figure 1.


Figure 1. Zymergen creates software that touches all aspects of the DBTAL cycle. Our tools help scientists “Design” new edits, automate generation of the required DNA part specifications, and interface with the manufacturing strain ordering system. We provide a set of process workflow definition and management tools to help keep manufacturing processes consistent and organized during the “Build” phase. Extensive data collection, and QC processes that use that data, help us stay accurate and detect process deviations in “Test.” By combining models and data from past experiments with new findings, we are able to accurately analyze the performance of new strains in “Analyze,” and propose new strains to build, or new strategies to pursue, in “Learn.”

This cycle presents a large surface area to cover, necessitating a team that includes people with a variety of skills: infrastructure engineers, computational biologists, front-end programmers, “big data” experts, data scientists, and generalists and specialists of all kinds.

A partial enumeration of the areas for which our team is responsible:

● Tools to allow scientists to specify desired strain modifications in bulk, and to request that these modifications be tried without needing to worry about the detailed processes used to introduce desired changes into a parent strain.

● Automated, machine learning-driven tools that can operate alongside human experts for designing strains. These tools can act either as independent designers that can order strains, or as assistants that can help scientists evaluate their designs.

● Computer-aided generation of parts required to introduce the designed changes, and organization of all resources required to build the ordered DNA, integrate the genetic change into a strain, and validate that the right DNA was integrated into the right place in the right strain.

● Tools to analyze next-generation sequencing (NGS) data and to validate built DNA sequences.

● Pipelines to sequence and annotate genomes of new organisms; we pride ourselves on our ability to work with a wide range of microbes used for industrial fermentation, and onboarding and annotating new genomes quickly and effectively is an important capability.

● An end-to-end Manufacturing Execution System (MES) to encode factory processes, making them repeatable, predictable, and reproducible—not to mention automated!

● Tools that help with data-driven decision tasks like instrument calibration, control charting, data normalization, quality control (QC), outlier detection, and overall process controls (a minimal control-charting sketch follows this list).

● A LIMS (Laboratory Information Management System) with which all of these tools interact, and which encompasses the core set of data models and infrastructure to facilitate all data tracking, ingestion, storage, and analysis. This includes both batch-oriented and real-time data processing systems.
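
As one concrete illustration of the control-charting work mentioned above, here is a minimal sketch of the classic Shewhart 3-sigma rule in Python (hypothetical data and function names; not our actual QC pipeline):

```python
# A minimal control-charting sketch: flag measurements that fall outside
# three standard deviations of a baseline (the classic Shewhart 3-sigma
# rule). Hypothetical illustration only, not Zymergen's actual QC pipeline.
from statistics import mean, stdev

def control_limits(baseline, n_sigma=3.0):
    """Compute lower/upper control limits from in-control baseline data."""
    center, spread = mean(baseline), stdev(baseline)
    return center - n_sigma * spread, center + n_sigma * spread

def flag_outliers(measurements, baseline):
    """Return (index, value) pairs that fall outside the control limits."""
    lo, hi = control_limits(baseline)
    return [(i, x) for i, x in enumerate(measurements) if not lo <= x <= hi]

# Example: readings from an instrument drifting out of control.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.7, 10.3, 10.0]
new_readings = [10.1, 9.9, 12.9, 10.0]
print(flag_outliers(new_readings, baseline))  # -> [(2, 12.9)]
```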

Future blog posts will shed more light on details of some of the challenges we face when developing software to address these problems and will dive deeper into technical details of our solutions.

Choosing the stack

We tend to “use boring technology,” to borrow a phrase from Dan McKinley. As an organization, we try to stick to proven, well-known technology for our stack, so that our attention can be focused on solving cutting-edge problems rather than bleeding on the cutting edge of technology. Sometimes this means we forgo the “best tool for the job” and use a good one that works well enough. We’ve found that focusing too much on what is “best” can result in many local maxima—optimal solutions for very specific problems—that add up to far less than a global maximum. Hidden costs like maintenance and upgrades, difficulty in cross-training, lack of reusability, high learning costs, surprising bugs and failure modes, and other side effects of a proliferation of languages, databases, and frameworks tend to outweigh the specific benefits of choosing a set of tools that might feel ideal for a particular problem. This is especially true if those tools are brand new and hot off the presses.

In practice, this means that our stack is mostly made up of good, well-worn, and well-understood tools. Our main languages are Java, Python, and JavaScript. Our first front end was Ruby on Rails; we are sunsetting it for a number of reasons, including the simplicity of using fewer languages. Everything runs in AWS. Our core databases are MySQL. We use Apache Airflow for ETL management, Apache Kafka for durable event streams of all kinds, and Elasticsearch for, well, search. It’s a no-surprises (or at least few-surprises) stack. None of these tools are perfect, but they all do their jobs well.
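
To give a flavor of what “boring” looks like in practice, here is roughly what a minimal Airflow ETL DAG looks like (a generic sketch with hypothetical task names, not one of our production pipelines):

```python
# A generic Airflow DAG sketch: extract, then load, on a daily schedule.
# Task names and callables are hypothetical, for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract():
    """Pull raw measurements from an upstream source (stubbed out here)."""
    print("extracting...")

def load():
    """Write cleaned records to the warehouse (stubbed out here)."""
    print("loading...")

dag = DAG(
    dag_id="example_etl",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
)

extract_task = PythonOperator(task_id="extract", python_callable=extract, dag=dag)
load_task = PythonOperator(task_id="load", python_callable=load, dag=dag)

extract_task >> load_task  # run extract before load
```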

This “boring” stance is not an absolute position, but a pragmatic one. We adopt new technologies where it makes sense to use them—and we tend to be fairly skeptical about what makes sense. For new tech to be adopted for production use, it must have very significant, clear benefits, enabling something we fundamentally could not do with our existing stack. This process allowed us to adopt, for example, Amazon Athena for querying S3 data, and Stan for statistical modeling and inference. Recently released Amazon API Gateway support for resources behind a VPC allowed us to start using API Gateway and Lambda, and we are enjoying trial runs of using them for lightweight internal web services.
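
For a sense of how small these lightweight services can be, here is a minimal Lambda handler for an API Gateway proxy integration (a hypothetical hello-world service, for illustration only):

```python
# A minimal AWS Lambda handler behind an API Gateway proxy integration:
# API Gateway passes the HTTP request in `event` and expects a dict with
# statusCode/body in return. Hypothetical service, for illustration only.
import json

def handler(event, context):
    # Query string parameters may be absent entirely, hence the `or {}`.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```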

Ensuring that we remain flexible and adaptable, while not giving in to the temptation to proliferate an unsupportable menagerie of frameworks, databases, and languages, is tricky. Our future posts will describe not only how we use various technologies to make the future of biologically derived materials possible, but will also explain our design process and how we encourage experimenting with alternative or new technologies, while maintaining a high bar for inclusion.

Planning and Hacking

The domains we tackle are fairly complex, and our software often serves as an information bridge between a variety of specialists and functions. Changes in how the software works often enable efficiency and accuracy improvements, simplifications to operating procedures, and other benefits. On the other hand, these changes affect how individuals in our company get their daily tasks done; rolling out such changes requires advance warning, and often involves re-training staff on new execution protocols, formal user acceptance testing, and other “real world” considerations.

As a result, we’ve spent a fair amount of time refining our product planning and prioritization process. We invest time in creating product plans and requirements, and in validating them with our internal users. We maintain widely shared and discussed design documents and timelines that are continuously refined and updated throughout implementation. We find that investing the time to get aligned on what, how, and when we are building things is immensely valuable.

To strike a balance and provide a counterweight to the constraints of rigorously planned projects, we host internal “Hack Weeks,” in which engineers are encouraged to mix up their regular teams and create more or less anything they like (bearing some relevance to what we do, of course, satirical relevance included). Hack weeks allow “bottom-up” ideas to emerge and prototypes to be built without the usual constraints. Multiple hack week projects have resulted in shipped improvements, influenced roadmap changes, or informed the design of projects we decided to tackle at a later date. Creating a hack week that is fun for engineers and useful for the company, and that provides a path to actually getting things into production, is non-trivial; we hope to share how we’ve set it up in the coming months.

Culture

As we grew our engineering org through 2016–17, we invested a lot of time and effort into making sure we created the kind of internal culture we all want to be part of, and internal processes that let us scale up and handle rapid growth.

Our team members—in tech and, indeed, across Zymergen as a whole—come from a variety of backgrounds, both professionally and academically. Everyone’s first few months tend to be fairly intense—there is a lot to learn, regardless of where one comes from. Because it is important to get everyone on the same page, we are continuously working to strengthen our culture and our onboarding experience: smoothing out on-ramps, creating “buddy” programs, collecting and organizing background reading materials, and creating classes and discussion sessions for everything from the basics of strain engineering, to best practices for code reviews, to architecture overviews.


We value respect, a culture of learning and knowledge sharing, open discussion, and inclusion; we constantly look for ways to encourage sharing of knowledge and skills, be it through formal classes and presentations, or through setting up interest groups, reading clubs, and the like.

Our engineering culture is something we are very proud of and work hard to maintain; you can look forward to some blog posts on that topic, as well.

Conclusion

We are excited to start sharing what we’ve been up to on this blog, covering software, data science, automation, and tricky problems that come up when you try to engineer biology; we will also talk about the practice of software engineering at Zymergen and ideas on growing teams with thought and care.

This post was contributed by Dmitriy Ryaboy, VP of Software Engineering. At Zymergen, Dmitriy is responsible for the Software Engineering and Data Science organization. He thinks a lot about team health, productivity, and effectiveness.