Jun 22, 2021

Beyond the t-test: Using Bayesian modeling to separate signal from noise

A data science approach to finding commercially relevant titer improvements in fermentation.

By Aisha Ellahi, Ph.D.


The primary aim of strain engineering at Zymergen is to genetically engineer a microbe that can produce high titers of a commercially relevant target molecule. This is done by screening hundreds to thousands of genetic edits in small-scale, high-throughput microtiter plates and selecting a small subset for testing at larger scales in fermentation tanks.

Here, I describe the Bayesian modeling framework our Data Science team uses to remove process-driven noise and estimate true strain performance from high-throughput screening (HTS) data. Removing such noise is particularly important for strain optimization programs in which the host organism is near its theoretical maximum titer; in these genetic contexts, mutations with individually small anticipated effect sizes must be combined (or stacked) to achieve commercially significant percent improvements in titer.

Zymergen’s high-throughput screening platform

Zymergen’s high-throughput screening (HTS) platform enables the building and testing of hundreds to thousands of genetically distinct microbes (or strains) per experiment. Strains are grown in 96- or 384-well microtiter plates with multiple replicates per strain (Figure 1). Phenotypes such as overall growth or biomass, glucose consumption, pH, and titer of a desired target molecule are measured and recorded. 

Figure 1. Overview of Zymergen’s Test process. Gray circles in panels 2 and 3 depict a single parent strain (often called the “reference strain”) that is assayed on every plate. Test strains are depicted in green. Multiple replicates per strain are assayed. Portions of this figure were reproduced and adapted from Figure 2 of the blog post titled “Robots, Biology, and Unsupervised Model Selection.”

Screen many, select few: The quest to identify truly improved strains

The purpose of HTS is to quickly and cheaply screen thousands of genetic mutations, many of which are anticipated to have zero effect on the desired phenotype, to identify the rare few that do. Identifying truly improved strains, however, can be difficult for at least three reasons.

First, an organism’s biology can be finicky and unpredictable; there is wide variation in how different strains behave in different conditions. Some are remarkably steadfast in their growth and performance, regardless of external conditions and how their genome is perturbed. Others are remarkably sensitive. Conditions a strain might be sensitive to include media formulation, temperature, cell density, and genetic background, meaning the particular genotypic context in which mutations are introduced. 

Second, the closer an organism is to its theoretical maximum for producing a target molecule, the smaller the anticipated effect of any given genetic edit on titer. This fact, alluded to earlier, means that the vast majority of mutations will have zero effect, while those that do have an effect will have only a slight one. The genetic signal will be small and likely hard to separate from other variables that affect titer.

The third factor that contributes to noise is the HTS process itself. Variations in media batches, incubator temperature, inoculum volume, and even the position of a well within a microtiter plate introduce slight biases in observed measurements. And of course, measurements are inherently noisy: even two wells inoculated from a single colony of genetically identical cells will yield slightly different measurements in the same assay. Further still, even two measurements taken from the same well will almost always be slightly different. With so many variables affecting performance and introducing bias, assessing a test strain’s true performance is non-trivial.  

Inferring strain performance through Bayesian modeling 

Though measurements are noisy, data scientists can attempt to estimate and remove noise by building a model of the data-generating process. This also allows for an estimation of the true effect of a genetic mutation on an observed measurement. Variables like noise, genetic edits, and other factors that affect a measurement are commonly called “parameters.” The Data Science team at Zymergen uses a Bayesian approach to estimate parameter values because, among the many available modeling tools, a Bayesian approach has several advantages. The first is that under a Bayesian framework, parameter values are estimated as probability distributions rather than absolute quantities or point estimates. Representing parameters as distributions enables a more intuitive quantification of the uncertainty in a parameter’s value. It can also reveal which parameters might be driving noise or which may require more sampling.

Another advantage is that Bayesian methods provide a ready framework in which to incorporate prior knowledge about a parameter’s value into the model via prior probability distributions, or “priors.” Priors weight possible parameter values based on a scientist’s historical knowledge about the physically plausible range of values for a given parameter. A biologically-relevant example is to place a uniform prior on titer that restricts its value to a number between 0 and 100 g/L. This prior might be informed by expert knowledge of the microbe or by an estimate of the theoretical max for the target molecule’s titer.    
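To make the role of a prior concrete, the sketch below runs a simple grid approximation: a uniform prior on titer over 0–100 g/L, as in the example above, is updated by a Gaussian likelihood over a few replicate measurements. All numbers (the observations, the noise level) are hypothetical, chosen only for illustration.

```python
import numpy as np

# Grid of candidate titer values; the uniform prior on [0, 100] g/L
# assigns equal weight to every candidate.
titer_grid = np.linspace(0.0, 100.0, 10001)
prior = np.ones_like(titer_grid)

# Hypothetical replicate titer measurements (g/L) and an assumed
# measurement-noise standard deviation.
observations = np.array([42.1, 39.8, 41.5])
sigma = 2.0

# Gaussian log-likelihood of each observation at each candidate titer,
# summed over observations, then combined with the prior.
log_lik = -0.5 * ((observations[:, None] - titer_grid[None, :]) / sigma) ** 2
posterior = prior * np.exp(log_lik.sum(axis=0))
posterior /= posterior.sum()

# The posterior is a full distribution; its mean is one convenient summary.
posterior_mean = (titer_grid * posterior).sum()
```

With a flat prior the posterior mean lands on the sample mean; a more informative prior (e.g., concentrated near a known theoretical maximum) would pull the estimate toward it, most strongly when data are scarce.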

A third advantage of the Bayesian approach is that it can easily be adapted to model data in which parameters are non-independent, grouped together, or related to each other in some complex way. Such parameters are often called hierarchical parameters, and in the framework of hierarchical modeling, inferences made about one parameter can inform another. The hierarchical framework is particularly relevant for HTS data because the HTS process is inherently structured: strain replicates are grouped by the genetic mutation they carry, wells within a microtiter plate are clustered by that plate, and plates themselves are grouped by experiments (see Figure 2 for a visual depiction of this hierarchy). HTS data are also structured in how parent strains and reference strains (i.e., strains used as biological controls or performance benchmarks) are replicated across plates. Finally, media batches are shared across many plates, and so constitute another facet by which plates can be grouped. We expect wells in the same plate, strains descended from a common parent, and strains grown in a particular media batch to behave similarly.

While technically speaking it is possible to model the HTS process without a hierarchical approach (for instance by adding each parameter as a separate regression term), doing so can lead to parameter estimates with wildly varying levels of uncertainty due to differences in sampling. A hierarchical approach, by contrast, allows for the leveraging of estimates made about one parameter to inform the estimate of another parameter that is grouped in a related hierarchy. For example, HTS plates are grouped by experiments, and occasionally an experiment may have many fewer plates than others for a given HTS process. The estimated experiment effects from experiments with more plates (i.e., more samples) can be used to inform the estimate of the experiment with fewer plates.
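The "borrowing strength" behavior described above can be sketched with a simple precision-weighted (partial pooling) estimate. This is an illustrative stand-in for a full Bayesian fit, not Zymergen's production model: the experiment names, plate counts, and variances below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plate-level titer offsets grouped by experiment:
# one well-sampled experiment and one with only a few plates.
experiments = {
    "exp_A": rng.normal(2.0, 1.0, size=30),   # many plates
    "exp_B": rng.normal(-1.0, 1.0, size=3),   # only a few plates
}

# Global mean across all plates, plus assumed within-experiment (sigma2)
# and between-experiment (tau2) variances.
mu = np.mean(np.concatenate(list(experiments.values())))
sigma2, tau2 = 1.0, 1.0

# Partial pooling: each experiment's mean is shrunk toward the global
# mean, with less shrinkage for experiments with more plates.
shrunk = {}
for name, y in experiments.items():
    n = len(y)
    shrunk[name] = (n / sigma2 * y.mean() + 1 / tau2 * mu) / (n / sigma2 + 1 / tau2)
```

The poorly sampled experiment (`exp_B`) is pulled much closer to the global mean than the well-sampled one, which is the behavior that keeps its uncertainty from blowing up in a hierarchical fit.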

HTS data can be structured as hierarchical probability distributions

Figure 2. A visual depiction of how HTS data can be structured as hierarchical probability distributions. Parameters at the upper levels of the hierarchy, such as “parent strain” and “experiment,” inform the lower-level child strain and main plate parameters clustered below them. Image courtesy of Nathan Chan, Data Scientist at Zymergen. 

An example of a Bayesian approach to modeling

Suppose you are a scientist charged with engineering a microbe that produces 5% more of a desired molecule (“Your Favorite Molecule,” or YFM) than its parent strain. You take an approach in which you engineer mutations randomly throughout the genome and build, let’s say, 1000 strains. After growing the strains and measuring YFM titer, some strains appear to have higher titer than their parents, but you observe some inconsistencies: percent improvements vary experiment-to-experiment, plate-to-plate, and even among strain replicates within a plate.

How much of the variation in titer is due to genetics and how much is due to process-driven bias? We can build a simple model to approximate the data-generating process in which we assume that each strain has some true response that is a function of the genetic edit it carries. This true response is obscured by the bias of the particular plate the strain is grown on (perhaps itself a function of how that plate was processed or handled) and by random noise. While we don’t know the exact value of each of these parameters, we can make some simplifying assumptions: for example, that global noise is Gaussian and that the effects of each parameter on the response are additive.

Below is a mathematical definition of the model, where an individual well-level observation y_i can be expressed as a function of the strain grown in that well (β_strain), the microtiter plate that well resides in (β_plate), and a global noise term (σ):

$$y_i = \beta_{\mathrm{strain}(i)} + \beta_{\mathrm{plate}(i)} + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma)$$

Strains can be grouped by the parent strains from which they are derived, and plates can be grouped by the experiments in which they are physically processed:

$$\beta_{\mathrm{strain}} \sim \mathcal{N}(\beta_{\mathrm{parent}}, \sigma_{\mathrm{strain}}), \qquad \beta_{\mathrm{plate}} \sim \mathcal{N}(\beta_{\mathrm{experiment}}, \sigma_{\mathrm{plate}})$$

The distributions of β_strain and β_plate are thus themselves functions of the β_parent and β_experiment parameters, respectively. Below is a probabilistic graph model (PGM) of the model outlined above:


Figure 3. A probabilistic graph model of the model outlined above.
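The generative model written above can also be simulated directly. The sketch below (with illustrative sizes and values, not Zymergen data) draws strain effects around a parent effect and plate effects around an experiment effect, then builds well-level observations as their sum plus Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical problem size: strains, plates, and replicates per combination.
n_strains, n_plates, n_reps = 8, 4, 3
beta_parent, beta_experiment = 10.0, 0.0  # illustrative parent / experiment effects

# Hierarchy: strain effects are drawn around the parent effect, and
# plate effects around the experiment effect.
beta_strain = rng.normal(beta_parent, 0.5, size=n_strains)
beta_plate = rng.normal(beta_experiment, 1.0, size=n_plates)
sigma = 0.2  # global measurement noise

# One observation per (strain, plate, replicate) combination:
# y_i = beta_strain(i) + beta_plate(i) + noise.
strain_idx = np.repeat(np.arange(n_strains), n_plates * n_reps)
plate_idx = np.tile(np.repeat(np.arange(n_plates), n_reps), n_strains)
y = (beta_strain[strain_idx] + beta_plate[plate_idx]
     + rng.normal(0.0, sigma, size=strain_idx.size))
```

Fitting the model is then the inverse problem: given only `y`, recover posterior distributions over the strain and plate effects.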

Probabilistic graph models like the one depicted in Figure 3 offer another way to visualize the hierarchical relationships between parameters. After a model is fit to the data, correcting measurements for process noise, or “normalizing” the data, is as simple as subtracting the estimated effects from each measurement. An example of HTS data pre- and post-normalization for a biological control strain is shown in Figure 4. These data show how much variation in titer is observed for a single control strain that is expected to perform evenly across plates. The raw measurements in 4A cluster and vary by plate, even though we expect the strain to show roughly the same titer on every plate. Modeling allows us to estimate the effect of each plate and subtract it, resulting in the “normalized” measurements shown in 4B.


Figure 4: Normalization in action. (A) Example of raw titer measurements obtained from a control strain grown in twenty-six different 96-well plates. Its performance is expected to be roughly the same across all plates, yet as shown in panel (A), titer varies by plate. Such variation exemplifies the plate bias observed in a typical HTS experiment and could be driven by differences in temperature, inoculum volume, or media formulation across plates. (B) “Normalized” data from panel A, in which the mean plate effect inferred from model fitting has been subtracted from each raw measurement. Removing the effect of plates allows one to directly compare measurements across plates.
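The normalization step itself can be sketched in a few lines. Here the plate effects are estimated with simple control-well means, a point-estimate stand-in for the posterior means a Bayesian fit would provide; all values (plate count, noise levels, the control's true titer) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a control strain with a fixed true titer, grown on
# several plates that each impose their own bias.
n_plates, n_controls = 6, 4
true_plate_effect = rng.normal(0.0, 1.5, size=n_plates)
control_titer = 20.0  # the control's true titer, the same on every plate

# Raw control measurements: true titer + plate bias + measurement noise.
raw = (control_titer + true_plate_effect[:, None]
       + rng.normal(0.0, 0.3, size=(n_plates, n_controls)))

# Estimate each plate's effect as the deviation of its control mean from
# the grand mean, then subtract it from every well on that plate.
plate_effect_hat = raw.mean(axis=1) - raw.mean()
normalized = raw - plate_effect_hat[:, None]
```

After subtraction, the plate-to-plate spread of the control collapses, mirroring the change from panel A to panel B in Figure 4.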

Does normalization help? 

The sections above have outlined one method by which one can quantitatively estimate the effect of a genetic mutation on a strain’s phenotype. While this is all well and good in theory, how do we assess that such an approach is in fact leading to better predictions and actually identifying truly improved strains? As Richard McElreath states in the first chapter of Statistical Rethinking, a widely popular primer on Bayesian statistics: “A concern with ‘truth’ enlivens these models, but just like a golem or a modern robot, scientific models are neither true nor false, neither prophets nor charlatans. Rather they are constructs engineered for some purpose.” Bayesian models are silicon robots of our own creation, and like robots, they will dutifully crank and churn on any dataset they are fed, producing results and predictions completely agnostic to truth and context. So how do we know they are helping and that their results reflect truth? 

One way to answer this question is to use differences between strains that have been observed in fermentation tanks as “ground truth.” Groups of strains ranked in this way are referred to as ladder strains. The task then is to see whether normalized microtiter plate data for ladder strains mirror the differences observed in tanks better than raw plate data do; if so, we know the Bayesian approach is succeeding in its aim to remove HTS-specific noise. If not, then either the model is misspecified, or the plate conditions may not accurately reflect fermentation tank conditions. A cleaner approach is to use simulated data in which one “knows” which strains are statistically different in performance and to test whether the model can recover those differences.
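The simulated-data check can be sketched as follows. This toy version (all values hypothetical) generates a "ladder" of strains with known true titers, confounds the raw measurements with plate bias, and asks whether subtracting plate effects estimated from a control strain recovers the true ranking better than the raw data do.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical design: 20 strains with evenly spaced true titers (a known
# "ladder"), each grown on a single randomly assigned plate.
n_strains, n_plates, n_reps = 20, 5, 4
true_titer = np.linspace(20.0, 22.0, n_strains)
plate_of = rng.integers(0, n_plates, size=n_strains)
plate_effect = rng.normal(0.0, 2.0, size=n_plates)

# A control strain grown on every plate gives a handle on plate bias.
control = 20.0 + plate_effect + rng.normal(0.0, 0.05, size=n_plates)
plate_effect_hat = control - control.mean()

# Each test strain: true titer + its plate's bias + replicate noise.
raw = (true_titer[:, None] + plate_effect[plate_of][:, None]
       + rng.normal(0.0, 0.05, size=(n_strains, n_reps))).mean(axis=1)
normalized = raw - plate_effect_hat[plate_of]

def rank_corr(est):
    """Spearman-style rank correlation with the known true titers."""
    ranks = lambda v: np.argsort(np.argsort(v))
    return np.corrcoef(ranks(true_titer), ranks(est))[0, 1]
```

In this toy setup the plate bias scrambles the raw ranking, while the normalized ranking tracks the known ladder closely, which is the qualitative behavior one looks for when validating the real model.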

At Zymergen, we’ve used both approaches to validate our Bayesian models, and we have found that removing plate bias does increase the confidence with which a scientist can say a given strain is improved over its parent. The exact percent difference may not replicate in a fermentation tank, but the qualitative ranking of top improved strains does. Another bonus is that the models can reveal variables in the test process that are driving noise and can thereby help us improve it. For example, if a particular experiment or media batch is found to have a non-zero effect on measurements, actions can be taken to further investigate the cause of this bias. As we continue to test hundreds of mutations, we will accumulate more data that will enable us to generate better, more informative models.   



Bayesian hierarchical modeling of Zymergen’s HTS process is a powerful way to correct measurements for process-driven noise and bias. The probabilistic nature of Bayesian methods, along with a ready framework in which to incorporate prior knowledge, allows for better parameter estimates and a straightforward quantification of parameter uncertainty. And without bias and noise affecting their estimated titers, strains can be directly compared to their parents as well as to each other. This results in a more accurate estimation of the percent improvement a strain shows relative to its parent or to other strains.

Aisha Ellahi, Ph.D. is a data scientist with a background in molecular biology and yeast genetics. She spent the early part of her career studying the evolution of gene function in yeasts using genomics. Now, she uses her domain knowledge to develop models that quantify process bias and predict strain performance at Zymergen.

Learn more about our research and development in unsupervised model selection, biological data science, and machine learning.