Visualizing Automation Usage at Zymergen
How we built the infrastructure to collect and visualize equipment utilization data
Automation is a key part of the work we do at Zymergen and lays the foundation for our unique approach to strain optimization and the development of novel products. With Automation, we can design, build, and test thousands of microbes at a time with greater reproducibility and efficiency than traditional bench methods.
Various Automation equipment we use at Zymergen.
We use a variety of lab automation equipment from different vendors to power our biological workflows. While this increases our flexibility and the scope of our experiments, it also makes it harder to get consolidated usage data that tells us when and how we use our equipment. This is partly because each vendor provides its own logging format, which is often non-standardized. In addition, most of our devices were not designed to be used at the scale of our integrated factory. As a result, manufacturing metrics are often not built in, and the logs the devices generate are frequently unsuitable for automated downstream processing with tools such as the ELK (Elasticsearch, Logstash, Kibana) stack.
Snippet of a Tecan EVOware log after an error.
As we further expand our automation factory at Zymergen, it becomes increasingly important to understand how we use our equipment:
- What percentage of time are machines utilized?
- What machines were used the most/least in the past month?
- Where are our bottlenecks by time of day or by day of week?
- What protocols are the most popular?
- What are our trends for equipment usage over the last year?
Answers to these questions (and others) can help us better understand our operational efficiency and make data-driven decisions in scheduling, equipment management, and factory operations.
For example, if Platform X has had extremely high utilization over the past few months, this may prompt us to look for bottlenecks and make a business case for purchasing another unit. On the other hand, if Platform Y is consistently underutilized, we may interview users to find out why it’s not being used and figure out ways to increase its value. In addition, if we find that the number and complexity of the protocol scripts run on our machines are increasing, it may prompt us to look for ways to consolidate and simplify those scripts.
Based on these needs, we developed Telescope, an infrastructure for automatically consolidating and visualizing utilization data from our Automation equipment. The solution has three parts and combines custom code with open source software.
First, we transfer equipment logs from local machines into the cloud.
Back Up Logs to S3
The first step in this project was to transfer equipment logs from the local machines into the cloud. Most of our vendor software must run on Windows, so we used the built-in Task Scheduler utility to run a PowerShell script that syncs the log files to an S3 bucket every 10 minutes. As a bonus, to reduce disk clutter, we added a cleanup step that removes old log files once they have been copied to the S3 bucket.
While installing this on our existing fleet of machines was feasible, we quickly realized that our manual process for updating the tool and installing it on every new piece of automation equipment would not scale. We addressed this in two ways. First, we added the PowerShell script and Task Scheduler job to our equipment PC base image, so every newly provisioned automation PC ships with the tool. Second, to handle updates, we stored a copy of the PowerShell script in S3 and added another Task Scheduler job that periodically checks that copy and updates the local script if it has changed. This way, when we need to make an update, we change it in one location rather than on each PC.
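The real tool is a PowerShell script; the Python below is only a hypothetical rendering of the sync, cleanup, and self-update rules described above, not the actual script. It assumes each log file's name doubles as its S3 object key, that the bucket listing is available as a mapping from key to the last-modified time of the uploaded copy, and a 7-day local retention window (the post only says "old" files). `LocalLog`, `plan_sync`, and `needs_update` are illustrative names. The invariant worth preserving is that a file is deleted locally only after its current version is already in the bucket.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LocalLog:
    name: str           # file name, assumed to double as the S3 object key
    modified: datetime  # last-modified time on the equipment PC


def plan_sync(local_logs, uploaded, now, retention=timedelta(days=7)):
    """Decide what one scheduled run should upload and delete.

    uploaded maps S3 key -> last-modified time of the copy in the bucket.
    Returns (keys_to_upload, local_files_to_delete).
    """
    to_upload, to_delete = [], []
    for log in local_logs:
        backed_up = log.name in uploaded and uploaded[log.name] >= log.modified
        if not backed_up:
            # New file, or a log that has grown since the last sync.
            to_upload.append(log.name)
        elif now - log.modified > retention:
            # Safe to remove: the current version is already in S3 and it is old.
            to_delete.append(log.name)
    return to_upload, to_delete


def needs_update(local_script: bytes, published_script: bytes) -> bool:
    """Self-update check: True if the copy published in S3 differs from ours."""
    return hashlib.sha256(local_script).hexdigest() != hashlib.sha256(published_script).hexdigest()
```

The actual transfers (for example with the AWS CLI or an S3 client library) are left out so the decision rules can be read and tested on their own.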
Then, we use custom log parsers to extract and store useful data.
Next, we wrote parsers to extract useful data from the logs. Specifically, for each “Task” executed on a machine, we extracted the equipment name, protocol name, start time, end time, and outcome. When it was available, we also extracted parameter data (e.g. volume transferred). These values were then loaded into a MySQL database, with parameters in a separate table to enable relational queries on parameter sets. We chose to implement the parsers in Python for its readability and its integration with Apache Airflow, the open source workflow engine we use to run the parse-and-load process on a regular schedule.
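The sketch below illustrates the parse-and-load step under heavy assumptions. Real vendor logs such as EVOware's look nothing like this: the pipe-delimited line format, the `START`/`SUCCESS`/`ERROR` events, and the `parse_log`/`load` names are all hypothetical. It pairs each start event with the next end event for the same equipment and protocol, then writes one row per task plus one row per parameter into a separate table. Python's built-in `sqlite3` stands in for MySQL so the example runs without a server; the two-table layout is the point.

```python
import re
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

# HYPOTHETICAL log format, for illustration only:
#   2019-06-10 09:15:02 | Evo-01 | START | NormalizePlate | volume_ul=50
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| (?P<equipment>[^|]+?) \| "
    r"(?P<event>START|SUCCESS|ERROR) \| (?P<protocol>[^|]+?)(?: \| (?P<params>.*))?$"
)


@dataclass
class Task:
    equipment: str
    protocol: str
    start: datetime
    end: Optional[datetime] = None
    outcome: Optional[str] = None
    params: Dict[str, str] = field(default_factory=dict)


def parse_log(lines) -> List[Task]:
    """Pair each START with the next SUCCESS/ERROR for the same equipment and protocol."""
    open_tasks: Dict[tuple, Task] = {}
    finished: List[Task] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip unrecognized lines, e.g. multi-line error dumps
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        key = (m["equipment"], m["protocol"])
        if m["event"] == "START":
            params = dict(p.split("=", 1) for p in (m["params"] or "").split() if "=" in p)
            open_tasks[key] = Task(m["equipment"], m["protocol"], ts, params=params)
        elif key in open_tasks:
            task = open_tasks.pop(key)
            task.end, task.outcome = ts, m["event"].lower()
            finished.append(task)
    return finished


SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    equipment TEXT NOT NULL,
    protocol TEXT NOT NULL,
    start_time TEXT NOT NULL,
    end_time TEXT NOT NULL,
    outcome TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS task_parameters (
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
"""


def load(conn: sqlite3.Connection, tasks: List[Task]) -> None:
    """Insert one row per task and one row per parameter (sqlite3 stands in for MySQL)."""
    conn.executescript(SCHEMA)
    for t in tasks:
        cur = conn.execute(
            "INSERT INTO tasks (equipment, protocol, start_time, end_time, outcome) "
            "VALUES (?, ?, ?, ?, ?)",
            (t.equipment, t.protocol, t.start.isoformat(sep=" "), t.end.isoformat(sep=" "), t.outcome),
        )
        conn.executemany(
            "INSERT INTO task_parameters (task_id, name, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, name, value) for name, value in t.params.items()],
        )
    conn.commit()
```

Keeping parameters in their own table means that finding every task run with a particular `volume_ul`, for instance, becomes a simple join between `tasks` and `task_parameters` rather than string matching on a combined column.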
Finally, we used the open source data visualization application Apache Superset to connect to our MySQL database and create meaningful visualizations. A few examples are presented below, with axes obfuscated for confidentiality.
Utilization: A summary chart with total utilization per automation platform over the last 30 days, broken down into actual run time (blue) and estimated setup/teardown time required to prepare plates and equipment (orange). This chart helps us track how efficiently we are using our equipment and flags platforms that are over- or underutilized to a concerning degree.
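One way to compute the two bars is sketched below. The post does not say how setup/teardown time is estimated, so this sketch assumes a flat, hypothetical `overhead_per_run` of 15 minutes per run. The run-time fraction clips each run to the 30-day window and merges overlapping runs on the same platform so concurrent tasks are not double-counted. `utilization` and `merged_seconds` are illustrative names, not Telescope code.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def merged_seconds(intervals):
    """Total seconds covered by (start, end) intervals, counting overlaps only once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps the current block: extend it
            continue
        if cur_end is not None:
            total += (cur_end - cur_start).total_seconds()
        cur_start, cur_end = start, end
    if cur_end is not None:
        total += (cur_end - cur_start).total_seconds()
    return total


def utilization(runs, window_end: datetime, window=timedelta(days=30),
                overhead_per_run=timedelta(minutes=15)):
    """Return {platform: (run_fraction, overhead_fraction)} for the trailing window.

    runs is an iterable of (platform, start, end) tuples. overhead_per_run is a
    HYPOTHETICAL flat setup/teardown estimate charged once per run.
    """
    window_start = window_end - window
    clipped = defaultdict(list)
    for platform, start, end in runs:
        start, end = max(start, window_start), min(end, window_end)
        if start < end:  # keep only the portion of the run inside the window
            clipped[platform].append((start, end))
    span = window.total_seconds()
    return {
        platform: (merged_seconds(ivs) / span,
                   len(ivs) * overhead_per_run.total_seconds() / span)
        for platform, ivs in clipped.items()
    }
```

Merging intervals matters mainly for platforms that can run several protocols at once; without it, run-time utilization could exceed 100%.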
Run Time by Week: Total run time across several automation platforms by week, with each bar broken down by individual platform. This graph helps us track our overall automation activity over longer periods of time. It helps validate that automation is becoming increasingly useful, and it may help us predict future trends in factory activity.
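The weekly rollup behind a chart like this can be sketched as a simple aggregation keyed by ISO week and platform. This is a hypothetical Python helper, not the Superset query behind the actual chart; it attributes each run to the week in which it started, which slightly misplaces runs that cross midnight at the end of a week.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def run_hours_by_week(
    runs: Iterable[Tuple[str, datetime, datetime]],
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Sum run time in hours per (ISO year, ISO week), split by platform.

    Each run is attributed to the ISO week in which it started (a simplification).
    """
    weeks = defaultdict(lambda: defaultdict(float))
    for platform, start, end in runs:
        iso_year, iso_week, _ = start.isocalendar()
        weeks[(iso_year, iso_week)][platform] += (end - start).total_seconds() / 3600
    # Sort chronologically and convert nested defaultdicts to plain dicts.
    return {week: dict(platforms) for week, platforms in sorted(weeks.items())}
```

Using the ISO year together with the ISO week avoids merging the first days of January into the wrong year's week 1.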
Protocol Runs by Time of Day: A heatmap of equipment usage by hour of day and day of week. Darker boxes indicate more protocol runs started in that period of time. This chart helps us identify high-usage bottlenecks and discover utilization patterns in our different workflows that could be optimized with smarter scheduling.
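The grid underlying such a heatmap is just a count of run start times bucketed by weekday and hour, as in this hypothetical Python helper. It assumes timestamps are already in the factory's local time zone; otherwise the hour-of-day buckets would be shifted.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def start_heatmap(start_times: Iterable[datetime]) -> List[List[int]]:
    """Count protocol runs by the weekday and hour in which they started.

    Returns a 7x24 grid: grid[d][h] is the number of runs started on
    weekday d (0 = Monday) during hour h (0-23).
    """
    counts = Counter((t.weekday(), t.hour) for t in start_times)
    return [[counts[(d, h)] for h in range(24)] for d in range(7)]
```

Each cell's count can then be mapped to a color intensity, which is what makes busy slots appear darker.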
In the few months since we rolled out the Telescope dashboards, our observations have already opened up new conversations on several fronts. Data on our protocol variety and distribution across machines illuminated the need for protocol file consolidation and helped get buy-in for a software project to standardize and automate protocol file generation. The Quality Control team was able to use time-of-day heatmaps to correlate sample contamination with machine usage throughout the day, providing evidence for a long-term laboratory investigation. Higher-level equipment utilization trends are now being used to inform future equipment layouts and purchases.
At Zymergen, we believe that the more data we can collect about our processes, the better. As we scale and develop a more comprehensive and complex factory, having the infrastructure in place to automatically collect utilization data will be critical to enabling data-driven decisions. By collecting data now, we lay the foundation for future applications such as predicting usage peaks from historical data or automatically alerting on drastic changes in equipment usage.
Peter Yin is an Automation Engineer on the Automation Design & Development team at Zymergen.