We Make Tomorrow: On becoming a biological data scientist
Senior Data Scientist Janet Matsen talks about her career path, her typical work day, trends in the field, and getting your foot in the door.
The road to better biofacturing is paved with the stones of many databases. Janet’s work combines data from metabolic networks, public databases, gene ontologies, and functional annotations with Zymergen’s own in-house measurements at the plate and tank levels to predict how a biofacturing system will perform.
On April 20th, 2021, Senior Data Scientist Janet Matsen spoke with Vivian Ko of STEM Job Talk, a blog page founded by the UC San Diego Neurosciences Graduate Program to share interviews with PhD graduates in industry. Janet’s interview is being republished with permission from STEM Job Talk.
Janet Matsen received her BS and PhD in chemical engineering, with a doctoral focus on synthetic biology and a data science specialization. She brings together knowledge of biochemistry, industrial biotechnology, and data science to accelerate the rate at which we can improve engineered microbes for renewable chemical production. Her work at Zymergen as a Senior Data Scientist involves developing software to predict which genetic edits will result in more productive microbes, helping the team explore the DNA design space more efficiently.
Can you describe your academic and professional background? What path led you to pursue this field?
I am dedicated to harnessing biology to help us live more sustainably. That passion and my endless appetite for learning have led me to a super-interdisciplinary field: using machine learning to engineer microbes for the production of desirable chemicals.
I started with an undergraduate degree in chemical engineering and then began a PhD focused on metabolic engineering. I came to see that combining data science with biology was going to be key for sustainability, and thus pivoted my entirely wet-lab PhD to a computational focus. I worked really hard to pull off that transition, re-entering challenging coursework as many of my classmates were defending their PhDs! I also worked hard to land two internships where I could practice those skills, network, and boost my resume.
Investing in so much interdisciplinary training felt very risky at that time: there were basically zero companies putting together these technologies successfully. However, I stayed true to my passions and got quite lucky with the timing of the bio-economy. Right as I was finishing my PhD, a company called Zymergen started to build out a data science team that leveraged the exact skill set I had assembled. I have had 3.5 great years there, sharpening my skills while working on challenging and meaningful problems.
How did you find this particular position, and what was the hiring process like? Is there a typical structure for this in your field?
I got my foot in the door at Zymergen as a software engineering intern, as the data science team had not formed yet. The internship was a great opportunity to study and practice fundamental software skills, and “prove myself” to the team, who all had expert software skills.
Later, when I did my interview for the full-time job, it was more casual because I had been an intern. I was asked to pitch a “strain recommendation algorithm” in a whiteboard session, and then explain how I would implement the software pipelines to make it a reality. It was somewhat like an academic job talk, in that I was pitching a big idea that could be a multi-year, multi-person project. I spent tens of hours preparing for that pitch.
A more typical interview has three phases. First, there is a phone screen with Human Resources, and maybe a call from the hiring manager. Next, there is a Python coding screen with a live coding challenge. After that, we invite the candidate in to give a ~40-minute presentation on a technical topic of their choice and to have five hour-long interviews/chats with folks on the team. Each of those team members is tasked with evaluating a different aspect of your qualifications, motivations, and compatibility with the team. Any interview day like that can be stressful and tiring, but Zymergen tries to keep the vibe friendly.
For a deep dive into Janet’s work, check out her talk, Programming Microbes Using Python, at PyCon 2018.
Can you tell us about your current responsibilities? What is a typical day or week like in your role?
I build machine learning tools and pipelines for strain optimization. We extract machine learning features to represent the strains we’ve built, layer in additional features from bioinformatics and public data, organize the data into formats that machine learning can consume, and apply algorithms.
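To make the "extract features, layer in annotations, organize into a matrix" steps a little more concrete, here is a minimal Python sketch. It is not Zymergen’s code: it assumes, hypothetically, that each strain is described only by the set of genes it edits, and that a simple gene-to-pathway lookup stands in for the bioinformatics and public-data features. The function name `build_feature_matrix` and the gene and pathway labels are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def build_feature_matrix(
    strains: Dict[str, Sequence[str]],
    gene_annotations: Dict[str, str],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Encode strains as rows of a numeric feature matrix (illustrative only).

    Assumes each strain is described solely by the genes it edits, and that
    `gene_annotations` is a hypothetical gene -> pathway lookup standing in for
    richer bioinformatics or public-database features.
    """
    # Vocabulary of every edited gene seen across all strains (one-hot columns).
    genes = sorted({g for edits in strains.values() for g in edits})
    # Annotation terms that touch at least one edited gene (count columns).
    terms = sorted({gene_annotations[g] for g in genes if g in gene_annotations})
    columns = [f"edit:{g}" for g in genes] + [f"term:{t}" for t in terms]

    strain_ids = sorted(strains)
    rows: List[List[int]] = []
    for sid in strain_ids:
        edits = set(strains[sid])
        # 1 if the strain carries an edit in this gene, else 0.
        one_hot = [1 if g in edits else 0 for g in genes]
        # How many of the strain's edits fall into each annotation term.
        term_counts = [
            sum(1 for g in edits if gene_annotations.get(g) == t) for t in terms
        ]
        rows.append(one_hot + term_counts)
    return strain_ids, columns, rows
```

Each row is one strain and each column is one named feature, which is the tabular shape most machine learning libraries expect; a real pipeline would presumably add many more feature sources, plus the measured performance targets, alongside it.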
The majority of my work has focused on platform-level projects that can be leveraged across programs. Thus, most of my time is spent scoping and executing large bodies of work as part of a team, using Jira, Git, and wikis. About half of my time is spent on project planning and communication, and the other half is hacking, mostly in Python. We use traditional software engineering workflows, where there’s a 1:1 mapping between a sprint ticket and a feature branch. We write unit and integration tests, then do code review and eventually merge into the master branch.
As new data types arise and new biological perspectives emerge, they often catalyze exploratory, ad hoc work, which we do in Jupyter notebooks. These explorations sometimes mature into products that we formalize via software engineering.
I am also responsible for looking ahead to identify trends and innovations that will strengthen our strain recommendation systems. Thus, I also spend a few hours per week reading primary literature to make sure we are aware of cutting-edge tools that may be applicable to our problems.
What do you enjoy about your current job and work environment?
I love that data science brings together people who are super motivated; this is especially true for data science in the scientific domain. The field changes quickly, which selects for people who enjoy challenging problems and drinking from a firehose of information. My passion for continual learning makes this role a good fit. One week I can dig deep on the technical details of an algorithm, and another I can push my software engineering skills by deploying a new idea.
What are some of the challenging aspects of your job? Is there anything you wish you had known about your job or industry before joining?
Our goal (optimizing organisms using algorithms) is HARD! The design space (the imagined set of all possible genotypes) is enormous, and the fitness landscape we are optimizing over is complex (e.g., riddled with epistasis, where the effect of one edit depends on which other edits are present). We also have to be extra thoughtful about how we use our data in a machine learning context, since some datasets are small and the individual data points in a dataset share evolutionary trajectories (and thus are not independent). This makes the problem challenging, but that challenge is also what makes the work so engaging.
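One common way to respect that non-independence when evaluating a model is grouped cross-validation: strains that share a lineage are kept entirely in the training folds or entirely in the test fold, so the model is never scored on close relatives of strains it was trained on. The sketch below is a stdlib-only Python illustration, assuming each strain carries a hypothetical `lineage` label (for example, its parent strain); the name `group_kfold` is illustrative, and the approach is similar in spirit to scikit-learn’s `GroupKFold` rather than a description of Zymergen’s method.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def group_kfold(
    strain_ids: Sequence[str],
    lineage: Dict[str, str],
    n_splits: int,
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) splits in which no lineage appears on both sides.

    `lineage` is a hypothetical strain -> lineage-label mapping (e.g. parent strain).
    """
    # Collect strains by lineage so related strains always move together.
    groups: Dict[str, List[str]] = defaultdict(list)
    for sid in strain_ids:
        groups[lineage[sid]].append(sid)
    if n_splits < 2 or n_splits > len(groups):
        raise ValueError("n_splits must be between 2 and the number of lineages")

    # Greedily place the largest lineages first into the currently smallest fold,
    # which keeps fold sizes roughly balanced.
    folds: List[List[str]] = [[] for _ in range(n_splits)]
    for _, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(members)

    # Each fold takes one turn as the held-out test set.
    for i in range(n_splits):
        test = folds[i]
        train = [sid for j, fold in enumerate(folds) if j != i for sid in fold]
        yield train, test
```

With a lineage map such as `{"a1": "A", "a2": "A", "b1": "B"}`, every strain descended from `A` lands in the same fold, so a model’s test score is less likely to be inflated by near-duplicate relatives in the training set.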
Academics often underestimate the importance of having strong software engineering skills and collaborative coding skills in industry. Industry requires more polished software than academia, and the work is often bigger in scope. For example, we often write complex pipelines rather than one-off scripts/analyses. Moreover, industry often views code produced in academia as hacky and bug-prone.
Academics should invest hundreds of hours in advanced Git and software engineering skills if they want to be effective in industry. Landing an internship is the best way to shine a light on which skills you need to develop. Internship experience also increases your marketability to industry, because it is likely to have boosted your collaborative coding skills relative to those of a typical academic candidate.
What’s changing in your industry? Are there any future trends we should be aware of?
Machine learning innovations move quickly, and our ability to generate data is growing rapidly. Advances in wet-lab technologies like CRISPR, cell sorting, and metagenomics are constantly changing the nature of our data science challenges and opportunities.
What activities, internships, or organizations would you recommend someone get involved with to help them break into this field?
Internships are key if you want to work in industry. Reach out to companies, even if they don’t list available internships, because they may be able to open a position with you in mind. Industry likes to use internships as recruiting pipelines, and a three-month internship is a great way to recruit.
My second internship (at Arzeda) was created because I expressed persistent interest and passion in the company — they did not have an open position when I reached out. I truly feel like the momentum of my career picked up from that one event!
Ideally your internship and/or first job is on a team that is large enough to inspire your continual learning process. Being around people with diverse skills highlights what different backgrounds can bring to a problem, thus inspiring learning.
Do you have any final words of advice for those navigating these career questions? Is there anything you would have done differently given what you know now?
My top advice for students interested in industry is to regularly invest time in networking. Spending an hour a week trying to make connections is one of the best investments you can make in your future. Make yourself an email reminder to spend one hour a week on it, and actually do it! It’s never been easier, now that video calls are typical.
Do some research on the people you reach out to, especially if they offer you their time for a 1:1 chat. At the end of successful conversations, ask that person if they know anyone in their network that could be a good fit for a next informational interview. This is the fastest way to expand your network.
Also, express your gratitude for your network. If someone invests time in you by doing an informational interview, it’s nice to send them an old-fashioned thank-you card (and a nice bar of chocolate can’t hurt). In addition to being polite, such gestures really help you stick in their memory.
Good luck with your journey!