Dec 12, 2018

AI in the Enterprise: Challenges and Opportunities (Part II: Challenges)

At Zymergen, we apply AI and machine learning techniques to many aspects of our high-throughput microbial genome assembly and testing systems and practices. Aaron Kimball, our CTO, offers his thoughts on the lessons we’ve learned on this AI journey, and how they generalize to a broader business context. In part one, we discussed the advantages AI can bring to organizations. Here in part two, we’ll dive into the challenges and costs of its implementation.

The challenge of implementing AI

The benefits described in part I are applicable to any number of domains of human effort and industry. In any workplace, there are probably a dozen or more aspects that could be revolutionized by AI — tedium eliminated, decisions accelerated, revenues raised. This is easier said than done. Implementing AI and taking advantage of it requires that organizations overcome a number of challenges. We’ll now explore a few reasons to be cautious, and areas of risk to plan for and mitigate.

Data integration is necessary and difficult

Taking advantage of AI requires that AI system implementers integrate all the relevant data. Humans are amazing at integrating data. An analyst can overhear something in the breakroom — say, news about an equipment failure at a remote production facility — and later that afternoon when crunching the numbers for next month’s production forecast, adjust for that news on-the-fly, perhaps without even thinking hard about it.

AI systems cannot capture that ad hoc wisdom unless it’s quantified and inserted in a data pipeline, fed into a central data store, and then connected into the model itself. Even being resident in the data store isn’t enough; the AI system needs to know to “look” at a given feature. Each table or column added to the data store must be subsequently identified to each prediction model as relevant.
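One way this registration step often looks in practice is an explicit feature list per model: a new column in the warehouse is invisible to a model until someone adds it to the model’s configuration. The sketch below illustrates that idea with invented model and column names; it is not Zymergen’s actual pipeline code.

```python
# A model only "sees" features explicitly registered for it. Adding a new
# column to the data store changes nothing until a config like this is
# updated. All names here are hypothetical.
MODEL_FEATURES = {
    "yield_forecast": ["temp_c", "pressure", "strain_id"],
    "cost_forecast":  ["feedstock_price", "run_hours"],
}

def select_features(record, model_name):
    """Project a raw record down to the features a given model knows about."""
    return {k: record[k] for k in MODEL_FEATURES[model_name]}

row = {"temp_c": 21.5, "pressure": 101.2, "strain_id": "Z42",
       "feedstock_price": 9.9, "run_hours": 14, "new_sensor": 0.7}

# "new_sensor" is present in the record but ignored until it is registered.
subset = select_features(row, "yield_forecast")
```

The point of the explicit list is that forgetting this step silently drops information: the data is "in the warehouse" but never reaches the model.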

Adding features to a model is itself a process and can take hours to days, especially when factoring in software development and data science practices like backtesting, code review, and deployment cycles.

In a well-instrumented organization (be it a factory full of sensors or an insurance company with many data channel partners), there can be thousands of individual streams of data or metrics. Getting these data streams to converge on a single data warehouse or data lake is a significant engineering challenge. Once the data is aggregated, the harder work begins: ensuring it is correct and complete. A record containing a thousand elements is not helpful when, in any given record, 50% of those data points are blank (and not always the same 50% record-to-record). AI is a data engineering problem as much as it is a data science one. At Zymergen, we have as many people, or more, working on data integration (in various capacities) as we do on data science.
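A first step in confronting this kind of sparsity is simply measuring it. The sketch below, with invented column names and toy values, shows one common pattern for auditing missingness per column and per record with pandas before deciding which records are usable for training.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table of process metrics; names and values are invented.
records = pd.DataFrame({
    "temp_c":    [21.5, np.nan, 22.1, np.nan],
    "pressure":  [np.nan, 101.2, 100.9, np.nan],
    "flow_rate": [3.1, 3.0, np.nan, 2.9],
})

# Fraction of missing values per column, and per record.
per_column = records.isna().mean()
per_record = records.isna().mean(axis=1)

# Keep only records that are mostly complete; the 50% threshold is a
# judgment call, not a universal rule.
usable = records[per_record < 0.5]
```

Note that the blanks fall in different columns for different records, so no single column can be dropped to fix the problem, which is exactly the situation described above.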

Building a high quality system to address this challenge is expensive; pay scales vary significantly by locale and experience level, but a report by the University of Wisconsin shows that data scientists can earn $85,000-$170,000, and data engineers can earn $100,000-$165,000. Especially after including benefits and overhead, payroll for a modestly-sized team of five data scientists and five data engineers (plus a manager) will be a significant investment. On top of that, organizations should factor in the increased demand for cloud computing, which can cost tens of thousands of dollars per month in computing and storage fees, depending on the size of the data and the nature of the analyses or machine learning tasks being performed. One or more devops engineers will also be necessary to manage the cloud infrastructure, with an average salary of $133,000. Less quantifiable, but no less real, are the costs of retraining other staff to use delivered AI systems, and the added overhead cost of performing more thorough data collection throughout the business.

Consistently delivering accurate and complete records is a constant challenge, especially when some data is generated or collected by humans. People frequently omit data fields they deem irrelevant, especially when they are concerned with answering a specific question or “local” problem that may not require such complete data coverage. Attempting to then generalize a collection of locally-collected data sets into a longitudinal machine learning system requires instilling both individual discipline and data capture compliance processes in the organization.

By contrast, humans are great at operating on incomplete data and making decisions about how to move forward. Subject matter experts looking at incomplete input data rely on their own prior experiences to intuit reasonable outcomes and choices all the time. To move to AI is to relinquish that subjectivity and commit fully to data-driven decision-making. When data is unavailable, the system cannot fall back on outside expertise or opinion, however readily a human could supply it, to make a sensible decision or recommendation. The organizational choice to move to a decision system built on AI is a conscious choice to require more thorough data collection, greater process fidelity, and increased rigidity in all aspects of the business that lead to that decision point.

You get what you ask for

AI systems are often composed as a set of predictive models, wherein each model makes predictions for one narrow facet of a larger business concern. Models must be trained to predict answers to the right questions, and determining those questions and defining them precisely is the work of several stakeholders across the business, not just technologists.

Understanding what truly drives value, and optimizing for that, is key to using AI to raise the bottom line. The cautionary tale of the paperclip maximizer is a thought experiment about how an AI system tasked with maximizing fulfillment of office supply inventory can wreak havoc through the unanticipated consequences of a poorly-chosen objective function. “Generalized” AI (with an artificial sentience), with the potential for sowing chaos like that described in the paperclip maximizer example, is decades or more away — if it’s even possible to build at all. Nonetheless, objective functions should be carefully chosen. Consider systems aimed at reducing the cost of inputs to a process. There may be multiple possible collections of inputs to a process with different potential payoffs for their outputs. An AI system optimized for lowest input cost may consistently deprioritize using higher-cost inputs that stochastically produce much higher-value output, leading to lower aggregate revenue. As with all use of metrics, “you are what you measure,” so defining the right objective function for an AI algorithm is critical.
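The input-cost example above can be made concrete with a toy calculation. The numbers below are invented purely for illustration: two candidate feedstocks, where the objective "minimize input cost" and the objective "maximize expected profit" select different inputs.

```python
# Toy comparison of two objective functions for choosing process inputs.
# Names and numbers are invented for illustration only.
inputs = [
    # (name, input_cost, expected_output_value)
    ("cheap_feedstock",   10.0, 12.0),   # expected profit:  2.0
    ("premium_feedstock", 25.0, 40.0),   # expected profit: 15.0
]

# Objective 1: minimize input cost -- selects the cheap option.
by_cost = min(inputs, key=lambda x: x[1])

# Objective 2: maximize expected profit (value minus cost) -- selects
# the premium option, even though its upfront cost is higher.
by_profit = max(inputs, key=lambda x: x[2] - x[1])
```

The system is "correct" under either objective; the difference in aggregate revenue comes entirely from which question the model was asked to answer.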

That having been said, it is exceedingly unlikely that teams implementing AI will identify the correct objective function on the first iteration. Organizations need to monitor the outcomes of AI-driven projects, and refine them to drive improved outcomes over time. Such a refining process is hard: retooling a running AI system to ask a different question or modify its objective function is a heavyweight challenge, involving significant engineering and data science effort; it may also require collecting and integrating additional data streams into the data warehouse. Contrast this with making changes in a conventional (non-AI-driven) marketing organization. Were the CMO to decide, “we should be targeting our advertising by income, not age,” a marketing analytics team could begin work on a new targeting pattern immediately. Whereas an analyst can modify a pivot table in Excel or a SQL query in a customer database within hours, equivalent changes to an ML system take days, weeks, or months.

AI is automation for decision analysis, and automation is a double-edged sword. An automated task in manufacturing is one that can run at high throughput with minimal human involvement. But modifying an automated manufacturing task becomes an engineering challenge, which is much slower to execute than simply retraining personnel. Tesla famously relied on too much automation, according to Elon Musk. Changing the Model 3 assembly line proved painful, and his final assessment is that “humans are underrated.”

The same is true for AI. Human analysts can tweak Excel models, edit SQL queries, or modify dashboards quickly to arrive at different kinds of conclusions. By contrast, if many decisions are automated and tightly locked together in an AI system, it can be painful, expensive, or slow to uncouple them.

Many AI systems are implemented, either consciously or by virtue of incompleteness, as “human-in-the-loop”: decisions from one stage of a pipeline are presented to a human before being fed as inputs into the next stage of the pipeline. Each of these handoff points represents an opportunity for a thoughtful person to sanity-check the output of the previous stage, or inject manually-derived input from an adjusted or ad hoc process. At the same time, these handoff points also require labor to be performed by process experts, and also represent an opportunity for delay; if one of those experts has a day full of meetings, the pipeline can stall.
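The handoff pattern described above can be sketched in a few lines. This is a deliberately minimal illustration, not a real pipeline framework: the stage functions, the 1.1 multiplier, and the review step are all hypothetical.

```python
# Minimal sketch of a human-in-the-loop handoff between two pipeline
# stages. All function names and values here are hypothetical.
def stage_one(raw):
    """First model stage: produce a candidate forecast from raw input."""
    return {"forecast": raw * 1.1}

def human_review(result, approve):
    """Handoff point: a reviewer accepts the output or blocks it.

    In a real system the reviewer could also substitute an ad hoc value;
    a rejection here stalls the pipeline until someone acts.
    """
    return result if approve else {"forecast": None}

def stage_two(reviewed):
    """Second stage: consume the (possibly vetoed) upstream result."""
    f = reviewed["forecast"]
    return None if f is None else round(f, 2)

out = stage_two(human_review(stage_one(100), approve=True))
```

The benefit and the cost of the pattern are visible in the same place: `human_review` is where a sanity check happens, and also where the pipeline waits if the reviewer is in meetings all day.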

A fully-connected, fully-automated system is hard to achieve as a matter of practice, and would require sustained engineering effort over many years to build. Yet organizations still tend to strive toward connected pipelines within each functional area, which leads to a more rigid system. Choosing the right degree of automation is more art than science, and the points at which to insert human intuition, flexibility, and oversight into a process must be chosen deliberately and thoughtfully.

Using AI requires a change in mindset

AI is as much a technical shift for companies as it is a process or change management shift. Of course, AI doesn’t happen without data scientists and software engineers building systems and shipping code. But beyond the technical challenges (and pitfalls described above), the entire organization must buy into the concept of AI-driven decision-making. When human judgment disagrees with a proposal from an AI algorithm to target a specific market sector or adjust prices a particular way, will the team believe it and act upon it? What about when the system, as with many deep learning models, is difficult or impossible to analyze or explain?

Even at Zymergen, scientists sometimes struggle to accept counter-intuitive or unconventional proposals emitted by AI systems, even when, on average, following the AI models has a statistically measurable benefit. Organizations with a long institutional history of relying on human intuition or observable analytic processes will be challenged to let go of the wheel and let an AI system drive. Starting with AI systems that have limited scope and impact may help create trust in AI tools before launching a more ambitious initiative. But this must be paired with continued leadership from high-level management that wins buy-in at all levels to be successful.

In conclusion…

AI systems have at their command fast calculation abilities, deep recall of data, and powerful statistical models. These tools can help organizations across many disciplines make more data-driven decisions and improve business outcomes. AI can increase the frequency or speed of decision-making and do so more consistently than organizations that rely on key individuals with subject matter expertise.

But implementing AI has costs that must be weighed: in addition to the high upfront price of the engineering cost of setting up AI systems and machine learning pipelines, organizations using AI are making a conscious choice to focus their decisions around data collected by more expensive, and less flexible, means than ad hoc human-driven data analysis. Organizations that require frequent adaptation may find that depending on AI has locked them into processes that are too difficult to modify in a timely fashion. In addition, mistrust or backlash from employees pose cultural and communication challenges to AI adoption that require confident and steady leadership to overcome.

At Zymergen, we were able to reduce human workload and improve outcomes on critical processes in a way that helps us scale our business. Certain of our objectives would not be possible without these gains in efficiency or accuracy. We chose to invest in data science capabilities that produced tools and AI models because they are necessary to overcome certain bottlenecks of time and labor in our operating model, or provide accuracy improvements with a high ROI, and we expect these performance gains to grow and compound over time. It is clear that this is a critical part of the path to our growing business’s future.

When considering a business commitment to AI, leaders must be thoughtful about where and how to implement it. After all, “AI first” does not mean “AI everywhere,” or “AI at any cost.” But given its potential to systematize operations, drive better outcomes, and increase speed, AI is a tool that cannot be overlooked, and it represents a multitude of potential opportunities.