Composing an AWS Reservation Portfolio
How we combine Amazon Web Services discounts to provide value and flexibility as we grow
Zymergen uses Amazon Web Services (AWS) as its primary computation platform. Using AWS is less expensive than running your own computer data center, but it can still add up to significant sums. To help customers control costs, AWS offers discounts on some of its compute services. It packages these discounts as “reserved instances” (“RIs”), which customers can “buy” from AWS. There are many types of RIs, and they embody commitments to buy matching services from AWS. (A good introduction to AWS pricing can be found in this two-part article series.)
As AWS usage has grown at Zymergen, we have looked for the best discounts available. This article describes the method we use.
How Reserved Instances Work
AWS reserved instances can be confusing. They’re not instances at all; rather, they are an agreement to pay for specific services over a period of time. Like a lease, or a wine club membership, you agree to a schedule of payment, and in exchange you get a discounted price. The agreement begins when you “buy” a reserved instance, which starts a payment schedule and a discounted price that last either 12 or 36 months.
AWS RIs reduce the usage charges on two of its most popular computing platforms: “EC2” virtual servers and “RDS” database servers. Other AWS services, like storage, networking, and other specialized services, are not eligible for RI discounts. We determined that about 60% of our AWS charges were for either EC2 or RDS, so discounts could apply to most of our charges.
RIs are not attached to any particular EC2 instance. Rather, each hour AWS applies the RI discounts to whichever running instances match the RIs. Whether an RI matches a specific instance depends on several factors, including:
- The location of the EC2 instance. RIs will apply only to a particular AWS region, or an availability zone within that region.
- The architecture of the EC2 instance. RIs will apply to a particular class of instances based on the CPU generation and other features of the instance (presence of GPU, type of networking, and type of storage). RIs generally apply to any instances within a class, with flexibility as to the specific size of the instance. For example, with a single RI purchased for an m4.xlarge instance, the RI discount will apply to one m4.xlarge instance, but if none are running, the discount will apply to two m4.large instances.
- Operating system software. RIs are limited to one operating system: generic Linux, Windows, or Red Hat.
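The size flexibility described in the architecture bullet can be illustrated with the “normalization factors” AWS publishes for size-flexible regional Linux RIs: each size counts as a number of units (large = 4, xlarge = 8, and so on), so one m4.xlarge reservation covers 8 units of m4 usage in a given hour. The sketch below is a minimal model under that assumption; the helper names `units` and `hourly_coverage` are hypothetical, it considers a single region and operating system, and it does not reproduce AWS’s billing engine.

```python
# Normalization factors AWS publishes for size-flexible regional RIs
# (subset shown; a "small" instance counts as 1 unit).
NORMALIZATION_FACTOR = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2,
    "large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32,
    "8xlarge": 64, "10xlarge": 80, "16xlarge": 128,
}


def units(instance_type: str, count: int = 1) -> float:
    """Normalized units for `count` instances of a type such as "m4.xlarge"."""
    size = instance_type.split(".", 1)[1]
    return NORMALIZATION_FACTOR[size] * count


def hourly_coverage(reserved: dict[str, int], running: dict[str, int]) -> dict[str, dict[str, float]]:
    """Hypothetical helper: for one hour, compare reserved and running units
    per instance family (e.g. "m4"), assuming one region and one OS."""
    families = {t.split(".", 1)[0] for t in [*reserved, *running]}
    result = {}
    for family in sorted(families):
        ri_units = sum(units(t, n) for t, n in reserved.items() if t.split(".", 1)[0] == family)
        used_units = sum(units(t, n) for t, n in running.items() if t.split(".", 1)[0] == family)
        result[family] = {
            "reserved": ri_units,
            "running": used_units,
            "covered": min(ri_units, used_units),  # discount can't exceed either side
        }
    return result


# One m4.xlarge RI (8 units) against two running m4.large instances (4 units each).
print(hourly_coverage({"m4.xlarge": 1}, {"m4.large": 2}))
```

With two m4.large instances running, all 8 units of the RI are used; with only one m4.large running, 4 of its 8 units go unused for that hour.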
When RIs were introduced nine years ago, each one applied to a single location, architecture, and operating system for its full duration. In September 2016, AWS introduced a 36-month “convertible” RI, and in November 2017 it introduced a 12-month version; both versions apply only to EC2, not RDS. Convertible RIs permit any number of changes to location, architecture, and operating system during their lifespan, as long as the dollar value of the RI never goes down as a result of a change.
Finally, RIs are sold with several payment options; the two we focused on are these. “Upfront” RIs are prepaid: all charges are paid when the RI is purchased, and nothing more is charged month to month over the 12- or 36-month term. By contrast, “no upfront” RIs are charged at a fixed price, month by month.
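The exchange rule for convertible RIs (the dollar value may never go down) can be expressed as a small calculation: when trading a convertible RI for a cheaper target type, you need enough target RIs to match or exceed the source value, and any excess is paid as a “true-up.” This is a minimal sketch of that invariant only; the function name is hypothetical, and it does not model how AWS values the remaining term of an RI. Amounts are in integer cents to avoid floating-point rounding.

```python
def exchange_quantity(source_value_cents: int, target_unit_value_cents: int) -> tuple[int, int]:
    """Smallest number of target convertible RIs whose combined value is at
    least the source value, plus the extra ("true-up") amount owed.
    Hypothetical helper illustrating the "value never goes down" rule."""
    if source_value_cents < 0 or target_unit_value_cents <= 0:
        raise ValueError("source must be non-negative and target positive")
    count = -(-source_value_cents // target_unit_value_cents)  # ceiling division on ints
    true_up = count * target_unit_value_cents - source_value_cents
    return count, true_up


# Trade an RI worth $1,000.00 for a target type worth $300.00 each.
count, true_up = exchange_quantity(100_000, 30_000)
print(count, true_up / 100)
```

In this made-up example, the exchange requires 4 target RIs and a $200.00 true-up, because 3 target RIs ($900.00) would lower the value of the commitment.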
With all these factors, there are many types of RIs available from Amazon. At Zymergen, we use almost exclusively one operating system, just a few system architectures, and little other variation, and we are confident that none of these factors will change quickly; an AWS customer with more variety, or less control over these factors, might have more trouble using RIs. We calculated the discounts RIs could give us, and they were substantial. The tables below show the discounts that reservations provide today for every $100 in on-demand instance charges.
Examples of Reserved Instances available for EC2 (virtual servers). Discounts vary by system architecture; these numbers are an average among the architectures Zymergen uses most frequently.
Examples of Reserved Instances available for RDS (database servers). Discounts vary by system architecture; these numbers are an average among the architectures Zymergen uses most frequently.
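Figures like the ones in these tables come from simple arithmetic: spread any upfront payment over every hour of the term, add the hourly charge, and compare the result with the on-demand rate. The sketch below assumes the RI is matched by a running instance in every hour of its term and uses a 365-day year; the function names are hypothetical, and the prices in the usage lines are made up for illustration, not taken from AWS price lists.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap days ignored


def effective_hourly_rate(upfront: float, hourly: float, term_years: int) -> float:
    """Spread the upfront payment evenly over every hour of the term."""
    return upfront / (HOURS_PER_YEAR * term_years) + hourly


def savings_per_100(on_demand_hourly: float, upfront: float, hourly: float, term_years: int) -> float:
    """Dollars saved per $100 of on-demand charges, assuming the RI is
    matched by a running instance in every hour of its term."""
    rate = effective_hourly_rate(upfront, hourly, term_years)
    return 100 * (1 - rate / on_demand_hourly)


# Made-up prices against $0.20/hour on demand.
print(round(savings_per_100(0.20, upfront=0.0, hourly=0.13, term_years=1), 1))     # no upfront, 1 year
print(round(savings_per_100(0.20, upfront=2628.0, hourly=0.0, term_years=3), 1))  # all upfront, 3 years
```

If an RI sits idle for some hours, the real savings are lower than this figure, which is one reason coverage is checked before buying more.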
In EC2, the difference between upfront and no-upfront pricing was small, ranging from 2% to 5% per year. In effect, by paying upfront we would be advancing money to AWS at 2–5% interest. As a result, we chose no-upfront EC2 RIs. For RDS, the difference was more substantial, especially for the deeply discounted three-year RI, so we kept all the options open.
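The “advancing money to AWS” comparison can be made precise by finding the interest rate at which an upfront payment equals the present value of the equivalent no-upfront monthly payments. A minimal sketch, assuming payments billed at the end of each month and monthly compounding; the function names are hypothetical, and real inputs would come from the AWS price list for a specific instance type.

```python
def present_value(monthly: float, months: int, monthly_rate: float) -> float:
    """Value today of `months` payments of `monthly`, each paid at month end."""
    return sum(monthly / (1 + monthly_rate) ** k for k in range(1, months + 1))


def implied_annual_rate(upfront: float, monthly: float, months: int) -> float:
    """Annual interest rate at which prepaying `upfront` is equivalent to
    paying `monthly` for `months` months. Found by bisection on the monthly
    rate, since present value falls steadily as the rate rises."""
    if upfront >= monthly * months:
        return 0.0  # prepaying offers no discount, so there is no positive rate
    lo, hi = 0.0, 1.0  # bracket the monthly rate between 0% and 100%
    for _ in range(100):
        mid = (lo + hi) / 2
        if present_value(monthly, months, mid) > upfront:
            lo = mid  # payments still worth more than the upfront price: rate is higher
        else:
            hi = mid
    return (1 + lo) ** 12 - 1  # compound the monthly rate to an annual rate


# Hypothetical 3-year prices: $50/month no upfront vs. $1,700 all upfront.
print(f"{implied_annual_rate(1700.0, 50.0, 36):.1%}")
```

This is the same quantity a spreadsheet’s IRR function would compute from the two payment streams; bisection is used here only because it needs no external library.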
A Portfolio Approach
Zymergen’s need for AWS compute services is sure to vary over time, as our company evolves, and as new types of instances are offered by AWS. As a result, some flexibility is needed in the choice of RIs to match our needs. At any time, we will have purchased a set of RIs — a portfolio — that can be balanced to fit our usage.
In 2016, we experimented with buying some RIs. At the time, we chose the most deeply discounted option, non-convertible 36-month RIs, for a small portion of our total usage. Going forward, we knew we wanted a mix of 12- and 36-month durations. The main advantage of 12-month RIs is that they expire relatively soon: if our demand for AWS instances fell (e.g., because we added services from an AWS competitor, or shifted to AWS services that do not offer RIs), our commitment to AWS would shrink in the near term. The 12-month RIs save less money, but they act as a hedge on our commitment to AWS. We also found convertible RIs attractive: Amazon introduces new instance types every year, and we like having the option to shift some of our usage without penalty.
Here’s a representation of how three different kinds of RIs might be combined in a portfolio:
- The green boxes represent the 36-month RIs that we have already purchased. They provide the deepest discounts, with the least flexibility.
- The orange boxes represent 12-month RIs that could be bought for some of our load.
- The blue boxes represent a single 36-month convertible reservation, with 3 conversions shown during its term. A new convertible RI can be started in any month, or an existing one can be modified; this diagram shows modification, so the total commitment still ends 36 months after the original purchase.
Possible portfolio combination of three different kinds of Reserved Instances.
This model resembles some financial portfolios (like a bond portfolio): some commitments expire in most months, and quite a few expire within any given year. Expiring commitments can be renewed, or modified as our needs change. AWS regularly introduces new RIs and new instance types, so we can adopt those as they are released. In addition, if we shift our AWS usage away from EC2 and RDS to products that don’t have RI discounts, such as the Elasticsearch and Batch services, we know that our RI commitments can gradually be reduced.
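The bond-like schedule of expiring commitments can be tracked with a very small data model: record when each reservation started, how long it runs, and how much it commits per month, then group the commitments by the month in which they expire. A minimal sketch; the `Reservation` class, its fields, and the sample portfolio are hypothetical, and months are counted as integers from an arbitrary starting point.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical record for one RI commitment in the portfolio."""
    name: str
    start_month: int      # months since an arbitrary starting point
    term_months: int      # 12 or 36
    monthly_value: float  # dollars committed per month

    @property
    def end_month(self) -> int:
        return self.start_month + self.term_months


def expiration_ladder(portfolio: list[Reservation], now: int, horizon: int = 36) -> dict[int, float]:
    """Monthly commitment value expiring N months from `now`, for N = 1..horizon."""
    ladder: dict[int, float] = {}
    for r in portfolio:
        months_left = r.end_month - now
        if r.start_month <= now and 0 < months_left <= horizon:
            ladder[months_left] = ladder.get(months_left, 0.0) + r.monthly_value
    return dict(sorted(ladder.items()))


portfolio = [
    Reservation("ec2-3yr-fixed", start_month=0, term_months=36, monthly_value=100.0),
    Reservation("ec2-1yr", start_month=10, term_months=12, monthly_value=50.0),
    Reservation("rds-1yr", start_month=12, term_months=12, monthly_value=25.0),
]
print(expiration_ladder(portfolio, now=15))
```

Summing the entries within the first 12 months of the ladder gives the “how much expires within a year” figure that an assessment can compare against its target.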
Managing the Portfolio
The relative size of each of our RI commitments should change as time goes on, depending on our understanding of future needs. We decided we needed a periodic procedure to assess our current usage, and to add or convert RIs. In this procedure, each of the choices has a metric (like “50%” or “80%”), which can be adjusted as needed from one assessment to the next:
- Decide whether to make an assessment. In the month before an assessment, some percentage of our instance usage will already be covered by RIs. If that number is high enough, perhaps 80%, there’s no need to make any changes.
- Choose a target size of RI adjustment. If RI coverage is currently far below the desired level, it’s probably best not to make too large an adjustment at one time, so that we spread out the expiration dates on the RIs. We decided to plan for each assessment to cover 50% of the uncovered usage.
- Choose a target partition between 3-year and 1-year commitments. This partition should combine a business judgement of how committed we are to AWS, with a technical judgement of how much we are locked into EC2 and RDS instances. The central question is, how many of our current commitments expire in the next 12 months? We decided that a minimum of 30% of the value of our commitments should expire within the next 12 months (we could set separate duration targets for RDS and EC2, but chose 30% for both).
- Choose a target partition between 3-year fixed and convertible EC2 commitments. Fixed RIs add about 6% to the discount level, but they are locked to one instance architecture (though not to one instance size), so they should be used when loads are generic and predictable. This is a mostly technical judgement, anticipating changes in architecture or design (new CPUs, memory-heavy instances for containerized loads, etc.). We set a target of 60% convertible and 40% fixed among the 3-year commitments.
- Decide on upfront or no-upfront payments. For EC2, the savings from upfront payment are low, so we choose only no-upfront EC2 RIs. For RDS, no-upfront is only available for 1-year RIs, and the upfront discount is more significant, so we buy 75% of our RDS RIs with upfront payment.
- Set a date for the next assessment. Each assessment takes time, to gather cost data and model usage, so doing one every month is likely to add too much overhead. However, waiting too long while usage grows will likely result in missed savings. At Zymergen, we chose to perform assessments every 2 months until our RIs covered 80% of our usage, and every 4 months after that.
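The assessment steps above can be condensed into one function whose parameters are the adjustable metrics (80% coverage, 50% of uncovered usage, 30% expiring within 12 months, 60/40 convertible and fixed, 75% upfront for RDS, a 2- or 4-month cadence). This is a sketch under simplifying assumptions, not Zymergen’s actual tooling: all amounts are monthly dollar equivalents of on-demand usage, every new 1-year RI is counted as expiring within 12 months, and `AssessmentPolicy`, `Plan`, `assess`, and `months_until_next_assessment` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssessmentPolicy:
    """Adjustable metrics from the procedure; defaults are the values in the text."""
    coverage_threshold: float = 0.80   # skip purchases if coverage is already this high
    adjustment_fraction: float = 0.50  # cover this share of uncovered usage per assessment
    min_expiring_share: float = 0.30   # share of commitments expiring within 12 months
    convertible_share: float = 0.60    # EC2: convertible share of new 3-year RIs
    rds_upfront_share: float = 0.75    # RDS: share of RIs bought with upfront payment


@dataclass
class Plan:
    """New RI purchases, in monthly dollars of on-demand usage covered."""
    one_year: float = 0.0
    three_year_convertible: float = 0.0
    three_year_fixed: float = 0.0
    upfront_share: float = 0.0


def assess(service: str, usage: float, covered: float, committed: float,
           expiring_12mo: float, policy: AssessmentPolicy = AssessmentPolicy()) -> Plan:
    """Hypothetical sketch of one assessment. `committed` is the current RI
    commitment; `expiring_12mo` is the part of it ending within 12 months."""
    if usage <= 0 or covered / usage >= policy.coverage_threshold:
        return Plan()  # coverage is high enough: no changes this time
    new = policy.adjustment_fraction * (usage - covered)
    # Buy enough 1-year RIs that at least min_expiring_share of the total
    # commitment (old plus new) expires within the next 12 months.
    shortfall = policy.min_expiring_share * (committed + new) - expiring_12mo
    one_year = min(new, max(0.0, shortfall))
    three_year = new - one_year
    if service == "ec2":
        convertible = policy.convertible_share * three_year
        return Plan(one_year, convertible, three_year - convertible, upfront_share=0.0)
    if service == "rds":
        # Convertible RIs are EC2-only, so all 3-year RDS RIs are fixed.
        return Plan(one_year, 0.0, three_year, upfront_share=policy.rds_upfront_share)
    raise ValueError(f"unknown service: {service}")


def months_until_next_assessment(coverage: float) -> int:
    """Every 2 months until RIs cover 80% of usage, every 4 months after that."""
    return 2 if coverage < 0.80 else 4


print(assess("ec2", usage=1000.0, covered=500.0, committed=500.0, expiring_12mo=100.0))
```

With these made-up inputs (50% coverage, 20% of the existing commitment expiring within a year), the sketch buys $250 of new monthly coverage: $125 as 1-year RIs to bring the expiring share back to 30%, and the remaining $125 split into $75 convertible and $50 fixed 3-year RIs.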
Caveats and Details
There are other ways to pay for EC2 instances at Amazon. One is “Spot” instances, which AWS offers when it has unused capacity after serving all reserved and on-demand instances. AWS provides them at low cost, but they are less reliable: AWS may shut them down when it needs the capacity for its other services. Spot instances are a good match for batch jobs, autoscaled services, or development instances that can be restarted without harm, and we use them heavily for software build and testing. Typical discounts for spot instances are 70% or more; from a cost point of view, they are always preferable to RIs.
We considered, but rejected, certain other EC2 RIs. “Partial Upfront” RIs are a fit for servers that may be utilized only part of the time during each month. “Scheduled” RIs are good for servers that are predictably required during particular weeks of the year, or hours of the day. RIs can also be purchased for “dedicated” EC2 instances, which run on physical machines dedicated to a single customer. None of these fit Zymergen’s needs at present. However, as needs change, or as AWS introduces new types of RIs, we will revisit our options.
One final wrinkle is the AWS “reserved instance marketplace.” AWS customers can sell unused EC2 RIs to each other through the AWS console. In principle, the marketplace could provide a low-cost way to change or abandon RI commitments. However, the volume of transactions on the marketplace is quite low. As a result, it can take a long time to dispose of an unused RI, and the seller may recover only a small amount of money. We have checked the marketplace during our assessments, but haven’t found any bargains yet. If you are looking for RIs with short or unusual durations, you may want to see what’s available on the marketplace.
We’ve been running our assessment process for about nine months and, so far, it covers our needs. We’ve gradually increased our RI coverage, cutting our total AWS bill by over 30%. Having decided on an overall strategy for continuously refining our RI portfolio, we can now perform iterative assessments fairly easily, while making sure that our costs stay down and the infrastructure mix remains flexible. Our AWS usage keeps growing as we handle more data and analyze it in more sophisticated ways, and we will keep doing these assessments going forward. The wide range of options AWS offers is powerful, but it can be confusing. We hope this blog post provides helpful ideas for others who are also thinking about matching their growth rates and rates of change to hardware commitments and compute budgets.
Ken Novak is a Senior Software Engineer on the Infrastructure team at Zymergen.