Recently I’ve been thinking about pricing models for serverless compute consumption in the context of my role at Google. Serverless, by definition, means that the end user does not have to manage virtual machines or a “server”: applications simply request compute resources in whatever dimension they need, whenever they need them. In theory at least, this results in a compute offering that is scalable, highly available, and cost-efficient (you only pay for what you use).
The burden of optimizing your application and making sure it runs as efficiently as possible passes from the customer to the cloud provider. The customer pays the cloud provider for this convenience, but the idea is that you save far more than you would otherwise spend.
On the cloud provider side, you’re presented with some interesting challenges. With serverless, you are promising customers elasticity, so a customer can quickly go from requesting 10 units of compute to 10,000 units ahead of a major event. You need enough physical machines/servers to back this up, but at the same time you don’t want to be sitting on a ton of unused capacity. You also have the cold-start problem (the delay while a serverless compute instance spins up before it can serve traffic), and users tend to be less forgiving of it in a serverless model.
Cloud companies charge for serverless in many different ways, and there is a lot of nuance to every model out there. It’s hard to compare apples to apples, but here are the major models (with a rough cost sketch after the list).
- AWS Lambda has pricing by region, architecture (x86 or ARM), duration (a cost per GB-second, where the GB figure comes from the memory you allocate and the seconds from execution time), and requests (a cost per 1M requests).
- AWS Fargate pricing is calculated based on the vCPU, memory, operating system, architecture, and storage resources used.
- Azure Container Apps are billed based on per-second resource allocation and requests. You pay for what you use based on the vCPU-seconds and GiB-seconds your applications are allocated, and you are also charged for the total number of requests processed each month.
- Azure Functions are billed based on per-second resource consumption and executions, i.e. the number of requests processed and the execution time measured in GB-seconds.
- Google Cloud Run has pricing tiers by region. You pay for what you use based on the vCPU-seconds and GiB-seconds your applications consume, as well as the number of requests processed. Google also offers Committed Use Discounts (CUDs) for Cloud Run.
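To make the comparison a bit more concrete, here is a back-of-the-envelope sketch of the common “resource-seconds plus requests” model that Cloud Run, Container Apps, and Lambda all roughly follow. The rates and the `estimate_monthly_cost` helper are placeholders I made up for illustration, not any provider’s published prices, and the math assumes one request per instance at a time (real bills vary with concurrency, minimum billing increments, free tiers, and region).

```python
# Rough cost sketch for the "resource-seconds + requests" serverless model.
# All rates below are illustrative placeholders, NOT published prices --
# check each provider's pricing page for current per-region numbers.

def estimate_monthly_cost(
    requests_per_month: int,
    avg_duration_s: float,
    memory_gib: float,
    vcpu: float,
    *,
    price_per_gib_second: float = 0.0000025,   # assumed rate
    price_per_vcpu_second: float = 0.000024,   # assumed rate
    price_per_million_requests: float = 0.40,  # assumed rate
) -> float:
    """Estimate monthly cost, assuming one request per instance at a time."""
    compute_seconds = requests_per_month * avg_duration_s
    memory_cost = compute_seconds * memory_gib * price_per_gib_second
    cpu_cost = compute_seconds * vcpu * price_per_vcpu_second
    request_cost = (requests_per_month / 1_000_000) * price_per_million_requests
    return memory_cost + cpu_cost + request_cost


if __name__ == "__main__":
    # e.g. 10M requests/month, 200 ms average duration, 0.5 GiB memory, 1 vCPU
    print(f"${estimate_monthly_cost(10_000_000, 0.2, 0.5, 1):.2f}")
```

Even this toy version shows why comparisons get murky: two providers can charge the same headline rate per GiB-second and still produce very different bills depending on how they round durations and count concurrent requests.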
Per Google’s guidance, “Committed use discounts (CUDs) for Cloud Run provide deeply discounted prices in exchange for committing to continuously use Cloud Run in a particular region for a one year term. These are ideal for workloads with predictable resource needs.” This is an interesting win-win for Cloud Run customers and Google: customers pay less if they plan their needs ahead of time, and Google can order physical machines based on those saner estimates. AWS has something similar with its Savings Plans and Reserved Instances.
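To see why the commitment math is attractive on both sides, here is a tiny illustrative calculation. The 17% discount and the spend figures are assumptions for the sake of the example, not Google’s actual CUD rates.

```python
# Illustrative comparison of on-demand vs. committed-use spend.
# The discount and dollar figures are assumptions, not quoted rates.

def committed_vs_on_demand(monthly_on_demand_spend: float,
                           committed_fraction: float,
                           discount: float = 0.17) -> tuple[float, float]:
    """Return (pure on-demand cost, cost with a commitment covering committed_fraction)."""
    committed = monthly_on_demand_spend * committed_fraction
    uncommitted = monthly_on_demand_spend - committed
    with_cud = committed * (1 - discount) + uncommitted
    return monthly_on_demand_spend, with_cud


# e.g. a $10,000/month baseline, committing to 80% of it for a year
print(committed_vs_on_demand(10_000, 0.8))  # (10000, 8640.0) with these assumptions
```

The flip side, of course, is that the committed portion is owed whether or not the workload materializes, which is exactly the forecasting burden the customer is taking back from the provider.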
AWS Lambda also has an interesting mode of operation called “Provisioned Concurrency.” In their words: “Enable Provisioned Concurrency for your Lambda functions for greater control over your serverless application performance. When enabled, Provisioned Concurrency keeps functions initialized and hyper-ready to respond in double-digit milliseconds. You pay for the amount of concurrency you configure, and for the period of time you configure it.” In other words, customers can pay to minimize the cold-start problem.
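Structurally, this adds a charge for the capacity you keep warm on top of the usual duration and request charges. Below is a minimal sketch of that extra term; the per GB-second rate is an assumption for illustration, not AWS’s published price.

```python
# Minimal sketch of the extra charge introduced by provisioned (kept-warm) concurrency.
# The rate is an assumed placeholder, not AWS's published price.

def provisioned_concurrency_cost(
    provisioned_concurrency: int,
    memory_gib: float,
    hours_enabled: float,
    *,
    price_per_provisioned_gib_second: float = 0.0000041,  # assumed rate
) -> float:
    """Cost of keeping `provisioned_concurrency` instances warm for `hours_enabled`."""
    seconds = hours_enabled * 3600
    return provisioned_concurrency * memory_gib * seconds * price_per_provisioned_gib_second


# e.g. keep 50 warm instances of a 1 GiB function for a 12-hour event window
print(f"${provisioned_concurrency_cost(50, 1.0, 12):.2f}")
```

Notice that this term scales with wall-clock time rather than with traffic, which is why it only makes sense when you know a latency-sensitive spike is coming.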
My overall takeaway is that there is room for creativity in serverless pricing even though the basics are the same (CPU and/or memory per second, varied by region, plus a flat charge for the number of requests). I don’t think we’ve explored all the possibilities for pricing models just yet. I also think this is nothing short of a nightmare for the average company to figure out when deciding which option and which cloud to choose. There is a lot of room for innovation here, and I think VCs and startups are recognizing it too.