This post, from the Cloud Strategy Unboxed series, looks at managing cloud finances effectively, also known as financial operations, or FinOps for short. The three main categories are tracking & associating, cost optimization, and monitoring & alerting.
On-premises IT requires significant CAPEX investment, which in turn demands a yearly budgeting exercise and multi-year planning. In the cloud, infrastructure is available immediately and on request, as and when needed, for as long as needed, and customers pay only for what they use. This shift opens up business agility and innovation opportunities, as ‘failing fast’ becomes possible, a strategy that is hard to pursue with CAPEX because of the up-front sunk cost.
Many organizations moving to the cloud struggle to achieve this benefit as they do not adapt their financial strategy to the new paradigms that the cloud offers, such as the pay-as-you-go model and on-demand provisioning.
The purpose of a FinOps strategy is to help optimize the cost of the cloud: both the operational cost of running the cloud platform and shared services, and the cost of each individual application. Part of this is moving from a CAPEX mindset to an OPEX one and using the many discounts and incentives the cloud providers offer. To drive this strategy, the Cloud Centre of Excellence (CCoE) includes a finance role responsible for establishing financial policy. This person works with the platform team to implement and automate enforcement where possible, and with the education and enablement teams to spread awareness.
Tracking & associating
Efficient cost control starts with clearly identifying resources and who is accountable for them. A resource is clearly identifiable within an organization if it is tagged according to guidelines set by the CCoE. These guidelines apply across all major cloud service providers, as they offer similar asset-tagging capabilities.
A tagged resource...
- is named
- has labels (tags) that specify the role and characteristics of a resource
- has a resource name (where applicable) that follows the defined naming convention
Establishing organizational conventions is a good start, but it does not guarantee that resources will be appropriately named and tagged. Automated processes need to be put in place to ensure conformance.
Automating the deployment of resources in the cloud is a best practice and a great way to ensure that tagging conventions are respected. Deployment pipelines can include checks that the requested resources have the required tags configured. Deployments without those tags are rejected.
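As a rough illustration of such a pipeline check, here is a minimal Python sketch. The required tag keys (`owner`, `cost-centre`, `environment`) and the resource dictionary layout are assumptions made up for this example; a real pipeline would read the convention from the CCoE guidelines and the resources from its deployment templates.

```python
# Hypothetical tagging convention; replace with the keys your CCoE defines.
REQUIRED_TAGS = {"owner", "cost-centre", "environment"}


def missing_tags(resource: dict) -> set[str]:
    """Return the required tag keys that are absent or empty on a resource."""
    tags = resource.get("tags", {})
    return {key for key in REQUIRED_TAGS if not str(tags.get(key, "")).strip()}


def validate_deployment(resources: list[dict]) -> list[str]:
    """Return one error message per non-compliant resource; an empty list means pass."""
    errors = []
    for resource in resources:
        missing = missing_tags(resource)
        if missing:
            errors.append(f"{resource['name']}: missing tags {sorted(missing)}")
    return errors


if __name__ == "__main__":
    requested = [
        {"name": "web-server-01",
         "tags": {"owner": "team-a", "cost-centre": "cc-100", "environment": "prod"}},
        {"name": "scratch-bucket", "tags": {"owner": "team-b"}},
    ]
    for error in validate_deployment(requested):
        print("REJECTED", error)
```

A pipeline step could fail the deployment whenever `validate_deployment` returns a non-empty list.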
When resources are tagged correctly, they can be grouped logically by tag in services such as AWS Billing and Cost Management, which in turn drives business intelligence. This applies to cloud-native as well as third-party solutions.
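To show what grouping by tag means in practice, the sketch below sums cost per tag value. The line-item structure and the tag key are hypothetical and stand in for whatever a billing export actually provides.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per value of one tag; untagged items are grouped under 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        group = item.get("tags", {}).get(tag_key, "untagged")
        totals[group] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up billing line items for illustration only.
    items = [
        {"resource": "vm-1", "cost": 120.0, "tags": {"cost-centre": "cc-100"}},
        {"resource": "vm-2", "cost": 80.0, "tags": {"cost-centre": "cc-100"}},
        {"resource": "db-1", "cost": 45.5, "tags": {"cost-centre": "cc-200"}},
        {"resource": "tmp-1", "cost": 10.0, "tags": {}},
    ]
    print(cost_by_tag(items, "cost-centre"))
```

The size of the `untagged` group is itself a useful signal of how well the tagging convention is being followed.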
Cost optimization
When it comes to optimizing organizational cloud spend, there are four areas we need to consider.
- Driving cost-effective architecture and decisions
- Rightsizing resources
- Minimizing idle costs
- Optimizing licenses
Driving cost-effective architecture and decisions
Cost-effective architecture and decision-making in the cloud can be achieved through meaningful KPIs. At a high level, there is little a finance team can do to implement code-based cost-saving guardrails or create financial dashboards. Through training, the finance team can be made aware of how to reduce costs in the cloud. Armed with this knowledge, they can design policies and KPIs to drive cost-optimization.
The CCoE platform team can assess the policies and implement automated controls where possible. KPIs address topics such as using autoscaling to reduce the size of servers, using managed services where possible, optimizing licenses, and using cloud discounts and incentives such as AWS Savings Plans and the AWS Migration Acceleration Program (MAP).
Effective use of modern architectures, such as serverless for appropriate applications, can significantly reduce cloud spend. Education and advocacy efforts can drive application modernization efforts, both from a demand perspective — the business teams seeing the business benefits that can be achieved and defining the architecture as a requirement, and from the delivery perspective — the application teams seeing the technical advantages of the architecture and considering it first in the design process. As architecture is often defined early in the application lifecycle, the involvement of cloud architects at the design or procurement stage is critical to ensure unnecessary costs are not locked in.
Rightsizing resources
Overprovisioning a resource will lead to wasted spending and unused capacity, while underprovisioning a resource will compromise the quality of the application sitting on top of the resource. Rightsizing a given resource is about finding the ideal parameters for it to run at the lowest price possible without impacting the workload’s performance, quality, or security.
In the cloud, metrics are collected for each resource to show whether it is optimally used and to identify opportunities for rightsizing. While the approach differs, rightsizing is relevant for cloud servers, containers, and several fully managed services.
With servers, it is the server type that is rightsized. Autoscaling can help significantly reduce the size of the base servers, relying instead on additional servers that are automatically added for resource peaks and removed again when demand drops. With a serverless service such as AWS Lambda, the focus is on determining the optimal amount of memory to assign.
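One simple way to turn collected metrics into a rightsizing signal is to look at a high percentile of CPU utilization. The sketch below is only an illustration: the 40% and 80% thresholds and the choice of the 95th percentile are assumptions, not recommendations from any provider, and a real assessment would also consider memory, network, and disk.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def rightsizing_advice(cpu_samples: list[float],
                       low: float = 40.0, high: float = 80.0) -> str:
    """Suggest 'downsize', 'upsize' or 'keep' from CPU utilization samples (percent).

    The thresholds are hypothetical defaults chosen for this example.
    """
    p95 = percentile(cpu_samples, 95)
    if p95 < low:
        return "downsize"
    if p95 > high:
        return "upsize"
    return "keep"


if __name__ == "__main__":
    quiet_server = [10, 12, 15, 20, 18, 25, 30, 22, 14, 16]
    print(rightsizing_advice(quiet_server))
```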
Minimizing idle costs
Another approach to reducing costs (one that can also be driven by KPIs) is to minimize paying for idle resources, for example by automatically turning off non-production servers outside working hours. A further option is to use fewer servers and more managed cloud services, also known as serverless architecture. This requires training for the application teams and, at least initially, support from the CCoE to design the architecture and modernize legacy applications.
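The potential saving from a shutdown schedule is simple arithmetic, sketched below. It assumes the server is billed per hour only while running; storage and other charges that continue while a server is stopped are ignored, and the hourly rate is a made-up figure.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_savings(hourly_rate: float, hours_on_per_day: int,
                      days_on_per_week: int) -> tuple[float, float]:
    """Return (weekly amount saved, fraction saved) versus running 24/7."""
    hours_on = hours_on_per_day * days_on_per_week
    always_on_cost = hourly_rate * HOURS_PER_WEEK
    scheduled_cost = hourly_rate * hours_on
    saved = always_on_cost - scheduled_cost
    return saved, saved / always_on_cost


if __name__ == "__main__":
    # Hypothetical non-production server: on 12 hours a day, 5 days a week.
    saved, fraction = scheduled_savings(hourly_rate=0.10,
                                        hours_on_per_day=12, days_on_per_week=5)
    print(f"saved per week: {saved:.2f} ({fraction:.0%})")
```

With a 12-hour, 5-day schedule the server runs 60 of the 168 hours in a week, so roughly 64% of its compute cost is avoided.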
Optimizing licenses
Many software vendors require licenses for their products based on the number of cores, virtual machines, or users. In a cloud environment, it can be challenging to accurately predict the number of licenses required, leading to unexpected costs. Architecture patterns fundamentally evolve in the cloud. For example, autoscaled instances only require a license for a short duration before they are scaled down again, further adding to the complexity of predicting the number of licenses that may be needed.
Carefully analyzing and optimizing licensing agreements and configurations, considering licenses provided by the cloud service provider (CSP) where possible, and preferring cloud-friendly license agreements can save significantly on cloud spend. Be especially aware of requirements such as licenses bound to bare metal instances, as these can severely impact cloud spend. Lastly, and specific to AWS, Graviton instances can reduce infrastructure costs by up to 40%. Graviton processors are Arm-based, so while most applications that currently run on x86 can be moved to these instances, some may have limitations or issues that should be considered as part of the licensing discussions.
Monitoring & alerting
Amazon CloudWatch is a service for monitoring AWS resources and applications. CloudWatch collects and tracks metrics, which are measurable variables of those resources and applications. CloudWatch alarms can automatically raise alerts when resources are candidates for rightsizing, are running into resource constraints, or are performing outside normal thresholds.
Monitoring abnormal performance or setting budget limit alerts is especially important to avoid bill shock. For example, a resource might far exceed its normal utilization, causing a significant increase in cost for that month. This can happen if the resource is compromised or because of application bugs. An alert notifies the relevant team that an expected budget threshold has been crossed and that further investigation is required to determine the cause.
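The logic behind a budget threshold alert can be sketched in a few lines of Python. This is not how any particular cloud service implements it; the threshold levels and the straight-line month-end projection are assumptions chosen to keep the example small.

```python
def crossed_thresholds(month_to_date_spend: float, monthly_budget: float,
                       thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)) -> list[float]:
    """Return the budget fractions that the current spend has reached or passed."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    used = month_to_date_spend / monthly_budget
    return [level for level in thresholds if used >= level]


def projected_month_end(month_to_date_spend: float, day_of_month: int,
                        days_in_month: int) -> float:
    """Naive straight-line projection of spend at the end of the month."""
    return month_to_date_spend / day_of_month * days_in_month


if __name__ == "__main__":
    # Hypothetical figures: 850 spent on day 20 against a budget of 1000.
    for level in crossed_thresholds(850, 1000):
        print(f"ALERT: {level:.0%} of budget reached")
    print("projected month-end spend:", projected_month_end(850, 20, 30))
```

Alerting on the projection as well as on actual spend gives the team time to investigate before the budget is exceeded.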
Total Cost of Ownership (TCO)
A FinOps strategy should include a Cloud Total Cost of Ownership (TCO) template to help business users estimate and compare different options for a given application.
A TCO provides a fair comparison between different approaches to the cloud that have varying pros and cons. For example, server architecture tends to have a low up-front development cost but a high recurring maintenance cost, while serverless architecture often has a higher up-front cost but a very low recurring operational cost. Including the one-time and the recurring costs in a multi-year estimate provides a more complete total for each approach that can be fairly compared.
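A multi-year comparison of this kind boils down to adding the one-time cost to the recurring cost multiplied by the number of years. The sketch below uses entirely made-up figures for two options; a real template would break each cost down further.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    one_time_cost: float          # e.g. development or migration
    annual_recurring_cost: float  # e.g. infrastructure and maintenance

    def tco(self, years: int) -> float:
        """Total cost of ownership over the given number of years."""
        return self.one_time_cost + self.annual_recurring_cost * years


def cheapest(options: list[Option], years: int) -> Option:
    """Return the option with the lowest TCO over the given horizon."""
    return min(options, key=lambda option: option.tco(years))


# Hypothetical figures for illustration only.
OPTIONS = [
    Option("self-managed servers", one_time_cost=50_000, annual_recurring_cost=40_000),
    Option("serverless", one_time_cost=90_000, annual_recurring_cost=15_000),
]

if __name__ == "__main__":
    for years in (1, 3, 5):
        best = cheapest(OPTIONS, years)
        print(years, "years:", best.name, best.tco(years))
```

With these assumed numbers the server option is cheaper over one year (90,000 versus 105,000), while the serverless option is cheaper over three years (135,000 versus 170,000), which is why the time horizon matters for a fair comparison.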
To make things easier for business users, the TCO template should include standard architecture templates that can be adjusted to suit a particular application. For example:
- A non-optimized (lift-and-shift) migration to the cloud using self-managed servers
- An optimized 3-tier application architecture running on a mix of self-managed servers and fully managed cloud services
- A container-based architecture
- A serverless API application architecture
This list should be determined and expanded on by the CCoE to meet the common needs of the organization. The business user can compare the known on-premises cost to different cloud architecture options to determine the best approach and its potential savings.
A FinOps strategy can help organizations optimize their cloud spending by effectively managing financial operations in the cloud. This includes tracking and associating resources, cost optimization, and monitoring and alerting.
Organizations can achieve significant cost savings by leveraging cloud provider programs and discounts and ensuring resources are tracked. Clear financial policies must be established, tagging conventions enforced, and cost-optimization KPIs rolled out to drive this strategy. Furthermore, it’s crucial to rightsize resources, minimize idle costs, and monitor abnormal performance to avoid bill shock. Finally, a Cloud Total Cost of Ownership template can help organizations estimate and compare different cloud architecture options fairly.