Implementing Microsoft Azure cost optimization internally at Microsoft

Aug 28, 2023

Our Microsoft Digital Employee Experience (MDEE) team is aggressively pursuing Microsoft Azure cost optimization as part of our continuing effort to improve the efficiency and effectiveness of our enterprise Azure environment here at Microsoft, and to share what we learn with our customers.

By adopting data-driven cost-optimization techniques, investing in central governance, and driving modernization throughout our Microsoft Azure environment, we’ve made our environment, one of the largest enterprise environments hosted in Azure, a cost-efficient blueprint that customers can look to for lessons on lowering their own Azure costs.

We began our digital transformation journey in 2014 with the bold decision to migrate our on-premises infrastructure to Microsoft Azure so we could capture the benefits of a cloud-based platform: agility, elasticity, and scalability. Since then, our teams have progressively migrated and transformed our IT footprint into one of the largest cloud-based infrastructures in the world, and we now host more than 95 percent of our IT resources in Microsoft Azure.

The Microsoft Azure platform has expanded over the years with the addition of hundreds of services, dozens of regions, and innumerable improvements and new features. In tandem, we’ve increased our investment in Azure as our core destination for business solutions at Microsoft. As our Azure footprint has grown, so has the environment’s complexity, requiring us to optimize and control our Azure expenditures.


Optimizing Microsoft Azure cost internally at Microsoft

Our Microsoft Azure footprint mirrors the resource usage of a typical large-scale enterprise. In the past few years, our cost-optimization efforts have become more targeted as we’ve worked to contain the rising total cost of ownership in Azure. Several factors drove that rise, including increased migrations from on-premises and business growth. This focus on optimization prompted an investment in tools and data insights for cost optimization in Azure.

The built-in tools and data that Microsoft Azure provides form the core of our cost-optimization toolset. We derive all our cost-optimization tools and insights from data in Microsoft Azure Advisor, Microsoft Azure Cost Management and Billing, and Microsoft Azure Monitor. We’ve also implemented design optimizations based on modern Azure resource offerings. We extract recommendations from Azure Advisor across the different Azure service categories and push those recommendations into our IT service management system, where the services’ owners can track and manage the implementation of recommendations for their services.
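As an illustration of that pipeline, the Python sketch below turns exported Advisor cost recommendations into IT service management work items. It is a minimal sketch, not the production tooling described in this article. It assumes the recommendations have already been exported into flat records, for example through the Advisor REST API or Azure Resource Graph. The field names (`category`, `problem`, `resource`, `annual_savings`), the `owner` tag convention, and the `WorkItem` shape are all hypothetical. Keying each work item on the recommendation ID keeps repeated sync runs from opening duplicate tickets.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class WorkItem:
    """Hypothetical ticket shape for an IT service management system."""
    key: str               # Advisor recommendation ID, used to deduplicate
    owner: str             # service owner responsible for the change
    title: str
    annual_savings: float  # estimated savings carried over from the export


def to_work_items(
    recommendations: Iterable[Dict],
    existing_keys: Set[str],
    default_owner: str = "unassigned",
) -> List[WorkItem]:
    """Turn exported Advisor records into new work items.

    Only cost recommendations are kept, and any recommendation that already
    has a ticket is skipped, so rerunning the sync does not create duplicates.
    """
    seen = set(existing_keys)
    items: List[WorkItem] = []
    for rec in recommendations:
        if rec.get("category") != "Cost" or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        # Assumed convention: each resource carries an "owner" tag.
        owner = rec.get("tags", {}).get("owner", default_owner)
        items.append(WorkItem(
            key=rec["id"],
            owner=owner,
            title=f"{rec['problem']} ({rec['resource']})",
            annual_savings=float(rec.get("annual_savings", 0.0)),
        ))
    # Largest estimated savings first, so owners see the biggest wins on top.
    items.sort(key=lambda item: item.annual_savings, reverse=True)
    return items


if __name__ == "__main__":
    exported = [
        {"id": "r1", "category": "Cost", "problem": "Right-size underused VM",
         "resource": "vm-web-01", "tags": {"owner": "team-web"}, "annual_savings": 1200},
        {"id": "r2", "category": "Security", "problem": "Enable MFA",
         "resource": "sub-01"},
        {"id": "r3", "category": "Cost", "problem": "Delete unattached disk",
         "resource": "disk-07", "annual_savings": 300},
    ]
    for item in to_work_items(exported, existing_keys=set()):
        print(item)
```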

Understanding holistic optimization

As the first and largest adopter of Microsoft Azure, we’ve developed best practices for engineering and maintenance in Azure that support not only cost optimization but also a comprehensive approach to capturing the benefits of cloud computing. We developed and refined the Microsoft Azure Well-Architected Framework as a set of guiding tenets for Azure workload modernization and a standard for modern engineering in Azure. Cost optimization is one of the framework’s five pillars, which work together to support an efficient and effective Azure footprint. The other four are reliability, security, operational excellence, and performance efficiency. Cost optimization in Azure isn’t only about reducing spending. In Azure’s pay-for-what-you-use model, the critical first step is using only the resources we need, when we need them, in the most efficient way possible.

Optimization through modernization

Reducing our dependency on legacy application architecture and technology was an important part of our first efforts in cost optimization. We migrated many of our workloads from on-premises to Microsoft Azure by using a lift-and-shift method: imaging servers or virtual machines exactly as they existed in the datacenter and migrating those images into virtual machines hosted in Azure. Since then, we’ve focused on transitioning those infrastructure as a service (IaaS) workloads to platform as a service (PaaS) components in Azure to modernize the infrastructure on which our solutions run.

Focus areas for optimization

We’ve maintained several focus areas for optimization. Ensuring the correct sizing for IaaS virtual machines was critical early in our Microsoft Azure adoption journey, when those machines accounted for a sizable portion of our Azure resources. We currently operate at a ratio of 80 percent PaaS to 20 percent IaaS. To reach this ratio, we’ve migrated workloads from IaaS to PaaS wherever feasible. This means moving away from workloads hosted in virtual machines and toward more modular services such as Microsoft Azure App Service, Microsoft Azure Functions, Microsoft Azure Kubernetes Service, Microsoft Azure SQL, and Microsoft Azure Cosmos DB. PaaS services like these offer better native optimization capabilities than virtual machines, such as automatic scaling and broader service integration. As the number of PaaS services we use has increased, automating scalability and elasticity across them has become a large part of our cost-optimization process. Data storage and distribution is another primary focus area. There, we tune scaling, size, and data retention configuration for Microsoft Azure Storage, Azure SQL, Azure Cosmos DB, Microsoft Azure Data Lake, and other Azure storage-based services.
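A simple way to track a PaaS-to-IaaS ratio like this is to classify resources by their Azure Resource Manager type and compare spend. The sketch below is a hypothetical example. The article doesn't say whether the ratio is measured by resource count or by cost, and this version assumes cost. The type lists cover only the services named above; anything else is reported as "other" rather than guessed.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical classification. ARM resource types are case-insensitive,
# so everything is compared in lowercase.
IAAS_TYPES = {
    "microsoft.compute/virtualmachines",
    "microsoft.compute/disks",
}
PAAS_TYPES = {
    "microsoft.web/sites",                         # App Service and Azure Functions
    "microsoft.containerservice/managedclusters",  # Azure Kubernetes Service
    "microsoft.sql/servers/databases",             # Azure SQL Database
    "microsoft.documentdb/databaseaccounts",       # Azure Cosmos DB
}


def paas_iaas_split(resources: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return the share of monthly cost that is PaaS, IaaS, or unclassified.

    `resources` is an iterable of (resource_type, monthly_cost) pairs.
    """
    totals = {"paas": 0.0, "iaas": 0.0, "other": 0.0}
    for resource_type, cost in resources:
        normalized = resource_type.lower()
        if normalized in IAAS_TYPES:
            totals["iaas"] += cost
        elif normalized in PAAS_TYPES:
            totals["paas"] += cost
        else:
            totals["other"] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {bucket: 0.0 for bucket in totals}
    return {bucket: amount / grand_total for bucket, amount in totals.items()}


if __name__ == "__main__":
    inventory = [
        ("Microsoft.Web/sites", 500.0),
        ("Microsoft.Sql/servers/databases", 300.0),
        ("Microsoft.Compute/virtualMachines", 200.0),
    ]
    print(paas_iaas_split(inventory))
```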

Implementing practical cost optimization

While Microsoft Azure Advisor provides most recommendations at the individual service level—Microsoft Azure Virtual Machines, for example—implementing these recommendations often takes place at the application or solution level. Application owners implement, manage, and monitor recommendations to ensure continued operation, account for dependencies, and keep the responsibility for business operations within the appropriate business group at Microsoft.
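Because the fixes land at the application level, it can help to roll resource-level recommendations up to the application that owns them. This Python sketch makes two assumptions: each exported recommendation carries the resource's tags, and it includes an estimated `annual_savings` value. The `application` tag name is a hypothetical convention. Untagged resources are grouped separately, which also surfaces gaps in tagging governance.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rollup_by_application(
    recommendations: Iterable[Dict],
    app_tag: str = "application",
) -> List[Tuple[str, int, float]]:
    """Group resource-level recommendations by owning application.

    Returns (application, recommendation_count, total_annual_savings) tuples,
    sorted so the largest savings opportunities come first. Resources that
    lack the tag are grouped under "untagged".
    """
    counts: Dict[str, int] = defaultdict(int)
    savings: Dict[str, float] = defaultdict(float)
    for rec in recommendations:
        app = rec.get("tags", {}).get(app_tag, "untagged")
        counts[app] += 1
        savings[app] += float(rec.get("annual_savings", 0.0))
    return sorted(
        ((app, counts[app], savings[app]) for app in counts),
        key=lambda row: row[2],
        reverse=True,
    )
```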

For example, we performed a lift-and-shift migration of our on-premises virtual lab services into Microsoft Azure. The resulting Azure environment used IaaS-based Azure virtual machines configured with nested virtualization. The initial scale was manageable using the nested virtualization model. However, the Azure-based solution was more convenient for hosting workloads than the on-premises solution, so adoption began to increase exponentially, which made management of the IaaS-based solution more difficult. To address these challenges, the engineering team responsible for the virtual lab environment re-architected the nested virtual machine design to incorporate a PaaS model using microservices and Azure-native capabilities. This design made the virtual lab environment more easily scalable, efficient, and resilient. The re-architecture addressed the functional challenges of the IaaS-based solution and reduced Azure costs for the virtual lab by more than 50 percent.

In another example, an application used Microsoft Azure Functions on the Premium plan to accommodate long-running functions that wouldn’t run properly without the extended execution time the Premium plan allows. The engineering team converted the logic in the Function Apps to use Durable Functions, an Azure Functions extension, with more efficient function-chaining patterns. This reduced execution time to less than 10 minutes, within the Consumption plan’s maximum timeout. That change allowed the team to move the Function Apps to the Consumption plan and reduce cost by 82 percent.
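The function-chaining pattern can be shown in plain Python. In the Durable Functions Python SDK, an orchestrator is a generator that yields `context.call_activity(...)` calls. The toy driver below imitates that shape with ordinary generators so it runs without Azure. It does not checkpoint or replay state the way the Durable Task runtime does, and the activity names are hypothetical. The design point is that each activity runs as its own function execution. As a result, each step, not the whole chain, has to fit within the Consumption plan's timeout (10 minutes at most).

```python
from typing import Any, Callable, Dict, Generator, List, Tuple


# Hypothetical activities: short, independent steps carved out of what was
# previously one long-running function.
def fetch_batch(batch_id: int) -> List[int]:
    return list(range(batch_id * 3, batch_id * 3 + 3))


def transform(rows: List[int]) -> List[int]:
    return [row * 2 for row in rows]


def store(rows: List[int]) -> int:
    return sum(rows)


ACTIVITIES: Dict[str, Callable[[Any], Any]] = {
    "fetch_batch": fetch_batch,
    "transform": transform,
    "store": store,
}

Orchestration = Generator[Tuple[str, Any], Any, Any]


def orchestrator(batch_id: int) -> Orchestration:
    # Function chaining: each activity's output becomes the next one's input.
    rows = yield ("fetch_batch", batch_id)
    transformed = yield ("transform", rows)
    total = yield ("store", transformed)
    return total


def run(orchestration: Orchestration) -> Tuple[Any, List[str]]:
    """Toy driver: run each requested activity and send its result back."""
    executed: List[str] = []
    result: Any = None  # the first send() must be None to start the generator
    try:
        while True:
            name, payload = orchestration.send(result)
            executed.append(name)
            result = ACTIVITIES[name](payload)
    except StopIteration as done:
        return done.value, executed
```

Calling `run(orchestrator(1))` chains the three activities and returns `(24, ["fetch_batch", "transform", "store"])`.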

Governance

To ensure effective identification and implementation of recommendations, governance in cost optimization is critical for our applications and the Microsoft Azure services that those applications use. Our governance model provides centralized control and coordination for all cost-optimization efforts. Our model consists of several important components, including:

  • Microsoft Azure Advisor recommendations and automation. Advisor cost management recommendations serve as the basis for our optimization efforts. We channel Advisor recommendations into our IT service management and Microsoft Azure DevOps environment to better track how we implement recommendations and ensure effective optimization.
  • Tailored cost insights. We’ve developed dashboards to identify the costliest applications and business groups and to surface opportunities for optimization. The data these dashboards provide helps engineering leaders observe and track important Azure cost components in their service hierarchy to ensure that optimization is effective.
  • Improved Microsoft Azure budget management. We perform our Azure budget planning by using a bottom-up approach that involves our finance and engineering teams. Open communication and transparency in planning are important, and we track forecasts for the year alongside actual spending to date to enable accurate adjustments to spending estimates and closely track our budget targets. Relevant and easily accessible spending data helps us identify trend-based anomalies to control unintentional spending that can happen when resources are scaled or allocated unnecessarily in complex environments.
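The forecast-versus-actual tracking described above can be reduced to a small calculation. In this hypothetical sketch, year-end spend is projected as actuals to date plus the remaining monthly forecast. That remaining forecast is scaled by how far actuals have drifted from plan so far. The projection rule is an assumption for illustration, not a description of the budgeting process above.

```python
from typing import Dict, List, Union


def budget_status(
    monthly_forecast: List[float],
    actual_to_date: List[float],
    annual_budget: float,
) -> Dict[str, Union[float, bool]]:
    """Compare actual spend with forecast and project year-end spend.

    `monthly_forecast` holds the planned amount for each month of the year;
    `actual_to_date` holds actuals for the months elapsed so far.
    """
    months = len(actual_to_date)
    if not 0 < months <= len(monthly_forecast):
        raise ValueError("need between 1 and len(monthly_forecast) months of actuals")
    forecast_to_date = sum(monthly_forecast[:months])
    spent = sum(actual_to_date)
    # Assumed projection: scale the remaining forecast by the drift so far.
    drift = spent / forecast_to_date if forecast_to_date else 1.0
    projected = spent + drift * sum(monthly_forecast[months:])
    return {
        "variance_to_date": spent - forecast_to_date,
        "projected_year_end": projected,
        "over_budget": projected > annual_budget,
    }
```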

Implementing a governance solution has enabled us to realize considerable savings by applying simple changes across our entire Microsoft Azure footprint. For example, we implemented a recommendation to convert Microsoft Azure SQL Database instances from the Standard tier of the database transaction unit (DTU) purchasing model to the General Purpose serverless tier. We did this by using a simple Microsoft Azure Resource Manager template and enabling the auto-pause capability. The configuration change reduced costs by 97 percent.
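For a sense of what such a template can look like, this Python sketch emits a minimal ARM template that moves an existing database to a General Purpose serverless SKU with auto-pause. The resource type, the `GP_S_Gen5_<vCores>` SKU naming, and the `autoPauseDelay` and `minCapacity` properties belong to the `Microsoft.Sql/servers/databases` schema. The API version, capacity values, and 60-minute pause delay are assumptions, so check current limits in the Azure SQL documentation before deploying. Serverless with auto-pause suits databases with intermittent usage; a busy database that never pauses can cost more than its provisioned equivalent.

```python
import json


def serverless_sql_template(
    max_vcores: int = 2,
    min_vcores: float = 0.5,
    auto_pause_minutes: int = 60,
) -> dict:
    """Build a minimal ARM template that moves an existing Azure SQL database
    to the General Purpose serverless tier with auto-pause enabled."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "serverName": {"type": "string"},
            "databaseName": {"type": "string"},
            "location": {"type": "string", "defaultValue": "[resourceGroup().location]"},
        },
        "resources": [
            {
                "type": "Microsoft.Sql/servers/databases",
                "apiVersion": "2021-11-01",  # assumed API version
                "name": "[format('{0}/{1}', parameters('serverName'), parameters('databaseName'))]",
                "location": "[parameters('location')]",
                "sku": {
                    "name": f"GP_S_Gen5_{max_vcores}",  # "_S_" marks serverless
                    "tier": "GeneralPurpose",
                    "family": "Gen5",
                    "capacity": max_vcores,  # maximum vCores
                },
                "properties": {
                    "autoPauseDelay": auto_pause_minutes,  # idle minutes before pausing
                    "minCapacity": min_vcores,  # vCores allocated while not paused
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(serverless_sql_template(), indent=2))
```

Saved as `template.json`, the output could be deployed with `az deployment group create --resource-group <group> --template-file template.json --parameters serverName=<server> databaseName=<database>`.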

Benefits of Microsoft Azure

Ongoing optimization in Microsoft Azure has enabled us to capture the value of Azure to help increase revenue and grow our business. Our yearly budget for Azure has remained almost static since 2014, when we hosted most of our IT resources in on-premises datacenters. Over that period, Microsoft has grown by more than 20 percent.

Our recent optimization efforts have resulted in significantly reduced spending across numerous Microsoft Azure services. Examples, in addition to those already mentioned, include:

  • Right-sizing Microsoft Azure virtual machines. We generated more than 300 recommendations for VM size changes to increase cost efficiency. These recommendations included switching to burstable virtual machine sizes and accounted for a 15 percent cost savings.
  • Moving virtual machines to the latest generation of virtual machine sizes. Moving from older D-series and E-series VM sizes to their current counterparts generated almost 2,500 recommendations and a cost savings of approximately 30 percent.
  • Implementing Microsoft Azure Data Explorer recommendations. More than 200 recommendations were made for Microsoft Azure Data Explorer optimization, resulting in significant savings.
  • Incorporating Cosmos DB recommendations. More than 170 Cosmos DB recommendations reduced cost by 11 percent.
  • Implementing Microsoft Azure Data Lake recommendations. More than 30 Azure Data Lake recommendations combined to reduce costs by approximately 15 percent.
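The two VM changes above lend themselves to simple rules. This hypothetical sketch maps D-series and E-series v3/v4 sizes to their v5 counterparts. It also flags VMs whose 95th-percentile CPU stays under an assumed 20 percent as candidates for burstable B-series sizes. The naming pattern and threshold are assumptions. Before resizing, confirm that a suggested size is available in the VM's region and meets its memory, disk, and network needs.

```python
import re
from typing import List, Optional

# Assumed scope: D- and E-series v3/v4 sizes such as Standard_D4s_v3 or
# Standard_E8ds_v4, with optional "d" (local disk) and "s" (premium storage).
_OLD_GEN = re.compile(r"^(Standard_[DE]\d+d?s?)_v[34]$")


def newer_generation(size: str) -> Optional[str]:
    """Suggest the v5 counterpart of an older D- or E-series size, if any."""
    match = _OLD_GEN.match(size)
    return f"{match.group(1)}_v5" if match else None


def recommend(size: str, p95_cpu_percent: float,
              burstable_threshold: float = 20.0) -> List[str]:
    """Return hypothetical right-sizing suggestions for one VM."""
    suggestions: List[str] = []
    upgrade = newer_generation(size)
    if upgrade:
        suggestions.append(f"move to {upgrade}")
    # Low sustained CPU suggests a burstable size may be cheaper.
    if p95_cpu_percent < burstable_threshold:
        suggestions.append("evaluate a burstable B-series size")
    return suggestions
```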

Key Takeaways

Cost optimization in Microsoft Azure can be a complicated process that requires significant effort from several parts of the enterprise. The following are some of the most important lessons that we’ve taken from our cost-optimization journey:

Implement central governance with local accountability

We implemented a central audit of our Microsoft Azure cost-optimization efforts to help improve our Azure budget-management processes. This audit enabled us to identify gaps in our methods and make the necessary engineering changes to address those gaps. Our centralized governance model includes weekly and monthly leadership team reviews of our optimization efforts. These meetings allow us to align our efforts with business priorities and assess the impact across the organization. The service owner still owns and is accountable for their optimization effort.

Use a data-driven approach

Using optimization-relevant metrics and monitoring from Microsoft Azure Monitor is critical to fully understanding the necessity and impact of optimization across services and business groups. Accurate and current data is the basis for making timely optimization decisions that provide the largest cost savings possible and prevent unnecessary spending.

Be proactive

Real-time data and effective cost optimization enable proactive cost-management practices. Cost-management recommendations provide no financial benefit until they’re implemented. Getting from recommendation to implementation as quickly as possible while maintaining governance over the process is the key to maximizing cost-optimization benefits.
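One way to make the cost of delay visible is to estimate the savings forgone while recommendations sit open. In this hypothetical sketch, each recommendation carries the date it was opened, the date it was implemented (`None` if it is still open), and an estimated monthly saving. Months are approximated as 30-day periods.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def forgone_savings(
    recommendations: Iterable[Tuple[date, Optional[date], float]],
    as_of: date,
) -> float:
    """Estimate savings lost while recommendations waited to be implemented.

    Each item is (opened, implemented_or_None, monthly_savings). Items that
    are still open accrue lost savings up to `as_of`.
    """
    total = 0.0
    for opened, implemented, monthly in recommendations:
        end = implemented or as_of
        days_open = max((end - opened).days, 0)
        total += monthly * days_open / 30  # 30-day month approximation
    return total
```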

Adopt modern engineering practices

Cost optimization is one of the five components of the Microsoft Azure Well-Architected Framework, and each pillar functions best when supported by proper implementation of the other four. Adopting modern engineering practices that support reliability, security, operational excellence, and performance efficiency will help to enable better cost optimization in Microsoft Azure. This includes using modern virtual machine sizes where virtual machines are needed and architecting for Azure PaaS components such as Microsoft Azure Functions, Microsoft Azure SQL, and Microsoft Azure Kubernetes Service when virtual machines aren’t required. Staying aware of new Azure services and changes to existing functionality will also help you recognize cost-optimization opportunities as soon as possible.

Looking forward to more optimization

As we continue our journey, we’re focusing on refining our efforts and identifying new opportunities for further cost optimization in Microsoft Azure. The continued modernization of our applications and solutions is central to reducing cost across our Azure footprint. We’re working toward ensuring that we’re using the optimal Azure services for our solutions and building automated scalability into every element of our Azure environment. Using serverless and containerized workloads is an ongoing effort as we reduce our investment in the IaaS components that currently support some of our legacy technologies.

We’re also improving our methods for decentralizing optimization recommendations to enable our engineers and application owners to make the best choices for their environments while still adhering to central governance and standards. This includes automating the detection of anomalous behavior in Microsoft Azure billing by using service-wide telemetry and logging, data-driven alerts, root-cause identification, and prescriptive guidance for optimization.
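Anomaly detection on billing data can start with something as simple as a trailing-window z-score. The sketch below is one such swapped-in technique, not the detection approach described above. It flags a day whose cost sits more than three standard deviations from the mean of the previous 14 days. The window and threshold are assumptions to tune against real cost data. A production detector would also need to account for weekly seasonality and planned changes.

```python
from statistics import mean, stdev
from typing import List


def cost_anomalies(daily_costs: List[float], window: int = 14,
                   z_threshold: float = 3.0) -> List[int]:
    """Return indexes of days whose cost deviates sharply from the trailing window.

    A day is flagged when it sits more than `z_threshold` standard deviations
    from the mean of the preceding `window` days.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    flagged: List[int] = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if daily_costs[i] != mu:
                flagged.append(i)
            continue
        if abs(daily_costs[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged
```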

Microsoft Azure optimization is a continuous cycle. As we further refine our optimization efforts, we learn from what we’ve done in the past to improve what we’ll do in the future. Our footprint will continue to grow in the years ahead, and our cost-optimization efforts will expand accordingly to ensure that our business is capturing every benefit that the Azure platform provides.
