Optimizing power consumption for latency-aware microservice architectures

In the last few years we have seen substantial growth in cloud services: cloud storage, video streaming, video conferencing, social platforms, instant messaging and remote-working apps have become things each of us deals with on a daily basis. What we are noticing today is ever-broader adoption of these technologies: day after day they become more and more present in our daily lives.

By Luca Danelutti,
Undergraduate student in Computer Science and Engineering, Politecnico di Milano

In this historical moment marked by a pandemic, it is easy to see how digital services are supporting us in carrying out our work and in keeping in touch with our friends, even when we are forced to limit in-person interactions.

Technology seems to have given us what we want, where we want it and when we want it. This technological evolution opens up new possibilities, such as working or following a college course from home, or talking with friends on the other side of the ocean. Unfortunately, we are starting to see some drawbacks to this evolution.

Behind each service there are many servers, housed in large data centers, that cooperate in a coordinated and synchronized way to make an application available. Deploying such a service raises two competing requirements. On one hand, a certain level of performance must be guaranteed: a provider has to make the platform fast enough, because a query on a search engine cannot take more than a given number of milliseconds, and a video streaming app must deliver its content in real time. On the other hand, each service provider is interested in minimizing costs, and therefore in reducing energy consumption.

About ten years ago, Google conducted a study on its search engine showing that an increase in query response time leads to a decrease in the number of searches users make, a decrease that does not recover even after the normal execution speed is restored. To achieve faster response times, a provider has to deploy more machines with more powerful hardware, and this raises total power consumption. Researchers now estimate that by 2030 data centers will be directly responsible for 8% of the world's power consumption. This is why satisfying performance requirements while optimizing the electricity needs of these facilities is such an important challenge.

At NECSTLab we are tackling this problem by developing systems able to balance these two factors: minimizing the power consumption of a cluster while guaranteeing a given level of performance. Our work builds on the Hyppo project, so the basic idea is to use an Observe-Decide-Act loop that controls the CPU resources available to an application deployed on a microservices architecture, based on its end-to-end response time. In the observe phase, the system collects the metrics needed by the next step. In the decide phase, a fuzzy controller discretizes the measured values and computes the actual limits to enforce in the act phase. In particular, we built two actuators: the first limits the percentage of CPU a pod may use; the second tells each pod which CPU cores to run on.
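To make the loop more concrete, here is a minimal sketch in Python of how an Observe-Decide-Act cycle of this kind could be structured. Everything in it is illustrative: the metric source, the target latency, the pod name and the decision rules are placeholder assumptions, and the real controller uses fuzzy logic rather than the crisp thresholds shown here.

```python
import random
import time

# Minimal, illustrative Observe-Decide-Act loop. All names, thresholds
# and rules are placeholder assumptions, not the actual Hyppo system.

TARGET_LATENCY_MS = 200.0   # assumed end-to-end response-time requirement
STEP = 10                   # CPU-quota change per iteration, in percent

def observe(service: str) -> float:
    """Observe: collect the end-to-end response time of `service`.
    A real system would scrape a metrics backend; here we stub it."""
    return random.uniform(100.0, 400.0)

def decide(latency_ms: float, quota: int) -> int:
    """Decide: a crisp stand-in for the fuzzy controller. It discretizes
    the ratio between measured and target latency into three bands and
    maps each band to a CPU-quota adjustment."""
    ratio = latency_ms / TARGET_LATENCY_MS
    if ratio > 1.1:                      # too slow: grant more CPU
        return min(quota + STEP, 100)
    if ratio < 0.9:                      # comfortably fast: reclaim CPU
        return max(quota - STEP, 10)
    return quota                         # within tolerance: hold steady

def act(pod: str, quota: int) -> None:
    """Act: enforce the new limit. A real actuator would write cgroup
    files or patch the pod's resource limits; here we only log it."""
    print(f"[act] {pod}: CPU quota set to {quota}%")

def control_loop(pod: str, iterations: int = 5, period_s: float = 1.0) -> None:
    quota = 100                          # start with no throttling
    for _ in range(iterations):
        latency = observe(pod)
        quota = decide(latency, quota)
        act(pod, quota)
        time.sleep(period_s)

if __name__ == "__main__":
    control_loop("frontend-pod")
```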
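The two actuators can be understood in terms of Linux cgroups, which is the mechanism container runtimes use under the hood to enforce resource limits on pods. The sketch below assumes direct access to a pod's cgroup v1 directories (the actual path layout is runtime-specific, and the paths here are hypothetical): the first function caps CPU bandwidth through the CFS quota, the second pins the pod to a set of cores through the cpuset controller.

```python
# Illustrative sketch of the two actuators via the cgroup v1 interface.
# The paths are hypothetical: in Kubernetes, a pod's cgroup directories
# live under runtime-specific subtrees, and the "cpu" and "cpuset"
# controllers are mounted as separate hierarchies.

def limit_cpu_percent(cpu_cgroup: str, percent: int) -> None:
    """First actuator: cap the CPU bandwidth a pod may consume by
    setting the CFS quota relative to the CFS period."""
    period_us = 100_000                      # default CFS period: 100 ms
    quota_us = period_us * percent // 100    # e.g. 50% -> 50 ms per period
    with open(f"{cpu_cgroup}/cpu.cfs_period_us", "w") as f:
        f.write(str(period_us))
    with open(f"{cpu_cgroup}/cpu.cfs_quota_us", "w") as f:
        f.write(str(quota_us))

def pin_to_cores(cpuset_cgroup: str, cores: str) -> None:
    """Second actuator: restrict the pod to a set of CPU cores,
    e.g. cores="0-3" or cores="2,5"."""
    with open(f"{cpuset_cgroup}/cpuset.cpus", "w") as f:
        f.write(cores)
```

One general motivation for core pinning is that consolidating work onto fewer cores allows the unused cores to remain in low-power idle states, which is where part of the energy savings can come from.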

By combining these elements for each microservice we are able to find the trade-off between power consumption and performance; we can then aggregate these results and decrease the overall energy consumption of the cluster. I think this field of IT is growing quickly and constantly evolving, and it is becoming increasingly clear how necessary it is to optimize these services. From a research point of view it is an interesting challenge: we live in a world where making human presence less harmful to the environment is becoming a necessity, so the impact that progress in this area can have is huge.