in Data Centers
Energy consumption has become a major problem for large-scale data centers because of their high, constant power demand and the cost associated with it. The cost of the energy a data center consumes has reached, and even surpassed, the cost of the physical data center itself. This motivates research into dynamically reducing the energy used for computing, cooling, and maintaining a data center. The primary goal of this work is to develop data center power management schemes that deliver energy efficiency while minimizing the impact on performance.
We approach this problem in several ways. We first characterize the energy and performance requirements of different virtual machines (VMs) running various data center workloads. Our experiments with real data center workloads on servers show that the resource and performance requirements of different types of applications vary greatly. We develop a dynamic scheduling mechanism, called vGreen, that leverages this variability across VMs and schedules them across different servers to save energy without affecting their performance. It gets feedback from the clients and monitors the individual VMs to make sure that the scheduling decisions do not violate the quality of service (QoS) requirements of any individual job. We extend the idea of dynamically allocating computing resources to save energy to the data center interconnect network. This network provides connectivity among servers and is crucial to the performance of data center workloads. We develop vGreenNet, which monitors network activity and optimizes the energy consumption of the network elements without degrading job performance.
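As a rough illustration of why workload variability helps, the sketch below co-locates VMs with complementary demands so that fewer servers need to stay powered on. It is not the vGreen algorithm: the two-resource model (CPU and memory, in percent of one server), the greedy balance heuristic, and all names and numbers are assumptions made for the example.

```python
def place_vms(vms, capacity=100):
    """Greedy placement sketch (hypothetical, not the vGreen policy).

    vms: list of (name, cpu, mem) demands, in percent of one server.
    Returns a list of servers, each a dict with used cpu, used mem, and VM names.
    """
    servers = []
    # Place the most demanding VMs first so smaller ones fill the gaps.
    for name, cpu, mem in sorted(vms, key=lambda v: max(v[1], v[2]), reverse=True):
        best = None
        for s in servers:
            new_cpu, new_mem = s["cpu"] + cpu, s["mem"] + mem
            if new_cpu <= capacity and new_mem <= capacity:
                # Prefer the server whose CPU and memory use end up most balanced.
                imbalance = abs(new_cpu - new_mem)
                if best is None or imbalance < best[0]:
                    best = (imbalance, s)
        if best is None:
            # No active server can host this VM: power on another one.
            best = (0, {"cpu": 0, "mem": 0, "vms": []})
            servers.append(best[1])
        best[1]["cpu"] += cpu
        best[1]["mem"] += mem
        best[1]["vms"].append(name)
    return servers


# Two CPU-heavy and two memory-heavy VMs (made-up demands).
vms = [("cpu1", 70, 20), ("cpu2", 60, 10), ("mem1", 20, 70), ("mem2", 30, 80)]
servers = place_vms(vms)
for s in servers:
    print(s["vms"], s["cpu"], s["mem"])
```

In this example each server ends up hosting one CPU-heavy and one memory-heavy VM, so the four VMs fit on two servers. A real scheduler would also need the QoS feedback and per-VM monitoring described above, which this sketch leaves out.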
Next, we modify our scheduling mechanism to support renewable energy integration. Renewable energy is an effective way to decrease both the energy cost of data centers and the environmental effects of traditional coal-based (brown) energy. We use accurate prediction algorithms to reduce the effects of the intermittent nature of renewable sources. Our analysis demonstrates that prediction leads to better integration of renewable energy into the system and reduces the amount of energy wasted. Our solution is applicable not only to a single data center but also to a network of data centers and the backbone network connecting them. We show that the backbone network can decrease its brown energy costs significantly with dynamic, renewable-energy-aware routing mechanisms. We also build multiple scheduling algorithms that can both minimize the energy costs and maximize the job performance of geographically distributed data centers.
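To make the role of prediction concrete, here is a minimal sketch of prediction-driven scheduling. It assumes an exponentially weighted moving average (EWMA) as a stand-in predictor, which is not necessarily the prediction algorithm used in this work, and it assumes that service jobs always run while deferrable batch jobs are started only when the forecast green surplus can power them. The function names, the smoothing factor, and the power figures are all hypothetical.

```python
def ewma_forecast(history_kw, alpha=0.5):
    """One-step-ahead forecast of green power from past measurements (EWMA stand-in)."""
    forecast = history_kw[0]
    for measured in history_kw[1:]:
        forecast = alpha * measured + (1 - alpha) * forecast
    return forecast


def batch_jobs_to_start(history_kw, service_kw, job_kw, alpha=0.5):
    """Number of batch jobs the forecast green surplus can power in the next interval."""
    surplus = ewma_forecast(history_kw, alpha) - service_kw
    # Never return a negative count when green supply is below the service load.
    return max(0, int(surplus // job_kw))


# Made-up measurements of available green power over the last three intervals.
green_history_kw = [40, 60, 80]
print(ewma_forecast(green_history_kw))                               # 65.0
print(batch_jobs_to_start(green_history_kw, service_kw=30, job_kw=10))  # 3
```

Starting batch work only against forecast supply is one way to avoid both wasting green energy that goes unused and falling back to brown energy when the supply drops mid-job.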
Peak power management is another important mechanism for reducing the energy cost of data centers, because data centers are charged not only for their energy consumption but also for their peak power draw from the utility. Data centers traditionally use DVFS-based peak power shaving mechanisms, which reduce the peak power level at the cost of reduced performance. In contrast, our peak power shaving solution leverages the energy already stored in data center UPS (uninterruptible power supply) devices (i.e., batteries). These batteries can provide energy during peak power pulses and can be recharged when demand is low. Our solution has no performance effect on the jobs because it does not modify their execution environment. We also consider the physical behavior of the batteries and use this information to further increase the energy savings.
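The discharge-on-peak, recharge-on-low idea can be sketched with a small simulation. This is a simplified model under stated assumptions: the battery is ideal (no conversion losses, no aging, no depth-of-discharge limit), whereas the work described above explicitly accounts for physical battery behavior. The function name, the power cap, and the demand trace are hypothetical.

```python
def shave_peaks(demand_kw, cap_kw, battery_kwh, max_rate_kw, step_h=1.0):
    """Grid draw per time step when an ideal battery covers demand above cap_kw."""
    charge = battery_kwh  # assume the battery starts fully charged
    grid = []
    for d in demand_kw:
        if d > cap_kw:
            # Discharge to cover the excess, limited by rate and stored energy.
            discharge = min(d - cap_kw, max_rate_kw, charge / step_h)
            charge -= discharge * step_h
            grid.append(d - discharge)
        else:
            # Recharge using the headroom below the cap, limited by rate and free capacity.
            recharge = min(cap_kw - d, max_rate_kw, (battery_kwh - charge) / step_h)
            charge += recharge * step_h
            grid.append(d + recharge)
    return grid


demand = [60, 70, 120, 130, 80, 50]  # made-up hourly server demand in kW
grid = shave_peaks(demand, cap_kw=100, battery_kwh=60, max_rate_kw=40)
print(max(demand), max(grid))  # peak seen by the servers vs. peak seen by the utility
```

The servers receive their full demand in every step, so the jobs are unaffected; only the power drawn from the utility is reshaped. In this trace the utility-side peak drops from 130 kW to the 100 kW cap, and the battery is fully recharged by the end.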
Another effort in this research topic is the use of Optical Circuit Switching (OCS) in the datacenter. This technology potentially allows for less costly and more energy efficient datacenter networks compared with traditional electronic packet switched networks when operating at link rates of 40G and higher. One criticism of earlier projects such as Helios and c-Through was that they employed OCS technology with reconfiguration times on the order of 15 milliseconds or longer, limiting the utility of the resulting hybrid datacenter network to long-lived communication patterns with hotspots (elephants). MORDIA (Microsecond Optical Reconfigurable Datacenter Interconnect Architecture) is an effort to explore the use of OCS technology in datacenter networks with microsecond reconfiguration times. Because no such fast OCS was available, we constructed our own 24-port OCS with a reconfiguration time of 2.8 microseconds.
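A back-of-the-envelope calculation shows why reconfiguration time matters so much. The sketch below assumes a simple model in which a circuit carries no traffic while the switch reconfigures, so the usable fraction of time (duty cycle) is hold / (hold + reconfiguration). The 90% target and the function names are assumptions chosen for illustration; the 15 ms and 2.8 microsecond figures come from the text above.

```python
def duty_cycle(hold_s, reconfig_s):
    """Fraction of time a circuit carries traffic, assuming none flows during reconfiguration."""
    return hold_s / (hold_s + reconfig_s)


def min_hold_time(reconfig_s, target_duty):
    """Shortest circuit hold time that still reaches the target duty cycle."""
    # duty = hold / (hold + reconfig)  =>  hold = reconfig * duty / (1 - duty)
    return reconfig_s * target_duty / (1.0 - target_duty)


slow = min_hold_time(15e-3, 0.9)   # OCS with 15 ms reconfiguration
fast = min_hold_time(2.8e-6, 0.9)  # OCS with 2.8 microsecond reconfiguration
print(f"slow OCS: hold each circuit at least {slow * 1e3:.0f} ms")
print(f"fast OCS: hold each circuit at least {fast * 1e6:.1f} us")
```

Under this model, a 15 ms switch must hold each circuit for about 135 ms to stay 90% utilized, which only suits long-lived flows, while a 2.8 microsecond switch needs only about 25 microseconds per circuit.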
Ongoing efforts in SEELAB focus on understanding application behavior across an OCS data center network. To this end, our group has developed a kernel module that allows an unmodified TCP/IP stack to operate on top of an OCS, which means that unmodified data center applications can run on it. Combined with Open vSwitch, this lets us study the effect of scheduling network flows dynamically across an OCS. We envision developing software techniques that allow data center applications to seamlessly exploit the scalability and bandwidth offered by optical technologies.
Server fan subsystems are power-hungry and generate vibrations, degrading the performance of data-intensive workloads and inflating the electricity bills of cost-sensitive data centers. We demonstrate a systematic server measurement methodology to isolate different types of vibrations and quantify their effect on hard disks. We also introduce a thermal and cooling management policy that mitigates vibrational effects through workload scheduling and fan control, and study the benefits of a hybrid storage array with solid-state drives (SSDs), which are impervious to vibrations. We improve over the state of the art through performance speedups and energy savings, while meeting thermal constraints and improving the system's resilience to both internal and external vibrations.
Another important challenge for data centers is identifying the needs of next-generation infrastructure. This infrastructure should be highly compatible with high performance computing (HPC), which demands 10x-100x higher software parallelism for higher performance while exacerbating problems related to energy efficiency. Self-aware runtime management of hardware resources is required to achieve energy-efficient software execution. To enable this, SEELAB focuses on developing and implementing a very low-overhead, non-intrusive monitoring infrastructure that can perform introspection, i.e., collect, distill, and analyze data from both hardware and software counters. The goal is to identify key metrics and develop models (focusing on energy) for a hierarchical introspection framework that enables a self-aware runtime to intelligently use refined data from sensors and monitors to dynamically tune and manage future HPC systems.
- B. Aksanli, J. Venkatesh, and T. Rosing, “Datacenter Modeling and Simulation with Focus on Energy Efficiency and Green Energy Integration” IEEE Computer Special Issue on Modeling and Simulation of Smart and Green Computing Systems, 2012. [pdf]
- G. Dhiman, G. Marchetti, T. Simunic Rosing, “vGreen: A System for Energy-Efficient Management of Virtual Machines,” ACM TODAES, 2010. [pdf]
- G. Dhiman and T.S. Rosing, "System Level Power Management Using Online Learning" In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 28, Nr. 5, May (2009), p. 676-689. [pdf]
- B. Aksanli, A.S. Akyurek, T. Rosing. "Minimizing the Effects of Data Centers on Microgrid Instability". International Green and Sustainable Computing Conference (IGSC), 2015. [pdf]
- H. Rodrigues, R. Strong, T. Rosing, "Accurate Emulation of Fast Optical Circuit Switches", IEEE International Conference on Communications (ICC), 2015. [pdf]
- A. Sadasivarao, H. Rodrigues, S. Syed, C. Liou, S. Balakrishnan, A. Lake, E. Pouyoul, C. Guok, I. Monga, T. Rosing, "Enabling Multi-Layer Provisioning and Optimization for Core Transport Networks with Unified Packet-Optical Control Plane", 11th USENIX Symposium on Networked Systems Design and Implementation, NSDI'14 (Demo). [pdf]
- H. Rodrigues, A. Akyurek, T. Rosing, "OCSEMU: SDN Enabled Fast Hybrid Optical Circuit Switch Emulator Platform to Study Application Performance in the Emerging Optical Data Center", OIDA Software Defined Photonic and Data Center Networks Workshop (Poster), 2014. [pdf]
- H. Rodrigues, A. Akyurek, C. Guok, I. Monga, T. Rosing, "Enabling SDN-based Multi-Layer Provisioning and Optimization for Inter Data Center Networks with Unified Packet-Optical Control Plane", OIDA Software Defined Photonic and Data Center Networks Workshop (Poster), 2014. [pdf]
- H. Rodrigues, R. Strong, T. Rosing, "Scheduling Optical Tunnels to Distributed Applications", USENIX Annual Technical Conference, ATC'14 (Poster). [pdf]
- B. Aksanli and T. Rosing. "Providing Regulation Services and Managing Data Center Peak Power Budgets". Design, Automation and Test in Europe (DATE), 2014. [pdf]
- Henrique Rodrigues, Inder Monga, Abhinava Sadasivarao, Sharfuddin Syed, Chin Guok, Eric Pouyoul, Chris Liou, Tajana Rosing, "Traffic Optimization in Multi-Layered WANs using SDN", IEEE Symposium on High Performance Interconnects. (Best Student Paper Award) [pdf]
- B. Aksanli, E. Pettis, and T. Rosing. "Architecting Efficient Peak Power Shaving Using Batteries in Data Centers". International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2013. [pdf]
- C. S. Chan, B. Pan, K. Gross, K. Vaidyanathan, T. S. Rosing. "Correcting vibration-induced performance degradation in enterprise servers". SIGMETRICS Performance Evaluation Review, 2013. [pdf]
- L. Zhang, G. Dhiman, and T. S. Rosing. "vGreenNet: Managing Server and Networking Resources of Co-located Heterogeneous VMs". IEEE International Parallel and Distributed Processing Symposium (IPDPS), High Performance Grid and Cloud Computing Workshop (HPGC), 2013. [pdf]
- B. Aksanli, J. Venkatesh, T. Rosing, and I. Monga, “A Comprehensive Approach to Reduce the Energy Cost of Network of Datacenters” International Symposium on Computers and Communications (ISCC), 2013. (Best Student Paper Award) [pdf]
- B. Aksanli, E. Pettis, and T. Rosing. "Distributed Battery Control for Peak Power Shaving in Data Centers". International Green Computing Conference (IGCC) 2013. [pdf]
- G. Porter, R. Strong, N. Farrington, A. Forencich, P. Sun, T. Rosing, Y. Fainman, G. Papen, A. Vahdat, "Integrating Microsecond Circuit Switching into the Data Center". to appear in SIGCOMM 2013. [pdf]
- F. Seracini, X. Zhang, I. Krueger, T. Rosing, M. Menarini. "Green Web Services: Improving Energy Efficiency in Data Centers via Workload Predictions". ICSEWS 2013 GREENS. [pdf]
- V. Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun, E. Pettis, D. Tullsen, T. Rosing, "Managing Distributed UPS Energy for Effective Power Capping in Data Centers," ISCA, 2012. [pdf]
- B. Aksanli, T. Simunic Rosing, I. Monga, "Benefits of Green Energy and Proportionality in High Speed Wide Area Networks Connecting Data Centers," DATE 2012. [pdf]
- G. Dhiman, V. Kontorinis, R. Ayoub, L. Zhang, C. Sadler, D. Tullsen, T. Rosing. "Themis: Energy Efficient Management of Workloads in Virtualized Data Centers". EuroPar-VHPC 2012. [pdf]
- B. Aksanli, J. Venkatesh, L. Zhang, T. Simunic Rosing, "Utilizing Green Energy Prediction to Schedule Mixed Batch And Service Jobs in Data Centers," HotPower 2011. [pdf] (Best of HotPower)
- G. Dhiman, K. Mihic, T. Rosing, "A System for Online Power Prediction in Virtualized Environments Using Gaussian Mixture Models," ACM/IEEE DAC 2010 [Acceptance Rate: 24.3%] [pdf]
- G. Dhiman, V. Kontorinis, D. Tullsen, T. Rosing, E. Saxe, J. Chew, "Dynamic Workload Characterization for Power Efficient Scheduling on CMP Systems," ISLPED 2010. [pdf]
- G. Dhiman, G. Marchetti and T.S. Rosing, “vGreen: A System for Energy Efficient Computing in Virtualized Environments” In Proceedings of the 14th IEEE/ACM International Symposium on Low Power Electronics and Design, 2009, ISLPED ’09. [Acceptance Rate (Regular Papers) = 52/210 = 24.7%] [pdf] (Best Paper Nominee)
- G. Dhiman, K.K. Pusukuri and T.S. Rosing, “Analysis of Dynamic Voltage Scaling for System Level Energy Management” In Proceedings of the 2008 Workshop on Power Aware Computing and Systems, HotPower'08. [pdf]
- G. Dhiman and T.S. Rosing, “Dynamic voltage frequency scaling for multi-tasking systems using online learning” In Proceedings of the 2007 International Symposium on Low Power Electronics and Design, ISLPED '07. [Acceptance Rate (Regular Papers) = 56/192 = 29%] [pdf]
- G. Dhiman and T.S. Rosing, “Dynamic power management using machine learning” In Proceedings of the 2006 IEEE/ACM international Conference on Computer-Aided Design, ICCAD '06. [Acceptance Rate = 130/541 = 24%] [pdf] (Best Paper Nominee)
Distributed Battery Control to Improve Peak Power Shaving Efficiency in Data Centers - Poster
Managing Distributed UPS Energy for Effective Power Capping in Data Centers - Poster
Benefits of Green Energy and Proportionality in High Speed Wide Area Networks Connecting Data Centers - Poster
Utilizing Green Energy Prediction to Adapt to Energy Supply Variability when Scheduling Mixed Batch and Service Jobs in Data Centers - Poster
Energy efficiency in virtualized environments - Poster
Richard Strong, Google
Gaurav Dhiman, Google
Raid Ayoub, Intel
Liuyi Zhang, Google
Vasileios Kontorinis, Google
Giacomo Marchetti, Italy Startup
Kresimir Mihic, Oracle
NSF-GreenLight project, NSF CIAN, CNS, Oracle, UC Micro, Cisco, GSRC/DARPA, Google, Microsoft, Multi-scale Systems Center (MuSyC), Intel