System Energy Efficiency Lab

Energy-efficient software design

Energy-efficient design of battery-powered systems demands optimizations in both hardware and software. In addition, low cost and fast time to market are crucial. As a result, typical portable appliances are built from commodity components and have a microprocessor-based architecture.

Our profiler enables designers to explore multiple hardware and software architectures, and to perform statistical analysis based on input samples. In this way, the design can be optimized for both energy consumption and performance based on the expected input data set.

The profiler architecture is shown in the figure below. The yellow portion represents the extension we made to our cycle-accurate energy simulator in order to enable code profiling. Profiling for energy and performance allows designers to identify the portions of their source code that need further optimization to decrease energy consumption, increase performance, or both.

 

Profiler operations:

Source code is compiled using a compiler for the target processor. The output of the compiler is the executable that the cycle-accurate simulator executes, together with a map of the location of each procedure in the executable, which the profiler uses to gather statistics. The profiler works concurrently with the cycle-accurate simulator. It periodically samples the simulation results and attributes the energy and performance to the function being executed, using the information gathered at compile time. Once the simulation is complete, the profiling results can be printed out, listed by the total energy or time spent in each function.

The main advantage of the profiler is that it gives designers a breakdown of energy consumption by procedure in their source code after running only one simulation.
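The sampling and attribution described above can be illustrated with a minimal sketch. It is not the lab's profiler: the symbol map, the function names (`huffman_decode`, `imdct`), and the sample format `(pc, energy, cycles)` are all assumptions made for illustration. The idea is only that each periodic sample from the simulator is charged to the function whose address range contains the sampled program counter.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical symbol map produced at compile time: (start address, name),
# sorted by start address. Names and addresses are illustrative only.
SYMBOLS = [(0x1000, "main"), (0x1200, "huffman_decode"), (0x1800, "imdct")]
STARTS = [addr for addr, _ in SYMBOLS]


def function_at(pc):
    """Return the name of the function whose address range contains pc."""
    i = bisect_right(STARTS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else "<unknown>"


def profile(samples):
    """Charge each sample's energy and cycles to the executing function.

    Each sample is assumed to be (pc, energy, cycles), where energy and
    cycles are the amounts accumulated since the previous sample.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for pc, energy, cycles in samples:
        entry = totals[function_at(pc)]
        entry[0] += energy
        entry[1] += cycles
    return {name: tuple(values) for name, values in totals.items()}


def report(totals, key="energy"):
    """List functions from most to least expensive by energy or by cycles."""
    index = 0 if key == "energy" else 1
    return sorted(totals.items(), key=lambda item: item[1][index], reverse=True)
```

Because the attribution needs only the compile-time map and the sample stream, a single simulation run is enough to produce the per-function breakdown for both energy and time.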

 

However, an optimal implementation of calculation-intensive routines for a particular hardware design is not possible with traditional compiler optimizations alone. Commonly, the designer performs most such optimizations by hand. Automating even a portion of this process can save considerable design time.

We present a methodology that facilitates embedded system software optimization by automating library and complex instruction mapping for a given embedded processor. As shown in the figure below, the mapping methodology consists of three main steps: library characterization, target code identification, and mapping. We use our profiler to identify the critical code chunks.

 

Complex library mapping steps:

  • The first step is to characterize the library elements. The characterization includes not only the performance and energy consumption of the complex element for a given hardware architecture, but also the expected input and output formats, the accuracy, and a polynomial representation.

 

  • The next step identifies the target code for optimization. In this step, an initial check is performed to see whether the data representation used in the algorithmic-level C code matches the target hardware. The profiler, described above, detects within several seconds whether data representation is an issue. The profiler also reports the performance-critical and energy-critical functions of the code. Accuracy is checked at the end of the target code identification step to make sure that the code still meets the specifications.

 

  • Finally, the target code represented by polynomials is automatically mapped into the library elements and complex processor instructions.

Our key contribution is a new method to map critical code segments into pre-optimized software library elements and complex assembly instructions using symbolic polynomial manipulation. The mapping process selects the solution that offers the best performance with sufficient accuracy.
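The selection rule in the last sentence can be shown with a deliberately simplified sketch. It does not reproduce the symbolic polynomial manipulation used in the actual method: here a code segment and each library element are represented as single-variable coefficient tuples, matching is a plain coefficient comparison, and every name and number (`LibraryElement`, `sin_fixed`, the cycle counts) is hypothetical. The sketch only shows how characterization data, an accuracy bound, and a performance criterion combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryElement:
    """Hypothetical characterization record for one library element."""
    name: str
    poly: tuple       # coefficients c0, c1, c2, ... of c0 + c1*x + c2*x^2 + ...
    cycles: int       # characterized execution time on the target hardware
    energy_uj: float  # characterized energy per call, in microjoules


def max_coeff_error(a, b):
    """Largest absolute coefficient difference between two polynomials."""
    n = max(len(a), len(b))
    a = tuple(a) + (0.0,) * (n - len(a))
    b = tuple(b) + (0.0,) * (n - len(b))
    return max(abs(x - y) for x, y in zip(a, b))


def map_to_library(target_poly, library, tolerance):
    """Pick the fastest element that matches the target within tolerance.

    Returns None when no element is accurate enough.
    """
    candidates = [e for e in library
                  if max_coeff_error(target_poly, e.poly) <= tolerance]
    return min(candidates, key=lambda e: e.cycles, default=None)


# Target segment: a fifth-order Taylor polynomial of sin(x) (illustrative).
target = (0.0, 1.0, 0.0, -1.0 / 6.0, 0.0, 1.0 / 120.0)

library = [
    LibraryElement("sin_float", (0.0, 1.0, 0.0, -1.0 / 6.0, 0.0, 1.0 / 120.0), 900, 4.0),
    LibraryElement("sin_fixed", (0.0, 1.0, 0.0, -0.1667, 0.0, 0.0083), 120, 0.6),
    LibraryElement("cos_fixed", (1.0, 0.0, -0.5, 0.0, 0.0417), 110, 0.5),
]

chosen = map_to_library(target, library, tolerance=1e-3)
```

With the loose tolerance above, the faster fixed-point element qualifies and is selected; tightening the tolerance leaves only the exact floating-point element, which mirrors the trade-off between performance and accuracy described in the text.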

 

Results:

We use complex library mapping to optimize a set of examples for the SmartBadgeIV portable embedded system running the embedded Linux operating system. The figure below shows the results of an optimized MP3 audio decoder.

 

The final optimized MP3 audio decoder runs four times faster than real-time playback while consuming four times less energy. Since the decoder executes faster than real-time playback, additional energy savings are now possible by using processor frequency and voltage scaling.
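The reasoning behind the additional savings can be made concrete with a small calculation. This is a sketch under stated assumptions, not a measurement: it uses the standard first-order CMOS approximation that dynamic energy per cycle is proportional to the square of the supply voltage, and the clock frequency and voltages in the usage lines are hypothetical values, not SmartBadgeIV operating points.

```python
def min_frequency(f_nominal_hz, speedup_over_real_time):
    """Lowest clock frequency that still meets the real-time deadline."""
    return f_nominal_hz / speedup_over_real_time


def dvs_energy_ratio(v_scaled, v_nominal):
    """Dynamic energy at the scaled voltage relative to nominal.

    Assumes a fixed cycle count and switching energy per cycle
    proportional to V^2; leakage and other components are ignored.
    """
    return (v_scaled / v_nominal) ** 2


# Hypothetical operating points, for illustration only.
f_min = min_frequency(200e6, 4)      # decoder is 4x faster than real time
ratio = dvs_energy_ratio(0.75, 1.5)  # supply voltage halved at the lower frequency
```

Since the decoder needs only a quarter of the nominal clock rate to keep up with playback, the processor can run at a lower frequency, and any reduction in supply voltage that the lower frequency permits reduces dynamic energy quadratically.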
