[Figure: CPU Utilization vs. CPU Temperature (A. Dhesikan). Axes: CPU Utilization (%) and CPU temperature (C).]
Data Center Energy Efficiency
Table 1. Engineering Matrix.

Criteria               Max Points   Design A   Design B   Design C   Design D
Energy Efficiency      10           0.0        10.0       8.8        6.3
High Availability      10           10.0       5.2        8.2        8.3
Automation             8            8          8          8          8
Server Reliability     5            5.0        1.1        3.3        2.3
Low Script Run Time    4            4          4          4          4
Total                  42           27.0       28.3       32.3       28.9
Percent                100%         64.29%     67.38%     76.90%     68.81%
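The column totals and percentages in Table 1 can be checked with a short script. The following is a minimal Python sketch; the dictionary layout and variable names are illustrative only, and the denominator of 42 is taken from the Total row as printed (the five listed maxima themselves sum to 37).

```python
# Scores copied from Table 1; the data layout itself is only illustrative.
scores = {
    "Energy Efficiency":   [0.0, 10.0, 8.8, 6.3],
    "High Availability":   [10.0, 5.2, 8.2, 8.3],
    "Automation":          [8, 8, 8, 8],
    "Server Reliability":  [5.0, 1.1, 3.3, 2.3],
    "Low Script Run Time": [4, 4, 4, 4],
}
designs = ["A", "B", "C", "D"]
MAX_TOTAL = 42  # denominator as printed in the table's Total row

# Sum each design's column, then express it as a percentage of MAX_TOTAL.
totals = [round(sum(col), 1) for col in zip(*scores.values())]
percents = [round(100 * t / MAX_TOTAL, 2) for t in totals]

for name, total, pct in zip(designs, totals, percents):
    print(f"Design {name}: {total:.1f} points ({pct:.2f}%)")
```

Running the sketch gives the same totals and percentages as the table, with Design C scoring highest.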
Design A: Load is distributed equally among four computers.
Design B: The entire load is sent to one computer and the others are powered off. If the CPU
utilization of that computer exceeds 70%, another computer is powered on and the load is
distributed evenly among the computers that are powered on. Each time every powered-on
computer reaches 70% CPU utilization, another computer is powered on, and so on.
Design C: Load is distributed equally between two computers and the other computers are
powered off. If the CPU utilization of each powered-on computer exceeds 70%, another computer
is powered on.
Design D: The entire load is sent to one computer and the others are powered off. If the CPU
utilization of that computer exceeds 50%, another computer is powered on and the load is
distributed evenly among the computers that are powered on. Each time every powered-on
computer reaches 50% CPU utilization, another computer is powered on, and so on.
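The threshold rules in the design descriptions above can be sketched as a small function. This is a simplified, static illustration in Python and not the script used in the study: the function and dictionary names are hypothetical, the total load is expressed in percent of one server's CPU capacity (an assumed unit), and the sketch only decides how many servers are on for a given load, without modeling servers powering off again over time.

```python
def active_servers(total_load, threshold, start, max_servers=4):
    """Return how many servers are powered on for a given total load.

    total_load is in percent of ONE server's CPU capacity (assumed unit),
    and the load is split evenly across the active servers.
    """
    n = start
    # Power on one more server while every active server exceeds the threshold.
    while n < max_servers and total_load / n > threshold:
        n += 1
    return n

# Threshold and starting-server values follow the design descriptions.
DESIGNS = {
    "A": dict(threshold=float("inf"), start=4),  # always four servers
    "B": dict(threshold=70, start=1),
    "C": dict(threshold=70, start=2),
    "D": dict(threshold=50, start=1),
}

def plan(design, total_load):
    """Return (servers powered on, CPU utilization per server in percent)."""
    n = active_servers(total_load, **DESIGNS[design])
    return n, total_load / n

for name in DESIGNS:
    print(name, plan(name, 120))
```

For a total load of 120, the sketch keeps two servers on under Designs B and C and three under Design D, which shows how the lower 50% threshold trades energy for headroom.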
Criteria Determination:
Energy Efficiency: The four designs were simulated over 30 minutes and the total energy
saved was calculated after that amount of time. A linear function was used to calculate
the score based on energy saved.
High Availability: The four designs were simulated over 30 minutes and the average
percent of requests delayed was found. A linear model was used to calculate the score
based on the average percent of requests delayed.
Automation: If the design worked with no user input, a score of 8 was given. If the
design did not work without user input, a score of 0 was given.
Server Reliability: The four designs were simulated over 30 minutes and the total number
of times a server switched on or off was found. A high number of power cycles results in
a low reliability score and a low number of power cycles results in a high score. A
linear model was used to calculate the score based on the number of times a server was
switched on or off.
Low Script Run Time: The execution time of the script was found to be the same for
all designs due to the nature of the script. As a result, all designs received the same
score of 4.
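The linear scoring used for several criteria can be illustrated with a short Python sketch. The paper does not state the endpoints of each linear function, so the `worst` and `best` values below are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def linear_score(value, worst, best, max_points):
    """Map a measurement linearly onto the range 0..max_points.

    `worst` scores 0 and `best` scores max_points; values outside the
    range are clamped. Works whether higher or lower values are better.
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    fraction = (value - worst) / (best - worst)
    fraction = min(1.0, max(0.0, fraction))
    return max_points * fraction

# Energy saved (higher is better): assumed endpoints of 0% and 7.2%.
print(linear_score(6.3, worst=0.0, best=7.2, max_points=10))
# Power cycles (lower is better): assumed endpoints of 20 and 0 cycles.
print(linear_score(5, worst=20, best=0, max_points=5))
```

Passing `best` smaller than `worst` handles criteria such as power cycles or delayed requests, where a lower measurement earns a higher score.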
Data Analysis and Discussion
The first set of data shows the relationship between number of users and CPU utilization.
The data suggest a fairly strong linear relationship between the two variables, with an R²
value of 0.90123. There are a few anomalies in the data toward the end of experimentation;
these are most likely due to other factors, such as disk reads and writes, causing variance in
the processing capabilities. The slope of the linear function, 0.0701, indicates that each
additional user increases the CPU utilization of the computer by 0.0701 percentage points. The
y-intercept of 0 suggests that when the number of users is zero, the CPU utilization of the
server is 0%.
The second set of data shows the relationship between number of users and power
consumption in watts. The data suggest a fairly strong linear relationship between the two
variables, with an R² value of 0.92904. As with the previous set of data, there are anomalies
toward the end of experimentation. There are also a few anomalies throughout experimentation,
suggesting that other variables affect the power consumption. The slope of the linear function,
0.0763, indicates that each additional user increases the power consumption of the server by
0.0763 watts. The y-intercept of 54.263 suggests that when the number of users is zero, the
power consumption is 54.263 watts.
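The two fitted lines reported so far can be evaluated directly. The sketch below is a minimal Python illustration using the slopes and intercepts quoted in the text; the function name and the load level of 500 users are hypothetical, and the text does not state whether 500 users lies within the measured range.

```python
# Coefficients as reported in the text: (slope, intercept).
CPU_MODEL = (0.0701, 0.0)       # CPU utilization in percent
POWER_MODEL = (0.0763, 54.263)  # power consumption in watts

def predict(model, users):
    """Evaluate a fitted line y = slope * users + intercept."""
    slope, intercept = model
    return slope * users + intercept

users = 500  # hypothetical load level
print(f"CPU:   {predict(CPU_MODEL, users):.2f} %")
print(f"Power: {predict(POWER_MODEL, users):.2f} W")
```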
The third set of data shows the relationship between number of users and CPU
temperature in degrees Celsius. The data suggest a moderate linear relationship between the two
variables, with an R² value of 0.76021. There are several anomalies at the beginning of
experimentation, which may have substantially reduced the strength of the model. However, there
is an increasing trend between the two variables. The slope of the linear function, 0.5571,
indicates that each additional user increases the CPU temperature by 0.5571 °C. The y-intercept
of 25.093 suggests that when the number of users is zero, the CPU temperature is 25.093 °C.
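For readers unfamiliar with how a slope, intercept, and R² value are obtained, the following is a minimal ordinary least-squares sketch in plain Python. The sample points are hypothetical and are NOT the measurements from this study; the function name is likewise illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical sample points, NOT the measurements from this study.
users = [0, 100, 200, 300, 400]
watts = [54.0, 62.5, 69.0, 77.5, 85.0]
slope, intercept, r2 = fit_line(users, watts)
print(f"slope={slope:.4f} W/user, intercept={intercept:.3f} W, R^2={r2:.5f}")
```

An R² close to 1 means the line explains most of the variation in the dependent variable; scattered anomalies increase the residual sum of squares and pull R² down.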
Data from similar external studies have not been made publicly available at this time.
In addition, the variables in this study are specific to the conditions provided; the type of
request sent by the user and the specifications of the computer may be vastly different in
external studies, possibly causing discrepancies.
Conclusions
A method of load balancing with greater energy efficiency can be designed: powering off
servers that are not needed to process the current web requests to the data center saves energy
over time. In a data center simulation with only four servers, Design B was demonstrated to be
7.2% more energy efficient than the conventional method (Design A). Design C was
demonstrated to be 6.3% more energy efficient, and Design D was demonstrated to be 4.14%
more energy efficient. These percentages would be expected to increase if the simulation were
scaled to a larger data center, because the greater number of servers allows more combinations
of server status and therefore closer matching of active capacity to load.
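The reasoning behind powering off idle servers can be illustrated with the linear power model reported in the data analysis. The sketch below is a rough Python illustration under stated assumptions, not a reproduction of the 30-minute simulation or of the percentages above: it assumes a constant load, assumes a powered-off server draws 0 W, ignores the energy cost of booting, and uses hypothetical function names and a hypothetical load of 400 users.

```python
IDLE_W = 54.263      # reported y-intercept: power at zero users
W_PER_USER = 0.0763  # reported slope: watts per additional user

def cluster_power(total_users, servers_on):
    """Estimated total power (W) with users split evenly across servers.

    Assumes a powered-off server draws 0 W and ignores boot energy.
    """
    per_server_users = total_users / servers_on
    return servers_on * (IDLE_W + W_PER_USER * per_server_users)

def energy_wh(power_w, minutes):
    """Convert a constant power draw over a duration into watt-hours."""
    return power_w * minutes / 60

users = 400  # hypothetical constant load
baseline = energy_wh(cluster_power(users, 4), 30)
consolidated = energy_wh(cluster_power(users, 2), 30)
saved_pct = 100 * (baseline - consolidated) / baseline
print(f"{baseline:.1f} Wh vs {consolidated:.1f} Wh ({saved_pct:.1f}% saved)")
```

Under this linear model the per-user term is the same no matter how many servers share the load, so the estimated saving comes entirely from the idle power of the servers that are switched off.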
Limitations and Assumptions
Limitations of this project include the use of only one server to simulate a large data
center. In doing so, it is assumed that a single server can accurately represent a large server
farm in which all servers are the same model. In the simulation, four servers were assumed to be
able to represent a large data center, and the processes deciding the load distribution in the
data center were assumed to be able to scale to a large data center. It was also assumed that
the CPU utilization and other performance data of one server are the same as those of all
servers of the same model. Additionally, instead of having a natural flow of users to the
webpage, a virtual user simulation tool was used. It was assumed that this load simulation tool
was an accurate representation of real visitors to the website.
Based on the data gathered from experimentation, it was assumed that the CPU utilization
could be predicted to a moderate degree of accuracy based on the number of users or number of
requests being processed. It was also assumed that the power consumption could be predicted by
number of users.
Ambient room temperature, number of processes running on the server, physical location
or state of servers, type of computer or server, and software used to simulate load were
controlled in the experiment. However, humidity and background processes running on the
servers were not controlled.
There were several possible sources of error during the experiment. For instance, other
processes may have been running in the background of the server, causing random increases and
fluctuations in energy consumption. Also, the software used to simulate the users to the web
server may not have produced a perfectly concurrent flow of users, causing the results to shift in
either direction. Finally, there may have been a difference in the time stamps of the different sets
of logged data, causing all dependent variable values to be measured for the incorrect
independent variable setting. However, because the independent variable was changed gradually,
this would not have caused a great difference in understanding the relationship between number
of users and power consumed.
Applications and Future Experiments
The proposed methods of load balancing can be scaled up to large server farms to
increase the energy efficiency of data centers throughout the world.
The prototype must be redesigned for a larger server cluster and tested for various criteria
including energy efficiency and availability. Once one of the methods is demonstrated to be
more efficient in large data centers, the automation of the process must be scaled up for the
greater number of servers. With all aspects tested and prepared, the new method can be
implemented in data centers to begin saving energy immediately.
In the future, the study could be repeated with brand new servers that do not have many
background processes running to obtain a more accurate measure of energy saved. Also, the
entire test could be performed on a full-scale data center to get an actual representation of total
energy consumption and availability. The variation in number of users to any given web page
could be predicted using visitor history and mathematical modeling, and this prediction could be
factored into the load distribution design. Additionally, even more factors of the servers,
including CPU temperature, could be considered in the balancing of the load.
Appendix
DataSource.php:
Publicly available at http://code.google.com/p/php-csv-parser/
db_config.php:
<?php
// These constants define how to connect to the database and which database to connect to
// Make sure to change these values to fit your own
define('DB_HOST', 'localhost');
// DO NOT USE root user in a production environment
define('DB_USER', 'root');
// Make sure to add a secure password
define('DB_PASS', '');
// This needs to change to match your database. Often, on shared hosting, it'll be your cpanel
// username, followed by an underscore,
// followed by the actual database name
define('DB_NAME', 'anish_db');
?>
insert_csv_data_into_db.php:
<?php
// Contains the login info for the database
require_once('db_config.php');
$csv_data_file_src = 'FILTERED_hair salon wrentham ma 59.csv';
$csv_data_file_src2 = 'FILTERED_frozen yogurt wrentham ma.csv';
// THESE ALSO NEED TO BE CHANGED UPON UPLOAD
$visual_menu_id = 1;
$user_id = 2;
// Does the actual storing of data into the DB
storeDataFromCSV($csv_data_file_src, 1, 2);
storeDataFromCSV($csv_data_file_src2, 1, 3);
function retrieveAllDataRowsAsAssociativeArray($csv_filesrc, $row_to_use_as_header = 1){
    require_once('DataSource.php');
    // Instantiate the datasource class
    $csv = new File_CSV_DataSource;
    // We can set it to ignore X number of rows
    // load the csv file, ignore the first line since it doesn't have the header
    $csv->load($csv_filesrc, $row_to_use_as_header);
    // Make the CSV symmetric so that we can actually use most of the functions
    $csv->symmetrize();
    // Retrieve the data from the new CSV as a set of arrays with arrays in them
    /* EXAMPLE:
    array (
        0 =>
        array (
            'name' => 'john',
            'age' => '13',
            'skill' => 'knows magic',
        ),
        1 =>
        array (
            'name' => 'tanaka',
            'age' => '8',
            'skill' => 'makes sushi',
        ),
        2 =>
        array (
            'name' => 'jose',
            'age' => '5',
            'skill' => 'dances salsa',
        ),
    )
    */
    $dataArray = $csv->connect();
    return $dataArray;
}
function storeDataFromCSV($csv_data_file_src, $visual_menu_id, $user_id){
    // Get the array with all data rows matched to header row
    $csvDataArray = retrieveAllDataRowsAsAssociativeArray($csv_data_file_src, 0);
    print_r(array_keys($csvDataArray[0]));
    //print_r($csvDataArray);
    // Connect to the database
    $dbc = mysqli_connect(DB_HOST, DB_USER, DB_PASS, DB_NAME)
        or die('Could not connect to database...');
    // Check for duplicates in the database before inserting
    foreach($csvDataArray as $csvRow){
        $company_name = mysqli_real_escape_string($dbc, trim($csvRow['Company']));
        $website_url = mysqli_real_escape_string($dbc, trim($csvRow['Webpage']));
        // $slogan
        $addr1 = mysqli_real_escape_string($dbc, trim($csvRow['Address1']));
        $addr2 = mysqli_real_escape_string($dbc, trim($csvRow['Address2']));
        $phone = mysqli_real_escape_string($dbc, trim($csvRow['Phone']));
        $email = mysqli_real_escape_string($dbc, trim($csvRow['E-mail']));
        // $is_mobile
        /*
        $company_name = $csvRow['Company'];
        $website_url = $csvRow['Webpage'];
        // $slogan
        $addr1 = $csvRow['Address1'];
        $addr2 = $csvRow['Address2'];
        $phone = $csvRow['Phone'];
        $email = $csvRow['E-mail'];
        // $is_mobile
        */
        // Assume it is already there
        $duplicate_exists = True;
        // Check duplicate by comparing the website URL
        $query1 = "
            SELECT website_url FROM company_tbl WHERE
            website_url = '$website_url'
        ";
        $data = mysqli_query($dbc, $query1)
            or die("Error INS: ".mysqli_error($dbc));
        /*
        if(mysqli_num_rows($data) == 0){
            // There are no duplicates
            $duplicate_exists = False;
        }
        */
        // We want more duplicates right now because we want to try to increase the CPU usage
        $duplicate_exists = False;
        // Insert the same one 30 times
        for($i=0; $i < 30; $i++){
            // This is a new one, so let's insert it into the DB
            if(!$duplicate_exists){
                $query2 = "
                    INSERT INTO company_tbl (name, website_url, address_line1,
                        address_line2, phone, email, visual_menu_id, user_id, date_added)
                    VALUES ('$company_name', '$website_url', '$addr1', '$addr2',
                        '$phone', '$email', '$visual_menu_id', '$user_id', NOW())
                ";
                // Submit the queries
                $result = mysqli_query($dbc, $query2)
                    or die("Error RES: ".mysqli_error($dbc));
                if($result){
                    echo "$website_url successfully inserted into database for user $user_id <br />";
                }
            }else{
                echo "Duplicate already exists for $website_url. It was not inserted.<br />";
            }
        }
    }
    // Close the database
    mysqli_close($dbc);
}
?>
csv file format:
Company,Address1,Address2,Webpage,Phone,E-mail
American Skin Care,158 Main Street,"Norfolk, MA 02056",http://americanskincarenorfolk.com/,(508) 528-2888,none
Beauty Nail & Spa Salon,11 Robert toner Blvd,"North Attleborough, MA 02760",http://beautynailspa-attleboro.com/,(508) 699-8881,none
Hair's Boston,225 Franklin Village Drive,"Franklin, MA 02038",http://hairsboston.com/,(508) 520-3919,none
Joseph Witt Salon,313 North Main Street,"Mansfield, MA 02048",http://josephwittsalon.com/,(508) 339-2623,none
L'Equipe Personalized Hairdressing,276 Franklin Village Drive,"Franklin, MA 02038",http://lequipesalon.com/,(508) 520-7828,none
MG Salon & Spa,114 Main Street,"Medway, MA 02053",http://mgsalonspa.com/,(508) 533-0779,none
Phillip Richard Salon,9 Washington Street,"Plainville, MA 02762",http://philliprichardsalon.com/,(508) 643-3700,none
Salon Michique's,1764 Mendon Road,"Cumberland, RI 02864",http://salonmichiquesspa.com/,(401) 333-6111,none
Unique Eyebrow Threading,3335 Mendon Rd,"Cumberland, RI 02864",http://uniqueeyebrowthreading.com/,(401) 405-0787,none
American Laser Skincare,550 North Main Street #4,"Attleboro, MA 02703",http://www.americanlaser.com/,(508) 223-4400,none
Beauty By Zangi,5 West Street,"Walpole, MA 02081",http://www.beautybyzangi.info/,(508) 660-1031,none
Bohemia,762 East Washington Street,"North Attleborough, MA 02760",http://www.bohemiasalon.com/,(508) 695-5500,none
Brian Richards Salon,Suite 3,"456 West Central Street, Franklin, MA 02038",http://www.brianrichardsalon.com/,(508) 528-7300,none
Acknowledgements
The author wishes to thank several mentors who contributed to various aspects of this
project. Mr. Harvell, a WPI graduate and visiting scholar at Mass Academy, provided ongoing
support and guidance throughout the project, and also offered helpful ideas and suggestions
during project development. Mr. Puneet Kohli, Director of Software Engineering at RSA,
provided valuable information to assist in the understanding of the project and surrounding
fields. Finally, the author would like to thank his parents and brother for contributing support,
funding, and resource acquisition, including all computers, servers, and routers used for
experimentation.