You are on page 1of 4

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2016.2631466, IEEE
Communications Letters

IEEE COMMUNICATION LETTERS 1

Optimizing Green Energy, Cost, and Availability in


Distributed Data Centers
Rakesh Tripathi, S. Vignesh and Venkatesh Tamarapalli

AbstractIntegrating renewable energy and ensuring high Existing literature mostly addressed workload distribution
availability are among two major requirements for geo- considering green data centers to minimize the total cost of
distributed data centers. Availability is ensured by provisioning ownership (TCO). Our work is the first one to consider the
spare capacity across the data centers to mask data center
failures (either partial or complete). We propose a mixed integer real-time price of electricity (while enforcing a minimum
linear programming formulation for capacity planning while green energy usage), to design cost-efficient fault-tolerant
minimizing the total cost of ownership (TCO) for highly available, distributed data centers. We provision spare capacity across the
green, distributed data centers. We minimize the cost due to data centers so that the demand is met even after the failure at
power consumption and server deployment, while targeting a a data center site (either partial or complete), while minimizing
minimum usage of green energy. Solving our model shows that
capacity provisioning considering green energy integration, not the TCO. We consider both green and brown energy cost in
only lowers carbon footprint but also reduces the TCO. Results minimizing the operational cost. We model the problem using
show that upto 40% green energy usage is feasible with marginal mixed integer linear programming (MILP) where, the main
increase in the TCO compared to the other cost-aware models. constraints include; green energy usage bound, latency bound,
Index TermsGeo-distributed green data center, cloud provi- and the failure probability at a site (partial or complete).
sioning, operating cost minimization, fault tolerance. Solving our model gives the optimal server distribution across
the sites and the demand distribution that minimizes the TCO.
I. I NTRODUCTION
II. O PTIMIZATION F RAMEWORK
With increased usage of Internet services, there is a rapid
growth in the number of geo-distributed data centers around In this section, we first state the assumptions in the model
the world. At the same time, data center operators are under for green energy availability, failure and power consumption,
pressure to minimize the carbon footprint. One of the ways and then present the MILP formulation.
to do this is to use renewable energy from on-site or off-
site sources. Recently, all major distributed data-centers are A. Assumptions
powered either partially or fully by renewable energy. The We assume a distributed data center powered by multiple
authors of [1] demonstrated how distributed data centers green and brown energy sources. Each data center is integrated
can exploit uncorrelated wind sources to meet 95% of their with green energy sources such as, onsite/offsite renewable en-
energy requirement. The work in [2] minimized the cost of ergy sources (wind and/or solar), Power Purchase Agreement
building data centers and renewable energy consumption while (PPA), and brown energy through grid. The brown energy
satisfying constraints on green energy integration, availability and PPA can be used flexibly based on the power demand
and latency. The authors of [3] considered capping carbon and available green energy. To account for different types of
footprint while minimizing the operating cost (including utility workload, we consider heterogeneous demand. The following
price, renewable energy cost, battery cost and operational assumptions are used in the model.
expenditure). For a good survey of the literature dealing with
Each data center consolidates the workload to keep the
integration of renewable energy in data centers, see [4].
power consumption proportional to the workload served.
It is estimated that the financial loss for an hour of downtime
Service time (including queuing delay) is same for all the
can range from $250,000-$500,000 for a large data center
data centers and the latency is due to the propagation delay.
operator [5]. Typically, high availability (also termed fault-
The requests are placed in a single queue to be served by
tolerance) is handled by spare capacity provisioning to mask
any server. All servers are uniformly loaded.
partial or complete data center failure (at a site). While the
Only one data center can completely or partially fail at any
work in [6] minimizes the cost of spare capacity provisioning
point in time [6]. Failure of more than one data center at
by minimizing the number of servers, our previous work
the same time is avoided by the choice of locations.
showed the need to minimize the operating cost by considering
spatio-temporal variation in electricity price [7]. However,
none of them considered the cost of green energy procurement B. Optimization Problem Formulation
while optimizing the operating cost. Demand: Let S be the set of data centers housing ms number
of servers. Let Lah
u denote the demand from a client region
The authors are with the Department of Computer Science & Engineering,
Indian Institute of Technology Guwahati, Assam, India. (e-mail: {t.rakesh, u during hour h for application type a. Let af h
su denote the
s.vignesh, t.venkat}@iitg.ernet.in). number of requests mapped from client region u to data center

1089-7798 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2016.2631466, IEEE
Communications Letters

IEEE COMMUNICATION LETTERS 2

s (s S), at hour h for an application type a. Here f = 0 Brown energy cost: The cost of brown energy consumed
indicates the case of no data center failure and f 1, 2..|S| across all the data centers is given by
indicates the case of f th data center failure. X
Heterogeneous workload: Since we assume that a data center = sh P Bsf h (6)
s,h,f
can serve different types of workload, we explicitly considered
the heterogeneity while calculating the server utilization. Let Renewable energy cost: The total cost incurred in using
the processing rate of the server be B bps and the mean job renewable energy across all the data centers is given by
size be Ja for application type a. The effective service rate is X
h fh
B R= si si (7)
Ja . We define the average utilization as s,h,f
af h
P
fh u,a su Ja Objective function: The objective for capacity provisioning
s = (1)
ms B in fault-tolerant green distributed data centers is to minimize
Failure model: Let p be the fraction of servers failing at any the TCO, denoted by z, which is simply the sum of all the
given site. The processing rate of a failed data center reduces aforementioned costs while satisfying constraints on delay,
to (1p)ms B. The data center utilization after the failure can green energy usage and availability. Formally, the problem is
be expressed as expressed as

Eq.(1) f 6= s
minimize z = + + R; (8)




sf h = Pu,a af h
su Ja
(2) subject to


(1p)ms B f = s, p < 1
X
fsih Psf h ,

0 f = s, p = 1 f (9)

s,i,h
Delay: Let Dsu be the propagation delay between the data X
center s and the client region u. We define a target delay of af
su
h
= Lah
u , u, a, h, f (10)
Dmax for all types of workloads (with different processing s
f f f
rate), when no data center has failed and Dmax (Dmax 2Dsu ysu Dmax , s, u, f = 0 (11)
Dmax ) for the case of a data center failure. We use a binary f
2Dsu ysu f
Dmax , s, u, f 1 (12)
f
variable, ysu to indicate the ability of data center s to service
requests from client region u when data center f has failed. 0 su ysu Lah
af h f
u , s, u, a, h, f (13)
fh max
Power Consumption: Let Pidle be the average power drawn s , s, h, f (14)
in idle condition and Ppeak be the power consumed when M min
ms M max , s (15)
server is running at peak utilization. The total power consumed
by s S, at hour h H is modeled as [8] af
su
h
= 0, u, a, h, s = f (16)
f
ysu {0, 1} , s, u, f (17)
Psf h = ms (Pidle + (Es 1)Ppeak )
Among the constraints, Eq. 9 ensures that the green energy
+ ms (Ppeak Pidle )sf h + , (3) sources meet percent of the total power demand over the
where Es is the power usage effectiveness (PUE) of a data time. Eq. 10 makes sure that the demand during every hour
center at s, defined as the ratio of the total power consumed is met. Eq. 11,12, and 13 ensure that all types of workload
by the data center to the power consumed by the computing are served within the latency bound (before and after failure).
equipment, and  is an empirical constant. Eq. 14 is used to limit the queuing delay by bounding the
Modeling Brown Energy Usage: Let sh be the price of brown average server utilization to max (0, 1]. It also ensures that
energy at a data center s during hour h and si h
be the price workload assigned to a failed data center is bounded by its
of green energy of type i, i {1, 2, 3, 4, 5} corresponding to capacity. Eq. 15 ensures that capacity limit of a data center
onsite wind, offsite wind, onsite solar, offsite solar and PPA, (in terms of number of servers) is not exceeded. Eq. 16 ensures
respectively. Let P Bsf h denote the amount of brown energy that no request is served by a failed data center.
drawn at hour h and fsih denotes the amount of renewable The decision variables in the MILP are: ms , the number of
energy drawn from source i. Since the brown energy is used servers in a data center s, af h
su , the number of requests from
only after exhausting the green energy available, the brown client region u mapped to data center s at hour h for workload
energy drawn from the grid is given by type a, and fsih , the renewable energy drawn from source i.
X fh
P Bsf h = Psf h si s, h, f (4) III. N UMERICAL R ESULTS
i
The proposed MILP model (termed GACED) is solved
Cost Model: We first define the following cost components. centrally using CPLEX and MATLAB tools on a Linux server
Server cost: Let be the cost of acquiring a server. The with Intel Xeon processor and 64 GB of RAM. Since spare
total cost of the servers in all the data centers is capacity provisioning in data centers is a one-time effort at the
=
X
ms (5) time of design, the running time is not a matter of concern. We
s
obtained the necessary data for three locations: Texas, Illinois,

1089-7798 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2016.2631466, IEEE
Communications Letters

IEEE COMMUNICATION LETTERS 3

4 1.1
GACED CED-B

0 1

Normalized TCO
0.9

Gain (%)
-4
0.8
-8
0.7
-12 0.6
-16 0.5
0 20 40 60 80 0 20 40 60 80 100
Minimum bound on Green Usage(%) Number of server failure in a data center(%)
(a) Green usage bound (b) Failure percentage
Fig. 1: Impact of varying green energy usage and failure percentage on the TCO of GACED model

1 GACED CED-B GACED CED-B


1
Normalized TCO

Normalized TCO
0.8
0.98
0.6
0.96
0.4
0.2 0.94

0 0.92
1 2 3 4 5 40 50 60 70 80 90 100
Demand (X) Delay (ms)
(a) TCO with demand variation (b) TCO with delay bound
Fig. 2: Impact of demand and delay bound variation on the TCO of GACED model

and California from the Web. The server processing rate is set data from NREL4 , we calculated the total power generated. At
to 1.6 MBps. Two types of workload are considered with the each site, we considered 20 wind turbines of capacity1.5MW,
mean job size 13KB and 26KB, and the service rate of 120 each and 10,000 solar panels of 120W, each. The cost of
f
and 60 requests/sec. Dmax and Dmax are set to 40ms and generating wind and solar power is obtained by taking the
80ms, respectively. PPA price for TX, CA, IL is taken as 8, 8, installation cost of 1630 and 3100 $/kW, and life time of 20
4 cents/kWh, respectively. For other parameters refer [7]. The and 25 yrs, respectively [3]. We took quarterly average of
client demand was generated from traces of Wikipedia.org1 . client demand and renewable energy generated for every hour
We chose nine regions of USA: Illinois, Tennessee, New of the day. The brown electricity price at different locations is
York, Arizona, Massachusetts, California, Florida, Missouri, taken from US energy information administration website.
and Louisiana. The demand at each location was proportional For comparison, we designed a baseline model (termed
to the number of Internet users2 . Table I reports on-site and CED-B) that minimizes the TCO, where the data centers
off-site renewable energy sources at each location with the are powered only with brown energy while retaining other
corresponding average capacity factor (CF), defined as the constraints. For a full-site failure scenario, Fig. 1a shows the
ratio of the actual power output to the maximum rated capacity. percentage gain in the TCO using GACED model compared
to the CED-B model. Even after failure, the gain is 2%
Source Location Avg CF(%) with the green energy usage of 20%. The gain reduces with
Wind California 27
Wind Illinois 32 increase in green energy usage because, our model increases
Onsite Wind Texas 33.6 the amount of (expensive) green energy purchased to satisfy
Solar California 22.8 the constraint. On the other hand, CED-B model has no cost
Solar Texas 23.46 from green energy usage. With the GACED model greening
Wind Arizona 30 can be achieved with very little or no extra cost, unless we
Wind Colorado 43
Offsite Wind Iowa 34
target high renewable energy usage.
Solar Arizona 27.74 Fig. 1b compares the TCO for the two models while forcing
Solar Colorado 24.61 40% green energy usage and varying the failure percentage.
TABLE I: Capacity factor for various green energy sources
We see that the TCO is almost similar for both the models due
to the fact that, the GACED model optimally uses cheaper
We used the models from NREL 3 and [9] for solar and wind renewable energy to reduce the TCO. Fig. 3 illustrates the
energy generation, respectively. Based on the meteorological offsite wind energy usage in the GACED model and its
corresponding price at the Texas data center. When the wind
1 http://dumps.wikimedia.org/other/pagecounts-raw energy is cheaper, GACED uses more of it to maintain the
2 http://www.internetworldstats.com/unitedstates.htm
3 http://pvwatts.nrel.gov/pvwatts.php 4 http://www.nrel.gov/midc/

1089-7798 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2016.2631466, IEEE
Communications Letters

IEEE COMMUNICATION LETTERS 4

30 6 work with 20% green energy, where GACED model has only
Brown price Wind used Wind price
3% higher TCO.

Power used (MW)


Price in cent/kWh 20 4 Harnessing green energy depends upon the cost and effi-
ciency of technology, which is constantly improving thereby
10 2 reducing the cost. For example, between 2006 and 2014, the
world-wide average photo-voltaic solar cell cost had dropped
0 0 by 78%5 . Considering about 10% reduction in green energy
0 10 20 30 40 50 60 70 80 90 100
price per year, we computed that the GACED model leads to
Time (in hours)
a gain of 10% in the TCO even with 80% green energy usage.
Fig. 3: Illustration of wind energy usage
IV. C ONCLUSION
6
We used MILP to formulate the cost-aware capacity pro-
3
visioning problem for fault-tolerant data centers ensuring a
0
minimum green energy usage (GACED model). The pro-
Gain (%)

-3
posed model outperforms the baseline model (CED-B) which
-6 minimizes the TCO considering only brown energy. Results
-9 demonstrate that even with renewable energy integration, the
-12
-40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40
TCO can be low with GACED model, despite green energy
Change in capacity factor(%) being costlier. Even forcing a green energy usage of upto
Fig. 4: Gain in TCO for GACED and CED-B models varying CF 80% leads to an additional cost of only 15% compared to
the CED-B model. The proposed model optimally schedules
demand considering the availability of green energy and its
same TCO (as with CED-B), albeit with reduced carbon price variation to lower the TCO. We conclude that with an
footprint. Due to intelligent usage of green energy, it was appropriate model, green energy integration lowers the cost of
possible to meet the target renewable energy usage of 40% designing fault tolerant distributed data centers with reduced
at all times. We conclude that the GACED model can lead carbon footprint. Our model would be further beneficial with
to greener data center deployment with no or little additional an improvement in technology leading to larger capacity factor
cost (though green energy procurement is costlier). (lower renewable energy cost).
Fig. 2a shows the impact of increase in demand (multiples
of baseline demand in previous experiment) on the TCO and R EFERENCES
green energy procurement decision. We set green energy usage [1] V. Gao, Z. Zeng, X. Liu, and P. Kumar, The answer is blowing in the
and failure percentage to 40% and 20%, respectively. As wind: Analysis of powering internet data centers with wind energy, in
demand increases, the TCO for both the models increases Proc. of IEEE INFOCOM, April 2013, pp. 520524.
[2] J. L. Berral, I. Goiri, T. D. Nguyen, R. Gavalda, J. Torres, and R. Bian-
due to obvious reasons. However, TCO with GACED model chini, Building green cloud services at low cost, in Proc. of IEEE
increases with the demand due to the green energy usage ICDCS, 2014, pp. 449460.
constraint. We note that, even with five-fold increase in the [3] C. Ren, D. Wang, B. Urgaonkar, and A. Sivasubramaniam, Carbon-aware
energy capacity planning for datacenters, in Proc. of IEEE MASCOTS,
demand (with a cost of almost 30 cents/kW for wind energy 2012, pp. 391400.
at Texas), it is possible to meet the 40% green energy usage [4] W. Deng, F. Liu, H. Jin, B. Li, and D. Li, Harnessing renewable energy
constraint with a meagre 4% increase in the TCO. Fig. 2b in cloud datacenters: opportunities and challenges, IEEE Network, pp.
4855, 2014.
shows the impact of relaxing latency requirement on the TCO. [5] 2013 study on data center outages, http://www.emersonnetworkpower.
f
Dmax is set to twice Dmax . We notice that both the models com/documentation/en-us/brands/liebert/documents/whitepapers/2013
reduce the TCO for relaxed latency bound, since there is more emerson data center outages sl-24679.pdf.
[6] I. Narayanan, A. Kansal, A. Sivasubramaniam, B. Urgaonkar, and
choice in the data centers serving a region. However, GACED S. Govindan, Towards a leaner geo-distributed cloud infrastructure, in
model lowers the TCO by considering locations powered by Proc. of HotCloud, 2014, pp. 18.
cheaper green energy. [7] R. Tripathi, S. Vignesh, and V. Tamarapalli, Cost-aware capacity pro-
visioning for fault-tolerant geo-distributed data centers, in Proc. of
Sensitivity analysis: We quantitatively evaluate the impact COMSNETS, 2016, pp. 18.
of uncertainty in the renewable energy availability on the [8] A.-H. Mohsenian-Rad and A. Leon-Garcia, Energy-information trans-
performance of GACED model. For the case of complete data mission tradeoff in green cloud computing, Carbon, vol.100, p.200, 2010.
[9] Y. Zhang, Y. Wang, and X. Wang, Greenware: Greening cloud-scale
center failure and 40 % green energy usage requirement, the data centers to maximize the use of renewable energy, in Proc. of
capacity factor was varied in the range of [-40%, 40%]. Fig. 4 ACM/USENIX Middleware, 2011, pp. 143164.
shows the percentage gain in the TCO with GACED model
(compared to the CED-B model). Since, the cost of renewable
energy decreases with increasing capacity factor [3], the
GACED model has lower TCO compared to the CED-B model
(by about 4%). This is because it efficiently exploits cheaper
green energy. If the forecasted green energy availability is
inaccurate, i.e., the capacity factor changes by 40%, we can 5 http://solarcellcentral.com/cost page.html

1089-7798 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

You might also like