
2016 IEEE 36th International Conference on Distributed Computing Systems

Minimum-cost Cloud Storage Service Across Multiple Cloud Providers

Guoxin Liu and Haiying Shen


Department of Electrical and Computer Engineering
Clemson University, Clemson, SC 29631, USA
{guoxinl, shenh}@clemson.edu

Abstract—Many Cloud Service Providers (CSPs) provide data storage services with datacenters distributed worldwide. These datacenters provide different Get/Put latencies and unit prices for resource utilization and reservation. Thus, when selecting different CSPs' datacenters, cloud customers of globally distributed applications (e.g., online social networks) face two challenges: i) how to allocate data to worldwide datacenters to satisfy application SLO (service level objective) requirements including both data retrieval latency and availability, and ii) how to allocate data and reserve resources in datacenters belonging to different CSPs to minimize the payment cost. To handle these challenges, we first model the cost minimization problem under SLO constraints using integer programming. Due to its NP-hardness, we then introduce our heuristic solution, including a dominant-cost based data allocation algorithm and an optimal resource reservation algorithm. We finally introduce an infrastructure to enable the conduction of the algorithms. Our trace-driven experiments on a supercomputing cluster and on real clouds (i.e., Amazon S3, Windows Azure Storage and Google Cloud Storage) show the effectiveness of our algorithms for SLO guaranteed services and customer cost minimization.
I. INTRODUCTION

Cloud storage (e.g., Amazon S3 [1], Microsoft Azure [2] and Google Cloud Storage [3]) is emerging as a popular commercial service. Each cloud service provider (CSP) provides a worldwide data storage service (including Gets and Puts) using its geographically distributed datacenters. In order to save the capital expenditures to build and maintain the hardware infrastructures and avoid the complexity of managing the datacenters, more and more enterprises shift their data workloads to the cloud storage [4].

Web applications, such as online social networks and web portals, provide services to clients all over the world. The data access delay and availability are important to web applications, since they affect cloud customers' incomes. For example, experiments at the Amazon portal [5] demonstrated that a small increase of 100ms in webpage presentation time significantly reduces user satisfaction and degrades sales by one percent. For a request of data retrieval in the web presentation process, the typical latency budget inside a storage system is only 50-100ms [6]. In order to reduce data access latency, the data requested by clients needs to be allocated to datacenters near the clients, which requires worldwide distribution of data replicas. Also, inter-datacenter data replication enhances data availability since it avoids a high risk of service failures due to datacenter failure, which may be caused by disasters or power shortages. However, a single CSP may not have datacenters in all locations needed by a worldwide web application. Besides, using a single CSP may introduce a data storage vendor lock-in problem [7], in which a customer may not be free to switch to the optimal vendor due to prohibitively high switching costs. This problem can be addressed by allocating data to datacenters belonging to different CSPs. Building such a geo-distributed cloud storage is faced with a challenge: how to allocate data to worldwide datacenters to satisfy application SLO (service level objective) requirements including both data retrieval latency and availability? The data allocation in this paper means the allocation of both data storage and Get requests to datacenters.

Different datacenters of a CSP or different CSPs offer different prices for Storage, data Gets/Puts and Transfers. For example, Amazon S3 provides a cheaper data storage price ($0.01/GB and $0.005/1,000 requests), and Windows Azure in the US East region provides a cheaper data Get/Put price ($0.024/GB and $0.005/100,000 requests). Suppose an application running on Amazon EC2 in the US East region has data dj with a large storage size and few Gets, and data di which is read-intensive. Then, to reduce the total payment cost, the application should store data dj in Amazon S3, and store data di in Windows Azure in the US East region. Besides the different prices, the pricing manner is even more complicated due to two charging formats: pay-as-you-go and reservation. Then, the second challenge is introduced: how to allocate data to datacenters belonging to different CSPs and make resource reservation to minimize the service payment cost?
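As a rough illustration of why the dominant resource should drive placement, consider the two price points quoted above. The sizes and request rates below are hypothetical, and the monthly costs ignore transfer and tiered pricing; this is a sketch of the comparison, not a billing calculator.

```python
# Hypothetical monthly costs under the two price points quoted above.
# Storage: $/GB-month; requests: $ per request (S3: $0.005 per 1,000;
# Azure US East: $0.005 per 100,000). Sizes and rates are illustrative.
S3 = {"storage": 0.01, "per_req": 0.005 / 1_000}
AZURE = {"storage": 0.024, "per_req": 0.005 / 100_000}

def monthly_cost(price, size_gb, gets_per_month):
    return price["storage"] * size_gb + price["per_req"] * gets_per_month

# d_j: storage-heavy (500 GB, 10K Gets/month) -> cheaper on S3.
print(monthly_cost(S3, 500, 10_000))       # ~$5.05
print(monthly_cost(AZURE, 500, 10_000))    # ~$12.00
# d_i: read-intensive (1 GB, 10M Gets/month) -> cheaper on Azure.
print(monthly_cost(S3, 1, 10_000_000))     # ~$50.01
print(monthly_cost(AZURE, 1, 10_000_000))  # ~$0.52
```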
Though many previous works [8-10] focus on finding the minimum resource to support the workload to reduce cloud storage cost in a single CSP, there are few works that studied cloud storage cost optimization across multiple CSPs with different prices. SPANStore [11] aims to minimize the cloud storage cost while satisfying the latency and failure requirement across multiple CSPs. However, it neglects both the resource reservation pricing model and the datacenter capacity limits for serving Get/Put requests. A datacenter's Get/Put capacity is represented by the Get/Put rate (i.e., the number of Gets/Puts in a unit time period) it can handle. Reserving resources in advance can save
significant payment cost for customers, and the capacity limit is critical for guaranteeing SLOs since datacenter network overload occurs frequently [12, 13]. The integer program used to create a data allocation in [11] becomes NP-hard if it is modified with capacity-awareness, and it then cannot be easily resolved. As far as we know, our work is the first that provides a minimum-cost cloud storage service across multiple CSPs with the consideration of resource reservation and datacenter capacity limits.

To handle the above-stated two challenges, we propose a geo-distributed cloud storage system for Data storage and request Allocation and resource Reservation across multiple CSPs (DAR). It transparently helps customers to minimize their payment cost while guaranteeing their SLOs. We summarize our contributions below:

• We have modeled the cost minimization problem under multiple constraints using integer programming.
• We introduce a heuristic solution including:
(1) A dominant-cost based data allocation algorithm, which finds the dominant cost (Storage, Get or Put) of each data item and allocates it to the datacenter with the minimum unit price of this dominant cost to reduce cost in the pay-as-you-go manner.
(2) An optimal resource reservation algorithm, which maximizes the payment cost saved by reservation from the pay-as-you-go payment while avoiding over reservation.
• We conduct extensive trace-driven experiments on a supercomputing cluster and real clouds (i.e., Amazon S3, Windows Azure Storage and Google Cloud Storage) to show the effectiveness and efficiency of our system in cost minimization, SLO compliance and system overhead in comparison with previous systems.

DAR is suitable for the scenarios in which most customer data items have a dominant cost. The rest of the paper is organized as follows. Section II depicts the cost minimization problem. Sections III and IV present the design and infrastructure of DAR. Section V presents the trace-driven experimental results. Section VI presents the related work. Section VII gives the conclusion with remarks on our future work.

II. PROBLEM STATEMENT

A. Background

We call a datacenter that operates a customer's application a customer datacenter of this customer. According to the operations of a customer's clients, the customer datacenter generates read/write requests to a storage datacenter storing the requested data. A customer may have multiple customer datacenters (denoted by D_c). We use dc_i ∈ D_c to denote the i-th customer datacenter of the customer. We use D_s to denote all datacenters provided by all cloud providers and use dp_j ∈ D_s to denote storage datacenter j. A client's Put/Get request is forwarded from a customer datacenter to the storage datacenter of the requested data. The cloud storage customers need data request (Put/Get) deadlines for their applications, and need to avoid data request failures. One type of SLO specifies the Get/Put bounded latency and the percentage of requests obeying the deadline [7]. Another type of SLO guarantees the data availability in the form of a service probability [14] by ensuring a certain number of replicas in different locations [1]. DAR considers both types to form its SLO and can adapt to either type easily. This SLO specifies the deadlines for the Get/Put requests (L_g and L_p), the maximum allowed percentage of data Get/Put operations beyond the deadlines (ε_g and ε_p), and the minimum number of replicas (denoted by β) among storage datacenters [1]. For a customer datacenter's Get request, any storage datacenter holding the requested data (i.e., a replica datacenter) can serve this request. A cloud storage system usually specifies the request serving ratio for each replica datacenter of a data item during billing period t_k (e.g., one month).

The CSPs charge the customers by the usage of three different types of resources: the storage measured by the data size stored in a specific region, the data transfer to other datacenters operated by the same or other CSPs, and the number of Get/Put operations on the data [15]. The storage and data transfer are charged in the pay-as-you-go manner based on the unit price. The Get/Put operations are charged in the manners of both pay-as-you-go and reservation. In the reservation manner, the customer specifies and prepays the number of Puts/Gets per reservation period T (e.g., one year). The unit price for the reserved usage is much cheaper than the unit price of the pay-as-you-go manner (by a specific percentage) [15]. For simplicity, we assume all datacenters have comparable price discounts for reservation. That is, if a datacenter has a low unit price in the pay-as-you-go manner, it also has a relatively low price in the reservation manner. The amount of usage that overhangs the reserved amount is charged in the pay-as-you-go manner. Therefore, the payment cost can be minimized by increasing the amount of Gets/Puts charged by reservation and reducing the amount of Gets/Puts of over reservation, which reserves more Gets/Puts than the actual usage. For easy reference, we list the main notations used in the paper in Table I.
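To make the two charging formats concrete, the following sketch bills one datacenter's Gets over several billing periods; the discount ratio (written alpha, anticipating the reservation price ratio α defined in Section II-B) and all numbers are hypothetical.

```python
# Billing sketch for one datacenter's Gets: reserved usage is prepaid
# at a discounted unit price, overage falls back to pay-as-you-go.
# alpha (reservation price ratio) and all numbers are hypothetical.
def billed_gets(usage_per_period, reserved, unit_price, alpha):
    cost = 0.0
    for used in usage_per_period:                     # one entry per billing period
        cost += alpha * reserved * unit_price         # prepaid reserved Gets
        cost += max(used - reserved, 0) * unit_price  # pay-as-you-go overage
    return cost

usage = [1.2e6, 0.7e6, 1.0e6]                         # Gets per billing period
print(billed_gets(usage, reserved=1.0e6, unit_price=4e-7, alpha=0.6))
# Reserving too much wastes prepaid Gets in the 0.7e6 period;
# reserving too little pushes usage back to the full unit price.
```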
B. Problem Formulation

For a customer, DAR aims to find a schedule that allocates each data item to a number of selected datacenters, allocates request serving ratios to these datacenters and determines the reservation in order to guarantee the SLO and minimize the payment cost of the customer. In the following, we formulate this problem using integer programming. We first set up the objective of payment minimization. We then form the constraints, including satisfying the SLO guarantee, data availability, and datacenter capacity. Finally, we formulate the problem with the objective and constraints.

Payment minimization objective. We aim to minimize the total payment cost for a customer (denoted as C_t). It is calculated as

  C_t = C_s + C_c + C_g + C_p,  (1)
where C_s, C_c, C_g and C_p are the total Storage, Transfer, Get and Put costs during the entire reservation time T, respectively.

Table I: Notations of inputs and outputs in data allocation.
  D_c: customer datacenter set | D_s: storage datacenter set
  dc_i: i-th customer datacenter | dp_j: j-th storage datacenter
  θ^g_{dp_j}: Get capacity of dp_j | θ^p_{dp_j}: Put capacity of dp_j
  p^s_{dp_j}: dp_j's unit storage price | p^t_{dp_j}: dp_j's unit transfer price
  p^g_{dp_j}: unit Get price of dp_j | p^p_{dp_j}: unit Put price of dp_j
  F^g(x): CDF of Get latency | F^p(x): CDF of Put latency
  α: dp_j's reservation price ratio | D: entire data set
  d_l / s_{d_l}: data item l and its size | L_g / L_p: Get/Put deadlines
  ε_g / ε_p: allowed % of Gets/Puts beyond deadline | v^{d_l,t_k}_{dc_i} / u^{d_l,t_k}_{dc_i}: Get/Put rates towards d_l from dc_i in t_k
  Q_g / Q_p: SLO satisfaction levels | β: minimum number of replicas
  t_k: k-th billing period in T | T: reservation time
  C_t: total cost for storing D and serving requests | X^{d_l,t_k}_{dp_j}: existence of d_l in dp_j during t_k

The storage cost is calculated by:

  C_s = Σ_{t_k∈T} Σ_{d_l∈D} Σ_{dp_j∈D_s} X^{d_l,t_k}_{dp_j} · p^s_{dp_j} · s_{d_l},  (2)

where s_{d_l} denotes the size of data d_l, p^s_{dp_j} denotes the unit storage price of datacenter dp_j, and X^{d_l,t_k}_{dp_j} denotes a binary variable: it equals 1 if d_l is stored in dp_j during t_k, and 0 otherwise.

The transfer cost for importing data to storage datacenters is a one-time cost. The imported data is not stored in the datacenter during the previous period t_{k-1}, but is stored in the datacenter in the current period t_k. Thus, the data transfer cost is:

  C_c = Σ_{t_k∈T} Σ_{d_l∈D} Σ_{dp_j∈D_s} X^{d_l,t_k}_{dp_j} · (1 − X^{d_l,t_{k−1}}_{dp_j}) · p^t(dp_j) · s_{d_l},  (3)

where p^t(dp_j) is the cheapest unit transfer price of replicating d_l to dp_j among all datacenters storing d_l. The Get/Put billings are based on the pay-as-you-go and reservation manners. The reserved number of Gets/Puts (denoted by R^g_{dp_j} and R^p_{dp_j}) is decided at the beginning of each reservation time period T. The reservation prices for Gets and Puts are a specific percentage of their unit prices in the pay-as-you-go manner [15]. Then, we use α to denote the reservation price ratio, which means that the unit prices for reserved Gets/Puts are α·p^g_{dp_j} and α·p^p_{dp_j}, respectively. Thus, the Get/Put costs are calculated by:

  C_g = Σ_{t_k∈T} Σ_{dp_j∈D_s} (Max{Σ_{dc_i∈D_c} r^{t_k}_{dc_i,dp_j} · t_k − R^g_{dp_j}, 0} + α·R^g_{dp_j}) · p^g_{dp_j},  (4)
  C_p = Σ_{t_k∈T} Σ_{dp_j∈D_s} (Max{Σ_{dc_i∈D_c} w^{t_k}_{dc_i,dp_j} · t_k − R^p_{dp_j}, 0} + α·R^p_{dp_j}) · p^p_{dp_j},  (5)

where r^{t_k}_{dc_i,dp_j} and w^{t_k}_{dc_i,dp_j} denote the average Get and Put rates from dc_i to dp_j per unit time in t_k, respectively.
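The cost model in Equations (2)-(5) reduces to straightforward bookkeeping once the allocation and the per-pair rates are known. Below is a minimal sketch of the Get-side term of Equation (4) for a single datacenter, assuming the per-period Get counts have already been aggregated; the variable names are ours, not the paper's.

```python
# Get cost of one storage datacenter dp_j over the reservation time T,
# following Equation (4): overage is billed pay-as-you-go, the reserved
# Gets are prepaid at the discounted price alpha * p_g. Inputs are
# assumed aggregated: gets_per_period[k] = sum_i r[t_k][dc_i, dp_j] * t_k.
def get_cost(gets_per_period, reserved_gets, p_g, alpha):
    total = 0.0
    for gets in gets_per_period:
        overage = max(gets - reserved_gets, 0.0)      # pay-as-you-go part
        total += (overage + alpha * reserved_gets) * p_g
    return total
```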
The transfer cost for importing data to storage datacenters As a result, we can calculate the actual percentage of
is one-time cost. The imported data is not stored in the Gets/Puts satisfying the latency requirement within tk for a
datacenter during the previous period tk1 , but is stored customer (denoted as qgtk and qptk ):
  t g
k
rdc Fdc ,dp (Lg )
in the datacenter in the current period tk . Thus, the data t
qgk =
dci Dc dpj Ds i ,dpj i j
,
  tk
transfercostis:  dci Dc dp Ds rdc ,dp
j i j
(6)
d ,tk d ,tk1   t
Cc = Xdpl (1 Xdpl ) pt (dpj ) sdl , (3) t dci Dc wdc
dpj Ds
k
i ,dpj
p
Fdc ,dp (Lp )
tk T dl D dpj Ds
j j
qpk =   tk
i j
.
dci Dc dpj Ds wdci ,dpj
where pt (dpj ) is the cheapest unit transfer price of replicat- To judge whether the deadline SLO of a customer is
ing dl to dpj among all datacenters storing dl . The Get/Put satised during tk , we dene the Get/Put SLO satisfaction
billings are based on the pay-as-you-go and reservation man- level of a customer, denoted by Qg and Qp .
g
ners. The reserved number of Gets/Puts (denoted by Rdp t
p
j Qg = M in{M in{qgk }tk T , (1 g )}/(1 g )
and Rdpj ) is decided at the beginning of each reservation t (7)
Qp = M in{M in{qpk }tk T , (1 p )}/(1 p ).
time period T . The reservation prices for Gets and Puts are We see that if
a specic percentage of their unit prices in the pay-as-you-go Qg Q p = 1 (8)
manner [15]. Then, we use to denote the reservation price i.e., Q = Q = 1, the customers deadline SLO is satised.
g p
ratio, which means that the unit price for reserved Gets/Puts Next, we formulate whether the data availability SLO is
is pgdpj and ppdpj , respectively. Thus, the Get/Put cost satised. To satisfy the data availability SLO, there must
is calculated by: be at least datacenters that stores the requested data and
  t g g
Cg = (M ax{ k
rdc tk Rdp , 0}+Rdp )pgdp , (4) satisfy the Get deadline SLO for each Get request of dci
i ,dpj j j j
tk dpj dci during tk . The set of all datacenters satisfying the Get
  t p p g
Cp = (M ax{ wdck
tk Rdp , 0}+Rdp )ppdp , (5) deadline SLO for requests from dci (denoted by Sdc ) is
i ,dpj j j j i
tk dpj dci represented by:
g g
where rdctk tk
and wdc denote the average Get and Put Sdc i
= {dpj |Fdc i ,dpj
(Lg ) (1 g )}. (9)
i ,dpj i ,dpj
rates from dci to dpj per unit time in tk , respectively. The set of data items read by dci during tk is represented
dl ,tk
SLO guarantee. Recall that DARs SLO species both by: Gtdck i = {dl |vdc i
> 0 dl D}. Then, the data
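A direct reading of Equations (6)-(8): weight each datacenter pair's in-deadline fraction by its request rate, then clamp against the SLO target. The sketch below assumes the rates and the CDF values at the deadline are supplied as dictionaries keyed by (dc_i, dp_j) pairs; it is illustrative, not DAR's implementation.

```python
# q per Equation (6): rate-weighted fraction of requests within deadline.
# rates[(i, j)] = r (or w) for the pair; cdf_at_deadline[(i, j)] = F(L).
def q_level(rates, cdf_at_deadline):
    total = sum(rates.values())
    hit = sum(rate * cdf_at_deadline[pair] for pair, rate in rates.items())
    return hit / total if total > 0 else 1.0

# Q per Equation (7): equals 1.0 iff every billing period meets the target.
def satisfaction_level(q_per_period, epsilon):
    target = 1.0 - epsilon
    return min(min(q_per_period), target) / target
```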
Next, we formulate whether the data availability SLO is satisfied. To satisfy the data availability SLO, there must be at least β datacenters that store the requested data and satisfy the Get deadline SLO for each Get request of dc_i during t_k. The set of all datacenters satisfying the Get deadline SLO for requests from dc_i (denoted by S^g_{dc_i}) is represented by:

  S^g_{dc_i} = {dp_j | F^g_{dc_i,dp_j}(L_g) ≥ 1 − ε_g}.  (9)

The set of data items read by dc_i during t_k is represented by G^{t_k}_{dc_i} = {d_l | v^{d_l,t_k}_{dc_i} > 0 ∧ d_l ∈ D}. Then, the data availability constraint can be expressed as: during any t_k, there exist at least β replicas of any d_l ∈ G^{t_k}_{dc_i} stored in S^g_{dc_i}:

  ∀dc_i ∀t_k ∀d_l∈G^{t_k}_{dc_i}:  Σ_{dp_j∈S^g_{dc_i}} X^{d_l,t_k}_{dp_j} ≥ β.  (10)
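Constraint (10) is easy to audit for a concrete schedule: count, for each requested item, its replicas inside the SLO-satisfying set. A minimal check, with the allocation X encoded as a set of (item, datacenter) pairs (our encoding, not the paper's):

```python
# Verify the availability constraint (10) for one customer datacenter:
# every item it reads must have >= beta replicas inside S_g (Eq. (9)).
def availability_ok(read_items, placed, slo_set, beta):
    # placed: set of (item, datacenter) pairs with X = 1 in this period.
    for item in read_items:
        replicas = sum(1 for (d, dp) in placed if d == item and dp in slo_set)
        if replicas < beta:
            return False
    return True
```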
Each customer datacenter maintains a table that maps each data item to its replica datacenters with their assigned request serving ratios.

Datacenter capacity constraint. Besides the SLO constraints, each datacenter has a limited capacity to supply Get and Put services, respectively [18]. Therefore, the cumulative Get rate and Put rate of all data in a datacenter dp_j should not exceed its Get capacity and Put capacity (denoted by θ^g_{dp_j} and θ^p_{dp_j}), respectively. Since storage is relatively cheap and easy to increase, we do not consider the storage capacity as a constraint. This constraint can be easily added to our model, if necessary. Then, we can calculate the available Get and Put capacities, denoted by θ̃^g_{dp_j} and θ̃^p_{dp_j}:

  θ̃^g_{dp_j} = Min{θ^g_{dp_j} − Σ_{dc_i∈D_c} r^{t_k}_{dc_i,dp_j}}_{t_k∈T}
  θ̃^p_{dp_j} = Min{θ^p_{dp_j} − Σ_{dc_i∈D_c} w^{t_k}_{dc_i,dp_j}}_{t_k∈T}

If both θ̃^g_{dp_j} and θ̃^p_{dp_j} are no less than 0, the datacenter capacity constraint is satisfied. Then, we can express the datacenter capacity constraint by:

  ∀dp_j:  Min{θ̃^g_{dp_j}, θ̃^p_{dp_j}} ≥ 0.  (11)

Problem statement. Finally, we formulate the problem that minimizes the payment cost under the aforementioned constraints using integer programming:

  min C_t (calculated by Formulas (2), (3), (4) and (5))  (12)
  s.t.  Q_g · Q_p = 1  (8)
        ∀dc_i ∀t_k ∀d_l∈G^{t_k}_{dc_i}: Σ_{dp_j∈S^g_{dc_i}} X^{d_l,t_k}_{dp_j} ≥ β  (10)
        ∀dp_j: Min{θ̃^g_{dp_j}, θ̃^p_{dp_j}} ≥ 0  (11)
        ∀dc_i ∀dp_j ∀t_k ∀d_l: H^{d_l,t_k}_{dc_i,dp_j} ≤ X^{d_l,t_k}_{dp_j} ≤ 1  (13)
        ∀dc_i ∀t_k ∀d_l: Σ_{dp_j} H^{d_l,t_k}_{dc_i,dp_j} = 1  (14)

Constraints (8), (10) and (11) enforce the deadline requirement and the data availability requirement in the SLO and the datacenter capacity constraint, as explained previously. Constraints (13) and (14) together indicate that any request should be served by a replica of the targeted data.
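For intuition, here is a deliberately simplified sketch of the program (12)-(14) in PuLP, with only binary placement variables, a storage-cost objective, the replication constraint and a Get-capacity cap. The serving ratios H, the SLO sets, and the transfer and reservation terms of the full program are omitted, all data values are hypothetical, and the snippet assumes the PuLP package is available.

```python
# Simplified placement ILP sketch (cf. (12)-(14)): binary X over
# (item, datacenter); requests assumed split evenly over beta replicas.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

items = ["d1", "d2"]; dcs = ["dp1", "dp2", "dp3"]
size = {"d1": 500, "d2": 1}                      # GB (hypothetical)
gets = {"d1": 1e4, "d2": 1e7}                    # Gets/period (hypothetical)
p_s = {"dp1": 0.01, "dp2": 0.024, "dp3": 0.02}   # $/GB-period
cap_g = {dp: 2e7 for dp in dcs}                  # Get capacity per period
beta = 2

prob = LpProblem("dar_sketch", LpMinimize)
X = LpVariable.dicts("X", [(d, dp) for d in items for dp in dcs], cat="Binary")
# Storage-cost objective, cf. Equation (2).
prob += lpSum(X[(d, dp)] * p_s[dp] * size[d] for d in items for dp in dcs)
for d in items:                                  # availability, cf. (10)
    prob += lpSum(X[(d, dp)] for dp in dcs) >= beta
for dp in dcs:                                   # Get capacity, cf. (11)
    prob += lpSum(X[(d, dp)] * gets[d] / beta for d in items) <= cap_g[dp]
prob.solve()
```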
Operation. Table I indicates the input and output parameters in this integer program. The unit cost of Get/Put/Storage/Transfer usage is provided or negotiated with the CSPs. During each billing period t_k, DAR needs to measure the latency CDFs of Gets/Puts (F^g_{dc_i,dp_j}(x) and F^p_{dc_i,dp_j}(x)), the sizes of new data items d_l (s_{d_l}), and the data Get/Put rates from each dc_i (v^{d_l,t_k}_{dc_i} and u^{d_l,t_k}_{dc_i}). The output is the data storage allocation (X^{d_l,t_k}_{dp_j}), the request serving ratio allocation (H^{d_l,t_k}_{dc_i,dp_j}) and the total cost C_t. The optimal Get/Put reservation in each storage datacenter (R^g_{dp_j}/R^p_{dp_j}) is an output at the beginning of the reservation time period T and is an input at each billing period t_k in T. After each t_k, T is updated to the remaining time after t_k, represented by T\{t_k}. DAR adjusts the data storage and request distribution among datacenters under the determined reservation using the same procedure. This procedure ensures the maximum payment cost saving under request rate variation.

This integer programming problem is NP-hard. A simple reduction from the generalized assignment problem [19] can be used to prove this. We skip the detailed formal proof due to limited space. The NP-hardness makes the solution calculation very time consuming. We therefore propose a heuristic solution to this cost minimization problem in the next section.

III. DATA ALLOCATION AND RESOURCE RESERVATION

DAR has two steps. First, its dominant-cost based data allocation algorithm (Section III-A) conducts storage and request allocation scheduling that leads to the lowest total payment only in the pay-as-you-go manner. Second, its optimal resource reservation algorithm (Section III-B) makes a reservation in each used storage datacenter to maximally reduce the total payment.

Dominant-cost based data allocation algorithm. To reduce the total payment in the pay-as-you-go manner as much as possible, DAR tries to reduce the payment for each data item. Specifically, it finds the dominant cost (Storage, Get or Put) of each data item and allocates it to the datacenter with the minimum unit price of this dominant cost.

Optimal resource reservation algorithm. It is a challenge to maximize the payment cost saved by reservation from the pay-as-you-go payment while avoiding over reservation. To handle this challenge, through theoretical analysis, we find the optimal reservation amount, which avoids both over reservation and under reservation as much as possible.

A. Dominant-Cost based Data Allocation

A valid data allocation schedule must satisfy Constraints (8), (10), (11), (13) and (14). To this end, DAR first identifies the datacenter candidates that satisfy Constraint (8), i.e., that can supply a Get/Put SLO guaranteed service for a specific customer datacenter dc_i. Then, DAR selects datacenters from the candidates to store each data item requested by dc_i to satisfy the other constraints and achieve the cost minimization Objective (12). We introduce these two steps below.

Datacenter candidate identification. Constraint (8) guarantees that the deadline SLO is satisfied. That is, the percentage of data Get and Put operations of a customer beyond the specified deadlines is no more than ε_g and ε_p, respectively. Under this constraint, some Get/Put response datacenters can have service latency beyond the deadlines with probability larger than ε_g and ε_p, while others have probability less than ε_g and ε_p. Since finding a combination of these two types of datacenters to satisfy the SLO is complex, DAR simply finds the datacenters that have probability less than ε_g and ε_p. That is, if dp_j serves Gets/Puts from dc_i, dp_j has F^g_{dc_i,dp_j}(L_g) ≥ 1 − ε_g and F^p_{dc_i,dp_j}(L_p) ≥ 1 − ε_p. That is:

  ∀t_k∈T ∀dc_i∈D_c:  r^{t_k}_{dc_i,dp_j} > 0 ⟹ F^g_{dc_i,dp_j}(L_g) ≥ 1 − ε_g  (15)
  ∀t_k∈T ∀dc_i∈D_c:  w^{t_k}_{dc_i,dp_j} > 0 ⟹ F^p_{dc_i,dp_j}(L_p) ≥ 1 − ε_p  (16)
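Constraints (15) and (16) turn candidate identification into a per-pair filter on empirical latency CDFs. A sketch, assuming latency samples are collected per (dc_i, dp_j) pair (the sample handling is ours):

```python
# Build S_g for one customer datacenter (Eq. (9)): keep datacenters whose
# empirical fraction of Get latencies within the deadline is >= 1 - eps_g.
def slo_get_candidates(latency_samples, deadline, eps_g):
    # latency_samples: dict mapping datacenter -> list of Get latencies (ms)
    candidates = set()
    for dp, samples in latency_samples.items():
        if not samples:
            continue
        within = sum(1 for s in samples if s <= deadline) / len(samples)
        if within >= 1.0 - eps_g:          # empirical F(L_g) >= 1 - eps_g
            candidates.add(dp)
    return candidates
```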
Then, by replacing F^g_{dc_i,dp_j}(L_g) with 1 − ε_g in Equation (6), we have q^{t_k}_g ≥ 1 − ε_g, which ensures the Get SLO. The same applies to the Put SLO. Therefore, the new Constraints (15) and (16) satisfy Constraint (8).

Accordingly, for each customer datacenter dc_i, we can find S^g_{dc_i} using Equation (9), the set of storage datacenters that satisfy the Get SLO for Gets from dc_i. For each data item d_l, we can find another set of storage datacenters S^{d_l}_p = {dp_j | ∀dc_i ∀t_k: (u^{d_l,t_k}_{dc_i} > 0) ⟹ (F^p_{dc_i,dp_j}(L_p) ≥ 1 − ε_p)} that consists of the datacenters satisfying the Put SLO of d_l. To allocate d_l requested by dc_i, in order to satisfy both the Get and Put delay SLOs, we can allocate d_l to any dp_j ∈ S^g_{dc_i} ∩ S^{d_l}_p.
Algorithm 1: Dominant-cost based data allocation.
 1 for each dc_i in D_c do
 2   L^s_{dc_i}, L^g_{dc_i} and L^p_{dc_i} are S^g_{dc_i} sorted in increasing order of unit Storage/Get/Put price, respectively.
 3   for each d_l with ∃t_k: d_l ∈ G^{t_k}_{dc_i} do
 4     H = 100%;
 5     switch d_l with H^{d_l}_{dc_i} = H do
 6       case dominant
 7         L = L^s_{dc_i} or L^g_{dc_i} or L^p_{dc_i} according to whether the dominant cost is Storage, Get or Put
 8       case balanced
 9         Find dp_j ∈ S^g_{dc_i} ∩ S^{d_l}_p with the smallest C^{d_l}_{dc_i,dp_j} that satisfies all constraints
10     for each dp_j with dp_j ∈ L ∩ S^{d_l}_p do
11       if (X^{d_l}_{dp_j} = 1 ∧ θ̃^p_{dp_j} < 0) ∨ (θ̃^g_{dp_j} = 0) then
12         Continue;
13       Find the largest H^{d_l}_{dc_i,dp_j} satisfying θ̃^g_{dp_j} ≥ 0 ∧ H ≥ H^{d_l}_{dc_i,dp_j};
14       if C^{d_l}_{dc_i,dp_j} ≤ C^{d_l}_{dc_i,dp_k} (k = j+1, ..., j+c) when H_{dc_i,dp_k} = H_{dc_i,dp_j} then
15         X^{d_l}_{dp_j} = 1; H = H − H^{d_l}_{dc_i,dp_j};
16       else
17         H^{d_l}_{dc_i,dp_j} = 0;
18       if Σ_{dp_j∈S^g_{dc_i}} X^{d_l}_{dp_j} ≥ β ∧ H = 0 then
19         break;

Min-cost storage datacenter selection. After the datacenter candidates S^g_{dc_i} ∩ S^{d_l}_p are identified, DAR needs to further select the datacenters that lead to the minimum payment cost. For this purpose, we can use a greedy method, in which the cost of storing data item d_l in each dp_j ∈ S^g_{dc_i} ∩ S^{d_l}_p (denoted as C^{d_l}_{dc_i,dp_j}) is calculated based on Equation (1) and the dp_j with the lowest cost is selected. However, such a greedy method is time consuming. Our dominant-cost based data allocation algorithm can speed up the datacenter selection process. Its basic idea is to find the dominant cost among the different costs in Equation (1) for each data item d_l requested by each dc_i and store d_l in the datacenter that minimizes the dominant cost.

If one cost based on its minimum unit price among datacenters is larger than the sum of the other costs based on their maximum unit prices among datacenters, we consider this cost the dominant cost. We do not consider the transfer cost for importing data when determining the dominant cost of a data item, since it is a one-time cost and comparatively small compared to the other three costs, and hence is less likely to be dominant in the total cost of a data item.

We classify each d_l requested by dc_i into four different sets: Put dominant, Get dominant, Storage dominant and balanced. Data items in the balanced set do not have an obvious dominant cost. A data item should be stored in the datacenter among the candidates S^g_{dc_i} ∩ S^{d_l}_p that has the lowest unit price for its dominant resource in order to reduce its cost as much as possible. Finding such a datacenter for each data item d_l requested by a given dc_i is also time-consuming. Note that S^g_{dc_i} is common for all data items requested by dc_i. Then, to reduce the time complexity, we can calculate S^g_{dc_i} only one time. From S^g_{dc_i}, we select the datacenter that belongs to S^{d_l}_p to allocate each d_l requested by a given dc_i.

The pseudocode of this algorithm is shown in Algorithm 1, in which all symbols without t_k denote all remaining billing periods in T. For each dc_i, we sort S^g_{dc_i} in increasing order of unit Put price, unit Get price and unit Storage price, respectively, which results in three sorted lists. We call them the Put, Get and Storage sorted datacenter lists, respectively. We use Max_g/Min_g, Max_s/Min_s and Max_p/Min_p to denote the maximum/minimum Get unit prices, Storage unit prices and Put unit prices among the datacenters belonging to S^g_{dc_i}.

For each data item d_l requested by a given dc_i, we calculate its maximum/minimum Storage cost, Get cost and Put cost, respectively:

  Max^{d_l}_s = Σ_{t_k∈T} Max_s · s_{d_l} · t_k,
  Max^{d_l}_g = Σ_{t_k∈T} Max_g · v^{d_l,t_k}_{dc_i} · t_k,
  Max^{d_l}_p = Σ_{t_k∈T} Max_p · u^{d_l,t_k}_{dc_i} · t_k.

Min^{d_l}_s, Min^{d_l}_g and Min^{d_l}_p are calculated similarly. If Min^{d_l}_s >> Max^{d_l}_g + Max^{d_l}_p, we regard the data item as belonging to the Storage dominant set. Similarly, we can decide whether d_l belongs to the Get or Put dominant set. If d_l does not belong to any dominant set, it is classified into the balanced set. The datacenter allocation for the data items in each dominant set is conducted in the same manner, so we use the Get dominant set as an example to explain the process.
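The classification above needs only the six Max/Min aggregates per data item. A sketch (the factor-of-2 threshold mirrors the dominant ratio used in the trace study below; the names are ours):

```python
# Classify a data item by dominant cost. max_c / min_c map each resource
# in {"storage", "get", "put"} to the item's Max / Min cost over T.
def classify(max_c, min_c, ratio=2.0):
    for res in ("storage", "get", "put"):
        others = [r for r in max_c if r != res]
        if min_c[res] >= ratio * sum(max_c[r] for r in others):
            return res + "-dominant"
    return "balanced"
```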
For each data item d_l in the Get dominant set, we try each datacenter from the top of the Get sorted datacenter list. We find a datacenter satisfying the Get/Put capacity constraints (Constraint (11)) (Line 11) and the Get/Put latency SLO constraints (Constraint (8)) (Lines 9-10), and determine the largest possible request serving ratio of this replica. The subsequent datacenters in the list may have a similar unit price for Gets but different unit prices for Puts and Storage, which may lead to a lower total cost for this data
allocation. Therefore, we choose a number of subsequent datacenters, calculate C^{d_l}_{dc_i,dp_k} for d_l, where k ∈ [j+1, j+c], and choose dp_j to create a replica and assign requests to it (Constraint (13)) (Lines 15-17) if C^{d_l}_{dc_i,dp_j} is smaller than all C^{d_l}_{dc_i,dp_k}. If there are no fewer than β replicas (Constraint (10)) (Line 18), and the remaining request ratio to assign is equal to 0 (Constraint (14)) (Lines 4 and 18), the data allocation for d_l is completed. For any data item in the balanced set, we choose the datacenter in S^g_{dc_i} ∩ S^{d_l}_p that generates the lowest total cost for d_l. In the datacenter selection process, the constraints in Section II-B are checked to ensure the selected datacenter satisfies the conditions. After allocating all data items, we get a valid data allocation schedule with sub-optimal cost minimization.

[Figure 1: Efficiency and the validity of the dominant-cost based data allocation algorithm. (a) Percentage of data items in each dominant set; (b) CDF of data items over the dominant ratio.]

Efficiency and validity of the algorithm. The efficiency of the dominant-cost based data allocation algorithm depends on the percentage of data items belonging to the three dominant sets, since it allocates data in each dominant set much more efficiently than data in the balanced set. We thus measured the percentage of data items in each set from a real trace in order to gauge the efficiency of the algorithm. We get the Put rates of each data item from the publicly available wall post trace from the Facebook New Orleans networks [20], which covers inter-posts between 188,892 distinct pairs of 46,674 users. We regard each user's wall posts as a data item. The data size is typically smaller than 1 KB. The Get:Put ratio is typically 100:1 in Facebook's workload [21], from which we set the Get rate of each data item accordingly. We use the unit prices for Storage, Get and Put in all regions of Amazon S3, Microsoft Azure and Google cloud storage [1-3]. For each data item d_l, we calculated its dominant ratio of Storage as Min^{d_l}_s/(Max^{d_l}_g + Max^{d_l}_p), and if it is no less than 2, we consider d_l storage dominant. Similarly, we can get the dominant ratios of Get and Put. Figure 1(a) shows the percentage of data items belonging to each dominant set. We can see that most of the data items belong to the Storage dominant set and the Get dominant set, and only 17.2% of the data items belong to the balanced set. That is because in the trace, most data items are either rarely or frequently requested, with the majority cost being either the Storage or the Get cost. The figure indicates that the dominant-cost based data allocation algorithm is efficient, since most of the data belongs to the three dominant sets rather than the balanced set. Figure 1(b) shows the CDF of data items over the dominant ratio in the Get dominant set as an example. It shows that most of the data items in the Get dominant set have a dominant ratio no less than 8, and the largest dominant ratio reaches 3054. Thus, the cost of these data items quickly decreases when the Get unit price decreases, and then we can allocate them to the datacenter with the minimum Get unit price. These results support the algorithm design of finding the appropriate datacenter dp_j in the sorted datacenter list of the dominant resource of a data item.

B. Optimal Resource Reservation

After the dominant-cost based allocation, we need to determine the reserved Get/Put rates for each datacenter in order to further reduce the cost as much as possible, given a set of allocated data items and their Get/Put rates over T. Since the method to determine the reserved Get and Put rates is the same, we use Get as an example to present this method.

Before we introduce how to find the reservation amount that achieves the maximum reservation benefit, we first introduce the benefit function of the reservation, denoted as F_{dp_j}(x), where x is the reserved number of Gets/Puts in any billing period t_k. The benefit is the difference between the cost saved by using reservation instead of the pay-as-you-go manner and the cost of over-reservation. The over-reservation cost includes the cost for the over-reserved amount and the over-calculated saving. Thus, we can calculate the benefit by

  F_{dp_j}(x) = (Σ_{t_k∈T} x · (1 − α) · p^g_{dp_j}) − O_{dp_j}(x) · p^g_{dp_j},  (17)

where O_{dp_j}(x) is the over-reserved number of Gets. It is calculated by

  O_{dp_j}(x) = Σ_{t_k∈T} Max{0, x − Σ_{dc_i∈D_c} r^{t_k}_{dc_i,dp_j} · t_k}.  (18)

Recall that R^g_{dp_j} is the optimal number of reserved Gets for each billing period during T in a schedule. That is, when x = R^g_{dp_j}, F_{dp_j}(x) reaches its maximum value, represented by B_{dp_j} = F_{dp_j}(R^g_{dp_j}) = Max{F_{dp_j}(x)}_{x∈N+}.
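Equations (17) and (18) can be evaluated directly from the per-period Get totals; a small sketch (inputs as in the earlier cost sketch, names ours):

```python
# Reservation benefit per Equations (17)-(18) for one datacenter.
# gets_per_period[k] = total Gets served by dp_j in billing period t_k.
def over_reserved(x, gets_per_period):
    return sum(max(0.0, x - g) for g in gets_per_period)     # Eq. (18)

def benefit(x, gets_per_period, p_g, alpha):
    saved = len(gets_per_period) * x * (1.0 - alpha) * p_g   # saving if fully used
    return saved - over_reserved(x, gets_per_period) * p_g   # Eq. (17)
```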
In the following, we first prove Corollary III.1, which supports the rationale that allocating as much data as possible to the minimum-cost datacenter in the dominant-cost based data allocation algorithm is useful in getting a sub-optimal result of reservation benefit. Then, we present Corollary III.2, which helps find the reservation x that achieves the maximum reservation benefit. Finally, we present Theorem III.1, which shows how to find this reservation x.

Corollary III.1. Given a datacenter dp_j that already stores a set of data items, after allocating a new data item d_l and its requests to this datacenter, its maximum reservation benefit B_{dp_j} is non-decreasing.

Proof: After allocating d_l to dp_j, we use F′_{dp_j}(x) to denote the new reservation benefit function, since r^{t_k}_{dc_i,dp_j} in Equation (18) is changed. Then, we get F′_{dp_j}(R^g_{dp_j}) ≥ F_{dp_j}(R^g_{dp_j}) since r^{t_k}_{dc_i,dp_j} is not decreasing. Since the new
reserved benefit B′_{dp_j} = Max{F′_{dp_j}(x)}_{x∈N+}, we thus have B′_{dp_j} ≥ F′_{dp_j}(R^g_{dp_j}) ≥ F_{dp_j}(R^g_{dp_j}) = B_{dp_j} after d_l is allocated. ∎

We define the maximum number of Gets in any t_k as m = Max{Σ_{dc_i∈D_c} r^{t_k}_{dc_i,dp_j} · t_k}_{t_k∈T}. Then, according to Equation (17), the optimal reservation R^g_{dp_j} lies in [0, m]. Thus, by looping over all integers within [0, m], we can get the optimal reservation that results in the maximum F_{dp_j}. This greedy method, however, is time consuming. In order to reduce the time complexity, we first prove Corollary III.2, based on which we introduce a binary search tree based optimal reservation method.

Corollary III.2. For a datacenter dp_j, its benefit function F_{dp_j}(x) is increasing when x ∈ [0, R^g_{dp_j}) and decreasing when x ∈ (R^g_{dp_j}, m].

Proof: According to Equation (17), we define F_I(x) = F_{dp_j}(x) − F_{dp_j}(x−1) = (n · (1 − α) − O_I(x)) · p^g_{dp_j}, where n is the number of billing periods in T. The extra over-reserved number of Gets of O_{dp_j}(x) compared to O_{dp_j}(x−1), represented by O_I(x) = O_{dp_j}(x) − O_{dp_j}(x−1), equals the number of billing periods during T that have a number of Gets smaller than x, i.e., Σ_{dc_i∈D_c} r^{t_k}_{dc_i,dp_j} · t_k < x. Therefore, O_I(x) is increasing. At first, O_I(0) = 0; when O_I(x) < n · (1 − α), then F_I(x) > 0, which means F_{dp_j}(x) is increasing; when O_I(x) > n · (1 − α), then F_I(x) < 0, which means F_{dp_j}(x) is decreasing. Therefore, F_{dp_j}(x) is increasing and then decreasing. Since F_{dp_j}(R^g_{dp_j}) reaches the largest F_{dp_j}(x), we can derive that F_{dp_j}(x) is increasing when x ∈ [0, R^g_{dp_j}) and decreasing when x ∈ (R^g_{dp_j}, m]. ∎

Algorithm 2: Binary search tree based resource reservation.
 1 Build a balanced binary search tree of A with A^{t_k}_{dp_j};
 2 N_1 = ⌈n · (1 − α)⌉ + 1; N_2 = ⌊n · (1 − α)⌋ + 1;
 3 x_1 = the N_1-th smallest value of A;
 4 x_2 = the N_2-th smallest value of A;
 5 if F_{dp_j}(x_1) ≥ F_{dp_j}(x_2) then
 6   R^g_{dp_j} = x_1;
 7 else
 8   R^g_{dp_j} = x_2;

We use A^{t_k}_{dp_j} = Σ_{dc_i∈D_c} r^{t_k}_{dc_i,dp_j} · t_k to denote the total number of Gets served by dp_j during t_k, and define A = {A^{t_1}_{dp_j}, A^{t_2}_{dp_j}, ..., A^{t_n}_{dp_j}}.

Theorem III.1. To achieve the maximum reservation benefit, the reservation amount x is the N-th smallest value in A = {A^{t_1}_{dp_j}, A^{t_2}_{dp_j}, ..., A^{t_n}_{dp_j}}, where N equals ⌈n · (1 − α)⌉ + 1 or ⌊n · (1 − α)⌋ + 1.

Proof: The proof of Corollary III.2 indicates that when O_I(x) = ⌈n · (1 − α)⌉ or ⌊n · (1 − α)⌋, F_{dp_j}(x) can reach B_{dp_j}. As indicated above, O_I(x) represents the number of billing periods during T with A^{t_k}_{dp_j} = Σ_{dc_i∈D_c} r^{t_k}_{dc_i,dp_j} · t_k < x. Therefore, when x is the N-th smallest value in A, where N equals ⌈n · (1 − α)⌉ + 1 or ⌊n · (1 − α)⌋ + 1, F_{dp_j}(x) reaches B_{dp_j}. ∎

We then use the binary search tree algorithm to find the optimal reservation number of Gets. Its pseudocode is shown in Algorithm 2. The time complexity of this algorithm is O(n log n). It builds a binary search tree using O(n log n), then finds the N-th and (N+1)-th smallest values in the tree using O(log n), and all other operations take O(1).

IV. SYSTEM INFRASTRUCTURE

In this section, we introduce the infrastructure to conduct the previously introduced DAR algorithms. It collects the information of the scheduling inputs, calculates the data allocation schedule and conducts the data allocation. As shown in Figure 2, DAR's infrastructure has one master server and multiple agent servers, each of which is associated with a customer datacenter. Agent servers periodically measure the parameters needed in the schedule calculation, which is conducted by the master.

[Figure 2: Overview of DAR's infrastructure. A front-end in each customer datacenter reports statistical results to its agent; the master takes these as input, runs the dominant-cost based data allocation and optimal resource reservation algorithms, and outputs the request ratios to the cloud storage.]

In the cloud, the reservation is made at the beginning of the reservation time period T and remains the same during T. Due to the time-varying feature of the inter-datacenter latency and Get/Put rates, the master needs to periodically calculate the allocation schedule after each billing period t_k and reallocate the data accordingly if the new schedule has a smaller cost or the current schedule cannot guarantee the SLOs. Therefore, the master executes the optimal resource reservation algorithm and makes a reservation only before t_1, and then updates T to T\{t_k} and executes the dominant-cost based data allocation algorithm after each t_k.

During each t_k, for the schedule recalculation, the master needs the latency CDFs of Gets/Puts (F^g_{dc_i,dp_j}(x) and F^p_{dc_i,dp_j}(x)), the size of each d_l (s_{d_l}), and the data's Get/Put rates from each dc_i (v^{d_l,t_k}_{dc_i} and u^{d_l,t_k}_{dc_i}). Each agent in each customer datacenter periodically measures and reports these measurements to the master server. The DAR master calculates the data allocation schedule and sends the updates of the new data allocation schedule to each customer datacenter. Specifically, it measures the differences in the data item allocation between the new and the old schedules and notifies storage datacenters to store or delete data items accordingly. In reality, the billing time period t_k (e.g., one month) may be too long to accurately reflect the variation of the inter-datacenter latency and Get/Put rates in some applications. In this case, DAR can set t_k to a relatively small value with consideration of the tradeoff between the cost saving, the SLO guarantee and the DAR system overhead.
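The master's reservation step (Algorithm 2, Theorem III.1) reduces to two order statistics of each datacenter's per-period Get totals; a sorted array gives the same O(n log n) bound as the balanced tree. A compact array-based sketch, reusing the benefit function from Section III-B (names ours):

```python
import math

# Pick the optimal per-period reservation (Theorem III.1): evaluate the
# benefit (Eq. (17)) at the ceil/floor order statistics of A and keep
# the better one. A = per-period Get totals A^{t_k}_{dp_j}.
def optimal_reservation(A, p_g, alpha):
    s = sorted(A)                        # stand-in for the balanced BST
    n = len(A)
    ks = {math.ceil(n * (1 - alpha)), math.floor(n * (1 - alpha))}
    # N-th smallest = s[N - 1] with N = k + 1; clamp to the largest value.
    candidates = [s[min(k, n - 1)] for k in ks]
    return max(candidates, key=lambda x: benefit(x, A, p_g, alpha))
```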
[Figure 3: Get SLO guarantee performance. (a) In simulation; (b) In real world. Curves compare DAR, SPANStore, COPS, Cheapest and Random; axes: lowest Get SLO satisfaction level versus request ratio (%).]

[Figure 4: Put SLO guarantee performance. (a) In simulation; (b) In real world. Curves compare DAR, SPANStore, COPS, Cheapest and Random; axes: lowest Put SLO satisfaction level versus request ratio (%).]
V. PERFORMANCE EVALUATION

We conducted trace-driven experiments on the Palmetto Cluster [22] with 771 8-core nodes and on real clouds. We first introduce the experiment settings on the cluster.

Simulated clouds. We simulated geographically distributed datacenters in all 25 cloud storage regions of Amazon S3, Microsoft Azure and Google cloud storage [1, 3, 2]; each region has two datacenters, simulated by two nodes in Palmetto. The distribution of the inter-datacenter Get/Put latency between any pair of cloud storage datacenters follows the real latency distribution as in [11]. The unit prices for Storage, Get, Put and Transfer in each region follow the prices listed online. We assumed that the reservation price ratio (α) follows a bounded Pareto distribution among datacenters with a shape of 2 and a lower bound and an upper bound of 53% and 76%, respectively [15].

Customers. We simulated ten times the number of all customers listed in [1, 3, 2] for each cloud service provider. The number of customer datacenters of each customer follows a bounded Pareto distribution with an upper bound, a lower bound and a shape of 10, 8 and 2, respectively. As in [11], in the SLOs for all customers, the Get deadline is restricted to 100ms [11], the percentage of latency guaranteed Gets and Puts is 90%, and the Put deadline for a customer's datacenters in the same continent is 250ms and is 400ms for an over-continent customer. The minimum number of replicas of each data item was set to β = 3 [15]. The size of the aggregated data of a customer was randomly chosen from [0.1TB, 1TB, 10TB] as in [11]. The number of aggregated data items of a customer follows a bounded Pareto distribution with a lower bound, an upper bound and a shape of 1, 30000 and 2 [23].

Get/Put operations. The percentage of data items requested by each customer datacenter follows a bounded Pareto distribution with an upper bound, a lower bound and a shape of 20%, 80% and 2, respectively. Each aggregated data item is formed by data objects, and the size of each requested data object was set to 100KB [11]. The Put rate follows the publicly available wall post trace from the Facebook New Orleans networks [20], which covers inter-posts between 188,892 distinct pairs of 46,674 users. We regard each user's wall posts as a data item. The data size is typically smaller than 1 KB. The Get:Put ratio is typically 100:1 in Facebook's workload [21], from which we set the Get rate of each data item accordingly. Facebook is able to handle 1 billion/10 million Gets/Puts per second [21], and has ten datacenters over the U.S. Accordingly, we set the Get and Put capacities of each datacenter in an area to 1E8 and 1E6 Gets/Puts per second, respectively. Whenever a datacenter was overloaded, the Get/Put operation was repeated once again. We set the billing period (t_k) to 1 month and the reservation time to 3 years [15]. We computed the cost and evaluated the SLO performance over 3 years in the experiments. For each experiment, we repeated 10 runs and report the average performance.

Real clouds. We also conducted small-scale trace-driven experiments on real-world CSPs including Amazon S3, Windows Azure Storage and Google Cloud Storage. We simulated one customer that has customer datacenters in Amazon EC2's US West (Oregon) Region and US East Region [24]. Unless otherwise indicated, the settings are the same as before. Due to the small scale, the number of data items was set to 1000, the size of each data item was set to 100MB, and β was set to 2. The datacenter in each region requests all the data objects. We set the Put deadline to 200ms. One customer's Get and Put operations cannot generate enough workload to reach the real Get/Put rate capacity of each datacenter. We therefore set the capacity of a datacenter in each region of all CSPs to 40% of the total expected Get/Put rates. Since it is impractical to conduct experiments lasting a real contract year, we set the billing period to 4 hours and the reservation period to 2 days.

We compared DAR with the following methods:
• SPANStore [11], which is a storage system over multiple CSPs' datacenters that minimizes cost while supporting SLOs, without considering capacity limitations and reservations.
• COPS [25], which allocates requested data in the datacenter with the shortest latency to the customer datacenter.
• Cheapest, in which the customer selects the datacenters with the cheapest cost to store each data item, without considering SLOs and reservations.
• Random, in which the customer randomly selects datacenters to allocate each data item.

A. Comparison Performance Evaluation

To evaluate the SLO guarantee performance, we measured the lowest SLO satisfaction levels of all customers. The Get/Put SLO satisfaction level of a customer, Q_g/Q_p, is calculated according to Equation (7) with q^{t_k}_g/q^{t_k}_p as the actual percentage of Gets/Puts within the deadline during t_k. We varied
each data item's Get/Put rate from 50% to 100% (called the request ratio) of its original rate with a step size of 10%.

[Figure 5: Cost minimization performance. (a) In simulation; (b) In real world. Curves show each system's cost as a ratio to Random's cost versus the request ratio (%).]

Figures 3(a) and 3(b) show the (lowest) Get SLO satisfaction level of each system versus the request ratio on the testbed and the real CSPs, respectively. We see that the lowest satisfaction level follows 100%=DAR=COPS>SPANStore>Random>Cheapest. DAR considers both the Get SLO and capacity constraints, thus it can supply a Get SLO guaranteed service. COPS always chooses the storage datacenter with the smallest latency. SPANStore always chooses the storage datacenter with the Get SLO in consideration. However, since it does not consider datacenter capacity, a datacenter may become overloaded and hence may not meet the latency requirement. Thus, it cannot supply a Get SLO guaranteed service. Random uses all storage datacenters to allocate data, and the probability of a datacenter becoming overloaded is low. However, since it does not consider the Get SLO, it may allocate data to datacenters far away from the customer datacenters, which leads to long request latency. Thus, Random generates a smaller (lowest) Get SLO satisfaction level than SPANStore. Cheapest does not consider SLOs and stores data in a few datacenters with the cheapest price, leading to heavy datacenter overload. Thus, it generates the worst SLO satisfaction level. The figures also show that for both SPANStore and Random, the Get SLO satisfaction level decreases as the request ratio increases. This is because a higher request ratio leads to a higher request load on an overloaded datacenter, which causes worse SLO guaranteed performance due to the repeated requests. The figures indicate that DAR can supply a Get SLO guaranteed service with SLO and capacity awareness.

Figures 4(a) and 4(b) show the lowest Put SLO satisfaction level of each system versus the request ratio on the testbed and the real CSPs, respectively. We see that the lowest SLO satisfaction level follows 100%=DAR>SPANStore>COPS>Random>Cheapest. DAR considers both Put SLOs and datacenter Put capacity, so it supplies an SLO guaranteed service for Puts. For the same reason as in Figure 3(a), SPANStore generates a smaller Put SLO satisfaction level. COPS allocates data into nearby datacenters without considering Put latency minimization, and the Puts to datacenters other than the nearby datacenter may introduce a long delay. Thus, COPS cannot supply a Put SLO guaranteed service, and generates a lower Put SLO satisfaction level than SPANStore. Random and Cheapest generate smaller Put SLO satisfaction levels than the others, and the level of SPANStore decreases as the request ratio increases, for the same reasons as in Figure 3(a). The figures indicate that DAR can supply a Put SLO guaranteed service while the others cannot.

Figure 5(a) shows the payment costs of all systems compared to Random, calculated as the ratio of each system's cost to the cost of Random on the testbed. The figure shows that the cost follows COPS≈Random>SPANStore>Cheapest>DAR. Since both COPS and Random do not consider cost, they produce the largest costs. SPANStore selects the cheapest datacenter within the deadline constraints, thus it generates a smaller cost than the systems without cost considerations. However, it produces a larger cost than Cheapest, which always chooses the cheapest datacenter among all datacenters. DAR generates the smallest cost because it chooses cheap datacenters under SLO constraints and makes a reservation to further maximally save cost. The figure also shows that the cost of DAR increases as the request ratio increases, but it always generates the smallest cost. This is because when the datacenters with the cheapest price under the constraints are used up, the second optimal candidates are chosen to allocate the remaining data, while all the other systems do not consider the capacities of datacenters and hence violate the Get/Put SLO by making some datacenters overloaded. Figure 5(b) shows the payment costs of all systems compared to Random on the real CSPs. It shows the same order and trends for all systems as Figure 5(a) due to the same reasons, except that COPS<Random. That is because the storage datacenters nearest to the customer datacenters happen to have a low price. The figures indicate that DAR generates the smallest payment cost among all systems.

VI. RELATED WORK

Deploying on multiple clouds. RACS [7] and DepSky [26] are storage systems that transparently spread the storage load over many cloud storage providers with replication in order to better tolerate provider outages or failures. In [27], an application execution platform across multiple CSPs was proposed. Wang et al. [28] studied content propagation in social media traces and found that the propagation is quite localized and predictable. Based on this pattern, they proposed a social application deployment using local processing for all contents and global distribution only for popular contents among cloud datacenters. COPS [25] and Volley [16] automatically allocate user data among datacenters in order to minimize user latency. Blizzard [29] is a high performance block storage for clouds, which enables cloud-unaware applications to quickly access any remote disk in clouds. Unlike these systems, DAR additionally considers both SLO guarantees and cost minimization for customers across multiple cloud storage systems.

Minimizing cloud storage cost. In [8-10], automated cluster storage configuration methods are proposed
to use the minimum resources needed to support the desired workload. None of the above papers study the cost optimization problem for geo-distributed cloud storage over multiple providers under SLO constraints. SPANStore [11] is a key-value storage system over multiple CSPs' datacenters that minimizes cost and guarantees SLOs. However, it does not consider the capacity limitation of datacenters, which makes its integer program an NP-hard problem that cannot be solved by its solution. Also, SPANStore does not consider resource reservation to minimize the cost. DAR is advantageous in that it considers these two neglected factors and effectively solves the NP-hard problem for cost minimization.

Improving network for SLO guarantee. Several works [30-33] have been proposed to schedule network flows or packets to meet deadlines or achieve high network throughput in datacenters. All these papers focus on SLO assurance without considering payment cost optimization.

VII. CONCLUSION

This work aims to minimize the payment cost of customers while guaranteeing their SLOs by using the worldwide distributed datacenters belonging to different CSPs with different resource unit prices. We first modeled this cost minimization problem using integer programming. Due to its NP-hardness, we then introduced the DAR system as a heuristic solution to this problem, which includes a dominant-cost based data allocation algorithm among storage datacenters and an optimal resource reservation algorithm to reduce the cost of each storage datacenter. DAR also incorporates an infrastructure to conduct the algorithms. Our trace-driven experiments on a testbed and real CSPs show the superior performance of DAR for SLO guaranteed services and payment cost minimization in comparison with other systems. In our future work, we will explore methods to handle situations in which the data request rate varies largely and datacenters may become overloaded.

ACKNOWLEDGEMENTS

This research was supported in part by U.S. NSF grants NSF-1404981, IIS-1354123, CNS-1254006, IBM Faculty Award 5501145 and Microsoft Research Faculty Fellowship 8300751.

REFERENCES

[1] Amazon S3. http://aws.amazon.com/s3/.
[2] Microsoft Azure. http://www.windowsazure.com/.
[3] Google Cloud Storage. https://cloud.google.com/products/cloud-storage/.
[4] H. Stevens and C. Pettey. Gartner Says Cloud Computing Will Be As Influential As E-Business. Gartner Newsroom, Online Ed., 2008.
[5] R. Kohavl and R. Longbotham. Online Experiments: Lessons Learned, 2007. http://exp-platform.com/Documents/IEEEComputer2007OnlineExperiments.pdf.
[6] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannona, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. In Proc. of VLDB, 2008.
[7] A. Hussam, P. Lonnie, and W. Hakim. RACS: A Case for Cloud Storage Diversity. In Proc. of SoCC, 2010.
[8] G. A. Alvarez, E. Borowsky, S. Go, T. H. Romer, R. A. Becker-Szendy, R. A. Golding, A. Merchant, M. Spasojevic, A. C. Veitch, and J. Wilkes. Minerva: An Automated Resource Provisioning Tool for Large-Scale Storage Systems. ACM Trans. Comput. Syst., 2001.
[9] E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. C. Veitch. Hippodrome: Running Circles Around Storage Administration. In Proc. of FAST, 2002.
[10] H. V. Madhyastha, J. C. McCullough, G. Porter, R. Kapoor, S. Savage, A. C. Snoeren, and A. Vahdat. SCC: Cluster Storage Provisioning Informed by Application Characteristics and SLAs. In Proc. of FAST, 2012.
[11] Z. Wu, M. Butkiewicz, D. Perkins, E. Katz-Bassett, and H. V. Madhyastha. SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services. In Proc. of SOSP, 2013.
[12] J. Dean. Software Engineering Advice from Building Large-Scale Distributed Systems. http://research.google.com/people/jeff/stanford-295-talk.pdf.
[13] X. Wu, D. Turner, C. Chen, D. A. Maltz, X. Yang, L. Yuan, and M. Zhang. NetPilot: Automating Datacenter Network Failure Mitigation. In Proc. of SIGCOMM, 2012.
[14] Service Level Agreements. http://azure.microsoft.com/en-us/support/legal/sla/.
[15] Amazon DynamoDB. http://aws.amazon.com/dynamodb/.
[16] S. Agarwal, J. Dunagan, N. Jain, S. Saroiu, A. Wolman, and H. Bhogan. Volley: Automated Data Placement for Geo-Distributed Cloud Services. In Proc. of NSDI, 2010.
[17] G. Liu, H. Shen, and H. Chandler. Selective Data Replication for Online Social Networks with Distributed Datacenters. In Proc. of ICNP, 2013.
[18] D. Borthakur, J. Gray, J. S. Sarma, K. Muthukkaruppan, N. Spiegelberg, H. Kuang, K. Ranganathan, D. Molkov, A. Menon, S. Rash, R. Schmidt, and A. Aiyer. Apache Hadoop Goes Realtime at Facebook. In Proc. of SIGMOD, 2011.
[19] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[20] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the Evolution of User Interaction in Facebook. In Proc. of WOSN, 2009.
[21] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling Memcache at Facebook. In Proc. of NSDI, 2013.
[22] Palmetto Cluster. http://citi.clemson.edu/palmetto/.
[23] P. Yang. Moving an Elephant: Large Scale Hadoop Data Migration at Facebook. https://www.facebook.com/notes/paul-yang/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920.
[24] Amazon EC2. http://aws.amazon.com/ec2/.
[25] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS. In Proc. of SOSP, 2011.
[26] A. N. Bessani, M. Correia, B. Quaresma, F. André, and P. Sousa. DepSky: Dependable and Secure Storage in a Cloud-of-Clouds. TOS, 2013.
[27] A. Wieder, P. Bhatotia, A. Post, and R. Rodrigues. Orchestrating the Deployment of Computations in the Cloud with Conductor. In Proc. of NSDI, 2012.
[28] Z. Wang, B. Li, L. Sun, and S. Yang. Cloud-based Social Application Deployment using Local Processing and Global Distribution. In Proc. of CoNEXT, 2012.
[29] J. Mickens, E. B. Nightingale, J. Elson, K. Nareddy, D. Gehring, B. Fan, A. Kadav, V. Chidambaram, and O. Khan. Blizzard: Fast, Cloud-scale Block Storage for Cloud-oblivious Applications. In Proc. of NSDI, 2014.
[30] C. Hong, M. Caesar, and P. B. Godfrey. Finishing Flows Quickly with Preemptive Scheduling. In Proc. of SIGCOMM, 2012.
[31] B. Vamanan, J. Hasan, and T. N. Vijaykumar. Deadline-Aware Datacenter TCP (D2TCP). In Proc. of SIGCOMM, 2012.
[32] H. Wu, Z. Feng, C. Guo, and Y. Zhang. ICTCP: Incast Congestion Control for TCP in Data Center Networks. In Proc. of CoNEXT, 2010.
[33] D. Zats, T. Das, P. Mohan, D. Borthakur, and R. Katz. DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks. In Proc. of SIGCOMM, 2012.