
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1

Cooperative Provable Data Possession for Integrity Verification in Multi-Cloud Storage


Yan Zhu, Hongxin Hu, Gail-Joon Ahn, Senior Member, IEEE, Mengyang Yu
Abstract—Provable data possession (PDP) is a technique for ensuring the integrity of data in storage outsourcing. In this paper, we address the construction of an efficient PDP scheme for distributed cloud storage to support the scalability of service and data migration, in which we consider the existence of multiple cloud service providers to cooperatively store and maintain the clients' data. We present a cooperative PDP (CPDP) scheme based on homomorphic verifiable response and hash index hierarchy. We prove the security of our scheme based on a multi-prover zero-knowledge proof system, which can satisfy the completeness, knowledge soundness, and zero-knowledge properties. In addition, we articulate performance optimization mechanisms for our scheme and, in particular, present an efficient method for selecting optimal parameter values to minimize the computation costs of clients and storage service providers. Our experiments show that our solution introduces lower computation and communication overheads in comparison with non-cooperative approaches.

Index Terms—Storage security, provable data possession, interactive protocol, zero-knowledge, multiple cloud, cooperative

1 INTRODUCTION
IN recent years, cloud storage service has become a faster profit growth point by providing a comparably low-cost, scalable, position-independent platform for clients' data. Since a cloud computing environment is constructed based on open architectures and interfaces, it has the capability to incorporate multiple internal and/or external cloud services together to provide high interoperability. We call such a distributed cloud environment a multi-cloud (or hybrid cloud). Often, by using virtual infrastructure management (VIM) [1], a multi-cloud allows clients to easily access their resources remotely through interfaces such as the Web services provided by Amazon EC2.

There exist various tools and technologies for multi-cloud, such as Platform VM Orchestrator, VMware vSphere, and Ovirt. These tools help cloud providers construct a distributed cloud storage platform (DCSP) for managing clients' data. However, if such an important platform is vulnerable to security attacks, it would bring irretrievable losses to the clients. For example, the confidential data in an enterprise may be illegally accessed through a remote interface provided by a multi-cloud, or relevant data and archives may be lost or tampered with when they are stored into an uncertain storage pool outside the enterprise. Therefore, it is indispensable for cloud service providers (CSPs) to provide security techniques for managing their storage services.

Provable data possession (PDP) [2] (or proofs of retrievability (POR) [3]) is such a probabilistic proof technique for a storage provider to prove the integrity and ownership of clients' data without downloading the data. Proof-checking without downloading makes it especially important for large-size files and folders (typically including many clients' files) to check whether these data have been tampered with or deleted, without downloading the latest version of the data. Thus, it is able to replace traditional hash and signature functions in storage outsourcing. Various PDP schemes have been recently proposed, such as Scalable PDP [4] and Dynamic PDP [5]. However, these schemes mainly focus on PDP issues at untrusted servers in a single cloud storage provider and are not suitable for a multi-cloud environment (see the comparison of POR/PDP schemes in Table 1).

Motivation. To provide a low-cost, scalable, location-independent platform for managing clients' data, current cloud storage systems adopt several new distributed file systems, for example, Apache Hadoop Distributed File System (HDFS), Google File System (GFS), Amazon S3 File System, CloudStore, etc. These file systems share some similar features: a single metadata server provides centralized management by a global namespace; files are split into blocks or chunks and stored on block servers; and the systems are comprised of interconnected clusters of block servers. Those features enable cloud service providers to store and process large amounts of data. However, it is crucial to offer an efficient verification on the integrity

A preliminary version of this paper appeared under the title "Efficient Provable Data Possession for Hybrid Clouds" in Proc. of the 17th ACM Conference on Computer and Communications Security (CCS), Chicago, IL, USA, 2010, pp. 881-883.
Y. Zhu is with the Institute of Computer Science and Technology, Peking University, Beijing 100871, China, and the Beijing Key Laboratory of Internet Security Technology, Peking University, Beijing 100871, China. E-mail: {yan.zhu,huzexing}@pku.edu.cn.
H. Hu and G.-J. Ahn are with Arizona State University, Tempe, Arizona 85287. E-mail: {hxhu,gahn}@asu.edu.
M. Yu is with the School of Mathematical Sciences, Peking University, Beijing 100871, China. E-mail: myyu@pku.edu.cn.

Digital Object Identifier 10.1109/TPDS.2012.66

1045-9219/12/$31.00 © 2012 IEEE


TABLE 1
Comparison of POR/PDP schemes for a file consisting of n blocks.


Scheme      | Type | CSP Comp.  | Client Comp. | Comm.      | Frag. | Privacy | Multiple Clouds | Prob. of Detection
PDP [2]     | HomT | O(t)       | O(t)         | O(1)       | ✓     | ✗       | V               | 1-(1-ρ)^t
SPDP [4]    | MHT  | O(t)       | O(t)         | O(t)       | ✓     | ✗       | V               | 1-(1-ρ)^t
DPDP-I [5]  | MHT  | O(t log n) | O(t log n)   | O(t log n) | ✗     | ✗       | V               | 1-(1-ρ)^t
DPDP-II [5] | MHT  | O(t log n) | O(t log n)   | O(t log n) | ✗     | ✗       | V               | 1-(1-ρ)^{Ω(t)}
CPOR-I [6]  | HomT | O(t)       | O(t)         | O(1)       | ✗     | ✗       | V               | 1-(1-ρ)^t
CPOR-II [6] | HomT | O(t+s)     | O(t+s)       | O(s)       | ✓     | ✗       | V               | 1-(1-ρ)^t
Our Scheme  | HomR | O(t+cs)    | O(t+s)       | O(s)       | ✓     | ✓       | ✓               | 1-Π_k(1-ρ_k)^{t_k}

Here s is the number of sectors in each block, c is the number of CSPs in a multi-cloud, and t is the number of sampling blocks; ρ and ρ_k are the probabilities of block corruption in a cloud server and in the k-th cloud server of a multi-cloud P = {P_k}, respectively; V denotes the verification process in a trivial approach; and MHT, HomT, and HomR denote Merkle Hash tree, homomorphic tags, and homomorphic responses, respectively.
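The detection probabilities in the last column of Table 1 can be computed directly. A minimal sketch (the values of ρ and t below are illustrative, not the paper's experimental parameters):

```python
def detect(rho: float, t: int) -> float:
    # Probability that challenging t randomly sampled blocks catches
    # corruption, when each block is corrupted with probability rho.
    return 1 - (1 - rho) ** t

# With 1% of blocks corrupted, a few hundred samples suffice.
for t in (10, 100, 300, 460):
    print(t, round(detect(0.01, t), 3))
```

This is why random sampling keeps the challenge size t (and hence the cost) essentially independent of the file size n.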

and availability of stored data for detecting faults and automatic recovery. Moreover, this verification is necessary to provide reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures.

Although existing schemes can make a true-or-false decision on data possession without downloading data at untrusted stores, they are not suitable for a distributed cloud storage environment, since they were not originally constructed on an interactive proof system. For example, the schemes based on a Merkle Hash tree (MHT), such as DPDP-I, DPDP-II [5], and SPDP [4] in Table 1, use an authenticated skip list to check the integrity of file blocks stored adjacently in space. Unfortunately, they do not provide any algorithms for constructing distributed Merkle trees, which are necessary for efficient verification in a multi-cloud environment. In addition, when a client asks for a file block, the server needs to send the file block along with a proof of the intactness of the block. However, this process incurs significant communication overhead in a multi-cloud environment, since the server in one cloud typically needs to generate such a proof with the help of other cloud storage services, where the adjacent blocks are stored. Other schemes, such as PDP [2], CPOR-I, and CPOR-II [6] in Table 1, are constructed on homomorphic verification tags, by which the server can aggregate the tags for multiple file blocks into a single response value. However, that does not mean the responses from multiple clouds can also be combined into a single value on the client side. Lacking homomorphic responses, clients must invoke the PDP protocol repeatedly to check the integrity of file blocks stored in multiple cloud servers. Also, clients need to know the exact position of each file block in a multi-cloud environment.

In addition, the verification process in such a case will lead to high communication overheads and computation costs on the client side as well. Therefore, it is of the utmost necessity to design a cooperative PDP model to reduce the storage and network overheads and enhance the transparency of verification activities in cluster-based cloud storage systems. Moreover, such a

cooperative PDP scheme should provide features for timely detecting abnormality and renewing multiple copies of data.

Even though existing PDP schemes have addressed various security properties, such as public verifiability [2], dynamics [5], scalability [4], and privacy preservation [7], we still need a careful consideration of some potential attacks, in two major categories: the Data Leakage Attack, by which an adversary can easily obtain the stored data through the verification process after running or wiretapping sufficiently many verification communications (see Attacks 1 and 3 in Appendix A), and the Tag Forgery Attack, by which a dishonest CSP can deceive the clients (see Attacks 2 and 4 in Appendix A). These two attacks may cause potential risks of privacy leakage and ownership cheating. Also, these attacks can more easily compromise the security of a distributed cloud system than that of a single-cloud system. Although various security models have been proposed for existing PDP schemes [2], [7], [6], these models still cannot cover all security requirements, especially for provably secure privacy preservation and ownership authentication. To establish a highly effective security model, it is necessary to analyze the PDP scheme within the framework of a zero-knowledge proof system (ZKPS), because a PDP system is essentially an interactive proof system (IPS), which has been well studied in the cryptography community.

In summary, a verification scheme for data integrity in distributed storage environments should have the following features:
1) Usability aspect: a client should utilize the integrity check in the way of collaboration services, and the scheme should conceal the details of the storage to reduce the burden on clients;
2) Security aspect: the scheme should provide adequate security features to resist existing attacks, such as the data leakage attack and the tag forgery attack;
3) Performance aspect: the scheme should have lower communication and computation overheads than non-cooperative solutions.


Related Works. To check the availability and integrity of outsourced data in cloud storage, researchers have proposed two basic approaches, called Provable Data Possession (PDP) [2] and Proofs of Retrievability (POR) [3]. Ateniese et al. [2] first proposed the PDP model for ensuring possession of files on untrusted storage and provided an RSA-based scheme for the static case that achieves O(1) communication cost. They also proposed a publicly verifiable version, which allows anyone, not just the owner, to challenge the server for data possession. This property greatly extends the application areas of the PDP protocol due to the separation of data owners and users. However, these schemes are insecure against replay attacks in dynamic scenarios because of their dependence on the indexes of blocks. Moreover, they do not fit multi-cloud storage due to the loss of the homomorphism property in the verification process.

In order to support dynamic data operations, Ateniese et al. developed a dynamic PDP solution called Scalable PDP [4]. They proposed a lightweight PDP scheme based on cryptographic hash functions and symmetric key encryption, but the servers can deceive the owners by using previous metadata or responses, due to the lack of randomness in the challenges. The numbers of updates and challenges are limited and fixed in advance, and users cannot perform block insertions anywhere. Based on this work, Erway et al. [5] introduced two Dynamic PDP schemes with a hash function tree to realize O(log n) communication and computational costs for an n-block file. The basic scheme, called DPDP-I, retains the drawback of Scalable PDP, and in the blockless scheme, called DPDP-II, the data blocks {m_i}_{i∈[1,n]} can be leaked by the response to a challenge, M = Σ_{i=1}^n a_i·m_i, where a_i is a random challenge value. Furthermore, these schemes are also not effective for a multi-cloud environment, because the verification path of a challenged block cannot be stored completely in one cloud [8].
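The block-leakage weakness noted for DPDP-II can be made concrete: when a response is the bare linear combination M = Σ a_i·m_i, an eavesdropper who records as many independent challenges as there are blocks can solve a linear system for the blocks themselves. A toy sketch (hypothetical block values and challenges, exact rational arithmetic standing in for the scheme's modular arithmetic):

```python
from fractions import Fraction

# Secret blocks held by the server (unknown to the eavesdropper).
blocks = [17, 42, 99]

def respond(challenge):
    # Blockless response: a bare linear combination of the blocks.
    return sum(a * m for a, m in zip(challenge, blocks))

# Eavesdropper records three independent challenges and their responses.
challenges = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
responses = [respond(c) for c in challenges]

def solve(A, b):
    # Gaussian elimination over the rationals.
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

# The eavesdropper recovers the blocks without ever storing them.
recovered = solve(challenges, responses)
print(recovered)
```

This is exactly the leakage that masking the response (as in zero-knowledge constructions) is meant to prevent.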
Juels and Kaliski [3] presented a POR scheme which relies largely on preprocessing steps that the client conducts before sending a file to a CSP. Unfortunately, these operations prevent any efficient extension for updating data. Shacham and Waters [6] proposed an improved version of this protocol, called Compact POR, which uses homomorphic properties to aggregate a proof into an O(1) authenticator value with O(t) computation cost for t challenge blocks, but their solution is also static and cannot prevent the leakage of data blocks in the verification process. Wang et al. [7] presented a dynamic scheme with O(log n) cost by integrating the Compact POR scheme and a Merkle Hash Tree (MHT) into DPDP. Furthermore, several POR schemes and models have been recently proposed, including [9], [10]. In [9], Bowers et al. introduced a distributed cryptographic system that allows a set of servers to solve the PDP problem. This system is based on an integrity-protected error-correcting code (IP-ECC), which improves the security and efficiency of existing tools like POR. However, a file must be transformed into distinct segments of the same length, which are distributed across the servers. Hence, this system is more suitable for RAID than for cloud storage.

Our Contributions. In this paper, we address the problem of provable data possession in distributed cloud environments from the following aspects: high security, transparent verification, and high performance. To achieve these goals, we first propose a verification framework for multi-cloud storage along with two fundamental techniques: hash index hierarchy (HIH) and homomorphic verifiable response (HVR). We then demonstrate the possibility of constructing a cooperative PDP (CPDP) scheme without compromising data privacy, based on modern cryptographic techniques such as the interactive proof system (IPS). We further introduce an effective construction of a CPDP scheme using the above-mentioned structures. Moreover, we give a security analysis of our CPDP scheme in the IPS model. We prove that this construction is a multi-prover zero-knowledge proof system (MP-ZKPS) [11], which has the completeness, knowledge soundness, and zero-knowledge properties. These properties ensure that the CPDP scheme achieves security against the data leakage attack and the tag forgery attack.

To improve the system performance of our scheme, we analyze the performance of probabilistic queries for detecting abnormal situations. This probabilistic method also has an inherent benefit in reducing computation and communication overheads. Then, we present an efficient method for the selection of optimal parameter values to minimize the computation overheads of CSPs and of the clients' operations. In addition, we show that our scheme is suitable for existing distributed cloud storage systems. Finally, our experiments show that our solution introduces very limited computation and communication overheads.

Organization.
The rest of this paper is organized as follows. In Section 2, we describe a formal definition of CPDP and the underlying techniques which are utilized in the construction of our scheme. We introduce the details of the cooperative PDP scheme for multi-cloud storage in Section 3. We describe the security and performance evaluations of our scheme in Sections 4 and 5, respectively. Related work is discussed above in Section 1, and Section 6 concludes this paper.

2 STRUCTURE AND TECHNIQUES

In this section, we present our verification framework for multi-cloud storage and a formal definition of CPDP. We introduce two fundamental techniques for constructing our CPDP scheme: the hash index hierarchy (HIH), on which the responses to the clients' challenges computed from multiple CSPs can be combined into a single response as the final result; and the homomorphic verifiable response (HVR), which supports distributed cloud storage in a multi-cloud storage environment and implements an efficient construction of a collision-resistant hash function, which can be viewed as a random oracle in the verification protocol.

2.1 Verification Framework for Multi-Cloud

Although existing PDP schemes offer a publicly accessible remote interface for checking and managing the tremendous amount of data, the majority of existing PDP schemes are incapable of satisfying the inherent requirements from multiple clouds in terms of communication and computation costs. To address this problem, we consider a multi-cloud storage service as illustrated in Figure 1. In this architecture, a data storage service involves three different entities: clients, who have a large amount of data to be stored in multiple clouds and have the permission to access and manipulate the stored data; cloud service providers (CSPs), who work together to provide data storage services and have enough storage and computation resources; and a trusted third party (TTP), who is trusted to store verification parameters and offer public query services for these parameters.
Fig. 1. Verification architecture for data integrity.

In this architecture, we consider the existence of multiple CSPs that cooperatively store and maintain the clients' data. Moreover, a cooperative PDP is used to verify the integrity and availability of their stored data in all CSPs. The verification procedure is described as follows: first, a client (data owner) uses the secret key to pre-process a file, which consists of a collection of blocks, generates a set of public verification information that is stored in the TTP, transmits the file and some verification tags to the CSPs, and may delete its local copy; then, by using a verification protocol, the client can issue a challenge for one CSP to check the integrity and availability of the outsourced data with respect to the public information stored in the TTP.

We neither assume that the CSP is trusted to guarantee the security of the stored data, nor assume that the data owner has the ability to collect evidence of a CSP's faults after errors have been found. To achieve this goal, a TTP server is constructed as a core trust base on the cloud for the sake of security. We assume the TTP is reliable and independent through the following functions [12]: to set up and maintain the CPDP cryptosystem; to generate and store the data owners' public keys; and to store the public parameters used to execute the verification protocol in the CPDP scheme. Note that the TTP is not directly involved in the CPDP scheme, in order to reduce the complexity of the cryptosystem.

2.2 Definition of Cooperative PDP

In order to prove the integrity of data stored in a multi-cloud environment, we define a framework for CPDP based on an interactive proof system (IPS) and a multi-prover zero-knowledge proof system (MP-ZKPS), as follows:

Definition 1 (Cooperative-PDP): A cooperative provable data possession scheme S = (KeyGen, TagGen, Proof) is a collection of two algorithms (KeyGen, TagGen) and an interactive proof system Proof, as follows:
1) KeyGen(1^κ): takes a security parameter κ as input, and returns a secret key sk or a public-secret key pair (pk, sk);
2) TagGen(sk, F, P): takes as inputs a secret key sk, a file F, and a set of cloud storage providers P = {P_k}, and returns the triple (ζ, ψ, σ), where ζ is the secret of the tags, ψ = (u, H) is a set of verification parameters u and an index hierarchy H for F, and σ = {σ^(k)} denotes the set of all tags, σ^(k) being the tag of the fraction F^(k) of F stored in P_k;
3) Proof(P, V): a protocol of proof of data possession between the CSPs (P = {P_k}) and a verifier (V), that is,

⟨ Σ_{P_k ∈ P} P_k(F^(k), σ^(k)), V ⟩(pk, ψ) = 1 if F = {F^(k)} is intact, and 0 if F = {F^(k)} is changed,

where each P_k takes as input a file F^(k) and a set of tags σ^(k), and a public key pk and a set of public parameters ψ are the common input between P and V. At the end of the protocol run, V returns a bit {0|1} denoting false or true, and Σ denotes the cooperative computing of the P_k ∈ P.

A trivial way to realize CPDP is to check the data stored in each cloud one by one, i.e.,

⋀_{P_k ∈ P} ⟨ P_k(F^(k), σ^(k)), V ⟩(pk, ψ),

where ⋀ denotes the logical AND operation among the boolean outputs of all protocols ⟨P_k, V⟩ for all P_k ∈ P.


However, it would cause significant communication and computation overheads for the verifier, as well as a loss of location transparency. Such a primitive approach obviously diminishes the advantages of cloud storage: scaling arbitrarily up and down on demand [13]. To solve this problem, we extend the above definition by adding an organizer (O), which is one of the CSPs that directly contacts the verifier, as follows:

⟨ O(·), Σ_{P_k ∈ P} P_k(F^(k), σ^(k)), V ⟩(pk, ψ),

where the action of the organizer is to initiate and organize the verification process. This definition is consistent with the aforementioned architecture; e.g., a client (or an authorized application) is considered as V, the CSPs are P = {P_k}_{k∈[1,c]}, and the Zoho cloud is the organizer in Figure 1. Often, the organizer is an independent server or a designated CSP in P. The advantage of this new multi-prover proof system is that, in the way of collaboration, it makes no difference to the clients between the multi-prover verification process and a single-prover verification process. Also, this kind of transparent verification is able to conceal the details of data storage so as to reduce the burden on clients. For the sake of clarity, we list the signals used in this paper in Table 2.

TABLE 2
The signals and their explanations.
n: the number of blocks in a file;
s: the number of sectors in each block;
t: the number of index-coefficient pairs in a query;
c: the number of clouds used to store a file;
F: the file, with n×s sectors, i.e., F = {m_{i,j}}_{i∈[1,n], j∈[1,s]};
σ: the set of tags, i.e., σ = {σ_i}_{i∈[1,n]};
Q: the set of index-coefficient pairs, i.e., Q = {(i, v_i)};
θ: the response for the challenge Q.
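The trivial realization of CPDP described above, in which the verifier challenges each cloud separately and ANDs the boolean outcomes, can be sketched as follows. MAC-style tags stand in for the scheme's homomorphic tags, and all names and parameters are hypothetical:

```python
import hashlib
import hmac

def tag(sk: bytes, index: int, block: bytes) -> bytes:
    # Illustrative MAC-style tag binding a block to its index.
    return hmac.new(sk, index.to_bytes(8, "big") + block, hashlib.sha256).digest()

def csp_prove(store, challenge):
    # Each CSP returns (block, tag) for every challenged index it holds.
    return [(i, store[i]) for i in challenge if i in store]

def verify_one(sk, proof, challenge):
    # Verifier's per-cloud check: every challenged index must be answered
    # with a block whose tag verifies.
    if {i for i, _ in proof} != set(challenge):
        return False
    return all(hmac.compare_digest(t, tag(sk, i, b)) for i, (b, t) in proof)

# A file of six blocks distributed over c = 2 clouds.
sk = b"client-secret"
blocks = {i: bytes([i]) * 16 for i in range(6)}
cloud1 = {i: (blocks[i], tag(sk, i, blocks[i])) for i in (0, 1, 2)}
cloud2 = {i: (blocks[i], tag(sk, i, blocks[i])) for i in (3, 4, 5)}

# Trivial CPDP: challenge each cloud separately and AND the outcomes.
ok = all(
    verify_one(sk, csp_prove(store, ch), ch)
    for store, ch in ((cloud1, [0, 2]), (cloud2, [3, 5]))
)
print(ok)
```

The verifier must run one protocol instance per cloud and must know which cloud holds which block, which is exactly the overhead and loss of transparency that the organizer-based protocol removes.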

2.3 Hash Index Hierarchy for CPDP

To support distributed cloud storage, we illustrate a representative architecture used in our cooperative PDP scheme, as shown in Figure 2. Our architecture has a hierarchical structure which resembles a natural representation of file storage. This hierarchical structure consists of three layers to represent the relationships among all blocks for stored resources, described as follows: 1) Express Layer: offers an abstract representation of the stored resources; 2) Service Layer: offers and manages cloud storage services; and 3) Storage Layer: realizes data storage on many physical devices.

We make use of this simple hierarchy to organize data blocks from multiple CSP services into a large-size file by shading their differences among these cloud storage systems. For example, in Figure 2 the resources in the Express Layer are split and stored into three CSPs, which are indicated by different colors, in the Service Layer. In turn, each CSP fragments and stores the assigned data into the storage servers in the Storage Layer. We also use colors to distinguish different CSPs. Moreover, we follow the logical order of the data blocks to organize the Storage Layer. This architecture also provides special functions for data storage and management; e.g., there may exist overlaps among data blocks (as shown in dashed boxes) and discontinuous blocks, but these functions may increase the complexity of storage management.

Fig. 2. Index-hash hierarchy of the CPDP model.

In the storage layer, we define a common fragment structure that provides probabilistic verification of data integrity for outsourced storage. The fragment structure is a data structure that maintains a set of block-tag pairs, allowing searches, checks, and updates in O(1) time. An instance of this structure is shown in the storage layer of Figure 2: an outsourced file F is split into n blocks {m_1, m_2, ..., m_n}, and each block m_i is split into s sectors {m_{i,1}, m_{i,2}, ..., m_{i,s}}. The fragment structure consists of block-tag pairs (m_i, σ_i), where σ_i is a signature tag of block m_i generated by a set of secrets τ = (τ_1, τ_2, ..., τ_s). In order to check the data integrity, the fragment structure implements probabilistic verification as follows: given a randomly chosen challenge (or query) Q = {(i, v_i)}_{i∈I}, where I is a subset of the block indices and v_i is a random coefficient, there exists an efficient algorithm to produce a constant-size response (μ_1, μ_2, ..., μ_s, σ'), where μ_j comes from all {m_{k,j}, v_k}_{k∈I} and σ' is from all {σ_k, v_k}_{k∈I}.
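The fragment structure and its probabilistic check can be sketched with toy linear tags over a small prime field. The modulus, the per-sector secrets, and the parameters below are illustrative assumptions, not the pairing-based construction of Section 3; with linear tags the verifier must hold the secrets τ, so this toy is only privately verifiable:

```python
import random

p = 2**61 - 1          # toy prime modulus (illustrative)
s = 4                  # sectors per block
tau = [random.randrange(1, p) for _ in range(s)]  # per-sector secrets

def split(data: bytes, n: int):
    # Split the file into n blocks of s sectors each (integers mod p).
    sectors = [int.from_bytes(data[k:k + 4], "big") for k in range(0, 4 * n * s, 4)]
    return [sectors[i * s:(i + 1) * s] for i in range(n)]

def tag(block):
    # Illustrative linear tag: sigma_i = sum_j tau_j * m_{i,j} mod p.
    return sum(t * m for t, m in zip(tau, block)) % p

data = bytes(range(256))
n = 8
# Block-tag pairs with O(1) lookup by block index.
fragments = {i: (b, tag(b)) for i, b in enumerate(split(data, n))}

# Probabilistic challenge: t = 3 sampled indices with random coefficients.
Q = [(i, random.randrange(1, p)) for i in random.sample(range(n), 3)]
mu = [sum(v * fragments[i][0][j] for i, v in Q) % p for j in range(s)]  # mu_j
sigma = sum(v * fragments[i][1] for i, v in Q) % p                      # sigma'

# Constant-size response (mu_1..mu_s, sigma') verifies against tau.
assert sigma == sum(t * m for t, m in zip(tau, mu)) % p
```

The check works because sigma' = Σ_i v_i·σ_i = Σ_j τ_j·(Σ_i v_i·m_{i,j}) = Σ_j τ_j·μ_j, so the response size depends only on s, not on the number of challenged blocks.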


Given a collision-resistant hash function H_κ(·), we make use of this architecture to construct a hash index hierarchy H (viewed as a random oracle), which is used to replace the common hash function in prior PDP schemes, as follows:
1) Express layer: given s random values {τ_i}_{i=1}^s and the file name Fn, sets χ^(1) = H_{Σ_{i=1}^s τ_i}(Fn) and makes it public for verification, but keeps {τ_i}_{i=1}^s secret;
2) Service layer: given χ^(1) and the cloud name Cn_k, sets χ_k^(2) = H_{χ^(1)}(Cn_k);
3) Storage layer: given χ_k^(2), a block number i, and its index record χ_i = "B_i∥V_i∥R_i", sets ξ_{i,k}^(3) = H_{χ_k^(2)}(χ_i), where B_i is the sequence number of the block, V_i is the updated version number, and R_i is a random integer to avoid collision.

As a virtualization approach, we introduce a simple index-hash table χ = {χ_i} to record the changes of file blocks, as well as to generate the hash value of each block in the verification process. The structure of χ is similar to the structure of the file block allocation table in file systems. The index-hash table consists of serial number, block number, version number, random integer, and so on. Different from a common index table, we ensure that all records in our index table differ from one another, to prevent forgery of data blocks and tags. By using this structure, and especially the index records {χ_i}, our CPDP scheme can also support dynamic data operations [8].

The proposed structure can be readily incorporated into MAC-based, ECC, or RSA schemes [2], [6]. These schemes, built from collision-resistant signatures (see Section 3.1) and the random oracle model, have the shortest query and response with public verifiability.
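The three-layer construction can be sketched with a keyed hash standing in for H_χ(·); the record layout, cloud names, and field values below are illustrative assumptions:

```python
import hashlib
import hmac

def H(key: bytes, msg: bytes) -> bytes:
    # H_chi(m): a collision-resistant hash keyed by the value chi.
    return hmac.new(key, msg, hashlib.sha256).digest()

# Express layer: chi^(1) = H_{sum(tau_i)}(file name); the tau_i stay secret.
tau = [7, 11, 13]
chi1 = H(str(sum(tau)).encode(), b"Fn")          # public for verification

# Service layer: chi_k^(2) = H_{chi^(1)}(cloud name), one per CSP.
chi2 = {cn: H(chi1, cn.encode()) for cn in ("CSP-A", "CSP-B")}

# Storage layer: xi_{i,k}^(3) = H_{chi_k^(2)}(chi_i), with the index record
# chi_i = "B_i || V_i || R_i" (block number, version number, random integer).
def xi3(cloud: str, Bi: int, Vi: int, Ri: int) -> bytes:
    record = f"{Bi}||{Vi}||{Ri}".encode()
    return H(chi2[cloud], record)

h = xi3("CSP-A", Bi=5, Vi=1, Ri=0x9e37)
# Distinct records yield distinct hash values, so tags cannot be replayed
# across blocks, versions, or clouds.
assert h != xi3("CSP-A", 5, 2, 0x9e37)
assert h != xi3("CSP-B", 5, 1, 0x9e37)
```

Keying each layer by the layer above is what lets a verifier recompute ξ_{i,k}^(3) from public information while the per-file secrets remain hidden.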
They share several common characteristics for the implementation of the CPDP framework in multiple clouds: 1) a file is split into n×s sectors and each block (s sectors) corresponds to a tag, so that the storage of signature tags can be reduced by increasing s; 2) a verifier can verify the integrity of a file by a random sampling approach, which is of the utmost importance for large files; 3) these schemes rely on homomorphic properties to aggregate data and tags into a constant-size response, which minimizes the overhead of network communication; and 4) the hierarchical structure provides a virtualization approach to conceal the storage details of multiple CSPs.

2.4 Homomorphic Verifiable Response for CPDP

A homomorphism is a map f : P → Q between two groups such that f(g_1 ⊕ g_2) = f(g_1) ⊗ f(g_2) for all g_1, g_2 ∈ P, where ⊕ denotes the operation in P and ⊗ denotes the operation in Q. This notion has been used to define homomorphic verifiable tags (HVTs) in [2]: given two values σ_i and σ_j for two messages m_i and m_j, anyone can combine them into a value σ' corresponding to the sum of the messages m_i + m_j. When provable data possession is considered as

a challenge-response protocol, we extend this notion to the concept of a homomorphic verifiable response (HVR), which is used to integrate multiple responses from the different CSPs in the CPDP scheme, as follows:

Definition 2 (Homomorphic Verifiable Response): A response is called a homomorphic verifiable response in a PDP protocol if, given two responses θ_i and θ_j for two challenges Q_i and Q_j from two CSPs, there exists an efficient algorithm to combine them into a response θ corresponding to the sum of the challenges Q_i ∪ Q_j.

Homomorphic verifiable response is the key technique of CPDP, because it not only reduces the communication bandwidth but also conceals the location of outsourced data in the distributed cloud storage environment.
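With linear tags as in the fragment structure, homomorphic verifiable responses reduce to component-wise sums: per-cloud responses over disjoint challenge subsets combine into one response for the whole challenge. A toy sketch (illustrative arithmetic mod a small prime, not the pairing-based responses of Section 3):

```python
p = 2**61 - 1
tau = [3, 5]                       # per-sector secrets (illustrative)
blocks = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8]}
sigma = {i: sum(t * m for t, m in zip(tau, b)) % p for i, b in blocks.items()}

def respond(held, Q):
    # One CSP's response (mu_1..mu_s, sigma') over the indices it holds.
    mu = [sum(v * blocks[i][j] for i, v in Q if i in held) % p for j in range(2)]
    sg = sum(v * sigma[i] for i, v in Q if i in held) % p
    return mu, sg

def combine(r1, r2):
    # Homomorphic combination: component-wise sum of two responses.
    return [(a + b) % p for a, b in zip(r1[0], r2[0])], (r1[1] + r2[1]) % p

Q = [(0, 9), (1, 4), (2, 6), (3, 2)]
# Cloud A holds blocks {0,1}; cloud B holds blocks {2,3}; the organizer
# combines their partial responses into one.
mu, sg = combine(respond({0, 1}, Q), respond({2, 3}, Q))

# The combined response verifies exactly like a single-cloud response, so
# the verifier cannot tell how the blocks were distributed.
assert sg == sum(t * m for t, m in zip(tau, mu)) % p
```

This is the property Definition 2 captures: combining responses for Q_i and Q_j yields a valid response for Q_i ∪ Q_j.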

3 COOPERATIVE PDP SCHEME

In this section, we propose a CPDP scheme for a multi-cloud system based on the above-mentioned structure and techniques. This scheme is constructed on a collision-resistant hash, a bilinear map group, an aggregation algorithm, and homomorphic responses.

3.1 Notations and Preliminaries

Let H = {H_κ} be a family of hash functions H_κ : {0,1}^* → {0,1}^n indexed by κ ∈ K. We say that an algorithm A has advantage ε in breaking the collision-resistance of H if Pr[A(κ) = (m_0, m_1) : m_0 ≠ m_1, H_κ(m_0) = H_κ(m_1)] ≥ ε, where the probability is over the random choice of κ ∈ K and the random bits of A. This yields the following definition.

Definition 3 (Collision-Resistant Hash): A hash family H is (t, ε)-collision-resistant if no t-time adversary has advantage at least ε in breaking the collision-resistance of H.

We set up our system using the bilinear pairings proposed by Boneh and Franklin [14]. Let G and G_T be two multiplicative groups using elliptic curve conventions with a large prime order p. The function e is a computable bilinear map e : G × G → G_T with the following properties: for any G, H ∈ G and all a, b ∈ Z_p, we have 1) Bilinearity: e([a]G, [b]H) = e(G, H)^{ab}; 2) Non-degeneracy: e(G, H) ≠ 1 unless G or H = 1; and 3) Computability: e(G, H) is efficiently computable.

Definition 4 (Bilinear Map Group System): A bilinear map group system is a tuple S = ⟨p, G, G_T, e⟩ composed of the objects as described above.

3.2 Our CPDP Scheme

In our scheme (see Fig. 3), the manager first runs the KeyGen algorithm to obtain the public/private key pairs for CSPs and users. Then, the clients generate the tags of outsourced data by using TagGen. At any time, the Proof protocol is performed by a 5-move interactive


KeyGen(1^κ): Let S = (p, G, G_T, e) be a bilinear map group system with randomly selected generators g, h ∈ G, where G, G_T are two bilinear groups of a large prime order p, |p| = O(κ). Make a hash function H_κ(·) public. For a CSP, choose a random number s_k ∈ Z_p and compute S_k = g^{s_k}; thus, sk_p = s_k and pk_p = (g, S_k). For a user, choose two random numbers α, β ∈ Z_p and set sk = (α, β) and pk = (g, h, H_1 = h^α, H_2 = h^β).

TagGen(sk, F, P): Split F into n×s sectors {m_{i,j}}_{i∈[1,n], j∈[1,s]}. Choose s random values τ_1, ..., τ_s ∈ Z_p as the secret of this file and compute u_j = g^{τ_j} for j ∈ [1, s]. Construct the index table χ = {χ_i}_{i=1}^n, fill out the record χ_i in χ for i ∈ [1, n] (a), and then calculate the tag of each block as

χ^(1) = H_{Σ_{i=1}^s τ_i}(Fn), χ_k^(2) = H_{χ^(1)}(Cn_k), ξ_{i,k}^(3) = H_{χ_k^(2)}(χ_i), σ_i = (ξ_{i,k}^(3))^α · (Π_{j=1}^s u_j^{m_{i,j}})^β,

where Fn is the file name and Cn_k is the CSP name of P_k. Then store ψ = (u, χ^(1), χ) into the TTP, where u = (u_1, ..., u_s), and send σ^(k) = {σ_i}_{F_i ∈ P_k} to P_k. Finally, the data owner saves the secret τ = (τ_1, ..., τ_s).

Proof(P, V): This is a 5-move protocol among the provers (P = {P_k}_{k∈[1,c]}), an organizer (O), and a verifier (V) with the common input (pk, ψ), which is stored in the TTP, as follows:
1) Commitment(O → V): the organizer chooses a random γ ∈ Z_p and sends H_1' = H_1^γ to the verifier;
2) Challenge1(V → O): the verifier chooses a set of challenge index-coefficient pairs Q = {(i, v_i)}_{i∈I} and sends Q to the organizer, where I is a set of random indexes in [1, n] and v_i is a random integer in Z_p*;
3) Challenge2(O → P): the organizer forwards Q_k = {(i, v_i)}_{F_i ∈ P_k} to each P_k in P;
4) Response1(P → O): each P_k chooses random λ_{j,k} ∈ Z_p for j ∈ [1, s] and calculates a response

μ_{j,k} = λ_{j,k} + Σ_{(i,v_i)∈Q_k} v_i·m_{i,j}, σ_k = Π_{(i,v_i)∈Q_k} σ_i^{v_i}, π_k = e(Π_{j=1}^s u_j^{λ_{j,k}}, H_2);

each P_k sends θ_k = (π_k, σ_k, μ_k) to the organizer, where μ_k = {μ_{j,k}}_{j∈[1,s]};
5) Response2(O → V): after receiving all responses from {P_k}_{k∈[1,c]}, the organizer aggregates {θ_k} into a final response as

σ' = (Π_k σ_k)^γ, μ_j = γ · Σ_k μ_{j,k}, π = (Π_k π_k)^γ. (1)

The organizer sends θ = (π, σ', μ) to the verifier, where μ = {μ_j}_{j∈[1,s]}.

Verification: Now the verifier can check whether the response was correctly formed by checking that

π · e(σ', h) ?= e(Π_{(i,v_i)∈Q} (ξ_{i,k}^(3))^{v_i}, H_1') · e(Π_{j=1}^s u_j^{μ_j}, H_2). (2)

a. For χ_i = "B_i∥V_i∥R_i" in Section 2.3, we can set χ_i = (B_i = i, V_i = 1, R_i ∈_R {0,1}^*) at the initial stage of the CPDP scheme.

Fig. 3. Cooperative provable data possession for multi-cloud storage.

proof protocol between a verifier and more than one CSP, in which the CSPs need not interact with each other during the verification process; instead, an organizer is used to organize and manage all CSPs. This protocol can be described as follows: 1) the organizer initiates the protocol and sends a commitment to the verifier; 2) the verifier returns a challenge set of random index-coefficient pairs $Q$ to the organizer; 3) the organizer relays them to each $P_k$ in $\mathcal{P}$ according to the exact position of each data block; 4) each $P_k$ returns its response to the challenge to the organizer; and 5) the organizer synthesizes a final response from the received responses and sends it to the verifier. The above process guarantees that the verifier accesses files without knowing on which CSPs or in what geographical locations the files reside. In contrast to a single-CSP environment, our scheme differs from the common PDP scheme in two aspects:

1) Tag aggregation algorithm: In the commitment stage, the organizer generates a random $\gamma \in_R \mathbb{Z}_p$ and returns its commitment $H_1' = H_1^\gamma$ to the verifier. This assures that neither the verifier nor the CSPs obtain the value of $\gamma$; therefore, our approach guarantees that only the organizer can compute the final $\sigma'$ by using $\gamma$ and the $\sigma_k$ received from the CSPs. After $\sigma_k$ is computed by each $P_k$, it needs to be transferred to the organizer in the Response1 stage. In order to ensure the security of the transmission of data tags, our scheme employs a new method, similar to ElGamal encryption, to encrypt the combination of tags $\prod_{(i, v_i) \in Q_k} \sigma_i^{v_i}$: for $sk = s$ ($s \in \mathbb{Z}_p$) and $pk = (g, S = g^s) \in \mathbb{G}^2$, the cipher of a message $m$ is $C = (C_1 = g^r, C_2 = m \cdot S^r)$, and its decryption is performed by $m = C_2 \cdot C_1^{-s}$. Thus, we hold the equation
$$\sigma' = \Big(\prod_{P_k \in \mathcal{P}} \sigma_k\Big)^\gamma = \Big(\prod_{P_k \in \mathcal{P}} \prod_{(i, v_i) \in Q_k} \sigma_i^{v_i}\Big)^\gamma = \Big(\prod_{(i, v_i) \in Q} \sigma_i^{v_i}\Big)^\gamma.$$
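The ElGamal-style tag transfer above can be sketched in a few lines. This is a toy over the multiplicative group $\mathbb{Z}_p^*$ with a small Mersenne prime (a real deployment works in the pairing group $\mathbb{G}$ of Fig. 3, and the choice of generator here is illustrative): the CSP encrypts its aggregated tag under the organizer's public key $(g, S = g^s)$, and only the organizer, holding $s$, recovers it.

```python
# Toy ElGamal-style encryption of an aggregated tag sigma_k, as used to
# protect tag transmission from CSPs to the organizer. Toy group Z_p^*;
# the prime and generator are illustrative assumptions.
import random

p = (1 << 61) - 1                    # toy prime modulus (2^61 - 1)
g = 3                                # toy base element (illustrative choice)

s = random.randrange(2, p - 1)       # organizer's secret key
S = pow(g, s, p)                     # organizer's public key component

sigma_k = random.randrange(2, p - 1)  # aggregated tag, as a group element

# CSP side: C = (C1, C2) = (g^r, sigma_k * S^r)
r = random.randrange(2, p - 1)
C1 = pow(g, r, p)
C2 = sigma_k * pow(S, r, p) % p

# organizer side: sigma_k = C2 * C1^{-s}  (inverse via Fermat's little theorem)
recovered = C2 * pow(C1, p - 1 - s, p) % p
assert recovered == sigma_k
print("aggregated tag recovered by the organizer")
```

An eavesdropper who sees only $(C_1, C_2)$ learns nothing about $\sigma_k$ without $s$, which is exactly the property the scheme needs during Response1.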

2) Homomorphic responses: Because of the homomorphic property, the responses computed from CSPs

in a multi-cloud can be combined into a single final response as follows: given a set of responses $\theta_k = (\pi_k, \sigma_k, \mu_k)$ received from the $P_k$'s, let $\lambda_j = \sum_{P_k \in \mathcal{P}} \lambda_{j,k}$; the organizer can compute
$$\mu_j = \sum_{P_k \in \mathcal{P}} \mu_{j,k} = \sum_{P_k \in \mathcal{P}} \Big(\lambda_{j,k} + \sum_{(i, v_i) \in Q_k} v_i \cdot m_{i,j}\Big) = \sum_{P_k \in \mathcal{P}} \lambda_{j,k} + \sum_{(i, v_i) \in Q} v_i \cdot m_{i,j} = \lambda_j + \sum_{(i, v_i) \in Q} v_i \cdot m_{i,j}.$$
The commitment of the $\lambda_{j,k}$'s is also computed by
$$\pi' = \prod_{P_k \in \mathcal{P}} \pi_k = \prod_{P_k \in \mathcal{P}} e\Big(\prod_{j=1}^s u_j^{\lambda_{j,k}}, H_2\Big) = e\Big(\prod_{j=1}^s u_j^{\sum_{P_k \in \mathcal{P}} \lambda_{j,k}}, H_2\Big) = e\Big(\prod_{j=1}^s u_j^{\lambda_j}, H_2\Big).$$

4 SECURITY ANALYSIS

We give a brief security analysis of our CPDP construction. This construction is directly derived from the multi-prover zero-knowledge proof system (MP-ZKPS), which satisfies the following properties for a given assertion $L$: 1) Completeness: whenever $x \in L$, there exists a strategy for the provers that convinces the verifier that this is the case; 2) Soundness: whenever $x \notin L$, whatever strategy the provers employ, they will not convince the verifier that $x \in L$; 3) Zero-knowledge: no cheating verifier can learn anything other than the veracity of the statement. According to existing IPS research [15], these properties can protect our construction from various attacks, such as the data leakage attack (privacy leakage) and the tag forgery attack (ownership cheating). In detail, the security of our scheme can be analyzed as follows.

4.1 Collision Resistance of the Index-Hash Hierarchy

In our CPDP scheme, the collision resistance of the index-hash hierarchy is the basis and prerequisite for the security of the whole scheme, which is described as being secure in the random oracle model. Although the hash function is collision resistant, a successful hash collision can still be used to produce a forged tag when the same hash value is reused multiple times, e.g., when a legitimate client modifies the data or repeatedly inserts and deletes data blocks of the outsourced data. To avoid such collisions, the hash value $\xi_i^{(3)}$, which is used to generate the tag $\sigma_i$ in the CPDP scheme, is computed from the set of values $\{\tau_j\}$, $F_n$, $C_k$, $\{\chi_i\}$. As long as there exists a one-bit difference among these data, we can avoid a hash collision. As a consequence, we have the following theorem (see Appendix B):

Theorem 1 (Collision Resistance): The index-hash hierarchy in the CPDP scheme is collision resistant, even if the client generates $\sqrt{2^\gamma \ln \frac{1}{1-\epsilon}}$ files with the same file name and cloud name, and the client repeats $\sqrt{2^{\gamma+1} \ln \frac{1}{1-\epsilon}}$ times to modify, insert, and delete data blocks, where the collision probability is at least $\epsilon$ and $\gamma$ denotes the bit length of the hash values.

4.2 Completeness Property of Verification

It is obvious that the final response $\theta$ received by the verifier from multiple CSPs is the same as that from one single CSP. This means that our CPDP scheme is able to provide transparent verification for the verifiers. Two response algorithms, Response1 and Response2, comprise an HVR: given two responses $\theta_i$ and $\theta_j$ for two challenges $Q_i$ and $Q_j$ from two CSPs, i.e., $\theta_i = \text{Response1}(Q_i, \{m_k\}, \{\sigma_k\})$, there exists an efficient algorithm to combine them into a final response $\theta$ corresponding to the sum of the challenges $Q_i \cup Q_j$, that is,
$$\theta = \text{Response1}\big(Q_i \cup Q_j, \{m_k\}, \{\sigma_k\}\big) = \text{Response2}(\theta_i, \theta_j).$$
For multiple CSPs, the above equation can be extended to $\theta = \text{Response2}(\{\theta_k\}_{P_k \in \mathcal{P}})$. More importantly, the HVR is a 3-tuple of values $\theta = (\pi', \sigma', \mu)$, which has a constant size even for different challenges.

In our scheme, the completeness property implies the public verifiability property, which allows anyone, not just the client (data owner), to challenge the cloud servers for data integrity and data ownership without the need for any secret information. First, for every available data-tag pair $(F, \sigma) \in \text{TagGen}(sk, F)$ and a random challenge $Q = \{(i, v_i)\}_{i \in I}$, the verification protocol should be completed with success probability 1 according to Equation (3), that is,
$$\Pr\big[\langle \mathcal{P}(F, \sigma), V \rangle(pk, \zeta) = 1\big] = 1.$$

In this process, anyone can obtain the owner's public key $pk = (g, h, H_1 = h^\alpha, H_2 = h^\beta)$ and the corresponding file parameter $\zeta = (u, \xi^{(1)}, \chi)$ from the TTP to execute the verification protocol; hence, this is a publicly verifiable protocol. Moreover, for different owners, the secrets $\alpha$ and $\beta$ hidden in their public keys are also different, which determines that a successful verification can only be implemented with the real owner's public key. In addition, the parameter $\zeta$ is used to store the file-related information, so an owner can employ a unique public key to deal with a large number of outsourced files.

4.3 Zero-Knowledge Property of Verification

The CPDP construction is in essence a Multi-Prover Zero-knowledge Proof (MP-ZKP) system [11], which can be considered as an extension of the notion of

The success probability in the completeness property follows from the correctness of the aggregated response:
$$e(\sigma', h) = e\Big(\prod_{(i, v_i) \in Q} \sigma_i^{v_i \cdot \gamma}, h\Big) = e\Big(\prod_{(i, v_i) \in Q} (\xi_i^{(3)})^{v_i}, H_1'\Big) \cdot e\Big(\prod_{j=1}^s u_j^{\gamma \sum_{(i, v_i) \in Q} v_i \cdot m_{i,j}}, H_2\Big) = \pi'^{-1} \cdot e\Big(\prod_{(i, v_i) \in Q} (\xi_i^{(3)})^{v_i}, H_1'\Big) \cdot e\Big(\prod_{j=1}^s u_j^{\mu_j}, H_2\Big). \tag{3}$$

an interactive proof system (IPS). Roughly speaking, in the scenario of an MP-ZKP, a polynomial-time bounded verifier interacts with several provers whose computational powers are unlimited. According to a simulator model, in which every cheating verifier has a simulator that can produce a transcript that "looks like" an interaction between an honest prover and the cheating verifier, we can prove that our CPDP construction has the zero-knowledge property (see Appendix C):

Theorem 2 (Zero-Knowledge Property): The verification protocol $\text{Proof}(\mathcal{P}, V)$ in the CPDP scheme is a computational zero-knowledge system under a simulator model; that is, for every probabilistic polynomial-time interactive machine $V^*$, there exists a probabilistic polynomial-time algorithm $S^*$ such that the ensembles $\text{View}\big(\langle \mathcal{P}(F, \sigma), V^* \rangle(pk, \zeta)\big)$ and $S^*(pk, \zeta)$ are computationally indistinguishable.

Zero-knowledge is a property that achieves the CSPs' robustness against attempts to gain knowledge by interacting with them. For our construction, we make use of the zero-knowledge property to preserve the privacy of data blocks and signature tags. First, randomness is adopted into the CSPs' responses in order to resist the data leakage attacks (see Attacks 1 and 3 in Appendix A): the random integer $\lambda_{j,k}$ is introduced into the response $\mu_{j,k}$, i.e., $\mu_{j,k} = \lambda_{j,k} + \sum_{(i, v_i) \in Q_k} v_i \cdot m_{i,j}$. This means that the cheating verifier cannot obtain $m_{i,j}$ from $\mu_{j,k}$, because he does not know the random integer $\lambda_{j,k}$. At the same time, a random integer $\gamma$ is also introduced to randomize the verification tag $\sigma$, i.e., $\sigma' \gets (\prod_{P_k \in \mathcal{P}} \sigma_k)^\gamma$. Thus, the tag $\sigma$ cannot be revealed to the cheating verifier in terms of randomness.

4.4 Knowledge Soundness of Verification

For every data-tag pair $(F^*, \sigma^*) \notin \text{TagGen}(sk, F)$, in order to prove the nonexistence of a fraudulent prover $\mathcal{P}^*$ and organizer $O^*$, we require that the scheme satisfies the knowledge soundness property, that is,
$$\Pr\big[\langle \mathcal{P}^*(F^*, \sigma^*), V \rangle(pk, \zeta) = 1\big] \leq \epsilon,$$
where $\epsilon$ is a negligible error. We prove that our scheme has the knowledge soundness property by using reduction to absurdity$^1$: we make use of $\mathcal{P}^*$ to construct a knowledge extractor $\mathcal{M}$ [7], [13], which gets the common input $(pk, \zeta)$ and rewindable black-box access to the prover $\mathcal{P}^*$, and then attempts to break the computational Diffie-Hellman (CDH) problem in $\mathbb{G}$: given $g, G_1 = g^a, G_2 = g^b \in \mathbb{G}$, output $g^{ab} \in \mathbb{G}$. But this is unacceptable, because the CDH problem is widely regarded as unsolvable in polynomial time. Thus, the opposite direction of the theorem also follows. We have the following theorem (see Appendix D):

Theorem 3 (Knowledge Soundness Property): Our scheme has $(t, \epsilon')$ knowledge soundness in the random oracle and rewindable knowledge extractor model assuming the $(t, \epsilon)$-computational Diffie-Hellman (CDH) assumption holds in the group $\mathbb{G}$ for $\epsilon' \geq \epsilon$.

Essentially, the soundness means that it is infeasible to fool the verifier into accepting false statements. Often, the soundness can also be regarded as a stricter notion of unforgeability for file tags, which prevents cheating on ownership. This means that the CSPs, even if they collude, cannot tamper with the data or forge the data tags if the soundness property holds. Thus, Theorem 3 denotes that the CPDP scheme can resist the tag forgery attacks (see Attacks 2 and 4 in Appendix A) to avoid cheating on the CSPs' ownership.
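To make the algebra of Sections 3.2 and 4.2 concrete, here is a toy end-to-end sanity check of verification Equation (2). The pairing is replaced by an exponent-tracking stand-in, where every group element is represented by its discrete logarithm, so the check demonstrates only the bookkeeping, not the cryptography; a real implementation would use a pairing library such as PBC (as in Section 5). Variable names mirror the scheme text; the stand-in itself is our assumption.

```python
# Toy sanity check of CPDP verification Eq. (2) for c = 2 CSPs.
# Group elements are stored as exponents of g, so the pairing
# e(g^a, g^b) = e(g,g)^(a*b) becomes multiplication mod q.
import random

q = 2**61 - 1                        # toy prime order of the groups

def rnd():
    return random.randrange(1, q)

# keys: h = g^w, H1 = h^alpha, H2 = h^beta (all as exponents of g)
w, alpha, beta = rnd(), rnd(), rnd()
n, s_sec = 8, 3                      # n blocks, s sectors per block
m = [[rnd() for _ in range(s_sec)] for _ in range(n)]   # sectors m[i][j]
tau = [rnd() for _ in range(s_sec)]  # file secret; u_j = g^tau_j
xi3 = [rnd() for _ in range(n)]      # index-hash values xi_i^(3)

# TagGen: sigma_i = (xi_i^(3))^alpha * (prod_j u_j^{m_ij})^beta
sigma = [(xi3[i]*alpha + sum(tau[j]*m[i][j] for j in range(s_sec))*beta) % q
         for i in range(n)]

# challenge Q = {(i, v_i)}, split between two CSPs by block position
Q = [(i, rnd()) for i in random.sample(range(n), 4)]
parts = [Q[:2], Q[2:]]

gamma = rnd()                        # organizer's secret randomness
resp = []
for Qk in parts:                     # Response1 at each CSP P_k
    lam = [rnd() for _ in range(s_sec)]
    mu_k = [(lam[j] + sum(v*m[i][j] for i, v in Qk)) % q for j in range(s_sec)]
    sig_k = sum(v*sigma[i] for i, v in Qk) % q          # prod sigma_i^{v_i}
    pi_k = sum(tau[j]*lam[j] for j in range(s_sec)) * (w*beta) % q
    resp.append((pi_k, sig_k, mu_k))

# Response2: aggregate and randomize with gamma (Eq. 1)
pi  = sum(p for p, _, _ in resp) * gamma % q
sig = sum(sg for _, sg, _ in resp) * gamma % q
mu  = [gamma * sum(mk[j] for _, _, mk in resp) % q for j in range(s_sec)]

# Verification (Eq. 2): pi' * e(sigma', h) == e(prod xi^{v_i}, H1') * e(prod u^{mu_j}, H2)
lhs = (pi + sig*w) % q
rhs = (sum(v*xi3[i] for i, v in Q) * (w*alpha*gamma)
       + sum(tau[j]*mu[j] for j in range(s_sec)) * (w*beta)) % q
assert lhs == rhs
print("verification equation holds")
```

Working through the two sides symbolically reproduces Equation (3): the $\pi'$ term absorbs the CSPs' blinding values $\lambda_{j,k}$, while $\gamma$ ties the aggregated tags to the commitment $H_1'$.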

5 PERFORMANCE EVALUATION

In this section, to detect abnormality in a low-overhead and timely manner, we analyze and optimize the performance of the CPDP scheme from two aspects: evaluation of probabilistic queries and optimization of the length of blocks. To validate the effects of the scheme, we introduce a prototype of a CPDP-based audit system and present the experimental results.

5.1 Performance Analysis for the CPDP Scheme

We present the computation costs of our CPDP scheme in Table 3. We use $[E]$ to denote the computation cost of an exponent operation $g^x$ in $\mathbb{G}$, where $x$ is a positive integer in $\mathbb{Z}_p$ and $g \in \mathbb{G}$ or $\mathbb{G}_T$. We neglect the computation costs of algebraic operations and
1. It is a proof method in which a proposition is proved to be true by proving that it is impossible to be false.


simple modular arithmetic operations, because they run fast enough [16]. The most complex operation is the computation of a bilinear map $e(\cdot, \cdot)$ between two elliptic points (denoted as $[B]$).

TABLE 3
Comparison of computation overheads between our CPDP scheme and the non-cooperative (trivial) scheme.
                  KeyGen     TagGen        Proof(P)               Proof(V)
CPDP scheme       3[E]       (2n+s)[E]     c[B] + (t+cs+1)[E]     3[B] + (t+s)[E]
Trivial scheme    2c[E]      (2n+s)[E]     c[B] + (t+cs)[E]       3c[B] + c(t+s)[E]

Here $n$ is the number of blocks in a file, $s$ the number of sectors per block, $t$ the number of challenged index-coefficient pairs, and $c$ the number of CSPs.

Furthermore, in the verification protocol, the communication overhead of the challenge is $2t \cdot l_0 = 40t$ bytes in terms of the number of challenged blocks $t$, but the response (Response1 or Response2) has a constant-size communication overhead of $s \cdot l_0 + 2l_1 + l_T \approx 1.3$ K-bytes for different file sizes. This also implies that the client's communication overhead is of a fixed size, which is entirely independent of the number of CSPs.

5.2 Probabilistic Verification

Next, we analyze the storage and communication costs of our scheme. We define the bilinear pairing to take the form $e : E(\mathbb{F}_{p^m}) \times E(\mathbb{F}_{p^{km}}) \to \mathbb{F}_{p^{km}}^*$ (the definition given here is from [17], [18]), where $p$ is a prime, $m$ is a positive integer, and $k$ is the embedding degree (or security multiplier). In this case, we utilize an asymmetric pairing $e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ to replace the symmetric pairing in the original schemes. From Table 3, it is easy to find that the clients' computation overheads are entirely independent of the number of CSPs. Further, our scheme has better performance than the non-cooperative approach, because the total computation overhead decreases by $3(c-1)$ bilinear map operations, where $c$ is the number of clouds in a multi-cloud. The reason is that, before the responses are sent to the verifier from the $c$ clouds, the organizer has already aggregated these responses into one response by using the aggregation algorithm, so the verifier only needs to verify this response once to obtain the final result.

TABLE 4
Comparison of communication overheads between our CPDP scheme and the non-cooperative (trivial) scheme.
                  Commitment   Challenge1   Challenge2   Response1              Response2
CPDP scheme       l_2          2t*l_0       2t*l_0       s*l_0 + 2l_1 + l_T     s*l_0 + l_1 + l_T
Trivial scheme    l_2          2t*l_0       --           c(s*l_0 + l_1 + l_T)   --
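The storage and communication figures discussed in this subsection follow from simple arithmetic over the element lengths. A minimal sketch, assuming the lengths $l_0 = 2\kappa$, $l_1 = 4\kappa$, and $l_T = 24\kappa$ bits stated in the text (the variable names are ours):

```python
# Reproduces the storage/communication figures of Section 5.1 from the
# element lengths implied by kappa = 80 bits. All sizes in bytes.
kappa = 80
l0 = 2 * kappa // 8                   # 20 B per Z_p integer
l1 = 4 * kappa // 8                   # 40 B per G_1 element
lT = 24 * kappa // 8                  # 240 B per G_T element

n, s = 1000, 50                       # blocks and sectors of a 1 MB file
file_and_tags = n * s * l0 + n * l1   # sz(F) plus one tag per block
index_table   = n * l0                # index table chi
overhead_rate = l1 / (s * l0)         # lambda = l1 / (s * l0)

t = 460                               # challenged blocks (example value)
challenge = 2 * t * l0                # one (index, coefficient) pair per block
response1 = s * l0 + 2 * l1 + lT      # constant size, independent of |F| and c

assert file_and_tags == 1_040_000     # 1.04 MB for a 1 MB file
assert index_table == 20_000          # 20 KB
assert response1 == 1320              # ~1.3 KB
print(overhead_rate)                  # 0.04
```

Note how the response size depends only on $s$ and the group element lengths, which is the constant-size property claimed for the HVR.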

We recall the probabilistic verification of the common PDP scheme (which involves only one CSP), in which the verification process achieves the detection of CSP misbehavior in a random sampling mode in order to reduce the workload on the server. The detection probability $P$ of disrupted blocks is an important parameter to guarantee that these blocks can be detected in time. Assume the CSP modifies $e$ blocks out of the $n$-block file; then the probability of disrupted blocks is $\rho_b = e/n$. Let $t$ be the number of queried blocks for a challenge in the verification protocol. We have the detection probability$^2$ $P(\rho_b, t) \geq 1 - (1 - \rho_b)^t$, where $P(\rho_b, t)$ denotes that the probability $P$ is a function over $\rho_b$ and $t$. Hence, the number of queried blocks is $t \approx \frac{\log(1-P)}{\log(1-\rho_b)}$ for a sufficiently large $n$.$^3$ This means that the number of queried blocks $t$ is directly proportional to the total number of file blocks $n$ for constant $P$ and $e$.

Therefore, for a uniform random verification in a PDP scheme with the fragment structure, given a file with $sz = n \cdot s$ sectors and the probability $\rho$ of sector corruption, the detection probability of the verification protocol satisfies $P \geq 1 - (1 - \rho)^{sz \cdot w}$, where $w$ denotes the sampling probability in the verification protocol. We can obtain this result as follows: because $1 - (1-\rho)^s$ is the probability of block corruption with $s$ sectors in the common PDP scheme, the verifier can detect block errors with probability $P \geq 1 - ((1-\rho)^s)^{n \cdot w} = 1 - (1-\rho)^{sz \cdot w}$ for a challenge with $t = n \cdot w$ index-coefficient pairs. In the same way, given a multi-cloud $\mathcal{P} = \{P_k\}_{k \in [1,c]}$, the detection probability of the CPDP scheme satisfies
$$P\big(sz, \{\rho_k, r_k\}_{P_k \in \mathcal{P}}, w\big) \geq 1 - \prod_{P_k \in \mathcal{P}} \big((1 - \rho_k)^s\big)^{n \cdot r_k \cdot w} = 1 - \prod_{P_k \in \mathcal{P}} (1 - \rho_k)^{sz \cdot r_k \cdot w},$$
where $r_k$ denotes the proportion of data blocks in the $k$-th CSP, $\rho_k$ denotes the probability of file corruption in the $k$-th CSP, and $n \cdot r_k \cdot w$ denotes the possible number of blocks queried by the verifier in the $k$-th CSP.

Without loss of generality, let the security parameter $\kappa$ be 80 bits; then we need the elliptic curve domain parameters over $\mathbb{F}_p$ with $|p| = 160$ bits and $m = 1$ in our experiments. This means that the length of an integer is $l_0 = 2\kappa$ in $\mathbb{Z}_p$. Similarly, we have $l_1 = 4\kappa$ in $\mathbb{G}_1$, $l_2 = 24\kappa$ in $\mathbb{G}_2$, and $l_T = 24\kappa$ in $\mathbb{G}_T$ for the embedding degree $k = 6$. The storage and communication costs of our scheme are shown in Table 4. The storage overhead of a file with $sz(F) = 1$ M-bytes is $sz(F, \sigma) = n \cdot s \cdot l_0 + n \cdot l_1 = 1.04$ M-bytes for $n = 10^3$ and $s = 50$. The storage overhead of its index table $\chi$ is $n \cdot l_0 = 20$ K-bytes. We define the overhead rate as $\lambda = \frac{sz(F, \sigma)}{sz(F)} - 1 = \frac{l_1}{s \cdot l_0}$, and it should be kept as low as possible in order to minimize the storage burden on cloud storage providers. It is obvious that a higher $s$ means much lower extra storage.

2. Exactly, we have $P = 1 - \prod_{i=0}^{t-1}\big(1 - \frac{e}{n-i}\big)$. Since $\frac{e}{n-i} \geq \frac{e}{n}$ for $i \in [0, t-1]$, we have $P = 1 - \prod_{i=0}^{t-1}\big(1 - \frac{e}{n-i}\big) \geq 1 - \prod_{i=0}^{t-1}\big(1 - \frac{e}{n}\big) = 1 - (1 - \rho_b)^t$.
3. In terms of $(1 - \rho_b)^t \approx 1 - P$, we have $t \approx \frac{\log(1-P)}{\log(1-\rho_b)}$.
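The sampling analysis above can be sketched directly: the with-replacement bound, the exact without-replacement probability from footnote 2, and the challenge size from footnote 3. Parameter names follow the text.

```python
# Detection probability and challenge size for probabilistic verification.
import math

def detect_bound(e: int, n: int, t: int) -> float:
    """Lower bound P >= 1 - (1 - e/n)^t."""
    return 1 - (1 - e / n) ** t

def detect_exact(e: int, n: int, t: int) -> float:
    """Exact probability when sampling t blocks without replacement."""
    prob_miss = 1.0
    for i in range(t):
        prob_miss *= 1 - e / (n - i)
    return 1 - prob_miss

def blocks_needed(P: float, e: int, n: int) -> int:
    """t ~ log(1-P) / log(1-e/n), rounded up."""
    return math.ceil(math.log(1 - P) / math.log(1 - e / n))

n, e = 10_000, 100                     # 1% of blocks disrupted
t = blocks_needed(0.99, e, n)          # challenge size for 99% detection
assert detect_exact(e, n, t) >= detect_bound(e, n, t) >= 0.99
print(t)                               # 459
```

For a fixed corruption fraction, the required challenge size is independent of the file size, which is the observation the fragment structure of CPDP later exploits.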

TABLE 5
The influence of $s$ and $t$ (each cell shows $s/t$) under different sector corruption probabilities $\{\rho_k\}$ and different detection probabilities $P$.

{rho_k}                        {r_k}            P=0.8    0.85     0.9      0.95      0.99      0.999
{0.1, 0.2, 0.01}               {0.5,0.3,0.2}    3/4      3/5      3/6      3/8       4/10      5/11
{0.01, 0.02, 0.001}            {0.5,0.3,0.2}    7/20     8/21     10/20    11/29     13/31     16/38
{0.001, 0.002, 0.0001}         {0.5,0.3,0.2}    23/62    26/65    28/73    31/86     39/105    48/128
{0.0001, 0.0002, 0.00001}      {0.5,0.3,0.2}    71/202   79/214   87/236   100/267   119/345   146/433

Furthermore, we observe the ratio of queried blocks to the total number of file blocks, $w = t/n$, under different detection probabilities. Based on the above analysis, it is easy to find that this ratio satisfies
$$w \approx \frac{\log(1-P)}{n \cdot \log(1-\rho_b)}.$$

5.3 Parameter Optimization

When this probability $\rho_b$ is constant, the verifier can detect server misbehavior with a certain probability $P$ by asking for proofs of $t = \frac{\log(1-P)}{\log(1-\rho_b)}$ blocks for PDP, or $t = \frac{\log(1-P)}{s \cdot \log(1-\rho)}$ blocks for CPDP, where $t = n \cdot w$. Note that the value of $t$ is dependent on the total number of file blocks $n$ [2], because $\rho_b = e/n$ decreases, and thus $\log(1-\rho_b) \to 0^-$, as $n$ grows for a constant number of disrupted blocks $e$.

In the fragment structure, the number of sectors per block, $s$, is an important parameter that affects the performance of storage services and audit services. Hence, we propose an optimization algorithm for the value of $s$ in this section. Our results show that the optimal value can not only minimize the computation and communication overheads, but also reduce the size of the extra storage, which is required to store the verification tags in the CSPs.

Assume $\rho$ denotes the probability of sector corruption. In the fragment structure, the choice of $s$ is extremely important for improving the performance of the CPDP scheme. Given the detection probability $P$ and the probability of sector corruption $\rho$ for multiple clouds $\mathcal{P} = \{P_k\}$, the optimal value of $s$ can be computed by
$$\min_{s \in \mathbb{N}} \Big\{ a \cdot \frac{\log(1-P)}{s \cdot \log(1-\rho)} + b \cdot s + c \Big\},$$
where $a \cdot t + b \cdot s + c$ denotes the computational cost of the verification protocol in the PDP scheme, $a, b, c \in \mathbb{R}$, and $c$ is a constant. This conclusion can be obtained from the following process: let $sz = n \cdot s = sz(F)/l_0$. According to the above-mentioned results, the sampling probability satisfies $w \geq \frac{\log(1-P)}{sz \cdot \log(1-\rho)} = \frac{\log(1-P)}{n \cdot s \cdot \log(1-\rho)}$. In order to minimize the computational cost, we have
$$\min_{s \in \mathbb{N}} \{ a \cdot t + b \cdot s + c \} = \min_{s \in \mathbb{N}} \{ a \cdot n \cdot w + b \cdot s + c \} \geq \min_{s \in \mathbb{N}} \Big\{ a \cdot \frac{\log(1-P)}{s \cdot \log(1-\rho)} + b \cdot s + c \Big\},$$
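The minimization above is a one-dimensional search over integer $s$. A sketch, using the simplified verification cost $t + 3s + 9$ given later in Section 5.3 (so $a = 1$, $b = 3$, $c = 9$, which are our readings of the constants) and the three-CSP corruption mixture used in the examples:

```python
# Scan for the optimal number of sectors per block s, minimizing
# a*log(1-P)/(s*log_mix) + b*s + c. Cost coefficients follow the
# simplified cost t + 3s + 9 quoted in the text.
import math

P = 0.99
rho = [0.01, 0.02, 0.001]             # sector corruption probability per CSP
r   = [0.5, 0.3, 0.2]                 # proportion of blocks per CSP
log_mix = sum(ri * math.log(1 - pi) for ri, pi in zip(r, rho))

def cost(s: int, a: float = 1, b: float = 3, c: float = 9) -> float:
    t = math.log(1 - P) / (s * log_mix)   # expected challenge size
    return a * t + b * s + c

s_opt = min(range(1, 201), key=cost)
assert cost(1) > cost(s_opt) < cost(200)  # interior minimum exists
print(s_opt, round(cost(s_opt), 2))
```

Because the first term is decreasing in $s$ and the second is increasing, the scan always finds an interior minimum, matching the monotonicity argument in the text; the exact $s^*$ shifts with $P$ and $\rho$ exactly as Table 5 shows.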

Fig. 4. The relationship between computational cost and the number of sectors in each block.

Another advantage of probabilistic verification based on random sampling is that it is easy to identify the tampered-with or forged data blocks or tags. The identification procedure is obvious: when the verification fails, we can choose a partial set of the challenge indexes as a new challenge set and continue to execute the verification protocol. This search process can be repeated until the bad block is found. The complexity of such a search process is $O(\log n)$.
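The $O(\log n)$ search just described is a bisection over challenge ranges. A minimal sketch, in which the per-range verification is mocked to fail exactly when the range contains the (hidden) corrupted block:

```python
# Bisection search for a corrupted block using repeated verification
# over shrinking challenge ranges; the verify() oracle is a mock.
import math

def find_bad_block(n: int, bad: int) -> tuple[int, int]:
    def verify(lo: int, hi: int) -> bool:
        """Mock challenge over [lo, hi): passes iff the range is clean."""
        return not (lo <= bad < hi)

    lo, hi, rounds = 0, n, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        rounds += 1                    # one protocol run per round
        if verify(lo, mid):
            lo = mid                   # left half clean -> fault on the right
        else:
            hi = mid                   # fault in the left half
    return lo, rounds

n, bad = 1000, 417
found, rounds = find_bad_block(n, bad)
assert found == bad and rounds <= math.ceil(math.log2(n))
print(found, rounds)
```

Each round is a full verification run over half the remaining indexes, so localizing one bad block among $n$ costs at most $\lceil \log_2 n \rceil$ protocol executions.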

where, for a multi-cloud, $\log(1-\rho)$ is replaced by $\sum_{P_k \in \mathcal{P}} r_k \cdot \log(1-\rho_k)$, $r_k$ denotes the proportion of data blocks in the $k$-th CSP, and $\rho_k$ denotes the probability of file corruption in the $k$-th CSP. Since the first term is a monotone decreasing function of $s$ and $b \cdot s$ is a monotone increasing function for $s > 0$, there exists an optimal value of $s$ in the above equation. From this conclusion, the optimal value of $s$ is unrelated to any particular file if the probability $\rho$ is a constant value. For instance, we assume a multi-cloud storage involves three CSPs $\mathcal{P} = \{P_1, P_2, P_3\}$ and the probability of sector corruption is a constant value $\{\rho_1, \rho_2, \rho_3\} = \{0.01, 0.02, 0.001\}$. We set the detection probability $P$ in the range from 0.8 to 1, e.g., $P = \{0.8, 0.85, 0.9, 0.95, 0.99, 0.999\}$. For a file, the

Fig. 5. Applying the CPDP scheme in the Hadoop distributed file system (HDFS).

proportion of data blocks is 50%, 30%, and 20% in the three CSPs, respectively; that is, $r_1 = 0.5$, $r_2 = 0.3$, and $r_3 = 0.2$. In terms of Table 3, the computational cost of the CSPs can be simplified to $t + 3s + 9$. Then, we can observe the computational cost under different $s$ and $P$ in Figure 4. When $s$ is less than the optimal value, the computational cost decreases evidently with the increase of $s$, and it rises again when $s$ exceeds the optimal value.

TABLE 6
The influence of the parameters $sz \cdot w$, $s$, and $t$ under different detection probabilities $P$ ($\{\rho_1, \rho_2, \rho_3\} = \{0.01, 0.02, 0.001\}$, $\{r_1, r_2, r_3\} = \{0.5, 0.3, 0.2\}$).
P         0.8       0.85      0.9       0.95      0.99      0.999
sz*w      142.60    168.09    204.02    265.43    408.04    612.06
s         7         8         10        11        13        16
t         20        21        20        29        31        38
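A short sketch that recomputes the $sz \cdot w$ row of Table 6 from the sampling bound of Section 5.2; interpreting that row as the expected number of queried sectors $sz \cdot w = \log(1-P)/\sum_k r_k \log(1-\rho_k)$ is our assumption.

```python
# Recompute the sz*w row of Table 6 from the multi-cloud sampling bound,
# for the parameter set in the table caption.
import math

rho = [0.01, 0.02, 0.001]             # sector corruption per CSP
r   = [0.5, 0.3, 0.2]                 # proportion of blocks per CSP
log_mix = sum(ri * math.log(1 - pi) for ri, pi in zip(r, rho))

def queried_sectors(P: float) -> float:
    return math.log(1 - P) / log_mix

for P, expected in [(0.8, 142.60), (0.9, 204.02), (0.999, 612.06)]:
    assert abs(queried_sectors(P) - expected) < 0.05
print(round(queried_sectors(0.99), 2))   # 408.04
```

The $s$ and $t$ rows then follow by splitting $sz \cdot w \approx s \cdot t$ at the optimum found in the parameter-optimization step.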

5.4 CPDP for Integrity Audit Services

More precisely, we show the influence of the parameters $sz \cdot w$, $s$, and $t$ under different detection probabilities in Table 6. It is easy to see that the computational cost rises with the increase of $P$. Moreover, we can fix the sampling number of a challenge with the following conclusion: given the detection probability $P$, the probability of sector corruption $\rho$, and the number of sectors in each block $s$, the sampling number of the verification protocol is a constant $t = \big\lceil \frac{\log(1-P)}{s \cdot \log(1-\rho)} \big\rceil$ for different files.

Finally, we observe the change of $s$ under different $\rho$ and $P$. The experimental results are shown in Table 5. It is obvious that the optimal value of $s$ rises with the increase of $P$ and with the decrease of $\rho$. We choose the optimal value of $s$ on the basis of practical settings and system requirements. For the NTFS format, we suggest that the value of $s$ be 200 and the size of a block be 4 K-bytes, which is the same as the default cluster size in NTFS when the file size is less than 16 TB. In this case, the value of $s$ ensures that the extra storage does not exceed 1% on the storage servers.

Based on our CPDP scheme, we introduce an audit system architecture for outsourced data in multiple clouds by replacing the TTP with a third-party auditor (TPA) in Figure 1. This architecture can be constructed on a virtualization infrastructure of cloud-based storage services [1]. In Figure 5, we show an example of applying our CPDP scheme in the Hadoop distributed file system (HDFS)$^4$, which is a distributed, scalable, and portable file system [19]. The HDFS architecture is composed of a NameNode and DataNodes, where the NameNode maps a file name to a set of indexes of blocks and the DataNodes actually store the data blocks. To support our CPDP scheme, the index-hash hierarchy and the metadata of the NameNode should be integrated together to provide an enquiry service for the hash value $\xi_i^{(3)}$ or the index-hash record $\chi_i$. Based on the hash value, the clients can implement the verification protocol via CPDP services. Hence, it is easy to replace the checksum methods with the CPDP scheme for anomaly detection in current HDFS.

To validate the effectiveness and efficiency of our proposed approach for audit services, we have implemented a prototype of an audit system. We simulated the audit service and the storage service by using two local IBM servers with two Intel Core 2 processors at 2.16 GHz and 500 MB of RAM, running Windows Server 2003. These servers were connected via 250 MB/s of network bandwidth. Using the GMP and PBC libraries, we have implemented a cryptographic library upon which our scheme can be constructed. This C library contains approximately 5,200 lines of code and has been tested on both Windows and Linux platforms. The elliptic curve utilized in the experiments is an MNT curve, with a base field size of 160 bits and embedding degree 6. The security level is chosen to be 80 bits, which means $|p| = 160$.
4. Hadoop can enable applications to work with thousands of nodes and petabytes of data, and it has been adopted by current mainstream cloud platforms from Apache, Google, Yahoo, Amazon, IBM, and Sun.


Fig. 6. Experimental results under different file sizes, sampling ratios, and sector numbers.

Firstly, we quantify the performance of our audit scheme under different parameters, such as the file size $sz$, the sampling ratio $w$, the sector number per block $s$, and so on. Our analysis shows that the value of $s$ should grow with the increase of $sz$ in order to reduce the computation and communication costs. Thus, our experiments were carried out as follows: the stored files were chosen from 10 KB to 10 MB; the sector numbers were changed from 20 to 250 in terms of the file sizes; and the sampling ratios were changed from 10% to 50%. The experimental results are shown on the left side of Figure 6. These results indicate that the computation and communication costs (including I/O costs) grow with the increase of the file size and the sampling ratio.

Next, we compare the performance of each activity in our verification protocol. We have shown the theoretical results in Table 4: the overheads of commitment and challenge resemble one another, and the overheads of response and verification resemble one another as well. To validate the theoretical results, we changed the sampling ratio from 10% to 50% for a 10 MB file and 250 sectors per block in a multi-cloud $\mathcal{P} = \{P_1, P_2, P_3\}$, in which the proportions of data blocks are 50%, 30%, and 20% in the three CSPs, respectively. On the right side of Figure 6, our experimental results show that the computation and communication costs of commitment and challenge change only slightly with the sampling ratio, but those of response and verification grow with the increase of the sampling ratio. Here, the challenge and response can each be divided into two sub-processes: Challenge1 and Challenge2, as well as Response1 and Response2, respectively. Furthermore, the proportions of data blocks in each CSP have a greater influence on the computation costs of the challenge and response processes. In summary, our scheme has better performance than the non-cooperative approach.
CONCLUSIONS

In this paper, we presented the construction of an efficient PDP scheme for distributed cloud storage. Based on homomorphic verifiable responses and a hash index hierarchy, we proposed a cooperative PDP scheme to support dynamic scalability on multiple storage servers. We also showed that our scheme provides all the security properties required by a zero-knowledge interactive proof system, so that it can resist various attacks even if it is deployed as a public audit service in clouds. Furthermore, we optimized the probabilistic query and periodic verification to improve the audit performance. Our experiments clearly demonstrated that our approach introduces only a small amount of computation and communication overhead. Therefore, our solution can be treated as a new candidate for data integrity verification in outsourced data storage systems.

As part of future work, we will extend our work to explore more effective CPDP constructions. First, from our experiments we found that the performance of the CPDP scheme, especially for large files, is affected by the bilinear mapping operations due to their high complexity. To solve this problem, RSA-based constructions may be a better choice, but this is still a challenging task because the existing RSA-based schemes have too many restrictions on performance and security [2]. Next, from a practical point of view, we still need to address some issues about integrating our CPDP scheme smoothly with existing systems; for example, how to match the index-hash hierarchy with HDFS's two-layer name space, how to match the index structure with the cluster-network model, and how to dynamically update the CPDP parameters according to HDFS-specific requirements. Finally, it is still a challenging problem to generate tags whose length is irrelevant to the size of the data blocks. We will explore such an issue to provide support for variable-length block verification.

ACKNOWLEDGMENTS

The work of Y. Zhu and M. Yu was supported by the National Natural Science Foundation of China (Project No. 61170264 and No. 10990011). The work of Gail-J.

Ahn and Hongxin Hu was partially supported by grants from the US National Science Foundation (NSF-IIS-0900970 and NSF-CNS-0831360) and the Department of Energy (DE-SC0004308).

REFERENCES

[1] B. Sotomayor, R. S. Montero, I. M. Llorente, and I. T. Foster, "Virtual infrastructure management in private and hybrid clouds," IEEE Internet Computing, vol. 13, no. 5, pp. 14-22, 2009.
[2] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner, Z. N. J. Peterson, and D. X. Song, "Provable data possession at untrusted stores," in ACM Conference on Computer and Communications Security, P. Ning, S. D. C. di Vimercati, and P. F. Syverson, Eds. ACM, 2007, pp. 598-609.
[3] A. Juels and B. S. Kaliski Jr., "PORs: proofs of retrievability for large files," in ACM Conference on Computer and Communications Security, P. Ning, S. D. C. di Vimercati, and P. F. Syverson, Eds. ACM, 2007, pp. 584-597.
[4] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and efficient provable data possession," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks (SecureComm), 2008, pp. 1-10.
[5] C. C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia, "Dynamic provable data possession," in ACM Conference on Computer and Communications Security, E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds. ACM, 2009, pp. 213-222.
[6] H. Shacham and B. Waters, "Compact proofs of retrievability," in ASIACRYPT, ser. Lecture Notes in Computer Science, J. Pieprzyk, Ed., vol. 5350. Springer, 2008, pp. 90-107.
[7] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, "Enabling public verifiability and data dynamics for storage security in cloud computing," in ESORICS, ser. Lecture Notes in Computer Science, M. Backes and P. Ning, Eds., vol. 5789. Springer, 2009, pp. 355-370.
[8] Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. S. Yau, "Dynamic audit services for integrity verification of outsourced storages in clouds," in SAC, W. C. Chu, W. E. Wong, M. J. Palakal, and C.-C. Hung, Eds. ACM, 2011, pp. 1550-1557.
[9] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: a high-availability and integrity layer for cloud storage," in ACM Conference on Computer and Communications Security, E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds. ACM, 2009, pp. 187-198.
[10] Y. Dodis, S. P. Vadhan, and D. Wichs, "Proofs of retrievability via hardness amplification," in TCC, ser. Lecture Notes in Computer Science, O. Reingold, Ed., vol. 5444. Springer, 2009, pp. 109-127.
[11] L. Fortnow, J. Rompel, and M. Sipser, "On the power of multi-prover interactive protocols," in Theoretical Computer Science, 1988, pp. 156-161.
[12] Y. Zhu, H. Hu, G.-J. Ahn, Y. Han, and S. Chen, "Collaborative integrity verification in hybrid clouds," in Proceedings of the 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), Orlando, Florida, USA, October 15-18, 2011, pp. 197-206.
[13] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the clouds: A Berkeley view of cloud computing," EECS Department, University of California, Berkeley, Tech. Rep., Feb. 2009.
[14] D. Boneh and M. Franklin, "Identity-based encryption from the Weil pairing," in Advances in Cryptology (CRYPTO 2001), ser. Lecture Notes in Computer Science, vol. 2139, 2001, pp. 213-229.
[15] O. Goldreich, Foundations of Cryptography: Basic Tools. Cambridge University Press, 2001.
[16] P. S. L. M. Barreto, S. D. Galbraith, C. O hEigeartaigh, and M. Scott, "Efficient pairing computation on supersingular abelian varieties," Designs, Codes and Cryptography, vol. 42, no. 3, pp. 239-271, 2007.
[17] J.-L. Beuchat, N. Brisebarre, J. Detrey, and E. Okamoto, "Arithmetic operators for pairing-based cryptography," in CHES, ser. Lecture Notes in Computer Science, P. Paillier and I. Verbauwhede, Eds., vol. 4727. Springer, 2007, pp. 239-255.
[18] H. Hu, L. Hu, and D. Feng, "On a class of pseudorandom sequences from elliptic curves over finite fields," IEEE Transactions on Information Theory, vol. 53, no. 7, pp. 2598-2605, 2007.
[19] A. Bialecki, M. Cafarella, D. Cutting, and O. O'Malley, "Hadoop: A framework for running applications on large clusters built of commodity hardware," Tech. Rep., 2005. [Online]. Available: http://lucene.apache.org/hadoop/
[20] E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds., Proceedings of the 2009 ACM Conference on Computer and Communications Security, CCS 2009, Chicago, Illinois, USA, November 9-13, 2009. ACM, 2009.

Yan Zhu received the Ph.D. degree in computer science from Harbin Engineering University, China, in 2005. He has been an associate professor of computer science in the Institute of Computer Science and Technology at Peking University since 2007. He worked at the Department of Computer Science and Engineering, Arizona State University, as a visiting associate professor from 2008 to 2009. His research interests include cryptography and network security.

Hongxin Hu is currently working toward the Ph.D. degree in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, Arizona State University. He is also a member of the Security Engineering for Future Computing Laboratory, Arizona State University. His current research interests include access control models and mechanisms, security and privacy in social networks, security in distributed and cloud computing, network and system security, and secure software engineering.

Gail-Joon Ahn is an Associate Professor in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, and the Director of the Security Engineering for Future Computing Laboratory, Arizona State University. His research interests include information and systems security, vulnerability and risk management, access control, and security architecture for distributed systems; his research has been supported by the U.S. National Science Foundation, National Security Agency, U.S. Department of Defense, U.S. Department of Energy, Bank of America, Hewlett-Packard, Microsoft, and the Robert Wood Johnson Foundation. Dr. Ahn is a recipient of the U.S. Department of Energy CAREER Award and the Educator of the Year Award from the Federal Information Systems Security Educators' Association. He was an Associate Professor at the College of Computing and Informatics, and the Founding Director of the Center for Digital Identity and Cyber Defense Research and the Laboratory of Information Integration, Security, and Privacy, University of North Carolina at Charlotte. He received the Ph.D. degree in information technology from George Mason University, Fairfax, VA, in 2000.

Mengyang Yu received the B.S. degree from the School of Mathematical Sciences, Peking University, in 2010. He is currently an M.S. candidate at Peking University. His research interests include cryptography and computer security.
