Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com
13 Years of Experience Automated Services 24/7 Help Desk Support Experience & Expertise Developers Advanced Technologies & Tools Legitimate Member of all Journals Having 1,50,000 Successive records in all Languages More than 12 Branches in Tamilnadu, Kerala & Karnataka. Ticketing & Appointment Systems. Individual Care for every Student. Around 250 Developers & 20 Researchers
227-230 Church Road, Anna Nagar, Madurai 625020. 0452-4390702, 4392702, + 91-9944793398. info@elysiumtechnologies.com, elysiumtechnologies@gmail.com
S.P.Towers, No.81 Valluvar Kottam High Road, Nungambakkam, Chennai - 600034. 044-42072702, +91-9600354638, chennai@elysiumtechnologies.com
15, III Floor, SI Towers, Melapudur main Road, Trichy 620001. 0431-4002234, + 91-9790464324. trichy@elysiumtechnologies.com
577/4, DB Road, RS Puram, Opp to KFC, Coimbatore 641002 0422- 4377758, +91-9677751577. coimbatore@elysiumtechnologies.com
1st Floor, A.R.IT Park, Rasi Color Scan Building, Ramanathapuram - 623501. 04567-223225, +919677704922. ramnad@elysiumtechnologies.com
Plot No: 4, C Colony, P&T Extension, Perumal puram, Tirunelveli 627007. 0462-2532104, +919677733255, tirunelveli@elysiumtechnologies.com
74, 2nd floor, K.V.K Complex,Upstairs Krishna Sweets, Mettur Road, Opp. Bus stand, Erode-638 011. 0424-4030055, +919677748477 erode@elysiumtechnologies.com
No: 88, First Floor, S.V.Patel Salai, Pondicherry 605 001. 0413 4200640 +91-9677704822 pondy@elysiumtechnologies.com
TNHB A-Block, D.no.10, Opp: Hotel Ganesh Near Busstand. Salem 636007, 0427-4042220, +91-9894444716. salem@elysiumtechnologies.com
Abstract: Broadband wireless access (BWA) networks, such as LTE and WiMAX, are inherently lossy due to wireless medium unreliability. Although the Hybrid Automatic Repeat reQuest (HARQ) error-control method recovers from packet loss, it has low transmission efficiency and is unsuitable for delay-sensitive applications. Alternatively, network coding techniques improve the throughput of wireless networks, but incur significant overhead and ignore network constraints such as Medium Access Control (MAC) layer transmission opportunities and physical (PHY) layer channel conditions. The present study provides analysis of Random Network Coding (RNC) and Systematic Network Coding (SNC) decoding probabilities. Based on the analytical results, SNC is selected for developing an adaptive network coding scheme designated as Frame-by-frame Adaptive Systematic Network Coding (FASNC). According to network constraints per frame, FASNC dynamically utilizes either Modified Systematic Network Coding (M-SNC) or Mixed Generation Coding (MGC). An analytical model is developed for evaluating the mean decoding delay and mean goodput of the proposed FASNC scheme. The results derived using this model agree with those obtained from computer simulations. Simulations show that FASNC results in both lower decoding delay and reduced buffer requirements compared to MRNC and N-in-1 ReTX, while also yielding higher goodput than HARQ, MRNC, and N-in-1 ReTX.
ETPL PDS-002
Covering Points of Interest with Mobile Sensors
Abstract: The coverage of Points of Interest (PoI) is a classical requirement in mobile wireless sensor applications. Optimizing the sensors' self-deployment over a PoI while maintaining the connectivity between the sensors and the base station is thus a fundamental issue. This paper addresses the problem of autonomous deployment of mobile sensors that need to cover a predefined PoI with a connectivity constraint. In our algorithm, each sensor moves toward a PoI but must also maintain connectivity with a subset of its neighboring sensors that are part of the Relative Neighborhood Graph (RNG). The Relative Neighborhood Graph reduction is chosen so that global connectivity can be provided locally. Our deployment scheme minimizes the number of sensors used for connectivity, thus increasing the number of monitoring sensors. Analytical results, simulation results and practical implementation are provided to show the efficiency of our algorithm.
ETPL PDS-003
Detection and Localization of Multiple Spoofing Attackers in Wireless Networks
Abstract: Wireless spoofing attacks are easy to launch and can significantly impact the performance of networks. Although the identity of a node can be verified through cryptographic authentication, conventional security approaches are not always desirable because of their overhead requirements. In this paper, we propose to use spatial information, a physical property associated with each node, hard to falsify, and not reliant on cryptography, as the basis for 1) detecting spoofing attacks; 2) determining the number of attackers when multiple adversaries masquerade as the same node identity; and 3) localizing multiple adversaries. We propose to use the spatial correlation of received signal strength (RSS) inherited from wireless nodes to detect the spoofing attacks. We then formulate the problem of determining the number of attackers as a multiclass detection problem. Cluster-based mechanisms are developed to determine the number of attackers. When the training data are available, we explore using the Support Vector Machines (SVM) method to further improve the accuracy of determining the number of attackers.
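The idea of using the spatial correlation of RSS for detection can be illustrated with a small sketch. It is not the paper's algorithm: it assumes, purely for illustration, that RSS vectors observed for one node identity are split into two clusters with a basic 2-means loop, and that a large distance between the two centroids suggests transmitters at two different locations. The function names, the sample readings, and the `threshold_db` value are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two RSS vectors (one entry per monitor)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def two_means(samples, iterations=20):
    """Split RSS vectors into two clusters and return the two centroids."""
    # Deterministic start: the first sample and the sample farthest from it.
    c0 = samples[0]
    c1 = max(samples, key=lambda s: dist(s, c0))
    for _ in range(iterations):
        g0 = [s for s in samples if dist(s, c0) <= dist(s, c1)]
        g1 = [s for s in samples if dist(s, c0) > dist(s, c1)]
        if not g0 or not g1:
            break  # one cluster is empty, nothing left to refine
        c0, c1 = mean(g0), mean(g1)
    return c0, c1

def looks_spoofed(samples, threshold_db):
    """Flag an identity whose RSS readings form two well-separated clusters."""
    c0, c1 = two_means(samples)
    return dist(c0, c1) > threshold_db

# Hypothetical readings (dBm) at two monitors for the same claimed identity.
normal = [(-50, -60), (-51, -59), (-49, -61), (-50, -60)]
attack = normal + [(-70, -40), (-71, -41), (-69, -39)]
print(looks_spoofed(normal, threshold_db=8.0))
print(looks_spoofed(attack, threshold_db=8.0))
```

In practice the threshold would have to be calibrated against the RSS noise of the deployment, and counting more than two attackers needs a model-selection step that this sketch leaves out.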
Abstract: Utility computing models have long been the focus of academic research, and with the recent success of commercial cloud providers, computation and storage are finally being realized as the fifth utility. Computational economies are often proposed as an efficient means of resource allocation; however, adoption has been limited due to a lack of performance and high overheads. In this paper, we address the performance limitations of existing economic allocation models by defining strategies to reduce the failure and reallocation rate, increase occupancy and thereby increase the obtainable utilization of the system. The high-performance resource utilization strategies presented can be used by market participants without requiring dramatic changes to the allocation protocol. The strategies considered include overbooking, advanced reservation, just-in-time bidding, and using substitute providers for service delivery. The proposed strategies have been implemented in a distributed metascheduler and evaluated with respect to Grid and cloud deployments. Several diverse synthetic workloads have been used to quantify both the performance benefits and economic implications of these strategies.
ETPL PDS-006
Mapping a Jacobi Iterative Solver onto a High-Performance Heterogeneous Computer
Abstract: High-performance heterogeneous computers that employ field programmable gate arrays (FPGAs) as computational elements are known as high-performance reconfigurable computers (HPRCs). For floating-point applications, these FPGA-based processors must satisfy a variety of heuristics and rules of thumb to achieve a speedup compared with their software counterparts. By way of a simple sparse matrix Jacobi iterative solver, this paper illustrates some of the issues associated with mapping floating-
Abstract: We propose an NFA-based algorithm called MIN-MAX to support matching of regular expressions (regexp) composed of Character Classes with Constraint Repetitions (CCR). MIN-MAX is well suited for massive parallel processing architectures, such as FPGAs, yet it is effective on any other computing platform. In MIN-MAX, each active CCR engine (to implement one CCR term) evaluates input characters, updates (MIN, MAX) counters, and asserts control signals, and all the CCR engines implemented in the FPGA run simultaneously. Unlike traditional designs, (MIN, MAX) counters contain dynamically updated lower and upper bounds of possible matching counts, instead of actual matching counts, so that feasible matching lengths are compactly enclosed in the counter value. The counter-based design can support constraint repetitions of n using O(log n) memory bits rather than the O(n) of existing solutions. MIN-MAX can resolve character class ambiguity between adjacent CCR terms and support overlapped matching when matching collisions are absent. We developed a set of heuristic rules to assess the absence of collision for CCR-based regexps, and tested them on Snort and SpamAssassin rule sets. The results show that the vast majority of rules are immune from collisions, so that MIN-MAX can cost-effectively support overlapped matching. As a bonus, the new architecture also supports fast reconfiguration via ordinary memory writes rather than resynthesis of the entire design, which is critical for time-sensitive regexp deployment scenarios.
ETPL PDS-008
Network Traffic Classification Using Correlation Information
Abstract: Traffic classification has wide applications in network management, from security monitoring to quality of service measurements. Recent research tends to apply machine learning techniques to flow statistical feature based classification methods. The nearest neighbor (NN)-based method has exhibited superior classification performance. It also has several important advantages, such as no requirement for a training procedure, no risk of overfitting of parameters, and naturally being able to handle a huge number of classes. However, the performance of the NN classifier can be severely affected if the size of the training data is small. In this paper, we propose a novel nonparametric approach for traffic classification, which can improve the classification performance effectively by incorporating correlated information into the classification process. We analyze the new classification approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic data sets to validate the proposed approach. The results show the traffic classification performance can be improved significantly even under the extremely difficult circumstances of very few training samples.
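To make the idea of adding correlation to a nearest-neighbor classifier concrete, here is a minimal sketch. The abstract does not say how correlated flows are identified or combined, so the sketch assumes that a group of correlated flows is already given and that their individual 1-NN predictions are combined by majority vote. The feature vectors (mean packet size, packet count), the labels, and the function names are hypothetical.

```python
from collections import Counter

def nearest_label(flow, training):
    """1-NN: return the label of the closest training flow (squared Euclidean)."""
    closest = min(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(flow, item[0])),
    )
    return closest[1]

def classify_group(flows, training):
    """Classify a group of correlated flows jointly by majority vote."""
    votes = Counter(nearest_label(f, training) for f in flows)
    return votes.most_common(1)[0][0]

# Hypothetical training set with only one sample per class.
training = [((100.0, 10.0), "web"), ((1500.0, 200.0), "video")]

# Three flows assumed to be correlated (for example, same server and port).
group = [(120.0, 12.0), (900.0, 100.0), (200.0, 20.0)]

print(nearest_label(group[1], training))  # the ambiguous flow on its own
print(classify_group(group, training))    # the same flow judged with its group
```

With so few training samples a single ambiguous flow is easy to mislabel, while the vote across its correlated neighbors pulls it toward the group's dominant class.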
Abstract: Online task scheduling in heterogeneous multicore system-on-a-chip is a challenging problem due to precedence constraints and nonpreemptive task execution in the synergistic processor core. This study first proposes an online heterogeneous dual-core scheduling framework for dynamic workloads with real-time constraints. The general purpose processor core and the synergistic processor core are dedicated to separate schedulers with different scheduling policies, and precedence constraints among tasks are dealt with through interaction between the two schedulers. This framework is also configurable for low priority inversion and high system utilization. We then extend this framework to heterogeneous multicore systems with well-known dispatcher schemas. This paper presents a real case study to show the practicability of the proposed methodology, and presents a series of extensive simulations to obtain comparison studies using different workloads and scheduling algorithms.
ETPL PDS-010
Scalable and Secure Sharing of Personal Health Records in Cloud Computing Using Attribute-Based Encryption
Abstract: Personal health record (PHR) is an emerging patient-centric model of health information exchange, which is often outsourced to be stored at a third party, such as cloud providers. However, there have been wide privacy concerns as personal health information could be exposed to those third party servers and to unauthorized parties. To assure the patients' control over access to their own PHRs, it is a promising method to encrypt the PHRs before outsourcing. Yet, issues such as risks of privacy exposure, scalability in key management, flexible access, and efficient user revocation have remained the most important challenges toward achieving fine-grained, cryptographically enforced data access control. In this paper, we propose a novel patient-centric framework and a suite of mechanisms for data access control to PHRs stored in semitrusted servers. To achieve fine-grained and scalable data access control for PHRs, we leverage attribute-based encryption (ABE) techniques to encrypt each patient's PHR file. Different from previous works in secure data outsourcing, we focus on the multiple data owner scenario, and divide the users in the PHR system into multiple security domains, which greatly reduces the key management complexity for owners and users. A high degree of patient privacy is guaranteed simultaneously by exploiting multiauthority ABE. Our scheme also enables dynamic modification of access policies or file attributes, and supports efficient on-demand user/attribute revocation and break-glass access under emergency scenarios. Extensive analytical and experimental results are presented which show the security, scalability, and efficiency of our proposed scheme.
ETPL PDS-011
Strategies for Energy-Efficient Resource Management of Hybrid Programming Models
Abstract: Many scientific applications are programmed using hybrid programming models that use both message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid
Abstract: Current High Performance Computing (HPC) applications have seen an explosive growth in the size of data in recent years. Many application scientists have initiated efforts to integrate data-intensive computing into computational-intensive HPC facilities, particularly for data analytics. We have observed several scientific applications which must migrate their data from an HPC storage system to a data-intensive one for analytics. There is a gap between the data semantics of HPC storage and data-intensive systems; hence, once migrated, the data must be further refined and reorganized. This reorganization must be performed before existing data-intensive tools such as MapReduce can be used to analyze data. This reorganization requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application, in the form of excessive I/O operations. That is, for every MapReduce phase, a distributed read and write operation on the file system must be performed. Our contribution is to develop a MapReduce-based framework for HPC analytics to eliminate the multiple scans and also reduce the number of data preprocessing MapReduce programs. We also implement a data-centric scheduler to further improve the performance of HPC analytics MapReduce programs by maintaining the data locality. We have added additional expressiveness to the MapReduce language to allow application scientists to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple data preprocessing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. Using our augmented MapReduce system, MapReduce with Access Patterns (MRAP), we have demonstrated up to 33 percent throughput improvement in one real application, and up to 70 percent in an I/O kernel of another application. Our results for scheduling show up to 49 percent improvement for an I/O kernel of a prevalent HPC analysis application.
ETPL PDS-013
Thermal and Energy Management of High-Performance Multicores: Distributed and Self-Calibrating Model-Predictive Controller
Abstract: As a result of technology scaling, single-chip multicore power density increases and its spatial and temporal workload variation leads to temperature hot-spots, which may cause nonuniform ageing and accelerated chip failure. These critical issues can be tackled by closed-loop thermal and reliability management policies. Model predictive controllers (MPC) outperform classic feedback controllers since they are capable of minimizing performance loss while enforcing safe working temperature. Unfortunately, MPC controllers rely on a priori knowledge of thermal models and their complexity grows exponentially with the number of controlled cores. In this paper, we present a scalable, fully distributed, energy-aware thermal management solution for single-chip multicore platforms. The model-predictive controller complexity is drastically reduced by splitting it into a set of simpler interacting controllers, each one allocated to a core in the system. Locally, each node selects the optimal frequency to
Abstract: VPN service providers (VSP) and IP-VPN customers have traditionally maintained service demarcation boundaries between their routing and signaling entities. This has resulted in the VPNs viewing the VSP network as an opaque entity and therefore limiting any meaningful interaction between the VSP and the VPNs. The purpose of this research is to address this issue by enabling a VSP to share its core topology information with the VPNs through a novel topology abstraction (TA) service which is both practical and scalable in the context of managed IP-VPNs. TA service provides tunable visibility of state of the VSP's network leading to better VPN performance. A key challenge of the TA service is to generate TA with relevant network resource information for each VPN in an accurate and fair manner. We develop three decentralized schemes for generating TAs with different performance characteristics. These decentralized schemes achieve improved call performance, fair resource sharing for VPNs, and higher network utilization for the VSP. We validate the idea of the VPN TA service and study the performance of the proposed techniques using various simulation scenarios over several topologies.
ETPL PDS-015
A Secure Payment Scheme with Low Communication and Processing Overhead for Multihop Wireless Networks
Abstract: We propose RACE, a report-based payment scheme for multihop wireless networks to stimulate node cooperation, regulate packet transmission, and enforce fairness. The nodes submit lightweight payment reports (instead of receipts) to the accounting center (AC) and temporarily store undeniable security tokens called Evidences. The reports contain the alleged charges and rewards without security proofs, e.g., signatures. The AC can verify the payment by investigating the consistency of the reports, and clear the payment of the fair reports with almost no processing overhead or cryptographic operations. For cheating reports, the Evidences are requested to identify and evict the cheating nodes that submit incorrect reports. Instead of requesting the Evidences from all the nodes participating in the cheating reports, RACE can identify the cheating nodes by requesting only a few Evidences. Moreover, an Evidence aggregation technique is used to reduce the Evidences' storage area. Our analytical and simulation results demonstrate that RACE requires much less communication and processing overhead than the existing receipt-based schemes with acceptable payment clearance delay and storage area. This is essential for the effective implementation of a payment scheme because it uses micropayment and the overhead cost should be much less than the payment value. Moreover, RACE can secure the payment and precisely identify the cheating nodes without false accusations.
ETPL PDS-016
Analysis of Distance-Based Location Management in Wireless Communication Networks
Abstract: Mobile ad hoc networks (MANETs) have attracted much attention due to their mobility and ease of deployment. However, the wireless and dynamic natures render them more vulnerable to various types of security attacks than the wired networks. The major challenge is to guarantee secure network services. To meet this challenge, certificate revocation is an important integral component to secure network communications. In this paper, we focus on the issue of certificate revocation to isolate attackers from further participating in network activities. For quick and accurate certificate revocation, we propose the Cluster-based Certificate Revocation with Vindication Capability (CCRVC) scheme. In particular, to improve the reliability of the scheme, we recover the warned nodes to take part in the certificate revocation process; to enhance the accuracy, we propose the threshold-based mechanism to assess and vindicate warned nodes as legitimate nodes or not, before recovering them. The performances of our scheme are evaluated by both numerical and simulation analysis. Extensive results demonstrate that the proposed certificate revocation scheme is effective and efficient to guarantee secure communications in mobile ad hoc networks.
ETPL PDS-018
Coloring-Based Inter-WBAN Scheduling for Mobile Wireless Body Area Networks
Abstract: In this study, random incomplete coloring (RIC) with low time-complexity and high spatial reuse is proposed to overcome in-between wireless-body-area-networks (WBAN) interference, which can cause serious throughput degradation and energy waste. Interference-avoidance scheduling of wireless networks can be modeled as a problem of graph coloring. For instance, high spatial-reuse scheduling for a dense sensor network is mapped to high spatial-reuse coloring; fast convergence scheduling for a mobile ad hoc network (MANET) is mapped to low time-complexity coloring. However, for a dense and mobile WBAN, inter-WBAN scheduling (IWS) should simultaneously satisfy both of the following requirements: 1) high spatial-reuse and 2) fast convergence, which are tradeoffs in conventional coloring. By relaxing the coloring rule, the proposed distributed coloring algorithm RIC avoids this tradeoff and satisfies both requirements. Simulation results verify that the proposed coloring algorithm effectively overcomes inter-WBAN interference and invariably supports higher system throughput in various mobile WBAN scenarios compared to conventional colorings.
ETPL PDS-019
Cross-Layer Design of Congestion Control and Power Control in Fast-Fading Wireless Networks
Abstract: We propose a distributed data replenishment mechanism for some distributed peer-to-peer-based storage systems that automates the process of maintaining a sufficient level of data redundancy to ensure the availability of data in the presence of peer departures and failures. The dynamics of peers entering and leaving the network are modeled as a stochastic process. A novel analytical time-backward technique is proposed to bound the expected time for a piece of data to remain in P2P systems. Both theoretical and simulation results are in agreement, indicating that the data replenishment via random linear network coding (RLNC) outperforms other popular strategies. Specifically, we show that the expected time for a piece of data to remain in a P2P system (the longer the better) is exponential in the number of peers used to store the data for the RLNC-based strategy, while it is quadratic for other strategies.
ETPL PDS-021
Distributed k-Core Decomposition
Abstract: Several novel metrics have been proposed in recent literature in order to study the relative importance of nodes in complex networks. Among those, k-coreness has found a number of applications in areas as diverse as sociology, proteinomics, graph visualization, and distributed system analysis and design. This paper proposes new distributed algorithms for the computation of the k-coreness of a network, a process also known as k-core decomposition. This technique 1) allows the decomposition, over a set of connected machines, of very large graphs, when size does not allow storing and processing them on a single host, and 2) enables the runtime computation of k-cores in live distributed systems. Lower bounds on the algorithms' complexity are given, and an exhaustive experimental analysis on real-world data sets is provided.
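A node's coreness is the largest k for which it belongs to a subgraph where every node has at least k neighbors. The sketch below shows one well-known way to compute it using only local information, run here as a synchronous single-process simulation. It is not claimed to be the algorithm of the paper: each node is assumed to start from its degree and to repeatedly lower its estimate to the largest k such that at least k neighbors currently report an estimate of k or more. The function names and the example graph are hypothetical.

```python
def local_update(neighbor_estimates, cap):
    """Largest k <= cap such that at least k neighbors have estimate >= k."""
    for k in range(cap, 0, -1):
        if sum(1 for e in neighbor_estimates if e >= k) >= k:
            return k
    return 0

def distributed_coreness(adj):
    """Simulate rounds of neighbor-to-neighbor estimate exchange.

    adj maps each node to the set of its neighbors (undirected graph).
    Estimates start at the node degree and can only decrease, so the
    loop stops once a full round changes nothing.
    """
    est = {v: len(ns) for v, ns in adj.items()}
    changed = True
    while changed:
        changed = False
        new_est = {}
        for v, ns in adj.items():
            k = local_update([est[u] for u in ns], est[v])
            new_est[v] = k
            if k != est[v]:
                changed = True
        est = new_est
    return est

# Triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(distributed_coreness(graph))
```

In a real deployment each node would run `local_update` whenever a neighbor's message arrives, so no host ever needs the whole graph in memory.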
Abstract: We study the dynamic aspects of the coverage of a mobile sensor network resulting from continuous movement of sensors. As sensors move around, initially uncovered locations may be covered at a later time, and intruders that might never be detected in a stationary sensor network can now be detected by moving sensors. However, this improvement in coverage is achieved at the cost that a location is covered only part of the time, alternating between covered and not covered. We characterize area coverage at specific time instants and during time intervals, as well as the time durations that a location is covered and uncovered. We further consider the time it takes to detect a randomly located intruder and prove that the detection time is exponentially distributed with parameter $2\lambda r v_s$, where $\lambda$ represents the sensor density, $r$ represents the sensor's sensing range, and $v_s$ denotes the average sensor speed. For mobile intruders, we take a game theoretic approach and derive optimal mobility strategies for both sensors and intruders. We prove that the optimal sensor strategy is to choose their directions uniformly at random in $[0, 2\pi)$. The optimal intruder strategy is to remain stationary. This solution represents a mixed strategy which is a Nash equilibrium of the zero-sum game between mobile sensors and intruders.
ETPL PDS-023
Exploiting Ubiquitous Data Collection for Mobile Users in Wireless Sensor Networks
Abstract: We study the ubiquitous data collection for mobile users in wireless sensor networks. People with handheld devices can easily interact with the network and collect data. We propose a novel approach for mobile users to collect the network-wide data. The routing structure of data collection is additively updated with the movement of the mobile user. With this approach, we only perform a limited modification to update the routing structure while the routing performance is bounded and controlled compared to the optimal performance. The proposed protocol is easy to implement. Our analysis shows that the proposed approach is scalable in maintenance overheads, performs efficiently in the routing performance, and provides continuous data delivery during the user movement. We implement the proposed protocol in a prototype system and test its feasibility and applicability by a 49-node testbed. We further conduct extensive simulations to examine the efficiency and scalability of our protocol with varied network settings.
ETPL PDS-024
Fast Channel Zapping with Destination-Oriented Multicast for IP Video Delivery
Abstract: Channel zapping time is a critical quality of experience (QoE) metric for IP-based video delivery systems such as IPTV. An interesting zapping acceleration scheme based on time-shifted subchannels (TSS) was recently proposed, which can ensure a zapping delay bound as well as maintain the picture quality during zapping. However, the behaviors of the TSS-based scheme have not been fully studied yet. Furthermore, the existing TSS-based implementation adopts the traditional IP multicast, which is not scalable for a large-scale distributed system. Corresponding to such issues, this paper makes contributions in two aspects. First, we resort to theoretical analysis to understand the fundamental properties of the TSS-based service model. We show that there exists an optimal subchannel data rate which minimizes the redundant traffic transmitted over subchannels. Moreover, we reveal a start-up effect, where the existing operation pattern in the TSS-based model could violate the zapping delay bound. With a solution proposed to resolve the start-up effect, we rigorously prove that a zapping delay
Abstract: In a Wireless Sensor Network (WSN), intrusion detection is of significant importance in many applications in detecting malicious or unexpected intruder(s). The intruder can be an enemy in a battlefield, or a malicious moving object in the area of interest. With uniform sensor deployment, the detection probability is the same for any point in a WSN. However, some applications may require different degrees of detection probability at different locations. For example, an intrusion detection application may need improved detection probability around important entities. Gaussian-distributed WSNs can provide differentiated detection capabilities at different locations but related work is limited. This paper analyzes the problem of intrusion detection in a Gaussian-distributed WSN by characterizing the detection probability with respect to the application requirements and the network parameters under both single-sensing detection and multiple-sensing detection scenarios. Effects of different network parameters on the detection probability are examined in detail. Furthermore, performance of Gaussian-distributed WSNs is compared with uniformly distributed WSNs. This work allows us to analytically formulate detection probability in a random WSN and provides guidelines in selecting an appropriate deployment strategy and determining critical network parameters.
ETPL PDS-026
IDM: An Indirect Dissemination Mechanism for Spatial Voice Interaction in Networked Virtual Environments
Abstract: One type of Peer-to-Peer (P2P) live streaming has not yet been significantly investigated, namely topologies that provide many-to-many, interactive connectivity. Exemplar applications of such P2P systems include spatial audio services for networked virtual environments (NVEs) and distributed online games. Numerous challenging problems have to be overcome in such dynamic environments, among them providing low delay, resilience to churn, effective load balancing, and rapid convergence. We propose a novel P2P overlay dissemination mechanism, termed IDM, that can satisfy such demanding real-time requirements. Our target application is to provide spatialized voice support in multiplayer NVEs, where each bandwidth constrained peer potentially communicates with all other peers within its area-of-interest (AoI). With IDM each peer maintains a set of partners, termed helpers, which may act as stream forwarders. We prove analytically that the system reachability is maximized when the loads of helpers are balanced proportionally to their network capacities. We then propose a game-theoretic algorithm that balances the loads of the peers in a fully distributed manner. Of practical importance in dynamic systems, we prove that our algorithm converges to an approximately balanced state from any prior state in rapid O(log log n) time, where n is the number of users. We further evaluate our technique with simulations and show that it can achieve near optimal system reachability and satisfy the tight latency constraints of interactive audio under conditions of churn, avatar mobility, and heterogeneous user access network bandwidth.
Abstract: The use of wireless sensor networks (WSNs) for closing the loops between the cyberspace and the physical processes is more attractive and promising for future control systems. For some real-time control applications, controllers need to accurately estimate the process state within rigid delay constraints. In this paper, we propose a novel in-network estimation approach for state estimation with delay constraints in multihop WSNs. For accurately estimating a process state as well as satisfying rigid delay constraints, we address the problem through jointly designing in-network estimation operations and an aggregation scheduling algorithm. Our in-network estimation operation performed at relays not only optimally fuses the estimates obtained from the different sensors but also predicts the upper stream sensors' estimates which cannot be aggregated to the sink before deadlines. Our estimate aggregation scheduling algorithm, which is interference free, is able to aggregate as much estimate information as possible from the network to the sink within delay constraints. We proved the unbiasedness of in-network estimation, and theoretically analyzed the optimality of our approach. Our simulation results corroborate our theoretical results and show that our in-network estimation approach can obtain significant estimation accuracy gain under different network settings.
ETPL PDS-028
IP-Geolocation Mapping for Moderately Connected Internet Regions
Abstract: Most IP-geolocation mapping schemes [14], [16], [17], [18] take a delay-measurement approach, based on the assumption of a strong correlation between networking delay and geographical distance between the targeted client and the landmarks. In this paper, however, we investigate a large region of moderately connected Internet and find that the delay-distance correlation is weak. However, we discover a more probable rule: with high probability the shortest delay comes from the closest distance. Based on this closest-shortest rule, we develop a simple and novel IP-geolocation mapping scheme for moderately connected Internet regions, called GeoGet. In GeoGet, we take a large number of webservers as passive landmarks and map a targeted client to the geolocation of the landmark that has the shortest delay. We further use JavaScript at targeted clients to generate HTTP/Get probing for delay measurement. To control the measurement cost, we adopt a multistep probing method to refine the geolocation of a targeted client, finally to city level. The evaluation results show that when probing about 100 landmarks, GeoGet correctly maps 35.4 percent of clients to city level, which outperforms current schemes such as GeoLim [16] and GeoPing [14] by 270 and 239 percent, respectively, and the median error distance in GeoGet is around 120 km, outperforming GeoLim and GeoPing by 37 and 70 percent, respectively. ETPL PDS-029 Microarchitecture of a Coarse-Grain Out-of-Order Superscalar Processor
Abstract: We explore the design, implementation, and evaluation of a coarse-grain superscalar processor in the context of the microarchitecture of the Control Processor (CP) of the Multilevel Computing Architecture (MLCA), a novel architecture targeted for multimedia multicore systems. The MLCA augments a traditional multicore architecture (called the lower level) with a CP (called the top-level), which automatically extracts parallelism among coarse-grain units of computation (tasks), synchronizes these tasks and schedules them for execution on processors. It does so in a fashion similar to how instruction-level parallelism is extracted by superscalar processors, i.e., using register renaming, Out-of-Order Execution (OoOE) and scheduling. The coarse-grain nature of tasks imposes challenging
Abstract: Time synchronization is an important requirement for many services provided by distributed networks. Many time synchronization protocols have been proposed for terrestrial Wireless Sensor Networks (WSNs). However, none of them can be directly applied to Underwater Sensor Networks (UWSNs). A synchronization algorithm for UWSNs must consider additional factors such as long propagation delays from the use of acoustic communication and sensor node mobility. These unique challenges make the accuracy of synchronization procedures for UWSNs even more critical. Time synchronization solutions specifically designed for UWSNs are needed to satisfy these new requirements. This paper proposes Mobi-Sync, a novel time synchronization scheme for mobile underwater sensor networks. Mobi-Sync distinguishes itself from previous approaches for terrestrial WSNs by considering spatial correlation among the mobility patterns of neighboring UWSN nodes. This enables Mobi-Sync to accurately estimate the long dynamic propagation delays. Simulation results show that Mobi-Sync outperforms existing schemes in both accuracy and energy efficiency. ETPL PDS-031 Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters
Abstract: This paper develops and evaluates search and optimization techniques for autotuning 3D stencil (nearest neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with respect to a search space. Our proposed framework takes a most concise specification of stencil behavior from the user as a single formula, autogenerates tunable code from it, systematically searches for the best configuration and generates the code with optimal parameter configurations for different GPUs. This autotuning approach guarantees adaptive performance for different generations of GPUs while greatly enhancing programmer productivity. Experimental results show that the delivered floating point performance is very close to previous handcrafted work and outperforms other autotuned stencil codes by a large margin. Furthermore, heterogeneous GPU clusters are shown to exhibit the highest performance for dissimilar tuning parameters leveraging proportional partitioning relative to single-GPU performance. ETPL PDS-032 An Iterative Divide-and-Merge-Based Approach for Solving Large-Scale Least Squares Problems
Abstract: Singular value decomposition (SVD) is a popular decomposition method for solving least squares estimation (LSE) problems. However, for large data sets, applying SVD directly on the coefficient matrix is very time consuming and memory demanding in obtaining least squares solutions. In
Abstract: In many applications, the traffic traversing the network has interpacket dependencies due to application-level encoding schemes. For some applications, e.g., multimedia streaming, dropping a single packet may render useless the delivery of a whole sequence. In such environments, the algorithm used to decide which packet to drop in case of buffer overflows must be carefully designed, to avoid goodput degradation. We present a model that captures such interpacket dependencies, and design algorithms for performing packet discard. Traffic consists of an aggregation of multiple streams, each of which consists of a sequence of interdependent packets. We provide two guidelines for designing buffer management algorithms, and demonstrate their effectiveness. We devise an algorithm according to these guidelines and evaluate its performance analytically, using competitive analysis. We also perform a simulation study that shows that the performance of our algorithm is within a small fraction of the performance of the best known offline algorithm. ETPL PDS-034 Design and Performance Evaluation of Overhearing-Aided Data Caching in Wireless Ad Hoc Networks
Abstract: Wireless ad hoc network is a promising networking technology to provide users with Internet access anywhere anytime. To cope with resource constraints of wireless ad hoc networks, data caching is widely used to efficiently reduce data access cost. In this paper, we propose an efficient data caching algorithm which makes use of the overhearing property of wireless communication to improve caching performance. Due to the broadcast nature of wireless links, a packet can be overheard by a node within the transmission range of the transmitter, even if the node is not the intended target. Our proposed algorithm explores the overheard information, including data request and data reply, to optimize cache placement and cache discovery. To the best of our knowledge, this is the first work that considers the overhearing property of wireless communications in data caching. The simulation results show that, compared with one representative algorithm and a naive overhearing algorithm, our proposed algorithm can significantly reduce both message cost and access delay. ETPL PDS-035 Dynamic Optimization of Multiattribute Resource Allocation in Self-Organizing Clouds
Abstract: By leveraging virtual machine (VM) technology which provides performance and fault isolation, cloud resources can be provisioned on demand in a fine grained, multiplexed manner rather than
Abstract: For better road safety and driving experience, content distribution for vehicle users through roadside Access Points (APs) becomes an important and promising complement to 3G and other cellular networks. In this paper, we introduce Cooperative Content Distribution System for Vehicles (CCDSV) which operates upon a network of infrastructure APs to collaboratively distribute contents to moving vehicles. CCDSV solves several important issues in a practical system, like the robustness to mobility prediction errors, limited resources of APs and the shared content distribution. Our system organizes the cooperative APs into a novel structure, namely, the contact map which is based on the vehicular contact patterns observed by APs. To fully utilize the wireless bandwidth provided by APs, we propose a representative-based prefetching mechanism, in which a set of representative APs are carefully selected and then share their prefetched data with others. The selection process explicitly takes into account the AP's storage capacity, storage status, inter-AP bandwidth and traffic loads on the backhaul links. We apply network coding in CCDSV to augment the distribution of shared contents. The selection of shared contents to be prefetched on an AP is based on the storage status of neighboring APs in the contact map in order to increase the information utility of each prefetched data piece. Through extensive simulations, CCDSV proves its effectiveness in vehicular content distribution under various scenarios. ETPL PDS-037 Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems
Abstract: Most existing global-snapshot algorithms in distributed systems use control messages to coordinate the construction of a global snapshot among all processes. Since these algorithms typically assume the underlying logical overlay topology is fully connected, the number of control messages exchanged among all processes is proportional to the square of the number of processes, resulting in a higher possibility of network congestion. Hence, such algorithms are neither efficient nor scalable for a large-scale distributed system composed of a huge number of processes. Recently, some efforts have been presented to significantly reduce the number of control messages, but doing so incurs higher response time instead. In this paper, we propose an efficient global-snapshot algorithm able to let every process finish its local snapshot in a given number of rounds. Particularly, such an algorithm allows a tradeoff between the response time and the message complexity. Moreover, our global-snapshot algorithm is symmetrical in the sense that identical steps are executed by every process. This means that our algorithm
Abstract: Transactional Memory (TM) systems must track memory accesses made by concurrent transactions in order to detect conflicts. Many TM implementations use signatures for this purpose, which summarize reads and writes in fixed-size bit registers at the cost of false positives (detection of nonexisting conflicts). Signatures are commonly implemented as two separate same-sized Bloom filters, one for reads and the other for writes. In contrast, transactions frequently exhibit read and write sets of uneven cardinality. This mismatch between data sets and filter storage introduces inefficiencies in the use of signatures that have some impact on performance. This paper presents different signature designs as alternatives to the common scheme to deal with the asymmetry in transactional data sets in an effective way. Basically, we analyze two classes of new signatures, called multiset and reconfigurable asymmetric signatures. The first class uses only one Bloom filter to track both read and write sets, while the second class uses Bloom filters of configurable size for reads and writes. The main focus of this paper is a thorough study of these alternative signature designs, including a statistical analysis of false positives and an experimental evaluation, providing performance results and hardware area, time and energy requirements. ETPL PDS-039 Improve Efficiency and Reliability in Single-Hop WSNs with Transmit-Only Nodes
Abstract: Wireless Sensor Networks (WSNs) will play a significant role at the edge of the future Internet of Things. In particular, WSNs with transmit-only nodes are attracting more attention due to their advantages in supporting applications requiring dense and long-lasting deployment at a very low cost and energy consumption. However, the lack of receivers in transmit-only nodes renders most existing MAC protocols invalid. Based on our previous study on WSNs with pure transmit-only nodes, this work proposes a simple, yet cost effective and powerful single-hop hybrid WSN cluster architecture that contains not only transmit-only nodes but also standard nodes (with transceivers). Along with the hybrid architecture, this work also proposes a new MAC layer protocol framework called Robust Asynchronous Resource Estimation (RARE) that efficiently and reliably manages the densely deployed single-hop hybrid cluster in a self-organized fashion. Through analysis and extensive simulations, the proposed framework is shown to meet or exceed the needs of most applications in terms of the data delivery probability, QoS differentiation, system capacity, energy consumption, and reliability. To the best of our knowledge, this work is the first that brings reliable scheduling to WSNs containing both nonsynchronized transmit-only nodes and standard nodes. ETPL PDS-040
Abstract: Distributed processing through ad hoc and sensor networks is having a major impact on scale and applications of computing. The creation of new cyber-physical services based on wireless sensor devices relies heavily on how well communication protocols can be adapted and optimized to meet
Abstract: In this paper, we propose a Bayesian-inference-based recommendation system for online social networks. In our system, users share their content ratings with friends. The rating similarity between a pair of friends is measured by a set of conditional probabilities derived from their mutual rating history. A user propagates a content rating query along the social network to his direct and indirect friends. Based on the query responses, a Bayesian network is constructed to infer the rating of the querying user. We develop distributed protocols that can be easily implemented in online social networks. We further propose to use Prior distribution to cope with cold start and rating sparseness. The proposed algorithm is evaluated using two different online rating data sets of real users. We show that the proposed Bayesian-inference-based recommendation is better than the existing trust-based recommendations and is comparable to Collaborative Filtering (CF) recommendation. It allows flexible tradeoffs between recommendation quality and recommendation quantity. We further show that informative Prior distribution is indeed helpful to overcome cold start and rating sparseness.
ETPL PDS-048 CDS-Based Virtual Backbone Construction with Guaranteed Routing Cost in Wireless Sensor Networks
Abstract: Inspired by the backbone concept in wired networks, virtual backbone is expected to bring substantial benefits to routing in wireless sensor networks (WSNs). Virtual backbone construction based on Connected Dominating Set (CDS) is a competitive approach among the existing methods used to establish virtual backbone in WSNs. Traditionally, CDS size was the only factor considered in the CDS-based approach. The motivation was that smaller CDS leads to simplified network maintenance.
Abstract: Cloud computing has a key requirement for resource configuration in a real-time manner. In such virtualized environments, both virtual machines (VMs) and hosted applications need to be configured on-the-fly to adapt to system dynamics. The interplay between the layers of VMs and applications further complicates the problem of cloud configuration. Independent tuning of each aspect may not lead to optimal system-wide performance. In this paper, we propose a framework, namely CoTuner, for coordinated configuration of VMs and resident applications. At the heart of the framework is a model-free hybrid reinforcement learning (RL) approach, which combines the advantages of Simplex method and RL method and is further enhanced by the use of system knowledge guided exploration policies. Experimental results on Xen-based virtualized environments with TPC-W and TPC-C benchmarks demonstrate that CoTuner is able to drive a virtual server cluster into an optimal or near-optimal configuration state on the fly, in response to the change of workload. It improves the system throughput by more than 30 percent over independent tuning strategies. In comparison with the coordinated tuning strategies based on basic RL or Simplex algorithm, the hybrid RL algorithm gains 25 to 40 percent throughput improvement. ETPL PDS-053 Exploiting Concurrency for Efficient Dissemination in Wireless Sensor Networks
Abstract: Wireless sensor networks (WSNs) can be successfully applied in a wide range of applications. Efficient data dissemination is a fundamental service which enables many useful high-level functions such as parameter reconfiguration, network reprogramming, etc. Many current data dissemination protocols employ network coding techniques to deal with packet losses. The coding overhead, however, becomes a bottleneck in terms of dissemination delay. We exploit the concurrency potential of sensor nodes and propose MT-Deluge, a multithreaded design of a coding-based data dissemination protocol. By separating the coding and radio operations into two threads and carefully scheduling their executions, MT-Deluge shortens the dissemination delay effectively. An incremental decoding algorithm is employed to further improve MT-Deluge's performance. Experiments with 24 TelosB motes on four representative topologies show that MT-Deluge shortens the dissemination delay by 25.5-48.6 percent compared to a typical data dissemination protocol while keeping the merits of loss resilience. ETPL PDS-054 Fault Tolerance in Distributed Systems Using Fused Data Structures
Abstract: We consider the problem of gathering n anonymous and oblivious mobile robots, which requires that all robots meet in finite time at a nonpredefined point. While the gathering problem cannot be solved deterministically without assuming any additional capabilities for the robots, randomized approaches easily allow it to be solvable. However, the randomized solutions currently known have a time complexity that is exponential in n with no additional assumption. This fact yields the following two questions: Is it possible to construct a randomized gathering algorithm with polynomial expected time? If it is not possible, what is the minimal additional assumption necessary to obtain such an algorithm? In this paper, we address these questions from the aspect of multiplicity-detection capabilities. We newly introduce two weaker variants of multiplicity detection, called local-strong and local-weak multiplicity, and investigate whether those capabilities permit a gathering algorithm with polynomial expected time or not. The contribution of this paper is to show that any algorithm only assuming local-weak multiplicity detection takes an exponential number of rounds in expectation. On the other hand, we can obtain a constant-round gathering algorithm using local-strong multiplicity detection. These results imply that the two models of multiplicity detection are significantly different in terms of their computational power. Interestingly, these differences disappear if we take one more assumption that all robots are scattered (i.e., no two robots stay at the same location) initially. We can obtain a gathering algorithm that takes a constant number of rounds in expectation, assuming local-weak multiplicity detection and scattered initial configurations.
ETPL PDS-056 Finding All Maximal Contiguous Subsequences of a Sequence of Numbers in O(1) Communication Rounds
Abstract: Given a sequence A of real numbers, we wish to find a list of all nonoverlapping contiguous subsequences of A that are maximal. A maximal subsequence M of A has the property that no proper subsequence of M has a greater sum of values. Furthermore, M may not be contained properly within any subsequence of A with this property. This problem has several applications in Computational Biology and can be solved sequentially in linear time. We present a BSP/CGM algorithm that solves this problem using p processors in O(|A|/p) time and O(|A|/p) space per processor. The algorithm uses a constant number of communication rounds of size at most O(|A|/p). Thus, the algorithm achieves linear speedup
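As a rough illustration of the problem defined in the abstract above (not of its BSP/CGM algorithm), the following sketch finds all maximal contiguous subsequences sequentially. It follows the candidate-list idea of the well-known sequential method by Ruzzo and Tompa, but uses a plain right-to-left scan for simplicity, so this version can be quadratic in the worst case rather than linear. The function name and the half-open `(start, end)` index convention are assumptions made for this example.

```python
def all_maximal_subsequences(a):
    """Return half-open (start, end) index pairs of the maximal subsequences.

    Simplified sequential sketch: each candidate stores
    [start, end, L, R], where L is the prefix sum before `start`
    and R is the prefix sum through `end - 1`.
    """
    candidates = []
    total = 0
    for i, x in enumerate(a):
        prev = total
        total += x
        if x <= 0:
            continue  # non-positive values never start a candidate
        cur = [i, i + 1, prev, total]
        while True:
            # Rightmost earlier candidate whose L is strictly smaller.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cur[2]:
                j -= 1
            if j < 0 or candidates[j][3] >= cur[3]:
                candidates.append(cur)
                break
            # Otherwise merge candidates j.. with cur and reconsider.
            cur = [candidates[j][0], cur[1], candidates[j][2], cur[3]]
            del candidates[j:]
    return [(s, e) for s, e, _, _ in candidates]


if __name__ == "__main__":
    seq = [4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]
    for s, e in all_maximal_subsequences(seq):
        print(seq[s:e])
```

For the sample sequence the sketch reports three subsequences: `[4]`, `[3]`, and `[1, 2, -2, 2, -2, 1, 5]`.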
Abstract: In this paper, we consider the issue of data broadcasting in mobile social networks (MSNets). The objective is to broadcast data from a superuser to other users in the network. There are two main challenges under this paradigm, namely 1) how to represent and characterize user mobility in realistic MSNets; 2) given the knowledge of regular users' movements, how to design an efficient superuser route to broadcast data actively. We first explore several realistic data sets to reveal both geographic and social regularities of human mobility, and further propose the concepts of geocommunity and geocentrality into MSNet analysis. Then, we employ a semi-Markov process to model user mobility based on the geocommunity structure of the network. Correspondingly, the geocentrality indicating the dynamic user density of each geocommunity can be derived from the semi-Markov model. Finally, considering the geocentrality information, we provide different route algorithms to cater to the superuser that wants to either minimize total duration or maximize dissemination ratio. To the best of our knowledge, this work is the first to study data broadcasting in a realistic MSNet setting. Extensive trace-driven simulations show that our approach consistently outperforms other existing superuser route design algorithms in terms of dissemination ratio and energy efficiency. ETPL PDS-058 LOBOT: Low-Cost, Self-Contained Localization of Small-Sized Ground Robotic Vehicles
Abstract: It is often important to obtain the real-time location of a small-sized ground robotic vehicle when it performs autonomous tasks either indoors or outdoors. We propose and implement LOBOT, a low-cost, self-contained localization system for small-sized ground robotic vehicles. LOBOT provides accurate real-time, 3D positions in both indoor and outdoor environments. Unlike other localization schemes, LOBOT does not require external reference facilities, expensive hardware, careful tuning or strict calibration, and is capable of operating under various indoor and outdoor environments. LOBOT identifies the local relative movement through a set of integrated inexpensive sensors and well corrects the localization drift by infrequent GPS-augmentation. Our empirical experiments in various temporal and spatial scales show that LOBOT keeps the positioning error well under an accepted threshold. ETPL PDS-059 Lower Bound for Node Buffer Size in Intermittently Connected Wireless Networks
Abstract: We study the fundamental lower bound for node buffer size in intermittently connected wireless networks. The intermittent connectivity is caused by the possibility of node inactivity due to some external constraints. We find even with infinite channel capacity and node processing speed, buffer occupation in each node does not approach zero in a static random network where each node keeps a constant message generation rate. Given the condition that each node has the same probability p of being inactive during each time slot, there exists a critical value pc() for this probability from a percolation based perspective. When p < pc(), the network is in the supercritical case, and there is an achievable lower bound (In our paper, achievable means that node buffer size in networks can achieve the same order as the lower bound by applying some transmission scheme) for the occupied buffer size of each node, which is asymptotically independent of the size of the network. If p > pc(), the network is in the
ETPL PDS-060
On-Chip Sensor Network for Efficient Management of Power Gating-Induced Power/Ground Noise in Multiprocessor System on Chip
Abstract: Reducing feature sizes and power supply voltage allows integrating more processing units (PUs) on multiprocessor system on chip (MPSoC) to satisfy the increasing demands of applications. However, it also makes MPSoC more susceptible to various reliability threats, such as high temperature and power/ground (P/G) noise. As the scale and complexity of MPSoC continuously increase, monitoring and mitigating reliability threats at runtime could offer better performance, scalability, and flexibility for MPSoC designs. In this paper, we propose a systematic approach, on-chip sensor network (SENoC), to collaboratively predict, detect, report, and alleviate runtime threats in MPSoC. SENoC not only detects reliability threats and shares related information among PUs, but also plans and coordinates the reactions of related PUs in MPSoC. SENoC is used to alleviate the impacts of simultaneous switching noise in MPSoC's P/G network during power gating. Based on the detailed noise behaviors under different scenarios derived by our circuit-level MPSoC P/G noise simulation and analysis platform, simulation results show that SENoC helps to achieve on average 26.2 percent performance improvement compared with the traditional stop-go method with 1.4 percent area overhead in an 8*8-core MPSoC in 45 nm. An architecture-level cycle-accurate simulator based on SystemC is implemented to study the performance of the proposed SENoC. By applying sophisticated scheduling techniques to optimize the total system performance, a higher performance improvement of 43.5 percent is achieved for a set of real-life applications. ETPL PDS-061 Robust Tracking of Small-Scale Mobile Primary User in Cognitive Radio Networks
Abstract: In cognitive radio networks (CRNs), secondary users must be able to accurately and reliably track the location of small-scale mobile primary users/devices (e.g., wireless microphones) in order to efficiently utilize spatial spectrum opportunities, while protecting primary communications. However, accurate tracking of the location of mobile primary users is difficult due mainly to the CR-unique constraint, i.e., localization must rely solely on reported sensing results (i.e., measured primary signal strengths), which can easily be compromised by malicious sensors (or attackers). To cope with this challenge, we propose a new framework, called Sequential mOnte carLo combIned with shadow-faDing estimation (SOLID), for accurate, attack/fault-tolerant tracking of small-scale mobile primary users. The key idea underlying SOLID is to exploit the temporal shadow fading correlation in sensing results induced by the primary user's mobility. Specifically, SOLID augments conventional Sequential Monte Carlo (SMC)-based target tracking with shadow-fading estimation. By examining the shadow-fading gain between the primary transmitter and CRs/sensors, SOLID 1) significantly improves the accuracy of primary tracking regardless of the presence/absence of attack, and 2) successfully masks the abnormal sensing reports due to sensor faults or attacks, preserving localization accuracy and improving spatial spectrum efficiency. Our extensive evaluation in realistic wireless fading environments shows that SOLID lowers localization error by up to 88 percent in the absence of attacks, and 89 percent in the presence of the challenging "slow-poisoning" attack, compared to the conventional SMC-based tracking.
Abstract: The network traffic pattern of continuous sensor data collection often changes constantly over time due to the exploitation of temporal and spatial data correlations as well as the nature of condition-based monitoring applications. In contrast to most existing TDMA schedules designed for a static network traffic pattern, this paper proposes a novel TDMA schedule that is capable of efficiently collecting sensor data for any network traffic pattern and is thus well suited to continuous data collection with dynamic traffic patterns. In the proposed schedule, the energy consumed by sensor nodes for any traffic pattern is very close to the minimum required by their workloads given in the traffic pattern. The schedule also allows the base station to conclude data collection as early as possible according to the traffic load, thereby reducing the latency of data collection. We present a distributed algorithm for constructing the proposed schedule. We develop a mathematical model to analyze the performance of the proposed schedule. We also conduct simulation experiments to evaluate the performance of different schedules using real-world data traces. Both the analytical and simulation results show that, compared with existing schedules that are targeted on a fixed traffic pattern, our proposed schedule significantly improves the energy efficiency and time efficiency of sensor data collection with dynamic traffic patterns. ETPL PDS-063 Secure SOurce-BAsed Loose Synchronization (SOBAS) for Wireless Sensor Networks
Abstract: We present the Secure SOurce-BAsed Loose Synchronization (SOBAS) protocol to securely synchronize the events in the network, without the transmission of explicit synchronization control messages. In SOBAS, nodes use their local time values as a one-time dynamic key to encrypt each message. In this way, SOBAS provides an effective dynamic en-route filtering mechanism, where the malicious data is filtered from the network. With SOBAS, we are able to achieve our main goal of synchronizing events at the sink as quickly, as accurately, and as surreptitiously as possible. With loose synchronization, SOBAS reduces the number of control messages needed for a WSN to operate, providing the key benefits of reduced energy consumption and a reduced opportunity for malicious nodes to eavesdrop, intercept, or be made aware of the presence of the network. Albeit a loose synchronization per se, SOBAS is also able to provide 7.24 μs clock precision given today's sensor technology, which is much better than other comparable schemes (schemes that do not employ GPS devices). Also, we show that by recognizing the need for and employing loose time synchronization, necessary synchronization can be provided to the WSN application using half of the energy needed for traditional schemes. Both analytical and simulation results are presented to verify the feasibility of SOBAS as well as the energy consumption of the scheme under normal operation and attack from malicious nodes. ETPL PDS-064 On Data Staging Algorithms for Shared Data Accesses in Clouds
Abstract: In this paper, we study the strategies for efficiently achieving data staging and caching on a set of vantage sites in a cloud system with a minimum cost. Unlike the traditional research, we do not intend to identify the access patterns to facilitate the future requests. Instead, with such a kind of information presumably known in advance, our goal is to efficiently stage the shared data items to predetermined sites at advocated time instants to align with the patterns while minimizing the monetary costs for caching and transmitting the requested data items. To this end, we follow the cost and network models in [1] and extend the analysis to multiple data items, each with single or multiple copies. Our results show that
Abstract: In this paper, we propose an analytical performance model that addresses the complexity of cloud centers through distinct stochastic submodels, the results of which are integrated to obtain the overall solution. Our model incorporates the important aspects of cloud centers such as pool management, compound requests (i.e., a set of requests submitted by one user simultaneously), resource virtualization and realistic servicing steps. In this manner, we obtain not only a detailed assessment of cloud center performance, but also clear insights into equilibrium arrangement and capacity planning that allows servicing delays, task rejection probability, and power consumption to be kept under control. ETPL PDS-067 A New Progressive Algorithm for a Multiple Longest Common Subsequences Problem and Its Efficient Parallelization
Abstract: The multiple longest common subsequence (MLCS) problem, which is related to the measurement of sequence similarity, is one of the fundamental problems in many fields. As an NP-hard problem, finding a good approximate solution within a reasonable time is important for solving large-size problems in practice. In this paper, we present a new progressive algorithm, Pro-MLCS, based on the dominant point approach. Pro-MLCS can find an approximate solution quickly and then progressively generate better solutions until obtaining the optimal one. Pro-MLCS employs three new techniques: 1) a new heuristic function for prioritizing candidate points; 2) a novel $(d)$-index-tree data structure for efficient computation of dominant points; and 3) a new pruning method using an upper bound function and approximate solutions. Experimental results show that Pro-MLCS can obtain the first approximate solution almost instantly and needs only a very small fraction, e.g., 3 percent, of the entire running time to
Abstract: Multicopy routing strategies have been considered the most applicable approaches to achieve message delivery in Delay Tolerant Networks (DTNs). Epidemic routing and two-hop forwarding routing are two well-reported approaches for routing in delay tolerant networks, which allow multiple message replicas to be launched in order to increase message delivery ratio and/or reduce message delivery delay. This advantage, nonetheless, comes at the expense of additional buffer space and bandwidth overhead. Thus, to achieve efficient utilization of network resources, it is important to come up with an effective message scheduling strategy to determine which messages should be forwarded and which should be dropped when the buffer is full. This paper investigates a new message scheduling framework for epidemic and two-hop forwarding routing in DTNs, such that the forwarding/dropping decision can be made at a node during each contact for either optimal message delivery ratio or message delivery delay. Extensive simulation results show that the proposed message scheduling framework can achieve better performance than its counterparts. ETPL PDS-069 Attribute-Aware Data Aggregation Using Potential-Based Dynamic Routing in Wireless Sensor Networks
Abstract: Resources, especially energy, in wireless sensor networks (WSNs) are quite limited. Since sensor nodes are usually densely deployed, the data they sample are highly redundant, and data aggregation becomes an effective method to eliminate redundancy, minimize the number of transmissions, and thereby save energy. Many applications can be deployed in WSNs and various sensors are embedded in nodes, so the packets generated by heterogeneous sensors or different applications have different attributes. Packets from different applications cannot be aggregated. Moreover, most data aggregation schemes employ static routing protocols, which cannot dynamically or intentionally forward packets according to network state or packet types. The spatial isolation caused by a static routing protocol is unfavorable to data aggregation. To make data aggregation more efficient, in this paper, we introduce the concept of packet attribute, defined as the identifier of the data sampled by different kinds of sensors or applications, and then propose an attribute-aware data aggregation (ADA) scheme consisting of a packet-driven timing algorithm and a special dynamic routing protocol. Inspired by the concept of potential in physics and pheromone in ant colonies, a potential-based dynamic routing is elaborated to support an ADA strategy. The performance evaluation results in a series of scenarios verify that the ADA scheme can make the packets with the same attribute spatially convergent as much as possible and therefore improve the efficiency of data aggregation. Furthermore, the ADA scheme also offers other properties, such as scalability with respect to network size and adaptability for tracking mobile events. ETPL PDS-070 DCNS: An Adaptable High Throughput RFID Reader-to-Reader Anticollision Protocol
Abstract: Modeling wave propagation through the earth is an important application in geoscience. We present a framework for wave propagation modeling on special-purpose hardware, which dramatically improves the application performance compared to conventional CPUs. We utilize custom hardware platforms consisting of a mix of x86 CPUs and dataflow engines connected by high-bandwidth communication links. Application programmers describe their algorithms in a domain specific language using Java syntax, with special dataflow semantics overlayed on top of the Java language. The application-specific dataflow engines run at hundreds of MHz with massive parallelism and deliver high performance/Watt, up to 30 times more energy efficient than conventional CPUs. The power efficiency of this approach suggests that dataflow computing may have a key role to play in the improvements in power efficiency necessary to reach exascale computing. ETPL PDS-072 GKAR: A Novel Geographic $(K)$-Anycast Routing for Wireless Sensor Networks
Abstract: To efficiently archive and query data in wireless sensor networks (WSNs), distributed storage systems and multisink schemes have been proposed recently. However, such distributed access cannot be fully supported and exploited by existing routing protocols in a large-scale WSN. In this paper, we address this challenging issue and propose a distributed geographic $(K)$-anycast routing (GKAR) protocol for WSNs, which can efficiently route data from a source sensor to any $(K)$ destinations (e.g., storage nodes or sinks). To guarantee $(K)$-delivery, an iterative approach is adopted in GKAR where, in each round, GKAR determines not only the next hops at each node, but also a set of potential destinations for every next-hop node to reach. Efficient algorithms are designed to determine the selection of the next hops and the destination set division at each intermediate node. We analyze the complexity of GKAR in each round and we also theoretically analyze the expected number of rounds required to guarantee $(K)$-delivery. Simulation results demonstrate the superiority of the GKAR scheme in reducing the total duration and the communication overhead for finding $(K)$ destinations, by comparing with the existing schemes, e.g., $(K 1)$-anycast [10].
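The abstract does not give GKAR's actual selection rule, so the following is only a simplified sketch of what "choosing next hops and dividing the destination set" could look like at one node. It assumes plain greedy geographic forwarding: each candidate destination is handed to the neighbor that lies closest to it, provided that neighbor makes progress over the current node. The function name `split_destinations` and the coordinate dictionaries are hypothetical and introduced for illustration only.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def split_destinations(
    current: Point,
    neighbors: Dict[str, Point],
    destinations: Dict[str, Point],
) -> Dict[str, List[str]]:
    """Assign each destination to the neighbor geographically closest to it.

    Assumption (not from the abstract): a neighbor is eligible only if it is
    strictly closer to the destination than the current node, as in plain
    greedy geographic forwarding. Destinations with no eligible neighbor are
    left unassigned.
    """
    assignment: Dict[str, List[str]] = {}
    for dest_id, dest_pos in destinations.items():
        best_id = None
        best_dist = math.dist(current, dest_pos)  # progress threshold
        for nb_id, nb_pos in neighbors.items():
            d = math.dist(nb_pos, dest_pos)
            if d < best_dist:
                best_id, best_dist = nb_id, d
        if best_id is not None:
            assignment.setdefault(best_id, []).append(dest_id)
    return assignment


if __name__ == "__main__":
    here = (0.0, 0.0)
    nbrs = {"a": (1.0, 0.0), "b": (0.0, 1.0)}
    sinks = {"s1": (5.0, 0.0), "s2": (0.0, 5.0), "s3": (-5.0, 0.0)}
    print(split_destinations(here, nbrs, sinks))
```

In this toy layout, sink `s3` lies behind the node and neither neighbor makes progress toward it, so it stays unassigned; a real protocol needs a recovery rule for that case, which this sketch does not attempt.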
Abstract: In this paper, we consider the problem of assigning a set of clients with demands to a set of servers with capacities and degree constraints. The goal is to find an allocation such that the number of clients assigned to a server is smaller than the server's degree and their overall demand is smaller than the server's capacity, while maximizing the overall throughput. This problem has several natural applications in the context of independent task scheduling or virtual machine allocation. We consider both the offline (when clients are known beforehand) and the online (when clients can join and leave the system at any time) versions of the problem. We first show that the degree constraint on the maximal number of clients that a server can handle is realistic in many contexts. Then, our main contribution is to prove that even if it makes the allocation problem more difficult (NP-Complete), a very small additive resource augmentation on the servers' degree is enough to find in polynomial time a solution that achieves at least the optimal throughput. After a set of theoretical results on the complexity of the offline and online versions of the problem, we propose several other greedy heuristics to solve the online problem and we compare the performance (in terms of throughput) and the cost (in terms of disconnections and reconnections) of all proposed algorithms through a set of extensive simulation results. ETPL PDS-074 Lightweight Location Verification Algorithms for Wireless Sensor Networks
Abstract: The knowledge of sensors' locations is crucial information for many applications in Wireless Sensor Networks (WSNs). When sensor nodes are deployed in hostile environments, the localization schemes are vulnerable to various attacks, e.g., wormhole attack, pollution attack, range enlargement/reduction attack, etc. Therefore, sensors' locations are not trustworthy and need to be verified before they can be used by location-based applications. Previous verification schemes either require group-based deployment knowledge of the sensor field, or depend on expensive or dedicated hardware, thus they cannot be used for low-cost sensor networks. In this paper, we propose a lightweight location verification system that performs both “on-spot” and “in-region” location verifications. The on-spot verification intends to verify whether the locations claimed by sensors are far from their true spots beyond a certain distance. We propose two algorithms that detect abnormal locations by exploring the inconsistencies between sensors' claimed locations and their neighborhood observations. The in-region verification verifies whether a sensor is inside an application-specific verification region. Compared to on-spot verification, the in-region verification tolerates large errors as long as the locations of sensors don't cause the application to malfunction. We study how to derive the verification region for different applications and design a probabilistic algorithm to compute in-region confidence for each sensor. Experiment results show that our on-spot and in-region algorithms can verify sensors' locations with high detection rate and low false positive rate. They are robust in the presence of malicious attacks that are launched during the verification process. Moreover, compared with previous verification schemes, our algorithms are effective and lightweight because they do not rely on the knowledge of deployment of sensors, and they don't require expensive or dedicated hardware, so our algorithms can be used in any low-cost sensor networks. ETPL PDS-075 Load Rebalancing for Distributed File Systems in Clouds
Abstract: Assume a forwarding cost function which depends on the sender-receiver separation, and assume further that noncooperative relaying is applied. What is the minimum total forwarding cost required for sending a message from a source to one or more destinations when multicasting along optimally placed relaying nodes is applied? In this paper, I define and analyze cost function properties from which I derive general lower bound expressions on multicasting costs. I consider a MAC layer model which does not exploit the broadcast property of wireless communication and a MAC layer model which exploits it. For specific cost functions, I show further that in case of optimal relay positions, multicasts can be constructed whose cost always stays below the derived lower bound expression plus an additive constant depending on the number of destinations. For both lower and upper bounds, I define a general procedure to check whether, and if so how, my findings can be used to derive the specific lower and upper bound expressions for a given cost function. I explain the procedure with three cost function examples: the Euclidean distance, the energy cost function, and the expected number of retransmissions under Rayleigh fading.
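To see why the properties of the cost function matter for relay placement, consider a minimal sketch for a single source-destination pair. It assumes, for illustration only, a per-hop cost of the form d^a, where d is the hop length and a is a hypothetical exponent: a = 1 corresponds to the Euclidean distance cost, while a > 1 stands in for a generic power-law energy cost. The abstract does not state the exact form of the energy cost function, and the helper names below are not from the paper.

```python
def hop_cost(distance: float, exponent: float) -> float:
    """Cost of one hop of the given length under an assumed d**exponent model."""
    return distance ** exponent


def relayed_cost(total_distance: float, hops: int, exponent: float) -> float:
    """Total cost when the path is split into `hops` equal-length hops.

    Equally spaced relays on the straight line are assumed here purely for
    illustration; this is not the paper's multicast construction.
    """
    if hops < 1:
        raise ValueError("at least one hop is required")
    return hops * hop_cost(total_distance / hops, exponent)


if __name__ == "__main__":
    for exponent in (1, 2):
        for hops in (1, 2, 4):
            cost = relayed_cost(100.0, hops, exponent)
            print(f"exponent={exponent} hops={hops} cost={cost}")
```

Under the distance cost (exponent 1) relaying neither helps nor hurts, whereas under a quadratic cost each doubling of the hop count halves the total. This is one reason a bound has to be derived from properties of the cost function rather than stated once for all of them.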
ETPL PDS-078
Abstract: The study of genomes has been revolutionized by sequencing machines that output many short overlapping substrings (called reads). The task of sequence assembly in practice is to reconstruct long contiguous genome subsequences from the reads. With Next Generation Sequencing (NGS) technologies, assembly software needs to be more accurate, faster, and more memory-efficient due to the problem complexity and the size of the data sets. In this paper, we develop parallel algorithms and compressed data structures to address several computational challenges of NGS assembly. We demonstrate how
Abstract: Data declustering and replication can be used to reduce I/O times related with processing of data intensive queries. Declustering parallelizes the query retrieval process by distributing the data items requested by queries among several disks. Replication enables alternative disk choices for individual disk items and thus provides better query parallelism options. In general, existing replicated declustering schemes do not consider query log information and try to optimize all possible queries for a specific query type, such as range or spatial queries. In such schemes, it is assumed that two or more copies of all data items are to be generated and scheduling of these copies to disks are discussed. However, in some applications, generation of even two copies of all of the data items is not feasible, since data items tend to have very large sizes. In this work, we assume that there is a given limit on disk capacities and thus on replication amounts. We utilize existing query-log information to propose a selective replicated declustering scheme, in which we select the data items to be replicated and decide on their scheduling onto disks while respecting disk capacities. We propose and implement an iterative improvement algorithm to obtain a two-way replicated declustering and use this algorithm in a recursive framework to generate a multiway replicated declustering. Then we improve the obtained multiway replicated declustering by efficient refinement heuristics. Experiments conducted on realistic data sets show that the proposed scheme yields better performance results compared to existing replicated declustering schemes. ETPL PDS-080 RASS: A Real-Time, Accurate, and Scalable System for Tracking Transceiver-Free Objects
Abstract: Transceiver-free object tracking is to trace a moving object that does not carry any communication device in an environment with some monitoring nodes predeployed. Among all the tracking technologies, RF-based technology is an emerging research field facing many challenges. Although we proposed the original idea, until now there has been no method achieving scalability without sacrificing latency and accuracy. In this paper, we put forward a real-time tracking system, RASS, which can achieve this goal and is promising in applications like safeguard systems. Our basic idea is to divide the tracking field into different areas, with adjacent areas using different communication channels, so that interference among different areas can be prevented. For each area, three communicating nodes are deployed on the ceiling as a regular triangle to monitor this area. In each triangle area, we use a Support Vector Regression (SVR) model to locate the object. This model simulates the relationship between the signal dynamics caused by the object and the object position. It not only considers the ideal case of signal dynamics caused by the object, but also utilizes their irregular information. As a result, it can achieve a tracking accuracy of around 1 m by just using three nodes in a triangle area with 4 m on each side. The experiments show that the tracking latency of the proposed RASS system is bounded by only about 0.26 s. Our system scales well to a large deployment field without sacrificing the latency and accuracy.
Abstract: In Genome Projects, biological sequences are aligned thousands of times on a daily basis. The Smith-Waterman algorithm is able to retrieve the optimal local alignment with quadratic time and space complexity. So far, aligning huge sequences, such as whole chromosomes, with the Smith-Waterman algorithm has been regarded as infeasible, due to huge computing and memory requirements. However, high-performance computing platforms such as GPUs are making it possible to obtain the optimal result for huge sequences in reasonable time. In this paper, we propose and evaluate CUDAlign 2.1, a parallel algorithm that uses a GPU to align huge sequences, executing the Smith-Waterman algorithm combined with Myers-Miller, with linear space complexity. In order to achieve that, we propose optimizations which are able to reduce significantly the amount of data processed, while enforcing full parallelism most of the time. Using the NVIDIA GTX 560 Ti board and comparing real DNA sequences that range from 162 KBP (Thousand Base Pairs) to 59 MBP (Million Base Pairs), we show that CUDAlign 2.1 is scalable. Also, we show that CUDAlign 2.1 is able to produce the optimal alignment between the chimpanzee chromosome 22 (33 MBP) and the human chromosome 21 (47 MBP) in 8.4 hours and the optimal alignment between the chimpanzee chromosome Y (24 MBP) and the human chromosome Y (59 MBP) in 13.1 hours. ETPL PDS-082 Scalable and Accurate Graph Clustering and Community Structure Detection
Abstract: One of the most useful measures of cluster quality is the modularity of the partition, which measures the difference between the number of the edges joining vertices from the same cluster and the expected number of such edges in a random graph. In this paper, we show that the problem of finding a partition maximizing the modularity of a given graph $(G)$ can be reduced to a minimum weighted cut (MWC) problem on a complete graph with the same vertices as $(G)$. We then show that the resulting minimum cut problem can be efficiently solved by adapting existing graph partitioning techniques. Our algorithm finds clusterings of a comparable quality and is much faster than the existing clustering algorithms. ETPL PDS-083 Scaling Laws of Cognitive Ad Hoc Networks over General Primary Network Models
Abstract: We study the capacity scaling laws for the cognitive network that consists of the primary hybrid network (PhN) and secondary ad hoc network (SaN). PhN is further comprised of an ad hoc network and a base station-based (BS-based) network. SaN and PhN overlap in the same deployment region and operate on the same spectrum, but are independent of each other in terms of communication requirements. The primary users (PUs), i.e., the ad hoc nodes in PhN, have the priority to access the spectrum. The secondary users (SUs), i.e., the ad hoc nodes in SaN, are equipped with cognitive radios, and have the functionalities to sense the idle spectrum and obtain the necessary information of primary nodes in PhN. We assume that PhN adopts one out of three classical types of strategies, i.e., pure ad hoc strategy, BS-based strategy, and hybrid strategy. We aim to directly derive multicast capacity for SaN to unify the unicast and broadcast capacities under two basic principles: 1) The throughput for PhN cannot be undermined in order sense due to the presence of SaN. 2) The protocol adopted by PhN does not alter in the interest of SaN, anyway. Depending on which type of strategy is adopted in PhN, we design the optimal-throughput strategy for SaN. We show that there exists a threshold of the density of SUs
Abstract: To efficiently utilize the computing resources and provide good quality of service (QoS) to the end-to-end tasks in the distributed real-time systems, we can enforce the utilization bounds on multiple processors. The utilization control is challenging especially when the workload in the system is unpredictable. To handle the workload uncertainties, current research favors feedback control techniques, and recent work combines the task rate adaptation and processor frequency scaling in an asynchronous way for CPU utilization control, where task rates and the processor frequencies are tuned asynchronously in two decoupled control loops for control convenience. Since the two manipulated variables, task rates and processor frequencies, contribute to the CPU utilizations together with strong coupling, adjusting them asynchronously may degrade the utilization control performance. In this paper, we provide a novel scheme to make synchronous rate and frequency adjustment to enforce the utilization setpoint, referred to as SyRaFa scheme. SyRaFa can handle the workload uncertainties by identifying the system model online and can simultaneously adjust the manipulated variables by solving an optimization problem in each sampling period. Extensive evaluation results demonstrate SyRaFa outperforms the existing schemes especially under severe workload uncertainties. ETPL PDS-085 Anchor: A Versatile and Efficient Framework for Resource Management in the Cloud
Abstract: We present Anchor, a general resource management architecture that uses the stable matching framework to decouple policies from mechanisms when mapping virtual machines to physical servers. In Anchor, clients and operators are able to express a variety of distinct resource management policies as they deem fit, and these policies are captured as preferences in the stable matching framework. The highlight of Anchor is a new many-to-one stable matching theory that efficiently matches VMs with heterogeneous resource needs to servers, using both offline and online algorithms. Our theoretical analyses show the convergence and optimality of the algorithm. Our experiments with a prototype implementation on a 20-node server cluster, as well as large-scale simulations based on real-world workload traces, demonstrate that the architecture is able to realize a diverse set of policy objectives with good performance and practicality. ETPL PDS-086 Efficient Resource Mapping Framework over Networked Clouds via Iterated Local Search-Based Request Partitioning
Abstract: The cloud represents a computing paradigm where shared configurable resources are provided as a service over the Internet. Adding intra- or intercloud communication resources to the resource mix leads to a networked cloud computing environment. Following the cloud infrastructure as a Service paradigm and in order to create a flexible management framework, it is of paramount importance to address efficiently the resource mapping problem within this context. To deal with the inherent complexity and scalability issue of the resource mapping problem across different administrative
Abstract: As cloud computing becomes more and more popular, understanding the economics of cloud computing becomes critically important. To maximize the profit, a service provider should understand both service charges and business costs, and how they are determined by the characteristics of the applications and the configuration of a multiserver system. The problem of optimal multiserver configuration for profit maximization in a cloud computing environment is studied. Our pricing model takes such factors into considerations as the amount of a service, the workload of an application environment, the configuration of a multiserver system, the service-level agreement, the satisfaction of a consumer, the quality of a service, the penalty of a low-quality service, the cost of renting, the cost of energy consumption, and a service provider's margin and profit. Our approach is to treat a multiserver system as an M/M/m queuing model, such that our optimization problem can be formulated and solved analytically. Two server speed and power consumption models are considered, namely, the idle-speed model and the constant-speed model. The probability density function of the waiting time of a newly arrived service request is derived. The expected service charge to a service request is calculated. The expected net business gain in one unit of time is obtained. Numerical calculations of the optimal server size and the optimal server speed are demonstrated. ETPL PDS-088 Error-Tolerant Resource Allocation and Payment Minimization for Cloud System
Abstract: With virtual machine (VM) technology being increasingly mature, compute resources in cloud systems can be partitioned in fine granularity and allocated on demand. We make three contributions in this paper: 1) We formulate a deadline-driven resource allocation problem based on the cloud environment facilitated with VM resource isolation technology, and also propose a novel solution with polynomial time, which could minimize users' payment in terms of their expected deadlines. 2) By analyzing the upper bound of task execution length based on the possibly inaccurate workload prediction, we further propose an error-tolerant method to guarantee task's completion within its deadline. 3) We validate its effectiveness over a real VM-facilitated cluster environment under different levels of competition. In our experiment, by tuning algorithmic input deadline based on our derived bound, task execution length can always be limited within its deadline in the sufficient-supply situation; the mean execution length still keeps 70 percent as high as user-specified deadline under the severe competition. Under the original-deadline-based solution, about 52.5 percent of tasks are completed within 0.95-1.0 as high as their deadlines, which still conforms to the deadline-guaranteed requirement. Only 20 percent of tasks violate deadlines, yet most (17.5 percent) are still finished within 1.05 times of deadlines.
Abstract: Cloud computing allows business customers to scale up and down their resource usage based on needs. Many of the touted gains in the cloud model come from resource multiplexing through virtualization technology. In this paper, we present a system that uses virtualization technology to allocate data center resources dynamically based on application demands and support green computing by optimizing the number of servers in use. We introduce the concept of "skewness" to measure the unevenness in the multidimensional resource utilization of a server. By minimizing skewness, we can combine different types of workloads nicely and improve the overall utilization of server resources. We develop a set of heuristics that prevent overload in the system effectively while saving energy used. Trace driven simulation and experiment results demonstrate that our algorithm achieves good performance. ETPL PDS-090 Performance Enhancement for Network I/O Virtualization with Efficient Interrupt Coalescing and Virtual Receive-Side Scaling
Abstract: Virtualization is a key technology in cloud computing; it can accommodate numerous guest VMs to provide transparent services, such as live migration, high availability, and rapid checkpointing. Cloud computing using virtualization allows workloads to be deployed and scaled quickly through the rapid provisioning of virtual machines on physical machines. However, I/O virtualization, particularly for networking, suffers from significant performance degradation in the presence of high-speed networking connections. In this paper, we first analyze performance challenges in network I/O virtualization and identify two problems: conventional network I/O virtualization suffers from excessive virtual interrupts to guest VMs, and the back-end driver does not efficiently use the computing resources of underlying multicore processors. To address these challenges, we propose optimization methods for enhancing the networking performance: 1) efficient interrupt coalescing for network I/O virtualization and 2) virtual receive-side scaling to effectively leverage multicore processors. These methods are implemented and evaluated with extensive performance tests on a Xen virtualization platform. Our experimental results confirm that the proposed optimizations can significantly improve network I/O virtualization performance and effectively solve the performance challenges. ETPL PDS-091 A New Disk I/O Model of Virtualized Cloud Environment
Abstract: In a traditional virtualized cloud environment, using asynchronous I/O in the guest file system and synchronous I/O in the host file system to handle an asynchronous user disk write exhibits several drawbacks, such as performance disturbance among different guests and consistency maintenance across guest failures. To improve these issues, this paper introduces a novel disk I/O model for virtualized cloud system called HypeGear, where the guest file system uses synchronous operations to deal with the guest write request and the host file system performs asynchronous operations to write the data to the hard disk. A prototype system is implemented on the Xen hypervisor and our experimental results verify that this new model has many advantages over the conventional asynchronous-synchronous model. We also evaluate the overhead of asynchronous I/O at host, which is brought by our new model. The result demonstrates that it enforces little cost on host layer. ETPL PDS-092 Improving Data Center Network Utilization Using Near-Optimal Traffic Engineering
ETPL PDS-093
Abstract: Electricity expenditure comprises a significant fraction of the total operating cost in data centers. Hence, cloud service providers are required to reduce electricity cost as much as possible. In this paper, we consider utilizing existing energy storage capabilities in data centers to reduce electricity cost under wholesale electricity markets, where the electricity price exhibits both temporal and spatial variations. A stochastic program is formulated by integrating the center-level load balancing, the server-level configuration, and the battery management while at the same time guaranteeing the quality-of-service experience by end users. We use the Lyapunov optimization technique to design an online algorithm that achieves an explicit tradeoff between cost saving and energy storage capacity. We demonstrate the effectiveness of our proposed algorithm through extensive numerical evaluations based on real-world workload and electricity price data sets. As far as we know, our work is the first to explore the problem of electricity cost saving using energy storage in multiple data centers by considering both the spatial and temporal variations in wholesale electricity prices and workload arrival processes. ETPL PDS-094 Simple and Effective Dynamic Provisioning for Power-Proportional Data Centers
Abstract: Energy consumption represents a significant cost in data center operation. A large fraction of the energy, however, is used to power idle servers when the workload is low. Dynamic provisioning techniques aim at saving this portion of the energy, by turning off unnecessary servers. In this paper, we explore how much gain knowing future workload information can bring to dynamic provisioning. In particular, we develop online dynamic provisioning solutions with and without future workload information available. We first reveal an elegant structure of the offline dynamic provisioning problem, which allows us to characterize the optimal solution in a divide-and-conquer manner. We then exploit this insight to design two online algorithms with competitive ratios $2 - \alpha$ and $e/(e - 1 + \alpha)$, respectively, where $0 \le \alpha \le 1$ is the normalized size of a look-ahead window in which future workload information is available. A fundamental observation is that future workload information beyond the full-size look-ahead window (corresponding to $\alpha = 1$) will not improve dynamic provisioning performance. Our algorithms are decentralized and easy to implement. We demonstrate their effectiveness in simulations using real-world traces.
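The two competitive ratios quoted above can be evaluated directly to see how they behave as the look-ahead window grows. The sketch below assumes the ratios are 2 − α and e/(e − 1 + α) with α in [0, 1], which is how the garbled expressions in the extracted text are read here; the function names are hypothetical and only the closed-form expressions are computed, not the provisioning algorithms themselves.

```python
import math


def ratio_first(alpha: float) -> float:
    """Competitive ratio 2 - alpha of the first online algorithm."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 2.0 - alpha


def ratio_second(alpha: float) -> float:
    """Competitive ratio e / (e - 1 + alpha) of the second online algorithm."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return math.e / (math.e - 1.0 + alpha)


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"alpha={alpha:.2f}  "
              f"first={ratio_first(alpha):.3f}  "
              f"second={ratio_second(alpha):.3f}")
```

Both expressions decrease monotonically in α and reach 1 at α = 1, which matches the observation in the abstract that information beyond the full-size look-ahead window brings no further improvement.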
Abstract: Cloud computing economically enables customers with limited computational resources to outsource large-scale computations to the cloud. However, how to protect customers' confidential data involved in the computations then becomes a major security concern. In this paper, we present a secure outsourcing mechanism for solving large-scale systems of linear equations (LE) in the cloud. Because applying traditional approaches like Gaussian elimination or LU decomposition (a.k.a. direct methods) to such large-scale LEs would be prohibitively expensive, we build the secure LE outsourcing mechanism via a completely different approach, the iterative method, which is much easier to implement in practice and only demands relatively simpler matrix-vector operations. Specifically, our mechanism enables a customer to securely harness the cloud for iteratively finding successive approximations to the LE solution, while keeping both the sensitive input and output of the computation private. For robust cheating detection, we further explore the algebraic property of matrix-vector operations and propose an efficient result verification mechanism, which allows the customer to verify all answers received from previous iterative approximations in one batch with high probability. Thorough security analysis and prototype experiments on Amazon EC2 demonstrate the validity and practicality of our proposed design. ETPL PDS-096 Mona: Secure Multi-Owner Data Sharing for Dynamic Groups in the Cloud
Abstract: With its low maintenance requirements, cloud computing provides an economical and efficient solution for sharing group resources among cloud users. Unfortunately, sharing data in a multi-owner manner while preserving data and identity privacy from an untrusted cloud is still a challenging issue, due to frequent changes of membership. In this paper, we propose a secure multi-owner data sharing scheme, named Mona, for dynamic groups in the cloud. By leveraging group signature and dynamic broadcast encryption techniques, any cloud user can anonymously share data with others. Meanwhile, the storage overhead and encryption computation cost of our scheme are independent of the number of revoked users. In addition, we analyze the security of our scheme with rigorous proofs, and demonstrate the efficiency of our scheme in experiments. ETPL PDS-097 A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-Effective Privacy Preserving of Intermediate Data Sets in Cloud
Abstract: Cloud computing provides massive computation power and storage capacity which enable users to deploy computation and data-intensive applications without infrastructure investment. During the processing of such applications, a large volume of intermediate data sets will be generated, and often stored to save the cost of recomputing them. However, preserving the privacy of intermediate data sets becomes a challenging problem because adversaries may recover privacy-sensitive information by analyzing multiple intermediate data sets. Encrypting all data sets in the cloud is widely adopted in existing approaches to address this challenge. But we argue that encrypting all intermediate data sets is neither efficient nor cost-effective because it is very time consuming and costly for data-intensive applications to en/decrypt data sets frequently while performing any operation on them. In this paper, we propose a novel upper bound privacy leakage constraint-based approach to identify which intermediate data sets need to be encrypted and which do not, so that privacy-preserving cost can be saved while the privacy requirements of data holders can still be satisfied. Evaluation results demonstrate that the privacy-preserving cost of intermediate data sets can be significantly reduced with our approach over existing
Abstract: The ultimate goal of cloud providers by providing resources is increasing their revenues. This goal leads to a selfish behavior that negatively affects the users of a commercial multicloud environment. In this paper, we introduce a pricing model and a truthful mechanism for scheduling single tasks considering two objectives: monetary cost and completion time. With respect to the social cost of the mechanism, i.e., minimizing the completion time and monetary cost, we extend the mechanism for dynamic scheduling of scientific workflows. We theoretically analyze the truthfulness and the efficiency of the mechanism and present extensive experimental results showing significant impact of the selfish behavior of the cloud providers on the efficiency of the whole system. The experiments conducted using real-world and synthetic workflow applications demonstrate that our solutions dominate in most cases the Pareto-optimal solutions estimated by two classical multiobjective evolutionary algorithms. ETPL PDS-099 QoS Ranking Prediction for Cloud Services
Abstract: Cloud computing is becoming popular. Building high-quality cloud applications is a critical research problem. QoS rankings provide valuable information for making optimal cloud service selection from a set of functionally equivalent service candidates. To obtain QoS values, real-world invocations on the service candidates are usually required. To avoid the time-consuming and expensive real-world service invocations, this paper proposes a QoS ranking prediction framework for cloud services by taking advantage of the past service usage experiences of other consumers. Our proposed framework requires no additional invocations of cloud services when making QoS ranking prediction. Two personalized QoS ranking prediction approaches are proposed to predict the QoS rankings directly. Comprehensive experiments are conducted employing real-world QoS data, including 300 distributed users and 500 real-world web services all over the world. The experimental results show that our approaches outperform other competing approaches. ETPL PDS-100 Cloudy with a Chance of Cost Savings
Abstract: Cloud-based hosting is claimed to possess many advantages over traditional in-house (onpremise) hosting such as better scalability, ease of management, and cost savings. It is not difficult to understand how cloud-based hosting can be used to address some of the existing limitations and extend the capabilities of many types of applications. However, one of the most important questions is whether cloud-based hosting will be economically feasible for my application if migrated into the cloud. It is not straightforward to answer this question because it is not clear how my application will benefit from the claimed advantages, and, in turn, be able to convert them into tangible cost savings. Within cloud-based hosting offerings, there is a wide range of hosting options one can choose from, each impacting the cost in a different way. Answering these questions requires an in-depth understanding of the cost implications of all the possible choices specific to my circumstances. In this study, we identify a diverse set of key factors affecting the costs of deployment choices. Using benchmarks representing two different applications (TPC-W and TPC-E) we investigate the evolution of costs for different deployment choices. We consider
Abstract: Massive computation power and storage capacity of cloud computing systems allow scientists to deploy computation and data intensive applications without infrastructure investment, where large application data sets can be stored in the cloud. Based on the pay-as-you-go model, storage strategies and benchmarking approaches have been developed for cost-effectively storing large volume of generated application data sets in the cloud. However, they are either insufficiently cost-effective for the storage or impractical to be used at runtime. In this paper, toward achieving the minimum cost benchmark, we propose a novel highly cost-effective and practical storage strategy that can automatically decide whether a generated data set should be stored or not at runtime in the cloud. The main focus of this strategy is the local-optimization for the tradeoff between computation and storage, while secondarily also taking users' (optional) preferences on storage into consideration. Both theoretical analysis and simulations conducted on general (random) data sets as well as specific real world applications with Amazon's cost model show that the cost-effectiveness of our strategy is close to or even the same as the minimum cost benchmark, and the efficiency is very high for practical runtime utilization in the cloud. ETPL PDS-102 Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems
Abstract: Performance diagnosis is labor intensive in production cloud computing systems. Such systems typically face many real-world challenges, which the existing diagnosis techniques for such distributed systems cannot effectively solve. An efficient, unsupervised diagnosis tool for locating fine-grained performance anomalies is still lacking in production cloud computing systems. This paper proposes CloudDiag to bridge this gap. Combining a statistical technique and a fast matrix recovery algorithm, CloudDiag can efficiently pinpoint fine-grained causes of the performance problems, which does not require any domain-specific knowledge to the target system. CloudDiag has been applied in a practical production cloud computing systems to diagnose performance problems. We demonstrate the effectiveness of CloudDiag in three real-world case studies. ETPL PDS-103 A Fast RPC System for Virtual Machines
Abstract: Despite the advances in high performance interdomain communications for virtual machines (VM), data intensive applications developed for VMs based on the traditional remote procedure call (RPC) mechanism still suffer from performance degradation due to the inherent inefficiency of data serialization/deserialization operations. This paper presents VMRPC, a lightweight RPC framework specifically designed for VMs that leverages the heap and stack sharing mechanism to circumvent unnecessary data copy and serialization/deserialization. Our evaluation shows that the performance of VMRPC is an order of magnitude better than traditional RPC systems and existing alternative interdomain communication optimization systems. The evaluation on a VMRPC-enhanced networked file system across a varied range of benchmarks further reveals the competitiveness of VMRPC in IO-
Abstract: The growing importance of operations such as identification, location sensing, and object tracking has led to increasing interests in contactless Radio Frequency Identification (RFID) systems. Enjoying the low cost of RFID tags, modern RFID systems tend to be deployed for large-scale mobile objects. Both the theoretical and experimental results suggest that when tags are in large numbers, most existing collision arbitration protocols do not satisfy the scalability and time-efficiency requirements of many applications. To address this problem, we propose Adaptively Splitting-based Arbitration Protocol (ASAP), a scheme that provides efficient RFID identification for both small and large deployment of RFID tags, in terms of time and energy cost. Theoretical analysis and simulation evaluation show that the performance of ASAP is better than most existing collision-arbitration solutions and the time efficiency is close to the theoretically optimal values. ETPL PDS-103 Attached-RTS: Eliminating an Exposed Terminal Problem in Wireless Networks
Abstract: Leveraging concurrent transmission is a promising way to improve throughput in wireless networks. Existing media access control (MAC) protocols like carrier sense multiple access always try to minimize the number of concurrent transmissions to avoid collision, although collisions at sender sides are harmless to the overall performance. The reason for such conservative strategy is that those protocols cannot obtain accurate channel status (who is transmitting and receiving) with low cost. They can only avoid potential collisions through rough channel status (idle or busy). To obtain additional information in a cost-efficient way, we propose a novel coding scheme, Attachment Coding, to allow control information to be attached on data packet. Nodes then transmit two kinds of signals simultaneously, without degrading the effective throughput of the original data traffic. Based on Attachment Coding, we propose an Attached-RTS MAC (AR-MAC) to exploit exposed terminals for concurrent transmissions. The attached control information provides accurate channel status for nodes in real time. Therefore, nodes can identify exposed terminals and utilize them for concurrent transmission. We theoretically analyze the feasibility of Attachment Coding, and implement it on the GNU Radio testbed to further verify it. We also conduct extensive simulations to evaluate the performance of Attached-RTS. The experimental results show that by leveraging Attachment Coding, AR-MAC achieves up to 180 percent in dense deployed ad hoc networks. ETPL PDS-103 "ESWC: Efficient Scheduling for the Mobile Sink in Wireless Sensor Networks with Delay Constraint
Abstract: This paper exploits sink mobility to prolong the network lifetime in wireless sensor networks where the information delay caused by moving the sink should be bounded. Due to the combinational complexity of this problem, most previous proposals focus on heuristics and provable optimal algorithms remain unknown. In this paper, we build a unified framework for analyzing this joint sink mobility, routing, delay, and so on. We discuss the induced subproblems and present efficient solutions for them. Then, we generalize these solutions and propose a polynomial-time optimal algorithm for the origin problem. In simulations, we show the benefits of involving a mobile sink and the impact of network
Abstract: Along with radio frequency identification (RFID) becoming ubiquitous, security issues have attracted extensive attentions. Most studies focus on the single-reader and single-tag case to provide security protection, which leads to certain limitations for diverse applications. This paper proposes a grouping-proofs-based authentication protocol (GUPA) to address the security issue for multiple readers and tags simultaneous identification in distributed RFID systems. In GUPA, distributed authentication mode with independent subgrouping proofs is adopted to enhance hierarchical protection; an asymmetric denial scheme is applied to grant fault-tolerance capabilities against an illegal reader or tag; and a sequence-based odd-even alternation group subscript is presented to define a function for secret updating. Meanwhile, GUPA is analyzed to be robust enough to resist major attacks such as replay, forgery, tracking, and denial of proof. Furthermore, performance analysis shows that compared with the known grouping-proof or yoking-proof-based protocols, GUPA has lower communication overhead and computation load. It indicates that GUPA realizing both secure and simultaneous identification is efficient for resource-constrained distributed RFID systems. ETPL PDS-103 Hint-Based Execution of Workloads in Clouds with Nefeli
Abstract: Infrastructure-as-a-Service clouds offer entire virtual infrastructures for distributed processing while concealing all physical underlying machinery. Current cloud interface abstractions restrict users from providing information regarding usage patterns of their requested virtual machines (VMs). In this paper, we propose Nefeli, a virtual infrastructure gateway that lifts this restriction. Through Nefeli, cloud consumers provide deployment hints on the possible mapping of VMs to physical nodes. Such hints include the collocation and anticollocation of VMs, the existence of potential performance bottlenecks, the presence of underlying hardware features (e.g., high availability), the proximity of certain VMs to data repositories, or any other information that would contribute in a more effective placement of VMs to physical hosting nodes. Consumers designate only properties of their virtual infrastructure and remain at all times agnostic to the cloud internal physical characteristics. The set of consumer-provided hints is augmented with high-level placement policies specified by the cloud administration. Placement policies and hints form a constraint satisfaction problem that when solved, yields the final VM-to-host placement. As workloads executed by the cloud may change over time, VM-to-host mappings must follow suit. To this end, Nefeli captures such events, changes VM deployment, helps avoid bottlenecks, and ultimately, improves the quality of the rendered services. Using our prototype, we examine overheads involved and show significant improvements in terms of time needed to execute scientific and real application workloads. We also demonstrate how power-aware policies may reduce the energy consumption of the physical installation. Finally, we compare Nefeli's placement choices with those attained by the opensource cloud middleware, OpenNebula. ETPL PDS-103 Hypocomb: Bounded-Degree Localized Geometric Planar Graphs for Wireless Ad Hoc Networks
Abstract: Multiple-Input-Multiple-Output (MIMO) has great potential for enhancing the throughput of multihop wireless networks via spatial multiplexing or spatial reuse. Spatial reuse with Stream Control (SC) provides a considerable improvement of the network throughput over spatial multiplexing. The gain of spatial reuse, however, is still not fully exploited. There exist large numbers of additional data streams, which could be transmitted concurrently with those data streams scheduled by stream control at certain time slots and vicinities. In this paper, we address the issue of MIMO link scheduling to maximize the gain of spatial reuse and thus network throughput. We propose a Receiver-Oriented Interference Suppression model (ROIS), based on which we design both centralized and distributed link scheduling algorithms to fully exploit the gain of spatial reuse in multihop MIMO networks. Further, we address the traffic-aware link scheduling problem by injecting nonuniform traffic load into the network. Through theoretical analysis and comprehensive performance evaluation, we achieve the following results: 1) link scheduling based on ROIS achieves significant higher network throughput than that based on stream control, with any interference range, number of antennas, and average hop length of data flows. 2) The traffic-aware scheduling is enticingly complementary to the link scheduling based on ROIS model. Accordingly, the two scheduling schemes can be combined to further enhance the network throughput. ETPL PDS-103 Managing Overloaded Hosts for Dynamic Consolidation of Virtual Machines in Cloud Data Centers under Quality of Service Constraints
Abstract: Dynamic consolidation of virtual machines (VMs) is an effective way to improve the utilization of resources and energy efficiency in cloud data centers. Determining when it is best to reallocate VMs from an overloaded host is an aspect of dynamic VM consolidation that directly influences the resource utilization and quality of service (QoS) delivered by the system. The influence on the QoS is explained by the fact that server overloads cause resource shortages and performance degradation of applications. Current solutions to the problem of host overload detection are generally heuristic based, or rely on statistical analysis of historical data. The limitations of these approaches are that they lead to suboptimal results and do not allow explicit specification of a QoS goal. We propose a novel approach that for any known stationary workload and a given state configuration optimally solves the problem of host overload detection by maximizing the mean intermigration time under the specified QoS goal based on a Markov chain model. We heuristically adapt the algorithm to handle unknown nonstationary workloads using the Multisize Sliding Window workload estimation technique. Through simulations with workload traces
Abstract: Priority queues are essential building blocks for implementing advanced per-flow service disciplines and hierarchical quality-of-service at high-speed network links. Scalable priority queue implementation requires solutions to two fundamental problems. The first is to sort queue elements in real time at ever increasing line speeds (e.g., at OC-768 rates). The second is to store a huge number of packets (e.g., millions of packets). In this paper, we propose novel solutions by decomposing the problem into two parts, a succinct priority index (PI) in SRAM that can efficiently maintain a real-time sorting of priorities, coupled with a DRAM-based implementation of large packet buffers. In particular, we propose three related novel succinct PI data structures for implementing high-speed PIs: a PI, a counting priority index (CPI), and a pipelined counting priority index (pCPI). We show that all three structures can be very compactly implemented in SRAM using only Θ(U) space, where U is the size of the universe required to implement the priority keys (time stamps). We also show that our proposed PI structures can be implemented very efficiently as well by leveraging hardware-optimized instructions that are readily available in modern 64-bit processors. The operations on the PI and CPI structures take Θ(log_W U) time complexity, where W is the processor word length (i.e., W = 64). Alternatively, operations on the pCPI structure take amortized constant time with only Θ(log_W U) pipeline stages (e.g., only four pipeline stages for U = 16 million). Finally, we show the application of our proposed PI structures for the scalable management of large packet buffers at line speeds. The pCPI structure can be implemented efficiently in high-performance network processing applications such as advanced per-flow scheduling with quality-of-service guarantee. ETPL PDS-103 POVA: Traffic Light Sensing with Probe Vehicles
Abstract: Traffic light sensing aims to detect the status of traffic lights, which is valuable for many applications such as traffic management, traffic light optimization, and real-time vehicle navigation. In this work, we develop a system called POVA for traffic light sensing in large-scale urban areas. The system employs pervasive probe vehicles that just report real-time states of position and speed from time to time. POVA has advantages of wide coverage and low deployment cost. The important observation motivating the design of POVA is that a traffic light has a considerable impact on mobility of vehicles on the road attached to the traffic light. However, the system design faces three unique challenges: 1) probe reports are by nature discrete while the goal of traffic light sensing is to determine the state of a traffic light at any time; 2) there may be a very limited number of probe reports in a given duration for traffic light state estimation; and 3) a traffic light may change its state with a variable interval. To tackle the challenges, we develop a new technique that makes the best use of limited probe reports as well as statistical features of light states. It first estimates the state of a traffic light at the time instant of a report by applying maximum a posteriori estimation. Then, we formulate the state estimation of a light at any time into a joint optimization problem that is solved by an efficient heuristic algorithm. We have implemented the system and tested it with a fleet of around 4,000 probe taxis and 2,000 buses in Shanghai, China. Trace-driven experimentation and field study show that nearly 60 percent of traffic lights have an estimation error lower than 19 percent if 20,000 probe vehicles would be employed in the urban area of Shanghai. We further demonstrate that the estimation error rate is as low as 18 percent even
ETPL PDS-103
Resisting Web Proxy-Based HTTP Attacks by Temporal and Spatial Locality Behavior
Abstract: A novel server-side defense scheme is proposed to resist the Web proxy-based distributed denial of service attack. The approach utilizes the temporal and spatial locality to extract the behavior features of the proxy-to-server traffic, which makes the scheme independent of the traffic intensity and frequently varying Web contents. A nonlinear mapping function is introduced to protect weak signals from the interference of infrequent large values. Then, a new hidden semi-Markov model parameterized by Gaussian-mixture and Gamma distributions is proposed to describe the time-varying traffic behavior of Web proxies. The new method reduces the number of parameters to be estimated, and can characterize the dynamic evolution of the proxy-to-server traffic rather than the static statistics. Two diagnosis approaches at different scales are introduced to meet the requirement of both fine-grained and coarse-grained detection. Soft control is a novel attack response method proposed in this work. It converts a suspicious traffic into a relatively normal one by behavior reshaping rather than rudely discarding. This measure can protect the quality of services of legitimate users. The experiments confirm the effectiveness of the proposed scheme ETPL PDS-103 Runtime Contention and Bandwidth-Aware Adaptive Routing Selection Strategies for Networks-on-Chip
Abstract: This paper presents adaptive routing selection strategies suitable for network-on-chip (NoC). The main prototype presented in this paper uses contention information and bandwidth space occupancy to make routing decision at runtime during application execution time. The performance of the NoC router is compared to other NoC routers with queue-length-oriented adaptive routing selection strategies. The evaluation results show that the contention- and bandwidth-aware adaptive routing selection strategies are better than the queue-length-oriented adaptive selection strategies. Messages in the NoC are switched with a wormhole cut-through switching method, where different messages can be interleaved at flit-level in the same communication link without using virtual channels. Hence, the head-of-line blocking problem can be solved effectively and efficiently. The routing control concept and the VLSI microarchitecture of the NoC routers are also presented in this paper. ETPL PDS-103 Self-Adaptive Contention Aware Routing Protocol for Intermittently Connected Mobile Networks
Abstract: This paper introduces a novel multicopy routing protocol, called Self-Adaptive Utility-based Routing Protocol (SAURP), for Delay Tolerant Networks (DTNs) that are possibly composed of a vast number of devices in miniature such as smart phones of heterogeneous capacities in terms of energy resources and buffer spaces. SAURP is characterized by the ability of identifying potential opportunities for forwarding messages to their destinations via a novel utility function-based mechanism, in which a suite of environment parameters, such as wireless channel condition, nodal buffer occupancy, and encounter statistics, are jointly considered. Thus, SAURP can reroute messages around nodes
ETPL PDS-103
Abstract: We propose a pervasive usage of the sensor network infrastructure as a cyber-physical system for navigating internal users in locations of potential danger. Our proposed application differs from previous work in that they typically treat the sensor network as a media of data acquisition while in our navigation application, in-situ interactions between users and sensors become ubiquitous. In addition, human safety and time factors are critical to the success of our objective. Without any preknowledge of user and sensor locations, the design of an effective and efficient navigation protocol faces nontrivial challenges. We propose to embed a road map system in the sensor network without location information so as to provide users navigating routes with guaranteed safety. We accordingly design efficient road map updating mechanisms to rebuild the road map in the event of changes in dangerous areas. In this navigation system, each user only issues local queries to obtain their navigation route. The system is highly scalable for supporting multiple users simultaneously. We implement a prototype system with 36 TelosB motes to validate the effectiveness of this design. We further conduct comprehensive and largescale simulations to examine the efficiency and scalability of the proposed approach under various environmental dynamics. ETPL PDS-103 The Bodyguard Allocation Problem
Abstract: In this paper, we introduce the Bodyguard Allocation Problem (BAP) game, that illustrates the behavior of processes with contradictory individual goals in distributed systems. In particular, the game deals with the conflict of interest between two classes of processes that maximize/minimize their distance to a special process called the root. A solution of the BAP game represents a rooted spanning tree in which there exists a condition of equilibrium with maximum social welfare. We analyze the inefficiency of equilibria of the game based on both a completely cooperative and noncooperative approach. Additionally, we design two algorithms, CBAP and DBAP, that provide approximated solutions for the BAP game. We prove that both algorithms always terminate in a configuration with equilibrium and we analyze their running time based on the approach of cooperation used. We perform experimental simulations to compare the overall quality of equilibria obtained by the proposed algorithms. ETPL PDS-103 A 3.42-Approximation Algorithm for Scheduling Malleable Tasks under Precedence Constraints
Abstract: Scheduling malleable tasks under general precedence constraints involves finding a minimum
Abstract: Multicore platforms are characterized by increasing variability and aging effects that imply heterogeneity in core performance, energy consumption, and reliability. In particular, wear-out effects such as negative-bias-temperature-instability require runtime adaptation of system resource utilization to time-varying and uneven platform degradation, so as to prevent premature chip failure. In this context, task allocation techniques can be used to deal with heterogeneous cores and extend chip lifetime while minimizing energy and preserving quality of service. We propose a new formulation of the task allocation problem for variability affected platforms, which manages per-core utilization to achieve a target lifetime while minimizing energy consumption during the execution of rate-constrained multimedia applications. We devise an adaptive solution that can be applied online and approximates the result of an optimal, offline version. Our allocator has been implemented and tested on real-life functional workloads running on a timing accurate simulator of a next-generation industrial multicore platform. We extensively assess the effectiveness of the online strategy both against the optimal solution and also compared to alternative state-of-the-art policies. The proposed policy outperforms state-of-the-art strategies in terms of lifetime preservation, while saving up to 20 percent of energy consumption without impacting timing constraints. ETPL PDS-103 An Efficient Penalty-Aware Cache to Improve the Performance of Parity-Based Disk Arrays under Faulty Conditions
Abstract: The buffer cache plays an essential role in smoothing the gap between the upper level computational components and the lower level storage devices. A good buffer cache management scheme should be beneficial to not only the computational components, but also the storage components by reducing disk I/Os. Existing cache replacement algorithms are well optimized for disks in normal mode, but inefficient under faulty scenarios, such as a parity-based disk array with faulty disk(s). To address this issue, we propose a novel penalty-aware buffer cache replacement strategy, named Victim Disk(s) First (VDF) cache, to improve the reliability and performance of a storage system consisting of a buffer cache and disk arrays. VDF cache gives higher priority to cache the blocks on the faulty disks when the disk array fails, thus reducing the I/Os addressed directly to the faulty disks. To verify the effectiveness of the VDF cache, we have integrated VDF into the popular cache algorithms least frequently used (LFU) and least recently used (LRU), named VDF-LFU and VDF-LRU, respectively. We have conducted intensive simulations as well as a prototype implementation for disk arrays to tolerate one disk failure (RAID-5) and two disk failures (RAID-6). The simulation results have shown that VDF-LFU can reduce disk I/Os to surviving disks by up to 42.3 percent in RAID-5 and 50.7 percent in RAID-6, and VDF-LRU can
Abstract: Wireless mesh networks (WMNs) have been deployed in many areas. There is an increasing demand for supporting a large number of mobile users in WMNs. As one of the key components in mobility management support, location management serves the purpose of tracking mobile users and locating them prior to establishing new communications. Previous dynamic location management schemes proposed for cellular and wireless local area networks (WLANs) cannot be directly applied to WMNs due to the existence of multihop wireless links in WMNs. Moreover, new design challenges arise when applying location management for silently roaming mobile users in the mesh backbone. Considering the number of wireless hops, an important factor affecting the performance of WMNs, we propose a DoMaIN framework that can help mobile users to decide whether an intra- or intergateway location update (LU) is needed to ensure the best location management performance (i.e., packet delivery) among dynamic location management solutions. In addition, by dynamically guiding mobile users to perform LU to a desirable location entity, the proposed DoMaIN framework can minimize the location management protocol overhead in terms of LU overhead in the mesh backbone. Furthermore, DoMaIN brings extra benefits for supporting a dynamic hop-based LU triggering method that is different from previous dynamic LU triggering schemes proposed for cellular networks and WLANs. We evaluate the performance of DoMaIN in different case studies using OPNET simulations. Comprehensive simulation results demonstrate that DoMaIN outperforms other location management schemes and is a satisfactory location management solution for a large number of mobile users silently and arbitrarily roaming under the wireless mesh backbone. ETPL PDS-103 Efficient Computation of Robust Average of Compressive Sensing Data in Wireless Sensor Networks in the Presence of Sensor Faults
Abstract: Wireless sensor networks (WSNs) enable the collection of physical measurements over a large geographic area. It is often the case that we are interested in computing and tracking the spatial average of the sensor measurements over a region of the WSN. Unfortunately, the standard average operation is not robust because it is highly susceptible to sensor faults and heterogeneous measurement noise. In this paper, we propose a computationally efficient method to compute a weighted average (which we will call the robust average) of sensor measurements, which appropriately takes sensor faults and sensor noise into consideration. We assume that the sensors in the WSN use random projections to compress the data and send the compressed data to the data fusion centre. Computational efficiency of our method is achieved by having the data fusion centre work directly with the compressed data streams. The key advantage of our proposed method is that the data fusion centre only needs to perform decompression once to compute the robust average, thus greatly reducing the computational requirements. We apply our proposed method to the data collected from two WSN deployments to demonstrate its efficiency and accuracy.
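The reason a fusion centre can work directly on compressed streams is that a random projection is linear, so a weighted average of projections equals the projection of the weighted average. The Python sketch below illustrates only that property. It assumes all sensors share one projection matrix `phi`, and the `weights` are hypothetical placeholders; the abstract does not say how the robust weights are derived.

```python
import random

def project(matrix, x):
    """Compress x with a linear random projection: y = matrix * x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def weighted_average(vectors, weights):
    """Weighted average of equal-length vectors (weights are normalised here)."""
    total = sum(weights)
    return [sum(w * v[k] for w, v in zip(weights, vectors)) / total
            for k in range(len(vectors[0]))]

rng = random.Random(0)
n, m = 8, 3  # illustrative signal length and compressed length
# Assumption: every sensor uses the same Gaussian projection matrix.
phi = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]

readings = [[rng.uniform(20, 25) for _ in range(n)] for _ in range(4)]
readings[3] = [v + 50 for v in readings[3]]  # simulate one faulty sensor
weights = [1.0, 1.0, 1.0, 0.01]              # hypothetical weights that down-weight the fault

compressed = [project(phi, x) for x in readings]
# Average computed in the compressed domain (no decompression needed).
avg_of_compressed = weighted_average(compressed, weights)
# Reference: average the raw readings first, then compress.
compressed_avg = project(phi, weighted_average(readings, weights))
```

Both routes give the same compressed vector up to floating-point error, so under these assumptions a single decompression of `avg_of_compressed` would suffice to recover the averaged signal.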
Abstract: Small talk is an important social lubricant that helps people, especially strangers, initiate conversations and make friends with each other in physical proximity. However, due to difficulties in quickly identifying significant topics of common interest, real-world small talk tends to be superficial. The mass popularity of mobile phones can help improve the effectiveness of small talk. In this paper, we present E-SmallTalker, a distributed mobile communications system that facilitates social networking in physical proximity. It automatically discovers and suggests topics such as common interests for more significant conversations. We build on Bluetooth Service Discovery Protocol (SDP) to exchange potential topics by customizing service attributes to publish non-service-related information without establishing a connection. We propose a novel iterative Bloom filter protocol that encodes topics to fit in SDP attributes and achieves a low false-positive rate. We have implemented the system in Java ME for ease of deployment. Our experiments on real-world phones show that it is efficient enough at the system level to facilitate social interactions among strangers in physical proximity. To the best of our knowledge, E-SmallTalker is the first distributed mobile system to achieve the same purpose.
ETPL PDS-103 Formal Specification and Runtime Detection of Dynamic Properties in Asynchronous Pervasive Computing Environments
Abstract: Formal specification and runtime detection of contextual properties is one of the primary approaches to enabling context awareness in pervasive computing environments. Due to the intrinsic dynamism of the pervasive computing environment, dynamic properties, which delineate concerns of context-aware applications on the temporal evolution of the environment state, are of great importance. However, detection of dynamic properties is challenging, mainly due to the intrinsic asynchrony among computing entities in the pervasive computing environment. Moreover, the detection must be conducted at runtime in pervasive computing scenarios, which renders existing schemes unworkable. To address these challenges, we propose the property detection for asynchronous context (PDAC) framework, which consists of three essential parts: 1) Logical time is employed to model the temporal evolution of environment state as a lattice. The active surface of the lattice is introduced as the key notion to model the runtime evolution of the environment state; 2) Specification of dynamic properties is viewed as a formal language defined over the trace of environment state evolution; and 3) The SurfMaint algorithm is proposed to achieve runtime maintenance of the active surface of the lattice, which further enables runtime detection of dynamic properties. A case study is conducted to demonstrate how the PDAC framework enables context awareness in asynchronous pervasive computing scenarios. The SurfMaint algorithm is implemented and evaluated over MIPA, the open-source context-aware middleware we developed. Performance measurements show the accuracy and cost-effectiveness of SurfMaint, even when faced with dynamic changes in the asynchronous pervasive computing environment.
ETPL PDS-103 GPUs as Storage System Accelerators
Abstract: Massively multicore processors, such as graphics processing units (GPUs), provide, at a comparable price, peak performance one order of magnitude higher than that of traditional CPUs. This drop in the cost of computation, like any order-of-magnitude drop in the cost per unit of performance for a class of system components, creates the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing
Abstract: Localization is of great importance in mobile and wireless network applications. Time Difference of Arrival (TDOA) is one of the widely used localization schemes, in which the target (source) emits a signal and a number of anchors (receivers) record the arrival time of the source signal. By calculating the time differences between receivers, the location of the target is estimated. In such a scheme, receivers must be precisely time synchronized. However, time synchronization adds computational cost and introduces errors that may lower localization accuracy. Previous studies have shown that existing time synchronization approaches using low-cost devices are insufficiently accurate, or even infeasible when high accuracy is required. In our scheme (called Whistle), several asynchronous receivers record a target signal and a successive signal that is generated artificially. With two-signal sensing and sample counting techniques, the time synchronization requirement can be removed while high time resolution is achieved. This design fundamentally changes TDOA by removing the synchronization requirement and avoiding many sources of error caused by time synchronization. We implement Whistle on commercial off-the-shelf (COTS) cell phones with acoustic signals and perform simulations with UWB signals. In particular, we use Whistle to localize nodes of large-scale wireless networks and achieve desirable results. Extensive real-world experiments and simulations show that Whistle can be widely used with good accuracy.
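As a rough illustration of the conventional TDOA principle described above (not of Whistle's two-signal scheme, whose details the abstract does not give), the Python sketch below estimates a source position from arrival-time differences by a coarse grid search. The anchor layout, the propagation speed `SPEED`, and the helper names `tdoas` and `locate` are all assumptions made for this example, and the receivers are assumed to be perfectly synchronized.

```python
import math

SPEED = 343.0  # assumed propagation speed in m/s (sound in air)

def tdoas(source, anchors):
    """Arrival-time differences of each anchor relative to anchors[0]."""
    times = [math.dist(source, a) / SPEED for a in anchors]
    return [t - times[0] for t in times[1:]]

def locate(measured, anchors, size=10.0, step=0.1):
    """Grid search for the point whose predicted TDOAs best match `measured`."""
    best, best_err = None, float("inf")
    n = int(round(size / step))
    for i in range(n + 1):
        for j in range(n + 1):
            p = (i * step, j * step)
            predicted = tdoas(p, anchors)
            err = sum((m - e) ** 2 for m, e in zip(measured, predicted))
            if err < best_err:
                best, best_err = p, err
    return best

# Hypothetical 10 m x 10 m area with one anchor in each corner.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_source = (3.0, 7.0)
estimate = locate(tdoas(true_source, anchors), anchors)
```

With noise-free measurements the search returns the grid point nearest the true source. Any clock offset between anchors would feed directly into `measured`, which is the error source the abstract says Whistle is designed to avoid.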
ETPL PDS-103
Abstract: Recent studies have shown that a significant portion of the total energy consumption of many data centers is caused by the inefficient operation of their cooling systems. Without effective thermal monitoring with accurate location information, the cooling systems often use unnecessarily low temperature set points to overcool the entire room, resulting in excessive energy consumption. Sensor network technology has recently been adopted for data-center thermal monitoring because of its nonintrusive nature for the already complex data center facilities and robustness to instantaneous CPU or disk activities. However, existing solutions place sensors in a simplistic way without considering the thermal dynamics in data centers, resulting in unnecessarily degraded hot server detection probability. In this paper, we first formulate the problems of sensor placement for hot server detection in a data center as constrained optimization problems in two different scenarios. We then propose a novel placement scheme based on computational fluid dynamics (CFD) to take various factors, such as cooling systems and server layout, as inputs to analyze the thermal conditions of the data center. Based on the CFD analysis in
Abstract: One of the most appealing characteristics of unstructured P2P overlays is their enhanced self-* properties, which result from their loose, random structure. In addition, most of the algorithms which make searching in unstructured P2P systems scalable, such as dynamic querying and 1-hop replication, rely on the random nature of the overlay to function efficiently. The underlying communications network (i.e., the Internet), however, is not as randomly constructed. This leads to a mismatch between the distance of two peers on the overlay and the distance of the hosts they reside on at the IP layer, which in turn leads to misuse of the underlying network. The crux of the problem is that any effort to provide a better match between the overlay and the IP layer will inevitably reduce the random structure of the P2P overlay, with many adverse results. With this in mind, we propose ITA, an algorithm which creates a random overlay of randomly connected neighborhoods, providing topology awareness to P2P systems while having no negative effect on the self-* properties or the operation of the other P2P algorithms. Using extensive simulations, both at the IP router level and autonomous system level, we show that ITA reduces communication latencies by as much as 50 percent. Furthermore, it not only reduces by 20 percent the number of IP network messages, which is critical for ISPs carrying the burden of transporting P2P traffic, but also distributes the traffic load more evenly on the routers of the IP network layer.
ETPL PDS-103 K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps
Abstract: We present an implementation of parallel $K$-means clustering, called $K_{ps}$-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element to performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups of up to 200-fold over CPU reference code and of up to three orders of magnitude when compared with popular numerical software packages.
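One way sorting and prefix sums can serve the $K$-means update step is sketched below in sequential Python. This is an assumption-laden illustration, not the authors' GPU kernels: points are sorted by cluster label so that each cluster occupies a contiguous segment, and each centroid is then read off as a difference of prefix sums. The function names `assign` and `update` are hypothetical.

```python
from itertools import accumulate

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points]

def update(points, labels, old_centroids):
    """Recompute centroids via sort-by-label followed by prefix sums."""
    k, dims = len(old_centroids), len(points[0])
    # Sorting by label makes every cluster a contiguous segment.
    order = sorted(range(len(points)), key=lambda i: labels[i])
    # Per-dimension inclusive prefix sums over the sorted points, with a leading 0.
    prefix = [[0.0] + list(accumulate(points[i][d] for i in order))
              for d in range(dims)]
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    ends = list(accumulate(counts))  # segment end offsets in the sorted order
    new_centroids = []
    for c in range(k):
        start, end = ends[c] - counts[c], ends[c]
        if counts[c] == 0:
            new_centroids.append(old_centroids[c])  # keep an empty cluster's centroid
        else:
            new_centroids.append(tuple(
                (prefix[d][end] - prefix[d][start]) / counts[c]
                for d in range(dims)))
    return new_centroids

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign(points, centroids)
centroids = update(points, labels, centroids)
```

On a parallel architecture the sort and the prefix sums (scans) are both standard data-parallel primitives, which is presumably why this formulation suits GPUs; the loop structure here is purely for readability.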
ETPL PDS-103
Abstract: LU factorization with partial pivoting is a canonical numerical procedure and the main component of the high performance LINPACK benchmark. This paper presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. The difficulty of implementing the algorithm for such a system lies in the disproportion between the computational power of the CPUs, compared to the GPUs, and in the meager bandwidth of the
Abstract: Partition enforcement policy is essential in cache partitioning, and its main function is to protect the lines and retain the cache quota of each core. This paper focuses on line protection based on a line's generation time rather than the CPU core ID that it belongs to or its position in the replacement stack. The basic idea is that when a line is live, it must be protected and retained in the cache; when the line is "dead," it needs to be evicted as early as possible. Therefore, each line is augmented with a live-time protected counter (LvtP, four bits) to trace its live time. Moreover, dead blocks are predicted according to the access event sequence. This paper presents a pseudopartition approach, LvtPPP, and proposes a two-cascade victim selection mechanism to alleviate dead blocks based on the LRU replacement policy and the LvtP counter. LvtPPP also supports flexible handling of allocation deviation by introducing a parameter $\lambda$ to adjust the generation time of the line. Evaluation results based on Simics show that LvtPPP significantly improves performance and fairness over PIPP and UCP.
ETPL PDS-103 Modeling Propagation Dynamics of Social Network Worms
Abstract: Social network worms, such as email worms and Facebook worms, pose a critical security threat to the Internet. Modeling their propagation dynamics is essential to predict their potential damage and develop countermeasures. Although several analytical models have been proposed for modeling propagation dynamics of social network worms, two critical problems remain unsolved: temporal dynamics and spatial dependence. First, previous models have not taken into account the different time periods at which Internet users check emails or social messages, namely, temporal dynamics. Second, the problem of spatial dependence results from the improper assumption that the states of neighboring nodes are independent. These two problems seriously affect the accuracy of the previous analytical models. To address these two problems, we propose a novel analytical model. This model implements a spatial-temporal synchronization process, which is able to capture the temporal dynamics. Additionally, we find that the essence of spatial dependence is the spreading cycles. By eliminating the effect of these cycles, our model overcomes the computational challenge of spatial dependence and provides a stronger approximation to the propagation dynamics. To evaluate our susceptible-infectious-immunized (SII) model, we conduct both theoretical analysis and extensive simulations. Compared with previous epidemic models and the spatial-temporal model, the experimental results show that our SII model achieves greater accuracy. We also compare our model with the susceptible-infectious-susceptible and susceptible-infectious-recovered models. The results show that our model is more suitable for modeling the propagation of social network worms.
Abstract: The analysis of real-world complex networks has been the focus of recent research. Detecting communities helps in uncovering their structural and functional organization. Valuable insight can be obtained by analyzing the dense, overlapping, and highly interwoven $k$-clique communities. However, their detection is challenging due to extensive memory requirements and execution time. In this paper, we present a novel, parallel $k$-clique community detection method, based on an innovative technique which enables connected components of a network to be obtained from those of its subnetworks. The novel method has an unbounded, user-configurable, and input-independent maximum degree of parallelism, and hence is able to make full use of computational resources. Theoretical tight upper bounds on its worst case time and space complexities are given as well. Experiments on real-world networks such as the Internet and the World Wide Web confirmed the almost optimal use of parallelism (i.e., a linear speedup). Comparisons with other state-of-the-art $k$-clique community detection methods show dramatic reductions in execution time and memory footprint. An open-source implementation of the method is also made publicly available.
ETPL PDS-103 Scalable Hypergrid k-NN-Based Online Anomaly Detection in Wireless Sensor Networks
Abstract: Online anomaly detection (AD) is an important technique for monitoring wireless sensor networks (WSNs), which protects WSNs from cyberattacks and random faults. As a scalable and parameter-free unsupervised AD technique, the $k$-nearest neighbor (kNN) algorithm has attracted a lot of attention for its applications in computer networks and WSNs. However, the lazy-learning nature of kNN makes kNN-based AD schemes difficult to use in an online manner, especially when communication cost is constrained. In this paper, a new kNN-based AD scheme based on hypergrid intuition is proposed for WSN applications to overcome the lazy-learning problem. By redefining the anomaly detection region (DR) from a hypersphere to a hypercube, the computational complexity is reduced significantly. At the same time, an attached coefficient is used to convert a hypergrid structure into a positive coordinate space in order to retain the redundancy for online update and to tailor it for bit operations. In addition, distributed computing is taken into account, and the position of the hypercube is encoded by only a few bits using bit operations. As a result, the new scheme is able to work successfully in any environment without human intervention. Finally, experiments with a real WSN data set demonstrate that the proposed scheme is effective and robust.
ETPL PDS-103 Task Allocation for Undependable Multiagent Systems in Social Networks
Abstract: Task execution of multiagent systems in social networks (MAS-SN) can be described through agents' operations when accessing necessary resources distributed in the social networks; thus, task allocation can be implemented based on the agents' access to the resources required for each task, with the aim of minimizing this resource access time. Currently, in undependable MAS-SN, there are deceptive agents that may fabricate their resource status information during task allocation but not actually contribute resources to task execution. Although there are some game theory-based solutions for undependable MAS, they do not consider minimizing resource access time, which is crucial to the performance of task execution in social networks. To achieve dependable resources with the least access time to execute tasks in undependable MAS-SN, this paper presents a novel task allocation model based on the negotiation
Abstract: In this paper, we conduct an in-depth study on the feasibility of using network coding to ameliorate the content availability of BitTorrent swarms. We first perform mathematical analysis on the potential improvement in the content availability and bandwidth utilization induced by two existing network coding schemes. It is found that these two coding schemes either incur a very high coding complexity and disk operation overhead or cannot effectively leverage the potential of improving the content availability. In this regard, we propose a simple sparse network coding scheme in which both the drawbacks mentioned before are precluded. To accommodate the proposed coding scheme into BitTorrent, a new block scheduling algorithm is also developed based on the original rarest-first block scheduling policy of BitTorrent. Through extensive simulations and performance evaluations, we show that the proposed coding scheme is very effective in terms of improving the content availability of BitTorrent swarms when compared with some existing methods.
ETPL PDS-103 Virtual Batching: Request Batching for Server Energy Conservation in Virtualized Data Centers
Abstract: Many power management strategies have been proposed for enterprise servers based on dynamic voltage and frequency scaling (DVFS), but those solutions cannot further reduce the energy consumption of a server when the server processor is already at the lowest DVFS level and the server utilization is still low (e.g., 10 percent or lower). To achieve improved energy efficiency, request batching can be conducted to group received requests into batches and put the processor into sleep between the batches. However, it is challenging to perform request batching on a virtualized server because different virtual machines on the same server may have different workload intensities. Hence, putting the shared processor into sleep may severely impact the application performance of all the virtual machines. This paper proposes Virtual Batching, a novel request batching solution for virtualized servers with primarily light workloads. Our solution dynamically allocates CPU resources such that all the virtual machines can have approximately the same performance level relative to their allowed peak values. Based on this uniform level, Virtual Batching determines the time length for periodically batching incoming requests and putting the processor into sleep. When the workload intensity changes from light to moderate, request batching is automatically switched to DVFS to increase processor frequency for performance guarantees. Virtual Batching is also extended to integrate with server consolidation for maximized energy conservation with performance guarantees for virtualized data centers. Empirical results based on a hardware testbed and real trace files show that Virtual Batching can achieve the desired performance with more energy conservation than several well-designed baselines, e.g., 63 percent more, on average, than a solution based on DVFS only