EAFR: An Energy-Efficient Adaptive File Replication System in Data-Intensive Clusters (IEEE 2017 – 2018)

Abstract:

In data-intensive clusters, a large number of files are stored, processed and transferred simultaneously. To increase data availability, some file systems create and store three replicas of each file on randomly selected servers across different racks. However, they neglect file heterogeneity and server heterogeneity, which can be leveraged to further enhance data availability and file system efficiency (in terms of replication delay and request response delay). Because files have heterogeneous popularities, a rigid number of three replicas may not provide an immediate response to an excessive number of read requests to hot files, and it wastes resources (including energy) on replicas of cold files that receive few read requests. Also, because servers are heterogeneous in network bandwidth, hardware configuration and capacity (i.e., the maximal number of service requests that can be supported simultaneously), it is crucial to select replica servers so as to ensure low replication delay and request response delay. In this paper, we propose an Energy-Efficient Adaptive File Replication System (EAFR), which incorporates three components. First, it adapts to time-varying file popularities to achieve a good tradeoff between data availability and efficiency: higher popularity of a file leads to more replicas, and vice versa. Second, to achieve energy efficiency, servers are classified into hot servers and cold servers with different energy consumption, and cold files are stored on cold servers. Third, EAFR selects a server with sufficient resources (including network bandwidth and capacity) to hold a replica. Experimental results on a real-world cluster show the effectiveness of EAFR in reducing file read latency, replication time, and power consumption in large clusters.
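The three ideas in the abstract (popularity-driven replica counts, hot/cold server classes, and capacity-aware server selection) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the thresholds, the linear scaling rule, and the names `Server`, `target_replicas` and `pick_server` are hypothetical choices made only for illustration. The base of three replicas is the one figure taken from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional

BASE_REPLICAS = 3  # default replica count mentioned in the abstract


@dataclass
class Server:
    name: str
    is_hot: bool   # assumed flag: hot servers run at full power, cold ones in a low-power mode
    capacity: int  # maximal number of requests served simultaneously
    load: int = 0  # requests currently being served

    def spare(self) -> int:
        """Remaining request slots on this server."""
        return self.capacity - self.load


def target_replicas(reads_per_interval: int,
                    hot_threshold: int = 100,    # assumed cutoff for a "hot" file
                    reads_per_extra: int = 50,   # assumed reads that justify one more replica
                    max_replicas: int = 10) -> int:
    """Hypothetical rule: cold files keep the base count, hot files gain replicas with popularity."""
    if reads_per_interval < hot_threshold:
        return BASE_REPLICAS
    extra = (reads_per_interval - hot_threshold) // reads_per_extra + 1
    return min(BASE_REPLICAS + extra, max_replicas)


def pick_server(servers: List[Server], file_is_hot: bool, needed: int = 1) -> Optional[Server]:
    """Choose a server of the matching class (hot file -> hot server, cold file -> cold server)
    that has at least `needed` spare slots, preferring the one with the most spare capacity."""
    candidates = [s for s in servers if s.is_hot == file_is_hot and s.spare() >= needed]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.spare())


if __name__ == "__main__":
    cluster = [
        Server("a", is_hot=True, capacity=10, load=9),
        Server("b", is_hot=True, capacity=10, load=2),
        Server("c", is_hot=False, capacity=5),
    ]
    print(target_replicas(250))            # replica count for a heavily read file
    print(pick_server(cluster, True).name)  # hot file goes to the least-loaded hot server
```

A real system would re-evaluate popularity periodically and would also weigh network bandwidth when choosing a server. The sketch tracks only request slots to stay short.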

