Dynamic Optimization of Multiattribute Resource Allocation in Self-Organizing Clouds

Dynamic Optimization of Multiattribute
Resource Allocation in Self-Organizing Clouds
Sheng Di, Member, IEEE, and Cho-Li Wang, Member, IEEE
Abstract—By leveraging virtual machine (VM) technology which provides performance and fault isolation, cloud resources can be
provisioned on demand in a fine grained, multiplexed manner rather than in monolithic pieces. By integrating volunteer computing into
cloud architectures, we envision a gigantic self-organizing cloud (SOC) being formed to reap the huge potential of untapped commodity
computing power over the Internet. Toward this new architecture where each participant may autonomously act as both resource
consumer and provider, we propose a fully distributed, VM-multiplexing resource allocation scheme to manage decentralized
resources. Our approach not only achieves maximized resource utilization using the proportional share model (PSM), but also delivers
provably and adaptively optimal execution efficiency. We also design a novel multiattribute range query protocol for locating qualified
nodes. Contrary to existing solutions which often generate bulky messages per request, our protocol produces only one lightweight
query message per task on the Content Addressable Network (CAN). It works effectively to find for each task its qualified resources
under a randomized policy that mitigates the contention among requesters. We show the SOC with our optimized algorithms can make
an improvement by 15-60 percent in system throughput than a P2P Grid model. Our solution also exhibits fairly high adaptability in a
dynamic node-churning environment.
Index Terms—Cloud computing, VM-multiplexing resource allocation, convex optimization, P2P multiattribute range query
Ç
1 INTRODUCTION
CLOUD computing has emerged as a compelling paradigm
for deploying distributed services. Resource
allocation problem in cloud systems emphasizes how to
harness the multiattribute resources by multiplexing operating
systems. With virtual machine (VM) technology [1],
we are able to multiplex several operating systems on the
same hardware and allow task execution over its VM
substrates without performance interference. Fine-grained
resource sharing can be achieved as each VM substrate can
be configured with proper shares of resources (such as
CPU, memory, storage, network bandwidth) dynamically.
In recent years, various enhancements on resource
isolation techniques [2], [13], [8] have been proposed to
achieve fine-grained dynamic resource provisioning. A
proportional share scheduler can be implemented based on
Xen’s credit scheduler [14] to multiplex CPU resource among
virtual machines in a fair manner. The balloon driver [15],
difference engine [10], joint-VM [8], and virtual putty [9], can
dynamically adjust the memory resource among colocated
virtual machines. The dm-ioband [16] can dynamically control
the usage of disk I/O bandwidth among colocated virtual
machines. These advanced techniques enable computing
resources to be dynamically partitioned or reassembled to
meet the elastic needs of end users. Such solutions create an
unprecedented opportunity to maximize resource utilization,
which were not possibly applied in most Grid systems
[34], [36], [19], [37], [38] that usually treat the underlying
resources as indivisible ones and prevent simultaneous
access to them.
Today’s cloud architectures are not without problems.
Most cloud services built on top of a centralized architecture
may suffer denial-of-service (DoS) attacks [3], unexpected
outages, and limited pooling of computational
resources. On the contrary, volunteer computing systems
(or Desktop Grids) can easily aggregate huge potential
computing power to tackle grand challenge science problems
[4]. In view of this, we propose a novel cloud
architecture, namely self-organizing cloud (SOC), which can
connect a large number of desktop computers on the
Internet by a P2P network. In SOC, each participating
computer acts as both a resource provider and a resource
consumer. They operate autonomously for locating nodes
with more abundant resource or unique services in the
network to offload some of their tasks, meanwhile they
could construct multiple VM instances for executing tasks
submitted from others whenever they have idle resources.
We focus on two key issues in the design of SOC: 1) the
multiattribute range query problem in a fully decentralized
environment for locating a qualified node to satisfy a user
task’s resource demand with bounded delay and 2) how to
optimize a task’s execution time by determining the optimal
shares of the multiattribute resources to allocate to the tasks
with various QoS constraints, such as the expected
execution time and limited budget.
As a fundamental difference to existing approaches, we
formulate such a resource allocation problem to be a convex
optimization problem [23]. Given a task with its resource
requirements and a budget, we first prove that the optimal
resource allocation on a qualified node that can minimize a
464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013
. S. Di is with the MESCAL Group, INRIA, Grenoble, Room 213,
Laboratoire LIG, ENSIMAG, antenne de Montbonnot, Zirst, 51, avenue
Jean Kuntzmann, 38330 Montbonnot Saint Martin, France.
E-mail: disheng222@gmail.com.
. C.-L. Wang is with the Department of Computer Science, The University of
Hong Kong, Pokfulam Road, Hong Kong. E-mail: clwang@cs.hku.hk.
Manuscript received 24 Sept. 2010; revised 2 Apr. 2012; accepted 11 Apr.
2012; published online 14 May 2012.
Recommended for acceptance by L.E. Li.
For information on obtaining reprints of this article, please send e-mail to:
tpds@computer.org, and reference IEEECS Log Number TPDS-2011-09-0722.
Digital Object Identifier no. 10.1109/TPDS.2012.144.
1045-9219/13/$31.00  2013 IEEE Published by the IEEE Computer Society
task’s execution time does exist. We further show that it is
nontrivial to solve such a convex optimization problem
directly via a brute-force strategy and the interior point
method [23]. By relaxing the problem definition, we
propose an algorithm to optimize the task execution time
on a qualified resource node, given its preset budget and
tolerable quality of service (QoS). The proposed algorithm
involves only OðR2Þ adjustment steps, where R denotes the
number of resource attributes (or dimensions). We further
propose a dynamic optimal proportional-share (DOPS) resource
allocation algorithm with OðR3Þ complexity, by
incorporating the proportional-share model (PSM) [12]. The
key idea is to dynamically scale the amount of resources at
each dimension among running tasks proportional to their
demand, such that these tasks could use up the maximum
capacity of each resource type at a node.
To locate qualified nodes in the SOC environment, we
design a fully decentralized range query protocol, namely
pointer-gossiping CAN (PG-CAN), tailored for DOPS. Existing
P2P desktop Grids favor CAN-based [17] or Chord-based
[18] resource discovery protocols [19], [20]. Every joining
node registers its static resource attributes (e.g., CPU
architecture, OS version) or maximum capacity on the
CAN/Chord overlay, so that other users could find the most
matched node within a logarithmic (or sublinear) number of
routing steps. Such a design is feasible for a P2P desktop Grid
because the resources of a selected node can only be used
exclusively by a single task. However, due to dynamic
resource provisioning technologies used in cloud, the
frequent resource repartitioning and reallocation (e.g., upon
task arrival or completion) make it a challenging problem to
locate a node containing a combination of available resources
along all the R resource attributes that would satisfy the
requirements of a submitted task.
The proposed PG-CAN range query protocol in this
work aims to find the qualified resources with minimized
contention among requesters based on task’s demand. It is
unique in that for each task, there is only one query message
propagated in the network during the entire course of
discovery. This is different from most existing multiattribute
range query solutions that require to propagate
multiple subqueries along multiple dimensions in parallel.
To mitigate the contention problem due to analogous
queries in CAN, our range query protocol proactively
diffuses resource indexes over the network and randomly
route query messages among nodes to locate qualified ones
that satisfy tasks’ minimal demands. To avoid possibly
uneven load distribution and abrupt resource overutilization
caused by uncoordinated node selection process from
autonomous participants, we investigate three node selection
policies, namely double-check policy [21], queuing
policy [22], and extra-virtual-dimension (VD) policy [19].
The rest of the paper is organized as follows: We
formulate the resource allocation problem in a VM-multiplexing
environment in Section 2. In Section 3, we prove
that optimal resource allocation does exist and show that
our solution is optimal. In Section 4, we present our DOPS
resource allocation scheme. Section 5 details the proposed
range query protocol. In Section 6, we show the simulation
results. We discuss related work in Section 7 and conclude
with an outline of future work in Section 8.
2 PROBLEM FORMULATION
Fig. 1 shows the entire journey of a task from its submission
to completion over the SOC system. In this work, we only
focus on the multiattribute range query problem (Step 2)
and the resource allocation problem for determining the
amount of resources of a qualified node to the submitted
task (Step 4).
Suppose there are n nodes in SOC, each is denoted as pi,
where 1  i n. Each node owns R different resources (or
resource attributes)1 managed by a Virtual Machine Monitor
(VMM). We denote  to be the set of resource attributes
owned by node pi and cðpiÞ ¼ ðc1ðpiÞ; c2ðpiÞ; . . .; cRðpiÞÞT as
pi’s capacity vector. For example, if a computer owns a
2.4 Gflops single-core CPU, 1 GB memory, and a 10 Mbps
network bandwidth, its capacity vector is (2.4, 1, 10)T .
Let mi denote the total number of tasks submitted to pi.
A task submitted to node pi is denoted as tij, where
1  j  mi. Each task is associated with an expected resource
vector eðtijÞ¼ ðe1ðtijÞ; e2ðtijÞ; . . . ; eRðtijÞÞT . The user-specified
expected resource vector is a rough estimation of the needed
amount of resources with respect to the R resource
attributes for a submitted task to be completed within a
tolerable execution time. After tij gets scheduled, we denote
its actual allocated resource as rðtijÞ ¼ ðr1ðtijÞ; r2ðtijÞ; . . . ;
rRðtijÞ)T , where rðtijÞ eðtijÞ. Here,  means the componentwise
inequality between two vectors. For short, we
denote rðtijÞ and rkðtijÞ as r and rk, respectively, in the case
without causing ambiguity.
Each task has a load vector, denoted as lðtijÞ ¼ ðl1ðtijÞ;
l2ðtijÞ; . . . ; lRðtijÞÞT , indicating the amount of workload on
each of the R resource attributes for completing the task.
For simplicity, we assume the execution of a task cannot be
done concurrently among different resources at the same
node. Hence, if a task tij is executed at pi, its execution time
is equal to lðtijÞT rðtijÞ1, where rðtijÞ1 ¼ ðr1
1 ðtijÞ; r1
2 ðtijÞ;
. . . ; r1
R ðtijÞÞT .
We define a preferential weight vector wðtijÞ ¼ ðw1ðtijÞ;
w2ðtijÞ; . . . ; wRðtijÞÞT , satisfying w1ðtijÞ : w2ðtijÞ :    :
wRðtijÞ ¼ l1ðtijÞ : l2ðtijÞ :    : lRðtijÞ, which indicates the relative
importance of a resource that might affect the
execution time of a task according to its property (e.g.,
CPU bound or IO bound). In essence, wðtijÞ acts as a more
relaxed requirement in using our model as we assume the
user does not know the exact load vector, but only needs to
specify the preferential weight vector.
DI AND WANG: DYNAMIC OPTIMIZATION OF MULTIATTRIBUTE RESOURCE ALLOCATION IN SELF-ORGANIZING CLOUDS 465
Fig. 1. The entire task execution procedure.
1. The R resource attributes can also be viewed as R dimensions, so we
use attributes and dimensions interchangeably in the following text.
To mimic the pricing scheme in a real-world cloud, we
follow a simple monetary model to analyze the economic
implications between consumers and providers. For any
node pi, its resource price vector is denoted as bðpiÞ ¼ ðb1ðpiÞ;
b2ðpiÞ; . . . ; bRðpiÞÞT , which is designated by the resource
provider. Let bkðpiÞð1  k  RÞ represent the per-time-unit
price for using the kth resource attribute on pi. Thus, to run
a task tij on node ps, the total payment is calculated as (1),
where t is the execution time of tij on ps
ðtij;tÞ ¼ t  bðpsÞT  rðtijÞ: ð1Þ
In our model, the user is “satisfactory” if the per-timeunit
rate, i.e., the total of per-time-unit cost for using the R
resource attributes at a selected node (denoted as BðtijÞ), is
always in accordance with Inequality (3). We argue that
users can hardly predict their tasks’ exact execution times in
practice. However, the users would still feel worthy as long
as the per-time-unit rate is still within their budget.
Given a submitted task tij with designated eðtijÞ and
wðtijÞ, this work investigates two key issues:
1. How to efficiently locate a qualified node in a largescale
peer-to-peer network for executing the
submitted task with controllable message delivery
overhead.
2. How to optimize tij’s execution time (i.e., (2)) by
determining the optimal amount of resources (i.e.,
rðÞ) to allocate to tij, subject to the constraints (3), (4),
and (5), where node ps’s available resource vector a(ps)
¼ cðpsÞ 
P
tij on ps
rðtijÞ
Min fðrðtijÞÞ ¼ lT ðtijÞ  r1ðtijÞ ¼
XR
k¼1
lk
rk
; ð2Þ
s.t.
bðpsÞT  rðtijÞ  BðtijÞ; ð3Þ
w1ðtijÞ :    : wRðtijÞ ¼ l1ðtijÞ :    : lRðtijÞ; ð4Þ
eðtijÞ  rðtijÞ  aðpsÞ: ð5Þ
We summarize the key notations in Table 1. In the
following text, we might omit the symbols tij and ps for
simplicity if thus would not lead to the ambiguity. For
example, the notations lkðtijÞ; ekðtijÞ; rðtijÞ; bkðpsÞ;BðtijÞ, and
aðpsÞ may be substituted by lk; ek; r; bk;B, and a, respectively,
in some following expressions or formulas.
3 OPTIMAL RESOURCE ALLOCATION
Given a task tij with its weight vector wðtijÞ and a budget
BðtijÞ, we first prove that the optimal resource allocation on
a qualified node ps with its price vector bðpsÞ does exist.
Lemma 1. The optimal allocation (denoted r(tij)) exists iff
(i.e., ()) Inequalities (6) and (7) are met
bðpsÞT  eðtijÞ  BðtijÞ; ð6Þ
eðtijÞ  aðpsÞ: ð7Þ
Proof. To prove ) : (transport property of inequalities)
If r(tij) exists, it must satisfy Inequalities (3) and (5), thus
the Inequalities (6) and (7) should hold.
To prove ( : (to satisfy Slater’s condition [23])
If bðpsÞT  eðtijÞ ¼ BðtijÞ or eðtijÞ ¼ aðpsÞ, then eðtijÞ is a
unique solution, which can be regarded as an optimal one.
If bðpsÞT  eðtijÞ < BðtijÞ and eðtijÞ aðpsÞ, other than eðtijÞ, there must exist another better solution (denoted r0ðtijÞÞ such that bðpsÞT  r0ðtijÞ < BðtijÞ and eðtijÞ  r0ðtijÞ  aðpsÞ, thus rðtijÞ must exist according to Slater’s condition [23]. Similarly, if bðpsÞT  eðtijÞ < BðtijÞ and eðtijÞ  aðpsÞ, Slater’s condition can also hold by excluding the equations from the constraints (7). tu We assume the qualified node ps that satisfies Inequalities (6) and (7) can be found based on tij’s expected resource vector eðtijÞ by a resource discovery protocol (to be discussed in Section 5). Thus, we could rewrite the constraint (5) to be Inequality (8) and construct a Lagrangian function F1ðrðtijÞÞ (i.e., (9)), where  and 1; 2; . . . ; R are the Lagrangian multipliers ðrk  ekÞðrk  akÞ  0; k ¼ 1; 2; . . .;R; ð8Þ F1ðrÞ ¼ XR k¼1 lk rk þ  XR k¼1 bkrk  B ! þ XR k¼1 kðrk  ekÞðrk  akÞ: ð9Þ As analyzed previously, the optimal resource allocation (r) does exist. Therefore, the optimal solution must satisfy Karush-Kuhn-Tucker (KKT) conditions [23], listed below bbT  r  B r k  ek  r k  ak   0 k ¼ 1; 2; . . .;R  0;   0   ðbT  r  BÞ ¼ 0 k   r k  ek  r k  ak  ¼ 0 k ¼ 1; 2; . . .;R  lk r2 k þ   bk þ k  2r k  ek  ak  ¼ 0 k ¼ 1; 2; . . .;R: 8>>>>>>>>>>< >>>>>>>>>>:
ð10Þ
That is, the optimal resource vector r could be found as
long as we could satisfy the above conditions simultaneously.
In order to solve the above simultaneous equations
466 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013
TABLE 1
Key Notations Used in SOC Model
and inequalities, there are two traditional candidate
strategies: 1) the brute-force method and 2) the interior
point method. Based on the brute-force method, we can first
focus on the fifth formula, which involves R equations
(kðr
k  ekÞðr
k  akÞ ¼ 0, where k ¼ 1; 2,. . . ,R) and 2R variables
(r
1 ; r
2; . . . ; r
R; 1; 2; . . . ; R). Each equation (e.g.,
kðr
k  ekÞðr
k  akÞ ¼ 0) holds, if and only if one of the
three conditions is valid, i.e., k ¼ 0; rk ¼ ek, or rk ¼ ak. For
those combinations with their Lagrangian multipliers being
equal to zero (e.g., k ¼ 0), we still need to calculate their
resource assignments (i.e., rk) based on the fourth formula
and the sixth formula. For all of the R equations
(k  ðr
k  ekÞðr
k  akÞ ¼ 0, where k ¼ 1; 2,. . . ,R), there are
totally 3R combinations that could make them simultaneously
hold, so the overall time complexity is O(3R), which
is intolerable with a large number of dimensions. Based on
the interior point method (or Newton’s method), we need to
guess a set of initial values for the resource vector r and try
to guarantee the method is converged with them, which is
complex especially because of a large number (2R þ 1) of
variables, and its computation result will only be an
approximate value. Consequently, this problem seems
unsolvable based on the two traditional methods.
By revisiting Constraint (5), it is clear that aðtijÞ is a
“firm” bound while eðtijÞ is a “soft” bound. That is, rðtijÞ
cannot be greater than aðtijÞ anyhow due to the limited
resource capacity. But it can be lower than eðtijÞ because
eðtijÞ was not a strict limitation but just estimated by users.
If we replace Constraint (5) with Constraint (11), we could
find an optimal solution through a few convex optimization
steps. That is, via such a constraint relaxation, we could
optimize the resource allocation for task tij on node ps
without exhausting all 3R possible combinations like the
brute-force method or worrying about the convergence
problem in the interior point method
rðtijÞ  aðpsÞ: ð11Þ
In the following text, we discuss the situation without
Constraint (11) via convex optimization analysis, and then
derive the optimal algorithm for the case with the constraint.
Theorem 1. In order to minimize fðrðtijÞÞ subject to Constraints
(3) and (4), tij’s optimal resource vector rðÞðtijÞ is shown as
(12), where k ¼ 1; 2; . . .;R. Note that rðÞðtijÞ is not subject to
Inequality (5) or (11), unlike the notation rðtijÞ
rðÞ
k ðtijÞ ¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
wkðtijÞ=bkðpsÞ
p
PR
k¼1
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
wkðtijÞbkðpsÞ
p  BðtijÞ: ð12Þ
Proof. We first prove that the target function fðrÞ is convex,
then find the optimal rðÞ via convex optimization.
Since @2fðrÞ
@rk
¼ 2 lk
r3
k
> 0; f(r) is convex with a minimum
extreme point. Then, the target Lagrangian function can
be defined as (13) and  is the Lagrangian multiplier
F2ðrÞ ¼
l1
r1
þ  þ
lR
rR
þ ðb1r1 þ  þbRrR  BÞ: ð13Þ
Let @F2ðrÞ
@rk
¼ 0, then we could get (14), where k ¼ 1;
2; . . .;R. Accordingly, we can get (15)
lk=r2
k ¼ bk; ð14Þ
r1 : r2 :    : rR ¼
ffiffiffiffi
l1
b1
r
:
ffiffiffiffi
l2
b2
r
:    :
ffiffiffiffiffi
lR
bR
s
: ð15Þ
In order to minimize fðrÞ, the optimal resource vector
rðÞ should satisfy bT  r ¼ B. By combining this equation
with (4) and (15), we could get (12). tu
Remark. 1) Based on (15), the relative amount of resource
(rk) to be allocated to a task is determined by
ffiffiffi
lk
bk
q
. As
long as the users and resource providers express their
workload (lk) and price (bk) based on the same units of
measurement (e.g., Gflops or Gflops/$), the ratio
remains unchanged. 2) Formula (12) can be used to
derive the resource vector rðÞ for tij such that its
execution time can be minimized within a budget limit
(i.e., (3)). Based on (12) and the proof of Theorem 1, rðÞ
can be easily computed in O(R) time. 3) As mentioned,
rðÞðtijÞ is not subject to Inequality (5) or (11), but can be
regarded as an optimal resource allocation as long as it
satisfies Constraint (11), i.e., 8k: rðÞ
k  akðpsÞ. However, if
rðÞ does not fully satisfy Constraint (11) (i.e., 9k:
rðÞ
k > akðpsÞ), rðÞ is not a feasible solution. Therefore,
we propose Algorithm 1 with a provable time complexity
O(R2) to enforce Constraint (11), while minimizing
fðrðtijÞÞ under the two additional constraints (3) and (4).
Definition 1 (CO-STEP (; C)). Let  denote a subset of
resource attributes  (i.e., 
), while rðtijÞ and bT
 ðpsÞ
denote the resource vector assigned to tij and the price vector
specified by ps w.r.t. , respectively. Given a budget C, COSTEP
(; C) is a procedure for computing the optimal resource
vector for tij w.r.t. , which minimizes fðrðtijÞÞ, subject to
Constraints (4) and (16) but excluding Constraint (11) by
using convex optimization.
bT
ðpsÞ  rðtijÞ  C; where C is a constant: ð16Þ
We devise Algorithm 1 for minimizing fðrðtijÞÞ subject to
Constraints (3), (4), and (11), as shown below.
Algorithm 1. COMPUTE OPTIMAL RESOURCE VECTOR r
Input: ;BðtijÞ; bðpsÞ; aðpsÞ; Output: r
1:  ¼ ; C ¼ BðtijÞ; r ¼  (empty set);
2: repeat
3: rðÞ
 ¼ CO-STEP(,C); /* Compute optimal r
based on */
4:  ¼ fdkjdk 2  & rðÞ
k > ak}; /*Select elements
violating Constraint (11)*/
5:  ¼ n; /*Remove  from */
6: C ¼ C 
P
dk2 ðbk  akÞ; /*Update C*/
7: r ¼ r [ {r
k ¼ akjdk 2  & ak is dk’s upper bound};
8: until ( ¼ );
9: r ¼ r [ rðÞ
 ;
In this algorithm, Line 3 executes CO-STEP(; C) in order
to find the rðÞ
 without considering Constraint (11). If rðÞ

completely satisfies Constraint (11) (i.e.,  ¼ ), the final
result is found. Since CO-STEP(; C) excludes Constraint (11),
there might be some cases which are assigned with resource
amount larger than the availability (i.e., rðÞ
k > akðpsÞ). For
DI AND WANG: DYNAMIC OPTIMIZATION OF MULTIATTRIBUTE RESOURCE ALLOCATION IN SELF-ORGANIZING CLOUDS 467
these cases, we will assign the maximum available resources
to them (i.e., rðÞ
k ¼ ak) and remove the corresponding
resource attributes/types (i.e., ) away from . The remaining
budget (i.e., C ¼ C 
P
dk2 ðbk  akÞ) is also updated
accordingly. The whole process will continue until 
becomes empty. Since the time complexity of CO-STEP(,C)
is O(jj), the number of computation steps of Algorithm 1 in
the worst case is
PR1
i¼0 ðR  iÞ. The time complexity of
Algorithm 1 is OðR2Þ.
Theorem 2. Given a submitted task tij with its weight vector
wðtijÞ and a budget BðtijÞ, and a qualified node ps with its
resource price vector bðpsÞ, Algorithm 1’s output r is optimal
for minimizing tij’s execution time (i.e., f(r(tij))), subject to
Constraints (3), (4), and (11).
Proof. With Constraints (3), (4), and (11), this is a typical
convex optimization problem with the Lagrangian function
formulated as
F3ðrÞ ¼
XR
k¼1
lk
rk
þ 
XR
k¼1
bkrk  B
!
þ
XR
k¼1
kðrk  akÞ: ð17Þ
The main idea is to prove the output of Algorithm 1
must satisfy KKT conditions (i.e., the necessary and
sufficient condition of the optimization), which are
listed below
bT  r  B
r
k  ak  0 k ¼ 1; 2; . . .;R
 0;   0
 

bT  r  B

¼ 0
k 

r
k  ak

¼ 0 k ¼ 1; 2; . . .;R

lk
r2
k
þ   bk þ k ¼ 0 k ¼ 1; 2; . . .;R:
8>>>>>>>>< >>>>>>>>:
ð18Þ
Algorithm 1 starts with the execution of COSTEP(
;BðtijÞ), which returns a result of rðÞ
 . According
to Definition 1 and Theorem 1, rðÞ
 is deduced from
Constraints (4) and (16), so rðÞ
 must satisfy  lk
r2
k
  bk ¼
0 and bT  r ¼ B. Then, if we let k ¼ 0 for any k, there
must exist an assignment such that all conditions in
(18) hold except for the second condition rðÞ
k  ak  0.
Accordingly, r ¼ rðÞ
 as long as rðÞ
k  ak  0; 8 k in 
(i.e., 1  k  R).
If rðÞ
 cannot satisfy all the R Inequalities (i.e., 9k in
; rðÞ
k  ak > 0), we need to further adjust the solution rðÞ

to find an assignment that satisfies all KKT conditions in
the (18). Based on Algorithm 1, for those cases with rðÞ
k >
ak; rðÞ
k will be set to ak. Assuming there are h1 such cases
and they are denoted as r1; r2; . . . ; rh1 . Obviously, each
selected rk must satisfy k  ðrk  akÞ ¼ 0 because rk ¼ ak.
On the other hand, Algorithm 1 will continue to execute
CO-STEP(; C) on the remaining R  h1 undecided cases
(denoted as ), with C ¼ BðtijÞ 
Ph1
k¼1 rk. Likewise, all
the R  h1 cases (each denoted by rk; k ¼ h1 þ 1; . . .;R)
must also satisfy  lk
r2
k
þ   bk ¼ 0 and bT  r ¼ B. Thus, if
each of them meets the condition rk  ak  0, then the
R  h1 cases in  and the previously selected h1 cases will
together compose the solution satisfying all conditions in
(18). If there are still h2 (0 < h2  R  h1) new cases violating rk  ak  0 in this round, Algorithm 1 will continue the adjustment until the Hth round that either all the R  PH i¼1 hi cases can satisfy rk  ak  0 or  becomes empty. In the former case, we can easily verify that resource allocation among all the R attributes satisfy all KKT conditions in (18) simultaneously, composing an optimal solution. For the P latter, we could conclude R k¼1 bk  ak  B(tij), then the optimal resource allocation is r ¼ a. tu 4 OPTIMAL PROPORTIONAL-SHARE ALLOCATION In this section, we discuss the design of our dynamic optimal proportional-share resource allocation method, which leverages the proportional share model. The key idea to redistribute available resources among running tasks dynamically, such that these tasks could use up the maximum capacity of each resource in a node (i.e., up to cðpsÞ), while each task’s execution time can be further minimized in a fair way. DOPS consists of two main procedures: 1) Slice handler: It is activated periodically to equally scale the amount of resources allocated to tasks, such that each running task can acquire additional resources proportional to their demand along each resource dimension. 2) Event handler: It is responsible for resource redistribution upon the events of task arrival and completion. The pseudocodes are shown in Algorithm 2 and Algorithm 3. The slice handler (Algorithm 2) is periodically performed by ps’s VMM, while the event handler (Algorithm 3) is only invoked upon task arrival or completion. Algorithm 2. SLICE HANDLER (PSM) This program is activated periodically. 1: for (k ¼ 1 ! R) do 2: sum_allocation ¼ PM p¼1 r ðpÞk; 3: for (each tðjÞ; j ¼ 1; 2; . . .;M) do 4: r ðjÞk ¼ ðr ðjÞkr ðjÞk=sum allocationÞ  ck; 5: Assign r ðjÞk to tðjÞ; 6: end for 7: end for 8: Notify VMM to readjust resource allocation among all running tasks; Algorithm 3. EVENT HANDLER This program is invoked as an event is generated. 1: if (The event is the arrival of a scheduled task tðxÞ) then 2: aðxÞ ¼ cðpsÞ  Px1 j¼1 r ðjÞ; 3: Conduct Algorithm 1 for tðxÞ; 4: end if 5: if (The event is the completion of a task tðyÞ) then 6: aðpsÞ ¼ aðpsÞ þ r ðyÞ; /*Release tðyÞ’s resource r ðyÞ*/ 7: for (each tðiÞ still running on ps) do 8: if (9 k such that r ðiÞk < rðÞ ðiÞk; k ¼ 1; 2; . . .;R) then 9: Conduct Algorithm 1 for tðiÞ; 10: end if 11: end for 12: end if Suppose there are M tasks running on a particular node ps, denoted as tð1Þ; tð2Þ . . . ; tðMÞ based on their arrival order. Accordingly, wðiÞ;BðiÞ, and rðiÞ denote tðiÞ’s preferential 468 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013 weight vector, budget, and resource vector, respectively. Assume that when tðiÞ is to be scheduled onto ps, the available resource vector is denoted as aðiÞ, which can be componentwise calculated as cðpsÞ  Pi1 j¼1 r ðjÞ. Note that resource node ps found by our discovery protocol (to be described in Section 5) is “qualified”, i.e., for tðiÞði ¼ 1; 2; . . .;MÞbT ðpsÞ  eðiÞ  BðiÞ and eðiÞ  cðpsÞ  Pi1 j¼1 r ðjÞ. Thus, according to Lemma 1, the optimal resource vector r ðiÞ must exist and r ðiÞ ei. Based on the above analysis, it is possible that the resource along certain dimensions (say the kth) may not be fully used, provided that r ðMÞk is lower than aðMÞk. Consequently, we could improve the resource utilization by redistributing the remaining resource at the kth dimension. Algorithm 2 shows how to determine the amount of resources allocated to the M running tasks so as to make full use of the underlying resources along every dimension (i.e., up to cðpsÞ). By leveraging PSM, each running task can acquire its resources proportional to their computed optimal shares (i.e., r) along every dimension (Line 4-5). Since r ðiÞ  r ðiÞ, task tðiÞ will be executed faster with the augmented resource, while the payment will still be calculated based on r instead of r. This means the user will not be charged more even with any extra resource allocation. It is easy to prove that Algorithm 2’s time complexity is OðR MÞ. Another issue is how to determine the optimal resource allocation (i.e., r) upon task arrival and completion. As shown in Algorithm 3, when a new task is scheduled onto the node ps, it will get the optimal shares of resource based on the availability of resources at ps (Line 1-4). Note that the calculation is based on notations of Algorithm 1, instead of the scaled resource (i.e., r derived from Algorithm 2) as the slice handler will be activated afterwards based on the DOPS design. When a task is finished, it is possible for other running tasks to share the newly released resources (Line 5- 12). The time complexity is M  R3 according the execution steps described in Algorithm 3. There remain two issues concerning the sharing of the newly released resources: 1) Can the execution time of a running task with r 6¼ rðÞ be further reduced by allocating additional resource? 2) If the answer to the above question is yes, would there occur the resource contention problem, i.e., there exist two running tasks which compete for the newly released resource at the same resource attributes (or dimensions). Theorems 3 and 4 answer the two questions. Theorem 3. 8tðiÞ such that r ðiÞk < rðÞ ðiÞk, if another task was just completed and its released resource shares are denoted 4r ¼ ð4r1;4r2; . . . ;4rRÞT and 4rk > 0, then tðiÞ’s execution
time is able to be reduced by increasing r
ðiÞk.
Proof. For tðiÞ, if bT r
ðiÞ< BðiÞ, it is obvious that r ðiÞ rðÞ ðiÞ , because this statement is a contraposition of the statement “9 r ðiÞ¼ rðÞ ðiÞ) bT r ðiÞ¼ BðiÞ” according to Algorithm 1. Since 4rk > 0, there must exist an increment
> 0 along the kth dimension, such that the new
execution time of task tðiÞ (i.e., l1
r1
þ  þ lk
rkþ þ  þ lR
rR
)
is smaller than its original execution time (i.e., l1
r1
þ   þ
lk
rk
þ  þ lR
rR
).
Let us discuss the situation that bT  r ¼ BðiÞ for tðiÞ.
Recall that rðÞ is the resource vector calculated based on
the Formula (12) (i.e., without the constraint of the
capacity on each dimension), while r is the output of the
Algorithm 1 (i.e., with the capacity constraint on each
dimension). Since Formula (12) is derived from
bT  r ¼ BðiÞ, it is obvious that bT  rðÞ ¼ BðiÞ. So, we get
bT  r ¼ BðiÞ holds for both r ¼ r and r ¼ rðÞ. Hence, if
there exists a dimension k such that r
k < rðÞ k , there must exist another dimension (say j) such that r j > rðÞ
j .
Otherwise, bT  r < bT  rðÞ ¼ B, which is contradicting to bT  r ¼ BðiÞ. Based on the above analysis, let us introduce a small increment ð> 0Þ to bk  rk and the same
amount of decrement ðÞ to bj  rj such that r
k þ 
bk
 rðÞ
k
and r
j  
bj
rðÞ
j . Then, we just need to prove that such
an adjustment will make tðiÞ’s execution time fðrðtðiÞÞÞ
become shorter. We denote the original execution time of
tðiÞ as X and its execution time after the adjustment as Y .
We will show X Y >0. Note that the last deduction is
due to (14)
X ¼
l1
r1
þ  þ
lk
r
k
þ  þ
lj
r
j
þ  þ
lR
rR
ð19Þ
Y ¼
l1
r1
þ  þ
lk
r
k þ 
bk
þ  þ
lj
r
j  
bj
þ  þ
lR
rR
ð20Þ
X  Y ¼
lk
r
k
þ
lj
r
j
!

lk
r
k þ =bk
þ
lj
r
j  =bj
!
¼ 
lk=bk
r
k

r
k þ =bk
 
lj=bj
r
j

r
j þ =bj

!
> 
lk=bk 
rðÞ
k  =bk

rðÞ
k

lj=bj 
rðÞ
j þ =bj

rðÞ
j
!
> 
lk=bk
rðÞ
k  rðÞ
k

lj=bj
rðÞ
j  rðÞ
j
!
¼ 0:
tu
Theorem 4. Based on Algorithm 3, there would not appear the
conflict problem that two tasks with suboptimal resource
allocation (i.e., 9 k; r
k < rðÞ k ) compete for the released resource along the same dimension upon the completion of a task. Proof. Provided that a task was just finished and the released resources are 4 r ¼ ð4r1;4 r2; . . . ;4 rRÞT along R dimensions. Our objective is equivalently to prove that at any given dimension (denoted k), there exists at most one task (denoted tðiÞ) such that r ðiÞk < rðÞ ðiÞk based on our resource redistribution scheme described in Algorithm 3 (i.e., Line 5-11). To prove by contradiction, suppose there do exist two tasks running on ps (denoted as tðiÞ and tðjÞ), where tðiÞ arrives earlier than tðjÞ and they have been scheduled before the newly completed task. In addition, both tðiÞ and tðjÞ need to use resource on the kth dimension (i.e., r ðiÞk > 0 and r
ðjÞk > 0), while r
ðiÞk < rðÞ ðiÞk and r ðjÞk < rðÞ ðjÞk hold simultaneously. We will prove such scenario cannot happen as follows: If r ðiÞk < rðÞ ðiÞk holds, this implies that after tðiÞ is scheduled (Line 3 in Algorithm 3), tðiÞ must already use up the DI AND WANG: DYNAMIC OPTIMIZATION OF MULTIATTRIBUTE RESOURCE ALLOCATION IN SELF-ORGANIZING CLOUDS 469 resource at the dimension k. If tðiÞ had not used up the resource at dimension k (i.e., r ðiÞk < ak), this could only happen when r ðiÞk 6 rðÞ ðiÞk. If so, the original resource allocation on tðiÞ would not be optimized, which was proved by Theorem 3. Hence, tðiÞ must have already used up the resource on the k dimension if r ðiÞk < rðÞ ðiÞk. This contradicts to the assumption that the succeeding task tðjÞ is also able to be assigned with r ðiÞk, where 0 < r ðjÞk < rðÞ ðjÞk, from the kth dimension upon executing Line 3 in Algorithm 3 because the available amount of resource (i.e., aðjÞ) along the dimension k is equal to 0, if tðjÞ is scheduled after tðiÞ. This contradicts to the assumption that r ðjÞ > 0. Note r
ðjÞ ¼ 0 ) fðjÞ ¼1. Hence, it is
impossible for more than one task to keep suboptimal
share simultaneously along the same dimension. tu
5 POINTER-GOSSIPING CAN
Our resource allocation approach relies on the assumption
that all qualified nodes must satisfy Inequalities (6) and (7)
(i.e., Lemma 1). To meet this requirement, we design a
resource discovery protocol, namely pointer-gossiping CAN,
to find these qualified nodes. We choose CAN [17] as the
DHT overlay to adapt to the multidimensional feature.
Like traditional CAN, each node (a.k.a., duty node) under
PG-CAN is responsible for a unique multidimensional range
zone randomly selected when it joins the overlay. Fig. 2a
illustrates an example of CAN overlay network. Suppose
there are 25 joined nodes, each taking charge of a single
zone. If a new node (node 26) joins, a random point such as
(0.6 Gflops, 0.55 GB) will be generated and its zone will
be set as the new zone evenly split along a dimension from
the existing zone (node 25 in Fig. 2a) that contains this
point. If there is only one nonoverlapped range dimension
between two nodes (e.g., pi and pj) and they are adjacent at
this dimension, we call them neighbors to each other.
Furthermore, if the nonoverlapped range of pi is always
no less than pj’s, pi is called pj’s positive neighbor and pj is
called pi’s negative neighbor. In Fig. 2a, Nodes 9, 12, and 20
are positive neighbors of node 1.
Every node will periodically propagate the state-update
messages about its available resource vector aðpsÞ to the
duty node whose zone encloses this vector. After a task tij
generates a query (Step 1 in Fig. 2b) with the constraints (6)
and (7), the query message will be routed to the duty node
containing the expected vector eðtijÞ. We could justify that
the state messages (or state records) of all qualified nodes
must be kept in those onward nodes (i.e., shadow area in
Fig. 2b) of the duty node.
Obviously, the searching area may still be too large for
the complete resource query without flooding, so the
existing solutions [19] usually adopt random walk to get
an approximated effect. However, according to our observation
(to be presented), this will significantly reduce the
likelihood of finding qualified resources, finally degrading
the system throughput and user’s QoS. Alternatively, we
improve the mechanism by periodically diffusing a few
pointer messages for any duty nodes owning state-update
messages (or records) to the distant nodes (with distance as
2k hops, where k ¼ 0; 1; . . . ) toward negative directions, so
that these duty nodes could be more easily found. In Fig. 2,
for instance, Node 4’s negative pointer nodes along CPU
dimension are Nodes 14, 3, and 23. By periodically sending
pointer-recovery messages, each with empty payload outward,
each node could easily maintain the connection to the
negative pointer nodes. On the other hand, each query
routed to the duty node will check its stored records and the
pointed duty nodes. If it finds qualified resource records on
the current or other pointed duty nodes, it will return those
information to the requesting node; otherwise, it will
continue searching next positive neighbor duty nodes.
Each duty node (such as D1) will cache state-update
messages received from its neighbors, which are checked
periodically and removed if outdated (i.e., beyond their
TTL). In the meanwhile, it propagates its own identifier
(such as IP) to a few randomly selected pointer nodes
toward it negative direction. For those duty nodes containing
valid state messages, we call them nonempty-cache nodes.
Basically, there are two manners to propagate the duty
nodes’ identifiers (or pointers) backward—spreading manner
(Fig. 3a) and hopping manner (Fig. 3b), thus the PG-CAN
can also be split into two types, namely spreading mannerbased
PG-CAN (SPG-CAN) and hopping manner-based PGCAN
(HPG-CAN). In Fig. 3a, the duty node D1 sends a
pointer-message containing D1’s identifier to its selected
pointer nodes (such as D2 and D3), notifying them that D1
has records. Upon receiving the message, the pointer nodes
(D2 and D3) will further gossip D1’s identifer to their
negative direction pointer nodes along next dimension. In
Fig. 3b, the identifer of any nonempty-cache node will be
forwarded from pointer node to pointer node along each
dimension. Obviously, the former results in fewer number
of hops for message delivery, but its identifers cannot be
diffused as widely as the latter’s. In fact, we can prove that
the delay complexity of identifier delivery for the hopping
manner is Oðlog2 nÞ (Theorem 5), so the hopping manner is
likely to be better than the spreading manner (to be
confirmed in our simulation).
Theorem 5. The delay complexity of hops by hopping manner for
relaying any node’s index to any of its negative-direction
nodes is Oðlog2 nÞ, where n refers to the total number of nodes.
Note that log2 n ¼ d  log2 n1
d, so our objective is to prove
the delay is bounded under d  log2 n1
d. The strict proof can
be found in our previous work [24]. Here, we just use an
470 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013
Fig. 2. Range query on CAN.
example (shown in Fig. 4) to illustrate the idea. In this
example, suppose there are n1
d ¼ 19 nodes along each
dimension, it is obvious that the top-most node (Node 1)
will take longest time (less than Oðlogð19ÞÞ ¼ 4Þ to diffuse
its own index. Specifically, over the first hop, Nodes 2, 3, 5,
9, and 17 could receive the index (Node 1’s identifier). Via
the second hop, Nodes 4, 6, 7, 10, 11, and 13 could receive
the relayed index. For instance, Node 7 could receive Node
1’s index forwarded from Node 5 or Node 3. With just three
hops, most of the negative-direction nodes of Node 1 could
receive its index notification.
Obviously, it is infeasible for peer nodes to broadcast
their indexes (either their own identifiers or those of other
nodes to forward) due to the probably considerable
message delivery overhead. Suppose that L negative index
nodes are selected along each dimension as the notification
targets, the total number of the messages (denoted as !) to
deliver for any index is equal to L þ L2 þ  þLd ¼ LðLd1Þ
L1 .
Hence, the message overhead could be controlled by setting
L to a small value. For example, if L ¼ 2 and d ¼ 3, the total
number of messages is always only 14. In other words, L
has to be seriously limited in the design. L will always be
set to 2 in our following design. The whole pointer
gossiping procedure is conducted by two algorithms,
pointer-sender and pointer-relay. We just show the pseudocodes
for hopping manner (as presented in Algorithm 4 and
Algorithm 5), since the spreading manner’s can be easily
converted from it. Specifically, unlike the line 3-4 of
Algorithm 4, the spreading manner-based pointer-sender
algorithm will randomly select L negative pointer nodes
along the dimension #1 and send its state message to them.
The corresponding pointer-relay algorithm of these pointer
nodes will not only store these messages but also forward
them to L randomly selected negative-direction pointer
nodes at the next dimension.
Algorithm 4. POINTER-SENDER (HOPPING MANNER)
This program is invoked as long as pd detects that it owns
records.
1: while (TRUE) do
2: Construct a pointer-message containing pd’s
identifier, i.e., {pi’s ID, #1}, where #1 refers to the first
dimension;
3: Randomly select a negative pointer node PNi at
dimension #1;
4: Send {pi’s ID, #1, #1} to PNi;/*the 2nd field and 3rd
field indicate the current dimension number and the
number of pointer-messages sent along this
dimension*/
5: Sleep for a tiny cycle;
6: end while
Algorithm 5. POINTER-RELAY (HOPPING MANNER)
This is invoked upon receiving a pointer-message {pi’s ID,
#j, #k}.
1: Put pi’s ID in PointerList on the current node;
2: if (k < L) then 3: Randomly select a negative pointer node PNi along dimension #j; 4: Send {pi’s ID, #j, #k} to PNi; 5: else 6: j ¼ j þ 1; 7: if (j < R) then 8: Construct a new pointer-message: {pi’s ID, #j #1}; 9: Randomly select one negative pointer node along dimension #j; 10: Send {pi’s ID, #j, #1} to PNi; 11: end if 12: end if The procedure of resource query is shown in Fig. 3c. When a node (denoted as Q) generates a query message, it will first be routed to its duty node (denoted as D). On Node D, each stored record will be checked against the message’s demand (i.e., Conditions (6) and (7)). If Node D keeps enough qualified records for the query, they will DI AND WANG: DYNAMIC OPTIMIZATION OF MULTIATTRIBUTE RESOURCE ALLOCATION IN SELF-ORGANIZING CLOUDS 471 Fig. 4. Quick backward index diffusion. Fig. 3. The procedure of resource matching. be returned to the requesting node and the query will be terminated. If there are no matched records, a few other duty nodes pointed by the current duty node will be randomly selected and encapsulated in a so-called pointerjump message, which will be sent outward in a relay fashion (Steps 2, 3, 4, 5 in Fig. 3b) until it meets the qualified records midway through the traversal (then the query will be terminated) or all of the pointed duty nodes are checked (then the query message will be forwarded to D’s next neighbor (Step 6)). We present the pseudocode in Algorithm 6. FoundList is used to keep the matched records after traversing all the ones stored on the current node. At Line 10, the current node sends the query message to another duty node; upon receiving such a message, the remote duty node will also perform Line 2-12, yet at line 7 JumpList will be extracted from the received message instead. Algorithm 6. RESOURCE QUERY ALGORITHM This program at node pq is invoked upon receiving a query message {e;w;B}. 1: if (the current node is the duty node) then 2: Search pq’s record list and put the qualified records in FoundList; 3: if (FoundList is not empty) then 4: Send FoundList to the requesting node; 5: Return;/*Query is terminated here*/ 6: else 7: Construct JumpList by randomly selecting a few pointed duty nodes; 8: if (JumpList is not empty) then 9: Randomly take out a duty node and remove it from the JumpList; 10: Send the query message with JumpList to the selected duty node; 11: end if 12: end if 13: else 14: Forward the query message {e;w;B} based on CAN’s routing rules; 15: end if Note that the returned query result FoundList could be a set of qualified resource nodes based on Algorithm 3. Consequently, upon receiving the query result, the requesting node will randomly choose one out of them as the final resource node for executing the submitted task. With this random selection policy, we can effectively mitigate the decision conflict among different tasks (i.e., different nodes with analogous resource demands select the same node for executing their tasks, resulting in an abrupt resource overutilization situation) due to the un-coordinated node selection process from those autonomous participants. However, even with such an opportunistic scheduling policy, resource overutilization and load unbalancing phenomena cannot be totally eliminated if the number of tasks to be executed at a node cannot be controlled. We investigate three different policies to control imported tasks or disperse the load distribution, namely double-check policy [21], queue-assistant policy, and extra-virtual-dimension policy [19]. For the double-check policy, each requesting task will recheck the current resource availability of the selected node before the task is actually migrated. If the remote node does not allow extra load importing because this could make it overutilized due to an earlier task admission from another node, the task could get one more chance to select another qualified node. Unlike double-check policy, queueassistant policy allows user tasks to be temporally queued on the selected resource node even though its current resource cannot fit the new demand immediately. Extravirtual- dimension policy, which adopts an additional dimension for any new node joining the CAN overlay, is also a candidate policy in dispersing the zone distribution. We will evaluate all these policies in the next section. 6 PERFORMANCE EVALUATION 6.1 Experimental Setting To conduct the simulation, we first build an emulated proportional-share scheduler in accordance with Xen’s credit-scheduler. We use PeerSim [26] to implement the proposed CAN-based range query protocols on a large network containing 2,000 to 12,000 participating nodes. The hardware configuration of each node is randomly selected according to system parameters specified in Table 2. Via this table, we can derive the min capacity and max capacity at each resource dimension. For instance, along the CPU dimension, min capacity and max capacity are 1 1 ¼ 1 Gflops and 8 3:2 ¼ 25:6 Gflops, respectively, which happen when there is only one core at a node running at the speed of 1 Gflops and eight cores per node, each operating at 3.2 Gflops. Each node’s resource prices are randomly generated from the range [ 1 min capacity ; 100 max capacity ]. To investigate the contention issue in the course of resource query among user requests, we use a parameter called demand ratio (denoted as , where 1=8    1) to control the generation of resource demand from each user task, as shown in Table 3. Intuitively,  indicates different contention levels on each resource dimension in presence of large number of analogous queries injected from participants. For instance, if  is set to 1.0, each task’s expected resources would be randomly set in [1:0 min capacity; 1:0 max capacity]. If  is set to 0.2, the resource amounts demanded by all the 472 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013 TABLE 2 System Setting TABLE 3 User Task’s Demand tasks will be distributed in [0:2 min capacity; 0:2 max capacity] at each dimension, leading to a higher level of contention. Each experiment simulates 86,400 seconds (i.e., one day) and each node will periodically receive the user requests whose workload on each attribute (such as CPU, I/O) will be randomized based on a Poisson process with 4,000 seconds as its mean. For example, if a request’s workload vector is (CPU ¼ 2:4 GFLOP, disk data ¼100 Mb, network data ¼ 200 Mb), it will be finished until all the workloads are processed by subtracting the allocated shares of resource over time. Such a request could be analogous to the jobs which contain sequentially submitted tasks in Google’s trace [27], where job lengths are from dozens of minutes to several days. Moreover, Google’s trace shows that most of jobs contain only one single service, which conforms to our multiattribute resource allocation model. The weight vector of each task is generated based on the ratio of its workload on different resource dimensions (or phases). In practice, the weight vector could be estimated by statistics based on normalized usage data like Google’s trace. The TTL (i.e., age) of each state-update message is 600 seconds and message updating cycle is 400 seconds. We first compare the execution efficiency of SOC to that of P2P desktop Grid [19], [20] by taking into account the VMM overhead, to validate the efficacy of the DOPS algorithm. According to Google’s trace, task’s execution may be related to different attributes like CPU and memory. We consider five types of resource demand: CPU, disk speed, network, memory size, and disk space; thus constituting a 5D resource attribute space. The last two will not impact the task’s execution time but are regarded as the constraints during the resource discovery phase. According to the existing experimental report [29], we set the cost in maintaining one VM instance as follows: processor rate ¼ 5%, IO speed ¼ 10%, network bandwidth ¼ 5%, and memory cost ¼ 5 MB. We compare PG-CAN with hopping manner and spreading manner to other solutions, including the basic newscast model [30], random diffusion CAN (RD-CAN) [19], and virtual-dimension support [19]. Under the newscast setting, each node maintains a fixed-size cache containing 2 logðnÞ neighbors which are randomly selected from the whole node set. Each node will periodically push its state message to three sampled random neighbors and be further gossiped for three more hops. Any node’s cache could be refreshed by merging with one of its neighbors periodically. In RD-CAN, any duty node diffuses its received state messages over CAN toward the negative neighbors for a few hops, and any query message is raised with Condition (6) and (7). With VD support, every state records is inserted into the CAN space based on R þ 1 dimensions, in order to mitigate the analogous query contention. 6.2 Experimental Results Fig. 5 presents the throughput ratio between SOC and P2P desktop Grid (2,000 nodes), using spreading manner-based PG-CAN with the extra-virtual-dimension policy support and different demand ratios  ð¼ 1=2; 1=4; 1=6; and 1=8). The throughput ratio is defined as the ratio of the number of finished tasks and the total number of generated tasks in the whole system over time. In SOC, every task is allowed to share the multiple types of resources on the same node, so the resources can be utilized more abundantly. For example, a CPU-bound task and an IO-bound task could run at the same physical node at the same time by leveraging VM resource isolation technology. We observe that SOC would achieve up to about 60 percent improvement as task sizes are relatively small on average (say  ¼ 1=6 or 1/8). When all the task sizes are relatively large (say  ¼ 1=2), SOC could still get about 15 percent improvement compared to P2P Grid model. Another observation is that the additional cost of maintaining VM instances is always constant, which becomes neglectable with smaller task sizes. In addition to the throughput, we also evaluate task’s execution efficiency from the perspective of execution time. According to user’s expected resource vector eðtijÞ, we define tij’s expected execution time as PR k¼1 lk ek . Then, we define tij’s execution efficiency (denoted as ij) as its expected execution time divided by its real turnaround time (from the task’s submission to its completion). Apparently, higher value of ij implies shorter execution time of task tij. From Fig. 6, we observe that both SOC and P2P Grid deliver satisfactory average execution efficiency, which is calculated based on all the finished tasks. The reason why the average execution efficiency in P2P Grid appears a little higher than that in SOC is due to the fact that exclusive queuing model in P2P Grid may allocate relatively more resource amount to DI AND WANG: DYNAMIC OPTIMIZATION OF MULTIATTRIBUTE RESOURCE ALLOCATION IN SELF-ORGANIZING CLOUDS 473 Fig. 6. SOC versus P2P grid (on execution efficiency). Fig. 5. SOC versus P2P Grid (on throughput). each task. While this could result in shorter execution time for each individual task, the tradeoff is a lower throughput as reported in Fig. 5. Fig. 7 shows the converged throughput ratio under the SPG-CAN with different policies (or their combinations) to control the load congestion with different demand ratios. We observe that the combination of double-check/queueassistant policy and the extra-virtual-dimension policy performs better than the pure policies without extravirtual- dimension. We also observe that the queue-assistant policy outperforms double-check policy in most of cases. Recall that in the queue-assistant policy, the task will be failed (i.e., the searching is terminated) if there were no qualified resources found. This means that this policy suffers the least query cost compared to others. As such, the combination of queue-assistant and extra-virtual-dimension policy seems the best. As follows, we will show that HPGCAN without any load control policy would still outperform the SPG-CAN with extra-virtual-dimension policy. In the rest of this section, we uniformly adopt the queueassistant mode with VM cost. Figs. 8 and 9 present the effectiveness of different range query protocols on SOC with 2,000 nodes, under various load ratios (). The failed task ratio in Fig. 8b refers to the number of tasks that cannot find qualified node divided by the total number of submitted tasks. We observe that HPG-CAN leads to the best performance (including highest throughput ratio and lowest failed task ratio) in that it could efficiently discover the global idle resources. Moreover, its failed task ratio can be limited down to 0.00007 in the situation with smaller load-ratio  (Fig. 9b). Note that smaller  means higher degree of resource contention in that all such small-demand queries would alwaysbe routed to the similardutynodeslocated at the lower position of the CAN space. In comparison, SPG-CAN works notably inferior toHPG-CANbecause of its suboptimal effect in gossiping nonempty-cache nodes’ identifiers. Fig. 9cshows aninteresting result about average execution efficiency: task’s execution efficiency under HPG-CAN converges up to 4, which is much higher than that of SPG-CAN. In other words, HPG-CAN outperforms the other solutions on all the three key metrics. Furthermore, we use Jain’s fairness index [31] (denoted ’ 2 [0,1]) to evaluate the fairness among user tasks’ completion time. The fairness index is given in (21), where ij refers to task tij’s execution efficiency. Higher ’ implies more steady execution efficiency ’ ¼ Pn i¼1 Pmi j¼1 ij 2 Pn i¼1 mi   Pn i¼1 Pmi j¼1 2 ij  : ð21Þ Figs. 8c and 9c present the fairness of all the completed tasks’ execution. We observe that HPG-CAN reaches the highest fairness, which means it provides most stable results compared to other solutions. We evaluate the system scalability (as shown in Table 4) of the PG-CAN protocol. All the values in this table are recorded after the one-day duration test. With the increasing system scale (e.g., up to 12,000 nodes), the performance metrics (including throughput, average efficiency (i.e., the 474 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013 Fig. 7. Throughput ratio under different load control policies. Fig. 8. The effectiveness of query protocols ( ¼ 1 2 ). Fig. 9. The effectiveness of query protocols ( ¼ 1 4 ). TABLE 4 System Scalability of PG-CAN mean value of ij), etc.) will not change notably. The message delivery cost is defined as the number of messages to be sent/ forwarded by each node on average during the whole 24 hours. Our test shows that this cost increases roughly with a logarithmic speed, which is much better than linear rate. Moreover, most of messages are actually lightweight pointer messages, each containing just an identifier. Thus, the PGCAN protocol can result in little message delivery overhead. We evaluate the PG-CAN protocol under node churning situations.Weassume there are X percent of nodes arbitrarily joining/departuring the entire system every minute. We faithfully implement the node departure maintenance on each departure node’s neighbors to refresh their neighborhoods and a binary partition tree-based background zone reassignment algorithm [17] to ensure each node always corresponds to one globally unique zone. Specifically, each node does not actually maintain the global view of such a tree but only needs to distributively communicate with its neighbors. In our simulation, the demand ratio  is set to 1/4 and X percent (also called dynamic level) will be set to 0, 3, 6, and 9 percent. Note that the environment with node churning rate (dynamic level) set to 3 percent per minute is already very volatile, especially compared to the 4,000-seconds average completion time for all tasks (as mentioned in Section 6.1). In order to observe the impact of node churning to the PGCAN, we first conduct our simulation under an assumption that the tasks would not be suspended/interrupted on the departure nodes. Later, we eliminate this assumption by considering task checkpoint/migration cost to observe the synthetic system performance. Figs. 10 and 11 show the PGCAN’s working efficiency under the noninterrupting task condition, based on hopping manner and spreading manner, respectively. It is not surprising that HPG-CAN works much better than SPG-CAN in the dynamic environment, due to considerably higher system throughput (HPG-CAN converges up to 0.83 while SPG-CAN converges to about 0.78) and lower failed task ratio (HPG-CAN converges to about 0.0005 while SPG-CAN converges to 0.025). In Fig. 11b, we observe that the blue curve (the case without any churning nodes) decreases at the beginning and increases linearly for the rest of time. This is reasonable due to the following analysis: at the beginning, the whole system is relatively idle such that most of the hosts are qualified for any submitted tasks; however, with increasing number of tasks submitted, the failure probability of finding qualified nodes for any task will be increased accordingly, especially when the rate of processing tasks becomes lower than that of importing new tasks. What is most interesting for SPG-CAN is that the overall system performance (including throughput ratio, failed task ratio) will not get worse with increasing dynamic level of the environment, but get prominent improvement on contrary (See Fig. 11). This is sound, since in the dynamic environment, the nodes would frequently change their neighbors and the pointercache maintained would also be changed accordingly. As such, the pointers and state messages would be more widely diffused than the original SPG-CAN, leading to a higher probability of finding qualified resource nodes. We further analyze the overall system performance by taking into account the tasks’ interruption and their checkpointing/ migration cost on the departure nodes. In practice, the tasks running on the departure nodes will probably be interrupted or suspended until the nodes are restored. Hence, weassume the scheduled (or running) tasks on any departure node would take longer time to complete, and the wasted time is set to be equal to that for executing 10 percent more load of the current remaining workload at each resource attribute. From Figs. 12 and 13, we observe that HPG-CAN with 6 percent-dynamic-level can still outperform SPG-CAN with 0 percent-dynamic-level. Although the performance degradation could be observed with increasing dynamic level (Fig. 12), the whole system could still perform very well when there are 3 percent churning nodes per minute (i.e., about 1  0:9723 ¼ 50:4 percent churning nodes per 23 minutes), confirming the high adaptability of our solution. DI AND WANG: DYNAMIC OPTIMIZATION OF MULTIATTRIBUTE RESOURCE ALLOCATION IN SELF-ORGANIZING CLOUDS 475 Fig. 12. HPG-CAN under dynamic environment with checkpointing and task migration cost. Fig. 11. SPG-CAN under dynamic environment. Fig. 10. HPG-CAN under dynamic environment. Fig. 13. SPG-CAN under dynamic environment with checkpointing and task migration cost. 7 RELATED WORK SOC is different from the traditional Grid model (including P2P desktop Grid [32], [33]) in the resource consumption manner. Grids generally assume exclusive resource usage to ensure user QoS. The problem of job scheduling in Grids is usually categorized as a multiprocessor scheduling (MPS) problem [5], [6] (a kind of combinatorial optimization problem), which has been proved to be NP-complete [7]. Accordingly, many approximation algorithms as well as (meta)heuristics applied to various versions of the MPS problem in the Grid environment have been studied. For example, Rossi et al. [34] proposed a metaheuristic for solving the fixed job scheduling problem where processors are subject to spread time constraints, i.e., the time spent between the submission time and the completion time should not exceed a given duration. Generalized Extremal Optimization (GEO) [35] is another metaheuristic for solving the MPS problem. Singh et al. [36] approached the Grid scheduling problem through a cost-based provisioning model and a multiobjective genetic algorithm for getting approximately optimized performance (such as throughput). In P2P desktop Grids, Kim et al. [19] proposed a heuristic load balancing method for improving the task scheduling throughput on desktop Grids over CAN overlay. Similarly, Zheng et al. [37] formulated the problem to be a bins-and-balls model with herds phenomenon and tried to get the approximately optimal performance using a stochastic algorithm atop a DHT overlay. Lee and Snavely [38] studied a user-centric utility function of task turnaround time to improve the system performance based on simulation. Compared with these existing works, we devise an autonomous VM-multiplexing resource consumption model, namely SOC, which allows each task to dynamically make full use of the resource slices isolated by VM technology. Although there are also a few existing research studies on VM-multiplexing strategies, they are not well suited to the SOCfor most of them mainly focus on a few specific attributes such asCPUor memory.For instance, virtual-putty [9] used an application-load forecasting method as well as a strategy for reshaping the involved VMs to improve a single host’s CPU and I/O resource utilization. Gupta et al. [10] proposed a method allowing memory sharing to happen within page boundaries only. Govindan et al. [11] adopted statistical multiplexing of applications to make applications fit into the given power budgets. In contrast, Meng et al. [8] explicitly endeavored to maximize the VM-multiplexing resource utilization by analyzing VM-pairs’ compatibility in terms of the forecasted workload and estimated VM sizes. However, two significant drawbacks still remain: 1) poor scalability due to the central management of VM-correlation matrix; 2) restrictive constraints on implementation since they only identify the compatibility of VM-pairs. To overcome these problems, we formulate multiattribute resource allocation as a convex optimization problem and devise a resource allocation algorithm to minimize the task execution time with O(R3) time complexity. Since the node identifiers over the DHT are often generated based on some hash functions, it is uneasy to directly perform range queries. Some existing strategies [39] have to build an extra layer to reorganize all of nodes over the DHT, whereas others (such as [19]) leverage a CAN topology. Many other existing works [39], [40], [41], [42], [43], [44] mainly focus on how to locate the duty nodes that satisfy the user-specified range in all dimensions with limited delays. However, for most tasks with low resource requirements (which is true in most cloud-based applications), nearly all the nodes in the network can be qualified. This will generate a vast amount of network traffic and also adds large burden to user on the filtering process. Indeed, most ordinary users just want the system to quickly locate a qualified node to meet its QoS goals. This issue however becomes more complex due to the adoption of rather flexible VM-enabled allocation scheme and the highdimensional range query conditions. In view of this problem, we propose a new distributed protocol to search resources with the mitigated contention among requesters and strictly limited query-message traffic cost. 8 CONCLUSIONS AND FUTURE WORK This paper proposes a novel scheme (DOPS) for virtual resource allocation on a SOC, with three key contributions listed below. . Optimization of task’s resource allocation under user’s budget: With a realistic monetary model, we propose a solution which can optimize the task execution performance based on its assigned resources under the user budget. We prove its optimality using the KKT conditions in the convex-optimization theory. . Maximized resource utilization based on PSM: In order to further make use of the idle resources, we design a dynamic algorithm by combining the above algorithm with PSM and the arrival/completion of new tasks. This can give incentives to users by gaining an extra share of unused resource without more payment. Experiments confirm achieving a superoptimal execution efficiency of their tasks is possible. DOPS could get an improvement on system throughput by 15 percent 60 percent than the traditional methods used in P2P Grid model, according to the simulation. . Lightweight resource query protocol with low contention: We summarize the resource searching request as two range query constraints, (6) and (7). We prove them to be the sufficient and necessary conditions for getting the optimal resource allocation. Experiments confirm the designed PG-CAN protocol with lightweight query overhead is able to search qualified resources very effectively. So far, we have successfully built a prototype supporting live migration of VMs between any two nodes on the Internet (even though they are behind different NATs). In the future, we will study fault-tolerance support for a (DOPS-based, PG-CAN-enabled) SOC system; we will also conduct sensitivity analysis of how violation of our model assumptions would impact the optimal resource allocation. ACKNOWLEDGMENTS This research is supported by a Hong Kong RGC grant HKU 7179/09E and a Hong Kong UGC Special Equipment Grant (SEG HKU09). REFERENCES [1] J.E. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems And Processes. Morgan Kaufmann, 2005. 476 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013 [2] D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat, “Enforcing Performance Isolation across Virtual Machines in Xen,” Proc. Seventh ACM/IFIP/USENIX Int’l Conf. Middleware (Middleware ’06), pp. 342-362, 2006. [3] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the Clouds: A Berkeley View of Cloud Computing,” Technical Report UCB/EECS-2009-28, Feb. 2009. [4] D.P. Anderson, “Boinc: A System for Public-Resource Computing and Storage,” Proc. IEEE/ACM Fifth Int’l Workshop Grid Computing, pp. 4-10, 2004. [5] P. Crescenzi and V. Kann, A Compendium of NP Optimization Problems. ftp://ftp.nada.kth.se/Theory/Viggo-Kann/ compendium.pdf, 2012. [6] O. Sinnen, Task Scheduling for Parallel Systems, Wiley Series on Parallel and Distributed Computing. Wiley-Interscience, 2007. [7] O.H. Ibarra and C.E. Kim, “Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors,” J. ACM, vol. 24, pp. 280-289, Apr. 1977. [8] X. Meng et al., “Efficient Resource Provisioning in Compute Clouds via vm Multiplexing,” Proc. IEEE Seventh Int’l Conf. Autonomic Computing (ICAC ’10), pp. 11-20, 2010. [9] J. Sonneck and A. Chandra, “Virtual Putty: Reshaping the Physical Footprint of Virtual Machines,” Proc. Int’l HotCloud Workshop in Conjunction with USENIX Ann. Technical Conf., 2009. [10] D. Gupta et al., “Difference Engine: Harnessing Memory Redundancy in Virtual Machines,” Proc. Eighth Int’l USENIX Symp. Operating Systems Design and Implementation, pp. 309-322, 2008. [11] S. Govindan, J. Choi, B. Urgaonkar, A. Sivasubramaniam, and A. Baldini, “Statistical Profiling-Based Techniques for Effective Power Provisioning in Data Centers,” Proc. Fourth ACM Conf. European Conf. Computer Systems (EuroSys ’09), pp. 317-330. 2009, [12] M. Feldman, K. Lai, and L. Zhang, “The Proportional-Share Allocation Market for Computational Resources,” IEEE Trans. Parallel and Distributed Systems, vol. 20, no. 8, pp. 1075-1088, Aug. 2009. [13] S. Soltesz, H. Poetzl, M.E. Fiuczynski, A. Bavier, and L. Peterson, “Container-Based Operating System Virtualization: A Scalable, High-Performance Alternative to Hypervisors,” Proc. Second ACM Int’l European Conf. Computer Systems (Euro ’07), pp. 275-287. 2007, [14] L. Cherkasova, D. Gupta, and A. Vahdat, “Comparison of the Three cpu Schedulers in Xen,” SIGMETRICS Performance Evaluation Rev., vol. 35, no. 2, pp. 42-51, 2007. [15] “The Role of Memory in Vmware Esx Server 3: On Line At: http:// www.vmware.com/pdf/esx3_memory.pdf,”technical report, 2012. [16] dm-ioband: http://sourceforge.net/apps/trac/ioband, 2012. [17] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A Scalable Content-Addressable Network,” Proc. ACM SIGCOMM ’01, pp. 161-172, 2001. [18] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proc. ACM SIGCOMM ’01, pp. 149-160, 2001. [19] J.S. Kim et al., “Using Content-Addressable Networks for Load Balancing in Desktop Grids,” Proc. 16th ACM Int’l Symp. High Performance Distributed Computing (HPDC ’07), pp. 189- 198, 2007. [20] A. Leite, H. Mendes, L. Weigang, A. de Melo, and A. Boukerche, “An Architecture for P2P Bag-of-Tasks Execution with Multiple Task Allocation Policies in Desktop Grids,” Proc. IEEE Int’l Conf. Cluster Computing, pp. 1-11, Feb. 2011. [21] Y. Drougas and V. Kalogeraki, “A Fair Resource Allocation Algorithm for Peer-to-Peer Overlays,” Proc. IEEE INFOCOM ’05, pp. 2853-2858, 2005. [22] D. Gross and C.M. Harris, Fundamentals of Queueing Theory, Wiley Series in Probability and Statistics. Wiley-Interscience, Feb. 1998. [23] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2009. [24] S. Di, C.-L. Wang, W. Zhang, and L. Cheng, “Probabilistic Best-Fit Multi-Dimensional Range Query in Self-Organizing Cloud,” Proc. IEEE 40th Int’l Conf. Parallel Processing, pp. 763-772, 2011. [25] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the Art of Virtualization,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 164-177, 2003. [26] Peersim Simulator: http://peersim.sourceforge.net, 2012. [27] Google Cluster-Usage Traces: http://code.google.com/p/ googleclusterdata, 2012. [28] C.A. Waldspurger, “Memory Resource Management in Vmware Esx Server,” http://www.usenix.org/events/osdi02/tech/ waldspurger.html, 2012. [29] J.P. Walters, V. Chaudhary, M. Cha, S. Guercio Jr., and S. Gallo, “A Comparison of Virtualization Technologies for hpc,” Proc. IEEE 22nd Int’l Conf. Advanced Information Networking and Applications (AINA ’08), pp. 861-868, 2008. [30] W.K. Mark Jelasity and M. van Steen, “Newscast Computing,” technical report, Vrije Universiteit Amsterdam, 2006. [31] R.K. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modelling. John Wiley & Sons, Apr. 1991. [32] J. Cao, F.B. Liu, and C.Z. Xu, “P2pgrid: Integrating P2P Networks Into the Grid Environment: Research Articles,” vol. 19, no. 7, pp. 1023-1046, 2007. [33] H. Abbes, C. Cerin, and M. Jemni, “Bonjourgrid: Orchestration of Multi-Instances of Grid Middlewares on Institutional Desktop Grids,” Proc. IEEE 23rd Int’l Symp. Parallel & Distributed Processing (IPDPS ’09), pp. 1-8, 2009. [34] A. Rossi, A. Singh, and M. Sevaux, “A Metaheuristic for the Fixed Job Scheduling Problem under Spread Time Constraints,” Computational Operation Research, vol. 37, pp. 1045- 1054, June 2010. [35] P. Switalski and F. Seredynski, “Generalized Extremal Optimization for Solving Multiprocessor Task Scheduling Problem,” Proc. Seventh Int’l Conf. Simulated Evolution and Learning, pp. 161-169, 2008. [36] G. Singh, C. Kesselman, and E. Deelman, “A Provisioning Model and its Comparison with Best-Effort for Performance-Cost Optimization in Grids,” Proc. 16th ACM Symp. High Performance Distributed Computing (HPDC ’07), 117-126, 2007. [37] Q. Zheng, H. Yang, and Y. Sun, “How to Avoid Herd: A Novel Stochastic Algorithm in Grid Scheduling,” Proc. 15th ACM Int’l Symp. High Performance Distributed Computing (HPDC ’06), pp. 267- 278, 2006. [38] C.B. Lee and A.E. Snavely, “Precise and Realistic Utility Functions for User-Centric Performance Analysis of Schedulers,” Proc. 16th ACM Int’l Symp. High Performance Distributed Computing (HPDC ’07), pp. 107-116, 2007. [39] A.R. Bharambe, M. Agrawal, and S. Seshan, “Mercury: Supporting Scalable Multi-Attribute Range Queries,” Proc. ACM SIGCOMM ’04, pp. 353-366, 2004. [40] D. Li, J. Cao, X. Lu, and K.C.C. Chen, “Efficient Range Query Processing in Peer-to-Peer Systems,” IEEE Trans. Knowledge and Data Eng., vol. 21, no. 1, pp. 78-91, Jan. 2009. [41] A. Gonzalezbeltran, P. Milligan, and P. Sage, “Range Queries Over Skip Tree Graphs,” Computer Comm., vol. 31, no. 2, pp. 358- 374, Feb. 2008. [42] S. Wang, Q.H. Vu, B.C. Ooi, A.K. Tung, and L. Xu, “Skyframe: A Framework for Skyline Query Processing in Peer-to-Peer Systems,” J. VLDB, vol. 18, pp. 345-362, Jan. 2009. [43] M.A. Arefin, M.Y.S. Uddin, I. Gupta, and K. Nahrstedt, “Q-Tree: A Multi-Attribute Based Range Query Solution for Tele-Immersive Framework,” Proc. IEEE 29th Int’l Conf. Distributed Computing Systems (ICDCS ’09), pp. 299-307, 2009. [44] J. Wang, S. Wu, H. Gao, J. Li, and B.C. Ooi, “Indexing Multi- Dimensional Data in a Cloud System,” Proc. ACM Int’l Conf. Management of Data (SIGMOD ’10), pp. 591-602, 2010. DI AND WANG: DYNAMIC OPTIMIZATION OF MULTIATTRIBUTE RESOURCE ALLOCATION IN SELF-ORGANIZING CLOUDS 477 Sheng Di received the MPhil degree from Huazhong University of Science and Technology in 2007 and the PhD degree from the University of Hong Kong in 2011. He is currently a postdoctor researcher at INRIA. His research interest involves optimization of distributed resource allocation especially in P2P systems and large-scale cloud computing platforms. His background is mainly on the fundamental theoretical analysis and practical system implementation. He is a member of the IEEE. Cho-Li Wang received the PhD degree from the University of Southern California in 1995. His research interests include multicore computing, software systems for cluster and grid computing, and virtualization techniques for cloud computing. He serves on the editorial boards of several international journals, including IEEE Transactions on Computers (2006- 2010), Journal of Information Science and Engineering, and Multiagent and Grid Systems. He is a member of the IEEE. . For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib. 478 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013


Comments are closed.