Distributed Multiple Constraints Generalized Sidelobe Canceler for Fully Connected Wireless Acoustic Sensor Networks
Shmulik Markovich-Golan, Member, IEEE, Sharon Gannot, Senior Member, IEEE, and
Israel Cohen, Senior Member, IEEE
Abstract—This paper proposes a distributed multiple constraints
generalized sidelobe canceler (GSC) for speech enhancement
in an N-node fully connected wireless acoustic sensor
network (WASN) comprising M microphones. Our algorithm
is designed to operate in reverberant environments with
P constrained speakers (including both desired and competing
speakers). Rather than broadcasting all M microphone signals,
a significant communication bandwidth reduction is obtained
by performing local beamforming at the nodes and utilizing
only N + P transmission channels. Each node processes its
own microphone signals together with the transmitted signals.
The GSC-form implementation, by separating the constraints
and the minimization, enables the adaptation of the beamformer (BF) during
speech-absent time segments, and relaxes the requirement of other
distributed linearly constrained minimum variance (LCMV) based algorithms to re-estimate the sources'
relative transfer functions (RTFs) after each iteration. We provide a full convergence proof
of the proposed structure to the centralized GSC-BF. An extensive experimental study of both narrowband and
(wideband) speech signals verifies the theoretical analysis.
Index Terms—Array signal processing, microphone array,
speech enhancement.
I. INTRODUCTION
RECENT advances in the fields of nano-technology
and communications encourage the development of
low-cost, low-power and miniaturized modules which can be
incorporated in wireless sensor network (WSN) applications. A wireless acoustic sensor
network (WASN) comprises several WSN modules, denoted nodes, interconnected
in some manner via a wireless medium. Each node
consists of one or more sensors, a processing unit and a wireless
communication module allowing it to exchange data with the other nodes. The
goal of the system is to perceive some physical phenomenon,
to process it, and to yield a required result. In classical array
processing systems, the sensing and the processing of the
acquired data are concentrated in a single location denoted
a fusion center. A phenomenon originating in the enclosure
results in a disturbance that propagates in space. The closer the
sensors are to the origin of the phenomenon, the higher is the
signal to noise ratio (SNR) of the acquired signal, resulting in
lower estimation errors and better quality at the output of the
signal processing procedure. The concept of the WSN
is to distribute the system resources (sensors,
processing units and actuators) and to provide a scalable, easy
to deploy, and robust structure. The wireless interface allows
for the extension of the sensing range beyond the limits of the
wired fusion center systems. The distribution of the sensors in
a larger volume enables a better coverage with higher SNR. For
a survey on the topic of WSNs, please refer to [1]–[4]. Limited
power and communication bandwidth resources set bounds
on the amount of data shared between nodes and necessitate
developing distributed algorithms. In recent years, many contributions
to the field of wireless acoustic sensor networks
(WASNs) have been introduced, circumventing the severe
network constraints [5]–[11]. A trivial solution is obtained
by utilizing only microphones local to the node without any
communication link. However, this solution fails to utilize the
entire information from the network and hence is sub-optimal.
A common scheme for distributed signal processing algorithms
in WASNs comprises the following steps. First, local processing
of microphone signals results in intermediate signals
or estimates at each node, requiring less communication-bandwidth.
Second, the results of the first step are broadcast in
the WASN. Finally, a global estimate or an enhanced signal
is obtained by merging all intermediate signals or estimates.
Since the data available at each node is incomplete, an iterative
(or time-recursive) solution becomes necessary.
Several contributions have considered using a WASN system
for speech processing applications. Two main criteria are
common in speech beamforming applications: the minimum
mean squared error (MMSE) and the minimum variance distortionless
response (MVDR). The mean squared error (MSE)
between the output signal and the desired signal comprises two
components, namely the distortion and the residual noise. The
multi-channel Wiener filter (MWF)-BF [12]–[14] minimizes
the MSE between the desired signal and the output signal,
while the MVDR, first introduced by Capon [15], minimizes
the noise power at the output signal while maintaining a distortionless
response towards the desired signal, i.e., resulting in
zero distortion. Er and Cantoni [16] generalized the single distortionless
response to a set of linear constraints, and denoted
the BF as linearly constrained minimum variance (LCMV)-BF.
The speech distortion weighted (SDW)-MWF-BF, proposed
by Doclo et al. [17], generalizes both BF criteria. It introduces
a trade-off factor between noise reduction and distortion. It
can be shown that two special cases of the SDW-MWF are
the MWF-BF and, in the case of a single desired speaker, the
MVDR-BF.
Signals and parameters at a node which are obtained by processing
its own microphone signals are denoted “local” to the
node. Other signals and parameters which are obtained by processing
data received from other nodes in the WASN are denoted
“global”. Doclo et al. [7] addressed the problem of enhancing
a single desired speaker contaminated by a stationary
noise. They adopted the SDW-MWF criterion and used a binaural
hearing aid system comprising two apparatuses with multiple
microphones in each ear.
Bertrand and Moonen [8] considered the more general case
of an N-node WASN and multiple desired sources. They allowed
each node to define individual desired signals by using different
weighting of the spatial components of the speech. They proposed
a distributed adaptive node-specific signal estimation
(DANSE) algorithm, which necessitates the transmission of as many
channels from each node as there are desired sources, and proved the convergence of the
algorithm to the global SDW-MWF-BF. In complicated scenarios
where multiple speakers exist and more control over the
beampattern is desired, the LCMV-BF is a more suitable option.
The linear constraints set can be designed to maintain undistorted
desired speakers while mitigating competing speakers.
Adaptive formulation of the MVDR-BF was proposed by
Frost [18], who developed a constrained least mean squares
(LMS) algorithm for the adaptation of the BF coefficients. Griffiths
and Jim [19] showed that the MVDR criterion can be
equivalently described in a two-branch structure, denoted GSC.
This structure conveniently separates the constraining and the
minimization operations. Breed and Strauss [20] further proved
the equivalence between the closed-form LCMV and the GSC-form
in the case of multiple constraints.
Gannot et al. [21] considered the single desired source case
and suggested implementing the MVDR-BF in its GSC-form in
the short time Fourier transform (STFT) domain. They also proposed
to use the relative transfer function (RTF) rather than the
acoustic transfer function (ATF) of the desired speaker, and proposed
an applicable estimation procedure based on the non-stationarity
of the speech. Markovich-Golan et al. [22] considered
the multiple speakers case and proposed to use an LCMV-BF in
a GSC-form. They constructed a constraints set from an estimate
of the RTFs of the desired speakers and an estimate of the basis
spanning the ATFs of the competing speakers and the stationary
noise. In [9] the authors adopted the MVDR criterion and proposed
an iterative distributed MVDR-BF for a binaural hearing
aid system. Bertrand and Moonen [23] proposed a distributed
LCMV algorithm, denoted linearly constrained DANSE (LC-DANSE).
They considered the case of P speakers and noise
picked up by the M microphones of an N-node WASN. Assuming
that each node may define the set of desired speakers differently,
they proposed that the constraints matrix be common
to all nodes, whereas the desired response be node-specific.
Their proposed algorithm constructs node-specific P-constraints
LCMV-BFs that require each node to transmit P audio
channels; a total of NP transmission channels (the output
signals of all local BFs) are required. At each iteration, each
node has to re-estimate two sets of basis vectors spanning the
ATFs of the desired and the interfering speakers.
Ahmed and Vorobyov [24] presented a novel technique for
controlling the sidelobe level in collaborative beamforming for
WSNs where nodes comprise both sensors and actuators. They
considered the problem of transmitting multiple data streams
from different clusters of nodes to some remote target nodes.
Each cluster forms a beam pattern by properly setting the
phases and amplitudes at the transmission such that the signals
received at the designated target node are with equal phases
and amplitudes. An efficient algorithm for controlling the
inter-channel interference is based on repeatedly and randomly
selecting the nodes which participate in the beamforming, and
using low communication-bandwidth feedback channels from
the target nodes which report the interference level that they
experience.
In the current contribution we consider the case where the
nodes agree on the classification of desired and competing
speakers, and share a common constraints set as well as common desired
responses. A distributed time-recursive version of the centralized
GSC, denoted distributed GSC (DGSC), is proposed. We
prove that the proposed algorithm converges to the centralized
GSC. The proposed algorithm requires the transmission of
only N + P audio channels. In static scenarios, the RTFs of
the sources need to be estimated only once, at the initialization
stage. The estimation procedure of the RTFs may require
non-overlapping activity patterns of the speakers.
The structure of the paper is as follows. In Section II, the
problem is formulated. In Section III, a closed-form and a GSC
structure of the centralized LCMV-BF are presented. We show
that, under certain conditions, an LCMV which operates on a
transformation of the inputs is equivalent to the regular BF. In
Section IV, we derive the DGSC algorithm. The latter is based
on a specific transformation which allows us to reformulate the
centralized BF as a sum of local GSC-BFs. The proposed algorithm
makes use of P shared signals, one for each constrained source, which
are broadcast in the WASN. We give an analytical proof of the
equivalence between the DGSC and the centralized GSC-BF. In
Section V, we propose a scheme for constructing the shared signals.
We compare the proposed DGSC and the LC-DANSE in
Section VI. An extensive experimental study, which verifies the
equivalence of the DGSC and the centralized GSC, is presented
in Section VII. Conclusions are drawn in Section VIII.
II. PROBLEM FORMULATION
Consider a WASN of M microphones comprised of N nodes.
Denote the number of microphones in the pth node by M_p. The
total number of microphones then satisfies

M = Σ_{p=1}^{N} M_p.   (1)

The problem is formulated in the STFT domain, where k denotes
the frequency index and ℓ denotes the time-frame index. The
vector of signals received by the microphones of all nodes is
z(ℓ,k). It is composed by concatenating the microphone signals
of all nodes:

z(ℓ,k) = [z_1^T(ℓ,k), ..., z_N^T(ℓ,k)]^T   (2)

where z_p(ℓ,k) is an M_p × 1 vector consisting of the locally received
signals at the pth node. The vector of all received signals is
given by:

z(ℓ,k) = A(k) s(ℓ,k) + v(ℓ,k)   (3)

where

s(ℓ,k) = [s_1(ℓ,k), ..., s_P(ℓ,k)]^T   (4)

is a P × 1 vector comprised of the P speech sources, and

A(k) = [a_1(k), ..., a_P(k)]   (5)

is an M × P matrix whose columns are the ATFs relating the
speakers and the microphones. The vector v(ℓ,k) is an M × 1 vector
of interfering signals picked up by the microphones. Assuming
that the speakers' signals and the noise sources are uncorrelated,
the M × M dimensional covariance matrix of the received
signals may be written as:

Φ_zz(ℓ,k) = A(k) Φ_ss(ℓ,k) A^H(k) + Φ_vv(ℓ,k)   (6)

where (·)^H denotes the conjugate-transpose operator,
Φ_ss(ℓ,k) is the P × P dimensional covariance
matrix of the speech signals and Φ_vv(ℓ,k) is the M × M covariance
matrix of the noise. Note that multiple speakers and
noise sources may be simultaneously active at each frequency
bin. We assume that the network is fully connected, hence any
transmitted signal is available to all nodes. In case the network
is not fully connected, a hierarchical algorithm, for example
based on a spanning tree of the network, can be sought. However,
this is beyond the scope of the current contribution. As
an example of a distributed algorithm in a partially connected
WASN, please refer to [25]. The locations of the speakers are
assumed static, therefore their corresponding ATFs are time-invariant,
and hence the frame index ℓ is omitted in A(k). The
algorithm is applied to each frequency bin independently. For
brevity, the frequency index k is hereafter omitted. The noise statistics is
assumed to vary significantly more slowly than the convergence time
of the algorithm; for brevity, the time index ℓ is also omitted from Φ_vv
hereafter.

Denote the set of microphone indexes of the pth node by
𝕄_p, where |𝕄_p| = M_p and |·|
denotes the number of elements in a set. The vector of the received
signals at the pth node is given by

z_p = E_p^T z   (7)

where E_p is an M × M_p selection matrix which extracts the M_p
entries that correspond to the microphone indexes of the
pth node:

E_p = [0_{M_p × (M_1 + ... + M_{p-1})}  I_{M_p}  0_{M_p × (M_{p+1} + ... + M_N)}]^T   (8)

and I_{M_p} is an M_p × M_p identity matrix.
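To make the notation concrete, the following minimal numpy sketch instantiates the signal model (3) and the selection matrices (7)-(8) for a single frequency bin. All dimensions and statistics below are hypothetical examples, not values from the paper.

import numpy as np

# Minimal sketch of the signal model (2)-(8) for one frequency bin.
rng = np.random.default_rng(0)
M_p = [2, 3, 2]                      # microphones per node, N = 3 nodes
M, P = sum(M_p), 2                   # total microphones, constrained sources

A = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))  # ATFs (5)
s = rng.standard_normal(P) + 1j * rng.standard_normal(P)            # sources (4)
v = 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))    # noise
z = A @ s + v                                                       # model (3)

# Selection matrix E_p of (8): extracts the p-th node's microphones from z.
def selection_matrix(M_p, p):
    offset = sum(M_p[:p])
    E = np.zeros((sum(M_p), M_p[p]))
    E[offset:offset + M_p[p], :] = np.eye(M_p[p])
    return E

E_1 = selection_matrix(M_p, 1)
z_1 = E_1.T @ z                      # local signals of node p = 1, as in (7)
assert np.allclose(z_1, z[2:5])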
III. AN EQUIVALENT CENTRALIZED LCMV-BF
In the following, the centralized LCMV-BF is formulated.
We show that under certain conditions, an LCMV-BF which
operates on a transformation of the inputs is equivalent to the
LCMV-BF which directly processes the microphone signals. A
common design relaxation of using the RTFs rather than the
ATFs is formulated, and the GSC-form implementation is defined.
The distributed algorithm, derived in Section IV, will be
based on a specific transformation matrix that will conveniently
split the centralized BF into a sum of N BFs. Each of the BFs
utilizes only local microphones and shared signals, generated
as a linear combination of the local microphone signals in some
remote nodes. Together with the transmission of the local BF
outputs, a total of N + P transmission channels is required.
The centralized LCMV-BF, denoted w_LCMV, is given by:

w_LCMV = argmin_w w^H Φ_vv w  s.t.  A^H w = g   (9)

where the global constraints set is

A^H w = g   (10)

and g is a P × 1 desired response vector. Typically, the desired
response vector is comprised of values of zeros and
ones, where a value of 1 is associated with a desired speaker
and a value of 0 is associated with an interfering speaker. In
this case the BF is required to yield a combination of all the
desired speakers while mitigating the interfering speakers and
the noise. Generally, g can be any arbitrary vector. We
assume that the ATFs are linearly independent, i.e., the column
rank of the constraints matrix A is P. In practice, when M ≥ P
the latter assumption usually holds; however, it is of course not
guaranteed. In cases for which the ATFs are linearly dependent,
the constraints set might consist of contradicting requirements;
hence, no solution that satisfies all constraints can be obtained.
When contradicting constraints exist, the system designer has
to compromise and alleviate the contradiction by reducing the
number of constraints. The closed-form solution of (9) is given
by Van Veen and Buckley in [12]:

w_LCMV = Φ_vv^{-1} A (A^H Φ_vv^{-1} A)^{-1} g   (11)

where we assume that Φ_vv is invertible, since one of its components
is a spatially white sensor noise.

The output of the LCMV-BF is given by:

y = w_LCMV^H z = g^H s + w_LCMV^H v   (12)

where we have used (3) and (10). Note that the output comprises the
sum of the constrained sources, weighted by their corresponding
desired responses, and a residual noise component.
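The closed-form solution (11) is straightforward to compute numerically. The following sketch evaluates (11) under a hypothetical setup (random ATFs and a diffuse-plus-sensor-noise covariance) and verifies the constraints set (10); it illustrates the standard LCMV formula only, not any estimation procedure from the paper.

import numpy as np

# Hypothetical setup: M = 6 microphones, P = 2 constrained sources.
rng = np.random.default_rng(1)
M, P = 6, 2
A = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))   # ATFs
g = np.array([1.0, 0.0])              # keep source 1, cancel source 2
# Noise covariance: diffuse-like term plus spatially white sensor noise,
# which guarantees invertibility, as assumed below (11).
V = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_vv = V @ V.conj().T / M + 0.1 * np.eye(M)

# Closed-form LCMV weights, eq. (11).
Pi = np.linalg.solve(Phi_vv, A)                       # Phi_vv^{-1} A
w = Pi @ np.linalg.solve(A.conj().T @ Pi, g)          # (11)

# The constraints set (10) is satisfied: A^H w = g.
assert np.allclose(A.conj().T @ w, g)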
Suppose that rather than z, a linear transformation of the
inputs is available:

t = T^H z   (13)

where T is an M × M̃ matrix and M̃ ≥ M. Assuming that
the column-subspace of T is full rank, i.e., its rank is M, we
will show that the LCMV-BFs in the original and in the transformed
domains are equivalent. Denote the following terms in
the transformed domain:

Ã = T^H A   (14a)
Φ̃_vv = T^H Φ_vv T.   (14b)

Consider the following constraints set in the transformed domain:

Ã^H w̃ = g.   (15)

According to the fundamental theorem of linear algebra, any
BF, w̃, in the transformed domain can be expressed as
the sum of two components:

w̃ = w̃_∥ + w̃_⊥   (16)

where w̃_∥ and w̃_⊥ lie in the column-subspace of T^H and its
complementary subspace, respectively. Similarly to (9), the
LCMV criterion in the transformed domain is:

w̃_LCMV = argmin_w̃ w̃^H Φ̃_vv w̃  s.t.  Ã^H w̃ = g.   (17)

Note that from the definition of Ã and Φ̃_vv in (14a) and (14b),
their columns lie in the column-subspace of T^H. Hence, substituting
(16) in the transformed constraints set (15) and in the
minimization of the transformed LCMV-BF (17) yields:

Ã^H w̃ = Ã^H w̃_∥ = g   (18a)
w̃^H Φ̃_vv w̃ = w̃_∥^H Φ̃_vv w̃_∥   (18b)
w̃_LCMV,∥ = argmin_{w̃_∥} w̃_∥^H Φ̃_vv w̃_∥  s.t.  Ã^H w̃_∥ = g   (18c)

where the orthogonal component w̃_⊥ can be chosen arbitrarily,
since it affects neither the noise power at the output nor the
satisfaction of the constraints set. Any w̃_∥ can be expressed as
a linear combination of the columns of T^H:

w̃_∥ = T^H f   (19)

where f is an M × 1 vector.
Substituting (19) in (18a), (18b), (18c), the criterion (18c) becomes:

f_LCMV = argmin_f f^H Ψ Φ_vv Ψ f  s.t.  A^H Ψ f = g.   (20)

Note that

Ψ ≜ T T^H   (21)

is a full-rank M × M matrix since both T and T^H are
rank-M matrices. Hence, similarly to (11),
the closed-form LCMV-BF of (20) in the transformed domain
equals:

w̃_LCMV = T^H (Ψ Φ_vv Ψ)^{-1} Ψ A (A^H Ψ (Ψ Φ_vv Ψ)^{-1} Ψ A)^{-1} g + w̃_⊥.   (22)

Substituting (14a), (14b) and (11) in (22) yields

w̃_LCMV = T^H Ψ^{-1} w_LCMV + w̃_⊥   (23)

where we also used the invertibility of Ψ. It can be easily
deduced that the BFs in the original and transformed domains
are equivalent as their outputs coincide:

w̃_LCMV^H t = w_LCMV^H Ψ^{-1} T T^H z = w_LCMV^H z.   (24)
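The equivalence (24) can also be checked numerically. The sketch below, under the same kind of hypothetical setup, recomputes the LCMV-BF from the transformed quantities (14a)-(14b) and confirms that the two outputs coincide. The pseudo-inverse stands in for the restriction to the column-subspace of T^H; this is an implementation choice of the sketch, not a prescription of the paper.

import numpy as np

# Numerical check of (24): any full row-rank T (here a random tall
# transformation) preserves the LCMV output. Setup is hypothetical.
rng = np.random.default_rng(2)
M, M_tilde, P = 6, 9, 2
A = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))
g = np.array([1.0, 0.5])
V = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi = V @ V.conj().T / M + 0.1 * np.eye(M)
T = rng.standard_normal((M, M_tilde)) + 1j * rng.standard_normal((M, M_tilde))

def lcmv(Phi, C, g):
    # pinv is used since the transformed covariance (14b) is rank-deficient.
    Pi = np.linalg.pinv(Phi) @ C
    return Pi @ np.linalg.solve(C.conj().T @ Pi, g)

w = lcmv(Phi, A, g)                                    # original domain (11)
w_t = lcmv(T.conj().T @ Phi @ T, T.conj().T @ A, g)    # transformed domain

z = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # any snapshot
t = T.conj().T @ z                                         # inputs (13)
assert np.allclose(w.conj() @ z, w_t.conj() @ t)           # outputs coincide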
In practice, the ATFs of the speakers are unknown and difficult
to estimate. A practical solution can be obtained by replacing
the sources in (12) with filtered versions thereof [21],
[22], [26], [27]. Let f_m; m = 1,...,P be such filters. The RTF
of the mth source in the transformed domain is defined as:

h̃_m ≜ ã_m / f_m.   (25)

The filters f_m; m = 1,...,P will be determined in Section IV.
Note that these procedures may require non-overlapping activity
patterns of the speakers.

Define the transformed ATF and RTF matrices, both of dimensions
M̃ × P, respectively:

Ã = [ã_1, ..., ã_P]   (26a)
H̃ = [h̃_1, ..., h̃_P].   (26b)

The modified constraints set is finally given by substituting Ã
by H̃ in (10):

H̃^H w̃ = g.   (27)

The modified centralized LCMV-BF (in the transformed domain),
which satisfies the modified constraints set in (27), is denoted
by w̃_H and is given in closed-form, similarly to (22):

w̃_H = T^H Ψ^{-1} Φ_vv^{-1} H (H^H Φ_vv^{-1} H)^{-1} g + w̃_⊥   (28)

where H ≜ [a_1/f_1, ..., a_P/f_P] is the RTF matrix in the original
domain (so that H̃ = T^H H), and w̃_⊥ is an arbitrary vector lying in the null-subspace of the
column-subspace of T^H. Similarly to (18b), (18c), we identify
that the component of w̃_H which lies in the column-subspace
of T^H is:

w̃_{H,∥} = T^H Ψ^{-1} Φ_vv^{-1} H (H^H Φ_vv^{-1} H)^{-1} g.   (29)

The GSC-form implementation of (29), denoted centralized
GSC-BF [19], [21], is obtained by splitting w̃_{H,∥} into two components:

w̃_{H,∥} = w̃_0 - B̃ q̃.   (30)

Both w̃_0 and the columns of B̃ lie in the column-subspace
of T^H. The vector w̃_0, denoted fixed beamformer
(FBF), lies in the column-subspace of H̃. It is responsible
for maintaining the modified constraints set (27), and equals:

w̃_0 = H̃ (H̃^H H̃)^{-1} g.   (31)

The blocking matrix (BM) B̃ blocks the RTFs of the
constrained speakers. Explicitly,

H̃^H B̃ = 0.   (32)

Since the ranks of T and H̃ are M and P, respectively, the
rank of B̃ is M - P and its dimensions are M̃ × (M - P).
The BM is not unique and can be obtained in several ways, for
example, as suggested in [12], [28], by applying the singular
value decomposition (SVD). To construct the BM, the SVD is
applied to the M × P matrix H, rather than to H̃, and the resulting
null-subspace basis B is then projected to the transformed domain,
B̃ = T^H Ψ^{-1} B. Using this procedure, an
M̃ × (M - P) BM is obtained. Denote the noise canceler (NC)
by an (M - P) × 1 vector q̃. According to [12] it equals:

q̃ = (B̃^H Φ̃_vv B̃)^{-1} B̃^H Φ̃_vv w̃_0.   (33)

Note that the invertibility of B̃^H Φ̃_vv B̃ is guaranteed
by the definition (14b) and by the BM construction procedure
above.
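The following sketch assembles the GSC components (30)-(33) for the simple case T = I, so the transformed-domain quantities coincide with the original ones; the RTF matrix and the noise covariance are again hypothetical placeholders.

import numpy as np

# Sketch of the GSC components (30)-(33), taking T = I.
rng = np.random.default_rng(3)
M, P = 6, 2
H = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))   # RTFs
g = np.array([1.0, 0.0])
V = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi = V @ V.conj().T / M + 0.1 * np.eye(M)

# FBF (31): lies in the column-subspace of H and satisfies H^H w0 = g.
w0 = H @ np.linalg.solve(H.conj().T @ H, g)

# BM (32) via the SVD, as suggested in [12], [28]: the last M - P left
# singular vectors of H span the null-subspace of H^H.
U, _, _ = np.linalg.svd(H)
B = U[:, P:]                                   # M x (M - P), H^H B = 0
assert np.allclose(H.conj().T @ B, 0)

# Optimal NC (33), and the resulting GSC weights (30).
q = np.linalg.solve(B.conj().T @ Phi @ B, B.conj().T @ Phi @ w0)
w_gsc = w0 - B @ q
assert np.allclose(H.conj().T @ w_gsc, g)      # constraints still satisfied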
To enable the construction of the DGSC in Section IV, an
extended GSC-structure is proposed:

w̄ = w̄_0 - B̄ q̄   (34)

where the regular GSC components, w̃_0, B̃ and q̃, are
replaced by:

w̄_0 = w̃_0 + ũ + B̃ ṽ   (35a)
B̄ = B̃ + Ũ   (35b)
q̄ = q̃ + ṽ.   (35c)

Here, the regular FBF is extended by the vectors ũ
and B̃ ṽ, and the regular BM is extended by the matrix
Ũ. The extensions ũ and Ũ lie in the column null-subspace
of the matrix T (i.e., T ũ = 0 and T Ũ = 0, and hence, since
H̃ = T^H H, also H̃^H ũ = 0 and H̃^H Ũ = 0), and B̃ ṽ lies in the null-subspace of H̃^H. For
any choice of ṽ the modified constraints set (27) is
maintained. Note that the regular GSC can be obtained as a special
case of (35a), (35b) and (35c) by setting ũ = 0, Ũ = 0
and ṽ = 0. As will be seen in the sequel, the introduction
of ũ, Ũ and ṽ will enable us to derive a
distributed version of the GSC.

Now, we show that:

w̄^H t = (w̃_0 - B̃ q̂)^H t   (36)

i.e., that w̄ and the regular GSC-structure are equivalent. By substituting (35a),
(35b), (35c) in (34), it is evident that:

w̄ = w̃_0 - B̃ q̂ + (ũ - Ũ q̄)   (37)

where q̂ is identified as:

q̂ = q̄ - ṽ.   (38)

Since ũ and the columns of Ũ lie in the null-subspace of T, the
term ũ - Ũ q̄ does not affect the output w̄^H t = w̄^H T^H z.
This concludes the proof of the equivalence between the extended
and the regular GSC-structures.

The output signal of the proposed GSC-structure is given by:

ỹ = ỹ_0 - ỹ_NC   (39)

where ỹ_0 and ỹ_NC are the outputs of the upper and
lower branches of the GSC, respectively:

ỹ_0 = w̄_0^H t   (40a)
ỹ_NC = q̄^H ū   (40b)
ū = B̄^H t   (40c)

and ū are the noise reference signals at the output of the BM.
Substituting the constraints set of (27) in (39) yields:

ỹ = g^H F_d s + w̄^H T^H v   (41)

where F_d ≜ diag{f_1, ..., f_P}. Note that the output of the GSC in the transformed domain
and, by equivalence, the LCMV in the original domain, is comprised
of a summation of filtered versions of the sources and
a residual noise component. It is interesting to compare the different
combinations of the constrained sources at the output of
the regular LCMV-BF (12) and the extended GSC-BF (41).

In conclusion, applying a transformation T that preserves
the rank-M signal subspace guarantees the equivalence between
the LCMV-BFs in the original and the transformed domains.
Furthermore, an equivalent extended GSC structure exists
in the transformed domain. Its optimality can be guaranteed
by designing a FBF (35a) which satisfies the transformed constraints
set (27), and by designing a BM (35b) with M - P
linearly independent noise references.

In the following section we propose a specific transformation
which enables the construction of a distributed version of the
extended GSC-BF.
IV. DGSC
A recursive distributed version of the GSC-BF is now proposed.
We present a specific transformation matrix T which
conveniently splits the centralized GSC into a sum of N GSC-BFs,
denoted w̄_p for p = 1,...,N, operating in each of the
WASN nodes. The proposed transformation matrix consists of
N sub-matrices:

T = [T_1, ..., T_N]   (42)

where the transformed inputs of the pth node are constructed by

t_p = T_p^H z   (43)

and the concatenation of all transformed inputs yields:

t = [t_1^T, ..., t_N^T]^T = T^H z.   (44)

Note that T_p is an M × M̃_p matrix and the corresponding transformed
input t_p is an M̃_p × 1 vector for p = 1,...,N. The
sub-matrices T_p; p = 1,...,N will be defined later.
The transformed inputs of each node will comprise all of
its local microphone signals and a subset of the P shared signals.
With the proposed transformation each node has at least P
input signals, allowing for the P constraints to be maintained
locally, without unnecessarily sacrificing degrees of freedom, as
will be shown in the following sub-sections. In this section, the
selection of the P shared signals is arbitrary, and should only
satisfy linear independence. We will elaborate on this matter in
Section IV-C. A specific and simple selection of the P shared
signals is given in Section V. The outputs of the local GSC-BFs,
denoted y_p for p = 1,...,N, and the P shared signals are
transmitted in the WASN, where:

y_p = w̄_p^H t_p   (45)

and w̄_p is the GSC-BF at the pth node. Hence, a total of N + P
transmission channels are required by the algorithm.
These channels effectively extend the number of available
microphones at each node and should be continuously broadcast
(also after the algorithm has converged). Note that for
a node that comprises a single microphone, i.e., M_p = 1,
no communication-bandwidth reduction is obtained, since the
single microphone signal is transmitted. The global GSC-BF is
given by augmenting the nodes' BFs:

w̄ = [w̄_1^T, ..., w̄_N^T]^T.   (46)

The final output of the algorithm is obtained by substituting (44),
(45) and (46) in (39):

ỹ = Σ_{p=1}^{N} y_p = w̄^H t.   (47)

The GSC-BF at the pth node is given by:

w̄_p = w̄_{0,p} - B̄_p q̄_p   (48)

where w̄_{0,p}, B̄_p and q̄_p are the FBF, BM and NC at each node.
Substituting (48) in (47), the output of the algorithm can be restated
as:

ỹ = Σ_{p=1}^{N} (w̄_{0,p}^H t_p - q̄_p^H B̄_p^H t_p).   (49)
Considering (49), we identify the global components of
the GSC-BF (34) as a concatenation of w̄_{0,p} and q̄_p for
p = 1,...,N, respectively:

w̄_0 = [w̄_{0,1}^T, ..., w̄_{0,N}^T]^T   (50a)
q̄ = [q̄_1^T, ..., q̄_N^T]^T.   (50b)

The global BM, B̄, is constructed as a block-diagonal matrix
with N blocks:

B̄ = blkdiag{B̄_1, ..., B̄_N}.   (51)

Similarly to the notation in (39), (40a), (40b), (40c) for the
global GSC, the outputs of the upper and lower branches, and
the noise references at the pth node, are defined as:

ỹ_{0,p} = w̄_{0,p}^H t_p   (52a)
ū_p = B̄_p^H t_p   (52b)
ỹ_{NC,p} = q̄_p^H ū_p   (52c)
y_p = ỹ_{0,p} - ỹ_{NC,p}.   (52d)

The global noise references vector is given by augmenting the
noise reference signals of all nodes:

ū = [ū_1^T, ..., ū_N^T]^T.   (53)

A proper selection of shared signals ensures that the number
of linearly independent noise references at the output of the global BM is M - P,
and hence satisfies the requirement that B̄^H Φ̃_vv B̄ is a full-rank
matrix.

In the following, we prove analytically that the proposed
DGSC converges to the centralized GSC. In Section IV-A we
propose a proper transformation matrix T, that will allow us to
split the BF into the structure defined by (49). We show that the
proposed transformation matrix preserves the rank-M signals
subspace, as required for the equivalence shown in Section III.
The design of the FBF, the BM, and the NC of the DGSC is
presented in Sections IV-B, IV-C, IV-D. This structure is shown
to satisfy the requirements of Section III.
A. The Transformation Matrix
In the following, we define some notations for formulating
the DGSC. The node that transmits the shared signal of the mth
speaker is denoted as the "owner" of the mth source. In Section V
we describe the procedure for selecting the owners of each of the
P signals1, and for generating the shared signals. Denote by o_m
the index of the node which is the owner of the mth source. The
shared signals are denoted by d_m; m = 1,...,P. Consider
the mth shared signal, corresponding to the mth source. Assume
that the mth source is owned by the pth node, i.e., o_m = p. We
suggest to construct the shared signal as:

d_m = w_{s,m}^H z_p   (54)

where w_{s,m} is an M_p × 1 "local" BF that processes only the microphone
signals of the pth node. A specific choice of the BFs w_{s,m}
for m = 1,...,P will be defined in Section V.

Denote by 𝒪_p ≜ {m : o_m = p} the set of sources
owned by the pth node, and by P_p ≜ |𝒪_p| the number of sources
owned by the pth node. The shared signals generated by the pth
node are defined in a vector notation by the vector:

d_p^own ≜ [d_m]_{m ∈ 𝒪_p}   (55)
       = W_p^H z_p   (56)

where

W_p ≜ [w_{s,m}]_{m ∈ 𝒪_p}.   (57)

The M_p × P_p dimensional matrix W_p should be properly constructed
to have a rank P_p. As each source is exclusively owned
by a single node,

Σ_{p=1}^{N} P_p = P.   (58)

The vector of all shared signals is constructed by augmenting
the contributions of all nodes:

d ≜ [(d_1^own)^T, ..., (d_N^own)^T]^T.   (59)

Note that some of the nodes may own no sources. For instance,
suppose that the pth node does not own any source. In that case,
P_p = 0 and the corresponding vector of shared signals d_p^own
will be empty.

1 A node can be the owner of several sources.
The set of indexes of the speakers is denoted by
𝒫 ≜ {1,...,P}. Denote the set of shared signals that the pth node
receives as ℛ_p. It comprises the indexes of all sources except
the self-owned sources:

ℛ_p ≜ 𝒫 \ 𝒪_p   (60)

where \ denotes the set subtraction operation and |ℛ_p| = P - P_p.
The vector of shared signals received by the pth node is
denoted by:

d_p^rec ≜ [d_m]_{m ∈ ℛ_p}.   (61)

As previously defined in (43), the signals available for processing
at the pth node are denoted by t_p, an M̃_p × 1 vector:

t_p = [z_p^T, (d_p^rec)^T]^T   (62a)
    = T_p^H z.   (62b)

From (62a), the number of transformed input signals at the pth
node is given by:

M̃_p = M_p + P - P_p.   (63)

Note that z_p and d_p^rec are linearly independent, since
they comprise different microphones. Now, since the rank of W_p
in (57) is P_p, it follows that the rank of d_p^own, viewed as linear
combinations of z_p, is also P_p. Hence,
we argue that the rank of T_p is M̃_p = M_p + P - P_p. A similar
argument can be applied to T. Constructed as a concatenation
of T_1,...,T_N, its rank equals M.

We designate the mth shared signal, d_m, as the reference microphone
for the mth source RTF (25). We identify the acoustic
transfer function (TF) relating the mth source and the mth shared
signal (54) as:

f_m = w_{s,m}^H E_{o_m}^T a_m.   (64)

Now, the mth RTF (25) can be defined with respect to the mth
shared signal. Considerations for constructing w_{s,m}; m = 1,...,P
will be discussed in Section V.

The proposed M × M̃ dimensional transformation matrix is
finally given by:

T = [T_1, ..., T_N],  T_p = [E_p, G_p]   (65)

where G_p ≜ [E_{o_m} w_{s,m}]_{m ∈ ℛ_p} is the M × (P - P_p) matrix
that generates the shared signals received by the pth node, and we note that

M̃ = Σ_{p=1}^{N} M̃_p = M + NP - P.   (66)

It can be easily shown that the rank of the column-subspace of
T is M, since a column permutation of the M × M identity matrix
(formed by the selection matrices E_1,...,E_N) is a sub-matrix of T.
Hence, T is a valid transformation matrix, rendering w̃
and w equivalent (24).

According to (43) and (62a), the transformed input vector in
the pth node is the M̃_p-dimensional vector:

t_p = [z_p^T, (d_p^rec)^T]^T   (67)

where the received shared signals at the pth node are given by
the vector d_p^rec = G_p^H z.
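A minimal sketch of this construction is given below: each node stacks its local microphone signals with the shared signals it receives, as in (62a). The ownership pattern, microphone selections and dimensions are hypothetical, and the local BFs are taken as single-microphone selections, anticipating Section V.

import numpy as np

# Sketch of the transformation (62a)/(65): node p's transformed input stacks
# its local microphones with the shared signals it receives.
rng = np.random.default_rng(4)
M_p, P = [2, 3, 2], 2
M, N = sum(M_p), len(M_p)
offsets = np.cumsum([0] + M_p)
owner = [0, 1]            # source m is owned by node owner[m] (hypothetical)
mic_of = [0, 3]           # global index of the microphone selected per source

z = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d = z[mic_of]             # shared signals (54) with selection-vector local BFs

def transformed_input(p):
    z_p = z[offsets[p]:offsets[p + 1]]                    # local signals (7)
    received = [d[m] for m in range(P) if owner[m] != p]  # d_p^rec, (61)
    return np.concatenate([z_p, np.array(received)])      # t_p, (62a)

t_1 = transformed_input(1)          # node 1 owns source 1, receives source 0
assert len(t_1) == M_p[1] + P - 1   # M̃_p = M_p + P - P_p, (63)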
Examining (3) and (43), the transformed inputs vector of the
pth node is given by:

t_p = Ã_p s + T_p^H v   (68)

where Ã_p ≜ T_p^H A is an M̃_p × P matrix. Define

F ≜ Ŵ^H A   (69)

where Ŵ is defined as

Ŵ ≜ [E_{o_1} w_{s,1}, ..., E_{o_P} w_{s,P}]   (70)

and collects the local BFs w_{s,m} of (54); its node-p columns coincide
with those of W_p defined in (57). The elements of the P × P matrix F
are the ATFs relating the speakers and the shared signals.
We assume that F is a full-rank matrix. The condition for the
invertibility of F is given in Section V.

We now show that the rank of Ã_p is P for p = 1,...,N.
Notice that stacking the rows of Ã_p that correspond to the
received shared signals (the rows of F indexed by ℛ_p) with the
rows W_p^H E_p^T A (the rows of F indexed by 𝒪_p, which are linear
combinations of the local-microphone rows of Ã_p) yields a row
permutation of F. Since rank(F) = P, we conclude that
rank(Ã_p) = P.

Determining T as above is instrumental for transforming
the centralized GSC-BF into a sum of N GSC-BFs in the
transformed domain. The total output of the DGSC algorithm
is available at each of the nodes in the WASN.

In the following sections we substitute Ã_p by the RTF
matrix

H̃_p ≜ Ã_p F_d^{-1}   (71)

for p = 1,...,N, with F_d ≜ diag{f_1, ..., f_P} as in (41). A block-diagram
of the proposed algorithm is depicted in Fig. 1.

Fig. 1. The DGSC.
B. The Distributed FBF
Had A been known to all nodes, it would have been possible
to calculate the classic centralized FBF, w̃_0. In our case, we
propose a distributed FBF consisting of a summation of local
FBFs, which are calculated from the transformed RTFs at each
node. Explicitly, the proposed distributed FBF at the pth node
is defined as:

w̄_{0,p} ≜ (1/N) H̃_p (H̃_p^H H̃_p)^{-1} g.   (72)

As H̃_p equals Ã_p up to a different column scaling, its rank
equals P. Therefore, H̃_p^H H̃_p is an invertible matrix. As stated
earlier, the FBF is not unique, and can have different forms
with different selections of ũ, ṽ in (35a), (35b). Various
choices of the FBF will differ in their robustness to estimation
errors.

It can be easily verified, by substituting (72) in (50a), that the
global distributed FBF (35a) satisfies the global constraints set
(27), since

H̃^H w̄_0 = Σ_{p=1}^{N} H̃_p^H w̄_{0,p} = Σ_{p=1}^{N} (1/N) g = g.   (73)

This simple FBF design utilizes each of the WASN microphones
and is not optimal in any sense. The robustness analysis of the
proposed algorithm with respect to estimation errors is out of the scope of
the current contribution.
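A short numerical check of (72)-(73), with hypothetical transformed RTF blocks: each local FBF satisfies a 1/N fraction of the desired response, so the stacked global FBF meets the constraints set (27).

import numpy as np

# Sketch of the distributed FBF (72): each node scales a local
# constraint-satisfying BF by 1/N; the stacked global FBF then meets (27).
rng = np.random.default_rng(5)
N, P = 3, 2
Mt_p = [4, 5, 4]                       # transformed input sizes per node
g = np.array([1.0, 0.0])

H_blocks = [rng.standard_normal((m, P)) + 1j * rng.standard_normal((m, P))
            for m in Mt_p]             # H̃_p: transformed RTFs at node p

# Local FBFs, eq. (72).
w0_blocks = [Hp @ np.linalg.solve(Hp.conj().T @ Hp, g) / N for Hp in H_blocks]

# Global check, eq. (73): the sum of local constraint responses equals g.
resp = sum(Hp.conj().T @ w0p for Hp, w0p in zip(H_blocks, w0_blocks))
assert np.allclose(resp, g)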
C. The Distributed BM
As mentioned earlier, the BM is not unique, and several procedures
for its construction are available. Recently, we have proposed
an efficient implementation of a sparse BM [28]. Similarly
to the construction of the BM in Section III, we propose
that the pth node construct a transformed BM by applying
the SVD to H̃_p, for p = 1,...,N. The SVD of H̃_p is

H̃_p = U_p Σ_p V_p^H   (74)

where the column-space of the first P columns of U_p coincides with the column-space
of H̃_p. The null-subspace of H̃_p^H is spanned by the remaining columns
of U_p, and hence B̄_p ≜ [U_p]_{:, P+1:M̃_p} is an adequate BM at the pth node. Since
the column rank of H̃_p is P, the dimensions of the BM at the
pth node are M̃_p × (M̃_p - P), and its column rank is M̃_p - P.

Next, we prove that B̄ is a valid BM. From its construction
(51), it trivially blocks H̃; hence, in order to complete the proof,
we need to show that B̄^H Φ̃_vv B̄ is of full-rank. From the definition
of Φ̃_vv in (14b), and since Φ_vv is full-rank (rank M), the
latter condition is equivalent to showing that the column rank
of T B̄ is M - P.
The rank of T_p B̄_p is M̃_p - P, since T_p has full column rank
M̃_p and the column rank of B̄_p is M̃_p - P. The column-subspaces
of the blocks T_p B̄_p for different nodes can overlap only through
the P directions spanned by the columns of Ŵ, which generate the
shared signals common to the nodes; apart from these shared directions,
the columns of T_p B̄_p involve only the microphones of the pth node.
Since each local BM satisfies the P local constraints, these common
degrees of freedom are removed, and the column-subspaces of the
blocks T_p B̄_p are linearly independent. Finally, T B̄ is a concatenation
of the sub-matrices T_p B̄_p for p = 1,...,N; hence, its rank equals
the sum of the ranks of the blocks:

rank(T B̄) = Σ_{p=1}^{N} (M̃_p - P) = M - P.

Based on the above discussion, it is guaranteed that B̄^H Φ̃_vv B̄
is a full-rank (M - P) × (M - P) matrix.
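The per-node BM construction (74) and the block-diagonal global BM (51) can be sketched as follows; the transformed RTF blocks are hypothetical, and scipy's block_diag is used only for convenience.

import numpy as np
from scipy.linalg import block_diag

# Sketch of the distributed BM: each node keeps the last M̃_p - P left
# singular vectors of its transformed RTF matrix (74); the global BM is the
# block-diagonal concatenation (51).
rng = np.random.default_rng(6)
N, P = 3, 2
Mt_p = [4, 5, 4]
H_blocks = [rng.standard_normal((m, P)) + 1j * rng.standard_normal((m, P))
            for m in Mt_p]

B_blocks = []
for Hp in H_blocks:
    Up, _, _ = np.linalg.svd(Hp)        # SVD (74)
    B_blocks.append(Up[:, P:])          # M̃_p x (M̃_p - P) local BM

B = block_diag(*B_blocks)               # global BM (51)
H = np.vstack(H_blocks)                 # global transformed RTF matrix
assert np.allclose(H.conj().T @ B, 0)   # B̄ blocks H̃, as required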
D. The Distributed NC
The normalized LMS (NLMS) adaptation of the global NC
in [21] is given by:

q̄(ℓ+1) = q̄(ℓ) + μ (ū(ℓ) ỹ*(ℓ)) / p̂(ℓ)   (75)

where p̂(ℓ) is a recursive estimator of the power of the noise
reference signals, i.e., of E{ū^H(ℓ) ū(ℓ)}:

p̂(ℓ) = λ p̂(ℓ-1) + (1-λ) ū^H(ℓ) ū(ℓ)   (76)

where λ is a forgetting factor (typically close to 1). Due to inevitable
estimation errors, some of the speech signals might leak
into the noise reference signals. In order to prevent the self-cancellation
phenomenon, which is manifested in a severe speech
distortion, the NC is updated according to (75) only when the
speakers are inactive. A perfect voice activity detector (VAD)
is assumed for this purpose. The total output of the algorithm,
ỹ(ℓ), is available to all nodes as the summation in (49). As
clearly seen in (75), the noise reference signals at the pth node,
ū_p, only affect q̄_p. Hence, updating the NC is equivalent
to simultaneous updates of the distributed NCs q̄_p;
p = 1,...,N. Explicitly, the recursive update of the distributed
NC is given by:

q̄_p(ℓ+1) = q̄_p(ℓ) + μ (ū_p(ℓ) ỹ*(ℓ)) / p̂_p(ℓ)   (77)

where p̂_p(ℓ) is the estimated power of the global noise reference
vector at the pth node. We assume that
the powers of the local noise reference signals at the various
nodes are approximately the same, i.e.,
E{ū_p^H ū_p}/(M̃_p - P) ≈ E{ū_{p'}^H ū_{p'}}/(M̃_{p'} - P); p, p' = 1,...,N. Hence, the estimated
power at the pth node is:

p̂_p(ℓ) = λ p̂_p(ℓ-1) + (1-λ) ((M - P)/(M̃_p - P)) ū_p^H(ℓ) ū_p(ℓ).   (78)

The latter assumption can be circumvented by sharing estimates
of the variance of the noise reference signals ū_p;
p = 1,...,N in the WASN. Assuming that the noise statistics
is slowly varying, the latter exchange of power estimates does
not consume a large bandwidth.
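A minimal sketch of one distributed NLMS step (77)-(78) is given below; the step-size, forgetting factor and signals are hypothetical placeholders, and the function is meant to be called only in speech-absent frames, as flagged by the VAD.

import numpy as np

# Sketch of the distributed NLMS update (77)-(78) for one node and one
# noise-only frame.
def nc_update(q_p, u_p, y_total, p_hat, M, P, Mt_p, mu=0.1, lam=0.95):
    """One NLMS step of node p's noise canceler.

    q_p     : local NC coefficients (complex vector, length Mt_p - P)
    u_p     : local noise references, eq. (52b)
    y_total : total DGSC output of the current frame, eq. (49)
    p_hat   : running power estimate of the global reference vector
    """
    # Local power, scaled to approximate the global reference power (78).
    scale = (M - P) / (Mt_p - P)
    p_hat = lam * p_hat + (1 - lam) * scale * np.real(u_p.conj() @ u_p)
    # NLMS step (77): correlate the references with the total output.
    q_p = q_p + mu * u_p * np.conjugate(y_total) / p_hat
    return q_p, p_hat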
V. SHARED SIGNALS CONSTRUCTION
Here, we propose a simple procedure for generating the
shared signals, which is based on selecting the microphone
with the highest SNR for each of the sources. Since the mth
shared signal is used as the reference signal in the definition of
the RTF (64), and since in practice the RTF is unknown and has
to be estimated, it is desired that the SNR of the mth source at
its reference be maximal. The SNR of a microphone with respect to some
source is defined as the ratio between the source power and
the power of the slowly time-varying noise.

As mentioned in Section IV, the shared signals should be constructed
such that the column rank of F is P. Therefore, a microphone that
was selected as the shared signal of a certain source cannot be
chosen as a shared signal for another source; otherwise, the rank of
F would be lower than P.

During the initialization of the algorithm, each node sets
𝒞_p, the index set of its candidate microphones for shared
signals. For each source m = 1,...,P the following procedure
is applied. First, the pth node estimates SNR_{p,i}^(m); i ∈ 𝒞_p,
the mth source SNR at each of its available local microphones.
Each node selects the microphone with the
highest SNR. The SNR and the index of the candidate microphone
of the pth node are:

SNR_p^(m) = max_{i ∈ 𝒞_p} SNR_{p,i}^(m)   (79a)
i_p^(m) = argmax_{i ∈ 𝒞_p} SNR_{p,i}^(m).   (79b)

Each node shares the maximal SNR with the rest of the
nodes. The node with the maximum SNR is declared the
owner of the mth source, i.e.:

o_m = argmax_p SNR_p^(m).   (80)

The o_m th node constructs the BF that extracts the mth shared
signal,

w_{s,m} = e_{i_{o_m}^(m)}   (81)

where e_i denotes a selection vector extracting the ith local microphone,
and removes i_{o_m}^(m) from its set of candidate microphones to own
a signal:

𝒞_{o_m} ← 𝒞_{o_m} \ {i_{o_m}^(m)}.   (82)

This way, it is guaranteed that a single microphone will not
be chosen more than once. The procedure is repeated for all P
sources, resulting in the entire set of shared signals. Note that
some nodes may be the owners of more than a single source,
and some nodes may have no ownership of sources. The proposed
method is very simple, and does not require any processing
for constructing the shared signals. In practice, F is
usually full-rank; however, this is not guaranteed. In case F
is rank-deficient, a simple procedure of replacing some of
the shared signals until the rank is full can be applied.
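The selection procedure (79)-(82) amounts to a greedy per-source maximization with bookkeeping of used microphones. A minimal sketch follows, with hypothetical SNR estimates; in practice these would be estimated from the data, the per-node maxima would be exchanged over the WASN, and each node is assumed to retain at least one candidate microphone.

import numpy as np

# Sketch of the shared-signal selection (79)-(82): per source, every node
# nominates its best remaining microphone by SNR, and the best node becomes
# the owner.
def select_owners(snr, M_p):
    """snr: list over sources of lists over nodes of per-mic SNR arrays."""
    P, N = len(snr), len(M_p)
    candidates = [set(range(m)) for m in M_p]   # candidate mic sets, per node
    owner, mic = [], []
    for m in range(P):
        # (79a)-(79b): each node's best candidate microphone for source m.
        best = [max(candidates[p], key=lambda i: snr[m][p][i])
                for p in range(N)]
        # (80): the node with the overall maximal SNR owns source m.
        o = max(range(N), key=lambda p: snr[m][p][best[p]])
        owner.append(o)
        mic.append(best[o])
        candidates[o].discard(best[o])          # (82): mic used only once
    return owner, mic

rng = np.random.default_rng(7)
M_p = [2, 3, 2]
snr = [[rng.random(m) for m in M_p] for _ in range(2)]   # P = 2, hypothetical
print(select_owners(snr, M_p))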
VI. A COMPARISON BETWEEN THE DGSC
AND THE LC-DANSE
We compare the proposed DGSC and the LC-DANSE [23].
Both algorithms converge to the centralized LCMV-BF. The
LC-DANSE implements a distributed version of the closed-form
LCMV, whereas the DGSC adopts the GSC implementation
of the LCMV structure. In the DGSC, a common objective
for all nodes, i.e., the classification of desired and competing
speakers, yields a single common constraints set. A more
general approach is adopted by the LC-DANSE, which allows
node-specific constraint sets. In practice, this enables each node
to define its own objective, i.e., a set of desired and competing
speakers. The LC-DANSE is an iterative algorithm (although
the iterations can be carried out recursively over time), while the
DGSC is a time-recursive algorithm. The GSC structure conveniently
decouples the task of noise reduction from the task
of satisfying the constraints set, hence allowing the adaptive
noise canceler (ANC) to adjust to variations in the noise statistics.
The DGSC requires N + P transmission channels, whereas
the LC-DANSE requires NP transmission channels. Both
algorithms require estimates of the sources' RTFs. In static scenarios,
the DGSC requires a single estimate thereof, whereas
in the LC-DANSE, each iteration requires additional RTF estimates.
In the following section, we experimentally compare the
DGSC and the LC-DANSE.
VII. EXPERIMENTAL STUDY
In order to verify the equivalence between the centralized
GSC and the proposed DGSC, a comprehensive experimental
study is carried out. The validity of the proposed algorithm is
tested for narrowband signals in Section VII-A and for speech
signals in Section VII-B. We compare the following five algorithms,
namely, the centralized closed-form LCMV, the centralized
GSC, a single node local GSC (arbitrarily chosen as the
first node), the LC-DANSE and the proposed DGSC algorithm.
The comparison criteria are the noise reduction and the distortion of the
constrained sources. As opposed to the global BFs and the DGSC
algorithm, where the number of constraints can be as large as the
total number of microphones in the WASN (P ≤ M), the local
GSC can handle only scenarios where the number of constraints
does not exceed the number of local microphones (P ≤ M_1).
The performance is averaged over multiple Monte-Carlo experiments
in various scenarios.
A. Narrowband Signals
A WASN comprising several nodes, each consisting of
multiple microphones, was simulated. We denote as constrained
sources those sources for which desired responses exist
and are maintained with a proper linear constraints set. Furthermore,
we denote as unconstrained sources all interfering
sources that the scenario comprises. We examine a total of 32 scenarios,
covering all tested combinations of the numbers of constrained
and unconstrained sources. A spatially white
Gaussian sensor noise is added to the microphone signals. In
each scenario (a specific selection of the numbers of constrained
and unconstrained sources), 10 sets of
source ATFs and a vector of desired responses are randomized.
For each set, 10 realizations of independent
identically distributed (IID) Gaussian processes are
drawn. These signals serve as the constrained and unconstrained
sources. Note that in the narrowband case all sources
are stationary. A total of 3200 Monte-Carlo experiments are
used for the comparison of the various algorithms. The SNR,
the ratio between the constrained signals power and the spatially
white sensor noise, is set to 30 dB, and the interference to
noise ratio (INR), the ratio between the unconstrained sources
power and the sensor noise, is set to 25 dB. The same NLMS
step-size is used in all adaptive algorithms. The results of the
LC-DANSE algorithm are measured after 10 iterations. We
assume that the RTFs are known without estimation errors;
hence, no distortion of the constrained signals is measured for
the centralized LCMV, the centralized GSC, and the DGSC, for
any number of constraints. For the single node GSC, there is no
distortion as long as the number of constraints does not exceed the
number of local microphones, but beyond that point, due to the lack of
degrees of freedom (only a limited number of beams can be
steered by the local array), distortion is inevitable. The distortion measured in the
LC-DANSE is also low (about -23 dB) in all scenarios.

Fig. 2. The convergence of the tested algorithms versus the number of samples.
Fig. 3. The NR of the tested algorithms versus the number of constraints.
TABLE I. The ratio of the noise level at the outputs of the DGSC and the centralized GSC [dB].

The noise reduction (NR) of the various algorithms after convergence
versus the number of constraints is depicted in
Fig. 3. The figure of merit is defined as the ratio between the
slowly time-varying noise power at the input and at the output.
As expected, the NR of the centralized GSC is about 0.35 dB
lower than that of the centralized LCMV. This is a result of using the
LMS algorithm, which suffers from excess MSE; it can be mitigated
by reducing the step-size, at the expense of a slower convergence
rate. The NR of the proposed DGSC is 0.52 dB lower than
that of the centralized GSC (probably since a longer convergence time
is required), whereas the NR of the single node GSC is much
lower (from 7.7 dB to 47.6 dB, depending on the number of
constraints). The NR performance of all BFs reduces as the
number of constraints increases. The convergence of the NR
versus the number of samples is depicted in Fig. 2 for one of
the scenarios.

Fig. 4. The room setup of one of the Monte Carlo simulations.
Fig. 5. The SNR improvement of the tested algorithms in various Monte Carlo experiments.
Fig. 6. The SIR improvement of the tested algorithms in various Monte Carlo experiments.
Fig. 7. The distortion of the tested algorithms in various Monte Carlo experiments.
TABLE II. Performance comparison of the centralized GSC, the DGSC and the single node GSC algorithms with speech signals.
Fig. 8. The convergence of the tested algorithms versus time.
Although the proposed DGSC and the centralized GSC converge
to more or less the same NR as the centralized LCMV,
the convergence time of the DGSC is higher. This may result
from the higher condition number, defined as the ratio of the
largest and smallest eigenvalues, of the noise references covariance
matrix B̄^H Φ̃_vv B̄. A higher condition number is known to increase
the convergence time [29]. For example, in the depicted
scenario, the average condition number of the noise references
covariance matrix of the DGSC is 6.9 dB higher than that of the centralized
GSC. The latter phenomenon may be attributed to the
extension vector ũ in (35a), which increases the norm of the ANC in
(35c); however, this subject requires further research.

The ratio of the noise level at the output of the DGSC and
the noise level at the output of the centralized GSC is given in
Table I for various numbers of constraints.
B. Speech Signals
The performance of the various BFs is tested in a simulated
room scenario, by using a room impulse response (RIR) generator
[30], [31]. The dimensions of the simulated room and its
reverberation time are kept fixed in all experiments. A WASN in
which each node comprises microphones spaced 5 cm apart is set.
The nodes are located at the center of each of the four walls, 10 cm from the
wall surface and at a height of 1.5 m. A desired female speaker
and a competing male speaker are located in the room, as well
as two white Gaussian stationary interferences. The figures of
merit of the BFs are tested by 90 Monte Carlo experiments,
where in each experiment the source locations are randomly
selected, and the microphone constellation remains fixed. The
room setup of one of the Monte Carlo experiments is depicted
in Fig. 4.

The microphone signals are sampled at a rate of
8 kHz. The length of the STFT window is 4096 points with
75% overlap between frames. The estimated RTFs are double-sided
filters, 3072 coefficients long. They are estimated using
the subspace method as in [22]. In the DGSC algorithm, in
order to save communication bandwidth, the signals undergo an
inverse STFT prior to the broadcast in the network. We use
the overlap-and-save scheme for applying the filters in the
STFT domain [21], [32]. The SNR improvement, signal to
interference ratio (SIR) improvement and distortion measures
of the centralized GSC, the DGSC and the single node GSC for
the various Monte Carlo experiments are depicted in Figs. 5,
6, 7, respectively.
The SNR is the ratio between the powers of the desired
speaker and the stationary noise, the SIR is the ratio between
the powers of the desired speaker and the competing speaker,
and the distortion is the ratio between the MSE of the desired
speech at the output and the power of the desired speech signal.
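For completeness, a small sketch of these figures of merit, computed from the separated signal components that are available in a simulation; the choice of the desired-speech reference (here, its component at the input) is an assumption of this sketch.

import numpy as np

# Sketch of the figures of merit defined above; all inputs are hypothetical
# time-domain arrays of equal length, split into desired speech (s),
# stationary noise (n) and competing interference (i), at input and output.
def figures_of_merit(s_in, n_in, i_in, s_out, n_out, i_out):
    """Returns (SNR improvement, SIR improvement, distortion), all in dB."""
    def power(x):
        return np.mean(np.abs(x) ** 2)
    snr_gain = 10 * np.log10(power(s_out) / power(n_out)) \
             - 10 * np.log10(power(s_in) / power(n_in))
    sir_gain = 10 * np.log10(power(s_out) / power(i_out)) \
             - 10 * np.log10(power(s_in) / power(i_in))
    # Distortion: MSE of the desired speech at the output relative to the
    # power of the desired speech (lower, i.e., more negative, is better).
    distortion = 10 * np.log10(power(s_out - s_in) / power(s_in))
    return snr_gain, sir_gain, distortion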
The SNR and the SIR at the input are set to 13 dB and 0 dB,
respectively. It is clear from these figures that the noise reduction values
of the DGSC and the centralized GSC are equivalent, and that
both outperform the single node GSC. The average figures of
merit of the various algorithms are depicted in Table II. The SNR
improvements of the DGSC and the centralized GSC are similar
(20.1 dB and 19.3 dB, respectively; the slight differences
may be explained as in the narrowband case), while the SNR
improvement of the single node GSC is significantly lower (1.7
dB). The SIR improvement and the distortion of the centralized
GSC are 22.9 dB and -23.0 dB, respectively, whereas the
corresponding measures of the DGSC are a bit worse, 18.6
dB and -20.3 dB, respectively. This may be attributed to
differences in the robustness of the BFs against RTF estimation
errors (see the discussion in the narrowband case). Due to the significantly
lower number of microphones, the SIR improvement
and distortion of the single node GSC (11.0 dB and -14.1 dB,
respectively) are much worse than those of the centralized GSC. The
centralized GSC and the DGSC exhibit comparable convergence
behavior, as depicted in Fig. 8. Note that the single node
GSC converges much faster, but its overall performance is very
poor.

Fig. 9. Sonograms of the various components of the signal received in the first microphone, and the outputs of the centralized GSC, the DGSC and the single node GSC. (a) Desired speaker; (b) Competing speaker; (c) First microphone; (d) Centralized GSC; (e) DGSC; (f) Single node GSC.
Sonograms of the various components of the signal received
in the first microphone, and the outputs of the centralized GSC,
the DGSC and the single node GSC are depicted in Fig. 9. The
equivalence of the DGSC and the centralized GSC and their
superiority to the single node GSC can be deduced from the
figures.
VIII. CONCLUSIONS
In this paper, we have introduced the DGSC, a novel distributed
algorithm for speech enhancement in multiple-speaker,
noisy and reverberant environments. It is proven analytically that
the proposed algorithm converges to the optimal centralized
GSC-BF. The adaptive procedure of the DGSC is based on the
low-complexity, time-recursive NLMS algorithm. A common
linear constraints set, comprising the speakers' ATFs, is shared by
all nodes in the network. The algorithm requires N + P transmission
channels. The GSC structure splits the BF into two components:
the first component lies in the constraints (speakers)
subspace, and the second component lies in its corresponding
null-space. The constraints subspace component of the DGSC
is determined at the initialization phase of the algorithm, where
the shared signals are constructed by a selection procedure in the
WASN. In static environments this procedure should be applied
only at the initialization stage. The second component is implemented
as an adaptive algorithm which converges during speech-absent
time segments.
A comprehensive experimental study validates the equivalence
between the centralized GSC and the DGSC algorithms.
The proposed algorithm was tested successfully for both
narrowband and speech signals in multiple Monte Carlo
experiments.
REFERENCES
[1] D. Estrin, G. Pottie, and M. Srivastava, “Instrumenting the world with
wireless sensor networks,” in Proc. IEEE Int. Conf. Acoust., Speech,
Signal Process. (ICASSP), May 2001, pp. 2033–2036.
[2] D. Culler, D. Estrin, and M. Srivastava, “Overview of sensor networks,”
Computer, vol. 37, no. 8, pp. 41–49, Aug. 2004.
[3] H. Ochiai, P. Mitran, H. Poor, and V. Tarokh, “Collaborative beamforming
for distributed wireless ad hoc sensor networks,” IEEE Trans.
Signal Process., vol. 53, no. 11, pp. 4110–4124, Nov. 2005.
[4] M. Ahmed and S. Vorobyov, “Collaborative beamforming for wireless
sensor networks with Gaussian distributed sensor nodes,” IEEE Trans.
Wireless Commun., vol. 8, no. 2, pp. 638–643, Feb. 2009.
[5] S. Wehr, I. Kozintsev, R. Lienhart, and W. Kellermann, "Synchronization
of acoustic sensors for distributed ad-hoc audio networks and its
use for blind source separation,” in Proc. IEEE 6th Int. Symp. Multimedia
Software Eng., Dec. 2004, pp. 18–25.
[6] Y. Jia, Y. Luo, Y. Lin, and I. Kozintsev, “Distributed microphone arrays
for digital home and office,” in Proc. IEEE Int. Conf. Acoust.,
Speech, Signal Process. (ICASSP), May 2006, vol. 5, pp. 1065–1068.
[7] S. Doclo, M. Moonen, T. Van den Bogaert, and J. Wouters, "Reduced-bandwidth
and distributed MWF-based noise reduction algorithms for
binaural hearing aids,” IEEE Trans. Audio, Speech, Lang. Process., vol.
17, no. 1, pp. 38–51, Jan. 2009.
[8] A. Bertrand and M. Moonen, “Distributed adaptive node-specific
signal estimation in fully connected sensor networks—Part I: Sequential
node updating,” IEEE Trans. Signal Process., vol. 58, no. 10, pp.
5277–5291, Oct. 2010.
[9] S. Markovich-Golan, S. Gannot, and I. Cohen, “A reduced bandwidth
binaural MVDR beamformer,” in Proc. Int. Workshop Acoust. Echo
Noise Control (IWAENC), Tel Aviv, Israel, Aug. 2010.
[10] T. C. Lawin-Ore and S. Doclo, “Analysis of rate constraints for
MWF-based noise reduction in acoustic sensor networks,” in Proc.
IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May
2011, pp. 269–272.
[11] A. Bertrand, “Applications and trends in wireless acoustic sensor
networks: A signal processing perspective,” in Proc. IEEE Symp.
Commun. Veh. Technol. (SCVT), Ghent, Belgium, Nov. 2011, pp. 1–6.
[12] B. D. Van Veen and K. M. Buckley, “Beamforming: A versatile
approach to spatial filtering," IEEE ASSP Mag., vol. 5, no. 2, pp. 4–24, Apr. 1988.
[13] S. Doclo and M. Moonen, "GSVD-based optimal filtering for multi-microphone
speech enhancement," in Microphone Arrays: Signal Processing
Techniques and Applications. New York: Springer, 2001, pp.
111–132.
[14] S. Doclo and M. Moonen, “GSVD-based optimal filtering for single
and multimicrophone speech enhancement,” IEEE Trans. Signal
Process., vol. 50, no. 9, pp. 2230–2244, Sep. 2002.
[15] J. Capon, “High-resolution frequency-wavenumber spectrum analysis,”
Proc. IEEE, vol. 57, no. 8, pp. 1408–1418, Aug. 1969.
[16] M. Er and A. Cantoni, “Derivative constraints for broad-band element
space antenna array processors,” IEEE Trans. Acoust., Speech, Signal
Process., vol. ASSP-31, no. 6, pp. 1378–1393, Dec. 1983.
[17] S. Doclo, A. Spriet, J. Wouters, and M. Moonen, “Speech distortion
weighted multichannel Wiener filtering techniques for noise reduction,”
in Speech Enhancement, J. Benesty, S. Makino, and J. Chen,
Eds. New York: Springer, 2005, pp. 199–228.
[18] O. L. Frost, III, “An algorithm for linearly constrained adaptive array
processing,” Proc. IEEE, vol. 60, no. 8, pp. 926–935, Aug. 1972.
[19] L. J. Griffiths and C. W. Jim, "An alternative approach to linearly constrained
adaptive beamforming,” IEEE Trans. Antennas Propag., vol.
AP-30, pp. 27–34, Jan. 1982.
[20] B. R. Breed and J. Strauss, “A short proof of the equivalence of LCMV
and GSC beamforming,” IEEE Signal Process. Lett., vol. 9, no. 6, pp.
168–169, Jun. 2002.
[21] S. Gannot, D. Burshtein, and E. Weinstein, "Signal enhancement using
beamforming and nonstationarity with applications to speech,” IEEE
Trans. Signal Process., vol. 49, no. 8, pp. 1614–1626, Aug. 2001.
[22] S. Markovich-Golan, S. Gannot, and I. Cohen, “Multichannel
eigenspace beamforming in a reverberant noisy environment with
multiple interfering speech signals,” IEEE Trans. Audio, Speech,
Lang. Process., vol. 17, no. 6, pp. 1071–1086, Aug. 2009.
[23] A. Bertrand and M. Moonen, "Distributed node-specific LCMV beamforming
in wireless sensor networks,” IEEE Trans. Signal Process.,
vol. 60, no. 1, pp. 233–246, Jan. 2012.
[24] M. F. A. Ahmed and S. A. Vorobyov, “Sidelobe control in collaborative
beamforming via node selection,” IEEE Trans. Signal Process., vol. 58,
no. 12, pp. 6168–6180, Dec. 2010.
[25] A. Bertrand and M. Moonen, "Distributed adaptive estimation of node-specific
signals in wireless sensor networks with a tree topology,” IEEE
Trans. Signal Process., vol. 59, no. 5, pp. 2196–2210, May 2011.
[26] I. Cohen, “Relative transfer function identification using speech signals,”
IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 451–459,
Sep. 2004.
[27] S. Markovich-Golan, S. Gannot, and I. Cohen, “Subspace tracking of
multiple sources and its application to speakers extraction,” in Proc.
IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2010,
pp. 201–204.
[28] S. Markovich-Golan, S. Gannot, and I. Cohen, "A sparse blocking matrix
for multiple constraints GSC beamformer," in Proc. IEEE Int. Conf.
Acoust., Speech, Signal Process. (ICASSP), Mar. 2012, pp. 197–200.
[29] B. Widrow and S. D. Stearns, "The LMS algorithm," in Adaptive Signal
Processing, S. Haykin, Ed. Englewood Cliffs, NJ: Prentice-Hall,
1985.
[30] J. Allen and D. Berkley, “Image method for efficiently simulating
small-room acoustics,” J. Acoust. Soc. Amer., vol. 65, no. 4, pp.
943–950, Apr. 1979.
[31] E. Habets, Room Impulse Response (RIR) Generator, Jul. 2006. [Online]. Available:
http://home.tiscali.nl/ehabets/rir_generator.html
[32] J. J. Shynk, "Frequency-domain and multirate adaptive filtering," IEEE
Signal Process. Mag., vol. 9, no. 1, pp. 14–37, Jan. 1992.
Shmulik Markovich-Golan (M’12) received the
B.Sc. (cum laude) and M.Sc. degrees in electrical
engineering from the Technion—Israel Institute of
Technology, Haifa, Israel, in 2002 and 2008 respectively.
He is currently pursuing the Ph.D. degree
at the Engineering Faculty in Bar-Ilan University.
His research interests include multi-channel signal
processing, distributed sensor networks, speech enhancement
using microphone arrays and distributed
estimation.
Sharon Gannot (S’92–M’01–SM’06) received
his B.Sc. degree (summa cum laude) from the
Technion—Israel Institute of Technology, Haifa,
Israel in 1986 and the M.Sc. (cum laude) and Ph.D.
degrees from Tel-Aviv University, Israel in 1995
and 2000 respectively, all in electrical engineering.
In 2001 he held a post-doctoral position at the
department of Electrical Engineering (ESAT-SISTA)
at K.U.Leuven, Belgium. From 2002 to 2003 he held
a research and teaching position at the Faculty of
Electrical Engineering, Technion-Israel Institute of
Technology, Haifa, Israel. Currently, he is an Associate Professor at the Faculty
of Engineering, Bar-Ilan University, Israel, where he is heading the Speech
and Signal Processing laboratory. Prof. Gannot is the recipient of Bar-Ilan
University outstanding lecturer award for 2010.
Prof. Gannot is currently an Associate Editor of IEEE TRANSACTIONS ON
AUDIO, SPEECH, AND LANGUAGE PROCESSING. He served as an Associate Editor
of the EURASIP Journal of Advances in Signal Processing between 2003–2012,
and as an Editor of two special issues on Multi-microphone Speech Processing
of the same journal. He also served as a guest editor of ELSEVIER Speech Communication
Journal and a reviewer of many IEEE journals and conferences.
Prof. Gannot has been a member of the Audio and Acoustic Signal Processing
(AASP) technical committee of the IEEE since Jan., 2010. He has also been
a member of the Technical and Steering committee of the International Workshop
on Acoustic Echo and Noise Control (IWAENC) since 2005 and the general
co-chair of IWAENC held at Tel-Aviv, Israel in August 2010. Prof. Gannot
will serve as the general co-chair of the IEEE Workshop on Applications of
Signal Processing to Audio and Acoustics (WASPAA) in 2013. His research
interests include parameter estimation, statistical signal processing, especially
speech processing using either single- or multi-microphone arrays.
Israel Cohen (M’01–SM’03) is an Associate
Professor of electrical engineering at the Technion—
Israel Institute of Technology, Haifa, Israel.
He received the B.Sc. (summa cum laude), M.Sc.
and Ph.D. degrees in electrical engineering from the
Technion—Israel Institute of Technology, in 1990,
1993 and 1998, respectively.
From 1990 to 1998, he was a Research Scientist
with RAFAEL Research Laboratories, Haifa, Israel
Ministry of Defense. From 1998 to 2001, he was a
Postdoctoral Research Associate with the Computer
Science Department, Yale University, New Haven, CT. In 2001 he joined the
Electrical Engineering Department of the Technion. His research interests are
statistical signal processing, analysis and modeling of acoustic signals, speech
enhancement, noise estimation, microphone arrays, source localization, blind
source separation, system identification and adaptive filtering.
He is a coeditor of the Multichannel Speech Processing section of the
Springer Handbook of Speech Processing (Springer, 2008), a coauthor of
Noise Reduction in Speech Processing (Springer, 2009), a coeditor of Speech
Processing in Modern Communication: Challenges and Perspectives (Springer,
2010), and a general co-chair of the 2010 International Workshop
Echo and Noise Control (IWAENC).
Prof. Cohen is a recipient of the Alexander Goldberg Prize for Excellence in
Research, and the Muriel and David Jacknow Award for Excellence in Teaching.
He served as Associate Editor of the IEEE TRANSACTIONS ON AUDIO, SPEECH,
AND LANGUAGE PROCESSING and IEEE SIGNAL PROCESSING LETTERS, and as
Guest Editor of a special issue of the EURASIP Journal on Advances in Signal
Processing on Advances in Multimicrophone Speech Processing and a special
issue of the Elsevier Speech Communication Journal on Speech Enhancement.