Professional Documents
Culture Documents
1
"Why has Mark Zuckerberg taped over the webcam
and microphone on his MacBook?" [4]
Disabled microphones. In this scenario, the computer
may be equipped with a microphone, which at some
point was disabled, muted (with an 'off' button), or taped
(e.g., in laptops [4]). This typically occurs when the user
wants to increase security and ensure a conversation’s
confidentiality. In these cases the malware provides the
ability to record using the headphones.
Table 2 contains the SNR measurements in decibels (dB) Figure 5. Average signal and noise energy bands for micro-
of the reference signal used in the experiments, namely, phone, one meter apart (top left); headphones, one meter apart
(top right); headphones, five meters apart (bottom left); and
a sequential recitation of sentences in the Harvard sen- headphones, nine meters apart (bottom right).
tences list. Note that SAR and PESQ measures are not
included, since this table addresses the pre-recorded ref-
erence signal alone.
Tables 3 and 4 show the results for the speech quality
measures for headphone and microphone recordings for
different recording distances. Table 5 shows the quality
degradation of the reference signal and headphone rec-
orded speech after encoding and decoding, reproducing
a subsequent transmission of the acquired speech via the
Internet. The codec used was the Advanced Audio Cod-
ing (AAC) [21], the potential MP3 successor and default
codec for YouTube, the iPhone, iPod, and other media
devices. This codec has a compression ratio of approxi-
mately ten to one. We note that in general, SNR meas-
urements are highly dependent on an accurate segmenta-
tion of speech versus noise excerpts. Therefore, in order
to optimize segmentation accuracy and consistency Figure 6. Energy histograms for microphone, one meter apart
across the different setups investigated, voice activity de- (top left); headphones, one meter apart (top right); headphones,
tection was estimated for the reference and not recorded five meters apart (bottom left); and headphones, nine meters
signals. This segmentation was then applied to pairs of apart (bottom right).
reference and recorded signals after they were time-
aligned through cross-correlation.
away, (3) headphones, five meters away, and (4) head-
Summary: The results above show a decrease in the phones, nine meters away.
SNR when replacing the microphone with headphones
Figure 5 displays average energy in voice-active regions
for the recordings. Nevertheless, note that the SAR and
(signal) compared to the average energy in voice-inac-
PESQ indices are much less affected by this change.
tive (noise) regions. Note the sharp drop in the SNR in
These indicators, being directly correlated with intelligi-
the headphone channel setup for frequencies above
bility, support our subjective evaluation in which the
around 1500 Hz in comparison with the microphone
headphone recordings in our experiments were judged to
setup. High frequencies are further compromised as re-
be intelligible.
cording distance increases.
4.3 Spectral analysis
In addition to the objective SNR measures presented, we Figure 6 contains histograms for the energy levels in dec-
also provide frequency domain graphs corresponding to ibels for each frequency band. The histograms illustrate
features used in the above calculations, comparing four the lack of spectrum variability for the headphone spec-
transmission setups: (1) microphone, one meter away tra in comparison with that of the microphone. Note, as
(from the computer), (2) headphones, one meter
well, the relatively flat energy distribution for the head- 1 m. 5 m. 9 m.
phones, especially at higher frequencies. 2 60
1.8
50
The results portrayed in the figures and tables indicate 1.6
that of the speech quality measures utilized the 1.4
40
SNR (DB/HZ)
related with intelligibility. These measures consistently 1 30
decrease as the distance increases, are far better for mi- 0.8
20
crophone recordings (versus headphone recordings), and 0.6
decrease after AAC coding, as expected. The SAR index 0.4
10
is of particular interest, since it is known to correlate, to 0.2
a certain extent, with subjective ratings thus assessing 0 0
speech intelligibility. Note, for instance, that the SAR in- 1 4 7 9 12 15 18 21
FREQUENCY (KHZ)
dex for a microphone positioned nine meters away from Figure 7. Channel capacity and SNR as a function of frequency.
the speech source is in between the indices obtained for
headphones located three and five meters apart from the SNR values were then used to evaluate the channel ca-
speech source. In addition, note that the codec’s impact pacity for each of the 100 Hz bandwidth sub-channels
on degradation does not contribute to a substantial de- for different transmission distances, using the linear ap-
crease in the speech quality. proximation previously described.
4.4 Channel capacity Our experiments suggest that headphones turned into mi-
Thus far we have assessed the speech recording quality crophones have significant potential for covert infor-
attained using headphones as microphones for speech mation transmission, particularly considering that nor-
transmission. In this sub-section, we investigate the po- mal human hearing capabilities typically decrease at fre-
tential of using the headphone acquired acoustic waves quencies over 10 kHz, and large inaudible spectrum
to convey digital information, in terms of channel capac- regions are available for communication at reasonable
ity. We focus on frequencies beyond the hearing range, rates.
which can be seen as secure and covert channels for 4.5 Improving SNR by combining headphone channels
transferring information between two computers. Headphones and earphones contain two speakers (right
Channel capacity ( ) is a measure of the theoretical up- and left) which are transformed by SPEAKE(a)R into
per bound on the rate at which information can be trans- two microphones (channels). It known that the SNR of
mitted (in bits per second) over a communication chan- sampled audio can be reduced by averaging the outputs
nel by means of signal S. Under the assumptions of ad- of the two channels. Assuming that the noise signals pre-
ditive interfering Gaussian noise ( ) and available sent in different channels are random and thus uncorre-
bandwidth ( (Hz)), the channel capacity can be calcu- lated, combine the headphone/loudspeakers’ right and
lated using the Shannon-Hartley theorem [21]: left channels would average out the noise, while enhanc-
ing the desired signals which are correlated. Theoreti-
cally, the uncorrelated noise sums as a root sum square,
resulting in a √2 increase, while perfectly correlated sig-
= log 1+ nals increase by a factor of 2. This difference yields a 3
dB increase in the SNR. Nevertheless, in practice, we did
As the formula indicates, the higher the SNR and chan- not succeed in improving the SNR of speech signals ac-
nel bandwidth, the higher the amount of information that quired through the headphones. We aligned and com-
can be conveyed. Note that for large SNRs (S/N >>1) bined right and left headphone channels, but the overall
≈ 0.33 ∗ ∗ ( ). Using this approximation, SNR gain obtained was a marginal 0.1 dB. We believe
Figure 7 shows SNR values and respective channel ca- that due to the headphone channels’ proximity, there is a
pacity measured for different frequency ranges in our ex- high level of correlation between the channels, and the
perimental setup, calculated as follows. Similar to the averaging process is inefficient.
previous experiments, pure sinusoidal tones were played
from a source located at different distances from the re- 5. Related Work
ceiving computer and recorded via the headphones at a It is known that PC malware and mobile apps can use a
44.1 kHz sample rate. The SNR was measured for con- microphone for spying purposes, and many types of spy-
secutive 100 Hz frequency bands up to 22 kHz as the ware with recording capabilities have been found in the
power ratio between the respective frequency tone and wild [22] [23] [24] [25] [26] [27]. In 2015, Google re-
the average background noise over the bandwidth. moved its listening software from the Chromium
browser after receiving complaints about potentially ex- Table 6. Defensive Countermeasures
posing private conversations [28]. More recently, Face- Countermeasure Pros Cons
book was suspected of (and denied) using a mobile de- Prohibit the use of head- Hermetic Low usability
vice's microphone to eavesdrop on conversations so it phones/earphones/speakers protection
could better target ads [29]. In addition, there are cur-
rently many applications sold on the Internet that facili- Use one way speakers/on- Hermetic Not relevant to
tate the use of microphones and cameras to gather infor- board amplifiers protection headphones and
mation for surveillance and other purposes [30]. How- earphones
ever, such mobile or desktop applications require either Disable BIOS/UEFI audio co- Easy to Low usability
built-in or external microphones. dec deploy