Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio).
We apply our white-box iterative optimization-based attack to Mozilla’s implementation DeepSpeech end-to-end, and show it has a 100% success rate.
The feasibility of this attack introduces a new domain to study adversarial examples.
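
For readers wondering what a "white-box iterative optimization-based attack" looks like in practice, here is a minimal toy sketch. It is not the authors' code and does not touch DeepSpeech: the "model" is a made-up random linear classifier over a 2000-sample sine wave, and the loss is plain cross-entropy rather than whatever loss a real speech-to-text system would need. The names `targeted_attack`, `eps`, `lr` and the chosen sizes are all assumptions for illustration. The only idea carried over is the general recipe: with full access to the model's gradients, repeatedly nudge a small, bounded perturbation until the model outputs the label we picked.

```python
import math
import random


def logits(W, b, x):
    """Scores of a toy linear classifier: one dot product per class."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def targeted_attack(W, b, x, target, eps=0.1, lr=0.01, steps=200):
    """Iteratively search for delta with |delta_i| <= eps such that the
    classifier labels x + delta as `target` (white-box: uses W directly)."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        z = logits(W, b, x_adv)
        if max(range(len(z)), key=z.__getitem__) == target:
            break  # the model already outputs the chosen label
        # Gradient of cross-entropy w.r.t. the logits: softmax - one_hot(target)
        p = softmax(z)
        p[target] -= 1.0
        for i in range(len(x)):
            # Chain rule through the linear layer gives the input gradient
            g = sum(p[k] * W[k][i] for k in range(len(W)))
            step = -lr if g > 0 else lr if g < 0 else 0.0
            # Signed step, then clip so the perturbation stays small
            delta[i] = min(eps, max(-eps, delta[i] + step))
    return delta


# Hypothetical setup: 5 classes, a 2000-sample "waveform"
random.seed(0)
n, classes = 2000, 5
W = [[random.gauss(0, 1) / math.sqrt(n) for _ in range(n)] for _ in range(classes)]
b = [0.0] * classes
x = [0.5 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

original = predict(W, b, x)
target = (original + 1) % classes  # any label other than the original
delta = targeted_attack(W, b, x, target)
x_adv = [xi + di for xi, di in zip(x, delta)]
print("original:", original, "target:", target, "adversarial:", predict(W, b, x_adv))
```

The bound `eps` here is arbitrary and far looser than the similarity figure quoted above; a real attack on a speech model would also need a loss that handles variable-length transcriptions, which this sketch deliberately leaves out.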
From one of our Blue Blaze irregulars... "Audio
Adversarialism is the practice of fooling voice-to-text and voice
recognition systems by effectively embedding ‘hidden’ commands in audio
files which are inaudible to human ears but which are picked up by
speakers and mean, in theory, that we might hear the telly saying
“Should have gone to Specsavers!” where instead our Amazon Echo is in
fact hearing “Alexa, lock all the doors, turn on the gas and start
sparking all the bogs in 00:59, 00:58…”. This is...not scary at all, oh
no. Hi Siri! Hi Alexa!"