Kevin's Security Scrapbook: Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Thursday, May 3, 2018

We construct targeted audio adversarial examples on automatic speech recognition.

Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio).
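The post does not say how "99.9% similar" is measured, and we have not tied the figure to a specific metric. The underlying paper quantifies distortion as the loudness of the perturbation relative to the original, in decibels. It defines dB(x) = max_i 20·log10(|x_i|) and dB_x(δ) = dB(δ) − dB(x), so more negative values mean a quieter perturbation. The minimal NumPy sketch below computes that quantity. The function names are our own, and the sample values are made up for illustration.

```python
import numpy as np

def peak_db(x: np.ndarray) -> float:
    """Peak loudness in decibels: max_i 20*log10(|x_i|)."""
    peak = float(np.max(np.abs(x)))
    if peak == 0.0:
        return float("-inf")  # silence has no finite loudness
    return 20.0 * np.log10(peak)

def relative_distortion_db(x: np.ndarray, delta: np.ndarray) -> float:
    """dB_x(delta) = dB(delta) - dB(x); more negative means a quieter perturbation."""
    return peak_db(delta) - peak_db(x)

# Hypothetical 16-bit-style samples: the original peaks at 10000, the perturbation at 10.
x = np.array([1000.0, -10000.0, 500.0])
delta = np.array([10.0, -5.0, 0.0])
print(relative_distortion_db(x, delta))  # about -60 dB
```

Because only peaks are compared, a perturbation 1000 times smaller in amplitude than the original's loudest sample comes out at −60 dB.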

We apply our white-box, iterative, optimization-based attack end-to-end to Mozilla’s DeepSpeech implementation and show that it has a 100% success rate.
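To make "white-box iterative optimization-based" concrete, here is a toy sketch. It is not the DeepSpeech attack. It assumes a hypothetical linear "recognizer" `W` that maps a waveform to one of a few class labels, and it swaps the CTC loss that the paper optimizes against DeepSpeech for plain cross-entropy. Because the attacker knows `W` (white-box), it can compute the exact gradient of the loss with respect to the perturbation δ, take a step, and clip δ back into a small box |δ_i| ≤ τ (`tau`). It repeats this until the model outputs the chosen target. The names `targeted_attack`, `W`, `tau`, `lr` and `max_iters` are illustrative assumptions.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_attack(W, x, target, tau, lr=0.5, max_iters=2000):
    """Projected gradient descent on cross-entropy toward `target`, keeping |delta_i| <= tau.

    Toy stand-in only: W is a hypothetical linear model, not a speech recognizer.
    Returns (delta, success, number_of_updates).
    """
    delta = np.zeros_like(x)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for step in range(max_iters):
        logits = W @ (x + delta)
        if np.argmax(logits) == target:
            return delta, True, step
        p = softmax(logits)
        grad = W.T @ (p - onehot)  # exact d(cross-entropy)/d(delta) for a linear model
        delta = np.clip(delta - lr * grad, -tau, tau)  # gradient step, then project into the box
    return delta, bool(np.argmax(W @ (x + delta)) == target), max_iters

rng = np.random.default_rng(0)
n, num_classes = 4096, 5
W = rng.normal(size=(num_classes, n)) / np.sqrt(n)  # hypothetical white-box "recognizer"
x = rng.normal(size=n)                              # stand-in for an audio waveform
original = int(np.argmax(W @ x))
target = int(np.argmin(W @ x))  # hardest choice: the class the model currently rates least likely
delta, success, steps = targeted_attack(W, x, target, tau=0.2)
print(original, target, success, steps, float(np.max(np.abs(delta))))
```

The clip after every step is what keeps the perturbation small. The paper's attack adds a distortion penalty and tightens its bound after each success, which this sketch omits.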

The feasibility of this attack introduces a new domain in which to study adversarial examples.

From one of our Blue Blaze irregulars: "Audio Adversarialism is the practice of fooling voice-to-text and voice recognition systems by effectively embedding ‘hidden’ commands in audio files. The commands are inaudible to human ears but are picked up by smart speakers, which means, in theory, that we might hear the telly saying “Should have gone to Specsavers!” while our Amazon Echo is in fact hearing “Alexa, lock all the doors, turn on the gas and start sparking all the bogs in 00:59, 00:58…”. This is... not scary at all, oh no. Hi Siri! Hi Alexa!"