Northwestern University
EECS 352: Machine Perception of Audio
Bryan Pardo
Josh Shi
joshshi@u.northwestern.edu
Diane Liu
dianeliu@u.northwestern.edu
John Franklin
jef@u.northwestern.edu
This project takes inspiration from Google Tone and dial-up Internet, both of which use audio tones to transfer data (in the case of Google Tone, it's to share a URL).
Steganotes goes one step further and investigates if we can transfer text of arbitrary length over-the-air using those same underlying principles.
The encoding function Steganotes uses takes in two parameters: 1) a file (e.g. a text file) and 2) an optional pre-existing wav file in which the data will be encoded (if you do not provide a wav file, a simple sine wave will be generated and used instead).
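As a rough illustration of the fallback carrier, here is a minimal sketch of generating a plain sine wave with NumPy. The function name `make_sinewave` is borrowed from the pseudocode further down. The 440 Hz frequency, 1 second duration, 44.1 kHz sample rate, and 0.5 amplitude are assumptions chosen for the example, not values taken from Steganotes.

```python
import numpy as np

def make_sinewave(freq_hz=440.0, duration_s=1.0, sample_rate=44100, amplitude=0.5):
    """Return a mono sine wave as float samples in [-amplitude, amplitude].

    All default values are illustrative assumptions.
    """
    n_samples = int(round(duration_s * sample_rate))
    t = np.arange(n_samples) / sample_rate  # time of each sample in seconds
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

carrier = make_sinewave()
```

The carrier needs at least one spectrogram time frame per byte of data, so in practice its duration would have to grow with the size of the input file.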
To begin the encoding process, Steganotes first generates a spectrogram of the signal into which the data will be encoded. It steps through the time indices of that spectrogram and, for each time index i, gets the ith byte of the data. It then converts that byte into its decimal value and maxes out the frequency bin with that index, so that it becomes the loudest bin at that time index. In this way, each byte value has a one-to-one mapping to a frequency bin.
The signal is also concatenated with signals that indicate where the message begins and ends.
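The byte-to-bin mapping can be sketched on a bare magnitude matrix, without any audio involved. This is only an illustration under assumptions: `embed_bytes` and `margin` are hypothetical names, the random matrix stands in for a real spectrogram, and the start and stop signals are left out.

```python
import numpy as np

def embed_bytes(magnitudes, data, margin=1.0):
    """Return a copy of `magnitudes` (bins x frames) with one byte marked per frame.

    `margin` is an assumed amount by which the chosen bin exceeds the frame's maximum.
    """
    n_bins, n_frames = magnitudes.shape
    if n_bins < 256:
        raise ValueError("need at least 256 frequency bins to cover every byte value")
    if len(data) > n_frames:
        raise ValueError("carrier has fewer time frames than there are bytes")
    out = magnitudes.copy()
    for i, byte in enumerate(data):
        # Push bin `byte` above every other bin in frame i.
        out[byte, i] = out[:, i].max() + margin
    return out

rng = np.random.default_rng(0)
spec = rng.random((257, 8))           # stand-in magnitude spectrogram
marked = embed_bytes(spec, b"Hi")
print(marked[:, :2].argmax(axis=0).tolist())  # [72, 105], the byte values of "H" and "i"
```

Because a byte can take 256 values, the spectrogram needs at least 256 frequency bins for the mapping to stay one-to-one.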
The decoding process is then relatively straightforward. Steganotes takes in a recorded signal, generates a spectrogram of it, and steps through the time indices until it finds the start signal. From there, at each time index up to the stop signal, it records the index of the frequency bin with the highest value. This index is then converted back to the corresponding character.
    # This is not the actual code, but it gives you a good idea of what's going on.
    # make_sinewave, stft, istft, load, and concat stand in for the real helpers.

    def encode(data_file):
        signal = make_sinewave()
        spec = stft(signal)  # spec[f][t], where f marks the frequency bin and t marks the time index
        for i, byte in enumerate(data_file):
            freq_bin = toDecimal(byte)
            # Any number higher than the current maximum at time index i will do.
            spec[freq_bin][i] = abs(spec[:, i]).max() + 1
        encoded_signal = istft(spec)
        return concat(START_SIGNAL, encoded_signal, STOP_SIGNAL)

    def decode(wavfile):
        signal = load(wavfile)
        spec = stft(signal)
        message = ""
        decoding = False
        for i in spec.timeDomain:
            freq_bin = spec.indexOfMaxValueAtTimeIndex(i)
            if freq_bin == START_FREQ:
                decoding = True
                continue
            if freq_bin == STOP_FREQ:
                break
            if decoding:
                message += chr(freq_bin)
        return message
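For readers who want something they can run, the following is a simplified round trip using only NumPy. It is not the Steganotes implementation. Instead of editing the spectrogram of a carrier and inverting it, this sketch synthesizes one pure tone per byte, which keeps the same one-to-one mapping between byte values and frequency bins. The frame length and the start and stop bins (`FRAME`, `START_BIN`, `STOP_BIN`) are assumed values, and the decoder assumes a clean signal whose frames are already aligned, which a real recording would not guarantee.

```python
import numpy as np

FRAME = 1024                     # samples per symbol (assumed)
START_BIN, STOP_BIN = 300, 301   # marker bins outside the 0-255 byte range (assumed)

def tone(bin_index):
    """One frame of a cosine whose frequency falls exactly on FFT bin `bin_index`."""
    n = np.arange(FRAME)
    return np.cos(2 * np.pi * bin_index * n / FRAME)

def encode(data):
    """Concatenate a start tone, one tone per byte, and a stop tone."""
    bins = [START_BIN, *data, STOP_BIN]
    return np.concatenate([tone(b) for b in bins])

def decode(signal):
    """Read the loudest bin of each frame between the start and stop markers."""
    out = bytearray()
    decoding = False
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        loudest = int(np.argmax(np.abs(np.fft.rfft(frame))))
        if loudest == START_BIN:
            decoding = True
        elif loudest == STOP_BIN:
            break
        elif decoding and loudest < 256:
            out.append(loudest)
    return bytes(out)

message = b"The quick brown fox jumped over the lazy dog"
assert decode(encode(message)) == message
```

With no noise and perfect alignment the round trip is exact. The errors reported in the results come from playing the signal through speakers and recording it, which this sketch does not model.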
We tested the system by encoding data into a wav file and playing it through a pair of laptop speakers (13" 2017 MacBook Pro) while another laptop (15" 2014 MacBook Pro) listened for the signal and decoded it once it was received. The message encoded was "The quick brown fox jumped over the lazy dog".
The following results are from tests performed in an open room with little to no background noise. The sending laptop had its volume at about 3/4 of its maximum. High-level tests with simulated background noise (coffee shop background noise) found no significant effect on performance because the frequency bins modified are outside the range of typical background noise. However, when we performed high-level tests with background noise consisting of a few distinct voices, we found that decoding performance fell.
Distance (cm) | Recorded Message | Hamming Distance |
---|---|---|
0 | TTe !uuc !rown !ox !umped !ver ! e !azy !og | 12 |
20 | The quick brown fox jumped over the layy dog | 1 |
40 | he quick broonnfox jumppe over the lazy dogg | 6 |
60 | hh iic bron fox jmmpe oee h lay og | 16 |
80 | he !uick brown fox jumpee !ver the lazy dogg\ | 6 |
100 | The hick bronn ox mmped oeer he ayy dog | 10 |
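For reference, the Hamming distance between two strings of equal length is the number of positions at which they differ. The sketch below computes it for the 20 cm row. The function name is hypothetical, and since several recorded messages differ in length from the original, how those rows were scored is not something this sketch can reproduce.

```python
def hamming_distance(expected, received):
    """Count positions at which two equal-length strings differ."""
    if len(expected) != len(received):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(a != b for a, b in zip(expected, received))

sent = "The quick brown fox jumped over the lazy dog"
received_20cm = "The quick brown fox jumped over the layy dog"
print(hamming_distance(sent, received_20cm))  # 1
```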
The following audio file is the signal generated by the string "The quick brown fox jumped over the lazy dog".