Northwestern University
EECS 352: Machine Perception of Audio
Bryan Pardo

Josh Shi

Diane Liu

John Franklin


This project takes inspiration from Google Tone and dial-up Internet, both of which use audio tones to transfer data (in the case of Google Tone, it's to share a URL).

Steganotes goes one step further and investigates whether we can transfer text of arbitrary length over the air using those same underlying principles.

How it Works

The encoding function Steganotes uses takes in two parameters: 1) a file (e.g. a text file) and 2) an optional pre-existing wav file in which the data will be encoded (if you do not provide a wav file, a simple sine wave will be generated and used instead).
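
As a rough illustration of the fallback carrier, here is a minimal sketch that generates a plain sine wave and writes it as a 16-bit mono wav file using only the Python standard library. The function names, the 440 Hz tone, the 44100 Hz sample rate, and the amplitude are assumptions made for this example, not values taken from Steganotes.

```python
import math
import struct
import wave


def make_sinewave(freq_hz=440.0, seconds=1.0, rate=44100, amplitude=0.5):
    """Return float samples in [-1, 1] for a plain sine tone (assumed defaults)."""
    count = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(count)]


def write_wav(path, samples, rate=44100):
    """Write float samples as 16-bit mono PCM, clipping to [-1, 1] first."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(frames)
```

Calling `write_wav("carrier.wav", make_sinewave())` would produce a one-second carrier file that an encoder could then modify.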

To begin the encoding process, Steganotes first generates a spectrogram of the signal into which the data will be encoded. It steps through the indices in the time domain of that spectrogram and, for each time index i, gets the ith byte of the data. It then converts that byte into its decimal value (0 to 255) and sets the frequency bin at that index to a value higher than any other bin at time index i. In this way, each byte value has a one-to-one mapping to a frequency bin.
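
To make the byte-to-bin mapping concrete, the short sketch below computes the center frequency of the bin a given byte would occupy. The sample rate of 44100 Hz and the FFT size of 512 (which yields 257 bins, enough for 256 byte values) are assumptions chosen for illustration; the parameters Steganotes actually uses are not stated here.

```python
def bin_center_hz(bin_index, sample_rate=44100, fft_size=512):
    """Center frequency in Hz of a spectrogram bin (assumed rate and FFT size)."""
    return bin_index * sample_rate / fft_size


byte = ord("T")                  # decimal value of the byte, here 84
print(byte, bin_center_hz(byte)) # prints: 84 7235.15625
```

Under these assumed parameters, neighboring byte values sit about 86 Hz apart, which gives a sense of how precisely the receiver has to resolve the peak.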

The signal is also concatenated with signals that indicate where the message begins and ends.

The decoding process is then relatively straightforward. Steganotes takes in a recorded signal, generates a spectrogram of it, and steps through the time domain until it finds the start signal. From there, at each time index it records the index of the frequency bin with the highest value and converts that index back to the corresponding character, stopping when it reaches the stop signal.

# This is not the actual code, but it gives you a good idea of what's going on.
# spec[f][t]: f marks the frequency bin and t marks the time index.
def encode(data_file):
  signal = make_sinewave()
  spec = stft(signal)

  for i, byte in enumerate(data_file):
    bin = toDecimal(byte)
    # any number higher than the max value at time index i
    spec[bin][i] = max(abs(spec[:, i])) + 1

  encoded_signal = istft(spec)
  return concat(START_SIGNAL, encoded_signal, STOP_SIGNAL)

def decode(wavfile):
  signal = load(wavfile)
  spec = stft(signal)
  message = ""

  decoding = False
  for i in spec.timeDomain:
    bin = spec.indexOfMaxValueAtTimeIndex(i)

    if bin == START_FREQ:
      # start marker found: begin decoding at the next time index
      decoding = True
      continue

    if bin == STOP_FREQ:
      # stop marker found: the message is complete
      break

    if decoding:
      message += chr(bin)

  return message
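
For readers who want something they can run, the following is a simplified, self-contained sketch of the same idea using only the Python standard library. It is not the Steganotes implementation: instead of modifying a spectrogram and inverting it, it synthesizes one pure tone per byte and decodes by picking the strongest DFT bin in each frame. `FRAME`, `OFFSET`, `START_BIN`, and `STOP_BIN` are hypothetical values chosen for this example, and the decoder assumes frames are perfectly aligned, which a real over-the-air recording would not guarantee.

```python
import cmath
import math

FRAME = 1024               # samples per time frame (assumed)
OFFSET = 16                # assumed: byte b is carried by bin b + OFFSET, avoiding DC
START_BIN = OFFSET + 256   # hypothetical start-marker bin
STOP_BIN = OFFSET + 257    # hypothetical stop-marker bin


def tone(bin_index):
    """One frame of a sine whose frequency sits exactly on the given DFT bin."""
    return [math.sin(2 * math.pi * bin_index * n / FRAME) for n in range(FRAME)]


def encode(data):
    """Concatenate a start frame, one frame per byte, and a stop frame."""
    samples = tone(START_BIN)
    for byte in data:
        samples += tone(byte + OFFSET)
    samples += tone(STOP_BIN)
    return samples


def loudest_bin(frame):
    """Index of the strongest DFT bin, searched only over the bins in use."""
    def magnitude(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * n / FRAME)
                       for n, x in enumerate(frame)))
    return max(range(OFFSET, STOP_BIN + 1), key=magnitude)


def decode(samples):
    """Walk frame by frame, collecting bytes between the start and stop markers."""
    data = bytearray()
    decoding = False
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        k = loudest_bin(samples[start:start + FRAME])
        if k == START_BIN:
            decoding = True
        elif k == STOP_BIN:
            break
        elif decoding:
            data.append(k - OFFSET)
    return bytes(data)
```

With these definitions, `decode(encode(b"Hi"))` returns `b"Hi"`. The direct DFT is slow and is used here only to keep the example dependency-free; a practical version would likely use an FFT library and handle frame synchronization.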


We tested the system by encoding data into a wav file and playing it through a pair of laptop speakers (13" 2017 MacBook Pro) while another laptop (15" 2014 MacBook Pro) listened for the signal and decoded it once it was received. The message encoded was "The quick brown fox jumped over the lazy dog".

The following results are from tests performed in an open room with little to no background noise. The sending laptop had its volume at about 3/4 of its maximum. High-level tests with simulated coffee shop background noise found no significant effect on performance because the modified frequency bins are outside the range of typical background noise. However, when we performed high-level tests with a few distinct voices in the background, decoding performance fell.

Distance (cm) Recorded Message Hamming Distance
TTe !uuc  !rown !ox !umped !ver ! e !azy !og
The quick brown fox jumped over the layy dog
he quick broonnfox jumppe over the lazy dogg
hh   iic  bron  fox jmmpe  oee   h  lay   og
he !uick brown fox jumpee !ver the lazy dogg\
The  hick bronn  ox  mmped oeer  he  ayy dog
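
The Hamming distance column compares each recorded message with the original. A minimal sketch of such a comparison follows. Because several recorded messages differ in length from the original, this version counts any extra or missing trailing characters as mismatches; that handling is an assumption of this example and may not match how the distances in the table were computed.

```python
def hamming_distance(sent, received):
    """Count differing positions; a length difference counts as extra mismatches
    (an assumption of this sketch, since classic Hamming distance needs equal lengths)."""
    mismatches = sum(a != b for a, b in zip(sent, received))
    return mismatches + abs(len(sent) - len(received))


sent = "The quick brown fox jumped over the lazy dog"
received = "The quick brown fox jumped over the layy dog"
print(hamming_distance(sent, received))  # prints: 1
```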


The following audio file is the signal generated by the string "The quick brown fox jumped over the lazy dog".