Introduction to Speech Analysis

In [24]:
from __future__ import division
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.io.wavfile
import scipy.signal
from IPython.display import Audio

1. Wave files, sampling

Read a wave file, get its sampling rate and length, and display an audio player for it. The wave data is a one-dimensional array in which wave[i] is the amplitude of frame i. Each frame lasts 1/fs of a second, where fs is the sampling rate in frames per second.

In [25]:
filename = 'sa1.wav'

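fs, wave = scipy.io.wavfile.read(filename)
# Note: this particular file has a single (mono) channel, so wave is 1-D.
# Many audio files are stereo, in which case wave is a 2-D array of shape (frames, 2).

# format() keeps these lines valid under both Python 2 and Python 3
print('Data: {}'.format(wave))
print('Sampling rate: {}'.format(fs))
print('Audio length: {} seconds'.format(wave.size/fs))
print('Lowest amplitude: {}'.format(wave.min()))
print('Highest amplitude: {}'.format(wave.max()))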

Audio(filename)
Data: [ 2 -2  2 ...,  9  2 24]
Sampling rate: 16000
Audio length: 3.09125 seconds
Lowest amplitude: -5590
Highest amplitude: 7709
Out[25]:
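
As a minimal sketch of the relation between frames and seconds described above, the snippet below works on synthetic arrays rather than on sa1.wav. The helper names frame_to_seconds and duration_seconds are hypothetical, and NumPy is assumed to be installed. Using shape[0] instead of size keeps the duration correct for stereo data too.

```python
import numpy as np

def frame_to_seconds(i, fs):
    """Time (in seconds) at which frame i starts."""
    return i / float(fs)

def duration_seconds(signal, fs):
    """Length of a signal in seconds; works for mono (1-D) and stereo (2-D) arrays."""
    return signal.shape[0] / float(fs)

fs_demo = 16000
# 49460 frames at 16 kHz corresponds to the 3.09125 s reported above
mono = np.zeros(49460, dtype=np.int16)
stereo = np.zeros((49460, 2), dtype=np.int16)   # hypothetical two-channel version

print(frame_to_seconds(8000, fs_demo))    # frame 8000 starts half a second in
print(duration_seconds(mono, fs_demo))
print(duration_seconds(stereo, fs_demo))  # same length, even though size is doubled
```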

Let's plot the wave signal. Zoom in to look at the waves at different points in time. What do you notice?

In [26]:
def plotwave(fs, signal, maxf=None):
    """Visualize (a segment of) a wave signal.

    maxf: maximum number of frames to plot (default: the whole signal).
    """
    if maxf is None:
        maxf = signal.size
    frames = scipy.arange(signal.size)   # x-axis, in frames
    plt.plot(frames[:maxf], signal[:maxf])
    # place a tick every half second and label it in seconds
    plt.xticks(scipy.arange(0, maxf, 0.5*fs), scipy.arange(0, maxf/fs, 0.5))
    plt.xlabel('Time (seconds)')
    plt.ylabel('Amplitude')
    plt.show()
In [27]:
plotwave(fs, wave)
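
One way to zoom in on a particular point in time is to slice the signal by seconds before plotting it. The sketch below assumes NumPy is installed; the helper name segment is hypothetical, and a synthetic array stands in for the wave data.

```python
import numpy as np

def segment(signal, fs, start_s, end_s):
    """Return the frames between start_s and end_s (both in seconds)."""
    start = int(round(start_s * fs))
    end = int(round(end_s * fs))
    return signal[start:end]

fs_demo = 16000
demo = np.arange(3 * fs_demo)                 # stand-in for three seconds of audio
piece = segment(demo, fs_demo, 1.0, 1.05)     # a 50 ms window starting at 1 s

print(piece.size)    # number of frames in the window
print(piece[0])      # index of the first frame in the window
```

With the notebook's own variables, plotwave(fs, segment(wave, fs, 1.0, 1.05)) would plot that window. Note that the x-axis would then be relative to the start of the segment, not to the start of the file.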

Exercise: Complete the definition of downsample. Downsampling by a factor of n keeps every nth sample of the original sound and discards the rest. Because the sampling rate is divided by n as well, the new file has the same duration and pitch as the original.

In [28]:
def downsample(filename, factor):
    """Lower the sampling rate by factor."""
    newfilename = filename[:-4]+'-down'+str(factor)+'.wav'
    fs, wave = scipy.io.wavfile.read(filename)
    newfs = int(round(fs/factor))   # the wav header stores an integer sampling rate
    # fill in the rest
    wave = wave[::factor]           # keep every factor-th sample
    scipy.io.wavfile.write(newfilename, newfs, wave)
    
downsample('sa1.wav', 2)
downsample('sa1.wav', 4)
downsample('sa1.wav', 8)
downsample('sa1.wav', 12)
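
To see why the pitch is preserved, it can help to downsample an in-memory signal and compare durations. This is only a sketch under stated assumptions: NumPy is installed, and a synthetic 440 Hz tone replaces sa1.wav so that nothing is read from or written to disk.

```python
import numpy as np

fs_demo = 16000
t = np.arange(fs_demo) / float(fs_demo)      # one second of time stamps
tone = np.sin(2 * np.pi * 440.0 * t)         # synthetic 440 Hz tone

factor = 4
down = tone[::factor]                        # keep every 4th sample
new_fs = fs_demo // factor                   # and divide the rate by 4

# Played back at the reduced rate, the signal lasts as long as the original,
# so each cycle takes the same time and the pitch is unchanged.
dur_new_rate = down.size / float(new_fs)

# Played back at the original rate, it would last a quarter as long
# and sound two octaves higher.
dur_old_rate = down.size / float(fs_demo)

print(dur_new_rate)
print(dur_old_rate)
```

Plain slicing applies no low-pass filter, so any content above new_fs/2 in the original would alias into the result. scipy.signal.decimate is one option that filters before discarding samples.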

Notice that the general shape looks the same as the original. If you zoom in to the visualization, you will see the lower resolution of the signal.

In [29]:
fs12, wave12 = scipy.io.wavfile.read('sa1-down12.wav')
plotwave(fs12, wave12)

2. Sine waves and frequencies

If we can read a wav file and store its signal, we can also create a signal and write it to a wav file. Let's generate sine waves from the note frequencies of the 4th octave, from C4 up to C5.

In [30]:
note2freq = {'C4':261.6, 'D4':293.7, 'E4':329.6, 'F4':349.2, 'G4':392.0, 'A4':440.0, 'B4':493.9, 'C5':523.3} # in Hz (cycles per second)

# basic parameters
duration = .6  # in seconds
fs = 8000  # sampling rate
frames = scipy.arange(duration*fs)
amplitude = 4000

# cast to 16-bit integers so the signals are written as standard PCM wav data
note2signal = {note: (amplitude * scipy.sin(2*scipy.pi*frames*note2freq[note]/fs)).astype('int16') for note in note2freq}
note2signal['sp'] = scipy.zeros(int(duration*fs/4), dtype='int16')  # short pause (silence)

note = 'E4'
plotwave(fs, note2signal[note], 1000)  # visualize first 1000 frames

scipy.io.wavfile.write('sine'+str(note)+'.wav', fs, note2signal[note])  # write to file
Audio('sine'+str(note)+'.wav')
Out[30]:
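
A rough way to check that a generated sine wave has the intended frequency is to count how often it crosses zero going upward, since each cycle does so once. The sketch below assumes NumPy is installed; sine_wave and estimate_frequency are hypothetical helpers, and the estimate is crude (accurate to about one cycle over the length of the signal).

```python
import numpy as np

def sine_wave(freq, duration, fs, amplitude=1.0):
    """Sample amplitude * sin(2*pi*freq*t) at fs frames per second."""
    frames = np.arange(int(round(duration * fs)))
    return amplitude * np.sin(2 * np.pi * freq * frames / float(fs))

def estimate_frequency(signal, fs):
    """Estimate frequency as upward zero crossings per second."""
    upward = (signal[:-1] < 0) & (signal[1:] >= 0)
    return np.sum(upward) / (signal.size / float(fs))

fs_demo = 8000
e4 = sine_wave(329.6, 0.6, fs_demo, amplitude=4000)

print(fs_demo / 329.6)                    # frames per cycle of E4
print(estimate_frequency(e4, fs_demo))    # should be close to 329.6 Hz
```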

When you know the notes to sing, you can sing most aaanythiing.

In [31]:
# C major scale
allnotes = scipy.hstack([note2signal[note] for note in 'sp C4 D4 E4 F4 G4 A4 B4 C5 sp'.split()])
scipy.io.wavfile.write('cscale.wav', fs, allnotes)
Audio('cscale.wav')
Out[31]:
In [32]:
# twinkle twinkle little star
twinkle = scipy.hstack([note2signal[note] for note in 'sp C4 sp C4 sp G4 sp G4 sp A4 sp A4 sp G4'.split()])
scipy.io.wavfile.write('twinkle.wav', fs, twinkle)
Audio('twinkle.wav')
Out[32]:
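
The two melody cells above follow the same pattern: split a string of note names, look up each signal, and concatenate. A minimal sketch of that pattern as a reusable function is shown below. The name render_melody and the small demo dictionary are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

def render_melody(notes, note2signal):
    """Concatenate the signals for a space-separated string of note names."""
    return np.hstack([note2signal[name] for name in notes.split()])

fs_demo = 8000
duration = 0.6
frames = np.arange(int(round(duration * fs_demo)))

# small stand-in for the notebook's note2signal dictionary
demo_signals = {
    'C4': (4000 * np.sin(2 * np.pi * 261.6 * frames / fs_demo)).astype(np.int16),
    'G4': (4000 * np.sin(2 * np.pi * 392.0 * frames / fs_demo)).astype(np.int16),
    'sp': np.zeros(int(round(duration * fs_demo / 4)), dtype=np.int16),  # pause
}

melody = render_melody('sp C4 sp G4', demo_signals)
print(melody.size / float(fs_demo))   # total length in seconds
print(melody.dtype)                   # stays 16-bit, ready for wavfile.write
```

With the notebook's own dictionary, render_melody('sp C4 D4 E4 F4 G4 A4 B4 C5 sp', note2signal) would build the same array as the C major scale cell.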