A Spectutils Tutorial

Kai Lassfolk

Version history
2010-12-23: Translated from Finnish version
2010-12-26: Updated

Introduction

This document describes the use of the Spectutils audio analysis and visualization functions for the GNU Octave numerical programming language. The Octave functions covered here are oscgram, spec2dw, spec3dw, sonogw and hps2dt. They can be used for making oscillograms, 2D and 3D spectrograms, sonograms, and pitch plots.

While learning to use Spectutils, it is recommended to also learn how to use Octave's command-line history (e.g. browsing through previous command lines using the arrow keys). It makes using Spectutils a lot easier, because you don't have to retype every Spectutils function call for each analysis. Instead, you can use a previous call as a template and change only the parameters that need changing.

This tutorial does not cover all Spectutils functions. For the functions not yet covered here (e.g. hps2d, hpssono, rms2dw, thdn, linsweepgen, maxfreq, maxlevel etc.), please refer to their help texts, which can be displayed in Octave (e.g. "help hps2d").

Octave functions

oscgram

The oscgram function plots an oscillogram from an input audio file. Oscgram can analyze WAV files (monophonic, stereo or multichannel) and RAW (i.e. headerless) monophonic files. Only one channel of a stereo or multichannel file is analyzed. This applies to all Spectutils audio analysis functions.

After starting the GNU Octave program, a help text can be displayed by typing the following command at the Octave command prompt:

octave:1> help oscgram
Here, "octave:1>" is the Octave command prompt and "help oscgram" is the command typed by the user. If a help text for oscgram appears, it indicates that Spectutils is installed and working properly.

Oscgram has three mandatory parameters (from left to right): input sound file name, start time (offset, in seconds), and duration (in seconds) of the oscillogram. An optional fourth parameter can be used for typing a comment text (surrounded by quotes), which will be appended to the oscillogram title. A fifth, also optional, parameter may be used for specifying a set of "options" for modifying the visual appearance and other features of the oscillogram plot. A detailed description of the options is given in the oscgram help text.

Function call examples:

Example 1:

octave:2> oscgram('a2000111902-05.wav', 2.74, 0.5, 'Violin X, G string.');
Here, a2000111902-05.wav is the input sound file name, the offset (start time) is 2.74 seconds and the duration is 0.5 seconds. The last parameter is the comment text.

WAV format sound files can be either monophonic (one-channel) or multichannel. They can have any sampling rate. Any sample encoding supported by the Octave wavread function is also supported, e.g., 16 bit integer or 32 bit floating point. By default, the first channel of a multichannel file (i.e., the left channel of a two-channel stereo file) is analyzed. Another channel can be selected with an optional parameter. See help text for details.

RAW format (i.e. headerless) sound files can only be monophonic. By default, Spectutils functions assume that a RAW format file has a sampling rate of 44100 Hz and that samples are 16 bit signed integers. Other sampling rates and sample encodings can be specified with optional parameters.
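
The offset and duration parameters select a slice of the audio signal. The following is only a rough sketch of that idea in plain Octave, not the oscgram implementation. The variable names and the synthetic test signal are illustrative assumptions, and the rounding used here may differ from what Spectutils does.

```octave
% Sketch: map an offset and a duration (in seconds) to a range of samples.
% The signal is synthetic; oscgram itself reads its samples from a file.
fs = 44100;                                  % sampling rate in Hz
x = sin(2 * pi * 440 * (0:3*fs-1)' / fs);    % 3 seconds of a 440 Hz sine

offset = 2.74;                               % start time in seconds
duration = 0.1;                              % length in seconds

first = round(offset * fs) + 1;              % Octave indices start at 1
count = round(duration * fs);                % number of samples in the slice
last = min(first + count - 1, length(x));    % never read past the end

segment = x(first:last);
t = offset + (0:length(segment)-1)' / fs;    % time axis in seconds

% With a plot window available, the slice could be drawn with:
%   plot(t, segment); xlabel('Time (s)'); ylabel('Amplitude');
```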

spec2dw

The spec2dw function plots 2D magnitude spectra. Like oscgram, spec2dw supports both WAV and RAW format files.

A help text can be displayed by typing:

octave:3> help spec2dw
Spec2dw is used in a similar way as oscgram. The main difference is that spec2dw has parameters specific to Fourier analysis.

Spec2dw has 5 to 10 parameters, from left to right:

  1. Sound file specifier, which can be either a sound file name (enclosed in quotes) or a "cell" (enclosed in braces). If, for example, you want to analyze the first channel of the file test.wav, you simply type 'test.wav' as the sound file specifier. To analyze the second channel, you would type {'test.wav', 2} as the first parameter. See help text for details.
  2. Offset, i.e. start time of the spectrum in seconds.
  3. Number of Fast Fourier Transform (FFT) "points", i.e. the frequency resolution of the analysis. This must be a power of two, e.g., 256, 512, 1024, etc.
  4. The analysis window size (in samples). This must be less than or equal to the number of FFT points. It does not have to be a power of two.
  5. Window type: one of 'hanning', 'hamming' or 'rectangle'.
  6. Low frequency limit of the spectrum plot. This is the first optional parameter.
  7. High frequency limit of the spectrum plot.
  8. A weighting factor for high frequencies.
  9. A comment text appended to the plot title.
  10. A text string of one or more options, separated by commas. E.g. 'magdb' specifies that the magnitude axis is displayed in decibels using a logarithmic scale (by default, a linear magnitude scale is used), 'logfreq' specifies a logarithmic scale for the frequency axis, 'spline' plots a continuous curve between consecutive Fourier points, and 'splineplus' also plots a plus ("+") character for each Fourier point.
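
To show how the FFT size, window size, window type and decibel scale relate to each other, here is a minimal sketch of one windowed magnitude spectrum in plain Octave. It is not the spec2dw source. The synthetic input signal, the variable names and the normalization to the spectral peak are assumptions made for illustration.

```octave
% Sketch of a single windowed magnitude spectrum (not the spec2dw source).
fs = 44100;                 % sampling rate in Hz
nfft = 4096;                % number of FFT points (a power of two)
winsize = 2048;             % window size in samples, at most nfft

% Synthetic input: a 1 kHz sine, exactly one window long.
n = (0:winsize-1)';
x = sin(2 * pi * 1000 * n / fs);

% Hanning window written out explicitly (symmetric form).
w = 0.5 - 0.5 * cos(2 * pi * n / (winsize - 1));

% fft zero-pads the windowed frame up to nfft points.
X = fft(x .* w, nfft);
mag = abs(X(1:nfft/2));                    % bins from 0 Hz to just below fs/2
freq = (0:nfft/2-1)' * fs / nfft;          % bin centre frequencies in Hz

% Decibel scale relative to the largest bin; eps avoids log10(0).
mag_db = 20 * log10(mag / max(mag) + eps);

[~, peak] = max(mag);
peak_freq = freq(peak);                    % should lie close to 1000 Hz
```

The spacing between neighbouring bins is fs / nfft, so more FFT points give a finer frequency grid, while the window size determines how much of the signal each spectrum covers.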

Function call examples:

Example 1:

octave:4> spec2dw('a2001061901-01d.wav', 4.0, 4096, \
            4096, 'hanning', 20, 30000, 1, 'Viola X, C string.');
Here, a2001061901-01d.wav is the sound file name. The offset is 4.0 seconds. Both the number of FFT points and the window size are 4096. The window type is 'hanning'. The low frequency limit is 20 Hz and the high frequency limit is 30 kHz. The high frequency weighting factor is 1 (i.e., no weighting is used).

A backslash character (\) can be typed at the end of a line to continue an Octave command on the next line. Inside a function call, the backslashes can be omitted.

Example 2:

octave:5> spec2dw('a2001061901-01d.raw', 4.0, 4096, 4096, \
                'hanning', 20, 30000, 1, 'Viola X, C string.', 'magdb');
Here, the parameters are the same as in the previous example, except that the input is a RAW format file and the magnitude axis is displayed in decibels.

spec3dw

Spec3dw produces three-dimensional magnitude spectrograms.

A help text can be displayed by typing:

octave:6> help spec3dw

Spec3dw has 7 to 12 parameters. They are nearly the same as those of spec2dw. The main differences are the additional parameters concerning the time dimension. The parameters are (from left to right):

  1. Sound file specifier (as above).
  2. Time offset
  3. Duration
  4. Number of FFT points (as above)
  5. Window size
  6. Window type, either 'hanning', 'hamming' or 'rectangle'.
  7. The amount of "folding" of consecutive windows. The value specifies the number of samples advanced from one window to the next. A typical value is half the window size (or less).
  8. Low frequency limit
  9. High frequency limit
  10. High frequency weighting factor
  11. Comment text
  12. Optional parameters, e.g.: 'magdb' for displaying the magnitude axis as decibels, 'logfreq' for displaying a logarithmic frequency axis, and 'revfreq' for reversing the frequency axis.
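
The relationship between the duration, the window size and the "folding" value can be sketched with a little arithmetic. The snippet below assumes that every window must fit entirely inside the analyzed segment. The actual framing used by spec3dw may differ, and all variable names are illustrative.

```octave
% Sketch: how many spectra a given duration, window size and step produce.
% Assumes each window lies completely inside the analyzed segment.
fs = 44100;          % sampling rate in Hz
duration = 0.5;      % length of the analyzed segment in seconds
winsize = 4096;      % window size in samples
step = 512;          % samples advanced from one window to the next

nsamples = round(duration * fs);
nframes = floor((nsamples - winsize) / step) + 1;   % number of spectra
overlap = 1 - step / winsize;                       % fraction shared by neighbours
starts = (0:nframes-1)' * step / fs;                % window start times in seconds

printf('%d spectra, %.1f %% overlap\n', nframes, 100 * overlap);
```

A smaller step gives a smoother time axis at the cost of more spectra to compute and draw.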

Example function calls:

Example 1:

octave:8> spec3dw('a2001061901-01d.wav', 3.53, 0.5, 4096, \
            4096, 'hanning', 512, 0, 10000, 1, 'Viola X, C string.');
Here, the sound file name is a2001061901-01d.wav, the time offset of the plot is 3.53 seconds and the duration of the plot is 0.5 seconds. Both the number of FFT points and the window size are 4096. A Hanning window is used. A new spectrum is produced for every 512 audio samples. The low and high frequency limits are 0 Hz and 10 kHz, respectively. The high frequency weighting factor is 1 (i.e. no weighting).

Example 2:

octave:9> spec3dw('a2000111902-05.wav', 2.74, 0.5, 1024, \
            800, 'hanning', 400, 20, 4000, 1, 'Violin Y, G string.');
Here, the sound file name is a2000111902-05.wav, the number of FFT points is 1024 and the window size is 800 samples. A new spectrum analysis is performed for every 400 audio samples. The frequency range is 20-4000 Hz.

Example 3:

octave:10> spec3dw('a2000111902-05.wav', 2.74, 0.5, 1024, \
                      800, 'hanning', 400, 20, 4000, 1, \
		      'Violin Y, G string.', 'magdb,logfreq');
The parameters are the same as in the previous example, with the exception of specifying a decibel scale for the magnitude axis and a logarithmic frequency axis.

Additional parameters can be specified for sound files. These depend on the sound file format (either WAV or RAW). For WAV files, an audio channel can be specified, for example:

octave:11> spec3dw({'a2000111902-05.wav', 2}, ...);
Here, the sound file name is a2000111902-05.wav and the analyzed audio channel is 2 (instead of the default 1).

For RAW files, the sampling rate and sample encoding format can be specified. Supported sample encodings are 'short' (16-bit signed integer), 'float' (32-bit floating point), and 'double' (64-bit floating point).

An example function call for RAW format files (other parameters are the same as in Example 3 above):

octave:12> spec3dw({'a2000111902-05.raw', 44100, 'short'}, 2.74, 0.5, 1024, \
                      800, 'hanning', 400, 20, 4000, 1, \
		      'Violin Y, G string.', 'magdb,logfreq');

A similar call for a 96 kHz, 32-bit floating point RAW file:

octave:13> spec3dw({'a2001061901-01d.raw', 96000, 'float'}, 3.53, 0.5, 4096, \
            4096, 'hanning', 512, 0, 30000, 1, 'Viola X, C string.');
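
Because a RAW file has no header, the sampling rate and sample encoding must be supplied by the caller. The sketch below shows what reading such a file involves, using the standard Octave fopen, fwrite and fread functions. It is not the Spectutils file reader. The temporary file, the variable names and the scaling to the range -1..1 are assumptions for illustration, and the precision strings 'int16', 'float32' and 'float64' are the fread counterparts assumed here for 'short', 'float' and 'double'.

```octave
% Sketch: read a headerless file of 16-bit signed integers.
% A small temporary file is written first so the example is self-contained.
fname = tempname();
fs = 44100;                                   % must be known in advance
samples = round(32767 * 0.5 * sin(2 * pi * 440 * (0:fs-1)' / fs));

fid = fopen(fname, 'w');
fwrite(fid, samples, 'int16');                % one second of audio, no header
fclose(fid);

fid = fopen(fname, 'r');
raw = fread(fid, Inf, 'int16');               % 'float32' or 'float64' for other encodings
fclose(fid);
delete(fname);                                % remove the temporary file

x = raw / 32768;                              % scale integers to the range -1..1
seconds = length(x) / fs;                     % duration implied by the assumed rate
```

If the wrong sampling rate or encoding is given, the file still reads without error, but the time and frequency axes (or the samples themselves) will be wrong.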

sonogw

Sonogw plots a two-dimensional sonogram, where magnitude peaks are displayed as dark areas in the plot. Its parameters are basically the same as in spec3dw.

Example function call:

octave:14> sonogw('a2001061901-01d.wav', 0, 10, 2048, \
            2048, 'hanning', 512, 0, 30000, 1, 'Viola X, open strings.');

hps2dt

Hps2dt plots fundamental frequency versus time using a Harmonic Product Spectrum (HPS) algorithm. Both a time-varying pitch graph and a reliability graph are plotted.

The parameters are nearly the same as in spec3dw and sonogw. The main difference is that the high frequency weighting factor is replaced by the number of "iterations" of the HPS analysis. Recommended values are 3 and 4. Values lower than 3 give unreliable pitch estimates for most real-world audio signals, while the function gets very slow with values higher than 4, with little or no increase in reliability.

With typical musical audio signals, it is recommended to set the high frequency limit to 5000 Hz or less.

An example function call:

octave:15> hps2dt('a2001061901-01d.wav', 3.53, 0.5, 2048, \
            2048, 'hanning', 512, 0, 2000, 3, 'Viola X, C string.');
Hps2dt is a slow function. Therefore, it is recommended to start with a small number of FFT points (e.g. 1024) and with short durations (a couple of seconds).
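
For readers unfamiliar with the Harmonic Product Spectrum, the following sketch shows the general technique for a single analysis window. It is not the hps2dt implementation. It assumes that the number of "iterations" corresponds to the number of spectra multiplied together, which may not match how hps2dt counts them. The synthetic tone and all variable names are illustrative.

```octave
% Sketch of a Harmonic Product Spectrum estimate for one analysis window.
% Illustrates the general technique only, not the hps2dt source.
fs = 8192;            % sampling rate in Hz (chosen so that bins are 2 Hz apart)
nfft = 4096;          % number of FFT points
niter = 3;            % number of spectra multiplied together (assumption)

% Synthetic tone: 220 Hz fundamental with two weaker harmonics.
n = (0:nfft-1)';
x = sin(2*pi*220*n/fs) + 0.5*sin(2*pi*440*n/fs) + 0.25*sin(2*pi*660*n/fs);

w = 0.5 - 0.5 * cos(2 * pi * n / (nfft - 1));   % Hanning window
mag = abs(fft(x .* w));
mag = mag(1:nfft/2);                            % bins 0 .. nfft/2-1

% Multiply the spectrum by copies of itself compressed by 2, 3, ...
% Harmonics of the true fundamental then line up on the same bin.
len = floor(length(mag) / niter);
hps = mag(1:len);
for r = 2:niter
  compressed = mag(1:r:end);                    % every r-th bin, starting at 0 Hz
  hps = hps .* compressed(1:len);
end

[~, idx] = max(hps);
f0 = (idx - 1) * fs / nfft;                     % estimated fundamental in Hz
```

Each additional compressed spectrum shortens the usable frequency range (len shrinks) and adds work, which is consistent with the advice to keep the number of iterations and the high frequency limit modest.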

Tips

If Octave starts to behave strangely, it can easily be reset by quitting it (with the command "quit") and starting it again.

The Octave view function can be used to change the viewing angle of 3-dimensional plots, such as those produced with spec3dw (e.g. "view(-60, 20);" or "view(-60, 160);").

Furthermore, the color map of the 3D plots can be changed using the Octave colormap function (see "help colormap"). For example, the command "colormap('winter');" shows the plot in shades of blue while "colormap('gray');" shows it as shades of gray.

The color map is a matrix of real numbers with 3 columns and, by default, 64 rows, with values ranging from 0 to 1. Each row of the matrix holds the color components (red, green, and blue) of one shade in the color map, where 0 is the minimum intensity and 1 the maximum. The contrast and/or intensity of the color map can be adjusted, for example, by multiplying the matrix by some value. For example, the command line "newmap = colormap('gray'); colormap(newmap * 0.2);" sets the color map to gray scale and adjusts it to use darker shades of gray. More refined adjustments can be made using Octave's matrix operations.
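
A few such adjustments are sketched below using the standard Octave gray function, which returns a gray-scale color map matrix. The variable names and the particular adjustments are only illustrative. The maps are computed without opening a plot window, so the colormap call is left as a comment.

```octave
% Sketch: adjust a gray-scale color map before applying it to a plot.
map = gray(64);            % 64 rows (shades) by 3 columns (red, green, blue)

darker = map * 0.2;        % scale all intensities down: darker shades
brighter = map .^ 0.5;     % gamma-style adjustment: lifts the mid-tones
inverted = 1 - map;        % swap light and dark shades

% With a plot window open, apply one of them with, for example:
%   colormap(darker);
```

All values must stay within the range 0 to 1 for the result to remain a valid color map.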

Octave can either draw a new plot in a new window or overwrite a previous plot in an existing window. The latter is the default action. To create a new window for the plot, type the command "figure" before calling a Spectutils analysis function (e.g. oscgram or spec3dw). The plot can also be directed to any existing window by entering a window number as a parameter, e.g., "figure(1);", "figure(2);" etc.

To save calculation time, several Octave function calls can be combined on one command line by separating the individual function calls by semicolons (";"). For example, it is recommended to set both the viewing angle and color map on the same command line as a spec3dw function, e.g.:

octave:16> figure; spec3dw({'a2001061901-01d.raw', 96000, 'float'}, \
            3.53, 0.5, 4096, 4096, 'hanning', 512, 0, 30000, 1, \
            'Viola X, C string.'); view(-75, 15); colormap('winter');
Here, Octave creates a new window and sets the viewing angle and color map to the desired values. This takes much less calculation time than entering each function call on a separate command line.

Printing the plots

The analysis plots can be printed with the Octave print function. On some systems (e.g. Mac OS X with AquaTerm), the plots can also be printed or saved to a file from the viewing program.

The Octave print function prints the last plot either to a printer or to a file. The default printer can be changed by entering a parameter string. For example,

    print('-Plp');
prints to the printer named "lp".

A "portrait" or "landscape" orientation can also be specified, for example:

    print('-landscape');
prints the last plot in landscape orientation.

Printing to a file can also be specified with a parameter. For example,

    print('testplot.jpg', '-djpeg');
prints the last plot to a JPEG file named "testplot.jpg".

Higher resolution plots can be obtained by saving to EPS files, for example:

    print('testplot.eps', '-deps');
Several parameters can be specified by separating them with commas. For example, the command

    print('testplot.ps', '-deps', '-landscape');
prints the last plot in EPS format to the file "testplot.ps" in landscape orientation.

Copyright (c) 2001-2010 Kai Lassfolk
All Rights Reserved