gnes.preprocessor.audio.vggish_example_helper.mel_features module¶

gnes.preprocessor.audio.vggish_example_helper.mel_features.frame(data, window_length, hop_length)[source]¶

Convert array into a sequence of successive possibly overlapping frames.

An n-dimensional array of shape (num_samples, …) is converted into an (n+1)-D array of shape (num_frames, window_length, …), where each frame starts hop_length points after the preceding one.

This is accomplished using stride_tricks, so the original data is not copied. However, there is no zero-padding, so any incomplete frames at the end are not included.

Args:: data: np.array of dimension N >= 1. window_length: Number of samples in each frame. hop_length: Advance (in samples) between each window.
Returns:: (N+1)-D np.array with as many rows as there are complete frames that can be extracted.

gnes.preprocessor.audio.vggish_example_helper.mel_features.hertz_to_mel(frequencies_hertz)[source]¶

Convert frequencies to mel scale using HTK formula.

Args:: frequencies_hertz: Scalar or np.array of frequencies in hertz.
Returns:: Object of same size as frequencies_hertz containing corresponding values on the mel scale.

gnes.preprocessor.audio.vggish_example_helper.mel_features.log_mel_spectrogram(data, audio_sample_rate=8000, log_offset=0.0, window_length_secs=0.025, hop_length_secs=0.01, **kwargs)[source]¶

Convert waveform to a log magnitude mel-frequency spectrogram.

Args:: data: 1D np.array of waveform data. audio_sample_rate: The sampling rate of data. log_offset: Add this to values when taking log to avoid -Infs. window_length_secs: Duration of each window to analyze. hop_length_secs: Advance between successive analysis windows. **kwargs: Additional arguments to pass to spectrogram_to_mel_matrix.
Returns:: 2D np.array of (num_frames, num_mel_bins) consisting of log mel filterbank magnitudes for successive frames.

gnes.preprocessor.audio.vggish_example_helper.mel_features.periodic_hann(window_length)[source]¶

Calculate a “periodic” Hann window.

The classic Hann window is defined as a raised cosine that starts and ends on zero, and where every value appears twice, except the middle point for an odd-length window. Matlab calls this a “symmetric” window and np.hanning() returns it. However, for Fourier analysis, this actually represents just over one cycle of a period N-1 cosine, and thus is not compactly expressed on a length-N Fourier basis. Instead, it’s better to use a raised cosine that ends just before the final zero value - i.e. a complete cycle of a period-N cosine. Matlab calls this a “periodic” window. This routine calculates it.

Args:: window_length: The number of points in the returned window.
Returns:: A 1D np.array containing the periodic hann window.

gnes.preprocessor.audio.vggish_example_helper.mel_features.spectrogram_to_mel_matrix(num_mel_bins=20, num_spectrogram_bins=129, audio_sample_rate=8000, lower_edge_hertz=125.0, upper_edge_hertz=3800.0)[source]¶

Return a matrix that can post-multiply spectrogram rows to make mel.

Returns a np.array matrix A that can be used to post-multiply a matrix S of spectrogram values (STFT magnitudes) arranged as frames x bins to generate a “mel spectrogram” M of frames x num_mel_bins. M = S A.

The classic HTK algorithm exploits the complementarity of adjacent mel bands to multiply each FFT bin by only one mel weight, then add it, with positive and negative signs, to the two adjacent mel bands to which that bin contributes. Here, by expressing this operation as a matrix multiply, we go from num_fft multiplies per frame (plus around 2*num_fft adds) to around num_fft^2 multiplies and adds. However, because these are all presumably accomplished in a single call to np.dot(), it’s not clear which approach is faster in Python. The matrix multiplication has the attraction of being more general and flexible, and much easier to read.

Args:

num_mel_bins: How many bands in the resulting mel spectrum. This is: the number of columns in the output matrix.
num_spectrogram_bins: How many bins there are in the source spectrogram: data, which is understood to be fft_size/2 + 1, i.e. the spectrogram only contains the nonredundant FFT bins.
audio_sample_rate: Samples per second of the audio at the input to the: spectrogram. We need this to figure out the actual frequencies for each spectrogram bin, which dictates how they are mapped into mel.
lower_edge_hertz: Lower bound on the frequencies to be included in the mel: spectrum. This corresponds to the lower edge of the lowest triangular band.

upper_edge_hertz: The desired top edge of the highest frequency band.

Returns:

An np.array with shape (num_spectrogram_bins, num_mel_bins).

Raises:

ValueError: if frequency edges are incorrectly ordered or out of range.

gnes.preprocessor.audio.vggish_example_helper.mel_features.stft_magnitude(signal, fft_length, hop_length=None, window_length=None)[source]¶

Calculate the short-time Fourier transform magnitude.

Args:: signal: 1D np.array of the input time-domain signal. fft_length: Size of the FFT to apply. hop_length: Advance (in samples) between each frame passed to FFT. window_length: Length of each block of samples to pass to FFT.
Returns:: 2D np.array where each row contains the magnitudes of the fft_length/2+1 unique values of the FFT for the corresponding frame of input samples.