Post-process embeddings from VGGish.
Post-processes VGGish embeddings.
The initial release of AudioSet included 128-D VGGish embeddings for each segment of AudioSet. These released embeddings were produced by applying a PCA transformation (technically, a whitening transform is included as well) and 8-bit quantization to the raw embedding output from VGGish, in order to stay compatible with the YouTube-8M project, which provides visual embeddings in the same format for a large set of YouTube videos. This class implements the same PCA (with whitening) and quantization transformations.
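The text does not say how the released PCA parameters were computed. To illustrate what "PCA with whitening" means, here is a minimal sketch that assumes one common recipe: eigendecomposition of the sample covariance, with each principal direction scaled by 1/sqrt(variance). The function name `fit_pca_whitening` and the `eps` parameter are hypothetical and not part of VGGish.

```python
import numpy as np

def fit_pca_whitening(raw_embeddings, eps=1e-8):
    """Hypothetical helper: estimate PCA-with-whitening parameters.

    raw_embeddings: float array of shape [num_examples, embedding_size].
    Returns (pca_matrix, pca_means): pca_matrix is [embedding_size,
    embedding_size] with rows = eigenvectors scaled by 1/sqrt(eigenvalue);
    pca_means is [embedding_size].
    """
    pca_means = raw_embeddings.mean(axis=0)
    centered = raw_embeddings - pca_means
    # Sample covariance (ddof=1), shape [embedding_size, embedding_size].
    cov = centered.T @ centered / (centered.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Whitening: divide each projected component by its standard deviation
    # so that the transformed data has (approximately) identity covariance.
    pca_matrix = eigvecs.T / np.sqrt(eigvals + eps)[:, None]
    return pca_matrix, pca_means
```

A transformed embedding is then `pca_matrix @ (x - pca_means)`. Without the whitening step, `pca_matrix` would be the plain rotation `eigvecs.T`, and the components would keep their original, unequal variances.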
Constructs a postprocessor.
- pca_params_npz_path: Path to a NumPy-format .npz file that contains the PCA parameters used in postprocessing.
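A constructor along these lines would load and shape-check those parameters. This is a minimal sketch, not the VGGish implementation: the key names `pca_eigen_vectors` and `pca_means`, the 128-D size, and the helper name `load_pca_params` are assumptions, so check the keys in your parameters file (for example with `np.load(path).files`).

```python
import numpy as np

# Assumed key names inside the .npz file and assumed embedding size.
PCA_EIGEN_VECTORS_NAME = "pca_eigen_vectors"
PCA_MEANS_NAME = "pca_means"
EMBEDDING_SIZE = 128

def load_pca_params(pca_params_npz_path, embedding_size=EMBEDDING_SIZE):
    """Hypothetical helper: load and validate PCA parameters from an .npz file."""
    with np.load(pca_params_npz_path) as params:
        pca_matrix = params[PCA_EIGEN_VECTORS_NAME]
        # Keep the means as a column vector so they broadcast against
        # embeddings transposed to [embedding_size, batch_size].
        pca_means = params[PCA_MEANS_NAME].reshape(-1, 1)
    if pca_matrix.shape != (embedding_size, embedding_size):
        raise ValueError(f"unexpected PCA matrix shape: {pca_matrix.shape}")
    if pca_means.shape != (embedding_size, 1):
        raise ValueError(f"unexpected PCA means shape: {pca_means.shape}")
    return pca_matrix, pca_means
```

`np.load` accepts either a path or a file-like object, so the helper can be exercised with an in-memory buffer written by `np.savez`.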
Applies postprocessing to a batch of embeddings.
- embeddings_batch: An nparray of shape [batch_size, embedding_size] containing output from the embedding layer of VGGish.
- Returns: An nparray of the same shape as the input but of type uint8, containing the PCA-transformed and quantized version of the input.
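The postprocessing step can be sketched as a PCA projection followed by clip-and-rescale quantization to uint8. This is a minimal sketch under stated assumptions: the quantization range [-2.0, 2.0] is assumed (the text above does not give it), `postprocess` is a hypothetical free function rather than the class method, and `pca_matrix`/`pca_means` are expected in the shapes [embedding_size, embedding_size] and [embedding_size, 1].

```python
import numpy as np

# Assumed quantization range; not stated in the text above.
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0

def postprocess(embeddings_batch, pca_matrix, pca_means):
    """Hypothetical sketch: PCA-transform, then 8-bit quantize a batch."""
    if embeddings_batch.ndim != 2 or embeddings_batch.shape[1] != pca_matrix.shape[1]:
        raise ValueError(f"unexpected embeddings shape: {embeddings_batch.shape}")
    # Transpose to [embedding_size, batch_size], subtract the mean column
    # vector from every column, premultiply by the PCA matrix, transpose back.
    pca_applied = (pca_matrix @ (embeddings_batch.T - pca_means)).T
    # Clip to the assumed range, then map that range linearly onto [0, 255].
    clipped = np.clip(pca_applied, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    scaled = (clipped - QUANTIZE_MIN_VAL) * (
        255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL))
    # astype truncates the fractional part, e.g. 127.5 becomes 127.
    return scaled.astype(np.uint8)
```

With an identity `pca_matrix` and zero means, the row `[-3.0, -2.0, 0.0, 2.0, 3.0]` becomes `[0, 0, 127, 255, 255]`: out-of-range values saturate at the ends of the uint8 range, and 0.0 lands just below the midpoint because of truncation.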