gnes.encoder.audio.vggish_cores.vggish_postprocess module
Post-process embeddings from VGGish.
class gnes.encoder.audio.vggish_cores.vggish_postprocess.Postprocessor(pca_params_npz_path)
Bases: object
Post-processes VGGish embeddings.
The initial release of AudioSet included 128-D VGGish embeddings for each AudioSet segment. These released embeddings were produced by applying a PCA transformation (technically, one that also includes whitening) and 8-bit quantization to the raw embedding output of VGGish. This kept them compatible with the YouTube-8M project, which provides visual embeddings in the same format for a large set of YouTube videos. This class implements the same PCA (with whitening) and quantization transformations.
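The whitening mentioned above can be illustrated with a standard PCA-whitening fit: project centered data onto the covariance eigenvectors and scale each component by one over the square root of its variance. This is a minimal NumPy sketch on synthetic data; `fit_pca_whitening` is a hypothetical helper, and nothing here claims this is how the released AudioSet PCA parameters were actually computed.

```python
import numpy as np


def fit_pca_whitening(embeddings, eps=1e-8):
    """Estimate a PCA-whitening transform from a [num_examples, dim] array.

    Returns (pca_matrix, pca_means): pca_matrix has shape [dim, dim] with one
    scaled eigenvector per row; pca_means has shape [dim].
    """
    pca_means = embeddings.mean(axis=0)
    centered = embeddings - pca_means
    # Sample covariance of the centered embeddings, shape [dim, dim].
    cov = centered.T @ centered / (embeddings.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; reverse so the
    # highest-variance directions come first, as is usual for PCA.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Whitening: divide each principal direction (a row of eigvecs.T)
    # by sqrt(its variance) so every projected component has unit variance.
    pca_matrix = eigvecs.T / np.sqrt(eigvals + eps)[:, None]
    return pca_matrix, pca_means


rng = np.random.default_rng(0)
# Correlated synthetic "embeddings" (not real VGGish output): mixing
# independent normals with an upper-triangular matrix of ones.
raw = rng.normal(size=(5000, 8)) @ np.triu(np.ones((8, 8)))
pca_matrix, pca_means = fit_pca_whitening(raw)
whitened = (raw - pca_means) @ pca_matrix.T
print(np.round(np.cov(whitened, rowvar=False), 3))  # approximately the 8x8 identity
```

After whitening, the components are decorrelated and have unit variance, which is what makes a single fixed quantization range reasonable for every dimension.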
Constructs a postprocessor.
Args:
- pca_params_npz_path: Path to a NumPy-format .npz file that contains the PCA parameters used in postprocessing.
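A sketch of what such a parameter file can look like, written and read back with `np.savez` and `np.load`. The key names `pca_eigen_vectors` and `pca_means` are assumed here to follow the upstream VGGish parameter names and should be checked against the actual file. The helper `load_pca_params` and the identity/zero placeholder values are hypothetical and only illustrate the expected shapes.

```python
import os
import tempfile

import numpy as np

EMBEDDING_SIZE = 128  # VGGish embedding width (128-D)
# Assumed key names; verify them against the .npz file you actually load.
PCA_MATRIX_KEY = "pca_eigen_vectors"
PCA_MEANS_KEY = "pca_means"


def load_pca_params(npz_path, embedding_size=EMBEDDING_SIZE):
    """Load and validate PCA parameters from a .npz file (hypothetical helper)."""
    with np.load(npz_path) as params:
        pca_matrix = params[PCA_MATRIX_KEY]
        # Keep the means as a column vector so they broadcast against
        # embeddings laid out as [embedding_size, batch_size].
        pca_means = params[PCA_MEANS_KEY].reshape(-1, 1)
    if pca_matrix.shape != (embedding_size, embedding_size):
        raise ValueError(f"bad PCA matrix shape: {pca_matrix.shape}")
    if pca_means.shape != (embedding_size, 1):
        raise ValueError(f"bad PCA means shape: {pca_means.shape}")
    return pca_matrix, pca_means


# Write a placeholder parameter file (identity PCA, zero means) and read it back.
path = os.path.join(tempfile.mkdtemp(), "pca_params.npz")
np.savez(path, **{PCA_MATRIX_KEY: np.eye(EMBEDDING_SIZE),
                  PCA_MEANS_KEY: np.zeros(EMBEDDING_SIZE)})
pca_matrix, pca_means = load_pca_params(path)
print(pca_matrix.shape, pca_means.shape)  # (128, 128) (128, 1)
```

Validating shapes at load time surfaces a wrong or truncated parameter file immediately rather than at the first call that processes a batch.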
postprocess(embeddings_batch)
Applies postprocessing to a batch of embeddings.
Args:
- embeddings_batch: A NumPy array of shape [batch_size, embedding_size] containing output from the embedding layer of VGGish.

Returns:
- A NumPy array of the same shape as the input but of type uint8, containing the PCA-transformed and quantized version of the input.
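The contract described above (a 2-D float batch in, a same-shape `uint8` array out) can be sketched as a standalone function. It assumes a PCA matrix of shape [embedding_size, embedding_size], means stored as a column vector, and a clipping range of [-2.0, 2.0] that is mapped linearly onto [0, 255]; the function name and these constants are illustrative assumptions, not this module's code.

```python
import numpy as np

# Assumed quantization range; values outside it are clipped before scaling.
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0


def postprocess(embeddings_batch, pca_matrix, pca_means):
    """PCA-transform and 8-bit quantize a [batch_size, embedding_size] batch.

    pca_matrix: [embedding_size, embedding_size]; pca_means: [embedding_size, 1].
    """
    if embeddings_batch.ndim != 2:
        raise ValueError(f"expected a 2-D batch, got {embeddings_batch.shape}")
    if embeddings_batch.shape[1] != pca_matrix.shape[1]:
        raise ValueError(f"bad batch shape: {embeddings_batch.shape}")
    # Transpose to [embedding_size, batch_size], subtract the mean column,
    # apply the PCA (whitening) matrix, then transpose back.
    pca_applied = (pca_matrix @ (embeddings_batch.T - pca_means)).T
    # Clip to the quantization range and map it linearly onto [0, 255].
    clipped = np.clip(pca_applied, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    scaled = (clipped - QUANTIZE_MIN_VAL) * (
        255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL))
    # astype truncates toward zero, so e.g. 127.5 becomes 127.
    return scaled.astype(np.uint8)


# An identity PCA matrix and zero means isolate the quantization step.
size = 4
batch = np.array([[-5.0, -2.0, 0.0, 2.0],
                  [1.0, -1.0, 0.5, 9.0]])
out = postprocess(batch, np.eye(size), np.zeros((size, 1)))
print(out.dtype, out.shape)  # uint8 (2, 4)
print(out)  # rows [0, 0, 127, 255] and [191, 63, 159, 255]
```

Because the quantization step is lossy (clipping plus truncation to 256 levels), the uint8 output cannot be mapped back exactly to the float embeddings.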