On the issue of calibration in DNN-based speaker recognition systems

This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment tra...

Bibliographic Details
Published: 2016
Subjects:
Online access: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
Contributed by:
id paper:paper_2308457X_v08-12-September-2016_n_p1825_McLaren
record_format dspace
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Bottleneck features
Calibration
Deep neural network
Mismatch
Speaker recognition
Alignment
Speech communication
Speech processing
Computationally efficient
Deep neural networks
Discriminative power
Speaker recognition system
Universal background model
Speech recognition
description This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but also provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA.
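The hybrid alignment idea in the description can be illustrated with a small sketch. It is not the authors' implementation. It assumes that frame alignments (posteriors) come from a diagonal-covariance GMM fitted on bottleneck features, and that the first-order Baum-Welch statistics are accumulated over a separate, MFCC-like acoustic feature stream; the abstract specifies neither choice. All function names, dimensions, and the random data below are hypothetical.

```python
import numpy as np


def gmm_posteriors(feats, weights, means, variances):
    """Frame-level component posteriors under a diagonal-covariance GMM.

    feats: (T, D); weights: (C,); means, variances: (C, D). Returns (T, C).
    """
    diff = feats[:, None, :] - means[None, :, :]               # (T, C, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                          # (T, C)
    log_joint = np.log(weights) + log_gauss
    log_joint -= log_joint.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def hybrid_baum_welch_stats(align_feats, stat_feats, weights, means, variances):
    """Zeroth- and first-order statistics using two feature streams.

    align_feats (assumed: bottleneck features) are used only to compute the
    posteriors; stat_feats (assumed: MFCC-like features) are the vectors
    accumulated into the first-order statistics.
    """
    if align_feats.shape[0] != stat_feats.shape[0]:
        raise ValueError("both feature streams must have the same number of frames")
    post = gmm_posteriors(align_feats, weights, means, variances)  # (T, C)
    zeroth = post.sum(axis=0)        # N_c = sum_t gamma_t(c), shape (C,)
    first = post.T @ stat_feats      # F_c = sum_t gamma_t(c) x_t, shape (C, D_stat)
    return zeroth, first


# Toy usage with random data; all sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
T, C, D_bn, D_mfcc = 200, 4, 8, 5
bn = rng.normal(size=(T, D_bn))        # stand-in for bottleneck features
mfcc = rng.normal(size=(T, D_mfcc))    # stand-in for acoustic features
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D_bn))
variances = np.ones((C, D_bn))
zeroth, first = hybrid_baum_welch_stats(bn, mfcc, weights, means, variances)
```

Because the posteriors of each frame sum to one, the zeroth-order statistics sum to the number of frames, and the first-order statistics summed over components equal the plain sum of the acoustic frames. Both properties are convenient sanity checks for any implementation of this accumulation step.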
title On the issue of calibration in DNN-based speaker recognition systems
publishDate 2016