On the issue of calibration in DNN-based speaker recognition systems

This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment tra...

Bibliographic Details
Published:	2016
Subjects:	Bottleneck features; Calibration; Deep neural network; Mismatch; Speaker recognition; Alignment; Speech communication; Speech processing; Computationally efficient; Deep neural networks; Discriminative power; Speaker recognition system; Universal background model; Speech recognition
Online Access:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p1825_McLaren http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

id	paper:paper_2308457X_v08-12-September-2016_n_p1825_McLaren
record_format	dspace
institution	Universidad de Buenos Aires
institution_str	I-28
repository_str	R-134
collection	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic	Bottleneck features Calibration Deep neural network Mismatch Speaker recognition Alignment Calibration Speech communication Speech processing Bottleneck features Computationally efficient Deep neural networks Discriminative power Mismatch Speaker recognition Speaker recognition system Universal background model Speech recognition
description	This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but also provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA.
title	On the issue of calibration in DNN-based speaker recognition systems
publishDate	2016
url	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p1825_McLaren http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
_version_	1768544843124965376
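
The hybrid alignment idea in the description above (bottleneck features used only to align frames, while the statistics are accumulated from a separate feature stream) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the aligner is taken to be a small diagonal-covariance GMM over the alignment features, the second stream stands in for conventional acoustic features, and the names `gmm_posteriors` and `hybrid_stats` are hypothetical.

```python
import math


def gmm_posteriors(frame, weights, means, variances):
    """Posterior of each diagonal-covariance Gaussian component for one frame."""
    log_likes = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_likes.append(ll)
    # Normalise in a numerically stable way (subtract the largest log-likelihood).
    peak = max(log_likes)
    unnorm = [math.exp(ll - peak) for ll in log_likes]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def hybrid_stats(align_frames, stat_frames, weights, means, variances):
    """Zeroth- and first-order statistics using two feature streams.

    align_frames: per-frame alignment features (assumed: bottleneck features).
    stat_frames:  per-frame features the statistics are accumulated from
                  (assumed: a separate acoustic feature stream).
    """
    num_comp = len(weights)
    dim = len(stat_frames[0])
    zeroth = [0.0] * num_comp
    first = [[0.0] * dim for _ in range(num_comp)]
    for a, s in zip(align_frames, stat_frames):
        # Posteriors come from the alignment stream only.
        post = gmm_posteriors(a, weights, means, variances)
        for c, gamma in enumerate(post):
            zeroth[c] += gamma
            # First-order statistics use the statistics stream only.
            for d, x in enumerate(s):
                first[c][d] += gamma * x
    return zeroth, first


# Toy data: 3 frames, 1-dimensional alignment features, 2-dimensional statistics features.
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]
align_frames = [[0.1], [3.9], [0.2]]
stat_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

zeroth, first = hybrid_stats(align_frames, stat_frames, weights, means, variances)
```

Because the posteriors for each frame sum to one, the zeroth-order statistics sum to the number of frames, and the first-order statistics summed over components equal the sum of the statistics-stream frames. That gives a quick sanity check on any such accumulator.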
