Turn-taking cues in task-oriented dialogue

As interactive voice response systems become more prevalent and provide increasingly more complex functionality, it becomes clear that the challenges facing such systems are not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system a...

Bibliographic Details
Main author: Gravano, A.
Other authors: Hirschberg, J.
Format: Book chapter
Language: English
Published: 2011
Online access: Scopus record
DOI
Handle
Digital Library record
Contributed by: Reference record: Request the resource here
LEADER 16625caa a22013937a 4500
001 PAPER-10482
003 AR-BaUEN
005 20230518204031.0
008 190411s2011 xx ||||fo|||| 00| 0 eng|d
024 7 |2 scopus  |a 2-s2.0-79952619484 
040 |a Scopus  |b spa  |c AR-BaUEN  |d AR-BaUEN 
030 |a CSPLE 
100 1 |a Gravano, A. 
245 1 0 |a Turn-taking cues in task-oriented dialogue 
260 |c 2011 
270 1 0 |m Gravano, A.; Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina; email: gravano@dc.uba.ar 
506 |2 openaire  |e Editorial policy 
504 |a Abney, S., Partial parsing via finite-state cascades (1996) Journal of Natural Language Engineering, 2 (4), pp. 337-344 
504 |a Atterer, M., Baumann, T., Schlangen, D., Towards incremental end-of-utterance detection in dialogue systems (2008) Coling, pp. 11-14. , Manchester, UK 
504 |a Beattie, G.W., The regulation of speaker turns in face-to-face conversation: Some implications for conversation in sound-only communication channels (1981) Semiotica, 34, pp. 55-70 
504 |a Beattie, G.W., Turn-taking and interruption in political interviews: Margaret Thatcher and Jim Callaghan compared and contrasted (1982) Semiotica, 39, pp. 93-114 
504 |a Beckman, M.E., Hirschberg, J., (1994) The ToBI Annotation Conventions, , Ohio State Univ 
504 |a Bhuta, T., Patrick, L., Garnett, J.D., Perceptual evaluation of voice quality and its correlation with acoustic measurements (2004) Journal of Voice, 18 (3), pp. 299-304. , DOI 10.1016/j.jvoice.2003.12.004, PII S0892199703001735 
504 |a Boersma, P., Weenink, D., (2001) Praat: Doing Phonetics by Computer, , http://www.praat.org 
504 |a Bull, M., Aylett, M., An analysis of the timing of turn-taking in a corpus of goal-oriented dialogue (1998) ICSLP 
504 |a Cathcart, N., Carletta, J., Klein, E., A shallow model of backchannel continuers in spoken dialogue (2003) EACL, pp. 51-58 
504 |a Charniak, E., Johnson, M., Edit detection and parsing for transcribed speech (2001) Proceedings of NAACL 
504 |a Collins, M., Head-driven statistical models for natural language parsing (2003) Computational Linguistics, 29 (4), pp. 589-637. , DOI 10.1162/089120103322753356 
504 |a Cohen, J., A coefficient of agreement for nominal scales (1960) Educational and Psychological Measurement, 20, pp. 37-46 
504 |a Cohen, W.C., Fast effective rule induction (1995) Proceedings of the Twelfth International Conference on Machine Learning 
504 |a Cortes, C., Vapnik, V., Support vector networks (1995) Machine Learning, pp. 273-297 
504 |a Cutler, E.A., Pearson, M., On the analysis of prosodic turn-taking cues (1986) Intonation in Discourse, pp. 139-156 
504 |a Duncan, S., Some signals and rules for taking speaking turns in conversations (1972) Journal of Personality and Social Psychology, 23, pp. 283-292 
504 |a Duncan, S., Toward a grammar for dyadic conversation (1973) Semiotica, 9, pp. 29-46 
504 |a Duncan, S., On the structure of speaker-auditor interaction during speaking turns (1974) Language in Society, 3, pp. 161-180 
504 |a Duncan, S., Interaction units during speaking turns in dyadic, face-to-face conversations (1975) Organization of Behavior in Face-to-Face Interaction, , Mouton Publishers Den Hague 
504 |a Duncan, S., Fiske, D., (1977) Face-To-Face Interaction: Research, Methods, and Theory, , Lawrence Erlbaum Associates 
504 |a Du Bois, J., Schuetze-Coburn, S., Cumming, S., Paolino, D., Outline of discourse transcription (1993) Talking Data: Transcription and Coding in Discourse Research 
504 |a Edlund, J., Heldner, M., Gustafson, J., Utterance segmentation and turn-taking in spoken dialogue systems (2005) Sprachtechnologie Mobile Kommunikation und Linguistische Ressourcen, pp. 576-587 
504 |a Eskenazi, L., Childers, D.G., Hicks, D.M., Acoustic correlates of vocal quality (1990) Journal of Speech and Hearing Research, 33 (2), pp. 298-306 
504 |a Ferguson, N., Simultaneous speech, interruptions and dominance (1977) British Journal of Social and Clinical Psychology, 16 (4), pp. 295-302 
504 |a Ferrer, L., Shriberg, E., Stolcke, A., A prosody-based approach to end-of-utterance detection that does not require speech recognition (2003) Proceedings of ICASSP 
504 |a Ferrer, L., Shriberg, E., Stolcke, A., Is the speaker done yet? Faster and more accurate end-of-utterance detection using prosody (2002) Proceedings of the ICSLP, pp. 2061-2064 
504 |a Ford, C., Thompson, S., Interactional units in conversation: Syntactic intonational and pragmatic resources for the management of turns (1996) Interaction and Grammar, pp. 134-184 
504 |a Fry, D., Simple reaction-times to speech and non-speech stimuli (1975) Cortex, 11, pp. 355-360 
504 |a Godfrey, J., Holliman, E., McDaniel, J., Switchboard: Telephone speech corpus for research and development (1992) IEEE International Conference on Acoustics, Speech, and Signal Processing 
504 |a Goodwin, C., (1981) Conversational Organization: Interaction between Speakers and Hearers, , Academic Press 
504 |a Gravano, A., Benus, S., Hirschberg, J., Mitchell, S., Vovsha, I., Classification of discourse functions of affirmative words in spoken dialogue (2007) Proceedings of Interspeech 
504 |a Heckerman, D., Geiger, D., Chickering, D., Learning Bayesian networks: The combination of knowledge and statistical data (1995) Machine Learning, 20, pp. 197-243 
504 |a Hemphill, C., Godfrey, J., Doddington, G., The ATIS spoken language systems pilot corpus (1990) Proceedings of the Workshop on Speech and Natural Language, pp. 96-101 
504 |a Hjalmarsson, A., On cue - Additive effects of turn-regulating phenomena in dialogue (2009) Diaholmia 
504 |a Jefferson, G., Notes on a systematic deployment of the acknowledgement tokens "yeah" and "mm hm" (1984) Research on Language & Social Interaction, 17, pp. 197-216 
504 |a Jensen, F., (1996) Introduction to Bayesian Networks, , Springer-Verlag New York 
504 |a Jurafsky, D., Shriberg, E., Fox, B., Curl, T., Lexical, prosodic and syntactic cues for dialog acts (1998) Proceedings of ACL/COLING, Workshop on Discourse Relations and Discourse Markers, pp. 114-120 
504 |a Kendon, A., Some functions of gaze-direction in social interaction (1967) Acta Psychologica, 26, pp. 22-63 
504 |a Kendon, A., Some relationships between body motion and speech (1972) Studies in Dyadic Communication, pp. 177-210 
504 |a Kitch, J.A., Oates, J., Greenwood, K., Performance effects on the voices of 10 choral tenors: Acoustic and perceptual findings (1996) Journal of Voice, 10 (2-3), pp. 217-227 
504 |a Koehn, P., Abney, S., Hirschberg, J., Collins, M., Improving intonational phrasing with syntactic information (2000) Proceedings of ICASSP, Vol. 3, pp. 1289-1290 
504 |a Koiso, H., Horiuchi, Y., Tutiya, S., Ichikawa, A., Den, Y., An Analysis of Turn-Taking and Backchannels Based on Prosodic and Syntactic Features in Japanese Map Task Dialogs (1998) Language and Speech, 41 (3-4), pp. 295-321 
504 |a Lafferty, J., McCallum, A., Pereira, F., Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001) 18th International Conference on Machine Learning, pp. 282-289. , Morgan Kaufmann San Francisco, CA 
504 |a Marcus, M., Marcinkiewicz, M., Santorini, B., Building a large annotated corpus of English: The Penn Treebank (1993) Computational Linguistics, 19, pp. 313-330 
504 |a McNeill, D., (1992) Hand and Mind: What Gestures Reveal about Thought, , University of Chicago Press 
504 |a Mushin, I., Stirling, L., Fletcher, J., Wales, R., Discourse structure, grounding, and prosody in task-oriented dialogue (2003) Discourse Processes, 35, pp. 1-31 
504 |a Novick, D., Sutton, S., An empirical model of acknowledgment for spoken-language systems (1994) Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp. 96-101. , Morristown, NJ, USA 
504 |a Ogden, R., Creaky voice and turn-taking in Finnish (2002) Colloquium of the British Association of Audiological Physicians 
504 |a Pierrehumbert, J., (1980) The Phonology and Phonetics of English Intonation, , Ph.D. Thesis. Massachusetts Institute of Technology 
504 |a Pierrehumbert, J., Hirschberg, J., The meaning of intonational contours in the interpretation of discourse (1990) Intentions in Communication, pp. 271-311 
504 |a Pitrelli, J.F., Beckman, M.E., Hirschberg, J., Evaluation of prosodic transcription labeling reliability in the ToBI framework (1994) Proceedings of ICSLP, pp. 123-126 
504 |a Quinlan, J.R., (1993) C4.5: Programs for Machine Learning, , Morgan Kaufmann 
504 |a Rabiner, L., A tutorial on Hidden Markov Models and selected applications in speech recognition (1989) Proceedings of the IEEE 77, pp. 257-286 
504 |a Ratnaparkhi, A., Brill, E., Church, K., A maximum entropy model for part-of-speech tagging (1996) Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 133-142 
504 |a Raux, A., Bohus, D., Langner, B., Black, A.W., Eskenazi, M., Doing research on a deployed spoken dialogue system: One year of Let's Go! experience (2006) Proceedings of Interspeech 
504 |a Raux, A., Eskenazi, M., Optimizing endpointing thresholds using dialogue features in a spoken dialogue system (2008) SIGdial, , Columbus, OH 
504 |a Schegloff, E., Discourse as an interactional achievement: Some uses of "uh huh" and other things that come between sentences (1982) Analyzing Discourse: Text and Talk 
504 |a Sacks, H., Schegloff, E.A., Jefferson, G., A simplest systematics for the organization of turn-taking for conversation (1974) Language, 50, pp. 696-735 
504 |a Schaffer, D., The role of intonation as a cue to turn taking in conversation (1983) Journal of Phonetics, 11, pp. 243-257 
504 |a Schlangen, D., From reaction to prediction: Experiments with computational models of turn-taking (2006) Proceedings of Interspeech 
504 |a Shriberg, E., Stolcke, A., Jurafsky, D., Coccaro, N., Meteer, M., Bates, R., Taylor, P., Van Ess-Dykema, C., Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech? (1998) Language and Speech, 41 (3-4), pp. 443-492 
504 |a Shriberg, E., Stolcke, A., Baron, D., Observations on overlap: Findings and implications for automatic processing of multi-party conversation (2001) Eurospeech, pp. 1359-1362 
504 |a Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Meteer, M., Dialogue act modeling for automatic tagging and recognition of conversational speech (2000) Computational Linguistics, 26, pp. 339-373 
504 |a Ten Bosch, L., Oostdijk, N., Boves, L., On temporal aspects of turn taking in conversational dialogues (2005) Speech Communication, 47 (1-2), pp. 80-86. , DOI 10.1016/j.specom.2005.05.009, PII S0167639305001330 
504 |a Vapnik, V.N., (1995) The Nature of Statistical Learning Theory, , Springer-Verlag New York 
504 |a Ward, N., Tsukahara, W., Prosodic features which cue back-channel responses in English and Japanese (2000) Journal of Pragmatics, 32, pp. 1177-1207 
504 |a Ward, N., Rivera, A., Ward, K., Novick, D., Root causes of lost time and user stress in a simple dialog system (2005) Interspeech 
504 |a Wennerstrom, A., Siegel, A.F., Keeping the floor in multi-party conversations: Intonation, syntax, and pause (2003) Discourse Processes, 36, pp. 77-107 
504 |a Wichmann, A., Caspers, J., Melodic cues to turn-taking in English: Evidence from perception (2001) Proceedings of the Second SIGdial Workshop on Discourse and Dialogue 
504 |a Wightman, C., Shattuck-Hufnagel, S., Ostendorf, M., Price, P., Segmental durations in the vicinity of prosodic phrase boundaries (1992) The Journal of the Acoustical Society of America, 91, pp. 1707-1717 
504 |a Witten, I., Frank, E., (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, , Morgan Kaufmann 
504 |a Yngve, V., On getting a word in edgewise (1970) Proceedings of the Sixth Regional Meeting of the Chicago Linguistic Society, Vol. 6, pp. 657-677 
504 |a Yuan, J., Liberman, M., Cieri, C., Towards an integrated understanding of speech overlaps in conversation (2007) ICPhS XVI, Saarbrücken, Germany 
520 3 |a As interactive voice response systems become more prevalent and provide increasingly more complex functionality, it becomes clear that the challenges facing such systems are not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system and user also play an important role in system usability. In particular, both systems and users have difficulty determining when the other is taking or relinquishing the turn. In this paper, we seek to identify turn-taking cues correlated with human-human turn exchanges which are automatically computable. We compare the presence of potential prosodic, acoustic, and lexico-syntactic turn-yielding cues in prosodic phrases preceding turn changes (smooth switches) vs. turn retentions (holds) vs. backchannels in the Columbia Games Corpus, a large corpus of task-oriented dialogues, to determine which features reliably distinguish between these three. We identify seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems. Testing Duncan's (1972) hypothesis that these turn-yielding cues are linearly correlated with the occurrence of turn-taking attempts, we further demonstrate that, the greater the number of turn-yielding cues that are present, the greater the likelihood that a turn change will occur. We also identify six cues that precede backchannels, which will also be useful for IVR backchannel generation and recognition; these cues correlate with backchannel occurrence in a quadratic manner. We find similar results for overlapping and for non-overlapping speech. © 2010 Elsevier Ltd. All rights reserved.  |l eng 
536 |a Funding details: IIS-0307905, IIS-0803148 
536 |a Funding details: This work was funded in part by NSF IIS-0307905 and IIS-0803148. We thank Štefan Beňuš, Héctor Chávez, Frank Enos, Michel Galley, Enrique Henestroza, Hanae Koiso, Jackson Liscombe, Michael Mulley, Andrew Rosenberg, Elisa Sneed German, and Gregory Ward, for valuable discussions and for their help in collecting, labeling and processing the data. We also thank our anonymous reviewers for valuable suggestions. 
593 |a Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina 
593 |a Laboratorio de Investigaciones Sensoriales, Hospital de Clínicas, Universidad de Buenos Aires, Argentina 
593 |a Department of Computer Science, Columbia University, New York, NY, United States 
690 1 0 |a DIALOGUE 
690 1 0 |a IVR SYSTEMS 
690 1 0 |a PROSODY 
690 1 0 |a TURN-TAKING 
690 1 0 |a BACK CHANNELS 
690 1 0 |a COLUMBIA 
690 1 0 |a DIALOGUE 
690 1 0 |a INTERACTIVE VOICE RESPONSE 
690 1 0 |a INTERACTIVE VOICE RESPONSE SYSTEMS 
690 1 0 |a IVR SYSTEMS 
690 1 0 |a PROSODY 
690 1 0 |a SYSTEM USABILITY 
690 1 0 |a TURN-TAKING 
690 1 0 |a SPEECH RECOGNITION 
700 1 |a Hirschberg, J. 
773 0 |d 2011  |g v. 25  |h pp. 601-634  |k n. 3  |p Comput Speech Lang  |x 08852308  |t Computer Speech and Language 
856 4 1 |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-79952619484&doi=10.1016%2fj.csl.2010.10.003&partnerID=40&md5=200ad47419c62b11039b0666c992a97f  |y Scopus record 
856 4 0 |u https://doi.org/10.1016/j.csl.2010.10.003  |y DOI 
856 4 0 |u https://hdl.handle.net/20.500.12110/paper_08852308_v25_n3_p601_Gravano  |y Handle 
856 4 0 |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_08852308_v25_n3_p601_Gravano  |y Digital Library record 
961 |a paper_08852308_v25_n3_p601_Gravano  |b paper  |c PE 
962 |a info:eu-repo/semantics/article  |a info:ar-repo/semantics/artículo  |b info:eu-repo/semantics/publishedVersion 
999 |c 71435