Special Sessions

Special sessions at Interspeech are intended to stimulate work on particular topics that colleagues have identified as deserving a specific focus during the conference.

The following special sessions are scheduled at Interspeech:

» Speech science in end-user applications
» Intelligibility-enhancing speech modifications
» Spoofing and countermeasures for automatic speaker verification
» Articulatory data acquisition and processing
» Child computer interaction
» Computational Paralinguistics Challenge

Detailed descriptions of the Special Sessions:

Speech science in end-user applications

   Felix Burkhardt Deutsche Telekom Laboratories
   Juergen Schroeter AT&T
   Björn Schuller Technische Universität München

To strike a balance between science and technology, this special session focuses on applications of speech technologies. Junior researchers in particular will appreciate getting an idea of how speech technology is used in industrial products designed with the end user in mind. We felt that a special session would be a good place to bring the academic and industrial worlds closer together and to exchange experiences in the overlapping areas of science and technology. We encourage contributions from all related fields, e.g. recognition, synthesis, semantics, classification, and analytics, but with a focus on real-world applications and on problems detected in user studies or extracted from real-use log files. A poster session will allow for good individual exchange. The session will begin with an introduction by the organizers, followed by short presentations introducing the posters on display. A brief panel discussion at the end will wrap up the session, highlighting common ground.


Intelligibility-enhancing speech modifications

   Martin Cooke   Ikerbasque
   Yannis Stylianou Toshiba Research Laboratory
   Catherine Mayo Centre for Speech Technology Research, University of Edinburgh

Natural (live and recorded) and synthetic speech are increasingly deployed in applications involving speech technology, many of which must function under non-ideal listening conditions. To ensure correct message reception, existing systems are forced to rely on excessive output levels or on repetition. An alternative approach is to manipulate the speech or the message generation process to achieve adequate intelligibility while ideally reducing output intensity. A number of algorithms for intelligibility-enhancing speech modification have been proposed in recent years, with claims of improvements equivalent to reducing output levels by up to 5 dB. The purpose of this special session is to compare the effectiveness of algorithms whose goal is to increase the intelligibility of natural and synthetic speech in known noise conditions. If you are a researcher working on speech modifications that boost intelligibility, we welcome your submission.


Spoofing and countermeasures for automatic speaker verification

   Nicholas Evans EURECOM
   Tomi Kinnunen University of Eastern Finland
   Junichi Yamagishi University of Edinburgh
   Sebastien Marcel Idiap Research Institute

It is widely acknowledged that most biometric systems are vulnerable to imposture or spoofing attacks. While vulnerabilities and countermeasures for other biometric modalities have been widely studied, those of automatic speaker verification have received comparatively little attention, and such systems remain vulnerable. This special session aims to promote the study of spoofing and countermeasures for the speech modality. We invite submissions with an emphasis on new countermeasures, as well as papers focusing on previously unconsidered vulnerabilities, new databases, and evaluation protocols and metrics for assessing automatic speaker verification in the face of spoofed samples. In particular, we aim to stimulate new interest from colleagues working in related fields, e.g. voice conversion and speech synthesis, whose participation is sought for the design of future evaluations.


Articulatory data acquisition and processing

   Slim Ouni LORIA
   Korin Richmond CSTR, University of Edinburgh
   Asterios Toutios Signal and Image Processing Institute, University of Southern California

In recent years, the techniques available for acquiring articulatory data, such as electromagnetic articulography, magnetic resonance imaging, ultrasound tongue imaging, electropalatography, electroglottography, video recording, optical motion capture, and air flow and pressure measurements, have matured steadily, driving great advances in speech production research and related fields. Nevertheless, using such methods tends to involve considerable duplication of effort, with each research group developing its own data analysis and processing tools, while there is little exchange of knowledge regarding the practical details of data acquisition and processing methods. It would be enormously beneficial to the scientific community to actively encourage the identification of best practices and to establish guidelines for acquisition protocols. The aim of this special session is to meet this need, focusing on the technical aspects of articulatory data acquisition.


Child computer interaction

   Kay Berkling Duale Hochschule Karlsruhe
   Shrikanth Narayanan University of Southern California
   Keelan Evanini ETS
   Johan Schalkwyk Google
   Arthur Kantor IBM
   Takayuki Arai Sophia University
   Stefan Steidl University of Erlangen

This special session aims to bring together researchers and practitioners from universities and industry working on all aspects of multimodal child-computer interaction, with a particular emphasis on, but not limited to, interactive spoken language interfaces. Domains where speech technology applications involving child-computer interaction are becoming increasingly important include healthcare and education, especially with the spread of mobile devices into the lives of children. Of special interest for Interspeech 2013, in light of its humanistic viewpoint, is the issue of global accessibility to these technologies. One challenge for the next two decades will be to employ affordable mobile technology and to remove barriers caused by health issues or remoteness, in order to grant accessibility to children around the globe. We look forward to receiving a wide variety of submissions addressing child-computer interaction from the areas of automatic speech recognition, linguistics, multimedia, robotics, human-computer interaction, and related disciplines.


Computational Paralinguistics Challenge

   Björn Schuller Technische Universität München
   Stefan Steidl Friedrich-Alexander-University
   Anton Batliner Technische Universität München
   Alessandro Vinciarelli University of Glasgow
   Klaus Scherer Swiss Center for Affective Sciences
   Fabien Ringeval University of Fribourg
   Mohamed Chetouani Université Pierre et Marie Curie

After four consecutive Challenges at INTERSPEECH, a multiplicity of highly relevant paralinguistic phenomena remains uncovered. In previous instalments we focused on single speakers; with a new task, the Conflict Sub-Challenge, we now broaden the scope to the analysis of discussions among multiple speakers. A further novelty is introduced by the Social Signals Sub-Challenge: for the first time, non-linguistic events, namely laughter and fillers, have to be classified and localised. In the Emotion Sub-Challenge we are literally “going back to the roots”; however, we intentionally use acted material for the first time, to fuel the ongoing discussion on differences between naturalistic and acted material and to highlight those differences. Finally, this year’s Autism Sub-Challenge addresses Autism Spectrum Condition in children’s speech. Apart from intelligent and socially competent future agents and robots, the main applications are found in the medical domain and in surveillance.

This special session is dedicated to participants in the Computational Paralinguistics Challenge, a special event of Interspeech 2013.

» More details