Conference Areas

1. Human Speech Production, Speech and Language Acquisition

  1.1 Physiology and neurophysiology of speech production
  1.2 Neural basis of speech production
  1.3 Experimental setups for articulatory data acquisition
  1.4 Speech motor control
  1.5 Speech acoustics
  1.6 Phonation, voice quality
  1.7 Coarticulation
  1.8 Theory of speech production with other modalities (e.g. gestures, facial expressions)
  1.9 Articulatory and acoustic cues of prosody
  1.10 Models of speech production
  1.11 Infant spoken language acquisition
  1.12 Speech development from toddlerhood onward
  1.13 L2 acquisition
  1.14 Multilingual studies of speech production
  1.15 Singing acoustics


2. Human Speech Perception, Production-Perception Interaction and Face-to-Face Communication

  2.1 Physiology and neurophysiology of speech perception
  2.2 Neural basis of speech perception
  2.3 Multimodal speech perception
  2.4 Models of speech perception
  2.5 Face-to-face communication
  2.6 Interaction between speech production and speech perception
  2.7 Phonetic convergence
  2.8 Multistability in speech perception
  2.9 Acoustic and articulatory cues in speech perception
  2.10 Perception of prosody
  2.11 Perception of emotions
  2.12 Non-verbal human interaction
  2.13 Multilingual studies
  2.14 Perception of non-native sounds
  2.15 Infant speech perception
  2.16 Perception of singing voice


3. Linguistic Systems, Language Description, Languages in Contact, Sound Changes

  3.1 Linguistic systems
  3.2 Language descriptions
  3.3 Phonetics and phonology
  3.4 Discourse and dialog structures
  3.5 Phonological processes and models
  3.6 Languages in contact
  3.7 Laboratory phonology
  3.8 Phonetic universals
  3.9 Sound changes
  3.10 Socio-phonetics  
  3.11 Phonetics of L1-L2 interaction
  3.12 Neurophonetics
  3.13 Paralinguistic and nonlinguistic cues (other than emotion, expression)


4. Speech and Hearing Disorders

  4.1 Dysarthria
  4.2 Aphasia
  4.3 Dysphonia
  4.4 Peripheral disorders of speech production
  4.5 Stuttering
  4.6 Speech in patients with cochlear implants
  4.7 Voice disorders
  4.8 Cued speech
  4.9 Neural correlates of speech production and speech perception disorders
  4.10 Speech technology applications for speech and hearing disorders


5. Analysis of Speech, Audio Signals, Speech Coding, Speech Enhancement

  5.1 Speech analysis and representation
  5.2 Audio signal analysis and representation
  5.3 Speech and audio segmentation and classification
  5.4 Voice activity detection
  5.5 Speech coding and transmission
  5.6 Speech enhancement: single-channel
  5.7 Speech enhancement: multi-channel
  5.8 Source separation and computational auditory scene analysis
  5.9 Speaker spatial localization
  5.10 Voice separation
  5.11 Signal processing for music and song
  5.12 Singing analysis


6. Speech Synthesis, Audiovisual Speech Synthesis, Spoken Language Generation

  6.1 Grapheme-to-phoneme conversion for synthesis
  6.2 Text processing for speech synthesis (text normalization, syntactic and semantic analysis)
  6.3 Segmental-level and/or concatenative synthesis
  6.4 Signal processing/statistical models for synthesis
  6.5 Speech synthesis paradigms and methods: silent speech, articulatory synthesis, parametric synthesis, etc.
  6.6 Prosody modeling and generation
  6.7 Expression, emotion and personality generation
  6.8 Voice conversion and modification, morphing
  6.9 Concept-to-speech conversion
  6.10 Cross-lingual and multilingual aspects for synthesis
  6.11 Avatars and talking faces
  6.12 Tools and data for speech synthesis
  6.13 Quality assessment/evaluation metrics in synthesis


7. Speech Recognition: Signal Processing, Acoustic Modeling, Pronunciation, Robustness, Adaptation

  7.1 Feature extraction and low-level feature modeling for ASR
  7.2 Prosodic features and models
  7.3 Robustness against noise, reverberation
  7.4 Far-field and microphone array speech recognition
  7.5 Speaker normalization (e.g., VTLN)
  7.6 New paradigms, including articulatory models, silent speech interfaces
  7.7 Discriminative acoustic training methods for ASR
  7.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)
  7.9 Speaker adaptation; speaker-adapted training methods
  7.10 Pronunciation variants and modeling for speech recognition
  7.11 Acoustic confidence measures
  7.12 Multimodal aspects (e.g., AV speech recognition)
  7.13 Cross-lingual and multilingual aspects, non-native accents
  7.14 Acoustic modeling for conversational speech (dialog, interaction)


8. Speech Recognition: System, Architecture, Lexical and Linguistic Components, Language Modeling, Search

  8.1 Lexical modeling and access: units and models
  8.2 Automatic lexicon learning
  8.3 Supervised/unsupervised morphological models
  8.4 Prosodic features and models for LM
  8.5 Discriminative training methods for LM  
  8.6 Language model adaptation (domain, diachronic adaptation)
  8.7 Language modeling for conversational speech (dialog, interaction)
  8.8 Search methods, decoding algorithms and implementation; lattices; multipass strategies
  8.9 New computational strategies, data structures for ASR
  8.10 Computational resource constrained speech recognition
  8.11 Confidence measures
  8.12 Cross-lingual and multilingual aspects for speech recognition
  8.13 Structured classification approaches


9. Speaker Characterization, Speaker and Language Recognition

  9.1 Language identification and verification
  9.2 Dialect and accent recognition
  9.3 Speaker characterization, verification and identification
  9.4 Features for speaker and language recognition
  9.5 Robustness to variable and degraded channels
  9.6 Speaker confidence estimation
  9.7 Extraction of paralinguistic information (gender, stress, mood, age, emotion)
  9.8 Speaker diarization
  9.9 Multimodal and multimedia speaker recognition
  9.10 Higher-level knowledge in speaker and language recognition


10. Spoken Language Understanding, Dialog Systems, Spoken Information Retrieval, and Other Applications

  10.1 Spoken and multimodal dialog systems
  10.2 Stochastic modeling for dialog
  10.3 Question answering from speech
  10.4 Multimodal systems
  10.5 Applications in education and learning (including CALL, assessment of language fluency)
  10.6 Applications in medical practice (CIS, voice assessment, ...)
  10.7 Applications in other areas
  10.8 Systems for LVCSR and rich transcription
  10.9 Systems for SL understanding
  10.10 Systems for mining spoken data, search/retrieval of speech documents
  10.11 Spoken language information retrieval
  10.12 Topic spotting and classification
  10.13 Entity extraction from speech
  10.14 Spoken document summarization
  10.15 Semantic analysis and classification


11. Metadata, Evaluation and Resources, NLP Including MT Applied to Spoken Data

  11.1 Speech and multimodal resources and annotation
  11.2 Metadata descriptions of speech, audio, and text resources
  11.3 Metadata for semantic/content markup
  11.4 Metadata for linguistic/discourse structure (e.g., disfluencies, sentence/topic boundaries, speech acts)
  11.5 Methodologies and tools for LR construction and annotation
  11.6 Automatic segmentation and labeling of resources
  11.7 Natural language processing (semantic classification, entity extraction, summarization) of speech
  11.8 Spoken language translation (interlingua and transfer, integration of speech and linguistic processing)
  11.9 Multilingual resources
  11.10 Validation, quality assurance, evaluation of LRs
  11.11 Evaluation and standardization of speech and language technologies and systems


12. Special Sessions and Topics

  12.1 Speech science in end user applications
  12.2 Intelligibility-enhancing speech modifications
  12.3 Voicing in speech production and perception
  12.4 Early history of instrumental phonetics
  12.5 Spoofing and countermeasures for automatic speaker verification
  12.6 Articulatory data acquisition and processing
  12.7 Speech synthesis for language varieties
  12.8 Voice attractiveness: causes and consequences
  12.9 Advances in machine learning for prosody modeling
  12.10 Simultaneous speech interpretation
  12.11 Child-computer interaction