1. Human Speech Production, Speech and Language Acquisition
1.1 Physiology and neurophysiology of speech production
1.2 Neural basis of speech production
1.3 Experimental setups for articulatory data acquisition
1.4 Speech motor control
1.5 Speech acoustics
1.6 Phonation, voice quality
1.7 Coarticulation
1.8 Theory of speech production with other modalities (e.g. gestures, facial expressions)
1.9 Articulatory and acoustic cues of prosody
1.10 Models of speech production
1.11 Infant spoken language acquisition
1.12 Speech development from toddlerhood and beyond
1.13 L2 acquisition
1.14 Multilingual studies of speech production
1.15 Singing acoustics
2. Human Speech Perception, Interaction Production-Perception and Face-to-Face Communication
2.1 Physiology and neurophysiology of speech perception
2.2 Neural basis of speech perception
2.3 Multimodal speech perception
2.4 Models of speech perception
2.5 Face-to-face communication
2.6 Interaction between speech production and speech perception
2.7 Phonetic convergence
2.8 Multistability in speech perception
2.9 Acoustic and articulatory cues in speech perception
2.10 Perception of prosody
2.11 Perception of emotions
2.12 Non-verbal human interaction
2.13 Multilingual studies of speech perception
2.14 Perception of non-native sounds
2.15 Infant speech perception
2.16 Perception of singing voice
3. Linguistic Systems, Language Description, Languages in Contact, Sound Changes
3.1 Linguistic systems
3.2 Language descriptions
3.3 Phonetics and phonology
3.4 Discourse and dialog structures
3.5 Phonological processes and models
3.6 Languages in contact
3.7 Laboratory phonology
3.8 Phonetic universals
3.9 Sound changes
3.10 Socio-phonetics
3.11 Phonetics of L1-L2 interaction
3.12 Neurophonetics
3.13 Paralinguistic and nonlinguistic cues (other than emotion and expression)
4. Speech and Hearing Disorders
4.1 Dysarthria
4.2 Aphasia
4.3 Dysphonia
4.4 Peripheral disorders of speech production
4.5 Stuttering
4.6 Speech in cochlear-implanted patients
4.7 Voice disorders
4.8 Cued speech
4.9 Neural correlates of speech production and speech perception disorders
4.10 Speech technology applications for speech and hearing disorders
5. Analysis of Speech, Audio Signals, Speech Coding, Speech Enhancement
5.1 Speech analysis and representation
5.2 Audio signal analysis and representation
5.3 Speech and audio segmentation and classification
5.4 Voice activity detection
5.5 Speech coding and transmission
5.6 Speech enhancement: single-channel
5.7 Speech enhancement: multi-channel
5.8 Source separation and computational auditory scene analysis
5.9 Speaker spatial localization
5.10 Voice separation
5.11 Signal processing for music and song
5.12 Singing analysis
6. Speech Synthesis, Audiovisual Speech Synthesis, Spoken Language Generation
6.1 Grapheme-to-phoneme conversion for synthesis
6.2 Text processing for speech synthesis (text normalization, syntactic and semantic analysis)
6.3 Segmental-level and/or concatenative synthesis
6.4 Signal processing/statistical model for synthesis
6.5 Speech synthesis paradigms and methods: silent speech, articulatory synthesis, parametric synthesis, etc.
6.6 Prosody modeling and generation
6.7 Expression, emotion and personality generation
6.8 Voice conversion and modification, morphing
6.9 Concept-to-speech conversion
6.10 Cross-lingual and multilingual aspects for synthesis
6.11 Avatars and talking faces
6.12 Tools and data for speech synthesis
6.13 Quality assessment/evaluation metrics in synthesis
7. Speech Recognition: Signal Processing, Acoustic Modeling, Pronunciation, Robustness, Adaptation
7.1 Feature extraction and low-level feature modeling for ASR
7.2 Prosodic features and models
7.3 Robustness against noise, reverberation
7.4 Far field and microphone array speech recognition
7.5 Speaker normalization (e.g., VTLN)
7.6 New paradigms, including articulatory models, silent speech interfaces
7.7 Discriminative acoustic training methods for ASR
7.8 Acoustic model adaptation (speaker, bandwidth, emotion, accent)
7.9 Speaker adaptation; speaker adapted training methods
7.10 Pronunciation variants and modeling for speech recognition
7.11 Acoustic confidence measures
7.12 Multimodal aspects (e.g., AV speech recognition)
7.13 Cross-lingual and multilingual aspects, non-native accents
7.14 Acoustic modeling for conversational speech (dialog, interaction)
8. Speech Recognition: System, Architecture, Lexical and Linguistic Components, Language Modeling, Search
8.1 Lexical modeling and access: units and models
8.2 Automatic lexicon learning
8.3 Supervised/unsupervised morphological models
8.4 Prosodic features and models for LM
8.5 Discriminative training methods for LM
8.6 Language model adaptation (domain, diachronic adaptation)
8.7 Language modeling for conversational speech (dialog, interaction)
8.8 Search methods, decoding algorithms and implementation; lattices; multipass strategies
8.9 New computational strategies, data-structures for ASR
8.10 Computational resource constrained speech recognition
8.11 Confidence measures
8.12 Cross-lingual and multilingual aspects for speech recognition
8.13 Structured classification approaches
9. Speaker Characterization, Speaker and Language Recognition
9.1 Language identification and verification
9.2 Dialect and accent recognition
9.3 Speaker characterization, verification and identification
9.4 Features for speaker and language recognition
9.5 Robustness to variable and degraded channels
9.6 Speaker confidence estimation
9.7 Extraction of paralinguistic information (gender, stress, mood, age, emotion)
9.8 Speaker diarization
9.9 Multimodal and multimedia speaker recognition
9.10 Higher-level knowledge in speaker and language recognition
10. Spoken Language Understanding, Dialog Systems, Spoken Information Retrieval, and Other Applications
10.1 Spoken and multimodal dialog systems
10.2 Stochastic modeling for dialog
10.3 Question/answering from speech
10.4 Multimodal systems
10.5 Applications in education and learning (including CALL, assessment of language fluency)
10.6 Applications in medical practice (CIS, voice assessment, ...)
10.7 Applications in other areas
10.8 Systems for LVCSR and rich transcription
10.9 Systems for spoken language understanding
10.10 Systems for mining spoken data, search/retrieval of speech documents
10.11 Spoken language information retrieval
10.12 Topic spotting and classification
10.13 Entity extraction from speech
10.14 Spoken document summarization
10.15 Semantic analysis and classification
11. Metadata, Evaluation and Resources, NLP Including MT Applied to Spoken Data
11.1 Speech and multimodal resources and annotation
11.2 Metadata descriptions of speech, audio, and text resources
11.3 Metadata for semantic/content markup
11.4 Metadata for linguistic/discourse structure (e.g., disfluencies, sentence/topic boundaries, speech acts)
11.5 Methodologies and tools for LR construction and annotation
11.6 Automatic segmentation and labeling of resources
11.7 Natural language processing (semantic classification, entity extraction, summarization) of speech
11.8 Spoken language translation (interlingua and transfer, integration of speech and linguistic processing)
11.9 Multilingual resources
11.10 Validation, quality assurance, evaluation of LRs
11.11 Evaluation and standardization of speech and language technologies and systems
12. Special Sessions and Topics
12.1 Speech science in end user applications
12.2 Intelligibility-enhancing speech modifications
12.3 Voicing in speech production and perception
12.4 Early history of instrumental phonetics
12.5 Spoofing and countermeasures for automatic speaker verification
12.6 Articulatory data acquisition and processing
12.7 Speech synthesis for language varieties
12.8 Voice attractiveness: causes and consequences
12.9 Advances in machine learning for prosody modeling
12.10 Simultaneous speech interpretation
12.11 Child-computer interaction