Speech Processing

Prof. Dr.-Ing. S. Feldes

Contents

75% lectures, 25% laboratory team work

  • Introduction – Speech as interaction modality; Application fields and examples; Diversity of speech processing tasks; Interdisciplinarity.
  • Speech signal and speech production – Characteristics of speech signals in the time and frequency domains, Spectrograms, Voiced/unvoiced sounds, Pitch, Formants; Human speech production process, Source-filter model; Co-articulation, Prosody; Phonetic categorisation and alphabet; MATLAB laboratory on speech signal analysis.
  • Human ear and hearing – Outer, middle and inner ear, Frequency-to-place transformation; Psychoacoustic fundamentals, Hearing area, Masking in frequency and time, Critical bands, Perception-based measures: Loudness (phon), Pitch (Bark, mel).
  • Speech synthesis – Speech reproduction with slot filling; Parametric synthesis (articulatory, formant); Time-domain synthesis (diphone synthesis, PSOLA, unit selection synthesis); Text-to-speech: syntactic, lexical and phonetic processing, prosody; Concept-to-speech.
  • Speech recognition – Knowledge-based vs. statistical approach; Feature extraction (windowing, LPC, mel-frequency cepstral coefficients); Basics of pattern recognition, minimum distance classifier, Bayes classifier; Dynamic time warping; Hidden Markov models, Viterbi decoding, Training, Phonetic-acoustic modelling, Language models; Recognition rates; Laboratory on speech recognition (HTK).
  • Applications – Dictation systems, Dialogue systems, Speech-to-text systems, Multimodal systems; Aspects of dialogue design and usability, mixed initiative, barge-in; Context-free grammars, Stochastic language models; Speech understanding; VoiceXML; Laboratory on dialogue systems using VoiceXML.

Full version (in German)