posted on 2025-05-09, 23:25authored byDamián Marelli, Peter Balazs
A speech production model consists of a linear, slowly time-varying filter. Pole-zero models are required for a good representation of certain types of speech sounds, like nasals and laterals. From a perceptual point of view, designing them by minimizing a logarithmic criterion appears as a very suitable approach. The most accurate available results are obtained by using Newton-like search algorithms to optimize pole and zero positions, or the coefficients of a decomposition into quadratic factors. In this paper, we propose to optimize the numerator and denominator coefficients instead. Experimental results show that this is the computationally most efficient approach, especially when the optimization criterion considers a psychoacoustical frequency scale. To illustrate its applicability in speech processing, we used the proposed method for formant and anti-formant tracking as well as speech resynthesis.
History
Journal title
IEEE Transactions on Audio, Speech and Language Processing
Volume
18
Issue
2
Pagination
237-248
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Language
en, English
College/Research Centre
Faculty of Engineering and Built Environment
School
School of Electrical Engineering and Computer Science