Using STFT real and imaginary parts of modulation signals for MMSE-based speech enhancement
In this paper we investigate an alternate, RI-modulation (R=Real, I=Imaginary) AMS framework for speech enhancement, in which the real and imaginary parts of the modulation signal are processed in secondary AMS procedures. This framework offers theoretical advantages over the previously proposed modulation AMS frameworks in that noise is additive in the modulation signal and noisy acoustic phase is not used to reconstruct speech. Using the MMSE magnitude estimation to modify modulation magnitude spectra, initial experiments presented in this work evaluate if these advantages translate into improvements in processed speech quality. The effect of speech presence uncertainty and log-domain processing on MMSE magnitude estimation in the RI-modulation framework is also investigated. Finally, a comparison of different enhancement approaches applied in the RI-modulation framework is presented. Using subjective and objective experiments as well as spectrogram analysis, we show that RI-modulation MMSE magnitude estimation with speech presence uncertainty produces stimuli which has a higher preference by listeners than the other RI-modulation types. In comparisons to similar approaches in the modulation AMS framework, results showed that the theoretical advantages of the RI-modulation framework did not translate to an improvement in overall quality, with both frameworks yielding very similar sounding stimuli, but a clear improvement (compared to the corresponding modulation AMS based approach) in speech intelligibility was found.