619, 2006.
[3] M. Hawkes and A. Nehorai, “Acoustic vector-sensor beamforming and Capon direction estimation,” IEEE Transactions on Signal Processing, vol. 46, no. 9, pp. 2291-2304, 1998.
[4] J. Cao, J. Liu, J. Wang et al., “Acoustic vector sensor: reviews and future perspectives,” IET Signal Processing, 2016.
[5] D. Levin, E. A. Habets, and S. Gannot, “Maximum likelihood estimation of direction of arrival using an acoustic vector-sensor,” The Journal of the Acoustical Society of America, vol. 131, no. 2, pp. 1240-1248, 2012.
[6] B. Li and Y. X. Zou, “Improved DOA estimation with acoustic vector sensor arrays using spatial sparsity and subarray manifold,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2557-2560, 2012.
[7] Y. X. Zou, W. Shi, B. Li et al., “Multisource DOA estimation based on time-frequency sparsity and joint inter-sensor data ratio with single acoustic vector sensor,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4011-4015, 2013.
[8] S. Zhao, T. Saluev, and D. L. Jones, “Underdetermined direction of arrival estimation using acoustic vector sensor,” Signal Processing, vol. 100, pp. 160-168, 2014.
[9] K. Wu, V. Reju, and A. W. Khong, “Multi-source direction-of-arrival estimation in a reverberant environment using single acoustic vector sensor,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 444-448, 2015.
[10] W. Zheng, Y. Zou, and C. Ritz, “Spectral mask estimation using deep neural networks for inter-sensor data ratio model based robust DOA estimation,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 325-329, 2015.
[11] Y. H. Jin and Y. Zou, “Robust speaker DOA estimation with single AVS in bispectrum domain,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3196-3200, 2016.
[12] W. Zhang and B. D. Rao, “A two microphone-based approach for source localization of multiple speech sources,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 8, pp. 1913-1928, 2010.
[13] D. Levin, E. A. Habets, and S. Gannot, “On the angular error of intensity vector based direction of arrival estimation in reverberant sound fields,” The Journal of the Acoustical Society of America, vol. 128, no. 4, pp. 1800-1811, 2010.
[14] Z. I. Botev, J. F. Grotowski, and D. P. Kroese, “Kernel density estimation via diffusion,” The Annals of Statistics, vol. 38, no. 5, pp. 2916-2957, 2010.
[15] J. B. Allen, “How do humans process and recognize speech?,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, pp. 567-577, 1994.
[16] G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
[17] N. Roman and J. Woodruff, “Intelligibility of reverberant noisy speech with ideal binary masking,” The Journal of the Acoustical Society of America, vol. 130, no. 4, pp. 2153-2161, 2011.
[18] Y. Wang and D. Wang, “Towards scaling up classification-based speech separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1381-1390, 2013.
[19] C. J. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[20] N. Yang, R. Muraleedharan, J. Kohl et al., “Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion,” in 2012 IEEE Spoken Language Technology Workshop (SLT), pp. 455-460, 2012.
[21] Y. Wang, K. Han, and D. Wang, “Exploring monaural features for classification-based speech segregation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 2, pp. 270-279, 2013.
[22] C. J. Taylor, “2012 Benjamin Franklin Medal in Computer and Cognitive Science presented to Vladimir Vapnik,” Journal of the Franklin Institute, vol. 352, no. 7, pp. 2579-2584, 2015.
[23] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771-1800, 2002.
[24] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, 1992.
[25] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211-244, 2001.
[26] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” The Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.
[27] J. S. Garofolo, “Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database,” National Institute of Standards and Technology (NIST), Gaithersburg, MD, vol. 107, 1988.
[28] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[29] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579-2605, 2008.
[30] Y. Tang, “Deep learning using support vector machines,” CoRR, abs/1306.0239, 2013.
[31] A. Varga and H. J. Steeneken, “Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Communication, vol. 12, no. 3, pp. 247-251, 1993.