After many decades of slow, incremental growth, computer-based automatic recognition of human speech has recently made a much more rapid transition from the research lab to mainstream application, and is now available on most of the 1+ billion smartphones on the planet. (Worldwide, there are almost as many mobile phones as people, and about 1 in 5 of these are smartphones.) This transition has been fueled largely by growth in raw computational power rather than by fundamental changes in speech recognition technology itself. The methods used in nearly every state-of-the-art automatic speech recognition system are based on the same statistical model that was first applied to speech more than 30 years ago, the Hidden Markov Model. Hidden Markov Models are in many ways straightforward: they are simple state machines that take input sequences and identify the most likely corresponding state sequences. The main strength of the approach lies in its flexibility: flexibility to match sequences in a non-linear temporal pattern, flexibility to learn more detailed models when more training data is available, flexibility to connect multiple models together into longer continuous patterns, and flexibility to incorporate whatever data features and probabilistic models are best suited to the task. Nearly all of these benefits carry over to the domain of bioacoustics, specifically to the classification of animal vocalizations. Although there are limits to this transfer, since human speech is better understood than animal communication, there is also much to gain: many improvements become possible by drawing on the large body of knowledge accumulated over the long history of human speech processing and recognition technology. In keeping with this idea, this chapter presents an overview of the use of Hidden Markov Models for classification, detection, and clustering of bioacoustic signals.
Keywords: Feature extraction, Gaussian mixture models, Hidden Markov models, Signal classification, Signal detection.