In this paper we present comparative data suggesting that the various elements of human speech evolved at different times and originally had different functions. Recent work by Nishimura [1-6] shows that what is commonly known as the laryngeal descent actually evolved in a mosaic fashion, in at least two steps: (a) a descent of the thyroid cartilage (Adam's apple) relative to the hyoid (tongue bone), which is also seen in non-human hominoids, and (b) a descent of the hyoid bone relative to the palate, which is less obvious in non-human hominoids and which is accentuated by the absence of prognathism in the short, flat human face. Comparisons with other animals suggest that (a) the first descent might be associated with loud and/or varied sound production, and that (b) the second might be part of an adaptation to eating seafood such as shellfish, which can be sucked into the mouth and swallowed without chewing, even under water. We argue that human speech originated from different pre-adaptations that were present in human ancestors, such as (a) sound production adaptations related to the descent of the thyroid cartilage, associated with the territorial calls of apes, (b) transformation of the oral and dental anatomy, including the descent of the hyoid, associated with reduced biting and chewing, and (c) diving adaptations, leading to voluntary control of the airway entrances and voluntary breath control. Whereas chimpanzee ancestors became frugivores in tropical forests after they split from human ancestors about 5 Ma (million years ago), human ancestors became littoral omnivores. This might help explain why chimpanzees did not evolve language skills, why human language is a relatively recent phenomenon, and why it depends so strongly on voluntary breath control, which is not seen in other hominoids but is clearly present in diving mammals.