Birth defects and other adverse outcomes of pregnancy have been known since before the industrial age, but the need for guidelines to test chemicals to which pregnant women might be exposed was defined by the thalidomide tragedy of the late 1950s and early 1960s. During the five decades that followed, guidelines were written to test drugs, foods, and environmental contaminants. They were first written to fulfill national needs and were later expanded to the international level to streamline procedures in the expanding global economy. Multiple sets of guidelines for animal studies were written, based on the need to simulate human experience, and it was soon realized that the underlying principles were similar for all of them. Guidelines gradually evolved in two directions: toward greater complexity and a larger number of endpoints measured, and toward wider international acceptance. This manuscript reviews, in chronological order, some of the milestones in the development of guidelines for animal studies and in their interpretation for safety assessment. Guidelines for long-term and short-term studies are reviewed, followed by a discussion of recently added endpoints and the future integration process for the assessment of reproductive toxicity risk.