Whereas docking screens have emerged as the most practical way to use protein structure for ligand discovery, an inconsistent track record raises questions about how well docking actually works. In its favor, a growing number of publications report the successful discovery of new ligands, often supported by experimental affinity data and controls for artifacts. Few reports, however, actually test the underlying structural hypotheses that docking makes. To be successful and not just lucky, prospective docking must not only rank a true ligand among the top scoring compounds, it must also correctly orient the ligand so the score it receives is biophysically sound. If the correct binding pose is not predicted, a skeptic might well infer that the discovery was serendipitous. Surveying over 15 years of the docking literature, we were surprised to discover how rarely sufficient evidence is presented to establish whether docking actually worked for the right reasons. The paucity of experimental tests of theoretically predicted poses undermines confidence in a technique that has otherwise become widely accepted. Of course, solving a crystal structure is not always possible, and even when it is, it can be a lot of work, and is not readily accessible to all groups. Even when a structure can be determined, investigators may prefer to gloss over an erroneous structural prediction to better focus on their discovery. Still, the absence of a direct test of theory by experiment is a loss for method developers seeking to understand and improve docking methods. We hope this review will motivate investigators to solve structures and compare them with their predictions whenever possible, to advance the field.