Abstract
In the field of functional genomics, many computer programs are being implemented to find DNA sequences that are candidates for encoding proteins, regulating gene expression or playing some other important role. Many such programs have adjustable parameters whose chosen values dramatically affect the output. Attempts to optimize such parameter settings involve establishing an appropriate reference data set and evaluating the results obtained with the programs over a reasonable range of parameter values varied in small increments. Within the context of automated searches for candidate gene regulatory regions in mammals and bacteria, we discuss problems and progress in (i) building such reference data sets, and (ii) defining appropriate cost functions.
Keywords: Validating Computer Programs, Functional Genomics, Regulatory Regions, Gene Expression, Gene Regulatory, ilvGMEDA, Salmonella, Yersinia, Klebsiella