The availability of high-throughput genomic data has motivated the development of numerous algorithms to infer gene regulatory networks. The validity of an inference procedure must be evaluated by its ability to infer a model network close to the ground-truth network from which the data have been generated. The input to an inference algorithm is a data sample, and its output is a network. Since input, output, and algorithm are mathematical structures, the validity of an inference algorithm is a mathematical issue. This paper formulates validation in terms of a semi-metric distance between two networks, or the distance between two structures of the same kind deduced from the networks, such as their steady-state distributions or regulatory graphs. The paper sets up the validation framework, provides examples of distance functions, and applies them to some discrete Markov network models. It also considers approximate validation methods based on data for which the generating network is not known, which is the situation one faces when using real data.
Epistemology, gene network, inference, validation
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, USA.
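To make the abstract's notion of a distance between networks concrete, the following is a minimal sketch and not the paper's own formulation. It assumes a regulatory graph is given as a square 0/1 adjacency matrix and a steady-state distribution as a list of state probabilities. The function names `hamming_distance` and `l1_distance`, and the toy three-gene data, are hypothetical and chosen only for illustration.

```python
def hamming_distance(adj_a, adj_b):
    """Fraction of entries on which two square adjacency matrices disagree.

    Assumption: both graphs are over the same ordered set of n genes,
    with adj[i][j] == 1 meaning gene i regulates gene j.
    """
    n = len(adj_a)
    if len(adj_b) != n or any(len(row) != n for row in adj_a + adj_b):
        raise ValueError("both matrices must be n x n with the same n")
    mismatches = sum(
        adj_a[i][j] != adj_b[i][j] for i in range(n) for j in range(n)
    )
    return mismatches / (n * n)


def l1_distance(p, q):
    """Sum of absolute differences between two distributions over the same states."""
    if len(p) != len(q):
        raise ValueError("distributions must be over the same state space")
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical ground-truth and inferred regulatory graphs on three genes.
true_graph = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
inferred_graph = [
    [0, 1, 0],
    [0, 0, 0],  # missed edge 1 -> 2
    [1, 1, 0],  # spurious edge 2 -> 1
]

# Hypothetical steady-state distributions over three states.
true_steady = [0.5, 0.25, 0.25]
inferred_steady = [0.25, 0.5, 0.25]

graph_error = hamming_distance(true_graph, inferred_graph)
steady_error = l1_distance(true_steady, inferred_steady)
```

Both functions compare a structure deduced from the networks rather than the networks themselves. Two different networks can share the same regulatory graph, so a distance of zero here need not mean the networks are identical.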