Copy number alterations (CNAs) in genomic DNA are linked to a variety of human diseases. Although many methods have been developed to analyze data from a single subject, disease-critical genes are more likely to be found in regions that are common or recurrent among diseased subjects. Finding recurrent CNA regions, however, remains a challenge. We review existing methods for the identification of recurrent CNA regions. Methods differ in their working definition of “recurrent region”, the type of input data, the statistical and computational methods used to identify recurrence, and the biological considerations they incorporate (which play a role in the identification of “interesting” regions and in the details of the null models used to assess statistical significance). Very few approaches use and/or return probabilities, and code is not easily available for several methods. We emphasize that, when analyzing data from complex diseases with significant among-subject heterogeneity, methods should be able to identify CNAs that affect only a subset of subjects. We suggest that finding recurrent CNAs would benefit from clearly specifying the types of patterns to be detected and the intended use of the regions found (CNA association with disease, CNA effects on gene expression, clustering of subjects). We finish with suggestions for further methodological research.
aCGH, copy number alterations, recurrent, common regions
Breast Cancer Functional Genomics, Cancer Research UK, Cambridge, UK and The Spanish National Cancer Research Centre, Spain.