Background: Cereal hull color is an important quality characteristic. Many
studies have been conducted to identify the genetic changes underlying cereal hull color diversity. However,
these studies have mainly focused on the gene level. Recent studies have suggested that metabolomics can
accurately reflect the integrated and real-time cell processes that contribute to the formation of
different cereal colors.
Methods: In this study, we exploited published metabolomics databases and applied several
advanced computational methods, including minimum redundancy maximum relevance (mRMR),
incremental forward search (IFS), and random forest (RF), to investigate cereal hull color at the metabolic
level. First, mRMR was applied to analyze cereal hull samples represented by metabolite
features, yielding a ranked feature list. Then, IFS and RF were used to test several feature sets
constructed according to this feature list. Finally, the optimal feature sets and RF
classifier were identified based on the test results.
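
The search over feature sets described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes the feature sets are nested top-k prefixes of the ranked list, and the function name `incremental_feature_search`, the `evaluate` callback, and the toy scoring rule are all hypothetical. In a real analysis, `evaluate` would train an RF classifier on the chosen metabolites and return a cross-validated score.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_search(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float]:
    """Score nested top-k subsets of a ranked feature list and keep the best.

    ranked_features: features ordered from most to least important
        (for example, the output of an mRMR ranking).
    evaluate: hypothetical callback that trains a classifier on the given
        features and returns a quality score (higher is better).
    step: how many features to add between consecutive candidate subsets.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        # Strict comparison: on ties, the smaller subset is kept.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy stand-in for "train RF and score it": rewards three made-up
# informative features and penalizes subset size.
def toy_evaluate(subset: List[str]) -> float:
    informative = {"m1", "m2", "m3"}
    return len(informative & set(subset)) - 0.1 * len(subset)


ranked = ["m1", "m2", "m3", "m4", "m5"]  # hypothetical ranked metabolites
best, score = incremental_feature_search(ranked, toy_evaluate)
print(best, round(score, 2))
```

Keeping the classifier behind a callback separates the subset search from the choice of model and performance measure, so the same loop works with any classifier.
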
Results and Conclusion: A total of 158 key metabolites were found to be useful in distinguishing
white cereal hulls from colorful cereal hulls. A prediction model constructed with these metabolites
and a random forest algorithm achieved a high Matthews correlation coefficient (MCC) of 0.701.
Furthermore, 24 of these metabolites have previously been reported to be relevant to cereal color. Our study
provides new insights into the molecular basis of cereal hull color formation.
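
For readers unfamiliar with the Matthews correlation coefficient reported above, the sketch below computes it from the four cells of a binary confusion matrix using the standard formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `mcc_from_counts` is illustrative, and the counts in the usage lines are made up for demonstration; they are not taken from this study.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
print(mcc_from_counts(tp=50, tn=50, fp=0, fn=0))    # perfect classifier
print(mcc_from_counts(tp=25, tn=25, fp=25, fn=25))  # chance-level classifier
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which makes it informative when the two classes (here, white and colorful hulls) differ in size.
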