A metric set to investigate the evolution of expression divergence following gene duplications in bulk and single-cell RNA-seq data


Author(s): Fabrício Almeida-Silva, Yves Van de Peer

Affiliation(s): VIB-UGent Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium



Gene and genome duplications are important sources of raw genetic material for evolution. However, after duplication, retained duplicate pairs evolve differently depending on the duplication mechanism that originated them. At the transcriptional level, some gene classes tend to preserve ancestral expression patterns, while others tend to subfunctionalize or neofunctionalize, eventually leading to novel traits. Here, we present a set of metrics to explore and understand gene expression divergence in two contexts: (i) within a single species, and (ii) in hybrid triplets (i.e., hybrids and their progenitors). For the first context, the metrics are implemented in the R package exdiva, which calculates expression divergence through pairwise Spearman’s correlations with permutation-based P values, and supports comparative analyses of τ indices of tissue specificity and of co-occurrence in coexpression modules. We demonstrate the application of exdiva to a large compendium of bulk and single-cell RNA-seq data obtained from several plant species. For the second context, we present HybridExpress, an R package for comparative transcriptomic analyses of hybrids relative to their progenitor species. HybridExpress can be used to normalize count data, identify differentially expressed genes between generations, and identify genes that display additive expression, transgressive up- or down-regulation in hybrids, or expression-level dominance towards one of the parents. We demonstrate the application of HybridExpress to allopolyploid cotton under salt stress and to root trait heterosis in rice.
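To make two of the metrics named above concrete, the following is a minimal Python sketch of a τ index of tissue specificity and of a Spearman correlation with a permutation-based P value for one duplicate pair. It is an illustration under stated assumptions, not the exdiva implementation or its API: the function names (`tau_index`, `spearman_permutation_test`), the two-sided test, the add-one P value correction, and the number of permutations are all choices made for this example. The τ formula used is the standard definition of Yanai et al. (2005); whether exdiva transforms expression values before computing it is not stated here.

```python
import random
from statistics import mean


def tau_index(expr):
    """Tau index (Yanai et al., 2005): 0 = broadly expressed, 1 = tissue-specific."""
    n = len(expr)
    peak = max(expr)
    if n < 2 or peak <= 0:
        return 0.0  # assumption: unexpressed genes are reported as non-specific
    return sum(1 - x / peak for x in expr) / (n - 1)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def spearman_permutation_test(x, y, n_perm=1000, seed=0):
    """Return (rho, two-sided permutation P value) for paired expression vectors."""
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    observed = pearson(rx, ry)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between the two genes
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction (assumption) so the P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical expression of two paralogs across eight tissues.
gene_a = [1.0, 2.5, 3.0, 4.2, 5.1, 6.3, 7.7, 9.0]
gene_b = [0.5, 1.1, 1.9, 2.4, 3.8, 4.0, 6.5, 8.2]
rho, p_value = spearman_permutation_test(gene_a, gene_b)
```

In this sketch, a high ρ with a small P value would suggest that the two duplicates have preserved similar expression profiles, while τ values that differ strongly between the two copies would point to a change in expression breadth.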