ABSTRACT
Repetitive DNA sequences (repeats) comprise over 50% of the human genome and have a crucial regulatory role, particularly in regulating the transcription machinery. The human brain is the tissue with the highest detectable repeat expression, and dysregulation of repeat activity is linked to several neurological and neurodegenerative disorders, as repeat-derived products can stimulate a pro-inflammatory response. Even so, it remains unclear how repeat expression changes in the aging neurotypical brain. Here, we leverage a large postmortem transcriptome cohort spanning the human lifespan to assess global repeat expression in the neurotypical brain. We identified 21,696 differentially expressed repeats (DERs) that varied across seven age bins (Prenatal; 0-15; 16-29; 30-39; 40-49; 50-59; 60+) in the caudate nucleus (n=271), dorsolateral prefrontal cortex (n=304), and hippocampus (n=310). Interestingly, long interspersed nuclear element (LINE) and long terminal repeat (LTR) DERs were the most abundant repeat families when comparing the infancy-to-early-adolescence bin (0-15) with older adults (60+). Of these differentially regulated LTRs, 17 were shared across all three brain regions, including HERV-K-int, whose expression was increased in older adult brains (60+). Co-expression analysis of each of the three brain regions also showed that repeats from the HERV subfamily were intramodular hubs in their subnetworks. While we do not observe a strong global relationship between repeat expression and age, we identified HERV-K as a repeat signature associated with the aging neurotypical brain. Our study is the first global assessment of repeat expression in the neurotypical brain.
ABSTRACT
MOTIVATION: Advances in technology have generated larger omics datasets with potential applications for machine learning. In many datasets, however, cost and limited sample availability result in far more features than observations. Moreover, biological processes are associated with networks of core and peripheral genes, while traditional feature selection approaches capture only core genes. RESULTS: To overcome these limitations, we present dRFEtools, which implements dynamic recursive feature elimination (RFE). It reduces computational time while retaining high accuracy compared with standard RFE, extends dynamic RFE to regression algorithms, and outputs the subsets of features that hold predictive power with and without peripheral features. dRFEtools integrates with scikit-learn (the popular Python machine learning platform) and thus provides new opportunities for dynamic RFE in large-scale omics data while enhancing its interpretability. AVAILABILITY AND IMPLEMENTATION: dRFEtools is freely available on PyPI at https://pypi.org/project/drfetools/ or on GitHub at https://github.com/LieberInstitute/dRFEtools. It is implemented in Python 3 and supported on Linux, Windows, and macOS.
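The idea behind dynamic RFE can be illustrated with a minimal sketch written directly against scikit-learn. This is not the dRFEtools API: the function name `dynamic_rfe`, its parameters, the random forest model, the out-of-bag score, and the synthetic data are all assumptions made for illustration. The sketch assumes that "dynamic" means dropping a fixed fraction of the lowest-ranked features at each iteration, so that many features are removed early and few are removed late.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def dynamic_rfe(X, y, elimination_rate=0.2, min_features=1, seed=0):
    """Hypothetical helper: drop a fraction of the weakest features per step.

    Returns a list of (n_features, oob_score, kept_column_indices) tuples,
    one per iteration, from the full feature set down to `min_features`.
    """
    keep = np.arange(X.shape[1])  # column indices still in the model
    history = []
    while True:
        # Assumed model choice: a random forest scored on out-of-bag samples.
        model = RandomForestClassifier(
            n_estimators=50, oob_score=True, random_state=seed
        )
        model.fit(X[:, keep], y)
        history.append((len(keep), model.oob_score_, keep.copy()))
        if len(keep) <= min_features:
            break
        # Remove a fraction of the remaining features, but at least one,
        # and never go below `min_features`.
        n_drop = max(1, int(len(keep) * elimination_rate))
        n_drop = min(n_drop, len(keep) - min_features)
        order = np.argsort(model.feature_importances_)  # weakest first
        keep = np.sort(keep[order[n_drop:]])
    return history


# Synthetic data standing in for an omics matrix (samples x features).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)
history = dynamic_rfe(X, y, elimination_rate=0.2)
for n_features, score, _ in history:
    print(f"{n_features:3d} features  OOB accuracy = {score:.3f}")
```

With 50 starting features and an elimination rate of 0.2, this loop fits far fewer models than removing one feature at a time would, which is the source of the time savings under the stated assumption. Inspecting how the score changes across `history` is one way to separate a small core feature set from a larger set that still carries predictive signal.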