ABSTRACT
Genetic variants outside protein-coding regions of the genome play an important role in genetic and epigenetic regulation. It has become increasingly important to understand their roles, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the catalog of common and rare non-coding variants, which are typically not detected in more limited GWAS approaches, and offers unique opportunities for investigating them. However, the sheer size and breadth of WGS data introduce additional challenges for data analysis and interpretation when predicting functional impacts. This review focuses on recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting the variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions that will enhance the data and tools necessary for effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.
Subjects
Epigenesis, Genetic , Genome-Wide Association Study , Whole Genome Sequencing , Genomics
ABSTRACT
SUMMARY: Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG (Harmonization and Integration Pipeline for Functional Genomics), an automatically customized pipeline for efficient and scalable normalization of heterogenous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g. chromatin interactions, genomic intervals, quantitative trait loci). AVAILABILITY AND IMPLEMENTATION: hipFG is freely available at https://bitbucket.org/wanglab-upenn/hipFG. A Docker container is available at https://hub.docker.com/r/wanglab/hipfg.
Subjects
Genome-Wide Association Study , Software , Genomics , Chromatin , Quantitative Trait Loci
ABSTRACT
INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.
Subjects
Alzheimer Disease , United States , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , National Institute on Aging (U.S.) , Genomics , Databases, Factual , Genetic Predisposition to Disease/genetics
ABSTRACT
NIAGADS is the National Institute on Aging (NIA) designated national data repository for human genetics research on Alzheimer's disease and related dementias (ADRD). NIAGADS maintains a high-quality data collection for ADRD genetic/genomic research and supports genetics data production and analysis. NIAGADS hosts whole-genome and whole-exome sequence data from the Alzheimer's Disease Sequencing Project (ADSP) and other genotype/phenotype data, encompassing 209,000 samples. NIAGADS shares these data with hundreds of research groups around the world via the Data Sharing Service, a FISMA Moderate-compliant cloud-based platform that fully supports the NIH Genomic Data Sharing Policy. NIAGADS Open Access consists of multiple knowledge bases with genome-wide association summary statistics and rich annotations on the biological significance of genetic variants and genes across the human genome. NIAGADS stands as a keystone in promoting collaborations to advance the understanding and treatment of Alzheimer's disease.
ABSTRACT
Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG, an automatically customized pipeline for efficient and scalable normalization of heterogenous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g., chromatin interactions, genomic intervals, quantitative trait loci).
ABSTRACT
Lysine specific methyltransferase 2D (Kmt2d) catalyzes the mono-methylation of histone 3 lysine 4 (H3K4me1) and plays a critical role in regulatory T cell generation via modulating Foxp3 gene expression. Here we report a role of Kmt2d in naïve CD8+ T cell generation and survival. In the absence of Kmt2d, the number of CD8+ T cells, particularly naïve CD8+ T cells (CD62Lhi/CD44lo), in spleen was greatly decreased and in vitro activation-related death significantly increased from Kmt2d fl/flCD4cre+ (KO) compared to Kmt2d fl/flCD4cre- (WT) mice. Furthermore, analyses by ChIPseq, RNAseq, and scRNAseq showed reduced H3K4me1 levels in enhancers and reduced expression of apoptosis-related genes in activated naïve CD8+ T cells in the absence of Kmt2d. Finally, we confirmed the activation-induced death of antigen-specific naïve CD8+ T cells in vivo in Kmt2d KO mice upon challenge with Listeria monocytogenes infection. These findings reveal that Kmt2d regulates activation-induced naïve CD8+ T cell survival via modulating H3K4me1 levels in enhancer regions of apoptosis and immune function-related genes.
Subjects
CD8-Positive T-Lymphocytes , Histone-Lysine N-Methyltransferase , Lysine , Animals , Mice , Biomarkers , CD8-Positive T-Lymphocytes/metabolism , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/metabolism , Histones/metabolism , Transcription Factors/metabolism
ABSTRACT
The decline of CD8+ T cell functions contributes to deteriorating health with aging, but the mechanisms that underlie this phenomenon are not well understood. We use single-cell RNA sequencing with both cross-sectional and longitudinal samples to assess how human CD8+ T cell heterogeneity and transcriptomes change over nine decades of life. Eleven subpopulations of CD8+ T cells and their dynamic changes with age are identified. Age-related changes in gene expression result from changes in the percentage of cells expressing a given transcript, quantitative changes in the transcript level, or a combination of these two. We develop a machine learning model capable of predicting the age of individual cells based on their transcriptomic features, which are closely associated with their differentiation and mutation burden. Finally, we validate this model in two separate contexts of CD8+ T cell aging: HIV infection and CAR T cell expansion in vivo.
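The distinction drawn above between changes in the percentage of expressing cells and changes in transcript level can be illustrated with a small calculation. The following is a minimal sketch, not the authors' method: it assumes per-cell counts for one transcript in two age groups, and the function names (`expression_summary`, `decompose_change`) are hypothetical. Because the overall mean equals the expressing fraction times the mean level among expressing cells, the log2 fold change splits into two additive parts.

```python
import math


def expression_summary(counts):
    """Return (fraction of cells expressing, mean level among expressing cells)."""
    expressing = [c for c in counts if c > 0]
    if not expressing:
        # Assumption of this sketch: the transcript is detected in each group.
        raise ValueError("transcript not detected in this group")
    fraction = len(expressing) / len(counts)
    mean_level = sum(expressing) / len(expressing)
    return fraction, mean_level


def decompose_change(young, old):
    """Split the log2 fold change (old vs. young) into two additive components."""
    f_y, m_y = expression_summary(young)
    f_o, m_o = expression_summary(old)
    return {
        # Change in the percentage of cells expressing the transcript
        "log2_fraction": math.log2(f_o / f_y),
        # Quantitative change in transcript level among expressing cells
        "log2_level": math.log2(m_o / m_y),
        # Change in the overall mean; equals the sum of the two parts above
        "log2_total": math.log2((f_o * m_o) / (f_y * m_y)),
    }


# Hypothetical toy data: more cells express the transcript with age, at a lower level.
result = decompose_change(young=[0, 0, 4, 4], old=[0, 2, 2, 2])
```

In this toy case the fraction component is positive and the level component is negative, which corresponds to the "combination of these two" case described in the abstract.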
Subjects
CD8-Positive T-Lymphocytes , HIV Infections , Aging/genetics , CD8-Positive T-Lymphocytes/metabolism , Cross-Sectional Studies , HIV Infections/genetics , HIV Infections/metabolism , Humans , Transcriptome
ABSTRACT
A diverse T cell receptor (TCR) repertoire is essential for protection against a variety of pathogens, and TCR repertoire size is believed to decline with age. However, the precise size of human TCR repertoires, in both total and subsets of T cells, as well as their changes with age, are not fully characterized. We conducted a longitudinal analysis of the human blood TCRα and TCRβ repertoire of CD4+ and CD8+ T cell subsets using a unique molecular identifier-based (UMI-based) RNA-seq method. Thorough analysis of 1.9 × 10^8 T cells yielded the lower estimate of TCR repertoire richness in an adult at 3.8 × 10^8. Alterations of the TCR repertoire with age were observed in all 4 subsets of T cells. The greatest reduction was observed in naive CD8+ T cells, while the greatest clonal expansion was in memory CD8+ T cells, and the highest increased retention of TCR sequences was in memory CD8+ T cells. Our results demonstrated that age-related TCR repertoire attrition is subset specific and more profound for CD8+ than CD4+ T cells, suggesting that aging has a more profound effect on cytotoxic as opposed to helper T cell functions. This may explain the increased susceptibility of older adults to novel infections.
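The abstract does not state which estimator produced the lower estimate of richness. As one illustration of how a lower bound on repertoire richness can be derived from sampled sequences, the sketch below uses the bias-corrected Chao1 estimator, a standard choice that may differ from what the authors used. The input format (a mapping from clonotype sequence to UMI-supported count) and the function name `chao1_lower_bound` are assumptions made for this example.

```python
from collections import Counter


def chao1_lower_bound(clonotype_counts):
    """Bias-corrected Chao1 lower bound on richness.

    clonotype_counts maps each observed clonotype to the number of times
    it was sampled (assumed here to be UMI-collapsed counts).
    """
    s_obs = len(clonotype_counts)              # clonotypes actually observed
    freq = Counter(clonotype_counts.values())  # how many clonotypes were seen k times
    f1 = freq.get(1, 0)                        # singletons
    f2 = freq.get(2, 0)                        # doubletons
    # Many singletons relative to doubletons imply many unseen clonotypes.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical toy sample: three singletons, one doubleton, one expanded clone.
sample = {"CASSL": 1, "CASSP": 1, "CASSQ": 1, "CASRG": 2, "CASTD": 5}
estimate = chao1_lower_bound(sample)
```

The estimate can never fall below the number of observed clonotypes, which is why estimators of this kind are reported as lower bounds rather than point estimates of the true repertoire size.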