ABSTRACT
The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, despite the fact that TRs constitute ~6% of our genome and are linked to over 50 human diseases. Here, we introduce TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European samples). TR-gnomAD offers critical insights into ancestry-specific disease prevalence using disparities in TR unit number frequencies among ancestries. Moreover, TR-gnomAD can distinguish common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from potentially pathogenic TR expansions, which are found more frequently in disease groups than within TR-gnomAD. Together, TR-gnomAD is an invaluable resource for researchers and physicians to interpret TR expansions in individuals with genetic diseases.
Subject(s)
Genome, Human , Tandem Repeat Sequences , Humans , Tandem Repeat Sequences/genetics , Whole Genome Sequencing , Databases, Genetic , DNA Repeat Expansion/genetics , Genome-Wide Association Study
ABSTRACT
Significant disparities in the clinical usefulness of genomic information across diverse groups are due to underrepresentation in genetic databases and inequitable access to genetic services. Remedying disparities is immediately needed to ensure that genomic medicine is more equitable but will take a long-term commitment and active engagement of diverse communities.
Subject(s)
Genomic Medicine , Genomics , Healthcare Disparities , Databases, Genetic
ABSTRACT
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.
Subject(s)
Ethnicity/genetics , Population Health , Databases, Genetic , Electronic Health Records , Genomics , Humans , Self Report
ABSTRACT
We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent set of 31 species found in 97% of samples that were distinct from human commensal organisms. Profiles of AMR genes varied widely in type and density across cities. Cities showed distinct microbial taxonomic signatures that were driven by climate and geographic differences. These results constitute a high-resolution global metagenomic atlas that enables discovery of organisms and genes, highlights potential public health and forensic applications, and provides a culture-independent view of AMR burden in cities.
Subject(s)
Drug Resistance, Bacterial/genetics , Metagenomics , Microbiota/genetics , Urban Population , Biodiversity , Databases, Genetic , Humans
ABSTRACT
When it comes to precision oncology, proteogenomics may offer better prospects for the clinical characterization of tumors, help make a more accurate diagnosis of cancer, and improve treatment for patients with cancer. This perspective describes the significant contributions of The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium to precision oncology and makes the case that proteogenomics needs to be fully integrated into clinical trials and patient care in order for precision oncology to deliver the right cancer treatment to the right patient at the right dose and at the right time.
Subject(s)
Neoplasms/diagnosis , Proteogenomics/methods , Databases, Genetic , Drug Discovery , Genetic Association Studies , Humans , Neoplasms/genetics , Neoplasms/therapy , Precision Medicine
ABSTRACT
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
Subject(s)
Genetic Heterogeneity , Neoplasms/genetics , DNA Copy Number Variations , DNA, Neoplasm/chemistry , DNA, Neoplasm/metabolism , Databases, Genetic , Drug Resistance, Neoplasm/genetics , Humans , Neoplasms/pathology , Polymorphism, Single Nucleotide , Whole Genome Sequencing
ABSTRACT
The Middle East region is important to understand human evolution and migrations but is underrepresented in genomic studies. Here, we generated 137 high-coverage physically phased genome sequences from eight Middle Eastern populations using linked-read sequencing. We found no genetic traces of early expansions out-of-Africa in present-day populations but found Arabians have elevated Basal Eurasian ancestry that dilutes their Neanderthal ancestry. Population sizes within the region started diverging 15-20 kya, when Levantines expanded while Arabians maintained smaller populations that derived ancestry from local hunter-gatherers. Arabians suffered a population bottleneck around the aridification of Arabia 6 kya, while Levantines had a distinct bottleneck overlapping the 4.2 kya aridification event. We found an association between movement and admixture of populations in the region and the spread of Semitic languages. Finally, we identify variants that show evidence of selection, including polygenic selection. Our results provide detailed insights into the genomic and selective histories of the Middle East.
Subject(s)
Genetics, Population/history , Genome, Human , Animals , Chromosomes, Human, Y/genetics , Databases, Genetic , Gene Pool , Genetic Introgression , Geography , History, Ancient , Human Migration , Humans , Middle East , Models, Genetic , Neanderthals/genetics , Phylogeny , Population Density , Selection, Genetic , Sequence Analysis, DNA
ABSTRACT
Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
Subject(s)
Genetics, Population , Genomic Structural Variation , Alleles , Databases, Genetic , Gene Dosage , Gene Duplication , Gene Frequency/genetics , Genetic Variation , Genome, Human , Humans
ABSTRACT
Hepatocellular carcinoma (HCC) is an aggressive malignancy with its global incidence and mortality rate continuing to rise, although early detection and surveillance are suboptimal. We performed serological profiling of the viral infection history in 899 individuals from an NCI-UMD case-control study using a synthetic human virome, VirScan. We developed a viral exposure signature and validated the results in a longitudinal cohort with 173 at-risk patients who had long-term follow-up for HCC development. Our viral exposure signature significantly associated with HCC status among at-risk individuals in the validation cohort (area under the curve: 0.91 [95% CI 0.87-0.96] at baseline and 0.98 [95% CI 0.97-1] at diagnosis). The signature identified cancer patients prior to a clinical diagnosis and was superior to alpha-fetoprotein. In summary, we established a viral exposure signature that can predict HCC among at-risk patients prior to a clinical diagnosis, which may be useful in HCC surveillance.
Subject(s)
Carcinoma, Hepatocellular/pathology , Liver Neoplasms/pathology , Virus Diseases/pathology , Adult , Aged , Area Under Curve , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/metabolism , Case-Control Studies , Cohort Studies , Databases, Genetic , Female , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Male , Middle Aged , Polymorphism, Single Nucleotide , ROC Curve , Risk Factors , Virus Diseases/complications , Young Adult , alpha-Fetoproteins/analysis
ABSTRACT
Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters still remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity provides a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.
Subject(s)
Computational Biology/methods , Gene Expression Regulation, Neoplastic/genetics , Neoplasms/genetics , Promoter Regions, Genetic/genetics , Transcriptome/genetics , Databases, Genetic , Humans , RNA-Seq/methods
ABSTRACT
Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over 3 million APA reporters. APARENT's predictions are highly accurate when tasked with inferring APA in synthetic and human 3'UTRs. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of 3' end processing, and integrates these features into a comprehensive, interpretable, cis-regulatory code. We apply APARENT to forward engineer functional polyadenylation signals with precisely defined cleavage position and isoform usage and validate predictions experimentally. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.
Subject(s)
Deep Learning , Models, Genetic , Polyadenylation/genetics , 3' Untranslated Regions/genetics , Base Sequence/genetics , Databases, Genetic , Gene Expression/genetics , HEK293 Cells , Humans , Mutagenesis/genetics , RNA Cleavage/genetics , RNA, Messenger/genetics , RNA-Seq , Synthetic Biology , Transcriptome
ABSTRACT
Early genome-wide association studies (GWASs) led to the surprising discovery that, for typical complex traits, most of the heritability is due to huge numbers of common variants with tiny effect sizes. Previously, we argued that new models are needed to understand these patterns. Here, we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans. We propose that most heritability is driven by weak trans-eQTL SNPs, whose effects are mediated through peripheral genes to impact the expression of core genes. In particular, if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified such that nearly all of the genetic variance is driven by weak trans effects. Thus, our model proposes a framework for understanding key features of the architecture of complex traits.
Subject(s)
Gene Expression Regulation/genetics , Heredity/genetics , Multifactorial Inheritance/genetics , Databases, Genetic , Gene Expression/genetics , Gene Expression Profiling/methods , Genetic Variation/genetics , Genome-Wide Association Study , Humans , Models, Theoretical , Phenotype , Polymorphism, Genetic/genetics , Quantitative Trait Loci/genetics
ABSTRACT
Denisovans are an extinct group of humans whose morphology remains unknown. Here, we present a method for reconstructing skeletal morphology using DNA methylation patterns. Our method is based on linking unidirectional methylation changes to loss-of-function phenotypes. We tested performance by reconstructing Neanderthal and chimpanzee skeletal morphologies and obtained >85% precision in identifying divergent traits. We then applied this method to the Denisovan and offer a putative morphological profile. We suggest that Denisovans likely shared with Neanderthals traits such as an elongated face and a wide pelvis. We also identify Denisovan-derived changes, such as an increased dental arch and lateral cranial expansion. Our predictions match the only morphologically informative Denisovan bone to date, as well as the Xuchang skull, which was suggested by some to be a Denisovan. We conclude that DNA methylation can be used to reconstruct anatomical features, including some that do not survive in the fossil record.
Subject(s)
DNA Methylation/genetics , Neanderthals/anatomy & histology , Neanderthals/genetics , Pan troglodytes/anatomy & histology , Pan troglodytes/genetics , Phenotype , Animals , Base Sequence , Databases, Genetic , Extinction, Biological , Fossils , Genome, Human/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Skeleton , Skull
ABSTRACT
Circular RNAs (circRNAs) are an intriguing class of RNA due to their covalently closed structure, high stability, and implicated roles in gene regulation. Here, we used an exome capture RNA sequencing protocol to detect and characterize circRNAs across >2,000 cancer samples. When compared against Ribo-Zero and RNase R, capture sequencing significantly enhanced the enrichment of circRNAs and preserved accurate circular-to-linear ratios. Using capture sequencing, we built the most comprehensive catalog of circRNA species to date: MiOncoCirc, the first database to be composed primarily of circRNAs directly detected in tumor tissues. Using MiOncoCirc, we identified candidate circRNAs to serve as biomarkers for prostate cancer and were able to detect circRNAs in urine. We further detected a novel class of circular transcripts, termed read-through circRNAs, that involved exons originating from different genes. MiOncoCirc will serve as a valuable resource for the development of circRNAs as diagnostic or therapeutic targets across cancer types.
Subject(s)
Gene Expression Profiling/methods , Neoplasms/genetics , RNA/genetics , Biomarkers, Tumor/genetics , Databases, Genetic , Gene Expression Regulation, Neoplastic/genetics , High-Throughput Nucleotide Sequencing , Humans , MicroRNAs/genetics , RNA/metabolism , RNA, Circular , Sequence Analysis, RNA/methods , Exome Sequencing/methods
ABSTRACT
Multiple signatures of somatic mutations have been identified in cancer genomes. Exome sequences of 1,001 human cancer cell lines and 577 xenografts revealed most common mutational signatures, indicating past activity of the underlying processes, usually in appropriate cancer types. To investigate ongoing patterns of mutational-signature generation, cell lines were cultured for extended periods and subsequently DNA sequenced. Signatures of discontinued exposures, including tobacco smoke and ultraviolet light, were not generated in vitro. Signatures of normal and defective DNA repair and replication continued to be generated at roughly stable mutation rates. Signatures of APOBEC cytidine deaminase DNA-editing exhibited substantial fluctuations in mutation rate over time with episodic bursts of mutations. The initiating factors for the bursts are unclear, although retrotransposon mobilization may contribute. The examined cell lines constitute a resource of live experimental models of mutational processes, which potentially retain patterns of activity and regulation operative in primary human cancers.
Subject(s)
APOBEC Deaminases/genetics , Neoplasms/genetics , APOBEC Deaminases/metabolism , Cell Line , Cell Line, Tumor , DNA/metabolism , DNA Mutational Analysis/methods , Databases, Genetic , Exome , Genome, Human/genetics , Heterografts , Humans , Mutagenesis , Mutation/genetics , Mutation Rate , Retroelements , Exome Sequencing/methods
ABSTRACT
Metagenomic sequencing is revolutionizing the detection and characterization of microbial species, and a wide variety of software tools are available to perform taxonomic classification of these data. The fast pace of development of these tools and the complexity of metagenomic data make it important that researchers are able to benchmark their performance. Here, we review current approaches for metagenomic analysis and evaluate the performance of 20 metagenomic classifiers using simulated and experimental datasets. We describe the key metrics used to assess performance, offer a framework for the comparison of additional classifiers, and discuss the future of metagenomic data analysis.
Subject(s)
Bacteria/classification , Benchmarking/methods , Fungi/classification , Metagenome/genetics , Metagenomics/methods , Viruses/classification , Bacteria/genetics , Databases, Genetic , Fungi/genetics , Phylogeny , Polymerase Chain Reaction , Sequence Analysis, DNA , Software , Viruses/genetics
ABSTRACT
We performed a comprehensive assessment of rare inherited variation in autism spectrum disorder (ASD) by analyzing whole-genome sequences of 2,308 individuals from families with multiple affected children. We implicate 69 genes in ASD risk, including 24 passing genome-wide Bonferroni correction and 16 new ASD risk genes, most supported by rare inherited variants, a substantial extension of previous findings. Biological pathways enriched for genes harboring inherited variants represent cytoskeletal organization and ion transport, which are distinct from pathways implicated in previous studies. Nevertheless, the de novo and inherited genes contribute to a common protein-protein interaction network. We also identified structural variants (SVs) affecting non-coding regions, implicating recurrent deletions in the promoters of DLG2 and NR3C2. Loss of nr3c2 function in zebrafish disrupts sleep and social function, overlapping with human ASD-related phenotypes. These data support the utility of studying multiplex families in ASD and are available through the Hartwell Autism Research and Technology portal.
Subject(s)
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease/genetics , Pedigree , Protein Interaction Maps/genetics , Animals , Child , Databases, Genetic , Disease Models, Animal , Female , Gene Deletion , Guanylate Kinases/genetics , Humans , Inheritance Patterns/genetics , Machine Learning , Male , Nuclear Family , Promoter Regions, Genetic/genetics , Receptors, Mineralocorticoid/genetics , Risk Factors , Tumor Suppressor Proteins/genetics , Whole Genome Sequencing , Zebrafish/genetics
ABSTRACT
Single-cell genomics technology has transformed our understanding of complex cellular systems. However, excessive cost and a lack of strategies for the purification of newly identified cell types impede their functional characterization and large-scale profiling. Here, we have generated high-content single-cell proteo-genomic reference maps of human blood and bone marrow that quantitatively link the expression of up to 197 surface markers to cellular identities and biological processes across all main hematopoietic cell types in healthy aging and leukemia. These reference maps enable the automatic design of cost-effective high-throughput cytometry schemes that outperform state-of-the-art approaches, accurately reflect complex topologies of cellular systems and permit the purification of precisely defined cell states. The systematic integration of cytometry and proteo-genomic data enables the functional capacities of precisely mapped cell states to be measured at the single-cell level. Our study serves as an accessible resource and paves the way for a data-driven era in cytometry.
Subject(s)
Blood Cells/metabolism , Bone Marrow Cells/metabolism , Cell Separation , Flow Cytometry , Gene Expression Profiling , Proteome , Proteomics , Single-Cell Analysis , Transcriptome , Age Factors , Blood Cells/immunology , Blood Cells/pathology , Bone Marrow Cells/immunology , Bone Marrow Cells/pathology , Cells, Cultured , Databases, Genetic , Healthy Aging/genetics , Healthy Aging/immunology , Healthy Aging/metabolism , Humans , Leukemia/genetics , Leukemia/immunology , Leukemia/metabolism , Leukemia/pathology , RNA-Seq , Systems Biology
ABSTRACT
The signals driving the adaptation of type 2 dendritic cells (DC2s) to diverse peripheral environments remain mostly undefined. We show that differentiation of CD11b^lo migratory DC2s, a DC2 population unique to the dermis, required IL-13 signaling dependent on the transcription factors STAT6 and KLF4, whereas DC2s in lung and small intestine were STAT6-independent. Similarly, human DC2s in skin expressed an IL-4 and IL-13 gene signature that was not found in blood, spleen and lung DCs. In mice, IL-13 was secreted homeostatically by dermal innate lymphoid cells and was independent of microbiota, TSLP or IL-33. In the absence of IL-13 signaling, dermal DC2s were stable in number but remained CD11b^hi and showed defective activation in response to allergens, with diminished ability to support the development of IL-4+ GATA3+ helper T (TH) cells, whereas antifungal IL-17+ RORγt+ TH cells were increased. Therefore, homeostatic IL-13 fosters a noninflammatory skin environment that supports allergic sensitization.
Subject(s)
Cell Communication , Cell Differentiation , Interleukin-13/metabolism , Langerhans Cells/metabolism , Skin/metabolism , Th17 Cells/metabolism , Th2 Cells/metabolism , Allergens/pharmacology , Animals , CD11b Antigen/genetics , CD11b Antigen/metabolism , Cells, Cultured , Databases, Genetic , Humans , Interleukin-13/genetics , Langerhans Cells/drug effects , Langerhans Cells/immunology , Mice, Inbred BALB C , Mice, Inbred C57BL , Mice, Knockout , Phenotype , STAT6 Transcription Factor/genetics , STAT6 Transcription Factor/metabolism , Signal Transduction , Skin/cytology , Skin/drug effects , Skin/immunology , Th17 Cells/drug effects , Th17 Cells/immunology , Th2 Cells/drug effects , Th2 Cells/immunology , Transcriptome
ABSTRACT
This SnapShot provides a list of the tumor types characterized by The Cancer Genome Atlas (TCGA) program. Key findings shown are the most relevant discoveries described in each marker paper for the tumor type.