ABSTRACT
OBJECTIVES: The All of Us Research Program is a precision medicine initiative aimed at establishing a vast, diverse biomedical database accessible through a cloud-based data analysis platform, the Researcher Workbench (RW). Our goal was to empower the research community by co-designing the implementation of SAS in the RW alongside researchers to enable broader use of All of Us data. MATERIALS AND METHODS: Researchers from various fields and with different SAS experience levels participated in co-designing the SAS implementation through user experience interviews. RESULTS: Feedback and lessons learned from user testing informed the final design of the SAS application. DISCUSSION: The co-design approach is critical for reducing technical barriers, broadening All of Us data use, and enhancing the user experience for data analysis on the RW. CONCLUSION: Our co-design approach successfully tailored the implementation of the SAS application to researchers' needs. This approach may inform future software implementations on the RW.
ABSTRACT
The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
Subject(s)
Genome, Human ; Genomics ; Models, Genetic ; Mutation ; Humans ; Access to Information ; Databases, Genetic ; Datasets as Topic ; Gene Frequency ; Genome, Human/genetics ; Mutation/genetics ; Selection, Genetic
ABSTRACT
An improved understanding of the human lung necessitates advanced systems models informed by an ever-increasing repertoire of molecular omics, cellular, imaging, and pathological datasets. To centralize and standardize information across broad lung research efforts, we expanded the LungMAP.net website into a new gateway portal. This portal connects a broad spectrum of research networks, bulk and single-cell multi-omics data, and a diverse collection of image data that span mammalian lung development and disease. The data are standardized across species and technologies using harmonized data and metadata models that leverage recent advances including those from the Human Cell Atlas, diverse ontologies, and the LungMAP CellCards initiative. To cultivate future discoveries, we have aggregated a diverse collection of single-cell atlases for multiple species (human, rhesus, mouse) to enable consistent queries across technologies, cohorts, age, disease, and drug treatment. These atlases are provided as independent and integrated queryable datasets, with an emphasis on dynamic visualization, figure generation, re-analysis, cell-type curation, and automated reference-based classification of user-provided single-cell genomics datasets (Azimuth). As this resource grows, we intend to increase the breadth of available interactive interfaces, supported data types, data portals and datasets from LungMAP and external research efforts.
ABSTRACT
Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.
Subject(s)
Disease/genetics ; Genetic Variation ; Genetics, Medical/standards ; Genetics, Population/standards ; Genome, Human/genetics ; Female ; Genetic Testing ; Genotyping Techniques ; Humans ; Male ; Middle Aged ; Mutation ; Polymorphism, Single Nucleotide/genetics ; Racial Groups/genetics ; Reference Standards ; Selection, Genetic ; Whole Genome Sequencing
ABSTRACT
Aberrant activation of HGF-MET (hepatocyte growth factor-met proto-oncogene) signaling via interactions with surrounding stromal cells in the tumor microenvironment plays a significant role in malignant tumor progression. However, extracellular proteolytic regulation of HGF activation, which is influenced by the tumor microenvironment, and its consequential effects on melanoma malignancy remain uncharacterized. In this study, we identified SPINT2 (serine peptidase inhibitor Kunitz type 2), a proteolytic inhibitor of hepatocyte growth factor activator (HGFA), which has a significant role in the suppression of the HGF-MET pathway and malignant melanoma progression. SPINT2 expression is significantly lower in metastatic melanoma tissues than in early-stage primary melanomas, and this difference corresponded with DNA methylation levels measured in the tissue samples. Treatment with the DNA-hypomethylating agent decitabine in cultured melanoma cells induced transcriptional reactivation of SPINT2, suggesting that this gene is epigenetically silenced in malignant melanomas. Furthermore, we show that ectopically expressed SPINT2 in melanoma cells inhibits the HGF-induced MET-AKT (v-Akt murine thymoma viral oncogene) signaling pathway and decreases malignant phenotypes such as cell motility and invasive growth. These results suggest that SPINT2 is associated with tumor-suppressive functions in melanoma by inhibiting an extracellular signal regulator of HGF, which is typically activated by tumor-stromal interactions. These findings indicate that epigenetic impairment of the tightly regulated cytokine-receptor communications in the tumor microenvironment may contribute to malignant tumor progression.