ABSTRACT
The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) inquired with researchers working in these two fields to identify common challenges and receive recommendations to better support genomic research efforts using ML approaches. Those included increasing the amount and variety of training datasets by integrating genomic with multiomics, context-specific (e.g., by cell type), and social determinants of health datasets; reducing the inherent biases of training datasets; prioritizing transparency and interpretability of ML methods; and developing privacy-preserving technologies for research participants' data.
Subjects
Bioethics , Genomics , Humans , Algorithms , Privacy , Machine Learning
ABSTRACT
Genomic architecture appears to play crucial roles in health and a variety of diseases. How nuclear structures reorganize over different timescales is elusive, partly because the tools needed to probe and perturb them are not as advanced as needed by the field. To fill this gap, the National Institutes of Health Common Fund started a program in 2015, called the 4D Nucleome (4DN), with the goal of developing and ultimately applying technologies to interrogate the structure and function of nuclear organization in space and time.
Subjects
Cell Nucleus , Genome , United States , Cell Nucleus/genetics , Genomics
ABSTRACT
Starting with the launch of the Human Genome Project three decades ago, and continuing after its completion in 2003, genomics has progressively come to have a central and catalytic role in basic and translational research. In addition, studies increasingly demonstrate how genomic information can be effectively used in clinical care. In the future, the anticipated advances in technology development, biological insights, and clinical applications (among others) will lead to more widespread integration of genomics into almost all areas of biomedical research, the adoption of genomics into mainstream medical and public-health practices, and an increasing relevance of genomics for everyday life. On behalf of the research community, the National Human Genome Research Institute recently completed a multi-year process of strategic engagement to identify future research priorities and opportunities in human genomics, with an emphasis on health applications. Here we describe the highest-priority elements envisioned for the cutting edge of human genomics going forward, that is, at 'The Forefront of Genomics'.
Subjects
Biomedical Research/trends , Genome, Human/genetics , Genomics/trends , Public Health/standards , Translational Research, Biomedical/trends , Biomedical Research/economics , COVID-19/genetics , Genomics/economics , Humans , National Human Genome Research Institute (U.S.)/economics , Social Change , Translational Research, Biomedical/economics , United States
ABSTRACT
We studied 137 primary testicular germ cell tumors (TGCTs) using high-dimensional assays of genomic, epigenomic, transcriptomic, and proteomic features. These tumors exhibited high aneuploidy and a paucity of somatic mutations. Somatic mutation of only three genes achieved significance (KIT, KRAS, and NRAS), exclusively in samples with seminoma components. Integrated analyses identified distinct molecular patterns that characterized the major recognized histologic subtypes of TGCT: seminoma, embryonal carcinoma, yolk sac tumor, and teratoma. Striking differences in global DNA methylation and microRNA expression between histology subtypes highlight a likely role of epigenomic processes in determining histologic fates in TGCTs. We also identified a subset of pure seminomas defined by KIT mutations, increased immune infiltration, globally demethylated DNA, and decreased KRAS copy number. We report potential biomarkers for risk stratification, such as miRNA specifically expressed in teratoma, and others with molecular diagnostic potential, such as CpH (CpA/CpC/CpT) methylation identifying embryonal carcinomas.
Subjects
Neoplasms, Germ Cell and Embryonal/pathology , Testicular Neoplasms/pathology , DNA Copy Number Variations , DNA Methylation , Gene Expression Regulation, Neoplastic , Humans , Male , MicroRNAs/metabolism , Neoplasms, Germ Cell and Embryonal/classification , Neoplasms, Germ Cell and Embryonal/metabolism , Proto-Oncogene Proteins c-kit/genetics , Proto-Oncogene Proteins c-kit/metabolism , Seminoma/metabolism , Seminoma/pathology , Testicular Neoplasms/classification , Testicular Neoplasms/metabolism , ras Proteins/genetics , ras Proteins/metabolism
ABSTRACT
The Cancer Genome Atlas (TCGA) team now presents the Pan-Cancer Atlas, investigating different aspects of cancer biology by analyzing the data generated during the 10+ years of the TCGA project.
Subjects
Databases, Genetic , Genes, Neoplasm , Neoplasms/pathology , Aneuploidy , Genome, Human , Humans , Mutation , Neoplasms/genetics , Neoplasms/immunology , Neoplasms/metabolism
ABSTRACT
The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.
Subjects
Carcinogenesis/genetics , Genomics , Neoplasms/pathology , DNA Repair/genetics , Databases, Genetic , Genes, Neoplasm , Humans , Metabolic Networks and Pathways/genetics , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Transcriptome , Tumor Microenvironment/genetics
ABSTRACT
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.
Subjects
Genomics/methods , Neoplasms/genetics , Sequence Analysis, DNA/methods , Algorithms , Exome , High-Throughput Nucleotide Sequencing/methods , Humans , Information Dissemination/methods , Mutation , Software , Exome Sequencing/methods
ABSTRACT
Recent studies have highlighted the imperatives of including diverse and under-represented individuals in human genomics research and the striking gaps in attaining that inclusion. With its multidecade experience in supporting research and policy efforts in human genomics, the National Human Genome Research Institute is committed to establishing foundational approaches to study the role of genomic variation in health and disease that include diverse populations. Large-scale efforts to understand biology and health have yielded key scientific findings, lessons and recommendations on how to increase diversity in genomic research studies and the genomic research workforce. Increased attention to diversity will increase the accuracy, utility and acceptability of using genomic information for clinical care.
Subjects
Genetic Variation , Genome, Human , Genomics/methods , Human Genetics/methods , Precision Medicine/methods , Humans
ABSTRACT
PURPOSE: As massively parallel sequencing is increasingly being used for clinical decision making, it has become critical to understand parameters that affect sequencing quality and to establish methods for measuring and reporting clinical sequencing standards. In this report, we propose a definition for reduced coverage regions and describe a set of standards for variant calling in clinical sequencing applications. METHODS: To enable sequencing centers to assess the regions of poor sequencing quality in their own data, we optimized and used a tool (ExCID) to identify reduced coverage loci within genes or regions of particular interest. We used this framework to examine sequencing data from 500 patients generated in 10 projects at sequencing centers in the National Human Genome Research Institute/National Cancer Institute Clinical Sequencing Exploratory Research Consortium. RESULTS: This approach identified reduced coverage regions in clinically relevant genes, including known clinically relevant loci that were uniquely missed at individual centers, in multiple centers, and in all centers. CONCLUSION: This report provides a process road map for clinical sequencing centers looking to perform similar analyses on their data.
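The idea of flagging reduced coverage loci can be illustrated with a minimal sketch. The abstract does not state the proposed definition or any depth cutoff, and this is not ExCID's algorithm: the function name `reduced_coverage_regions` and the 20x default threshold are hypothetical choices made only for illustration.

```python
def reduced_coverage_regions(depths, start, min_depth=20):
    """Return half-open (begin, end) intervals where depth < min_depth.

    depths    -- per-base read depths for a contiguous region
    start     -- genomic coordinate of depths[0]
    min_depth -- hypothetical cutoff; the abstract gives no value
    """
    regions = []
    region_start = None
    for offset, depth in enumerate(depths):
        pos = start + offset
        if depth < min_depth:
            # Open a new low-coverage interval if one is not already open.
            if region_start is None:
                region_start = pos
        elif region_start is not None:
            # Depth recovered: close the interval just before this base.
            regions.append((region_start, pos))
            region_start = None
    # Close an interval that runs to the end of the region.
    if region_start is not None:
        regions.append((region_start, start + len(depths)))
    return regions


example_depths = [30, 25, 10, 5, 22, 40, 3]
low = reduced_coverage_regions(example_depths, start=100)
print(low)
```

Merging adjacent low-depth bases into intervals keeps the report compact, which matters when the same loci are compared across several sequencing centers.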
Subjects
Exome Sequencing/methods , Sequence Analysis, DNA/methods , Whole Genome Sequencing/methods , Base Sequence , Chromosome Mapping , Exome , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/standards , Software
ABSTRACT
Stevens-Johnson syndrome/toxic epidermal necrolysis (SJS/TEN) is one of the most devastating of adverse drug reactions (ADRs) and was, until recently, essentially unpredictable. With the discovery of several risk alleles for drug-induced SJS/TEN and the demonstration of effectiveness of screening in reducing incidence, the stage is set for implementation of preventive strategies in populations at risk. Yet much remains to be learned about this potentially fatal complication of commonly used drugs.
Subjects
Genetic Predisposition to Disease/genetics , Genetic Testing , Stevens-Johnson Syndrome/genetics , Genetic Predisposition to Disease/prevention & control , Humans , Incidence , Necrosis , Predictive Value of Tests , Stevens-Johnson Syndrome/epidemiology , Stevens-Johnson Syndrome/prevention & control
ABSTRACT
Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. There are opportunities in this field, particularly with respect to the incorporation of -omics and next-generation sequencing data and continual improvement in measures of environmental exposures implicated in complex disease outcomes. In a workshop called "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," held October 17-18, 2014, by the National Institute of Environmental Health Sciences and the National Cancer Institute in conjunction with the annual American Society of Human Genetics meeting, participants explored new approaches and tools that have been developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including the improved data analytical methods, environmental exposure assessment, and incorporation of functional data and annotations.
Subjects
Disease/etiology , Gene-Environment Interaction , Genome-Wide Association Study/methods , Disease/genetics , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Software
ABSTRACT
Genome-wide association studies (GWAS) have identified many genetic susceptibility loci for colorectal cancer (CRC). However, variants in these loci explain only a small proportion of familial aggregation, and there are likely additional variants that are associated with CRC susceptibility. Genome-wide studies of gene-environment interactions may identify variants that are not detected in GWAS of marginal gene effects. To study this, we conducted a genome-wide analysis for interaction between genetic variants and alcohol consumption and cigarette smoking using data from the Colon Cancer Family Registry (CCFR) and the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO). Interactions were tested using logistic regression. We identified an interaction between CRC risk and alcohol consumption and variants in the 9q22.32/HIATL1 region (P_interaction = 1.76×10^-8; permuted P = 3.51×10^-8). Compared to non-/occasional drinking, light to moderate alcohol consumption was associated with a lower risk of colorectal cancer among individuals with the rs9409565 CT genotype (OR, 0.82 [95% CI, 0.74-0.91]; P = 2.1×10^-4) and TT genotype (OR, 0.62 [95% CI, 0.51-0.75]; P = 1.3×10^-6) but was not associated among those with the CC genotype (P = 0.059). No genome-wide statistically significant interactions were observed for smoking. If replicated, our suggestive finding of a genome-wide significant interaction between genetic variants and alcohol consumption might contribute to understanding colorectal cancer etiology and identifying subpopulations with differential susceptibility to the effect of alcohol on CRC risk.
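A logistic regression test for gene-environment interaction is usually written in a form like the one below. This is a generic sketch rather than the authors' exact model: the symbols G (genotype), E (exposure, here alcohol consumption or smoking), and the coefficients are assumed notation, and any adjustment covariates are omitted because the abstract does not list them.

```latex
\operatorname{logit} \Pr(\mathrm{CRC} = 1 \mid G, E)
  = \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\,(G \times E),
\qquad
H_0 \colon \beta_{GE} = 0 .
```

Under this parameterization, exp(β_GE) is the multiplicative interaction odds ratio, and the genotype-specific odds ratios for the exposure are exp(β_E + β_GE · G).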
Subjects
Alcohol Drinking/genetics , Colorectal Neoplasms/genetics , Membrane Transport Proteins/genetics , Smoking/genetics , Tumor Suppressor Proteins/genetics , Aged , Alcohol Drinking/pathology , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/pathology , Female , Gene-Environment Interaction , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Risk Factors , Smoking/pathology
ABSTRACT
Despite rapid technical progress and demonstrable effectiveness for some types of diagnosis and therapy, much remains to be learned about clinical genome and exome sequencing (CGES) and its role within the practice of medicine. The Clinical Sequencing Exploratory Research (CSER) consortium includes 18 extramural research projects, one National Human Genome Research Institute (NHGRI) intramural project, and a coordinating center funded by the NHGRI and National Cancer Institute. The consortium is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches; it has thus far recruited 5,577 participants across a spectrum of symptomatic and healthy children and adults by utilizing both germline and cancer sequencing. The CSER consortium is analyzing data and creating publicly available procedures and tools related to participant preferences and consent, variant classification, disclosure and management of primary and secondary findings, health outcomes, and integration with electronic health records. Future research directions will refine measures of clinical utility of CGES in both germline and somatic testing, evaluate the use of CGES for screening in healthy individuals, explore the penetrance of pathogenic variants through extensive phenotyping, reduce discordances in public databases of genes and variants, examine social and ethnic disparities in the provision of genomics services, explore regulatory issues, and estimate the value and downstream costs of sequencing. The CSER consortium has established a shared community of research sites by using diverse approaches to pursue the evidence-based development of best practices in genomic medicine.
Subjects
Biomedical Research , Evidence-Based Practice , Exome/genetics , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics , Adult , Cardiovascular Diseases/genetics , Child , Clinical Trials as Topic , Humans , National Human Genome Research Institute (U.S.) , Population Groups , Software , United States
ABSTRACT
BACKGROUND: Menopausal hormone therapy (MHT) use has been consistently associated with a decreased risk of colorectal cancer (CRC) in women. Our aim was to use a genome-wide gene-environment interaction analysis to identify genetic modifiers of CRC risk associated with use of MHT. METHODS: We included 10 835 postmenopausal women (5419 cases and 5416 controls) from 10 studies. We evaluated use of any MHT, oestrogen-only (E-only) and combined oestrogen-progestogen (E+P) hormone preparations. To test for multiplicative interactions, we applied the empirical Bayes (EB) test as well as the Wald test in conventional case-control logistic regression as primary tests. The Cocktail test was used as a secondary test. RESULTS: The EB test identified a significant interaction between rs964293 at 20q13.2/CYP24A1 and E+P (interaction OR (95% CI) = 0.61 (0.52-0.72), P = 4.8 × 10^-9). The secondary analysis also identified this interaction (Cocktail test OR = 0.64 (0.52-0.78), P = 1.2 × 10^-5; alpha threshold = 3.1 × 10^-4). The ORs for association between E+P and CRC risk by rs964293 genotype were as follows: C/C, 0.96 (0.61-1.50); A/C, 0.61 (0.39-0.95); and A/A, 0.40 (0.22-0.73). CONCLUSIONS: Our results indicate that rs964293 modifies the association between E+P and CRC risk. The variant is located near CYP24A1, which encodes an enzyme involved in vitamin D metabolism. This novel finding offers additional insight into downstream pathways of CRC etiopathogenesis.
Subjects
Adenocarcinoma/genetics , Colorectal Neoplasms/genetics , Estrogen Replacement Therapy/methods , Estrogens/therapeutic use , Progestins/therapeutic use , Vitamin D3 24-Hydroxylase/genetics , Adenocarcinoma/epidemiology , Aged , Bayes Theorem , Case-Control Studies , Colorectal Neoplasms/epidemiology , Drug Therapy, Combination , Female , Gene-Environment Interaction , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Logistic Models , Middle Aged , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
Genome-wide association studies have identified several germline single nucleotide polymorphisms (SNPs) significantly associated with colorectal cancer (CRC) incidence. Common germline genetic variation may also be related to CRC survival. We used a discovery-based approach to identify SNPs related to survival outcomes after CRC diagnosis. Genome-wide array genotyping was performed for 3494 individuals with invasive CRC enrolled in six prospective cohort studies (median study-specific follow-up = 4.2-8.1 years). In pooled analyses, we used Cox regression to assess SNP-specific associations with CRC-specific and overall survival, with additional analyses stratified by stage at diagnosis. Top findings were followed up in independent studies. A threshold of P < 5×10^-8 in analyses combining discovery and follow-up studies was required for genome-wide significance. Among individuals with distant-metastatic CRC, several SNPs at 6p12.1, nearest the ELOVL5 gene, were statistically significantly associated with poorer survival, with the strongest associations noted for rs209489 (hazard ratio [HR] = 1.8, P = 7.6×10^-10 and HR = 1.8, P = 3.7×10^-9 for CRC-specific and overall survival, respectively). No SNPs were statistically significantly associated with survival among all cases combined or in cases without distant metastases. SNPs in 6p12.1/ELOVL5 were associated with survival outcomes in individuals with distant-metastatic CRC, and merit further follow-up for functional significance. Findings from this genome-wide association study highlight the potential importance of genetic variation in CRC prognosis and provide clues to genomic regions of potential interest.
Subjects
Colorectal Neoplasms/genetics , Colorectal Neoplasms/mortality , Aged , Aged, 80 and over , Cohort Studies , Colorectal Neoplasms/diagnosis , Female , Genetic Variation , Genome-Wide Association Study , Humans , Male , Middle Aged , Prognosis , Prospective Studies
ABSTRACT
Genetic susceptibility to colorectal cancer is caused by rare pathogenic mutations and common genetic variants that contribute to familial risk. Here we report the results of a two-stage association study with 18,299 cases of colorectal cancer and 19,656 controls, with follow-up of the most statistically significant genetic loci in 4,725 cases and 9,969 controls from two Asian consortia. We describe six new susceptibility loci reaching a genome-wide threshold of P < 5.0×10^-8. These findings provide additional insight into the underlying biological mechanisms of colorectal cancer and demonstrate the scientific value of large consortia-based genetic epidemiology studies.
Subjects
Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Case-Control Studies , Humans , Odds Ratio , Polymorphism, Single Nucleotide
ABSTRACT
Although genome-wide association studies (GWAS) have separately identified many genetic susceptibility loci for ulcerative colitis (UC), Crohn's disease (CD) and colorectal cancer (CRC), there has been no large-scale examination for pleiotropy, or shared genetic susceptibility, for these conditions. We used logistic regression modeling to examine the associations of 181 UC and CD susceptibility variants previously identified by GWAS with risk of CRC using data from the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry. We also examined associations of significant variants with clinical and molecular characteristics in a subset of the studies. Among 11794 CRC cases and 14190 controls, rs11676348, the susceptibility single nucleotide polymorphism (SNP) for UC, was significantly associated with reduced risk of CRC (P = 7×10^-5). The multivariate-adjusted odds ratio of CRC with each copy of the T allele was 0.93 (95% CI 0.89-0.96). The association of the SNP with risk of CRC differed according to mucinous histological features (P heterogeneity = 0.008). In addition, the T allele was associated with lower risk of tumors with Crohn's-like reaction but not tumors without such immune infiltrate (P heterogeneity = 0.02) and of microsatellite instability-high (MSI-high) but not microsatellite stable or MSI-low tumors (P heterogeneity = 0.03). The minor allele (T) in SNP rs11676348, located downstream from CXCR2, which has been implicated in CRC progression, is associated with a lower risk of CRC, particularly tumors with a mucinous component, Crohn's-like reaction and MSI-high status. Our findings offer the promise of risk stratification of inflammatory bowel disease patients for complications such as CRC.
Subjects
Colitis, Ulcerative/genetics , Colorectal Neoplasms/genetics , Crohn Disease/genetics , Microsatellite Instability , Colitis, Ulcerative/complications , Colitis, Ulcerative/epidemiology , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/etiology , Crohn Disease/complications , Crohn Disease/epidemiology , Gene Frequency , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Microsatellite Repeats/genetics , Polymorphism, Single Nucleotide , Risk , White People
ABSTRACT
BACKGROUND: Several regions of the genome show pleiotropic associations with multiple cancers. We sought to evaluate whether 181 single-nucleotide polymorphisms previously associated with various cancers in genome-wide association studies were also associated with melanoma risk. METHODS: We evaluated 2,131 melanoma cases and 20,353 controls from three studies in the Population Architecture using Genomics and Epidemiology (PAGE) study (EAGLE-BioVU, MEC, WHI) and two collaborating studies (HPFS, NHS). Overall and sex-stratified analyses were performed across studies. RESULTS: We observed statistically significant associations with melanoma for two lung cancer SNPs in the TERT-CLPTM1L locus (Bonferroni-corrected p < 2.8×10^-4), replicating known pleiotropic effects at this locus. In sex-stratified analyses, we also observed a potential male-specific association between prostate cancer risk variant rs12418451 and melanoma risk (OR = 1.22, p = 8.0×10^-4). No other variants in our study were associated with melanoma after multiple comparisons adjustment (p > 2.8×10^-4). CONCLUSIONS: We provide confirmatory evidence of pleiotropic associations with melanoma for two SNPs previously associated with lung cancer, and provide suggestive evidence for a male-specific association with melanoma for prostate cancer variant rs12418451. This SNP is located near TPCN2, an ion transport gene containing SNPs which have been previously associated with hair pigmentation but not melanoma risk. Previous evidence provides biological plausibility for this association, and suggests a complex interplay between ion transport, pigmentation, and melanoma risk that may vary by sex. If confirmed, these pleiotropic relationships may help elucidate shared molecular pathways between cancers and related phenotypes.
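The Bonferroni threshold quoted in the abstract can be reproduced with simple arithmetic. The sketch below assumes a family-wise alpha of 0.05, which the abstract does not state, divided across the 181 SNPs tested; the helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 181 SNPs are reported in the abstract; alpha = 0.05 is an assumption.
threshold = bonferroni_threshold(alpha=0.05, n_tests=181)
print(f"{threshold:.1e}")

# The sex-stratified rs12418451 result (p = 8.0e-4, from the abstract)
# does not pass this cutoff, consistent with it being called suggestive.
p_rs12418451 = 8.0e-4
print(p_rs12418451 < threshold)
```

Under that assumption the cutoff rounds to 2.8×10^-4, matching the reported value.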