1.
Nature ; 631(8021): 583-592, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that substantially affect function provide insights into the biology of a gene (refs. 1-3). However, ascertaining the frequency of such variants requires large sample sizes (refs. 4-8). Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.


Subjects
Exome , Genetic Variation , Proteins , Humans , Alleles , Exome/genetics , Exome Sequencing , Gene Frequency , Genetic Variation/genetics , Heterozygote , Loss of Function Mutation/genetics , Missense Mutation/genetics , Open Reading Frames/genetics , Proteins/genetics , RNA Splice Sites/genetics , Precision Medicine
2.
Nature ; 622(7984): 784-793, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821707

ABSTRACT

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City (ref. 1). Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.


Subjects
Exome Sequencing , Human Genome , Genotype , Hispanic or Latino , Adult , Humans , Africa/ethnology , Americas/ethnology , Europe/ethnology , Gene Frequency/genetics , Population Genetics , Human Genome/genetics , Genotyping Techniques , Hispanic or Latino/genetics , Homozygote , Loss of Function Mutation/genetics , Mexico , Prospective Studies
3.
Nature ; 599(7886): 628-634, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34662886

ABSTRACT

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing (ref. 1) to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study (ref. 2). We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10^-11. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.


Subjects
Biological Specimen Banks , Genetic Databases , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Heritable Quantitative Trait , United Kingdom
4.
Nature ; 586(7831): 749-756, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world (ref. 1). Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subjects
Genetic Databases , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , BRCA1 Genes , BRCA2 Genes , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
6.
Am J Hum Genet ; 108(7): 1350-1355, 2021 Jul 01.
Article in English | MEDLINE | ID: mdl-34115965

ABSTRACT

Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential. Our analyses indicate there are no significant associations with rare protein-coding variants with detectable effect sizes at our current sample sizes. Analyses will be updated as additional data become available, and results are publicly available through the Regeneron Genetics Center COVID-19 Results Browser.


Subjects
COVID-19/diagnosis , COVID-19/genetics , Exome Sequencing , Exome/genetics , Genetic Predisposition to Disease , Hospitalization/statistics & numerical data , COVID-19/immunology , COVID-19/therapy , Female , Humans , Interferons/genetics , Male , Prognosis , SARS-CoV-2 , Sample Size
7.
Genet Med ; 24(3): 703-711, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34906480

ABSTRACT

PURPOSE: Recurrent pathogenic copy number variants (pCNVs) have large-effect impacts on brain function and represent important etiologies of neurodevelopmental psychiatric disorders (NPDs), including autism and schizophrenia. Patterns of health care utilization in adults with pCNVs have gone largely unstudied and are likely to differ in significant ways from those of children. METHODS: We compared the prevalence of NPDs and electronic health record-based medical conditions in 928 adults with 26 pCNVs to a demographically matched cohort of pCNV-negative controls from >135,000 patient-participants in Geisinger's MyCode Community Health Initiative. We also evaluated 3 quantitative health care utilization measures (outpatient, inpatient, and emergency department visits) in both groups. RESULTS: Adults with pCNVs (24.9%) were more likely than controls (16.0%) to have a documented NPD. They had significantly higher rates of several chronic diseases, including diabetes (29.3% in participants with pCNVs vs 20.4% in participants without pCNVs) and dementia (2.2% in participants with pCNVs vs 1.0% in participants without pCNVs), and twice as many annual emergency department visits. CONCLUSION: These findings highlight the potential for genetic information (specifically, pCNVs) to inform the study of health care outcomes and utilization in adults. If, as our findings suggest, adults with pCNVs have poorer health and require disproportionate health care resources, early genetic diagnosis paired with patient-centered interventions may help to anticipate problems, improve outcomes, and reduce the associated economic burden.


Subjects
DNA Copy Number Variations , Delivery of Health Care , Adult , Child , Cohort Studies , DNA Copy Number Variations/genetics , Humans , Patient Acceptance of Health Care , Prevalence
8.
Am J Hum Genet ; 102(5): 874-889, 2018 May 03.
Article in English | MEDLINE | ID: mdl-29727688

ABSTRACT

Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications on downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ∼66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, given that DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ∼1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.


Subjects
Exome/genetics , Precision Medicine , Cohort Studies , Computer Simulation , Electronic Health Records , Exons/genetics , Family , Female , Population Genetics , Geography , Heterozygote , Humans , Male , Mutation/genetics , Pedigree , Phenotype , Reproducibility of Results
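
The abstract above argues that close relationships become more common as a cohort ascertains a larger share of its source population. A toy simulation can illustrate that trend. This is only a sketch and is not SimProgeny: the equal-sized families, uniform random sampling, and every name and parameter below are assumptions made for illustration.

```python
import random


def fraction_with_sampled_relative(n_families, family_size, sample_fraction, seed=0):
    """Toy model: draw individuals uniformly from equal-sized families and
    return the share of sampled people with at least one sampled family member."""
    rng = random.Random(seed)
    # Each person is represented by the index of the family they belong to.
    population = [fam for fam in range(n_families) for _ in range(family_size)]
    n_sampled = int(len(population) * sample_fraction)
    if n_sampled == 0:
        return 0.0
    sampled = rng.sample(population, n_sampled)
    # Count how many sampled people fall in each family.
    per_family = {}
    for fam in sampled:
        per_family[fam] = per_family.get(fam, 0) + 1
    # Anyone in a family with two or more sampled members has a sampled relative.
    with_relative = sum(count for count in per_family.values() if count >= 2)
    return with_relative / n_sampled


if __name__ == "__main__":
    for fraction in (0.05, 0.2, 0.5):
        share = fraction_with_sampled_relative(20_000, 4, fraction)
        print(f"sampled {fraction:.0%} of the population: {share:.1%} have a sampled relative")
```

In this simplified setting the share rises steeply with the sampled fraction, which is the qualitative behaviour the abstract describes; the specific figures reported for DiscovEHR depend on the real population structure and ascertainment and are not reproduced here.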
9.
Bioinformatics ; 32(1): 133-5, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-26382196

ABSTRACT

MOTIVATION: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm, Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS), which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum. RESULTS: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan quantitative polymerase chain reaction to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the Supplementary Materials (available at the CLAMMS GitHub repository), we present our methods and validation results in greater detail. AVAILABILITY AND IMPLEMENTATION: https://github.com/rgcgithub/clamms (implemented in C). CONTACT: jeffrey.reid@regeneron.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , DNA Copy Number Variations/genetics , Exome/genetics , DNA Sequence Analysis/methods , Humans , Markov Chains , Reproducibility of Results
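
The abstract names "lattice-aligned mixture models" without describing them. One plausible reading is that each copy-number state has a coverage component whose mean is tied to a fixed multiple of the diploid mean. The sketch below illustrates that reading only. The Gaussian components, the shared noise width, and all function and parameter names are assumptions; this is not the CLAMMS implementation, which is written in C and available at the repository linked above.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def call_copy_number(depth, diploid_mean, sd, max_copies=4):
    """Return the copy number whose component best explains the observed depth.

    Component means are constrained to a lattice: copy number k has an
    expected depth of diploid_mean * k / 2 (an assumption of this sketch).
    """
    best_k, best_ll = None, -math.inf
    for k in range(max_copies + 1):
        ll = gaussian_logpdf(depth, diploid_mean * k / 2, sd)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


if __name__ == "__main__":
    # Hypothetical normalized depths for four regions, diploid mean of 100.
    for depth in (3.0, 52.0, 98.0, 160.0):
        print(depth, "->", call_copy_number(depth, diploid_mean=100.0, sd=10.0))
```

Tying the means to a lattice means only the diploid mean and the noise width have to be estimated per region, which is one reason such a model could scale to many samples; whether CLAMMS exploits the constraint in this way is not stated in the abstract.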
10.
Nucleic Acids Res ; 43(8): 3886-98, 2015 Apr 30.
Article in English | MEDLINE | ID: mdl-25813044

ABSTRACT

MicroRNAs (miRNAs) regulate gene expression by binding to partially complementary sequences on target mRNA transcripts, thereby causing their degradation, deadenylation, or inhibiting their translation. Genomic variants can alter miRNA regulation by modifying miRNA target sites, and multiple human disease phenotypes have been linked to such miRNA target site variants (miR-TSVs). However, systematic genome-wide identification of functional miR-TSVs is difficult due to high false positive rates; functional miRNA recognition sequences can be as short as six nucleotides, with the human genome encoding thousands of miRNAs. Furthermore, while large-scale clinical genomic data sets are becoming increasingly commonplace, existing miR-TSV prediction methods are not designed to analyze these data. Here, we present an open-source tool called SubmiRine that is designed to perform efficient miR-TSV prediction systematically on variants identified in novel clinical genomic data sets. Most importantly, SubmiRine allows for the prioritization of predicted miR-TSVs according to their relative probability of being functional. We present the results of SubmiRine using integrated clinical genomic data from a large-scale cohort study on chronic obstructive pulmonary disease (COPD), making a number of high-scoring, novel miR-TSV predictions. We also demonstrate SubmiRine's ability to predict and prioritize known miR-TSVs that have undergone experimental validation in previous studies.


Subjects
3' Untranslated Regions , MicroRNAs/metabolism , Software , Binding Sites , Genomics , Humans , Single Nucleotide Polymorphism , Chronic Obstructive Pulmonary Disease/genetics
11.
BMC Evol Biol ; 14: 212, 2014 Oct 04.
Article in English | MEDLINE | ID: mdl-25281000

ABSTRACT

BACKGROUND: The recent expansion of whole-genome sequence data available from diverse animal lineages provides an opportunity to investigate the evolutionary origins of specific classes of human disease genes. Previous studies have observed that human disease genes are of particularly ancient origin. While this suggests that many animal species have the potential to serve as feasible models for research on genes responsible for human disease, it is unclear whether this pattern has meaningful implications and whether it prevails for every class of human disease. RESULTS: We used a comparative genomics approach encompassing a broad phylogenetic range of animals with sequenced genomes to determine the evolutionary patterns exhibited by human genes associated with different classes of disease. Our results support previous claims that most human disease genes are of ancient origin but, more importantly, we also demonstrate that several specific disease classes have a significantly large proportion of genes that emerged relatively recently within the metazoans and/or vertebrates. An independent assessment of the synonymous to non-synonymous substitution rates of human disease genes found in mammals reveals that disease classes that arose more recently also display unexpected rates of purifying selection between their mammalian and human counterparts. CONCLUSIONS: Our results reveal the heterogeneity underlying the evolutionary origins of (and selective pressures on) different classes of human disease genes. For example, some disease gene classes appear to be of uncommonly recent (i.e., vertebrate-specific) origin and, as a whole, have been evolving at a faster rate within mammals than the majority of disease classes having more ancient origins. The novel patterns that we have identified may provide new insight into cases where studies using traditional animal models were unable to produce results that translated to humans. 
Conversely, we note that the larger set of disease classes do have ancient origins, suggesting that many non-traditional animal models have the potential to be useful for studying many human disease genes. Taken together, these findings emphasize why model organism selection should be done on a disease-by-disease basis, with evolutionary profiles in mind.


Subjects
Biological Evolution , Animal Disease Models , Disease/genetics , Animals , Humans , Genetic Models , Species Specificity
12.
Nat Genet ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39322778

ABSTRACT

Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies. We find that WGS and WES combined with arrays and imputation (WES + IMP) have the largest association yield. Although WGS results in an approximately fivefold increase in the total number of assayed variants over WES + IMP, the number of detected signals differed by only 1% for both single-variant and gene-based association analyses. Given that WES + IMP typically results in savings of lab and computational time and resources expended per sample, we evaluate the potential benefits of applying WES + IMP to larger samples. When we extend our WES + IMP analyses to 468,169 UK Biobank individuals, we observe an approximately fourfold increase in association signals with the threefold increase in sample size. We conclude that prioritizing WES + IMP and large sample sizes rather than contemporary short-read WGS alternatives will maximize the number of discoveries in genetic association studies.
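
The scaling claim at the end of this abstract can be checked with simple arithmetic. The sketch below computes the sample-size ratio from the two cohort sizes quoted above and, purely as an illustration, the exponent that an "approximately fourfold" gain would imply if association signals followed a power law in sample size. The power-law form and the function names are assumptions, not something the abstract states.

```python
import math


def fold_change(larger, smaller):
    """Ratio of two positive quantities."""
    return larger / smaller


def implied_scaling_exponent(signal_fold, sample_fold):
    """Exponent b such that signals scale as N**b (assumes a power law)."""
    return math.log(signal_fold) / math.log(sample_fold)


# Cohort sizes quoted in the abstract.
sample_fold = fold_change(468_169, 149_195)
# "Approximately fourfold" increase in association signals.
exponent = implied_scaling_exponent(4.0, sample_fold)

print(f"sample-size fold increase: {sample_fold:.2f}")
print(f"implied exponent under a power-law assumption: {exponent:.2f}")
```

An exponent above 1 would mean that signals grew faster than sample size over this range, which is consistent with the abstract's conclusion that larger samples are the more productive investment.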

13.
Nat Genet ; 56(8): 1592-1596, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39103650

ABSTRACT

Coronavirus disease 2019 (COVID-19) and influenza are respiratory illnesses caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses, respectively. Both diseases share symptoms and clinical risk factors (ref. 1), but the extent to which these conditions have a common genetic etiology is unknown. This is partly because host genetic risk factors are well characterized for COVID-19 but not for influenza, with the largest published genome-wide association studies for these conditions including >2 million individuals (ref. 2) and about 1,000 individuals (refs. 3-6), respectively. Shared genetic risk factors could point to targets to prevent or treat both infections. Through a genetic study of 18,334 cases with a positive test for influenza and 276,295 controls, we show that published COVID-19 risk variants are not associated with influenza. Furthermore, we discovered and replicated an association between influenza infection and noncoding variants in B3GALT5 and ST6GAL1, neither of which was associated with COVID-19. In vitro small interfering RNA knockdown of ST6GAL1 (an enzyme that adds sialic acid to the cell surface, which is used for viral entry) reduced influenza infectivity by 57%. These results mirror the observation that variants that downregulate ACE2, the SARS-CoV-2 receptor, protect against COVID-19 (ref. 7). Collectively, these findings highlight downregulation of key cell surface receptors used for viral entry as treatment opportunities to prevent COVID-19 and influenza.


Subjects
COVID-19 , Genetic Predisposition to Disease , Genome-Wide Association Study , Human Influenza , SARS-CoV-2 , Humans , Human Influenza/genetics , Human Influenza/epidemiology , Human Influenza/virology , COVID-19/genetics , COVID-19/virology , Risk Factors , SARS-CoV-2/genetics , Male , Female , Single Nucleotide Polymorphism , Case-Control Studies , Middle Aged
14.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37214792

ABSTRACT

Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.

15.
BMC Genomics ; 13: 714, 2012 Dec 20.
Article in English | MEDLINE | ID: mdl-23256903

ABSTRACT

BACKGROUND: MicroRNAs play a vital role in the regulation of gene expression and have been identified in every animal with a sequenced genome examined thus far, except for the placozoan Trichoplax. The genomic repertoires of metazoan microRNAs have become increasingly endorsed as phylogenetic characters and drivers of biological complexity. RESULTS: In this study, we report the first investigation of microRNAs in a species from the phylum Ctenophora. We use short RNA sequencing and the assembled genome of the lobate ctenophore Mnemiopsis leidyi to show that this species appears to lack any recognizable microRNAs, as well as the nuclear proteins Drosha and Pasha, which are critical to canonical microRNA biogenesis. This finding represents the first reported case of a metazoan lacking a Drosha protein. CONCLUSIONS: Recent phylogenomic analyses suggest that Mnemiopsis may be the earliest branching metazoan lineage. If this is true, then the origins of canonical microRNA biogenesis and microRNA-mediated gene regulation may postdate the last common metazoan ancestor. Alternatively, canonical microRNA functionality may have been lost independently in the lineages leading to both Mnemiopsis and the placozoan Trichoplax, suggesting that microRNA functionality was not critical until much later in metazoan evolution.


Subjects
Ctenophora/genetics , Genomics , MicroRNAs/genetics , MicroRNAs/metabolism , Post-Transcriptional RNA Processing/genetics , Animals , Base Sequence , Molecular Evolution , Genetic Loci/genetics , MicroRNAs/biosynthesis
16.
Nat Genet ; 54(4): 382-392, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35241825

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) enters human host cells via angiotensin-converting enzyme 2 (ACE2) and causes coronavirus disease 2019 (COVID-19). Here, through a genome-wide association study, we identify a variant (rs190509934, minor allele frequency 0.2-2%) that downregulates ACE2 expression by 37% (P = 2.7 × 10^-8) and reduces the risk of SARS-CoV-2 infection by 40% (odds ratio = 0.60, P = 4.5 × 10^-13), providing human genetic evidence that ACE2 expression levels influence COVID-19 risk. We also replicate the associations of six previously reported risk variants, of which four were further associated with worse outcomes in individuals infected with the virus (in/near LZTFL1, MHC, DPP9 and IFNAR2). Lastly, we show that common variants define a risk score that is strongly associated with severe disease among cases and modestly improves the prediction of disease severity relative to demographic and clinical factors alone.


Subjects
COVID-19 , Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , Genome-Wide Association Study , Humans , Risk Factors , SARS-CoV-2/genetics
17.
Nat Genet ; 53(7): 1097-1103, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34017140

ABSTRACT

Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.


Subjects
Computational Biology , Genome-Wide Association Study , Genomics , Case-Control Studies , Computational Biology/methods , Genome-Wide Association Study/methods , Genomics/methods , Genotype , Humans , Logistic Models , Machine Learning , Phenotype , Reproducibility of Results
18.
JAMA Psychiatry ; 77(12): 1276-1285, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-32697297

ABSTRACT

Importance: Population screening for medically relevant genomic variants that cause diseases such as hereditary cancer and cardiovascular disorders is increasing to facilitate early disease detection or prevention. Neuropsychiatric disorders (NPDs) are common, complex disorders with clear genetic causes; yet, access to genetic diagnosis is limited. We explored whether inclusion of NPD in population-based genomic screening programs is warranted by assessing 3 key factors: prevalence, penetrance, and personal utility.
Objective: To evaluate the suitability of including pathogenic copy number variants (CNVs) associated with NPD in population screening by determining their prevalence and penetrance and exploring the personal utility of disclosing results.
Design, Setting, and Participants: In this cohort study, the frequency of 31 NPD CNVs was determined in patient-participants via exome data. Associated clinical phenotypes were assessed using linked electronic health records. Nine CNVs were selected for disclosure by licensed genetic counselors, and participants' psychosocial reactions were evaluated using a mixed-methods approach. A primarily adult population receiving medical care at Geisinger, a large integrated health care system in the United States with the only population-based genomic screening program approved for medically relevant results disclosure, was included. The cohort was identified from the Geisinger MyCode Community Health Initiative. Exome and linked electronic health record data were available for this cohort, which was recruited from February 2007 to April 2017. Data were collected for the qualitative analysis April 2017 through February 2018. Analysis began February 2018 and ended December 2019.
Main Outcomes and Measures: The planned outcomes of this study include (1) prevalence estimate of NPD-associated CNVs in an unselected health care system population; (2) penetrance estimate of NPD diagnoses in CNV-positive individuals; and (3) qualitative themes that describe participants' responses to receiving NPD-associated genomic results.
Results: Of 90 595 participants with CNV data, a pathogenic CNV was identified in 708 (0.8%; 436 women [61.6%]; mean [SD] age, 50.04 [18.74] years). Seventy percent (n = 494) had at least 1 associated clinical symptom. Of these, 28.8% (204) of CNV-positive individuals had an NPD code in their electronic health record, compared with 13.3% (11 835 of 89 887) of CNV-negative individuals (odds ratio, 2.21; 95% CI, 1.86-2.61; P < .001); 66.4% (470) of CNV-positive individuals had a history of depression and anxiety compared with 54.6% (49 118 of 89 887) of CNV-negative individuals (odds ratio, 1.53; 95% CI, 1.31-1.80; P < .001). 16p13.11 (71 [0.078%]) and 22q11.2 (108 [0.119%]) were the most prevalent deletions and duplications, respectively. Only 5.8% of individuals (41 of 708) had a previously known genetic diagnosis. Results disclosure was completed for 141 individuals. Positive participant responses included poignant reactions to learning a medical reason for lifelong cognitive and psychiatric disabilities.
Conclusions and Relevance: This study informs critical factors central to the development of population-based genomic screening programs and supports the inclusion of NPD in future designs to promote equitable access to clinically useful genomic information.
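The percentages in the Results can be rechecked from the counts the abstract reports. What follows is a minimal sketch, not the authors' analysis: it recomputes the prevalence figures and an unadjusted (crude) odds ratio from the published counts. The function and variable names are hypothetical. The abstract does not say how the reported odds ratios (2.21 and 1.53) were estimated, so the crude values need not match them.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def crude_odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Unadjusted odds ratio from case counts and group totals (a 2x2 table)."""
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return odds_exposed / odds_unexposed


# Counts copied from the abstract.
TOTAL = 90595      # participants with CNV data
CNV_POS = 708      # participants with a pathogenic CNV
CNV_NEG = 89887    # CNV-negative participants

prevalence = percent(CNV_POS, TOTAL)      # reported as 0.8%
del_16p = percent(71, TOTAL)              # reported as 0.078%
dup_22q = percent(108, TOTAL)             # reported as 0.119%
known_dx = percent(41, CNV_POS)           # reported as 5.8%

# Crude odds ratios; assumption: no covariate adjustment of any kind.
npd_or = crude_odds_ratio(204, CNV_POS, 11835, CNV_NEG)
dep_or = crude_odds_ratio(470, CNV_POS, 49118, CNV_NEG)

print(f"prevalence={prevalence:.2f}%  16p13.11={del_16p:.3f}%  22q11.2={dup_22q:.3f}%")
print(f"known diagnosis={known_dx:.1f}%  crude OR (NPD)={npd_or:.2f}  crude OR (dep/anx)={dep_or:.2f}")
```

With these counts the prevalence figures reproduce the reported percentages, while the crude odds ratios come out at roughly 2.67 and 1.64, above the reported 2.21 and 1.53. That gap would be consistent with covariate-adjusted estimates, although the abstract does not state this.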


Subjects
DNA Copy Number Variations/genetics , Delivery of Health Care, Integrated , Genetic Testing , Mass Screening , Mental Disorders/genetics , Neurocognitive Disorders/genetics , Patient Satisfaction , Penetrance , Adult , Cohort Studies , Electronic Health Records , Female , Humans , Male , Mass Screening/standards , Mental Disorders/epidemiology , Middle Aged , Neurocognitive Disorders/epidemiology , Pennsylvania/epidemiology , Prevalence , Exome Sequencing
19.
PLoS One ; 7(9): e45474, 2012.
Article in English | MEDLINE | ID: mdl-23049804

ABSTRACT

Glycosylation modifies the physicochemical properties and protein binding functions of glycoconjugates. These modifications are biosynthesized in the endoplasmic reticulum and Golgi apparatus by a series of enzymatic transformations that are under complex control. As a result, mature glycans on a given site are heterogeneous mixtures of glycoforms. This gives rise to a spectrum of adhesive properties that strongly influences interactions with binding partners and resultant biological effects. In order to understand the roles glycosylation plays in normal and disease processes, efficient structural analysis tools are necessary. In the field of glycomics, liquid chromatography/mass spectrometry (LC/MS) is used to profile the glycans present in a given sample. This technology enables comparison of glycan compositions and abundances among different biological samples, e.g. normal versus disease or normal versus mutant. Manual analysis of the glycan profiling LC/MS data is extremely time-consuming, and efficient software tools are needed to eliminate this bottleneck. In this work, we have developed a tool to computationally model LC/MS data to enable efficient profiling of glycans. Using LC/MS data deconvoluted by Decon2LS/DeconTools, we built a list of unique neutral masses corresponding to candidate glycan compositions summarized over their various charge states, adducts and range of elution times. Our work aims to provide confident identification of true compounds in complex data sets that are not amenable to manual interpretation. This capability is an essential part of glycomics workflows. We demonstrate this tool, GlycReSoft, using an LC/MS dataset on tissue-derived heparan sulfate oligosaccharides. The software, code and a test data set are publicly archived under an open source license.
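The step of building a list of unique neutral masses summarized over charge states and elution times can be illustrated as follows. This is a minimal sketch under stated assumptions, not GlycReSoft's actual algorithm: the `Peak` record, the greedy grouping rule, the 10 ppm tolerance and the three example peaks are all hypothetical, and adduct handling is omitted.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    neutral_mass: float   # deconvoluted neutral mass (Da)
    charge: int           # charge state the peak was observed in
    elution_time: float   # elution time (minutes)
    abundance: float      # signal intensity


def group_by_mass(peaks, ppm_tol=10.0):
    """Greedily cluster peaks whose neutral mass lies within ppm_tol of the group's lightest peak."""
    groups = []
    for peak in sorted(peaks, key=lambda p: p.neutral_mass):
        if groups:
            ref = groups[-1][0].neutral_mass  # lightest peak anchors the group
            if (peak.neutral_mass - ref) / ref * 1e6 <= ppm_tol:
                groups[-1].append(peak)
                continue
        groups.append([peak])
    return groups


def summarize(group):
    """Collapse one mass group into a single candidate-compound record."""
    total = sum(p.abundance for p in group)
    times = [p.elution_time for p in group]
    return {
        # abundance-weighted mean mass of the group
        "neutral_mass": sum(p.neutral_mass * p.abundance for p in group) / total,
        "charge_states": sorted({p.charge for p in group}),
        "elution_range": (min(times), max(times)),
        "total_abundance": total,
    }


# Synthetic peaks for illustration only (not from the published data set).
peaks = [
    Peak(1000.000, 2, 12.1, 500.0),
    Peak(1000.004, 3, 12.4, 300.0),
    Peak(1500.000, 2, 20.0, 100.0),
]
summaries = [summarize(g) for g in group_by_mass(peaks)]
for s in summaries:
    print(s)
```

Anchoring each group on its lightest peak keeps the sketch short. A real tool would more plausibly also require continuity in elution time and match each mass against a table of theoretical glycan compositions.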


Subjects
Glycomics/methods , Heparitin Sulfate/analysis , Polysaccharides/analysis , Software , Animals , Cattle , Chromatography, Liquid , Glycosylation , Heparin Lyase/chemistry , Heparitin Sulfate/chemistry , Internet , Mass Spectrometry , Polysaccharides/chemistry , ROC Curve
20.
PLoS One ; 6(12): e28358, 2011.
Article in English | MEDLINE | ID: mdl-22174793

ABSTRACT

Detailed information about stage-specific changes in gene expression is crucial for understanding the gene regulatory networks underlying development and the various signal transduction pathways contributing to morphogenesis. Here we describe the global gene expression dynamics during early murine limb development, when cartilage, tendons, muscle, joints, vasculature and nerves are specified and the musculoskeletal system of limbs is established. We used whole-genome microarrays to identify genes with differential expression at 5 stages of limb development (E9.5 to E13.5), during fore- and hind-limb patterning. We found that the onset of limb formation is characterized by an up-regulation of transcription factors, followed by a massive activation of genes during E10.5 and E11.5 that levels off at later time points. Among the 3520 genes identified as significantly up-regulated in the limb, we find ~30% to be novel, dramatically expanding the repertoire of candidate genes likely to function in the limb. Hierarchical and stage-specific clustering identified expression profiles that are likely to correlate with functional programs during limb development, and further characterization of these transcripts will provide new insights into specific tissue patterning processes. Here, we provide for the first time a comprehensive analysis of developmentally regulated genes during murine limb development, and provide some novel insights into the expression dynamics governing limb morphogenesis.


Subjects
Extremities/embryology , Gene Expression Profiling , Gene Expression Regulation, Developmental , Organogenesis/genetics , Animals , Limb Buds/anatomy & histology , Limb Buds/embryology , Mice , Organ Specificity/genetics , Promoter Regions, Genetic/genetics , Time Factors , Transcriptome/genetics , Up-Regulation/genetics