Results 1-20 of 70
1.
Genome Med ; 16(1): 66, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38741190

ABSTRACT

BACKGROUND: Inflammatory bowel disease (IBD) and Parkinson's disease (PD) are chronic disorders that have been suggested to share common pathophysiological processes. LRRK2 has been implicated in both diseases. Exploring the genetic basis of the IBD-PD comorbidity through high-impact rare genetic variants can facilitate the identification of novel genetic factors shared by the two diseases. METHODS: We analyzed whole exomes from the BioMe BioBank and UK Biobank, and whole genomes from a cohort of 67 European patients diagnosed with both IBD and PD, to examine the effects of LRRK2 missense variants on IBD, PD and their co-occurrence (IBD-PD). We performed optimized sequence kernel association tests (SKAT-O) and network-based heterogeneity clustering (NHC) analyses using high-impact rare variants in the IBD-PD cohort to identify novel candidate genes, which we further prioritized by biological relatedness approaches. We conducted phenome-wide association studies (PheWAS) employing BioMe BioBank and UK Biobank whole exomes to estimate the genetic relevance of the 14 prioritized genes to IBD-PD. RESULTS: The analysis of LRRK2 missense variants revealed significant associations of the G2019S and N2081D variants with IBD-PD and identified several other variants as potential contributors to increased or decreased IBD-PD risk. SKAT-O identified two significant genes, LRRK2 and IL10RA, and NHC identified six significant gene clusters that are biologically relevant to IBD-PD. We observed prominent overlaps between the enriched pathways in the known IBD, PD, and candidate IBD-PD gene sets. Additionally, we detected significantly enriched pathways unique to IBD-PD, including MAPK signaling, LPS/IL-1 mediated inhibition of RXR function, and NAD signaling. Fourteen final candidate IBD-PD genes were prioritized by biological relatedness methods. The biological importance scores estimated by protein-protein interaction networks and pathway and ontology enrichment analyses indicated the involvement of genes related to immunity, inflammation, and autophagy in IBD-PD. PheWAS further supported the associations of the candidate genes with IBD and PD. CONCLUSIONS: Our study confirms known and uncovers new LRRK2 associations in IBD-PD. The identification of novel inflammation- and autophagy-related genes supports and expands previous findings on IBD-PD pathogenesis, and underscores the significance of therapeutic interventions for reducing systemic inflammation.


Subjects
Comorbidity , Genetic Predisposition to Disease , Inflammatory Bowel Diseases , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2 , Parkinson Disease , Humans , Parkinson Disease/genetics , Inflammatory Bowel Diseases/genetics , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Female , Male , Mutation, Missense , Genome-Wide Association Study , Genetic Variation , Middle Aged , Aged
2.
Genome Med ; 15(1): 103, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38037155

ABSTRACT

Gain-of-function (GOF) variants give rise to increased/novel protein functions whereas loss-of-function (LOF) variants lead to diminished protein function. Experimental approaches for identifying GOF and LOF are generally slow and costly, whilst available computational methods have not been optimized to discriminate between GOF and LOF variants. We have developed LoGoFunc, a machine learning method for predicting pathogenic GOF, pathogenic LOF, and neutral genetic variants, trained on a broad range of gene-, protein-, and variant-level features describing diverse biological characteristics. LoGoFunc outperforms other tools trained solely to predict pathogenicity for identifying pathogenic GOF and LOF variants and is available at https://itanlab.shinyapps.io/goflof/ .


Subjects
Genome , Proteins , Humans , Machine Learning
3.
Proc Natl Acad Sci U S A ; 120(46): e2314225120, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37931111

ABSTRACT

Human genetic variants that introduce an AG into the intronic region between the branchpoint (BP) and the canonical splice acceptor site (ACC) of protein-coding genes can disrupt pre-mRNA splicing. Using our genome-wide BP database, we delineated the BP-ACC segments of all human introns and found extreme depletion of AG/YAG in the [BP+8, ACC-4] high-risk region. We developed AGAIN as a genome-wide computational approach to systematically and precisely pinpoint intronic AG-gain variants within the BP-ACC regions. AGAIN identified 350 AG-gain variants from the Human Gene Mutation Database, all of which alter splicing and cause disease. Among them, 74% created new acceptor sites, whereas 31% resulted in complete exon skipping. AGAIN also predicts the protein-level products resulting from these two consequences. We applied AGAIN to our exome/genome database of patients with severe infectious diseases but without known genetic etiology and identified a private homozygous intronic AG-gain variant in the antimycobacterial gene SPPL2A in a patient with mycobacterial disease. For this variant, AGAIN also predicts the retention of six intronic nucleotides that encode an in-frame stop codon, turning AG-gain into stop-gain. This allele was then confirmed experimentally to lead to loss of function by disrupting splicing. We further showed, through two case studies of the STAT1 and IRF7 genes, that AG-gain variants inside the high-risk region led to misspliced products, whereas those outside the region did not. We finally evaluated AGAIN on our 14 paired exome-RNAseq samples and found that 82% of AG-gain variants in high-risk regions showed evidence of missplicing. AGAIN is publicly available from https://hgidsoft.rockefeller.edu/AGAIN and https://github.com/casanova-lab/AGAIN.
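
To make the AG-gain idea concrete, here is a minimal Python sketch, not the AGAIN implementation, that checks whether a single-nucleotide substitution creates a new AG dinucleotide inside the [BP+8, ACC-4] high-risk region. It assumes 0-based indices into an intron string in transcript orientation, with `bp` pointing at the branchpoint nucleotide and `acc` at the last intronic base (the G of the canonical acceptor AG); it also assumes the window is inclusive and that both bases of a gained AG must fall inside it. The function names are hypothetical.

```python
from __future__ import annotations


def ag_positions(seq: str, start: int, end: int) -> set[int]:
    """0-based indices i where seq[i:i+2] == "AG" and both bases lie in [start, end]."""
    upper = min(end, len(seq) - 1)  # keeps i + 1 inside the window and the sequence
    return {i for i in range(max(start, 0), upper) if seq[i:i + 2] == "AG"}


def is_ag_gain(intron: str, bp: int, acc: int, pos: int, alt: str) -> bool:
    """True if substituting `alt` at 0-based `pos` creates an AG in [bp + 8, acc - 4]."""
    start, end = bp + 8, acc - 4  # assumed inclusive high-risk window
    if not start <= pos <= end:
        return False
    mutated = intron[:pos] + alt + intron[pos + 1:]
    gained = ag_positions(mutated, start, end) - ag_positions(intron, start, end)
    return bool(gained)


# Toy intron: GT...AG with a branchpoint A at index 10 and a lone G at index 21.
bases = list("GT" + "C" * 36 + "AG")
bases[10], bases[21] = "A", "G"
toy_intron = "".join(bases)
print(is_ag_gain(toy_intron, bp=10, acc=39, pos=20, alt="A"))  # True: C>A creates AG
print(is_ag_gain(toy_intron, bp=10, acc=39, pos=20, alt="T"))  # False
```

Indels, the YAG context, and the protein-level consequences that AGAIN predicts are deliberately left out of this sketch.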


Subjects
RNA Splice Sites , RNA Splicing , Humans , Introns , Mutation , Genome
4.
Science ; 380(6648): 913-924, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37262173

ABSTRACT

Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, with many from previously less well represented groups, the New World monkeys and the Strepsirrhini. Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our study reveals that many key genomic innovations occurred in the Simiiformes ancestral node and may have had an impact on the adaptive radiation of the Simiiformes and human evolution.


Subjects
Evolution, Molecular , Primates , Animals , Humans , Genome , Genomics , Phylogeny , Primates/anatomy & histology , Primates/classification , Primates/genetics , Gene Rearrangement , Brain/anatomy & histology
5.
Nat Commun ; 14(1): 2256, 2023 04 20.
Article in English | MEDLINE | ID: mdl-37080976

ABSTRACT

Inflammatory bowel disease (IBD) is a group of chronic inflammatory conditions of the digestive tract whose genetic etiology is still poorly understood. The incidence of IBD is particularly high among Ashkenazi Jews. Here, we identify 8 novel and plausible IBD-causing genes from the exomes of 4453 genetically identified Ashkenazi Jewish individuals (1734 IBD cases and 2719 controls). Various biological pathway analyses are performed, along with bulk and single-cell RNA sequencing, to demonstrate the likely physiological relatedness of the novel genes to IBD. Importantly, we demonstrate that the rare and high-impact genetic architecture of Ashkenazi Jewish adult IBD displays significant overlap with very early onset IBD genetics. Moreover, by performing biobank phenome-wide analyses, we find that IBD genes have pleiotropic effects that involve other immune responses. Finally, we show that polygenic risk score analyses based on genome-wide high-impact variants have high power to predict IBD susceptibility.


Subjects
Inflammatory Bowel Diseases , Jews , Adult , Humans , Jews/genetics , Exome/genetics , Inflammatory Bowel Diseases/genetics , Risk Assessment , Genetic Predisposition to Disease
6.
Hum Genet ; 142(2): 245-274, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36344696

ABSTRACT

Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and with genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5' genes, but were not significantly different from controls in introns, 3'UTRs and 3' genes. Pathogenic repeat expansions were also found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5' genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 on an independent test dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.
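
An AUC such as the 0.88 reported for DPREx can be read as the probability that a randomly chosen positive (pathogenic) region scores higher than a randomly chosen negative (control) one. A minimal Python sketch of that Mann-Whitney formulation, counting ties as one half, is below; the scores are made-up illustrative values, not DPREx output.

```python
from __future__ import annotations

from itertools import product


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):  # every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for three pathogenic and three control regions.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is O(n·m); rank-based formulas give the same value more efficiently for large test sets.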


Subjects
DNA Repeat Expansion , DNA , Humans , Introns/genetics , RNA , Trinucleotide Repeat Expansion
7.
Hum Genet ; 142(2): 275-288, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36352240

ABSTRACT

Epilepsy (EP) and congenital heart disease (CHD) are two apparently unrelated diseases that nevertheless display substantial mutual comorbidity. Thus, while congenital heart defects are associated with an elevated risk of developing epilepsy, the incidence of epilepsy in CHD patients correlates with CHD severity. Although genetic determinants have been postulated to underlie the comorbidity of EP and CHD, the precise genetic etiology is unknown. We performed variant and gene association analyses on EP and CHD patients separately, using whole exomes of genetically identified Europeans from the UK Biobank and Mount Sinai BioMe Biobank. We prioritized biologically plausible candidate genes and investigated the enriched pathways and other identified comorbidities by biological proximity calculation, pathway analyses, and gene-level phenome-wide association studies. Our variant- and gene-level results point to the Voltage-Gated Calcium Channels (VGCC) pathway as being a unifying framework for EP and CHD comorbidity. Additionally, pathway-level analyses indicated that the functions of disease-associated genes partially overlap between the two disease entities. Finally, phenome-wide association analyses of prioritized candidate genes revealed that cerebral blood flow and ulcerative colitis constitute the two main traits associated with both EP and CHD.


Subjects
Epilepsy , Heart Defects, Congenital , Humans , European People , Heart Defects, Congenital/genetics , Epilepsy/epidemiology , Epilepsy/genetics , Genetic Association Studies , Phenotype
8.
Proc Natl Acad Sci U S A ; 119(44): e2211194119, 2022 11.
Article in English | MEDLINE | ID: mdl-36306325

ABSTRACT

Pre-messenger RNA splicing is initiated with the recognition of a single-nucleotide intronic branchpoint (BP) within a BP motif by spliceosome elements. Forty-eight rare variants in 43 human genes have been reported to alter splicing and cause disease by disrupting BP. However, until now, no computational approach was available to efficiently detect such variants in massively parallel sequencing data. We established a comprehensive human genome-wide BP database by integrating existing BP data and generating new BP data from RNA sequencing of lariat debranching enzyme DBR1-mutated patients and from machine-learning predictions. We characterized multiple features of BP in major and minor introns and found that BP and BP-2 (two nucleotides upstream of BP) positions exhibit a lower rate of variation in human populations and higher evolutionary conservation than the intronic background, while being comparable to the exonic background. We developed BPHunter as a genome-wide computational approach to systematically and efficiently detect intronic variants that may disrupt BP recognition. BPHunter retrospectively identified 40 of the 48 known pathogenic BP variants, from which we derived a strategy for prioritizing BP variant candidates. The remaining eight variants all create AG dinucleotides between the BP and acceptor site, which is the likely reason for missplicing. We demonstrated the practical utility of BPHunter prospectively by using it to identify a novel germline heterozygous BP variant of STAT2 in a patient with critical COVID-19 pneumonia and a novel somatic intronic 59-nucleotide deletion of ITPKB in a lymphoma patient, both of which were validated experimentally. BPHunter is publicly available from https://hgidsoft.rockefeller.edu/BPHunter and https://github.com/casanova-lab/BPHunter.
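
The BP and BP-2 positions singled out above suggest a simple first-pass check. This Python sketch is not BPHunter's scoring scheme: it only flags an SNV that falls exactly on a known branchpoint or two nucleotides upstream of it. The branchpoint list stands in for a BP database, the coordinates are illustrative, and "upstream" is taken in transcript orientation, so on the minus strand BP-2 lies at a higher genomic coordinate (an assumption about how positions are stored).

```python
from __future__ import annotations


def bp_hits(var_pos: int, branchpoints: list[tuple[int, str]]) -> list[tuple[int, str, str]]:
    """Return (bp_position, strand, label) for each branchpoint whose BP or BP-2 the SNV hits."""
    hits = []
    for bp, strand in branchpoints:
        # BP-2 is two nucleotides 5' of the BP in transcript orientation.
        bp_minus_2 = bp - 2 if strand == "+" else bp + 2
        if var_pos == bp:
            hits.append((bp, strand, "BP"))
        elif var_pos == bp_minus_2:
            hits.append((bp, strand, "BP-2"))
    return hits


# Hypothetical branchpoints given as (genomic position, strand).
print(bp_hits(100, [(100, "+"), (102, "+"), (98, "-"), (150, "+")]))
# [(100, '+', 'BP'), (102, '+', 'BP-2'), (98, '-', 'BP-2')]
```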


Subjects
COVID-19 , Humans , Introns/genetics , Retrospective Studies , COVID-19/genetics , RNA Splicing/genetics , Nucleotides
9.
Genome Med ; 14(1): 81, 2022 07 29.
Article in English | MEDLINE | ID: mdl-35906703

ABSTRACT

Stopgain substitutions are the third-largest class of monogenic human disease mutations and often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, exhibit poor performance at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases. In patient exomes, X-CAP prioritizes causal stopgains better than existing methods do, further illustrating its clinical utility. X-CAP is available at https://github.com/bejerano-lab/X-CAP .


Subjects
Exome , Software , Computational Biology/methods , Humans , Mutation , Mutation, Missense , Virulence
10.
Am J Hum Genet ; 109(3): 457-470, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35120630

ABSTRACT

We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance.


Subjects
Genome, Human , Mutation, Missense , Cluster Analysis , Exome/genetics , Genome, Human/genetics , Humans , Mutation, Missense/genetics , Virulence
11.
Hum Mutat ; 43(3): 328-346, 2022 03.
Article in English | MEDLINE | ID: mdl-34918412

ABSTRACT

Microdeletions and gross deletions are important causes (~20%) of human inherited disease, and their genomic locations are strongly influenced by the local DNA sequence environment. Notwithstanding this, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 to 28,394,429 bp. We analyzed the DNA sequence within 1 kb of the breakpoint junctions and found that, for deletions of length ≤30 bp, the frequencies of non-B DNA-forming repeats, GC-content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length. Further, we found that the presence of DR, GQ, and STR repeats is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp), while significantly (χ² test, p < 2 × 10⁻¹⁶) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut-off of 25-30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.
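
The length split and the microhomology comparison described above can be sketched in Python as follows. This is not the authors' pipeline: the 30 bp default mirrors the ≤30 / >30 bp split used in the analysis, and microhomology is measured with one common definition (bases shared by the 5' end of the deleted segment and the sequence immediately 3' of the deletion), which is an assumption. Coordinates are 0-based and half-open; the function names and toy sequence are hypothetical.

```python
from __future__ import annotations


def classify_deletion(length_bp: int, cutoff: int = 30) -> str:
    """Label a deletion using a <=cutoff / >cutoff split (30 bp by default)."""
    if length_bp < 1:
        raise ValueError("deletion length must be at least 1 bp")
    return "microdeletion" if length_bp <= cutoff else "gross deletion"


def microhomology_length(seq: str, start: int, end: int) -> int:
    """Bases shared by the 5' end of the deleted segment seq[start:end]
    and the sequence immediately 3' of the deletion."""
    deleted = seq[start:end]
    n = 0
    while n < len(deleted) and end + n < len(seq) and deleted[n] == seq[end + n]:
        n += 1
    return n


ref = "TTCAGGTCAGAA"  # toy reference sequence
start, end = 2, 7  # delete "CAGGT", leaving "TT" + "CAGAA"
print(classify_deletion(end - start))  # microdeletion
print(microhomology_length(ref, start, end))  # 3 ("CAG" recurs after the breakpoint)
```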


Subjects
DNA , Genome, Human , Base Composition , Base Sequence , DNA/genetics , Genome, Human/genetics , Humans , Mutation , Sequence Deletion
12.
Am J Hum Genet ; 108(12): 2301-2318, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34762822

ABSTRACT

Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.


Subjects
Databases, Genetic , Gain of Function Mutation , Loss of Function Mutation , Proteins/genetics , Cloud Computing , Genetic Predisposition to Disease , Genome, Human , Germ-Line Mutation , Humans , Internet-Based Intervention , Machine Learning
13.
Proc Natl Acad Sci U S A ; 118(36)2021 09 07.
Article in English | MEDLINE | ID: mdl-34426522

ABSTRACT

The construction of population-based variomes has contributed substantially to our understanding of the genetic basis of human inherited disease. Here, we investigated the genetic structure of Turkey from 3,362 unrelated subjects whose whole exomes (n = 2,589) or whole genomes (n = 773) were sequenced to generate a Turkish (TR) Variome that should serve to facilitate disease gene discovery in Turkey. Consistent with the history of present-day Turkey as a crossroads between Europe and Asia, we found extensive admixture between Balkan, Caucasus, Middle Eastern, and European populations with a closer genetic relationship of the TR population to Europeans than hitherto appreciated. We determined that 50% of TR individuals had high inbreeding coefficients (≥0.0156) with runs of homozygosity longer than 4 Mb being found exclusively in the TR population when compared to 1000 Genomes Project populations. We also found that 28% of exome and 49% of genome variants in the very rare range (allele frequency < 0.005) are unique to the modern TR population. We annotated these variants based on their functional consequences to establish a TR Variome containing alleles of potential medical relevance, a repository of homozygous loss-of-function variants and a TR reference panel for genotype imputation using high-quality haplotypes, to facilitate genome-wide association studies. In addition to providing information on the genetic structure of the modern TR population, these data provide an invaluable resource for future studies to identify variants that are associated with specific phenotypes as well as establishing the phenotypic consequences of mutations in specific genes.
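
One common way to estimate an individual's inbreeding coefficient from sequence data is F_ROH, the fraction of the autosomal genome covered by runs of homozygosity. The abstract does not say which estimator was used, so treat this Python sketch as an assumption rather than the study's method. For orientation, the 0.0156 threshold is close to 1/64, the expected inbreeding coefficient of the offspring of second cousins. The segments and the autosomal length below are illustrative values, not data from the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Roh:
    """A run of homozygosity on one chromosome (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def f_roh(rohs: list[Roh], autosome_length: int) -> float:
    """Fraction of the autosomal genome covered by (assumed non-overlapping) ROH segments."""
    return sum(r.length for r in rohs) / autosome_length


def long_rohs(rohs: list[Roh], min_length: int = 4_000_000) -> list[Roh]:
    """ROH segments longer than min_length (4 Mb by default)."""
    return [r for r in rohs if r.length > min_length]


segments = [Roh("1", 0, 30_000_000), Roh("2", 0, 15_000_000), Roh("3", 0, 2_000_000)]
f = f_roh(segments, autosome_length=2_800_000_000)  # illustrative autosomal length
print(round(f, 4), f >= 0.0156, len(long_rohs(segments)))
```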


Subjects
Genetic Variation/genetics , Genome, Human/genetics , Alleles , Consanguinity , Exome , Gene Frequency/genetics , Genetic Drift , Genetics, Population/methods , Genome-Wide Association Study/methods , Genotype , Haplotypes/genetics , Human Migration/trends , Humans , Turkey/ethnology , Exome Sequencing/methods
14.
Hum Genet ; 140(9): 1329-1342, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34173867

ABSTRACT

A non-negligible proportion of human pathogenic variants are known to be present as wild type in at least some non-human mammalian species. The standard explanation for this finding is that molecular mechanisms of compensatory epistasis can alleviate the mutations' otherwise pathogenic effects. Examples of compensated variants have been described in the literature but the interacting residue(s) postulated to play a compensatory role have rarely been ascertained. In this study, the examination of five human X-chromosomally encoded proteins (FIX, GLA, HPRT1, NDP and OTC) allowed us to identify several candidate compensated variants. Strong evidence for a compensated/compensatory pair of amino acids in the coagulation FIXa protein (involving residues 270 and 271) was found in a variety of mammalian species. Both amino acid residues are located within the 60-loop, spatially close to the 39-loop that performs a key role in coagulation serine proteases. To understand the nature of the underlying interactions, molecular dynamics simulations were performed. The predicted conformational change in the 39-loop consequent to the Glu270Lys substitution (associated with hemophilia B) appears to impair the protein's interaction with its substrate but, importantly, such steric hindrance is largely mitigated in those proteins that carry the compensatory residue (Pro271) at the neighboring amino acid position.


Subjects
Chromosomes, Human, X/genetics , Epistasis, Genetic , Factor IXa , Molecular Dynamics Simulation , Mutation, Missense , Amino Acid Substitution , Factor IXa/chemistry , Factor IXa/genetics , Humans
15.
Proc Natl Acad Sci U S A ; 117(24): 13626-13636, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32487729

ABSTRACT

Humans homozygous or hemizygous for variants predicted to cause a loss of function (LoF) of the corresponding protein do not necessarily present with overt clinical phenotypes. We report here 190 autosomal genes with 207 predicted LoF variants, for which the frequency of homozygous individuals exceeds 1% in at least one human population from five major ancestry groups. No such genes were identified on the X and Y chromosomes. Manual curation revealed that 28 variants (15%) had been misannotated as LoF. Of the 179 remaining variants in 166 genes, only 11 alleles in 11 genes had previously been confirmed experimentally to be LoF. The set of 166 dispensable genes was enriched in olfactory receptor genes (41 genes). The 41 dispensable olfactory receptor genes displayed a relaxation of selective constraints similar to that observed for other olfactory receptor genes. The 125 dispensable nonolfactory receptor genes also displayed a relaxation of selective constraints consistent with greater redundancy. Sixty-two of these 125 genes were found to be dispensable in at least three human populations, suggesting possible evolution toward pseudogenes. Of the 179 LoF variants, 68 could be tested for two neutrality statistics, and 8 displayed robust signals of positive selection. These latter variants included a known FUT2 variant that confers resistance to intestinal viruses, and an APOL3 variant involved in resistance to parasitic infections. Overall, the identification of 166 genes for which a sizeable proportion of humans are homozygous for predicted LoF alleles reveals both redundancies and advantages of such deficiencies for human survival.
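
The 1% criterion above is a per-population frequency of homozygous individuals. Under Hardy-Weinberg proportions, a homozygote frequency of q² = 0.01 corresponds to an allele frequency q of 0.1, which conveys how common these predicted LoF alleles must be. The Python sketch below applies the threshold to per-population genotype counts; the population labels, counts and function names are hypothetical.

```python
from __future__ import annotations

import math


def populations_above(hom_counts: dict[str, int], totals: dict[str, int],
                      threshold: float = 0.01) -> list[str]:
    """Populations in which the fraction of homozygous individuals exceeds `threshold`."""
    return sorted(
        pop for pop, n in totals.items()
        if n > 0 and hom_counts.get(pop, 0) / n > threshold
    )


def allele_freq_for_hom_freq(hom_freq: float) -> float:
    """Allele frequency q whose Hardy-Weinberg homozygote frequency q**2 equals hom_freq."""
    return math.sqrt(hom_freq)


# Hypothetical counts of homozygous carriers and genotyped individuals per population.
hom = {"pop_A": 5, "pop_B": 30}
n = {"pop_A": 1000, "pop_B": 1000}
print(populations_above(hom, n))  # ['pop_B']
print(allele_freq_for_hom_freq(0.01))
```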


Subjects
Genetics, Human , Loss of Function Mutation , Alleles , Apolipoproteins L/genetics , Fucosyltransferases/genetics , Genetic Variation , Homozygote , Humans , Proteins/genetics , Sex Chromosomes/genetics , Galactoside 2-alpha-L-fucosyltransferase
16.
Hum Genet ; 139(10): 1197-1207, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32596782

ABSTRACT

The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with, human inherited disease. At the time of writing (June 2020), the database contains in excess of 289,000 different gene lesions identified in over 11,100 genes, manually curated from 72,987 articles published in over 3100 peer-reviewed journals. There are two main groups of users who utilise HGMD on a regular basis: research scientists and clinical diagnosticians. This review aims to highlight how to make the most of HGMD data in each setting.


Subjects
Databases, Genetic , Genome, Human , Germ-Line Mutation , Polymorphism, Genetic , Bibliometrics , Biomedical Research/methods , Genetic Predisposition to Disease , Humans , Public-Private Sector Partnerships
17.
Sci Transl Med ; 12(544)2020 05 20.
Article in English | MEDLINE | ID: mdl-32434849

ABSTRACT

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient's given set of phenotypes. Diagnosis of singleton patients (without relatives' exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database-based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children's Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.


Subjects
Exome , Child , Genotype , Humans , Phenotype , Probability , Retrospective Studies
18.
Genet Med ; 22(2): 362-370, 2020 02.
Article in English | MEDLINE | ID: mdl-31467448

ABSTRACT

PURPOSE: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval by an automatic approach. METHODS: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates. RESULTS: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in the Human Gene Mutation Database (HGMD), a 4.4-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation, AVADA provides valuable evidence in 245 diagnosed patients for an additional 18 diagnostic variants, on top of ClinVar's 21, versus only 2 using the best current automated approach. CONCLUSION: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Although far from perfect, it is much faster than PubMed/Google Scholar searches, and careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.


Subjects
Electronic Data Processing/methods , Genomics/methods , Information Storage and Retrieval/methods , Data Management/methods , Databases, Factual , Databases, Genetic , Humans , Natural Language Processing , PubMed , Publications
19.
Nat Commun ; 10(1): 4141, 2019 09 12.
Article in English | MEDLINE | ID: mdl-31515488

ABSTRACT

Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual's genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.


Subjects
Gene Frequency/genetics , Genetic Variation , Genetics, Population , Alleles , Animals , Base Sequence , Disease/genetics , Genetic Predisposition to Disease , Genome, Human , HEK293 Cells , Humans , Mice , Mutation, Missense/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Protein Binding/genetics
20.
Nucleic Acids Res ; 47(W1): W623-W631, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31045209

ABSTRACT

Human whole-genome sequencing reveals about 4,000,000 genomic variants per individual. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence alignment, phylogenetic analysis, and structure prediction. However, there is no existing webserver capable of extracting DNA/protein sequences for genomic variants from VCF files in a user-friendly and efficient manner. We developed the SeqTailor webserver to bridge this gap, by enabling rapid extraction of (i) DNA sequences around genomic variants, with customizable window sizes and options to annotate the splice sites closest to the variants and to consider the neighboring variants within the window; and (ii) protein sequences encoded by the DNA sequences around genomic variants, with a built-in SnpEff annotator and customizable window sizes. SeqTailor supports 11 species: human (GRCh37/GRCh38), chimpanzee, mouse, rat, cow, chicken, lizard, zebrafish, fruitfly, Arabidopsis and rice. Standalone programs are provided for command-line-based needs. SeqTailor streamlines the sequence extraction process, and accelerates the analysis of genomic variants with software requiring DNA/protein sequences. It will facilitate the study of genomic variation by increasing the feasibility of sequence-based analysis and prediction. The SeqTailor webserver is freely available at http://shiva.rockefeller.edu/SeqTailor/.
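
The core DNA-level operation described for SeqTailor, cutting a fixed window of reference sequence around a VCF variant and building the matching alternate sequence, can be sketched in a few lines of Python. This is not SeqTailor's code: it assumes a single in-memory chromosome string starting at position 1, ignores strand, neighboring variants, splice-site annotation and protein translation, and the function name is hypothetical.

```python
from __future__ import annotations


def variant_windows(ref_seq: str, pos: int, ref: str, alt: str, flank: int) -> tuple[str, str]:
    """Return (reference, alternate) sequences with `flank` bases on each side of a variant.

    `pos` is 1-based, as in the VCF POS column.
    """
    if pos < 1:
        raise ValueError("VCF positions are 1-based")
    i = pos - 1  # convert to a 0-based string index
    if ref_seq[i:i + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    left = ref_seq[max(0, i - flank):i]
    right = ref_seq[i + len(ref):i + len(ref) + flank]
    return left + ref + right, left + alt + right


chrom = "ACGTACGTAC"  # toy reference contig
print(variant_windows(chrom, pos=5, ref="A", alt="G", flank=2))  # ('GTACG', 'GTGCG')
print(variant_windows(chrom, pos=4, ref="TA", alt="T", flank=2))  # ('CGTACG', 'CGTCG')
```

The second call uses the VCF convention of representing a deletion with a leading padding base shared by REF and ALT.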


Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software , Animals , Cattle , Genetic Variation , Humans , INDEL Mutation , Internet , Mice , Rats