Results 1 - 20 of 93
1.
bioRxiv ; 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38853852

ABSTRACT

Genome-wide association studies (GWAS) with proteomics are essential tools for drug discovery. To date, most studies have used affinity proteomics platforms, which have limited discovery to protein panels covered by the available affinity binders. Furthermore, it is not clear to what extent protein epitope-changing variants interfere with the detection of protein quantitative trait loci (pQTLs). Mass spectrometry-based (MS) proteomics can overcome some of these limitations. Here we report a GWAS using the MS-based Seer Proteograph™ platform with blood samples from a discovery cohort of 1,260 American participants and a replication in 325 individuals from Asia, with diverse ethnic backgrounds. We analysed 1,980 proteins quantified in at least 80% of the samples, out of 5,753 proteins quantified across the discovery cohort. We identified 252 and replicated 90 pQTLs, where 30 of the replicated pQTLs have not been reported before. We further investigated 200 of the strongest associated cis-pQTLs previously identified using the SOMAscan and the Olink platforms and found that up to one third of the affinity proteomics pQTLs may be affected by epitope effects, while another third were confirmed by MS proteomics to be consistent with the hypothesis that genetic variants induce changes in protein expression. The present study demonstrates the complementarity of the different proteomics approaches and reports pQTLs not accessible to affinity proteomics, suggesting that many more pQTLs remain to be discovered using MS-based platforms.
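
The inclusion rule stated in this abstract (proteins quantified in at least 80% of the samples) can be sketched as a simple completeness filter. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `filter_by_completeness`, the dictionary layout, and the use of `None` for a missing measurement are all hypothetical.

```python
from typing import Dict, List, Optional


def filter_by_completeness(
    intensities: Dict[str, List[Optional[float]]],
    min_fraction: float = 0.8,
) -> List[str]:
    """Return protein IDs quantified in at least `min_fraction` of samples.

    `intensities` maps a protein ID to one value per sample; None marks a
    sample in which the protein was not quantified (assumed encoding).
    """
    kept = []
    for protein, values in intensities.items():
        if not values:
            continue  # no samples recorded, so completeness is undefined
        quantified = sum(v is not None for v in values)
        if quantified / len(values) >= min_fraction:
            kept.append(protein)
    return kept
```

With five samples, a protein missing in one sample (4/5 = 80%) passes the default threshold, while a protein missing in two samples does not.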

2.
Nat Commun ; 15(1): 989, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38307861

ABSTRACT

Proteogenomics studies generate hypotheses on protein function and provide genetic evidence for drug target prioritization. Most previous work has been conducted using affinity-based proteomics approaches. These technologies face challenges, such as uncertainty regarding target identity, non-specific binding, and handling of variants that affect epitope affinity binding. Mass spectrometry-based proteomics can overcome some of these challenges. Here we report a pQTL study using the Proteograph™ Product Suite workflow (Seer, Inc.) where we quantify over 18,000 unique peptides from nearly 3000 proteins in more than 320 blood samples from a multi-ethnic cohort in a bottom-up, peptide-centric, mass spectrometry-based proteomics approach. We identify 184 protein-altering variants in 137 genes that are significantly associated with their corresponding variant peptides, confirming target specificity of co-associated affinity binders, identifying putatively causal cis-encoded proteins and providing experimental evidence for their presence in blood, including proteins that may be inaccessible to affinity-based proteomics.


Subjects
Proteogenomics , Proteomics , Humans , Proteomics/methods , Mass Spectrometry/methods , Proteins/analysis , Peptides/analysis , Proteogenomics/methods , Mutant Proteins
3.
bioRxiv ; 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38260620

ABSTRACT

Alzheimer's disease (AD) and related dementias (ADRD) are complex diseases with multiple pathophysiological drivers that determine clinical symptomology and disease progression. These diseases develop insidiously over time, through many pathways and disease mechanisms, and continue to have a huge societal impact for affected individuals and their families. While emerging blood-based biomarkers, such as plasma p-tau181 and p-tau217, accurately detect Alzheimer neuropathology and are associated with faster cognitive decline, the full extent of plasma proteomic changes in ADRD remains unknown. Earlier detection and better classification of the different subtypes may provide opportunities for earlier, more targeted interventions, and perhaps a higher likelihood of successful therapeutic development. In this study, we aim to leverage unbiased mass spectrometry proteomics to identify novel, blood-based biomarkers associated with cognitive decline. 1,786 plasma samples from 1,005 patients were collected over 12 years from participants in the Massachusetts Alzheimer's Disease Research Center Longitudinal Cohort Study. Patient metadata includes demographics, final diagnoses, and clinical dementia rating (CDR) scores taken concurrently. The Proteograph™ Product Suite (Seer, Inc.) and liquid-chromatography mass-spectrometry (LC-MS) analysis were used to process the plasma samples in this cohort and generate unbiased proteomics data. Data-independent acquisition (DIA) mass spectrometry results yielded 36,259 peptides and 4,007 protein groups. Linear mixed effects models revealed 138 differentially abundant proteins between AD and healthy controls. Machine learning classification models for AD diagnosis identified potential candidate biomarkers including MBP, BGLAP, and APOD. Cox regression models were created to determine the association of proteins with disease progression and suggest CLNS1A, CRISPLD2, and GOLPH3 as targets of further investigation as potential biomarkers. The Proteograph workflow provided deep, unbiased coverage of the plasma proteome at a speed that enabled a cohort study of almost 1,800 samples, which is the largest deep, unbiased proteomics study of ADRD conducted to date.

4.
bioRxiv ; 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37693476

ABSTRACT

Background: The wide dynamic range of circulating proteins coupled with the diversity of proteoforms present in plasma has historically impeded comprehensive and quantitative characterization of the plasma proteome at scale. Automated nanoparticle (NP) protein corona-based proteomics workflows can efficiently compress the dynamic range of protein abundances into a mass spectrometry (MS)-accessible detection range. This enhances the depth and scalability of quantitative MS-based methods, which can elucidate the molecular mechanisms of biological processes, discover new protein biomarkers, and improve comprehensiveness of MS-based diagnostics. Methods: Using multi-species spike-in experiments and a cohort, we investigated fold-change accuracy, linearity, precision, and statistical power for the Proteograph™ Product Suite, a deep plasma proteomics workflow, in conjunction with multiple MS instruments. Results: We show that NP-based workflows enable accurate identification (false discovery rate of 1%) of more than 6,000 proteins from plasma (Orbitrap Astral) and, compared to a gold standard neat plasma workflow that is limited to the detection of hundreds of plasma proteins, facilitate quantification of more proteins with accurate fold-changes, high linearity, and precision. Furthermore, we demonstrate high statistical power for the discovery of biomarkers in small- and large-scale cohorts. Conclusions: The automated NP workflow enables high-throughput, deep, and quantitative plasma proteomics investigation with sufficient power to discover new biomarker signatures with a peptide level resolution.

5.
Science ; 380(6648): eabn8153, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37262156

ABSTRACT

Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.


Subjects
Genetic Variation , Primates , Animals , Humans , Base Sequence , Gene Frequency , Primates/genetics , Whole Genome Sequencing
6.
bioRxiv ; 2023 May 02.
Article in English | MEDLINE | ID: mdl-37205491

ABSTRACT

Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole genome sequencing data for 809 individuals from 233 primate species, and identified 4.3 million common protein-altering variants with orthologs in human. We show that these variants can be inferred to have non-deleterious effects in human based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases. One Sentence Summary: Deep learning classifier trained on 4.3 million common primate missense variants predicts variant pathogenicity in humans.

7.
PLoS One ; 18(3): e0282821, 2023.
Article in English | MEDLINE | ID: mdl-36989217

ABSTRACT

Advancements in deep plasma proteomics are enabling high-resolution measurement of plasma proteoforms, which may reveal a rich source of novel biomarkers previously concealed by aggregated protein methods. Here, we analyze 188 plasma proteomes from non-small cell lung cancer subjects (NSCLC) and controls to identify NSCLC-associated protein isoforms by examining differentially abundant peptides as a proxy for isoform-specific exon usage. We find four proteins comprised of peptides with opposite patterns of abundance between cancer and control subjects. One of these proteins, BMP1, has known isoforms that can explain this differential pattern, for which the abundance of the NSCLC-associated isoform increases with stage of NSCLC progression. The presence of cancer and control-associated isoforms suggests differential regulation of BMP1 isoforms. The identified BMP1 isoforms have known functional differences, which may reveal insights into mechanisms impacting NSCLC disease progression.


Subjects
Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Humans , Non-Small-Cell Lung Carcinoma/metabolism , Lung Neoplasms/metabolism , Tumor Biomarkers/metabolism , Protein Isoforms/metabolism , Peptides , Bone Morphogenetic Protein 1
8.
Adv Mater ; 34(44): e2206008, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35986672

ABSTRACT

Introducing engineered nanoparticles (NPs) into a biofluid such as blood plasma leads to the formation of a selective and reproducible protein corona at the particle-protein interface, driven by the relationship between protein-NP affinity and protein abundance. This enables scalable systems that leverage protein-nano interactions to overcome current limitations of deep plasma proteomics in large cohorts. Here the importance of the protein to NP-surface ratio (P/NP) is demonstrated and protein corona formation dynamics are modeled, which determine the competition between proteins for binding. Tuning the P/NP ratio significantly modulates the protein corona composition, enhancing depth and precision of a fully automated NP-based deep proteomic workflow (Proteograph). By increasing the binding competition on engineered NPs, 1.2-1.7× more proteins with 1% false discovery rate are identified on the surface of each NP, and up to 3× more proteins compared to a standard plasma proteomics workflow. Moreover, the data suggest P/NP plays a significant role in determining the in vivo fate of nanomaterials in biomedical applications. Together, the study showcases the importance of P/NP as a key design element for biomaterials and nanomedicine in vivo and as a powerful tuning strategy for accurate, large-scale NP-based deep proteomic studies.


Subjects
Nanoparticles , Protein Corona , Protein Corona/chemistry , Proteome , Proteomics , Nanoparticles/chemistry , Nanomedicine
9.
Proc Natl Acad Sci U S A ; 119(11): e2106053119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35275789

ABSTRACT

Significance: Deep profiling of the plasma proteome at scale has been a challenge for traditional approaches. We achieve superior performance across the dimensions of precision, depth, and throughput using a panel of surface-functionalized superparamagnetic nanoparticles in comparison to conventional workflows for deep proteomics interrogation. Our automated workflow leverages competitive nanoparticle-protein binding equilibria that quantitatively compress the large dynamic range of proteomes to an accessible scale. Using machine learning, we dissect the contribution of individual physicochemical properties of nanoparticles to the composition of protein coronas. Our results suggest that nanoparticle functionalization can be tailored to protein sets. This work demonstrates the feasibility of deep, precise, unbiased plasma proteomics at a scale compatible with large-scale genomics enabling multiomic studies.


Subjects
Blood Proteins , Deep Learning , Nanoparticles , Proteomics , Blood Proteins/chemistry , Nanoparticles/chemistry , Protein Corona/chemistry , Proteome , Proteomics/methods
10.
Gigascience ; 9(7)2020 07 01.
Article in English | MEDLINE | ID: mdl-32649757

ABSTRACT

BACKGROUND: Macaque species share >93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. RESULTS: To close this gap and enhance functional genomics approaches, we used a combination of de novo linked-read assembly and scaffolding using proximity ligation assay (HiC) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large scaffolds at chromosome level with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells derived from the same animal. Reconstruction of the evolutionary tree using whole-genome annotation and orthologous comparisons among 3 macaque species, human, and mouse genomes revealed extensive homology between human and pig-tailed macaques with regards to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques exhibit a closer evolutionary distance to each other than either species exhibits to humans or pig-tailed macaques. CONCLUSIONS: These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases particularly with regards to pluripotency and innate immune pathways.
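
The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the assembly. A minimal sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions and have no connection to the actual assembly data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold where the running sum reaches half the total.
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one scaffold length")
```

For example, lengths 10, 8, 6, 4, and 2 sum to 30; the two longest scaffolds already cover 18, which is more than half, so the N50 is 8.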


Subjects
Chromosomes , Genome , Genomics , Macaca nemestrina/genetics , Animals , Computational Biology/methods , Genomics/methods , Humans , Karyotyping/methods , Male , Molecular Sequence Annotation , Proteomics/methods , Nucleic Acid Repetitive Sequences
11.
Genome Med ; 12(1): 50, 2020 05 29.
Article in English | MEDLINE | ID: mdl-32471482

ABSTRACT

BACKGROUND: Populations of closely related microbial strains can be simultaneously present in bacterial communities such as the human gut microbiome. We recently developed a de novo genome assembly approach that uses read cloud sequencing to provide more complete microbial genome drafts, enabling precise differentiation and tracking of strain-level dynamics across metagenomic samples. In this case study, we present a proof-of-concept using read cloud sequencing to describe bacterial strain diversity in the gut microbiome of one hematopoietic cell transplantation patient over a 2-month time course and highlight temporal strain variation of gut microbes during therapy. The treatment was accompanied by diet changes and administration of multiple immunosuppressants and antimicrobials. METHODS: We conducted short-read and read cloud metagenomic sequencing of DNA extracted from four longitudinal stool samples collected during the course of treatment of one hematopoietic cell transplantation (HCT) patient. After applying read cloud metagenomic assembly to discover strain-level sequence variants in these complex microbiome samples, we performed metatranscriptomic analysis to investigate differential expression of antibiotic resistance genes. Finally, we validated predictions from the genomic and metatranscriptomic findings through in vitro antibiotic susceptibility testing and whole genome sequencing of isolates derived from the patient stool samples. RESULTS: During the 56-day longitudinal time course that was studied, the patient's microbiome was profoundly disrupted and eventually dominated by Bacteroides caccae. Comparative analysis of B. caccae genomes obtained using read cloud sequencing together with metagenomic RNA sequencing allowed us to identify differences in substrain populations over time. Based on this, we predicted that particular mobile element integrations likely resulted in increased antibiotic resistance, which we further supported using in vitro antibiotic susceptibility testing. CONCLUSIONS: We find read cloud assembly to be useful in identifying key structural genomic strain variants within a metagenomic sample. These strains have fluctuating relative abundance over relatively short time periods in human microbiomes. We also find specific structural genomic variations that are associated with increased antibiotic resistance over the course of clinical treatment.


Subjects
Bacteria/genetics , Gastrointestinal Microbiome/genetics , Anti-Infective Agents/pharmacology , Azacitidine/pharmacology , Azithromycin/pharmacology , Bacteria/classification , Bacteria/drug effects , Bacteria/isolation & purification , Ciprofloxacin/pharmacology , Bacterial DNA , Diet , Feces/microbiology , Gastrointestinal Microbiome/drug effects , Bacterial Genome , Hematopoietic Stem Cell Transplantation , Humans , Immunosuppressive Agents/pharmacology , Male , Metagenome , Middle Aged , Myelodysplastic Syndromes/microbiology , Myelodysplastic Syndromes/therapy , Primary Myelofibrosis/microbiology , Primary Myelofibrosis/therapy , RNA-Seq , DNA Sequence Analysis
12.
Bioinformatics ; 36(4): 1082-1090, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31584621

ABSTRACT

MOTIVATION: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. RESULTS: In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in each. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer dataset (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also showed the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. AVAILABILITY AND IMPLEMENTATION: Meltos is available at https://github.com/ih-lab/Meltos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms , Genome , Genomic Structural Variation , Genomics , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Phylogeny , Sequence Analysis , Software
13.
Nat Commun ; 10(1): 3341, 2019 07 26.
Article in English | MEDLINE | ID: mdl-31350405

ABSTRACT

Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.
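
The recall and precision figures in this abstract are measured against manually curated knowledge bases. How the authors derived their estimates is not described here; the sketch below only shows the textbook definitions of the two metrics, treating extracted and curated associations as sets. The function name and the (variant, phenotype) tuple encoding are hypothetical.

```python
from typing import Hashable, Set, Tuple


def precision_recall(
    extracted: Set[Hashable], reference: Set[Hashable]
) -> Tuple[float, float]:
    """Precision and recall of extracted items against a curated reference set."""
    true_positives = len(extracted & reference)
    # Precision: share of extracted associations that the reference confirms.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of reference associations that were extracted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

Note that an association missing from the reference is not necessarily wrong, so precision measured this way depends on how complete the curated reference is.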


Subjects
Genetic Databases , Genome-Wide Association Study , Computational Biology , Data Mining , Human Genome , Humans , Machine Learning
14.
Cell ; 176(3): 535-548.e24, 2019 01 24.
Article in English | MEDLINE | ID: mdl-30661751

ABSTRACT

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.


Subjects
Forecasting/methods , RNA Precursors/genetics , RNA Splicing/genetics , Algorithms , Alternative Splicing/genetics , Autistic Disorder/genetics , Deep Learning , Exons/genetics , Humans , Intellectual Disability/genetics , Introns/genetics , Computer Neural Networks , RNA Precursors/metabolism , RNA Splice Sites/genetics , RNA Splice Sites/physiology
15.
Nat Genet ; 51(2): 364, 2019 02.
Article in English | MEDLINE | ID: mdl-30559491

ABSTRACT

In the version of this article originally published, the name of author Serafim Batzoglou was misspelled. The error has been corrected in the HTML and PDF versions of the article.

16.
Nat Biotechnol ; 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30320765

ABSTRACT

Although shotgun metagenomic sequencing of microbiome samples enables partial reconstruction of strain-level community structure, obtaining high-quality microbial genome drafts without isolation and culture remains difficult. Here, we present an application of read clouds, short-read sequences tagged with long-range information, to microbiome samples. We present Athena, a de novo assembler that uses read clouds to improve metagenomic assemblies. We applied this approach to sequence stool samples from two healthy individuals and compared it with existing short-read and synthetic long-read metagenomic sequencing techniques. Read-cloud metagenomic sequencing and Athena assembly produced the most comprehensive individual genome drafts with high contiguity (>200-kb N50, fewer than ten contigs), even for bacteria with relatively low (20×) raw short-read-sequence coverage. We also sequenced a complex marine-sediment sample and generated 24 intermediate-quality genome drafts (>70% complete, <10% contaminated), nine of which were complete (>90% complete, <5% contaminated). Our approach allows for culture-free generation of high-quality microbial genome drafts by using a single shotgun experiment.
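
The two quality tiers quoted in this abstract (intermediate: >70% complete and <10% contaminated; complete: >90% complete and <5% contaminated) can be written as a small classifier. This is a sketch of the thresholds only: the function name, the percentage inputs, and the "below threshold" label are assumptions, and how completeness and contamination were estimated is not covered.

```python
def draft_quality(completeness: float, contamination: float) -> str:
    """Classify a genome draft by the tier thresholds quoted in the abstract.

    Both arguments are percentages. The strictest matching tier is returned,
    so every "complete" draft would also satisfy the "intermediate" criteria.
    """
    if completeness > 90 and contamination < 5:
        return "complete"
    if completeness > 70 and contamination < 10:
        return "intermediate"
    return "below threshold"
```

Because the function returns only the strictest tier, counting drafts labeled "intermediate" here would exclude the complete ones, whereas the abstract counts the nine complete drafts among the 24 intermediate-quality drafts.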

17.
Nat Commun ; 9(1): 4453, 2018 10 26.
Article in English | MEDLINE | ID: mdl-30367051

ABSTRACT

Outcomes for cancer patients vary greatly even within the same tumor type, and characterization of molecular subtypes of cancer holds important promise for improving prognosis and personalized treatment. This promise has motivated recent efforts to produce large amounts of multidimensional genomic (multi-omic) data, but current algorithms still face challenges in the integrated analysis of such data. Here we present Cancer Integration via Multikernel Learning (CIMLR), a new cancer subtyping method that integrates multi-omic data to reveal molecular subtypes of cancer. We apply CIMLR to multi-omic data from 36 cancer types and show significant improvements in both computational efficiency and ability to extract biologically meaningful cancer subtypes. The discovered subtypes exhibit significant differences in patient survival for 27 of 36 cancer types. Our analysis reveals integrated patterns of gene expression, methylation, point mutations, and copy number changes in multiple cancers and highlights patterns specifically associated with poor patient outcomes.


Subjects
Computational Biology , Genomics , Neoplasms/genetics , Neoplasms/mortality , Algorithms , Cluster Analysis , DNA Copy Number Variations , DNA Methylation , Gene Expression Profiling , Humans , Neoplasms/classification , Neoplasms/therapy , Point Mutation , Prognosis , Survival Analysis
18.
Nat Commun ; 9(1): 3108, 2018 08 06.
Article in English | MEDLINE | ID: mdl-30082777

ABSTRACT

Networks are ubiquitous in biology where they encode connectivity patterns at all scales of organization, from molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene-function prediction by denoising tissue-specific interaction networks, alleviates interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species. Our results indicate that NE is widely applicable for denoising biological networks.


Subjects
Computational Biology/methods , Gene Expression Profiling , Human Genome , Algorithms , Area Under Curve , Biological Products , Diffusion , Ecosystem , Humans , Biological Models , Protein Domains , Signal-To-Noise Ratio , Stochastic Processes
19.
Nat Genet ; 50(8): 1161-1170, 2018 08.
Article in English | MEDLINE | ID: mdl-30038395

ABSTRACT

Millions of human genomes and exomes have been sequenced, but their clinical applications remain limited due to the difficulty of distinguishing disease-causing mutations from benign genetic variation. Here we demonstrate that common missense variants in other primate species are largely clinically benign in human, enabling pathogenic mutations to be systematically identified by the process of elimination. Using hundreds of thousands of common variants from population sequencing of six non-human primate species, we train a deep neural network that identifies pathogenic mutations in rare disease patients with 88% accuracy and enables the discovery of 14 new candidate genes in intellectual disability at genome-wide significance. Cataloging common variation from additional primate species would improve interpretation for millions of variants of uncertain significance, further advancing the clinical utility of human genome sequencing.


Subjects
Human Genome , Mutation , Nerve Net/physiology , Animals , Exome , Genetic Predisposition to Disease , Humans , Intellectual Disability/genetics , Intellectual Disability/pathology , Primates
20.
BMC Genomics ; 19(1): 467, 2018 Jun 18.
Article in English | MEDLINE | ID: mdl-29914369

ABSTRACT

BACKGROUND: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next generation sequencing (NGS) data. Software such as DeNovoGear and TrioDeNovo have been developed to detect DNMs, but at good sensitivity they still produce many false positive calls. RESULTS: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked read sequencing, to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to each of the two haplotypes followed by generating a haploid genotype for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes because they are almost certainly false positives. Our experiments on 10X Chromium linked read sequencing trio data reveal that HAPDeNovo eliminates 80 to 99% of false positives regardless of how large the candidate DNM set is. CONCLUSIONS: HAPDeNovo leverages the haplotype information from linked read sequencing to remove spurious false positive DNMs effectively, and it increases accuracy of DNM detection dramatically without sacrificing sensitivity.
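
The filtering idea described in this abstract (assign reads to the two haplotypes, genotype each haplotype separately, and discard candidates that look heterozygous within a single haplotype) can be illustrated with a toy filter. This is not HAPDeNovo's code or interface: the `Candidate` class, the read-count representation, and the 20% minor-allele threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    site: str
    hap1: Tuple[int, int]  # (reference reads, alternate reads) on haplotype 1
    hap2: Tuple[int, int]  # (reference reads, alternate reads) on haplotype 2


def looks_heterozygous(ref: int, alt: int, min_minor_fraction: float = 0.2) -> bool:
    """A single haplotype should carry one allele; mixed support is suspicious."""
    total = ref + alt
    if total == 0:
        return False  # no reads on this haplotype, so no evidence either way
    return min(ref, alt) / total >= min_minor_fraction


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates whose two haplotypes each support a single allele."""
    return [
        c
        for c in candidates
        if not (looks_heterozygous(*c.hap1) or looks_heterozygous(*c.hap2))
    ]
```

A candidate with alternate reads confined to one haplotype is kept, while one whose alternate reads are mixed with reference reads on the same haplotype is dropped as a likely false positive.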


Subjects
Human Genome , Haplotypes , Mutation , Software , Algorithms , Computational Biology , DNA Mutational Analysis , Genotype , High-Throughput Nucleotide Sequencing , Humans