1.
Nat Commun; 14(1): 6042, 2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37758728

ABSTRACT

Multimodal epigenetic characterization of cell-free DNA (cfDNA) could improve the performance of blood-based early cancer detection. However, integrative profiling of cfDNA methylome and fragmentome has been technologically challenging. Here, we adapt an enzyme-mediated methylation sequencing method for comprehensive analysis of genome-wide cfDNA methylation, fragmentation, and copy number alteration (CNA) characteristics for enhanced cancer detection. We apply this method to plasma samples of 497 healthy controls and 780 patients of seven cancer types and develop an ensemble classifier by incorporating methylation, fragmentation, and CNA features. In the test cohort, our approach achieves an area under the curve value of 0.966 for overall cancer detection. Detection sensitivity for early-stage patients achieves 73% at 99% specificity. Finally, we demonstrate the feasibility to accurately localize the origin of cancer signals with combined methylation and fragmentation profiling of tissue-specific accessible chromatin regions. Overall, this proof-of-concept study provides a technical platform to utilize multimodal cfDNA features for improved cancer detection.


Subjects
Cell-Free Nucleic Acids; Neoplasms; Humans; Cell-Free Nucleic Acids/genetics; Epigenome; Neoplasms/diagnosis; Neoplasms/genetics; Epigenomics/methods; DNA Methylation/genetics; Biomarkers, Tumor/genetics
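
The headline figure in this abstract, sensitivity at a fixed 99% specificity, can be illustrated with a minimal sketch. The function name `sensitivity_at_specificity` and the toy scores are hypothetical and do not come from the paper; the sketch only assumes that every sample receives one numeric classifier score, with higher values meaning "more cancer-like".

```python
import math


def sensitivity_at_specificity(control_scores, case_scores, specificity=0.99):
    """Return (sensitivity, threshold) at a target specificity.

    Hypothetical helper: the threshold is chosen from the control scores so
    that at most a (1 - specificity) fraction of controls score above it.
    """
    if not control_scores or not case_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")

    ranked = sorted(control_scores)
    n = len(ranked)
    # Number of controls allowed to be called positive (small epsilon guards
    # against floating-point round-off such as 0.1 * 10 = 0.999...).
    allowed_fp = min(math.floor((1.0 - specificity) * n + 1e-9), n - 1)
    # Samples are called positive only if they score strictly above this value.
    threshold = ranked[n - allowed_fp - 1]
    detected = sum(1 for score in case_scores if score > threshold)
    return detected / len(case_scores), threshold


# Toy data (invented): 100 controls with scores 0.00 .. 0.99 and 4 cases.
controls = [i / 100 for i in range(100)]
cases = [0.5, 0.985, 0.995, 1.2]
sens, thr = sensitivity_at_specificity(controls, cases, specificity=0.99)
print(f"threshold={thr:.2f} sensitivity={sens:.2f}")
```

Fixing the threshold on the controls first and only then counting detected cases is what makes a sensitivity figure comparable across studies that report it "at 99% specificity".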
2.
Comput Biol Med; 151(Pt B): 106323, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36436482

ABSTRACT

Deep learning-based virtual screening methods have been shown to significantly improve the accuracy of traditional docking-based virtual screening methods. In this paper, we developed Deffini, a structure-based virtual screening neural network model. During training, Deffini learns protein-ligand docking poses to distinguish actives and decoys and then to predict whether a new ligand will bind to the protein target. Deffini outperformed Smina with an average AUC ROC of 0.92 and AUC PRC of 0.44 in 3-fold cross-validation on the benchmark dataset DUD-E. However, when tested on the maximum unbiased validation (MUV) dataset, Deffini achieved poor results with an average AUC ROC of 0.517. We used the family-specific training approach to train the model to improve the model performance and concluded that family-specific models performed better than the pan-family models. To explore the limits of the predictive power of the family-specific models, we constructed Kernie, a new protein kinase dataset consisting of 358 kinases. Deffini trained with the Kernie dataset outperformed all recent benchmarks on the MUV kinases, with an average AUC ROC of 0.745, which highlights the importance of quality datasets in improving the performance of deep neural network models and the importance of using family-specific models.


Subjects
Neural Networks, Computer; Proteins; Ligands; Proteins/metabolism
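
The AUC ROC values quoted in this abstract measure how often an active ligand is scored above a decoy. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the toy scores are hypothetical, and the code is not the evaluation pipeline used by the authors.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy.

    Ties count as half a win, which matches the usual rank-based
    (Mann-Whitney) definition of the area under the ROC curve.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Toy scores (invented for illustration only).
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.3, 0.4]
print(round(roc_auc(actives, decoys), 3))
```

A value near 0.5, such as the 0.517 reported on MUV, means the model ranks actives above decoys barely more often than chance.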
3.
BMC Bioinformatics; 22(1): 23, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33451280

ABSTRACT

BACKGROUND: Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. RESULTS: We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts many robust statistical techniques such as kernel smoothing of coverage differentiation information to discern signals from noise and combines ideas from time-series analysis and the signal-processing field to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models such as the tiered Gaussian mixture model, the expectation-maximization algorithm, and sparse Bayesian learning were customized and built into the model. Accucopy is implemented in C++/Rust, packaged in a docker image, and supports non-human samples; more information is available at http://www.yfish.org/software/. CONCLUSIONS: We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage low-purity tumor sequencing data. Through comparative analyses in both simulated and real-sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.


Subjects
Algorithms; DNA Copy Number Variations; Neoplasms; Alleles; Bayes Theorem; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Software
4.
Bioinformatics; 34(12): 2004-2011, 2018 Jun 15.
Article in English | MEDLINE | ID: mdl-29385401

ABSTRACT

Motivation: Tumor purity and ploidy have a substantial impact on next-gen sequence analyses of tumor samples and may alter the biological and clinical interpretation of results. Despite the existence of several computational methods that are dedicated to estimate tumor purity and/or ploidy from The Cancer Genome Atlas (TCGA) tumor-normal whole-genome-sequencing (WGS) data, an accurate, fast and fully-automated method that works in a wide range of sequencing coverage, level of tumor purity and level of intra-tumor heterogeneity, is still missing. Results: We describe a computational method called Accurity that infers tumor purity, tumor cell ploidy and absolute allelic copy numbers for somatic copy number alterations (SCNAs) from tumor-normal WGS data by jointly modelling SCNAs and heterozygous germline single-nucleotide-variants (HGSNVs). Results from both in silico and real sequencing data demonstrated that Accurity is highly accurate and robust, even in low-purity, high-ploidy and low-coverage settings in which several existing methods perform poorly. Accounting for tumor purity and ploidy, Accurity significantly increased signal/noise gaps between different copy numbers. We are hopeful that Accurity is of clinical use for identifying cancer diagnostic biomarkers. Availability and implementation: Accurity is implemented in C++/Rust, available at http://www.yfish.org/software/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
DNA Copy Number Variations; Neoplasms/genetics; Ploidies; Software; Whole Genome Sequencing/methods; Algorithms; Computational Biology/methods; Computer Simulation; Germ-Line Mutation; High-Throughput Nucleotide Sequencing/methods; Humans
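
Why low purity makes copy numbers hard to separate can be seen from the standard tumor/normal mixture model of read depth. The sketch below is an illustration under stated assumptions, not Accurity's own model: normal cells are assumed diploid, and the function name `expected_depth_ratio` and the example values are hypothetical.

```python
def expected_depth_ratio(copy_number, purity, tumor_ploidy):
    """Expected tumor/normal read-depth ratio for one genomic segment.

    Assumes a mixture of tumor cells (fraction `purity`, average ploidy
    `tumor_ploidy`) and diploid normal cells, with depth proportional to
    the amount of DNA contributed by each component.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    segment_dna = purity * copy_number + (1.0 - purity) * 2.0
    average_dna = purity * tumor_ploidy + (1.0 - purity) * 2.0
    return segment_dna / average_dna


# Gap in depth ratio between copy number 3 and copy number 2 for a
# diploid tumor at several purities (values chosen for illustration).
for purity in (0.9, 0.5, 0.1):
    gap = (expected_depth_ratio(3, purity, 2.0)
           - expected_depth_ratio(2, purity, 2.0))
    print(f"purity={purity:.1f} gap={gap:.3f}")
```

Under this model the gap between adjacent copy numbers shrinks in proportion to purity, which is why noise at low coverage can swamp the signal in low-purity samples.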
5.
Nat Genet; 49(12): 1714-1721, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29083405

ABSTRACT

By analyzing multitissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalog of expression quantitative trait loci (eQTLs) in a nonhuman primate model. This catalog contains more genome-wide significant eQTLs per sample than comparable human resources and identifies sex- and age-related expression patterns. Findings include a master regulatory locus that likely has a role in immune function and a locus regulating hippocampal long noncoding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders.


Subjects
Chlorocebus aethiops/genetics; Gene Expression Profiling; Genetic Variation; Quantitative Trait Loci/genetics; Animals; Brain/growth & development; Brain/metabolism; Chlorocebus aethiops/growth & development; Genome-Wide Association Study; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide
6.
BMC Biol; 13: 41, 2015 Jun 20.
Article in English | MEDLINE | ID: mdl-26092298

ABSTRACT

BACKGROUND: We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available. RESULTS: We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a further pruned subset of the linkage panel to generate multipoint identity by descent matrices. CONCLUSIONS: The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.


Subjects
Chlorocebus aethiops/genetics; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Animals; Chromosome Mapping; Female; Genome-Wide Association Study; Genotype; Humans; Male; Microsatellite Repeats; Phenotype; Quantitative Trait Loci; Sequence Analysis
7.
PLoS Genet; 8(9): e1002923, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22969436

ABSTRACT

Understanding the mechanism of cadmium (Cd) accumulation in plants is important to help reduce its potential toxicity to both plants and humans through dietary and environmental exposure. Here, we report on a study to uncover the genetic basis underlying natural variation in Cd accumulation in a world-wide collection of 349 wild-collected Arabidopsis thaliana accessions. We identified a 4-fold variation (0.5-2 µg Cd g⁻¹ dry weight) in leaf Cd accumulation when these accessions were grown in a controlled common garden. By combining genome-wide association mapping, linkage mapping in an experimental F2 population, and transgenic complementation, we reveal that HMA3 is the sole major locus responsible for the variation in leaf Cd accumulation we observe in this diverse population of A. thaliana accessions. Analysis of the predicted amino acid sequence of HMA3 from 149 A. thaliana accessions reveals the existence of 10 major natural protein haplotypes. Association of these haplotypes with leaf Cd accumulation and genetic complementation experiments indicate that 5 of these haplotypes are active and 5 are inactive, and that elevated leaf Cd accumulation is associated with the reduced function of HMA3 caused by a nonsense mutation and polymorphisms that change two specific amino acids.


Subjects
Adenosine Triphosphatases/metabolism; Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Plant Leaves/metabolism; Adenosine Triphosphatases/genetics; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Cadmium; Genome-Wide Association Study; Plant Roots/metabolism; Plant Shoots/metabolism; Polymorphism, Single Nucleotide; Quantitative Trait Loci
8.
Nat Genet; 44(2): 212-6, 2012 Jan 08.
Article in English | MEDLINE | ID: mdl-22231484

ABSTRACT

Arabidopsis thaliana is native to Eurasia and is naturalized across the world. Its ability to be easily propagated and its high phenotypic variability make it an ideal model system for functional, ecological and evolutionary genetics. To date, analyses of the natural genetic variation of A. thaliana have involved small numbers of individual plants or genetic markers. Here we genotype 1,307 worldwide accessions, including several regional samples, using a 250K SNP chip. This allowed us to produce a high-resolution description of the global pattern of genetic variation. We applied three complementary selection tests and identified new targets of selection. Further, we characterized the pattern of historical recombination in A. thaliana and observed an enrichment of hotspots in its intergenic regions and repetitive DNA, which is consistent with the pattern that is observed for humans but which is strikingly different from that observed in other plant species. We have made the seeds we used to produce this Regional Mapping (RegMap) panel publicly available. This panel comprises one of the largest genomic mapping resources currently available for global natural isolates of a non-human species.


Subjects
Arabidopsis/genetics; Genetic Variation; Genome, Plant; Chromosome Mapping; Genotype; Geography; Polymorphism, Single Nucleotide; Recombination, Genetic; Selection, Genetic
9.
Plant Cell; 24(12): 4793-805, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23277364

ABSTRACT

Arabidopsis thaliana is an important model organism for understanding the genetics and molecular biology of plants. Its highly selfing nature, small size, short generation time, small genome size, and wide geographic distribution make it an ideal model organism for understanding natural variation. Genome-wide association studies (GWAS) have proven a useful technique for identifying genetic loci responsible for natural variation in A. thaliana. Previously genotyped accessions (natural inbred lines) can be grown in replicate under different conditions and phenotyped for different traits. These important features greatly simplify association mapping of traits and allow for systematic dissection of the genetics of natural variation by the entire A. thaliana community. To facilitate this, we present GWAPP, an interactive Web-based application for conducting GWAS in A. thaliana. Using an efficient implementation of a linear mixed model, traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped with a mixed model and other methods in just a couple of minutes. GWAPP features an extensive, interactive, and user-friendly interface that includes interactive Manhattan plots and linkage disequilibrium plots. It also facilitates exploratory data analysis by implementing features such as the inclusion of candidate polymorphisms in the model as cofactors.


Subjects
Arabidopsis/genetics; Genome-Wide Association Study/methods; Internet; Linkage Disequilibrium/genetics; Software; User-Computer Interface
10.
Database (Oxford); 2011: bar014, 2011.
Article in English | MEDLINE | ID: mdl-21609965

ABSTRACT

With large-scale genomic data becoming the norm in biological studies, the storing, integrating, viewing and searching of such data have become a major challenge. In this article, we describe the development of an Arabidopsis thaliana database that hosts the geographic information and genetic polymorphism data for over 6000 accessions and genome-wide association study (GWAS) results for 107 phenotypes representing the largest collection of Arabidopsis polymorphism data and GWAS results to date. Taking advantage of a series of the latest web 2.0 technologies, such as Ajax (Asynchronous JavaScript and XML), GWT (Google-Web-Toolkit), MVC (Model-View-Controller) web framework and Object Relationship Mapper, we have created a web-based application (web app) for the database, that offers an integrated and dynamic view of geographic information, genetic polymorphism and GWAS results. Essential search functionalities are incorporated into the web app to aid reverse genetics research. The database and its web app have proven to be a valuable resource to the Arabidopsis community. The whole framework serves as an example of how biological data, especially GWAS, can be presented and accessed through the web. In the end, we illustrate the potential to gain new insights through the web app by two examples, showcasing how it can be used to facilitate forward and reverse genetics research. Database URL: http://arabidopsis.usc.edu/


Subjects
Arabidopsis/genetics; Computational Biology/methods; Genome, Plant/genetics; Genome-Wide Association Study/methods; Internet; Alleles; Databases, Genetic; Genotype; Geography; Phenotype; Polymorphism, Single Nucleotide/genetics; Principal Component Analysis
11.
PLoS Genet; 6(11): e1001193, 2010 Nov 11.
Article in English | MEDLINE | ID: mdl-21085628

ABSTRACT

The genetic model plant Arabidopsis thaliana, like many plant species, experiences a range of edaphic conditions across its natural habitat. Such heterogeneity may drive local adaptation, though the molecular genetic basis remains elusive. Here, we describe a study in which we used genome-wide association mapping, genetic complementation, and gene expression studies to identify cis-regulatory expression level polymorphisms at the AtHKT1;1 locus, encoding a known sodium (Na⁺) transporter, as being a major factor controlling natural variation in leaf Na⁺ accumulation capacity across the global A. thaliana population. A weak allele of AtHKT1;1 that drives elevated leaf Na⁺ in this population has been previously linked to elevated salinity tolerance. Inspection of the geographical distribution of this allele revealed its significant enrichment in populations associated with the coast and saline soils in Europe. The fixation of this weak AtHKT1;1 allele in these populations is genetic evidence supporting local adaptation to these potentially saline-impacted environments.


Subjects
Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Arabidopsis/genetics; Arabidopsis/metabolism; Cation Transport Proteins/genetics; Cation Transport Proteins/metabolism; Ecosystem; Genetic Variation; Seawater; Sodium/metabolism; Symporters/genetics; Symporters/metabolism; Alleles; Arabidopsis/growth & development; Gene Expression Regulation, Plant; Genetic Complementation Test; Genome, Plant/genetics; Genome-Wide Association Study; Geography; Plant Leaves/genetics; Plant Leaves/metabolism
12.
Proc Natl Acad Sci U S A; 107(22): 10302-7, 2010 Jun 01.
Article in English | MEDLINE | ID: mdl-20479233

ABSTRACT

The model plant Arabidopsis thaliana exhibits extensive natural variation in resistance to parasites. Immunity is often conferred by resistance (R) genes that permit recognition of specific races of a disease. The number of such R genes and their distribution are poorly understood. In this study, we investigated the basis for resistance to the downy mildew agent Hyaloperonospora arabidopsidis ex parasitica (Hpa) in a global sample of A. thaliana. We implemented a combined genome-wide mapping of resistance using populations of recombinant inbred lines and a collection of wild A. thaliana accessions. We tested the interaction between 96 host genotypes collected worldwide and five strains of Hpa. Then, a fraction of the species-wide resistance was genetically dissected using six recently constructed populations of recombinant inbred lines. We found that resistance is usually governed by single dominant R genes that are concentrated in four genomic regions only. We show that association genetics of resistance to diseases such as downy mildew enables increased mapping resolution from quantitative trait loci interval to candidate gene level. Association patterns in quantitative trait loci intervals indicate that the pool of A. thaliana resistance sources against the tested Hpa isolates may be predominantly confined to six RPP (Resistance to Hpa) loci isolated in previous studies. Our results suggest that combining association and linkage mapping could accelerate resistance gene discovery in plants.


Subjects
Arabidopsis/genetics; Arabidopsis/microbiology; Genome, Plant; Oomycetes/pathogenicity; Plant Diseases/genetics; Plant Diseases/microbiology; Chromosome Mapping; Genetic Variation; Genome-Wide Association Study; Quantitative Trait Loci
13.
Nature; 465(7298): 627-31, 2010 Jun 03.
Article in English | MEDLINE | ID: mdl-20336072

ABSTRACT

Although pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases, genome-wide association (GWA) studies have, owing to advances in genotyping and sequencing technology, become an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available, because once these lines have been genotyped they can be phenotyped multiple times, making it possible (as well as extremely cost effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly self-fertilizing model plant known to harbour considerable genetic variation for many adaptively important traits. Our results are dramatically different from those of human GWA studies, in that we identify many common alleles of major effect, but they are also, in many cases, harder to interpret because confounding by complex genetics and population structure make it difficult to distinguish true associations from false. However, a-priori candidates are significantly over-represented among these associations as well, making many of them excellent candidates for follow-up experiments. Our study demonstrates the feasibility of GWA studies in A. thaliana and suggests that the approach will be appropriate for many other organisms.


Subjects
Arabidopsis/classification; Arabidopsis/genetics; Genome, Plant/genetics; Genome-Wide Association Study; Phenotype; Alleles; Arabidopsis Proteins/genetics; Flowers/genetics; Genes, Plant/genetics; Genetic Loci/genetics; Genotype; Immunity, Innate/genetics; Inbreeding; Polymorphism, Single Nucleotide/genetics
14.
PLoS Genet; 6(2): e1000843, 2010 Feb 12.
Article in English | MEDLINE | ID: mdl-20169178

ABSTRACT

The population structure of an organism reflects its evolutionary history and influences its evolutionary trajectory. It constrains the combination of genetic diversity and reveals patterns of past gene flow. Understanding it is a prerequisite for detecting genomic regions under selection, predicting the effect of population disturbances, or modeling gene flow. This paper examines the detailed global population structure of Arabidopsis thaliana. Using a set of 5,707 plants collected from around the globe and genotyped at 149 SNPs, we show that while A. thaliana as a species self-fertilizes 97% of the time, there is considerable variation among local groups. This level of outcrossing greatly limits observed heterozygosity but is sufficient to generate considerable local haplotypic diversity. We also find that in its native Eurasian range A. thaliana exhibits continuous isolation by distance at every geographic scale without natural breaks corresponding to classical notions of populations. By contrast, in North America, where it exists as an exotic species, A. thaliana exhibits little or no population structure at a continental scale but local isolation by distance that extends hundreds of km. This suggests a pattern for the development of isolation by distance that can establish itself shortly after an organism fills a new habitat range. It also raises questions about the general applicability of many standard population genetics models. Any model based on discrete clusters of interchangeable individuals will be an uneasy fit to organisms like A. thaliana which exhibit continuous isolation by distance on many scales.


Subjects
Arabidopsis/genetics; Alleles; Crosses, Genetic; Geography; Haplotypes/genetics; Heterozygote; Inbreeding; Population Dynamics