1.
Mol Biol Evol ; 2022 Aug 06.
Article in English | MEDLINE | ID: mdl-35932227

ABSTRACT

We present the fifth edition of the TimeTree of Life resource (TToL5), a product of the timetree of life project that aims to synthesize published molecular timetrees and make evolutionary knowledge easily accessible to all. Using the TToL5 web portal, users can retrieve published studies and divergence times between species, the timeline of a species' evolution beginning with the origin of life, and the timetree for a given evolutionary group at the desired taxonomic rank. TToL5 contains divergence time information on 137,306 species, 41% more than the previous edition. The TToL5 web interface is now ADA-compliant and mobile-friendly, a result of comprehensive source code refactoring. TToL5 also offers programmatic access to species divergence times and timelines through an application programming interface, which is accessible at timetree.temple.edu/api. TToL5 is publicly available at timetree.org.

2.
Bioinformatics ; 38(10): 2719-2726, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561179

ABSTRACT

MOTIVATION: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features. RESULTS: We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern. 
AVAILABILITY AND IMPLEMENTATION: TopHap is available at https://github.com/SayakaMiura/TopHap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
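
The haplotype-building step described in this abstract can be illustrated with a small sketch. It is not the TopHap implementation: the function name, the input layout (equal-length aligned sequences plus a reference), and both frequency cutoffs are assumptions made for the example, and the spatiotemporal slicing mentioned in the abstract is left out for brevity. The sketch keeps only positions where a non-reference base is common, then counts the haplotypes formed by those positions and keeps the common ones.

```python
from collections import Counter


def common_haplotypes(genomes, reference, min_variant_freq=0.05, min_haplotype_freq=0.05):
    """Return (positions, haplotype counts) for common variants.

    genomes   -- list of aligned sequences, all the same length as reference
    reference -- reference sequence used to decide what counts as a variant
    Both cutoffs are illustrative assumptions, not values from the paper.
    """
    n = len(genomes)
    # Keep positions where the non-reference fraction reaches the cutoff.
    positions = [
        i for i, ref_base in enumerate(reference)
        if sum(1 for g in genomes if g[i] != ref_base) / n >= min_variant_freq
    ]
    # A haplotype is the string of bases at the retained positions.
    counts = Counter("".join(g[i] for i in positions) for g in genomes)
    # Keep haplotypes that are themselves common in the collection.
    common = {h: c for h, c in counts.items() if c / n >= min_haplotype_freq}
    return positions, common


genomes = ["AAAA"] * 4 + ["ACAA"] * 3 + ["ACGA"] * 2 + ["AAAT"]
positions, common = common_haplotypes(genomes, "AAAA", 0.15, 0.25)
print(positions, common)
```

Restricting the alignment to common-variant positions is what reduces noise from rare sequencing errors; a tree would then be built on the retained haplotypes rather than on every genome.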


Subjects
COVID-19, SARS-CoV-2, Viral Genome, Haplotypes, Humans, Mutation, Phylogeny, SARS-CoV-2/genetics
3.
PLoS Comput Biol ; 18(4): e1010006, 2022 04.
Article in English | MEDLINE | ID: mdl-35389981

ABSTRACT

Many pathogenic missense mutations are found in protein positions that are neither well conserved nor located in any known functional domain. Consequently, we lack any mechanistic underpinning of dysfunction caused by such mutations. We explored the disruption of allosteric dynamic coupling between these positions and the known functional sites as a possible mechanism for pathogenesis. In this study, we present an analysis of 591 pathogenic missense variants in 144 human enzymes that suggests that allosteric dynamic coupling of mutated positions with known active sites is a plausible biophysical mechanism and evidence of their functional importance. We illustrate this mechanism in a case study of β-Glucocerebrosidase (GCase) in which a vast majority of 94 sites harboring Gaucher disease-associated missense variants are located some distance away from the active site. An analysis of the conformational dynamics of GCase suggests that mutations on these distal sites cause changes in the flexibility of active site residues despite their distance, indicating a dynamic communication network throughout the protein. The disruption of the long-distance dynamic coupling caused by missense mutations may provide a plausible general mechanistic explanation for biological dysfunction and disease.


Subjects
Missense Mutation, Proteins, Catalytic Domain/genetics, Humans, Mutation, Missense Mutation/genetics, Proteins/chemistry
4.
Mol Biol Evol ; 38(8): 3046-3059, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33942847

ABSTRACT

Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).


Subjects
COVID-19/genetics, SARS-CoV-2/genetics, Biological Evolution, COVID-19/metabolism, Computational Biology/methods, Contact Tracing/methods, Molecular Evolution, Viral Genome, Humans, Mutation, Pandemics, Phylogeny, SARS-CoV-2/metabolism, SARS-CoV-2/pathogenicity, DNA Sequence Analysis/methods
5.
Bioinformatics ; 37(8): 1125-1134, 2021 05 23.
Article in English | MEDLINE | ID: mdl-33135051

ABSTRACT

MOTIVATION: Expression quantitative trait loci (eQTL) harbor genetic variants modulating gene transcription. Fine mapping of regulatory variants at these loci is a daunting task due to the juxtaposition of causal and linked variants at a locus as well as the likelihood of interactions among multiple variants. This problem is exacerbated in genes with multiple cis-acting eQTL, where superimposed effects of adjacent loci further distort the association signals. RESULTS: We developed a novel algorithm, TreeMap, that identifies putative causal variants in cis-eQTL accounting for multisite effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants. Via extensive simulations, we show that TreeMap detects co-regulating variants more accurately than current methods. Furthermore, its high computational efficiency enables genome-wide analysis of long-range eQTL. We applied TreeMap to GTEx data of brain hippocampus samples and transverse colon samples to search for eQTL in gene bodies and in 4 Mbps gene-flanking regions, discovering numerous distal eQTL. Furthermore, we found concordant distal eQTL that were present in both brain and colon samples, implying long-range regulation of gene expression. AVAILABILITY AND IMPLEMENTATION: TreeMap is available as an R package enabled for parallel processing at https://github.com/liliulab/treemap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study, Quantitative Trait Loci, Chromosome Mapping, Colon, Gene Expression, Hippocampus, Humans, Linkage Disequilibrium, Single Nucleotide Polymorphism, Quantitative Trait Loci/genetics
6.
Bioinformatics ; 36(Suppl_2): i675-i683, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381835

ABSTRACT

SUMMARY: Metastases cause a vast majority of cancer morbidity and mortality. Metastatic clones are formed by dispersal of cancer cells to secondary tissues, and are not medically detected or visible until later stages of cancer development. Clone phylogenies within patients provide a means of tracing the otherwise inaccessible dynamic history of migrations of cancer cells. Here, we present a new Bayesian approach, PathFinder, for reconstructing the routes of cancer cell migrations. PathFinder uses the clone phylogeny, the number of mutational differences among clones, and the information on the presence and absence of observed clones in primary and metastatic tumors. By analyzing simulated datasets, we found that PathFinder performs well in reconstructing clone migrations from the primary tumor to new metastases as well as between metastases. It was more challenging to trace migrations from metastases back to primary tumors. We found that a vast majority of errors can be corrected by sampling more clones per tumor, and by increasing the number of genetic variants assayed per clone. We also identified situations in which phylogenetic approaches alone are not sufficient to reconstruct migration routes. In conclusion, we anticipate that the use of PathFinder will enable a more reliable inference of migration histories and their posterior probabilities, which is required to assess the relative preponderance of seeding of new metastasis by clones from primary tumors and/or existing metastases. AVAILABILITY AND IMPLEMENTATION: PathFinder is available on the web at https://github.com/SayakaMiura/PathFinder.


Subjects
Neoplasms, Bayes Theorem, Clone Cells, Humans, Mutation, Neoplasms/genetics, Phylogeny
7.
Mol Biol Evol ; 35(8): 2015-2025, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29846678

ABSTRACT

The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which implies that adaptation through protein sequence change is an extremely rare phenomenon in human evolution. Alternatively, existing methods may lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between allelic evolutionary probabilities and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, which suggest that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored nonadaptive mechanisms to explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many nonadaptive hypotheses were tested, but failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the fact that a vast majority of adaptive missense variants discovered previously in humans are CAPs, and hundreds of CAP alleles are protective in genotype-phenotype association data. Our integrated phylogenomic and population genetic EPA approach predicts the existence of thousands of nonneutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA approach can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
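
The discordance idea in this abstract, an allele that is evolutionarily improbable yet common in the population, can be sketched in a few lines. This is only an illustration and not the authors' EPA code: the `Allele` record, the function name, and both cutoffs are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Allele:
    name: str
    ep: float    # evolutionary probability estimated from a multispecies alignment
    freq: float  # observed allele frequency in the population


def candidate_adaptive(alleles, max_ep=0.05, min_freq=0.05):
    """Flag alleles with low evolutionary probability but high observed frequency.

    The cutoffs max_ep and min_freq are illustrative assumptions.
    """
    return [a.name for a in alleles if a.ep < max_ep and a.freq > min_freq]


alleles = [
    Allele("A1", ep=0.01, freq=0.40),   # improbable yet common: flagged
    Allele("A2", ep=0.60, freq=0.40),   # probable and common: concordant
    Allele("A3", ep=0.01, freq=0.001),  # improbable and rare: concordant
]
print(candidate_adaptive(alleles))
```

Only the first allele is discordant in this toy input; the other two behave as a neutral expectation would predict.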


Subjects
Biological Adaptation, Biological Evolution, Exome, Human Genome, Genetic Polymorphism, Gene Conversion, Humans, Phylogeny
8.
Mol Biol Evol ; 33(1): 245-54, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26464126

ABSTRACT

Widespread sequencing efforts are revealing an unprecedented amount of genomic variation in populations. Such information is routinely used to derive consensus reference sequences and to infer positions subject to natural selection. Here, we present a new molecular evolutionary method for estimating neutral evolutionary probabilities (EPs) of each amino acid or nucleotide state at a genomic position without using intraspecific polymorphism data. Because EPs are derived independently of population-level information, they serve as null expectations that can be used to evaluate selective forces on alleles at both polymorphic and monomorphic positions in populations. We applied this method to coding sequences in the human genome and produced a comprehensive evolutionary variome reference for all human proteins. We found that EPs accurately predict neutral and disease-associated alleles. Through an analysis of discordance between allelic EPs and their observed population frequencies, we discovered thousands of novel candidate sites for nonneutral evolution in human proteins. Many of these were validated in a joint analysis of disease-associated variants and population data. The EP method is also directly applicable to the analysis of noncoding sequences and genomic analyses of nonmodel species.


Subjects
Molecular Evolution, Genetic Variation/genetics, Genome/genetics, Genomics/methods, Biological Adaptation/genetics, Disease/genetics, Humans, Mutation/genetics, Phylogeny
9.
Bioinformatics ; 30(9): 1305-7, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24413669

ABSTRACT

Computational diagnosis of amino acid variants in the human exome is the first step in assessing the disruptive impacts of non-synonymous single nucleotide variants (nsSNVs) on human health and disease. The Molecular Evolutionary Genetics Analysis software with mutational diagnosis (MEGA-MD) is a suite of tools developed to forecast the deleteriousness of nsSNVs using multiple methods and to explore nsSNVs in the context of the variability permitted in the long-term evolution of the affected position. In its graphical interface for use on desktops, it enables interactive computational diagnosis and evolutionary exploration of nsSNVs. As a web service, MEGA-MD is suitable for diagnosing variants on an exome scale. The MEGA-MD suite intends to serve the needs for conducting low- and high-throughput analysis of nsSNVs in diverse applications.


Subjects
Amino Acids/genetics, Molecular Evolution, High-Throughput Nucleotide Sequencing/methods, Mutation, Animals, Exome, Humans, Software
10.
Mol Biol Evol ; 29(9): 2087-94, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22389448

ABSTRACT

Genome-wide disease association studies contrast genetic variation between disease cohorts and healthy populations to discover single nucleotide polymorphisms (SNPs) and other genetic markers revealing underlying genetic architectures of human diseases. Despite scores of efforts over the past decade, many reproducible genetic variants that explain substantial proportions of the heritable risk of common human diseases remain undiscovered. We have conducted a multispecies genomic analysis of 5,831 putative human risk variants for more than 230 disease phenotypes reported in 2,021 studies. We find that the current approaches show a propensity for discovering disease-associated SNPs (dSNPs) at conserved genomic positions because the effect size (odds ratio) and allelic P value of genetic association of an SNP relate strongly to the evolutionary conservation of its genomic position. We propose a new measure for ranking SNPs that integrates evolutionary conservation scores and the P value (E-rank). Using published data from a large case-control study, we demonstrate that the E-rank method prioritizes SNPs with a greater likelihood of bona fide and reproducible genetic disease associations, many of which may explain greater proportions of genetic variance. Therefore, long-term evolutionary histories of genomic positions offer key practical utility in reassessing data from existing disease association studies, and in the design and analysis of future studies aimed at revealing the genetic basis of common human diseases.


Subjects
Molecular Evolution, Genetic Predisposition to Disease, Genome-Wide Association Study, Animals, Genetic Markers, Genotype, Humans, Single Nucleotide Polymorphism, Quantitative Trait Loci, ROC Curve
11.
Comput Struct Biotechnol J ; 21: 3894-3903, 2023.
Article in English | MEDLINE | ID: mdl-37602230

ABSTRACT

The study of tumor evolution is being revolutionized by single-cell sequencing technologies that survey the somatic variation of cancer cells. In these endeavors, reliable inference of the evolutionary relationship of single cells is a key step. However, single-cell sequences contain many errors and missing bases, which necessitate advancing standard molecular phylogenetics approaches for applications in analyzing these datasets. We have developed a computational approach that integratively applies standard phylogenetic optimality principles and patterns of co-occurrence of sequence variations to produce more expansive and accurate cellular phylogenies from single-cell sequence datasets. We found the new approach to also perform well for CRISPR/Cas9 genome editing datasets, suggesting that it can be useful for various applications. We apply the new approach to some empirical datasets to showcase its use for reconstructing recurrent mutations and mutational reversals as well as for phylodynamics analysis to infer metastatic cell migrations between tumors.

12.
Front Bioinform ; 3: 1090730, 2023.
Article in English | MEDLINE | ID: mdl-37261293

ABSTRACT

Bulk sequencing is commonly used to characterize the genetic diversity of cancer cell populations in tumors and the evolutionary relationships of cancer clones. However, bulk sequencing produces aggregate information on nucleotide variants and their sample frequencies, necessitating computational methods to predict distinct clone sequences and their frequencies within a sample. Interestingly, no methods are available to measure the statistical confidence in the variants assigned to inferred clones. We introduce a bootstrap resampling approach that combines clone prediction and statistical confidence calculation for every variant assignment. Analysis of computer-simulated datasets showed the bootstrap approach to work well in assessing the reliability of predicted clones as well as downstream inferences using the predicted clones (e.g., mapping metastatic migration paths). We found that only a fraction of inferences have good bootstrap support, which means that many inferences are tentative for real data. Using the bootstrap approach, we analyzed empirical datasets from metastatic cancers and placed bootstrap confidence on the estimated number of mutations involved in cell migration events. We found that the numbers of driver mutations involved in metastatic cell migration events sourced from primary tumors are similar to those where metastatic tumors are the source of new metastases. So, mutations with driver potential seem to keep arising during metastasis. The bootstrap approach developed in this study is implemented in software available at https://github.com/SayakaMiura/CloneFinderPlus.
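
The general idea of attaching bootstrap confidence to a variant-to-clone assignment can be shown with a toy sketch. It does not reproduce CloneFinderPlus: the nearest-frequency assignment rule, the read-level resampling scheme, the function names, and the example numbers are all assumptions made for illustration. Reads covering one variant are resampled with replacement, the variant is reassigned each time, and support is the fraction of replicates that agree with the original assignment.

```python
import random


def assign_clone(vaf, clone_freqs):
    """Toy rule (an assumption): pick the clone whose frequency is closest to the VAF."""
    return min(range(len(clone_freqs)), key=lambda k: abs(vaf - clone_freqs[k]))


def bootstrap_support(reads, clone_freqs, n_boot=1000, seed=0):
    """Return (assigned clone index, bootstrap support) for one variant.

    reads       -- list of 1 (variant read) and 0 (reference read)
    clone_freqs -- hypothetical expected variant frequencies, one per clone
    """
    rng = random.Random(seed)
    original = assign_clone(sum(reads) / len(reads), clone_freqs)
    agree = 0
    for _ in range(n_boot):
        # Resample the reads with replacement and repeat the assignment.
        sample = rng.choices(reads, k=len(reads))
        if assign_clone(sum(sample) / len(sample), clone_freqs) == original:
            agree += 1
    return original, agree / n_boot


reads = [1] * 50 + [0] * 50  # observed variant allele frequency of 0.5
clone, support = bootstrap_support(reads, clone_freqs=[0.1, 0.5])
print(clone, support)
```

A variant whose observed frequency sits midway between two clone frequencies would receive much lower support under this scheme, which is the kind of tentative inference the abstract refers to.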

13.
Cancers (Basel) ; 14(17)2022 Sep 04.
Article in English | MEDLINE | ID: mdl-36077861

ABSTRACT

Dispersal routes of metastatic cells are not medically detected or even visible. A molecular evolutionary analysis of tumor variation provides a way to retrospectively infer metastatic migration histories and answer questions such as whether the majority of metastases are seeded from clones within primary tumors or seeded from clones within pre-existing metastases, as well as whether the evolution of metastases is generally consistent with any proposed models. We seek answers to these fundamental questions through a systematic patient-centric retrospective analysis that maps the dynamic evolutionary history of tumor cell migrations in many cancers. We analyzed tumor genetic heterogeneity in 51 cancer patients and found that most metastatic migration histories were best described by a hybrid of models of metastatic tumor evolution. Synthesizing across metastatic migration histories, we found new tumor seedings arising from clones of pre-existing metastases as often as they arose from clones from primary tumors. There were also many clone exchanges between the source and recipient tumors. Therefore, a molecular phylogenetic analysis of tumor variation provides a retrospective glimpse into general patterns of metastatic migration histories in cancer patients.

14.
bioRxiv ; 2021 Dec 14.
Article in English | MEDLINE | ID: mdl-34931186

ABSTRACT

MOTIVATION: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of SARS-CoV-2 strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites and millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate phylogenetic inference of resolvable phylogenetic features. RESULTS: We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. To assess topological robustness, we develop a bootstrap resampling strategy that resamples genomes spatiotemporally. The application of TopHap to build a phylogeny of 68,057 genomes (68KG) produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major variants of concern. AVAILABILITY: TopHap is available on the web at https://github.com/SayakaMiura/TopHap . CONTACT: s.kumar@temple.edu.

15.
bioRxiv ; 2021 Jan 19.
Article in English | MEDLINE | ID: mdl-32995781

ABSTRACT

We report the likely most recent common ancestor of SARS-CoV-2 - the coronavirus that causes COVID-19. This progenitor SARS-CoV-2 genome was recovered through a novel application and advancement of computational methods initially developed to reconstruct the mutational history of tumor cells in a patient. The progenitor differs from the earliest coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the USA harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide as soon as weeks after the first reported cases of COVID-19. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains, which have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic. There have been multiple replacements of predominant coronavirus strains in Europe and Asia and the continued presence of multiple high-frequency strains in Asia and North America. We provide a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).

17.
Cell Rep ; 32(4): 107949, 2020 07 28.
Article in English | MEDLINE | ID: mdl-32726638

ABSTRACT

Long-lived rodents have become an attractive model for the studies on aging. To understand evolutionary paths to long life, we prepare chromosome-level genome assemblies of the two longest-lived rodents, Canadian beaver (Castor canadensis) and naked mole rat (NMR, Heterocephalus glaber), which were scaffolded with in vitro proximity ligation and chromosome conformation capture data and complemented with long-read sequencing. Our comparative genomic analyses reveal that amino acid substitutions at "disease-causing" sites are widespread in the rodent genomes and that identical substitutions in long-lived rodents are associated with common adaptive phenotypes, e.g., enhanced resistance to DNA damage and cellular stress. By employing a newly developed substitution model and likelihood ratio test, we find that energy and fatty acid metabolism pathways are enriched for signals of positive selection in both long-lived rodents. Thus, the high-quality genome resource of long-lived rodents can assist in the discovery of genetic factors that control longevity and adaptive evolution.


Subjects
Longevity/genetics, Mole Rats/genetics, Rodentia/genetics, Aging/genetics, Animals, Genome/genetics, Animal Models, Species Specificity, Transcriptome/genetics
18.
Curr Biol ; 30(22): 4329-4341.e4, 2020 11 16.
Article in English | MEDLINE | ID: mdl-32888484

ABSTRACT

Naked mole-rats are highly vocal, eusocial, subterranean rodents with, counterintuitively, poor hearing. The causes underlying their altered hearing are unknown. Moreover, whether altered hearing is degenerate or adaptive to their unique lifestyles is controversial. We used various methods to identify the factors contributing to altered hearing in naked and the related Damaraland mole-rats and to examine whether these alterations result from relaxed or adaptive selection. Remarkably, we found that cochlear amplification was absent from both species despite normal prestin function in outer hair cells isolated from naked mole-rats. Instead, loss of cochlear amplification appears to result from abnormal hair bundle morphologies observed in both species. By exploiting a well-curated deafness phenotype-genotype database, we identified amino acid substitutions consistent with abnormal hair bundle morphology and reduced hearing sensitivity. Amino acid substitutions were found in unique groups of six hair bundle link proteins. Molecular evolutionary analyses revealed shifts in selection pressure at both the gene and the codon level for five of these six hair bundle link proteins. Substitutions in three of these proteins are associated exclusively with altered hearing. Altogether, our findings identify the likely mechanism of altered hearing in African mole-rats, making them the only identified mammals naturally lacking cochlear amplification. Moreover, our findings suggest that altered hearing in African mole-rats is adaptive, perhaps tailoring hearing to eusocial and subterranean lifestyles. Finally, our work reveals multiple, unique evolutionary trajectories in African mole-rat hearing and establishes species members as naturally occurring disease models to investigate human hearing loss.


Subjects
Physiological Adaptation/genetics, Deafness/genetics, Molecular Evolution, Hearing/genetics, Mole Rats/physiology, Africa, Amino Acid Substitution, Animals, Auditory Hair Cells/physiology, Auditory Hair Cells/ultrastructure, Scanning Electron Microscopy, Genetic Selection
19.
Nat Commun ; 10(1): 330, 2019 01 18.
Article in English | MEDLINE | ID: mdl-30659175

ABSTRACT

Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate if the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. A significant disconnect is found to exist between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists.


Subjects
Disease/genetics, Statistical Models, Untranslated RNA/genetics, Algorithms, Animals, Computational Biology, Molecular Evolution, Genome-Wide Association Study, Humans, Machine Learning, Mammals/genetics, Single Nucleotide Polymorphism
20.
G3 (Bethesda) ; 7(8): 2791-2797, 2017 08 07.
Article in English | MEDLINE | ID: mdl-28667017

ABSTRACT

Gene expression patterns assayed across development can offer key clues about a gene's function and regulatory role. Drosophila melanogaster is ideal for such investigations as multiple individual and high-throughput efforts have captured the spatiotemporal patterns of thousands of embryonic expressed genes in the form of in situ images. FlyExpress (www.flyexpress.net), a knowledgebase based on a massive and unique digital library of standardized images and a simple search engine to find coexpressed genes, was created to facilitate the analytical and visual mining of these patterns. Here, we introduce the next generation of FlyExpress resources to facilitate the integrative analysis of sequence data and spatiotemporal patterns of expression from images. FlyExpress 7 now includes over 100,000 standardized in situ images and implements a more efficient, user-defined search algorithm to identify coexpressed genes via Genomewide Expression Maps (GEMs). Shared motifs found in the upstream 5' regions of any pair of coexpressed genes can be visualized in an interactive dotplot. Additional webtools and link-outs to assist in the downstream validation of candidate motifs are also provided. Together, FlyExpress 7 represents our largest effort yet to accelerate discovery via the development and dispersal of new webtools that allow researchers to perform data-driven analyses of coexpression (image) and genomic (sequence) data.


Subjects
Drosophila melanogaster/genetics, Gene Expression Regulation, Three-Dimensional Imaging, In Situ Hybridization, Software, Animals, Binding Sites/genetics, Conserved Sequence/genetics, Insect Genome, Transcription Factors/metabolism