Results 1 - 20 of 46
1.
J Evol Biol ; 36(10): 1525-1538, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37776088

ABSTRACT

Populations suffer two types of stochasticity: demographic stochasticity, from sampling error in offspring number, and environmental stochasticity, from temporal variation in the growth rate. By modelling evolution through phenotypic selection following an abrupt environmental change, we investigate how genetic and demographic dynamics, as well as effects on population survival of the genetic variance and of the strength of stabilizing selection, differ under the two types of stochasticity. We show that population survival probability declines sharply with stronger stabilizing selection under demographic stochasticity, but declines more continuously when environmental stochasticity is strengthened. However, the genetic variance that confers the highest population survival probability differs little under demographic and environmental stochasticity. Since the influence of demographic stochasticity is stronger when population size is smaller, a slow initial decline of genetic variance, which allows quicker evolution, is important for population persistence. In contrast, the influence of environmental stochasticity is population-size-independent, so higher initial fitness becomes important for survival under strong environmental stochasticity. The two types of stochasticity interact in a more than multiplicative way in reducing the population survival probability. Our work suggests the importance of explicitly distinguishing and measuring the forms of stochasticity during evolutionary rescue.

2.
Syst Biol ; 71(6): 1290-1306, 2022 10 12.
Article in English | MEDLINE | ID: mdl-35285502

ABSTRACT

Morphology remains a primary source of phylogenetic information for many groups of organisms, and the only one for most fossil taxa. Organismal anatomy is not a collection of randomly assembled and independent "parts", but instead a set of dependent and hierarchically nested entities resulting from ontogeny and phylogeny. How do we make sense of these dependent and at times redundant characters? One promising approach is using ontologies, structured controlled vocabularies that summarize knowledge about different properties of anatomical entities, including developmental and structural dependencies. Here, we assess whether evolutionary patterns can explain the proximity of ontology-annotated characters within an ontology. To do so, we measure phylogenetic information across characters and evaluate if it matches the hierarchical structure given by ontological knowledge, in much the same way as across-species diversity structure is given by phylogeny. We implement an approach to evaluate the Bayesian phylogenetic information (BPI) content and phylogenetic dissonance among ontology-annotated anatomical data subsets. We applied this to data sets representing two disparate animal groups: bees (Hexapoda: Hymenoptera: Apoidea, 209 chars) and characiform fishes (Actinopterygii: Ostariophysi: Characiformes, 463 chars). For bees, we find that BPI is not substantially explained by anatomy since dissonance is often high among morphologically related anatomical entities. For fishes, we find substantial information for two clusters of anatomical entities instantiating concepts from the jaws and branchial arch bones, but among-subset information decreases and dissonance increases substantially moving to higher-level subsets in the ontology. We further applied our approach to address particular evolutionary hypotheses with an example of morphological evolution in miniature fishes. 
While we show that phylogenetic information does match ontology structure for some anatomical entities, additional relationships and processes, such as convergence, likely play a substantial role in explaining BPI and dissonance, and merit future investigation. Our work demonstrates how complex morphological data sets can be interrogated with ontologies by allowing one to access how information is spread hierarchically across anatomical concepts, how congruent this information is, and what sorts of processes may play a role in explaining it: phylogeny, development, or convergence. [Apidae; Bayesian phylogenetic information; Ostariophysi; Phenoscape; phylogenetic dissonance; semantic similarity.].


Subjects
Arthropods; Characiformes; Animals; Bayes Theorem; Fossils; Phylogeny
3.
Syst Biol ; 69(2): 345-362, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31596473

ABSTRACT

There is a growing body of research on the evolution of anatomy in a wide variety of organisms. Discoveries in this field could be greatly accelerated by computational methods and resources that enable these findings to be compared across different studies and different organisms and linked with the genes responsible for anatomical modifications. Homology is a key concept in comparative anatomy; two important types are historical homology (the similarity of organisms due to common ancestry) and serial homology (the similarity of repeated structures within an organism). We explored how to most effectively represent historical and serial homology across anatomical structures to facilitate computational reasoning. We assembled a collection of homology assertions from the literature with a set of taxon phenotypes for the skeletal elements of vertebrate fins and limbs from the Phenoscape Knowledgebase. Using seven competency questions, we evaluated the reasoning ramifications of two logical models: the Reciprocal Existential Axioms (REA) homology model and the Ancestral Value Axioms (AVA) homology model. The AVA model returned all user-expected results in addition to the search term and any of its subclasses. The AVA model also returns any superclass of the query term in which a homology relationship has been asserted. The REA model returned the user-expected results for five out of seven queries. We identify some challenges of implementing complete homology queries due to limitations of OWL reasoning. This work lays the foundation for homology reasoning to be incorporated into other ontology-based tools, such as those that enable synthetic supermatrix construction and candidate gene discovery. [Homology; ontology; anatomy; morphology; evolution; knowledgebase; phenoscape.].


Subjects
Classification/methods; Models, Biological; Animal Fins/anatomy & histology; Animals; Extremities/anatomy & histology; Vertebrates/anatomy & histology
4.
Res Policy ; 50(1): 104069, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33390628

ABSTRACT

Synthesis centers are a form of scientific organization that catalyzes and supports research that integrates diverse theories, methods and data across spatial or temporal scales to increase the generality, parsimony, applicability, or empirical soundness of scientific explanations. Synthesis working groups are a distinctive form of scientific collaboration that produce consequential, high-impact publications. But no one has asked if synthesis working groups synthesize: are their publications substantially more diverse than others, and if so, in what ways and with what effect? We investigate these questions by using Latent Dirichlet Analysis to compare the topical diversity of papers published by synthesis center collaborations with that of papers in a reference corpus. Topical diversity was operationalized and measured in several ways, both to reflect aggregate diversity and to emphasize particular aspects of diversity (such as variety, evenness, and balance). Synthesis center publications have greater topical variety and evenness, but less disparity, than do papers in the reference corpus. The influence of synthesis center origins on aspects of diversity is only partly mediated by the size and heterogeneity of collaborations: when taking into account the numbers of authors, distinct institutions, and references, synthesis center origins retain a significant direct effect on diversity measures. Controlling for the size and heterogeneity of collaborative groups, synthesis center origins and diversity measures significantly influence the visibility of publications, as indicated by citation measures. We conclude by suggesting social processes within collaborations that might account for the observed effects, by inviting further exploration of what this novel textual analysis approach might reveal about interdisciplinary research, and by offering some practical implications of our results.
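The abstract names variety and evenness as aspects of topical diversity but does not give formulas. As a minimal sketch, assuming each paper is represented by a vector of topic weights such as a topic model would produce, variety can be counted as the number of topics present and evenness computed as Shannon entropy scaled by its maximum. The function names and the `threshold` parameter are hypothetical, and the study's own operationalization may differ.

```python
import math


def variety(topic_weights, threshold=0.0):
    """Count the topics whose weight exceeds `threshold` (hypothetical cutoff)."""
    return sum(1 for w in topic_weights if w > threshold)


def shannon_evenness(topic_weights):
    """Shannon entropy of the topic mix divided by log(number of topics present).

    Returns 1.0 when all present topics have equal weight and approaches 0.0
    as a single topic dominates. With fewer than two topics present, evenness
    is undefined and 0.0 is returned by convention.
    """
    present = [w for w in topic_weights if w > 0]
    if len(present) < 2:
        return 0.0
    total = sum(present)
    probs = [w / total for w in present]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(present))
```

A paper spread equally over four topics has variety 4 and evenness 1.0, while a paper dominated by one topic keeps the same variety but scores lower on evenness, which is why the two measures are reported separately.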

5.
Mol Biol Evol ; 33(1): 13-24, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26500251

ABSTRACT

Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved through disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology.


Subjects
Catfishes/genetics; Evolution, Molecular; Gene Expression; Models, Genetic; Phenotype; Animals; Computational Biology; Gene Expression/genetics; Gene Expression/physiology; Software
6.
PLoS Biol ; 11(1): e1001468, 2013.
Article in English | MEDLINE | ID: mdl-23335860

ABSTRACT

How should funding agencies enable researchers to explore high-risk but potentially high-reward science? One model that appears to work is the NSF-funded synthesis center, an incubator for community-led, innovative science.


Subjects
Biomedical Research/economics; Financing, Government/economics; Data Interpretation, Statistical; Financial Management; Humans; Research Personnel; United States
7.
Genesis ; 53(8): 561-71, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26220875

ABSTRACT

The abundance of phenotypic diversity among species can enrich our knowledge of development and genetics beyond the limits of variation that can be observed in model organisms. The Phenoscape Knowledgebase (KB) is designed to enable exploration and discovery of phenotypic variation among species. Because phenotypes in the KB are annotated using standard ontologies, evolutionary phenotypes can be compared with phenotypes from genetic perturbations in model organisms. To illustrate the power of this approach, we review the use of the KB to find taxa showing evolutionary variation similar to that of a query gene. Matches are made between the full set of phenotypes described for a gene and an evolutionary profile, the latter of which is defined as the set of phenotypes that are variable among the daughters of any node on the taxonomic tree. Phenoscape's semantic similarity interface allows the user to assess the statistical significance of each match and flags matches that may only result from differences in annotation coverage between genetic and evolutionary studies. Tools such as this will help meet the challenge of relating the growing volume of genetic knowledge in model organisms to the diversity of phenotypes in nature. The Phenoscape KB is available at http://kb.phenoscape.org.


Subjects
Databases, Genetic; Genetic Association Studies/methods; Animals; Biological Evolution; Computational Biology/methods; Humans; Knowledge Bases; Phenotype
8.
Proc Natl Acad Sci U S A ; 107(26): 11889-94, 2010 Jun 29.
Article in English | MEDLINE | ID: mdl-20547848

ABSTRACT

The mushroom Coprinopsis cinerea is a classic experimental model for multicellular development in fungi because it grows on defined media, completes its life cycle in 2 weeks, produces some 10^8 synchronized meiocytes, and can be manipulated at all stages in development by mutation and transformation. The 37-megabase genome of C. cinerea was sequenced and assembled into 13 chromosomes. Meiotic recombination rates vary greatly along the chromosomes, and retrotransposons are absent in large regions of the genome with low levels of meiotic recombination. Single-copy genes with identifiable orthologs in other basidiomycetes are predominant in low-recombination regions of the chromosome. In contrast, paralogous multicopy genes are found in the highly recombining regions, including a large family of protein kinases (FunK1) unique to multicellular fungi. Analyses of P450 and hydrophobin gene families confirmed that local gene duplications drive the expansions of paralogous copies and the expansions occur in independent lineages of Agaricomycotina fungi. Gene-expression patterns from microarrays were used to dissect the transcriptional program of dikaryon formation (mating). Several members of the FunK1 kinase family are differentially regulated during sexual morphogenesis, and coordinate regulation of adjacent duplications is rare. The genomes of C. cinerea and Laccaria bicolor, a symbiotic basidiomycete, share extensive regions of synteny. The largest syntenic blocks occur in regions with low meiotic recombination rates, no transposable elements, and tight gene spacing, where orthologous single-copy genes are overrepresented. The chromosome assembly of C. cinerea is an essential resource in understanding the evolution of multicellularity in the fungi.


Subjects
Chromosomes, Fungal/genetics; Coprinus/genetics; Evolution, Molecular; Base Sequence; Chromosome Mapping; Coprinus/cytology; Coprinus/growth & development; Cytochrome P-450 Enzyme System/genetics; DNA Primers/genetics; Fungal Proteins/genetics; Gene Duplication; Genome, Fungal; Meiosis/genetics; Molecular Sequence Data; Multigene Family; Phylogeny; Protein Kinases/genetics; RNA, Fungal/genetics; Recombination, Genetic; Retroelements/genetics
9.
Syst Biol ; 60(2): 117-25, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21186249

ABSTRACT

Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
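The GTP criterion described above can be illustrated with a toy sketch. It assumes trees are written as nested tuples whose leaves are species names (each gene-tree leaf is labelled by the species it was sampled from), and it counts duplications with the standard lowest-common-ancestor mapping: a gene-tree node is a duplication when it maps to the same species-tree node as one of its children. All function names are hypothetical, and this is not the software used in the study, which relies on heuristic search rather than scoring a fixed list of candidates.

```python
def leaf_set(tree):
    """Return the species found at the leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clades(species_tree):
    """List every clade (as a frozenset of species) in the species tree."""
    if isinstance(species_tree, str):
        return [leaf_set(species_tree)]
    clades = [leaf_set(species_tree)]
    for child in species_tree:
        clades.extend(species_clades(child))
    return clades


def lca(clades, species):
    """Smallest species-tree clade containing all of `species` (the LCA mapping)."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, clades):
    """Count gene-tree nodes that map to the same clade as one of their children."""
    if isinstance(gene_tree, str):
        return 0
    here = lca(clades, leaf_set(gene_tree))
    is_dup = any(lca(clades, leaf_set(child)) == here for child in gene_tree)
    return int(is_dup) + sum(count_duplications(c, clades) for c in gene_tree)


def gtp_score(species_tree, gene_trees):
    """Total duplications a candidate species tree induces over all gene trees."""
    clades = species_clades(species_tree)
    return sum(count_duplications(g, clades) for g in gene_trees)


def best_species_tree(candidates, gene_trees):
    """Pick the candidate with the lowest duplication score (exhaustive, toy-scale)."""
    return min(candidates, key=lambda s: gtp_score(s, gene_trees))
```

With two gene trees that both group A with B, the candidate `(("A", "B"), "C")` induces no duplications while `(("A", "C"), "B")` induces one per gene tree, so the first is preferred under the criterion.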


Subjects
Classification/methods; Phylogeny; Plants/classification; Plants/genetics; Algorithms; Expressed Sequence Tags; Genomics
11.
PLoS Genet ; 5(6): e1000502, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19503594

ABSTRACT

High-throughput techniques for detecting DNA polymorphisms generally do not identify changes in which the genomic position of a sequence, but not its copy number, varies among individuals. To explore such balanced structural polymorphisms, we used array-based Comparative Genomic Hybridization (aCGH) to conduct a genome-wide screen for single-copy genomic segments that occupy different genomic positions in the standard laboratory strain of Saccharomyces cerevisiae (S90) and a polymorphic wild isolate (Y101) through analysis of six tetrads from a cross of these two strains. Paired-end high-throughput sequencing of Y101 validated four of the predicted rearrangements. The transposed segments contained one to four annotated genes each, yet crosses between S90 and Y101 yielded mostly viable tetrads. The longest segment comprised 13.5 kb near the telomere of chromosome XV in the S288C reference strain and Southern blotting confirmed its predicted location on chromosome IX in Y101. Interestingly, inter-locus crossover events between copies of this segment occurred at a detectable rate. The presence of low-copy repetitive sequences at the junctions of this segment suggests that it may have arisen through ectopic recombination. Our methodology and findings provide a starting point for exploring the origins, phenotypic consequences, and evolutionary fate of this largely unexplored form of genomic polymorphism.


Subjects
DNA Transposable Elements/genetics; Polymorphism, Genetic/genetics; Saccharomyces cerevisiae/genetics; Comparative Genomic Hybridization; DNA, Fungal; Gene Dosage; Genome, Fungal; Models, Genetic
12.
Syst Biol ; 59(4): 369-83, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20547776

ABSTRACT

The rich knowledge of morphological variation among organisms reported in the systematic literature has remained in free-text format, impractical for use in large-scale synthetic phylogenetic work. This noncomputable format has also precluded linkage to the large knowledgebase of genomic, genetic, developmental, and phenotype data in model organism databases. We have undertaken an effort to prototype a curated, ontology-based evolutionary morphology database that maps to these genetic databases (http://kb.phenoscape.org) to facilitate investigation into the mechanistic basis and evolution of phenotypic diversity. Among the first requirements in establishing this database was the development of a multispecies anatomy ontology with the goal of capturing anatomical data in a systematic and computable manner. An ontology is a formal representation of a set of concepts with defined relationships between those concepts. Multispecies anatomy ontologies in particular are an efficient way to represent the diversity of morphological structures in a clade of organisms, but they present challenges in their development relative to single-species anatomy ontologies. Here, we describe the Teleost Anatomy Ontology (TAO), a multispecies anatomy ontology for teleost fishes derived from the Zebrafish Anatomical Ontology (ZFA) for the purpose of annotating varying morphological features across species. To facilitate interoperability with other anatomy ontologies, TAO uses the Common Anatomy Reference Ontology as a template for its upper level nodes, and TAO and ZFA are synchronized, with zebrafish terms specified as subtypes of teleost terms. We found that the details of ontology architecture have ramifications for querying, and we present general challenges in developing a multispecies anatomy ontology, including refinement of definitions, taxon-specific relationships among terms, and representation of taxonomically variable developmental pathways.


Subjects
Biological Evolution; Fishes/anatomy & histology; Fishes/genetics; Animals; Classification; Computational Biology; Databases, Factual; Genomics
13.
Bioinformatics ; 25(5): 592-8, 2009 Mar 01.
Article in English | MEDLINE | ID: mdl-19147663

ABSTRACT

MOTIVATION: Full-length DNA and protein sequences that span the entire length of a gene are ideally used for multiple sequence alignments (MSAs) and the subsequent inference of their relationships. Frequently, however, MSAs contain a substantial amount of missing data. For example, expressed sequence tags (ESTs), which are partial sequences of expressed genes, are the predominant source of sequence data for many organisms. The patterns of missing data typical for EST-derived alignments greatly compromise the accuracy of estimated phylogenies. RESULTS: We present a statistical method for inferring phylogenetic trees from EST-based incomplete MSA data. We propose a class of hierarchical models for modeling pairwise distances between the sequences, and develop a fully Bayesian approach for estimation of the model parameters. Once the distance matrix is estimated, the phylogenetic tree may be constructed by applying neighbor-joining (or any other algorithm of choice). We also show that maximizing the marginal likelihood from the Bayesian approach yields similar results to a profile likelihood estimation. The proposed methods are illustrated using simulated protein families, for which the true phylogeny is known, and one real protein family. AVAILABILITY: R code for fitting these models is available from: http://people.bu.edu/gupta/software.htm.


Subjects
Computational Biology/methods; Phylogeny; Sequence Alignment/methods; Algorithms; Bayes Theorem; Expressed Sequence Tags; Models, Statistical; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods
14.
Evolution ; 74(8): 1590-1602, 2020 08.
Article in English | MEDLINE | ID: mdl-32267552

ABSTRACT

The role of genetic architecture in adaptation to novel environments has received considerable attention when the source of adaptive variation is de novo mutation. Relatively less is known when the source of adaptive variation is inter- or intraspecific hybridization. We model hybridization between divergent source populations and subsequent colonization of an unoccupied novel environment using individual-based simulations to understand the influence of genetic architecture on the timing of colonization and the mode of adaptation. We find that two distinct categories of genetic architecture facilitate rapid colonization but that they do so in qualitatively different ways. For few and/or tightly linked loci, the mode of adaptation is via the recovery of adaptive parental genotypes. With many unlinked loci, the mode of adaptation is via the generation of novel hybrid genotypes. The first category results in the shortest colonization lag phases across the widest range of parameter space, but further adaptation is mutation limited. The second category takes longer and is more sensitive to genetic variance and dispersal rate, but can facilitate adaptation to environmental conditions that exceed the tolerance of parental populations. These findings have implications for understanding the origins of biological invasions and the success of hybrid populations.


Subjects
Hybridization, Genetic; Introduced Species; Models, Genetic; Epistasis, Genetic; Genetic Linkage
15.
PLoS One ; 15(3): e0230281, 2020.
Article in English | MEDLINE | ID: mdl-32210449

ABSTRACT

Despite the increase in the number of journals issuing data policies requiring authors to make data underlying reporting findings publicly available, authors do not always do so, and when they do, the data do not always meet standards of quality that allow others to verify or extend published results. This phenomenon suggests the need to consider the effectiveness of journal data policies to present and articulate transparency requirements, and how well they facilitate (or hinder) authors' ability to produce and provide access to data, code, and associated materials that meet quality standards for computational reproducibility. This article describes the results of a research study that examined the ability of journal-based data policies to: 1) effectively communicate transparency requirements to authors, and 2) enable authors to successfully meet policy requirements. To do this, we conducted a mixed-methods study that examined individual data policies alongside editors' and authors' interpretation of policy requirements to answer the following research questions. Survey responses from authors and editors along with results from a content analysis of data policies found discrepancies among editors' assertion of data policy requirements, authors' understanding of policy requirements, and the requirements stated in the policy language as written. We offer explanations for these discrepancies and offer recommendations for improving authors' understanding of policies and increasing the likelihood of policy compliance.


Subjects
Attitude; Editorial Policies; Periodicals as Topic/standards; Guideline Adherence/statistics & numerical data; Humans; Surveys and Questionnaires; Writing/standards
16.
Mol Biol Evol ; 25(8): 1778-87, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18535013

ABSTRACT

There is growing evidence that interactions between biological molecules (e.g., RNA-RNA, protein-protein, RNA-protein) place limits on the rate and trajectory of molecular evolution. Here, by extending Kimura's model of compensatory evolution at interacting sites, we show that the ratio of transition to transversion substitutions (kappa) at interacting sites should be equal to the square of the ratio at independent sites. Because transition mutations generally occur at a higher rate than transversions, the model predicts that kappa should be higher at interacting sites than at independent sites. We tested this prediction in 10 RNA secondary structures by comparing phylogenetically derived estimates of kappa in paired sites within stems (kappa(p)) and unpaired sites within loops (kappa(u)). Eight of the 10 structures showed an excellent match to the quantitative predictions of the model, and 9 of the 10 structures matched the qualitative prediction kappa(p) > kappa(u). Only the Rev response element from the human immunodeficiency virus (HIV) genome showed the reverse pattern, with kappa(p) < kappa(u). Although a variety of evolutionary forces could produce quantitative deviations from the model predictions, the reversal in magnitude of kappa(p) and kappa(u) could be achieved only by violating the model assumption that the underlying transition (or transversion) mutation rates were identical in paired and unpaired regions of the molecule. We explore the ability of the APOBEC3 enzymes, host defense mechanisms against retroviruses, which induce transition mutations preferentially in single-stranded regions of the HIV genome, to explain this exception to the rule. Taken as a whole, our findings suggest that kappa may have utility as a simple diagnostic to evaluate proposed secondary structures.
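The quantitative prediction stated in this abstract, kappa(p) = kappa(u)^2, can be written as a short sketch. The function names are hypothetical and the numbers in the usage note are illustrative, not estimates from the paper.

```python
def predicted_paired_kappa(kappa_unpaired):
    """Model prediction: kappa at paired (interacting) sites is the square
    of kappa at unpaired (independent) sites."""
    return kappa_unpaired ** 2


def matches_qualitative_prediction(kappa_paired, kappa_unpaired):
    """True when the estimates satisfy kappa(p) > kappa(u)."""
    return kappa_paired > kappa_unpaired
```

For an illustrative kappa(u) of 2.0, the predicted kappa(p) is 4.0. Squaring raises the ratio only when kappa(u) exceeds 1, which is the case the abstract describes, where transitions occur at a higher rate than transversions.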


Subjects
Evolution, Molecular; Genes, env/genetics; Models, Genetic; Mutation/genetics; Nucleic Acid Conformation; Phylogeny; RNA/genetics; APOBEC Deaminases; Bayes Theorem; Computational Biology; Cytidine Deaminase; Cytosine Deaminase/genetics; Sequence Alignment
17.
PeerJ Comput Sci ; 5: e234, 2019.
Article in English | MEDLINE | ID: mdl-33816887

ABSTRACT

Conferences with contributed talks grouped into multiple concurrent sessions pose an interesting scheduling problem. From an attendee's perspective, choosing which talks to visit when there are many concurrent sessions is challenging since an individual may be interested in topics that are discussed in different sessions simultaneously. The frequency of topically similar talks in different concurrent sessions is, in fact, a common cause for complaint in post-conference surveys. Here, we introduce a practical solution to the conference scheduling problem by heuristic optimization of an objective function that weighs the occurrence of both topically similar talks in one session and topically different talks in concurrent sessions. Rather than clustering talks based on a limited number of preconceived topics, we employ a topic model to allow the topics to naturally emerge from the corpus of contributed talk titles and abstracts. We then measure the topical distance between all pairs of talks. Heuristic optimization of preliminary schedules seeks to balance the topical similarity of talks within a session and the dissimilarity between concurrent sessions. Using an ecology conference as a test case, we find that stochastic optimization dramatically improves the objective function relative to the schedule manually produced by the program committee. Approximate Integer Linear Programming can be used to provide a partially-optimized starting schedule, but the final value of the discrimination ratio (an objective function used to estimate coherence within a session and disparity between concurrent sessions) is surprisingly insensitive to the starting schedule. Furthermore, we show that, in contrast to the manual process, arbitrary scheduling constraints are straightforward to include. We applied our method to a second biology conference with over 1,000 contributed talks plus scheduling constraints. 
In a randomized experiment, biologists responded similarly to a machine-optimized schedule and a highly modified schedule produced by domain experts on the conference program committee.

18.
BMC Evol Biol ; 8: 95, 2008 Mar 26.
Article in English | MEDLINE | ID: mdl-18366758

ABSTRACT

BACKGROUND: While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. RESULTS: We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. 
CONCLUSION: These results demonstrate that partial gene sequences and gappy multiple sequence alignments can pose a major problem for phylogenetic analysis. The concern will be greatest for high-throughput phylogenomic analyses, in which Neighbor Joining is often the preferred method due to its computational efficiency. Both approaches can be used to increase the accuracy of phylogenetic inference from a gappy alignment. The choice between the two approaches will depend upon how robust the application is to the loss of sequences from the input set, with alignment masking generally giving a much greater improvement in accuracy but at the cost of discarding a larger number of the input sequences.
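
As a rough illustration of the alignment masking idea described above (dropping problematic columns, then dropping sequences that contribute too little), the following is a minimal Python sketch. The abstract does not state the criteria the authors used, so the function name `mask_alignment` and both thresholds (`max_col_gap_frac`, `min_seq_coverage`) are hypothetical choices made only for this example.

```python
def mask_alignment(alignment, max_col_gap_frac=0.4, min_seq_coverage=0.5, gap="-"):
    """Mask a gappy alignment given as {name: aligned_sequence}.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")

    # Step 1: keep only columns whose fraction of gaps is at or below the threshold.
    keep_cols = [
        i for i in range(length)
        if sum(alignment[n][i] == gap for n in names) / len(names) <= max_col_gap_frac
    ]

    # Step 2: keep only sequences that cover enough of the retained columns.
    masked = {}
    for n in names:
        trimmed = "".join(alignment[n][i] for i in keep_cols)
        covered = sum(c != gap for c in trimmed)
        if keep_cols and covered / len(keep_cols) >= min_seq_coverage:
            masked[n] = trimmed
    return masked


# Toy staggered alignment, loosely mimicking partial gene sequences.
example = {
    "a": "ACGT----",
    "b": "ACGTAC--",
    "c": "--GTACGT",
    "d": "------GT",
}
masked_example = mask_alignment(example)
```

In this toy input only the two central columns pass the column filter, and sequence "d" is then discarded because it has no residues in them. This mirrors the trade-off reported in the abstract: masking removes staggered gaps at the cost of losing some input sequences.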


Subjects
Computational Biology/methods , Expressed Sequence Tags , Models, Genetic , Phylogeny , Sequence Alignment/methods , Evolution, Molecular , Likelihood Functions , Sequence Analysis, Protein , Software
19.
Bioinformatics ; 23(9): 1132-40, 2007 May 01.
Article in English | MEDLINE | ID: mdl-17237041

ABSTRACT

MOTIVATION: Identification of the genetic variation underlying complex traits is challenging. The wealth of information publicly available about the biology of complex traits and the function of individual genes permits the development of informatics-assisted methods for the selection of candidate genes for these traits.
RESULTS: We have developed a computational system named CAESAR that ranks all annotated human genes as candidates for a complex trait by using ontologies to semantically map natural language descriptions of the trait with a variety of gene-centric information sources. In a test of its effectiveness, CAESAR successfully selected 7 out of 18 (39%) complex human trait susceptibility genes within the top 2% of ranked candidates genome-wide, a subset that represents roughly 1% of genes in the human genome and provides sufficient enrichment for an association study of several hundred human genes. This approach can be applied to any well-documented mono- or multi-factorial trait in any organism for which an annotated gene set exists.
AVAILABILITY: CAESAR scripts and test data can be downloaded from http://visionlab.bio.unc.edu/caesar/


Subjects
Database Management Systems , Databases, Genetic , Genes/genetics , Information Storage and Retrieval/methods , Natural Language Processing , Quantitative Trait Loci/genetics , Software , Algorithms
20.
Nucleic Acids Res ; 34(Database issue): D724-30, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381967

ABSTRACT

Phytome is an online comparative genomics resource that can be applied to functional plant genomics, molecular breeding and evolutionary studies. It contains predicted protein sequences, protein family assignments, multiple sequence alignments, phylogenies and functional annotations for proteins from a large, phylogenetically diverse set of plant taxa. Phytome serves as a glue between disparate plant gene databases both by identifying the evolutionary relationships among orthologous and paralogous protein sequences from different species and by enabling cross-references between different versions of the same gene curated independently by different database groups. The web interface enables sophisticated queries on lineage-specific patterns of gene/protein family proliferation and loss. This rich dataset is serving as a platform for the unification of sequence-anchored comparative maps across taxonomic families of plants. The Phytome web interface can be accessed at the following URL: http://www.phytome.org. Batch homology searches and bulk downloads are available upon free registration.


Subjects
Databases, Genetic , Genome, Plant , Phylogeny , Plant Proteins/classification , Plants/classification , Genomics , Internet , Plant Proteins/chemistry , Plant Proteins/genetics , Plants/genetics , Sequence Alignment , Sequence Analysis, Protein , User-Computer Interface