2.
Int J Mol Sci ; 23(12), 2022 Jun 12.
Article in English | MEDLINE | ID: mdl-35743002

ABSTRACT

The isochore theory, which was proposed more than 40 years ago, depicts the mammalian genome as a mosaic of long, homogeneous regions that are characterized by their guanine and cytosine (GC) content. The human genome, for instance, was claimed to consist of five compositionally distinct isochore families. The isochore theory, in all its reincarnations, has been repeatedly falsified in the literature, yet isochore proponents have persistently resurrected it by either redefining isochores or by proposing alternative means of testing the theory. Here, I deal with the latest attempt to salvage this seemingly immortal zombie: a sequence segmentation method called isoSegmenter, which was claimed to "identify" isochores while at the same time disregarding the main characteristic attribute of isochores, namely compositional homogeneity. I used a series of controlled, randomly generated simulated sequences as a benchmark to study the performance of isoSegmenter. The main advantage of using simulated sequences is that, unlike real data, the exact start and stop points of any isochore or homogeneous compositional domain are known. Based on three key performance metrics (sensitivity, precision, and Jaccard similarity index), isoSegmenter was found to be vastly inferior to isoPlotter, a segmentation algorithm with no user input. Moreover, isoSegmenter identified isochores where none exist and failed to identify compositionally homogeneous sequences that were shorter than 100-200 kb. Will this zillionth refutation of "isochores" ensure a final and permanent entombment of the isochore theory? This author is not holding his breath.
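
The three metrics named in this abstract can be illustrated with a minimal per-position sketch. It assumes domains are given as half-open (start, end) intervals and that overlap is scored position by position; the paper's exact definitions may differ, and the function names are hypothetical.

```python
def covered(intervals):
    """Return the set of positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def segmentation_metrics(true_domains, predicted_domains):
    """Per-position sensitivity, precision, and Jaccard index (illustrative only)."""
    truth = covered(true_domains)
    pred = covered(predicted_domains)
    tp = len(truth & pred)   # positions in both the true and the predicted domains
    fn = len(truth - pred)   # true-domain positions the prediction missed
    fp = len(pred - truth)   # predicted positions outside any true domain
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return sensitivity, precision, jaccard
```

With simulated sequences the true intervals are known by construction, which is what makes such a comparison possible.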


Subject(s)
Human Genome, Isochores, Algorithms, Animals, Base Composition, Brain, Humans, Mammals/genetics
3.
J Mol Evol ; 86(6): 365-378, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29955898

ABSTRACT

A low ratio of nonsynonymous and synonymous substitution rates (dN/dS) at a codon is an indicator of functional constraint caused by purifying selection. Intuitively, the functional constraint would also be expected to prevent such a codon from being deleted. However, to the best of our knowledge, the correlation between the rates of deletion and substitution has never actually been estimated. Here, we use 8595 protein-coding region sequences from nine mammalian species to examine the relationship between deletion rate and dN/dS. We find significant positive correlations at the levels of both sites and genes. We compared our data against controls consisting of simulated coding sequences evolving along identical phylogenetic trees, where deletions occur independently of substitutions. A much weaker correlation was found in the corresponding simulated sequences, probably caused by alignment errors. In the real data, the correlations cannot be explained by alignment errors. Separate investigations on nonsynonymous (dN) and synonymous (dS) substitution rates indicate that the correlation is most likely due to a similarity in patterns of selection rather than in mutation rates.


Subject(s)
Amino Acids/genetics, Proteins/chemistry, Proteins/genetics, Genetic Selection, Amino Acid Sequence, Amino Acid Substitution, Animals, Genes, Mammals/genetics, Phylogeny, Nonparametric Statistics
4.
J Mol Evol ; 82(1): 51-64, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26563252

ABSTRACT

It has been claimed that synonymous sites in mammals are under selective constraint. Furthermore, in many studies the selective constraint at such sites in primates was claimed to be more stringent than that in rodents. Given the larger effective population sizes in rodents than in primates, the theoretical expectation is that selection in rodents would be more effective than that in primates. To resolve this contradiction between expectations and observations, we used processed pseudogenes as a model for strict neutral evolution, and estimated selective constraint on synonymous sites using the rate of substitution at pseudosynonymous and pseudononsynonymous sites in pseudogenes as the neutral expectation. After controlling for the effects of GC content, our results were similar to those from previous studies, i.e., synonymous sites in primates exhibited evidence for higher selective constraint than those in rodents. Specifically, our results indicated that in primates up to 24% of synonymous sites could be under purifying selection, while in rodents synonymous sites evolved neutrally. To further control for shifts in GC content, we estimated selective constraint at fourfold degenerate sites using a maximum parsimony approach. This allowed us to estimate selective constraint using mutational patterns that cause a shift in GC content (GT ↔ TG, CT ↔ TC, GA ↔ AG, and CA ↔ AC) and ones that do not (AT ↔ TA and CG ↔ GC). Using this approach, we found that synonymous sites evolve neutrally in both primates and rodents. Apparent deviations from neutrality were caused by a higher rate of C → A and C → T mutations in pseudogenes. Such differences are most likely caused by the shift in GC content experienced by pseudogenes. We conclude that previous estimates according to which 20-40% of synonymous sites in primates were under selective constraint were most likely artifacts of the biased pattern of mutation.


Subject(s)
Molecular Evolution, Genes, Genetic Models, Mutation, Primates/genetics, Rodents/genetics, Animals, Base Composition, Humans
5.
Genome Res ; 23(8): 1235-47, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23636946

ABSTRACT

Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ∼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor-binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the "socio-genomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.


Subject(s)
Ants/genetics, Insect Genome, Animals, Animal Behavior, Binding Sites, Conserved Sequence, DNA Methylation, Molecular Evolution, Gene Expression Regulation, Hymenoptera/genetics, Insect Proteins/genetics, MicroRNAs/genetics, Genetic Models, Phylogeny, Nucleic Acid Regulatory Sequences, DNA Sequence Analysis, Social Behavior, Species Specificity, Synteny, Transcription Factors/genetics
6.
PLoS Comput Biol ; 10(11): e1003925, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25375262

ABSTRACT

For the past four decades the compositional organization of the mammalian genome posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the "isochore theory," which has long been rebutted. Recently, an alternative compositional domain model was proposed depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the "murid shift," and in many ways resembles the genome of opossum. We find no support for the "isochore theory." Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and few long ones, thus providing strong evidence in favor of the compositional domain model, and seem to invalidate clade Euarchontoglires.


Subject(s)
Cell Nucleus/genetics, Genome/genetics, Mammals/classification, Mammals/genetics, Animals, Genomics, Humans, Phylogeny
8.
BMC Genomics ; 15: 906, 2014 Oct 17.
Article in English | MEDLINE | ID: mdl-25326207

ABSTRACT

BACKGROUND: The human pathogen Trichomonas vaginalis is a parabasalian flagellate that is estimated to infect 3% of the world's population annually. With a 160 megabase genome and up to 60,000 genes residing in six chromosomes, the parasite has the largest genome among sequenced protists. Although it is thought that the genome size and unusually large coding capacity are owed to genome duplication events, the exact reason and its consequences are less well studied. RESULTS: Among transcriptome data we found thousands of instances in which reads mapped onto genomic loci not annotated as genes, some reaching up to several kilobases in length. At first sight these appear to represent long non-coding RNAs (lncRNAs); however, about half of these lncRNAs have significant sequence similarities to genomic loci annotated as protein-coding genes. This provides evidence for the transcription of hundreds of pseudogenes in the parasite. Conventional lncRNAs and pseudogenes are expressed in Trichomonas through their own transcription start sites and independently from flanking genes. Expression of several representative lncRNAs was verified through reverse-transcriptase PCR in different T. vaginalis strains and case studies exclude the use of alternative start codons or stop codon suppression for the genes analysed. CONCLUSION: Our results demonstrate that T. vaginalis expresses thousands of intergenic loci, including numerous transcribed pseudogenes. In contrast to yeast, these are expressed independently from neighbouring genes. Our results furthermore illustrate the effect genome duplication events can have on the transcriptome of a protist. The parasite's genome is in a steady state of change, and we hypothesize that the numerous lncRNAs could offer a large pool for potential innovation from which novel proteins or regulatory RNA units could evolve.


Subject(s)
Pseudogenes, Long Noncoding RNA/genetics, Protozoan RNA/genetics, Trichomonas vaginalis/genetics, Gene Duplication, Gene Expression Profiling, RNA Sequence Analysis
9.
BMC Genomics ; 15: 86, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24479613

ABSTRACT

BACKGROUND: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. RESULTS: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. CONCLUSIONS: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.


Subject(s)
Bees/genetics, Insect Genes, Animals, Base Composition, Genetic Databases, Interspersed Repetitive Sequences/genetics, Molecular Sequence Annotation, Open Reading Frames/genetics, Peptides/analysis, RNA Sequence Analysis, Amino Acid Sequence Homology
10.
PLoS Genet ; 7(2): e1002007, 2011 Feb 10.
Article in English | MEDLINE | ID: mdl-21347285

ABSTRACT

Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host-microbe symbioses.


Subject(s)
Ants/physiology, Insect Genome/genetics, Plant Leaves/physiology, Symbiosis, Animals, Ants/genetics, Arginine/genetics, Arginine/metabolism, Base Sequence, Fungi/genetics, Insect Proteins/genetics, Insect Proteins/metabolism, DNA Sequence Analysis, Serine Proteases/genetics, Serine Proteases/metabolism
11.
Proc Natl Acad Sci U S A ; 108(14): 5673-8, 2011 Apr 05.
Article in English | MEDLINE | ID: mdl-21282631

ABSTRACT

Ants are some of the most abundant and familiar animals on Earth, and they play vital roles in most terrestrial ecosystems. Although all ants are eusocial, and display a variety of complex and fascinating behaviors, few genomic resources exist for them. Here, we report the draft genome sequence of a particularly widespread and well-studied species, the invasive Argentine ant (Linepithema humile), which was accomplished using a combination of 454 (Roche) and Illumina sequencing and community-based funding rather than federal grant support. Manual annotation of >1,000 genes from a variety of different gene families and functional classes reveals unique features of the Argentine ant's biology, as well as similarities to Apis mellifera and Nasonia vitripennis. Distinctive features of the Argentine ant genome include remarkable expansions of gustatory (116 genes) and odorant receptors (367 genes), an abundance of cytochrome P450 genes (>110), lineage-specific expansions of yellow/major royal jelly proteins and desaturases, and complete CpG DNA methylation and RNAi toolkits. The Argentine ant genome contains fewer immune genes than Drosophila and Tribolium, which may reflect the prominent role played by behavioral and chemical suppression of pathogens. Analysis of the ratio of observed to expected CpG nucleotides for genes in the reproductive development and apoptosis pathways suggests higher levels of methylation than in the genome overall. The resources provided by this genome sequence will offer an abundance of tools for researchers seeking to illuminate the fascinating biology of this emerging model organism.
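
The observed-to-expected CpG ratio mentioned in this abstract is commonly computed from base counts, as in the minimal sketch below. It assumes the standard normalization (CpG count times sequence length, divided by the product of the C and G counts); the function name is hypothetical and the paper's exact procedure is not given in the abstract.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA sequence (illustrative sketch)."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both bases; report 0.0 here
    return cpg * n / (c * g)
```

Because methylated cytosines in CpG context tend to mutate over evolutionary time, a ratio well below 1 is usually read as a signature of historical methylation.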


Subject(s)
Ants/genetics, Insect Genome/genetics, Genomics/methods, Phylogeny, Animals, Ants/physiology, Base Sequence, California, DNA Methylation, Gene Library, Population Genetics, Social Hierarchy, Molecular Sequence Data, Single Nucleotide Polymorphism/genetics, Odorant Receptors/genetics, DNA Sequence Analysis
12.
Proc Natl Acad Sci U S A ; 108(14): 5667-72, 2011 Apr 05.
Article in English | MEDLINE | ID: mdl-21282651

ABSTRACT

We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.


Subject(s)
Ants/genetics, Gene Regulatory Networks/genetics, Insect Genome/genetics, Genomics/methods, Phylogeny, Animals, Ants/physiology, Base Sequence, Desert Climate, Social Hierarchy, Molecular Sequence Data, North America, Phenotype, Single Nucleotide Polymorphism/genetics, Odorant Receptors/genetics, DNA Sequence Analysis
13.
BMC Genet ; 14: 37, 2013 May 07.
Article in English | MEDLINE | ID: mdl-23651527

ABSTRACT

BACKGROUND: Whether or not a mutant allele in a population is under selection is an important issue in population genetics, and various neutrality tests have been invented so far to detect selection. However, detection of negative selection has been notoriously difficult, partly because negatively selected alleles are usually rare in the population and have little impact on either population dynamics or the shape of the gene genealogy. Recently, through studies of genetic disorders and genome-wide analyses, many structural variations were shown to occur recurrently in the population. Such "recurrent mutations" might be revealed as deleterious by exploiting the signal of negative selection in the gene genealogy enhanced by their recurrence. RESULTS: Motivated by the above idea, we devised two new test statistics. One is the total number of mutants at a recurrently mutating locus among sampled sequences, which is tested conditionally on the number of forward mutations mapped on the sequence genealogy. The other is the size of the most common class of identical-by-descent mutants in the sample, again tested conditionally on the number of forward mutations mapped on the sequence genealogy. To examine the performance of these two tests, we simulated recurrently mutated loci each flanked by sites with neutral single nucleotide polymorphisms (SNPs), with no recombination. Using neutral recurrent mutations as null models, we attempted to detect deleterious recurrent mutations. Our analyses demonstrated high powers of our new tests under constant population size, as well as their moderate power to detect selection in expanding populations. We also devised a new maximum parsimony algorithm that, given the states of the sampled sequences at a recurrently mutating locus and an incompletely resolved genealogy, enumerates mutation histories with a minimum number of mutations while partially resolving genealogical relationships when necessary. 
CONCLUSIONS: With their considerably high powers to detect negative selection, our new neutrality tests may open new avenues for dealing with the population genetics of recurrent mutations as well as help identify some types of genetic disorders that may have escaped identification by currently existing methods.


Subject(s)
Mutation, Genetic Selection, Humans, Single Nucleotide Polymorphism
14.
Proc Natl Acad Sci U S A ; 107(27): 12168-73, 2010 Jul 06.
Article in English | MEDLINE | ID: mdl-20566863

ABSTRACT

As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes fewer than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.


Subject(s)
Bacterial Genome/genetics, Insect Genome/genetics, Pediculus/genetics, Pediculus/microbiology, Animals, Enterobacteriaceae/genetics, Bacterial Genes/genetics, Insect Genes/genetics, Genomics/methods, Humans, Lice Infestations/parasitology, Molecular Sequence Data, DNA Sequence Analysis, Symbiosis
15.
Mol Biol Evol ; 28(7): 2115-23, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21285032

ABSTRACT

Mutational robustness describes the extent to which a phenotype remains unchanged in the face of mutations. Theory predicts that the strength of direct selection for mutational robustness is at most the magnitude of the rate of deleterious mutation. As far as nucleic acid sequences are concerned, only long sequences in organisms with high deleterious mutation rates and large population sizes are expected to evolve mutational robustness. Surprisingly, recent studies have concluded that molecules that meet none of these conditions, the microRNA precursors (pre-miRNAs) of multicellular eukaryotes, show signs of selection for mutational and/or environmental robustness. To resolve the apparent disagreement between theory and these studies, we have reconstructed the evolutionary history of Drosophila pre-miRNAs and compared the robustness of each sequence to that of its reconstructed ancestor. In addition, we "replayed the tape" of pre-miRNA evolution via simulation under different evolutionary assumptions and compared these alternative histories with the actual one. We found that Drosophila pre-miRNAs have evolved under strong purifying selection against changes in secondary structure. Contrary to earlier claims, there is no evidence that these RNAs have been shaped by either direct or congruent selection for any kind of robustness. Instead, the high robustness of Drosophila pre-miRNAs appears to be mostly intrinsic and likely a consequence of selection for functional structures.


Subject(s)
Drosophila/genetics, Molecular Evolution, MicroRNAs/genetics, Penetrance, Algorithms, Animals, Computer Simulation, Mutation, Nucleic Acid Conformation, Phylogeny, Genetic Selection, Nonparametric Statistics
16.
RNA ; 16(1): 141-53, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19952117

ABSTRACT

It has been suggested that tRNA acceptor stems specify an operational RNA code for amino acids. In the last 20 years several attributes of the putative code have been elucidated for a small number of model organisms. To gain insight about the ensemble attributes of the code, we analyzed 4925 tRNA sequences from 102 bacterial and 21 archaeal species. Here, we used a classification and regression tree (CART) methodology, and we found that the degrees of degeneracy or specificity of the RNA codes in both Archaea and Bacteria differ from those of the genetic code. We found instances of taxon-specific alternative codes, i.e., identical acceptor stem determinants encrypting different amino acids in different species, as well as instances of ambiguity, i.e., identical acceptor stem determinants encrypting two or more amino acids in the same species. When partitioning the data by class of synthetase, the degree of code ambiguity was significantly reduced. In cryptographic terms, a plausible interpretation of this result is that the class distinction in synthetases is an essential part of the decryption rules for resolving the subset of RNA code ambiguities enciphered by identical acceptor stem determinants of tRNAs acylated by enzymes belonging to the two classes. In evolutionary terms, our findings lend support to the notion that in the pre-DNA world, interactions between tRNA acceptor stems and synthetases formed the basis for the distinction between the two classes; hence, ambiguities in the ancient RNA code were pivotal for the fixation of these enzymes in the genomes of ancestral prokaryotes.


Subject(s)
Algorithms, Amino Acids/genetics, Codon/genetics, Genetic Code/physiology, Amino Acid Sequence, Archaea/genetics, Bacteria/genetics, Computational Biology/methods, Molecular Evolution, Phylogeny, Transfer RNA/genetics, Transfer RNA/metabolism, Transfer RNA/physiology
17.
Nucleic Acids Res ; 38(15): e158, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20571085

ABSTRACT

It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen-Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, D(JS), using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas D(JS) failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.
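
A single step of Jensen-Shannon-based segmentation can be sketched as follows. This is a generic illustration, not IsoPlotter itself: it assumes the divergence is computed from Shannon entropies of nucleotide frequencies weighted by segment length, and it deliberately omits any halting criterion, which is the part the abstract identifies as problematic. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(seq):
    """Shannon entropy (bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(seq, i):
    """Length-weighted Jensen-Shannon divergence between seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = seq[:i], seq[i:]
    return (entropy(seq)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


def best_split(seq):
    """Return the cut position that maximizes the divergence (sequence length >= 2)."""
    return max(range(1, len(seq)), key=lambda i: js_divergence(seq, i))
```

A recursive algorithm would apply `best_split` to each resulting segment in turn; deciding when to stop is where the methods compared in the abstract differ.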


Subject(s)
Algorithms, Human Genome, Genomics/methods, Base Composition, Computer Simulation, Humans, Isochores, Genetic Models
18.
Nucleic Acids Res ; 38(Web Server issue): W23-8, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20497997

ABSTRACT

Evaluating the accuracy of multiple sequence alignment (MSA) is critical for virtually every comparative sequence analysis that uses an MSA as input. Here we present the GUIDANCE web-server, a user-friendly, open access tool for the identification of unreliable alignment regions. The web-server accepts as input a set of unaligned sequences. The server aligns the sequences and provides a simple graphic visualization of the confidence score of each column, residue and sequence of an alignment, using a color-coding scheme. The method is generic and the user is allowed to choose the alignment algorithm (ClustalW, MAFFT and PRANK are supported) as well as any type of molecular sequences (nucleotide, protein or codon sequences). The server implements two different algorithms for evaluating confidence scores: (i) the heads-or-tails (HoT) method, which measures alignment uncertainty due to co-optimal solutions; (ii) the GUIDANCE method, which measures the robustness of the alignment to guide-tree uncertainty. The server projects the confidence scores onto the MSA and points to columns and sequences that are unreliably aligned. These can be automatically removed in preparation for downstream analyses. GUIDANCE is freely available for use at http://guidance.tau.ac.il.


Subject(s)
Sequence Alignment/methods , Software , Human Immunodeficiency Virus Proteins/chemistry , Internet , Protein Sequence Analysis , Viral Regulatory and Accessory Proteins/chemistry
19.
Mol Biol Evol ; 27(5): 1015-24, 2010 May.
Article in English | MEDLINE | ID: mdl-20018981

ABSTRACT

Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performance of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
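
A simulated sequence of the first kind (fixed-size domains with known boundaries) could be generated roughly as below. This is a sketch under stated assumptions rather than the benchmark's actual generator: the function name `simulate_domains`, its parameters, and the choice to draw each base independently at a per-domain GC probability are all illustrative.

```python
import random

def simulate_domains(gc_levels, domain_len, seed=0):
    """Concatenate one domain per entry of gc_levels, each domain_len bases
    long, drawing every base independently with the domain's GC probability.
    Returns (sequence, boundaries), where boundaries are the known positions
    at which one domain ends and the next begins."""
    rng = random.Random(seed)  # fixed seed so the benchmark is reproducible
    parts = []
    for gc in gc_levels:
        domain = "".join(
            rng.choice("GC") if rng.random() < gc else rng.choice("AT")
            for _ in range(domain_len)
        )
        parts.append(domain)
    boundaries = [k * domain_len for k in range(1, len(gc_levels))]
    return "".join(parts), boundaries
```

Because the true boundaries are returned alongside the sequence, the output of any segmentation algorithm can be scored directly against them.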


Subject(s)
Algorithms , Computational Biology/methods , Computer Simulation , DNA/genetics , DNA Sequence Analysis/methods , Base Composition/genetics , Base Pairing/genetics , Base Sequence , Human Chromosome Pair 1/genetics , Nucleic Acid Databases , Human Genome/genetics , Humans , Time Factors
20.
Mol Biol Evol ; 27(8): 1759-67, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20207713

ABSTRACT

Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment data BASE benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.
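
The idea of testing each column of the base MSA against a set of MSAs built from perturbed guide trees can be illustrated with a simplified score. The sketch below is an assumption-laden stand-in, not the published GUIDANCE scoring: it counts a column as supported only when an identical column (same residue from every sequence) appears in a perturbed alignment, and the names `columns` and `column_support` are hypothetical. Alignments are given as lists of equal-length gapped strings in the same sequence order.

```python
def columns(msa):
    """Encode each alignment column as a tuple holding, per sequence, the
    index of the residue placed there (None where the sequence has a gap)."""
    counters = [0] * len(msa)  # next residue index for each sequence
    cols = []
    for j in range(len(msa[0])):
        col = []
        for s, row in enumerate(msa):
            if row[j] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols

def column_support(base_msa, perturbed_msas):
    """For every column of base_msa, return the fraction of perturbed
    alignments that contain exactly the same column."""
    perturbed_sets = [set(columns(m)) for m in perturbed_msas]
    return [
        sum(c in s for s in perturbed_sets) / len(perturbed_sets)
        for c in columns(base_msa)
    ]
```

For example, with the base alignment `["AC-G", "ACTG"]` and two alternatives, one identical to it and one that places the gap a column earlier, the first and last columns receive full support while the two middle columns are supported by only half of the alternatives.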


Subject(s)
Algorithms , Amino Acid Sequence , Base Sequence , Sequence Alignment/methods , DNA Sequence Analysis/methods , Animals , Computer Simulation , Factual Databases , Drosophila melanogaster/genetics , Molecular Sequence Data , Phylogeny , ROC Curve , Software