1.
Nucleic Acids Res ; 52(D1): D529-D535, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37843103

ABSTRACT

To date, the databases built to gather information on gene orthology do not provide end-users with descriptors of the molecular evolution information and phylogenetic pattern of these orthologues. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of coding sequences in mammalian genomes. OrthoMaM version 12 includes 15,868 alignments of orthologous coding sequences (CDS) from the 190 complete mammalian genomes currently available. All annotations and 1-to-1 orthology assignments are based on NCBI. Orthologous CDS can be mined for potential informative markers at the different taxonomic levels of the mammalian tree. To this end, several evolutionary descriptors of DNA sequences are provided for querying purposes (e.g. base composition and relative substitution rate). The graphical web interface allows the user to easily browse and sort the results of combined queries. The corresponding multiple sequence alignments and ML trees, inferred using state-of-the-art approaches, are available for download both at the nucleotide and amino acid levels. OrthoMaM v12 can be used by researchers interested either in reconstructing the phylogenetic relationships of mammalian taxa or in understanding the evolutionary dynamics of coding sequences in their genomes. OrthoMaM is available for browsing, querying and complete or filtered download at https://orthomam.mbb.cnrs.fr/.


Subjects
Databases, Genetic; Genomics; Animals; Base Sequence; Genome; Genomics/methods; Mammals/classification; Mammals/genetics; Phylogeny; Biological Evolution
2.
Plant J ; 108(2): 492-508, 2021 10.
Article in English | MEDLINE | ID: mdl-34382706

ABSTRACT

Oryza sativa (rice) plays an essential food security role for more than half of the world's population. Obtaining crops with high levels of disease resistance is a major challenge for breeders, especially today, given the urgent need for agriculture to be more sustainable. Plant resistance genes are mainly encoded by three large leucine-rich repeat (LRR)-containing receptor (LRR-CR) families: the LRR-receptor-like kinase (LRR-RLK), LRR-receptor-like protein (LRR-RLP) and nucleotide-binding LRR receptor (NLR). Using lrrprofiler, a pipeline that we developed to annotate and classify these proteins, we compared three publicly available annotations of the rice Nipponbare reference genome. The extended discrepancies that we observed for LRR-CR gene models led us to perform an in-depth manual curation of their annotations while paying special attention to nonsense mutations. We then transferred this manually curated annotation to Kitaake, a cultivar that is closely related to Nipponbare, using an optimized strategy. Here, we discuss the breakthrough achieved by manual curation when comparing genomes and, in addition to 'functional' and 'structural' annotations, we propose that the community adopts this approach, which we call 'comprehensive' annotation. The resulting data are crucial for further studies on the natural variability and evolution of LRR-CR genes in order to promote their use in breeding future resilient varieties.


Subjects
Molecular Sequence Annotation; Oryza/genetics; Plant Proteins/genetics; Repetitive Sequences, Amino Acid; Genome, Plant; Genotype; Molecular Sequence Annotation/methods; Oryza/chemistry; Plant Proteins/chemistry
3.
Mol Biol Evol ; 36(4): 861-862, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30698751

ABSTRACT

We present version 10 of OrthoMaM, a database of orthologous mammalian markers. OrthoMaM is already 11 years old and since the outset it has kept on improving, providing alignments and phylogenetic trees of high-quality computed with state-of-the-art methods on up-to-date data. The main contribution of this version is the increase in the number of taxa: 116 mammalian genomes for 14,509 one-to-one orthologous genes. This has been made possible by the combination of genomic data deposited in Ensembl complemented by additional good-quality genomes only available in NCBI. Version 10 users will benefit from pipeline improvements and a completely redesigned web-interface.


Subjects
Databases, Genetic; Genome; Mammals/genetics; Phylogeny; Sequence Alignment; Animals
4.
Mol Biol Evol ; 35(10): 2582-2584, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30165589

ABSTRACT

Multiple sequence alignment is a prerequisite for many evolutionary analyses. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that explicitly accounts for the underlying codon structure of protein-coding nucleotide sequences. Its unique characteristic allows building reliable codon alignments even in the presence of frameshifts. This facilitates downstream analyses such as selection pressure estimation based on the ratio of nonsynonymous to synonymous substitutions. Here, we present MACSE v2, a major update with an improved version of the initial algorithm enriched with a complete toolkit to handle multiple alignments of protein-coding sequences. A graphical interface now provides user-friendly access to the different subprograms.


Subjects
Sequence Alignment; Software; Codon, Terminator; Frameshift Mutation
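
Note on entry 4: the abstract says MACSE accounts for the codon structure of coding sequences and tolerates frameshifts. The Python sketch below is not MACSE's algorithm; it only illustrates, under simple assumptions, the two symptoms a codon-aware tool has to deal with: a sequence whose length is not a multiple of three, and stop codons appearing before the last codon. The function name `scan_cds` is hypothetical, and the stop codons are those of the standard genetic code.

```python
# Stop codons of the standard genetic code (assumption: no alternative code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_cds(seq):
    """Split a coding sequence into codons and flag simple frame problems.

    Returns a tuple (codons, in_frame, internal_stops) where:
      - codons is the list of complete triplets read from position 0,
      - in_frame is True when the length is a multiple of three,
      - internal_stops lists the indices of stop codons that are not
        the last complete codon.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    in_frame = len(seq) % 3 == 0
    internal_stops = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    return codons, in_frame, internal_stops


if __name__ == "__main__":
    print(scan_cds("ATGAAATAA"))
    print(scan_cds("ATGTAAAAAT"))
```

A sequence flagged by either check is the kind of input for which a plain nucleotide or protein aligner can produce a misleading codon alignment.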
5.
Bioinformatics ; 33(9): 1387-1388, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453680

ABSTRACT

Motivation: Marker-assisted selection strongly relies on genetic maps to accelerate breeding programs. High-density maps are now available for numerous species. Dedicated tools are required to compare several high-density maps on the basis of their key characteristics, while pinpointing their differences and similarities. Results: We developed the Genetic Map Comparator, a web-based application for easy comparison of different maps according to their key statistics and the relative positions of common markers. Availability and Implementation: The Genetic Map Comparator is available online at: http://bioweb.supagro.inra.fr/geneticMapComparator. The source code is freely available on GitHub under the CeCILL general public license: https://github.com/holtzy/GenMap-Comparator. Contact: Holtz@supagro.fr; Ranwez@supagro.fr.


Subjects
Genomics/methods; Sequence Analysis, DNA/methods; Software; Disease Resistance/genetics; Genes, Plant; Plant Diseases/genetics; Quantitative Trait Loci; Triticum/genetics; Triticum/virology; Virus Diseases/genetics
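
Note on entry 5: the abstract mentions comparing maps "according to their key statistics" without listing them. As a hedged illustration only, the Python sketch below computes a few statistics that are commonly reported for one linkage group (marker count, map length, mean and largest gap between adjacent markers). The choice of statistics, the function name `map_stats` and the input format (a list of marker positions in cM) are assumptions, not taken from the Genetic Map Comparator.

```python
def map_stats(positions_cm):
    """Summarise one linkage group given its marker positions in cM.

    The set of statistics is an assumption made for illustration.
    """
    pos = sorted(positions_cm)
    if len(pos) < 2:
        # With fewer than two markers there is no interval to measure.
        return {"n_markers": len(pos), "length_cm": 0.0,
                "mean_gap_cm": 0.0, "max_gap_cm": 0.0}
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return {
        "n_markers": len(pos),
        "length_cm": pos[-1] - pos[0],
        "mean_gap_cm": sum(gaps) / len(gaps),
        "max_gap_cm": max(gaps),
    }


if __name__ == "__main__":
    # Two hypothetical maps of the same chromosome.
    print(map_stats([0.0, 4.0, 10.0]))
    print(map_stats([0.0, 1.5, 2.0, 9.0, 12.5]))
```

Running the same summary on each map gives a side-by-side view of density and coverage before looking at the positions of shared markers.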
6.
Theor Appl Genet ; 130(7): 1491-1505, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28451771

ABSTRACT

KEY MESSAGE: The resistance of durum wheat to the Wheat spindle streak mosaic virus (WSSMV) is controlled by two main QTLs on chromosomes 7A and 7B, with a huge epistatic effect. Wheat spindle streak mosaic virus (WSSMV) is a major disease of durum wheat in Europe and North America. Breeding WSSMV-resistant cultivars is currently the only way to control the virus since no treatment is available. This paper reports studies of the inheritance of WSSMV resistance using two related durum wheat populations obtained by crossing two elite cultivars with a WSSMV-resistant emmer cultivar. In 2012 and 2015, 354 recombinant inbred lines (RIL) were phenotyped using visual notations, ELISA and qPCR and genotyped using locus targeted capture and sequencing. This allowed us to build a consensus genetic map of 8568 markers and identify three chromosomal regions involved in WSSMV resistance. Two major regions (located on chromosomes 7A and 7B) jointly explain, on the basis of epistatic interactions, up to 43% of the phenotypic variation. Flanking sequences of our genetic markers are provided to facilitate future marker-assisted selection of WSSMV-resistant cultivars.


Subjects
Disease Resistance/genetics; Epistasis, Genetic; Plant Diseases/genetics; Potyviridae; Quantitative Trait Loci; Triticum/genetics; Chromosome Mapping; Crosses, Genetic; Genetic Linkage; Genetic Markers; Genotype; Phenotype; Plant Diseases/virology; Triticum/virology
7.
J Theor Biol ; 432: 1-13, 2017 11 07.
Article in English | MEDLINE | ID: mdl-28801222

ABSTRACT

Gene trees and species trees can be discordant due to several processes. Standard models of reconciliations consider macro-evolutionary events at the gene level: duplications, losses and transfers of genes. However, another common source of gene tree-species tree discordance is incomplete lineage sorting (ILS), whereby gene divergences corresponding to speciations occur "out of order". However, ILS is seldom considered in reconciliation models. In this paper, we devise a unified formal IDTL reconciliation model which includes all the above mentioned processes. We show how to properly cost ILS under this model, and then give a fixed-parameter tractable (FPT) algorithm which calculates the most parsimonious IDTL reconciliation, with guaranteed time-consistency of transfer events. Provided that the number of branches in contiguous regions of the species tree in which ILS is allowed is bounded by a constant, this algorithm is linear in the number of genes and quadratic in the number of species. This provides a formal foundation to the inference of ILS in a reconciliation framework.


Subjects
Gene Duplication; Gene Transfer, Horizontal; Phylogeny; Algorithms; Haploidy; Models, Genetic
8.
J Math Biol ; 72(7): 1811-44, 2016 06.
Article in English | MEDLINE | ID: mdl-26337177

ABSTRACT

In the field of phylogenetics, the evolutionary history of a set of organisms is commonly depicted by a species tree, whose internal nodes represent speciation events, while the evolutionary history of a gene family is depicted by a gene tree, whose internal nodes can also represent macro-evolutionary events such as gene duplications and transfers. As speciation events are only part of the events shaping a gene history, the topology of a gene tree can show incongruences with that of the corresponding species tree. These incongruences can be used to infer the macro-evolutionary events undergone by the gene family. This is done by embedding the gene tree inside the species tree and hence providing a reconciliation of those trees. In the past decade, several parsimony-based methods have been developed to infer such reconciliations, accounting for gene duplications (D), transfers (T) and losses (L). The main contribution of this paper is to formally prove an important assumption implicitly made by previous works on these reconciliations, namely that solving the (maximum) parsimony DTL reconciliation problem in the discrete framework is equivalent to finding a most parsimonious DTL scenario in the continuous framework. In the process, we also prove several intermediate results that are useful on their own and constitute a theoretical toolbox that will likely facilitate future theoretical contributions in the field.


Subjects
Biological Evolution; Gene Duplication; Models, Biological; Algorithms; Evolution, Molecular; Gene Deletion; Gene Transfer, Horizontal; Genetic Speciation; Phylogeny
9.
BMC Bioinformatics ; 16: 384, 2015 Nov 14.
Article in English | MEDLINE | ID: mdl-26573665

ABSTRACT

BACKGROUND: Given a gene and a species tree, reconciliation methods attempt to retrieve the macro-evolutionary events that best explain the discrepancies between the two tree topologies. The DTL parsimonious approach searches for a most parsimonious reconciliation between a gene tree and a (dated) species tree, considering four possible macro-evolutionary events (speciation, duplication, transfer, and loss) with specific costs. Unfortunately, many events are erroneously predicted due to errors in the input trees, inappropriate input cost values or because of the existence of several equally parsimonious scenarios. It is thus crucial to provide a measure of the reliability for predicted events. It has been recently proposed that the reliability of an event can be estimated via its frequency in the set of most parsimonious reconciliations obtained using a variety of reasonable input cost vectors. To compute such a support, a straightforward but time-consuming approach is to generate the costs slightly departing from the original ones, independently compute the set of all most parsimonious reconciliations for each vector, and combine these sets a posteriori. Another proposed approach uses Pareto-optimality to partition cost values into regions which induce reconciliations with the same number of DTL events. The support of an event is then defined as its frequency in the set of regions. However, often, the number of regions is not large enough to provide reliable supports. RESULTS: We present here a method to compute efficiently event supports via a polynomial-sized graph, which can represent all reconciliations for several different costs. Moreover, two methods are proposed to take into account alternative input costs: either explicitly providing an input cost range or allowing a tolerance for the over cost of a reconciliation. 
Our methods are faster than the region based method, substantially faster than the sampling-costs approach, and have a higher event-prediction accuracy on simulated data. CONCLUSIONS: We propose a new approach to improve the accuracy of event supports for parsimonious reconciliation methods to account for uncertainty in the input costs. Furthermore, because of their speed, our methods can be used on large gene families. Our algorithms are implemented in the ecceTERA program, freely available from http://mbb.univ-montp2.fr/MBB/.


Subjects
Evolution, Molecular; Phylogeny; Proteobacteria/genetics; Algorithms; Computer Simulation; Genes, Bacterial; Reproducibility of Results
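
Note on entry 9: the abstract defines the support of an event as its frequency in a set of most parsimonious reconciliations. The Python sketch below shows that definition on an explicitly enumerated set. It is a naive illustration, not the ecceTERA method, which the abstract says relies on a polynomial-sized graph precisely to avoid such enumeration. The event encoding (a tuple of event type, gene-tree node and species-tree branch) and the name `event_supports` are assumptions.

```python
from collections import Counter


def event_supports(reconciliations):
    """Return {event: fraction of reconciliations that contain it}.

    Each reconciliation is an iterable of hashable events; here an event
    is a hypothetical tuple (event_type, gene_node, species_branch).
    """
    if not reconciliations:
        return {}
    # set(rec) makes sure an event is counted once per reconciliation.
    counts = Counter(ev for rec in reconciliations for ev in set(rec))
    n = len(reconciliations)
    return {ev: c / n for ev, c in counts.items()}


if __name__ == "__main__":
    recs = [
        {("D", "g1", "s1"), ("T", "g2", "s3")},
        {("D", "g1", "s1"), ("L", "g2", "s4")},
    ]
    for event, support in sorted(event_supports(recs).items()):
        print(event, support)
```

An event present in every equally parsimonious scenario gets a support of 1, while one that depends on an arbitrary choice among scenarios gets a lower value.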
10.
BMC Bioinformatics ; 16: 83, 2015 Mar 14.
Article in English | MEDLINE | ID: mdl-25887746

ABSTRACT

BACKGROUND: Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. In order to keep such applications up-to-date, new entities need to be frequently annotated to enrich the corpus. However, this task is time-consuming and requires a high-level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage from the features of the document. RESULTS: In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution to suggest a conceptual annotation for new entities based on related already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are done via usual metrics and semantic similarity. CONCLUSIONS: By only relying on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet, it provides better results than the other approaches by giving a consistent annotation scored with a global criterion - instead of one score per concept.


Subjects
Abstracting and Indexing; Algorithms; Information Storage and Retrieval; Natural Language Processing; Semantics; User-Computer Interface; Humans; Medical Subject Headings; Pattern Recognition, Automated; Vocabulary, Controlled
11.
Mol Biol Evol ; 31(7): 1923-8, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24723423

ABSTRACT

Comparative genomic studies extensively rely on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics, evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/.


Subjects
Databases, Genetic; Mammals/classification; Mammals/genetics; Animals; Base Sequence; Conserved Sequence; Evolution, Molecular; Exons; Genomics; Humans; Phylogeny; Sequence Alignment; Software; Web Browser
12.
J Math Biol ; 71(5): 1179-209, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25502987

ABSTRACT

Reconciliations between gene and species trees have important applications in the study of genome evolution (e.g. sequence orthology prediction or quantification of transfer events). While numerous methods have been proposed to infer them, little has been done to study the underlying reconciliation space. In this paper, we characterise the reconciliation space for two evolutionary models: the DTL (duplication, loss and transfer) model and a variant of it, the no-TL model, which does not allow TL events (a transfer immediately followed by a loss). We provide formulae to compute the size of the corresponding spaces and define a set of transformation operators sufficient to explore the entire reconciliation space. We also define a distance between two reconciliations as the minimal number of operations needed to transform one into the other and prove that this distance is easily computable in the no-TL model. Computing this distance in the DTL model is more difficult and it is an open question whether it is NP-hard or not. This work constitutes an important step toward reconciliation space characterisation and reconciliation comparison, needed to better assess the performance of reconciliation inference methods through simulations.


Subjects
Evolution, Molecular; Models, Genetic; Phylogeny; Algorithms; Computer Simulation; Gene Deletion; Gene Duplication; Gene Transfer, Horizontal; Genetic Speciation; Mathematical Concepts; Multigene Family; Species Specificity
13.
Mol Biol Evol ; 30(9): 2134-44, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23813978

ABSTRACT

Despite the rapid increase of size in phylogenomic data sets, a number of important nodes on animal phylogeny are still unresolved. Among these, the rooting of the placental mammal tree is still a controversial issue. One difficulty lies in the pervasive phylogenetic conflicts among genes, with each one telling its own story, which may be reliable or not. Here, we identified a simple criterion, that is, the GC content, which substantially helps in determining which gene trees best reflect the species tree. We assessed the ability of 13,111 coding sequence alignments to correctly reconstruct the placental phylogeny. We found that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree. We interpret this GC effect mainly as a consequence of genome-wide variations in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. When we focused on the AT-richest fraction of the data set, the resolution level of the placental phylogeny was greatly increased, and a strong support was obtained in favor of an Afrotheria rooting, that is, Afrotheria as the sister group of all other placentals. We show that in mammals most conflicts among gene trees, which have so far hampered the resolution of the placental tree, are concentrated in the GC-rich regions of the genome. We argue that the GC content, because it is a reliable indicator of the long-term recombination rate, is an informative criterion that could help in identifying the most reliable molecular markers for species tree inference.


Subjects
AT Rich Sequence; Base Composition; Evolution, Molecular; Genome; Mammals/classification; Phylogeny; Animals; Female; Mammals/genetics; Molecular Sequence Data; Placenta/physiology; Pregnancy; Recombination, Genetic; Sequence Alignment; Sequence Homology, Nucleic Acid
14.
BMC Plant Biol ; 14: 151, 2014 May 31.
Article in English | MEDLINE | ID: mdl-24884640

ABSTRACT

BACKGROUND: Recurrent gene duplication and retention played an important role in angiosperm genome evolution. It has been hypothesized that these processes contribute significantly to plant adaptation but so far this hypothesis has not been tested at the genome scale. RESULTS: We studied available sequenced angiosperm genomes to assess the frequency of positive selection footprints in lineage specific expanded (LSE) gene families compared to single-copy genes using a dN/dS-based test in a phylogenetic framework. We found 5.38% of alignments in LSE genes with codons under positive selection. In contrast, we found no evidence for codons under positive selection in the single-copy reference set. An analysis at the branch level shows that purifying selection acted more strongly on single-copy genes than on LSE gene clusters. Moreover, we detect significantly more branches indicating evolution under positive selection and/or relaxed constraint in LSE genes than in single-copy genes. CONCLUSIONS: In this, to our knowledge, first genome-scale study we provide strong empirical support for the hypothesis that LSE genes fuel adaptation in angiosperms. Our conservative approach for detecting selection footprints as well as our results can be of interest for further studies on (plant) gene family evolution.


Subjects
Adaptation, Physiological/genetics; Gene Duplication; Genome, Plant; Cluster Analysis; Codon/genetics; Databases, Genetic; Molecular Sequence Annotation; Multigene Family; Mutation/genetics; Polymorphism, Genetic; Selection, Genetic; Time Factors
15.
Nucleic Acids Res ; 40(18): 9102-14, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22833609

ABSTRACT

We have sequenced the genome of the emerging human pathogen Babesia microti and compared it with that of other protozoa. B. microti has the smallest nuclear genome among all Apicomplexan parasites sequenced to date with three chromosomes encoding ∼3500 polypeptides, several of which are species specific. Genome-wide phylogenetic analyses indicate that B. microti is significantly distant from all species of Babesidae and Theileridae and defines a new clade in the phylum Apicomplexa. Furthermore, unlike all other Apicomplexa, its mitochondrial genome is circular. Genome-scale reconstruction of functional networks revealed that B. microti has the minimal metabolic requirement for intraerythrocytic protozoan parasitism. B. microti multigene families differ from those of other protozoa in both the copy number and organization. Two lateral transfer events with significant metabolic implications occurred during the evolution of this parasite. The genomic sequencing of B. microti identified several targets suitable for the development of diagnostic assays and novel therapies for human babesiosis.


Subjects
Babesia microti/genetics; Genome, Protozoan; Babesia microti/classification; Babesia microti/metabolism; Glycosylphosphatidylinositols/biosynthesis; Glycosylphosphatidylinositols/metabolism; Proteome/metabolism; Sequence Analysis, DNA
16.
bioRxiv ; 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38293033

ABSTRACT

Babesiosis, caused by protozoan parasites of the genus Babesia, is an emerging tick-borne disease of significance for both human and animal health. Babesia parasites infect erythrocytes of vertebrate hosts where they develop and multiply rapidly to cause the pathological symptoms associated with the disease. The identification of various Babesia species underscores the ongoing risk of new zoonotic pathogens capable of infecting humans, a concern amplified by anthropogenic activities and environmental shifts impacting the distribution and transmission dynamics of parasites, their vectors, and reservoir hosts. One such species, Babesia MO1, previously implicated in severe cases of human babesiosis in the midwestern United States, was initially considered closely related to B. divergens, the predominant agent of human babesiosis in Europe. Yet, uncertainties persist regarding whether these pathogens represent distinct variants of the same species or are entirely separate species. We show that although both B. MO1 and B. divergens share similar genome sizes, comprising three nuclear chromosomes, one linear mitochondrial chromosome, and one circular apicoplast chromosome, major differences exist in terms of genomic sequence divergence, gene functions, transcription profiles, replication rates and susceptibility to antiparasitic drugs. Furthermore, both pathogens have evolved distinct classes of multigene families, crucial for their pathogenicity and adaptation to specific mammalian hosts. Leveraging genomic information for B. MO1, B. divergens, and other members of the Babesiidae family within Apicomplexa provides valuable insights into the evolution, diversity, and virulence of these parasites. This knowledge serves as a critical tool in preemptively addressing the emergence and rapid transmission of more virulent strains.

17.
BMC Bioinformatics ; 14: 332, 2013 Nov 20.
Article in English | MEDLINE | ID: mdl-24252193

ABSTRACT

BACKGROUND: Genes located in the same chromosome region share common evolutionary events more often than other genes (e.g. a segmental duplication of this region). Their evolution may also be related if they are involved in the same protein complex or biological process. Identifying co-evolving genes can thus shed light on ancestral genome structures and functional gene interactions. RESULTS: We devise a simple, fast and accurate probability method based on species tree-gene tree reconciliations to detect when two gene families have co-evolved. Our method observes the number and location of predicted macro-evolutionary events, and estimates the probability of having the observed number of common events by chance. CONCLUSIONS: Simulation studies confirm that our method effectively identifies co-evolving families. This opens numerous perspectives on genome-scale analysis where this method could be used to pinpoint co-evolving gene families and thus help to unravel ancestral genome arrangements or undocumented gene interactions.


Subjects
Evolution, Molecular; Multigene Family/genetics; Computer Simulation; Genome, Bacterial; Phylogeny; Probability; Proteobacteria/genetics; Random Allocation; Segmental Duplications, Genomic
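
Note on entry 17: the abstract speaks of estimating "the probability of having the observed number of common events by chance" but does not give the formula. The Python sketch below uses a hypergeometric tail as a stand-in null model; this is a simplifying assumption chosen for illustration, not the authors' method. All names (`prob_at_least_k_shared`, `n_slots`, `a`, `b`, `k`) are hypothetical: family A has events on `a` of `n_slots` species-tree branches, family B has `b` events placed uniformly at random, and `k` shared branches were observed.

```python
from math import comb


def prob_at_least_k_shared(n_slots, a, b, k):
    """P(X >= k) for X ~ Hypergeometric(n_slots, a, b).

    Assumed null model: the b events of family B fall on distinct
    branches chosen uniformly among n_slots, independently of the
    a branches carrying events of family A.
    """
    if not (0 <= a <= n_slots and 0 <= b <= n_slots):
        raise ValueError("a and b must lie between 0 and n_slots")
    total = comb(n_slots, b)
    upper = min(a, b)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum.
    favourable = sum(
        comb(a, i) * comb(n_slots - a, b - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers: 10 branches, 3 events per family, 3 shared.
    print(prob_at_least_k_shared(10, 3, 3, 3))
```

A small tail probability suggests the two families share more events than independent placement would explain, which is the intuition behind the co-evolution test described in the abstract.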
18.
BMC Bioinformatics ; 14 Suppl 15: S15, 2013.
Article in English | MEDLINE | ID: mdl-24564644

ABSTRACT

BACKGROUND: Using Next Generation Sequencing, SNP discovery is relatively easy on diploid species and still hampered in polyploid species by the confusion due to homeology. We develop HomeoSplitter; a fast and effective solution to split original contigs obtained by RNAseq into two homeologous sequences. It uses the differential expression of the two homeologous genes in the RNA. We verify that the new sequences are closer to the diploid progenitors of the allopolyploid species than the original contig. By remapping original reads on these new sequences, we also verify that the number of valuable detected SNPs has significantly increased. RESULTS: HomeoSplitter is a fast and effective solution to disentangle homeologous sequences based on a maximum likelihood optimization. On a benchmark set of 2,505 clusters containing homologous sequences of urartu, speltoides and durum, HomeoSplitter was efficient to build sequences closer to the diploid references and increased the number of valuable SNPs from 188 out of 1,360 SNPs detected when mapping the reads on the de novo durum assembly to 762 out of 1,620 SNPs when mapping on HomeoSplitter contigs. CONCLUSIONS: The HomeoSplitter program is freely available at http://bioweb.supagro.inra.fr/homeoSplitter/. This work provides a practical solution to the complex problem of disentangling homeologous transcripts in allo-tetraploids, which further allows an improved SNP detection.


Subjects
Sequence Analysis, DNA; Tetraploidy; Triticum/genetics; Base Sequence; Diploidy; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide
19.
Mol Biol Evol ; 29(7): 1861-74, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22319139

ABSTRACT

The analysis of extant sequences shows that molecular evolution has been heterogeneous through time and among lineages. However, for a given sequence alignment, it is often difficult to uncover what factors caused this heterogeneity. In fact, identifying and characterizing heterogeneous patterns of molecular evolution along a phylogenetic tree is very challenging, for lack of appropriate methods. Users either have to a priori define groups of branches along which they believe molecular evolution has been similar or have to allow each branch to have its own pattern of molecular evolution. The first approach assumes prior knowledge that is seldom available, and the second requires estimating an unreasonably large number of parameters. Here we propose a convenient and reliable approach where branches get clustered by their pattern of molecular evolution alone, with no need for prior knowledge about the data set under study. Model selection is achieved in a statistical framework and therefore avoids overparameterization. We rely on substitution mapping for efficiency and present two clustering approaches, depending on whether or not we expect neighbouring branches to share more similar patterns of sequence evolution than distant branches. We validate our method on simulations and test it on four previously published data sets. We find that our method correctly groups branches sharing similar equilibrium GC contents in a data set of ribosomal RNAs and recovers expected footprints of selection through dN/dS. Importantly, it also uncovers a new pattern of relaxed selection in a phylogeny of Mantellid frogs, which we are able to correlate to life-history traits. This shows that our programs should be very useful to study patterns of molecular evolution and reveal new correlations between sequence and species evolution. 
Our programs can run on DNA, RNA, codon, or amino acid sequences with a large set of possible models of substitutions and are available at http://biopp.univ-montp2.fr/forge/testnh.


Subjects
Algorithms; Evolution, Molecular; Models, Genetic; Animals; Biological Evolution; Cluster Analysis; Computer Simulation; Daphnia/genetics; Muramidase/genetics; Phylogeny; RNA, Ribosomal/genetics; Ranidae/genetics
20.
Genome Res ; 20(8): 1001-9, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20530252

ABSTRACT

The origin, evolution, and functional relevance of genomic variations in GC content are a long-debated topic, especially in mammals. Most of the existing literature, however, has focused on a small number of model species and/or limited sequence data sets. We analyzed more than 1000 orthologous genes in 33 fully sequenced mammalian genomes, reconstructed their ancestral isochore organization in the maximum likelihood framework, and explored the evolution of third-codon position GC content in representatives of 16 orders and 27 families. We showed that the previously reported erosion of GC-rich isochores is not a general trend. Several species (e.g., shrew, microbat, tenrec, rabbit) have independently undergone a marked increase in GC content, with a widening gap between the GC-poorest and GC-richest classes of genes. The intensively studied apes and (especially) murids do not reflect the general placental pattern. We correlated GC-content evolution with species life-history traits and cytology. Significant effects of body mass and genome size were detected, with each being consistent with the GC-biased gene conversion model.


Subjects
Base Composition/genetics; Chromosomes, Mammalian/genetics; Evolution, Molecular; Genome; Isochores/genetics; Phylogeny; Animals; Base Sequence; Genomics; Sequence Alignment
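
Note on entry 20: the abstract tracks GC content at third codon positions. The Python sketch below shows how that quantity is usually computed for a single in-frame coding sequence. It is a minimal illustration, not the pipeline used in the study; the function names `gc_content` and `gc3` are hypothetical, and gaps or ambiguous bases are simply ignored, which is an assumption.

```python
def gc_content(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0  # convention chosen here for empty or all-gap input
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(cds):
    """GC content at third codon positions of an in-frame CDS.

    Assumes the sequence starts at codon position 1, so third positions
    are the characters at indices 2, 5, 8, ...
    """
    return gc_content(cds[2::3])


if __name__ == "__main__":
    cds = "ATGGCCAAGTTC"  # hypothetical 4-codon sequence
    print(gc_content(cds), gc3(cds))
```

Comparing `gc3` across orthologous genes is one simple way to see the gap between GC-poor and GC-rich gene classes that the abstract describes.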