1.
Nucleic Acids Res ; 52(D1): D529-D535, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37843103

ABSTRACT

To date, the databases built to gather information on gene orthology do not provide end-users with descriptors of the molecular evolution information and phylogenetic pattern of these orthologues. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of coding sequences in mammalian genomes. OrthoMaM version 12 includes 15,868 alignments of orthologous coding sequences (CDS) from the 190 complete mammalian genomes currently available. All annotations and 1-to-1 orthology assignments are based on NCBI. Orthologous CDS can be mined for potential informative markers at the different taxonomic levels of the mammalian tree. To this end, several evolutionary descriptors of DNA sequences are provided for querying purposes (e.g. base composition and relative substitution rate). The graphical web interface allows the user to easily browse and sort the results of combined queries. The corresponding multiple sequence alignments and maximum likelihood (ML) trees, inferred using state-of-the-art approaches, are available for download at both the nucleotide and amino acid levels. OrthoMaM v12 can be used by researchers interested either in reconstructing the phylogenetic relationships of mammalian taxa or in understanding the evolutionary dynamics of coding sequences in their genomes. OrthoMaM is available for browsing, querying and complete or filtered download at https://orthomam.mbb.cnrs.fr/.


Subject(s)
Databases, Genetic; Genomics; Animals; Base Sequence; Genome; Genomics/methods; Mammals/classification; Mammals/genetics; Phylogeny; Biological Evolution
2.
Plant J ; 108(2): 492-508, 2021 10.
Article in English | MEDLINE | ID: mdl-34382706

ABSTRACT

Oryza sativa (rice) plays an essential food security role for more than half of the world's population. Obtaining crops with high levels of disease resistance is a major challenge for breeders, especially today, given the urgent need for agriculture to be more sustainable. Plant resistance genes are mainly encoded by three large leucine-rich repeat (LRR)-containing receptor (LRR-CR) families: the LRR-receptor-like kinase (LRR-RLK), LRR-receptor-like protein (LRR-RLP) and nucleotide-binding LRR receptor (NLR). Using lrrprofiler, a pipeline that we developed to annotate and classify these proteins, we compared three publicly available annotations of the rice Nipponbare reference genome. The extensive discrepancies that we observed for LRR-CR gene models led us to perform an in-depth manual curation of their annotations while paying special attention to nonsense mutations. We then transferred this manually curated annotation to Kitaake, a cultivar that is closely related to Nipponbare, using an optimized strategy. Here, we discuss the breakthrough achieved by manual curation when comparing genomes and, in addition to 'functional' and 'structural' annotations, we propose that the community adopt this approach, which we call 'comprehensive' annotation. The resulting data are crucial for further studies on the natural variability and evolution of LRR-CR genes in order to promote their use in breeding future resilient varieties.


Subject(s)
Molecular Sequence Annotation; Oryza/genetics; Plant Proteins/genetics; Repetitive Sequences, Amino Acid; Genome, Plant; Genotype; Molecular Sequence Annotation/methods; Oryza/chemistry; Plant Proteins/chemistry
3.
Mol Biol Evol ; 36(4): 861-862, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30698751

ABSTRACT

We present version 10 of OrthoMaM, a database of orthologous mammalian markers. OrthoMaM is already 11 years old and since the outset it has kept on improving, providing high-quality alignments and phylogenetic trees computed with state-of-the-art methods on up-to-date data. The main contribution of this version is the increase in the number of taxa: 116 mammalian genomes for 14,509 one-to-one orthologous genes. This has been made possible by the combination of genomic data deposited in Ensembl complemented by additional good-quality genomes only available in NCBI. Version 10 users will benefit from pipeline improvements and a completely redesigned web interface.


Subject(s)
Databases, Genetic; Genome; Mammals/genetics; Phylogeny; Sequence Alignment; Animals
4.
Mol Biol Evol ; 35(10): 2582-2584, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30165589

ABSTRACT

Multiple sequence alignment is a prerequisite for many evolutionary analyses. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that explicitly accounts for the underlying codon structure of protein-coding nucleotide sequences. Its unique characteristic allows building reliable codon alignments even in the presence of frameshifts. This facilitates downstream analyses such as selection pressure estimation based on the ratio of nonsynonymous to synonymous substitutions. Here, we present MACSE v2, a major update with an improved version of the initial algorithm enriched with a complete toolkit to handle multiple alignments of protein-coding sequences. A graphical interface now provides user-friendly access to the different subprograms.
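
As a rough illustration of why a codon-aware alignment matters for the downstream analyses mentioned above, the following minimal sketch classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous. It is not MACSE code and it does not compute a proper dN/dS ratio (which also requires counting synonymous and nonsynonymous sites); the function name `count_codon_differences` and the toy sequences are hypothetical. Codons containing any symbol other than A, C, G or T (for example gaps) are simply skipped.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq1: str, seq2: str) -> tuple:
    """Count (synonymous, nonsynonymous) codon differences between two
    aligned, in-frame coding sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(seq1) - len(seq1) % 3, 3):
        codon1 = seq1[pos:pos + 3].upper()
        codon2 = seq2[pos:pos + 3].upper()
        # Skip identical codons and codons with gaps or ambiguous symbols.
        if codon1 == codon2 or codon1 not in CODON_TABLE or codon2 not in CODON_TABLE:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical toy alignment: TTT/TTC both encode F, GGA (G) vs GCA (A) differ.
print(count_codon_differences("ATGTTTGGA---", "ATGTTCGCA---"))
```

If a frameshift shifted one sequence by a single nucleotide, every downstream codon comparison in such a count would be meaningless, which is the situation a codon-aware aligner is meant to prevent.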


Subject(s)
Sequence Alignment; Software; Codon, Terminator; Frameshift Mutation
5.
Bioinformatics ; 33(9): 1387-1388, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453680

ABSTRACT

Motivation: Marker-assisted selection strongly relies on genetic maps to accelerate breeding programs. High-density maps are now available for numerous species. Dedicated tools are required to compare several high-density maps on the basis of their key characteristics, while pinpointing their differences and similarities. Results: We developed the Genetic Map Comparator, a web-based application for easy comparison of different maps according to their key statistics and the relative positions of common markers. Availability and Implementation: The Genetic Map Comparator is available online at: http://bioweb.supagro.inra.fr/geneticMapComparator. The source code is freely available on GitHub under the CeCILL general public license: https://github.com/holtzy/GenMap-Comparator. Contact: Holtz@supagro.fr; Ranwez@supagro.fr.


Subject(s)
Genomics/methods; Sequence Analysis, DNA/methods; Software; Disease Resistance/genetics; Genes, Plant; Plant Diseases/genetics; Quantitative Trait Loci; Triticum/genetics; Triticum/virology; Virus Diseases/genetics
6.
Theor Appl Genet ; 130(7): 1491-1505, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28451771

ABSTRACT

KEY MESSAGE: The resistance of durum wheat to the Wheat spindle streak mosaic virus (WSSMV) is controlled by two main QTLs on chromosomes 7A and 7B, with a huge epistatic effect. Wheat spindle streak mosaic virus (WSSMV) is a major disease of durum wheat in Europe and North America. Breeding WSSMV-resistant cultivars is currently the only way to control the virus since no treatment is available. This paper reports studies of the inheritance of WSSMV resistance using two related durum wheat populations obtained by crossing two elite cultivars with a WSSMV-resistant emmer cultivar. In 2012 and 2015, 354 recombinant inbred lines (RIL) were phenotyped using visual notations, ELISA and qPCR and genotyped using locus targeted capture and sequencing. This allowed us to build a consensus genetic map of 8568 markers and identify three chromosomal regions involved in WSSMV resistance. Two major regions (located on chromosomes 7A and 7B) jointly explain, on the basis of epistatic interactions, up to 43% of the phenotypic variation. Flanking sequences of our genetic markers are provided to facilitate future marker-assisted selection of WSSMV-resistant cultivars.


Subject(s)
Disease Resistance/genetics; Epistasis, Genetic; Plant Diseases/genetics; Potyviridae; Quantitative Trait Loci; Triticum/genetics; Chromosome Mapping; Crosses, Genetic; Genetic Linkage; Genetic Markers; Genotype; Phenotype; Plant Diseases/virology; Triticum/virology
7.
J Theor Biol ; 432: 1-13, 2017 11 07.
Article in English | MEDLINE | ID: mdl-28801222

ABSTRACT

Gene trees and species trees can be discordant due to several processes. Standard models of reconciliations consider macro-evolutionary events at the gene level: duplications, losses and transfers of genes. Another common source of gene tree-species tree discordance is incomplete lineage sorting (ILS), whereby gene divergences corresponding to speciations occur "out of order", yet ILS is seldom considered in reconciliation models. In this paper, we devise a unified formal IDTL reconciliation model which includes all the above-mentioned processes. We show how to properly cost ILS under this model, and then give a fixed-parameter tractable (FPT) algorithm which calculates the most parsimonious IDTL reconciliation, with guaranteed time-consistency of transfer events. Provided that the number of branches in contiguous regions of the species tree in which ILS is allowed is bounded by a constant, this algorithm is linear in the number of genes and quadratic in the number of species. This provides a formal foundation to the inference of ILS in a reconciliation framework.


Subject(s)
Gene Duplication; Gene Transfer, Horizontal; Phylogeny; Algorithms; Haploidy; Models, Genetic
8.
J Math Biol ; 72(7): 1811-44, 2016 06.
Article in English | MEDLINE | ID: mdl-26337177

ABSTRACT

In the field of phylogenetics, the evolutionary history of a set of organisms is commonly depicted by a species tree, whose internal nodes represent speciation events, while the evolutionary history of a gene family is depicted by a gene tree, whose internal nodes can also represent macro-evolutionary events such as gene duplications and transfers. As speciation events are only part of the events shaping a gene history, the topology of a gene tree can show incongruences with that of the corresponding species tree. These incongruences can be used to infer the macro-evolutionary events undergone by the gene family. This is done by embedding the gene tree inside the species tree and hence providing a reconciliation of those trees. In the past decade, several parsimony-based methods have been developed to infer such reconciliations, accounting for gene duplications (D), transfers (T) and losses (L). The main contribution of this paper is to formally prove an important assumption implicitly made by previous works on these reconciliations, namely that solving the (maximum) parsimony DTL reconciliation problem in the discrete framework is equivalent to finding a most parsimonious DTL scenario in the continuous framework. In the process, we also prove several intermediate results that are useful on their own and constitute a theoretical toolbox that will likely facilitate future theoretical contributions in the field.


Subject(s)
Biological Evolution; Gene Duplication; Models, Biological; Algorithms; Evolution, Molecular; Gene Deletion; Gene Transfer, Horizontal; Genetic Speciation; Phylogeny
9.
BMC Bioinformatics ; 16: 384, 2015 Nov 14.
Article in English | MEDLINE | ID: mdl-26573665

ABSTRACT

BACKGROUND: Given a gene and a species tree, reconciliation methods attempt to retrieve the macro-evolutionary events that best explain the discrepancies between the two tree topologies. The DTL parsimonious approach searches for a most parsimonious reconciliation between a gene tree and a (dated) species tree, considering four possible macro-evolutionary events (speciation, duplication, transfer, and loss) with specific costs. Unfortunately, many events are erroneously predicted due to errors in the input trees, inappropriate input cost values or because of the existence of several equally parsimonious scenarios. It is thus crucial to provide a measure of reliability for predicted events. It has been recently proposed that the reliability of an event can be estimated via its frequency in the set of most parsimonious reconciliations obtained using a variety of reasonable input cost vectors. To compute such a support, a straightforward but time-consuming approach is to generate cost vectors slightly departing from the original ones, independently compute the set of all most parsimonious reconciliations for each vector, and combine these sets a posteriori. Another proposed approach uses Pareto-optimality to partition cost values into regions which induce reconciliations with the same number of DTL events. The support of an event is then defined as its frequency in the set of regions. However, the number of regions is often not large enough to provide reliable supports. RESULTS: We present here a method to efficiently compute event supports via a polynomial-sized graph, which can represent all reconciliations for several different costs. Moreover, two methods are proposed to take into account alternative input costs: either explicitly providing an input cost range or allowing a tolerance for the extra cost of a reconciliation. Our methods are faster than the region-based method, substantially faster than the sampling-costs approach, and have a higher event-prediction accuracy on simulated data. CONCLUSIONS: We propose a new approach to improve the accuracy of event supports for parsimonious reconciliation methods to account for uncertainty in the input costs. Furthermore, because of their speed, our methods can be used on large gene families. Our algorithms are implemented in the ecceTERA program, freely available from http://mbb.univ-montp2.fr/MBB/.
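
The notion of event support used in this abstract (the frequency of an event in a pooled set of most parsimonious reconciliations) can be sketched as follows. This is only the naive, explicit-enumeration version of the idea, not the polynomial-sized graph method that the abstract presents; the event encoding `(type, gene node, species node)`, the function name `event_supports` and the toy data are all hypothetical.

```python
from collections import Counter

# Hypothetical encoding of an event: (event type, gene-tree node, species-tree node).
Event = tuple


def event_supports(reconciliations: list) -> dict:
    """Return, for each event, the fraction of reconciliations containing it.

    Each reconciliation is given as a set of events, so an event is counted
    at most once per reconciliation.
    """
    if not reconciliations:
        return {}
    counts = Counter(event for rec in reconciliations for event in rec)
    total = len(reconciliations)
    return {event: count / total for event, count in counts.items()}


# Hypothetical pooled set of most parsimonious reconciliations, e.g. gathered
# over several cost vectors.
pooled = [
    {("D", "g1", "s3"), ("T", "g2", "s1")},
    {("D", "g1", "s3"), ("L", "g2", "s2")},
    {("D", "g1", "s3"), ("T", "g2", "s1")},
    {("D", "g1", "s4"), ("T", "g2", "s1")},
]
supports = event_supports(pooled)
print(supports[("D", "g1", "s3")])
```

Explicit enumeration like this becomes impractical when the number of equally parsimonious reconciliations is large, which is the motivation the abstract gives for a compact graph representation.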


Subject(s)
Evolution, Molecular; Phylogeny; Proteobacteria/genetics; Algorithms; Computer Simulation; Genes, Bacterial; Reproducibility of Results
10.
BMC Bioinformatics ; 16: 83, 2015 Mar 14.
Article in English | MEDLINE | ID: mdl-25887746

ABSTRACT

BACKGROUND: Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. In order to keep such applications up-to-date, new entities need to be frequently annotated to enrich the corpus. However, this task is time-consuming and requires a high level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage of the features of the document. RESULTS: In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution to suggest a conceptual annotation for new entities based on related already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are done via usual metrics and semantic similarity. CONCLUSIONS: By only relying on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet, it provides better results than the other approaches by giving a consistent annotation scored with a global criterion instead of one score per concept.


Subject(s)
Abstracting and Indexing; Algorithms; Information Storage and Retrieval; Natural Language Processing; Semantics; User-Computer Interface; Humans; Medical Subject Headings; Pattern Recognition, Automated; Vocabulary, Controlled
11.
Mol Biol Evol ; 31(7): 1923-8, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24723423

ABSTRACT

Comparative genomic studies extensively rely on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows users to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics and evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/.


Subject(s)
Databases, Genetic; Mammals/classification; Mammals/genetics; Animals; Base Sequence; Conserved Sequence; Evolution, Molecular; Exons; Genomics; Humans; Phylogeny; Sequence Alignment; Software; Web Browser
12.
J Math Biol ; 71(5): 1179-209, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25502987

ABSTRACT

Reconciliations between gene and species trees have important applications in the study of genome evolution (e.g. sequence orthology prediction or quantification of transfer events). While numerous methods have been proposed to infer them, little has been done to study the underlying reconciliation space. In this paper, we characterise the reconciliation space for two evolutionary models: the DTL (duplication, loss and transfer) model and a variant of it, the no-TL model, which does not allow TL events (a transfer immediately followed by a loss). We provide formulae to compute the size of the corresponding spaces and define a set of transformation operators sufficient to explore the entire reconciliation space. We also define a distance between two reconciliations as the minimal number of operations needed to transform one into the other and prove that this distance is easily computable in the no-TL model. Computing this distance in the DTL model is more difficult and it is an open question whether it is NP-hard or not. This work constitutes an important step toward reconciliation space characterisation and reconciliation comparison, needed to better assess the performance of reconciliation inference methods through simulations.


Subject(s)
Evolution, Molecular; Models, Genetic; Phylogeny; Algorithms; Computer Simulation; Gene Deletion; Gene Duplication; Gene Transfer, Horizontal; Genetic Speciation; Mathematical Concepts; Multigene Family; Species Specificity
13.
Mol Biol Evol ; 30(9): 2134-44, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23813978

ABSTRACT

Despite the rapid increase in the size of phylogenomic data sets, a number of important nodes on animal phylogeny are still unresolved. Among these, the rooting of the placental mammal tree is still a controversial issue. One difficulty lies in the pervasive phylogenetic conflicts among genes, with each one telling its own story, which may be reliable or not. Here, we identified a simple criterion, that is, the GC content, which substantially helps in determining which gene trees best reflect the species tree. We assessed the ability of 13,111 coding sequence alignments to correctly reconstruct the placental phylogeny. We found that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree. We interpret this GC effect mainly as a consequence of genome-wide variations in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. When we focused on the AT-richest fraction of the data set, the resolution level of the placental phylogeny was greatly increased, and a strong support was obtained in favor of an Afrotheria rooting, that is, Afrotheria as the sister group of all other placentals. We show that in mammals most conflicts among gene trees, which have so far hampered the resolution of the placental tree, are concentrated in the GC-rich regions of the genome. We argue that the GC content, because it is a reliable indicator of the long-term recombination rate, is an informative criterion that could help in identifying the most reliable molecular markers for species tree inference.
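
The selection criterion described above (ranking alignments by GC content and keeping the AT-richest fraction) can be illustrated with a minimal sketch. The function names, the toy alignments and the choice of averaging GC content over the sequences of each alignment are assumptions made for illustration; the abstract does not specify how the authors computed or thresholded GC content.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    return gc / total if total else 0.0


def at_richest_fraction(alignments: dict, fraction: float) -> list:
    """Return the names of the alignments with the lowest mean GC content."""
    mean_gc = {
        name: sum(gc_content(s) for s in seqs) / len(seqs)
        for name, seqs in alignments.items()
    }
    ranked = sorted(mean_gc, key=mean_gc.get)  # AT-richest first
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy alignments (gene name -> aligned sequences).
toy = {
    "geneA": ["ATGAAATTT", "ATGAAATTC"],
    "geneB": ["ATGGCCGGC", "ATGGCGGGC"],
    "geneC": ["ATGACTGAT", "ATGACAGAT"],
}
selected = at_richest_fraction(toy, fraction=1 / 3)
print(selected)
```

In a real analysis the retained alignments would then be passed to a tree inference step; gaps and ambiguous symbols are ignored here when computing GC content.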


Subject(s)
AT Rich Sequence; Base Composition; Evolution, Molecular; Genome; Mammals/classification; Phylogeny; Animals; Female; Mammals/genetics; Molecular Sequence Data; Placenta/physiology; Pregnancy; Recombination, Genetic; Sequence Alignment; Sequence Homology, Nucleic Acid
14.
BMC Plant Biol ; 14: 151, 2014 May 31.
Article in English | MEDLINE | ID: mdl-24884640

ABSTRACT

BACKGROUND: Recurrent gene duplication and retention played an important role in angiosperm genome evolution. It has been hypothesized that these processes contribute significantly to plant adaptation but so far this hypothesis has not been tested at the genome scale. RESULTS: We studied available sequenced angiosperm genomes to assess the frequency of positive selection footprints in lineage-specific expanded (LSE) gene families compared to single-copy genes using a dN/dS-based test in a phylogenetic framework. We found 5.38% of alignments in LSE genes with codons under positive selection. In contrast, we found no evidence for codons under positive selection in the single-copy reference set. An analysis at the branch level shows that purifying selection acted more strongly on single-copy genes than on LSE gene clusters. Moreover, we detect significantly more branches indicating evolution under positive selection and/or relaxed constraint in LSE genes than in single-copy genes. CONCLUSIONS: In this study, to our knowledge the first at the genome scale, we provide strong empirical support for the hypothesis that LSE genes fuel adaptation in angiosperms. Our conservative approach for detecting selection footprints as well as our results can be of interest for further studies on (plant) gene family evolution.


Subject(s)
Adaptation, Physiological/genetics; Gene Duplication; Genome, Plant; Cluster Analysis; Codon/genetics; Databases, Genetic; Molecular Sequence Annotation; Multigene Family; Mutation/genetics; Polymorphism, Genetic; Selection, Genetic; Time Factors
15.
Nucleic Acids Res ; 40(18): 9102-14, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22833609

ABSTRACT

We have sequenced the genome of the emerging human pathogen Babesia microti and compared it with that of other protozoa. B. microti has the smallest nuclear genome among all Apicomplexan parasites sequenced to date with three chromosomes encoding ∼3500 polypeptides, several of which are species specific. Genome-wide phylogenetic analyses indicate that B. microti is significantly distant from all species of Babesidae and Theileridae and defines a new clade in the phylum Apicomplexa. Furthermore, unlike all other Apicomplexa, its mitochondrial genome is circular. Genome-scale reconstruction of functional networks revealed that B. microti has the minimal metabolic requirement for intraerythrocytic protozoan parasitism. B. microti multigene families differ from those of other protozoa in both the copy number and organization. Two lateral transfer events with significant metabolic implications occurred during the evolution of this parasite. The genomic sequencing of B. microti identified several targets suitable for the development of diagnostic assays and novel therapies for human babesiosis.


Subject(s)
Babesia microti/genetics; Genome, Protozoan; Babesia microti/classification; Babesia microti/metabolism; Glycosylphosphatidylinositols/biosynthesis; Glycosylphosphatidylinositols/metabolism; Proteome/metabolism; Sequence Analysis, DNA
16.
Plant Methods ; 20(1): 103, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003455

ABSTRACT

BACKGROUND: Genotyping of individuals plays a pivotal role in various biological analyses, with technology choice influenced by multiple factors including genomic constraints, number of targeted loci and individuals, cost considerations, and the ease of sample preparation and data processing. Target enrichment capture of specific polymorphic regions has emerged as a flexible and cost-effective genomic reduction method for genotyping, especially adapted to the case of very large genomes. However, this approach necessitates complex bioinformatics treatment to extract genotyping data from raw reads. Existing workflows predominantly cater to phylogenetic inference, leaving a gap in user-friendly tools for genotyping analysis based on capture methods. In response to these challenges, we have developed GeCKO (Genotyping Complexity Knocked-Out). To assess the effectiveness of combining target enrichment capture with GeCKO, we conducted a case study on durum wheat domestication history, involving sequencing, processing, and analyzing variants in four relevant durum wheat groups. RESULTS: GeCKO encompasses four distinct workflows, each designed for specific steps of genomic data processing: (i) read demultiplexing and trimming for data cleaning, (ii) read mapping to align sequences to a reference genome, (iii) variant calling to identify genetic variants, and (iv) variant filtering. Each workflow in GeCKO can be easily configured and is executable across diverse computational environments. The workflows generate comprehensive HTML reports including key summary statistics and illustrative graphs, ensuring traceable, reproducible results and facilitating straightforward quality assessment. A specific innovation within GeCKO is its 'targeted remapping' feature, specifically designed for efficient treatment of targeted enrichment capture data. This process consists of extracting reads mapped to the targeted regions, constructing a smaller sub-reference genome, and remapping the reads to this sub-reference, thereby enhancing the efficiency of subsequent steps. CONCLUSIONS: The case study results showed the expected intra-group diversity and inter-group differentiation levels, confirming the method's effectiveness for genotyping and analyzing genetic diversity in species with complex genomes. GeCKO streamlined the data processing, significantly improving computational performance and efficiency. The targeted remapping enabled straightforward SNP calling in durum wheat, a task otherwise complicated by the species' large genome size. This illustrates its potential applications in various biological research contexts.
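
The first two steps of the 'targeted remapping' idea described above (selecting reads mapped to the targeted regions and building a smaller sub-reference) can be sketched in plain Python. This is not GeCKO's implementation; the data structures, function names and toy coordinates are hypothetical, and the final remapping step is omitted because it requires an actual read mapper.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class MappedRead(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(read: MappedRead, region: Region) -> bool:
    """True if the read alignment shares at least one base with the region."""
    return (read.chrom == region.chrom
            and read.start < region.end
            and region.start < read.end)


def reads_on_targets(reads: list, targets: list) -> list:
    """Names of reads whose alignment overlaps any targeted region."""
    return [r.name for r in reads if any(overlaps(r, t) for t in targets)]


def build_sub_reference(reference: dict, targets: list) -> dict:
    """Extract each targeted region as its own sub-reference sequence."""
    return {
        f"{t.chrom}:{t.start}-{t.end}": reference[t.chrom][t.start:t.end]
        for t in targets
    }


# Hypothetical toy data.
reference = {"chr1": "ACGTACGTACGTACGTACGT", "chr2": "TTTTGGGGCCCCAAAA"}
targets = [Region("chr1", 4, 12), Region("chr2", 8, 12)]
reads = [
    MappedRead("r1", "chr1", 2, 7),    # overlaps chr1:4-12
    MappedRead("r2", "chr1", 12, 18),  # starts where the target ends: no overlap
    MappedRead("r3", "chr2", 6, 10),   # overlaps chr2:8-12
]
kept = reads_on_targets(reads, targets)
sub_reference = build_sub_reference(reference, targets)
print(kept, sub_reference)
```

The benefit the abstract attributes to this step comes from the size reduction: later steps work against the short sub-reference rather than the full genome.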

17.
bioRxiv ; 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38293033

ABSTRACT

Babesiosis, caused by protozoan parasites of the genus Babesia, is an emerging tick-borne disease of significance for both human and animal health. Babesia parasites infect erythrocytes of vertebrate hosts where they develop and multiply rapidly to cause the pathological symptoms associated with the disease. The identification of various Babesia species underscores the ongoing risk of new zoonotic pathogens capable of infecting humans, a concern amplified by anthropogenic activities and environmental shifts impacting the distribution and transmission dynamics of parasites, their vectors, and reservoir hosts. One such species, Babesia MO1, previously implicated in severe cases of human babesiosis in the midwestern United States, was initially considered closely related to B. divergens, the predominant agent of human babesiosis in Europe. Yet, uncertainties persist regarding whether these pathogens represent distinct variants of the same species or are entirely separate species. We show that although both B. MO1 and B. divergens share similar genome sizes, comprising three nuclear chromosomes, one linear mitochondrial chromosome, and one circular apicoplast chromosome, major differences exist in terms of genomic sequence divergence, gene functions, transcription profiles, replication rates and susceptibility to antiparasitic drugs. Furthermore, both pathogens have evolved distinct classes of multigene families, crucial for their pathogenicity and adaptation to specific mammalian hosts. Leveraging genomic information for B. MO1, B. divergens, and other members of the Babesiidae family within Apicomplexa provides valuable insights into the evolution, diversity, and virulence of these parasites. This knowledge serves as a critical tool in preemptively addressing the emergence and rapid transmission of more virulent strains.

18.
BMC Bioinformatics ; 14: 332, 2013 Nov 20.
Article in English | MEDLINE | ID: mdl-24252193

ABSTRACT

BACKGROUND: Genes located in the same chromosome region share common evolutionary events more often than other genes (e.g. a segmental duplication of this region). Their evolution may also be related if they are involved in the same protein complex or biological process. Identifying co-evolving genes can thus shed light on ancestral genome structures and functional gene interactions. RESULTS: We devise a simple, fast and accurate probability method based on species tree-gene tree reconciliations to detect when two gene families have co-evolved. Our method observes the number and location of predicted macro-evolutionary events, and estimates the probability of having the observed number of common events by chance. CONCLUSIONS: Simulation studies confirm that our method effectively identifies co-evolving families. This opens numerous perspectives on genome-scale analysis where this method could be used to pinpoint co-evolving gene families and thus help to unravel ancestral genome arrangements or undocumented gene interactions.
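
To make the idea of "the probability of having the observed number of common events by chance" concrete, here is a minimal sketch under a simple hypergeometric null model. This model is an assumption chosen for illustration and may well differ from the probability model actually used in the paper: it supposes that each family's events fall on distinct species-tree branches and that one family's events are placed uniformly at random. All names and numbers are hypothetical.

```python
from math import comb


def prob_at_least_k_shared(n_branches: int, events_a: int,
                           events_b: int, k_shared: int) -> float:
    """P(at least k_shared branches carry an event in both families).

    Assumed null model: family B's events_b events are placed uniformly at
    random on distinct branches among n_branches, independently of the
    events_a branches already carrying an event of family A.
    """
    total = comb(n_branches, events_b)
    upper = min(events_a, events_b)
    tail = sum(
        comb(events_a, k) * comb(n_branches - events_a, events_b - k)
        for k in range(k_shared, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 branches, 3 events per family, 2 shared branches.
p_value = prob_at_least_k_shared(n_branches=10, events_a=3, events_b=3, k_shared=2)
print(round(p_value, 4))
```

A small probability under such a null model would suggest that the two families share more events than expected by chance, which is the kind of signal the abstract describes using to flag co-evolving families.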


Subject(s)
Evolution, Molecular; Multigene Family/genetics; Computer Simulation; Genome, Bacterial; Phylogeny; Probability; Proteobacteria/genetics; Random Allocation; Segmental Duplications, Genomic
19.
BMC Bioinformatics ; 14 Suppl 15: S15, 2013.
Article in English | MEDLINE | ID: mdl-24564644

ABSTRACT

BACKGROUND: Using Next Generation Sequencing, SNP discovery is relatively easy in diploid species but is still hampered in polyploid species by the confusion due to homeology. We develop HomeoSplitter, a fast and effective solution to split original contigs obtained by RNAseq into two homeologous sequences. It uses the differential expression of the two homeologous genes in the RNA. We verify that the new sequences are closer to the diploid progenitors of the allopolyploid species than the original contig. By remapping original reads on these new sequences, we also verify that the number of valuable detected SNPs has significantly increased. RESULTS: HomeoSplitter is a fast and effective solution to disentangle homeologous sequences based on a maximum likelihood optimization. On a benchmark set of 2,505 clusters containing homologous sequences of urartu, speltoides and durum, HomeoSplitter was efficient in building sequences closer to the diploid references and increased the number of valuable SNPs from 188 out of 1,360 SNPs detected when mapping the reads on the de novo durum assembly to 762 out of 1,620 SNPs when mapping on HomeoSplitter contigs. CONCLUSIONS: The HomeoSplitter program is freely available at http://bioweb.supagro.inra.fr/homeoSplitter/. This work provides a practical solution to the complex problem of disentangling homeologous transcripts in allo-tetraploids, which further allows an improved SNP detection.


Subject(s)
Sequence Analysis, DNA; Tetraploidy; Triticum/genetics; Base Sequence; Diploidy; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide
20.
Mol Biol Evol ; 29(7): 1861-74, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22319139

ABSTRACT

The analysis of extant sequences shows that molecular evolution has been heterogeneous through time and among lineages. However, for a given sequence alignment, it is often difficult to uncover what factors caused this heterogeneity. In fact, identifying and characterizing heterogeneous patterns of molecular evolution along a phylogenetic tree is very challenging, for lack of appropriate methods. Users either have to a priori define groups of branches along which they believe molecular evolution has been similar or have to allow each branch to have its own pattern of molecular evolution. The first approach assumes prior knowledge that is seldom available, and the second requires estimating an unreasonably large number of parameters. Here we propose a convenient and reliable approach where branches get clustered by their pattern of molecular evolution alone, with no need for prior knowledge about the data set under study. Model selection is achieved in a statistical framework and therefore avoids overparameterization. We rely on substitution mapping for efficiency and present two clustering approaches, depending on whether or not we expect neighbouring branches to share more similar patterns of sequence evolution than distant branches. We validate our method on simulations and test it on four previously published data sets. We find that our method correctly groups branches sharing similar equilibrium GC contents in a data set of ribosomal RNAs and recovers expected footprints of selection through dN/dS. Importantly, it also uncovers a new pattern of relaxed selection in a phylogeny of Mantellid frogs, which we are able to correlate to life-history traits. This shows that our programs should be very useful to study patterns of molecular evolution and reveal new correlations between sequence and species evolution. Our programs can run on DNA, RNA, codon, or amino acid sequences with a large set of possible models of substitutions and are available at http://biopp.univ-montp2.fr/forge/testnh.


Subject(s)
Algorithms; Evolution, Molecular; Models, Genetic; Animals; Biological Evolution; Cluster Analysis; Computer Simulation; Daphnia/genetics; Muramidase/genetics; Phylogeny; RNA, Ribosomal/genetics; Ranidae/genetics