Results 1 - 20 of 77
2.
Syst Biol ; 72(6): 1370-1386, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37703307

ABSTRACT

Phylogenetic tree reconciliation is extensively employed for the examination of coevolution between host and symbiont species. An important concern is the requirement for dependable cost values when selecting event-based parsimonious reconciliation. Although certain approaches deduce event probabilities unique to each pair of host and symbiont trees, which can subsequently be converted into cost values, a significant limitation lies in their inability to model the invasion of diverse host species by the same symbiont species (termed a spread event), which is believed to occur in symbiotic relationships. Invasions lead to the observation of multiple associations between symbionts and their hosts (indicating that a symbiont is no longer exclusive to a single host), which are incompatible with the existing methods of coevolution. Here, we present a method called AmoCoala (an enhanced version of the tool Coala) that provides a more realistic estimation of cophylogeny event probabilities for a given pair of host and symbiont trees, even in the presence of spread events. We expand the classical 4-event coevolutionary model to include 2 additional outcomes, vertical and horizontal spreads, that lead to multiple associations. In the initial step, we estimate the probabilities of spread events using heuristic frequencies. Subsequently, in the second step, we employ an approximate Bayesian computation approach to infer the probabilities of the remaining 4 classical events (cospeciation, duplication, host switch, and loss) based on these values. By incorporating spread events, our reconciliation model enables a more accurate consideration of multiple associations. This improvement enhances the precision of estimated cost sets, paving the way to a more reliable reconciliation of host and symbiont trees. To validate our method, we conducted experiments on synthetic datasets and demonstrated its efficacy using real-world examples. Our results showcase that AmoCoala produces biologically plausible reconciliation scenarios, further emphasizing its effectiveness.
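
The second step described above relies on approximate Bayesian computation (ABC). What follows is only a toy sketch of generic ABC rejection sampling, not AmoCoala's implementation. The flat Dirichlet prior, the multinomial simulator over event counts, the distance, and every function name are assumptions made for illustration, since the abstract does not describe the simulator or the distance actually used.

```python
import random

# The four classical cophylogeny events named in the abstract.
EVENTS = ("cospeciation", "duplication", "host_switch", "loss")


def sample_prior(rng):
    """Draw event probabilities from a flat Dirichlet prior (assumed)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in EVENTS]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_counts(probs, n_events, rng):
    """Toy simulator (assumed): draw n_events events with the given probabilities."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n_events):
        counts[i] += 1
    return counts


def abc_rejection(observed, n_draws, tolerance, seed=0):
    """Keep the prior draws whose simulated counts lie close to the observed ones."""
    rng = random.Random(seed)
    n_events = sum(observed)
    accepted = []
    for _ in range(n_draws):
        probs = sample_prior(rng)
        simulated = simulate_counts(probs, n_events, rng)
        # Normalised L1 distance between count vectors; ranges from 0 to 2.
        distance = sum(abs(s - o) for s, o in zip(simulated, observed)) / n_events
        if distance <= tolerance:
            accepted.append(probs)
    return accepted
```

The accepted probability vectors approximate the posterior; lowering `tolerance` tightens the approximation at the cost of fewer accepted draws.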


Subject(s)
Host Specificity , Symbiosis , Phylogeny , Bayes Theorem
3.
Genes (Basel) ; 14(3), 2023 03 07.
Article in English | MEDLINE | ID: mdl-36980936

ABSTRACT

By pairing to messenger RNAs (mRNAs for short), microRNAs (miRNAs) regulate gene expression in animals and plants. Accurately identifying which mRNAs interact with a given miRNA and the precise location of the interaction sites is crucial to reaching a more complete view of the regulatory network of an organism. Only a few experimental approaches, however, allow the identification of both within a single experiment. Computational predictions of miRNA-mRNA interactions thus remain generally the first step used, despite their drawback of a high rate of false-positive predictions. The major computational approaches available rely on a diversity of features, among which anchoring the miRNA seed and measuring mRNA accessibility are the key ones, with the first being universally used, while the use of the second remains controversial. Revisiting the importance of each is the aim of this paper, which uses Cross-Linking, Ligation, And Sequencing of Hybrids (CLASH) datasets to achieve this goal. Contrary to what might be expected, the results are more ambiguous regarding the use of the seed match as a feature, while accessibility appears to be a feature worth considering, indicating that, at least under some conditions, it may favour anchoring by miRNAs.
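
As a rough illustration of what "anchoring the miRNA seed" means computationally, the sketch below scans an mRNA for perfect Watson-Crick matches to miRNA nucleotides 2-8. This is one common seed definition; the abstract does not say which variant the paper uses, and the function names are hypothetical.

```python
def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def seed_sites(mirna, mrna, start=1, end=8):
    """Return 0-based mRNA positions that pair perfectly with the miRNA seed.

    The default slice [1:8] selects miRNA nucleotides 2-8 (1-based).
    """
    seed = mirna[start:end]
    target = reverse_complement_rna(seed)
    width = len(target)
    return [i for i in range(len(mrna) - width + 1) if mrna[i:i + width] == target]
```

For the let-7a sequence `UGAGGUAGUAGGUUGUAUAGUU`, the seed `GAGGUAG` pairs with the site `CUACCUC`, so `seed_sites` reports position 3 in the toy transcript `AAACUACCUCAAA`.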


Subject(s)
Gene Expression Regulation , MicroRNAs , RNA, Messenger , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Messenger/genetics
4.
Gigascience ; 11, 2022 10 25.
Article in English | MEDLINE | ID: mdl-36283679

ABSTRACT

MicroRNAs (miRNAs) are small noncoding RNAs that are key players in the regulation of gene expression. In the past decade, with the increasing accessibility of high-throughput sequencing technologies, different methods have been developed to identify miRNAs, most of which rely on preexisting reference genomes. However, when a reference genome is absent or is not of high quality, such identification becomes more difficult. In this context, we developed BrumiR, an algorithm that is able to discover miRNAs directly and exclusively from small RNA (sRNA) sequencing (sRNA-seq) data. We benchmarked BrumiR with datasets encompassing animal and plant species using real and simulated sRNA-seq experiments. The results demonstrate that BrumiR reaches the highest recall for miRNA discovery, while at the same time being much faster and more efficient than the state-of-the-art tools evaluated. The latter allows BrumiR to analyze a large number of sRNA-seq experiments, from plants or animal species. Moreover, BrumiR detects additional information regarding other expressed sequences (sRNAs, isomiRs, etc.), thus maximizing the biological insight gained from sRNA-seq experiments. Additionally, when a reference genome is available, BrumiR provides a new mapping tool (BrumiR2reference) that performs an a posteriori exhaustive search to identify the precursor sequences. Finally, we also provide a machine learning classifier based on a random forest model that evaluates the sequence-derived features to further refine the prediction obtained from the BrumiR-core. The code of BrumiR and all the algorithms that compose the BrumiR toolkit are freely available at https://github.com/camoragaq/BrumiR.


Subject(s)
MicroRNAs , RNA, Small Untranslated , Animals , MicroRNAs/genetics , MicroRNAs/metabolism , Software , Sequence Analysis, RNA/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Small Untranslated/genetics
5.
Front Genet ; 13: 815476, 2022.
Article in English | MEDLINE | ID: mdl-35281848

ABSTRACT

Motivation: The increasing availability of metabolomic data and their analysis are improving the understanding of cellular mechanisms and how biological systems respond to different perturbations. Currently, there is a need for novel computational methods that facilitate the analysis and integration of the increasing volume of available data. Results: In this paper, we present Totoro, a new constraint-based approach that integrates quantitative non-targeted metabolomic data of two different metabolic states into genome-wide metabolic models and predicts reactions that were most likely active during the transient state. We applied Totoro to real data of three different growth experiments (pulses of glucose, pyruvate, succinate) from Escherichia coli and we were able to predict known active pathways and gather new insights on the different metabolisms related to each substrate. We used both the E. coli core and the iJO1366 models to demonstrate that our approach is applicable to both smaller and larger networks. Availability: Totoro is an open source method (available at https://gitlab.inria.fr/erable/totoro) suitable for any organism with an available metabolic model. It is implemented in C++ and depends on IBM CPLEX, which is freely available for academic purposes.

6.
Algorithms Mol Biol ; 17(1): 2, 2022 Feb 15.
Article in English | MEDLINE | ID: mdl-35168648

ABSTRACT

BACKGROUND: Cophylogeny reconciliation is a powerful method for analyzing host-parasite (or host-symbiont) co-evolution. It models co-evolution as an optimization problem where the set of all optimal solutions may represent different biological scenarios which thus need to be analyzed separately. Despite the significant research done in the area, few approaches have addressed the problem of helping the biologist deal with the often huge space of optimal solutions. RESULTS: In this paper, we propose a new approach to tackle this problem. We introduce three different criteria under which two solutions may be considered biologically equivalent, and then we propose polynomial-delay algorithms that enumerate only one representative per equivalence class (without listing all the solutions). CONCLUSIONS: Our results are of both theoretical and practical importance. Indeed, as shown by the experiments, we are able to significantly reduce the space of optimal solutions while still maintaining important biological information about the whole space.

7.
BMC Biol ; 19(1): 241, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34749730

ABSTRACT

BACKGROUND: The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions. RESULTS: We sequenced the Sitophilus oryzae genome using a combination of short and long reads to produce the best assembly for a Curculionidae species to date. We show that S. oryzae has undergone successive bursts of transposable element (TE) amplification, representing 72% of the genome. In addition, we show that many TE families are transcriptionally active, and changes in their expression are associated with insect endosymbiotic state. S. oryzae has undergone a high gene expansion rate, when compared to other beetles. Reconstruction of host-symbiont metabolic networks revealed that, despite its recent association with cereal weevils (30 kyear), S. pierantonius relies on the host for several amino acids and nucleotides to survive and to produce vitamins and essential amino acids required for insect development and cuticle biosynthesis. CONCLUSIONS: Here we present the genome of an agricultural pest beetle, which may act as a foundation for pest control. In addition, S. oryzae may be a useful model for endosymbiosis, and studying TE evolution and regulation, along with the impact of TEs on eukaryotic genomes.


Subject(s)
Coleoptera , Weevils , Animals , Cell Communication , DNA Transposable Elements/genetics , Edible Grain , Humans , Weevils/genetics
8.
NAR Genom Bioinform ; 3(1): lqab009, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33709074

ABSTRACT

The human gut microbiota performs functions that are essential for the maintenance of the host physiology. However, characterizing the functioning of microbial communities in relation to the host remains challenging in reference-based metagenomic analyses. Indeed, as taxonomic and functional analyses are performed independently, the link between genes and species remains unclear. Although a first set of species-level bins was built by clustering co-abundant genes, no reference bin set is established on the most used gut microbiota catalog, the Integrated Gene Catalog (IGC). With the aim to identify the best suitable method to group the IGC genes, we benchmarked nine taxonomy-independent binners implementing abundance-based, hybrid and integrative approaches. To this purpose, we designed a simulated non-redundant gene catalog (SGC) and computed adapted assessment metrics. Overall, the best trade-off between the main metrics is reached by an integrative binner. For each approach, we then compared the results of the best-performing binner with our expected community structures and applied the method to the IGC. The three approaches are distinguished by specific advantages, and by inherent or scalability limitations. Hybrid and integrative binners show promising and potentially complementary results but require improvements to be used on the IGC to recover human gut microbial species.

9.
Nat Biotechnol ; 39(4): 422-430, 2021 04.
Article in English | MEDLINE | ID: mdl-33318652

ABSTRACT

Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24-80.64 Mb), few assembly errors (contig NGA50: 11.8-59.59 Mb), good consensus quality (QV: 27.84-42.88) and high gene completeness (BUSCO complete: 94.6-95.2%), while consuming low computational resources (CPU hours: 187-1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).
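
For readers unfamiliar with the contiguity metric quoted above: NG50 is commonly defined as the length of the shortest contig among the longest contigs that together cover at least half of the estimated genome size. A minimal sketch of that definition follows; the function name and the toy lengths are illustrative and are not taken from WENGAN.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or 0 if it covers under half the genome."""
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


# Toy example: contigs of 80, 70 and 50 units reach half of a 400-unit genome.
print(ng50([80, 70, 50, 40, 30, 20], 400))  # prints 50
```

Passing the total assembly length as `genome_size` yields the ordinary N50 instead.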


Subject(s)
Computational Biology/methods , Contig Mapping/methods , Genome, Human , Algorithms , Haploidy , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
10.
Sci Rep ; 10(1): 13707, 2020 08 13.
Article in English | MEDLINE | ID: mdl-32792522

ABSTRACT

Mycoplasma hyopneumoniae is the most costly pathogen for swine production. Although several studies have focused on the host-bacterium association, little is known about the changes in gene expression of swine cells upon infection. To improve our understanding of this interaction, we infected swine epithelial NPTr cells with M. hyopneumoniae strain J to identify differentially expressed mRNAs and miRNAs. The levels of 1,268 genes and 170 miRNAs were significantly modified post-infection. Up-regulated mRNAs were enriched in genes related to redox homeostasis and antioxidant defense, known to be regulated by the transcription factor NRF2 in related species. Down-regulated mRNAs were enriched in genes associated with cytoskeleton and ciliary functions. Bioinformatic analyses suggested a correlation between changes in miRNA and mRNA levels, since we detected down-regulation of miRNAs predicted to target antioxidant genes and up-regulation of miRNAs targeting ciliary and cytoskeleton genes. Interestingly, most down-regulated miRNAs were detected in exosome-like vesicles suggesting that M. hyopneumoniae infection induced a modification of the composition of NPTr-released vesicles. Taken together, our data indicate that M. hyopneumoniae elicits an antioxidant response induced by NRF2 in infected cells. In addition, we propose that ciliostasis caused by this pathogen is partially explained by the down-regulation of ciliary genes.


Subject(s)
Antioxidants/metabolism , Bacterial Proteins/metabolism , Cilia/genetics , Epithelial Cells/metabolism , Mycoplasma hyopneumoniae/genetics , Mycoplasma hyopneumoniae/metabolism , Pneumonia of Swine, Mycoplasmal/microbiology , Animals , Bacterial Proteins/genetics , Biomarkers/analysis , Cells, Cultured , Cilia/metabolism , Epithelial Cells/microbiology , Gene Expression Profiling , Gene Expression Regulation , MicroRNAs/analysis , Mycoplasma hyopneumoniae/growth & development , Pneumonia of Swine, Mycoplasmal/genetics , Pneumonia of Swine, Mycoplasmal/metabolism , RNA, Messenger/analysis , Swine
11.
Int J Biol Macromol ; 163: 240-250, 2020 Nov 15.
Article in English | MEDLINE | ID: mdl-32622773

ABSTRACT

Reconstruction of a genome-based metabolic model is a useful approach for the assessment of metabolic pathways, genes and proteins involved in the environmental fitness capabilities or pathogenic potential as well as for biotechnological process development. Pseudomonas sp. LFM046 was selected as a good polyhydroxyalkanoates (PHA) producer from carbohydrates and plant oils. Its complete genome sequence and metabolic model were obtained. Analysis revealed that the gnd gene, encoding 6-phosphogluconate dehydrogenase, is absent in the Pseudomonas sp. LFM046 genome. In order to improve the knowledge about LFM046 metabolism, the coenzyme specificities of different enzymes were evaluated. Furthermore, the heterologous expression of gnd genes from Pseudomonas putida KT2440 (NAD+ dependent) and Escherichia coli MG1655 (NADP+ dependent) in LFM046 was carried out and provoked a delay in cell growth and a reduction in PHA yield, respectively. The results indicate that the adjustment of the cyclic Entner-Doudoroff pathway may be an interesting strategy for it and other bacteria to simultaneously meet divergent cell needs during cultivation phases of growth and PHA production.


Subject(s)
Coenzymes/metabolism , Phosphogluconate Dehydrogenase/metabolism , Polyhydroxyalkanoates/biosynthesis , Pseudomonas/metabolism , Carbohydrate Metabolism , Enzyme Activation , Genome, Bacterial , Metabolic Networks and Pathways , Phylogeny , Pseudomonas/classification , Pseudomonas/genetics , RNA, Ribosomal, 16S/genetics , Substrate Specificity , Virulence
12.
Algorithms Mol Biol ; 15: 14, 2020.
Article in English | MEDLINE | ID: mdl-32704304

ABSTRACT

Cytoplasmic incompatibility (CI) relates to the manipulation by the parasite Wolbachia of its host reproduction. Despite its widespread occurrence, the molecular basis of CI remains unclear and theoretical models have been proposed to understand the phenomenon. We consider in this paper the quantitative Lock-Key model which currently represents a good hypothesis that is consistent with the data available. CI is in this case modelled as the problem of covering the edges of a bipartite graph with the minimum number of chain subgraphs. This problem is already known to be NP-hard, and we provide an exponential algorithm with a non-trivial complexity. Depending on the dataset, there are frequently many optimal solutions, which can differ considerably from one another biologically. To rely on a single optimal solution may therefore be problematic. For this reason, we address the problem of enumerating (listing) all minimal chain subgraph covers of a bipartite graph and show that it can be solved in quasi-polynomial time. Interestingly, in order to solve the above problems, we also considered the problem of enumerating all the maximal chain subgraphs of a bipartite graph and improved on the current results in the literature for the latter. Finally, to demonstrate the usefulness of our methods we show an application on a real dataset.
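
A chain subgraph, in the standard definition, is a bipartite graph in which the neighbourhoods of the vertices on one side are nested, that is, totally ordered by inclusion. The sketch below checks that property for a small graph. It illustrates the definition only and is not code from the paper; the function name and the adjacency format are assumptions.

```python
def is_chain_graph(adjacency):
    """Check whether a bipartite graph is a chain graph.

    `adjacency` maps each left-side vertex to the set of its right-side
    neighbours. The graph is a chain graph exactly when these neighbourhoods
    can be ordered so that each one is contained in the next.
    """
    neighbourhoods = sorted((set(n) for n in adjacency.values()), key=len)
    # Inclusion is transitive, so comparing consecutive sets is sufficient.
    return all(a <= b for a, b in zip(neighbourhoods, neighbourhoods[1:]))


nested = {"a": {1}, "b": {1, 2}, "c": {1, 2, 3}}
crossing = {"a": {1}, "b": {2}}  # two disjoint edges, the forbidden pattern
print(is_chain_graph(nested), is_chain_graph(crossing))  # prints True False
```

A chain subgraph cover is then a collection of such subgraphs whose edges together include every edge of the input graph.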

13.
Bioinformatics ; 36(14): 4197-4199, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32556075

ABSTRACT

MOTIVATION: Phylogenetic tree reconciliation is the method of choice in analyzing host-symbiont systems. Despite the many reconciliation tools that have been proposed in the literature, two main issues remain unresolved: (i) listing suboptimal solutions (i.e. whose score is 'close' to the optimal ones) and (ii) listing only solutions that are biologically different 'enough'. The first issue arises because the optimal solutions are not always the ones biologically most significant; providing many suboptimal solutions as alternatives for the optimal ones is thus very useful. The second one is related to the difficulty to analyze an often huge number of optimal solutions. In this article, we propose Capybara that addresses both of these problems in an efficient way. Furthermore, it includes a tool for visualizing the solutions that significantly helps the user in the process of analyzing the results. AVAILABILITY AND IMPLEMENTATION: The source code, documentation and binaries for all platforms are freely available at https://capybara-doc.readthedocs.io/. CONTACT: yishu.wang@univ-lyon1.fr or blerina.sinaimeri@inria.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Rodentia , Animals , Phylogeny , Software
14.
BMC Bioinformatics ; 21(1): 69, 2020 Feb 24.
Article in English | MEDLINE | ID: mdl-32093622

ABSTRACT

BACKGROUND: In this paper, we explore the concept of multi-objective optimization in the field of metabolic engineering when both continuous and integer decision variables are involved in the model. In particular, we propose a multi-objective model that may be used to suggest reaction deletions that maximize and/or minimize several functions simultaneously. The applications may include, among others, the concurrent maximization of a bioproduct and of biomass, or maximization of a bioproduct while minimizing the formation of a given by-product, two common requirements in microbial metabolic engineering. RESULTS: Production of ethanol by the widely used cell factory Saccharomyces cerevisiae was adopted as a case study to demonstrate the usefulness of the proposed approach in identifying genetic manipulations that improve productivity and yield of this economically highly relevant bioproduct. We did an in vivo validation and we could show that some of the predicted deletions exhibit increased ethanol levels in comparison with the wild-type strain. CONCLUSIONS: The multi-objective programming framework we developed, called MOMO, is open-source and uses POLYSCIP (available at http://polyscip.zib.de/) as the underlying multi-objective solver. MOMO is available at http://momo-sysbio.gforge.inria.fr.
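
Optimizing several functions simultaneously usually means looking for non-dominated (Pareto-optimal) trade-offs rather than a single best value. The sketch below shows only that dominance notion over a handful of already-evaluated candidates. It is not how MOMO or POLYSCIP work internally, and the candidate values and function names are hypothetical.

```python
def dominates(a, b):
    """True if outcome `a` is at least as good as `b` everywhere and better somewhere.

    Every objective is assumed to be maximised.
    """
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (bioproduct, biomass) outcomes for four candidate deletion sets.
candidates = [(1.0, 0.9), (0.8, 1.2), (0.7, 1.0), (1.0, 0.5)]
print(pareto_front(candidates))  # prints [(1.0, 0.9), (0.8, 1.2)]
```

An objective that should be minimised, such as by-product formation, can be handled by negating its value before the comparison.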


Subject(s)
Metabolic Engineering/methods , Software , Biomass , Ethanol/metabolism , Models, Biological , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
15.
Bioinformatics ; 36(2): 514-523, 2020 01 15.
Article in English | MEDLINE | ID: mdl-31504164

ABSTRACT

MOTIVATION: Analysis of differential expression of genes is often performed to understand how the metabolic activity of an organism is impacted by a perturbation. However, because the system of metabolic regulation is complex and all changes are not directly reflected in the expression levels, interpreting these data can be difficult. RESULTS: In this work, we present a new algorithm and computational tool that uses a genome-scale metabolic reconstruction to infer metabolic changes from differential expression data. Using the framework of constraint-based analysis, our method produces a qualitative hypothesis of a change in metabolic activity. In other words, each reaction of the network is inferred to have increased, decreased, or remained unchanged in flux. In contrast to similar previous approaches, our method does not require a biological objective function and does not assign on/off activity states to genes. An implementation is provided and it is available online. We apply the method to three published datasets to show that it successfully accomplishes its two main goals: confirming or rejecting metabolic changes suggested by differentially expressed genes based on how well they fit in as parts of a coordinated metabolic change, as well as inferring changes in reactions whose genes did not undergo differential expression. AVAILABILITY AND IMPLEMENTATION: github.com/htpusa/moomin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metabolic Networks and Pathways , Algorithms , Computational Biology , Genome , Models, Biological
16.
Syst Biol ; 68(4): 607-618, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30418649

ABSTRACT

Tree reconciliation is the mathematical tool that is used to investigate the coevolution of organisms, such as hosts and parasites. A common approach to tree reconciliation involves specifying a model that assigns costs to certain events, such as cospeciation, and then tries to find a mapping between two specified phylogenetic trees which minimizes the total cost of the implied events. For such models, it has been shown that there may be a huge number of optimal solutions, or at least solutions that are close to optimal. It is therefore of interest to be able to systematically compare and visualize whole collections of reconciliations between a specified pair of trees. In this article, we consider various metrics on the set of all possible reconciliations between a pair of trees, some that have been defined before but also new metrics that we shall propose. We show that the diameter for the resulting spaces of reconciliations can in some cases be determined theoretically, information that we use to normalize and compare properties of the metrics. We also implement the metrics and compare their behavior on several host parasite data sets, including the shapes of their distributions. In addition, we show that in combination with multidimensional scaling, the metrics can be useful for visualizing large collections of reconciliations, much in the same way as phylogenetic tree metrics can be used to explore collections of phylogenetic trees. Implementations of the metrics can be downloaded from: https://team.inria.fr/erable/en/team-members/blerina-sinaimeri/reconciliation-distances/.


Subject(s)
Classification/methods , Host-Parasite Interactions/physiology , Phylogeny , Models, Biological
17.
Front Microbiol ; 9: 2141, 2018.
Article in English | MEDLINE | ID: mdl-30258423

ABSTRACT

Xylella fastidiosa is a notorious plant pathogenic bacterium that represents a threat to crops worldwide. Its subspecies, Xylella fastidiosa subsp. fastidiosa is the causal agent of Pierce's disease of grapevines. Pierce's disease has presented a serious challenge for the grapevine industry in the United States and turned into an epidemic in Southern California due to the invasion of the insect vector Homalodisca vitripennis. In an attempt to minimize the effects of Xylella fastidiosa subsp. fastidiosa in vineyards, various studies have been developing and testing strategies to prevent the occurrence of Pierce's disease, i.e., prophylactic strategies. Research has also been undertaken to investigate therapeutic strategies to cure vines infected by Xylella fastidiosa subsp. fastidiosa. This report explicitly reviews all the strategies published to date and specifies their current status. Furthermore, an epidemiological model of Xylella fastidiosa subsp. fastidiosa is proposed and key parameters for the spread of Pierce's disease deciphered in a sensitivity analysis of all model parameters. Based on these results, it is concluded that future studies should prioritize therapeutic strategies, while investments should only be made in prophylactic strategies that have demonstrated promising results in vineyards.

18.
Sci Rep ; 8(1): 10755, 2018 Jul 17.
Article in English | MEDLINE | ID: mdl-30018343

ABSTRACT

Klebsiella pneumoniae (Kp) is a globally disseminated opportunistic pathogen that can cause life-threatening infections. It has been found as the culprit of many infection outbreaks in hospital environments, being particularly aggressive towards newborns and adults under intensive care. Many Kp strains produce extended-spectrum β-lactamases, enzymes that promote resistance against antibiotics used to fight these infections. The presence of other resistance determinants leading to multidrug-resistance also limits therapeutic options, and the use of 'last-resort' drugs, such as polymyxins, is not uncommon. The global emergence and spread of resistant strains underline the need for novel antimicrobials against Kp and related bacterial pathogens. To tackle this great challenge, we generated multiple layers of 'omics' data related to Kp and prioritized proteins that could serve as attractive targets for antimicrobial development. Genomics, transcriptomics, structuromic and metabolic information were integrated in order to prioritize candidate targets, and this data compendium is freely available as a web server. Twenty-nine proteins with desirable characteristics from a drug development perspective were shortlisted, which participate in important processes such as lipid synthesis, cofactor production, and core metabolism. Collectively, our results point towards novel targets for the control of Kp and related bacterial pathogens.


Subject(s)
Drug Discovery/methods , Klebsiella pneumoniae/drug effects , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/chemistry , Drug Resistance, Multiple, Bacterial/genetics , Genome, Bacterial , Genomics , Humans , Klebsiella pneumoniae/genetics , Klebsiella pneumoniae/metabolism , Metabolic Networks and Pathways , Metabolomics , Models, Molecular , Protein Structure, Tertiary , Transcriptome
19.
Article in English | MEDLINE | ID: mdl-29993554

ABSTRACT

The aim of this paper is to explore the robustness of the parsimonious host-symbiont tree reconciliation method under editing or small perturbations of the input. The editing involves making different choices of unique symbiont mapping to a host in the case where multiple associations exist. This is made necessary by the fact that the tree reconciliation model is currently unable to handle such associations. The analysis performed could however also address the problem of errors. The perturbations are re-rootings of the symbiont tree to deal with a possibly wrong placement of the root, especially in the case of fast-evolving species. In order to do this robustness analysis, we introduce a simulation scheme specifically designed for the host-symbiont cophylogeny context, as well as a measure to compare sets of tree reconciliations, both of which are of interest by themselves.

20.
Gigascience ; 7(5), 2018 05 01.
Article in English | MEDLINE | ID: mdl-29741627

ABSTRACT

Background: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Results: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Conclusions: Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Arabidopsis/genetics , Escherichia coli K12/genetics , Gene Library , Genome, Bacterial , Genome, Human , Humans