Results 1 - 20 of 27
1.
Genes (Basel) ; 15(2), 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38397211

ABSTRACT

The SpTransformer (SpTrf) gene family in the purple sea urchin, Strongylocentrotus purpuratus, encodes immune response proteins. The genes are clustered, surrounded by short tandem repeats, and some are present in genomic segmental duplications. The genes share regions of sequence and include repeats in the coding exon. This complex structure is consistent with putative local genomic instability. Instability of the SpTrf gene cluster was tested by 10 days of growth of Escherichia coli harboring bacterial artificial chromosome (BAC) clones of sea urchin genomic DNA with inserts containing SpTrf genes. After the growth period, the BAC DNA inserts were analyzed for size and SpTrf gene content. Clones with multiple SpTrf genes showed a variety of deletions, including loss of one, most, or all genes from the cluster. Alternatively, a BAC insert with a single SpTrf gene was stable. BAC insert instability is consistent with variations in the gene family composition among sea urchins, the types of SpTrf genes in the family, and a reduction in the gene copy number in single coelomocytes. Based on the sequence variability among SpTrf genes within and among sea urchins, local genomic instability of the family may be important for driving sequence diversity in this gene family that would be of benefit to sea urchins in their arms race with marine microbes.


Subjects
Strongylocentrotus purpuratus , Animals , Strongylocentrotus purpuratus/genetics , Bacterial Artificial Chromosomes/genetics , Multigene Family , DNA , Sea Urchins/genetics , Genomic Instability
2.
Gigascience ; 10(3), 2021 03 15.
Article in English | MEDLINE | ID: mdl-33718948

ABSTRACT

BACKGROUND: Anopheles coluzzii and Anopheles arabiensis belong to the Anopheles gambiae complex and are among the major malaria vectors in sub-Saharan Africa. However, chromosome-level reference genome assemblies are still lacking for these medically important mosquito species. FINDINGS: In this study, we produced de novo chromosome-level genome assemblies for A. coluzzii and A. arabiensis using the long-read Oxford Nanopore sequencing technology and the Hi-C scaffolding approach. We obtained 273.4 and 256.8 Mb of the total assemblies for A. coluzzii and A. arabiensis, respectively. Each assembly consists of 3 chromosome-scale scaffolds (X, 2, 3), complete mitochondrion, and unordered contigs identified as autosomal pericentromeric DNA, X pericentromeric DNA, and Y sequences. Comparison of these assemblies with the existing assemblies for these species demonstrated that we obtained improved reference-quality genomes. The new assemblies allowed us to identify genomic coordinates for the breakpoint regions of fixed and polymorphic chromosomal inversions in A. coluzzii and A. arabiensis. CONCLUSION: The new chromosome-level assemblies will facilitate functional and population genomic studies in A. coluzzii and A. arabiensis. The presented assembly pipeline will accelerate progress toward creating high-quality genome references for other disease vectors.


Subjects
Anopheles , Malaria , Animals , Anopheles/genetics , Chromosomes/genetics , Genomics , Malaria/genetics , Mosquito Vectors/genetics
3.
Bioinformatics ; 36(10): 2993-3003, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32058559

ABSTRACT

MOTIVATION: One of the key computational problems in comparative genomics is the reconstruction of genomes of ancestral species based on genomes of extant species. Since most dramatic changes in genomic architectures are caused by genome rearrangements, this problem is often posed as minimization of the number of genome rearrangements between extant and ancestral genomes. The basic case of three given genomes is known as the genome median problem. Whole-genome duplications (WGDs) represent another type of dramatic evolutionary event and inspire the reconstruction of preduplicated ancestral genomes, referred to as the genome halving problem. Generalization of WGDs to whole-genome multiplication events leads to the genome aliquoting problem. RESULTS: In this study, we propose polynomial-size integer linear programming (ILP) formulations for the aforementioned problems. We further obtain such formulations for the restricted and conserved versions of the median and halving problems, which have been recently introduced to improve biological relevance of the solutions. Extensive evaluation of solutions to the different ILP problems demonstrates their good accuracy. Furthermore, since the ILP formulations for the conserved versions have linear size, they provide a novel practical approach to ancestral genome reconstruction, which combines the advantages of homology- and rearrangements-based methods. AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/AvdeevPavel/ILP-WGD-reconstructor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome , Linear Programming , Algorithms , Biological Evolution , Molecular Evolution , Genomics , Phylogeny
4.
BMC Biol ; 18(1): 1, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31898513

ABSTRACT

BACKGROUND: New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from 'finished'. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. RESULTS: We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. CONCLUSIONS: Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.


Subjects
Anopheles/genetics , Biological Evolution , Chromosomes , Genetic Techniques/instrumentation , Genomics/methods , Synteny , Animals , Chromosome Mapping
5.
Evol Bioinform Online ; 15: 1176934318820534, 2019.
Article in English | MEDLINE | ID: mdl-31217687

ABSTRACT

Reconstruction of the median genome consisting of linear chromosomes from three given genomes is known to be intractable. There exist efficient methods for solving a relaxed version of this problem, where the median genome is allowed to have circular chromosomes. We propose a method for construction of an approximate solution to the original problem from a solution to the relaxed problem and prove a bound on its approximation error. Our method also provides insights into the combinatorial structure of genome transformations with respect to appearance of circular chromosomes.

6.
J Comput Biol ; 25(11): 1203-1219, 2018 11.
Article in English | MEDLINE | ID: mdl-30133318

ABSTRACT

Construction of phylogenetic trees and networks for extant species from their characters represents one of the key problems in phylogenomics. While solution to this problem is not always uniquely defined and there exist multiple methods for tree/network construction, it becomes important to measure how well the constructed networks capture the given character relationship across the species. We propose a novel method for measuring the specificity of a given phylogenetic network in terms of the total number of distributions of homoplasy-free character states at the leaves that the network may impose. While for binary phylogenetic trees, this number has an exact formula and depends only on the number of leaves and character states but not on the tree topology, the situation is much more complicated for nonbinary trees or networks. Nevertheless, we develop an algorithm for combinatorial enumeration of such distributions, which is applicable for arbitrary trees and networks under some reasonable assumptions. We further extend our algorithm to a special class of characters that follow Dollo's law of irreversibility.


Subjects
Algorithms , Computational Biology/methods , Genetic Models , Computer Neural Networks , Phylogeny , Child , Color , Humans , Mathematical Concepts
7.
BMC Bioinformatics ; 18(Suppl 15): 496, 2017 Dec 06.
Article in English | MEDLINE | ID: mdl-29244014

ABSTRACT

BACKGROUND: Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose positions and orientations along the genome are unknown. While there exist a number of methods for reconstruction of the genome from its scaffolds, utilizing various computational and wet-lab techniques, they often can produce only partial, error-prone scaffold assemblies. It therefore becomes important to compare and merge scaffold assemblies produced by different methods, thus combining their advantages and highlighting present conflicts for further investigation. These tasks may be labor intensive if performed manually. RESULTS: We present CAMSA, a tool for comparative analysis and merging of two or more given scaffold assemblies. The tool (i) creates an extensive report with several comparative quality metrics; (ii) constructs the most confident merged scaffold assembly; and (iii) provides an interactive framework for a visual comparative analysis of the given assemblies. Among the CAMSA features, only scaffold merging can be evaluated in comparison to existing methods. Namely, it resembles the functionality of assembly reconciliation tools, although their primary targets are somewhat different. Our evaluations show that CAMSA produces merged assemblies of comparable or better quality than existing assembly reconciliation tools while being the fastest in terms of the total running time. CONCLUSIONS: CAMSA addresses the current deficiency of tools for automated comparison and analysis of multiple assemblies of the same set of scaffolds. Since there exist numerous methods and techniques for scaffold assembly, identifying similarities and dissimilarities across assemblies produced by different methods is beneficial both for the developers of scaffold assembly algorithms and for the researchers focused on improving draft assemblies of specific organisms.


Subjects
Chromosome Mapping/methods , Genomics/methods , Software , Algorithms , Genome , Sequence Alignment , DNA Sequence Analysis
8.
BMC Genomics ; 18(Suppl 4): 356, 2017 05 24.
Article in English | MEDLINE | ID: mdl-28589865

ABSTRACT

BACKGROUND: The ability to estimate the evolutionary distance between extant genomes plays a crucial role in many phylogenomic studies. Often such estimation is based on the parsimony assumption, implying that the distance between two genomes can be estimated as the rearrangement distance, equal to the minimal number of genome rearrangements required to transform one genome into the other. However, in reality the parsimony assumption may not always hold, emphasizing the need for estimation that does not rely on the rearrangement distance. The distance that accounts for the actual (rather than minimal) number of rearrangements between two genomes is often referred to as the true evolutionary distance. While there exists a method for true evolutionary distance estimation, it assumes that genomes can be broken by rearrangements equally likely at any position in the course of evolution. This assumption, known as the random breakage model, has recently been refuted in favor of the more rigorous fragile breakage model postulating that only certain "fragile" genomic regions are prone to rearrangements. RESULTS: We propose a new method for estimating the true evolutionary distance between two genomes under the fragile breakage model. We evaluate the proposed method on simulated genomes, where it shows high accuracy. We further apply the proposed method to the estimation of evolutionary distances within a set of five yeast genomes and a set of two fish genomes. CONCLUSIONS: The true evolutionary distances between the five yeast genomes estimated with the proposed method reveal that some pairs of yeast genomes violate the parsimony assumption. The proposed method further demonstrates that the rearrangement distance between the two fish genomes underestimates their evolutionary distance by about 20%. These results demonstrate how drastically the two distances can differ and justify the use of true evolutionary distance in phylogenomic studies.


Subjects
Molecular Evolution , Genetic Models , Genomics , Phylogeny
9.
J Comput Biol ; 24(2): 93-105, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28045556

ABSTRACT

Genome rearrangements can be modeled as k-breaks, which break a genome at k positions and glue the resulting fragments in a new order. In particular, reversals, translocations, fusions, and fissions are modeled as 2-breaks, and transpositions are modeled as 3-breaks. Although k-break rearrangements for [Formula: see text] have not been observed in evolution, they are used in cancer genomics to model chromothripsis, a catastrophic event of multiple breakages happening simultaneously in a genome. It is known that the k-break distance between two genomes (i.e., the minimum number of k-breaks required to transform one genome into the other) can be computed in terms of cycle lengths in the breakpoint graph of these genomes. In this work, we address the combinatorial problem of enumerating genomes at a given k-break distance from a fixed unichromosomal genome. More generally, we enumerate genome pairs, whose breakpoint graph has a given distribution of cycle lengths. We further show how our enumeration can be used for uniform sampling of random genomes at a given k-break distance, and describe its connection to various combinatorial objects such as Bell polynomials.


Subjects
Algorithms , Chromosome Breakage , Chromothripsis , Gene Rearrangement , Genome , Genomics/methods , Animals , Computer Graphics , Molecular Evolution , Humans , Genetic Models
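
Note on entry 9: the abstract states that the k-break distance can be computed from cycle lengths in the breakpoint graph. As a rough illustration of the simplest case only (k = 2, two circular unichromosomal genomes over the same genes), the widely used formula is d = n - c, where n is the number of genes and c is the number of cycles in the breakpoint graph. The sketch below is not taken from the paper; the signed-integer genome encoding and the function names are assumptions made for this example.

```python
def adjacencies(genome):
    """Map every gene extremity to its neighbour in a circular signed genome.

    A gene +g is read tail -> head, a gene -g is read head -> tail.
    Extremities are encoded as (gene, 't') or (gene, 'h') (illustrative encoding).
    """
    adj = {}
    n = len(genome)
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        out_a = (abs(a), 'h') if a > 0 else (abs(a), 't')  # where we leave gene a
        in_b = (abs(b), 't') if b > 0 else (abs(b), 'h')   # where we enter gene b
        adj[out_a] = in_b
        adj[in_b] = out_a
    return adj


def two_break_distance(p, q):
    """2-break distance n - c between two circular genomes on the same genes."""
    if sorted(map(abs, p)) != sorted(map(abs, q)):
        raise ValueError("genomes must contain the same genes, one copy each")
    adj_p, adj_q = adjacencies(p), adjacencies(q)
    seen = set()
    cycles = 0
    for start in adj_p:
        if start in seen:
            continue
        cycles += 1
        v = start
        # Walk the alternating cycle: one edge of P, then one edge of Q.
        while v not in seen:
            seen.add(v)
            w = adj_p[v]
            seen.add(w)
            v = adj_q[w]
    return len(p) - cycles


# The two genomes differ by a single reversal of gene 2.
print(two_break_distance([1, -2, 3], [1, 2, 3]))  # 1
```

Distances for larger k depend on the cycle-length distribution in a more involved way, which this sketch does not attempt to cover.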
10.
Front Genet ; 8: 212, 2017.
Article in English | MEDLINE | ID: mdl-29312438

ABSTRACT

Genome rearrangements are large-scale evolutionary events that shuffle genomic architectures. The minimal number of such events between two genomes is often used in phylogenomic studies to measure the evolutionary distance between the genomes. Double-Cut-and-Join (DCJ) operations represent a convenient model of most common genome rearrangements (reversals, translocations, fissions, and fusions), while other genome rearrangements, such as transpositions, can be modeled by pairs of DCJs. Since the DCJ model does not directly account for transpositions, their impact on DCJ scenarios is unclear. In the present work, we study implicit appearance of transpositions (as pairs of DCJs) in DCJ scenarios. We consider shortest DCJ scenarios satisfying the maximum parsimony assumption, as well as more general DCJ scenarios based on some realistic but less restrictive assumptions. In both cases, we derive a uniform lower bound for the rate of implicit transpositions, which depends only on the genomes but not a particular DCJ scenario between them. Our results imply that implicit appearance of transpositions in DCJ scenarios may be unavoidable or even abundant for some pairs of genomes. We estimate that for mammalian genomes implicit transpositions constitute at least 6% of genome rearrangements.

11.
J Comput Biol ; 23(3): 150-64, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26885568

ABSTRACT

Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice as it was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is presence of breakpoint reuses when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.


Subjects
Molecular Evolution , Gene Amplification , Gene Deletion , Gene Order , Genome , Software , Animals , Chromosome Breakpoints , Genetic Models
12.
BMC Genomics ; 17 Suppl 1: 13, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818233

ABSTRACT

BACKGROUND: Anguilla japonica (Japanese eel) is currently one of the most important research subjects in East Asian aquaculture. The enigmatic life cycle of the organism severely limits the study of artificial reproduction. Hence, genomic and transcriptomic resources for eels are urgently needed to help solve the problems surrounding this organism across multiple fields. We provide a reconstructed transcriptome from deep sequencing of juvenile (glass eel) whole-body samples. The resulting expressed sequence tags were used to annotate the currently available draft genome sequence. Homology information derived from the annotation was applied to improve the grouping of scaffolds into the available linkage groups. RESULTS: With the transcriptome sequence data combined with publicly available expressed sequence tag evidence, 18,121 genes were structurally and functionally annotated on the draft genome. Among them, 3,921 genes were located in the 19 linkage groups. A total of 137 scaffolds covering 13 million bases were grouped into the linkage groups in addition to the original partial linkage groups, increasing the linkage group coverage from 13% to 14%. CONCLUSIONS: This annotation provides information on the coding regions of the genes supported by transcriptome-based evidence. The derived homology evidence paves the way for phylogenetic analysis of important genetic traits and for improvement of the genome assembly.


Subjects
Anguilla/genetics , Genome , Animals , Chromosome Mapping , Fishes/genetics , Genetic Linkage , High-Throughput Nucleotide Sequencing , Phylogeny , Single Nucleotide Polymorphism , Cytoplasmic and Nuclear Receptors/classification , Cytoplasmic and Nuclear Receptors/genetics , RNA Sequence Analysis , Transcription Factors/classification , Transcription Factors/genetics
13.
BMC Bioinformatics ; 17(Suppl 14): 418, 2016 Nov 11.
Article in English | MEDLINE | ID: mdl-28185564

ABSTRACT

BACKGROUND: Genome median and genome halving are combinatorial optimization problems that aim at reconstruction of ancestral genomes by minimizing the number of evolutionary events between them and genomes of the extant species. While these problems have been widely studied in past decades, their solutions are often either not efficient or not biologically adequate. These shortcomings have been recently addressed by restricting the problems' solution space. RESULTS: We show that the restricted variants of genome median and halving problems are, in fact, closely related. We demonstrate that these problems have a neat topological interpretation in terms of embedded graphs and polygon gluings. We illustrate how such interpretation can lead to solutions to these problems in particular cases. CONCLUSIONS: This study provides an unexpected link between comparative genomics and topology, and demonstrates advantages of solving genome median and halving problems within the topological framework.


Subjects
Genomics , Genetic Models , Genome
14.
PLoS One ; 10(6): e0129566, 2015.
Article in English | MEDLINE | ID: mdl-26075913

ABSTRACT

A high-throughput screen for compounds that induce TRAIL-mediated apoptosis identified ML100 as an active chemical probe, which potentiated TRAIL activity in prostate carcinoma PPC-1 and melanoma MDA-MB-435 cells. Follow-up in silico modeling and profiling in cell-based assays allowed us to identify NSC130362, a pharmacophore analog of ML100 that induced 65-95% cytotoxicity in cancer cells and did not affect the viability of human primary hepatocytes. In agreement with the activation of the apoptotic pathway, both ML100 and NSC130362 acted synergistically with TRAIL to induce caspase-3/7 activity in MDA-MB-435 cells. Subsequent affinity chromatography and inhibition studies convincingly demonstrated that glutathione reductase (GSR), a key component of the oxidative stress response, is a target of NSC130362. In accordance with the role of GSR in the TRAIL pathway, GSR gene silencing potentiated TRAIL activity in MDA-MB-435 cells but not in human hepatocytes. Inhibition of GSR activity resulted in the induction of oxidative stress, as was evidenced by an increase in intracellular reactive oxygen species (ROS) and peroxidation of mitochondrial membrane after NSC130362 treatment in MDA-MB-435 cells but not in human hepatocytes. The antioxidant reduced glutathione (GSH) fully protected MDA-MB-435 cells from cell lysis induced by NSC130362 and TRAIL, thereby further confirming the interplay between GSR and TRAIL. As a consequence of activation of oxidative stress, combined treatment with different oxidative stress inducers and NSC130362 promoted cell death in a variety of cancer cells but not in hepatocytes, both in cell-based assays and in vivo in a mouse tumor xenograft model.


Subjects
Apoptosis/drug effects , Glutathione Reductase/metabolism , High-Throughput Screening Assays , Oxidative Stress , TNF-Related Apoptosis-Inducing Ligand/metabolism , TNF-Related Apoptosis-Inducing Ligand/pharmacology , Animals , Antineoplastic Agents/pharmacology , Tumor Cell Line , Drug Dose-Response Relationship , Doxorubicin/pharmacology , Drug Discovery , Glutathione/metabolism , Glutathione Reductase/antagonists & inhibitors , Humans , Mice , Reactive Oxygen Species , Small Molecule Libraries
15.
Comput Biol Chem ; 57: 46-53, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25819137

ABSTRACT

Advances in DNA sequencing technology over the past decade have increased the volume of raw sequenced genomic data available for further assembly and analysis. While there exist many algorithms for assembly of sequenced genomic material, they often experience difficulties in constructing complete genomic sequences. Instead, they produce long genomic subsequences (scaffolds), which then become subject to scaffold assembly aimed at reconstruction of their order along genome chromosomes. The balance between reliability and cost for scaffold assembly has not yet been achieved, which motivates the search for new approaches to this problem. We present a new method for scaffold assembly based on the analysis of gene orders and genome rearrangements in multiple related genomes (some or even all of which may be fragmented). Evaluation of the proposed method on artificially fragmented mammalian genomes demonstrates its high reliability. We also apply our method to incomplete Anophelinae genomes, which exhibit high fragmentation, and further validate the assembly results with reference-based scaffolding. While the two methods demonstrate consistent results, the proposed method is able to identify more assembly points than the reference-based scaffolding.


Subjects
Gene Order/genetics , DNA Sequence Analysis/methods , Algorithms , Humans
16.
Science ; 347(6217): 1258522, 2015 Jan 02.
Article in English | MEDLINE | ID: mdl-25554792

ABSTRACT

Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.


Subjects
Anopheles/genetics , Molecular Evolution , Insect Genome , Insect Vectors/genetics , Malaria/transmission , Animals , Anopheles/classification , Base Sequence , Insect Chromosomes/genetics , Drosophila/genetics , Humans , Insect Vectors/classification , Molecular Sequence Data , Phylogeny , Sequence Alignment
17.
J Comput Biol ; 20(10): 714-37, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24093227

ABSTRACT

Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers specifically designed for single-cell sequencing. For standard (cultivated monostrain) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet. Thus, recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. SPAdes is available for free online download under a GPLv2 license.


Subjects
Contig Mapping/methods , Bacterial DNA/genetics , Concatenated DNA/genetics , Algorithms , Base Composition , Computational Biology , Escherichia coli/genetics , Gene Library , Bacterial Genome , High-Throughput Nucleotide Sequencing , Nucleic Acid Amplification Techniques , Pedobacter/genetics , Prochlorococcus/genetics , DNA Sequence Analysis , Single-Cell Analysis
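
Note on entry 17: the abstract refers to edges and bulges in de Bruijn graphs. For readers unfamiliar with the structure, the toy sketch below builds a de Bruijn graph from reads, with each k-mer becoming an edge from its (k-1)-mer prefix to its (k-1)-mer suffix and the edge multiplicity standing in for coverage. It is a textbook construction, not SPAdes code; the function name and the dictionary representation are assumptions for illustration, and nothing here detects chimeric edges or bulges.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Return {(prefix, suffix): multiplicity} for every k-mer seen in the reads.

    Nodes are (k-1)-mers; each k-mer contributes one edge prefix -> suffix.
    """
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


# Two overlapping toy reads; the shared k-mer CGT gets multiplicity 2.
for edge, count in sorted(de_bruijn_edges(["ACGT", "CGTA"], 3).items()):
    print(edge, count)
```

In a real assembler the multiplicities would be highly non-uniform for single-cell data, which is the difficulty the abstract describes.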
18.
BMC Genomics ; 14 Suppl 1: S7, 2013.
Article in English | MEDLINE | ID: mdl-23368723

ABSTRACT

Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. While existing error correction tools designed for standard (multi-cell) sequencing data usually come up short in single-cell sequencing projects, algorithms actually used for single-cell error correction have been so far very simplistic. We introduce several novel algorithms based on Hamming graphs and Bayesian subclustering in our new error correction tool BAYESHAMMER. While BAYESHAMMER was designed for single-cell sequencing, we demonstrate that it also improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets. We benchmark BAYESHAMMER on both k-mer counts and actual assembly results with the SPADES genome assembler.


Subjects
Algorithms , DNA Sequence Analysis , Bayes Theorem , Cluster Analysis , Escherichia coli/genetics , Single-Cell Analysis
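
Note on entry 18: the abstract mentions Hamming graphs over k-mers. A minimal sketch of that one idea follows: k-mers are vertices, two k-mers are joined when they differ in at most a small number of positions, and connected components group likely error variants together. This is not the BAYESHAMMER implementation and omits the Bayesian subclustering entirely; the quadratic pairwise comparison, the `max_dist` parameter, and all names are assumptions chosen for brevity.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def hamming_components(kmers, max_dist=1):
    """Connected components of the graph joining k-mers within max_dist mismatches."""
    kmers = list(kmers)
    parent = {x: x for x in kmers}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(kmers, 2):  # O(n^2): fine for a toy, not for real data
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)

    components = {}
    for x in kmers:
        components.setdefault(find(x), []).append(x)
    return [sorted(c) for c in components.values()]


reads = ["ACGTAC", "ACGTAC", "ACCTAC"]  # third read carries one substitution
counts = kmer_counts(reads, 4)
for component in sorted(hamming_components(counts)):
    print(component, [counts[x] for x in component])
```

In each component the higher-count k-mer is the natural candidate for the error-free sequence, though a real tool would weigh quality values and coverage rather than raw counts alone.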
19.
J Comput Biol ; 20(4): 359-71, 2013 Apr.
Article in English | MEDLINE | ID: mdl-22803627

ABSTRACT

One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.


Subjects
Algorithms , Contig Mapping/methods , Genome/genetics , DNA Sequence Analysis/methods , Genetic Databases , Escherichia coli/genetics
20.
Mol Phylogenet Evol ; 65(3): 871-82, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22929217

ABSTRACT

Evolutionary relationships among placental mammalian orders have been controversial. Whole genome sequencing and new computational methods offer opportunities to resolve the relationships among 10 genomes belonging to the mammalian orders Primates, Rodentia, Carnivora, Perissodactyla and Artiodactyla. By application of the double cut and join distance metric, where gene order is the phylogenetic character, we computed genomic distances among the sampled mammalian genomes. With a marsupial outgroup, the gene order tree supported a topology in which Rodentia fell outside the cluster of Primates, Carnivora, Perissodactyla, and Artiodactyla. Results of breakpoint reuse rate and synteny block length analyses were consistent with the prediction of random breakage model, which provided a diagnostic test to support use of gene order as an appropriate phylogenetic character in this study. We discussed the influence of rate differences among lineages and other factors that may contribute to different resolutions of mammalian ordinal relationships by different methods of phylogenetic reconstruction.


Subjects
Biological Evolution , Mammals/classification , Phylogeny , Animals , Artiodactyla/classification , Artiodactyla/genetics , Carnivora/classification , Carnivora/genetics , Gene Order , Genome , Mammals/genetics , Genetic Models , Perissodactyla/classification , Perissodactyla/genetics , Primates/classification , Primates/genetics , Proteome/analysis , Rodentia/classification , Rodentia/genetics , Software