1.
Bioinformatics ; 38(3): 604-611, 2022 Jan 12.
Article in English | MEDLINE | ID: mdl-34726732

ABSTRACT

MOTIVATION: With the increasing throughput of sequencing technologies, structural variant (SV) detection has become possible across tens of thousands of genomes. Non-reference sequence (NRS) variants have drawn less attention compared with other types of SVs due to the computational complexity of detecting them. When using short-read data, the detection of NRS variants inevitably involves a de novo assembly which requires high-quality sequence data at high coverage. Previous studies have demonstrated how sequence data of multiple genomes can be combined for the reliable detection of NRS variants. However, the algorithms proposed in these studies have limited scalability to larger sets of genomes. RESULTS: We introduce PopIns2, a tool to discover and characterize NRS variants in many genomes, which scales to considerably larger numbers of genomes than its predecessor PopIns. In this article, we briefly outline the PopIns2 workflow and highlight our novel algorithmic contributions. We developed an entirely new approach for merging contig assemblies of unaligned reads from many genomes into a single set of NRS using a colored de Bruijn graph. Our tests on simulated data indicate that the new merging algorithm ranks among the best approaches in terms of quality and reliability and that PopIns2 shows the best precision for a growing number of genomes processed. Results on the Polaris Diversity Cohort and a set of 1000 Icelandic human genomes demonstrate unmatched scalability for the application on population-scale datasets. AVAILABILITY AND IMPLEMENTATION: The source code of PopIns2 is available from https://github.com/kehrlab/PopIns2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
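
The colored de Bruijn graph mentioned in the abstract can be illustrated with a minimal sketch. This is not the PopIns2 implementation: the function names, the sample names and the toy contigs are hypothetical, and the sketch ignores reverse complements and graph compaction. It only shows the core idea, which is that each k-mer records the set of genomes ("colors") whose contigs contain it.

```python
from collections import defaultdict


def colored_de_bruijn(samples, k):
    """Map each k-mer to the set of sample names (colors) that contain it.

    samples: dict of sample name -> list of contig strings (hypothetical input).
    """
    colors = defaultdict(set)
    for name, contigs in samples.items():
        for contig in contigs:
            for i in range(len(contig) - k + 1):
                colors[contig[i:i + k]].add(name)
    return dict(colors)


def successors(graph, kmer):
    """Return the k-mers in the graph that overlap `kmer` by k-1 bases."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in graph]


# Toy data: two samples whose contigs overlap on CGTA and GTAC.
samples = {
    "sample_A": ["ACGTAC"],
    "sample_B": ["CGTACG"],
}
graph = colored_de_bruijn(samples, k=4)
```

In this toy graph the k-mers shared by both samples carry both colors, so a path through them represents sequence that could be merged into a single contig, while k-mers with one color mark sample-specific ends.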


Subjects
Algorithms , Software , Humans , DNA Sequence Analysis/methods , Reproducibility of Results , Human Genome , High-Throughput Nucleotide Sequencing/methods
2.
BMC Genomics ; 15: 68, 2014 Jan 25.
Article in English | MEDLINE | ID: mdl-24460871

ABSTRACT

BACKGROUND: Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. RESULTS: Here we present an 'A to Z' protocol for obtaining complete human mitochondrial (mtDNA) genomes - from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). CONCLUSIONS: All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual 'modules' can be swapped out to suit available resources.


Subjects
Computational Biology/methods , Mitochondrial DNA/analysis , Mitochondrial Genome , High-Throughput Nucleotide Sequencing , Mitochondria/genetics , Mitochondrial DNA/isolation & purification , Humans , Polymerase Chain Reaction , Specimen Handling
3.
Bioinformatics ; 27(10): 1359-67, 2011 May 15.
Article in English | MEDLINE | ID: mdl-21444294

ABSTRACT

MOTIVATION: Despite trends towards maximum likelihood and Bayesian criteria, maximum parsimony (MP) remains an important criterion for evaluating phylogenetic trees. Because exact MP search is NP-complete, the computational effort needed to find provably optimal trees skyrockets with increasing numbers of taxa, limiting analyses to around 25-30 taxa. This is, in part, because currently available programs fail to take advantage of parallelism. RESULTS: We present XMP, a new program for finding exact MP trees that comes in both serial and parallel versions. The serial version is faster in nearly all tests than existing software. The parallel version uses a work-stealing algorithm to scale to hundreds of CPUs on a distributed-memory multiprocessor with high efficiency. An optimized SSE2 inner loop provides additional speedup for Pentium 4 and later CPUs. AVAILABILITY: C source code and several binary versions are freely available from http://www.massey.ac.nz/~wtwhite/xmp. The parallel version requires an MPI implementation, such as the freely available MPICH2.
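
For readers unfamiliar with the MP criterion, the following minimal sketch scores a single fixed tree with Fitch's algorithm. It is not XMP's code and says nothing about how XMP searches tree space: the function names, the nested-tuple tree encoding and the toy alignment are assumptions made for illustration. Scoring one tree is cheap; the cost the abstract refers to comes from the number of candidate trees an exact search must consider.

```python
def fitch_score(tree, leaf_states):
    """Fitch's algorithm for one character on a rooted binary tree.

    tree: a leaf name (str) or a tuple (left_subtree, right_subtree).
    leaf_states: dict of leaf name -> character state at this site.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], leaf_states)
    right_set, right_cost = fitch_score(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch scores over all sites of an alignment (taxon -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Toy alignment with four taxa and three sites (hypothetical data).
alignment = {"a": "AAG", "b": "AAG", "c": "ACT", "d": "GCT"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))
length_1 = parsimony_length(tree_1, alignment)
length_2 = parsimony_length(tree_2, alignment)
```

On this toy data tree_1 needs fewer changes than tree_2, so it is the more parsimonious of the two.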


Subjects
Algorithms , Likelihood Functions , Phylogeny , Software , Animals , Bayes Theorem , Humans
4.
BMC Bioinformatics ; 9: 242, 2008 May 20.
Article in English | MEDLINE | ID: mdl-18489794

ABSTRACT

BACKGROUND: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. RESULTS: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. CONCLUSION: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.


Subjects
Data Compression/methods , Database Management Systems , Nucleic Acid Databases , Animals , Molecular Evolution , Expressed Sequence Tags , Humans , Computer Neural Networks , Phylogeny , Point Mutation , DNA Sequence Analysis , Species Specificity
5.
PLoS One ; 8(8): e69924, 2013.
Article in English | MEDLINE | ID: mdl-23950906

ABSTRACT

We demonstrate quantitatively that, as predicted by evolutionary theory, sequences of homologous proteins from different species converge as we go further and further back in time. The converse, a non-evolutionary model, can be expressed as probabilities, and the test works for chloroplast, nuclear and mitochondrial sequences, as well as for sequences that diverged at different time depths. Even on our conservative test, the probability that chance could produce the observed levels of ancestral convergence for just one of the eight datasets of 51 proteins is ≈1×10⁻¹⁹ and combined over 8 datasets is ≈1×10⁻¹³². By comparison, there are about 10⁸⁰ protons in the universe, hence the probability that the sequences could have been produced by a process involving unrelated ancestral sequences is about 10⁵⁰ lower than picking, among all protons, the same proton at random twice in a row. A non-evolutionary control model shows no convergence, and only a small number of parameters are required to account for the observations. It is time that researchers insisted that doubters put up testable alternatives to evolution.
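
The proton comparison can be checked with a short calculation. This is a reader's sketch and not taken from the article: it assumes that "the same proton twice in a row" means the first pick is free and the second pick must match it, so the chance is one in the number of protons. The symbols are introduced here for illustration only.

```latex
% Assumption: N is the number of protons; the first pick is unconstrained
% and the second pick must match it.
\[
  N \approx 10^{80}, \qquad
  P_{\text{proton}} \approx \frac{1}{N} = 10^{-80}, \qquad
  P_{\text{chance}} \approx 10^{-132}
\]
\[
  \frac{P_{\text{chance}}}{P_{\text{proton}}}
  \approx \frac{10^{-132}}{10^{-80}} = 10^{-52}
\]
```

Under that reading the ratio of about 10⁻⁵² is consistent with the abstract's rounded figure of "about 10⁵⁰ lower".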


Subjects
DNA/genetics , Molecular Evolution , Proteins/genetics , Amino Acid Sequence , Animals , Base Sequence , Humans , Genetic Models , Molecular Sequence Data , Probability , Proteins/chemistry
6.
PLoS One ; 3(8): e3106, 2008 Aug 29.
Article in English | MEDLINE | ID: mdl-18769729

ABSTRACT

RNAs processing other RNAs is very general in eukaryotes, but it is not clear to what extent it is ancestral to eukaryotes. Here we focus on pre-mRNA splicing, one of the most important RNA-processing mechanisms in eukaryotes. In most eukaryotes splicing is predominantly catalysed by the major spliceosome complex, which consists of five uridine-rich small nuclear RNAs (U-snRNAs) and over 200 proteins in humans. Three major spliceosomal introns have been found experimentally in Giardia; one Giardia U-snRNA (U5) and a number of spliceosomal proteins have also been identified. However, because of the low sequence similarity between the Giardia ncRNAs and those of other eukaryotes, the other U-snRNAs of Giardia had not been found. Using two computational methods, candidates for Giardia U1, U2, U4 and U6 snRNAs were identified in this study and shown by RT-PCR to be expressed. We found that identifying a U2 candidate helped identify U6 and U4 based on interactions between them. Secondary structural modelling of the Giardia U-snRNA candidates revealed typical features of eukaryotic U-snRNAs. We demonstrate a successful approach to combine computational and experimental methods to identify expected ncRNAs in a highly divergent protist genome. Our findings reinforce the conclusion that spliceosomal small-nuclear RNAs existed in the last common ancestor of eukaryotes.


Subjects
Giardia lamblia/genetics , Protozoan RNA/genetics , Small Nuclear RNA/genetics , Spliceosomes/genetics , Animals , Base Sequence , Humans , Molecular Models , Molecular Sequence Data , Nucleic Acid Conformation , Protozoan RNA/chemistry , Small Nuclear RNA/chemistry , Reverse Transcriptase Polymerase Chain Reaction , Uridine/analysis