Results 1 - 13 of 13
1.
Gigascience ; 9(10)2020 10 15.
Article in English | MEDLINE | ID: mdl-33057676

ABSTRACT

BACKGROUND: Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically require bioinformatics expertise and access to local server-class hardware resources. For many research laboratories, this presents an obstacle, especially in resource-limited environments. FINDINGS: We present IDseq, an open source cloud-based metagenomics pipeline and service for global pathogen detection and monitoring (https://idseq.net). The IDseq Portal accepts raw mNGS data, performs host and quality filtration steps, then executes an assembly-based alignment pipeline, which results in the assignment of reads and contigs to taxonomic categories. The taxonomic relative abundances are reported and visualized in an easy-to-use web application to facilitate data interpretation and hypothesis generation. Furthermore, IDseq supports environmental background model generation and automatic internal spike-in control recognition, providing statistics that are critical for data interpretation. IDseq was designed with the specific intent of detecting novel pathogens. Here, we benchmark novel virus detection capability using both synthetically evolved viral sequences and real-world samples, including IDseq analysis of a nasopharyngeal swab sample acquired and processed locally in Cambodia from a tourist from Wuhan, China, infected with the recently emergent SARS-CoV-2. CONCLUSION: The IDseq Portal reduces the barrier to entry for mNGS data analysis and enables bench scientists, clinicians, and bioinformaticians to gain insight from mNGS datasets for both known and novel pathogens.


Subjects
Betacoronavirus/genetics, Cloud Computing, Coronavirus Infections/virology, Metagenome, Metagenomics/methods, Viral Pneumonia/virology, Betacoronavirus/pathogenicity, COVID-19, Coronavirus Infections/diagnosis, Genetic Databases, High-Throughput Nucleotide Sequencing/methods, Humans, Pandemics, Viral Pneumonia/diagnosis, SARS-CoV-2, Software
2.
Genome Res ; 23(1): 129-41, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23093720

ABSTRACT

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.


Subjects
DNA Sequence Analysis/methods, Bacterial DNA/chemistry, Mitochondrial DNA/chemistry, Escherichia coli/chemistry, Guanosine/analogs & derivatives, Guanosine/chemistry, Humans, Kinetics, Oxidation-Reduction
3.
Nucleic Acids Res ; 40(4): e29, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22156058

ABSTRACT

DNA methylation is the most common form of DNA modification in prokaryotic and eukaryotic genomes. We have applied the method of single-molecule, real-time (SMRT®) DNA sequencing that is capable of direct detection of modified bases at single-nucleotide resolution to characterize the specificity of several bacterial DNA methyltransferases (MTases). In addition to previously described SMRT sequencing of N6-methyladenine and 5-methylcytosine, we show that N4-methylcytosine also has a specific kinetic signature and is therefore identifiable using this approach. We demonstrate for all three prokaryotic methylation types that SMRT sequencing confirms the identity and position of the methylated base in cases where the MTase specificity was previously established by other methods. We then applied the method to determine the sequence context and methylated base identity for three MTases with unknown specificities. In addition, we also find evidence of unanticipated MTase promiscuity with some enzymes apparently also modifying sequences that are related, but not identical, to the cognate site.


Subjects
DNA Methylation, DNA Modification Methylases/metabolism, DNA Sequence Analysis, Bacteria/enzymology, Base Sequence, DNA (Cytosine-5-)-Methyltransferases/metabolism, Molecular Sequence Data, Plasmids/chemistry, Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism, Site-Specific DNA-Methyltransferase (Cytosine N4-Specific)/metabolism, Substrate Specificity
4.
Nat Methods ; 9(1): 75-7, 2011 Nov 20.
Article in English | MEDLINE | ID: mdl-22101853

ABSTRACT

We describe strand-specific, base-resolution detection of 5-hydroxymethylcytosine (5-hmC) in genomic DNA with single-molecule sensitivity, combining a bioorthogonal, selective chemical labeling method of 5-hmC with single-molecule, real-time (SMRT) DNA sequencing. The chemical labeling not only allows affinity enrichment of 5-hmC-containing DNA fragments but also enhances the kinetic signal of 5-hmC during SMRT sequencing. We applied the approach to sequence 5-hmC in a genomic DNA sample with high confidence.


Subjects
Cytosine/analogs & derivatives, DNA/chemistry, DNA Sequence Analysis/methods, 5-Methylcytosine/analogs & derivatives, Base Sequence, Cytosine/analysis, Sensitivity and Specificity
5.
Database (Oxford) ; 2011: bar035, 2011.
Article in English | MEDLINE | ID: mdl-21930505

ABSTRACT

Neisseria meningitidis is an important pathogen, causing life-threatening diseases including meningitis, septicemia and in some cases pneumonia. Genomic studies hold great promise for N. meningitidis research, but substantial database resources are needed to deal with the wealth of information that comes with completely sequenced and annotated genomes. To address this need, we developed Neisseria Base (NBase), a comparative genomics database and genome browser that houses and displays publicly available N. meningitidis genomes. In addition to existing N. meningitidis genome sequences, we sequenced and annotated 19 new genomes using 454 pyrosequencing and the CG-Pipeline genome analysis tool. In total, NBase hosts 27 complete N. meningitidis genome sequences along with their associated annotations. The NBase platform is designed to be scalable, via the underlying database schema and modular code architecture, such that it can readily incorporate new genomes and their associated annotations. The front page of NBase provides user access to these genomes through searching, browsing and downloading. NBase search utility includes BLAST-based sequence similarity searches along with a variety of semantic search options. All genomes can be browsed using a modified version of the GBrowse platform, and a plethora of information on each gene can be viewed using a customized details page. NBase also has a whole-genome comparison tool that yields single-nucleotide polymorphism differences between two user-defined groups of genomes. Using the virulent ST-11 lineage as an example, we demonstrate how this comparative genomics utility can be used to identify novel genomic markers for molecular profiling of N. meningitidis.


Subjects
Database Management Systems, Bacterial Genome, Genomics/methods, Neisseria meningitidis/genetics, DNA Sequence Analysis/methods, Genetic Markers, Single Nucleotide Polymorphism, User-Computer Interface
6.
N Engl J Med ; 365(8): 709-17, 2011 Aug 25.
Article in English | MEDLINE | ID: mdl-21793740

ABSTRACT

BACKGROUND: A large outbreak of diarrhea and the hemolytic-uremic syndrome caused by an unusual serotype of Shiga-toxin-producing Escherichia coli (O104:H4) began in Germany in May 2011. As of July 22, a large number of cases of diarrhea caused by Shiga-toxin-producing E. coli have been reported (3167 without the hemolytic-uremic syndrome, with 16 deaths, and 908 with the hemolytic-uremic syndrome, with 34 deaths), indicating that this strain is notably more virulent than most of the Shiga-toxin-producing E. coli strains. Preliminary genetic characterization of the outbreak strain suggested that, unlike most of these strains, it should be classified within the enteroaggregative pathotype of E. coli. METHODS: We used third-generation, single-molecule, real-time DNA sequencing to determine the complete genome sequence of the German outbreak strain, as well as the genome sequences of seven diarrhea-associated enteroaggregative E. coli serotype O104:H4 strains from Africa and four enteroaggregative E. coli reference strains belonging to other serotypes. Genomewide comparisons were performed with the use of these enteroaggregative E. coli genomes, as well as those of 40 previously sequenced E. coli isolates. RESULTS: The enteroaggregative E. coli O104:H4 strains are closely related and form a distinct clade among E. coli and enteroaggregative E. coli strains. However, the genome of the German outbreak strain can be distinguished from those of other O104:H4 strains because it contains a prophage encoding Shiga toxin 2 and a distinct set of additional virulence and antibiotic-resistance factors. CONCLUSIONS: Our findings suggest that horizontal genetic exchange allowed for the emergence of the highly virulent Shiga-toxin-producing enteroaggregative E. coli O104:H4 strain that caused the German outbreak. More broadly, these findings highlight the way in which the plasticity of bacterial genomes facilitates the emergence of new pathogens.


Subjects
Disease Outbreaks, Escherichia coli Infections/microbiology, Bacterial Genome, Hemolytic-Uremic Syndrome/microbiology, Shiga-Toxigenic Escherichia coli/genetics, Bacterial Typing Techniques, Base Sequence, Diarrhea/epidemiology, Diarrhea/microbiology, Escherichia coli Infections/epidemiology, Feces/microbiology, Female, Germany/epidemiology, Hemolytic-Uremic Syndrome/epidemiology, Humans, Middle Aged, Phylogeny, Polymerase Chain Reaction, DNA Sequence Analysis, Shiga-Toxigenic Escherichia coli/classification, Shiga-Toxigenic Escherichia coli/isolation & purification
7.
BMC Genomics ; 12: 32, 2011 Jan 13.
Article in English | MEDLINE | ID: mdl-21232151

ABSTRACT

BACKGROUND: The dual concepts of pan and core genomes have been widely adopted as means to assess the distribution of gene families within microbial species and genera. The core genome is the set of genes shared by a group of organisms; the pan genome is the set of all genes seen in any of these organisms. A variety of methods have provided drastically different estimates of the sizes of pan and core genomes from sequenced representatives of the same groups of bacteria. RESULTS: We use a combination of mathematical, statistical and computational methods to show that current predictions of pan and core genome sizes may have no correspondence to true values. Pan and core genome size estimates are problematic because they depend on the estimation of the occurrence of rare genes and genomes, respectively, which are difficult to estimate precisely because they are rare. Instead, we introduce and evaluate a robust metric, genomic fluidity, to categorize the gene-level similarity among groups of sequenced isolates. Genomic fluidity is a measure of the dissimilarity of genomes evaluated at the gene level. CONCLUSIONS: The genomic fluidity of a population can be estimated accurately given a small number of sequenced genomes. Further, the genomic fluidity of groups of organisms can be compared robustly despite variation in algorithms used to identify genes and their homologs. As such, we recommend that genomic fluidity be used in place of pan and core genome size estimates when assessing gene diversity within genomes of a species or a group of closely related organisms.


Subjects
Genetic Variation/genetics, Bacterial Genome/genetics, Computer Simulation, Statistical Models, Theoretical Models
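
The definitions in the abstract of entry 7 can be illustrated with a minimal Python sketch. The core and pan genome follow directly from the abstract (intersection and union of gene sets). The fluidity formula is an assumption: the abstract describes fluidity only verbally, so the sketch uses one common pairwise formulation (families unique to either genome of a pair, divided by the total families in the pair, averaged over all pairs). The function names and the toy gene-family labels are hypothetical.

```python
from itertools import combinations


def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict of genome name -> set of families."""
    families = list(genomes.values())
    pan = set().union(*families)          # every family seen in any genome
    core = set.intersection(*families)    # families shared by all genomes
    return pan, core


def genomic_fluidity(genomes):
    """Average, over genome pairs, of unique families divided by total families.

    Assumed formulation (not stated in the abstract): for each pair, count the
    families present in exactly one of the two genomes and divide by the sum
    of the two genomes' family counts.
    """
    ratios = []
    for a, b in combinations(genomes.values(), 2):
        unique = len(a ^ b)        # families found in exactly one genome of the pair
        total = len(a) + len(b)    # family count of both genomes together
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


# Toy data: three hypothetical genomes described by made-up gene-family labels.
genomes = {
    "g1": {"A", "B", "C", "D"},
    "g2": {"A", "B", "C", "E"},
    "g3": {"A", "B", "F", "G"},
}

pan, core = pan_and_core(genomes)
fluidity = genomic_fluidity(genomes)
```

With the toy data, the core genome is the two families shared by all three genomes and the pan genome is all seven families. Fluidity needs only pairwise comparisons, which is consistent with the abstract's point that it can be estimated from a small number of genomes.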
8.
Bioinformatics ; 26(15): 1819-26, 2010 Aug 01.
Article in English | MEDLINE | ID: mdl-20519285

ABSTRACT

MOTIVATION: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. RESULTS: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. AVAILABILITY AND IMPLEMENTATION: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.


Subjects
Bacterial Genome/genetics, Genomics/methods, Prokaryotic Cells, Bordetella bronchiseptica/genetics, Georgia, Neisseria meningitidis/genetics, DNA Sequence Analysis/methods, Software
9.
BMC Bioinformatics ; 10: 316, 2009 Oct 02.
Article in English | MEDLINE | ID: mdl-19799776

ABSTRACT

BACKGROUND: The development of effective environmental shotgun sequence binning methods remains an ongoing challenge in algorithmic analysis of metagenomic data. While previous methods have focused primarily on supervised learning involving extrinsic data, a first-principles statistical model combined with a self-training fitting method has not yet been developed. RESULTS: We derive an unsupervised, maximum-likelihood formalism for clustering short sequences by their taxonomic origin on the basis of their k-mer distributions. The formalism is implemented using a Markov Chain Monte Carlo approach in a k-mer feature space. We introduce a space transformation that reduces the dimensionality of the feature space and a genomic fragment divergence measure that strongly correlates with the method's performance. Pairwise analysis of over 1000 completely sequenced genomes reveals that the vast majority of genomes have sufficient genomic fragment divergence to be amenable for binning using the present formalism. Using a high-performance implementation, the binner is able to classify fragments as short as 400 nt with accuracy over 90% in simulations of low-complexity communities of 2 to 10 species, given sufficient genomic fragment divergence. The method is available as an open source package called LikelyBin. CONCLUSION: An unsupervised binning method based on statistical signatures of short environmental sequences is a viable stand-alone binning method for low complexity samples. For medium and high complexity samples, we discuss the possibility of combining the current method with other methods as part of an iterative process to enhance the resolving power of sorting reads into taxonomic and/or functional bins.


Subjects
Cluster Analysis, Genomics/methods, Algorithms, Genome, DNA Sequence Analysis/methods
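
The abstract of entry 9 clusters short sequences by their k-mer distributions. As a rough illustration of that feature only, and not of LikelyBin's actual implementation, the following Python sketch computes a normalized k-mer frequency profile for one fragment. The function name, the default `k=2`, and the choice to skip k-mers containing non-ACGT characters are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    K-mers containing other characters (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Example fragment (made up); real inputs would be reads or contigs.
profile = kmer_profile("ACGTACGT", k=2)
```

Each fragment becomes a point in a 4^k-dimensional feature space, which is the kind of space in which the abstract's maximum-likelihood clustering is said to operate.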
10.
Int J Bioinform Res Appl ; 5(4): 458-77, 2009.
Article in English | MEDLINE | ID: mdl-19640832

ABSTRACT

We have developed a new method for frameshift detection, a combination of ab initio and alignment-based algorithms, that can serve as a useful tool for sequencing quality control in next-generation sequencing. We evaluated the method's accuracy on test sets of annotated genomic sequences with artificial frameshifts in protein coding regions. These tests have shown that the new method performs comparably to the earlier developed FrameD. On the sets of sequences produced by 454 pyrosequencing with sequence errors recovered by Sanger re-sequencing, the accuracy of the method was shown to hold at the same level.


Subjects
Frameshift Mutation, Bacterial Genome, Genomics/methods, Open Reading Frames, Algorithms, Sequence Alignment
11.
Nucleic Acids Res ; 37(Web Server issue): W606-11, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19468047

ABSTRACT

The Meningococcus Genome Informatics Platform (MGIP) is a suite of computational tools for the analysis of multilocus sequence typing (MLST) data, at http://mgip.biology.gatech.edu. MLST is used to generate allelic profiles to characterize strains of Neisseria meningitidis, a major cause of bacterial meningitis worldwide. Neisseria meningitidis strains are characterized with MLST as specific sequence types (ST) and clonal complexes (CC) based on the DNA sequences at defined loci. These data are vital to molecular epidemiology studies of N. meningitidis, including outbreak investigations and population biology. MGIP analyzes DNA sequence trace files, returns individual allele calls and characterizes the STs and CCs. MGIP represents a substantial advance over existing software in several respects: (i) ease of use: MGIP is user friendly, intuitive and thoroughly documented; (ii) flexibility: because MGIP is a website, it is compatible with any computer with an internet connection, can be used from any geographic location, and there is no installation; (iii) speed: MGIP takes just over one minute to process a set of 96 trace files; and (iv) expandability: MGIP has the potential to expand to more loci than those used in MLST and even to other bacterial species.


Subjects
Bacterial Typing Techniques, Neisseria meningitidis/classification, Software, Alleles, Genome, Bacterial, Genomics, Neisseria meningitidis/genetics, Reproducibility of Results, DNA Sequence Analysis, User-Computer Interface
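
The step described in the abstract of entry 11, going from per-locus allele calls to a sequence type, amounts to a table lookup. The Python sketch below shows the idea under stated assumptions: it is not MGIP's code, the locus names are the seven loci commonly used for N. meningitidis MLST, and the allele numbers and ST labels in the table are invented placeholders rather than real profiles.

```python
# The seven loci commonly used for N. meningitidis MLST (assumed here; the
# abstract only says "defined loci").
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Hypothetical profile table: allele numbers and ST labels are invented
# for illustration and do not correspond to real sequence types.
PROFILE_TO_ST = {
    (1, 1, 1, 1, 1, 1, 1): "ST-example-1",
    (1, 2, 1, 1, 3, 1, 1): "ST-example-2",
}


def assign_sequence_type(allele_calls):
    """Map per-locus allele calls to a sequence type, or None if the profile is unknown."""
    profile = tuple(allele_calls[locus] for locus in LOCI)
    return PROFILE_TO_ST.get(profile)


# Allele calls as they might come out of trace-file analysis (made-up values).
calls = {"abcZ": 1, "adk": 2, "aroE": 1, "fumC": 1, "gdh": 3, "pdhC": 1, "pgm": 1}
st = assign_sequence_type(calls)

# A profile that is absent from the table yields None, i.e. a candidate new ST.
unknown = assign_sequence_type({locus: 9 for locus in LOCI})
```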
12.
Genome Res ; 19(4): 682-9, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19176791

ABSTRACT

Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.


Subjects
Computational Biology, Genome, Sequence Alignment/methods, Sequence Alignment/statistics & numerical data, DNA Sequence Analysis/statistics & numerical data, Algorithms, Nucleic Acid Databases, Exons/genetics, Multigene Family, Software
13.
Genomics ; 88(4): 431-42, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16697139

ABSTRACT

We have explored the distributions of fully conserved ungapped blocks in genome-wide pair-wise alignments of recently completed species of Drosophila: D. melanogaster, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis, and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs, and nonfunctional sequences. At the same time, the blocks in the coding regions carried a 3N + 2 signature characteristic of synonymous substitutions in the third-codon position. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or "ultraconserved" sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discuss how restraining conservation patterns may help in mapping functional sequence categories and improve genome annotation.


Subjects
Conserved Sequence, Drosophila/genetics, Animals, Base Sequence, Genetic Enhancer Elements, Exons, Insect Genome, Introns, MicroRNAs, Genetic Promoter Regions, Nucleic Acid Regulatory Sequences, Untranslated Regions
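
The quantity studied in the abstract of entry 13, the distribution of fully conserved ungapped blocks in a pairwise alignment, can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' code: a block is taken to be a maximal run of alignment columns in which both sequences carry the same base and neither has a gap. The function name and the `-` gap character are assumptions.

```python
def conserved_block_lengths(aln_a, aln_b):
    """Lengths of maximal runs of identical, gap-free columns in a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    lengths, run = [], 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == y and x != "-":
            run += 1                  # column is conserved and ungapped
        else:
            if run:
                lengths.append(run)   # a mismatch or gap closes the current block
            run = 0
    if run:
        lengths.append(run)           # close a block that reaches the alignment end
    return lengths


# Made-up alignment: one gap column and one mismatch split it into three blocks.
blocks = conserved_block_lengths("ACGT-ACGTTA", "ACGTAACCTTA")
```

Collecting these lengths separately for coding regions, introns, UTRs and regulatory regions gives the per-category distributions that the abstract compares.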