Results 1 - 20 of 158
1.
Nat Methods ; 2024 Sep 26.
Article in English | MEDLINE | ID: mdl-39327484

ABSTRACT

Bacterial species in microbial communities are often represented by mixtures of strains, distinguished by small variations in their genomes. Short-read approaches can be used to detect small-scale variation between strains but fail to phase these variants into contiguous haplotypes. Long-read metagenome assemblers can generate contiguous bacterial chromosomes but often suppress strain-level variation in favor of species-level consensus. Here we present Strainy, an algorithm for strain-level metagenome assembly and phasing from Nanopore and PacBio reads. Strainy takes a de novo metagenomic assembly as input and identifies strain variants, which are then phased and assembled into contiguous haplotypes. Using simulated and mock Nanopore and PacBio metagenome data, we show that Strainy assembles accurate and complete strain haplotypes, outperforming current Nanopore-based methods and comparable with PacBio-based algorithms in completeness and accuracy. We then use Strainy to assemble strain haplotypes of a complex environmental metagenome, revealing distinct strain distribution and mutational patterns in bacterial species.
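The basic idea of phasing, grouping reads that agree at variant positions into contiguous haplotypes, can be illustrated with a toy greedy sketch. This is not Strainy's algorithm: the read encoding (a map from variant position to observed base), the names `compatible` and `phase_reads`, and the first-fit merge rule are all illustrative assumptions, and a real phaser must also tolerate sequencing errors.

```python
from typing import Dict, List

Read = Dict[int, str]  # hypothetical encoding: variant position -> observed base


def compatible(a: Read, b: Read) -> bool:
    """True if the two allele maps overlap and agree at every shared position."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[p] == b[p] for p in shared)


def phase_reads(reads: List[Read]) -> List[Read]:
    """Greedy first-fit phasing: extend the first compatible haplotype, else start a new one."""
    haplotypes: List[Read] = []
    for read in reads:
        for hap in haplotypes:
            if compatible(hap, read):
                hap.update(read)  # extend the haplotype with this read's alleles
                break
        else:
            haplotypes.append(dict(read))  # copy so the input read is not mutated
    return haplotypes
```

Because the merge is first-fit, the result depends on read order, and the sketch assumes error-free alleles; both are simplifications a real strain-level assembler has to avoid.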

2.
PLoS Comput Biol ; 20(8): e1012343, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39102435

ABSTRACT

For decades, the 16S rRNA gene has been used to taxonomically classify prokaryotic species and to taxonomically profile microbial communities. However, the 16S rRNA gene has been criticized for being too conserved to differentiate between distinct species. We argue that the inability to differentiate between species is not a unique feature of the 16S rRNA gene. Rather, we observe the gradual loss of species-level resolution for other nearly universal prokaryotic marker genes as the number of gene sequences increases in reference databases. This trend was strongly correlated with how well represented a taxonomic group was in the database and indicates that, at the gene level, the boundaries between many species might be fuzzy. Through our study, we argue that any approach that relies on a single marker to distinguish bacterial taxa is fraught even if some markers appear to be discriminative in current databases.


Subjects
Bacteria , Databases, Genetic , RNA, Ribosomal, 16S , RNA, Ribosomal, 16S/genetics , Bacteria/genetics , Bacteria/classification , Genetic Markers/genetics , Phylogeny , Computational Biology/methods
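The loss of species-level resolution as a reference database grows can be made concrete with a small sketch that measures the fraction of species whose marker sequences are never shared with another species. The record format, the function name `resolvable_fraction`, and the use of exact sequence identity (rather than a similarity threshold) are simplifying assumptions, not the authors' method.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def resolvable_fraction(records: Iterable[Tuple[str, str]]) -> float:
    """Fraction of species whose marker sequences are never shared with another species.

    records: (species, marker_sequence) pairs. Identity is exact string equality,
    a deliberate simplification of real similarity thresholds.
    """
    records = list(records)  # iterated twice below
    species_by_seq = defaultdict(set)
    for species, seq in records:
        species_by_seq[seq].add(species)
    all_species = {species for species, _ in records}
    ambiguous = set()
    for species_set in species_by_seq.values():
        if len(species_set) > 1:  # one sequence maps to several species
            ambiguous |= species_set
    return 1 - len(ambiguous) / len(all_species)
```

With a hypothetical two-species database where each species has a distinct sequence, the fraction is 1.0; adding a third species that happens to share a sequence with the first drops it to one third, even though no existing entry changed.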
3.
Nucleic Acids Res ; 51(8): e46, 2023 05 08.
Article in English | MEDLINE | ID: mdl-36912074

ABSTRACT

16S rRNA gene sequence clustering is an important tool in characterizing the diversity of microbial communities. As 16S rRNA gene data sets are growing in size, existing sequence clustering algorithms increasingly become an analytical bottleneck. Part of this bottleneck is due to the substantial computational cost expended on small clusters and singleton sequences. We propose an iterative sampling-based 16S rRNA gene sequence clustering approach that targets the largest clusters in the data set, allowing users to stop the clustering process when sufficient clusters are available for the specific analysis being targeted. We describe a probabilistic analysis of the iterative clustering process that supports the intuition that the clustering process identifies the larger clusters in the data set first. Using real data sets of 16S rRNA gene sequences, we show that the iterative algorithm, coupled with an adaptive sampling process and a mode-shifting strategy for identifying cluster representatives, substantially speeds up the clustering process while being effective at capturing the large clusters in the data set. The experiments also show that SCRAPT (Sample, Cluster, Recruit, AdaPt and iTerate) is able to produce operational taxonomic units that are less fragmented than popular tools: UCLUST, CD-HIT and DNACLUST. The algorithm is implemented in the open-source package SCRAPT. The source code used to generate the results presented in this paper is available at https://github.com/hsmurali/SCRAPT.


Subjects
Algorithms , Software , RNA, Ribosomal, 16S/genetics , Genes, rRNA , Cluster Analysis
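The sample, cluster, recruit, and iterate loop described above can be sketched as follows. This is not SCRAPT's implementation (its code is in the linked repository): the ungapped `identity` function, first-fit clustering of the sample, the fixed sample size, and the use of the most frequent member as a crude stand-in for mode-shifting are all assumptions, and the adaptive sampling step is omitted.

```python
import random
from collections import Counter
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[str]]:
    """First-fit clustering: each sequence joins the first representative it matches."""
    reps: List[str] = []
    groups: List[List[str]] = []
    for s in seqs:
        for i, r in enumerate(reps):
            if identity(r, s) >= threshold:
                groups[i].append(s)
                break
        else:
            reps.append(s)
            groups.append([s])
    return groups


def sample_cluster_recruit(seqs: List[str], threshold: float = 0.9, sample_size: int = 5,
                           max_iter: int = 10, seed: int = 0):
    rng = random.Random(seed)
    remaining = list(seqs)
    clusters: List[Tuple[str, List[str]]] = []
    for _ in range(max_iter):  # users may stop early once enough large clusters exist
        if not remaining:
            break
        # Sample: abundant clusters are the most likely to be drawn.
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        # Cluster only the small sample, which is cheap.
        for group in greedy_cluster(sample, threshold):
            rep = Counter(group).most_common(1)[0][0]  # most frequent member
            if rep not in remaining:  # already recruited by an earlier representative
                continue
            # Recruit: pull every unclustered sequence close to the representative.
            members = [s for s in remaining if identity(rep, s) >= threshold]
            clusters.append((rep, members))
            remaining = [s for s in remaining if identity(rep, s) < threshold]
    clusters.sort(key=lambda c: len(c[1]), reverse=True)
    return clusters, remaining
```

Only the sampled sequences are compared against each other; every other sequence is compared against a handful of representatives, which is where the savings over all-versus-all clustering come from.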
4.
BMC Genomics ; 25(1): 679, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978005

ABSTRACT

BACKGROUND: Oxford Nanopore provides high throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain if they obtain the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near perfect accuracy (99.9999% accuracy or ~ 5 nucleotide errors across the 4.8 Mbp genome, excluding low confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS: Short reads are still needed to correct errors in nanopore sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.


Subjects
Benchmarking , Disease Outbreaks , Genome, Bacterial , Nanopores , Nanopore Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Salmonella enterica/genetics , Salmonella enterica/isolation & purification , Humans , Phylogeny
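The accuracy figures above translate directly into expected error counts. The sketch below also reports the Phred-scaled quality value (QV), a common convention for consensus accuracy that the abstract itself does not use; the function names are illustrative.

```python
import math


def expected_errors(accuracy_pct: float, genome_bp: float) -> float:
    """Expected number of erroneous bases for a given per-base accuracy (in percent)."""
    return genome_bp * (1 - accuracy_pct / 100)


def phred_qv(accuracy_pct: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    return -10 * math.log10(1 - accuracy_pct / 100)


for acc in (99.95, 99.9999):
    print(f"{acc}%: ~{expected_errors(acc, 4.8e6):.0f} errors, QV {phred_qv(acc):.0f}")
```

For the 4.8 Mbp genome, 99.95% accuracy still leaves about 2,400 errors (QV about 33), whereas 99.9999% corresponds to about 5 errors (QV 60). That gap explains why closely related outbreak isolates need the combined long- and short-read polishing.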
5.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36579850

ABSTRACT

MOTIVATION: Scientists seeking to understand the genomic basis of bacterial phenotypes, such as antibiotic resistance, today have access to an unprecedented number of complete and nearly complete genomes. Making sense of these data requires computational tools able to perform multiple-genome comparisons efficiently, yet currently available tools cannot scale beyond several tens of genomes. RESULTS: We describe PRAWNS, an efficient and scalable tool for multiple-genome analysis. PRAWNS defines a concise set of genomic features (metablocks), as well as pairwise relationships between them, which can be used as a basis for large-scale genotype-phenotype association studies. We demonstrate the effectiveness of PRAWNS by identifying genomic regions associated with antibiotic resistance in Acinetobacter baumannii. AVAILABILITY AND IMPLEMENTATION: PRAWNS is implemented in C++ and Python3, licensed under the GPLv3 license, and freely downloadable from GitHub (https://github.com/KiranJavkar/PRAWNS.git). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metagenomics , Software , Genomics , Genome , Bacteria
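Once genomic features such as metablocks are encoded as presence or absence per genome, a basic genotype-phenotype association test is a 2x2 contingency test. The sketch below implements a one-sided Fisher's exact test from the hypergeometric distribution. It is a generic illustration: the abstract does not state which statistic PRAWNS uses, and the function name and table layout are assumptions.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table

                    resistant  susceptible
        feature +       a           b
        feature -       c           d

    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom
```

When many features are tested, the resulting p-values need a multiple-testing correction. SciPy's `scipy.stats.fisher_exact` with `alternative="greater"` computes the same one-sided test.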
6.
BMC Genomics ; 24(1): 165, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016310

ABSTRACT

BACKGROUND: The Salmonella enterica serovar Newport red onion outbreak of 2020 was the largest foodborne outbreak of Salmonella in over a decade. The epidemiological investigation suggested two farms as the likely source of contamination. However, single nucleotide polymorphism (SNP) analysis of the whole genome sequencing data showed that none of the Salmonella isolates collected from the farm regions were linked to the clinical isolates, preventing the use of phylogenetics in source identification. Here, we explored an alternative method for analyzing the whole genome sequencing data driven by the hypothesis that if the outbreak strain had come from the farm regions, then the clinical isolates would disproportionately contain plasmids found in isolates from the farm regions due to horizontal transfer. RESULTS: SNP analysis confirmed that the clinical isolates formed a single, nearly-clonal clade with evidence for ancestry in California going back a decade. The clinical clade had a large core genome (4,399 genes) and a large and sparsely distributed accessory genome (2,577 genes, at least 64% on plasmids). At least 20 plasmid types occurred in the clinical clade, more than were found in the literature for Salmonella Newport. A small number of plasmids, 14 from 13 clinical isolates and 17 from 8 farm isolates, were found to be highly similar (> 95% identical), indicating they might be related by horizontal transfer. Phylogenetic analysis was unable to determine the geographic origin, isolation source, or time of transfer of the plasmids, likely due to their promiscuous and transient nature. However, our resampling analysis suggested that observing a similar number and combination of highly similar plasmids in random samples of environmental Salmonella enterica within the NCBI Pathogen Detection database was unlikely, supporting a connection between the outbreak strain and the farms implicated by the epidemiological investigation.
CONCLUSION: Horizontally transferred plasmids provided evidence for a connection between clinical isolates and the farms implicated as the source of the outbreak. Our case study suggests that such analyses might add a new dimension to source tracking investigations, but highlights the need for detailed and accurate metadata, more extensive environmental sampling, and a better understanding of plasmid molecular evolution.


Subjects
Salmonella enterica , Serogroup , Onions/genetics , Farms , Phylogeny , Plasmids/genetics , Disease Outbreaks
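A resampling analysis of this kind asks how often a random sample of background isolates shows at least as many matches as the observed set. The sketch below counts only the number of matching isolates (the study also considered which combination of plasmids matched). The flag encoding, sample size, and function name are hypothetical.

```python
import random
from typing import List


def resampling_p_value(pool_flags: List[bool], sample_size: int, observed: int,
                       n_draws: int = 10_000, seed: int = 1) -> float:
    """Empirical P(random sample shows >= `observed` matching isolates).

    pool_flags[i] is True if background isolate i carries a plasmid highly
    similar to one seen in the outbreak clade (a hypothetical encoding).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(pool_flags, sample_size)  # sampling without replacement
        if sum(draw) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)  # add-one correction: never reports p = 0
```

A small value means random background samples rarely reproduce the observed overlap, which is the kind of evidence the authors used to link the outbreak strain to the implicated farms.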
7.
Drug Metab Dispos ; 51(1): 142-153, 2023 01.
Article in English | MEDLINE | ID: mdl-36116790

ABSTRACT

The human gut is home to trillions of microorganisms that are responsible for the modification of many orally administered drugs, leading to a wide range of therapeutic outcomes. Prodrugs bearing an azo bond are designed to treat inflammatory bowel disease and colorectal cancer via microbial azo reduction, allowing for topical application of therapeutic moieties to the diseased tissue in the intestines. Despite the inextricable link between microbial azo reduction and the efficacy of azo prodrugs, the prevalence, abundance, and distribution of azoreductases have not been systematically examined across the gut microbiome. Here, we curated and clustered amino acid sequences of experimentally confirmed bacterial azoreductases and conducted a hidden Markov model-driven homolog search for these enzymes across 4644 genome sequences present in the representative Unified Human Gastrointestinal Genomes collection. We identified 1958 putative azo-reducing species, corroborating previous findings that azo reduction appears to be a ubiquitous function of the gut microbiome. However, through a systematic comparison of predicted and confirmed azo-reducing strains, we hypothesize the presence of uncharacterized azoreductases in 25 prominent strains of the human gut microbiome. Finally, we confirmed the azo reduction of Acid Orange 7 by multiple strains of Fusobacterium nucleatum, Bacteroides fragilis, and Clostridium clostridioforme. Together, these results suggest the presence and activity of many uncharacterized azoreductases in the human gut microbiome and motivate future studies aimed at characterizing azoreductase genes in prominent members of the human gut microbiome. SIGNIFICANCE STATEMENT: This work systematically examined the prevalence, abundance, and distribution of azoreductases across the healthy and inflammatory bowel disease human gut microbiome, revealing potentially uncharacterized azoreductase genes.
It also confirmed the reduction of Acid Orange 7 by strains of Fusobacterium nucleatum, Bacteroides fragilis, and Clostridium clostridioforme.


Subjects
Gastrointestinal Microbiome , Inflammatory Bowel Diseases , Prodrugs , Humans , Gastrointestinal Microbiome/genetics , Prodrugs/metabolism , NADH, NADPH Oxidoreductases/genetics , NADH, NADPH Oxidoreductases/chemistry , NADH, NADPH Oxidoreductases/metabolism , Bacteria/genetics , Bacteria/metabolism , Clostridium
8.
Bioinformatics ; 37(13): 1839-1845, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33471121

ABSTRACT

MOTIVATION: Metagenomics has revolutionized microbiome research by enabling researchers to characterize the composition of complex microbial communities. Taxonomic profiling is one of the critical steps in metagenomic analyses. Marker genes, which are single-copy and universally found across Bacteria and Archaea, can provide accurate estimates of taxon abundances in the sample. RESULTS: We present TIPP2, a marker gene-based abundance profiling method, which combines phylogenetic placement with statistical techniques to control classification precision and recall. TIPP2 includes an updated set of reference packages and several algorithmic improvements over the original TIPP method. We find that TIPP2 provides comparable or better estimates of abundance than other profiling methods (including Bracken, mOTUsv2 and MetaPhlAn2), and strictly dominates other methods when there are under-represented (novel) genomes present in the dataset. AVAILABILITY AND IMPLEMENTATION: The code for our method is freely available in open-source form at https://github.com/smirarab/sepp/blob/tipp2/README.TIPP.md. The code and procedure to create new reference packages for TIPP2 are available at https://github.com/shahnidhi/TIPP_reference_package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
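To make the marker-gene profiling idea concrete, the sketch below starts from per-read classifications that are assumed to be already computed. Reads whose placement support falls below a threshold stay unclassified, which trades recall for precision. The per-marker read proportions are then averaged into a profile. This is not TIPP2's algorithm, which relies on phylogenetic placement with its own statistical machinery. The tuple format, the `min_support` threshold, and averaging across markers are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def marker_profile(assignments: List[Tuple[str, str, float]],
                   min_support: float = 0.95) -> Dict[str, float]:
    """Average per-marker read proportions into a relative-abundance profile.

    assignments: (marker_gene, taxon, support) for each classified read.
    Reads whose support is below `min_support` are left unclassified.
    """
    per_marker: Dict[str, Counter] = defaultdict(Counter)
    for marker, taxon, support in assignments:
        if support >= min_support:
            per_marker[marker][taxon] += 1
    totals: Counter = Counter()
    for counts in per_marker.values():
        n = sum(counts.values())
        for taxon, c in counts.items():
            totals[taxon] += c / n  # this marker's estimate for the taxon
    return {taxon: v / len(per_marker) for taxon, v in totals.items()}
```

Because every marker is single-copy, each one gives an independent estimate of the same relative abundances. Combining them dampens noise from any one gene.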

9.
Bioinformatics ; 37(18): 2848-2857, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33792639

ABSTRACT

MOTIVATION: Microbial gene catalogs are data structures that organize genes found in microbial communities, providing a reference for standardized analysis of the microbes across samples and studies. Although gene catalogs are commonly used, they have not been critically evaluated for their effectiveness as a basis for metagenomic analyses. RESULTS: As a case study, we investigate one such catalog, the Integrated Gene Catalog (IGC); however, our observations apply broadly to most gene catalogs constructed to date. We focus on both the approach used to construct this catalog and on its effectiveness when used as a reference for microbiome studies. Our results highlight important limitations of the approach used to construct the IGC and call into question the broad usefulness of gene catalogs more generally. We also recommend best practices for the construction and use of gene catalogs in microbiome studies and highlight opportunities for future research. AVAILABILITY AND IMPLEMENTATION: All supporting scripts for our analyses can be found on GitHub: https://github.com/SethCommichaux/IGC.git. The supporting data can be downloaded from: https://obj.umiacs.umd.edu/igc-analysis/IGC_analysis_data.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metagenome , Microbiota , Microbiota/genetics , Metagenomics
10.
PLoS Comput Biol ; 17(5): e1008928, 2021 05.
Article in English | MEDLINE | ID: mdl-34014915

ABSTRACT

Many students are taught about genome assembly using the dichotomy between the complexity of finding Eulerian and Hamiltonian cycles (easy versus hard, respectively). This dichotomy is sometimes used to motivate the use of de Bruijn graphs in practice. In this paper, we explain that while de Bruijn graphs have indeed been very useful, the reason has nothing to do with the complexity of the Hamiltonian and Eulerian cycle problems. We give two arguments. The first is that a genome reconstruction is never unique and hence an algorithm for finding Eulerian or Hamiltonian cycles is not part of any assembly algorithm used in practice. The second is that even if an arbitrary genome reconstruction was desired, one could do so in linear time in both the Eulerian and Hamiltonian paradigms.


Subjects
Genome , Algorithms , Proof of Concept Study
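The "easy" Eulerian side of the dichotomy can be shown with a textbook sketch: build a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers, then find an Eulerian path with Hierholzer's algorithm, which runs in time linear in the number of edges. This is a standard construction rather than code from the paper. The helper names are illustrative, and error-free k-mers that cover the genome exactly once are assumed.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def kmers_of(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(kmers: List[str]) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: Dict[str, List[str]]) -> List[str]:
    """Hierholzer's algorithm: O(E), assuming an Eulerian path exists."""
    in_deg: Counter = Counter(v for vs in graph.values() for v in vs)
    # Start where out-degree exceeds in-degree by one; otherwise anywhere (a circuit).
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    adj = {u: list(vs) for u, vs in graph.items()}  # copy, since edges are consumed
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit the node
    return path[::-1]


def spell(path: List[str]) -> str:
    return path[0] + "".join(node[-1] for node in path[1:])
```

For a repeat-free sequence such as `ACGTTGCA` with k = 4, the path spells the input back. For a repetitive sequence, the algorithm returns just one of possibly several strings with the same k-mer multiset. This illustrates the authors' point that finding one arbitrary reconstruction is not what practical assemblers actually need.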
11.
Environ Sci Technol ; 56(21): 15019-15033, 2022 11 01.
Article in English | MEDLINE | ID: mdl-36194536

ABSTRACT

Reduced availability of agricultural water has spurred increased interest in using recycled irrigation water for U.S. food crop production. However, there are significant knowledge gaps concerning the microbiological quality of these water sources. To address these gaps, we used 16S rRNA gene and metagenomic sequencing to characterize taxonomic and functional variations (e.g., antimicrobial resistance) in bacterial communities across diverse recycled and surface water irrigation sources. We collected 1 L water samples (n = 410) between 2016 and 2018 from the Mid-Atlantic (12 sites) and Southwest (10 sites) U.S. Samples were filtered, and DNA was extracted. The V3-V4 regions of the 16S rRNA gene were then PCR amplified and sequenced. Metagenomic sequencing was also performed to characterize antibiotic, metal, and biocide resistance genes. Bacterial alpha and beta diversities were significantly different (p < 0.001) across water types and seasons. Pathogenic bacteria, such as Salmonella enterica, Staphylococcus aureus, and Aeromonas hydrophila were observed across sample types. The most common antibiotic resistance genes identified encoded resistance to macrolides/lincosamides/streptogramins, aminoglycosides, rifampin and elfamycins, and their read counts fluctuated across seasons. We also observed multi-metal and multi-biocide resistance across all water types. To our knowledge, this is the most comprehensive longitudinal study to date of U.S. recycled water and surface water used for irrigation. Our findings improve understanding of the potential differences in the risk of exposure to bacterial pathogens and antibiotic resistance genes originating from diverse irrigation water sources across seasons and U.S. regions.


Subjects
Anti-Bacterial Agents , Disinfectants , United States , RNA, Ribosomal, 16S/genetics , Anti-Bacterial Agents/pharmacology , Longitudinal Studies , Bacteria/genetics , Drug Resistance, Microbial/genetics , Water , Agricultural Irrigation , Wastewater , Genes, Bacterial
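Alpha diversity summarizes the diversity within one sample, and beta diversity compares samples. The abstract does not say which indices were used. Shannon's index and Bray-Curtis dissimilarity are two common choices, sketched here on hypothetical taxon count vectors.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))
```

A sample split evenly between two taxa has H' = ln 2, while a single-taxon sample has H' = 0. Bray-Curtis runs from 0 for identical profiles to 1 for samples that share no taxa.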
12.
BMC Genomics ; 22(1): 389, 2021 May 26.
Article in English | MEDLINE | ID: mdl-34039264

ABSTRACT

BACKGROUND: Whole genome sequencing of cultured pathogens is the state-of-the-art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high quality genome can be recovered. Highly accurate short read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. RESULTS: We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally-contaminated ice cream using long read (Oxford Nanopore) and short read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high fidelity gene assembly, even at 150X depth of coverage. Short read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h enrichment but with less fidelity for the core genes than the short read assemblies. CONCLUSION: The integration of long and short read sequencing of quasimetagenomes expedited the reconstruction of a high quality pathogen genome compared to either platform alone.
A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long read analyses with the standard short read WGS outbreak response.


Subjects
Listeria monocytogenes , Nanopores , Genomics , High-Throughput Nucleotide Sequencing , Listeria monocytogenes/genetics , Sequence Analysis, DNA , Whole Genome Sequencing
13.
J Synchrotron Radiat ; 28(Pt 3): 707-717, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33949980

ABSTRACT

In this paper, the design of the free-electron laser (FEL) in the SXL (Soft X-ray Laser) project at the MAX IV Laboratory is presented. The target performance parameters originate in a science case put forward by Swedish users, and the SXL FEL is foreseen to be driven by the existing MAX IV 3 GeV linac. The SXL project is planned to be realized in different stages, and this paper focuses on Phase 1, where the basic operation mode for the FEL will be SASE (self-amplified spontaneous emission), with an emphasis on short pulses. Simulation results for two linac bunches (high and low charge) with different pulse durations are presented, as well as the performance for the two-color/two-pulse mode and power enhancement through tapering. Besides standard SASE and optical klystron configurations, the FEL setup is also tailored to support advanced seeding schemes. Finally, possible upgrades that will be implemented in a second phase of the project are discussed.

14.
Brief Bioinform ; 20(4): 1140-1150, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968737

ABSTRACT

Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture, making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential impact of long-read technologies on metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.


Subjects
Metagenome , Metagenomics/methods , Microbiota/genetics , Algorithms , Computational Biology , Databases, Genetic/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Metagenomics/statistics & numerical data , Metagenomics/trends , Software
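Assembly validation usually begins with reference-free contiguity statistics. The abstract does not name specific metrics, but N50 is one of the most widely reported. The sketch below computes N50 from hypothetical contig lengths. N50 says nothing about correctness, and this gap is why validation tools that detect misassemblies matter.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

For contigs of 100, 200, 300, and 400 bp (1,000 bp total), the two longest contigs already cover 700 bp, so N50 is 300.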
15.
Opt Express ; 29(14): 22345-22365, 2021 Jul 05.
Article in English | MEDLINE | ID: mdl-34266001

ABSTRACT

Ptychography, a scanning coherent diffraction imaging method, can produce a high-resolution reconstruction of a sample and, at the same time, of the illuminating beam. The emergence of vacuum ultraviolet and X-ray free electron lasers (FELs) has brought sources with unprecedented characteristics that enable X-ray ptychography with highly intense and ultra-fast short-wavelength pulses. However, the shot-to-shot pulse fluctuations typical for FEL pulses and particularly the partial spatial coherence of self-amplified spontaneous emission (SASE) FELs lead to numerical complexities in the ptychographic algorithms and ultimately restrict the application of ptychography at FELs. We present a general adaptive forward model for ptychography based on automatic differentiation, which is able to perform reconstructions even under these conditions. We applied this model to the first ptychography experiment at FLASH, the Free electron LASer in Hamburg, and obtained a high-resolution reconstruction of the sample as well as the complex wavefronts of individual FLASH pulses together with their coherence properties. This is not possible with more common ptychography algorithms.

16.
Molecules ; 26(11)2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34205989

ABSTRACT

The additive manufacturing process is one of the technical domains that has seen sustained development in recent decades. Designers' attention has focused on the equipment and materials used for 3D printing. The paper compares the results of bending tests with those of simulations of the same type of stress applied to 3D-printed PLA and PLA-glass structures. The results are close, so the simulation process can be applied with confidence to streamline filament consumption, with direct consequences for the volume and weight of additively manufactured structures. The paper determines whether the theories and concepts of strength of materials can be applied to additively manufactured parts. The study shows that the geometry of the cross-section, through its shape (circular or elliptical) and type (solid or ring-shaped), influences the strength properties of 3D-printed structures. The use of simulation will allow a significant shortening of the design time of new structures. Moreover, the simulation process was applied with good results to 3D-printed structures in which two types of filament were used for a single piece (structure).
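The influence of cross-section shape and type can be seen in the classical strength-of-materials relations for bending, which are standard Euler-Bernoulli beam results and not formulas taken from the paper. Here M is the bending moment, c is the distance from the neutral axis to the outermost fiber, and I is the second moment of area about the neutral axis. For a solid circle of diameter d, c = d/2. For a ring with outer diameter D and inner diameter d_i, c = D/2. For an ellipse with semi-axis a in the plane of bending and semi-axis b perpendicular to it, c = a. These relations assume a homogeneous, isotropic, linear-elastic material, so for layered, partially infilled prints they are at best a first approximation.

```latex
\sigma_{\max} = \frac{M\,c}{I}, \qquad
I_{\text{solid circle}} = \frac{\pi d^{4}}{64}, \qquad
I_{\text{ring}} = \frac{\pi \left(D^{4} - d_i^{4}\right)}{64}, \qquad
I_{\text{ellipse}} = \frac{\pi a^{3} b}{4}
```

Because I grows with the fourth power of distance from the neutral axis, placing material in a ring far from the axis gives more bending stiffness per unit of filament than a solid section of the same cross-sectional area.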

17.
PLoS Comput Biol ; 15(6): e1006994, 2019 06.
Article in English | MEDLINE | ID: mdl-31166948

ABSTRACT

The computational reconstruction of genome sequences from shotgun sequencing data has been greatly simplified by the advent of sequencing technologies that generate long reads. In the case of relatively small genomes (e.g., bacterial or viral), complete genome sequences can frequently be reconstructed computationally without the need for further experiments. However, large and complex genomes, such as those of most animals and plants, continue to pose significant challenges. In such genomes, assembly software produces incomplete and fragmented reconstructions that require additional experimentally derived information and manual intervention in order to reconstruct individual chromosome arms. Recent technologies originally designed to capture chromatin structure have been shown to effectively complement sequencing data, leading to much more contiguous reconstructions of genomes than previously possible. Here, we survey these technologies and the algorithms used to assemble and analyze large eukaryotic genomes, placed within the historical context of genome scaffolding technologies that have been in existence since the dawn of the genomic era.


Subjects
Algorithms , Chromosome Mapping/methods , Genome, Human/genetics , Genomics/methods , Sequence Alignment/methods , Humans , Sequence Analysis, DNA
18.
PLoS Comput Biol ; 15(8): e1007273, 2019 08.
Article in English | MEDLINE | ID: mdl-31433799

ABSTRACT

Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Error rates, particularly for inversions and fusions across chromosomes, remain higher than those of alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.


Subjects
Chromosomes, Human/genetics , Genome, Human , Genomics/methods , Algorithms , Animals , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Gene Library , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/statistics & numerical data , Software
19.
Nat Rev Genet ; 14(3): 157-67, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23358380

ABSTRACT

Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.


Subjects
Computational Biology/methods , Sequence Analysis, DNA/methods , Genome, Human , Humans , Metagenomics , Software
20.
Environ Res ; 170: 122-127, 2019 03.
Article in English | MEDLINE | ID: mdl-30579985

ABSTRACT

The quality of irrigation water used to cultivate produce that is consumed raw is an important issue with regard to food safety. In this study, the microbiological quality of potential irrigation water sources in Arizona was evaluated by testing for the presence of indicator and pathogenic bacteria. Reclaimed water samples were collected from two wastewater treatment plants and return flow samples were collected from two drainage canals and one return flow pond. Standard membrane filtration methods were used for detection of indicator bacteria. Water samples (n = 28) were filtered through cellulose ester membrane filters and bacterial populations were enumerated by placing the filters on selective agar. For detection of pathogens (Salmonella enterica, Listeria monocytogenes and Shiga toxin-producing E. coli (STEC)), water samples were filtered through Modified Moore swabs and enriched in Universal Pre-enrichment Broth, followed by selective enrichment broth for each pathogen. The enriched broth was streaked onto agar media selective for each pathogen. Presumptive colonies were confirmed by PCR/real-time PCR. Among the 14 reclaimed water samples from two sites, the ranges of recovered populations of E. coli, total coliforms, and enterococci were 0-1.3, 0.5-8.3 × 10³, and 0-5.5 CFU/100 mL, respectively. No L. monocytogenes, Salmonella or STEC were found. In the 13 return flow water samples from 3 sites, the ranges of recovered populations of E. coli, total coliforms and enterococci were 1.9-5.3 × 10², 6.5 × 10²-9.1 × 10⁴, and 2.9-3.7 × 10³ CFU/100 mL, respectively. All samples were negative for L. monocytogenes. One (7.1%) of the return flow samples was positive for E. coli O145. Nine (64.3%) of the samples were positive for Salmonella. Both real-time PCR and culture-based methods were used for the detection of Salmonella and L. monocytogenes, and the results from the two methods were comparable.
The findings of this study provide evidence that irrigation waters in Arizona, including reclaimed water and return flows, could be potential sources of bacterial contamination of produce. Additional work is needed to evaluate whether bacteria present in irrigation water sources transfer to the edible portion of irrigated plants and are capable of persisting through post-harvest activities.


Subjects
Environmental Monitoring , Escherichia coli , Water Microbiology , Water Pollution/analysis , Arizona , Feces , Incidence
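Membrane filtration counts such as those above are normalized to a standard volume. The usual conversion divides the colonies counted on a filter by the volume filtered, corrects for any dilution, and scales to 100 mL. The abstract does not report the filtered volumes, so the function name and the example inputs below are hypothetical.

```python
def cfu_per_100ml(colonies: int, volume_filtered_ml: float, dilution_factor: float = 1.0) -> float:
    """Membrane filtration result expressed as CFU per 100 mL of the original sample."""
    return colonies * dilution_factor * 100 / volume_filtered_ml
```

For example, 45 colonies from 10 mL of undiluted water correspond to 450 CFU/100 mL (4.5 × 10²), within the range reported for E. coli in return flow water.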