1.
Genomics Inform ; 18(1): e8, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32224841

ABSTRACT

The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

2.
Bioinformatics ; 34(7): 1232-1234, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29126106

ABSTRACT

Summary: Ion Torrent sequencing is one of the most frequently used platforms in healthcare research and industry. Despite many advantages, platform-specific artifacts complicate efficient separation of true variants from errors, especially in variants with lower allele frequencies (<15%). Here, we developed a multi-step filtering toolbox, AIRVF, that works on flowgrams, raw and mapped reads, and called variants to reduce artifact-driven false variant calls. Tests on sequencing data of standard reference material showed up to ∼98% reduction of false variants when combined with conventional public pipelines and ∼48% when combined with the in-house commercial solution, with a minimal loss of sensitivity. Availability and implementation: The program with a detailed manual is available at https://sourceforge.net/projects/airvf/. Contact: swkim@yuhs.ac. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Diagnostic Errors; High-Throughput Nucleotide Sequencing/methods; Software; Gene Frequency; Humans; Sensitivity and Specificity
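
The abstract singles out variants with allele frequencies below 15% as the hardest to separate from artifacts. As a minimal sketch of how such variants could be flagged for closer inspection, the following computes allele frequency from read counts. This is not AIRVF's method, which the abstract does not describe in detail; the function names and the read-count inputs are assumptions made for illustration.

```python
def allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def is_low_frequency(ref_reads: int, alt_reads: int, threshold: float = 0.15) -> bool:
    """True when the variant falls below the threshold (15% by default,
    the cutoff mentioned in the abstract)."""
    return allele_frequency(ref_reads, alt_reads) < threshold


if __name__ == "__main__":
    # Hypothetical read counts: 90 reference reads, 10 alternate reads.
    print(allele_frequency(90, 10), is_low_frequency(90, 10))
```

A flag like this only marks which calls deserve extra scrutiny; it does not by itself decide whether a call is an artifact.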
3.
Mol Biosyst ; 12(3): 914-22, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26790373

ABSTRACT

Next-generation sequencing (NGS) is a popular method for assessing the molecular diversity of microbial communities without cultivation, for identifying polymorphisms in populations, and for comparing genomes and transcriptomes. However, sequence-specific errors (SSEs) by NGS systems can result in genome mis-assembly, overestimation of diversity in microbial community analyses, and false polymorphism discovery. SSEs can be particularly problematic due to rich microbial biodiversity and genomes containing frequent repeats. In this study, SSEs in public data from all popular NGS systems were discovered using a Markov chain model and hotspots for sequence errors were identified. Deletion errors were frequently preceded by homopolymers in non-Illumina NGS systems, such as GS FLX+. Substitution errors were often related to high GC contents and long G/C homopolymers in Illumina sequencing systems such as HiSeq. After removal of long G/C homopolymers in HiSeq, the average lengths of contigs and average SNP quality increased. SSEs were selectively removed from our mock community data by quality filtering, and a bias against specific microbes was identified. Our findings provide a scientific basis for filtering poor-quality reads, correcting deletion errors, preventing genome mis-assembly, and accurately assessing microbial community compositions and polymorphisms.


Subjects
High-Throughput Nucleotide Sequencing/methods; Base Composition/genetics; Base Sequence; Genome, Bacterial; Inverted Repeat Sequences/genetics; Polymorphism, Single Nucleotide/genetics
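
The error hotspots reported in this abstract involve homopolymers and high GC content. The sketch below shows one simple way to locate homopolymer runs and compute GC content in a read. It is an illustration only, not the Markov chain model used in the study; the function names and the default minimum run length of 4 are assumptions.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def homopolymer_runs(seq: str, min_len: int = 4, bases: str = "ACGT"):
    """Return (start, base, length) for every run of one repeated base
    that is at least min_len long. Restrict `bases` to "GC" to report
    only G/C homopolymers."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base from `bases`, then the same base repeated min_len - 1 or more times.
    pattern = re.compile(r"([%s])\1{%d,}" % (re.escape(bases), min_len - 1))
    return [
        (m.start(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    read = "ATGGGGGCAAAAT"  # hypothetical read
    print(gc_content(read))
    print(homopolymer_runs(read))
    print(homopolymer_runs(read, bases="GC"))
```

Reads containing long G/C runs could then be filtered or inspected before assembly or SNP calling; the appropriate length cutoff depends on the platform and is not given in the abstract.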
4.
J Microbiol ; 52(7): 566-73, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24879347

ABSTRACT

Chimeras are a frequent artifact in polymerase chain reaction and could be the underlying causes of erroneous taxonomic identifications, overestimated microbial diversity, and spurious sequences. However, little is known about the regional effects on chimera formation. Therefore, we investigated the chimera formation rates in different regions of phylogenetically important biomarker genes to test the regional effects on chimera formation. An empirical study of chimera formation rates was performed using the Roche GSFLX™ system with sequences of the V1/V2/V3 and V4/V5 regions of the 16S rRNA gene and sequences of the nifH gene from a mock microbial community. The chimera formation rates for the 16S V1/V2/V3 region, V4/V5 region, and nifH gene were 22.1-38.5%, 3.68-3.88%, and 0.31-0.98%, respectively. Some amplicons from the V1/V2/V3 regions were shorter than the typical length (∼7-31%), reflecting incomplete extension. In the V1/V2/V3 and V4/V5 regions, conserved and hypervariable regions were identified. Chimeric hot spots were located in parts of conserved regions near the ends of the amplicons. The 16S V1/V2/V3 region had the highest chimera formation rate, likely because of long template lengths and incomplete extension. The amplicons of the nifH gene had the lowest frequency of chimera formation most likely because of variations in their wobble positions in triplet codons. Our results suggest that the main reasons for chimera formation are sequence similarity and premature termination of DNA extension near primer regions. Other housekeeping genes can be a good substitute for 16S rRNA genes in molecular microbial studies to reduce the effects of chimera formation.


Subjects
Biodiversity; Environmental Microbiology; Metagenomics/methods; Recombination, Genetic; Nucleic Acid Amplification Techniques/methods; Oxidoreductases/genetics; RNA, Ribosomal, 16S/genetics; Sequence Analysis, DNA/methods
5.
Nucleic Acids Res ; 42(7): e51, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24464999

ABSTRACT

Pyrosequencing of the 16S ribosomal RNA gene (16S) has become one of the most popular methods to assess microbial diversity. Pyrosequencing reads containing ambiguous bases (Ns) are generally discarded based on the assumptions of their non-sequence-dependent formation and high error rates. However, taxonomic composition differed by removal of reads with Ns. We determined whether Ns from pyrosequencing occur in a sequence-dependent manner. Our reads and the corresponding flow value data revealed occurrence of sequence-specific N errors with a common sequential pattern (a homopolymer + a few nucleotides with bases other than the homopolymer + N) and revealed that the nucleotide base of the homopolymer is the true base for the following N. Using an algorithm reflecting this sequence-dependent pattern, we corrected the Ns in the 16S (86.54%), bphD (81.37%) and nifH (81.55%) amplicon reads from a mock community with high precisions of 95.4, 96.9 and 100%, respectively. The new N correction method was applicable for determining most of Ns in amplicon reads from a soil sample, resulting in reducing taxonomic biases associated with N errors and in shotgun sequencing reads from public metagenome data. The method improves the accuracy and precision of microbial community analysis and genome sequencing using 454 pyrosequencing.


Subjects
Metagenome; Sequence Analysis, DNA/methods; Algorithms; DNA, Bacterial/chemistry; Environmental Microbiology; Nucleotide Motifs
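
The sequential pattern described in this abstract (a homopolymer, then a few bases other than the homopolymer base, then N, where the homopolymer base is the true base for the N) can be sketched as a regular-expression substitution. This is only an illustration of the pattern, not the published algorithm, which also draws on flow value data. The function name and the `min_homopolymer` and `max_gap` parameters are assumptions; the abstract does not state how long the homopolymer must be or how many bases "a few" means.

```python
import re


def correct_ns(read: str, min_homopolymer: int = 3, max_gap: int = 3):
    """Replace each N that follows the pattern
    homopolymer + 1..max_gap other bases + N
    with the homopolymer's base. Returns (corrected_read, n_corrected)."""
    if min_homopolymer < 1 or max_gap < 1:
        raise ValueError("min_homopolymer and max_gap must be at least 1")
    read = read.upper()
    pattern = re.compile(
        r"([ACGT])\1{%d,}"            # homopolymer of at least min_homopolymer bases
        r"(?:(?!\1)[ACGT]){1,%d}"     # a few bases, none equal to the homopolymer base
        r"N"                          # the ambiguous base to correct
        % (min_homopolymer - 1, max_gap)
    )
    corrected = list(read)
    count = 0
    for m in pattern.finditer(read):
        corrected[m.end() - 1] = m.group(1)  # the N is the last matched character
        count += 1
    return "".join(corrected), count


if __name__ == "__main__":
    # Hypothetical read: TTTT homopolymer, then "AG", then N.
    print(correct_ns("ACGTTTTAGNCA"))
```

Ns that do not fit the pattern are left unchanged, so downstream steps can still decide whether to discard those reads.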