Results 1 - 6 of 6
1.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37824740

ABSTRACT

Metagenomics is a powerful tool for understanding organismal interactions; however, classification, profiling and detection of interactions at the strain level remain challenging. We present an automated pipeline, quantitative metagenomic alignment and taxonomic exact matching (Qmatey), that performs a fast exact matching-based alignment and integration of taxonomic binning and profiling. It interrogates large databases without using metagenome-assembled genomes, curated pan-genes or k-mer spectra that limit resolution. Qmatey minimizes misclassification and maintains strain-level resolution by using only diagnostic reads, as shown in the analysis of amplicon, quantitative reduced representation and shotgun sequencing datasets. Using Qmatey to analyze shotgun data from a synthetic community with 35% of the 26 strains at low abundance (0.01-0.06%), we obtained 85-96% strain recall and 92-100% species recall while maintaining 100% precision. Benchmarking revealed that the highly ranked Kraken2 and KrakenUniq tools identified 2-4 more taxa (92-100% recall) than Qmatey but produced 315-1752 false positive taxa, at a high cost in precision (1-8%). The speed, accuracy and precision of the Qmatey pipeline position it as a valuable tool for broad-spectrum profiling and for uncovering biologically relevant interactions.
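
The recall and precision figures quoted in this abstract follow from the standard definitions, and a short worked example may help when reading them. The sketch below is illustrative only: the true-positive counts are assumptions chosen to show the arithmetic (the abstract reports ranges, not per-tool counts), while the community size of 26 and the false-positive range of 315-1752 come from the abstract.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported taxa that are truly present."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of truly present taxa that were reported."""
    return tp / (tp + fn)


TOTAL_TAXA = 26  # size of the synthetic community, from the abstract

# Assumption for illustration: all 26 expected taxa are recovered,
# combined with the two ends of the reported false-positive range.
for fp in (315, 1752):
    print(f"FP={fp}: precision={precision(TOTAL_TAXA, fp):.1%}")

# Assumption for illustration: 24 of the 26 taxa are recovered.
print(f"24/26 recovered: recall={recall(24, 2):.0%}")
```

Under these assumptions a few hundred to a few thousand false positives are enough to push precision into the single-digit range reported for the k-mer based tools, even when recall is close to complete.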


Subjects
Metagenome, Metagenomics, DNA Sequence Analysis, Factual Databases
2.
Pharm Biol ; 56(1): 368-377, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30058427

ABSTRACT

CONTEXT: Eurycoma longifolia Jack (Simaroubaceae), commonly known as Tongkat Ali, is one of the most important plants in Malaysia. The plant extracts (particularly from the roots) are widely used for the treatment of cough and fever, besides having antimalarial, antidiabetic, anticancer and aphrodisiac activities. OBJECTIVES: This study assesses the extent of adulteration of E. longifolia herbal medicinal products (HMPs) using DNA barcoding validated by HPLC analysis. MATERIALS AND METHODS: Chloroplastic rbcL and nuclear ITS2 barcode regions were used in the present study. The sequences generated from E. longifolia HMPs were compared to sequences in GenBank using MEGABLAST to verify their taxonomic identity. These results were verified by neighbor-joining tree analysis, in which branches of unknown specimens are compared with the reference sequences established in this study and others retrieved from GenBank. The HMPs were also analysed using HPLC for the presence of the bioactive marker eurycomanone. RESULTS: Identification using DNA barcoding revealed that 37% of the tested HMPs were authentic while 27% were adulterated; the ITS2 barcode region proved to be the ideal marker. Validation of authenticity using HPLC analysis showed a case in which a species identified as authentic was found not to contain the expected chemical compound. DISCUSSION AND CONCLUSIONS: DNA barcoding should be used as the first screening step for testing of HMP raw materials. However, integration of DNA barcoding with HPLC analysis will help to provide detailed knowledge about the safety and efficacy of the HMPs.


Subjects
Taxonomic DNA Barcoding/methods, Drug Contamination, Eurycoma/genetics, Plant Extracts/genetics, Plant Extracts/isolation & purification, Medicinal Plants/genetics, High-Pressure Liquid Chromatography/methods, Plant Roots
3.
Front Genet ; 13: 876686, 2022.
Article in English | MEDLINE | ID: mdl-35495121

ABSTRACT

With the technological advances of recent decades, sequencing a person's whole genome has become feasible and affordable. As a result, individual genomic sequences are produced and collected on a large scale for genetic medical diagnoses and cancer drug discovery, which simultaneously poses serious challenges to the protection of personal genomic privacy. Methods that keep personal genomic data both usable and confidential are urgently needed. Existing genomic privacy-protection methods are either slow to encrypt or recover data with low accuracy. To tackle these problems, this paper proposes a sequence similarity-based obfuscation method, namely IterMegaBLAST, for fast and reliable protection of personal genomic privacy. Specifically, given a randomly selected sequence from a dataset of genomic sequences, we first use MegaBLAST to find its most similar sequence in the dataset. These two aligned sequences form a cluster, for which an obfuscated sequence is generated via a DNA generalization lattice scheme. These procedures are performed iteratively until all of the sequences in the dataset are clustered and their obfuscated sequences are generated. Experimental results on benchmark datasets demonstrate that, under the same degree of anonymity, IterMegaBLAST significantly outperforms existing state-of-the-art approaches in terms of both utility accuracy and time complexity.
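
The iterative pair-and-generalize loop described in this abstract can be sketched in a few lines. This is not the authors' implementation: it assumes equal-length, already aligned, uppercase A/C/G/T sequences, swaps MegaBLAST for a plain fraction-of-matching-positions score, and uses two-base IUPAC ambiguity codes as a stand-in for the generalization lattice. The function names are hypothetical.

```python
import random

# IUPAC ambiguity codes for each unordered pair of distinct bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def similarity(a: str, b: str) -> float:
    """Fraction of matching positions (stand-in for a MegaBLAST score)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def generalize(a: str, b: str) -> str:
    """Merge two aligned sequences, replacing mismatches by ambiguity codes."""
    return "".join(
        x if x == y else IUPAC[frozenset((x, y))] for x, y in zip(a, b)
    )


def obfuscate(seqs: list[str], seed: int = 0) -> list[str]:
    """Pair each randomly chosen sequence with its most similar partner."""
    rng = random.Random(seed)
    remaining = dict(enumerate(seqs))
    result: dict[int, str] = {}
    while len(remaining) >= 2:
        i = rng.choice(sorted(remaining))           # random seed sequence
        a = remaining.pop(i)
        j = max(remaining, key=lambda k: similarity(a, remaining[k]))
        b = remaining.pop(j)
        result[i] = result[j] = generalize(a, b)    # both share one output
    # Simplification: an odd leftover is returned unchanged here; a real
    # scheme would have to fold it into an existing cluster.
    result.update(remaining)
    return [result[i] for i in range(len(seqs))]


print(obfuscate(["ACGT", "ACGA", "TTTT", "TTTC"]))
```

Each pair of clustered sequences ends up with the same generalized string, so neither original can be singled out from the output alone; the cost is the information lost at every ambiguous position.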

4.
FEMS Microbes ; 3: xtac002, 2022.
Article in English | MEDLINE | ID: mdl-37332502

ABSTRACT

Current methods to characterize microbial communities generally employ sequencing of the 16S rRNA gene (<500 bp) with high accuracy (∼99%) but limited phylogenetic resolution. However, long-read sequencing now allows for the profiling of near-full-length ribosomal operons (16S-ITS-23S rRNA genes) on platforms such as the Oxford Nanopore MinION. Here, we describe an rRNA operon database with >300,000 entries, representing >10,000 prokaryotic species and ∼150,000 strains. Additionally, BLAST parameters were identified for strain-level resolution using in silico mutated, mock rRNA operon sequences (70-95% identity) from four bacterial phyla and two members of the Euryarchaeota, mimicking MinION reads. MegaBLAST settings were determined that required <3 s per read on a Mac Mini with strain-level resolution for sequences with >84% identity. These settings were tested on rRNA operon libraries from the human respiratory tract, farm/forest soils and marine sponges (n = 1,322,818 reads for all sample sets). Most rRNA operon reads in this data set yielded best BLAST hits (95 ± 8%). However, only 38-82% of library reads were compatible with strain-level resolution, reflecting the dominance of human/biomedical-associated prokaryotic entries in the database. Since the MinION and the Mac Mini are both portable, this study demonstrates the possibility of rapid strain-level microbiome analysis in the field.
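
The idea of in silico mutated mock sequences at a chosen identity can be illustrated with a minimal sketch. The abstract does not describe the study's mutation model, so this example assumes substitutions only (no insertions or deletions, which real nanopore reads also contain) and uses a toy reference sequence; `mutate` and `percent_identity` are hypothetical helper names.

```python
import random

BASES = "ACGT"


def mutate(seq: str, identity: float, seed: int = 0) -> str:
    """Return a copy of seq with substitutions so that the given
    fraction of positions still matches the original."""
    rng = random.Random(seed)
    n_sub = round((1 - identity) * len(seq))
    out = list(seq)
    # Sample distinct positions so every substitution lowers identity.
    for pos in rng.sample(range(len(seq)), n_sub):
        out[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(out)


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


reference = "ACGT" * 25  # 100-nt toy sequence, not a real rRNA operon

# Identity levels spanning the 70-95% range mentioned in the abstract.
for target in (0.70, 0.85, 0.95):
    mock = mutate(reference, target)
    print(f"target {target:.0%}: observed {percent_identity(reference, mock):.0f}%")
```

Mock reads built this way can be searched against a database to find the lowest identity at which the correct entry is still the best hit.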

5.
Curr Protoc Bioinformatics ; 58: 3.3.1-3.3.25, 2017 06 27.
Article in English | MEDLINE | ID: mdl-28654728

ABSTRACT

The Basic Local Alignment Search Tool (BLAST) is the first tool in the annotation of nucleotide or amino acid sequences. BLAST is a flagship of bioinformatics due to its performance and user-friendliness. Beginners and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNAs, expressed sequence tags, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but offer hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez and PubMed, as well as structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge. © 2017 by John Wiley & Sons, Inc.
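
As one way of integrating parsed BLAST results into a pipeline, the sketch below reads BLAST tabular output (the 12 default columns of `-outfmt 6`) and keeps the highest-scoring hit per query. It is a sketch under stated assumptions, not the protocol's own code: the sample rows are invented, the identifiers are hypothetical, and the E-value cutoff of 1e-5 is an arbitrary illustrative choice.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict:
    """Return the highest-bitscore hit per query below the E-value cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        if hit["evalue"] > max_evalue:
            continue  # weak hit: not kept as evidence
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


# Invented example rows; read and subject names are hypothetical.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ("read1", "subjA", "99.0", "200", "2", "0", "1", "200", "1", "200", "2e-100", "360"),
    ("read1", "subjB", "91.0", "200", "18", "0", "1", "200", "5", "204", "1e-60", "250"),
    ("read2", "subjC", "80.0", "30", "6", "0", "1", "30", "10", "39", "0.5", "28"),
])

for query, hit in best_hits(SAMPLE).items():
    print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

Queries whose only hits exceed the cutoff, such as `read2` here, drop out of the result, which mirrors the abstract's point that weak hits are hints rather than evidence.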


Subjects
Computational Biology/methods, Sequence Analysis/methods, Software, Amino Acid Sequence, Internet
6.
Methods Mol Biol ; 1374: 1-22, 2016.
Article in English | MEDLINE | ID: mdl-26519398

ABSTRACT

GenBank® is a comprehensive database of publicly available DNA sequences for 300,000 named organisms, more than 110,000 within the Embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Nucleotide Archive (ENA) in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure and domain information, as well as the biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. GenBank usage scenarios ranging from local analyses of the data available via FTP to online analyses supported by the NCBI web-based tools are discussed. To access GenBank and its related retrieval and analysis services, go to the NCBI home page at www.ncbi.nlm.nih.gov.


Subjects
Computational Biology/methods, Nucleic Acid Databases, Genomics/methods, Animals, Humans, Web Browser