Search | BVS Integralidade em Saúde

RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes.

Haft, Daniel H; Badretdin, Azat; Coulouris, George; DiCuccio, Michael; Durkin, A Scott; Jovenitti, Eric; Li, Wenjun; Mersha, Megdelawit; O'Neill, Kathleen R; Virothaisakun, Joel; Thibaud-Nissen, Françoise.

Nucleic Acids Res ; 52(D1): D762-D769, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37962425

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past 3 years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.

Subjects

Archaea; Bacteria; Databases, Nucleic Acid; Metagenome; Archaea/genetics; Bacteria/genetics; Databases, Nucleic Acid/standards; Databases, Nucleic Acid/trends; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Internet; Molecular Sequence Annotation; Proteins/genetics

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Li, Wenjun; O'Neill, Kathleen R; Haft, Daniel H; DiCuccio, Michael; Chetvernin, Vyacheslav; Badretdin, Azat; Coulouris, George; Chitsaz, Farideh; Derbyshire, Myra K; Durkin, A Scott; Gonzales, Noreen R; Gwadz, Marc; Lanczycki, Christopher J; Song, James S; Thanki, Narmada; Wang, Jiyao; Yamashita, Roxanne A; Yang, Mingzhang; Zheng, Chanjuan; Marchler-Bauer, Aron; Thibaud-Nissen, Françoise.

Nucleic Acids Res ; 49(D1): D1020-D1028, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33270901

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.

Subjects

Computational Biology/methods; Databases, Genetic; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Molecular Sequence Annotation/methods; Proteins/genetics; Data Curation/methods; Data Mining/methods; Genomics/methods; Internet; Proteins/classification; User-Computer Interface

BLAST: a more efficient report with usability improvements.

Boratyn, Grzegorz M; Camacho, Christiam; Cooper, Peter S; Coulouris, George; Fong, Amelia; Ma, Ning; Madden, Thomas L; Matten, Wayne T; McGinnis, Scott D; Merezhuk, Yuri; Raytselis, Yan; Sayers, Eric W; Tao, Tao; Ye, Jian; Zaretskaya, Irena.

Nucleic Acids Res ; 41(Web Server issue): W29-33, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23609542

ABSTRACT

The Basic Local Alignment Search Tool (BLAST) website at the National Center for Biotechnology Information (NCBI) is an important resource for searching and aligning sequences. A new BLAST report allows faster loading of alignments, adds navigation aids, allows easy downloading of subject sequences and reports and has improved usability. Here, we describe these improvements to the BLAST report, discuss design decisions, describe other improvements to the search page and database documentation and outline plans for future development. The NCBI BLAST URL is http://blast.ncbi.nlm.nih.gov.

Subjects

Sequence Alignment/methods; Software; Animals; Genomics; Internet; L-Gulonolactone Oxidase/genetics; Rats

Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction.

Ye, Jian; Coulouris, George; Zaretskaya, Irena; Cutcutache, Ioana; Rozen, Steve; Madden, Thomas L.

BMC Bioinformatics ; 13: 134, 2012 Jun 18.

Article in English | MEDLINE | ID: mdl-22708584

ABSTRACT

BACKGROUND: Choosing appropriate primers is probably the single most important factor affecting the polymerase chain reaction (PCR). Specific amplification of the intended target requires that primers do not have matches to other targets in certain orientations and within certain distances that allow undesired amplification. The process of designing specific primers typically involves two stages. First, the primers flanking regions of interest are generated either manually or using software tools; then they are searched against an appropriate nucleotide sequence database using tools such as BLAST to examine the potential targets. However, the latter is not an easy process as one needs to examine many details between primers and targets, such as the number and the positions of matched bases, the primer orientations and distance between forward and reverse primers. The complexity of such analysis usually makes this a time-consuming and very difficult task for users, especially when the primers have a large number of hits. Furthermore, although the BLAST program has been widely used for primer target detection, it is in fact not an ideal tool for this purpose as BLAST is a local alignment algorithm and does not necessarily return complete match information over the entire primer range. RESULTS: We present a new software tool called Primer-BLAST to alleviate the difficulty in designing target-specific primers. This tool combines BLAST with a global alignment algorithm to ensure a full primer-target alignment and is sensitive enough to detect targets that have a significant number of mismatches to primers. Primer-BLAST allows users to design new target-specific primers in one step as well as to check the specificity of pre-existing primers. Primer-BLAST also supports placing primers based on exon/intron locations and excluding single nucleotide polymorphism (SNP) sites in primers. 
CONCLUSIONS: We describe a robust and fully implemented general purpose primer design tool that designs target-specific PCR primers. Primer-BLAST offers flexible options to adjust the specificity threshold and other primer properties. This tool is publicly available at http://www.ncbi.nlm.nih.gov/tools/primer-blast.
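The abstract's point that a global alignment covers the whole primer, whereas a local alignment may report only part of it, can be illustrated with a minimal sketch. This is a generic Needleman-Wunsch scoring routine, not Primer-BLAST's actual implementation; the function name and the match, mismatch and gap scores are assumptions chosen for illustration.

```python
def global_align_score(primer, target, match=1, mismatch=-1, gap=-2):
    """Return the Needleman-Wunsch global alignment score of two sequences.

    Every base of both sequences must be aligned to a base or a gap, so
    mismatches at the primer ends lower the score instead of being trimmed
    away as they could be in a local alignment.
    The scoring values are hypothetical, not Primer-BLAST defaults.
    """
    # Row 0: aligning an empty primer prefix against target prefixes.
    prev = [j * gap for j in range(len(target) + 1)]
    for i in range(1, len(primer) + 1):
        cur = [i * gap]  # column 0: primer prefix against empty target
        for j in range(1, len(target) + 1):
            pair = match if primer[i - 1] == target[j - 1] else mismatch
            cur.append(max(prev[j - 1] + pair,   # align the two bases
                           prev[j] + gap,        # gap in the target
                           cur[j - 1] + gap))    # gap in the primer
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(global_align_score("ACGT", "ACGT"))  # perfect match
    print(global_align_score("ACGT", "ACCT"))  # one internal mismatch
```

Comparing the score of a candidate site with the score of a perfect match gives a simple measure of how many mismatches or gaps separate the primer from that site over its full length.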

Subjects

Algorithms; DNA Primers/genetics; Polymerase Chain Reaction/methods; Software; Carrier Proteins/genetics; Humans; Introns; Molecular Sequence Data; Polymorphism, Single Nucleotide

BLAST+: architecture and applications.

Camacho, Christiam; Coulouris, George; Avagyan, Vahram; Ma, Ning; Papadopoulos, Jason; Bealer, Kevin; Madden, Thomas L.

BMC Bioinformatics ; 10: 421, 2009 Dec 15.

Article in English | MEDLINE | ID: mdl-20003500

ABSTRACT

BACKGROUND: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. RESULTS: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. CONCLUSION: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
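The idea of breaking a long query into chunks can be sketched as follows. This is only an illustration under stated assumptions: the function name, the chunk size and the overlap are hypothetical, and the abstract does not say how BLAST+ chooses chunk boundaries or merges results across them.

```python
def split_query(seq, chunk_size, overlap):
    """Split a sequence into overlapping chunks.

    Returns a list of (offset, chunk) pairs, where offset is the 0-based
    position of the chunk in the original sequence. Consecutive chunks
    share `overlap` bases so that a match spanning a boundary is still
    fully contained in at least one chunk (if it is short enough).
    chunk_size and overlap are illustrative parameters, not BLAST+ values.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, len(seq))
        chunks.append((start, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for offset, chunk in split_query("ACGTACGTAC", chunk_size=4, overlap=1):
        print(offset, chunk)
```

Keeping the offset with each chunk lets hit coordinates found in a chunk be translated back to positions in the full query.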

Subjects

Computational Biology/methods; Software; Databases, Genetic; Sequence Alignment

Database indexing for production MegaBLAST searches.

Morgulis, Aleksandr; Coulouris, George; Raytselis, Yan; Madden, Thomas L; Agarwala, Richa; Schäffer, Alejandro A.

Bioinformatics ; 24(16): 1757-64, 2008 Aug 15.

Article in English | MEDLINE | ID: mdl-18567917

ABSTRACT

MOTIVATION: The BLAST software package for sequence comparison speeds up homology search by preprocessing a query sequence into a lookup table. Numerous research studies have suggested that preprocessing the database instead would give better performance. However, production usage of sequence comparison methods that preprocess the database has been limited to programs such as BLAT and SSAHA that are designed to find matches when query and database subsequences are highly similar. RESULTS: We developed a new version of the MegaBLAST module of BLAST that does the initial phase of finding short seeds for matches by searching a database index. We also developed a program makembindex that preprocesses the database into a data structure for rapid seed searching. We show that the new 'indexed MegaBLAST' is faster than the 'non-indexed' version for most practical uses. We show that indexed MegaBLAST is faster than miBLAST, another implementation of BLAST nucleotide searching with a preprocessed database, for most of the 200 queries we tested. To deploy indexed MegaBLAST as part of NCBI's Web BLAST service, the storage of databases and the queueing mechanism were modified, so that some machines are now dedicated to serving queries for a specific database. The response time for such Web queries is now faster than it was when each computer handled queries for multiple databases. AVAILABILITY: The code for indexed MegaBLAST is part of the blastn program in the NCBI C++ toolkit. The preprocessor program makembindex is also in the toolkit. Indexed MegaBLAST has been used in production on NCBI's Web BLAST service to search one version of the human and mouse genomes since October 2007. The Linux command-line executables for blastn and makembindex, documentation, and some query sets used to carry out the tests described below are available in the directory: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast [corrected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
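The contrast between preprocessing the query and preprocessing the database can be made concrete with a small sketch of a database k-mer index used for seed finding. It is a toy illustration only: the function names, the dictionary-based index and the seed length are assumptions, and the actual data structure built by makembindex is not described in the abstract.

```python
from collections import defaultdict


def build_index(db, k):
    """Preprocess database sequences into a k-mer lookup table.

    db maps sequence names to sequences. The index maps each k-mer to the
    list of (name, position) pairs where it occurs in the database.
    """
    index = defaultdict(list)
    for name, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def find_seeds(index, query, k):
    """Return exact k-mer seeds as (query_pos, name, db_pos) triples.

    Each query k-mer is looked up in the prebuilt index, so the database
    itself is not scanned at search time.
    """
    seeds = []
    for q in range(len(query) - k + 1):
        for name, pos in index.get(query[q:q + k], ()):
            seeds.append((q, name, pos))
    return seeds


if __name__ == "__main__":
    db = {"s1": "ACGTAC", "s2": "TTACGA"}  # hypothetical toy database
    index = build_index(db, k=3)
    for seed in find_seeds(index, "TACG", k=3):
        print(seed)
```

The index is built once and reused for every query against the same database, which is where the time saving comes from; in a full search each seed would then be extended into an alignment.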

Subjects

Database Management Systems; Databases, Protein; Information Storage and Retrieval/methods; Proteins/chemistry; Sequence Analysis, Protein/methods; Software; User-Computer Interface; Amino Acid Sequence; Molecular Sequence Data; Sequence Alignment/methods

Using Augmented Reality to Elicit Pretend Play for Children with Autism.

Blackwell, Alan F; Coulouris, George.

IEEE Trans Vis Comput Graph ; 21(5): 598-610, 2015 May.

Article in English | MEDLINE | ID: mdl-26357207

ABSTRACT

Children with autism spectrum condition (ASC) suffer from deficits or developmental delays in symbolic thinking. In particular, they are often found lacking in pretend play during early childhood. Researchers believe that they encounter difficulty in generating and maintaining mental representation of pretense coupled with the immediate reality. We have developed an interactive system that explores the potential of Augmented Reality (AR) technology to visually conceptualize the representation of pretense within an open-ended play environment. Results from an empirical study involving children with ASC aged 4 to 7 demonstrated a significant improvement of pretend play in terms of frequency, duration and relevance using the AR system in comparison to a non-computer-assisted situation. We investigated individual differences, skill transfer, system usability and limitations of the proposed AR system. We discuss design guidelines for future AR systems for children with ASC and other pervasive developmental disorders.

Subjects

Autistic Disorder/therapy; Computer Graphics; Play Therapy/methods; Virtual Reality Exposure Therapy/methods; Asperger Syndrome/therapy; Child; Child, Preschool; Female; Humans; Male
