1.
Nucleic Acids Res ; 52(D1): D762-D769, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962425

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.


Subject(s)
Archaea , Bacteria , Databases, Nucleic Acid , Metagenome , Archaea/genetics , Bacteria/genetics , Databases, Nucleic Acid/standards , Databases, Nucleic Acid/trends , Genome, Archaeal/genetics , Genome, Bacterial/genetics , Internet , Molecular Sequence Annotation , Proteins/genetics
2.
Nucleic Acids Res ; 49(D1): D1020-D1028, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270901

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genome, Archaeal/genetics , Genome, Bacterial/genetics , Molecular Sequence Annotation/methods , Proteins/genetics , Data Curation/methods , Data Mining/methods , Genomics/methods , Internet , Proteins/classification , User-Computer Interface
3.
IEEE Trans Vis Comput Graph ; 21(5): 598-610, 2015 May.
Article in English | MEDLINE | ID: mdl-26357207

ABSTRACT

Children with autism spectrum condition (ASC) suffer from deficits or developmental delays in symbolic thinking. In particular, they are often found lacking in pretend play during early childhood. Researchers believe that they encounter difficulty in generating and maintaining mental representation of pretense coupled with the immediate reality. We have developed an interactive system that explores the potential of Augmented Reality (AR) technology to visually conceptualize the representation of pretense within an open-ended play environment. Results from an empirical study involving children with ASC aged 4 to 7 demonstrated a significant improvement of pretend play in terms of frequency, duration and relevance using the AR system in comparison to a non-computer-assisted situation. We investigated individual differences, skill transfer, system usability and limitations of the proposed AR system. We discuss design guidelines for future AR systems for children with ASC and other pervasive developmental disorders.


Subject(s)
Autistic Disorder/therapy , Computer Graphics , Play Therapy/methods , Virtual Reality Exposure Therapy/methods , Asperger Syndrome/therapy , Child , Child, Preschool , Female , Humans , Male
4.
Nucleic Acids Res ; 41(Web Server issue): W29-33, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23609542

ABSTRACT

The Basic Local Alignment Search Tool (BLAST) website at the National Center for Biotechnology Information (NCBI) is an important resource for searching and aligning sequences. A new BLAST report allows faster loading of alignments, adds navigation aids, allows easy downloading of subject sequences and reports and has improved usability. Here, we describe these improvements to the BLAST report, discuss design decisions, describe other improvements to the search page and database documentation and outline plans for future development. The NCBI BLAST URL is http://blast.ncbi.nlm.nih.gov.


Subject(s)
Sequence Alignment/methods , Software , Animals , Genomics , Internet , L-Gulonolactone Oxidase/genetics , Rats
5.
BMC Bioinformatics ; 13: 134, 2012 Jun 18.
Article in English | MEDLINE | ID: mdl-22708584

ABSTRACT

BACKGROUND: Choosing appropriate primers is probably the single most important factor affecting the polymerase chain reaction (PCR). Specific amplification of the intended target requires that primers do not have matches to other targets in certain orientations and within certain distances that allow undesired amplification. The process of designing specific primers typically involves two stages. First, the primers flanking regions of interest are generated either manually or using software tools; then they are searched against an appropriate nucleotide sequence database using tools such as BLAST to examine the potential targets. However, the latter is not an easy process as one needs to examine many details between primers and targets, such as the number and the positions of matched bases, the primer orientations and distance between forward and reverse primers. The complexity of such analysis usually makes this a time-consuming and very difficult task for users, especially when the primers have a large number of hits. Furthermore, although the BLAST program has been widely used for primer target detection, it is in fact not an ideal tool for this purpose as BLAST is a local alignment algorithm and does not necessarily return complete match information over the entire primer range. RESULTS: We present a new software tool called Primer-BLAST to alleviate the difficulty in designing target-specific primers. This tool combines BLAST with a global alignment algorithm to ensure a full primer-target alignment and is sensitive enough to detect targets that have a significant number of mismatches to primers. Primer-BLAST allows users to design new target-specific primers in one step as well as to check the specificity of pre-existing primers. Primer-BLAST also supports placing primers based on exon/intron locations and excluding single nucleotide polymorphism (SNP) sites in primers. 
CONCLUSIONS: We describe a robust and fully implemented general purpose primer design tool that designs target-specific PCR primers. Primer-BLAST offers flexible options to adjust the specificity threshold and other primer properties. This tool is publicly available at http://www.ncbi.nlm.nih.gov/tools/primer-blast.
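
Illustrative sketch (not part of the abstract): the specificity check described above depends on comparing each primer to a candidate target over the primer's entire length, then asking whether a forward and a reverse hit lie in the right orientation and within an amplifiable distance. The Python below is a minimal sketch of that idea, not Primer-BLAST's implementation. It assumes an ungapped full-length comparison in place of BLAST plus a global alignment, and the function names, `max_mismatches` and `max_size` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def full_length_hits(primer, target, max_mismatches):
    """Compare the whole primer against every window of the target.

    Returns (start, mismatches) for each window with at most
    max_mismatches differences. Ungapped comparison is a simplifying
    assumption standing in for a true global alignment.
    """
    hits = []
    n = len(primer)
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        mismatches = sum(1 for a, b in zip(primer, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def predicted_amplicons(forward, reverse, target, max_mismatches=2, max_size=1000):
    """Pair forward-primer hits with downstream reverse-primer hits.

    The reverse primer anneals to the opposite strand, so its reverse
    complement is searched on the given strand. Returns tuples of
    (amplicon_start, amplicon_end, forward_mismatches, reverse_mismatches).
    """
    fwd_hits = full_length_hits(forward, target, max_mismatches)
    rev_hits = full_length_hits(reverse_complement(reverse), target, max_mismatches)
    amplicons = []
    for f_start, f_mm in fwd_hits:
        for r_start, r_mm in rev_hits:
            end = r_start + len(reverse)
            # Reverse site must lie downstream of the forward site,
            # and the product must not exceed the size limit.
            if r_start >= f_start + len(forward) and end - f_start <= max_size:
                amplicons.append((f_start, end, f_mm, r_mm))
    return amplicons
```

Running `predicted_amplicons` against both the intended template and unintended sequences, and rejecting primer pairs that yield products on the latter, mirrors the two-stage workflow the abstract describes.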


Subject(s)
Algorithms , DNA Primers/genetics , Polymerase Chain Reaction/methods , Software , Carrier Proteins/genetics , Humans , Introns , Molecular Sequence Data , Polymorphism, Single Nucleotide
6.
BMC Bioinformatics ; 10: 421, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20003500

ABSTRACT

BACKGROUND: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. RESULTS: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. CONCLUSION: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
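
Illustrative sketch (not part of the abstract): the abstract states that long queries are broken into chunks for processing. The Python below is a minimal sketch of the general technique, assuming fixed-size overlapping chunks and a toy exact-word search in place of a real alignment. The chunk size, the overlap and all function names are hypothetical and are not taken from the BLAST source.

```python
def split_query(query, chunk_size, overlap):
    """Split a query into overlapping chunks.

    Returns a list of (offset, chunk) pairs, where offset is the
    chunk's start position in the full query.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, query[start:start + chunk_size]))
        if start + chunk_size >= len(query):
            break
        start += step
    return chunks


def find_word(query, word, chunk_size, overlap):
    """Find all occurrences of word by searching chunk by chunk.

    Chunk-local positions are shifted by the chunk offset to recover
    full-query coordinates; a set removes hits seen twice in overlaps.
    The overlap must be at least len(word) - 1 so that no occurrence
    straddling a chunk boundary is missed.
    """
    hits = set()
    for offset, chunk in split_query(query, chunk_size, overlap):
        pos = chunk.find(word)
        while pos != -1:
            hits.add(offset + pos)
            pos = chunk.find(word, pos + 1)
    return sorted(hits)
```

The overlap is what keeps chunking safe: without it, a match spanning two chunks would be invisible to both.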


Subject(s)
Computational Biology/methods , Software , Databases, Genetic , Sequence Alignment
7.
Bioinformatics ; 24(16): 1757-64, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18567917

ABSTRACT

MOTIVATION: The BLAST software package for sequence comparison speeds up homology search by preprocessing a query sequence into a lookup table. Numerous research studies have suggested that preprocessing the database instead would give better performance. However, production usage of sequence comparison methods that preprocess the database has been limited to programs such as BLAT and SSAHA that are designed to find matches when query and database subsequences are highly similar. RESULTS: We developed a new version of the MegaBLAST module of BLAST that does the initial phase of finding short seeds for matches by searching a database index. We also developed a program makembindex that preprocesses the database into a data structure for rapid seed searching. We show that the new 'indexed MegaBLAST' is faster than the 'non-indexed' version for most practical uses. We show that indexed MegaBLAST is faster than miBLAST, another implementation of BLAST nucleotide searching with a preprocessed database, for most of the 200 queries we tested. To deploy indexed MegaBLAST as part of NCBI's Web BLAST service, the storage of databases and the queueing mechanism were modified, so that some machines are now dedicated to serving queries for a specific database. The response time for such Web queries is now faster than it was when each computer handled queries for multiple databases. AVAILABILITY: The code for indexed MegaBLAST is part of the blastn program in the NCBI C++ toolkit. The preprocessor program makembindex is also in the toolkit. Indexed MegaBLAST has been used in production on NCBI's Web BLAST service to search one version of the human and mouse genomes since October 2007.
The Linux command-line executables for blastn and makembindex, documentation, and some query sets used to carry out the tests described below are available in the directory: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast [corrected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
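
Illustrative sketch (not part of the abstract): the abstract contrasts preprocessing the query with preprocessing the database into an index that is searched for short seeds. The Python below is a minimal sketch of a generic k-mer index, assuming exact fixed-length seeds held in an in-memory dictionary. It does not reproduce the data structure built by makembindex, and the function names and the seed length `k` are hypothetical.

```python
from collections import defaultdict


def build_index(database, k):
    """Preprocess database sequences into a k-mer -> positions index.

    database maps sequence names to sequence strings. Each index entry
    lists (name, position) for every occurrence of that k-mer.
    """
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seeds(query, index, k):
    """Look up every k-mer of the query in the prebuilt index.

    Returns (query_pos, name, db_pos) seed matches; a full search
    would then extend these seeds into alignments.
    """
    seeds = []
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], ()):
            seeds.append((qpos, name, dpos))
    return seeds
```

Building the index once and reusing it for many queries is the trade-off the abstract points to: the preprocessing cost is paid per database rather than per query.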


Subject(s)
Database Management Systems , Databases, Protein , Information Storage and Retrieval/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , User-Computer Interface , Amino Acid Sequence , Molecular Sequence Data , Sequence Alignment/methods