2.
Bioinformatics ; 36(18): 4699-4705, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32579213

ABSTRACT

MOTIVATION: As the cost of sequencing decreases, the amount of data being deposited into public repositories is increasing rapidly. Public databases rely on submitters to provide metadata for each submission, a process that is prone to user error. Unfortunately, most public databases, such as non-redundant (NR), do not have methods for identifying errors in the provided metadata, leading to the potential for error propagation. Previous research on a small subset of the NR database analyzed misclassification based on sequence similarity. To the best of our knowledge, the amount of misclassification in the entire database has not been quantified. We propose a heuristic method to detect potentially misclassified taxonomic assignments in the NR database. We applied a curation technique and quality control to find the most probable taxonomic assignment. Our method incorporates the provenance and frequency of each annotation from manually and computationally created databases, together with clustering information at 95% similarity. RESULTS: We found more than two million potentially taxonomically misclassified proteins in the NR database. Using simulated data, we show a high precision of 97% and a recall of 87% for detecting taxonomically misclassified proteins. The proposed approach and findings could also be applied to other databases. AVAILABILITY AND IMPLEMENTATION: Source code, dataset, documentation, Jupyter notebooks and Docker container are available at https://github.com/boalang/nr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metadata; Software; Databases, Factual
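
The abstract above describes combining the provenance and frequency of annotations within 95%-similarity clusters to find the most probable taxonomic assignment. The following is a minimal sketch of that general idea, not the authors' implementation (their code is at the repository linked in the abstract). It assumes a flat, provenance-weighted majority vote per cluster; the function names, the weights in `PROVENANCE_WEIGHT`, the `min_share` threshold, and the record layout are all hypothetical. The second function shows how precision and recall could be computed against a simulated ground truth.

```python
from collections import Counter, defaultdict

# Hypothetical weights: annotations from manually curated sources count
# more than computationally derived ones. Values are illustrative only.
PROVENANCE_WEIGHT = {"manual": 2.0, "computational": 1.0}


def flag_misclassified(annotations, min_share=0.5):
    """Flag proteins whose taxon disagrees with their cluster's consensus.

    annotations: iterable of (protein_id, cluster_id, taxon, provenance).
    Returns a dict mapping each flagged protein_id to the consensus taxon.
    """
    by_cluster = defaultdict(list)
    for protein_id, cluster_id, taxon, provenance in annotations:
        by_cluster[cluster_id].append((protein_id, taxon, provenance))

    flagged = {}
    for members in by_cluster.values():
        # Weighted frequency of each taxon within the cluster.
        votes = Counter()
        for _, taxon, provenance in members:
            votes[taxon] += PROVENANCE_WEIGHT.get(provenance, 1.0)

        consensus, weight = votes.most_common(1)[0]
        if weight / sum(votes.values()) <= min_share:
            continue  # no clear consensus, so flag nothing in this cluster

        for protein_id, taxon, _ in members:
            if taxon != consensus:
                flagged[protein_id] = consensus
    return flagged


def precision_recall(flagged_ids, truly_misclassified_ids):
    """Precision and recall of the flagged set against a known truth set."""
    true_positives = len(flagged_ids & truly_misclassified_ids)
    precision = true_positives / len(flagged_ids) if flagged_ids else 0.0
    recall = (
        true_positives / len(truly_misclassified_ids)
        if truly_misclassified_ids
        else 0.0
    )
    return precision, recall


# Made-up example records, for illustration only.
records = [
    ("p1", "c1", "Escherichia", "manual"),
    ("p2", "c1", "Escherichia", "computational"),
    ("p3", "c1", "Homo", "computational"),
    ("p4", "c2", "Bacillus", "computational"),
    ("p5", "c2", "Homo", "computational"),
]
flagged = flag_misclassified(records)
```

In this toy input, cluster `c1` has a clear weighted majority and its outlier is flagged, while cluster `c2` is evenly split and is left alone. A real implementation would likely need to compare assignments along the taxonomic tree rather than as flat labels.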
3.
BMC Bioinformatics ; 20(1): 436, 2019 Aug 22.
Article in English | MEDLINE | ID: mdl-31438850

ABSTRACT

BACKGROUND: Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. Shared data science infrastructures like Boag are needed to efficiently process and parse data contained in large data repositories. The main features of Boag are inspired by existing languages for data-intensive computing and can easily integrate data from biological data repositories. RESULTS: As a proof of concept, Boa for genomics, Boag, has been implemented to analyze RefSeq's 153,848 annotation (GFF) and assembly (FASTA) file metadata. Boag provides a massive improvement over existing solutions like Python and MongoDB by utilizing a domain-specific language that uses Hadoop infrastructure for a smaller storage footprint that scales well and requires fewer lines of code. We execute scripts through Boag to answer questions about the genomes in RefSeq. We identify the largest and smallest genomes deposited, explore exon frequencies for assemblies after 2016, identify the most commonly used bacterial genome assembly program, and address how animal genome assemblies have improved since 2016. Boag databases provide a significant reduction in the storage required for the raw data and a significant speedup when querying large datasets, due to the automated parallelization and distribution provided by the Hadoop infrastructure during computations. CONCLUSIONS: In order to keep pace with our ability to produce biological data, innovative methods are required. The Shared Data Science Infrastructure, Boag, gives researchers greater access to efficiently explore data in new ways. We demonstrate the potential of the domain-specific language Boag using the RefSeq database to explore how deposited genome assemblies and annotations are changing over time. This is a small example of how Boag could be used with large biological datasets.


Subject(s)
Data Science; Genomics; Information Dissemination; Animals; Databases, Factual; Databases, Genetic; Exons/genetics; Genome; Sequence Analysis, DNA; Software
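
To make the kinds of questions listed in the abstract above concrete (largest and smallest genomes, most commonly used bacterial assembly program), here is a minimal sketch in plain Python over in-memory records. It is not Boag syntax and does not use Hadoop; the function name, the field names (`name`, `kingdom`, `total_length`, `assembler`), and the sample values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def summarize_assemblies(records):
    """Answer a few summary questions over assembly metadata records.

    records: non-empty list of dicts with the hypothetical keys
    'name', 'kingdom', 'total_length' and 'assembler'.
    """
    largest = max(records, key=lambda r: r["total_length"])
    smallest = min(records, key=lambda r: r["total_length"])

    # Count assembly programs among bacterial records only,
    # skipping records where the program was not reported.
    programs = Counter(
        r["assembler"]
        for r in records
        if r.get("kingdom") == "bacteria" and r.get("assembler")
    )
    top_program = programs.most_common(1)[0][0] if programs else None

    return {
        "largest": largest["name"],
        "smallest": smallest["name"],
        "top_bacterial_assembler": top_program,
    }


# Made-up metadata records, for illustration only.
sample = [
    {"name": "asm_a", "kingdom": "bacteria", "total_length": 4_600_000,
     "assembler": "SPAdes"},
    {"name": "asm_b", "kingdom": "bacteria", "total_length": 2_100_000,
     "assembler": "SPAdes"},
    {"name": "asm_c", "kingdom": "bacteria", "total_length": 5_300_000,
     "assembler": "Velvet"},
    {"name": "asm_d", "kingdom": "animal", "total_length": 2_700_000_000,
     "assembler": "Canu"},
]
summary = summarize_assemblies(sample)
```

A single-process scan like this is the baseline the abstract contrasts with; the claimed benefit of Boag is that comparable queries are parallelized and distributed automatically.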
4.
J Craniofac Surg ; 27(5): e481-4, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27389328

ABSTRACT

INTRODUCTION: Temporal bone meningoencephalic herniation may occur in head trauma. It is a rare condition with potentially dangerous complications. Several different routes for temporal bone meningoencephalocele have been proposed. CLINICAL REPORT: An 11-year-old boy with a history of head trauma initially presented with a 9-month history of progressive right-sided hearing loss and facial weakness. The other complaint was formation of a cystic mass in the right external auditory canal. The patient underwent surgery via a mini middle cranial fossa craniotomy associated with a transmastoid approach. CONCLUSION: Although presenting symptoms can be subtle, early suspicion and confirmatory imaging aid in establishing the diagnosis. The combination of computed tomography and magnetic resonance imaging will help in proper preoperative diagnosis. The operation includes transmastoid repair, middle cranial fossa repair, or a combination of both. Multilayer closure of the bony defect is very important to avoid cerebrospinal fluid leak. Clinical manifestations, diagnosis, and surgical approaches for posttraumatic meningoencephaloceles arising in the head and neck region are briefly discussed.


Subject(s)
Craniocerebral Trauma/complications; Cysts/diagnostic imaging; Ear Canal/diagnostic imaging; Meningomyelocele/diagnostic imaging; Temporal Bone/diagnostic imaging; Cerebrospinal Fluid Leak/etiology; Child; Cranial Fossa, Middle/surgery; Craniotomy/methods; Cysts/surgery; Facial Paralysis/etiology; Hearing Loss/etiology; Humans; Magnetic Resonance Imaging; Male; Mastoid/diagnostic imaging; Mastoid/pathology; Meningomyelocele/surgery; Tomography, X-Ray Computed