Results 1 - 14 of 14
1.
PeerJ ; 11: e16026, 2023.
Article in English | MEDLINE | ID: mdl-37727687

ABSTRACT

The discovery of low-coverage (i.e. uncovered) regions containing clinically significant variants, especially when they are related to the patient's clinical phenotype, is critical for whole-exome sequencing (WES) based clinical diagnosis. Therefore, it is essential to develop tools to identify the existence of clinically important variants in low-coverage regions. Here, we introduce a desktop application, namely DEVOUR (DEleterious Variants On Uncovered Regions), that analyzes read alignments for WES experiments, identifies genomic regions with no or low-coverage (read depth < 5) and then annotates known variants in the low-coverage regions using clinical variant annotation databases. As a proof of concept, DEVOUR was used to analyze a total of 28 samples from a publicly available Hirschsprung disease-related WES project (NCBI Bioproject: https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB19327), revealing the potential existence of 98 disease-associated variants in low-coverage regions. DEVOUR is available from https://github.com/projectDevour/DEVOUR under the MIT license.
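The coverage scan described in this abstract can be illustrated with a minimal Python sketch. It is not DEVOUR's code: the function names, the per-base depth list, and the 0-based half-open intervals are assumptions made for illustration. Only the threshold (read depth < 5) comes from the abstract.

```python
from typing import Iterable, List, Tuple


def low_coverage_regions(depths: Iterable[int], min_depth: int = 5) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals where depth < min_depth."""
    regions: List[Tuple[int, int]] = []
    start = None  # start of the low-coverage run currently being extended
    end = 0       # one past the last position seen so far
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos))
            start = None
        end = pos + 1
    if start is not None:  # close a run that reaches the end of the input
        regions.append((start, end))
    return regions


def variants_in_regions(variant_positions: Iterable[int],
                        regions: List[Tuple[int, int]]) -> List[int]:
    """Keep the known variant positions that fall inside any low-coverage interval."""
    return [p for p in variant_positions if any(s <= p < e for s, e in regions)]
```

A real pipeline would read depths from alignment files and variants from an annotation database; here both are plain Python sequences so the logic stays visible.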


Subjects
Existentialism; Hirschsprung Disease; Humans; Exome Sequencing; Databases, Factual; Genomics; Hirschsprung Disease/diagnosis
2.
Bioinformatics ; 31(6): 926-32, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25398609

ABSTRACT

MOTIVATION: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. RESULTS: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (∼7 times shorter hit list before expansion), faster (∼6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation.
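The "search representatives, then expand with cluster members" step mentioned above can be sketched as follows. This is an illustrative assumption about the data layout (a dictionary from a representative identifier to its member accessions), not the authors' implementation; the function name `expand_hits` is hypothetical.

```python
from typing import Dict, List


def expand_hits(representative_hits: List[str],
                cluster_members: Dict[str, List[str]]) -> List[str]:
    """Expand a ranked list of cluster representatives into a de-duplicated
    list that also contains every member of each hit cluster.

    The ranking of the representatives is preserved; members follow their
    representative in the order given by the (hypothetical) membership table.
    """
    expanded: List[str] = []
    seen = set()
    for rep in representative_hits:
        for accession in [rep, *cluster_members.get(rep, [])]:
            if accession not in seen:
                seen.add(accession)
                expanded.append(accession)
    return expanded
```

Searching the much smaller set of representatives first and expanding afterwards is what keeps the initial hit list short while still reaching every clustered sequence.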


Subjects
Computational Biology; Databases, Protein; Dioxygenases/metabolism; Membrane Proteins/metabolism; Proteins/metabolism; Sequence Analysis, Protein; Software; AlkB Homolog 5, RNA Demethylase; Cluster Analysis; Dioxygenases/chemistry; Dioxygenases/genetics; Gene Ontology; Humans; Information Storage and Retrieval; Membrane Proteins/chemistry; Membrane Proteins/genetics; Molecular Sequence Annotation; Proteins/chemistry; Proteins/genetics
3.
BMC Immunol ; 15: 61, 2014 Dec 09.
Article in English | MEDLINE | ID: mdl-25486901

ABSTRACT

BACKGROUND: Near universal administration of vaccines mandates intense pharmacovigilance for vaccine safety and a stringently low tolerance for adverse events. Reports of autoimmune diseases (AID) following vaccination have been challenging to evaluate given the high rates of vaccination, background incidence of autoimmunity, and low incidence and variable times for onset of AID after vaccinations. In order to identify biologically plausible pathways to adverse autoimmune events of vaccine-related AID, we used a systems biology approach to create a matrix of innate and adaptive immune mechanisms active in specific diseases, responses to vaccine antigens, adjuvants, preservatives and stabilizers, for the most common vaccine-associated AID found in the Vaccine Adverse Event Reporting System. RESULTS: This report focuses on Guillain-Barre Syndrome (GBS), Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE), and Idiopathic (or immune) Thrombocytopenic Purpura (ITP). Multiple curated databases and automated text mining of PubMed literature identified 667 genes associated with RA, 448 with SLE, 49 with ITP and 73 with GBS. While all data sources provided valuable and unique gene associations, text mining using natural language processing (NLP) algorithms provided the most information but required curation to remove incorrect associations. Six genes were associated with all four AIDs. Thirty-three pathways were shared by the four AIDs. Classification of genes into twelve immune system related categories identified more "Th17 T-cell subtype" genes in RA than the other AIDs, and more "Chemokine plus Receptors" genes associated with RA than SLE. Gene networks were visualized and clustered into interconnected modules with specific gene clusters for each AID, including one in RA with ten C-X-C motif chemokines. The intersection of genes associated with GBS, GBS peptide auto-antigens, influenza A infection, and influenza vaccination created a subnetwork of genes that inferred a possible role for the MAPK signaling pathway in influenza vaccine related GBS. CONCLUSIONS: Results showing unique and common gene sets, pathways, immune system categories and functional clusters of genes in four autoimmune diseases suggest it is possible to develop molecular classifications of autoimmune and inflammatory events. Combining this information with cellular and other disease responses should greatly aid in the assessment of potential immune-mediated adverse events following vaccination.
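Finding genes common to all diseases, and genes unique to one disease, amounts to set intersection and difference. The sketch below shows that arithmetic on placeholder identifiers; the function names are hypothetical and no gene list from the study is reproduced.

```python
from typing import Dict, Iterable, Set


def shared_genes(gene_sets: Dict[str, Iterable[str]]) -> Set[str]:
    """Genes associated with every disease in the mapping."""
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


def unique_genes(gene_sets: Dict[str, Iterable[str]], disease: str) -> Set[str]:
    """Genes associated with `disease` and with no other disease in the mapping."""
    others: Set[str] = set()
    for name, genes in gene_sets.items():
        if name != disease:
            others.update(genes)
    return set(gene_sets[disease]) - others
```

Applied to the four curated gene lists, the first function would yield the common gene set reported in the abstract and the second the disease-specific sets.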


Subjects
Autoimmune Diseases; Computer Simulation; Infection Control; Infections/immunology; Models, Immunological; Vaccination; Vaccines; Adaptive Immunity; Autoimmune Diseases/genetics; Autoimmune Diseases/immunology; Autoimmune Diseases/pathology; Humans; Infections/genetics; Infections/pathology; Vaccines/adverse effects; Vaccines/immunology
4.
Bioinformatics ; 29(21): 2808-9, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-23958731

ABSTRACT

SUMMARY: We have developed a new web application for peptide matching using an Apache Lucene-based search engine. The Peptide Match service is designed to quickly retrieve all occurrences of a given query peptide from UniProt Knowledgebase (UniProtKB) with isoforms. The matched proteins are shown in summary tables with rich annotations, including matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases. The results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree. The service supports queries where isobaric leucine and isoleucine are treated as equivalent, and an option for searching UniRef100 representative sequences, as well as dynamic queries to major proteomic databases. In addition to the web interface, we also provide RESTful web services. The underlying data are updated every 4 weeks in accordance with the UniProt releases. AVAILABILITY: http://proteininformationresource.org/peptide.shtml. CONTACT: chenc@udel.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
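The leucine/isoleucine option can be illustrated with a small sketch. It uses a plain substring scan over an in-memory dictionary rather than a Lucene index, so it shows only the matching rule, not how the Peptide Match service is built; the function name and the 1-based positions are assumptions.

```python
from typing import Dict, List


def find_peptide(peptide: str, proteins: Dict[str, str],
                 li_equivalent: bool = False) -> Dict[str, List[int]]:
    """Return {accession: [1-based start positions]} for every exact occurrence
    of `peptide`, including overlapping ones.

    With li_equivalent=True, isoleucine (I) is rewritten to leucine (L) in both
    the query and the targets, so the two isobaric residues match each other.
    """
    def normalize(seq: str) -> str:
        seq = seq.upper()
        return seq.replace("I", "L") if li_equivalent else seq

    query = normalize(peptide)
    if not query:
        raise ValueError("peptide must not be empty")

    hits: Dict[str, List[int]] = {}
    for accession, sequence in proteins.items():
        target = normalize(sequence)
        positions: List[int] = []
        start = target.find(query)
        while start != -1:
            positions.append(start + 1)            # report 1-based coordinates
            start = target.find(query, start + 1)  # allow overlapping matches
        if positions:
            hits[accession] = positions
    return hits
```

Normalizing both sides to a single residue letter is a simple way to get equivalence without changing the search routine itself.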


Subjects
Databases, Protein; Peptides/chemistry; Search Engine; Internet; Knowledge Bases; Proteomics; Sequence Analysis, Protein
5.
J Digit Imaging ; 26(4): 630-41, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23589184

ABSTRACT

A widening array of novel imaging biomarkers is being developed using ever more powerful clinical and preclinical imaging modalities. These biomarkers have demonstrated effectiveness in quantifying biological processes as they occur in vivo and in the early prediction of therapeutic outcomes. However, quantitative imaging biomarker data and knowledge are not standardized, representing a critical barrier to accumulating medical knowledge based on quantitative imaging data. We use an ontology to represent, integrate, and harmonize heterogeneous knowledge across the domain of imaging biomarkers. This advances the goal of developing applications to (1) improve precision and recall of storage and retrieval of quantitative imaging-related data using standardized terminology; (2) streamline the discovery and development of novel imaging biomarkers by normalizing knowledge across heterogeneous resources; (3) effectively annotate imaging experiments thus aiding comprehension, re-use, and reproducibility; and (4) provide validation frameworks through rigorous specification as a basis for testable hypotheses and compliance tests. We have developed the Quantitative Imaging Biomarker Ontology (QIBO), which currently consists of 488 terms spanning the following upper classes: experimental subject, biological intervention, imaging agent, imaging instrument, image post-processing algorithm, biological target, indicated biology, and biomarker application. We have demonstrated that QIBO can be used to annotate imaging experiments with standardized terms in the ontology and to generate hypotheses for novel imaging biomarker-disease associations. Our results established the utility of QIBO in enabling integrated analysis of quantitative imaging data.


Subjects
Biomarkers; Biomedical Research; Diagnostic Imaging; Medical Informatics/methods; Biological Ontologies; Databases, Factual; Humans; Medical Informatics/standards; Reproducibility of Results
6.
J Digit Imaging ; 26(4): 614-29, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23546775

ABSTRACT

Quantitative imaging biomarkers are of particular interest in drug development for their potential to accelerate the drug development pipeline. The lack of consensus methods and carefully characterized performance hampers the widespread availability of these quantitative measures. A framework to support collaborative work on quantitative imaging biomarkers would entail advanced statistical techniques, the development of controlled vocabularies, and a service-oriented architecture for processing large image archives. Until now, this framework has not been developed. With the availability of tools for automatic ontology-based annotation of datasets, coupled with image archives, and a means for batch selection and processing of image and clinical data, imaging will go through a similar increase in capability analogous to what advanced genetic profiling techniques have brought to molecular biology. We report on our current progress on developing an informatics infrastructure to store, query, and retrieve imaging biomarker data across a wide range of resources in a semantically meaningful way that facilitates the collaborative development and validation of potential imaging biomarkers by many stakeholders. Specifically, we describe the semantic components of our system, QI-Bench, that are used to specify and support experimental activities for statistical validation in quantitative imaging.


Subjects
Biomarkers/analysis; Diagnostic Imaging/methods; Diagnostic Imaging/statistics & numerical data; Medical Informatics/methods; Medical Informatics/statistics & numerical data; Algorithms; Data Interpretation, Statistical; Databases, Factual/statistics & numerical data; Humans; Imaging, Three-Dimensional; Reproducibility of Results
7.
J Am Med Inform Assoc ; 19(6): 1095-102, 2012.
Article in English | MEDLINE | ID: mdl-22744959

ABSTRACT

OBJECTIVE: Meaningful exchange of information is a fundamental challenge in collaborative biomedical research. To help address this, the authors developed the Life Sciences Domain Analysis Model (LS DAM), an information model that provides a framework for communication among domain experts and technical teams developing information systems to support biomedical research. The LS DAM is harmonized with the Biomedical Research Integrated Domain Group (BRIDG) model of protocol-driven clinical research. Together, these models can facilitate data exchange for translational research. MATERIALS AND METHODS: The content of the LS DAM was driven by analysis of life sciences and translational research scenarios and the concepts in the model are derived from existing information models, reference models and data exchange formats. The model is represented in the Unified Modeling Language and uses ISO 21090 data types. RESULTS: The LS DAM v2.2.1 comprises 130 classes and covers several core areas including Experiment, Molecular Biology, Molecular Databases and Specimen. Nearly half of these classes originate from the BRIDG model, emphasizing the semantic harmonization between these models. Validation of the LS DAM against independently derived information models, research scenarios and reference databases supports its general applicability to represent life sciences research. DISCUSSION: The LS DAM provides unambiguous definitions for concepts required to describe life sciences research. The processes established to achieve consensus among domain experts will be applied in future iterations and may be broadly applicable to other standardization efforts. CONCLUSIONS: The LS DAM provides common semantics for life sciences research. Through harmonization with BRIDG, it promotes interoperability in translational science.


Subjects
Biological Science Disciplines; Information Dissemination; Information Systems; Systems Integration; Translational Research, Biomedical; Humans; Information Storage and Retrieval; Reference Standards; Semantics; Unified Medical Language System
8.
J Am Med Inform Assoc ; 19(e1): e125-8, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22323393

ABSTRACT

Quality control and harmonization of data is a vital and challenging undertaking for any successful data coordination center and a responsibility shared between the multiple sites that produce, integrate, and utilize the data. Here we describe a coordinated effort between scientists and data managers in the Cancer Family Registries to implement a data governance infrastructure consisting of both organizational and technical solutions. The technical solution uses a rule-based validation system that facilitates error detection and correction for data centers submitting data to a central informatics database. Validation rules comprise both standard checks on allowable values and a crosscheck of related database elements for logical and scientific consistency. Evaluation over a 2-year timeframe showed a significant decrease in the number of errors in the database and a concurrent increase in data consistency and accuracy.
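A rule-based validator of the kind described, with checks on allowable values plus cross-checks between related fields, might look like the sketch below. The field names (`sex`, `age_at_diagnosis`, `age_at_interview`) and both rules are invented for illustration and are not taken from the Cancer Family Registries system.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, Callable[[Record], bool]]  # (error message, check that must pass)


def sex_is_allowed(record: Record) -> bool:
    """Standard check: the value must come from a fixed list of allowable codes."""
    return record.get("sex") in {"F", "M"}


def diagnosis_not_after_interview(record: Record) -> bool:
    """Cross-check: two related fields must be logically consistent.
    Records missing either field are left to a separate completeness rule."""
    dx = record.get("age_at_diagnosis")
    interview = record.get("age_at_interview")
    return dx is None or interview is None or dx <= interview


# Hypothetical rule table; a real system would load many such rules.
RULES: List[Rule] = [
    ("sex must be 'F' or 'M'", sex_is_allowed),
    ("age_at_diagnosis must not exceed age_at_interview", diagnosis_not_after_interview),
]


def validate(record: Record, rules: List[Rule] = RULES) -> List[str]:
    """Return the messages of all rules the record violates (empty list = clean)."""
    return [message for message, check in rules if not check(record)]
```

Keeping rules as data (a list of message and check pairs) lets submitting sites run the same checks locally before sending records to the central database.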


Subjects
Breast Neoplasms; Colonic Neoplasms; Databases, Factual/standards; Registries/standards; Breast Neoplasms/epidemiology; Colonic Neoplasms/epidemiology; Databases, Factual/statistics & numerical data; Humans; Quality Control; Research Design; United States
9.
Bioinformatics ; 27(8): 1190-1, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21478197

ABSTRACT

MOTIVATION: Identifier (ID) mapping establishes links between various biological databases and is an essential first step for molecular data integration and functional annotation. ID mapping allows diverse molecular data on genes and proteins to be combined and mapped to functional pathways and ontologies. We have developed comprehensive protein-centric ID mapping services providing mappings for 90 IDs derived from databases on genes, proteins, pathways, diseases, structures, protein families, protein interaction, literature, ontologies, etc. The services are widely used and have been regularly updated since 2006. AVAILABILITY: www.uniprot.org/mapping and proteininformation-resource.org/pirwww/search/idmapping.shtml. CONTACT: huang@dbi.udel.edu.
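One way to read "protein-centric" is that each source identifier is first resolved to protein accessions, which are then resolved to the target identifier type. The sketch below illustrates that two-step lookup with in-memory tables; the table layout and function name are assumptions, and it does not call the UniProt or PIR services.

```python
from typing import Dict, Iterable, List


def map_ids(ids: Iterable[str],
            source_to_protein: Dict[str, List[str]],
            protein_to_target: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map source IDs to target IDs through protein accessions as the hub.

    Each input ID appears in the result; IDs with no mapping get an empty list.
    Target IDs are de-duplicated and kept in first-seen order.
    """
    result: Dict[str, List[str]] = {}
    for source_id in ids:
        targets: List[str] = []
        for accession in source_to_protein.get(source_id, []):
            for target_id in protein_to_target.get(accession, []):
                if target_id not in targets:
                    targets.append(target_id)
        result[source_id] = targets
    return result
```

Routing every mapping through one hub type means that adding a new database requires only one table to and from protein accessions instead of one table per pair of databases.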


Subjects
Databases, Protein; Proteins/chemistry; Proteins/genetics; Software; Internet
10.
BMC Bioinformatics ; 10: 136, 2009 May 08.
Article in English | MEDLINE | ID: mdl-19426475

ABSTRACT

BACKGROUND: The UniProt consortium was formed in 2002 by groups from the Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) at Georgetown University, and soon afterwards the website http://www.uniprot.org was set up as a central entry point to UniProt resources. Requests to this address were redirected to one of the three organisations' websites. While these sites shared a set of static pages with general information about UniProt, their pages for searching and viewing data were different. To provide users with a consistent view and to cut the cost of maintaining three separate sites, the consortium decided to develop a common website for UniProt. Following several years of intense development and a year of public beta testing, the http://www.uniprot.org domain was switched to the newly developed site described in this paper in July 2008. DESCRIPTION: The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The http://www.uniprot.org website is the primary access point to this data and to documentation and basic tools for the data. These tools include full text and field-based text search, similarity search, multiple sequence alignment, batch retrieval and database identifier mapping. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as to machines for programmatic access. http://www.uniprot.org/ is open for both academic and commercial use. The site was built with open source tools and libraries. Feedback is very welcome and should be sent to help@uniprot.org. CONCLUSION: The new UniProt website makes accessing and understanding UniProt easier than ever. The two main lessons learned are that getting the basics right for such a data provider website has huge benefits, but is not trivial and easy to underestimate, and that there is no substitute for using empirical data throughout the development process to decide on what is and what is not working for your users.


Subjects
Databases, Protein; Sequence Analysis, Protein; Information Storage and Retrieval/methods; Internet; Proteins/chemistry; User-Computer Interface
11.
Bioinformatics ; 23(10): 1282-8, 2007 May 15.
Article in English | MEDLINE | ID: mdl-17379688

ABSTRACT

MOTIVATION: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. RESULTS: The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis. AVAILABILITY: UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
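The UniRef100 idea of combining identical sequences and subfragments into one entry can be shown with a toy sketch. This is a quadratic-time illustration on a small dictionary, not the UniRef production procedure; the function name and the choice of the longest sequence as representative are assumptions.

```python
from typing import Dict, List, Tuple


def merge_identical_and_subfragments(sequences: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sequences so that each group holds one representative plus every
    sequence that is identical to it or is a contiguous fragment of it.

    Sequences are visited longest first (ties broken by accession), so a
    fragment is always compared against representatives at least as long.
    Returns {representative accession: [member accessions, representative first]}.
    """
    clusters: Dict[str, List[str]] = {}
    representatives: List[Tuple[str, str]] = []  # (accession, sequence)
    ordered = sorted(sequences.items(), key=lambda item: (-len(item[1]), item[0]))
    for accession, sequence in ordered:
        for rep_accession, rep_sequence in representatives:
            if sequence in rep_sequence:  # identical or a subfragment
                clusters[rep_accession].append(accession)
                break
        else:
            representatives.append((accession, sequence))
            clusters[accession] = [accession]
    return clusters
```

UniRef90 and UniRef50 would then cluster the resulting representatives by sequence identity, which requires alignment-based similarity and is outside this sketch.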


Subjects
Computational Biology; Databases, Protein; Proteins/chemistry; Amino Acid Sequence; Animals; Humans; Information Storage and Retrieval
12.
Nucleic Acids Res ; 32(Database issue): D112-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681371

ABSTRACT

The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions. The system adopts a network structure for protein classification from superfamily to subfamily levels. Protein family members are homologous (sharing common ancestry) and homeomorphic (sharing full-length sequence similarity with common domain architecture). The PIRSF database consists of two data sets, preliminary clusters and curated families. The curated families include family name, protein membership, parent-child relationship, domain architecture, and optional description and bibliography. PIRSF is accessible from the website at http://pir.georgetown.edu/pirsf/ for report retrieval and sequence classification. The report presents family annotation, membership statistics, cross-references to other databases, graphical display of domain architecture, and links to multiple sequence alignments and phylogenetic trees for curated families. PIRSF can be utilized to analyze phylogenetic profiles, to reveal functional convergence and divergence, and to identify interesting relationships between homeomorphic families, domains and structural classes.


Subjects
Computational Biology; Databases, Protein; Proteins/chemistry; Proteins/classification; Amino Acid Motifs; Animals; Evolution, Molecular; Humans; Information Storage and Retrieval; Internet; Protein Structure, Tertiary
13.
Nucleic Acids Res ; 31(1): 345-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520019

ABSTRACT

The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files.


Subjects
Databases, Protein; Proteins/chemistry; Proteins/classification; Amino Acid Sequence; Animals; Databases, Bibliographic; Internet; Proteins/genetics
14.
Nucleic Acids Res ; 30(1): 35-7, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752247

ABSTRACT

The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).


Subjects
Databases, Protein; Amino Acid Sequence; Animals; Humans; Information Storage and Retrieval; International Agencies; Internet; Proteins/classification; Proteins/genetics; Systems Integration