1.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36759942

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KGs present a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS: In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by Python scripts which download each resource, check for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. Conclusions/Significance: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION: The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Pattern Recognition, Automated , Precision Medicine , Databases, Factual
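
The build step summarized in the abstract above (download each resource, check it, then merge into a 'parent table' of nodes and edges) can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `ParentTable` class, the `check_resource` rule (every edge endpoint must be a node defined in the same resource) and the toy data are hypothetical and are not taken from the SPOKE code base.

```python
from dataclasses import dataclass, field


@dataclass
class ParentTable:
    """Merged nodes and edges from all sources (hypothetical structure)."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, edge type, target id)


def check_resource(resource):
    """Return a list of integrity problems found in one downloaded resource."""
    problems = []
    node_ids = {node_id for node_id, _ in resource["nodes"]}
    for source, edge_type, target in resource["edges"]:
        for endpoint in (source, target):
            if endpoint not in node_ids:
                problems.append(f"edge {edge_type} references unknown node {endpoint}")
    return problems


def build_parent_table(resources):
    """Merge every resource that passes the integrity check; skip the rest."""
    table = ParentTable()
    skipped = []
    for name, resource in resources.items():
        if check_resource(resource):
            skipped.append(name)
            continue
        table.nodes.update(dict(resource["nodes"]))
        table.edges.extend(resource["edges"])
    return table, skipped


# Toy data standing in for downloaded databases (not real SPOKE content).
resources = {
    "drug_db": {
        "nodes": [("C1", "Compound"), ("D1", "Disease")],
        "edges": [("C1", "TREATS", "D1")],
    },
    "broken_db": {
        "nodes": [("G1", "Gene")],
        "edges": [("G1", "ASSOCIATES", "D9")],  # D9 is never defined
    },
}

table, skipped = build_parent_table(resources)
print(len(table.nodes), len(table.edges), skipped)
```

In this sketch a resource that fails the check is skipped entirely rather than partially merged, which keeps the merged table consistent at the cost of dropping a whole source for that build.
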
2.
J Comput Chem ; 43(15): 1053-1062, 2022 06 05.
Article in English | MEDLINE | ID: mdl-35394655

ABSTRACT

Pfizer's Crystal Structure Database (CSDB) is a key enabling technology that allows scientists on structure-based projects rapid access to Pfizer's vast library of in-house crystal structures, as well as a significant number of structures imported from the Protein Data Bank. In addition to capturing basic information such as the asymmetric unit coordinates, reflection data, and the like, CSDB employs a variety of automated methods to first ensure a standard level of annotations and error checking, and then to add significant value for design teams by processing the structures through a sequence of algorithms that prepares the structures for use in modeling. The structures are made available, both as the original asymmetric unit as submitted, as well as the final prepared structures, through REST-based web services that are consumed by several client desktop applications. The structures can be searched by keyword, sequence, submission date, ligand substructure and similarity search, and other common queries.


Subject(s)
Algorithms , Databases, Protein , Humans , Ligands
3.
AI Mag ; 43(1): 46-58, 2022.
Article in English | MEDLINE | ID: mdl-36093122

ABSTRACT

Knowledge representation and reasoning (KR&R) has been successfully implemented in many fields to enable computers to solve complex problems with AI methods. However, its application to biomedicine has been lagging in part due to the daunting complexity of molecular and cellular pathways that govern human physiology and pathology. In this article we describe concrete uses of SPOKE, an open knowledge network that connects curated information from 37 specialized and human-curated databases into a single property graph, with 3 million nodes and 15 million edges to date. Applications discussed in this article include drug discovery, COVID-19 research and chronic disease diagnosis and management.

4.
PLoS Comput Biol ; 15(4): e1006842, 2019 04.
Article in English | MEDLINE | ID: mdl-31009453

ABSTRACT

Many proteins fold into highly regular and repetitive three dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understanding protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution.


Subject(s)
Algorithms , Computational Biology/methods , Proteins/chemistry , Software , Amino Acid Sequence , Databases, Protein , Models, Molecular , Sequence Analysis, Protein
5.
PLoS Comput Biol ; 15(2): e1006791, 2019 02.
Article in English | MEDLINE | ID: mdl-30735498

ABSTRACT

BioJava is an open-source project that provides a Java library for processing biological data. The project aims to simplify bioinformatic analyses by implementing parsers, data structures, and algorithms for common tasks in genomics, structural biology, ontologies, phylogenetics, and more. Since 2012, we have released two major versions of the library (4 and 5) that include many new features to tackle challenges with increasingly complex macromolecular structure data. BioJava requires Java 8 or higher and is freely available under the LGPL 2.1 license. The project is hosted on GitHub at https://github.com/biojava/biojava. More information and documentation can be found online on the BioJava website (http://www.biojava.org) and tutorial (https://github.com/biojava/biojava-tutorial). All inquiries should be directed to the GitHub page or the BioJava mailing list (http://lists.open-bio.org/mailman/listinfo/biojava-l).


Subject(s)
Computational Biology/methods , Access to Information , Algorithms , Gene Library , Genome/genetics , Genomics , Information Storage and Retrieval , Internet , Software
6.
Bioinformatics ; 34(21): 3755-3758, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29850778

ABSTRACT

Motivation: The interactive visualization of very large macromolecular complexes on the web is becoming a challenging problem as experimental techniques advance at an unprecedented rate and deliver structures of increasing size. Results: We have tackled this problem by developing highly memory-efficient and scalable extensions for the NGL WebGL-based molecular viewer and by using the Macromolecular Transmission Format (MMTF), a binary and compressed data representation. These enable NGL to download and render molecular complexes with millions of atoms interactively on desktop computers and smartphones alike, making it a tool of choice for web-based molecular visualization in research and education. Availability and implementation: The source code is freely available under the MIT license at github.com/arose/ngl and distributed on NPM (npmjs.com/package/ngl). MMTF-JavaScript encoders and decoders are available at github.com/rcsb/mmtf-javascript.


Subject(s)
Computer Graphics , Internet , Macromolecular Substances , Software
7.
Nucleic Acids Res ; 45(D1): D271-D281, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794042

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a 'Structural View of Biology.' Recent developments have improved the user experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive.


Subject(s)
Computational Biology/methods , Databases, Genetic , Proteins/chemistry , Proteins/genetics , Datasets as Topic , Metabolic Networks and Pathways , Models, Molecular , Protein Conformation , Proteins/metabolism , Software , Structure-Activity Relationship , User-Computer Interface , Web Browser
8.
Bioinformatics ; 33(13): 2047-2049, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28334105

ABSTRACT

SUMMARY: We developed a new software tool, BioJava-ModFinder, for identifying protein modifications observed in 3D structures archived in the Protein Data Bank (PDB). Information on more than 400 types of protein modifications was collected and curated from annotations in PDB, RESID, and PSI-MOD. We divided these modifications into three categories: modified residues, attachment modifications, and cross-links. We have developed a systematic method to identify these modifications in 3D protein structures. We have integrated this package with the RCSB PDB web application and added protein modification annotations to the sequence diagram and structure display. By scanning all 3D structures in the PDB using BioJava-ModFinder, we identified more than 30 000 structures with protein modifications, which can be searched, browsed, and visualized on the RCSB PDB website. AVAILABILITY AND IMPLEMENTATION: BioJava-ModFinder is available as open source (LGPL license) at https://github.com/biojava/biojava/tree/master/biojava-modfinder. The RCSB PDB can be accessed at http://www.rcsb.org. CONTACT: pwrose@ucsd.edu.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Conformation , Software , Internet
9.
PLoS Comput Biol ; 13(6): e1005575, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28574982

ABSTRACT

Recent advances in experimental techniques have led to a rapid growth in complexity, size, and number of macromolecular structures that are made available through the Protein Data Bank. This creates a challenge for macromolecular visualization and analysis. Macromolecular structure files, such as PDB or PDBx/mmCIF files, can be slow to transfer and parse, and hard to incorporate into third-party software tools. Here, we present a new binary and compressed data representation, the MacroMolecular Transmission Format, MMTF, as well as software implementations in several languages that have been developed around it, which address these issues. We describe the new format and its APIs and demonstrate that it is several times faster to parse, and about a quarter of the file size of the current standard format, PDBx/mmCIF. As a consequence of the new data representation, it is now possible to visualize structures with millions of atoms in a web browser, keep the whole PDB archive in memory or parse it within a few minutes on average computers, which opens up a new way of thinking about how to design and implement efficient algorithms in structural bioinformatics. The PDB archive is available in MMTF file format through web services and data that are updated on a weekly basis.


Subject(s)
Computational Biology/methods , Databases, Chemical , Macromolecular Substances , Software , Internet , Macromolecular Substances/analysis , Macromolecular Substances/chemistry , Macromolecular Substances/classification , Molecular Structure
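
The abstract above does not say how the compression is achieved. As a general illustration of why columns of structural data shrink well in a binary format, the sketch below applies two generic techniques, delta encoding followed by run-length encoding, to a column of consecutive atom serial numbers. The function names and the sample column are assumptions made for this example; this is not the MMTF library API, and the actual format should be checked against its specification.

```python
def delta_encode(values):
    """Store the first value, then each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values):
    """Collapse runs into flat [value, count, value, count, ...] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-2] == v:
            encoded[-1] += 1  # extend the current run
        else:
            encoded.extend([v, 1])  # start a new run
    return encoded


def run_length_decode(encoded):
    """Expand [value, count, ...] pairs back into the full list."""
    out = []
    for value, count in zip(encoded[0::2], encoded[1::2]):
        out.extend([value] * count)
    return out


# A column of consecutive serial numbers, as is typical for atom records.
serials = list(range(1, 11))
packed = run_length_encode(delta_encode(serials))
restored = delta_decode(run_length_decode(packed))
print(packed, restored == serials)
```

Ten integers collapse to a single value/count pair because consecutive numbers all differ by one, so the delta step turns the column into one long run that the run-length step then stores compactly.
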
10.
Bioinformatics ; 32(24): 3833-3835, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27551105

ABSTRACT

The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics. AVAILABILITY AND IMPLEMENTATION: The new views are available from http://www.rcsb.org and software is available from https://github.com/rcsb/. CONTACT: andreas.prlic@rcsb.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Conformation , Software , Amino Acid Sequence , Computer Graphics , Genomics , Humans , User-Computer Interface
12.
J Comput Aided Mol Des ; 31(3): 301-304, 2017 03.
Article in English | MEDLINE | ID: mdl-27995514

ABSTRACT

Scientific software engineering is a distinct discipline from both computational chemistry project support and research informatics. A scientific software engineer not only has a deep understanding of the science of drug discovery but also the desire, skills and time to apply good software engineering practices. A good team of scientific software engineers can create a software foundation that is maintainable, validated and robust. If done correctly, this foundation enables the organization to investigate novel computational ideas with a very high level of efficiency.


Subject(s)
Computer-Aided Design , Drug Discovery/methods , Drug Industry/methods , Software , Chemistry, Pharmaceutical , Computational Biology , Drug Design , Models, Molecular
13.
Nucleic Acids Res ; 43(Database issue): D345-56, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428375

ABSTRACT

The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine.


Subject(s)
Databases, Protein , Protein Conformation , Binding Sites , Internet , Membrane Proteins/chemistry , Molecular Biology/education , Molecular Sequence Annotation , Multiprotein Complexes/chemistry , Peptides/chemistry , Pharmaceutical Preparations/chemistry , Research , Software
14.
Bioinformatics ; 31(1): 126-7, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25183487

ABSTRACT

SUMMARY: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Using the app, users from the general public to expert researchers can quickly search and visualize biomolecules, and add personal annotations via the RCSB PDB's integrated MyPDB service. AVAILABILITY AND IMPLEMENTATION: RCSB PDB Mobile is freely available from the Apple App Store and Google Play (http://www.rcsb.org).


Subject(s)
Computational Biology/methods , Computer Graphics , Databases, Protein , Mobile Applications , Software , Biomedical Research , Humans , User-Computer Interface , Workflow
16.
Nucleic Acids Res ; 41(Database issue): D475-82, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193259

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB web site (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data and providing domain-based structural alignments. New educational resources, such as Author Profiles that display a researcher's PDB entries in a timeline, are offered at the PDB-101 educational view of the main web site. To promote different kinds of access to the RCSB PDB, Web Services have been expanded, and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements enable new opportunities for analyzing and understanding structure data.


Subject(s)
Databases, Protein , Protein Conformation , Biochemistry/education , Computer Graphics , Internet , Ligands , Protein Structure, Tertiary , Research , Structural Homology, Protein
17.
Angew Chem Int Ed Engl ; 54(52): 15762-6, 2015 Dec 21.
Article in English | MEDLINE | ID: mdl-26768696

ABSTRACT

A new class of stabilized pentacene derivatives with externally fused five-membered rings is prepared by means of a key palladium-catalyzed cyclopentannulation step. The target compounds are synthesized by chemical manipulation of a partially saturated 6,13-dibromopentacene precursor that can be fully aromatized in a final step through a DDQ-mediated dehydrogenation reaction (DDQ=2,3-dichloro-5,6-dicyano-1,4-benzoquinone). The new 1,2,8,9-tetraaryldicyclopenta[fg,qr]pentacene derivatives have narrow energy gaps of circa 1.2 eV and behave as strong electron acceptors with lowest unoccupied molecular orbital energies between -3.81 and -3.90 eV. Photodegradation studies reveal the new compounds are more photostable than 6,13-bis(triisopropylsilylethynyl)pentacene (TIPS-pentacene).

18.
BMC Fam Pract ; 15: 122, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24938306

ABSTRACT

BACKGROUND: Survival rates following a diagnosis of cancer vary between countries. The International Cancer Benchmarking Partnership (ICBP), a collaboration between six countries with primary care led health services, was set up in 2009 to investigate the causes of these differences. Module 3 of this collaboration hypothesised that an association exists between the readiness of primary care physicians (PCP) to investigate for cancer - the 'threshold' risk level at which they investigate or refer to a specialist for consideration of possible cancer - and survival for that cancer (lung, colorectal and ovarian). We describe the development of an international survey instrument to test this hypothesis. METHODS: The work was led by an academic steering group in England. They agreed that an online survey was the most pragmatic way of identifying differences between the jurisdictions. Research questions were identified through clinical experience and expert knowledge of the relevant literature. A survey comprising a set of direct questions and five clinical scenarios was developed to investigate the hypothesis. The survey content was discussed and refined concurrently and repeatedly with international partners. The survey was validated using an iterative process in England. Following validation the survey was adapted to be relevant to the health systems operating in other jurisdictions and translated into Danish, Norwegian and Swedish, and into Canadian and Australian English. RESULTS: This work has produced a survey with face, content and cross cultural validity that will be circulated in all six countries. It could also form a benchmark for similar surveys in countries with similar health care systems. CONCLUSIONS: The vignettes could also be used as educational resources. This study is likely to impact on healthcare policy and practice in participating countries.


Subject(s)
Neoplasms/diagnosis , Practice Patterns, Physicians'/statistics & numerical data , Primary Health Care/standards , Surveys and Questionnaires , Australia , Canada , Denmark , England , Humans , Norway , Sweden , Translating
19.
Bioinformatics ; 28(20): 2693-5, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22877863

ABSTRACT

BioJava is an open-source project for processing biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality. RESULTS: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model. AVAILABILITY: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists.


Subject(s)
Proteins/chemistry , Sequence Analysis , Software , Amino Acids/chemistry , Computational Biology , Genomics , Protein Conformation , Protein Processing, Post-Translational , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein
20.
Nucleic Acids Res ; 39(Database issue): D392-401, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21036868

ABSTRACT

The RCSB Protein Data Bank (RCSB PDB) web site (http://www.pdb.org) has been redesigned to increase usability and to cater to a larger and more diverse user base. This article describes key enhancements and new features that fall into the following categories: (i) query and analysis tools for chemical structure searching, query refinement, tabulation and export of query results; (ii) web site customization and new structure alerts; (iii) pair-wise and representative protein structure alignments; (iv) visualization of large assemblies; (v) integration of structural data with the open access literature and binding affinity data; and (vi) web services and web widgets to facilitate integration of PDB data and tools with other resources. These improvements enable a range of new possibilities to analyze and understand structure data. The next generation of the RCSB PDB web site, as described here, provides a rich resource for research and education.


Subject(s)
Databases, Protein , Proteins/chemistry , Animals , Computer Graphics , Humans , Internet , Ligands , Mice , Protein Conformation , Systems Integration , User-Computer Interface