Search | VHL Regional Portal

1.

A workflow for deriving chemical entities from crystallographic data and its application to the Crystallography Open Database.

Vaitkus, Antanas; Merkys, Andrius; Sander, Thomas; Quirós, Miguel; Thiessen, Paul A; Bolton, Evan E; Grazulis, Saulius.

J Cheminform ; 15(1): 123, 2023 Dec 19.

Article in English | MEDLINE | ID: mdl-38115123

ABSTRACT

Knowledge about the 3-dimensional structure, orientation and interaction of chemical compounds is important in many areas of science and technology. X-ray crystallography is one of the experimental techniques capable of providing a large amount of structural information for a given compound, and it is widely used for characterisation of organic and metal-organic molecules. The method provides precise 3D coordinates of atoms inside crystals; however, it does not directly deliver information about certain chemical characteristics such as bond orders, delocalization, charges, lone electron pairs or lone electrons. These aspects of a molecular model have to be derived from crystallographic data using refined information about interatomic distances and atom types as well as employing general chemical knowledge. This publication describes a curated automatic pipeline for the derivation of chemical attributes of molecules from crystallographic models. The method is applied to build a catalogue of chemical entities in an open-access crystallographic database, the Crystallography Open Database (COD). The catalogue of such chemical entities is provided openly as a derived database. The content of this catalogue and the problems arising in the fully automated pipeline are discussed, along with the possibilities to introduce manual data curation into the process.

2.

ShinyTPs: Curating Transformation Products from Text Mining Results.

Palm, Emma H; Chirsir, Parviel; Krier, Jessy; Thiessen, Paul A; Zhang, Jian; Bolton, Evan E; Schymanski, Emma L.

Environ Sci Technol Lett ; 10(10): 865-871, 2023 Oct 10.

Article in English | MEDLINE | ID: mdl-37840815

ABSTRACT

Transformation product (TP) information is essential to accurately evaluate the hazards compounds pose to human health and the environment. However, information about TPs is often limited, and existing data is often not fully Findable, Accessible, Interoperable, and Reusable (FAIR). FAIRifying existing TP knowledge is a relatively easy path toward improving access to data for identification workflows and for machine-learning-based algorithms. ShinyTPs was developed to curate existing transformation information derived from text-mined data within the PubChem database. The application (available as an R package) visualizes the text-mined chemical names to facilitate the user validation of the automatically extracted reactions. ShinyTPs was applied to a case study using 436 tentatively identified compounds to prioritize TP retrieval. This resulted in the extraction of 645 reactions (associated with 496 compounds), of which 319 were not previously available in PubChem. The curated reactions were added to the PubChem Transformations library, which was used as a TP suspect list for identification of TPs using the open-source workflow patRoon. In total, 72 compounds from the library were tentatively identified, 18% of which were curated using ShinyTPs, showing that the app can help support TP identification in non-target analysis workflows.

3.

Per- and Polyfluoroalkyl Substances (PFAS) in PubChem: 7 Million and Growing.

Schymanski, Emma L; Zhang, Jian; Thiessen, Paul A; Chirsir, Parviel; Kondic, Todor; Bolton, Evan E.

Environ Sci Technol ; 57(44): 16918-16928, 2023 Nov 07.

Article in English | MEDLINE | ID: mdl-37871188

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) are of high concern, with calls to regulate them as a class. In 2021, the Organisation for Economic Co-operation and Development (OECD) revised the definition of PFAS to include any chemical containing at least one saturated CF2 or CF3 moiety. The consequence is that one of the largest open chemical collections, PubChem, with 116 million compounds, now contains over 7 million PFAS under this revised definition. These numbers are several orders of magnitude higher than previously established PFAS lists (typically thousands of entries) and pose an incredible challenge to researchers and computational workflows alike. This article describes a dynamic, openly accessible effort to navigate and explore the >7 million PFAS and >21 million fluorinated compounds (September 2023) in PubChem by establishing the "PFAS and Fluorinated Compounds in PubChem" Classification Browser (or "PubChem PFAS Tree"). A total of 36500 nodes support browsing of the content according to several categories, including classification, structural properties, regulatory status, or presence in existing PFAS suspect lists. Additional annotation and associated data can be used to create subsets (and thus manageable suspect lists or databases) of interest for a wide range of environmental, regulatory, exposomics, and other applications.

Subject(s)

Fluorocarbons , Water Pollutants, Chemical , Databases, Factual , Trees

4.

Adding open spectral data to MassBank and PubChem using open source tools to support non-targeted exposomics of mixtures.

Elapavalore, Anjana; Kondic, Todor; Singh, Randolph R; Shoemaker, Benjamin A; Thiessen, Paul A; Zhang, Jian; Bolton, Evan E; Schymanski, Emma L.

Environ Sci Process Impacts ; 25(11): 1788-1801, 2023 Nov 15.

Article in English | MEDLINE | ID: mdl-37431591

ABSTRACT

The term "exposome" is defined as a comprehensive study of life-course environmental exposures and the associated biological responses. Humans are exposed to many different chemicals, which can pose a major threat to the well-being of humanity. Targeted or non-targeted mass spectrometry techniques are widely used to identify and characterize various environmental stressors when linking exposures to human health. However, identification remains challenging due to the huge chemical space applicable to exposomics, combined with the lack of sufficient relevant entries in spectral libraries. Addressing these challenges requires cheminformatics tools and database resources to share curated open spectral data on chemicals to improve the identification of chemicals in exposomics studies. This article describes efforts to contribute spectra relevant for exposomics to the open mass spectral library MassBank (https://www.massbank.eu) using various open source software efforts, including the R packages RMassBank and Shinyscreen. The experimental spectra were obtained from ten mixtures containing toxicologically relevant chemicals from the US Environmental Protection Agency (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT). Following processing and curation, 5582 spectra from 783 of the 1268 ENTACT compounds were added to MassBank, and through this to other open spectral libraries (e.g., MoNA, GNPS) for community benefit. Additionally, an automated deposition and annotation workflow was developed with PubChem to enable the display of all MassBank mass spectra in PubChem, which is rerun with each MassBank release. The new spectral records have already been used in several studies to increase the confidence in identification in non-target small molecule identification workflows applied to environmental and exposomics research.

Subject(s)

Environmental Exposure , Software , Humans , Mass Spectrometry/methods , Environmental Exposure/analysis , Databases, Factual

5.

PubChem 2023 update.

Kim, Sunghwan; Chen, Jie; Cheng, Tiejun; Gindulyte, Asta; He, Jia; He, Siqian; Li, Qingliang; Shoemaker, Benjamin A; Thiessen, Paul A; Yu, Bo; Zaslavsky, Leonid; Zhang, Jian; Bolton, Evan E.

Nucleic Acids Res ; 51(D1): D1373-D1380, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36305812

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.

Subject(s)

Databases, Chemical , Drug Discovery , Drug Discovery/methods , Biological Assay , Proteins , Cheminformatics

6.

PubChem Protein, Gene, Pathway, and Taxonomy Data Collections: Bridging Biology and Chemistry through Target-Centric Views of PubChem Data.

Kim, Sunghwan; Cheng, Tiejun; He, Siqian; Thiessen, Paul A; Li, Qingliang; Gindulyte, Asta; Bolton, Evan E.

J Mol Biol ; 434(11): 167514, 2022 Jun 15.

Article in English | MEDLINE | ID: mdl-35227770

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical database at the U.S. National Institutes of Health. Visited by millions of users every month, it plays a role as a key chemical information resource for biomedical research communities. Data in PubChem is from hundreds of contributors and organized into multiple collections by record type. Among these are the Protein, Gene, Pathway, and Taxonomy data collections. Records in these collections contain information on chemicals related to a given biological target (i.e., protein, gene, pathway, or taxon), helping users to analyze and interpret the biological activity data of molecules. In addition, annotations about the biological targets are collected from authoritative or curated data sources and integrated into the four collections. The content can be programmatically accessed through PubChem's web service interfaces (including PUG View). A machine-readable representation of this content is also provided within PubChemRDF.

Subject(s)

Databases, Chemical , Biology , Drug Discovery , Proteins/genetics

7.

Discovering pesticides and their TPs in Luxembourg waters using open cheminformatics approaches.

Krier, Jessy; Singh, Randolph R; Kondic, Todor; Lai, Adelene; Diderich, Philippe; Zhang, Jian; Thiessen, Paul A; Bolton, Evan E; Schymanski, Emma L.

Environ Int ; 158: 106885, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34560325

ABSTRACT

The diversity of hundreds of thousands of potential organic pollutants and the lack of (publicly available) information about many of them is a huge challenge for environmental sciences, engineering, and regulation. Suspect screening based on high-resolution liquid chromatography-mass spectrometry (LC-HRMS) has enormous potential to help characterize the presence of these chemicals in our environment, enabling the detection of known and newly emerging pollutants, as well as their potential transformation products (TPs). Here, suspect list creation (focusing on pesticides relevant for Luxembourg, incorporating data sources in 4 languages) was coupled to an automated retrieval of related TPs from PubChem based on high confidence suspect hits, to screen for pesticides and their TPs in Luxembourgish river samples. A computational workflow was established to combine LC-HRMS analysis and pre-screening of the suspects (including automated quality control steps), with spectral annotation to determine which pesticides and, in a second step, their related TPs may be present in the samples. The data analysis with Shinyscreen (https://gitlab.lcsb.uni.lu/eci/shinyscreen/), an open source software developed in house, coupled with custom-made scripts, revealed the presence of 162 potential pesticide masses and 96 potential TP masses in the samples. Further identification of these mass matches was performed using the open source approach MetFrag (https://msbi.ipb-halle.de/MetFrag/). Eventual target analysis of 36 suspects resulted in 31 pesticides and TPs confirmed at Level-1 (highest confidence), and five pesticides and TPs not confirmed due to different retention times. Spatio-temporal analysis of the results showed that TPs and pesticides followed similar trends, with a maximum number of potential detections in July. The highest detections were in the rivers Alzette and Mess and the lowest in the Sûre and Eisch. 
This study (a) added pesticides, classification information and related TPs into the open domain, (b) developed automated open source retrieval methods - both enhancing FAIRness (Findability, Accessibility, Interoperability and Reusability) of the data and methods; and (c) will directly support "L'Administration de la Gestion de l'Eau" on further monitoring steps in Luxembourg.

Subject(s)

Pesticides , Water Pollutants, Chemical , Cheminformatics , Luxembourg , Pesticides/analysis , Rivers , Water Pollutants, Chemical/analysis

8.

PubChem Periodic Table and Element Pages: Improving Access to Information on Chemical Elements from Authoritative Sources.

Kim, Sunghwan; Gindulyte, Asta; Zhang, Jian; Thiessen, Paul A; Bolton, Evan E.

Chem Teach Int ; 3(1): 57-65, 2021 Mar.

Article in English | MEDLINE | ID: mdl-34268481

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is one of the top five most visited chemistry web sites in the world, with more than five million unique users per month (as of March 2020). Many of these users are educators, undergraduate students, and graduate students at academic institutions. Therefore, PubChem has a great potential as an online resource for chemical education. This paper describes the PubChem Periodic Table and Element pages, which were recently introduced to celebrate the 150th anniversary of the periodic table. These services help users navigate the abundant chemical element data available within PubChem, while providing a convenient entry point to explore additional chemical content, such as biological activities and health and safety data available in PubChem Compound pages for specific elements and their isotopes. The PubChem Periodic Table and Element pages are also available as widgets, which enable web developers to display PubChem's element data on web pages they design. The elemental data can be downloaded in common file formats and imported into data analysis programs (e.g., spreadsheet software, like Microsoft Excel and Google Sheets, and computer scripts, such as python and R). Overall, the PubChem Periodic Table and Element pages improve access to chemical element data from authoritative sources.

9.

Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag.

Schymanski, Emma L; Kondic, Todor; Neumann, Steffen; Thiessen, Paul A; Zhang, Jian; Bolton, Evan E.

J Cheminform ; 13(1): 19, 2021 Mar 08.

Article in English | MEDLINE | ID: mdl-33685519

ABSTRACT

Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. This daunting task, with over 100 million chemicals in the largest chemical databases, coupled with broadly acknowledged knowledge gaps in these resources, leaves researchers faced with too much, yet not enough, information at the same time to perform comprehensive exposomics research. Furthermore, the improvements in analytical technologies and computational mass spectrometry workflows coupled with the rapid growth in databases and increasing demand for high throughput "big data" services from the research community present significant challenges for both data hosts and workflow developers. This article explores how to reduce candidate search spaces in non-target small molecule identification workflows, while increasing content usability in the context of environmental and exposomics analyses, so as to profit from the increasing size and information content of large compound databases, while increasing efficiency at the same time. In this article, these methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. A subset of the PubChem database relevant for exposomics, PubChemLite, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications are used to show how experimental knowledge and existing datasets can be used to detect and fill gaps in compound databases to progressively improve large resources such as PubChem, and topic-specific subsets such as PubChemLite.
PubChemLite is a living collection, updating as annotation content in PubChem is updated, and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust this are jointly hosted between the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources for whole community benefit. The authors explicitly welcome additional community input on ideas for future developments.

10.

PubChem in 2021: new data content and improved web interfaces.

Kim, Sunghwan; Chen, Jie; Cheng, Tiejun; Gindulyte, Asta; He, Jia; He, Siqian; Li, Qingliang; Shoemaker, Benjamin A; Thiessen, Paul A; Yu, Bo; Zaslavsky, Leonid; Zhang, Jian; Bolton, Evan E.

Nucleic Acids Res ; 49(D1): D1388-D1395, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33151290

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Properties Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Subject(s)

COVID-19/prevention & control , Databases, Chemical , Information Storage and Retrieval/statistics & numerical data , SARS-CoV-2/isolation & purification , User-Computer Interface , COVID-19/epidemiology , COVID-19/virology , Drug Discovery/statistics & numerical data , Epidemics , Humans , Information Storage and Retrieval/methods , Internet , Public Health/statistics & numerical data , SARS-CoV-2/physiology , Software

11.

PUG-View: programmatic access to chemical annotations integrated in PubChem.

Kim, Sunghwan; Thiessen, Paul A; Cheng, Tiejun; Zhang, Jian; Gindulyte, Asta; Bolton, Evan E.

J Cheminform ; 11(1): 56, 2019 Aug 09.

Article in English | MEDLINE | ID: mdl-31399858

ABSTRACT

PubChem is a chemical data repository that provides comprehensive information on various chemical entities. It contains a wealth of chemical information from hundreds of data sources. Programmatic access to this large amount of data provides researchers with new opportunities for data-intensive research. PubChem provides several programmatic access routes. One of these is PUG-View, which is a Representational State Transfer (REST)-style web service interface specialized for accessing annotation data contained in PubChem. The present paper describes various aspects of PUG-View, including the scope of data accessible through PUG-View, the syntax for formulating a PUG-View request URL, the difference of PUG-View from other web service interfaces in PubChem, and its limitations and usage policies.
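The request URL syntax mentioned in this abstract can be illustrated with a minimal sketch. It assumes the publicly documented PUG-View pattern `https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/<record type>/<identifier>/<output format>` with an optional `heading` query parameter. The function name `pug_view_url`, its parameters, and the example CID 2244 are illustrative choices, not taken from the paper, and the code only builds the URL without sending a request.

```python
from urllib.parse import quote, urlencode

# Base address of the PUG-View service (assumed from PubChem's public documentation).
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def pug_view_url(record_type, record_id, output="JSON", heading=None):
    """Build a PUG-View request URL for one annotated record.

    record_type: e.g. "compound" (hypothetical example value)
    record_id:   the record identifier, e.g. a compound CID
    output:      response format, e.g. "JSON" or "XML"
    heading:     optional section heading used to restrict the returned annotations
    """
    url = f"{PUG_VIEW_BASE}/data/{quote(record_type)}/{quote(str(record_id))}/{output}"
    if heading:
        # urlencode escapes spaces as '+', which is valid in a query string.
        url += "?" + urlencode({"heading": heading})
    return url


if __name__ == "__main__":
    print(pug_view_url("compound", 2244))
    print(pug_view_url("compound", 2244, heading="Boiling Point"))
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect and test before any traffic is sent to the service.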

12.

PubChem 2019 update: improved access to chemical data.

Kim, Sunghwan; Chen, Jie; Cheng, Tiejun; Gindulyte, Asta; He, Jia; He, Siqian; Li, Qingliang; Shoemaker, Benjamin A; Thiessen, Paul A; Yu, Bo; Zaslavsky, Leonid; Zhang, Jian; Bolton, Evan E.

Nucleic Acids Res ; 47(D1): D1102-D1109, 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-30371825

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.

Subject(s)

Computational Biology/methods , Databases, Chemical , Pharmaceutical Preparations/chemistry , Small Molecule Libraries/chemistry , Animals , Biological Assay/methods , Drug Discovery/methods , High-Throughput Screening Assays/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Structure , Patents as Topic , Structure-Activity Relationship

13.

An update on PUG-REST: RESTful interface for programmatic access to PubChem.

Kim, Sunghwan; Thiessen, Paul A; Cheng, Tiejun; Yu, Bo; Bolton, Evan E.

Nucleic Acids Res ; 46(W1): W563-W570, 2018 Jul 02.

Article in English | MEDLINE | ID: mdl-29718389

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is one of the largest open chemical information resources available. It currently receives millions of unique users per month on average, serving as a key resource for many research fields such as cheminformatics, chemical biology, medicinal chemistry, and drug discovery. PubChem provides multiple programmatic access routes to its data and services. One of them is PUG-REST, a Representational State Transfer (REST)-like web service interface to PubChem. On average, PUG-REST receives more than a million requests per day from tens of thousands of unique users. The present paper provides an update on PUG-REST since our previous paper published in 2015. This includes access to new kinds of data (e.g. concise bioactivity data, table of contents headings, etc.), full implementation of synchronous fast structure search, support for assay data retrieval using accession identifiers in response to the deprecation of NCBI's GI numbers, data exchange between PUG-REST and NCBI's E-Utilities through the List Gateway, implementation of dynamic traffic control through throttling, and enhanced usage policies. In addition, example Perl scripts are provided, which the user can easily modify, run, or translate into another scripting language.
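The abstract mentions example Perl scripts and traffic control through throttling. As a rough Python counterpart, the sketch below builds a PUG-REST URL and paces calls on the client side. It assumes the publicly documented URL layout `.../rest/pug/<domain>/<namespace>/<identifiers>/<operation>/<output>`. The limit of five requests per second is an assumption that should be checked against PubChem's current usage policy. The names `pug_rest_url` and `Throttle` are hypothetical helpers, not part of any PubChem library, and no network request is made.

```python
import time

# Base address of the PUG-REST service (assumed from PubChem's public documentation).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(domain, namespace, identifiers, operation, output="JSON"):
    """Build a PUG-REST URL, e.g. compound/cid/2244/property/MolecularFormula/JSON."""
    ids = ",".join(str(i) for i in identifiers)
    return f"{PUG_REST_BASE}/{domain}/{namespace}/{ids}/{operation}/{output}"


class Throttle:
    """Client-side pacing: keep at least 1/max_per_second seconds between calls."""

    def __init__(self, max_per_second=5.0, clock=time.monotonic, sleep=time.sleep):
        # max_per_second=5.0 is an assumed limit, not a value stated in the abstract.
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first call

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle()
    for cid in (2244, 3672):
        throttle.wait()  # call before each request to stay under the assumed limit
        print(pug_rest_url("compound", "cid", [cid], "property/MolecularFormula"))
```

Injecting `clock` and `sleep` is a design choice that lets the pacing logic be tested without real delays. Server-side throttling still applies regardless of what the client does.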

Subject(s)

Chemistry, Pharmaceutical/methods , Drug Discovery/methods , Programming Languages , User-Computer Interface , Databases, Chemical , Humans , Information Storage and Retrieval/methods , Internet , Small Molecule Libraries/pharmacology

14.

PubChem BioAssay: 2017 update.

Wang, Yanli; Bryant, Stephen H; Cheng, Tiejun; Wang, Jiyao; Gindulyte, Asta; Shoemaker, Benjamin A; Thiessen, Paul A; He, Siqian; Zhang, Jian.

Nucleic Acids Res ; 45(D1): D955-D963, 2017 Jan 04.

Article in English | MEDLINE | ID: mdl-27899599

ABSTRACT

PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004, providing open access to its data content for the community. PubChem accepts data submissions from researchers worldwide in academia, industry and government agencies. PubChem also collaborates with other chemical biology database stakeholders through data exchange. After more than a decade of development effort, it has become an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update for the PubChem BioAssay database describing several recent developments, including added sources of research data, a redesigned BioAssay record page, a new BioAssay classification browser and new features in the Upload system facilitating data sharing.

Subject(s)

Databases, Chemical , Databases, Nucleic Acid , RNA Interference , Search Engine , Small Molecule Libraries , Drug Discovery , Gene Expression Regulation/drug effects , Humans , Software , User-Computer Interface , Web Browser

15.

Literature information in PubChem: associations between PubChem records and scientific articles.

Kim, Sunghwan; Thiessen, Paul A; Cheng, Tiejun; Yu, Bo; Shoemaker, Benjamin A; Wang, Jiyao; Bolton, Evan E; Wang, Yanli; Bryant, Stephen H.

J Cheminform ; 8: 32, 2016.

Article in English | MEDLINE | ID: mdl-27293485

ABSTRACT

BACKGROUND: PubChem is an open archive consisting of a set of three primary public databases (BioAssay, Compound, and Substance). It contains information on a broad range of chemical entities, including small molecules, lipids, carbohydrates, and (chemically modified) amino acid and nucleic acid sequences (including siRNA and miRNA). Currently (as of Nov. 2015), PubChem contains more than 150 million depositor-provided chemical substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results provided from over 1 million biological assay records. DESCRIPTION: Many PubChem records (substances, compounds, and assays) include depositor-provided cross-references to scientific articles in PubMed. Some PubChem contributors provide bioactivity data extracted from scientific articles. Literature-derived bioactivity data complement high-throughput screening (HTS) data from the concluded NIH Molecular Libraries Program and other HTS projects. Some journals provide PubChem with information on chemicals that appear in their newly published articles, enabling concurrent publication of scientific articles in journals and associated data in public databases. In addition, PubChem links records to PubMed articles indexed with the Medical Subject Heading (MeSH) controlled vocabulary thesaurus. CONCLUSION: Literature information, both provided by depositors and derived from MeSH annotations, can be accessed using PubChem's web interfaces, enabling users to explore information available in literature related to PubChem records beyond typical web search results. GRAPHICAL ABSTRACT: Literature information for PubChem records is derived from various sources.

16.

PubChem Substance and Compound databases.

Kim, Sunghwan; Thiessen, Paul A; Bolton, Evan E; Chen, Jie; Fu, Gang; Gindulyte, Asta; Han, Lianyi; He, Jane; He, Siqian; Shoemaker, Benjamin A; Wang, Jiyao; Yu, Bo; Zhang, Jian; Bryant, Stephen H.

Nucleic Acids Res ; 44(D1): D1202-13, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26400175

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). For the past 11 years, PubChem has grown to a sizable system, serving as a chemical information resource for the scientific research community. PubChem consists of three inter-linked databases, Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. Biological activity data of chemical substances tested in assay experiments are contained in the BioAssay database. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also gives a brief description of PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, as well as PubChemRDF, Resource Description Framework (RDF)-formatted PubChem data for data sharing, analysis and integration with information contained in other databases.

Subject(s)

Databases, Chemical , Internet , Molecular Structure , Pharmaceutical Preparations/chemistry , Software

17.

PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem.

Kim, Sunghwan; Thiessen, Paul A; Bolton, Evan E; Bryant, Stephen H.

Nucleic Acids Res ; 43(W1): W605-11, 2015 Jul 01.

Article in English | MEDLINE | ID: mdl-25934803

ABSTRACT

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, developed and maintained by the US National Institutes of Health (NIH). PubChem contains more than 180 million depositor-provided chemical substance descriptions, 60 million unique chemical structures and 225 million bioactivity assay results, covering more than 9000 unique protein target sequences. As an information resource for the chemical biology research community, it routinely receives more than 1 million requests per day from an estimated more than 1 million unique users per month. Programmatic access to this vast amount of data is provided by several different systems, including the US National Center for Biotechnology Information (NCBI)'s Entrez Utilities (E-Utilities or E-Utils) and the PubChem Power User Gateway (PUG), a common gateway interface (CGI) that exchanges data through eXtensible Markup Language (XML). Further simplifying programmatic access, PubChem provides two additional general purpose web services: PUG-SOAP, which uses the simple object access protocol (SOAP), and PUG-REST, which is a Representational State Transfer (REST)-style interface. These interfaces can be harnessed in combination to access the data contained in PubChem, which is integrated with the more than thirty databases available within the NCBI Entrez system.

Subject(s)

Databases, Chemical , User-Computer Interface , Internet , Systems Integration

18.

MMDB and VAST+: tracking structural similarities between macromolecular complexes.

Madej, Thomas; Lanczycki, Christopher J; Zhang, Dachuan; Thiessen, Paul A; Geer, Renata C; Marchler-Bauer, Aron; Bryant, Stephen H.

Nucleic Acids Res ; 42(Database issue): D297-303, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24319143

ABSTRACT

The computational detection of similarities between protein 3D structures has become an indispensable tool for the detection of homologous relationships, the classification of protein families and functional inference. Consequently, numerous algorithms have been developed that facilitate structure comparison, including rapid searches against a steadily growing collection of protein structures. To this end, NCBI's Molecular Modeling Database (MMDB), which is based on the Protein Data Bank (PDB), maintains a comprehensive and up-to-date archive of protein structure similarities computed with the Vector Alignment Search Tool (VAST). These similarities have been recorded on the level of single proteins and protein domains, comprising in excess of 1.5 billion pairwise alignments. Here we present VAST+, an extension to the existing VAST service, which summarizes and presents structural similarity on the level of biological assemblies or macromolecular complexes. VAST+ simplifies structure neighboring results and shows, for macromolecular complexes tracked in MMDB, lists of similar complexes ranked by the extent of similarity. VAST+ replaces the previous VAST service as the default presentation of structure neighboring data in NCBI's Entrez query and retrieval system. MMDB and VAST+ can be accessed via http://www.ncbi.nlm.nih.gov/Structure.

Subject(s)

Databases, Protein; Structural Homology, Protein; Computer Graphics; Internet; Macromolecular Substances/chemistry; Models, Molecular; Software

19.

MMDB: 3D structures and macromolecular interactions.

Madej, Thomas; Addess, Kenneth J; Fong, Jessica H; Geer, Lewis Y; Geer, Renata C; Lanczycki, Christopher J; Liu, Chunlei; Lu, Shennan; Marchler-Bauer, Aron; Panchenko, Anna R; Chen, Jie; Thiessen, Paul A; Wang, Yanli; Zhang, Dachuan; Bryant, Stephen H.

Nucleic Acids Res ; 40(Database issue): D461-4, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22135289

ABSTRACT

Close to 60% of protein sequences tracked in comprehensive databases can be mapped to a known three-dimensional (3D) structure by standard sequence similarity searches. Potentially, a great deal can be learned about proteins or protein families of interest from considering 3D structure, and to this day 3D structure data may remain an underutilized resource. Here we present enhancements in the Molecular Modeling Database (MMDB) and its data presentation, specifically pertaining to biologically relevant complexes and molecular interactions. MMDB is tightly integrated with NCBI's Entrez search and retrieval system, and mirrors the contents of the Protein Data Bank. It links protein 3D structure data with sequence data, sequence classification resources and PubChem, a repository of small-molecule chemical structures and their biological activities, facilitating access to 3D structure data not only for structural biologists, but also for molecular biologists and chemists. MMDB provides a complete set of detailed and pre-computed structural alignments obtained with the VAST algorithm, and provides visualization tools for 3D structure and structure/sequence alignment via the molecular graphics viewer Cn3D. MMDB can be accessed at http://www.ncbi.nlm.nih.gov/structure.

Subject(s)

Databases, Protein; Models, Molecular; Protein Conformation; Sequence Analysis, Protein

20.

PubChem3D: a new resource for scientists.

Bolton, Evan E; Chen, Jie; Kim, Sunghwan; Han, Lianyi; He, Siqian; Shi, Wenyao; Simonyan, Vahan; Sun, Yan; Thiessen, Paul A; Wang, Jiyao; Yu, Bo; Zhang, Jian; Bryant, Stephen H.

J Cheminform ; 3(1): 32, 2011 Sep 20.

Article in English | MEDLINE | ID: mdl-21933373

ABSTRACT

BACKGROUND: PubChem is an open repository for small molecules and their experimental biological activity. PubChem integrates and provides search, retrieval, visualization, analysis, and programmatic access tools in an effort to maximize the utility of contributed information. There are many diverse chemical structures with similar biological efficacies against targets available in PubChem that are difficult to interrelate using traditional 2-D similarity methods. A new layer called PubChem3D is added to PubChem to assist in this analysis. DESCRIPTION: PubChem generates a 3-D conformer model description for 92.3% of all records in the PubChem Compound database (when considering the parent compound of salts). Each of these conformer models is sampled to remove redundancy, guaranteeing a minimum (non-hydrogen atom pair-wise) RMSD between conformers. A diverse conformer ordering gives a maximal description of the conformational diversity of a molecule when only a subset of available conformers is used. A pre-computed search per compound record gives immediate access to a set of 3-D similar compounds (called "Similar Conformers") in PubChem and their respective superpositions. Systematic augmentation of PubChem resources to include a 3-D layer provides users with new capabilities to search, subset, visualize, analyze, and download data. A series of retrospective studies help to demonstrate important connections between chemical structures and their biological function that are not obvious using 2-D similarity but are readily apparent by 3-D similarity. CONCLUSIONS: The addition of PubChem3D to the existing contents of PubChem is a considerable achievement, given the scope, scale, and the fact that the resource is publicly accessible and free. With the ability to uncover latent structure-activity relationships of chemical structures, while complementing 2-D similarity analysis approaches, PubChem3D represents a new resource for scientists to exploit when exploring the biological annotations in PubChem.
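
The idea of sampling conformers so that a minimum pair-wise RMSD separates them can be sketched as below. This is only an illustration under strong assumptions, not PubChem3D's actual procedure: each conformer is a list of (x, y, z) coordinates for the non-hydrogen atoms in the same atom order, the conformers are taken to be already superposed, and a simple greedy pass decides which ones to keep. The function names `rmsd` and `prune_conformers` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("conformers must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def prune_conformers(conformers, min_rmsd):
    """Greedy filter: keep a conformer only if it lies at least min_rmsd
    away from every conformer that has already been kept."""
    kept = []
    for conformer in conformers:
        if all(rmsd(conformer, other) >= min_rmsd for other in kept):
            kept.append(conformer)
    return kept


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]  # nearly identical to a
    c = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]  # clearly different from a
    print(len(prune_conformers([a, b, c], min_rmsd=0.5)))
```

The greedy result depends on input order; a production pipeline would also need to superpose conformers and account for molecular symmetry before comparing them.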
