Results 1 - 20 of 70
1.
J Cheminform ; 16(1): 69, 2024 Jun 16.
Article in English | MEDLINE | ID: mdl-38880887

ABSTRACT

PubChem ( https://pubchem.ncbi.nlm.nih.gov ) is a public chemical information resource containing more than 100 million unique chemical structures. One of the most requested tasks in PubChem and other chemical databases is to search chemicals by name (also commonly called a "chemical synonym"). PubChem performs this task by looking up chemical synonym-structure associations provided by individual depositors to PubChem. In addition, these synonyms are used for many purposes, including creating links between chemicals and PubMed articles (using Medical Subject Headings (MeSH) terms). However, these depositor-provided name-structure associations are subject to substantial discrepancies within and between depositors, making it difficult to unambiguously map a chemical name to a specific chemical structure. The present paper describes PubChem's crowdsourcing-based synonym filtering strategy, which resolves inter- and intra-depositor discrepancies in synonym-structure associations as well as in the chemical-MeSH associations. The PubChem synonym filtering process was developed based on the analysis of four crowd-voting strategies, which differ in the consistency threshold value employed (60% vs 70%) and how to resolve intra-depositor discrepancies (a single vote vs. multiple votes per depositor) prior to inter-depositor crowd-voting. The agreement of voting was determined at six levels of chemical equivalency, which considers varying isotopic composition, stereochemistry, and connectivity of chemical structures and their primary components. While all four strategies showed comparable results, Strategy I (one vote per depositor with a 60% consistency threshold) resulted in the most synonyms assigned to a single chemical structure as well as the most synonym-structure associations disambiguated at the six chemical equivalency contexts. 
Based on the results of this study, Strategy I was implemented in PubChem's filtering process that cleans up synonym-structure associations as well as chemical-MeSH associations. This consistency-based filtering process is designed to look for a consensus in name-structure associations but cannot attest to their correctness. As a result, it can fail to recognize correct name-structure associations (or incorrect ones), for example, when a synonym is provided by only one depositor or when many contributors are incorrect. However, this filtering process is an important starting point for quality control in name-structure associations in large chemical databases like PubChem.
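
The Strategy I vote (one vote per depositor, 60% consistency threshold) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PubChem's implementation: the `(depositor, synonym, structure_id)` triples and all identifiers are hypothetical, structures count as equivalent only when their IDs are identical (so the six chemical equivalency levels are ignored), and a depositor that links one synonym to several structures is assumed to vote for the structure it links most often, abstaining on a tie.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (depositor, synonym, structure_id); all identifiers are hypothetical.
Association = Tuple[str, str, str]


def depositor_votes(assocs: List[Association], synonym: str) -> List[str]:
    """Collapse each depositor's links for `synonym` into at most one vote.

    Assumption: a depositor votes for the structure it links most often;
    an exact tie inside one depositor counts as an abstention.
    """
    per_depositor: Dict[str, Counter] = {}
    for depositor, syn, structure in assocs:
        if syn == synonym:
            per_depositor.setdefault(depositor, Counter())[structure] += 1
    votes: List[str] = []
    for counts in per_depositor.values():
        ranked = counts.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # intra-depositor tie: no vote
        votes.append(ranked[0][0])
    return votes


def resolve_synonym(assocs: List[Association], synonym: str,
                    threshold: float = 0.6) -> Optional[str]:
    """Return the consensus structure if its share of votes reaches `threshold`."""
    votes = depositor_votes(assocs, synonym)
    if not votes:
        return None
    structure, count = Counter(votes).most_common(1)[0]
    return structure if count / len(votes) >= threshold else None


data = [
    ("dep1", "aspirin", "S1"), ("dep1", "aspirin", "S1"), ("dep1", "aspirin", "S2"),
    ("dep2", "aspirin", "S1"),
    ("dep3", "aspirin", "S2"),
]
print(resolve_synonym(data, "aspirin"))       # S1: 2 of 3 votes (67% >= 60%)
print(resolve_synonym(data, "aspirin", 0.7))  # None: 67% is below a 70% threshold
```

Running the same data at 60% and 70% shows how the choice of consistency threshold alone can decide whether a synonym is assigned to a single structure.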

2.
Environ Sci Technol ; 58(9): 4181-4192, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38373301

ABSTRACT

Alzheimer's disease (AD) is a complex and multifactorial neurodegenerative disease, which is currently diagnosed via clinical symptoms and nonspecific biomarkers (such as Aβ1-42, t-Tau, and p-Tau) measured in cerebrospinal fluid (CSF), which alone do not provide sufficient insights into disease progression. In this pilot study, these biomarkers were complemented with small-molecule analysis using non-target high-resolution mass spectrometry coupled with liquid chromatography (LC) on the CSF of three groups: AD, mild cognitive impairment (MCI) due to AD, and a non-demented (ND) control group. An open-source cheminformatics pipeline based on MS-DIAL and patRoon was enhanced using CSF- and AD-specific suspect lists to assist in data interpretation. Chemical Similarity Enrichment Analysis revealed a significant increase of hydroxybutyrates in AD, including 3-hydroxybutanoic acid, which was found at higher levels in AD compared to MCI and ND. Furthermore, a highly sensitive target LC-MS method was used to quantify 35 bile acids (BAs) in the CSF, revealing several statistically significant differences including higher dehydrolithocholic acid levels and decreased conjugated BA levels in AD. This work provides several promising small-molecule hypotheses that could be used to help track the progression of AD in CSF samples.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Neurodegenerative Diseases , Humans , Alzheimer Disease/cerebrospinal fluid , Alzheimer Disease/diagnosis , Alzheimer Disease/psychology , tau Proteins/cerebrospinal fluid , Amyloid beta-Peptides/cerebrospinal fluid , Pilot Projects , Cognitive Dysfunction/cerebrospinal fluid , Cognitive Dysfunction/diagnosis , Cognitive Dysfunction/psychology , Biomarkers , Disease Progression
3.
Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
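
The E-utilities are plain HTTP endpoints. As an illustrative sketch (not taken from the paper), the Python below builds an ESearch request URL for PubMed with only the standard library; it sends nothing over the network, and the helper name `esearch_url` and the search term are arbitrary examples.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL (hypothetical helper); it only constructs the string."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "sialic acid"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=sialic+acid&retmax=20&retmode=json
```

Fetching such a URL returns a JSON document listing matching PubMed IDs; NCBI's usage guidelines (request rate limits, optional API key) apply to real requests.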


Subjects
Databases, Genetic , National Library of Medicine (U.S.) , Biotechnology/instrumentation , Databases, Nucleic Acid , Internet , United States
4.
J Cheminform ; 15(1): 123, 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38115123

ABSTRACT

Knowledge about the 3-dimensional structure, orientation and interaction of chemical compounds is important in many areas of science and technology. X-ray crystallography is one of the experimental techniques capable of providing a large amount of structural information for a given compound, and it is widely used for characterisation of organic and metal-organic molecules. The method provides precise 3D coordinates of atoms inside crystals, however, it does not directly deliver information about certain chemical characteristics such as bond orders, delocalization, charges, lone electron pairs or lone electrons. These aspects of a molecular model have to be derived from crystallographic data using refined information about interatomic distances and atom types as well as employing general chemical knowledge. This publication describes a curated automatic pipeline for the derivation of chemical attributes of molecules from crystallographic models. The method is applied to build a catalogue of chemical entities in an open-access crystallographic database, the Crystallography Open Database (COD). The catalogue of such chemical entities is provided openly as a derived database. The content of this catalogue and the problems arising in the fully automated pipeline are discussed, along with the possibilities to introduce manual data curation into the process.
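
To show why bond orders must be derived rather than read directly from coordinates, here is a deliberately naive Python sketch that guesses a carbon-carbon bond type from an interatomic distance alone. It is not the COD pipeline; the reference lengths are approximate textbook values chosen for illustration, and `guess_cc_bond_type` is a hypothetical helper.

```python
from typing import Dict

# Approximate textbook C-C bond lengths in angstroms.
# Illustrative values only; they are not parameters of the COD pipeline.
CC_REFERENCE_LENGTHS: Dict[str, float] = {
    "single": 1.54,
    "aromatic": 1.39,
    "double": 1.34,
    "triple": 1.20,
}


def guess_cc_bond_type(distance: float) -> str:
    """Return the bond type whose reference length is closest to `distance`."""
    return min(CC_REFERENCE_LENGTHS,
               key=lambda kind: abs(CC_REFERENCE_LENGTHS[kind] - distance))


for d in (1.53, 1.40, 1.335, 1.21):
    print(d, guess_cc_bond_type(d))
```

Because aromatic and double C-C lengths differ by only about 0.05 Å, and experimental structures carry measurement error, distance alone leaves many cases ambiguous; this is why the abstract stresses atom types and general chemical knowledge.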

5.
Environ Sci Technol ; 57(44): 16918-16928, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37871188

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) are of high concern, with calls to regulate them as a class. In 2021, the Organisation for Economic Co-operation and Development (OECD) revised the definition of PFAS to include any chemical containing at least one saturated CF2 or CF3 moiety. The consequence is that one of the largest open chemical collections, PubChem, with 116 million compounds, now contains over 7 million PFAS under this revised definition. These numbers are several orders of magnitude higher than previously established PFAS lists (typically thousands of entries) and pose an incredible challenge to researchers and computational workflows alike. This article describes a dynamic, openly accessible effort to navigate and explore the >7 million PFAS and >21 million fluorinated compounds (September 2023) in PubChem by establishing the "PFAS and Fluorinated Compounds in PubChem" Classification Browser (or "PubChem PFAS Tree"). A total of 36500 nodes support browsing of the content according to several categories, including classification, structural properties, regulatory status, or presence in existing PFAS suspect lists. Additional annotation and associated data can be used to create subsets (and thus manageable suspect lists or databases) of interest for a wide range of environmental, regulatory, exposomics, and other applications.
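
A minimal sketch of the structural test behind the revised definition, reading "saturated CF2 or CF3 moiety" as a carbon with four single-bonded neighbours, two or three of them fluorine, and no H, Cl, Br or I attached. That reading, the toy molecule representation, and the function name are assumptions for illustration; the PubChem PFAS Tree's own classification logic is not shown, and production workflows would use a cheminformatics toolkit instead.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal molecular graph: element symbol per atom index and
# bonds as (atom_a, atom_b, bond_order). Hydrogens must be explicit.
Molecule = Tuple[List[str], List[Tuple[int, int, int]]]

# Atoms that must not sit on the fluorinated carbon (assumed reading of the definition).
DISQUALIFYING = {"H", "Cl", "Br", "I"}


def has_saturated_cf2_or_cf3(mol: Molecule) -> bool:
    """True if some saturated carbon bears 2 or 3 F atoms and no H, Cl, Br or I."""
    elements, bonds = mol
    neighbors: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(len(elements))}
    for a, b, order in bonds:
        neighbors[a].append((b, order))
        neighbors[b].append((a, order))
    for atom, element in enumerate(elements):
        nbrs = neighbors[atom]
        # Saturated carbon: four neighbours, all attached by single bonds.
        if element != "C" or len(nbrs) != 4 or any(o != 1 for _, o in nbrs):
            continue
        nbr_elements = [elements[j] for j, _ in nbrs]
        if nbr_elements.count("F") in (2, 3) and not (DISQUALIFYING & set(nbr_elements)):
            return True
    return False


# Trifluoroacetic acid, CF3-C(=O)-OH, with explicit H.
tfa = (["C", "F", "F", "F", "C", "O", "O", "H"],
       [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (4, 5, 2), (4, 6, 1), (6, 7, 1)])
# Difluoromethane, CH2F2: the fluorinated carbon also carries H, so it does not qualify.
ch2f2 = (["C", "F", "F", "H", "H"], [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)])
print(has_saturated_cf2_or_cf3(tfa), has_saturated_cf2_or_cf3(ch2f2))  # True False
```

A test this permissive is what turns millions of PubChem compounds into PFAS, which is the scale problem the classification browser addresses.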


Subjects
Fluorocarbons , Water Pollutants, Chemical , Databases, Factual , Trees
6.
Environ Sci Technol Lett ; 10(10): 865-871, 2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37840815

ABSTRACT

Transformation product (TP) information is essential to accurately evaluate the hazards compounds pose to human health and the environment. However, information about TPs is often limited, and existing data is often not fully Findable, Accessible, Interoperable, and Reusable (FAIR). FAIRifying existing TP knowledge is a relatively easy path toward improving access to data for identification workflows and for machine-learning-based algorithms. ShinyTPs was developed to curate existing transformation information derived from text-mined data within the PubChem database. The application (available as an R package) visualizes the text-mined chemical names to facilitate the user validation of the automatically extracted reactions. ShinyTPs was applied to a case study using 436 tentatively identified compounds to prioritize TP retrieval. This resulted in the extraction of 645 reactions (associated with 496 compounds), of which 319 were not previously available in PubChem. The curated reactions were added to the PubChem Transformations library, which was used as a TP suspect list for identification of TPs using the open-source workflow patRoon. In total, 72 compounds from the library were tentatively identified, 18% of which were curated using ShinyTPs, showing that the app can help support TP identification in non-target analysis workflows.

7.
Environ Sci Process Impacts ; 25(11): 1788-1801, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37431591

ABSTRACT

The term "exposome" is defined as a comprehensive study of life-course environmental exposures and the associated biological responses. Humans are exposed to many different chemicals, which can pose a major threat to the well-being of humanity. Targeted or non-targeted mass spectrometry techniques are widely used to identify and characterize various environmental stressors when linking exposures to human health. However, identification remains challenging due to the huge chemical space applicable to exposomics, combined with the lack of sufficient relevant entries in spectral libraries. Addressing these challenges requires cheminformatics tools and database resources to share curated open spectral data on chemicals to improve the identification of chemicals in exposomics studies. This article describes efforts to contribute spectra relevant for exposomics to the open mass spectral library MassBank (https://www.massbank.eu) using various open source software efforts, including the R packages RMassBank and Shinyscreen. The experimental spectra were obtained from ten mixtures containing toxicologically relevant chemicals from the US Environmental Protection Agency (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT). Following processing and curation, 5582 spectra from 783 of the 1268 ENTACT compounds were added to MassBank, and through this to other open spectral libraries (e.g., MoNA, GNPS) for community benefit. Additionally, an automated deposition and annotation workflow was developed with PubChem to enable the display of all MassBank mass spectra in PubChem, which is rerun with each MassBank release. The new spectral records have already been used in several studies to increase the confidence in identification in non-target small molecule identification workflows applied to environmental and exposomics research.


Subjects
Environmental Exposure , Software , Humans , Mass Spectrometry/methods , Environmental Exposure/analysis , Databases, Factual
8.
Glycobiology ; 33(6): 454-463, 2023 Jun 21.
Article in English | MEDLINE | ID: mdl-37129482

ABSTRACT

The GlyCosmos Glycoscience Portal (https://glycosmos.org) and PubChem (https://pubchem.ncbi.nlm.nih.gov/) are major portals for glycoscience and chemistry, respectively. GlyCosmos is a portal for glycan-related repositories, including GlyTouCan, GlycoPOST, and UniCarb-DR, as well as for glycan-related data resources that have been integrated from a variety of 'omics databases. Glycogenes, glycoproteins, lectins, pathways, and disease information related to glycans are accessible from GlyCosmos. PubChem, on the other hand, is a chemistry-based portal at the National Center for Biotechnology Information. PubChem provides information not only on chemicals, but also genes, proteins, pathways, as well as patents, bioassays, and more, from hundreds of data resources from around the world. In this work, these 2 portals have made substantial efforts to integrate their complementary data to allow users to cross between these 2 domains. In addition to glycan structures, key information, such as glycan-related genes, relevant diseases, glycoproteins, and pathways, was integrated and cross-linked with one another. The interfaces were designed to enable users to easily find, access, download, and reuse data of interest across these resources. Use cases are described illustrating and highlighting the type of content that can be investigated. In total, these integrations provide life science researchers improved awareness and enhanced access to glycan-related information.


Subjects
Databases, Chemical , Polysaccharides , Glycosylation , Workflow , Informatics , Polysaccharides/chemistry , Glycoconjugates/chemistry
9.
Glycobiology ; 33(2): 99-103, 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36648443

ABSTRACT

Nonulosonic acids or non-2-ulosonic acids (NulOs) are an ancient family of 2-ketoaldonic acids (α-ketoaldonic acids) with a 9-carbon backbone. In nature, these monosaccharides occur either in a 3-deoxy form (referred to as "sialic acids") or in a 3,9-dideoxy "sialic-acid-like" form. The former sialic acids are most common in the deuterostome lineage, including vertebrates, and mimicked by some of their pathogens. The latter sialic-acid-like molecules are found in bacteria and archaea. NulOs are often prominently positioned at the outermost tips of cell surface glycans, and have many key roles in evolution, biology and disease. The diversity of stereochemistry and structural modifications among the NulOs contributes to more than 90 sialic acid forms and 50 sialic-acid-like variants described thus far in nature. This paper reports the curation of these diverse naturally occurring NulOs at the NCBI sialic acid page (https://www.ncbi.nlm.nih.gov/glycans/sialic.html) as part of the NCBI-Glycans initiative. This includes external links to relevant Carbohydrate Structure Databases. As the amino and hydroxyl groups of these monosaccharides are extensively derivatized by various substituents in nature, the Symbol Nomenclature For Glycans (SNFG) rules have been expanded to represent this natural diversity. These developments help illustrate the natural diversity of sialic acids and related NulOs, and enable their systematic representation in publications and online resources.


Subjects
N-Acetylneuraminic Acid , Sialic Acids , Animals , Sialic Acids/chemistry , Polysaccharides/chemistry , Monosaccharides , Cataloging
10.
Nucleic Acids Res ; 51(D1): D1373-D1380, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36305812

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
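
PUG-REST encodes each request in the URL path. A minimal Python sketch, with hypothetical helper names, builds two such URLs: one resolving a chemical name to PubChem CIDs and one retrieving computed properties for CID 2244 (aspirin). The 'standardize' option described above is not shown; its exact URL form should be taken from the PUG-REST documentation.

```python
from typing import List
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    """PUG-REST URL that resolves a chemical name to matching CIDs (JSON)."""
    return f"{PUG_REST}/compound/name/{quote(name)}/cids/JSON"


def properties_url(cid: int, properties: List[str]) -> str:
    """PUG-REST URL for selected computed properties of one CID (JSON)."""
    return f"{PUG_REST}/compound/cid/{cid}/property/{','.join(properties)}/JSON"


print(name_to_cids_url("acetylsalicylic acid"))
print(properties_url(2244, ["MolecularFormula", "MolecularWeight"]))
```

Percent-encoding the name keeps spaces and other special characters from breaking the URL path.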


Subjects
Databases, Chemical , Drug Discovery , Drug Discovery/methods , Biological Assay , Proteins , Cheminformatics
11.
Nucleic Acids Res ; 51(D1): D29-D38, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36370100

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subjects
Databases, Genetic , Databases, Nucleic Acid , United States , National Library of Medicine (U.S.) , Sequence Alignment , Biotechnology , Internet
12.
Environ Sci Eur ; 34(1): 104, 2022.
Article in English | MEDLINE | ID: mdl-36284750

ABSTRACT

Background: The NORMAN Association (https://www.norman-network.com/) initiated the NORMAN Suspect List Exchange (NORMAN-SLE; https://www.norman-network.com/nds/SLE/) in 2015, following the NORMAN collaborative trial on non-target screening of environmental water samples by mass spectrometry. Since then, this exchange of information on chemicals that are expected to occur in the environment, along with the accompanying expert knowledge and references, has become a valuable knowledge base for "suspect screening" lists. The NORMAN-SLE now serves as a FAIR (Findable, Accessible, Interoperable, Reusable) chemical information resource worldwide.
Results: The NORMAN-SLE contains 99 separate suspect list collections (as of May 2022) from over 70 contributors around the world, totalling over 100,000 unique substances. The substance classes include per- and polyfluoroalkyl substances (PFAS), pharmaceuticals, pesticides, natural toxins, high production volume substances covered under the European REACH regulation (EC: 1272/2008), priority contaminants of emerging concern (CECs) and regulatory lists from NORMAN partners. Several lists focus on transformation products (TPs) and complex features detected in the environment with various levels of provenance and structural information. Each list is available for separate download. The merged, curated collection is also available as the NORMAN Substance Database (NORMAN SusDat). Both the NORMAN-SLE and NORMAN SusDat are integrated within the NORMAN Database System (NDS). The individual NORMAN-SLE lists receive digital object identifiers (DOIs) and traceable versioning via a Zenodo community (https://zenodo.org/communities/norman-sle), with a total of > 40,000 unique views, > 50,000 unique downloads and 40 citations (May 2022). NORMAN-SLE content is progressively integrated into large open chemical databases such as PubChem (https://pubchem.ncbi.nlm.nih.gov/) and the US EPA's CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/), enabling further access to these lists, along with the additional functionality and calculated properties these resources offer. PubChem has also integrated significant annotation content from the NORMAN-SLE, including a classification browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=101).
Conclusions: The NORMAN-SLE offers a specialized service for hosting suspect screening lists of relevance for the environmental community in an open, FAIR manner that allows integration with other major chemical resources. These efforts foster the exchange of information between scientists and regulators, supporting the paradigm shift to the "one substance, one assessment" approach. New submissions are welcome via the contacts provided on the NORMAN-SLE website (https://www.norman-network.com/nds/SLE/).
Supplementary Information: The online version contains supplementary material available at 10.1186/s12302-022-00680-6.
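
Merging many suspect lists into one deduplicated collection comes down to keying every entry on a shared substance identifier. The sketch below assumes InChIKeys serve as that key and uses hypothetical list names with placeholder keys; it does not reproduce the expert curation behind NORMAN SusDat, only the bookkeeping of which lists contain which substance.

```python
from typing import Dict, List, Set


def merge_suspect_lists(lists: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each unique substance key to the names of the lists that contain it."""
    merged: Dict[str, Set[str]] = {}
    for list_name, keys in lists.items():
        for key in keys:
            merged.setdefault(key, set()).add(list_name)
    return merged


# Hypothetical list names and placeholder keys (stand-ins for InChIKeys).
lists = {
    "listA": ["KEY-1", "KEY-2"],
    "listB": ["KEY-2", "KEY-3"],
}
merged = merge_suspect_lists(lists)
print(len(merged))              # 3 unique substances
print(sorted(merged["KEY-2"]))  # ['listA', 'listB']
```

Keeping the list membership for each key preserves provenance, so a merged entry can still be traced back to every contributing list.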

13.
Anal Bioanal Chem ; 414(25): 7399-7419, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35829770

ABSTRACT

Parkinson's disease (PD) is the second most prevalent neurodegenerative disease, with an increasing incidence in recent years due to the aging population. Genetic mutations alone only explain <10% of PD cases, while environmental factors, including small molecules, may play a significant role in PD. In the present work, 22 plasma (11 PD, 11 control) and 19 feces samples (10 PD, 9 control) were analyzed by non-target high-resolution mass spectrometry (NT-HRMS) coupled to two liquid chromatography (LC) methods (reversed-phase (RP) and hydrophilic interaction liquid chromatography (HILIC)). A cheminformatics workflow was optimized using open software (MS-DIAL and patRoon) and open databases (all public MSP-formatted spectral libraries for MS-DIAL, PubChemLite for Exposomics, and the LITMINEDNEURO list for patRoon). Furthermore, five disease-specific databases and three suspect lists (on PD and related disorders) were developed, using PubChem functionality to identify relevant unknown chemicals. The results showed that non-target screening with the larger databases generally provided better results compared with smaller suspect lists. However, two suspect screening approaches with patRoon were also good options to study specific chemicals in PD. The combination of chromatographic methods (RP and HILIC) as well as two ionization modes (positive and negative) enhanced the coverage of chemicals in the biological samples. While most metabolomics studies in PD have focused on blood and cerebrospinal fluid, we found a higher number of relevant features in feces, such as alanine betaine or nicotinamide, which can be directly metabolized by gut microbiota. This highlights the potential role of gut dysbiosis in PD development.


Subjects
Exposome , Neurodegenerative Diseases , Parkinson Disease , Aged , Alanine , Betaine , Cheminformatics , Humans , Metabolome , Metabolomics/methods , Niacinamide , Pilot Projects
14.
J Chem Inf Model ; 62(11): 2737-2743, 2022 Jun 13.
Article in English | MEDLINE | ID: mdl-35559614

ABSTRACT

CAS Common Chemistry (https://commonchemistry.cas.org/) is an open web resource that provides access to reliable chemical substance information for the scientific community. Having served millions of visitors since its creation in 2009, the resource was extensively updated in 2021 with significant enhancements. The underlying dataset was expanded from 8000 to 500,000 chemical substances and includes additional associated information, such as basic properties and computer-readable chemical structure information. New use cases are supported with enhanced search capabilities and an integrated application programming interface. Reusable licensing of the content is provided through a Creative Commons Attribution-Non-Commercial (CC-BY-NC 4.0) license allowing other public resources to integrate the data into their systems. This paper provides an overview of the enhancements to data and functionality, discusses the benefits of the contribution to the chemistry community, and summarizes recent progress in leveraging this resource to strengthen other information sources.
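
Substances in CAS Common Chemistry are identified by CAS Registry Numbers, whose final digit is a checksum. A small Python sketch (the function name is hypothetical, and it is independent of the resource's API) validates a CAS RN locally before any lookup: each digit of the first two groups is weighted 1, 2, 3, ... counting from the right, and the weighted sum modulo 10 must equal the check digit.

```python
import re


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_cas_rn("7732-18-5"))  # water: True
print(is_valid_cas_rn("7732-18-4"))  # wrong check digit: False
```

For water, 7732-18-5: 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5 matches the check digit.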


Subjects
Software
15.
J Mol Biol ; 434(11): 167514, 2022 Jun 15.
Article in English | MEDLINE | ID: mdl-35227770

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical database at the U.S. National Institutes of Health. Visited by millions of users every month, it plays a role as a key chemical information resource for biomedical research communities. Data in PubChem is from hundreds of contributors and organized into multiple collections by record type. Among these are the Protein, Gene, Pathway, and Taxonomy data collections. Records in these collections contain information on chemicals related to a given biological target (i.e., protein, gene, pathway, or taxon), helping users to analyze and interpret the biological activity data of molecules. In addition, annotations about the biological targets are collected from authoritative or curated data sources and integrated into the four collections. The content can be programmatically accessed through PubChem's web service interfaces (including PUG View). A machine-readable representation of this content is also provided within PubChemRDF.


Subjects
Databases, Chemical , Biology , Drug Discovery , Proteins/genetics
16.
Methods Mol Biol ; 2443: 511-525, 2022.
Article in English | MEDLINE | ID: mdl-35037224

ABSTRACT

Plant Reactome (https://plantreactome.gramene.org) and PubChem ( https://pubchem.ncbi.nlm.nih.gov ) are two reference data portals and resources for curated plant pathways, small molecules, metabolites, gene products, and macromolecular interactions. Plant Reactome knowledgebase, a conceptual plant pathway network, is built by biocuration and integrating (bio)chemical entities, gene products, and macromolecular interactions. It provides manually curated pathways for the reference species Oryza sativa (rice) and gene orthology-based projections that extend pathway knowledge to 106 plant species. Currently, it hosts 320 reference pathways for plant metabolism, hormone signaling, transport, genetic regulation, plant organ development and differentiation, and biotic and abiotic stress responses. In addition to the pathway browsing and search functions, the Plant Reactome provides the analysis tools for pathway comparison between reference and projected species, pathway enrichment in gene expression data, and overlay of gene-gene interaction data on pathways. PubChem, a popular reference database of (bio)chemical entities, provides information on small molecules and other types of chemical entities, such as siRNAs, miRNAs, lipids, carbohydrates, and chemically modified nucleotides. The data in PubChem is collected from hundreds of data sources, including Plant Reactome. This chapter provides a brief overview of the Plant Reactome and the PubChem knowledgebases, their association to other public resources providing accessory information, and how users can readily access the contents.
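
Pathway enrichment in gene expression data asks whether a pathway holds more of the selected (for example, differentially expressed) genes than chance would predict. The chapter does not state which test Plant Reactome applies, so the Python below uses a one-sided hypergeometric test, a common choice, purely as an assumption; the function name is hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(total_genes: int, pathway_genes: int,
                                selected_genes: int, overlap: int) -> float:
    """P(X >= overlap) when `selected_genes` are drawn without replacement from
    `total_genes`, of which `pathway_genes` belong to the pathway."""
    denom = comb(total_genes, selected_genes)
    upper = min(pathway_genes, selected_genes)
    return sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    ) / denom


# 10 genes in total, 3 in the pathway, 3 selected, all 3 selected genes in the pathway.
print(round(hypergeometric_enrichment_p(10, 3, 3, 3), 6))  # 0.008333 (= 1/120)
```

In practice the p-values for many pathways would also be corrected for multiple testing.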


Subjects
Knowledge Bases , Metabolic Networks and Pathways , Databases, Factual , Plants/genetics , Plants/metabolism , Proteins/metabolism
17.
Nucleic Acids Res ; 50(D1): D20-D26, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34850941

ABSTRACT

The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, RefSeq, SRA, Virus, dbSNP, dbVar, ClinicalTrials.gov, MMDB, iCn3D and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subjects
Biotechnology/trends , Databases, Genetic/trends , Databases, Chemical , Databases, Nucleic Acid , Databases, Protein , Humans , Internet , National Library of Medicine (U.S.) , PubMed , United States
18.
Environ Int ; 158: 106885, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34560325

ABSTRACT

The diversity of hundreds of thousands of potential organic pollutants and the lack of (publicly available) information about many of them is a huge challenge for environmental sciences, engineering, and regulation. Suspect screening based on high-resolution liquid chromatography-mass spectrometry (LC-HRMS) has enormous potential to help characterize the presence of these chemicals in our environment, enabling the detection of known and newly emerging pollutants, as well as their potential transformation products (TPs). Here, suspect list creation (focusing on pesticides relevant for Luxembourg, incorporating data sources in 4 languages) was coupled to an automated retrieval of related TPs from PubChem based on high confidence suspect hits, to screen for pesticides and their TPs in Luxembourgish river samples. A computational workflow was established to combine LC-HRMS analysis and pre-screening of the suspects (including automated quality control steps), with spectral annotation to determine which pesticides and, in a second step, their related TPs may be present in the samples. The data analysis with Shinyscreen (https://gitlab.lcsb.uni.lu/eci/shinyscreen/), open source software developed in house, coupled with custom-made scripts, revealed the presence of 162 potential pesticide masses and 96 potential TP masses in the samples. Further identification of these mass matches was performed using the open source approach MetFrag (https://msbi.ipb-halle.de/MetFrag/). Subsequent target analysis of 36 suspects resulted in 31 pesticides and TPs confirmed at Level-1 (highest confidence), and five pesticides and TPs not confirmed due to different retention times. Spatio-temporal analysis of the results showed that TPs and pesticides followed similar trends, with a maximum number of potential detections in July. The highest detections were in the rivers Alzette and Mess and the lowest in the Sûre and Eisch.
This study (a) added pesticides, classification information and related TPs into the open domain, (b) developed automated open source retrieval methods - both enhancing FAIRness (Findability, Accessibility, Interoperability and Reusability) of the data and methods; and (c) will directly support "L'Administration de la Gestion de l'Eau" on further monitoring steps in Luxembourg.
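
The "potential pesticide masses" above come from matching measured m/z values against suspect masses within a tolerance. A minimal Python sketch of that matching step, assuming [M+H]+ ions and a 5 ppm window (both assumptions, not the study's settings), follows; it is not the Shinyscreen implementation. The suspect masses are neutral monoisotopic masses computed from standard isotope masses.

```python
from typing import Dict, List, Tuple

PROTON_MASS = 1.007276  # Da, added to a neutral mass to get [M+H]+


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_suspects(observed_mz: List[float], suspects: Dict[str, float],
                   tol_ppm: float = 5.0) -> List[Tuple[float, str, float]]:
    """Match observed [M+H]+ m/z values to suspects given neutral monoisotopic masses."""
    hits = []
    for mz in observed_mz:
        for name, neutral_mass in suspects.items():
            error = ppm_error(mz, neutral_mass + PROTON_MASS)
            if abs(error) <= tol_ppm:
                hits.append((mz, name, round(error, 2)))
    return hits


SUSPECTS = {
    "atrazine": 215.093773,  # C8H14ClN5
    "simazine": 201.078123,  # C7H12ClN5
}
# Only 216.1012 falls inside 5 ppm (atrazine, about 0.7 ppm); 202.0870 is about 8 ppm off.
print(match_suspects([216.1012, 202.0870, 300.0], SUSPECTS))
```

A mass match alone is only a tentative hit; as the abstract describes, MS/MS annotation (MetFrag) and target analysis with reference standards are needed for higher confidence.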


Subjects
Pesticides , Water Pollutants, Chemical , Cheminformatics , Luxembourg , Pesticides/analysis , Rivers , Water Pollutants, Chemical/analysis
19.
Front Res Metr Anal ; 6: 689059, 2021.
Article in English | MEDLINE | ID: mdl-34322655

ABSTRACT

The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, summarizing these in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.
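
Co-occurrence analysis starts from counting how often two entities are mentioned in the same abstract, relative to how often each appears alone. The abstract does not name PubChem's statistic, so the Python below swaps in pointwise mutual information (PMI), a common association measure, purely for illustration; the entity labels and function name are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Set, Tuple


def cooccurrence_pmi(docs: List[Tuple[Set[str], Set[str]]]) -> Dict[Tuple[str, str], float]:
    """Pointwise mutual information (log2) for every co-mentioned (chemical, disease) pair.

    Each element of `docs` holds the chemical and disease entities found in one abstract.
    """
    n = len(docs)
    chem_count: Counter = Counter()
    disease_count: Counter = Counter()
    pair_count: Counter = Counter()
    for chemicals, diseases in docs:
        chem_count.update(chemicals)
        disease_count.update(diseases)
        pair_count.update(product(chemicals, diseases))
    return {
        (c, d): math.log2(k * n / (chem_count[c] * disease_count[d]))
        for (c, d), k in pair_count.items()
    }


# Hypothetical entity sets for four abstracts.
docs = [({"chemA"}, {"diseaseX"}), ({"chemA"}, {"diseaseX"}),
        ({"chemB"}, {"diseaseY"}), ({"chemA", "chemB"}, {"diseaseX"})]
print(cooccurrence_pmi(docs)[("chemB", "diseaseY")])  # 1.0
```

Raw PMI favors rare entities, which is one of the pitfalls of fully automated scoring that the abstract alludes to; the relevance scoring and redundancy removal described in the paper are not reproduced here.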

20.
Chem Teach Int ; 3(1): 57-65, 2021 Mar.
Article in English | MEDLINE | ID: mdl-34268481

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is one of the top five most visited chemistry web sites in the world, with more than five million unique users per month (as of March 2020). Many of these users are educators, undergraduate students, and graduate students at academic institutions. Therefore, PubChem has a great potential as an online resource for chemical education. This paper describes the PubChem Periodic Table and Element pages, which were recently introduced to celebrate the 150th anniversary of the periodic table. These services help users navigate the abundant chemical element data available within PubChem, while providing a convenient entry point to explore additional chemical content, such as biological activities and health and safety data available in PubChem Compound pages for specific elements and their isotopes. The PubChem Periodic Table and Element pages are also available as widgets, which enable web developers to display PubChem's element data on web pages they design. The elemental data can be downloaded in common file formats and imported into data analysis programs (e.g., spreadsheet software, like Microsoft Excel and Google Sheets, and computer scripts, such as Python and R). Overall, the PubChem Periodic Table and Element pages improve access to chemical element data from authoritative sources.
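
Once element data has been downloaded as CSV, a short script can put it to work. The Python sketch below assumes a file with `Symbol` and `AtomicMass` columns (the column names are an assumption; check the header of the actual download) and uses an inline three-row sample in place of the real file, so it runs without network access. It then computes the molar mass of water from standard atomic weights.

```python
import csv
import io
from typing import Dict

# Inline sample standing in for a downloaded element table.
# Column names are assumptions, not the documented download schema.
SAMPLE_CSV = """AtomicNumber,Symbol,Name,AtomicMass
1,H,Hydrogen,1.008
6,C,Carbon,12.011
8,O,Oxygen,15.999
"""


def load_atomic_masses(csv_text: str) -> Dict[str, float]:
    """Map element symbol to atomic mass from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Symbol"]: float(row["AtomicMass"]) for row in reader}


masses = load_atomic_masses(SAMPLE_CSV)
water = 2 * masses["H"] + masses["O"]
print(round(water, 3))  # 18.015
```

To use a real download, read the file's text (for example with `open(path).read()`) and pass it to the same function.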
