Results 1 - 15 of 15
1.
Nucleic Acids Res ; 51(D1): D1373-D1380, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36305812

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
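
PUG-REST requests are plain URLs built from an input part, an operation, and an output format. The sketch below assembles such a URL for a simple compound property lookup. The helper name `pug_rest_url` is an illustration and not part of PubChem, and the block does not cover the 'standardize' option or the target-centric downloads, whose exact paths are not given in the abstract.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(domain, namespace, identifier, operation, output="JSON"):
    """Assemble a PUG-REST URL from its input, operation, and output parts.

    Hypothetical helper for illustration; it only builds the string and
    performs no network request.
    """
    # Percent-encode the identifier so names containing spaces or slashes
    # stay inside a single path segment.
    encoded = quote(str(identifier), safe="")
    return "/".join([PUG_REST_BASE, domain, namespace, encoded, operation, output])


url = pug_rest_url("compound", "name", "aspirin", "property/MolecularFormula")
print(url)
```

The returned string can be passed to any HTTP client. Keeping URL construction separate from the request makes the pattern easy to check without network access.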


Subjects
Databases, Chemical , Drug Discovery , Drug Discovery/methods , Biological Assay , Proteins , Cheminformatics
2.
Environ Sci Technol ; 58(9): 4181-4192, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38373301

ABSTRACT

Alzheimer's disease (AD) is a complex and multifactorial neurodegenerative disease, which is currently diagnosed via clinical symptoms and nonspecific biomarkers (such as Aβ1-42, t-Tau, and p-Tau) measured in cerebrospinal fluid (CSF), which alone do not provide sufficient insights into disease progression. In this pilot study, these biomarkers were complemented with small-molecule analysis using non-target high-resolution mass spectrometry coupled with liquid chromatography (LC) on the CSF of three groups: AD, mild cognitive impairment (MCI) due to AD, and a non-demented (ND) control group. An open-source cheminformatics pipeline based on MS-DIAL and patRoon was enhanced using CSF- and AD-specific suspect lists to assist in data interpretation. Chemical Similarity Enrichment Analysis revealed a significant increase of hydroxybutyrates in AD, including 3-hydroxybutanoic acid, which was found at higher levels in AD compared to MCI and ND. Furthermore, a highly sensitive target LC-MS method was used to quantify 35 bile acids (BAs) in the CSF, revealing several statistically significant differences including higher dehydrolithocholic acid levels and decreased conjugated BA levels in AD. This work provides several promising small-molecule hypotheses that could be used to help track the progression of AD in CSF samples.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Neurodegenerative Diseases , Humans , Alzheimer Disease/cerebrospinal fluid , Alzheimer Disease/diagnosis , Alzheimer Disease/psychology , tau Proteins/cerebrospinal fluid , Amyloid beta-Peptides/cerebrospinal fluid , Pilot Projects , Cognitive Dysfunction/cerebrospinal fluid , Cognitive Dysfunction/diagnosis , Cognitive Dysfunction/psychology , Biomarkers , Disease Progression
3.
Nucleic Acids Res ; 49(D1): D1388-D1395, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33151290

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Property Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).


Subjects
COVID-19/prevention & control , Databases, Chemical , Information Storage and Retrieval/statistics & numerical data , SARS-CoV-2/isolation & purification , User-Computer Interface , COVID-19/epidemiology , COVID-19/virology , Drug Discovery/statistics & numerical data , Epidemics , Humans , Information Storage and Retrieval/methods , Internet , Public Health/statistics & numerical data , SARS-CoV-2/physiology , Software
4.
Anal Bioanal Chem ; 414(25): 7399-7419, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35829770

ABSTRACT

Parkinson's disease (PD) is the second most prevalent neurodegenerative disease, with an increasing incidence in recent years due to the aging population. Genetic mutations alone only explain <10% of PD cases, while environmental factors, including small molecules, may play a significant role in PD. In the present work, 22 plasma (11 PD, 11 control) and 19 feces samples (10 PD, 9 control) were analyzed by non-target high-resolution mass spectrometry (NT-HRMS) coupled to two liquid chromatography (LC) methods (reversed-phase (RP) and hydrophilic interaction liquid chromatography (HILIC)). A cheminformatics workflow was optimized using open software (MS-DIAL and patRoon) and open databases (all public MSP-formatted spectral libraries for MS-DIAL, PubChemLite for Exposomics, and the LITMINEDNEURO list for patRoon). Furthermore, five disease-specific databases and three suspect lists (on PD and related disorders) were developed, using PubChem functionality to identify relevant unknown chemicals. The results showed that non-target screening with the larger databases generally provided better results compared with smaller suspect lists. However, two suspect screening approaches with patRoon were also good options to study specific chemicals in PD. The combination of chromatographic methods (RP and HILIC) as well as two ionization modes (positive and negative) enhanced the coverage of chemicals in the biological samples. While most metabolomics studies in PD have focused on blood and cerebrospinal fluid, we found a higher number of relevant features in feces, such as alanine betaine or nicotinamide, which can be directly metabolized by gut microbiota. This highlights the potential role of gut dysbiosis in PD development.


Subjects
Exposome , Neurodegenerative Diseases , Parkinson Disease , Aged , Alanine , Betaine , Cheminformatics , Humans , Metabolome , Metabolomics/methods , Niacinamide , Pilot Projects
5.
Nucleic Acids Res ; 47(D1): D1102-D1109, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30371825

ABSTRACT

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.


Subjects
Computational Biology/methods , Databases, Chemical , Pharmaceutical Preparations/chemistry , Small Molecule Libraries/chemistry , Animals , Biological Assay/methods , Drug Discovery/methods , High-Throughput Screening Assays/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Structure , Patents as Topic , Structure-Activity Relationship
6.
Nucleic Acids Res ; 44(14): 6614-24, 2016 08 19.
Article in English | MEDLINE | ID: mdl-27342282

ABSTRACT

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.


Subjects
Genome, Bacterial , Molecular Sequence Annotation , Prokaryotic Cells/metabolism , Bacteria/genetics , Bacterial Proteins/chemistry , Databases, Nucleic Acid , Genes, Bacterial
7.
Nucleic Acids Res ; 43(Database issue): D599-605, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25510495

ABSTRACT

NCBI RefSeq genome collection http://www.ncbi.nlm.nih.gov/genome represents all three major domains of life: Eukarya, Bacteria and Archaea as well as Viruses. Prokaryotic genome sequences are the most rapidly growing part of the collection. During the year of 2014 more than 10,000 microbial genome assemblies have been publicly released bringing the total number of prokaryotic genomes close to 30,000. We continue to improve the quality and usability of the microbial genome resources by providing easy access to the data and the results of the pre-computed analysis, and improving analysis and visualization tools. A number of improvements have been incorporated into the Prokaryotic Genome Annotation Pipeline. Several new features have been added to RefSeq prokaryotic genomes data processing pipeline including the calculation of genome groups (clades) and the optimization of protein clusters generation using pan-genome approach.


Subjects
Databases, Nucleic Acid , Genome, Archaeal , Genome, Bacterial , Internet , Molecular Sequence Annotation
8.
BMC Bioinformatics ; 17 Suppl 8: 276, 2016 Aug 31.
Article in English | MEDLINE | ID: mdl-27586436

ABSTRACT

BACKGROUND: Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. RESULTS: Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. CONCLUSION: The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data at different levels of detail to be obtained and data redundancy eliminated while keeping biologically interesting variations.
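
The second clustering level relies on picking a clustroid for each in-clade cluster. A common definition of a clustroid is the member with the smallest total distance to the other members; the sketch below uses that definition. The distance function, the toy sequences, and the function names are assumptions for illustration, since the abstract does not say how the authors measure similarity or select clustroids.

```python
def clustroid(members, distance):
    """Return the member with the smallest total distance to all members."""
    return min(members, key=lambda m: sum(distance(m, other) for other in members))


def hamming_fraction(a, b):
    """Toy distance: fraction of mismatched positions in equal-length strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical cluster of short, equal-length protein fragments.
cluster = ["MKTAYIAK", "MKTAYIAR", "MKSAYIAK", "MRTAYLAK"]
representative = clustroid(cluster, hamming_fraction)
print(representative)
```

Representing each tight cluster by one member is what gives the compression the abstract mentions: later steps compare representatives rather than every protein.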


Subjects
Bacterial Proteins/metabolism , Genome, Microbial , Algorithms , Cluster Analysis , Guanosine Triphosphate/metabolism , Humans , Phylogeny , Statistics as Topic
9.
Nucleic Acids Res ; 42(Database issue): D660-5, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304891

ABSTRACT

Virus Variation (http://www.ncbi.nlm.nih.gov/genomes/VirusVariation/) is a comprehensive, web-based resource designed to support the retrieval and display of large virus sequence datasets. The resource includes a value added database, a specialized search interface and a suite of sequence data displays. Virus-specific sequence annotation and database loading pipelines produce consistent protein and gene annotation and capture sequence descriptors from sequence records then map these metadata to a controlled vocabulary. The database supports a metadata driven, web-based search interface where sequences can be selected using a variety of biological and clinical criteria. Retrieved sequences can then be downloaded in a variety of formats or analyzed using a suite of tools and displays. Over the past 2 years, the pre-existing influenza and Dengue virus resources have been combined into a single construct and West Nile virus added to the resultant resource. A number of improvements were incorporated into the sequence annotation and database loading pipelines, and the virus-specific search interfaces were updated to support more advanced functions. Several new features have also been added to the sequence download options, and a new multiple sequence alignment viewer has been incorporated into the resource tool set. Together these enhancements should support enhanced usability and the inclusion of new viruses in the future.


Subjects
Databases, Genetic , Viruses/genetics , Genes, Viral , Genome, Viral , Genomics , Internet , Molecular Sequence Annotation , Orthomyxoviridae/genetics , Sequence Alignment , Viral Proteins
10.
Front Res Metr Anal ; 6: 689059, 2021.
Article in English | MEDLINE | ID: mdl-34322655

ABSTRACT

The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, summarizing these in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.
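
The starting point for such panels is a count of how often two entities are mentioned in the same record. The sketch below shows only that counting step on made-up toy records. The function name and the data are illustrative assumptions; the statistical scoring, redundancy removal, and relevance-based sampling described in the abstract are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how many records mention each unordered pair of entities."""
    counts = Counter()
    for entities in records:
        # Deduplicate within a record so repeated mentions count once,
        # and sort so each pair has a single canonical key.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Toy records: each list holds the entities matched in one abstract.
records = [
    ["aspirin", "COX1", "inflammation"],
    ["aspirin", "inflammation"],
    ["ibuprofen", "inflammation", "COX1"],
]
counts = cooccurrence_counts(records)
print(counts[("aspirin", "inflammation")])
```

Raw counts favor entities that appear everywhere, which is one reason a relevance score on top of the counts is needed before ranking related entities.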

11.
BMC Microbiol ; 9: 65, 2009 Apr 02.
Article in English | MEDLINE | ID: mdl-19341451

ABSTRACT

BACKGROUND: There is an increasing number of complete and incomplete virus genome sequences available in public databases. This large body of sequence data harbors information about epidemiology, phylogeny, and virulence. Several specialized databases, such as the NCBI Influenza Virus Resource or the Los Alamos HIV database, offer sophisticated query interfaces along with integrated exploratory data analysis tools for individual virus species to facilitate extracting this information. Thus far, there has not been a comprehensive database for dengue virus, a significant public health threat. RESULTS: We have created an integrated web resource for dengue virus. The technology developed for the NCBI Influenza Virus Resource has been extended to process non-segmented dengue virus genomes. In order to allow efficient processing of the dengue genome, which is large in comparison with individual influenza segments, we developed an offline pre-alignment procedure which generates a multiple sequence alignment of all dengue sequences. The pre-calculated alignment is then used to rapidly create alignments of sequence subsets in response to user queries. This improvement in technology will also facilitate the incorporation of additional virus species in the future. The set of virus-specific databases at NCBI, which will be referred to as Virus Variation Resources (VVR), allow users to build complex queries against virus-specific databases and then apply exploratory data analysis tools to the results. The metadata is automatically collected where possible, and extended with data extracted from the literature. CONCLUSION: The NCBI Dengue Virus Resource integrates dengue sequence information with relevant metadata (sample collection time and location, disease severity, serotype, sequenced genome region) and facilitates retrieval and preliminary analysis of dengue sequences using integrated web analysis and visualization tools.
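
The pre-alignment idea can be sketched as follows: once every sequence sits in one master alignment, an alignment of any subset is obtained by selecting its rows and dropping the columns that contain only gaps. The function name, the dictionary layout, and the toy sequences are assumptions for illustration, not the resource's actual implementation.

```python
def subset_alignment(master, wanted_ids):
    """Extract rows from a precomputed alignment and drop all-gap columns."""
    rows = {seq_id: master[seq_id] for seq_id in wanted_ids}
    if not rows:
        return {}
    length = len(next(iter(rows.values())))
    # Keep a column only if at least one selected row has a residue there.
    keep = [i for i in range(length) if any(row[i] != "-" for row in rows.values())]
    return {seq_id: "".join(row[i] for i in keep) for seq_id, row in rows.items()}


# Toy master alignment: all rows have the same length, "-" marks a gap.
master = {
    "seq1": "ATG-C-TA",
    "seq2": "ATGGC-TA",
    "seq3": "AT--CCTA",
}
subset = subset_alignment(master, ["seq1", "seq3"])
print(subset)
```

Because no pairwise comparison happens at query time, the cost grows with the size of the subset and the alignment length rather than with a fresh multiple alignment.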


Subjects
Databases, Genetic , Dengue Virus/genetics , Genome, Viral , Sequence Analysis, RNA/methods , Computational Biology , Internet , RNA, Viral/genetics , Sequence Alignment , User-Computer Interface
12.
BMC Bioinformatics ; 9: 237, 2008 May 16.
Article in English | MEDLINE | ID: mdl-18485197

ABSTRACT

BACKGROUND: With the amount of influenza genome sequence data growing rapidly, researchers need machine assistance in selecting datasets and exploring the data. Enhanced visualization tools are required to represent results of the exploratory analysis on the web in an easy-to-comprehend form and to facilitate convenient information retrieval. RESULTS: We developed an approach to visualize large phylogenetic trees in an aggregated form with a special representation of subscale details. The initial aggregated tree representation is built with a level of resolution automatically selected to fit into the available screen space, with terminal groups selected based on sequence similarity. The default aggregated representation can be refined by users interactively. Structure and data variability within terminal groups are displayed using small trees that have the same vertical size as the text annotation of the group. These subscale representations are calculated using systematic sampling from the corresponding terminal group. The aggregated tree containing terminal groups can be annotated using aggregation of structured metadata, such as seasonal distribution, geographic locations, etc. AVAILABILITY: The algorithms are implemented in JavaScript within the NCBI Influenza Virus Resource [1].
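
Systematic sampling means taking items at regular intervals from an ordered list, so a small tree drawn from the sample still spans the whole terminal group. The sketch below picks one item from the middle of each of k equal slices. The function name, the midpoint rule, and the strain labels are assumptions for illustration; the abstract does not state how the interval or starting point is chosen.

```python
def systematic_sample(items, k):
    """Pick up to k items at evenly spaced positions from an ordered list."""
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return list(items)
    # Integer arithmetic for the midpoint of each of the k equal-width slices.
    return [items[(2 * i + 1) * n // (2 * k)] for i in range(k)]


# Hypothetical ordered leaves of one terminal group.
leaves = [f"strain_{i:02d}" for i in range(10)]
sample = systematic_sample(leaves, 3)
print(sample)
```

Unlike random sampling, the result is deterministic, so the same terminal group always renders the same small tree.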


Subjects
Algorithms , Chromosome Mapping/methods , Computer Graphics , Genome, Viral/genetics , Orthomyxoviridae/genetics , Sequence Analysis, DNA/methods , User-Computer Interface , Computer Simulation , Models, Genetic , Sample Size
14.
ISRN Neurol ; 2013: 748127, 2013.
Article in English | MEDLINE | ID: mdl-24109519

ABSTRACT

Background. Mitoxantrone (MTX) and Rituximab (RTX) are successfully used for treatment of multiple sclerosis (MS) and can be combined to increase efficacy. Objective. We used MTX, RTX, and methylprednisolone in a single combined regimen and observed patients prospectively. Methods. We present results of an observational pilot study of combined therapy with RTX and MTX in 28 patients with active MS. The therapeutic protocol consisted of two infusions within 14 days. The first infusion was 1000 mg methylprednisolone (MP) IV, 1000 mg RTX IV, and 20 mg MTX IV. On day 14, 1000 mg MP IV and 1000 mg RTX IV were given. Patients were followed prospectively for 12 to 48 months. Results and Conclusion. There were no relapses among the 28 patients during the observation period. Depletion of the CD19+ and CD19+/CD27+ memory B-cell subpopulations in both compartments was confirmed in all patients at 6 months. We found a more rapid reconstitution of B cells in the CSF than in the peripheral blood and longstanding depression of CD19+CD27+ memory B cells. Conclusion. Effectiveness of the combined regimen of RTX and MTX could be related to longstanding depletion of the CD19+CD27+ memory B-cell subset.

15.
PLoS Curr ; 1: RRN1124, 2009 Oct 30.
Article in English | MEDLINE | ID: mdl-20029662

ABSTRACT

The Influenza Virus Resource and other Virus Variation Resources at NCBI provide enhanced visualization web tools for exploratory analysis for influenza sequence data. Despite the improvements in data analysis, the initial data retrieval remains unsophisticated, frequently producing huge and imbalanced datasets due to the large number of identical and nearly-identical sequences in the database. We propose a data mining algorithm to organize reported sequences into groups based on their relatedness to the query sequence and to each other. The algorithm uses BLAST to find database sequences related to the query. Neighbor lists precalculated from pairwise BLAST alignments between database sequences are used to organize results in groups of nearly-identical and strongly related sequences. We propose to use a non-symmetric dissimilarity measure well crafted for dealing with sequences of different length (fragments). A balanced and representative data set produced by this tool can be used for further analysis, i.e. multiple sequence alignment and phylogenetic trees. The algorithm is implemented for protein coding sequences and is being integrated with the NCBI Influenza Virus Resource.
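
To see why a non-symmetric measure helps with fragments, consider a directed score that asks how much of one sequence is missing from another. The sketch below uses shared k-mers as a stand-in. This is a swapped-in toy measure and not the one proposed by the authors, which is derived from BLAST alignments and is not specified in the abstract; the function names and sequences are hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def directed_dissimilarity(a, b, k=3):
    """Fraction of a's k-mers that are missing from b (not symmetric).

    Illustrative stand-in only; not the measure from the paper.
    """
    ka = kmers(a, k)
    if not ka:
        return 0.0
    return len(ka - kmers(b, k)) / len(ka)


full = "MKTAYIAKQR"
fragment = "TAYIAK"  # hypothetical partial sequence contained in `full`
print(directed_dissimilarity(fragment, full))
print(directed_dissimilarity(full, fragment))
```

The fragment is fully covered by the longer sequence, so its directed score is zero, while the reverse direction is penalized for the parts the fragment lacks. A symmetric measure would have to pick one of these views and could not group fragments with their full-length relatives as cleanly.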
