1.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38647153

ABSTRACT

Computational drug repositioning, which involves identifying new indications for existing drugs, is an increasingly attractive research area because it can reduce both overall cost and development time. As a result, a growing number of computational drug repositioning methods have emerged. Heterogeneous network-based drug repositioning methods have been shown to outperform other approaches. However, there is a dearth of systematic evaluation studies of these methods covering performance, scalability and usability, as well as of a standardized process for evaluating new methods. Additionally, previous studies have compared only a few methods, with conflicting results. In this context, we conducted a systematic benchmarking study of 28 heterogeneous network-based drug repositioning methods on 11 existing datasets. We developed a comprehensive framework to evaluate their performance, scalability and usability. Our study revealed that methods such as HGIMC, ITRPCA and BNNR exhibit the best overall performance, as they rely on matrix completion or factorization. HINGRL, MLMC, ITRPCA and HGIMC demonstrate the best performance, while NMFDR, GROBMC and SCPMF display superior scalability. For usability, HGIMC, DRHGCN and BNNR are the top performers. Building on these findings, we developed an online tool called HN-DREP (http://hn-drep.lyhbio.com/) to help researchers view all the detailed evaluation results and select the appropriate method. HN-DREP also provides an external drug repositioning prediction service for a specific disease or drug by integrating predictions from all methods. Furthermore, we have released a Snakemake workflow named HN-DRES (https://github.com/lyhbio/HN-DRES) to facilitate benchmarking and support the extension of new methods into the field.


Subjects
Benchmarking, Drug Repositioning, Drug Repositioning/methods, Humans, Computational Biology/methods, Software, Algorithms
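The abstract attributes the best overall performance to methods that rely on matrix completion or factorization. The Python sketch below illustrates only that general idea. It is not the implementation of HGIMC, ITRPCA, BNNR or any other benchmarked method. It factorizes a small, hypothetical binary drug-disease association matrix into two low-rank factors by full-batch gradient descent, then reads the reconstructed values of unknown pairs as repositioning scores. The rank, learning rate, regularization weight and toy matrix are all illustrative assumptions.

```python
import random


def factorize(assoc, rank=2, lr=0.05, reg=0.001, epochs=3000, seed=0):
    """Approximate a drug x disease matrix as U @ V.T and return the reconstruction.

    Toy sketch: every entry is fitted (known association = 1, unknown pair = 0),
    so the low-rank constraint is what gives unknown pairs a nonzero score.
    """
    rng = random.Random(seed)
    n_drugs, n_diseases = len(assoc), len(assoc[0])
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_diseases)]

    def predict(i, j):
        return sum(U[i][k] * V[j][k] for k in range(rank))

    for _ in range(epochs):
        # Residuals of the current reconstruction.
        E = [[assoc[i][j] - predict(i, j) for j in range(n_diseases)]
             for i in range(n_drugs)]
        # Gradient step on squared error plus an L2 penalty; both factors use the same residuals.
        new_U = [[U[i][k] + lr * (sum(E[i][j] * V[j][k] for j in range(n_diseases)) - reg * U[i][k])
                  for k in range(rank)] for i in range(n_drugs)]
        new_V = [[V[j][k] + lr * (sum(E[i][j] * U[i][k] for i in range(n_drugs)) - reg * V[j][k])
                  for k in range(rank)] for j in range(n_diseases)]
        U, V = new_U, new_V

    return [[predict(i, j) for j in range(n_diseases)] for i in range(n_drugs)]


# Hypothetical toy data: rows are drugs, columns are diseases.
assoc = [
    [1, 1, 0, 0],  # drug A: diseases 0 and 1
    [1, 1, 1, 0],  # drug B: diseases 0, 1 and 2
    [0, 0, 0, 1],  # drug C: disease 3 only
]
scores = factorize(assoc)
# Drug A resembles drug B, so the unknown pair (A, disease 2) should outscore (C, disease 0).
print(round(scores[0][2], 2), round(scores[2][0], 2))
```

Treating unknown pairs as zeros is a simplification. Real methods typically weight, mask or regularize those entries differently.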
2.
Proteomics ; 23(20): e2300188, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37488995

ABSTRACT

Relative and absolute intensity-based protein quantification across cell lines, tissue atlases and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantitative (LFQ) datasets; however, a growing number of isobaric tandem mass tag (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking with an analogous method that ranks proteome abundance using reporter ions, with data from an experiment in which LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values with TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation and quantitation workflows.
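iBAQ divides the summed intensity of a protein's peptides by the number of theoretically observable peptides. This prevents longer proteins from being favoured merely because they yield more peptides. The sketch below is a minimal illustration under stated assumptions, not the workflow used in the study. It counts fully cleaved tryptic peptides (cleavage after K or R unless followed by P) within a length window assumed here to be 6 to 30 residues, then ranks proteins by the resulting value. Passing reporter ion intensities instead of MS1 feature intensities gives the TMT-iBAQ analogue described above. The protein names, sequences and intensities are invented.

```python
import re


def tryptic_peptides(sequence):
    """Fully cleaved tryptic peptides: cut after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def observable_peptide_count(sequence, min_len=6, max_len=30):
    """Number of tryptic peptides inside the assumed observable length window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


def ibaq(intensities, sequence):
    """Summed peptide (or reporter ion) intensity divided by the observable peptide count."""
    n = observable_peptide_count(sequence)
    return sum(intensities) / n if n else None


def abundance_ranking(proteins):
    """proteins: name -> (intensities, sequence); returns names from most to least abundant."""
    scores = {name: ibaq(ints, seq) for name, (ints, seq) in proteins.items()}
    ranked = [name for name, score in scores.items() if score is not None]
    return sorted(ranked, key=lambda name: scores[name], reverse=True)


proteins = {
    "PROT_A": ([300.0], "AAAAAAK" "CCCCCCR" "DDDDDDK"),  # 3 observable peptides
    "PROT_B": ([150.0], "EEEEEEK"),                      # 1 observable peptide
}
print(abundance_ranking(proteins))  # ['PROT_B', 'PROT_A']
```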

3.
J Proteome Res ; 22(6): 2114-2123, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37220883

ABSTRACT

Testing for significant differences in protein-level quantities is a common goal of many LFQ-based mass spectrometry proteomics experiments. Starting from a table of protein and/or peptide quantities produced by a given proteomics quantification software, many tools and R packages exist to perform the final tasks of imputation, summarization, normalization, and statistical testing. To evaluate how packages, and the settings of their substeps, affect the final list of significant proteins, we studied several packages on three public data sets with known expected protein fold changes. We found that results can vary significantly between packages and even across different parameter settings of the same package. In addition to usability aspects and feature/compatibility lists of different packages, this paper highlights the sensitivity and specificity trade-offs that come with specific packages and settings.


Subjects
Peptides, Software, Peptides/analysis, Proteins/analysis, Mass Spectrometry/methods, Proteomics/methods
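The abstract's central point is that the settings of individual substeps can change the final result. The Python sketch below makes that concrete with one simple, hypothetical pipeline; it does not reproduce any of the benchmarked R packages. It log2-transforms intensities, median-centers each sample, and computes per-protein log2 fold changes under two imputation choices. The first ignores missing values. The second replaces them with a fixed low value, a crude stand-in for left-censored imputation where the value -3.0 is arbitrary.

```python
import math
import statistics


def log2_median_normalize(sample):
    """log2-transform one sample (None = missing) and subtract the sample's median."""
    logs = [math.log2(x) if x is not None else None for x in sample]
    median = statistics.median(v for v in logs if v is not None)
    return [v - median if v is not None else None for v in logs]


def drop_missing(values):
    """Imputation choice 1: simply ignore missing values."""
    return [v for v in values if v is not None]


def constant_imputer(low_value):
    """Imputation choice 2: replace missing values with a fixed low value."""
    def impute(values):
        return [low_value if v is None else v for v in values]
    return impute


def log2_fold_changes(group_a, group_b, impute):
    """group_*: samples x proteins raw intensities; returns mean(A) - mean(B) per protein."""
    norm_a = [log2_median_normalize(s) for s in group_a]
    norm_b = [log2_median_normalize(s) for s in group_b]
    changes = []
    for p in range(len(group_a[0])):
        a = impute([s[p] for s in norm_a])
        b = impute([s[p] for s in norm_b])
        changes.append(statistics.mean(a) - statistics.mean(b))
    return changes


# Hypothetical intensities: 2 samples per group, 3 proteins; protein 1 is missing once in B.
group_a = [[8, 4, 2], [8, 4, 2]]
group_b = [[4, 4, 4], [4, None, 4]]
print(log2_fold_changes(group_a, group_b, drop_missing))             # [1.0, 0.0, -1.0]
print(log2_fold_changes(group_a, group_b, constant_imputer(-3.0)))   # [1.0, 1.5, -1.0]
```

A single missing value is enough to move protein 1 from "no change" to a 1.5 log2 fold change.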
4.
J Proteome Res ; 21(6): 1566-1574, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35549218

ABSTRACT

Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST (best-identified spectrum) and BIN (spectrum binning) methods were found to be the most reliable for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.


Subjects
Proteomics, Tandem Mass Spectrometry, Algorithms, Cluster Analysis, Consensus, Databases, Protein, Proteomics/methods, Software, Tandem Mass Spectrometry/methods
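Of the consensus strategies benchmarked, binning is the easiest to show in a few lines. The Python sketch below is a generic illustration with hypothetical parameters, not the code in the study's repository. Peaks from all spectra in a cluster are pooled into fixed-width m/z bins, and bins populated by fewer than a chosen fraction of the spectra are discarded. Each surviving bin contributes one peak at the intensity-weighted mean m/z, with the summed intensity divided by the cluster size. The 0.02 m/z bin width and the 50% occurrence threshold are assumptions.

```python
import math
from collections import defaultdict


def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Binning-style consensus for one cluster.

    spectra: list of peak lists [(mz, intensity), ...].
    Returns consensus peaks [(mz, intensity), ...] sorted by m/z.
    """
    bins = defaultdict(list)  # bin index -> [(spectrum index, mz, intensity), ...]
    for s_idx, peaks in enumerate(spectra):
        for mz, intensity in peaks:
            bins[math.floor(mz / bin_width)].append((s_idx, mz, intensity))

    n_spectra = len(spectra)
    consensus = []
    for b in sorted(bins):
        members = bins[b]
        # Keep the bin only if enough distinct spectra contribute a peak to it.
        if len({s for s, _, _ in members}) / n_spectra < min_fraction:
            continue
        total = sum(i for _, _, i in members)
        weighted_mz = sum(mz * i for _, mz, i in members) / total
        consensus.append((weighted_mz, total / n_spectra))
    return consensus


# Hypothetical three-spectrum cluster.
spectra = [
    [(100.001, 10.0), (200.005, 5.0)],
    [(100.003, 30.0), (300.0, 1.0)],   # the 300.0 peak occurs in only 1 of 3 spectra
    [(100.002, 20.0), (200.006, 5.0)],
]
print(bin_consensus(spectra))
```

Fixed bins can split a peak that falls near a bin edge. Practical implementations often handle this with overlapping bins or peak merging, which this sketch omits.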
5.
J Proteome Res ; 20(4): 2056-2061, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33625229

ABSTRACT

BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize software containers, including their metadata, versions, licenses, and software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity. BioContainers provides over 9000 bioinformatics tools, including more than 200 proteomics and mass spectrometry tools. Here we introduce the BioContainers Registry and RESTful API to make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. By doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.


Subjects
Computational Biology, Proteomics, Registries, Reproducibility of Results, Software
6.
Clin Proteomics ; 18(1): 32, 2021 Dec 29.
Article in English | MEDLINE | ID: mdl-34963468

ABSTRACT

BACKGROUND: Type 2 diabetic kidney disease is the most common cause of chronic kidney disease (CKD) and end-stage renal disease (ESRD). Kidney biopsy is considered the 'gold standard' for diabetic kidney disease (DKD) diagnosis. However, it is an invasive procedure, and the diagnosis can be influenced by sampling bias and personal judgement. It is desirable to establish a non-invasive procedure that can complement kidney biopsy in diagnosing DKD and tracking its progression. METHODS: In this cross-sectional study, we collected 252 urine samples: 134 from patients with uncomplicated diabetes, 65 from DKD patients, 40 from CKD patients without diabetes, and 13 follow-up diabetic samples. We analyzed the urine proteomes with liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), and built logistic regression models to distinguish uncomplicated diabetes, DKD and other CKDs. RESULTS: We quantified 559 ± 202 gene products (GPs) (mean ± SD) per sample and 2946 GPs in total. Based on logistic regression models, DKD patients could be differentiated from uncomplicated diabetic patients with 2 urinary proteins (AUC = 0.928). Stage 3 (DKD3) and stage 4 (DKD4) DKD patients could be differentiated with 3 urinary proteins (AUC = 0.949). These results were validated in an independent dataset. Finally, a 4-protein classifier identified putative pre-DKD3 patients, who showed DKD3 proteomic features but had not been diagnosed by clinical standards. Follow-up of 11 patients indicated that 2 putative pre-DKD patients had progressed to DKD3. CONCLUSIONS: Our study demonstrated the potential of urinary proteomics as a non-invasive method for DKD diagnosis and for identifying high-risk patients for progression monitoring.
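The classifiers above are summarized by the area under the ROC curve (AUC). The sketch below shows how a logistic regression score and its AUC relate, under stated assumptions. The score is the logistic function of a weighted sum of protein abundances. The AUC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The coefficients, intercept and abundance values are hypothetical, since the abstract does not report the fitted models.

```python
import math


def logistic_score(abundances, coefs, intercept):
    """Predicted probability of the positive class (e.g. DKD) from protein abundances."""
    z = intercept + sum(c * x for c, x in zip(coefs, abundances))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive (label 1) and negative (label 0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical two-protein model and four samples (1 = DKD, 0 = uncomplicated diabetes).
coefs, intercept = [1.2, -0.8], -0.5
samples = [[2.0, 0.5], [1.5, 1.0], [0.5, 1.5], [1.0, 0.2]]
labels = [1, 1, 0, 0]
scores = [logistic_score(x, coefs, intercept) for x in samples]
print(roc_auc(scores, labels))  # 0.75
```

Because the AUC depends only on the ranking of scores, any monotone transformation of the linear predictor (including the logistic function) leaves it unchanged.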

7.
Nucleic Acids Res ; 47(D1): D1211-D1217, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30252093

ABSTRACT

Sharing research data in public repositories has become best practice in academia. With the accumulation of massive data, network bandwidth and storage requirements are rapidly increasing. The ProteomeXchange (PX) consortium implements a model of centralized metadata and distributed raw data management, which promotes effective data sharing. To facilitate open access to proteome data worldwide, we have developed the integrated proteome resource iProX (http://www.iprox.org) as a public platform for collecting and sharing raw data, analysis results and metadata obtained from proteomics experiments. The iProX repository employs a web-based proteome data submission process and openly shares mass spectrometry-based proteomics datasets. It also uses extensive controlled vocabularies and ontologies to annotate proteomics datasets. Users can provide and access data via a GUI, using a fast Aspera-based transfer tool. iProX is a full member of the PX consortium, and all released datasets are freely accessible to the public. iProX is based on a high-availability architecture and has been deployed as part of the proteomics infrastructure of China, ensuring long-term and stable resource support. iProX will facilitate worldwide analysis and sharing of proteomics data.


Subjects
Computational Biology/methods, Databases, Protein, Proteome/metabolism, Proteomics/methods, Animals, Humans, Information Storage and Retrieval/methods, Internet, Metadata/statistics & numerical data, User-Computer Interface
8.
Proteomics ; 20(21-22): e1900345, 2020 11.
Article in English | MEDLINE | ID: mdl-32574431

ABSTRACT

Spectrum prediction using machine learning or deep learning models is an emerging method in computational proteomics. Several deep learning-based MS/MS spectrum prediction tools have been developed. They have shown potential not only for increasing the sensitivity and accuracy of data-dependent acquisition search engines, but also for building spectral libraries for data-independent acquisition analysis. Different tools, with their unique algorithms and implementations, may perform differently. Hence, it is necessary to systematically evaluate these tools to characterize their preferences and intrinsic differences. In this study, multiple datasets with different collision energies, enzymes, instruments, and species are used to evaluate the performance of the deep learning-based MS/MS spectrum prediction tools, as well as the machine learning-based tool MS2PIP. The evaluations may provide helpful insights and guidelines for researchers using spectrum prediction tools.


Subjects
Proteomics, Tandem Mass Spectrometry, Algorithms, Machine Learning, Search Engine
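The abstract does not name its evaluation metrics. Two measures commonly used to compare a predicted MS/MS spectrum with an experimental one are the Pearson correlation coefficient and the normalized spectral contrast angle over matched fragment ion intensities. The sketch below assumes those two measures. It keys intensities by hypothetical fragment ion labels and treats an ion missing from one spectrum as intensity 0.

```python
import math


def _aligned(predicted, observed):
    """Align two {fragment ion: intensity} maps on the union of ion labels (missing = 0)."""
    ions = sorted(set(predicted) | set(observed))
    return ([predicted.get(i, 0.0) for i in ions],
            [observed.get(i, 0.0) for i in ions])


def pearson(predicted, observed):
    """Pearson correlation of aligned fragment intensities."""
    x, y = _aligned(predicted, observed)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spectral_angle(predicted, observed):
    """Normalized spectral contrast angle: 1 = same direction, 0 = orthogonal."""
    x, y = _aligned(predicted, observed)
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos = sum(a * b for a, b in zip(x, y)) / norm if norm else 0.0
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cos) / math.pi


# Hypothetical predicted and experimental fragment intensities for one peptide.
predicted = {"b2": 0.2, "y1": 0.5, "y2": 1.0}
observed = {"b2": 0.25, "y1": 0.4, "y2": 1.0, "y3": 0.1}
print(round(pearson(predicted, observed), 3), round(spectral_angle(predicted, observed), 3))
```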
9.
Bioinformatics ; 33(16): 2580-2582, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28379341

ABSTRACT

MOTIVATION: BioContainers (biocontainers.pro) is an open-source, community-driven framework that provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source Docker and rkt frameworks, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers, with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and deployed on different architectures (local desktops, cloud environments or HPC clusters). AVAILABILITY AND IMPLEMENTATION: The software is freely available at github.com/BioContainers/. CONTACT: yperez@ebi.ac.uk.


Subjects
Computational Biology/methods, Software, Genomics/methods, Metabolomics/methods, Proteomics/methods
10.
Comput Biol Med ; 175: 108536, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38701592

ABSTRACT

In response to shortcomings in data quality and coverage for neurological and psychiatric disorders (NPDs) in existing comprehensive databases, this paper introduces the DTNPD database, designed specifically for NPDs. DTNPD contains detailed information on 30 NPD types, 1847 drugs, 514 drug targets, 64 drug combinations, and 61 potential target combinations, forming a network with 2389 drug-target associations. The database is user-friendly and offers open access and downloadable data, which is crucial for network pharmacology studies. The key strength of DTNPD lies in its robust networks of drug and target combinations, as well as drug-target networks, facilitating research and development in the field of NPDs. The development of the DTNPD database marks a significant milestone in understanding and treating NPDs. The primary URL for the DTNPD database is http://dtnpd.cnsdrug.com, complemented by a mirror site at http://dtnpd.lyhbio.com.


Subjects
Mental Disorders, Nervous System Diseases, Humans, Mental Disorders/drug therapy, Mental Disorders/metabolism, Nervous System Diseases/drug therapy, Databases, Pharmaceutical, Databases, Factual
11.
Front Oncol ; 12: 847706, 2022.
Article in English | MEDLINE | ID: mdl-35651795

ABSTRACT

Gastric cancer (GC) is one of the most common malignant tumors worldwide. It has a high mortality rate and lacks effective methods for prognosis prediction. Postoperative adjuvant chemotherapy is the first-line treatment for advanced gastric cancer, but only a subgroup of patients benefits from it. Here, we used 833 formalin-fixed, paraffin-embedded resected tumor samples from patients with TNM stage II/III GC and established a proteomic subtyping workflow using 100 deep-learned features. Two proteomic subtypes (S-I and S-II) with different overall survival were identified. S-I has a better survival rate and is sensitive to chemotherapy. In S-I, patients who received adjuvant chemotherapy had a significantly higher 5-year overall survival rate than patients who received surgery alone (65.3% vs 52.6%; log-rank P = 0.014). No improvement was observed in S-II (54% vs 51%; log-rank P = 0.96). These results were verified in an independent validation set. Furthermore, we evaluated the superiority and scalability of the deep learning-based workflow in cancer molecular subtyping, demonstrating its utility and potential in prognosis prediction and therapeutic decision-making.
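The survival comparisons above are reported as log-rank P values. The sketch below is a textbook two-group log-rank test in plain Python, written for illustration rather than taken from the authors' analysis code. At each distinct event time it compares the observed number of events in group 1 with the number expected if both groups shared one hazard, and it sums the hypergeometric variances. It then converts the resulting chi-square statistic (1 degree of freedom) into a P value. The follow-up times in the usage example are invented.

```python
import math


def logrank(group1, group2):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event is 1 for an observed
    death and 0 for a censored observation. Returns (chi-square statistic, P value).
    """
    data = [(t, e, 0) for t, e in group1] + [(t, e, 1) for t, e in group2]
    event_times = sorted({t for t, e, _ in data if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        at_risk1 = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk = sum(1 for ti, _, _ in data if ti >= t)
        deaths1 = sum(1 for ti, ei, g in data if ti == t and ei and g == 0)
        deaths = sum(1 for ti, ei, _ in data if ti == t and ei)
        observed1 += deaths1
        expected1 += deaths * at_risk1 / at_risk
        if at_risk > 1:
            share = at_risk1 / at_risk
            variance += deaths * share * (1 - share) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up (months, event) for two small groups; event 0 = censored.
chemo = [(5, 1), (8, 0), (12, 1), (20, 0)]
surgery = [(2, 1), (3, 1), (6, 1), (9, 0)]
print(logrank(chemo, surgery))
```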

12.
J Proteomics ; 232: 104070, 2021 02 10.
Article in English | MEDLINE | ID: mdl-33307250

ABSTRACT

Spectral similarity calculation is widely used in protein identification tools and mass spectra clustering algorithms to compare theoretical or experimental spectra. The performance of the spectral similarity calculation plays an important role in these tools and algorithms, especially in the analysis of large-scale datasets. Recently, deep learning methods have been proposed to improve clustering algorithms and protein identification by training on existing data and using multiple spectra and identified peptide features. The efficiency of these algorithms compared with traditional approaches is still under study, but their application in proteomics data analysis is becoming more common. Here, we propose the use of deep learning to improve spectral similarity comparison. We assessed the performance of deep learning for spectral similarity with GLEAMS and with a newly trained embedder model (DLEAMSE), which uses high-quality spectra from PRIDE Cluster. We also developed a new bioinformatics tool (mslookup - https://github.com/bigbio/DLEAMSE/) that allows users to quickly search for spectra among previously identified mass spectra published in public repositories and spectral libraries. Finally, we released a human database to enable bioinformaticians and biologists to search for identified spectra on their own machines. SIGNIFICANCE STATEMENT: Spectral similarity calculation plays an important role in proteomics data analysis. Deep learning can learn implicit and effective features from large-scale training datasets. Building on this, deep learning-based MS/MS spectrum embedding models have emerged as a way to improve similarity calculation in mass spectral clustering algorithms. We compare multiple similarity scoring and deep learning methods in terms of accuracy (computing the similarity of a pair of mass spectra) and computing time.
The benchmark results showed no major differences in accuracy between DLEAMSE and the normalized dot product (NDP) for spectrum similarity calculations. On the GPU server, the DLEAMSE GPU implementation is faster than NDP in preprocessing. The DLEAMSE similarity calculation itself (Euclidean distance on 32-D vectors) takes about one third of the time of the dot product calculation. The DLEAMSE encoding and embedding steps need to run only once per spectrum, and the embedded 32-D points can be persisted in the repository, which makes future comparisons and large-scale analyses faster. Based on these results, we proposed a new tool, mslookup, that enables researchers to find spectra previously identified in public data. The tool can also be used to generate in-house databases of previously identified spectra to share with other laboratories and consortia.


Subjects
Deep Learning, Tandem Mass Spectrometry, Algorithms, Cluster Analysis, Databases, Protein, Humans, Proteomics, Software
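The abstract contrasts two ways of scoring spectrum similarity. The normalized dot product (NDP) is computed from both spectra at comparison time. The embedding approach instead computes a 32-D vector once per spectrum, stores it, and compares stored vectors by Euclidean distance. The Python sketch below illustrates only that workflow difference. The trained DLEAMSE encoder is not reproduced; a fixed random projection stands in for it purely to produce 32-D vectors, so its distances carry no learned meaning. The 1 m/z bin width, the 0 to 2000 m/z range and the square-root intensity scaling are also assumptions.

```python
import math
import random

N_BINS = 2000  # assumption: 1 m/z-wide bins covering 0-2000 m/z


def to_vector(peaks):
    """Bin (m/z, intensity) peaks, square-root scale intensities, and L2-normalize."""
    vec = [0.0] * N_BINS
    for mz, intensity in peaks:
        b = int(mz)
        if 0 <= b < N_BINS:
            vec[b] += math.sqrt(intensity)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ndp(peaks_a, peaks_b):
    """Normalized dot product, recomputed from raw peaks for every comparison."""
    return sum(x * y for x, y in zip(to_vector(peaks_a), to_vector(peaks_b)))


class EmbeddingStore:
    """Embeds each spectrum once, caches the 32-D vector, and compares cached vectors."""

    def __init__(self, dim=32, seed=0):
        rng = random.Random(seed)
        # Stand-in encoder: a fixed random projection, NOT a trained model.
        self.projection = [[rng.gauss(0.0, 1.0) for _ in range(N_BINS)] for _ in range(dim)]
        self.cache = {}

    def add(self, spectrum_id, peaks):
        if spectrum_id not in self.cache:  # embedding runs once per spectrum
            v = to_vector(peaks)
            self.cache[spectrum_id] = [sum(w * x for w, x in zip(row, v)) for row in self.projection]

    def distance(self, id_a, id_b):
        """Euclidean distance between two cached embeddings."""
        a, b = self.cache[id_a], self.cache[id_b]
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


s1 = [(100.2, 4.0), (200.5, 9.0)]
s2 = [(300.1, 1.0)]
store = EmbeddingStore()
store.add("s1", s1)
store.add("s2", s2)
print(ndp(s1, s1), ndp(s1, s2), store.distance("s1", "s2"))
```

Only the 32 cached numbers per spectrum are needed at comparison time, which is the property the abstract relies on for persisting embeddings in a repository.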
13.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose extending the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve reproducibility and facilitate the reanalysis and integration of public proteomics datasets.


Subjects
Data Analysis, Databases, Protein, Metadata, Proteomics, Big Data, Humans, Reproducibility of Results, Software, Transcriptome
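MAGE-TAB-Proteomics (published as SDRF-Proteomics, the Sample and Data Relationship Format for proteomics) describes samples in a tab-delimited table whose rows link each sample to its data files. The Python sketch below is a hypothetical minimal check, not the validation tools described in the paper. It parses such a table and tests it for a chosen set of required columns and for empty cells. It also builds a mapping from each data file to its sample. The required-column subset is an illustrative assumption; the specification itself defines the authoritative rules.

```python
import csv
import io

# Illustrative subset of SDRF-Proteomics column names (assumption, not the full required set).
REQUIRED = ["source name", "characteristics[organism]", "assay name", "comment[data file]"]


def check_sdrf(text, required=REQUIRED):
    """Return (problems, {data file: source name}) for a tab-delimited SDRF table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [h.strip().lower() for h in next(reader)]
    missing = [c for c in required if c not in header]
    if missing:
        return [f"missing column: {c}" for c in missing], {}

    columns = {c: header.index(c) for c in required}
    problems, file_to_source = [], {}
    for line_no, row in enumerate(reader, start=2):
        cells = {c: (row[i].strip() if i < len(row) else "") for c, i in columns.items()}
        problems += [f"line {line_no}: empty '{c}'" for c, v in cells.items() if not v]
        data_file = cells.get("comment[data file]")
        if data_file:
            file_to_source[data_file] = cells.get("source name", "")
    return problems, file_to_source


sdrf = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\thomo sapiens\trun 1\tsample1.raw\n"
    "sample 2\t\trun 2\tsample2.raw\n"
)
problems, mapping = check_sdrf(sdrf)
print(problems)  # ["line 3: empty 'characteristics[organism]'"]
print(mapping)
```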
14.
Curr Mol Med ; 20(6): 429-441, 2020.
Article in English | MEDLINE | ID: mdl-31782363

ABSTRACT

BACKGROUND: Bipolar disorder (BD) is a chronic emotional disorder with a complex genetic structure. However, its genetic molecular mechanism is still unclear, which hinders its diagnosis and treatment. METHODS AND RESULTS: In this paper, we propose a model for predicting BD based on single nucleotide polymorphisms (SNPs) screened by a genome-wide association study (GWAS). The model is a convolutional neural network (CNN) that predicts the probability of the disease. Two datasets, obtained with different GWAS thresholds, were named group P001 and group P005, and a different convolutional neural network was configured for each. The model trained on group P001 data reached a training accuracy of 96% and a test accuracy of 91%. The model trained on group P005 data reached a training accuracy of 94.5% and a test accuracy of 92%. We also used gradient-weighted class activation mapping (Grad-CAM) to interpret the prediction model and thereby indirectly identify high-risk SNPs for BD. Finally, we compared these high-risk SNPs with human gene annotation information. CONCLUSION: The prediction results of the group P001 model yielded 137 risk genes, of which 22 have been reported to be associated with the occurrence of BD. The prediction results of the group P005 model yielded 407 risk genes, of which 51 have been reported to be associated with the occurrence of BD.


Subjects
Bipolar Disorder/genetics, Genome-Wide Association Study/methods, Neural Networks, Computer, Polymorphism, Single Nucleotide/genetics, Genetic Predisposition to Disease/genetics, Humans, Molecular Sequence Annotation
15.
Sheng Wu Gong Cheng Xue Bao ; 34(10): 1567-1578, 2018 Oct 25.
Article in Chinese | MEDLINE | ID: mdl-30394024

ABSTRACT

Mass spectrometry and database searching are necessary to identify proteins and peptides. With the rapid development of mass spectrometry technology, proteomics mass spectrometry data are acquired very quickly. This provides a powerful means of identifying proteins and peptides at large scale and has brought mass spectrometry-based proteomics research into the mainstream. The traditional database searching method has many limitations in identifying post-translational modifications of peptides. This paper systematically reviews the development, theoretical concepts and applications of the spectral network method, and the advantages of spectral network libraries for identifying peptides.


Subjects
Peptides/chemistry, Protein Processing, Post-Translational, Proteins/chemistry, Databases, Protein, Mass Spectrometry, Proteomics
16.
Sheng Wu Gong Cheng Xue Bao ; 34(4): 525-536, 2018 Apr 25.
Article in Chinese | MEDLINE | ID: mdl-29701026

ABSTRACT

With the rapid development of mass spectrometry-based proteomics, mass spectrometry (MS) data have grown exponentially. Developing fast, accurate and reproducible methods to identify peptides and proteins is a great challenge. Spectral library searching has now become a mature strategy for tandem mass spectrum-based protein identification in proteomics. It searches experimental spectra against a collection of confidently identified, previously observed MS/MS spectra. In doing so, it makes full use of the peak abundances in the spectrum, peaks from non-canonical fragment ions, and other features. This review provides an overview of how the spectral library search strategy is implemented and comprehensively covers its two key steps, spectral library construction and spectral library searching. It also discusses the progress and challenges of the library search strategy.


Subjects
Peptide Library, Proteins/analysis, Proteomics, Tandem Mass Spectrometry, Databases, Protein, Peptides/analysis
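The two steps reviewed above, library construction and library searching, can be sketched in a few lines. In the hypothetical Python example below, a library is simply a list of previously identified spectra, each given as (peptide, precursor m/z, peaks). A query is compared only with entries inside a precursor tolerance window. It is scored by a normalized dot product over binned, square-root-scaled intensities, so every library peak contributes, including peaks from non-canonical fragment ions. The 0.02 m/z tolerance, the 1 m/z bins and the scoring choice are assumptions, not those of a specific published tool.

```python
import math


def binned(peaks, bin_width=1.0):
    """Sparse {bin: value} vector with square-root intensities, L2-normalized."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + math.sqrt(intensity)
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm else vec


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def library_search(query_mz, query_peaks, library, tol_mz=0.02):
    """Return (peptide, score) of the best library match in the precursor window, or None."""
    query = binned(query_peaks)
    best = None
    for peptide, lib_mz, lib_peaks in library:
        if abs(lib_mz - query_mz) > tol_mz:
            continue  # precursor filter: only nearby library entries are scored
        score = dot(query, binned(lib_peaks))
        if best is None or score > best[1]:
            best = (peptide, score)
    return best


# Hypothetical library of previously identified spectra: (peptide, precursor m/z, peaks).
library = [
    ("PEPTIDEK", 500.25, [(200.1, 100.0), (300.2, 50.0)]),
    ("PEPTIDER", 500.26, [(250.1, 100.0)]),
    ("SAMPLEK", 700.00, [(200.1, 100.0), (300.2, 50.0)]),
]
print(library_search(500.255, [(200.12, 90.0), (300.21, 45.0)], library))
```

"SAMPLEK" has the same peaks as the best hit but is never scored, because its precursor m/z falls outside the tolerance window.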