Results 1 - 12 of 12
1.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-33975333

ABSTRACT

Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics, which is indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance still needs to be improved. In this study, we have developed a machine learning-based meta-predictor called NeuroPred-FRL by employing the feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers and a two-step feature selection approach. The predicted probability scores of NPs from the 66 baseline models were combined into the input feature vector. Second, to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then fed the optimal one into a random forest classifier to construct the final meta-model (NeuroPred-FRL). Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves superior NP prediction performance compared with other state-of-the-art predictors. We believe that the proposed NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some model mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanation algorithm.
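
The construction of the 66-D probability feature vector described above can be sketched as follows. This is a minimal illustration and not the NeuroPred-FRL implementation: the encoding and classifier names, the example peptide string, and the `toy_score` function are hypothetical placeholders standing in for trained baseline models, since the abstract gives only the counts (11 encodings and six classifiers).

```python
from itertools import product

# Hypothetical placeholder names; the abstract states only the counts (11 and 6).
ENCODINGS = [f"encoding_{i}" for i in range(1, 12)]
CLASSIFIERS = [f"classifier_{j}" for j in range(1, 7)]
BASELINES = list(product(ENCODINGS, CLASSIFIERS))  # 11 x 6 = 66 pairs


def toy_score(encoding, classifier, peptide):
    """Placeholder for a trained baseline model's predicted NP probability.

    Returns a deterministic value in [0, 1]; it has no biological meaning.
    """
    seed = sum(ord(ch) for ch in encoding + classifier + peptide)
    return (seed % 101) / 100.0


def probability_feature_vector(peptide, score_fn=toy_score):
    """Concatenate one predicted probability per baseline model."""
    return [score_fn(enc, clf, peptide) for enc, clf in BASELINES]


vector = probability_feature_vector("YGGFLRRIRPKLK")
print(len(vector))
```

In the pipeline the abstract describes, a feature selection step would then prune this vector before it is passed to the final random forest classifier; that step is omitted here.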


Subject(s)
Computational Biology/methods , Machine Learning , Neuropeptides/chemistry , Software , Algorithms , Consensus Sequence , Databases, Genetic , Internet-Based Intervention , Neuropeptides/metabolism , Position-Specific Scoring Matrices , Reproducibility of Results , Workflow
2.
Mol Ther ; 30(8): 2856-2867, 2022 08 03.
Article in English | MEDLINE | ID: mdl-35526094

ABSTRACT

As one of the most prevalent post-transcriptional epigenetic modifications, 5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application in genome-wide detection is quite limited. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were trained with the four encodings, ultimately yielding 32 baseline models. A stacking strategy then integrates the predicted outputs of the optimal baseline models, which are used to train a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and in formulating novel testable biological hypotheses.
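
For readers less familiar with the two metrics reported above, the following sketch shows how accuracy and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The counts in the example are hypothetical and are not taken from the Deepm5C study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when positive and negative classes are imbalanced.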


Subject(s)
Deep Learning , RNA , Algorithms , Computational Biology/methods , Humans , Machine Learning , RNA/genetics
3.
J Comput Aided Mol Des ; 35(3): 315-323, 2021 03.
Article in English | MEDLINE | ID: mdl-33392948

ABSTRACT

Redox-sensitive cysteine (RSC) thiol contributes to many biological processes. The identification of RSCs plays an important role in clarifying some mechanisms of redox-sensitive factors; nonetheless, experimental investigation of RSCs is expensive and time-consuming. Computational approaches that quickly and accurately identify candidate RSCs using sequence information are urgently needed. Herein, an improved and robust computational predictor named IRC-Fuse was developed to identify RSCs by fusing multiple feature representations. To enhance the performance of our model, we integrated the probability scores evaluated by random forest models implementing different encoding schemes. Cross-validation results showed that IRC-Fuse achieved an accuracy and AUC of 0.741 and 0.807, respectively. IRC-Fuse outperformed existing methods, with improvements of 10% and 13% in accuracy and MCC, respectively, on independent test data. Comparative analysis suggested that IRC-Fuse was more effective and promising than the existing predictors. For the convenience of experimental scientists, the IRC-Fuse online web server was implemented and is publicly accessible at http://kurata14.bio.kyutech.ac.jp/IRC-Fuse/ .
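
The AUC reported above is the area under the ROC curve. One standard way to compute it, sketched below, is as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as one half. The labels and scores are hypothetical and unrelated to the IRC-Fuse data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored
    above a randomly chosen negative; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for four candidate sites.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.2]))
```

The pairwise form is quadratic in the number of examples; it is chosen here for clarity rather than speed.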


Subject(s)
Benchmarking/methods , Cysteine/chemistry , Proteins/chemistry , Amino Acid Sequence , Computational Biology , Databases, Factual , Machine Learning , Models, Molecular , Oxidation-Reduction , Sulfhydryl Compounds/chemistry
4.
J Biomed Inform ; 120: 103854, 2021 08.
Article in English | MEDLINE | ID: mdl-34237438

ABSTRACT

In recent years, the comprehensive study of complex diseases with multi-view datasets (e.g., multi-omics and imaging scans) has been at the forefront of biomedical research. State-of-the-art biomedical technologies are enabling us to collect multi-view biomedical datasets for the study of complex diseases. While the different views of data tend to carry complementary information about a disease, analyzing multi-view data with complex interactions to gain a deeper and holistic understanding of biological systems is challenging. In this paper, we propose a novel generalized kernel machine approach to identify higher-order composite effects in multi-view biomedical datasets (GKMAHCE). This generalized semi-parametric (a mixed-effect linear model) approach includes the marginal and joint Hadamard product of features from different views of data. The proposed kernel machine approach considers multi-view data as predictor variables to allow a more thorough and comprehensive modeling of a complex trait. We applied the GKMAHCE approach to both synthesized datasets and real multi-view datasets from adolescent brain development and osteoporosis studies. Our experiments demonstrate that the proposed method can effectively identify higher-order composite effects and suggest that the corresponding features (genes, regions of interest, and chemical taxonomies) function in a concerted effort. We show that the proposed method is more generalizable than existing ones. To promote reproducible research, the source code of the proposed method is available at.
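
The idea of combining marginal and joint effects through a Hadamard product can be illustrated with kernel matrices: each view gets its own Gram matrix for the marginal effect, and their elementwise product represents the joint (composite) effect. The sketch below assumes a linear kernel and two tiny hypothetical views; it is not the GKMAHCE code, and all names are illustrative.

```python
def linear_kernel(samples):
    """Gram matrix of pairwise dot products between samples of one view."""
    return [[sum(a * b for a, b in zip(u, v)) for v in samples] for u in samples]


def hadamard(k1, k2):
    """Elementwise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(row1, row2)] for row1, row2 in zip(k1, k2)]


# Two hypothetical views measured on the same two subjects.
view_a = [[1.0, 2.0], [3.0, 4.0]]  # e.g. two features per subject
view_b = [[1.0], [2.0]]            # e.g. one feature per subject

k_a = linear_kernel(view_a)   # marginal effect of view A
k_b = linear_kernel(view_b)   # marginal effect of view B
k_ab = hadamard(k_a, k_b)     # joint effect of views A and B
print(k_ab)
```

By the Schur product theorem, the Hadamard product of two positive semidefinite kernel matrices is again positive semidefinite, so the joint matrix is itself a valid kernel.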


Subject(s)
Algorithms , Osteoporosis , Adolescent , Brain/diagnostic imaging , Humans , Linear Models , Osteoporosis/diagnostic imaging , Software
5.
Int J Mol Sci ; 22(4), 2021 Feb 20.
Article in English | MEDLINE | ID: mdl-33672741

ABSTRACT

Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites using sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The PUP-Fuse web server with curated datasets is freely available.
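
Integrating several classifier scores with a linear regression model can be sketched as an ordinary least-squares fit of the site labels on the scores. The example below is an assumption-laden illustration, not the PUP-Fuse code: the score values, labels, and function names are hypothetical, and the fit uses the normal equations solved by Gauss-Jordan elimination to stay within the standard library.

```python
def fit_linear_regression(score_rows, targets):
    """Ordinary least squares with an intercept.

    Solves the normal equations (X^T X) w = X^T y by Gauss-Jordan
    elimination and returns [intercept, weight_1, ..., weight_k].
    """
    design = [[1.0] + list(row) for row in score_rows]
    n = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(n)]
        + [sum(x[i] * t for x, t in zip(design, targets))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("scores are collinear; cannot fit weights")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fuse(weights, scores):
    """Combine per-classifier scores into one fused score."""
    return weights[0] + sum(w * s for w, s in zip(weights[1:], scores))


# Hypothetical scores from three classifiers for five candidate sites.
site_scores = [
    (0.9, 0.8, 0.7),
    (0.2, 0.1, 0.3),
    (0.7, 0.3, 0.5),
    (0.4, 0.9, 0.2),
    (0.1, 0.5, 0.8),
]
site_labels = [1.0, 0.0, 1.0, 1.0, 0.0]  # hypothetical labels
weights = fit_linear_regression(site_scores, site_labels)
print(fuse(weights, (0.6, 0.7, 0.4)))
```

A fused score obtained this way is not constrained to [0, 1], so a decision threshold would still have to be chosen on validation data.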


Subject(s)
Algorithms , Computational Biology/methods , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Databases, Protein
6.
Neurocomputing (Amst) ; 304: 12-29, 2018 Aug 23.
Article in English | MEDLINE | ID: mdl-30416263

ABSTRACT

Many unsupervised kernel methods rely on the estimation of kernel covariance operator (kernel CO) or kernel cross-covariance operator (kernel CCO). Both are sensitive to contaminated data, even when bounded positive definite kernels are used. To the best of our knowledge, there are few well-founded robust kernel methods for statistical unsupervised learning. In addition, while the influence function (IF) of an estimator can characterize its robustness, asymptotic properties and standard error, the IF of a standard kernel canonical correlation analysis (standard kernel CCA) has not been derived yet. To fill this gap, we first propose a robust kernel covariance operator (robust kernel CO) and a robust kernel cross-covariance operator (robust kernel CCO) based on a generalized loss function instead of the quadratic loss function. Second, we derive the IF for robust kernel CCO and standard kernel CCA. Using the IF of the standard kernel CCA, we can detect influential observations from two sets of data. Finally, we propose a method based on the robust kernel CO and the robust kernel CCO, called robust kernel CCA, which is less sensitive to noise than the standard kernel CCA. The introduced principles can also be applied to many other kernel methods involving kernel CO or kernel CCO. Our experiments on both synthesized and imaging genetics data demonstrate that the proposed IF of standard kernel CCA can identify outliers. It is also seen that the proposed robust kernel CCA method performs better for ideal and contaminated data than the standard kernel CCA.
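
To give a feel for what replacing the quadratic loss with a generalized loss does, the sketch below computes robust weights for a kernel mean element by iterative reweighting with a Huber loss. This is an illustrative assumption, not the estimator derived in the paper: the Gaussian kernel, the Huber threshold `c`, the data, and all function names are hypothetical choices, and only the mean element (not the covariance operators) is shown.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def huber_weight(residual, c):
    """psi(r)/r for the Huber loss: 1 inside the threshold, c/r outside."""
    return 1.0 if residual <= c else c / residual


def robust_mean_weights(data, c=0.5, sigma=1.0, iterations=20):
    """Weights w_i of a robust kernel mean sum_i w_i * phi(x_i).

    Each pass measures the feature-space distance of every point to the
    current weighted mean and down-weights points that lie far from it.
    """
    n = len(data)
    gram = [[gaussian_kernel(a, b, sigma) for b in data] for a in data]
    weights = [1.0 / n] * n  # start from the ordinary (quadratic-loss) mean
    for _ in range(iterations):
        quad = sum(
            weights[j] * weights[l] * gram[j][l]
            for j in range(n)
            for l in range(n)
        )
        raw = []
        for i in range(n):
            cross = sum(weights[j] * gram[i][j] for j in range(n))
            squared = gram[i][i] - 2.0 * cross + quad
            raw.append(huber_weight(math.sqrt(max(squared, 0.0)), c))
        total = sum(raw)
        weights = [v / total for v in raw]
    return weights


# Four clustered points and one contaminated observation (hypothetical).
sample = [0.0, 0.1, -0.1, 0.05, 5.0]
print(robust_mean_weights(sample))
```

With the quadratic loss every observation keeps weight 1/n; under the Huber loss the contaminated point ends up with a smaller weight than the clustered points, which is the sense in which the resulting estimate is less sensitive to contamination.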

7.
Alzheimers Res Ther ; 16(1): 8, 2024 01 11.
Article in English | MEDLINE | ID: mdl-38212844

ABSTRACT

BACKGROUND: Specific peripheral proteins have been implicated in the development of Alzheimer's disease (AD). However, the roles of additional novel protein biomarkers in AD etiology remain elusive. The availability of large-scale AD GWAS and plasma proteomic data provides the resources needed for the identification of causally relevant circulating proteins that may serve as risk factors for AD and potential therapeutic targets. METHODS: We established and validated genetic prediction models for protein levels in plasma as instruments to investigate the associations between genetically predicted protein levels and AD risk. We studied 71,880 (proxy) cases and 383,378 (proxy) controls of European descent. RESULTS: We identified 69 proteins with genetically predicted concentrations showing associations with AD risk. The drugs almitrine and ciclopirox, which target ATP1A1, were suggested to have potential for being repositioned for AD treatment. CONCLUSIONS: Our study provides additional insights into the underlying mechanisms of AD and potential therapeutic strategies.


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/genetics , Proteomics , Risk Factors , Blood Proteins/genetics , Biomarkers , Genome-Wide Association Study
8.
Gigascience ; 13, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38608280

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) remains a lethal malignancy, largely due to the paucity of reliable biomarkers for early detection and therapeutic targeting. Existing blood protein biomarkers for PDAC often suffer from replicability issues, arising from inherent limitations such as unmeasured confounding factors in conventional epidemiologic study designs. To circumvent these limitations, we use genetic instruments to identify proteins whose genetically predicted levels are associated with PDAC risk. Leveraging genome and plasma proteome data from the INTERVAL study, we established and validated models to predict protein levels using genetic variants. By examining 8,275 PDAC cases and 6,723 controls, we identified 40 associated proteins, of which 16 are novel. We functionally validated these candidates by focusing on 2 selected novel protein-encoding genes, GOLM1 and B4GALT1, and demonstrated their pivotal roles in driving PDAC cell proliferation, migration, and invasion. Furthermore, we identified potential drug repurposing opportunities for treating PDAC. SIGNIFICANCE: PDAC is a notoriously difficult-to-treat malignancy, and our limited understanding of causal protein markers hampers progress in developing effective early detection strategies and treatments. Our study identifies novel causal proteins using genetic instruments and subsequently functionally validates selected novel proteins. This dual approach enhances our understanding of PDAC etiology and potentially opens new avenues for therapeutic interventions.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Proteome , Carcinoma, Pancreatic Ductal/diagnosis , Carcinoma, Pancreatic Ductal/genetics , Glycosyltransferases , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/genetics , Biomarkers , Membrane Proteins
9.
Curr Med Chem ; 29(5): 865-880, 2022.
Article in English | MEDLINE | ID: mdl-34348604

ABSTRACT

MicroRNAs (miRNAs) are central players that regulate the post-transcriptional processes of gene expression. Binding of miRNAs to target mRNAs can repress gene expression by inducing degradation of the target mRNAs or by inhibiting their translation. High-throughput experimental approaches for miRNA target identification are costly and time-consuming, depending on various factors. It is vitally important to develop bioinformatics methods for accurately predicting miRNA targets. With the increase of RNA sequences in the post-genomic era, bioinformatics methods are being developed for miRNA studies, especially for miRNA target prediction. This review summarizes the current development of state-of-the-art bioinformatics tools for miRNA target prediction and outlines the progress, limitations, and working principles of the available miRNA databases. Finally, we discuss the caveats and perspectives of next-generation algorithms for the prediction of miRNA targets.


Subject(s)
MicroRNAs , Algorithms , Computational Biology/methods , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Messenger/genetics
10.
PLoS One ; 14(5): e0217027, 2019.
Article in English | MEDLINE | ID: mdl-31120939

ABSTRACT

BACKGROUND: Gene shaving (GS) is an essential and challenging tool for biomedical researchers due to the large number of genes in the human genome and the complex nature of biological networks. Most GS methods are not applicable to non-linear and multi-view data sets. While kernel-based methods can overcome these problems, a well-founded positive definite kernel-based GS method has yet to be proposed for biomedical data analysis. METHODS AND FINDINGS: Since kernel-based methods on genomic information can improve the prediction of diseases, we propose a novel method, "kernel-based gene shaving", which is based on the influence function of kernel canonical correlation analysis. To investigate the performance of the proposed method in comparison to state-of-the-art methods in gene shaving, we analyzed extensive simulated and real microarray gene expression data sets. The performance metrics, including true positive rate, true negative rate, false positive rate, false negative rate, misclassification error rate, false discovery rate and area under the curve, were computed for each method. In colon cancer data analysis, the proposed method identified a significant subset of 210 genes out of 2000 genes and showed suggestive superior performance compared with other methods. The proposed method can be applied to the study of other disease processes where two-view data analysis is a common task. CONCLUSIONS: We addressed the challenge of finding unique kernel-based GS methods by using the influence function of kernel canonical correlation analysis. The proposed method has been shown to perform better than state-of-the-art methods in gene shaving and has identified many more significant gene interactions, suggesting that genes function in a concerted effort in colon cancer. In similar biomedical data analyses, kernel-based methods could be applied to select a potential subset of genes. The positive definite kernel-based methods can overcome the non-linearity problem and improve the prediction process.
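
The rate-based performance metrics listed in the abstract all follow from the four cells of a binary confusion matrix. A minimal sketch is given below; the counts are hypothetical and are not results from the study, and the function assumes every denominator is nonzero.

```python
def classification_rates(tp, tn, fp, fn):
    """Rates derived from a binary confusion matrix.

    Assumes tp + fn, tn + fp and tp + fp are all nonzero.
    """
    total = tp + tn + fp + fn
    return {
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "misclassification_error_rate": (fp + fn) / total,
        "false_discovery_rate": fp / (fp + tp),
    }


# Hypothetical counts of selected versus truly relevant genes.
rates = classification_rates(tp=8, tn=85, fp=5, fn=2)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

The true positive and false negative rates sum to one, as do the true negative and false positive rates, so in practice only one rate from each pair carries independent information.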


Subject(s)
Colonic Neoplasms/diagnosis , Colonic Neoplasms/genetics , Genetic Techniques , Machine Learning , Algorithms , Area Under Curve , Artificial Intelligence , Computational Biology , Computer Simulation , False Positive Reactions , Gene Expression Profiling , Humans , Nonlinear Dynamics , Oligonucleotide Array Sequence Analysis , Software
11.
J Neurosci Methods ; 309: 161-174, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30184473

ABSTRACT

BACKGROUND: Technological advances are enabling us to collect multimodal datasets at increasing depth and resolution with decreasing labor. Understanding complex interactions among multimodal datasets, however, is challenging. NEW METHOD: In this study, we tested the interaction effect of multimodal datasets using a novel method called the kernel machine for detecting higher order interactions among biologically relevant multimodal data. Using a semiparametric method on a reproducing kernel Hilbert space, we formulated the proposed method as a standard mixed-effects linear model and derived a score-based variance component statistic to test higher order interactions between multimodal datasets. RESULTS: The method was evaluated using extensive numerical simulation and real data from the Mind Clinical Imaging Consortium with both schizophrenia patients and healthy controls. Our method identified 13 triplets that included 6 gene-derived SNPs, 10 ROIs, and 6 gene-specific DNA methylations that are correlated with the changes in hippocampal volume, suggesting that these triplets may be important for explaining schizophrenia-related neurodegeneration. COMPARISON WITH EXISTING METHOD(S): The performance of the proposed method is compared with the following methods: tests based on only the first and the first few principal components followed by multiple regression, full principal component analysis regression, and the sequence kernel association test. CONCLUSIONS: With strong evidence (p-value ≤0.000001), the triplet (MAGI2, CRBLCrus1.L, FBXO28) is a significant biomarker for schizophrenia patients. This novel method is applicable to the study of other disease processes where multimodal data analysis is a common task.


Subject(s)
Machine Learning , Multivariate Analysis , Neuroimaging/methods , Schizophrenia/diagnosis , Adult , Algorithms , Computer Simulation , Female , Humans , Male , ROC Curve , Schizophrenia/diagnostic imaging , Schizophrenia/genetics