Results 1 - 20 of 1,238
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600665

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) facilitates the study of cell type heterogeneity and the construction of cell atlases. However, due to its limitations, many genes may be detected with zero expression (dropout events), which biases downstream analyses and hinders the identification and characterization of cell types and cell functions. Although many imputation methods have been developed, their performance is generally lower than expected across different kinds and dimensions of data and application scenarios. Therefore, developing an accurate and robust imputation method for single-cell gene expression data remains essential. To maintain the original cell-cell and gene-gene correlations and to leverage information from bulk RNA sequencing (bulk RNA-seq) data, we propose scINRB, a single-cell gene expression imputation method with network regularization and bulk RNA-seq data. scINRB adopts network-regularized non-negative matrix factorization to ensure that the imputed data maintain the cell-cell and gene-gene similarities and also approach the average gene expression calculated from bulk RNA-seq data. To evaluate its performance, we test scINRB on simulated and experimental datasets and compare it with other commonly used imputation methods. The results show that scINRB recovers gene expression accurately even at high dropout rates and dimensions, preserves cell-cell and gene-gene similarities, and improves various downstream analyses, including visualization, clustering and trajectory inference.


Subject(s)
Algorithms; Single-Cell Analysis; RNA-Seq; Single-Cell Analysis/methods; Sequence Analysis, RNA/methods; Cluster Analysis; Gene Expression; Gene Expression Profiling; Software
2.
Proc Natl Acad Sci U S A ; 120(18): e2218197120, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37094150

ABSTRACT

System identification learns mathematical models of dynamic systems from input-output data. Despite its long history, this research area is still extremely active. New challenges are posed by the identification of complex physical processes given by the interconnection of dynamic systems. Examples arise in biology and industry, e.g., in the study of brain dynamics or sensor networks. In recent years, regularized kernel-based identification, with inspiration from machine learning, has emerged as an interesting alternative to the classical approach commonly adopted in the literature. In the linear setting, it uses the class of stable kernels to include fundamental features of physical dynamical systems, e.g., smooth exponential decay of impulse responses. This class also includes unknown continuous parameters, called hyperparameters, which play a role similar to that of the discrete model order in controlling complexity. In this paper, we develop a linear system identification procedure by casting stable kernels in a full Bayesian framework. Our models incorporate hyperparameter uncertainty and consist of a mixture of dynamic systems over a continuous spectrum of dimensions. They are obtained by overcoming drawbacks of classical Markov chain Monte Carlo schemes that, when applied to stable kernels, are proved to become nearly reducible (i.e., unable to reconstruct posteriors of interest in reasonable time). Numerical experiments show that full Bayes frequently outperforms state-of-the-art results on typical benchmark problems. Two real applications related to brain dynamics (neural activity) and sensor networks are also included.
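
To make the idea of a stable kernel concrete, here is a minimal sketch of regularized kernel-based impulse response estimation with fixed hyperparameters. It assumes the widely used TC ("tuned/correlated") kernel, K[i, j] = c * alpha**max(i, j), as one example of a stable kernel; the abstract does not say which kernels the authors use, and the full Bayesian treatment of hyperparameters it describes is not reproduced here. The function names and the simulated FIR system are hypothetical.

```python
import numpy as np

def tc_kernel(n, alpha, c=1.0):
    """TC kernel (assumed example of a stable kernel): K[i, j] = c * alpha**max(i, j)."""
    idx = np.arange(1, n + 1)
    return c * alpha ** np.maximum.outer(idx, idx)

def kernel_estimate(Phi, y, K, sigma2):
    """Regularized estimate g = K Phi^T (Phi K Phi^T + sigma2 I)^{-1} y."""
    G = Phi @ K @ Phi.T + sigma2 * np.eye(len(y))
    return K @ Phi.T @ np.linalg.solve(G, y)

# Hypothetical simulated data: an exponentially decaying impulse response.
rng = np.random.default_rng(0)
n, N = 30, 200
g_true = 0.8 ** np.arange(1, n + 1)
u = rng.normal(size=N + n)
# Row t of the regressor matrix holds the n most recent inputs, newest first.
Phi = np.array([u[t + n - 1 - np.arange(n)] for t in range(N)])
y = Phi @ g_true + 0.1 * rng.normal(size=N)

K = tc_kernel(n, alpha=0.8)
g_hat = kernel_estimate(Phi, y, K, sigma2=0.01)
```

Here `alpha` and `c` play the role of the continuous hyperparameters mentioned in the abstract: `alpha` sets how quickly the prior variance of the impulse response decays, which controls complexity in place of a discrete model order.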

3.
J Neurosci ; 44(12)2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38388427

ABSTRACT

Individual differences in cognitive performance in childhood are a key predictor of significant life outcomes such as educational attainment and mental health. Differences in cognitive ability are governed in part by variations in brain structure. However, studies commonly focus on either gray or white matter metrics in humans, leaving open the key question as to whether gray or white matter microstructure plays distinct or complementary roles supporting cognitive performance. To compare the role of gray and white matter in supporting cognitive performance, we used regularized structural equation models to predict cognitive performance with gray and white matter measures. Specifically, we compared how gray matter (volume, cortical thickness, and surface area) and white matter measures (volume, fractional anisotropy, and mean diffusivity) predicted individual differences in cognitive performance. The models were tested in 11,876 children (ABCD Study; 5,680 female, 6,196 male) at 10 years old. We found that gray and white matter metrics bring partly nonoverlapping information to predict cognitive performance. The models with only gray or white matter explained respectively 15.4 and 12.4% of the variance in cognitive performance, while the combined model explained 19.0%. Zooming in, we additionally found that different metrics within gray and white matter had different predictive power and that the tracts/regions that were most predictive of cognitive performance differed across metrics. These results show that studies focusing on a single metric in either gray or white matter to study the link between brain structure and cognitive performance are missing a key part of the equation.
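
The "partly nonoverlapping information" claim can be illustrated with simple arithmetic on the three percentages quoted in the abstract. This is a back-of-the-envelope sketch only: the split into unique and shared parts assumes the explained-variance figures can be combined additively, which the abstract does not state, and the variable names are hypothetical.

```python
# Percentages of variance explained, as quoted in the abstract.
r2_gray, r2_white, r2_both = 15.4, 12.4, 19.0

# What each modality adds beyond the other, and what they have in common
# (assumes an additive decomposition, which is an illustration, not a reported result).
unique_gray = round(r2_both - r2_white, 1)
unique_white = round(r2_both - r2_gray, 1)
shared = round(r2_gray + r2_white - r2_both, 1)

print(unique_gray, unique_white, shared)
```

Under that assumption, gray matter adds about 6.6 percentage points beyond white matter, white matter adds about 3.6 beyond gray matter, and about 8.8 points are common to both.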


Subject(s)
White Matter; Child; Humans; Male; Female; White Matter/diagnostic imaging; Brain/diagnostic imaging; Gray Matter/diagnostic imaging; Diffusion Magnetic Resonance Imaging; Cognition
4.
Methods ; 226: 61-70, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38631404

ABSTRACT

As the most abundant mRNA modification, m6A controls and influences many aspects of mRNA metabolism, including mRNA stability and degradation. However, the role of specific m6A sites in regulating gene expression remains unclear. In addition, the multicollinearity caused by correlated methylation levels across multiple m6A sites in each gene could degrade prediction performance. To address these challenges, we propose an elastic-net regularized negative binomial regression model (called m6Aexpress-enet) to predict which m6A sites could potentially regulate the expression of their gene. Comprehensive evaluations on simulated datasets demonstrate that m6Aexpress-enet achieves the top prediction performance. Applying m6Aexpress-enet to real MeRIP-seq data from human lymphoblastoid cell lines, we uncover the complex regulatory pattern of predicted m6A sites and the unique enrichment pathways of the constructed co-methylation modules. m6Aexpress-enet proves to be a powerful tool that enables biologists to discover the mechanisms by which m6A regulates gene expression. Furthermore, the source code and a step-by-step implementation of m6Aexpress-enet are freely accessible at https://github.com/tengzhangs/m6Aexpress-enet.
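
A short sketch of why an elastic-net penalty helps with correlated predictors such as methylation levels at neighbouring m6A sites. It assumes the common glmnet-style parameterization, lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2); the abstract does not state which parameterization m6Aexpress-enet uses, and the function and variable names are hypothetical. The negative binomial likelihood itself is omitted.

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2) (assumed convention)."""
    beta = np.asarray(beta, dtype=float)
    return lam * (alpha * np.abs(beta).sum() + 0.5 * (1 - alpha) * np.sum(beta ** 2))

# Two ways of assigning the same total effect to two perfectly correlated sites.
concentrated = [1.0, 0.0]
split = [0.5, 0.5]

# Pure lasso (alpha = 1) cannot tell the two assignments apart ...
lasso_gap = elastic_net_penalty(concentrated, 1.0, 1.0) - elastic_net_penalty(split, 1.0, 1.0)
# ... while the ridge part of the elastic net favours splitting the effect.
enet_gap = elastic_net_penalty(concentrated, 1.0, 0.5) - elastic_net_penalty(split, 1.0, 0.5)
```

With `alpha = 0.5` the concentrated assignment is penalized more than the split one, which is the usual reason for preferring the elastic net over the lasso when predictors are collinear.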


Subject(s)
Gene Expression Regulation; RNA, Messenger; Humans; RNA, Messenger/genetics; RNA, Messenger/metabolism; Gene Expression Regulation/genetics; Computational Biology/methods; Methylation; Software; Adenosine/metabolism; Adenosine/genetics; Adenosine/analogs & derivatives; Regression Analysis
5.
Proc Natl Acad Sci U S A ; 119(9)2022 Mar 01.
Article in English | MEDLINE | ID: mdl-35197293

ABSTRACT

Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS builds on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. The identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. The analytic results also explain why mixtures of spherically symmetric Gaussians (used heuristically in many popular data analysis algorithms) represent an optimal and least-biased choice for nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming the predictive performance of common learning tools for the same data.
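
The closed-form solution mentioned above can be sketched for the textbook version of the problem: minimizing sum(w * loss) + eps * sum(w * log(w)) over the probability simplex gives weights proportional to exp(-loss / eps). This is an illustration under that assumed formulation, not the authors' EOS implementation; `entropic_weights` and the toy losses are hypothetical.

```python
import numpy as np

def entropic_weights(losses, eps):
    """Minimizer of sum(w * l) + eps * sum(w * log w) over the simplex: w ~ exp(-l / eps)."""
    z = -np.asarray(losses, dtype=float) / eps
    z -= z.max()  # shift for numerical stability; the normalized result is unchanged
    w = np.exp(z)
    return w / w.sum()

# Hypothetical per-sample losses; the last sample behaves like an outlier.
losses = np.array([0.1, 0.2, 0.15, 5.0])
w = entropic_weights(losses, eps=0.5)
w_flat = entropic_weights(losses, eps=1e6)  # strong regularization: nearly uniform weights
```

The computation touches each per-sample loss once, so its cost grows linearly with the number of samples and does not depend on the data dimension, consistent with the cost statement in the abstract.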

6.
Proc Natl Acad Sci U S A ; 119(45): e2206704119, 2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36322739

ABSTRACT

New neurons are continuously generated in the subgranular zone of the dentate gyrus throughout adulthood. These new neurons gradually integrate into hippocampal circuits, forming new naive synapses. Viewed from this perspective, these new neurons may represent a significant source of "wiring" noise in hippocampal networks. In machine learning, such noise injection is commonly used as a regularization technique. Regularization techniques help prevent overfitting training data and allow models to generalize learning to new, unseen data. Using a computational modeling approach, here we ask whether a neurogenesis-like process similarly acts as a regularizer, facilitating generalization in a category learning task. In a convolutional neural network (CNN) trained on the CIFAR-10 object recognition dataset, we modeled neurogenesis as a replacement/turnover mechanism, where weights for a randomly chosen small subset of hidden layer neurons were reinitialized to new values as the model learned to categorize 10 different classes of objects. We found that neurogenesis enhanced generalization on unseen test data compared to networks with no neurogenesis. Moreover, neurogenic networks either outperformed or performed similarly to networks with conventional noise injection (i.e., dropout, weight decay, and neural noise). These results suggest that neurogenesis can enhance generalization in hippocampal learning through noise injection, expanding on the roles that neurogenesis may have in cognition.
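
The replacement/turnover mechanism described above can be sketched as a function that periodically reinitializes the weights of a random subset of hidden units. This is a minimal NumPy illustration, not the authors' CNN code: resetting both incoming and outgoing weights, the Gaussian reinitialization scale, and all names are assumptions.

```python
import numpy as np

def neurogenesis_step(W_in, W_out, frac, rng, scale=0.01):
    """Reinitialize the weights of a random fraction of hidden units, in place.

    W_in has shape (n_inputs, n_hidden); W_out has shape (n_hidden, n_outputs).
    Returns the indices of the replaced hidden units.
    """
    n_hidden = W_in.shape[1]
    k = max(1, int(round(frac * n_hidden)))
    idx = rng.choice(n_hidden, size=k, replace=False)
    W_in[:, idx] = rng.normal(0.0, scale, size=(W_in.shape[0], k))
    W_out[idx, :] = rng.normal(0.0, scale, size=(k, W_out.shape[1]))
    return idx

rng = np.random.default_rng(0)
W_in = np.ones((8, 20))    # stand-in for trained weights
W_out = np.ones((20, 10))
replaced = neurogenesis_step(W_in, W_out, frac=0.1, rng=rng)
```

In a training loop, a step like this would be called every few updates so that a small part of the hidden layer is always "young", which is what makes the process act as a form of noise injection.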


Subject(s)
Memory; Neurogenesis; Memory/physiology; Neurogenesis/physiology; Hippocampus/physiology; Neurons/physiology; Synapses; Dentate Gyrus/physiology
7.
BMC Bioinformatics ; 25(1): 169, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38684942

ABSTRACT

Many important biological discoveries have been made as single-cell RNA sequencing (scRNA-seq) technology has advanced. With this technology, it is now possible to investigate the connections among individual cells, genes, and illnesses. Clustering is frequently used to analyze single-cell data. Nevertheless, biological data usually contain a large amount of noise, and traditional clustering methods are sensitive to noise. Moreover, acquiring higher-order spatial information from the data alone is insufficient. As a result, obtaining trustworthy clustering results is challenging. We propose Cauchy hyper-graph Laplacian non-negative matrix factorization (CHLNMF) as a unique approach to address these issues. In CHLNMF, we replace the Euclidean-distance-based measurement of conventional non-negative matrix factorization (NMF) with the Cauchy loss function (CLF), which can lessen the influence of noise. The model also incorporates a hyper-graph constraint, which takes into account the high-order links among the samples. The optimal solution of the CHLNMF model is then found using a half-quadratic optimization approach. Finally, using seven scRNA-seq datasets, we compare CHLNMF with nine other leading methods. The validity of our technique is established by analysis of the experimental outcomes.
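
A small numerical sketch of why swapping the squared Euclidean error for a Cauchy loss reduces the influence of noise. It assumes the common form log(1 + r^2 / c^2) for the Cauchy loss on a residual r; the abstract does not give the exact form or scale parameter used in CHLNMF, and the toy matrices are hypothetical.

```python
import numpy as np

def squared_loss(X, X_hat):
    """Squared Frobenius reconstruction error, as in conventional NMF."""
    return float(np.sum((X - X_hat) ** 2))

def cauchy_loss(X, X_hat, c=1.0):
    """Cauchy loss sum(log(1 + r^2 / c^2)) over entrywise residuals r (assumed form)."""
    r2 = (X - X_hat) ** 2
    return float(np.sum(np.log1p(r2 / c ** 2)))

X_hat = np.zeros((3, 3))          # stand-in for a reconstruction W @ H
clean = np.full((3, 3), 0.1)
noisy = clean.copy()
noisy[0, 0] = 100.0               # a single corrupted entry

squared_jump = squared_loss(noisy, X_hat) - squared_loss(clean, X_hat)
cauchy_jump = cauchy_loss(noisy, X_hat) - cauchy_loss(clean, X_hat)
```

One corrupted entry increases the squared error by roughly 10,000 but the Cauchy loss by less than 10, because the Cauchy loss grows only logarithmically in the residual.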


Subject(s)
Algorithms; Sequence Analysis, RNA; Single-Cell Analysis; Single-Cell Analysis/methods; Sequence Analysis, RNA/methods; Humans; Cluster Analysis; Computational Biology/methods
8.
J Cell Mol Med ; 28(17): e18553, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39239860

ABSTRACT

Microbes are involved in a wide range of biological processes and are closely associated with disease. Inferring potential disease-associated microbes as biomarkers or drug targets may help prevent, diagnose and treat complex human diseases. However, biological experiments are time-consuming and expensive. In this study, we introduce a new method called iPALM-GLMF, which models microbe-disease association prediction as a non-negative matrix factorization problem with graph dual regularization terms and $L_{2,1}$ norm regularization terms. The graph dual regularization terms are used to capture potential features in the microbe and disease spaces, and the $L_{2,1}$ norm regularization terms are used to ensure the sparsity of the feature matrices obtained from the non-negative matrix factorization and to improve interpretability. To solve the model, iPALM-GLMF uses a non-negative double singular value decomposition to initialize the matrix factorization and adopts an inertial Proximal Alternating Linear Minimization iterative process to obtain the final matrix factorization results. iPALM-GLMF performed better than other existing methods in leave-one-out cross-validation and fivefold cross-validation. In addition, case studies of different diseases demonstrated that iPALM-GLMF could effectively predict potential microbe-disease associations. iPALM-GLMF is publicly available at https://github.com/LiangzheZhang/iPALM-GLMF.
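
For readers unfamiliar with the $L_{2,1}$ norm, the sketch below computes it and shows how its proximal operator zeroes out whole rows, which is where the sparsity of the factor matrices comes from. It assumes the row-wise convention (sum of the Euclidean norms of the rows); some papers apply it column-wise, and the abstract does not say which is used. The helper names are hypothetical and this is not the iPALM-GLMF solver.

```python
import numpy as np

def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M (row-wise convention assumed)."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def prox_l21(M, t):
    """Proximal operator of t * ||M||_{2,1}: shrink each row's norm by t, down to zero."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * M

M = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
value = l21_norm(M)        # row norms are 5, 0 and 13
P = prox_l21(M, t=6.0)     # rows with norm below 6 are removed entirely
```

Rows are kept or dropped as a unit, so a feature that contributes little across all latent factors is eliminated as a whole, which is what makes the resulting factors easier to interpret.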


Subject(s)
Algorithms; Humans; Computational Biology/methods; Microbiota
9.
BMC Genomics ; 25(1): 885, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39304826

ABSTRACT

MicroRNAs (miRNAs) have been demonstrated to be closely related to human diseases. Studying the potential associations between miRNAs and diseases contributes to our understanding of disease pathogenic mechanisms. As traditional biological experiments are costly and time-consuming, computational models can serve as effective complementary tools. In this study, we propose a novel model of robust orthogonal non-negative matrix tri-factorization (NMTF) with self-paced learning and dual hypergraph regularization, named SPLHRNMTF, to predict miRNA-disease associations. More specifically, SPLHRNMTF first uses a non-linear fusion method to obtain comprehensive miRNA and disease similarities. Subsequently, the miRNA-disease association matrix is reformulated based on weighted k-nearest neighbor profiles to correct false-negative associations. In addition, we use the $L_{2,1}$ norm instead of the Frobenius norm to calculate the residual error, alleviating the impact of noise and outliers on prediction performance. Then, we integrate self-paced learning into NMTF to keep the model from falling into bad local optima by gradually including samples from easy to complex. Finally, hypergraph regularization is introduced to capture high-order complex relations from hypergraphs related to miRNAs and diseases. In five repetitions of 5-fold cross-validation, SPLHRNMTF obtains higher average AUC values than other baseline models. Moreover, case studies on breast neoplasms and lung neoplasms further demonstrate the accuracy of SPLHRNMTF, and the potential associations discovered are of biological significance.


Subject(s)
Computational Biology; MicroRNAs; MicroRNAs/genetics; Humans; Computational Biology/methods; Algorithms; Genetic Predisposition to Disease; Machine Learning; Lung Neoplasms/genetics
10.
Neuroimage ; 299: 120839, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39251116

ABSTRACT

Accurate diagnosis of mental disorders is expected to be achieved through the identification of reliable neuroimaging biomarkers with the help of cutting-edge feature selection techniques. However, existing feature selection methods often fall short in capturing the local structural characteristics among samples and in effectively eliminating redundant features, resulting in inadequate performance in disorder prediction. To address this gap, we propose a novel supervised method named local-structure-preservation and redundancy-removal-based feature selection (LRFS), and then apply it to the identification of meaningful biomarkers for schizophrenia (SZ). The LRFS method leverages graph-based regularization to preserve original sample similarity relationships during data transformation, thus retaining crucial local structure information. Additionally, it introduces redundancy-removal regularization based on interrelationships among features to exclude similar and redundant features from high-dimensional data. Moreover, the LRFS method incorporates $\ell_{2,1}$ sparse regularization, which enables the selection of a sparse and noise-robust feature subset. Experimental evaluations on eight public datasets with diverse properties demonstrate the superior performance of our method over nine popular feature selection methods in identifying discriminative features, with average classification accuracy gains ranging from 1.30% to 9.11%. Furthermore, the LRFS method demonstrates superior discriminability in four functional magnetic resonance imaging (fMRI) datasets from 708 healthy controls (HCs) and 537 SZ patients, with an average increase in classification accuracy ranging from 1.89% to 9.24% compared to the other nine methods. Notably, our method reveals reproducible and significant changes in SZ patients relative to HCs across the four datasets, predominantly in thalamus-related functional network connectivity, which exhibits a significant correlation with clinical symptoms. Convergence analysis, parameter sensitivity analysis, and ablation studies further demonstrate the effectiveness and robustness of our method. In short, our proposed feature selection method effectively identifies discriminative and reliable features that hold the potential to be biomarkers, paving the way for the elucidation of brain abnormalities and the advancement of precise diagnosis of mental disorders.


Subject(s)
Biomarkers; Magnetic Resonance Imaging; Schizophrenia; Schizophrenia/diagnostic imaging; Humans; Magnetic Resonance Imaging/methods; Brain/diagnostic imaging; Adult; Female; Male; Neuroimaging/methods
11.
Biostatistics ; 2023 Aug 31.
Article in English | MEDLINE | ID: mdl-37660312

ABSTRACT

Despite growing interest in estimating individualized treatment rules, little attention has been given to the binary outcome setting. Estimation is challenging with nonlinear link functions, especially when variable selection is needed. We use a new computational approach to solve a recently proposed doubly robust regularized estimating equation to accomplish this difficult task in a case study of depression treatment. We demonstrate an application of this new approach in combination with a weighted and penalized estimating equation to this challenging binary outcome setting. We demonstrate the double robustness of the method and its effectiveness for variable selection. The work is motivated by and applied to an analysis of treatment for unipolar depression using a population of patients treated at Kaiser Permanente Washington.

12.
Biostatistics ; 24(2): 227-243, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-34545394

ABSTRACT

Many studies collect functional data from multiple subjects that have both multilevel and multivariate structures. An example of such data comes from popular neuroscience experiments where participants' brain activity is recorded using modalities such as electroencephalography and summarized as power within multiple time-varying frequency bands within multiple electrodes, or brain regions. Summarizing the joint variation across multiple frequency bands for both whole-brain variability between subjects, as well as location-variation within subjects, can help to explain neural reactions to stimuli. This article introduces a novel approach to conducting interpretable principal components analysis on multilevel multivariate functional data that decomposes total variation into subject-level and replicate-within-subject-level (i.e., electrode-level) variation and provides interpretable components that can be both sparse among variates (e.g., frequency bands) and have localized support over time within each frequency band. Smoothness is achieved through a roughness penalty, while sparsity and localization of components are achieved by solving an innovative rank-one based convex optimization problem with block Frobenius and matrix $L_1$-norm-based penalties. The method is used to analyze data from a study to better understand reactions to emotional information in individuals with histories of trauma and the symptom of dissociation, revealing new neurophysiological insights into how subject- and electrode-level brain activity are associated with these phenomena. Supplementary materials for this article are available online.


Subject(s)
Brain; Electroencephalography; Humans; Principal Component Analysis; Brain/physiology; Electroencephalography/methods
13.
Small ; 20(37): e2402105, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38727184

ABSTRACT

The scarcity of fresh water necessitates sustainable and efficient water desalination strategies. Solar-driven steam generation (SSG), which employs solar energy for water evaporation, has emerged as a promising approach. Graphene oxide (GO)-based membranes offer advantages such as capillary action and the Marangoni effect, but their stacking defects and the dead zones of flexible flakes hinder efficient water transport, so their evaporation rate lags behind that of unobstructed porous 3D evaporators. A fundamental mass-transfer approach to optimizing SSG evaporators therefore offers new horizons. Herein, a universal multi-force-fields-based method is presented to regularize membrane channels, which can mechanically eliminate inherent interlayer stacking and defects. Both characterization and simulation demonstrate the effectiveness of this approach across different scales and explain the intrinsic mechanism of mass-transfer enhancement. When combined with a structurally optimized substrate, the 4Laponite@GO-1 achieves an evaporation rate of 2.782 kg m$^{-2}$ h$^{-1}$ with 94.48% evaporation efficiency, which is comparable to that of most 3D evaporators. Moreover, the optimized membrane exhibits excellent cycling stability (10 days) and tolerance to extreme conditions (pH 1-14, salinity 1%-15%), verifying the robust structural stability of the regularized channels. This optimization strategy provides a simple but efficient way to enhance the SSG performance of GO-based membranes, facilitating their extensive application in sustainable water purification technologies.

14.
Cancer Causes Control ; 35(7): 1075-1088, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38532045

ABSTRACT

PURPOSE: Food insecurity-the lack of unabated access to nutritious foods-is a consequence many cancer survivors face. Food insecurity is associated with adverse health outcomes and lower diet quality in the general public. The goal of this analysis was to extract major and prevailing dietary patterns among food insecure cancer survivors from observed 24-h recall data and evaluate their relationship to survival after a cancer diagnosis. METHODS: We implemented two dietary patterns analysis approaches: penalized logistic regression and principal components analysis. Using nationally representative data from the National Health and Nutrition Examination Survey (NHANES) study, we extracted three dietary patterns. Additionally, we evaluated the HEI-2015 for comparison. Cox proportional hazards models assessed the relationship between the diet quality indices and survival after a cancer diagnosis. RESULTS: There were 981 deaths from all causes and 343 cancer-related deaths. After multivariable adjustment, we found higher risks of all-cause mortality associated with higher adherence to Pattern #1 (HR 1.25; 95% CI 1.09-1.43) and Pattern #2 (HR 1.15; 95% CI 1.01-1.31) among cancer survivors. CONCLUSION: Among all cancer survivors, higher adherence to major and prevailing dietary patterns from the U.S. food insecure cancer survivor population may lead to worse survival outcomes.


Subject(s)
Cancer Survivors; Diet; Food Insecurity; Neoplasms; Nutrition Surveys; Humans; Female; Male; Middle Aged; Cancer Survivors/statistics & numerical data; Neoplasms/mortality; Neoplasms/epidemiology; United States/epidemiology; Adult; Aged; Feeding Behavior; Dietary Patterns
15.
Brief Bioinform ; 23(1)2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34607360

ABSTRACT

Learning node representation is a fundamental problem in biological network analysis, as compact representation features reveal complicated network structures and carry useful information for downstream tasks such as link prediction and node classification. Recently, multiple networks that profile objects from different aspects have increasingly accumulated, providing the opportunity to learn objects from multiple perspectives. However, the complex common and specific information across different networks poses challenges to node representation methods. Moreover, ubiquitous noise in networks calls for more robust representation. To deal with these problems, we present a representation learning method for multiple biological networks. First, we accommodate the noise and spurious edges in networks using denoised diffusion, providing robust connectivity structures for the subsequent representation learning. Then, we introduce a graph regularized integration model to combine refined networks and compute common representation features. By using the regularized decomposition technique, the proposed model can effectively preserve the common structural property of different networks and simultaneously accommodate their specific information, leading to a consistent representation. A simulation study shows the superiority of the proposed method on networks with different levels of noise. Three network-based inference tasks, including drug-target interaction prediction, gene function identification and fine-grained species categorization, are conducted using representation features learned from our method. Biological networks at different scales and levels of sparsity are involved. Experimental results on real-world data show that the proposed method has robust performance compared with alternatives. Overall, by eliminating noise and integrating effectively, the proposed method is able to learn useful representations from multiple biological networks.


Subject(s)
Learning; Neural Networks, Computer; Computer Simulation; Diffusion
16.
Brief Bioinform ; 23(1)2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34607358

ABSTRACT

The discovery of cancer subtypes has become a much-researched topic in oncology. Dividing cancer patients into subtypes can provide personalized treatments for heterogeneous patients. High-throughput technologies provide multiple omics data for cancer subtyping. Many computational methods integrate multi-view data to identify cancer subtypes, yet they obtain different subtypes for the same cancer, even when using the same multi-omics data. To a certain extent, the subtypes from distinct methods are related, which may have guiding significance for cancer subtyping. It is a challenge to effectively use the valuable information in these distinct subtypes to produce more accurate and reliable subtypes. A weighted ensemble sparse latent representation (subtype-WESLR) is proposed to detect cancer subtypes on heterogeneous omics data. Using a weighted ensemble strategy to fuse base clusterings obtained by distinct methods as prior knowledge, subtype-WESLR projects each sample feature profile from each data type to a common latent subspace while maintaining the local structure of the original sample feature space and consistency with the weighted ensemble, and optimizes the common subspace by an iterative method to identify cancer subtypes. We conduct experiments on various synthetic datasets and eight public multi-view datasets from The Cancer Genome Atlas. The results demonstrate that subtype-WESLR outperforms competing methods by integrating the base clusterings of existing methods to obtain more precise subtypes.


Subject(s)
Algorithms; Neoplasms; Cluster Analysis; Humans; Neoplasms/genetics
17.
Brief Bioinform ; 23(3)2022 May 13.
Article in English | MEDLINE | ID: mdl-35323901

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs), as critical regulators, are involved in various fundamental and vital biological processes, and their abnormalities are closely related to human diseases. Predicting disease-related miRNAs is beneficial to uncovering new biomarkers for the prevention, detection, prognosis, diagnosis and treatment of complex diseases. RESULTS: In this study, we propose a multi-view Laplacian regularized deep factorization machine (DeepFM) model, MLRDFM, to predict novel miRNA-disease associations while improving the standard DeepFM. Specifically, MLRDFM improves DeepFM from two aspects: first, MLRDFM takes the relationships among items into consideration by regularizing their embedding features via their similarity-based Laplacians. In this study, miRNA Laplacian regularization integrates four types of miRNA similarity, while disease Laplacian regularization integrates two types of disease similarity. Second, to judiciously train our model, Laplacian eigenmaps are utilized to initialize the weights in the dense embedding layer. The experimental results on the latest HMDD v3.2 dataset show that MLRDFM improves the performance and reduces the overfitting phenomenon of DeepFM. Besides, MLRDFM is greatly superior to the state-of-the-art models in miRNA-disease association prediction in terms of different evaluation metrics with the 5-fold cross-validation. Furthermore, case studies further demonstrate the effectiveness of MLRDFM.
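
The similarity-based Laplacian regularization of embeddings described above can be sketched with the standard graph Laplacian penalty tr(E^T L E), where L = D - S. The snippet checks the well-known identity that this trace equals half the similarity-weighted sum of squared distances between embeddings. It is a generic illustration with hypothetical names and random data, not the MLRDFM code; in a multi-view setting one such term per similarity type would presumably be combined, which the abstract does not detail.

```python
import numpy as np

def laplacian_penalty(E, S):
    """tr(E^T L E) with the unnormalized graph Laplacian L = D - S."""
    L = np.diag(S.sum(axis=1)) - S
    return float(np.trace(E.T @ L @ E))

rng = np.random.default_rng(0)
A = rng.random((5, 5))
S = (A + A.T) / 2            # hypothetical symmetric similarity between 5 items
np.fill_diagonal(S, 0.0)
E = rng.normal(size=(5, 3))  # hypothetical 3-dimensional embeddings of the items

penalty = laplacian_penalty(E, S)
# Equivalent pairwise form: similar items are pushed toward nearby embeddings.
pairwise = 0.5 * sum(
    S[i, j] * np.sum((E[i] - E[j]) ** 2) for i in range(5) for j in range(5)
)
```

The pairwise form makes the effect explicit: the more similar two miRNAs (or two diseases) are, the more the penalty grows when their embedding vectors drift apart.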


Subject(s)
MicroRNAs; Algorithms; Computational Biology/methods; Genetic Predisposition to Disease; Humans; MicroRNAs/genetics
18.
Brief Bioinform ; 23(6)2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36168938

ABSTRACT

More and more evidence indicates that the dysregulations of microRNAs (miRNAs) lead to diseases through various kinds of underlying mechanisms. Identifying the multiple types of disease-related miRNAs plays an important role in studying the molecular mechanism of miRNAs in diseases. Moreover, compared with traditional biological experiments, computational models are time-saving and cost-minimized. However, most tensor-based computational models still face three main challenges: (i) easy to fall into bad local minima; (ii) preservation of high-order relations; (iii) false-negative samples. To this end, we propose a novel tensor completion framework integrating self-paced learning, hypergraph regularization and adaptive weight tensor into nonnegative tensor factorization, called SPLDHyperAWNTF, for the discovery of potential multiple types of miRNA-disease associations. We first combine self-paced learning with nonnegative tensor factorization to effectively alleviate the model from falling into bad local minima. Then, hypergraphs for miRNAs and diseases are constructed, and hypergraph regularization is used to preserve the high-order complex relations of these hypergraphs. Finally, we innovatively introduce adaptive weight tensor, which can effectively alleviate the impact of false-negative samples on the prediction performance. The average results of 5-fold and 10-fold cross-validation on four datasets show that SPLDHyperAWNTF can achieve better prediction performance than baseline models in terms of Top-1 precision, Top-1 recall and Top-1 F1. Furthermore, we implement case studies to further evaluate the accuracy of SPLDHyperAWNTF. As a result, 98 (MDAv2.0) and 98 (MDAv2.0-2) of top-100 are confirmed by HMDDv3.2 dataset. Moreover, the results of enrichment analysis illustrate that unconfirmed potential associations have biological significance.


Subject(s)
MicroRNAs , Humans , MicroRNAs/genetics , Computational Biology/methods , Algorithms , Genetic Predisposition to Disease
19.
Brief Bioinform ; 23(5)2022 Sep 20.
Article in English | MEDLINE | ID: mdl-35514181

ABSTRACT

With the development of high-throughput technologies, the accumulation of large amounts of multidimensional genomic data provides an excellent opportunity to study the multilevel biological regulatory relationships in cancer. According to the competitive endogenous ribonucleic acid (RNA) (ceRNA) network hypothesis, long non-coding RNAs (lncRNAs) can relieve the inhibition that microRNAs (miRNAs) exert on their target genes by binding to intracellular miRNA sites, thereby raising the expression level of these target genes. However, previous studies on cancer expression mechanisms are mostly based on individual or two-dimensional data and lack integration and analysis of various RNA-seq data, making it difficult to verify the complex biological relationships involved. To explore RNA expression patterns and potential molecular mechanisms of cancer, a network-regularized sparse orthogonal-regularized joint non-negative matrix factorization (NSOJNMF) algorithm is proposed. It incorporates the interaction relations among RNA-seq data through network regularization and effectively prevents multicollinearity through sparse constraints and orthogonal regularization constraints, generating good modular sparse solutions. The NSOJNMF algorithm is applied to liver cancer and colon cancer datasets, and their ceRNA co-modules are then identified. The enrichment analysis of these modules shows that >90% of them are closely related to the occurrence and development of cancer. In addition, the ceRNA networks constructed from the ceRNA co-modules not only accurately recover the known correlations among the three RNA molecules but also reveal potential biological associations between them, which may contribute to the exploration of the competitive relationships among multiple RNAs and the molecular mechanisms affecting tumor development.
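To make the idea of network regularization in non-negative matrix factorization more concrete, here is a minimal sketch of the standard single-matrix graph-regularized NMF with multiplicative updates. It is a deliberately simplified stand-in, not the NSOJNMF algorithm: it factorizes one matrix rather than several jointly, and it omits the sparse and orthogonal constraints. The function names, the adjacency matrix `A` (assumed symmetric and non-negative, linking the columns of `X`) and all parameter values are assumptions made for illustration.

```python
import numpy as np

def graph_regularized_nmf(X, A, rank, lam=0.1, n_iter=200, eps=1e-10, seed=0):
    """Minimize ||X - W H||_F^2 + lam * tr(H L H^T), with L = D - A and W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    D = np.diag(A.sum(axis=1))  # degree matrix of the column network
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H

def objective(X, A, W, H, lam):
    """Reconstruction error plus the graph-smoothness penalty."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(H @ L @ H.T)

# Toy data: 20 features x 12 samples and a random symmetric sample network.
rng = np.random.default_rng(1)
X = rng.random((20, 12))
A = (rng.random((12, 12)) > 0.7).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)

W, H = graph_regularized_nmf(X, A, rank=3)
print(objective(X, A, W, H, lam=0.1))
```

The penalty term pulls the latent representations of connected columns toward each other, which is how known interactions can shape the modules that the factorization finds.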


Subject(s)
Colonic Neoplasms , MicroRNAs , RNA, Long Noncoding , Colonic Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genomics , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics
20.
Brief Bioinform ; 23(3)2022 May 13.
Article in English | MEDLINE | ID: mdl-35438149

ABSTRACT

Therapeutic peptides act on the skeletal, digestive and blood systems, have antibacterial properties and help relieve inflammation. To reduce the resource consumption of wet-lab experiments, many computational methods have been developed to identify therapeutic peptides. Because traditional machine learning methods handle feature noise poorly, we propose a novel therapeutic peptide identification method called Structured Sparse Regularized Takagi-Sugeno-Kang Fuzzy System on Within-Class Scatter (SSR-TSK-FS-WCS). Our method achieves good performance on multiple therapeutic peptide datasets and UCI datasets.


Subject(s)
Algorithms , Fuzzy Logic , Machine Learning , Peptides/therapeutic use