1.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36209415

ABSTRACT

Existing methods for differential network analysis can only infer whether two networks of interest differ between two groups of samples; they cannot quantify or localize the differences. In this work, we propose a novel method, permutation-based Network True Discovery Proportions (NetTDP), to quantify the number of edges (correlations) or nodes (genes) for which the co-expression networks differ. NetTDP defines an edge-level statistic and a node-level statistic and uses the permutation-based sumSome method to detect true discoveries of edges and nodes, respectively, in the sense of differential co-expression networks. NetTDP can further localize the differences by inferring the TDPs for edge or gene subsets of interest, which can be selected post hoc. The method allows inference on data-driven modules or biology-driven gene sets, and remains valid even when these sub-networks are optimized using the same data. Experimental results on simulated data sets and five real data sets show that the proposed method is effective at quantifying and localizing differences between co-expression networks. The R code is available at https://github.com/LiminLi-xjtu/NetTDP.


Subject(s)
Computational Biology , Gene Regulatory Networks , Computational Biology/methods , Algorithms , Computer Simulation
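
As a rough illustration of the permutation idea behind an edge-level test, the Python sketch below compares the Pearson correlation of one gene pair between two sample groups and calibrates the difference by shuffling the group labels. It is not the NetTDP or sumSome procedure and does not compute true discovery proportions; the function names, the absolute-difference statistic, and the permutation count are all assumptions made for this example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def edge_permutation_pvalue(gene_a, gene_b, labels, n_perm=999, seed=0):
    """Permutation p-value for one edge (hypothetical helper).

    gene_a, gene_b: expression values of two genes, one value per sample.
    labels: group label (0 or 1) of each sample.
    The statistic is |r_group0 - r_group1|, an assumption for this sketch.
    """
    def statistic(current_labels):
        correlations = []
        for group in (0, 1):
            idx = [i for i, lab in enumerate(current_labels) if lab == group]
            correlations.append(
                pearson([gene_a[i] for i in idx], [gene_b[i] for i in idx])
            )
        return abs(correlations[0] - correlations[1])

    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and groups
        if statistic(shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)
```

A test of this kind says only whether a single edge differs between groups; quantifying how many edges in a post hoc selected subset truly differ is the additional step the abstract attributes to NetTDP.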
2.
Brief Bioinform ; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: mdl-33515011

ABSTRACT

MOTIVATION: Gene set enrichment analysis (GSEA) has been widely used to identify, out of a large collection of gene sets, those that differ significantly between cases and controls. GSEA needs both phenotype labels and gene expression data. However, gene expression is assessed more often for model organisms than for less-studied species. Importantly, gene expression is also not measured well under specific conditions in humans, because direct experiments such as non-approved treatments or gene knockouts carry high risk, and human data are then often substituted by mouse data. Thus, predicting the enrichment significance (for a phenotype) of a given gene set of one species (the target, say human) by using gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species gene set enrichment problem (XGSEP). RESULTS: For XGSEP, we propose CROSS-species gene set enrichment analysis (XGSEA), which has three steps: (1) running GSEA for the source species to obtain enrichment scores and $p$-values of source gene sets; (2) representing the relation between source and target gene sets by domain adaptation; and (3) using regression to predict $p$-values of target gene sets, based on the representation in (2). We extensively validated XGSEA with five regression measures and one classification measure on four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods in most cases. A case study identifying important human pathways for T-cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. AVAILABILITY: Source code of XGSEA is available at https://github.com/LiminLi-xjtu/XGSEA.


Subject(s)
Brain Neoplasms/genetics , Machine Learning , Melanoma/genetics , Ovarian Neoplasms/genetics , Skin Neoplasms/genetics , Animals , Brain Neoplasms/immunology , Brain Neoplasms/pathology , Computational Biology/methods , Datasets as Topic , Embryo, Mammalian , Female , Gene Expression Regulation, Neoplastic , Humans , Melanoma/immunology , Melanoma/pathology , Mice , Ovarian Neoplasms/immunology , Ovarian Neoplasms/pathology , Skin Neoplasms/immunology , Skin Neoplasms/pathology , T-Lymphocytes/immunology , T-Lymphocytes/pathology , Zebrafish
3.
Article in English | MEDLINE | ID: mdl-31056512

ABSTRACT

Cross-species or cross-platform data classification is a challenging problem in bioinformatics: the aim is to classify data samples from one species or platform by using labeled data samples from another. Traditional classification methods cannot be used in this case, since the samples from the two species or platforms may have different feature spaces or follow different statistical distributions. Domain adaptation is a newer strategy that can deal with this problem. A major challenge in domain adaptation is how to reduce the difference and correct the drift between the source and target domains in the heterogeneous case, where the feature spaces of the two domains differ. It has been shown theoretically that probability divergences between the two domains, such as the maximum mean discrepancy (MMD), play an important role in the generalization bound for domain adaptation. However, they are rarely used for heterogeneous domain adaptation because the domains have different feature spaces. In this work, we propose a heterogeneous domain adaptation approach that uses MMD to measure the probability divergence in an embedded low-dimensional common subspace. Our proposed discriminative heterogeneous MMD approach (DMMD) aims to find new representations of the samples in a common subspace by minimizing the domain probability divergence while preserving the known discriminative information. A conjugate gradient algorithm on a Grassmann manifold is applied to solve the nonlinear DMMD model. Our experiments on both simulated and benchmark machine learning datasets show that our approach outperforms other state-of-the-art approaches for heterogeneous domain adaptation. Finally, we apply our approach to a cross-platform dataset and a cross-species dataset, and the results show its effectiveness.


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Databases, Genetic , Genomics , Species Specificity
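
To make the divergence named in this abstract concrete, the sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel for two samples that already live in the same feature space, as they would after projection into a common subspace. The projection itself, the discriminative term, and the Grassmann-manifold optimization of DMMD are not shown; the function names and the fixed bandwidth are assumptions for this example.

```python
from math import exp


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)],
    with expectations replaced by sample averages.
    """
    return (
        mean_kernel(source, source, bandwidth)
        + mean_kernel(target, target, bandwidth)
        - 2.0 * mean_kernel(source, target, bandwidth)
    )


# Two small made-up samples in a shared 2-D space.
source = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
target = [[3.0, 3.0], [4.0, 3.5], [3.5, 4.0]]
print(mmd_squared(source, source))  # close to zero for identical samples
print(mmd_squared(source, target))  # clearly positive for shifted samples
```

The estimate is near zero when the two samples coincide and grows as the distributions separate, which is why minimizing it over a shared embedding pulls the domains together.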
4.
BMC Genomics ; 21(Suppl 10): 617, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33208088

ABSTRACT

BACKGROUND: Biological evidence has shown that microRNAs (miRNAs) are strongly implicated in various biological processes involved in human diseases. Identifying miRNA-disease associations (MDAs) benefits both disease diagnosis and treatment. Because biological experiments are costly, predicting MDAs with computational approaches has attracted increasing attention. RESULTS: In this work, we propose a novel model, MTFMDA, for miRNA-disease association prediction by matrix tri-factorization, based on the known miRNA-disease associations, two types of miRNA similarities, and two types of disease similarities. The main idea of MTFMDA is to factorize the miRNA-disease association matrix into three matrices: a feature matrix for miRNAs, a feature matrix for diseases, and a low-rank relationship matrix. Our model incorporates Laplacian regularizers that force the feature matrices to preserve the similarities of miRNAs or diseases. A novel algorithm is proposed to solve the optimization problem. CONCLUSIONS: We evaluate our model by 5-fold cross-validation using known MDAs from HMDD V2.0 and show that it obtains significantly higher AUCs than all the state-of-the-art methods compared. We further validate our method by applying it to colon and breast neoplasms in two different experimental settings. The newly identified miRNAs associated with the two diseases could be verified in two other databases, dbDEMC and HMDD V3.0, which further shows the power of our proposed method.


Subject(s)
MicroRNAs , Algorithms , Area Under Curve , Computational Biology , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , MicroRNAs/metabolism
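
The structure described in this abstract, an association matrix approximated by a product of three factors plus graph-Laplacian penalties, can be sketched generically as below. This is not the MTFMDA objective: how the paper combines its two similarity types per side, how it enforces a low-rank relationship matrix, and which algorithm it uses to optimize are not given in the abstract. The function names, the single similarity matrix per side, and the weight `lam` are assumptions for this example.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def reconstruction_error(assoc, mirna_feat, relation, disease_feat):
    """Squared Frobenius norm of assoc - mirna_feat * relation * disease_feat^T."""
    approx = matmul(matmul(mirna_feat, relation), transpose(disease_feat))
    return sum(
        (assoc[i][j] - approx[i][j]) ** 2
        for i in range(len(assoc))
        for j in range(len(assoc[0]))
    )


def laplacian_penalty(similarity, features):
    """0.5 * sum_ij W_ij * ||f_i - f_j||^2.

    For a symmetric similarity matrix W this equals tr(F^T L F) with
    L = D - W, so similar items are pushed toward similar feature rows.
    """
    total = 0.0
    for i, fi in enumerate(features):
        for j, fj in enumerate(features):
            sq_dist = sum((a - b) ** 2 for a, b in zip(fi, fj))
            total += similarity[i][j] * sq_dist
    return 0.5 * total


def objective(assoc, mirna_feat, relation, disease_feat, sim_m, sim_d, lam=0.1):
    """Reconstruction error plus weighted Laplacian penalties (generic form)."""
    return reconstruction_error(assoc, mirna_feat, relation, disease_feat) + lam * (
        laplacian_penalty(sim_m, mirna_feat) + laplacian_penalty(sim_d, disease_feat)
    )
```

Evaluating the objective for candidate factors is enough to see the trade-off: the first term rewards fitting known associations, while the penalties grow when two similar miRNAs or diseases receive distant feature rows.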
5.
BMC Med Genomics ; 12(Suppl 9): 191, 2019 12 24.
Article in English | MEDLINE | ID: mdl-31874642

ABSTRACT

BACKGROUND: Recent high-throughput technologies have been applied to collect heterogeneous biomedical omics datasets. Computational analysis of multi-omics datasets could potentially reveal deep insights into a given disease. Most existing multi-omics clustering methods assume strong consistency among the different data sources, and thus may lose efficacy when the consistency is relatively weak. Furthermore, they cannot identify the conflicting parts of each view, which might be important in applications such as cancer subtype identification. METHODS: In this work, we propose an integrative subspace clustering method (ISC) based on common and specific decomposition to identify clustering structures in multi-omics datasets. The main idea of ISC is that the original representation of the samples in each view can be reconstructed by concatenating a common part and a view-specific part in orthogonal subspaces. The problem can be formulated as a matrix decomposition problem and solved efficiently by our proposed algorithm. RESULTS: Experiments on simulated and text datasets show that our method outperforms other state-of-the-art methods. Our method is further evaluated by identifying cancer types using a colorectal dataset. Finally, we apply our method to cancer subtype identification for five cancers using TCGA datasets, and the survival analysis shows that the subtypes we found are significantly better than those found by the compared methods. CONCLUSION: We conclude that our ISC model can not only discover weak common information across views but also identify view-specific information.


Subject(s)
Computational Biology/methods , Neoplasms/classification , Cluster Analysis , Humans , Survival Analysis
6.
IET Syst Biol ; 13(5): 267-275, 2019 10.
Article in English | MEDLINE | ID: mdl-31538961

ABSTRACT

In drug discovery and disease treatment, drug repositioning is broadly studied to identify biological targets for existing drugs. Many methods have been proposed for drug-target interaction prediction that take different kinds of data sources into account. However, most existing methods use only one kind of side information for drugs or targets to predict new targets for drugs. Some recent works have improved prediction accuracy by jointly considering multiple representations of drugs and targets. In this work, the authors propose a drug-target prediction approach based on matrix completion with multi-view side information (MCM) of drugs and proteins from both a structural view and a chemical view. Unlike existing studies of drug-target prediction, they predict drug-target interactions by directly completing the interaction matrix between drugs and targets. The experimental results show that the MCM method obtains significantly higher accuracies than the comparison methods. Finally, they report new drug-target interactions for 26 FDA-approved drugs and discuss these targets biologically with reference to the existing literature.


Subject(s)
Computational Biology/methods , Drug Repositioning/methods , Drug Approval , Pharmaceutical Preparations/metabolism , Proteins/metabolism
7.
IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1712-1721, 2019.
Article in English | MEDLINE | ID: mdl-28541222

ABSTRACT

Drug repositioning has been a key problem in drug development, and heterogeneous data sources are used to predict drug-target interactions by different approaches. However, most studies focus on a single representation of drugs or proteins. It has been shown that integrating multi-view representations of drugs and proteins can strengthen predictive ability. For example, a drug can be represented by its chemical structure or by its chemical response in different cells. A protein can be represented by its sequence or by its gene expression values in different cells. The docking of drugs and proteins based on their structure can be considered one view (the structural view), and their chemical performance based on gene expression and drug response can be considered another view (the chemical view). In this work, we first propose a single-view approach, SLRE, based on low-rank embedding for an arbitrary view, and then extend it to a multi-view approach, MLRE, which can integrate both views. Our experiments show that our methods perform significantly better than baseline methods, including both single-view and multi-view methods. Finally, we report predicted drug-target interactions for 30 FDA-approved drugs.


Subject(s)
Drug Development/methods , Machine Learning , Models, Statistical , Algorithms , Antineoplastic Agents/chemistry , Antineoplastic Agents/metabolism , Antineoplastic Agents/pharmacology , Cell Line, Tumor , Drug Repositioning , Humans , Transcriptome/drug effects
8.
BMC Syst Biol ; 12(Suppl 9): 141, 2018 12 31.
Article in English | MEDLINE | ID: mdl-30598086

ABSTRACT

BACKGROUND: Evaluating the significance of a group of genes or proteins in a pathway or biological process for a disease can help researchers understand the mechanism of the disease. For example, identifying related pathways or gene functions for chromatin states of tumor-specific T cells will help determine whether T cells can be reprogrammed, and further help design cancer treatment strategies. Some existing p-value combination methods can be used in this scenario. However, these methods suffer from different disadvantages, and it is still challenging to design a more powerful and robust statistical method. RESULTS: The existing Group combined p-value (GCP) method first partitions p-values into several groups using a set of truncation points, but the method is often sensitive to these truncation points. Another method, the adaptive rank truncated product (ARTP) method, makes use of multiple truncation integers to adaptively combine the smallest p-values, but it loses statistical power since it ignores the larger p-values. To tackle these problems, we propose a robust p-value combination method (rPCMP) that considers multiple partitions of p-values with different sets of truncation points. The proposed rPCMP statistic has a three-layer hierarchical structure. The inner layer considers a statistic that combines p-values in a specified interval defined by two threshold points, the intermediate layer uses a GCP statistic that optimizes the inner-layer statistic over a partition set of threshold points, and the outer layer integrates the GCP statistics from multiple partitions of p-values. The empirical distribution of the statistic under the null hypothesis can be estimated by a permutation procedure. CONCLUSIONS: Our proposed rPCMP method has been shown to be more robust and to have higher statistical power. A simulation study shows that our method can effectively control the type I error rate and has higher statistical power than the existing methods. Finally, we apply our rPCMP method to an ATAC-seq dataset to discover gene functions related to chromatin states in mouse tumor T cells.


Subject(s)
Computational Biology/methods , Sequence Analysis , Models, Statistical
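
The abstract's remark that combining only the smallest p-values ignores the larger ones can be seen in a small sketch of the rank truncated product statistic, the building block that ARTP applies adaptively over several truncation integers. This is neither ARTP in full nor rPCMP: the adaptive choice of k and the permutation-based null distribution are omitted, and the function name is an assumption for this example.

```python
from math import log


def rank_truncated_product(p_values, k):
    """Log of the product of the k smallest p-values.

    More negative values indicate stronger combined evidence.
    All p-values are assumed to be strictly positive.
    """
    if not 1 <= k <= len(p_values):
        raise ValueError("k must be between 1 and the number of p-values")
    smallest = sorted(p_values)[:k]
    return sum(log(p) for p in smallest)


# The statistic is unchanged when a p-value outside the k smallest moves.
print(rank_truncated_product([0.5, 0.01, 0.2], k=2))
print(rank_truncated_product([0.9, 0.01, 0.2], k=2))
```

Both calls print the same value because the third p-value never enters the product, which is the loss of information that motivates combining p-values over intervals and multiple partitions instead.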
9.
BMC Med Genomics ; 10(Suppl 4): 75, 2017 12 21.
Article in English | MEDLINE | ID: mdl-29322925

ABSTRACT

BACKGROUND: The Cancer Genome Atlas (TCGA) has collected transcriptome, genome and epigenome information for over 20 cancers from thousands of patients. The availability of these diverse data types makes it necessary to combine them in order to capture the heterogeneity of biological processes and phenotypes and to identify homogeneous subtypes of cancers such as breast cancer. Many multi-view clustering approaches have been proposed to discover clusters across different data types. The problem is challenging when different data types show poor agreement in clustering structure. RESULTS: In this work, we first propose a multi-view clustering approach with consensus (CMC), which tries to find consensus kernels among views by using the Hilbert-Schmidt Independence Criterion. To handle poor agreement among views, we further propose a multi-view clustering approach with enhanced consensus (ECMC), which decomposes the kernel information in each view into a consensus part and a disagreement part. The consensus parts of different views are supposed to be similar, and the disagreement parts should be independent of the consensus parts. Both the CMC and ECMC models can be solved by alternating updates with semi-definite programming. Our experiments on both simulated datasets and real-world benchmark datasets show that the ECMC model achieves higher clustering accuracies than other state-of-the-art multi-view clustering approaches. We also apply the ECMC model to integrate mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets, and the survival analysis shows that our ECMC model outperforms other methods when identifying cancer subtypes. Using Fisher's combination test, we found that three computed subtypes roughly correspond to three known breast cancer subtypes: luminal B, HER2 and basal-like. CONCLUSION: Integrating heterogeneous TCGA datasets with our proposed multi-view clustering approach ECMC can effectively identify cancer subtypes.


Subject(s)
Genomics , Neoplasms/classification , Algorithms , Cluster Analysis , DNA Methylation , Databases, Genetic , Datasets as Topic , Humans , Machine Learning , MicroRNAs/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/mortality , RNA, Messenger/metabolism , Survival Analysis
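
The dependence measure named in this abstract can be illustrated with the standard biased empirical estimate of the Hilbert-Schmidt Independence Criterion between two kernel (Gram) matrices computed on the same samples. The sketch covers only that estimate; the consensus and disagreement decomposition and the semi-definite programming updates of CMC and ECMC are not shown. The function names and the Gaussian kernel with fixed bandwidth are assumptions for this example.

```python
from math import exp


def gaussian_gram(samples, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel over a list of feature vectors."""
    return [
        [
            exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * bandwidth ** 2))
            for y in samples
        ]
        for x in samples
    ]


def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(gram_a, gram_b):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2.

    Both inputs must be symmetric Gram matrices over the same n samples.
    """
    n = len(gram_a)
    centered = center(gram_a)
    total = sum(centered[i][j] * gram_b[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2
```

Larger values indicate stronger dependence between two views, so a criterion of this kind can reward agreement between consensus kernels and penalize dependence between consensus and disagreement parts.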