Results 1 - 6 of 6
1.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35021191

ABSTRACT

Networks consisting of molecular interactions are intrinsically dynamical systems of an organism. The interactions curated in molecular interaction databases are still incomplete and contain false positives introduced by high-throughput screening experiments. In this study, we propose a framework to integrate interactions of functionally associated protein-coding genes from 31 data sources to reconstruct a network with high coverage and quality. For each interaction, 369 features were constructed, including properties of both the interaction and the involved genes. The training and validation sets were built using pathway interactions as positives and the potential negative instances resulting from our proposed semi-supervised strategy. A random forest classifier was then trained and applied multiple times to give a score for each interaction. After setting a threshold estimated by a Binomial distribution, a Human protein-coding Gene Functional Association Network (HuGFAN) was reconstructed with 20 383 genes and 1 185 429 high-confidence interactions. HuGFAN was then compared with networks from other data sources with respect to network properties, suggesting that HuGFAN is more function and pathway related. Finally, HuGFAN was applied to identify cancer drivers through two well-known network-based methods (DriverNet and HotNet2) to show its strong performance compared with other networks. HuGFAN and other supplementary files are freely available at https://github.com/xthuang226/HuGFAN.


Subject(s)
Gene Regulatory Networks , Machine Learning , Databases, Factual , Humans
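
The abstract above does not say how the Binomial distribution is used to set the score threshold. One plausible reading, offered only as a minimal sketch, is that each interaction is predicted positive in k of n repeated runs and the cutoff is the smallest k that would be unlikely under a chance-level null. The function names, the null probability `p_null` and the significance level `alpha` are all assumptions for illustration and are not taken from the paper.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def vote_threshold(n_runs, p_null=0.5, alpha=0.05):
    """Smallest number of positive votes k with P(X >= k) <= alpha.

    Hypothetical rule: p_null and alpha are illustrative choices,
    not values reported in the abstract.
    """
    for k in range(n_runs + 1):
        if binomial_tail(n_runs, k, p_null) <= alpha:
            return k
    return n_runs + 1  # no vote count is rare enough under the null


if __name__ == "__main__":
    print(vote_threshold(10))
```

Under these assumptions an interaction would be kept only when its number of positive votes reaches the returned cutoff.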
2.
PLoS Comput Biol ; 17(8): e1009224, 2021 08.
Article in English | MEDLINE | ID: mdl-34383739

ABSTRACT

Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most of the cancers under study, which may be of particular interest to researchers in omics data analysis.


Subject(s)
Computational Biology/methods , Neoplasms/classification , Neoplasms/genetics , Algorithms , Biomarkers, Tumor/genetics , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Deep Learning , Female , Genomics/statistics & numerical data , Humans , Male , Unsupervised Machine Learning
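
The count of eleven combinations in the abstract above matches the number of ways to pick at least two of four data types: C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1 = 11. The short sketch below enumerates them. The abstract does not name the four omics types, so the labels used here are placeholders, not the paper's data types.

```python
from itertools import combinations

# Placeholder labels: the abstract does not name the four omics data types.
data_types = ["omics_A", "omics_B", "omics_C", "omics_D"]

# Every subset that integrates at least two data types.
combos = [
    subset
    for size in range(2, len(data_types) + 1)
    for subset in combinations(data_types, size)
]

if __name__ == "__main__":
    for subset in combos:
        print(" + ".join(subset))
    print(len(combos), "combinations")
```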
3.
Front Genet ; 10: 966, 2019.
Article in English | MEDLINE | ID: mdl-31649733

ABSTRACT

Cancer subtypes can improve our understanding of cancer and suggest more precise treatment for patients. Multi-omics molecular data can characterize cancers at different levels. Many computational methods that integrate multi-omics data for cancer subtyping have been proposed. However, there are no consistent criteria to evaluate the integration methods due to the lack of gold standards (e.g., the number of subtypes in a specific cancer). Since comprehensive evaluation and comparison between different methods serve as a useful tool or guideline for users to select an optimal method for their own purpose, we develop a scalable platform, CEPICS, for comprehensively evaluating and comparing multi-omics data integration methods in cancer subtyping. Given a user-specified maximum number of subtypes, k-max, CEPICS provides (1) cancer subtyping results using up to five built-in state-of-the-art integration methods for each number of subtypes from two to k-max, (2) a report including the evaluation of each user-selected method and comparisons across them using clustering performance metrics and clinical survival analysis, and (3) an overall analysis of subtyping results by different methods, representing a robust cancer subtype prediction for samples. Furthermore, users can upload subtyping results of their own methods to compare with the built-in methods. CEPICS is implemented as an R package and is freely available at https://github.com/GaoLabXDU/CEPICS.

4.
Molecules ; 23(1)2017 Dec 25.
Article in English | MEDLINE | ID: mdl-29295608

ABSTRACT

Driver mutations provide a fitness advantage to cancer cells, and their accumulation increases the fitness of cancer cells and accelerates cancer progression. This work seeks to extract patterns accumulated by driver genes ("fitness relationships") in tumorigenesis. We introduce a network-based method for extracting the fitness relationships of driver genes by modeling the network properties of the "fitness" of cancer cells. Colon adenocarcinoma (COAD) and skin cutaneous malignant melanoma (SKCM) are employed as case studies. Consistent results derived from different background networks suggest the reliability of the identified fitness relationships. Additionally, co-occurrence analysis and pathway analysis reveal the functional significance of the fitness relationships with signal transduction. A subset of driver genes called the "fitness core" is also recognized for each case. Further analyses indicate the functional importance of the fitness core in carcinogenesis and provide potential therapeutic opportunities in medicinal intervention. Fitness relationships characterize the functional continuity among driver genes in carcinogenesis and suggest new insights in understanding the oncogenic mechanisms of cancers, as well as providing guiding information for medicinal intervention.


Subject(s)
Genetic Fitness , Neoplasms/genetics , Oncogenes , Humans , Mutation/genetics , Reproducibility of Results , Signal Transduction , Statistics as Topic
5.
Biomed Res Int ; 2016: 2090286, 2016.
Article in English | MEDLINE | ID: mdl-27610367

ABSTRACT

As smoking rates decrease, proportionally more cases of lung adenocarcinoma occur in never-smokers, while aberrant DNA methylation has been suggested to contribute to the tumorigenesis of lung adenocarcinoma. It is extremely difficult to distinguish which genes play key roles in tumorigenic processes via DNA methylation-mediated gene silencing from a large number of differentially methylated genes. By integrating gene expression and DNA methylation data, a pipeline combined with differential network analysis is designed to uncover driver methylation genes and responsive modules, which demonstrate distinctive expression and network topology in tumors with aberrant DNA methylation. In total, 135 genes are recognized as candidate driver genes in early stage lung adenocarcinoma, and the 30 top-ranked genes are recognized as driver methylation genes. Functional annotation and the differential network analysis indicate the roles of the identified driver genes in tumorigenesis, while a literature study reveals significant correlations of the top 30 genes with early stage lung adenocarcinoma in never-smokers. The analysis pipeline can also be employed to identify driver epigenetic events for other cancers characterized by matched gene expression data and DNA methylation data.


Subject(s)
Adenocarcinoma/genetics , Carcinogenesis/genetics , DNA Methylation/genetics , Lung Neoplasms/genetics , Neoplasm Proteins/genetics , Adenocarcinoma/pathology , Adenocarcinoma of Lung , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/pathology , Molecular Sequence Annotation , Neoplasm Proteins/biosynthesis , Neoplasm Staging
6.
IET Syst Biol ; 8(3): 116-25, 2014 Jun.
Article in English | MEDLINE | ID: mdl-25014378

ABSTRACT

Community detection has been extensively studied in the past decades, largely because communities exist in various networks such as technological, social and biological networks. Most of the available algorithms, however, only focus on the properties of the vertices, ignoring the roles of the edges. To explore the roles of the edges in the networks for community discovery, the authors introduce a novel edge centrality based on the antitriangle property of an edge. To investigate how the edge centrality characterises the community structure, they develop an approach based on the edge antitriangle centrality with the isolated vertex handling strategy (EACH) for community detection. EACH first calculates the edge antitriangle centrality scores for all the edges of a given network and removes the edge with the highest score per iteration until the scores of the remaining edges are all zero. Furthermore, EACH is parameter-free and independent of any additional measures to determine the community structure. To demonstrate the effectiveness of EACH, they compare it with state-of-the-art algorithms on both synthetic networks and real-world networks. The experimental results show that EACH is more accurate and has lower complexity in terms of community discovery; in particular, it can obtain quite inherent and consistent communities with a maximal diameter of four jumps.


Subject(s)
Computational Biology/methods , Systems Biology , Algorithms , Computer Graphics , Computer Simulation , Gene Regulatory Networks , Humans , Models, Statistical , Social Support , Software
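
The removal loop described in the abstract above (score every edge, delete the highest-scoring one, repeat until all remaining scores are zero) can be sketched as follows. The abstract does not define the antitriangle centrality, so `stand_in_score` is a hypothetical placeholder and not the authors' measure: it gives zero to any edge that lies in a triangle and otherwise counts the neighbour pairs on either side of the edge. The isolated vertex handling strategy is also omitted.

```python
def stand_in_score(adj, u, v):
    """Hypothetical placeholder score, NOT the paper's antitriangle centrality.

    Zero if u and v share a neighbour (the edge lies in a triangle);
    otherwise the number of (neighbour of u, neighbour of v) pairs.
    """
    if adj[u] & adj[v]:
        return 0
    return len(adj[u] - {v}) * len(adj[v] - {u})


def remove_edges_until_zero(edges, score=stand_in_score):
    """Repeatedly delete the highest-scoring edge until all scores are zero."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while True:
        # Rescore every remaining edge once per iteration.
        scored = [(score(adj, u, v), u, v) for u in adj for v in adj[u] if u < v]
        if not scored:
            break
        best, u, v = max(scored)
        if best == 0:
            break
        adj[u].discard(v)
        adj[v].discard(u)
    return adj


def components(adj):
    """Connected components of the remaining graph, read as communities."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        found.append(comp)
    return found


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge c-d.
    toy = [("a", "b"), ("b", "c"), ("a", "c"),
           ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    print(components(remove_edges_until_zero(toy)))
```

On the toy graph only the bridge has a non-zero placeholder score, so the loop stops after one removal and the two triangles remain as separate components.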