Results 1 - 20 of 216
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426322

ABSTRACT

Cancer is a complex and high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposed a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). Firstly, multiple independent variational autoencoders and contrastive loss functions were designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network was proposed to integrate consistent representations across different omics levels effectively. Additionally, we introduced the Improved Deep Embedded Clustering algorithm to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, DILCR identified cancer subtypes with high biological significance and interpretability.


Subject(s)
Renal Cell Carcinoma , Kidney Neoplasms , Neoplasms , Humans , Multiomics , Neoplasms/genetics , Renal Cell Carcinoma/genetics , Algorithms , Cluster Analysis , Kidney Neoplasms/genetics
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678587

ABSTRACT

Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.


Subject(s)
Tumor Biomarkers , Deep Learning , Local Neoplasm Recurrence , Humans , Tumor Biomarkers/metabolism , Tumor Biomarkers/genetics , Local Neoplasm Recurrence/metabolism , Local Neoplasm Recurrence/genetics , Computational Biology/methods , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/pathology , Genomics/methods , Multiomics
3.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38985929

ABSTRACT

Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As a non-integrative control, random forest was run on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed as well as or better than their non-integrative counterparts. By contrast, DIABLO and the four random forest alternatives outperformed the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail, as well as guidelines for future applications.


Subject(s)
Computational Biology , Humans , Computational Biology/methods , Algorithms , Genomics/methods , Genomics/statistics & numerical data , Multiomics
4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38783706

ABSTRACT

RNA Polymerase II (Pol II) transcriptional elongation pausing is an integral part of the dynamic regulation of gene transcription in the genome of metazoans. It plays a pivotal role in many vital biological processes and disease progression. However, experimentally measuring genome-wide Pol II pausing is technically challenging and the precise governing mechanism underlying this process is not fully understood. Here, we develop RP3 (RNA Polymerase II Pausing Prediction), a network regularized logistic regression machine learning method, to predict Pol II pausing events by integrating genome sequence, histone modification, gene expression, chromatin accessibility, and protein-protein interaction data. RP3 can accurately predict Pol II pausing in diverse cellular contexts and unveil the transcription factors that are associated with the Pol II pausing machinery. Furthermore, we utilize a forward feature selection framework to systematically identify the combination of histone modification signals associated with Pol II pausing. RP3 is freely available at https://github.com/AMSSwanglab/RP3.


Subject(s)
Histone Code , RNA Polymerase II , RNA Polymerase II/metabolism , Humans , Genetic Transcription Elongation , Chromatin/metabolism , Chromatin/genetics , Histones/metabolism , Machine Learning , Animals
5.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557672

ABSTRACT

Lung adenocarcinoma (LUAD) is the most common histologic subtype of lung cancer. Early-stage patients have a 30-50% probability of metastatic recurrence after surgical treatment. Here, we propose a new computational framework, Interpretable Biological Pathway Graph Neural Networks (IBPGNET), based on pathway hierarchy relationships to predict LUAD recurrence and explore the internal regulatory mechanisms of LUAD. IBPGNET can integrate different omics data efficiently and provide global interpretability. In addition, our experimental results show that IBPGNET outperforms other classification methods in 5-fold cross-validation. IBPGNET identified PSMC1 and PSMD11 as genes associated with LUAD recurrence, and their expression levels were significantly higher in LUAD cells than in normal cells. The knockdown of PSMC1 and PSMD11 in LUAD cells increased their sensitivity to afatinib and decreased cell migration, invasion and proliferation. In addition, the cells showed significantly lower EGFR expression, indicating that PSMC1 and PSMD11 may mediate therapeutic sensitivity through EGFR expression.


Subject(s)
Adenocarcinoma of Lung , Lung Neoplasms , Humans , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/metabolism , Lung Neoplasms/metabolism , Tumor Cell Line , Tumor Biomarkers/genetics , Tumor Biomarkers/metabolism , Neoplastic Gene Expression Regulation , ErbB Receptors/genetics , Cell Proliferation
6.
EMBO Rep ; 25(1): 254-285, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38177910

ABSTRACT

Midbrain dopaminergic neurons (mDANs) control voluntary movement, cognition, and reward behavior under physiological conditions and are implicated in human diseases such as Parkinson's disease (PD). Many transcription factors (TFs) controlling human mDAN differentiation during development have been described, but much of the regulatory landscape remains undefined. Using a tyrosine hydroxylase (TH) human iPSC reporter line, we here generate time series transcriptomic and epigenomic profiles of purified mDANs during differentiation. Integrative analysis predicts novel regulators of mDAN differentiation and super-enhancers are used to identify key TFs. We find LBX1, NHLH1 and NR2F1/2 to promote mDAN differentiation and show that overexpression of either LBX1 or NHLH1 can also improve mDAN specification. A more detailed investigation of TF targets reveals that NHLH1 promotes the induction of neuronal miR-124, LBX1 regulates cholesterol biosynthesis, and NR2F1/2 controls neuronal activity.


Subject(s)
Dopaminergic Neurons , Induced Pluripotent Stem Cells , Humans , Dopaminergic Neurons/metabolism , Multiomics , Mesencephalon , Transcription Factors/genetics , Transcription Factors/metabolism , Induced Pluripotent Stem Cells/metabolism , Cell Differentiation/genetics , Basic Helix-Loop-Helix Transcription Factors/genetics
7.
Trends Genet ; 38(2): 128-139, 2022 02.
Article in English | MEDLINE | ID: mdl-34561102

ABSTRACT

A wealth of single-cell protocols makes it possible to characterize different molecular layers at unprecedented resolution. Integrating the resulting multimodal single-cell data to find cell-to-cell correspondences remains a challenge. We argue that data integration needs to happen at a meaningful biological level of abstraction and that it is necessary to consider the inherent discrepancies between modalities to strike a balance between biological discovery and noise removal. A survey of current methods reveals that a distinction between technical and biological origins of presumed unwanted variation between datasets is not yet commonly considered. The increasing availability of paired multimodal data will aid the development of improved methods by providing a ground truth on cell-to-cell matches.

8.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36445207

ABSTRACT

Driven by multi-omics data, some multi-view clustering algorithms have been successfully applied to cancer subtype prediction, aiming to identify subtypes with biometric differences in the same cancer, thereby improving the clinical prognosis of patients and enabling the design of personalized treatment plans. Because the number of patients in omics data is much smaller than the number of genes, multi-view spectral clustering based on similarity learning has been widely developed. However, these algorithms still suffer from some problems, such as over-reliance of the clustering results on the quality of pre-defined similarity matrices, inability to reasonably handle noise and redundant information in high-dimensional omics data, and neglect of the complementary information between omics data. This paper proposes a multi-view spectral clustering with latent representation learning (MSCLRL) method to alleviate the above problems. First, MSCLRL generates a corresponding low-dimensional latent representation for each omics data type, which can effectively retain the unique information of each omics type and improve the robustness and accuracy of the similarity matrix. Second, the obtained latent representations are assigned appropriate weights by MSCLRL, and global similarity learning is performed to generate an integrated similarity matrix. Third, the integrated similarity matrix is used to feed back and update the low-dimensional representation of each omics type. Finally, the final integrated similarity matrix is used for clustering. On 10 benchmark multi-omics datasets and in 2 separate cancer case studies, the experiments confirmed that the proposed method obtained statistically and biologically meaningful cancer subtypes.


Subject(s)
Multiomics , Neoplasms , Humans , Algorithms , Neoplasms/genetics , Cluster Analysis
9.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36433785

ABSTRACT

Differentiating cancer subtypes is crucial to guide personalized treatment and improve the prognosis for patients. Integrating multi-omics data can offer a comprehensive landscape of cancer biological processes and provide promising ways for cancer diagnosis and treatment. Taking the heterogeneity of different omics data types into account, we propose a hierarchical multi-kernel learning (hMKL) approach, a novel cancer molecular subtyping method to identify cancer subtypes by adopting a two-stage kernel learning strategy. In stage 1, we obtain a composite kernel for each individual omics data type by borrowing the idea of cancer integration via multi-kernel learning (CIMLR) and optimizing the kernel parameters. In stage 2, we obtain a final fused kernel through a weighted linear combination of the individual kernels learned in stage 1, using an unsupervised multiple kernel learning method. Based on the final fused kernel, k-means clustering is applied to identify cancer subtypes. Simulation studies show that hMKL outperforms the one-stage CIMLR method when there is data heterogeneity. hMKL can estimate the number of clusters correctly, which is the key challenge in subtyping. Application to two real data sets shows that hMKL identified meaningful subtypes and key cancer-associated biomarkers. The proposed method provides a novel toolkit for heterogeneous multi-omics data integration and cancer subtype identification.


Subject(s)
Deep Learning , Neoplasms , Humans , Multiomics , Neoplasms/genetics , Cluster Analysis , Computer Simulation , Tumor Biomarkers/genetics
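
The "weighted linear combination of individual kernels" in stage 2 of this abstract can be illustrated with a small sketch. It is not the hMKL implementation: the Gaussian kernel, the bandwidths, the hand-picked weights, and the toy data are all assumptions made for illustration (hMKL learns its kernel parameters and weights), and the final k-means step is omitted.

```python
import math


def gaussian_kernel(samples, sigma):
    """Pairwise Gaussian (RBF) similarity matrix for a list of feature vectors."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def fuse_kernels(kernels, weights):
    """Weighted linear combination of same-sized kernels; weights are normalized to sum to 1."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    n = len(kernels[0])
    return [
        [sum(w / total * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): four patients, two omics views with different scales.
expression = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
methylation = [[1.0], [1.1], [3.0], [3.2]]

fused = fuse_kernels(
    [gaussian_kernel(expression, sigma=1.0), gaussian_kernel(methylation, sigma=0.5)],
    weights=[0.6, 0.4],  # fixed by hand here; a learned weighting would replace this
)
```

In the fused matrix, patients 0 and 1 remain far more similar to each other than to patients 2 and 3, so a clustering step applied to it would be expected to separate the two groups.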
10.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36847697

ABSTRACT

Brain imaging genomics is an emerging interdisciplinary field in which the integrated analysis of multimodal medical image-derived phenotypes (IDPs) and multi-omics data bridges the gap between macroscopic brain phenotypes and their cellular and molecular characteristics. This approach aims to better interpret the genetic architecture and molecular mechanisms associated with brain structure, function and clinical outcomes. More recently, the availability of large-scale imaging and multi-omics datasets from the human brain has made it possible to discover common genetic variants contributing to the structural and functional IDPs of the human brain. By integrative analyses with functional multi-omics data from the human brain, a set of critical genes, functional genomic regions and neuronal cell types have been identified as significantly associated with brain IDPs. Here, we review the recent advances in the methods and applications of multi-omics integration in brain imaging analysis. We highlight the importance of functional genomic datasets in understanding the biological functions of the identified genes and cell types that are associated with brain IDPs. Moreover, we summarize well-known neuroimaging genetics datasets and discuss challenges and future directions in this field.


Subject(s)
Brain , Genomics , Humans , Genomics/methods , Brain/diagnostic imaging , Brain/metabolism , Phenotype , Neuroimaging/methods
11.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37000166

ABSTRACT

Cooperative driver pathways discovery helps researchers to study the pathogenesis of cancer. However, most discovery methods mainly focus on genomics data, and neglect the known pathway information and other related multi-omics data; thus they cannot faithfully decipher the carcinogenic process. We propose CDPMiner (Cooperative Driver Pathways Miner) to discover cooperative driver pathways by multiplex network embedding, which can jointly model relational and attribute information of multi-type molecules. CDPMiner first uses the pathway topology to quantify the weight of genes in different pathways, and optimizes the relations between genes and pathways. Then it constructs an attributed multiplex network consisting of micro RNAs, long noncoding RNAs, genes and pathways, embeds the network through deep joint matrix factorization to mine more essential information for pathway-level analysis and reconstructs the pathway interaction network. Finally, CDPMiner leverages the reconstructed network and mutation data to define the driver weight between pathways to discover cooperative driver pathways. Experimental results on Breast invasive carcinoma and Stomach adenocarcinoma datasets show that CDPMiner can effectively fuse multi-omics data to discover more driver pathways, which indeed cooperatively trigger cancers and are valuable for carcinogenesis analysis. Ablation study justifies CDPMiner for a more comprehensive analysis of cancer by fusing multi-omics data.


Subject(s)
Algorithms , Breast Neoplasms , Humans , Female , Genomics/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Mutation , Carcinogenesis/genetics
12.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38156562

ABSTRACT

Disrupted protein phosphorylation due to genetic variation is a widespread phenomenon that triggers oncogenic transformation of healthy cells. However, few relevant phosphorylation disruption events have been verified due to limited biological experimental methods. Because of the lack of reliable benchmark datasets, current bioinformatics methods primarily use sequence-based traits to study variant impact on phosphorylation (VIP). Here, we increased the number of experimentally supported VIP events from less than 30 to 740 by manually curating and reanalyzing multi-omics data from 916 patients provided by the Clinical Proteomic Tumor Analysis Consortium. To predict VIP events in cancer cells, we developed VIPpred, a machine learning method characterized by multidimensional features that exhibits robust performance across different cancer types. Our method provided a pan-cancer landscape of VIP events, which are enriched in cancer-related pathways and cancer driver genes. We found that variant-induced increases in phosphorylation events tend to inhibit the protein degradation of oncogenes and promote tumor suppressor protein degradation. Our work provides new insights into phosphorylation-related cancer biology as well as novel avenues for precision therapy.


Subject(s)
Neoplasms , Proteomics , Humans , Phosphorylation , Oncogenes , Carcinogenesis/genetics , Neoplasms/metabolism
13.
Hum Genomics ; 18(1): 75, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956648

ABSTRACT

BACKGROUND: Aging represents a significant risk factor for cerebral small vessel disease, which is associated with white matter (WM) lesions, and for age-related cognitive alterations, though the precise mechanisms remain largely unknown. This study aimed to investigate the impact of polygenic risk scores (PRS) for WM integrity, together with age-related DNA methylation and gene expression alterations, on cognitive aging in a cross-sectional healthy aging cohort. The PRSs were calculated using genome-wide association study (GWAS) summary statistics for magnetic resonance imaging (MRI) markers of WM integrity, including WM hyperintensities, fractional anisotropy (FA), and mean diffusivity (MD). These scores were utilized to predict age-related cognitive changes and evaluate their correlation with structural brain changes, which distinguish individuals with higher and lower cognitive scores. To reduce the dimensionality of the data and identify age-related DNA methylation and transcriptomic alterations, Sparse Partial Least Squares-Discriminant Analysis (sPLS-DA) was used. Subsequently, a canonical correlation algorithm was used to integrate the three types of omics data (PRS, DNA methylation, and gene expression data) and identify an individual "omics" signature that distinguishes subjects with varying cognitive profiles. RESULTS: We found a positive association between MD-PRS and long-term memory, as well as a correlation between MD-PRS and structural brain changes, effectively discriminating between individuals with lower and higher memory scores. Furthermore, we observed an enrichment of polygenic signals in genes related to both vascular and non-vascular factors. Age-related alterations in DNA methylation and gene expression indicated dysregulation of critical molecular features and signaling pathways involved in aging and lifespan regulation. The integration of multi-omics data underscored the involvement of synaptic dysfunction, axonal degeneration, microtubule organization, and glycosylation in the process of cognitive aging. CONCLUSIONS: These findings provide valuable insights into the biological mechanisms underlying the association between WM coherence and cognitive aging. Additionally, they highlight how age-associated DNA methylation and gene expression changes contribute to cognitive aging.


Subject(s)
Cognitive Aging , DNA Methylation , Genome-Wide Association Study , Multifactorial Inheritance , Humans , DNA Methylation/genetics , Female , Male , Multifactorial Inheritance/genetics , Aged , Middle Aged , Cross-Sectional Studies , White Matter/diagnostic imaging , White Matter/pathology , Risk Factors , Magnetic Resonance Imaging , Aging/genetics , Aging/pathology , Brain/diagnostic imaging , Brain/metabolism , Brain/pathology , Genetic Risk Score
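
In its simplest form, a polygenic risk score of the kind mentioned in this abstract is a weighted sum of effect-allele counts. The sketch below assumes per-variant effect sizes taken from GWAS summary statistics and allele dosages of 0, 1 or 2; the variant IDs and numbers are hypothetical, and the abstract does not describe the study's actual PRS pipeline (variant selection, clumping, thresholds).

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Sum of effect size x allele dosage over variants present in both inputs.

    effect_sizes: dict of variant ID -> GWAS effect size (beta) of the effect allele.
    dosages:      dict of variant ID -> number of effect alleles carried (0, 1 or 2).
    """
    shared = effect_sizes.keys() & dosages.keys()
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical variants and weights, for illustration only.
betas = {"variant_1": 0.20, "variant_2": -0.10, "variant_3": 0.05}
person = {"variant_1": 2, "variant_2": 1}  # variant_3 was not genotyped
score = polygenic_risk_score(betas, person)
```

Variants missing from either input are skipped rather than imputed, which is a simplification chosen for this sketch.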
14.
Genomics ; 116(3): 110831, 2024 05.
Article in English | MEDLINE | ID: mdl-38513875

ABSTRACT

Hepatitis B virus (HBV) infection is a major etiology of hepatocellular carcinoma (HCC). An interesting question is how different are the molecular and phenotypic profiles between HBV-infected (HBV+) and non-HBV-infected (HBV-) HCCs? Based on the publicly available multi-omics data for HCC, including bulk and single-cell data, and the data we collected and sequenced, we performed a comprehensive comparison of molecular and phenotypic features between HBV+ and HBV- HCCs. Our analysis showed that compared to HBV- HCCs, HBV+ HCCs had significantly better clinical outcomes, higher degree of genomic instability, higher enrichment of DNA repair and immune-related pathways, lower enrichment of stromal and oncogenic signaling pathways, and better response to immunotherapy. Furthermore, in vitro experiments confirmed that HBV+ HCCs had higher immunity, PD-L1 expression and activation of DNA damage response pathways. This study may provide insights into the profiles of HBV+ and HBV- HCCs, and guide rational therapeutic interventions for HCC patients.


Subject(s)
Hepatocellular Carcinoma , Hepatitis B virus , Liver Neoplasms , Hepatocellular Carcinoma/virology , Hepatocellular Carcinoma/genetics , Liver Neoplasms/virology , Liver Neoplasms/genetics , Humans , Hepatitis B virus/genetics , Phenotype , B7-H1 Antigen/genetics , B7-H1 Antigen/metabolism , Hepatitis B/virology , Hepatitis B/complications , Hepatitis B/genetics , Genomic Instability , DNA Repair , Multiomics
15.
BMC Bioinformatics ; 25(1): 70, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355439

ABSTRACT

BACKGROUND: Biological networks have proven invaluable for representing biological knowledge. Multilayer networks, which gather different types of nodes and edges in multiplex, heterogeneous and bipartite networks, provide a natural way to integrate diverse and multi-scale data sources into a common framework. Recently, we developed MultiXrank, a Random Walk with Restart algorithm able to explore such multilayer networks. MultiXrank outputs scores reflecting the proximity between an initial set of seed node(s) and all the other nodes in the multilayer network. We illustrate here the versatility of bioinformatics tasks that can be performed using MultiXrank. RESULTS: We first show that MultiXrank can be used to prioritise genes and drugs of interest by exploring multilayer networks containing interactions between genes, drugs, and diseases. In a second study, we illustrate how MultiXrank scores can also be used in a supervised strategy to train a binary classifier to predict gene-disease associations. The classifier's performance is validated using outdated and novel gene-disease associations for training and evaluation, respectively. Finally, we show that MultiXrank scores can be used to compute diffusion profiles and use them as disease signatures. We computed the diffusion profiles of more than 100 immune diseases using a multilayer network that includes cell-type specific genomic information. The clustering of the immune disease diffusion profiles reveals shared phenotypic characteristics. CONCLUSION: Overall, we illustrate here diverse applications of MultiXrank to showcase its versatility. We expect that this can lead to further and broader bioinformatics applications.


Subject(s)
Algorithms , Computational Biology , Genomics
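
The proximity scores described in this abstract come from a Random Walk with Restart. The sketch below shows the basic idea on a single small network; it is not MultiXrank and does not use its API, and it ignores the multilayer structure. The function name, the restart probability of 0.7, and the toy node names are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict of node -> dict of neighbor -> edge weight
               (list undirected edges in both directions).
    seeds:     nodes the walker restarts from, chosen uniformly.
    restart:   probability of jumping back to a seed at each step.
    """
    nodes = list(adjacency)
    seeds = list(seeds)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            out_weight = sum(adjacency[u].values())
            if out_weight == 0:
                # Node without outgoing edges: return its mass to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * prob[u] * start[n]
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1.0 - restart) * prob[u] * w / out_weight
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Toy network (hypothetical): disease1 - geneA - geneB - drugX
network = {
    "disease1": {"geneA": 1.0},
    "geneA": {"disease1": 1.0, "geneB": 1.0},
    "geneB": {"geneA": 1.0, "drugX": 1.0},
    "drugX": {"geneB": 1.0},
}
scores = random_walk_with_restart(network, seeds=["disease1"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed receive higher scores, so sorting by score gives a simple prioritisation of the remaining nodes.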
16.
BMC Bioinformatics ; 25(1): 132, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38539064

ABSTRACT

BACKGROUND: Classifying breast cancer subtypes is crucial for clinical diagnosis and treatment. However, the early symptoms of breast cancer may not be apparent. Rapid advances in high-throughput sequencing technology have generated large amounts of multi-omics biological data. Leveraging and integrating the available multi-omics data can effectively enhance the accuracy of identifying breast cancer subtypes. However, few efforts focus on identifying the associations between different omics data to predict breast cancer subtypes. RESULTS: In this paper, we propose a differential sparse canonical correlation analysis network (DSCCN) for classifying breast cancer subtypes. DSCCN performs differential analysis on multi-omics expression data to identify differentially expressed (DE) genes and adopts sparse canonical correlation analysis (SCCA) to mine highly correlated features between multi-omics DE-genes. Meanwhile, DSCCN uses a multi-task deep learning neural network, trained separately on the correlated DE-genes, to predict breast cancer subtypes, which naturally tackles the data heterogeneity problem in integrating multi-omics data. CONCLUSIONS: The experimental results show that by mining the associations among multi-omics data, DSCCN is more capable of accurately classifying breast cancer subtypes than the existing methods.


Subject(s)
Breast Neoplasms , Deep Learning , Humans , Female , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Multiomics , Canonical Correlation Analysis
17.
Int J Cancer ; 155(2): 282-297, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38489486

ABSTRACT

Aberrant DNA methylation is a hallmark of many cancer types. Despite our knowledge of epigenetic and transcriptomic alterations in lung adenocarcinoma (LUAD), we lack robust multi-modal molecular classifications for patient stratification. This is partly because the impact of epigenetic alterations on lung cancer development and progression is still not fully understood. To that end, we identified disease-associated processes under epigenetic regulation in LUAD. We performed a genome-wide expression-methylation Quantitative Trait Loci (emQTL) analysis by integrating DNA methylation and gene expression data from 453 patients in the TCGA cohort. Using a community detection algorithm, we identified distinct communities of CpG-gene associations with diverse biological processes. Interestingly, we identified a community linked to hormone response and lipid metabolism; the identified CpGs in this community were enriched in enhancer regions and binding regions of transcription factors such as FOXA1/2, GRHL2, HNF1B, AR, and ESR1. Furthermore, the CpGs were connected to their associated genes through chromatin interaction loops. These findings suggest that the expression of genes involved in hormone response and lipid metabolism in LUAD is epigenetically regulated through DNA methylation and enhancer-promoter interactions. By applying consensus clustering on the integrated expression-methylation pattern of the emQTL-genes and CpGs linked to hormone response and lipid metabolism, we further identified subclasses of patients with distinct prognoses. This novel patient stratification was validated in an independent patient cohort of 135 patients and showed increased prognostic significance compared to previously defined molecular subtypes.


Subject(s)
Adenocarcinoma of Lung , CpG Islands , DNA Methylation , Genetic Epigenesis , Neoplastic Gene Expression Regulation , Lung Neoplasms , Quantitative Trait Loci , Humans , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , Lung Neoplasms/genetics , Lung Neoplasms/pathology , CpG Islands/genetics , Female , Male , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Gene Expression Profiling/methods , Multiomics
18.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36094096

ABSTRACT

Mendelian randomization is a versatile tool to identify the possible causal relationship between an omics biomarker and disease outcome using genetic variants as instrumental variables. A key theme is the prioritization of genes whose omics readouts can be used as predictors of the disease outcome through analyzing GWAS and QTL summary data. However, few studies have examined the best practice for probing the effects of multiple omics biomarkers annotated to the same gene of interest. To bridge this gap, we propose powerful combination tests that integrate multiple correlated P-values without assuming the dependence structure between the exposures. Our extensive simulation experiments demonstrate the superiority of our proposed approach compared with existing methods that are adapted to the setting of our interest. The top hits of the analyses of multi-omics Alzheimer's disease datasets include genes ABCA7 and ATP1B1.


Subject(s)
Alzheimer Disease , Mendelian Randomization Analysis , Humans , Mendelian Randomization Analysis/methods , Causality , Alzheimer Disease/genetics , Biomarkers , Genome-Wide Association Study
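
The abstract does not say which combination tests the authors propose. As a stand-in, the sketch below uses the Cauchy combination test, a well-known way to merge P-values whose dependence structure is unknown; it should not be read as the paper's method. The function name and the equal default weights are assumptions made for this example.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine P-values with the Cauchy combination test.

    Each P-value is mapped to a standard Cauchy variate, the variates are
    averaged with non-negative weights summing to 1, and the average is
    mapped back to a single P-value.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    statistic = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical P-values for several omics biomarkers annotated to one gene.
gene_level_p = cauchy_combination([0.001, 0.20, 0.65])
```

A single P-value is returned unchanged, and one very small P-value dominates the combined result, which is the behaviour that makes this test attractive when only a few biomarkers carry signal.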
19.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35437603

ABSTRACT

Each type of cancer usually has several subtypes with distinct clinical implications, and therefore the discovery of cancer subtypes is an important and urgent task in disease diagnosis and therapy. Using single-omics data to predict cancer subtypes is difficult because genomes are dysregulated and complicated by multiple molecular mechanisms, and therefore linking cancer genomes to cancer phenotypes is not an easy task. Using multi-omics data to effectively predict cancer subtypes is an area of much interest; however, integrating multi-omics data is challenging. Here, we propose a novel method of multi-omics data integration for clustering to identify cancer subtypes (MDICC) that integrates new affinity matrix and network fusion methods. Our experimental results show the effectiveness and generalization of the proposed MDICC model in identifying cancer subtypes, and its performance was better than those of currently available state-of-the-art clustering methods. Furthermore, the survival analysis demonstrates that MDICC delivered comparable or even better results than many typical integrative methods.


Subject(s)
Neoplasms , Cluster Analysis , Humans , Neoplasms/genetics , Survival Analysis
20.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35043143

ABSTRACT

Advances in single-cell biotechnologies simultaneously generate the transcriptomic and epigenomic profiles at the cell level, providing an opportunity for investigating cell fates. Although great efforts have been devoted to each of them separately, the integrative analysis of single-cell multi-omics data remains limited because of the heterogeneity, noise and sparsity of single-cell profiles. In this study, a network-based integrative clustering algorithm (NIC) is presented for the identification of cell types by fusing the parallel single-cell transcriptomic (scRNA-seq) and epigenomic profiles (scATAC-seq or DNA methylation). To avoid heterogeneity of multi-omics data, NIC automatically learns the cell-cell similarity graphs, which transforms the fusion of multi-omics data into the analysis of multiple networks. Then, NIC employs joint non-negative matrix factorization to learn the shared features of cells by exploiting the structure of learned cell-cell similarity networks, providing a better way to characterize the features of cells. The graph learning and integrative analysis procedures are jointly formulated as an optimization problem, and then the update rules are derived. Thirteen single-cell multi-omics datasets from various tissues and organisms are adopted to validate the performance of NIC, and the experimental results demonstrate that the proposed algorithm significantly outperforms the state-of-the-art methods in terms of various measurements. The proposed algorithm provides an effective strategy for the integrative analysis of single-cell multi-omics data (the software is coded in Matlab and is freely available for academic use at https://github.com/xkmaxidian/NIC).


Subject(s)
Single-Cell Analysis , Transcriptome , Algorithms , Cluster Analysis , Epigenomics , Single-Cell Analysis/methods , Software