1.
Article in English | MEDLINE | ID: mdl-38578856

ABSTRACT

Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling to enhance ensemble diversity, and subset model optimization to find synergistic classifier combinations. Extensive experiments were conducted on 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE achieved significantly higher screening accuracy than individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers. The EODE source code is available on GitHub at https://github.com/wangxb96/EODE.
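The grey wolf optimizer (GWO) moves each candidate solution ("wolf") toward the three best solutions of the current pack (alpha, beta, delta), with a coefficient `a` that shrinks from 2 to 0 to shift from exploration to exploitation. Below is a minimal sketch of standard continuous GWO, assuming a toy sphere objective; EODE's "intelligent" variant, its feature-selection encoding, and its classifier-based fitness are not reproduced, and `grey_wolf_optimize` is a hypothetical name.

```python
import random

def grey_wolf_optimize(objective, dim, n_wolves=20, n_iter=200, lb=-5.0, ub=5.0, seed=0):
    """Minimize `objective` over the box [lb, ub]^dim with standard grey wolf optimization."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_score = None, float("inf")
    for t in range(n_iter):
        # Rank the pack; the three best wolves (alpha, beta, delta) lead the hunt.
        wolves.sort(key=objective)
        if objective(wolves[0]) < best_score:
            best, best_score = wolves[0][:], objective(wolves[0])
        leaders = [w[:] for w in wolves[:3]]
        a = 2.0 - 2.0 * t / n_iter  # decays linearly from 2 to 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                x = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolves[i][d])
                    x += leader[d] - A * D
                # Average the three leader-guided moves and clip to the search box.
                new_pos.append(min(max(x / 3.0, lb), ub))
            wolves[i] = new_pos
    # The final positions have not been scored yet; check them too.
    for w in wolves:
        if objective(w) < best_score:
            best, best_score = w[:], objective(w)
    return best, best_score

def sphere(x):
    return sum(v * v for v in x)

best, best_score = grey_wolf_optimize(sphere, dim=5)
print(best_score)
```

For feature selection, positions are typically mapped to a 0/1 mask (for example by thresholding) and the objective becomes a classifier's validation error; that mapping is an assumption here, not a detail from the abstract.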

2.
Bone ; 182: 117050, 2024 May.
Article in English | MEDLINE | ID: mdl-38367924

ABSTRACT

Postmenopausal osteoporosis (PMOP) is a common form of osteoporosis associated with excessive osteocyte death and bone loss. Previous studies have shown that TNF-α-induced osteocyte necroptosis might exert a stronger effect on PMOP than apoptosis, and recent studies have confirmed that TLR4 can also induce cell necroptosis. However, little is known about the relationship between TNF-α-induced osteocyte necroptosis and TLR4. In the present study, we showed that TNF-α increased the expression of TLR4, which promoted osteocyte necroptosis in PMOP. In patients with PMOP, TLR4 was highly expressed at skeletal sites where osteocyte necroptosis occurred, and high TLR4 expression was correlated with enhanced TNF-α expression. Osteocytes exhibited robust TLR4 expression upon exposure to necroptotic osteocytes in vivo and in vitro. Western blotting and immunofluorescence analyses demonstrated that TNF-α upregulated TLR4 expression in vitro, which might further promote osteocyte necroptosis. Furthermore, inhibition of TLR4 by TAK-242 in vitro effectively blocked TNF-α-induced osteocyte necroptosis. Collectively, these results suggest a novel TLR4-mediated process of osteocyte necroptosis, which might increase osteocyte death and bone loss during PMOP.


Subject(s)
Osteocytes , Postmenopausal Osteoporosis , Toll-Like Receptor 4 , Tumor Necrosis Factor-alpha , Female , Humans , Necroptosis , Osteocytes/metabolism , Postmenopausal Osteoporosis/metabolism , Toll-Like Receptor 4/metabolism , Tumor Necrosis Factor-alpha/metabolism
3.
Adv Sci (Weinh) ; 11(16): e2307280, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38380499

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a robust method for studying gene expression at the single-cell level, but accurately quantifying genetic material is often hindered by limited mRNA capture, resulting in many missing expression values. Existing imputation methods rely on strict data assumptions, which limits their broader application, and lack reliable supervision, which leads to biased signal recovery. To address these challenges, the authors developed Bis, a distribution-agnostic deep learning model for accurately recovering missing single-cell gene expression from multiple platforms. Bis is an optimal transport-based autoencoder that captures the intricate distribution of scRNA-seq data while addressing its characteristic sparsity by regularizing the cellular embedding space. Additionally, they propose a module that uses bulk RNA-seq data to guide reconstruction and ensure expression consistency. Experimental results show that Bis outperforms other models on simulated and real datasets, demonstrating superiority in various downstream analyses, including batch effect removal, clustering, differential expression analysis, and trajectory inference. Moreover, Bis successfully restores gene expression levels in rare cell subsets in a tumor-matched peripheral blood dataset, revealing developmental characteristics of cytokine-induced natural killer cells within a head and neck squamous cell carcinoma microenvironment.


Subject(s)
Deep Learning , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA Sequence Analysis/methods , Gene Expression Profiling/methods
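The abstract describes Bis as an optimal transport-based autoencoder but does not state which OT formulation it uses. As a generic illustration only, a minimal sketch of entropy-regularized optimal transport between two small histograms, solved with Sinkhorn iterations; `sinkhorn` is a hypothetical helper, not part of Bis, and the toy histograms and squared-distance cost are assumptions.

```python
import math

def sinkhorn(a, b, cost, eps=1.0, n_iter=1000):
    """Entropy-regularized optimal transport between histograms a and b.

    Returns the transport plan and its transport cost sum(plan * cost).
    Very small eps can underflow exp(); practical code works in the log domain.
    """
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan's marginals match a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    plan = [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]
    total = sum(plan[i][j] * cost[i][j] for i in range(len(a)) for j in range(len(b)))
    return plan, total

# Two histograms on the points {0, 1, 2} with squared-distance ground cost.
pts = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(p - q) ** 2 for q in pts] for p in pts]
plan, total = sinkhorn(a, b, cost)
print(round(total, 4))
```

In an OT-based autoencoder, a cost of this kind is typically used as a training loss between the distribution of encoded or reconstructed cells and a target distribution; how Bis wires it in is not stated in the abstract.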
4.
Methods ; 223: 65-74, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38280472

ABSTRACT

MicroRNAs (miRNAs) are vital regulators of gene expression that bind to specific target sites on messenger RNAs (mRNAs), a process closely tied to cancer pathogenesis. Identifying miRNA functional targets is essential but challenging, owing to incomplete genome annotation and an emphasis on known miRNA-mRNA interactions, which restricts the prediction of unknown ones. To address these challenges, we developed miTDS, a deep learning model for miRNA functional target identification, to investigate miRNA-mRNA interactions. miTDS first employs a scoring mechanism to eliminate unstable sequence pairs and then uses a dynamic word embedding model based on the transformer architecture, enabling a comprehensive analysis of miRNA-mRNA interaction sites by harnessing the global contextual associations of each nucleotide. On this basis, miTDS fuses extended seed alignment representations learned in the multi-scale attention mechanism module with dynamic semantic representations extracted in the RNA-based dual-path module, which can further elucidate and predict miRNA and mRNA functions and interactions. To validate the effectiveness of miTDS, we conducted a thorough comparison with state-of-the-art miRNA-mRNA functional target prediction methods. The evaluation, performed on a dataset cross-referenced with entries from miRTarBase and DIANA-TarBase, revealed that miTDS surpasses current methods in accurately predicting functional targets. In addition, our model was proficient in identifying A-to-I RNA editing sites, which represent an aberrant interaction that yields valuable insights into the suppression of cancerous processes.


Subject(s)
Deep Learning , MicroRNAs , MicroRNAs/genetics , Messenger RNA/genetics , Nucleotides , RNA Editing
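miTDS learns "extended seed alignment" representations. As background only, a minimal sketch of the classical seed rule it extends: a canonical 7mer-m8 site in a target mRNA is the reverse complement of miRNA nucleotides 2-8. This is the conventional seed heuristic, not miTDS's scoring mechanism or attention model; `reverse_complement_rna` and `find_seed_sites` are hypothetical names, and the example mRNA fragment is made up.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA string (A<->U, G<->C)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

def find_seed_sites(mirna, mrna):
    """0-based start positions of 7mer-m8 seed matches for `mirna` in `mrna` (both 5'->3')."""
    seed = mirna[1:8]  # miRNA nucleotides 2-8
    site = reverse_complement_rna(seed)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]

let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a mature sequence
print(find_seed_sites(let7a, "AAACUACCUCAAA"))  # [3]
```

Learned models such as miTDS are motivated precisely by the cases this exact-match rule misses (non-canonical and context-dependent sites).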
5.
Comput Biol Med ; 168: 107753, 2024 01.
Article in English | MEDLINE | ID: mdl-38039889

ABSTRACT

BACKGROUND: Trans-acting factors, a group of proteins that directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes, are of special importance in transcription regulation. Progress in high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups in which trans-acting DNA motif groups can be discovered. The difficulty of the problem lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is to develop probabilistic modeling that accommodates the stochastic nature of DNA motif patterns. RESULTS: Based on this modeling, we develop global sampling techniques based on EM and Gibbs sampling to address the global optimization challenge of model fitting with latent variables. The results show that our proposed approaches achieve promising performance with linear time complexity. CONCLUSION: MotifHub is a novel algorithm for identifying both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.


Subject(s)
Algorithms , Software , Nucleotide Motifs/genetics , Reproducibility of Results , Chromatin/genetics
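MotifHub fits a probabilistic motif model with latent site positions using EM and Gibbs sampling. To illustrate only the EM idea, a minimal sketch of the classic one-occurrence-per-sequence (OOPS) EM for a single DNA motif of width `w`: the E-step computes a posterior over the motif start in each sequence, and the M-step re-estimates the position weight matrix from expected counts. It does not model motif groups, trans-acting factors, or MotifHub's matrix-algebra optimizations; `oops_em` is a hypothetical name and sequences are assumed to be uppercase ACGT.

```python
import math
import random

BASES = "ACGT"

def oops_em(seqs, w, n_iter=50, pseudo=0.5, seed=0):
    """EM for one motif of width w, assuming exactly one site per sequence (OOPS model).

    Returns the position weight matrix (list of {base: prob}) and its consensus string.
    """
    rng = random.Random(seed)
    # Background base frequencies estimated from all sequences (with pseudocounts).
    counts = {b: pseudo for b in BASES}
    for s in seqs:
        for ch in s:
            counts[ch] += 1
    total = sum(counts.values())
    bg = {b: counts[b] / total for b in BASES}
    # Random, normalized initial position weight matrix.
    pwm = []
    for _ in range(w):
        raw = {b: rng.random() + 0.1 for b in BASES}
        z = sum(raw.values())
        pwm.append({b: raw[b] / z for b in BASES})
    for _ in range(n_iter):
        # E-step: posterior probability of each start position, per sequence.
        posteriors = []
        for s in seqs:
            logs = [sum(math.log(pwm[j][s[i + j]] / bg[s[i + j]]) for j in range(w))
                    for i in range(len(s) - w + 1)]
            top = max(logs)
            weights = [math.exp(x - top) for x in logs]
            z = sum(weights)
            posteriors.append([x / z for x in weights])
        # M-step: re-estimate each PWM column from expected base counts.
        expected = [{b: pseudo for b in BASES} for _ in range(w)]
        for s, post in zip(seqs, posteriors):
            for i, p in enumerate(post):
                for j in range(w):
                    expected[j][s[i + j]] += p
        pwm = [{b: col[b] / sum(col.values()) for b in BASES} for col in expected]
    consensus = "".join(max(col, key=col.get) for col in pwm)
    return pwm, consensus

# Usage: plant a motif in random sequences and try to recover it.
gen = random.Random(1)
motif = "TGACGTCA"
seqs = []
for _ in range(12):
    letters = [gen.choice(BASES) for _ in range(40)]
    start = gen.randrange(40 - len(motif) + 1)
    letters[start:start + len(motif)] = motif
    seqs.append("".join(letters))
pwm, consensus = oops_em(seqs, w=len(motif))
print(consensus)
```

EM can settle in a local optimum (for instance, a shifted version of the motif), which is one reason samplers such as Gibbs sampling and multiple restarts are commonly combined with it.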
6.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37934154

ABSTRACT

MOTIVATION: Recent deep learning frameworks have been developed to identify cancer subtypes from high-throughput gene expression profiles. Unfortunately, the performance of deep learning is highly dependent on neural network architectures, which are often hand-crafted by experts in deep neural networks; moreover, optimizing and adjusting the network is usually costly and time-consuming. RESULTS: To address these limitations, we proposed a fully automated deep neural architecture search model for diagnosing consensus molecular subtypes from gene expression data (DNAS). The proposed model uses an ant colony algorithm, one of the heuristic swarm intelligence algorithms, to search and optimize the neural network architecture, and it can automatically find the optimal deep learning architecture for cancer diagnosis within its search space. We validated DNAS on eight colorectal cancer datasets, achieving an average accuracy of 95.48%, an average specificity of 98.07%, and an average sensitivity of 96.24%. To assess generality, we further investigated the applicability of DNAS to other cancer types from different platforms, including lung cancer and breast cancer, for which DNAS achieved areas under the curve of 95% and 96%, respectively. In addition, we conducted gene ontology enrichment and pathological analyses to reveal insights into cancer subtype identification and characterization across multiple cancer types. AVAILABILITY AND IMPLEMENTATION: The source code and data can be downloaded from https://github.com/userd113/DNAS-main. The DNAS web server is publicly accessible at 119.45.145.120:5001.


Subject(s)
Breast Neoplasms , Deep Learning , Humans , Female , Computer Neural Networks , Algorithms , Software
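DNAS searches the architecture space with an ant colony algorithm. A minimal sketch of the general idea, assuming the architecture is a fixed sequence of discrete choices (here two layer widths and a dropout rate) and that a toy function stands in for the validation accuracy a real search would obtain by training each candidate; `aco_search`, `toy_fitness`, and the search space are hypothetical and not DNAS's actual design.

```python
import random

def aco_search(search_space, fitness, n_ants=10, n_iter=30, rho=0.1, seed=0):
    """Choose one option per decision slot by ant colony optimization (maximizes fitness).

    search_space: list of option lists, one per architectural decision.
    fitness: maps a tuple of chosen options to a positive score (higher is better).
    """
    rng = random.Random(seed)
    tau = [[1.0] * len(opts) for opts in search_space]  # pheromone on every option
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        trails = []
        for _ in range(n_ants):
            # Each ant picks an option per slot with probability proportional to pheromone.
            picks = [rng.choices(range(len(opts)), weights=tau[d])[0]
                     for d, opts in enumerate(search_space)]
            arch = tuple(search_space[d][k] for d, k in enumerate(picks))
            score = fitness(arch)
            trails.append((picks, score))
            if score > best_score:
                best, best_score = arch, score
        # Evaporate old pheromone, then deposit pheromone proportional to each ant's score.
        tau = [[(1.0 - rho) * t for t in row] for row in tau]
        for picks, score in trails:
            for d, k in enumerate(picks):
                tau[d][k] += score
    return best, best_score

# Hypothetical stand-in for validation accuracy: prefers (128 units, 64 units, dropout 0.3).
def toy_fitness(arch):
    width1, width2, dropout = arch
    return 1.0 / (1.0 + abs(width1 - 128) / 128 + abs(width2 - 64) / 64 + abs(dropout - 0.3))

search_space = [[32, 64, 128, 256], [16, 32, 64], [0.1, 0.3, 0.5]]
best, best_score = aco_search(search_space, toy_fitness)
print(best, round(best_score, 3))
```

Evaporation (`rho`) keeps early, lucky choices from dominating, while score-proportional deposits steer later ants toward options that appeared in good architectures.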
7.
Comput Struct Biotechnol J ; 21: 2454-2470, 2023.
Article in English | MEDLINE | ID: mdl-37077177

ABSTRACT

Cancer is widely recognized for its high mortality rate, and metastatic cancer is the leading cause of cancer-related deaths. Metastatic cancer involves the spread of the primary tumor to other organs. Just as early detection of cancer is essential, timely detection of metastasis, identification of biomarkers, and choice of treatment are valuable for improving the quality of life of patients with metastatic cancer. This study reviews existing studies on classical machine learning (ML) and deep learning (DL) in metastatic cancer research. Because most metastatic cancer research data are collected as PET/CT and MRI images, deep learning techniques are heavily involved. However, their black-box nature and high computational cost are notable concerns. Furthermore, the generality of existing models could be overestimated because of the non-diverse populations in clinical trial datasets. Therefore, research gaps are itemized; follow-up studies should be carried out on metastatic cancer using machine learning and deep learning tools with data in a symmetric manner.

8.
Adv Sci (Weinh) ; 10(11): e2204113, 2023 04.
Article in English | MEDLINE | ID: mdl-36762572

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) quantifies the gene expression of individual cells, whereas bulk RNA sequencing (bulk RNA-seq) characterizes the mixed transcriptome of a population of cells. Inferring drug sensitivities for individual cells can provide new insights into the mechanisms of anti-cancer response heterogeneity and drug resistance at cellular resolution. However, pharmacogenomic information corresponding to scRNA-seq data is often limited. Therefore, a transfer learning model is proposed to infer drug sensitivities at the single-cell level. This framework learns bulk transcriptome profiles and pharmacogenomic information from population cell lines in a large public dataset and transfers that knowledge to infer the drug efficacy of individual cells. The results suggest that knowledge learned from pre-clinical cell lines can be used to infer pre-existing cell subpopulations with different drug sensitivities prior to drug exposure. In addition, the model offers a new perspective on drug combinations. It is observed that a drug-resistant subpopulation can be sensitive to other drugs (e.g., a subset of JHU006 is Vorinostat-resistant but Gefitinib-sensitive); this finding corroborates the previously reported drug combination strategy (Gefitinib + Vorinostat) in several cancer types. The identified drug sensitivity biomarkers reveal insights into tumor heterogeneity and treatment at cellular resolution.


Subject(s)
Transcriptome , RNA-Seq/methods , Gefitinib , Vorinostat , Transcriptome/genetics , RNA Sequence Analysis/methods
9.
J Vasc Surg Venous Lymphat Disord ; 11(3): 626-633, 2023 05.
Article in English | MEDLINE | ID: mdl-36787860

ABSTRACT

OBJECTIVE: To investigate the safety and effectiveness of venous stenting in patients with chronic iliofemoral venous obstruction and secondary lymphedema from malignancy. METHODS: Patients with iliofemoral venous obstruction and secondary lymphedema who underwent venous stenting at our institution from July 2012 to December 2020 were reviewed retrospectively. Clinical characteristics, surgical complications, and symptom relief were assessed. Stent patency was evaluated with duplex ultrasound or computed tomographic venography. Twelve-month outcomes were reported. RESULTS: Fifty-three patients with concurrent secondary lymphedema who had stents placed for iliofemoral venous obstruction were included. There were 42 females, and the mean age was 56.9 years. Nonthrombotic iliac vein lesions were identified in 16 patients (30.1%). Immediate technical success was 100%, with an average of two stents implanted. The median Villalta score and Chronic Venous Disease Quality of Life Questionnaire score decreased from 12 (interquartile range [IQR], 10-15) and 58 (IQR, 50-66) at baseline, respectively, to 5 (IQR, 4-6) and 28 (IQR, 22-45) at 12 months after the procedure (P < .05), showing significant improvement in quality of life. At a median follow-up of 12 months (range, 3-25 months), the cumulative primary, assisted primary, and secondary patency rates were 70.8%, 76.9%, and 90.1%, respectively. CONCLUSIONS: In patients with secondary lymphedema from malignancy, venous stent placement is safe and effective for iliofemoral venous obstruction.


Subject(s)
Neoplasms , Vascular Diseases , Female , Humans , Middle Aged , Retrospective Studies , Quality of Life , Femoral Vein/diagnostic imaging , Femoral Vein/surgery , Treatment Outcome , Stents , Iliac Vein/diagnostic imaging , Iliac Vein/surgery , Chronic Disease
10.
Nat Commun ; 14(1): 400, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36697410

ABSTRACT

Single-cell RNA sequencing provides high-throughput gene expression information for exploring cellular heterogeneity at the individual-cell level. A major challenge in characterizing high-throughput gene expression data arises from its high dimensionality and the prevalence of dropout events. To address these concerns, we develop scMGCA, a deep graph learning method for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representations and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanisms. We demonstrate that, in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully annotates specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signaling pathways.


Subject(s)
Pancreatic Ductal Carcinoma , Pancreatic Neoplasms , Humans , Gene Expression Profiling/methods , Pancreatic Neoplasms/genetics , Gene Expression Regulation , Transcriptome , Pancreatic Ductal Carcinoma/genetics , Single-Cell Analysis/methods
11.
Vasc Endovascular Surg ; 57(2): 164-168, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36167456

ABSTRACT

Venous cystic adventitial disease (VCAD) is a rare vascular anomaly that is most often located in the common femoral vein. We describe the case of a 59-year-old female patient with right leg edema who was misdiagnosed with deep vein thrombosis of the lower extremity at another hospital. Magnetic resonance angiography revealed a round mass in the popliteal vein with a narrow lumen. Considering the location of the lesion, the absence of a history of deep venous thrombosis or trauma, and the clinical manifestations, the diagnosis was likely a popliteal vein adventitial cyst. Segmental popliteal vein resection and reconstruction were performed using a cylindrical great saphenous vein graft. No joint connection was found during the operation, and postoperative pathology confirmed VCAD.


Subject(s)
Cysts , Vascular Diseases , Female , Humans , Middle Aged , Popliteal Vein/diagnostic imaging , Popliteal Vein/surgery , Cysts/diagnostic imaging , Cysts/surgery , Treatment Outcome , Vascular Diseases/diagnostic imaging , Vascular Diseases/surgery , Femoral Vein/diagnostic imaging , Femoral Vein/surgery , Femoral Vein/pathology
12.
BioData Min ; 15(1): 12, 2022 Apr 23.
Article in English | MEDLINE | ID: mdl-35461302

ABSTRACT

BACKGROUND: Cancer molecular subtyping plays a critical role in individualized patient treatment. Previous studies have proposed high-throughput gene expression signature-based methods to identify cancer subtypes. Unfortunately, existing methods suffer from the curse of dimensionality, data sparsity, and computational inefficiency. METHODS: To address these problems, we propose a computational framework for colorectal cancer subtyping that neither relies on excessive model complexity nor sacrifices generality. A supervised learning framework based on deep learning (DeepCSD) is proposed to identify cancer subtypes. Specifically, based on the differentially expressed genes under cancer consensus molecular subtyping, we design a minimalist feed-forward neural network to capture the distinct molecular features of different cancer subtypes. To mitigate overfitting as much as possible, L1 and L2 regularization and dropout layers are added. RESULTS: To demonstrate the effectiveness of DeepCSD, we compared it with other methods, including Random Forest (RF), Deep Forest (gcForest), support vector machine (SVM), XGBoost, and DeepCC, on eight independent colorectal cancer datasets. The results show that DeepCSD achieves superior performance over the other algorithms. In addition, gene ontology enrichment and pathology analyses are conducted to reveal novel insights into cancer subtype identification and characterization mechanisms. CONCLUSIONS: DeepCSD takes all subtype-specific genes as input, which is pathologically necessary for completeness. At the same time, DeepCSD shows remarkable robustness in handling cross-platform gene expression data, achieving similar performance on training and test data without significant overfitting or excessive model complexity.
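The abstract adds L1 and L2 regularization (together with dropout) to limit overfitting. In its usual form, assuming a cross-entropy classification loss over N samples and K subtypes, one-hot labels $y_{nk}$, predicted probabilities $\hat{y}_{nk}$, network weights $W$, and penalty weights $\lambda_1, \lambda_2$ (the values used in DeepCSD are not stated here), the training objective reads:

```latex
\mathcal{L}(W) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} y_{nk}\,\log \hat{y}_{nk}(W)
  \;+\; \lambda_1 \lVert W \rVert_1 \;+\; \lambda_2 \lVert W \rVert_2^2
```

The L1 term pushes many weights toward exactly zero, which suits high-dimensional gene inputs, while the L2 term shrinks all weights smoothly. Dropout is not a penalty term; it randomly zeroes hidden units during training instead.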

13.
Bioinformatics ; 38(11): 3020-3028, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35451457

ABSTRACT

MOTIVATION: Thanks to the development of high-throughput sequencing technologies, massive amounts of diverse biomolecular data have accumulated, revolutionizing the study of genomics and molecular biology. One of the main challenges in analyzing these data is to cluster samples into subtypes or subpopulations to facilitate downstream analysis. Recently, many clustering methods have been developed for biomolecular data. However, these computational methods often suffer from limitations such as high dimensionality, data heterogeneity, and noise. RESULTS: In our study, we develop a novel Graph-based Multiple Hierarchical Consensus Clustering (GMHCC) method with an unsupervised graph-based feature ranking (FR) and a graph-based linking method to explore the multiple hierarchical information of the underlying partitions of consensus clustering for multiple types of biomolecular data. Specifically, we first propose a graph-based unsupervised FR model that scores each feature by building a graph over pairwise features and then assigns each feature a rank. Subsequently, to maintain the diversity and robustness of basic partitions (BPs), we use multiple diverse feature subsets to generate several BPs and then explore the hierarchical structures of the multiple BPs by refining the global consensus function. Finally, we develop a new graph-based linking method that explicitly considers the relationships between clusters to generate the final partition. Experiments on multiple types of biomolecular data, including 35 cancer gene expression datasets and eight single-cell RNA-seq datasets, validate the effectiveness of our method over several state-of-the-art consensus clustering approaches. Furthermore, differential gene analysis, gene ontology enrichment analysis, and KEGG pathway analysis are conducted, providing novel insights into cell developmental lineages and characterization mechanisms.
AVAILABILITY AND IMPLEMENTATION: The source code is available at GitHub: https://github.com/yifuLu/GMHCC. The software and the supporting data can be downloaded from: https://figshare.com/articles/software/GMHCC/17111291. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Consensus , Cluster Analysis , High-Throughput Nucleotide Sequencing , Single-Cell Analysis
14.
IEEE J Biomed Health Inform ; 26(8): 4303-4313, 2022 08.
Article in English | MEDLINE | ID: mdl-35439152

ABSTRACT

Exploring prognostic classification and biomarkers in head and neck squamous cell carcinoma (HNSC) is of great clinical significance. We combined three prominent strategies to comprehensively characterize the molecular features of HNSC. We constructed a 15-gene signature to predict patients' death risk, with an average AUC of 0.744 for 1-, 3-, and 5-year prediction on the TCGA-HNSC training set and average AUCs of 0.636, 0.584, and 0.755 on the GSE65858, GSE112026, and CPTAC-HNSCC datasets, respectively. By combining NMF clustering with consensus clustering of the fraction of tumor immune cell infiltration (ICI) in the tumor microenvironment (TME), we captured more refined biological characteristics of HNSC and observed prognostic heterogeneity among patients with high tumor immunity. By matching tumor subset-specific expression signatures to drug-induced cell line expression profiles from large-scale pharmacogenomic databases in the OCTAD workspace, we identified a group of HNSC patients with poor prognosis and demonstrated that individuals in this group are likely to show increased sensitivity to drugs that reverse the differentially expressed disease signature genes. This trend is especially pronounced among patients with higher death risk and tumor immunity.


Subject(s)
Gene Expression Profiling , Head and Neck Neoplasms , Tumor Biomarkers/genetics , Head and Neck Neoplasms/drug therapy , Head and Neck Neoplasms/genetics , Humans , Prognosis , Squamous Cell Carcinoma of Head and Neck/genetics , Transcriptome , Treatment Outcome , Tumor Microenvironment/genetics
15.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35323862

ABSTRACT

Healthcare disparities in multiethnic medical data are a major challenge; the main reason lies in the unequal distribution of ethnic groups across data cohorts. Biomedical data collected from different cancer genome research projects may consist mainly of one ethnic group, such as people of European ancestry. In contrast, other ethnic groups, such as African, Asian, Hispanic, and Native American populations, can be far less represented. Data inequality in the biomedical field is an important research problem, resulting in uneven performance of machine learning models and creating healthcare disparities. Previous research has reduced healthcare disparities using only limited data distributions. In our study, we fine-tune deep learning and transfer learning models with different multiethnic data distributions for the prognosis of 33 cancer types. In previous studies, only a single ethnic cohort was used as the target domain, with one major source domain, to reduce healthcare disparities. In contrast, we focused on multiple ethnic cohorts as the target domain in transfer learning using the TCGA and MMRF CoMMpass study datasets. In performance comparisons across experiments with new data distributions, our proposed model shows promising performance for transfer learning schemes compared with the baseline approach for both old and new data distributions.


Subject(s)
Healthcare Disparities , Neoplasms , Ethnicity , Hispanic or Latino , Humans , Machine Learning , Neoplasms/genetics
16.
IEEE J Biomed Health Inform ; 26(3): 1309-1317, 2022 03.
Article in English | MEDLINE | ID: mdl-34379600

ABSTRACT

Prostate cancer is the second most common cancer in men, according to the WHO World Cancer Report. Its prevention and treatment demand proper attention. Despite numerous attempts at disease prevention, prostate tumors can still metastasize to other organs through the bloodstream. Several treatments have been adopted; however, findings show that docetaxel treatment induces adverse reactions in patients. A Particle Swarm Optimized Gaussian Process Classifier (PSO-GPC) is proposed to determine when to discontinue treatment. Based on three cohorts of prostate cancer patients, we propose and compare several classifiers to find the best performer in determining treatment discontinuation. Given the data skewness and class imbalance, the models are evaluated using both the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC). With AUCs ranging from 0.6717 to 0.8499 and AUPRCs ranging from 0.1392 to 0.5423, PSO-GPC performs better than state-of-the-art methods. We carried out statistical analysis for ranking methods and analyzed independent cohort data with PSO-GPC, demonstrating its unbiased performance. Proper determination of treatment discontinuation in patients with metastatic castration-resistant prostate cancer will reduce the mortality rate in cancer patients.


Subject(s)
Castration-Resistant Prostatic Neoplasms , Area Under Curve , Docetaxel , Humans , Male , Normal Distribution , Castration-Resistant Prostatic Neoplasms/drug therapy , ROC Curve
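Because of class imbalance, the study reports both AUC and AUPRC. A minimal sketch of both metrics, assuming binary labels (1 = positive) and real-valued classifier scores: AUC is computed in its rank (Mann-Whitney) form, the probability that a random positive outscores a random negative with ties counted as one half, and AUPRC is estimated as average precision, which is one common estimator among several. This is not the study's evaluation code, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
print(round(roc_auc(labels, scores), 4), round(average_precision(labels, scores), 4))  # 0.8333 0.8333
```

The two values coincide in this tiny example, but with few positives AUPRC is usually much lower than AUC, which is consistent with the gap between the AUC and AUPRC ranges reported above.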
17.
IEEE Trans Cybern ; 52(10): 11027-11040, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33961576

ABSTRACT

Patient stratification has been studied widely to tackle subtype diagnosis problems for effective treatment. Owing to the curse of dimensionality and the poor interpretability of data, constructing a stratification model with high diagnostic ability and good generalization remains a long-standing challenge. To address these problems, this article proposes two novel evolutionary multiobjective clustering algorithms with ensemble (NSGA-II-ECFE and MOEA/D-ECFE), which use four cluster validity indices as the objective functions. First, an effective ensemble construction method is developed to enrich ensemble diversity. Then, an ensemble clustering fitness evaluation (ECFE) method is proposed to evaluate the ensembles by measuring the consensus clustering under those four objective functions. To generate the consensus clustering, ECFE exploits the hybrid co-association matrix from the ensembles and then dynamically selects a suitable clustering algorithm to apply to that matrix. Multiple experiments were conducted to demonstrate the effectiveness of the proposed algorithms in comparison with seven clustering algorithms, twelve ensemble clustering approaches, and two multiobjective clustering algorithms on 55 synthetic datasets and 35 real patient stratification datasets. The experimental results demonstrate the competitive edge of the proposed algorithms over the compared methods. Furthermore, the proposed algorithm is applied to identify cancer subtypes in five cancer-related single-cell RNA-seq datasets, extending its advantages to that setting.


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Humans , Neoplasms/genetics
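ECFE derives its consensus clustering from a (hybrid) co-association matrix built from the ensemble. A minimal sketch of the plain co-association matrix, assuming each base clustering is a list of cluster labels over the same samples: entry (i, j) is the fraction of base clusterings that put samples i and j in the same cluster. The hybrid weighting and the dynamic choice of the final clustering algorithm described in the abstract are not reproduced, and `co_association` is a hypothetical name.

```python
def co_association(partitions):
    """Fraction of base clusterings in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    k = len(partitions)
    return [[c / k for c in row] for row in counts]

partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first; label names do not matter
    [0, 0, 0, 1],
]
C = co_association(partitions)
for row in C:
    print([round(v, 2) for v in row])
```

Because only co-membership is counted, label permutations across base clusterings cancel out; `1 - C` can then serve as a distance matrix for a final clustering step such as hierarchical clustering.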
18.
Bioinformatics ; 37(19): 3319-3327, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33515231

ABSTRACT

MOTIVATION: Early detection of cancer through accessible blood tests can foster early patient interventions. Although there have been developments in cancer detection from cell-free DNA (cfDNA), its accuracy remains uncertain. Given the central importance and broad impact of this problem, we aim to address the challenge. METHOD: A bagging Ensemble Meta Classifier (CancerEMC) is proposed for early cancer detection based on circulating protein biomarkers and mutations in cfDNA from blood. CancerEMC is designed for both binary cancer detection and multi-class cancer type localization. It addresses the class imbalance problem in multi-analyte blood test data using robust oversampling and adaptive synthesis techniques. RESULTS: On the clinical blood test data, we observe that CancerEMC outperforms other algorithms and state-of-the-art studies (including CancerSEEK) for cancer detection. The results reveal that CancerEMC achieves the best performance for both binary cancer classification, with 99.17% accuracy (AUC = 0.999), and localized multiple cancer detection, with 74.12% accuracy (AUC = 0.938). When the data imbalance issue is addressed with oversampling techniques, the accuracy increases to 91.50% (AUC = 0.992), whereas the state-of-the-art method is estimated at only 69.64% (AUC = 0.921). Similar results are also observed on independent and isolated testing data. AVAILABILITY: https://github.com/saifurcubd/Cancer-Detection. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
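CancerEMC counters class imbalance with oversampling and adaptive synthesis. To illustrate the synthetic-oversampling idea only, a minimal SMOTE-style sketch: each synthetic minority sample is a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is not CancerEMC's pipeline or its exact oversampler; `smote_like` is a hypothetical name, and features are assumed numeric.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating minority samples with near neighbours.

    minority: list of numeric feature vectors (at least two).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=10)
print(len(new_points))
```

In a bagging ensemble, such synthetic points would typically be added to the training data of each base model (or each bootstrap sample) while the test data are left untouched.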

19.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33454736

ABSTRACT

Haploinsufficiency, in which a single functional allele is not enough to maintain normal function, can lead to many diseases, including cancers and neurodevelopmental disorders. Recently, computational methods for identifying haploinsufficiency have been developed. However, most of these methods suffer from study bias, experimental noise, and instability, resulting in unsatisfactory identification of haploinsufficient genes. To address these challenges, we propose a deep forest model, called HaForest, to identify haploinsufficient genes. Multiscale scanning is proposed to extract local contextual representations from input features under Linear Discriminant Analysis. After that, the cascade forest structure is applied to obtain the concatenated features directly by integrating decision-tree-based forests. Meanwhile, to exploit the complex dependency structure among haploinsufficient genes, the LightGBM library is embedded into HaForest to reveal highly expressive features. To validate the effectiveness of our method, we compared it with several computational methods and four deep learning algorithms on five epigenomic datasets. The results reveal that HaForest achieves superior performance over the other algorithms, demonstrating its unique and complementary performance in identifying haploinsufficient genes. The standalone tool is available at https://github.com/yangyn533/HaForest.


Subject(s)
Deep Learning , Genetic Epigenesis , Haploinsufficiency , Neoplasms/genetics , Neurodevelopmental Disorders/genetics , Software , Alleles , Benchmarking , Decision Trees , Discriminant Analysis , Genetic Enhancer Elements , Human Genome , Histones/genetics , Histones/metabolism , Humans , Internet , Neoplasms/diagnosis , Neoplasms/pathology , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/pathology , Genetic Promoter Regions
20.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2431-2444, 2021.
Article in English | MEDLINE | ID: mdl-32086219

ABSTRACT

Detection and diagnosis of cancer are especially essential for early prevention and effective treatment. Many studies have been proposed to tackle subtype diagnosis problems with such data, but they often suffer from low diagnostic ability and poor generalization. This article studies a multiobjective PSO-based hybrid algorithm (MOPSOHA) that simultaneously optimizes four objectives, namely the number of features, the accuracy, and two entropy-based measures (relevance and redundancy), to diagnose cancer data with high classification power and robustness. First, we propose a novel binary encoding strategy to choose informative gene subsets that optimize these objective functions. Second, a mutation operator is designed to enhance the exploration capability of the swarm. Finally, a local search method based on the "best/1" mutation operator of the differential evolution (DE) algorithm is employed to exploit the neighborhood of sparse high-quality solutions, since the base vector always moves toward promising areas. To demonstrate the effectiveness of MOPSOHA, it is tested on 41 datasets, including thirty-five cancer gene expression datasets and six independent disease datasets. Compared with other state-of-the-art algorithms, MOPSOHA performs better on most of the benchmark datasets.


Subject(s)
Algorithms , Computational Biology/methods , Neoplasms , Genetic Databases , Computer-Assisted Diagnosis , Neoplastic Gene Expression Regulation/genetics , Humans , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/metabolism , Transcriptome/genetics
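The local search in MOPSOHA borrows the "best/1" mutation of differential evolution, in which a mutant vector is built as v = x_best + F * (x_r1 - x_r2) from two distinct, randomly chosen population members. A minimal sketch, assuming continuous vectors and a scale factor F = 0.5; MOPSOHA's binary encoding and the way it applies the operator around sparse high-quality solutions are not reproduced, and `de_best_1` is a hypothetical name.

```python
import random

def de_best_1(population, best, F=0.5, rng=random):
    """DE/best/1 mutant vector: best + F * (x_r1 - x_r2) for two distinct random members."""
    r1, r2 = rng.sample(range(len(population)), 2)
    return [b + F * (x1 - x2) for b, x1, x2 in zip(best, population[r1], population[r2])]

population = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
mutant = de_best_1(population, best=[1.0, 1.0], rng=random.Random(42))
print(mutant)
```

Because the base vector is always the current best, the mutant stays near the best-known region and perturbs it along population-difference directions, which matches the exploitation role described in the abstract. Many DE variants additionally require r1 and r2 to differ from the target vector's index; that refinement is omitted here for brevity.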