Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros








Base de dados
Intervalo de ano de publicação
1.
Cancers (Basel) ; 14(21)2022 Oct 29.
Artigo em Inglês | MEDLINE | ID: mdl-36358745

RESUMO

Accurate prediction of breast cancer metastasis in the early stages of cancer diagnosis is crucial to reduce cancer-related deaths. With the availability of gene expression datasets, many machine-learning models have been proposed to predict breast cancer metastasis using thousands of genes simultaneously. However, the prediction accuracy of the models using gene expression often suffers from the diverse molecular characteristics across different datasets. Additionally, breast cancer is known to have many subtypes, which hinders the performance of the models aimed at all subtypes. To overcome the heterogeneous nature of breast cancer, we propose a method to obtain personalized classifiers that are trained on subsets of patients selected using the similarities between training and testing patients. Results on multiple independent datasets showed that our proposed approach significantly improved prediction accuracy compared to the models trained on the complete training dataset and models trained on specific cancer subtypes. Our results also showed that personalized classifiers trained on positively and negatively correlated patients outperformed classifiers trained only on positively correlated patients, highlighting the importance of selecting proper patient subsets for constructing personalized classifiers. Additionally, our proposed approach obtained more robust features than the other models and identified different features for different patients, making it a promising tool for designing personalized medicine for cancer patients.

2.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1344-1353, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-34662279

RESUMO

Interpretability of machine learning (ML) models represents the extent to which a model's decision-making process can be understood by model developers and/or end users. Transcriptomics-based cancer prognosis models, for example, while achieving good accuracy, are usually hard to interpret, due to the high-dimensional feature space and the complexity of models. As interpretability is critical for the transparency and fairness of ML models, several algorithms have been proposed to improve the interpretability of arbitrary classifiers. However, evaluation of these algorithms often requires substantial domain knowledge. Here, we propose a breast cancer metastasis prediction model using a very small number of biologically interpretable features, and a simple yet novel model interpretation approach that can provide personalized interpretations. In addition, we contributed, to the best of our knowledge, the first method to quantitatively compare different interpretation algorithms. Experimental results show that our model not only achieved competitive prediction accuracy, but also higher inter-classifier interpretation consistency than state-of-the-art interpretation methods. Importantly, our interpretation results can improve the generalizability of the prediction models. Overall, this work provides several novel ideas to construct and evaluate interpretable ML models that can be valuable to both the cancer machine learning community and related application domains.


Assuntos
Neoplasias da Mama , Melanoma , Algoritmos , Neoplasias da Mama/genética , Feminino , Humanos , Aprendizado de Máquina , Neoplasias Cutâneas , Melanoma Maligno Cutâneo
3.
BMC Bioinformatics ; 21(Suppl 14): 359, 2020 Sep 30.
Artigo em Inglês | MEDLINE | ID: mdl-32998692

RESUMO

BACKGROUND: The abundance of molecular profiling of breast cancer tissues entailed active research on molecular marker-based early diagnosis of metastasis. Recently there is a surging interest in combining gene expression with gene networks such as protein-protein interaction (PPI) network, gene co-expression (CE) network and pathway information to identify robust and accurate biomarkers for metastasis prediction, reflecting the common belief that cancer is a systems biology disease. However, controversy exists in the literature regarding whether network markers are indeed better features than genes alone for predicting as well as understanding metastasis. We believe much of the existing results may have been biased by the overly complicated prediction algorithms, unfair evaluation, and lack of rigorous statistics. In this study, we propose a simple approach to use network edges as features, based on two types of networks respectively, and compared their prediction power using three classification algorithms and rigorous statistical procedure on one of the largest datasets available. To detect biomarkers that are significant for the prediction and to compare the robustness of different feature types, we propose an unbiased and novel procedure to measure feature importance that eliminates the potential bias from factors such as different sample size, number of features, as well as class distribution. RESULTS: Experimental results reveal that edge-based feature types consistently outperformed gene-based feature type in random forest and logistic regression models under all performance evaluation metrics, while the prediction accuracy of edge-based support vector machine (SVM) model was poorer, due to the larger number of edge features compared to gene features and the lack of feature selection in SVM model. Experimental results also show that edge features are much more robust than gene features and the top biomarkers from edge feature types are statistically more significantly enriched in the biological processes that are well known to be related to breast cancer metastasis. CONCLUSIONS: Overall, this study validates the utility of edge features as biomarkers but also highlights the importance of carefully designed experimental procedures in order to achieve statistically reliable comparison results.


Assuntos
Biomarcadores Tumorais/metabolismo , Neoplasias da Mama/patologia , Máquina de Vetores de Suporte , Área Sob a Curva , Neoplasias da Mama/genética , Feminino , Redes Reguladoras de Genes/genética , Humanos , Modelos Logísticos , Metástase Neoplásica , Mapas de Interação de Proteínas/genética , Curva ROC
4.
BMC Med Genomics ; 13(Suppl 5): 40, 2020 04 03.
Artigo em Inglês | MEDLINE | ID: mdl-32241278

RESUMO

BACKGROUND: Discovering a highly accurate and robust gene signature for the prediction of breast cancer metastasis from gene expression profiling of primary tumors is one of the most challenging tasks to reduce the number of deaths in women. Due to the limited success of gene-based features in achieving satisfactory prediction accuracy, many methodologies have been proposed in recent years to develop network-based features by integrating network information with gene expression. However, evaluation results are inconsistent to confirm the effectiveness of network-based features, because of many confounding factors involved in classification model learning process, such as data normalization, dimension reduction, and feature selection. An unbiased comparative evaluation is essential for uncovering the strength of network-based features. METHODS: In this study, we compared several types of network-based features obtained using different mathematical operators (Mean, Maximum, Minimum, Median, Variance) on geneset (i.e., a gene and its' neighbors in the network) in protein-protein interaction network and gene co-expression network for their ability in predicting breast cancer metastasis using gene expression data from more than 10 patient cohorts. RESULTS: While network-based features are usually statistically more significant than gene-based feature, a consistent improvement of prediction performance using network-based features requires a substantial number of patients in the dataset. In contrary to many previous reports, no evidence was found to support the robustness of network-based features and we argue some of the robustness may be due to the inherent bias associated with node degree in the network. In addition, different types of network features seem to cover different pathways and are complementary to each other. Consequently, an ensemble classifier combining different network features was proposed and was found to significantly outperform classifiers based on gene-based feature or any single type of network-based features. CONCLUSIONS: Network-based features and their combination show promise for improving the prediction of breast cancer metastasis but may require a large amount of training data. Robustness claim of network-based features needs to be re-examined with network node degree and other confounding factors in consideration.


Assuntos
Biomarcadores Tumorais/genética , Neoplasias da Mama/secundário , Biologia Computacional/métodos , Regulação Neoplásica da Expressão Gênica , Redes Reguladoras de Genes , Transcriptoma , Neoplasias da Mama/genética , Feminino , Perfilação da Expressão Gênica , Humanos , Mapas de Interação de Proteínas
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA