ABSTRACT
OBJECTIVE: This study introduces the complete blood count (CBC), a standard prenatal screening test, as a biomarker for diagnosing preeclampsia with severe features (sPE) using machine learning models. METHODS: We used a boosting machine learning model trained on synthetic data generated through a new methodology called DAS (Data Augmentation and Smoothing). Using data from a Brazilian study of 132 pregnant women, we generated 3,552 synthetic samples for model training. To improve interpretability, we also provided a ridge regression model. RESULTS: Our boosting model obtained an AUROC of 0.90 ± 0.10, a sensitivity of 0.95, and a specificity of 0.79 in differentiating sPE from non-PE pregnant women, using three CBC parameters: neutrophil count, mean corpuscular hemoglobin (MCH), and the aggregate index of systemic inflammation (AISI). In addition, we provided a ridge regression equation using the same three CBC parameters, which is fully interpretable and achieved an AUROC of 0.79 ± 0.10 in differentiating the two groups. We also showed that a monocyte count lower than 490/mm³ yielded a sensitivity of 0.71 and a specificity of 0.72. CONCLUSION: Our study showed that a CBC analyzed with machine learning could be used as a biomarker to support sPE diagnosis. In addition, we showed that a low monocyte count alone could be an indicator of sPE. SIGNIFICANCE: Although preeclampsia has been extensively studied, no laboratory biomarker with favorable cost-effectiveness has been proposed. Using artificial intelligence, we propose the CBC, a low-cost, fast, and widely available blood test, as a biomarker for sPE.
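As an illustration of the quantities named in this abstract, the following minimal Python sketch computes the AISI and evaluates a single-threshold monocyte rule. The abstract does not state the AISI formula; the sketch assumes the definition commonly used in the literature (neutrophils × monocytes × platelets / lymphocytes). The function names and all cell counts are hypothetical and are not study data.

```python
def aisi(neutrophils, monocytes, platelets, lymphocytes):
    """Aggregate index of systemic inflammation, assumed here to be (N * M * P) / L."""
    return neutrophils * monocytes * platelets / lymphocytes


def sens_spec(values, labels, threshold):
    """Sensitivity and specificity of the rule 'value < threshold => positive'."""
    pairs = list(zip(values, labels))
    tp = sum(1 for v, y in pairs if y == 1 and v < threshold)
    fn = sum(1 for v, y in pairs if y == 1 and v >= threshold)
    tn = sum(1 for v, y in pairs if y == 0 and v >= threshold)
    fp = sum(1 for v, y in pairs if y == 0 and v < threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical monocyte counts (cells/mm^3); label 1 = sPE, 0 = non-PE.
monocytes = [350, 420, 480, 610, 300, 520, 700, 450, 640, 580]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

sensitivity, specificity = sens_spec(monocytes, labels, threshold=490)
example_aisi = aisi(neutrophils=4.0, monocytes=0.5, platelets=250.0, lymphocytes=2.0)
```

The 490/mm³ cutoff is the one reported above; the resulting sensitivity and specificity on these made-up counts bear no relation to the published 0.71 and 0.72.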
Subject(s)
Biomarkers, Machine Learning, Pre-Eclampsia, Humans, Pre-Eclampsia/diagnosis, Pre-Eclampsia/blood, Female, Pregnancy, Biomarkers/blood, Blood Cell Count/methods, Adult, Sensitivity and Specificity, Brazil, Severity of Illness Index, ROC Curve, Prenatal Diagnosis/methods
ABSTRACT
Optimizing early breast cancer (BC) detection requires effective risk assessment tools. This retrospective study from Brazil shows that machine learning can discern complex patterns in routine blood tests, offering a globally accessible and cost-effective approach to risk evaluation. We analyzed complete blood count (CBC) tests from 396,848 women aged 40-70 who underwent breast imaging or biopsy within six months after their CBC test. Of these, 2,861 (0.72%) were identified as cases: 1,882 with BC confirmed by anatomopathological tests and 979 with highly suspicious imaging (BI-RADS 5). The remaining 393,987 participants (99.28%), with BI-RADS 1 or 2 results, were classified as controls. The database was divided into modeling (training and validation) and testing sets based on diagnostic certainty. The testing set comprised cases confirmed by anatomopathology and controls who remained cancer-free for 4.5-6.5 years after the CBC. Our ridge regression model, incorporating neutrophil-lymphocyte ratio, red blood cells, and age, achieved an AUC of 0.64 (95% CI 0.64-0.65). We also demonstrate that these results are slightly better than those of a boosting machine learning model, LightGBM, with the added benefit of being fully interpretable. Using the probabilistic output of this model, we divided the study population into four risk groups (high, moderate, average, and low risk), which obtained relative ratios of BC of 1.99, 1.32, 1.02, and 0.42, respectively. The aim of this stratification was to streamline prioritization, potentially improving the early detection of breast cancer, particularly in resource-limited settings. As a risk stratification tool, this model offers the potential for personalized breast cancer screening by prioritizing women according to their individual risk, indicating a shift away from a broad population strategy.
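The stratification step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define "relative ratio", so the sketch takes it to be the case rate within a group divided by the case rate in the whole population, and the probability cutoffs, function names, and data are all hypothetical.

```python
from collections import defaultdict

GROUPS = ("high", "moderate", "average", "low")


def nlr(neutrophils, lymphocytes):
    """Neutrophil-lymphocyte ratio, one of the three model inputs named above."""
    return neutrophils / lymphocytes


def assign_group(prob, cutoffs=(0.75, 0.50, 0.25)):
    """Map a model probability to a risk group using hypothetical cutoffs."""
    for name, cutoff in zip(GROUPS, cutoffs):
        if prob >= cutoff:
            return name
    return GROUPS[-1]


def relative_ratios(probs, labels):
    """Case rate per group divided by the overall case rate (assumed definition)."""
    overall = sum(labels) / len(labels)
    counts = defaultdict(lambda: [0, 0])  # group -> [cases, total]
    for prob, label in zip(probs, labels):
        group = assign_group(prob)
        counts[group][0] += label
        counts[group][1] += 1
    return {g: (cases / total) / overall for g, (cases, total) in counts.items()}


# Hypothetical model outputs and outcomes (1 = case, 0 = control); not study data.
probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
ratios = relative_ratios(probs, labels)
```

A ratio above 1 marks a group with more cases than the population average, which is how the reported values of 1.99 and 0.42 for the high- and low-risk groups would be read under this assumed definition.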
Subject(s)
Breast Neoplasms, Machine Learning, Humans, Breast Neoplasms/blood, Breast Neoplasms/diagnosis, Breast Neoplasms/pathology, Female, Middle Aged, Retrospective Studies, Adult, Aged, Blood Cell Count/methods, Risk Assessment/methods, Early Detection of Cancer/methods, Brazil/epidemiology
ABSTRACT
Alzheimer's disease (AD) is the most common type of dementia, accounting for 60%-70% of reported cases. MicroRNAs (miRNAs) are small non-coding RNAs that play a crucial role in the regulation of gene expression. Although the diagnosis of AD is primarily clinical, several miRNAs have been associated with AD and are considered potential markers for its diagnosis and progression. We sought to match AD-related miRNAs in cerebrospinal fluid (CSF) found in the GeoDataSets and evaluated by machine learning with miRNAs listed in a systematic review, followed by a pathway analysis. Using machine learning approaches, we identified the most differentially expressed miRNAs in the Gene Expression Omnibus (GEO), which were validated by the systematic review using the acronym PECO: Population (P): patients with AD; Exposure (E): expression of miRNAs; Comparison (C): healthy individuals; and Objective (O): miRNAs differentially expressed in CSF. Additionally, pathway enrichment analysis was performed to identify the main pathways involving at least four of the selected miRNAs. Four miRNAs that differentiate patients with and without AD were identified by machine learning combined with the systematic review and carried forward to the pathway analysis: miRNA-30a-3p, miRNA-193a-5p, miRNA-143-3p, and miRNA-145-5p. The epidermal growth factor, MAPK, TGF-beta, and ATM-dependent DNA damage response pathways were regulated by these miRNAs, but only the MAPK pathway showed higher relevance after a randomized pathway analysis. These findings have the potential to assist in the development of diagnostic tests for AD using miRNAs as biomarkers, and to improve understanding of the relationship between different pathophysiological mechanisms of AD.
Subject(s)
Alzheimer Disease, Data Mining, Machine Learning, MicroRNAs, Alzheimer Disease/cerebrospinal fluid, Alzheimer Disease/genetics, Alzheimer Disease/diagnosis, Humans, MicroRNAs/cerebrospinal fluid, MicroRNAs/genetics, Biomarkers/cerebrospinal fluid
ABSTRACT
Objective: To develop a convolutional neural network (CNN) model, trained with the baseline radiographic examinations of the Brazilian "Estudo Longitudinal de Saúde do Adulto Musculoesquelético" (ELSA-Brasil MSK, Longitudinal Study of Adult Health, Musculoskeletal), for the automated classification of knee osteoarthritis. Materials and Methods: This was a cross-sectional study covering all baseline examinations in the ELSA-Brasil MSK database (5,660 posteroanterior knee radiographs). The examinations were interpreted by a radiologist with specific training and previously published calibration. Results: The CNN presented an area under the receiver operating characteristic curve of 0.866 (95% CI: 0.842-0.882). The model can be tuned to achieve, not simultaneously, maximum values of 0.907 for accuracy, 0.938 for sensitivity, and 0.994 for specificity. Conclusion: The proposed CNN can be used as a screening tool, reducing the total number of examinations evaluated by the radiologists of the study, and/or as a second-reading tool, contributing to the reduction of possible interpretation errors.
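The statement that the maximum accuracy, sensitivity, and specificity are not reached simultaneously reflects the choice of decision threshold on the classifier's output score. The sketch below illustrates this trade-off on hypothetical scores; it is not the study's procedure, and the function name and data are assumptions made for illustration.

```python
def metrics_at(scores, labels, threshold):
    """Accuracy, sensitivity, specificity when 'score >= threshold' is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return (tp + tn) / len(labels), tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifier scores; label 1 = osteoarthritis, 0 = no osteoarthritis.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Evaluate every observed score as a candidate threshold.
table = {t: metrics_at(scores, labels, t) for t in sorted(set(scores))}
best_accuracy_threshold = max(table, key=lambda t: table[t][0])
```

On these made-up scores the threshold with the best accuracy is not the one with the best specificity, which is the sense in which each metric has its own operating point.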
ABSTRACT
BACKGROUND: Despite an extensive primary care network, Brazil suffered profoundly during the COVID-19 pandemic, experiencing the greatest health-system collapse in its history. It is therefore important to understand phenotypic risk factors for SARS-CoV-2 infection severity in the Brazilian population in order to provide novel insights into the pathogenesis of the disease. OBJECTIVE: This study proposes to predict the risk of COVID-19 death through machine learning, using blood biomarker data from patients admitted to two large hospitals in Brazil. METHODS: We retrospectively collected blood biomarker data in a 24-h time window from 6,979 patients with COVID-19 confirmed by positive RT-PCR who were admitted to two large hospitals in Brazil, of whom 291 (4.2%) died and 6,688 (95.8%) were discharged. We then conducted a large-scale exploration of risk models to predict the probability of COVID-19 severity, choosing the best-performing model in terms of average AUROC. To improve generalizability, five different testing scenarios were conducted for each model, including two external validations. RESULTS: We developed a machine learning-based panel composed of parameters extracted from the complete blood count (lymphocytes, MCV, platelets, and RDW), in addition to C-reactive protein, which yielded an average AUROC of 0.91 ± 0.01 for predicting death from COVID-19 confirmed by positive RT-PCR within a 24-h window. CONCLUSION: Our study suggests that routine laboratory variables could be useful for identifying COVID-19 patients at higher risk of death using machine learning. Further studies are needed to validate the model in other populations and contexts, since the natural history of SARS-CoV-2 infection and its consequences for the hematopoietic system and other organs is still quite recent.
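An average AUROC reported as mean ± spread over several testing scenarios could be computed along the following lines. This is a sketch under assumptions: the abstract does not say whether the ± value is a standard deviation, the scenario names are hypothetical, and the scores are invented for illustration.

```python
from statistics import mean, stdev


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical testing scenarios: (predicted risk, outcome), 1 = died, 0 = discharged.
scenarios = {
    "scenario_a": ([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]),
    "scenario_b": ([0.8, 0.3, 0.5, 0.1], [1, 1, 0, 0]),
}

aucs = [auroc(scores, labels) for scores, labels in scenarios.values()]
mean_auc = mean(aucs)
sd_auc = stdev(aucs)  # sample standard deviation, assumed meaning of the +/- value
```

Candidate models would then be compared on `mean_auc`, which corresponds to the selection criterion described in the METHODS above.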
Subject(s)
COVID-19, Brazil/epidemiology, COVID-19/diagnosis, COVID-19/epidemiology, Humans, Machine Learning, Pandemics, Retrospective Studies, SARS-CoV-2
ABSTRACT
BACKGROUND: An inexpensive and minimally invasive method for early identification of Alzheimer's disease (AD) pathogenesis is key to disease management and to the success of emerging treatments targeting the prodromal phases of the disease. OBJECTIVE: To develop a machine learning-based blood panel to predict the progression from mild cognitive impairment (MCI) to dementia due to AD within a four-year time-to-conversion horizon. METHODS: We created over one billion models to predict the probability of conversion from MCI to dementia due to AD and chose the best-performing one. We used Alzheimer's Disease Neuroimaging Initiative (ADNI) data from 379 individuals with MCI at the baseline visit, of whom 176 converted to AD dementia. RESULTS: We developed a machine learning-based panel composed of 12 plasma proteins (ApoB, Calcitonin, C-peptide, CRP, IGFBP-2, Interleukin-3, Interleukin-8, PARC, Serotransferrin, THP, TLSP 1-309, and TN-C), which yielded an AUC of 0.91, accuracy of 0.91, sensitivity of 0.84, and specificity of 0.98 for predicting the risk of MCI patients converting to dementia due to AD within a horizon of up to four years. CONCLUSION: The proposed machine learning model accurately predicted the risk of MCI patients converting to dementia due to AD within a horizon of up to four years, suggesting that it could be used as a minimally invasive tool for clinical decision support. Further studies are needed to clarify the possible pathophysiological links with the reported proteins.