Results 1 - 20 of 185
1.
Neuroimage ; 285: 120495, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38092156

ABSTRACT

This study presents a comprehensive examination of sex-related differences in resting-state electroencephalogram (EEG) data, leveraging two different types of machine learning models to predict an individual's sex. We utilized data from the Two Decades-Brainclinics Research Archive for Insights in Neurophysiology (TDBRAIN) EEG study, affirming that gender prediction can be attained with noteworthy accuracy. The best performing model achieved an accuracy of 85% and an ROC AUC of 89%, surpassing all prior benchmarks set using EEG data and rivaling the top-tier results derived from fMRI studies. A comparative analysis of LightGBM and Deep Convolutional Neural Network (DCNN) models revealed DCNN's superior performance, attributed to its ability to learn complex spatial-temporal patterns in the EEG data and handle large volumes of data effectively. Despite this, interpretability remained a challenge for the DCNN model. The LightGBM interpretability analysis revealed that the most important EEG features for accurate sex prediction were related to left fronto-central and parietal EEG connectivity. We also showed the role of both low (delta and theta) and high (beta and gamma) activity in accurate sex prediction. These results, however, have to be approached with caution, because they were obtained from a dataset comprised largely of participants with various mental health conditions, which limits the generalizability of the results and necessitates further validation in future studies. Overall, the study illuminates the potential of interpretable machine learning for sex prediction, alongside highlighting the importance of considering individual differences in predicting sex from brain activity.


Subjects
Brain , Neural Networks, Computer , Humans , Brain/physiology , Machine Learning , Magnetic Resonance Imaging , Electroencephalography/methods
2.
Am J Hum Genet ; 108(12): 2301-2318, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34762822

ABSTRACT

Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.


Subjects
Databases, Genetic , Gain of Function Mutation , Loss of Function Mutation , Proteins/genetics , Cloud Computing , Genetic Predisposition to Disease , Genome, Human , Germ-Line Mutation , Humans , Internet-Based Intervention , Machine Learning
3.
J Comput Chem ; 45(7): 368-376, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-37909259

ABSTRACT

The concept of chemical bonding is a crucial aspect of chemistry that aids in understanding the complexity and reactivity of molecules and materials. However, the interpretation of chemical bonds can be hindered by the choice of the theoretical approach and the specific method utilized. This study aims to investigate the effect of choosing different density functionals on the interpretation of bonding achieved through energy decomposition analysis (EDA). To achieve this goal, a data set was created, representing four bonding groups and various combinations of functionals and dispersion correction schemes. The calculations showed significant variation among the different functionals for the EDA terms, with the dispersion correction terms exhibiting the highest variability. More information was extracted by using machine learning in combination with dimensionality reduction on the data set. Results indicate that, despite the differences in the EDA terms obtained from different functionals, the functional has the least significant impact, suggesting minimal influence on the bonding interpretation.

4.
Cytometry A ; 105(1): 24-35, 2024 01.
Article in English | MEDLINE | ID: mdl-37776305

ABSTRACT

T-lineage acute lymphoblastic leukemia (T-ALL) accounts for about 15% of pediatric and about 25% of adult ALL cases. Minimal/measurable residual disease (MRD) assessed by flow cytometry (FCM) is an important prognostic indicator for risk stratification. To assess MRD, a limited number of antibodies directed against the most discriminative antigens must be selected. We propose a pipeline for evaluating the influence of different markers for cell population classification in FCM data. We use a linear support vector machine fitted to each sample individually to avoid issues with patient and laboratory variations. The best separating hyperplane direction as well as the influence of omitting specific markers is considered. Ninety-one bone marrow samples of 43 pediatric T-ALL patients from five reference laboratories were analyzed by FCM regarding marker importance for blast cell identification using combinations of eight different markers. For all laboratories, CD48 and CD99 were among the top three markers with the strongest contribution to the optimal hyperplane, measured by median separating hyperplane coefficient size for all samples per center and time point (diagnosis, Day 15, Day 33). Based on the available limited set tested (CD3, CD4, CD5, CD7, CD8, CD45, CD48, CD99), our findings prove that CD48 and CD99 are useful markers for MRD monitoring in T-ALL. The proposed pipeline can be applied for evaluation of other marker combinations in the future.


Subjects
Precursor Cell Lymphoblastic Leukemia-Lymphoma , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma , Adult , Child , Humans , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Flow Cytometry , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Neoplasm, Residual/diagnosis , T-Lymphocytes
5.
Allergy ; 79(8): 2173-2185, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38995241

ABSTRACT

BACKGROUND: There is evidence that global anthropogenic climate change may be impacting floral phenology and the temporal and spatial characteristics of aero-allergenic pollen. Given the extent of current and future climate uncertainty, there is a need to strengthen predictive pollen forecasts. METHODS: The study aims to use CatBoost (CB) and deep learning (DL) models for predicting the daily total pollen concentration up to 14 days in advance for 23 cities, covering all five continents. The model includes the projected environmental parameters, recent concentrations (1, 2 and 4 weeks), and the past environmental explanatory variables, and their future values. RESULTS: For the 7th forecast day, the best pollen forecasts include Mexico City (R²(DL_7) ≈ 0.7) and Santiago (R²(DL_7) ≈ 0.8), while the weakest pollen forecasts are made for Brisbane (R²(DL_7) ≈ 0.4) and Seoul (R²(DL_7) ≈ 0.1). The global order of the five most important environmental variables in determining the daily total pollen concentrations is, in decreasing order: the past daily total pollen concentration, future 2 m temperature, past 2 m temperature, past soil temperature in 28-100 cm depth, and past soil temperature in 0-7 cm depth. City-related clusters of the most similar distribution of feature importance values of the environmental variables only slightly change on consecutive forecast days for Caxias do Sul, Cape Town, Brisbane, and Mexico City, while they often change for Sydney, Santiago, and Busan. CONCLUSIONS: This new knowledge of the ecological relationships behind the most important variables for pollen forecast models, by cluster, city and forecast day, is important for developing airborne pollen forecasts and improving their accuracy.


Subjects
Allergens , Forecasting , Pollen , Pollen/immunology , Forecasting/methods , Humans , Climate Change , Models, Theoretical , Environmental Monitoring/methods
6.
BMC Gastroenterol ; 24(1): 267, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39148020

ABSTRACT

PURPOSE: Irritable bowel syndrome (IBS) is a diagnosis defined by gastrointestinal (GI) symptoms like abdominal pain and changes associated with defecation. The condition is classified as a disorder of the gut-brain interaction (DGBI), and patients with IBS commonly experience psychological distress. The present study focuses on this distress, defined from reports of fatigue, anxiety, depression, sleep disturbances, and performance on cognitive tests. The aim was to investigate the joint contribution of these features of psychological distress in predicting IBS versus healthy controls (HCs) and to disentangle clinically meaningful subgroups of IBS patients. METHODS: IBS patients (n = 49) and HCs (n = 28) completed the Chalder Fatigue Scale (CFQ), the Hamilton Anxiety and Depression Scale (HADS), and the Bergen Insomnia Scale (BIS), and performed tests of memory function and attention from the Repeatable Battery Assessing Neuropsychological Symptoms (RBANS). An initial exploratory data analysis was followed by supervised (Random Forest) and unsupervised (K-means) classification procedures. RESULTS: The exploratory data analysis showed that the group of IBS patients obtained significantly more severe scores than HCs on all included measures, with the strongest pairwise correlation between fatigue and a quality measure of sleep disturbances. The supervised classification model correctly predicted membership in the IBS group in 80% of the cases in a test set of unseen data. Two methods for calculating feature importance in the test set gave mental and physical fatigue and anxiety the strongest weights. An unsupervised procedure with K = 3 showed that one cluster contained 24% of the patients and all but two HCs. In the two other clusters, the IBS members were overall more impaired, with the following differences. One of the two clusters showed more severe cognitive problems and anxiety symptoms than the other, which experienced more severe problems related to the quality of sleep and fatigue. The three clusters did not differ on a severity measure of IBS or on age. CONCLUSION: The results showed that psychological distress is an integral component of IBS symptomatology. The study should inspire future longitudinal studies to further dissect clinical patterns of IBS to improve the assessment and personalized treatment for this and other patient groups defined as disorders of the gut-brain interaction. The project is registered at https://classic.clinicaltrials.gov/ct2/show/NCT04296552 (20/05/2019).


Subjects
Anxiety , Brain-Gut Axis , Depression , Fatigue , Irritable Bowel Syndrome , Machine Learning , Psychological Distress , Humans , Female , Male , Irritable Bowel Syndrome/psychology , Irritable Bowel Syndrome/physiopathology , Irritable Bowel Syndrome/complications , Adult , Anxiety/psychology , Anxiety/diagnosis , Middle Aged , Fatigue/psychology , Fatigue/diagnosis , Fatigue/physiopathology , Fatigue/etiology , Depression/psychology , Depression/diagnosis , Sleep Wake Disorders/psychology , Sleep Wake Disorders/physiopathology , Sleep Wake Disorders/diagnosis , Case-Control Studies , Neuropsychological Tests , Stress, Psychological/psychology , Stress, Psychological/diagnosis
7.
Environ Sci Technol ; 58(26): 11492-11503, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38904357

ABSTRACT

Soil organic carbon (SOC) plays a vital role in global carbon cycling and sequestration, underpinning the need for a comprehensive understanding of its distribution and controls. This study explores the importance of various covariates on SOC spatial distribution at both local (up to 1.25 km) and continental (USA) scales using a deep learning approach. Our findings highlight the significant role of terrain attributes in predicting SOC concentration distribution, with terrain contributing approximately one-third of the overall prediction at the local scale. At the continental scale, climate is only 1.2 times more important than terrain in predicting SOC distribution, whereas at the local scale, the structural pattern of terrain is 14 and 2 times more important than climate and vegetation, respectively. We underscore that terrain attributes, while being integral to the SOC distribution at all scales, are stronger predictors at the local scale with explicit spatial arrangement information. While this observational study does not assess causal mechanisms, our analysis nonetheless presents a nuanced perspective about SOC spatial distribution, which suggests disparate predictors of SOC at local and continental scales. The insights gained from this study have implications for improved SOC mapping, decision support tools, and land management strategies, aiding in the development of effective carbon sequestration initiatives and enhancing climate mitigation efforts.


Subjects
Carbon , Climate , Soil , Soil/chemistry , Carbon Cycle , Carbon Sequestration
8.
Article in English | MEDLINE | ID: mdl-38985398

ABSTRACT

This study presents a methodology for predicting the duration of surgical procedures using Machine Learning (ML). The methodology incorporates a new set of predictors emphasizing the significance of surgical team dynamics and composition, including experience, familiarity, social behavior, and gender diversity. By applying ML techniques to a comprehensive dataset of over 77,000 surgeries, we achieved a 24% improvement in the mean absolute error (MAE) over a model that mimics the current approach of the decision maker. Our results also underscore the critical role of surgeon experience and team composition dynamics in enhancing prediction accuracy. These advancements can lead to more efficient operational planning and resource allocation in hospitals, potentially reducing downtime in operating rooms and improving healthcare delivery.
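The reported 24% figure is a relative reduction in mean absolute error against a baseline model. A minimal sketch of that arithmetic follows; the function names and the surgery durations are hypothetical illustrations, not values or code from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_mae, model_mae):
    """Fraction by which the model reduces the baseline's MAE."""
    return (baseline_mae - model_mae) / baseline_mae


# Hypothetical surgery durations in minutes (illustrative values only).
actual = [60, 90, 120, 45]
baseline_pred = [80, 70, 150, 65]  # stands in for the planner's current estimates
model_pred = [70, 80, 130, 50]     # stands in for the ML model's estimates

baseline_mae = mean_absolute_error(actual, baseline_pred)
model_mae = mean_absolute_error(actual, model_pred)
improvement = relative_improvement(baseline_mae, model_mae)
print(f"baseline MAE={baseline_mae}, model MAE={model_mae}, improvement={improvement:.1%}")
```

An improvement of 0.24 under this definition would correspond to the 24% reported in the abstract, assuming the authors computed it as a relative reduction.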

9.
Scand J Public Health ; : 14034948241249519, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38860312

ABSTRACT

AIMS: We contribute to the methodological literature on the assessment of health inequalities by applying an algorithmic approach to evaluate the capabilities of socioeconomic variables in predicting the prevalence of non-communicable diseases in a Norwegian health survey. METHODS: We use data from the seventh survey of the population-based Tromsø Study (2015-2016), including 11,074 women and 10,009 men aged 40 years and above. We apply the random forest algorithm to predict four non-communicable disease outcomes (heart attack, cancer, diabetes and stroke) based on information on a number of social root causes and health behaviours. We evaluate our results using the classification error, the mean decrease in accuracy, and partial dependence statistics. RESULTS: Results suggest that education, household income and occupation to a variable extent contribute to predicting non-communicable disease outcomes. Prediction misclassification ranges between 25.1% and 35.4% depending on the non-communicable disease under study. Partial dependences reveal mostly expected health gradients, with some examples of complex functional relationships. Out-of-sample model validation shows that predictions translate to new data input. CONCLUSIONS: Algorithmic modelling can provide additional empirical detail and metrics for evaluating heterogeneous inequalities in morbidity. The extent to which education, income and occupation contribute to predicting binary non-communicable disease outcomes depends on both the non-communicable disease and the socioeconomic indicator. Partial dependences reveal that social gradients in non-communicable disease outcomes vary in shape between combinations of non-communicable disease outcome and socioeconomic status indicator. Misclassification rates highlight the extent of variation within socioeconomic groups, suggesting that future studies may improve predictive accuracy by exploring further subpopulation heterogeneity.
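Partial dependence, one of the evaluation tools named in this abstract, can be sketched in a few lines: fix one feature at a grid value for every row, average the model's predictions, and repeat across the grid. The example below is a sketch under stated assumptions; `toy_risk`, the two features, and all numbers are hypothetical stand-ins, not the study's random forest or the Tromsø data.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        # Replace the chosen feature in every row, keep the others as observed.
        preds = [
            model(row[:feature_index] + [value] + row[feature_index + 1:])
            for row in rows
        ]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_risk(row):
    """Hypothetical risk model: risk falls with education, rises with smoking."""
    education_years, smoker = row
    return max(0.0, 0.5 - 0.02 * education_years) + 0.2 * smoker


# Hypothetical rows: [education_years, smoker]
rows = [[10, 0], [12, 1], [16, 0], [8, 1]]
grid = [8, 12, 16]
curve = partial_dependence(toy_risk, rows, feature_index=0, grid=grid)
print(list(zip(grid, curve)))
```

In this toy setting the curve decreases as education increases, which is the kind of gradient the abstract describes as "mostly expected"; real curves from a fitted forest need not be monotone.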

10.
J Shoulder Elbow Surg ; 33(4): 815-822, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37625694

ABSTRACT

BACKGROUND: Postoperative rotator cuff retear after arthroscopic rotator cuff repair (ARCR) is still a major problem. Various risk factors such as age, gender, and tear size have been reported. Recently, magnetic resonance imaging-based stump classification was reported as an index of rotator cuff fragility. Although stump type 3 is reported to have a high retear rate, there are few reports on the risk of postoperative retear based on this classification. Machine learning (ML), an artificial intelligence technique, allows for more flexible predictive models than conventional statistical methods and has been applied to predict clinical outcomes. In this study, we used ML to predict postoperative retear risk after ARCR. METHODS: The retrospective case-control study included 353 patients who underwent surgical treatment for complete rotator cuff tear using the suture-bridge technique. Patients who initially presented with retears and traumatic tears were excluded. In study participants, after the initial tear repair, rotator cuff retears were diagnosed by magnetic resonance imaging; Sugaya classification types IV and V were defined as retears. Age, gender, stump classification, tear size, Goutallier classification, presence of diabetes, and hyperlipidemia were used as input parameters for ML to predict the risk of retear. Using Python's Scikit-learn as an ML library, five different AI models (logistic regression, random forest, AdaBoost, CatBoost, LightGBM) were trained on the existing data, and the prediction models were applied to the test dataset. The performance of these ML models was measured by the area under the receiver operating characteristic curve. Additionally, key features affecting retear were evaluated. RESULTS: The area under the receiver operating characteristic curve was 0.78 for logistic regression, 0.82 for random forest, 0.78 for AdaBoost, 0.83 for CatBoost, and 0.87 for LightGBM. LightGBM showed the highest score. The important factors for model prediction were age, stump classification, and tear size. CONCLUSIONS: The ML classifier model predicted retears after ARCR with high accuracy, and the AI model showed that the most important characteristics affecting retears were age and imaging findings, including stump classification. This model may be able to predict postoperative rotator cuff retears based on clinical features.
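The models in this abstract are ranked by area under the ROC curve. As a sketch of what that number measures, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores below are hypothetical and unrelated to the study's patients; the study itself used Scikit-learn rather than this hand-written function.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical retear labels (1 = retear) and predicted risks from two models.
labels = [1, 1, 1, 0, 0, 0]
scores_model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
scores_model_b = [0.9, 0.8, 0.7, 0.5, 0.3, 0.2]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
print(f"model A AUC={auc_a:.3f}, model B AUC={auc_b:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; a rank-based formulation is the usual choice for larger data.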


Subjects
Lacerations , Rotator Cuff Injuries , Humans , Rotator Cuff Injuries/diagnostic imaging , Rotator Cuff Injuries/surgery , Retrospective Studies , Case-Control Studies , Artificial Intelligence , Treatment Outcome , Rupture/surgery , Arthroscopy/methods , Magnetic Resonance Imaging , Risk Assessment , Machine Learning
11.
Behav Res Methods ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38453828

ABSTRACT

Conventionally, event-related potential (ERP) analysis relies on the researcher to identify the sensors and time points where an effect is expected. However, this approach is prone to bias and may limit the ability to detect unexpected effects or to investigate the full range of the electroencephalography (EEG) signal. Data-driven approaches circumvent this limitation; however, the multiple comparison problem and the statistical correction thereof affect both the sensitivity and specificity of the analysis. In this study, we present SHERPA - a novel approach based on explainable artificial intelligence (XAI) designed to provide the researcher with a straightforward and objective method to find relevant latency ranges and electrodes. SHERPA is comprised of a convolutional neural network (CNN) for classifying the conditions of the experiment and SHapley Additive exPlanations (SHAP) as a post hoc explainer to identify the important temporal and spatial features. A classical EEG face perception experiment is employed to validate the approach by comparing it to the established researcher- and data-driven approaches. Likewise, SHERPA identified an occipital cluster close to the temporal coordinates expected for the N170 effect. Most importantly, SHERPA allows quantifying the relevance of an ERP for a psychological mechanism by calculating an "importance score". Hence, SHERPA suggests the presence of a negative selection process at the early and later stages of processing. In conclusion, our new method not only offers an analysis approach suitable in situations with limited prior knowledge of the effect in question but also an increased sensitivity capable of distinguishing neural processes with high precision.

12.
Entropy (Basel) ; 26(7)2024 Jun 22.
Article in English | MEDLINE | ID: mdl-39056900

ABSTRACT

Rapid and precise detection of significant data streams within a network is crucial for efficient traffic management. This study leverages the TabNet deep learning architecture to identify large-scale flows, known as elephant flows, by analyzing the information in the 5-tuple fields of the initial packet header. The results demonstrate that employing a TabNet model can accurately identify elephant flows right at the start of the flow and makes it possible to reduce the number of flow table entries by up to 20 times while still effectively managing 80% of the network traffic through individual flow entries. The model was trained and tested on a comprehensive dataset from a campus network, demonstrating its robustness and potential applicability to varied network environments.

13.
Neuroimage ; 282: 120396, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37805019

ABSTRACT

Multivariate pattern analysis (MVPA) of Magnetoencephalography (MEG) and Electroencephalography (EEG) data is a valuable tool for understanding how the brain represents and discriminates between different stimuli. Identifying the spatial and temporal signatures of stimuli is typically a crucial output of these analyses. Such analyses are mainly performed using linear, pairwise, sliding window decoding models. These allow for relative ease of interpretation, e.g. by estimating a time-course of decoding accuracy, but have limited decoding performance. On the other hand, full epoch multiclass decoding models, commonly used for brain-computer interface (BCI) applications, can provide better decoding performance. However, interpretation methods for such models have been designed with a low number of classes in mind. In this paper, we propose an approach that combines a multiclass, full epoch decoding model with supervised dimensionality reduction, while still being able to reveal the contributions of spatiotemporal and spectral features using permutation feature importance. Crucially, we introduce a way of doing supervised dimensionality reduction of input features within a neural network optimised for the classification task, improving performance substantially. We demonstrate the approach on 3 different many-class task-MEG datasets using image presentations. Our results demonstrate that this approach consistently achieves higher accuracy than the peak accuracy of a sliding window decoder while estimating the relevant spatiotemporal features in the MEG signal.
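Permutation feature importance, which this abstract uses for interpretation, can be sketched generically: shuffle one input feature across examples, re-score the already trained model, and record how much accuracy drops. The code below is an illustrative sketch only; `toy_model` and the two-feature data are hypothetical and stand in for a trained decoder and MEG features, which in the paper are spatiotemporal and spectral rather than scalar columns.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the labels
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0], [0.3, 4.0], [0.7, 9.0]]
labels = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Because the toy classifier ignores feature 1, shuffling it never changes a prediction and its importance is exactly zero, while feature 0 shows a positive drop. The model is never retrained, which is what makes the method practical for large neural decoders.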


Subjects
Brain-Computer Interfaces , Magnetoencephalography , Humans , Magnetoencephalography/methods , Brain , Electroencephalography/methods , Brain Mapping/methods , Neural Networks, Computer , Algorithms
14.
Hum Brain Mapp ; 44(17): 6105-6119, 2023 12 01.
Article in English | MEDLINE | ID: mdl-37753636

ABSTRACT

Decoding brain imaging data is gaining popularity, with applications in brain-computer interfaces and the study of neural representations. Decoding is typically subject-specific and does not generalise well over subjects, due to high amounts of between-subject variability. Techniques that overcome this will not only provide richer neuroscientific insights but also make it possible for group-level models to outperform subject-specific models. Here, we propose a method that uses subject embedding, analogous to word embedding in natural language processing, to learn and exploit the structure in between-subject variability as part of a decoding model, our adaptation of the WaveNet architecture for classification. We apply this to magnetoencephalography data, where 15 subjects viewed 118 different images, with 30 examples per image, to classify images using the entire 1 s window following image presentation. We show that the combination of deep learning and subject embedding is crucial to closing the performance gap between subject- and group-level decoding models. Importantly, group models outperform subject models on low-accuracy subjects (although slightly impair high-accuracy subjects) and can be helpful for initialising subject models. While we have not generally found group-level models to perform better than subject-level models, the performance of group modelling is expected to be even higher with bigger datasets. In order to provide physiological interpretation at the group level, we make use of permutation feature importance. This provides insights into the spatiotemporal and spectral information encoded in the models. All code is available on GitHub (https://github.com/ricsinaruto/MEG-group-decode).


Subjects
Brain-Computer Interfaces , Deep Learning , Humans , Brain/diagnostic imaging , Brain/physiology , Magnetoencephalography/methods , Brain Mapping/methods
15.
Rev Cardiovasc Med ; 24(11): 330, 2023 Nov.
Article in English | MEDLINE | ID: mdl-39076440

ABSTRACT

Background: Cardiovascular diseases (CVD) remain the predominant global cause of mortality, with both low and high temperatures increasing CVD-related mortalities. Climate change impacts human health directly through temperature fluctuations and indirectly via factors like disease vectors. Elevated and reduced temperatures have been linked to increases in CVD-related hospitalizations and mortality, with various studies worldwide confirming the significant health implications of temperature variations and air pollution on cardiovascular outcomes. Methods: A database of daily Emergency Room admissions at the Giovanni XIII Polyclinic in Bari (Southern Italy) was developed, spanning from 2013 to 2019, including weather and air quality data. A Random Forest (RF) supervised machine learning model was used to simulate the trend of hospital admissions for CVD. The Seasonal and Trend decomposition using Loess (STL) decomposition model separated the trend component, while cross-validation techniques were employed to prevent overfitting. Model performance was assessed using specific metrics and error analysis. Additionally, the SHapley Additive exPlanations (SHAP) method, a feature importance technique within the eXplainable Artificial Intelligence (XAI) framework, was used to identify the feature importance. Results: An R² of 0.97 and a Mean Absolute Error of 0.36 admissions were achieved by the model. Atmospheric pressure, minimum temperature, and carbon monoxide were found to collectively contribute about 74% to the model's predictive power, with atmospheric pressure being the dominant factor at 37%. Conclusions: This research underscores the significant influence of weather-climate variables on cardiovascular diseases. The identified key climate factors provide a practical framework for policymakers and healthcare professionals to mitigate the adverse effects of climate change on CVD and devise preventive strategies.
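The coefficient of determination reported in the Results can be illustrated with a short sketch: it is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The admission counts below are hypothetical and are not taken from the Bari database.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1 - ss_res / ss_tot


# Hypothetical daily admission trend values and model predictions.
actual = [10, 12, 14, 16]
predicted = [11, 12, 13, 16]
score = r_squared(actual, predicted)
print(f"R^2 = {score:.2f}")
```

A value close to 1, such as the 0.97 in the abstract, means the predictions leave little of the variance in the target unexplained; it says nothing by itself about whether the target was the raw series or a smoothed trend component.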

16.
BMC Med Res Methodol ; 23(1): 144, 2023 06 19.
Article in English | MEDLINE | ID: mdl-37337173

ABSTRACT

BACKGROUND: Machine learning tools such as random forests provide important opportunities for modeling large, complex modern data generated in medicine. Unfortunately, when it comes to understanding why machine learning models are predictive, applied research continues to rely on 'out of bag' (OOB) variable importance metrics (VIMPs) that are known to have considerable shortcomings within the statistics community. After explaining the limitations of OOB VIMPs - including bias towards correlated features and limited interpretability - we describe a modern approach called 'knockoff VIMPs' and explain its advantages. METHODS: We first evaluate current VIMP practices through an in-depth literature review of 50 recent random forest manuscripts. Next, we recommend organized and interpretable strategies for analysis with knockoff VIMPs, including computing them for groups of features and considering multiple model performance metrics. To demonstrate methods, we develop a random forest to predict 5-year incident stroke in the Sleep Heart Health Study and compare results based on OOB and knockoff VIMPs. RESULTS: Nearly all papers in the literature review contained substantial limitations in their use of VIMPs. In our demonstration, using OOB VIMPs for individual variables suggested two highly correlated lung function variables (forced expiratory volume, forced vital capacity) as the best predictors of incident stroke, followed by age and height. Using an organized analytic approach that considered knockoff VIMPs of both groups of features and individual features, the largest contributions to model sensitivity were medications (especially cardiovascular) and measured medical risk factors, while the largest contributions to model specificity were age, diastolic blood pressure, self-reported medical risk factors, polysomnography features, and pack-years of smoking. Thus, we reach very different conclusions about stroke risk factors using OOB VIMPs versus knockoff VIMPs. CONCLUSIONS: The near-ubiquitous reliance on OOB VIMPs may provide misleading results for researchers who use such methods to guide their research. Given the rapid pace of scientific inquiry using machine learning, it is essential to bring modern knockoff VIMPs that are interpretable and unbiased into widespread applied practice to steer researchers using random forest machine learning toward more meaningful results.


Subjects
Random Forest , Stroke , Humans , Benchmarking , Machine Learning , Stroke/diagnosis , Stroke/epidemiology , Sleep
17.
J Biomed Inform ; 147: 104535, 2023 11.
Article in English | MEDLINE | ID: mdl-37926393

ABSTRACT

INTRODUCTION: Depression is a global concern, with a significant number of people affected worldwide, particularly in low- and middle-income countries. The rising prevalence of depression emphasizes the importance of early detection and understanding the origins of such conditions. OBJECTIVE: This paper proposes a framework for detecting depression using a hybrid visualization approach that combines local and global interpretation. This approach aims to assist in model adaptation, provide insights into patient characteristics, and evaluate prediction model suitability in a different environment. METHODS: This study uses the R programming language with the Caret, ggplot2, Plotly, and Dalex libraries for model training, visualization, and interpretation. Data from the NHANES repository were used for secondary data analysis. The NHANES repository is a comprehensive source for examining the health and nutrition of individuals in the United States, and covers demographic, dietary, medication use, lifestyle choices, reproductive and mental health data. Penalized logistic regression models were built using NHANES 2015-2018 data, while NHANES 2019-March 2020 data were used for evaluation at the global-specific and local level interpretation. RESULTS: The prediction model that supports this framework achieved an average AUC score of 0.748 (95% CI: 0.743-0.752), with minimal variability in sensitivity and specificity. CONCLUSION: The built-in prediction model highlights chest pain, the ratio of family income to poverty, and smoking status as crucial features for predicting depressive states in both the original and local environments.


Subjects
Diet , Poverty , Humans , United States , Nutrition Surveys , Logistic Models
18.
Biomed Eng Online ; 22(1): 74, 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37479991

ABSTRACT

BACKGROUND: Colorectal cancer is one of the most serious malignant tumors, and lymph node metastasis (LNM) from colorectal cancer is a major factor in patient management and prognosis. Accurate image-based detection of LNM is an important task that helps clinicians diagnose cancer. Recently, the U-Net architecture based on convolutional neural networks (CNNs) has been widely used to segment images for more precise cancer diagnosis. However, accurately segmenting important regions with high diagnostic value remains a great challenge because CNNs and the codec structure have insufficient capability to aggregate detailed and non-local contextual information. In this work, we propose a high-performance, low-computation solution. METHODS: Inspired by the working principle of the fovea in visual neuroscience, we propose a novel U-Net-based network framework for cancer segmentation, named Fovea-UNet, that adaptively adjusts the resolution according to the importance of the information and selectively focuses on the region most relevant to colorectal LNM. Specifically, we design an effective, adaptively optimized pooling operation called Fovea Pooling (FP), which dynamically aggregates detailed and non-local contextual information according to pixel-level feature importance. In addition, an improved lightweight backbone network based on GhostNet is adopted to reduce the computational cost caused by FP. RESULTS: Experimental results show that the proposed framework achieves higher performance than other state-of-the-art segmentation networks, with 79.38% IoU, 88.51% DSC, 92.82% sensitivity and 84.57% precision on the LNM dataset, and the parameter size is reduced to 23.23 MB. CONCLUSIONS: The proposed framework can provide a valid tool for cancer diagnosis, especially for LNM of colorectal cancer.
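
The four segmentation metrics reported above (IoU, DSC, sensitivity, precision) can all be derived from pixel-level true positives, false positives and false negatives. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `segmentation_metrics` and the flat 0/1 mask representation are assumptions made for illustration.

```python
def segmentation_metrics(pred, truth):
    """Compute IoU, DSC, sensitivity and precision for two flat binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 pixel labels
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Pixel-level confusion counts for the foreground class.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a mask is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),            # intersection over union
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),    # Dice similarity coefficient
        "sensitivity": ratio(tp, tp + fn),         # recall on foreground pixels
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy masks: one true positive, one false positive, one false negative.
    print(segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

For a single pair of masks the two overlap scores are linked by DSC = 2·IoU / (1 + IoU), so DSC is never lower than IoU.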


Subjects
Colorectal Neoplasms , Deep Learning , Humans , Colorectal Neoplasms/diagnostic imaging , Lymphatic Metastasis , Neural Networks, Computer
19.
J Biopharm Stat ; 33(3): 257-271, 2023 May 4.
Article in English | MEDLINE | ID: mdl-36397284

ABSTRACT

Lung cancer recurrence appears to be a leading cause of death and of reduced life expectancy. Proper assessment of the probability of recurrence in early-stage lung cancer is necessary to guide treatment. We therefore employed machine-learning methods to forecast post-operative recurrence risk using 174 lung cancer patient records. Six classification algorithms (logistic regression, SVM, decision tree, random forest, XGBoost and LightGBM) were used to predict cancer recurrence. The patient samples were divided into training and test groups with a split ratio of 3:1 for model generation, and accuracy was validated using the k-fold cross-validation method. Notably, the logistic regression model outperformed all other models on both the training set (Accuracy = 0.82) and the test set (Accuracy = 0.79) under k-fold validation. Further, the optimal features (n = 7) identified using the recursive feature elimination (RFE) method helped improve model precision. The key risk factors associated with recurrence were identified using three feature selection methods. Importantly, our research showed that age is an important prognostic factor to consider in recurrence prediction. Close attention to the identified risk factors, combined with predictive models, may help physicians reduce the recurrence rate in patients with lung cancer.
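
The data partitioning described above (a 3:1 training/test split followed by k-fold cross-validation) can be sketched as follows. This is an illustrative outline only, not the authors' pipeline: the abstract does not state the value of k or how the split was randomized, so `k=5`, the fixed seed, the rounding-down of the test size and the function names `split_3_to_1` and `k_fold` are all assumptions.

```python
import random


def split_3_to_1(n_samples, seed=0):
    """Shuffle sample indices and return (train, test) in a 3:1 ratio.

    The test set takes one quarter of the samples, rounded down
    (an assumption; the source does not say how 174 records were rounded).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = n_samples // 4
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    # Deal the indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    train, test = split_3_to_1(174)  # 174 records, as in the abstract
    print(len(train), len(test))
    for fold_train, fold_val in k_fold(train, k=5):  # k=5 is an assumed value
        print(len(fold_train), len(fold_val))
```

Each training-set record appears in exactly one validation fold, so averaging the per-fold accuracy uses every training record once for validation.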


Subjects
Lung Neoplasms , Neoplasm Recurrence, Local , Humans , Neoplasm Recurrence, Local/epidemiology , Lung Neoplasms/diagnosis , Lung Neoplasms/epidemiology , Machine Learning , Forecasting , Algorithms
20.
BMC Med Inform Decis Mak ; 23(1): 58, 2023 Apr 6.
Article in English | MEDLINE | ID: mdl-37024858

ABSTRACT

OBJECTIVE: We aimed to develop a robust framework to model the complex association between clinical features and traumatic brain injury (TBI) risk in children under age two, and to identify significant features from which to derive clinical decision rules for triage decisions. METHODS: In this retrospective study, four frequently used machine learning models, i.e., support vector machine (SVM), random forest (RF), deep neural network (DNN), and XGBoost (XGB), were compared under the permutation feature importance test (PermFIT) framework to identify significant clinical features, from 24 input features, associated with TBI risk in children under age two, using the publicly available data set from the Pediatric Emergency Care Applied Research Network (PECARN) study. Prediction accuracy was determined by comparing the predicted TBI status with the computed tomography (CT) scan results, since CT is the gold standard for diagnosing TBI. RESULTS: At a significance level of [Formula: see text], DNN, RF, XGB, and SVM identified 9, 1, 2, and 4 significant features, respectively. In a comparison of accuracy (Accuracy), the area under the curve (AUC), and the precision-recall area under the curve (PR-AUC), the permutation feature importance test for the DNN model was the most powerful framework for identifying significant features and outperformed the other methods, i.e., RF, XGB, and SVM, with Accuracy, AUC, and PR-AUC of 0.915, 0.794, and 0.974, respectively. CONCLUSION: These results indicate that the PermFIT-DNN framework robustly identifies significant clinical features associated with TBI status and improves prediction performance. The findings could be used to inform the development of clinical decision tools designed to inform triage decisions.
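
The general idea behind permutation-based feature importance can be sketched as below: shuffle one feature column at a time and measure how much the model's accuracy drops. This is the plain permutation-importance heuristic, not the PermFIT test itself, which additionally provides significance testing that this sketch does not reproduce. The function names, the `predict` callable interface and the toy data are assumptions made for illustration.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop per feature after shuffling that feature's column.

    `predict` is any callable mapping one feature row (a list) to a label;
    `X` is a list of such rows. Both are hypothetical interfaces.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: the label depends only on feature 0; feature 1 is irrelevant.
    X = [[i % 2, (i * 7) % 3] for i in range(40)]
    y = [row[0] for row in X]
    print(permutation_importance(lambda row: row[0], X, y))
```

A feature the model ignores yields an importance of exactly zero, while shuffling a feature the model relies on lowers accuracy.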


Subjects
Brain Injuries, Traumatic , Emergency Medical Services , Child , Humans , Infant , Retrospective Studies , Brain Injuries, Traumatic/diagnostic imaging , Neural Networks, Computer , Clinical Decision Rules