ABSTRACT
At the plasma membrane, in response to biotic and abiotic cues, specific ligands initiate the formation of receptor kinase heterodimers, which regulate the activities of plasma membrane proteins and initiate signaling cascades to the nucleus. In this study, we used affinity enrichment mass spectrometry (AE-MS) to investigate the stimulus-dependent interactomes of LRR receptor kinases in response to their respective ligands, with an emphasis on structural influences and potential cross-talk events at the plasma membrane. BRI1 and SIRK1 were chosen as receptor kinases with distinct coreceptor preferences. By characterizing the interactomes of domain-swap chimeras with a gradient boosting algorithm trained on the SIRK1 and BRI1 interactomes, we attributed the contributions of the extracellular, transmembrane, juxtamembrane, and kinase domains of the respective ligand-binding receptors to their interactions with coreceptors and substrates. Our results revealed the juxtamembrane domain as the major structural element defining specific substrate recruitment for BRI1, and the extracellular domain as the major element for SIRK1. Furthermore, the learning algorithm enabled us to predict the phenotypic outcomes of chimeric receptors with different domain combinations, and these predictions were verified by dedicated experiments. As a result, our work reveals a tightly controlled balance of signaling cascade activation that depends on the domains of the ligand-binding receptors and on the internal ligand status of the plant. Moreover, our study demonstrates the robust utility of machine learning classification as a quantitative metric for studying dynamic interactomes, dissecting the contributions of specific domains, and predicting their phenotypic outcomes.
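The chimera analysis relies on a gradient boosting classifier trained on BRI1 and SIRK1 interactomes and then applied to chimeric receptors. As a hedged illustration of that general idea only (not the authors' pipeline or feature set), the pure-Python sketch below boosts one-split regression stumps on the log-loss gradient. The two "interactor abundance" features, the labels (0 = BRI1-like, 1 = SIRK1-like) and the chimera profile are all hypothetical, and each leaf value is the mean pseudo-residual rather than the Newton step used by common libraries.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) minimizing the squared error of r."""
    best, best_sse = None, math.inf
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lm, rm), sse
    return best

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.5):
    """Binary gradient boosting on the log-loss; y must contain both classes (0 and 1)."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # initial prediction: log-odds of class 1
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of the log-loss with respect to F is y - sigmoid(F).
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        j, t, lm, rm = stump
        step = (learning_rate * lm, learning_rate * rm)  # shrunken leaf values
        stumps.append((j, t) + step)
        F = [fi + (step[0] if row[j] <= t else step[1]) for fi, row in zip(F, X)]
    return f0, stumps

def predict_proba(model, row):
    """Probability of class 1 (here: SIRK1-like) for one feature vector."""
    f0, stumps = model
    return sigmoid(f0 + sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps))

# Hypothetical profiles: [interactor A abundance, interactor B abundance].
X = [[0.1, 0.5], [0.2, 0.9], [0.3, 0.1], [0.9, 0.6], [0.8, 0.2], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gradient_boosting(X, y)
chimera = [0.75, 0.4]  # hypothetical chimera interactome
print(round(predict_proba(model, chimera), 3))
```

A probability near 1 would be read as a SIRK1-like interactome and near 0 as BRI1-like. For real data, a library implementation such as scikit-learn's `GradientBoostingClassifier` would be the usual choice.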
ABSTRACT
Viruses are the most ubiquitous and diverse entities in the biome. Given the rapid growth in the number of newly identified viruses, there is an urgent need for accurate and comprehensive virus classification, particularly of novel viruses. Here, we present PhaGCN2, which can rapidly classify viral sequences taxonomically at the family level and supports visualization of the associations among all families. We evaluate the performance of PhaGCN2 and compare it with state-of-the-art virus classification tools, such as vConTACT2, CAT and VPF-Class, using widely accepted metrics. The results show that PhaGCN2 substantially improves the precision and recall of virus classification, increases the number of classifiable virus sequences in the Global Ocean Virome dataset (v2.0) fourfold, and classifies more than 90% of the Gut Phage Database. PhaGCN2 makes it possible to conduct high-throughput, automatic expansion of the database of the International Committee on Taxonomy of Viruses. The source code is freely available at https://github.com/KennthShang/PhaGCN2.0.
Subjects
Viruses , Viruses/genetics , Genome, Viral , Databases, Factual , Software , Genomics
ABSTRACT
BACKGROUND: In contrast to traditional supervised machine learning approaches that use fully labeled samples, positive-unlabeled (PU) learning techniques aim to classify "unlabeled" samples based on a smaller proportion of known positive examples. This more challenging modeling goal reflects many real-world scenarios in which negative examples are not available, which poses direct challenges to defining prediction accuracy and robustness. While several studies have evaluated predictions learned from only definitive positive examples, few have investigated whether correct classification of a high proportion of known positive (KP) samples from among unlabeled samples can act as a surrogate indicator of model quality. RESULTS: In this study, we report a novel methodology combining multiple established PU learning-based strategies with permutation testing to evaluate the potential of KP samples to accurately classify unlabeled samples without using "ground truth" positive and negative labels for validation. Multivariate synthetic and real-world high-dimensional benchmark datasets were used to demonstrate that the proposed pipeline can provide evidence of model robustness across varied ground-truth class compositions of the unlabeled set and with different proportions of KP examples. Comparisons between model performance with actual and permuted labels could be used to distinguish reliable from unreliable models. CONCLUSIONS: As in fully supervised machine learning, permutation testing offers a means to set a baseline "no-information rate" benchmark in semi-supervised PU learning inference tasks, providing a standard against which model performance can be compared.
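The core idea of comparing a score obtained with the actual KP labels against scores obtained with permuted labels can be sketched in a few lines. The scoring rule below is a deliberately simple, hypothetical stand-in for the PU learners used in the study (a nearest-centroid check of how many known positives are recovered), and the 1-D data are synthetic; only the observed-versus-permuted logic reflects the abstract.

```python
import random
from statistics import mean

def kp_recovery(features, labels):
    """Toy score: fraction of known positives (label 1) lying closer to the
    known-positive centroid than to the unlabeled (label 0) centroid."""
    pos = [x for x, lab in zip(features, labels) if lab == 1]
    unl = [x for x, lab in zip(features, labels) if lab == 0]
    c_pos, c_unl = mean(pos), mean(unl)
    return sum(abs(x - c_pos) < abs(x - c_unl) for x in pos) / len(pos)

def permutation_test(features, labels, score_fn, n_perm=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling the labels."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same number of KP labels, randomly reassigned
        null_scores.append(score_fn(features, shuffled))
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_perm + 1)
    return observed, mean(null_scores), p_value

# Hypothetical data: 6 known positives near 5; the unlabeled set hides 4 more positives.
known_pos = [4.8, 5.0, 5.1, 5.2, 4.9, 5.3]
unlabeled = [5.05, 4.95, 5.15, 4.85] + [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.05, -0.05,
                                       0.15, -0.15, 0.25, -0.25, 0.12, -0.12, 0.08, -0.08]
features = known_pos + unlabeled
labels = [1] * len(known_pos) + [0] * len(unlabeled)
observed, null_mean, p_value = permutation_test(features, labels, kp_recovery)
print(observed, round(null_mean, 3), round(p_value, 3))
```

When the known positives carry real signal, the observed score sits well above the permutation mean and the empirical p-value is small; a model whose score is indistinguishable from its permuted baseline is the unreliable case.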
Subjects
Machine Learning , Supervised Machine Learning , Humans , Computational Biology/methods , Algorithms
ABSTRACT
BACKGROUND: The accurate prediction of genomic breeding values is central to genomic selection in both plant and animal breeding studies. Genomic prediction involves the use of thousands of molecular markers spanning the entire genome and therefore requires methods able to efficiently handle high-dimensional data. Not surprisingly, machine learning methods are increasingly advocated for and used in genomic prediction studies. These methods encompass different groups of supervised and unsupervised learning methods. Although several studies have compared the predictive performance of individual methods, studies comparing the predictive performance of different groups of methods are rare. However, such studies are crucial for (i) identifying groups of methods with superior genomic predictive performance and (ii) assessing the merits and demerits of such groups of methods relative to each other and to the established classical methods. Here, we comparatively evaluate the genomic predictive performance and informally assess the computational cost of several groups of supervised machine learning methods, specifically regularized regression methods and deep, ensemble and instance-based learning algorithms, using one simulated animal breeding dataset and three empirical maize breeding datasets obtained from a commercial breeding program. RESULTS: Our results show that the relative predictive performance and computational expense of the groups of machine learning methods depend on both the data and the target traits, and that for classical regularized methods, increasing model complexity can incur huge computational costs without necessarily improving predictive accuracy. Thus, despite their greater complexity and computational burden, neither the adaptive nor the group regularized methods clearly improved upon the results of their simple regularized counterparts.
This rules out the selection of a single machine learning method for routine use in genomic prediction. The results also show that, because of their competitive predictive performance, computational efficiency, simplicity and, therefore, relatively few tuning parameters, the classical linear mixed model and regularized regression methods are likely to remain strong contenders for genomic prediction. CONCLUSIONS: The dependence of predictive performance and computational burden on target datasets and traits calls for increasing investment in enhancing the computational efficiency of machine learning algorithms and computing resources.
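The classical regularized regression baseline can be illustrated with a short coordinate-descent solver for the elastic net, which contains ridge regression (alpha = 0) and the lasso (alpha = 1) as special cases. This is a textbook sketch under stated assumptions (centered marker columns and phenotype, no intercept, a fixed number of sweeps), not the implementation used in the study; the tiny marker matrix is hypothetical so that the result can be checked by hand.

```python
def soft_threshold(a, t):
    """S(a, t) = sign(a) * max(|a| - t, 0), the lasso shrinkage operator."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).
    Assumes centered marker columns and a centered phenotype (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove all fitted effects except marker j's.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p)) + X[i][j] * b[j] for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            denom = z + lam * (1 - alpha)
            b[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
    return b

# Hypothetical centered marker matrix (4 individuals x 2 markers); y = 2 * marker 1.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.0, -2.0, 0.0]
print(elastic_net(X, y, lam=0.0, alpha=0.0))  # least squares: [2.0, 0.0]
print(elastic_net(X, y, lam=0.5, alpha=0.0))  # ridge: [1.0, 0.0]
print(elastic_net(X, y, lam=1.0, alpha=1.0))  # lasso: [0.0, 0.0]
```

With lam = 0.5 the ridge estimate is shrunk from 2.0 to 1.0, and with lam = 1 the lasso sets it exactly to zero; in practice these tuning parameters would be chosen by cross-validation.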
Subjects
Deep Learning , Animals , Plant Breeding , Genome , Genomics/methods , Machine Learning
ABSTRACT
BACKGROUND: The central problem in the early diagnosis of sporadic cancer is understanding an individual's risk of developing the disease. In response to this need, global scientific research is focusing on developing predictive models based on non-invasive screening tests. A tentative solution may be a blood-based cancer screening test able to detect the cellular conditions that trigger the latency between subclinical and clinical onset, at the stage when the cellular disorder, i.e. atypical epithelial hyperplasia, is still a subclinical proliferative dysregulation. METHODS: A well-established procedure for identifying proliferating circulating tumor cells was used to measure the proliferation of circulating non-haematological cells, which may suggest tumor pathology. The collected data were then processed by a supervised machine learning model to make the prediction. RESULTS: The developed test, combining circulating non-haematological cell proliferation data with artificial intelligence, showed 98.8% accuracy, 100% sensitivity, and 95% specificity. CONCLUSION: This proof-of-concept study demonstrates that integrating innovative non-invasive methods and predictive models can be decisive in assessing an individual's health status and can achieve cutting-edge results in cancer prevention and management.
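Accuracy, sensitivity and specificity all come from the same 2×2 confusion matrix. The sketch below uses hypothetical counts (chosen for illustration, not taken from the study) to show how each figure is computed.

```python
def screening_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of diseased cases flagged by the test
    specificity = tn / (tn + fp)  # share of healthy cases correctly cleared
    return accuracy, sensitivity, specificity

# Hypothetical counts: 60 positives, 20 negatives, one false positive.
acc, sens, spec = screening_metrics(tp=60, fp=1, tn=19, fn=0)
print(f"accuracy={acc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
# accuracy=0.9875 sensitivity=1.00 specificity=0.95
```

Accuracy is the prevalence-weighted average of sensitivity and specificity (0.75 × 1.00 + 0.25 × 0.95 = 0.9875 with these counts), so it depends on the case mix of the test set, which is why all three figures are usually reported together.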
Subjects
Artificial Intelligence , Neoplasms , Humans
ABSTRACT
BACKGROUND: Coastal areas are subject to various anthropogenic and natural influences. In this study, we investigated and compared the characteristics of two coastal regions, Andhra Pradesh (AP) and Goa (GA), focusing on pollution, anthropogenic activities, and recreational impacts. We explored three main factors underlying the differences between these coastlines: the shallower depth and lower salinity of the Bay of Bengal; upwelling due to the thermocline in the Arabian Sea; and high tides that can cause strong currents that transport pollutants and debris. RESULTS: Microbial diversity in GA was significantly higher than in AP, which might be attributed to differences in temperature, soil type, and vegetation cover. 16S rRNA amplicon sequencing and bioinformatics analysis indicated the presence of diverse microbial phyla, including the candidate phyla radiation (CPR). Statistical analysis, random forest regression, and supervised machine learning classification models accurately confirmed these microbiome diversity patterns. Furthermore, we identified 450 cultures of heterotrophic, biotechnologically important bacteria. Some strains were identified as novel taxa based on 16S rRNA gene sequencing and show promising potential for further study. CONCLUSION: Our study provides valuable insights into the microbial diversity and pollution levels of coastal areas in AP and GA. These findings contribute to a better understanding of the impact of anthropogenic activities and climate variations on the biology and biodiversity of coastal ecosystems.
Subjects
Bacteria , Bays , Microbiota , Phylogeny , RNA, Ribosomal, 16S , Seawater , Supervised Machine Learning , RNA, Ribosomal, 16S/genetics , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Microbiota/genetics , Seawater/microbiology , India , Bays/microbiology , Biodiversity , DNA, Bacterial/genetics , Salinity , Sequence Analysis, DNA/methods
ABSTRACT
Posttraumatic stress disorder (PTSD) is a chronic psychiatric condition that follows exposure to a traumatic stressor. Although previous in vivo proton magnetic resonance spectroscopy (1H MRS) research conducted at 4 T or lower has identified alterations in glutamate metabolism associated with PTSD predisposition and/or progression, no prior investigations have been conducted at higher field strength. In addition, earlier studies have not extensively addressed the impact of psychiatric comorbidities such as major depressive disorder (MDD) on PTSD-associated 1H-MRS-visible brain metabolite abnormalities. Here we employ 7 T 1H MRS to examine concentrations of glutamate, glutamine, GABA, and glutathione in the medial prefrontal cortex (mPFC) of PTSD patients with MDD (PTSD+MDD+; N = 6) or without MDD (PTSD+MDD-; N = 5), as well as trauma-unmatched controls without PTSD but with MDD (PTSD-MDD+; N = 9) or without MDD (PTSD-MDD-; N = 18). Participants with PTSD showed decreased GABA:glutamine ratios relative to healthy PTSD-MDD- controls but no single-metabolite abnormalities. When comorbid MDD was considered, however, MDD but not PTSD diagnosis was significantly associated with increased mPFC glutamine concentration and decreased glutamate:glutamine ratio. In addition, participants with PTSD and/or MDD collectively showed decreased glutathione relative to healthy PTSD-MDD- controls. Despite limited findings in single metabolites, patterns of abnormality in prefrontal metabolite concentrations among individuals with PTSD and/or MDD enabled supervised classification to separate them from healthy controls with sensitivity and specificity of 80% or more, with glutathione, glutamine, and myo-inositol consistently among the most informative metabolites for this classification. Our findings indicate that MDD can be an important factor in the mPFC glutamate metabolism abnormalities observed using 1H MRS in cohorts with PTSD.
Subjects
Depressive Disorder, Major , Neurotransmitter Agents , Prefrontal Cortex , Stress Disorders, Post-Traumatic , Humans , Prefrontal Cortex/metabolism , Depressive Disorder, Major/metabolism , Stress Disorders, Post-Traumatic/metabolism , Male , Female , Adult , Neurotransmitter Agents/metabolism , Comorbidity , Middle Aged , Glutamine/metabolism , Glutamic Acid/metabolism , Proton Magnetic Resonance Spectroscopy , gamma-Aminobutyric Acid/metabolism
ABSTRACT
BACKGROUND: Single-center MRI radiomics models are sensitive to data heterogeneity, limiting the diagnostic capabilities of current prostate cancer (PCa) radiomics models. PURPOSE: To study the impact of image resampling on the diagnostic performance of radiomics in a multicenter prostate MRI setting. STUDY TYPE: Retrospective. POPULATION: Nine hundred thirty patients (nine centers, two vendors) with 737 eligible PCa lesions, randomly split into training (70%, N = 500), validation (10%, N = 89), and held-out test (20%, N = 148) sets. FIELD STRENGTH/SEQUENCE: 1.5T and 3T scanners/T2-weighted imaging (T2W), diffusion-weighted imaging (DWI), and apparent diffusion coefficient maps. ASSESSMENT: A total of 48 normalized radiomics datasets were created using various resampling methods, including different target resolutions (T2W: 0.35, 0.5, and 0.8 mm; DWI: 1.37, 2, and 2.5 mm), dimensionalities (2D/3D), and interpolation techniques (nearest neighbor, linear, B-spline, and Blackman windowed-sinc). Each dataset was used to train a radiomics model to detect clinically relevant PCa (International Society of Urological Pathology grade ≥ 2). Baseline models were constructed using 2D and 3D datasets without image resampling. The resampling configurations with the highest validation performance were evaluated in the test dataset and compared with the baseline models. STATISTICAL TESTS: Area under the curve (AUC), DeLong test. The significance level was 0.05. RESULTS: The best 2D resampling model (T2W: B-spline and 0.5 mm resolution; DWI: nearest neighbor and 2 mm resolution) significantly outperformed the 2D baseline (AUC: 0.77 vs. 0.64). The best 3D resampling model (T2W: linear and 0.8 mm resolution; DWI: nearest neighbor and 2.5 mm resolution) significantly outperformed the 3D baseline (AUC: 0.79 vs. 0.67). DATA CONCLUSION: Image resampling has a significant effect on the performance of multicenter radiomics artificial intelligence in prostate MRI.
The recommended 2D resampling configuration is isotropic resampling with T2W at 0.5 mm (B-spline interpolation) and DWI at 2 mm (nearest neighbor interpolation). For 3D radiomics, this work recommends isotropic resampling with T2W at 0.8 mm (linear interpolation) and DWI at 2.5 mm (nearest neighbor interpolation). EVIDENCE LEVEL: 3 TECHNICAL EFFICACY: Stage 2.
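To make the interpolation choices concrete, the sketch below resamples a 1-D intensity profile from a hypothetical 0.35 mm native spacing onto the 0.5 mm T2W target grid using nearest-neighbor and linear interpolation. It is a one-dimensional toy with invented intensities; the study resampled 2D/3D images and also used B-spline and Blackman windowed-sinc kernels, which imaging toolkits such as SimpleITK provide for full volumes.

```python
def resample_1d(values, spacing, new_spacing, method="linear"):
    """Resample a 1-D profile sampled every `spacing` mm onto a grid every `new_spacing` mm.
    Only nearest-neighbor and linear interpolation are implemented here."""
    extent = (len(values) - 1) * spacing            # physical length in mm
    n_out = int(extent / new_spacing + 1e-9) + 1    # small epsilon guards float round-off
    out = []
    for k in range(n_out):
        pos = k * new_spacing / spacing             # output position in input-index units
        if method == "nearest":
            # Python's round() uses round-half-to-even at exact ties.
            out.append(values[min(round(pos), len(values) - 1)])
        elif method == "linear":
            i = min(int(pos), len(values) - 2)      # index of the left neighbor
            frac = pos - i
            out.append(values[i] * (1 - frac) + values[i + 1] * frac)
        else:
            raise ValueError(f"unsupported method: {method}")
    return out

profile = [0.0, 10.0, 20.0, 30.0, 40.0]  # hypothetical intensities every 0.35 mm
nearest = resample_1d(profile, 0.35, 0.5, "nearest")
linear = resample_1d(profile, 0.35, 0.5, "linear")
print(nearest)  # [0.0, 10.0, 30.0]
print([round(v, 3) for v in linear])
```

Nearest-neighbor output contains only original intensities, whereas linear interpolation produces intermediate values (about 14.286 and 28.571 here); higher-order kernels such as B-spline blend more neighbors still.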
Subjects
Prostate , Prostatic Neoplasms , Male , Humans , Prostate/diagnostic imaging , Prostate/pathology , Retrospective Studies , Artificial Intelligence , Radiomics , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology
ABSTRACT
OBJECTIVES: Non-contrast computed tomography of the brain (NCCTB) is commonly used to detect intracranial pathology but is subject to interpretation errors. Machine learning can augment clinical decision-making and improve NCCTB scan interpretation. This retrospective detection accuracy study assessed the performance of radiologists assisted by a deep learning model and compared the standalone performance of the model with that of unassisted radiologists. METHODS: A deep learning model was trained on 212,484 NCCTB scans drawn from a private radiology group in Australia. Scans from inpatient, outpatient, and emergency settings were included. Scan inclusion criteria were age ≥ 18 years and series slice thickness ≤ 1.5 mm. Thirty-two radiologists reviewed 2848 scans with and without the assistance of the deep learning system and rated their confidence in the presence of each finding on a 7-point scale. Differences in the area under the receiver operating characteristic curve (AUC) and the Matthews correlation coefficient (MCC) were calculated against a ground-truth gold standard. RESULTS: The model demonstrated an average AUC of 0.93 across 144 NCCTB findings and significantly improved radiologist interpretation performance. Assisted and unassisted radiologists demonstrated average AUCs of 0.79 and 0.73, respectively, across 22 grouped parent findings, and 0.72 and 0.68 across 189 child findings. When assisted by the model, radiologist AUC was significantly improved for 91 findings (158 findings were non-inferior), and reading time was significantly reduced. CONCLUSIONS: The assistance of a comprehensive deep learning model significantly improved radiologist detection accuracy across a wide range of clinical findings and demonstrated the potential to improve NCCTB interpretation. CLINICAL RELEVANCE STATEMENT: This study evaluated a comprehensive CT brain deep learning model, which performed strongly, improved the performance of radiologists, and reduced interpretation time.
The model may reduce errors, improve efficiency, facilitate triage, and better enable the delivery of timely patient care. KEY POINTS: • This study demonstrated that a comprehensive deep learning system assisted radiologists in detecting a wide range of abnormalities on non-contrast brain computed tomography scans. • The deep learning model demonstrated an average area under the receiver operating characteristic curve of 0.93 across 144 findings and significantly improved radiologist interpretation performance. • The assistance of the comprehensive deep learning model significantly reduced the time required for radiologists to interpret computed tomography scans of the brain.
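The study reports AUC computed from confidence ratings on a 7-point scale and MCC for binary decisions. A minimal stdlib sketch of both metrics follows; the ratings and confusion-matrix counts are hypothetical, and counting tied ratings as half a win is the standard Mann-Whitney convention rather than a detail stated in the abstract. Computing MCC from ratings would additionally require a hypothetical cut-off that dichotomizes them.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a positive case is rated above a negative one; ties count 1/2."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical 7-point confidence ratings for scans with and without a given finding.
with_finding = [7, 6, 6, 4, 3]
without_finding = [1, 2, 2, 4, 5]
auc = auc_mann_whitney(with_finding, without_finding)
print(auc)  # 0.86
print(round(mcc(tp=40, fp=10, tn=45, fn=5), 4))
```

Because a 7-point scale produces many tied ratings, the tie rule matters: without the 1/2 credit, the AUC above would drop from 0.86 to 0.84.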
Subjects
Deep Learning , Adolescent , Humans , Radiography , Radiologists , Retrospective Studies , Tomography, X-Ray Computed/methods , Adult
ABSTRACT
OBJECTIVE: Challenging infrarenal aortic neck characteristics have been associated with an increased risk of type Ia endoleak after endovascular aneurysm repair (EVAR). Short apposition (< 10 mm circumferential shortest apposition length [SAL]) on the first post-operative computed tomography angiography (CTA) has been associated with type Ia endoleak. Therefore, this study aimed to develop a model to predict post-operative SAL in patients with an abdominal aortic aneurysm based on the pre-operative aortic shape. METHODS: A statistical shape model was developed to obtain principal component scores. The dataset comprised patients treated by standard EVAR without complications (n = 93), enriched with patients with a late type Ia endoleak (n = 54). The infrarenal SAL was obtained from the first post-operative CTA and subsequently binarised (< 10 mm and ≥ 10 mm). The principal component scores that differed significantly between the SAL groups were used as input for five classification models, which were evaluated by means of leave-one-out cross-validation. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were determined for each classification model. RESULTS: Of the 147 patients, 24 had an infrarenal SAL < 10 mm and 123 had an SAL ≥ 10 mm. The gradient boosting model achieved the highest AUC, 0.77. Using this model, 114 patients (77.6%) were correctly classified; sensitivity (< 10 mm apposition correctly predicted) and specificity (≥ 10 mm apposition correctly predicted) were 0.70 and 0.79, respectively, at a threshold of 0.21. CONCLUSION: A model was developed to predict which patients undergoing EVAR will achieve sufficient graft apposition (≥ 10 mm) in the infrarenal aortic neck based on a statistical shape model of pre-operative CTA data.
This model can help vascular specialists during the planning phase to accurately identify patients who are unlikely to achieve sufficient apposition after standard EVAR.
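Leave-one-out cross-validation and the decision threshold can be illustrated together. The study compared five classifiers (gradient boosting performed best); the sketch below swaps in a simple, hypothetical k-nearest-neighbor model on invented 2-D "principal component scores" purely to show how each patient's probability is predicted by a model that never saw that patient, and how a threshold such as 0.21 turns probabilities into sensitivity and specificity.

```python
import math

def knn_proba(train_X, train_y, x, k=3):
    """Fraction of the k nearest training points (Euclidean) that belong to class 1."""
    nearest = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))[:k]
    return sum(label for _, label in nearest) / k

def loocv_probabilities(X, y, k=3):
    """Leave-one-out: predict each case from a model built on all other cases."""
    probs = []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        probs.append(knn_proba(train_X, train_y, X[i], k))
    return probs

def sensitivity_specificity(y, probs, threshold):
    """Class 1 is predicted when the probability reaches the threshold."""
    tp = sum(1 for t, p in zip(y, probs) if t == 1 and p >= threshold)
    fn = sum(1 for t, p in zip(y, probs) if t == 1 and p < threshold)
    tn = sum(1 for t, p in zip(y, probs) if t == 0 and p < threshold)
    fp = sum(1 for t, p in zip(y, probs) if t == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical PC scores; label 1 = short apposition (< 10 mm), 0 = sufficient apposition.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 1, 1, 1]
probs = loocv_probabilities(X, y)
print(probs, sensitivity_specificity(y, probs, threshold=0.21))
```

A threshold below 0.5 labels more cases as positive, trading specificity for sensitivity, which is often preferred when the class of interest is rare (24 of 147 patients in this study).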
ABSTRACT
In the process of screening for probiotic strains, there are no clearly established bacterial phenotypic markers that could be used to predict their in vivo mechanism of action. In this work, we demonstrate for the first time that machine learning (ML) methods can be used to accurately predict the in vivo immunomodulatory activity of probiotic strains from their cell surface phenotypic features, using a snail host-microbe interaction model. A broad range of presumptive probiotics from the snail gut, including 240 new lactic acid bacterial strains (Lactobacillus, Leuconostoc, Lactococcus, and Enterococcus), were isolated and characterized based on their capacity to withstand the snails' gastrointestinal defense barriers, such as the pedal mucus, gastric mucus, gastric juices, and acidic pH, together with their cell surface hydrophobicity, autoaggregation, and biofilm formation ability. The implemented ML pipeline predicted with high accuracy (88%) the strains with a strong capacity to enhance chemotaxis and phagocytic activity of snail hemolymph cells, and it revealed bacterial autoaggregation and cell surface hydrophobicity as the most important parameters significantly affecting host immune responses. The results show that ML approaches may be useful for deriving a predictive understanding of host-probiotic interactions, and they highlight snails as an efficient animal model for screening presumptive probiotic strains in light of their interaction with cellular innate immune responses.
Subjects
Machine Learning , Probiotics , Probiotics/pharmacology , Animals , Lactobacillales/physiology , Lactobacillales/immunology , Snails/immunology , Snails/microbiology , Helix, Snails/immunology , Helix, Snails/physiology , Immunity, Innate , Immunomodulation
ABSTRACT
OBJECTIVES: Accumulating evidence argues for more widespread use of therapeutic drug monitoring (TDM) to support individualized medicine, especially for therapies where toxicity and efficacy are critical issues, such as in oncology. However, development of TDM assays struggles to keep pace with the rapid introduction of new drugs. Therefore, novel approaches for faster assay development are needed that also allow effortless inclusion of newly approved drugs as well as customization to smaller subsets if scientific or clinical situations require. METHODS: We applied and evaluated two machine-learning approaches, i.e., a regression-based approach and an artificial neural network (ANN), for retention time (RT) prediction to efficiently develop a liquid chromatography mass spectrometry (LC-MS) method quantifying 73 oral antitumor drugs (OADs) and five active metabolites. Individual steps included training, evaluation, and comparison of both approaches, application of the superior approach to RT prediction, and subsequent determination of the optimal gradient. RESULTS: Both approaches showed excellent results for RT prediction (mean difference ± standard deviation: 2.08% ± 9.44% for the ANN; 1.78% ± 1.93% for the regression-based approach). Using the regression-based approach, the optimal gradient (4.91% MeOH/min) was predicted, with a total run time of 17.92 min. The associated method was fully validated following FDA and EMA guidelines. Exemplary modification and application of the regression-based approach to a subset of 14 uro-oncological agents resulted in a considerably shortened run time of 9.29 min. CONCLUSIONS: Using a regression-based approach to RT prediction, a multi-drug LC-MS assay was efficiently developed, which can easily be expanded to newly approved OADs and customized to smaller subsets if required.
Subjects
Antineoplastic Agents , Humans , Chromatography, Liquid/methods , Tandem Mass Spectrometry/methods , Antineoplastic Agents/pharmacology , Drug Monitoring/methods , Machine Learning
ABSTRACT
BACKGROUND: Effective preventive interventions for PTSD rely on early identification of individuals at risk of developing PTSD. To establish early post-trauma who is at risk, accurate prognostic risk screening instruments for PTSD are needed that can be widely implemented in recently trauma-exposed adults. Achieving such accuracy and generalizability requires external validation of machine learning classification models. The current 2-ASAP cohort study will externally validate both full and minimal feature sets of supervised machine learning classification models that assess individual risk of following an adverse PTSD symptom trajectory over the course of 1 year. We will derive these models from the TraumaTIPS cohort, separately for men and women. METHOD: The 2-ASAP longitudinal cohort will include N = 863 adults (N = 436 females, N = 427 males) who were recently exposed to acute civilian trauma. We will include civilian victims of accidents, crime and calamities via Victim Support Netherlands, as well as adults who were transported by emergency services to the emergency department for medical evaluation of (suspected) traumatic injuries. The baseline assessment within 2 months post-trauma will include self-report questionnaires on demographic, medical and traumatic event characteristics; potential risk and protective factors for PTSD; PTSD symptom severity and other adverse outcomes; and current best-practice PTSD screening instruments. Participants will be followed up at 3, 6, 9, and 12 months post-trauma, with PTSD symptom severity and other adverse outcomes assessed via self-report questionnaires. DISCUSSION: The ultimate goal of our study is to improve accurate screening and prevention of PTSD in recently trauma-exposed civilians. To enable future large-scale implementation, we will use self-report data to inform the prognostic models, and we will derive a minimal feature set for the classification models.
This minimal feature set can be transformed into a short, user-friendly online screening instrument for recently trauma-exposed adults to complete. The eventual short online screening instrument will classify early post-trauma which adults are at risk of developing PTSD. Those at risk can be targeted and may subsequently benefit from preventive interventions, aiming to reduce PTSD and, relatedly, improve psychological, functional and economic outcomes.
Subjects
Stress Disorders, Post-Traumatic , Humans , Stress Disorders, Post-Traumatic/prevention & control , Stress Disorders, Post-Traumatic/diagnosis , Longitudinal Studies , Male , Female , Adult , Prospective Studies , Netherlands , Mass Screening/methods , Machine Learning
ABSTRACT
BACKGROUND: Cervical cancer (CC) is among the most prevalent cancers in women, with the highest prevalence in low- and middle-income countries (LMICs). It is curable if detected early. Machine learning (ML) techniques can aid early detection and prediction, thus reducing screening and treatment costs. This study focused on women living with HIV (WLHIV) in Uganda. Its aim was to identify the best predictors of CC and the supervised ML model that best predicts CC among WLHIV. METHODS: Secondary data covering 3025 women from three health facilities in central Uganda were used. Multivariate binary logistic regression and recursive feature elimination with random forest (RFERF) were used to identify the best predictors. Five models, logistic regression (LR), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), and multi-layer perceptron (MLP), were applied to identify the best performer. The confusion matrix and the area under the receiver operating characteristic curve (AUC/ROC) were used to evaluate the models. RESULTS: Duration on antiretroviral therapy (ART), WHO clinical stage, TPT status, viral load status, and family planning were selected by both techniques and were thus highly significant for CC prediction. The RF model trained on the RFERF-selected features outperformed the other models, with the highest accuracy (90%) and AUC (0.901). CONCLUSION: Early identification of CC and knowledge of the risk factors could help control the disease. RF outperformed the other models regardless of the feature selection technique used. Future research could be expanded to include ART-naïve women in predicting CC.
Subjects
HIV Infections , Uterine Cervical Neoplasms , Humans , Female , Uganda/epidemiology , Uterine Cervical Neoplasms/diagnosis , HIV Infections/drug therapy , Adult , Supervised Machine Learning , Middle Aged , Precancerous Conditions/diagnosis , Logistic Models , Algorithms , Support Vector Machine
ABSTRACT
India has been dealing with fluoride contamination of groundwater for the past few decades. Long-term exposure to fluoride can cause skeletal and dental fluorosis. Therefore, an in-depth exploration of fluoride concentrations in different parts of India is desirable. This work employs machine learning algorithms to analyze fluoride concentrations in five major affected Indian states (Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal). A correlation matrix was used to identify appropriate predictor variables for fluoride prediction. The algorithms used for prediction included K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector classifier (SVC), Gaussian naive Bayes (Gaussian NB), MLP classifier, decision tree classifier, gradient boosting classifier, and soft and hard voting classifiers. Model performance was assessed using accuracy, precision, recall, error rate and the receiver operating characteristic curve. As the dataset was skewed, model performance was evaluated before and after resampling. The results indicate that the RF model is the best model for predicting fluoride contamination of groundwater in these Indian states.
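The abstract does not state which resampling scheme was applied to the skewed dataset. Random oversampling of the minority class is one common option and is sketched below with hypothetical predictor values (labelled here as pH and electrical conductivity purely for illustration); it is not necessarily the method used in the study.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    matches the majority-class count. Apply to the training split only, so that
    duplicated rows never leak into the evaluation data."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out

# Hypothetical wells: [pH, electrical conductivity]; label 1 = fluoride above a chosen limit.
X = [[7.1, 800], [7.4, 950], [6.9, 700], [7.0, 820], [8.3, 2100]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling changes only the class balance seen during training; precision, recall and ROC analyses should still be computed on an untouched test set that reflects the real class distribution.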
Subjects
Fluorides , Groundwater , Water Pollutants, Chemical , India , Groundwater/analysis , Groundwater/chemistry , Fluorides/analysis , Water Pollutants, Chemical/analysis , Supervised Machine Learning , Environmental Monitoring/methods , Algorithms
ABSTRACT
BACKGROUND: Nowadays, social media plays a crucial role in disseminating information about cancer prevention and treatment. A growing body of research has focused on assessing access and communication effects of cancer information on social media. However, there remains a limited understanding of the comprehensive presentation of cancer prevention and treatment methods across social media platforms. Furthermore, research comparing the differences between medical social media (MSM) and common social media (CSM) is also lacking. OBJECTIVE: Using big data analytics, this study aims to comprehensively map the characteristics of cancer treatment and prevention information on MSM and CSM. This approach promises to enhance cancer coverage and assist patients in making informed treatment decisions. METHODS: We collected all posts (N=60,843) from 4 medical WeChat official accounts (accounts with professional medical backgrounds, classified as MSM in this paper) and 5 health and lifestyle WeChat official accounts (accounts with nonprofessional medical backgrounds, classified as CSM in this paper). We applied latent Dirichlet allocation topic modeling to extract cancer-related posts (N=8427) and identified 6 cancer themes separately in CSM and MSM. After manually labeling posts according to our codebook, we used a neural-based method for automated labeling. Specifically, we framed our task as a multilabel task and utilized different pretrained models, such as Bidirectional Encoder Representations from Transformers (BERT) and Global Vectors for Word Representation (GloVe), to learn document-level semantic representations for labeling. RESULTS: We analyzed a total of 4479 articles from MSM and 3948 articles from CSM related to cancer. Among these, 35.52% (2993/8427) contained prevention information and 44.43% (3744/8427) contained treatment information. Themes in CSM were predominantly related to lifestyle, whereas MSM focused more on medical aspects. 
The most frequently mentioned prevention measures were early screening and testing, a healthy diet, and physical exercise. MSM mentioned vaccination for cancer prevention more frequently than CSM did. Both types of media provided limited coverage of radiation prevention (including sun protection) and breastfeeding. The most frequently mentioned treatment measures were surgery, chemotherapy, and radiotherapy. Compared with MSM (1137/8427, 13.49%), CSM (2993/8427, 35.52%) focused more on prevention. CONCLUSIONS: Cancer prevention and treatment information on social media was unbalanced. Coverage was largely limited to a few aspects, indicating a need for broader coverage of prevention measures and treatments on social media. Additionally, the findings underscore the promise of machine learning-based content analysis as a research approach for mapping key dimensions of cancer information on social media. These findings hold methodological and practical significance for future studies and health promotion.
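Framing labeling as a multilabel task means each post can carry a prevention label, a treatment label, both, or neither. The abstract does not describe the output layer, so the following is only a minimal sketch, assuming independent per-label sigmoid outputs from some encoder (BERT, GloVe-based, or otherwise) and a 0.5 threshold; the label names, logits, and gold labels are hypothetical, and micro-averaged F1 is shown as one common multilabel metric, not necessarily the one used in the study.

```python
import math

# Hypothetical label set; the study's codebook labels are not listed in the abstract.
LABELS = ["prevention", "treatment"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def decode(logits, threshold=0.5):
    """Turn one post's per-label logits into a label set.

    Each label gets its own sigmoid and threshold, so a post may receive
    several labels or none (unlike softmax multiclass, which picks exactly one).
    """
    return {lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold}


def micro_f1(gold, pred):
    """Micro-averaged F1 pooled over all (post, label) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


if __name__ == "__main__":
    logits = [[2.0, 1.5], [-1.0, 3.0], [-2.0, -2.0]]  # hypothetical model outputs
    gold = [{"prevention", "treatment"}, {"prevention"}, set()]
    pred = [decode(z) for z in logits]
    print(pred)
    print(round(micro_f1(gold, pred), 3))  # 0.667
```

The second post shows the typical multilabel error pattern: one label is missed and another is spuriously added, and micro-F1 counts both.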
Subjects
Machine Learning , Neoplasms , Social Media , Social Media/statistics & numerical data , Humans , Neoplasms/prevention & control , Neoplasms/therapy , China
ABSTRACT
Clinical prediction models are valuable instruments for assessing the risk of crucial outcomes and facilitating decision-making in clinical settings. Constructing these models requires nuanced analytical decisions and expertise informed by the current statistical literature. Neurocritical care physicians may have limited access to, and familiarity with, this literature, which can hinder the interpretation of existing predictive models. This article aims to narrow this knowledge gap by providing neurocritical care specialists with methodological guidance for interpreting predictive models in neurocritical care. It presents the statistical learning principles integral to constructing a model that predicts hospital mortality (nonsurvival during hospitalization) in patients with moderate and severe blunt traumatic brain injury, using components of the IMPACT-Core model. The discussion covers critical elements such as model flexibility, hyperparameter selection, data imbalance, cross-validation, model assessment (discrimination and calibration), prediction instability, and probability thresholds, and it elaborates on the intricate interplay among these components, the data set, and the clinical context of neurocritical care. A working understanding of these statistical learning concepts can improve comprehension of articles on model development and tailored clinical care and, ultimately, the interpretation and clinical applicability of predictive models.
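Discrimination, calibration, and probability thresholds answer different questions about the same predicted risks. As a minimal illustration (not the article's IMPACT-Core model), the sketch below computes the c-statistic (area under the ROC curve), the Brier score, and calibration-in-the-large, then applies two thresholds; the outcomes and predicted risks are hypothetical toy values.

```python
def c_statistic(y, p):
    """Discrimination: chance that a randomly chosen nonsurvivor (y=1) receives a
    higher predicted risk than a randomly chosen survivor (y=0); ties count 1/2.
    This equals the area under the ROC curve."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y, p):
    """Mean squared error of the predicted probabilities (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_in_the_large(y, p):
    """Observed mortality minus mean predicted risk; 0 means no systematic
    over- or underestimation across the whole cohort."""
    return sum(y) / len(y) - sum(p) / len(p)


def classify(p, threshold):
    """Apply a probability threshold; moving it trades sensitivity for specificity."""
    return [1 if pi >= threshold else 0 for pi in p]


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0]            # hypothetical hospital mortality (1 = nonsurvivor)
    p = [0.8, 0.3, 0.4, 0.4, 0.1]  # hypothetical predicted risks
    print(round(c_statistic(y, p), 3))  # 0.917
    print(round(brier_score(y, p), 3))  # 0.132
    print(classify(p, 0.5), classify(p, 0.35))
```

In this toy cohort the model is well calibrated overall (calibration-in-the-large is 0), yet lowering the threshold from 0.5 to 0.35 changes which patients are flagged, which is why threshold choice must reflect the clinical context rather than the model alone.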
ABSTRACT
BACKGROUND: Reduced bone density is recognized as a predictor of potential complications in reverse shoulder arthroplasty (RSA). While humeral and glenoid planning based on preoperative computed tomography (CT) scans assists in implant selection and positioning, reproducible methods for quantifying patients' bone density are currently not available. The purpose of this study was to perform bone density analyses, including patient-specific calibration, in an RSA cohort based on preoperative CT imaging. It was hypothesized that preoperative CT bone density measures would provide objective quantification of the patients' humeral bone quality. METHODS: This study consisted of three parts: (1) analysis of a patient-specific calibration method in cadaveric CT scans, (2) retrospective application in a clinical RSA cohort, and (3) clustering and classification with machine learning models. Forty cadaveric shoulders were scanned in a clinical CT scanner, and calibration using density phantoms, using air, muscle, and fat (patient-specific), or using standard Hounsfield units was compared. Post-scan patient-specific calibration was used to improve the extraction of three-dimensional regions of interest for retrospective bone density analysis in a clinical RSA cohort (n=345). Machine learning models were used to improve the clustering (hierarchical Ward) and classification (support vector machine (SVM)) of low bone densities in the respective patients. RESULTS: The patient-specific calibration method demonstrated improved accuracy, with excellent intraclass correlation coefficients (ICC) for cylindrical cancellous bone densities (ICC>0.75). Clustering partitioned the training data set into a high-density subgroup of 96 patients and a low-density subgroup of 146 patients, with significant differences between these groups. 
The SVM showed improved prediction accuracy for low and high bone densities compared with conventional statistics in the training (accuracy=91.2%; AUC=0.967) and testing (accuracy=90.5%; AUC=0.958) data sets. CONCLUSION: Preoperative CT scans can be used to quantify proximal humeral bone quality in patients undergoing RSA. The use of machine learning models and patient-specific calibration of bone mineral density demonstrated that multiple 3D bone density scores improved the accuracy of objective preoperative bone quality assessment. The trained model could provide preoperative information to surgeons treating patients with potentially poor bone quality.
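The abstract does not give the calibration equations. One common phantomless approach fits a linear map from the mean HU measured in a patient's own reference regions (air, fat, muscle) to assumed densities of those tissues, so that scanner-specific offsets are absorbed into the fit. The sketch below assumes such a linear relationship; the reference HU values, the +20 HU scanner shift, and the densities (in g/cm^3, following the rough approximation rho ≈ 1 + HU/1000) are hypothetical placeholders, not values from the study.

```python
def fit_line(hu, density):
    """Ordinary least squares fit of density = slope * HU + intercept."""
    n = len(hu)
    if n < 2 or n != len(density):
        raise ValueError("need at least two paired reference measurements")
    mx, my = sum(hu) / n, sum(density) / n
    sxx = sum((x - mx) ** 2 for x in hu)
    sxy = sum((x - mx) * (d - my) for x, d in zip(hu, density))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate(hu_value, slope, intercept):
    """Convert a measured HU value (e.g. a bone ROI mean) to calibrated density."""
    return slope * hu_value + intercept


if __name__ == "__main__":
    # Hypothetical mean HU in this patient's air, fat and muscle ROIs, shifted
    # +20 HU by the scanner relative to nominal values (-1000, -100, 50).
    ref_hu = [-980.0, -80.0, 70.0]
    # Assumed reference densities in g/cm^3 (illustrative only).
    ref_density = [0.0, 0.9, 1.05]
    slope, intercept = fit_line(ref_hu, ref_density)
    print(round(slope, 6), round(intercept, 3))          # 0.001 0.98
    print(round(calibrate(320.0, slope, intercept), 3))  # 1.3
```

Because the fit is done per scan, the hypothetical +20 HU offset shows up only in the intercept, and a bone ROI reading of 320 HU maps to the same density that 300 HU would give on an unshifted scanner.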
ABSTRACT
The three Ground Reaction Force (GRF) components can be estimated using pressure insole sensors. In this paper, we compare the accuracy of GRF component estimation for both feet using six methods: three Deep Learning (DL) methods (Artificial Neural Network, Long Short-Term Memory, and Convolutional Neural Network) and three Supervised Machine Learning (SML) methods (Least Squares, Support Vector Regression, and Random Forest (RF)). Data were collected from nine subjects across six activities: normal and slow walking, static posture with and without carrying a load, and two Manual Material Handling activities. This study makes two main contributions: first, the estimation of GRF components (Fx, Fy, and Fz) during the six activities, two of which had not been studied before; second, a comparison of GRF estimation accuracy across the six methods for each activity. RF provided the most accurate estimation in static situations, with mean RMSE values of 1.65 N (Fx), 1.35 N (Fy), and 7.97 N (Fz), against mean absolute values measured by the force plate (reference) of 14.10 N (Fx), 3.83 N (Fy), and 397.45 N (Fz). In our study, we found that RF, an SML method, outperformed the DL methods we tested.
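Absolute RMSE values are hard to compare across components whose magnitudes differ by two orders of magnitude. The sketch below defines RMSE and, as an illustrative normalization that is not necessarily the paper's own metric, divides each static-situation RF RMSE reported above by the corresponding mean absolute force-plate value.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between estimated and force-plate GRF samples (N)."""
    if not pred or len(pred) != len(ref):
        raise ValueError("pred and ref must be non-empty and equally long")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


# Static-situation RF results and mean absolute force-plate values from the abstract (N).
RF_RMSE = {"Fx": 1.65, "Fy": 1.35, "Fz": 7.97}
MEAN_ABS_REF = {"Fx": 14.10, "Fy": 3.83, "Fz": 397.45}

# Normalizing by the mean absolute reference (an illustrative choice).
relative_error = {k: RF_RMSE[k] / MEAN_ABS_REF[k] for k in RF_RMSE}

if __name__ == "__main__":
    for k, v in relative_error.items():
        print(f"{k}: {100 * v:.1f}%")  # Fx: 11.7%, Fy: 35.2%, Fz: 2.0%
```

Read this way, the Fz estimate is within about 2% of the typical reference magnitude, whereas Fy, with the smallest absolute RMSE, has the largest relative error.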
Subjects
Deep Learning , Pressure , Supervised Machine Learning , Humans , Male , Walking/physiology , Neural Networks, Computer , Shoes , Adult , Female , Foot/physiology , Biomechanical Phenomena/physiology , Young Adult
ABSTRACT
High-diversity seabed habitats, such as shellfish aggregations, play a significant role in marine ecosystem sustainability but are susceptible to bottom disturbance caused by anthropogenic activities. Regular monitoring of these habitats with effective mapping methods is therefore essential. Multibeam echosounders (MBES) have been widely used in recent decades for seabed characterization because, compared with traditional methods such as bottom sampling, they are non-destructive and offer extensive spatial coverage. Nevertheless, bottom sampling remains essential for linking ground truth to acoustic seabed classification. Machine learning techniques are commonly employed to model the relationships between seabed samples and MBES measurements and to generate classification maps of a wider seabed area. However, limited ground truth data, resulting from regulatory, budgetary, or time constraints, may impede the development of robust machine learning models. To address this challenge, we applied a semi-supervised machine learning method to classify seabed sediments in a blue mussel (Mytilus edulis) cultivation area in the Oosterschelde, the Netherlands. We used nine boxcore samples to generate pseudo-labels on MBES data. These pseudo-labels enlarged the training data set, facilitated the training of three machine learning algorithms (Gradient Boosting, Random Forest, and Support Vector Machine), and helped to classify the study site into mussel and non-mussel areas. We found geomorphological and backscatter-related features to be complementary for mussel culture detection. The effectiveness of our classification results was confirmed against expert knowledge of this cultivation area, and the results provide insights for future research on natural mussel habitats.
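The abstract does not specify how pseudo-labels were derived from the nine boxcore samples. The sketch below shows one generic semi-supervised scheme, self-training: fit a classifier on the few labeled points, assign labels to unlabeled points only when the prediction is confident, add them to the training set, and refit. A nearest-centroid classifier with a distance-margin confidence rule stands in for Gradient Boosting, Random Forest, or SVM; the two-dimensional features (imagine backscatter and slope), the margin, and all data points are hypothetical.

```python
import math


def centroids(X, y):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def predict_with_margin(c, x):
    """Nearest class plus the distance gap to the runner-up (needs >= 2 classes)."""
    dists = sorted((math.dist(x, mu), lab) for lab, mu in c.items())
    (d1, lab1), (d2, _) = dists[0], dists[1]
    return lab1, d2 - d1


def self_train(X_lab, y_lab, X_unlab, min_margin, max_rounds=5):
    """Iteratively pseudo-label confident unlabeled points and refit."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        c = centroids(X, y)
        confident, remaining = [], []
        for x in pool:
            lab, margin = predict_with_margin(c, x)
            (confident if margin >= min_margin else remaining).append((x, lab))
        if not confident:
            break  # nothing left that the model is sure about
        for x, lab in confident:
            X.append(x)
            y.append(lab)
        pool = [x for x, _ in remaining]
    return centroids(X, y), X, y


if __name__ == "__main__":
    X_lab = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical ground-truth samples
    y_lab = ["non-mussel", "mussel"]
    X_unlab = [[1.0, 0.0], [9.0, 10.0], [5.0, 5.0]]
    c, X_out, y_out = self_train(X_lab, y_lab, X_unlab, min_margin=3.0)
    print(len(y_out), c)
```

The ambiguous point at (5, 5) is never pseudo-labeled, which illustrates the usual trade-off: a stricter confidence margin yields fewer but cleaner pseudo-labels.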