Results 1 - 20 of 438
1.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36464489

ABSTRACT

Viruses are the most ubiquitous and diverse entities in the biome. Due to the rapid growth of newly identified viruses, there is an urgent need for accurate and comprehensive virus classification, particularly for novel viruses. Here, we present PhaGCN2, which can rapidly classify the taxonomy of viral sequences at the family level and supports the visualization of the associations of all families. We evaluate the performance of PhaGCN2 and compare it with the state-of-the-art virus classification tools, such as vConTACT2, CAT and VPF-Class, using the widely accepted metrics. The results show that PhaGCN2 largely improves the precision and recall of virus classification, increases the number of classifiable virus sequences in the Global Ocean Virome dataset (v2.0) by four times and classifies more than 90% of the Gut Phage Database. PhaGCN2 makes it possible to conduct high-throughput and automatic expansion of the database of the International Committee on Taxonomy of Viruses. The source code is freely available at https://github.com/KennthShang/PhaGCN2.0.


Subjects
Viruses , Viruses/genetics , Viral Genome , Factual Databases , Software , Genomics
2.
BMC Bioinformatics ; 25(1): 218, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898392

ABSTRACT

BACKGROUND: Compared to traditional supervised machine learning approaches employing fully labeled samples, positive-unlabeled (PU) learning techniques aim to classify "unlabeled" samples based on a smaller proportion of known positive examples. This more challenging modeling goal reflects many real-world scenarios in which negative examples are not available-posing direct challenges to defining prediction accuracy and robustness. While several studies have evaluated predictions learned from only definitive positive examples, few have investigated whether correct classification of a high proportion of known positives (KP) samples from among unlabeled samples can act as a surrogate to indicate model quality. RESULTS: In this study, we report a novel methodology combining multiple established PU learning-based strategies with permutation testing to evaluate the potential of KP samples to accurately classify unlabeled samples without using "ground truth" positive and negative labels for validation. Multivariate synthetic and real-world high-dimensional benchmark datasets were employed to demonstrate the suitability of the proposed pipeline to provide evidence of model robustness across varied underlying ground truth class label compositions among the unlabeled set and with different proportions of KP examples. Comparisons between model performance with actual and permuted labels could be used to distinguish reliable from unreliable models. CONCLUSIONS: As in fully supervised machine learning, permutation testing offers a means to set a baseline "no-information rate" benchmark in the context of semi-supervised PU learning inference tasks-providing a standard against which model performance can be compared.
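
A minimal sketch of the permutation baseline this abstract describes: score a model on the actual labels, then on many shuffled copies of the labels, and see how often the shuffled score matches or beats the real one. The toy one-dimensional data and the nearest-centroid classifier are illustrative assumptions, not the authors' PU learning pipeline, and the function names are hypothetical.

```python
import random
from statistics import mean

def centroid_accuracy(xs, labels):
    """In-sample accuracy of a 1-D nearest-centroid classifier."""
    pos = [x for x, y in zip(xs, labels) if y == 1]
    neg = [x for x, y in zip(xs, labels) if y == 0]
    c_pos, c_neg = mean(pos), mean(neg)
    preds = [1 if abs(x - c_pos) < abs(x - c_neg) else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def permutation_p_value(xs, labels, n_perm=200, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = centroid_accuracy(xs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between features and labels
        if centroid_accuracy(xs, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: two well-separated groups of ten samples each.
xs = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
labels = [0] * 10 + [1] * 10
observed, p_value = permutation_p_value(xs, labels)
print(observed, p_value)
```

The spread of the shuffled scores plays the role of the "no-information rate": a model whose real score sits inside that spread has learned nothing that the label assignment could explain.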


Subjects
Machine Learning , Supervised Machine Learning , Humans , Computational Biology/methods , Algorithms
3.
BMC Genomics ; 25(1): 152, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326768

ABSTRACT

BACKGROUND: The accurate prediction of genomic breeding values is central to genomic selection in both plant and animal breeding studies. Genomic prediction involves the use of thousands of molecular markers spanning the entire genome and therefore requires methods able to efficiently handle high dimensional data. Not surprisingly, machine learning methods are becoming widely advocated for and used in genomic prediction studies. These methods encompass different groups of supervised and unsupervised learning methods. Although several studies have compared the predictive performances of individual methods, studies comparing the predictive performance of different groups of methods are rare. However, such studies are crucial for identifying (i) groups of methods with superior genomic predictive performance and assessing (ii) the merits and demerits of such groups of methods relative to each other and to the established classical methods. Here, we comparatively evaluate the genomic predictive performance and informally assess the computational cost of several groups of supervised machine learning methods, specifically, regularized regression methods, deep, ensemble and instance-based learning algorithms, using one simulated animal breeding dataset and three empirical maize breeding datasets obtained from a commercial breeding program. RESULTS: Our results show that the relative predictive performance and computational expense of the groups of machine learning methods depend upon both the data and target traits and that for classical regularized methods, increasing model complexity can incur huge computational costs but does not necessarily always improve predictive accuracy. Thus, despite their greater complexity and computational burden, neither the adaptive nor the group regularized methods clearly improved upon the results of their simple regularized counterparts. 
This rules out selection of one procedure among machine learning methods for routine use in genomic prediction. The results also show that, because of their competitive predictive performance, computational efficiency, simplicity and therefore relatively few tuning parameters, the classical linear mixed model and regularized regression methods are likely to remain strong contenders for genomic prediction. CONCLUSIONS: The dependence of predictive performance and computational burden on target datasets and traits call for increasing investments in enhancing the computational efficiency of machine learning algorithms and computing resources.


Subjects
Deep Learning , Animals , Plant Breeding , Genome , Genomics/methods , Machine Learning
4.
Mol Cancer ; 23(1): 32, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38350884

ABSTRACT

BACKGROUND: The problem in early diagnosis of sporadic cancer is understanding the individual's risk of developing disease. In response to this need, global scientific research is focusing on developing predictive models based on non-invasive screening tests. A tentative solution to the problem may be a cancer screening blood-based test able to discover those cell requirements triggering subclinical and clinical onset latency, at the stage when the cell disorder, i.e. atypical epithelial hyperplasia, is still in a subclinical stage of proliferative dysregulation. METHODS: A well-established procedure to identify proliferating circulating tumor cells was deployed to measure the cell proliferation of circulating non-haematological cells which may suggest tumor pathology. Moreover, the data collected were processed by a supervised machine learning model to make the prediction. RESULTS: The developed test combining circulating non-haematological cell proliferation data and artificial intelligence shows 98.8% accuracy, 100% sensitivity, and 95% specificity. CONCLUSION: This proof-of-concept study demonstrates that integration of innovative non-invasive methods and predictive models can be decisive in assessing the health status of an individual, and can achieve cutting-edge results in cancer prevention and management.
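
For readers less familiar with the three figures reported in RESULTS, the sketch below shows how accuracy, sensitivity, and specificity are derived from a confusion matrix. The counts are hypothetical values chosen only for illustration; they are not the study's data, and the function name is an assumption.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # share of diseased cases flagged
    specificity = tn / (tn + fp)          # share of healthy cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts, NOT taken from the study.
accuracy, sensitivity, specificity = screening_metrics(tp=60, fn=0, tn=19, fp=1)
print(accuracy, sensitivity, specificity)
```

Because accuracy pools both classes, it depends on how many diseased and healthy individuals are in the sample, whereas sensitivity and specificity do not.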


Subjects
Artificial Intelligence , Neoplasms , Humans
5.
BMC Microbiol ; 24(1): 162, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730339

ABSTRACT

BACKGROUND: Coastal areas are subject to various anthropogenic and natural influences. In this study, we investigated and compared the characteristics of two coastal regions, Andhra Pradesh (AP) and Goa (GA), focusing on pollution, anthropogenic activities, and recreational impacts. We explored three main factors influencing the differences between these coastlines: The Bay of Bengal's shallower depth and lower salinity; upwelling phenomena due to the thermocline in the Arabian Sea; and high tides that can cause strong currents that transport pollutants and debris. RESULTS: The microbial diversity in GA was significantly higher than that in AP, which might be attributed to differences in temperature, soil type, and vegetation cover. 16S rRNA amplicon sequencing and bioinformatics analysis indicated the presence of diverse microbial phyla, including candidate phyla radiation (CPR). Statistical analysis, random forest regression, and supervised machine learning models classification confirm the diversity of the microbiome accurately. Furthermore, we have identified 450 cultures of heterotrophic, biotechnologically important bacteria. Some strains were identified as novel taxa based on 16S rRNA gene sequencing, showing promising potential for further study. CONCLUSION: Thus, our study provides valuable insights into the microbial diversity and pollution levels of coastal areas in AP and GA. These findings contribute to a better understanding of the impact of anthropogenic activities and climate variations on biology of coastal ecosystems and biodiversity.


Subjects
Bacteria , Bays , Microbiota , Phylogeny , 16S Ribosomal RNA , Seawater , Supervised Machine Learning , 16S Ribosomal RNA/genetics , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Microbiota/genetics , Seawater/microbiology , India , Bays/microbiology , Biodiversity , Bacterial DNA/genetics , Salinity , DNA Sequence Analysis/methods
6.
NMR Biomed ; : e5220, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39054694

ABSTRACT

Posttraumatic stress disorder (PTSD) is a chronic psychiatric condition that follows exposure to a traumatic stressor. Though previous in vivo proton (1H) MRS research conducted at 4 T or lower has identified alterations in glutamate metabolism associated with PTSD predisposition and/or progression, no prior investigations have been conducted at higher field strength. In addition, earlier studies have not extensively addressed the impact of psychiatric comorbidities such as major depressive disorder (MDD) on PTSD-associated 1H-MRS-visible brain metabolite abnormalities. Here we employ 7 T 1H MRS to examine concentrations of glutamate, glutamine, GABA, and glutathione in the medial prefrontal cortex (mPFC) of PTSD patients with MDD (PTSD+MDD+; N = 6) or without MDD (PTSD+MDD-; N = 5), as well as trauma-unmatched controls without PTSD but with MDD (PTSD-MDD+; N = 9) or without MDD (PTSD-MDD-; N = 18). Participants with PTSD demonstrated decreased ratios of GABA to glutamine relative to healthy PTSD-MDD- controls but no single-metabolite abnormalities. When comorbid MDD was considered, however, MDD but not PTSD diagnosis was significantly associated with increased mPFC glutamine concentration and decreased glutamate:glutamine ratio. In addition, all participants with PTSD and/or MDD collectively demonstrated decreased glutathione relative to healthy PTSD-MDD- controls. Despite limited findings in single metabolites, patterns of abnormality in prefrontal metabolite concentrations among individuals with PTSD and/or MDD enabled supervised classification to separate them from healthy controls with 80+% sensitivity and specificity, with glutathione, glutamine, and myoinositol consistently among the most informative metabolites for this classification. Our findings indicate that MDD can be an important factor in mPFC glutamate metabolism abnormalities observed using 1H MRS in cohorts with PTSD.

7.
J Magn Reson Imaging ; 59(5): 1800-1806, 2024 May.
Article in English | MEDLINE | ID: mdl-37572098

ABSTRACT

BACKGROUND: Single center MRI radiomics models are sensitive to data heterogeneity, limiting the diagnostic capabilities of current prostate cancer (PCa) radiomics models. PURPOSE: To study the impact of image resampling on the diagnostic performance of radiomics in a multicenter prostate MRI setting. STUDY TYPE: Retrospective. POPULATION: Nine hundred thirty patients (nine centers, two vendors) with 737 eligible PCa lesions, randomly split into training (70%, N = 500), validation (10%, N = 89), and a held-out test set (20%, N = 148). FIELD STRENGTH/SEQUENCE: 1.5T and 3T scanners/T2-weighted imaging (T2W), diffusion-weighted imaging (DWI), and apparent diffusion coefficient maps. ASSESSMENT: A total of 48 normalized radiomics datasets were created using various resampling methods, including different target resolutions (T2W: 0.35, 0.5, and 0.8 mm; DWI: 1.37, 2, and 2.5 mm), dimensionalities (2D/3D) and interpolation techniques (nearest neighbor, linear, Bspline and Blackman windowed-sinc). Each of the datasets was used to train a radiomics model to detect clinically relevant PCa (International Society of Urological Pathology grade ≥ 2). Baseline models were constructed using 2D and 3D datasets without image resampling. The resampling configurations with highest validation performance were evaluated in the test dataset and compared to the baseline models. STATISTICAL TESTS: Area under the curve (AUC), DeLong test. The significance level used was 0.05. RESULTS: The best 2D resampling model (T2W: Bspline and 0.5 mm resolution, DWI: nearest neighbor and 2 mm resolution) significantly outperformed the 2D baseline (AUC: 0.77 vs. 0.64). The best 3D resampling model (T2W: linear and 0.8 mm resolution, DWI: nearest neighbor and 2.5 mm resolution) significantly outperformed the 3D baseline (AUC: 0.79 vs. 0.67). DATA CONCLUSION: Image resampling has a significant effect on the performance of multicenter radiomics artificial intelligence in prostate MRI. 
The recommended 2D resampling configuration is isotropic resampling with T2W at 0.5 mm (Bspline interpolation) and DWI at 2 mm (nearest neighbor interpolation). For the 3D radiomics, this work recommends isotropic resampling with T2W at 0.8 mm (linear interpolation) and DWI at 2.5 mm (nearest neighbor interpolation). EVIDENCE LEVEL: 3 TECHNICAL EFFICACY: Stage 2.
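
The AUC values compared in this abstract can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise (Mann-Whitney) formulation follows; the scores are hypothetical and the function name is an assumption, not code from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for lesions with and without clinically relevant PCa.
positive_scores = [0.9, 0.8, 0.4]
negative_scores = [0.7, 0.3, 0.2]
value = auc(positive_scores, negative_scores)
print(value)
```

The double loop is quadratic in the number of cases, which is fine for a sketch; rank-based implementations give the same number faster on large test sets.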


Subjects
Prostate , Prostatic Neoplasms , Male , Humans , Prostate/diagnostic imaging , Prostate/pathology , Retrospective Studies , Artificial Intelligence , Radiomics , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology
8.
Eur Radiol ; 34(2): 810-822, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37606663

ABSTRACT

OBJECTIVES: Non-contrast computed tomography of the brain (NCCTB) is commonly used to detect intracranial pathology but is subject to interpretation errors. Machine learning can augment clinical decision-making and improve NCCTB scan interpretation. This retrospective detection accuracy study assessed the performance of radiologists assisted by a deep learning model and compared the standalone performance of the model with that of unassisted radiologists. METHODS: A deep learning model was trained on 212,484 NCCTB scans drawn from a private radiology group in Australia. Scans from inpatient, outpatient, and emergency settings were included. Scan inclusion criteria were age ≥ 18 years and series slice thickness ≤ 1.5 mm. Thirty-two radiologists reviewed 2848 scans with and without the assistance of the deep learning system and rated their confidence in the presence of each finding using a 7-point scale. Differences in AUC and Matthews correlation coefficient (MCC) were calculated using a ground-truth gold standard. RESULTS: The model demonstrated an average area under the receiver operating characteristic curve (AUC) of 0.93 across 144 NCCTB findings and significantly improved radiologist interpretation performance. Assisted and unassisted radiologists demonstrated an average AUC of 0.79 and 0.73 across 22 grouped parent findings and 0.72 and 0.68 across 189 child findings, respectively. When assisted by the model, radiologist AUC was significantly improved for 91 findings (158 findings were non-inferior), and reading time was significantly reduced. CONCLUSIONS: The assistance of a comprehensive deep learning model significantly improved radiologist detection accuracy across a wide range of clinical findings and demonstrated the potential to improve NCCTB interpretation. CLINICAL RELEVANCE STATEMENT: This study evaluated a comprehensive CT brain deep learning model, which performed strongly, improved the performance of radiologists, and reduced interpretation time. 
The model may reduce errors, improve efficiency, facilitate triage, and better enable the delivery of timely patient care. KEY POINTS: • This study demonstrated that the use of a comprehensive deep learning system assisted radiologists in the detection of a wide range of abnormalities on non-contrast brain computed tomography scans. • The deep learning model demonstrated an average area under the receiver operating characteristic curve of 0.93 across 144 findings and significantly improved radiologist interpretation performance. • The assistance of the comprehensive deep learning model significantly reduced the time required for radiologists to interpret computed tomography scans of the brain.
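
The Matthews correlation coefficient (MCC) mentioned in METHODS summarises a binary confusion matrix in a single value between -1 and 1. The sketch below uses the standard formula with hypothetical counts; how the study binarised its 7-point confidence ratings is not stated in the abstract, so that step is not modelled here.

```python
from math import sqrt

def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is taken as 0 when a marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts for one finding.
mcc = matthews_corrcoef(tp=8, fp=2, tn=8, fn=2)
print(mcc)
```

Unlike accuracy, MCC stays near zero for a reader who labels every scan as negative, which makes it a useful companion to AUC when findings are rare.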


Subjects
Deep Learning , Adolescent , Humans , Radiography , Radiologists , Retrospective Studies , X-Ray Computed Tomography/methods , Adult
9.
Article in English | MEDLINE | ID: mdl-38972630

ABSTRACT

OBJECTIVE: Challenging infrarenal aortic neck characteristics have been associated with an increased risk of type Ia endoleak after endovascular aneurysm repair (EVAR). Short apposition (< 10 mm circumferential shortest apposition length [SAL]) on the first post-operative computed tomography angiography (CTA) has been associated with type Ia endoleak. Therefore, this study aimed to develop a model to predict post-operative SAL in patients with an abdominal aortic aneurysm based on the pre-operative shape. METHODS: A statistical shape model was developed to obtain principal component scores. The dataset comprised patients treated by standard EVAR without complications (n = 93) enriched with patients with a late type Ia endoleak (n = 54). The infrarenal SAL was obtained from the first post-operative CTA and subsequently binarised (< 10 mm and ≥ 10 mm). The principal component scores that were statistically different between the SAL groups were used as input for five classification models, and evaluated by means of leave one out cross validation. Area under the receiver operating characteristic curves (AUC), accuracy, sensitivity, and specificity were determined for each classification model. RESULTS: Of the 147 patients, 24 patients had an infrarenal SAL < 10 mm and 123 patients had a SAL ≥ 10 mm. The gradient boosting model resulted in the highest AUC of 0.77. Using this model, 114 patients (77.6%) were correctly classified; sensitivity (< 10 mm apposition was correctly predicted) and specificity (≥ 10 mm apposition was correctly predicted) were 0.70 and 0.79 based on a threshold of 0.21, respectively. CONCLUSION: A model was developed to predict which patients undergoing EVAR will achieve sufficient graft apposition (≥ 10 mm) in the infrarenal aortic neck based on a statistical shape model of pre-operative CTA data. 
This model can help vascular specialists during the planning phase to accurately identify patients who are unlikely to achieve sufficient apposition after standard EVAR.
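
A minimal sketch of leave-one-out cross-validation combined with a probability threshold, the evaluation scheme described in this abstract. A 3-nearest-neighbour vote on a single hypothetical feature stands in for the gradient boosting model, and the data are invented for illustration; only the 0.21 threshold value is taken from the abstract.

```python
def knn_probability(train, x, k=3):
    """Fraction of positive labels among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return sum(label for _, label in nearest) / k

def leave_one_out(data, threshold, k=3):
    """Hold out each sample once; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]   # every sample except the held-out one
        predicted = 1 if knn_probability(train, x, k) >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: label 1 marks the minority class (short apposition).
data = [(x, 0) for x in (8, 9, 10, 11, 12, 21, 22, 23, 24, 25)] + [(16, 1), (17, 1)]
at_half = leave_one_out(data, threshold=0.5)
at_low = leave_one_out(data, threshold=0.21)
print(at_half, at_low)
```

In this toy set each minority sample receives a predicted probability of 1/3, so the default 0.5 cut-off misses both, while a lower cut-off recovers them. That is the usual motivation for tuning the threshold when one class is much rarer than the other.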

10.
Fish Shellfish Immunol ; 152: 109788, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39053586

ABSTRACT

In the process of screening for probiotic strains, there are no clearly established bacterial phenotypic markers which could be used for the prediction of their in vivo mechanism of action. In this work, we demonstrate for the first time that Machine Learning (ML) methods can be used for accurately predicting the in vivo immunomodulatory activity of probiotic strains based on their cell surface phenotypic features using a snail host-microbe interaction model. A broad range of snail gut presumptive probiotics, including 240 new lactic acid bacterial strains (Lactobacillus, Leuconostoc, Lactococcus, and Enterococcus), were isolated and characterized based on their capacity to withstand snails' gastrointestinal defense barriers, such as the pedal mucus, gastric mucus, gastric juices, and acidic pH, in association with their cell surface hydrophobicity, autoaggregation, and biofilm formation ability. The implemented ML pipeline predicted with high accuracy (88%) strains with a strong capacity to enhance chemotaxis and phagocytic activity of snails' hemolymph cells, while also revealing bacterial autoaggregation and cell surface hydrophobicity as the most important parameters that significantly affect host immune responses. The results show that ML approaches may be useful to derive a predictive understanding of host-probiotic interactions, while also highlighting the use of snails as an efficient animal model for screening presumptive probiotic strains in the light of their interaction with cellular innate immune responses.


Subjects
Machine Learning , Probiotics , Probiotics/pharmacology , Animals , Lactobacillales/physiology , Lactobacillales/immunology , Snails/immunology , Snails/microbiology , Helix Snails/immunology , Helix Snails/physiology , Innate Immunity , Immunomodulation
11.
Clin Chem Lab Med ; 62(2): 293-302, 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-37606251

ABSTRACT

OBJECTIVES: Accumulating evidence argues for a more widespread use of therapeutic drug monitoring (TDM) to support individualized medicine, especially for therapies where toxicity and efficacy are critical issues, such as in oncology. However, development of TDM assays struggles to keep pace with the rapid introduction of new drugs. Therefore, novel approaches for faster assay development are needed that also allow effortless inclusion of newly approved drugs as well as customization to smaller subsets if scientific or clinical situations require. METHODS: We applied and evaluated two machine-learning approaches i.e., a regression-based approach and an artificial neural network (ANN) to retention time (RT) prediction for efficient development of a liquid chromatography mass spectrometry (LC-MS) method quantifying 73 oral antitumor drugs (OADs) and five active metabolites. Individual steps included training, evaluation, comparison, and application of the superior approach to RT prediction, followed by stipulation of the optimal gradient. RESULTS: Both approaches showed excellent results for RT prediction (mean difference ± standard deviation: 2.08 % ± 9.44 % ANN; 1.78 % ± 1.93 % regression-based approach). Using the regression-based approach, the optimum gradient (4.91 % MeOH/min) was predicted with a total run time of 17.92 min. The associated method was fully validated following FDA and EMA guidelines. Exemplary modification and application of the regression-based approach to a subset of 14 uro-oncological agents resulted in a considerably shortened run time of 9.29 min. CONCLUSIONS: Using a regression-based approach, a multi drug LC-MS assay for RT prediction was efficiently developed, which can be easily expanded to newly approved OADs and customized to smaller subsets if required.


Subjects
Antineoplastic Agents , Humans , Liquid Chromatography/methods , Tandem Mass Spectrometry/methods , Antineoplastic Agents/pharmacology , Drug Monitoring/methods , Machine Learning
12.
BMC Womens Health ; 24(1): 393, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978015

ABSTRACT

BACKGROUND: Cervical cancer (CC) is among the most prevalent cancer types among women with the highest prevalence in low- and middle-income countries (LMICs). It is a curable disease if detected early. Machine learning (ML) techniques can aid in early detection and prediction thus reducing screening and treatment costs. This study focused on women living with HIV (WLHIV) in Uganda. Its aim was to identify the best predictors of CC and the supervised ML model that best predicts CC among WLHIV. METHODS: Secondary data that included 3025 women from three health facilities in central Uganda was used. A multivariate binary logistic regression and recursive feature elimination with random forest (RFERF) were used to identify the best predictors. Five models; logistic regression (LR), random forest (RF), K-Nearest neighbor (KNN), support vector machine (SVM), and multi-layer perceptron (MLP) were applied to identify the out-performer. The confusion matrix and the area under the receiver operating characteristic curve (AUC/ROC) were used to evaluate the models. RESULTS: The results revealed that duration on antiretroviral therapy (ART), WHO clinical stage, TPT status, Viral load status, and family planning were commonly selected by the two techniques and thus highly significant in CC prediction. The RF from the RFERF-selected features outperformed other models with the highest scores of 90% accuracy and 0.901 AUC. CONCLUSION: Early identification of CC and knowledge of the risk factors could help control the disease. The RF outperformed other models applied regardless of the selection technique used. Future research can be expanded to include ART-naïve women in predicting CC.


Subjects
HIV Infections , Uterine Cervical Neoplasms , Humans , Female , Uganda/epidemiology , Uterine Cervical Neoplasms/diagnosis , HIV Infections/drug therapy , Adult , Supervised Machine Learning , Middle Aged , Precancerous Conditions/diagnosis , Logistic Models , Algorithms , Support Vector Machine
13.
J Water Health ; 22(8): 1387-1408, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39212277

ABSTRACT

India has been dealing with fluoride contamination of groundwater for the past few decades. Long-term exposure to fluoride can cause skeletal and dental fluorosis. Therefore, an in-depth exploration of fluoride concentrations in different parts of India is desirable. This work employs machine learning algorithms to analyze the fluoride concentrations in five major affected Indian states (Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal). A correlation matrix was used to identify appropriate predictor variables for fluoride prediction. The algorithms used for prediction included K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector classifier (SVC), Gaussian NB, MLP classifier, decision tree classifier, gradient boosting classifier, and soft and hard voting classifiers. The performance of these models was assessed using accuracy, precision, recall, error rate and the receiver operating characteristic curve. As the dataset was skewed, the performance of the models was evaluated before and after resampling. Analysis of the results indicates that the RF model is the best model for predicting fluoride contamination in groundwater in Indian states.
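
The abstract does not say which resampling method was applied to the skewed dataset. As one common option, the sketch below shows random oversampling, which duplicates minority-class rows until the classes are balanced. The rows, labels, and function name are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))   # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical samples: one feature per row, label 1 = above the fluoride limit.
rows = [[0.4], [0.6], [0.7], [0.9], [1.1], [1.2], [2.3], [2.8]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling is normally applied to the training split only, so that duplicated rows cannot appear in both training and test data and inflate the reported scores.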


Subjects
Fluorides , Groundwater , Chemical Water Pollutants , India , Groundwater/analysis , Groundwater/chemistry , Fluorides/analysis , Chemical Water Pollutants/analysis , Supervised Machine Learning , Environmental Monitoring/methods , Algorithms
14.
J Med Internet Res ; 26: e55937, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39141911

ABSTRACT

BACKGROUND: Nowadays, social media plays a crucial role in disseminating information about cancer prevention and treatment. A growing body of research has focused on assessing access and communication effects of cancer information on social media. However, there remains a limited understanding of the comprehensive presentation of cancer prevention and treatment methods across social media platforms. Furthermore, research comparing the differences between medical social media (MSM) and common social media (CSM) is also lacking. OBJECTIVE: Using big data analytics, this study aims to comprehensively map the characteristics of cancer treatment and prevention information on MSM and CSM. This approach promises to enhance cancer coverage and assist patients in making informed treatment decisions. METHODS: We collected all posts (N=60,843) from 4 medical WeChat official accounts (accounts with professional medical backgrounds, classified as MSM in this paper) and 5 health and lifestyle WeChat official accounts (accounts with nonprofessional medical backgrounds, classified as CSM in this paper). We applied latent Dirichlet allocation topic modeling to extract cancer-related posts (N=8427) and identified 6 cancer themes separately in CSM and MSM. After manually labeling posts according to our codebook, we used a neural-based method for automated labeling. Specifically, we framed our task as a multilabel task and utilized different pretrained models, such as Bidirectional Encoder Representations from Transformers (BERT) and Global Vectors for Word Representation (GloVe), to learn document-level semantic representations for labeling. RESULTS: We analyzed a total of 4479 articles from MSM and 3948 articles from CSM related to cancer. Among these, 35.52% (2993/8427) contained prevention information and 44.43% (3744/8427) contained treatment information. Themes in CSM were predominantly related to lifestyle, whereas MSM focused more on medical aspects. 
The most frequently mentioned prevention measures were early screening and testing, healthy diet, and physical exercise. MSM mentioned vaccinations for cancer prevention more frequently compared with CSM. Both types of media provided limited coverage of radiation prevention (including sun protection) and breastfeeding. The most mentioned treatment measures were surgery, chemotherapy, and radiotherapy. Compared with MSM (1137/8427, 13.49%), CSM (2993/8427, 35.52%) focused more on prevention. CONCLUSIONS: The information about cancer prevention and treatment on social media revealed a lack of balance. The focus was primarily limited to a few aspects, indicating a need for broader coverage of prevention measures and treatments in social media. Additionally, the study's findings underscored the potential of applying machine learning to content analysis as a promising research approach for mapping key dimensions of cancer information on social media. These findings hold methodological and practical significance for future studies and health promotion.


Subjects
Machine Learning , Neoplasms , Social Media , Social Media/statistics & numerical data , Humans , Neoplasms/prevention & control , Neoplasms/therapy , China
15.
Neurocrit Care ; 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107660

ABSTRACT

Clinical prediction models serve as valuable instruments for assessing the risk of crucial outcomes and facilitating decision-making in clinical settings. Constructing these models requires nuanced analytical decisions and expertise informed by the current statistical literature. Access and thorough understanding of such literature may be limited for neurocritical care physicians, which may hinder the interpretation of existing predictive models. The present emphasis is on narrowing this knowledge gap by providing neurocritical care specialists with methodological guidance for interpreting predictive models in neurocritical care. Presented are the statistical learning principles integral to constructing a model predicting hospital mortality (nonsurvival during hospitalization) in patients with moderate and severe blunt traumatic brain injury using components of the IMPACT-Core model. Discussion encompasses critical elements such as model flexibility, hyperparameter selection, data imbalance, cross-validation, model assessment (discrimination and calibration), prediction instability, and probability thresholds. The intricate interplay among these components, the data set, and the clinical context of neurocritical care is elaborated. Leveraging this comprehensive exploration of statistical learning can enhance comprehension of articles encompassing model generation, tailored clinical care, and, ultimately, better interpretation and clinical applicability of predictive models.

16.
Article in English | MEDLINE | ID: mdl-39154849

ABSTRACT

BACKGROUND: Reduced bone density is recognized as a predictor for potential complications in reverse shoulder arthroplasty (RSA). While humeral and glenoid planning based on preoperative computed tomography (CT) scans assists in implant selection and positioning, reproducible methods for quantifying the patients' bone density are currently not available. The purpose of this study was to perform bone density analyses including patient-specific calibration in an RSA cohort based on preoperative CT imaging. It was hypothesized that preoperative CT bone density measures would provide objective quantification of the patients' humeral bone quality. METHODS: This study consisted of three parts: (1) analysis of a patient-specific calibration method in cadaveric CT scans, (2) retrospective application in a clinical RSA cohort, and (3) clustering and classification with machine learning models. Forty cadaveric shoulders were scanned in a clinical CT and compared regarding calibration with density phantoms, with air, muscle, and fat (patient-specific), or with standard Hounsfield units. Post-scan patient-specific calibration was used to improve the extraction of three-dimensional regions of interest for retrospective bone density analysis in a clinical RSA cohort (n=345). Machine learning models were used to improve the clustering (Hierarchical Ward) and classification (Support Vector Machine (SVM)) of low bone densities in the respective patients. RESULTS: The patient-specific calibration method demonstrated improved accuracy with excellent intraclass correlation coefficients (ICC) for cylindrical cancellous bone densities (ICC>0.75). Clustering partitioned the training data set into a high-density subgroup consisting of 96 patients and a low-density subgroup consisting of 146 patients, showing significant differences between these groups. The SVM showed optimized prediction accuracy of low and high bone densities compared to conventional statistics in the training (accuracy=91.2%; AUC=0.967) and testing (accuracy=90.5%; AUC=0.958) data sets. CONCLUSION: Preoperative CT scans can be used to quantify the proximal humeral bone quality in patients undergoing RSA. The use of machine learning models and patient-specific calibration on bone mineral density demonstrated that multiple 3D bone density scores improved the accuracy of objective preoperative bone quality assessment. The trained model could provide preoperative information to surgeons treating patients with potentially poor bone quality.

17.
Sensors (Basel) ; 24(16)2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39205012

ABSTRACT

The three Ground Reaction Force (GRF) components can be estimated using pressure insole sensors. In this paper, we compare the accuracy of estimating GRF components for both feet using six methods: three Deep Learning (DL) methods (Artificial Neural Network, Long Short-Term Memory, and Convolutional Neural Network) and three Supervised Machine Learning (SML) methods (Least Squares, Support Vector Regression, and Random Forest (RF)). Data were collected from nine subjects across six activities: normal and slow walking, static with and without carrying a load, and two Manual Material Handling activities. This study has two main contributions: first, the estimation of GRF components (Fx, Fy, and Fz) during the six activities, two of which have never been studied; second, the comparison of the accuracy of GRF component estimation between the six methods for each activity. RF provided the most accurate estimation for static situations, with mean RMSE values of RMSE_Fx = 1.65 N, RMSE_Fy = 1.35 N, and RMSE_Fz = 7.97 N, against mean absolute values measured by the force plate (reference) of Fx = 14.10 N, Fy = 3.83 N, and Fz = 397.45 N. In our study, RF, an SML method, surpassed the DL methods tested.
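As a reading aid for the figures in this abstract, the sketch below shows how a per-component RMSE is computed and how the reported RMSE values compare with the reported mean absolute reference forces. Expressing the RMSE as a percentage of the mean absolute reference value is an assumption made here for illustration and is not a metric stated in the abstract. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def rmse(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between estimated and reference samples."""
    if len(estimated) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mean_absolute(values: Sequence[float]) -> float:
    """Mean of the absolute values of a signal."""
    return sum(abs(v) for v in values) / len(values)


# Values reported in the abstract for static situations with RF (newtons).
REPORTED_RMSE = {"Fx": 1.65, "Fy": 1.35, "Fz": 7.97}
REPORTED_MEAN_ABS_REFERENCE = {"Fx": 14.10, "Fy": 3.83, "Fz": 397.45}


def relative_error_percent(rmse_by_axis: Dict[str, float],
                           reference_by_axis: Dict[str, float]) -> Dict[str, float]:
    """RMSE as a percentage of the mean absolute reference value.

    Assumption for illustration only; the abstract does not report it.
    """
    return {axis: 100.0 * rmse_by_axis[axis] / reference_by_axis[axis]
            for axis in rmse_by_axis}


if __name__ == "__main__":
    ratios = relative_error_percent(REPORTED_RMSE, REPORTED_MEAN_ABS_REFERENCE)
    for axis in ("Fx", "Fy", "Fz"):
        print(axis, round(ratios[axis], 1), "%")
```

Under this normalization, the vertical component Fz has by far the smallest relative error (about 2%), while the errors for Fx and Fy are about 12% and 35% of their much smaller reference magnitudes.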


Subjects
Deep Learning , Pressure , Supervised Machine Learning , Humans , Male , Walking/physiology , Neural Networks, Computer , Shoes , Adult , Female , Foot/physiology , Biomechanical Phenomena/physiology , Young Adult
18.
J Environ Manage ; 369: 122250, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39213853

ABSTRACT

High-diversity seabed habitats, such as shellfish aggregations, play a significant role in marine ecosystem sustainability but are susceptible to bottom disturbance induced by anthropogenic activities. Regular monitoring of these habitats with effective mapping methods is therefore essential. The multibeam echosounder (MBES) has been widely used in recent decades for seabed characterization because it is non-destructive and offers extensive spatial coverage compared to traditional methods like bottom sampling. Nevertheless, bottom sampling remains essential to link ground truth with acoustic seabed classification. Using seabed samples and MBES measurements, machine learning techniques are commonly employed to model their relationships and generate classification maps of an extended seabed. However, limited ground truth data, resulting from constraints in regulations, budget, or time, may impede the development of robust machine learning models. To address this challenge, we applied a semi-supervised machine learning method to classify seabed sediments of a blue mussel (Mytilus edulis) cultivation area in the Oosterschelde, the Netherlands. We utilized nine boxcore samples to generate pseudo-labels on MBES data. These pseudo-labels enlarged the training data size, facilitated the training of three machine learning algorithms (Gradient Boosting, Random Forest, and Support Vector Machine), and helped to classify the study site into mussel and non-mussel areas. We found the geomorphological and backscatter-related features to be complementary for mussel culture detection. Our classification results were shown to be effective through expert knowledge of this cultivation area and brought insights for future research on natural mussel habitats.
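The idea of enlarging a small labeled set with pseudo-labels can be sketched as follows. This is a generic illustration and not the authors' pipeline: the abstract does not say how the pseudo-labels were generated, so a nearest-centroid rule with a confidence margin is swapped in here as an assumption. The two features, the sample coordinates, and the `min_margin` parameter are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def centroids(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, Point]:
    """Mean feature vector of each class."""
    sums: Dict[str, Tuple[float, float, int]] = {}
    for (x, y), lab in zip(points, labels):
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


def predict_with_margin(point: Point, cents: Dict[str, Point]) -> Tuple[str, float]:
    """Nearest-centroid label plus a 0..1 margin (needs at least two classes)."""
    ranked = sorted((math.dist(point, c), lab) for lab, c in cents.items())
    (d1, best), (d2, _) = ranked[0], ranked[1]
    margin = (d2 - d1) / (d2 + d1) if (d1 + d2) > 0 else 0.0
    return best, margin


def pseudo_label(labeled_pts: Sequence[Point], labels: Sequence[str],
                 unlabeled_pts: Sequence[Point],
                 min_margin: float = 0.5) -> Tuple[List[Point], List[str]]:
    """Add confidently classified unlabeled points to the training set."""
    cents = centroids(labeled_pts, labels)
    train_pts, train_labels = list(labeled_pts), list(labels)
    for p in unlabeled_pts:
        lab, margin = predict_with_margin(p, cents)
        if margin >= min_margin:  # ambiguous points stay unlabeled
            train_pts.append(p)
            train_labels.append(lab)
    return train_pts, train_labels


# Hypothetical, scaled features, e.g. (backscatter, seabed roughness).
LABELED = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.0)]
LABELS = ["mussel", "mussel", "mussel", "non_mussel", "non_mussel", "non_mussel"]
UNLABELED = [(0.85, 0.95), (0.15, 0.05), (0.5, 0.5)]

if __name__ == "__main__":
    pts, labs = pseudo_label(LABELED, LABELS, UNLABELED)
    print(len(pts), "training samples after pseudo-labeling")
```

The enlarged training set would then be passed to whatever supervised classifier is used downstream. The point halfway between the two classes is deliberately left out, since a wrong pseudo-label would propagate its error into the final model.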


Subjects
Ecosystem , Animals , Environmental Monitoring/methods , Supervised Machine Learning , Netherlands , Bivalvia , Machine Learning , Mytilus edulis
19.
Exp Appl Acarol ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39177713

ABSTRACT

The two-spotted spider mite (Tetranychus urticae) is an important greenhouse pest. In cucumbers, heavy infestations lead to the complete loss of leaf assimilation surface, resulting in plant death. Symptoms caused by spider mite feeding alter the light reflection of leaves and could therefore be optically detected. Machine learning methods have already been employed to analyze spectral information in order to differentiate between healthy and spider mite-infested leaves of crops such as tomatoes or cotton. In this study, machine learning methods were applied to cucumbers. Hyperspectral data of leaves were recorded under controlled conditions. Effective wavelengths were identified using three feature selection methods. Subsequently, three supervised machine learning algorithms were used to classify healthy and spider mite-infested leaves. All combinations of feature selection and classification methods yielded an accuracy of over 80%, even when using ten or five wavelengths. These results suggest that machine learning methods are a powerful tool for image-based detection of spider mites in cucumbers. In addition, due to the limited number of wavelengths, there is also substantial potential for practical application.

20.
Sante Publique ; 35(6): 65-85, 2024 02 23.
Article in French | MEDLINE | ID: mdl-38388403

ABSTRACT

Introduction: Receiving a disability pension has morbid (physical and psychological) and social (fall in income) implications for the person. It also has economic consequences for society, with increasing expenses since 2011 (+4.9% on average per year). Investing in preventive actions against the loss of the ability to work should limit these consequences, but it requires targeting people at risk. The development of artificial intelligence opens up prospects in this regard. Purpose of the Research: To target, using supervised machine learning methods, those people with a high probability of becoming eligible for the disability pension over the course of the year based on their socio-demographic and medical characteristics (pathologies, work stoppages, drugs taken, and medical procedures). Method: Among the beneficiaries of the French public welfare system aged 20-64 in 2017, we compared the socio-demographic and medical characteristics between 2014 and 2016 of those who received a disability pension in 2017 and not before, and those who did not receive a disability pension from 2014 to 2017. The determination of the boundary between these two groups was tested using logistic regression, decision trees, random forests, naive Bayes classifiers, and support vector machines. The models' performance was compared with respect to accuracy, precision, sensitivity, specificity, and AUC (area under the curve). Finally, the predictive power of each factor was also measured by AUC. Results: The boosted logistic regression had the best performance for three of the five criteria, but low sensitivity. The best sensitivity was obtained with the support vector machines, with an accuracy close to that of the boosted logistic regression, but a lower precision and specificity. Random forests offered the best discriminatory ability. The naive Bayes classifier had the worst performance. The most predictive factors in becoming eligible for the disability pension were having 30 days or more off sick in 2014, 2015, and 2016 and being aged 55 to 64. Conclusion: Supervised learning methods appear relevant for identifying people with the highest probability of becoming eligible for the disability pension and, more broadly, for steering public and social policies.
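Four of the five criteria named in this abstract (accuracy, precision, sensitivity, specificity) follow directly from a confusion matrix, and a short sketch shows how a model can score well on accuracy and specificity while having low sensitivity when the outcome is rare. The counts below are invented for illustration and are not results from the study. The class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive = becomes eligible)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Invented counts (hypothetical): 100 true positives among 1000 people.
EXAMPLE = ConfusionMatrix(tp=30, fp=20, fn=70, tn=880)

if __name__ == "__main__":
    for name in ("accuracy", "precision", "sensitivity", "specificity"):
        print(name, round(getattr(EXAMPLE, name), 3))
```

With these invented counts, the accuracy is 0.91 even though only 30% of the true positives are detected. For this reason, accuracy alone says little about how well a model targets a rare outcome. The fifth criterion, AUC, cannot be derived from a single confusion matrix because it summarizes performance across all thresholds.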


Introduction: Le recours à la pension d'invalidité a des implications morbides (physiques ou psychiques) et sociales (baisse du revenu). Il a aussi des conséquences économiques pour la société, avec des dépenses croissantes depuis 2011 (+4,9 % en moyenne par année). Prévenir la perte de la capacité à travailler devrait permettre de limiter ces conséquences, mais nécessite de cibler les personnes à risque. Le développement des méthodes d'intelligence artificielle ouvre des perspectives en ce sens. But de l'étude: Cibler les personnes ayant une « forte » probabilité de devenir bénéficiaires d'une pension d'invalidité dans l'année au regard de leurs caractéristiques sociodémographiques et médicales (pathologies, arrêts de travail, médicaments et actes médicaux) à partir de méthodes d'apprentissage automatique supervisé. Méthodes: Parmi les bénéficiaires du régime général âgés de 21 à 64 ans en 2017, comparaison des caractéristiques de 2014 à 2016 entre les nouveaux bénéficiaires d'une pension d'invalidité en 2017 et ceux n'en bénéficiant pas. La détermination de la frontière entre ces deux groupes a été testée à l'aide de la régression logistique, des arbres de décision, des forêts aléatoires, de la classification naïve bayésienne et des séparateurs à vaste marge. Les performances des modèles ont été comparées au regard de la justesse, la précision, la sensibilité, la spécificité et l'AUC (Area Under the Curve). Le pouvoir prédictif de chaque facteur est estimé à partir de l'AUC. Résultats: La régression logistique boostée avait les meilleures performances sur trois des cinq critères retenus, mais une faible sensibilité. La meilleure sensibilité était obtenue avec les séparateurs à vaste marge, avec une justesse proche de la régression logistique boostée mais une précision et une spécificité inférieures. Les forêts aléatoires offraient la meilleure capacité discriminatoire. Les facteurs les plus prédictifs du risque de passer en invalidité étaient le bénéfice d'au moins 30 jours d'indemnités journalières pour maladie en 2014, 2015 et 2016 et le fait d'être âgé de 55 à 64 ans. Conclusion: Les méthodes d'apprentissage supervisé sont apparues pertinentes pour le ciblage des personnes les plus à risque de recourir à la pension d'invalidité et, plus largement, pour le pilotage d'autres prestations sociales.


Subjects
Artificial Intelligence , Pensions , Humans , Bayes Theorem , Machine Learning , Risk Factors