ABSTRACT
Recent studies have highlighted the importance of machine learning algorithms in research on identifying antibiotic resistance. In this study, we emphasize the importance of building solid machine learning foundations to differentiate antimicrobial resistance among microorganisms. Using advanced machine learning algorithms, we established a methodology capable of analyzing the FTIR structural profiles of samples of Streptococcus pyogenes and Streptococcus mutans (Gram-positive), as well as Escherichia coli and Klebsiella pneumoniae (Gram-negative), demonstrating its applicability across different microorganisms. The analysis focuses on specific biomolecules (carbohydrates, fatty acids, and proteins) in the FTIR spectra, providing a multidimensional database that transcends microbial variability. The results highlight the ability of the method to consistently identify resistance patterns regardless of the Gram classification and species of the bacteria, reinforcing the premise that the structural characteristics identified are shared among the microorganisms tested. By validating this approach in four distinct species, our study demonstrates the versatility and precision of the methodology and supports the development of an innovative protocol for the rapid and safe identification of antimicrobial resistance. This advance is crucial for optimizing treatment strategies and limiting the spread of resistance. It also emphasizes the relevance of specialized machine learning bases for effectively differentiating resistance profiles in Gram-negative and Gram-positive bacteria. The results have high potential for application in clinical procedures.
ABSTRACT
The current gold standard for detecting Chikungunya Virus (CHIKV) is an invasive and costly molecular biology procedure. Consequently, the search for a non-invasive, more cost-effective, reagent-free, and sustainable method for the detection of CHIKV infection is imperative for public health. The portable attenuated total reflection Fourier-transform infrared (ATR-FTIR) platform has been applied to discriminate systemic diseases using saliva; however, its salivary diagnostic application in viral diseases is less explored. The study aimed to identify unique vibrational modes of salivary infrared profiles to detect CHIKV infection using chemometrics and artificial intelligence algorithms. We intradermally challenged interferon-gamma gene knockout C57BL/6 mice with CHIKV (20 µl, 1 × 10^5 PFU/ml, n = 6) or vehicle (20 µl, n = 7). Saliva and serum samples were collected on day 3 (the peak of viremia). CHIKV infection was confirmed by real-time PCR in the serum of CHIKV-infected mice. The best pattern classification showed a sensitivity of 83%, specificity of 86%, and accuracy of 85% using support vector machine (SVM) algorithms. Our results suggest that the salivary ATR-FTIR platform can discriminate CHIKV infection, with the potential to be applied as a non-invasive, sustainable, and cost-effective detection tool for this emerging disease.
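The reported sensitivity, specificity, and accuracy can be illustrated with a short worked calculation. The sketch below assumes a confusion matrix of 5 true positives, 1 false negative, 6 true negatives, and 1 false positive; these counts are not stated in the abstract and are only one combination consistent with the group sizes (n = 6 infected, n = 7 vehicle) and the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # infected animals correctly flagged
    specificity = tn / (tn + fp)            # vehicle animals correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts (hypothetical, inferred from n = 6 and n = 7 in the abstract).
sens, spec, acc = classification_metrics(tp=5, fn=1, tn=6, fp=1)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
# prints: sensitivity=83% specificity=86% accuracy=85%
```

With groups this small, a single misclassified animal shifts each metric by roughly 14 to 17 percentage points, which is worth keeping in mind when reading the percentages.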
Subjects
Algorithms , Artificial Intelligence , Chikungunya Fever , Chikungunya virus , Saliva , Animals , Saliva/virology , Chikungunya Fever/diagnosis , Chikungunya Fever/virology , Chikungunya virus/isolation & purification , Chikungunya virus/genetics , Mice , Fourier Transform Infrared Spectroscopy/methods , Inbred C57BL Mice , Knockout Mice
ABSTRACT
Machine learning classification approaches were used to discriminate a fishy off-flavour identified in beef with health-enhanced fatty acid profiles. The random forest approach outperformed (P < 0.001; receiver operating characteristic curve: 99.8 %, sensitivity: 99.9 % and specificity: 93.7 %) the logistic regression, partial least-squares discriminant analysis and support vector machine (linear and radial) approaches, correctly classifying 100 % and 82 % of the fishy and non-fishy meat samples, respectively. The random forest algorithm identified 20 volatile compounds responsible for the discrimination of fishy from non-fishy meat samples. Among those, seven volatile compounds (pentadecane, octadecane, γ-dodecalactone, dodecanal, (E,E)-2,4-heptadienal, 2-heptanone, and ethylbenzene) were selected as significant contributors to the fishy off-flavour fingerprint, all being related to lipid oxidation. This fishy off-flavour fingerprint could facilitate the rapid monitoring of beef with enhanced healthy fatty acids to avoid consumer dissatisfaction due to fishy off-flavour.
Subjects
Fatty Acids , Machine Learning , Red Meat , Volatile Organic Compounds , Animals , Cattle , Volatile Organic Compounds/analysis , Red Meat/analysis , Fatty Acids/analysis , Taste
ABSTRACT
INTRODUCTION AND OBJECTIVES: The increasing incidence of hepatocellular carcinoma (HCC) in China is an urgent issue, necessitating early diagnosis and treatment. This study aimed to develop personalized predictive models by combining machine learning (ML) technology with demographic, medical history, and noninvasive biomarker data. These models can enhance the decision-making capabilities of physicians for HCC in hepatitis B virus (HBV)-related cirrhosis patients with low serum alpha-fetoprotein (AFP) levels. PATIENTS AND METHODS: A total of 6,980 patients treated between January 2012 and December 2018 were included. Pre-treatment laboratory tests and clinical data were obtained. The significant risk factors for HCC were identified, and the relative risk of each variable affecting its diagnosis was calculated using ML and univariate regression analysis. The data set was then randomly partitioned into training (80 %) and validation (20 %) sets to develop the ML models. RESULTS: Twelve independent risk factors for HCC were identified using Gaussian naïve Bayes, extreme gradient boosting (XGBoost), random forest, and least absolute shrinkage and selection operator regression models. Multivariate analysis revealed that male sex, age >60 years, alkaline phosphatase >150 U/L, AFP >25 ng/mL, carcinoembryonic antigen >5 ng/mL, and fibrinogen >4 g/L were risk factors, whereas hypertension, calcium <2.25 mmol/L, potassium ≤3.5 mmol/L, direct bilirubin >6.8 µmol/L, hemoglobin <110 g/L, and glutamic-pyruvic transaminase >40 U/L were protective factors in HCC patients. Based on these factors, a nomogram was constructed, showing an area under the curve (AUC) of 0.746 (sensitivity = 0.710, specificity = 0.646), which was significantly higher than the AFP AUC of 0.658 (sensitivity = 0.462, specificity = 0.766). Compared with several ML algorithms, the XGBoost model had an AUC of 0.832 (sensitivity = 0.745, specificity = 0.766) and an independent validation AUC of 0.829 (sensitivity = 0.766, specificity = 0.737), making it the top-performing model in both sets. The external validation results support the accuracy of the XGBoost model. CONCLUSIONS: The proposed XGBoost model demonstrated a promising ability for individualized prediction of HCC in HBV-related cirrhosis patients with low-level AFP.
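The AUC values quoted above can be read as the probability that a randomly chosen HCC case receives a higher model score than a randomly chosen non-case. A minimal sketch of that rank-based calculation follows; the labels and scores are hypothetical toy values, not study data, and the function name `roc_auc` is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = HCC, 0 = no HCC; scores are model outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a cohort of several thousand patients would normally use a sorted-rank implementation from an established library.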
Subjects
Hepatocellular Carcinoma , Liver Cirrhosis , Liver Neoplasms , Machine Learning , alpha-Fetoproteins , Humans , Hepatocellular Carcinoma/blood , Hepatocellular Carcinoma/virology , Hepatocellular Carcinoma/epidemiology , Hepatocellular Carcinoma/diagnosis , Hepatocellular Carcinoma/etiology , Liver Neoplasms/blood , Liver Neoplasms/virology , Liver Neoplasms/epidemiology , Liver Neoplasms/etiology , Liver Neoplasms/diagnosis , alpha-Fetoproteins/analysis , alpha-Fetoproteins/metabolism , Male , Female , Middle Aged , Liver Cirrhosis/blood , Liver Cirrhosis/virology , Liver Cirrhosis/diagnosis , Risk Assessment , Risk Factors , China/epidemiology , Chronic Hepatitis B/complications , Chronic Hepatitis B/blood , Predictive Value of Tests , Adult , Nomograms , Tumor Biomarkers/blood , Hepatitis B/complications , Hepatitis B/blood , Hepatitis B/diagnosis , Aged , Retrospective Studies
ABSTRACT
BACKGROUND: Colorectal cancer has a high incidence and mortality rate due to a low rate of early diagnosis. Therefore, efficient diagnostic methods are urgently needed. PURPOSE: This study assesses the diagnostic effectiveness of the serum tumor markers Carbohydrate Antigen 19-9 (CA19-9), Carcinoembryonic Antigen (CEA), Alpha-fetoprotein (AFP), and Cancer Antigen 125 (CA125) for colorectal cancer (CRC) and investigates a machine learning-based diagnostic model incorporating these markers with blood biochemical indices for improved CRC detection. METHOD: Data from 800 CRC patients and 697 controls were collected between January 2019 and December 2021; data from 52 patients and 63 controls attending the same hospital in 2022 served as an external validation set. The markers' effectiveness was analyzed individually and collectively, using metrics such as ROC curve AUC and F1 score. Variables chosen through backward regression, including demographics and blood tests, were tested on six machine learning models using these metrics. RESULT: In the case group, the levels of CEA, CA19-9, and CA125 were higher than those in the control group. Combining these with a fourth serum marker significantly improved predictive efficacy over any single marker alone, achieving an Area Under the Curve (AUC) value of 0.801. Using backward stepwise regression, 17 variables were selected for evaluation in six machine learning models. Among these models, the Gradient Boosting Machine (GBM) was the top performer in the training set, test set, and external validation set, with an AUC value above 0.9. CONCLUSION: Machine learning models integrating tumor markers and blood indices offer superior CRC diagnostic accuracy, potentially enhancing clinical practice.
ABSTRACT
FTIR (Fourier transform infrared spectroscopy) is an analytical technique based on the absorption of infrared radiation. FTIR can also be used to characterize the profiles of biomolecules in bacterial cells, which is useful for differentiating bacteria. Because different bacterial species have different molecular compositions, each species, and even each bacterial strain, yields a unique FTIR spectrum. Building on this tool, we developed a methodology aimed at refining the analysis and classification of FTIR absorption spectra obtained from samples of Staphylococcus aureus through the implementation of machine learning algorithms. In the first stage, a system comprising four specified groups, Control, Amoxicillin induced (AMO), Gentamicin induced (GEN), and Erythromycin induced (ERY), was analyzed. In the second stage, five hidden samples were identified and correctly classified as with or without resistance to the induced antibiotics. All analyses were performed on five hundred spectra in three spectral windows: Carbohydrates, Fatty Acids, and Proteins. The protocol for acquiring spectral data from antibiotic-resistant bacteria via FTIR spectroscopy developed by Soares et al. was used here because it has demonstrated high accuracy and sensitivity. The present study focuses on the prediction of antibiotic-induced samples through hierarchical cluster analysis (HCA), principal component analysis (PCA), and the calculation of confusion matrices (CMs) applied to the FTIR absorption spectra data. The main objective of the data analysis process developed here is to understand the intrinsic behavior of S. aureus samples within the analyzed regions of the FTIR absorption spectra. The CM calculations yielded accuracies of 0.7 to 1 and high values of sensitivity and specificity for the species identification. Such results provide important information on antibiotic resistance in samples of S. aureus for potential application in the detection of antibiotic resistance in clinical use.
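For a four-group problem such as Control, AMO, GEN, and ERY, per-group sensitivity and specificity are usually read off the confusion matrix one group at a time (one-vs-rest). The sketch below shows that calculation under stated assumptions: the matrix values are hypothetical and are not the study's results, and the helper names are illustrative.

```python
CLASSES = ["Control", "AMO", "GEN", "ERY"]

# Hypothetical confusion matrix: rows = true group, columns = predicted group.
toy_cm = [
    [10, 0, 0, 0],
    [1, 8, 1, 0],
    [0, 2, 7, 1],
    [0, 0, 0, 10],
]


def overall_accuracy(cm):
    """Fraction of all samples that fall on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def one_vs_rest(cm, k):
    """Return (sensitivity, specificity) for class index k against all others."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # true k predicted as something else
    fp = sum(row[k] for row in cm) - tp     # other groups predicted as k
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


toy_accuracy = overall_accuracy(toy_cm)
for idx, name in enumerate(CLASSES):
    sens, spec = one_vs_rest(toy_cm, idx)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting the per-group values alongside overall accuracy shows which antibiotic-induced groups are confused with each other, which a single accuracy figure hides.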
ABSTRACT
Scoliosis is a disease estimated to affect more than 8% of adults in the United States. It is diagnosed radiographically by manual measurement of the angle between the maximally tilted vertebrae on a radiograph (ie, the Cobb angle). However, these measurements are time-consuming, limiting their use in scoliosis surgical planning and postoperative monitoring. In this retrospective study, a pipeline (using the SpineTK architecture) was developed to automatically measure Cobb angles; it was trained, validated, and tested on 1310 anterior-posterior images obtained with a low-dose stereoradiographic scanning system and radiographs obtained in patients with suspected scoliosis. The images were obtained at six centers (2005-2020). The algorithm measured Cobb angles on hold-out internal (n = 460) and external (n = 161) test sets with less than 2° error (intraclass correlation coefficient, 0.96) compared with ground truth measurements by two experienced radiologists. Measurements, produced in less than 0.5 second, did not differ significantly (P = .05 cutoff) from ground truth measurements, regardless of the presence or absence of surgical hardware (P = .80), age (P = .58), sex (P = .83), body mass index (P = .63), scoliosis severity (P = .44), or image type (low-dose stereoradiographic image vs radiograph; P = .51) in the patient. These findings suggest that the algorithm is highly robust across different clinical characteristics. Given its automated, rapid, and accurate measurements, this network may be used for monitoring scoliosis progression in patients. Keywords: Cobb Angle, Convolutional Neural Network, Deep Learning Algorithms, Pediatrics, Machine Learning Algorithms, Scoliosis, Spine. Supplemental material is available for this article. © RSNA, 2023.
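The Cobb angle itself is a simple geometric quantity once the vertebral endplates have been located: it is the angle between the two most oppositely tilted endplate lines. The sketch below illustrates only that final geometric step, assuming each endplate is given as a (left, right) pair of image points with the right point having the larger x coordinate. It is not the SpineTK pipeline, and the coordinates are hypothetical.

```python
import math


def endplate_tilt_deg(left, right):
    """Signed tilt of an endplate line relative to horizontal, in degrees."""
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    return math.degrees(math.atan2(dy, dx))


def cobb_angle_deg(endplates):
    """Angle between the most positively and most negatively tilted endplates."""
    tilts = [endplate_tilt_deg(left, right) for left, right in endplates]
    return max(tilts) - min(tilts)


# Hypothetical endplates, ordered from top to bottom of the curve.
endplates = [
    ((0.0, 0.0), (10.0, 2.0)),     # tilted one way
    ((0.0, 10.0), (10.0, 10.0)),   # level
    ((0.0, 20.0), (10.0, 17.0)),   # tilted the other way
]
angle = cobb_angle_deg(endplates)
print(f"Cobb angle: {angle:.1f} degrees")
```

Because the result is a difference of tilts, it does not depend on whether the image y axis points up or down.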
ABSTRACT
The purpose of this research was to evaluate performance of an energy-dispersive X-ray fluorescence (XRF) sensor to classify soybean based on protein content. The hypothesis was that sulfur signals and other XRF spectral features can be used as proxies to infer soybean protein content. Sample preparation and equipment settings to optimize detection of S and other specific emission lines were tested for this application. A logistic regression model for classifying soybean as high- or low-protein was developed based on XRF spectra and protein contents. Additionally, the model was validated with an independent set of samples. Global accuracies of the method were 0.83 (training set) and 0.81 (test set) and the corresponding kappa indices were 0.66 and 0.61, respectively. These numbers indicated satisfactory performance of the sensor, suggesting that XRF spectral features can be applied for screening protein content in soybean.
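The kappa index reported alongside accuracy corrects for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e the chance agreement implied by the row and column totals. The sketch below computes it from a confusion matrix. The 2 x 2 counts are hypothetical, chosen only to be consistent with the reported training accuracy of 0.83 and kappa of 0.66; the abstract does not give the actual counts.

```python
def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = true, cols = predicted)."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n_classes)) / total
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(n_classes)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: rows = true (high, low protein), cols = predicted.
toy_cm = [
    [42, 8],
    [9, 41],
]
toy_kappa = cohen_kappa(toy_cm)  # accuracy 0.83, chance agreement 0.50
print(f"kappa = {toy_kappa:.2f}")
# prints: kappa = 0.66
```

For two balanced classes the chance agreement is 0.5, so kappa is simply twice the accuracy minus one, which explains why 0.83 accuracy corresponds to a kappa near 0.66.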
Subjects
Glycine max , X-Ray Emission Spectrometry/methods , X-Rays
ABSTRACT
Predictive models are statistical representations that indicate, based on the analysis of historical data, the probability that a given phenomenon will be triggered in the future. In geosciences, such models have been essential to predict the occurrence of adverse phenomena commonly associated with environmental disasters, such as gully erosion. Therefore, this paper presents a method for producing gully erosion predictive models based on geoenvironmental data and machine learning techniques. The method was tested in a region of approximately 40,000 km² in southeastern Brazil, comparing the predictive performance of four models designed with different machine learning algorithms. The results demonstrated that the technique is capable of producing models with high predictive ability, with emphasis on the random forest algorithm, which, in addition to having achieved the highest levels of accuracy, also produced highly realistic maps for the study area.
• The method is straightforward and may be applied to predict other geological processes.
• The application of the method does not require knowledge of a programming language.
• The models produced achieved high predictive performance.
ABSTRACT
Late blight (LB) caused by the oomycete Phytophthora infestans is one of the most important biotic constraints for potato production worldwide. This study assessed 508 accessions (79 wild potato species and 429 landraces from a cultivated core collection) held at the International Potato Center genebank for resistance to LB. One P. infestans isolate belonging to the EC-1 lineage, which is currently the predominant type of P. infestans in Peru, Ecuador, and Colombia, was used in whole plant assays under greenhouse conditions. Novel sources of resistance to LB were found in accessions of Solanum albornozii, S. andreanum, S. lesteri, S. longiconicum, S. morelliforme, S. stenophyllidium, S. mochiquense, S. cajamarquense, and S. huancabambense. All of these species are endemic to South America and thus could provide novel sources of resistance for potato breeding programs. We found that the level of resistance to LB in wild species and potato landraces cannot be predicted from altitude and bioclimatic variables of the locations where the accessions were collected. The high percentage (73%) of potato landraces susceptible to LB in our study suggests the importance of implementing disease control measures, including planting susceptible genotypes in less humid areas and seasons or switching to genotypes identified as resistant. In addition, this study points out a high risk of genetic erosion in potato biodiversity at high altitudes of the Andes due to susceptibility to LB in the native landraces, which has been exacerbated by climatic change that favors the development of LB in those regions. Copyright © 2022 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Subjects
Phytophthora infestans , Solanum tuberosum , Solanum , Phytophthora infestans/genetics , Plant Breeding , Plant Diseases/genetics , Solanum tuberosum/genetics
ABSTRACT
In sucker-rod pumping wells, several problems can go unnoticed because operating conditions and sensor faults are not diagnosed early. These problems can increase downtime and production loss. In these wells, operating conditions are diagnosed from downhole dynamometer cards, which are compared visually against pre-established patterns by staff in the operation centers. Since the introduction of machine learning algorithms, several papers have been published on the subject, but doubts remain about the difficulty of the dynamometer card classification task and the best practices for solving it. In search of answers to these questions, this work carried out sixty tests with more than 50,000 dynamometer cards from 38 wells in Mossoró, RN, Brazil. It presents test results for three algorithms (decision tree, random forest and XGBoost), three descriptors (Fourier, wavelet and card load values), as well as pipelines provided by automated machine learning. Tests with and without hyperparameter tuning, different levels of dataset balancing and various evaluation metrics were evaluated. The research shows that it is possible to detect sensor failures from dynamometer cards. Of the tests presented, 75% had an accuracy above 92%, and the maximum accuracy was 99.84%.
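A dynamometer card is a closed curve of load against rod position, so one common way to turn it into a fixed-length feature vector is through Fourier descriptors of the contour. The abstract does not say how its Fourier descriptor was computed; the sketch below shows the textbook form under that caveat, using a plain discrete Fourier transform and a hypothetical eight-point card.

```python
import cmath
import math


def fourier_descriptors(points, n_desc=4):
    """Translation- and scale-normalised Fourier descriptors of a closed contour.

    points: list of (position, load) pairs sampled around the card.
    Returns the magnitudes of coefficients 2 .. n_desc + 1 divided by |c1|.
    """
    z = [complex(x, y) for x, y in points]
    n = len(z)
    if n < n_desc + 2:
        raise ValueError("not enough contour points for the requested descriptors")
    coeffs = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("degenerate contour: first harmonic is zero")
    # coeffs[0] (the centroid) is dropped, which removes the effect of translation;
    # dividing by |coeffs[1]| removes the effect of uniform scaling.
    return [abs(coeffs[k]) / scale for k in range(2, 2 + n_desc)]


# Hypothetical card: eight points around a rectangle, counterclockwise.
card = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
descriptors = fourier_descriptors(card)
```

Taking magnitudes discards phase, so the vector can be fed directly to a tree-based classifier such as those named in the abstract; whether that loss of phase information is acceptable depends on which operating conditions need to be separated.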
Subjects
Algorithms , Machine Learning , Brazil , Humans
ABSTRACT
Market participants use a wide set of information before they decide to invest in risk assets, such as stocks. Investors often follow the news to collect the information that will help them decide which strategy to follow. In this study, we analyze how public news and historical prices can be used together to anticipate and prevent financial losses on the Brazilian stock market. We include an extensive set of 64 securities in our analysis, which represent various sectors of the Brazilian economy. Our analysis compares the traditional Buy & Hold and moving average strategies to several experiments designed with 11 machine learning algorithms. We explore daily, weekly and monthly time horizons for both publication and return windows. With this approach we were able to assess the most relevant set of news for investors' decisions, and to determine for how long the information remains relevant to the market. We found a strong relationship between news publications and stock price changes in Brazil, suggesting even short-term arbitrage opportunities. The study shows that it is possible to predict stock price falls using a set of news in Portuguese, and that text mining-based approaches can outperform traditional strategies when forecasting losses.
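The moving average baseline mentioned above can be sketched in a few lines. The abstract does not specify the rule or window used, so the version below is an assumption: stay invested only on days when the closing price is above its simple moving average. The prices are hypothetical.

```python
def simple_moving_average(prices, window):
    """Simple moving average, one value per day from index window - 1 onward."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def hold_signals(prices, window):
    """True on days when the price is above its moving average (stay invested)."""
    sma = simple_moving_average(prices, window)
    return [prices[i + window - 1] > sma[i] for i in range(len(sma))]


# Hypothetical closing prices.
toy_prices = [10, 11, 12, 11, 9, 8]
signals = hold_signals(toy_prices, window=3)
print(signals)
# prints: [True, False, False, False]
```

In this toy series the rule exits the position as soon as the price dips below its three-day average, which is the kind of loss-avoidance behaviour the machine learning experiments are compared against.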
ABSTRACT
Data Mining techniques play an important role in the prediction of soil spatial distribution in systematic soil surveying, though existing methodologies still lack standardization and a full understanding of their capabilities. The aim of this work was to evaluate the performance of preprocessing procedures and supervised classification approaches for predicting map units from 1:100,000-scale conventional semi-detailed soil surveys. Sheets of the Brazilian National Cartographic System on the 1:50,000 scale, Dois Córregos (Brotas 1:100,000-scale sheet), São Pedro and Laras (Piracicaba 1:100,000-scale sheet) were used for developing models. Soil map information and predictive environmental covariates for the dataset were obtained from the semi-detailed soil survey of the state of São Paulo, from the Brazilian Institute of Geography and Statistics (IBGE) 1:50,000-scale topographic sheets and from the 1:750,000-scale geological map of the state of São Paulo. The target variable was a soil map unit of four types: local soil unit name and soil class at three hierarchical levels of the Brazilian System of Soil Classification (SiBCS). Different data preprocessing treatments and four algorithms all having different approaches were also tested. Results showed that composite soil map units were not adequate for the machine learning process. Class balance did not contribute to improving the performance of classifiers. Accuracy values of 78 % and a Kappa index of 0.67 were obtained after preprocessing procedures with Random Forest, the algorithm that performed best. Information from conventional map units of semi-detailed (4th order) 1:100,000 soil survey generated models with values for accuracy, precision, sensitivity, specificity and Kappa indexes that support their use in programs for systematic soil surveying.
Subjects
Geographic Mapping , Soil Monitoring , Data Mining
ABSTRACT
Breast cancer is an important global health problem and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, early detection by mammography has been shown to increase the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using computer-assisted diagnosis with a genetic algorithm, in training and test datasets built from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as the cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and provide a second opinion in the classification of tumor lesions.
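A genetic algorithm that searches for a small feature subset, scored by a classifier, can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: fitness is the training-set accuracy of a nearest-centroid classifier, selection is by truncation, crossover is one-point, mutation is a bit flip, and the data, parameter values, and function names are all hypothetical.

```python
import random


def nearest_centroid_accuracy(X, y, mask):
    """Training-set accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y)


def genetic_feature_search(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n_feat = len(X[0])  # assumes at least two features

    def fitness(mask):
        return nearest_centroid_accuracy(X, y, mask)

    pop = [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feat)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children                     # parents survive unchanged
    best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical data: feature 0 separates the classes, the rest are noise.
X = [
    [0.1, 5, 3, 7], [0.2, 1, 8, 2], [0.0, 9, 4, 6], [0.3, 3, 1, 9],
    [1.1, 4, 7, 3], [0.9, 8, 2, 8], [1.2, 2, 9, 1], [1.0, 6, 5, 5],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best_mask, best_score = genetic_feature_search(X, y)
```

Scoring on the training set keeps the example short; a real search would score each mask on held-out data or by cross-validation so that the selected features are not simply those that overfit.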