Results 1 - 14 of 14
1.
Food Res Int ; 179: 113958, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38342522

ABSTRACT

Bee pollen is considered an excellent dietary supplement with functional characteristics, and it has been employed in food and cosmetic formulations and in biomedical applications. Understanding its chemical composition, particularly its crude protein content, is therefore essential to ensure its quality and industrial application. For the quantification of crude protein in bee pollen, this study explored the potential of combining digital image analysis with the Random Forest algorithm to develop a rapid, cost-effective, and environmentally friendly analytical methodology. Digital images of bee pollen samples (n = 244) were captured using a smartphone camera under controlled lighting. RGB channel intensities and color histograms were extracted using open-source software. Crude protein contents were determined using the Kjeldahl method (reference) and, in combination with the RGB channel and color histogram data from the digital images, were used to generate a predictive model with the Random Forest algorithm. The developed model exhibited good performance and predictive capability for crude protein analysis in bee pollen (R² = 80.93 %; RMSE = 1.49 %; MAE = 1.26 %). The developed analytical methodology can be considered environmentally friendly according to the AGREE metric, making it an excellent alternative to conventional analysis methods. It avoids toxic reagents and solvents, is energy efficient, uses low-cost instrumentation, and is robust and precise. These characteristics indicate its potential for easy implementation in routine analysis of crude protein in bee pollen samples in quality control laboratories.
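
The image-derived inputs described in this abstract (RGB channel intensities and color histograms) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `rgb_features`, the bin count, and the choice of mean intensity as the channel summary are all assumptions made for illustration, and the pixels are assumed to be 8-bit (r, g, b) tuples already read from an image.

```python
from statistics import mean


def rgb_features(pixels, bins=4):
    """Return mean channel intensities followed by a normalized histogram per channel.

    pixels: iterable of (r, g, b) tuples with 8-bit values (0-255).
    bins:   number of equal-width histogram bins per channel (hypothetical choice).
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")
    width = 256 / bins  # width of each histogram bin on the 0-255 scale
    means = []
    histograms = []
    for channel in range(3):  # 0 = red, 1 = green, 2 = blue
        values = [p[channel] for p in pixels]
        means.append(mean(values))
        counts = [0] * bins
        for v in values:
            # Clamp the index so that 255 always falls into the last bin.
            counts[min(int(v // width), bins - 1)] += 1
        histograms.extend(c / len(values) for c in counts)
    return means + histograms
```

A feature vector of this kind, paired with a reference protein value per sample, is the sort of table a regression model such as Random Forest could be trained on.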


Subjects
Pollen , Random Forest , Animals , Bees , Pollen/chemistry , Proteins/analysis , Dietary Supplements
2.
Brief Bioinform ; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38033292

ABSTRACT

Throughout evolution, pathogenic viruses have developed different strategies to evade the response of the adaptive immune system. To replicate successfully, some pathogenic viruses encode proteins that manipulate the molecular mechanisms of host cells. There are currently various bioinformatics tools for virus research; however, none of them focuses on predicting viral proteins that evade the adaptive immune system. In this work, we have developed a novel tool based on machine and deep learning for predicting this type of viral protein, named VirusHound-I. The tool is based on a model developed with the multilayer perceptron algorithm using the dipeptide composition molecular descriptor. In this study, we have also demonstrated the robustness of our strategy for data augmentation of the positive dataset based on generative adversarial networks. During 10-fold cross-validation on the training dataset, the predictive model showed 0.947 accuracy, 0.994 precision, 0.943 F1 score, 0.995 specificity, 0.896 sensitivity, 0.894 kappa, 0.898 Matthews correlation coefficient and 0.989 AUC. During the testing step, the model showed 0.964 accuracy, 1.0 precision, 0.967 F1 score, 1.0 specificity, 0.936 sensitivity, 0.929 kappa, 0.931 Matthews correlation coefficient and 1.0 AUC. Based on this model, we have developed a tool called VirusHound-I that makes it possible to predict viral proteins that evade the host's adaptive immune system. We believe that VirusHound-I can be very useful in accelerating studies on the molecular mechanisms of evasion of pathogenic viruses, as well as in the discovery of therapeutic targets.
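
The dipeptide composition descriptor named in this abstract can be sketched as follows. The sketch uses the common definition (the fraction of each of the 400 ordered pairs of standard amino acids among adjacent residue pairs); how the VirusHound-I authors handled non-standard residues or normalization is not stated in the abstract, so skipping pairs with non-standard letters is an assumption, as are the names `AMINO_ACIDS`, `DIPEPTIDES` and `dipeptide_composition`.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide frequencies for a protein sequence.

    Adjacent pairs containing a non-standard letter are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid pair (empty or single-residue input): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Each protein is thus mapped to a fixed-length numeric vector regardless of its length, which is what allows a classifier such as a multilayer perceptron to take sequences of different sizes as input.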


Subjects
Viral Proteins , Viruses , Viral Proteins/genetics , Viral Proteins/chemistry , Random Forest , Neural Networks, Computer , Algorithms , Viruses/genetics
3.
Int. j. morphol ; 41(4): 1267-1272, Aug. 2023. illus., tab.
Article in English | LILACS | ID: biblio-1514354

ABSTRACT

SUMMARY: This study aimed to predict sex from hand measurements using machine learning algorithms (MLA). Measurements were made on MR images of 60 men and 60 women. The parameters determined were: hand length (HL), palm length (PL), hand width (HW), wrist width (EBG), metacarpal I length (MIL), metacarpal I width (MIW), metacarpal II length (MIIL), metacarpal II width (MIIW), metacarpal III length (MIIIL), metacarpal III width (MIIIW), metacarpal IV length (MIVL), metacarpal IV width (MIVW), metacarpal V length (MVL), metacarpal V width (MVW), phalanx I length (PILL), phalanx II length (PIIL), phalanx III length (PIIIL), phalanx IV length (PIVL) and phalanx V length (PVL). In addition, the hand index (HI) was calculated. Logistic Regression (LR), Random Forest (RF), Linear Discriminant Analysis (LDA), K-nearest neighbour (KNN) and Naive Bayes (NB) were used as MLAs. For the KNN algorithm, the accuracy, sensitivity (SEN), F1 and specificity values were all 88 %. Among the MLAs tested, the KNN algorithm achieved the highest accuracy. All variables except MIIW, MIIIW, MIVW, MVW and HI differed significantly between the sexes.


Subjects
Humans , Male , Female , Carpal Bones/diagnostic imaging , Finger Phalanges/diagnostic imaging , Metacarpal Bones/diagnostic imaging , Sex Determination by Skeleton/methods , Algorithms , Magnetic Resonance Imaging , Carpal Bones/anatomy & histology , Discriminant Analysis , Logistic Models , Finger Phalanges/anatomy & histology , Metacarpal Bones/anatomy & histology , Machine Learning , Random Forest
4.
Sci Rep ; 13(1): 11402, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37452079

ABSTRACT

Inferring causal relationships from observational data is a key challenge in understanding the interpretability of Machine Learning models. Given the ever-increasing amount of observational data available in many areas, Machine Learning algorithms used for forecasting have become more complex, making it harder to understand how the model reaches a decision. To address this issue, we propose leveraging ensemble models, e.g., Random Forest, to assess which input features the trained model prioritizes when making a forecast and, in this way, establish causal relationships between the variables. The advantage of these algorithms lies in their ability to provide feature importance, which allows us to build the causal network. We present our methodology to estimate causality in time series from oil field production. As it is difficult to extract causal relations from a real field, we also included a synthetic oil production dataset and a weather dataset, which is also synthetic, to provide the ground truth. We aim to perform causal discovery, i.e., establish the existing connections between the variables in each dataset. Through an iterative process of improving the forecasting of a target's value, we evaluate whether the forecasting improves by adding information from a new potential driver; if so, we state that the driver causally affects the target. On the oil field-related datasets, our causal analysis results agree with the interwell connections already confirmed by tracer information; whenever tracer data were available, we used them as our ground truth. This consistency between estimated and confirmed connections gives us confidence in the effectiveness of our proposed methodology. To our knowledge, this is the first time causal analysis using solely production data has been employed to discover interwell connections in an oil field dataset.


Subjects
Algorithms , Random Forest , Time Factors , Causality , Forecasting
5.
J Diabetes Res ; 2023: 9713905, 2023.
Article in English | MEDLINE | ID: mdl-37404324

ABSTRACT

The development of medical diagnostic models to support healthcare professionals has witnessed remarkable growth in recent years. Among the prevalent health conditions affecting the global population, diabetes stands out as a significant concern. In the domain of diabetes diagnosis, machine learning algorithms have been widely explored for generating disease detection models, leveraging diverse datasets primarily derived from clinical studies. The performance of these models relies heavily on the choice of classifier algorithm and the quality of the dataset. Optimizing the input data by selecting relevant features is therefore essential for accurate classification. This research presents a comprehensive investigation into diabetes detection models by integrating two feature selection techniques: the Akaike information criterion and genetic algorithms. These techniques are combined with six prominent classifier algorithms: support vector machine, random forest, k-nearest neighbor, gradient boosting, extra trees, and naive Bayes. The generated models, which leverage clinical and paraclinical features, are evaluated and compared to existing approaches. The results demonstrate superior performance, with accuracies surpassing 94%. Furthermore, the use of feature selection techniques allows for working with a reduced dataset. The significance of feature selection is underscored in this study, showcasing its pivotal role in enhancing the performance of diabetes detection models. By judiciously selecting relevant features, this approach contributes to the advancement of medical diagnostic capabilities and empowers healthcare professionals in making informed decisions regarding diabetes diagnosis and treatment.


Subjects
Algorithms , Diabetes Mellitus , Humans , Bayes Theorem , Machine Learning , Diabetes Mellitus/diagnosis , Random Forest
6.
J Trace Elem Med Biol ; 78: 127164, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37031660

ABSTRACT

BACKGROUND: Brazil has consolidated a relevant position in the world market, being the largest exporter and second-largest producer of beef. Genetics, feeding system, geographic origin and climate influence the multielement profile of beef. The feasibility of combining classification algorithms with major and trace elements was evaluated as a tool for authentication of beef cuts. METHODS: Animals of Angus, Nelore and Wagyu crossbreeds, raised in a vertically integrated system, were sampled at the slaughterhouse for chuck steak, rump cap and sirloin steak. Supervised learning algorithms, namely Classification and Regression Tree (CART), Multilayer Perceptron (MLP), Naïve Bayes (NB), Random Forest (RF) and Sequential Minimal Optimization (SMO), were used to build classification models based on the multielement profile of beef determined by neutron activation analysis. RESULTS: Br, Co, Cs, Fe, K, Na, Rb, Se and Zn were determined in the beef samples. The classification accuracy values obtained for the beef cuts were 96% (MLP), 95% (SMO), 91% (RF), 86% (NB) and 70% (CART). CONCLUSION: The Multilayer Perceptron algorithm provided the best classification performance for authentication of beef cuts on the basis of major and trace element mass fractions.


Subjects
Algorithms , Machine Learning , Animals , Cattle , Bayes Theorem , Random Forest , Brazil
7.
Environ Sci Pollut Res Int ; 30(22): 61863-61887, 2023 May.
Article in English | MEDLINE | ID: mdl-36934187

ABSTRACT

In this article, the optimization of the specific urease activity (SUA) and of calcium carbonate (CaCO3) in microbially induced calcite precipitation (MICP) was compared with optimization using three machine learning algorithms: random forest regressor, artificial neural networks (ANNs), and multivariate linear regression. The techniques were applied to two existing response surface method (RSM) experiments involving the MICP technique. The random forest-based and artificial neural network-based models underwent hyperparameter optimization via cross-validation and grid search to select the best-optimized model. The random forest-based algorithm performed best, with r² values of 0.9381 and 0.9463, compared with the original r² values of 0.9021 and 0.8530, respectively. This study aims to explore the capability of machine learning-based models on small datasets for optimizing experimental variables in the MICP technique, and the meaningfulness of the models given their specificities in the small experimental datasets applied to experimental designs. The use of these techniques can create opportunities to scale and reduce costs in future experiments in the field.


Subjects
Neural Networks, Computer , Random Forest , Algorithms , Machine Learning , Calcium Carbonate
9.
Sensors (Basel) ; 23(2), 2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36679580

ABSTRACT

Driver identification is the process of identifying the person behind the steering wheel using information collected about the driver. Constant monitoring of drivers through sensors brings great benefits to advanced driver assistance systems (ADAS) by revealing more about the behavior of road users. Many research works currently address the subject, seeking to create intelligent models that help identify vehicle users efficiently and objectively. However, the methodologies proposed to create these models rely on sensor data collected from different vehicle brands on routes in real environments. Although such data provide very important information for other purposes, in driver identification they may introduce a certain degree of bias because the route environment can change from one situation to another. The proposed method uses genetic algorithms to select, intelligently and objectively, the most outstanding statistical features of the motor activity generated in the main elements of the vehicle for driver identification, a process that is new relative to the state of the art. The proposal achieved an accuracy of 90.74% in identifying two drivers and 62% for four, using a Random Forest Classifier (RFC). It can therefore be concluded that a comprehensive selection of features can greatly optimize the identification of drivers.


Subjects
Automobile Driving , Humans , Male , Accidents, Traffic , Random Forest , Learning , Motor Activity
10.
PLoS One ; 18(1): e0277858, 2023.
Article in English | MEDLINE | ID: mdl-36719891

ABSTRACT

We present a methodology to determine the participants in social network discussions, and their contributions, when those participants share a local relationship (e.g., nationality), providing certain levels of trust and efficiency in the process. This dynamic is a challenge that has prompted studies and some recent approximate solutions. The study addressed the problem of identifying the nationality of users of the Twitter social network before requesting their opinion (on matters of a political nature and social participation). The methodology employed uses machine learning to classify the nationality of Twitter users in order to carry out opinion studies in three Central American countries. The Random Forests algorithm is used to generate classification models from small training samples, using exclusively numerical features based on the number of times that different interactions among users occur. Averaging the inferred proportion of nationals of each country gave 77.40% in the initial data, compared with 91.60% after applying the automatic classification model, an average increase of 14.20 percentage points. In conclusion, the suggested set of methods provides a reasonable and efficient approach to opinion problems.


Subjects
Ethnicity , Social Media , Humans , Social Networking , Attitude , Random Forest
11.
Ciênc. rural (Online) ; 53(10): e20220327, 2023. tab., graph.
Article in English | VETINDEX | ID: biblio-1418792

ABSTRACT

Quantile Random Forest (QRF) is a non-parametric methodology that combines the advantages of Random Forest (RF) and Quantile Regression (QR). Specifically, this approach can explore non-linear functions, determining the probability distribution of a response variable and extracting information from different quantiles instead of just predicting the mean. This work evaluated the performance of QRF in predicting genomic breeding values for traits with non-additive genetic architecture (epistasis and dominance). In addition, the accuracies obtained were compared with those derived from G-BLUP. The simulation created an F2 population of 1,000 individuals genotyped for 4,010 SNP markers. Twelve traits were simulated from a model considering additive and non-additive effects, with the number of QTL (quantitative trait loci) ranging from eight to 120 and heritability of 0.3, 0.5, or 0.8. For training and validation, 5-fold cross-validation was used. For each fold, the accuracies of all the proposed models were calculated: QRF at five different quantiles and three G-BLUP models (additive effect; additive and epistatic effects; additive and dominant effects). Finally, the predictive performance of these methodologies was compared. In all scenarios, the QRF accuracies were equal to or greater than those of the other methodologies evaluated, and QRF proved to be an alternative tool for predicting genetic values in complex traits.


Subjects
Selection, Genetic , Genome , Genomics , Epistasis, Genetic , Random Forest
12.
PeerJ ; 10: e11683, 2022.
Article in English | MEDLINE | ID: mdl-35480565

ABSTRACT

Background: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domains and sequence similarity are among the most popular strategies for protein identification and characterization. However, the multi-domain nature, high sequence diversity and complex evolutionary history of disease resistance (DR) proteins make their prediction a real challenge. Here we present RFPDR, which pioneers the application of Random Forest (RF) for Plant DR protein prediction. Methods: A recently published collection of experimentally validated DR proteins was used as a positive dataset, while 10x10 nested datasets, ranging from 400 to 4,000 non-DR proteins, were used as negative datasets. A total of 9,631 features were extracted from each protein sequence and included in a full-dimension (FD) RFPDR model. Sequence selection was performed to generate a reduced-dimension (RD) RFPDR model. Model performance was evaluated using an 80/20 (training/testing) partition with 10-fold cross-validation, and compared to baseline, sequence-based and state-of-the-art strategies. To gain some insight into the underlying biology, the most discriminatory sequence-based features in the RF classifier were identified. Results and Discussion: RD-RFPDR proved to be sensitive (86.4 ± 4.0%) and specific (96.9 ± 1.5%) for identifying DR proteins, while remaining robust to data imbalance. Its high performance and robustness, together with the valuable information it provides about the underlying properties of DR proteins, make RD-RFPDR an interesting approach for DR protein prediction that complements state-of-the-art strategies.


Subjects
Plant Proteins , Random Forest , Plant Proteins/genetics , Disease Resistance , Amino Acid Sequence , Plants
13.
Chaos ; 32(12): 123118, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36587353

ABSTRACT

The aim of this study is to formulate a new methodology based on informational tools to detect patients with cardiac arrhythmias. Sudden death is known to be the consequence of a final arrhythmia, hence the relevance of efforts aimed at the early detection of arrhythmias. The information content of the time series from an electrocardiogram (ECG) signal is expressed as a probability distribution function, from which the permutation entropy proposed by Bandt and Pompe is computed. This approach was chosen for its remarkable conceptual simplicity, computational speed, and robustness to noise. In this work, two well-known databases were used, one containing normal sinus rhythms and another containing arrhythmias, both from the MIT medical databank. For different values of the embedding time delay τ, the normalized permutation entropy and the statistical complexity measure are computed and represented on the horizontal and vertical axes, respectively, which define the causal plane H×C. To improve on the results obtained in previous works, a feature set composed of these two magnitudes is built to train the following supervised machine learning algorithms: random forest (RF), support vector machine (SVM), and k nearest neighbors (kNN). To evaluate the performance of each classification technique, a 10-fold cross-validation scheme repeated 10 times was implemented. Finally, to select the best model, three quality parameters were computed, namely, accuracy, the area under the receiver operating characteristic (ROC) curve (AUC), and the F1-score. The results show that the best classification model for detecting ECGs from arrhythmic patients is RF. The values of the quality parameters were at the same levels as those reported in the available literature using a larger data set, which supports this proposal of training the classifier on a very small feature space. In summary, the results show that it is possible to discriminate between the two groups of patients, those with normal sinus rhythm and those with arrhythmic ECG, with a promising efficiency in the definition of new markers for the detection of cardiovascular pathologies.
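
The normalized permutation entropy of Bandt and Pompe mentioned in this abstract can be sketched as below. This is a generic textbook-style implementation, not the authors' code: the function name, the default embedding dimension (`order=3`) and the tie-breaking rule (equal values ordered by position) are assumptions, and the statistical complexity measure that forms the second axis of the H×C plane is not computed here.

```python
from math import factorial, log


def permutation_entropy(series, order=3, delay=1):
    """Normalized Bandt-Pompe permutation entropy, a value in [0, 1].

    order: embedding dimension (length of each ordinal pattern).
    delay: embedding time delay (the tau of the abstract).
    """
    n_windows = len(series) - (order - 1) * delay
    if order < 2 or n_windows < 1:
        raise ValueError("series too short for the chosen order and delay")
    counts = {}
    for start in range(n_windows):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: the indices that sort the window; sorted() is
        # stable, so equal values keep their original order.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] = counts.get(pattern, 0) + 1
    probabilities = [c / n_windows for c in counts.values()]
    entropy = -sum(p * log(p) for p in probabilities)
    # Divide by the maximum entropy, log(order!), reached when all
    # order! patterns are equally likely.
    return entropy / log(factorial(order))
```

A strictly increasing series produces a single ordinal pattern and therefore an entropy of zero, while a series in which all patterns occur equally often gives the maximum value of one.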


Subjects
Algorithms , Arrhythmias, Cardiac , Humans , Arrhythmias, Cardiac/diagnosis , Random Forest , Electrocardiography/methods , Support Vector Machine
14.
J Biomol Struct Dyn ; 40(22): 11948-11967, 2022.
Article in English | MEDLINE | ID: mdl-34463205

ABSTRACT

The disease caused by the new type of coronavirus, Covid-19, has posed major public health challenges for many countries. With its rapid spread since the beginning of the outbreak in December 2019, the disease caused by SARS-CoV-2 has already led to over 2 million deaths to date. In this work, we propose a web solution, called Heg.IA, to optimize the diagnosis of Covid-19 through the use of artificial intelligence. Our system aims to support decision-making regarding the diagnosis of Covid-19 and the indication of hospitalization in a regular ward, semi-ICU or ICU, based on a Random Forest architecture with 90 trees. The main idea is that healthcare professionals can enter 41 hematological parameters from common blood tests and arterial gasometry into the system. Heg.IA then provides a diagnostic report. The system achieved good results both for Covid-19 diagnosis and for recommending hospitalization. For the first scenario we found average results of accuracy of 92.891% ± 0.851, kappa index of 0.858 ± 0.017, sensitivity of 0.936 ± 0.011, precision of 0.923 ± 0.011, specificity of 0.921 ± 0.012 and area under ROC of 0.984 ± 0.003. For the indication of hospitalization, we achieved excellent performance, with accuracies above 99% and values above 0.99 for the other metrics in all situations. By using a computationally simple method, based on classical decision trees, we were able to achieve high diagnostic performance. The Heg.IA system may be a way to overcome the unavailability of testing in the context of Covid-19. Communicated by Ramaswamy H. Sarma.
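
For readers unfamiliar with the metrics reported in this abstract, the following sketch shows how accuracy, sensitivity, specificity, precision and the kappa index are derived from a binary confusion matrix. It uses the standard textbook definitions; the function name `binary_metrics` is hypothetical, the counts in the usage note are invented for illustration and are not data from the study, and the code assumes that no denominator is zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives, false negatives.
    Assumes every denominator below is non-zero.
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # recall: share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives correctly rejected
    precision = tp / (tp + fp)    # share of positive predictions that are correct
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
    }
```

For example, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` describes a hypothetical test set of 100 cases in which 85 are classified correctly.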


Subjects
COVID-19 , Humans , COVID-19/diagnosis , SARS-CoV-2 , COVID-19 Testing , Random Forest , Artificial Intelligence , Hematologic Tests