Results 1 - 20 of 73
1.
Methods ; 229: 133-146, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38944134

ABSTRACT

Asparagine peptide lyase (APL) is among the seven groups of proteases, also known as proteolytic enzymes, which are classified according to their catalytic residue. APLs are synthesized as precursors or propeptides that undergo self-cleavage through autoproteolytic reaction. At present, APLs are grouped into 10 families belonging to six different clans of proteases. Recognizing their critical roles in many biological processes including virus maturation, and virulence, accurate identification and characterization of APLs is indispensable. Experimental identification and characterization of APLs is laborious and time-consuming. Here, we developed APLpred, a novel support vector machine (SVM) based predictor that can predict APLs from the primary sequences. APLpred was developed using Boruta-based optimal features derived from seven encodings and subsequently trained using five machine learning algorithms. After evaluating each model on an independent dataset, we selected APLpred (an SVM-based model) due to its consistent performance during cross-validation and independent evaluation. We anticipate APLpred will be an effective tool for identifying APLs. This could aid in designing inhibitors against these enzymes and exploring their functions. The APLpred web server is freely available at https://procarb.org/APLpred/.


Subject(s)
Support Vector Machine , Machine Learning , Computational Biology/methods , Software , Amino Acid Sequence/genetics , Protein Databases
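
The APLpred abstract says features were derived from seven sequence encodings but does not name them. As a minimal sketch of what one such encoding can look like, the block below computes amino acid composition, a common sequence descriptor. Treating it as one of the seven is an assumption, and the function name is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard residue in `sequence`.

    Hypothetical helper: the abstract does not state that APLpred
    uses this particular encoding.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    # One value per residue, in the fixed order of AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Toy sequence for illustration; the non-standard 'X' is ignored.
features = amino_acid_composition("MKNNAAX")
```

Each sequence becomes a fixed-length vector of 20 fractions, which is the kind of input a feature selector and an SVM can consume.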
2.
Cardiovasc Diabetol ; 23(1): 163, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38725059

ABSTRACT

BACKGROUND: Sepsis is a severe form of systemic inflammatory response syndrome that is caused by infection. Sepsis is characterized by a marked state of stress, which manifests as nonspecific physiological and metabolic changes in response to the disease. Previous studies have indicated that the stress hyperglycemia ratio (SHR) can serve as a reliable predictor of adverse outcomes in various cardiovascular and cerebrovascular diseases. However, there is limited research on the relationship between the SHR and adverse outcomes in patients with infectious diseases, particularly in critically ill patients with sepsis. Therefore, this study aimed to explore the association between the SHR and adverse outcomes in critically ill patients with sepsis. METHODS: Clinical data from 2312 critically ill patients with sepsis were extracted from the MIMIC-IV (2.2) database. Based on the quartiles of the SHR, the study population was divided into four groups. The primary outcome was 28-day all-cause mortality, and the secondary outcome was in-hospital mortality. The relationship between the SHR and adverse outcomes was explored using restricted cubic splines, Cox proportional hazard regression, and Kaplan‒Meier curves. The predictive ability of the SHR was assessed using the Boruta algorithm, and a prediction model was established using machine learning algorithms. RESULTS: Data from 2312 patients who were diagnosed with sepsis were analyzed. Restricted cubic splines demonstrated a "U-shaped" association between the SHR and survival rate, indicating that an increase in the SHR is related to an increased risk of adverse events. A higher SHR was significantly associated with an increased risk of 28-day mortality and in-hospital mortality in patients with sepsis (HR > 1, P < 0.05) compared to a lower SHR. Boruta feature selection showed that SHR had a higher Z score, and the model built using the rsf algorithm showed the best performance (AUC = 0.8322). 
CONCLUSION: The SHR exhibited a U-shaped relationship with 28-day all-cause mortality and in-hospital mortality in critically ill patients with sepsis. A high SHR is significantly correlated with an increased risk of adverse events, indicating that it is a potential predictor of adverse outcomes in patients with sepsis.


Subject(s)
Biomarkers , Blood Glucose , Cause of Death , Critical Illness , Factual Databases , Hospital Mortality , Hyperglycemia , Machine Learning , Predictive Value of Tests , Sepsis , Humans , Sepsis/mortality , Sepsis/diagnosis , Sepsis/blood , Male , Female , Middle Aged , Retrospective Studies , Aged , Risk Assessment , Time Factors , Risk Factors , Prognosis , Hyperglycemia/diagnosis , Hyperglycemia/mortality , Hyperglycemia/blood , Blood Glucose/metabolism , Biomarkers/blood , Decision Support Techniques , China/epidemiology
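
The sepsis study divides its population into four groups by SHR quartile. A minimal sketch of that grouping step follows. The SHR values are invented for illustration, the function name is hypothetical, and the choice of the "inclusive" quantile method is an assumption, since the abstract does not say how cut points were computed.

```python
from bisect import bisect_left
from statistics import quantiles

def quartile_groups(values):
    """Assign each value to quartile group 1..4 using the sample's own cut points."""
    # Three cut points: first quartile, median, third quartile.
    cuts = quantiles(values, n=4, method="inclusive")
    # A value equal to a cut point is placed in the lower group.
    groups = [bisect_left(cuts, v) + 1 for v in values]
    return groups, cuts

# Illustrative numbers only, not data from the study.
shr = [0.62, 0.75, 0.81, 0.90, 0.97, 1.05, 1.18, 1.40]
groups, cuts = quartile_groups(shr)
```

The group labels can then serve as a categorical covariate, for example in the Kaplan-Meier and Cox analyses the abstract mentions.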
3.
Cardiovasc Diabetol ; 23(1): 243, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987779

ABSTRACT

BACKGROUND: The prevalence of obesity-associated insulin resistance (IR) is increasing along with the increase in obesity rates. In this study, we compared the predictive utility of four alternative indexes of IR [triglyceride glucose index (TyG index), metabolic score for insulin resistance (METS-IR), the triglyceride/high-density lipoprotein cholesterol (TG/HDL-C) ratio and homeostatic model assessment of insulin resistance (HOMA-IR)] for all-cause mortality and cardiovascular mortality in the general population based on key variables screened by the Boruta algorithm. The aim was to find the best replacement index of IR. METHODS: In this study, 14,653 participants were screened from the National Health and Nutrition Examination Survey (2001-2018). And TyG index, METS-IR, TG/HDL-C and HOMA-IR were calculated separately for each participant according to the given formula. The predictive values of IR replacement indexes for all-cause mortality and cardiovascular mortality in the general population were assessed. RESULTS: Over a median follow-up period of 116 months, a total of 2085 (10.23%) all-cause deaths and 549 (2.61%) cardiovascular disease (CVD) related deaths were recorded. Multivariate Cox regression and restricted cubic splines analysis showed that among the four indexes, only METS-IR was significantly associated with both all-cause and CVD mortality, and both showed non-linear associations with an approximate "U-shape". Specifically, baseline METS-IR lower than the inflection point (41.33) was negatively associated with mortality [hazard ratio (HR) 0.972, 95% CI 0.950-0.997 for all-cause mortality]. In contrast, baseline METS-IR higher than the inflection point (41.33) was positively associated with mortality (HR 1.019, 95% CI 1.011-1.026 for all-cause mortality and HR 1.028, 95% CI 1.014-1.043 for CVD mortality). 
We further stratified the METS-IR and showed that significant associations between METS-IR levels and all-cause and cardiovascular mortality were predominantly present in the nonelderly population aged < 65 years. CONCLUSIONS: In conjunction with the results of the Boruta algorithm, METS-IR demonstrated a more significant association with all-cause and cardiovascular mortality in the U.S. population compared to the other three alternative IR indexes (TyG index, TG/HDL-C and HOMA-IR), particularly evident in individuals under 65 years old.


Subject(s)
Biomarkers , Blood Glucose , Cardiovascular Diseases , Cause of Death , Insulin Resistance , Metabolic Syndrome , Nutrition Surveys , Predictive Value of Tests , Triglycerides , Humans , Male , Female , Cardiovascular Diseases/mortality , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/blood , Middle Aged , Risk Assessment , Adult , United States/epidemiology , Biomarkers/blood , Aged , Triglycerides/blood , Prognosis , Blood Glucose/metabolism , Time Factors , Metabolic Syndrome/mortality , Metabolic Syndrome/diagnosis , Metabolic Syndrome/blood , Metabolic Syndrome/epidemiology , HDL Cholesterol/blood , Insulin/blood , Heart Disease Risk Factors , Risk Factors
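
The insulin-resistance abstract says the four indexes were calculated "according to the given formula" without printing the formulas. The sketch below uses the commonly cited definitions of each index. It is an assumption that the study used exactly these forms and units, and the function names are hypothetical.

```python
import math

def tyg_index(tg_mg_dl, glucose_mg_dl):
    """Triglyceride glucose index: ln(TG x fasting glucose / 2), both in mg/dL."""
    return math.log(tg_mg_dl * glucose_mg_dl / 2)

def tg_hdl_ratio(tg_mg_dl, hdl_mg_dl):
    """TG/HDL-C ratio, both in mg/dL."""
    return tg_mg_dl / hdl_mg_dl

def homa_ir(glucose_mmol_l, insulin_uU_ml):
    """HOMA-IR: fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5."""
    return glucose_mmol_l * insulin_uU_ml / 22.5

def mets_ir(glucose_mg_dl, tg_mg_dl, hdl_mg_dl, bmi):
    """METS-IR: ln(2 x glucose + TG) x BMI / ln(HDL-C), lipids and glucose in mg/dL."""
    return math.log(2 * glucose_mg_dl + tg_mg_dl) * bmi / math.log(hdl_mg_dl)

# Illustrative participant, not data from the study.
example = {
    "TyG": tyg_index(150, 100),
    "TG/HDL-C": tg_hdl_ratio(150, 50),
    "HOMA-IR": homa_ir(5.0, 9.0),
    "METS-IR": mets_ir(100, 150, 50, 27.0),
}
```

Note that HOMA-IR takes glucose in mmol/L while the other three take mg/dL, so unit conversion has to happen before the indexes are compared.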
4.
Br J Nutr ; 131(12): 2058-2067, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38606596

ABSTRACT

Machine learning methods have been used in identifying omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve identification of alcohol-associated transcriptomic markers. In this study, we analysed array-based, whole-blood derived expression data for 17 873 gene transcripts in 5508 Framingham Heart Study participants. By using the Boruta algorithm, a supervised random forest (RF)-based feature selection method, we selected twenty-five alcohol-associated transcripts. In a testing set (30 % of entire study participants), AUC (area under the receiver operating characteristics curve) of these twenty-five transcripts were 0·73, 0·69 and 0·66 for non-drinkers v. moderate drinkers, non-drinkers v. heavy drinkers and moderate drinkers v. heavy drinkers, respectively. The AUC of the selected transcripts by the Boruta method were comparable to those identified using conventional linear regression models, for example, AUC of 1958 transcripts identified by conventional linear regression models (false discovery rate < 0·2) were 0·74, 0·66 and 0·65, respectively. With Bonferroni correction for the twenty-five Boruta method-selected transcripts and three CVD risk factors (i.e. at P < 6·7e-4), we observed thirteen transcripts were associated with obesity, three transcripts with type 2 diabetes and one transcript with hypertension. For example, we observed that alcohol consumption was inversely associated with the expression of DOCK4, IL4R, and SORT1, and DOCK4 and SORT1 were positively associated with obesity, and IL4R was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.


Subject(s)
Alcohol Drinking , Algorithms , Humans , Alcohol Drinking/genetics , Male , Female , Middle Aged , Machine Learning , Cardiovascular Diseases/genetics , Transcriptome , Adult , Risk Factors , Supervised Machine Learning , Random Forest
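
The abstract above describes Boruta as a supervised, random-forest-based feature selection method. Its central idea is to compare each real feature against shuffled "shadow" copies. The sketch below illustrates only that idea and makes two loud simplifications: importance is measured by absolute Pearson correlation instead of random-forest importance, and a fixed hit-rate cutoff replaces Boruta's statistical test. All names and parameters are hypothetical.

```python
import random
from math import sqrt

def abs_corr(x, y):
    """|Pearson r|, used here as a stand-in for random-forest importance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def boruta_sketch(columns, target, n_iter=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shadow feature in most iterations."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        # Shadow features: each real column shuffled to break its link to the target.
        shadow_max = 0.0
        for values in columns.values():
            shadow = values[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_corr(shadow, target))
        # A real feature scores a hit when it outperforms every shadow feature.
        for name, values in columns.items():
            if abs_corr(values, target) > shadow_max:
                hits[name] += 1
    return [name for name, h in hits.items() if h / n_iter >= min_hit_rate]

# Toy data for illustration only.
target = [float(i) for i in range(40)]
columns = {
    "signal": [2.0 * t for t in target],                        # perfectly related to the target
    "noise": [1.0 if i % 2 == 0 else -1.0 for i in range(40)],  # nearly unrelated
}
selected = boruta_sketch(columns, target)
```

The shadow comparison gives a data-driven threshold: a feature is kept only if it is consistently more informative than noise built from the same value distributions.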
5.
Thromb J ; 22(1): 76, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39152448

ABSTRACT

PURPOSE: To identify the key risk factors for venous thromboembolism (VTE) in urological inpatients based on the Caprini scale using an interpretable machine learning method. METHODS: VTE risk data of urological inpatients were obtained based on the Caprini scale in the case hospital. Based on the data, the Boruta method was used to further select the key variables from the 37 variables in the Caprini scale. Furthermore, decision rules corresponding to each risk level were generated using the rough set (RS) method. Finally, random forest (RF), support vector machine (SVM), and backpropagation artificial neural network (BPANN) were used to verify the data accuracy and were compared with the RS method. RESULTS: Following the screening, the key risk factors for VTE in urology were "(C1) Age," "(C2) Minor surgery planned," "(C3) Obesity (BMI > 25)," "(C8) Varicose veins," "(C9) Sepsis (< 1 month)," "(C10) Serious lung disease incl. pneumonia (< 1 month)," "(C11) COPD," "(C16) Other risk," "(C18) Major surgery (> 45 min)," "(C19) Laparoscopic surgery (> 45 min)," "(C20) Patient confined to bed (> 72 h)," "(C21) Malignancy (present or previous)," "(C23) Central venous access," "(C31) History of DVT/PE," "(C32) Other congenital or acquired thrombophilia," and "(C34) Stroke (< 1 month)." According to the decision rules of different risk levels obtained using the RS method, "(C1) Age," "(C18) Major surgery (> 45 min)," and "(C21) Malignancy (present or previous)" were the main factors influencing mid- and high-risk levels, and some suggestions on VTE prevention were indicated based on these three factors. The average accuracies of the RS, RF, SVM, and BPANN models were 79.5%, 87.9%, 92.6%, and 97.2%, respectively. In addition, BPANN had the highest accuracy, recall, F1-score, and precision. CONCLUSIONS: The RS model achieved poorer accuracy than the other three common machine learning models. However, the RS model provides strong interpretability and allows for the identification of high-risk factors and decision rules influencing high-risk assessments of VTE in urology. This transparency is very important for clinicians in the risk assessment process.
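
The VTE abstract compares models by accuracy, recall, F1-score, and precision. A minimal sketch of how those four figures relate for a two-class case is shown below. The study itself concerns several risk levels, so the binary setup, the function name, and the toy labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels for illustration only.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

For more than two classes, the same counts are usually computed per class and then averaged.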

6.
Network ; : 1-38, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38511557

ABSTRACT

Interpretable machine learning models are instrumental in disease diagnosis and clinical decision-making, shedding light on relevant features. Notably, Boruta, SHAP (SHapley Additive exPlanations), and BorutaShap were employed for feature selection, each contributing to the identification of crucial features. These selected features were then utilized to train six machine learning algorithms, including LR, SVM, ETC, AdaBoost, RF, and LGBM, using diverse medical datasets obtained from public sources after rigorous preprocessing. The performance of each feature selection technique was evaluated across multiple ML models, assessing accuracy, precision, recall, and F1-score metrics. Among these, SHAP showcased superior performance, achieving average accuracies of 80.17%, 85.13%, 90.00%, and 99.55% across diabetes, cardiovascular, statlog, and thyroid disease datasets, respectively. Notably, the LGBM emerged as the most effective algorithm, boasting an average accuracy of 91.00% for most disease states. Moreover, SHAP enhanced the interpretability of the models, providing valuable insights into the underlying mechanisms driving disease diagnosis. This comprehensive study contributes significant insights into feature selection techniques and machine learning algorithms for disease diagnosis, benefiting researchers and practitioners in the medical field. Further exploration of feature selection methods and algorithms holds promise for advancing disease diagnosis methodologies, paving the way for more accurate and interpretable diagnostic models.

7.
Child Care Health Dev ; 50(4): e13291, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38895948

ABSTRACT

BACKGROUND: Epidemiological and nutritional modifications are causing an increase in stunting in many low- and middle-income countries (LMIC), which will eventually result in juvenile diseases and mortality. Therefore, this study aimed to identify the influential factors contributing to stunting among under-five children in Cambodia. METHODS: A secondary dataset consisting of 3268 under-five children was extracted from the latest Cambodian Demographic and Health Survey (CDHS)-2021/2022 dataset. The Chi-square test and Boruta algorithm were used for covariate selection, and logistic regression approaches were used to determine the influence of demographic, socioeconomic and other factors on the presence of stunting. RESULTS: Findings revealed that about 21% of under-five children were stunted, and the prevalence of stunting was higher in rural areas than in urban areas. The prevalence of child stunting was lower in families with highly educated parents. A child whose father had a secondary education had lower odds of stunting (adjusted odds ratio [AOR]: 0.71, 95% CI: 0.520-0.969) than a child whose father had no education. Ratnak Kiri, Mondul Kiri, Stung Treng, Pursat and Kampot had a greater prevalence of stunting than other places, ranging from 27.11% to 35.70%, whereas Banteay Meanchey, Phnom Penh and Kandal had the lowest rates, ranging from 12.80% to 16.00%. Results of the Boruta algorithm and logistic regression suggested that under-five stunting is significantly influenced by factors such as the child's age, size at birth, mother's age at first birth, mother's body mass index (BMI), father's educational status, cooking fuel, and wealth index. CONCLUSIONS: Initiatives to reduce the prevalence of stunting should prioritise the identified factors, which would ultimately help to reduce the burden on child health. The authors believe that the findings of this study will be helpful for policymakers in designing the appropriate policies and actions to achieve the Sustainable Development Goals (SDGs) by reducing stunting among under-five children in Cambodia.


Subject(s)
Growth Disorders , Health Surveys , Socioeconomic Factors , Humans , Growth Disorders/epidemiology , Cambodia/epidemiology , Male , Preschool Child , Female , Infant , Prevalence , Rural Population/statistics & numerical data , Risk Factors , Newborn Infant , Urban Population/statistics & numerical data , Nutritional Status
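
The stunting abstract reports an adjusted odds ratio with a 95% confidence interval from logistic regression. The sketch below shows the usual relationship between a log-odds coefficient, its standard error, and a Wald interval, and uses it to check that the reported point estimate sits at the geometric centre of the reported bounds. Assuming a Wald interval with z = 1.96 is my assumption; the abstract does not state how the interval was computed.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence bounds from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_ci(lower, upper, z=1.96):
    """Recover the coefficient and standard error implied by a Wald interval."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Bounds taken from the abstract (father with secondary vs. no education).
beta, se = coef_from_ci(0.520, 0.969)
aor, lo, hi = odds_ratio_ci(beta, se)
```

Because the interval is symmetric on the log scale, the recovered odds ratio is the geometric mean of the two bounds, which rounds to the reported 0.71.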
8.
BMC Oral Health ; 24(1): 1047, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39243071

ABSTRACT

OBJECTIVES: Temporomandibular disorders (TMDs) have a relatively high prevalence among university students. This study aimed to identify independent risk factors for TMD in university students and develop an effective risk prediction model. METHODS: This study included 1,122 university students from four universities in Changchun City, Jilin Province, as subjects. Predictive factors were screened by using the least absolute shrinkage and selection operator (LASSO) regression and the machine learning Boruta algorithm in the training cohort. A multifactorial logistic regression analysis was used to construct a TMD risk prediction model. Internal validation of the model was conducted via bootstrap resampling, and an external validation cohort comprised 205 university students undergoing oral examinations at the Stomatological Hospital of Jilin University. RESULTS: The prevalence of TMD among university students was 44.30%. Ten predictive factors were included in the model, comprising gender, facial cold stimulation, unilateral chewing, biting hard or resilient foods, clenching teeth, grinding teeth, excessive mouth opening, malocclusion, stress, and anxiety. The model demonstrated good predictive ability with area under the receiver operating characteristic curve (AUC) values of 0.853, 0.838, and 0.821 in the training cohort, internal validation cohort, and external validation cohort, respectively. The calibration curves demonstrated that the predicted results were consistent with the actual results, and the decision curve analysis (DCA) indicated the model's high clinical utility. CONCLUSIONS: An online nomogram of TMD in university students with good predictive performance was constructed, which can effectively predict the risk of TMD in university students. 
The model provides a useful tool for the early identification and treatment of TMDs in university students, helping clinicians to predict the probability of TMDs in each patient, thus providing more personalized and accurate treatment decisions for patients.


Subject(s)
Nomograms , Students , Temporomandibular Joint Disorders , Humans , Temporomandibular Joint Disorders/epidemiology , Female , Male , Universities , Students/statistics & numerical data , Risk Factors , Young Adult , Risk Assessment , China/epidemiology , Prevalence , Adolescent , Adult
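
The TMD model is summarised by its area under the ROC curve (AUC). As a minimal sketch of what that number measures, the block below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions for illustration only.
auc = roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
```

The pairwise form is quadratic in the sample size, which is fine for a sketch; rank-based formulas give the same value more efficiently on large cohorts.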
9.
BMC Bioinformatics ; 24(1): 224, 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-37264332

ABSTRACT

BACKGROUND AND OBJECTIVE: As a common chronic disease, diabetes is called the "second killer" among modern diseases. Currently, there is no medical cure for diabetes. We can only rely on medication for auxiliary treatment. However, many diabetic patients still die each year. In addition, a considerable number of people do not pay attention to their physical health or opt out of treatment due to lack of money, which eventually leads to various complications. Therefore, diagnosing diabetes at an early stage and intervening early is necessary; thus, developing an early detection method for diabetes is essential. METHODS: In this study, a diabetes prediction model based on Boruta feature selection and ensemble learning is proposed. The model contains the use of Boruta feature selection, the extraction of salient features from datasets, the use of the K-Means++ algorithm for unsupervised clustering of data and stacking of an ensemble learning method for classification. It has been validated on a diabetes dataset. RESULTS: The experiments were performed on the PIMA Indian diabetes dataset. The model was evaluated by accuracy, precision and F1 index. The obtained results show that the accuracy rate of the model reaches 98% and achieves good results. CONCLUSION: Compared with other diabetes prediction models, this model achieved better results, and the obtained results indicate that this model is superior to other models in diabetes prediction and has better performance.


Subject(s)
Diabetes Mellitus , Machine Learning , Humans , Diabetes Mellitus/diagnosis , Algorithms , Early Diagnosis
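
The diabetes model uses the K-Means++ algorithm for unsupervised clustering. What distinguishes K-Means++ from plain K-Means is its seeding step, sketched below: each new centre is drawn with probability proportional to its squared distance from the nearest centre chosen so far. The points are toy data, and the function name and seed are hypothetical.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centres using K-Means++ seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first centre: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        if sum(d2) == 0:
            raise ValueError("fewer distinct points than requested centres")
        # Far-away points are proportionally more likely to be picked.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

# Toy points for illustration only.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans_pp_init(points, k=2)
```

After seeding, the ordinary K-Means assignment and update steps run unchanged; the seeding only makes a poor starting configuration less likely.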
10.
Int J Mol Sci ; 24(6)2023 Mar 15.
Article in English | MEDLINE | ID: mdl-36982674

ABSTRACT

Window of implantation (WOI) genes have been comprehensively identified at the single cell level. DNA methylation changes in cervical secretions are associated with in vitro fertilization embryo transfer (IVF-ET) outcomes. Using a machine learning (ML) approach, we aimed to determine which methylation changes in WOI genes from cervical secretions best predict ongoing pregnancy during embryo transfer. A total of 2708 promoter probes were extracted from mid-secretory phase cervical secretion methylomic profiles for 158 WOI genes, and 152 differentially methylated probes (DMPs) were selected. Fifteen DMPs in 14 genes (BMP2, CTSA, DEFB1, GRN, MTF1, SERPINE1, SERPINE2, SFRP1, STAT3, TAGLN2, TCF4, THBS1, ZBTB20, ZNF292) were identified as the most relevant to ongoing pregnancy status. These 15 DMPs yielded accuracy rates of 83.53%, 85.26%, 85.78%, and 76.44%, and areas under the receiver operating characteristic curves (AUCs) of 0.90, 0.91, 0.89, and 0.86 for prediction by random forest (RF), naïve Bayes (NB), support vector machine (SVM), and k-nearest neighbors (KNN), respectively. SERPINE1, SERPINE2, and TAGLN2 maintained their methylation difference trends in an independent set of cervical secretion samples, resulting in accuracy rates of 71.46%, 80.06%, 80.72%, and 80.68%, and AUCs of 0.79, 0.84, 0.83, and 0.82 for prediction by RF, NB, SVM, and KNN, respectively. Our findings demonstrate that methylation changes in WOI genes detected noninvasively from cervical secretions are potential markers for predicting IVF-ET outcomes. Further studies of cervical secretion of DNA methylation markers may provide a novel approach for precision embryo transfer.


Subject(s)
Female Infertility , beta-Defensins , Female , Pregnancy , Humans , DNA Methylation , Bayes Theorem , Serpin E2/genetics , Female Infertility/metabolism , Endometrium/metabolism , Embryo Implantation/genetics , Genetic Markers , In Vitro Fertilization/methods , beta-Defensins/metabolism , Carrier Proteins/metabolism , Nerve Tissue Proteins/metabolism
11.
J Gen Intern Med ; 37(11): 2727-2735, 2022 08.
Article in English | MEDLINE | ID: mdl-35112279

ABSTRACT

BACKGROUND: Adverse health effects resulting from falls are a major public health concern. Although studies have identified risk factors for falls, none have examined long-term prediction of fall risk. Furthermore, recent evidence suggests that there are additional risk factors, such as psychosocial factors. OBJECTIVE: In this 3-year longitudinal study, we evaluated a predictive model for risk of fall among community-dwelling older adults using machine learning methods. DESIGN: A 3-year follow-up prospective longitudinal study (from 2010 to 2013). SETTING: Twenty-four municipalities in nine of the 47 prefectures (provinces) of Japan. PARTICIPANTS: Community-dwelling individuals aged ≥65 years who were functionally independent at baseline (n = 61,883). METHODS: The baseline survey was conducted from August 2010 to January 2012, and the follow-up survey was conducted from October to December 2013. Both surveys used self-reported questionnaires. The measured outcome at the follow-up survey was self-reported multiple falls during the previous year. The 142 variables included in the baseline survey were regarded as candidate predictors. The random-forest-based Boruta algorithm was used to select predictors, and the eXtreme Gradient Boosting algorithm with 10 repetitions of nested k-fold cross-validation was used for modeling and model evaluation. Furthermore, we used Shapley additive explanations (SHAP) to gain insight into the behavior of the prediction model. KEY RESULTS: Fourteen out of 142 candidate features were selected as predictors. Among these predictors, experience of falling as of the baseline survey was the most important feature, followed by self-rated health and age. Moreover, sense of coherence was newly identified as a risk factor for falls. CONCLUSIONS: This study suggests that machine learning tools can be adapted to explore new associative factors, make accurate predictions, and provide actionable insights for fall prevention strategies.


Subject(s)
Independent Living , Machine Learning , Aged , Humans , Longitudinal Studies , Prospective Studies , Risk Factors
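
The fall-risk study evaluates its model with nested k-fold cross-validation. A minimal sketch of the index bookkeeping follows: the outer folds hold out data for evaluation, and the inner folds, built only from the outer training portion, are where tuning would happen. The fold counts, seed, and function names are hypothetical, since the abstract does not give the value of k.

```python
import random

def k_fold(indices, k, rng):
    """Yield (train, test) index lists for one shuffled k-fold split."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

def nested_k_fold(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation."""
    rng = random.Random(seed)
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k, rng):
        # Inner splits never see the outer test indices.
        inner_splits = list(k_fold(outer_train, inner_k, rng))
        yield outer_train, outer_test, inner_splits

splits = list(nested_k_fold(20, outer_k=5, inner_k=3, seed=1))
```

Repeating the whole procedure with different seeds is one way to obtain the repetitions the abstract mentions.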
12.
Sensors (Basel) ; 22(7)2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35408096

ABSTRACT

Hydraulic systems are advanced in function and level as they are used in various industrial fields. Furthermore, condition monitoring using internet of things (IoT) sensors is applied for system maintenance and management. In this study, meaningful features were identified through extraction and selection of various features, and classification evaluation metrics were presented through machine learning and deep learning to expand the diagnosis of abnormalities and defects in each component of the hydraulic system. Time-domain data collected from IoT sensors were divided into clusters in predefined sections. The shape and density characteristics were extracted by cluster. Among 2335 newly extracted features, related features were selected using correlation coefficients and the Boruta algorithm for each hydraulic component and used for model learning. Linear discriminant analysis (LDA), logistic regression, support vector classifier (SVC), decision tree, random forest, XGBoost, LightGBM, and multi-layer perceptron were used to calculate the true positive rate (TPR) and true negative rate (TNR) for each hydraulic component to detect normal and abnormal conditions. Valve condition, internal pump leakage, and hydraulic accumulator data showed a TPR performance of 0.94 or more and a TNR performance of 0.84 or more. This study's findings can help to determine the stable and unstable states of each component of the hydraulic system and form the basis for engineers' judgment.


Subject(s)
Internet of Things , Algorithms , Discriminant Analysis , Machine Learning , Computer Neural Networks
13.
Int J Mol Sci ; 23(17)2022 Aug 23.
Article in English | MEDLINE | ID: mdl-36076915

ABSTRACT

Streptococcus pyogenes, or group A Streptococcus (GAS), a gram-positive bacterium, is implicated in a wide range of clinical manifestations and life-threatening diseases. One of the key virulence factors of GAS is streptopain, a C10 family cysteine peptidase. Since its discovery, various homologs of streptopain have been reported from other bacterial species. With the increased affordability of sequencing, a significant increase in the number of potential C10 family-like sequences in the public databases is anticipated, posing a challenge in classifying such sequences. Sequence-similarity-based tools are the methods of choice to identify such streptopain-like sequences. However, these methods depend on some level of sequence similarity between the existing C10 family and the target sequences. Therefore, in this work, we propose a novel predictor, C10Pred, for the prediction of C10 peptidases using sequence-derived optimal features. C10Pred is a support vector machine (SVM) based model which is efficient in predicting C10 enzymes with an overall accuracy of 92.7% and Matthews' correlation coefficient (MCC) value of 0.855 when tested on an independent dataset. We anticipate that C10Pred will serve as a handy tool to classify novel streptopain-like proteins belonging to the C10 family and offer essential information.


Subject(s)
Cysteine Proteases , Cysteine , Machine Learning , Proteins , Support Vector Machine
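
C10Pred is reported with both accuracy and Matthews' correlation coefficient (MCC). A minimal sketch of the MCC calculation from confusion-matrix counts is given below. The counts are invented for illustration and are not the study's results, and the function name is hypothetical.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from the four cells of a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy counts for illustration only: 90% accuracy on a balanced set.
mcc = matthews_corrcoef(tp=45, tn=45, fp=5, fn=5)
```

Unlike accuracy, MCC uses all four cells, so it stays informative when the two classes are of very different sizes.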
14.
J Environ Manage ; 312: 114951, 2022 Jun 15.
Article in English | MEDLINE | ID: mdl-35364516

ABSTRACT

Drought hazard is one of the main consequences of global warming and climate change. Unlike other natural disasters, drought has complex climatic features. Therefore, accurate drought monitoring is a challenging task. This paper proposes a framework for assessing drought classifications at the regional level. The proposed framework provides a new drought monitoring indicator called Multi-Scalar Seasonally Amalgamated Regional Standardized Precipitation Evapotranspiration Index (MSARSPEI). MSARSPEI is an amalgam of the Standardized Precipitation Evapotranspiration Index (SPEI) (Vicente-Serrano et al., 2010) and Regionally Improved Weighted Standardized Drought Index (RIWSDI) (Jiang et al., 2020). In the proposed framework, the Boruta algorithm of feature selection is configured to ensemble monthly time series data of evaporation in various meteorological stations located in specific regions. Further, the framework suggests the standardization of the Cumulative Distribution Function (CDF) of K-Component Gaussian (K-CG) mixture distribution function for obtaining MSARSPEI data. The application of the proposed framework is based on seven different regions of Pakistan. For comparative analysis, this paper compared the performance of MSARSPEI with SPEI using Pearson correlation. Outcomes associated with this research show that the proposed regional drought index has a strong correlation with the competing indicator in various time scales. In addition, the study assessed the spatial extent of various drought classifications under MSARSPEI. In summation, this research concludes that the choice of the MSARSPEI is rationally valid and more appropriate for the regional assessment of drought under the global warming scenario.


Subject(s)
Droughts , Global Warming , Climate Change , Meteorology , Pakistan
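
The drought abstract obtains its index by standardizing the CDF of a K-component Gaussian mixture. A minimal sketch of that standardization step is shown below: a value is passed through the mixture CDF and the resulting probability is mapped to a standard normal quantile. The mixture parameters here are invented, and fitting them (as well as the rest of the MSARSPEI framework) is outside this sketch.

```python
from statistics import NormalDist

def mixture_cdf(x, components):
    """CDF of a Gaussian mixture; `components` is a list of (weight, mean, sd)."""
    return sum(w * NormalDist(mu, sd).cdf(x) for w, mu, sd in components)

def standardized_index(x, components):
    """Map x to the standard normal quantile of its mixture CDF value."""
    p = mixture_cdf(x, components)
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep inv_cdf away from exactly 0 and 1
    return NormalDist().inv_cdf(p)

# Hypothetical two-component mixture, for illustration only.
components = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
index_at_centre = standardized_index(0.0, components)
```

With a single component the index reduces to the ordinary z-score, which makes the transformation easy to sanity-check.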
15.
Entropy (Basel) ; 24(11)2022 Oct 29.
Article in English | MEDLINE | ID: mdl-36359649

ABSTRACT

The huge amount of power fingerprint data often has the problem of unbalanced categories and is difficult to upload by the limited data transmission rate for IoT communications. An optimized LightGBM power fingerprint extraction and identification method based on entropy features is proposed. First, the voltage and current signals were extracted on the basis of the time-domain features and V-I trajectory features, and a 56-dimensional original feature set containing six entropy features was constructed. Then, the Boruta algorithm with a light gradient boosting machine (LightGBM) as the base learner was used for feature selection of the original feature set, and a 23-dimensional optimal feature subset containing five entropy features was determined. Finally, the Optuna algorithm was used to optimize the hyperparameters of the LightGBM classifier. The classification performance of the power fingerprint identification model on imbalanced datasets was further improved by improving the loss function of the LightGBM model. The experimental results prove that the method can effectively reduce the computational complexity of feature extraction and reduce the amount of power fingerprint data transmission. It meets the recognition accuracy and efficiency requirements of a massive power fingerprint identification system.
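
The power fingerprint abstract builds its feature set partly from entropy features but does not name them. As one hedged example of such a feature, the sketch below computes the Shannon entropy of a signal's amplitude histogram. Choosing Shannon entropy, the bin count, and the function name are all assumptions made for illustration.

```python
import math
from collections import Counter

def shannon_entropy(samples, n_bins=8):
    """Shannon entropy (in bits) of an equal-width amplitude histogram."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant signal carries no amplitude uncertainty
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    bins = Counter(min(int((s - lo) / width), n_bins - 1) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())

# Toy signal for illustration only: four samples spread over four bins.
entropy_bits = shannon_entropy([0.0, 1.0, 2.0, 3.0], n_bins=4)
```

A single scalar like this compresses a whole waveform window, which fits the abstract's aim of reducing the amount of data to transmit.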

16.
Ecotoxicol Environ Saf ; 228: 112996, 2021 Nov 20.
Article in English | MEDLINE | ID: mdl-34814005

ABSTRACT

Rapid identification of heavy metals is of major importance for controlling the production process in the fertilizer industry. This work aimed to use visible and near-infrared spectroscopy (Vis-NIR), Boruta, and deep learning to establish rapid screening methods for heavy metals. The Boruta algorithm was used to select appropriate wavelengths, and a deep belief network (DBN) was used to determine the amounts of heavy metals, namely chromium (Cr), cadmium (Cd), lead (Pb), and mercury (Hg), from both the entire and the selected wavelengths. The reliability of the model was assessed using the coefficient of determination (R²), root mean squared error (RMSE), and residual prediction deviation (RPD). The results for the selected wavelengths were excellent and much better than those for the full wavelengths, with R²p = 0.96, RMSEP = 0.2017 mg kg⁻¹, and RPDpred = 5.0 for Cr; R²p = 0.91, RMSEP = 0.2832 mg kg⁻¹, and RPDpred = 3.4 for Pb; and R²p = 0.90, RMSEP = 0.2992 mg kg⁻¹, and RPDpred = 3.3 for Hg. Decent prediction was also obtained for Cd (R²p = 0.87, RMSEP = 0.3435 mg kg⁻¹, and RPDpred = 2.7). To further assess the robustness of the DBN, it was compared with conventional machine learning methods such as support vector machine for regression (SVR), k-nearest neighbors (KNN), and partial least squares (PLS). The overall results indicated that the Vis-NIR technique coupled with Boruta and DBN could be reliable and accurate for screening heavy metals in organic fertilizers.
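
The three evaluation metrics named in the abstract can be computed as in the sketch below. It assumes the common chemometrics definition of RPD as the sample standard deviation of the reference values divided by the RMSE; the abstract does not state which variant was used, and the data are made up for illustration.

```python
from statistics import stdev


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD) for paired reference and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = (ss_res / n) ** 0.5
    r2 = 1.0 - ss_res / ss_tot
    rpd = stdev(y_true) / rmse  # assumption: sample SD (n - 1) over RMSE
    return r2, rmse, rpd


# Hypothetical reference and predicted concentrations (mg/kg)
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmse, rpd = regression_metrics(y_true, y_pred)
```

A higher RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is reported alongside R² and RMSE.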

17.
Sensors (Basel) ; 21(12)2021 Jun 13.
Article in English | MEDLINE | ID: mdl-34199163

ABSTRACT

In this paper, an explainable AI-based fault diagnosis model for bearings is proposed with five stages: (1) a data preprocessing method based on the Stockwell Transformation Coefficient (STC) is proposed to analyze the vibration signals under variable speed and load conditions, (2) a statistical feature extraction method is introduced to capture the significance from the invariant pattern of the data analyzed by STC, (3) an explainable feature selection process is proposed by introducing a wrapper-based feature selector, Boruta, (4) a feature filtration method is applied on top of the feature selector to avoid the multicollinearity problem, and finally, (5) an additive Shapley explanation followed by k-NN is proposed to diagnose and to explain the individual decisions of the k-NN classifier for debugging the performance of the diagnosis model. Thus, the idea of explainability is introduced for the first time in the field of bearing fault diagnosis in two steps: (a) incorporating explainability into the feature selection process, and (b) interpreting the classifier performance with respect to the selected features. The effectiveness of the proposed model is demonstrated on two different datasets obtained from separate bearing testbeds. Lastly, an assessment of several state-of-the-art fault diagnosis algorithms for rotating machinery is included.
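
Stage (4), filtering the selected features to avoid multicollinearity, might look like the sketch below. The abstract does not specify the filtration rule, so this assumes a simple greedy filter that drops any feature whose absolute Pearson correlation with an already kept feature reaches a threshold. The feature names, values, and the 0.9 threshold are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def drop_collinear(features, threshold=0.9):
    """Keep features, in the given order, whose |r| with every kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical statistical features that survived feature selection
features = {
    "rms": [1.0, 2.0, 3.0, 4.0, 5.0],
    "peak": [2.1, 4.0, 6.2, 7.9, 10.0],      # nearly proportional to rms
    "kurtosis": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_collinear(features)
```

Because the filter is greedy, the order of the input matters: listing features from most to least important keeps the stronger member of each correlated pair.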


Subject(s)
Algorithms , Vibration , Artificial Intelligence , Equipment Failure Analysis
18.
Environ Monit Assess ; 193(4): 162, 2021 Mar 05.
Article in English | MEDLINE | ID: mdl-33665671

ABSTRACT

Understanding the spatial distribution of soil nutrients and the factors affecting their concentration and availability is crucial for soil fertility management and sustainable land utilization, yet the factors affecting soil nitrogen distribution in the Qorveh-Dehgolan plain have rarely been quantified. This study therefore aimed at digital modeling and mapping of the spatial distribution of topsoil total nitrogen (TN) in the Qorveh-Dehgolan plain, which covers an area of 150,000 ha, using random forest (RF), decision tree (DT), and cubist (CB) algorithms. A total of 130 samples were collected from the topsoil at a depth of 0 to 30 cm based on a random sampling pattern. Then, soil physicochemical properties, calcium carbonate equivalent, organic carbon, and topsoil total nitrogen were measured. A total of 51 environmental variables, including 31 geomorphometric attributes derived from a digital elevation model with 12.5-m spatial resolution, 13 spectral indices and reflectance from the SENTINEL-2 satellite (MSI sensor), five soil properties, and the two spatial variables of latitude and longitude, were used as covariates for digital mapping of topsoil total nitrogen. The most appropriate covariates were then selected by the Boruta algorithm in the R software environment. A standard deviation map was produced to show model uncertainty. Covariate selection identified 14 effective covariates for the spatial prediction of topsoil total nitrogen by the data mining algorithms. Validation of the digital mapping of topsoil total nitrogen by the RF, DT, and CB models using 20% of independent data showed root mean square error (RMSE) of 0.032, 0.035, and 0.043%; mean absolute error (MAE) of 0.0008, 0.001, and 0.002%; and coefficients of determination of 0.42, 0.38, and 0.35, respectively. The relative importance (RI) of environmental covariates using the %IncMSE index indicated the importance of two geomorphometric variables, midslope position and normalized height, along with the SAVI and NDVI remote sensing variables, in the spatial modeling and distribution of total nitrogen in the studied lands. The RF prediction and associated uncertainty maps, which show high accuracy and low standard deviation in most of the study area, revealed low overfitting and overtraining in soil-landscape modeling; this model can therefore lead to the development of a digital map of soil surface properties with acceptable accuracy for sustainable land utilization.


Subject(s)
Environmental Monitoring , Nitrogen , Machine Learning , Soil , Uncertainty
19.
Trop Anim Health Prod ; 53(3): 395, 2021 Jul 10.
Article in English | MEDLINE | ID: mdl-34245361

ABSTRACT

BACKGROUND: Assigning animals to their corresponding breeds through breed informative single-nucleotide polymorphisms (SNPs) is required in many fields. For instance, it is used in the traceability and authentication of meat and other livestock products. SNP information for several pig breeds is now accessible thanks to the availability of dense SNP chips. These SNP chips cover a large number of molecular markers distributed across the entire genome. To identify the pig breed from a sample of industrial meat, one must analyze a large panel of genetic markers depending on the SNP chip used. The analysis of such large datasets requires intensive work. This leads to the idea of creating less dense chips of breed informative markers based on a reduced number of SNPs, so that analyzing the genotyping data from these reduced chips requires less time and effort. AIM: The objective of this study is to find the most informative SNPs for discriminating between four pig breeds, namely Duroc, Landrace, Large White, and Pietrain. METHOD: The Illumina Porcine 60K SNP chip was used to genotype SNPs distributed across the individuals' genomes. First, we used three different statistical approaches for feature selection: (i) principal component analysis (PCA), (ii) least absolute shrinkage and selection operator (LASSO), and (iii) random forest (RF). These three approaches identified three sets of SNPs, one per approach. Then, we combined the results of the three methods into a final panel containing the SNPs that appear in all three sets. RESULTS: Separately, each method resulted in a panel with the corresponding most discriminating SNPs. PCA, LASSO, and random forest with the Boruta algorithm highlighted 28,816, 50, and 286 SNPs, respectively. The number of SNPs selected by PCA is high compared to Boruta and LASSO because PCA chooses the variables while preserving as much information about the data as possible. The only downside of LASSO regression is that, among a group of correlated variables, it tends to select only one variable and ignore the others regardless of their importance. In contrast to LASSO, the Boruta algorithm considers the interdependence between SNPs and selects informative variables even if they are correlated and have the same effect. The three panels shared 23 SNPs; the distribution of the individuals according to these SNPs showed individuals of each breed grouped in well-defined clusters without any overlap. CONCLUSIONS: The biological pathways represented by the 23 breed informative SNPs resulting from the combination of PCA, LASSO, and Boruta should be explored in further analysis. The results of our study are promising for further applications of this method in other livestock animals.
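
The final combination step, keeping only the SNPs that appear in all three panels, amounts to a set intersection. The sketch below is a minimal illustration with hypothetical SNP identifiers; the function name is not taken from the study.

```python
def consensus_panel(*panels):
    """Return the sorted SNP identifiers present in every panel."""
    if not panels:
        return []
    shared = set(panels[0])
    for panel in panels[1:]:
        shared &= set(panel)  # keep only SNPs also found in this panel
    return sorted(shared)


# Hypothetical SNP identifiers selected by each method
pca_panel = ["snp_001", "snp_002", "snp_003", "snp_004", "snp_005"]
lasso_panel = ["snp_002", "snp_004", "snp_009"]
boruta_panel = ["snp_002", "snp_003", "snp_004", "snp_010"]

shared = consensus_panel(pca_panel, lasso_panel, boruta_panel)
```

The consensus panel can never be larger than the smallest input panel, which is consistent with the 23 shared SNPs reported against a LASSO panel of 50.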


Subject(s)
Genetics, Population , Polymorphism, Single Nucleotide , Animals , Genetic Markers , Genotype , Oligonucleotide Array Sequence Analysis/veterinary , Swine/genetics
20.
BMC Genomics ; 21(1): 416, 2020 Jun 22.
Article in English | MEDLINE | ID: mdl-32571208

ABSTRACT

BACKGROUND: Recent literature on the differential role of genes within networks distinguishes core from peripheral genes. Although previous work has shown contrasting features between them, whether such categorization matters for phenotype prediction remains to be studied. RESULTS: We measured 17 phenotypic traits for 241 cloned genotypes from a Populus nigra collection, covering growth, phenology, and chemical and physical properties. We also sequenced RNA for each genotype and built co-expression networks to define core and peripheral genes. We found that cores were more differentiated between populations than peripherals while being less variable, suggesting that they have been constrained through potentially divergent selection. We also showed that while cores were overrepresented in a subset of genes statistically selected (by the Boruta algorithm) for their capacity to predict the phenotypes, they did not systematically predict better than peripherals or even random genes. CONCLUSION: Our work is the first attempt to assess the importance of co-expression network connectivity in phenotype prediction. While highly connected core genes appear to be important, they do not bear enough information to systematically predict quantitative traits better than other gene sets.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Populus/growth & development , Gene Expression Regulation, Developmental , Gene Expression Regulation, Plant , Genotype , Machine Learning , Phenotype , Plant Proteins/genetics , Populus/genetics , Quantitative Trait Loci , Sequence Analysis, RNA