Results 1 - 20 of 128
1.
Development ; 149(16)2022 08 15.
Article in English | MEDLINE | ID: mdl-35972204

ABSTRACT

Cell division and the resulting changes to the cell organization affect the shape and functionality of all tissues. Thus, understanding the determinants of the tissue-wide changes imposed by cell division is a key question in developmental biology. Here, we use a network representation of live cell imaging data from shoot apical meristems (SAMs) in Arabidopsis thaliana to predict cell division events and their consequences at the tissue level. We show that a support vector machine classifier based on the SAM network properties is predictive of cell division events, with test accuracy of 76%, which matches that based on cell size alone. Furthermore, we demonstrate that the combination of topological and biological properties, including cell size, perimeter, distance and shared cell wall between cells, can further boost the prediction accuracy of resulting changes in topology triggered by cell division. Using our classifiers, we demonstrate the importance of microtubule-mediated cell-to-cell growth coordination in influencing tissue-level topology. Together, the results from our network-based analysis demonstrate a feedback mechanism between tissue topology and cell division in A. thaliana SAMs.
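To illustrate the kind of network representation described above, the sketch below treats each cell as a node and each shared cell wall as an edge, then derives two simple topological features per cell. It is a minimal example under stated assumptions, not the authors' pipeline: the toy adjacency list and the feature names `degree` and `mean_neighbour_degree` are hypothetical, and the study's support vector machine classifier is not reproduced.

```python
from collections import defaultdict

def build_cell_graph(shared_walls):
    """Build an undirected cell-adjacency graph.

    shared_walls: iterable of (cell_a, cell_b) pairs that share a wall.
    Returns a dict mapping each cell to the set of its neighbours.
    """
    graph = defaultdict(set)
    for a, b in shared_walls:
        if a != b:  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def node_features(graph):
    """Per-cell topological features (hypothetical choice): degree and mean neighbour degree."""
    features = {}
    for cell, neighbours in graph.items():
        degree = len(neighbours)
        mean_nb = sum(len(graph[n]) for n in neighbours) / degree if degree else 0.0
        features[cell] = {"degree": degree, "mean_neighbour_degree": mean_nb}
    return features

# Hypothetical toy tissue: cell 0 is surrounded by cells 1-4, which form a ring.
walls = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (4, 1)]
feats = node_features(build_cell_graph(walls))
```

Per-cell feature vectors of this kind could then be fed, together with biological properties such as cell size, to any standard classifier.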


Subject(s)
Arabidopsis Proteins , Arabidopsis , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Cell Division , Cell Wall/metabolism , Gene Expression Regulation, Plant , Meristem/metabolism
2.
Int J Mol Sci ; 25(8)2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38673888

ABSTRACT

Urease, a pivotal enzyme in nitrogen metabolism, plays a crucial role in various microorganisms, including the pathogenic Helicobacter pylori. Inhibiting urease activity offers a promising approach to combating infections and associated ailments, such as chronic kidney diseases and gastric cancer. However, identifying potent urease inhibitors remains challenging due to resistance issues that hinder traditional approaches. Recently, machine learning (ML)-based models have demonstrated the ability to predict the bioactivity of molecules rapidly and effectively. In this study, we present ML models designed to predict urease inhibitors by leveraging essential physicochemical properties. The methodological approach involved constructing a dataset of urease inhibitors through an extensive literature search. Subsequently, these inhibitors were characterized based on physicochemical properties calculations. An exploratory data analysis was then conducted to identify and analyze critical features. Ultimately, 252 classification models were trained, utilizing a combination of seven ML algorithms, three attribute selection methods, and six different strategies for categorizing inhibitory activity. The investigation unveiled discernible trends distinguishing urease inhibitors from non-inhibitors. This differentiation enabled the identification of essential features that are crucial for precise classification. Through a comprehensive comparison of ML algorithms, tree-based methods like random forest, decision tree, and XGBoost exhibited superior performance. Additionally, incorporating the "chemical family type" attribute significantly enhanced model accuracy. Strategies involving a gray-zone categorization demonstrated marked improvements in predictive precision. This research underscores the transformative potential of ML in predicting urease inhibitors. 
The meticulous methodology outlined herein offers actionable insights for developing robust predictive models within biochemical systems.


Subject(s)
Enzyme Inhibitors , Machine Learning , Urease , Urease/antagonists & inhibitors , Urease/chemistry , Urease/metabolism , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Helicobacter pylori/enzymology , Helicobacter pylori/drug effects , Algorithms , Humans
3.
Environ Sci Technol ; 57(49): 20636-20646, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38011382

ABSTRACT

Cyanobacterial harmful algal blooms (CyanoHABs) pose serious risks to inland water resources. Despite advancements in our understanding of associated environmental factors and modeling efforts, predicting CyanoHABs remains challenging. Leveraging an integrated water quality data collection effort in Iowa lakes, this study aimed to identify factors associated with hazardous microcystin levels and develop one-week-ahead predictive classification models. Using water samples from 38 Iowa lakes collected between 2018 and 2021, feature selection was conducted considering both linear and nonlinear properties. Subsequently, we developed three model types (Neural Network, XGBoost, and Logistic Regression) with different sampling strategies using the nine selected variables (mcyA_M, TKN, % hay/pasture, pH, mcyA_M:16S, % developed, DOC, dewpoint temperature, and ortho-P). Evaluation metrics demonstrated the strong performance of the Neural Network with oversampling (ROC-AUC 0.940, accuracy 0.861, sensitivity 0.857, specificity 0.857, LR+ 5.993, and 1/LR- 5.993), as well as the XGBoost with downsampling (ROC-AUC 0.944, accuracy 0.831, sensitivity 0.928, specificity 0.833, LR+ 5.557, and 1/LR- 11.569). This study illustrates the intricacies of modeling with limited data and class imbalance, underscoring the importance of continuous monitoring and data collection for improving predictive accuracy. The methodologies employed may also serve as useful references for researchers tackling similar challenges in other environments.
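The two likelihood ratios quoted above follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and 1/LR- = specificity / (1 - sensitivity). The short check below uses only the rounded sensitivity and specificity values from the abstract and reproduces the reported ratios to three decimals.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, 1/LR-) for a binary classifier.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity, so 1/LR- = specificity / (1 - sensitivity)
    """
    lr_pos = sensitivity / (1.0 - specificity)
    inv_lr_neg = specificity / (1.0 - sensitivity)
    return lr_pos, inv_lr_neg

nn = likelihood_ratios(0.857, 0.857)   # Neural Network with oversampling
xgb = likelihood_ratios(0.928, 0.833)  # XGBoost with downsampling
print([round(v, 3) for v in nn], [round(v, 3) for v in xgb])
# → [5.993, 5.993] [5.557, 11.569]
```

Because 1/LR- grows as sensitivity approaches 1, the XGBoost model's higher sensitivity (0.928) explains why its 1/LR- is roughly twice that of the Neural Network even though their LR+ values are similar.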


Subject(s)
Cyanobacteria , Harmful Algal Bloom , Lakes/microbiology , Iowa
4.
Mol Divers ; 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37479824

ABSTRACT

In this study, we built classification models using machine learning techniques to predict the bioactivity of non-covalent inhibitors of Bruton's tyrosine kinase (BTK) and to provide interpretable and transparent explanations for these predictions. To achieve this, we gathered data on BTK inhibitors from the Reaxys and ChEMBL databases and removed covalent inhibitors and duplicates, obtaining a dataset of 3895 non-covalent inhibitors. These inhibitors were characterized using MACCS fingerprints and Morgan fingerprints, and four traditional machine learning algorithms (decision trees (DT), random forests (RF), support vector machines (SVM), and extreme gradient boosting (XGBoost)) were used to build 16 classification models. In addition, four deep learning models were developed using deep neural networks (DNN). The best model, Model D_4, built using XGBoost and MACCS fingerprints, achieved an accuracy of 94.1% and a Matthews correlation coefficient (MCC) of 0.75 on the test set. To provide interpretable explanations, we employed the SHAP method to decompose the predicted values into the contributions of each feature. We also used K-means and hierarchical clustering to visualize how the inhibitors' molecular structures cluster. The results were validated against crystal structures: the interactions between BTK amino acid residues and the important features of the clustered scaffolds were consistent with the known properties of the complex crystal structures. Overall, our models demonstrated high predictive ability, and SHAP allowed the qualitative model to be converted, to some extent, into a quantitative one, making the models valuable for guiding the design of new BTK inhibitors with the desired activity.
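Several of these studies report MCC alongside accuracy because accuracy alone can look high on an imbalanced test set. The sketch below computes both from binary confusion-matrix counts; the counts are hypothetical and are not taken from the paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any row or column total is zero.
    return numerator / denominator if denominator else 0.0

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical imbalanced test set: 25 actives, 75 inactives.
counts = dict(tp=20, tn=70, fp=5, fn=5)
print(accuracy(**counts), round(matthews_corrcoef(**counts), 3))  # → 0.9 0.733
```

Here 90% accuracy corresponds to an MCC of only about 0.73, because MCC weighs errors on the minority (active) class more heavily than raw accuracy does.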

5.
Mol Divers ; 27(3): 1037-1051, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35737257

ABSTRACT

Histone deacetylase (HDAC) 1, a member of the histone deacetylase family, plays a pivotal role in various tumors. In this study, we collected 7313 human HDAC1 inhibitors with bioactivities to form a dataset. The dataset was then divided into a training set and a test set using two splitting methods: (1) Kohonen's self-organizing map and (2) random splitting. The molecular structures were represented by MACCS fingerprints, RDKit fingerprints, topological torsion fingerprints and ECFP4 fingerprints. A total of 80 classification models were built using five machine learning methods: decision tree (DT), random forest, support vector machine, eXtreme Gradient Boosting (XGBoost) and deep neural network. Model 15A_2, built by the XGBoost algorithm on ECFP4 fingerprints, showed the best performance, with an accuracy of 88.08% and an MCC of 0.76 on the test set. Finally, we clustered the 7313 HDAC1 inhibitors into 31 subsets and investigated the substructural features of each subset. Moreover, we analyzed the structure-activity relationship of HDAC1 inhibitors using the DT algorithm. The results suggest that some substructures have a significant effect on high activity, such as N-(2-amino-phenyl)-benzamide, benzimidazole, AR-42 analogues, hydroxamic acid with a mid-chain alkyl, and 4-aryl imidazole with a mid-chain alkyl whose α carbon is chiral.


Subject(s)
Algorithms , Machine Learning , Humans , Structure-Activity Relationship , Molecular Structure , Support Vector Machine , Histone Deacetylase 1
6.
Mol Divers ; 2023 May 05.
Article in English | MEDLINE | ID: mdl-37142889

ABSTRACT

FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase and an important target for anti-cancer therapy. In this work, we conducted a structure-activity relationship (SAR) study on 3867 FLT3 inhibitors that we collected. MACCS fingerprints, ECFP4 fingerprints, and topological torsion (TT) fingerprints were used to represent the inhibitors in the dataset. A total of 36 classification models were built based on support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and deep neural network (DNN) algorithms. Model 3D_3, built with a DNN and TT fingerprints, performed best on the test set, with the highest prediction accuracy (85.83%) and a Matthews correlation coefficient (MCC) of 0.72, and also performed well on the external test set. In addition, we clustered the 3867 inhibitors into 11 subsets with the K-Means algorithm to characterize the structural features of the reported FLT3 inhibitors. Finally, we analyzed the SAR of FLT3 inhibitors with the RF algorithm based on ECFP4 fingerprints. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycle, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl were typical fragments among highly active inhibitors. Furthermore, three scaffolds in Subset_A (Subset 4), Subset_B, and Subset_C showed a significant relationship with inhibitory activity against FLT3.

7.
Multivariate Behav Res ; 58(3): 580-597, 2023.
Article in English | MEDLINE | ID: mdl-35507677

ABSTRACT

Diagnostic classification models (DCMs) are psychometric models for evaluating a student's mastery of the essential skills in a content domain based on their responses to a set of test items. Misspecification of the diagnostic model and/or the Q-matrix is a known problem with limited avenues for remediation. To address this problem, this paper defines a one-sided score statistic, a computationally efficient method for detecting under-specification at the item level of both the Q-matrix and the model parameters of the particular DCM chosen in an analysis. This method is analogous to the modification indices widely used in structural equation modeling. The results of a simulation study show that the Type I error rate of modification indices for DCMs is acceptably close to the nominal significance level when the appropriate mixture χ2 reference distribution is used. The simulation results also indicate that modification indices are very powerful in detecting an under-specified Q-matrix and have ample power to detect the omission of model parameters in large samples or when the items are highly discriminating. An application of modification indices for DCMs to response data from a large-scale administration of a diagnostic test demonstrates how they can be useful in diagnostic model refinement.
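A one-sided score test for a parameter on the boundary of its space is referred to a mixture of χ2 distributions rather than a plain χ2. The sketch below assumes the common single-parameter case, a 50:50 mixture of χ2 with 0 df (a point mass at zero) and χ2 with 1 df; the mixture used for each modification index in the paper may differ, so treat this as an illustration only.

```python
import math

def mixture_chi2_pvalue(t):
    """Upper-tail p-value of a one-sided statistic under 0.5*chi2_0 + 0.5*chi2_1 (assumed mixture)."""
    if t <= 0:
        return 1.0  # the chi2_0 component is a point mass at zero, so P(T >= 0) = 1
    # For a chi-square variable with 1 df, P(X > t) = erfc(sqrt(t / 2)).
    return 0.5 * math.erfc(math.sqrt(t / 2.0))

# 2.705543 is the 0.90 quantile of chi2_1, so under the mixture the p-value is about 0.05.
p = mixture_chi2_pvalue(2.705543)
```

In practice this halves the naive χ2 with 1 df p-value; using the plain χ2 reference instead would make the test conservative, which is why the choice of reference distribution matters for the Type I error rate.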


Subject(s)
Computer Simulation , Humans , Psychometrics/methods , Latent Class Analysis
8.
Sensors (Basel) ; 23(4)2023 Feb 09.
Article in English | MEDLINE | ID: mdl-36850564

ABSTRACT

With the rise of social networks and the introduction of data protection laws, companies are training machine learning models on data generated locally by their users or customers on various types of devices. The data may include sensitive information such as family information, medical records, personal habits, or financial records that could cause harm if leaked. For this reason, this paper introduces a protocol for training Multi-Layer Perceptron (MLP) neural networks by combining federated learning and homomorphic encryption, in which the data are distributed across multiple clients and data privacy is preserved. The proposal was validated by running several simulations using a dataset for a multi-class classification problem, different MLP neural network architectures, and different numbers of participating clients. The results are reported for several metrics in the local and federated settings, and a comparative analysis is carried out. Additionally, the privacy guarantees of the proposal are formally analyzed under a set of defined assumptions, and the added value of the proposed protocol is identified in comparison with previous work in the same area.
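The abstract does not spell out the aggregation rule, so the sketch below assumes the widely used federated averaging step, in which each client's MLP parameters (flattened into a hypothetical list of floats) are weighted by its number of training samples. It is shown in plaintext only; in the proposed protocol this aggregation would operate on homomorphically encrypted values, which is omitted here.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg-style, assumed).

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes: number of local training samples per client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[k] * s for w, s in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]

# Two hypothetical clients holding 1 and 3 samples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because the global update is a weighted sum of client updates, an additively homomorphic scheme can in principle compute it without the server seeing any individual client's parameters.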

9.
Int J Mol Sci ; 24(12)2023 Jun 19.
Article in English | MEDLINE | ID: mdl-37373474

ABSTRACT

There is early evidence of extraocular systemic signals affecting function and morphology in neovascular age-related macular degeneration (nAMD). The prospective, cross-sectional BIOMAC study is an exploratory investigation of peripheral blood proteome profiles and matched clinical features to uncover systemic determinants of nAMD under anti-vascular endothelial growth factor intravitreal therapy (anti-VEGF IVT). It includes 46 nAMD patients stratified by the level of disease control under ongoing anti-VEGF treatment. Proteomic profiles in peripheral blood samples from every patient were measured by LC-MS/MS mass spectrometry. The patients underwent extensive clinical examination with a focus on macular function and morphology. The in silico analysis included unbiased dimensionality reduction and clustering, a subsequent annotation of clinical features, and non-linear models for recognizing underlying patterns. Model assessment was performed using leave-one-out cross-validation. The findings provide an exploratory demonstration of the link between systemic proteomic signals and macular disease patterns using, and validating, non-linear classification models. Three main results were obtained: (1) Proteome-based clustering identifies two distinct patient subclusters, the smaller of which (n = 10) exhibits a strong signature of oxidative stress response. Matching the relevant meta-features at the individual patient level identifies pulmonary dysfunction as an underlying health condition in these patients. (2) We identify biomarkers for nAMD disease features, with Aldolase C as a putative factor associated with superior disease control under ongoing anti-VEGF treatment. (3) Apart from this, isolated protein markers are only weakly correlated with nAMD disease expression. In contrast, a non-linear classification model identifies complex molecular patterns, hidden in a high number of proteomic dimensions, that determine macular disease expression.
In conclusion, previously unconsidered systemic signals in the peripheral blood proteome contribute to the clinically observed phenotype of nAMD and should be examined in future translational research on AMD.
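Leave-one-out cross-validation, used above for model assessment, trains on all patients but one, predicts the held-out patient, and repeats for every patient. The sketch below shows the procedure with a simple nearest-centroid classifier as a stand-in; the study used non-linear models that are not reproduced here, and the toy data are hypothetical.

```python
def fit_centroids(X, y):
    """Nearest-centroid 'training': mean feature vector for each class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the remaining n - 1, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)

# Hypothetical 1-D feature for five samples in two groups.
acc = leave_one_out_accuracy([[0], [1], [2], [10], [11]], [0, 0, 1, 1, 1])
```

With only 46 patients, LOOCV uses as much data as possible for each fit, at the cost of higher-variance accuracy estimates than k-fold schemes with larger held-out folds.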


Subject(s)
Angiogenesis Inhibitors , Macular Degeneration , Humans , Angiogenesis Inhibitors/therapeutic use , Ranibizumab/therapeutic use , Vascular Endothelial Growth Factor A/metabolism , Proteome , Prospective Studies , Chromatography, Liquid , Cross-Sectional Studies , Proteomics , Tandem Mass Spectrometry , Macular Degeneration/drug therapy , Phenotype
10.
Molecules ; 28(15)2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37570667

ABSTRACT

This study aimed to develop an analytical method to determine the geographical origin of Moroccan Argan oil from near-infrared (NIR) or mid-infrared (MIR) spectroscopic fingerprints. Classification may be problematic, however, because the components of the samples have similar spectra. Therefore, unsupervised and supervised classification methods, including principal component analysis (PCA), Partial Least Squares-Discriminant Analysis (PLS-DA) and Soft Independent Modeling of Class Analogy (SIMCA), were evaluated for distinguishing between Argan oils from four regions. The spectra of 93 samples were acquired and preprocessed using both standard preprocessing methods and multivariate filters, such as External Parameter Orthogonalization, Generalized Least Squares Weighting and Orthogonal Signal Correction, to improve the models. Model performance was evaluated by accuracy, precision, sensitivity, and selectivity. SIMCA and PLS-DA models generated after standard preprocessing failed to correctly classify all samples, whereas successful models were produced after applying multivariate filters. The NIR and MIR classification models showed equivalent accuracy. The PLS-DA models outperformed SIMCA, with 100% accuracy, specificity, sensitivity and precision. In conclusion, the studied multivariate filters can be applied to spectroscopic fingerprints to identify the geographical origin of Argan oils in routine monitoring, significantly reducing analysis costs and time.

11.
Curr Psychol ; : 1-14, 2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36684455

ABSTRACT

Traditionally, the selection of teacher candidates has emphasized the assessment of subject-matter and pedagogical knowledge using psychometric methodologies, which simply place candidates on continuous scales and require large samples. However, these methods do not allow candidates' knowledge profiles and learning paths to be identified, information that would help develop programs tailored to support students in their training. In this study, an evaluation instrument was developed using a nonparametric approach to diagnostic classification modeling and was then validated on a sample of 119 participants. This instrument allows candidates' initial knowledge to be disaggregated and relationships between its components to be established. The results showed that candidates present a variety of profiles, which may involve more than one attribute. The instrument not only provides a score that can be used in selection processes but also yields useful information for initial teacher training.

12.
Educ Inf Technol (Dordr) ; 28(6): 6825-6844, 2023.
Article in English | MEDLINE | ID: mdl-36465419

ABSTRACT

Open educational resources (OER) can be cost-effective alternatives to traditional textbooks that allow higher education faculty to decrease student spending on textbooks. To further advocate college instructors' use of OER, it is necessary to understand their value beliefs towards integrating OER in teaching, but this understanding is currently absent. This study thus analyzed 513 college instructors' value beliefs about using OER in college teaching by applying psychometric models known as diagnostic classification models (DCMs). The findings validated the three value-belief constructs measured by an OER user survey: engaging students, customizing classroom materials and supporting personal professional development. The results showed that a considerable number of college instructors maintained a low level of value beliefs towards using OER. We further provided an individualized classification of each college instructor in terms of the three types of value beliefs. In addition, this study investigated how pre-determined latent classes of value beliefs influenced college instructors' practice and perception of using OER. In particular, college instructors who value OER for addressing their professional needs are more likely to adapt OER in their teaching rather than merely reuse existing copies. Practical implications for supporting higher education faculty's use of OER are discussed at the end.

13.
Br J Psychiatry ; 220(4): 169-171, 2022 04.
Article in English | MEDLINE | ID: mdl-35354505

ABSTRACT

In this BJPsych special issue on precision medicine, machine-learning techniques are used in attempts to create statistical models that make clinically relevant predictions for individual patients. In this primer, we outline five key points that can help a new reader engage with the field and evaluate the literature. These points include why we are interested in new statistical approaches, how these approaches may produce individualised predictions, which caveats need to be kept in mind, and why the interest and engagement of clinicians and clinical researchers are critical to successful model development and implementation. We hope that this primer will provide a shared understanding that encourages dialogue between the clinical and methodological fields.


Subject(s)
Machine Learning , Models, Statistical , Humans , Precision Medicine
14.
Mol Divers ; 26(3): 1531-1543, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34345964

ABSTRACT

The EGFR kinase pathway is one of the most frequently activated signaling pathways in human cancers. EGFR and HER2 are two significant members of this pathway and are attractive drug targets of clinical relevance in lung and breast cancer. Identifying EGFR- and HER2-specific inhibitors is therefore an important challenge in cancer drug discovery. To address this issue, a dataset of 519 compounds with inhibitory activity against both isoforms, i.e., EGFR and HER2, was collected from the literature and used to develop a knowledge-based computational classification model that predicts the specificity of a molecule for an isoform (EGFR/HER2) with precision. A total of seventy-two classification models were developed using nine fingerprint types, four classifiers (IBK, NB, SMO and RF) and two different datasets (EGFR- and HER2-isoform specific). Models developed using random forest and IBK performed better for the EGFR- and HER2-specific datasets, respectively. Scaffold and functional group analysis led to the identification of prevalent cores and fragments in each dataset. The accuracy of the selected best-performing models was also evaluated using a decoy dataset. We have also developed an application, EGFRisopred, which integrates the best-performing models and allows the user to predict the specificity of a compound as an EGFR-/HER2-specific anticancer agent. We expect that the tool's availability as a free utility will help researchers identify new inhibitors against these important cancer targets.


Subject(s)
Antineoplastic Agents , Breast Neoplasms , Receptor, ErbB-2/antagonists & inhibitors , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Breast Neoplasms/drug therapy , ErbB Receptors , Female , Humans , Machine Learning , Protein Isoforms
15.
Mol Divers ; 26(2): 1227-1242, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34347229

ABSTRACT

The dormant or latent form of Mycobacterium tuberculosis (MTB) is not killed by conventional antitubercular drugs. Treating latent TB is essential to reduce the treatment period as well as the incidence of drug resistance. Against this background, we attempted to develop quantitative structure-activity relationship (QSAR) models (regression- and classification-based) against the dormant form of MTB and then used the developed classifier models (linear discriminant analysis (LDA) and random forest (RF)) for two-fold classification. The rationale for applying two-fold classification to MTB modeling is to increase confidence in correct classification. The 2D-QSAR modeling suggested that Burden eigenvalue, edge adjacency, van der Waals (vdW) surface area, topological charge, and pharmacophoric indices contribute to predicting antitubercular activity against dormant MTB. The prediction quality for the training and test sets was found to be moderate and good, respectively, according to mean absolute error (MAE)-based criteria. The LDA and RF models revealed the importance of Burden eigenvalue, edge adjacency, Geary autocorrelation, and drug-like indices as discriminating features for separating the antitubercular compounds into higher- and lower-activity groups. The LDA model showed classification accuracies of 85.14% and 87.10% for the training and test sets, respectively, while the RF model exhibited accuracies of 100.00% and 80.65% for the two sets. The descriptors selected in the final models are only two-dimensional (2D); they are easy to compute and do not require the computationally expensive steps of structure conversion, optimization, and energy minimization that are mandatory before computing 3D descriptors. These models could be used to identify and select more active compounds against the dormant form of MTB.


Subject(s)
Mycobacterium tuberculosis , Antitubercular Agents/chemistry , Antitubercular Agents/pharmacology , Quantitative Structure-Activity Relationship , Triazoles
16.
J Appl Toxicol ; 42(11): 1766-1776, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35653511

ABSTRACT

Fish are among the model animals used to evaluate the adverse effects of chemicals released into the ecosystem. However, the low throughput and relatively high cost of fish testing make it impossible to test every new chemical in manufacture. Hence, in silico models that prioritize compounds for testing have been widely applied in environmental risk assessment and drug discovery. In this study, we constructed local predictive models for four fish species (bluegill sunfish, rainbow trout, fathead minnow, and sheepshead minnow) as well as global models using the data for all four species. A total of 1874 unique compounds with their labels, that is, toxic (LC50 < 10 ppm) or nontoxic, were collected from ECOTOX and the literature. Both conventional machine learning methods and a deep learning architecture, the graph convolutional network (GCN), were used to build predictive models. The classification accuracy of the best local model for each fish species was higher than 0.83. For the global models, two strategies, consistency prediction and a probability threshold, were adopted to improve predictive capability at the cost of a narrower applicability domain. For the 63% of compounds within the domain, the accuracy was around 0.97. Comparing the deep learning and machine learning methods, we found that the single-task GCN showed specific advantages in performance, whereas the multitask GCN showed no advantages over conventional machine learning methods. The data and models are available on GitHub (https://github.com/ChemPredict/ChemicalAquaticToxicity).
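The abstract names two strategies for trading coverage for accuracy but does not give their details. The sketch below is one plausible reading, not the authors' implementation: a compound is predicted only when P(toxic) falls outside a hypothetical uncertainty band (0.2 to 0.8 here), or only when several models agree unanimously; everything else is treated as outside the applicability domain.

```python
def thresholded_predictions(probs, low=0.2, high=0.8):
    """Map P(toxic) to 1 (toxic), 0 (nontoxic) or None (abstain). Thresholds are hypothetical."""
    out = []
    for p in probs:
        if p >= high:
            out.append(1)
        elif p <= low:
            out.append(0)
        else:
            out.append(None)  # too uncertain: outside the applicability domain
    return out

def consistent_predictions(per_model_preds):
    """Keep a compound's label only if every model agrees on it; otherwise None."""
    return [labels[0] if len(set(labels)) == 1 else None
            for labels in zip(*per_model_preds)]

def coverage_and_accuracy(preds, labels):
    """Fraction of compounds predicted, and accuracy on that covered subset."""
    covered = [(p, t) for p, t in zip(preds, labels) if p is not None]
    coverage = len(covered) / len(preds)
    accuracy = sum(p == t for p, t in covered) / len(covered) if covered else float("nan")
    return coverage, accuracy

# Hypothetical probabilities and true labels for five compounds.
preds = thresholded_predictions([0.95, 0.1, 0.5, 0.85, 0.05])
coverage, acc = coverage_and_accuracy(preds, [1, 0, 1, 0, 0])
```

Widening the abstention band (or requiring more models to agree) typically lowers coverage and raises accuracy on the retained compounds, which is the trade-off behind the reported 63% coverage and roughly 0.97 accuracy.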


Subject(s)
Cyprinidae , Deep Learning , Animals , Ecosystem , Lethal Dose 50 , Machine Learning
17.
Int J Audiol ; 61(6): 515-519, 2022 06.
Article in English | MEDLINE | ID: mdl-34182868

ABSTRACT

OBJECTIVE: To our knowledge, there is no published study investigating the characteristics of people experiencing tinnitus in Albania. Such a study would be important, providing the basis for further research in this region and contributing to a wider understanding of tinnitus heterogeneity across different geographic locations. The main objective of this study was to develop an Albanian translation of a standardised questionnaire for tinnitus research, namely the European School for Interdisciplinary Tinnitus Research-Screening Questionnaire (ESIT-SQ). A secondary objective was to assess its applicability and usefulness by conducting an exploratory survey on a small sample of the Albanian tinnitus population. DESIGN AND STUDY SAMPLE: Three translators were recruited to create the Albanian ESIT-SQ translation following good practice guidelines. Using this questionnaire, data from 107 patients attending otolaryngology clinics in Albania were collected. RESULTS: Participants reporting various degrees of tinnitus symptom severity had distinct phenotypic characteristics. Application of a random forest approach on this preliminary dataset showed that self-reported hearing difficulty, and tinnitus duration, pitch and temporal manifestation were important variables for predicting tinnitus symptom severity. CONCLUSIONS: Our study provided an Albanian translation of the ESIT-SQ and demonstrated that it is a useful tool for tinnitus profiling and subgrouping.


Subject(s)
Hearing Loss , Tinnitus , Humans , Self Report , Surveys and Questionnaires , Tinnitus/diagnosis , Tinnitus/epidemiology , Translations
18.
Multivariate Behav Res ; 57(5): 784-803, 2022.
Article in English | MEDLINE | ID: mdl-34061682

ABSTRACT

The information matrix, or its inverse, the variance-covariance matrix of the maximum likelihood estimates of the model parameters, plays a key role in statistical inference for diagnostic classification models. Although both the item and the structural parameters should be included simultaneously when calculating the information matrix, previous studies have mainly focused on the performance of item parameter standard error (SE) estimation, and no study has systematically investigated methods for estimating structural parameter SEs. In this study, we propose a class of structural parameter SE estimation methods based on the empirical cross-product matrix, the observed information matrix, and the sandwich-type covariance matrix. A simulation study was conducted under different attribute hierarchy structures; the findings suggest that the proposed methods are useful for empirical researchers and practitioners in evaluating the variability of structural parameter estimators. We illustrate the application of the structural parameter SE estimation methods to exploring the presence of an attribute hierarchy using real data.


Subject(s)
Models, Statistical , Computer Simulation , Likelihood Functions
19.
Molecules ; 27(6)2022 Mar 08.
Article in English | MEDLINE | ID: mdl-35335117

ABSTRACT

Dual-specificity tyrosine phosphorylation-regulated kinase 1A (DYRK1A) has been regarded as a potential therapeutic target for neurodegenerative diseases, and considerable progress has been made in the discovery of DYRK1A inhibitors. Identifying pharmacophoric fragments provides valuable information for the structure- and fragment-based design of potent and selective DYRK1A inhibitors. In this study, seven machine learning methods along with five molecular fingerprints were employed to develop qualitative classification models of DYRK1A inhibitors, which were evaluated by cross-validation, a test set, and an external validation set using four performance indicators: predictive classification accuracy (CA), the area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), and balanced accuracy (BA). The PubChem fingerprint with support vector machine model (CA = 0.909, AUC = 0.933, MCC = 0.717, BA = 0.855) and the PubChem fingerprint with artificial neural network model (CA = 0.862, AUC = 0.911, MCC = 0.705, BA = 0.870) were considered the optimal models for the training set and test set, respectively. A hybrid data balancing method, SMOTETL, which combines the synthetic minority over-sampling technique (SMOTE) and Tomek link (TL) algorithms, was applied to explore the impact of balanced learning on model performance. Based on frequency analysis and information gain, pharmacophoric fragments related to DYRK1A inhibition were also identified. Together, the results provide theoretical support and clues for the screening and design of novel DYRK1A inhibitors.
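In the SMOTE plus Tomek link combination, the Tomek link step cleans up class overlap: two samples of opposite classes form a Tomek link when each is the other's nearest neighbour. The sketch below shows only that cleaning step, in pure Python, under the assumption of squared Euclidean distance and a variant that drops just the majority-class member of each link; it is not the authors' SMOTETL code, and the toy data are hypothetical.

```python
def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbour(X, i):
    """Index of the nearest other sample to X[i] (squared Euclidean distance)."""
    return min((j for j in range(len(X)) if j != i), key=lambda j: _sq_dist(X[i], X[j]))

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    nn = [nearest_neighbour(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_links(X, y, majority_label):
    """Drop the majority-class member of each Tomek link (one common cleaning variant)."""
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

# Hypothetical 1-D data: samples 1 and 2 are opposite-class mutual nearest neighbours.
X = [[0.0], [1.0], [1.1], [3.0], [4.0]]
y = [0, 0, 1, 1, 1]
X_clean, y_clean = remove_tomek_links(X, y, majority_label=1)
```

In a full SMOTE plus Tomek pipeline, SMOTE would first add synthetic minority samples by interpolating between minority neighbours, and this cleaning step would then remove borderline pairs that the oversampling may have created.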


Subject(s)
Machine Learning , Support Vector Machine , Algorithms
20.
Entropy (Basel) ; 24(9)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36141147

ABSTRACT

Atrial fibrillation (AF) is the most common cardiac arrhythmia, and in response to increasing clinical demand, a variety of signals and indices have been used for its analysis, including complex fractionated atrial electrograms (CFAEs). New methodologies have been developed to characterize the atrial substrate, along with straightforward classification models to discriminate between paroxysmal and persistent AF (ParAF vs. PerAF). Yet most previous works have overlooked the assessment of CFAE signal quality, as well as the stability of CFAEs over time and between different recording locations. As a consequence, an atrial substrate assessment may be unreliable or inaccurate. The objectives of this work are, on the one hand, to apply a reduced set of nonlinear indices to CFAEs recorded from ParAF and PerAF patients to assess intra-recording and intra-patient stability and, on the other hand, to generate a simple classification model to discriminate between the two groups. The analyzed indices are the dominant frequency (DF), AF cycle length, sample entropy (SE), and the determinism (DET) of Recurrence Quantification Analysis, along with the coefficient of variation (CV), which is used to quantify the corresponding alterations. The analysis of intra-recording stability revealed that discarding noisy or artifacted CFAE segments caused a significant variation in CV(%) for DET and SE at every segment length, with larger decreases for longer segments. Intra-patient stability showed large variations in CV(%) for DET and even larger ones for SE at every segment length. To discern ParAF from PerAF, correlation matrix filters were employed to remove redundant information and Random Forests to rank the variables by relevance, while coarse tree models were built by optimally combining high-ranked indices and tested with leave-one-out cross-validation.
The best classification performance, an accuracy (Acc) of 88.3% for discriminating ParAF from PerAF, was obtained by combining SE and DF, while the highest single-index Acc was provided by DET, reaching 82.2%. This work has demonstrated that, owing to the high variability of CFAE data, averaging over one recording place or across different recording places, as is traditionally done, may lead to an unfair oversimplification of CFAE-based atrial substrate characterization. Furthermore, careful selection of reduced feature sets as input to simple classification models helps to accurately discern the CFAEs of ParAF from those of PerAF.
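Two of the quantities above can be made concrete in a few lines: CV(%) = 100 × standard deviation / mean, and sample entropy SampEn(m, r) = -ln(A/B), where B counts pairs of length-m templates that match within tolerance r (Chebyshev distance, self-matches excluded) and A counts the same for length m + 1. The sketch below is a minimal illustration assuming the sample standard deviation for CV, an absolute tolerance r, and the textbook SampEn definition; the paper's exact parameter choices (m, r, segment lengths) are not given here.

```python
import math
import statistics

def coefficient_of_variation(values):
    """CV(%) = 100 * standard deviation / mean (sample standard deviation assumed)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def sample_entropy(signal, m=2, r=0.2):
    """SampEn(m, r) = -ln(A / B) with Chebyshev distance and no self-matches.

    r is an absolute tolerance here; it is often set to 0.2 times the signal's SD.
    """
    n = len(signal)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)

regular = sample_entropy([0, 1] * 5)  # perfectly periodic signal
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

A perfectly regular signal gives SampEn = 0 because every template that matches at length m still matches at length m + 1; more irregular CFAE segments yield larger values, which is why segment selection and length can shift SE noticeably.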
