Results 1 - 20 of 129
1.
Comput Biol Chem ; 113: 108215, 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39378821

ABSTRACT

This work presents a novel feature extraction method for identifying complex patterns in genomic sequences by employing the Hidden Markov Model (HMM). In this study, we use HMM to identify gene nucleotide patterns that are specific to malignant and non-malignant cells. The crucial genetic components DNA and RNA are involved in many biological processes that impact both healthy and malignant cells. Early patient identification is essential to successful cancer diagnosis and therapy. Varying nucleotide patterns indicate different cellular responses, which are important to understanding the molecular causes of cancer and associated disorders. We present a detailed study of nucleotide patterns in whole raw nucleotide sequences with variations in both protein sequence (CDS) and non-protein sequence (NCDS) in both malignant and non-malignant cells. Using the proposed HMM for feature extraction and selection achieves nucleotide prediction while reducing computational expense. The classification models implemented in this work for cancer detection are Gradient-Boosted Decision Trees (GBDT), Random Forests (RF), Decision Trees (DT), and Support Vector Machines (SVM) with kernels. The accuracy of the proposed classification models has been validated with 10-fold cross-validation in comprehensive case studies. The results reveal that DT and ensemble learning techniques significantly differentiate between malignant and non-malignant DNA sequences. SVM with suitable kernels improves cancer detection accuracy significantly. Combining feature reduction approaches with nucleotide pattern classifiers based on Hidden Markov models improves performance and ensures reliable cancer detection.

2.
Sci Rep ; 14(1): 22759, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39354017

ABSTRACT

Due to the limited hydrophobic properties of porcelain insulators, applying anti-pollution flashover coatings is crucial to enhance their functionality. This research outlines a classification system for assessing contamination levels on 22 kV porcelain insulators, both with and without coatings. It synthesizes six classification criteria derived through both numerical simulations and experimental studies to effectively gauge contamination severity. The study examined insulators treated with Room Temperature Vulcanizing (RTV) silicone under three different conditions: uncoated, partially coated, and fully coated. Additionally, the research assessed the effects of humidity on these polluted insulators to understand environmental impacts on their performance. The criteria, which are the flashover voltage (x1), fifth to third harmonics of leakage current (x2), maximum electric field (x3), total harmonic index (x4), insulation resistance (x5) and dielectric loss (x6), were proposed for evaluating the condition of the insulator string. The finite element method (FEM) was used to simulate the electric field. Then, based on the proposed criteria, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Multi-layer Perceptron (MLP) models were trained and compared in classifying polluted insulator conditions with and without coating. The established criteria facilitate precise monitoring of the condition of high-voltage insulators, ensuring quick and effective responses that support the stability of the electrical power system.

3.
Biology (Basel) ; 13(10)2024 Sep 29.
Article in English | MEDLINE | ID: mdl-39452089

ABSTRACT

The development of current sexing methods largely depends on the use of adequate sources of data and adjustable classification techniques. Most sex estimation methods have been based on linear measurements, while the angles have been largely ignored, potentially leading to the loss of valuable information for sex discrimination. This study aims to evaluate the usefulness of cranial angles for sex estimation and to differentiate the most dimorphic ones by training machine learning algorithms. Computed tomography images of 154 males and 180 females were used to derive data of 36 cranial angles. The classification models were created by support vector machines, naïve Bayes, logistic regression, and the rule-induction algorithm CN2. A series of cranial angle subsets was arranged by an attribute selection scheme. The algorithms achieved the highest accuracy on subsets of cranial angles, most of which correspond to well-known features for sex discrimination. Angles characterizing the lower forehead and upper midface were included in the best-performing models of all algorithms. The accuracy results showed the considerable classification potential of the cranial angles. The study demonstrates the value of the cranial angles as sex indicators and the possibility to enhance the sex estimation accuracy by using them.

4.
Food Chem ; 460(Pt 3): 140728, 2024 Dec 01.
Article in English | MEDLINE | ID: mdl-39121772

ABSTRACT

Pigmented rice contains beneficial phenolic antioxidants, but analysing them across germplasm collections is laborious and time-consuming. Here we utilised rapid surface Fourier transform infrared (FTIR) spectroscopy and machine learning (ML) algorithms to predict and classify polyphenolic antioxidants. Total phenolics, flavonoids, anthocyanins, and proanthocyanidins were quantified biochemically from 270 samples of a diverse global coloured rice collection, and attenuated total reflectance (ATR) FTIR spectra were obtained by scanning whole grain surfaces at 800-4000 cm⁻¹. Five ML classification models were optimised using the biochemical and spectral data and made predictions with 93.5%-100% accuracy. Random Forest and Support Vector Machine models identified key FTIR peaks linked to flavonols, flavones and anthocyanins as important model predictors. This research successfully established direct and non-destructive surface chemistry spectroscopy of the aleurone layer of pigmented rice, integrated with ML models, as a viable high-throughput platform to accelerate the analysis and profiling of nutritionally valuable coloured rice varieties.


Subjects
Antioxidants , Machine Learning , Oryza , Phenols , Oryza/chemistry , Fourier Transform Infrared Spectroscopy , Antioxidants/chemistry , Phenols/chemistry , Phenols/analysis , Seeds/chemistry , High-Throughput Screening Assays
5.
Clin Neurophysiol ; 166: 152-165, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39178550

ABSTRACT

OBJECTIVE: To assess the value of combining brain and autonomic measures to discriminate the subjective perception of pain from other sensory-cognitive activations. METHODS: Twenty healthy individuals received two types of tonic painful stimulation delivered to the hand: electrical stimuli and immersion in 10 °C water, which were contrasted with non-painful immersion in 15 °C water and stressful cognitive testing. High-density electroencephalography (EEG) and autonomic measures (pupillary, electrodermal and cardiovascular) were continuously recorded, and the accuracy of pain detection based on combinations of electrophysiological features was assessed using machine learning procedures. RESULTS: Painful stimuli induced a significant decrease in contralateral EEG alpha power. Cardiac, electrodermal and pupillary reactivities occurred in both painful and stressful conditions. Classification models, trained on leave-one-out cross-validation folds, showed low accuracy (61-73%) when cortical and autonomic features were taken independently, while their combination significantly improved accuracy to 93% in individual reports. CONCLUSIONS: Changes in cortical oscillations reflecting somatosensory salience and autonomic changes reflecting arousal can be triggered by many activating signals other than pain; conversely, the simultaneous occurrence of somatosensory activation plus strong autonomic arousal has a high probability of uniquely reflecting pain. SIGNIFICANCE: Combining changes in cortical and autonomic reactivities appears critical to derive accurate indexes of acute pain perception.


Subjects
Autonomic Nervous System , Electroencephalography , Pain , Humans , Male , Female , Adult , Autonomic Nervous System/physiopathology , Pain/physiopathology , Pain/diagnosis , Electroencephalography/methods , Cerebral Cortex/physiopathology , Young Adult , Pain Measurement/methods , Galvanic Skin Response/physiology , Pain Perception/physiology , Electric Stimulation/methods
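
The leave-one-out cross-validation mentioned in the pain-detection abstract above (entry 5) can be illustrated with a small sketch. The abstract does not name the classifier or list the exact features, so the 1-nearest-neighbour rule, the function names `nearest_neighbour_predict` and `leave_one_out_accuracy`, and the toy data are all assumptions made purely for illustration.

```python
def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the training point closest to `query` (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, fit on the rest, and report the fraction predicted correctly."""
    correct = 0
    for i in range(len(features)):
        # Training set = every sample except the held-out one.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbour_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy data: two well-separated groups (not taken from the study).
toy_features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (2.0, 2.1), (2.2, 1.9), (1.9, 2.3)]
toy_labels = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(toy_features, toy_labels))
```

With n samples this procedure fits n models, which is affordable for small cohorts such as the 20 participants described above; any other classifier could be substituted for the nearest-neighbour rule.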
6.
Front Plant Sci ; 15: 1411772, 2024.
Article in English | MEDLINE | ID: mdl-39070913

ABSTRACT

Cooking time is a crucial determinant of the culinary quality of cassava roots, and incorporating it into the early stages of breeding selection is vital for breeders. This study aimed to assess the potential of near-infrared spectroscopy (NIRS) in classifying cassava genotypes based on their cooking times. Five cooking times (15, 20, 25, 30, and 40 minutes) were assessed and 888 genotypes evaluated over three crop seasons (2019/2020, 2020/2021, and 2021/2022). Fifteen roots from five plants per plot, featuring diameters ranging from 4 to 7 cm, were randomly chosen for cooking analysis and spectral data collection. Two root samples (15 slices each) per genotype were collected, with the first set aside for spectral data collection, processed, and placed in two petri dishes, while the second set was utilized for cooking assessment. Cooking data were classified into binary and multiclass variables (CT4C and CT6C). Two NIRS devices, the portable QualitySpec® Trek (QST) and the benchtop NIRFlex N-500, were used to collect spectral data. Classification of genotypes was carried out using the K-nearest neighbor algorithm (KNN) and partial least squares (PLS) models. The spectral data were split into a training set (80%) and an external validation set (20%). For binary variables, the classification accuracy for cassava cooking time was notably high ($R^{2}_{Cal}$ ranging from 0.72 to 0.99). Regarding multiclass variables, accuracy remained consistent across classes, models, and NIRS instruments (~0.63). However, the KNN model demonstrated slightly superior accuracy in classifying all cooking time classes, except for the CT4C variable (QST) in the NoCook and 25 min classes. Despite the increased complexity associated with binary classification, it remained more efficient, offering higher classification accuracy for samples and facilitating the selection of the most relevant time or variables, such as cooking time ≤ 30 minutes. The accuracy of the optimal scenario for classifying samples with a cooking time of 30 minutes reached $R^{2}_{Cal} = 0.86$ and $R^{2}_{Val} = 0.84$, with a Kappa value of 0.53. Overall, the models exhibited a robust fit for all cooking times, showcasing the significant potential of NIRS as a high-throughput phenotyping tool for classifying cassava genotypes based on cooking time.
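
The cassava abstract above reports a Kappa value alongside accuracy. The abstract does not say how Kappa was computed; assuming it refers to Cohen's kappa, which compares observed agreement with the agreement expected by chance, a minimal sketch looks like this. The function name `cohen_kappa` and the example confusion matrix are hypothetical and are not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true class, columns: predicted class)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of the row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(row[j] for row in confusion) for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Undefined (division by zero) when chance agreement equals 1.
    return (observed - expected) / (1 - expected)


# Hypothetical binary example: 70% raw accuracy, 50% chance agreement.
example = [[20, 5],
           [10, 15]]
print(round(cohen_kappa(example), 3))
```

Because kappa discounts chance agreement, a model can show high raw accuracy and still yield a moderate kappa, which is one reason both numbers are often reported together.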

7.
Foods ; 13(11)2024 May 21.
Article in English | MEDLINE | ID: mdl-38890831

ABSTRACT

Date palm (Phoenix dactylifera L.) fruit samples belonging to the 'Mejhoul' and 'Boufeggous' cultivars were harvested at the Tamar stage and used in our experiments. Before scanning, date samples were dried using convective drying at 60 °C and infrared drying at 60 °C with a frequency of 50 Hz. The scanning trials were performed for two hundred date palm fruit in fresh, convective-dried, and infrared-dried forms of each cultivar using a flatbed scanner. The image-texture parameters of date fruit were extracted from images converted to individual color channels in the RGB, Lab, XYZ, and UVS color models. The models to classify fresh and dried samples were developed based on selected image textures using machine learning algorithms belonging to the groups of Bayes, Trees, Lazy, Functions, and Meta. For both the 'Mejhoul' and 'Boufeggous' cultivars, models built using Random Forest from the group of Trees turned out to be accurate and successful. The average classification accuracy for fresh, convective-dried, and infrared-dried 'Mejhoul' reached 99.33%, whereas fresh, convective-dried, and infrared-dried samples of 'Boufeggous' were distinguished with an average accuracy of 94.33%. For both cultivars and each model, the highest discrimination accuracy was between fresh and infrared-dried samples, whereas the highest number of misclassified cases occurred between fresh and convective-dried fruit. Thus, the developed procedure may be considered an innovative approach to the non-destructive assessment of drying impact on the external quality characteristics of date palm fruit.

8.
Front Mol Biosci ; 11: 1395721, 2024.
Article in English | MEDLINE | ID: mdl-38872916

ABSTRACT

Background: Head and Neck Squamous Cell Carcinoma (HNSCC) is the seventh most prevalent cancer type worldwide. Early detection of HNSCC is one of the important challenges in managing the treatment of cancer patients. Existing techniques for detecting HNSCC are costly and invasive. Methods: In this study, we aimed to address this issue by developing classification models using machine learning and deep learning techniques, focusing on single-cell transcriptomics to distinguish between HNSCC and normal samples. Furthermore, we built models to classify HNSCC samples into HPV-positive (HPV+) and HPV-negative (HPV-) categories. We used the GSE181919 dataset, from which we extracted 20 primary cancer (HNSCC) samples and 9 normal tissue samples. The primary cancer samples contained 13 HPV- and 7 HPV+ samples. The models developed in this study were trained on 80% of the dataset and validated on the remaining 20%. To develop an efficient model, we performed feature selection using the mRMR method to shortlist a small number of genes from the many available. We also performed Gene Ontology (GO) enrichment analysis on the 100 shortlisted genes. Results: An Artificial Neural Network-based model trained on 100 genes outperformed the other classifiers, with an AUROC of 0.91 for HNSCC classification on the validation set. The same algorithm achieved an AUROC of 0.83 for the classification of HPV+ and HPV- patients on the validation set. In GO enrichment analysis, it was found that most genes were involved in binding and catalytic activities. Conclusion: A software package has been developed in Python which allows users to identify HNSCC in patients along with their HPV status. It is available at https://webs.iiitd.edu.in/raghava/hnscpred/.

9.
Foods ; 13(9)2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38731678

ABSTRACT

The profile of secondary metabolites present in the apple cuticular layer is not only characteristic of a particular apple cultivar; it also dynamically reflects various external factors in the growing environment. In this study, the possibility of authenticating apple samples by analyzing their cuticular layer extracts was investigated. Ultra-high-performance liquid chromatography coupled with high-resolution tandem mass spectrometry (UHPLC-HRMS/MS) was employed for obtaining metabolomic fingerprints. A total of 274 authentic apple samples from four cultivars harvested in the Czech Republic and Poland between 2020 and 2022 were analyzed. The complex data generated, processed using univariate and multivariate statistical methods, enabled the building of classification models to distinguish apple cultivars as well as their geographical origin. The models showed very good performance in discriminating Czech and Polish samples for three out of four cultivars: "Gala", "Golden Delicious" and "Idared". Moreover, the validity of the models was tested over several harvest seasons. In addition to metabolites of the triterpene biosynthetic pathway, the diagnostic markers were mainly wax esters. "Jonagold", which is known to be susceptible to mutations, was the only cultivar for which an unambiguous classification of geographical origin was not possible.

10.
J Cheminform ; 16(1): 43, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622648

ABSTRACT

Multiple metrics are used when assessing and validating the performance of quantitative structure-activity relationship (QSAR) models. In the case of binary classification, balanced accuracy is a metric to assess the global performance of such models. In contrast to accuracy, balanced accuracy does not depend on the respective prevalence of the two categories in the test set that is used to validate a QSAR classifier. As such, balanced accuracy is used to overcome the effect of imbalanced test sets on the model's perceived accuracy. Matthews' correlation coefficient (MCC), an alternative global performance metric, is also known to mitigate the imbalance of the test set. However, in contrast to the balanced accuracy, MCC remains dependent on the respective prevalence of the predicted categories. For simplicity, the rest of this work is based on the positive prevalence. The MCC value may be underestimated at high or extremely low positive prevalence. It contributes to more challenging comparisons between experiments using test sets with different positive prevalences and may lead to incorrect interpretations. The concept of balanced metrics beyond balanced accuracy is, to the best of our knowledge, not yet described in the cheminformatic literature. Therefore, after describing the relevant literature, this manuscript will first formally define a confusion matrix, sensitivity and specificity and then present, with synthetic data, the danger of comparing performance metrics under nonconstant prevalence. Second, it will demonstrate that balanced accuracy is the performance metric accuracy calibrated to a test set with a positive prevalence of 50% (i.e., balanced test set). This concept of balanced accuracy will then be extended to the MCC after showing its dependency on the positive prevalence. Applying the same concept to any other performance metric and widening it to the concept of calibrated metrics will then be briefly discussed. 
We will show that, like balanced accuracy, any balanced performance metric may be expressed as a function of the well-known values of sensitivity and specificity. Finally, a tale of two MCCs will exemplify the use of this concept of balanced MCC versus MCC with four use cases using synthetic data. SCIENTIFIC CONTRIBUTION: This work provides a formal, unified framework for understanding prevalence dependence in model validation metrics, deriving balanced metric expressions beyond balanced accuracy, and demonstrating their practical utility for common use cases. In contrast to prior literature, it introduces the derived confusion matrix to express metrics as functions of sensitivity, specificity and prevalence without needing additional coefficients. The manuscript extends the concept of balanced metrics to Matthews' correlation coefficient and other widely used performance indicators, enabling robust comparisons under prevalence shifts.
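
The idea in the abstract above, that a performance metric can be written as a function of sensitivity, specificity and positive prevalence and then "balanced" by fixing the prevalence at 50%, can be sketched as follows. This is an illustration derived from the standard definitions of accuracy and MCC, not code or notation taken from the paper; the function names are hypothetical.

```python
import math


def derived_confusion(sensitivity, specificity, prevalence):
    """Confusion-matrix cells as proportions of the test set: (TP, FP, TN, FN)."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    return tp, fp, tn, fn


def accuracy(sensitivity, specificity, prevalence):
    tp, fp, tn, fn = derived_confusion(sensitivity, specificity, prevalence)
    return tp + tn


def mcc(sensitivity, specificity, prevalence):
    tp, fp, tn, fn = derived_confusion(sensitivity, specificity, prevalence)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def balanced_accuracy(sensitivity, specificity):
    """Accuracy evaluated at a positive prevalence of 50%; equals (Se + Sp) / 2."""
    return accuracy(sensitivity, specificity, 0.5)


def balanced_mcc(sensitivity, specificity):
    """MCC evaluated at a positive prevalence of 50%."""
    return mcc(sensitivity, specificity, 0.5)


# Same classifier (Se = 0.9, Sp = 0.8) scored on test sets with different prevalence.
for prev in (0.05, 0.5, 0.95):
    print(prev, round(accuracy(0.9, 0.8, prev), 3), round(mcc(0.9, 0.8, prev), 3))
```

Setting the prevalence to 0.5 in the MCC expression simplifies algebraically to (Se + Sp - 1) / sqrt(1 - (Se - Sp)^2), so the balanced value depends only on sensitivity and specificity, as the abstract states for balanced metrics in general.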

11.
Int J Mol Sci ; 25(8)2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38673888

ABSTRACT

Urease, a pivotal enzyme in nitrogen metabolism, plays a crucial role in various microorganisms, including the pathogenic Helicobacter pylori. Inhibiting urease activity offers a promising approach to combating infections and associated ailments, such as chronic kidney diseases and gastric cancer. However, identifying potent urease inhibitors remains challenging due to resistance issues that hinder traditional approaches. Recently, machine learning (ML)-based models have demonstrated the ability to predict the bioactivity of molecules rapidly and effectively. In this study, we present ML models designed to predict urease inhibitors by leveraging essential physicochemical properties. The methodological approach involved constructing a dataset of urease inhibitors through an extensive literature search. Subsequently, these inhibitors were characterized based on physicochemical properties calculations. An exploratory data analysis was then conducted to identify and analyze critical features. Ultimately, 252 classification models were trained, utilizing a combination of seven ML algorithms, three attribute selection methods, and six different strategies for categorizing inhibitory activity. The investigation unveiled discernible trends distinguishing urease inhibitors from non-inhibitors. This differentiation enabled the identification of essential features that are crucial for precise classification. Through a comprehensive comparison of ML algorithms, tree-based methods like random forest, decision tree, and XGBoost exhibited superior performance. Additionally, incorporating the "chemical family type" attribute significantly enhanced model accuracy. Strategies involving a gray-zone categorization demonstrated marked improvements in predictive precision. This research underscores the transformative potential of ML in predicting urease inhibitors. 
The meticulous methodology outlined herein offers actionable insights for developing robust predictive models within biochemical systems.


Subjects
Enzyme Inhibitors , Machine Learning , Urease , Urease/antagonists & inhibitors , Urease/chemistry , Urease/metabolism , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Helicobacter pylori/enzymology , Helicobacter pylori/drug effects , Algorithms , Humans
12.
J Imaging Inform Med ; 37(4): 1752-1766, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38429562

ABSTRACT

Breast cancer is recognized as a prominent cause of cancer-related mortality among women globally, emphasizing the critical need for early diagnosis to improve survival rates. Current breast cancer diagnostic procedures depend on manual assessments of pathological images by medical professionals. However, in remote or underserved regions, the scarcity of expert healthcare resources often compromises diagnostic accuracy. Machine learning holds great promise for early detection, yet existing breast cancer screening algorithms are frequently characterized by significant computational demands, rendering them unsuitable for deployment on low-processing-power mobile devices. In this paper, a real-time automated system, "Auto-BCS", is introduced that significantly enhances the efficiency of early breast cancer screening. The system is structured into three distinct phases. In the initial phase, images undergo a pre-processing stage aimed at noise reduction. Subsequently, feature extraction is carried out using a lightweight and optimized deep learning model, followed by an extreme gradient boosting classifier, strategically employed to optimize the overall performance and prevent overfitting in the deep learning model. The system's performance is gauged through essential metrics, including accuracy, precision, recall, F1 score, and inference time. Comparative evaluations against state-of-the-art algorithms affirm that Auto-BCS outperforms existing models, excelling in both efficiency and processing speed. Computational efficiency is prioritized by Auto-BCS, making it particularly adaptable to low-processing-power mobile devices. Comparative assessments confirm the superior performance of Auto-BCS, signifying its potential to advance breast cancer screening technology.


Subjects
Algorithms , Breast Neoplasms , Early Detection of Cancer , Humans , Breast Neoplasms/pathology , Breast Neoplasms/diagnosis , Breast Neoplasms/diagnostic imaging , Female , Early Detection of Cancer/methods , Computer-Assisted Image Interpretation/methods , Deep Learning
13.
Environ Sci Technol ; 57(49): 20636-20646, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38011382

ABSTRACT

Cyanobacterial harmful algal blooms (CyanoHABs) pose serious risks to inland water resources. Despite advancements in our understanding of associated environmental factors and modeling efforts, predicting CyanoHABs remains challenging. Leveraging an integrated water quality data collection effort in Iowa lakes, this study aimed to identify factors associated with hazardous microcystin levels and develop one-week-ahead predictive classification models. Using water samples from 38 Iowa lakes collected between 2018 and 2021, feature selection was conducted considering both linear and nonlinear properties. Subsequently, we developed three model types (Neural Network, XGBoost, and Logistic Regression) with different sampling strategies using the nine selected variables (mcyA_M, TKN, % hay/pasture, pH, mcyA_M:16S, % developed, DOC, dewpoint temperature, and ortho-P). Evaluation metrics demonstrated the strong performance of the Neural Network with oversampling (ROC-AUC 0.940, accuracy 0.861, sensitivity 0.857, specificity 0.857, LR+ 5.993, and 1/LR- 5.993), as well as the XGBoost with downsampling (ROC-AUC 0.944, accuracy 0.831, sensitivity 0.928, specificity 0.833, LR+ 5.557, and 1/LR- 11.569). This study exhibited the intricacies of modeling with limited data and class imbalances, underscoring the importance of continuous monitoring and data collection to improve predictive accuracy. Also, the methodologies employed can serve as meaningful references for researchers tackling similar challenges in diverse environments.


Subjects
Cyanobacteria , Harmful Algal Bloom , Lakes/microbiology , Iowa
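
The likelihood ratios quoted in the CyanoHAB abstract above (entry 13) follow directly from the reported sensitivity and specificity. The sketch below uses the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity; the function name `likelihood_ratios` is hypothetical, and the inputs are the rounded values given in the abstract.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary classifier.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    Undefined when specificity is 1 (LR+) or 0 (LR-).
    """
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded values as reported in the abstract.
for name, se, sp in [("Neural Network, oversampling", 0.857, 0.857),
                     ("XGBoost, downsampling", 0.928, 0.833)]:
    lr_pos, lr_neg = likelihood_ratios(se, sp)
    print(name, round(lr_pos, 3), round(1 / lr_neg, 3))
```

Reporting 1/LR- rather than LR- puts both ratios on the same "larger is better" scale, which makes the two models easier to compare.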
14.
Biomedicines ; 11(10)2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37892978

ABSTRACT

This research aims to enhance the classification and prediction of ischemic heart diseases using machine learning techniques, with a focus on resource efficiency and clinical applicability. Specifically, we introduce novel non-invasive indicators known as Campello de Souza features, which require only a tensiometer and a clock for data collection. These features were evaluated using a comprehensive dataset of heart disease cases from a machine learning data repository. Our findings highlight the ability of machine learning algorithms to not only streamline diagnostic procedures but also reduce diagnostic errors and the dependency on extensive clinical testing. Three key features (mean arterial pressure, pulsatile blood pressure index, and resistance-compliance indicator) were found to significantly improve the accuracy of machine learning algorithms in binary heart disease classification. Logistic regression achieved the highest average accuracy among the examined classifiers when utilizing these features. While such novel indicators contribute substantially to the classification process, they should be integrated into a broader diagnostic framework that includes comprehensive patient evaluations and medical expertise. Therefore, the present study offers valuable insights for leveraging data science techniques in the diagnosis and management of cardiovascular diseases.

15.
Data Brief ; 50: 109610, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37808538

ABSTRACT

This paper presents a semi-automated, scalable, and homologous methodology towards IoT, implemented in Python, for extracting and integrating images of pedestrian and motorcyclist areas on the road in order to construct a multiclass object classifier. It consists of two stages. The first stage creates a non-debugged data set by acquiring images related to this semantic context, using an embedded device connected 24/7 via Wi-Fi to a free and public CCTV service in Medellin, Colombia. Using artificial vision techniques, the device automatically performs a comparative chronological analysis to download the images observed by 80 cameras that report data asynchronously. The second stage proposes two algorithms focused on debugging the previously obtained data set. The first helps the user label the non-debugged data set through Regions of Interest (ROI) and hotkeys. It decomposes the information in the nth image of the data set into a single dictionary and stores it in a binary Pickle file. The second is simply an observer of the classification performed by the user through the first algorithm, allowing the user to verify whether the information contained in the resulting Pickle file is correct.

16.
Food Res Int ; 172: 113181, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37689933

ABSTRACT

The colour of the different Port wine styles and indication of age (IOA) categories is a distinctive quality parameter influenced by the grapes and ageing process. The impact of Port wine styles and IOA on phenolic composition is mostly unknown. This work aims to study the chromatic characteristics (CIELab) and their relation with the phenolic composition of White, Tawny, and Ruby Port wines and evaluate the feasibility of its utilisation for their discrimination. Port wine styles and IOA categories can be discriminated by their chromatic characteristics, using different data analysis models. The higher b* values, corresponding to the brownish/yellowish colour of Tawny and White Ports belonging to higher IOA categories, seem more related to the sugar browning than the oxidative change in phenolic compounds. However, this last process is essential for the red colour (a*) decrease of Tawny Port wines with higher IOA.


Subjects
Wine , Data Analysis , Phenols
17.
Molecules ; 28(15)2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37570667

ABSTRACT

This study aimed to develop an analytical method to determine the geographical origin of Moroccan Argan oil through near-infrared (NIR) or mid-infrared (MIR) spectroscopic fingerprints. However, the classification may be problematic due to the spectral similarity of the components in the samples. Therefore, unsupervised and supervised classification methods, including principal component analysis (PCA), Partial Least Squares-Discriminant Analysis (PLS-DA) and Soft Independent Modeling of Class Analogy (SIMCA), were evaluated to distinguish between Argan oils from four regions. The spectra of 93 samples were acquired and preprocessed using both standard preprocessing methods and multivariate filters, such as External Parameter Orthogonalization, Generalized Least Squares Weighting and Orthogonal Signal Correction, to improve the models. Accuracy, precision, sensitivity, and selectivity were used to evaluate the performance of the models. SIMCA and PLS-DA models generated after standard preprocessing failed to correctly classify all samples. However, successful models were produced after using multivariate filters. The NIR and MIR classification models showed equivalent accuracy. The PLS-DA models outperformed the SIMCA models, with 100% accuracy, specificity, sensitivity and precision. In conclusion, the studied multivariate filters are applicable to the spectroscopic fingerprints to geographically identify Argan oils in routine monitoring, significantly reducing analysis costs and time.

19.
Mol Divers ; 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37479824

ABSTRACT

In this study, we built classification models using machine learning techniques to predict the bioactivity of non-covalent inhibitors of Bruton's tyrosine kinase (BTK) and to provide interpretable and transparent explanations for these predictions. To achieve this, we gathered data on BTK inhibitors from the Reaxys and ChEMBL databases, removing compounds with covalent bonds and duplicates to obtain a dataset of 3895 non-covalent inhibitors. These inhibitors were characterized using MACCS fingerprints and Morgan fingerprints, and four traditional machine learning algorithms (decision trees (DT), random forests (RF), support vector machines (SVM), and extreme gradient boosting (XGBoost)) were used to build 16 classification models. In addition, four deep learning models were developed using deep neural networks (DNN). The best model, Model D_4, which was built using XGBoost and MACCS fingerprints, achieved an accuracy of 94.1% and a Matthews correlation coefficient (MCC) of 0.75 on the test set. To provide interpretable explanations, we employed the SHAP method to decompose the predicted values into the contributions of each feature. We also used K-means dimensionality reduction and hierarchical clustering to visualize the clustering effects of the molecular structures of the inhibitors. The results of this study were validated using crystal structures, and we found that the interaction between the BTK amino acid residue and the important features of the clustered scaffold was consistent with the known properties of the complex crystal structures. Overall, our models demonstrated high predictive ability, and a qualitative model can be converted to a quantitative model to some extent by SHAP, making them valuable for guiding the design of new BTK inhibitors with desired activity.

20.
Int J Mol Sci ; 24(12)2023 Jun 19.
Article in English | MEDLINE | ID: mdl-37373474

ABSTRACT

There is early evidence of extraocular systemic signals affecting function and morphology in neovascular age-related macular degeneration (nAMD). The prospective, cross-sectional BIOMAC study is an explorative investigation of peripheral blood proteome profiles and matched clinical features to uncover systemic determinacy in nAMD under anti-vascular endothelial growth factor intravitreal therapy (anti-VEGF IVT). It includes 46 nAMD patients stratified by the level of disease control under ongoing anti-VEGF treatment. Proteomic profiles in peripheral blood samples of every patient were detected with LC-MS/MS. The patients underwent extensive clinical examination with a focus on macular function and morphology. In silico analysis includes unbiased dimensionality reduction and clustering, a subsequent annotation of clinical features, and non-linear models for recognition of underlying patterns. The model assessment was performed using leave-one-out cross-validation. The findings provide an exploratory demonstration of the link between systemic proteomic signals and macular disease pattern using and validating non-linear classification models. Three main results were obtained: (1) Proteome-based clustering identifies two distinct patient subclusters, with the smaller one (n = 10) exhibiting a strong signature for oxidative stress response. Matching the relevant meta-features on the individual patient's level identifies pulmonary dysfunction as an underlying health condition in these patients. (2) We identify biomarkers for nAMD disease features, with Aldolase C as a putative factor associated with superior disease control under ongoing anti-VEGF treatment. (3) Apart from this, isolated protein markers are only weakly correlated with nAMD disease expression. In contrast, applying a non-linear classification model identifies complex molecular patterns hidden in a high number of proteomic dimensions determining macular disease expression. In conclusion, so far unconsidered systemic signals in the peripheral blood proteome contribute to the clinically observed phenotype of nAMD, which should be examined in future translational research on AMD.


Subjects
Angiogenesis Inhibitors , Macular Degeneration , Humans , Angiogenesis Inhibitors/therapeutic use , Ranibizumab/therapeutic use , Vascular Endothelial Growth Factor A/metabolism , Proteome , Prospective Studies , Liquid Chromatography , Cross-Sectional Studies , Proteomics , Tandem Mass Spectrometry , Macular Degeneration/drug therapy , Phenotype