Results 1 - 20 of 37
1.
Sci Data ; 11(1): 203, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355591

ABSTRACT

This study entailed a comprehensive GC‒MS analysis conducted on 121 patient samples to generate a clinical breathomics dataset. Breath molecules, indicative of diverse conditions such as psychological and pathological states and the microbiome, were of particular interest due to their non-invasive nature. The highlighted noninvasive approach for detecting these breath molecules significantly enhances diagnostic and monitoring capacities. This dataset cataloged volatile organic compounds (VOCs) from the breath of individuals with asthma, bronchiectasis, and chronic obstructive pulmonary disease. Uniform and consistent sample collection protocols were strictly adhered to during the accumulation of this extensive dataset, ensuring its reliability. It encapsulates extensive human clinical breath molecule data pertinent to three specific diseases. This consequential clinical breathomics dataset is a crucial resource for researchers and clinicians in identifying and exploring important compounds within the patient's breath, thereby augmenting future diagnostic and therapeutic initiatives.


Subject(s)
Asthma , Breath Tests , Bronchiectasis , Pulmonary Disease, Chronic Obstructive , Volatile Organic Compounds , Humans , Asthma/diagnosis , Breath Tests/methods , Exhalation , Reproducibility of Results , Volatile Organic Compounds/analysis , Gas Chromatography-Mass Spectrometry , Bronchiectasis/diagnosis , Pulmonary Disease, Chronic Obstructive/diagnosis
2.
NPJ Digit Med ; 7(1): 31, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38332372

ABSTRACT

The Movement Disorder Society's Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is designed to assess bradykinesia, the cardinal symptom of Parkinson's disease (PD). However, it cannot capture the all-day variability of bradykinesia outside the clinical environment. Here, we introduce FastEval Parkinsonism ( https://fastevalp.cmdm.tw/ ), a deep learning-driven video-based system that lets users capture keypoints, estimate severity, and summarize the results in a report. Leveraging 840 finger-tapping videos from 186 individuals (103 patients with PD, 24 participants with atypical parkinsonism (APD), 12 elderly individuals with mild parkinsonism signs (MPS), and 47 healthy controls (HCs)), we employ a dilated convolution neural network with two data augmentation techniques. Our model achieves acceptable accuracies (AAC) of 88.0% and 81.5%. The frequency-intensity (FI) value of thumb-index finger distance was identified as a pivotal hand parameter for quantifying performance. Our model also shows usability for multi-angle videos, tested in an external database enrolling over 300 PD patients.

3.
IEEE J Biomed Health Inform ; 28(2): 1066-1077, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38064333

ABSTRACT

We present PathoOpenGait, a cloud-based platform for comprehensive gait analysis. Gait assessment is crucial in neurodegenerative diseases such as Parkinson's and multiple system atrophy, yet current techniques are neither affordable nor efficient. PathoOpenGait utilizes 2D and 3D data from a binocular 3D camera for monitoring and analyzing gait parameters. Our algorithms, including a semi-supervised learning-boosted neural network model for turn time estimation and deterministic algorithms to estimate gait parameters, were rigorously validated on annotated gait records, demonstrating high precision and consistency. We further demonstrate PathoOpenGait's applicability in clinical settings by analyzing gait trials from Parkinson's patients and healthy controls. PathoOpenGait is the first open-source, cloud-based system for gait analysis, providing a user-friendly tool for continuous patient care and monitoring. It offers a cost-effective and accessible solution for both clinicians and patients, revolutionizing the field of gait assessment. PathoOpenGait is available at https://pathoopengait.cmdm.tw.


Subject(s)
Gait Analysis , Parkinson Disease , Humans , Gait , Algorithms , Supervised Machine Learning
4.
Gut Microbes ; 15(2): 2288200, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38038385

ABSTRACT

Past studies have confirmed the etiological role of bacterial extracellular vesicles (BEVs) in various diseases, including inflammatory bowel disease (IBD) and colorectal cancer (CRC). This study aimed to investigate the characteristics of stool-derived bacterial extracellular vesicles (stBEVs) and discuss their association with stool bacteria. First, three culture models - gram-positive (G+)BcBEVs (from B. coagulans), gram-negative (G-)EcBEVs (from E. coli), and eukaryotic cell-derived EVs (EEV, from Colo205 cell line) - were used to benchmark various fractions of stEVs separated by an optimized density gradient (DG) approach. WB, TEM, NTA, and functional assays were used to analyze the properties and distribution of EVs in cultured and stool samples. Stool samples from healthy individuals were interrogated using the approaches developed. Results demonstrated successful separation of most stBEVs (within DG fractions 8&9) from stEEVs (within DG fractions 5&6). Data also suggest the presence of stBEV DNA within vesicles after extraction of BEV DNA and DNase treatment. Metagenomic analysis from full-length (FL) region sequencing results confirmed significant differences between stool bacteria and stBEVs. Significantly, F8&9 and the pooled sample (F5-F9) exhibited a similar microbial composition, indicating that F8&9 were enriched in most stBEV species, primarily dominated by Firmicutes (89.6%). However, F5&6 and F7 still held low-density BEVs with a significantly higher proportion of Proteobacteria (20.5% and 40.7%, respectively) and Bacteroidetes (24% and 13.7%, respectively), considerably exceeding the proportions in stool and F8&9. Importantly, among five healthy individuals, significant variations were observed in the gut microbiota composition of their respective stBEVs, indicating the potential of stBEVs as a target for personalized medicine and research.


Subject(s)
Extracellular Vesicles , Gastrointestinal Microbiome , Microbiota , Humans , Gastrointestinal Microbiome/genetics , Microbiota/genetics , Feces/microbiology , Bacteria/genetics , RNA, Ribosomal, 16S/genetics , DNA
5.
Global Health ; 19(1): 57, 2023 08 14.
Article in English | MEDLINE | ID: mdl-37580752

ABSTRACT

BACKGROUND: Co-development alliances and capital-raising activities are essential supports for biopharmaceutical innovation. During the initial COVID-19 outbreak, the level of these business activities increased greatly. Yet the magnitude, direction, and duration of the trend remain ambiguous. Real-time real-world data are needed to inform strategic redirections and industrial policies. METHODS: This observational study aims to characterize trends in global biopharma innovation activities throughout the global pandemic outbreak. Our extensive deal dataset is retrieved from the commercial database GlobalData (12,866 partnership deals and 32,250 fundraising deals announced between 2011 and 2022). We perform Chi-squared tests to examine the changes in qualitative deal attributes during and beyond the outbreak. Our deal-level sample is further aggregated into category-level panel data according to deal characteristics such as therapy area, molecule type, and development phase. We run a series of regressions to examine how the monthly investment amount raised in each category changed with the onset of the pandemic, controlling for the US Federal funds rate. RESULTS: The temporary surge of partnership and capital-raising activities was associated with the increase in infectious disease-related deals. Academic and government institutions played an increased role in supporting COVID-related co-development partnerships in 2020, and biopharma ventures secured more investments in the capital market throughout 2020 and 2021. The partnership and investment boom did not last into the later pandemic phase in 2022. The most significant and enduring trend was the shifting focus toward discovery-phase investments. Our regression model reveals that the discovery-phase fundraising deals did not suffer from a bounce back in the late pandemic, consistent with a persistent focus on early innovation.
CONCLUSIONS: Despite the reduced level of partnership and fundraising activities during 2022, we observe a lasting change in focus toward biopharmaceutical innovation after the pandemic outbreak. Our evidence suggests how entrepreneurs and investors should allocate resources in response to the post-pandemic tight monetary environment. We also suggest the need for policy interventions in financing private/public co-development partnerships and non-COVID-related technologies, to maintain their research capacity and generate breakthroughs when faced with unforeseen diseases.
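The Chi-squared test named in the METHODS can be illustrated with a minimal sketch. It computes the Pearson statistic for a 2x2 contingency table of a qualitative deal attribute across two periods. The function name and the counts are hypothetical and are not taken from the GlobalData dataset; the abstract does not say which software or table layout the authors used.

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = period (before, during outbreak),
# columns = deal attribute (present, absent)
example_table = ((30, 70), (50, 50))
statistic = chi_squared_2x2(example_table)

# With 1 degree of freedom, the 5% critical value is about 3.841
is_significant = statistic > 3.841
```

A statistic above the critical value would suggest that the attribute's distribution differs between the two periods.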


Subject(s)
COVID-19 , Fund Raising , Humans , COVID-19/epidemiology , COVID-19/prevention & control , Organizations , Public-Private Sector Partnerships , Commerce
6.
Bioinform Adv ; 3(1): vbad061, 2023.
Article in English | MEDLINE | ID: mdl-37234699

ABSTRACT

Motivation: Liquid chromatography coupled with mass spectrometry (LC-MS) is widely used in metabolomics studies, while HILIC LC-MS is particularly suited for polar metabolites. Determining an optimized mobile phase and developing a proper liquid chromatography method tend to be laborious, time-consuming and empirical. Results: We developed a containerized web tool providing a workflow to quickly determine the optimized mobile phase by batch-evaluating chromatography peaks for metabolomics LC-MS studies. A mass chromatographic quality value, an asymmetric factor, and the local maximum intensity of the extracted ion chromatogram were calculated to determine the number of peaks and peak retention time. The optimal mobile phase can be quickly determined by selecting the mobile phase that produces the largest number of resolved peaks. Moreover, the workflow enables one to automatically process the repeats by evaluating chromatography peaks and determining the retention time of large standards. This workflow was validated with 20 chemical standards and successfully constructed a reference library of 571 metabolites for the HILIC LC-MS platform. Availability and implementation: MetaMOPE is freely available at https://metamope.cmdm.tw. Source code and installation instructions are available on GitHub: https://github.com/CMDM-Lab/MetaMOPE. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
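The selection rule described above (choose the mobile phase that yields the most resolved peaks) can be sketched as follows. This is a simplified illustration, not MetaMOPE's implementation: it counts strict local maxima above a height threshold and ignores the mass chromatographic quality value and the asymmetric factor. The function names, the threshold, and the intensity traces are all hypothetical.

```python
def count_peaks(intensities, min_height):
    """Count strict local maxima at or above min_height in one chromatogram."""
    peaks = 0
    for i in range(1, len(intensities) - 1):
        here = intensities[i]
        if here >= min_height and here > intensities[i - 1] and here > intensities[i + 1]:
            peaks += 1
    return peaks


def best_mobile_phase(chromatograms, min_height):
    """Return the mobile phase whose chromatograms contain the most peaks in total."""
    totals = {
        phase: sum(count_peaks(trace, min_height) for trace in traces)
        for phase, traces in chromatograms.items()
    }
    return max(totals, key=totals.get), totals


# Hypothetical extracted ion chromatograms, one trace per mobile phase
chromatograms = {
    "phase_A": [[0, 5, 1, 7, 2, 0]],
    "phase_B": [[0, 2, 9, 3, 1, 0]],
}
best, totals = best_mobile_phase(chromatograms, min_height=3)
```

In practice a peak would also need to pass quality and symmetry checks before being counted as resolved.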

7.
ACS Omega ; 8(18): 15854-15864, 2023 May 09.
Article in English | MEDLINE | ID: mdl-37179635

ABSTRACT

Since the first food database was released over one hundred years ago, food databases have become more diversified, including food composition databases, food flavor databases, and food chemical compound databases. These databases provide detailed information about the nutritional compositions, flavor molecules, and chemical properties of various food compounds. As artificial intelligence (AI) is becoming popular in every field, AI methods can also be applied to food industry research and molecular chemistry. Machine learning and deep learning are valuable tools for analyzing big data sources such as food databases. Studies investigating food compositions, flavors, and chemical compounds with AI concepts and learning methods have emerged in the past few years. This review illustrates several well-known food databases, focusing on their primary contents, interfaces, and other essential features. We also introduce some of the most common machine learning and deep learning methods. Furthermore, a few studies related to food databases are given as examples, demonstrating their applications in food pairing, food-drug interactions, and molecular modeling. Based on the results of these applications, it is expected that the combination of food databases and AI will play an essential role in food science and food chemistry.

8.
Anal Chem ; 95(6): 3317-3324, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36724516

ABSTRACT

Mass spectrometry imaging (MSI) is a powerful tool that can be used to simultaneously investigate the spatial distribution of different molecules in samples. However, it is difficult to comprehensively analyze complex biological systems with only a single analytical technique due to different analytical properties and application limitations. Therefore, many analytical methods have been combined to extend data interpretation, evaluate data credibility, and facilitate data mining to explore important temporal and spatial relationships in biological systems. Image registration is an initial and critical step for multimodal imaging data fusion. However, the image registration of multimodal images is not a simple task. The property difference between each data modality may include spatial resolution, image characteristics, or both. The image registrations between MSI and different imaging techniques are often achieved indirectly through histology. Many methods exist for image registration between MSI data and histological images. However, most of them are manual or semiautomatic and have their prerequisites. Here, we built MSI Registrar (MSIr), a web service for automatic registration between MSI and histology. It can help to reduce subjectivity and processing time efficiently. MSIr provides an interface for manually selecting regions of interest from histological images in order to extract the corresponding spectrum indices in the MSI data. In the performance evaluation, MSIr can quickly map MSI data to histological images and help pinpoint molecular components at specific locations in tissues. Most registrations were adequate and were without excessive shifts. MSIr is freely available at https://msir.cmdm.tw and https://github.com/CMDM-Lab/MSIr.


Subject(s)
Diagnostic Imaging , Histological Techniques , Mass Spectrometry/methods , Data Mining
9.
Brief Funct Genomics ; 22(3): 291-301, 2023 05 18.
Article in English | MEDLINE | ID: mdl-36723978

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first detected in December 2019. As of mid-2021, the delta variant was the primary type; however, in January 2022, the omicron (BA.1) variant rapidly spread and became the dominant type in the United States. In June 2022, its subvariants surpassed previous variants in different temporal and spatial situations. To investigate the high transmissibility of omicron variants, we assessed the complex of spike protein 1 receptor-binding domain (S1RBD) and human angiotensin-converting enzyme 2 (hACE2) from the Protein Data Bank (6m0j, 7a91, 7mjn, 7v80, 7v84, 7v8b, 7wbl and 7xo9) and directly mutated specific amino acids to simulate several variants, including variants of concern (alpha, beta, gamma, delta), variants of interest (delta plus, epsilon, lambda, mu, mu without R346K) and omicron variants (BA.1, BA.2, BA.2.12.1, BA.4, BA.5). Molecular dynamics (MD) simulations for 100 ns under physiological conditions were then performed. We found that the omicron S1RBD-hACE2 complexes become more compact with increases in hydrogen-bond interactions at the interface, which is related to the transmissibility of SARS-CoV-2. Moreover, the relaxation time of hydrogen bonds is relatively short among the omicron variants, which implies that the interface conformation alterations are fast. From the molecular perspective, PHE486 and TYR501 in omicron S1RBDs need to involve hydrogen bonds and hydrophobic interactions on the interface. Our study provides structural features of the dominant variants that explain the evolution trend and their increased contagiousness and could thus also shed light on future variant changes.


Subject(s)
Angiotensin-Converting Enzyme 2 , COVID-19 , Humans , Angiotensin-Converting Enzyme 2/genetics , Hydrogen Bonding , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/genetics
10.
J Formos Med Assoc ; 121(12): 2649-2652, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36031487

ABSTRACT

New psychoactive substances (NPS) have increasingly been illegally synthesized and used around the world in recent years. Due to the large volume and the variety of NPS, most do not have sufficient information about their addictive potential and harmful effects to human subjects. This makes it difficult to evaluate these potential substances of abuse. This study aims to build a database based on Taiwan's controlled substances, to provide quick structural and pharmacological feedback. Taiwan Controlled Substances Database (TCSD) includes the collection of controlled substances, relevant experimental and structural information, as well as computational features such as molecular fingerprints and descriptors. Two types of structural search were added: substructure search and topological fingerprint similarity search. A web framework was used to enhance accessibility and usability (https://cs2search.cmdm.tw).
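A fingerprint similarity search of the kind mentioned above is commonly scored with the Tanimoto coefficient. The sketch below assumes fingerprints are represented as sets of on-bit indices; whether TCSD uses this metric or this representation is not stated in the abstract, and the compound names and bit sets are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(query_fp, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints (indices of bits set to 1)
library = {
    "compound_A": {1, 4, 7, 9},
    "compound_B": {1, 4, 8},
    "compound_C": {2, 3},
}
ranking = rank_by_similarity({1, 4, 7}, library)
```

Substructure search is a different operation (graph matching rather than bit overlap) and is not covered by this sketch.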


Subject(s)
Controlled Substances , Humans , Taiwan , Databases, Factual
11.
Sci Data ; 9(1): 521, 2022 08 26.
Article in English | MEDLINE | ID: mdl-36028515

ABSTRACT

Rare skin diseases include more than 800 diseases affecting more than 6.8 million patients worldwide. However, only 100 drugs have been developed for treating rare skin diseases in the past 38 years. To investigate potential treatments through drug repurposing for rare skin diseases, it is necessary to have a well-organized database to link all known disease causes, mechanisms, and related information to accelerate the process. Drug repurposing provides less expensive and faster potential options to develop treatments for known diseases. In this work, we designed and constructed a rare skin disease database (RSDB) as a disease-centered information depository to facilitate repurposing drug candidates for rare skin diseases. We collected and integrated associated genes, chemicals, and phenotypes into a network connected by pairwise relationships between different components for rare skin diseases. The RSDB covers 891 rare skin diseases defined by the Orphanet and GARD databases. The organized network for each rare skin disease comprises associated genes, phenotypes, and chemicals with the corresponding connections. The RSDB is available at https://rsdb.cmdm.tw .


Subject(s)
Rare Diseases , Skin Diseases , Databases, Factual , Drug Repositioning , Humans
12.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35945035

ABSTRACT

Neural network (NN)-based protein modeling methods have improved significantly in recent years. Although the overall accuracy of the two non-homology-based modeling methods, AlphaFold and RoseTTAFold, is outstanding, their performance for specific protein families has remained unexamined. G-protein-coupled receptor (GPCR) proteins are particularly interesting since they are involved in numerous pathways. This work directly compares the performance of these novel deep learning-based protein modeling methods for GPCRs with the most widely used template-based software, Modeller. We collected the experimentally determined structures of 73 GPCRs from the Protein Data Bank. The official AlphaFold repository and RoseTTAFold web service were used with default settings to predict five structures of each protein sequence. The predicted models were then aligned with the experimentally solved structures and evaluated by the root-mean-square deviation (RMSD) metric. Considering only each program's top-scored structure, Modeller had the smallest average modeling RMSD of 2.17 Å, which is better than AlphaFold's 5.53 Å and RoseTTAFold's 6.28 Å, probably because Modeller already included many known structures as templates. However, the NN-based methods (AlphaFold and RoseTTAFold) outperformed Modeller in 21 and 15 out of the 73 cases with the top-scored model, respectively, where no good templates were available for Modeller. The larger RMSD values generated by the NN-based methods were primarily due to the differences in loop prediction compared to the crystal structures.
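The RMSD metric used for the comparison can be sketched in a few lines. The example assumes the predicted and experimental coordinates are already superimposed and matched atom by atom; it does not perform the structural alignment itself, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms: the model is shifted 1 Å along z
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(experimental, predicted)
```

Because the value depends on the prior superposition and on which atoms are included, RMSD figures are only comparable when those choices are the same.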


Subject(s)
Receptors, G-Protein-Coupled , Software , Databases, Protein , Models, Molecular , Protein Conformation , Receptors, G-Protein-Coupled/chemistry
13.
Pediatr Neonatol ; 63(5): 468-473, 2022 09.
Article in English | MEDLINE | ID: mdl-35641386

ABSTRACT

BACKGROUND: Omphalocele and gastroschisis are the two most common congenital abdominal wall defects; however, no previous study has focused on gastrointestinal and hepatobiliary tract malformations in these two conditions. This study aimed to investigate the demographic characteristics, coexisting congenital gastrointestinal and hepatobiliary tract anomalies, hospital course, and outcomes of patients with gastroschisis and omphalocele. METHODS: This is a retrospective chart review of all patients admitted to one tertiary medical center in Taiwan between January 1, 2000 and June 30, 2020 with a diagnosis of gastroschisis or omphalocele. The medical records were reviewed to obtain demographic data regarding coexisting gastrointestinal and hepatobiliary tract anomalies and outcomes. RESULTS: Of the 51 patients included, 21 had gastroschisis and 30 had omphalocele. Gastroschisis was associated with a significantly younger maternal age and a higher incidence of small for gestational age. Of the 30 patients with omphalocele, twelve had associated gastrointestinal and hepatobiliary anomalies. Seven of the 21 patients with gastroschisis had gastrointestinal anomalies, and none had hepatobiliary anomalies. Among the omphalocele patients, three (10%) had documented malrotation, and one developed midgut volvulus. Among gastroschisis patients, four patients (19%) had malrotation, and two developed midgut volvulus. There were no statistically significant differences in postoperative complications or mortality rates between those with and without gastrointestinal/hepatobiliary tract anomalies. CONCLUSION: The diversity of coexisting gastrointestinal and hepatobiliary tract anomalies is higher in omphalocele than in gastroschisis. In addition, we demonstrate that patients with gastroschisis or omphalocele have a higher rate of intestinal malrotation and midgut volvulus.


Subject(s)
Gastroschisis , Hernia, Umbilical , Intestinal Volvulus , Gastroschisis/complications , Gastroschisis/diagnosis , Gastroschisis/epidemiology , Hernia, Umbilical/complications , Hernia, Umbilical/diagnosis , Hernia, Umbilical/epidemiology , Hospitals , Humans , Intestinal Volvulus/surgery , Retrospective Studies
14.
J Biol Chem ; 298(6): 101957, 2022 06.
Article in English | MEDLINE | ID: mdl-35452675

ABSTRACT

Japanese encephalitis is a mosquito-borne disease caused by the Japanese encephalitis virus (JEV) that is prevalent in Asia and the Western Pacific. Currently, there is no effective treatment for Japanese encephalitis. Curcumin (Cur) is a compound extracted from the roots of Curcuma longa, and many studies have reported its antiviral and anti-inflammatory activities. However, the high cytotoxicity and very low solubility of Cur limit its biomedical applications. In this study, Cur carbon quantum dots (Cur-CQDs) were synthesized by mild pyrolysis-induced polymerization and carbonization, leading to higher water solubility and lower cytotoxicity, as well as superior antiviral activity against JEV infection. We found that Cur-CQDs effectively bound to the E protein of JEV, preventing viral entry into the host cells. In addition, after continued treatment of JEV with Cur-CQDs, a mutant strain of JEV was evolved that did not support binding of Cur-CQDs to the JEV envelope. Using transmission electron microscopy, biolayer interferometry, and molecular docking analysis, we revealed that the S123R and K312R mutations in the E protein play a key role in binding Cur-CQDs. The S123 and K312 residues are located in structural domains II and III of the E protein, respectively, and are responsible for binding to receptors on and fusing with the cell membrane. Taken together, our results suggest that the E protein of flaviviruses represents a potential target for the development of CQD-based inhibitors to prevent or treat viral infections.


Subject(s)
Encephalitis Virus, Japanese , Encephalitis, Japanese , Quantum Dots , Animals , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , Carbon , Encephalitis Virus, Japanese/chemistry , Encephalitis Virus, Japanese/genetics , Encephalitis, Japanese/drug therapy , Molecular Docking Simulation , Viral Envelope Proteins/metabolism
16.
Cancers (Basel) ; 14(3)2022 Jan 30.
Article in English | MEDLINE | ID: mdl-35158999

ABSTRACT

Exosomes participate in cell-cell communication by transferring molecular components between cells. Previous studies have shown that exosomal molecules derived from cancer cells and liquid biopsies can serve as biomarkers for cancer diagnosis and prognosis. The exploration of the molecules transferred by lung cancer-derived exosomes can advance the understanding of exosome-mediated signaling pathways and mechanisms. However, the molecular characterization and functional indications of exosomal proteins and lipids have not been comprehensively organized. This review thoroughly collected data concerning exosomal proteins and lipids from various lung cancer samples, including cancer cell lines and cancer patients. As potential diagnostic and prognostic biomarkers, exosomal proteins and lipids are available for clinical use in lung cancer. Potential therapeutic targets are mentioned for the future development of lung cancer therapy. Molecular functions implying their possible roles in exosome-mediated signaling are also discussed. Finally, we emphasized the importance and value of lung cancer stem cell-derived exosomes in lung cancer therapy. In summary, this review presents a comprehensive description of the protein and lipid composition and function of lung cancer-derived exosomes for lung cancer diagnosis, prognosis, and treatment.

17.
Sci Rep ; 12(1): 175, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34997034

ABSTRACT

Pharmaceutical patent analysis is the key to product protection for pharmaceutical companies. In patent claims, a Markush structure is a standard chemical structure drawing with variable substituents. Overlaps between apparently dissimilar Markush structures are nearly unrecognizable when the structures span a broad chemical space. We propose a quantum search-based method which performs an exact comparison between two non-enumerated Markush structures with a constraint satisfaction oracle. The quantum circuit is verified with a quantum simulator and the real effect of noise is estimated using a five-qubit superconductivity-based IBM quantum computer. The probability of measuring the correct states can be increased by improving the connectivity of the most computation-intensive qubits. Depolarizing error is the most influential error. The quantum method for exactly comparing two patents is hard to simulate classically and thus creates a quantum advantage in patent analysis.

18.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34498673

ABSTRACT

The key to generating the best deep learning model for predicting molecular property is to test and apply various optimization methods. While individual optimization methods from different past works outside the pharmaceutical domain each succeeded in improving the model performance, better improvement may be achieved when specific combinations of these methods and practices are applied. In this work, three high-performance optimization methods in the literature that have been shown to dramatically improve model performance from other fields are used and discussed, eventually resulting in a general procedure for generating optimized CNN models on different properties of molecules. The three techniques are the dynamic batch size strategy for different enumeration ratios of the SMILES representation of compounds, Bayesian optimization for selecting the hyperparameters of a model and feature learning using chemical features obtained by a feedforward neural network, which are concatenated with the learned molecular feature vector. A total of seven different molecular properties (water solubility, lipophilicity, hydration energy, electronic properties, blood-brain barrier permeability and inhibition) are used. We demonstrate how each of the three techniques can affect the model and how the best model can generally benefit from using Bayesian optimization combined with dynamic batch size tuning.


Subject(s)
Deep Learning , Bayes Theorem , Neural Networks, Computer , Solubility
19.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34530437

ABSTRACT

The trade-off between the predictability and interpretability of machine learning (ML) and deep learning (DL) models has been a rising concern in central nervous system-related quantitative structure-activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights due to their black box-like nature. Because they lack interpretability, it is also difficult to derive simple rules from CNS-QSAR models. To address these issues, we develop a protocol to combine the power of ML and DL to generate a set of simple rules that are easy to interpret with high prediction power. A data set of 940 market drugs (315 CNS-active, 625 CNS-inactive) with support vector machine and graph convolutional network algorithms were used. Individual ML/DL modeling methods were also constructed for comparison. The performance of these models was evaluated using an additional external dataset of 117 market drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed other constituent traditional QSAR models with an accuracy of 0.96 and an F1 score of 0.95. With the interpretability provided by this protocol, our model laid down a set of simple physicochemical rules to determine whether a compound can be a CNS drug using six sub-structural features. These rules displayed higher classification ability than classical guidelines, with higher specificity and more mechanistic insights than just for blood-brain barrier permeability. This hybrid protocol can potentially be used for other drug property predictions.
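For readers unfamiliar with the two metrics reported above, the sketch below shows how accuracy and the F1 score follow from the counts in a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's results, and the function name is not from the paper.

```python
def accuracy_and_f1(tp, fp, tn, fn):
    """Compute accuracy and F1 score from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall, which simplifies to:
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Hypothetical counts (positive class = CNS-active)
accuracy, f1 = accuracy_and_f1(tp=8, fp=2, tn=9, fn=1)
```

On an imbalanced set such as 42 active versus 75 inactive compounds, F1 is informative alongside accuracy because it does not reward correct predictions of the majority negative class.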


Subject(s)
Deep Learning , Blood-Brain Barrier , Machine Learning , Permeability , Support Vector Machine
20.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32501508

ABSTRACT

Aqueous solubility is a key property driving many chemical and biological phenomena and impacts experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential and challenging, even with modern computational algorithms. Fingerprint-based, feature-based and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction. It has been clearly demonstrated that different molecular representations impact the model prediction and explainability. In this work, we reviewed different representations and also focused on using graph and line notations for modeling. In general, one canonical chemical structure is used to represent one molecule when computing its properties. We carefully examined the commonly used simplified molecular-input line-entry specification (SMILES) notation representing a single molecule and proposed to use the full enumerations in SMILES to achieve better accuracy. A convolutional neural network (CNN) was used. The full enumeration of SMILES can improve the representation of a molecule and describe the molecule from all possible angles. This CNN model can be very robust when dealing with large datasets since no additional explicit chemistry knowledge is necessary to predict the solubility. Also, traditionally it is hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the part of a molecule that is relevant to solubility, which can be used to explain the contribution from the CNN.


Subject(s)
Deep Learning , Water/chemistry , Algorithms , Neural Networks, Computer , Solubility