1.
Nat Comput Sci ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730184

ABSTRACT

Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 µs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.

2.
Nat Commun ; 15(1): 1517, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38409255

ABSTRACT

We investigate the potential of graph neural networks for transfer learning and improving molecular property prediction on sparse and expensive to acquire high-fidelity data by leveraging low-fidelity measurements as an inexpensive proxy for a targeted property of interest. This problem arises in discovery processes that rely on screening funnels for trading off the overall costs against throughput and accuracy. Typically, individual stages in these processes are loosely connected and each one generates data at different scale and fidelity. We consider this setup holistically and demonstrate empirically that existing transfer learning techniques for graph neural networks are generally unable to harness the information from multi-fidelity cascades. Here, we propose several effective transfer learning strategies and study them in transductive and inductive settings. Our analysis involves a collection of more than 28 million unique experimental protein-ligand interactions across 37 targets from drug discovery by high-throughput screening and 12 quantum properties from the dataset QMugs. The results indicate that transfer learning can improve the performance on sparse tasks by up to eight times while using an order of magnitude less high-fidelity training data. Moreover, the proposed methods consistently outperform existing transfer learning strategies for graph-structured data on drug discovery and quantum mechanics datasets.


Subjects
Drug Discovery , Learning , High-Throughput Screening Assays , Neural Networks, Computer , Machine Learning
3.
Life Sci ; 335: 122244, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37949208

ABSTRACT

High blood sugar and insulin insensitivity cause the lifelong chronic metabolic disease called Type 2 diabetes (T2D), which carries a higher chance of developing various malignancies. T2D with comorbidities such as cancers can complicate standard medication for those disorders. Comorbidities may be significantly correlated and may affect one another's course. These associations may be due to a number of direct and indirect mechanisms. The molecular mechanisms that underpin T2D and cancer are as yet unknown. However, the large volumes of data available on these diseases allowed us to use analytical tools to uncover their interrelated pathways. Here, we present a system for investigating potential comorbidity relationships between T2D and cancer by looking at the molecular processes involved, analyzing a large number of freely accessible transcriptomic datasets of various disorders using bioinformatics. Using semantic similarity and gene set enrichment analysis, we created an informatics pipeline that evaluates and integrates Gene Ontology (GO), gene expression, and biological process data. We discovered genes that are common to T2D and cancer, along with molecular pathways and GO terms. We compared the top 200 differentially expressed genes (DEGs) from each selected T2D and cancer dataset and found the most significant common genes. Among all the common genes, 13 were found most frequently. We also found 4 GO terms, GO:0000003, GO:0000122, GO:0000165, and GO:0000278, shared among all the common GO terms between T2D and different cancers. Using these genes and GO term semantic similarity, we calculated the distance between these two diseases. The semantic similarity results of our study showed a higher association of Liver Cancer (LiC), Breast Cancer (BreC), Colorectal Cancer (CC), and Bladder Cancer (BlaC) with T2D. Furthermore, we found KIF4A, NUSAP1, CENPF, CCNB1, TOP2A, CCNB2, RRM2, HMMR, NDC80, NCAPG, and IGFBP5 to be common hub proteins among different cancers correlated with T2D. The AGE-RAGE signaling pathway in diabetic complications, Osteoclast differentiation, TNF signaling pathway, IL-17 signaling pathway, p53 signaling pathway, MAPK signaling pathway, Human T-cell leukemia virus 1 infection, and Non-alcoholic fatty liver disease are the 8 most significant pathways found among 18 common pathways between T2D and the selected cancers. As a result of our technique, we now know more about disease pathways that are critical between T2D and cancer.


Subjects
Diabetes Mellitus, Type 2 , Liver Neoplasms , Humans , Diabetes Mellitus, Type 2/genetics , Liver Neoplasms/pathology , Gene Expression Profiling/methods , Transcriptome , Comorbidity , Computational Biology/methods , Kinesins/genetics
4.
Clin Transl Allergy ; 13(11): e12306, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38006387

ABSTRACT

BACKGROUND: Not being well controlled by therapy with inhaled corticosteroids and long-acting β2 agonist bronchodilators is a major concern for severe-asthma patients. The current treatment option for these patients is the use of biologicals such as the anti-IgE treatment omalizumab as an add-on therapy. Despite the accepted use of omalizumab, patients do not always benefit from it. Therefore, there is a need to identify reliable biomarkers as predictors of omalizumab response. METHODS: Two novel computational algorithms, the machine-learning-based Recursive Ensemble Feature Selection (REFS) and the rule-based Logic Explainable Networks (LEN), were used on openly accessible mRNA expression data from moderate-to-severe asthma patients to identify genes as predictors of omalizumab response. RESULTS: With REFS, the number of features was reduced from 28,402 genes to 5 genes while obtaining a cross-validated accuracy of 0.975. The 5 responsiveness-predictive genes encode the following proteins: Coiled-coil domain-containing protein 113 (CCDC113), Solute Carrier Family 26 Member 8 (SLC26A8), Protein Phosphatase 1 Regulatory Subunit 3D (PPP1R3D), C-Type Lectin Domain Family 4 Member C (CLEC4C) and LOC100131780 (not annotated). The LEN algorithm found 4 genes identical to those found by REFS: CCDC113, SLC26A8, PPP1R3D and LOC100131780. Literature research showed that the 4 identified responsiveness-predicting genes are associated with mucosal immunity, cell metabolism, and airway remodeling. CONCLUSION AND CLINICAL RELEVANCE: Both computational methods show 4 identical genes as predictors of omalizumab response in moderate-to-severe asthma patients. The obtained high accuracy indicates that our approach has potential in clinical settings. Future studies in relevant cohort data should validate our computational approach.

5.
Commun Chem ; 6(1): 262, 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-38030692

ABSTRACT

Atom-centred neural networks represent the state-of-the-art for approximating the quantum chemical properties of molecules, such as internal energies. While the design of machine learning architectures that respect chemical principles has continued to advance, the final atom pooling operation that is necessary to convert from atomic to molecular representations in most models remains relatively undeveloped. The most common choices, sum and average pooling, compute molecular representations that are naturally a good fit for many physical properties, while satisfying properties such as permutation invariance which are desirable from a geometric deep learning perspective. However, there are growing concerns that such simplistic functions might have limited representational power, while also being suboptimal for physical properties that are highly localised or intensive. Based on recent advances in graph representation learning, we investigate the use of a learnable pooling function that leverages an attention mechanism to model interactions between atom representations. The proposed pooling operation is a drop-in replacement requiring no changes to any of the other architectural components. Using SchNet and DimeNet++ as starting models, we demonstrate consistent uplifts in performance compared to sum and mean pooling and a recent physics-aware pooling operation designed specifically for orbital energies, on several datasets, properties, and levels of theory, with up to 85% improvements depending on the specific task.
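As a rough illustration of the pooling choices discussed in the abstract above, the following Python sketch contrasts sum and mean pooling with a simple attention-weighted pooling. It is not the authors' implementation: the single `query` vector, the toy `atoms` values, and all function names are assumptions made for illustration, and the published method models interactions between atom representations rather than scoring each atom against one fixed query.

```python
import math

def sum_pool(atoms):
    """Add the atom vectors component-wise (extensive: grows with atom count)."""
    return [sum(column) for column in zip(*atoms)]

def mean_pool(atoms):
    """Average the atom vectors component-wise (intensive: size-independent)."""
    return [total / len(atoms) for total in sum_pool(atoms)]

def attention_pool(atoms, query):
    """Weight each atom by softmax(query . atom) and return the weighted sum.

    `query` stands in for a learnable parameter; here it is fixed by hand.
    """
    scores = [sum(q * h for q, h in zip(query, atom)) for atom in atoms]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(query)
    return [sum(w * atom[d] for w, atom in zip(weights, atoms)) for d in range(dim)]

# Hypothetical two-dimensional atom representations for a three-atom molecule.
atoms = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
query = [0.5, -0.25]

pooled = attention_pool(atoms, query)
pooled_permuted = attention_pool(list(reversed(atoms)), query)
```

All three functions are permutation invariant, so reordering the atoms leaves the molecular representation unchanged up to floating-point rounding. With an all-zero query the attention weights become uniform and the result reduces to mean pooling.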

6.
Commun Med (Lond) ; 3(1): 139, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37803172

ABSTRACT

BACKGROUND: Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete samples. The focus of the machine learning researcher is to optimise the classifier's performance. METHODS: We utilise three simulated and three real-world clinical datasets with different feature types and missingness patterns. Initially, we evaluate how the downstream classifier performance depends on the choice of classifier and imputation methods. We employ ANOVA to quantitatively evaluate how the choice of missingness rate, imputation method, and classifier method influences the performance. Additionally, we compare commonly used methods for assessing imputation quality and introduce a class of discrepancy scores based on the sliced Wasserstein distance. We also assess the stability of the imputations and the interpretability of models built on the imputed data. RESULTS: The performance of the classifier is most affected by the percentage of missingness in the test data, with a considerable performance decline observed as the test missingness rate increases. We also show that the commonly used measures for assessing imputation quality tend to lead to imputed data which poorly matches the underlying data distribution, whereas our new class of discrepancy scores performs much better on this measure. Furthermore, we show that the interpretability of classifier models trained using poorly imputed data is compromised. CONCLUSIONS: It is imperative to consider the quality of the imputation when performing downstream classification as the effects on the classifier can be considerable.


Many artificial intelligence (AI) methods aim to classify samples of data into groups, e.g., patients with disease vs. those without. This often requires datasets to be complete, i.e., that all data has been collected for all samples. However, in clinical practice this is often not the case and some data can be missing. One solution is to 'complete' the dataset using a technique called imputation to replace those missing values. However, assessing how well the imputation method performs is challenging. In this work, we demonstrate why people should care about imputation, develop a new method for assessing imputation quality, and demonstrate that if we build AI models on poorly imputed data, the model can give different results to those we would hope for. Our findings may improve the utility and quality of AI models in the clinic.
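The sliced Wasserstein distance mentioned in the abstract above can be estimated with a short Monte Carlo routine. The Python sketch below is a generic textbook-style estimator, not the discrepancy scores defined in the paper: the function name, the number of projections, the fixed seed, and the toy data are all assumptions, and it handles only two samples of equal size.

```python
import math
import random

def sliced_wasserstein(x, y, n_projections=200, seed=0):
    """Estimate the sliced 1-Wasserstein distance between two equal-size samples.

    Each sample is a list of points (lists of floats). Both samples are
    projected onto random unit directions; in one dimension the Wasserstein
    distance is the mean absolute difference between the sorted projections.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes samples of equal size")
    rng = random.Random(seed)
    dim = len(x[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction and normalise it to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        length = math.sqrt(sum(d * d for d in direction))
        direction = [d / length for d in direction]
        proj_x = sorted(sum(d * v for d, v in zip(direction, row)) for row in x)
        proj_y = sorted(sum(d * v for d, v in zip(direction, row)) for row in y)
        total += sum(abs(a - b) for a, b in zip(proj_x, proj_y)) / len(proj_x)
    return total / n_projections

# Hypothetical "observed" data and an "imputed" copy shifted by one unit in x.
observed = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
imputed = [[row[0] + 1.0, row[1]] for row in observed]

same = sliced_wasserstein(observed, observed)
shifted = sliced_wasserstein(observed, imputed)
```

Comparing a sample with itself gives zero, while a shifted copy gives a positive value no larger than the length of the shift, which is the behaviour one would want from a score that tracks how well imputed data matches the underlying distribution.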

7.
Nat Mach Intell ; 5(7): 739-753, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37771758

ABSTRACT

Integrating gene expression across tissues and cell types is crucial for understanding the coordinated biological mechanisms that drive disease and characterise homeostasis. However, traditional multitissue integration methods cannot handle uncollected tissues or rely on genotype information, which is often unavailable and subject to privacy concerns. Here we present HYFA (Hypergraph Factorisation), a parameter-efficient graph representation learning approach for joint imputation of multi-tissue and cell-type gene expression. HYFA is genotype-agnostic, supports a variable number of collected tissues per individual, and imposes strong inductive biases to leverage the shared regulatory architecture of tissues and genes. In performance comparison on Genotype-Tissue Expression project data, HYFA achieves superior performance over existing methods, especially when multiple reference tissues are available. The HYFA-imputed dataset can be used to identify replicable regulatory genetic variations (eQTLs), with substantial gains over the original incomplete dataset. HYFA can accelerate the effective and scalable integration of tissue and cell-type transcriptome biorepositories.

8.
Comput Methods Programs Biomed ; 241: 107733, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37572513

ABSTRACT

BACKGROUND AND OBJECTIVE: High-resolution histopathology whole slide images (WSIs) contain abundant valuable information for cancer prognosis. However, most computational pathology methods for survival prediction have weak interpretability and cannot explain the decision-making processes reasonably. To address this issue, we propose a highly interpretable neural network termed pattern-perceptive survival transformer (Surformer) for cancer survival prediction from WSIs. METHODS: Notably, Surformer can quantify specific histological patterns through bag-level labels without any patch/cell-level auxiliary information. Specifically, the proposed ratio-reserved cross-attention module (RRCA) generates global and local features with the learnable prototypes (pglobal, plocals) as detectors and quantifies the patches correlative to each plocal in the form of ratio factors (rfs). Afterward, multi-head self&cross-attention modules proceed with the computation for feature enhancement against noise. Eventually, the designed disentangling loss function guides multiple local features to focus on distinct patterns, thereby assisting rfs from RRCA in achieving more explicit histological feature quantification. RESULTS: Extensive experiments on five TCGA datasets illustrate that Surformer outperforms existing state-of-the-art methods. In addition, we highlight its interpretation by visualizing rfs distribution across high-risk and low-risk cohorts and retrieving and analyzing critical histological patterns contributing to the survival prediction. CONCLUSIONS: Surformer is expected to be exploited as a useful tool for performing histopathology image data-driven analysis and gaining new insights for interpreting the associations between such images and patient survival states.


Subjects
Neoplasms , Humans , Neoplasms/diagnostic imaging , Perception , Electric Power Supplies , Neural Networks, Computer , Research
9.
Commun Med (Lond) ; 3(1): 100, 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37474615

ABSTRACT

BACKGROUND: Identifying prediagnostic neurodegenerative disease is a critical issue in neurodegenerative disease research, and Alzheimer's disease (AD) in particular, to identify populations suitable for preventive and early disease-modifying trials. Evidence from genetic and other studies suggests the neurodegeneration of Alzheimer's disease measured by brain atrophy starts many years before diagnosis, but it is unclear whether these changes can be used to reliably detect prediagnostic sporadic disease. METHODS: We trained a Bayesian machine learning neural network model to generate a neuroimaging phenotype and AD score representing the probability of AD using structural MRI data in the Alzheimer's Disease Neuroimaging Initiative (ADNI) Cohort (cut-off 0.5, AUC 0.92, PPV 0.90, NPV 0.93). We go on to validate the model in an independent real-world dataset of the National Alzheimer's Coordinating Centre (AUC 0.74, PPV 0.65, NPV 0.80) and demonstrate the correlation of the AD-score with cognitive scores in those with an AD-score above 0.5. We then apply the model to a healthy population in the UK Biobank study to identify a cohort at risk for Alzheimer's disease. RESULTS: We show that the cohort with a neuroimaging Alzheimer's phenotype has a cognitive profile in keeping with Alzheimer's disease, with strong evidence for poorer fluid intelligence, and some evidence of poorer numeric memory, reaction time, working memory, and prospective memory. We found some evidence in the AD-score positive cohort for modifiable risk factors of hypertension and smoking. CONCLUSIONS: This approach demonstrates the feasibility of using AI methods to identify a potentially prediagnostic population at high risk for developing sporadic Alzheimer's disease.


Spotting people with dementia early is challenging, but important to identify people for trials of treatment and prevention. We used brain scans of people with Alzheimer's disease, the commonest type of dementia, and applied an artificial intelligence method to spot people with Alzheimer's disease. We used this to find people in the Healthy UK Biobank study who might have early Alzheimer's disease. The people we found had subtle changes in their memory and thinking to suggest they may have early disease, and we also found they had high blood pressure and smoked for longer. We have demonstrated an approach that could be used to select people at high risk of future dementia for clinical trials.

10.
Genes (Basel) ; 14(4), 2023 04 21.
Article in English | MEDLINE | ID: mdl-37107707

ABSTRACT

Operons represent one of the leading strategies of gene organization in prokaryotes, having a crucial influence on the regulation of gene expression and on bacterial chromosome organization. However, there is no consensus yet on why, how, and when operons are formed and conserved, and many different theories have been proposed. Histidine biosynthesis is a highly studied metabolic pathway, and many of the models suggested to explain operon origin and evolution can be applied to the histidine pathway, making this route an attractive model for the study of operon evolution. Indeed, the organization of his genes in operons can be due to a progressive clustering of biosynthetic genes during evolution, coupled with a horizontal transfer of these gene clusters. The necessity of physical interactions among the His enzymes could also have had a role in favoring gene closeness, of particular importance in extreme environmental conditions. In addition, the presence in this pathway of paralogous genes, heterodimeric enzymes and complex regulatory networks also supports other operon evolution hypotheses. It is possible that histidine biosynthesis, and in general all bacterial operons, may result from a mixture of several models, being shaped by different forces and mechanisms during evolution.


Subjects
Evolution, Molecular , Histidine , Histidine/genetics , Operon/genetics , Bacteria/genetics , Multigene Family
11.
J Chem Inf Model ; 63(9): 2667-2678, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37058588

ABSTRACT

High-throughput screening (HTS), as one of the key techniques in drug discovery, is frequently used to identify promising drug candidates in a largely automated and cost-effective way. One of the necessary conditions for successful HTS campaigns is a large and diverse compound library, enabling hundreds of thousands of activity measurements per project. Such collections of data hold great promise for computational and experimental drug discovery efforts, especially when leveraged in combination with modern deep learning techniques, and can potentially lead to improved drug activity predictions and cheaper and more effective experimental design. However, existing collections of machine-learning-ready public datasets do not exploit the multiple data modalities present in real-world HTS projects. Thus, the largest fraction of experimental measurements, corresponding to hundreds of thousands of "noisy" activity values from primary screening, are effectively ignored in the majority of machine learning models of HTS data. To address these limitations, we introduce Multifidelity PubChem BioAssay (MF-PCBA), a curated collection of 60 datasets that includes two data modalities for each dataset, corresponding to primary and confirmatory screening, an aspect that we call multifidelity. Multifidelity data accurately reflect real-world HTS conventions and present a new, challenging task for machine learning: the integration of low- and high-fidelity measurements through molecular representation learning, taking into account the orders-of-magnitude difference in size between the primary and confirmatory screens. Here we detail the steps taken to assemble MF-PCBA in terms of data acquisition from PubChem and the filtering steps required to curate the raw data. 
We also provide an evaluation of a recent deep-learning-based method for multifidelity integration across the introduced datasets, demonstrating the benefit of leveraging all HTS modalities, and a discussion in terms of the roughness of the molecular activity landscape. In total, MF-PCBA contains over 16.6 million unique molecule-protein interactions. The datasets can be easily assembled by using the source code available at https://github.com/davidbuterez/mf-pcba.


Subjects
Benchmarking , High-Throughput Screening Assays , High-Throughput Screening Assays/methods , Drug Discovery/methods , Machine Learning , Biological Assay
12.
IEEE Trans Med Imaging ; 42(5): 1363-1373, 2023 05.
Article in English | MEDLINE | ID: mdl-37015608

ABSTRACT

Recent studies on multi-contrast MRI reconstruction have demonstrated the potential of further accelerating MRI acquisition by exploiting correlation between contrasts. Most of the state-of-the-art approaches have achieved improvement through the development of network architectures for fixed under-sampling patterns, without considering inter-contrast correlation in the under-sampling pattern design. On the other hand, sampling pattern learning methods have shown better reconstruction performance than those with fixed under-sampling patterns. However, most under-sampling pattern learning algorithms are designed for single contrast MRI without exploiting complementary information between contrasts. To this end, we propose a framework to optimize the under-sampling pattern of a target MRI contrast which complements the acquired fully-sampled reference contrast. Specifically, a novel image synthesis network is introduced to extract the redundant information contained in the reference contrast, which is exploited in the subsequent joint pattern optimization and reconstruction network. We have demonstrated superior performance of our learned under-sampling patterns on both public and in-house datasets, compared to the commonly used under-sampling patterns and state-of-the-art methods that jointly optimize the reconstruction network and the under-sampling patterns, up to 8-fold under-sampling factor.


Subjects
Algorithms , Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Upper Extremity
13.
Nat Methods ; 20(4): 569-579, 2023 04.
Article in English | MEDLINE | ID: mdl-36997816

ABSTRACT

The ability to quantify structural changes of the endoplasmic reticulum (ER) is crucial for understanding the structure and function of this organelle. However, the rapid movement and complex topology of ER networks make this challenging. Here, we construct a state-of-the-art semantic segmentation method that we call ERnet for the automatic classification of sheet and tubular ER domains inside individual cells. Data are skeletonized and represented by connectivity graphs, enabling precise and efficient quantification of network connectivity. ERnet generates metrics on topology and integrity of ER structures and quantifies structural change in response to genetic or metabolic manipulation. We validate ERnet using data obtained by various ER-imaging methods from different cell types as well as ground truth images of synthetic ER structures. ERnet can be deployed in an automatic high-throughput and unbiased fashion and identifies subtle changes in ER phenotypes that may inform on disease progression and response to therapy.


Subjects
Endoplasmic Reticulum , Semantics , Endoplasmic Reticulum/metabolism
14.
Neural Netw ; 162: 271-287, 2023 May.
Article in English | MEDLINE | ID: mdl-36921434

ABSTRACT

Deep learning-based models have achieved significant success in detecting cardiac arrhythmia by analyzing ECG signals to categorize patient heartbeats. To improve the performance of such models, we have developed a novel hybrid hierarchical attention-based bidirectional recurrent neural network with dilated CNN (HARDC) method for arrhythmia classification. This solves problems that arise when traditional dilated convolutional neural network (CNN) models disregard the correlation between contexts and gradient dispersion. The proposed HARDC fully exploits the dilated CNN and bidirectional recurrent neural network unit (BiGRU-BiLSTM) architecture to generate fusion features. As a result of incorporating both local and global feature information and an attention mechanism, the model's performance for prediction is improved. By combining the fusion features with a dilated CNN and a hierarchical attention mechanism, the trained HARDC model showed significantly improved classification results and interpretability of feature extraction on the PhysioNet 2017 challenge dataset. Sequential Z-Score normalization, filtering, denoising, and segmentation are used to prepare the raw data for analysis. CGAN (Conditional Generative Adversarial Network) is then used to generate synthetic signals from the processed data. The experimental results demonstrate that the proposed HARDC model significantly outperforms other existing models, achieving an accuracy of 99.60%, F1 score of 98.21%, a precision of 97.66%, and recall of 99.60% using MIT-BIH generated ECG. In addition, this approach significantly reduces run time when using dilated CNN compared to normal convolution. Overall, this hybrid model demonstrates an innovative and cost-effective strategy for ECG signal compression and high-performance ECG recognition. Our results indicate that an automated and highly computed method to classify multiple types of arrhythmia signals holds considerable promise.
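To make the idea of a dilated convolution in the abstract above more concrete, the following Python sketch applies a one-dimensional kernel to a signal with a configurable dilation. It is only an illustration under simple assumptions (no padding, stride 1, cross-correlation as in most deep learning libraries); the function name and the toy signal are hypothetical and do not come from the HARDC model.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Slide `kernel` over `signal`, spacing the kernel taps `dilation` samples apart.

    Returns only positions where the dilated kernel fits entirely inside
    the signal (no padding).
    """
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(weight * signal[start + tap * dilation] for tap, weight in enumerate(kernel))
        for start in range(len(signal) - span)
    ]

# Hypothetical ECG-like samples and a two-tap summing kernel.
signal = [1, 2, 3, 4, 5]
kernel = [1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # adjacent samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # every second sample
```

With the same two weights, the dilated version covers three samples of the signal instead of two, which is how dilation widens the receptive field without adding parameters.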


Subjects
Algorithms , Electrocardiography , Humans , Heart Rate , Electrocardiography/methods , Neural Networks, Computer , Arrhythmias, Cardiac/diagnosis , Signal Processing, Computer-Assisted
15.
Front Immunol ; 14: 1091941, 2023.
Article in English | MEDLINE | ID: mdl-36776835

ABSTRACT

Introduction: The current monkeypox (MPX) outbreak, caused by the monkeypox virus (MPXV), has turned into a global concern, with over 59,000 infection cases and 23 deaths worldwide. Objectives: Herein, we aimed to exploit a robust immunoinformatics approach, targeting membrane-bound, enveloped, and extracellular proteins of MPXV to formulate a chimeric antigen. Such a strategy could similarly be applied for identifying immunodominant epitopes and designing multi-epitope vaccine ensembles in other pathogens responsible for chronic pathologies that are difficult to intervene against. Methods: A reverse vaccinology pipeline was used to select 11 potential vaccine candidates, which were screened and mapped to predict immunodominant B-cell and T-cell epitopes. The finalized epitopes were merged with the aid of suitable linkers, an adjuvant (Resuscitation-promoting factor), a PADRE sequence (13 aa), and an HIV TAT sequence (11 aa) to formulate a multivalent epitope vaccine. Bioinformatics tools were employed to carry out codon adaptation and computational cloning. The tertiary structure of the chimeric vaccine construct was modeled via I-TASSER, and its interaction with Toll-like receptor 4 (TLR4) was evaluated using molecular docking and molecular dynamics simulation. The C-ImmSim server was used to examine the immune response against the designed multi-epitope antigen. Results and discussion: The designed chimeric vaccine construct included 21 immunodominant epitopes (six B-cell, eight cytotoxic T lymphocyte, and seven helper T-lymphocyte) and is predicted to be non-allergenic, antigenic, and soluble, with suitable physicochemical features, and able to promote cross-protection among the MPXV strains. The selected epitopes indicated a wide global population coverage (93.62%). Most finalized epitopes have 70%-100% sequence similarity with the experimentally validated immune epitopes of the vaccinia virus, which can be helpful in the speedy progression of vaccine design.
Lastly, molecular docking and molecular dynamics simulation computed stable and energetically favourable interaction between the putative antigen and TLR4. Conclusion: Our results show that the multi-epitope vaccine might elicit cellular and humoral immune responses and could be a potential vaccine candidate against the MPXV infection. Further experimental testing of the proposed vaccine is warranted to validate its safety and efficacy profile.


Subjects
Monkeypox virus , Toll-Like Receptor 4 , Viral Vaccines , Epitopes, B-Lymphocyte , Immunodominant Epitopes/genetics , Molecular Docking Simulation , Vaccines, Combined , Viral Vaccines/immunology
16.
Eur Neuropsychopharmacol ; 69: 26-46, 2023 04.
Article in English | MEDLINE | ID: mdl-36706689

ABSTRACT

To study mental illness and health, in the past researchers have often broken down their complexity into individual subsystems (e.g., genomics, transcriptomics, proteomics, clinical data) and explored the components independently. Technological advancements and decreasing costs of high-throughput sequencing have led to an unprecedented increase in data generation. Furthermore, over the years it has become increasingly clear that these subsystems do not act in isolation but instead interact with each other to drive mental illness and health. Consequently, individual subsystems are now analysed jointly to promote a holistic understanding of the underlying biological complexity of health and disease. Complementing the increasing data availability, current research is geared towards developing novel methods that can efficiently combine the information-rich multi-omics data to discover biologically meaningful biomarkers for diagnosis, treatment, and prognosis. However, clinical translation of the research is still challenging. In this review, we summarise conventional and state-of-the-art statistical and machine learning approaches for biomarker discovery, diagnosis, and outcome and treatment response prediction through integrating multi-omics and clinical data. In addition, we describe the role of biological model systems and in silico multi-omics model designs in clinical translation of psychiatric research from bench to bedside. Finally, we discuss the current challenges and explore the application of multi-omics integration in future psychiatric research. The review provides a structured overview and latest updates in the field of multi-omics in psychiatry.


Subjects
Mental Disorders , Multiomics , Humans , Genomics , Proteomics/methods , Machine Learning , Mental Disorders/diagnosis , Mental Disorders/genetics , Mental Disorders/therapy
17.
Comput Biol Med ; 152: 106368, 2023 01.
Article in English | MEDLINE | ID: mdl-36481763

ABSTRACT

Despite the arsenal of existing cancer therapies, the ongoing recurrence and new cases of cancer pose a serious health concern that necessitates the development of new and effective treatments. Cancer immunotherapy, which uses the body's immune system to combat cancer, is a promising treatment option. As a result, in silico methods for identifying and characterizing tumor T cell antigens (TTCAs) would be useful for better understanding their functional mechanisms. Although a few computational methods for TTCA identification have been developed, their lack of model interpretability is a major drawback. Thus, developing computational methods for the effective identification and characterization of TTCAs is a critical endeavor. PSRTTCA, a new machine learning (ML)-based approach for improving the identification and characterization of TTCAs based on their primary sequences, is proposed in this study. Specifically, we introduce a new propensity score representation learning algorithm that allows one to generate various sets of propensity scores of amino acids, dipeptides, and g-gap dipeptides to be TTCAs. To enhance the predictive performance, optimal sets of variant propensity scores were determined and fed into the final meta-predictor (PSRTTCA). Benchmarking results revealed that PSRTTCA was a more precise and promising tool for the identification and characterization of TTCAs than conventional ML classifiers and existing methods. Furthermore, PSR-derived propensities of amino acids in becoming TTCAs are used to reveal the relationship between TTCAs and their informative physicochemical properties in order to provide insights into TTCA characteristics. Finally, a user-friendly online computational platform of PSRTTCA is publicly available at http://pmlabstack.pythonanywhere.com/PSRTTCA. The PSRTTCA predictor is anticipated to facilitate community-wide efforts in accelerating the discovery of novel TTCAs for cancer immunotherapy and other clinical applications.
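
As a rough illustration of the sequence features named in this abstract, the sketch below counts g-gap dipeptides (residue pairs separated by g intervening positions, so g = 0 gives ordinary dipeptides) and averages a propensity table over them. The abstract does not describe how PSRTTCA learns its scores, so the function names and the example score table here are hypothetical and stand in for whatever the authors actually use.

```python
from collections import Counter


def g_gap_dipeptides(seq: str, g: int = 0) -> Counter:
    """Count residue pairs separated by g intervening positions.

    g = 0 yields ordinary (adjacent) dipeptides.
    """
    step = g + 1
    return Counter(seq[i] + seq[i + step] for i in range(len(seq) - step))


def propensity_score(seq: str, scores: dict, g: int = 0) -> float:
    """Average a propensity table over all g-gap dipeptides in a sequence.

    `scores` is a hypothetical lookup table (pair -> propensity);
    pairs missing from the table contribute 0.
    """
    pairs = g_gap_dipeptides(seq, g)
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return sum(scores.get(p, 0.0) * n for p, n in pairs.items()) / total


# Hypothetical toy table, not taken from the paper.
toy_scores = {"AC": 300.0, "CD": 600.0}
print(propensity_score("ACDA", toy_scores, g=0))
```

A peptide would then be called positive when its score exceeds a threshold chosen on training data; that threshold is likewise an assumption of this sketch.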


Subjects
Amino Acids , Neoplasms , Humans , Propensity Score , Amino Acids/chemistry , Algorithms , Neoplasms/therapy , Dipeptides/chemistry , Dipeptides/metabolism , T-Lymphocytes/metabolism , Computational Biology/methods
18.
IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1211-1220, 2023.
Article in English | MEDLINE | ID: mdl-35576419

ABSTRACT

Drug Side-Effects (DSEs) have a high impact on public health, care system costs, and drug discovery processes. Predicting the probability of side-effects before they occur is fundamental to reducing this impact, in particular on drug discovery. Candidate molecules could be screened before undergoing clinical trials, reducing the costs in time, money, and health of the participants. Drug side-effects are triggered by complex biological processes involving many different entities, from drug structures to protein-protein interactions. To predict their occurrence, it is necessary to integrate data from heterogeneous sources. In this work, such heterogeneous data is integrated into a graph dataset, expressively representing the relational information between different entities, such as drug molecules and genes. The relational nature of the dataset represents an important novelty for drug side-effect predictors. Graph Neural Networks (GNNs) are exploited to predict DSEs on our dataset with very promising results. GNNs are deep learning models that can process graph-structured data with minimal information loss, and have been applied to a wide variety of biological tasks. Our experimental results confirm the advantage of using relationships between data entities, suggesting interesting future developments in this scope. The experimentation also shows the importance of specific subsets of data in determining associations between drugs and side-effects.
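
To make the idea of a relational graph dataset concrete, the following minimal sketch stores typed nodes (drugs, genes) with typed edges and performs one round of mean neighbourhood aggregation, the basic operation that GNN layers build on. All entity names, features, and the "targets" relation are hypothetical toy values, and the aggregation is a generic illustration rather than the model used in the paper.

```python
from collections import defaultdict

# Hypothetical toy entities and features; not taken from the paper's dataset.
node_features = {
    ("drug", "drug_A"): [1.0, 0.0],
    ("drug", "drug_B"): [0.0, 1.0],
    ("gene", "gene_X"): [0.5, 0.5],
}
edges = [
    (("drug", "drug_A"), "targets", ("gene", "gene_X")),
    (("drug", "drug_B"), "targets", ("gene", "gene_X")),
]


def build_adjacency(edge_list):
    """Map each node to its (relation, neighbour) pairs, treating edges as undirected."""
    adj = defaultdict(list)
    for src, rel, dst in edge_list:
        adj[src].append((rel, dst))
        adj[dst].append((rel, src))
    return adj


def aggregate(node, adj, feats):
    """One round of mean aggregation over a node's own features and its neighbours'."""
    vectors = [feats[node]] + [feats[nbr] for _, nbr in adj[node]]
    dim = len(feats[node])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


adj = build_adjacency(edges)
print(aggregate(("drug", "drug_A"), adj, node_features))
```

After aggregation, the representation of drug_A carries information from the gene it is linked to, which is how relational context can reach a downstream side-effect classifier.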


Subjects
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Discovery , Neural Networks, Computer , Probability , Research Design
19.
ACS Omega ; 7(45): 41082-41095, 2022 Nov 15.
Article in English | MEDLINE | ID: mdl-36406571

ABSTRACT

Antimalarial peptides (AMAPs) varying in length, amino acid composition, charge, conformational structure, hydrophobicity, and amphipathicity reflect their diversity in antimalarial mechanisms. Given the major worldwide health problem of antimicrobial resistance, these peptides possess great therapeutic value owing to their low incidence of drug resistance compared to conventional antibiotics. Although well-known experimental methods are able to precisely determine the antimalarial activity of peptides, these methods are still time-consuming and costly. Thus, machine learning (ML)-based methods that are capable of identifying AMAPs rapidly by using only sequence information would be beneficial for the high-throughput identification of AMAPs. In this study, we propose the first computational model (termed iAMAP-SCM) for the large-scale identification and characterization of peptides with antimalarial activity by using only sequence information. Specifically, we employed an interpretable scoring card method (SCM) to develop iAMAP-SCM and estimate propensities of 20 amino acids and 400 dipeptides to be AMAPs in a supervised manner. Experimental results showed that iAMAP-SCM could achieve a maximum accuracy and Matthews correlation coefficient of 0.957 and 0.834, respectively, on the independent test dataset. In addition, SCM-derived propensities of 20 amino acids and selected physicochemical properties were used to provide an understanding of the functional mechanisms of AMAPs. Finally, a user-friendly online computational platform of iAMAP-SCM is publicly available at http://pmlabstack.pythonanywhere.com/iAMAP-SCM. The iAMAP-SCM predictor is anticipated to assist experimental scientists in the high-throughput identification of potential AMAP candidates for the treatment of malaria and other clinical applications.
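
For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix, using the standard textbook definitions. The counts in the example are hypothetical and are not the paper's test-set results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when a row or column of the matrix is empty
    (the denominator would otherwise be zero).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only.
print(accuracy(45, 45, 5, 5), mcc(45, 45, 5, 5))
```

Unlike accuracy, MCC stays informative on imbalanced test sets, which is one reason peptide-classification studies commonly report both.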

20.
J Am Soc Nephrol ; 33(12): 2133-2140, 2022 12.
Article in English | MEDLINE | ID: mdl-36351761

ABSTRACT

Although still in its infancy, artificial intelligence (AI) analysis of kidney biopsy images is anticipated to become an integral aspect of renal histopathology. As these systems are developed, the focus will understandably be on developing ever more accurate models, but successful translation to the clinic will also depend upon other characteristics of the system. In the extreme, deployment of highly performant but "black box" AI is fraught with risk, and high-profile errors could damage future trust in the technology. Furthermore, a major factor determining whether new systems are adopted in clinical settings is whether they are "trusted" by clinicians. Key to unlocking trust will be designing platforms optimized for intuitive human-AI interactions and ensuring that, where judgment is required to resolve ambiguous areas of assessment, the workings of the AI image classifier are understandable to the human observer. Therefore, determining the optimal design for AI systems depends on factors beyond performance, with considerations of goals, interpretability, and safety constraining many design and engineering choices. In this article, we explore challenges that arise in the application of AI to renal histopathology, and consider areas where choices around model architecture, training strategy, and workflow design may be influenced by factors beyond the final performance metrics of the system.


Subjects
Artificial Intelligence , Trust , Humans , Kidney
...