Results 1 - 20 of 220
1.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947930

ABSTRACT

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design.
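The validity criterion in this abstract (a self-consistency TM-score of at least 0.45) can be illustrated with a minimal Python sketch. The function name `validity_fraction` and the example scores are hypothetical and are not taken from the RNA-FrameFlow codebase; the sketch only assumes that one self-consistency TM-score has already been computed per generated backbone.

```python
from typing import Sequence


def validity_fraction(sc_tm_scores: Sequence[float], threshold: float = 0.45) -> float:
    """Return the fraction of samples whose self-consistency TM-score is >= threshold."""
    if not sc_tm_scores:
        raise ValueError("at least one score is required")
    passed = sum(1 for score in sc_tm_scores if score >= threshold)
    return passed / len(sc_tm_scores)


# Hypothetical scores for five generated backbones (illustrative values only).
example_scores = [0.62, 0.31, 0.45, 0.12, 0.50]
print(validity_fraction(example_scores))  # three of the five scores pass
```

The threshold is a parameter so that the same helper can be reused with a stricter or looser cut-off.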

2.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947934

ABSTRACT

We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representations and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.

3.
Med Image Anal ; 97: 103252, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38963973

ABSTRACT

Histopathology image-based survival prediction aims to provide a precise assessment of cancer prognosis and can inform personalized treatment decision-making in order to improve patient outcomes. However, existing methods cannot automatically model the complex correlations between numerous morphologically diverse patches in each whole slide image (WSI), thereby preventing them from achieving a more profound understanding and inference of the patient status. To address this, here we propose a novel deep learning framework, termed dual-stream multi-dependency graph neural network (DM-GNN), to enable precise cancer patient survival analysis. Specifically, DM-GNN is structured with the feature updating and global analysis branches to better model each WSI as two graphs based on morphological affinity and global co-activating dependencies. As these two dependencies depict each WSI from distinct but complementary perspectives, the two designed branches of DM-GNN can jointly achieve the multi-view modeling of complex correlations between the patches. Moreover, DM-GNN is also capable of boosting the utilization of dependency information during graph construction by introducing the affinity-guided attention recalibration module as the readout function. This novel module offers increased robustness against feature perturbation, thereby ensuring more reliable and stable predictions. Extensive benchmarking experiments on five TCGA datasets demonstrate that DM-GNN outperforms other state-of-the-art methods and offers interpretable prediction insights based on the morphological depiction of high-attention patches. Overall, DM-GNN represents a powerful and auxiliary tool for personalized cancer prognosis from histopathology images and has great potential to assist clinicians in making personalized treatment decisions and improving patient outcomes.

4.
bioRxiv ; 2024 May 25.
Article in English | MEDLINE | ID: mdl-38826198

ABSTRACT

Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. [2010], gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent RNA polymerase ribozyme structure.
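The native sequence recovery figures quoted in this abstract (56% vs. 45%) refer to the share of positions where a designed sequence matches the native one. A minimal Python sketch of that metric follows; the function name and the two example sequences are hypothetical, and the sketch assumes both sequences have the same length and are already aligned position by position.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of positions at which the designed base equals the native base."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for n, d in zip(native, designed) if n == d)
    return matches / len(native)


# Hypothetical 8-nucleotide example: positions 4 and 8 differ.
print(sequence_recovery("GGCAUCGA", "GGCUUCGG"))  # 6 matches out of 8
```

A benchmark-level number such as the averages reported above would then be the mean of this value over all structures in the benchmark.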

5.
ArXiv ; 2024 May 25.
Article in English | MEDLINE | ID: mdl-38827456

ABSTRACT

Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. [2010], gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent RNA polymerase ribozyme structure. Open source code: https://github.com/chaitjo/geometric-rna-design.

6.
Sci Immunol ; 9(95): eade2094, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38787961

ABSTRACT

Immunotherapy advances have been hindered by difficulties in tracking the behaviors of lymphocytes after antigen signaling. Here, we assessed the behavior of T cells active within tumors through the development of the antigen receptor signaling reporter (AgRSR) mouse, fate-mapping lymphocytes responding to antigens at specific times and locations. Contrary to reports describing the ready egress of T cells out of the tumor, we find that intratumoral antigen signaling traps CD8+ T cells in the tumor. These clonal populations expand and become increasingly exhausted over time. By contrast, antigen-signaled regulatory T cell (Treg) clonal populations readily recirculate out of the tumor. Consequently, intratumoral antigen signaling acts as a gatekeeper to compartmentalize CD8+ T cell responses, even within the same clonotype, thus enabling exhausted T cells to remain confined to a specific tumor tissue site.


Subjects
CD8-Positive T-Lymphocytes; Signal Transduction; Animals; CD8-Positive T-Lymphocytes/immunology; Mice; Signal Transduction/immunology; Mice, Inbred C57BL; Mice, Transgenic; Antigens, Neoplasm/immunology; Neoplasms/immunology
7.
Sci Rep ; 14(1): 12548, 2024 05 31.
Article in English | MEDLINE | ID: mdl-38822012

ABSTRACT

Patient triage is crucial in emergency departments, ensuring timely and appropriate care based on a correct evaluation of the emergency grade of patient conditions. Triage is generally performed by a human operator based on their own experience and on information gathered during the patient management process, so it can generate errors in emergency-level assignments. Traditional triage methods rely heavily on human decisions, which can be subjective and prone to errors. A growing interest has recently been focused on leveraging artificial intelligence (AI) to develop algorithms that maximize information gathering and minimize errors in patient triage processing. We define and implement an AI-based module to manage patients' emergency code assignments in emergency departments. It uses historical data from the emergency department to train the medical decision-making process. Data containing relevant patient information, such as vital signs, symptoms, and medical history, are used to classify patients accurately into triage categories. Experimental results demonstrate that the proposed algorithm achieved high accuracy, outperforming traditional triage methods. By using the proposed method, we claim that healthcare professionals can predict a severity index to guide patient management processing and resource allocation.


Subjects
Algorithms; Emergency Service, Hospital; Neural Networks, Computer; Triage; Triage/methods; Humans; Artificial Intelligence; Clinical Decision-Making/methods
8.
Nat Comput Sci ; 4(5): 367-378, 2024 May.
Article in English | MEDLINE | ID: mdl-38730184

ABSTRACT

Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 µs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.


Subjects
Drug Discovery; Machine Learning; Molecular Dynamics Simulation; Proteins; Ligands; Drug Discovery/methods; Proteins/chemistry; Proteins/metabolism; Quantum Theory
9.
Nat Commun ; 15(1): 1517, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38409255

ABSTRACT

We investigate the potential of graph neural networks for transfer learning and improving molecular property prediction on sparse and expensive to acquire high-fidelity data by leveraging low-fidelity measurements as an inexpensive proxy for a targeted property of interest. This problem arises in discovery processes that rely on screening funnels for trading off the overall costs against throughput and accuracy. Typically, individual stages in these processes are loosely connected and each one generates data at different scale and fidelity. We consider this setup holistically and demonstrate empirically that existing transfer learning techniques for graph neural networks are generally unable to harness the information from multi-fidelity cascades. Here, we propose several effective transfer learning strategies and study them in transductive and inductive settings. Our analysis involves a collection of more than 28 million unique experimental protein-ligand interactions across 37 targets from drug discovery by high-throughput screening and 12 quantum properties from the dataset QMugs. The results indicate that transfer learning can improve the performance on sparse tasks by up to eight times while using an order of magnitude less high-fidelity training data. Moreover, the proposed methods consistently outperform existing transfer learning strategies for graph-structured data on drug discovery and quantum mechanics datasets.


Subjects
Drug Discovery; Learning; High-Throughput Screening Assays; Neural Networks, Computer; Machine Learning
10.
Clin Transl Allergy ; 13(11): e12306, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38006387

ABSTRACT

BACKGROUND: Not being well controlled by therapy with inhaled corticosteroids and long-acting β2 agonist bronchodilators is a major concern for severe-asthma patients. The current treatment option for these patients is the use of biologicals, such as the anti-IgE treatment omalizumab, as an add-on therapy. Despite the accepted use of omalizumab, patients do not always benefit from it. Therefore, there is a need to identify reliable biomarkers as predictors of omalizumab response. METHODS: Two novel computational algorithms, the machine-learning-based Recursive Ensemble Feature Selection (REFS) and the rule-based algorithm Logic Explainable Networks (LEN), were used on openly accessible mRNA expression data from moderate-to-severe asthma patients to identify genes as predictors of omalizumab response. RESULTS: With REFS, the number of features was reduced from 28,402 genes to 5 genes while obtaining a cross-validated accuracy of 0.975. The 5 responsiveness-predictive genes encode the following proteins: Coiled-coil domain-containing protein 113 (CCDC113), Solute Carrier Family 26 Member 8 (SLC26A8), Protein Phosphatase 1 Regulatory Subunit 3D (PPP1R3D), C-Type lectin Domain Family 4 member C (CLEC4C) and LOC100131780 (not annotated). The LEN algorithm found 4 of the same genes as REFS: CCDC113, SLC26A8, PPP1R3D and LOC100131780. Literature research showed that the 4 identified responsiveness-predicting genes are associated with mucosal immunity, cell metabolism, and airway remodeling. CONCLUSION AND CLINICAL RELEVANCE: Both computational methods show 4 identical genes as predictors of omalizumab response in moderate-to-severe asthma patients. The obtained high accuracy indicates that our approach has potential in clinical settings. Future studies in relevant cohort data should validate our computational approach.

11.
Commun Chem ; 6(1): 262, 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-38030692

ABSTRACT

Atom-centred neural networks represent the state-of-the-art for approximating the quantum chemical properties of molecules, such as internal energies. While the design of machine learning architectures that respect chemical principles has continued to advance, the final atom pooling operation that is necessary to convert from atomic to molecular representations in most models remains relatively undeveloped. The most common choices, sum and average pooling, compute molecular representations that are naturally a good fit for many physical properties, while satisfying properties such as permutation invariance which are desirable from a geometric deep learning perspective. However, there are growing concerns that such simplistic functions might have limited representational power, while also being suboptimal for physical properties that are highly localised or intensive. Based on recent advances in graph representation learning, we investigate the use of a learnable pooling function that leverages an attention mechanism to model interactions between atom representations. The proposed pooling operation is a drop-in replacement requiring no changes to any of the other architectural components. Using SchNet and DimeNet++ as starting models, we demonstrate consistent uplifts in performance compared to sum and mean pooling and a recent physics-aware pooling operation designed specifically for orbital energies, on several datasets, properties, and levels of theory, with up to 85% improvements depending on the specific task.
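The contrast between fixed pooling (sum, mean) and attention-based pooling described in this abstract can be made concrete with a small Python sketch. This is a simplified illustration only: it uses a single query vector with softmax weights, whereas the pooling operation in the paper models interactions between atom representations and is learned end to end. All names (`sum_pool`, `mean_pool`, `attention_pool`, `query`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sum_pool(atoms: Sequence[Vector]) -> Vector:
    """Sum atom representations feature by feature (permutation invariant)."""
    return [sum(column) for column in zip(*atoms)]


def mean_pool(atoms: Sequence[Vector]) -> Vector:
    """Average atom representations feature by feature (permutation invariant)."""
    return [total / len(atoms) for total in sum_pool(atoms)]


def attention_pool(atoms: Sequence[Vector], query: Vector) -> Vector:
    """Weighted average of atoms, with softmax weights from dot products with `query`.

    In a trained model `query` would be a learned parameter; here it is passed in.
    """
    scores = [sum(a * q for a, q in zip(atom, query)) for atom in atoms]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(query)
    return [sum(w * atom[i] for w, atom in zip(weights, atoms)) for i in range(dim)]


# Two hypothetical atom embeddings with two features each.
atoms = [[1.0, 2.0], [3.0, 4.0]]
print(sum_pool(atoms), mean_pool(atoms))
print(attention_pool(atoms, query=[1.0, 0.0]))  # weights favour the second atom
```

With a zero query, every atom receives the same weight and the result coincides with mean pooling, which shows why such a function can serve as a drop-in replacement for the fixed choices.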

12.
Life Sci ; 335: 122244, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37949208

ABSTRACT

High blood sugar and insulin insensitivity cause the lifelong chronic metabolic disease called Type 2 diabetes (T2D), which carries a higher chance of developing various malignancies. T2D with comorbidities such as cancers can make standard medication for those disorders more difficult. Comorbidities may be significantly correlated and may affect one another. These associations may be due to a number of direct and indirect mechanisms. The molecular mechanisms that underpin T2D and cancer are as yet unknown. However, the large volumes of data available on these diseases allowed us to use analytical tools for uncovering their interrelated pathways. Here, we present a system for investigating potential comorbidity relationships between T2D and cancer by looking at the molecular processes involved, analyzing a large number of freely accessible transcriptomic datasets of various disorders using bioinformatics. Using semantic similarity and gene set enrichment analysis, we created an informatics pipeline that evaluates and integrates Gene Ontology (GO), gene expression, and biological process data. We discovered genes that are common to T2D and cancer, along with molecular pathways and GO terms. We compared the top 200 Differentially Expressed Genes (DEGs) from each selected T2D and cancer dataset and found the most significant common genes. Among all the common genes, 13 were found most frequently. We also found 4 common GO terms: GO:0000003, GO:0000122, GO:0000165, and GO:0000278 among all the common GO terms between T2D and different cancers. Using these genes and GO term semantic similarity, we calculated the distance between these two diseases. The semantic similarity results of our study showed a higher association of Liver Cancer (LiC), Breast Cancer (BreC), Colorectal Cancer (CC), and Bladder Cancer (BlaC) with T2D. Furthermore, we found KIF4A, NUSAP1, CENPF, CCNB1, TOP2A, CCNB2, RRM2, HMMR, NDC80, NCAPG, and IGFBP5 to be common hub proteins among different cancers correlated to T2D. AGE-RAGE signaling pathway in diabetic complications, Osteoclast differentiation, TNF signaling pathway, IL-17 signaling pathway, p53 signaling pathway, MAPK signaling pathway, Human T-cell leukemia virus 1 infection, and Non-alcoholic fatty liver disease are the 8 most significant pathways found among 18 common pathways between T2D and selected cancers. As a result of our technique, we now know more about disease pathways that are critical between T2D and cancer.


Subjects
Diabetes Mellitus, Type 2; Liver Neoplasms; Humans; Diabetes Mellitus, Type 2/genetics; Liver Neoplasms/pathology; Gene Expression Profiling/methods; Transcriptome; Comorbidity; Computational Biology/methods; Kinesins/genetics
13.
Commun Med (Lond) ; 3(1): 139, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37803172

ABSTRACT

BACKGROUND: Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete samples. The focus of the machine learning researcher is to optimise the classifier's performance. METHODS: We utilise three simulated and three real-world clinical datasets with different feature types and missingness patterns. Initially, we evaluate how the downstream classifier performance depends on the choice of classifier and imputation methods. We employ ANOVA to quantitatively evaluate how the choice of missingness rate, imputation method, and classifier method influences the performance. Additionally, we compare commonly used methods for assessing imputation quality and introduce a class of discrepancy scores based on the sliced Wasserstein distance. We also assess the stability of the imputations and the interpretability of models built on the imputed data. RESULTS: The performance of the classifier is most affected by the percentage of missingness in the test data, with a considerable performance decline observed as the test missingness rate increases. We also show that the commonly used measures for assessing imputation quality tend to lead to imputed data which poorly matches the underlying data distribution, whereas our new class of discrepancy scores performs much better on this measure. Furthermore, we show that the interpretability of classifier models trained using poorly imputed data is compromised. CONCLUSIONS: It is imperative to consider the quality of the imputation when performing downstream classification as the effects on the classifier can be considerable.


Many artificial intelligence (AI) methods aim to classify samples of data into groups, e.g., patients with disease vs. those without. This often requires datasets to be complete, i.e., that all data has been collected for all samples. However, in clinical practice this is often not the case and some data can be missing. One solution is to 'complete' the dataset using a technique called imputation to replace those missing values. However, assessing how well the imputation method performs is challenging. In this work, we demonstrate why people should care about imputation, develop a new method for assessing imputation quality, and demonstrate that if we build AI models on poorly imputed data, the model can give different results to those we would hope for. Our findings may improve the utility and quality of AI models in the clinic.
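The sliced Wasserstein distance named in the abstract above compares two multivariate samples by projecting both onto random directions and averaging the one-dimensional Wasserstein distances of the projections. The Python sketch below shows that general idea only; it is not the paper's class of discrepancy scores. It assumes, for simplicity, that both samples have the same number of rows, and every name in it (`wasserstein_1d`, `sliced_wasserstein`, `n_projections`, `seed`) is hypothetical.

```python
import math
import random
from typing import List, Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-sized 1D samples (sort and match)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Draw a direction uniformly on the unit sphere via normalised Gaussians."""
    while True:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in vec))
        if norm > 0.0:
            return [c / norm for c in vec]


def sliced_wasserstein(
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    n_projections: int = 100,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate: average 1D distance over random projection directions."""
    rng = random.Random(seed)
    dim = len(a[0])
    total = 0.0
    for _ in range(n_projections):
        direction = random_unit_vector(dim, rng)
        proj_a = [sum(x * w for x, w in zip(row, direction)) for row in a]
        proj_b = [sum(x * w for x, w in zip(row, direction)) for row in b]
        total += wasserstein_1d(proj_a, proj_b)
    return total / n_projections


# Hypothetical "observed" rows and "imputed" rows shifted by one unit per feature.
observed = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.5], [3.0, 1.0]]
imputed = [[x + 1.0, y + 1.0] for x, y in observed]
print(sliced_wasserstein(observed, observed))  # identical samples give 0.0
print(sliced_wasserstein(observed, imputed) > 0.0)
```

Because the score compares whole distributions rather than individual imputed values, it can flag imputations that match pointwise error measures yet distort the shape of the data.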

14.
Nat Mach Intell ; 5(7): 739-753, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37771758

ABSTRACT

Integrating gene expression across tissues and cell types is crucial for understanding the coordinated biological mechanisms that drive disease and characterise homeostasis. However, traditional multitissue integration methods cannot handle uncollected tissues or rely on genotype information, which is often unavailable and subject to privacy concerns. Here we present HYFA (Hypergraph Factorisation), a parameter-efficient graph representation learning approach for joint imputation of multi-tissue and cell-type gene expression. HYFA is genotype-agnostic, supports a variable number of collected tissues per individual, and imposes strong inductive biases to leverage the shared regulatory architecture of tissues and genes. In performance comparison on Genotype-Tissue Expression project data, HYFA achieves superior performance over existing methods, especially when multiple reference tissues are available. The HYFA-imputed dataset can be used to identify replicable regulatory genetic variations (eQTLs), with substantial gains over the original incomplete dataset. HYFA can accelerate the effective and scalable integration of tissue and cell-type transcriptome biorepositories.

15.
Comput Methods Programs Biomed ; 241: 107733, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37572513

ABSTRACT

BACKGROUND AND OBJECTIVE: High-resolution histopathology whole slide images (WSIs) contain abundant valuable information for cancer prognosis. However, most computational pathology methods for survival prediction have weak interpretability and cannot explain the decision-making processes reasonably. To address this issue, we propose a highly interpretable neural network termed pattern-perceptive survival transformer (Surformer) for cancer survival prediction from WSIs. METHODS: Notably, Surformer can quantify specific histological patterns through bag-level labels without any patch/cell-level auxiliary information. Specifically, the proposed ratio-reserved cross-attention module (RRCA) generates global and local features with the learnable prototypes (pglobal, plocals) as detectors and quantifies the patches correlative to each plocal in the form of ratio factors (rfs). Afterward, multi-head self&cross-attention modules proceed with the computation for feature enhancement against noise. Eventually, the designed disentangling loss function guides multiple local features to focus on distinct patterns, thereby assisting rfs from RRCA in achieving more explicit histological feature quantification. RESULTS: Extensive experiments on five TCGA datasets illustrate that Surformer outperforms existing state-of-the-art methods. In addition, we highlight its interpretation by visualizing rfs distribution across high-risk and low-risk cohorts and retrieving and analyzing critical histological patterns contributing to the survival prediction. CONCLUSIONS: Surformer is expected to be exploited as a useful tool for performing histopathology image data-driven analysis and gaining new insights for interpreting the associations between such images and patient survival states.


Subjects
Neoplasms; Humans; Neoplasms/diagnostic imaging; Perception; Electric Power Supplies; Neural Networks, Computer; Research
16.
Commun Med (Lond) ; 3(1): 100, 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37474615

ABSTRACT

BACKGROUND: Identifying prediagnostic neurodegenerative disease is a critical issue in neurodegenerative disease research, and Alzheimer's disease (AD) in particular, to identify populations suitable for preventive and early disease-modifying trials. Evidence from genetic and other studies suggests the neurodegeneration of Alzheimer's disease measured by brain atrophy starts many years before diagnosis, but it is unclear whether these changes can be used to reliably detect prediagnostic sporadic disease. METHODS: We trained a Bayesian machine learning neural network model to generate a neuroimaging phenotype and AD score representing the probability of AD using structural MRI data in the Alzheimer's Disease Neuroimaging Initiative (ADNI) Cohort (cut-off 0.5, AUC 0.92, PPV 0.90, NPV 0.93). We go on to validate the model in an independent real-world dataset of the National Alzheimer's Coordinating Centre (AUC 0.74, PPV 0.65, NPV 0.80) and demonstrate the correlation of the AD-score with cognitive scores in those with an AD-score above 0.5. We then apply the model to a healthy population in the UK Biobank study to identify a cohort at risk for Alzheimer's disease. RESULTS: We show that the cohort with a neuroimaging Alzheimer's phenotype has a cognitive profile in keeping with Alzheimer's disease, with strong evidence for poorer fluid intelligence, and some evidence of poorer numeric memory, reaction time, working memory, and prospective memory. We found some evidence in the AD-score positive cohort for modifiable risk factors of hypertension and smoking. CONCLUSIONS: This approach demonstrates the feasibility of using AI methods to identify a potentially prediagnostic population at high risk for developing sporadic Alzheimer's disease.


Spotting people with dementia early is challenging, but important to identify people for trials of treatment and prevention. We used brain scans of people with Alzheimer's disease, the commonest type of dementia, and applied an artificial intelligence method to spot people with Alzheimer's disease. We used this to find people in the Healthy UK Biobank study who might have early Alzheimer's disease. The people we found had subtle changes in their memory and thinking to suggest they may have early disease, and we also found they had high blood pressure and smoked for longer. We have demonstrated an approach that could be used to select people at high risk of future dementia for clinical trials.
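The PPV and NPV values quoted in the abstract above are computed after thresholding the AD score at the stated cut-off of 0.5. A minimal Python sketch of that calculation follows. The function name, the example scores, and the labels are hypothetical; treating a score strictly above the cut-off as positive is an assumption, since the abstract does not say how a score of exactly 0.5 is handled.

```python
from typing import Sequence, Tuple


def ppv_npv(
    scores: Sequence[float], labels: Sequence[int], cutoff: float = 0.5
) -> Tuple[float, float]:
    """Return (PPV, NPV) after calling a sample positive when score > cutoff.

    `labels` holds 1 for a true case and 0 for a control.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    # Undefined ratios (no positive or no negative calls) are reported as NaN.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical scores for six people; labels mark who truly has the disease.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(ppv_npv(scores, labels))  # two of three positive calls and two of three negative calls are correct
```

Both values depend on disease prevalence in the evaluated cohort, which is one reason they can differ between a research cohort and a real-world dataset.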

17.
IEEE Trans Med Imaging ; 42(5): 1363-1373, 2023 05.
Article in English | MEDLINE | ID: mdl-37015608

ABSTRACT

Recent studies on multi-contrast MRI reconstruction have demonstrated the potential of further accelerating MRI acquisition by exploiting correlation between contrasts. Most of the state-of-the-art approaches have achieved improvement through the development of network architectures for fixed under-sampling patterns, without considering inter-contrast correlation in the under-sampling pattern design. On the other hand, sampling pattern learning methods have shown better reconstruction performance than those with fixed under-sampling patterns. However, most under-sampling pattern learning algorithms are designed for single contrast MRI without exploiting complementary information between contrasts. To this end, we propose a framework to optimize the under-sampling pattern of a target MRI contrast which complements the acquired fully-sampled reference contrast. Specifically, a novel image synthesis network is introduced to extract the redundant information contained in the reference contrast, which is exploited in the subsequent joint pattern optimization and reconstruction network. We have demonstrated superior performance of our learned under-sampling patterns on both public and in-house datasets, compared to the commonly used under-sampling patterns and state-of-the-art methods that jointly optimize the reconstruction network and the under-sampling patterns, up to 8-fold under-sampling factor.


Subjects
Algorithms; Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Upper Extremity
18.
Genes (Basel) ; 14(4), 2023 04 21.
Article in English | MEDLINE | ID: mdl-37107707

ABSTRACT

Operons represent one of the leading strategies of gene organization in prokaryotes, having a crucial influence on the regulation of gene expression and on bacterial chromosome organization. However, there is no consensus yet on why, how, and when operons are formed and conserved, and many different theories have been proposed. Histidine biosynthesis is a highly studied metabolic pathway, and many of the models suggested to explain operon origin and evolution can be applied to the histidine pathway, making this route an attractive model for the study of operon evolution. Indeed, the organization of his genes in operons can be due to a progressive clustering of biosynthetic genes during evolution, coupled with a horizontal transfer of these gene clusters. The necessity of physical interactions among the His enzymes could also have had a role in favoring gene closeness, of particular importance in extreme environmental conditions. In addition, the presence in this pathway of paralogous genes, heterodimeric enzymes and complex regulatory networks also supports other operon evolution hypotheses. It is possible that histidine biosynthesis, and in general all bacterial operons, may result from a mixture of several models, being shaped by different forces and mechanisms during evolution.


Subjects
Evolution, Molecular; Histidine; Histidine/genetics; Operon/genetics; Bacteria/genetics; Multigene Family
19.
J Chem Inf Model ; 63(9): 2667-2678, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37058588

ABSTRACT

High-throughput screening (HTS), as one of the key techniques in drug discovery, is frequently used to identify promising drug candidates in a largely automated and cost-effective way. One of the necessary conditions for successful HTS campaigns is a large and diverse compound library, enabling hundreds of thousands of activity measurements per project. Such collections of data hold great promise for computational and experimental drug discovery efforts, especially when leveraged in combination with modern deep learning techniques, and can potentially lead to improved drug activity predictions and cheaper and more effective experimental design. However, existing collections of machine-learning-ready public datasets do not exploit the multiple data modalities present in real-world HTS projects. Thus, the largest fraction of experimental measurements, corresponding to hundreds of thousands of "noisy" activity values from primary screening, are effectively ignored in the majority of machine learning models of HTS data. To address these limitations, we introduce Multifidelity PubChem BioAssay (MF-PCBA), a curated collection of 60 datasets that includes two data modalities for each dataset, corresponding to primary and confirmatory screening, an aspect that we call multifidelity. Multifidelity data accurately reflect real-world HTS conventions and present a new, challenging task for machine learning: the integration of low- and high-fidelity measurements through molecular representation learning, taking into account the orders-of-magnitude difference in size between the primary and confirmatory screens. Here we detail the steps taken to assemble MF-PCBA in terms of data acquisition from PubChem and the filtering steps required to curate the raw data. We also provide an evaluation of a recent deep-learning-based method for multifidelity integration across the introduced datasets, demonstrating the benefit of leveraging all HTS modalities, and a discussion in terms of the roughness of the molecular activity landscape. In total, MF-PCBA contains over 16.6 million unique molecule-protein interactions. The datasets can be easily assembled by using the source code available at https://github.com/davidbuterez/mf-pcba.


Subjects
Benchmarking; High-Throughput Screening Assays; High-Throughput Screening Assays/methods; Drug Discovery/methods; Machine Learning; Biological Assay
20.
Nat Methods ; 20(4): 569-579, 2023 04.
Article in English | MEDLINE | ID: mdl-36997816

ABSTRACT

The ability to quantify structural changes of the endoplasmic reticulum (ER) is crucial for understanding the structure and function of this organelle. However, the rapid movement and complex topology of ER networks make this challenging. Here, we construct a state-of-the-art semantic segmentation method that we call ERnet for the automatic classification of sheet and tubular ER domains inside individual cells. Data are skeletonized and represented by connectivity graphs, enabling precise and efficient quantification of network connectivity. ERnet generates metrics on topology and integrity of ER structures and quantifies structural change in response to genetic or metabolic manipulation. We validate ERnet using data obtained by various ER-imaging methods from different cell types as well as ground truth images of synthetic ER structures. ERnet can be deployed in an automatic high-throughput and unbiased fashion and identifies subtle changes in ER phenotypes that may inform on disease progression and response to therapy.


Subjects
Endoplasmic Reticulum; Semantics; Endoplasmic Reticulum/metabolism
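The ERnet abstract in this record describes skeletonized data represented as connectivity graphs from which topology metrics are derived. The Python sketch below illustrates the kind of quantity such a graph makes easy to compute: endpoints, junctions, and connected components. It is not ERnet's code or its metric set; the edge-list representation, the function name, and the example graph are all hypothetical, and the sketch assumes an undirected graph without self-loops.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def connectivity_metrics(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[str, int]:
    """Summarise an undirected skeleton graph given as pairs of node identifiers."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    unique_edges = set()
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
        unique_edges.add(frozenset((u, v)))

    # Degree 1 marks a tubule tip; degree 3 or more marks a branch point.
    degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
    endpoints = sum(1 for d in degrees.values() if d == 1)
    junctions = sum(1 for d in degrees.values() if d >= 3)

    # Count connected components with an iterative depth-first search.
    seen: Set[Hashable] = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

    return {
        "nodes": len(adjacency),
        "edges": len(unique_edges),
        "endpoints": endpoints,
        "junctions": junctions,
        "components": components,
    }


# Hypothetical skeleton: a three-way branch at "b" plus a detached fragment "e"-"f".
example_edges = [("a", "b"), ("b", "c"), ("b", "d"), ("e", "f")]
print(connectivity_metrics(example_edges))
```

Tracking how the junction and component counts change between conditions is one simple way to express a structural change as a number.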