Results 1 - 20 of 107
1.
Nat Methods ; 21(6): 1114-1121, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38594452

ABSTRACT

The identification of genetic and chemical perturbations with similar impacts on cell morphology can elucidate compounds' mechanisms of action or novel regulators of genetic pathways. Research on methods for identifying such similarities has lagged due to a lack of carefully designed and well-annotated image sets of cells treated with chemical and genetic perturbations. Here we create such a Resource dataset, CPJUMP1, in which each perturbed gene's product is a known target of at least two chemical compounds in the dataset. We systematically explore the directionality of correlations among perturbations that target the same protein encoded by a given gene, and we find that identifying matches between chemical and genetic perturbations is a challenging task. Our dataset and baseline analyses provide a benchmark for evaluating methods that measure perturbation similarities and impact, and more generally, learn effective representations of cellular state from microscopy images. Such advancements would accelerate the applications of image-based profiling of cellular states, such as uncovering drug mode of action or probing functional genomics.


Subjects
Image Processing, Computer-Assisted; Humans; Image Processing, Computer-Assisted/methods; Microscopy/methods
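
The perturbation matching described in the abstract above can be illustrated with a minimal sketch. It assumes each perturbation is summarized as a numeric profile vector and that cosine similarity is the comparison measure; the function name and the toy vectors are hypothetical and are not taken from the CPJUMP1 analysis code. Because the abstract notes that the direction of the correlation matters, the sketch returns the signed value and leaves it to the caller to decide whether to take the absolute value.

```python
import math


def cosine_similarity(u, v):
    """Signed cosine similarity between two equal-length profile vectors."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of features")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy profiles: a compound and a genetic perturbation of the
# same target may be positively or negatively correlated.
compound = [1.0, 2.0]
gene_knockout = [-1.0, -2.0]
signed = cosine_similarity(compound, gene_knockout)
unsigned = abs(signed)  # ignores directionality
```

A strongly negative signed value with a high absolute value would suggest that the two perturbations affect morphology in opposite directions.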
2.
Nat Methods ; 19(12): 1550-1557, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36344834

ABSTRACT

Cells can be perturbed by various chemical and genetic treatments and the impact on gene expression and morphology can be measured via transcriptomic profiling and image-based assays, respectively. The patterns observed in these high-dimensional profile data can power a dozen applications in drug discovery and basic biology research, but both types of profiles are rarely available for large-scale experiments. Here, we provide a collection of four datasets with both gene expression and morphological profile data useful for developing and testing multimodal methodologies. Roughly a thousand features are measured for each of the two data types, across more than 28,000 chemical and genetic perturbations. We define biological problems that use the shared and complementary information in these two data modalities, provide baseline analysis and evaluation metrics for multi-omic applications, and make the data resource publicly available ( https://broad.io/rosetta/ ).


Subjects
Drug Discovery; Gene Expression Profiling; Gene Expression Profiling/methods; Gene Expression
3.
Cancer Causes Control ; 35(3): 465-475, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37843701

ABSTRACT

INTRODUCTION: Brain metastasis (BM) is an aggressive complication with an extremely poor prognosis in patients with small-cell lung cancer (SCLC). A well-constructed prognostic model could help in providing timely survival consultation or optimizing treatments. METHODS: We analyzed clinical data from SCLC patients between 2000 and 2018 based on the Surveillance, Epidemiology, and End Results (SEER) database. We identified significant prognostic factors and integrated them using a multivariable Cox regression approach. Internal validation of the model was performed through a bootstrap resampling procedure. Model performance was evaluated based on the area under the curve (AUC) and calibration curve. RESULTS: A total of 2,454 SCLC patients' clinical data was collected from the database. It was determined that seven clinical parameters were associated with prognosis in SCLC patients with BM. A satisfactory level of discrimination was achieved by the predictive model, with 6-, 12-, and 18-month AUC values of 0.726, 0.707, and 0.737 in the training cohort; and 0.759, 0.742, and 0.744 in the validation cohort. As measured by survival rate probabilities, the calibration curve agreed well with actual observations. Furthermore, prognostic scores were found to significantly alter the survival curves of different risk groups. We then deployed the prognostic model onto a website server so that users can access it easily. CONCLUSIONS: In this study, a nomogram and a web-based predictor were developed to predict overall survival in SCLC patients with BM. It may assist physicians in making informed clinical decisions and determining the best treatment plan for each patient.


Subjects
Brain Neoplasms; Lung Neoplasms; Humans; Nomograms; Databases, Factual; Internet; Prognosis; SEER Program
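
The evaluation steps named in the abstract above (AUC and bootstrap resampling) can be sketched in simplified form. This is an illustration under stated assumptions, not the authors' procedure: it treats the outcome as binary at a single time point (for example, death by 12 months), keeps the risk score fixed, and reports a percentile interval, whereas a full internal validation would refit the Cox model on every resample. The function names are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(scores, labels, n_boot=200, seed=0):
    """Percentile 95% interval for the AUC from resampling with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            values.append(auc([scores[i] for i in idx], ys))
    values.sort()
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]
```

Calling `bootstrap_auc_interval(risk_scores, died_by_12_months)` on a cohort gives a rough sense of how stable the discrimination estimate is.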
4.
Chemphyschem ; 25(13): e202300953, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38396282

ABSTRACT

Chalcogenide perovskites are a class of materials with electronic and optoelectronic properties desirable for solar cells, infrared optics, and computing. The oxide counterparts of these chalcogenides have been studied extensively for their electrocatalytic and photoelectrochemical properties. As chalcogenide perovskites are more covalent, conductive, and stable, we hypothesize that they are more viable as electrocatalysts than oxide perovskites. The goal of this synthetic, experimental, and computational study is to examine the hydrogen evolution reaction (HER) activity of three Barium-based chalcogenides in perovskite and related structures: BaZrS3, BaTiS3, and BaVS3. Potential energy surfaces for hydrogen adsorption on surfaces of these materials are calculated using density functional theory and the computational hydrogen electrode model is used to contrast overpotentials with experiment. Although both experiments and computations agree that BaVS3 is the most active of the three materials, high overpotentials of these materials make them less viable than platinum for HER. Our work establishes a framework for future studies in the chemical and electrochemical properties of chalcogenide perovskites.

5.
J Chem Inf Model ; 64(4): 1172-1186, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38300851

ABSTRACT

Drug-induced cardiotoxicity (DICT) is a major concern in drug development, accounting for 10-14% of postmarket withdrawals. In this study, we explored the capabilities of chemical and biological data to predict cardiotoxicity, using the recently released DICTrank data set from the United States FDA. We found that such data, including protein targets, especially those related to ion channels (e.g., hERG), physicochemical properties (e.g., electrotopological state), and peak concentration in plasma offer strong predictive ability for DICT. Compounds annotated with mechanisms of action such as cyclooxygenase inhibition could distinguish between most-concern and no-concern DICT. Cell Painting features for ER stress discerned most-concern cardiotoxic from nontoxic compounds. Models based on physicochemical properties provided substantial predictive accuracy (AUCPR = 0.93). With the availability of omics data in the future, using biological data promises enhanced predictability and deeper mechanistic insights, paving the way for safer drug development. All models from this study are available at https://broad.io/DICTrank_Predictor.


Subjects
Cardiotoxicity; Drug Development; Humans; Cardiotoxicity/etiology; Cardiotoxicity/metabolism
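
The AUCPR metric quoted in the abstract above can be illustrated with a short sketch. It uses average precision, one common estimator of the area under the precision-recall curve; whether the study used this estimator or another (such as trapezoidal integration) is not stated in the abstract, so this is an assumption. The function name and toy inputs are hypothetical, and tied scores are ranked in input order rather than handled specially.

```python
def average_precision(scores, labels):
    """Average precision: mean of precision values at each true positive.

    scores: predicted probabilities of toxicity (higher = more concern)
    labels: 1 for a toxic compound, 0 for a non-toxic one
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank compounds from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos
```

Unlike ROC AUC, this score depends on class prevalence, so values are best compared against the fraction of positives in the same data set.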
6.
Nat Methods ; 17(2): 241, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31969730

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

7.
PLoS Comput Biol ; 18(2): e1009888, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35213530

ABSTRACT

A variational autoencoder (VAE) is a machine learning algorithm, useful for generating a compressed and interpretable latent space. These representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, standard vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated using other types of VAEs such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated these three VAE variants (vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.


Subjects
Machine Learning; Polypharmacology; Algorithms
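
The latent space arithmetic mentioned in the abstract above can be sketched as simple vector operations. The sketch assumes one common form of the idea: average the latent vectors of cells treated with a compound of mechanism A, do the same for mechanism B, add the two means, and subtract the mean of untreated controls. The exact formulation used in the paper is not given in the abstract, and the function names and toy vectors here are hypothetical. A trained VAE decoder (not shown) would then map the result back to a simulated profile.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def latent_space_arithmetic(z_a, z_b, z_control):
    """Estimate the latent vector of a compound with both mechanisms A and B.

    Computes mean(A) + mean(B) - mean(control), element by element.
    """
    a = mean_vector(z_a)
    b = mean_vector(z_b)
    c = mean_vector(z_control)
    return [x + y - w for x, y, w in zip(a, b, c)]


# Hypothetical two-dimensional latent vectors.
z_a = [[1.0, 0.0], [3.0, 0.0]]   # compound with mechanism A
z_b = [[0.0, 2.0]]               # compound with mechanism B
z_control = [[0.5, 0.5]]         # untreated control
z_ab = latent_space_arithmetic(z_a, z_b, z_control)
```

Subtracting the control mean keeps the shared baseline state from being counted twice when the two perturbation means are added.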
8.
Proc Natl Acad Sci U S A ; 117(35): 21381-21390, 2020 Sep 01.
Article in English | MEDLINE | ID: mdl-32839303

ABSTRACT

Stored red blood cells (RBCs) are needed for life-saving blood transfusions, but they undergo continuous degradation. RBC storage lesions are often assessed by microscopic examination or biochemical and biophysical assays, which are complex, time-consuming, and destructive to fragile cells. Here we demonstrate the use of label-free imaging flow cytometry and deep learning to characterize RBC lesions. Using brightfield images, a trained neural network achieved 76.7% agreement with experts in classifying seven clinically relevant RBC morphologies associated with storage lesions, comparable to 82.5% agreement between different experts. Given that human observation and classification may not optimally discern RBC quality, we went further and eliminated subjective human annotation in the training step by training a weakly supervised neural network using only storage duration times. The feature space extracted by this network revealed a chronological progression of morphological changes that better predicted blood quality, as measured by physiological hemolytic assay readouts, than the conventional expert-assessed morphology classification system. With further training and clinical testing across multiple sites, protocols, and instruments, deep learning and label-free imaging flow cytometry might be used to routinely and objectively assess RBC storage lesions. This would automate a complex protocol, minimize laboratory sample handling and preparation, and reduce the impact of procedural errors and discrepancies between facilities and blood donors. The chronology-based machine-learning approach may also improve upon humans' assessment of morphological changes in other biomedically important progressions, such as differentiation and metastasis.


Subjects
Blood Banks; Deep Learning; Erythrocytes/cytology; Humans
9.
Nat Methods ; 16(12): 1247-1253, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31636459

ABSTRACT

Segmenting the nuclei of cells in microscopy images is often the first step in the quantitative analysis of imaging data for biological and biomedical applications. Many bioimage analysis tools can segment nuclei in images but need to be selected and configured for every experiment. The 2018 Data Science Bowl attracted 3,891 teams worldwide to make the first attempt to build a segmentation method that could be applied to any two-dimensional light microscopy image of stained nuclei across experiments, with no human interaction. Top participants in the challenge succeeded in this task, developing deep-learning-based models that identified cell nuclei across many image types and experimental conditions without the need to manually adjust segmentation parameters. This represents an important step toward configuration-free bioimage analysis software tools.


Subjects
Cell Nucleus/ultrastructure; Image Processing, Computer-Assisted/methods; Data Science; Humans; Microscopy, Fluorescence/methods
10.
PLoS Biol ; 16(7): e2005970, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29969450

ABSTRACT

CellProfiler has enabled the scientific research community to create flexible, modular image analysis pipelines since its release in 2005. Here, we describe CellProfiler 3.0, a new version of the software supporting both whole-volume and plane-wise analysis of three-dimensional (3D) image stacks, increasingly common in biomedical research. CellProfiler's infrastructure is greatly improved, and we provide a protocol for cloud-based, large-scale image processing. New plugins enable running pretrained deep learning models on images. Designed by and for biologists, CellProfiler equips researchers with powerful computational tools via a well-documented user interface, empowering biologists in all fields to create quantitative, reproducible image analysis workflows.


Subjects
Image Processing, Computer-Assisted; Software; Animals; Cell Nucleus/metabolism; DNA/metabolism; Deep Learning; Humans; Imaging, Three-Dimensional; Induced Pluripotent Stem Cells/cytology; Induced Pluripotent Stem Cells/metabolism; Mice; RNA, Messenger/genetics; RNA, Messenger/metabolism
11.
Nat Methods ; 14(9): 849-863, 2017 Aug 31.
Article in English | MEDLINE | ID: mdl-28858338

ABSTRACT

Image-based cell profiling is a high-throughput strategy for the quantification of phenotypic differences among a variety of cell populations. It paves the way to studying biological systems on a large scale by using chemical and genetic perturbations. The general workflow for this technology involves image acquisition with high-throughput microscopy systems and subsequent image processing and analysis. Here, we introduce the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images. We recommend techniques that have proven useful in each stage of the data analysis process, on the basis of the experience of 20 laboratories worldwide that are refining their image-based cell-profiling methodologies in pursuit of biological discovery. The recommended techniques cover alternatives that may suit various biological goals, experimental designs, and laboratories' preferences.


Subjects
Cell Tracking/methods; High-Throughput Screening Assays/methods; Image Interpretation, Computer-Assisted/methods; Microscopy/methods; Pattern Recognition, Automated/methods; Tissue Array Analysis/methods; Algorithms; Animals; Data Interpretation, Statistical; Humans; Machine Learning
12.
Cytometry A ; 97(4): 407-414, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32091180

ABSTRACT

Acute lymphoblastic leukemia (ALL) is the most common childhood cancer. While there are a number of well-recognized prognostic biomarkers at diagnosis, the most powerful independent prognostic factor is the response of the leukemia to induction chemotherapy (Campana and Pui: Blood 129 (2017) 1913-1918). Given the potential for machine learning to improve precision medicine, we tested its capacity to monitor disease in children undergoing ALL treatment. Diagnostic and on-treatment bone marrow samples were labeled with an ALL-discriminating antibody combination and analyzed by imaging flow cytometry. Ignoring the fluorescent markers and using only features extracted from bright-field and dark-field cell images, a deep learning model was able to identify ALL cells at an accuracy of >88%. This antibody-free, single cell method is cheap, quick, and could be adapted to a simple, laser-free cytometer to allow automated, point-of-care testing to detect slow early responders. Adaptation to other types of leukemia is feasible, which would revolutionize residual disease monitoring. © 2020 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.


Subjects
Leukemia; Machine Learning; Child; Computers; Flow Cytometry; Humans; Leukemia/diagnosis; Neoplasm, Residual
13.
PLoS Pathog ; 13(5): e1006363, 2017 May.
Article in English | MEDLINE | ID: mdl-28505176

ABSTRACT

A key to the pathogenic success of Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis, is the capacity to survive within host macrophages. Although several factors required for this survival have been identified, these factors and how they work together to manipulate the host environment to benefit bacterial survival are not comprehensively understood. To systematically identify Mtb factors required for intracellular growth, we screened an arrayed, non-redundant Mtb transposon mutant library by high-content imaging to characterize the mutant-macrophage interaction. Based on a combination of imaging features, we identified mutants impaired for intracellular survival. We then characterized the phenotype of infection with each mutant by profiling the induced macrophage cytokine response. Taking a systems-level approach to understanding the biology of identified mutants, we performed a multiparametric analysis combining pathogen and host phenotypes to predict functional relationships between mutants based on clustering. Strikingly, mutants defective in two well-known virulence factors, the ESX-1 protein secretion system and the virulence lipid phthiocerol dimycocerosate (PDIM), clustered together. Building upon the shared phenotype of loss of the macrophage type I interferon (IFN) response to infection, we found that PDIM production and export are required for coordinated secretion of ESX-1-substrates, for phagosomal permeabilization, and for downstream induction of the type I IFN response. Multiparametric clustering also identified two novel genes that are required for PDIM production and induction of the type I IFN response. Thus, multiparametric analysis combining host and pathogen infection phenotypes can be used to identify novel functional relationships between genes that play a role in infection.


Subjects
Antigens, Bacterial/genetics; Bacterial Proteins/genetics; Mycobacterium tuberculosis/pathogenicity; Phagosomes/microbiology; Tuberculosis/microbiology; Animals; Antigens, Bacterial/metabolism; Bacterial Proteins/metabolism; Cell Line; Cytokines/immunology; Cytokines/metabolism; Gene Library; Host-Pathogen Interactions; Macrophages/immunology; Macrophages/microbiology; Mice; Mutation; Mycobacterium tuberculosis/genetics; Mycobacterium tuberculosis/growth & development; Mycobacterium tuberculosis/immunology; Phagosomes/immunology; Phenotype; Tuberculosis/immunology; Virulence
14.
Cytometry A ; 95(9): 952-965, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31313519

ABSTRACT

Identifying nuclei is often a critical first step in analyzing microscopy images of cells and classical image processing algorithms are most commonly used for this task. Recent developments in deep learning can yield superior accuracy, but typical evaluation metrics for nucleus segmentation do not satisfactorily capture error modes that are relevant in cellular images. We present an evaluation framework to measure accuracy, types of errors, and computational efficiency; and use it to compare deep learning strategies and classical approaches. We publicly release a set of 23,165 manually annotated nuclei and source code to reproduce experiments and run the proposed evaluation methodology. Our evaluation framework shows that deep learning improves accuracy and can reduce the number of biologically relevant errors by half. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.


Subjects
Cell Nucleus; Image Processing, Computer-Assisted/methods; Microscopy, Fluorescence/methods; Cell Line; Data Accuracy; Deep Learning; Fluorescence; Humans; Image Cytometry/methods
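
The object-level evaluation described in the abstract above can be illustrated with a small sketch. It assumes each nucleus is represented as a set of pixel coordinates, matches predicted nuclei to annotated ones by intersection over union (IoU), and reports an F1 score together with counts of missed and extra objects. This is a generic illustration, not the released evaluation code; the function names and the greedy first-match rule are assumptions, and other error types such as merges and splits are not modeled.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of pixels."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_threshold(truth, pred, threshold=0.5):
    """Return (F1, missed, extra) for object-level matching at an IoU cutoff.

    Each annotated object is matched greedily to the first unused predicted
    object whose IoU reaches the threshold.
    """
    used = set()
    true_pos = 0
    for t in truth:
        for j, p in enumerate(pred):
            if j not in used and iou(t, p) >= threshold:
                used.add(j)
                true_pos += 1
                break
    missed = len(truth) - true_pos   # annotated nuclei with no match
    extra = len(pred) - true_pos     # predicted objects with no match
    denom = 2 * true_pos + missed + extra
    f1 = 2 * true_pos / denom if denom else 1.0
    return f1, missed, extra


# Hypothetical toy example: one nucleus found, one missed, one spurious.
truth = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
score, missed, extra = f1_at_threshold(truth, pred)
```

Reporting missed and extra objects separately, rather than only a pixel-level score, is what lets an evaluation distinguish the kinds of errors that matter biologically.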
15.
Bioinformatics ; 32(20): 3210-3212, 2016 Oct 15.
Article in English | MEDLINE | ID: mdl-27354701

ABSTRACT

CellProfiler Analyst allows the exploration and visualization of image-based data, together with the classification of complex biological phenotypes, via an interactive user interface designed for biologists and data scientists. CellProfiler Analyst 2.0, completely rewritten in Python, builds on these features and adds enhanced supervised machine learning capabilities (Classifier), as well as visualization tools to overview an experiment (Plate Viewer and Image Gallery). AVAILABILITY AND IMPLEMENTATION: CellProfiler Analyst 2.0 is free and open source, available at http://www.cellprofiler.org and from GitHub (https://github.com/CellProfiler/CellProfiler-Analyst) under the BSD license. It is available as a packaged application for Mac OS X and Microsoft Windows and can be compiled for Linux. We implemented an automatic build process that supports nightly updates and regular release cycles for the software. CONTACT: anne@broadinstitute.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Phenotype; Software; Animals; Datasets as Topic; Humans
16.
BMC Bioinformatics ; 17: 51, 2016 Jan 27.
Article in English | MEDLINE | ID: mdl-26817459

ABSTRACT

BACKGROUND: Automated classification using machine learning often relies on features derived from segmenting individual objects, which can be difficult to automate. WND-CHARM is a previously developed classification algorithm in which features are computed on the whole image, thereby avoiding the need for segmentation. The algorithm obtained encouraging results but requires considerable computational expertise to execute. Furthermore, some benchmark sets have been shown to be subject to confounding artifacts that overestimate classification accuracy. RESULTS: We developed CP-CHARM, a user-friendly image-based classification algorithm inspired by WND-CHARM in (i) its ability to capture a wide variety of morphological aspects of the image, and (ii) the absence of requirement for segmentation. In order to make such an image-based classification method easily accessible to the biological research community, CP-CHARM relies on the widely-used open-source image analysis software CellProfiler for feature extraction. To validate our method, we reproduced WND-CHARM's results and ensured that CP-CHARM obtained comparable performance. We then successfully applied our approach on cell-based assay data and on tissue images. We designed these new training and test sets to reduce the effect of batch-related artifacts. CONCLUSIONS: The proposed method preserves the strengths of WND-CHARM - it extracts a wide variety of morphological features directly on whole images thereby avoiding the need for cell segmentation, but additionally, it makes the methods easily accessible for researchers without computational expertise by implementing them as a CellProfiler pipeline. It has been demonstrated to perform well on a wide range of bioimage classification problems, including on new datasets that have been carefully selected and annotated to minimize batch effects. This provides for the first time a realistic and reliable assessment of the whole image classification strategy.


Subjects
Algorithms; Cells/cytology; Computational Biology/methods; Image Interpretation, Computer-Assisted/methods; Microscopy, Fluorescence/methods; Artifacts; Humans; Pattern Recognition, Automated/methods; Software
17.
World J Microbiol Biotechnol ; 31(6): 951-8, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25838197

ABSTRACT

Knowing the conditions that favor a particular species with efficient degradative capabilities is very useful in wastewater treatment processes. Paracoccus sp. is known to efficiently reduce nitrogen oxides (NOx) owing to its branched denitrification pathway. Individual-based simulations showed that the fitness of Paracoccus sp. relative to Pseudomonas sp. increased significantly at nitrate levels above 5 mM. The spatial structure of the biofilm showed substantially lower nitrite levels in areas dominated by Paracoccus sp. The simulation was validated in a laboratory reactor harboring a biofilm community by fluorescent in situ hybridization, which showed that increasing nitrate levels enhanced the abundance of Paracoccus sp. Different levels of NOx did not have any significant effect on biofilm formation by Paracoccus sp., unlike in several other bacteria. This study shows that the ability of Paracoccus sp. to tolerate and efficiently reduce NOx confers a fitness payoff on the organism at high concentrations of nitrate in a multispecies biofilm community.


Subjects
Biofilms/drug effects; Biofilms/growth & development; Nitrates/metabolism; Nitrogen Oxides/metabolism; Paracoccus/isolation & purification; Paracoccus/physiology; Microbial Consortia/drug effects; Oxidation-Reduction; Paracoccus/metabolism
18.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-37745478

ABSTRACT

High-throughput image-based profiling platforms are powerful technologies capable of collecting data from billions of cells exposed to thousands of perturbations in a time- and cost-effective manner. Therefore, image-based profiling data has been increasingly used for diverse biological applications, such as predicting drug mechanism of action or gene function. However, batch effects pose severe limitations to community-wide efforts to integrate and interpret image-based profiling data collected across different laboratories and equipment. To address this problem, we benchmarked seven high-performing scRNA-seq batch correction techniques, representing diverse approaches, using a newly released Cell Painting dataset, the largest publicly accessible image-based dataset. We focused on five different scenarios with varying complexity, and we found that Harmony, a mixture-model based method, consistently outperformed the other tested methods. Our proposed framework, benchmark, and metrics can additionally be used to assess new batch correction methods in the future. Overall, this work paves the way for improvements that allow the community to make best use of public Cell Painting data for scientific discovery.

19.
Spectrochim Acta A Mol Biomol Spectrosc ; 312: 124049, 2024 May 05.
Article in English | MEDLINE | ID: mdl-38394884

ABSTRACT

Gasoline and diesel are the main petroleum products used for road transportation in India. For this reason, fraudsters may adulterate them with miscible substances such as kerosene, turpentine, thinner, and ethanol. In this work, Fourier transform infrared spectroscopy (FTIR) coupled with principal component analysis (PCA) and partial least square regression (PLSR) was used to investigate adulteration in petroleum products and to design an adulterant profiling approach. ATR-FTIR has an advantage over other traditional methods as it is less time-consuming and needs no extraction procedure. The samples used for the study were prepared by adding different volumes of adulterant (0-20%) to standard diesel and gasoline samples. According to the results obtained from this study, ATR-FTIR spectroscopy proved to be the most comprehensible method for the detection of adulteration in diesel and gasoline fuels. Furthermore, FTIR spectroscopy combined with PCA gave the best separation of adulterated samples. The predictive model achieved a root mean square error of prediction of 0.477% and 0.592% for diesel and gasoline, respectively.
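
The root mean square error of prediction (RMSEP) reported in the abstract above follows the standard definition, which can be sketched as below. The function name and the toy adulterant percentages are hypothetical and are not data from the study.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction, in the units of the inputs.

    y_true: reference adulterant levels (for example, % v/v)
    y_pred: levels predicted by the calibration model for the same samples
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


# Hypothetical test set: true versus predicted adulterant percentage.
error = rmsep([0.0, 10.0, 20.0], [1.0, 9.0, 20.0])
```

Because the result is expressed in the same units as the adulterant level, an RMSEP of about 0.5% can be read directly against the 0-20% range of the prepared samples.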

20.
ArXiv ; 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38947938

ABSTRACT

Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, plus zero-shot molecule-morphology matching.
