Results 1 - 20 of 93
1.
Drug Discov Today ; : 104024, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38759948

ABSTRACT

3D structure-based drug design (SBDD) is considered a challenging and rational way for innovative drug discovery. Geometric deep learning is a promising approach that solves the accurate model training of 3D SBDD through building neural network models to learn non-Euclidean data, such as 3D molecular graphs and manifold data. Here, we summarize geometric deep learning methods and applications that contain 3D molecular representations, equivariant graph neural networks (EGNNs), and six generative model methods [diffusion model, flow-based model, generative adversarial networks (GANs), variational autoencoder (VAE), autoregressive models, and energy-based models]. Our review provides insights into geometric deep learning methods and advanced applications of 3D SBDD that will be of relevance for the drug discovery community.

2.
Article in English | MEDLINE | ID: mdl-38691432

ABSTRACT

Learning with noisy labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have a "small loss." However, this assumption often fails to generalize to some real-world cases with imbalanced subpopulations, that is, training subpopulations that vary in sample size or recognition difficulty. Therefore, recent LNL methods face the risk of misclassifying those "informative" samples (e.g., hard samples or samples in the tail subpopulations) into noisy samples, leading to poor generalization performance. To address this issue, we propose a novel LNL method to deal with noisy labels and imbalanced subpopulations simultaneously. It first leverages sample correlation to estimate samples' clean probabilities for label correction and then utilizes corrected labels for distributionally robust optimization (DRO) to further improve the robustness. Specifically, in contrast to previous works using classification loss as the selection criterion, we introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities. Then, we refurbish the noisy labels using the estimated clean probabilities and the pseudo-labels from the model's predictions. With refurbished labels, we use DRO to train the model to be robust to subpopulation imbalance. Extensive experiments on a wide range of benchmarks demonstrate that our technique can consistently improve state-of-the-art (SOTA) robust learning paradigms against noisy labels, especially when encountering imbalanced subpopulations. We provide our code in https://github.com/chenmc1996/LNL-IS.
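
The label refurbishment step mentioned in the abstract above can be illustrated with a small sketch. The abstract does not give the formula, so the convex combination below is an assumption (a commonly used form), and the function name `refurbish_label` is hypothetical. It only shows how an estimated clean probability can decide how much weight the given label keeps against the model's pseudo-label.

```python
def refurbish_label(noisy_onehot, model_probs, clean_prob):
    """Blend a given (possibly noisy) label with the model's prediction.

    clean_prob is the estimated probability that the given label is correct.
    The convex combination is an assumed form, not taken from the paper.
    """
    if not 0.0 <= clean_prob <= 1.0:
        raise ValueError("clean_prob must lie in [0, 1]")
    if len(noisy_onehot) != len(model_probs):
        raise ValueError("label and prediction must have the same length")
    return [clean_prob * y + (1.0 - clean_prob) * p
            for y, p in zip(noisy_onehot, model_probs)]


given = [1.0, 0.0, 0.0]        # annotated as class 0
predicted = [0.1, 0.8, 0.1]    # model favours class 1

trusted = refurbish_label(given, predicted, clean_prob=1.0)
doubted = refurbish_label(given, predicted, clean_prob=0.25)
print(trusted)  # the given label is kept unchanged
print(doubted)  # most of the probability mass moves to the model's prediction
```

With a high clean probability the annotated label survives; with a low one the refurbished target leans toward the pseudo-label, which is the behaviour the abstract describes in words.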

3.
Neural Netw ; 176: 106328, 2024 Apr 21.
Article in English | MEDLINE | ID: mdl-38688067

ABSTRACT

Given a graph G, the network collapse problem (NCP) selects a vertex subset S of minimum cardinality from G such that the difference in the values of a given measure function f(G)-f(G∖S) is greater than a predefined collapse threshold. Many graph analytic applications can be formulated as NCPs with different measure functions, which often pose a significant challenge due to their NP-hard nature. As a result, traditional greedy algorithms, which select the vertex with the highest reward at each step, may not effectively find the optimal solution. In addition, existing learning-based algorithms do not have the ability to model the sequence of actions taken during the decision-making process, making it difficult to capture the combinatorial effect of selected vertices on the final solution. This limits the performance of learning-based approaches in non-submodular NCPs. To address these limitations, we propose a unified framework called DT-NC, which adapts the Decision Transformer to the Network Collapse problems. DT-NC takes into account the historical actions taken during the decision-making process and effectively captures the combinatorial effect of selected vertices. The ability of DT-NC to model the dependency among selected vertices allows it to address the difficulties caused by the non-submodular property of measure functions in some NCPs effectively. Through extensive experiments on various NCPs and graphs of different sizes, we demonstrate that DT-NC outperforms the state-of-the-art methods and exhibits excellent transferability and generalizability.
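
For orientation, the greedy baseline that the abstract above contrasts with DT-NC can be sketched as follows. This is not DT-NC itself. The measure function (size of the largest connected component), the adjacency-dict graph format, and the names `largest_component_size` and `greedy_collapse` are all assumptions chosen for illustration.

```python
def largest_component_size(adj, removed):
    """f(G minus S): size of the largest connected component once `removed` is deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


def greedy_collapse(adj, threshold, measure=largest_component_size):
    """Grow S one vertex at a time until f(G) - f(G minus S) exceeds `threshold`."""
    base = measure(adj, set())
    chosen = []
    while base - measure(adj, set(chosen)) <= threshold:
        candidates = [v for v in adj if v not in chosen]
        if not candidates:
            return None  # the threshold cannot be reached on this graph
        # Highest immediate reward = largest drop in the measure.
        best = min(candidates, key=lambda v: measure(adj, set(chosen) | {v}))
        chosen.append(best)
    return chosen


# Path graph 0-1-2-3-4: removing the middle vertex splits it into two pairs.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_collapse(path, threshold=2))
```

Each step looks only at the immediate drop in the measure, so the combined effect of several vertices is never evaluated jointly. That is the limitation the abstract attributes to greedy methods on non-submodular measures.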

4.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38426338

ABSTRACT

MOTIVATION: Retrosynthesis is a critical task in drug discovery, aimed at finding a viable pathway for synthesizing a given target molecule. Many existing approaches frame this task as a graph-generating problem. Specifically, these methods first identify the reaction center, and break a targeted molecule accordingly to generate the synthons. Reactants are generated by either adding atoms sequentially to synthon graphs or by directly adding appropriate leaving groups. However, both of these strategies have limitations. Adding atoms results in a long prediction sequence that increases the complexity of generation, while adding leaving groups only considers those in the training set, which leads to poor generalization. RESULTS: In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Given that chemically meaningful motifs fall between the size of atoms and leaving groups, our model achieves lower prediction complexity than adding atoms and performs better than adding leaving groups. We evaluate our proposed model on a benchmark dataset and show that it significantly outperforms previous state-of-the-art models. Furthermore, we conduct ablation studies to investigate the contribution of each component of our proposed model to the overall performance on benchmark datasets. Experimental results demonstrate the effectiveness of our model in predicting retrosynthesis pathways and suggest its potential as a valuable tool in drug discovery. AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/szu-ljh2020/MARS.


Subjects
Benchmarking , Drug Discovery , Reading Frames
5.
J Comput Biol ; 31(3): 213-228, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38531049

ABSTRACT

Molecular prediction tasks normally demand a series of professional experiments to label the target molecule, which leads to a shortage of labeled data. One of the semisupervised learning paradigms, known as self-training, utilizes both labeled and unlabeled data. Specifically, a teacher model is trained using labeled data and produces pseudo labels for unlabeled data. These labeled and pseudo-labeled data are then jointly used to train a student model. However, the pseudo labels generated from the teacher model are generally not sufficiently accurate. Thus, we propose a robust self-training strategy by exploring robust loss functions to handle such noisy labels in two paradigms, that is, generic and adaptive. We have conducted experiments on three molecular biology prediction tasks with four backbone models to gradually evaluate the performance of the proposed robust self-training strategy. The results demonstrate that the proposed method enhances prediction performance across all tasks, notably within molecular regression tasks, where there has been an average enhancement of 41.5%. Furthermore, the visualization analysis confirms the superiority of our method. Our proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It tackles the shortage of labeled data in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily embedded in any prediction task, serving as a universal approach for the bioinformatics community.
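
To make the idea of a robust loss for noisy pseudo labels concrete, here is a minimal sketch for the regression case. The abstract does not say which robust losses the authors use, so the Huber loss is a stand-in chosen for illustration, and `delta` is an assumed hyperparameter.

```python
def half_squared_error(pred, target):
    """Standard regression loss: grows quadratically with the residual."""
    return 0.5 * (pred - target) ** 2


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small residuals, linear beyond `delta`."""
    r = abs(pred - target)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


# A student prediction compared against an accurate and a badly wrong pseudo label.
for pseudo_label in (0.5, 10.0):
    print(pseudo_label,
          half_squared_error(0.0, pseudo_label),
          huber(0.0, pseudo_label))
```

For small residuals the two losses agree. For a badly wrong pseudo label the Huber penalty grows only linearly, so a single noisy teacher output pulls the student less strongly.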


Subjects
Computational Biology , Molecular Biology , Humans , Supervised Machine Learning
6.
bioRxiv ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38105939

ABSTRACT

Profiling the binding of T cell receptors (TCRs) of T cells to antigenic peptides presented by MHC proteins is one of the most important unsolved problems in modern immunology. Experimental methods to probe TCR-antigen interactions are slow, labor-intensive, costly, and yield moderate throughput. To address this problem, we developed pMTnet-omni, an Artificial Intelligence (AI) system based on hybrid protein sequence and structure information, to predict the pairing of TCRs of αβ T cells with peptide-MHC complexes (pMHCs). pMTnet-omni is capable of handling peptides presented by both class I and II pMHCs, and capable of handling both human and mouse TCR-pMHC pairs, through information sharing enabled by this hybrid design. pMTnet-omni achieves a high overall Area Under the Curve of Receiver Operator Characteristics (AUROC) of 0.888, which surpasses competing tools by a large margin. We showed that pMTnet-omni can distinguish binding affinity of TCRs with similar sequences. Across a range of datasets from various biological contexts, pMTnet-omni characterized the longitudinal evolution and spatial heterogeneity of TCR-pMHC interactions and their functional impact. We successfully developed a biomarker based on pMTnet-omni for predicting immune-related adverse events of immune checkpoint inhibitor (ICI) treatment in a cohort of 57 ICI-treated patients. pMTnet-omni represents a major advance towards developing a clinically usable AI system for TCR-pMHC pairing prediction that can aid the design and implementation of TCR-based immunotherapeutics.

7.
Biomed Phys Eng Express ; 9(6)2023 09 12.
Article in English | MEDLINE | ID: mdl-37604139

ABSTRACT

Electrocardiogram (ECG)-gated multi-phase computed tomography angiography (MP-CTA) is frequently used for diagnosis of coronary artery disease. Radiation dose may become a potential concern as the scan needs to cover a wide range of cardiac phases during a heart cycle. A common method to reduce radiation is to limit the full-dose acquisition to a predefined range of phases while reducing the radiation dose for the rest. Our goal in this study is to develop a spatiotemporal deep learning method to enhance the quality of low-dose CTA images at phases acquired at reduced radiation dose. Recently, we demonstrated that a deep learning method, Cycle-Consistent generative adversarial networks (CycleGAN), could effectively denoise low-dose CT images through spatial image translation without labeled image pairs in both low-dose and full-dose image domains. As CycleGAN does not utilize the temporal information in its denoising mechanism, we propose to use RecycleGAN, which could translate a series of images ordered in time from the low-dose domain to the full-dose domain through an additional recurrent network. To evaluate RecycleGAN, we use the XCAT phantom program, a highly realistic simulation tool based on real patient data, to generate MP-CTA image sequences for 18 patients (14 for training, 2 for validation and 2 for test). Our simulation results show that RecycleGAN can achieve better denoising performance than CycleGAN based on both visual inspection and quantitative metrics. We further demonstrate the superior denoising performance of RecycleGAN using clinical MP-CTA images from 50 patients.


Subjects
Computed Tomography Angiography , Tomography, X-Ray Computed , Humans , Heart/diagnostic imaging , Angiography , Benchmarking
8.
Chem Res Toxicol ; 36(8): 1206-1226, 2023 08 21.
Article in English | MEDLINE | ID: mdl-37562046

ABSTRACT

The development of new drugs is time-consuming and expensive, and as such, accurately predicting the potential toxicity of a drug candidate is crucial in ensuring its safety and efficacy. Recently, deep graph learning has become prevalent in this field due to its computational power and cost efficiency. Many novel deep graph learning methods aid toxicity prediction and further prompt drug development. This review aims to connect fundamental knowledge with burgeoning deep graph learning methods. We first summarize the essential components of deep graph learning models for toxicity prediction, including molecular descriptors, molecular representations, evaluation metrics, validation methods, and data sets. Furthermore, based on various graph-related representations of molecules, we introduce several representative studies and methods for toxicity prediction from the perspective of GNN architectures and graph pretrained models. Compared to other types of models, deep graph models not only advance in higher accuracy and efficiency but also provide more intuitive insights, which is significant in the development of model interpretation and generalization ability. The graph pretrained models are emerging as they can extract prominent features from large-scale unlabeled molecular graph data and improve the performance of downstream toxicity prediction tasks. We hope this survey can serve as a handbook for individuals interested in exploring deep graph learning for toxicity prediction.


Subjects
Drug Development , Pharmaceutical Preparations , Drug-Related Side Effects and Adverse Reactions
9.
Article in English | MEDLINE | ID: mdl-37494169

ABSTRACT

It has been discovered that graph convolutional networks (GCNs) encounter a remarkable drop in performance when multiple layers are piled up. The main factor that accounts for why deep GCNs fail lies in oversmoothing, which isolates the network output from the input with the increase of network depth, weakening expressivity and trainability. In this article, we start by investigating refined measures upon DropEdge, an existing simple yet effective technique to relieve oversmoothing. We term our method DropEdge++ for its two structure-aware samplers in contrast to DropEdge: layer-dependent (LD) sampler and feature-dependent (FD) sampler. Regarding the LD sampler, we interestingly find that increasingly sampling edges from the bottom layer yields better performance than the decreasing counterpart as well as DropEdge. We theoretically explain this phenomenon with mean-edge-number (MEN), a metric closely related to oversmoothing. For the FD sampler, we associate the edge sampling probability with the feature similarity of node pairs and prove that it further correlates the convergence subspace of the output layer with the input features. Extensive experiments on several node classification benchmarks, including both full- and semi-supervised tasks, illustrate the efficacy of DropEdge++ and its compatibility with a variety of backbones by achieving generally better performance over DropEdge and the no-drop version.
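
The edge sampling idea behind DropEdge, and the layer-dependent variant described in the abstract above, can be sketched in a few lines. This is a simplified illustration under assumptions: edges are sampled uniformly and independently for each layer, and the per-layer keep ratios and the function names are hypothetical. The actual LD and FD samplers of DropEdge++ may differ in ways the abstract does not describe.

```python
import random


def drop_edges(edges, keep_ratio, rng):
    """DropEdge-style sampling: keep a uniformly random subset of the edges."""
    k = round(keep_ratio * len(edges))
    return rng.sample(edges, k)


def layerwise_edge_sets(edges, keep_ratios, seed=0):
    """Draw one sampled edge list per layer, bottom layer first.

    Increasing keep_ratios mimic "increasingly sampling edges from the
    bottom layer"; the concrete ratios are an assumption.
    """
    rng = random.Random(seed)
    return [drop_edges(edges, ratio, rng) for ratio in keep_ratios]


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (0, 4)]
per_layer = layerwise_edge_sets(edges, keep_ratios=[0.25, 0.5, 0.75])
for depth, kept in enumerate(per_layer):
    print(depth, len(kept), sorted(kept))
```

In a GCN, each layer would then propagate over its own sampled edge list rather than the full graph.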

10.
Front Big Data ; 6: 1108659, 2023.
Article in English | MEDLINE | ID: mdl-36936996

ABSTRACT

The accurate segmentation of nuclei is crucial for cancer diagnosis and further clinical treatments. To successfully train a nuclei segmentation network in a fully-supervised manner for a particular type of organ or cancer, we need a dataset with ground-truth annotations. However, such well-annotated nuclei segmentation datasets are highly rare, and manually labeling an unannotated dataset is an expensive, time-consuming, and tedious process. Consequently, we need a way to train the nuclei segmentation network with an unlabeled dataset. In this paper, we propose a model named NuSegUDA for nuclei segmentation on the unlabeled dataset (target domain). It is achieved by applying an Unsupervised Domain Adaptation (UDA) technique with the help of another labeled dataset (source domain) that may come from a different type of organ, cancer, or source. We apply the UDA technique in both the feature space and the output space. We additionally utilize a reconstruction network and incorporate adversarial learning into it so that the source-domain images can be accurately translated to the target domain for further training of the segmentation network. We validate our proposed NuSegUDA on two public nuclei segmentation datasets, and obtain significant improvement as compared with the baseline methods. Extensive experiments also verify the contribution of the newly proposed image reconstruction adversarial loss and target-translated source supervised loss to the performance boost of NuSegUDA. Finally, considering the scenario when we have a small number of annotations available from the target domain, we extend our work and propose NuSegSSDA, a Semi-Supervised Domain Adaptation (SSDA) based approach.

11.
Med Image Anal ; 84: 102705, 2023 02.
Article in English | MEDLINE | ID: mdl-36525843

ABSTRACT

Fine-grained nucleus classification is challenging because of the high inter-class similarity and intra-class variability. Therefore, a large number of labeled data is required for training effective nucleus classification models. However, it is challenging to label a large-scale nucleus classification dataset comparable to ImageNet in natural images, considering that high-quality nucleus labeling requires specific domain knowledge. In addition, the existing publicly available datasets are often inconsistently labeled with divergent labeling criteria. Due to this inconsistency, conventional models have to be trained on each dataset separately and work independently to infer their own classification results, limiting their classification performance. To fully utilize all annotated datasets, we formulate the nucleus classification task as a multi-label problem with missing labels to utilize all datasets in a unified framework. Specifically, we merge all datasets and combine their labels as multiple labels. Thus, each data has one ground-truth label and several missing labels. We devise a base classification module that is trained using all data but sparsely supervised by the ground-truth labels only. We then exploit the correlation among different label sets by a label correlation module. By doing so, we can have two trained basic modules and further cross-train them with both ground-truth labels and pseudo labels for the missing ones. Importantly, data without any ground-truth labels can also be involved in our framework, as we can regard them as data with all labels missing and generate the corresponding pseudo labels. We carefully re-organized multiple publicly available nucleus classification datasets, converted them into a uniform format, and tested the proposed framework on them. Experimental results show substantial improvement compared to the state-of-the-art methods. 
The code and data are available at https://w-h-zhang.github.io/projects/dataset_merging/dataset_merging.html.


Subjects
Cell Nucleus , Humans
12.
Med Image Anal ; 84: 102703, 2023 02.
Article in English | MEDLINE | ID: mdl-36481608

ABSTRACT

Mitosis counting of biopsies is an important biomarker for breast cancer patients, which supports disease prognostication and treatment planning. Developing a robust mitotic cell detection model is highly challenging due to its complex growth pattern and high similarities with non-mitotic cells. Most mitosis detection algorithms have poor generalizability across image domains and lack reproducibility and validation in multicenter settings. To overcome these issues, we propose a generalizable and robust mitosis detection algorithm (called FMDet), which is independently tested on multicenter breast histopathological images. To capture more refined morphological features of cells, we convert the object detection task into a semantic segmentation problem. The pixel-level annotations for mitotic nuclei are obtained by taking the intersection of the masks generated from a well-trained nuclear segmentation model and the bounding boxes provided by the MIDOG 2021 challenge. In our segmentation framework, a robust feature extractor is developed to capture the appearance variations of mitotic cells, which is constructed by integrating a channel-wise multi-scale attention mechanism into a fully convolutional network structure. Benefiting from the fact that the changes in the low-level spectrum do not affect the high-level semantic perception, we employ a Fourier-based data augmentation method to reduce domain discrepancies by exchanging the low-frequency spectrum between two domains. Our FMDet algorithm has been tested in the MIDOG 2021 challenge and ranked first. Further, our algorithm is also externally validated on four independent datasets for mitosis detection, where it exhibits state-of-the-art performance in comparison with previously published results. These results demonstrate that our algorithm has the potential to be deployed as an assistant decision support tool in clinical practice.
Our code has been released at https://github.com/Xiyue-Wang/1st-in-MICCAI-MIDOG-2021-challenge.


Subjects
Deep Learning , Humans , Reproducibility of Results , Algorithms , Breast/diagnostic imaging , Mitosis , Image Processing, Computer-Assisted/methods
13.
Med Image Anal ; 83: 102645, 2023 01.
Article in English | MEDLINE | ID: mdl-36270093

ABSTRACT

Benefiting from the large-scale archiving of digitized whole-slide images (WSIs), computer-aided diagnosis has been well developed to assist pathologists in decision-making. Content-based WSI retrieval can be a new approach to find highly correlated WSIs in a historically diagnosed WSI archive, which has the potential usages for assisted clinical diagnosis, medical research, and trainee education. During WSI retrieval, it is particularly challenging to encode the semantic content of histopathological images and to measure the similarity between images for interpretable results due to the gigapixel size of WSIs. In this work, we propose a Retrieval with Clustering-guided Contrastive Learning (RetCCL) framework for robust and accurate WSI-level image retrieval, which integrates a novel self-supervised feature learning method and a global ranking and aggregation algorithm for much improved performance. The proposed feature learning method makes use of existing large-scale unlabeled histopathological image data, which helps learn universal features that could be used directly for subsequent WSI retrieval tasks without extra fine-tuning. The proposed WSI retrieval method not only returns a set of WSIs similar to a query WSI, but also highlights patches or sub-regions of each WSI that share high similarity with patches of the query WSI, which helps pathologists interpret the searching results. Our WSI retrieval framework has been evaluated on the tasks of anatomical site retrieval and cancer subtype retrieval using over 22,000 slides, and the performance exceeds other state-of-the-art methods significantly (around 10% for the anatomic site retrieval in terms of average mMV@10). Besides, the patch retrieval using our learned feature representation offers a performance improvement of 24% on the TissueNet dataset in terms of mMV@5 compared with using ImageNet pre-trained features, which further demonstrates the effectiveness of the proposed CCL feature learning method.


Subjects
Biomedical Research , Humans
14.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 722-737, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35104214

ABSTRACT

The rich content in various real-world networks such as social networks, biological networks, and communication networks provides unprecedented opportunities for unsupervised machine learning on graphs. This paper investigates the fundamental problem of preserving and extracting abundant information from graph-structured data into embedding space without external supervision. To this end, we generalize conventional mutual information computation from vector space to graph domain and present a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graph and hidden representation. Beyond standard GMI, which considers graph structures from a local perspective, our further proposed GMI++ additionally captures global topological properties by analyzing the co-occurrence relationship of nodes. GMI and its extension exhibit several benefits: first, they are invariant to the isomorphic transformation of input graphs, an inevitable constraint in many existing methods; second, they can be efficiently estimated and maximized by current mutual information estimation methods; lastly, our theoretical analysis confirms their correctness and rationality. With the aid of GMI, we develop an unsupervised embedding model and adapt it to the specific anomaly detection task. Extensive experiments indicate that our GMI methods achieve promising performance in various downstream tasks, such as node classification, link prediction, and anomaly detection.

15.
J Comput Biol ; 30(1): 82-94, 2023 01.
Article in English | MEDLINE | ID: mdl-35972373

ABSTRACT

Molecule generation is the procedure to generate initial novel molecule proposals for molecule design. Molecules are first projected into continuous vectors in chemical latent space, and then, these embedding vectors are decoded into molecules under the variational autoencoder (VAE) framework. The continuous latent space of VAE can be utilized to generate novel molecules with desired chemical properties and further optimize the desired chemical properties of molecules. However, there is a posterior collapse problem with the conventional recurrent neural network-based VAEs for the molecule sequence generation, which deteriorates the generation performance. We investigate the posterior collapse problem and find that the underestimated reconstruction loss is the main factor in the posterior collapse problem in molecule sequence generation. To support our conclusion, we present both analytical and experimental evidence. What is more, we propose an efficient and effective solution to fix the problem and prevent posterior collapse. As a result, our method achieves competitive reconstruction accuracy and validity score on the benchmark data sets.


Subjects
Benchmarking , Neural Networks, Computer , Sulfadiazine
16.
Biomolecules ; 12(9)2022 09 19.
Article in English | MEDLINE | ID: mdl-36139164

ABSTRACT

The main target of retrosynthesis is to recursively decompose desired molecules into available building blocks. Existing template-based retrosynthesis methods follow a template selection stereotype and suffer from limited training templates, which prevents them from discovering novel reactions. To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates. As far as we know, this is the first method that uses machine learning to compose reaction templates for retrosynthesis prediction. Besides, we propose an effective reactant candidate scoring model that can capture atom-level transformations, which helps our method outperform previous methods on the USPTO-50K dataset. Experimental results show that our method can produce novel templates for 15 USPTO-50K test reactions that are not covered by training templates. We have released our source implementation.


Subjects
Chemistry Techniques, Synthetic , Machine Learning , Chemistry Techniques, Synthetic/methods , Models, Chemical
17.
Sci Rep ; 12(1): 14527, 2022 08 25.
Article in English | MEDLINE | ID: mdl-36008541

ABSTRACT

Computational pathology is a rapidly expanding area for research due to the current global transformation of histopathology through the adoption of digital workflows. Survival prediction of breast cancer patients is an important task that currently depends on histopathology assessment of cancer morphological features, immunohistochemical biomarker expression and patient clinical findings. To facilitate the manual process of survival risk prediction, we developed a computational pathology framework for survival prediction using digitally scanned haematoxylin and eosin-stained tissue microarray images of clinically aggressive triple negative breast cancer. Our results show that the model can produce an average concordance index of 0.616. Our model predictions are analysed for independent prognostic significance in univariate analysis (hazard ratio = 3.12, 95% confidence interval [1.69,5.75], p < 0.005) and multivariate analysis using clinicopathological data (hazard ratio = 2.68, 95% confidence interval [1.44,4.99], p < 0.005). Through qualitative analysis of heatmaps generated from our model, an expert pathologist is able to associate tissue features highlighted in the attention heatmaps of high-risk predictions with morphological features associated with more aggressive behaviour such as low levels of tumour infiltrating lymphocytes, stroma rich tissues and high-grade invasive carcinoma, providing explainability of our method for triple negative breast cancer.
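
The concordance index reported in the abstract above measures how often a model ranks two patients in the right order of risk. A minimal sketch of Harrell's C-index follows, with invented toy data. The function name and the simplified handling of tied survival times (such pairs are skipped) are assumptions, and nothing here reproduces the study's evaluation code.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event (events[i] == 1). A higher risk should go with a shorter
    survival time. Ties in risk count as 0.5; ties in time are skipped.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: follow-up time, event indicator (0 = censored), predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risks))  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.616 reported above.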


Subjects
Breast Neoplasms , Carcinoma , Triple Negative Breast Neoplasms , Breast Neoplasms/pathology , Carcinoma/pathology , Female , Humans , Lymphocytes, Tumor-Infiltrating/pathology , Prognosis , Proportional Hazards Models , Triple Negative Breast Neoplasms/pathology
18.
IEEE Trans Med Imaging ; 41(12): 3939-3951, 2022 12.
Article in English | MEDLINE | ID: mdl-36037453

ABSTRACT

The classification of nuclei in H&E-stained histopathological images is a fundamental step in the quantitative analysis of digital pathology. Most existing methods employ multi-class classification on the detected nucleus instances, while the annotation scale greatly limits their performance. Moreover, they often downplay the contextual information surrounding nucleus instances that is critical for classification. To explicitly provide contextual information to the classification model, we design a new structured input consisting of a content-rich image patch and a target instance mask. The image patch provides rich contextual information, while the target instance mask indicates the location of the instance to be classified and emphasizes its shape. Benefiting from our structured input format, we propose Structured Triplet for representation learning, a triplet learning framework on unlabelled nucleus instances with customized positive and negative sampling strategies. We pre-train a feature extraction model based on this framework with a large-scale unlabeled dataset, making it possible to train an effective classification model with limited annotated data. We also add two auxiliary branches, namely the attribute learning branch and the conventional self-supervised learning branch, to further improve its performance. As part of this work, we will release a new dataset of H&E-stained pathology images with nucleus instance masks, containing 20,187 patches of size 1024 × 1024, where each patch comes from a different whole-slide image. The model pre-trained on this dataset with our framework significantly reduces the burden of extensive labeling. We show a substantial improvement in nucleus classification accuracy compared with the state-of-the-art methods.


Subjects
Cell Nucleus , Cell Nucleus/pathology , Staining and Labeling
19.
Med Image Anal ; 81: 102559, 2022 10.
Article in English | MEDLINE | ID: mdl-35952419

ABSTRACT

A large-scale and well-annotated dataset is a key factor for the success of deep learning in medical image analysis. However, assembling such large annotations is very challenging, especially for histopathological images with unique characteristics (e.g., gigapixel image size, multiple cancer types, and wide staining variations). To alleviate this issue, self-supervised learning (SSL) could be a promising solution that relies only on unlabeled data to generate informative representations and generalizes well to various downstream tasks even with limited annotations. In this work, we propose a novel SSL strategy called semantically-relevant contrastive learning (SRCL), which compares relevance between instances to mine more positive pairs. Compared to the two views from an instance in traditional contrastive learning, our SRCL aligns multiple positive instances with similar visual concepts, which increases the diversity of positives and then results in more informative representations. We employ a hybrid model (CTransPath) as the backbone, which is designed by integrating a convolutional neural network (CNN) and a multi-scale Swin Transformer architecture. The CTransPath is pretrained on massively unlabeled histopathological images that could serve as a collaborative local-global feature extractor to learn universal feature representations more suitable for tasks in the histopathology image domain. The effectiveness of our SRCL-pretrained CTransPath is investigated on five types of downstream tasks (patch retrieval, patch classification, weakly-supervised whole-slide image classification, mitosis detection, and colorectal adenocarcinoma gland segmentation), covering nine public datasets. The results show that our SRCL-based visual representations not only achieve state-of-the-art performance in each dataset, but are also more robust and transferable than other SSL methods and ImageNet pretraining (both supervised and self-supervised methods). 
Our code and pretrained model are available at https://github.com/Xiyue-Wang/TransPath.


Subjects
Mitosis , Neural Networks, Computer , Humans , Staining and Labeling , Supervised Machine Learning
20.
J Cheminform ; 14(1): 44, 2022 Jul 07.
Article in English | MEDLINE | ID: mdl-35799215

ABSTRACT

The blood-brain barrier is a pivotal factor to be considered in the process of central nervous system (CNS) drug development, and it is of great significance to rapidly explore the blood-brain barrier permeability (BBBp) of compounds in silico in the early drug discovery process. Here, we focus on whether and how uncertainty estimation methods improve in silico BBBp models. We briefly surveyed the current state of in silico BBBp prediction and uncertainty estimation methods of deep learning models, and curated an independent dataset to determine the reliability of the state-of-the-art algorithms. The results show that, despite the comparable performance on BBBp prediction between graph neural network-based deep learning models and conventional physicochemical-based machine learning models, the GROVER-BBBp model shows great improvement when using uncertainty estimations. In particular, the strategy combining entropy and MC-dropout can increase the accuracy of distinguishing BBB+ from BBB- to above 99% by extracting predictions with a high confidence level (uncertainty score < 0.1). Case studies on preclinical/clinical drugs for Alzheimer's disease and marketed antitumor drugs, verified against the literature, demonstrate the application value of the uncertainty-estimation-enhanced BBBp prediction model, which may facilitate drug discovery in the field of CNS diseases and metastatic brain tumors.
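
The confidence filtering described in the abstract above can be sketched as follows. The abstract does not say how entropy and MC-dropout are combined or how the uncertainty score is scaled, so this sketch makes assumptions: the MC-dropout outputs are averaged, and the binary entropy of that average (in bits) is used as the score. The 0.1 cut-off comes from the abstract; the compound names, sample values, and function names are invented for illustration.

```python
import math


def binary_entropy(p):
    """Entropy of a Bernoulli(p) prediction in bits: 0 is certain, 1 is maximally unsure."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mc_dropout_uncertainty(samples):
    """Mean BBB+ probability over stochastic forward passes, and its entropy."""
    mean_p = sum(samples) / len(samples)
    return mean_p, binary_entropy(mean_p)


def confident_predictions(mc_samples_per_compound, max_uncertainty=0.1):
    """Keep only compounds whose uncertainty score is below the cut-off."""
    kept = {}
    for name, samples in mc_samples_per_compound.items():
        mean_p, score = mc_dropout_uncertainty(samples)
        if score < max_uncertainty:
            kept[name] = "BBB+" if mean_p >= 0.5 else "BBB-"
    return kept


# Each list stands for the outputs of repeated forward passes with dropout active.
mc_outputs = {
    "compound_a": [0.99, 0.995, 0.99],
    "compound_b": [0.6, 0.4, 0.55],
    "compound_c": [0.01, 0.005, 0.002],
}
print(confident_predictions(mc_outputs))
```

Only the compounds with consistently extreme predictions pass the filter. The ambiguous one is withheld, which raises accuracy on the retained subset at the cost of coverage.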
