Búsqueda | BVS Bolivia

1.

GTE: a graph learning framework for prediction of T-cell receptors and epitopes binding specificity.

Jiang, Feng; Guo, Yuzhi; Ma, Hehuan; Na, Saiyang; Zhong, Wenliang; Han, Yi; Wang, Tao; Huang, Junzhou.

Brief Bioinform ; 25(4)2024 May 23.

Artículo en Inglés | MEDLINE | ID: mdl-39007599

RESUMEN

The interaction between T-cell receptors (TCRs) and peptides (epitopes) presented by major histocompatibility complex molecules (MHC) is fundamental to the immune response. Accurate prediction of TCR-epitope interactions is crucial for advancing the understanding of various diseases and their prevention and treatment. Existing methods primarily rely on sequence-based approaches, overlooking the inherent topology structure of TCR-epitope interaction networks. In this study, we present $GTE$, a novel heterogeneous Graph neural network model based on inductive learning to capture the topological structure between TCRs and Epitopes. Furthermore, we address the challenge of constructing negative samples within the graph by proposing a dynamic edge update strategy, enhancing model learning with the nonbinding TCR-epitope pairs. Additionally, to overcome data imbalance, we adapt the Deep AUC Maximization strategy to the graph domain. Extensive experiments are conducted on four public datasets to demonstrate the superiority of exploring underlying topological structures in predicting TCR-epitope interactions, illustrating the benefits of delving into complex molecular networks. The implementation code and data are available at https://github.com/uta-smile/GTE.

Asunto(s)

Receptores de Antígenos de Linfocitos T , Receptores de Antígenos de Linfocitos T/química , Receptores de Antígenos de Linfocitos T/inmunología , Receptores de Antígenos de Linfocitos T/metabolismo , Humanos , Epítopos de Linfocito T/inmunología , Epítopos de Linfocito T/química , Redes Neurales de la Computación , Biología Computacional/métodos , Unión Proteica , Epítopos/química , Epítopos/inmunología , Algoritmos , Programas Informáticos

2.

Cmai: Predicting Antigen-Antibody Interactions from Massive Sequencing Data.

Song, Bing; Wang, Kaiwen; Na, Saiyang; Yao, Jia; Fattah, Farjana J; von Itzstein, Mitchell S; Yang, Donghan M; Liu, Jialiang; Xue, Yaming; Liang, Chaoying; Guo, Yuzhi; Raman, Indu; Zhu, Chengsong; Dowell, Jonathan E; Homsi, Jade; Rashdan, Sawsan; Yang, Shengjie; Gwin, Mary E; Hsiehchen, David; Gloria-McCutchen, Yvonne; Raj, Prithvi; Bai, Xiaochen; Wang, Jun; Conejo-Garcia, Jose; Xie, Yang; Gerber, David E; Huang, Junzhou; Wang, Tao.

bioRxiv ; 2024 Jul 12.

Artículo en Inglés | MEDLINE | ID: mdl-39005456

RESUMEN

The interaction between antigens and antibodies (B cell receptors, BCRs) is the key step underlying the function of the humoral immune system in various biological contexts. The capability to profile the landscape of antigen-binding affinity of a vast number of BCRs will provide a powerful tool to reveal novel insights at unprecedented levels and will yield powerful tools for translational development. However, current experimental approaches for profiling antibody-antigen interactions are costly and time-consuming, and can only achieve low-to-mid throughput. On the other hand, bioinformatics tools in the field of antibody informatics mostly focus on optimization of antibodies given known binding antigens, which is a very different research question and of limited scope. In this work, we developed an innovative Artificial Intelligence tool, Cmai, to address the prediction of the binding between antibodies and antigens that can be scaled to high-throughput sequencing data. Cmai achieved an AUROC of 0.91 in our validation cohort. We devised a biomarker metric based on the output from Cmai applied to high-throughput BCR sequencing data. We found that, during immune-related adverse events (irAEs) caused by immune-checkpoint inhibitor (ICI) treatment, the humoral immunity is preferentially responsive to intracellular antigens from the organs affected by the irAEs. In contrast, extracellular antigens on malignant tumor cells are inducing B cell infiltrations, and the infiltrating B cells have a greater tendency to co-localize with tumor cells expressing these antigens. We further found that the abundance of tumor antigen-targeting antibodies is predictive of ICI treatment response. Overall, Cmai and our biomarker approach filled in a gap that is not addressed by current antibody optimization works nor works such as AlphaFold3 that predict the structures of complexes of proteins that are known to bind.

3.

HiCervix: An Extensive Hierarchical Dataset and Benchmark for Cervical Cytology Classification.

Cai, De; Chen, Jie; Zhao, Junhan; Xue, Yuan; Yang, Sen; Yuan, Wei; Feng, Min; Weng, Haiyan; Liu, Shuguang; Peng, Yulong; Zhu, Junyou; Wang, Kanran; Jackson, Christopher; Tang, Hongping; Huang, Junzhou; Wang, Xiyue.

IEEE Trans Med Imaging ; PP2024 Jun 26.

Artículo en Inglés | MEDLINE | ID: mdl-38923481

RESUMEN

Cervical cytology is a critical screening strategy for early detection of pre-cancerous and cancerous cervical lesions. The challenge lies in accurately classifying various cervical cytology cell types. Existing automated cervical cytology methods are primarily trained on databases covering a narrow range of coarse-grained cell types, which fail to provide a comprehensive and detailed performance analysis that accurately represents real-world cytopathology conditions. To overcome these limitations, we introduce HiCervix, the most extensive, multi-center cervical cytology dataset currently available to the public. HiCervix includes 40,229 cervical cells from 4,496 whole slide images, categorized into 29 annotated classes. These classes are organized within a three-level hierarchical tree to capture fine-grained subtype information. To exploit the semantic correlation inherent in this hierarchical tree, we propose HierSwin, a hierarchical vision transformer-based classification network. HierSwin serves as a benchmark for detailed feature learning in both coarse-level and fine-level cervical cancer classification tasks. In our comprehensive experiments, HierSwin demonstrated remarkable performance, achieving 92.08% accuracy for coarse-level classification and 82.93% accuracy averaged across all three levels. When compared to board-certified cytopathologists, HierSwin achieved high classification performance (0.8293 versus 0.7359 averaged accuracy), highlighting its potential for clinical applications. This newly released HiCervix dataset, along with our benchmark HierSwin method, is poised to make a substantial impact on the advancement of deep learning algorithms for rapid cervical cancer screening and greatly improve cancer prevention and patient outcomes in real-world clinical settings.

4.

Learning With Noisy Labels Over Imbalanced Subpopulations.

Chen, Mingcai; Zhao, Yu; He, Bing; Han, Zongbo; Huang, Junzhou; Wu, Bingzhe; Yao, Jianhua.

IEEE Trans Neural Netw Learn Syst ; PP2024 May 01.

Artículo en Inglés | MEDLINE | ID: mdl-38691432

RESUMEN

Learning with noisy labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have a "small loss." However, this assumption often fails to generalize to some real-world cases with imbalanced subpopulations, that is, training subpopulations that vary in sample size or recognition difficulty. Therefore, recent LNL methods face the risk of misclassifying those "informative" samples (e.g., hard samples or samples in the tail subpopulations) into noisy samples, leading to poor generalization performance. To address this issue, we propose a novel LNL method to deal with noisy labels and imbalanced subpopulations simultaneously. It first leverages sample correlation to estimate samples' clean probabilities for label correction and then utilizes corrected labels for distributionally robust optimization (DRO) to further improve the robustness. Specifically, in contrast to previous works using classification loss as the selection criterion, we introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities. Then, we refurbish the noisy labels using the estimated clean probabilities and the pseudo-labels from the model's predictions. With refurbished labels, we use DRO to train the model to be robust to subpopulation imbalance. Extensive experiments on a wide range of benchmarks demonstrate that our technique can consistently improve state-of-the-art (SOTA) robust learning paradigms against noisy labels, especially when encountering imbalanced subpopulations. We provide our code in https://github.com/chenmc1996/LNL-IS.

5.

Geometric deep learning methods and applications in 3D structure-based drug design.

Bai, Qifeng; Xu, Tingyang; Huang, Junzhou; Pérez-Sánchez, Horacio.

Drug Discov Today ; 29(7): 104024, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38759948

RESUMEN

3D structure-based drug design (SBDD) is considered a challenging and rational way for innovative drug discovery. Geometric deep learning is a promising approach that solves the accurate model training of 3D SBDD through building neural network models to learn non-Euclidean data, such as 3D molecular graphs and manifold data. Here, we summarize geometric deep learning methods and applications that contain 3D molecular representations, equivariant graph neural networks (EGNNs), and six generative model methods [diffusion model, flow-based model, generative adversarial networks (GANs), variational autoencoder (VAE), autoregressive models, and energy-based models]. Our review provides insights into geometric deep learning methods and advanced applications of 3D SBDD that will be of relevance for the drug discovery community.

Asunto(s)

Aprendizaje Profundo , Diseño de Fármacos , Redes Neurales de la Computación , Descubrimiento de Drogas/métodos , Humanos , Estructura Molecular

6.

Solving the non-submodular network collapse problems via Decision Transformer.

Ma, Kaili; Yang, Han; Yang, Shanchao; Zhao, Kangfei; Li, Lanqing; Chen, Yongqiang; Huang, Junzhou; Cheng, James; Rong, Yu.

Neural Netw ; 176: 106328, 2024 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-38688067

RESUMEN

Given a graph G, the network collapse problem (NCP) selects a vertex subset S of minimum cardinality from G such that the difference in the values of a given measure function f(G)-f(G∖S) is greater than a predefined collapse threshold. Many graph analytic applications can be formulated as NCPs with different measure functions, which often pose a significant challenge due to their NP-hard nature. As a result, traditional greedy algorithms, which select the vertex with the highest reward at each step, may not effectively find the optimal solution. In addition, existing learning-based algorithms do not have the ability to model the sequence of actions taken during the decision-making process, making it difficult to capture the combinatorial effect of selected vertices on the final solution. This limits the performance of learning-based approaches in non-submodular NCPs. To address these limitations, we propose a unified framework called DT-NC, which adapts the Decision Transformer to the Network Collapse problems. DT-NC takes into account the historical actions taken during the decision-making process and effectively captures the combinatorial effect of selected vertices. The ability of DT-NC to model the dependency among selected vertices allows it to address the difficulties caused by the non-submodular property of measure functions in some NCPs effectively. Through extensive experiments on various NCPs and graphs of different sizes, we demonstrate that DT-NC outperforms the state-of-the-art methods and exhibits excellent transferability and generalizability.

Asunto(s)

Algoritmos , Redes Neurales de la Computación , Toma de Decisiones/fisiología , Humanos

7.

MARS: a motif-based autoregressive model for retrosynthesis prediction.

Liu, Jiahan; Yan, Chaochao; Yu, Yang; Lu, Chan; Huang, Junzhou; Ou-Yang, Le; Zhao, Peilin.

Bioinformatics ; 40(3)2024 Mar 04.

Artículo en Inglés | MEDLINE | ID: mdl-38426338

RESUMEN

MOTIVATION: Retrosynthesis is a critical task in drug discovery, aimed at finding a viable pathway for synthesizing a given target molecule. Many existing approaches frame this task as a graph-generating problem. Specifically, these methods first identify the reaction center, and break a targeted molecule accordingly to generate the synthons. Reactants are generated by either adding atoms sequentially to synthon graphs or by directly adding appropriate leaving groups. However, both of these strategies have limitations. Adding atoms results in a long prediction sequence that increases the complexity of generation, while adding leaving groups only considers those in the training set, which leads to poor generalization. RESULTS: In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Given that chemically meaningful motifs fall between the size of atoms and leaving groups, our model achieves lower prediction complexity than adding atoms and demonstrates superior performance than adding leaving groups. We evaluate our proposed model on a benchmark dataset and show that it significantly outperforms previous state-of-the-art models. Furthermore, we conduct ablation studies to investigate the contribution of each component of our proposed model to the overall performance on benchmark datasets. Experiment results demonstrate the effectiveness of our model in predicting retrosynthesis pathways and suggest its potential as a valuable tool in drug discovery. AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/szu-ljh2020/MARS.

Asunto(s)

Benchmarking , Descubrimiento de Drogas , Sistemas de Lectura

8.

Toward Robust Self-Training Paradigm for Molecular Prediction Tasks.

Ma, Hehuan; Jiang, Feng; Rong, Yu; Guo, Yuzhi; Huang, Junzhou.

J Comput Biol ; 31(3): 213-228, 2024 03.

Artículo en Inglés | MEDLINE | ID: mdl-38531049

RESUMEN

Molecular prediction tasks normally demand a series of professional experiments to label the target molecule, which suffers from the limited labeled data problem. One of the semisupervised learning paradigms, known as self-training, utilizes both labeled and unlabeled data. Specifically, a teacher model is trained using labeled data and produces pseudo labels for unlabeled data. These labeled and pseudo-labeled data are then jointly used to train a student model. However, the pseudo labels generated from the teacher model are generally not sufficiently accurate. Thus, we propose a robust self-training strategy by exploring robust loss function to handle such noisy labels in two paradigms, that is, generic and adaptive. We have conducted experiments on three molecular biology prediction tasks with four backbone models to gradually evaluate the performance of the proposed robust self-training strategy. The results demonstrate that the proposed method enhances prediction performance across all tasks, notably within molecular regression tasks, where there has been an average enhancement of 41.5%. Furthermore, the visualization analysis confirms the superiority of our method. Our proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It tackles the labeled data insufficient issue in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily embedded with any prediction task, which serves as a universal approach for the bioinformatics community.

Asunto(s)

Biología Computacional , Biología Molecular , Humanos , Aprendizaje Automático Supervisado

9.

pan-MHC and cross-Species Prediction of T Cell Receptor-Antigen Binding.

Han, Yi; Yang, Yuqiu; Tian, Yanhua; Fattah, Farjana J; von Itzstein, Mitchell S; Hu, Yifei; Zhang, Minying; Kang, Xiongbin; Yang, Donghan M; Liu, Jialiang; Xue, Yaming; Liang, Chaoying; Raman, Indu; Zhu, Chengsong; Xiao, Olivia; Dowell, Jonathan E; Homsi, Jade; Rashdan, Sawsan; Yang, Shengjie; Gwin, Mary E; Hsiehchen, David; Gloria-McCutchen, Yvonne; Pan, Ke; Wu, Fangjiang; Gibbons, Don; Wang, Xinlei; Yee, Cassian; Huang, Junzhou; Reuben, Alexandre; Cheng, Chao; Zhang, Jianjun; Gerber, David E; Wang, Tao.

bioRxiv ; 2023 Dec 12.

Artículo en Inglés | MEDLINE | ID: mdl-38105939

RESUMEN

Profiling the binding of T cell receptors (TCRs) of T cells to antigenic peptides presented by MHC proteins is one of the most important unsolved problems in modern immunology. Experimental methods to probe TCR-antigen interactions are slow, labor-intensive, costly, and yield moderate throughput. To address this problem, we developed pMTnet-omni, an Artificial Intelligence (AI) system based on hybrid protein sequence and structure information, to predict the pairing of TCRs of αß T cells with peptide-MHC complexes (pMHCs). pMTnet-omni is capable of handling peptides presented by both class I and II pMHCs, and capable of handling both human and mouse TCR-pMHC pairs, through information sharing enabled this hybrid design. pMTnet-omni achieves a high overall Area Under the Curve of Receiver Operator Characteristics (AUROC) of 0.888, which surpasses competing tools by a large margin. We showed that pMTnet-omni can distinguish binding affinity of TCRs with similar sequences. Across a range of datasets from various biological contexts, pMTnet-omni characterized the longitudinal evolution and spatial heterogeneity of TCR-pMHC interactions and their functional impact. We successfully developed a biomarker based on pMTnet-omni for predicting immune-related adverse events of immune checkpoint inhibitor (ICI) treatment in a cohort of 57 ICI-treated patients. pMTnet-omni represents a major advance towards developing a clinically usable AI system for TCR-pMHC pairing prediction that can aid the design and implementation of TCR-based immunotherapeutics.

10.

Spatiotemporal denoising of low-dose cardiac CT image sequences using RecycleGAN.

Zhou, Shiwei; Yang, Jinyu; Konduri, Krishnateja; Huang, Junzhou; Yu, Lifeng; Jin, Mingwu.

Biomed Phys Eng Express ; 9(6)2023 09 12.

Artículo en Inglés | MEDLINE | ID: mdl-37604139

RESUMEN

Electrocardiogram (ECG)-gated multi-phase computed tomography angiography (MP-CTA) is frequently used for diagnosis of coronary artery disease. Radiation dose may become a potential concern as the scan needs to cover a wide range of cardiac phases during a heart cycle. A common method to reduce radiation is to limit the full-dose acquisition to a predefined range of phases while reducing the radiation dose for the rest. Our goal in this study is to develop a spatiotemporal deep learning method to enhance the quality of low-dose CTA images at phases acquired at reduced radiation dose. Recently, we demonstrated that a deep learning method, Cycle-Consistent generative adversarial networks (CycleGAN), could effectively denoise low-dose CT images through spatial image translation without labeled image pairs in both low-dose and full-dose image domains. As CycleGAN does not utilize the temporal information in its denoising mechanism, we propose to use RecycleGAN, which could translate a series of images ordered in time from the low-dose domain to the full-dose domain through an additional recurrent network. To evaluate RecycleGAN, we use the XCAT phantom program, a highly realistic simulation tool based on real patient data, to generate MP-CTA image sequences for 18 patients (14 for training, 2 for validation and 2 for test). Our simulation results show that RecycleGAN can achieve better denoising performance than CycleGAN based on both visual inspection and quantitative metrics. We further demonstrate the superior denoising performance of RecycleGAN using clinical MP-CTA images from 50 patients.

Asunto(s)

Angiografía por Tomografía Computarizada , Tomografía Computarizada por Rayos X , Humanos , Corazón/diagnóstico por imagen , Angiografía , Benchmarking

11.

Recent Advances in Toxicity Prediction: Applications of Deep Graph Learning.

Miao, Yuwei; Ma, Hehuan; Huang, Junzhou.

Chem Res Toxicol ; 36(8): 1206-1226, 2023 08 21.

Artículo en Inglés | MEDLINE | ID: mdl-37562046

RESUMEN

The development of new drugs is time-consuming and expensive, and as such, accurately predicting the potential toxicity of a drug candidate is crucial in ensuring its safety and efficacy. Recently, deep graph learning has become prevalent in this field due to its computational power and cost efficiency. Many novel deep graph learning methods aid toxicity prediction and further prompt drug development. This review aims to connect fundamental knowledge with burgeoning deep graph learning methods. We first summarize the essential components of deep graph learning models for toxicity prediction, including molecular descriptors, molecular representations, evaluation metrics, validation methods, and data sets. Furthermore, based on various graph-related representations of molecules, we introduce several representative studies and methods for toxicity prediction from the perspective of GNN architectures and graph pretrained models. Compared to other types of models, deep graph models not only advance in higher accuracy and efficiency but also provide more intuitive insights, which is significant in the development of model interpretation and generalization ability. The graph pretrained models are emerging as they can extract prominent features from large-scale unlabeled molecular graph data and improve the performance of downstream toxicity prediction tasks. We hope this survey can serve as a handbook for individuals interested in exploring deep graph learning for toxicity prediction.

Asunto(s)

Desarrollo de Medicamentos , Preparaciones Farmacéuticas , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos

12.

Structure-Aware DropEdge Toward Deep Graph Convolutional Networks.

Han, Jiaqi; Huang, Wenbing; Rong, Yu; Xu, Tingyang; Sun, Fuchun; Huang, Junzhou.

IEEE Trans Neural Netw Learn Syst ; PP2023 Jul 26.

Artículo en Inglés | MEDLINE | ID: mdl-37494169

RESUMEN

It has been discovered that graph convolutional networks (GCNs) encounter a remarkable drop in performance when multiple layers are piled up. The main factor that accounts for why deep GCNs fail lies in oversmoothing, which isolates the network output from the input with the increase of network depth, weakening expressivity and trainability. In this article, we start by investigating refined measures upon DropEdge-an existing simple yet effective technique to relieve oversmoothing. We term our method as DropEdge ++ for its two structure-aware samplers in contrast to DropEdge: layer-dependent (LD) sampler and feature-dependent (FD) sampler. Regarding the LD sampler, we interestingly find that increasingly sampling edges from the bottom layer yields superior performance than the decreasing counterpart as well as DropEdge. We theoretically reveal this phenomenon with mean-edge-number (MEN), a metric closely related to oversmoothing. For the FD sampler, we associate the edge sampling probability with the feature similarity of node pairs and prove that it further correlates the convergence subspace of the output layer with the input features. Extensive experiments on several node classification benchmarks, including both full-and semi-supervised tasks, illustrate the efficacy of DropEdge ++ and its compatibility with a variety of backbones by achieving generally better performance over DropEdge and the no-drop version.

13.

NuSegDA: Domain adaptation for nuclei segmentation.

Haq, Mohammad Minhazul; Ma, Hehuan; Huang, Junzhou.

Front Big Data ; 6: 1108659, 2023.

Artículo en Inglés | MEDLINE | ID: mdl-36936996

RESUMEN

The accurate segmentation of nuclei is crucial for cancer diagnosis and further clinical treatments. To successfully train a nuclei segmentation network in a fully-supervised manner for a particular type of organ or cancer, we need the dataset with ground-truth annotations. However, such well-annotated nuclei segmentation datasets are highly rare, and manually labeling an unannotated dataset is an expensive, time-consuming, and tedious process. Consequently, we require to discover a way for training the nuclei segmentation network with unlabeled dataset. In this paper, we propose a model named NuSegUDA for nuclei segmentation on the unlabeled dataset (target domain). It is achieved by applying Unsupervised Domain Adaptation (UDA) technique with the help of another labeled dataset (source domain) that may come from different type of organ, cancer, or source. We apply UDA technique at both of feature space and output space. We additionally utilize a reconstruction network and incorporate adversarial learning into it so that the source-domain images can be accurately translated to the target-domain for further training of the segmentation network. We validate our proposed NuSegUDA on two public nuclei segmentation datasets, and obtain significant improvement as compared with the baseline methods. Extensive experiments also verify the contribution of newly proposed image reconstruction adversarial loss, and target-translated source supervised loss to the performance boost of NuSegUDA. Finally, considering the scenario when we have a small number of annotations available from the target domain, we extend our work and propose NuSegSSDA, a Semi-Supervised Domain Adaptation (SSDA) based approach.

14.

Learning Representations by Graphical Mutual Information Estimation and Maximization.

Peng, Zhen; Luo, Minnan; Huang, Wenbing; Li, Jundong; Zheng, Qinghua; Sun, Fuchun; Huang, Junzhou.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 722-737, 2023 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-35104214

RESUMEN

The rich content in various real-world networks such as social networks, biological networks, and communication networks provides unprecedented opportunities for unsupervised machine learning on graphs. This paper investigates the fundamental problem of preserving and extracting abundant information from graph-structured data into embedding space without external supervision. To this end, we generalize conventional mutual information computation from vector space to graph domain and present a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graph and hidden representation. Except for standard GMI which considers graph structures from a local perspective, our further proposed GMI++ additionally captures global topological properties by analyzing the co-occurrence relationship of nodes. GMI and its extension exhibit several benefits: First, they are invariant to the isomorphic transformation of input graphs-an inevitable constraint in many existing methods; Second, they can be efficiently estimated and maximized by current mutual information estimation methods; Lastly, our theoretical analysis confirms their correctness and rationality. With the aid of GMI, we develop an unsupervised embedding model and adapt it to the specific anomaly detection task. Extensive experiments indicate that our GMI methods achieve promising performance in various downstream tasks, such as node classification, link prediction, and anomaly detection.

15.

Merging nucleus datasets by correlation-based cross-training.

Zhang, Wenhua; Zhang, Jun; Wang, Xiyue; Yang, Sen; Huang, Junzhou; Yang, Wei; Wang, Wenping; Han, Xiao.

Med Image Anal ; 84: 102705, 2023 02.

Artículo en Inglés | MEDLINE | ID: mdl-36525843

RESUMEN

Fine-grained nucleus classification is challenging because of the high inter-class similarity and intra-class variability. Therefore, a large number of labeled data is required for training effective nucleus classification models. However, it is challenging to label a large-scale nucleus classification dataset comparable to ImageNet in natural images, considering that high-quality nucleus labeling requires specific domain knowledge. In addition, the existing publicly available datasets are often inconsistently labeled with divergent labeling criteria. Due to this inconsistency, conventional models have to be trained on each dataset separately and work independently to infer their own classification results, limiting their classification performance. To fully utilize all annotated datasets, we formulate the nucleus classification task as a multi-label problem with missing labels to utilize all datasets in a unified framework. Specifically, we merge all datasets and combine their labels as multiple labels. Thus, each data has one ground-truth label and several missing labels. We devise a base classification module that is trained using all data but sparsely supervised by the ground-truth labels only. We then exploit the correlation among different label sets by a label correlation module. By doing so, we can have two trained basic modules and further cross-train them with both ground-truth labels and pseudo labels for the missing ones. Importantly, data without any ground-truth labels can also be involved in our framework, as we can regard them as data with all labels missing and generate the corresponding pseudo labels. We carefully re-organized multiple publicly available nucleus classification datasets, converted them into a uniform format, and tested the proposed framework on them. Experimental results show substantial improvement compared to the state-of-the-art methods. The code and data are available at https://w-h-zhang.github.io/projects/dataset_merging/dataset_merging.html.

Asunto(s)

Núcleo Celular , Humanos

16.

A generalizable and robust deep learning algorithm for mitosis detection in multicenter breast histopathological images.

Wang, Xiyue; Zhang, Jun; Yang, Sen; Xiang, Jingxi; Luo, Feng; Wang, Minghui; Zhang, Jing; Yang, Wei; Huang, Junzhou; Han, Xiao.

Med Image Anal ; 84: 102703, 2023 02.

Artículo en Inglés | MEDLINE | ID: mdl-36481608

RESUMEN

Mitosis counting of biopsies is an important biomarker for breast cancer patients, which supports disease prognostication and treatment planning. Developing a robust mitotic cell detection model is highly challenging due to its complex growth pattern and high similarities with non-mitotic cells. Most mitosis detection algorithms have poor generalizability across image domains and lack reproducibility and validation in multicenter settings. To overcome these issues, we propose a generalizable and robust mitosis detection algorithm (called FMDet), which is independently tested on multicenter breast histopathological images. To capture more refined morphological features of cells, we convert the object detection task as a semantic segmentation problem. The pixel-level annotations for mitotic nuclei are obtained by taking the intersection of the masks generated from a well-trained nuclear segmentation model and the bounding boxes provided by the MIDOG 2021 challenge. In our segmentation framework, a robust feature extractor is developed to capture the appearance variations of mitotic cells, which is constructed by integrating a channel-wise multi-scale attention mechanism into a fully convolutional network structure. Benefiting from the fact that the changes in the low-level spectrum do not affect the high-level semantic perception, we employ a Fourier-based data augmentation method to reduce domain discrepancies by exchanging the low-frequency spectrum between two domains. Our FMDet algorithm has been tested in the MIDOG 2021 challenge and ranked first place. Further, our algorithm is also externally validated on four independent datasets for mitosis detection, which exhibits state-of-the-art performance in comparison with previously published results. These results demonstrate that our algorithm has the potential to be deployed as an assistant decision support tool in clinical practice. Our code has been released at https://github.com/Xiyue-Wang/1st-in-MICCAI-MIDOG-2021-challenge.

Asunto(s)

Aprendizaje Profundo , Humanos , Reproducibilidad de los Resultados , Algoritmos , Mama/diagnóstico por imagen , Mitosis , Procesamiento de Imagen Asistido por Computador/métodos

17.

RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval.

Wang, Xiyue; Du, Yuexi; Yang, Sen; Zhang, Jun; Wang, Minghui; Zhang, Jing; Yang, Wei; Huang, Junzhou; Han, Xiao.

Med Image Anal ; 83: 102645, 2023 01.

Artículo en Inglés | MEDLINE | ID: mdl-36270093

RESUMEN

Benefiting from the large-scale archiving of digitized whole-slide images (WSIs), computer-aided diagnosis has been well developed to assist pathologists in decision-making. Content-based WSI retrieval can be a new approach to find highly correlated WSIs in a historically diagnosed WSI archive, which has the potential usages for assisted clinical diagnosis, medical research, and trainee education. During WSI retrieval, it is particularly challenging to encode the semantic content of histopathological images and to measure the similarity between images for interpretable results due to the gigapixel size of WSIs. In this work, we propose a Retrieval with Clustering-guided Contrastive Learning (RetCCL) framework for robust and accurate WSI-level image retrieval, which integrates a novel self-supervised feature learning method and a global ranking and aggregation algorithm for much improved performance. The proposed feature learning method makes use of existing large-scale unlabeled histopathological image data, which helps learn universal features that could be used directly for subsequent WSI retrieval tasks without extra fine-tuning. The proposed WSI retrieval method not only returns a set of WSIs similar to a query WSI, but also highlights patches or sub-regions of each WSI that share high similarity with patches of the query WSI, which helps pathologists interpret the searching results. Our WSI retrieval framework has been evaluated on the tasks of anatomical site retrieval and cancer subtype retrieval using over 22,000 slides, and the performance exceeds other state-of-the-art methods significantly (around 10% for the anatomic site retrieval in terms of average mMV@10). Besides, the patch retrieval using our learned feature representation offers a performance improvement of 24% on the TissueNet dataset in terms of mMV@5 compared with using ImageNet pre-trained features, which further demonstrates the effectiveness of the proposed CCL feature learning method.

Asunto(s)

Investigación Biomédica , Humanos

18.

Molecule Sequence Generation with Rebalanced Variational Autoencoder Loss.

Yan, Chaochao; Yang, Jinyu; Ma, Hehuan; Wang, Sheng; Huang, Junzhou.

J Comput Biol ; 30(1): 82-94, 2023 01.

Artículo en Inglés | MEDLINE | ID: mdl-35972373

RESUMEN

Molecule generation is the procedure to generate initial novel molecule proposals for molecule design. Molecules are first projected into continuous vectors in chemical latent space, and then, these embedding vectors are decoded into molecules under the variational autoencoder (VAE) framework. The continuous latent space of VAE can be utilized to generate novel molecules with desired chemical properties and further optimize the desired chemical properties of molecules. However, there is a posterior collapse problem with the conventional recurrent neural network-based VAEs for the molecule sequence generation, which deteriorates the generation performance. We investigate the posterior collapse problem and find that the underestimated reconstruction loss is the main factor in the posterior collapse problem in molecule sequence generation. To support our conclusion, we present both analytical and experimental evidence. What is more, we propose an efficient and effective solution to fix the problem and prevent posterior collapse. As a result, our method achieves competitive reconstruction accuracy and validity score on the benchmark data sets.

Asunto(s)

Benchmarking , Redes Neurales de la Computación , Sulfadiazina

19.

RetroComposer: Composing Templates for Template-Based Retrosynthesis Prediction.

Yan, Chaochao; Zhao, Peilin; Lu, Chan; Yu, Yang; Huang, Junzhou.

Biomolecules ; 12(9)2022 09 19.

Artículo en Inglés | MEDLINE | ID: mdl-36139164

RESUMEN

The main target of retrosynthesis is to recursively decompose desired molecules into available building blocks. Existing template-based retrosynthesis methods follow a template selection stereotype and suffer from limited training templates, which prevents them from discovering novel reactions. To overcome this limitation, we propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates. As far as we know, this is the first method that uses machine learning to compose reaction templates for retrosynthesis prediction. Besides, we propose an effective reactant candidate scoring model that can capture atom-level transformations, which helps our method outperform previous methods on the USPTO-50K dataset. Experimental results show that our method can produce novel templates for 15 USPTO-50K test reactions that are not covered by training templates. We have released our source implementation.

Asunto(s)

Técnicas de Química Sintética , Aprendizaje Automático , Técnicas de Química Sintética/métodos , Modelos Químicos

20.

Knowledge-Based Representation Learning for Nucleus Instance Classification From Histopathological Images.

Zhang, Wenhua; Zhang, Jun; Yang, Sen; Wang, Xiyue; Yang, Wei; Huang, Junzhou; Wang, Wenping; Han, Xiao.

IEEE Trans Med Imaging ; 41(12): 3939-3951, 2022 12.

Artículo en Inglés | MEDLINE | ID: mdl-36037453

RESUMEN

The classification of nuclei in H&E-stained histopathological images is a fundamental step in the quantitative analysis of digital pathology. Most existing methods employ multi-class classification on the detected nucleus instances, while the annotation scale greatly limits their performance. Moreover, they often downplay the contextual information surrounding nucleus instances that is critical for classification. To explicitly provide contextual information to the classification model, we design a new structured input consisting of a content-rich image patch and a target instance mask. The image patch provides rich contextual information, while the target instance mask indicates the location of the instance to be classified and emphasizes its shape. Benefiting from our structured input format, we propose Structured Triplet for representation learning, a triplet learning framework on unlabelled nucleus instances with customized positive and negative sampling strategies. We pre-train a feature extraction model based on this framework with a large-scale unlabeled dataset, making it possible to train an effective classification model with limited annotated data. We also add two auxiliary branches, namely the attribute learning branch and the conventional self-supervised learning branch, to further improve its performance. As part of this work, we will release a new dataset of H&E-stained pathology images with nucleus instance masks, containing 20,187 patches of size 1024 ×1024 , where each patch comes from a different whole-slide image. The model pre-trained on this dataset with our framework significantly reduces the burden of extensive labeling. We show a substantial improvement in nucleus classification accuracy compared with the state-of-the-art methods.

Asunto(s)

Núcleo Celular , Núcleo Celular/patología , Coloración y Etiquetado

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA