Results 1 - 20 of 35
1.
Cell ; 186(3): 591-606.e23, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36669483

ABSTRACT

Dysregulation of the immune system is a cardinal feature of opioid addiction. Here, we characterize the landscape of peripheral immune cells from patients with opioid use disorder and from healthy controls. Opioid-associated blood exhibited an abnormal distribution of immune cells characterized by a significant expansion of fragile-like regulatory T cells (Tregs), which was positively correlated with the withdrawal score. Analogously, opioid-treated mice also showed enhanced Treg-derived interferon-γ (IFN-γ) expression. IFN-γ signaling reshaped synaptic morphology in nucleus accumbens (NAc) neurons, modulating subsequent withdrawal symptoms. We demonstrate that opioids increased the expression of neuron-derived C-C motif chemokine ligand 2 (Ccl2) and disrupted blood-brain barrier (BBB) integrity through the downregulation of astrocyte-derived fatty-acid-binding protein 7 (Fabp7), both of which triggered peripheral Treg infiltration into the NAc. Our study demonstrates that opioids drive the expansion of fragile-like Tregs and favor peripheral Treg diapedesis across the BBB, which leads to IFN-γ-mediated synaptic instability and subsequent withdrawal symptoms.


Subjects
Interferon-gamma , Opioid-Related Disorders , Substance Withdrawal Syndrome , T-Lymphocytes, Regulatory , Animals , Mice , Analgesics, Opioid/administration & dosage , Interferon-gamma/metabolism , Opioid-Related Disorders/metabolism , Opioid-Related Disorders/pathology
2.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39391931

ABSTRACT

Despite advanced diagnostics, 3%-5% of cases remain classified as cancer of unknown primary (CUP). DNA methylation, an important epigenetic feature, is essential for determining the origin of metastatic tumors. We presented PathMethy, a novel Transformer model integrated with functional categories and crosstalk of pathways, to accurately trace the origin of tumors in CUP samples based on DNA methylation. PathMethy outperformed seven competing methods in F1-score across nine cancer datasets and accurately predicted the molecular subtypes within nine primary tumor types. It not only excelled at tracing the origins of both primary and metastatic tumors but also demonstrated a high degree of agreement with previously diagnosed sites in cases of CUP. PathMethy provided biological insights by highlighting key pathways, functional categories, and their interactions. Using functional categories of pathways, we gained a global understanding of biological processes. For broader access, a user-friendly web server for researchers and clinicians is available at https://cup.pathmethy.com.


Subjects
DNA Methylation , Neoplasms , Humans , Neoplasms/genetics , Software , Artificial Intelligence , Computational Biology/methods , Algorithms , Epigenesis, Genetic
3.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37204192

ABSTRACT

Accurately predicting the antigen-binding specificity of adaptive immune receptors (AIRs), such as T-cell receptors (TCRs) and B-cell receptors (BCRs), is essential for discovering new immune therapies. However, the diversity of AIR chain sequences limits the accuracy of current prediction methods. This study introduces SC-AIR-BERT, a pre-trained model that learns comprehensive sequence representations of paired AIR chains to improve binding specificity prediction. SC-AIR-BERT first learns the 'language' of AIR sequences through self-supervised pre-training on a large cohort of paired AIR chains from multiple single-cell resources. The model is then fine-tuned with a multilayer perceptron head for binding specificity prediction, employing the K-mer strategy to enhance sequence representation learning. Extensive experiments demonstrate the superior AUC performance of SC-AIR-BERT compared with current methods for TCR- and BCR-binding specificity prediction.
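
The K-mer strategy mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the value of k or whether the k-mers overlap, so the overlapping, stride-1 tokenization below, the function name `kmer_tokens`, and the example sequence are assumptions for illustration only, not the SC-AIR-BERT implementation.

```python
def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers (stride 1).

    Assumption: overlapping k-mers with k=3; the abstract does not
    specify how SC-AIR-BERT builds its k-mers.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields no complete k-mer.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical CDR3-like amino-acid string, used only for illustration.
tokens = kmer_tokens("CASSLG", k=3)
print(tokens)
```

Each k-mer would then typically be mapped to a vocabulary index before being fed to a sequence model.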


Subjects
Receptors, Antigen, B-Cell , Receptors, Antigen, T-Cell , Humans , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, B-Cell/genetics , Neural Networks, Computer , Antibody Specificity
4.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36946415

ABSTRACT

Colorectal cancer (CRC) is one of the most common gastrointestinal malignancies. There are few recurrence risk signatures for CRC patients. Single-cell RNA-sequencing (scRNA-seq) provides a high-resolution platform for prognostic signature detection. However, scRNA-seq is not practical in large cohorts due to its high cost, and most single-cell experiments lack clinical phenotype information. Few studies have used external bulk transcriptomes with survival time to guide the detection of key cell subtypes in scRNA-seq data. We proposed scRankXMBD, a computational framework to prioritize prognostic-associated cell subpopulations based on within-cell relative expression orderings of gene pairs from single-cell transcriptomes. scRankXMBD achieves higher precision and concordance compared with five existing methods. Moreover, we developed single-cell gene pair signatures to predict recurrence risk for patients individually. Our work facilitates the application of the rank-based method in scRNA-seq data for prognostic biomarker discovery and precision oncology. scRankXMBD is available at https://github.com/xmuyulab/scRank-XMBD. (XMBD: Xiamen Big Data, a biomedical open software initiative in the National Institute for Data Science in Health and Medicine, Xiamen University, China.)
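
As a rough illustration of what a within-cell relative expression ordering (REO) of gene pairs is, the sketch below records, for every gene pair in one cell, whether the first gene is expressed above the second. The function name, the gene labels, and the expression values are hypothetical, and this is not the scRankXMBD code.

```python
from itertools import combinations


def relative_expression_orderings(expression: dict[str, float]) -> dict[tuple[str, str], bool]:
    """For each gene pair (a, b), record whether a is expressed above b in this cell."""
    genes = sorted(expression)  # fixed pair order so cells are comparable
    return {(a, b): expression[a] > expression[b] for a, b in combinations(genes, 2)}


# Hypothetical expression values for a single cell.
cell = {"GENE_A": 5.0, "GENE_B": 1.2, "GENE_C": 3.4}
reo = relative_expression_orderings(cell)

# Orderings depend only on ranks, so rescaling the cell leaves them unchanged.
scaled = {gene: value * 10 for gene, value in cell.items()}
assert relative_expression_orderings(scaled) == reo
```

Because only the ordering within a cell is used, a signature built this way is insensitive to any monotone, cell-wide rescaling of the measurements, which is the usual motivation for rank-based methods.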


Subjects
Colorectal Neoplasms , Transcriptome , Humans , Gene Expression Profiling/methods , Prognosis , Precision Medicine , Software , Colorectal Neoplasms/genetics , Sequence Analysis, RNA
5.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35368072

ABSTRACT

Liquid chromatography-mass spectrometry-based quantitative proteomics can measure the expression of thousands of proteins from biological samples and has been increasingly applied in cancer research. Identifying differentially expressed proteins (DEPs) between tumors and normal controls is commonly used to investigate carcinogenesis mechanisms. While differential expression analysis (DEA) at an individual level is desired to identify patient-specific molecular defects for better patient stratification, most statistical DEP analysis methods only identify deregulated proteins at the population level. To date, robust individualized DEA algorithms have been proposed for ribonucleic acid data, but their performance on proteomics data is underexplored. Herein, we performed a systematic evaluation of five individualized DEA algorithms for proteins on cancer proteomic datasets from seven cancer types. Results show that the within-sample relative expression orderings (REOs) of protein pairs in normal tissues were highly stable, providing the basis for individualized DEA for proteins using REOs. Moreover, individualized DEA algorithms achieve higher precision in detecting sample-specific deregulated proteins than population-level methods. To facilitate the utilization of individualized DEA algorithms in proteomics for prognostic biomarker discovery and personalized medicine, we provide Individualized DEP Analysis IDEPAXMBD (XMBD: Xiamen Big Data, a biomedical open software initiative in the National Institute for Data Science in Health and Medicine, Xiamen University, China.) (https://github.com/xmuyulab/IDEPA-XMBD), which is a user-friendly and open-source Python toolkit that integrates individualized DEA algorithms for DEP-associated deregulation pattern recognition.


Subjects
Neoplasms , Proteome , Humans , Mass Spectrometry/methods , Neoplasms/genetics , Proteome/analysis , Proteomics/methods , Software
6.
BMC Bioinformatics ; 24(1): 387, 2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37821827

ABSTRACT

BACKGROUND: Metagenomic sequencing is an unbiased approach to pathogen detection that can potentially detect all known and unidentified strains. Recently, nanopore sequencing has emerged as a promising tool for rapid pathogen detection due to its fast turnaround time. However, identifying pathogens within species is nontrivial for nanopore sequencing data due to the high sequencing error rate. RESULTS: We developed the core gene alleles metagenome strain identification (cgMSI) tool, which uses a two-stage maximum a posteriori probability estimation method to detect pathogens at strain level from nanopore metagenomic sequencing data at low computational cost. The cgMSI tool can accurately identify strains and estimate relative abundance at 1× coverage. CONCLUSIONS: We developed cgMSI for nanopore metagenomic pathogen detection within species. cgMSI is available at https://github.com/ZHU-XU-xmu/cgMSI .


Subjects
Nanopore Sequencing , Nanopores , Metagenome , Alleles , Metagenomics/methods , High-Throughput Nucleotide Sequencing/methods
7.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33822895

ABSTRACT

Metagenomics data provide rich information for the detection of foodborne pathogens from food and environmental samples that are mixed with complex background bacteria strains. While pathogen detection from metagenomic sequencing data has become an activity of increasing interest, shotgun sequencing of uncultured food samples typically produces data that contain reads from many different organisms, making accurate strain typing a challenging task. Particularly, as many pathogens may contain a common set of genes that are highly similar to those from normal bacteria in food samples, traditional strain-level abundance profiling approaches do not perform well at detecting pathogens of very low abundance levels. To overcome this limitation, we propose an abundance correction method based on species-specific genomic regions to achieve high sensitivity and high specificity in target pathogen detection at low abundance.


Subjects
Bacteria/genetics , Bacteria/pathogenicity , Bacterial Infections/diagnosis , Foodborne Diseases/microbiology , High-Throughput Nucleotide Sequencing/methods , Metagenome , Metagenomics/methods , Whole Genome Sequencing/methods , Bacterial Infections/microbiology , Data Accuracy , Genome, Bacterial , Humans , Sensitivity and Specificity , Species Specificity
8.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-34020539

ABSTRACT

With the development of single-cell RNA sequencing (scRNA-seq) technology, it has become possible to perform large-scale transcript profiling for tens of thousands of cells in a single experiment. Many analysis pipelines have been developed for data generated from different high-throughput scRNA-seq platforms, bringing a new challenge to users to choose a proper workflow that is efficient, robust and reliable for a specific sequencing platform. Moreover, as the amount of public scRNA-seq data has increased rapidly, integrated analysis of scRNA-seq data from different sources has become increasingly popular. However, it remains unclear whether such integrated analysis would be biased if the data were processed by different upstream pipelines. In this study, we encapsulated seven existing high-throughput scRNA-seq data processing pipelines with Nextflow, a general integrative workflow management framework, and evaluated their performance in terms of running time, computational resource consumption and data analysis consistency using eight public datasets generated from five different high-throughput scRNA-seq platforms. Our work provides a useful guideline for the selection of scRNA-seq data processing pipelines based on their performance on different real datasets. In addition, these guidelines can serve as a performance evaluation framework for future developments in high-throughput scRNA-seq data processing.


Subjects
Algorithms , Databases, Nucleic Acid , RNA-Seq , RNA , Single-Cell Analysis , 3T3 Cells , Animals , HEK293 Cells , Humans , Mice , RNA/biosynthesis , RNA/genetics
9.
Bioinformatics ; 37(2): 265-267, 2021 Apr 19.
Article in English | MEDLINE | ID: mdl-33416868

ABSTRACT

SUMMARY: Currently, various software tools are used to support two mainstream workflows for data-independent acquisition (DIA) mass spectrometry (MS) data processing, namely, spectrum-centric scoring (SCS) and peptide-centric scoring (PCS). However, a fully automatic, easily reproducible and freely accessible pipeline that simultaneously integrates SCS and PCS strategies and supports both library-free and library-based modes is absent. We developed Diamond, a Nextflow-based, containerized, multi-modal DIA-MS data processing pipeline for peptide identification and quantification. Diamond integrated two mainstream workflows for DIA data analysis, namely, SCS and PCS, for use cases both with and without assay libraries. This multi-modal pipeline serves as a versatile, easy-to-use and easily extendable toolbox for large-scale DIA data processing. AVAILABILITY: Diamond is hosted on GitHub (https://github.com/xmuyulab/Diamond) and is released under the highly permissive MIT license to encourage further customization and modification. The Docker image for Diamond is freely accessible at https://hub.docker.com/r/zeroli/diamond.

10.
J Biomed Inform ; 130: 104093, 2022 06.
Article in English | MEDLINE | ID: mdl-35537690

ABSTRACT

Random noise, sampling biases, and batch effects often confound true biological variation in single-cell RNA-sequencing (scRNA-seq) data. Adjusting for such biases is key to robust discoveries in downstream analyses, such as cell clustering, gene selection and data integration. Here we propose a model-based downsampling algorithm based on minimal unbiased representative points (MURPXMBD). MURPXMBD is designed to retrieve a set of representative points by reducing gene-wise random independent errors while retaining the covariance structure of biological origin, and hence provides an unbiased representation of the cell population. Subsequent validation using benchmark datasets shows that MURPXMBD can improve the quality and accuracy of clustering algorithms, and thus facilitate the discovery of new cell types. In addition, MURPXMBD improves the performance of dataset integration algorithms. In summary, MURPXMBD serves as a useful noise-reduction method for single-cell sequencing analysis in biomedical studies.


Subjects
Single-Cell Analysis , Transcriptome , Algorithms , Cluster Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
11.
Bioinformatics ; 36(17): 4551-4559, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32458976

ABSTRACT

MOTIVATION: Per-base quality values in Next Generation Sequencing data take a significant portion of storage even after compression. Lossy compression technologies could further reduce the space used by quality values. However, in many applications, lossless compression is still desired. Hence, sequencing data in multiple file formats have to be prepared for different applications. RESULTS: We developed a scalable lossy to lossless compression solution for quality values named ScaleQC (Scalable Quality value Compression). ScaleQC provides so-called bit-stream level scalability: the bit-stream losslessly compressed by ScaleQC can be further truncated to lower data rates without incurring an expensive transcoding operation. Despite its scalability, ScaleQC still achieves comparable compression performance at both lossless and lossy data rates compared to the existing lossless or lossy compressors. AVAILABILITY AND IMPLEMENTATION: ScaleQC has been integrated with SAMtools as a special quality value encoding mode for CRAM. Its source code can be obtained from our integrated SAMtools (https://github.com/xmuyulab/samtools) with dependency on integrated HTSlib (https://github.com/xmuyulab/htslib). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Compression , Algorithms , High-Throughput Nucleotide Sequencing , Software
12.
BMC Bioinformatics ; 21(1): 321, 2020 Jul 20.
Article in English | MEDLINE | ID: mdl-32689929

ABSTRACT

BACKGROUND: Recent advancements in high-throughput sequencing technologies have generated an unprecedented amount of genomic data that must be stored, processed, and transmitted over the network for sharing. Lossy genomic data compression, especially of the base quality values of sequencing data, is emerging as an efficient way to handle this challenge due to its superior compression performance compared to lossless compression methods. Many lossy compression algorithms have been developed for and evaluated using DNA sequencing data. However, whether these algorithms can be used on RNA sequencing (RNA-seq) data remains unclear. RESULTS: In this study, we evaluated the impacts of lossy quality value compression on common RNA-seq data analysis pipelines including expression quantification, transcriptome assembly, and short variant detection using RNA-seq data from different species and sequencing platforms. Our study shows that lossy quality value compression could effectively improve RNA-seq data compression. In some cases, lossy algorithms achieved up to 1.2-3 times further reduction on the overall RNA-seq data size compared to existing lossless algorithms. However, lossy quality value compression could affect the results of some RNA-seq data processing pipelines, and hence its impact on RNA-seq studies cannot be ignored in some cases. Pipelines using HISAT2 for alignment were most significantly affected by lossy quality value compression, while the effects of lossy compression on pipelines that do not depend on quality values, e.g., STAR-based expression quantification and transcriptome assembly pipelines, were not observed. Moreover, regardless of whether STAR or HISAT2 was used as the aligner, variant detection results were affected by lossy quality value compression, albeit to a lesser extent when a STAR-based pipeline was used. Our results also show that the impacts of lossy quality value compression depend on the compression algorithms being used and the compression levels if the algorithm supports setting of multiple compression levels. CONCLUSIONS: Lossy quality value compression can be incorporated into existing RNA-seq analysis pipelines to alleviate the data storage and transmission burdens. However, care should be taken in the selection of compression tools and levels based on the requirements of the downstream analysis pipelines to avoid introducing undesirable adverse effects on the analysis results.
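
To make the idea of lossy quality value compression concrete, the sketch below applies simple uniform binning to a Phred+33 quality string, which shrinks the alphabet a downstream compressor has to encode. This is a generic illustration under assumed parameters (the bin width of 10 and the function name are hypothetical); it is not one of the algorithms evaluated in the study.

```python
def bin_quality_string(qual: str, bin_width: int = 10, offset: int = 33) -> str:
    """Lossily quantize a Phred+33 quality string by uniform binning.

    Assumption: each score is mapped to the lower edge of its bin;
    real tools use their own, more elaborate quantization schemes.
    """
    binned = []
    for ch in qual:
        q = ord(ch) - offset                  # decode the Phred score
        q_binned = (q // bin_width) * bin_width
        binned.append(chr(q_binned + offset))  # re-encode as Phred+33
    return "".join(binned)


original = "IIIIHGF#"  # hypothetical quality string (Q40, Q39, Q38, Q37, Q2)
lossy = bin_quality_string(original)
print(lossy)
print(len(set(original)), "distinct symbols before,", len(set(lossy)), "after")
```

The read length is preserved, but the original scores cannot be recovered, which is why the choice of compression level matters for quality-aware tools such as variant callers.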


Subjects
Algorithms , Data Compression/methods , Data Compression/standards , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Base Sequence , Gene Expression Profiling , Genome, Human , Humans
13.
Cell Rep Methods ; 4(6): 100797, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38889685

ABSTRACT

Cancer of unknown primary (CUP) represents metastatic cancer where the primary site remains unidentified despite standard diagnostic procedures. To determine the tumor origin in such cases, we developed BPformer, a deep learning method integrating the transformer model with prior knowledge of biological pathways. Trained on transcriptomes from 10,410 primary tumors across 32 cancer types, BPformer achieved remarkable accuracy rates of 94%, 92%, and 89% in primary tumors and primary and metastatic sites of metastatic tumors, respectively, surpassing existing methods. Additionally, BPformer was validated in a retrospective study, demonstrating consistency with tumor sites diagnosed through immunohistochemistry and histopathology. Furthermore, BPformer was able to rank pathways based on their contribution to tumor origin identification, which helped to classify oncogenic signaling pathways into those that are highly conserved across different cancers versus those that are highly variable depending on their origins.


Subjects
Neoplasms, Unknown Primary , Humans , Neoplasms, Unknown Primary/genetics , Neoplasms, Unknown Primary/pathology , Neoplasms, Unknown Primary/metabolism , Neoplasms, Unknown Primary/diagnosis , Signal Transduction/genetics , Transcriptome , Deep Learning , Retrospective Studies
14.
Neural Netw ; 172: 106151, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38301339

ABSTRACT

Representation learning on temporal interaction graphs (TIG) aims to model complex networks with the dynamic evolution of interactions on a wide range of web and social graph applications. However, most existing works on TIG either (a) rely on node embeddings that are updated discretely, only when an interaction occurs, and thus fail to capture the continuous evolution of the embedding trajectories of nodes, or (b) overlook the rich temporal patterns hidden in the ever-changing graph data, which presumably leads to sub-optimal models. In this paper, we propose a two-module framework named ConTIG, a novel representation learning method on TIG that captures the continuous dynamic evolution of node embedding trajectories. With two essential modules, our model exploits three-fold factors in dynamic networks including latest interaction, neighbor features, and inherent characteristics. In the first update module, we employ a continuous inference block to learn the nodes' state trajectories from time-adjacent interaction patterns using ordinary differential equations. In the second transform module, we introduce a self-attention mechanism to predict future node embeddings by aggregating historical temporal interaction information. Experimental results demonstrate the superiority of ConTIG on temporal link prediction, temporal node recommendation, and dynamic node classification tasks of four datasets compared with a range of state-of-the-art baselines, especially for long-interval interaction prediction.


Subjects
Machine Learning
15.
Cancer Cell ; 42(8): 1415-1433.e12, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39029466

ABSTRACT

The tumor microenvironment (TME) has a significant impact on tumor growth and immunotherapy efficacies. However, the precise cellular interactions and spatial organizations within the TME that drive these effects remain elusive. Using advanced multiplex imaging techniques, we have discovered that regulatory T cells (Tregs) accumulate around lymphatic vessels in the peripheral tumor stroma. This localized accumulation is facilitated by mature dendritic cells enriched in immunoregulatory molecules (mregDCs), which promote chemotaxis of Tregs, establishing a peri-lymphatic Treg-mregDC niche. Within this niche, mregDCs facilitate Treg activation, which in turn restrains the trafficking of tumor antigens to the draining mesenteric lymph nodes, thereby impeding the initiation of anti-tumor adaptive immune responses. Disrupting Treg recruitment to mregDCs inhibits tumor progression. Our study provides valuable insights into the organization of TME and how local crosstalk between lymphoid and myeloid cells suppresses anti-tumor immune responses.


Subjects
Dendritic Cells , T-Lymphocytes, Regulatory , Tumor Microenvironment , T-Lymphocytes, Regulatory/immunology , Animals , Tumor Microenvironment/immunology , Mice , Dendritic Cells/immunology , Dendritic Cells/metabolism , Humans , Antigens, Neoplasm/immunology , Antigens, Neoplasm/metabolism , Lymphatic Vessels/immunology , Lymphatic Vessels/metabolism , Mice, Inbred C57BL , Lymph Nodes/immunology , Cell Line, Tumor , Neoplasms/immunology , Neoplasms/metabolism
16.
Nat Commun ; 15(1): 7362, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39191725

ABSTRACT

We evaluate deconvolution methods, which infer levels of immune infiltration from bulk expression of tumor samples, through a community-wide DREAM Challenge. We assess six published and 22 community-contributed methods using in vitro and in silico transcriptional profiles of admixed cancer and healthy immune cells. Several published methods predict most cell types well, though they either were not trained to evaluate all functional CD8+ T cell states or do so with low accuracy. Several community-contributed methods address this gap, including a deep learning-based approach, whose strong performance establishes the applicability of this paradigm to deconvolution. Despite being developed largely using immune cells from healthy tissues, deconvolution methods predict levels of tumor-derived immune cells well. Our admixed and purified transcriptional profiles will be a valuable resource for developing deconvolution methods, including in response to common challenges we observe across methods, such as sensitive identification of functional CD4+ T cell states.


Subjects
CD4-Positive T-Lymphocytes , CD8-Positive T-Lymphocytes , Neoplasms , Humans , CD8-Positive T-Lymphocytes/metabolism , CD4-Positive T-Lymphocytes/metabolism , Neoplasms/genetics , Neoplasms/immunology , Neoplasms/pathology , Gene Expression Profiling/methods , Transcriptome , Deep Learning , Computational Biology/methods , Lymphocytes, Tumor-Infiltrating/immunology , Gene Expression Regulation, Neoplastic
17.
IEEE Trans Med Imaging ; 42(8): 2462-2473, 2023 08.
Article in English | MEDLINE | ID: mdl-37028064

ABSTRACT

Cancer survival prediction requires exploiting related multimodal information (e.g., pathological, clinical and genomic features), and it is even more challenging in clinical practice due to the incompleteness of patients' multimodal data. Furthermore, existing methods lack sufficient intra- and inter-modal interactions, and suffer from significant performance degradation caused by missing modalities. This manuscript proposes a novel hybrid graph convolutional network, entitled HGCN, which is equipped with an online masked autoencoder paradigm for robust multimodal cancer survival prediction. Particularly, we pioneer modeling patients' multimodal data into flexible and interpretable multimodal graphs with modality-specific preprocessing. HGCN integrates the advantages of graph convolutional networks (GCNs) and a hypergraph convolutional network (HCN) through node message passing and a hyperedge mixing mechanism to facilitate intra-modal and inter-modal interactions between multimodal graphs. With HGCN, the potential for multimodal data to create more reliable predictions of patient survival risk is dramatically increased compared to prior methods. Most importantly, to compensate for missing patient modalities in clinical scenarios, we incorporated an online masked autoencoder paradigm into HGCN, which can effectively capture intrinsic dependence between modalities and seamlessly generate missing hyperedges for model inference. Extensive experiments and analysis on six cancer cohorts from TCGA show that our method significantly outperforms state-of-the-art methods in both complete and missing modal settings. Our code is available at https://github.com/lin-lcx/HGCN.


Subjects
Genomics , Neoplasms , Humans , Neoplasms/diagnostic imaging
18.
IEEE Trans Med Imaging ; 42(5): 1337-1348, 2023 05.
Article in English | MEDLINE | ID: mdl-37015475

ABSTRACT

Multi-instance learning (MIL) is widely adopted for automatic whole slide image (WSI) analysis and it usually consists of two stages, i.e., instance feature extraction and feature aggregation. However, due to the "weak supervision" of slide-level labels, the feature aggregation stage would suffer from severe over-fitting in training an effective MIL model. In this case, mining more information from limited slide-level data is pivotal to WSI analysis. Different from previous works on improving instance feature extraction, this paper investigates how to exploit the latent relationship of different instances (patches) to combat overfitting in MIL for more generalizable WSI classification. In particular, we propose a novel Multi-instance Reinforcement Contrastive Learning framework (MuRCL) to deeply mine the inherent semantic relationships of different patches to advance WSI classification. Specifically, the proposed framework is first trained in a self-supervised manner and then finetuned with WSI slide-level labels. We formulate the first stage as a contrastive learning (CL) process, where positive/negative discriminative feature sets are constructed from the same patch-level feature bags of WSIs. To facilitate the CL training, we design a novel reinforcement learning-based agent to progressively update the selection of discriminative feature sets according to an online reward for slide-level feature aggregation. Then, we further update the model with labeled WSI data to regularize the learned features for the final WSI classification. Experimental results on three public WSI classification datasets (Camelyon16, TCGA-Lung and TCGA-Kidney) demonstrate that the proposed MuRCL outperforms state-of-the-art MIL models. In addition, MuRCL can achieve comparable performance to other state-of-the-art MIL models on the TCGA-Esca dataset.


Subjects
Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Datasets as Topic , Lung/diagnostic imaging , Kidney/diagnostic imaging
19.
Research (Wash D C) ; 6: 0179, 2023.
Article in English | MEDLINE | ID: mdl-37377457

ABSTRACT

Data-independent acquisition (DIA) technology for protein identification from mass spectrometry and related algorithms is developing rapidly. The spectrum-centric analysis of DIA data without the use of a spectral library from data-dependent acquisition data represents a promising direction. In this paper, we proposed an untargeted analysis method, Dear-DIAXMBD, for direct analysis of DIA data. Dear-DIAXMBD first integrates the deep variational autoencoder and triplet loss to learn the representations of the extracted fragment ion chromatograms, then uses the k-means clustering algorithm to aggregate fragments with similar representations into the same classes, and finally establishes the inverted index tables to determine the precursors of fragment clusters between precursors and peptides and between fragments and peptides. We show that Dear-DIAXMBD achieves superior performance on highly complicated DIA data of different species obtained by different instrument platforms. Dear-DIAXMBD is publicly available at https://github.com/jianweishuai/Dear-DIA-XMBD.

20.
Patterns (N Y) ; 3(5): 100509, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35607625

ABSTRACT

There is an increasing risk of people using advanced artificial intelligence, particularly the generative adversarial network (GAN), to manipulate scientific images for publication. We demonstrate this possibility by using a GAN to fabricate several different types of biomedical images and discuss possible ways to detect and prevent such scientific misconduct in research communities.
