ABSTRACT
Recent advances in foundation models have revolutionized model development in digital pathology, reducing dependence on the extensive manual annotations required by traditional methods. The ability of foundation models to generalize well with few-shot learning addresses critical barriers in adapting models to diverse medical imaging tasks. This work presents the Granular Box Prompt Segment Anything Model (GB-SAM), an improved version of the Segment Anything Model (SAM) fine-tuned using granular box prompts with limited training data. GB-SAM aims to reduce dependency on expert pathologist annotators by making the automated annotation process more efficient. Granular box prompts are small box regions derived from ground-truth masks, conceived to replace the conventional approach of using a single large box covering the entire H&E-stained image patch. This method allows a localized and detailed analysis of gland morphology, enhancing the segmentation accuracy of individual glands and reducing the ambiguity that larger boxes can introduce in morphologically complex regions. We compared the performance of GB-SAM against U-Net trained on different sizes of the CRAG dataset and evaluated both models across histopathological datasets, including CRAG, GlaS, and Camelyon16. GB-SAM consistently outperformed U-Net and showed less degradation in segmentation performance as training data were reduced. Specifically, on the CRAG dataset, GB-SAM achieved a Dice coefficient of 0.885 compared with U-Net's 0.857 when trained on 25% of the data. GB-SAM also demonstrated segmentation stability on the CRAG testing dataset and superior generalization to unseen datasets, including the challenging lymph node segmentation task in Camelyon16, where it achieved a Dice coefficient of 0.740 versus U-Net's 0.491. Furthermore, GB-SAM showed competitive performance relative to SAM-Path and Med-SAM: it achieved a Dice score of 0.900 on the CRAG dataset, while SAM-Path achieved 0.884; on the GlaS dataset, Med-SAM reported a Dice score of 0.956, whereas GB-SAM achieved 0.885 with significantly less training data. These results highlight GB-SAM's advanced segmentation capabilities and reduced dependency on large annotated datasets, indicating its potential for practical deployment in digital pathology, particularly in settings with limited annotated data.
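As a minimal illustration of the prompting idea (not the authors' released code), the sketch below derives one box prompt per gland from the connected components of a ground-truth mask, instead of one large box over the whole patch, and scores a binary prediction with the Dice coefficient used in the evaluation; the function names are hypothetical.

```python
# Illustrative sketch: granular (per-gland) box prompts and the Dice metric.
import numpy as np
from skimage.measure import label, regionprops

def granular_box_prompts(gt_mask: np.ndarray) -> list[tuple[int, int, int, int]]:
    """Return one bounding box (min_row, min_col, max_row, max_col) per gland,
    derived from the connected components of the ground-truth mask."""
    return [region.bbox for region in regionprops(label(gt_mask > 0))]

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice overlap between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```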
ABSTRACT
The interaction between antigens and antibodies (B cell receptors, BCRs) is the key step underlying the function of the humoral immune system in various biological contexts. The capability to profile the antigen-binding affinity landscape of a vast number of BCRs would reveal novel insights at unprecedented depth and yield powerful tools for translational development. However, current experimental approaches for profiling antibody-antigen interactions are costly and time-consuming, and can only achieve low-to-mid throughput. On the other hand, bioinformatics tools in antibody informatics mostly focus on optimizing antibodies against known binding antigens, which is a very different research question of limited scope. In this work, we developed an innovative artificial intelligence tool, Cmai, to predict binding between antibodies and antigens in a way that scales to high-throughput sequencing data. Cmai achieved an AUROC of 0.91 in our validation cohort. We devised a biomarker metric based on the output of Cmai applied to high-throughput BCR sequencing data. We found that, during immune-related adverse events (irAEs) caused by immune-checkpoint inhibitor (ICI) treatment, humoral immunity is preferentially responsive to intracellular antigens from the organs affected by the irAEs. In contrast, extracellular antigens on malignant tumor cells induce B cell infiltration, and the infiltrating B cells have a greater tendency to co-localize with tumor cells expressing these antigens. We further found that the abundance of tumor antigen-targeting antibodies is predictive of ICI treatment response. Overall, Cmai and our biomarker approach fill a gap not addressed by current antibody-optimization work or by tools such as AlphaFold3, which predict the structures of complexes of proteins already known to bind.
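For reference, a validation AUROC of this kind is computed from per-pair binding labels and model scores. The snippet below is a generic illustration with made-up numbers, not Cmai's evaluation code.

```python
# Generic AUROC computation for binder/non-binder predictions (illustrative data).
import numpy as np
from sklearn.metrics import roc_auc_score

binding_labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])       # 1 = antibody binds the antigen
predicted_scores = np.array([0.92, 0.15, 0.74, 0.88, 0.33, 0.80, 0.69, 0.22])  # model scores
print(f"AUROC = {roc_auc_score(binding_labels, predicted_scores):.2f}")
```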
ABSTRACT
Existing natural language processing (NLP) methods for converting free-text clinical notes into structured data often require problem-specific annotations and model training. This study aims to evaluate ChatGPT's capacity to extract information from free-text medical notes efficiently and comprehensively. We developed a large language model (LLM)-based workflow, using systems engineering methodology and a spiral "prompt engineering" process, that leverages OpenAI's API to batch-query ChatGPT. We evaluated the effectiveness of this method using a dataset of more than 1000 lung cancer pathology reports and a dataset of 191 pediatric osteosarcoma pathology reports, comparing the ChatGPT-3.5 (gpt-3.5-turbo-16k) outputs with expert-curated structured data. ChatGPT-3.5 extracted pathological classifications with an overall accuracy of 89% on the lung cancer dataset, outperforming two traditional NLP methods. Performance was influenced by the design of the instructive prompt. Our case analysis shows that most misclassifications were due to the model's lack of highly specialized pathology terminology and to erroneous interpretation of TNM staging rules. Reproducibility tests showed relatively stable performance of ChatGPT-3.5 over time. On the pediatric osteosarcoma dataset, ChatGPT-3.5 accurately classified both grade and margin status, with accuracies of 98.6% and 100%, respectively. Our study shows the feasibility of using ChatGPT to process large volumes of clinical notes for structured information extraction without extensive task-specific human annotation and model training. The results underscore the potential role of LLMs in transforming unstructured healthcare data into structured formats, thereby supporting research and aiding clinical decision-making.
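A minimal sketch of the batch-querying pattern described above, written against the current openai Python client; the prompt wording, output fields, and report handling are illustrative assumptions, not the study's actual workflow or prompts.

```python
# Illustrative only: instruction text and JSON fields below are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INSTRUCTION = (
    "Extract the histologic type, grade, and pathologic T and N stage from the "
    "pathology report below. Respond as JSON with keys: histology, grade, pT, pN."
)

def extract_fields(report_text: str) -> str:
    """Send one free-text report to the chat model and return its structured answer."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        temperature=0,
        messages=[
            {"role": "system", "content": "You extract structured data from pathology reports."},
            {"role": "user", "content": f"{INSTRUCTION}\n\n{report_text}"},
        ],
    )
    return response.choices[0].message.content

reports = ["Lung, right upper lobe, lobectomy: invasive adenocarcinoma ..."]  # free-text notes
structured = [extract_fields(r) for r in reports]  # batch over the whole corpus
```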
ABSTRACT
Recent advancements in tissue imaging techniques have facilitated the visualization and identification of various cell types within physiological and pathological contexts. Despite the emergence of cell-cell interaction studies, there is a lack of methods for evaluating individual spatial interactions. In this study, we introduce Ceograph, a cell spatial organization-based graph convolutional network designed to analyze cell spatial organization (e.g., cell spatial distribution, morphology, proximity, and interactions) derived from pathology images. Ceograph identifies key cell spatial organization features by accurately predicting their influence on patient clinical outcomes. In patients with oral potentially malignant disorders, our model highlights reduced structural concordance and increased closeness in epithelial substrata as driving features of an elevated risk of malignant transformation. In lung cancer patients, Ceograph detects elongated tumor nuclei and diminished stroma-stroma closeness as biomarkers of insensitivity to EGFR tyrosine kinase inhibitors. With its potential to predict various clinical outcomes, Ceograph offers a deeper understanding of biological processes and supports the development of personalized therapeutic strategies.
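The general idea of a cell spatial graph (nodes are cells carrying type and morphology attributes, edges connect spatially close cells) can be sketched as below; the radius threshold and attribute choices are assumptions for illustration, not Ceograph's actual graph construction.

```python
# Hedged sketch of a cell spatial graph from nucleus centroids and cell-type labels.
import numpy as np
import networkx as nx
from scipy.spatial import cKDTree

def build_cell_graph(centroids: np.ndarray, cell_types: list[str], radius: float = 50.0) -> nx.Graph:
    """centroids: (N, 2) nucleus coordinates; cell_types: length-N labels from a classifier."""
    graph = nx.Graph()
    for i, (xy, ctype) in enumerate(zip(centroids, cell_types)):
        graph.add_node(i, pos=tuple(xy), cell_type=ctype)   # node attributes
    tree = cKDTree(centroids)
    for i, j in tree.query_pairs(r=radius):                 # edges between nearby cells
        graph.add_edge(i, j)
    return graph
```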
Subjects
Deep Learning, Lung Neoplasms, Humans, Cell Communication, Cell Nucleus, Lung Neoplasms/diagnostic imaging
ABSTRACT
Patient-derived xenografts (PDX) remain valuable models for understanding disease biology and for developing novel therapeutics. To expand current PDX models of childhood leukemia, we developed new PDX models from Hispanic patients, a subgroup with poorer overall outcomes. Of 117 primary leukemia samples obtained, successful engraftment and serial passage in mice were achieved for 82 samples (70%). Hispanic patient samples engrafted at a rate (51/73, 70%) similar to that of non-Hispanic patient samples (31/45, 70%). Using a new algorithm to remove mouse contamination from multi-omics datasets, including methylation data, we found that the PDX models faithfully reflected the somatic mutations, copy-number alterations, RNA expression, gene fusions, whole-genome methylation patterns, and immunophenotypes found in primary tumor (PT) samples for the first 50 models reported here. This cohort of characterized PDX childhood leukemias represents a valuable resource in that germline DNA sequencing has allowed the unambiguous determination of somatic mutations in both PT and PDX samples.
ABSTRACT
PURPOSE: Osteosarcoma research advancement requires enhanced data integration across different modalities and sources. Current osteosarcoma research, encompassing clinical, genomic, protein, and tissue imaging data, is hindered by the siloed landscape of data generation and storage. MATERIALS AND METHODS: Clinical, molecular profiling, and tissue imaging data for 573 patients with pediatric osteosarcoma were collected from four public and institutional sources. A common data model incorporating standardized terminology was created to facilitate the transformation, integration, and loading of source data into a relational database. On the basis of this database, a data commons with a user-friendly web portal was developed, enabling a range of data exploration and analytics functions. RESULTS: The Osteosarcoma Explorer (OSE) was released to the public in 2021. Leveraging a comprehensive and harmonized data set on the backend, the OSE offers a wide range of functions, including Cohort Discovery, Patient Dashboard, Image Visualization, and Online Analysis. Since its initial release, the OSE has seen increasing utilization by the osteosarcoma research community and has provided solid, continuous user support. To our knowledge, the OSE is the largest (N = 573) and most comprehensive research data commons for pediatric osteosarcoma, a rare disease. This project demonstrates an effective framework for data integration and data commons development that can be readily applied to other projects with similar goals. CONCLUSION: The OSE offers an online exploration and analysis platform for integrated clinical, molecular profiling, and tissue imaging data on osteosarcoma. Its underlying data model, database, and web framework support continuous expansion onto new data modalities and sources.
Subjects
Data Management, Osteosarcoma, Child, Humans, Factual Databases, Genomics, Osteosarcoma/diagnostic imaging, Osteosarcoma/genetics
ABSTRACT
Head and neck squamous cell carcinoma (HNSCC), specifically in the oral cavity (oral squamous cell carcinoma, OSCC), is a common, complex cancer that significantly affects patients' quality of life. Early diagnosis typically improves prognosis, yet it relies on pathologist examination of histology images, which exhibits high inter- and intra-observer variation. The advent of deep learning has automated this analysis, notably with object segmentation. However, techniques for automated oral dysplasia diagnosis have been limited to shape or cell-stain information and have not addressed the diagnostic potential of counting the number of cell layers in the oral epithelium. Our study addresses this gap by combining the existing U-Net and HD-Staining architectures to segment the oral epithelium and by introducing a novel algorithm, which we call Onion Peeling, to count the number of epithelial layers. Experimental results show a close correlation between our algorithmic and expert manual layer counts, demonstrating the feasibility of automated layer counting. We also show the clinical relevance of oral epithelial layer number to grading oral dysplasia severity through survival analysis. Overall, our study shows that automated counting of oral epithelium layers is a potential addition to the digital pathology toolbox. Model generalizability and accuracy could be improved further with a larger training dataset.
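One possible reading of an "onion peeling" layer count, offered only as a hedged interpretation and not the published algorithm, is to erode the segmented epithelium mask by roughly one cell diameter per pass and count the passes until the mask disappears.

```python
# Hedged sketch of an onion-peeling-style layer count (interpretation, not the paper's method).
import numpy as np
from scipy.ndimage import binary_erosion, generate_binary_structure, iterate_structure

def count_layers(epithelium_mask: np.ndarray, cell_diameter_px: int = 15) -> int:
    """Count how many roughly one-cell-thick 'peels' fit inside the epithelium mask."""
    structure = iterate_structure(generate_binary_structure(2, 1), cell_diameter_px // 2)
    mask = epithelium_mask.astype(bool)
    layers = 0
    while mask.any():
        mask = binary_erosion(mask, structure=structure)  # peel off one cell-thick layer
        layers += 1
    return layers
```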
ABSTRACT
Microscopic examination of pathology slides is essential to disease diagnosis and biomedical research. However, traditional manual examination of tissue slides is laborious and subjective. Tumor whole-slide image (WSI) scanning is becoming part of routine clinical procedures and produces massive data that capture tumor histologic details at high resolution. Furthermore, the rapid development of deep learning algorithms has significantly increased the efficiency and accuracy of pathology image analysis. In light of this progress, digital pathology is fast becoming a powerful tool to assist pathologists. Studying tumor tissue and its surrounding microenvironment provides critical insight into tumor initiation, progression, metastasis, and potential therapeutic targets. Nucleus segmentation and classification are critical to pathology image analysis, especially in characterizing and quantifying the tumor microenvironment (TME). Computational algorithms have been developed for nucleus segmentation and TME quantification within image patches, but existing algorithms are computationally intensive and time-consuming for WSI analysis. This study presents Histology-based Detection using Yolo (HD-Yolo), a new method that significantly accelerates nucleus segmentation and TME quantification. We demonstrate that HD-Yolo outperforms existing WSI analysis methods in nucleus detection, classification accuracy, and computation time. We validated the advantages of the system on three different tissue types: lung cancer, liver cancer, and breast cancer. For breast cancer, nucleus features extracted by HD-Yolo were more prognostically significant than both estrogen receptor and progesterone receptor status assessed by immunohistochemistry. The WSI analysis pipeline and a real-time nucleus segmentation viewer are available at https://github.com/impromptuRong/hd_wsi.
Subjects
Breast Neoplasms, Deep Learning, Humans, Female, Tumor Microenvironment, Algorithms, Computer-Assisted Image Processing/methods, Breast Neoplasms/pathology
ABSTRACT
Polyploidy, the duplication of the entire genome within a single cell, is a significant characteristic of cells in many tissues, including the liver. The quantification of hepatic ploidy typically relies on flow cytometry and immunofluorescence (IF) imaging, which are not widely available in clinical settings due to high financial and time costs. To improve accessibility for clinical samples, we developed a computational algorithm to quantify hepatic ploidy using hematoxylin-eosin (H&E) histopathology images, which are commonly obtained during routine clinical practice. Our algorithm uses a deep learning model to first segment and classify different types of cell nuclei in H&E images. It then determines cellular ploidy based on the relative distance between identified hepatocyte nuclei and determines nuclear ploidy using a fitted Gaussian mixture model. The algorithm can establish the total number of hepatocytes and their detailed ploidy information in a region of interest (ROI) on H&E images. This is the first successful attempt to automate ploidy analysis on H&E images. Our algorithm is expected to serve as an important tool for studying the role of polyploidy in human liver disease.
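The nuclear-ploidy step can be illustrated with a generic Gaussian mixture fit; the choice of log-transformed nuclear area as the input feature and three ploidy classes are assumptions for this sketch, not the paper's exact implementation.

```python
# Illustrative sketch: calling nuclear ploidy classes from hepatocyte nuclear areas with a GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

def call_nuclear_ploidy(nuclear_areas_um2: np.ndarray, n_classes: int = 3) -> np.ndarray:
    """Return a ploidy-class index per nucleus, ordered by increasing mean nuclear area."""
    log_area = np.log(nuclear_areas_um2).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_classes, random_state=0).fit(log_area)
    order = np.argsort(gmm.means_.ravel())        # smallest mean -> class 0 (e.g., 2n)
    remap = np.empty(n_classes, dtype=int)
    remap[order] = np.arange(n_classes)
    return remap[gmm.predict(log_area)]
```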
Subjects
Deep Learning, Humans, Eosine Yellowish-(YS), Hematoxylin, Liver, Ploidies, Polyploidy
ABSTRACT
Over the past decade, many new cancer treatments have been developed and made available to patients. However, in most cases, these treatments benefit only a specific subgroup of patients, making the selection of treatment for an individual patient an essential but challenging task for oncologists. Although some biomarkers have been found to be associated with treatment response, manual assessment is time-consuming and subjective. With the rapid development and expanding implementation of artificial intelligence (AI) in digital pathology, many biomarkers can now be quantified automatically from histopathology images. This approach allows a more efficient and objective assessment of biomarkers, aiding oncologists in formulating personalized treatment plans for cancer patients. This review presents an overview and summary of recent studies on biomarker quantification and treatment response prediction using hematoxylin-eosin (H&E)-stained pathology images. These studies show that an AI-based digital pathology approach can be practical and will become increasingly important in improving the selection of cancer treatments for patients.
Subjects
Deep Learning, Neoplasms, Humans, Artificial Intelligence, Precision Medicine/methods, Neoplasms/therapy, Neoplasms/pathology
ABSTRACT
Whole-slide imaging is becoming a routine procedure in clinical diagnosis. Advanced image analysis techniques have been developed to assist pathologists in disease diagnosis, staging, subtype classification, and risk stratification. Recently, deep learning algorithms have achieved state-of-the-art performance in various image analysis tasks, including tumor region segmentation, nuclei detection, and disease classification. However, widespread clinical use of these algorithms is hampered because their performance often degrades due to image quality issues commonly seen in real-world pathology imaging data, such as low resolution, blurred regions, and staining variation. Restore-GAN, a deep learning model based on a generative adversarial network (GAN), was developed to improve image quality by restoring blurred regions, enhancing low resolution, and normalizing staining colors. The results demonstrate that Restore-GAN can significantly improve image quality, which in turn improves the robustness and performance of existing deep learning algorithms for pathology image analysis. Restore-GAN thus has the potential to facilitate the application of deep learning models in digital pathology analyses.
Subjects
Algorithms, Pathologists, Humans, Cell Nucleus, Computer-Assisted Image Processing, Staining and Labeling
ABSTRACT
Tyrosine kinase inhibitors (TKIs) targeting the epidermal growth factor receptor (EGFR) are effective for many patients with lung cancer harboring EGFR mutations. However, not all patients respond to EGFR TKIs, even among those harboring EGFR-sensitizing mutations. In this study, we quantified the cells and cellular interaction features of the tumor microenvironment (TME) using routine H&E-stained biopsy sections. These TME features were used to develop a model that predicts survival benefit from EGFR TKI therapy in patients with lung adenocarcinoma and EGFR-sensitizing mutations in the Lung Cancer Mutation Consortium 1 (LCMC1) cohort; the model was validated in an independent LCMC2 cohort. In the validation data set, EGFR TKI treatment prolonged survival in the predicted-to-benefit group but not in the predicted-not-to-benefit group. Among patients treated with EGFR TKIs, the predicted-to-benefit group had longer survival than the predicted-not-to-benefit group. The EGFR TKI survival benefit correlated positively with tumor-tumor interaction image features and negatively with tumor-stroma interaction features. Moreover, tumor-stroma interaction was associated with higher activation of the hepatocyte growth factor/MET-mediated PI3K/AKT signaling pathway and the epithelial-mesenchymal transition process, supporting the hypothesis of fibroblast-involved resistance to EGFR TKI treatment.
Subjects
Lung Neoplasms, Phosphatidylinositol 3-Kinases, Humans, Phosphatidylinositol 3-Kinases/genetics, Tumor Microenvironment/genetics, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/therapeutic use, Lung Neoplasms/drug therapy, Lung Neoplasms/genetics, Lung Neoplasms/metabolism, ErbB Receptors/metabolism, Antineoplastic Drug Resistance/genetics, Mutation
ABSTRACT
Background: Current treatment guidelines for stage IV non-small cell lung cancer (NSCLC) with brain metastases recommend brain treatments, including surgical resection and radiotherapy (RT), in addition to resection of the primary lung tumor. Here, we investigate the less-studied impact of treatment sequence on overall survival. Methods: The National Cancer Database was queried for NSCLC patients with brain metastases who underwent surgical resection of the primary lung tumor (n = 776). Kaplan-Meier survival curves with the log-rank test and propensity score-stratified Cox regression with the Wald test were used to evaluate the associations between various treatment plans and overall survival (OS). Results: Compared with patients who did not receive any brain treatment (median OS = 6.05 months), significantly better survival was observed for those who received brain surgery plus RT (median OS = 26.25 months, p < 0.0001) and for those who received brain RT alone (median OS = 14.49 months, p < 0.001). Receiving an upfront brain treatment (surgery or RT) before lung surgery was associated with better survival than undergoing lung surgery first (p < 0.05). The best survival outcome (median OS = 27.1 months) was associated with the sequence of brain surgery plus postoperative brain RT followed by lung surgery. Conclusions: This study shows the value of upfront brain treatment followed by primary lung tumor resection for NSCLC patients with brain metastases, particularly the sequence of brain surgery plus postoperative brain RT followed by lung surgery.
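The analysis pattern (Kaplan-Meier curves with a log-rank test, then a Cox model stratified by propensity-score stratum) can be sketched with the lifelines package as below; column names and the input file are hypothetical, and this is not the study's NCDB analysis code.

```python
# Hedged sketch of the reported survival-analysis pattern (assumed column names).
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("ncdb_cohort.csv")  # assumed columns: os_months, death, brain_tx_first, ps_stratum

first = df[df.brain_tx_first == 1]
lung_first = df[df.brain_tx_first == 0]
print(logrank_test(first.os_months, lung_first.os_months, first.death, lung_first.death).p_value)

km = KaplanMeierFitter().fit(first.os_months, first.death, label="brain treatment first")
km.plot_survival_function()

cph = CoxPHFitter()
cph.fit(df[["os_months", "death", "brain_tx_first", "ps_stratum"]],
        duration_col="os_months", event_col="death", strata=["ps_stratum"])
cph.print_summary()  # Wald test p-values per covariate
```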
ABSTRACT
PURPOSE: To develop a noninvasive prognostic imaging biomarker related to hypoxia to predict SABR tumor control. METHODS AND MATERIALS: A total of 145 subcutaneous syngeneic Dunning prostate R3327-AT1 rat tumors were focally irradiated once using cone beam computed tomography guidance on a small animal irradiator at 225 kV. Various doses in the range of 0 to 100 Gy were administered, while rats breathed air or oxygen, and tumor control was assessed up to 200 days. Oxygen-sensitive magnetic resonance imaging (MRI) (T1-weighted, ΔR1, ΔR2*) was applied to 79 of these tumors at 4.7 T to assess response to an oxygen gas breathing challenge on the day before irradiation as a probe of tumor hypoxia. RESULTS: Increasing radiation dose in the range of 0 to 90 Gy enhanced tumor control of air-breathing rats with a TCD50 estimated at 59.6 ± 1.5 Gy. Control was significantly improved at some doses when rats breathed oxygen during irradiation (eg, 40 Gy; P < .05), and overall there was a modest left shift in the control curve: TCD50(oxygen) = 53.1 ± 3.1 Gy (P < .05 vs air). Oxygen-sensitive MRI showed variable response to oxygen gas breathing challenge; the magnitude of T1-weighted signal response (%ΔSI) allowed stratification of tumors in terms of local control at 40 Gy. Tumors showing %ΔSI >0.922 with O2-gas breathing challenge showed significantly better control at 40 Gy during irradiation while breathing oxygen (75% vs 0%, P < .01). In addition, increased radiation dose (50 Gy) substantially overcame resistance, with 50% control for poorly oxygenated tumors. Stratification of dose-response curves based on %ΔSI >0.922 revealed different survival curves, with TCD50 = 36.2 ± 3.2 Gy for tumors responsive to oxygen gas breathing challenge; this was significantly less than the 54.7 ± 2.4 Gy for unresponsive tumors (P < .005), irrespective of the gas inhaled during tumor irradiation. CONCLUSIONS: Oxygen-sensitive MRI allowed stratification of tumors in terms of local control at 40 Gy, indicating its use as a potential predictive imaging biomarker. Increasing dose to 50 Gy overcame radiation resistance attributable to hypoxia in 50% of tumors.
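A TCD50 (the dose controlling 50% of tumors) is estimated by fitting a sigmoidal dose-control curve to binary local-control outcomes. The sketch below uses a simple least-squares logistic fit with made-up per-tumor data for illustration; the study's own dose-response statistics would have used a formal model.

```python
# Illustrative TCD50 estimate from hypothetical dose / local-control data.
import numpy as np
from scipy.optimize import curve_fit

def control_prob(dose, tcd50, k):
    """Logistic dose-control curve: probability of local control at a given dose."""
    return 1.0 / (1.0 + np.exp(-k * (dose - tcd50)))

dose = np.array([20, 30, 40, 40, 50, 50, 60, 60, 70, 80, 90], dtype=float)   # Gy (made up)
controlled = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1], dtype=float)        # 1 = controlled

(tcd50, k), _ = curve_fit(control_prob, dose, controlled, p0=[55.0, 0.1])
print(f"estimated TCD50 = {tcd50:.1f} Gy")
```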
Subjects
Magnetic Resonance Imaging/methods, Oxygen/administration & dosage, Prostatic Neoplasms/diagnostic imaging, Prostatic Neoplasms/radiotherapy, Radiation Tolerance, Image-Guided Radiotherapy/methods, Tumor Hypoxia, Air, Animals, Biomarkers, Cone-Beam Computed Tomography, Radiation Dose-Response Relationship, Male, Neoplasm Transplantation, Prognosis, Prostatic Neoplasms/physiopathology, Radiotherapy Dosage, Rats, Time Factors
ABSTRACT
This study aims to develop an artificial intelligence (AI)-based model to assist radiologists in pneumoconiosis screening and staging using chest radiographs. The model was developed using a training cohort and validated using an independent test cohort. Every image in the training and test datasets was labeled by experienced radiologists in a double-blinded fashion. The computational model first segments the lung field into six subregions. A convolutional neural network classification model then predicts the opacity level for each subregion. Finally, the diagnosis for each subject (normal, stage I, II, or III pneumoconiosis) is determined by summarizing the subregion-based prediction results. For the independent test cohort, pneumoconiosis screening accuracy was 0.973, with both sensitivity and specificity greater than 0.97. The accuracy for pneumoconiosis staging was 0.927, better than that achieved by two groups of radiologists (0.87 and 0.84, respectively). This study develops a deep learning-based model for screening and staging of pneumoconiosis using manually annotated chest radiographs; the model outperformed two groups of radiologists in staging accuracy. This pioneering work demonstrates the feasibility and efficiency of AI-assisted radiography screening and diagnosis in occupational lung diseases.
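The final aggregation step, mapping six per-subregion opacity predictions to one overall label, can be sketched as below; the median-based rule is a made-up placeholder, not the summarization rule used in the paper.

```python
# Hedged sketch of aggregating per-subregion CNN predictions into an overall stage.
import numpy as np

def stage_from_subregions(subregion_opacity: list[int]) -> str:
    """subregion_opacity: six CNN-predicted opacity levels (0-3), one per lung subregion."""
    profusion = int(np.median(subregion_opacity))   # summarize the six subregion predictions
    return {0: "normal", 1: "stage I", 2: "stage II", 3: "stage III"}[profusion]

print(stage_from_subregions([0, 1, 1, 2, 1, 1]))    # -> "stage I"
```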
Subjects
Deep Learning, Mass Screening, Biological Models, Pneumoconiosis/diagnosis, Databases as Topic, Humans, Pneumoconiosis/diagnostic imaging, Pneumoconiosis/pathology, Radiologists, Reproducibility of Results
ABSTRACT
BACKGROUND: Lung adenocarcinomas (ADCs) show heterogeneous morphological patterns that are classified into five subgroups: lepidic predominant, papillary predominant, acinar predominant, micropapillary predominant, and solid predominant. The morphological classification of ADCs has been reported to be associated with patient prognosis and adjuvant chemotherapy response. However, the molecular mechanisms underlying the morphological differences among subgroups remain largely unknown. METHODS: Using molecular profiling data from The Cancer Genome Atlas (TCGA) lung ADC (LUAD) cohort, we studied the molecular differences across invasive ADC morphological subgroups. RESULTS: We showed that the expression of proteins and mRNAs, but not gene mutations or copy-number alterations (CNAs), was significantly associated with lung ADC morphological subgroups. In addition, expression of the FOXM1 gene, which is negatively associated with patient survival, likely plays an important role in the morphological differences among subgroups. Moreover, we found that the protein abundance of PD-L1 was associated with the malignancy of the subgroups. These results were validated in an independent cohort. CONCLUSIONS: This study provides insights into the molecular differences among lung ADC morphological subgroups, which could lead to potential subgroup-specific therapies.
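Testing whether expression differs across morphological subgroups can be done gene by gene with a nonparametric test and multiple-testing correction; the file names and data layout below are assumptions for illustration, not the TCGA analysis code used in the study.

```python
# Illustrative per-gene association test across morphological subgroups.
import pandas as pd
from scipy.stats import kruskal
from statsmodels.stats.multitest import multipletests

expr = pd.read_csv("luad_expression.csv", index_col=0)                       # genes x samples
subgroup = pd.read_csv("luad_subgroups.csv", index_col=0)["predominant_pattern"]  # per-sample label

pvals = {
    gene: kruskal(*[expr.loc[gene, subgroup == g] for g in subgroup.unique()]).pvalue
    for gene in expr.index
}
fdr = multipletests(list(pvals.values()), method="fdr_bh")[1]  # Benjamini-Hochberg correction
```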
ABSTRACT
Germ cell tumors (GCTs) are considered a rare disease but are the most common solid tumors in adolescents and young adults, accounting for 15% of all malignancies in this age group. The rarity of GCTs in some groups, particularly children, has impeded progress in treatment and biologic understanding. The most effective GCT research will result from interrogating data sets from historical and prospective trials across institutions. However, inconsistent use of terminology among groups, different sample-labeling rules, and a lack of data standards have hampered researchers' efforts in data sharing and cross-study validation. To overcome the low interoperability of data and to facilitate future clinical trials, we worked with the Malignant Germ Cell International Consortium (MaGIC) to develop a GCT clinical data model as a uniform standard for curating and harmonizing GCT data sets. This data model will also be the standard for prospective data collection in future trials. Using the GCT data model, we developed a GCT data commons with data sets from both MaGIC and public domains as an integrated research platform. The commons supports functions such as data query, management, sharing, visualization, and analysis of the harmonized data, as well as patient cohort discovery. This GCT data commons will facilitate future collaborative research to advance the biologic understanding and treatment of GCTs. Moreover, the framework of the GCT data model and data commons will provide insights for other rare disease research communities seeking to develop similar collaborative research platforms.
Subjects
Germ Cell and Embryonal Neoplasms, Neoplasms, Adolescent, Cohort Studies, Humans, Information Dissemination, Germ Cell and Embryonal Neoplasms/epidemiology, Germ Cell and Embryonal Neoplasms/therapy
ABSTRACT
The spatial organization of different types of cells in tumor tissues reveals important information about the tumor microenvironment (TME). To facilitate the study of cellular spatial organization and interactions, we developed Histology-based Digital-Staining, a deep learning-based computational model, to segment the nuclei of tumor cells, stromal cells, lymphocytes, and macrophages, as well as karyorrhexis and red blood cells, from standard hematoxylin and eosin-stained pathology images in lung adenocarcinoma. Using this tool, we identified and classified cell nuclei and extracted 48 cell spatial organization-related features that characterize the TME. Using these features, we developed a prognostic model from the National Lung Screening Trial dataset and independently validated it in The Cancer Genome Atlas lung adenocarcinoma dataset, in which the predicted high-risk group showed significantly worse survival than the low-risk group (P = 0.001), with an HR of 2.23 (1.37-3.65) after adjusting for clinical variables. Furthermore, the image-derived TME features significantly correlated with the gene expression of biological pathways. For example, transcriptional activation of both the T-cell receptor and programmed cell death protein 1 pathways positively correlated with the density of detected lymphocytes in tumor tissues, while expression of the extracellular matrix organization pathway positively correlated with the density of stromal cells. In summary, we demonstrate that the spatial organization of different cell types is predictive of patient survival and associated with the gene expression of biological pathways. SIGNIFICANCE: These findings present a deep learning-based analysis tool to study the TME in pathology images and demonstrate that cell spatial organization is predictive of patient survival and is associated with gene expression. See related commentary by Rodriguez-Antolin, p. 1912.
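The image-feature-to-pathway correlation reported above can be illustrated with a simple rank correlation between a per-patient image feature and a per-patient pathway score; the file and column names here are hypothetical, not the study's variables.

```python
# Illustrative sketch: correlating an image-derived TME feature with a pathway score.
import pandas as pd
from scipy.stats import spearmanr

features = pd.read_csv("tme_image_features.csv", index_col=0)   # per-patient image features
pathways = pd.read_csv("pathway_scores.csv", index_col=0)       # per-patient pathway activity scores

common = features.index.intersection(pathways.index)            # patients present in both tables
rho, p = spearmanr(features.loc[common, "lymphocyte_density"],
                   pathways.loc[common, "tcr_signaling_score"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")
```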
Subjects
Adenocarcinoma of Lung, Lung Neoplasms, Artificial Intelligence, Humans, Staining and Labeling, Tumor Microenvironment
ABSTRACT
BACKGROUND: The spatial distributions of different types of cells can reveal a cancer cell's growth pattern, its relationships with the tumor microenvironment, and the immune response of the body, all of which represent key "hallmarks of cancer". However, the process by which pathologists manually recognize and localize all the cells in pathology slides is extremely labor-intensive and error-prone. METHODS: In this study, we developed an automated cell type classification pipeline, ConvPath, which includes nuclei segmentation; convolutional neural network-based classification of tumor cells, stromal cells, and lymphocytes; and extraction of tumor microenvironment-related features for lung cancer pathology images. To facilitate users in leveraging this pipeline for their research, all source scripts for the ConvPath software are available at https://qbrc.swmed.edu/projects/cnn/. FINDINGS: The overall classification accuracy was 92.9% and 90.1% in the training and independent testing datasets, respectively. By identifying cells and classifying cell types, this pipeline converts a pathology image into a "spatial map" of tumor, stromal, and lymphocyte cells. From this spatial map, we can extract features that characterize the tumor microenvironment. Based on these features, we developed an image feature-based prognostic model and validated it in two independent cohorts. The predicted risk group serves as an independent prognostic factor after adjusting for clinical variables, including age, gender, smoking status, and stage. INTERPRETATION: The analysis pipeline developed in this study converts a pathology image into a "spatial map" of tumor cells, stromal cells, and lymphocytes. This could greatly facilitate and empower comprehensive analysis of the spatial organization of cells, as well as their roles in tumor progression and metastasis.
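The kind of feature such a spatial map supports can be sketched as below, using the fraction of tumor cells whose nearest neighbor is a lymphocyte; this particular feature definition is illustrative, not one of ConvPath's published 48 features.

```python
# Hedged sketch: a simple spatial-map feature from nucleus centroids and predicted cell types.
import numpy as np
from scipy.spatial import cKDTree

def tumor_lymphocyte_neighbor_fraction(centroids: np.ndarray, cell_types: np.ndarray) -> float:
    """centroids: (N, 2) nucleus coordinates; cell_types: length-N labels in
    {'tumor', 'stroma', 'lymphocyte'} predicted by the classifier."""
    tree = cKDTree(centroids)
    _, idx = tree.query(centroids[cell_types == "tumor"], k=2)   # k=1 is the query cell itself
    return float(np.mean(cell_types[idx[:, 1]] == "lymphocyte"))
```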