ABSTRACT
Lung cancer is among the most prevalent and the most lethal cancers. To improve health outcomes while reducing the healthcare burden, it is crucial to move towards early detection and cost-effective workflows. Currently, there is no method for on-site rapid histological feedback on biopsies taken during diagnostic endoscopic or surgical procedures. Higher harmonic generation (HHG) microscopy is a laser-based technique that provides images of unprocessed tissue. Here, we report the feasibility of a portable HHG microscope in the clinical workflow in terms of acquisition time, image quality, and diagnostic accuracy in suspected pulmonary and pleural malignancy. In total, 109 biopsies from 47 patients were imaged, and a biopsy overview image was provided within a median of 6 minutes after excision. Assessment by pathologists and an artificial intelligence (AI) algorithm showed that image quality was sufficient for a malignancy or non-malignancy diagnosis in 97% of the biopsies, and 87% of the HHG images were correctly scored by the pathologists. HHG is therefore an excellent candidate to provide rapid pathology outcomes on biopsy samples, enabling immediate diagnosis and (local) treatment.
ABSTRACT
The rapid introduction of digital pathology has greatly facilitated the development of artificial intelligence (AI) models in pathology, which have shown great promise in assisting morphological diagnostics and the quantitation of therapeutic targets. We are now at a tipping point where companies have started to bring algorithms to the market, raising the question of whether the pathology community is ready to implement AI in its routine workflow. At the same time, concerns remain about the use of AI in pathology. This article reviews the pros and cons of introducing AI in diagnostic pathology.
Subjects
Algorithms , Artificial Intelligence , Humans , Workflow
ABSTRACT
PURPOSE: Use a conference challenge format to compare machine learning-based gamma-aminobutyric acid (GABA)-edited magnetic resonance spectroscopy (MRS) reconstruction models using one-quarter of the transients typically acquired during a complete scan. METHODS: There were three tracks: Track 1: simulated data, Track 2: identical acquisition parameters with in vivo data, and Track 3: different acquisition parameters with in vivo data. The mean squared error, signal-to-noise ratio, linewidth, and a proposed shape score metric were used to quantify model performance. Challenge organizers provided open access to a baseline model, simulated noise-free data, guides for adding synthetic noise, and in vivo data. RESULTS: Three submissions were compared. A covariance matrix convolutional neural network model was most successful for Track 1. A vision transformer model operating on a spectrogram data representation was most successful for Tracks 2 and 3. Deep learning (DL) reconstructions with 80 transients achieved equivalent or better SNR, linewidth and fit error compared to conventional 320 transient reconstructions. However, some DL models optimized linewidth and SNR without actually improving overall spectral quality, indicating a need for more robust metrics. CONCLUSION: DL-based reconstruction pipelines have the promise to reduce the number of transients required for GABA-edited MRS.
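The factor-of-four reduction in transients corresponds to a factor-of-two loss in SNR, since averaging N independent transients shrinks the noise by √N. A minimal simulation (pure Python; the unit noise level is an arbitrary assumption) illustrates the gap the DL reconstructions must close:

```python
import random
import statistics

def averaged_noise_sd(n_transients, noise_sd=1.0, n_trials=2000, seed=0):
    """Empirical SD of the noise remaining after averaging n_transients transients."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, noise_sd) for _ in range(n_transients))
        for _ in range(n_trials)
    ]
    return statistics.stdev(means)

# Noise after averaging 80 vs 320 transients: the ratio approaches
# sqrt(320 / 80) = 2, i.e. the full scan has twice the SNR.
print(round(averaged_noise_sd(80) / averaged_noise_sd(320), 1))
```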
Subjects
Deep Learning , Magnetic Resonance Spectroscopy , Signal-To-Noise Ratio , gamma-Aminobutyric Acid , gamma-Aminobutyric Acid/metabolism , Humans , Magnetic Resonance Spectroscopy/methods , Neural Networks, Computer , Algorithms , Brain/diagnostic imaging , Brain/metabolism , Machine Learning , Image Processing, Computer-Assisted/methods , Computer Simulation
ABSTRACT
This literature review presents a comprehensive overview of machine learning (ML) applications in proton MR spectroscopy (MRS). As the use of ML techniques in MRS continues to grow, this review aims to provide the MRS community with a structured overview of the state-of-the-art methods. Specifically, we examine and summarize studies published between 2017 and 2023 from major journals in the MR field. We categorize these studies based on a typical MRS workflow, including data acquisition, processing, analysis, and artificial data generation. Our review reveals that ML in MRS is still in its early stages, with a primary focus on processing and analysis techniques, and less attention given to data acquisition. We also found that many studies use similar model architectures, with little comparison to alternative architectures. Additionally, the generation of artificial data is a crucial topic, with no consistent method for its generation. Furthermore, many studies demonstrate that artificial data suffers from generalization issues when tested on in vivo data. We also conclude that risks related to ML models should be addressed, particularly for clinical applications. Therefore, output uncertainty measures and model biases are critical to investigate. Nonetheless, the rapid development of ML in MRS and the promising results from the reviewed studies justify further research in this field.
Subjects
Machine Learning , Protons , Magnetic Resonance Spectroscopy/methods , Workflow , Proton Magnetic Resonance Spectroscopy
ABSTRACT
BACKGROUND: Keratoconus remains difficult to diagnose, especially in the early stages. It is a progressive disorder of the cornea that starts at a young age. Diagnosis is based on clinical examination and corneal imaging; though in the early stages, when there are no clinical signs, diagnosis depends on the interpretation of corneal imaging (e.g. topography and tomography) by trained cornea specialists. Using artificial intelligence (AI) to analyse the corneal images and detect cases of keratoconus could help prevent visual acuity loss and even corneal transplantation. However, a missed diagnosis in people seeking refractive surgery could lead to weakening of the cornea and keratoconus-like ectasia. There is a need for a reliable overview of the accuracy of AI for detecting keratoconus and the applicability of this automated method to the clinical setting. OBJECTIVES: To assess the diagnostic accuracy of artificial intelligence (AI) algorithms for detecting keratoconus in people presenting with refractive errors, especially those whose vision can no longer be fully corrected with glasses, those seeking corneal refractive surgery, and those suspected of having keratoconus. AI could help ophthalmologists, optometrists, and other eye care professionals to make decisions on referral to cornea specialists. Secondary objectives: to assess the following potential causes of heterogeneity in diagnostic performance across studies. • Different AI algorithms (e.g. neural networks, decision trees, support vector machines) • Index test methodology (preprocessing techniques, core AI method, and postprocessing techniques) • Sources of input to train algorithms (topography and tomography images from Placido disc system, Scheimpflug system, slit-scanning system, or optical coherence tomography (OCT); number of training and testing cases/images; label/endpoint variable used for training) • Study setting • Study design • Ethnicity, or geographic area as its proxy • Different index test positivity criteria provided by the topography or tomography device • Reference standard, topography or tomography, one or two cornea specialists • Definition of keratoconus • Mean age of participants • Recruitment of participants • Severity of keratoconus (clinically manifest or subclinical) SEARCH METHODS: We searched CENTRAL (which contains the Cochrane Eyes and Vision Trials Register), Ovid MEDLINE, Ovid Embase, OpenGrey, the ISRCTN registry, ClinicalTrials.gov, and the World Health Organization International Clinical Trials Registry Platform (WHO ICTRP). There were no date or language restrictions in the electronic searches for trials. We last searched the electronic databases on 29 November 2022. SELECTION CRITERIA: We included cross-sectional and diagnostic case-control studies that investigated AI for the diagnosis of keratoconus using topography, tomography, or both. We included studies that diagnosed manifest keratoconus, subclinical keratoconus, or both. The reference standard was the interpretation of topography or tomography images by at least two cornea specialists. DATA COLLECTION AND ANALYSIS: Two review authors independently extracted the study data and assessed the quality of studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. When an article contained multiple AI algorithms, we selected the algorithm with the highest Youden's index. We assessed the certainty of evidence using the GRADE approach.
MAIN RESULTS: We included 63 studies, published between 1994 and 2022, that developed and investigated the accuracy of AI for the diagnosis of keratoconus. There were three different units of analysis in the studies: eyes, participants, and images. Forty-four studies analysed 23,771 eyes, four studies analysed 3843 participants, and 15 studies analysed 38,832 images. Fifty-four articles evaluated the detection of manifest keratoconus, defined as a cornea that showed any clinical sign of keratoconus. The accuracy of AI seems almost perfect, with a summary sensitivity of 98.6% (95% confidence interval (CI) 97.6% to 99.1%) and a summary specificity of 98.3% (95% CI 97.4% to 98.9%). However, accuracy varied across studies and the certainty of the evidence was low. Twenty-eight articles evaluated the detection of subclinical keratoconus, although the definition of subclinical varied. We grouped subclinical keratoconus, forme fruste, and very asymmetrical eyes together. The tests showed good accuracy, with a summary sensitivity of 90.0% (95% CI 84.5% to 93.8%) and a summary specificity of 95.5% (95% CI 91.9% to 97.5%). However, the certainty of the evidence was very low for sensitivity and low for specificity. In both groups, we graded most studies at high risk of bias, with high applicability concerns, in the domain of patient selection, since most were case-control studies. Moreover, we graded the certainty of evidence as low to very low due to selection bias, inconsistency, and imprecision. We could not explain the heterogeneity between the studies. The sensitivity analyses based on study design, AI algorithm, imaging technique (topography versus tomography), and data source (parameters versus images) showed no differences in the results. AUTHORS' CONCLUSIONS: AI appears to be a promising triage tool in ophthalmologic practice for diagnosing keratoconus. 
Test accuracy was very high for manifest keratoconus and slightly lower for subclinical keratoconus, indicating a higher chance of missing a diagnosis in people without clinical signs. This could lead to progression of keratoconus or an erroneous indication for refractive surgery, which would worsen the disease. We are unable to draw clear and reliable conclusions due to the high risk of bias, the unexplained heterogeneity of the results, and high applicability concerns, all of which reduced our confidence in the evidence. Greater standardization in future research would increase the quality of studies and improve comparability between studies.
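The review's model-selection rule, keeping the algorithm with the highest Youden's index when an article reports several, is a one-line computation. In this sketch, the algorithm names are invented; the first sensitivity/specificity pair echoes the review's summary estimates for manifest keratoconus and the other pairs are made up for illustration.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1 (0 = useless test, 1 = perfect)."""
    return sensitivity + specificity - 1.0

# Hypothetical per-algorithm accuracies reported in a single article; the
# review would keep the algorithm with the highest J.
algorithms = {"cnn": (0.986, 0.983), "svm": (0.90, 0.955), "tree": (0.85, 0.80)}
best = max(algorithms, key=lambda name: youden_index(*algorithms[name]))
print(best)  # cnn
```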
Subjects
Artificial Intelligence , Keratoconus , Humans , Keratoconus/diagnostic imaging , Cross-Sectional Studies , Physical Examination , Case-Control Studies
ABSTRACT
Ductal carcinoma in situ (DCIS) is a non-invasive breast cancer that can progress into invasive ductal carcinoma (IDC). Studies suggest DCIS is often overtreated, since a considerable part of DCIS lesions may never progress into IDC. Lower-grade lesions have a lower progression speed and risk, possibly allowing treatment de-escalation. However, studies show significant inter-observer variation in DCIS grading. Automated image analysis may provide an objective solution to address the high subjectivity of DCIS grading by pathologists. In this study, we developed and evaluated a deep learning-based DCIS grading system. The system was developed using the consensus DCIS grade of three expert observers on a dataset of 1186 DCIS lesions from 59 patients. The inter-observer agreement, measured by quadratic weighted Cohen's kappa, was used to evaluate the system and compare its performance to that of the expert observers. We present an analysis of the lesion-level and patient-level inter-observer agreement on an independent test set of 1001 lesions from 50 patients. At the lesion level, the deep learning system (dl) achieved, on average, slightly higher inter-observer agreement with the three observers (o1, o2 and o3) (κo1,dl = 0.81, κo2,dl = 0.53 and κo3,dl = 0.40) than the observers achieved amongst each other (κo1,o2 = 0.58, κo1,o3 = 0.50 and κo2,o3 = 0.42). At the patient level, the deep learning system achieved agreement with the observers (κo1,dl = 0.77, κo2,dl = 0.75 and κo3,dl = 0.70) similar to that amongst the observers themselves (κo1,o2 = 0.77, κo1,o3 = 0.75 and κo2,o3 = 0.72). The deep learning system better reflected the grading spectrum of DCIS than two of the observers. In conclusion, we developed a deep learning-based DCIS grading system that achieved performance similar to that of expert observers. To the best of our knowledge, this is the first automated system for the grading of DCIS that could assist pathologists by providing robust and reproducible second opinions on DCIS grade.
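The quadratic weighted Cohen's kappa used to compare graders can be computed directly from two grade vectors. This is a compact, self-contained sketch; the grade vectors (0-2 standing in for DCIS grades 1-3) are invented for illustration.

```python
def quadratic_weighted_kappa(a, b, n_classes):
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (k - 1)^2."""
    # Observed confusion matrix between the two graders.
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        obs[x][y] += 1
    total = len(a)
    row = [sum(obs[i]) for i in range(n_classes)]
    col = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * obs[i][j]                    # observed weighted disagreement
            den += w * row[i] * col[j] / total      # expected under independence
    return 1.0 - num / den

# Invented grades from two observers for eight lesions.
grades_a = [0, 1, 2, 2, 1, 0, 2, 1]
grades_b = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(quadratic_weighted_kappa(grades_a, grades_b, 3), 2))  # 0.79
```

Quadratic weighting penalizes a grade-1 vs grade-3 disagreement four times as heavily as an adjacent-grade disagreement, which suits ordinal scales like DCIS grade.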
Subjects
Breast Neoplasms , Carcinoma, Intraductal, Noninfiltrating , Deep Learning , Image Interpretation, Computer-Assisted/methods , Neoplasm Grading/methods , Biopsy , Breast/pathology , Breast Neoplasms/diagnosis , Breast Neoplasms/pathology , Carcinoma, Intraductal, Noninfiltrating/diagnosis , Carcinoma, Intraductal, Noninfiltrating/pathology , Female , Humans , Middle Aged
ABSTRACT
BACKGROUND: Basal cell carcinoma (BCC) is the most common type of skin cancer, with incidence rates rising each year. Mohs micrographic surgery (MMS) is most often chosen as the treatment for BCC on the face; each frozen section must be histologically analysed to ensure complete tumor removal, which places a heavy burden on healthcare resources. OBJECTIVES: To develop and evaluate a deep learning model for the automated detection of BCC-negative slides and the classification of BCC in histopathology slides of MMS based on whole-slide images (WSIs). METHODS: Two deep learning models were developed on the basis of 171 digitized H&E frozen slides from 70 different patients. The first model had a U-Net architecture and was used for the segmentation of BCC. A subsequent convolutional neural network used the segmentation to classify the whole slide as BCC or BCC-negative. RESULTS: Quantitative evaluation against manually labelled ground truth data resulted in a Dice score of 0.66 for the segmentation of BCC and an area under the receiver operating characteristic curve (AUC) of 0.90 for the slide-level classification. CONCLUSIONS: This study demonstrates that deep learning models applied to WSIs may be a feasible option to improve the clinical workflow and reduce costs in the histological analysis of BCC in MMS.
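The Dice score reported for the BCC segmentation is a standard overlap measure between the predicted and ground-truth masks; the binary masks in this sketch are invented for illustration.

```python
def dice(pred, truth):
    """Dice similarity between two binary masks given as flattened 0/1 sequences."""
    inter = sum(p * t for p, t in zip(pred, truth))  # pixels positive in both
    size = sum(pred) + sum(truth)
    return 2.0 * inter / size if size else 1.0       # both empty -> perfect match

# Invented 6-pixel masks: two pixels overlap out of three positives each.
pred  = [1, 1, 0, 1, 0, 0]
truth = [1, 0, 0, 1, 1, 0]
print(round(dice(pred, truth), 2))  # 0.67
```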
Subjects
Carcinoma, Basal Cell/surgery , Deep Learning , Mohs Surgery , Skin Neoplasms/surgery , Carcinoma, Basal Cell/pathology , Humans , Margins of Excision , Neoplasm Invasiveness , Skin Neoplasms/pathology
ABSTRACT
BACKGROUND: Quantitative myocardial perfusion cardiac MRI can provide a fast and robust assessment of myocardial perfusion status for the noninvasive diagnosis of myocardial ischemia while being more objective than visual assessment. However, it currently has limited use in clinical practice due to the challenging postprocessing required, particularly the segmentation. PURPOSE: To evaluate the efficacy of an automated deep learning (DL) pipeline for image processing prior to quantitative analysis. STUDY TYPE: Retrospective. POPULATION: In all, 175 clinical patients (350 MRI scans; 1050 image series) under both rest and stress conditions (135/10/30 training/validation/test). FIELD STRENGTH/SEQUENCE: 3.0T/2D multislice saturation recovery T1-weighted gradient echo sequence. ASSESSMENT: Accuracy was assessed, as compared to the manual operator, through the mean square error of the distance between landmarks and the Dice similarity coefficient of the segmentation and bounding box detection. Quantitative perfusion maps obtained using the automated DL-based processing were compared to the results obtained with the manually processed images. STATISTICAL TESTS: Bland-Altman plots and the intraclass correlation coefficient (ICC) were used to assess the myocardial blood flow (MBF) obtained using the automated DL pipeline, as compared to values obtained by a manual operator. RESULTS: The mean (SD) error in the detection of the time of peak signal enhancement in the left ventricle was 1.49 (1.4) timeframes. The mean (SD) Dice similarity coefficients for the bounding box and myocardial segmentation were 0.93 (0.03) and 0.80 (0.06), respectively. The mean (SD) error in the RV insertion point was 2.8 (1.8) mm. The Bland-Altman plots showed a bias of 2.6% of the mean MBF between the automated and manually processed MBF values on a per-myocardial segment basis. The ICC was 0.89, 95% confidence interval = [0.87, 0.90].
DATA CONCLUSION: We showed high accuracy, compared to manual processing, for the DL-based processing of myocardial perfusion data leading to quantitative values that are similar to those achieved with manual processing. LEVEL OF EVIDENCE: 3 Technical Efficacy Stage: 1 J. Magn. Reson. Imaging 2020;51:1689-1696.
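The Bland-Altman statistics used above reduce to a bias (the mean of the paired differences) and limits of agreement at ±1.96 standard deviations. This is a minimal sketch; the per-segment MBF values are invented, not data from the study.

```python
import statistics

def bland_altman(auto, manual):
    """Bias and 95% limits of agreement between paired measurements."""
    diffs = [a - m for a, m in zip(auto, manual)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical per-segment MBF values (mL/g/min), automated vs manual.
auto   = [1.02, 1.10, 0.95, 1.30, 1.18]
manual = [1.00, 1.05, 0.97, 1.25, 1.15]
bias, lo, hi = bland_altman(auto, manual)
print(round(bias, 3))  # 0.026
```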
Subjects
Deep Learning , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Perfusion , Retrospective Studies
Subjects
Heart Transplantation , Pathologists , Allografts , Graft Rejection , Heart , Humans
ABSTRACT
Importance: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency. Objective: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting. Design, Setting, and Participants: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining were provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC). Exposures: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation. Main Outcomes and Measures: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor. Results: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. 
The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC). Conclusions and Relevance: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.
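The slide-level AUC compared above has a simple rank interpretation: it is the probability that a randomly chosen metastasis-positive slide receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). A sketch with invented tumor-probability scores:

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical algorithm scores for slides with and without metastases.
with_mets    = [0.95, 0.80, 0.35]
without_mets = [0.10, 0.40, 0.05]
print(round(roc_auc(with_mets, without_mets), 3))  # 0.889
```

The pairwise form is O(n·m) but makes the probabilistic meaning of AUC explicit; production code would sort once and use ranks instead.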
Subjects
Breast Neoplasms/pathology , Lymphatic Metastasis/diagnosis , Machine Learning , Pathologists , Algorithms , Female , Humans , Lymphatic Metastasis/pathology , Pathology, Clinical , ROC Curve
ABSTRACT
Mitotic count (MC) is the most common measure to assess tumor proliferation in breast cancer patients and is highly predictive of patient outcomes. It is, however, subject to inter- and intraobserver variation and reproducibility challenges that may hamper its clinical utility. In past studies, artificial intelligence (AI)-supported MC has been shown to correlate well with traditional MC on glass slides. Considering the potential of AI to improve reproducibility of MC between pathologists, we undertook the next validation step by evaluating the prognostic value of a fully automatic method to detect and count mitoses on whole slide images using a deep learning model. The model was developed in the context of the Mitosis Domain Generalization Challenge 2021 (MIDOG21) grand challenge and was expanded by a novel automatic area selector method to find the optimal mitotic hotspot and calculate the MC per 2 mm2. We employed this method on a breast cancer cohort with long-term follow-up from the University Medical Centre Utrecht (N = 912) and compared predictive values for overall survival of AI-based MC and light-microscopic MC, previously assessed during routine diagnostics. The MIDOG21 model was prognostically comparable to the original MC from the pathology report in uni- and multivariate survival analysis. In conclusion, a fully automated MC AI algorithm was validated in a large cohort of breast cancer with regard to retained prognostic value compared with traditional light-microscopic MC.
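Reporting the mitotic count per 2 mm², as the automatic hotspot selector does, is a simple rescaling of a raw count by the area actually inspected. The numbers below are invented for illustration; 0.24 mm² is a typical high-power-field area, not a value taken from the study.

```python
def mc_per_2mm2(mitosis_count: int, n_fields: int, field_area_mm2: float) -> float:
    """Rescale a raw mitotic count to the standard 2 mm^2 reporting area."""
    inspected_area = n_fields * field_area_mm2
    return mitosis_count * 2.0 / inspected_area

# Hypothetical example: 14 mitoses found across 10 fields of ~0.24 mm^2 each.
print(round(mc_per_2mm2(14, 10, 0.24), 1))  # 11.7
```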
Subjects
Breast Neoplasms , Mitosis , Humans , Breast Neoplasms/pathology , Breast Neoplasms/mortality , Female , Prognosis , Middle Aged , Deep Learning , Reproducibility of Results , Mitotic Index , Aged , Predictive Value of Tests , Artificial Intelligence , Image Interpretation, Computer-Assisted , Adult
ABSTRACT
Quantification of myocardial scar from late gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) images can be facilitated by automated artificial intelligence (AI)-based analysis. However, AI models are susceptible to domain shifts, in which model performance is degraded when the model is applied to data with different characteristics than the original training data. In this study, CycleGAN models were trained to translate local hospital data to the appearance of a public LGE CMR dataset. After domain adaptation, an AI scar quantification pipeline including myocardium segmentation, scar segmentation, and computation of scar burden, previously developed on the public dataset, was evaluated on an external test set including 44 patients clinically assessed for ischemic scar. The mean ± standard deviation Dice similarity coefficients between the manual and AI-predicted segmentations in all patients were similar to those previously reported: 0.76 ± 0.05 for myocardium and 0.75 ± 0.32 for scar (0.41 ± 0.12 for scar in scans with pathological findings). Bland-Altman analysis showed a mean bias in scar burden percentage of -0.62%, with limits of agreement from -8.4% to 7.17%. These results show the feasibility of deploying AI models, trained with public data, for LGE CMR quantification on local clinical data using unsupervised CycleGAN-based domain adaptation. RELEVANCE STATEMENT: Our study demonstrated the possibility of applying AI models trained on public databases to patient data acquired at a specific institution with different acquisition settings, without additional manual labor to obtain further training labels.
Subjects
Cicatrix , Magnetic Resonance Imaging , Humans , Cicatrix/diagnostic imaging , Male , Female , Magnetic Resonance Imaging/methods , Middle Aged , Contrast Media , Aged , Image Interpretation, Computer-Assisted/methods , Artificial Intelligence
ABSTRACT
We introduce LYSTO, the Lymphocyte Assessment Hackathon, which was held in conjunction with the MICCAI 2019 Conference in Shenzhen (China). The competition required participants to automatically assess the number of lymphocytes, in particular T-cells, in images of colon, breast, and prostate cancer stained with CD3 and CD8 immunohistochemistry. Unlike other challenges set up in medical image analysis, LYSTO participants were given only a few hours to address this problem. In this paper, we describe the goal and multi-phase organization of the hackathon, the proposed methods, and the on-site results. Additionally, we present post-competition results, showing how the presented methods perform on an independent set of lung cancer slides that was not part of the initial competition, as well as a comparison of lymphocyte assessment between the presented methods and a panel of pathologists. We show that some of the participants were capable of achieving pathologist-level performance at lymphocyte assessment. After the hackathon, LYSTO was left as a lightweight, plug-and-play benchmark dataset on the grand-challenge website, together with an automatic evaluation platform.
Subjects
Benchmarking , Prostatic Neoplasms , Male , Humans , Lymphocytes , Breast , China
ABSTRACT
Recognition of mitotic figures in histologic tumor specimens is highly relevant to patient outcome assessment. This task is challenging for algorithms and human experts alike, with deterioration of algorithmic performance under shifts in image representations. Considerable covariate shifts occur when assessment is performed on different tumor types, when images are acquired using different digitization devices, or when specimens are produced in different laboratories. This observation motivated the inception of the 2022 challenge on MItosis Domain Generalization (MIDOG 2022). The challenge provided annotated histologic tumor images from six different domains and evaluated the algorithmic approaches for mitotic figure detection provided by nine challenge participants on ten independent domains. Ground truth for mitotic figure detection was established in two ways: a three-expert majority vote and an independent, immunohistochemistry-assisted set of labels. This work represents an overview of the challenge tasks, the algorithmic strategies employed by the participants, and potential factors contributing to their success. With an F1 score of 0.764 for the top-performing team, we conclude that domain generalization across various tumor domains is possible with today's deep learning-based recognition pipelines. However, we also found that domain characteristics not present in the training set (a new species (feline), a new morphology (spindle cell), and a new scanner) led to small but significant decreases in performance. When assessed against the immunohistochemistry-assisted reference standard, all methods resulted in reduced recall scores, with only minor changes in the order of participants in the ranking.
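The challenge's ranking metric, F1, is the harmonic mean of detection precision and recall, so a drop in recall against the immunohistochemistry-assisted labels directly lowers the score. The true-positive/false-positive/false-negative tallies below are invented, chosen only to land near the reported top score.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall for a detection task."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical detection tallies on a held-out domain.
print(round(f1_score(tp=764, fp=230, fn=240), 3))  # 0.765
```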
Subjects
Laboratories , Mitosis , Humans , Animals , Cats , Algorithms , Image Processing, Computer-Assisted/methods , Reference Standards
ABSTRACT
INTRODUCTION: The presence of tumor-infiltrating lymphocytes (TILs) in melanoma has been linked to survival. Their predictive capability for immune checkpoint inhibition (ICI) response remains uncertain. Therefore, we investigated the association between treatment response and TILs in the largest cohort to date and analyzed whether this association was independent of known clinical predictors. METHODS: In this multicenter cohort study, patients who received first-line anti-PD1 ± anti-CTLA4 for advanced melanoma were identified. TILs were scored on hematoxylin and eosin (H&E) slides of primary melanoma and pre-treatment metastases using the validated TILs-WG, Clark, and MIA scores. The primary outcome was objective response rate (ORR), with progression-free survival and overall survival as secondary outcomes. Univariable and multivariable logistic regression and Cox proportional hazards analyses were performed, adjusting for known clinical predictors. RESULTS: Metastatic melanoma specimens were available for 650 patients and primary specimens for 565 patients. No association was found in primary melanoma specimens. In metastatic specimens, a 10-point increase in the TILs-WG score was associated with a higher probability of response (aOR 1.17, 95% CI 1.07-1.28), increased PFS (HR 0.93, 95% CI 0.87-0.996), and OS (HR 0.94, 95% CI 0.89-0.99). When categorized, patients in the highest tertile of the TILs-WG score (15-100%) compared to the lowest tertile (0%) had a longer median PFS (13.1 vs. 7.3 months, p = 0.04) and OS (49.4 vs. 19.5 months, p = 0.003). Similar results were noted using the MIA and Clark scores. CONCLUSION: In advanced melanoma patients, TIL patterns on H&E slides of pre-treatment metastases, regardless of measurement method, are independently associated with ICI response. TILs are easily scored on readily available H&E slides, facilitating the use of this biomarker in clinical practice.
Subjects
Immune Checkpoint Inhibitors , Lymphocytes, Tumor-Infiltrating , Melanoma , Skin Neoplasms , Humans , Melanoma/immunology , Melanoma/drug therapy , Melanoma/pathology , Melanoma/mortality , Melanoma/secondary , Lymphocytes, Tumor-Infiltrating/immunology , Skin Neoplasms/immunology , Skin Neoplasms/pathology , Skin Neoplasms/drug therapy , Skin Neoplasms/mortality , Immune Checkpoint Inhibitors/therapeutic use , Male , Female , Middle Aged , Aged , Adult , Retrospective Studies , Melanoma, Cutaneous Malignant , Aged, 80 and over , Progression-Free Survival
ABSTRACT
BACKGROUND: In diseases such as interstitial lung diseases (ILDs), diagnosis relies on analysis of bronchoalveolar lavage fluid (BALF) and biopsies. Immunological BALF analysis includes differentiation of leukocytes by standard cytological techniques that are labor-intensive and time-consuming. Studies have shown promising leukocyte identification performance on blood fractions using third harmonic generation (THG) and multiphoton excited autofluorescence (MPEF) microscopy. OBJECTIVE: To extend leukocyte differentiation to BALF samples using THG/MPEF microscopy, and to show the potential of a trained deep learning algorithm for automated leukocyte identification and quantification. METHODS: Leukocytes from blood obtained from three healthy individuals and one asthma patient, and BALF samples from six ILD patients, were isolated and imaged using label-free microscopy. The cytological characteristics of leukocytes, including neutrophils, eosinophils, lymphocytes, and macrophages, were determined in terms of cellular and nuclear morphology and THG and MPEF signal intensity. A deep learning model was trained on 2D images and used to estimate leukocyte ratios at the image level, with differential cell counts obtained by standard cytological techniques as the reference. RESULTS: Different leukocyte populations were identified in BALF samples using label-free microscopy, showing distinctive cytological characteristics. Based on the THG/MPEF images, the deep learning network learned to identify individual cells and provided a reasonable estimate of the leukocyte percentages, reaching >90% accuracy on BALF samples in the hold-out test set. CONCLUSIONS: Label-free THG/MPEF microscopy in combination with deep learning is a promising technique for instant differentiation and quantification of leukocytes.
Immediate feedback on leukocyte ratios has the potential to speed up the diagnostic process and to reduce costs, workload, and inter-observer variation.
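The image-level ratio estimation described above can be illustrated with a minimal sketch. This is illustrative only, not the authors' implementation: per-cell class predictions from a hypothetical classifier are aggregated into the percentage ratios that would be compared against reference differential cell counts.

```python
from collections import Counter

def leukocyte_ratios(labels):
    """Aggregate per-cell class predictions into percentage ratios."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}

# Hypothetical per-cell labels predicted for one BALF sample
# (invented data for illustration, not study results)
predicted = (["macrophage"] * 70 + ["lymphocyte"] * 18
             + ["neutrophil"] * 9 + ["eosinophil"] * 3)
ratios = leukocyte_ratios(predicted)
# ratios["macrophage"] == 70.0 for this illustrative sample
```

The aggregated percentages sum to 100 by construction, which makes them directly comparable to a cytological differential count.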
Subjects
Deep Learning, Interstitial Lung Diseases, Humans, Bronchoalveolar Lavage Fluid, Microscopy, Interstitial Lung Diseases/diagnosis, Leukocytes, Cell Differentiation, Leukocyte Count, Bronchoalveolar Lavage
ABSTRACT
Introduction: Breast cancer (BC) prognosis is largely influenced by histopathological grade, assessed according to the Nottingham modification of the Bloom-Richardson (BR) system. Mitotic count (MC) is a component of histopathological grading but is prone to subjectivity. This study investigated whether mitosis counting in BC on digital whole slide images (WSI) compares better to light microscopy (LM) when assisted by artificial intelligence (AI), and to what extent differences in digital MC (AI-assisted or not) result in BR grade variations. Methods: Fifty BC patients with paired core biopsies and resections were randomly selected. Component scores for BR grade were extracted from pathology reports. MC was assessed using LM, WSI, and AI. The modalities (LM-MC, WSI-MC, and AI-MC) were analyzed for correlation with scatterplots and linear regression, and for agreement in final BR grade with Cohen's κ. Results: MC modalities correlated strongly in both biopsies and resections: LM-MC and WSI-MC (R² 0.85 and 0.83, respectively), LM-MC and AI-MC (R² 0.85 and 0.95), and WSI-MC and AI-MC (R² 0.77 and 0.83). Agreement in BR grade between modalities was high in both biopsies and resections: LM-MC and WSI-MC (κ 0.93 and 0.83, respectively), LM-MC and AI-MC (κ 0.89 and 0.83), and WSI-MC and AI-MC (κ 0.96 and 0.73). Conclusion: This first validation study shows that WSI-MC may compare better to LM-MC when assisted by AI. Agreement between BR grades based on the different mitosis counting modalities was high. These results suggest that mitosis counting on WSI is feasible and validate the presented AI algorithm for pathologist-supervised use in daily practice. Further research is required to advance our knowledge of AI-MC, but it appears at least non-inferior to LM-MC.
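The agreement statistic used above, Cohen's κ, corrects observed agreement for the agreement expected by chance. A small self-contained sketch follows; the grade vectors are invented for illustration and are not study data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Illustrative BR grades (1-3) from two counting modalities for 10 cases
lm_grades = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]
ai_grades = [1, 2, 2, 3, 1, 2, 3, 2, 2, 1]
kappa = cohens_kappa(lm_grades, ai_grades)  # one disagreement out of ten
```

With 9/10 observed agreement and 0.35 chance agreement, κ here is (0.9 − 0.35)/(1 − 0.35) ≈ 0.85, illustrating how κ discounts agreement that would occur by chance alone.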
ABSTRACT
The prognostic value of mitotic figures in tumor tissue is well established for many tumor types, and automating this task is of high research interest. However, deep learning-based methods in particular face performance deterioration in the presence of domain shifts, which may arise from different tumor types, slide preparation, and digitization devices. We introduce the MIDOG++ dataset, an extension of the MIDOG 2021 and 2022 challenge datasets. We provide region-of-interest images from 503 histological specimens of seven tumor types with variable morphology, with labels for a total of 11,937 mitotic figures: breast carcinoma, lung carcinoma, lymphosarcoma, neuroendocrine tumor, cutaneous mast cell tumor, cutaneous melanoma, and (sub)cutaneous soft tissue sarcoma. The specimens were processed in several laboratories using diverse scanners. We evaluated the extent of the domain shift using state-of-the-art approaches, observing notable differences in single-domain training. In a leave-one-domain-out setting, generalizability improved considerably. This mitotic figure dataset is the first to incorporate a wide domain shift based on different tumor types, laboratories, whole slide image scanners, and species.
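The leave-one-domain-out evaluation mentioned above can be sketched in a few lines. This is a generic sketch, not the challenge's evaluation code: each fold holds out every sample from one domain, so a model is always tested on a tumor type or scanner it never saw during training.

```python
def leave_one_domain_out(domains):
    """Yield (held_out, train_idx, test_idx) splits, one per domain."""
    for held_out in sorted(set(domains)):
        train_idx = [i for i, d in enumerate(domains) if d != held_out]
        test_idx = [i for i, d in enumerate(domains) if d == held_out]
        yield held_out, train_idx, test_idx

# Illustrative: six regions of interest from three tumor-type domains
domains = ["breast", "breast", "lung", "lung", "melanoma", "melanoma"]
splits = list(leave_one_domain_out(domains))
# three folds; the first holds out both "breast" samples
```

The same pattern applies whether the grouping variable is tumor type, laboratory, or scanner; only the `domains` labels change.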
Subjects
Mitosis, Neoplasms, Humans, Algorithms, Prognosis, Neoplasms/pathology
ABSTRACT
Lung ultrasound (LUS) is an important imaging modality used by emergency physicians to assess pulmonary congestion at the patient bedside. B-line artifacts in LUS videos are key findings associated with pulmonary congestion. Not only can the interpretation of LUS be challenging for novice operators, but visual quantification of B-lines also remains subject to observer variability. In this work, we investigate the strengths and weaknesses of multiple deep learning approaches for automated B-line detection and localization in LUS videos. We curate and publish BEDLUS, a new ultrasound dataset comprising 1,419 videos from 113 patients with a total of 15,755 expert-annotated B-lines. Based on this dataset, we present a benchmark of established deep learning methods applied to the task of B-line detection. To pave the way for interpretable quantification of B-lines, we propose a novel "single-point" approach to B-line localization that uses only the point of origin. Our results show that (a) the area under the receiver operating characteristic curve ranges from 0.864 to 0.955 for the benchmarked detection methods, (b) within this range, the best performance is achieved by models that leverage multiple successive frames as input, and (c) the proposed single-point approach for B-line localization reaches an F1-score of 0.65, performing on par with the inter-observer agreement. The dataset and developed methods can facilitate further biomedical research on automated interpretation of lung ultrasound, with the potential to expand its clinical utility.
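Scoring a point-based localization like the single-point approach above requires matching predicted origin points to expert annotations before an F1-score can be computed. Below is a minimal greedy-matching sketch; the benchmark's actual matching rule and tolerance may differ, and the points are invented for illustration.

```python
import math

def f1_point_matching(pred, truth, tol=10.0):
    """Greedily match predicted points to annotations within a distance
    tolerance (one-to-one), then compute the F1-score."""
    unmatched = list(truth)
    tp = 0
    for p in pred:
        best = min(unmatched, key=lambda t: math.dist(p, t), default=None)
        if best is not None and math.dist(p, best) <= tol:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

# Illustrative points: two of three predictions fall near annotations
truth = [(50, 100), (120, 100)]
pred = [(52, 103), (118, 98), (300, 100)]
score = f1_point_matching(pred, truth)  # precision 2/3, recall 1.0
```

With two true positives, one false positive, and no false negatives, the F1-score works out to 0.8 in this toy example.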
Subjects
Deep Learning, Pulmonary Edema, Humans, Lung/diagnostic imaging, Ultrasonography/methods, Pulmonary Edema/diagnosis, Thorax
ABSTRACT
The density of mitotic figures (MF) within tumor tissue is known to be highly correlated with tumor proliferation and is thus an important marker in tumor grading. Recognition of MF by pathologists is subject to strong inter-rater bias, limiting its prognostic value. State-of-the-art deep learning methods can support experts but have been observed to deteriorate strongly when applied in a different clinical environment. The variability caused by using different whole slide scanners has been identified as one decisive component of the underlying domain shift. The goal of the MICCAI MIDOG 2021 challenge was the creation of scanner-agnostic MF detection algorithms. The challenge used a training set of 200 cases split across four scanning systems. The test set comprised an additional 100 cases split across four scanning systems, including two previously unseen scanners. In this paper, we evaluate and compare the approaches submitted to the challenge and identify methodological factors contributing to better performance. The winning algorithm yielded an F1 score of 0.748 (CI95: 0.704-0.781), exceeding the performance of six experts on the same task.
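A confidence interval like the CI95 reported above is commonly obtained by bootstrapping over cases. The following is a generic percentile-bootstrap sketch; the challenge's exact procedure is not specified here, and the per-case scores are invented for illustration.

```python
import random

def bootstrap_ci(per_case_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean score:
    resample cases with replacement, collect the resampled means,
    and read off the alpha/2 and 1-alpha/2 percentiles."""
    rng = random.Random(seed)
    n = len(per_case_scores)
    means = sorted(
        sum(rng.choices(per_case_scores, k=n)) / n for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Illustrative per-case F1 scores (not challenge data)
scores = [0.70, 0.75, 0.80, 0.72, 0.78, 0.74, 0.76, 0.73, 0.77, 0.71]
lo, hi = bootstrap_ci(scores)  # lo < mean(scores) < hi
```

Resampling whole cases (rather than individual detections) keeps correlated detections within one slide together, which is the usual choice for case-level metrics.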