1.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subjects
Artificial Intelligence
2.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.


Subjects
Algorithms; Image Processing, Computer-Assisted; Machine Learning; Semantics
3.
Nat Methods ; 20(7): 1010-1020, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37202537

ABSTRACT

The Cell Tracking Challenge is an ongoing benchmarking initiative that has become a reference in cell segmentation and tracking algorithm development. Here, we present a significant number of improvements introduced in the challenge since our 2017 report. These include the creation of a new segmentation-only benchmark, the enrichment of the dataset repository with new datasets that increase its diversity and complexity, and the creation of a silver standard reference corpus based on the most competitive results, which will be of particular interest for data-hungry deep learning-based strategies. Furthermore, we present the up-to-date cell segmentation and tracking leaderboards, an in-depth analysis of the relationship between the performance of the state-of-the-art methods and the properties of the datasets and annotations, and two novel, insightful studies about the generalizability and the reusability of top-performing methods. These studies provide critical practical conclusions for both developers and users of traditional and machine learning-based cell segmentation and tracking algorithms.


Subjects
Benchmarking; Cell Tracking; Cell Tracking/methods; Machine Learning; Algorithms
4.
Nat Methods ; 18(2): 203-211, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33288961

ABSTRACT

Biomedical imaging is a driver of scientific discovery and a core component of medical care and is being stimulated by the field of deep learning. While semantic segmentation algorithms enable image analysis and quantification in many applications, the design of respective specialized solutions is non-trivial and highly dependent on dataset properties and hardware conditions. We developed nnU-Net, a deep learning-based segmentation method that automatically configures itself, including preprocessing, network architecture, training and post-processing for any new task. The key design choices in this process are modeled as a set of fixed parameters, interdependent rules and empirical decisions. Without manual intervention, nnU-Net surpasses most existing approaches, including highly specialized solutions on 23 public datasets used in international biomedical segmentation competitions. We make nnU-Net publicly available as an out-of-the-box tool, rendering state-of-the-art segmentation accessible to a broad audience by requiring neither expert knowledge nor computing resources beyond standard network training.


Subjects
Deep Learning; Algorithms; Image Processing, Computer-Assisted/methods; Neural Networks, Computer
5.
BJU Int ; 133(6): 690-698, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38343198

ABSTRACT

OBJECTIVE: To automate the generation of three validated nephrometry scoring systems on preoperative computerised tomography (CT) scans by developing artificial intelligence (AI)-based image processing methods. Subsequently, we aimed to evaluate the ability of these scores to predict meaningful pathological and perioperative outcomes. PATIENTS AND METHODS: A total of 300 patients with preoperative CT with early arterial contrast phase were identified from a cohort of 544 consecutive patients undergoing surgical extirpation for suspected renal cancer. A deep neural network approach was used to automatically segment kidneys and tumours, and then geometric algorithms were used to measure the components of the concordance index (C-Index), Preoperative Aspects and Dimensions Used for an Anatomical classification of renal tumours (PADUA), and tumour contact surface area (CSA) nephrometry scores. Human scores were independently calculated by medical personnel blinded to the AI scores. AI and human score agreement was assessed using linear regression and predictive abilities for meaningful outcomes were assessed using logistic regression and receiver operating characteristic curve analyses. RESULTS: The median (interquartile range) age was 60 (51-68) years, and 40% were female. The median tumour size was 4.2 cm and 91.3% had malignant tumours. In all, 27% of the tumours were high stage, 37% high grade, and 63% of the patients underwent partial nephrectomy. There was significant agreement between human and AI scores on linear regression analyses (R ranged from 0.574 to 0.828, all P < 0.001). The AI-generated scores were equivalent or superior to human-generated scores for all examined outcomes including high-grade histology, high-stage tumour, indolent tumour, pathological tumour necrosis, and radical nephrectomy (vs partial nephrectomy) surgical approach. 
CONCLUSIONS: Fully automated AI-generated C-Index, PADUA, and tumour CSA nephrometry scores are similar to human-generated scores and predict a wide variety of meaningful outcomes. Once validated, our results suggest that AI-generated nephrometry scores could be delivered automatically from a preoperative CT scan to a clinician and patient at the point of care to aid in decision making.


Subjects
Kidney Neoplasms; Tomography, X-Ray Computed; Humans; Female; Kidney Neoplasms/pathology; Kidney Neoplasms/surgery; Kidney Neoplasms/diagnostic imaging; Male; Middle Aged; Aged; Nephrectomy/methods; Predictive Value of Tests; Artificial Intelligence; Retrospective Studies
6.
Radiology ; 297(1): 164-175, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32720870

ABSTRACT

Background: Relevance of antiangiogenic treatment with bevacizumab in patients with glioblastoma is controversial because progression-free survival benefit did not translate into an overall survival (OS) benefit in randomized phase III trials. Purpose: To perform longitudinal characterization of intratumoral angiogenesis and oxygenation by using dynamic susceptibility contrast agent-enhanced (DSC) MRI and evaluate its potential for predicting outcome from administration of bevacizumab. Materials and Methods: In this secondary analysis of the prospective randomized phase II/III European Organization for Research and Treatment of Cancer 26101 trial conducted between October 2011 and December 2015 in 596 patients with first recurrence of glioblastoma, the subset of patients with availability of anatomic MRI and DSC MRI at baseline and first follow-up was analyzed. Patients were allocated into those administered bevacizumab (hereafter, the BEV group; either bevacizumab monotherapy or bevacizumab with lomustine) and those not administered bevacizumab (hereafter, the non-BEV group with lomustine monotherapy). Contrast-enhanced tumor volume, noncontrast-enhanced T2 fluid-attenuated inversion recovery (FLAIR) signal abnormality volume, Gaussian-normalized relative cerebral blood volume (nrCBV), Gaussian-normalized relative blood flow (nrCBF), and tumor metabolic rate of oxygen (nTMRO2) were quantified. The predictive ability of these imaging parameters was assessed with multivariable Cox regression and formal interaction testing. Results: A total of 254 of 596 patients were evaluated (mean age, 57 years ± 11; 155 men; 161 in the BEV group and 93 in the non-BEV group). Progression-free survival was longer in the BEV group (3.7 months; 95% confidence interval [CI]: 3.0, 4.2) compared with the non-BEV group (2.5 months; 95% CI: 1.5, 2.9; P = .01), whereas OS was not different (P = .15). The nrCBV decreased for the BEV group (-16.3%; interquartile range [IQR], -39.5% to 12.0%; P = .01), but not for the non-BEV group (1.2%; IQR, -17.9% to 23.3%; P = .19) between baseline and first follow-up. An identical pattern was observed for both nrCBF and nTMRO2 values. Contrast-enhanced tumor and noncontrast-enhanced T2 FLAIR signal abnormality volumes decreased for the BEV group (-66% [IQR, -83% to -35%] and -33% [IQR, -71% to -5%], respectively; P < .001 for both), whereas they increased for the non-BEV group (30% [IQR, -17% to 98%], P = .001; and 10% [IQR, -13% to 82%], P = .02, respectively) between baseline and first follow-up. None of the assessed MRI parameters were predictive for OS in the BEV group. Conclusion: Bevacizumab treatment decreased tumor volumes, angiogenesis, and oxygenation, thereby reflecting its effectiveness for extending progression-free survival; however, these parameters were not predictive of overall survival (OS), which highlighted the challenges of identifying patients that derive an OS benefit from bevacizumab. © RSNA, 2020. Online supplemental material is available for this article. See also the editorial by Dillon in this issue.


Subjects
Angiogenesis Inhibitors/therapeutic use; Bevacizumab/therapeutic use; Brain Neoplasms/drug therapy; Glioblastoma/drug therapy; Magnetic Resonance Imaging/methods; Neovascularization, Pathologic/drug therapy; Antineoplastic Agents, Alkylating/therapeutic use; Brain Neoplasms/pathology; Contrast Media; Europe; Female; Glioblastoma/pathology; Humans; Lomustine/therapeutic use; Male; Middle Aged; Neoplasm Recurrence, Local; Prospective Studies; Survival Analysis
7.
Radiology ; 295(2): 328-338, 2020 May.
Article in English | MEDLINE | ID: mdl-32154773

ABSTRACT

Background: Radiomic features may quantify characteristics present in medical imaging. However, the lack of standardized definitions and validated reference values has hampered clinical use. Purpose: To standardize a set of 174 radiomic features. Materials and Methods: Radiomic features were assessed in three phases. In phase I, 487 features were derived from the basic set of 174 features. Twenty-five research teams with unique radiomics software implementations computed feature values directly from a digital phantom, without any additional image processing. In phase II, 15 teams computed values for 1347 derived features using a CT image of a patient with lung cancer and predefined image processing configurations. In both phases, consensus among the teams on the validity of tentative reference values was measured through the frequency of the modal value and classified as follows: less than three matches, weak; three to five matches, moderate; six to nine matches, strong; 10 or more matches, very strong. In the final phase (phase III), a public data set of multimodality images (CT, fluorine 18 fluorodeoxyglucose PET, and T1-weighted MRI) from 51 patients with soft-tissue sarcoma was used to prospectively assess reproducibility of standardized features. Results: Consensus on reference values was initially weak for 232 of 302 features (76.8%) at phase I and 703 of 1075 features (65.4%) at phase II. At the final iteration, weak consensus remained for only two of 487 features (0.4%) at phase I and 19 of 1347 features (1.4%) at phase II. Strong or better consensus was achieved for 463 of 487 features (95.1%) at phase I and 1220 of 1347 features (90.6%) at phase II. Overall, 169 of 174 features were standardized in the first two phases. In the final validation phase (phase III), most of the 169 standardized features could be excellently reproduced (166 with CT; 164 with PET; and 164 with MRI). Conclusion: A set of 169 radiomics features was standardized, which enabled verification and calibration of different radiomics software. © RSNA, 2020. Online supplemental material is available for this article. See also the editorial by Kuhl and Truhn in this issue.
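
The consensus grading described in this abstract can be sketched in a few lines. This is an illustrative sketch only, not the consortium's software: the function names, and the choice to count two teams as matching when their values agree after rounding to `ndigits` decimals, are assumptions made here. Only the four thresholds come from the abstract.

```python
from collections import Counter


def modal_matches(values, ndigits=3):
    """Return (modal value, number of teams reporting it).

    Assumption: values "match" when equal after rounding to `ndigits`
    decimals; the abstract does not state the matching tolerance.
    """
    rounded = [round(v, ndigits) for v in values]
    value, count = Counter(rounded).most_common(1)[0]
    return value, count


def consensus_level(n_matches):
    """Map the frequency of the modal value to the abstract's categories."""
    if n_matches < 3:
        return "weak"
    if n_matches <= 5:
        return "moderate"
    if n_matches <= 9:
        return "strong"
    return "very strong"


# Hypothetical feature values submitted by seven teams.
team_values = [0.412, 0.412, 0.4121, 0.412, 0.398, 0.412, 0.412]
value, count = modal_matches(team_values)
print(value, count, consensus_level(count))
```

With these made-up inputs, six of the seven values round to the same number, so the feature would be graded "strong" under the stated thresholds.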


Subjects
Biomarkers/analysis; Image Processing, Computer-Assisted/standards; Software; Calibration; Fluorodeoxyglucose F18; Humans; Lung Neoplasms/diagnostic imaging; Magnetic Resonance Imaging; Phantoms, Imaging; Phenotype; Positron-Emission Tomography; Radiopharmaceuticals; Reproducibility of Results; Sarcoma/diagnostic imaging; Tomography, X-Ray Computed
8.
Eur Radiol ; 30(4): 2356-2364, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31900702

ABSTRACT

OBJECTIVES: Patients with multiple sclerosis (MS) regularly undergo MRI for assessment of disease burden. However, interpretation may be time consuming and prone to intra- and interobserver variability. Here, we evaluate the potential of artificial neural networks (ANN) for automated volumetric assessment of MS disease burden and activity on MRI. METHODS: A single-institutional dataset with 334 MS patients (334 MRI exams) was used to develop and train an ANN for automated identification and volumetric segmentation of T2/FLAIR-hyperintense and contrast-enhancing (CE) lesions. Independent testing was performed in a single-institutional longitudinal dataset with 82 patients (266 MRI exams). We evaluated lesion detection performance (F1 scores), lesion segmentation agreement (DICE coefficients), and lesion volume agreement (concordance correlation coefficients [CCC]). Independent evaluation was performed on the public ISBI-2015 challenge dataset. RESULTS: The F1 score was maximized in the training set at a detection threshold of 7 mm³ for T2/FLAIR lesions and 14 mm³ for CE lesions. In the training set, mean F1 scores were 0.867 for T2/FLAIR lesions and 0.636 for CE lesions, as compared to 0.878 for T2/FLAIR lesions and 0.715 for CE lesions in the test set. Using these thresholds, the ANN yielded mean DICE coefficients of 0.834 and 0.878 for segmentation of T2/FLAIR and CE lesions in the training set (fivefold cross-validation). Corresponding DICE coefficients in the test set were 0.846 for T2/FLAIR lesions and 0.908 for CE lesions, and the CCC was ≥ 0.960 in each dataset. CONCLUSIONS: Our results highlight the capability of ANN for quantitative state-of-the-art assessment of volumetric lesion load on MRI and potentially enable a more accurate assessment of disease burden in patients with MS.
KEY POINTS:
• Artificial neural networks (ANN) can accurately detect and segment both T2/FLAIR and contrast-enhancing MS lesions in MRI data.
• Performance of the ANN was consistent in a clinically derived dataset, with patients presenting all possible disease stages in MRI scans acquired from standard clinical routine rather than with high-quality research sequences.
• Computer-aided evaluation of MS with ANN could streamline both clinical and research procedures in the volumetric assessment of MS disease burden as well as in lesion detection.
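
Two of the agreement measures named in this abstract, the F1 score and the concordance correlation coefficient, have standard textbook definitions that can be sketched as follows. This is a hedged illustration rather than the study's evaluation code: the function names and the example numbers are invented here, and the abstract does not say how predicted lesions were matched to reference lesions before counting true and false positives.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); NaN when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else math.nan


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (divide-by-n) variances and covariance. Unlike the
    Pearson correlation, it penalises a systematic offset between x and y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    return 2 * cov / denom if denom else math.nan


# Hypothetical detection counts: 8 lesions found, 2 spurious, 2 missed.
print(f1_score(tp=8, fp=2, fn=2))  # 0.8

# Hypothetical lesion volumes (mL): the second rater is offset by +1 mL.
reference = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
print(concordance_cc(reference, predicted))
```

In the volume example the two series are perfectly linearly related, yet the coefficient is well below 1 because of the constant offset, which is why a concordance measure is a reasonable choice for volume agreement.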


Subjects
Brain/pathology; Magnetic Resonance Imaging/methods; Multiple Sclerosis/diagnosis; Neural Networks, Computer; Adult; Female; Humans; Male; Middle Aged; Reproducibility of Results
9.
Lancet Oncol ; 20(5): 728-740, 2019 May.
Article in English | MEDLINE | ID: mdl-30952559

ABSTRACT

BACKGROUND: The Response Assessment in Neuro-Oncology (RANO) criteria and requirements for a uniform protocol have been introduced to standardise assessment of MRI scans in both clinical trials and clinical practice. However, these criteria mainly rely on manual two-dimensional measurements of contrast-enhancing (CE) target lesions and thus restrict both reliability and accurate assessment of tumour burden and treatment response. We aimed to develop a framework relying on artificial neural networks (ANNs) for fully automated quantitative analysis of MRI in neuro-oncology to overcome the inherent limitations of manual assessment of tumour burden. METHODS: In this retrospective study, we compiled a single-institution dataset of MRI data from patients with brain tumours being treated at Heidelberg University Hospital (Heidelberg, Germany; Heidelberg training dataset) to develop and train an ANN for automated identification and volumetric segmentation of CE tumours and non-enhancing T2-signal abnormalities (NEs) on MRI. Independent testing and large-scale application of the ANN for tumour segmentation was done in a single-institution longitudinal testing dataset from the Heidelberg University Hospital and in a multi-institutional longitudinal testing dataset from the prospective randomised phase 2 and 3 European Organisation for Research and Treatment of Cancer (EORTC)-26101 trial (NCT01290939), acquired at 38 institutions across Europe. In both longitudinal datasets, spatial and temporal tumour volume dynamics were automatically quantified to calculate time to progression, which was compared with time to progression determined by RANO, both in terms of reliability and as a surrogate endpoint for predicting overall survival. 
We integrated this approach for fully automated quantitative analysis of MRI in neuro-oncology within an application-ready software infrastructure and applied it in a simulated clinical environment of patients with brain tumours from the Heidelberg University Hospital (Heidelberg simulation dataset). FINDINGS: For training of the ANN, MRI data were collected from 455 patients with brain tumours (one MRI per patient) being treated at Heidelberg hospital between July 29, 2009, and March 17, 2017 (Heidelberg training dataset). For independent testing of the ANN, an independent longitudinal dataset of 40 patients, with data from 239 MRI scans, was collected at Heidelberg University Hospital in parallel with the training dataset (Heidelberg test dataset), and 2034 MRI scans from 532 patients at 34 institutions collected between Oct 26, 2011, and Dec 3, 2015, in the EORTC-26101 study were of sufficient quality to be included in the EORTC-26101 test dataset. The ANN yielded excellent performance for accurate detection and segmentation of CE tumours and NE volumes in both longitudinal test datasets (median DICE coefficient for CE tumours 0·89 [95% CI 0·86-0·90], and for NEs 0·93 [0·92-0·94] in the Heidelberg test dataset; CE tumours 0·91 [0·90-0·92], NEs 0·93 [0·93-0·94] in the EORTC-26101 test dataset). Time to progression from quantitative ANN-based assessment of tumour response was a significantly better surrogate endpoint than central RANO assessment for predicting overall survival in the EORTC-26101 test dataset (hazard ratios ANN 2·59 [95% CI 1·86-3·60] vs central RANO 2·07 [1·46-2·92]; p<0·0001) and also yielded a 36% margin over RANO (p<0·0001) when comparing reliability values (ie, agreement in the quantitative volumetrically defined time to progression [based on radiologist ground truth vs automated assessment with ANN] of 87% [266 of 306 with sufficient data] compared with 51% [155 of 306] with local vs independent central RANO assessment). 
In the Heidelberg simulation dataset, which comprised 466 patients with brain tumours, with 595 MRI scans obtained between April 27, and Sept 17, 2018, automated on-demand processing of MRI scans and quantitative tumour response assessment within the simulated clinical environment required 10 min of computation time (average per scan). INTERPRETATION: Overall, we found that ANN enabled objective and automated assessment of tumour response in neuro-oncology at high throughput and could ultimately serve as a blueprint for the application of ANN in radiology to improve clinical decision making. Future research should focus on prospective validation within clinical trials and application for automated high-throughput imaging biomarker discovery and extension to other diseases. FUNDING: Medical Faculty Heidelberg Postdoc-Program, Else Kröner-Fresenius Foundation.
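
The reliability figures quoted in this abstract can be checked with simple arithmetic. The sketch below only reproduces the percentages from the counts given in the text; the variable and function names are illustrative.

```python
def percent(k, n):
    """Return k out of n as a percentage."""
    return 100.0 * k / n


# Counts taken from the abstract (cases with sufficient data: 306).
ann_agreement = percent(266, 306)   # radiologist ground truth vs automated ANN
rano_agreement = percent(155, 306)  # local vs independent central RANO
margin = ann_agreement - rano_agreement  # difference in percentage points

print(f"ANN {ann_agreement:.1f}%, RANO {rano_agreement:.1f}%, margin {margin:.1f} points")
```

Rounded to whole numbers this gives about 87%, 51% and a 36-point difference, so the "36% margin" in the abstract reads as a difference in percentage points rather than a relative improvement.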


Subjects
Brain Neoplasms/diagnostic imaging; Brain Neoplasms/therapy; Diagnosis, Computer-Assisted; Image Interpretation, Computer-Assisted; Magnetic Resonance Imaging; Neural Networks, Computer; Automation; Brain Neoplasms/pathology; Clinical Trials, Phase II as Topic; Clinical Trials, Phase III as Topic; Databases, Factual; Disease Progression; Female; Germany; Humans; Male; Multicenter Studies as Topic; Predictive Value of Tests; Randomized Controlled Trials as Topic; Reproducibility of Results; Retrospective Studies; Time Factors; Treatment Outcome; Tumor Burden; Workflow
10.
Hum Brain Mapp ; 40(17): 4952-4964, 2019 Dec 1.
Article in English | MEDLINE | ID: mdl-31403237

ABSTRACT

Brain extraction is a critical preprocessing step in the analysis of neuroimaging studies conducted with magnetic resonance imaging (MRI) and influences the accuracy of downstream analyses. The majority of brain extraction algorithms are, however, optimized for processing healthy brains and thus frequently fail in the presence of pathologically altered brain or when applied to heterogeneous MRI datasets. Here we introduce a new, rigorously validated algorithm (termed HD-BET) relying on artificial neural networks that aim to overcome these limitations. We demonstrate that HD-BET outperforms six popular, publicly available brain extraction algorithms in several large-scale neuroimaging datasets, including one from a prospective multicentric trial in neuro-oncology, yielding state-of-the-art performance with median improvements of +1.16 to +2.50 points for the Dice coefficient and -0.66 to -2.51 mm for the Hausdorff distance. Importantly, the HD-BET algorithm, which shows robust performance in the presence of pathology or treatment-induced tissue alterations, is applicable to a broad range of MRI sequence types and is not influenced by variations in MRI hardware and acquisition parameters encountered in both research and clinical practice. For broader accessibility, the HD-BET prediction algorithm is made freely available (www.neuroAI-HD.org) and may become an essential component for robust, automated, high-throughput processing of MRI neuroimaging data.
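
For readers unfamiliar with the Hausdorff distance reported in this abstract, a minimal brute-force sketch of the textbook definition follows. It is not the HD-BET evaluation code: the function names and the toy coordinates are assumptions, and the abstract does not state whether the maximum or a percentile variant (for example the 95th percentile) was used.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy surface points in millimetres (hypothetical coordinates).
predicted_surface = [(0.0, 0.0), (1.0, 0.0)]
reference_surface = [(0.0, 0.0), (4.0, 0.0)]

print(hausdorff(predicted_surface, reference_surface))  # 3.0
```

The measure is driven by the single worst-matched point, so a lower value means the worst local boundary error is smaller. The brute-force loop is quadratic in the number of points and is meant for illustration only.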


Subjects
Brain/diagnostic imaging; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging; Neural Networks, Computer; Algorithms; Humans; Neuroimaging/methods
11.
Adv Mater ; 36(7): e2307160, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37904613

ABSTRACT

Large-area processing of perovskite semiconductor thin-films is complex and evokes unexplained variance in quality, posing a major hurdle for the commercialization of perovskite photovoltaics. Advances in scalable fabrication processes are currently limited to gradual and arbitrary trial-and-error procedures. While the in situ acquisition of photoluminescence (PL) videos has the potential to reveal important variations in the thin-film formation process, the high dimensionality of the data quickly surpasses the limits of human analysis. In response, this study leverages deep learning (DL) and explainable artificial intelligence (XAI) to discover relationships between sensor information acquired during the perovskite thin-film formation process and the resulting solar cell performance indicators, while rendering these relationships humanly understandable. The study further shows how gained insights can be distilled into actionable recommendations for perovskite thin-film processing, advancing toward industrial-scale solar cell manufacturing. This study demonstrates that XAI methods will play a critical role in accelerating energy materials science.

12.
ArXiv ; 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-36945687

ABSTRACT

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

13.
Med Image Anal ; 90: 102927, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37672900

ABSTRACT

Performance metrics for medical image segmentation models are used to measure the agreement between the reference annotation and the predicted segmentation. Usually, overlap metrics, such as the Dice coefficient, are used to evaluate the performance of these models in order for results to be comparable. However, there is a mismatch between the distributions of cases and the difficulty level of segmentation tasks in public data sets compared to clinical practice. Common metrics used to assess performance fail to capture the impact of this mismatch, particularly when dealing with datasets in clinical settings that involve challenging segmentation tasks, pathologies with low signal, and reference annotations that are uncertain, small, or empty. Limitations of common metrics may result in ineffective machine learning research in designing and optimizing models. To effectively evaluate the clinical value of such models, it is essential to consider factors such as the uncertainty associated with reference annotations, the ability to accurately measure performance regardless of the size of the reference annotation volume, and the classification of cases where reference annotations are empty. We study how uncertain, small, and empty reference annotations influence the value of metrics on a stroke in-house data set regardless of the model. We examine metric behavior on the predictions of a standard deep learning framework in order to identify suitable metrics in such a setting. We compare our results to the BRATS 2019 and Spinal Cord public data sets. We show how uncertain, small, or empty reference annotations require a rethinking of the evaluation. The evaluation code was released to encourage further analysis of this topic: https://github.com/SophieOstmeier/UncertainSmallEmpty.git.
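
The difficulty with small and empty reference annotations can be illustrated with the Dice coefficient itself. The sketch below is not the released evaluation code. It assumes masks are represented as sets of voxel indices and that an undefined 0/0 case is returned as `None`; both choices are made here for illustration.

```python
def dice(pred, ref):
    """Dice = 2*|pred ∩ ref| / (|pred| + |ref|).

    Returns None when both masks are empty, because the ratio is 0/0
    and the value is a convention rather than a measurement.
    """
    total = len(pred) + len(ref)
    if total == 0:
        return None
    return 2 * len(pred & ref) / total


# Large structure: 1000 voxels, prediction shifted by one voxel.
large_ref = set(range(1000))
large_pred = set(range(1, 1001))

# Small structure: a single voxel, prediction shifted by one voxel.
small_ref = {0}
small_pred = {1}

# Empty reference with one false-positive voxel, and with none.
empty_ref = set()

print(dice(large_pred, large_ref))  # 0.999
print(dice(small_pred, small_ref))  # 0.0
print(dice({5}, empty_ref))         # 0.0
print(dice(set(), empty_ref))       # None
```

The same one-voxel shift costs almost nothing for the large structure and everything for the small one, and any prediction on an empty reference scores zero regardless of how many false-positive voxels it contains. Cases like these are the reason the abstract argues for treating empty references as a separate classification question.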

14.
J Nucl Med ; 64(10): 1594-1602, 2023 10.
Article in English | MEDLINE | ID: mdl-37562802

ABSTRACT

Evaluation of metabolic tumor volume (MTV) changes using amino acid PET has become an important tool for response assessment in brain tumor patients. MTV is usually determined by manual or semiautomatic delineation, which is laborious and may be prone to intra- and interobserver variability. The goal of our study was to develop a method for automated MTV segmentation and to evaluate its performance for response assessment in patients with gliomas. Methods: In total, 699 amino acid PET scans using the tracer O-(2-[18F]fluoroethyl)-l-tyrosine (18F-FET) from 555 brain tumor patients at initial diagnosis or during follow-up were retrospectively evaluated (mainly glioma patients, 76%). 18F-FET PET MTVs were segmented semiautomatically by experienced readers. An artificial neural network (no new U-Net) was configured on 476 scans from 399 patients, and the network performance was evaluated on a test dataset including 223 scans from 156 patients. Surface and volumetric Dice similarity coefficients (DSCs) were used to evaluate segmentation quality. Finally, the network was applied to a recently published 18F-FET PET study on response assessment in glioblastoma patients treated with adjuvant temozolomide chemotherapy for a fully automated response assessment in comparison to an experienced physician. Results: In the test dataset, 92% of lesions with increased uptake (n = 189) and 85% of lesions with iso- or hypometabolic uptake (n = 33) were correctly identified (F1 score, 92%). Single lesions with a contiguous uptake had the highest DSC, followed by lesions with heterogeneous, noncontiguous uptake and multifocal lesions (surface DSC: 0.96, 0.93, and 0.81 respectively; volume DSC: 0.83, 0.77, and 0.67, respectively). Change in MTV, as detected by the automated segmentation, was a significant determinant of disease-free and overall survival, in agreement with the physician's assessment. 
Conclusion: Our deep learning-based 18F-FET PET segmentation allows reliable, robust, and fully automated evaluation of MTV in brain tumor patients and demonstrates clinical value for automated response assessment.


Subjects
Brain Neoplasms; Glioma; Humans; Amino Acids; Retrospective Studies; Brain Neoplasms/diagnostic imaging; Brain Neoplasms/therapy; Glioma/pathology; Radiopharmaceuticals/therapeutic use; Tyrosine; Positron-Emission Tomography/methods
15.
Urology ; 180: 160-167, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37517681

ABSTRACT

OBJECTIVE: To determine whether we can surpass the ability of the traditional R.E.N.A.L. nephrometry score (H-score) to predict pathologic outcomes by creating an artificial intelligence (AI)-generated R.E.N.A.L.+ score (AI+ score) with continuous rather than ordinal components. We also assessed the relative importance of the AI+ score components with respect to outcome odds. METHODS: This is a retrospective study of 300 consecutive patients with preoperative computed tomography scans showing suspected renal cancer at a single institution from 2010 to 2018. The H-score was tabulated by three trained medical personnel. A deep neural network approach automatically generated kidney segmentation masks of parenchyma and tumor. Geometric algorithms were used to automatically estimate score components as ordinal and continuous variables. Multivariate logistic regression of continuous R.E.N.A.L. components was used to generate the AI+ score. Predictive utility was compared between AI+, AI, and H-scores for variables of interest, and the relative importance of the AI+ score components was assessed. RESULTS: Median age was 60 years (interquartile range 51-68), and 40% were female. Median tumor size was 4.2 cm (2.6-6.12), and 92% were malignant, including 27%, 37%, and 23% with high stage, high grade, and necrosis, respectively. The AI+ score demonstrated superior predictive ability over AI and H-scores for predicting malignancy (area under the curve [AUC] 0.69 vs 0.67 vs 0.64, respectively), high stage (AUC 0.82 vs 0.65 vs 0.71, respectively), high grade (AUC 0.78 vs 0.65 vs 0.65, respectively), pathologic tumor necrosis (AUC 0.81 vs 0.72 vs 0.74, respectively), and partial nephrectomy approach (AUC 0.88 vs 0.74 vs 0.79, respectively). Of the AI+ score components, the maximal tumor diameter ("R") was the most important predictor of outcomes. CONCLUSION: The AI+ score was superior to the AI-score and H-score in predicting oncologic outcomes. The time-efficient AI+ score can be used at the point of care, surpassing validated clinical scoring systems.

16.
Med Image Anal ; 86: 102765, 2023 05.
Article in English | MEDLINE | ID: mdl-36965252

ABSTRACT

Challenges have become the state-of-the-art approach to benchmark image analysis algorithms in a comparative manner. While the validation on identical data sets was a great step forward, results analysis is often restricted to pure ranking tables, leaving relevant questions unanswered. Specifically, little effort has been put into the systematic investigation of what characterizes images in which state-of-the-art algorithms fail. To address this gap in the literature, we (1) present a statistical framework for learning from challenges and (2) instantiate it for the specific task of instrument instance segmentation in laparoscopic videos. Our framework relies on the semantic meta data annotation of images, which serves as the foundation for a General Linear Mixed Models (GLMM) analysis. Based on 51,542 meta data annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation (ROBUST-MIS) Challenge 2019 and revealed underexposure, motion and occlusion of instruments as well as the presence of smoke or other objects in the background as major sources of algorithm failure. Our subsequent method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in the processing of images in which previous methods tended to fail. Due to the objectivity and generic applicability of our approach, it could become a valuable tool for validation in the field of medical image analysis and beyond.


Subjects
Algorithms, Laparoscopy, Humans, Image Processing, Computer-Assisted/methods
17.
Sci Rep ; 13(1): 19805, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37957250

ABSTRACT

Prostate cancer (PCa) diagnosis on multi-parametric magnetic resonance images (MRI) requires radiologists with a high level of expertise. Misalignments between the MRI sequences can be caused by patient movement, elastic soft-tissue deformations, and imaging artifacts. These misalignments further complicate the radiologists' task of interpreting the images. Recently, computer-aided diagnosis (CAD) tools have demonstrated potential for PCa diagnosis, typically relying on complex co-registration of the input modalities. However, there is no consensus among research groups on whether CAD systems profit from using registration. Furthermore, alternative strategies to handle multi-modal misalignments have not been explored so far. Our study introduces and compares different strategies to cope with image misalignments and evaluates them with regard to their direct effect on the diagnostic accuracy for PCa. In addition to established registration algorithms, we propose 'misalignment augmentation' as a concept to increase CAD robustness. As the results demonstrate, misalignment augmentations can not only compensate for a complete lack of registration but, if used in conjunction with registration, also improve the overall performance on an independent test set.


Subjects
Prostate, Prostatic Neoplasms, Male, Humans, Prostate/diagnostic imaging, Prostate/pathology, Magnetic Resonance Imaging/methods, Diagnosis, Computer-Assisted/methods, Prostatic Neoplasms/diagnostic imaging, Prostatic Neoplasms/pathology, Computers
18.
Neuro Oncol ; 25(3): 533-543, 2023 03 14.
Article in English | MEDLINE | ID: mdl-35917833

ABSTRACT

BACKGROUND: To assess whether artificial intelligence (AI)-based decision support allows more reproducible and standardized assessment of treatment response on MRI in neuro-oncology as compared to manual 2-dimensional measurements of tumor burden using the Response Assessment in Neuro-Oncology (RANO) criteria. METHODS: A series of 30 patients (15 lower-grade gliomas, 15 glioblastoma) with availability of consecutive MRI scans was selected. The time to progression (TTP) on MRI was separately evaluated for each patient by 15 investigators over two rounds. In the first round the TTP was evaluated based on the RANO criteria, whereas in the second round the TTP was evaluated by incorporating additional information from AI-enhanced MRI sequences depicting the longitudinal changes in tumor volumes. The agreement of the TTP measurements between investigators was evaluated using concordance correlation coefficients (CCC) with confidence intervals (CI) and P-values obtained using bootstrap resampling. RESULTS: The CCC of TTP measurements between investigators was 0.77 (95% CI = 0.69, 0.88) with RANO alone and increased to 0.91 (95% CI = 0.82, 0.95) with AI-based decision support (P = .005). This effect was significantly greater (P = .008) for patients with lower-grade gliomas (CCC = 0.70 [95% CI = 0.56, 0.85] without vs. 0.90 [95% CI = 0.76, 0.95] with AI-based decision support) as compared to glioblastoma (CCC = 0.83 [95% CI = 0.75, 0.92] without vs. 0.86 [95% CI = 0.78, 0.93] with AI-based decision support). Investigators with fewer years of experience judged the AI-based decision support as more helpful (P = .02). CONCLUSIONS: AI-based decision support has the potential to yield more reproducible and standardized assessment of treatment response in neuro-oncology as compared to manual 2-dimensional measurements of tumor burden, particularly in patients with lower-grade gliomas. A fully functional version of this AI-based processing pipeline is provided as open source (https://github.com/NeuroAI-HD/HD-GLIO-XNAT).


Subjects
Brain Neoplasms, Glioblastoma, Glioma, Humans, Glioblastoma/pathology, Brain Neoplasms/diagnostic imaging, Brain Neoplasms/therapy, Brain Neoplasms/pathology, Artificial Intelligence, Reproducibility of Results, Glioma/diagnostic imaging, Glioma/therapy, Glioma/pathology
19.
J Appl Crystallogr ; 55(Pt 3): 444-454, 2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35719305

ABSTRACT

Single particle imaging (SPI) at X-ray free-electron lasers is particularly well suited to determining the 3D structure of particles at room temperature. For a successful reconstruction, diffraction patterns originating from a single hit must be isolated from a large number of acquired patterns. It is proposed that this task could be formulated as an image-classification problem and solved using convolutional neural network (CNN) architectures. Two CNN configurations are developed: one that maximizes the F1 score and one that emphasizes high recall. The CNNs are also combined with expectation-maximization (EM) selection as well as size filtering. It is observed that the CNN selections have lower contrast in power spectral density functions relative to the EM selection used in previous work. However, the reconstruction of the CNN-based selections gives similar results. Introducing CNNs into SPI experiments allows the reconstruction pipeline to be streamlined, enables researchers to classify patterns on the fly, and, as a consequence, enables them to tightly control the duration of their experiments. Incorporating non-standard artificial-intelligence-based solutions into an existing SPI analysis workflow may be beneficial for the future development of SPI experiments.

20.
Photoacoustics ; 26: 100341, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35371919

ABSTRACT

Photoacoustic (PA) imaging has the potential to revolutionize functional medical imaging in healthcare due to the valuable information on tissue physiology contained in multispectral photoacoustic measurements. Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information. In this work, we present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images to facilitate image interpretability. Manually annotated photoacoustic and ultrasound imaging data are used as reference and enable the training of a deep learning-based segmentation algorithm in a supervised manner. Based on a validation study with experimentally acquired data from 16 healthy human volunteers, we show that automatic tissue segmentation can be used to create powerful analyses and visualizations of multispectral photoacoustic images. Due to the intuitive representation of high-dimensional information, such a preprocessing algorithm could be a valuable means to facilitate the clinical translation of photoacoustic imaging.
