Results 1 - 20 of 36
1.
J Med Imaging (Bellingham) ; 11(1): 014501, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38283653

ABSTRACT

Purpose: Understanding an artificial intelligence (AI) model's ability to generalize to its target population is critical to ensuring the safe and effective usage of AI in medical devices. A traditional generalizability assessment relies on the availability of large, diverse datasets, which are difficult to obtain in many medical imaging applications. We present an approach for enhanced generalizability assessment by examining the decision space beyond the available testing data distribution. Approach: Vicinal distributions of virtual samples are generated by interpolating between triplets of test images. The generated virtual samples leverage the characteristics already in the test set, increasing the sample diversity while remaining close to the AI model's data manifold. We demonstrate the generalizability assessment approach on the non-clinical tasks of classifying patient sex, race, COVID status, and age group from chest x-rays. Results: Decision region composition analysis for generalizability indicated that a disproportionately large portion of the decision space belonged to a single "preferred" class for each task, despite comparable performance on the evaluation dataset. Evaluation using cross-reactivity and population shift strategies indicated a tendency to overpredict samples as belonging to the preferred class (e.g., COVID negative) for patients whose subgroup was not represented in the model development data. Conclusions: An analysis of an AI model's decision space has the potential to provide insight into model generalizability. Our approach uses the analysis of composition of the decision space to obtain an improved assessment of model generalizability in the case of limited test data.
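
The interpolation step in the abstract above can be illustrated with a minimal sketch. The abstract states only that virtual samples are interpolated between triplets of test images; the Dirichlet weighting, the function names, and the toy classifier below are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def vicinal_samples(a, b, c, n_samples, rng):
    """Draw virtual samples as convex combinations of three images."""
    # Assumption: Dirichlet(1, 1, 1) weights, which are non-negative and
    # sum to 1, so every virtual sample lies inside the triangle a-b-c.
    weights = rng.dirichlet(np.ones(3), size=n_samples)  # (n_samples, 3)
    triplet = np.stack([a, b, c])                         # (3, H, W)
    samples = np.tensordot(weights, triplet, axes=1)      # (n_samples, H, W)
    return samples, weights

def region_composition(samples, predict):
    """Fraction of virtual samples assigned to each predicted class."""
    labels = predict(samples)
    classes, counts = np.unique(labels, return_counts=True)
    return dict(zip(classes.tolist(), (counts / counts.sum()).tolist()))

rng = np.random.default_rng(0)
a, b, c = (rng.random((8, 8)) for _ in range(3))  # stand-ins for test images
samples, weights = vicinal_samples(a, b, c, 500, rng)

# Hypothetical classifier: thresholds the mean pixel intensity.
toy_predict = lambda x: (x.mean(axis=(1, 2)) > 0.5).astype(int)
composition = region_composition(samples, toy_predict)
```

In this sketch, a composition dominated by one class would correspond to the "preferred" class behavior that the abstract reports.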

2.
J Clin Transl Sci ; 7(1): e212, 2023.
Article in English | MEDLINE | ID: mdl-37900353

ABSTRACT

Increasing emphasis on the use of real-world evidence (RWE) to support clinical policy and regulatory decision-making has led to a proliferation of guidance, advice, and frameworks from regulatory agencies, academia, professional societies, and industry. A broad spectrum of studies use real-world data (RWD) to produce RWE, ranging from randomized trials with outcomes assessed using RWD to fully observational studies. Yet, many proposals for generating RWE lack sufficient detail, and many analyses of RWD suffer from implausible assumptions, other methodological flaws, or inappropriate interpretations. The Causal Roadmap is an explicit, itemized, iterative process that guides investigators to prespecify study design and analysis plans; it addresses a wide range of guidance within a single framework. By supporting the transparent evaluation of causal assumptions and facilitating objective comparisons of design and analysis choices based on prespecified criteria, the Roadmap can help investigators to evaluate the quality of evidence that a given study is likely to produce, specify a study to generate high-quality RWE, and communicate effectively with regulatory agencies and other stakeholders. This paper aims to disseminate and extend the Causal Roadmap framework for use by clinical and translational researchers; three companion papers demonstrate applications of the Causal Roadmap for specific use cases.

3.
Ther Innov Regul Sci ; 57(3): 453-463, 2023 05.
Article in English | MEDLINE | ID: mdl-36869194

ABSTRACT

The use of Bayesian statistics to support regulatory evaluation of medical devices began in the late 1990s. We review the literature, focusing on recent developments of Bayesian methods, including hierarchical modeling of studies and subgroups, borrowing strength from prior data, effective sample size, Bayesian adaptive designs, pediatric extrapolation, benefit-risk decision analysis, use of real-world evidence, and diagnostic device evaluation. We illustrate how these developments were utilized in recent medical device evaluations. In Supplementary Material, we provide a list of medical devices for which Bayesian statistics were used to support approval by the US Food and Drug Administration (FDA), including those since 2010, the year the FDA published their guidance on Bayesian statistics for medical devices. We conclude with a discussion of current and future challenges and opportunities for Bayesian statistics, including artificial intelligence/machine learning (AI/ML) Bayesian modeling, uncertainty quantification, Bayesian approaches using propensity scores, and computational challenges for high dimensional data and models.


Subjects
Artificial Intelligence; Research Design; United States; Humans; Child; Bayes Theorem; Sample Size; United States Food and Drug Administration
4.
J Biopharm Stat ; 33(5): 611-638, 2023 09 03.
Article in English | MEDLINE | ID: mdl-36710380

ABSTRACT

A limitation of the common measures of diagnostic test performance, such as sensitivity and specificity, is that they do not consider the relative importance of false negative and false positive test results, which are likely to have different clinical consequences. Therefore, the use of classification or prediction measures alone to compare diagnostic tests or biomarkers can be inconclusive for clinicians. Comparing tests on net benefit can be more conclusive because clinical consequences of misdiagnoses are considered. The literature has suggested evaluating binary diagnostic tests based on net benefit, but has not considered diagnostic tests that classify more than two disease states, e.g., stroke subtype (large-artery atherosclerosis, cardioembolism, small-vessel occlusion, stroke of other determined etiology, stroke of undetermined etiology), skin lesion subtype, breast cancer subtypes (benign, mass, calcification, architectural distortion, etc.), METAVIR liver fibrosis state (F0-F4), histopathological classification of cervical intraepithelial neoplasia (CIN), prostate Gleason grade, and brain injury (intracranial hemorrhage, mass effect, midline shift, cranial fracture). Other diseases have more than two stages, such as Alzheimer's disease (dementia due to Alzheimer's disease, mild cognitive impairment (MCI) due to Alzheimer's disease, and preclinical (presymptomatic) Alzheimer's disease). In diseases with more than two states, the benefits and risks may vary between states. This paper extends the net-benefit approach of evaluating binary diagnostic tests to multi-state clinical conditions to rule in or rule out a clinical condition based on adverse consequences of work-up delay (due to a false negative test result) and unnecessary work-up (due to a false positive test result). We demonstrate our approach with numerical examples and real data.


Subjects
Alzheimer Disease; Cognitive Dysfunction; Stroke; Male; Humans; Alzheimer Disease/diagnosis; Cognitive Dysfunction/diagnosis; Sensitivity and Specificity; Stroke/diagnosis; Diagnostic Tests, Routine; Neuropsychological Tests
5.
Acad Radiol ; 30(2): 215-229, 2023 02.
Article in English | MEDLINE | ID: mdl-36411153

ABSTRACT

This paper is the fifth in a five-part series on statistical methodology for performance assessment of multi-parametric quantitative imaging biomarkers (mpQIBs) for radiomic analysis. Radiomics is the process of extracting visually imperceptible features from radiographic medical images using data-driven algorithms. We refer to the radiomic features as data-driven imaging markers (DIMs), which are quantitative measures discovered under a data-driven framework from images beyond visual recognition but evident as patterns of disease processes irrespective of whether or not ground truth exists for the true value of the DIM. This paper aims to set guidelines on how to build machine learning models using DIMs in radiomics and to apply and report them appropriately. We provide a list of recommendations, named RANDAM (an abbreviation of "Radiomic ANalysis and DAta Modeling"), for analysis, modeling, and reporting in a radiomic study to make machine learning analyses in radiomics more reproducible. RANDAM contains five main components to use in reporting radiomics studies: design, data preparation, data analysis and modeling, reporting, and material availability. Real case studies in lung cancer research are presented along with simulation studies to compare different feature selection methods and several validation strategies.


Subjects
Lung Neoplasms; Multiparametric Magnetic Resonance Imaging; Humans; ROC Curve; Multiparametric Magnetic Resonance Imaging/methods; Diagnostic Imaging; Lung Neoplasms/diagnostic imaging; Lung
6.
Acad Radiol ; 30(2): 196-214, 2023 02.
Article in English | MEDLINE | ID: mdl-36273996

ABSTRACT

Combinations of multiple quantitative imaging biomarkers (QIBs) are often able to predict the likelihood of an event of interest such as death or disease recurrence more effectively than single imaging measurements can alone. The development of such multiparametric quantitative imaging and evaluation of its fitness of use differs from the analogous processes for individual QIBs in several key aspects. A computational procedure to combine the QIB values into a model output must be specified. The output must also be reproducible and be shown to have reasonably strong ability to predict the risk of an event of interest. Attention must be paid to statistical issues not often encountered in the single QIB scenario, including overfitting and bias in the estimates of model performance. This is the fourth in a five-part series on statistical methodology for assessing the technical performance of multiparametric quantitative imaging. Considerations for data acquisition are discussed and recommendations from the literature on methodology to construct and evaluate QIB-based models for risk prediction are summarized. The findings in the literature upon which these recommendations are based are demonstrated through simulation studies. The concepts in this manuscript are applied to a real-life example involving prediction of major adverse cardiac events using automated plaque analysis.


Subjects
Diagnostic Imaging; Humans; Diagnostic Imaging/methods; Biomarkers; Computer Simulation
7.
Acad Radiol ; 30(2): 159-182, 2023 02.
Article in English | MEDLINE | ID: mdl-36464548

ABSTRACT

Multiparametric quantitative imaging biomarkers (QIBs) offer distinct advantages over single, univariate descriptors because they provide a more complete measure of complex, multidimensional biological systems. In disease, where structural and functional disturbances occur across a multitude of subsystems, multivariate QIBs are needed to measure the extent of system malfunction. This paper, the first Use Case in a series of articles on multiparameter imaging biomarkers, considers multiple QIBs as a multidimensional vector to represent all relevant disease constructs more completely. The approach proposed offers several advantages over QIBs as multiple endpoints and avoids combining them into a single composite that obscures the medical meaning of the individual measurements. We focus on establishing statistically rigorous methods to create a single, simultaneous measure from multiple QIBs that preserves the sensitivity of each univariate QIB while incorporating the correlation among QIBs. Details are provided for metrological methods to quantify the technical performance. Methods to reduce the set of QIBs, test the superiority of the mp-QIB model to any univariate QIB model, and design study strategies for generating precision and validity claims are also provided. QIBs of Alzheimer's Disease from the ADNI merge data set are used as a case study to illustrate the methods described.


Subjects
Alzheimer Disease; Diagnostic Imaging; Humans; Diagnostic Imaging/methods; Biomarkers; Alzheimer Disease/diagnostic imaging
8.
Acad Radiol ; 30(2): 183-195, 2023 02.
Article in English | MEDLINE | ID: mdl-36202670

ABSTRACT

This manuscript is the third in a five-part series related to statistical assessment methodology for technical performance of multi-parametric quantitative imaging biomarkers (mp-QIBs). We outline approaches and statistical methodologies for developing and evaluating a phenotype classification model from a set of multiparametric QIBs. We then describe validation studies of the classifier for precision, diagnostic accuracy, and interchangeability with a comparator classifier. We follow with an end-to-end real-world example of development and validation of a classifier for atherosclerotic plaque phenotypes. We consider diagnostic accuracy and interchangeability to be clinically meaningful claims for a phenotype classification model informed by mp-QIB inputs, aiming to provide tools to demonstrate agreement between imaging-derived characteristics and clinically established phenotypes. Understanding that we are working in an evolving field, we close our manuscript with an acknowledgement of existing challenges and a discussion of where additional work is needed. In particular, we discuss the challenges involved with technical performance and analytical validation of mp-QIBs. We intend for this manuscript to further advance the robust and promising science of multiparametric biomarker development.


Subjects
Diagnostic Imaging; Diagnostic Imaging/methods; Biomarkers; Phenotype
9.
Acad Radiol ; 30(2): 147-158, 2023 02.
Article in English | MEDLINE | ID: mdl-36180328

ABSTRACT

Multiparameter quantitative imaging incorporates anatomical, functional, and/or behavioral biomarkers to characterize tissue, detect disease, identify phenotypes, define longitudinal change, or predict outcome. Multiple imaging parameters are sometimes considered separately but ideally are evaluated collectively. Often, they are transformed as Likert interpretations, ignoring the correlations of quantitative properties that may result in better reproducibility or outcome prediction. In this paper we present three use cases of multiparameter quantitative imaging: i) multidimensional descriptor, ii) phenotype classification, and iii) risk prediction. A fourth application based on data-driven markers from radiomics is also presented. We describe the technical performance characteristics and their metrics common to all use cases, and provide a structure for the development, estimation, and testing of multiparameter quantitative imaging. This paper serves as an overview for a series of individual articles on the four applications, providing the statistical framework for multiparameter imaging applications in medicine.


Subjects
Diagnostic Imaging; Reproducibility of Results; Diagnostic Imaging/methods; Biomarkers; Phenotype
11.
JCO Precis Oncol ; 6: e2100372, 2022 08.
Article in English | MEDLINE | ID: mdl-35952319

ABSTRACT

PURPOSE: As immune checkpoint inhibitors (ICI) become increasingly used in frontline settings, early indicators of response are needed. Recent studies suggest a role for circulating tumor DNA (ctDNA) in monitoring response to ICI, but the generalizability of these studies is uncertain. Here, the role of ctDNA for monitoring response to ICI is assessed through a standardized approach by assessing clinical trial data from five independent studies. PATIENTS AND METHODS: Patient-level clinical and ctDNA data were pooled and harmonized from 200 patients across five independent clinical trials investigating the treatment of patients with non-small-cell lung cancer with programmed cell death-1 (PD-1)/programmed death ligand-1 (PD-L1)-directed monotherapy or in combination with chemotherapy. CtDNA levels were measured using different ctDNA assays across the studies. Maximum variant allele frequencies were calculated using all somatic tumor-derived variants in each unique patient sample to correlate ctDNA changes with overall survival (OS) and progression-free survival (PFS). RESULTS: We observed strong associations between reductions in ctDNA levels from on-treatment liquid biopsies and improved OS (hazard ratio, 2.28; 95% CI, 1.62 to 3.20; P < .001) and PFS (hazard ratio, 1.76; 95% CI, 1.31 to 2.36; P < .001). Changes in the maximum variant allele frequency ctDNA values showed strong associations across the different outcomes. CONCLUSION: In this pooled analysis of five independent clinical trials, consistent and robust associations between reductions in ctDNA and outcomes were found across multiple end points assessed in patients with non-small-cell lung cancer treated with an ICI. Additional tumor types, stages, and drug classes should be included in future analyses to further validate this. CtDNA may serve as an important tool in clinical development and an early indicator of treatment benefit.


Subjects
Antineoplastic Agents, Immunological; Carcinoma, Non-Small-Cell Lung; Circulating Tumor DNA; Lung Neoplasms; Antineoplastic Agents, Immunological/therapeutic use; Biomarkers, Tumor/genetics; Carcinoma, Non-Small-Cell Lung/drug therapy; Circulating Tumor DNA/genetics; Clinical Trials as Topic; Humans; Immune Checkpoint Inhibitors/pharmacology; Lung Neoplasms/drug therapy; Prognosis
12.
Cancer ; 128 Suppl 4: 883-891, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35133658

ABSTRACT

Multicancer screening is a promising approach to improving the detection of preclinical disease, but current technologies have limited ability to identify precursor or early stage lesions, and approaches for developing the evidentiary chain are unclear. Frameworks to enable development and evaluation from discovery through evidence of clinical effectiveness are discussed.


Subjects
Early Detection of Cancer; Neoplasms; Humans; Mass Screening; Neoplasms/diagnosis
13.
Biom J ; 64(2): 225-234, 2022 02.
Article in English | MEDLINE | ID: mdl-33377537

ABSTRACT

In their paper, Liu et al. (2020) pointed out illogical discrepancies between subgroup and overall causal effects for some efficacy measures, in particular the odds and hazard ratios. As the authors show, the culprit is subgroups having prognostic effects within treatment arms. In response to their provocative findings, we found that the odds and hazard ratios are logic-respecting when the subgroups are purely predictive, that is, when the distribution of the potential outcome for the control treatment is homogeneous across subgroups. We also found that when we redefined the odds and hazard ratio causal estimands in terms of the joint distribution of the potential outcomes, the discrepancies were resolved under specific models in which the potential outcomes are conditionally independent. In response to other discussion points in the paper, we also provide remarks on association versus causation, confounding, statistical computing software, and dichotomania.


Subjects
Logic; Software; Plant Extracts; Proportional Hazards Models; Randomized Controlled Trials as Topic
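
The discrepancy discussed in the abstract above can be reproduced with a small arithmetic sketch for the odds ratio. All response probabilities below are hypothetical, the two subgroups are assumed to be of equal size, and "logic-respecting" is taken here to mean that the overall effect lies within the range of the subgroup effects.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def odds_ratio(p_treated, p_control):
    return odds(p_treated) / odds(p_control)

def summarize(subgroups):
    """subgroups: list of (p_control, p_treated) for equal-sized subgroups."""
    per_subgroup = [odds_ratio(p1, p0) for p0, p1 in subgroups]
    # With equal subgroup sizes, the overall response probability in each
    # arm is the simple average of the subgroup probabilities.
    overall_p0 = sum(p0 for p0, _ in subgroups) / len(subgroups)
    overall_p1 = sum(p1 for _, p1 in subgroups) / len(subgroups)
    return per_subgroup, odds_ratio(overall_p1, overall_p0)

# Prognostic subgroups: control response differs (0.10 vs 0.50).
prognostic_ors, prognostic_overall = summarize([(0.10, 0.25), (0.50, 0.75)])

# Purely predictive subgroups: control response is the same (0.30) in both.
predictive_ors, predictive_overall = summarize([(0.30, 0.50), (0.30, 0.70)])
```

In the prognostic case both subgroup odds ratios equal 3 while the overall odds ratio is 7/3, which falls outside the subgroup range. In the purely predictive case the overall odds ratio falls between the two subgroup values.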
14.
Pharm Stat ; 20(5): 965-978, 2021 09.
Article in English | MEDLINE | ID: mdl-33942971

ABSTRACT

How do we communicate nuanced regulatory information to different audiences, recognizing that the consumer audience is very different from the physician audience? In particular, how do we communicate the heterogeneity of treatment effects - the potential differences in treatment effects based on sex, race, and age? That is a fundamental question at the heart of this panel discussion. Each panelist addressed a specific "challenge question" during their 5-minute presentation, and the list of questions is provided. The presentations were followed by a question and answer session with members of the audience and the panelists.

16.
Stat Methods Med Res ; 26(3): 1373-1388, 2017 Jun.
Article in English | MEDLINE | ID: mdl-25847911

ABSTRACT

Diagnostic tests are often compared in multi-reader multi-case (MRMC) studies in which a number of cases (subjects with or without the disease in question) are examined by several readers using all tests to be compared. One of the commonly used methods for analyzing MRMC data is the Obuchowski-Rockette (OR) method, which assumes that the true area under the receiver operating characteristic curve (AUC) for each combination of reader and test follows a linear mixed model with fixed effects for test and random effects for reader and the reader-test interaction. This article proposes generalized linear mixed models which generalize the OR model by incorporating a range-appropriate link function that constrains the true AUCs to the unit interval. The proposed models can be estimated by maximizing a pseudo-likelihood based on the approximate normality of AUC estimates. A Monte Carlo expectation-maximization algorithm can be used to maximize the pseudo-likelihood, and a non-parametric bootstrap procedure can be used for inference. The proposed method is evaluated in a simulation study and applied to an MRMC study of breast cancer detection.


Subjects
Breast Neoplasms/diagnosis; Diagnostic Tests, Routine/methods; Linear Models; Algorithms; Area Under Curve; Female; Humans; Likelihood Functions; Monte Carlo Method; ROC Curve
17.
J Biopharm Stat ; 26(6): 1083-1097, 2016.
Article in English | MEDLINE | ID: mdl-27548805

ABSTRACT

Comparing diagnostic tests on accuracy alone can be inconclusive. For example, a test may have better sensitivity than another test yet worse specificity. Comparing tests on benefit-risk may be more conclusive because clinical consequences of diagnostic error are considered. For benefit-risk evaluation, we propose diagnostic yield, the expected distribution of subjects with true positive, false positive, true negative, and false negative test results in a hypothetical population. We construct a table of diagnostic yield that includes the number of false positive subjects experiencing adverse consequences from unnecessary work-up. We then develop a decision theory for evaluating tests. The theory provides additional interpretation to quantities in the diagnostic yield table. It also indicates that the expected utility of a test relative to a perfect test is a weighted accuracy measure, the average of sensitivity and specificity weighted for prevalence and relative importance of false positive and false negative testing errors, also interpretable as the cost-benefit ratio of treating non-diseased and diseased subjects. We propose plots of diagnostic yield, weighted accuracy, and relative net benefit of tests as functions of prevalence or cost-benefit ratio. Concepts are illustrated with hypothetical screening tests for colorectal cancer with test-positive subjects being referred to colonoscopy.


Subjects
Diagnostic Tests, Routine; Risk Assessment; Colonoscopy; Colorectal Neoplasms/diagnosis; False Negative Reactions; False Positive Reactions; Humans; Prevalence; Sensitivity and Specificity
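
The diagnostic yield described in the abstract above, the expected counts of true positive, false positive, true negative, and false negative results in a hypothetical population, follows directly from sensitivity, specificity, and prevalence. The sketch below is illustrative only: the two tests, their accuracy values, the prevalence, and the population size are hypothetical, and the paper's weighted accuracy and net benefit measures are not reproduced.

```python
def diagnostic_yield(sensitivity, specificity, prevalence, population=1000):
    """Expected test-result counts in a hypothetical population."""
    diseased = population * prevalence
    healthy = population - diseased
    return {
        "TP": sensitivity * diseased,        # diseased, test positive
        "FN": (1 - sensitivity) * diseased,  # diseased, test negative
        "TN": specificity * healthy,         # non-diseased, test negative
        "FP": (1 - specificity) * healthy,   # non-diseased, test positive
    }

# Hypothetical tests: A is more sensitive, B is more specific.
yield_a = diagnostic_yield(sensitivity=0.90, specificity=0.80, prevalence=0.05)
yield_b = diagnostic_yield(sensitivity=0.80, specificity=0.90, prevalence=0.05)

for name, table in (("A", yield_a), ("B", yield_b)):
    print(name, {k: round(v, 1) for k, v in table.items()})
```

Under these assumed values, test A finds 5 more true positives per 1000 subjects than test B at the price of 95 more false positives, which is the kind of trade-off the diagnostic yield table is meant to make visible.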
18.
Clin Infect Dis ; 63(6): 812-7, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27193750

ABSTRACT

The medical community needs systematic and pragmatic approaches for evaluating the benefit-risk trade-offs of diagnostics that assist in medical decision making. Benefit-Risk Evaluation of Diagnostics: A Framework (BED-FRAME) is a strategy for pragmatic evaluation of diagnostics designed to supplement traditional approaches. BED-FRAME evaluates diagnostic yield and addresses 2 key issues: (1) that diagnostic yield depends on prevalence, and (2) that different diagnostic errors carry different clinical consequences. As such, evaluating and comparing diagnostics depends on prevalence and the relative importance of potential errors. BED-FRAME provides a tool for communicating the expected clinical impact of diagnostic application and the expected trade-offs of diagnostic alternatives. BED-FRAME is a useful fundamental supplement to the standard analysis of diagnostic studies that will aid in clinical decision making.


Subjects
Decision Support Systems, Clinical; Diagnosis, Computer-Assisted; Risk Assessment/methods; Actinobacteria; Anti-Bacterial Agents/therapeutic use; Drug Resistance, Bacterial; Gram-Positive Bacterial Infections/drug therapy; Humans; Models, Statistical; Prevalence
19.
Ther Innov Regul Sci ; 50(2): 241-252, 2016 Mar.
Article in English | MEDLINE | ID: mdl-30227004

ABSTRACT

Despite concerted efforts to discover and validate prognostic biomarkers or signatures, few medical tests indicated for prognostic uses have been widely accepted by the clinical community. Even fewer, perhaps, are covered by public or private health plans. We were able to identify 6 prognostic marker tests that have been approved or cleared by the US Food and Drug Administration. The pivotal clinical studies for these prognostic marker tests exhibited a wide variety of designs and statistical analyses. From these experiences, we develop statistical points to consider for design, conduct, and analysis of successful clinical validation studies of prognostic tests. In particular, we review broad themes regarding prospective and retrospective study designs, sample size, clinical performance evaluation, and handling of missing data. Our review emphasizes the distinction between a prognostic biomarker and the medical test used to measure it. For this purpose, a section on test measurement validation is also provided.

20.
Clin Infect Dis ; 61(5): 800-6, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-26113652

ABSTRACT

Clinical trials that compare strategies to optimize antibiotic use are of critical importance but are limited by competing risks that distort outcome interpretation, complexities of noninferiority trials, large sample sizes, and inadequate evaluation of benefits and harms at the patient level. The Antibacterial Resistance Leadership Group strives to overcome these challenges through innovative trial design. Response adjusted for duration of antibiotic risk (RADAR) is a novel methodology utilizing a superiority design and a 2-step process: (1) categorizing patients into an overall clinical outcome (based on benefits and harms), and (2) ranking patients with respect to a desirability of outcome ranking (DOOR). DOORs are constructed by assigning higher ranks to patients with (1) better overall clinical outcomes and (2) shorter durations of antibiotic use for similar overall clinical outcomes. DOOR distributions are compared between antibiotic use strategies. The probability that a randomly selected patient will have a better DOOR if assigned to the new strategy is estimated. DOOR/RADAR represents a new paradigm in assessing the risks and benefits of new strategies to optimize antibiotic use.


Subjects
Anti-Bacterial Agents/administration & dosage; Anti-Bacterial Agents/therapeutic use; Clinical Trials as Topic; Drug Resistance, Bacterial; Research Design; Bacterial Infections/drug therapy; Humans; Patient Safety; Risk; Treatment Outcome
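
The two-step DOOR/RADAR ranking in the abstract above can be sketched as a pairwise comparison. The patient data, the numeric coding of outcome categories (larger is better), and the convention of counting ties as one half are assumptions made for illustration; the abstract does not specify how ties are handled or how inference is performed.

```python
from itertools import product

def door_key(outcome_category, antibiotic_days):
    """Sort key: better clinical outcome first, then shorter antibiotic use."""
    # Tuples compare element by element, so duration only breaks ties
    # between patients with the same overall clinical outcome.
    return (outcome_category, -antibiotic_days)

def door_probability(new_patients, control_patients):
    """Estimate P(randomly chosen new-strategy patient has the better DOOR)."""
    new_keys = [door_key(*p) for p in new_patients]
    control_keys = [door_key(*p) for p in control_patients]
    wins = ties = 0
    for new, ctrl in product(new_keys, control_keys):
        if new > ctrl:
            wins += 1
        elif new == ctrl:
            ties += 1
    # Assumption: a tied pair contributes one half.
    return (wins + 0.5 * ties) / (len(new_keys) * len(control_keys))

# Hypothetical patients as (outcome category, days of antibiotic use).
new_strategy = [(3, 5), (3, 7), (2, 5)]
control = [(3, 10), (2, 7), (1, 10)]

p_better = door_probability(new_strategy, control)
p_null = door_probability(control, control)
```

With this toy data the new strategy wins 8 of the 9 pairs, and comparing a group with itself gives 0.5, the value expected when the two strategies do not differ.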