Búsqueda | Portal de Búsqueda de la BVS

1.

Consensus statements on the current landscape of artificial intelligence applications in endoscopy, addressing roadblocks, and advancing artificial intelligence in gastroenterology.

Parasa, Sravanthi; Berzin, Tyler; Leggett, Cadman; Gross, Seth; Repici, Alessandro; Ahmad, Omer F; Chiang, Austin; Coelho-Prabhu, Nayantara; Cohen, Jonathan; Dekker, Evelien; Keswani, Rajesh N; Kahn, Charles E; Hassan, Cesare; Petrick, Nicholas; Mountney, Peter; Ng, Jonathan; Riegler, Michael; Mori, Yuichi; Saito, Yutaka; Thakkar, Shyam; Waxman, Irving; Wallace, Michael Bradley; Sharma, Prateek.

Gastrointest Endosc ; 2024 Apr 16.

Artículo en Inglés | MEDLINE | ID: mdl-38639679

RESUMEN

BACKGROUND AND AIMS: The American Society for Gastrointestinal Endoscopy (ASGE) AI Task Force along with experts in endoscopy, technology space, regulatory authorities, and other medical subspecialties initiated a consensus process that analyzed the current literature, highlighted potential areas, and outlined the necessary research in artificial intelligence (AI) to allow a clearer understanding of AI as it pertains to endoscopy currently. METHODS: A modified Delphi process was used to develop these consensus statements. RESULTS: Statement 1: Current advances in AI allow for the development of AI-based algorithms that can be applied to endoscopy to augment endoscopist performance in detection and characterization of endoscopic lesions. Statement 2: Computer vision-based algorithms provide opportunities to redefine quality metrics in endoscopy using AI, which can be standardized and can reduce subjectivity in reporting quality metrics. Natural language processing-based algorithms can help with the data abstraction needed for reporting current quality metrics in GI endoscopy effortlessly. Statement 3: AI technologies can support smart endoscopy suites, which may help optimize workflows in the endoscopy suite, including automated documentation. Statement 4: Using AI and machine learning helps in predictive modeling, diagnosis, and prognostication. High-quality data with multidimensionality are needed for risk prediction, prognostication of specific clinical conditions, and their outcomes when using machine learning methods. Statement 5: Big data and cloud-based tools can help advance clinical research in gastroenterology. Multimodal data are key to understanding the maximal extent of the disease state and unlocking treatment options. Statement 6: Understanding how to evaluate AI algorithms in the gastroenterology literature and clinical trials is important for gastroenterologists, trainees, and researchers, and hence education efforts by GI societies are needed. Statement 7: Several challenges regarding integrating AI solutions into the clinical practice of endoscopy exist, including understanding the role of human-AI interaction. Transparency, interpretability, and explainability of AI algorithms play a key role in their clinical adoption in GI endoscopy. Developing appropriate AI governance, data procurement, and tools needed for the AI lifecycle are critical for the successful implementation of AI into clinical practice. Statement 8: For payment of AI in endoscopy, a thorough evaluation of the potential value proposition for AI systems may help guide purchasing decisions in endoscopy. Reliable cost-effectiveness studies to guide reimbursement are needed. Statement 9: Relevant clinical outcomes and performance metrics for AI in gastroenterology are currently not well defined. To improve the quality and interpretability of research in the field, steps need to be taken to define these evidence standards. Statement 10: A balanced view of AI technologies and active collaboration between the medical technology industry, computer scientists, gastroenterologists, and researchers are critical for the meaningful advancement of AI in gastroenterology. CONCLUSIONS: The consensus process led by the ASGE AI Task Force and experts from various disciplines has shed light on the potential of AI in endoscopy and gastroenterology. AI-based algorithms have shown promise in augmenting endoscopist performance, redefining quality metrics, optimizing workflows, and aiding in predictive modeling and diagnosis. However, challenges remain in evaluating AI algorithms, ensuring transparency and interpretability, addressing governance and data procurement, determining payment models, defining relevant clinical outcomes, and fostering collaboration between stakeholders. Addressing these challenges while maintaining a balanced perspective is crucial for the meaningful advancement of AI in gastroenterology.

2.

A comparison of Bayesian and score methods for interval estimates of positive/negative likelihood ratios in support of diagnostic device performance evaluation.

Hu, Tingting; Sahiner, Berkman; Petrick, Nicholas; Cha, Kenny; Wen, Si; Pennello, Gene.

J Biopharm Stat ; : 1-19, 2024 Jun 18.

Artículo en Inglés | MEDLINE | ID: mdl-38889012

RESUMEN

BACKGROUND: Positive and negative likelihood ratios (PLR and NLR) are important metrics of accuracy for diagnostic devices with a binary output. However, the properties of Bayesian and frequentist interval estimators of PLR/NLR have not been extensively studied and compared. In this study, we explore the potential use of the Bayesian method for interval estimation of PLR/NLR, and, more broadly, for interval estimation of the ratio of two independent proportions. METHODS: We develop a Bayesian-based approach for interval estimation of PLR/NLR for use as a part of a diagnostic device performance evaluation. Our approach is applicable to a broader setting for interval estimation of any ratio of two independent proportions. We compare score and Bayesian interval estimators for the ratio of two proportions in terms of the coverage probability (CP) and expected interval width (EW) via extensive experiments and applications to two case studies. A supplementary experiment was also conducted to assess the performance of the proposed exact Bayesian method under different priors. RESULTS: Our experimental results show that the overall mean CP for Bayesian interval estimation is consistent with that for the score method (0.950 vs. 0.952), and the overall mean EW for Bayesian is shorter than that for score method (15.929 vs. 19.724). Application to two case studies showed that the intervals estimated using the Bayesian and frequentist approaches are very similar. DISCUSSION: Our numerical results indicate that the proposed Bayesian approach has a comparable CP performance with the score method while yielding higher precision (i.e. a shorter EW).

3.

Profiling the BLAST bioinformatics application for load balancing on high-performance computing clusters.

Cheng, Trinity; Chin, Pei-Ju; Cha, Kenny; Petrick, Nicholas; Mikailov, Mike.

BMC Bioinformatics ; 23(1): 544, 2022 Dec 16.

Artículo en Inglés | MEDLINE | ID: mdl-36526957

RESUMEN

BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a suite of commonly used algorithms for identifying matches between biological sequences. The user supplies a database file and query file of sequences for BLAST to find identical sequences between the two. The typical millions of database and query sequences make BLAST computationally challenging but also well suited for parallelization on high-performance computing clusters. The efficacy of parallelization depends on the data partitioning, where the optimal data partitioning relies on an accurate performance model. In previous studies, a BLAST job was sped up by 27 times by partitioning the database and query among thousands of processor nodes. However, the optimality of the partitioning method was not studied. Unlike BLAST performance models proposed in the literature that usually have problem size and hardware configuration as the only variables, the execution time of a BLAST job is a function of database size, query size, and hardware capability. In this work, the nucleotide BLAST application BLASTN was profiled using three methods: shell-level profiling with the Unix "time" command, code-level profiling with the built-in "profiler" module, and system-level profiling with the Unix "gprof" program. The runtimes were measured for six node types, using six different database files and 15 query files, on a heterogeneous HPC cluster with 500+ nodes. The empirical measurement data were fitted with quadratic functions to develop performance models that were used to guide the data parallelization for BLASTN jobs. RESULTS: Profiling results showed that BLASTN contains more than 34,500 different functions, but a single function, RunMTBySplitDB, takes 99.12% of the total runtime. Among its 53 child functions, five core functions were identified to make up 92.12% of the overall BLASTN runtime. Based on the performance models, static load balancing algorithms can be applied to the BLASTN input data to minimize the runtime of the longest job on an HPC cluster. Four test cases being run on homogeneous and heterogeneous clusters were tested. Experiment results showed that the runtime can be reduced by 81% on a homogeneous cluster and by 20% on a heterogeneous cluster by re-distributing the workload. DISCUSSION: Optimal data partitioning can improve BLASTN's overall runtime 5.4-fold in comparison with dividing the database and query into the same number of fragments. The proposed methodology can be used in the other applications in the BLAST+ suite or any other application as long as source code is available.

Asunto(s)

Metodologías Computacionales , Programas Informáticos , Algoritmos , Biología Computacional/métodos , Alineación de Secuencia

4.

Scaling bioinformatics applications on HPC.

Mikailov, Mike; Luo, Fu-Jyh; Barkley, Stuart; Valleru, Lohit; Whitney, Stephen; Liu, Zhichao; Thakkar, Shraddha; Tong, Weida; Petrick, Nicholas.

BMC Bioinformatics ; 18(Suppl 14): 501, 2017 12 28.

Artículo en Inglés | MEDLINE | ID: mdl-29297287

RESUMEN

BACKGROUND: Recent breakthroughs in molecular biology and next generation sequencing technologies have led to the expenential growh of the sequence databases. Researchrs use BLAST for processing these sequences. However traditional software parallelization techniques (threads, message passing interface) applied in newer versios of BLAST are not adequate for processing these sequences in timely manner. METHODS: A new method for array job parallelization has been developed which offers O(T) theoretical speed-up in comparison to multi-threading and MPI techniques. Here T is the number of array job tasks. (The number of CPUs that will be used to complete the job equals the product of T multiplied by the number of CPUs used by a single task.) The approach is based on segmentation of both input datasets to the BLAST process, combining partial solutions published earlier (Dhanker and Gupta, Int J Comput Sci Inf Technol_5:4818-4820, 2014), (Grant et al., Bioinformatics_18:765-766, 2002), (Mathog, Bioinformatics_19:1865-1866, 2003). It is accordingly referred to as a "dual segmentation" method. In order to implement the new method, the BLAST source code was modified to allow the researcher to pass to the program the number of records (effective number of sequences) in the original database. The team also developed methods to manage and consolidate the large number of partial results that get produced. Dual segmentation allows for massive parallelization, which lifts the scaling ceiling in exciting ways. RESULTS: BLAST jobs that hitherto failed or slogged inefficiently to completion now finish with speeds that characteristically reduce wallclock time from 27 days on 40 CPUs to a single day using 4104 tasks, each task utilizing eight CPUs and taking less than 7 minutes to complete. CONCLUSIONS: The massive increase in the number of tasks when running an analysis job with dual segmentation reduces the size, scope and execution time of each task. Besides significant speed of completion, additional benefits include fine-grained checkpointing and increased flexibility of job submission. "Trickling in" a swarm of individual small tasks tempers competition for CPU time in the shared HPC environment, and jobs submitted during quiet periods can complete in extraordinarily short time frames. The smaller task size also allows the use of older and less powerful hardware. The CDRH workhorse cluster was commissioned in 2010, yet its eight-core CPUs with only 24GB RAM work well in 2017 for these dual segmentation jobs. Finally, these techniques are excitingly friendly to budget conscious scientific research organizations where probabilistic algorithms such as BLAST might discourage attempts at greater certainty because single runs represent a major resource drain. If a job that used to take 24 days can now be completed in less than an hour or on a space available basis (which is the case at CDRH), repeated runs for more exhaustive analyses can be usefully contemplated.

Asunto(s)

Algoritmos , Biología Computacional/métodos , Bases de Datos de Ácidos Nucleicos , Humanos , Motor de Búsqueda , Programas Informáticos

5.

Discussion on "Approval policies for modifications to machine learning-based software as a medical device: A study of bio-creep" by Jean Feng, Scott Emerson, and Noah Simon.

Pennello, Gene; Sahiner, Berkman; Gossmann, Alexej; Petrick, Nicholas.

Biometrics ; 77(1): 45-48, 2021 03.

Artículo en Inglés | MEDLINE | ID: mdl-33040332

Asunto(s)

Aprendizaje Automático , Programas Informáticos , Políticas

6.

Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach.

Kang, Le; Chen, Weijie; Petrick, Nicholas A; Gallas, Brandon D.

Stat Med ; 34(4): 685-703, 2015 Feb 20.

Artículo en Inglés | MEDLINE | ID: mdl-25399736

RESUMEN

The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study.

Asunto(s)

Bioestadística/métodos , Estadísticas no Paramétricas , Algoritmos , Área Bajo la Curva , Biomarcadores , Enfermedades Cardiovasculares/etiología , Simulación por Computador , Humanos , Modelos Estadísticos , Análisis Multivariante , Estudios Prospectivos , Curva ROC , Análisis de Supervivencia

7.

Optimizing area under the ROC curve using semi-supervised learning.

Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M.

Pattern Recognit ; 48(1): 276-287, 2015 Jan 01.

Artículo en Inglés | MEDLINE | ID: mdl-25395692

RESUMEN

Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.

8.

Decision region analysis for generalizability of artificial intelligence models: estimating model generalizability in the case of cross-reactivity and population shift.

Burgon, Alexis; Sahiner, Berkman; Petrick, Nicholas; Pennello, Gene; Cha, Kenny H; Samala, Ravi K.

J Med Imaging (Bellingham) ; 11(1): 014501, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-38283653

RESUMEN

Purpose: Understanding an artificial intelligence (AI) model's ability to generalize to its target population is critical to ensuring the safe and effective usage of AI in medical devices. A traditional generalizability assessment relies on the availability of large, diverse datasets, which are difficult to obtain in many medical imaging applications. We present an approach for enhanced generalizability assessment by examining the decision space beyond the available testing data distribution. Approach: Vicinal distributions of virtual samples are generated by interpolating between triplets of test images. The generated virtual samples leverage the characteristics already in the test set, increasing the sample diversity while remaining close to the AI model's data manifold. We demonstrate the generalizability assessment approach on the non-clinical tasks of classifying patient sex, race, COVID status, and age group from chest x-rays. Results: Decision region composition analysis for generalizability indicated that a disproportionately large portion of the decision space belonged to a single "preferred" class for each task, despite comparable performance on the evaluation dataset. Evaluation using cross-reactivity and population shift strategies indicated a tendency to overpredict samples as belonging to the preferred class (e.g., COVID negative) for patients whose subgroup was not represented in the model development data. Conclusions: An analysis of an AI model's decision space has the potential to provide insight into model generalizability. Our approach uses the analysis of composition of the decision space to obtain an improved assessment of model generalizability in the case of limited test data.

9.

End-to-end deep learning method for predicting hormonal treatment response in women with atypical endometrial hyperplasia or endometrial cancer.

Kahaki, Seyed; Hagemann, Ian S; Cha, Kenny H; Trindade, Christopher; Petrick, Nicholas; Kostelecky, Nicolas; Borden, Lindsay E; Atwi, Doaa; Fung, Kar-Ming; Chen, Weijie.

J Med Imaging (Bellingham) ; 11(1): 017502, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-38370423

RESUMEN

Purpose: Endometrial cancer (EC) is the most common gynecologic malignancy in the United States, and atypical endometrial hyperplasia (AEH) is considered a high-risk precursor to EC. Hormone therapies and hysterectomy are practical treatment options for AEH and early-stage EC. Some patients prefer hormone therapies for reasons such as fertility preservation or being poor surgical candidates. However, accurate prediction of an individual patient's response to hormonal treatment would allow for personalized and potentially improved recommendations for these conditions. This study aims to explore the feasibility of using deep learning models on whole slide images (WSI) of endometrial tissue samples to predict the patient's response to hormonal treatment. Approach: We curated a clinical WSI dataset of 112 patients from two clinical sites. An expert pathologist annotated these images by outlining AEH/EC regions. We developed an end-to-end machine learning model with mixed supervision. The model is based on image patches extracted from pathologist-annotated AEH/EC regions. Either an unsupervised deep learning architecture (Autoencoder or ResNet50), or non-deep learning (radiomics feature extraction) is used to embed the images into a low-dimensional space, followed by fully connected layers for binary prediction, which was trained with binary responder/non-responder labels established by pathologists. We used stratified sampling to partition the dataset into a development set and a test set for internal validation of the performance of our models. Results: The autoencoder model yielded an AUROC of 0.80 with 95% CI [0.63, 0.95] on the independent test set for the task of predicting a patient with AEH/EC as a responder vs non-responder to hormonal treatment. Conclusions: These findings demonstrate the potential of using mixed supervised machine learning models on WSIs for predicting the response to hormonal treatment in AEH/EC patients.

10.

Tumor Size Is Not Everything: Advancing Radiomics as a Precision Medicine Biomarker in Oncology Drug Development and Clinical Care. A Report of a Multidisciplinary Workshop Coordinated by the RECIST Working Group.

Nakajima, Erica C; Simpson, Amber; Bogaerts, Jan; de Vries, Elisabeth G E; Do, Richard; Garalda, Elena; Goldmacher, Greg; Kinahan, Paul E; Lambin, Philippe; LeStage, Barbara; Li, Qin; Lin, Frank; Litière, Saskia; Perez-Lopez, Raquel; Petrick, Nicholas; Schwartz, Lawrence; Seymour, Lesley; Shankar, Lalitha; Laurie, Scott A.

JCO Precis Oncol ; 8: e2300687, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-38635935

RESUMEN

Radiomics, the science of extracting quantifiable data from routine medical images, is a powerful tool that has many potential applications in oncology. The Response Evaluation Criteria in Solid Tumors Working Group (RWG) held a workshop in May 2022, which brought together various stakeholders to discuss the potential role of radiomics in oncology drug development and clinical trials, particularly with respect to response assessment. This article summarizes the results of that workshop, reviewing radiomics for the practicing oncologist and highlighting the work that needs to be done to move forward the incorporation of radiomics into clinical trials.

Asunto(s)

Neoplasias , Medicina de Precisión , Humanos , Medicina de Precisión/métodos , Criterios de Evaluación de Respuesta en Tumores Sólidos , Radiómica , Oncología Médica , Neoplasias/diagnóstico por imagen , Neoplasias/tratamiento farmacológico

11.

Artificial intelligence in medicine: mitigating risks and maximizing benefits via quality assurance, quality control, and acceptance testing.

Mahmood, Usman; Shukla-Dave, Amita; Chan, Heang-Ping; Drukker, Karen; Samala, Ravi K; Chen, Quan; Vergara, Daniel; Greenspan, Hayit; Petrick, Nicholas; Sahiner, Berkman; Huo, Zhimin; Summers, Ronald M; Cha, Kenny H; Tourassi, Georgia; Deserno, Thomas M; Grizzard, Kevin T; Näppi, Janne J; Yoshida, Hiroyuki; Regge, Daniele; Mazurchuk, Richard; Suzuki, Kenji; Morra, Lia; Huisman, Henkjan; Armato, Samuel G; Hadjiiski, Lubomir.

BJR Artif Intell ; 1(1): ubae003, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-38476957

RESUMEN

The adoption of artificial intelligence (AI) tools in medicine poses challenges to existing clinical workflows. This commentary discusses the necessity of context-specific quality assurance (QA), emphasizing the need for robust QA measures with quality control (QC) procedures that encompass (1) acceptance testing (AT) before clinical use, (2) continuous QC monitoring, and (3) adequate user training. The discussion also covers essential components of AT and QA, illustrated with real-world examples. We also highlight what we see as the shared responsibility of manufacturers or vendors, regulators, healthcare systems, medical physicists, and clinicians to enact appropriate testing and oversight to ensure a safe and equitable transformation of medicine through AI.

12.

AI and machine learning in medical imaging: key points from development to translation.

Samala, Ravi K; Drukker, Karen; Shukla-Dave, Amita; Chan, Heang-Ping; Sahiner, Berkman; Petrick, Nicholas; Greenspan, Hayit; Mahmood, Usman; Summers, Ronald M; Tourassi, Georgia; Deserno, Thomas M; Regge, Daniele; Näppi, Janne J; Yoshida, Hiroyuki; Huo, Zhimin; Chen, Quan; Vergara, Daniel; Cha, Kenny H; Mazurchuk, Richard; Grizzard, Kevin T; Huisman, Henkjan; Morra, Lia; Suzuki, Kenji; Armato, Samuel G; Hadjiiski, Lubomir.

BJR Artif Intell ; 1(1): ubae006, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-38828430

RESUMEN

Innovation in medical imaging artificial intelligence (AI)/machine learning (ML) demands extensive data collection, algorithmic advancements, and rigorous performance assessments encompassing aspects such as generalizability, uncertainty, bias, fairness, trustworthiness, and interpretability. Achieving widespread integration of AI/ML algorithms into diverse clinical tasks will demand a steadfast commitment to overcoming issues in model design, development, and performance assessment. The complexities of AI/ML clinical translation present substantial challenges, requiring engagement with relevant stakeholders, assessment of cost-effectiveness for user and patient benefit, timely dissemination of information relevant to robust functioning throughout the AI/ML lifecycle, consideration of regulatory compliance, and feedback loops for real-world performance evidence. This commentary addresses several hurdles for the development and adoption of AI/ML technologies in medical imaging. Comprehensive attention to these underlying and often subtle factors is critical not only for tackling the challenges but also for exploring novel opportunities for the advancement of AI in radiology.

13.

Methodology for Good Machine Learning with Multi-Omics Data.

Coroller, Thibaud; Sahiner, Berkman; Amatya, Anup; Gossmann, Alexej; Karagiannis, Konstantinos; Moloney, Conor; Samala, Ravi K; Santana-Quintero, Luis; Solovieff, Nadia; Wang, Craig; Amiri-Kordestani, Laleh; Cao, Qian; Cha, Kenny H; Charlab, Rosane; Cross, Frank H; Hu, Tingting; Huang, Ruihao; Kraft, Jeffrey; Krusche, Peter; Li, Yutong; Li, Zheng; Mazo, Ilya; Paul, Rahul; Schnakenberg, Susan; Serra, Paolo; Smith, Sean; Song, Chi; Su, Fei; Tiwari, Mohit; Vechery, Colin; Xiong, Xin; Zarate, Juan Pablo; Zhu, Hao; Chakravartty, Arunava; Liu, Qi; Ohlssen, David; Petrick, Nicholas; Schneider, Julie A; Walderhaug, Mark; Zuber, Emmanuel.

Clin Pharmacol Ther ; 115(4): 745-757, 2024 04.

Artículo en Inglés | MEDLINE | ID: mdl-37965805

RESUMEN

In 2020, Novartis Pharmaceuticals Corporation and the U.S. Food and Drug Administration (FDA) started a 4-year scientific collaboration to approach complex new data modalities and advanced analytics. The scientific question was to find novel radio-genomics-based prognostic and predictive factors for HR+/HER- metastatic breast cancer under a Research Collaboration Agreement. This collaboration has been providing valuable insights to help successfully implement future scientific projects, particularly using artificial intelligence and machine learning. This tutorial aims to provide tangible guidelines for a multi-omics project that includes multidisciplinary expert teams, spanning across different institutions. We cover key ideas, such as "maintaining effective communication" and "following good data science practices," followed by the four steps of exploratory projects, namely (1) plan, (2) design, (3) develop, and (4) disseminate. We break each step into smaller concepts with strategies for implementation and provide illustrations from our collaboration to further give the readers actionable guidance.

Asunto(s)

Inteligencia Artificial , Multiómica , Humanos , Aprendizaje Automático , Genómica

14.

On the assessment of the added value of new predictive biomarkers.

Chen, Weijie; Samuelson, Frank W; Gallas, Brandon D; Kang, Le; Sahiner, Berkman; Petrick, Nicholas.

BMC Med Res Methodol ; 13: 98, 2013 Jul 29.

Artículo en Inglés | MEDLINE | ID: mdl-23895587

RESUMEN

BACKGROUND: The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC "has vastly inferior statistical properties," i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests. DISCUSSION: We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper. SUMMARY: We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.

Asunto(s)

Biomarcadores , Modelos Estadísticos , Valor Predictivo de las Pruebas , Área Bajo la Curva , Humanos , Funciones de Verosimilitud , Modelos Logísticos

15.

Data drift in medical machine learning: implications and potential remedies.

Sahiner, Berkman; Chen, Weijie; Samala, Ravi K; Petrick, Nicholas.

Br J Radiol ; 96(1150): 20220878, 2023 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-36971405

RESUMEN

Data drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with an emphasis on medical imaging. We then review the recent literature regarding the effects of data drift on medical ML systems, which overwhelmingly show that data drift can be a major cause for performance deterioration. We then discuss methods for monitoring data drift and mitigating its effects with an emphasis on pre- and post-deployment techniques. Some of the potential methods for drift detection and issues around model retraining when drift is detected are included. Based on our review, we find that data drift is a major concern in medical ML deployment and that more research is needed so that ML models can identify drift early, incorporate effective mitigation strategies and resist performance decay.

Asunto(s)

Aprendizaje Automático , Computación en Informática Médica

16.

Characterization of mechanical stiffness using additive manufacturing and finite element analysis: potential tool for bone health assessment.

Marupudi, Sriharsha; Cao, Qian; Samala, Ravi; Petrick, Nicholas.

3D Print Med ; 9(1): 32, 2023 Nov 18.

Artículo en Inglés | MEDLINE | ID: mdl-37978094

RESUMEN

BACKGROUND: Bone health and fracture risk are known to be correlated with stiffness. Both micro-finite element analysis (µFEA) and mechanical testing of additive manufactured phantoms are useful approaches for estimating mechanical properties of trabecular bone-like structures. However, it is unclear if measurements from the two approaches are consistent. The purpose of this work is to evaluate the agreement between stiffness measurements obtained from mechanical testing of additive manufactured trabecular bone phantoms and µFEA modeling. Agreement between the two methods would suggest 3D printing is a viable method for validation of µFEA modeling. METHODS: A set of 20 lumbar vertebrae regions of interests were segmented and the corresponding trabecular bone phantoms were produced using selective laser sintering. The phantoms were mechanically tested in uniaxial compression to derive their stiffness values. The stiffness values were also derived from in silico simulation, where linear elastic µFEA was applied to simulate the same compression and boundary conditions. Bland-Altman analysis was used to evaluate agreement between the mechanical testing and µFEA simulation values. Additionally, we evaluated the fidelity of the 3D printed phantoms as well as the repeatability of the 3D printing and mechanical testing process. RESULTS: We observed good agreement between the mechanically tested stiffness and µFEA stiffness, with R2 of 0.84 and normalized root mean square deviation of 8.1%. We demonstrate that the overall trabecular bone structures are printed in high fidelity (Dice score of 0.97 (95% CI, [0.96,0.98]) and that mechanical testing is repeatable (coefficient of variation less than 5% for stiffness values from testing of duplicated phantoms). However, we noticed some defects in the resin microstructure of the 3D printed phantoms, which may account for the discrepancy between the stiffness values from simulation and mechanical testing. CONCLUSION: Overall, the level of agreement achieved between the mechanical stiffness and µFEA indicates that our µFEA methods may be acceptable for assessing bone mechanics of complex trabecular structures as part of an analysis of overall bone health.

17.

Weakly Supervised Deep Learning for Predicting the Response to Hormonal Treatment of Women with Atypical Endometrial Hyperplasia: A Feasibility Study.

Kahaki, Seyed; Hagemann, Ian S; Cha, Kenny; Trindade, Christopher J; Petrick, Nicholas; Kostelecky, Nicolas; Chen, Weijie.

Proc SPIE Int Soc Opt Eng ; 124712023 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-37159719

RESUMEN

Endometrial cancer (EC) is the most common gynecologic malignancy in the US and complex atypical hyperplasia (CAH) is considered a high-risk precursor to EC. Treatment options for CAH and early-stage EC include hormone therapies and hysterectomy with the former preferred by certain patients, e.g., for fertility preservation or poor surgical candidates. Accurate prediction of response to hormonal treatment would allow for personalized and potentially improved recommendations for the treatment of these conditions. In this study, we investigate the feasibility of utilizing weakly supervised deep learning models on whole slide images of endometrial tissue samples for the prediction of patient response to hormonal treatment. We curated a clinical whole-slide-image (WSI) dataset of 112 patients from two clinical sites. We developed an end-to-end machine learning model using WSIs of endometrial specimens for the prediction of hormonal treatment response among women with CAH/EC. The model takes patches extracted from pathologist-annotated CAH/EC regions as input and utilizes an unsupervised deep learning architecture (Autoencoder or ResNet50) to embed the images into a low-dimensional space, followed by fully connected layers for binary prediction. Our autoencoder model yielded an AUC of 0.79 with 95% CI [0.61, 0.98] on a hold-out test set in the task of predicting a patient with CAH/EC as a responder vs non-responder to hormonal treatment. Our results, demonstrate the potential for using weakly supervised machine learning models on WSIs for predicting response to hormonal treatment of CAH/EC patients.

18.

Semi-supervised training using cooperative labeling of weakly annotated data for nodule detection in chest CT.

Maynord, Michael; Farhangi, M Mehdi; Fermüller, Cornelia; Aloimonos, Yiannis; Levine, Gary; Petrick, Nicholas; Sahiner, Berkman; Pezeshk, Aria.

Med Phys ; 50(7): 4255-4268, 2023 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-36630691

RESUMEN

PURPOSE: Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time consuming and expensive. In this study, we propose a cooperative labeling method that allows us to make use of weakly annotated medical imaging data for the training of a machine learning algorithm. As most clinically produced data are weakly-annotated - produced for use by humans rather than machines and lacking information machine learning depends upon - this approach allows us to incorporate a wider range of clinical data and thereby increase the training set size. METHODS: Our pseudo-labeling method consists of multiple stages. In the first stage, a previously established network is trained using a limited number of samples with high-quality expert-produced annotations. This network is used to generate annotations for a separate larger dataset that contains only weakly annotated scans. In the second stage, by cross-checking the two types of annotations against each other, we obtain higher-fidelity annotations. In the third stage, we extract training data from the weakly annotated scans, and combine it with the fully annotated data, producing a larger training dataset. We use this larger dataset to develop a computer-aided detection (CADe) system for nodule detection in chest CT. RESULTS: We evaluated the proposed approach by presenting the network with different numbers of expert-annotated scans in training and then testing the CADe using an independent expert-annotated dataset. We demonstrate that when availability of expert annotations is severely limited, the inclusion of weakly-labeled data leads to a 5% improvement in the competitive performance metric (CPM), defined as the average of sensitivities at different false-positive rates. CONCLUSIONS: Our proposed approach can effectively merge a weakly-annotated dataset with a small, well-annotated dataset for algorithm training. This approach can help enlarge limited training data by leveraging the large amount of weakly labeled data typically generated in clinical image interpretation.

Asunto(s)

Algoritmos , Tomografía Computarizada por Rayos X , Humanos , Aprendizaje Automático , Aprendizaje Automático Supervisado , Procesamiento de Imagen Asistido por Computador/métodos

19.

Regulatory considerations for medical imaging AI/ML devices in the United States: concepts and challenges.

Petrick, Nicholas; Chen, Weijie; Delfino, Jana G; Gallas, Brandon D; Kang, Yanna; Krainak, Daniel; Sahiner, Berkman; Samala, Ravi K.

J Med Imaging (Bellingham) ; 10(5): 051804, 2023 Sep.

Artículo en Inglés | MEDLINE | ID: mdl-37361549

RESUMEN

Purpose: To introduce developers to medical device regulatory processes and data considerations in artificial intelligence and machine learning (AI/ML) device submissions and to discuss ongoing AI/ML-related regulatory challenges and activities. Approach: AI/ML technologies are being used in an increasing number of medical imaging devices, and the fast evolution of these technologies presents novel regulatory challenges. We provide AI/ML developers with an introduction to U.S. Food and Drug Administration (FDA) regulatory concepts, processes, and fundamental assessments for a wide range of medical imaging AI/ML device types. Results: The device type for an AI/ML device and appropriate premarket regulatory pathway is based on the level of risk associated with the device and informed by both its technological characteristics and intended use. AI/ML device submissions contain a wide array of information and testing to facilitate the review process with the model description, data, nonclinical testing, and multi-reader multi-case testing being critical aspects of the AI/ML device review process for many AI/ML device submissions. The agency is also involved in AI/ML-related activities that support guidance document development, good machine learning practice development, AI/ML transparency, AI/ML regulatory research, and real-world performance assessment. Conclusion: FDA's AI/ML regulatory and scientific efforts support the joint goals of ensuring patients have access to safe and effective AI/ML devices over the entire device lifecycle and stimulating medical AI/ML innovation.

20.

AAPM task group report 273: Recommendations on best practices for AI and machine learning for computer-aided diagnosis in medical imaging.

Hadjiiski, Lubomir; Cha, Kenny; Chan, Heang-Ping; Drukker, Karen; Morra, Lia; Näppi, Janne J; Sahiner, Berkman; Yoshida, Hiroyuki; Chen, Quan; Deserno, Thomas M; Greenspan, Hayit; Huisman, Henkjan; Huo, Zhimin; Mazurchuk, Richard; Petrick, Nicholas; Regge, Daniele; Samala, Ravi; Summers, Ronald M; Suzuki, Kenji; Tourassi, Georgia; Vergara, Daniel; Armato, Samuel G.

Med Phys ; 50(2): e1-e24, 2023 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-36565447

RESUMEN

Rapid advances in artificial intelligence (AI) and machine learning, and specifically in deep learning (DL) techniques, have enabled broad application of these methods in health care. The promise of the DL approach has spurred further interest in computer-aided diagnosis (CAD) development and applications using both "traditional" machine learning methods and newer DL-based methods. We use the term CAD-AI to refer to this expanded clinical decision support environment that uses traditional and DL-based AI methods. Numerous studies have been published to date on the development of machine learning tools for computer-aided, or AI-assisted, clinical tasks. However, most of these machine learning models are not ready for clinical deployment. It is of paramount importance to ensure that a clinical decision support tool undergoes proper training and rigorous validation of its generalizability and robustness before adoption for patient care in the clinic. To address these important issues, the American Association of Physicists in Medicine (AAPM) Computer-Aided Image Analysis Subcommittee (CADSC) is charged, in part, to develop recommendations on practices and standards for the development and performance assessment of computer-aided decision support systems. The committee has previously published two opinion papers on the evaluation of CAD systems and issues associated with user training and quality assurance of these systems in the clinic. With machine learning techniques continuing to evolve and CAD applications expanding to new stages of the patient care process, the current task group report considers the broader issues common to the development of most, if not all, CAD-AI applications and their translation from the bench to the clinic. The goal is to bring attention to the proper training and validation of machine learning algorithms that may improve their generalizability and reliability and accelerate the adoption of CAD-AI systems for clinical decision support.

Asunto(s)

Inteligencia Artificial , Diagnóstico por Computador , Humanos , Reproducibilidad de los Resultados , Diagnóstico por Computador/métodos , Diagnóstico por Imagen , Aprendizaje Automático

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA