Results 1 - 20 of 28
1.
Neurosurgery ; 93(5): 1121-1143, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37610208

ABSTRACT

BACKGROUND AND OBJECTIVES: Spine surgery has advanced in concert with our deeper understanding of its elements. Narrowly focused bibliometric analyses have been conducted previously, but never on the entire corpus of the field. Using big data and bibliometrics, we appraised the entire corpus of spine surgery publications to study the evolution of the specialty as a scholarly field since 1900. METHODS: We queried Web of Science for all contents from 13 major publications dedicated to spine surgery. We next queried by topic [topic = (spine OR spinal OR vertebrae OR vertebral OR intervertebral OR disc OR disk)]; these results were filtered to include articles published by 49 other publications that were manually determined to contain pertinent articles. Articles, along with their metadata, were exported. Statistical and bibliometric analyses were performed using the Bibliometrix R package and various Python packages. RESULTS: Eighty-five thousand five hundred articles from 62 journals and 134 707 unique authors were identified. The annual growth rate of publications was 2.78%, with a surge after 1980, concurrent with the growth of specialized journals. International coauthorship, absent before 1970, increased exponentially with the formation of influential spine study groups. Reference publication year spectroscopy allowed us to identify 200 articles that comprise the historical roots of modern spine surgery and each of its subdisciplines. We mapped the emergence of new topics and saw a recent lexical evolution toward outcomes- and patient-centric terms. Female and minority coauthorship has increased since 1990, but remains low, and disparities across major publications persist. CONCLUSION: The field of spine surgery was born from pioneering individuals who published their findings in a variety of journals. The renaissance of spine surgery has been powered by international collaboration and is increasingly outcomes-focused. While spine surgery is gradually becoming more diverse, there is a clear need for further promotion and outreach to under-represented populations.
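
For readers who want to see how a figure such as the 2.78% annual growth rate can be derived, a minimal sketch follows. It assumes the rate is a compound annual growth rate computed from the first and last years' publication counts; the abstract does not state the formula used. The function name and the counts are hypothetical.

```python
def annual_growth_rate(first_count: float, last_count: float, n_years: int) -> float:
    """Compound annual growth rate in percent (an assumed definition)."""
    if first_count <= 0 or n_years <= 0:
        raise ValueError("first_count and n_years must be positive")
    return ((last_count / first_count) ** (1.0 / n_years) - 1.0) * 100.0


# Hypothetical counts, not taken from the study: output doubles over ten years.
rate = annual_growth_rate(100, 200, 10)
print(f"{rate:.2f}% per year")
```

A compound rate is used here because yearly counts multiply rather than add; a simple average of year-over-year changes would give a different number.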


Subjects
Bibliometrics; Medicine; Female; Humans; Spine/surgery; Publications
2.
Nature ; 619(7969): 357-362, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37286606

ABSTRACT

Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment [1-3]. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing [4,5] to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7-94.9%, with an improvement of 5.36-14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.


Subjects
Clinical Decision-Making; Electronic Health Records; Natural Language Processing; Physicians; Humans; Clinical Decision-Making/methods; Patient Readmission; Hospital Mortality; Comorbidity; Length of Stay; Insurance Coverage; Area Under Curve; Point-of-Care Systems/trends; Clinical Trials as Topic
3.
Neurosurgery ; 93(6): 1228-1234, 2023 12 01.
Article in English | MEDLINE | ID: mdl-37345933

ABSTRACT

BACKGROUND AND OBJECTIVES: Clinical registries are critical for modern surgery and underpin outcomes research, device monitoring, and trial development. However, existing approaches to registry construction are labor-intensive, costly, and prone to manual error. Natural language processing techniques combined with electronic health record (EHR) data sets can theoretically automate the construction and maintenance of registries. Our aim was to automate the generation of a spine surgery registry at an academic medical center using regular expression (regex) classifiers developed by neurosurgeons to combine domain expertise with interpretable algorithms. METHODS: We used a Hadoop data lake consisting of all the information generated by an academic medical center. Using this database and structured query language queries, we retrieved every operative note written in the department of neurosurgery since our transition to EHR. Notes were parsed using regex classifiers and compared with a random subset of 100 manually reviewed notes. RESULTS: A total of 31 502 operative cases were downloaded and processed using regex classifiers. The codebase required 5 days of development, 3 weeks of validation, and less than 1 hour for the software to generate the autoregistry. Regex classifiers had an average accuracy of 98.86% at identifying both spinal procedures and the relevant vertebral levels, and they correctly identified the entire list of defined surgical procedures in 89% of patients. We were able to identify patients who required additional operations within 30 days to monitor outcomes and quality metrics. CONCLUSION: This study demonstrates the feasibility of automatically generating a spine registry using the EHR and an interpretable, customizable natural language processing algorithm, which may reduce pitfalls associated with manual registry development and facilitate rapid clinical research.
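
To make the idea of a regex classifier for operative notes concrete, here is a minimal sketch. The patterns, the `classify_note` function, and the sample note are all hypothetical illustrations; the abstract does not give the expressions the authors actually wrote.

```python
import re

# Hypothetical procedure patterns, for illustration only.
PROCEDURE_PATTERNS = {
    "laminectomy": re.compile(r"\blaminectom(?:y|ies)\b", re.IGNORECASE),
    "discectomy": re.compile(r"\bdis[ck]ectom(?:y|ies)\b", re.IGNORECASE),
    "fusion": re.compile(r"\b(?:fusion|arthrodesis)\b", re.IGNORECASE),
}

# Matches level ranges such as "L4-L5" or "C5-6" (second region letter optional).
LEVEL_RANGE = re.compile(
    r"\b([CTLS])(\d{1,2})\s*-\s*([CTLS])?(\d{1,2})\b", re.IGNORECASE
)


def classify_note(text: str) -> dict:
    """Return the procedures and vertebral level ranges mentioned in a note."""
    procedures = sorted(
        name for name, pattern in PROCEDURE_PATTERNS.items() if pattern.search(text)
    )
    levels = []
    for match in LEVEL_RANGE.finditer(text):
        start_region = match.group(1).upper()
        # Reuse the first region letter when the note writes "C5-6".
        end_region = (match.group(3) or start_region).upper()
        levels.append(
            (f"{start_region}{match.group(2)}", f"{end_region}{match.group(4)}")
        )
    return {"procedures": procedures, "levels": levels}


note = "Patient underwent L4-L5 laminectomy with posterior fusion."
print(classify_note(note))
```

Each pattern can be read and edited by a clinician, which is the interpretability advantage the abstract points to; the trade-off is that every spelling variant has to be anticipated by hand.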


Subjects
Electronic Health Records; Natural Language Processing; Humans; Registries; Software; Algorithms
4.
Neurosurgery ; 93(5): 986-993, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37255296

ABSTRACT

BACKGROUND AND OBJECTIVES: Advances in targeted therapies and wider application of stereotactic radiosurgery (SRS) have redefined outcomes of patients with brain metastases. Under modern treatment paradigms, there remains limited characterization of which aspects of disease drive demise and in what frequencies. This study aims to characterize the primary causes of terminal decline and evaluate differences in underlying intracranial tumor dynamics in patients with metastatic brain cancer. These fundamental details may help guide management, patient counseling, and research priorities. METHODS: Using NYUMets-Brain (the largest longitudinal, real-world, open data set of patients with brain metastases), patients treated at New York University Langone Health between 2012 and 2021 with SRS were evaluated. A review of electronic health records allowed for the determination of a primary cause of death in patients who died during the study period. Causes were classified in mutually exclusive, but collectively exhaustive, categories. Multilevel models were used to evaluate differences in the dynamics of intracranial tumors, including changes in volume and number. RESULTS: Of 439 patients with end-of-life data, 73.1% died secondary to systemic disease, 10.3% died secondary to central nervous system (CNS) disease, and 16.6% died because of other causes. CNS deaths were driven by acute increases in intracranial pressure (11%), development of focal neurological deficits (18%), treatment-resistant seizures (11%), and global decline driven by increased intracranial tumor burden (60%). Rate of influx of new intracranial tumors was almost twice as high in patients who died compared with those who survived (P < .001), but there was no difference in rates of volume change per intracranial tumor (P = .95). CONCLUSION: Most patients with brain metastases die secondary to systemic disease progression. For patients who die because of neurological disease, tumor dynamics and cause of death mechanisms indicate that the primary driver of decline for many may be unchecked systemic disease with unrelenting spread of new tumors to the CNS rather than failure of local growth control.


Subjects
Brain Neoplasms; Radiosurgery; Humans; Brain/pathology; Brain Neoplasms/surgery; Cause of Death; Retrospective Studies
6.
Neurosurgery ; 92(2): 431-438, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36399428

ABSTRACT

BACKGROUND: The development of accurate machine learning algorithms requires sufficient quantities of diverse data. This poses a challenge in health care because of the sensitive and siloed nature of biomedical information. Decentralized algorithms through federated learning (FL) avoid data aggregation by instead distributing algorithms to the data before centrally updating one global model. OBJECTIVE: To establish a multicenter collaboration and assess the feasibility of using FL to train machine learning models for intracranial hemorrhage (ICH) detection without sharing data between sites. METHODS: Five neurosurgery departments across the United States collaborated to establish a federated network and train a convolutional neural network to detect ICH on computed tomography scans. The global FL model was benchmarked against a standard, centrally trained model using a held-out data set and was compared against locally trained models using site data. RESULTS: A federated network of practicing neurosurgeon scientists was successfully initiated to train a model for predicting ICH. The FL model achieved an area under the ROC curve of 0.9487 (95% CI 0.9471-0.9503) when predicting all subtypes of ICH compared with a benchmark (non-FL) area under the ROC curve of 0.9753 (95% CI 0.9742-0.9764), although performance varied by subtype. The FL model consistently achieved top three performance when validated on any site's data, suggesting improved generalizability. A qualitative survey described the experience of participants in the federated network. CONCLUSION: This study demonstrates the feasibility of implementing a federated network for multi-institutional collaboration among clinicians and using FL to conduct machine learning research, thereby opening a new paradigm for neurosurgical collaboration.
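
The central update step, in which locally trained models are combined into one global model without moving any data, can be sketched as follows. This assumes sample-size-weighted federated averaging; the abstract does not say which aggregation rule the network used, and the function name and numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]
) -> List[float]:
    """Average per-site parameter vectors, weighted by each site's sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client weight vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: two sites, two parameters each; site 2 has 3x the data.
site_weights = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [1, 3]
print(federated_average(site_weights, site_sizes))  # [2.5, 3.5]
```

Only the parameter vectors and sample counts cross institutional boundaries in this scheme; the scans themselves stay at each site.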


Subjects
Algorithms; Benchmarking; Humans; Intracranial Hemorrhages; Machine Learning; Neural Networks, Computer
7.
Sci Rep ; 11(1): 7482, 2021 04 05.
Article in English | MEDLINE | ID: mdl-33820942

ABSTRACT

Real-time seizure detection is a resource-intensive process as it requires continuous monitoring of patients on stereoelectroencephalography (SEEG). This study improves real-time seizure detection in drug-resistant epilepsy (DRE) patients by developing patient-specific deep learning models that utilize a novel self-supervised dynamic thresholding approach. Deep neural networks were constructed on over 2000 h of high-resolution, multichannel SEEG and video recordings from 14 DRE patients. Consensus labels from a panel of epileptologists were used to evaluate model efficacy. Self-supervised dynamic thresholding exhibited improvements in positive predictive value (PPV; difference: 39.0%; 95% CI 4.5-73.5%; Wilcoxon-Mann-Whitney test; N = 14; p = 0.03) with similar sensitivity (difference: 14.3%; 95% CI -21.7 to 50.3%; Wilcoxon-Mann-Whitney test; N = 14; p = 0.42) compared to static thresholds. In some models, training on as little as 10 min of SEEG data yielded robust detection. Cross-testing experiments reduced PPV (difference: 56.5%; 95% CI 25.8-87.3%; Wilcoxon-Mann-Whitney test; N = 14; p = 0.002), while multimodal detection significantly improved sensitivity (difference: 25.0%; 95% CI 0.2-49.9%; Wilcoxon-Mann-Whitney test; N = 14; p < 0.05). Self-supervised dynamic thresholding improved the efficacy of real-time seizure predictions. Multimodal models demonstrated potential to improve detection. These findings are promising for future deployment in epilepsy monitoring units to enable real-time seizure detection without annotated data and only minimal training time in individual patients.


Subjects
Electroencephalography; Seizures/diagnostic imaging; Stereotaxic Techniques; Video Recording; Algorithms; Electrophysiological Phenomena; Female; Humans; Male; Multimodal Imaging; Neural Networks, Computer; Seizures/physiopathology; Young Adult
8.
World Neurosurg ; 144: e25-e33, 2020 12.
Article in English | MEDLINE | ID: mdl-32652276

ABSTRACT

BACKGROUND: With a growing aging population in the United States, the number of operative lumbar spine pathologies continues to grow. Therefore, our objective was to estimate the future demand for lumbar spine surgery volumes for the United States to the year 2040. METHODS: The National/Nationwide Inpatient Sample was queried for years 2003-2015 for anterior interbody and posterior lumbar fusions (ALIF, PLF) to create national estimates of procedural volumes for those years. The average age and comorbidity burden were characterized, and Poisson modeling controlling for age and sex allowed for surgical volume prediction to 2040 in 10-year increments. Age was grouped into categories (<25, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, and >85 years), and estimates of surgical volumes for each age subgroup were created. RESULTS: ALIF volume is expected to increase from 46,903 to 55,528, and PLF volume is expected to increase from 248,416 to 297,994 from 2020 to 2040. For ALIF, the largest increases are expected in the 45-54 years (10,316 to 12,216) and 75-84 years (2,898 to 5,340) age groups. Similarly, the largest increases in PLF will be seen in the 65-74 years (71,087 to 77,786) and 75-84 years (28,253 to 52,062) age groups. CONCLUSIONS: The large increases in expected volumes of ALIF and PLF could necessitate training of more spinal surgeons and an examination of projected costs. Further analyses are needed to characterize the needs of this increasingly large population of surgical patients.
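
The age-stratified projection described in the methods can be sketched in a few lines. This assumes the simplest possible Poisson model, with age group as the only covariate and population as the exposure, in which case the maximum-likelihood rate for each group is observed procedures divided by person-years. The study also controlled for sex, which this sketch omits. All counts and function names are hypothetical.

```python
def fit_rates(events: dict, person_years: dict) -> dict:
    """Per-group Poisson rate estimate: observed events / person-years of exposure."""
    return {group: events[group] / person_years[group] for group in events}


def project_volume(rates: dict, projected_population: dict) -> dict:
    """Expected procedures per group: fitted rate times projected population."""
    return {group: rates[group] * projected_population[group] for group in rates}


# Hypothetical inputs, not taken from the study.
events = {"65-74": 500, "75-84": 300}
person_years = {"65-74": 1_000_000, "75-84": 500_000}
population_2040 = {"65-74": 1_200_000, "75-84": 900_000}

rates = fit_rates(events, person_years)
volumes = project_volume(rates, population_2040)
for group, volume in volumes.items():
    print(group, round(volume))
```

Because the per-group rates are held fixed, all projected growth in this sketch comes from the changing size and age mix of the population.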


Subjects
Lumbar Vertebrae/surgery; Spinal Fusion/statistics & numerical data; Thoracic Vertebrae/surgery; Adult; Age Factors; Aged; Aged, 80 and over; Comorbidity; Costs and Cost Analysis; Female; Humans; Male; Middle Aged; Neurosurgical Procedures/methods; Neurosurgical Procedures/statistics & numerical data; Patient Selection; Sex Factors; Spinal Fusion/economics; United States/epidemiology
10.
World Neurosurg ; 142: e253-e259, 2020 10.
Article in English | MEDLINE | ID: mdl-32599190

ABSTRACT

OBJECTIVES: Few studies have examined the impact of teaching status and location on outcomes in subarachnoid hemorrhage (SAH). The objective of the present study was to compare mortality and functional outcomes among urban teaching, urban nonteaching, and rural centers for hospitalizations with SAH. METHODS: The National Inpatient Sample for years 2003-2016 was queried for hospitalizations with aneurysmal SAH from 2003 to 2017. Cohorts treated at urban teaching, urban nonteaching, and rural centers were compared with the urban teaching center cohort acting as the reference. The National Inpatient Sample Subarachnoid Hemorrhage Outcome Measure, a validated measure of SAH functional outcome, was used as a coprimary outcome with mortality. Multivariable models adjusted for age, sex, NIH-SSS score, hypertension, and hospital bed size. Trends in SAH mortality rates were calculated. RESULTS: There were 379,716 SAH hospitalizations at urban teaching centers, 105,638 at urban nonteaching centers, and 17,165 at rural centers. Adjusted mortality rates for urban teaching centers were lower than urban nonteaching (21.90% vs. 25.00%, P < 0.0001) and rural (21.90% vs. 30.90%, P < 0.0001) centers. While urban teaching (24.74% to 21.22%) and urban nonteaching (24.78% to 23.68%) had decreases in mortality rates over the study period, rural hospitals showed increased mortality rates (25.67% to 33.38%). CONCLUSIONS: Rural and urban nonteaching centers have higher rates of mortality from SAH than urban teaching centers. Further study is necessary to understand drivers of these differences.


Subjects
Aneurysm, Ruptured/epidemiology; Hospitals, Rural/statistics & numerical data; Hospitals, Teaching/statistics & numerical data; Hospitals, Urban/statistics & numerical data; Intracranial Aneurysm/epidemiology; Subarachnoid Hemorrhage/epidemiology; Aged; Aneurysm, Ruptured/mortality; Female; Hospital Bed Capacity; Humans; Hypertension; Incidence; Intracranial Aneurysm/mortality; Male; Middle Aged; Mortality; Multivariate Analysis; Subarachnoid Hemorrhage/mortality; United States/epidemiology
11.
World Neurosurg ; 141: e166-e174, 2020 09.
Article in English | MEDLINE | ID: mdl-32416236

ABSTRACT

BACKGROUND: Subdural hematomas (SDHs) are a common and dangerous condition, with potential for a rapid rise in incidence given the aging U.S. population, but the magnitude of this increase is unknown. Our objective was to characterize the number of SDHs and practicing neurosurgeons from 2003 to 2016 and project these numbers to 2040. METHODS: Using the National Inpatient Sample years 2003-2016 (nearly 500 million hospitalizations), all hospitalizations with a diagnosis of SDH were identified and grouped by age. Numerical estimates of SDHs were projected to 2040 in 10-year increments for each age group using Poisson modeling with population estimates from the U.S. Census Bureau. The number of neurosurgeons who billed the Centers for Medicare and Medicaid Services from 2012 to 2017 was noted and linearly projected to 2040. RESULTS: From 2020 to 2040, SDH volume is expected to increase by 78.3%, from 135,859 to 208,212. Most of this increase will be seen in the elderly, as patients 75-84 years old will experience an increase from 37,941 to 69,914 and patients older than 85 years old will experience an increase from 31,200 to 67,181. The number of neurosurgeons is projected to increase from 4675 in 2020 to 6252 in 2040. CONCLUSIONS: SDH volume is expected to increase significantly from 2020 to 2040, with the majority of this increase being concentrated in elderly patients. While the number of neurosurgeons will also increase, the ability of current neurosurgical resources to properly handle this expected increase in SDH will need to be addressed on a national scale.


Subjects
Aging; Cerebrovascular Disorders/therapy; Hematoma, Subdural, Acute/therapy; Hematoma, Subdural, Chronic/therapy; Adult; Aged; Aged, 80 and over; Cerebrovascular Disorders/diagnosis; Female; Forecasting; Hematoma, Subdural, Acute/diagnosis; Hematoma, Subdural, Chronic/diagnosis; Humans; Incidence; Male; Medicare/economics; Medicare/statistics & numerical data; Middle Aged; United States
12.
World Neurosurg ; 141: e175-e181, 2020 09.
Article in English | MEDLINE | ID: mdl-32416237

ABSTRACT

OBJECTIVE: We sought to predict surgical volumes for 2 common cervical spine procedures from 2020 to 2040. METHODS: Using the National Inpatient Sample from 2003 to 2016, nationwide estimates of anterior cervical diskectomy and fusion (ACDF) and posterior cervical decompression and fusion (PCDF) volumes were calculated using International Classification of Diseases, Ninth and Tenth Revision (ICD-9, ICD-10) procedure codes. With data from the U.S. Census Bureau, estimates of the U.S. population were used to create Poisson models controlling for age and sex. Age was categorized into ranges (<25 years old, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, and >85), and estimates of surgical volume for each age group were created. RESULTS: From 2020 to 2040, increases in surgical volume of 13.3% (153,288-173,699) and 19.3% (29,620-35,335) are expected for ACDF and PCDF, respectively. For ACDF, the largest increases are expected in the 45-54 (42,077-49,827) and 75-84 (8065-14,862) age groups, whereas for PCDF, the largest increases will be seen in the 75-84 (3710-6836) age group. In accordance with an aging population, modest increases will be seen for ACDF (858-1847) and PCDF (730-1573) in the >85-year-old cohort. CONCLUSIONS: As expected, large growth in cervical spine surgical volumes is likely to be seen, which could indicate a need for increased numbers of spinal neurosurgeons and orthopedic surgeons. Further studies are needed to investigate the needs of the field in light of these expected increases in volume.


Subjects
Cervical Vertebrae/surgery; Neck/surgery; Neurosurgical Procedures; Postoperative Complications/epidemiology; Adult; Aged; Aged, 80 and over; Decompression, Surgical/methods; Diskectomy/methods; Female; Humans; Male; Middle Aged; Spinal Fusion/methods; United States
14.
Neurosurgery ; 86(2): E108-E117, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31361011

ABSTRACT

Brain-computer interface (BCI) technology is rapidly developing and changing the paradigm of neurorestoration by linking cortical activity with control of an external effector to provide patients with tangible improvements in their ability to interact with the environment. The sensor component of a BCI circuit dictates the resolution of brain pattern recognition and therefore plays an integral role in the technology. Several sensor modalities are currently in use for BCI applications and are broadly either electrode-based or functional neuroimaging-based. Sensors vary in their inherent spatial and temporal resolutions, as well as in practical aspects such as invasiveness, portability, and maintenance. Hybrid BCI systems with multimodal sensory inputs represent a promising development in the field, allowing for complementary function. Artificial intelligence and deep learning algorithms have been applied to BCI systems to achieve faster and more accurate classifications of sensory input and improve user performance in various tasks. Neurofeedback is an important advancement in the field that has been implemented in several types of BCI systems by showing users a real-time display of their recorded brain activity during a task to facilitate their control over their own cortical activity. In this way, neurofeedback has improved BCI classification and enhanced user control over BCI output. Taken together, BCI systems have progressed significantly in recent years in terms of accuracy, speed, and communication. Understanding the sensory components of a BCI is essential for neurosurgeons and clinicians as they help advance this technology in the clinical setting.


Subjects
Algorithms; Brain-Computer Interfaces/trends; Brain/physiology; Artificial Intelligence/trends; Brain/diagnostic imaging; Electrocorticography/methods; Electrocorticography/trends; Electrodes, Implanted; Electroencephalography/methods; Electroencephalography/trends; Humans; Neuroimaging/methods; Neuroimaging/trends
16.
Neurol Ther ; 8(2): 351-365, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31435868

ABSTRACT

Deciphering the massive volume of complex electronic data that has been compiled by hospital systems over the past decades has the potential to revolutionize modern medicine, but it also presents significant challenges. Deep learning is uniquely suited to address these challenges, and recent advances in techniques and hardware have poised the field of medical machine learning for transformational growth. The clinical neurosciences are particularly well positioned to benefit from these advances given the subtle presentation of symptoms typical of neurologic disease. Here we review the various domains in which deep learning algorithms have already provided impetus for change, including medical image analysis for the improved diagnosis of Alzheimer's disease and the early detection of acute neurologic events; medical image segmentation for quantitative evaluation of neuroanatomy and vasculature; connectome mapping for the diagnosis of Alzheimer's, autism spectrum disorder, and attention deficit hyperactivity disorder; and mining of microscopic electroencephalogram signals and granular genetic signatures. We additionally note important challenges in the integration of deep learning tools in the clinical setting and discuss the barriers to tackling the challenges that currently exist.

17.
Ann Transl Med ; 7(11): 233, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31317003

ABSTRACT

BACKGROUND: Errors in grammar, spelling, and usage in radiology reports are common. To automatically detect inappropriate insertions, deletions, and substitutions of words in radiology reports, we proposed using a neural sequence-to-sequence (seq2seq) model. METHODS: Head CT and chest radiograph reports from Mount Sinai Hospital (MSH) (n=61,722 and 818,978, respectively), Mount Sinai Queens (MSQ) (n=30,145 and 194,309, respectively) and MIMIC-III (n=32,259 and 54,685, respectively) were converted into sentences. Insertions, substitutions, and deletions of words were randomly introduced. Seq2seq models were trained using corrupted sentences as input to predict original uncorrupted sentences. Three models were trained using head CTs from MSH, chest radiographs from MSH, and head CTs from all three collections. Model performance was assessed across different sites and modalities. A sample of original, uncorrupted sentences was manually reviewed for any error in syntax, usage, or spelling to estimate real-world proofreading performance of the algorithm. RESULTS: Seq2seq detected 90.3% and 88.2% of corrupted sentences with 97.7% and 98.8% specificity in same-site, same-modality test sets for head CTs and chest radiographs, respectively. Manual review of original, uncorrupted same-site same-modality head CT sentences demonstrated seq2seq positive predictive value (PPV) 0.393 (157/400; 95% CI, 0.346-0.441) and negative predictive value (NPV) 0.986 (789/800; 95% CI, 0.976-0.992) for detecting sentences containing real-world errors, with estimated sensitivity of 0.389 (95% CI, 0.267-0.542) and specificity 0.986 (95% CI, 0.985-0.987) over n=86,211 uncorrupted training examples. CONCLUSIONS: Seq2seq models can be highly effective at detecting erroneous insertions, deletions, and substitutions of words in radiology reports. To achieve high performance, these models require site- and modality-specific training examples. Incorporating additional targeted training data could further improve performance in detecting real-world errors in reports.
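
The training-data step, in which word-level errors are randomly introduced into clean sentences, can be sketched as follows. The function, the one-error-per-sentence choice, and the tiny vocabulary are hypothetical; the abstract does not describe how many corruptions were applied or how replacement words were sampled.

```python
import random


def corrupt_sentence(tokens, vocabulary, rng, operation=None):
    """Return (corrupted_tokens, operation) with one word-level error introduced.

    `operation` may be "insert", "delete", or "substitute"; if omitted,
    one is chosen at random. The input list is left unmodified.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if len(set(vocabulary)) < 2:
        raise ValueError("vocabulary needs at least two distinct words")
    op = operation or rng.choice(["insert", "delete", "substitute"])
    corrupted = list(tokens)
    if op == "insert":
        corrupted.insert(rng.randrange(len(corrupted) + 1), rng.choice(vocabulary))
    elif op == "delete":
        del corrupted[rng.randrange(len(corrupted))]
    elif op == "substitute":
        position = rng.randrange(len(corrupted))
        # Exclude the current word so the sentence is guaranteed to change.
        candidates = [word for word in vocabulary if word != corrupted[position]]
        corrupted[position] = rng.choice(candidates)
    else:
        raise ValueError(f"unknown operation: {op}")
    return corrupted, op


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = "no acute intracranial hemorrhage".split()
vocabulary = ["left", "right", "no", "acute"]
corrupted, op = corrupt_sentence(original, vocabulary, rng)
print(op, corrupted)  # the (corrupted, original) pair is one training example
```

Each pair of corrupted input and original target needs no human labeling, which is what makes this style of training data cheap to produce at the scale of hundreds of thousands of reports.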

18.
Ann Transl Med ; 7(11): 232, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31317002

ABSTRACT

BACKGROUND: Differentiating glioblastoma, brain metastasis, and central nervous system lymphoma (CNSL) on conventional magnetic resonance imaging (MRI) can present a diagnostic dilemma due to the potential for overlapping imaging features. We investigate whether machine learning evaluation of multimodal MRI can reliably differentiate these entities. METHODS: Preoperative brain MRI including diffusion weighted imaging (DWI), dynamic contrast enhanced (DCE), and dynamic susceptibility contrast (DSC) perfusion in patients with glioblastoma, lymphoma, or metastasis were retrospectively reviewed. Perfusion maps (rCBV, rCBF), permeability maps (K-trans, Kep, Vp, Ve), ADC, T1C+ and T2/FLAIR images were coregistered and two separate volumes of interest (VOIs) were obtained from the enhancing tumor and non-enhancing T2 hyperintense (NET2) regions. The tumor volumes obtained from these VOIs were utilized for supervised training of support vector classifier (SVC) and multilayer perceptron (MLP) models. Validation of the trained models was performed on unlabeled cases using the leave-one-subject-out method. Head-to-head and multiclass models were created. Accuracies of the multiclass models were compared against two human interpreters reviewing conventional and diffusion-weighted MR images. RESULTS: Twenty-six patients with histopathologically proven glioblastoma (n=9), metastasis (n=9), and CNS lymphoma (n=8) were included. The trained multiclass ML models discriminated the three pathologic classes with a maximum accuracy of 69.2% (18 out of 26; kappa 0.540, P=0.01) using an MLP trained with the VpNET2 tumor volumes. Human readers achieved 65.4% (17 out of 26) and 80.8% (21 out of 26) accuracies, respectively. Using the MLP VpNET2 model as a computer-aided diagnosis (CADx) tool for cases in which the human reviewers disagreed with each other on the diagnosis resulted in correct diagnoses in 5 (19.2%) additional cases. CONCLUSIONS: Our trained multiclass MLP using VpNET2 can differentiate glioblastoma, brain metastasis, and CNS lymphoma with modest diagnostic accuracy and provides an approximately 19% increase in diagnostic yield when added to routine human interpretation.

20.
PLoS Med ; 15(11): e1002683, 2018 11.
Article in English | MEDLINE | ID: mdl-30399157

ABSTRACT

BACKGROUND: There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task. METHODS AND FINDINGS: A cross-sectional design with multiple model training cohorts was used to evaluate model generalizability to external sites using split-sample validation. A total of 158,323 chest radiographs were drawn from three institutions: National Institutes of Health Clinical Center (NIH; 112,120 from 30,805 patients), Mount Sinai Hospital (MSH; 42,396 from 12,904 patients), and Indiana University Network for Patient Care (IU; 3,807 from 3,683 patients). These patient populations had a mean (SD) age of 46.9 years (16.6), 63.2 years (16.5), and 49.6 years (17) with a female percentage of 43.5%, 44.8%, and 57.3%, respectively. We assessed individual models using the area under the receiver operating characteristic curve (AUC) for radiographic findings consistent with pneumonia and compared performance on different test sets with DeLong's test. The prevalence of pneumonia was high enough at MSH (34.2%) relative to NIH and IU (1.2% and 1.0%) that merely sorting by hospital system achieved an AUC of 0.861 (95% CI 0.855-0.866) on the joint MSH-NIH dataset. Models trained on data from either NIH or MSH had equivalent performance on IU (P values 0.580 and 0.273, respectively) and inferior performance on data from each other relative to an internal test set (i.e., new data from within the hospital system used for training data; P values both <0.001). The highest internal performance was achieved by combining training and test data from MSH and NIH (AUC 0.931, 95% CI 0.927-0.936), but this model demonstrated significantly lower external performance at IU (AUC 0.815, 95% CI 0.745-0.885, P = 0.001). To test the effect of pooling data from sites with disparate pneumonia prevalence, we used stratified subsampling to generate MSH-NIH cohorts that only differed in disease prevalence between training data sites. When both training data sites had the same pneumonia prevalence, the model performed consistently on external IU data (P = 0.88). When a 10-fold difference in pneumonia rate was introduced between sites, internal test performance improved compared to the balanced model (10× MSH risk P < 0.001; 10× NIH P = 0.002), but this outperformance failed to generalize to IU (MSH 10× P < 0.001; NIH 10× P = 0.027). CNNs were able to directly detect the hospital system of a radiograph for 99.95% of NIH (22,050/22,062) and 99.98% of MSH (8,386/8,388) radiographs. The primary limitation of our approach and the available public data is that we cannot fully assess what other factors might be contributing to hospital system-specific biases. CONCLUSION: Pneumonia-screening CNNs achieved better internal than external performance in 3 out of 5 natural comparisons. When models were trained on pooled data from sites with different pneumonia prevalence, they performed better on new pooled data from these sites but not on external data. CNNs robustly identified hospital system and department within a hospital, which can have large differences in disease burden and may confound predictions.
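
The observation that sorting by hospital system alone yields a high AUC follows from the prevalence gap, and can be checked with a short calculation. The sketch below assumes a "classifier" that scores every image from one site above every image from the other, and uses the radiograph counts and prevalences quoted in the abstract as rough inputs; the function name is hypothetical, and the authors' figure was computed on their own joint dataset, so an exact match is not expected.

```python
def auc_from_site_only(n_a: float, prev_a: float, n_b: float, prev_b: float) -> float:
    """AUC of a score that is 1 for every image from site A and 0 for site B."""
    pos_a, pos_b = n_a * prev_a, n_b * prev_b
    neg_a, neg_b = n_a - pos_a, n_b - pos_b
    tpr = pos_a / (pos_a + pos_b)  # share of all positives that come from site A
    fpr = neg_a / (neg_a + neg_b)  # share of all negatives that come from site A
    # For a binary score, tied pairs count one half, so AUC = (1 + TPR - FPR) / 2.
    return 0.5 * (1.0 + tpr - fpr)


# Site A = MSH (42,396 radiographs, 34.2% prevalence);
# site B = NIH (112,120 radiographs, 1.2% prevalence).
auc = auc_from_site_only(42_396, 0.342, 112_120, 0.012)
print(round(auc, 3))
```

With these inputs the result comes out at roughly 0.86, close to the reported 0.861. When both sites have the same prevalence the same function returns 0.5, which is why the balanced cohorts in the study remove this shortcut.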


Subjects
Deep Learning; Diagnosis, Computer-Assisted/methods; Pneumonia/diagnostic imaging; Radiographic Image Interpretation, Computer-Assisted/methods; Radiography, Thoracic/methods; Adult; Aged; Cross-Sectional Studies; Female; Humans; Male; Middle Aged; Predictive Value of Tests; Radiology Information Systems; Reproducibility of Results; Retrospective Studies; United States
...