1.
Article in English | MEDLINE | ID: mdl-38470976

ABSTRACT

BACKGROUND: Estimating the risk of revision after arthroplasty could inform patient and surgeon decision-making. However, there is a lack of well-performing prediction models assisting in this task, which may be due to current conventional modeling approaches such as traditional survivorship estimators (such as Kaplan-Meier) or competing risk estimators. Recent advances in machine learning survival analysis might improve decision support tools in this setting. Therefore, this study aimed to assess the performance of machine learning compared with that of conventional modeling to predict revision after arthroplasty. QUESTION/PURPOSE: Does machine learning perform better than traditional regression models for estimating the risk of revision for patients undergoing hip or knee arthroplasty? METHODS: Eleven datasets from published studies from the Dutch Arthroplasty Register reporting on factors associated with revision or survival after partial or total knee and hip arthroplasty between 2018 and 2022 were included in our study. The 11 datasets were observational registry studies, with a sample size ranging from 3038 to 218,214 procedures. We developed a set of time-to-event models for each dataset, leading to 11 comparisons. A set of predictors (factors associated with revision surgery) was identified based on the variables that were selected in the included studies. We assessed the predictive performance of two state-of-the-art statistical time-to-event models for 1-, 2-, and 3-year follow-up: a Fine and Gray model (which models the cumulative incidence of revision) and a cause-specific Cox model (which models the hazard of revision). These were compared with a machine-learning approach (a random survival forest model, which is a decision tree-based machine-learning algorithm for time-to-event analysis). 
Performance was assessed according to discriminative ability (time-dependent area under the receiver operating characteristic curve), calibration (slope and intercept), and overall prediction error (scaled Brier score). Discrimination, expressed as the area under the receiver operating characteristic curve, measures the model's ability to distinguish patients who achieved the outcomes from those who did not and ranges from 0.5 to 1.0, with 1.0 indicating the highest discrimination score and 0.50 the lowest. Calibration plots the predicted versus the observed probabilities; a perfect plot has an intercept of 0 and a slope of 1. The Brier score calculates a composite of discrimination and calibration, with 0 indicating perfect prediction and 1 the poorest. A scaled version of the Brier score, 1 - (model Brier score/null model Brier score), can be interpreted as the amount of overall prediction error. RESULTS: Using machine learning survivorship analysis, we found no differences between the competing risks estimator and traditional regression models for patients undergoing arthroplasty in terms of discriminative ability (patients who received a revision compared with those who did not). We found no consistent differences between the validated performance (time-dependent area under the receiver operating characteristic curve) of different modeling approaches because the differences in these values ranged between -0.04 and 0.03 across the 11 datasets (the time-dependent area under the receiver operating characteristic curve of the models across 11 datasets ranged from 0.52 to 0.68). In addition, the calibration metrics and scaled Brier scores produced comparable estimates, showing no advantage of machine learning over traditional regression models. CONCLUSION: Machine learning did not outperform traditional regression models. 
CLINICAL RELEVANCE: Neither machine learning modeling nor traditional regression methods were sufficiently accurate to offer prognostic information when predicting revision arthroplasty. The benefit of these modeling approaches may be limited in this context.
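The Brier score and its scaled version described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the study used time-dependent scores that account for censoring, whereas this sketch assumes a plain binary outcome with no censoring, and the function names and toy data are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier_score(y_true, y_prob):
    """1 - (model Brier score / null model Brier score).

    The null model is assumed to predict the observed event rate for everyone.
    """
    event_rate = sum(y_true) / len(y_true)
    null_score = brier_score(y_true, [event_rate] * len(y_true))
    return 1 - brier_score(y_true, y_prob) / null_score


# Hypothetical toy data (not from the study): 1 = revision, 0 = no revision.
outcomes = [0, 0, 0, 1, 0, 1, 0, 0]
predicted = [0.1, 0.2, 0.1, 0.6, 0.3, 0.7, 0.2, 0.1]

print("Brier score:", brier_score(outcomes, predicted))
print("Scaled Brier score:", scaled_brier_score(outcomes, predicted))
```

A scaled score near 0 means the model does little better than predicting the event rate for every patient; a score near 1 means almost no prediction error remains.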

2.
Clin Orthop Relat Res ; 480(11): 2205-2213, 2022 11 01.
Article in English | MEDLINE | ID: mdl-35561268

ABSTRACT

BACKGROUND: Postoperative delirium in patients aged 60 years or older with hip fractures adversely affects clinical and functional outcomes. The economic cost of delirium is estimated to be as high as USD 25,000 per patient, with a total budgetary impact of between USD 6.6 billion and USD 82.4 billion annually in the United States alone. Forty percent of delirium episodes are preventable, and accurate risk stratification can decrease the incidence and improve clinical outcomes in patients. A previously developed clinical prediction model (the SORG Orthopaedic Research Group hip fracture delirium machine-learning algorithm) is highly accurate on internal validation (in 28,207 patients with hip fractures aged 60 years or older in a US cohort) in identifying at-risk patients, and it can facilitate the best use of preventive interventions; however, it has not been tested in an independent population. For an algorithm to be useful in real life, it must be valid externally, meaning that it must perform well in a patient cohort different from the cohort used to "train" it. Although there are many promising machine-learning prediction models and many promising delirium models, only a few have been externally validated, and even fewer have been validated internationally. QUESTION/PURPOSE: Does the SORG hip fracture delirium algorithm, initially trained on a database from the United States, perform well on external validation in patients aged 60 years or older in Australia and New Zealand? METHODS: We previously developed a model in 2021 for assessing risk of delirium in hip fracture patients using records of 28,207 patients obtained from the American College of Surgeons National Surgical Quality Improvement Program. 
Variables in the original model included age, American Society of Anesthesiologists (ASA) class, functional status (independent or partially or totally dependent for any activities of daily living), preoperative dementia, preoperative delirium, and preoperative need for a mobility aid. To assess whether this model could be applied elsewhere, we used records from an international hip fracture registry. Between June 2017 and December 2018, 6672 patients older than 60 years of age in Australia and New Zealand were treated surgically for a femoral neck, intertrochanteric hip, or subtrochanteric hip fracture and entered into the Australian & New Zealand Hip Fracture Registry. Patients were excluded if they had a pathological hip fracture or septic shock. Of all patients, 6% (402 of 6672) did not meet the inclusion criteria, leaving 94% (6270 of 6672) of patients available for inclusion in this retrospective analysis. Seventy-one percent (4249 of 5986) of patients were aged 80 years or older, after accounting for 5% (284 of 6270) of missing values; 68% (4292 of 6266) were female, after accounting for 0.06% (4 of 6270) of missing values; and 83% (4690 of 5661) of patients were classified as ASA III/IV, after accounting for 10% (609 of 6270) of missing values. Missing data were imputed using the missForest methodology. In total, 39% (2467 of 6270) of patients developed postoperative delirium. The performance of the SORG hip fracture delirium algorithm on the validation cohort was assessed by discrimination, calibration, Brier score, and a decision curve analysis. Discrimination, expressed as the area under the receiver operating characteristic curve (c-statistic), measures the model's ability to distinguish patients who achieved the outcomes from those who did not and ranges from 0.5 to 1.0, with 1.0 indicating the highest discrimination score and 0.50 the lowest. 
Calibration plots the predicted versus the observed probabilities; a perfect plot has an intercept of 0 and a slope of 1. The Brier score calculates a composite of discrimination and calibration, with 0 indicating perfect prediction and 1 the poorest. RESULTS: The SORG hip fracture algorithm, when applied to an external patient cohort, distinguished between patients at low risk and patients at moderate to high risk of developing postoperative delirium. The SORG hip fracture algorithm performed with a c-statistic of 0.74 (95% confidence interval 0.73 to 0.76). The calibration plot showed high accuracy in the lower predicted probabilities (intercept -0.28, slope 0.52), and the Brier score was 0.22 (the null model Brier score was 0.24). The decision curve analysis showed that the model can be beneficial compared with no model or compared with characterizing all patients as at risk for developing delirium. CONCLUSION: Algorithms developed with machine learning are a potential tool for refining treatment of at-risk patients. If high-risk patients can be reliably identified, resources can be appropriately directed toward their care. Although the current iteration of SORG should not be relied on for patient care, it suggests potential utility in assessing risk. Further assessment in different populations, made easier by international collaborations and standardization of registries, would be useful in the development of universally valid prediction models. The model can be freely accessed at: https://sorg-apps.shinyapps.io/hipfxdelirium/ . LEVEL OF EVIDENCE: Level III, therapeutic study.
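The decision curve analysis mentioned in this abstract compares a model with the two default strategies of treating no one and treating everyone. A minimal Python sketch of the standard net-benefit calculation follows. It is not the SORG implementation; the function names, the threshold, and the toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.

    Uses the usual decision-curve weighting: false positives are
    penalized by the odds of the threshold probability.
    """
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient as at risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Treating no one has a net benefit of 0 by definition.
NET_BENEFIT_TREAT_NONE = 0.0

# Hypothetical toy data (not from the registry): 1 = delirium, 0 = none.
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
risks = [0.8, 0.6, 0.4, 0.2, 0.7, 0.1, 0.5, 0.3, 0.3, 0.2]

print("Model:", net_benefit(outcomes, risks, threshold=0.5))
print("Treat all:", net_benefit_treat_all(outcomes, threshold=0.5))
print("Treat none:", NET_BENEFIT_TREAT_NONE)
```

A model is considered useful at a given threshold when its net benefit exceeds both defaults. A full decision curve repeats this calculation over a range of thresholds.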


Subjects
Delirium , Hip Fractures , Orthopedics , Activities of Daily Living , Algorithms , Australia , Delirium/diagnosis , Delirium/epidemiology , Delirium/etiology , Female , Hip Fractures/surgery , Humans , Male , Middle Aged , Statistical Models , Prognosis , Retrospective Studies
3.
Clin Orthop Relat Res ; 480(1): 150-159, 2022 01 01.
Article in English | MEDLINE | ID: mdl-34427569

ABSTRACT

BACKGROUND: Reliably recognizing the overall pattern and specific characteristics of proximal humerus fractures may aid in surgical decision-making. With conventional onscreen imaging modalities, there is considerable and undesired interobserver variability, even when observers receive training in the application of the classification systems used. It is unclear whether three-dimensional (3D) models, which now can be fabricated with desktop printers at relatively little cost, can decrease interobserver variability in fracture classification. QUESTIONS/PURPOSES: Do 3D-printed handheld models of proximal humerus fractures improve agreement among residents and attending surgeons regarding (1) specific fracture characteristics and (2) patterns according to the Neer and Hertel classification systems? METHODS: Plain radiographs, as well as two-dimensional (2D) and 3D CT images, were collected from 20 patients (aged 18 years or older) who sustained a three-part or four-part proximal humerus fracture treated at a Level I trauma center between 2015 and 2019. The included images were chosen to comprise images from patients whose fractures were considered as difficult-to-classify, displaced fractures. Consequently, the images were assessed for eight fracture characteristics and categorized according to the Neer and Hertel classifications by four orthopaedic residents and four attending orthopaedic surgeons during two separate sessions. In the first session, the assessment was performed with conventional onscreen imaging (radiographs and 2D and 3D CT images). In the second session, 3D-printed handheld models were used for assessment, while onscreen imaging was also available. 
Although proximal humerus classifications such as the Neer classification have, in the past, been shown to have low interobserver reliability, we theorized that by receiving direct tactile and visual feedback from 3D-printed handheld fracture models, clinicians would be able to recognize the complex 3D aspects of classification systems reliably. Interobserver agreement was determined with the multirater Fleiss kappa and scored according to the categorical rating by Landis and Koch. To determine whether there was a difference between the two sessions, we calculated the delta (difference in the) kappa value with 95% confidence intervals and a two-tailed p value. Post hoc power analysis revealed that with the current sample size, a delta kappa value of 0.40 could be detected with 80% power at alpha = 0.05. RESULTS: Using 3D-printed models in addition to conventional imaging did not improve interobserver agreement of the following fracture characteristics: more than 2 mm medial hinge displacement, more than 8 mm metaphyseal extension, surgical neck fracture, anatomic neck fracture, displacement of the humeral head, more than 10 mm lesser tuberosity displacement, and more than 10 mm greater tuberosity displacement. Agreement regarding the presence of a humeral head-splitting fracture was improved but only to a level that was insufficient for clinical or scientific use (fair to substantial, delta kappa = 0.33 [95% CI 0.02 to 0.64]). Assessing 3D-printed handheld models as an adjunct to onscreen conventional imaging did not improve the interobserver agreement for pattern recognition according to Neer (delta kappa = 0.02 [95% CI -0.11 to 0.07]) and Hertel (delta kappa = 0.01 [95% CI -0.11 to 0.08]). There were no differences between residents and attending surgeons in terms of whether 3D models helped them classify the fractures, but there were a few differences between them in identifying fracture characteristics. 
However, none of the identified differences improved to almost perfect agreement (kappa value above 0.80), so even those few differences are unlikely to be clinically useful. CONCLUSION: Using 3D-printed handheld fracture models in addition to conventional onscreen imaging of three-part and four-part proximal humerus fractures does not improve agreement among residents and attending surgeons on specific fracture characteristics and patterns. Therefore, we do not recommend that clinicians expend the time and costs needed to create these models if the goal is to classify or describe patients' fracture characteristics or pattern, since doing so is unlikely to improve clinicians' abilities to select treatment or estimate prognosis. LEVEL OF EVIDENCE: Level III, diagnostic study.
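The multirater Fleiss kappa and the Landis and Koch categories used in this abstract can be sketched in Python as follows. This is a minimal sketch and not the authors' analysis: it assumes every fracture is rated by the same number of observers, and the function names and toy ratings are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j (same number of raters per subject)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs, per subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Categorical label for a kappa value after Landis and Koch."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy table: 4 fractures, 8 raters, 2 categories (absent, present).
ratings = [[8, 0], [6, 2], [4, 4], [0, 8]]
kappa = fleiss_kappa(ratings)
print(kappa, landis_koch(kappa))
```

The delta kappa reported in the study is the difference between two such kappa values, one per session. Its confidence interval needs a variance estimate that this sketch does not cover.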


Subjects
Shoulder Fractures , X-Ray Computed Tomography , Humans , Humeral Head , Observer Variation , Three-Dimensional Printing , Reproducibility of Results , Shoulder Fractures/diagnostic imaging , Shoulder Fractures/surgery
4.
Clin Orthop Relat Res ; 480(12): 2350-2360, 2022 12 01.
Article in English | MEDLINE | ID: mdl-35767811

ABSTRACT

BACKGROUND: Femoral neck fractures are common and are frequently treated with internal fixation. A major disadvantage of internal fixation is the high number of conversions to arthroplasty because of nonunion, malunion, avascular necrosis, or implant failure. A clinical prediction model identifying patients at high risk of conversion to arthroplasty may help clinicians in selecting patients who could have benefited from arthroplasty initially. QUESTION/PURPOSE: What is the predictive performance of a machine-learning (ML) algorithm to predict conversion to arthroplasty within 24 months after internal fixation in patients with femoral neck fractures? METHODS: We included 875 patients from the Fixation using Alternative Implants for the Treatment of Hip fractures (FAITH) trial. The FAITH trial consisted of patients with low-energy femoral neck fractures who were randomly assigned to receive a sliding hip screw or cancellous screws for internal fixation. Of these patients, 18% (155 of 875) underwent conversion to THA or hemiarthroplasty within the first 24 months. All patients were randomly divided into a training set (80%) and test set (20%). First, we identified 27 potential patient and fracture characteristics that may have been associated with our primary outcome, based on biomechanical rationale and previous studies. Then, random forest algorithms (a decision tree-based ML algorithm that selects variables) identified 10 predictors of conversion: BMI, cardiac disease, Garden classification, use of cardiac medication, use of pulmonary medication, age, lung disease, osteoarthritis, sex, and the level of the fracture line. Based on these variables, five different ML algorithms were trained to identify patterns related to conversion. 
The predictive performance of these trained ML algorithms was assessed on the training and test sets based on the following performance measures: (1) discrimination (the model's ability to distinguish patients who had conversion from those who did not; expressed with the area under the receiver operating characteristic curve [AUC]), (2) calibration (the plotted estimated versus the observed probabilities; expressed with the calibration curve intercept and slope), and (3) the overall model performance (Brier score: a composite of discrimination and calibration). RESULTS: None of the five ML algorithms performed well in predicting conversion to arthroplasty in the training set and the test set; AUCs of the algorithms in the training set ranged from 0.57 to 0.64, slopes of calibration plots ranged from 0.53 to 0.82, calibration intercepts ranged from -0.04 to 0.05, and Brier scores ranged from 0.14 to 0.15. The algorithms were further evaluated in the test set; AUCs ranged from 0.49 to 0.73, calibration slopes ranged from 0.17 to 1.29, calibration intercepts ranged from -1.28 to 0.34, and Brier scores ranged from 0.13 to 0.15. CONCLUSION: The predictive performance of the trained algorithms was poor, despite the use of one of the best datasets available worldwide on this subject. If the current dataset had consisted of different variables or more patients, the performance might have been better. Also, various reasons for conversion to arthroplasty were pooled in this study, but the separate prediction of underlying pathology (such as avascular necrosis or nonunion) may be more precise. Finally, it may be inherently difficult to predict conversion to arthroplasty based on preoperative variables alone. Therefore, future studies should aim to include more variables and to differentiate between the various reasons for arthroplasty. LEVEL OF EVIDENCE: Level III, prognostic study.
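The random 80%/20% split and the AUC used in this abstract can be illustrated with a short Python sketch. It is not the study's pipeline: the function names, the seed, and the toy scores are hypothetical, and the abstract does not report the exact sizes of the two sets.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Randomly split items into a training list and a test list."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(y_true, y_score):
    """Area under the ROC curve as the probability that a randomly chosen
    patient with the event scores higher than one without (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Splitting 875 hypothetical patient identifiers with this rounding rule.
train_ids, test_ids = split_train_test(range(875))
print(len(train_ids), len(test_ids))

# Hypothetical toy scores: 1 = conversion to arthroplasty, 0 = no conversion.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to ranking patients no better than chance, which helps put the reported test-set range of 0.49 to 0.73 in context.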


Subjects
Hip Replacement Arthroplasty , Femoral Neck Fractures , Humans , Prognosis , Statistical Models , Femoral Neck Fractures/surgery , Hip Replacement Arthroplasty/adverse effects , Internal Fracture Fixation/adverse effects , Algorithms , Machine Learning , Necrosis/etiology , Necrosis/surgery , Retrospective Studies , Treatment Outcome
5.
Clin Orthop Relat Res ; 480(9): 1766-1775, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35412473

ABSTRACT

BACKGROUND: Incidental durotomy is an intraoperative complication in spine surgery that can lead to postoperative complications, increased length of stay, and higher healthcare costs. Natural language processing (NLP) is an artificial intelligence method that assists in understanding free-text notes and may be useful in the automated surveillance of adverse events in orthopaedic surgery. A previously developed NLP algorithm is highly accurate in the detection of incidental durotomy on internal validation and external validation in an independent cohort from the same country. External validation in a cohort with linguistic differences, referred to as geographical validation, is required to assess the transportability of the developed algorithm. Ideally, the performance of a prediction model, here the NLP algorithm, is constant across geographic regions to ensure reproducibility and model validity. QUESTION/PURPOSE: Can we geographically validate an NLP algorithm for the automated detection of incidental durotomy across three independent cohorts from two continents? METHODS: Patients 18 years or older undergoing a primary procedure of (thoraco)lumbar spine surgery were included. In Massachusetts, between January 2000 and June 2018, 1000 patients were included from two academic and three community medical centers. In Maryland, between July 2016 and November 2018, 1279 patients were included from one academic center, and in Australia, between January 2010 and December 2019, 944 patients were included from one academic center. The authors retrospectively studied the free-text operative notes of included patients for the primary outcome, which was defined as intraoperative durotomy. Incidental durotomy occurred in 9% (93 of 1000), 8% (108 of 1279), and 6% (58 of 944) of the patients, respectively, in the Massachusetts, Maryland, and Australia cohorts. No missing reports were observed. 
Three datasets (Massachusetts, Australian, and combined Massachusetts and Australian) were divided into training and holdout test sets in an 80:20 ratio. An extreme gradient boosting (an efficient and flexible tree-based algorithm) NLP algorithm was individually trained on each training set, and the performance of the three NLP algorithms (respectively American, Australian, and combined) was assessed by discrimination via area under the receiver operating characteristic curves (AUC-ROC; this measures the model's ability to distinguish patients who obtained the outcomes from those who did not), calibration metrics (which plot the predicted and the observed probabilities) and Brier score (a composite of discrimination and calibration). In addition, the sensitivity (true-positive rate, also known as recall), specificity (true-negative rate), positive predictive value (also known as precision), negative predictive value, F1-score (composite of precision and recall), positive likelihood ratio, and negative likelihood ratio were calculated. RESULTS: The combined NLP algorithm (the combined Massachusetts and Australian data) achieved excellent performance on independent testing data from Australia (AUC-ROC 0.97 [95% confidence interval 0.87 to 0.99]), Massachusetts (AUC-ROC 0.99 [95% CI 0.80 to 0.99]) and Maryland (AUC-ROC 0.95 [95% CI 0.93 to 0.97]). The NLP algorithm developed on the Massachusetts cohort had excellent performance in the Maryland cohort (AUC-ROC 0.97 [95% CI 0.95 to 0.99]) but worse performance in the Australian cohort (AUC-ROC 0.74 [95% CI 0.70 to 0.77]). CONCLUSION: We demonstrated the clinical utility and reproducibility of an NLP algorithm with combined datasets retaining excellent performance in individual countries relative to algorithms developed in the same country alone for detection of incidental durotomy. 
Further multi-institutional, international collaborations can facilitate the creation of universal NLP algorithms that improve the quality and safety of orthopaedic surgery globally. The combined NLP algorithm has been incorporated into a freely accessible web application that can be found at https://sorg-apps.shinyapps.io/nlp_incidental_durotomy/ . Clinicians and researchers can use the tool to help incorporate the model in evaluating spine registries or quality and safety departments to automate detection of incidental durotomy and optimize prevention efforts. LEVEL OF EVIDENCE: Level III, diagnostic study.
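The classification metrics listed in this abstract (sensitivity, specificity, predictive values, F1-score, and likelihood ratios) all derive from the four cells of a confusion matrix. The Python sketch below shows the standard definitions. The function name and the counts are hypothetical and are not taken from the study, and the sketch assumes no denominator is zero.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. Assumes every denominator below is nonzero.
    """
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for 200 operative notes (not from the study).
metrics = diagnostic_metrics(tp=18, fp=4, tn=176, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because incidental durotomy is uncommon (6% to 9% of patients in these cohorts), the predictive values and F1-score are usually more informative than overall accuracy.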


Subjects
Artificial Intelligence , Natural Language Processing , Algorithms , Australia , Humans , Reproducibility of Results , Retrospective Studies
6.
Acta Orthop ; 92(4): 385-393, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33870837

ABSTRACT

Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting for Individual Prognosis or Diagnosis (TRIPOD) guidelines. Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion was determined by using 59 ML prediction models with only internal validation in orthopedic surgical outcome published up until June 18, 2020, previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. The TRIPOD guidelines assessed transparent reporting. Results - We included 18 studies externally validating 10 different ML prediction models of the 59 available ML models after screening 4,682 studies. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, rendering overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), with 6 items being reported in less than 4/18 of the studies. Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies are needed to validate or refute the myriad of predictive ML models in orthopedics while adhering to existing guidelines. This ensures clinicians can take full advantage of validated and clinically implementable ML decision tools.


Subjects
Decision Support Techniques , Machine Learning/standards , Statistical Models , Orthopedic Procedures , Humans , Treatment Outcome , Validation Studies as Topic
7.
J Sports Sci ; 37(2): 131-137, 2019 Jan.
Article in English | MEDLINE | ID: mdl-29912622

ABSTRACT

The objective was to systematically review the literature on risk factors and prevention programs for musculoskeletal injuries among tennis players. PubMed/MEDLINE, Embase, CINAHL, Cochrane, and SportDiscus were searched up to February 2017. Experts in clinical and epidemiological medicine were contacted to obtain additional studies. For risk factors, prospective cohort studies (n > 20) with a statistical analysis for injured and non-injured players were included; for prevention programs, studies with an RCT design were included. The Downs and Black checklist was used to assess risk of bias in the risk-factor studies. From a total of 4067 articles, five articles met our inclusion criteria for risk factors. No studies on effectiveness of prevention programs were identified. Quality of studies included varied from fair to excellent. Best evidence synthesis revealed moderate evidence for previous injury regardless of body location in general and fewer years of tennis experience for the occurrence of upper extremity injuries. For lower back injuries, moderate evidence was found for a previous back injury, playing >6 hours/week, and low lateral flexion of the neck as risk factors. Limited evidence was found for male gender as a risk factor. The risk factors identified can assist clinicians in developing prevention strategies. Further studies should focus on risk factor evaluation in recreational adult tennis players.


Subjects
Musculoskeletal System/injuries , Tennis/injuries , Athletic Injuries/prevention & control , Humans , Recurrence , Risk Factors , Sex Factors
9.
Clin Orthop Relat Res ; 476(4): 767-775, 2018 04.
Article in English | MEDLINE | ID: mdl-29480883

ABSTRACT

BACKGROUND: Although a parent's perception of his or her child's physical and emotional functioning may influence the course of the child's medical care, including access to care and decisions regarding treatment options, no studies have investigated whether the perceptions of a parent are concordant with those of an adolescent diagnosed with a sports-related orthopaedic injury. Identifying and understanding the potential discordance in coping and emotional distress within the athlete adolescent-parent dyads are important, because this discordance may have negative effects on adolescents' well-being. QUESTIONS/PURPOSES: The purposes of this study were (1) to compare adolescent and parent proxy ratings of psychologic symptoms (depression and anxiety), coping skills (catastrophic thinking about pain and pain self-efficacy), and upper extremity physical function and mobility in a population of adolescent-parent dyads in which the adolescent had a sports-related injury; and (2) to compare scores of adolescents and parent proxies with normative scores when such are available. METHODS: We enrolled 54 dyads (that is, pairs) of adolescent patients (mean age 16 years; SD = 1.6) presenting to a sports medicine practice with sports-related injuries as well as their accompanying parent(s). We used Patient-reported Outcomes Measurement Information System questionnaires to measure adolescents' depression, anxiety, upper extremity physical function, and mobility. We used the Pain Catastrophizing Scale short form to assess adolescents' catastrophic thinking about pain and the Pain Self-efficacy Scale short form to measure adolescents' pain self-efficacy. The accompanying parent, 69% mothers (37 of 54) and 31% fathers (17 of 54), completed parent proxy versions of each questionnaire. 
RESULTS: Parents reported that their children had worse scores (47 ± 9) on depression than what the children themselves reported (43 ± 9; mean difference 4.0; 95% confidence interval [CI], -7.0 to 0.91; p = 0.011; medium effect size -0.47). Also, parents reported that their children engaged in catastrophic thinking about pain to a lesser degree (8 ± 5) than what the children themselves reported (13 ± 4; mean difference 4.5; 95% CI, 2.7-6.4; p < 0.001; large effect size 1.2). Because scores on depression and catastrophic thinking were comparable to the general population, and minimal clinically important difference scores are not available for these measures, it is unclear whether the relatively small observed differences between parents' and adolescents' ratings are clinically meaningful. Parents and children were concordant on their reports of the child's upper extremity physical function (patient perception 47 ± 10, parent proxy 47 ± 8, mean difference -0.43, p = 0.70), mobility (patient perception 43 ± 9, parent proxy 44 ± 9, mean difference -0.59, p = 0.64), anxiety (patient perception 43 ± 10, parent proxy 46 ± 8, mean difference -2.1, p = 0.21), and pain self-efficacy (patient perception 16 ± 5, parent proxy 15 ± 5, mean difference 0.70, p = 0.35). CONCLUSIONS: Parents rated their children as more depressed and engaging in less catastrophic thinking about pain than the adolescents rated themselves. Although these differences are statistically significant, they are of a small magnitude making it unclear as to how clinically important they are in practice. We recommend that providers keep in mind that parents may overestimate depressive symptoms and underestimate the catastrophic thinking about pain in their children, probe for these potential differences, and consider how they might impact medical care. LEVEL OF EVIDENCE: Level I, prognostic study.


Subjects
Adolescent Behavior , Anxiety/psychology , Athletes/psychology , Athletic Injuries/psychology , Depression/psychology , Musculoskeletal Pain/psychology , Parents/psychology , Psychological Adaptation , Adolescent , Age Factors , Anxiety/diagnosis , Anxiety/physiopathology , Athletic Injuries/diagnosis , Athletic Injuries/physiopathology , Catastrophization , Depression/diagnosis , Depression/physiopathology , Disability Evaluation , Emotions , Female , Health Status , Humans , Male , Mental Health , Musculoskeletal Pain/diagnosis , Musculoskeletal Pain/physiopathology , Pain Measurement , Pain Perception , Patient Reported Outcome Measures , Self Efficacy , Severity of Illness Index
10.
J Shoulder Elbow Surg ; 26(9): 1629-1635, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28478896

ABSTRACT

BACKGROUND: The goals of this study were to evaluate the reliability of a quantitative 3-dimensional computed tomography (Q3DCT) technique for measurement of the capitellar osteochondritis dissecans (OCD) surface area, to analyze OCD distribution using a mapping technique, and to investigate associations between Q3DCT lesion quantification and demographic characteristics and/or clinical examination findings. METHODS: We identified patients with capitellar OCD who presented to our orthopedic sports medicine practice between January 2001 and January 2016 and who had undergone a preoperative computed tomography scan (slice thickness ≤1.25 mm). A total of 17 patients with a median age of 15 years (range, 12-23 years) were included in this study. Three-dimensional polygon models were reconstructed after osseous structures were marked in 3 planes. Surface areas of the OCD lesion as well as the capitellum were measured. Observer agreement was assessed with the intraclass correlation coefficient (ICC). Heat maps were created to visualize OCD distribution. RESULTS: Measurements of the OCD surface area showed almost perfect intraobserver agreement (ICC, 0.99; confidence interval [CI], 0.98-0.99) and interobserver agreement (ICC, 0.93; CI, 0.86-0.97). Measurements of the capitellar surface area also showed almost perfect intraobserver agreement (ICC, 0.97; CI, 0.91-0.99) and interobserver agreement (ICC, 0.86; CI, 0.46-0.96). The median OCD surface area was 101 mm² (range, 49-217 mm²). On the basis of OCD heat mapping, the posterolateral zone of the capitellum was most frequently affected. OCDs in which the lateral wall was involved were associated with larger lesion size (P = .041), longer duration of symptoms (P = .030), and worse elbow extension (P = .013). 
CONCLUSIONS: The ability to quantify the capitellar OCD surface area and lesion location in a reliable manner using Q3DCT and a mapping technique should be considered when detailed knowledge of lesion size and location is desired.


Subjects
Elbow Joint/surgery , Osteochondritis Dissecans/surgery , Adolescent , Databases, Factual , Elbow Joint/diagnostic imaging , Female , Humans , Imaging, Three-Dimensional , Male , Osteochondritis Dissecans/diagnostic imaging , Reproducibility of Results , Retrospective Studies , Tomography, X-Ray Computed
11.
NPJ Digit Med ; 7(1): 58, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448743

ABSTRACT

Although artificial intelligence (AI) technology is progressing at an unprecedented rate, our ability to translate these advancements into clinical value and adoption at the bedside remains comparatively limited. This paper reviews the current use of implementation outcomes in randomized controlled trials evaluating AI-based clinical decision support and finds limited adoption. To advance trust and clinical adoption of AI, there is a need to bridge the gap between traditional quantitative metrics and implementation outcomes to better grasp the reasons behind the success or failure of AI systems and improve their translation into clinical value.

12.
Bone Jt Open ; 5(1): 9-19, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38226447

ABSTRACT

Aims: Machine-learning (ML) prediction models in orthopaedic trauma hold great promise in assisting clinicians in various tasks, such as personalized risk stratification. However, an overview of current applications and a critical appraisal against peer-reviewed guidelines is lacking. The objectives of this study are to 1) provide an overview of current ML prediction models in orthopaedic trauma; 2) evaluate the completeness of reporting following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement; and 3) assess the risk of bias following the Prediction model Risk Of Bias Assessment Tool (PROBAST). Methods: A systematic search screening 3,252 studies identified 45 ML-based prediction models in orthopaedic trauma up to January 2023. Transparent reporting was assessed with the TRIPOD statement and risk of bias with the PROBAST tool. Results: A total of 40 studies reported on training and internal validation; four studies performed both development and external validation, and one study performed only external validation. The most commonly reported outcomes were mortality (33%, 15/45) and length of hospital stay (9%, 4/45), and the majority of prediction models were developed in the hip fracture population (60%, 27/45). The overall median completeness for the TRIPOD statement was 62% (interquartile range 30 to 81%). The overall risk of bias in the PROBAST tool was low in 24% (11/45), high in 69% (31/45), and unclear in 7% (3/45) of the studies. High risk of bias was mainly due to analysis domain concerns including small datasets with a low number of outcomes, complete-case analysis in case of missing data, and no reporting of performance measures. Conclusion: The results of this study showed that despite a myriad of potentially clinically useful applications, a substantial part of ML studies in orthopaedic trauma lack transparent reporting, and are at high risk of bias.
These problems must be resolved by following established guidelines to instil confidence in ML models among patients and clinicians. Otherwise, there will remain a sizeable gap between the development of ML prediction models and their clinical application in our day-to-day orthopaedic trauma practice.

13.
Eur J Trauma Emerg Surg ; 49(3): 1545-1553, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36757419

ABSTRACT

PURPOSE: Mortality prediction in elderly femoral neck fracture patients is valuable in treatment decision-making. A previously developed and internally validated clinical prediction model shows promise in identifying patients at risk of 90-day and 2-year mortality. Validation in an independent cohort is required to assess generalizability, especially in geographically distinct regions. We therefore asked: is the SORG Orthopaedic Research Group (SORG) femoral neck fracture mortality algorithm externally valid in an Israeli cohort to predict 90-day and 2-year mortality? METHODS: We previously developed a prediction model in 2022 for estimating the risk of mortality in femoral neck fracture patients using a multicenter institutional cohort of 2,478 patients from the USA. The model included the following input variables that are available on clinical admission: age, male gender, creatinine level, absolute neutrophil, hemoglobin level, international normalized ratio (INR), congestive heart failure (CHF), displaced fracture, hemiplegia, chronic obstructive pulmonary disease (COPD), history of cerebrovascular accident (CVA) and beta-blocker use. To assess the generalizability, we used an intercontinental institutional cohort from the Sheba Medical Center in Israel (level I trauma center), queried between June 2008 and February 2022. Generalizability of the model was assessed using discrimination, calibration, Brier score, and decision curve analysis. RESULTS: The validation cohort included 2,033 patients, aged 65 years or above, who underwent femoral neck fracture surgery. Most patients were female (64.8%, n = 1317), the median age was 81 years (interquartile range = 75-86), and 80.4% (n = 1635) of patients sustained a displaced fracture (Garden III/IV). The 90-day mortality was 9.4% (n = 190) and 2-year mortality was 30.0% (n = 610).
Despite numerous baseline differences, the model performed acceptably in the validation cohort on discrimination (c-statistic 0.67 for 90-day, 0.67 for 2-year), calibration, Brier score, and decision curve analysis. CONCLUSIONS: The previously developed SORG femoral neck fracture mortality algorithm demonstrated good performance in an independent intercontinental population. The current iteration should not be relied on for patient care, although it suggests potential utility in assessing patients at low risk for 90-day or 2-year mortality. Further studies should evaluate this tool in a prospective setting and evaluate its feasibility and efficacy in clinical practice. The algorithm can be freely accessed: https://sorg-apps.shinyapps.io/hipfracturemortality/ . LEVEL OF EVIDENCE: Level III, Prognostic study.


Subjects
Femoral Neck Fractures , Models, Statistical , Aged , Humans , Male , Female , Aged, 80 and over , Prognosis , Israel/epidemiology , Prospective Studies , Femoral Neck Fractures/surgery , Retrospective Studies
14.
JAMIA Open ; 6(2): ooad033, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37266187

ABSTRACT

Objective: When correcting for the "class imbalance" problem in medical data, the effects of resampling on classifier algorithms remain unclear. We examined the effect on performance across several combinations of classifiers and resampling ratios. Materials and Methods: Multiple classification algorithms were trained on 7 resampled datasets: no correction, random undersampling, 4 ratios of Synthetic Minority Oversampling Technique (SMOTE), and random oversampling with the Adaptive Synthetic algorithm (ADASYN). Performance was evaluated using area under the curve (AUC), precision, recall, Brier score, and calibration metrics. A case study on prediction modeling for 30-day unplanned readmissions in previously admitted Urology patients was presented. Results: For most algorithms, using resampled data showed a significant increase in AUC and precision, ranging from 0.74 (CI: 0.69-0.79) to 0.93 (CI: 0.92-0.94), and 0.35 (CI: 0.12-0.58) to 0.86 (CI: 0.81-0.92) respectively. All classification algorithms showed significant increases in recall, and significant decreases in Brier score, but with distorted calibration that overestimated positives. Discussion: Imbalance correction resulted in an overall improved performance, yet poorly calibrated models. There can still be clinical utility due to a strong discriminating performance, specifically when predicting only low and high risk cases is clinically more relevant. Conclusion: Resampling data increased the performance of classification algorithms, yet produced an overestimation of positive predictions. Based on the findings from our case study, a thoughtful predefinition of the clinical prediction task may guide the use of resampling techniques in future studies aiming to improve clinical decision support tools.
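
As an illustration of the simplest correction named in this abstract, random undersampling, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `random_undersample`, the 0/1 label coding, and the 1:1 target ratio are assumptions made for the example. SMOTE and ADASYN are not shown because they generate synthetic minority samples rather than discarding majority ones.

```python
import random


def random_undersample(X, y, seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows.

    X is a list of feature rows, y a list of 0/1 labels (assumed coding).
    Returns new lists in which both classes have the minority-class count.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    # Identify which class is smaller; every minority row is kept.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Sample the majority class down to the minority size, without replacement.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]
```

Because rows are discarded, the resampled event rate no longer matches the true prevalence, which is one way predicted probabilities can end up overestimating positives, as the abstract reports.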

15.
OTA Int ; 6(5 Suppl): e283, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38152438

ABSTRACT

Objectives: With more than 300,000 patients per year in the United States alone, hip fractures are one of the most common injuries occurring in the elderly. The incidence is predicted to rise to 6 million cases per annum worldwide by 2050. Many fracture registries have been established, serving as tools for quality surveillance and evaluating patient outcomes. Most registries are based on billing and procedural codes, prone to under-reporting of cases. Deep learning (DL) is able to interpret radiographic images and assist in fracture detection; we propose to conduct a DL-based approach intended to autocreate a fracture registry, specifically for the hip fracture population. Methods: Conventional radiographs (n = 18,834) from 2919 patients from Massachusetts General Brigham hospitals were extracted (images designated as hip radiographs within the medical record). We designed a cascade model consisting of 3 submodules for image view classification (MI), postoperative implant detection (MII), and proximal femoral fracture detection (MIII), including data augmentation and scaling, and convolutional neural networks for model development. An ensemble model of 10 models (based on ResNet, VGG, DenseNet, and EfficientNet architectures) was created to detect the presence of a fracture. Results: The accuracy of the developed submodules reached 92%-100%; visual explanations of model predictions were generated through gradient-based methods. Time for the automated model-based fracture-labeling was 0.03 seconds/image, compared with an average of 12 seconds/image for human annotation as calculated in our preprocessing stages. Conclusion: This semisupervised DL approach labeled hip fractures with high accuracy. This mitigates the burden of annotations in a large data set, which is time-consuming and prone to under-reporting. The DL approach may prove beneficial for future efforts to autocreate construct registries that outperform current diagnosis and procedural codes. 
Clinicians and researchers can use the developed DL approach for quality improvement, diagnostic and prognostic research purposes, and building clinical decision support tools.

16.
Bone Joint J ; 105-B(1): 56-63, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36587260

ABSTRACT

AIMS: This study aimed to answer the following questions: do 3D-printed models lead to a more accurate recognition of the pattern of complex fractures of the elbow?; do 3D-printed models lead to a more reliable recognition of the pattern of these injuries?; and do junior surgeons benefit more from 3D-printed models than senior surgeons? METHODS: A total of 15 orthopaedic trauma surgeons (seven juniors, eight seniors) evaluated 20 complex elbow fractures for their overall pattern (i.e. varus posterior medial rotational injury, terrible triad injury, radial head fracture with posterolateral dislocation, anterior (trans-)olecranon fracture-dislocation, posterior (trans-)olecranon fracture-dislocation) and their specific characteristics. First, fractures were assessed based on radiographs and 2D and 3D CT scans; and in a subsequent round, one month later, with additional 3D-printed models. Diagnostic accuracy (acc) and inter-surgeon reliability (κ) were determined for each assessment. RESULTS: Accuracy significantly improved with 3D-printed models for the whole group on pattern recognition (acc2D/3D = 0.62 vs acc3Dprint= 0.69; Δacc = 0.07 (95% confidence interval (CI) 0.00 to 0.14); p = 0.025). A significant improvement was also seen in reliability for pattern recognition with the additional 3D-printed models (κ2D/3D = 0.41 (moderate) vs κ3Dprint = 0.59 (moderate); Δκ = 0.18 (95% CI 0.14 to 0.22); p ≤ 0.001). Accuracy was comparable between junior and senior surgeons with the 3D-printed model (accjunior = 0.70 vs accsenior = 0.68; Δacc = -0.02 (95% CI -0.17 to 0.13); p = 0.904). Reliability was also comparable between junior and senior surgeons without the 3D-printed model (κjunior = 0.39 (fair) vs κsenior = 0.43 (moderate); Δκ = 0.03 (95% CI -0.03 to 0.10); p = 0.318). 
However, junior surgeons showed greater improvement regarding reliability than seniors with 3D-printed models (κjunior = 0.65 (substantial) vs κsenior = 0.54 (moderate); Δκ = 0.11 (95% CI 0.04 to 0.18); p = 0.002). CONCLUSION: The use of 3D-printed models significantly improved the accuracy and reliability of recognizing the pattern of complex fractures of the elbow. However, the current long printing time and non-reusable materials could limit the usefulness of 3D-printed models in clinical practice. They could be suitable as a reusable tool for teaching residents. Cite this article: Bone Joint J 2023;105-B(1):56-63.
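
The reliability figures in this abstract are kappa (κ) statistics, which correct raw agreement for the agreement expected by chance. The abstract does not say which multi-rater variant was used for the 15 surgeons, so the sketch below shows only the two-rater case (Cohen's kappa) as an illustration; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Hypothetical fracture-pattern labels from two surgeons for four cases.
surgeon_1 = ["terrible triad", "terrible triad", "olecranon", "olecranon"]
surgeon_2 = ["terrible triad", "olecranon", "olecranon", "olecranon"]
kappa = cohens_kappa(surgeon_1, surgeon_2)
```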


Subjects
Elbow Injuries , Elbow Joint , Joint Dislocations , Radius Fractures , Ulna Fractures , Humans , Elbow , Reproducibility of Results , Elbow Joint/diagnostic imaging , Elbow Joint/surgery , Radius Fractures/diagnostic imaging , Radius Fractures/surgery , Joint Dislocations/surgery , Ulna Fractures/surgery , Printing, Three-Dimensional
17.
Injury ; 54(7): 110757, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37164900

ABSTRACT

PURPOSE: The effects of clockwise torque rotation on proximal femoral fracture fixation have been the subject of ongoing debate: fixated right-sided trochanteric fractures seem more rotationally stable than left-sided fractures in the biomechanical setting, but this theoretical advantage has not been demonstrated in the clinical setting to date. The purpose of this study was to identify a difference in early reoperation rate between patients undergoing surgery for left- versus right-sided proximal femur fractures using cephalomedullary nailing (CMN). MATERIALS AND METHODS: The American College of Surgeons National Surgical Quality Improvement Program was queried from 2016-2019 to identify patients aged 50 years and older undergoing CMN for a proximal femoral fracture. The primary outcome was any unplanned reoperation within 30 days following surgery. The difference was calculated using a Chi-square test, and observed power calculated using post-hoc power analysis. RESULTS: In total, of 20,122 patients undergoing CMN for proximal femoral fracture management, 1.8% (n=371) had to undergo an unplanned reoperation within 30 days after surgery. Overall, 208 (2.0%) were left-sided and 163 (1.7%) right-sided fractures (p=0.052; risk ratio [RR] 1.22, 95% confidence interval [CI] 1.00-1.50; odds ratio [OR] 1.23, 95% CI 1.00-1.51; power 49.2%, α=0.05). CONCLUSION: This study shows a higher risk of reoperation for left-sided compared to right-sided proximal femur fractures after CMN in a large sample size. Although results may be underpowered and not statistically significant, this finding might substantiate the hypothesis that clockwise rotation during implant insertion and (postoperative) weightbearing may lead to higher reoperation rates. LEVEL OF EVIDENCE: Therapeutic level II.
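
For readers who want to see how a risk ratio and odds ratio of the kind reported above are derived from a 2x2 table, here is a minimal sketch. The abstract does not give the number of left- and right-sided fractures, so the function takes the group sizes as inputs and the test uses hypothetical counts. The log-scale Wald confidence interval is an assumption for illustration; the abstract does not state which interval method the authors used.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald CI on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent proportions.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


def odds_ratio(events_a, n_a, events_b, n_b):
    """Odds ratio of group A versus group B (cross-product of the 2x2 table)."""
    return (events_a * (n_b - events_b)) / (events_b * (n_a - events_a))
```

When the outcome is rare, as with a reoperation rate near 2%, the odds ratio and risk ratio are close to each other, which is consistent with the nearly identical RR and OR in the abstract.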


Subjects
Femoral Fractures , Fracture Fixation, Intramedullary , Hip Fractures , Proximal Femoral Fractures , Humans , Middle Aged , Aged , Reoperation , Torque , Bone Nails , Treatment Outcome , Femoral Fractures/surgery , Hip Fractures/surgery , Femur , Retrospective Studies
18.
BMJ Open ; 13(10): e074700, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37852772

ABSTRACT

INTRODUCTION: Despite technological advancements in recent years, glenoid component loosening remains a common complication after anatomical total shoulder arthroplasty (ATSA) and is one of the main causes of revision surgery. Increasing emphasis is placed on the prevention of glenoid component failure. Previous studies have successfully predicted range of motion, patient-reported outcomes and short-term complications after ATSA using machine learning methods, but an accurate predictive model for (glenoid component) revision is currently lacking. This study aims to use a large international database to accurately predict aseptic loosening of the glenoid component after ATSA using machine learning algorithms. METHODS AND ANALYSIS: For this multicentre, retrospective study, individual patient data will be compiled from previously published studies reporting revision of ATSA. A systematic literature search will be performed in Medline (PubMed) identifying all studies reporting outcomes of ATSA. Authors will be contacted and invited to participate in the Machine Learning Consortium by sharing their anonymised databases. All databases reporting revisions after ATSA will be included, and individual patients with a follow-up less than 2 years or a fracture as the indication for ATSA will be excluded. First, features (predictive variables) will be identified using a random forest feature selection. The resulting features from the compiled database will be used to train various machine learning algorithms (stochastic gradient boosting, random forest, support vector machine, neural network and elastic-net penalised logistic regression). The developed and validated algorithms will be evaluated across discrimination (c-statistic), calibration, the Brier score and the decision curve analysis. The best-performing algorithm will be used to create an open-access online prediction tool. ETHICS AND DISSEMINATION: Data will be collected adhering to the WHO regulation on data sharing. 
An Institutional Review Board review is not applicable. The study results will be published in a peer-reviewed journal.


Subjects
Arthroplasty, Replacement, Shoulder , Humans , Arthroplasty, Replacement, Shoulder/adverse effects , Retrospective Studies , Scapula , Machine Learning , Probability , Treatment Outcome , Multicenter Studies as Topic
19.
Bone Jt Open ; 4(3): 168-181, 2023 Mar 14.
Article in English | MEDLINE | ID: mdl-37051847

ABSTRACT

This study aimed to develop prediction models using machine-learning (ML) algorithms for 90-day and one-year mortality prediction in femoral neck fracture (FNF) patients aged 50 years or older based on the Hip fracture Evaluation with Alternatives of Total Hip arthroplasty versus Hemiarthroplasty (HEALTH) and Fixation using Alternative Implants for the Treatment of Hip fractures (FAITH) trials. This study included 2,388 patients from the HEALTH and FAITH trials, with 90-day and one-year mortality proportions of 3.0% (71/2,388) and 6.4% (153/2,388), respectively. The mean age was 75.9 years (SD 10.8) and 65.9% of patients (1,574/2,388) were female. The algorithms included patient and injury characteristics. Six algorithms were developed, internally validated and evaluated across discrimination (c-statistic; discriminative ability between those with risk of mortality and those without), calibration (observed outcome compared to the predicted probability), and the Brier score (composite of discrimination and calibration). The developed algorithms distinguished between patients at high and low risk for 90-day and one-year mortality. The penalized logistic regression algorithm had the best performance metrics for both 90-day (c-statistic 0.80, calibration slope 0.95, calibration intercept -0.06, and Brier score 0.039) and one-year (c-statistic 0.76, calibration slope 0.86, calibration intercept -0.20, and Brier score 0.074) mortality prediction in the hold-out set. Using high-quality data, the ML-based prediction models accurately predicted 90-day and one-year mortality in patients aged 50 years or older with an FNF. The final models must be externally validated to assess generalizability to other populations, and prospectively evaluated in the process of shared decision-making.
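
Two of the performance measures defined in this abstract, the c-statistic and the Brier score, can be computed directly from observed outcomes and predicted probabilities. The following is a minimal sketch using standard textbook definitions; it is not the study's code, and the function names and 0/1 outcome coding are assumptions for the example.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome.

    Lower is better; 0 means every prediction was exactly right.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Probability that a random event case is ranked above a random non-event.

    Computed over all event/non-event pairs; tied predictions count as half.
    A value of 0.5 is chance-level discrimination and 1.0 is perfect.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; larger datasets would normally use a rank-based computation instead.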

20.
Eur J Trauma Emerg Surg ; 48(6): 4669-4682, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35643788

ABSTRACT

PURPOSE: Preoperative prediction of mortality in femoral neck fracture patients aged 65 years or above may be valuable in treatment decision-making. A preoperative clinical prediction model can aid surgeons and patients in the shared decision-making process, and optimize care for elderly femoral neck fracture patients. This study aimed to develop and internally validate a clinical prediction model using machine learning (ML) algorithms for 90-day and 2-year mortality in femoral neck fracture patients aged 65 years or above. METHODS: A retrospective cohort study at two level I trauma centers and three (non-level I) community hospitals was conducted to identify patients undergoing surgical fixation for a femoral neck fracture. Five different ML algorithms were developed, internally validated, and assessed by discrimination, calibration, Brier score and decision curve analysis. RESULTS: In total, 2478 patients were included with 90-day and 2-year mortality rates of 9.1% (n = 225) and 23.5% (n = 582) respectively. The models included patient characteristics, comorbidities and laboratory values. The stochastic gradient boosting algorithm had the best performance for 90-day mortality prediction, with good discrimination (c-statistic = 0.74), calibration (intercept = -0.05, slope = 1.11) and Brier score (0.078). The elastic-net penalized logistic regression algorithm had the best performance for 2-year mortality prediction, with good discrimination (c-statistic = 0.70), calibration (intercept = -0.03, slope = 0.89) and Brier score (0.16). The models were incorporated into a freely available web-based application, including individual patient explanations that show how the model arrived at a given prediction: https://sorg-apps.shinyapps.io/hipfracturemortality/ CONCLUSIONS: The clinical prediction models show promise in estimating mortality in elderly femoral neck fracture patients.
External and prospective validation of the models may improve surgeons' ability to make treatment decisions. LEVEL OF EVIDENCE: Prognostic Level II.


Subjects
Femoral Neck Fractures , Aged , Humans , Retrospective Studies , Femoral Neck Fractures/surgery , Models, Statistical , Prognosis , Machine Learning , Algorithms