Results 1 - 20 of 255
1.
Front Artif Intell ; 7: 1419638, 2024.
Article in English | MEDLINE | ID: mdl-39301479

ABSTRACT

Introduction: Deep learning (DL) has significantly advanced medical image classification. However, it often relies on transfer learning (TL) from models pretrained on large, generic non-medical image datasets like ImageNet. In contrast, medical images possess unique visual characteristics that such general models may not adequately capture. Methods: This study examines the effectiveness of modality-specific pretext learning strengthened by image denoising and deblurring in enhancing the classification of pediatric chest X-ray (CXR) images into those exhibiting no findings, i.e., normal lungs, or with cardiopulmonary disease manifestations. Specifically, we use a VGG-16-Sharp-U-Net architecture and leverage its encoder in conjunction with a classification head to distinguish normal from abnormal pediatric CXR findings. We benchmark this performance against the traditional TL approach, viz., the VGG-16 model pretrained only on ImageNet. Measures used for performance evaluation are balanced accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), Kappa statistic, and Youden's index. Results: Our findings reveal that models developed from CXR modality-specific pretext encoders substantially outperform the ImageNet-only pretrained model, viz., Baseline, and achieve significantly higher sensitivity (p < 0.05) with marked improvements in balanced accuracy, F-score, MCC, Kappa statistic, and Youden's index. A novel attention-based fuzzy ensemble of the pretext-learned models further improves performance across these metrics (balanced accuracy: 0.6376; sensitivity: 0.4991; F-score: 0.5102; MCC: 0.2783; Kappa: 0.2782; and Youden's index: 0.2751), compared to Baseline (balanced accuracy: 0.5654; sensitivity: 0.1983; F-score: 0.2977; MCC: 0.1998; Kappa: 0.1599; and Youden's index: 0.1327).
Discussion: The superior results of CXR modality-specific pretext learning and their ensemble underscore its potential as a viable alternative to conventional ImageNet pretraining for medical image classification. Results from this study promote further exploration of medical modality-specific TL techniques in the development of DL models for various medical imaging applications.
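The evaluation measures listed in this abstract can all be derived from a single 2x2 confusion matrix. A minimal sketch; the counts below are made up for illustration, not taken from the study:

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """All evaluation measures named in the abstract, from one 2x2
    confusion matrix (abnormal CXR = positive class)."""
    sens = tp / (tp + fn)                       # sensitivity (recall)
    spec = tn / (tn + fp)                       # specificity
    prec = tp / (tp + fp)                       # precision
    n = tp + tn + fp + fn
    po = (tp + tn) / n                          # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # chance agreement
    return {
        "balanced_accuracy": (sens + spec) / 2,
        "sensitivity": sens,
        "specificity": spec,
        "f_score": 2 * prec * sens / (prec + sens),
        "mcc": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "kappa": (po - pe) / (1 - pe),          # Cohen's kappa
        "youden": sens + spec - 1,              # Youden's J index
    }

# hypothetical counts for a pediatric CXR classifier
m = binary_metrics(tp=40, fn=10, tn=45, fp=5)
```

For the hypothetical matrix above, balanced accuracy and Youden's index follow directly from sensitivity and specificity.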

2.
Lab Anim ; : 236772241248509, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39157984

ABSTRACT

Absence of statistical significance (i.e., p > 0.05) in a frequentist test comparing two samples is often used as evidence of absence of a difference, or absence of an effect of a treatment, on the measured variable. Such conclusions are often wrong because absence of significance may merely result from a sample size too small to reveal an effect. To conclude that there is no meaningful effect of a treatment or condition, an appropriate statistical approach is necessary. In frequentist statistics, a simple tool for this goal is the 'two one-sided t-tests' (TOST) procedure, a form of equivalence test that relies on the a priori definition of a minimal difference considered relevant; in other words, the smallest effect size of interest should be established in advance. We present the principles of this test and give examples where it allows correct interpretation of the results of a classical t-test assuming absence of difference. Equivalence tests are also very useful for probing whether significant results are also biologically meaningful, because with large samples it is possible to obtain significant results in both an equivalence test and a two-sample t-test that assumes no difference as the null hypothesis.
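A minimal sketch of the two one-sided t-tests described above, using a pooled-variance two-sample layout; the samples and the ±0.5 margin are invented for illustration (SciPy assumed available):

```python
import math
from scipy import stats

def tost_two_sample(x, y, delta):
    """Two one-sided t-tests (TOST): equivalence of two independent samples
    within a pre-specified margin +/- delta (pooled-variance t-test)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)   # pooled variance
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    d = mx - my
    p_lower = stats.t.sf((d + delta) / se, df)   # H0: difference <= -delta
    p_upper = stats.t.cdf((d - delta) / se, df)  # H0: difference >= +delta
    return max(p_lower, p_upper)                 # equivalence claimed if < alpha

x = [9.8, 9.9, 10.0, 10.1, 10.2] * 4             # invented measurements
y = [9.8, 9.9, 10.0, 10.1, 10.2] * 4
p_equiv = tost_two_sample(x, y, delta=0.5)       # well inside the margin
p_not = tost_two_sample(x, [v + 0.6 for v in y], delta=0.5)  # outside it
```

Rejecting both one-sided nulls (small `p_equiv`) supports equivalence within ±delta; the shifted sample fails, as it should.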

3.
Glob Epidemiol ; 8: 100151, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39021384

ABSTRACT

As widely noted in the literature and by international bodies such as the American Statistical Association, severe misinterpretations of P-values, confidence intervals, and statistical significance are sadly common in public health. This poses serious risks for critical decisions such as the approval or rejection of therapies. Cognitive distortions about statistics likely stem from poor teaching in schools and universities, overly simplified interpretations, and - as we suggest - the reckless use of calculation software with predefined standardized procedures. In light of this, we present a framework to recalibrate the role of frequentist-inferential statistics within clinical and epidemiological research. In particular, we stress that statistics is only a set of rules and numbers that make sense only when placed within a well-defined scientific context established beforehand. Practical examples are discussed for educational purposes. Alongside this, we propose tools to better evaluate statistical outcomes, such as multiple compatibility (or surprisal) intervals and tuples of point hypotheses. Lastly, we emphasize that every conclusion must be informed by different kinds of scientific evidence (e.g., biochemical, clinical, statistical) and must be based on a careful examination of costs, risks, and benefits.
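The surprisal intervals mentioned above can be sketched in a few lines: the S-value re-expresses a p-value as bits of information against the test hypothesis, and reporting intervals at several compatibility levels avoids a single dichotomous cutoff. All numbers here are illustrative:

```python
import math

def s_value(p):
    """Surprisal (S-value): bits of information against the test hypothesis."""
    return -math.log2(p)

def compatibility_interval(estimate, se, z):
    """Interval of parameter values most compatible with the data at a given
    quantile z (z = 1.96 ~ 95%, z = 1.645 ~ 90%)."""
    return (estimate - z * se, estimate + z * se)

# illustrative numbers: an estimated effect of 2.0 with standard error 1.0
ci95 = compatibility_interval(2.0, 1.0, 1.96)
ci90 = compatibility_interval(2.0, 1.0, 1.645)
s = s_value(0.05)   # p = 0.05 carries only ~4.3 bits against H0
```

Seeing both intervals side by side makes clear how much the "compatible" range depends on an arbitrary level choice.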

4.
J Oral Rehabil ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956893

ABSTRACT

BACKGROUND: The proper interpretation of a study's results requires both excellent understanding of good methodological practices and deep knowledge of prior results, aided by the availability of effect sizes. METHODS: This review takes the form of an expository essay exploring the complex and nuanced relationships among statistical significance, clinical importance, and effect sizes. RESULTS: Careful attention to study design and methodology will increase the likelihood of obtaining statistical significance and may enhance the ability of investigators/readers to accurately interpret results. Measures of effect size show how well the variables used in a study account for/explain the variability in the data. Studies reporting strong effects may have greater practical value/utility than studies reporting weak effects. Effect sizes need to be interpreted in context. Verbal summary characterizations of effect sizes (e.g., "weak", "strong") are fundamentally flawed and can lead to inappropriate characterization of results. Common language effect size (CLES) indicators are a relatively new approach to effect sizes that may offer a more accessible interpretation of results that can benefit providers, patients, and the public at large. CONCLUSIONS: It is important to convey research findings in ways that are clear to both the research community and to the public. At a minimum, this requires inclusion of standard effect size data in research reports. Proper selection of measures and careful design of studies are foundational to the interpretation of a study's results. The ability to draw useful conclusions from a study is increased when investigators enhance the methodological quality of their work.
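A common language effect size for two independent groups can be computed directly as the probability that a random observation from one group exceeds one from the other (ties counted as half). The sample values are invented:

```python
def cles(a, b):
    """Common language effect size: probability that a random observation
    from group a exceeds one from group b (ties counted as half)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))

# invented outcome scores for two groups
p_superiority = cles([5, 6, 7, 8], [1, 2, 3, 6])
```

A statement like "a randomly chosen treated patient outscores a randomly chosen control about 9 times in 10" is the kind of accessible interpretation the abstract advocates.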

5.
Biochem Genet ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38951354

ABSTRACT

The genomic evaluation process relies on the assumption of linkage disequilibrium between dense single-nucleotide polymorphism (SNP) markers at the genome level and quantitative trait loci (QTL). The present study was conducted with the aim of evaluating four frequentist methods, including Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, and Genomic Best Linear Unbiased Prediction (GBLUP), and five Bayesian methods, including Bayes Ridge Regression (BRR), Bayes A, Bayesian LASSO, Bayes C, and Bayes B, in genomic selection using simulated data. The difference in prediction accuracy was assessed in pairs based on statistical significance (p-value; i.e., t test and Mann-Whitney U test) and practical significance (Cohen's d effect size). For this purpose, the data were simulated based on two scenarios with different marker densities (4000 and 8000 across the whole genome). The simulated genome comprised four chromosomes of 1 Morgan each, carrying 100 randomly distributed QTL and one of two densities of evenly distributed SNPs (1000 or 2000 per chromosome), at a heritability of 0.4. For the frequentist methods except GBLUP, the regularization parameter λ was calculated using a five-fold cross-validation approach. For both scenarios, among the frequentist methods, the highest prediction accuracy was observed for Ridge Regression and GBLUP; the lowest and highest bias were shown by Ridge Regression and GBLUP, respectively. Among the Bayesian methods, Bayes B and BRR showed the highest and lowest prediction accuracy, respectively. The lowest bias in both scenarios was registered by Bayesian LASSO, and the highest bias in the first and second scenarios was shown by BRR and Bayes B, respectively. Across all studied methods in both scenarios, Bayes B showed the highest accuracy, and LASSO and Elastic Net the lowest.
As expected, the greatest similarity in performance was observed between GBLUP and BRR (d = 0.007 in the first scenario and d = 0.003 in the second). The results obtained from the parametric t test and the non-parametric Mann-Whitney U test were similar. In the first and second scenarios, out of the 36 t tests between the performance of the studied methods in each scenario, 14 (P < .001) and 2 (P < .05) comparisons were significant, respectively, which indicates that the difference in the performance of the methods decreases as the number of predictors increases. This was corroborated by the Cohen's d effect size: as model complexity increased, the effect sizes were no longer very large. The regularization parameters in frequentist methods should be optimized by a cross-validation approach before using these methods in genomic evaluation.
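The practical-significance comparison described above (Cohen's d between methods) can be sketched as follows; the per-replicate accuracy values are hypothetical stand-ins, not the study's data:

```python
import math

def cohens_d(x, y):
    """Cohen's d with pooled standard deviation, as used in the study to
    judge practical (not just statistical) significance between methods."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / sp

# hypothetical per-replicate prediction accuracies (not the study's data)
gblup = [0.61, 0.63, 0.62, 0.60, 0.64]
brr = [0.61, 0.62, 0.62, 0.61, 0.64]
d_small = cohens_d(gblup, brr)                      # near zero: practically equivalent
d_large = cohens_d([0.70, 0.71, 0.72, 0.73, 0.74],
                   [0.60, 0.61, 0.62, 0.63, 0.64])  # clearly separated methods
```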

6.
Indian J Psychol Med ; 46(4): 356-357, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39056029

ABSTRACT

This article presents a table containing redacted data from a real study. The table contains three curiosities: statistical significance in the absence of clinical significance, narrow standard deviations, and the absence of a placebo effect. The data in the table had been obtained by an inexperienced rater; how the inexperience compromised the data is explained. Action points for rater experience, rater training, and rating procedures are suggested.

7.
Front Neurosci ; 18: 1405734, 2024.
Article in English | MEDLINE | ID: mdl-38855440

ABSTRACT

Objective: In this work, we propose a novel method for constructing whole-brain spatio-temporal multilayer functional connectivity networks (FCNs) and four innovative rich-club metrics. Methods: Spatio-temporal multilayer FCNs achieve a high-order representation of the spatio-temporal dynamic characteristics of brain networks by combining the sliding time window method with graph theory and hypergraph theory. The four proposed rich-club scales are based on the dynamic changes in rich-club node identity, providing a parameterized description of the topological dynamic characteristics of brain networks from both temporal and spatial perspectives. The proposed method was validated in three independent differential analysis experiments: male-female gender difference analysis, analysis of abnormality in patients with autism spectrum disorders (ASD), and individual difference analysis. Results: The proposed method yielded results consistent with previous relevant studies and revealed some innovative findings. For instance, the dynamic topological characteristics of specific white matter regions effectively reflected individual differences. The increased abnormality in internal functional connectivity within the basal ganglia may be a contributing factor to the occurrence of repetitive or restrictive behaviors in ASD patients. Conclusion: The proposed methodology provides an efficacious approach for constructing whole-brain spatio-temporal multilayer FCNs and conducting analysis of their dynamic topological structures. The dynamic topological characteristics of spatio-temporal multilayer FCNs may offer new insights into physiological variations and pathological abnormalities in neuroscience.
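The rich-club metrics proposed here build on the classical rich-club coefficient, which for an undirected binary network is the edge density among nodes of degree greater than k. A minimal sketch on a toy adjacency matrix (not a brain network):

```python
def rich_club(adj, k):
    """Classical rich-club coefficient phi(k): edge density of the subgraph
    induced by nodes whose degree exceeds k (undirected binary network)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    rich = [i for i in range(n) if deg[i] > k]
    r = len(rich)
    if r < 2:
        return 0.0                      # coefficient undefined; report 0 here
    edges = sum(adj[i][j] for i in rich for j in rich if i < j)
    return 2 * edges / (r * (r - 1))

# toy 5-node network: a triangle (0,1,2) with two pendant nodes (3,4)
adj = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
```

Tracking how the identity of the `rich` node set changes across sliding time windows is the kind of dynamic the abstract's four scales parameterize.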

8.
Neumol. pediátr. (En línea) ; 19(2): 41-45, jun. 2024. tab
Article in Spanish | LILACS | ID: biblio-1566983

ABSTRACT

One of the most common difficulties faced by readers of articles in biomedicine and epidemiology is the interpretation of the term "significant". The term "statistically significant" is often misinterpreted as a "clinically significant" result. The confusion arises because many people equate "significant" with its literal meaning of "important"; however, statistical significance quantifies the probability that the results of a study are due to chance, while clinical significance reflects the practical importance or relevance in the context of health care or clinical practice. This article addresses the difference between statistical significance and clinical relevance or importance in the interpretation of biomedical research results.
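The distinction drawn above can be demonstrated numerically: with a large enough sample, a mean difference far below any clinical threshold still reaches p < 0.05. A sketch with fabricated data (SciPy assumed available):

```python
from scipy import stats

# fabricated data: with n = 1000 per group, a +0.1-unit shift -- far below
# any plausible clinical threshold -- is still statistically significant
spread = [-1.0, -0.5, 0.0, 0.5, 1.0] * 200
control = [100.0 + v for v in spread]
treatment = [100.1 + v for v in spread]

t_stat, p_value = stats.ttest_ind(treatment, control)
mean_diff = sum(treatment) / len(treatment) - sum(control) / len(control)
```

The p-value says the 0.1-unit difference is unlikely to be chance; it says nothing about whether 0.1 units matters to a patient.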


Subject(s)
Data Interpretation, Statistical , Biomedical Research
9.
Creat Nurs ; 30(2): 100-102, 2024 May.
Article in English | MEDLINE | ID: mdl-38679581

ABSTRACT

This article traces the development of Creative Nursing from its origin in 1981 as a newsletter about Primary Nursing to its current position as a quarterly international, interdisciplinary, peer-reviewed, indexed, themed journal that continues to nurture novice authors, welcome international submissions, review articles that other journals won't consider, and address subjects that many journals avoid. Future directions include content in multiple languages, new author guidelines that invite submissions of research methods papers, moving beyond statistical significance based on p-value thresholds, asking authors to make explicit the implications for knowledge translation in their papers, and thinking creatively about how artificial intelligence can be leveraged for research, education, and practice.


Subject(s)
Creativity , Humans , History, 21st Century , Periodicals as Topic , History, 20th Century , Forecasting
10.
Comput Med Imaging Graph ; 115: 102379, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38608333

ABSTRACT

Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, which is the presence of similar or repetitive information, can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Also, the common use of augmentation methods to generate variety in DL training could limit performance when indiscriminately applied to such data. We hypothesize that semantic redundancy would therefore tend to lower performance and limit generalizability to unseen data, and we question its impact on classifier performance even with large data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data and demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.
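The entropy-based sample scoring idea can be sketched as follows. This is a simplified stand-in for the paper's method: it scores each training sample by the Shannon entropy of a model's predicted class probabilities and keeps the most informative fraction; the sample records and threshold are invented:

```python
import math

def entropy_score(probs):
    """Shannon entropy of a model's predicted class distribution for one
    sample; low entropy = already well represented, hence likely redundant."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prune_redundant(samples, keep_fraction):
    """Keep only the most informative (highest-entropy) fraction."""
    ranked = sorted(samples, key=lambda s: entropy_score(s["probs"]), reverse=True)
    return ranked[:max(1, int(len(ranked) * keep_fraction))]

# invented records: predicted probabilities from some preliminary classifier
data = [
    {"id": "a", "probs": [0.99, 0.01]},   # confidently handled: redundant
    {"id": "b", "probs": [0.55, 0.45]},   # near the boundary: informative
    {"id": "c", "probs": [0.90, 0.10]},
    {"id": "d", "probs": [0.50, 0.50]},   # maximum entropy
]
kept = prune_redundant(data, 0.5)
```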


Subject(s)
Deep Learning , Radiography, Thoracic , Semantics , Humans
11.
Sci Total Environ ; 928: 172427, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38614337

ABSTRACT

This research analyzed the real-world NOx and particle number (PN) emissions of 21 China VI heavy-duty diesel trucks (HDDTs). On-road emission conformity was first evaluated with a portable emission measurement system (PEMS). Only 76.19 %, 71.43 % and 61.90 % of the vehicles passed the NOx test, the PN test and both tests, respectively. The impacts of vehicle features including exhaust gas recirculation (EGR) equipment, mileage and tractive tonnage were then assessed. Results demonstrated that EGR helped reduce NOx emission factors (EFs) while increasing PN EFs. Larger mileages and tractive tonnages corresponded to higher NOx and PN EFs, respectively. In-depth analyses of the influences of operating conditions on emissions were conducted with both numerical comparisons and statistical tests. Results showed that HDDTs generally generated higher NOx EFs under low speeds or large vehicle specific powers (VSPs), and higher PN EFs under high speeds or small VSPs. In addition, unqualified vehicles generated significantly higher NOx EFs than qualified vehicles on freeways or at speeds ≥ 40 km/h, while significantly higher PN EFs were generated by unqualified vehicles on suburban roads, on freeways, or under operating modes with positive VSPs. The reliability and accuracy of on-board diagnostic (OBD) NOx data were finally investigated. Results revealed that 43 % of the test vehicles did not report reliable OBD data. Correlation analyses between OBD NOx and PEMS measurements further demonstrated that the consistency of instantaneous concentrations was generally low. However, sliding-window averaged concentrations showed better correlations, e.g., the Pearson correlation coefficients on 20 s-window averaged concentrations exceeded 0.85 for most vehicles.
The research results provide valuable insights into emission regulation, e.g., focusing more on medium- to high-speed operations to identify unqualified vehicles, setting higher standards to improve the quality of OBD data, and adopting window averaged OBD NOx concentrations in evaluating vehicle emission performance.
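The sliding-window comparison described above can be sketched with synthetic signals: instantaneous agreement between a noisy sensor and a reference is mediocre, while window-averaged traces correlate almost perfectly. The signals and window size are invented, not the study's data:

```python
def window_average(series, w):
    """Sliding-window mean: one averaged value per window position."""
    return [sum(series[i:i + w]) / w for i in range(len(series) - w + 1)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# synthetic signals: a reference trace and a sensor with alternating noise
pems = [float(i) for i in range(100)]
obd = [v + (5.0 if i % 2 else -5.0) for i, v in enumerate(pems)]

r_inst = pearson(obd, pems)                                         # instantaneous
r_win = pearson(window_average(obd, 20), window_average(pems, 20))  # 20-sample windows
```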

12.
Podium (Pinar Río) ; 19(1)abr. 2024.
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1550622

ABSTRACT


The present study constitutes an important work in the area of physical condition research and represents the result of studies carried out in the Republic of Cuba and the United Mexican States at the request of both countries. It was statistically designed to represent official, highly reliable data, with the objective of determining the state of physical fitness in the two nations and thereby assessing the effect of the Physical Education programs in place. Sports and physical culture organizations supported the design of the studies, which were carefully handled in the sample design by a team of specialist statisticians responsible for processing the information. The data from this study were considered restricted for publication and are released now that they have been declassified. The same methodologies were applied in both countries, yielding valuable information for improving plans and programs in the field of the Bachelor's Degree in Physical Culture; the comparison invites reflection by Physical Education specialists so as to continue improving these specialties in general.

13.
J Arthroplasty ; 39(7): 1882-1887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38309638

ABSTRACT

BACKGROUND: Fragility analysis is a method of further characterizing outcomes in terms of the stability of statistical findings. This study assesses the statistical fragility of recent randomized controlled trials (RCTs) evaluating robotic-assisted versus conventional total knee arthroplasty (RA-TKA versus C-TKA). METHODS: We queried PubMed for RCTs comparing alignment, function, and outcomes between RA-TKA and C-TKA. Fragility index (FI) and reverse fragility index (RFI) (collectively, "FI") were calculated for dichotomous outcomes as the number of outcome reversals needed to change statistical significance. Fragility quotient (FQ) was calculated by dividing the FI by the sample size for that outcome event. Median FI and FQ were calculated for all outcomes collectively as well as for each individual outcome. Subanalyses were performed to assess FI and FQ based on outcome event type and statistical significance, as well as study loss to follow-up and year of publication. RESULTS: The overall median FI was 3.0 (interquartile range, [IQR] 1.0 to 6.3) and the median reverse fragility index was 3.0 (IQR 2.0 to 4.0). The overall median FQ was 0.027 (IQR 0.012 to 0.050). Loss to follow-up was greater than FI for 23 of the 38 outcomes assessed. CONCLUSIONS: A small number of alternative outcomes is often enough to reverse the statistical significance of findings in RCTs evaluating dichotomous outcomes in RA-TKA versus C-TKA. We recommend reporting FI and FQ alongside P values to improve the interpretability of RCT results.
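A fragility index for a dichotomous outcome can be sketched as below: starting from a significant 2x2 result, outcomes are reversed in the group with fewer events until a Fisher exact test loses significance. This is a simplified sketch (reversals modeled as added events in one group), with invented counts; it assumes SciPy:

```python
from scipy.stats import fisher_exact

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Simplified fragility index: number of added events in the group with
    fewer events needed to push a significant Fisher exact test above alpha."""
    _, p = fisher_exact([[e1, n1 - e1], [e2, n2 - e2]])
    if p >= alpha:
        return 0                       # not significant to begin with
    fi = 0
    while p < alpha:
        if e1 / n1 < e2 / n2:          # reverse an outcome in the sparser group
            e1 += 1
        else:
            e2 += 1
        fi += 1
        _, p = fisher_exact([[e1, n1 - e1], [e2, n2 - e2]])
    return fi

fi = fragility_index(1, 50, 10, 50)    # invented trial counts
```

Dividing `fi` by the total sample size gives the fragility quotient discussed in the abstract.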


Subject(s)
Arthroplasty, Replacement, Knee , Randomized Controlled Trials as Topic , Robotic Surgical Procedures , Arthroplasty, Replacement, Knee/methods , Humans , Robotic Surgical Procedures/methods , Treatment Outcome , Cross-Sectional Studies , Knee Joint/surgery
14.
Clin Interv Aging ; 19: 277-287, 2024.
Article in English | MEDLINE | ID: mdl-38380229

ABSTRACT

Null hypothesis significance testing (NHST) is the dominant statistical approach in the geriatric and rehabilitation fields. However, NHST is routinely misunderstood or misused: findings from clinical trials may be taken as evidence of no effect when, in fact, a clinically relevant question may simply have a "non-significant" p-value. Conversely, findings are considered clinically relevant whenever significant differences are observed between groups. Because the p-value is not an exclusive indicator of an association or of the existence of an effect, researchers should be encouraged to report other statistical approaches such as Bayesian analysis and complementary statistical tools alongside the p-value (eg, effect size, confidence intervals, minimal clinically important difference, and magnitude-based inference) to improve the interpretation of clinical trial findings by presenting a more efficient and comprehensive analysis. The focus on Bayesian analysis and secondary statistical analyses does not mean that NHST is less important; rather, to observe a real intervention effect, researchers should use a combination of secondary statistical analyses in conjunction with NHST or Bayesian statistical analysis to reveal what p-values cannot show in geriatric and rehabilitation studies (eg, the clinical importance of a 1-kg increase in handgrip strength in the intervention group of long-lived older adults compared to a control group). This paper provides insights for improving the interpretation of scientific data in the rehabilitation and geriatric fields by utilizing Bayesian and secondary statistical analyses to better scrutinize the results of clinical trials in which a p-value alone may not be appropriate to determine the efficacy of an intervention.
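The kind of combined report advocated above (p-value plus effect size, confidence interval, and a check against a minimal clinically important difference) can be sketched as follows; the handgrip-strength values and the 1-kg MCID are illustrative (SciPy assumed):

```python
import math
from scipy import stats

def report(x, y, mcid):
    """p-value plus complementary quantities: mean difference, 95% CI,
    Cohen's d, and whether the CI lower bound clears the MCID."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    diff = mx - my
    p = 2 * stats.t.sf(abs(diff / se), df)                 # two-sided t-test
    half = stats.t.ppf(0.975, df) * se                     # 95% CI half-width
    return {"p": p, "diff": diff, "ci": (diff - half, diff + half),
            "d": diff / math.sqrt(sp2),
            "exceeds_mcid": diff - half > mcid}

# illustrative handgrip-strength values (kg); MCID assumed to be 1 kg
control = [20.0, 21.0, 22.0, 23.0, 24.0] * 4
intervention = [21.5, 22.5, 23.5, 24.5, 25.5] * 4
r = report(intervention, control, mcid=1.0)
# significant p, yet the CI cannot rule out a sub-MCID effect
```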


Subject(s)
Hand Strength , Research Design , Humans , Aged , Bayes Theorem , Data Interpretation, Statistical
15.
Am J Sports Med ; 52(10): 2667-2675, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38258495

ABSTRACT

BACKGROUND: Evidence-based care relies on robust research. The fragility index (FI) is used to assess the robustness of statistically significant findings in randomized controlled trials (RCTs). While the traditional FI is limited to dichotomous outcomes, a novel tool, the continuous fragility index (CFI), allows for the assessment of the robustness of continuous outcomes. PURPOSE: To calculate the CFI of statistically significant continuous outcomes in RCTs evaluating interventions for managing anterior shoulder instability (ASI). STUDY DESIGN: Meta-analysis; Level of evidence, 2. METHODS: A search was conducted across the MEDLINE, Embase, and CENTRAL databases for RCTs assessing management strategies for ASI from inception to October 6, 2022. Studies that reported a statistically significant difference between study groups in ≥1 continuous outcome were included. The CFI was calculated and applied to all available RCTs reporting interventions for ASI. Multivariable linear regression was performed between the CFI and various study characteristics as predictors. RESULTS: There were 27 RCTs, with a total of 1846 shoulders, included. The median sample size was 61 shoulders (IQR, 43). The median CFI across the 27 RCTs was 8.2 (IQR, 17.2; 95% CI, 3.6-15.4). The median CFI was 7.9 (IQR, 21; 95% CI, 1-22) for 11 studies comparing surgical methods, 22.6 (IQR, 16; 95% CI, 8.2-30.4) for 6 studies comparing nonsurgical reduction interventions, 2.8 for 3 studies comparing immobilization methods, and 2.4 for 3 studies comparing surgical versus nonsurgical interventions. Notably, 22 of 57 included outcomes (38.6%) from studies with completed follow-up data had a loss to follow-up exceeding their CFI. Multivariable regression demonstrated a statistically significant positive correlation between a trial's sample size and the CFI of its outcomes (r = 0.23 [95% CI, 0.13-0.33]; P < .001).
CONCLUSION: More than a third of continuous outcomes in ASI trials had a CFI less than the reported loss to follow-up. This carries the significant risk of reversing trial findings and should be considered when evaluating available RCT data. We recommend including the FI, CFI, and loss to follow-up in the abstracts of future RCTs.


Subject(s)
Joint Instability , Randomized Controlled Trials as Topic , Humans , Joint Instability/surgery , Joint Instability/therapy , Shoulder Joint/surgery , Shoulder Joint/physiopathology
17.
ArXiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37986725

ABSTRACT

Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. Another data attribute is the inherent variety. It follows, therefore, that semantic redundancy, which is the presence of similar or repetitive information, would tend to lower performance and limit generalizability to unseen data. In medical imaging data, semantic redundancy can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Further, the common use of augmentation methods to generate variety in DL training may be limiting performance when applied to semantically redundant data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data. We demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.

18.
MAGMA ; 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37989921

ABSTRACT

OBJECTIVE: This study aims to assess the statistical significance of training parameters in 240 dense UNets (DUNets) used for enhancing low Signal-to-Noise Ratio (SNR) and undersampled MRI in various acquisition protocols. The objective is to determine the validity of differences between different DUNet configurations and their impact on image quality metrics. MATERIALS AND METHODS: To achieve this, we trained all DUNets using the same learning rate and number of epochs, with variations in 5 acquisition protocols, 24 loss function weightings, and 2 ground truths. We calculated evaluation metrics for two metric regions of interest (ROI). We employed both Analysis of Variance (ANOVA) and Mixed Effects Model (MEM) to assess the statistical significance of the independent parameters, aiming to compare their efficacy in revealing differences and interactions among fixed parameters. RESULTS: ANOVA analysis showed that, except for the acquisition protocol, fixed variables were statistically insignificant. In contrast, MEM analysis revealed that all fixed parameters and their interactions held statistical significance. This emphasizes the need for advanced statistical analysis in comparative studies, where MEM can uncover finer distinctions often overlooked by ANOVA. DISCUSSION: These findings highlight the importance of utilizing appropriate statistical analysis when comparing different deep learning models. Additionally, the surprising effectiveness of the UNet architecture in enhancing various acquisition protocols underscores the potential for developing improved methods for characterizing and training deep learning models. This study serves as a stepping stone toward enhancing the transparency and comparability of deep learning techniques for medical imaging applications.

19.
Injury ; 54 Suppl 5: 110764, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37923502

ABSTRACT

Clinical relevance and statistical significance are different concepts, linked via the sample size calculation. Threshold values for detecting a minimal important change over time are frequently (mis)interpreted as a threshold for the clinical relevance of a difference between groups. The magnitude of a difference between groups that is considered clinically relevant directly impacts the sample size calculation, and thereby the statistical significance in clinical study outcomes. Especially in non-inferiority trials the threshold for clinical relevance, i.e. the predefined margin for non-inferiority, is a crucial choice. A truly inferior treatment will be accepted as non-inferior when this margin is chosen too large. The magnitude of a clinically relevant difference between groups should be carefully considered, by determining the smallest effect for each specific study that is considered worthwhile. This means taking into account the (dis)advantages of both study interventions in terms of benefits, harms, costs, and potential side effects. This article clarifies common sources of confusion, illustrates the implications for clinical research with an example and provides specific suggestions to improve the design and interpretation of clinical research.
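The role of the non-inferiority margin can be sketched numerically: the same data are declared non-inferior under a generous margin and not under a stricter one. A normal-approximation sketch on a risk difference, with invented event counts:

```python
def non_inferior(events_new, n_new, events_std, n_std, margin, z=1.96):
    """Normal-approximation non-inferiority check on a risk difference:
    non-inferior if the upper confidence bound of (risk_new - risk_std)
    stays below the pre-specified margin (events counted as failures)."""
    p1, p2 = events_new / n_new, events_std / n_std
    se = (p1 * (1 - p1) / n_new + p2 * (1 - p2) / n_std) ** 0.5
    upper = (p1 - p2) + z * se        # upper bound of the excess failure risk
    return upper < margin, upper

# invented failure counts: 12/100 on the new treatment vs 10/100 standard
ok_generous, ub = non_inferior(12, 100, 10, 100, margin=0.15)  # "non-inferior"
ok_strict, _ = non_inferior(12, 100, 10, 100, margin=0.05)     # not shown
```

With a margin of 0.15 the new treatment is accepted even though its excess failure risk could plausibly exceed 10 percentage points; this is exactly the too-large-margin hazard the abstract describes.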


Subject(s)
Clinical Relevance , Research Design , Humans
20.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37930023

ABSTRACT

Local associations refer to spatial-temporal correlations that emerge from the biological realm, such as time-dependent gene co-expression or seasonal interactions between microbes. One can reveal the intricate dynamics and inherent interactions of biological systems by examining the biological time series data for these associations. To accomplish this goal, local similarity analysis algorithms and statistical methods that facilitate the local alignment of time series and assess the significance of the resulting alignments have been developed. Although these algorithms were initially devised for gene expression analysis from microarrays, they have been adapted and accelerated for multi-omics next generation sequencing datasets, achieving high scientific impact. In this review, we present an overview of the historical developments and recent advances for local similarity analysis algorithms, their statistical properties, and real applications in analyzing biological time series data. The benchmark data and analysis scripts used in this review are freely available at http://github.com/labxscut/lsareview.
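At its core, local similarity analysis uses a Smith-Waterman-style dynamic program that accumulates pointwise products of standardized series and resets at zero, keeping the best local run. A minimal zero-delay sketch (real LSA also scans over time shifts and assesses the significance of the alignment, both omitted here):

```python
def local_similarity(x, y):
    """Zero-delay local similarity score of two equal-length, standardized
    series: accumulate pointwise products, reset at zero (Smith-Waterman
    style), and keep the best run over both association signs."""
    best = 0.0
    for sign in (1.0, -1.0):        # positive and negative local associations
        run = 0.0
        for a, b in zip(x, y):
            run = max(0.0, run + sign * a * b)
            best = max(best, run)
    return best

# toy series: anti-correlated early on, strongly co-varying in the middle
x = [1, -1, 1, -1, 2, 2, 2, -1]
y = [-1, 1, -1, 1, 2, 2, 2, 1]
score = local_similarity(x, y)
```

The score is driven by the strongly co-varying middle stretch; the anti-correlated prefix is absorbed by the reset at zero.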


Subject(s)
Algorithms , Gene Expression Profiling , Time Factors , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Benchmarking