Results 1 - 20 of 470
1.
J Comput Graph Stat ; 33(2): 463-476, 2024.
Article in English | MEDLINE | ID: mdl-39211031

ABSTRACT

In modern data science, the higher criticism (HC) method is effective for detecting rare and weak signals. Its computation, however, has long been an issue when the number of combined p-values (K) and/or the number of repeated HC tests (N) is large. Some computing methods have been developed, but they all have significant shortcomings, especially when a stringent significance level is required. In this paper, we propose an accurate and highly efficient computing strategy for four variations of HC. Specifically, we propose an unbiased cross-entropy-based importance sampling method (IS_CE) to benchmark all existing computing methods, and develop a modified SetTest method (MST) that resolves numerical issues of the existing SetTest approach. We further develop an ultra-fast approach (UFI) combining pre-calculated statistical tables and cubic spline interpolation. Finally, following extensive simulations, we provide a computing strategy integrating MST, UFI and other existing methods, implemented in the R package "HCp", for virtually any K and for p-values as small as ~10^-20. The method is applied to a COVID-19 disease surveillance example for spatio-temporal outbreak detection from case numbers over 804 days in 3,342 counties in the United States. Results confirm the viability of the computing strategy for large-scale inference. Supplementary materials for this article are available online.
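To make the computational problem concrete, here is a minimal sketch assuming the common Donoho-Jin form of the HC statistic; the paper covers four HC variants, which may differ in normalization and in the range of indices maximized over. The p-value here comes from naive Monte Carlo under the global null, i.e. the brute-force approach that breaks down at stringent levels. It is not IS_CE, MST or UFI, and the function names are hypothetical.

```python
import math
import random


def higher_criticism(pvalues):
    """HC = max_i sqrt(K) * (i/K - p_(i)) / sqrt(p_(i) * (1 - p_(i))).

    One common form; variants restrict i (e.g. to p_(i) >= 1/K) or change
    the denominator.
    """
    k = len(pvalues)
    best = -math.inf
    for i, p in enumerate(sorted(pvalues), start=1):
        if 0.0 < p < 1.0:  # the denominator vanishes at p = 0 or p = 1
            z = math.sqrt(k) * (i / k - p) / math.sqrt(p * (1.0 - p))
            best = max(best, z)
    return best


def hc_pvalue_monte_carlo(hc_obs, k, n_sim=10_000, seed=0):
    """Naive Monte Carlo estimate of P(HC >= hc_obs) under the global null,
    i.e. K independent Uniform(0, 1) p-values. The +1 terms keep the
    estimate strictly positive."""
    rng = random.Random(seed)
    hits = sum(
        higher_criticism([rng.random() for _ in range(k)]) >= hc_obs
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


# Usage: one very small p-value among otherwise unremarkable ones.
pvals = [1e-6, 0.3, 0.5, 0.7, 0.9]
hc = higher_criticism(pvals)
print(hc, hc_pvalue_monte_carlo(hc, k=len(pvals), n_sim=2000))
```

Naive simulation needs on the order of 1/p draws to resolve a tail probability p, so a target near 10^-20 is out of reach this way, which is the gap that importance sampling and pre-computed tables are meant to close.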

2.
Cureus ; 16(5): e61457, 2024 May.
Article in English | MEDLINE | ID: mdl-38953092

ABSTRACT

This study investigates the effectiveness of multiple COVID-19 vaccine doses on daily confirmed cases in Seoul. Using comprehensive data on vaccinated individuals and confirmed cases from the official website of the Korean Ministry of the Interior and Safety, we conducted detailed statistical analyses to assess the impact of each vaccine dose. The study covers data from April 21, 2021, to September 29, 2022. Multiple linear regression was used to analyze the relationship between daily confirmed cases (positive PCR test results) and multiple vaccine doses, with p-values as the criterion for determining the effectiveness of each dose. The analysis included data from four vaccine doses. It reveals that the first, second, and third doses of the COVID-19 vaccines have a statistically significant positive effect associated with daily confirmed cases. However, the fourth dose does not show a statistically significant impact on the reduction of daily confirmed cases. This suggests that while the initial three doses are crucial for establishing and maintaining high levels of immunity, the incremental benefit of subsequent doses may diminish.
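A minimal sketch of this kind of analysis: ordinary least squares of daily cases on several predictors, with a Wald test per coefficient. Assumptions: the predictor layout is hypothetical (for example, one row per day holding counts for doses one to four), and p-values use a normal approximation to the t distribution, which is close for several hundred daily observations but not exact. It also treats days as independent, which daily case series usually are not.

```python
import math
from statistics import NormalDist


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_with_pvalues(X, y):
    """OLS of y on the columns of X (an intercept is prepended).

    Returns (coefficient, z statistic, two-sided p-value) per term; the
    p-value uses a normal approximation to the t distribution.
    """
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance
    out = []
    for j in range(p):
        e_j = [1.0 if i == j else 0.0 for i in range(p)]
        se = math.sqrt(sigma2 * _solve(xtx, e_j)[j])  # sqrt(sigma^2 [(X'X)^-1]_jj)
        z = beta[j] / se
        out.append((beta[j], z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))))
    return out
```

A statistics package would normally be used instead; the point is only that each dose's "effectiveness" in such a design is read off a per-coefficient test.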

3.
Pharm Stat ; 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38992926

ABSTRACT

Clinical trials with continuous primary endpoints typically measure outcomes at baseline, at a fixed timepoint (denoted Tmin), and at intermediate timepoints. The analysis is commonly performed using a mixed model for repeated measures. It is sometimes expected that the effect size will be larger with follow-up longer than Tmin, but extending the follow-up for all patients delays trial completion. We propose an alternative trial design and analysis method that potentially increases statistical power without extending the trial duration or increasing the sample size. We propose following the last enrolled patient until Tmin, with earlier enrollees having variable follow-up durations up to a maximum of Tmax. The sample size at Tmax will be smaller than at Tmin, and because of staggered enrollment, data missing at Tmax will be missing completely at random. For analysis, we propose an alpha-adjusted procedure based on the smaller of the p-values at Tmin and Tmax, termed minP. This approach can provide the highest power when the powers at Tmin and Tmax are similar. If the powers at Tmin and Tmax differ substantially, the power of minP is modestly reduced compared with the larger of the two powers. Rare disease trials, given the limited size of their patient populations, may benefit most from this design.
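One way to calibrate an alpha-adjusted minP rule, sketched under stated assumptions: both analyses are one-sided z-tests, and their test statistics are standard bivariate normal under the null with a known correlation rho. In the proposed design rho would depend on the overlap between the Tmin and Tmax samples; how the authors obtain it is not described here, so rho is simply an input. This is a Monte Carlo illustration, not the authors' procedure.

```python
import math
import random
from statistics import NormalDist


def minp_adjusted_alpha(alpha, rho, n_sim=100_000, seed=0):
    """Per-comparison threshold alpha* with P(min(p1, p2) <= alpha*) ~= alpha
    under H0, for two one-sided z-tests whose statistics are standard
    bivariate normal with correlation rho (taken as known)."""
    rng = random.Random(seed)
    max_z = []
    for _ in range(n_sim):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        max_z.append(max(z1, z2))  # smallest p-value <-> largest z
    max_z.sort()
    q = max_z[int((1.0 - alpha) * n_sim)]  # (1 - alpha) quantile of max(Z1, Z2)
    return 1.0 - NormalDist().cdf(q)


print(minp_adjusted_alpha(0.05, rho=0.0))  # close to the Sidak value 1 - 0.95**0.5
print(minp_adjusted_alpha(0.05, rho=0.6))
```

With rho = 0 the threshold is the Sidak value, about 0.0253 for alpha = 0.05; as rho approaches 1 it approaches alpha itself, which is why a minP rule loses little when the two analyses are highly correlated.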

4.
Rev. neurol. (Ed. impr.) ; 78(7): 209-211, Jan-Jun, 2024.
Article in Spanish | IBECS | ID: ibc-232183

ABSTRACT

Leading scientific journals in fields such as medicine, biology and sociology repeatedly publish articles and editorials claiming that a large percentage of doctors do not understand the basics of statistical analysis, which increases the risk of errors in interpreting data, makes them more vulnerable to misinformation and reduces the effectiveness of research. This problem extends throughout their careers and is largely due to the poor training they receive in statistics, a problem that is common in developed countries. As stated by H. Haller and S. Krauss, ‘90% of German university lecturers who regularly use the p-value in tests do not understand what that value actually measures’. It is important to note that the basic reasoning of statistical analysis is similar to what we do in our daily lives and that understanding the basic concepts of statistical analysis does not require any knowledge of mathematics. Contrary to what many researchers believe, the p-value of a test is not a ‘mathematical index’ that allows us to conclude clearly whether, for example, a drug is more effective than a placebo. The p-value of a test is simply a percentage. (AU)


Subject(s)
Humans , Male , Female , Biomedical Research , Periodical , Scientific and Technical Publications , Hypothesis Testing , Predictive Value of Tests
5.
Int J Mol Sci ; 25(12)2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38928295

ABSTRACT

Genomic analyses of pediatric acute lymphoblastic leukemia (ALL) subtypes, particularly T-cell and B-cell lineages, have been pivotal in identifying potential therapeutic targets. Typical genomic analyses have directed attention toward the most commonly mutated genes. However, assessing the contribution of mutations to cancer phenotypes is crucial. Therefore, we estimated the cancer effects (scaled selection coefficients) for somatic substitutions in T-cell and B-cell cohorts, revealing key insights into mutation contributions. Cancer effects for well-known, frequently mutated genes like NRAS and KRAS in B-ALL were high, which underscores their importance as therapeutic targets. However, the less frequently mutated genes IL7R, XBP1, and TOX also demonstrated high cancer effects, suggesting pivotal roles in the development of leukemia when present. In T-ALL, KRAS and NRAS are less frequently mutated than in B-ALL; however, their cancer effects when present are high in both subtypes. Mutations in PIK3R1 and RPL10 were not highly prevalent, yet exhibited some of the highest cancer effects in individual T-cell ALL patients. Even CDKN2A, with a low prevalence and relatively modest cancer effect, is potentially highly relevant for the epistatic effects that its mutated form exerts on other mutations. Prioritizing investigation into these moderately frequent but potentially high-impact targets not only presents novel personalized therapeutic opportunities but also enhances the understanding of disease mechanisms and advances precision therapeutics for pediatric ALL.


Subject(s)
Mutation , Humans , Child , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/epidemiology , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , B-Lymphocytes/immunology , B-Lymphocytes/metabolism
6.
J Pers Med ; 14(6)2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38929876

ABSTRACT

BACKGROUND/OBJECTIVES: Temporomandibular disorder (TMD) is the term used to describe a pathology (dysfunction and pain) of the masticatory muscles and temporomandibular joint (TMJ). There is an apparent upward trend in the publication of dental research and a need to continually improve the quality of research. Therefore, this study was conducted to analyse the use of sample size and effect size calculations in TMD randomised controlled trials. METHODS: The search period was restricted to five full years, i.e., papers published in 2019, 2020, 2021, 2022, and 2023. The article-type filter "Randomized Controlled Trial" was applied. Studies were graded on a two-level scale (0 or 1) for sample size (SS) and for effect size (ES), where 1 means that the calculation was performed. RESULTS: In the entire study sample, SS was used in 58% of studies, while ES was used in 15% of studies. CONCLUSIONS: Quality should improve as research increases. One factor that influences quality is the level of statistics. SS and ES calculations provide a basis for understanding the results obtained by the authors. Access to formulas, online calculators and software facilitates these analyses. High-quality trials provide a solid foundation for medical progress, fostering the development of personalized therapies that provide more precise and effective treatment and increase patients' chances of recovery. Improving the quality of TMD research, and medical research in general, helps to increase public confidence in medical advances and raises the standard of patient care.
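As an example of the formulas mentioned, a minimal sketch of the textbook normal-approximation sample size per group for comparing two means with a two-sided test, where d is the standardized effect size (Cohen's d). This is a generic illustration chosen here, not a method taken from the reviewed trials; t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided, two-sample
    comparison of means with standardized effect size d (Cohen's d):
    n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up."""
    z = NormalDist().inv_cdf
    n = 2.0 * (z(1.0 - alpha / 2.0) + z(power)) ** 2 / d ** 2
    return math.ceil(n)  # round up to whole patients


print(n_per_group(0.5))  # medium effect
print(n_per_group(0.2))  # small effect
```

Because n scales with 1/d^2, halving the assumed effect size roughly quadruples the required sample, which is why reporting the assumed ES alongside the SS calculation matters.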

7.
Korean J Anesthesiol ; 77(3): 316-325, 2024 06.
Article in English | MEDLINE | ID: mdl-38835136

ABSTRACT

The statistical significance of a clinical trial analysis result is determined by a mathematical calculation and probability based on null hypothesis significance testing. However, statistical significance does not always align with meaningful clinical effects; thus, assigning clinical relevance to statistical significance is unreasonable. A statistical result incorporating a clinically meaningful difference is a better approach to present statistical significance. Thus, the minimal clinically important difference (MCID), which requires integrating minimum clinically relevant changes from the early stages of research design, has been introduced. As a follow-up to the previous statistical round article on P values, confidence intervals, and effect sizes, in this article, we present hands-on examples of MCID and various effect sizes and discuss the terms statistical significance and clinical relevance, including cautions regarding their use.


Subject(s)
Minimal Clinically Important Difference , Humans , Probability , Research Design , Clinical Trials as Topic/methods , Data Interpretation, Statistical , Confidence Intervals
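A small sketch contrasting a standardized effect size with an MCID check, assuming a continuous outcome whose MCID is prespecified in the outcome's own units. The data and the MCID value in the usage lines are hypothetical; the article's own worked examples are not reproduced here.

```python
import math
import statistics


def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = statistics.stdev(treated), statistics.stdev(control)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (statistics.mean(treated) - statistics.mean(control)) / pooled


def exceeds_mcid(treated, control, mcid):
    """Whether the raw mean difference reaches a prespecified MCID (same units)."""
    return statistics.mean(treated) - statistics.mean(control) >= mcid


treated = [5, 6, 7, 8, 9]   # hypothetical scores
control = [4, 5, 6, 7, 8]
print(cohens_d(treated, control), exceeds_mcid(treated, control, mcid=2.0))
```

Here d is about 0.63, a "medium" standardized effect, yet the 1-point raw difference falls short of the hypothetical 2-point MCID: the two quantities answer different questions.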
8.
J Evol Biol ; 37(8): 986-993, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38843076

ABSTRACT

Statistical analysis and data visualization are integral parts of science communication. One of the major issues in current data analysis practice is an overdependence on, and misuse of, p-values. Researchers have been advocating for the estimation and reporting of effect sizes for quantitative research to enhance the clarity and effectiveness of data analysis. Reporting effect sizes in scientific publications has until now been mainly limited to numeric tables, even though effect size plotting is a more effective means of communicating results. We have developed the Durga R package for estimating and plotting effect sizes for paired and unpaired group comparisons. Durga allows users to estimate unstandardized and standardized effect sizes and bootstrapped confidence intervals of the effect sizes. The central functionality of Durga is to combine effect size visualizations with traditional plotting methods. Durga is a powerful statistical and data visualization package that is easy to use, providing the flexibility to estimate effect sizes of paired and unpaired data using different statistical methods. Durga provides a wide range of options for plotting effect sizes, which allows users to plot data in the most informative and aesthetic way. Here, we introduce the package and its various functions. We further describe a workflow for estimating and plotting effect sizes using example data sets.


Subject(s)
Software , Data Interpretation, Statistical , Data Visualization
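Durga itself is an R package. As a language-neutral illustration of one quantity it reports, here is a percentile bootstrap confidence interval for an unstandardized mean difference between two unpaired groups. This is a generic sketch, not Durga's algorithm or API, and the percentile interval is only one of several bootstrap interval types.

```python
import random
import statistics


def bootstrap_mean_diff_ci(x, y, n_boot=5000, level=0.95, seed=0):
    """Point estimate and percentile bootstrap CI for mean(x) - mean(y),
    resampling each unpaired group independently with replacement."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        diffs.append(statistics.mean(bx) - statistics.mean(by))
    diffs.sort()
    lo = diffs[int((1 - level) / 2 * n_boot)]
    hi = diffs[int((1 + level) / 2 * n_boot) - 1]
    return statistics.mean(x) - statistics.mean(y), (lo, hi)


print(bootstrap_mean_diff_ci([5, 6, 7, 8, 9], [1, 2, 3, 4, 5]))
```

For paired data the resampling unit would be the pair (resample the within-pair differences) rather than each group separately.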
9.
J Biopharm Stat ; : 1-20, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38853696

ABSTRACT

The main idea of this paper is to approximate the exact p-value of a class of non-parametric two-sample location-scale tests. The best-known non-parametric two-sample location-scale tests are formulated as a class of linear rank tests, and the permutation distribution of this class is derived from a random allocation design. This allows the exact p-value of the tests in this class to be approximated with the saddlepoint approximation method. The proposed method approximates the exact p-value with high accuracy compared with the normal approximation. Moreover, it requires little computation and time, unlike the simulation method. The procedure is illustrated with four real data sets representing applications in a number of different fields. In addition, a simulation study compares the proposed method with traditional methods for approximating the exact p-value of this class of non-parametric two-sample location-scale tests.
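A minimal sketch of a saddlepoint p-value for a linear rank statistic T = sum of a_i * I_i, under one reading of a random allocation design (an assumption): each pooled observation joins the first sample independently with probability pi = 1/2. It uses the Lugannani-Rice formula with Daniels' second continuity correction for integer scores, and compares it with the exact tail (by convolution) and a normal approximation. The score vector is an input; Wilcoxon scores appear in the usage lines only for simplicity, and location-scale scores could be substituted. None of this is the authors' implementation.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def _logistic(x):  # numerically stable 1 / (1 + exp(-x))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softplus(x):  # numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cgf(s, scores, pi=0.5):
    """K(s), K'(s), K''(s) of T = sum(a_i * I_i), independent I_i ~ Bernoulli(pi)."""
    c = math.log(pi / (1.0 - pi))
    k0 = k1 = k2 = 0.0
    for a in scores:
        x = s * a + c
        p = _logistic(x)
        k0 += math.log(1.0 - pi) + _softplus(x)
        k1 += a * p
        k2 += a * a * p * (1.0 - p)
    return k0, k1, k2


def saddlepoint_upper_tail(t, scores, pi=0.5):
    """Lugannani-Rice approximation to P(T >= t) with Daniels' second continuity
    correction; assumes integer scores and min(T) < t <= max(T)."""
    tc = t - 0.5  # continuity-corrected cutoff on the integer lattice
    lo, hi = -1.0, 1.0
    while cgf(lo, scores, pi)[1] > tc:  # bracket the root of K'(s) = tc
        lo *= 2.0
    while cgf(hi, scores, pi)[1] < tc:
        hi *= 2.0
    for _ in range(100):  # bisection: K' is increasing in s
        mid = 0.5 * (lo + hi)
        if cgf(mid, scores, pi)[1] < tc:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    k0, k1, k2 = cgf(s, scores, pi)
    if abs(s) < 1e-6:  # singular at the mean; fall back to the normal
        return 1.0 - _STD.cdf((tc - k1) / math.sqrt(k2))
    w = math.copysign(math.sqrt(max(2.0 * (s * tc - k0), 0.0)), s)
    u = 2.0 * math.sinh(s / 2.0) * math.sqrt(k2)
    return 1.0 - _STD.cdf(w) + _STD.pdf(w) * (1.0 / u - 1.0 / w)


def exact_upper_tail(t, scores, pi=0.5):
    """Exact P(T >= t) under the same Bernoulli allocation model, by convolution."""
    dist = {0: 1.0}
    for a in scores:
        nxt = {}
        for total, prob in dist.items():
            nxt[total] = nxt.get(total, 0.0) + (1.0 - pi) * prob
            nxt[total + a] = nxt.get(total + a, 0.0) + pi * prob
        dist = nxt
    return sum(prob for total, prob in dist.items() if total >= t)


def normal_upper_tail(t, scores, pi=0.5):
    """Continuity-corrected normal approximation, for comparison."""
    mean = pi * sum(scores)
    sd = math.sqrt(pi * (1.0 - pi) * sum(a * a for a in scores))
    return 1.0 - _STD.cdf((t - 0.5 - mean) / sd)


scores = list(range(1, 13))  # Wilcoxon scores for 12 pooled observations
for f in (exact_upper_tail, saddlepoint_upper_tail, normal_upper_tail):
    print(f.__name__, f(70, scores))
```

The exact convolution is only feasible for small samples; the saddlepoint route needs a one-dimensional root search whatever the sample size, which is the computational advantage claimed over simulation.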

10.
Genome Med ; 16(1): 56, 2024 04 16.
Article in English | MEDLINE | ID: mdl-38627848

ABSTRACT

Despite the abundance of genotype-phenotype association studies, the resulting associations often lack robustness and interpretability. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. Applying the PheSeq model in three case studies on Alzheimer's disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer's disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings show moderate positive rates, high recall rates, and interpretability in gene-disease association studies.


Subject(s)
Alzheimer Disease , Breast Neoplasms , Deep Learning , Lung Neoplasms , Humans , Female , Alzheimer Disease/genetics , Bayes Theorem , Genetic Association Studies , Breast Neoplasms/genetics
11.
Proc Natl Acad Sci U S A ; 121(15): e2304671121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38564640

ABSTRACT

Contingency tables, data represented as count matrices, are ubiquitous across quantitative research and data-science applications. Existing statistical tests are insufficient, however, as none is simultaneously computationally efficient and statistically valid for a finite number of observations. In this work, motivated by a recent application in reference-free genomic inference [K. Chaung et al., Cell 186, 5440-5456 (2023)], we develop Optimized Adaptive Statistic for Inferring Structure (OASIS), a family of statistical tests for contingency tables. OASIS constructs a test statistic that is linear in the normalized data matrix, providing closed-form P-value bounds through classical concentration inequalities. In the process, OASIS provides a decomposition of the table, lending interpretability to its rejection of the null. We derive the asymptotic distribution of the OASIS test statistic, showing that these finite-sample bounds correctly characterize the test statistic's P-value up to a variance term. Experiments on genomic sequencing data highlight the power and interpretability of OASIS. Using OASIS, we develop a method that can detect SARS-CoV-2 and Mycobacterium tuberculosis strains de novo, which existing approaches cannot achieve. We demonstrate in simulations that OASIS is robust to overdispersion, a common feature in genomic data such as single-cell RNA sequencing, where under accepted noise models OASIS provides good control of the false discovery rate, while Pearson's χ² test consistently rejects the null. Additionally, we show in simulations that OASIS is more powerful than Pearson's χ² test in certain regimes, including for some important two-group alternatives, which we corroborate with approximate power calculations.


Subject(s)
Genome , Genomics , Chromosome Mapping
12.
Cureus ; 16(3): e56418, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38638715

ABSTRACT

Background: Organ and body development varies greatly in pediatric patients from year to year. Therefore, the incidence of each adverse event following phenobarbital (PB) administration would vary with age. However, increasing the sample size of pediatric patients in each age group has been challenging in clinical trials. Previous studies were therefore conducted by dividing pediatric patients into three or four age groups based on developmental stage. Although these results were useful in clinical settings, information on adverse events at one-year age increments in pediatric patients could further enhance treatment and care. Objectives: This study investigated, in one-year age increments, the occurrence tendency of each adverse event following PB administration in pediatric patients. Methods: This study used data obtained from the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS). Two inclusion criteria were set: (1) treatment with PB between January 2004 and June 2023 and (2) age 0-15 years. Changes in the occurrence tendency of each adverse event were explored in one-year age increments using cutoff values obtained by the minimum p-value approach with the Wilcoxon-Mann-Whitney test. When the minimum p-value was <0.05, the corresponding age was taken as the cutoff value; when it was ≥0.05, no cutoff value was considered to exist. Results: All types of adverse events were investigated, and a cutoff value was explored for each. We identified 34, 16, 15, nine, five, five, eight, three, and eight types of adverse events with cutoff values of ≤3/>3, ≤4/>4, ≤5/>5, ≤6/>6, ≤7/>7, ≤8/>8, ≤9/>9, ≤10/>10, and ≤11/>11 years, respectively. Conclusions: This study demonstrated that the adverse events requiring attention in pediatric patients varied with age. The findings help improve treatment and care in pediatric clinical settings.

13.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38536747

ABSTRACT

We develop a method for hybrid analyses that uses external controls to augment internal control arms in randomized controlled trials (RCTs) where the degree of borrowing is determined based on similarity between RCT and external control patients to account for systematic differences (e.g., unmeasured confounders). The method represents a novel extension of the power prior where discounting weights are computed separately for each external control based on compatibility with the randomized control data. The discounting weights are determined using the predictive distribution for the external controls derived via the posterior distribution for time-to-event parameters estimated from the RCT. This method is applied using a proportional hazards regression model with piecewise constant baseline hazard. A simulation study and a real-data example are presented based on a completed trial in non-small cell lung cancer. It is shown that the case weighted power prior provides robust inference under various forms of incompatibility between the external controls and RCT population.


Subject(s)
Research Design , Humans , Computer Simulation , Proportional Hazards Models , Bayes Theorem
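To show the mechanics of per-case discounting in the simplest possible setting, this sketch assumes a single constant hazard (one piece of the piecewise-constant model) and a Gamma prior, so that raising each external control's likelihood to its own weight w_i keeps the posterior in closed form. The weights are taken as given inputs; the paper derives them from each external control's predictive distribution, which is not reproduced. Function names and prior values are hypothetical.

```python
def weighted_power_prior_posterior(internal, external, weights, a0=0.001, b0=0.001):
    """Gamma(shape, rate) posterior for a constant hazard lambda.

    internal, external: lists of (time, event) pairs with event in {0, 1}.
    weights: one discount weight in [0, 1] per external control. Each external
    likelihood term lambda**event * exp(-lambda * time) is raised to w_i, so it
    adds w_i * event to the shape and w_i * time to the rate.
    """
    shape = a0 + sum(d for _, d in internal)
    rate = b0 + sum(t for t, _ in internal)
    for (t, d), w in zip(external, weights):
        shape += w * d
        rate += w * t
    return shape, rate  # posterior mean hazard = shape / rate


internal = [(1.0, 1), (2.0, 0)]
external = [(1.0, 1), (3.0, 1)]
print(weighted_power_prior_posterior(internal, external, weights=[1.0, 0.0]))
```

A weight of 1 borrows an external control fully and a weight of 0 discards it, so the case-weighted prior sits between pooling and ignoring external data one patient at a time.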
14.
J Transl Med ; 22(1): 258, 2024 03 09.
Article in English | MEDLINE | ID: mdl-38461317

ABSTRACT

BACKGROUND: The term eGene refers to a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). It is both theoretically and empirically important to identify eQTLs and eGenes in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently transfer useful knowledge acquired from auxiliary samples into target studies. METHODS: We propose a multilocus-based eGene identification method called TLegene, which integrates shared genetic similarity information available from auxiliary studies under the statistical framework of transfer learning. We apply TLegene to eGene identification in ten TCGA cancers that have an explicitly relevant tissue in the GTEx project, learning the genetic effects of variants in TCGA from GTEx. We also apply TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies. RESULTS: We observed substantial correlation of cis-variant genetic effects between TCGA and GTEx for a large number of genes. Furthermore, consistent with the results of our simulations, TLegene was more powerful than existing methods and identified 169 distinct candidate eGenes, many more than the approach that did not consider knowledge transfer between target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical evidence supporting the associations of the discovered eGenes and also showed evidence of allelic heterogeneity of gene expression. In addition, TLegene identified more eGenes in Geuvadis, and these eGenes were mainly enriched in EBV-transformed lymphocytes. CONCLUSION: Overall, TLegene is a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.


Subject(s)
Neoplasms , Polymorphism, Single Nucleotide , Humans , Quantitative Trait Loci/genetics , Genomics , Neoplasms/genetics , Machine Learning , Genome-Wide Association Study/methods
15.
Postgrad Med J ; 100(1185): 451-460, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38330498

ABSTRACT

First popularized almost a century ago in epidemiologic research by Ronald Fisher and Jerzy Neyman, the P-value has become perhaps the most misunderstood and even misused statistical value or descriptor. Indeed, modern clinical research has come to be centered around and guided by an arbitrary P-value of <0.05 as a magical threshold for significance, so much so that experimental design, reporting of experimental findings, and interpretation and adoption of such findings have become largely dependent on this "significant" P-value. This has given rise to multiple biases in the overall body of biomedical literature that threaten the very validity of clinical research. Ultimately, a drive toward reporting a "significant" P-value (by various statistical manipulations) risks creating a falsely positive body of science, leading to (i) wasted resources in pursuing fruitless research and (ii) futile or even harmful policies or therapeutic recommendations. This article reviews the history of the P-value, the conceptual basis of the P-value in the context of hypothesis testing, and the challenges of critically appraising clinical evidence vis-à-vis the P-value. This review aims to raise awareness of the pitfalls of rigid adherence to the threshold of statistical significance when evaluating clinical trials and to generate discussion about whether the scientific community needs to rethink how it decides clinical significance.


Subject(s)
Evidence-Based Medicine , Humans , Biomedical Research , Research Design , Data Interpretation, Statistical
16.
J Clin Transl Sci ; 8(1): e9, 2024.
Article in English | MEDLINE | ID: mdl-38384917

ABSTRACT

The proposal to improve reproducibility by lowering the significance threshold to 0.005 has been discussed, but its impact on conducting clinical trials has yet to be examined from a study design perspective. The impact on sample size and study duration was investigated using the design setups of 125 phase II studies published between 2015 and 2022. It was assessed as the percent increase in sample size and the additional years of accrual, with medians of 110.97% and 2.65 years, respectively. The results indicate that this proposal imposes additional financial burdens that reduce the efficiency of conducting clinical trials.
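A back-of-the-envelope version of the sample-size effect, assuming a simple design in which the required n is proportional to (z_{1-alpha} + z_{power})^2, so effect size and variance cancel in the ratio. This normal-approximation shortcut is an illustrative assumption; it is not expected to reproduce the reported median, which reflects the 125 published phase II designs.

```python
from statistics import NormalDist


def percent_increase_in_n(alpha_old=0.05, alpha_new=0.005, power=0.80, two_sided=False):
    """Percent increase in sample size when alpha is lowered, assuming
    n is proportional to (z_{1-alpha} + z_{power})**2 (normal approximation;
    the effect size and variance cancel in the ratio)."""
    z = NormalDist().inv_cdf

    def factor(alpha):
        a = alpha / 2.0 if two_sided else alpha
        return (z(1.0 - a) + z(power)) ** 2

    return 100.0 * (factor(alpha_new) / factor(alpha_old) - 1.0)


print(round(percent_increase_in_n(), 1))                # one-sided
print(round(percent_increase_in_n(two_sided=True), 1))  # two-sided
```

At 80% power this gives roughly an 89% increase for one-sided tests and roughly 70% for two-sided tests; accrual time grows in proportion when the enrollment rate is fixed.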

17.
Clin Interv Aging ; 19: 277-287, 2024.
Article in English | MEDLINE | ID: mdl-38380229

ABSTRACT

Null hypothesis significance testing (NHST) is the dominant statistical approach in the geriatric and rehabilitation fields. However, NHST is routinely misunderstood or misused. For example, findings from clinical trials may be taken as evidence of no effect when, in fact, a clinically relevant question simply yields a "non-significant" p-value; conversely, findings are considered clinically relevant whenever significant differences are observed between groups. Because the p-value is not an exclusive indicator of an association or of the existence of an effect, researchers should be encouraged to report other analytical approaches, such as Bayesian analysis, and complementary statistical tools alongside the p-value (e.g., effect size, confidence intervals, minimal clinically important difference, and magnitude-based inference) to improve the interpretation of clinical trial findings by presenting a more efficient and comprehensive analysis. However, the focus on Bayesian analysis and secondary statistical analyses does not mean that NHST is less important; it means only that, to observe a real intervention effect, researchers should combine secondary statistical analyses with NHST or Bayesian analysis to reveal what p-values cannot show in geriatric and rehabilitation studies (e.g., the clinical importance of a 1-kg increase in handgrip strength in the intervention group of long-lived older adults compared with a control group). This paper provides potential insights for improving the interpretation of scientific data in the rehabilitation and geriatric fields by using Bayesian and secondary statistical analyses to better scrutinize the results of clinical trials in which a p-value alone may not be appropriate to determine the efficacy of an intervention.


Subject(s)
Hand Strength , Research Design , Humans , Aged , Bayes Theorem , Data Interpretation, Statistical
18.
Biom J ; 66(2): e2200204, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38356198

ABSTRACT

Storey's estimator of the proportion of true null hypotheses, originally proposed under the continuous framework, is modified in this work under the discrete framework. The modification improves estimation of the parameter of interest. The proposed estimator is used to formulate an adaptive version of the Benjamini-Hochberg procedure, and control of the false discovery rate by this adaptive procedure is proved analytically. The proposed estimator is also used to formulate an adaptive version of the Benjamini-Hochberg-Heyse procedure; simulation experiments establish the conservative nature of this new adaptive procedure. Substantial gains in power are observed for the new adaptive procedures over the standard procedures. The proposed method is demonstrated on two important real-life gene expression data sets, one from a study of HIV and the other from a methylation study.


Subject(s)
Computer Simulation
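For orientation, a sketch of the continuous-case building blocks this work starts from: Storey's estimator of the null proportion, here in the "+1" form of Storey, Taylor and Siegmund, pi0_hat = (#{p_i > lambda} + 1) / (m (1 - lambda)), plugged into the Benjamini-Hochberg step-up rule at level q / pi0_hat. The discrete modification proposed in the paper and the Benjamini-Hochberg-Heyse variant are not implemented; the default lambda = 0.5 is a conventional choice, not the paper's.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true nulls (continuous case),
    with the +1 correction, capped at 1."""
    m = len(pvalues)
    return min(1.0, (sum(p > lam for p in pvalues) + 1) / (m * (1.0 - lam)))


def bh_reject(pvalues, q=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up at level q / pi0; returns rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / (m * pi0):
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


def adaptive_bh(pvalues, q=0.05, lam=0.5):
    """Adaptive BH: plug the estimated null proportion into BH."""
    return bh_reject(pvalues, q, storey_pi0(pvalues, lam))


pv = [0.0001, 0.0004, 0.002, 0.012, 0.024, 0.033, 0.045, 0.2, 0.6, 0.9]
print(bh_reject(pv), adaptive_bh(pv))
```

When many hypotheses are non-null, pi0_hat falls below 1 and the adaptive rule rejects more; in this toy example it rejects seven hypotheses versus five for standard BH, which is the kind of power gain the abstract reports.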
19.
Rev. neurol. (Ed. impr.) ; 78(1), 1-15 January 2024. tab
Article in Spanish | IBECS | ID: ibc-229062

ABSTRACT

A very common practice in medical research, during data analysis, is to dichotomise numerical variables into two groups. This leads to the loss of very useful information that can undermine the effectiveness of the research. Several examples are used to show how dichotomising numerical variables can lead to a loss of statistical power in studies. This can be critical when assessing, for example, whether a therapeutic procedure is more effective or whether a certain factor is a risk factor. Dichotomising continuous variables is therefore not recommended unless there is a very specific reason to do so. (AU)


Subject(s)
Biomedical Research/statistics & numerical data , Models, Statistical
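A small simulation in the spirit of the article's argument (its own examples are not reproduced): a normally distributed outcome is compared between two groups either as a continuous variable or after a split at the pooled median. The sample size, effect size and the large-sample z-tests are illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist

_Z = NormalDist()


def _mean_var(v):
    m = sum(v) / len(v)
    return m, sum((t - m) ** 2 for t in v) / (len(v) - 1)


def p_continuous(x, y):
    """Two-sided large-sample z-test for a difference in means."""
    (mx, vx), (my, vy) = _mean_var(x), _mean_var(y)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return 2.0 * (1.0 - _Z.cdf(abs(z)))


def p_dichotomized(x, y):
    """Two-sided two-proportion z-test after splitting at the pooled median."""
    cut = statistics.median(x + y)
    a = sum(v > cut for v in x) / len(x)
    b = sum(v > cut for v in y) / len(y)
    pooled = (a * len(x) + b * len(y)) / (len(x) + len(y))
    z = (a - b) / math.sqrt(pooled * (1 - pooled) * (1 / len(x) + 1 / len(y)))
    return 2.0 * (1.0 - _Z.cdf(abs(z)))


def simulated_power(test, n=50, effect=0.5, reps=500, alpha=0.05, seed=1):
    """Share of simulated trials (normal outcome, mean shift `effect`) with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(effect, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits += test(x, y) < alpha
    return hits / reps


print(simulated_power(p_continuous), simulated_power(p_dichotomized))
```

With these settings the continuous analysis detects the difference noticeably more often. For a normal outcome split at the median, theory puts the efficiency of the dichotomized analysis at about 2/π ≈ 64% of the continuous one, i.e. roughly a third of the sample is effectively thrown away.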
20.
Proteomics ; 24(5): e2300145, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37726251

ABSTRACT

Exact p-value (XPV)-based methods for dot product-like score functions (such as the XCorr score implemented in Tide, SEQUEST and Comet, or shared peak count-based scoring in MSGF+ and ASPV) provide fairly good calibration of peptide-spectrum match (PSM) scores in database search-based identification of MS/MS spectra. Unfortunately, standard XPV methods cannot, in practice, handle the high-resolution fragmentation data produced by state-of-the-art mass spectrometers, because smaller bins increase the number of fragment matches that are assigned to incorrect bins and scored improperly. In this article, we present an extension of the XPV method, called the high-resolution exact p-value (HR-XPV) method, which can be used to calibrate PSM scores of high-resolution MS/MS spectra obtained with dot product-like scoring such as XCorr. HR-XPV carries remainder masses throughout the fragmentation, which greatly increases the number of fragments assigned to the correct bin and thus takes advantage of high-resolution data. Using four mass spectrometry data sets, our experimental results demonstrate that HR-XPV produces well-calibrated scores, which in turn result in more trustworthy spectrum annotations at any false discovery rate level.


Subject(s)
Algorithms , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Software , Peptides/chemistry , Calibration , Databases, Protein