ABSTRACT
Experimental water research lacks a clear methodology for estimating experimental error. Especially when natural waters are involved, the characterization tools bear method-specific artifacts, while varying environmental conditions prevent regular repeats. This tutorial review identifies common mistakes and proposes a practical procedure to determine experimental errors, using membrane filtration as an example. Statistical analysis is often applied to an insufficient number of repeated measurements, while not all error sources and contributions are considered; this results in an underestimation of the experimental error. Variations in relevant experimental parameters need to be investigated systematically, and the related errors quantified as half the variation between the maximum and minimum values when a standard deviation is not applicable. The error of calculated parameters (e.g. flux, pollutant removal and mass loss) is estimated by applying error propagation, in which the weighted contributions of the experimental parameters are considered. Appropriate judgment and five-fold repetition of a selected experiment under identical conditions are proposed to validate the propagated experimental error: the five repeated data points should lie within the estimated error range of the error bar. The proposed error evaluation procedure is adaptable to experimental water research and is intended to help researchers identify the contributing factors of an experimental error and carry out appropriate error quantification and validation. The most important aim is to raise awareness of the need to question error methodology and the reproducibility of experimental data, in order to produce and publish high-quality research.
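The half-range rule and error propagation described above can be sketched in Python. This is an illustrative sketch, not the review's implementation; the flux formula (volume per area per time), the parameter names, and all numeric values are assumptions for demonstration.

```python
import math

def half_range_error(values):
    """Error estimate when a standard deviation is not applicable:
    half the variation between the maximum and minimum values."""
    return (max(values) - min(values)) / 2.0

def propagated_relative_error(*relative_errors):
    """Combine independent relative errors of measured parameters into
    the relative error of a calculated parameter (e.g. flux) by
    summing them in quadrature."""
    return math.sqrt(sum(e ** 2 for e in relative_errors))

# Hypothetical membrane-filtration example: permeate volume readings
# from systematic variation of the experiment (illustrative numbers).
volume_mL = [49.0, 50.5, 50.0, 49.5, 51.0]
mean_volume = sum(volume_mL) / len(volume_mL)
rel_err_volume = half_range_error(volume_mL) / mean_volume

rel_err_area = 0.02   # assumed 2 % relative error in membrane area
rel_err_time = 0.005  # assumed 0.5 % relative error in timing

# Relative error of flux = volume / (area * time), by propagation.
rel_err_flux = propagated_relative_error(rel_err_volume,
                                         rel_err_area, rel_err_time)
print(f"relative error of flux: {rel_err_flux:.4f}")
```

For validation in the sense described above, five repeats under identical conditions should then fall within mean ± (relative error × mean).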
Subject(s)
Filtration , Membranes, Artificial , Filtration/methods , Water Purification/methods , Water/chemistry , Reproducibility of Results , Research Design , Scientific Experimental Error/statistics & numerical data
Subject(s)
Clinical Trials as Topic , Medicine , Scientific Experimental Error , Scientific Misconduct , Humans , Clinical Trials as Topic/ethics , Clinical Trials as Topic/standards , Clinical Trials as Topic/statistics & numerical data , Fraud/prevention & control , Fraud/statistics & numerical data , Medicine/methods , Medicine/standards , Scientific Experimental Error/statistics & numerical data , Scientific Misconduct/statistics & numerical data
ABSTRACT
Nucleotide sequence reagents underpin molecular techniques that have been applied across hundreds of thousands of publications. We have previously reported wrongly identified nucleotide sequence reagents in human research publications and described Seek & Blastn, a semi-automated screening tool for fact-checking their claimed status. We applied Seek & Blastn to screen >11,700 publications across five literature corpora, including all original publications in Gene from 2007 to 2018 and all original open-access publications in Oncology Reports from 2014 to 2018. After manually checking Seek & Blastn outputs for >3,400 human research articles, we identified 712 articles across 78 journals that described at least one wrongly identified nucleotide sequence. Verifying the claimed identities of >13,700 sequences highlighted 1,535 wrongly identified sequences, most of which were claimed as targeting reagents for the analysis of 365 human protein-coding genes and 120 non-coding RNAs. The 712 problematic articles have received >17,000 citations, including citations by human clinical trials. Given our estimate that approximately one-quarter of problematic articles may misinform the future development of human therapies, urgent measures are required to address unreliable gene research articles.
Subject(s)
Base Sequence/genetics , Genetic Research , Genome, Human/genetics , Publications/statistics & numerical data , Scientific Experimental Error/statistics & numerical data , Human Genetics/standards , Humans , Proteins/genetics
Subject(s)
Researchers/legislation & jurisprudence , Researchers/standards , Scientific Experimental Error/statistics & numerical data , Scientific Experimental Error/trends , Scientific Misconduct/legislation & jurisprudence , Animals , Bias , Financing, Organized/statistics & numerical data , Humans , Journal Impact Factor , Research Design , Researchers/education , Researchers/ethics , Scientific Experimental Error/psychology , Scientific Misconduct/statistics & numerical data
ABSTRACT
Although variables are often measured with error, the impact of measurement error on machine-learning predictions is seldom quantified. The purpose of this study was to assess the impact of measurement error on the performance of random-forest models and variable importance. First, we assessed the impact of misclassification (i.e., measurement error of categorical variables) of predictors on random-forest model performance (e.g., accuracy, sensitivity) and variable importance (mean decrease in accuracy) using data from the National Comorbidity Survey Replication (2001-2003). Second, we created simulated data sets in which we knew the true model performance and variable importance measures and could verify that quantitative bias analysis was recovering the truth in misclassified versions of the data sets. Our findings showed that measurement error in the data used to construct random forests can distort model performance and variable importance measures and that bias analysis can recover the correct results. This study highlights the utility of applying quantitative bias analysis in machine learning to quantify the impact of measurement error on study results.
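The misclassification scenario above can be reproduced by corrupting a binary predictor with known sensitivity and specificity before model fitting. A minimal sketch, not the study's code; the function name, error rates, and seed are assumptions:

```python
import random

def misclassify(labels, sensitivity=0.9, specificity=0.9, seed=0):
    """Apply non-differential misclassification to a binary predictor:
    a true 1 is recorded as 1 with probability `sensitivity`, and a
    true 0 is recorded as 0 with probability `specificity`."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if y == 1:
            noisy.append(1 if rng.random() < sensitivity else 0)
        else:
            noisy.append(0 if rng.random() < specificity else 1)
    return noisy
```

Fitting a random forest on both the true and the misclassified versions of a data set, then comparing accuracy and mean-decrease-in-accuracy importances, reproduces the kind of distortion the study quantifies; quantitative bias analysis uses the known sensitivity and specificity to back-correct the observed classifications.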
Subject(s)
Bias , Scientific Experimental Error/statistics & numerical data , Computer Simulation , Datasets as Topic , Humans , Machine Learning/statistics & numerical data , Probability , Suicide, Attempted/statistics & numerical data
ABSTRACT
Although next-generation sequencing (NGS) is widely used in cancer to profile tumors and detect variants, most somatic variant callers used in these pipelines identify variants at the lowest possible granularity, single-nucleotide variants (SNV). As a result, multiple adjacent SNVs are called individually instead of as a single multi-nucleotide variant (MNV). With this approach, the amino acid change inferred from an individual SNV within a codon can differ from the amino acid change implied by the MNV that results from combining the SNVs, leading to incorrect conclusions about the downstream effects of the variants. Here, we analyzed 10,383 variant call files (VCF) from The Cancer Genome Atlas (TCGA) and found 12,141 incorrectly annotated MNVs. Analysis of seven commonly mutated genes from 178 studies in cBioPortal revealed that MNVs were consistently missed in 20 of these studies, whereas they were correctly annotated in 15 more recent studies. At the BRAF V600 locus, the most common example of an MNV, several public datasets reported separate BRAF V600E and BRAF V600M variants instead of a single merged V600K variant. VCFs from the TCGA Mutect2 caller were used to develop a solution for merging SNVs into MNVs. Our custom script used the phasing information in the SNV VCF to determine whether SNVs lay within the same codon and needed to be merged into an MNV before variant annotation. This study shows that institutions performing NGS for cancer genomics should incorporate the step of merging MNVs as a best practice in their pipelines. SIGNIFICANCE: Identification of incorrect mutation calls in TCGA, including the clinically relevant BRAF V600 and KRAS G12 loci, will influence research and potentially clinical decisions.
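The merging step described above can be sketched for the simplest case of phased, immediately adjacent SNVs; the authors' script additionally checks codon boundaries and VCF phase sets, which this toy version omits. The BRAF c.1798/c.1799 positions below illustrate how V600M (G>A) plus V600E (T>A) combine into V600K (GT>AA).

```python
def merge_adjacent_snvs(snvs):
    """Merge runs of immediately adjacent, phased SNVs into MNVs.
    `snvs` is a list of (position, ref, alt) tuples with single-base
    ref/alt alleles, all assumed to lie on the same haplotype."""
    merged = []
    for pos, ref, alt in sorted(snvs):
        if merged and pos == merged[-1][0] + len(merged[-1][1]):
            # Adjacent to the previous variant: extend it into an MNV.
            prev_pos, prev_ref, prev_alt = merged[-1]
            merged[-1] = (prev_pos, prev_ref + ref, prev_alt + alt)
        else:
            merged.append((pos, ref, alt))
    return merged

# BRAF V600K: two adjacent coding-position SNVs merge into one MNV
# (codon GTG -> AAG, i.e. Val -> Lys).
snvs = [(1798, "G", "A"), (1799, "T", "A")]
print(merge_adjacent_snvs(snvs))  # [(1798, 'GT', 'AA')]
```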
Subject(s)
Genome, Human , Genomics/standards , Molecular Sequence Annotation/standards , Mutation , Neoplasms/genetics , Polymorphism, Single Nucleotide , Scientific Experimental Error/statistics & numerical data , Algorithms , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/pathology
ABSTRACT
OBJECTIVES: The present systematic review aimed to perform an in-depth analysis of the different features of retracted publications in the dental field. MATERIAL AND METHODS: This review has been recorded in the PROSPERO database (CRD42017075634). Two independent reviewers performed an electronic search (PubMed, Retraction Watch) for retracted articles in the dental literature up to December 31, 2018. RESULTS: 180 retracted papers were identified, the first published in 2001. Retractions increased by 47% in the last four-year period (2014-2018) compared with 2009-2013 (94 and 64 retracted publications, respectively). Author misconduct was the most common reason for retraction (65.0%), followed by honest scientific errors (12.2%) and publisher-related issues (10.6%). The majority of retracted research was conducted in Asia (55.6%), with 49 papers from India (27.2%). 552 researchers (89%) were listed as authors of only one retracted article, while 10 researchers (1.6%) appeared in five or more retracted publications. Retracted articles were cited 530 times after retraction; the great majority of these citations (89.6%) did not acknowledge the retraction notice and treated data from retracted articles as reliable. CONCLUSIONS: Retractions in the dental literature have constantly increased in recent years, the majority of them due to misconduct and fraud. The publication of unreliable research has many negative consequences: studies derived from such material are designed on potentially incorrect bases, funds and resources are wasted, and, most importantly, the risk of incorrect treatment for patients increases. Citation of retracted papers represents a major issue for the scientific community.
Subject(s)
Biomedical Research/standards , Dentistry/standards , Fraud/statistics & numerical data , Periodicals as Topic/statistics & numerical data , Scientific Experimental Error/statistics & numerical data , Scientific Misconduct/statistics & numerical data , Databases, Factual , Humans , Periodicals as Topic/standards , Retraction of Publication as Topic
ABSTRACT
Researchers detecting heterogeneity of regression in a treatment outcome study including a covariate and random assignment to groups often want to investigate the simple treatment effect at the sample grand mean of the covariate and at points one standard deviation above and below that mean. The estimated variances of the simple treatment effect that have traditionally been used in such tests were derived under the assumption that the covariate values were fixed constants. We derive results appropriate for a two-group experiment that instead presume the covariate is a normally distributed random variable. A simulation study is used to confirm the validity of the analytical results and to compare error estimates and confidence intervals based on these results with those based on assuming a fixed covariate. Discrepancies between estimates for fixed and random covariates of the variability of treatment effects can be substantial. However, in situations where the extent of heterogeneity of regression is like that typically reported, presuming the covariate is random rather than fixed will generally result in only a modest increase in estimated standard errors, and in some circumstances can even result in a smaller estimated standard error. We illustrate the new methods with an empirical data set.
Subject(s)
Confidence Intervals , Scientific Experimental Error/statistics & numerical data , Statistics as Topic/methods , Algorithms , Analysis of Variance , Computer Simulation , Humans , Models, Statistical , Random Allocation , Regression Analysis , Research Design , Treatment Outcome
ABSTRACT
Measurement of the amplitude of accommodation is established as a procedure in a routine optometric eye examination. However, clinical methods of measurement of this basic optical function have several sources of error. They are numerous and diverse, and include depth of focus, reaction time, instrument design, specification of the measurement end-point, specification of the reference point of measurement, measurement conditions, consideration of refractive error, and psychological factors. Several of these sources of inaccuracy are composed of multiple sub-sources, and many of the sub-sources influence the common methods of measurement of amplitude of accommodation. Consideration of these sources of measurement error casts doubt on the reliability of the results of measurement, on the validity of established normative values that have been produced using these methods, and on the value of reports of the results of surgery designed to restore accommodation. Clinicians can reduce the effects of some of the sources of error by modifying techniques of measurement with existing methods, but a new method may further improve accuracy.
Subject(s)
Accommodation, Ocular/physiology , Scientific Experimental Error/statistics & numerical data , Vision Tests/standards , Humans , Models, Statistical , Reproducibility of Results , Retinoscopy
ABSTRACT
HPLC-MS/MS analysis of various human cell lines shows the presence of substantial amounts of bovine protein contaminants. These most likely originate from fetal bovine serum (FBS), typically used in cell cultures. If the data are evaluated against a human protein database, on average 10% of the identified proteins will be misleading (bovine proteins reported as if they were human). Bovine contaminants may therefore cause major bias in proteomic studies of cell cultures if not considered explicitly.
Subject(s)
Cell Line/chemistry , Culture Media/chemistry , Proteins/analysis , Serum Albumin, Bovine/chemistry , Animals , Cattle , Cell Culture Techniques , Drug Contamination , HeLa Cells , Humans , Proteomics , Scientific Experimental Error/statistics & numerical data , Tandem Mass Spectrometry
ABSTRACT
Biomedical research, particularly when it involves human beings, is always subjected to sources of error that must be recognized. Systematic error or bias is associated with problems in the methodological design or during the execution phase of a research project. It affects its validity and is qualitatively appraised. On the other hand, random error is related to variations due to chance. It may be quantitatively expressed, but never removed. This review is the first of a methodological series on general concepts in biostatistics and clinical epidemiology developed by the Chair of Scientific Research Methodology at the School of Medicine, University of Valparaíso, Chile. In this article, we address the theoretical concepts of error, its evaluation, and control. Finally, we discuss some current controversies in its conceptualization that are relevant to undergraduate and graduate students of health sciences.
Subject(s)
Biomedical Research/statistics & numerical data , Biostatistics/methods , Epidemiology/statistics & numerical data , Bias , Humans , Research Design , Scientific Experimental Error/statistics & numerical data
ABSTRACT
In randomised trials, continuous endpoints are often measured with some degree of error. This study explores the impact of ignoring measurement error and proposes methods to improve statistical inference in its presence. Three main types of measurement error in continuous endpoints are considered: classical, systematic, and differential. For each measurement error type, a corrected effect estimator is proposed. The corrected estimators and several methods for confidence interval estimation are tested in a simulation study. These methods combine information about error-prone and error-free measurements of the endpoint in individuals not included in the trial (an external calibration sample). We show that, if measurement error in a continuous endpoint is ignored, the treatment effect estimator is unbiased when the error is classical, although the Type-II error rate is increased at a given sample size. Conversely, the estimator can be substantially biased when the measurement error is systematic or differential. In those cases, bias can largely be prevented, and inferences improved, by using information from an external calibration sample, the required size of which increases as the strength of the association between the error-prone and error-free endpoint decreases. Measurement error correction using even a small (external) calibration sample is shown to improve inferences and should be considered in trials with error-prone endpoints. Implementation of the proposed correction methods is supported by a new software package for R.
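One way an external calibration sample can correct systematic error, consistent with the description above, is regression calibration: estimate a linear error model on the calibration sample, then rescale the naive effect. This is a sketch under an assumed model Y* = θ0 + θ1·Y + e, not necessarily the paper's exact estimator:

```python
def calibrate(y_error_prone, y_error_free):
    """Estimate the systematic-error model Y* = theta0 + theta1 * Y + e
    by ordinary least squares on an external calibration sample in
    which both the error-prone and error-free endpoint are observed."""
    n = len(y_error_free)
    mean_x = sum(y_error_free) / n
    mean_y = sum(y_error_prone) / n
    sxx = sum((x - mean_x) ** 2 for x in y_error_free)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(y_error_free, y_error_prone))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def corrected_effect(naive_effect, theta1):
    """Under the linear error model, the naive treatment effect on the
    error-prone endpoint is theta1 times the true effect, so dividing
    by theta1 recovers an (approximately) unbiased estimate."""
    return naive_effect / theta1
```

The precision of `theta1`, and hence of the corrected estimate, depends on the size of the calibration sample and on how strongly the error-prone and error-free endpoints are associated, mirroring the sample-size trade-off described above.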
Subject(s)
Endpoint Determination , Randomized Controlled Trials as Topic/methods , Scientific Experimental Error , Computer Simulation , Data Interpretation, Statistical , Endpoint Determination/methods , Endpoint Determination/statistics & numerical data , Hemoglobins/analysis , Humans , Randomized Controlled Trials as Topic/standards , Sample Size , Scientific Experimental Error/statistics & numerical data
Subject(s)
Biomedical Research/methods , Publishing/standards , Research Design/statistics & numerical data , Bayes Theorem , Bias , Confidence Intervals , Humans , Probability , Publishing/statistics & numerical data , Research Design/trends , Scientific Experimental Error/statistics & numerical data
Subject(s)
Data Interpretation, Statistical , Scientific Experimental Error/statistics & numerical data , Artificial Intelligence , Big Data , Climate Change , Confidence Intervals , Ethics, Research , Models, Theoretical , Persuasive Communication , Politics , Publishing/standards , Reproducibility of Results , Research Design
ABSTRACT
It is known that the one-sided Simes' test controls the error rate if the underlying distribution is multivariate totally positive of order 2 (MTP2), but not in general. The two-sided test also controls the error rate when the coordinate absolute values have an MTP2 distribution, a condition that holds more generally. We prove mathematically that when the two-sided test controls the error rate at level 2α, certain kinds of truncated Simes' tests also control the one-sided error rate at level α. We also compare the closure of the truncated tests with the Holm, Hochberg, and Hommel procedures in many scenarios where the test statistics are multivariate normal.
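For reference, the one-sided Simes' test of the global null hypothesis rejects when p_(i) ≤ iα/n for at least one i, where p_(1) ≤ … ≤ p_(n) are the ordered p-values. A minimal sketch of that standard rule (the truncated variants studied in the paper modify it and are not shown):

```python
def simes_test(p_values, alpha=0.05):
    """Simes' test of the global null: reject (return True) when the
    i-th smallest p-value is at most i * alpha / n for some i."""
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    return any(p <= (i + 1) * alpha / n for i, p in enumerate(p_sorted))

# With n = 2 and alpha = 0.05, the thresholds are 0.025 and 0.05:
print(simes_test([0.04, 0.06]))  # False: 0.04 > 0.025 and 0.06 > 0.05
print(simes_test([0.03, 0.04]))  # True: 0.04 <= 0.05
```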
Subject(s)
Data Interpretation, Statistical , Multivariate Analysis , Statistical Distributions , Biometry , Confidence Intervals , Humans , Scientific Experimental Error/statistics & numerical data
ABSTRACT
Tethered particle motion experiments are versatile single-molecule techniques that make it possible to address in vitro the molecular properties of DNA and its interactions with the various partners involved in genetic regulation. These techniques provide raw data, such as the amplitude of movement of the tracked particle, from which relevant information about DNA conformations or states must be recovered. Solving this inverse problem calls for specific theoretical tools that have been designed over the last two decades, together with data pre-processing procedures that ought to be implemented to avoid biases inherent to these experimental techniques. These statistical tools and models are reviewed in this paper.
Subject(s)
DNA/chemistry , Models, Statistical , Single Molecule Imaging/methods , Markov Chains , Molecular Dynamics Simulation , Motion (Physics) , Nucleic Acid Conformation , Physics , Scientific Experimental Error/statistics & numerical data
ABSTRACT
Quinones are becoming an essential tool for the treatment of refractory organics, yet their quantification is often not well considered. In this paper, two kinds of potential quantification errors were evaluated under multiple pH conditions. They derive, respectively, from the coexistence of oxidized/reduced quinone species (Type I) and from the pH-sensitive absorbance of quinones (Type II). These errors can remarkably influence the accuracy of quantification but have not been emphasized. To elaborate the relationship between the two types of errors and the absorbance or pH conditions, three typical quinones [anthraquinone-1-sulfonate (α-AQS), anthraquinone-2,6-disulfonate (AQDS) and lawsone] were selected, and their acid dissociation coefficients (pKa) as well as UV-Vis spectra were determined. Results revealed that, for Type I, the relative error (RE) of the α-AQS concentration would exceed the 5% limit when reduced α-AQS was below 48% of total α-AQS. Similar results were found for lawsone. However, the RE can be eliminated by the equation established in this paper. For Type II, the pH-sensitive feature was related to the pKa values of the quinones. Absorbances of α-AQS and lawsone changed remarkably with pH variation, so a correction model was established. Analog data showed high consistency with experimental data [r = 0.995 (n = 25, p < 0.01) and r = 0.997 (n = 36, p < 0.01) for lawsone and α-AQS, respectively]. Notably, the determination of AQDS concentrations was found to be pH-independent at 437 nm between pH 4.00 and 9.18. Based on these features, a comprehensive data solution was proposed for handling these errors.
Subject(s)
Anthraquinones/analysis , Naphthoquinones/analysis , Scientific Experimental Error/statistics & numerical data , Water Purification/methods , Calibration/standards , Hydrogen-Ion Concentration , Oxidation-Reduction , Quinones/analysis , Wastewater/chemistry
ABSTRACT
The qPCR method provides an inexpensive, rapid means of estimating relative telomere length across a set of biological samples. Like all laboratory methods, it involves some degree of measurement error. Relative telomere length is estimated by subjecting the actual measurements made (the Cq values for telomere and a control gene) to non-linear transformations and combining them into a ratio (the TS ratio). Here, we use computer simulations, supported by mathematical analysis, to explore how measurement errors affect qPCR estimates of relative telomere length, in both cross-sectional and longitudinal data. We show that errors introduced at the level of Cq values are magnified when the TS ratio is calculated. If the errors at the Cq level are normally distributed and independent of true telomere length, those in the TS ratio are positively skewed and proportional to true telomere length. The repeatability of the TS ratio declines sharply with increasing error in the measurement of the Cq values for the telomere and/or control gene. In simulated longitudinal data, measurement error alone can produce a pattern of low correlation between successive measures of relative telomere length, coupled with a strong negative dependency of the rate of change on initial relative telomere length. Our results illustrate the importance of reducing measurement error: a small increase in error in Cq values can have large consequences for the power and interpretability of qPCR estimates of relative telomere length. The findings also illustrate the importance of characterising the measurement error in each dataset (coefficients of variation are generally unhelpful; researchers should report standard deviations of Cq values and/or repeatabilities of TS ratios) and of allowing for the known effects of measurement error when interpreting patterns of TS-ratio change over time.
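The magnification effect can be seen in a small simulation. Assuming a common form of the TS ratio with a perfect amplification efficiency of 2 (an assumption; real assays estimate the efficiency, and the paper's exact transformation may differ), normally distributed Cq errors translate into skewed, multiplicative errors in the ratio:

```python
import random

def ts_ratio(cq_telomere, cq_control, efficiency=2.0):
    """Relative telomere length as a TS ratio computed from qPCR
    quantification cycles: E**(Cq_control - Cq_telomere)."""
    return efficiency ** (cq_control - cq_telomere)

def simulate_ts_errors(true_cq_t=15.0, true_cq_s=20.0,
                       sd=0.2, n=10000, seed=1):
    """Add independent normal errors (standard deviation `sd`) to both
    Cq values and return the resulting TS ratios, illustrating how
    Cq-level error is magnified and skewed by the exponential step."""
    rng = random.Random(seed)
    return [ts_ratio(rng.gauss(true_cq_t, sd), rng.gauss(true_cq_s, sd))
            for _ in range(n)]
```

Because the transformation is exponential, a Cq standard deviation of around 0.2 on each gene already produces roughly a ±20% spread in the TS ratio, and the resulting distribution is positively skewed even though the Cq errors are symmetric.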
Subject(s)
Real-Time Polymerase Chain Reaction/methods , Scientific Experimental Error/statistics & numerical data , Computer Simulation/statistics & numerical data , Cross-Sectional Studies , Humans , Telomere/genetics , Telomere Homeostasis/physiology
ABSTRACT
Objective: The research sought to determine the prevalence of errata for drug trial publications included in systematic reviews, their potential value to reviews, and their accessibility via standard information retrieval methods. Methods: The authors conducted a retrospective review of the included studies from forty systematic reviews of drugs evaluated by the Canadian Agency for Drugs and Technologies in Health (CADTH) Common Drug Review (CDR) in 2015. For each article included in the systematic reviews, we searched for associated errata using the CDR review report, PubMed, and the journal publishers' websites. The severity of the errors described in errata was evaluated using a three-category scale: trivial, minor, or major. The accessibility of errata was determined by examining their inclusion in bibliographic databases, the cost of obtaining them, the time lag between article and erratum publication, and the correction of online articles. Results: The 40 systematic reviews included 127 articles in total, for which 26 errata were identified. These errata described 38 errors. When classified by severity, 6 errors were major, 20 minor, and 12 trivial. No single database contained all the errata. On average, errata were published 211 days after the original article (range: 15-1,036 days). All were freely available. Over one-third (9/24) of online articles remained uncorrected after errata publication. Conclusion: Errata frequently described non-trivial errors that would either affect the interpretation of data in the article or, in fewer cases, affect the conclusions of the study. As such, it seems useful for reviewers to identify errata associated with included studies. However, publication time lag and inconsistent database indexing impair errata accessibility.