Results 1 - 20 of 38
1.
Br J Math Stat Psychol ; 76(2): 353-371, 2023 05.
Article in English | MEDLINE | ID: mdl-36627229

ABSTRACT

Ordinal data occur frequently in the social sciences. When applying principal component analysis (PCA), however, those data are often treated as numeric, implying linear relationships between the variables at hand; alternatively, non-linear PCA is applied where the obtained quantifications are sometimes hard to interpret. Non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories such that the proportion of variance in those new variables that is explained by a predefined number of principal components (PCs) is maximized. We propose a penalized version of non-linear PCA for ordinal variables that is a smoothed intermediate between standard PCA on category labels and non-linear PCA as used so far. The new approach is by no means limited to monotonic effects and offers both better interpretability of the non-linear transformation of the category labels and better performance on validation data than unpenalized non-linear PCA and/or standard linear PCA. In particular, an application of penalized optimal scaling to ordinal data as given with the International Classification of Functioning, Disability and Health (ICF) is provided.
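The alternating estimation idea behind optimal scaling can be sketched as follows. This is a minimal, unpenalized illustration (not the authors' penalized method), assuming nominal quantifications and standardized variables; all function names are illustrative:

```python
import numpy as np

def optimal_scaling_pca(labels, n_comp=2, n_iter=50):
    """Alternating least squares sketch of non-linear PCA by optimal scaling:
    numeric scores are assigned to categories so that the variance explained
    by the first n_comp principal components increases."""
    Q = (labels - labels.mean(0)) / labels.std(0)   # start from raw category labels
    for _ in range(n_iter):
        # model step: best rank-n_comp approximation of the quantified data
        U, s, Vt = np.linalg.svd(Q, full_matrices=False)
        Z = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp]
        # scaling step: each category's score becomes the mean reconstruction
        # over the observations falling in that category, then re-standardize
        for j in range(labels.shape[1]):
            for c in np.unique(labels[:, j]):
                mask = labels[:, j] == c
                Q[mask, j] = Z[mask, j].mean()
            Q[:, j] = (Q[:, j] - Q[:, j].mean()) / Q[:, j].std()
    return Q
```

Starting the quantifications at the raw category labels corresponds to the "standard PCA on category labels" end of the continuum the abstract describes; the penalty proposed in the paper would pull the iterated scores back towards that starting point.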


Subjects
Nonlinear Dynamics , Humans , Principal Component Analysis , Disability Evaluation
2.
Psychol Methods ; 28(3): 558-579, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35298215

ABSTRACT

The last 25 years have shown a steady increase in attention for the Bayes factor as a tool for hypothesis evaluation and model selection. The present review highlights the potential of the Bayes factor in psychological research. We discuss six types of applications: Bayesian evaluation of point null, interval, and informative hypotheses, Bayesian evidence synthesis, Bayesian variable selection and model averaging, and Bayesian evaluation of cognitive models. We elaborate what each application entails, give illustrative examples, and provide an overview of key references and software with links to other applications. The article is concluded with a discussion of the opportunities and pitfalls of Bayes factor applications and a sketch of corresponding future research lines. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subjects
Bayes Theorem , Behavioral Research , Psychology , Humans , Behavioral Research/methods , Psychology/methods , Software , Research Design
3.
Psychon Bull Rev ; 30(2): 534-552, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36085233

ABSTRACT

In classical statistics, there is a close link between null hypothesis significance testing (NHST) and parameter estimation via confidence intervals. However, for the Bayesian counterpart, a link between null hypothesis Bayesian testing (NHBT) and Bayesian estimation via a posterior distribution is less straightforward, but does exist, and has recently been reiterated by Rouder, Haaf, and Vandekerckhove (2018). It hinges on a combination of a point mass probability and a probability density function as prior (denoted as the spike-and-slab prior). In the present paper, it is first carefully explained how the spike-and-slab prior is defined, and how results can be derived for which proofs were not given in Rouder, Haaf, and Vandekerckhove (2018). Next, it is shown that this spike-and-slab prior can be approximated by a pure probability density function with a rectangular peak around the center, towering high above the remainder of the density function. Finally, we indicate how this 'hill-and-chimney' prior may in turn be approximated by fully continuous priors. In this way, it is shown that NHBT results can be approximated well by results from estimation using a strongly peaked prior, and it is noted that the estimation itself offers more than merely the posterior odds on which NHBT is based. Thus, it complies with the strong APA requirement of not just mentioning testing results but also offering effect size information. It also offers a transparent perspective on the NHBT approach employing a prior with a strong peak around the chosen point null hypothesis value.
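The spike-and-slab prior and its 'hill-and-chimney' approximation described above can be written out as follows (standard definitions; here \(\rho\) denotes the prior probability of the null and \(g\) the slab density):

```latex
% Spike-and-slab prior for a parameter \theta with point null H_0 : \theta = \theta_0
\pi(\theta) \;=\; \rho \,\delta(\theta - \theta_0) \;+\; (1 - \rho)\, g(\theta),
% where \delta is a Dirac point mass and g a continuous ("slab") density under H_1.

% The posterior odds factor into the prior odds times the Bayes factor:
\frac{P(H_0 \mid y)}{P(H_1 \mid y)}
  \;=\; \frac{\rho}{1-\rho} \cdot \mathrm{BF}_{01},
\qquad
\mathrm{BF}_{01} \;=\; \frac{p(y \mid \theta_0)}{\int p(y \mid \theta)\, g(\theta)\, d\theta}.

% "Hill-and-chimney" approximation: replace the spike by a rectangle of width \varepsilon,
\pi_\varepsilon(\theta) \;=\; \frac{\rho}{\varepsilon}\,
  \mathbf{1}\{|\theta - \theta_0| \le \varepsilon/2\} \;+\; (1-\rho)\, g(\theta),
% a proper density that converges to the spike-and-slab prior as \varepsilon \to 0.
```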


Subjects
Research Design , Humans , Bayes Theorem , Likelihood Functions
4.
Psychol Methods ; 27(3): 466-475, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35901398

ABSTRACT

In 2019 we wrote an article (Tendeiro & Kiers, 2019) in Psychological Methods on null hypothesis Bayesian testing and its workhorse, the Bayes factor. Recently, van Ravenzwaaij and Wagenmakers (2021) offered a response to our piece, also in this journal. Although we welcome their thought-provoking remarks on our article, we concluded that there were too many "issues" in van Ravenzwaaij and Wagenmakers (2021) that warrant a rebuttal. In this article we both defend the main premises of our original article and put the contribution of van Ravenzwaaij and Wagenmakers (2021) under critical appraisal. Our hope is that this exchange between scholars decisively contributes toward a better understanding among psychologists of null hypothesis Bayesian testing in general and of the Bayes factor in particular. (PsycInfo Database Record (c) 2022 APA, all rights reserved).


Subjects
Research Design , Bayes Theorem , Data Interpretation, Statistical
5.
Psychon Bull Rev ; 29(1): 70-87, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34254263

ABSTRACT

The practice of sequentially testing a null hypothesis as data are collected until the null hypothesis is rejected is known as optional stopping. It is well known that optional stopping is problematic in the context of p value-based null hypothesis significance testing: the false-positive rates quickly overcome the single test's significance level. However, the state of affairs under null hypothesis Bayesian testing, where p values are replaced by Bayes factors, has, perhaps surprisingly, been far less settled. Rouder (2014) used simulations to defend the use of optional stopping under null hypothesis Bayesian testing. The idea behind these simulations is closely related to the idea of sampling from prior predictive distributions. Deng et al. (2016) and Hendriksen et al. (2020) have provided mathematical evidence that optional stopping under null hypothesis Bayesian testing is indeed unproblematic under some conditions. These papers are, however, exceedingly technical for most researchers in the applied social sciences. In this paper, we provide some mathematical derivations concerning Rouder's approximate simulation results for the two Bayesian hypothesis tests that he considered. The key idea is to consider the probability distribution of the Bayes factor, which is regarded as a random variable across repeated sampling. This paper therefore offers an intuitive perspective on the literature, and we believe it is a valid contribution toward understanding the practice of optional stopping in the context of Bayesian hypothesis testing.
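A minimal simulation of the kind discussed above, for the simplest case of a normal mean with known unit variance and a normal prior under H1 (an illustrative sketch, not one of the tests Rouder considered; the threshold and prior scale are arbitrary choices):

```python
import numpy as np

def normal_pdf(x, sd):
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def bf10(xbar, n, g=1.0):
    """Bayes factor for H1: mu ~ N(0, g) versus H0: mu = 0, given x_i ~ N(mu, 1).
    Both marginal likelihoods of the sample mean are available in closed form."""
    return normal_pdf(xbar, np.sqrt(g + 1 / n)) / normal_pdf(xbar, np.sqrt(1 / n))

def optional_stopping(mu, threshold=10.0, n_max=500, rng=None):
    """Collect one observation at a time and stop as soon as the Bayes factor
    passes the threshold in either direction (or n_max is reached)."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.normal(mu, 1.0)
        bf = bf10(total / n, n)
        if bf > threshold or bf < 1 / threshold:
            break
    return bf, n
```

Repeating `optional_stopping(0.0)` many times approximates the sampling distribution of the Bayes factor under the null with optional stopping; a martingale argument bounds the probability that BF10 ever exceeds a threshold k at 1/k under H0, which such simulations can be used to illustrate.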


Subjects
Research Design , Bayes Theorem , Computer Simulation , Humans , Probability
6.
Front Psychol ; 12: 738258, 2021.
Article in English | MEDLINE | ID: mdl-34721211

ABSTRACT

Opinion polarization is increasingly becoming an issue in today's society, producing both unrest at the societal level and conflict in small-scale communication between people of opposing opinions. Often, opinion polarization is conceptualized as the direct opposite of agreement and consequently operationalized as an index of dispersion. In doing so, however, researchers fail to account for the bimodality that is characteristic of a polarized opinion distribution. A valid measurement of opinion polarization would enable us to predict when, and on which issues, conflict may arise. The current study is aimed at developing and validating a new index of opinion polarization. The weights of this index were derived from the knowledge of 58 international experts on polarization, elicited through an expert survey. The resulting Opinion Polarization Index predicted expert polarization scores in opinion distributions better than common measures of polarization, such as the standard deviation, Van der Eijk's polarization measure and Esteban and Ray's polarization index. We reflect on the use of expert ratings for the development of measurements, in this case and more generally.
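For reference, one of the comparison measures named above, Esteban and Ray's polarization index, is easy to compute from a discrete opinion distribution (a sketch; the expert-derived weights of the new Opinion Polarization Index are not reproduced here):

```python
import numpy as np

def esteban_ray(shares, positions, alpha=1.0):
    """Esteban-Ray polarization of a discrete distribution:
    sum_i sum_j p_i^(1+alpha) * p_j * |y_i - y_j|,
    where p are population shares and y the opinion positions."""
    p = np.asarray(shares, float)
    y = np.asarray(positions, float)
    return np.sum(p[:, None] ** (1 + alpha) * p[None, :]
                  * np.abs(y[:, None] - y[None, :]))
```

Unlike a pure dispersion measure, this index rewards mass concentrated in a few mutually distant groups, which is exactly the bimodality the abstract argues a dispersion index misses.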

7.
Br J Math Stat Psychol ; 74(3): 541-566, 2021 11.
Article in English | MEDLINE | ID: mdl-33629738

ABSTRACT

Principal covariate regression (PCOVR) is a method for regressing a set of criterion variables with respect to a set of predictor variables when the latter are many in number and/or collinear. This is done by extracting a limited number of components that simultaneously synthesize the predictor variables and predict the criterion ones. So far, no procedure has been offered for estimating statistical uncertainties of the obtained PCOVR parameter estimates. The present paper shows how this goal can be achieved, conditionally on the model specification, by means of the bootstrap approach. Four strategies for estimating bootstrap confidence intervals are derived and their statistical behaviour in terms of coverage is assessed by means of a simulation experiment. Such strategies are distinguished by the use of the varimax and quartimin procedures and by the use of Procrustes rotations of bootstrap solutions towards the sample solution. In general, the four strategies showed appropriate statistical behaviour, with coverage tending to the desired level for increasing sample sizes. The main exception involved strategies based on the quartimin procedure in cases characterized by complex underlying structures of the components. The appropriateness of the statistical behaviour was higher when the proper number of components was extracted.
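The combination of bootstrap resampling with Procrustes rotation of each bootstrap solution towards the sample solution can be illustrated on plain PCA loadings (a simplified stand-in for PCOVR; all names are illustrative):

```python
import numpy as np

def procrustes_rotate(A, target):
    """Orthogonal rotation (including reflections) of A that best matches target."""
    U, _, Vt = np.linalg.svd(A.T @ target)
    return A @ (U @ Vt)

def pca_loadings(X, n_comp):
    Xc = X - X.mean(0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_comp].T * (s[:n_comp] / np.sqrt(X.shape[0]))

def bootstrap_loading_ci(X, n_comp=2, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap intervals for loadings, with every bootstrap solution
    rotated towards the sample solution to remove rotational indeterminacy."""
    rng = np.random.default_rng(seed)
    L0 = pca_loadings(X, n_comp)
    boots = np.array([
        procrustes_rotate(
            pca_loadings(X[rng.integers(0, len(X), len(X))], n_comp), L0)
        for _ in range(n_boot)
    ])
    lo = np.percentile(boots, 100 * (1 - level) / 2, axis=0)
    hi = np.percentile(boots, 100 * (1 + level) / 2, axis=0)
    return L0, lo, hi
```

Without the Procrustes step, sign flips and component reorderings across bootstrap samples would inflate the intervals to the point of being useless, which is why the paper's strategies all align bootstrap solutions in some way.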


Subjects
Research Design , Computer Simulation , Confidence Intervals , Sample Size
8.
Behav Res Methods ; 53(4): 1648-1668, 2021 08.
Article in English | MEDLINE | ID: mdl-33420716

ABSTRACT

Principal covariates regression (PCovR) allows one to deal with the interpretational and technical problems associated with running ordinary regression using many predictor variables. In PCovR, the predictor variables are reduced to a limited number of components, and simultaneously, criterion variables are regressed on these components. By means of a weighting parameter, users can flexibly choose how much they want to emphasize reconstruction and prediction. However, when datasets contain many criterion variables, PCovR users face new interpretational problems, because many regression weights will be obtained and because some criteria might be unrelated to the predictors. We therefore propose PCovR2, which extends PCovR by also reducing the criteria to a few components. These criterion components are predicted based on the predictor components. The PCovR2 weighting parameter can again be flexibly used to focus on the reconstruction of the predictors and criteria, or on filtering out relevant predictor components and predictable criterion components. We compare PCovR2 to two other approaches, based on partial least squares (PLS) and principal components regression (PCR), that also reduce the criteria and are therefore called PLS2 and PCR2. By means of a simulated example, we show that PCovR2 outperforms PLS2 and PCR2 when one aims to recover all relevant predictor components and predictable criterion components. Moreover, we conduct a simulation study to evaluate how well PCovR2, PLS2 and PCR2 succeed in finding (1) all underlying components and (2) the subset of relevant predictor and predictable criterion components. Finally, we illustrate the use of PCovR2 by means of empirical data.
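A sketch of basic PCovR estimation, based on our reading of the closed-form solution of de Jong and Kiers (1992): the eigendecomposition below, weighting the predictor cross-products against the criterion cross-products projected onto the predictor space, is an assumption of this sketch, not code from the paper.

```python
import numpy as np

def pcovr(X, Y, n_comp, alpha=0.5):
    """Principal covariates regression (sketch). Components T trade off
    alpha * reconstruction of X against (1 - alpha) * prediction of Y."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Hx = X @ np.linalg.pinv(X.T @ X) @ X.T          # projection onto col space of X
    G = (alpha * X @ X.T / (X ** 2).sum()
         + (1 - alpha) * Hx @ Y @ Y.T @ Hx / (Y ** 2).sum())
    _, vecs = np.linalg.eigh(G)
    T = vecs[:, ::-1][:, :n_comp]                   # top eigenvectors, T'T = I
    Px, Py = T.T @ X, T.T @ Y                       # X ~ T @ Px, Y ~ T @ Py
    return T, Px, Py
```

With `alpha=1` this reduces to ordinary PCA of X, and with `alpha=0` to reduced-rank regression of Y on X, which is the weighting continuum the abstract describes.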


Subjects
Least-Squares Analysis , Computer Simulation , Humans
9.
R Soc Open Sci ; 7(4): 181351, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32431853

ABSTRACT

The crisis of confidence has undermined the trust that researchers place in the findings of their peers. In order to increase trust in research, initiatives such as preregistration have been suggested, which aim to prevent various questionable research practices. As it stands, however, no empirical evidence exists that preregistration does increase perceptions of trust. The picture may be complicated by a researcher's familiarity with the author of the study, regardless of the preregistration status of the research. This registered report presents an empirical assessment of the extent to which preregistration increases the trust of 209 active academics in the reported outcomes, and how familiarity with another researcher influences that trust. Contrary to our expectations, we report ambiguous Bayes factors and conclude that we do not have strong evidence towards answering our research questions. Our findings are presented along with evidence that our manipulations were ineffective for many participants, leading to the exclusion of 68% of complete datasets, and an underpowered design as a consequence. We discuss other limitations and confounds which may explain why the findings of the study deviate from a previously conducted pilot study. We reflect on the benefits of using the registered report submission format in light of our results. The OSF page for this registered report and its pilot can be found here: http://dx.doi.org/10.17605/OSF.IO/B3K75.

10.
Eur J Psychotraumatol ; 10(1): 1698223, 2019.
Article in English | MEDLINE | ID: mdl-31853334

ABSTRACT

Background: The diagnosis of complex posttraumatic stress disorder (CPTSD) has been suggested for inclusion in the 11th version of the International Classification of Diseases (ICD-11), with support for its construct validity coming from studies employing Latent Class Analysis (LCA) and Latent Profile Analysis (LPA). Objective: The current study aimed to critically evaluate the application of the techniques LCA and LPA as applied in previous studies to substantiate the construct validity of CPTSD. Method: Both LCA and LPA were applied systematically in one sample (n = 245), replicating the setup of previous studies as closely as possible. The interpretation of classes was augmented with the use of graphical visualization. Results: The LCA and LPA analyses indicated divergent results in the same dataset. LCA and LPA partially supported the existence of classes of patients endorsing different PTSD and CPTSD symptom patterns. However, further inspection of the results with scatterplots did not support a clear distinction between PTSD and CPTSD, but rather suggested that there is much greater variability in clinical presentations amongst adult PTSD patients than can be fully accounted for by either PTSD or CPTSD. Discussion: We argue that LCA and LPA may not be sufficient methods to decide on the construct validity of CPTSD, as different subgroups of patients are identified, depending on the exact statistical method used and the interpretation of the fit of different models. Additional methods, including graphical inspection, should be employed in future studies.



11.
Psychol Methods ; 24(6): 774-795, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31094544

ABSTRACT

Null hypothesis significance testing (NHST) has been under scrutiny for decades. The literature shows overwhelming evidence of a large range of problems affecting NHST. One of the proposed alternatives to NHST is using Bayes factors instead of p values. Here we denote the method of using Bayes factors to test point null models as "null hypothesis Bayesian testing" (NHBT). In this article we offer a wide overview of potential issues (limitations or sources of misinterpretation) with NHBT which is currently missing in the literature. We illustrate many of the shortcomings of NHBT by means of reproducible examples. The article concludes with a discussion of NHBT in particular and testing in general. In particular, we argue that posterior model probabilities should be given more emphasis than Bayes factors, because only the former provide direct answers to the most common research questions under consideration. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
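The abstract's closing point, that posterior model probabilities answer the research question directly, amounts to a one-line conversion from the Bayes factor and the prior model probabilities (a standard identity, written out here for illustration):

```python
def posterior_prob_h0(bf01, prior_h0=0.5):
    """Posterior probability of H0 given the Bayes factor BF01 (evidence for H0
    over H1) and the prior probability of H0, via posterior odds = BF * prior odds."""
    odds = bf01 * prior_h0 / (1 - prior_h0)
    return odds / (1 + odds)
```

The Bayes factor alone quantifies the evidence update; only after committing to prior model probabilities does it yield the probability of H0 given the data, which is the quantity most research questions actually target.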


Subjects
Data Interpretation, Statistical , Models, Statistical , Probability , Research Design , Humans
12.
Educ Psychol Meas ; 79(3): 558-576, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31105323

ABSTRACT

Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data under two missing data mechanisms-namely, missingness completely at random and a form of missingness not at random. The kappa coefficient considered in Gwet (Handbook of Inter-rater Reliability, 4th ed.) and the kappa coefficient based on listwise deletion of units with missing ratings were found to have virtually no bias and mean squared error if missingness is completely at random, and small bias and mean squared error if missingness is not at random. Furthermore, the kappa coefficient that treats missing ratings as a regular category appears to be rather heavily biased and has a substantial mean squared error in many of the simulations. Because it performs well and is easy to compute, we recommend using the kappa coefficient that is based on listwise deletion of missing ratings if it can be assumed that missingness is completely at random or not at random.
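A sketch of the recommended variant, Cohen's kappa after listwise deletion of units with a missing rating (an illustrative implementation, not the authors' code; missing ratings are coded as None):

```python
import numpy as np

def kappa_listwise(r1, r2):
    """Cohen's kappa, dropping every unit for which either rating is missing (None)."""
    pairs = [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]
    cats = sorted({c for pair in pairs for c in pair})
    idx = {c: i for i, c in enumerate(cats)}
    table = np.zeros((len(cats), len(cats)))
    for a, b in pairs:
        table[idx[a], idx[b]] += 1
    table /= len(pairs)
    po = np.trace(table)                    # observed agreement
    pe = table.sum(1) @ table.sum(0)        # agreement expected by chance
    return (po - pe) / (1 - pe)
```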

13.
Bioinformatics ; 34(17): i988-i996, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423084

ABSTRACT

Motivation: In biology, we are often faced with multiple datasets recorded on the same set of objects, such as multi-omics and phenotypic data of the same tumors. These datasets are typically not independent from each other. For example, methylation may influence gene expression, which may, in turn, influence drug response. Such relationships can strongly affect analyses performed on the data, as we have previously shown for the identification of biomarkers of drug response. Therefore, it is important to be able to chart the relationships between datasets. Results: We present iTOP, a methodology to infer a topology of relationships between datasets. We base this methodology on the RV coefficient, a measure of matrix correlation, which can be used to determine how much information is shared between two datasets. We extended the RV coefficient for partial matrix correlations, which allows the use of graph reconstruction algorithms, such as the PC algorithm, to infer the topologies. In addition, since multi-omics data often contain binary data (e.g. mutations), we also extended the RV coefficient for binary data. Applying iTOP to pharmacogenomics data, we found that gene expression acts as a mediator between most other datasets and drug response: only proteomics clearly shares information with drug response that is not present in gene expression. Based on this result, we used TANDEM, a method for drug response prediction, to identify which variables predictive of drug response were distinct to either gene expression or proteomics. Availability and implementation: An implementation of our methodology is available in the R package iTOP on CRAN. Additionally, an R Markdown document with code to reproduce all figures is provided as Supplementary Material. Supplementary information: Supplementary data are available at Bioinformatics online.
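The RV coefficient and its first-order partial extension can be sketched as follows (the partial formula mirrors the usual partial-correlation recursion, which is an assumption of this sketch rather than iTOP's exact implementation):

```python
import numpy as np

def rv(X, Y):
    """RV matrix correlation between two column-centered data sets
    measured on the same objects (rows)."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Sx, Sy = X @ X.T, Y @ Y.T
    return np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))

def partial_rv(X, Y, Z):
    """Partial matrix correlation of X and Y given Z, by analogy with
    the first-order partial correlation formula."""
    rxy, rxz, ryz = rv(X, Y), rv(X, Z), rv(Y, Z)
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

Partial matrix correlations of this kind are what allow constraint-based graph reconstruction algorithms such as the PC algorithm to be run with whole datasets, rather than single variables, as the nodes.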


Subjects
Proteomics , Algorithms , Humans , Neoplasms/genetics
14.
Front Psychol ; 9: 564, 2018.
Article in English | MEDLINE | ID: mdl-29743874

ABSTRACT

Four human values are considered to underlie individuals' environmental beliefs and behaviors: biospheric (i.e., concern for environment), altruistic (i.e., concern for others), egoistic (i.e., concern for personal resources) and hedonic values (i.e., concern for pleasure and comfort). These values are typically measured with an adapted and shortened version of the Schwartz Value Survey (SVS), which we refer to as the Environmental-SVS (E-SVS). Although the E-SVS is well validated, recent research has indicated some concerns about the SVS methodology (e.g., comprehensibility, self-presentation biases) and suggested an alternative method of measuring human values: the Portrait Value Questionnaire (PVQ). However, the PVQ has not yet been adapted and applied to measure the values most relevant to understanding environmental beliefs and behaviors. Therefore, we tested the Environmental-PVQ (E-PVQ), a PVQ variant of the E-SVS, and compared it with the E-SVS in two studies. Our findings provide strong support for the validity and reliability of both the E-SVS and E-PVQ. In addition, we find that respondents slightly preferred the E-PVQ over the E-SVS (Study 1). In general, both scales correlate similarly to environmental self-identity (Study 1), energy behaviors (Studies 1 and 2), pro-environmental personal norms, climate change beliefs and policy support (Study 2). Accordingly, both methodologies show highly similar results and seem well-suited for measuring human values underlying environmental behaviors and beliefs.

15.
Stat Med ; 37(1): 137-156, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-29023942

ABSTRACT

In many situations, a researcher is interested in the analysis of the scores of a set of observation units on a set of variables. In medicine, however, it is very common for the information to be replicated on different occasions. The occasions can be time-varying or refer to different conditions. In such cases, the data can be stored in a 3-way array or tensor. The Candecomp/Parafac and Tucker3 methods represent the most common methods for analyzing 3-way tensors. In this work, a review of these methods is provided, and this class of methods is then applied to a 3-way data set concerning hospital care data for a hospital in Rome (Italy), spanning 15 years divided into 3 groups of consecutive years (1892-1896, 1940-1944, 1968-1972). The analysis reveals some noteworthy aspects of the use of health services and its evolution over time.
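A minimal alternating-least-squares sketch of the Candecomp/Parafac decomposition for a 3-way tensor (illustrative, not the software used in the paper; all names are ours):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization of a 3-way tensor (C ordering)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(T, rank, n_iter=200, seed=0):
    """Candecomp/Parafac via alternating least squares: each factor matrix is
    updated in turn by a least-squares fit to the matching unfolding of T."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((s, rank)) for s in T.shape]
    for _ in range(n_iter):
        for n in range(3):
            a, b = [factors[m] for m in range(3) if m != n]
            V = (a.T @ a) * (b.T @ b)            # Hadamard of Gram matrices
            factors[n] = unfold(T, n) @ khatri_rao(a, b) @ np.linalg.pinv(V)
    return factors

def cp_reconstruct(factors):
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

Tucker3 differs from this model by adding a small core tensor that links components across modes, which makes it more flexible but less uniquely identified than Candecomp/Parafac.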


Subjects
Biostatistics/methods , Health Services/statistics & numerical data , Hospital Records/statistics & numerical data , Data Interpretation, Statistical , Databases, Factual/statistics & numerical data , Humans , Models, Statistical , Principal Component Analysis/methods , Rome , Software
17.
Psychometrika ; 82(1): 86-111, 2017 03.
Article in English | MEDLINE | ID: mdl-27905056

ABSTRACT

In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although ordinary least squares regression will suffice in many cases, sometimes the prediction problem is more challenging, for three reasons. First, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations being nested into persons, or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1-3):155-164, 1992) and CR (Späth in Computing 22(4):367-373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.


Assuntos
Cultura , Modelos Estatísticos , Satisfação Pessoal , Estatística como Assunto , Análise por Conglomerados , Humanos , Análise dos Mínimos Quadrados , Modelos Lineares , Psicometria , Análise de Regressão , Inquéritos e Questionários
18.
Behav Res Methods ; 48(3): 1008-20, 2016 09.
Article in English | MEDLINE | ID: mdl-26170054

ABSTRACT

MultiLevel Simultaneous Component Analysis (MLSCA) is a data-analytical technique for multivariate two-level data. MLSCA sheds light on the associations between the variables at both levels by specifying separate submodels for each level. Each submodel consists of a component model. Although MLSCA has already been successfully applied in diverse areas within and outside the behavioral sciences, its use is hampered by two issues. First, as MLSCA solutions are fitted by means of iterative algorithms, analyzing large data sets (i.e., data sets with many level one units) may take a lot of computation time. Second, easily accessible software for estimating MLSCA models is lacking so far. In this paper, we address both issues. Specifically, we discuss a computational shortcut for MLSCA fitting. Moreover, we present the MLSCA package, which was built in MATLAB, but is also available in a version that can be used on any Windows computer, without having MATLAB installed.


Subjects
Principal Component Analysis , Software , Algorithms , Analysis of Variance , Data Interpretation, Statistical , Humans , Models, Psychological , Models, Statistical
19.
PLoS One ; 7(5): e37840, 2012.
Article in English | MEDLINE | ID: mdl-22693578

ABSTRACT

BACKGROUND: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). RESULTS: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. 
CONCLUSIONS: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.


Subjects
Computational Biology/methods , Statistics as Topic/methods , Escherichia coli/metabolism , Gene Expression Profiling , Genomics , Metabolomics , Saccharomyces cerevisiae/genetics
20.
Front Psychol ; 3: 137, 2012.
Article in English | MEDLINE | ID: mdl-22593746

ABSTRACT

A valid interpretation of most statistical techniques requires that one or more assumptions be met. In published articles, however, little information tends to be reported on whether the data satisfy the assumptions underlying the statistical techniques used. This could be due to self-selection: only manuscripts with data fulfilling the assumptions are submitted. Another explanation could be that violations of assumptions are rarely checked for in the first place. We studied whether and how 30 researchers checked fictitious data for violations of assumptions in their own working environment. Participants were asked to analyze the data as they would their own data, for which often-used and well-known techniques such as the t-procedure, ANOVA and regression (or non-parametric alternatives) were required. It was found that the assumptions of the techniques were rarely checked, and that, if they were, it was usually by means of a statistical test. Interviews afterward revealed a general lack of knowledge about assumptions, the robustness of the techniques with regard to the assumptions, and how (or whether) assumptions should be checked. These data suggest that checking for violations of assumptions is not a well-considered choice, and that the use of statistics can be described as opportunistic.
