ABSTRACT
BACKGROUND: Specialized versus generic physiotherapy (PT) reduces Parkinson's disease (PD)-related complications. It is unclear (1) whether other specialized allied health disciplines, including occupational therapy (OT) and speech and language therapy (S&LT), also reduce complications; (2) whether there is a synergistic effect among multiple specialized disciplines; and (3) whether each allied health discipline prevents specific complications. OBJECTIVES: To longitudinally assess whether the level of expertise (specialized vs. generic training) of PT, OT, and S&LT was associated with the incidence rate of PD-related complications. METHODS: We used claims data of all insured persons with PD in the Netherlands between January 1, 2010, and December 31, 2018. ParkinsonNet-trained therapists were classified as specialized, and other therapists as generic. We used mixed-effects Poisson regression models to estimate rate ratios adjusting for sociodemographic and clinical characteristics. RESULTS: The population of 51,464 persons with PD (mean age, 72.4 years; standard deviation 9.8) sustained 10,525 PD-related complications during follow-up (median 3.3 years). Specialized PT was associated with fewer complications (incidence rate ratio [IRR] of specialized versus generic = 0.79; 95% confidence interval, [0.74-0.83]; P < 0.0001), as was specialized OT (IRR = 0.88 [0.77-0.99]; P = 0.03). We found a trend toward an association between specialized S&LT and a lower rate of PD-related complications (IRR = 0.88 [0.74-1.04]; P = 0.18). The inverse association of specialized OT persisted in the stratum that also received specialized PT (IRR = 0.62 [0.42-0.90]; P = 0.001). The strongest inverse association of PT was seen with orthopedic injuries (IRR = 0.78 [0.73-0.82]; P < 0.0001) and of S&LT with pneumonia (IRR = 0.70 [0.53-0.93]; P = 0.03).
CONCLUSIONS: These findings support a wider introduction of specialized allied health therapy expertise in PD care and conceivably for other medical conditions. © 2022 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
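As a rough illustration of what an incidence rate ratio expresses, the sketch below computes a crude (unadjusted) IRR with a Wald confidence interval from event counts and person-time. It is not the mixed-effects Poisson model used in the study, and the counts and the function name `incidence_rate_ratio` are hypothetical.

```python
import math


def incidence_rate_ratio(events_a, time_a, events_b, time_b, z=1.96):
    """Crude IRR of group A versus group B with a Wald confidence interval.

    The interval is built on the log scale, where the standard error of
    log(IRR) is approximately sqrt(1/events_a + 1/events_b).
    """
    irr = (events_a / time_a) / (events_b / time_b)
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts and person-years, NOT taken from the study.
irr, lower, upper = incidence_rate_ratio(80, 1000.0, 100, 1000.0)
```

An adjusted analysis would instead fit a Poisson regression with covariates and random effects; the crude ratio only shows how the reported quantity is defined.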
Subject(s)
Parkinson Disease , Humans , Aged , Parkinson Disease/complications , Speech Therapy , Physical Therapy Modalities , Netherlands
ABSTRACT
BACKGROUND: Remote smartphone-based 2-minute walking tests (s2MWTs) allow frequent and potentially sensitive measurements of ambulatory function. OBJECTIVE: To investigate the s2MWT for the assessment of, and responsiveness to change in, ambulatory function in multiple sclerosis (MS). METHODS: One hundred two MS patients and 24 healthy controls (HCs) performed weekly s2MWTs on self-owned smartphones for 12 and 3 months, respectively. The timed 25-foot walk test (T25FW) and Expanded Disability Status Scale (EDSS) were assessed at 3-month intervals. Anchor-based (using T25FW and EDSS) and distribution-based (curve fitting) methods were used to assess responsiveness of the s2MWT. A local linear trend model was used to fit weekly s2MWT scores of individual patients. RESULTS: A total of 4811 and 355 s2MWT scores were obtained in patients (n = 94) and HCs (n = 22), respectively. The s2MWT demonstrated large variability (65.6 m) compared to the average score (129.5 m), and was inadequately responsive to anchor-based change in clinical outcomes. Curve fitting separated the trend from noise in high temporal resolution individual-level data, and statistically reliable changes were detected in 45% of patients. CONCLUSIONS: In group-level analyses, clinically relevant change was insufficiently detected due to large variability with sporadic measurements. Individual-level curve fitting reduced the variability in s2MWT, enabling the detection of statistically reliable change in ambulatory function.
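For orientation, a local linear trend model is commonly written in the state-space form below. The symbols are illustrative assumptions (y_t is the weekly s2MWT score, \mu_t the underlying level, \nu_t the slope); the abstract does not state the authors' exact parameterization.

```latex
\begin{aligned}
y_t       &= \mu_t + \varepsilon_t,      & \varepsilon_t &\sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
\mu_{t+1} &= \mu_t + \nu_t + \xi_t,      & \xi_t         &\sim \mathcal{N}(0, \sigma_\xi^2) \\
\nu_{t+1} &= \nu_t + \zeta_t,            & \zeta_t       &\sim \mathcal{N}(0, \sigma_\zeta^2)
\end{aligned}
```

In this form the observation noise \varepsilon_t absorbs week-to-week measurement variability, while changes in \mu_t represent the trend that is separated from it.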
Subject(s)
Multiple Sclerosis , Humans , Multiple Sclerosis/diagnosis , Smartphone , Walk Test , Walking , Disability Evaluation
ABSTRACT
Rasch analysis is a procedure to develop and validate instruments that aim to measure a person's traits. However, manual Rasch analysis is a complex and time-consuming task, even more so when the possibility of differential item functioning (DIF) is taken into consideration. Furthermore, manual Rasch analysis by construction relies on a modeler's subjective choices. As an alternative approach, we introduce a semi-automated procedure that is based on the optimization of a new criterion, called in-plus-out-of-questionnaire log likelihood with differential item functioning (IPOQ-LL-DIF), which extends our previous criterion. We illustrate our procedure on artificially generated data as well as on several real-world datasets containing potential DIF items. On these real-world datasets, our procedure found instruments with clinimetric properties similar to those suggested by experts through manual analyses.
Subject(s)
Psychometrics , Humans , Psychometrics/methods , Surveys and Questionnaires , Probability , Reproducibility of Results
ABSTRACT
BACKGROUND: Understanding the synergetic and antagonistic effects of combinations of drugs and toxins is vital for many applications, including treatment of multifactorial diseases and ecotoxicological monitoring. Synergy is usually assessed by comparing the response of drug combinations to a predicted non-interactive response from reference (null) models. Possible choices of null models are Loewe additivity, Bliss independence and the recently rediscovered Hand model. A different approach is taken by the MuSyC model, which directly fits a generalization of the Hill model to the data. All of these models, however, fit the dose-response relationship with a parametric model. RESULTS: We propose the Hand-GP model, a non-parametric model based on the combination of the Hand model with Gaussian processes. We introduce a new logarithmic squared exponential kernel for the Gaussian process which captures the logarithmic dependence of response on dose. From the monotherapeutic response and the Hand principle, we construct a null reference response and synergy is assessed from the difference between this null reference and the Gaussian process fitted response. Statistical significance of the difference is assessed from the confidence intervals of the Gaussian process fits. We evaluate performance of our model on a simulated data set from Greco, two simulated data sets of our own design and two benchmark data sets from Chou and Talalay. We compare the Hand-GP model to standard synergy models and show that our model performs better on these data sets. We also compare our model to the MuSyC model as an example of a recent method on these five data sets and on two drug combination screens: the Mott et al. anti-malarial screen and the O'Neil et al. anti-cancer screen. We identify cases in which the Hand-GP model is preferred and cases in which the MuSyC model is preferred. CONCLUSION: The Hand-GP model is a flexible model to capture synergy. Its non-parametric and probabilistic nature allows it to model a wide variety of response patterns.
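To make the idea of a null reference model concrete, the sketch below implements Bliss independence, one of the standard null models named above, together with a squared exponential kernel evaluated on log-dose. Note the assumptions: the Hand-GP model builds its null reference from the Hand principle, not from Bliss independence, and the kernel shown is only one plausible reading of "logarithmic squared exponential"; the authors' exact definition is not given in the abstract. All numbers are hypothetical.

```python
import math


def bliss_expected(effect_a, effect_b):
    """Bliss independence: expected combined fractional effect of two agents.

    Both inputs are fractions affected in [0, 1], measured for each agent alone.
    """
    return effect_a + effect_b - effect_a * effect_b


def log_squared_exponential(dose_1, dose_2, variance=1.0, length_scale=1.0):
    """Squared exponential kernel on log-dose (assumed form; doses must be > 0)."""
    distance = math.log(dose_1) - math.log(dose_2)
    return variance * math.exp(-0.5 * (distance / length_scale) ** 2)


observed = 0.85  # hypothetical measured effect of the combination
excess = observed - bliss_expected(0.5, 0.5)  # positive values suggest synergy
```

With a kernel of this form, the covariance between two doses depends only on their ratio, so a tenfold dilution step is treated the same way anywhere along the dose axis.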
ABSTRACT
The rapid increase in loci discovered in genome-wide association studies has created a need to understand the biological implications of these results. Gene-set analysis provides a means of gaining such understanding, but the statistical properties of gene-set analysis are not well understood, which compromises our ability to interpret its results. In this Analysis article, we provide an extensive statistical evaluation of the core structure that is inherent to all gene-set analyses and we examine current implementations in available tools. We show which factors affect valid and successful detection of gene sets and which provide a solid foundation for performing and interpreting gene-set analysis.
Subject(s)
Computational Biology/methods , Gene Expression Profiling , Genome-Wide Association Study/methods , Genomics/methods , Models, Statistical , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Humans
ABSTRACT
BACKGROUND: Both patients and physicians may choose to delay initiation of dopamine replacement therapy in Parkinson's disease (PD) for various reasons. We used observational data to estimate the effect of earlier treatment in PD. Observational data offer a valuable source of evidence, complementary to controlled trials. METHOD: We studied the Parkinson's Progression Markers Initiative cohort of patients with de novo PD to estimate the effects of duration of PD treatment during the first 2 years of follow-up, exploiting natural interindividual variation in the time to start first treatment. We estimated the Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) Part III (primary outcome) and several functionally relevant outcomes at 2, 3, and 4 years after baseline. To adjust for time-varying confounding, we used marginal structural models with inverse probability of treatment weighting and the parametric g-formula. RESULTS: We included 302 patients from the Parkinson's Progression Markers Initiative cohort. There was a small improvement in MDS-UPDRS Part III scores after 2 years of follow-up for patients who started treatment earlier, and similar, but not statistically significant, differences in subsequent years. We found no statistically significant differences in most secondary outcomes, including the presence of motor fluctuations, nonmotor symptoms, MDS-UPDRS Part II scores, and the Schwab and England Activities of Daily Living Scale. CONCLUSION: Earlier treatment initiation does not lead to worse MDS-UPDRS motor scores and may offer small improvements. These findings, based on observational data, are in line with earlier findings from clinical trials. Observational data, when combined with appropriate causal methods, are a valuable source of additional evidence to support real-world clinical decisions. © 2020 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Subject(s)
Parkinson Disease , Activities of Daily Living , Cohort Studies , Disease Progression , England , Humans , Parkinson Disease/drug therapy , Severity of Illness Index
ABSTRACT
OBJECTIVE: Fatigue is a common symptom among cancer survivors that can be successfully treated with cognitive-behavioral therapy (CBT). Insights into the working mechanisms of CBT are currently limited. The aim of this study was to investigate whether improvements in targeted cognitive-behavioral variables and reduced depressive symptoms mediate the fatigue-reducing effect of CBT. METHODS: We pooled data from three randomized controlled trials that tested the efficacy of CBT to reduce severe fatigue. In all three trials, fatigue severity (Checklist Individual Strength) decreased significantly following CBT. Assessments were conducted pre-treatment and 6 months later. Classical mediation analysis testing a pre-specified model was conducted and its results compared to those of causal discovery, an explorative data-driven approach testing all possible causal associations and retaining the most likely model. RESULTS: Data from 250 cancer survivors (n = 129 CBT, n = 121 waitlist) were analyzed. Classical mediation analysis suggests that increased self-efficacy and decreased fatigue catastrophizing, focusing on symptoms, perceived problems with activity and depressive symptoms mediate the reduction of fatigue brought about by CBT. Conversely, causal discovery and post-hoc analyses indicate that fatigue acts as mediator, not outcome, of changes in cognitions, sleep disturbance and depressive symptoms. CONCLUSIONS: Cognitions, sleep disturbance and depressive symptoms improve during CBT. When assessed pre- and post-treatment, fatigue acts as a mediator, not outcome, of these improvements. It seems likely that the working mechanism of CBT is not a one-way causal effect but a dynamic reciprocal process. Trials integrating intermittent assessments are needed to shed light on these mechanisms and inform optimization of CBT.
Subject(s)
Cancer Survivors , Cognitive Behavioral Therapy , Neoplasms , Depression/therapy , Fatigue/therapy , Humans , Neoplasms/therapy , Randomized Controlled Trials as Topic , Treatment Outcome
ABSTRACT
BACKGROUND: A debilitating late effect for childhood cancer survivors (CCS) is cancer-related fatigue (CRF). Little is known about the prevalence and risk factors of fatigue in this population. Here we describe the methodology of the Dutch Childhood Cancer Survivor Late Effect Study on fatigue (DCCSS LATER fatigue study). The aim of the DCCSS LATER fatigue study is to examine the prevalence of and factors associated with CRF, proposing a model which discerns predisposing, triggering, maintaining and moderating factors. Triggering factors are related to the cancer diagnosis and treatment during childhood and are thought to trigger fatigue symptoms. Maintaining factors are daily life and psychosocial factors which may perpetuate fatigue once triggered. Moderating factors might influence the way fatigue symptoms are expressed in individuals. Predisposing factors already existed before the diagnosis, such as genetic factors, and are thought to increase the vulnerability to develop fatigue. Methodology of the participant inclusion, data collection and planned analyses of the DCCSS LATER fatigue study are presented. RESULTS: Data of 1955 CCS and 455 siblings were collected. Analysis of the data is planned and we aim to start reporting the first results in 2022. CONCLUSION: The DCCSS LATER fatigue study will provide information on the epidemiology of CRF and investigate the role of a broad range of associated factors in CCS. Insight into associated factors for fatigue in survivors experiencing severe and persistent fatigue may help identify individuals at risk for developing CRF and may aid in the development of interventions.
Subject(s)
Cancer Survivors , Fatigue Syndrome, Chronic , Neoplasms , Child , Fatigue Syndrome, Chronic/diagnosis , Fatigue Syndrome, Chronic/epidemiology , Fatigue Syndrome, Chronic/etiology , Humans , Neoplasms/complications , Neoplasms/epidemiology , Quality of Life , Risk Factors , Survivors
ABSTRACT
Similar to natural complex systems, such as the Earth's climate or a living cell, semiconductor lithography systems are characterized by nonlinear dynamics across more than a dozen orders of magnitude in space and time. Thousands of sensors measure relevant process variables at appropriate sampling rates, to provide time series as primary sources for system diagnostics. However, high-dimensionality, non-linearity and non-stationarity of the data are major challenges to efficiently, yet accurately, diagnose rare or new system issues by merely using model-based approaches. To reliably narrow down the causal search space, we validate a ranking algorithm that applies transfer entropy for bivariate interaction analysis of a system's multivariate time series to obtain a weighted directed graph, and graph eigenvector centrality to identify the system's most important sources of original information or causal influence. The results suggest that this approach robustly identifies the true drivers or causes of a complex system's deviant behavior, even when its reconstructed information transfer network includes redundant edges.
ABSTRACT
Computational modeling plays an important role in modern neuroscience research. Much previous research has relied on statistical methods, separately, to address two problems that are actually interdependent. First, given a particular computational model, Bayesian hierarchical techniques have been used to estimate individual variation in parameters over a population of subjects, leveraging their population-level distributions. Second, candidate models are themselves compared, and individual variation in the expressed model estimated, according to the fits of the models to each subject. The interdependence between these two problems arises because the relevant population for estimating parameters of a model depends on which other subjects express the model. Here, we propose a hierarchical Bayesian inference (HBI) framework for concurrent model comparison, parameter estimation and inference at the population level, combining previous approaches. We show that this framework has important advantages for both parameter estimation and model comparison theoretically and experimentally. The parameters estimated by the HBI show smaller errors compared to other methods. Model comparison by HBI is robust against outliers and is not biased towards overly simplistic models. Furthermore, the fully Bayesian approach of our theory enables researchers to make inferences on group-level parameters by performing an HBI t-test.
Subject(s)
Bayes Theorem , Computational Biology/methods , Models, Neurological , Computer Simulation , Decision Making/physiology , Humans , Learning/physiology
ABSTRACT
BACKGROUND: Wearable sensors have been used successfully to characterize bradykinetic gait in patients with Parkinson disease (PD), but most studies to date have been conducted in highly controlled laboratory environments. OBJECTIVE: This paper aims to assess whether sensor-based analysis of real-life gait can be used to objectively and remotely monitor motor fluctuations in PD. METHODS: The Parkinson@Home validation study provides a new reference data set for the development of digital biomarkers to monitor persons with PD in daily life. Specifically, a group of 25 patients with PD with motor fluctuations and 25 age-matched controls performed unscripted daily activities in and around their homes for at least one hour while being recorded on video. Patients with PD did this twice: once after overnight withdrawal of dopaminergic medication and again 1 hour after medication intake. Participants wore sensors on both wrists and ankles, on the lower back, and in the front pants pocket, capturing movement and contextual data. Gait segments of 25 seconds were extracted from accelerometer signals based on manual video annotations. The power spectral density of each segment and device was estimated using Welch's method, from which the total power in the 0.5- to 10-Hz band, width of the dominant frequency, and cadence were derived. The ability to discriminate between before and after medication intake and between patients with PD and controls was evaluated using leave-one-subject-out nested cross-validation. RESULTS: From 18 patients with PD (11 men; median age 65 years) and 24 controls (13 men; median age 68 years), ≥10 gait segments were available. Using logistic LASSO (least absolute shrinkage and selection operator) regression, we classified whether the unscripted gait segments occurred before or after medication intake, with mean areas under the receiver operating characteristic curve (AUCs) varying between 0.70 (ankle of least affected side, 95% CI 0.60-0.81) and 0.82 (ankle of most affected side, 95% CI 0.72-0.92) across sensor locations. Combining all sensor locations did not significantly improve classification (AUC 0.84, 95% CI 0.75-0.93). Of all signal properties, the total power in the 0.5- to 10-Hz band was most responsive to dopaminergic medication. Discriminating between patients with PD and controls was generally more difficult (AUC of all sensor locations combined: 0.76, 95% CI 0.62-0.90). The video recordings revealed that the positioning of the hands during real-life gait had a substantial impact on the power spectral density of both the wrist and pants pocket sensor. CONCLUSIONS: We present a new video-referenced data set that includes unscripted activities in and around the participants' homes. Using this data set, we show the feasibility of using sensor-based analysis of real-life gait to monitor motor fluctuations with a single sensor location. Future work may assess the value of contextual sensors to control for real-world confounders.
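The band-power feature described in the methods can be sketched as follows: a minimal pure-Python version of Welch's method (Hann window, 50% overlap, mean removal per segment) followed by summation of the spectral density over the 0.5 to 10 Hz band. The sampling rate, segment length, function names, and the synthetic 2 Hz sine standing in for an accelerometer trace are all assumptions for illustration, not the study's settings.

```python
import cmath
import math


def welch_psd(signal, fs, nperseg):
    """One-sided PSD: average of Hann-windowed periodograms with 50% overlap."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg) for n in range(nperseg)]
    scale = fs * sum(w * w for w in window)
    starts = list(range(0, len(signal) - nperseg + 1, nperseg // 2))
    n_freq = nperseg // 2 + 1
    psd = [0.0] * n_freq
    for start in starts:
        segment = signal[start:start + nperseg]
        mean = sum(segment) / nperseg
        tapered = [(v - mean) * w for v, w in zip(segment, window)]
        for k in range(n_freq):
            # Naive DFT coefficient; fine for a short illustrative segment.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / nperseg)
                        for n, x in enumerate(tapered))
            power = abs(coeff) ** 2 / scale
            if 0 < k < nperseg / 2:
                power *= 2.0  # fold negative frequencies into the one-sided estimate
            psd[k] += power / len(starts)
    freqs = [k * fs / nperseg for k in range(n_freq)]
    return freqs, psd


def band_power(freqs, psd, low, high):
    """Total power between low and high (inclusive), as a rectangle-rule sum."""
    df = freqs[1] - freqs[0]
    return sum(p * df for f, p in zip(freqs, psd) if low <= f <= high)


fs = 50.0  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 2.0 * n / fs) for n in range(500)]  # synthetic 2 Hz "gait"
freqs, psd = welch_psd(signal, fs, nperseg=100)
total_power = band_power(freqs, psd, 0.5, 10.0)  # close to 0.5, the power of a unit sine
dominant_frequency = freqs[psd.index(max(psd))]
```

In practice a library routine with an FFT would replace the naive transform; the sketch only shows how total band power and the dominant frequency follow from the averaged spectrum.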
Subject(s)
Gait/physiology , Monitoring, Physiologic/methods , Motor Disorders/diagnosis , Parkinson Disease/complications , Wearable Electronic Devices/standards , Aged , Female , Humans , Male , Motor Disorders/etiology
ABSTRACT
Motivation: Computational models in biology are frequently underdetermined, due to limits in our capacity to measure biological systems. In particular, mechanistic models often contain parameters whose values are not constrained by a single type of measurement. It may be possible to achieve better model determination by combining the information contained in different types of measurements. Bayesian statistics provides a convenient framework for this, allowing a quantification of the reduction in uncertainty with each additional measurement type. We wished to explore whether such integration is feasible and whether it can allow computational models to be more accurately determined. Results: We created an ordinary differential equation model of cell cycle regulation in budding yeast and integrated data from 13 different studies covering different experimental techniques. We found that for some parameters, a single type of measurement, relative time course mRNA expression, is sufficient to constrain them. Other parameters, however, were only constrained when two types of measurements were combined, namely relative time course and absolute transcript concentration. Comparing the estimates to measurements from three additional, independent studies, we found that the degradation and transcription rates indeed matched the model predictions in order of magnitude. The predicted translation rate was incorrect, however, revealing a deficiency in the model. Since this parameter was not constrained by any of the measurement types separately, it was only possible to falsify the model when integrating multiple types of measurements. In conclusion, this study shows that integrating multiple measurement types can allow models to be more accurately determined. Availability and implementation: The models and files required for running the inference are included in the Supplementary information. Contact: l.wessels@nki.nl. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology/methods , Models, Biological , Bayes Theorem , Saccharomycetales/genetics , Saccharomycetales/metabolism
ABSTRACT
BACKGROUND: An important challenge in Parkinson's disease research is how to measure disease progression, ideally at the individual patient level. The MDS-UPDRS, a clinical assessment of motor and nonmotor impairments, is widely used in longitudinal studies. However, its ability to assess within-subject changes is not well known. The objective of this study was to estimate the reliability of the MDS-UPDRS when used to measure within-subject changes in disease progression under real-world conditions. METHODS: Data were obtained from the Parkinson's Progression Markers Initiative cohort and included repeated MDS-UPDRS measurements from 423 de novo Parkinson's disease patients (median follow-up: 54 months). Subtotals were calculated for parts I, II, and III (in on and off states). In addition, factor scores were extracted from each part. A linear Gaussian state space model was used to differentiate variance introduced by long-lasting changes from variance introduced by measurement error and short-term fluctuations. Based on this, we determined the within-subject reliability of 1-year change scores. RESULTS: Overall, the within-subject reliability ranged from 0.13 to 0.62. Of the subscales, parts II and III (OFF) demonstrated the highest within-subject reliability (both 0.50). Of the factor scores, the scores related to gait/posture (0.62), mobility (0.45), and rest tremor (0.43) showed the most consistent behavior. CONCLUSIONS: Our results highlight that MDS-UPDRS change scores contain a substantial amount of error variance, underscoring the need for more reliable instruments to forward our understanding of the heterogeneity in PD progression. Focusing on gait and rest tremor may be a promising approach for an early Parkinson's disease population. © 2019 The Authors. Movement Disorders published by Wiley Periodicals, Inc. on behalf of International Parkinson and Movement Disorder Society.
Subject(s)
Disability Evaluation , Parkinson Disease/diagnosis , Parkinson Disease/therapy , Tremor/physiopathology , Tremor/therapy , Adult , Aged , Disease Progression , Female , Humans , Linear Models , Longitudinal Studies , Male , Middle Aged , Parkinson Disease/physiopathology , Tremor/diagnosis
ABSTRACT
MOTIVATION: The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable. RESULTS: We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P-value estimation methods have been replaced by exact methods, providing faster and more accurate results. AVAILABILITY AND IMPLEMENTATION: RankProd 2.0 is available at Bioconductor (https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html) and as part of the mzMatch pipeline (http://www.mzmatch.sourceforge.net). CONTACT: rainer.breitling@manchester.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
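The core of the rank product statistic can be sketched in a few lines: each feature is ranked within every replicate, and the statistic is the geometric mean of those ranks. The code below is an illustrative sketch only; it is not the RankProd package API, the function names and data are hypothetical, ties are ignored, and no P-value is computed.

```python
import math


def rank_descending(values):
    """Rank 1 goes to the largest value; ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_products(fold_changes):
    """Geometric mean of per-replicate ranks for every feature.

    fold_changes[r][g] is the log fold change of feature g in replicate r.
    Small rank products point to features that are consistently near the top.
    """
    per_replicate = [rank_descending(replicate) for replicate in fold_changes]
    n_rep = len(per_replicate)
    n_feat = len(fold_changes[0])
    return [
        math.exp(sum(math.log(per_replicate[r][g]) for r in range(n_rep)) / n_rep)
        for g in range(n_feat)
    ]


# Hypothetical log fold changes: 2 replicates, 3 features.
data = [[2.0, 0.1, -1.0],
        [0.2, 1.5, -0.5]]
rp = rank_products(data)
```

Working with the sum of log ranks rather than the raw product avoids overflow when many replicates are combined.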
Subject(s)
Gene Expression Profiling/methods , Metabolomics/methods , Proteomics/methods , Software , Gene Expression
ABSTRACT
BACKGROUND: The Friedman rank sum test is a widely-used nonparametric method in computational biology. In addition to examining the overall null hypothesis of no significant difference among any of the rank sums, it is typically of interest to conduct pairwise comparison tests. Current approaches to such tests rely on large-sample approximations, due to the numerical complexity of computing the exact distribution. These approximate methods lead to inaccurate estimates in the tail of the distribution, which is most relevant for p-value calculation. RESULTS: We propose an efficient, combinatorial exact approach for calculating the probability mass distribution of the rank sum difference statistic for pairwise comparison of Friedman rank sums, and compare exact results with recommended asymptotic approximations. Whereas the chi-squared approximation performs inferiorly to exact computation overall, others, particularly the normal, perform well, except for the extreme tail. Hence exact calculation offers an improvement when small p-values occur following multiple testing correction. Exact inference also enhances the identification of significant differences whenever the observed values are close to the approximate critical value. We illustrate the proposed method in the context of biological machine learning, where Friedman rank sum difference tests are commonly used for the comparison of classifiers over multiple datasets. CONCLUSIONS: We provide a computationally fast method to determine the exact p-value of the absolute rank sum difference of a pair of Friedman rank sums, making asymptotic tests obsolete. Calculation of exact p-values is easy to implement in statistical software and the implementation in R is provided in one of the Additional files and is also available at http://www.ru.nl/publish/pages/726696/friedmanrsd.zip.
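One simple way to obtain the exact null distribution of the rank sum difference is by convolution over blocks, sketched below. This is an illustrative alternative under stated assumptions (no ties, ranks within each block are a uniformly random permutation under the null hypothesis), not the authors' combinatorial formula or their R implementation; the function names are hypothetical.

```python
from fractions import Fraction


def rank_sum_difference_pmf(k, n):
    """Exact null pmf of R_i - R_j for k treatments ranked within n blocks.

    Within one block, the rank difference d between two treatments takes each
    nonzero value in -(k-1)..(k-1) with probability (k - |d|) / (k * (k - 1)).
    Blocks are independent, so the n-block pmf is an n-fold convolution.
    """
    block = {d: Fraction(k - abs(d), k * (k - 1))
             for d in range(-(k - 1), k) if d != 0}
    pmf = {0: Fraction(1)}
    for _ in range(n):
        updated = {}
        for total, p in pmf.items():
            for d, q in block.items():
                updated[total + d] = updated.get(total + d, Fraction(0)) + p * q
        pmf = updated
    return pmf


def exact_p_value(k, n, observed_abs_difference):
    """Two-sided exact p-value: P(|R_i - R_j| >= observed) under the null."""
    pmf = rank_sum_difference_pmf(k, n)
    return sum(p for d, p in pmf.items() if abs(d) >= observed_abs_difference)


# Example: 2 treatments, 3 blocks, one treatment ranked first in every block.
p = exact_p_value(2, 3, 3)
```

Using `Fraction` keeps the probabilities exact, which matters in the extreme tail where floating-point rounding and asymptotic approximations are least reliable.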
Subject(s)
User-Computer Interface , Computational Biology/methods , Humans , Internet , Statistics, Nonparametric
ABSTRACT
Functional connectivity concerns the correlated activity between neuronal populations in spatially segregated regions of the brain, which may be studied using functional magnetic resonance imaging (fMRI). This coupled activity is conveniently expressed using covariance, but this measure fails to distinguish between direct and indirect effects. A popular alternative that addresses this issue is partial correlation, which regresses out the signal of potentially confounding variables, resulting in a measure that reveals only direct connections. Importantly, provided the data are normally distributed, if two variables are conditionally independent given all other variables, their respective partial correlation is zero. In this paper, we propose a probabilistic generative model that allows us to estimate functional connectivity in terms of both partial correlations and a graph representing conditional independencies. Simulation results show that this methodology is able to outperform the graphical LASSO, which is the de facto standard for estimating partial correlations. Furthermore, we apply the model to estimate functional connectivity for twenty subjects using resting-state fMRI data. Results show that our model provides a richer representation of functional connectivity as compared to considering partial correlations alone. Finally, we demonstrate how our approach can be extended in several ways, for instance to achieve data fusion by informing the conditional independence graph with data from probabilistic tractography. As our Bayesian formulation of functional connectivity provides access to the posterior distribution instead of only to point estimates, we are able to quantify the uncertainty associated with our results. This reveals that while we are able to infer a clear backbone of connectivity in our empirical results, the data are not accurately described by simply looking at the mode of the distribution over connectivity. 
The implication of this is that deterministic alternatives may misjudge connectivity results by drawing conclusions from noisy and limited data.
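The difference between correlation and partial correlation can be illustrated with the standard first-order formula for three variables. The sketch below is a textbook calculation, not the generative model proposed in the paper; the correlation values are hypothetical and chosen so that regions x and y are related only through a third region z.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after regressing out a third variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations: x and y are both driven by z and share nothing else,
# so their marginal correlation equals the product r_xz * r_yz.
r_xz, r_yz = 0.8, 0.5
r_xy = r_xz * r_yz
direct = partial_correlation(r_xy, r_xz, r_yz)  # approximately 0: no direct link
```

The marginal correlation of 0.4 would suggest a connection between x and y, whereas the partial correlation shows that it is fully explained by the indirect path through z.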
Subject(s)
Brain/physiology , Magnetic Resonance Imaging/methods , Models, Neurological , Nerve Net/physiology , Bayes Theorem , Computational Biology , Connectome/methods , Humans
ABSTRACT
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
Subject(s)
Computational Biology/methods , Databases, Genetic , Genome-Wide Association Study/methods , Software , Computer Simulation , Crohn Disease/genetics , Humans , Models, Genetic
ABSTRACT
Attention-deficit/hyperactivity disorder (ADHD) is a common and highly heritable disorder affecting both children and adults. One of the candidate genes for ADHD is DAT1, encoding the dopamine transporter. In an attempt to clarify its mode of action, we assessed brain activity during the reward anticipation phase of the Monetary Incentive Delay (MID) task in a functional MRI paradigm in 87 adult participants with ADHD and 77 controls (average age 36.5 years). The MID task activates the ventral striatum, where DAT1 is most highly expressed. A previous analysis based on standard statistical techniques did not show any significant dependencies between a variant in the DAT1 gene and brain activation [Hoogman et al. (2013); Neuropsychopharm 23:469-478]. Here, we used an alternative method for analyzing the data, that is, causal modeling. The Bayesian Constraint-based Causal Discovery (BCCD) algorithm [Claassen and Heskes (2012); Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence] is able to find direct and indirect dependencies between variables, determines the strength of the dependencies, and provides a graphical visualization to interpret the results. Through BCCD one gets an opportunity to consider several variables together and to infer causal relations between them. Application of the BCCD algorithm confirmed that there is no evidence of a direct link between DAT1 genetic variability and brain activation, but suggested an indirect link mediated through inattention symptoms and diagnostic status of ADHD. Our finding of an indirect link of DAT1 with striatal activity during reward anticipation might explain existing discrepancies in the current literature. Further experiments should confirm this hypothesis. © 2015 Wiley Periodicals, Inc.
ABSTRACT
BACKGROUND: The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Exact calculation as well as permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. RESULTS: We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy than existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. CONCLUSIONS: We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issues. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .
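To make the rank product statistic concrete, the following minimal sketch computes it for a tiny hypothetical data set and obtains an exact p-value by brute-force enumeration. It is not the bounds or the approximation derived by the authors (their R code is at the URL above); the function names and toy values are assumptions, and the null model assumed here is that each replicate's rank is independently uniform on 1..n.

```python
import itertools
import math

def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def rank_products(replicates):
    """replicates: k lists, each holding one measurement per molecule."""
    per_replicate = [ranks_descending(rep) for rep in replicates]
    n = len(replicates[0])
    return [math.prod(r[g] for r in per_replicate) for g in range(n)]

def exact_p_value(rho, n, k):
    """P(rank product <= rho) under the assumed null, by full enumeration.

    The loop visits n**k rank combinations, which is why exact
    calculation becomes burdensome for realistic n and k.
    """
    hits = sum(
        1
        for combo in itertools.product(range(1, n + 1), repeat=k)
        if math.prod(combo) <= rho
    )
    return hits / n ** k

# Hypothetical log fold changes: 3 replicates, 4 molecules.
data = [
    [2.1, 0.3, 1.2, -0.5],
    [1.8, 0.9, 0.4, -0.2],
    [2.5, 0.1, 0.8, -1.0],
]
rp = rank_products(data)
for molecule, rho in enumerate(rp):
    print(molecule, rho, exact_p_value(rho, n=4, k=3))
```

The enumeration cost grows as n to the power k, so this approach is only usable for toy sizes; it serves to show what the fast bounds and approximation described in the abstract are standing in for.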
Subject(s)
Aging/genetics , Algorithms , Biomarkers/analysis , Gene Expression Profiling/methods , Databases, Genetic , Humans , Probability
ABSTRACT
BACKGROUND: High-throughput screening (HTS) produces thousands of images containing millions of cells. Biologists could classify each of these cells into a phenotype by visual inspection, but with millions of cells this visual classification task becomes infeasible. Instead, biologists train classification models on a few thousand visually classified example cells and iteratively improve the training data by visual inspection of the important misclassified phenotypes. Classification methods differ in performance and performance evaluation time. We present a comparative study of computational performance of gentle boosting, joint boosting CellProfiler Analyst (CPA), support vector machines (linear and radial basis function) and linear discriminant analysis (LDA) on two data sets of HT29 and HeLa cancer cells. RESULTS: For the HT29 data set we find that gentle boosting, SVM (linear) and SVM (RBF) are close in performance, but SVM (linear) is faster than gentle boosting and SVM (RBF). For the HT29 data set the average performance difference between SVM (RBF) and SVM (linear) is 0.42%. For the HeLa data set we find that SVM (RBF) outperforms the other classification methods and is on average 1.41% better in performance than SVM (linear). CONCLUSIONS: Our study proposes SVM (linear) for iterative improvement of the training data and SVM (RBF) for the final classifier to classify all unlabeled cells in the whole data set.