ABSTRACT
DNA methylation is an important epigenetic mark that modulates gene expression through the inhibition of transcriptional proteins binding to DNA. As in many other omics experiments, missing values are a common problem, and appropriate imputation techniques are needed both to avoid an unnecessary reduction in sample size and to make the best use of the information collected. We consider the case where relatively few samples are processed via an expensive high-density whole genome bisulfite sequencing (WGBS) strategy and a larger number of samples is processed using more affordable low-density, array-based technologies. In such cases, one can impute the low-coverage (array-based) methylation data using the high-density information provided by the WGBS samples. In this paper, we propose an efficient Linear Model of Coregionalisation with informative Covariates (LMCC) to predict missing values based on observed values and covariates. Our model assumes that, at each site, the methylation vector of all samples is linked to a set of fixed factors (covariates) and a set of latent factors. Furthermore, we exploit the functional nature of the data and the spatial correlation across sites by assuming Gaussian processes on the fixed and latent coefficient vectors, respectively. Our simulations show that the use of covariates can significantly improve the accuracy of imputed values, especially in cases where missing data contain some relevant information about the explanatory variable. We also show that our proposed model is particularly efficient when the number of columns is much greater than the number of rows, which is usually the case in methylation data analysis. Finally, we apply our proposed method to two real methylation datasets and compare it with alternative approaches, showing how covariates such as cell type, tissue type or age can enhance the accuracy of imputed values.
Subjects
DNA Methylation , Genetic Epigenesis , DNA Methylation/genetics , Humans , Statistical Models , Epigenomics/methods , Biostatistics/methods
ABSTRACT
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."
Subjects
Algorithms , Computer Simulation , DNA Methylation , Statistical Models , Humans , Multivariate Analysis , Rheumatoid Arthritis/genetics , Likelihood Functions , Sulfites/chemistry , DNA Sequence Analysis/methods
ABSTRACT
Electroencephalography measures are of interest in developmental neuroscience as potentially reliable clinical markers of brain function. Features extracted from electroencephalography are most often averaged across individuals in a population with a particular condition and compared statistically to the mean of a typically developing group, or a group with a different condition, to define whether a feature is representative of the populations as a whole. However, there can be large variability within a population, and electroencephalography features often change dramatically with age, making comparisons difficult. Together with the often low numbers of trials and low signal-to-noise ratios in pediatric populations, this makes establishing biomarkers difficult in practice. One approach is to identify electroencephalography features that are less variable between individuals and are relatively stable in a healthy population during development. To identify such features in resting-state electroencephalography, which can be readily measured in many populations, we introduce an innovative application of statistical measures of variance for the analysis of resting-state electroencephalography data. Using these statistical measures, we quantified electroencephalography features commonly used to measure brain development (including power, connectivity, phase-amplitude coupling, entropy, and fractal dimension) according to their intersubject variability. Results from 51 6-month-old infants revealed that the complexity measures, including fractal dimension and entropy, followed by connectivity, were the least variable features across participants. This stability was greatest in the right parietotemporal region for both complexity features, but no significant region of interest was found for the connectivity feature.
This study deepens our understanding of physiological patterns of electroencephalography data in developing brains, provides an example of how statistical measures can be used to analyze variability in resting-state electroencephalography in a homogeneous group of healthy infants, contributes to the establishment of robust electroencephalography biomarkers of neurodevelopment through the application of variance analyses, and reveals that nonlinear measures may be the most relevant biomarkers of neurodevelopment.
Subjects
Brain , Electroencephalography , Child , Humans , Infant , Electroencephalography/methods , Brain/physiology , Entropy , Biomarkers
ABSTRACT
BACKGROUND: The regulation of the circadian clock genes, which coordinate the activity of the immune system, is disturbed in inflammatory bowel disease (IBD). Emerging evidence suggests that butyrate, a short-chain fatty acid produced by the gut microbiota, is involved in the regulation of inflammatory responses as well as circadian-clock genes. This study was conducted to investigate the effects of sodium-butyrate supplementation on the expression of circadian-clock genes, inflammation, sleep and life quality in active ulcerative colitis (UC) patients. METHODS: In the current randomized placebo-controlled trial, 36 active UC patients were randomly assigned to receive sodium-butyrate (600 mg/kg) or placebo for 12 weeks. The expression of circadian clock genes (CRY1, CRY2, PER1, PER2, BMAL1 and CLOCK) was assessed by real-time quantitative polymerase chain reaction (qPCR) in whole blood. Gene expression changes were presented as fold changes in expression (2^-ΔΔCT) relative to the baseline. Faecal calprotectin and the serum level of high-sensitivity C-reactive protein (hs-CRP) were assessed by enzyme-linked immunosorbent assay (ELISA). Moreover, sleep quality and IBD quality of life (QoL) were assessed by the Pittsburgh sleep quality index (PSQI) and the inflammatory bowel disease questionnaire-9 (IBDQ-9), respectively, before and after the intervention. RESULTS: The results showed that sodium-butyrate supplementation, in comparison with placebo, significantly decreased the level of calprotectin (-133.82 ± 155.62 vs. 51.58 ± 95.57, P-value < 0.001) and hs-CRP (-0.36 (-1.57, -0.05) vs. 0.48 (-0.09, 4.77), P-value < 0.001) and upregulated the fold change expression of CRY1 (2.22 ± 1.59 vs. 0.63 ± 0.49, P-value < 0.001), CRY2 (2.15 ± 1.26 vs. 0.93 ± 0.80, P-value = 0.001), PER1 (1.86 ± 1.77 vs. 0.65 ± 0.48, P-value = 0.005) and BMAL1 (1.85 ± 0.97 vs. 0.86 ± 0.63, P-value = 0.003). Also, sodium-butyrate caused an improvement in sleep quality (PSQI score: -2.94 ± 3.50 vs. 1.16 ± 3.61, P-value < 0.001) and QoL (IBDQ-9: 17.00 ± 11.36 vs. -3.50 ± 6.87, P-value < 0.001). CONCLUSION: Butyrate may be an effective adjunct treatment for active UC patients by reducing biomarkers of inflammation, upregulating circadian-clock genes and improving sleep quality and QoL.
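The fold-change measure named in the methods, 2^-ΔΔCT, can be illustrated with a short sketch. It assumes the common convention of normalising the target gene to a reference (housekeeping) gene; the abstract does not name a reference gene, and the function and argument names below are hypothetical.

```python
def fold_change(ct_target_post, ct_ref_post, ct_target_base, ct_ref_base):
    """Relative expression 2^-ddCT of a target gene versus baseline.

    Assumption: each CT value is normalised to a reference gene measured
    in the same sample. The abstract does not say which gene was used.
    """
    delta_post = ct_target_post - ct_ref_post   # dCT after the intervention
    delta_base = ct_target_base - ct_ref_base   # dCT at baseline
    delta_delta = delta_post - delta_base       # ddCT
    return 2.0 ** (-delta_delta)


# Illustrative values only: the target crosses threshold one cycle earlier
# after the intervention, so expression has doubled relative to baseline.
print(fold_change(24.0, 18.0, 25.0, 18.0))
```

A value above 1 indicates upregulation relative to baseline and a value below 1 indicates downregulation, which is how the reported group means (for example 2.22 vs. 0.63 for CRY1) can be read.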
Subjects
Ulcerative Colitis , Dietary Supplements , Sleep Quality , Humans , Ulcerative Colitis/drug therapy , Ulcerative Colitis/genetics , Ulcerative Colitis/metabolism , Male , Female , Adult , Double-Blind Method , Middle Aged , Inflammation/genetics , Inflammation/drug therapy , C-Reactive Protein/metabolism , C-Reactive Protein/genetics , Quality of Life , Circadian Clocks/genetics , Circadian Clocks/drug effects , Leukocyte L1 Antigen Complex/genetics , Leukocyte L1 Antigen Complex/metabolism , Gene Expression Regulation/drug effects , Butyrates , Butyric Acid
ABSTRACT
Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.
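The splitting rule described above can be sketched for a single covariate as follows. This is an illustration under stated assumptions, not the CovRegRF package code: the distance between child-node covariance matrices is taken as the Euclidean distance over their upper triangles, and the sqrt(n_left * n_right) weighting and all function names are assumptions made here.

```python
import math


def sample_cov(rows):
    """Unbiased sample covariance matrix of a list of response vectors."""
    n, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(q)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(q)] for i in range(q)]


def split_score(left, right):
    """Size-weighted distance between the two child covariance matrices.

    Assumption: Euclidean distance over the upper triangle (including the
    diagonal), weighted by sqrt(n_left * n_right).
    """
    cov_l, cov_r = sample_cov(left), sample_cov(right)
    q = len(cov_l)
    dist = math.sqrt(sum((cov_l[i][j] - cov_r[i][j]) ** 2
                         for i in range(q) for j in range(i, q)))
    return math.sqrt(len(left) * len(right)) * dist


def best_split(x, y, min_node_size=2):
    """Scan thresholds on one covariate x; return (threshold, score)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for cut in range(min_node_size, len(x) - min_node_size + 1):
        lo, hi = x[order[cut - 1]], x[order[cut]]
        if lo == hi:
            continue  # no threshold separates tied covariate values
        left = [y[i] for i in order[:cut]]
        right = [y[i] for i in order[cut:]]
        score = split_score(left, right)
        if best is None or score > best[1]:
            best = ((lo + hi) / 2.0, score)
    return best
```

In a forest, a search of this kind would be repeated over randomly chosen covariates at every node; the sketch covers only the node-level criterion.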
Subjects
Statistical Models , Random Forest , Child , Humans , Computer Simulation
ABSTRACT
BACKGROUND: Attention deficit/hyperactivity disorder (ADHD) is a highly prevalent childhood disorder. Maternal smoking during pregnancy is a replicated environmental risk factor for this disorder. It is also a robust modifier of gene methylation during the prenatal developmental period. In this study, we sought to identify loci differentially methylated by maternal smoking during pregnancy and relate their methylation levels to various behavioural and physical outcomes relevant to ADHD. METHODS: We extracted DNA from blood samples from deeply phenotyped children diagnosed with ADHD. Genome-wide DNA methylation was assessed using the Infinium MethylationEPIC BeadChip. Maternal smoking during pregnancy was self-declared and assessed retrospectively. RESULTS: Our sample included 231 children with ADHD. Statistically significant differences in DNA methylation between children who were and were not exposed to maternal smoking during pregnancy were detected in 3457 CpGs. We kept 30 CpGs with a methylation difference of at least 5% between the 2 groups for further analysis. Six genes were associated with varied phenotypes of clinical relevance to ADHD. The levels of DNA methylation in RUNX1 were positively correlated with the CBCL scores, and DNA methylation in MYO1G correlated positively with scores on the Conners rating scale. The methylation level of a CpG located in GFI1 correlated with birth weight, a risk factor for ADHD. Differentially methylated regions were also identified and confirmed the association of RUNX1 methylation levels with the CBCL score. LIMITATIONS: The study has several limitations, including the retrospective recall with self-report of maternal smoking during pregnancy as well as the grouping of individuals of varying age and developmental stage and of both males and females. In addition, the correlational design prevents the building of causation models.
CONCLUSION: This study provides evidence for the association between the level of methylation at specific loci and quantitative dimensions highly relevant for ADHD as well as birth weight, a measure that has already been associated with increased risk for ADHD. Our results provide further support to public health educational initiatives to stop maternal smoking during pregnancy.
Subjects
Attention Deficit Disorder with Hyperactivity , Prenatal Exposure Delayed Effects , Male , Pregnancy , Child , Female , Humans , Attention Deficit Disorder with Hyperactivity/genetics , Retrospective Studies , Core Binding Factor Alpha 2 Subunit/genetics , Smoking/genetics , Smoking/adverse effects , DNA Methylation , Birth Weight/genetics , Phenotype , Prenatal Exposure Delayed Effects/genetics
ABSTRACT
MOTIVATION: Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. RESULTS: We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data. AVAILABILITY AND IMPLEMENTATION: RFCCA is implemented in a freely available R package on CRAN (https://CRAN.R-project.org/package=RFCCA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Genomic copy number variants (CNVs) are routinely identified and reported back to patients with neuropsychiatric disorders, but their quantitative effects on essential traits such as cognitive ability are poorly documented. We have recently shown that the effect size of deletions on cognitive ability can be statistically predicted using measures of intolerance to haploinsufficiency. However, the effect sizes of duplications remain unknown. It is also unknown whether the effects of multigenic CNVs are driven by a few genes intolerant to haploinsufficiency or are distributed across tolerant genes as well. Here, we identified all CNVs > 50 kilobases in 24,092 individuals from unselected and autism cohorts with assessments of general intelligence. Statistical models used measures of intolerance to haploinsufficiency of genes included in CNVs to predict their effect size on intelligence. Intolerant genes decrease general intelligence by 0.8 and 2.6 points of intelligence quotient when duplicated or deleted, respectively. Effect sizes showed no heterogeneity across cohorts. Validation analyses demonstrated that models could predict CNV effect sizes with 78% accuracy. Data on the inheritance of 27,766 CNVs showed that deletions and duplications with the same effect size on intelligence occur de novo at the same frequency. We estimated that around 10,000 intolerant and tolerant genes negatively affect intelligence when deleted, and fewer than 2% have large effect sizes. Genes encompassed in CNVs were not enriched in any GO terms, but gene regulation and brain expression were GO terms overrepresented in the intolerant subgroup. Such pervasive effects on cognition may be related to emergent properties of the genome not restricted to a limited number of biological pathways.
Subjects
DNA Copy Number Variations , Genome , Cognition , DNA Copy Number Variations/genetics , Gene Dosage , Humans , Intelligence Tests
ABSTRACT
In the last few years, a significant amount of work has aimed to characterize maturational trajectories of cortical development. The role of pericortical microstructure putatively characterized as the gray-white matter contrast (GWC) at the pericortical gray-white matter boundary and its relationship to more traditional morphological measures of cortical morphometry has emerged as a means to examine finer grained neuroanatomical underpinnings of cortical changes. In this work, we characterize the GWC developmental trajectories in a representative sample (n = 394) of children and adolescents (~4 to ~22 years of age), with repeated scans (1-3 scans per subject, total scans n = 819). We tested whether linear, quadratic, or cubic trajectories of contrast development best described changes in GWC. A best-fit model was identified vertex-wise across the whole cortex via the Akaike Information Criterion (AIC). GWC across nearly the whole brain was found to significantly change with age. Cubic trajectories were likeliest for 63% of vertices, quadratic trajectories were likeliest for 20% of vertices, and linear trajectories were likeliest for 16% of vertices. A main effect of sex was observed in some regions, where males had a higher GWC than females. However, no sex by age interactions were found on GWC. In summary, our results suggest a progressive decrease in GWC at the pericortical boundary throughout childhood and adolescence. This work contributes to efforts seeking to characterize typical, healthy brain development and, by extension, can help elucidate aberrant developmental trajectories.
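The vertex-wise choice between linear, quadratic, and cubic trajectories by AIC can be sketched for a single vertex as below. This is a simplified illustration: it uses ordinary least squares and therefore ignores the repeated scans per subject, which the abstract does not say how the study handled. The Gaussian AIC form, the age rescaling, and all function names are assumptions made here.

```python
import math


def scale(x):
    """Rescale ages to [-1, 1] to keep the normal equations well conditioned."""
    lo, hi = min(x), max(x)
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    return [(xi - mid) / half for xi in x]


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    m = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def aic(x, y, degree):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    xs = scale(x)
    coef = fit_polynomial(xs, y, degree)
    rss = sum((yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
              for xi, yi in zip(xs, y))
    k = degree + 2  # polynomial coefficients plus the residual variance
    return len(x) * math.log(rss / len(x)) + 2 * k


def best_degree(x, y, degrees=(1, 2, 3)):
    """Return the polynomial degree with the lowest AIC."""
    return min(degrees, key=lambda d: aic(x, y, d))
```

Applying `best_degree` to each vertex in turn would give a map of the preferred trajectory shape; with longitudinal data a mixed-effects fit would be the more appropriate model inside `aic`.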
Subjects
Cerebral Cortex , Gray Matter , Human Development , White Matter , Adolescent , Adult , Cerebral Cortex/anatomy & histology , Cerebral Cortex/diagnostic imaging , Cerebral Cortex/growth & development , Child , Preschool Child , Female , Gray Matter/anatomy & histology , Gray Matter/diagnostic imaging , Gray Matter/growth & development , Human Development/physiology , Humans , Longitudinal Studies , Magnetic Resonance Imaging , Male , Sex Factors , White Matter/anatomy & histology , White Matter/diagnostic imaging , White Matter/growth & development , Young Adult
ABSTRACT
MOTIVATION: The human microbiota is the collection of microorganisms colonizing the human body, and plays an integral part in human health. A growing trend in microbiome analysis is to construct a network to estimate the co-occurrence patterns among taxa through precision matrices. Existing methods do not facilitate investigation into how these networks change with respect to covariates. RESULTS: We propose a new model called Microbiome Differential Network Estimation (MDiNE) to estimate network changes with respect to a binary covariate. The counts of individual taxa in the samples are modeled through a multinomial distribution whose probabilities depend on a latent Gaussian random variable. A sparse precision matrix over all the latent terms determines the co-occurrence network among taxa. The model fit is obtained and evaluated using Hamiltonian Monte Carlo methods. The performance of our model is evaluated through an extensive simulation study and is shown to outperform existing methods in terms of estimation of network parameters. We also demonstrate an application of the model to estimate changes in the intestinal microbial network topology with respect to Crohn's disease. AVAILABILITY AND IMPLEMENTATION: MDiNE is implemented in a freely available R package: https://github.com/kevinmcgregor/mdine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Microbiota , Humans , Microbial Consortia , Monte Carlo Method , Normal Distribution , Probability
ABSTRACT
Identifying disease-associated changes in DNA methylation can help us gain a better understanding of disease etiology. Bisulfite sequencing allows the generation of high-throughput methylation profiles at single-base resolution of DNA. However, optimally modeling and analyzing these sparse and discrete sequencing data is still very challenging due to variable read depth, missing data patterns, long-range correlations, data errors, and confounding from cell type mixtures. We propose a regression-based hierarchical model that allows covariate effects to vary smoothly along genomic positions and we have built a specialized EM algorithm, which explicitly allows for experimental errors and cell type mixtures, to make inference about smooth covariate effects in the model. Simulations show that the proposed method provides accurate estimates of covariate effects and captures the major underlying methylation patterns with excellent power. We also apply our method to analyze data from rheumatoid arthritis patients and controls. The method has been implemented in R package SOMNiBUS.
Subjects
DNA Methylation , High-Throughput Nucleotide Sequencing , DNA Methylation/genetics , Humans , DNA Sequence Analysis , Sulfites
ABSTRACT
Neurofilament light (NfL) is a marker of neuroaxonal injury, a prominent feature of Alzheimer's disease. It remains uncertain, however, how it relates to amyloid and tau pathology or neurodegeneration across the Alzheimer's disease continuum. The aim of this study was to investigate how plasma NfL relates to amyloid and tau PET and MRI measures of brain atrophy in participants with and without cognitive impairment. We retrospectively examined the association between plasma NfL and MRI measures of grey/white matter volumes in the Alzheimer's Disease Neuroimaging Initiative [ADNI: n = 1149; 382 cognitively unimpaired control subjects and 767 cognitively impaired participants (mild cognitive impairment n = 420, Alzheimer's disease dementia n = 347)]. Longitudinal plasma NfL was measured using single molecule array (Simoa) technology. Cross-sectional associations between plasma NfL and PET amyloid and tau measures were independently assessed in two cohorts: ADNI [n = 198; 110 cognitively unimpaired, 88 cognitively impaired (MCI n = 67, Alzheimer's disease dementia n = 21), data accessed October 2018]; and Translational Biomarkers in Aging and Dementia [TRIAD, n = 116; 74 cognitively unimpaired, 42 cognitively impaired (MCI n = 16, Alzheimer's disease dementia n = 26), data obtained November 2017 to January 2019]. Associations between plasma NfL and imaging-derived measures were examined voxel-wise using linear regression (cross-sectional) and linear mixed effect models (longitudinal). Cross-sectional analyses in both cohorts showed that plasma NfL was associated with PET findings in brain regions typically affected by Alzheimer's disease; associations were specific to amyloid PET in cognitively unimpaired and tau PET in cognitively impaired (P < 0.05). Longitudinal analyses showed that NfL levels were associated with grey/white matter volume loss; grey matter atrophy in cognitively unimpaired was specific to APOE ε4 carriers (P < 0.05). 
These findings suggest that plasma NfL increases in response to amyloid-related neuronal injury in preclinical stages of Alzheimer's disease, but is related to tau-mediated neurodegeneration in symptomatic patients. As such, plasma NfL may be a useful measure to monitor effects in disease-modifying drug trials.
Subjects
Alzheimer Disease/blood , Alzheimer Disease/diagnostic imaging , Biomarkers/blood , Neurofilament Proteins/blood , Aged , Aged 80 and over , Aging/psychology , Amyloid beta-Peptides/blood , Apolipoprotein E4/genetics , Cognitive Dysfunction/blood , Cognitive Dysfunction/diagnostic imaging , Cohort Studies , Cross-Sectional Studies , Disease Progression , Female , Gray Matter/diagnostic imaging , Humans , Magnetic Resonance Imaging , Male , Positron-Emission Tomography , White Matter/diagnostic imaging , tau Proteins/blood
ABSTRACT
DNA methylation allows for the environmental regulation of gene expression and is believed to link environmental stressors to psychiatric disorder phenotypes, such as anorexia nervosa (AN). The oxytocin receptor (OXTR) gene is epigenetically regulated, and studies have shown associations between OXTR and social behaviours in various samples, including women with AN. The present study examined differential levels of methylation at various CG sites of the OXTR gene in 69 women with active AN (AN-Active), 21 in whom AN was in remission (AN-Rem) and 35 with no eating disorder (NED). Within each group, we explored the correlation between methylation and measures of social behaviour such as insecure attachment and social avoidance. Hypermethylation of a number of CG sites was seen in AN-Active participants as compared with AN-Rem and NED participants. In the AN-Rem sample, methylation at CG27501759 was significantly positively correlated with insecure attachment (r = .614, p = .003, permutation Q = 0.008) and social avoidance (r = .588, p = .005, permutation Q = 0.0184). Our results highlight differential methylation of the OXTR gene among women with AN, those in remission from AN, and those who never had AN and provide some evidence of associations between OXTR methylation and social behaviour in women remitted from AN.
Subjects
Anorexia Nervosa/genetics , Anorexia Nervosa/psychology , DNA Methylation , Oxytocin Receptors/genetics , Social Behavior , Adult , Female , Humans , Young Adult
ABSTRACT
Background: This study explored state-related tendencies in DNA methylation in people with anorexia nervosa. Methods: We measured genome-wide DNA methylation in 75 women with active anorexia nervosa (active), 31 women showing stable remission of anorexia nervosa (remitted) and 41 women with no eating disorder (NED). We also obtained post-intervention methylation data from 52 of the women from the active group. Results: Comparisons between members of the active and NED groups showed 58 differentially methylated sites (Q < 0.01) that corresponded to genes relevant to metabolic and nutritional status (lipid and glucose metabolism), psychiatric status (serotonin receptor activity) and immune function. Methylation levels in members of the remitted group differed from those in the active group on 265 probes that also involved sites associated with genes for serotonin and insulin activity, glucose metabolism and immunity. Intriguingly, the direction of methylation effects in remitted participants tended to be opposite to that seen in active participants. The chronicity of illness correlated (usually inversely, at Q < 0.01) with methylation levels at 64 sites that mapped onto genes regulating glutamate and serotonin activity, insulin function and epigenetic age. In contrast, body mass index increases coincided (at Q < 0.05) with generally increased methylation-level changes at 73 probes associated with lipid and glucose metabolism, immune and inflammatory processes, and olfaction. Limitations: Sample sizes were modest for this type of inquiry, and findings may have been subject to uncontrolled effects of medication and substance use. Conclusion: Findings point to the possibility of reversible epigenetic alterations in anorexia nervosa, and suggest that an adequate pathophysiological model would likely need to include psychiatric, metabolic and immune components.
Subjects
Anorexia Nervosa/genetics , Anorexia Nervosa/physiopathology , DNA Methylation/genetics , Epigenome/genetics , Adolescent , Adult , Anorexia Nervosa/therapy , Chronic Disease , Female , Humans , Longitudinal Studies , Middle Aged , Remission Induction , Young Adult
ABSTRACT
DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. Specifically, a few available methods do identify differentially methylated CpG (DMC) sites or regions (DMR), but they suffer from limitations that arise mostly due to challenges inherent in bisulfite sequencing data. These challenges include: (1) read depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change as regions change; and (3) CpG sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools is capable of comparing multiple groups and/or working with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest among researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on Hidden Markov Models (HMMs) called "DMCHMM", a three-step approach (model selection, prediction, testing) that aims to address the aforementioned drawbacks. Our proposed method differs from other HMM methods in that it profiles the methylation of each sample separately, hence exploiting inter-CpG autocorrelation within samples, and it is more flexible than previous approaches because it allows multiple hidden states. Using simulations, we show that DMCHMM has the best performance among several competing methods. An analysis of cell-separated blood methylation profiles is also provided.
Subjects
CpG Islands/genetics , DNA Methylation , Markov Chains , Sulfites , Algorithms , Animals , Binding Sites , Blood Cells/metabolism , Computer Simulation/economics , Computer Simulation/statistics & numerical data , Humans , DNA Sequence Analysis/methods
ABSTRACT
BACKGROUND: Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. Although in practice, Bayesian and bootstrap-based clustering tend to lead to similar estimates, they often produce conflicting measures of confidence in clusters. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus (Dirichlet-Multinomial Phylogenetic Clustering), that identifies sets of sequences resulting from quick transmission chains, thus yielding easily-interpretable clusters, without using any ad hoc distance or confidence requirement. RESULTS: Simulations reveal that DM-PhyClus can outperform conventional clustering methods, as well as the Gap procedure, a pure distance-based algorithm, in terms of mean cluster recovery. We apply DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters whose inference is in line with the conclusions of a previous thorough analysis. CONCLUSIONS: DM-PhyClus, by eliminating the need for cutpoints and producing sensible inference for cluster configurations, can facilitate transmission cluster detection. Future efforts to reduce incidence of infectious diseases, like HIV-1, will need reliable estimates of transmission clusters. It follows that algorithms like DM-PhyClus could serve to better inform public health strategies.
Subjects
Algorithms, HIV Infections/transmission, Bayes Theorem, Cluster Analysis, HIV Infections/pathology, HIV Infections/virology, HIV-1/classification, HIV-1/genetics, HIV-1/isolation & purification, Male Homosexuality, Humans, Male, Phylogeny, Software
Abstract
PURPOSE: The epilepsy clinic at the Montreal Neurological Institute receives a high volume of referrals. Although most patients assessed in the clinic are eventually diagnosed with epilepsy, other disorders causing alteration of consciousness or paroxysmal symptoms that could be misdiagnosed as seizures are seen frequently. The incidence and clinical characteristics of such patients have not yet been determined. We aimed to determine the proportion and clinical characteristics of patients referred to our epilepsy clinic who had a final diagnosis other than epilepsy. METHODS: We performed a retrospective chart analysis of consecutive patient referrals to the epilepsy clinic from January 2013 to January 2015, inclusive. RESULTS: Of the 404 patient referrals evaluated, 106 (26%) had a final diagnosis other than epilepsy. Referrals came primarily from general practitioners and nonneurology specialists. Although most patients had a normal routine electroencephalogram (EEG) prior to the clinic visit, sleep-deprived EEG and cardiac investigations were rarely performed. Patients received a final diagnosis other than epilepsy after 1 to 2 visits in 92% of cases and with minimal paraclinical investigations. Prolonged video-EEG recording was required in 27% of patients. The most common diagnoses were syncope (33%) and psychiatric symptoms (20%), followed by migraine (10%) and psychogenic nonepileptic seizures (9%). CONCLUSIONS: A significant proportion of patients seen in our tertiary care epilepsy clinic do not, in fact, have epilepsy. Enhanced knowledge of these differential diagnoses and of the anamnesis components important for ruling out seizures will help improve guidelines for referral to the epilepsy clinic and cost-effectively optimize the use of paraclinical investigations.
Subjects
Mental Disorders/epidemiology, Migraine Disorders/diagnosis, Referral and Consultation/statistics & numerical data, Syncope/diagnosis, Adolescent, Ambulatory Care, Ambulatory Care Facilities, Canada/epidemiology, Consciousness, Differential Diagnosis, Diagnostic Errors, Electroencephalography/adverse effects, Epilepsy/diagnosis, Epilepsy/epidemiology, Female, Humans, Male, Mental Disorders/diagnosis, Mental Disorders/psychology, Migraine Disorders/epidemiology, Retrospective Studies, Seizures/psychology, Sleep Deprivation/complications, Syncope/epidemiology, Video Recording
Abstract
Transmission ratio distortion (TRD) is a phenomenon in which parental transmission of a disease allele to the child does not follow the Mendelian inheritance ratio. TRD occurs in a sex-of-parent-specific or non-sex-of-parent-specific manner. An offset computed from the transmission probability of the minor allele in control-trios can be added to the loglinear model to adjust for TRD. Adjusting the model removes the inflation in the genotype relative risk (RR) estimate and Type 1 error introduced by non-sex-of-parent-specific TRD. We now propose to further extend this model to estimate an imprinting parameter. Some evidence suggests that more than 1% of all mammalian genes are imprinted. In the presence of imprinting, for example, the offspring inheriting an over-transmitted disease allele from the parent with a higher expression level in a neighboring gene is over-represented in the sample. TRD mechanisms such as meiotic drive and gametic competition occur in a sex-of-parent-specific manner. Therefore, sex-of-parent-specific TRD (ST) leads to over-representation of maternal or paternal alleles in the affected child. As a result, ST, when present in the sample, may bias the estimate of the imprinting effect. We propose a sex-of-parent-specific transmission offset in adjusting the loglinear model to account for ST. This extended model restores the correct RR estimates for child and imprinting effects, adjusts for inflation in Type 1 error, and improves sensitivity and specificity compared to the original model without the ST offset. We conclude that adjustment for ST is necessary to correctly interpret the association signal of an imprinting effect.
Subjects
Genomic Imprinting, Inheritance Patterns/genetics, Sex Factors, Alleles, Child, Female, Genetic Loci, Genotype, Humans, Linear Models, Male, Parents, Sensitivity and Specificity
Abstract
Primary patterns in adult brain connectivity are established during development by coordinated networks of transiently expressed genes; however, neural networks remain malleable throughout life. The present study hypothesizes that structural connectivity from key seed regions may induce effects on their connected targets, which are reflected in gene expression at those targeted regions. To test this hypothesis, analyses were performed on data from two brains from the Allen Human Brain Atlas, for which both gene expression and DW-MRI were available. Structural connectivity was estimated from the DW-MRI data, and an approach motivated by network topology, weighted gene coexpression network analysis (WGCNA), was used to cluster genes with similar patterns of expression across the brain. Group exponential lasso models were then used to predict gene cluster expression summaries as a function of seed region structural connectivity patterns. For several gene clusters, brain regions located in the brain stem, diencephalon, and hippocampal formation were identified that have significant predictive power for these expression summaries. These connectivity-associated clusters are enriched in genes associated with synaptic signaling and brain plasticity. Furthermore, using seed-region-based connectivity provides a novel perspective on the relationships between gene expression and connectivity. Hum Brain Mapp 38:3126-3140, 2017. © 2017 Wiley Periodicals, Inc.
Subjects
Brain/metabolism, Gene Expression/physiology, Gene Regulatory Networks/physiology, Neural Pathways/metabolism, Adult, Brain/cytology, Cluster Analysis, Connectome, Datasets as Topic, Diffusion Magnetic Resonance Imaging, Humans, Computer-Assisted Image Processing, Male, Young Adult
Abstract
MOTIVATION: DNA methylation patterns are well known to vary substantially across cell types or tissues. Hence, existing normalization methods may not be optimal if they do not take this into account. We therefore present a new R package for normalization of data from the Illumina Infinium HumanMethylation450 BeadChip (Illumina 450K), built on the concepts of the recently published funNorm method and introducing cell-type or tissue-type flexibility. RESULTS: funtooNorm is relevant for data sets containing samples from two or more cell or tissue types. A visual display of cross-validated errors informs the choice of the optimal number of components in the normalization. Benefits of cell (tissue)-specific normalization are demonstrated in three data sets. Improvement can be substantial; it is strikingly better on chromosome X, where methylation patterns have unique inter-tissue variability. AVAILABILITY AND IMPLEMENTATION: An R package is available at https://github.com/GreenwoodLab/funtooNorm, and has been submitted to Bioconductor at http://bioconductor.org.