Results 1 - 19 of 19
1.
Am J Hum Genet ; 110(7): 1177-1199, 2023 07 06.
Article in English | MEDLINE | ID: mdl-37419091

ABSTRACT

The existing framework of Mendelian randomization (MR) infers the causal effect of one or multiple exposures on one single outcome. It is not designed to jointly model multiple outcomes, as would be necessary to detect causes of more than one outcome and would be relevant to model multimorbidity or other related disease outcomes. Here, we introduce multi-response Mendelian randomization (MR2), an MR method specifically designed for multiple outcomes to identify exposures that cause more than one outcome or, conversely, exposures that exert their effect on distinct responses. MR2 uses a sparse Bayesian Gaussian copula regression framework to detect causal effects while estimating the residual correlation between summary-level outcomes, i.e., the correlation that cannot be explained by the exposures, and vice versa. We show both theoretically and in a comprehensive simulation study how unmeasured shared pleiotropy induces residual correlation between outcomes irrespective of sample overlap. We also reveal how non-genetic factors that affect more than one outcome contribute to their correlation. We demonstrate that by accounting for residual correlation, MR2 has higher power to detect shared exposures causing more than one outcome. It also provides more accurate causal effect estimates than existing methods that ignore the dependence between related responses. Finally, we illustrate how MR2 detects shared and distinct causal exposures for five cardiovascular diseases in two applications considering cardiometabolic and lipidomic exposures and uncovers residual correlation between summary-level outcomes reflecting known relationships between cardiovascular diseases.
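For orientation, the conventional single-outcome setting that the abstract contrasts with MR2 can be sketched with the standard inverse-variance weighted (IVW) estimator on summary statistics. This is not the MR2 method; the function name and all numbers below are hypothetical and serve only as an illustration.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted causal effect estimate for one exposure and one outcome.

    Each argument holds one value per genetic variant (summary-level data).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2          # precision of the variant-outcome association
        numerator += weight * bx * by
        denominator += weight * bx * bx
    if denominator == 0.0:
        raise ValueError("all variant-exposure associations are zero")
    return numerator / denominator


# Hypothetical summary statistics in which the outcome effect is exactly
# half the exposure effect for every variant.
beta_exposure = [0.10, 0.20, 0.15, 0.30]
beta_outcome = [0.05, 0.10, 0.075, 0.15]
se_outcome = [0.01, 0.02, 0.015, 0.02]

estimate = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(round(estimate, 3))
```

An estimator of this kind treats each outcome separately, so it carries no information about the residual correlation between outcomes that the abstract describes.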


Subjects
Cardiovascular Diseases; Humans; Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/genetics; Bayes Theorem; Multimorbidity; Mendelian Randomization Analysis/methods; Causality; Genome-Wide Association Study
2.
BMC Bioinformatics ; 24(1): 210, 2023 May 22.
Article in English | MEDLINE | ID: mdl-37217852

ABSTRACT

The microbiome plays a key role in the health of the human body. Interest often lies in finding features of the microbiome, alongside other covariates, which are associated with a phenotype of interest. One important property of microbiome data, which is often overlooked, is its compositionality, as it can only provide information about the relative abundance of its constituent components. Typically, these proportions vary by several orders of magnitude in datasets of high dimensions. To address these challenges we develop a Bayesian hierarchical linear log-contrast model which is estimated by mean field Monte-Carlo co-ordinate ascent variational inference (CAVI-MC) and easily scales to high-dimensional data. We use novel priors which account for the large differences in scale and constrained parameter space associated with the compositional covariates. A reversible jump Monte Carlo Markov chain guided by the data through univariate approximations of the variational posterior probability of inclusion, with proposal parameters informed by approximating variational densities via auxiliary parameters, is used to estimate intractable marginal expectations. We demonstrate that our proposed Bayesian method performs favourably against existing state-of-the-art frequentist compositional data analysis methods. We then apply the CAVI-MC to the analysis of real data exploring the relationship of the gut microbiome to body mass index.
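The compositionality issue mentioned above can be illustrated with a minimal sketch of a linear log-contrast predictor: when the coefficients on the log-transformed components sum to zero, the predictor depends only on relative abundances, so raw counts and proportions give the same value. The function name and the numbers are hypothetical; this is not the CAVI-MC estimation procedure.

```python
import math


def log_contrast(composition, coefficients, tol=1e-12):
    """Linear predictor sum_j b_j * log(x_j) under the constraint sum_j b_j = 0."""
    if abs(sum(coefficients)) > tol:
        raise ValueError("log-contrast coefficients must sum to zero")
    if any(x <= 0 for x in composition):
        raise ValueError("all components must be strictly positive")
    return sum(b * math.log(x) for b, x in zip(coefficients, composition))


# Hypothetical taxon counts spanning several orders of magnitude.
counts = [120.0, 30.0, 850.0]
total = sum(counts)
proportions = [c / total for c in counts]

# Hypothetical coefficients that sum to zero.
coefficients = [0.7, -0.2, -0.5]

from_counts = log_contrast(counts, coefficients)
from_proportions = log_contrast(proportions, coefficients)
print(math.isclose(from_counts, from_proportions, abs_tol=1e-9))
```

The two values agree because rescaling every component by the same total adds `log(total)` multiplied by the sum of the coefficients, which is zero under the constraint.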


Subjects
Gastrointestinal Microbiome; Microbiota; Humans; Bayes Theorem; Linear Models; Markov Chains; Monte Carlo Method
3.
BMC Med ; 20(1): 34, 2022 02 01.
Article in English | MEDLINE | ID: mdl-35101027

ABSTRACT

BACKGROUND: Greater maternal adiposity before or during pregnancy is associated with greater offspring adiposity throughout childhood, but the extent to which this is due to causal intrauterine or periconceptional mechanisms remains unclear. Here, we use Mendelian randomisation (MR) with polygenic risk scores (PRS) to investigate whether associations between maternal pre-/early pregnancy body mass index (BMI) and offspring adiposity from birth to adolescence are causal. METHODS: We undertook confounder adjusted multivariable (MV) regression and MR using mother-offspring pairs from two UK cohorts: Avon Longitudinal Study of Parents and Children (ALSPAC) and Born in Bradford (BiB). In ALSPAC and BiB, the outcomes were birthweight (BW; N = 9339) and BMI at age 1 and 4 years (N = 8659 to 7575). In ALSPAC only we investigated BMI at 10 and 15 years (N = 4476 to 4112) and dual-energy X-ray absorptiometry (DXA) determined fat mass index (FMI) from age 10-18 years (N = 2659 to 3855). We compared MR results from several PRS, calculated from maternal non-transmitted alleles at between 29 and 80,939 single nucleotide polymorphisms (SNPs). RESULTS: MV and MR consistently showed a positive association between maternal BMI and BW, supporting a moderate causal effect. For adiposity at most older ages, although MV estimates indicated a strong positive association, MR estimates did not support a causal effect. For the PRS with few SNPs, MR estimates were statistically consistent with the null, but had wide confidence intervals so were often also statistically consistent with the MV estimates. In contrast, the largest PRS yielded MR estimates with narrower confidence intervals, providing strong evidence that the true causal effect on adolescent adiposity is smaller than the MV estimates (Pdifference = 0.001 for 15-year BMI). This suggests that the MV estimates are affected by residual confounding and therefore do not provide an accurate indication of the causal effect size. CONCLUSIONS: Our results suggest that higher maternal pre-/early-pregnancy BMI is not a key driver of higher adiposity in the next generation. Thus, they support interventions that target the whole population for reducing overweight and obesity, rather than a specific focus on women of reproductive age.


Subjects
Adiposity/genetics; Obesity/genetics; Adolescent; Alleles; Body Mass Index; Child; Child, Preschool; Cohort Studies; Female; Humans; Infant; Longitudinal Studies; Obesity/etiology; Pregnancy; Risk Factors; United Kingdom
4.
Thorax ; 77(9): 873-881, 2022 09.
Article in English | MEDLINE | ID: mdl-34556554

ABSTRACT

BACKGROUND: Cystic fibrosis (CF) is a life-threatening genetic disease, affecting around 10 500 people in the UK. Precision medicines have been developed to treat specific CF-gene mutations. The newest, elexacaftor/tezacaftor/ivacaftor (ELEX/TEZ/IVA), has been found to be highly effective in randomised controlled trials (RCTs) and became available to a large proportion of UK CF patients in 2020. Understanding the potential health economic impacts of ELEX/TEZ/IVA is vital to planning service provision. METHODS: We combined observational UK CF Registry data with RCT results to project the impact of ELEX/TEZ/IVA on total days of intravenous (IV) antibiotic treatment at a population level. Registry data from 2015 to 2017 were used to develop prediction models for IV days over a 1-year period using several predictors, and to estimate 1-year population total IV days based on standards of care pre-ELEX/TEZ/IVA. We considered two approaches to imposing the impact of ELEX/TEZ/IVA on projected outcomes using effect estimates from RCTs: approach 1 based on effect estimates on FEV1% and approach 2 based on effect estimates on exacerbation rate. RESULTS: ELEX/TEZ/IVA is expected to result in significant reductions in population-level requirements for IV antibiotics of 16.1% (~17 800 days) using approach 1 and 43.6% (~39 500 days) using approach 2. The two approaches require different assumptions. Increased understanding of the mechanisms through which ELEX/TEZ/IVA acts on these outcomes would enable further refinements to our projections. CONCLUSIONS: This work contributes to increased understanding of the changing healthcare needs of people with CF and illustrates how Registry data can be used in combination with RCT evidence to estimate population-level treatment impacts.


Subjects
Cystic Fibrosis; Aminophenols/therapeutic use; Anti-Bacterial Agents/therapeutic use; Benzodioxoles/therapeutic use; Cystic Fibrosis/drug therapy; Cystic Fibrosis/genetics; Cystic Fibrosis Transmembrane Conductance Regulator/genetics; Humans; Mutation; Observational Studies as Topic; Randomized Controlled Trials as Topic; Registries
5.
J R Stat Soc Ser C Appl Stat ; 70(4): 886-908, 2021 Aug.
Article in English | MEDLINE | ID: mdl-35001978

ABSTRACT

Our work is motivated by the search for metabolite quantitative trait loci (QTL) in a cohort of more than 5000 people. There are 158 metabolites measured by NMR spectroscopy in the 31-year follow-up of the Northern Finland Birth Cohort 1966 (NFBC66). These metabolites, as with many multivariate phenotypes produced by high-throughput biomarker technology, exhibit strong correlation structures. Existing approaches for combining such data with genetic variants for multivariate QTL analysis generally ignore phenotypic correlations or make restrictive assumptions about the associations between phenotypes and genetic loci. We present a computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional data, with cell-sparse variable selection and sparse graphical structure for covariance selection. Cell sparsity allows different phenotype responses to be associated with different genetic predictors and the graphical structure is used to represent the conditional dependencies between phenotype variables. To achieve feasible computation of the large model space, we exploit a factorisation of the covariance matrix. Applying the model to the NFBC66 data with 9000 directly genotyped single nucleotide polymorphisms, we are able to simultaneously estimate genotype-phenotype associations and the residual dependence structure among the metabolites. The R package BayesSUR with full documentation is available at https://cran.r-project.org/web/packages/BayesSUR/.

6.
J Epidemiol Community Health ; 74(11): 933-941, 2020 11.
Article in English | MEDLINE | ID: mdl-32581064

ABSTRACT

BACKGROUND: There are various maternal prenatal biopsychosocial (BPS) predictors of birth weight, making it difficult to quantify their cumulative relationship. METHODS: We studied two birth cohorts: Northern Finland Birth Cohort 1986 (NFBC1986) born in 1985-1986 and the Generation R Study (from the Netherlands) born in 2002-2006. In NFBC1986, we selected variables depicting BPS exposure in association with birth weight and performed factor analysis to derive latent constructs representing the relationship between these variables. In Generation R, the same factors were generated weighted by loadings of NFBC1986. Factor scores from each factor were then allocated into tertiles and added together to calculate a cumulative BPS score. In all cases, we used regression analyses to explore the relationship with birth weight corrected for sex and gestational age and additionally adjusted for other factors. RESULTS: Factor analysis supported a four-factor structure, labelled closely to represent their characteristics as 'Factor1-BMI' (body mass index), 'Factor2-DBP' (diastolic blood pressure), 'Factor3-Socioeconomic-Obstetric-Profile' and 'Factor4-Parental-Lifestyle'. In both cohorts, 'Factor1-BMI' was positively associated with birth weight, whereas the other factors showed a negative association. 'Factor3-Socioeconomic-Obstetric-Profile' and 'Factor4-Parental-Lifestyle' had the greatest effect size, explaining 30% of the variation in birth weight. Associations of the factors with birth weight were largely driven by 'Factor1-BMI'. A graded decrease in birth weight was observed with increasing cumulative BPS score, jointly evaluating four factors in both cohorts. CONCLUSION: Our study is a proof of concept for the maternal prenatal BPS hypothesis, highlighting the components' snowball effect on birth weight in two different European birth cohorts.


Subjects
Birth Weight; Socioeconomic Factors; Adult; Body Mass Index; Female; Finland; Gestational Age; Humans; Male; Netherlands; Pregnancy; Risk Factors
7.
Int J Epidemiol ; 49(1): 233-243, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31074781

ABSTRACT

BACKGROUND: Maternal pre-pregnancy body mass index (BMI) is positively associated with offspring birth weight (BW) and BMI in childhood and adulthood. Each of these associations could be due to causal intrauterine effects, or confounding (genetic or environmental), or some combination of these. Here we estimate the extent to which the association between maternal BMI and offspring body size is explained by offspring genotype, as a first step towards establishing the importance of genetic confounding. METHODS: We examined the associations of maternal pre-pregnancy BMI with offspring BW and BMI at 1, 5, 10 and 15 years, in three European birth cohorts (n ≤11 498). Bivariate Genomic-relatedness-based Restricted Maximum Likelihood implemented in the GCTA software (GCTA-GREML) was used to estimate the extent to which phenotypic covariance was explained by offspring genotype as captured by common imputed single nucleotide polymorphisms (SNPs). We merged individual participant data from all cohorts, enabling calculation of pooled estimates. RESULTS: Phenotypic covariance (equivalent here to Pearson's correlation coefficient) between maternal BMI and offspring phenotype was 0.15 [95% confidence interval (CI): 0.13, 0.17] for offspring BW, increasing to 0.29 (95% CI: 0.26, 0.31) for offspring 15 year BMI. Covariance explained by offspring genotype was negligible for BW [-0.04 (95% CI: -0.09, 0.01)], but increased to 0.12 (95% CI: 0.04, 0.21) at 15 years, which is equivalent to 43% (95% CI: 15%, 72%) of the phenotypic covariance. Sensitivity analyses using weight, BMI and ponderal index as the offspring phenotype at all ages showed similar results. CONCLUSIONS: Offspring genotype explains a substantial fraction of the covariance between maternal BMI and offspring adolescent BMI. This is consistent with a potentially important role for genetic confounding as a driver of the maternal BMI-offspring BMI association.


Subjects
Birth Weight/genetics; Body Mass Index; Mothers; Obesity/etiology; Pediatric Obesity/genetics; Adult; Child; Female; Humans; Male; Obesity/genetics; Pregnancy
8.
Int J Epidemiol ; 48(4): 1051-1051k, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31321419
9.
Soc Sci Med ; 232: 238-261, 2019 07.
Article in English | MEDLINE | ID: mdl-31108330

ABSTRACT

BACKGROUND: The contingent valuation (CV) method is used to estimate the willingness to pay (WTP) for services and products to inform cost benefit analyses (CBA). A long-standing criticism, that stated WTP estimates may be poor indicators of actual WTP, calls into question their validity and the use of such estimates for welfare evaluation, especially in the health sector. Available evidence on the validity of CV studies so far is inconclusive. We systematically reviewed the literature to (1) synthesize the evidence on the criterion validity of WTP/willingness to accept (WTA), (2) undertake a meta-analysis, pooling evidence on the extent of variation between stated and actual WTP values, and (3) explore the reasons for the variation. METHODS: Eight electronic databases were searched, along with citations and reference reviews. Fifty papers detailing 159 comparisons were identified and reviewed using a standard proforma. Two reviewers were each involved in the paper selection, review and data extraction. Meta-analysis was conducted using random effects models for ratios of means and percentage differences separately. Meta-bias was investigated using funnel plots. RESULTS: Hypothetical WTP was on average 3.2 times greater than actual WTP, with a range of 0.7-11.8 and 5.7 (0.0-13.6) for ratios of means and percentage differences respectively. However, key methodological differences between surveys of hypothetical and actual values were found. In the meta-analysis, high levels of heterogeneity existed. The overall effect size for mean summaries was 1.79 (1.56-2.04) and 2.37 (1.93-2.80) for percent summaries. Regression analyses identified mixed results on the influence of the different experimental protocols on the variation between stated and actual WTP values. Results indicating publication bias did not account for differences in study design. CONCLUSIONS: The evidence on the criterion validity for CV studies is more mixed than authors are representing because substantial differences in study design between hypothetical and actual WTP/WTA surveys are not accounted for.


Subjects
Cost-Benefit Analysis/methods; Cost-Benefit Analysis/standards; Financing, Personal/statistics & numerical data; Humans; Regression Analysis; Reproducibility of Results
10.
BMJ Open ; 8(12): e024132, 2018 12 19.
Article in English | MEDLINE | ID: mdl-30573487

ABSTRACT

OBJECTIVES: To analyse the effectiveness and cost-effectiveness of two-staged community sports interventions: taster sports sessions compared with a portfolio of community sport sessions. DESIGN: Quasi-experiment using an interrupted time series design. SETTING: Community sports projects delivered by eight lead partners in London Borough of Hounslow, UK. PARTICIPANTS: Inactive people aged 14 plus years (n=246) were recruited between May 2013 and February 2014. INTERVENTIONS: Community sports interventions delivered in two stages, a 6-week programme of taster sport sessions (stage 1) and a 6-week programme of a portfolio of community sporting sessions delivered by trained coaches (stage 2). OUTCOME MEASURES: (a) Change in days with ≥30 min of self-reported vigorous intensity physical activity (PA), moderate intensity PA, walking and sport; and (b) change in subjective well-being and EQ5D5L quality-adjusted life-years (QALYs). METHODS: Interrupted time series analysis evaluated the effectiveness of the two-staged sports programmes. Cost-effectiveness analysis compares stage 2 with stage 1 from a provider's perspective, reporting outcomes of incremental cost per QALY (2015/2016 price year). Uncertainty was assessed using deterministic and probabilistic sensitivity analyses. RESULTS: Compared with stage 1, counterfactual change at 21 days in PA was lower for vigorous (log odds: -0.52; 95% CI -1 to -0.03), moderate PA (-0.50; 95% CI 0.94 to 0.05) and sport (-0.56; 95% CI -1.02 to -0.10). Stage 2 increased walking (0.28; 95% CI 0.3 to 0.52). The effect over time was similar. Counterfactual change at 21 days in well-being was positive, particularly for 'happiness' (0.29; 95% CI 0.06 to 0.51). Stage 2 was more expensive (£101 per participant) but increased QALYs (0.001; 95% CI -0.034 to 0.036). Cost per QALY for stage 2 was £50 000 and had a 29% chance of being cost-effective (£30 000 threshold). CONCLUSION: Community-based sport interventions could increase PA among inactive people. Less intensive sports sessions may be more effective and cost-effective.


Subjects
Community Participation/methods; Exercise; Health Promotion/methods; Quality of Life; Sports; Adolescent; Cost-Benefit Analysis; Exercise/physiology; Exercise/psychology; Female; Humans; Interrupted Time Series Analysis; London; Male; Motor Activity; Sedentary Behavior; Self Report; Sports/physiology; Sports/psychology; Surveys and Questionnaires
11.
Wellcome Open Res ; 1: 10, 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-27996064

ABSTRACT

BACKGROUND: A major cause of disability in secondary progressive multiple sclerosis (SPMS) is progressive brain atrophy, whose pathogenesis is not fully understood. The objective of this study was to identify protein biomarkers of brain atrophy in SPMS. METHODS: We used surface-enhanced laser desorption-ionization time-of-flight mass spectrometry to carry out an unbiased search for serum proteins whose concentration correlated with the rate of brain atrophy, measured by serial MRI scans over a 2-year period in a well-characterized cohort of 140 patients with SPMS. Protein species were identified by liquid chromatography-electrospray ionization tandem mass spectrometry. RESULTS: There was a significant (p<0.004) correlation between the rate of brain atrophy and a rise in the concentration of proteins at 15.1 kDa and 15.9 kDa in the serum. Tandem mass spectrometry identified these proteins as alpha-haemoglobin and beta-haemoglobin, respectively. The abnormal concentration of free serum haemoglobin was confirmed by ELISA (p<0.001). The serum lactate dehydrogenase activity was also highly significantly raised (p<10⁻¹²) in patients with secondary progressive multiple sclerosis. CONCLUSIONS: An underlying low-grade chronic intravascular haemolysis is a potential source of the iron whose deposition along blood vessels in multiple sclerosis plaques contributes to the neurodegeneration and consequent brain atrophy seen in progressive disease. Chelators of free serum iron will be ineffective in preventing this neurodegeneration, because the iron (Fe²⁺) is chelated by haemoglobin.

12.
Bioinformatics ; 32(4): 523-32, 2016 Feb 15.
Article in English | MEDLINE | ID: mdl-26504141

ABSTRACT

MOTIVATION: Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called 'hotspots', important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether eQTLs are common to several tissues, cell-types or, more generally, conditions or whether they are specific to a particular condition. RESULTS: We have implemented MT-HESS, a Bayesian hierarchical model that analyses the association between a large set of predictors, e.g. SNPs, and many responses, e.g. gene expression, in multiple tissues, cells or conditions. Our Bayesian sparse regression algorithm goes beyond 'one-at-a-time' association tests between SNPs and responses and uses a fully multivariate model search across all linear combinations of SNPs, coupled with a model of the correlation between condition/tissue-specific responses. In addition, we use a hierarchical structure to leverage shared information across different genes, thus improving the detection of hotspots. We show the increase of power resulting from our new approach in an extensive simulation study. Our analysis of two case studies highlights new hotspots that would remain undetected by standard approaches and shows how greater prediction power can be achieved when several tissues are jointly considered. AVAILABILITY AND IMPLEMENTATION: C++ source code and documentation including compilation instructions are available under GNU licence at http://www.mrc-bsu.cam.ac.uk/software/.


Subjects
Algorithms; Bayes Theorem; Gene Expression Regulation; Gene Regulatory Networks; Inflammation/genetics; Inflammatory Bowel Diseases/genetics; Quantitative Trait Loci/genetics; Software; Animals; Diabetes Mellitus, Type 1/genetics; Genomics/methods; Humans; Models, Theoretical; Organ Specificity; Polymorphism, Single Nucleotide/genetics; Programming Languages; Rats; Tissue Distribution
13.
PLoS One ; 7(7): e38083, 2012.
Article in English | MEDLINE | ID: mdl-22815689

ABSTRACT

BACKGROUND: The clinical, radiological and pathological similarities between sarcoidosis and tuberculosis can make disease differentiation challenging. A complicating factor is that some cases of sarcoidosis may be initiated by mycobacteria. We hypothesised that immunological profiling might provide insight into a possible relationship between the diseases or allow us to distinguish between them. METHODS: We analysed bronchoalveolar lavage (BAL) fluid in sarcoidosis (n = 18), tuberculosis (n = 12) and healthy volunteers (n = 16). We further investigated serum samples in the same groups: sarcoidosis (n = 40), tuberculosis (n = 15) and healthy volunteers (n = 40). A cross-sectional analysis of multiple cytokine profiles was performed and the data were used to discriminate between samples. RESULTS: We found that BAL profiles were indistinguishable between the two diseases and significantly different from those of healthy volunteers. In sera, tuberculosis patients had significantly lower levels of the Th2 cytokine interleukin-4 (IL-4) than those with sarcoidosis (p = 0.004). Additional serum differences allowed us to create a linear regression model for disease differentiation (within-sample accuracy 91%, cross-validation accuracy 73%). CONCLUSIONS: These data warrant replication in independent cohorts to further develop and validate a serum cytokine signature that may be able to distinguish sarcoidosis from tuberculosis. Systemic Th2 cytokine differences between sarcoidosis and tuberculosis may also underlie different disease outcomes to similar respiratory stimuli.


Subjects
Bronchoalveolar Lavage Fluid/chemistry; Cytokines/blood; Sarcoidosis, Pulmonary/blood; Sarcoidosis, Pulmonary/diagnosis; Tuberculosis, Pulmonary/blood; Tuberculosis, Pulmonary/diagnosis; Adolescent; Adult; Aged; Biomarkers/blood; Bronchoalveolar Lavage Fluid/immunology; Diagnosis, Differential; Female; Humans; Male; Middle Aged; Sarcoidosis, Pulmonary/immunology; Tuberculosis, Pulmonary/immunology; Young Adult
14.
Genome Biol ; 12(2): R13, 2011.
Article in English | MEDLINE | ID: mdl-21310039

ABSTRACT

We present a novel pipeline and methodology for simultaneously estimating isoform expression and allelic imbalance in diploid organisms using RNA-seq data. We achieve this by modeling the expression of haplotype-specific isoforms. If unknown, the two parental isoform sequences can be individually reconstructed. A new statistical method, MMSEQ, deconvolves the mapping of reads to multiple transcripts (isoforms or haplotype-specific isoforms). Our software can take into account non-uniform read generation and works with paired-end reads.


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; RNA, Messenger/genetics; Sequence Analysis, RNA/methods; Algorithms; Allelic Imbalance; Alternative Splicing; Animals; Haplotypes; Humans; Mice; Models, Statistical; RNA, Messenger/analysis; Software; Transcriptome
15.
Nucleic Acids Res ; 38(1): e4, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19854940

ABSTRACT

Affymetrix has recently developed whole-transcript GeneChips ('Gene' and 'Exon' arrays) which interrogate exons along the length of each gene. Although each probe on these arrays is intended to hybridize perfectly to only one transcriptional target, many probes match multiple transcripts located in different parts of the genome or alternative isoforms of the same gene. Existing statistical methods for estimating expression do not take this into account and are thus prone to producing inflated estimates. We propose a method, Multi-Mapping Bayesian Gene eXpression (MMBGX), which disaggregates the signal at 'multi-match' probes. When applied to Gene arrays, MMBGX removes the upward bias of gene-level expression estimates. When applied to Exon arrays, it can further disaggregate the signal between alternative transcripts of the same gene, providing expression estimates of individual splice variants. We demonstrate the performance of MMBGX on simulated data and a tissue mixture data set. We then show that MMBGX can estimate the expression of alternative isoforms within one experimental condition, confirming our results by RT-PCR. Finally, we show that our method for detecting differential splicing has a lower error rate than standard exon-level approaches on a previously validated colon cancer data set.


Subjects
Alternative Splicing; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Protein Isoforms/genetics; Animals; Bayes Theorem; Colonic Neoplasms/genetics; Colonic Neoplasms/metabolism; Exons; Humans; Male; Mice; Mice, Inbred C57BL; Protein Isoforms/metabolism; Reverse Transcriptase Polymerase Chain Reaction; Software
16.
Stat Appl Genet Mol Biol ; 6: Article36, 2007.
Article in English | MEDLINE | ID: mdl-18171320

ABSTRACT

We present a Bayesian hierarchical model for detecting differentially expressed genes using a mixture prior on the parameters representing differential effects. We formulate an easily interpretable 3-component mixture to classify genes as over-expressed, under-expressed and non-differentially expressed, and model gene variances as exchangeable to allow for variability between genes. We show how the proportion of differentially expressed genes, and the mixture parameters, can be estimated in a fully Bayesian way, extending previous approaches where this proportion was fixed and empirically estimated. Good estimates of the false discovery rates are also obtained. Different parametric families for the mixture components can lead to quite different classifications of genes for a given data set. Using Affymetrix data from an experiment on knockout and wild-type mice, we show how predictive model checks can be used to guide the choice between possible mixture priors. These checks show that extending the mixture model to allow extra variability around zero instead of the usual point mass null fits the data better. A software package for R is available.
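One common way to turn the output of a mixture model into a false discovery rate estimate is to average, over the genes declared differentially expressed, their posterior probabilities of belonging to the null component. The sketch below illustrates that generic idea only; it is not necessarily the estimator used in the paper, and the function name, threshold and probabilities are hypothetical.

```python
def estimated_fdr(post_null, threshold=0.8):
    """Return (number of genes selected, estimated FDR of the selected list).

    post_null[i] is the posterior probability that gene i is NOT differentially
    expressed. A gene is selected when its posterior probability of being
    differentially expressed, 1 - post_null[i], reaches the threshold.
    """
    selected = [p for p in post_null if (1.0 - p) >= threshold]
    if not selected:
        return 0, 0.0
    # Expected proportion of false discoveries among the selected genes.
    return len(selected), sum(selected) / len(selected)


# Hypothetical posterior null probabilities for six genes.
post_null = [0.01, 0.05, 0.10, 0.40, 0.90, 0.99]
n_selected, fdr = estimated_fdr(post_null, threshold=0.8)
print(n_selected, round(fdr, 4))
```

Raising the threshold shortens the list and lowers the estimated rate, which is the trade-off a user would tune when choosing genes for follow-up.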


Subjects
Computer Simulation; Gene Expression Profiling; Models, Genetic; Animals; Bayes Theorem; Mice
17.
BMC Bioinformatics ; 7: 426, 2006 Oct 03.
Article in English | MEDLINE | ID: mdl-17018143

ABSTRACT

BACKGROUND: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. RESULTS: We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. CONCLUSION: Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.
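The per-term test described above as the most common approach, a one-sided Fisher's exact test for over-representation of a single GO term, can be sketched with the hypergeometric tail probability. The function name and the counts are hypothetical, and the sketch does not implement the grouping of terms (POSOC) that the paper proposes.

```python
from math import comb


def enrichment_p_value(n_genes, n_term, n_de, n_overlap):
    """One-sided Fisher's exact test for over-representation of one GO term.

    n_genes   : genes on the array
    n_term    : genes annotated with the GO term
    n_de      : genes declared differentially expressed
    n_overlap : differentially expressed genes annotated with the term

    Returns P(X >= n_overlap) for X hypergeometric with these margins.
    """
    if not 0 <= n_overlap <= min(n_term, n_de):
        raise ValueError("overlap is inconsistent with the margins")
    total = comb(n_genes, n_de)
    tail = sum(
        comb(n_term, k) * comb(n_genes - n_term, n_de - k)
        for k in range(n_overlap, min(n_term, n_de) + 1)
    )
    return tail / total


# Hypothetical example: 20 genes, 5 annotated with the term, 5 declared
# differentially expressed, and all 5 of those carry the annotation.
p_value = enrichment_p_value(n_genes=20, n_term=5, n_de=5, n_overlap=5)
print(p_value)
```

Because GO terms share many genes, repeating this test term by term yields highly dependent p-values, which is the difficulty the abstract raises.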


Subjects
Databases, Protein; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Multigene Family/physiology; Oligonucleotide Array Sequence Analysis/methods; Proteins/classification; Proteins/metabolism; Algorithms; Database Management Systems; Proteins/genetics
18.
Biometrics ; 62(1): 1-9, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16542224

ABSTRACT

We present a Bayesian hierarchical model for detecting differentially expressed genes that includes simultaneous estimation of array effects, and show how to use the output for choosing lists of genes for further investigation. We give empirical evidence that expression-level dependent array effects are needed, and explore different nonlinear functions as part of our model-based approach to normalization. The model includes gene-specific variances but imposes some necessary shrinkage through a hierarchical structure. Model criticism via posterior predictive checks is discussed. Modeling the array effects (normalization) simultaneously with differential expression gives fewer false positive results. To choose a list of genes, we propose to combine various criteria (for instance, fold change and overall expression) into a single indicator variable for each gene. The posterior distribution of these variables is used to pick the list of genes, thereby taking into account uncertainty in parameter estimates. In an application to mouse knockout data, Gene Ontology annotations over- and underrepresented among the genes on the chosen list are consistent with biological expectations.


Subjects
Bayes Theorem; Gene Expression Profiling/statistics & numerical data; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Animals; Biometry; False Positive Reactions; Gene Expression Profiling/standards; Mice; Mice, Knockout; Models, Genetic
19.
Bioinformatics ; 20(16): 2562-71, 2004 Nov 01.
Article in English | MEDLINE | ID: mdl-15117756

ABSTRACT

MOTIVATION: Multiclass response (MCR) experiments are those in which there are more than two classes to be compared. In these experiments, though the null hypothesis is simple, there are typically many patterns of gene expression changes across the different classes that lead to complex alternatives. In this paper, we propose a new strategy for selecting genes in MCR that is based on a flexible mixture model for the marginal distribution of a modified F-statistic. Using this model, false positive and negative discovery rates can be estimated and combined to produce a rule for selecting a subset of genes. Moreover, the method proposed allows calculation of these rates for any predefined subset of genes. RESULTS: We illustrate the performance of our approach using simulated datasets and a real breast cancer microarray dataset. In this latter study, we investigate predefined subsets of genes and point out interesting differences between three distinct biological pathways. AVAILABILITY: http://www.bgx.org.uk/software.html


Subjects
Algorithms; Breast Neoplasms/genetics; Gene Expression Profiling/methods; Models, Genetic; Neoplasm Proteins/genetics; Oligonucleotide Array Sequence Analysis/methods; Humans; Models, Statistical