Results 1 - 20 of 28
1.
Nat Rev Genet ; 19(6): 357-370, 2018 06.
Article in English | MEDLINE | ID: mdl-29626206

ABSTRACT

We are entering a new era of mouse phenomics, driven by large-scale and economical generation of mouse mutants coupled with increasingly sophisticated and comprehensive phenotyping. These studies are generating large, multidimensional gene-phenotype data sets, which are shedding new light on the mammalian genome landscape and revealing many hitherto unknown features of mammalian gene function. Moreover, these phenome resources provide a wealth of disease models and can be integrated with human genomics data as a powerful approach for the interpretation of human genetic variation and its relationship to disease. In the future, the development of novel phenotyping platforms allied to improved computational approaches, including machine learning, for the analysis of phenotype data will continue to enhance our ability to develop a comprehensive and powerful model of mammalian gene-phenotype space.


Subjects
Databases, Genetic , Genetic Variation , Genome , Genomics/methods , Animals , Humans , Mice
2.
Philos Trans A Math Phys Eng Sci ; 381(2247): 20220143, 2023 May 15.
Article in English | MEDLINE | ID: mdl-36970832

ABSTRACT

In this paper, we start by reviewing exchangeability and its relevance to the Bayesian approach. We highlight the predictive nature of Bayesian models and the symmetry assumptions implied by belief in an underlying exchangeable sequence of observations. By taking a closer look at the Bayesian bootstrap, the parametric bootstrap of Efron, and a version of Bayesian thinking about inference based on martingales, uncovered by Doob, we introduce a parametric Bayesian bootstrap. Martingales play a fundamental role. Illustrations are presented, as is the relevant theory. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
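As a minimal illustration of one starting point, the sketch below draws from Rubin's (non-parametric) Bayesian bootstrap for a mean: each draw reweights the observed points with Dirichlet(1, ..., 1) weights. It does not implement the parametric Bayesian bootstrap introduced in the paper; the function name and toy data are hypothetical.

```python
import random

def bayesian_bootstrap_mean(data, n_draws=1000, seed=0):
    """Posterior draws of the mean under Rubin's Bayesian bootstrap.

    Each draw reweights the observations with Dirichlet(1, ..., 1) weights,
    generated by normalising independent Exp(1) variables.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        gammas = [rng.expovariate(1.0) for _ in data]
        total = sum(gammas)
        draws.append(sum(g * x for g, x in zip(gammas, data)) / total)
    return draws

data = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1]  # hypothetical observations
draws = bayesian_bootstrap_mean(data, n_draws=2000)
print(sum(draws) / len(draws))  # close to the sample mean of the data
```

Each draw is a convex combination of the observations, so no draw falls outside the observed range; the posterior only reweights values already seen, which reflects the exchangeability assumption discussed above.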

3.
PLoS Genet ; 16(10): e1009037, 2020 10.
Article in English | MEDLINE | ID: mdl-33035220

ABSTRACT

Genetic surveillance of malaria parasites supports malaria control programmes, treatment guidelines and elimination strategies. Surveillance studies often pose questions about malaria parasite ancestry (e.g. how antimalarial resistance has spread) and employ statistical methods that characterise parasite population structure. Many of the methods used to characterise structure are unsupervised machine learning algorithms which depend on a genetic distance matrix, notably principal coordinates analysis (PCoA) and hierarchical agglomerative clustering (HAC). PCoA and HAC are sensitive to both the definition of genetic distance and algorithmic specification. Importantly, neither algorithm infers malaria parasite ancestry. As such, PCoA and HAC can inform (e.g. via exploratory data visualisation and hypothesis generation), but not answer comprehensively, key questions about malaria parasite ancestry. We illustrate the sensitivity of PCoA and HAC using 393 Plasmodium falciparum whole genome sequences collected from Cambodia and neighbouring regions (where antimalarial resistance has emerged and spread recently) and we provide tentative guidance for the use and interpretation of PCoA and HAC in malaria parasite genetic epidemiology. This guidance includes a call for fully transparent and reproducible analysis pipelines that feature (i) a clearly outlined scientific question; (ii) a clear justification of analytical methods used to answer the scientific question along with discussion of any inferential limitations; (iii) publicly available genetic distance matrices when downstream analyses depend on them; and (iv) sensitivity analyses. To bridge the inferential disconnect between the output of non-inferential unsupervised learning algorithms and the scientific questions of interest, tailor-made statistical models are needed to infer malaria parasite ancestry. In the absence of such models, speculative reasoning should feature only in the discussion, not as results.
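The sensitivity of HAC to algorithmic specification shows up even on a toy example. The sketch below runs a naive agglomerative clustering on a hypothetical distance matrix (six samples placed on a line, standing in for pairwise genetic distances) and stops at two clusters. Single and complete linkage return different partitions of the same data. This is an illustration only, not the analysis pipeline used in the paper, and the function name is hypothetical.

```python
from itertools import combinations

def hac(dist, k, linkage="single"):
    """Naive agglomerative clustering on a precomputed distance matrix.

    dist: symmetric list of lists; k: number of clusters at which to stop.
    linkage: 'single' (min), 'complete' (max) or 'average' (mean).
    """
    agg = {"single": min, "complete": max,
           "average": lambda ds: sum(ds) / len(ds)}[linkage]
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # merge the pair of clusters with the smallest linkage distance
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: agg([dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

positions = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]  # hypothetical samples on a line
dist = [[abs(x - y) for y in positions] for x in positions]
print(hac(dist, 2, "single"))    # [[0, 1, 2, 3, 4], [5]]
print(hac(dist, 2, "complete"))  # [[0, 1, 2, 3], [4, 5]]
```

Single linkage chains through the small gaps and isolates only the last sample, whereas complete linkage penalises wide clusters and splits the line near the middle; neither output says anything about ancestry.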


Subjects
Genetics, Population/statistics & numerical data , Malaria, Falciparum/epidemiology , Molecular Epidemiology , Plasmodium falciparum/genetics , Algorithms , Antimalarials/therapeutic use , Cambodia/epidemiology , Cluster Analysis , Drug Resistance/genetics , Genotype , Humans , Malaria, Falciparum/drug therapy , Malaria, Falciparum/genetics , Malaria, Falciparum/parasitology , Plasmodium falciparum/pathogenicity , Unsupervised Machine Learning
4.
PLoS Genet ; 8(2): e1002505, 2012.
Article in English | MEDLINE | ID: mdl-22383892

ABSTRACT

Metabolic Syndrome (MetS) is highly prevalent and has considerable public health impact, but its underlying genetic factors remain elusive. To identify gene networks involved in MetS, we conducted whole-genome expression and genotype profiling on abdominal (ABD) and gluteal (GLU) adipose tissue, and whole blood (WB), from 29 MetS cases and 44 controls. Co-expression network analysis for each tissue independently identified nine, six, and zero MetS-associated modules of coexpressed genes in ABD, GLU, and WB, respectively. Of 8,992 probesets expressed in ABD or GLU, 685 (7.6%) were expressed only in ABD and 51 (0.6%) only in GLU. Differential eigengene network analysis of 8,256 shared probesets detected 22 shared modules with high preservation across adipose depots (D(ABD-GLU) = 0.89), seven of which were associated with MetS (FDR P<0.01). The most strongly associated module, significantly enriched for immune response-related processes, contained 94/620 (15%) genes with inter-depot differences. In an independent cohort of 145/141 twins with ABD and WB longitudinal expression data, median variability in ABD due to familiality was greater for MetS-associated versus unassociated modules (ABD: 0.48 versus 0.18, P = 0.08; GLU: 0.54 versus 0.20, P = 7.8×10(-4)). Cis-eQTL analysis of probesets associated with MetS (FDR P<0.01) and/or inter-depot differences (FDR P<0.01) provided evidence for 32 eQTLs. Corresponding eSNPs were tested for association with MetS-related phenotypes in two GWAS of >100,000 individuals; rs10282458, affecting expression of RARRES2 (encoding chemerin), was associated with body mass index (BMI) (P = 6.0×10(-4)); and rs2395185, affecting inter-depot differences of HLA-DRB1 expression, was associated with high-density lipoprotein (P = 8.7×10(-4)) and BMI-adjusted waist-to-hip ratio (P = 2.4×10(-4)). 
Since many genes and their interactions influence complex traits such as MetS, integrated analysis of genotypes and coexpression networks across multiple tissues relevant to clinical traits is an efficient strategy to identify novel associations.
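A common way to build co-expression networks of this kind is WGCNA-style soft thresholding, in which the adjacency between two genes is |cor|^β. The sketch below assumes that construction, an unsigned network, and β = 6; the abstract does not state the exact settings used, and the function names, gene names, and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned soft-thresholded adjacency a_gh = |cor(g, h)| ** beta.

    expr: dict mapping gene -> expression values over the same samples.
    """
    genes = list(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta
            for g in genes for h in genes}

expr = {  # hypothetical expression profiles over four samples
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
adj = soft_threshold_adjacency(expr)
print(round(adj[("g1", "g4")], 4))  # 0.2621, i.e. 0.8 ** 6
```

Raising correlations to a power keeps strong co-expression (g1 with g2, and in an unsigned network g1 with the perfectly anti-correlated g3) near 1 while shrinking moderate correlations, which is what lets modules emerge from clustering the adjacency.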


Subjects
Adipose Tissue/metabolism , Gene Expression Profiling , Gene Regulatory Networks , Metabolic Syndrome/genetics , Body Mass Index , Chemokines/genetics , Female , Genetic Loci , Genome-Wide Association Study , HLA-DRB1 Chains/genetics , Humans , Intercellular Signaling Peptides and Proteins , Metabolic Syndrome/pathology , Organ Specificity , Phenotype , Quantitative Trait Loci
5.
Malar J ; 13: 102, 2014 Mar 17.
Article in English | MEDLINE | ID: mdl-24636676

ABSTRACT

BACKGROUND: Reliable measures of anti-malarial resistance are crucial for malaria control. Resistance is typically a complex trait: multiple mutations in a single parasite (a haplotype or genotype) are necessary for elaboration of the resistant phenotype. The frequency of a genetic motif (proportion of parasite clones in the parasite population that carry a given allele, haplotype or genotype) is a useful measure of resistance. In areas of high endemicity, malaria patients generally harbour multiple parasite clones; they have multiplicities of infection (MOIs) greater than one. However, most standard experimental procedures only allow measurement of marker prevalence (proportion of patient blood samples that test positive for a given mutation or combination of mutations), not frequency. It is misleading to compare marker prevalence between sites that have different mean MOIs; frequencies are required instead. METHODS: A Bayesian statistical model was developed to estimate Plasmodium falciparum genetic motif frequencies from prevalence data collected in the field. To assess model performance and computational speed, a detailed simulation study was implemented. Application of the model was tested using datasets from five sites in Uganda. The datasets included prevalence data on markers of resistance to sulphadoxine-pyrimethamine and an average MOI estimate for each study site. RESULTS: The simulation study revealed that the genetic motif frequencies that were estimated using the model were more accurate and precise than conventional estimates based on direct counting. Importantly, the model did not require measurements of the MOI in each patient; it used the average MOI in the patient population. Furthermore, if a dataset included partially genotyped patient blood samples, the model imputed the data that were missing. Using the model and the Ugandan data, genotype frequencies were estimated and four biologically relevant genotypes were identified. 
CONCLUSIONS: The model allows fast, accurate, reliable estimation of the frequency of genetic motifs associated with resistance to anti-malarials using prevalence data collected from malaria patients. The model does not require per-patient MOI measurements and can easily analyse data from five markers. The model will be a valuable tool for monitoring markers of anti-malarial drug resistance, including markers of resistance to artemisinin derivatives and partner drugs.
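The point that prevalence and frequency diverge as MOI rises can be made concrete under a simple model. Assume each patient's MOI follows a zero-truncated Poisson distribution with parameter λ and that each clone independently carries a given allele with frequency p. The probability that a blood sample tests positive is then (1 - e^(-λp)) / (1 - e^(-λ)). The sketch below implements that formula and inverts it by bisection. It handles a single biallelic marker only and is not the authors' Bayesian model, which handles multi-marker genotypes and partially genotyped samples; the function names are hypothetical.

```python
from math import exp

def prevalence_from_frequency(p, lam):
    """P(sample carries >= 1 clone with the allele), assuming
    MOI ~ zero-truncated Poisson(lam) and independent clones with frequency p."""
    return (1 - exp(-lam * p)) / (1 - exp(-lam))

def mean_moi(lam):
    """Mean of a zero-truncated Poisson(lam) distribution."""
    return lam / (1 - exp(-lam))

def frequency_from_prevalence(prev, lam, tol=1e-12):
    """Invert prevalence_from_frequency by bisection (it increases in p)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prevalence_from_frequency(mid, lam) < prev:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two hypothetical sites with the same allele frequency but different mean MOI.
print(round(mean_moi(0.5), 2), round(prevalence_from_frequency(0.3, 0.5), 2))  # 1.27 0.35
print(round(mean_moi(3.0), 2), round(prevalence_from_frequency(0.3, 3.0), 2))  # 3.16 0.62
```

The same frequency of 0.3 produces very different prevalences at the two sites, which is why comparing raw prevalence between sites with different mean MOIs is misleading.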


Subjects
Drug Resistance , Gene Frequency , Malaria, Falciparum/parasitology , Plasmodium falciparum/genetics , Plasmodium falciparum/isolation & purification , Genotype , Haplotypes , Humans , Malaria, Falciparum/epidemiology , Models, Statistical , Plasmodium falciparum/classification , Plasmodium falciparum/drug effects , Prevalence , Uganda
6.
Methods ; 59(1): 71-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23079396

ABSTRACT

The stochastic nature of eukaryotic transcript generation challenges conventional methods for obtaining and analyzing single-cell gene expression data. To address the inherent noise, detailed methods are described for collecting data on multiple genes in a large number of single cells using microfluidic arrays. As part of a study exploring the effect of genotype on Wnt pathway activation, data were collected for 96 qPCR assays on 1440 lymphoblastoid cells. The description of methods includes preliminary data processing steps. The methods used in the collection and analysis of single-cell qPCR data are contrasted with those used in conventional qPCR.
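One convention often used for single-cell qPCR data (assumed here; the abstract does not spell out the preprocessing) converts each cycle threshold into Et = LoD - Ct. Larger values then mean higher expression, and reactions that fail to amplify before an assumed limit of detection (LoD) become 0, which is treated as "not detected" rather than as a low measurement. The LoD of 24 cycles, the function names, and the Ct values are hypothetical.

```python
def ct_to_et(ct_values, lod=24.0):
    """Convert raw Ct values to Et = LoD - Ct.

    Reactions with no amplification (None) or with Ct at or beyond the
    assumed limit of detection are mapped to 0 (not expressed).
    """
    return [0.0 if ct is None or ct >= lod else lod - ct for ct in ct_values]

def detection_rate(et_values):
    """Fraction of cells in which the gene was detected (Et > 0)."""
    return sum(1 for e in et_values if e > 0) / len(et_values)

cts = [18.5, None, 22.0, 25.3, 30.1, 19.75]  # hypothetical Ct values for one gene
et = ct_to_et(cts)
print(et)                  # [5.5, 0.0, 2.0, 0.0, 0.0, 4.25]
print(detection_rate(et))  # 0.5
```

Keeping the detection rate separate from the Et values of detected cells reflects the stochastic, on/off character of single-cell transcription, which conventional bulk qPCR averaging hides.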


Subjects
Gene Expression Profiling/methods , Lymphoid Progenitor Cells/metabolism , Real-Time Polymerase Chain Reaction , Single-Cell Analysis , Cell Line , Data Interpretation, Statistical , Humans , Limit of Detection , Reverse Transcriptase Polymerase Chain Reaction , Wnt Signaling Pathway
7.
PLoS Genet ; 7(9): e1002270, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21931564

ABSTRACT

We have performed a metabolite quantitative trait locus (mQTL) study of the ¹H nuclear magnetic resonance spectroscopy (¹H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, one of which consisted of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by ¹H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10(-11)

Subjects
Genome-Wide Association Study , Metabolic Networks and Pathways/genetics , Metabolome/genetics , Quantitative Trait Loci/genetics , Selection, Genetic , Acetyltransferases/genetics , Acetyltransferases/metabolism , Dimethylamines/blood , Dimethylamines/metabolism , Female , Haplotypes , Humans , Isobutyrates/metabolism , Isobutyrates/urine , Magnetic Resonance Spectroscopy , Methylamines/metabolism , Methylamines/urine , Polymorphism, Single Nucleotide
8.
BMC Bioinformatics ; 14 Suppl 13: S8, 2013.
Article in English | MEDLINE | ID: mdl-24267288

ABSTRACT

BACKGROUND: To better understand cancer as a complex disease with multiple genetic and epigenetic factors, it is vital to model the fundamental biological relationships among these alterations as well as their relationships with important clinical outcomes. METHODS: We develop an integrative network-based Bayesian analysis (iNET) approach that allows us to jointly analyze multi-platform high-dimensional genomic data in a computationally efficient manner. The iNET approach is formulated as an objective Bayesian model selection problem for Gaussian graphical models to model joint dependencies among platform-specific features using known biological mechanisms. Using both simulated datasets and a glioblastoma (GBM) study from The Cancer Genome Atlas (TCGA), we illustrate the iNET approach by integrating three data types: microRNA, gene expression (mRNA), and patient survival time. RESULTS: We show that the iNET approach has greater power in identifying cancer-related microRNAs than non-integrative approaches based on realistic simulated datasets. In the TCGA GBM study, we found many mRNA-microRNA pairs and microRNAs that are associated with patient survival time, some of which had been identified in previous studies. CONCLUSIONS: iNET discovers relationships among these variables that are consistent with the underlying biological mechanisms, and identifies important biomarkers that are potentially relevant to patient survival. In addition, we identified some microRNAs that can potentially affect patient survival but are missed by non-integrative approaches.
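In a Gaussian graphical model, an edge between two variables is absent exactly when their partial correlation, given the other variables, is zero. For three variables this has a closed form, sketched below with hypothetical correlations for one microRNA (Z) and two mRNAs (X, Y). This shows only what an edge means in such a model; it does not reproduce iNET's objective Bayesian model selection, and the function name is hypothetical.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y after conditioning on Z (Gaussian case)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical: X and Y are marginally correlated only through Z.
print(partial_correlation(r_xy=0.3, r_xz=0.6, r_yz=0.5))  # 0.0
```

Here the marginal X-Y correlation of 0.3 is fully explained by the shared microRNA, so the graph would contain the edges X-Z and Y-Z but no X-Y edge.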


Subjects
Bayes Theorem , Genomics/methods , Glioblastoma/genetics , Neoplasms/genetics , Systems Integration , Atlases as Topic , Computer Simulation , Humans , MicroRNAs/genetics , Normal Distribution , RNA, Messenger/genetics , Software , Survival Analysis
9.
Bioinformatics ; 28(22): 2981-2, 2012 Nov 15.
Article in English | MEDLINE | ID: mdl-22962342

ABSTRACT

SUMMARY: GREVE has been developed to assist with the identification of recurrent genomic aberrations across cancer samples. The exact characterization of such aberrations remains a challenge despite the increasing amount of data available, from SNP arrays to next-generation sequencing. Furthermore, genomic aberrations in cancer are especially difficult to handle because they are, by nature, unique to each patient. However, their recurrence in specific regions of the genome has been shown to reflect their relevance to tumor development. GREVE uses previously characterized events to identify such regions and to focus further analysis. AVAILABILITY: GREVE is available through a web interface and open-source application (http://www.well.ox.ac.uk/GREVE).
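The abstract does not describe GREVE's algorithm, so the following is only a generic sketch of the idea of recurrence: split each chromosome at every aberration boundary, count how many distinct samples cover each elementary segment, and report contiguous stretches that reach a chosen threshold. The input format, threshold, function name, and toy events are hypothetical.

```python
from collections import defaultdict

def recurrent_regions(aberrations, min_samples):
    """Return regions aberrated in at least `min_samples` distinct samples.

    aberrations: iterable of (sample_id, chrom, start, end), half-open [start, end).
    Contiguous qualifying segments on the same chromosome are merged.
    """
    by_chrom = defaultdict(list)
    for sample, chrom, start, end in aberrations:
        by_chrom[chrom].append((sample, start, end))
    regions = []
    for chrom in sorted(by_chrom):
        events = by_chrom[chrom]
        # every start/end is a potential boundary of an elementary segment
        cuts = sorted({pos for _, start, end in events for pos in (start, end)})
        for left, right in zip(cuts, cuts[1:]):
            # count distinct samples, so overlapping events in one sample count once
            samples = {smp for smp, start, end in events if start <= left and right <= end}
            if len(samples) < min_samples:
                continue
            if regions and regions[-1][0] == chrom and regions[-1][2] == left:
                regions[-1] = (chrom, regions[-1][1], right)  # extend the open region
            else:
                regions.append((chrom, left, right))
    return regions

events = [  # hypothetical aberration calls
    ("s1", "chr8", 100, 500), ("s2", "chr8", 300, 700),
    ("s3", "chr8", 350, 450), ("s3", "chr8", 460, 480),
    ("s1", "chr17", 10, 50),
]
print(recurrent_regions(events, 2))  # [('chr8', 300, 500)]
print(recurrent_regions(events, 3))  # [('chr8', 350, 450), ('chr8', 460, 480)]
```

Counting distinct samples rather than events matters here: sample s3 has two nearby calls, but it should add only one to the recurrence count of any segment.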


Subjects
Chromosome Aberrations , Genome, Human , Neoplasms/genetics , Software , Chromosome Breakpoints , Humans
10.
Mol Syst Biol ; 7: 525, 2011 Aug 30.
Article in English | MEDLINE | ID: mdl-21878913

ABSTRACT

¹H Nuclear Magnetic Resonance spectroscopy (¹H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired ¹H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in ¹H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect ¹H NMR-based biomarkers quantifying predisposition to disease.
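With repeated samples per individual, the split between stable (between-individual) and longitudinally unstable (within-individual) variance can be estimated with a one-way random-effects ANOVA. The sketch below is a simplified two-level version that lumps the familial and individual-environmental components together, which corresponds to the "stable" share quoted above; it ignores the twin structure used in the study, assumes a balanced design, and uses a hypothetical function name and data.

```python
def stable_fraction(measurements):
    """Share of variance that is stable between individuals.

    measurements: list of per-individual lists, each with the same number n >= 2
    of repeated (longitudinal) measurements of one metabolite; needs >= 2 individuals.
    Balanced one-way random-effects ANOVA estimators:
        sigma_w^2 = MSW,  sigma_b^2 = max(0, (MSB - MSW) / n).
    """
    k = len(measurements)
    n = len(measurements[0])
    means = [sum(m) / n for m in measurements]
    grand = sum(means) / k
    msb = n * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((x - mu) ** 2 for m, mu in zip(measurements, means) for x in m) / (k * (n - 1))
    sigma_b2 = max(0.0, (msb - msw) / n)  # between-individual (stable) variance
    total = sigma_b2 + msw                # msw estimates within-individual variance
    return sigma_b2 / total if total > 0 else 0.0

# Hypothetical concentrations: three individuals, two visits each.
print(stable_fraction([[1, 2], [3, 4], [5, 6]]))  # about 0.88
```

A metabolite whose stable fraction is low varies mostly from visit to visit, so a single measurement says little about an individual's long-term level; this is the reasoning behind the implications for biomarker-study design.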


Subjects
Biomarkers , Gene-Environment Interaction , Metabolome/genetics , Nuclear Magnetic Resonance, Biomolecular/methods , Systems Biology/methods , White People/genetics , Aged , Algorithms , Biomarkers/blood , Biomarkers/urine , Databases, Genetic , Female , Genetic Variation , Humans , Middle Aged , Models, Statistical , Research Design , Sample Size , Twins, Dizygotic/genetics , Twins, Monozygotic/genetics
11.
Proc Natl Acad Sci U S A ; 106(18): 7559-64, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19376968

ABSTRACT

We have cultured Plasmodium falciparum directly from the blood of infected individuals to examine patterns of mature-stage gene expression in patient isolates. Analysis of the transcriptome of P. falciparum is complicated by the highly periodic nature of gene expression because small variations in the stage of parasite development between samples can lead to an apparent difference in gene expression values. To address this issue, we have developed statistical likelihood-based methods to estimate cell cycle progression and commitment to asexual or sexual development lineages in our samples based on microscopy and gene expression patterns. In cases subsequently matched for temporal development, we find that transcriptional patterns in ex vivo culture display little variation across patients with diverse clinical profiles and closely resemble transcriptional profiles that occur in vitro. These statistical methods, available to the research community, assist in the design and interpretation of P. falciparum expression profiling experiments where it is difficult to separate true differential expression from cell-cycle dependent expression. We reanalyze an existing dataset of in vivo patient expression profiles and conclude that previously observed discrete variation is consistent with the commitment of a varying proportion of the parasite population to the sexual development lineage.


Subjects
Cell Cycle , Gene Expression Profiling , Plasmodium falciparum/growth & development , Plasmodium falciparum/genetics , Animals , Cell Cycle/genetics , Cells, Cultured , Humans
12.
J Proteome Res ; 10(12): 5562-7, 2011 Dec 02.
Article in English | MEDLINE | ID: mdl-22010953

ABSTRACT

In biomarker discovery studies, uncertainty associated with case and control labels is often overlooked. If label uncertainty is not taken into account, model parameters and the predictive risk can become biased, sometimes severely. The most common situation is when the control set contains an unknown number of undiagnosed, or future, cases. This has a marked impact in situations where the model needs to be well calibrated, e.g., when the prediction performance of a biomarker panel is evaluated. Failing to account for class label uncertainty may lead to underestimation of classification performance and bias in parameter estimates. This can further affect meta-analyses that combine evidence from multiple studies. Using a simulation study, we outline how conventional statistical models can be modified to address class label uncertainty, leading to well-calibrated prediction performance estimates and reduced bias in meta-analysis. We focus on the problem of mislabeled control subjects in case-control studies, i.e., when some of the control subjects are undiagnosed cases, although the procedures we report are generic. Uncertainty in control status is particularly common in biomarker discovery studies in genomic and molecular epidemiology, where control subjects are commonly sampled from the general population with an established expected disease incidence rate.
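A simple moment correction shows why contaminated controls make a biomarker look worse than it is. Suppose a known fraction π of labelled controls are in fact undiagnosed cases (for example, derived from the expected incidence rate), labelled cases are correct, and the biomarker is positive with probability p1 in true cases. The observed positive rate among labelled controls is then (1 - π)p0 + πp1, which can be solved for the true control rate p0. This sketch, its function name, and its numbers are illustrative assumptions, not the modified models described in the paper.

```python
def corrected_control_rate(p0_obs, p1, pi):
    """Positive rate among true controls, given contamination fraction pi.

    Solves p0_obs = (1 - pi) * p0 + pi * p1 for p0.
    """
    if not 0 <= pi < 1:
        raise ValueError("pi must be in [0, 1)")
    return (p0_obs - pi * p1) / (1 - pi)

# Hypothetical: 20% of labelled controls test positive, 80% of cases do,
# and 5% of labelled controls are undiagnosed cases.
apparent_specificity = 1 - 0.2
corrected_specificity = 1 - corrected_control_rate(0.2, 0.8, 0.05)
print(apparent_specificity, round(corrected_specificity, 3))  # 0.8 0.832
```

The apparent specificity of 0.80 understates the corrected value of about 0.83, which is the underestimation of classification performance described in the abstract.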


Subjects
Bias , Biomarkers/chemistry , Case-Control Studies , Algorithms , Biomarkers/analysis , Computer Simulation , Humans , Logistic Models , Meta-Analysis as Topic , ROC Curve , Reproducibility of Results , Risk Factors , Uncertainty
13.
Genet Epidemiol ; 34(4): 299-308, 2010 May.
Article in English | MEDLINE | ID: mdl-20025065

ABSTRACT

Standard techniques for single marker quantitative trait mapping perform poorly in detecting complex interacting genetic influences. When a genetic marker interacts with other genetic markers and/or environmental factors to influence a quantitative trait, a sample of individuals will show different effects according to their exposure to other interacting factors. This paper presents a Bayesian mixture model, which effectively models heterogeneous genetic effects apparent at a single marker. We compute approximate Bayes factors which provide an efficient strategy for screening genetic markers (genome-wide) for evidence of a heterogeneous effect on a quantitative trait. We present a simulation study which demonstrates that the approximation is good and provide a real data example which identifies a population-specific genetic effect on gene expression in the HapMap CEU and YRI populations. We advocate the use of the model as a strategy for identifying candidate interacting markers without any knowledge of the nature or order of the interaction. The source of heterogeneity can be modeled as an extension.


Subjects
Genetic Loci , Models, Statistical , Quantitative Trait Loci , Algorithms , Alleles , Bayes Theorem , Computer Simulation , Environment , Genetic Markers , Genotype , Humans , Models, Genetic , Odds Ratio , Software
14.
Proteome Sci ; 9: 73, 2011 Nov 17.
Article in English | MEDLINE | ID: mdl-22093360

ABSTRACT

BACKGROUND: The advent of affinity-based proteomics technologies for global protein profiling provides the prospect of finding new molecular biomarkers for common, multifactorial disorders. The molecular phenotypes obtained from studies on such platforms are driven by multiple sources, including genetic, environmental, and experimental components. Characterizing the contribution of different sources of variation to the measured phenotypes aims to facilitate the design and interpretation of future biomedical studies employing exploratory and multiplexed technologies. Biometrical genetic modelling of twin or other family data can be used to decompose the variation underlying a phenotype into biological and experimental components. RESULTS: Using antibody suspension bead arrays and antibodies from the Human Protein Atlas, we study unfractionated serum from a longitudinal study of 154 twins. We provide a detailed description of how the variation in a molecular phenotype, in terms of protein profile, can be decomposed into familial (i.e., genetic and common environmental), individual-environmental, short-term biological, and experimental components. Across the 69 antibodies analyzed in the study, the median proportion of the total variation explained by familial sources is 12% (IQR 1-22%), and the median proportion attributable to experimental sources is 63% (IQR 53-72%). CONCLUSION: The variability analysis of antibody arrays highlights the importance of considering variability components and their relative contributions when designing and evaluating biomarker discovery studies that use exploratory, high-throughput, and multiplexed methods.

15.
Elife ; 10, 2021 07 06.
Article in English | MEDLINE | ID: mdl-34225842

ABSTRACT

Severe falciparum malaria has substantially affected human evolution. Genetic association studies of patients with clinically defined severe malaria and matched population controls have helped characterise human genetic susceptibility to severe malaria, but phenotypic imprecision compromises discovered associations. In areas of high malaria transmission, the diagnosis of severe malaria in young children and, in particular, the distinction from bacterial sepsis are imprecise. We developed a probabilistic diagnostic model of severe malaria using platelet and white blood cell count data. Under this model, we re-analysed clinical and genetic data from 2220 Kenyan children with clinically defined severe malaria and 3940 population controls, adjusting for phenotype mis-labelling. Our model, validated by the distribution of sickle trait, estimated that approximately one-third of cases did not have severe malaria. We propose a data-tilting approach for case-control studies with phenotype mis-labelling and show that this reduces false discovery rates and improves statistical power in genome-wide association studies.
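The cost of case mis-labelling can be shown with simple arithmetic. If a fraction π of recruited "cases" do not have severe malaria and resemble population controls at a given locus, the allele frequency observed among cases is (1 - π)f_case + πf_control, so the case-control difference shrinks by the factor (1 - π). The sketch below uses π = 1/3, echoing the estimate above, with hypothetical allele frequencies that are not taken from the study; it does not implement the paper's data-tilting approach, and the function name is hypothetical.

```python
def diluted_case_frequency(f_case, f_control, misdiagnosed):
    """Allele frequency observed among recruited 'cases' when a fraction
    `misdiagnosed` of them lack the disease and resemble population controls."""
    return (1 - misdiagnosed) * f_case + misdiagnosed * f_control

# Hypothetical allele frequencies for a protective variant.
f_case, f_control = 0.02, 0.08
observed = diluted_case_frequency(f_case, f_control, misdiagnosed=1 / 3)
print(round(observed, 3))               # 0.04
print(round(f_control - observed, 3))   # 0.04, versus a true difference of 0.06
```

A one-third shrinkage of the effect requires a substantially larger sample for the same power, which is why down-weighting probable non-malaria cases can improve genome-wide association studies.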


In areas of sub-Saharan Africa where malaria is common, most people are frequently exposed to the bites of mosquitoes carrying malaria parasites, so they often have malaria parasites in their blood. Young children, who have not yet built up strong immunity against malaria, often fall ill with severe malaria, a life-threatening disease. It is unclear why some children develop severe malaria and die, while other children with high numbers of parasites in their blood do not develop any apparent symptoms. Genetic susceptibility studies are designed to uncover why such differences exist by comparing individuals with severe malaria (referred to as 'cases') with individuals drawn from the general population (known as 'controls'). But severe malaria can be a challenge to diagnose. Since high numbers of malaria parasites can be found in healthy children, it is sometimes difficult to determine whether the parasites are making a child ill, or whether they are a coincidental finding. Consequently, some of the 'cases' recruited into these studies may actually have a different disease, such as bacterial sepsis. This ultimately affects how the studies are interpreted, and introduces error and inaccuracy into the data. Watson, Ndila et al. investigated whether measuring blood biomarkers in patients (derived from the complete blood count, including platelet counts and white blood cell counts) could improve the accuracy with which malaria is diagnosed. They developed a new mathematical model that incorporates platelet and white blood cell counts. This model estimates that in a large cohort of 2,220 Kenyan children diagnosed with severe malaria, around one third of enrolled children did not actually have this disease. Further analysis suggests that patients with severe malaria are highly unlikely to have platelet counts higher than 200,000 per microlitre. This defines a cut-off that researchers can use to avoid recruiting patients who do not have severe malaria in future studies. 
Additionally, the ability to diagnose severe malaria more accurately can make it easier to detect and treat other diseases with similar symptoms in children with high numbers of malaria parasites in their blood. Watson, Ndila et al.'s findings support the recommendation that all children with suspected malaria be given broad spectrum antibiotics, as many misdiagnosed children will likely have bacterial sepsis. It also suggests that using complete blood counts, which are cheap to obtain and increasingly available in low-resource settings, could improve diagnostic accuracy in future clinical studies of severe malaria. This could ultimately improve the ability of these studies to find new treatments for this life-threatening disease.


Subjects
Genome-Wide Association Study , Malaria , Phenotype , Adolescent , Adult , Case-Control Studies , Child , Child, Preschool , Extracellular Matrix Proteins/genetics , Female , Genomics , Humans , Kenya , Malaria/diagnosis , Malaria/epidemiology , Malaria, Falciparum , Male , Polymorphism, Genetic
16.
Bioinformatics ; 25(2): 197-203, 2009 Jan 15.
Article in English | MEDLINE | ID: mdl-19028720

ABSTRACT

MOTIVATION: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption ignores recombination, a fundamental process for generating diversity, and can lead to spurious results. Recombination induces a localized phylogenetic structure that may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenetic topology underlying the relatedness at a genomic location. The number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to integrate analytically over all possible changepoints in topologies as well as all the unknown branch lengths. RESULTS: We demonstrate our approach on simulated data, on the genome of a suspected HIV recombinant strain, and in an investigation of recombination in the sequences of 15 laboratory mouse strains sequenced by Perlegen Sciences. Our findings indicate that our method distinguishes between rate heterogeneity and variation in phylogeny caused by recombination without being restricted to 4-taxa data.
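Integrating over all changepoints in an HMM rests on the forward algorithm: summing over every hidden-state path is the same as summing over every placement of topology changes along the alignment. The sketch below shows that recursion for a fixed set of candidate topologies whose per-column log-likelihoods are assumed to be precomputed (for example, by Felsenstein's pruning algorithm). It omits rate heterogeneity, branch-length integration, and the MCMC over the number of topologies described above; the function names and inputs are hypothetical.

```python
from math import exp, log

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))

def forward_loglik(site_loglik, log_init, log_trans):
    """Log-likelihood of an alignment under an HMM over topologies.

    site_loglik[t][k]: log P(column t | topology k), assumed precomputed.
    log_init[k]: log initial probability of topology k.
    log_trans[j][k]: log probability of switching from topology j to k.
    """
    K = len(log_init)
    alpha = [log_init[k] + site_loglik[0][k] for k in range(K)]
    for col in site_loglik[1:]:
        # alpha[k] sums over every path (i.e. every set of changepoints) ending in k
        alpha = [
            logsumexp([alpha[j] + log_trans[j][k] for j in range(K)]) + col[k]
            for k in range(K)
        ]
    return logsumexp(alpha)

# Two hypothetical topologies over three alignment columns.
sites = [[-1.0, -2.0], [-1.5, -0.5], [-2.0, -1.0]]
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.2), log(0.8)]]
print(forward_loglik(sites, init, trans))
```

The cost grows linearly with alignment length and quadratically with the number of topologies, instead of exponentially with the number of possible changepoint configurations.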


Subjects
Phylogeny , Recombination, Genetic/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Bayes Theorem , Genome , Markov Chains , Mice , Selection, Genetic
17.
Bioinformatics ; 25(22): 2929-36, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19696047

ABSTRACT

MOTIVATION: Identifying the network structure through which genes and their products interact can help to elucidate normal cell physiology as well as the genetic architecture of pathological phenotypes. Recently, a number of gene network inference tools based on Gaussian graphical model representations have appeared. Following this, we introduce a novel Boosting approach to learn the structure of a high-dimensional Gaussian graphical model, motivated by applications in genomics. Particular emphasis is placed on the inclusion of partial prior knowledge on the structure of the graph. With the increasing availability of pathway information and large-scale gene expression datasets, we believe that conditioning on prior knowledge will be an important aspect in raising the statistical power of structural learning algorithms to infer true conditional dependencies. RESULTS: Our Boosting approach, termed BoostiGraph, is conceptually and algorithmically simple. It complements recent work on the network inference problem based on Lasso-type approaches. BoostiGraph is computationally cheap and is applicable to very high-dimensional graphs. For example, on graphs of order 5000 nodes, it is able to map out paths for the conditional independence structure in a few minutes. Using computer simulations, we investigate the ability of our method, with and without prior information, to infer Gaussian graphical models from artificial as well as actual microarray datasets. The experimental results demonstrate that, using our method, it is possible to recover the true network topology with relatively high accuracy. AVAILABILITY: This method and all other associated files are freely available from http://www.stats.ox.ac.uk/~anjum/.


Subjects
Computational Biology/methods , Gene Regulatory Networks , Computer Simulation , Gene Expression Profiling/methods , Models, Statistical , Pattern Recognition, Automated , Proteome/genetics
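The abstract describes BoostiGraph only at a high level. As a rough illustration of the general idea (boosting-based neighbourhood selection for a Gaussian graphical model), the sketch below runs componentwise L2 boosting of each variable on all the others and joins the selected neighbourhoods with the OR rule. The step size `nu`, the stopping threshold `min_gain`, and the `prior`/`prior_bonus` mechanism for favouring edges supplied by prior knowledge are hypothetical choices made for this example; they are not taken from the published algorithm.

```python
import math


def standardize(col):
    """Centre a column and scale it to unit Euclidean norm."""
    mean = sum(col) / len(col)
    centred = [v - mean for v in col]
    norm = math.sqrt(sum(v * v for v in centred))
    return [v / norm for v in centred] if norm > 0 else centred


def boost_neighbours(data, j, steps=50, nu=0.3, min_gain=1e-6,
                     prior=frozenset(), prior_bonus=1.0):
    """Componentwise L2 boosting of column j on every other column.

    data: list of standardized columns (equal-length lists of floats).
    prior, prior_bonus: hypothetical way to favour edges from prior knowledge;
    the fit improvement of a column in `prior` is multiplied by prior_bonus.
    Returns the set of column indices with a non-zero boosted coefficient.
    """
    resid = list(data[j])
    coef = {}
    for _ in range(steps):
        best_k, best_gain, best_beta = None, 0.0, 0.0
        for k, xk in enumerate(data):
            if k == j:
                continue
            # For a unit-norm column the least-squares coefficient on the
            # residual is <x_k, r>, and a full step cuts the RSS by beta**2.
            beta = sum(a * b for a, b in zip(xk, resid))
            gain = beta * beta * (prior_bonus if k in prior else 1.0)
            if gain > best_gain:
                best_k, best_gain, best_beta = k, gain, beta
        if best_k is None or best_gain < min_gain:
            break  # hypothetical early-stopping rule
        step = nu * best_beta  # shrunken (weak-learner) update
        coef[best_k] = coef.get(best_k, 0.0) + step
        resid = [r - step * x for r, x in zip(resid, data[best_k])]
    return {k for k, c in coef.items() if c != 0.0}


def infer_edges(data, **kwargs):
    """Join per-node neighbourhoods with the OR rule into an undirected edge set."""
    edges = set()
    for j in range(len(data)):
        for k in boost_neighbours(data, j, **kwargs):
            edges.add((min(j, k), max(j, k)))
    return edges


if __name__ == "__main__":
    # Toy data: columns 0 and 1 are strongly correlated; column 2 is orthogonal to both.
    cols = [standardize(v) for v in ([1, -1, 1, -1], [1.1, -0.9, 0.9, -1.1], [1, -1, -1, 1])]
    print(infer_edges(cols))
```

Shrinking each update by `nu` and stopping early is what keeps the selected neighbourhoods sparse; using the AND rule instead of OR (keeping an edge only if both regressions select it) is the usual, more conservative alternative.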
18.
Trials ; 21(1): 865, 2020 Oct 20.
Article in English | MEDLINE | ID: mdl-33081827

ABSTRACT

An amendment to this paper has been published and can be accessed via the original article.

19.
Trials ; 21(1): 386, 2020 May 07.
Article in English | MEDLINE | ID: mdl-32381030

ABSTRACT

BACKGROUND: Exploration and modelling of heterogeneous treatment effects as a function of baseline covariates is an important aspect of precision medicine in randomised controlled trials (RCTs). Randomisation generally guarantees the internal validity of an RCT, but heterogeneity in treatment effect can reduce external validity. Estimation of heterogeneous treatment effects is usually done via a predictive model for individual outcomes, where one searches for interactions between treatment allocation and important patient baseline covariates. However, such models are prone to overfitting and multiple testing and typically demand a transformation of the outcome measurement, for example, from the absolute risk in the original RCT to log-odds of risk in the predictive model. METHODS: We show how reference classes derived from baseline covariates can be used to explore heterogeneous treatment effects via a two-stage approach. We first estimate a risk score which captures on a single dimension some of the heterogeneity in outcomes of the trial population. Heterogeneity in the treatment effect can then be explored via reweighting schemes along this axis of variation. This two-stage approach bypasses the search for interactions with multiple covariates, thus protecting against multiple testing. It also allows for exploration of heterogeneous treatment effects on the original outcome scale of the RCT. This approach would typically be applied to multivariable models of baseline risk to assess the stability of average treatment effects with respect to the distribution of risk in the population studied. CASE STUDY: We illustrate this approach using the single largest randomised treatment trial in severe falciparum malaria and demonstrate how the estimated treatment effect in terms of absolute mortality risk reduction increases considerably in higher risk strata. 
CONCLUSIONS: 'Local' and 'tilting' reweighting schemes based on ranking patients by baseline risk can be used as a general approach for exploring, graphing and reporting heterogeneity of treatment effect in RCTs. TRIAL REGISTRATION: ISRCTN clinical trials registry: ISRCTN50258054. Prospectively registered on 22 July 2005.


Subjects
Forecasting/methods , Malaria, Falciparum/therapy , Research Design/trends , Algorithms , Humans , Malaria, Falciparum/mortality , Mortality , Outcome Assessment, Health Care , Precision Medicine , Predictive Value of Tests , Randomized Controlled Trials as Topic , Risk Reduction Behavior , Therapeutic Uses
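The abstract names 'local' and 'tilting' reweighting schemes based on ranking patients by baseline risk, but does not define them. The sketch below is a minimal illustration under stated assumptions. The stage-one risk score is taken as given. 'Local' is assumed to mean uniform weight inside a window of risk quantiles. 'Tilting' is assumed to mean an exponential tilt exp(theta * u) in the risk quantile u. The function names and both weight formulas are hypothetical. The second stage reports a weighted absolute risk reduction on the trial's original (probability) scale.

```python
import math


def risk_quantiles(risk):
    """Map each patient's baseline risk score to its mid-rank quantile in (0, 1)."""
    n = len(risk)
    order = sorted(range(n), key=lambda i: risk[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n
    return u


def local_weights(risk, centre, width):
    """Hypothetical 'local' scheme: weight 1 inside a window of risk quantiles, 0 outside."""
    return [1.0 if abs(u - centre) <= width / 2 else 0.0 for u in risk_quantiles(risk)]


def tilted_weights(risk, theta):
    """Hypothetical 'tilting' scheme: exp(theta * u); theta > 0 up-weights high-risk patients."""
    return [math.exp(theta * u) for u in risk_quantiles(risk)]


def weighted_arr(outcome, treated, weights):
    """Weighted absolute risk reduction: control event rate minus treated event rate."""
    def event_rate(arm):
        pairs = [(w, y) for w, y, t in zip(weights, outcome, treated) if t == arm]
        total = sum(w for w, _ in pairs)
        if total == 0:
            raise ValueError(f"no weight on arm {arm}")
        return sum(w * y for w, y in pairs) / total
    return event_rate(0) - event_rate(1)
```

Sliding `centre` across (0, 1), or varying `theta`, traces how the estimated risk reduction changes along the single risk axis. Because each point is still a difference in event rates, the curve stays on the absolute-risk scale without fitting treatment-by-covariate interactions.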
20.
Trials ; 21(1): 156, 2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32041653

ABSTRACT

BACKGROUND: Retrospective exploratory analyses of randomised controlled trials (RCTs) seeking to identify treatment effect heterogeneity (TEH) are prone to bias and false positives. Yet the desire to learn all we can from exhaustive data measurements on trial participants motivates the inclusion of such analyses within RCTs. Moreover, widespread advances in machine learning (ML) methods hold potential to utilise such data to identify subjects exhibiting heterogeneous treatment response. METHODS: We present a novel analysis strategy for detecting TEH in randomised data using ML methods, whilst ensuring proper control of the false positive discovery rate. Our approach uses random data partitioning with statistical or ML-based prediction on held-out data. This method can test for both crossover TEH (switch in optimal treatment) and non-crossover TEH (systematic variation in benefit across patients). The former is done via a two-sample hypothesis test measuring overall predictive performance. The latter is done via 'stacking' the ML predictors alongside a classical statistical model to formally test the added benefit of the ML algorithm. An adaptation of recent statistical theory allows for the construction of a valid aggregate p value. This testing strategy is independent of the choice of ML method. RESULTS: We demonstrate our approach with a re-analysis of the SEAQUAMAT trial, which compared quinine to artesunate for the treatment of severe malaria in Asian adults. We find no evidence for any subgroup who would benefit from a change in treatment from the current standard of care, artesunate, but strong evidence for significant TEH within the artesunate treatment group. In particular, we find that artesunate provides a differential benefit to patients with high numbers of circulating ring stage parasites. 
CONCLUSIONS: ML analysis plans using computational notebooks (documents linked to a programming language that capture the model parameter settings, data processing choices, and evaluation criteria) along with version control can improve the robustness and transparency of RCT exploratory analyses. A data-partitioning algorithm allows researchers to apply the latest ML techniques safe in the knowledge that any declared associations are statistically significant at a user-defined level.


Subjects
Antimalarials/therapeutic use , Artesunate/therapeutic use , Machine Learning , Malaria, Falciparum/drug therapy , Plasmodium falciparum/drug effects , Quinine/therapeutic use , Randomized Controlled Trials as Topic , Adult , Algorithms , Asia/epidemiology , Humans , Malaria, Falciparum/epidemiology , Malaria, Falciparum/parasitology , Retrospective Studies , Treatment Outcome
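The abstract does not spell out how results from the repeated data partitions are combined into one p value. A well-known device for this is the fixed-gamma quantile rule of Meinshausen, Meier and Bühlmann (2009, "P-values for high-dimensional regression"). For a fixed gamma in (0, 1), it reports min(1, empirical gamma-quantile of p_b / gamma) over the per-split p values p_b. The sketch below assumes that rule for illustration only; the paper's own "adaptation" may differ. `split_test` is a hypothetical placeholder for whichever held-out test is run on each split, for example the crossover or stacking test described in the abstract.

```python
import math
import random


def quantile_aggregate(pvalues, gamma=0.5):
    """Combine p-values from B random data splits into one p-value.

    Fixed-gamma quantile rule: Q(gamma) = min(1, q_gamma({p_b / gamma})),
    where q_gamma is the empirical gamma-quantile (inverse ECDF).
    """
    if not pvalues or not 0 < gamma <= 1:
        raise ValueError("need at least one p-value and 0 < gamma <= 1")
    scaled = sorted(p / gamma for p in pvalues)
    k = math.ceil(gamma * len(scaled)) - 1
    return min(1.0, scaled[k])


def split_pvalues(n, split_test, n_splits=20, seed=0):
    """Repeatedly partition indices 0..n-1 into a training half and a held-out half.

    split_test(train_idx, test_idx) is a hypothetical user-supplied function that
    fits a model on the training half and returns a p-value computed only on the
    held-out half, so each per-split p-value is valid on its own.
    """
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        half = n // 2
        pvals.append(split_test(idx[:half], idx[half:]))
    return pvals
```

Dividing by gamma is the price paid for not fixing a single split in advance: reporting the median (gamma = 0.5) of many split p values doubles them, which guards against cherry-picking a lucky partition.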