1.
Nat Rev Genet ; 19(6): 357-370, 2018 06.
Article in English | MEDLINE | ID: mdl-29626206

ABSTRACT

We are entering a new era of mouse phenomics, driven by large-scale and economical generation of mouse mutants coupled with increasingly sophisticated and comprehensive phenotyping. These studies are generating large, multidimensional gene-phenotype data sets, which are shedding new light on the mammalian genome landscape and revealing many hitherto unknown features of mammalian gene function. Moreover, these phenome resources provide a wealth of disease models and can be integrated with human genomics data as a powerful approach for the interpretation of human genetic variation and its relationship to disease. In the future, the development of novel phenotyping platforms allied to improved computational approaches, including machine learning, for the analysis of phenotype data will continue to enhance our ability to develop a comprehensive and powerful model of mammalian gene-phenotype space.


Subject(s)
Databases, Genetic , Genetic Variation , Genome , Genomics/methods , Animals , Humans , Mice
2.
Philos Trans A Math Phys Eng Sci ; 381(2247): 20220143, 2023 May 15.
Article in English | MEDLINE | ID: mdl-36970832

ABSTRACT

In this paper, we start by reviewing exchangeability and its relevance to the Bayesian approach. We highlight the predictive nature of Bayesian models and the symmetry assumptions implied by beliefs of an underlying exchangeable sequence of observations. By taking a closer look at the Bayesian bootstrap, the parametric bootstrap of Efron and a version of Bayesian thinking about inference uncovered by Doob based on martingales, we introduce a parametric Bayesian bootstrap. Martingales play a fundamental role. Illustrations are presented as is the relevant theory. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
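
As a rough illustration of the Bayesian bootstrap mentioned in the abstract above, the sketch below draws posterior samples of a mean by reweighting the observed data with Dirichlet(1, ..., 1) weights. This is Rubin's classical nonparametric scheme, not the parametric Bayesian bootstrap that the paper introduces. The data values, the function name, and the number of draws are assumptions made purely for illustration.

```python
import random


def bayesian_bootstrap_mean(data, n_draws=2000, seed=0):
    """Return posterior draws of the mean under the classical Bayesian bootstrap."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights: normalise independent Exp(1) variables.
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        # Each draw is a weighted mean of the observed values.
        draws.append(sum(gi * x for gi, x in zip(g, data)) / total)
    return draws


observations = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # made-up data, illustration only
draws = sorted(bayesian_bootstrap_mean(observations))
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
print(f"95% credible interval for the mean: ({lo:.2f}, {hi:.2f})")
```

Because every draw is a weighted average of the observed values, the posterior never places mass outside the range of the data, which is one of the symmetry-driven features of this predictive style of inference.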

3.
PLoS Genet ; 16(10): e1009037, 2020 10.
Article in English | MEDLINE | ID: mdl-33035220

ABSTRACT

Genetic surveillance of malaria parasites supports malaria control programmes, treatment guidelines and elimination strategies. Surveillance studies often pose questions about malaria parasite ancestry (e.g. how antimalarial resistance has spread) and employ statistical methods that characterise parasite population structure. Many of the methods used to characterise structure are unsupervised machine learning algorithms which depend on a genetic distance matrix, notably principal coordinates analysis (PCoA) and hierarchical agglomerative clustering (HAC). PCoA and HAC are sensitive to both the definition of genetic distance and algorithmic specification. Importantly, neither algorithm infers malaria parasite ancestry. As such, PCoA and HAC can inform (e.g. via exploratory data visualisation and hypothesis generation), but not answer comprehensively, key questions about malaria parasite ancestry. We illustrate the sensitivity of PCoA and HAC using 393 Plasmodium falciparum whole genome sequences collected from Cambodia and neighbouring regions (where antimalarial resistance has emerged and spread recently) and we provide tentative guidance for the use and interpretation of PCoA and HAC in malaria parasite genetic epidemiology. This guidance includes a call for fully transparent and reproducible analysis pipelines that feature (i) a clearly outlined scientific question; (ii) a clear justification of analytical methods used to answer the scientific question along with discussion of any inferential limitations; (iii) publicly available genetic distance matrices when downstream analyses depend on them; and (iv) sensitivity analyses. To bridge the inferential disconnect between the output of non-inferential unsupervised learning algorithms and the scientific questions of interest, tailor-made statistical models are needed to infer malaria parasite ancestry. In the absence of such models speculative reasoning should feature only as discussion but not as results.


Subject(s)
Genetics, Population/statistics & numerical data , Malaria, Falciparum/epidemiology , Molecular Epidemiology , Plasmodium falciparum/genetics , Algorithms , Antimalarials/therapeutic use , Cambodia/epidemiology , Cluster Analysis , Drug Resistance/genetics , Genotype , Humans , Malaria, Falciparum/drug therapy , Malaria, Falciparum/genetics , Malaria, Falciparum/parasitology , Plasmodium falciparum/pathogenicity , Unsupervised Machine Learning
4.
PLoS Genet ; 8(2): e1002505, 2012.
Article in English | MEDLINE | ID: mdl-22383892

ABSTRACT

Metabolic Syndrome (MetS) is highly prevalent and has considerable public health impact, but its underlying genetic factors remain elusive. To identify gene networks involved in MetS, we conducted whole-genome expression and genotype profiling on abdominal (ABD) and gluteal (GLU) adipose tissue, and whole blood (WB), from 29 MetS cases and 44 controls. Co-expression network analysis for each tissue independently identified nine, six, and zero MetS-associated modules of coexpressed genes in ABD, GLU, and WB, respectively. Of 8,992 probesets expressed in ABD or GLU, 685 (7.6%) were expressed in ABD and 51 (0.6%) in GLU only. Differential eigengene network analysis of 8,256 shared probesets detected 22 shared modules with high preservation across adipose depots (D(ABD-GLU) = 0.89), seven of which were associated with MetS (FDR P<0.01). The strongest associated module, significantly enriched for immune response-related processes, contained 94/620 (15%) genes with inter-depot differences. In an independent cohort of 145/141 twins with ABD and WB longitudinal expression data, median variability in ABD due to familiality was greater for MetS-associated versus un-associated modules (ABD: 0.48 versus 0.18, P = 0.08; GLU: 0.54 versus 0.20, P = 7.8×10⁻⁴). Cis-eQTL analysis of probesets associated with MetS (FDR P<0.01) and/or inter-depot differences (FDR P<0.01) provided evidence for 32 eQTLs. Corresponding eSNPs were tested for association with MetS-related phenotypes in two GWAS of >100,000 individuals; rs10282458, affecting expression of RARRES2 (encoding chemerin), was associated with body mass index (BMI) (P = 6.0×10⁻⁴); and rs2395185, affecting inter-depot differences of HLA-DRB1 expression, was associated with high-density lipoprotein (P = 8.7×10⁻⁴) and BMI-adjusted waist-to-hip ratio (P = 2.4×10⁻⁴).
Since many genes and their interactions influence complex traits such as MetS, integrated analysis of genotypes and coexpression networks across multiple tissues relevant to clinical traits is an efficient strategy to identify novel associations.


Subject(s)
Adipose Tissue/metabolism , Gene Expression Profiling , Gene Regulatory Networks , Metabolic Syndrome/genetics , Body Mass Index , Chemokines/genetics , Female , Genetic Loci , Genome-Wide Association Study , HLA-DRB1 Chains/genetics , Humans , Intercellular Signaling Peptides and Proteins , Metabolic Syndrome/pathology , Organ Specificity , Phenotype , Quantitative Trait Loci
5.
Malar J ; 13: 102, 2014 Mar 17.
Article in English | MEDLINE | ID: mdl-24636676

ABSTRACT

BACKGROUND: Reliable measures of anti-malarial resistance are crucial for malaria control. Resistance is typically a complex trait: multiple mutations in a single parasite (a haplotype or genotype) are necessary for elaboration of the resistant phenotype. The frequency of a genetic motif (proportion of parasite clones in the parasite population that carry a given allele, haplotype or genotype) is a useful measure of resistance. In areas of high endemicity, malaria patients generally harbour multiple parasite clones; they have multiplicities of infection (MOIs) greater than one. However, most standard experimental procedures only allow measurement of marker prevalence (proportion of patient blood samples that test positive for a given mutation or combination of mutations), not frequency. It is misleading to compare marker prevalence between sites that have different mean MOIs; frequencies are required instead. METHODS: A Bayesian statistical model was developed to estimate Plasmodium falciparum genetic motif frequencies from prevalence data collected in the field. To assess model performance and computational speed, a detailed simulation study was implemented. Application of the model was tested using datasets from five sites in Uganda. The datasets included prevalence data on markers of resistance to sulphadoxine-pyrimethamine and an average MOI estimate for each study site. RESULTS: The simulation study revealed that the genetic motif frequencies that were estimated using the model were more accurate and precise than conventional estimates based on direct counting. Importantly, the model did not require measurements of the MOI in each patient; it used the average MOI in the patient population. Furthermore, if a dataset included partially genotyped patient blood samples, the model imputed the data that were missing. Using the model and the Ugandan data, genotype frequencies were estimated and four biologically relevant genotypes were identified. 
CONCLUSIONS: The model allows fast, accurate, reliable estimation of the frequency of genetic motifs associated with resistance to anti-malarials using prevalence data collected from malaria patients. The model does not require per-patient MOI measurements and can easily analyse data from five markers. The model will be a valuable tool for monitoring markers of anti-malarial drug resistance, including markers of resistance to artemisinin derivatives and partner drugs.
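
The gap between marker prevalence and marker frequency described above can be illustrated with a deliberately simplified calculation. Assume, purely for illustration, that every patient carries exactly `moi` parasite clones and that each clone independently carries the marker with frequency `f`; the expected prevalence is then 1 - (1 - f)^moi. This is not the Bayesian model of the paper, which works from an average MOI and handles several markers jointly, and the function names are hypothetical.

```python
def prevalence_from_frequency(f, moi):
    """Expected share of patients testing positive when each of `moi`
    independent clones carries the marker with probability `f`."""
    return 1.0 - (1.0 - f) ** moi


def frequency_from_prevalence(p, moi):
    """Invert the relation above (valid only under the same assumptions)."""
    return 1.0 - (1.0 - p) ** (1.0 / moi)


# Same underlying frequency, different multiplicities of infection.
for moi in (1, 2, 4):
    p = prevalence_from_frequency(0.20, moi)
    print(f"MOI {moi}: frequency 0.20 -> prevalence {p:.2f}")
```

With a marker frequency of 0.20, the expected prevalence rises from 0.20 at MOI 1 to 0.36 at MOI 2 and about 0.59 at MOI 4, which is why prevalence figures from sites with different mean MOIs are not directly comparable.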


Subject(s)
Drug Resistance , Gene Frequency , Malaria, Falciparum/parasitology , Plasmodium falciparum/genetics , Plasmodium falciparum/isolation & purification , Genotype , Haplotypes , Humans , Malaria, Falciparum/epidemiology , Models, Statistical , Plasmodium falciparum/classification , Plasmodium falciparum/drug effects , Prevalence , Uganda
6.
Methods ; 59(1): 71-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23079396

ABSTRACT

The stochastic nature of generating eukaryotic transcripts challenges conventional methods for obtaining and analyzing single-cell gene expression data. In order to address the inherent noise, detailed methods are described on how to collect data on multiple genes in a large number of single cells using microfluidic arrays. As part of a study exploring the effect of genotype on Wnt pathway activation, data were collected for 96 qPCR assays on 1440 lymphoblastoid cells. The description of methods includes preliminary data processing steps. The methods used in the collection and analysis of single-cell qPCR data are contrasted with those used in conventional qPCR.


Subject(s)
Gene Expression Profiling/methods , Lymphoid Progenitor Cells/metabolism , Real-Time Polymerase Chain Reaction , Single-Cell Analysis , Cell Line , Data Interpretation, Statistical , Humans , Limit of Detection , Reverse Transcriptase Polymerase Chain Reaction , Wnt Signaling Pathway
7.
PLoS Genet ; 7(9): e1002270, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21931564

ABSTRACT

We have performed a metabolite quantitative trait locus (mQTL) study of the ¹H nuclear magnetic resonance spectroscopy (¹H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by ¹H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10⁻¹¹

Subject(s)
Genome-Wide Association Study , Metabolic Networks and Pathways/genetics , Metabolome/genetics , Quantitative Trait Loci/genetics , Selection, Genetic , Acetyltransferases/genetics , Acetyltransferases/metabolism , Dimethylamines/blood , Dimethylamines/metabolism , Female , Haplotypes , Humans , Isobutyrates/metabolism , Isobutyrates/urine , Magnetic Resonance Spectroscopy , Methylamines/metabolism , Methylamines/urine , Polymorphism, Single Nucleotide
8.
BMC Bioinformatics ; 14 Suppl 13: S8, 2013.
Article in English | MEDLINE | ID: mdl-24267288

ABSTRACT

BACKGROUND: In order to better understand cancer as a complex disease with multiple genetic and epigenetic factors, it is vital to model the fundamental biological relationships among these alterations as well as their relationships with important clinical outcomes. METHODS: We develop an integrative network-based Bayesian analysis (iNET) approach that allows us to jointly analyze multi-platform high-dimensional genomic data in a computationally efficient manner. The iNET approach is formulated as an objective Bayesian model selection problem for Gaussian graphical models to model joint dependencies among platform-specific features using known biological mechanisms. Using both simulated datasets and a glioblastoma (GBM) study from The Cancer Genome Atlas (TCGA), we illustrate the iNET approach via integrating three data types, microRNA, gene expression (mRNA), and patient survival time. RESULTS: We show that the iNET approach has greater power in identifying cancer-related microRNAs than non-integrative approaches based on realistic simulated datasets. In the TCGA GBM study, we found many mRNA-microRNA pairs and microRNAs that are associated with patient survival time, with some of these associations identified in previous studies. CONCLUSIONS: The iNET discovers relationships consistent with the underlying biological mechanisms among these variables, as well as identifying important biomarkers that are potentially relevant to patient survival. In addition, we identified some microRNAs that can potentially affect patient survival which are missed by non-integrative approaches.


Subject(s)
Bayes Theorem , Genomics/methods , Glioblastoma/genetics , Neoplasms/genetics , Systems Integration , Atlases as Topic , Computer Simulation , Humans , MicroRNAs/genetics , Normal Distribution , RNA, Messenger/genetics , Software , Survival Analysis
9.
Bioinformatics ; 28(22): 2981-2, 2012 Nov 15.
Article in English | MEDLINE | ID: mdl-22962342

ABSTRACT

SUMMARY: GREVE has been developed to assist with the identification of recurrent genomic aberrations across cancer samples. The exact characterization of such aberrations remains a challenge despite the availability of increasing amount of data, from SNParray to next-generation sequencing. Furthermore, genomic aberrations in cancer are especially difficult to handle because they are, by nature, unique to the patients. However, their recurrence in specific regions of the genome has been shown to reflect their relevance in the development of tumors. GREVE makes use of previously characterized events to identify such regions and focus any further analysis. AVAILABILITY: GREVE is available through a web interface and open-source application (http://www.well.ox.ac.uk/GREVE).


Subject(s)
Chromosome Aberrations , Genome, Human , Neoplasms/genetics , Software , Chromosome Breakpoints , Humans
10.
Mol Syst Biol ; 7: 525, 2011 Aug 30.
Article in English | MEDLINE | ID: mdl-21878913

ABSTRACT

¹H Nuclear Magnetic Resonance spectroscopy (¹H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired ¹H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in ¹H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect ¹H NMR-based biomarkers quantifying predisposition to disease.


Subject(s)
Biomarkers , Gene-Environment Interaction , Metabolome/genetics , Nuclear Magnetic Resonance, Biomolecular/methods , Systems Biology/methods , White People/genetics , Aged , Algorithms , Biomarkers/blood , Biomarkers/urine , Databases, Genetic , Female , Genetic Variation , Humans , Middle Aged , Models, Statistical , Research Design , Sample Size , Twins, Dizygotic/genetics , Twins, Monozygotic/genetics
11.
Proc Natl Acad Sci U S A ; 106(18): 7559-64, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19376968

ABSTRACT

We have cultured Plasmodium falciparum directly from the blood of infected individuals to examine patterns of mature-stage gene expression in patient isolates. Analysis of the transcriptome of P. falciparum is complicated by the highly periodic nature of gene expression because small variations in the stage of parasite development between samples can lead to an apparent difference in gene expression values. To address this issue, we have developed statistical likelihood-based methods to estimate cell cycle progression and commitment to asexual or sexual development lineages in our samples based on microscopy and gene expression patterns. In cases subsequently matched for temporal development, we find that transcriptional patterns in ex vivo culture display little variation across patients with diverse clinical profiles and closely resemble transcriptional profiles that occur in vitro. These statistical methods, available to the research community, assist in the design and interpretation of P. falciparum expression profiling experiments where it is difficult to separate true differential expression from cell-cycle dependent expression. We reanalyze an existing dataset of in vivo patient expression profiles and conclude that previously observed discrete variation is consistent with the commitment of a varying proportion of the parasite population to the sexual development lineage.


Subject(s)
Cell Cycle , Gene Expression Profiling , Plasmodium falciparum/growth & development , Plasmodium falciparum/genetics , Animals , Cell Cycle/genetics , Cells, Cultured , Humans
12.
J Proteome Res ; 10(12): 5562-7, 2011 Dec 02.
Article in English | MEDLINE | ID: mdl-22010953

ABSTRACT

In biomarker discovery studies, uncertainty associated with case and control labels is often overlooked. By omitting to take into account label uncertainty, model parameters and the predictive risk can become biased, sometimes severely. The most common situation is when the control set contains an unknown number of undiagnosed, or future, cases. This has a marked impact in situations where the model needs to be well-calibrated, e.g., when the prediction performance of a biomarker panel is evaluated. Failing to account for class label uncertainty may lead to underestimation of classification performance and bias in parameter estimates. This can further impact on meta-analysis for combining evidence from multiple studies. Using a simulation study, we outline how conventional statistical models can be modified to address class label uncertainty leading to well-calibrated prediction performance estimates and reduced bias in meta-analysis. We focus on the problem of mislabeled control subjects in case-control studies, i.e., when some of the control subjects are undiagnosed cases, although the procedures we report are generic. The uncertainty in control status is a particular situation common in biomarker discovery studies in the context of genomic and molecular epidemiology, where control subjects are commonly sampled from the general population with an established expected disease incidence rate.
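
The effect of undiagnosed cases hiding in the control group can be made concrete with a small mixture calculation. The sketch below assumes, for illustration only, that a known fraction `contamination` of labelled controls are really cases and that they test positive at the same rate (`tpr`) as diagnosed cases. It is not the modelling procedure reported in the paper, and all names and numbers are hypothetical.

```python
def apparent_false_positive_rate(true_fpr, tpr, contamination):
    """Positive rate observed in a labelled-control group in which a
    fraction `contamination` are actually undiagnosed cases."""
    return (1.0 - contamination) * true_fpr + contamination * tpr


def corrected_false_positive_rate(observed_fpr, tpr, contamination):
    """Invert the mixture above to recover the rate among true controls."""
    return (observed_fpr - contamination * tpr) / (1.0 - contamination)


true_fpr, tpr, contamination = 0.10, 0.80, 0.05  # hypothetical values
observed = apparent_false_positive_rate(true_fpr, tpr, contamination)
recovered = corrected_false_positive_rate(observed, tpr, contamination)
print(f"apparent specificity:  {1 - observed:.3f}")
print(f"corrected specificity: {1 - recovered:.3f}")
```

Whenever the biomarker is informative (`tpr` greater than the true false positive rate), the apparent specificity is lower than the true one, which matches the abstract's point that ignoring label uncertainty can underestimate classification performance.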


Subject(s)
Bias , Biomarkers/chemistry , Case-Control Studies , Algorithms , Biomarkers/analysis , Computer Simulation , Humans , Logistic Models , Meta-Analysis as Topic , ROC Curve , Reproducibility of Results , Risk Factors , Uncertainty
13.
Genet Epidemiol ; 34(4): 299-308, 2010 May.
Article in English | MEDLINE | ID: mdl-20025065

ABSTRACT

Standard techniques for single marker quantitative trait mapping perform poorly in detecting complex interacting genetic influences. When a genetic marker interacts with other genetic markers and/or environmental factors to influence a quantitative trait, a sample of individuals will show different effects according to their exposure to other interacting factors. This paper presents a Bayesian mixture model, which effectively models heterogeneous genetic effects apparent at a single marker. We compute approximate Bayes factors which provide an efficient strategy for screening genetic markers (genome-wide) for evidence of a heterogeneous effect on a quantitative trait. We present a simulation study which demonstrates that the approximation is good and provide a real data example which identifies a population-specific genetic effect on gene expression in the HapMap CEU and YRI populations. We advocate the use of the model as a strategy for identifying candidate interacting markers without any knowledge of the nature or order of the interaction. The source of heterogeneity can be modeled as an extension.


Subject(s)
Genetic Loci , Models, Statistical , Quantitative Trait Loci , Algorithms , Alleles , Bayes Theorem , Computer Simulation , Environment , Genetic Markers , Genotype , Humans , Models, Genetic , Odds Ratio , Software
14.
Proteome Sci ; 9: 73, 2011 Nov 17.
Article in English | MEDLINE | ID: mdl-22093360

ABSTRACT

BACKGROUND: The advent of affinity-based proteomics technologies for global protein profiling provides the prospect of finding new molecular biomarkers for common, multifactorial disorders. The molecular phenotypes obtained from studies on such platforms are driven by multiple sources, including genetic, environmental, and experimental components. In characterizing the contribution of different sources of variation to the measured phenotypes, the aim is to facilitate the design and interpretation of future biomedical studies employing exploratory and multiplexed technologies. Thus, biometrical genetic modelling of twin or other family data can be used to decompose the variation underlying a phenotype into biological and experimental components. RESULTS: Using antibody suspension bead arrays and antibodies from the Human Protein Atlas, we study unfractionated serum from a longitudinal study on 154 twins. In this study, we provide a detailed description of how the variation in a molecular phenotype in terms of protein profile can be decomposed into familial i.e. genetic and common environmental; individual environmental, short-term biological and experimental components. The results show that across 69 antibodies analyzed in the study, the median proportion of the total variation explained by familial sources is 12% (IQR 1-22%), and the median proportion of the total variation attributable to experimental sources is 63% (IQR 53-72%). CONCLUSION: The variability analysis of antibody arrays highlights the importance to consider variability components and their relative contributions when designing and evaluating studies for biomarker discoveries with exploratory, high-throughput and multiplexed methods.

15.
Elife ; 10, 2021 07 06.
Article in English | MEDLINE | ID: mdl-34225842

ABSTRACT

Severe falciparum malaria has substantially affected human evolution. Genetic association studies of patients with clinically defined severe malaria and matched population controls have helped characterise human genetic susceptibility to severe malaria, but phenotypic imprecision compromises discovered associations. In areas of high malaria transmission, the diagnosis of severe malaria in young children and, in particular, the distinction from bacterial sepsis are imprecise. We developed a probabilistic diagnostic model of severe malaria using platelet and white count data. Under this model, we re-analysed clinical and genetic data from 2220 Kenyan children with clinically defined severe malaria and 3940 population controls, adjusting for phenotype mis-labelling. Our model, validated by the distribution of sickle trait, estimated that approximately one-third of cases did not have severe malaria. We propose a data-tilting approach for case-control studies with phenotype mis-labelling and show that this reduces false discovery rates and improves statistical power in genome-wide association studies.


In areas of sub-Saharan Africa where malaria is common, most people are frequently exposed to the bites of mosquitoes carrying malaria parasites, so they often have malaria parasites in their blood. Young children, who have not yet built up strong immunity against malaria, often fall ill with severe malaria, a life-threatening disease. It is unclear why some children develop severe malaria and die, while other children with high numbers of parasites in their blood do not develop any apparent symptoms. Genetic susceptibility studies are designed to uncover why such differences exist by comparing individuals with severe malaria (referred to as 'cases') with individuals drawn from the general population (known as 'controls'). But severe malaria can be a challenge to diagnose. Since high numbers of malaria parasites can be found in healthy children, it is sometimes difficult to determine whether the parasites are making a child ill, or whether they are a coincidental finding. Consequently, some of the 'cases' recruited into these studies may actually have a different disease, such as bacterial sepsis. This ultimately affects how the studies are interpreted, and introduces error and inaccuracy into the data. Watson, Ndila et al. investigated whether measuring blood biomarkers in patients (derived from the complete blood count, including platelet counts and white blood cell counts) could improve the accuracy with which malaria is diagnosed. They developed a new mathematical model that incorporates platelet and white blood cell counts. This model estimates that in a large cohort of 2,220 Kenyan children diagnosed with severe malaria, around one third of enrolled children did not actually have this disease. Further analysis suggests that patients with severe malaria are highly unlikely to have platelet counts higher than 200,000 per microlitre. This defines a cut-off that researchers can use to avoid recruiting patients who do not have severe malaria in future studies. 
Additionally, the ability to diagnose severe malaria more accurately can make it easier to detect and treat other diseases with similar symptoms in children with high numbers of malaria parasites in their blood. Watson, Ndila et al.'s findings support the recommendation that all children with suspected malaria be given broad spectrum antibiotics, as many misdiagnosed children will likely have bacterial sepsis. It also suggests that using complete blood counts, which are cheap to obtain and increasingly available in low-resource settings, could improve diagnostic accuracy in future clinical studies of severe malaria. This could ultimately improve the ability of these studies to find new treatments for this life-threatening disease.


Subject(s)
Genome-Wide Association Study , Malaria , Phenotype , Adolescent , Adult , Case-Control Studies , Child , Child, Preschool , Extracellular Matrix Proteins/genetics , Female , Genomics , Humans , Kenya , Malaria/diagnosis , Malaria/epidemiology , Malaria, Falciparum , Male , Polymorphism, Genetic
16.
Bioinformatics ; 25(22): 2929-36, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19696047

ABSTRACT

MOTIVATION: Identifying the network structure through which genes and their products interact can help to elucidate normal cell physiology as well as the genetic architecture of pathological phenotypes. Recently, a number of gene network inference tools have appeared based on Gaussian graphical model representations. Following this, we introduce a novel Boosting approach to learn the structure of a high-dimensional Gaussian graphical model motivated by the applications in genomics. A particular emphasis is paid to the inclusion of partial prior knowledge on the structure of the graph. With the increasing availability of pathway information and large-scale gene expression datasets, we believe that conditioning on prior knowledge will be an important aspect in raising the statistical power of structural learning algorithms to infer true conditional dependencies. RESULTS: Our Boosting approach, termed BoostiGraph, is conceptually and algorithmically simple. It complements recent work on the network inference problem based on Lasso-type approaches. BoostiGraph is computationally cheap and is applicable to very high-dimensional graphs. For example, on graphs of order 5000 nodes, it is able to map out paths for the conditional independence structure in few minutes. Using computer simulations, we investigate the ability of our method with and without prior information to infer Gaussian graphical models from artificial as well as actual microarray datasets. The experimental results demonstrate that, using our method, it is possible to recover the true network topology with relatively high accuracy. AVAILABILITY: This method and all other associated files are freely available from http://www.stats.ox.ac.uk/~anjum/.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Computer Simulation , Gene Expression Profiling/methods , Models, Statistical , Pattern Recognition, Automated , Proteome/genetics
17.
Bioinformatics ; 25(2): 197-203, 2009 Jan 15.
Article in English | MEDLINE | ID: mdl-19028720

ABSTRACT

MOTIVATION: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption fails to take into account recombination which is a fundamental process for generating diversity and can lead to spurious results. Recombination induces a localized phylogenetic structure which may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenic topology underlying the relatedness at a genomic location. The dimensionality of the number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to analytically integrate out over all possible changepoints in topologies as well as all the unknown branch lengths. RESULTS: We demonstrate our approach on simulated data and also to the genome of a suspected HIV recombinant strain as well as to an investigation of recombination in the sequences of 15 laboratory mouse strains sequenced by Perlegen Sciences. Our findings indicate that our method allows us to distinguish between rate heterogeneity and variation in phylogeny caused by recombination without being restricted to 4-taxa data.


Subject(s)
Phylogeny , Genetic Recombination/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Algorithms , Animals , Bayes Theorem , Genome , Markov Chains , Mice , Genetic Selection
18.
Trials ; 21(1): 865, 2020 Oct 20.
Article in English | MEDLINE | ID: mdl-33081827

ABSTRACT

An amendment to this paper has been published and can be accessed via the original article.

19.
Trials ; 21(1): 386, 2020 May 07.
Article in English | MEDLINE | ID: mdl-32381030

ABSTRACT

BACKGROUND: Exploration and modelling of heterogeneous treatment effects as a function of baseline covariates is an important aspect of precision medicine in randomised controlled trials (RCTs). Randomisation generally guarantees the internal validity of an RCT, but heterogeneity in treatment effect can reduce external validity. Estimation of heterogeneous treatment effects is usually done via a predictive model for individual outcomes, where one searches for interactions between treatment allocation and important patient baseline covariates. However, such models are prone to overfitting and multiple testing and typically demand a transformation of the outcome measurement, for example, from the absolute risk in the original RCT to log-odds of risk in the predictive model. METHODS: We show how reference classes derived from baseline covariates can be used to explore heterogeneous treatment effects via a two-stage approach. We first estimate a risk score which captures on a single dimension some of the heterogeneity in outcomes of the trial population. Heterogeneity in the treatment effect can then be explored via reweighting schemes along this axis of variation. This two-stage approach bypasses the search for interactions with multiple covariates, thus protecting against multiple testing. It also allows for exploration of heterogeneous treatment effects on the original outcome scale of the RCT. This approach would typically be applied to multivariable models of baseline risk to assess the stability of average treatment effects with respect to the distribution of risk in the population studied. CASE STUDY: We illustrate this approach using the single largest randomised treatment trial in severe falciparum malaria and demonstrate how the estimated treatment effect in terms of absolute mortality risk reduction increases considerably in higher risk strata. 
CONCLUSIONS: 'Local' and 'tilting' reweighting schemes based on ranking patients by baseline risk can be used as a general approach for exploring, graphing and reporting heterogeneity of treatment effect in RCTs. TRIAL REGISTRATION: ISRCTN clinical trials registry: ISRCTN50258054. Prospectively registered on 22 July 2005.
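
The two-stage idea above can be sketched as follows: given a baseline risk score for each patient (stage one, assumed already estimated), patients are ranked by risk and the absolute risk difference between arms is re-estimated under weights concentrated around a chosen risk quantile. This is a minimal sketch, not the authors' implementation: the Gaussian kernel on the rank scale, the bandwidth, and the function names are assumptions made for illustration.

```python
import math


def local_weights(risk_scores, centre_quantile, bandwidth=0.1):
    """Kernel weights on the rank scale, peaked at centre_quantile.

    The Gaussian kernel and the default bandwidth are illustrative assumptions.
    """
    n = len(risk_scores)
    order = sorted(range(n), key=lambda i: risk_scores[i])
    weights = [0.0] * n
    for rank, i in enumerate(order):
        quantile = (rank + 0.5) / n  # mid-rank position in (0, 1)
        weights[i] = math.exp(-0.5 * ((quantile - centre_quantile) / bandwidth) ** 2)
    return weights


def weighted_risk_difference(outcomes, treated, weights):
    """Weighted event rate in the control arm minus that in the treated arm.

    outcomes: 1 if the event (e.g. death) occurred, else 0
    treated : 1 for the intervention arm, 0 for control
    """
    def weighted_mean(arm):
        num = sum(w * y for y, t, w in zip(outcomes, treated, weights) if t == arm)
        den = sum(w for t, w in zip(treated, weights) if t == arm)
        return num / den

    return weighted_mean(0) - weighted_mean(1)
```

Sweeping centre_quantile from low to high risk and plotting the resulting risk difference gives a curve of treatment effect against baseline risk on the original outcome scale, without searching for covariate-by-treatment interactions.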


Subject(s)
Forecasting/methods , Falciparum Malaria/therapy , Research Design/trends , Algorithms , Humans , Falciparum Malaria/mortality , Mortality , Health Care Outcome Assessment , Precision Medicine , Predictive Value of Tests , Randomized Controlled Trials as Topic , Risk Reduction Behavior , Therapeutic Uses
20.
Trials ; 21(1): 156, 2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32041653

ABSTRACT

BACKGROUND: Retrospective exploratory analyses of randomised controlled trials (RCTs) seeking to identify treatment effect heterogeneity (TEH) are prone to bias and false positives. Yet the desire to learn all we can from exhaustive data measurements on trial participants motivates the inclusion of such analyses within RCTs. Moreover, widespread advances in machine learning (ML) methods hold potential to utilise such data to identify subjects exhibiting heterogeneous treatment response. METHODS: We present a novel analysis strategy for detecting TEH in randomised data using ML methods, whilst ensuring proper control of the false positive discovery rate. Our approach uses random data partitioning with statistical or ML-based prediction on held-out data. This method can test for both crossover TEH (switch in optimal treatment) and non-crossover TEH (systematic variation in benefit across patients). The former is done via a two-sample hypothesis test measuring overall predictive performance. The latter is done via 'stacking' the ML predictors alongside a classical statistical model to formally test the added benefit of the ML algorithm. An adaptation of recent statistical theory allows for the construction of a valid aggregate p value. This testing strategy is independent of the choice of ML method. RESULTS: We demonstrate our approach with a re-analysis of the SEAQUAMAT trial, which compared quinine to artesunate for the treatment of severe malaria in Asian adults. We find no evidence for any subgroup who would benefit from a change in treatment from the current standard of care, artesunate, but strong evidence for significant TEH within the artesunate treatment group. In particular, we find that artesunate provides a differential benefit to patients with high numbers of circulating ring stage parasites. 
CONCLUSIONS: ML analysis plans using computational notebooks (documents linked to a programming language that capture the model parameter settings, data processing choices, and evaluation criteria) along with version control can improve the robustness and transparency of RCT exploratory analyses. A data-partitioning algorithm allows researchers to apply the latest ML techniques safe in the knowledge that any declared associations are statistically significant at a user-defined level.
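
The data-partitioning strategy described above can be sketched as a loop over random splits: a model is fitted on one part, a hypothesis test is run on the held-out part, and the per-split p values are combined. This is a minimal sketch under stated assumptions. The abstract does not specify the aggregation rule; the twice-the-median rule used here is a swapped-in, well-known choice that stays valid under arbitrary dependence between splits. The callback test_on_split and all other names are hypothetical.

```python
import random
import statistics


def aggregate_p_values(p_values):
    """Combine dependent p values with the twice-the-median rule (an assumption)."""
    return min(1.0, 2.0 * statistics.median(p_values))


def repeated_split_test(n, test_on_split, n_splits=50, train_fraction=0.5, seed=0):
    """Run a user-supplied held-out test over random partitions of n subjects.

    test_on_split(train_indices, held_out_indices) must fit any ML or
    statistical model on the training indices only and return a p value
    computed on the held-out indices only.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    cut = int(train_fraction * n)
    p_values = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        train, held_out = indices[:cut], indices[cut:]
        p_values.append(test_on_split(train, held_out))
    return aggregate_p_values(p_values)
```

Because the model never sees the held-out subjects before the test is run, the choice of ML method inside test_on_split does not affect the validity of each per-split p value.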


Subject(s)
Antimalarials/therapeutic use , Artesunate/therapeutic use , Machine Learning , Falciparum Malaria/drug therapy , Plasmodium falciparum/drug effects , Quinine/therapeutic use , Randomized Controlled Trials as Topic , Adult , Algorithms , Asia/epidemiology , Humans , Falciparum Malaria/epidemiology , Falciparum Malaria/parasitology , Retrospective Studies , Treatment Outcome