Results 1 - 20 of 47
1.
Am J Hum Genet ; 111(7): 1431-1447, 2024 07 11.
Article in English | MEDLINE | ID: mdl-38908374

ABSTRACT

Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
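The ensemble idea (an elastic net over the scores produced by several PGS methods) can be illustrated with a minimal pure-Python sketch. It assumes standardized PGS columns, a centered phenotype, no intercept, and a fixed penalty (`alpha`, `l1_ratio`) instead of the cross-validated tuning described above. All function names are hypothetical; this is not the prspipe implementation.

```python
"""Minimal elastic-net sketch for combining PGS from several methods.

Assumptions (not from the paper): standardized PGS columns, centered
phenotype, no intercept, fixed alpha/l1_ratio (no cross-validation).
"""


def polygenic_score(dosages, weights):
    """Single-method PGS: weighted sum of allele dosages (0, 1 or 2)."""
    return sum(d * w for d, w in zip(dosages, weights))


def soft_threshold(z, gamma):
    """Soft-thresholding operator used for the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, alpha, l1_ratio, n_iter=200):
    """Cyclic coordinate descent minimizing
    (1/2n)||y - Xb||^2 + alpha * (l1_ratio*||b||_1 + (1 - l1_ratio)/2*||b||^2).
    X is a list of rows (one column per PGS method), y a list of phenotypes.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Covariance of column j with the partial residual that excludes j.
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    resid[i] -= col[i] * delta
            beta[j] = new
    return beta


def ensemble_score(pgs_row, beta):
    """Ensemble PGS for one individual: linear combination of method scores."""
    return sum(s * b for s, b in zip(pgs_row, beta))
```

For example, `fit_elastic_net([[1, 1], [1, -1], [-1, 1], [-1, -1]], [2, 2, -2, -2], alpha=0.1, l1_ratio=1.0)` keeps the first score (weight 1.9) and sets the second to exactly zero, since the phenotype depends only on the first. The L1 part of the penalty can zero out methods entirely, while the L2 part shrinks correlated scores together.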


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Phenotype , Diabetes Mellitus, Type 1/genetics , Polymorphism, Single Nucleotide , Machine Learning
2.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37647640

ABSTRACT

MOTIVATION: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking. RESULTS: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST shows faster computational speed and a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses through the comparison of seven polygenic risk scoring methods across multiple ancestry groups and different genetic architectures. AVAILABILITY AND IMPLEMENTATION: A synthetic dataset of 1 008 000 individuals and nine traits for 6.8 million common variants is available at https://www.ebi.ac.uk/biostudies/studies/S-BSST936. The HAPNEST software for generating synthetic datasets is available as Docker/Singularity containers and open source Julia and C code at https://github.com/intervene-EU-H2020/synthetic_data.
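"Heritability" and "polygenicity" in simulated phenotypes can be made concrete with a generic additive-model sketch: a fraction of variants receive effects (polygenicity), and the genetic component is scaled so it explains a chosen share of phenotypic variance (heritability). This is an illustrative assumption, not the HAPNEST algorithm: SNPs are independent (no LD), there is no ancestry structure, and no reference haplotypes are used. The function name `simulate` is hypothetical.

```python
"""Generic additive-model genotype/phenotype simulation (illustration only).

NOT the HAPNEST method: SNPs are independent, with no LD, ancestry
structure, or haplotype reference panel.
"""
import math
import random


def simulate(n_ind, n_snp, h2, frac_causal, seed=1):
    rng = random.Random(seed)
    mafs = [rng.uniform(0.05, 0.5) for _ in range(n_snp)]
    # Dosage ~ Binomial(2, maf): the sum of two Bernoulli(maf) allele draws.
    geno = [[(rng.random() < f) + (rng.random() < f) for f in mafs]
            for _ in range(n_ind)]
    # Polygenicity: only a fraction of SNPs get a nonzero effect (at least one).
    causal = [j for j in range(n_snp) if rng.random() < frac_causal] or [0]
    effects = {j: rng.gauss(0.0, 1.0) for j in causal}
    g = [sum(row[j] * b for j, b in effects.items()) for row in geno]
    # Rescale the genetic component so its sample variance equals h2.
    mean_g = sum(g) / n_ind
    sd_g = math.sqrt(sum((x - mean_g) ** 2 for x in g) / n_ind)
    g = [(x - mean_g) / sd_g * math.sqrt(h2) for x in g]
    # Environmental noise supplies the remaining 1 - h2 of the variance.
    y = [gi + rng.gauss(0.0, math.sqrt(1.0 - h2)) for gi in g]
    return geno, g, y
```

With a few thousand individuals, the ratio var(g)/var(y) lands close to the requested `h2`. Realistic simulators additionally need LD and population structure, which is much of what makes tools such as HAPNEST nontrivial.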


Subject(s)
Benchmarking , Data Accuracy , Humans , Genotype , Phenotype , Multifactorial Inheritance
3.
NMR Biomed ; 37(4): e5075, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38043545

ABSTRACT

Renal pathologies often manifest as alterations in kidney size, providing a valuable avenue for employing dynamic parametric MRI as a means to derive kidney size measurements for the diagnosis, treatment, and monitoring of renal disease. Furthermore, this approach holds significant potential in supporting MRI data-driven preclinical investigations into the intricate mechanisms underlying renal pathophysiology. The integration of deep learning algorithms is crucial in achieving rapid and precise segmentation of the kidney from temporally resolved parametric MRI, facilitating the use of kidney size as a meaningful (pre)clinical biomarker for renal disease. To explore this potential, we employed dynamic parametric T2 mapping of the kidney in rats in conjunction with a custom-tailored deep dilated U-Net (DDU-Net) architecture. The architecture was trained, validated, and tested on manually segmented ground truth kidney data, with benchmarking against an analytical segmentation model and the self-configuring nnU-Net ("no new U-Net"). Subsequently, we applied our approach to in vivo longitudinal MRI data, incorporating interventions that emulate clinically relevant scenarios in rats. Our approach achieved high performance metrics, including a Dice coefficient of 0.98, coefficient of determination of 0.92, and a mean absolute percentage error of 1.1% compared with ground truth. The DDU-Net enabled automated and accurate quantification of acute changes in kidney size, such as those induced by aortic occlusion (-8% ± 1%), venous occlusion (5% ± 1%), furosemide administration (2% ± 1%), hypoxemia (-2% ± 1%), and contrast agent-induced acute kidney injury (11% ± 1%). This approach can potentially be instrumental for the development of dynamic parametric MRI-based tools for kidney disorders, offering unparalleled insights into renal pathophysiology.
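The reported evaluation metrics follow standard definitions. A minimal sketch, assuming masks are flattened lists of 0/1 voxel labels, computes the Dice coefficient 2|A∩B|/(|A|+|B|), the mean absolute percentage error, and a percent change in kidney size relative to baseline. Function names are illustrative and not taken from the authors' code.

```python
"""Standard segmentation and size-change metrics (illustrative names)."""


def dice(pred, truth):
    """Dice overlap of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    # Both masks empty: treat as perfect agreement (a common convention).
    return 1.0 if total == 0 else 2.0 * inter / total


def mape(estimates, references):
    """Mean absolute percentage error, in percent."""
    errs = [abs(e - r) / abs(r) for e, r in zip(estimates, references)]
    return 100.0 * sum(errs) / len(errs)


def relative_change(size_after, size_baseline):
    """Percent change in (e.g.) kidney area or volume versus baseline."""
    return 100.0 * (size_after - size_baseline) / size_baseline
```

For instance, `relative_change(92, 100)` gives `-8.0`, the same form as the aortic-occlusion figure quoted above.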


Subject(s)
Deep Learning , Organophosphorus Compounds , Triazoles , Animals , Rats , Kidney/diagnostic imaging , Algorithms , Magnetic Resonance Imaging , Image Processing, Computer-Assisted
4.
Proc Natl Acad Sci U S A ; 118(31)2021 08 03.
Article in English | MEDLINE | ID: mdl-34261775

ABSTRACT

Over the last months, cases of SARS-CoV-2 surged repeatedly in many countries but could often be controlled with nonpharmaceutical interventions including social distancing. We analyzed deidentified Global Positioning System (GPS) tracking data from 1.15 to 1.4 million cell phones in Germany per day between March and November 2020 to identify encounters between individuals and statistically evaluate contact behavior. Using graph sampling theory, we estimated the contact index (CX), a metric for number and heterogeneity of contacts. We found that CX, and not the total number of contacts, is an accurate predictor for the effective reproduction number R derived from case numbers. A high correlation between CX and R recorded more than 2 wk later allows assessment of social behavior well before changes in case numbers become detectable. By construction, the CX quantifies the role of superspreading and permits assigning risks to specific contact behavior. We provide a critical CX value beyond which R is expected to rise above 1 and propose to use that value to leverage the social-distancing interventions for the coming months.
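Why contact heterogeneity, and not only the number of contacts, matters can be shown with a standard result from network epidemiology. On a random (configuration-model) contact network, the expected number of onward contacts of an infected person scales with the mean excess degree (⟨k²⟩ − ⟨k⟩)/⟨k⟩ rather than with the mean degree ⟨k⟩. This sketch is offered only as intuition; it is not claimed to be the paper's definition of CX, and the function name is hypothetical.

```python
"""Mean excess degree: a standard heterogeneity-aware contact statistic.

Illustration only; this is NOT claimed to be the CX defined in the paper.
"""


def mean_excess_degree(contacts):
    """contacts: number of contacts per person over some time window."""
    n = len(contacts)
    k1 = sum(contacts) / n                 # mean degree <k>
    k2 = sum(k * k for k in contacts) / n  # second moment <k^2>
    return (k2 - k1) / k1
```

Two populations with the same average of four contacts differ sharply: `[4, 4, 4, 4]` gives 3.0, whereas `[1, 1, 1, 13]` (one potential superspreader) gives 9.75.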


Subject(s)
COVID-19/transmission , COVID-19/virology , Cell Phone , Contact Tracing , SARS-CoV-2/physiology , COVID-19/epidemiology , Germany/epidemiology , Humans
5.
Hum Brain Mapp ; 44(12): 4480-4497, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37318944

ABSTRACT

White matter impairments caused by gliomas can lead to functional disorders. In this study, we predicted aphasia in patients with gliomas infiltrating the language network using machine learning methods. We included 78 patients with left-hemispheric perisylvian gliomas. Aphasia was graded preoperatively using the Aachen aphasia test (AAT). Subsequently, we created bundle segmentations based on automatically generated tract orientation mappings using TractSeg. To prepare the input for the support vector machine (SVM), we first preselected aphasia-related fiber bundles based on the associations between relative tract volumes and AAT subtests. In addition, diffusion magnetic resonance imaging (dMRI)-based metrics [axial diffusivity (AD), apparent diffusion coefficient (ADC), fractional anisotropy (FA), and radial diffusivity (RD)] were extracted within the fiber bundles' masks with their mean, standard deviation, kurtosis, and skewness values. Our model consisted of random forest-based feature selection followed by an SVM. The best model achieved 81% accuracy (specificity = 85%, sensitivity = 73%, and AUC = 85%) using dMRI-based features, demographics, tumor WHO grade, tumor location, and relative tract volumes. The most effective features were derived from the arcuate fasciculus (AF), middle longitudinal fasciculus (MLF), and inferior fronto-occipital fasciculus (IFOF). The most effective dMRI-based metrics were FA, ADC, and AD. We achieved a prediction of aphasia using dMRI-based features and demonstrated that AF, IFOF, and MLF were the most important fiber bundles for predicting aphasia in this cohort.
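The reported classification figures (accuracy, sensitivity, specificity, AUC) follow standard definitions. A minimal sketch computes the first three from confusion-matrix counts and the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The counts and scores used below are hypothetical and are not the study's data.

```python
"""Standard binary-classification metrics (illustrative, hypothetical data)."""


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall) and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def auc(scores_pos, scores_neg):
    """ROC AUC as P(score_pos > score_neg); ties count 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, `classification_metrics(8, 2, 6, 4)` yields accuracy 0.70, sensitivity 0.67 and specificity 0.75. Accuracy alone can hide an imbalance between the last two, which is why all three are usually reported.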


Subject(s)
Aphasia , Glioma , White Matter , Humans , Diffusion Tensor Imaging/methods , Benchmarking , Glioma/complications , Glioma/diagnostic imaging , Glioma/pathology , Aphasia/diagnostic imaging , Aphasia/etiology , Aphasia/pathology , Diffusion Magnetic Resonance Imaging , White Matter/pathology , Machine Learning
6.
Bioinformatics ; 38(14): 3621-3628, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35640976

ABSTRACT

MOTIVATION: Medical images can provide rich information about diseases and their biology. However, investigating their association with genetic variation requires non-standard methods. We propose transferGWAS, a novel approach to perform genome-wide association studies directly on full medical images. First, we learn semantically meaningful representations of the images based on a transfer learning task, during which a deep neural network is trained on independent but similar data. Then, we perform genetic association tests with these representations. RESULTS: We validate the type I error rates and power of transferGWAS in simulation studies of synthetic images. Then we apply transferGWAS in a genome-wide association study of retinal fundus images from the UK Biobank. This first-of-a-kind GWAS of full imaging data yielded 60 genomic regions associated with retinal fundus images, of which 7 are novel candidate loci for eye-related traits and diseases. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in Python and available at https://github.com/mkirchler/transferGWAS/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
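The second step ("genetic association tests with these representations") amounts to testing each SNP against each image-derived component. A minimal sketch, assuming the embeddings have already been condensed into a few numeric components per individual (for example by PCA, which is an assumption here), is a per-SNP ordinary least-squares test. It is deliberately simpler than the mixed-model or covariate-adjusted tools a real GWAS would use, and the function name is hypothetical.

```python
"""Per-SNP OLS association test of one image-derived component (sketch)."""
import math


def snp_association(dosages, component):
    """Regress one embedding component on SNP dosage; return (beta, t-stat)."""
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(component) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, component))
    beta = sxy / sxx  # OLS slope
    resid = [y - my - beta * (x - mx) for x, y in zip(dosages, component)]
    sigma2 = sum(r * r for r in resid) / (n - 2)  # residual variance, 2 params
    se = math.sqrt(sigma2 / sxx)
    return beta, beta / se
```

In a full scan this function would run once per SNP and per component, followed by a multiple-testing threshold such as the conventional genome-wide 5e-8.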


Subject(s)
Genome-Wide Association Study , Neural Networks, Computer , Genome-Wide Association Study/methods , Phenotype , Genome , Machine Learning
7.
Hum Mol Genet ; 27(R1): R63-R71, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29648622

ABSTRACT

The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g. deleterious variants and disease). This review summarizes lessons learned from the large-scale analyses of genome and exome data sets, modeling of population data and machine-learning strategies to resolve complex genomic sequence regions. The review also portrays the rapid adoption of artificial intelligence/deep neural networks in genomics; in particular, deep learning approaches are well suited to model the complex dependencies in the regulatory landscape of the genome, and to provide predictors for genetic variant calling and interpretation.


Subject(s)
Deep Learning/trends , Gene Regulatory Networks/genetics , Genome, Human/genetics , Genomics/trends , Exome/genetics , High-Throughput Nucleotide Sequencing/trends , Humans , Sequence Analysis, DNA , Software
8.
Am J Hum Genet ; 101(5): 700-715, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-29100084

ABSTRACT

Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci composed entirely of guanine or cytosine (G or C) bases have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.
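The read-length limit can be made concrete. Counting consecutive motif copies inside a single read caps out at read_length // motif_length units; a 150-base read holds at most 50 CAG copies. Expansions longer than that cannot be sized from within-read counts alone, which is why additional cues such as paired-end distances matter. The sketch below shows only the naive within-read count, not TREDPARSE's probabilistic model, and both function names are hypothetical.

```python
"""Naive within-read STR counting, to illustrate the read-length ceiling.

NOT TREDPARSE's model, which also uses paired-end distances and stutter.
"""


def longest_repeat_run(read, motif):
    """Largest number of consecutive copies of `motif` found in `read`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    for start in range(len(read)):
        count, pos = 0, start
        # Extend the run while the next k bases match the motif.
        while read[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best


def max_observable_units(read_length, motif_length):
    """Upper bound on repeat units a single read can contain."""
    return read_length // motif_length
```

For example, `longest_repeat_run("TTCAGCAGCAGAA", "CAG")` returns 3, and `max_observable_units(150, 3)` returns 50.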


Subject(s)
Genome, Human/genetics , Microsatellite Repeats/genetics , Adolescent , Adult , Alleles , Child , Female , Genetics, Population/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Male , Middle Aged , Polymorphism, Genetic/genetics , Sequence Analysis, DNA/methods , Software
9.
Plant Cell ; 29(1): 5-19, 2017 01.
Article in English | MEDLINE | ID: mdl-27986896

ABSTRACT

The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits.
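The power gain from combining several GWAS is often realized with an inverse-variance-weighted fixed-effect meta-analysis: each study's effect estimate is weighted by 1/SE², so more precise studies count more, and the pooled standard error shrinks. A minimal sketch follows. It shows the standard textbook estimator and is not claimed to be what easyGWAS implements internally; the function name is hypothetical.

```python
"""Inverse-variance-weighted fixed-effect meta-analysis (standard form)."""
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effect sizes; return (beta, se, z)."""
    weights = [1.0 / (s * s) for s in ses]  # precision weights 1/SE^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)  # pooled SE shrinks as studies are added
    return beta, se, beta / se
```

Two studies with estimates 1.0 and 3.0 and standard errors of 1.0 each pool to β = 2.0 with SE ≈ 0.71 (z ≈ 2.83). Neither study alone would reach z = 2.83 at its own SE, which illustrates the power gain.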


Subject(s)
Computational Biology/methods , Genome, Plant/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Arabidopsis/genetics , Arabidopsis/growth & development , Flowers/genetics , Flowers/growth & development , Genotype , Humans , Phenotype , Reproducibility of Results , Software , User-Computer Interface
10.
Proc Natl Acad Sci U S A ; 114(38): 10166-10171, 2017 09 19.
Article in English | MEDLINE | ID: mdl-28874526

ABSTRACT

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.


Subject(s)
Confidentiality , DNA Fingerprinting , Models, Genetic , Phenotype , Whole Genome Sequencing , Adult , Age Factors , Algorithms , Body Size , Cohort Studies , Data Anonymization , Female , Humans , Male , Middle Aged , Pigmentation/genetics , Young Adult
11.
Nat Methods ; 12(8): 755-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26076425

ABSTRACT

Set tests are a powerful approach for genome-wide association testing between groups of genetic variants and quantitative traits. We describe mtSet (http://github.com/PMBio/limix), a mixed-model approach that enables joint analysis across multiple correlated traits while accounting for population structure and relatedness. mtSet effectively combines the benefits of set tests with multi-trait modeling and is computationally efficient, enabling genetic analysis of large cohorts (up to 500,000 individuals) and multiple traits.


Subject(s)
Computational Biology/methods , Algorithms , Alleles , Animals , Calibration , Computer Simulation , Data Interpretation, Statistical , Gene Frequency , Genetic Variation , Genome-Wide Association Study , Humans , Internet , Leukocytes/cytology , Models, Statistical , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Rats , Regression Analysis , Reproducibility of Results , Software
12.
Nat Methods ; 12(4): 332-4, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25664543

ABSTRACT

Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in nonrandomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (liability estimator as a phenotype; https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and we demonstrate that this can lead to a substantial power increase.
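The "liability" in LEAP refers to the liability-threshold model. Each individual has a latent normal liability L, and disease occurs when L exceeds a threshold t set by the population prevalence K, so t = Φ⁻¹(1 − K). Without genotype information, the expected liability of a case is the truncated-normal mean φ(t)/(1 − Φ(t)), and that of a control is −φ(t)/Φ(t). LEAP's estimates also draw on genotype data; this sketch covers only the basic threshold step and uses a hypothetical function name.

```python
"""Liability-threshold basics (genotype-free step only; not LEAP itself)."""
from statistics import NormalDist


def expected_liability(prevalence):
    """Return (threshold, E[L | case], E[L | control]) for L ~ N(0, 1)."""
    std = NormalDist()
    t = std.inv_cdf(1.0 - prevalence)  # cases are those with L > t
    phi_t = std.pdf(t)
    case_mean = phi_t / (1.0 - std.cdf(t))  # mean of N(0,1) truncated to L > t
    control_mean = -phi_t / std.cdf(t)      # mean of N(0,1) truncated to L <= t
    return t, case_mean, control_mean
```

For a rare disease (K = 0.01) the threshold is about 2.33 and a case's expected liability about 2.67. Case status therefore carries much more information about liability than control status does, which is part of why ascertained case-control samples need special handling.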


Subject(s)
Case-Control Studies , Genome-Wide Association Study/methods , Biostatistics , Humans , Models, Theoretical , Multiple Sclerosis/genetics , Sample Size
13.
Nat Methods ; 11(3): 309-11, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24464286

ABSTRACT

In epigenome-wide association studies, cell-type composition often differs between cases and controls, yielding associations that simply tag cell type rather than reveal fundamental biology. Current solutions require actual or estimated cell-type composition, information that is not easily obtainable for many samples of interest. We propose a method, FaST-LMM-EWASher, that automatically corrects for cell-type composition without the need for explicit knowledge of it, and we validate our method by comparison with the state-of-the-art approach. Corresponding software is available from http://www.microsoft.com/science/.


Subject(s)
Cells , Epigenomics , Genome-Wide Association Study , Humans , Linear Models
14.
Bioinformatics ; 30(22): 3206-14, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-25075117

ABSTRACT

MOTIVATION: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test (a score test) with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. RESULTS: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more associations), whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene-gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. AVAILABILITY: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. CONTACT: heckerma@microsoft.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Association Studies/methods , Genetic Variation , Algorithms , Data Interpretation, Statistical , Humans , Likelihood Functions , Phenotype , Polymorphism, Single Nucleotide
15.
Nucleic Acids Res ; 41(4): 2095-104, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23303775

ABSTRACT

DNA methylation has been implicated in a number of diseases and other phenotypes. It is, therefore, of interest to identify and understand the genetic determinants of methylation and epigenomic variation. We investigated the extent to which genetic variation in cis-DNA sequence explains variation in CpG dinucleotide methylation in publicly available data for four brain regions from unrelated individuals, finding that 3-4% of CpG loci assayed were heritable, with a mean estimated narrow-sense heritability of 30% over the heritable loci. Over all loci, the mean estimated heritability was 3%, as compared with a recent twin-based study reporting 18%. Heritable loci were enriched for open chromatin regions and binding sites of CTCF, an influential regulator of transcription and chromatin architecture. Additionally, heritable loci were proximal to genes enriched in several known pathways, suggesting a possible functional role for these loci. Our estimates of heritability are conservative, and we suspect that the number of identified heritable loci will increase as the methylome is assayed across a broader range of cell types and the density of the tested loci is increased. Finally, we show that the number of heritable loci depends on the window size parameter commonly used to identify candidate cis-acting single-nucleotide polymorphism variants.


Subject(s)
Brain/metabolism , DNA Methylation , Quantitative Trait Loci , Quantitative Trait, Heritable , CpG Islands , DNA/chemistry , Humans , Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid
16.
Nat Methods ; 8(10): 833-5, 2011 Sep 04.
Article in English | MEDLINE | ID: mdl-21892150

ABSTRACT

We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals (http://mscompbio.codeplex.com/).
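The core "spectral transformation" can be written out for the standard single-kernel LMM (our notation, a sketch rather than the paper's full derivation). Here $\mathbf{K}$ is the genetic similarity matrix with eigendecomposition $\mathbf{U}\mathbf{S}\mathbf{U}^\top$ and eigenvalues $s_i$. Because $\mathbf{U}$ is orthogonal, rotating the data by $\mathbf{U}^\top$ makes the covariance diagonal, so once the decomposition is computed, each SNP test reduces to a weighted least-squares problem over $n$ independent observations:

```latex
\begin{aligned}
\mathbf{y} &\sim \mathcal{N}\!\left(\mathbf{X}\boldsymbol{\beta},\ \sigma_g^2\,\mathbf{K} + \sigma_e^2\,\mathbf{I}_n\right),
\qquad \mathbf{K} = \mathbf{U}\mathbf{S}\mathbf{U}^\top,\ \ \mathbf{U}^\top\mathbf{U} = \mathbf{I}_n,\\
\tilde{\mathbf{y}} = \mathbf{U}^\top\mathbf{y} &\sim \mathcal{N}\!\left(\tilde{\mathbf{X}}\boldsymbol{\beta},\ \sigma_g^2\,\mathbf{S} + \sigma_e^2\,\mathbf{I}_n\right),
\qquad \tilde{\mathbf{X}} = \mathbf{U}^\top\mathbf{X},\\
\log p(\mathbf{y}) &= -\frac{1}{2}\sum_{i=1}^{n}\left[\log\!\left(2\pi\,(\sigma_g^2 s_i + \sigma_e^2)\right)
+ \frac{\left(\tilde{y}_i - \tilde{\mathbf{x}}_i^\top\boldsymbol{\beta}\right)^2}{\sigma_g^2 s_i + \sigma_e^2}\right].
\end{aligned}
```

The log-likelihood is unchanged by the rotation because $|\det\mathbf{U}| = 1$. When $\mathbf{K}$ is built from fewer SNPs than there are individuals, at most that many eigenvalues are nonzero, so the rotation can be applied without forming the full $n \times n$ decomposition. That low-rank setting is the one in which linear scaling with cohort size is usually cited.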


Subject(s)
Genome-Wide Association Study , Models, Genetic , Algorithms , Computer Simulation , Software
17.
Bioinformatics ; 29(2): 206-14, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23175758

ABSTRACT

MOTIVATION: Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In traits with simple Mendelian architectures, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest seem to be subject to multifactorial control by groups of genetic loci. Accurate detection of such multivariate associations is non-trivial and often compromised by limited statistical power. At the same time, confounding influences, such as population structure, cause spurious association signals that result in false-positive findings. RESULTS: We propose LMM-Lasso, a linear mixed model that allows for both multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters; it effectively controls for population structure and scales to genome-wide datasets. LMM-Lasso simultaneously discovers likely causal variants and allows for multi-marker-based phenotype prediction from genotype. We demonstrate the practical use of LMM-Lasso in genome-wide association studies in Arabidopsis thaliana and linkage mapping in mouse, where our method achieves significantly more accurate phenotype prediction for 91% of the considered phenotypes. At the same time, our model dissects the phenotypic variability into components that result from individual single nucleotide polymorphism effects and population structure. Enrichment of known candidate genes suggests that the individual associations retrieved by LMM-Lasso are likely to be genuine. AVAILABILITY: Code available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/research.html. CONTACT: rakitsch@tuebingen.mpg.de, ippert@microsoft.com or stegle@ebi.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Animals , Chromosome Mapping , Genetic Loci , Genome , Genotype , Humans , Linear Models , Mice , Phenotype , Population Groups/genetics
18.
Bioinformatics ; 29(11): 1382-9, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23559640

ABSTRACT

MOTIVATION: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing direct tests for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as human. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits. RESULTS: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even if it is not given as an input, allowing detection of genuine genotype-environment interactions. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability. AVAILABILITY AND IMPLEMENTATION: Software available at http://pmbio.github.io/envGPLVM/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Gene-Environment Interaction , Gene Expression Regulation, Fungal , Genotype , Humans , Linear Models , Models, Genetic , Quantitative Trait Loci
19.
Bioinformatics ; 29(12): 1526-33, 2013 Jun 15.
Article in English | MEDLINE | ID: mdl-23599503

ABSTRACT

MOTIVATION: Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power. RESULTS: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two-random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15 000-individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis. AVAILABILITY: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com.
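A two-random-effects set test can be sketched in standard LMM notation. This is our own notation, and the kernel construction is an assumption (for example $\mathbf{K} = \mathbf{G}\mathbf{G}^\top/m$ from $m$ standardized SNPs). $\mathbf{K}_s$ is built from the variants in the tested set, $\mathbf{K}_b$ from background variants that capture relatedness and population structure, and the set is declared associated when its variance component is nonzero:

```latex
\begin{aligned}
\mathbf{y} &\sim \mathcal{N}\!\left(\mathbf{X}\boldsymbol{\beta},\ \sigma_s^2\,\mathbf{K}_s + \sigma_b^2\,\mathbf{K}_b + \sigma_e^2\,\mathbf{I}_n\right),\\
H_0&:\ \sigma_s^2 = 0 \qquad\text{vs.}\qquad H_1:\ \sigma_s^2 > 0,\\
\mathrm{LR} &= 2\left[\max_{\boldsymbol{\beta},\,\sigma_s^2,\,\sigma_b^2,\,\sigma_e^2}\ \ell\ \;-\ \max_{\boldsymbol{\beta},\,\sigma_b^2,\,\sigma_e^2}\ \ell\,\Big|_{\sigma_s^2 = 0}\right].
\end{aligned}
```

Because $\sigma_s^2 = 0$ lies on the boundary of the parameter space, the LR statistic does not follow a plain $\chi^2_1$ null distribution, so p-values need a boundary-aware reference distribution or empirical calibration.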


Subject(s)
Genetic Markers , Genome-Wide Association Study/methods , Algorithms , Case-Control Studies , Crohn Disease/genetics , Humans , Linear Models , Phenotype , Polymorphism, Single Nucleotide
20.
J Hum Genet ; 59(5): 269-75, 2014 May.
Article in English | MEDLINE | ID: mdl-24670270

ABSTRACT

The use of mixed models to determine narrow-sense heritability and related quantities such as SNP heritability has received much recent attention. Less attention has been paid to the inherent variability in these estimates. One approach for quantifying variability in estimates of heritability is a frequentist approach, in which heritability is estimated using maximum likelihood and its variance is quantified through an asymptotic normal approximation. An alternative approach is to quantify the uncertainty in heritability through its Bayesian posterior distribution. In this paper, we develop the latter approach, make it computationally efficient and compare it to the frequentist approach. We show theoretically that, for a sufficiently large sample size and intermediate values of heritability, the two approaches provide similar results. Using the Atherosclerosis Risk in Communities cohort, we show empirically that the two approaches can give different results and that the variance/uncertainty can remain large.
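The Bayesian alternative can be sketched on a grid for the rotated single-kernel LMM, in which the rotated phenotype values $\tilde{y}_i$ have variance $\sigma^2(h^2 s_i + 1 - h^2)$, with $s_i$ the kinship eigenvalues. The sketch assumes fixed effects have already been projected out, integrates $\sigma^2$ out under a $1/\sigma^2$ prior (which leaves a quantity proportional to the profile likelihood), and places a uniform prior on $h^2$. These are our assumptions, not necessarily the paper's choices, and the function names are hypothetical. The grid maximum is the frequentist ML estimate restricted to the grid.

```python
"""Grid posterior for heritability h2 in a rotated single-kernel LMM (sketch).

Assumptions: y_rot = U^T y with fixed effects removed, s = eigenvalues of K
(all >= 0), sigma^2 integrated out under a 1/sigma^2 prior, uniform h2 prior.
"""
import math


def log_marginal(h2, y_rot, s):
    """log p(y | h2) up to a constant (equals the profile log-likelihood)."""
    n = len(y_rot)
    d = [h2 * si + (1.0 - h2) for si in s]  # per-component variance ratios
    ssq = sum(yi * yi / di for yi, di in zip(y_rot, d))
    return -0.5 * n * math.log(ssq) - 0.5 * sum(math.log(di) for di in d)


def h2_posterior(y_rot, s, grid_size=199):
    """Return (grid, posterior weights, posterior mean, posterior sd, grid MLE)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]  # open (0, 1)
    logp = [log_marginal(h, y_rot, s) for h in grid]
    m = max(logp)
    w = [math.exp(lp - m) for lp in logp]  # subtract the max for stability
    z = sum(w)
    post = [wi / z for wi in w]
    mean = sum(h * p for h, p in zip(grid, post))
    sd = math.sqrt(sum((h - mean) ** 2 * p for h, p in zip(grid, post)))
    mle = grid[logp.index(m)]
    return grid, post, mean, sd, mle
```

Comparing `mean` and `sd` with `mle` and an asymptotic standard error shows the point made above: the two summaries agree for large samples and intermediate $h^2$, but can diverge near the boundaries of $[0, 1]$ or when the sample is small.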


Subject(s)
Inheritance Patterns , Models, Genetic , Quantitative Trait, Heritable , Uncertainty , Algorithms , Atherosclerosis/genetics , Bayes Theorem , Cohort Studies , Female , Genotype , Humans , Male , Phenotype , Polymorphism, Single Nucleotide , Racial Groups/genetics , Risk Factors