Results 1 - 20 of 24
1.
Am J Hum Genet ; 110(9): 1534-1548, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37633278

ABSTRACT

Despite extensive research on global heritability estimation for complex traits, few methods accurately dissect local heritability. A precise local heritability estimate is crucial for high-resolution mapping in genetics. Here, we report the effective heritability estimator (EHE) that can use p values from genome-wide association studies (GWASs) for local heritability estimation by directly converting marginal heritability estimates of SNPs to a non-redundant heritability estimate of a gene or a small genomic region. EHE provides higher accuracy and precision for local heritability estimation among seven compared methods. Importantly, EHE can be applied to estimate the conditional heritability of nearby genes, where redundant heritability among the genes can also be removed further. The conditional estimation can be guided by tissue-specific expression profiles (or other functional scores) to prioritize and quantify more functionally important genes of complex phenotypes. Applying EHE to 42 complex phenotypes from the UK Biobank, we revealed the existence of two types of distinct genetic architectures for various complex phenotypes and found that highly pleiotropic genes are not enriched for more heritability compared to other candidate susceptibility genes. EHE provides an accurate and robust way to dissect the genetic architecture of complex phenotypes.


Subject(s)
Genome-Wide Association Study , Genomics , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics
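
As a rough illustration of the first step named in the abstract above (going from a GWAS p value to a marginal per-SNP heritability), the sketch below uses the textbook regression identity R² = t² / (t² + df). This is not the EHE estimator itself: the function name `marginal_h2_from_p`, the use of a z score in place of the t statistic, and the sample size are assumptions made for the example.

```python
from statistics import NormalDist


def marginal_h2_from_p(p_value: float, n_samples: int) -> float:
    """Approximate variance explained by one SNP from a two-sided p value.

    Assumes 0 < p_value <= 1 and a simple linear regression of a
    quantitative trait on one SNP (hypothetical setup, not EHE).
    """
    # Recover the absolute z score from the two-sided p value.
    z = -NormalDist().inv_cdf(p_value / 2.0)
    # R^2 = t^2 / (t^2 + df); z stands in for t when n is large.
    return z * z / (z * z + n_samples - 2)


if __name__ == "__main__":
    # A genome-wide significant SNP in a hypothetical cohort of 100,000.
    print(marginal_h2_from_p(5e-8, 100_000))
```

Summing such marginal values over SNPs in linkage disequilibrium counts shared signal more than once, which is the redundancy the abstract says EHE removes.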
2.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35289357

ABSTRACT

Over the past decade, statistical methods have been developed to estimate single nucleotide polymorphism (SNP) heritability, which measures the proportion of phenotypic variance explained by all measured SNPs in the data. Estimates of SNP heritability measure the degree to which the available genetic variants influence phenotypes and improve our understanding of the genetic architecture of complex phenotypes. In this article, we review the recently developed and commonly used SNP heritability estimation methods for continuous and binary phenotypes from the perspective of model assumptions and parameter optimization. We primarily focus on their capacity to handle multiple phenotypes and longitudinal measurements, their ability to partition SNP heritability, and their use of individual-level data versus summary statistics. State-of-the-art statistical methods that are scalable to the UK Biobank dataset are also elucidated in detail.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Models, Genetic , Phenotype
3.
Int J Mol Sci ; 23(19)2022 Sep 28.
Article in English | MEDLINE | ID: mdl-36232752

ABSTRACT

Several disease risk variants reside on non-coding regions of DNA, particularly in open chromatin regions of specific cell types. Identifying the cell types relevant to complex traits through the integration of chromatin accessibility data and genome-wide association studies (GWAS) data can help to elucidate the mechanisms of these traits. In this study, we created a collection of associations between the combinations of chromatin accessibility data (bulk and single-cell) with an array of 201 complex phenotypes. We integrated the GWAS data of these 201 phenotypes with bulk chromatin accessibility data from 137 cell types measured by DNase-I hypersensitive sequencing and found significant results (FDR adjusted p-value ≤ 0.05) for at least one cell type in 21 complex phenotypes, such as atopic dermatitis, Graves' disease, and body mass index. With the integration of single-cell chromatin accessibility data measured by an assay for transposase-accessible chromatin with high-throughput sequencing (scATAC-seq), taken from 111 adult and 111 fetal cell types, the resolution of association was magnified, enabling the identification of further cell types. This resulted in the identification of significant correlations (FDR adjusted p-value ≤ 0.05) between 15 categories of single-cell subtypes and 59 phenotypes ranging from autoimmune diseases like Graves' disease to cardiovascular traits like diastolic/systolic blood pressure.


Subject(s)
Chromatin , Graves Disease , Chromatin/genetics , DNA/genetics , Deoxyribonucleases/genetics , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing/methods , Humans , Phenotype , Transposases/genetics
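
The abstract above reports results at an FDR-adjusted p value ≤ 0.05 without naming the adjustment procedure. The sketch below shows the widely used Benjamini-Hochberg adjustment as one plausible reading; the choice of procedure and the function name `benjamini_hochberg` are assumptions, not details taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p values, one per cell type tested for a phenotype.
    raw = [0.01, 0.04, 0.03, 0.005]
    significant = [q <= 0.05 for q in benjamini_hochberg(raw)]
    print(significant)
```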
4.
Genet Epidemiol ; 44(1): 90-103, 2020 01.
Article in English | MEDLINE | ID: mdl-31587362

ABSTRACT

While it is well established that genetics can be a major contributor to population variation of complex traits, the relative contributions of rare and common variants to phenotypic variation remain a matter of considerable debate. Here, we simulate genetic and phenotypic data across different case/control panel sampling strategies, sequencing methods, and genetic architecture models based on evolutionary forces to determine the statistical performance of rare variant association tests (RVATs) widely in use. We find that the highest statistical power of RVATs is achieved by sampling case/control individuals from the extremes of an underlying quantitative trait distribution. We also demonstrate that the use of genotyping arrays, in conjunction with imputation from a whole-genome sequenced (WGS) reference panel, recovers the vast majority (90%) of the power that could be achieved by sequencing the case/control panel using current tools. Finally, we show that for dichotomous traits, the statistical performance of RVATs decreases as rare variants become more important in the trait architecture. Our results extend previous work to show that RVATs are insufficiently powered to make generalizable conclusions about the role of rare variants in dichotomous complex traits.


Subject(s)
Genetic Variation/genetics , Genetics, Population/methods , Genome-Wide Association Study/methods , Models, Genetic , Multifactorial Inheritance/genetics , Case-Control Studies , Computer Simulation , Humans , Phenotype , Research Design
5.
Clin Genet ; 95(2): 329-333, 2019 02.
Article in English | MEDLINE | ID: mdl-30267408

ABSTRACT

Genetic investigations were performed in three brothers from a consanguineous union, the two oldest diagnosed with rod-cone dystrophy (RCD), the youngest with early-onset cone-rod dystrophy and the two youngest with nephrotic-range proteinuria. Targeted next-generation sequencing did not identify a homozygous pathogenic variant in the oldest brother. Whole exome sequencing (WES) applied to the family identified compound heterozygous variants in CC2D2A (c.2774G>C p.(Arg925Pro); c.4730_4731delinsTGTATA p.(Ala1577Valfs*5)) in the three brothers, with a homozygous deletion in CNGA3 (c.1235_1236del p.(Glu412Valfs*6)) in the youngest, correcting his diagnosis to achromatopsia plus RCD. None of the three subjects had cerebral abnormalities or learning disabilities, inconsistent with Meckel-Gruber and Joubert syndromes, usually associated with CC2D2A mutations. Interestingly, an African woman with RCD shared the CC2D2A missense variant (c.2774G>C p.(Arg925Pro); with c.3182+355_3825del p.(?)). The two youngest also carried compound heterozygous variants in CUBN (c.7906C>T rs137998687 p.(Arg2636*); c.10344C>G p.(Cys3448Trp)) that may explain their nephrotic-range proteinuria. Our study identifies for the first time CC2D2A mutations in isolated RCD and underlines the power of WES to decipher complex phenotypes.


Subject(s)
Cone-Rod Dystrophies/diagnosis , Cone-Rod Dystrophies/genetics , Cytoskeletal Proteins/genetics , Exome Sequencing , Genetic Predisposition to Disease , Mutation , Phenotype , Alleles , Amino Acid Substitution , DNA Mutational Analysis , Female , Genetic Association Studies , Genotype , Humans , Pedigree , Young Adult
6.
Genet Epidemiol ; 39(1): 11-19, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25371374

ABSTRACT

Genetic simulation programs are used to model data under specified assumptions to facilitate the understanding and study of complex genetic systems. Standardized data sets generated using genetic simulation are essential for the development and application of novel analytical tools in genetic epidemiology studies. With continuing advances in high-throughput genomic technologies and generation and analysis of larger, more complex data sets, there is a need for updating current approaches in genetic simulation modeling. To provide a forum to address current and emerging challenges in this area, the National Cancer Institute (NCI) sponsored a workshop, entitled "Genetic Simulation Tools for Post-Genome Wide Association Studies of Complex Diseases" at the National Institutes of Health (NIH) in Bethesda, Maryland on March 11-12, 2014. The goals of the workshop were to (1) identify opportunities, challenges, and resource needs for the development and application of genetic simulation models; (2) improve the integration of tools for modeling and analysis of simulated data; and (3) foster collaborations to facilitate development and applications of genetic simulation. During the course of the meeting, the group identified challenges and opportunities for the science of simulation, software and methods development, and collaboration. This paper summarizes key discussions at the meeting, and highlights important challenges and opportunities to advance the field of genetic simulation.


Subject(s)
Computer Simulation , Disease/genetics , Models, Genetic , Software , Genome-Wide Association Study , Genomics , Humans , Molecular Epidemiology
7.
Genet Epidemiol ; 39(1): 35-44, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25417809

ABSTRACT

Demographic events and natural selection alter patterns of genetic variation within populations and may play a substantial role in shaping the genetic architecture of complex phenotypes and disease. However, the joint impact of these basic evolutionary forces is often ignored in the assessment of statistical tests of association. Here, we provide a simulation-based framework for generating DNA sequences that incorporates selection and demography with flexible models for simulating phenotypic variation (sfs_coder). This tool also allows the user to perform locus-specific simulations by automatically querying annotated genomic functional elements and genetic maps. We demonstrate the effects of evolutionary forces on patterns of genetic variation by simulating recently inferred models of human selection and demography. We use these simulations to show that the demographic model and locus-specific features, such as the proportion of sites under selection, may have practical implications for estimating the statistical power of sequencing-based rare variant association tests. In particular, for some phenotype models, there may be higher power to detect rare variant associations in African populations compared to non-Africans, but power is considerably reduced in regions of the genome with rampant negative selection. Furthermore, we show that existing methods for simulating large samples based on resampling from a small set of observed haplotypes fail to recapitulate the distribution of rare variants in the presence of rapid population growth (as has been observed in several human populations).


Subject(s)
Computer Simulation , Genetics, Population , Models, Genetic , Genetic Variation , Genome, Human , Haplotypes , Humans , Sample Size , Selection, Genetic
8.
J Evol Biol ; 28(4): 807-25, 2015 04.
Article in English | MEDLINE | ID: mdl-25752450

ABSTRACT

The extent to which genotypic variation at a priori identified candidate genes can explain variation in complex phenotypes is a major debate in evolutionary biology. Whereas some high-profile genes such as the MHC or MC1R clearly do account for variation in ecologically relevant characters, many complex phenotypes such as response to parasite infection may well be underpinned by a large number of genes, each of small and effectively undetectable effect. Here, we characterize a suite of novel candidate genes for variation in gastrointestinal nematode (Trichostrongylus tenuis) burden among red grouse (Lagopus lagopus scotica) individuals across a network of moors in north-east Scotland. We test for associations between parasite load and genotypic variation in twelve genes previously identified to be differentially expressed in experimentally infected red grouse or genetically differentiated among red grouse populations with overall different parasite loads. These genes are associated with a broad physiological response including immune system processes. Based on individual-level generalized linear models, genotypic variants in nine genes were significantly associated with parasite load, with effect sizes accounting for differences of 514-666 worms per bird. All but one of these variants were synonymous or untranslated, suggesting that these may be linked to protein-coding variants or affect regulatory processes. In contrast, population-level analyses revealed few and inconsistent associations with parasite load, and little evidence of signatures of natural selection. We discuss the broader significance of these contrasting results in the context of the utility of population genomics and landscape genomics approaches in detecting adaptive genomic signatures.


Subject(s)
Galliformes/genetics , Galliformes/parasitology , Host-Parasite Interactions/genetics , Trichostrongylus/pathogenicity , Animals , Bird Diseases/genetics , Bird Diseases/parasitology , Female , Linear Models , Male , Metagenomics , Models, Genetic , Parasite Load , Polymorphism, Single Nucleotide , Scotland , Selection, Genetic , Trichostrongylosis/veterinary
9.
J Exp Biol ; 218(Pt 1): 134-9, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25568460

ABSTRACT

Since the final decades of the last century, twin studies have made a remarkable contribution to the genetics of human complex traits and diseases. With the recent rapid development in modern biotechnology of high-throughput genetic and genomic analyses, twin modelling is expanding from analysis of diseases to molecular phenotypes in functional genomics especially in epigenetics, a thriving field of research that concerns the environmental regulation of gene expression through DNA methylation, histone modification, microRNA and long non-coding RNA expression, etc. The application of the twin method to molecular phenotypes offers new opportunities to study the genetic (nature) and environmental (nurture) contributions to epigenetic regulation of gene activity during developmental, ageing and disease processes. Besides the classical twin model, the case co-twin design using identical twins discordant for a trait or disease is becoming a popular and powerful design for epigenome-wide association study in linking environmental exposure to differential epigenetic regulation and to disease status while controlling for individual genetic make-up. It can be expected that novel uses of twin methods in epigenetic studies are going to help with efficiently unravelling the genetic and environmental basis of epigenomics in human complex diseases.


Subject(s)
Epigenesis, Genetic , Twin Studies as Topic , Twins/genetics , Aging/genetics , Disease/genetics , Growth and Development/genetics , Humans
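
The "classical twin model" mentioned in the abstract above is often introduced through Falconer's formulas, which split trait variance into additive genetic (A), shared environmental (C) and unique environmental (E) parts using the trait correlations of identical (MZ) and fraternal (DZ) twin pairs. The sketch below is that textbook approximation only; the function name `falconer_ace` and the example correlations are illustrative assumptions, and the article itself may rely on more elaborate structural equation models.

```python
def falconer_ace(r_mz: float, r_dz: float):
    """Falconer's estimates of the A, C and E variance proportions.

    r_mz and r_dz are trait correlations within MZ and DZ twin pairs.
    """
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic share (A)
    c2 = 2.0 * r_dz - r_mz    # shared environment share (C)
    e2 = 1.0 - r_mz           # unique environment plus error (E)
    return h2, c2, e2


if __name__ == "__main__":
    # Hypothetical correlations for a molecular phenotype such as a
    # DNA methylation level at one site.
    print(falconer_ace(0.8, 0.5))
```

The three components sum to one by construction, so a high MZ correlation that is not matched by the DZ correlation is attributed to genetic effects.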
10.
Genomics ; 104(6 Pt A): 406-11, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25261766

ABSTRACT

Evolutionary engineering has been used to improve key industrial strain traits, such as carbon source utilization, tolerance to adverse environmental conditions, and resistance to chemical inhibitors, for many decades due to its technical simplicity and effectiveness. The lack of need for prior genetic knowledge underlying the phenotypes of interest makes this a powerful approach for strain development, even for species with minimal genotypic information. While the basic experimental procedure for laboratory adaptive evolution has remained broadly similar for many years, a range of recent advances show promise for improving the experimental workflows for evolutionary engineering by accelerating the pace of evolution, simplifying the analysis of evolved mutants, and providing new ways of linking desirable phenotypes to selectable characteristics. This review aims to highlight some of these recent advances and discuss how they may be used to improve industrially relevant microbial phenotypes.


Subject(s)
Directed Molecular Evolution , Evolution, Molecular , Industrial Microbiology , Biocatalysis , Bioreactors , Genetic Fitness , Genetic Variation , Genotype , Phenotype , Systems Biology
11.
Genet Epidemiol ; 37(7): 643-57, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24123198

ABSTRACT

Cancer risk is determined by a complex interplay of genetic and environmental factors. Genome-wide association studies (GWAS) have identified hundreds of common (minor allele frequency [MAF] > 0.05) and less common (0.01 < MAF < 0.05) genetic variants associated with cancer. The marginal effects of most of these variants have been small (odds ratios: 1.1-1.4). There remain unanswered questions on how best to incorporate the joint effects of genes and environment, including gene-environment (G × E) interactions, into epidemiologic studies of cancer. To help address these questions, and to better inform research priorities and allocation of resources, the National Cancer Institute sponsored a "Gene-Environment Think Tank" on January 10-11, 2012. The objective of the Think Tank was to facilitate discussions on (1) the state of the science, (2) the goals of G × E interaction studies in cancer epidemiology, and (3) opportunities for developing novel study designs and analysis tools. This report summarizes the Think Tank discussion, with a focus on contemporary approaches to the analysis of G × E interactions. Selecting the appropriate methods requires first identifying the relevant scientific question and rationale, with an important distinction made between analyses aiming to characterize the joint effects of putative or established genetic and environmental factors and analyses aiming to discover novel risk factors or novel interaction effects. Other discussion items include measurement error, statistical power, significance, and replication. Additional designs, exposure assessments, and analytical approaches need to be considered as we move from the current small number of success stories to a fuller understanding of the interplay of genetic and environmental factors.


Subject(s)
Gene-Environment Interaction , Genetic Predisposition to Disease , National Cancer Institute (U.S.) , Neoplasms/epidemiology , Neoplasms/etiology , Genome-Wide Association Study/methods , Humans , Motivation , Neoplasms/genetics , Public Health/methods , Reproducibility of Results , Research Report , Risk , Sample Size , United States
12.
Yeast ; 31(7): 233-41, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24760744

ABSTRACT

Enabled by comparative genomics, yeasts have increasingly developed into a powerful model system for molecular evolution. Here we survey several areas in which yeast studies have made important contributions, including regulatory evolution, gene duplication and divergence, evolution of gene order and evolution of complexity. In each area we highlight key studies and findings based on techniques ranging from statistical analysis of large datasets to direct laboratory measurements of fitness. Future work will combine traditional evolutionary genetics analysis and experimental evolution with tools from systems biology to yield mechanistic insight into complex phenotypes.


Subject(s)
Evolution, Molecular , Genetic Variation/genetics , Genome, Fungal/genetics , Phylogeny , Saccharomyces cerevisiae/genetics , Gene Duplication/genetics
13.
Biotechnol Bioeng ; 110(10): 2616-23, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23613173

ABSTRACT

Lignocellulosic biomass has become an important feedstock to mitigate current ethical and economic concerns related to the bio-based production of fuels and chemicals. During the pre-treatment and hydrolysis of the lignocellulosic biomass, a complex mixture of sugars and inhibitors is formed. The inhibitors interfere with microbial growth and product yields. This study uses an adaptive laboratory evolution method called visualizing evolution in real-time (VERT) to uncover the molecular mechanisms associated with tolerance to hydrolysates of lignocellulosic biomass in Saccharomyces cerevisiae. VERT enables a more rational scheme for isolating adaptive mutants for characterization and molecular analyses. Subsequent growth kinetic analyses of the mutants in individual and combinations of common inhibitors present in hydrolysates (acetic acid, furfural, and hydroxymethylfurfural) showed differential levels of resistance to different inhibitors, with enhanced growth rates up to 57%, 12%, 22%, and 24% in hydrolysates, acetic acid, HMF and furfural, respectively. Interestingly, some of the adaptive mutants exhibited reduced fitness in the presence of individual inhibitors, but showed enhanced fitness in the presence of combinations of inhibitors compared to the parental strains. Transcriptomic analysis revealed different mechanisms for resistance to hydrolysates and a potential cross adaptation between oxidative stress and hydrolysate tolerance in several of the mutants.


Subject(s)
Adaptation, Biological/physiology , Bioengineering/methods , Biomass , Lignin/metabolism , Saccharomyces cerevisiae/physiology , Acetic Acid/metabolism , Biological Evolution , Furaldehyde/analogs & derivatives , Furaldehyde/metabolism , Gene Expression Profiling , Glucose/metabolism , Models, Biological , Mutation , Phenotype , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcriptome
14.
BMC Genom Data ; 24(1): 50, 2023 09 04.
Article in English | MEDLINE | ID: mdl-37667186

ABSTRACT

BACKGROUND: A relevant part of the genetic architecture of complex traits is still unknown, despite the discovery of many disease-associated common variants. Polygenic risk score (PRS) models are based on the evaluation of the additive effects attributable to common variants and have been successfully implemented to assess the genetic susceptibility for many phenotypes. In contrast, burden tests are often used to identify an enrichment of rare deleterious variants in specific genes. Both kinds of genetic contributions are typically analyzed independently. Many studies suggest that complex phenotypes are influenced by both low effect common variants and high effect rare deleterious variants. The aim of this paper is to integrate the effect of both common and rare functional variants for a more comprehensive genetic risk modeling.
METHODS: We developed a framework combining gene-based scores based on the enrichment of rare functionally relevant variants with genome-wide PRS based on common variants for association analysis and prediction models. We applied our framework to the UK Biobank dataset with genotyping and exome data and considered 28 blood biomarker levels as target phenotypes. For each biomarker, an association analysis was performed on the full cohort using gene-based scores (GBS). The cohort was then split into 3 subsets for PRS construction and feature selection, predictive model training, and independent evaluation, respectively. Prediction models were generated including either PRS, GBS or both (combined).
RESULTS: Association analyses of the cohort were able to detect significant genes that were previously known to be associated with different biomarkers. Interestingly, the analyses also revealed heterogeneous effect sizes and directionality, highlighting the complexity of blood biomarker regulation. However, the combined models for many biomarkers show little or no improvement in prediction accuracy compared to the PRS models.
CONCLUSION: This study shows that rare variants play an important role in the genetic architecture of complex multifactorial traits such as blood biomarkers. However, while rare deleterious variants play a strong role at an individual level, our results indicate that classical common variant based PRS might be more informative to predict the genetic susceptibility at the population level.


Subject(s)
Exome , Genetic Predisposition to Disease , Humans , Genetic Predisposition to Disease/genetics , Biomarkers , Phenotype , Multifactorial Inheritance/genetics
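
The abstract above describes PRS models as the additive effect of common variants. In its simplest form that is a weighted sum of allele dosages, sketched below. The function name `polygenic_risk_score`, the dosage coding (0, 1 or 2 copies of the effect allele) and the example weights are assumptions for illustration; the study's actual PRS construction and its gene-based scores are not reproduced here.

```python
def polygenic_risk_score(dosages, weights):
    """Additive score: sum of effect-allele dosage times per-SNP weight."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


if __name__ == "__main__":
    # Hypothetical individual genotyped at three common variants.
    dosages = [0, 1, 2]
    # Hypothetical per-allele effect sizes from a GWAS of one biomarker.
    weights = [0.10, -0.20, 0.05]
    print(polygenic_risk_score(dosages, weights))
```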
15.
Evolution ; 76(5): 1033-1051, 2022 05.
Article in English | MEDLINE | ID: mdl-35334114

ABSTRACT

The evolution of complex phenotypes like reproductive strategies is challenging to understand, as they often depend on multiple adaptations that only jointly result in a specific functionality. Sulawesi ricefishes (Adrianichthyidae) evolved a reproductive strategy termed pelvic brooding. In contrast to the more common transfer brooding, female pelvic brooders carry an egg bundle connected to their body for weeks until the fry hatches. To examine the genetic architecture of pelvic brooding, we crossed the pelvic brooding Oryzias eversi and the transfer brooding Oryzias nigrimas (species divergence time: ∼3.6 my). We hypothesize that a low number of loci and modularity have facilitated the rapid evolution of pelvic brooding. Traits associated with pelvic brooding, like rib length, pelvic fin length, and morphology of the genital papilla, were correlated in the parental species but correlations were reduced or lost in their F1 and F2 hybrids. Using the Castle-Wright estimator, we found that generally few loci underlie the studied traits. Further, both parental species showed modularity in their body plans. In conclusion, morphological traits related to pelvic brooding were based on a few loci and the mid-body region likely could evolve independently from the remaining body parts. Both factors presumably facilitated the evolution of pelvic brooding.


Subject(s)
Oryzias , Adaptation, Physiological , Animals , Female , Indonesia , Phenotype , Reproduction
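
The Castle-Wright estimator named in the abstract above can be sketched in its basic textbook form: the squared difference between the parental means divided by eight times the segregation variance, taken here as the F2 variance minus the F1 variance. The function name `castle_wright` and the trait values are hypothetical, and the study may have applied corrections (for example for sampling error of the means) that this sketch omits.

```python
from statistics import mean, variance


def castle_wright(parent1, parent2, f1, f2):
    """Basic Castle-Wright estimate of the effective number of loci."""
    # Segregation variance: extra trait variance appearing in the F2.
    seg_var = variance(f2) - variance(f1)
    if seg_var <= 0:
        raise ValueError("F2 variance must exceed F1 variance")
    return (mean(parent1) - mean(parent2)) ** 2 / (8.0 * seg_var)


if __name__ == "__main__":
    # Hypothetical trait measurements (e.g. pelvic fin length, in mm).
    n_loci = castle_wright(
        parent1=[10, 12], parent2=[2, 4], f1=[6, 8], f2=[5, 7, 9]
    )
    print(n_loci)
```

A small estimate, as the abstract reports for most traits, indicates that a few loci of relatively large effect could account for the difference between the parental species.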
16.
Bio Protoc ; 12(20)2022 Oct 20.
Article in English | MEDLINE | ID: mdl-36353713

ABSTRACT

Directed evolution is a powerful technique for identifying beneficial mutations in defined DNA sequences with the goal of improving desired phenotypes. Recent methodological advances have made the evolution of short DNA sequences quick and easy. However, the evolution of DNA sequences >5 kb in length, notably gene clusters, is still a challenge for most existing methods. Since many important microbial phenotypes are encoded by multigene pathways, they are usually improved via adaptive laboratory evolution (ALE), which while straightforward to implement can suffer from off-target and hitchhiker mutations that can adversely affect the fitness of the evolved strain. We have therefore developed a new directed evolution method (Inducible Directed Evolution, IDE) that combines the specificity and throughput of recent continuous directed evolution methods with the ease of ALE. Here, we present detailed methods for operating Inducible Directed Evolution (IDE), which enables long (up to 85 kb) DNA sequences to be mutated in a high throughput manner via a simple series of incubation steps. In IDE, an intracellular mutagenesis plasmid (MP) tunably mutagenizes the pathway of interest, located on the phagemid (PM). MP contains a mutagenic operon (danQ926, dam, seqA, emrR, ugi, and cda1) that can be expressed via the addition of a chemical inducer. Expression of the mutagenic operon during a cell cycle represses DNA repair mechanisms such as proofreading, translesion synthesis, mismatch repair, and base excision and selection, which leads to a higher mutation rate. Induction of the P1 lytic cycle results in packaging of the mutagenized phagemid, and the pathway-bearing phage particles infect naïve cells, generating a mutant library that can be screened or selected for improved variants. Successive rounds of IDE enable optimization of complex phenotypes encoded by large pathways (as of this writing up to 36 kb), without requiring inefficient transformation steps. Additionally, IDE avoids off-target genomic mutations and enables decoupling of mutagenesis and screening steps, establishing it as a powerful tool for optimizing complex phenotypes in E. coli.

17.
Front Genet ; 13: 1014947, 2022.
Article in English | MEDLINE | ID: mdl-36276986

ABSTRACT

Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods in utilizing variant annotations or external controls to improve the statistical power, and particular challenges facing rare variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.

18.
Front Mol Biosci ; 8: 663532, 2021.
Article in English | MEDLINE | ID: mdl-34222331

ABSTRACT

Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g., from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables ("rules") frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn's disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.

19.
Evol Lett ; 5(1): 61-74, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33552536

ABSTRACT

Evolutionary genetic studies have uncovered abundant evidence for genomic hotspots of phenotypic evolution, as well as biased patterns of mutations at those loci. However, the theoretical basis for this concentration of particular types of mutations at particular loci remains largely unexplored. In addition, historical contingency is known to play a major role in evolutionary trajectories, but has not been reconciled with the existence of such hotspots. For example, do the appearance of hotspots and the fixation of different types of mutations at those loci depend on the starting state and/or on the nature and direction of selection? Here, we use a computational approach to examine these questions, focusing on the anthocyanin pigmentation pathway, which has been extensively studied in the context of flower color transitions. We investigate two transitions that are common in nature, the transition from blue to purple pigmentation and from purple to red pigmentation. Both sets of simulated transitions occur with a small number of mutations at just four loci and show strikingly similar peaked shapes of evolutionary trajectories, with the mutations of the largest effect occurring early but not first. Nevertheless, the types of mutations (biochemical vs. regulatory) as well as their direction and magnitude are contingent on the particular transition. These simulated color transitions largely mirror findings from natural flower color transitions, which are known to occur via repeated changes at a few hotspot loci. Still, some types of mutations observed in our simulated color evolution are rarely observed in nature, suggesting that pleiotropic effects further limit the trajectories between color phenotypes. Overall, our results indicate that the branching structure of the pathway leads to a predictable concentration of evolutionary change at the hotspot loci, but the types of mutations at these loci and their order are contingent on the evolutionary context.

20.
Front Genet ; 9: 364, 2018.
Article in English | MEDLINE | ID: mdl-30233646

ABSTRACT

In recent years, a series of methods for genomic prediction (GP) have been established, and the advantages of GP over pedigree best linear unbiased prediction (BLUP) have been reported. However, the majority of previously proposed GP models are purely based on mathematical considerations and seldom take the abundant biological knowledge into account. Prediction ability of those models largely depends on the consistency between the statistical assumptions and the underlying genetic architectures of traits of interest. In this study, gene annotation information was incorporated into GP models by constructing haplotypes with SNPs mapped to genic regions. Haplotype allele similarity between pairs of individuals was measured through different approaches at single gene level and then converted into whole genome level, which was then treated as a special kernel and used in kernel based GP models. Results showed that the gene annotation guided methods gave higher or at least comparable predictive ability in some traits, especially in the Arabidopsis dataset and the rice breeding population. Compared to SNP models and haplotype models without gene annotation, the gene annotation based models improved the predictive ability by 0.56-26.67% in the Arabidopsis and 1.62-16.53% in the rice breeding population, respectively. In a chicken population, however, incorporating gene annotation slightly improved the predictive ability for several traits but did not show any extra gain for the remaining traits. In conclusion, integrating gene annotation into GP models could be beneficial for some traits, species, and populations compared to SNP models and haplotype models without gene annotation. However, more studies are yet to be conducted to implicitly investigate the characteristics of these gene annotation guided models.
