Results 1 - 20 of 207
1.
Nature ; 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385032

ABSTRACT

The human hippocampus and prefrontal cortex play critical roles in learning and cognition [1,2], yet the dynamic molecular characteristics of their development remain enigmatic. Here we investigated the epigenomic and three-dimensional chromatin conformational reorganization during the development of the hippocampus and prefrontal cortex, using more than 53,000 joint single-nucleus profiles of chromatin conformation and DNA methylation generated by single-nucleus methyl-3C sequencing (snm3C-seq3) [3]. The remodelling of DNA methylation is temporally separated from chromatin conformation dynamics. Using single-cell profiling and multimodal single-molecule imaging approaches, we have found that short-range chromatin interactions are enriched in neurons, whereas long-range interactions are enriched in glial cells and non-brain tissues. We reconstructed the regulatory programs of cell-type development and differentiation, finding putatively causal common variants for schizophrenia strongly overlapping with chromatin loop-connected, cell-type-specific regulatory regions. Our data provide multimodal resources for studying gene regulatory dynamics in brain development and demonstrate that single-cell three-dimensional multi-omics is a powerful approach for dissecting neuropsychiatric risk loci.

2.
Genome Res ; 33(7): 1032-1041, 2023 07.
Article in English | MEDLINE | ID: mdl-37197991

ABSTRACT

Mendelian randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases owing to weak instruments, as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We show in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, whereas standard MR methods yield inflated false positive rates. We then conduct an exploratory analysis of MR-Twin and other MR methods applied to 121 trait pairs in the UK Biobank data set. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, whereas MR-Twin is immune to this type of confounding, and that MR-Twin can help assess whether traditional approaches may be inflated owing to confounding from population stratification.


Subject(s)
Mendelian Randomization Analysis; Reproduction; Bias; Genome-Wide Association Study; Mendelian Randomization Analysis/methods; Phenotype; Humans
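
As a point of reference for the "standard MR methods" that the abstract above compares against, here is a minimal sketch of the fixed-effect inverse-variance weighted (IVW) estimator computed from per-variant summary statistics. It is not MR-Twin. The function name `ivw_estimate`, its argument names, and the numbers in the usage example are illustrative assumptions, not values from the paper.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from summary statistics.

    beta_exposure: per-variant effects on the exposure trait
    beta_outcome:  per-variant effects on the outcome trait
    se_outcome:    per-variant standard errors of the outcome effects
    Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one genetic instrument is required")

    # Each variant is weighted by the inverse variance of its outcome effect.
    weights = [1.0 / (se * se) for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("instruments have no effect on the exposure")

    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


if __name__ == "__main__":
    # Hypothetical summary statistics for two instruments.
    est, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
    print(f"IVW estimate = {est:.3f}, SE = {se:.4f}")
```

This estimator assumes the instruments are valid and independent. It does nothing to correct for population stratification, which is the source of confounding the abstract identifies in existing methods.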
3.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39297879

ABSTRACT

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Since identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect structural variants directly from sequencing data. With advances in whole-genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect a full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools which ignore the length and coverage, VISTA overcomes this limitation by executing various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, along with an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. 
VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, where the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among other variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.


Subject(s)
Genome, Human; Genomic Structural Variation; Software; Humans; Whole Genome Sequencing/methods; Algorithms; Genomics/methods; Computational Biology/methods; Genetic Variation
4.
Nat Methods ; 19(4): 429-440, 2022 04.
Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.


Subject(s)
Metagenome; Metagenomics; Archaea/genetics; Metagenomics/methods; Reproducibility of Results; Sequence Analysis, DNA; Software
5.
PLoS Genet ; 18(11): e1010447, 2022 11.
Article in English | MEDLINE | ID: mdl-36342933

ABSTRACT

We introduce pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show PAT identified 15.3% more omnibus associations over the next best method. When these associations were interpreted on a per trait level using m-values, PAT had 37.5% more true per trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-values interpretation framework, the number of per trait associations for two traits were almost tripled and were nearly doubled for another trait relative to the original single trait GWAS.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genetic Pleiotropy; Genome-Wide Association Study/methods; Phenotype; Polymorphism, Single Nucleotide/genetics; Meta-Analysis as Topic
6.
Am J Hum Genet ; 108(1): 36-48, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33352115

ABSTRACT

Identifying and interpreting pleiotropic loci is essential to understanding the shared etiology among diseases and complex traits. A common approach to mapping pleiotropic loci is to meta-analyze GWAS summary statistics across multiple traits. However, this strategy does not account for the complex genetic architectures of traits, such as genetic correlations and heritabilities. Furthermore, the interpretation is challenging because phenotypes often have different characteristics and units. We propose PLEIO (Pleiotropic Locus Exploration and Interpretation using Optimal test), a summary-statistic-based framework to map and interpret pleiotropic loci in a joint analysis of multiple diseases and complex traits. Our method maximizes power by systematically accounting for genetic correlations and heritabilities of the traits in the association test. Any set of related phenotypes, binary or quantitative traits with different units, can be combined seamlessly. In addition, our framework offers interpretation and visualization tools to help downstream analyses. Using our method, we combined 18 traits related to cardiovascular disease and identified 13 pleiotropic loci, which showed four different patterns of associations.


Subject(s)
Genetic Pleiotropy/genetics; Genome-Wide Association Study/methods; Cardiovascular Diseases/genetics; Genetic Predisposition to Disease/genetics; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics
7.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35753701

ABSTRACT

Advances in whole-genome sequencing (WGS) promise to enable the accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to the previous benchmarking studies, our gold standard dataset included a complete set of SVs allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.


Subject(s)
Benchmarking; Genome, Human; Animals; High-Throughput Nucleotide Sequencing/methods; Humans; Mice; Whole Genome Sequencing/methods
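
The abstract above reports precision and sensitivity for deletion calls scored against a gold standard. A minimal sketch of that kind of scoring follows. The matching rule (at least 50% reciprocal overlap, each gold-standard deletion matched at most once) is an assumption chosen for illustration and is not taken from the paper; the function names and coordinates are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two (chrom, start, end) intervals, in [0, 1]."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    # The smaller of the two overlap fractions is the reciprocal overlap.
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def benchmark_deletions(calls, gold, min_overlap=0.5):
    """Return (precision, sensitivity, f1) for called vs. gold deletions."""
    matched_gold = set()
    true_positives = 0
    for call in calls:
        for index, truth in enumerate(gold):
            if index in matched_gold:
                continue  # each gold-standard deletion counts only once
            if reciprocal_overlap(call, truth) >= min_overlap:
                matched_gold.add(index)
                true_positives += 1
                break

    precision = true_positives / len(calls) if calls else 0.0
    sensitivity = true_positives / len(gold) if gold else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f1


if __name__ == "__main__":
    # Hypothetical gold-standard deletions and caller output.
    gold = [("chr1", 100, 200), ("chr1", 500, 700), ("chr2", 100, 300)]
    calls = [("chr1", 110, 210), ("chr1", 1000, 1100)]
    p, s, f1 = benchmark_deletions(calls, gold)
    print(f"precision={p:.2f} sensitivity={s:.2f} F1={f1:.2f}")
```

Sensitivity is only meaningful when the gold standard is complete, which is the property the abstract highlights for its dataset.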
8.
PLoS Genet ; 17(9): e1009733, 2021 09.
Article in English | MEDLINE | ID: mdl-34543273

ABSTRACT

Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of "fine mapping" methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).


Subject(s)
Genome-Wide Association Study; Causality; Chromosome Mapping/methods; Humans; Linkage Disequilibrium; Lipoproteins, HDL/genetics; Polymorphism, Single Nucleotide
9.
BMC Genomics ; 23(1): 260, 2022 Apr 04.
Article in English | MEDLINE | ID: mdl-35379194

ABSTRACT

BACKGROUND: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused global disruption of human health and activity. Being able to trace the early outbreak of SARS-CoV-2 within a locality can inform public health measures and provide insights to contain or prevent viral transmission. Investigation of the transmission history requires efficient sequencing methods and analytic strategies, which can be generally useful in the study of viral outbreaks. METHODS: The County of Los Angeles (hereafter, LA County) sustained a large outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To learn about the transmission history, we carried out surveillance viral genome sequencing to determine 142 viral genomes from unique patients seeking care at the University of California, Los Angeles (UCLA) Health System. 86 of these genomes were from samples collected before April 19, 2020. RESULTS: We found that the early outbreak in LA County, as in other international air travel hubs, was seeded by multiple introductions of strains from Asia and Europe. We identified a USA-specific strain, B.1.43, which was found predominantly in California and Washington State. While samples from LA County carried the ancestral B.1.43 genome, viral genomes from neighboring counties in California and from counties in Washington State carried additional mutations, suggesting a potential origin of B.1.43 in Southern California. We quantified the transmission rate of SARS-CoV-2 over time, and found evidence that the public health measures put in place in LA County to control the virus were effective at preventing transmission, but might have been undermined by the many introductions of SARS-CoV-2 into the region. CONCLUSION: Our work demonstrates that genome sequencing can be a powerful tool for investigating outbreaks and informing the public health response. Our results reinforce the critical need for the USA to have coordinated inter-state responses to the pandemic.


Subject(s)
COVID-19; COVID-19/epidemiology; Disease Outbreaks; Genomics; Humans; Los Angeles/epidemiology; SARS-CoV-2/genetics
10.
PLoS Biol ; 17(6): e3000333, 2019 06.
Article in English | MEDLINE | ID: mdl-31220077

ABSTRACT

Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.


Subject(s)
Computational Biology/methods; Information Dissemination/methods; Information Storage and Retrieval/methods; Biomedical Research; Databases, Factual; Humans; Internet; Software/trends
11.
Nature ; 538(7626): 523-527, 2016 10 27.
Article in English | MEDLINE | ID: mdl-27760116

ABSTRACT

Three-dimensional physical interactions within chromosomes dynamically regulate gene expression in a tissue-specific manner. However, the 3D organization of chromosomes during human brain development and its role in regulating gene networks dysregulated in neurodevelopmental disorders, such as autism or schizophrenia, are unknown. Here we generate high-resolution 3D maps of chromatin contacts during human corticogenesis, permitting large-scale annotation of previously uncharacterized regulatory relationships relevant to the evolution of human cognition and disease. Our analyses identify hundreds of genes that physically interact with enhancers gained on the human lineage, many of which are under purifying selection and associated with human cognitive function. We integrate chromatin contacts with non-coding variants identified in schizophrenia genome-wide association studies (GWAS), highlighting multiple candidate schizophrenia risk genes and pathways, including transcription factors involved in neurogenesis, and cholinergic signalling molecules, several of which are supported by independent expression quantitative trait loci and gene expression analyses. Genome editing in human neural progenitors suggests that one of these distal schizophrenia GWAS loci regulates FOXG1 expression, supporting its potential role as a schizophrenia risk gene. This work provides a framework for understanding the effect of non-coding regulatory elements on human brain development and the evolution of cognition, and highlights novel mechanisms underlying neuropsychiatric disorders.


Subject(s)
Brain/embryology; Brain/metabolism; Chromatin/chemistry; Chromatin/genetics; Chromosomes, Human/chemistry; Chromosomes, Human/genetics; Gene Expression Regulation, Developmental; Nucleic Acid Conformation; Chromatin/metabolism; Chromosomes, Human/metabolism; Cognition; Enhancer Elements, Genetic/genetics; Epigenesis, Genetic; Forkhead Transcription Factors/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study; Humans; Nerve Tissue Proteins/genetics; Neural Stem Cells/metabolism; Neurogenesis; Organ Specificity; Polymorphism, Single Nucleotide/genetics; Promoter Regions, Genetic/genetics; Reproducibility of Results; Schizophrenia/genetics; Schizophrenia/pathology
12.
PLoS Genet ; 15(12): e1008481, 2019 12.
Article in English | MEDLINE | ID: mdl-31834882

ABSTRACT

Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. Previous studies have found enrichment of expression quantitative trait loci (eQTLs) in disease risk loci, indicating that identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging due to complex genetic correlation among variants known as linkage disequilibrium (LD) and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, there is an additional source of information that can be used to improve fine-mapping called allelic imbalance (AIM) that measures imbalance in gene expression on two chromosomes of a diploid organism. In this work, we develop a novel statistical method that leverages both AIM and total expression data to detect causal variants that regulate gene expression. We illustrate through simulations and application to 10 tissues of the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. Across all tissues and genes, our method achieves a median reduction rate of 11% in the number of putative causal variants. We use chromatin state data from the Roadmap Epigenomics Consortium to show that the putative causal variants identified by our method are enriched for active regions of the genome, providing orthogonal support that our method identifies causal variants with increased specificity.


Subject(s)
Allelic Imbalance; Chromatin/genetics; Chromosome Mapping/methods; Quantitative Trait Loci; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Linkage Disequilibrium; Multifactorial Inheritance; Polymorphism, Single Nucleotide
13.
PLoS Genet ; 15(12): e1008528, 2019 12.
Article in English | MEDLINE | ID: mdl-31869344

ABSTRACT

Asthma is a chronic inflammatory disease of the airways with contributions from genes, environmental exposures, and their interactions. While genome-wide association studies (GWAS) in humans have identified ~200 susceptibility loci, the genetic factors that modulate risk of asthma through gene-environment (GxE) interactions remain poorly understood. Using the Hybrid Mouse Diversity Panel (HMDP), we sought to identify the genetic determinants of airway hyperreactivity (AHR) in response to diesel exhaust particles (DEP), a model traffic-related air pollutant. As measured by invasive plethysmography, AHR under control and DEP-exposed conditions varied 3-4-fold in over 100 inbred strains from the HMDP. A GWAS with linear mixed models mapped two loci significantly associated with lung resistance under control exposure to chromosomes 2 (p = 3.0 × 10⁻⁶) and 19 (p = 5.6 × 10⁻⁷). The chromosome 19 locus harbors Il33 and is syntenic to asthma association signals observed at the IL33 locus in humans. A GxE GWAS for post-DEP exposure lung resistance identified a significantly associated locus on chromosome 3 (p = 2.5 × 10⁻⁶). Among the genes at this locus is Dapp1, an adaptor molecule expressed in immune-related and mucosal tissues, including the lung. Dapp1-deficient mice exhibited significantly lower AHR than control mice but only after DEP exposure, thus functionally validating Dapp1 as one of the genes underlying the GxE association at this locus. In summary, our results indicate that some of the genetic determinants for asthma-related phenotypes may be shared between mice and humans, as well as the existence of GxE interactions in mice that modulate lung function in response to air pollution exposures relevant to humans.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics; Air Pollutants/toxicity; Asthma/genetics; Bronchial Hyperreactivity/chemically induced; Lipoproteins/genetics; Vehicle Emissions/toxicity; Animals; Asthma/chemically induced; Bronchial Hyperreactivity/genetics; Chromosome Mapping; Disease Models, Animal; Female; Gene-Environment Interaction; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Male; Mice; Plethysmography
14.
Am J Physiol Lung Cell Mol Physiol ; 320(1): L41-L62, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33050709

ABSTRACT

In this study, a genetically diverse panel of 43 mouse strains was exposed to ammonia, and genome-wide association mapping was performed employing a single-nucleotide polymorphism (SNP) assembly. Transcriptomic analysis was used to help resolve the genetic determinants of ammonia-induced acute lung injury. The encoded proteins were prioritized based on molecular function, nonsynonymous SNP within a functional domain or SNP within the promoter region that altered expression. This integrative functional approach revealed 14 candidate genes that included Aatf, Avil, Cep162, Hrh4, Lama3, Plcb4, and Ube2cbp, which had significant SNP associations, and Aff1, Bcar3, Cntn4, Kcnq5, Prdm10, Ptcd3, and Snx19, which had suggestive SNP associations. Of these genes, Bcar3, Cep162, Hrh4, Kcnq5, and Lama3 are particularly noteworthy and had pathophysiological roles that could be associated with acute lung injury in several ways.


Subject(s)
Acute Lung Injury/pathology; Ammonia/toxicity; Genetic Markers; Genetic Predisposition to Disease; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Transcriptome; Acute Lung Injury/chemically induced; Acute Lung Injury/genetics; Animals; Female; Gene Expression Regulation; Humans; Mice; Mice, Inbred BALB C; Mice, Inbred CBA
15.
PLoS Genet ; 14(12): e1007309, 2018 12.
Article in English | MEDLINE | ID: mdl-30589851

ABSTRACT

A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to accurately test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.


Subject(s)
Genetics, Population; Genome-Wide Association Study/methods; Models, Genetic; Animals; Bias; Disease/genetics; Female; Genome-Wide Association Study/statistics & numerical data; Humans; Linear Models; Male; Mice; Models, Statistical; Pedigree; Phenotype; Phylogeny; Polymorphism, Single Nucleotide
16.
BMC Biol ; 18(1): 92, 2020 07 28.
Article in English | MEDLINE | ID: mdl-32723395

ABSTRACT

An amendment to this paper has been published and can be accessed via the original article.

17.
BMC Biol ; 18(1): 37, 2020 04 07.
Article in English | MEDLINE | ID: mdl-32264902

ABSTRACT

Metagenomics studies leverage genomic reference databases to generate discoveries in basic science and translational research. However, current microbial studies use disparate reference databases that lack consistent standards of specimen inclusion, data preparation, taxon labelling and accessibility, hindering their quality and comprehensiveness, and calling for the establishment of recommendations for reference genome database assembly. Here, we analyze existing fungal and bacterial databases and discuss guidelines for the development of a master reference database that promises to improve the quality and quantity of omics research.


Subject(s)
Bacteria/genetics; Databases, Genetic/standards; Fungi/genetics; Metagenomics/standards; Metagenomics/instrumentation
18.
Am J Hum Genet ; 100(5): 789-802, 2017 May 04.
Article in English | MEDLINE | ID: mdl-28475861

ABSTRACT

Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4%-23% in eQTLs, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R² = 0.85, p = 2.2 × 10⁻¹⁶), indicating that statistical power prevents identification of AH in other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.


Subject(s)
Alleles; Gene Frequency; Quantitative Trait Loci; Databases, Genetic; Genetic Association Studies; Humans; Linkage Disequilibrium; Models, Molecular; Phenotype
19.
Genet Epidemiol ; 42(1): 49-63, 2018 02.
Article in English | MEDLINE | ID: mdl-29114909

ABSTRACT

BACKGROUND: Epistasis and gene-environment interactions are known to contribute significantly to variation of complex phenotypes in model organisms. However, their identification in human association studies remains challenging for myriad reasons. In the case of epistatic interactions, the large number of potential interacting sets of genes presents computational, multiple hypothesis correction, and other statistical power issues. In the case of gene-environment interactions, the lack of consistently measured environmental covariates in most disease studies precludes searching for interactions and creates difficulties for replicating studies. RESULTS: In this work, we develop a new statistical approach to address these issues that leverages genetic ancestry, defined as the proportion of ancestry derived from each ancestral population (e.g., the fraction of European/African ancestry in African Americans), in admixed populations. We applied our method to gene expression and methylation data from African American and Latino admixed individuals, respectively, identifying nine interactions that were significant at P < 5 × 10⁻⁸. We show that two of the interactions in methylation data replicate, and the remaining six are significantly enriched for low P-values (P < 1.8 × 10⁻⁶). CONCLUSION: We show that genetic ancestry can be a useful proxy for unknown and unmeasured covariates in the search for interaction effects. These results have important implications for our understanding of the genetic architecture of complex traits.


Subject(s)
Black People/genetics; Black or African American/genetics; Epistasis, Genetic/genetics; Gene-Environment Interaction; Hispanic or Latino/genetics; Models, Genetic; White People/genetics; DNA Methylation; Humans; Phenotype
20.
BMC Genomics ; 20(Suppl 5): 423, 2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31167634

ABSTRACT

BACKGROUND: High throughput sequencing has spurred the development of metagenomics, which involves the direct analysis of microbial communities in various environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting to study other key microbes such as viruses and eukaryotes. RESULTS: Here we present a method, MiCoP (Microbiome Community Profiling), that uses fast-mapping of reads to build a comprehensive reference database of full genomes from viruses and eukaryotes to achieve maximum read usage and enable the analysis of the virome and eukaryome in each sample. We demonstrate that mapping of metagenomic reads is feasible for the smaller viral and eukaryotic reference databases. We show that our method is accurate on simulated and mock community data and identifies many more viral and fungal species than previously-reported results on real data from the Human Microbiome Project. CONCLUSIONS: MiCoP is a mapping-based method that proves more effective than existing methods at abundance profiling of viruses and eukaryotes in metagenomic samples. MiCoP can be used to detect the full diversity of these communities. The code, data, and documentation are publicly available on GitHub at: https://github.com/smangul1/MiCoP .


Subject(s)
Computational Biology/methods; Fungi/genetics; Genetic Markers; Metagenomics/methods; Microbiota; Sequence Analysis, DNA/methods; Viruses/genetics; Algorithms; Fungi/classification; Genome, Fungal; Genome, Viral; High-Throughput Nucleotide Sequencing/methods; Humans; Viruses/classification
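
The abstract above describes abundance profiling based on mapping reads to full reference genomes. A generic sketch of how mapped-read counts can be turned into relative abundances follows. This is not MiCoP's code: normalizing read counts by genome length is a common convention assumed here for illustration, and the function name, genome labels, and lengths are hypothetical.

```python
from collections import Counter


def relative_abundance(read_assignments, genome_lengths):
    """Estimate relative abundances from per-read genome assignments.

    read_assignments: iterable of genome names, one entry per mapped read
    genome_lengths:   dict mapping genome name to its length in base pairs
    Returns a dict of genome name to abundance; the values sum to 1.
    """
    counts = Counter(read_assignments)
    coverage = {}
    for genome, n_reads in counts.items():
        if genome not in genome_lengths:
            raise KeyError(f"no length recorded for genome {genome!r}")
        # Divide by length so long genomes are not over-represented
        # simply because they attract more reads.
        coverage[genome] = n_reads / genome_lengths[genome]

    total = sum(coverage.values())
    if total == 0:
        return {}
    return {genome: value / total for genome, value in coverage.items()}


if __name__ == "__main__":
    # Hypothetical example: equal read counts, tenfold length difference.
    reads = ["virusA"] * 10 + ["fungusB"] * 10
    lengths = {"virusA": 1_000, "fungusB": 10_000}
    for genome, share in sorted(relative_abundance(reads, lengths).items()):
        print(f"{genome}: {share:.3f}")
```

In this sketch each read counts toward exactly one genome. Handling reads that map to several genomes would need an extra assignment step that is not shown.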