2.
Genome Med ; 14(1): 104, 2022 Sep 09.
Article in English | MEDLINE | ID: mdl-36085083

ABSTRACT

BACKGROUND: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736). METHODS: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. RESULTS: We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value=2.32×10^-16, EAA p-value=6.73×10^-11). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. CONCLUSIONS: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.


Subject(s)
Electronic Health Records , Public Health , Asian People , Biological Specimen Banks , Genomics , Humans
3.
PLoS Genet ; 17(9): e1009772, 2021 09.
Article in English | MEDLINE | ID: mdl-34516545

ABSTRACT

Late-onset Alzheimer's disease (LOAD) is the most common type of dementia causing irreversible brain damage to the elderly and presents a major public health challenge. Clinical research and genome-wide association studies have suggested a potential contribution of the endocytic pathway to AD, with an emphasis on common loci. However, the contribution of rare variants in this pathway to AD has not been thoroughly investigated. In this study, we focused on the effect of rare variants on AD by first applying a rare-variant gene-set burden analysis using genes in the endocytic pathway on over 3,000 individuals with European ancestry from three large whole-genome sequencing (WGS) studies. We identified significant associations of rare-variant burden within the endocytic pathway with AD, which were successfully replicated in independent datasets. We further demonstrated that this endocytic rare-variant enrichment is associated with neurofibrillary tangles (NFTs) and age-related phenotypes, increasing the risk of more severe brain damage, earlier age-at-onset, and earlier age-of-death. Next, by aggregating rare variants within each gene, we sought to identify single endocytic genes associated with AD and NFTs. Careful examination using NFTs revealed one significantly associated gene, ANKRD13D. To identify functional associations, we integrated bulk RNA-Seq data from over 600 brain tissues and found two endocytic expression genes (eGenes), HLA-A and SLC26A7, that displayed significant influences on their gene expression. Differential expression of these three identified genes between AD patients and controls was further examined by incorporating scRNA-Seq data from 48 post-mortem brain samples, which demonstrated distinct expression patterns across cell types. Taken together, our results demonstrated a strong rare-variant effect in the endocytic pathway on AD risk and progression and a functional effect of gene expression alteration at both bulk and single-cell resolution, which may bring more insight and serve as valuable resources for future AD genetic studies, clinical research, and therapeutic targeting.


Subject(s)
Alzheimer Disease/pathology , Endocytosis , Phenotype , Alzheimer Disease/genetics , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide , Whole Genome Sequencing
4.
Nat Commun ; 11(1): 2798, 2020 06 03.
Article in English | MEDLINE | ID: mdl-32493925

ABSTRACT

Mediator 12 (MED12) and MED13 are components of the Mediator multi-protein complex, which facilitates the initial steps of gene transcription. Here, in an Arabidopsis mutant screen, we identify MED12 and MED13 as positive gene regulators, both of which contribute broadly to morc1 de-repressed gene expression. Both MED12 and MED13 are preferentially required for the expression of genes depleted in active chromatin marks, a chromatin signature shared with morc1 re-activated loci. We further discover that MED12 tends to interact with genes that are responsive to environmental stimuli, including light and radiation. We demonstrate that light-induced transient gene expression depends on MED12, and is accompanied by a concomitant increase in MED12 enrichment during induction. In contrast, the steady-state expression level of these genes shows little dependence on MED12, suggesting that MED12 is primarily required to aid the expression of genes in transition from less-active to more-active states.


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Gene Expression Regulation, Plant , Repressor Proteins/metabolism , Arabidopsis/radiation effects , Arabidopsis Proteins/genetics , Chromatin/metabolism , DNA Methylation/genetics , DNA Methylation/radiation effects , Epigenesis, Genetic/radiation effects , Gene Expression Regulation, Plant/radiation effects , Genes, Plant , Genes, Suppressor , Genetic Loci , Green Fluorescent Proteins/metabolism , Light , Plants, Genetically Modified , Repressor Proteins/genetics , Up-Regulation/genetics , Up-Regulation/radiation effects
5.
PLoS Comput Biol ; 15(12): e1007556, 2019 12.
Article in English | MEDLINE | ID: mdl-31851693

ABSTRACT

Next-generation sequencing (NGS) technology enables the discovery of nearly all genetic variants present in a genome. A subset of these variants, however, may have poor sequencing quality due to limitations in NGS or variant callers. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove those variants with poor quality as they may cause spurious findings. In this paper, we present ForestQC, a statistical tool for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach. Our software uses information on sequencing quality, such as sequencing depth, genotyping quality, and GC content, to predict whether a particular variant is likely to be a false positive. To evaluate ForestQC, we applied it to two whole-genome sequencing datasets, one consisting of related individuals from families and the other of unrelated individuals. Results indicate that ForestQC outperforms widely used methods for performing quality control on variants, such as VQSR of GATK, by considerably improving the quality of variants to be included in the analysis. ForestQC is also very efficient, and hence can be applied to large sequencing datasets. We conclude that combining a machine learning algorithm trained with sequencing quality information and the filtering approach is a practical approach to perform quality control on genetic variants from sequencing data.


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/statistics & numerical data , Software , Algorithms , Computational Biology , Databases, Genetic/statistics & numerical data , High-Throughput Nucleotide Sequencing/standards , Humans , Machine Learning , Polymorphism, Single Nucleotide , Quality Control , Whole Genome Sequencing/standards , Whole Genome Sequencing/statistics & numerical data
6.
Proc Natl Acad Sci U S A ; 115(5): E1069-E1074, 2018 01 30.
Article in English | MEDLINE | ID: mdl-29339507

ABSTRACT

Genome-wide characterization by next-generation sequencing has greatly improved our understanding of the landscape of epigenetic modifications. Since 2008, whole-genome bisulfite sequencing (WGBS) has become the gold standard for DNA methylation analysis, and a tremendous amount of WGBS data has been generated by the research community. However, the systematic comparison of DNA methylation profiles to identify regulatory mechanisms has yet to be fully explored. Here we reprocessed the raw data of over 500 publicly available Arabidopsis WGBS libraries from various mutant backgrounds, tissue types, and stress treatments and also filtered them based on sequencing depth and efficiency of bisulfite conversion. This enabled us to identify high-confidence differentially methylated regions (hcDMRs) by comparing each test library to over 50 high-quality wild-type controls. We developed statistical and quantitative measurements to analyze the overlapping of DMRs and to cluster libraries based on their effect on DNA methylation. In addition to confirming existing relationships, we revealed unanticipated connections between well-known genes. For instance, MET1 and CMT3 were found to be required for the maintenance of asymmetric CHH methylation at nonoverlapping regions of CMT2 targeted heterochromatin. Our comparative methylome approach has established a framework for extracting biological insights via large-scale comparison of methylomes and can also be adopted for other genomics datasets.


Subject(s)
Arabidopsis/genetics , DNA Methylation , Epigenomics , Gene Expression Regulation, Plant , Cluster Analysis , Computational Biology , CpG Islands , Epigenesis, Genetic , Gene Library , Genome, Plant , Heterochromatin/chemistry , High-Throughput Nucleotide Sequencing , Plants, Genetically Modified , Sequence Analysis, DNA , Sequence Analysis, RNA , Software