Results 1 - 14 of 14
1.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38490256

ABSTRACT

SUMMARY: Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic studies of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations. AVAILABILITY AND IMPLEMENTATION: Admix-kit package is open-source and available at https://github.com/KangchengHou/admix-kit. Additionally, users can use the pipeline designed for admixed genotype simulation available at https://github.com/UW-GAC/admix-kit_workflow.


Subjects
Software; Genotype; Phenotype
2.
Pac Symp Biocomput ; 29: 322-326, 2024.
Article in English | MEDLINE | ID: mdl-38160289

ABSTRACT

The following sections are included: Overview; Dealing with the lack of diversity in current research datasets; Development of fair machine learning algorithms; Race, genetic ancestry, and population structure; Conclusion; Acknowledgments.


Subjects
Computational Biology; Precision Medicine; Humans; Machine Learning; Health Inequities
3.
bioRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873338

ABSTRACT

Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic study of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations.

4.
Nature ; 618(7966): 774-781, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37198491

ABSTRACT

Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries shows similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.
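
The two summary statistics quoted in this abstract, a Pearson correlation between GD and PGS accuracy and a relative accuracy drop between the closest and furthest GD deciles, can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names and the toy per-bin accuracies are hypothetical and are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def relative_drop(closest, furthest):
    """Fractional loss of accuracy in the furthest bin relative to the closest bin."""
    return (closest - furthest) / closest


# Hypothetical toy values (NOT from the paper): accuracy per genetic-distance bin,
# ordered from the bin closest to the training data to the furthest one.
gd_bins = [1, 2, 3, 4, 5]
toy_accuracy = [0.50, 0.48, 0.47, 0.45, 0.43]

toy_r = pearson(gd_bins, toy_accuracy)
toy_drop = relative_drop(toy_accuracy[0], toy_accuracy[-1])
print(f"correlation between GD bin and accuracy: {toy_r:.3f}")
print(f"relative accuracy drop, closest vs furthest bin: {toy_drop:.0%}")
```

A strongly negative correlation means accuracy falls steadily as GD grows; the relative drop expresses the same trend as a percentage of the closest-bin accuracy.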


Subjects
Multifactorial Inheritance; Racial Groups; Humans; Europe/ethnology; Hispanic or Latino/genetics; Multifactorial Inheritance/genetics; Racial Groups/genetics; United Kingdom; White People/genetics; European People/genetics; Los Angeles; Databases, Genetic
5.
Pac Symp Biocomput ; 28: 181-185, 2023.
Article in English | MEDLINE | ID: mdl-36540975

ABSTRACT

The following sections are included: Overview; Equitable risk prediction; Pharmacoequity; Race, genetic ancestry, and population structure; Conclusion; Acknowledgments; References.


Subjects
Computational Biology; Precision Medicine; Humans
6.
Genome Med ; 14(1): 104, 2022 Sep 09.
Article in English | MEDLINE | ID: mdl-36085083

ABSTRACT

BACKGROUND: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736). METHODS: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. RESULTS: We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value=2.32×10⁻¹⁶, EAA p-value=6.73×10⁻¹¹). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. CONCLUSIONS: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.


Subjects
Electronic Health Records; Public Health; Asian People; Biological Specimen Banks; Genomics; Humans
7.
iScience ; 24(3): 102188, 2021 Mar 19.
Article in English | MEDLINE | ID: mdl-33615196

ABSTRACT

Coronavirus disease 2019 (COVID-19) has exposed health care disparities in minority groups including Hispanics/Latinxs (HL). Studies of COVID-19 risk factors for HL have relied on county-level data. We investigated COVID-19 risk factors in HL using individual-level, electronic health records in a Los Angeles health system between March 9, 2020, and August 31, 2020. Of 9,287 HL tested for SARS-CoV-2, 562 were positive. HL constituted an increasing percentage of all COVID-19 positive individuals as disease severity escalated. Multiple risk factors identified in Non-Hispanic/Latinx whites (NHL-W), like renal disease, also conveyed risk in HL. Pre-existing nonrheumatic mitral valve disorder was a risk factor for HL hospitalization but not for NHL-W COVID-19 or HL influenza hospitalization, suggesting it may be a specific HL COVID-19 risk. Admission laboratory values also suggested that HL presented with a greater inflammatory response. COVID-19 risk factors for HL can help guide equitable government policies and identify at-risk populations.

8.
medRxiv ; 2020 Jul 09.
Article in English | MEDLINE | ID: mdl-32637977

ABSTRACT

With the continuing coronavirus disease 2019 (COVID-19) pandemic coupled with phased reopening, it is critical to identify risk factors associated with susceptibility and severity of disease in a diverse population to help shape government policies, guide clinical decision making, and prioritize future COVID-19 research. In this retrospective case-control study, we used de-identified electronic health records (EHR) from the University of California Los Angeles (UCLA) Health System between March 9th, 2020 and June 14th, 2020 to identify risk factors for COVID-19 susceptibility (severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) PCR test positive), inpatient admission, and severe outcomes (treatment in an intensive care unit or intubation). Of the 26,602 individuals tested by PCR for SARS-CoV-2, 992 were COVID-19 positive (3.7% of Tested), 220 were admitted to the hospital (22% of COVID-19 positive), and 77 had a severe outcome (35% of Inpatient). Consistent with previous studies, males and individuals older than 65 years had increased risk of inpatient admission. Notably, individuals self-identifying as Hispanic or Latino constituted an increasing percentage of COVID-19 patients as disease severity escalated, comprising 24% of those testing positive, but 40% of those with a severe outcome, a disparity that remained after correcting for medical comorbidities. Cardiovascular disease, hypertension, and renal disease were premorbid risk factors present before SARS-CoV-2 PCR testing associated with COVID-19 susceptibility. Less well-established risk factors for COVID-19 susceptibility included pre-existing dementia (odds ratio (OR) 5.2 [3.2-8.3], p=2.6×10⁻¹⁰), mental health conditions (depression OR 2.1 [1.6-2.8], p=1.1×10⁻⁶) and vitamin D deficiency (OR 1.8 [1.4-2.2], p=5.7×10⁻⁶). Renal diseases including end-stage renal disease and anemia due to chronic renal disease were the predominant premorbid risk factors for COVID-19 inpatient admission. Other less established risk factors for COVID-19 inpatient admission included previous renal transplant (OR 9.7 [2.8-39], p=3.2×10⁻⁴) and disorders of the immune system (OR 6.0 [2.3, 16], p=2.7×10⁻⁴). Prior use of oral steroid medications was associated with decreased COVID-19 positive testing risk (OR 0.61 [0.45, 0.81], p=4.3×10⁻⁴), but increased inpatient admission risk (OR 4.5 [2.3, 8.9], p=1.8×10⁻⁵). We did not observe that prior use of angiotensin converting enzyme inhibitors or angiotensin receptor blockers increased the risk of testing positive for SARS-CoV-2, being admitted to the hospital, or having a severe outcome. This study involving direct EHR extraction identified known and less well-established demographics, and prior diagnoses and medications as risk factors for COVID-19 susceptibility and inpatient admission. Knowledge of these risk factors including marked ethnic disparities observed in disease severity should guide government policies, identify at-risk populations, inform clinical decision making, and prioritize future COVID-19 research.
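
The cascade of percentages reported in this abstract (tested, positive, admitted, severe) can be re-derived from the stated counts. The short check below uses only the four counts given in the abstract; the variable and function names are illustrative.

```python
# Counts as stated in the abstract.
tested = 26_602    # individuals tested by PCR for SARS-CoV-2
positive = 992     # COVID-19 positive
admitted = 220     # admitted to the hospital
severe = 77        # severe outcome (intensive care unit or intubation)


def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Each stage is expressed relative to the previous stage, as in the abstract.
rates = {
    "positive_of_tested": pct(positive, tested),
    "admitted_of_positive": pct(admitted, positive),
    "severe_of_admitted": pct(severe, admitted),
}

for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```

Each denominator is the preceding stage rather than the full tested cohort, which is why the percentages grow even as the counts shrink.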

9.
J Comput Biol ; 22(5): 451-62, 2015 May.
Article in English | MEDLINE | ID: mdl-25526526

ABSTRACT

Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy as well as to a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data.
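
The difference between a uniform and a location-weighted copying prior can be sketched with a small forward algorithm for a haplotype copy hidden Markov model. This is an illustrative sketch under stated assumptions, not the model from the paper: the exponential distance weighting, the parameter names (`rho` for the per-site switch probability, `eps` for the mismatch probability, `scale` for the distance decay) and the toy panel are all hypothetical.

```python
import math


def copy_prior(distances, scale=None):
    """Prior probability of copying from each reference haplotype.

    scale=None gives the standard uniform prior. Otherwise weights decay as
    exp(-distance / scale); this weighting is an assumption made for this sketch.
    """
    if scale is None:
        return [1.0 / len(distances)] * len(distances)
    weights = [math.exp(-d / scale) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def forward_likelihood(target, panel, prior, rho=0.1, eps=0.01):
    """Likelihood of `target` (list of 0/1 alleles) as a mosaic of `panel` haplotypes."""
    n_haps = len(panel)

    def emit(k, j):
        # Copied allele is observed faithfully with probability 1 - eps.
        return 1.0 - eps if panel[k][j] == target[j] else eps

    f = [prior[k] * emit(k, 0) for k in range(n_haps)]
    for j in range(1, len(target)):
        total = sum(f)
        # Stay on the same haplotype with probability 1 - rho, otherwise
        # switch to a haplotype drawn from the prior.
        f = [
            emit(k, j) * ((1.0 - rho) * f[k] + rho * prior[k] * total)
            for k in range(n_haps)
        ]
    return sum(f)


# Toy panel: haplotype 0 is "near" the target individual, haplotype 1 is "far".
panel = [[0, 0, 0, 0], [1, 1, 1, 1]]
target = [0, 0, 0, 0]
uniform = copy_prior([0.0, 5.0])
spatial = copy_prior([0.0, 5.0], scale=1.0)
lik_uniform = forward_likelihood(target, panel, uniform)
lik_spatial = forward_likelihood(target, panel, spatial)
print(lik_uniform, lik_spatial)
```

In this toy case the target matches the nearby haplotype, so the distance-weighted prior assigns it a higher likelihood than the uniform prior does.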


Subjects
Algorithms; Genome, Human; Haplotypes; Models, Genetic; Racial Groups/genetics; Chromosomes, Human, Pair 22; Genetic Variation; Genetics, Population; Genome-Wide Association Study; Geography; Humans; Linkage Disequilibrium; Markov Chains; Polymorphism, Single Nucleotide
10.
G3 (Bethesda) ; 4(12): 2505-18, 2014 Nov 03.
Article in English | MEDLINE | ID: mdl-25371484

ABSTRACT

Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such explicit models yield superior accuracy in ancestry inference over nonmodel-based methods. Here we extend such work to introduce a method that models admixture between ancestors from multiple sources across a geographic continuum. We devise efficient algorithms based on hidden Markov models to localize on a map the recent ancestors (e.g., grandparents) of admixed individuals, joint with assigning ancestry at each locus in the genome. We validate our methods by using empirical data from individuals with mixed European ancestry from the Population Reference Sample study and show that our approach is able to localize their recent ancestors within an average of 470 km of the reported locations of their grandparents. Furthermore, simulations from real Population Reference Sample genotype data show that our method attains high accuracy in localizing recent ancestors of admixed individuals in Europe (an average of 550 km from their true location for localization of two ancestries in Europe, four generations ago). We explore the limits of ancestry localization under our approach and find that performance decreases as the number of distinct ancestries and generations since admixture increases. Finally, we build a map of expected localization accuracy across admixed individuals according to the location of origin within Europe of their ancestors.
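
Localization accuracy in this abstract is reported as an average distance in kilometres between inferred and reported locations. One common way to compute such an error is the great-circle (haversine) distance, sketched below. The abstract does not state which distance measure was used, so the haversine formula, the function names and the example coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_localization_error_km(predicted, reported):
    """Average distance between predicted and reported (lat, lon) pairs."""
    if len(predicted) != len(reported) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    distances = [
        haversine_km(p[0], p[1], r[0], r[1]) for p, r in zip(predicted, reported)
    ]
    return sum(distances) / len(distances)


# Illustrative coordinates only (approximate city centres, not study data).
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
print(f"Paris to London: {haversine_km(*paris, *london):.0f} km")
```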


Subjects
Models, Genetic; Algorithms; Diploidy; Gene Frequency; Genetic Loci; Genetic Variation; Genome, Human; Haplotypes; Humans; Linkage Disequilibrium; Markov Chains; White People/genetics
11.
Bioinformatics ; 28(10): 1359-67, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22495753

ABSTRACT

MOTIVATION: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). RESULTS: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.


Subjects
Algorithms; Genetics, Population; Hispanic or Latino/genetics; Gene Flow; Genetics, Population/methods; Haplotypes; Humans; Indians, North American/genetics; Linkage Disequilibrium; Markov Chains; Mexico; Puerto Rico; United States; White People/genetics
12.
PLoS Genet ; 7(4): e1001371, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21541012

ABSTRACT

While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.


Subjects
Black or African American/genetics; Breast Neoplasms/genetics; Genome, Human; Genome-Wide Association Study/methods; Receptor, Fibroblast Growth Factor, Type 2/genetics; Black or African American/statistics & numerical data; Algorithms; Chromosome Mapping; Coronary Disease/genetics; Diabetes Mellitus, Type 2/genetics; Female; Gene Frequency; Genetic Variation; Genetics, Population/statistics & numerical data; Genome-Wide Association Study/statistics & numerical data; Genotype; Humans; Linkage Disequilibrium; Male; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Principal Component Analysis; Software
13.
J Comput Biol ; 18(3): 459-68, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21385047

ABSTRACT

Next generation high-throughput sequencing (NGS) is poised to replace array-based technologies as the experiment of choice for measuring RNA expression levels. Several groups have demonstrated the power of this new approach (RNA-seq), making significant and novel contributions and simultaneously proposing methodologies for the analysis of RNA-seq data. In a typical experiment, millions of short sequences (reads) are sampled from RNA extracts and mapped back to a reference genome. The number of reads mapping to each gene is used as a proxy for its corresponding RNA concentration. A significant challenge in analyzing RNA expression of homologous genes is the large fraction of the reads that map to multiple locations in the reference genome. Currently, these reads are either dropped from the analysis, or a naive algorithm is used to estimate their underlying distribution. In this work, we present a rigorous alternative for handling the reads generated in an RNA-seq experiment within a probabilistic model for RNA-seq data; we develop maximum likelihood-based methods for estimating the model parameters. In contrast to previous methods, our model takes into account the fact that the DNA of the sequenced individual is not a perfect copy of the reference sequence. We show with both simulated and real RNA-seq data that our new method improves the accuracy and power of RNA-seq experiments.
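
The general idea of estimating expression by maximum likelihood when some reads map to several genes can be sketched with a generic expectation-maximization (EM) loop. This is a textbook-style illustration, not the model from the paper: it ignores gene length and the sequence differences between the sequenced individual and the reference that the abstract highlights, and the function name and toy reads are hypothetical.

```python
def em_multiread(read_hits, genes, n_iter=100):
    """Estimate the fraction of reads originating from each gene.

    read_hits: list of sets, each holding the genes one read maps to.
    Returns a dict gene -> estimated proportion (sums to 1).
    """
    theta = {g: 1.0 / len(genes) for g in genes}
    n_reads = len(read_hits)
    for _ in range(n_iter):
        expected = {g: 0.0 for g in genes}
        # E-step: split each read across its candidate genes in proportion
        # to the current abundance estimates.
        for hits in read_hits:
            total = sum(theta[g] for g in hits)
            for g in hits:
                expected[g] += theta[g] / total
        # M-step: the new abundance is the expected share of reads.
        theta = {g: expected[g] / n_reads for g in genes}
    return theta


# Toy data: 3 reads unique to gene A, 1 unique to gene B, 4 mapping to both.
toy_reads = [{"A"}] * 3 + [{"B"}] * 1 + [{"A", "B"}] * 4
toy_theta = em_multiread(toy_reads, ["A", "B"])
print(toy_theta)
```

Dropping the four ambiguous reads would discard half of the data, whereas the EM loop assigns them fractionally, guided by the uniquely mapped reads.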


Subjects
Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Algorithms; Gene Expression Profiling/economics; High-Throughput Nucleotide Sequencing/economics; Humans; Models, Statistical; RNA/genetics
14.
J Comput Biol ; 15(9): 1155-71, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18973433

ABSTRACT

The presence of genotyping errors can invalidate statistical tests for linkage and disease association, particularly for methods based on haplotype analysis. Becker et al. have recently proposed a simple likelihood ratio approach for detecting errors in trio genotype data. Under this approach, a SNP genotype is flagged as a potential error if the likelihood associated with the original trio genotype data increases by a multiplicative factor exceeding a user-selected threshold when the SNP genotype under test is deleted. In this article we give improved error detection methods using the likelihood ratio test approach in conjunction with likelihood functions that can be efficiently computed based on a hidden Markov model of haplotype diversity in the population under study. Experimental results on both simulated and real datasets show that the proposed methods have highly scalable running time and achieve significantly improved detection accuracy compared to previous methods.
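
The flagging rule described in this abstract (delete one genotype, recompute the likelihood, flag the genotype if the likelihood grows by more than a chosen factor) can be sketched as below. This is a simplified illustration, not the published method: it handles a single genotype vector rather than a trio, and the toy likelihood treats sites as independent instead of using a hidden Markov model of haplotype diversity. All names and frequencies are hypothetical.

```python
def flag_errors(genotypes, likelihood, threshold):
    """Return indices of genotypes whose deletion raises the likelihood
    by a multiplicative factor greater than `threshold`.

    `likelihood` is any function mapping a genotype list (None = deleted)
    to a probability.
    """
    base = likelihood(genotypes)
    flagged = []
    for i, g in enumerate(genotypes):
        if g is None:
            continue  # already missing, nothing to test
        masked = list(genotypes)
        masked[i] = None
        if likelihood(masked) > threshold * base:
            flagged.append(i)
    return flagged


def make_toy_likelihood(site_probs):
    """Toy likelihood (assumption for this sketch): sites are independent and
    a deleted genotype is marginalized out, contributing a factor of 1."""
    def likelihood(genotypes):
        value = 1.0
        for probs, g in zip(site_probs, genotypes):
            if g is not None:
                value *= probs[g]
        return value
    return likelihood


# Hypothetical genotype probabilities per site (0, 1 or 2 minor alleles).
toy_site_probs = [
    {0: 0.9, 1: 0.09, 2: 0.01},
    {0: 0.9, 1: 0.09, 2: 0.01},
    {0: 0.989, 1: 0.01, 2: 0.001},
]
toy_genotypes = [0, 0, 2]  # the last genotype is very unlikely under the toy model
toy_flags = flag_errors(toy_genotypes, make_toy_likelihood(toy_site_probs), threshold=100)
print(toy_flags)
```

Because the likelihood function is passed in as a parameter, the same loop could be paired with a richer model; the threshold trades sensitivity against false flags.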


Subjects
Genetic Variation; Likelihood Functions; Markov Chains; Algorithms; Computer Simulation; Genetic Linkage; Genotype; Haplotypes; Humans; Models, Genetic