Results 1 - 20 of 60
1.
Nat Rev Genet ; 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714860
2.
Nat Biotechnol ; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38671154

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.

3.
bioRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38328152

ABSTRACT

Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve TR analysis, especially for long or complex repeats. Here we introduce LongTR, which accurately genotypes tandem repeats from high fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr.

4.
bioRxiv ; 2023 Nov 23.
Article in English | MEDLINE | ID: mdl-38045311

ABSTRACT

Motivation: Somatic mosaicism, in which a mutation occurs post-zygotically, has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise more than 1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs (mSTRs) is lacking. Results: We introduce prancSTR, a novel method for detecting mSTRs from individual high-throughput sequencing datasets. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mSTRs in simulated data and demonstrate its feasibility by identifying candidate mSTRs in whole genome sequencing (WGS) data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project. Our analysis identified an average of 76 non-homopolymer and 577 homopolymer mSTRs per cell line, as well as multiple cell lines with outlier mSTR counts more than 6 times the population average, suggesting that a subset of cell lines have particularly high STR instability rates. Availability: prancSTR is freely available at https://github.com/gymrek-lab/trtools. Documentation: Detailed documentation is available at https://trtools.readthedocs.io/.

5.
Cell Genom ; 3(12): 100458, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38116119

ABSTRACT

Short tandem repeats (STRs) are genomic regions consisting of repeated sequences of 1-6 bp in succession. Single-nucleotide polymorphism (SNP)-based genome-wide association studies (GWASs) do not fully capture STR effects. To study these effects, we imputed 445,720 STRs into genotype arrays from 408,153 White British UK Biobank participants and tested for association with 44 blood phenotypes. Using two fine-mapping methods, we identify 119 candidate causal STR-trait associations and estimate that STRs account for 5.2%-7.6% of causal variants identifiable from GWASs for these traits. These are among the strongest associations for multiple phenotypes, including a coding CTG repeat associated with apolipoprotein B levels, a promoter CGG repeat with platelet traits, and an intronic poly(A) repeat with mean platelet volume. Our study suggests that STRs make widespread contributions to complex traits, provides stringently selected candidate causal STRs, and demonstrates the need to consider a more complete view of genetic variation in GWASs.
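The definition used in this abstract (a motif of 1-6 bp repeated in succession) can be illustrated with a minimal Python sketch. This is not the genotyping or imputation method of any listed paper; the function name `find_strs` and the `min_copies` threshold are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re


def find_strs(seq, min_copies=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats.

    Hypothetical helper for illustration only: a motif of 1..max_motif bp
    must occur at least min_copies times back to back. The lazy quantifier
    makes the regex prefer the shortest motif at each start position, so
    "ATATAT" is reported as motif "AT" rather than "ATA..." variants.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # A CAG trinucleotide repeat flanked by non-repetitive bases.
    print(find_strs("TTCAGCAGCAGCAGTT"))
```

Real STR callers must also handle interrupted repeats, sequencing stutter, and alignment ambiguity, none of which this sketch attempts.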

6.
bioRxiv ; 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37961319

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.

7.
Nat Commun ; 14(1): 6711, 2023 Oct 23.
Article in English | MEDLINE | ID: mdl-37872149

ABSTRACT

Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.


Subject(s)
Single Nucleotide Polymorphism , Tandem Repeat Sequences , Humans , Genotype , Whole Genome Sequencing
8.
J Mol Biol ; 435(20): 168260, 2023 Oct 15.
Article in English | MEDLINE | ID: mdl-37678708

ABSTRACT

Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently STRs have been poorly studied since they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state of the art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.


Subject(s)
Genetic Databases , Genetic Variation , Human Genome , Microsatellite Repeats , Humans , Computational Biology , Genotype , Microsatellite Repeats/genetics , Genome-Wide Association Study , Datasets as Topic , Colorectal Neoplasms/genetics
9.
Nature ; 617(7960): 256-258, 2023 May.
Article in English | MEDLINE | ID: mdl-37165235

Subject(s)
Genome , Genomics , Humans
10.
Genome Res ; 33(5): 689-702, 2023 May.
Article in English | MEDLINE | ID: mdl-37127331

ABSTRACT

Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1-6 bp. We leveraged whole-genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and performed quantitative trait locus (QTL) analyses for each of these phenotypes. We identified a locus on Chromosome 13 at which strains inheriting the C57BL/6J (B) haplotype have a higher rate of STR expansions than those inheriting the DBA/2J (D) haplotype. The strongest candidate gene in this locus is Msh3, a known modifier of STR stability in cancer and at pathogenic repeat expansions in mice and humans, as well as a current drug target against Huntington's disease. The D haplotype at this locus harbors a cluster of variants near the 5' end of Msh3, including multiple missense variants near the DNA mismatch recognition domain. In contrast, the B haplotype contains a unique retrotransposon insertion. The rate of expansion covaries positively with Msh3 expression, with higher expression from the B haplotype. Finally, detailed analysis of mutation patterns showed that strains carrying the B allele have higher expansion rates, but slightly lower overall total mutation rates, compared with those with the D allele, particularly at tetranucleotide repeats. Our results suggest an important role for inherited variants in Msh3 in modulating genome-wide patterns of germline mutations at STRs.


Subject(s)
Microsatellite Repeats , Quantitative Trait Loci , Animals , Mice , Haplotypes , Inbred C57BL Mice , Inbred DBA Mice
11.
bioRxiv ; 2023 Mar 12.
Article in English | MEDLINE | ID: mdl-36945429

ABSTRACT

Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3,550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.

12.
Bioinformatics ; 39(3), 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36847450

ABSTRACT

SUMMARY: Leveraging local ancestry and haplotype information in genome-wide association studies and downstream analyses can improve the utility of genomics for individuals from diverse and recently admixed ancestries. However, most existing simulation, visualization and variant analysis frameworks are based on variant-level analysis and do not automatically handle these features. We present haptools, an open-source toolkit for performing local ancestry aware and haplotype-based analysis of complex traits. Haptools supports fast simulation of admixed genomes, visualization of admixture tracks, simulation of haplotype- and local ancestry-specific phenotype effects and a variety of file operations and statistics computed in a haplotype-aware manner. AVAILABILITY AND IMPLEMENTATION: Haptools is freely available at https://github.com/cast-genomics/haptools. DOCUMENTATION: Detailed documentation is available at https://haptools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Haplotypes , Genomics , Genome
13.
Bioinform Adv ; 3(1): vbad002, 2023.
Article in English | MEDLINE | ID: mdl-36726730

ABSTRACT

Motivation: Previous studies have shown that the heritability of multiple brain-related traits and disorders is highly enriched in transcriptional enhancer regions. However, these regions often contain many individual variants, while only a subset of them are likely to causally contribute to a trait. Statistical fine-mapping techniques can identify putative causal variants, but their resolution is often limited, especially in regions with multiple variants in high linkage disequilibrium. In these cases, alternative computational methods to estimate the impact of individual variants can aid in variant prioritization. Results: Here, we develop a deep learning pipeline to predict cell-type-specific enhancer activity directly from genomic sequences and quantify the impact of individual genetic variants in these regions. We show that the variants highlighted by our deep learning models are targeted by purifying selection in the human population, likely indicating a functional role. We integrate our deep learning predictions with statistical fine-mapping results for 8 brain-related traits, identifying 63 distinct candidate causal variants predicted to contribute to these traits by modulating enhancer activity, representing 6% of all genome-wide association study signals analyzed. Overall, our study provides a valuable computational method that can prioritize individual variants based on their estimated regulatory impact, but also highlights the limitations of existing methods for variant prioritization and fine-mapping. Availability and implementation: The data underlying this article, nucleotide-level importance scores, and code for running the deep learning pipeline are available at https://github.com/Pandaman-Ryan/AgentBind-brain. Contact: mgymrek@ucsd.edu. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

14.
J Evol Biol ; 36(2): 321-336, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36289560

ABSTRACT

Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.


Subject(s)
Genome , Microsatellite Repeats , Mutation , Genotype , Phenotype
15.
Cell Rep ; 41(10): 111761, 2022 Dec 06.
Article in English | MEDLINE | ID: mdl-36476851

ABSTRACT

Ewing sarcoma (EwS) is characterized by EWSR1-ETS fusion transcription factors converting polymorphic GGAA microsatellites (mSats) into potent neo-enhancers. Although the paucity of additional mutations makes EwS a genuine model to study principles of cooperation between dominant fusion oncogenes and neo-enhancers, this is impeded by the limited number of well-characterized models. Here we present the Ewing Sarcoma Cell Line Atlas (ESCLA), comprising whole-genome, DNA methylation, transcriptome, proteome, and chromatin immunoprecipitation sequencing (ChIP-seq) data of 18 cell lines with inducible EWSR1-ETS knockdown. The ESCLA shows hundreds of EWSR1-ETS-targets, the nature of EWSR1-ETS-preferred GGAA mSats, and putative indirect modes of EWSR1-ETS-mediated gene regulation, converging in the duality of a specific but plastic EwS signature. We identify heterogeneously regulated EWSR1-ETS-targets as potential prognostic EwS biomarkers. Our freely available ESCLA (http://r2platform.com/escla/) is a rich resource for EwS research and highlights the power of comprehensive datasets to unravel principles of heterogeneous gene regulation by chimeric transcription factors.


Subject(s)
Ewing Sarcoma , Humans , Ewing Sarcoma/genetics , Multiomics , Oncogenes , Cell Line , Transcription Factors
16.
Cell Genom ; 2(3), 2022 Mar 09.
Article in English | MEDLINE | ID: mdl-35720252

ABSTRACT

Mouse substrains are an invaluable model for understanding disease. We compared C57BL/6J, which is the most commonly used inbred mouse strain, with eight C57BL/6 and five C57BL/10 closely related inbred substrains. Whole-genome sequencing and RNA-sequencing analysis yielded 352,631 SNPs, 109,096 indels, 150,344 short tandem repeats (STRs), 3,425 structural variants (SVs), and 2,826 differentially expressed genes (DE genes) among these 14 strains; 312,981 SNPs (89%) distinguished the B6 and B10 lineages. These SNPs were clustered into 28 short segments that are likely due to introgressed haplotypes rather than new mutations. Outside of these introgressed regions, we identified 53 SVs, protein-truncating SNPs, and frameshifting indels that were associated with DE genes. Our results can be used for both forward and reverse genetic approaches and illustrate how introgression and mutational processes give rise to differences among these widely used inbred substrains.

17.
Reprod Sci ; 29(12): 3465-3476, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35697922

ABSTRACT

Racial disparity exists for hypertensive disorders in pregnancy (HDP), which leads to disparate morbidity and mortality worldwide. The enzyme heme oxygenase-1 (HO-1) is encoded by HMOX1, which has genetic polymorphisms in its regulatory region that impact its expression and activity and have been associated with various diseases. However, studies of these genetic variants in HDP have been limited. The objective of this study was to examine HMOX1 as a potential genetic contributor to the ancestral disparity seen in HDP. First, the 1000 Genomes Project (1 KG) phase 3 was utilized to compare the frequencies of alleles, genotypes, and estimated haplotypes of guanidine thymidine repeats (GTn; containing rs3074372) and A/T SNP (rs2071746) among females from five ancestral populations (Africa, the Americas, Europe, East Asia, and South Asia, N = 1271). Then, using genomic DNA from women with a history of HDP, we explored the possibility of HMOX1 variants predisposing women to HDP (N = 178) compared with an equivalent ancestral group from 1 KG (N = 263). Both HMOX1 variants were distributed differently across ancestries, with African women having a distinct distribution and an overall higher prevalence of the variants previously associated with lower HO-1 expression. The two HMOX1 variants display linkage disequilibrium in all but the African group, and within the EUR cohort, LL and AA individuals have a higher prevalence in HDP. HMOX1 variants demonstrate ancestral differences that may contribute to racial disparity in HDP. Understanding maternal genetic contribution to HDP will help improve prediction and facilitate personalized approaches to care for HDP.


Subject(s)
Heme Oxygenase-1 , Pregnancy-Induced Hypertension , Pregnancy , Humans , Female , Heme Oxygenase-1/genetics , Genetic Polymorphism , Haplotypes , Alleles
18.
Science ; 373(6562): 1440-1441, 2021 Sep 24.
Article in English | MEDLINE | ID: mdl-34554784

ABSTRACT

Unexplored variable number tandem repeats make a large contribution to complex traits.

19.
Sci Adv ; 7(25), 2021 Jun.
Article in English | MEDLINE | ID: mdl-34134993

ABSTRACT

Mechanisms by which noncoding genetic variation influences gene expression remain only partially understood but are considered to be major determinants of phenotypic diversity and disease risk. Here, we evaluated effects of >50 million single-nucleotide polymorphisms and short insertions/deletions provided by five inbred strains of mice on the responses of macrophages to interleukin-4 (IL-4), a cytokine that plays pleiotropic roles in immunity and tissue homeostasis. Of >600 genes induced >2-fold by IL-4 across the five strains, only 26 genes reached this threshold in all strains. By applying deep learning and motif mutation analyses to epigenetic data for macrophages from each strain, we identified the dominant combinations of lineage-determining and signal-dependent transcription factors driving IL-4 enhancer activation. These studies further revealed mechanisms by which noncoding genetic variation influences absolute levels of enhancer activity and their dynamic responses to IL-4, thereby contributing to strain-differential patterns of gene expression and phenotypic diversity.


Subject(s)
Interleukin-4 , Macrophages , Animals , Genetic Enhancer Elements , Interleukin-4/genetics , Interleukin-4/metabolism , Macrophages/metabolism , Mice , Inbred C57BL Mice , Transcription Factors/metabolism
20.
Cell Rep Med ; 2(4): 100250, 2021 Apr 20.
Article in English | MEDLINE | ID: mdl-33948580

ABSTRACT

Genome-wide association studies (GWASs) are instrumental in identifying loci harboring common single-nucleotide variants (SNVs) that affect human traits and diseases. GWAS hits emerge in clusters, but the focus is often on the most significant hit in each trait- or disease-associated locus. The remaining hits represent SNVs in linkage disequilibrium (LD) and are considered redundant and thus frequently marginally reported or exploited. Here, we interrogate the value of integrating the full set of GWAS hits in a locus repeatedly associated with cardiac conduction traits and arrhythmia, SCN5A-SCN10A. Our analysis reveals 5 common 7-SNV haplotypes (Hap1-5) with 2 combinations associated with the life-threatening arrhythmia Brugada syndrome (the risk Hap1/1 and protective Hap2/3 genotypes). Hap1 and Hap2 share 3 SNVs; thus, this analysis suggests that assuming redundancy among clustered GWAS hits can lead to confounding disease-risk associations and supports the need to deconstruct GWAS data in the context of haplotype composition.
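The notion of linkage disequilibrium between clustered SNVs mentioned in this abstract can be made concrete with a minimal Python sketch of the standard pairwise r^2 statistic computed from phased haplotypes. This is a textbook calculation, not the analysis performed in the paper; the function name `ld_r2` and the 0/1 allele coding are assumptions made for illustration.

```python
def ld_r2(haplotypes):
    """Pairwise LD (r^2) between two biallelic sites.

    haplotypes: list of (a, b) tuples, one per phased haplotype, with the
    allele at each site coded 0 or 1 (hypothetical input format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Two sites whose alleles always travel together: complete LD.
    print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))
```

An r^2 near 1 is the situation in which GWAS hits are often treated as redundant; the abstract argues that haplotype composition can still carry information that a single lead SNV does not.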


Subject(s)
Brugada Syndrome/genetics , Genetic Predisposition to Disease/genetics , Linkage Disequilibrium/genetics , Single Nucleotide Polymorphism/genetics , Adult , Brugada Syndrome/diagnosis , Genetic Testing/methods , Genome-Wide Association Study/methods , Genotype , Haplotypes/genetics , Humans , Middle Aged , Phenotype , Quantitative Trait Loci/genetics