1.
Cell ; 170(1): 199-212.e20, 2017 Jun 29.
Article in English | MEDLINE | ID: mdl-28666119

ABSTRACT

Type 2 diabetes (T2D) affects Latinos at twice the rate seen in populations of European descent. We recently identified a risk haplotype spanning SLC16A11 that explains ∼20% of the increased T2D prevalence in Mexico. Here, through genetic fine-mapping, we define a set of tightly linked variants likely to contain the causal allele(s). We show that variants on the T2D-associated haplotype have two distinct effects: (1) decreasing SLC16A11 expression in liver and (2) disrupting a key interaction with basigin, thereby reducing cell-surface localization. Both independent mechanisms reduce SLC16A11 function and suggest SLC16A11 is the causal gene at this locus. To gain insight into how SLC16A11 disruption impacts T2D risk, we demonstrate that SLC16A11 is a proton-coupled monocarboxylate transporter and that genetic perturbation of SLC16A11 induces changes in fatty acid and lipid metabolism that are associated with increased T2D risk. Our findings suggest that increasing SLC16A11 function could be therapeutically beneficial for T2D.


Subject(s)
Diabetes Mellitus, Type 2/metabolism , Monocarboxylic Acid Transporters/genetics , Monocarboxylic Acid Transporters/metabolism , Basigin/metabolism , Cell Membrane/metabolism , Chromosomes, Human, Pair 17/metabolism , Gene Knockdown Techniques , Haplotypes , Hepatocytes/metabolism , Heterozygote , Histone Code , Humans , Liver/metabolism , Models, Molecular , Monocarboxylic Acid Transporters/chemistry
2.
Cell ; 151(6): 1185-99, 2012 Dec 07.
Article in English | MEDLINE | ID: mdl-23217706

ABSTRACT

Reprogramming of cellular metabolism is a key event during tumorigenesis. Despite being known for decades (Warburg effect), the molecular mechanisms regulating this switch remained unexplored. Here, we identify SIRT6 as a tumor suppressor that regulates aerobic glycolysis in cancer cells. Importantly, loss of SIRT6 leads to tumor formation without activation of known oncogenes, whereas transformed SIRT6-deficient cells display increased glycolysis and tumor growth, suggesting that SIRT6 plays a role in both establishment and maintenance of cancer. By using a conditional SIRT6 allele, we show that SIRT6 deletion in vivo increases the number, size, and aggressiveness of tumors. SIRT6 also functions as a regulator of ribosome metabolism by corepressing MYC transcriptional activity. Lastly, SIRT6 is selectively downregulated in several human cancers, and expression levels of SIRT6 predict prognosis and tumor-free survival rates, highlighting SIRT6 as a critical modulator of cancer metabolism. Our studies reveal SIRT6 to be a potent tumor suppressor acting to suppress cancer metabolism.


Subject(s)
Neoplasms/metabolism , Sirtuins/metabolism , Animals , Cell Proliferation , Down-Regulation , Fibroblasts/metabolism , Gene Knockout Techniques , Glycolysis , Humans , Mice , Mice, Nude , Mice, SCID , Neoplasm Transplantation , Proto-Oncogene Proteins c-myc/metabolism , Sirtuins/genetics , Transcription, Genetic , Transplantation, Heterologous , Tumor Suppressor Proteins/genetics
3.
Nature ; 589(7841): 246-250, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33442040

ABSTRACT

Autism spectrum disorder (ASD) is an early-onset developmental disorder characterized by deficits in communication and social interaction and restrictive or repetitive behaviours [1,2]. Family studies demonstrate that ASD has a substantial genetic basis with contributions both from inherited and de novo variants [3,4]. It has been estimated that de novo mutations may contribute to 30% of all simplex cases, in which only a single child is affected per family [5]. Tandem repeats (TRs), defined here as sequences of 1 to 20 base pairs in size repeated consecutively, comprise one of the major sources of de novo mutations in humans [6]. TR expansions are implicated in dozens of neurological and psychiatric disorders [7]. Yet, de novo TR mutations have not been characterized on a genome-wide scale, and their contribution to ASD remains unexplored. Here we develop new bioinformatics methods for identifying and prioritizing de novo TR mutations from sequencing data and perform a genome-wide characterization of de novo TR mutations in ASD-affected probands and unaffected siblings. We infer specific mutation events and their precise changes in repeat number, and primarily focus on more prevalent stepwise copy number changes rather than large expansions. Our results demonstrate a significant genome-wide excess of TR mutations in ASD probands. Mutations in probands tend to be larger, enriched in fetal brain regulatory regions, and are predicted to be more evolutionarily deleterious. Overall, our results highlight the importance of considering repeat variants in future studies of de novo mutations.


Subject(s)
Autism Spectrum Disorder/genetics , DNA Repeat Expansion/genetics , Genetic Predisposition to Disease , Adolescent , Adult , Autism Spectrum Disorder/pathology , Brain/metabolism , Child , DNA Copy Number Variations/genetics , Female , Fetus/metabolism , Germ-Line Mutation/genetics , Humans , Least-Squares Analysis , Male , Middle Aged , Paternal Age , Young Adult
4.
Genome Res ; 33(5): 689-702, 2023 May.
Article in English | MEDLINE | ID: mdl-37127331

ABSTRACT

Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1-6 bp. We leveraged whole-genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and performed quantitative trait locus (QTL) analyses for each of these phenotypes. We identified a locus on Chromosome 13 at which strains inheriting the C57BL/6J (B) haplotype have a higher rate of STR expansions than those inheriting the DBA/2J (D) haplotype. The strongest candidate gene in this locus is Msh3, a known modifier of STR stability in cancer and at pathogenic repeat expansions in mice and humans, as well as a current drug target against Huntington's disease. The D haplotype at this locus harbors a cluster of variants near the 5' end of Msh3, including multiple missense variants near the DNA mismatch recognition domain. In contrast, the B haplotype contains a unique retrotransposon insertion. The rate of expansion covaries positively with Msh3 expression, with higher expression from the B haplotype. Finally, detailed analysis of mutation patterns showed that strains carrying the B allele have higher expansion rates, but slightly lower overall total mutation rates, compared with those with the D allele, particularly at tetranucleotide repeats. Our results suggest an important role for inherited variants in Msh3 in modulating genome-wide patterns of germline mutations at STRs.


Subject(s)
Microsatellite Repeats , Quantitative Trait Loci , Animals , Mice , Haplotypes , Mice, Inbred C57BL , Mice, Inbred DBA
5.
Cell ; 147(7): 1628-39, 2011 Dec 23.
Article in English | MEDLINE | ID: mdl-22196736

ABSTRACT

Hundreds of chromatin regulators (CRs) control chromatin structure and function by catalyzing and binding histone modifications, yet the rules governing these key processes remain obscure. Here, we present a systematic approach to infer CR function. We developed ChIP-string, a meso-scale assay that combines chromatin immunoprecipitation with a signature readout of 487 representative loci. We applied ChIP-string to screen 145 antibodies, thereby identifying effective reagents, which we used to map the genome-wide binding of 29 CRs in two cell types. We found that specific combinations of CRs colocalize in characteristic patterns at distinct chromatin environments, at genes of coherent functions, and at distal regulatory elements. When comparing between cell types, CRs redistribute to different loci but maintain their modular and combinatorial associations. Our work provides a multiplex method that substantially enhances the ability to monitor CR binding, presents a large resource of CR maps, and reveals common principles for combinatorial CR function.


Subject(s)
Chromatin Immunoprecipitation/methods , Chromatin/metabolism , Genomics/methods , Histone Code , Chromatin/chemistry , Chromatin Assembly and Disassembly , Embryonic Stem Cells , Genome , Humans , K562 Cells
7.
Bioinformatics ; 39(3), 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36847450

ABSTRACT

SUMMARY: Leveraging local ancestry and haplotype information in genome-wide association studies and downstream analyses can improve the utility of genomics for individuals from diverse and recently admixed ancestries. However, most existing simulation, visualization and variant analysis frameworks are based on variant-level analysis and do not automatically handle these features. We present haptools, an open-source toolkit for performing local ancestry aware and haplotype-based analysis of complex traits. Haptools supports fast simulation of admixed genomes, visualization of admixture tracks, simulation of haplotype- and local ancestry-specific phenotype effects and a variety of file operations and statistics computed in a haplotype-aware manner. AVAILABILITY AND IMPLEMENTATION: Haptools is freely available at https://github.com/cast-genomics/haptools. DOCUMENTATION: Detailed documentation is available at https://haptools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Haplotypes , Genomics , Genome
8.
Nature ; 617(7960): 256-258, 2023 May.
Article in English | MEDLINE | ID: mdl-37165235

Subject(s)
Genome , Genomics , Humans
9.
J Evol Biol ; 36(2): 321-336, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36289560

ABSTRACT

Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
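The definition above (units of 1-6 bp repeated in tandem) can be illustrated with a minimal Python sketch. It is not taken from the review or from any tool listed on this page: the function name `find_strs`, the `min_copies` threshold, and the restriction to perfect (uninterrupted) repeats are all assumptions chosen for illustration.

```python
import re


def find_strs(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Hypothetical helper: real STR callers also handle interrupted repeats
    and sequencing errors, which this sketch deliberately ignores.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One captured unit followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter unit
            # (e.g. "AA" or "CACA"), so each repeat is reported once.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


print(find_strs("GGCACACACACATT"))  # [(2, 'CA', 5)]
```

The copy count returned for each locus is the quantity that varies between alleles; a stepwise mutation changes it by one unit.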


Subject(s)
Genome , Microsatellite Repeats , Mutation , Genotype , Phenotype
10.
Bioinformatics ; 37(5): 731-733, 2021 May 05.
Article in English | MEDLINE | ID: mdl-32805020

ABSTRACT

SUMMARY: A rich set of tools has recently been developed for performing genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and suite of command line tools for filtering, merging and quality control of TR genotype files. TRTools utilizes an internal harmonization module, making it compatible with outputs from a wide range of TR genotypers. AVAILABILITY AND IMPLEMENTATION: TRTools is freely available at https://github.com/gymreklab/TRTools. Detailed documentation is available at https://trtools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Tandem Repeat Sequences , Documentation , Gene Library , Genotype
11.
Nature ; 538(7624): 201-206, 2016 Oct 13.
Article in English | MEDLINE | ID: mdl-27654912

ABSTRACT

Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Genomics , Mutation Rate , Phylogeny , Racial Groups/genetics , Animals , Australia , Black People/genetics , Datasets as Topic , Genetics, Population , History, Ancient , Human Migration/history , Humans , Native Hawaiian or Other Pacific Islander/genetics , Neanderthals/genetics , New Guinea , Sequence Analysis, DNA , Species Specificity , Time Factors
12.
BMC Bioinformatics ; 22(1): 201, 2021 Apr 20.
Article in English | MEDLINE | ID: mdl-33879052

ABSTRACT

BACKGROUND: A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq. RESULTS: We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips. CONCLUSIONS: ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Software , Computer Simulation , Genome , High-Throughput Nucleotide Sequencing , Models, Statistical , Sequence Analysis, DNA
13.
Genome Res ; 28(11): 1709-1719, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30352806

ABSTRACT

Whole-genome sequencing is increasingly used to identify Mendelian variants in clinical pipelines. These pipelines focus on single-nucleotide variants (SNVs) and also structural variants, while ignoring more complex repeat sequence variants. Here, we consider the problem of genotyping Variable Number Tandem Repeats (VNTRs), composed of inexact tandem duplications of short (6-100 bp) repeating units. VNTRs span 3% of the human genome, are frequently present in coding regions, and have been implicated in multiple Mendelian disorders. Although existing tools recognize VNTR-carrying sequence, genotyping VNTRs (determining repeat unit count and sequence variation) from whole-genome sequencing reads remains challenging. We describe a method, adVNTR, that uses hidden Markov models to model each VNTR, count repeat units, and detect sequence variation. adVNTR models can be developed for short-read (Illumina) and single-molecule (Pacific Biosciences [PacBio]) whole-genome and whole-exome sequencing, and show good results on multiple simulated and real data sets.


Subject(s)
Genotyping Techniques/methods , Minisatellite Repeats , Genome, Human , Humans , Markov Chains , Polymorphism, Genetic
14.
Nucleic Acids Res ; 47(15): e90, 2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31194863

ABSTRACT

Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington's Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validate GangSTR on real and simulated data and show that GangSTR outperforms alternative methods in both accuracy and speed. We apply GangSTR to a deeply sequenced trio to profile the landscape of TR expansions in a healthy family and validate novel expansions using orthogonal technologies. Our analysis reveals that healthy individuals harbor dozens of long TR alleles not captured by current genome-wide methods. GangSTR will likely enable discovery of novel disease-associated variants not currently accessible from NGS.


Subject(s)
DNA Repeat Expansion , Genome, Human , Microsatellite Repeats , Sequence Analysis, DNA/statistics & numerical data , Software , Algorithms , Base Sequence , Datasets as Topic , High-Throughput Nucleotide Sequencing , Humans , Likelihood Functions , Sequence Alignment
15.
Nat Methods ; 14(6): 590-592, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28436466

ABSTRACT

Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, it has proven problematic to genotype STRs from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data, and we report a genome-wide analysis and validation of de novo STR mutations. HipSTR is freely available at https://hipstr-tool.github.io/HipSTR.


Subject(s)
Chromosome Mapping/methods , DNA Fingerprinting/methods , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome, Human/genetics , Microsatellite Repeats/genetics , Algorithms , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment , Sequence Analysis, DNA , Software
16.
Am J Hum Genet ; 98(5): 919-933, 2016 May 05.
Article in English | MEDLINE | ID: mdl-27126583

ABSTRACT

Short tandem repeats (STRs) are mutation-prone loci that span nearly 1% of the human genome. Previous studies have estimated the mutation rates of highly polymorphic STRs by using capillary electrophoresis and pedigree-based designs. Although this work has provided insights into the mutational dynamics of highly mutable STRs, the mutation rates of most others remain unknown. Here, we harnessed whole-genome sequencing data to estimate the mutation rates of Y chromosome STRs (Y-STRs) with 2-6 bp repeat units that are accessible to Illumina sequencing. We genotyped 4,500 Y-STRs by using data from the 1000 Genomes Project and the Simons Genome Diversity Project. Next, we developed MUTEA, an algorithm that infers STR mutation rates from population-scale data by using a high-resolution SNP-based phylogeny. After extensive intrinsic and extrinsic validations, we harnessed MUTEA to derive mutation-rate estimates for 702 polymorphic STRs by tracing each locus over 222,000 meioses, resulting in the largest collection of Y-STR mutation rates to date. Using our estimates, we identified determinants of STR mutation rates and built a model to predict rates for STRs across the genome. These predictions indicate that the load of de novo STR mutations is at least 75 mutations per generation, rivaling the load of all other known variant types. Finally, we identified Y-STRs with potential applications in forensics and genetic genealogy, assessed the ability to differentiate between the Y chromosomes of father-son pairs, and imputed Y-STR genotypes.


Subject(s)
Chromosomes, Human, Y/genetics , Genome, Human , Haplotypes/genetics , Microsatellite Repeats/genetics , Mutation Rate , Mutation/genetics , Genotype , Humans , Male
17.
Nucleic Acids Res ; 44(8): 3750-62, 2016 May 05.
Article in English | MEDLINE | ID: mdl-27060133

ABSTRACT

Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphisms (SNPs) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.
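As a rough illustration of what testing a TR as an expression QTL can look like, the sketch below regresses expression on repeat length with ordinary least squares. The abstract does not say which statistical test the authors used, so the choice of a simple linear fit is an assumption, `fit_line` is a hypothetical helper, and the five data points are made-up toy values, not results from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Toy values for illustration only: mean repeat length per individual
# and an arbitrary expression measure for the neighboring gene.
repeat_lengths = [10, 11, 12, 13, 14]
expression = [2.0, 2.5, 3.0, 3.5, 4.0]

slope, intercept, r_squared = fit_line(repeat_lengths, expression)
print(slope, intercept, r_squared)
```

A nonzero slope with a high r-squared would suggest that repeat length tracks expression at that locus; a real analysis would also need covariates and multiple-testing correction.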


Subject(s)
DNA Methylation , Gene Expression Regulation , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Tandem Repeat Sequences , Genotyping Techniques , Humans , Linkage Disequilibrium , Quantitative Trait Loci , Sequence Analysis, DNA
18.
Genome Res ; 24(11): 1894-904, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25135957

ABSTRACT

Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.


Subject(s)
Genetics, Population/methods , Genome, Human/genetics , Microsatellite Repeats/genetics , Polymorphism, Single Nucleotide , Alleles , Gene Frequency , Genetic Variation , Genomics/methods , Genotype , Humans , Linkage Disequilibrium
19.
Nucleic Acids Res ; 42(Database issue): D726-30, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24150937

ABSTRACT

Living organisms change their proteome dramatically to sustain a stable internal milieu in fluctuating environments. To study the dynamics of proteins during stress, we measured the localization and abundance of the Saccharomyces cerevisiae proteome under various growth conditions and genetic backgrounds using the GFP collection. We created a database (DB) called 'LoQAtE' (Localization and Quantitation Atlas of the yeast proteomE), available online at http://www.weizmann.ac.il/molgen/loqate/, to provide easy access to these data. Using LoQAtE DB, users can get a profile of changes for proteins of interest and query advanced intersections by abundance changes, primary localization or localization shifts over the tested conditions. Currently, the DB hosts information on 5330 yeast proteins under three external perturbations (DTT, H2O2 and nitrogen starvation) and two genetic mutations [in the chaperonin containing TCP1 (CCT) complex and in the proteasome]. Additional conditions will be uploaded regularly. The data demonstrate hundreds of localization and abundance changes, many of which were not detected at the level of mRNA. LoQAtE is designed to allow easy navigation for non-experts in high-content microscopy and data are available for download. These data should open up new perspectives on the significant role of proteins while combating external and internal fluctuations.


Subject(s)
Databases, Protein , Proteome/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Gene Regulatory Networks , Internet , Proteome/analysis , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/analysis
20.
Genome Res ; 22(6): 1154-62, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22522390

ABSTRACT

Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format.
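The trio-based validation mentioned above rests on a simple rule: at an autosomal locus, a child's two alleles should be explainable as one allele from each parent. The Python sketch below shows that check on repeat counts. It is not lobSTR code; `is_mendelian` is a hypothetical helper, genotypes are assumed to be unphased pairs of repeat counts, and the example genotypes are invented.

```python
from itertools import product


def is_mendelian(child, father, mother):
    """True if the child's unphased genotype can be formed by taking
    one allele from the father and one from the mother.

    Each genotype is a 2-tuple of repeat counts at one autosomal STR locus.
    """
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


# Invented example genotypes (repeat counts), for illustration only.
print(is_mendelian((12, 14), (12, 13), (14, 15)))  # True
print(is_mendelian((12, 16), (12, 13), (14, 15)))  # False
```

An inconsistent call points to either a genotyping error or a de novo mutation, which is why the fraction of consistent loci in a trio is a common accuracy measure for STR callers.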


Subject(s)
Genome, Human , Genomics/methods , Microsatellite Repeats , Software , Algorithms , Electrophoresis/methods , Female , Genetic Variation , HapMap Project , Humans , Male , Pedigree , Reproducibility of Results