Results 1-3 of 3
1.
Am J Hum Genet ; 101(5): 700-715, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-29100084

ABSTRACT

Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci composed entirely of guanine or cytosine (G or C) bases have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.
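
To make the idea of inferring repeat sizes "in a probabilistic framework" with "a sequence stutter model" concrete, the following is a minimal sketch, not TREDPARSE's actual model. It assumes a hypothetical stutter model in which a read reports the true repeat count with probability 1 - s and is off by exactly one unit with probability s/2 in each direction (larger deviations get a small, unnormalized floor), and it picks the diploid genotype (a, b) that maximizes the log-likelihood of the observed repeat counts from reads that fully span the repeat. The function names and parameters (`stutter_prob`, `call_genotype`, `stutter`, `floor`, `max_units`) are illustrative assumptions.

```python
import math
from itertools import combinations_with_replacement


def stutter_prob(observed: int, true: int, stutter: float = 0.1, floor: float = 1e-6) -> float:
    """Hypothetical stutter model (an assumption, not TREDPARSE's):
    exact count with prob 1 - stutter, +/-1 unit with prob stutter/2 each,
    and an unnormalized small floor for any larger deviation."""
    diff = abs(observed - true)
    if diff == 0:
        return 1.0 - stutter
    if diff == 1:
        return stutter / 2.0
    return floor


def genotype_log_likelihood(reads, a, b, stutter=0.1):
    """Log-likelihood of diploid genotype (a, b): each read is assumed to
    come from either allele with probability 1/2."""
    total = 0.0
    for obs in reads:
        p = 0.5 * stutter_prob(obs, a, stutter) + 0.5 * stutter_prob(obs, b, stutter)
        total += math.log(p)
    return total


def call_genotype(reads, max_units, stutter=0.1):
    """Exhaustively score every genotype a <= b with counts in 0..max_units
    and return the best pair with its log-likelihood."""
    best, best_ll = None, -math.inf
    for a, b in combinations_with_replacement(range(max_units + 1), 2):
        ll = genotype_log_likelihood(reads, a, b, stutter)
        if ll > best_ll:
            best, best_ll = (a, b), ll
    return best, best_ll


# Repeat counts observed in spanning reads (made-up illustrative data).
reads = [20, 20, 21, 20, 19, 30, 30, 30, 31, 29]
genotype, ll = call_genotype(reads, max_units=40)
print(genotype)  # (20, 30)
```

Because this sketch only uses reads that span the whole repeat, it inherits the read-length ceiling the abstract describes: a 150-base read can span at most 50 CAG units, and fewer once flanking sequence is needed to anchor it. The abstract attributes TREDPARSE's ability to go beyond that limit to additional cues (read alignment and paired-end distance distribution), which are not modeled here.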


Subjects
Human Genome/genetics, Microsatellite Repeats/genetics, Adolescent, Adult, Alleles, Child, Female, Population Genetics/methods, High-Throughput Nucleotide Sequencing/methods, Humans, Male, Middle Aged, Genetic Polymorphism/genetics, DNA Sequence Analysis/methods, Software
2.
Proc Natl Acad Sci U S A ; 114(38): 10166-10171, 2017 09 19.
Article in English | MEDLINE | ID: mdl-28874526

ABSTRACT

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.
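
The reidentification step can be pictured as matching each genome to the phenotype record it most likely came from. The sketch below is a swapped-in simplification, not the authors' maximum entropy algorithm: it assumes each genome yields a predicted mean and standard deviation per trait, scores a genome/record pair by summing per-trait Gaussian log-likelihoods (treating traits as independent, which is an assumption), and then searches all one-to-one assignments for the highest total score. All names and the example values are hypothetical.

```python
import math
from itertools import permutations


def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def match_score(predicted, measured):
    """Sum of per-trait log-likelihoods, assuming independent traits.
    predicted: list of (mean, sd) from the genome; measured: list of values."""
    return sum(gaussian_log_pdf(x, m, s) for (m, s), x in zip(predicted, measured))


def best_matching(predictions, measurements):
    """Return the permutation perm (genome i -> record perm[i]) that maximizes
    the total match score. Exhaustive search: only feasible for small n."""
    n = len(predictions)
    scores = [[match_score(predictions[i], measurements[j]) for j in range(n)]
              for i in range(n)]
    best_perm, best_total = None, -math.inf
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm


# Hypothetical predictions: (height cm, age years) as (mean, sd) per genome.
predictions = [[(160.0, 5.0), (30.0, 8.0)],
               [(175.0, 5.0), (50.0, 8.0)],
               [(185.0, 5.0), (25.0, 8.0)]]
# Hypothetical measured phenotype records, in shuffled order.
measurements = [[186.0, 27.0], [159.0, 33.0], [176.0, 48.0]]
print(best_matching(predictions, measurements))  # (1, 2, 0)
```

The exhaustive loop grows as n!, so for cohorts larger than a handful of people the same assignment problem can be solved with `scipy.optimize.linear_sum_assignment(scores, maximize=True)`. Solving a joint assignment rather than matching each genome independently is what lets several individually weak trait predictions reinforce one another.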


Subjects
Confidentiality, DNA Fingerprinting, Genetic Models, Phenotype, Whole Genome Sequencing, Adult, Age Factors, Algorithms, Body Size, Cohort Studies, Data Anonymization, Female, Humans, Male, Middle Aged, Pigmentation/genetics, Young Adult
3.
Proc Natl Acad Sci U S A ; 113(42): 11901-11906, 2016 10 18.
Article in English | MEDLINE | ID: mdl-27702888

ABSTRACT

We report on the sequencing of 10,545 human genomes at 30×-40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.
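
Figures such as "91.5% of exon sequence" falling in the high-confidence region are fractions of bases in one interval set that lie inside another. The sketch below shows one way such a fraction could be computed; it is not the authors' pipeline and uses no real data. It assumes BED-style half-open [start, end) intervals on a single chromosome; the function names `merge` and `covered_fraction` are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(targets, confident):
    """Fraction of bases in `targets` (e.g. exons) that fall inside
    `confident` (e.g. high-confidence regions). Both are lists of
    half-open intervals on the same chromosome."""
    targets = merge(targets)
    confident = merge(confident)
    total = sum(e - s for s, e in targets)
    covered = 0
    j = 0
    for s, e in targets:
        # Skip confident intervals that end before this target starts;
        # targets are sorted, so these can never overlap a later target.
        while j < len(confident) and confident[j][1] <= s:
            j += 1
        # Add the overlap with every confident interval starting before e.
        k = j
        while k < len(confident) and confident[k][0] < e:
            cs, ce = confident[k]
            covered += min(e, ce) - max(s, cs)
            k += 1
    return covered / total if total else 0.0


exons = [(100, 200), (300, 400)]
high_confidence = [(0, 150), (350, 1000)]
print(covered_fraction(exons, high_confidence))  # 0.5
```

Merging both lists first keeps overlapping annotations (for example, exons shared by several transcripts) from being counted twice, and the two-pointer sweep means each confident interval is skipped at most once across all targets.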


Subjects
Human Genome, Genomics, Whole Genome Sequencing, Chromosome Mapping, Computational Biology/methods, Nucleic Acid Databases, Genetic Predisposition to Disease, Genetic Variation, Genomics/methods, Humans, Open Reading Frames, Single Nucleotide Polymorphism, Reproducibility of Results, Untranslated Regions