Results 1 - 20 of 57
1.
Am J Hum Genet ; 102(5): 731-743, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29706352

ABSTRACT

Large-scale, population-based genomic studies have provided a context for modern medical genetics. Among such studies, however, African populations have remained relatively underrepresented. The breadth of genetic diversity across the African continent argues for an exploration of local genomic context to facilitate burgeoning disease mapping studies in Africa. We sought to characterize genetic variation and to assess population substructure within a cohort of HIV-positive children from Botswana, a Southern African country that is regionally underrepresented in genomic databases. Using whole-exome sequencing data from 164 Batswana and comparisons with 150 similarly sequenced HIV-positive Ugandan children, we found that 13%-25% of variation observed among Batswana was not captured by public databases. Uncaptured variants were significantly enriched (p = 2.2 × 10⁻¹⁶) for coding variants with minor allele frequencies between 1% and 5% and included predicted-damaging non-synonymous variants. Among variants found in public databases, corresponding allele frequencies varied widely, with Botswana having significantly higher allele frequencies among rare (<1%) pathogenic and damaging variants. Batswana clustered with other Southern African populations, but distinctly from 1000 Genomes African populations, and had limited evidence for admixture with extra-continental ancestries. We also observed a surprising lack of genetic substructure in Botswana, despite multiple tribal ethnicities and language groups, alongside a higher degree of relatedness than purported founder populations from the 1000 Genomes project. Our observations reveal a complex, but distinct, ancestral history and genomic architecture among Batswana and suggest that disease mapping within similar Southern African populations will require a deeper repository of genetic variation and allelic dependencies than presently exists.


Subject(s)
Black People/genetics , Exome Sequencing , Genetic Variation , Botswana , Cohort Studies , Gene Pool , Population Genetics , Human Genome , Geography , Humans , Phylogeny , Principal Component Analysis
2.
Mol Psychiatry ; 25(2): 476-490, 2020 02.
Article in English | MEDLINE | ID: mdl-31673123

ABSTRACT

Tourette syndrome (TS) is a childhood-onset neuropsychiatric disorder characterized by repetitive motor movements and vocal tics. The clinical manifestations of TS are complex and often overlap with other neuropsychiatric disorders. TS is highly heritable; however, the underlying genetic basis and molecular and neuronal mechanisms of TS remain largely unknown. We performed whole-exome sequencing of a hundred trios (probands and their parents) with detailed records of their clinical presentations and identified a risk gene, ASH1L, that was both de novo mutated and associated with TS based on a transmission disequilibrium test. As a replication, we performed follow-up targeted sequencing of ASH1L in an additional 524 unrelated TS samples and replicated the association (P value = 0.001). The point mutations in ASH1L cause defects in its enzymatic activity. Therefore, we established a transgenic mouse line and performed an array of anatomical, behavioral, and functional assays to investigate ASH1L function. The Ash1l+/- mice manifested tic-like behaviors and compulsive behaviors that could be rescued by the tic-relieving drug haloperidol. We also found that Ash1l disruption leads to hyper-activation and elevated dopamine-releasing events in the dorsal striatum, all of which could explain the neural mechanisms for the behavioral abnormalities in mice. Taken together, our results provide compelling evidence that ASH1L is a TS risk gene.


Subject(s)
DNA-Binding Proteins/genetics , Histone-Lysine N-Methyltransferase/genetics , Tourette Syndrome/genetics , Adolescent , Adult , Animals , Child , Preschool Child , China , DNA-Binding Proteins/metabolism , Family , Female , Genetic Predisposition to Disease/genetics , Histone-Lysine N-Methyltransferase/metabolism , Humans , Male , Mice , Transgenic Mice , Middle Aged , Mutation/genetics , Parents , Tic Disorders/genetics , Tourette Syndrome/complications , Transcription Factors/genetics , Exome Sequencing/methods
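
The transmission disequilibrium test named in the abstract above is commonly computed as a McNemar-type chi-square on the number of times heterozygous parents did or did not transmit the candidate allele. The sketch below shows that textbook statistic only; the function name and the counts are illustrative assumptions and are not taken from the study.

```python
def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """Textbook TDT chi-square statistic (1 degree of freedom).

    transmitted / untransmitted: how often heterozygous parents passed on
    (or did not pass on) the candidate allele to an affected child.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("need at least one informative transmission")
    return (transmitted - untransmitted) ** 2 / informative


# Hypothetical counts for illustration; not data from the abstract.
print(tdt_statistic(transmitted=30, untransmitted=10))  # 10.0
```

A larger statistic indicates stronger over-transmission of the allele; the p-value would come from a chi-square distribution with one degree of freedom.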
3.
Optik (Stuttg) ; 241: 167100, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33976457

ABSTRACT

Since its discovery in Hubei, China, in December 2019, coronavirus disease 2019 (COVID-19) has lasted more than one year, and the number of new confirmed cases and confirmed deaths is still at a high level. COVID-19 is an infectious disease caused by SARS-CoV-2. Although RT-PCR is considered the gold standard for detection of COVID-19, CT plays an important role in the diagnosis and evaluation of the therapeutic effect of COVID-19. Diagnosis and localization of COVID-19 on CT images using deep learning can provide quantitative auxiliary information for doctors. This article proposes a novel network with a multi-receptive field attention module to diagnose COVID-19 on CT images. This attention module includes three parts: a pyramid convolution module (PCM), a multi-receptive field spatial attention block (SAB), and a multi-receptive field channel attention block (CAB). The PCM can improve the diagnostic ability of the network for lesions of different sizes and shapes. The role of the SAB and CAB is to focus the features extracted from the network on the lesion area to improve the ability of COVID-19 discrimination and localization. We verify the effectiveness of the proposed method on two datasets. An accuracy of 97.12%, a specificity of 96.89%, and a sensitivity of 97.21% are achieved by the proposed network on the DTDB dataset provided by the Beijing Ditan Hospital, Capital Medical University. Compared with other state-of-the-art attention modules, the proposed method achieves better results. On the public COVID-19 SARS-CoV-2 dataset, 95.16% accuracy, a 95.6% F1-score, and 99.01% AUC are obtained. The proposed network can effectively assist doctors in the diagnosis of COVID-19 CT images.
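
The accuracy, specificity, sensitivity, and F1-score quoted in the abstract above are all derived from confusion-matrix counts. As a minimal sketch using the standard definitions, and assuming hypothetical counts (the abstract reports only the final percentages), they could be computed as follows.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts for illustration; not data from the abstract.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```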

4.
Am J Hum Genet ; 100(2): 205-215, 2017 02 02.
Article in English | MEDLINE | ID: mdl-28089252

ABSTRACT

Whole-genome sequencing (WGS) allows for a comprehensive view of the sequence of the human genome. We present and apply integrated methodologic steps for interrogating WGS data to characterize the genetic architecture of 10 heart- and blood-related traits in a sample of 1,860 African Americans. In order to evaluate the contribution of regulatory and non-protein coding regions of the genome, we conducted aggregate tests of rare variation across the entire genomic landscape using a sliding window, complemented by an annotation-based assessment of the genome using predefined regulatory elements and within the first intron of all genes. These tests were performed treating all variants equally as well as with individual variants weighted by a measure of predicted functional consequence. Significant findings were assessed in 1,705 individuals of European ancestry. After these steps, we identified and replicated components of the genomic landscape significantly associated with heart- and blood-related traits. For two traits, lipoprotein(a) levels and neutrophil count, aggregate tests of low-frequency and rare variation were significantly associated across multiple motifs. For a third trait, cardiac troponin T, investigation of regulatory domains identified a locus on chromosome 9. These practical approaches for WGS analysis led to the identification of informative genomic regions and also showed that defined non-coding regions, such as first introns of genes and regulatory domains, are associated with important risk factor phenotypes. This study illustrates the tractable nature of WGS data and outlines an approach for characterizing the genetic architecture of complex traits.


Subject(s)
Black or African American/genetics , Genome-Wide Association Study , Lipoprotein(a)/genetics , Troponin T/genetics , C-Reactive Protein/metabolism , HDL Cholesterol/blood , LDL Cholesterol/blood , Human Chromosome Pair 9/genetics , Gene Frequency , Human Genome , Genomics , Hemoglobins/metabolism , Humans , Introns , Leukocyte Count , Lipoprotein(a)/blood , Magnesium/blood , Brain Natriuretic Peptide/blood , Brain Natriuretic Peptide/genetics , Neutrophils/cytology , Peptide Fragments/blood , Peptide Fragments/genetics , Phosphorus/blood , Platelet Count , Single Nucleotide Polymorphism , Quantitative Trait Loci , Troponin T/blood , White People/genetics
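
The sliding-window scan mentioned in the abstract above can be pictured as overlapping intervals tiled across a region, with rare variants aggregated inside each one. The sketch below only counts variants per window; the window size, step, and positions are hypothetical, and the actual study applied statistical aggregate tests (optionally weighted by predicted function) rather than raw counts.

```python
def sliding_windows(start: int, end: int, size: int, step: int):
    """Yield half-open [lo, hi) windows covering start..end."""
    lo = start
    while lo < end:
        yield lo, min(lo + size, end)
        lo += step


def variants_per_window(positions, start, end, size, step):
    """Count how many variant positions fall inside each window."""
    return [
        (lo, hi, sum(lo <= p < hi for p in positions))
        for lo, hi in sliding_windows(start, end, size, step)
    ]


# Hypothetical rare-variant positions on a toy 40-base region.
print(variants_per_window([5, 12, 18, 33], start=0, end=40, size=20, step=10))
# [(0, 20, 3), (10, 30, 2), (20, 40, 1), (30, 40, 1)]
```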
5.
Genome Res ; 26(12): 1651-1662, 2016 12.
Article in English | MEDLINE | ID: mdl-27934697

ABSTRACT

Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Macaca mulatta/genetics , Whole Genome Sequencing/methods , Animals , Molecular Evolution , Female , Genetic Fitness , Macaca mulatta/classification , Animal Models , Single Nucleotide Polymorphism , Population Density
6.
Genet Med ; 21(9): 1998-2006, 2019 09.
Article in English | MEDLINE | ID: mdl-30828085

ABSTRACT

PURPOSE: To assess the clinical performance of an expanded noninvasive prenatal screening (NIPS) test ("NIPS-Plus") for detection of both aneuploidy and genome-wide microdeletion/microduplication syndromes (MMS). METHODS: A total of 94,085 women with a singleton pregnancy were prospectively enrolled in the study. The cell-free plasma DNA was directly sequenced without intermediate amplification and fetal abnormalities identified using an improved copy-number variation (CNV) calling algorithm. RESULTS: A total of 1128 pregnancies (1.2%) were scored positive for clinically significant fetal chromosome abnormalities. This comprised 965 aneuploidies (1.026%) and 163 (0.174%) MMS. From follow-up tests, the positive predictive values (PPVs) for T21, T18, T13, rare trisomies, and sex chromosome aneuploidies were calculated as 95%, 82%, 46%, 29%, and 47%, respectively. For known MMS (n = 32), PPVs were 93% (DiGeorge), 68% (22q11.22 microduplication), 75% (Prader-Willi/Angelman), and 50% (Cri du Chat). For the remaining genome-wide MMS (n = 88), combined PPVs were 32% (CNVs ≥10 Mb) and 19% (CNVs <10 Mb). CONCLUSION: NIPS-Plus yielded high PPVs for common aneuploidies and DiGeorge syndrome, and moderate PPVs for other MMS. Our results present compelling evidence that NIPS-Plus can be used as a first-tier pregnancy screening method to improve detection rates of clinically significant fetal chromosome abnormalities.


Subject(s)
Cell-Free Nucleic Acids/genetics , Chromosome Aberrations , Chromosome Disorders/diagnosis , Noninvasive Prenatal Testing/methods , Adolescent , Adult , Aneuploidy , Chromosome Disorders/genetics , Chromosome Disorders/pathology , DNA Copy Number Variations/genetics , Female , Humans , Karyotyping , Middle Aged , Pregnancy , Prenatal Diagnosis , Risk Factors , Sex Chromosome Aberrations , Trisomy/genetics , Young Adult
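
A positive predictive value, as reported in the abstract above, is the share of screen-positive cases that follow-up testing confirms. The sketch below reproduces the screen-positive percentages from the enrollment and positive counts given in the abstract, and shows the PPV formula with hypothetical confirmation counts, since the abstract does not list per-condition true and false positives.

```python
def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100 * part / whole


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = confirmed positives / all screen positives with follow-up."""
    followed_up = true_positives + false_positives
    if followed_up == 0:
        raise ValueError("no screen-positive cases with follow-up")
    return true_positives / followed_up


# Counts quoted in the abstract: 94,085 enrolled, 1128 positive, 965 aneuploidies.
print(round(percent(1128, 94085), 1))  # 1.2
print(round(percent(965, 94085), 3))   # 1.026

# Hypothetical follow-up counts, for illustration only.
print(positive_predictive_value(true_positives=19, false_positives=1))  # 0.95
```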
7.
Am J Obstet Gynecol ; 219(3): 287.e1-287.e18, 2018 09.
Article in English | MEDLINE | ID: mdl-29852155

ABSTRACT

BACKGROUND: Next-generation sequencing is emerging as a viable alternative to chromosome microarray analysis for the diagnosis of chromosome disease syndromes. One next-generation sequencing methodology, copy number variation sequencing, has been shown to deliver high reliability, accuracy, and reproducibility for detection of fetal copy number variations in prenatal samples. However, its clinical utility as a first-tier diagnostic method has yet to be demonstrated in a large cohort of pregnant women referred for fetal chromosome testing. OBJECTIVE: We sought to evaluate copy number variation sequencing as a first-tier diagnostic method for detection of fetal chromosome anomalies in a general population of pregnant women with high-risk prenatal indications. STUDY DESIGN: This was a prospective analysis of 3429 pregnant women referred for amniocentesis and fetal chromosome testing for different risk indications, including advanced maternal age, high-risk maternal serum screening, and positivity for an ultrasound soft marker. Amniocentesis was performed by standard procedures. Amniocyte DNA was analyzed by copy number variation sequencing with a chromosome resolution of 0.1 Mb. Fetal chromosome anomalies including whole chromosome aneuploidy and segmental imbalances were independently confirmed by gold standard cytogenetic and molecular methods and their pathogenicity determined following guidelines of the American College of Medical Genetics for sequence variants. RESULTS: Clear interpretable copy number variation sequencing results were obtained for all 3429 amniocentesis samples. Copy number variation sequencing identified 3293 samples (96%) with a normal molecular karyotype and 136 samples (4%) with an altered molecular karyotype. 
A total of 146 fetal chromosome anomalies were detected, comprising 46 whole chromosome aneuploidies (pathogenic), 29 submicroscopic microdeletions/microduplications with known or suspected associations with chromosome disease syndromes (pathogenic), 22 other microdeletions/microduplications (likely pathogenic), and 49 variants of uncertain significance. Overall, the cumulative frequency of pathogenic/likely pathogenic and variants of uncertain significance chromosome anomalies in the patient cohort was 2.83% and 1.43%, respectively. In the 3 high-risk advanced maternal age, high-risk maternal serum screening, and ultrasound soft marker groups, the most common whole chromosome aneuploidy detected was trisomy 21, followed by sex chromosome aneuploidies, trisomy 18, and trisomy 13. Across all clinical indications, there was a similar incidence of submicroscopic copy number variations, with approximately equal proportions of pathogenic/likely pathogenic and variants of uncertain significance copy number variations. If karyotyping had been used as an alternate cytogenetics detection method, copy number variation sequencing would have returned a 1% higher yield of pathogenic or likely pathogenic copy number variations. CONCLUSION: In a large prospective clinical study, copy number variation sequencing delivered high reliability and accuracy for identifying clinically significant fetal anomalies in prenatal samples. Based on key performance criteria, copy number variation sequencing appears to be a well-suited methodology for first-tier diagnosis of pregnant women in the general population at risk of having a suspected fetal chromosome abnormality.


Subject(s)
Chromosome Disorders/diagnosis , DNA Copy Number Variations/genetics , Adult , Amniocentesis , Aneuploidy , China , Chromosome Aberrations , Chromosome Disorders/genetics , Down Syndrome/diagnosis , Female , High-Throughput Nucleotide Sequencing , Humans , Fluorescence In Situ Hybridization , Karyotyping , Microarray Analysis , Pregnancy , Prenatal Diagnosis , Prospective Studies , DNA Sequence Analysis , Sex Chromosome Aberrations , Trisomy 13 Syndrome/diagnosis , Trisomy 18 Syndrome/diagnosis
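
The cumulative frequencies in the abstract above follow directly from the anomaly counts it reports. As a quick arithmetic check, using only figures quoted in the abstract (the dictionary labels are shorthand chosen here), the totals can be recomputed like this.

```python
SAMPLES = 3429  # amniocentesis samples reported in the abstract

# Anomaly counts as listed in the abstract; labels are shorthand.
anomalies = {
    "whole chromosome aneuploidy (pathogenic)": 46,
    "syndrome-associated CNV (pathogenic)": 29,
    "other CNV (likely pathogenic)": 22,
    "variant of uncertain significance": 49,
}

total = sum(anomalies.values())
vus = anomalies["variant of uncertain significance"]
pathogenic_or_likely = total - vus

print(total)                                          # 146
print(round(100 * pathogenic_or_likely / SAMPLES, 2)) # 2.83
print(round(100 * vus / SAMPLES, 2))                  # 1.43
```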
8.
Hum Mutat ; 38(6): 669-677, 2017 06.
Article in English | MEDLINE | ID: mdl-28247551

ABSTRACT

Detailed characterization of chromosomal abnormalities, a common cause for congenital abnormalities and pregnancy loss, is critical for elucidating genes for human fetal development. Here, 2,186 product-of-conception samples were tested for copy-number variations (CNVs) at two clinical diagnostic centers using whole-genome sequencing and high-resolution chromosomal microarray analysis. We developed a new gene discovery approach to predict potential developmental genes and identified 275 candidate genes from CNVs detected from both datasets. Based on Mouse Genome Informatics (MGI) and Zebrafish model organism database (ZFIN), 75% of identified genes could lead to developmental defects when mutated. Genes involved in embryonic development, gene transcription, and regulation of biological processes were significantly enriched. Especially, transcription factors and gene families sharing specific protein domains predominated, which included known developmental genes such as HOX, NKX homeodomain genes, and helix-loop-helix containing HAND2, NEUROG2, and NEUROD1 as well as potential novel developmental genes. We observed that developmental genes were denser in certain chromosomal regions, enabling identification of 31 potential genomic loci with clustered genes associated with development.


Subject(s)
Chromosome Aberrations , Chromosome Disorders/genetics , Embryonic Development/genetics , Transcription Factors/genetics , Animals , Chromosome Disorders/pathology , DNA Copy Number Variations/genetics , Female , Human Genome , Humans , Mice , Microarray Analysis , Pregnancy , Zebrafish/genetics
9.
BMC Genomics ; 18(1): 396, 2017 05 22.
Article in English | MEDLINE | ID: mdl-28532386

ABSTRACT

BACKGROUND: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. RESULTS: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high-coverage for the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. CONCLUSIONS: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.


Subject(s)
Asian People/genetics , Metagenomics , Whole Genome Sequencing , Genetic Variation , Mitochondrial Genome/genetics , Humans
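
The abstract above describes a consensus call set built from two variant-calling pipelines and evaluated by its sensitivity. The abstract does not say how the consensus was formed, so the sketch below assumes the simplest rule, a plain intersection of the two call sets, and measures sensitivity against a hypothetical truth set. All variants shown are invented for illustration.

```python
def consensus_calls(*call_sets):
    """Variants reported by every caller (assumed rule: set intersection)."""
    return set.intersection(*(set(calls) for calls in call_sets))


def sensitivity(calls, truth):
    """Fraction of truth-set variants recovered by the call set."""
    if not truth:
        raise ValueError("truth set is empty")
    return len(set(calls) & set(truth)) / len(truth)


# Variants are (chrom, pos, ref, alt) tuples; all values are hypothetical.
caller_a = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("2", 75, "T", "C")}
caller_b = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("3", 10, "A", "C")}
truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("2", 75, "T", "C")}

consensus = consensus_calls(caller_a, caller_b)
print(len(consensus))                 # 3
print(sensitivity(consensus, truth))  # 0.75
```

Intersection favors specificity; a union or a vote-based rule would trade that for sensitivity.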
10.
Appl Opt ; 56(4): 816-822, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28158081

ABSTRACT

The radiation force of a high-energy laser caused by reflection at the input surface of a mounted KH2PO4 (KDP) crystal is studied, along with its effects on the second-harmonic generation (SHG) efficiency of the laser beam. A comprehensive model incorporating principles of momentum transfer, mechanics, and optics is proposed and used to study, in turn, the mechanical stress within the KDP crystal caused by the radiation force and the SHG efficiency affected by that stress. Moreover, the effects of the laser beam intensity on the radiation force, the stress, and the SHG efficiency are determined. The results demonstrate that a high-energy laser beam causes a macroscopic radiation force, which in turn negatively affects SHG efficiency.

11.
BMC Bioinformatics ; 17(1): 361, 2016 Sep 10.
Article in English | MEDLINE | ID: mdl-27612449

ABSTRACT

BACKGROUND: The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of typical computing resources available to most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. RESULTS: We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples were completed in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms.
CONCLUSIONS: Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.


Subject(s)
Human Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Genetic Databases , Humans
12.
Hum Mutat ; 37(11): 1209-1214, 2016 11.
Article in English | MEDLINE | ID: mdl-27507420

ABSTRACT

Understanding the evolution of disease-associated mutations is fundamental to analyzing the pathogenetics of diseases. Mutation, recombination (by GC-biased gene conversion, gBGC), and selection have been known to shape the evolution of disease-associated mutations, but how these evolutionary forces work together is still an open question. In this study, we analyzed several human large-scale datasets (1000 Genomes, ESP6500, ExAC and ClinVar), and found that base-biased mutagenesis generates more GC→AT than AT→GC mutations, while gBGC promotes the fixation of AT→GC mutations to balance the impact of base-biased mutation on the genome. Due to this effect of gBGC, purifying selection removes more deleterious AT→GC mutations than GC→AT mutations from the population, but many high-frequency (fixed and nearly fixed) deleterious AT→GC mutations remain, possibly due to high genetic load. As a special subset, disease-associated mutations follow this evolutionary rule, in which disease-associated GC→AT mutations are more enriched in rare mutations compared with AT→GC, while disease-associated AT→GC mutations are more enriched in mutations with high frequency. Thus, we present a base-biased evolutionary framework that explains the base-biased generation and accumulation of disease-associated mutations in human populations.


Subject(s)
Genetic Predisposition to Disease , Mutation , Base Composition , Genetic Databases , Molecular Evolution , Gene Conversion , Human Genome , Humans , Genetic Models , Genetic Recombination , Genetic Selection
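
The GC→AT versus AT→GC distinction used in the abstract above depends only on the reference and alternate base of each single-nucleotide substitution. As a minimal sketch with hypothetical input, substitutions could be binned into those two directions (plus a neutral "other" class for A↔T and G↔C changes) as shown below. The function name and labels are illustrative and not taken from the study.

```python
from collections import Counter

STRONG = {"G", "C"}  # bases forming three hydrogen bonds
WEAK = {"A", "T"}    # bases forming two hydrogen bonds


def mutation_class(ref: str, alt: str) -> str:
    """Label a single-nucleotide substitution by its GC/AT direction."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in STRONG | WEAK or alt not in STRONG | WEAK or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in STRONG and alt in WEAK:
        return "GC>AT"
    if ref in WEAK and alt in STRONG:
        return "AT>GC"
    return "other"  # A<->T or G<->C, GC content unchanged


# Hypothetical substitutions for illustration.
snvs = [("G", "A"), ("C", "T"), ("A", "G"), ("A", "T"), ("C", "G")]
print(Counter(mutation_class(ref, alt) for ref, alt in snvs))
```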
13.
Hum Mutat ; 37(3): 231-234, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26670213

ABSTRACT

As the amount of human genomic sequence available from personal genomes and exomes has increased, so too has the observation of genomic positions having two or more alternative alleles, so-called multiallelic sites. For portions of the haploid genome that are present in more than one copy, including segmental duplications, variation at such multisite variant positions becomes even more complex. Despite the frequency of multiallelic variants, a number of commonly used resources and tools in genomic research and diagnostics either do not support these multiallelic variants at all or require special modifications. Here, we explore the frequency of multiallelic sites in large samples with whole exome sequencing and discuss potential outcomes of failing to account for multiple variant alleles. We also briefly discuss some commonly utilized resources that fully support multiallelic sites.


Subject(s)
Alleles , Exome/genetics , Human Genome/genetics , Humans
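
In the VCF format, a multiallelic site of the kind discussed in the abstract above appears as a record whose ALT column lists two or more comma-separated alleles. The sketch below flags such records in tab-separated VCF data lines; the example records are hypothetical, and the abstract itself does not prescribe any particular file format or tool.

```python
def alt_alleles(vcf_line: str) -> list:
    """Return the ALT alleles of one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    return fields[4].split(",")  # columns: CHROM POS ID REF ALT ...


def is_multiallelic(vcf_line: str) -> bool:
    """True when the site carries two or more alternative alleles."""
    return len(alt_alleles(vcf_line)) >= 2


# Hypothetical records for illustration.
records = [
    "1\t1000\t.\tA\tG\t50\tPASS\t.",
    "1\t2000\t.\tC\tT,G\t60\tPASS\t.",
]
for line in records:
    print(line.split("\t")[1], is_multiallelic(line))
# 1000 False
# 2000 True
```

Tools that read only the first ALT allele would silently drop the second allele at position 2000, which is the kind of failure the abstract warns about.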
14.
Genome Res ; 23(5): 833-42, 2013 May.
Article in English | MEDLINE | ID: mdl-23296920

ABSTRACT

Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarray.


Subject(s)
Genotype , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism/genetics , Algorithms , Base Sequence , Human Genome Project , Humans
15.
Nature ; 467(7311): 52-8, 2010 Sep 02.
Article in English | MEDLINE | ID: mdl-20811451

ABSTRACT

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of

Subject(s)
DNA Copy Number Variations , Human Genome , Single Nucleotide Polymorphism , Population Groups/genetics , Human Genome Project , Humans
16.
BMC Genomics ; 16: 143, 2015 Feb 28.
Article in English | MEDLINE | ID: mdl-25765891

ABSTRACT

BACKGROUND: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls. RESULTS: This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%. CONCLUSIONS: In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.


Subject(s)
Exome/genetics , INDEL Mutation/genetics , Mutagenesis , Computational Biology , Human Genome , High-Throughput Nucleotide Sequencing , Human Genome Project , Humans , Machine Learning
17.
BMC Genomics ; 16: 214, 2015 Mar 19.
Article in English | MEDLINE | ID: mdl-25887218

ABSTRACT

BACKGROUND: Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high. RESULTS: We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki-Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants. CONCLUSIONS: The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation could also benefit from this technology.


Subject(s)
Genomics/methods , Chromosome Aberrations , Gene Library , Gene Rearrangement , Genetic Association Studies/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Workflow
18.
BMC Bioinformatics ; 15: 30, 2014 Jan 29.
Article in English | MEDLINE | ID: mdl-24475911

ABSTRACT

BACKGROUND: Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. RESULTS: To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. CONCLUSIONS: By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Internet , Software , Genome/genetics , Humans
19.
Blood ; 119(8): 1929-34, 2012 Feb 23.
Article in English | MEDLINE | ID: mdl-22219226

ABSTRACT

Factor VIII (FVIII) functions as a cofactor for factor IXa in the contact coagulation pathway and circulates in a protective complex with von Willebrand factor (VWF). Plasma FVIII activity is strongly influenced by environmental and genetic factors through VWF-dependent and -independent mechanisms. Single nucleotide polymorphisms (SNPs) of the coding and promoter sequence in the FVIII gene have been extensively studied for effects on FVIII synthesis, secretion, and activity, but impacts of non-disease-causing intronic SNPs remain largely unknown. We analyzed FVIII SNPs and FVIII activity in 10,434 healthy Americans of European (EA) or African (AA) descent in the Atherosclerosis Risk in Communities (ARIC) study. Among covariates, age, race, diabetes, and ABO contributed 2.2%, 3.5%, 4%, and 10.7% to FVIII intersubject variation, respectively. Four intronic FVIII SNPs associated with FVIII activity and 8 with FVIII-VWF ratio in a sex- and race-dependent manner. The FVIII haplotypes AT and GCTTTT also associated with FVIII activity. Seven VWF SNPs were associated with FVIII activity in EA subjects, but no FVIII SNPs were associated with VWF Ag. These data demonstrate that intronic SNPs could directly or indirectly influence intersubject variation of FVIII activity. Further investigation may reveal novel mechanisms of regulating FVIII expression and activity.


Subject(s)
Factor VIII/genetics , Factor VIII/metabolism , Single Nucleotide Polymorphism , von Willebrand Factor/genetics , Black or African American/genetics , Atherosclerosis/blood , Atherosclerosis/ethnology , Atherosclerosis/genetics , Base Sequence , Female , Gene Frequency , Genotype , Haplotypes , Humans , Introns/genetics , Linkage Disequilibrium , Male , Middle Aged , Prospective Studies , Risk Assessment , Risk Factors , Sex Factors , White People/genetics
20.
Proc Natl Acad Sci U S A ; 108(29): 11983-8, 2011 Jul 19.
Article in English | MEDLINE | ID: mdl-21730125

ABSTRACT

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
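The site frequency spectrum mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `site_frequency_spectrum`, the 0/1 haplotype encoding, and the toy data are all assumptions made for illustration. The function simply counts, for each k, how many sites have exactly k sampled chromosomes carrying the derived allele.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded site frequency spectrum (SFS).

    haplotypes: list of equal-length sequences, one per sampled chromosome,
    where 1 marks the derived allele and 0 the ancestral allele at a site
    (an assumed encoding, chosen for this illustration).
    The result is a list `sfs` of length n + 1, where sfs[k] is the number
    of sites at which exactly k of the n chromosomes carry the derived allele.
    """
    n = len(haplotypes)
    if n == 0:
        return []
    num_sites = len(haplotypes[0])
    if any(len(h) != num_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")
    # Derived-allele count at each site, tallied across sites.
    counts = Counter(
        sum(h[j] for h in haplotypes) for j in range(num_sites)
    )
    return [counts.get(k, 0) for k in range(n + 1)]


# Hypothetical toy data: 4 chromosomes, 5 sites.
toy = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
sfs = site_frequency_spectrum(toy)
```

In this toy example two sites are monomorphic (k = 0), two are singletons (k = 1), and one site has the derived allele on three chromosomes. A spectrum dominated by low-k entries corresponds to the abstract's observation that most variable sites are rare.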


Subject(s)
Demography , Molecular Evolution , Genes/genetics , Genetic Variation , Population Genetics , Genetic Models , Racial Groups/genetics , Gene Frequency , Genetic Drift , Genomics/methods , Humans , DNA Sequence Analysis