Results 1 - 9 of 9
1.
Gigascience ; 5(1): 42, 2016 Oct 11.
Article in English | MEDLINE | ID: mdl-27724973

ABSTRACT

BACKGROUND: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. FINDINGS: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. CONCLUSIONS: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , DNA/blood , Haplotypes , Humans , Reproducibility of Results
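Average read coverage depth, as quoted in the abstract above (100X), is conventionally the total number of sequenced bases divided by the genome size. The Python sketch below shows that arithmetic; the function names and the toy read count, read length, and genome size are hypothetical illustrations, not values from this dataset.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest whole number of reads that reaches at least the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 1,000 bp toy genome sequenced with 100 bp reads.
print(mean_coverage(300, 100, 1_000))  # 30.0
print(reads_needed(100, 100, 1_000))   # 1000
# Ratio of a 100X target to a 30X baseline (about 3.3).
print(100 / mean_coverage(300, 100, 1_000))
```

The last ratio illustrates the kind of comparison behind "approximately three-fold higher", assuming a typical baseline near 30X; the abstract itself does not state the baseline depth.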
2.
Nature ; 487(7406): 190-5, 2012 Jul 11.
Article in English | MEDLINE | ID: mdl-22785314

ABSTRACT

Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ∼100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.


Subject(s)
Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Alleles , Cell Line , Female , Gene Silencing , Genetic Variation , Haplotypes , Humans , Mutation , Reproducibility of Results , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/standards
3.
Proc Natl Acad Sci U S A ; 109(30): 11920-7, 2012 Jul 24.
Article in English | MEDLINE | ID: mdl-22797899

ABSTRACT

Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.


Subject(s)
Databases, Genetic , Genetic Variation , Genome, Human/genetics , Phenotype , Precision Medicine/methods , Software , Cell Line , Data Collection , Humans , Precision Medicine/trends , Sequence Analysis, DNA
4.
PLoS Genet ; 6(5): e1000954, 2010 May 20.
Article in English | MEDLINE | ID: mdl-20531933

ABSTRACT

While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI trace archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error dominates the results (G-to-A). It is still present in mismatches with 99% accuracy and only vanishes in mismatches at 99.99% accuracy or higher. The error appears to have entered into about 1% of the HapMap, possibly affecting other users that rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidating RNA editing in human and mouse and also revealing, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive provides a valuable resource for the investigation of the phenomena of DNA and RNA editing, as well as setting the stage for a comprehensive mapping of editing events in large-scale genomic datasets.


Subject(s)
DNA/genetics , Genomics , RNA Editing , APOBEC Deaminases , Adenosine Deaminase/genetics , Base Pair Mismatch , Cytidine Deaminase , Cytosine Deaminase/genetics , Humans , Multigene Family , RNA-Binding Proteins
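The trace-archive abstract above searches for clusters of same-type mismatches (for example, G-to-A) between sequencing traces and their reference genomes. A minimal Python sketch of that idea follows. It assumes a gap-free alignment of equal-length strings and a hypothetical cluster rule (a run of consecutive mismatches sharing one type, at least `min_size` long); the study's actual clustering criteria and quality thresholds are not given in the abstract.

```python
from itertools import groupby


def mismatches(ref: str, read: str) -> list[tuple[int, str]]:
    """List (position, 'X>Y') for every mismatch in a gap-free alignment."""
    if len(ref) != len(read):
        raise ValueError("this sketch assumes equal-length, gap-free alignments")
    return [(i, f"{r}>{q}") for i, (r, q) in enumerate(zip(ref, read)) if r != q]


def same_type_clusters(ref: str, read: str, min_size: int = 3) -> list[tuple[str, list[int]]]:
    """Group runs of consecutive mismatches that share one type (e.g. 'G>A').

    Hypothetical rule: a run is broken by any mismatch of a different type,
    and runs with fewer than min_size mismatches are discarded.
    """
    clusters = []
    for mtype, run in groupby(mismatches(ref, read), key=lambda m: m[1]):
        positions = [pos for pos, _ in run]
        if len(positions) >= min_size:
            clusters.append((mtype, positions))
    return clusters


ref = "GAGTGCGAGTAC"
read = "AAATACGAATAC"
print(same_type_clusters(ref, read))  # [('G>A', [0, 2, 4, 8])]
```

A real pipeline would also need per-base quality filtering, since the abstract notes that a systematic G-to-A sequencing error only vanishes at very high base-call accuracy.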
5.
Lancet ; 375(9725): 1525-35, 2010 May 01.
Article in English | MEDLINE | ID: mdl-20435227

ABSTRACT

BACKGROUND: The cost of genomic information has fallen steeply, but the clinical translation of genetic risk estimates remains unclear. We aimed to undertake an integrated analysis of a complete human genome in a clinical context. METHODS: We assessed a patient with a family history of vascular disease and early sudden death. Clinical assessment included analysis of this patient's full genome sequence, risk prediction for coronary artery disease, screening for causes of sudden cardiac death, and genetic counselling. Genetic analysis included the development of novel methods for the integration of whole genome and clinical risk. Disease and risk analysis focused on prediction of genetic risk of variants associated with mendelian disease, recognised drug responses, and pathogenicity for novel variants. We queried disease-specific mutation databases and pharmacogenomics databases to identify genes and mutations with known associations with disease and drug response. We estimated post-test probabilities of disease by applying likelihood ratios derived from integration of multiple common variants to age-appropriate and sex-appropriate pre-test probabilities. We also accounted for gene-environment interactions and conditionally dependent risks. FINDINGS: Analysis of 2.6 million single nucleotide polymorphisms and 752 copy number variations showed increased genetic risk for myocardial infarction, type 2 diabetes, and some cancers. We discovered rare variants in three genes that are clinically associated with sudden cardiac death: TMEM43, DSP, and MYBPC3. A variant in LPA was consistent with a family history of coronary artery disease. The patient had a heterozygous null mutation in CYP2C19 suggesting probable clopidogrel resistance, several variants associated with a positive response to lipid-lowering therapy, and variants in CYP4F2 and VKORC1 that suggest he might have a low initial dosing requirement for warfarin. Many variants of uncertain importance were reported.
INTERPRETATION: Although challenges remain, our results suggest that whole-genome sequencing can yield useful and clinically relevant information for individual patients. FUNDING: National Institute of General Medical Sciences; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; Howard Hughes Medical Institute; National Library of Medicine; Lucile Packard Foundation for Children's Health; Hewlett Packard Foundation; Breetwor Family Foundation.


Subject(s)
Genetic Predisposition to Disease/genetics , Genetic Testing , Genome, Human , Sequence Analysis, DNA , Vascular Diseases/genetics , Adult , Aryl Hydrocarbon Hydroxylases/genetics , Carrier Proteins/genetics , Cytochrome P-450 CYP2C19 , Cytochrome P-450 Enzyme System/genetics , Cytochrome P450 Family 4 , Death, Sudden, Cardiac , Desmoplakins/genetics , Environment , Family Health , Genetic Counseling , Humans , Lipoprotein(a)/genetics , Male , Membrane Proteins/genetics , Mixed Function Oxygenases/genetics , Mutation , Osteoarthritis/genetics , Pedigree , Pharmacogenetics , Polymorphism, Single Nucleotide , Risk Assessment , Vitamin K Epoxide Reductases
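The clinical-genome abstract above estimates post-test probabilities by applying likelihood ratios (LRs) to pre-test probabilities. The standard odds form of Bayes' rule is post-test odds = pre-test odds × LR, with odds = p / (1 − p). The Python sketch below applies that rule; multiplying several LRs assumes the variants are conditionally independent, which is a simplification, since the study states it also accounted for conditionally dependent risks. The numbers in the example are hypothetical.

```python
import math


def post_test_probability(pre_test_prob: float, likelihood_ratios: list[float]) -> float:
    """Update a pre-test probability with likelihood ratios via odds.

    Multiplying the ratios together assumes conditional independence of the
    variants; modelling dependent risks, as the study describes, is omitted.
    """
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * math.prod(likelihood_ratios)  # math.prod needs Python 3.8+
    return post_odds / (1.0 + post_odds)


# Hypothetical numbers: 10% pre-test risk, two variants with LR 2.0 and 1.5.
print(round(post_test_probability(0.10, [2.0, 1.5]), 4))  # 0.25
```

Working in odds rather than probabilities keeps each update a single multiplication and guarantees the result stays between 0 and 1.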
6.
Nature ; 460(7258): 1011-5, 2009 Aug 20.
Article in English | MEDLINE | ID: mdl-19587683

ABSTRACT

Recent advances in sequencing technologies have initiated an era of personal genome sequences. To date, human genome sequences have been reported for individuals with ancestry in three distinct geographical regions: a Yoruba African, two individuals of northwest European origin, and a person from China. Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1. The genome of AK1 was determined by an exacting, combined approach that included whole-genome shotgun sequencing (27.8x coverage), targeted bacterial artificial chromosome sequencing, and high-resolution comparative genomic hybridization using custom microarrays featuring more than 24 million probes. Alignment to the NCBI reference, a composite of several ethnic clades, disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 non-synonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels). SNP and indel densities were strongly correlated genome-wide. Applying very conservative criteria yielded highly reliable copy number variants for clinical considerations. Potential medical phenotypes were annotated for non-synonymous SNPs, coding domain indels, and structural variants. The integration of several human whole-genome sequences derived from several ethnic groups will assist in understanding genetic ancestry, migration patterns and population bottlenecks.


Subject(s)
Asian People/genetics , Genome, Human/genetics , Chromosomes, Artificial, Bacterial/genetics , Comparative Genomic Hybridization , Computational Biology , Humans , INDEL Mutation/genetics , Korea , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA
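The AK1 abstract above reports that SNP and indel densities were strongly correlated genome-wide. One common way to quantify such a relationship is to count variants in fixed windows and compute a Pearson correlation between the two count series. The Python sketch below assumes non-overlapping windows and uses made-up positions; it is not necessarily the analysis the authors performed.

```python
import math


def window_counts(positions: list[int], window: int, n_windows: int) -> list[int]:
    """Count variant positions falling in each fixed-size, non-overlapping window."""
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1  # assumes 0 <= pos < window * n_windows
    return counts


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Hypothetical positions on a 40 bp toy sequence, counted in 10 bp windows.
snps = [1, 3, 12, 14, 15, 18, 31]
indels = [2, 13, 17, 33]
snp_density = window_counts(snps, 10, 4)      # [2, 4, 0, 1]
indel_density = window_counts(indels, 10, 4)  # [1, 2, 0, 1]
print(pearson(snp_density, indel_density))
```

On real data the window size matters: very small windows give sparse, noisy counts, while very large windows average away local structure.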
7.
Genome Res ; 19(9): 1606-15, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19525355

ABSTRACT

Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (about 50,500) of target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human-chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.


Subject(s)
Chromosomes, Human, Pair 21/genetics , CpG Islands/genetics , DNA Probes/genetics , Genetic Variation , Genome, Human/genetics , Mutation , Sequence Analysis, DNA/methods , Animals , Computational Biology/methods , Genotype , Humans , Polymorphism, Single Nucleotide , Reproducibility of Results , Sensitivity and Specificity
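The padlock-capture abstract above reports two coverage summaries: the share of targets observed with at least one read, and the share of all targets falling within a 100-fold or 10-fold coverage range. The Python sketch below computes both. It assumes that "within an N-fold range" means the largest set of targets whose coverages fit in some interval [c, N*c]; that definition, the function names, and the example coverages are assumptions for illustration.

```python
def fraction_observed(coverages: list[int], min_reads: int = 1) -> float:
    """Share of targets with at least min_reads reads."""
    return sum(c >= min_reads for c in coverages) / len(coverages)


def fraction_within_fold_range(coverages: list[int], fold: float) -> float:
    """Largest share of all targets whose coverage fits in one [c, fold * c] interval.

    Uncovered targets (coverage 0) can never fall in such an interval but still
    count in the denominator. Uses a two-pointer sweep over sorted coverages.
    """
    if fold < 1:
        raise ValueError("fold must be at least 1")
    covered = sorted(c for c in coverages if c > 0)
    best, j = 0, 0
    for i, low in enumerate(covered):
        # Advance j past every coverage that still fits under fold * low.
        while j < len(covered) and covered[j] <= fold * low:
            j += 1
        best = max(best, j - i)
    return best / len(coverages)


# Hypothetical per-target read counts for six targets.
cov = [0, 1, 5, 10, 20, 200]
print(fraction_observed(cov))               # 5 of 6 targets have >= 1 read
print(fraction_within_fold_range(cov, 10))  # 0.5
print(fraction_within_fold_range(cov, 100))
```

Because the upper pointer `j` only moves forward as the lower bound rises, the sweep runs in linear time after sorting.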
8.
Bioinformatics ; 25(17): 2194-9, 2009 Sep 01.
Article in English | MEDLINE | ID: mdl-19549630

ABSTRACT

MOTIVATION: Primary data analysis methods are of critical importance in second generation DNA sequencing. Improved methods have the potential to increase yield and reduce error rates. Openly documented analysis tools enable users to understand the primary data, which is important for the optimization and validity of their scientific work. RESULTS: In this article, we describe Swift, a new tool for performing primary data analysis on the Illumina Solexa Sequencing Platform. Swift is the first tool, outside of the vendor's own software, that completes the full analysis process, from raw images through to base calls. As such it provides an alternative to, and independent validation of, the vendor-supplied tool. Our results show that Swift is able to increase yield by 13.8% at a comparable error rate.


Subject(s)
Sequence Analysis, DNA/methods , Software , Base Sequence , Computational Biology , Molecular Sequence Data
9.
Proc USENIX Annu Tech Conf ; 2008: 391-404, 2008 May 01.
Article in English | MEDLINE | ID: mdl-20514356

ABSTRACT

We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general purpose web server, along with access to distributed batch processing, cache and storage services. Each cluster exploits idle RAM and disk space for cache, and reserves disks in each node for high bandwidth storage. The batch processing service uses a variation of the MapReduce model. Virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4-8 gigabytes per second of disk I/O. Our intent is to use multiple clusters to process hundreds of simultaneous requests on multi-hundred terabyte data sets. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center.
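The Free Factory abstract above describes its batch service as a variation of the MapReduce model. For readers unfamiliar with the base model, here is a minimal single-process Python sketch of plain MapReduce (map each record to key/value pairs, group values by key, reduce each group), applied to a hypothetical k-mer count over DNA reads. It does not reproduce the Free Factory's distributed scheduling, virtual machines, or its specific variation of the model.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable, Iterator


def map_reduce(records: Iterable[str],
               mapper: Callable[[str], Iterator[tuple[str, int]]],
               reducer: Callable[[str, list[int]], int]) -> dict[str, int]:
    """Single-process MapReduce: map every record, group by key, reduce each group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def kmer_mapper(k: int) -> Callable[[str], Iterator[tuple[str, int]]]:
    """Build a hypothetical mapper that emits (k-mer, 1) for every k-mer in a read."""
    def mapper(read: str) -> Iterator[tuple[str, int]]:
        for i in range(len(read) - k + 1):
            yield read[i:i + k], 1
    return mapper


def sum_reducer(key: str, values: list[int]) -> int:
    """Add up the counts emitted for one key."""
    return sum(values)


reads = ["ACGT", "CGTA"]
print(map_reduce(reads, kmer_mapper(2), sum_reducer))
# {'AC': 1, 'CG': 2, 'GT': 2, 'TA': 1}
```

In a cluster setting the map calls and per-key reduce calls run on different nodes, and the grouping step becomes a network shuffle; keeping mapper and reducer as pure functions is what makes that distribution possible.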
