Search | BVS Educação Profissional em Saúde

Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells.

Peters, Brock A; Kermani, Bahram G; Sparks, Andrew B; Alferov, Oleg; Hong, Peter; Alexeev, Andrei; Jiang, Yuan; Dahl, Fredrik; Tang, Y Tom; Haas, Juergen; Robasky, Kimberly; Zaranek, Alexander Wait; Lee, Je-Hyuk; Ball, Madeleine Price; Peterson, Joseph E; Perazich, Helena; Yeung, George; Liu, Jia; Chen, Linsu; Kennemer, Michael I; Pothuraju, Kaliprasad; Konvicka, Karel; Tsoupko-Sitnikov, Mike; Pant, Krishna P; Ebert, Jessica C; Nilsen, Geoffrey B; Baccash, Jonathan; Halpern, Aaron L; Church, George M; Drmanac, Radoje.

Nature ; 487(7406): 190-5, 2012 Jul 11.

Article in English | MEDLINE | ID: mdl-22785314

ABSTRACT

Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ~100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.

Subjects

Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Alleles , Cell Line , Female , Gene Silencing , Genetic Variation , Haplotypes , Humans , Mutation , Reproducibility of Results , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/standards

A highly annotated whole-genome sequence of a Korean individual.

Kim, Jong-Il; Ju, Young Seok; Park, Hansoo; Kim, Sheehyun; Lee, Seonwook; Yi, Jae-Hyuk; Mudge, Joann; Miller, Neil A; Hong, Dongwan; Bell, Callum J; Kim, Hye-Sun; Chung, In-Soon; Lee, Woo-Chung; Lee, Ji-Sun; Seo, Seung-Hyun; Yun, Ji-Young; Woo, Hyun Nyun; Lee, Heewook; Suh, Dongwhan; Lee, Seungbok; Kim, Hyun-Jin; Yavartanoo, Maryam; Kwak, Minhye; Zheng, Ying; Lee, Mi Kyeong; Park, Hyunjun; Kim, Jeong Yeon; Gokcumen, Omer; Mills, Ryan E; Zaranek, Alexander Wait; Thakuria, Joseph; Wu, Xiaodi; Kim, Ryan W; Huntley, Jim J; Luo, Shujun; Schroth, Gary P; Wu, Thomas D; Kim, HyeRan; Yang, Kap-Seok; Park, Woong-Yang; Kim, Hyungtae; Church, George M; Lee, Charles; Kingsmore, Stephen F; Seo, Jeong-Sun.

Nature ; 460(7258): 1011-5, 2009 Aug 20.

Article in English | MEDLINE | ID: mdl-19587683

ABSTRACT

Recent advances in sequencing technologies have initiated an era of personal genome sequences. To date, human genome sequences have been reported for individuals with ancestry in three distinct geographical regions: a Yoruba African, two individuals of northwest European origin, and a person from China. Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1. The genome of AK1 was determined by an exacting, combined approach that included whole-genome shotgun sequencing (27.8x coverage), targeted bacterial artificial chromosome sequencing, and high-resolution comparative genomic hybridization using custom microarrays featuring more than 24 million probes. Alignment to the NCBI reference, a composite of several ethnic clades, disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 non-synonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels). SNP and indel densities were strongly correlated genome-wide. Applying very conservative criteria yielded highly reliable copy number variants for clinical considerations. Potential medical phenotypes were annotated for non-synonymous SNPs, coding domain indels, and structural variants. The integration of several human whole-genome sequences derived from several ethnic groups will assist in understanding genetic ancestry, migration patterns and population bottlenecks.

Subjects

Asian People/genetics , Genome, Human/genetics , Chromosomes, Artificial, Bacterial/genetics , Comparative Genomic Hybridization , Computational Biology , Humans , INDEL Mutation/genetics , Korea , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA

A public resource facilitating clinical use of genomes.

Ball, Madeleine P; Thakuria, Joseph V; Zaranek, Alexander Wait; Clegg, Tom; Rosenbaum, Abraham M; Wu, Xiaodi; Angrist, Misha; Bhak, Jong; Bobe, Jason; Callow, Matthew J; Cano, Carlos; Chou, Michael F; Chung, Wendy K; Douglas, Shawn M; Estep, Preston W; Gore, Athurva; Hulick, Peter; Labarga, Alberto; Lee, Je-Hyuk; Lunshof, Jeantine E; Kim, Byung Chul; Kim, Jong-Il; Li, Zhe; Murray, Michael F; Nilsen, Geoffrey B; Peters, Brock A; Raman, Anugraha M; Rienhoff, Hugh Y; Robasky, Kimberly; Wheeler, Matthew T; Vandewege, Ward; Vorhaus, Daniel B; Yang, Joyce L; Yang, Luhan; Aach, John; Ashley, Euan A; Drmanac, Radoje; Kim, Seong-Jin; Li, Jin Billy; Peshkin, Leonid; Seidman, Christine E; Seo, Jeong-Sun; Zhang, Kun; Rehm, Heidi L; Church, George M.

Proc Natl Acad Sci U S A ; 109(30): 11920-7, 2012 Jul 24.

Article in English | MEDLINE | ID: mdl-22797899

ABSTRACT

Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.

Subjects

Databases, Genetic , Genetic Variation , Genome, Human/genetics , Phenotype , Precision Medicine/methods , Software , Cell Line , Data Collection , Humans , Precision Medicine/trends , Sequence Analysis, DNA

A survey of genomic traces reveals a common sequencing error, RNA editing, and DNA editing.

Zaranek, Alexander Wait; Levanon, Erez Y; Zecharia, Tomer; Clegg, Tom; Church, George M.

PLoS Genet ; 6(5): e1000954, 2010 May 20.

Article in English | MEDLINE | ID: mdl-20531933

ABSTRACT

While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI trace archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error dominates the results (G-to-A). It is still present in mismatches with 99% accuracy and only vanishes in mismatches at 99.99% accuracy or higher. The error appears to have entered into about 1% of the HapMap, possibly affecting other users that rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidating RNA editing in human and mouse and also revealing, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive provides a valuable resource for the investigation of the phenomena of DNA and RNA editing, as well as setting the stage for a comprehensive mapping of editing events in large-scale genomic datasets.
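The idea of looking for clusters of mismatches of the same type can be illustrated with a minimal sketch. It assumes a trace has already been aligned to its reference without gaps, and it treats a "cluster" simply as a minimum count of identical substitution types within one trace. The abstract does not give the authors' exact cluster definition or thresholds, so the function name `mismatch_clusters` and the parameter `min_cluster_size` are hypothetical.

```python
from collections import Counter


def mismatch_clusters(reference, read, min_cluster_size=3):
    """Return substitution types seen at least min_cluster_size times.

    Both sequences are assumed to be gap-free and already aligned,
    so position i in the read corresponds to position i in the reference.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")

    valid_bases = set("ACGT")
    # Count each substitution type, e.g. "G>A", ignoring ambiguous bases.
    counts = Counter(
        f"{ref_base}>{read_base}"
        for ref_base, read_base in zip(reference, read)
        if ref_base != read_base
        and ref_base in valid_bases
        and read_base in valid_bases
    )
    return {kind: n for kind, n in counts.items() if n >= min_cluster_size}


# Toy sequences (not real data): three G-to-A mismatches in one read.
clusters = mismatch_clusters("GGAGTGCG", "AGAATACG")
```

A real survey would also need base-quality filtering, since the abstract reports that a systematic G-to-A sequencing error dominates at lower quality thresholds.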

Subjects

DNA/genetics , Genomics , RNA Editing , APOBEC Deaminases , Adenosine Deaminase/genetics , Base Pair Mismatch , Cytidine Deaminase , Cytosine Deaminase/genetics , Humans , Multigene Family , RNA-Binding Proteins

Multiplex padlock targeted sequencing reveals human hypermutable CpG variations.

Li, Jin Billy; Gao, Yuan; Aach, John; Zhang, Kun; Kryukov, Gregory V; Xie, Bin; Ahlford, Annika; Yoon, Jung-Ki; Rosenbaum, Abraham M; Zaranek, Alexander Wait; LeProust, Emily; Sunyaev, Shamil R; Church, George M.

Genome Res ; 19(9): 1606-15, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19525355

ABSTRACT

Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (approximately 50,500) target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human-chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.

Subjects

Chromosomes, Human, Pair 21/genetics , CpG Islands/genetics , DNA Probes/genetics , Genetic Variation , Genome, Human/genetics , Mutation , Sequence Analysis, DNA/methods , Animals , Computational Biology/methods , Genotype , Humans , Polymorphism, Single Nucleotide , Reproducibility of Results , Sensitivity and Specificity

Clinical assessment incorporating a personal genome.

Ashley, Euan A; Butte, Atul J; Wheeler, Matthew T; Chen, Rong; Klein, Teri E; Dewey, Frederick E; Dudley, Joel T; Ormond, Kelly E; Pavlovic, Aleksandra; Morgan, Alexander A; Pushkarev, Dmitry; Neff, Norma F; Hudgins, Louanne; Gong, Li; Hodges, Laura M; Berlin, Dorit S; Thorn, Caroline F; Sangkuhl, Katrin; Hebert, Joan M; Woon, Mark; Sagreiya, Hersh; Whaley, Ryan; Knowles, Joshua W; Chou, Michael F; Thakuria, Joseph V; Rosenbaum, Abraham M; Zaranek, Alexander Wait; Church, George M; Greely, Henry T; Quake, Stephen R; Altman, Russ B.

Lancet ; 375(9725): 1525-35, 2010 May 01.

Article in English | MEDLINE | ID: mdl-20435227

ABSTRACT

BACKGROUND: The cost of genomic information has fallen steeply, but the clinical translation of genetic risk estimates remains unclear. We aimed to undertake an integrated analysis of a complete human genome in a clinical context. METHODS: We assessed a patient with a family history of vascular disease and early sudden death. Clinical assessment included analysis of this patient's full genome sequence, risk prediction for coronary artery disease, screening for causes of sudden cardiac death, and genetic counselling. Genetic analysis included the development of novel methods for the integration of whole genome and clinical risk. Disease and risk analysis focused on prediction of genetic risk of variants associated with mendelian disease, recognised drug responses, and pathogenicity for novel variants. We queried disease-specific mutation databases and pharmacogenomics databases to identify genes and mutations with known associations with disease and drug response. We estimated post-test probabilities of disease by applying likelihood ratios derived from integration of multiple common variants to age-appropriate and sex-appropriate pre-test probabilities. We also accounted for gene-environment interactions and conditionally dependent risks. FINDINGS: Analysis of 2.6 million single nucleotide polymorphisms and 752 copy number variations showed increased genetic risk for myocardial infarction, type 2 diabetes, and some cancers. We discovered rare variants in three genes that are clinically associated with sudden cardiac death: TMEM43, DSP, and MYBPC3. A variant in LPA was consistent with a family history of coronary artery disease. The patient had a heterozygous null mutation in CYP2C19 suggesting probable clopidogrel resistance, several variants associated with a positive response to lipid-lowering therapy, and variants in CYP4F2 and VKORC1 that suggest he might have a low initial dosing requirement for warfarin. Many variants of uncertain importance were reported.
INTERPRETATION: Although challenges remain, our results suggest that whole-genome sequencing can yield useful and clinically relevant information for individual patients. FUNDING: National Institute of General Medical Sciences; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; Howard Hughes Medical Institute; National Library of Medicine; Lucile Packard Foundation for Children's Health; Hewlett Packard Foundation; Breetwor Family Foundation.
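The step of applying likelihood ratios to a pre-test probability follows the standard odds form of Bayes' rule: convert the pre-test probability to odds, multiply by the likelihood ratios, and convert back. The sketch below assumes the variants contribute independently, which is a simplification (the abstract notes that conditionally dependent risks were handled separately). The function name and the numbers are illustrative assumptions, not values from the study.

```python
from math import prod


def post_test_probability(pre_test_probability, likelihood_ratios):
    """Combine a pre-test probability with likelihood ratios.

    Assumes the likelihood ratios are independent, so they can be
    multiplied together on the odds scale.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")

    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * prod(likelihood_ratios)
    return post_test_odds / (1.0 + post_test_odds)


# Illustrative only: a 20% pre-test probability and a single variant
# with a likelihood ratio of 2 (neither figure is from the paper).
example = post_test_probability(0.20, [2.0])
```

Working on the odds scale keeps the result inside the interval (0, 1) no matter how many ratios are combined, which multiplying probabilities directly would not guarantee.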

Subjects

Genetic Predisposition to Disease/genetics , Genetic Testing , Genome, Human , Sequence Analysis, DNA , Vascular Diseases/genetics , Adult , Aryl Hydrocarbon Hydroxylases/genetics , Carrier Proteins/genetics , Cytochrome P-450 CYP2C19 , Cytochrome P-450 Enzyme System/genetics , Cytochrome P450 Family 4 , Death, Sudden, Cardiac , Desmoplakins/genetics , Environment , Family Health , Genetic Counseling , Humans , Lipoprotein(a)/genetics , Male , Membrane Proteins/genetics , Mixed Function Oxygenases/genetics , Mutation , Osteoarthritis/genetics , Pedigree , Pharmacogenetics , Polymorphism, Single Nucleotide , Risk Assessment , Vitamin K Epoxide Reductases

Swift: primary data analysis for the Illumina Solexa sequencing platform.

Whiteford, Nava; Skelly, Tom; Curtis, Christina; Ritchie, Matt E; Löhr, Andrea; Zaranek, Alexander Wait; Abnizova, Irina; Brown, Clive.

Bioinformatics ; 25(17): 2194-9, 2009 Sep 01.

Article in English | MEDLINE | ID: mdl-19549630

ABSTRACT

MOTIVATION: Primary data analysis methods are of critical importance in second generation DNA sequencing. Improved methods have the potential to increase yield and reduce the error rates. Openly documented analysis tools enable the user to understand the primary data, which is important for the optimization and validity of their scientific work. RESULTS: In this article, we describe Swift, a new tool for performing primary data analysis on the Illumina Solexa Sequencing Platform. Swift is the first tool, outside of the vendor's own software, which completes the full analysis process, from raw images through to base calls. As such it provides an alternative to, and independent validation of, the vendor-supplied tool. Our results show that Swift is able to increase yield by 13.8%, at comparable error rate.

Subjects

Sequence Analysis, DNA/methods , Software , Base Sequence , Computational Biology , Molecular Sequence Data

The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A.

Gigascience ; 5(1): 42, 2016 Oct 11.

Article in English | MEDLINE | ID: mdl-27724973

ABSTRACT

BACKGROUND: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. FINDINGS: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. CONCLUSIONS: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

Subjects

Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , DNA/blood , Haplotypes , Humans , Reproducibility of Results

Free Factories: Unified Infrastructure for Data Intensive Web Services.

Zaranek, Alexander Wait; Clegg, Tom; Vandewege, Ward; Church, George M.

Proc USENIX Annu Tech Conf ; 2008: 391-404, 2008 May 01.

Article in English | MEDLINE | ID: mdl-20514356

ABSTRACT

We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general purpose web server, along with access to distributed batch processing, cache and storage services. Each cluster exploits idle RAM and disk space for cache, and reserves disks in each node for high bandwidth storage. The batch processing service uses a variation of the MapReduce model. Virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4-8 gigabytes per second of disk I/O. Our intent is to use multiple clusters to process hundreds of simultaneous requests on multi-hundred terabyte data sets. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center.
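For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal single-process sketch of the generic pattern: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step folds each group. It does not reproduce the Free Factory's own variation or API, which the abstract does not describe. The names `map_reduce`, `count_bases`, and `total` are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a mapper over records, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def count_bases(read):
    """Map step: emit (base, 1) for every base in a read."""
    for base in read:
        yield base, 1


def total(key, values):
    """Reduce step: sum the counts collected for one key."""
    return sum(values)


# Toy input (not real data): count bases across two short reads.
base_counts = map_reduce(["ACGT", "AACC"], count_bases, total)
```

In a cluster setting the map calls and the per-key reduce calls would be distributed across nodes; here they run sequentially to keep the pattern visible.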
