Abstract
BACKGROUND: Increasing numbers of healthy individuals are undergoing predispositional personal genome sequencing. Here we describe the design and early outcomes of the PeopleSeq Consortium, a multi-cohort collaboration of predispositional genome sequencing projects, which is examining the medical, behavioral, and economic outcomes of returning genomic sequencing information to healthy individuals. METHODS: Apparently healthy adults who participated in four of the sequencing projects in the Consortium were included. Web-based surveys were administered before and after genomic results disclosure, or in some cases only after results disclosure. Surveys inquired about sociodemographic characteristics, motivations and concerns, behavioral and medical responses to sequencing results, and perceived utility. RESULTS: Among 1395 eligible individuals, 658 enrolled in the Consortium when contacted and 543 have completed a survey after receiving their genomic results thus far (mean age 53.0 years, 61.4% male, 91.7% white, 95.5% college graduates). Most participants (98.1%) were motivated to undergo sequencing because of curiosity about their genetic make-up. The most commonly reported concerns prior to pursuing sequencing included how well the results would predict future risk (59.2%) and the complexity of genetic variant interpretation (56.8%), while 47.8% of participants were concerned about the privacy of their genetic information. Half of participants reported discussing their genomic results with a healthcare provider during a median of 8.0 months after receiving the results; 13.5% reported making an additional appointment with a healthcare provider specifically because of their results. Few participants (< 10%) reported making changes to their diet, exercise habits, or insurance coverage because of their results. Many participants (39.5%) reported learning something new to improve their health that they did not know before. 
Reporting regret or harm from the decision to undergo sequencing was rare (< 3.0%). CONCLUSIONS: Healthy individuals who underwent predispositional sequencing expressed some concern around privacy prior to pursuing sequencing, but were enthusiastic about their experience and not distressed by their results. While reporting value in their health-related results, few participants reported making medical or lifestyle changes.
Subject(s)
Genetic Predisposition to Disease/psychology , Genetic Testing , Health Knowledge, Attitudes, Practice , Precision Medicine/psychology , Whole Genome Sequencing , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Motivation , Surveys and Questionnaires
Abstract
The Personal Genome Project (PGP) is an effort to enroll many participants to create an open-access repository of genome, health and trait data for research. However, PGP participants are not enrolled to study any specific traits, and participants choose which phenotypes to disclose. To measure the extent of phenotype reporting and participants' willingness to report, and to encourage and guide participants to contribute phenotypes, we developed an algorithm to score and rank the phenotypes and participants of the PGP. The scoring algorithm calculates the participation index (P-index) for every participant, where 0 indicates no reported phenotypes and 100 indicates complete phenotype reporting. We calculated the P-index for all 5,015 participants in the PGP; the values ranged from 0 to 96.7. We found that participants mainly have either high scores (P-index > 90, 29.5%) or low scores (P-index < 10, 57.8%). While there are significantly more male than female participants (1,793 versus 1,271), females tend to have higher P-indexes on average (P = 0.015). We also reported the P-indexes of participants by demographics and by state; states such as Missouri and Massachusetts have higher P-indexes than states such as Utah and Minnesota. The P-index can therefore be used as an unbiased way to measure and rank participants' phenotypic contributions to the PGP.
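The abstract gives only the endpoints of the P-index scale (0 for no reported phenotypes, 100 for complete reporting) and does not describe how individual phenotypes are scored. As a minimal sketch, assuming the score is simply the weighted share of phenotype categories a participant has reported, the idea could be illustrated as follows. The function name `p_index`, the category names, and the weights are hypothetical and are not taken from the published algorithm.

```python
from typing import Mapping, Set


def p_index(reported: Set[str], weights: Mapping[str, float]) -> float:
    """Return a 0-100 score: the weighted share of phenotype categories reported.

    Assumption for illustration only; the published P-index may weight or
    combine phenotypes differently.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    covered = sum(w for name, w in weights.items() if name in reported)
    return 100.0 * covered / total


# Hypothetical phenotype categories and weights.
WEIGHTS = {"allergies": 1.0, "medications": 1.0, "conditions": 2.0}

print(p_index(set(), WEIGHTS))           # nothing reported -> 0.0
print(p_index({"conditions"}, WEIGHTS))  # 2 of 4 weight units -> 50.0
print(p_index(set(WEIGHTS), WEIGHTS))    # everything reported -> 100.0
```

Under this assumption, ranking participants reduces to sorting them by the returned score.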
Subject(s)
Phenotype , Algorithms , Cohort Studies , Disease , Female , Genome, Human , Geography , Humans , Male , Quantitative Trait, Heritable , Surveys and Questionnaires , United States
Abstract
BACKGROUND: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. FINDINGS: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read (LFR) technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. CONCLUSIONS: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , DNA/blood , Haplotypes , Humans , Reproducibility of Results
Abstract
BACKGROUND: Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an 'open consent' framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment. DISCUSSION: Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project. SUMMARY: We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and by participant education. The project has resulted in long-term, proactive participant involvement and the growth of a community that benefits both researchers and participants.
Abstract
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
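The two-step process described in this abstract (work through a prioritized list to record evaluations, then sort the reviewed variants by those annotations) can be sketched roughly as below. This is an illustration only and is not the GET-Evidence implementation: the `Variant` class, its `priority` and `evidence` fields, and the example variant labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    name: str                       # hypothetical variant label
    priority: float                 # hypothetical prioritization score (higher = review sooner)
    evidence: Optional[int] = None  # hypothetical reviewer score; None until evaluated


def review_queue(variants: List[Variant]) -> List[Variant]:
    """Step 1: list unevaluated variants, highest priority first."""
    pending = [v for v in variants if v.evidence is None]
    return sorted(pending, key=lambda v: v.priority, reverse=True)


def sort_reviewed(variants: List[Variant]) -> List[Variant]:
    """Step 2: order evaluated variants by their recorded evidence score."""
    reviewed = [v for v in variants if v.evidence is not None]
    return sorted(reviewed, key=lambda v: v.evidence, reverse=True)


variants = [
    Variant("var-A", priority=0.2),
    Variant("var-B", priority=0.9),
    Variant("var-C", priority=0.5, evidence=1),
    Variant("var-D", priority=0.4, evidence=4),
]

print([v.name for v in review_queue(variants)])   # ['var-B', 'var-A']
print([v.name for v in sort_reviewed(variants)])  # ['var-D', 'var-C']
```

Keeping the two steps as separate functions mirrors the split the abstract describes between human review and automatic sorting.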
Subject(s)
Databases, Genetic , Genetic Variation , Genome, Human/genetics , Phenotype , Precision Medicine/methods , Software , Cell Line , Data Collection , Humans , Precision Medicine/trends , Sequence Analysis, DNA
Abstract
Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with a history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
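The abstract does not say how the major allele reference sequence was constructed. One plausible reading, sketched here under that assumption, is that each position with known population allele frequencies is set to the most frequent allele in the matching population. The function name, the 0-based positions, and the frequencies below are hypothetical, and the sketch handles single-base substitutions only.

```python
from typing import Dict


def major_allele_reference(reference: str,
                           allele_freqs: Dict[int, Dict[str, float]]) -> str:
    """Return a copy of `reference` with each listed position set to its major allele.

    `allele_freqs` maps a 0-based position to {allele: population frequency}.
    Only single-base substitutions are handled in this sketch.
    """
    bases = list(reference)
    for pos, freqs in allele_freqs.items():
        major = max(freqs, key=freqs.get)  # allele with the highest frequency
        bases[pos] = major
    return "".join(bases)


# Hypothetical example: at position 1 the reference carries the minor allele C.
reference = "ACGT"
freqs = {1: {"C": 0.2, "T": 0.8}, 3: {"T": 0.9, "G": 0.1}}

print(major_allele_reference(reference, freqs))  # ATGT
```

Position 3 already carries the major allele, so only position 1 changes.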
Subject(s)
DNA Mutational Analysis/methods , Genes, Synthetic , Genetic Variation , Genome-Wide Association Study/methods , Thrombophilia/genetics , Alleles , Base Sequence , Female , Genetic Predisposition to Disease , Genome, Human , Genotype , Haplotypes , Humans , Male , Pedigree , Reference Standards , Risk Assessment , Sequence Alignment , Sequence Analysis, DNA
Abstract
DNA methylation has been traditionally viewed as a highly stable epigenetic mark in postmitotic cells. However, postnatal brains appear to show stimulus-induced methylation changes, at least in a few identified CpG dinucleotides. How extensively the neuronal DNA methylome is regulated by neuronal activity is unknown. Using a next-generation sequencing-based method for genome-wide analysis at single-nucleotide resolution, we quantitatively compared the CpG methylation landscape of adult mouse dentate granule neurons in vivo before and after synchronous neuronal activation. About 1.4% of 219,991 CpGs measured showed rapid active demethylation or de novo methylation. Some modifications remained stable for at least 24 h. These activity-modified CpGs showed a broad genomic distribution with significant enrichment in low-CpG density regions, and were associated with brain-specific genes related to neuronal plasticity. Our study implicates modification of the neuronal DNA methylome as a previously underappreciated mechanism for activity-dependent epigenetic regulation in the adult nervous system.
Subject(s)
DNA Methylation/physiology , Hippocampus/cytology , Hippocampus/physiology , Neurons/physiology , Animals , Chromosome Mapping/methods , CpG Islands/genetics , CpG Islands/physiology , Epigenomics/methods , Gene Expression Regulation/genetics , Gene Expression Regulation/physiology , Genomics/methods , Mice , Molecular Sequence Data , Motor Activity/genetics , Physical Conditioning, Animal , Statistics as Topic
Abstract
Studies of epigenetic modifications would benefit from improved methods for high-throughput methylation profiling. We introduce two complementary approaches that use next-generation sequencing technology to detect cytosine methylation. In the first method, we designed approximately 10,000 bisulfite padlock probes to profile approximately 7,000 CpG locations distributed over the ENCODE pilot project regions and applied them to human B-lymphocytes, fibroblasts and induced pluripotent stem cells. This unbiased choice of targets takes advantage of existing expression and chromatin immunoprecipitation data and enabled us to observe a pattern of low promoter methylation and high gene-body methylation in highly expressed genes. The second method, methyl-sensitive cut counting, generated nontargeted genome-scale data for approximately 1.4 million HpaII sites in the DNA of B-lymphocytes and confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome. Our observations highlight the usefulness of techniques that are not inherently or intentionally biased towards particular subsets like CpG islands or promoter regions.