1.
Nat Methods ; 10(8): 774-80, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23852450

ABSTRACT

Transcriptional enhancers are a primary mechanism by which tissue-specific gene expression is achieved. Despite the importance of these regulatory elements in development, responses to environmental stresses and disease, testing enhancer activity in animals remains tedious, with a minority of enhancers having been characterized. Here we describe 'enhancer-FACS-seq' (eFS) for highly parallel identification of active, tissue-specific enhancers in Drosophila melanogaster embryos. Analysis of enhancers identified by eFS as being active in mesodermal tissues revealed enriched DNA binding site motifs of known and putative, previously uncharacterized mesodermal transcription factors. Naive Bayes classifiers using transcription factor binding site motifs accurately predicted mesodermal enhancer activity. Application of eFS to other cell types and organisms should accelerate the cataloging of enhancers and understanding how transcriptional regulation is encoded in them.
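As an illustration of the classification step mentioned in this abstract, the following is a minimal sketch of a Bernoulli naive Bayes classifier over motif presence/absence features. The abstract does not specify the feature encoding, smoothing, or training procedure, so everything here is an assumption: the function names `train_bernoulli_nb` and `predict`, the Laplace smoothing parameter `alpha`, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    X: list of binary feature vectors (1 = motif present in the sequence).
    y: list of class labels (e.g. 1 = active enhancer, 0 = inactive).
    alpha: Laplace smoothing constant (an assumption, not from the paper).
    Returns {label: (log_prior, [P(feature_j = 1 | label), ...])}.
    """
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (log_prior, probs)
    return model

def predict(model, x):
    """Return the label with the highest posterior log-score for vector x."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if xi else 1.0 - p) for xi, p in zip(x, probs)
        )
    return max(model, key=score)

# Hypothetical toy data: three motifs scored as present/absent in six sequences.
X_toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y_toy = [1, 1, 1, 0, 0, 0]
toy_model = train_bernoulli_nb(X_toy, y_toy)
```

The naive Bayes assumption treats motifs as conditionally independent given the class, which keeps the model small enough to train on a modest number of tested enhancers.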


Subject(s)
Amino Acid Motifs , Drosophila melanogaster/genetics , Flow Cytometry/methods , Gene Expression Regulation, Developmental , Animals , Binding Sites , Drosophila melanogaster/embryology , Enhancer Elements, Genetic , Green Fluorescent Proteins/biosynthesis , Green Fluorescent Proteins/genetics , Mesoderm , Sequence Analysis, DNA
2.
Proc Natl Acad Sci U S A ; 109(30): 11920-7, 2012 Jul 24.
Article in English | MEDLINE | ID: mdl-22797899

ABSTRACT

Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.


Subject(s)
Databases, Genetic , Genetic Variation , Genome, Human/genetics , Phenotype , Precision Medicine/methods , Software , Cell Line , Data Collection , Humans , Precision Medicine/trends , Sequence Analysis, DNA
3.
Nat Biotechnol ; 24(11): 1429-35, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16998473

ABSTRACT

Transcription factors (TFs) interact with specific DNA regulatory sequences to control gene expression throughout myriad cellular processes. However, the DNA binding specificities of only a small fraction of TFs are sufficiently characterized to predict the sequences that they can and cannot bind. We present a maximally compact, synthetic DNA sequence design for protein binding microarray (PBM) experiments that represents all possible DNA sequence variants of a given length k (that is, all 'k-mers') on a single, universal microarray. We constructed such all k-mer microarrays covering all 10-base pair (bp) binding sites by converting high-density single-stranded oligonucleotide arrays to double-stranded (ds) DNA arrays. Using these microarrays we comprehensively determined the binding specificities over a full range of affinities for five TFs of different structural classes from yeast, worm, mouse and human. The unbiased coverage of all k-mers permits high-throughput interrogation of binding site preferences, including nucleotide interdependencies, at unprecedented resolution.
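To make the idea of a "maximally compact" sequence covering all k-mers concrete, here is a minimal sketch of one standard construction with that property, a de Bruijn sequence. The abstract itself does not name the construction, so treating it as a de Bruijn sequence is an assumption here, and the function name `de_bruijn` is hypothetical. The sketch ignores practical design details such as splitting the sequence into probes or handling reverse complements.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over the given alphabet.

    Read cyclically, the result contains every possible k-mer exactly once,
    so its length is len(alphabet) ** k. Uses the standard recursive
    Lyndon-word construction.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    seq = []

    def db(t: int, p: int) -> None:
        if t > k:
            # Emit the current Lyndon word if its length divides k.
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)

# Small example with k = 3; k = 10 would give 4 ** 10 = 1,048,576 positions.
cyclic = de_bruijn("ACGT", 3)
# Append the first k - 1 bases so every k-mer also appears in the linear string.
linear = cyclic + cyclic[:2]
kmers = {linear[i:i + 3] for i in range(len(linear) - 2)}
```

Because consecutive k-mers overlap by k - 1 bases, the linear form needs only 4^k + k - 1 bases to cover all 4^k variants, rather than k bases per variant.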


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Protein Binding , Transcription Factors/metabolism , Animals , Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/chemistry , Binding Sites/physiology , Caenorhabditis elegans , Caenorhabditis elegans Proteins/chemistry , Early Growth Response Protein 1/chemistry , Homeodomain Proteins/chemistry , Humans , Mice , Octamer Transcription Factor-1/chemistry , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/chemistry , Shelterin Complex , Telomere-Binding Proteins/chemistry , Transcription Factors/chemistry
4.
Gigascience ; 5(1): 42, 2016 Oct 11.
Article in English | MEDLINE | ID: mdl-27724973

ABSTRACT

BACKGROUND: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. FINDINGS: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read (LFR) technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. CONCLUSIONS: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , DNA/blood , Haplotypes , Humans , Reproducibility of Results
5.
Sci Data ; 3: 160025, 2016 Jun 07.
Article in English | MEDLINE | ID: mdl-27271295

ABSTRACT

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.


Subject(s)
Benchmarking , Genome, Human , Exome , Genomics , Humans , INDEL Mutation
6.
Science ; 370(6523): 1422-1423, 2020 Dec 18.
Article in English | MEDLINE | ID: mdl-33335056
7.
Genome Med ; 6(2): 10, 2014 Feb 28.
Article in English | MEDLINE | ID: mdl-24713084

ABSTRACT

BACKGROUND: Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an 'open consent' framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment. DISCUSSION: Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project. SUMMARY: We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and participant education. Project results are long-term proactive participant involvement and the growth of a community that benefits both researchers and participants.

8.
Am J Clin Nutr ; 86(6): 1806-8; author reply 1808, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18065604
9.
PLoS One ; 4(4): e5242, 2009.
Article in English | MEDLINE | ID: mdl-19370158

ABSTRACT

BACKGROUND: Calorie restriction (CR) is the only intervention known to extend lifespan in a wide range of organisms, including mammals. However, the mechanisms by which it regulates mammalian aging remain largely unknown, and the involvement of the TOR and sirtuin pathways (which regulate aging in simpler organisms) remains controversial. Additionally, females of most mammals appear to live longer than males within species; and, although it remains unclear whether this holds true for mice, the relationship between sex-biased and CR-induced gene expression remains largely unexplored. METHODOLOGY/PRINCIPAL FINDINGS: We generated microarray gene expression data from livers of male mice fed high calorie or CR diets, and we find that CR significantly changes the expression of over 3,000 genes, many between 10- and 50-fold. We compare our data to the GenAge database of known aging-related genes and to prior microarray expression data of genes expressed differently between male and female mice. CR generally feminizes gene expression and many of the most significantly changed individual genes are involved in aging, hormone signaling, and p53-associated regulation of the cell cycle and apoptosis. Among the genes showing the largest and most statistically significant CR-induced expression differences are Ddit4, a key regulator of the TOR pathway, and Nnmt, a regulator of lifespan linked to the sirtuin pathway. Using western analysis we confirmed post-translational inhibition of the TOR pathway. CONCLUSIONS: Our data show that CR induces widespread gene expression changes and acts through highly evolutionarily conserved pathways, from microorganisms to mammals, and that its life-extension effects might arise partly from a shift toward a gene expression profile more typical of females.


Subject(s)
Aging/metabolism , Caloric Restriction , Gene Expression Regulation , Longevity , Aging/genetics , Animals , Apoptosis/physiology , Carrier Proteins/metabolism , Cell Cycle/physiology , Female , Gene Expression Profiling , Hormones/physiology , Liver/metabolism , Male , Mice , Nitrosamines/metabolism , Phosphorylation , Phosphotransferases (Alcohol Group Acceptor)/metabolism , Sex Factors , Signal Transduction/physiology , Sirtuins/metabolism , TOR Serine-Threonine Kinases , Transcription Factors/metabolism , Tyramine/analogs & derivatives , Tyramine/metabolism