1.
Genome Res ; 27(1): 157-164, 2017 01.
Article in English | MEDLINE | ID: mdl-27903644

ABSTRACT

Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.


Subject(s)
Genome, Human/genetics , Genomics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Databases, Genetic , Exome/genetics , Genotype , Humans , INDEL Mutation/genetics , Pedigree , Polymorphism, Single Nucleotide , Software
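
The inheritance-consistency idea in the abstract above can be illustrated with a much simpler per-site check on one mother, father, and child trio. This is a minimal sketch and not the authors' method, which uses phased haplotype transmission across the whole pedigree. The function name and the genotype encoding (a pair of allele indices, 0 for reference and 1 for alternate) are assumptions made for illustration.

```python
from typing import Tuple

Genotype = Tuple[int, int]  # hypothetical encoding: two allele indices per site


def is_mendelian_consistent(mother: Genotype, father: Genotype, child: Genotype) -> bool:
    """Return True if the child could have inherited one allele from each parent."""
    a, b = child
    # Either allele order is allowed because the genotypes here are unphased.
    return (a in mother and b in father) or (b in mother and a in father)


# A heterozygous child of a heterozygous mother and a homozygous-reference father.
print(is_mendelian_consistent((0, 1), (0, 0), (0, 1)))
# A heterozygous child of two homozygous-reference parents (candidate de novo call or error).
print(is_mendelian_consistent((0, 0), (0, 0), (0, 1)))
```

A site that fails this check is not necessarily a calling error. As the abstract notes, many inconsistent calls in the study were de novo or cell-line mutations, or fell within previously unidentified duplications and deletions.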
2.
Mol Syst Biol ; 11(12): 852, 2015 Dec 23.
Article in English | MEDLINE | ID: mdl-26700852

ABSTRACT

Mammalian chromosomes fold into arrays of megabase-sized topologically associating domains (TADs), which are arranged into compartments spanning multiple megabases of genomic DNA. TADs have internal substructures that are often cell type specific, but their higher-order organization remains elusive. Here, we investigate TAD higher-order interactions with Hi-C through neuronal differentiation and show that they form a hierarchy of domains-within-domains (metaTADs) extending across genomic scales up to the range of entire chromosomes. We find that TAD interactions are well captured by tree-like, hierarchical structures irrespective of cell type. metaTAD tree structures correlate with genetic, epigenomic and expression features, and structural tree rearrangements during differentiation are linked to transcriptional state changes. Using polymer modelling, we demonstrate that hierarchical folding promotes efficient chromatin packaging without the loss of contact specificity, highlighting a role far beyond the simple need for packing efficiency.


Subject(s)
Chromatin/chemistry , Chromosomes/chemistry , Mouse Embryonic Stem Cells/cytology , Neurons/cytology , Transcription, Genetic , Animals , Cell Differentiation , Cells, Cultured , Chromatin Assembly and Disassembly , Epigenesis, Genetic , Gene Expression Regulation , Mice
3.
J Comput Chem ; 34(22): 1881-9, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23703289

ABSTRACT

Coarse-grained protein structure models offer increased efficiency in structural modeling, but these must be coupled with fast and accurate methods to revert to a full-atom structure. Here, we present a novel algorithm to reconstruct mainchain models from Cα traces. This has been parameterized by fitting Gaussian mixture models (GMMs) to short backbone fragments centered on idealized peptide bonds. The method we have developed is statistically significantly more accurate than several competing methods, both in terms of RMSD values and dihedral angle differences. The method produced Ramachandran dihedral angle distributions that are closer to those observed in real proteins and better Phaser molecular replacement log-likelihood gains. Amino acid residue sidechain reconstruction accuracy using SCWRL4 was found to be statistically significantly correlated to backbone reconstruction accuracy. Finally, the PD2 method was found to produce significantly lower energy full-atom models using Rosetta, which has implications for multiscale protein modeling using coarse-grained models. A webserver and C++ source code are freely available for noncommercial use from: http://www.sbg.bio.ic.ac.uk/phyre2/PD2_ca2main/.


Subject(s)
Algorithms , Carbon/chemistry , Molecular Dynamics Simulation , Proteins/chemistry , Software , Protein Conformation
4.
Nucleic Acids Res ; 39(Database issue): D141-5, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21062808

ABSTRACT

The Rfam database aims to catalogue non-coding RNAs through the use of sequence alignments and statistical profile models known as covariance models. In this contribution, we discuss the pros and cons of using the online encyclopedia, Wikipedia, as a source of community-derived annotation. We discuss the addition of groupings of related RNA families into clans and new developments to the website. Rfam is available on the Web at http://rfam.sanger.ac.uk.


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Encyclopedias as Topic , Models, Statistical , Nucleic Acid Conformation , RNA, Untranslated/classification , Sequence Alignment , Sequence Analysis, RNA
6.
Nat Biotechnol ; 37(5): 555-560, 2019 05.
Article in English | MEDLINE | ID: mdl-30858580

ABSTRACT

Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.


Subject(s)
Benchmarking , Exome/genetics , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Algorithms , Genomics/trends , Germ Cells , Humans , Polymorphism, Single Nucleotide/genetics , Software
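
The standard performance metrics mentioned in the abstract above are typically precision, recall, and their harmonic mean (F1), computed from counts of true-positive, false-positive, and false-negative calls against a truth set. The following is a minimal sketch of those textbook formulas. The function name and the example counts are hypothetical, and the code does not reproduce the GA4GH tools or their variant-matching step.

```python
from typing import Dict


def benchmark_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute precision, recall and F1 from variant-comparison counts.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = benchmark_metrics(tp=90, fp=10, fn=30)
print(metrics)
```

Counts of this kind are only meaningful inside the regions where the truth set is trusted, which is why the abstract stresses stratifying by high-confidence regions, variant type, and genome context.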
7.
Nat Biotechnol ; 37(5): 567, 2019 05.
Article in English | MEDLINE | ID: mdl-30899106

ABSTRACT

In the version of this article initially published online, two pairs of headings were switched with each other in Table 4: "Recall (PCR free)" was switched with "Recall (with PCR)," and "Precision (PCR free)" was switched with "Precision (with PCR)." The error has been corrected in the print, PDF and HTML versions of this article.

8.
Elife ; 6, 2017 09 26.
Article in English | MEDLINE | ID: mdl-28949289

ABSTRACT

Sonic hedgehog (Shh) expression in the limb bud organizing centre called the zone of polarizing activity is regulated by the ZRS enhancer. Here, we examine in mouse and in a mouse limb-derived cell line the dynamic events that activate and restrict the spatial activity of the ZRS. Fibroblast growth factor (FGF) signalling in the distal limb primes the ZRS at early embryonic stages maintaining a poised, but inactive state broadly across the distal limb mesenchyme. The E26 transformation-specific transcription factor, ETV4, which is induced by FGF signalling and acts as a repressor of ZRS activity, interacts with the histone deacetylase HDAC2 and ensures that the poised ZRS remains transcriptionally inactive. Conversely, GABPα, an activator of the ZRS, recruits p300, which is associated with histone acetylation (H3K27ac) indicative of an active enhancer. Hence, the primed but inactive state of the ZRS is induced by FGF signalling and in combination with balanced histone modification events establishes the restricted, active enhancer responsible for patterning the limb bud during development.


Subject(s)
Chromatin/metabolism , Extremities/embryology , Fibroblast Growth Factors/metabolism , Hedgehog Proteins/metabolism , Histones/metabolism , Protein Processing, Post-Translational , Proto-Oncogene Proteins c-ets/metabolism , Acetylation , Animals , Histone Deacetylase 2/metabolism , Mice , Protein Binding
9.
Genome Biol ; 16: 110, 2015 May 27.
Article in English | MEDLINE | ID: mdl-26013771

ABSTRACT

BACKGROUND: Interphase chromosomes adopt a hierarchical structure, and recent data have characterized their chromatin organization at very different scales, from sub-genic regions associated with DNA-binding proteins at the order of tens or hundreds of bases, through larger regions with active or repressed chromatin states, up to multi-megabase-scale domains associated with nuclear positioning, replication timing and other qualities. However, we have lacked detailed, quantitative models to understand the interactions between these different strata. RESULTS: Here we collate large collections of matched locus-level chromatin features and Hi-C interaction data, representing higher-order organization, across three human cell types. We use quantitative modeling approaches to assess whether locus-level features are sufficient to explain higher-order structure, and identify the most influential underlying features. We identify structurally variable domains between cell types and examine the underlying features to discover a general association with cell-type-specific enhancer activity. We also identify the most prominent features marking the boundaries of two types of higher-order domains at different scales: topologically associating domains and nuclear compartments. We find parallel enrichments of particular chromatin features for both types, including features associated with active promoters and the architectural proteins CTCF and YY1. CONCLUSIONS: We show that integrative modeling of large chromatin dataset collections using random forests can generate useful insights into chromosome structure. The models produced recapitulate known biological features of the cell types involved, allow exploration of the antecedents of higher-order structures and generate testable hypotheses for further experimental studies.


Subject(s)
Chromatin/genetics , Models, Molecular , CCCTC-Binding Factor , Cell Line , DNA Replication Timing , DNA-Binding Proteins/genetics , Databases, Genetic , Epigenesis, Genetic , Genetic Loci , Humans , K562 Cells , Multigene Family , Repressor Proteins/genetics , Repressor Proteins/metabolism