ABSTRACT
Polycomb group (PcG) proteins are required for the epigenetic maintenance of developmental genes in a silent state. Proteins in the Polycomb-repressive complex 1 (PRC1) class of the PcG are conserved from flies to humans and inhibit transcription. One hypothesis for PRC1 mechanism is that it compacts chromatin, based in part on electron microscopy experiments demonstrating that Drosophila PRC1 compacts nucleosomal arrays. We show that this function is conserved between Drosophila and mouse PRC1 complexes and requires a region with an overrepresentation of basic amino acids. While the active region is found in the Posterior Sex Combs (PSC) subunit in Drosophila, it is unexpectedly found in a different PRC1 subunit, a Polycomb homolog called M33, in mice. We provide experimental support for the general importance of a charged region by predicting the compacting capability of PcG proteins from species other than Drosophila and mice and by testing several of these proteins using solution assays and microscopy. We infer that the ability of PcG proteins to compact chromatin in vitro can be predicted by the presence of domains of high positive charge and that PRC1 components from a variety of species conserve this highly charged region. This supports the hypothesis that compaction is a key aspect of PcG function.
Subjects
Chromatin/metabolism, Repressor Proteins/chemistry, Repressor Proteins/metabolism, Animals, Cell Line, Conserved Sequence/genetics, Drosophila melanogaster/classification, Drosophila melanogaster/genetics, Drosophila melanogaster/metabolism, Molecular Evolution, Mice, Mutation, Phylogeny, Polycomb Repressive Complex 1, Polycomb-Group Proteins, Repressor Proteins/genetics, Structure-Activity Relationship
ABSTRACT
BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.
Subjects
Computational Biology, Cooperative Behavior, Software, Communication, Internet
ABSTRACT
Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics.
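The idea of composing ad hoc queries over variants stored alongside their annotations in one database can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module; the table layout, column names, gene names, and rows below are hypothetical toy values chosen for illustration and are not GEMINI's actual schema or data.

```python
import sqlite3

# Hypothetical, heavily simplified schema: one row per variant, with a few
# annotation columns stored next to the variant itself (an assumption made
# for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, "
    "gene TEXT, impact TEXT, in_dbsnp INTEGER, clinvar_sig TEXT)"
)

# Toy rows; gene names and positions are made up.
rows = [
    ("chr1", 10100, "A", "G", "GENE_A", "missense", 1, None),
    ("chr2", 20200, "C", "T", "GENE_B", "stop_gained", 0, "pathogenic"),
    ("chr3", 30300, "G", "A", "GENE_C", "synonymous", 1, None),
]
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def query_variants(connection, where_clause, params=()):
    """Return (chrom, pos, gene, impact) for variants matching a WHERE clause."""
    sql = (
        "SELECT chrom, pos, gene, impact FROM variants WHERE "
        + where_clause
        + " ORDER BY chrom, pos"
    )
    return connection.execute(sql, params).fetchall()


# Example filter: variants absent from dbSNP with a potentially damaging impact.
novel_damaging = query_variants(
    conn, "in_dbsnp = 0 AND impact IN (?, ?)", ("stop_gained", "missense")
)
print(novel_damaging)
```

Keeping annotations in the same table as the variants is what lets a single WHERE clause combine several filters; a real database of this kind would also need genotype and sample tables, which this sketch omits.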
Subjects
Genetic Databases, Genetic Variation, Human Genome, Genomics/methods, Software, Data Mining, Genotype, Humans
ABSTRACT
Long noncoding RNAs (lncRNAs) have important regulatory roles and can function at the level of chromatin. To determine where lncRNAs bind to chromatin, we developed capture hybridization analysis of RNA targets (CHART), a hybridization-based technique that specifically enriches endogenous RNAs along with their targets from reversibly cross-linked chromatin extracts. CHART was used to enrich the DNA and protein targets of endogenous lncRNAs from flies and humans. This analysis was extended to genome-wide mapping of roX2, a well-studied ncRNA involved in dosage compensation in Drosophila. CHART revealed that roX2 binds at specific genomic sites that coincide with the binding sites of proteins from the male-specific lethal complex that affects dosage compensation. These results reveal the genomic targets of roX2 and demonstrate how CHART can be used to study RNAs in a manner analogous to chromatin immunoprecipitation for proteins.
Subjects
Drosophila Proteins/genetics, Drosophila/genetics, Genomics, Untranslated RNA/genetics, RNA-Binding Proteins/genetics, Amino Acid Motifs, Animals, Binding Sites, Chromatin/chemistry, Chromatin/genetics, Chromatin Immunoprecipitation, Genetic Dosage Compensation, Male, Genetic Models, Nucleic Acid Hybridization, Ribonuclease H/chemistry
ABSTRACT
BACKGROUND: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines. RESULTS: We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source. CONCLUSIONS: Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.
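To make concrete what parsing a phylogenetic tree format involves, here is a minimal pure-Python sketch of a recursive-descent reader for simple Newick strings. It is not Bio.Phylo's implementation or API: the function names `parse_newick` and `leaf_names` are hypothetical, and the sketch ignores quoted labels, comments, and other parts of the format that a real parser must handle.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Read the optional label that follows a leaf or a closed subtree.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label = text[start:pos]
        name = label.split(":", 1)[0]  # drop an optional ":branch_length" suffix
        return (name, children)

    root = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaf_names(node):
    """Collect terminal (leaf) names in left-to-right order."""
    name, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
print(leaf_names(tree))
```

Once a tree is held in one in-memory structure like this, the same traversal code works regardless of which file format the tree came from, which is the kind of format-independent interface the abstract describes.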
Subjects
Phylogeny, Software, Computational Biology/methods
ABSTRACT
The engineering of biological components has been facilitated by de novo synthesis of gene-length DNA. Biological engineering at the level of pathways and genomes, however, requires a scalable and cost-effective assembly of DNA molecules that are longer than approximately 10 kb, and this remains a challenge. Here we present the development of pairwise selection assembly (PSA), a process that involves hierarchical construction of long-length DNA through the use of a standard set of components and operations. In PSA, activation tags at the termini of assembly sub-fragments are reused throughout the assembly process to activate vector-encoded selectable markers. Marker activation enables stringent selection for a correctly assembled product in vivo, often obviating the need for clonal isolation. Importantly, construction via PSA is sequence-independent, and does not require primary sequence modification (e.g. the addition or removal of restriction sites). The utility of PSA is demonstrated in the construction of a completely synthetic 91-kb chromosome arm from Saccharomyces cerevisiae.
Subjects
DNA/chemical synthesis, Genetic Engineering/methods, Saccharomyces cerevisiae/genetics, Base Sequence, Fungal Chromosomes, DNA/chemistry
ABSTRACT
SUMMARY: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. AVAILABILITY: Biopython is freely available, with documentation and source code at (www.biopython.org) under the Biopython license.
Subjects
Computational Biology/methods, Software, Factual Databases, Internet, Programming Languages
ABSTRACT
Recent genome-wide association studies identified the angiotensin-converting enzyme gene (ACE) as an Alzheimer's disease (AD) risk locus. However, the pathogenic mechanism by which ACE causes AD is unknown. Using whole-genome sequencing, we identified rare ACE coding variants in AD families and investigated one, ACE1 R1279Q, in knockin (KI) mice. Similar to AD, ACE1 was increased in neurons, but not microglia or astrocytes, of KI brains, which became elevated further with age. Angiotensin II (angII) and angII receptor AT1R signaling were also increased in KI brains. Autosomal dominant neurodegeneration and neuroinflammation occurred with aging in KI hippocampus, which were absent in the cortex and cerebellum. Female KI mice exhibited greater hippocampal electroencephalograph disruption and memory impairment compared to males. ACE variant effects were more pronounced in female KI mice, suggesting a mechanism for higher AD risk in women. Hippocampal neurodegeneration was completely rescued by treatment with brain-penetrant drugs that inhibit ACE1 and AT1R. Although ACE variant-induced neurodegeneration did not depend on β-amyloid (Aβ) pathology, amyloidosis in 5XFAD mice crossed to KI mice accelerated neurodegeneration and neuroinflammation, whereas Aβ deposition was unchanged. KI mice had normal blood pressure and cerebrovascular functions. Our findings strongly suggest that increased ACE1/angII signaling causes aging-dependent, Aβ-accelerated selective hippocampal neuron vulnerability and female susceptibility, hallmarks of AD that have hitherto been enigmatic. We conclude that repurposed brain-penetrant ACE inhibitors and AT1R blockers may protect against AD.
Subjects
Alzheimer Disease, Alzheimer Disease/drug therapy, Alzheimer Disease/genetics, Amyloid beta-Peptides/metabolism, Angiotensin-Converting Enzyme Inhibitors/pharmacology, Angiotensin-Converting Enzyme Inhibitors/therapeutic use, Animals, Brain/metabolism, Animal Disease Models, Female, Genome-Wide Association Study, Male, Mice, Transgenic Mice
ABSTRACT
With the advent of whole-genome sequencing (WGS) studies, family-based designs enable sex-specific analysis approaches that can be applied to only affected individuals; tests using family-based designs are attractive because they are completely robust against the effects of population substructure. These advantages make family-based association tests (FBATs) that use siblings as well as parents especially suited for the analysis of late-onset diseases such as Alzheimer's Disease (AD). However, the application of FBATs to assess sex-specific effects can require additional filtering steps, as sensitivity to sequencing errors is amplified in this type of analysis. Here, we illustrate the implementation of robust analysis approaches and additional filtering steps that can minimize the chances of false-positive findings due to sex-specific sequencing errors. We apply this approach to two family-based AD datasets and identify four novel loci (GRID1, RIOK3, MCPH1, ZBTB7C) showing sex-specific association with AD risk. Following stringent quality control filtering, the strongest candidate is ZBTB7C (P_inter = 1.83 × 10^-7), in which the minor allele of rs1944572 confers increased risk for AD in females and protection in males. ZBTB7C encodes the Zinc Finger and BTB Domain Containing 7C, a transcriptional repressor of membrane metalloproteases (MMP). Members of this MMP family were implicated in AD neuropathology.
Subjects
Alzheimer Disease/genetics, Data Analysis, Genetic Databases, Family, Genetic Association Studies, Genetic Loci/genetics, Genome-Wide Association Study, Intracellular Signaling Peptides and Proteins/genetics, Whole Genome Sequencing, Alleles, BTB-POZ Domain/genetics, Female, Humans, Male, Metalloproteases/genetics, Metalloproteases/metabolism, Risk, Sex Factors, Zinc Fingers/genetics
ABSTRACT
Genome duplication is potentially a good source of new genes, but such genes take time to evolve. We have found a group of "duplication-resistant" genes, which have undergone convergent restoration to singleton status following several independent genome duplications. Restoration of duplication-resistant genes to singleton status could be important to long-term survival of a polyploid lineage. Angiosperms show more frequent polyploidization and a higher degree of duplicate gene preservation than other paleopolyploids, making them well-suited to further study of duplication-resistant genes.
Subjects
Arabidopsis/genetics, Gene Duplication, Oryza/genetics, Saccharomyces/genetics, Tetraodontiformes/genetics, Animals, Molecular Evolution, Plant Genes, Fungal Genome, Polyploidy, Protein Tertiary Structure
ABSTRACT
Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.
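As an illustration of stratified performance metrics, the sketch below computes precision, recall, and F1 per variant type from true-positive, false-positive, and false-negative counts. It uses the commonly used definitions of these metrics; the abstract does not spell out the framework's exact definitions, so treat the formulas as an assumption. The function name and the counts are hypothetical and do not come from the study.

```python
def benchmark_metrics(tp, fp, fn):
    """Compute precision, recall and F1 from TP/FP/FN counts.

    Uses the common definitions (an assumption, not taken from the source):
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      F1        = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts, stratified by variant type: (TP, FP, FN).
counts_by_type = {
    "SNV": (990, 10, 10),
    "indel": (80, 20, 20),
}

report = {
    variant_type: benchmark_metrics(*counts)
    for variant_type, counts in counts_by_type.items()
}

for variant_type, metrics in report.items():
    print(variant_type, {name: round(value, 3) for name, value in metrics.items()})
```

Reporting the metrics per stratum rather than as one pooled number keeps a large, easy class such as SNVs from masking weaker performance on a smaller, harder class such as indels.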
Subjects
Benchmarking, Exome/genetics, Human Genome/genetics, High-Throughput Nucleotide Sequencing, Algorithms, Genomics/trends, Germ Cells, Humans, Single Nucleotide Polymorphism/genetics, Software
ABSTRACT
In the version of this article initially published online, two pairs of headings were switched with each other in Table 4: "Recall (PCR free)" was switched with "Recall (with PCR)," and "Precision (PCR free)" was switched with "Precision (with PCR)." The error has been corrected in the print, PDF and HTML versions of this article.
ABSTRACT
The cereal species, of central importance to our diet, began to diverge 50-70 million years ago. For the past few thousand years, these species have undergone largely parallel selection regimes associated with domestication and improvement. The rice genome sequence provides a platform for organizing information about diverse cereals, and together with genetic maps and sequence samples from other cereals is yielding new insights into both the shared and the independent dimensions of cereal evolution. New data and population-based approaches are identifying genes that have been involved in cereal improvement. Reduced-representation sequencing promises to accelerate gene discovery in many large-genome cereals, and to better link the under-explored genomes of 'orphan' cereals with state-of-the-art knowledge.
Subjects
Edible Grain/genetics, Molecular Evolution, Genome, Genetic Models, Genetic Selection, Agriculture, Archaeology, Breeding, Genetic Variation, Population Genetics, Oryza/genetics
ABSTRACT
Sensitivity of short read DNA-sequencing for gene fusion detection is improving, but is hampered by the significant amount of noise composed of uninteresting or false positive hits in the data. In this paper we describe a tiered prioritisation approach to extract high impact gene fusion events from existing structural variant calls. Using cell line and patient DNA sequence data we improve the annotation and interpretation of structural variant calls to best highlight likely cancer driving fusions. We also considerably improve on the automated visualisation of the high impact structural variants to highlight the effects of the variants on the resulting transcripts. The resulting framework greatly improves on readily detecting clinically actionable structural variants.
ABSTRACT
The importance of angiosperms to sustaining humanity by providing a wide range of 'ecosystem services' warrants increased exploration of their genomic diversity. The nearly completed sequences for two species representing the major angiosperm subclasses, specifically the dicot Arabidopsis thaliana and the monocot Oryza sativa, provide a foundation for comparative analysis across the angiosperms. The angiosperms also exemplify some challenges to be faced as genomics makes new inroads into describing biotic diversity, in particular polyploidy (genome-wide chromatin duplication), and much larger genome sizes than have been studied to date.
Subjects
Genetic Variation, Plant Genome, Magnoliopsida/genetics, Arabidopsis/genetics, Chromosomes, Plant Genes, Oryza/genetics, Polyploidy
ABSTRACT
OBJECTIVE: To evaluate the utility of targeted exome sequencing for the molecular diagnosis of mitochondrial disorders, which exhibit marked phenotypic and genetic heterogeneity. METHODS: We considered a diverse set of 102 patients with suspected mitochondrial disorders based on clinical, biochemical, and/or molecular findings, and whose disease ranged from mild to severe, with varying age at onset. We sequenced the mitochondrial genome (mtDNA) and the exons of 1,598 nuclear-encoded genes implicated in mitochondrial biology, mitochondrial disease, or monogenic disorders with phenotypic overlap. We prioritized variants likely to underlie disease and established molecular diagnoses in accordance with current clinical genetic guidelines. RESULTS: Targeted exome sequencing yielded molecular diagnoses in established disease loci in 22% of cases, including 17 of 18 (94%) with prior molecular diagnoses and 5 of 84 (6%) without. The 5 new diagnoses implicated 2 genes associated with canonical mitochondrial disorders (NDUFV1, POLG2), and 3 genes known to underlie other neurologic disorders (DPYD, KARS, WFS1), underscoring the phenotypic and biochemical overlap with other inborn errors. We prioritized variants in an additional 26 patients, including recessive, X-linked, and mtDNA variants that were enriched 2-fold over background and await further support of pathogenicity. In one case, we modeled patient mutations in yeast to provide evidence that recessive mutations in ATP5A1 can underlie combined respiratory chain deficiency. CONCLUSION: The results demonstrate that targeted exome sequencing is an effective alternative to the sequential testing of mtDNA and individual nuclear genes as part of the investigation of mitochondrial disease. Our study underscores the ongoing challenge of variant interpretation in the clinical setting.
Subjects
Mitochondrial DNA/genetics, Exome/genetics, Gene Targeting/methods, Mitochondrial Diseases/diagnosis, Mitochondrial Diseases/genetics, DNA Sequence Analysis/methods, Adolescent, Adult, Amino Acid Sequence, Child, Preschool Child, Female, Genetic Predisposition to Disease, Humans, Infant, Newborn Infant, Male, Middle Aged, Molecular Sequence Data, Pedigree, Young Adult
ABSTRACT
The preferential in vitro interaction of the PHD finger of RAG2, a subunit of the V(D)J recombinase, with histone H3 tails simultaneously trimethylated at lysine 4 and symmetrically dimethylated at arginine 2 (H3R2me2sK4me3) predicted the existence of the previously unknown histone modification H3R2me2s. Here, we report the in vivo identification of H3R2me2s. Consistent with the binding specificity of the RAG2 PHD finger, high levels of H3R2me2sK4me3 are found at antigen receptor gene segments ready for rearrangement. However, this double modification is much more general; it is conserved throughout eukaryotic evolution. In mouse, H3R2me2s is tightly correlated with H3K4me3 at active promoters throughout the genome. Mutational analysis in S. cerevisiae reveals that deposition of H3R2me2s requires the same Set1 complex that deposits H3K4me3. Our work suggests that H3R2me2sK4me3, not simply H3K4me3 alone, is the mark of active promoters and that factors that recognize H3K4me3 will have their binding modulated by their preference for H3R2me2s.
Subjects
Arginine/metabolism, Eukaryota/genetics, Eukaryota/metabolism, Genome/genetics, Histones/metabolism, Lysine/metabolism, Animals, Conserved Sequence/genetics, Molecular Evolution, Genetic Loci/genetics, Histone-Lysine N-Methyltransferase/metabolism, Methylation, Mice, Small Interfering RNA/metabolism, Antigen Receptors/immunology, Genetic Recombination/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Engineered biosynthetic pathways have the potential to produce high-value molecules from inexpensive feedstocks, but a key limitation is engineering enzymes with high activity and specificity for new reactions. Here, we developed a method for combining structure-based computational protein design with library-based enzyme screening, in which inter-residue correlations favored by the design are encoded into a defined-sequence library. We validated this approach by engineering a glucose 6-oxidase enzyme for use in a proposed pathway to convert D-glucose into D-glucaric acid. The most active variant, identified after only one round of diversification and screening of only 10,000 wells, is approximately 400-fold more active on glucose than is the wild-type enzyme. We anticipate that this strategy will be broadly applicable to the discovery of new enzymes for engineered biological pathways.