Search | VHL Search Portal

1.

Whole-genome sequencing reveals new Alzheimer's disease-associated rare variants in loci related to synaptic function and neuronal development.

Prokopenko, Dmitry; Morgan, Sarah L; Mullin, Kristina; Hofmann, Oliver; Chapman, Brad; Kirchner, Rory; Amberkar, Sandeep; Wohlers, Inken; Lange, Christoph; Hide, Winston; Bertram, Lars; Tanzi, Rudolph E.

Alzheimers Dement ; 17(9): 1509-1527, 2021 09.

Article in English | MEDLINE | ID: mdl-33797837

ABSTRACT

INTRODUCTION: Genome-wide association studies have led to numerous genetic loci associated with Alzheimer's disease (AD). Whole-genome sequencing (WGS) now permits genome-wide analyses to identify rare variants contributing to AD risk. METHODS: We performed single-variant and spatial clustering-based testing on rare variants (minor allele frequency [MAF] ≤1%) in a family-based WGS-based association study of 2247 subjects from 605 multiplex AD families, followed by replication in 1669 unrelated individuals. RESULTS: We identified 13 new AD candidate loci that yielded consistent rare-variant signals in discovery and replication cohorts (4 from single-variant, 9 from spatial-clustering), implicating these genes: FNBP1L, SEL1L, LINC00298, PRKCH, C15ORF41, C2CD3, KIF2A, APC, LHX9, NALCN, CTNNA2, SYTL3, and CLSTN2. DISCUSSION: Downstream analyses of these novel loci highlight synaptic function, in contrast to common AD-associated variants, which implicate innate immunity and amyloid processing. These loci have not been associated previously with AD, emphasizing the ability of WGS to identify AD-associated rare variants, particularly outside of the exome.
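The rare-variant threshold in the abstract above (MAF ≤1%) can be illustrated with a minimal sketch. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2) and uses a naive frequency estimate that ignores relatedness, so it is not the family-based method the authors used. The names `minor_allele_frequency`, `rare_variants`, and `toy_calls` are hypothetical.

```python
from typing import Dict, List


def minor_allele_frequency(genotypes: List[int]) -> float:
    """Naive MAF from alt-allele counts (0, 1 or 2) of diploid subjects."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of ref/alt is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def rare_variants(calls: Dict[str, List[int]], max_maf: float = 0.01) -> List[str]:
    """Return IDs of polymorphic variants with MAF <= max_maf."""
    return [
        variant_id
        for variant_id, genotypes in calls.items()
        if 0.0 < minor_allele_frequency(genotypes) <= max_maf
    ]


# Toy data (hypothetical): 100 subjects per variant.
toy_calls = {
    "var_a": [0] * 99 + [1],     # 1 alt allele in 200 -> MAF 0.005
    "var_b": [1] * 50 + [0] * 50,  # 50 alt alleles in 200 -> MAF 0.25
    "var_c": [0] * 100,          # monomorphic, excluded
}

print(rare_variants(toy_calls))
```

In a real multiplex-family cohort, allele frequencies would normally be estimated from founders or with a relatedness-aware method; the sketch only shows where the 1% cut-off applies.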

Subject(s)

Alzheimer Disease/genetics , Gene Frequency/genetics , Genetic Predisposition to Disease , Whole Genome Sequencing , Genome-Wide Association Study , Humans , Ion Channels/genetics , Kinesins/genetics , Membrane Proteins/genetics , Microtubule-Associated Proteins/genetics , Proteins/genetics

2.

Clonal dynamics of native haematopoiesis.

Sun, Jianlong; Ramos, Azucena; Chapman, Brad; Johnnidis, Jonathan B; Le, Linda; Ho, Yu-Jui; Klein, Allon; Hofmann, Oliver; Camargo, Fernando D.

Nature ; 514(7522): 322-7, 2014 Oct 16.

Article in English | MEDLINE | ID: mdl-25296256

ABSTRACT

It is currently thought that life-long blood cell production is driven by the action of a small number of multipotent haematopoietic stem cells. Evidence supporting this view has been largely acquired through the use of functional assays involving transplantation. However, whether these mechanisms also govern native non-transplant haematopoiesis is entirely unclear. Here we have established a novel experimental model in mice where cells can be uniquely and genetically labelled in situ to address this question. Using this approach, we have performed longitudinal analyses of clonal dynamics in adult mice that reveal unprecedented features of native haematopoiesis. In contrast to what occurs following transplantation, steady-state blood production is maintained by the successive recruitment of thousands of clones, each with a minimal contribution to mature progeny. Our results demonstrate that a large number of long-lived progenitors, rather than classically defined haematopoietic stem cells, are the main drivers of steady-state haematopoiesis during most of adulthood. Our results also have implications for understanding the cellular origin of haematopoietic disease.

Subject(s)

Cell Lineage , Clone Cells/cytology , Hematopoiesis , Animals , Cellular Senescence , Clone Cells/metabolism , DNA Transposable Elements/genetics , Female , Genetic Markers/genetics , Hematopoietic Stem Cell Transplantation , Hematopoietic Stem Cells/cytology , Hematopoietic Stem Cells/metabolism , Male , Mice , Myelopoiesis , Staining and Labeling , Time Factors

3.

Silencing of the Drosophila ortholog of SOX5 leads to abnormal neuronal development and behavioral impairment.

Li, Airong; Hooli, Basavaraj; Mullin, Kristina; Tate, Rebecca E; Bubnys, Adele; Kirchner, Rory; Chapman, Brad; Hofmann, Oliver; Hide, Winston; Tanzi, Rudolph E.

Hum Mol Genet ; 26(8): 1472-1482, 2017 04 15.

Article in English | MEDLINE | ID: mdl-28186563

ABSTRACT

SOX5 encodes a transcription factor that is expressed in multiple tissues including heart, lung and brain. Mutations in SOX5 have been previously found in patients with amyotrophic lateral sclerosis (ALS) and developmental delay, intellectual disability and dysmorphic features. To characterize the neuronal role of SOX5, we silenced the Drosophila ortholog of SOX5, Sox102F, by RNAi in various neuronal subtypes in Drosophila. Silencing of Sox102F led to misorientated and disorganized microchaetes, neurons with shorter dendritic arborization (DA) and reduced complexity, diminished larval peristaltic contractions, loss of neuromuscular junction bouton structures, impaired olfactory perception, and severe neurodegeneration in the brain. Silencing of SOX5 in human SH-SY5Y neuroblastoma cells resulted in a significant repression of WNT signaling activity and altered expression of WNT-related genes. Genetic association and meta-analyses of the results in several large family-based and case-control late-onset familial Alzheimer's disease (LOAD) samples of SOX5 variants revealed several variants that show significant association with AD disease status. In addition, analysis for rare and highly penetrant functional variants revealed four novel variants/mutations in SOX5, which, taken together with functional prediction analysis, suggest a strong role of SOX5 in causing AD in the carrier families. Collectively, these findings indicate that SOX5 is a novel candidate gene for LOAD with an important role in neuronal function. The genetic findings warrant further studies to identify and characterize SOX5 variants that confer risk for AD, ALS and intellectual disability.

Subject(s)

Alzheimer Disease/genetics , Amyotrophic Lateral Sclerosis/genetics , Developmental Disabilities/genetics , Drosophila Proteins/genetics , SOXD Transcription Factors/genetics , Alzheimer Disease/pathology , Amyotrophic Lateral Sclerosis/pathology , Animals , Developmental Disabilities/pathology , Drosophila/genetics , Gene Silencing , Genetic Association Studies , Humans , Neuromuscular Junction/genetics , Neuromuscular Junction/pathology , Neuronal Plasticity/genetics , Neurons/metabolism , Neurons/pathology , RNA Interference , Wnt Signaling Pathway/genetics

4.

Identification of small RNA pathway genes using patterns of phylogenetic conservation and divergence.

Tabach, Yuval; Billi, Allison C; Hayes, Gabriel D; Newman, Martin A; Zuk, Or; Gabel, Harrison; Kamath, Ravi; Yacoby, Keren; Chapman, Brad; Garcia, Susana M; Borowsky, Mark; Kim, John K; Ruvkun, Gary.

Nature ; 493(7434): 694-8, 2013 Jan 31.

Article in English | MEDLINE | ID: mdl-23364702

ABSTRACT

Genetic and biochemical analyses of RNA interference (RNAi) and microRNA (miRNA) pathways have revealed proteins such as Argonaute and Dicer as essential cofactors that process and present small RNAs to their targets. Well-validated small RNA pathway cofactors such as these show distinctive patterns of conservation or divergence in particular animal, plant, fungal and protist species. We compared 86 divergent eukaryotic genome sequences to discern sets of proteins that show similar phylogenetic profiles with known small RNA cofactors. A large set of additional candidate small RNA cofactors have emerged from functional genomic screens for defects in miRNA- or short interfering RNA (siRNA)-mediated repression in Caenorhabditis elegans and Drosophila melanogaster, and from proteomic analyses of proteins co-purifying with validated small RNA pathway proteins. The phylogenetic profiles of many of these candidate small RNA pathway proteins are similar to those of known small RNA cofactor proteins. We used a Bayesian approach to integrate the phylogenetic profile analysis with predictions from diverse transcriptional coregulation and proteome interaction data sets to assign a probability for each protein for a role in a small RNA pathway. Testing high-confidence candidates from this analysis for defects in RNAi silencing, we found that about one-half of the predicted small RNA cofactors are required for RNAi silencing. Many of the newly identified small RNA pathway proteins are orthologues of proteins implicated in RNA splicing. In support of a deep connection between the mechanism of RNA splicing and small-RNA-mediated gene silencing, the presence of the Argonaute proteins and other small RNA components in the many species analysed strongly correlates with the number of introns in those species.
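The idea of comparing phylogenetic profiles can be sketched as follows. This toy version assumes binary presence/absence vectors across six species and ranks candidates by Pearson correlation with a known cofactor's profile. The study's actual profiles, similarity measure, and Bayesian integration are not reproduced here, and the protein names and data are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# 1 = ortholog present in that species, 0 = absent (toy data).
reference_profile = [1, 1, 0, 1, 0, 1]  # stands in for a known cofactor
candidates: Dict[str, List[int]] = {
    "protein_x": [1, 1, 0, 1, 0, 1],  # same pattern of loss
    "protein_y": [1, 1, 1, 1, 1, 1],  # present everywhere, uninformative
    "protein_z": [0, 0, 1, 0, 1, 0],  # opposite pattern
}

# Rank candidates from most to least similar to the reference.
ranked = sorted(
    candidates,
    key=lambda name: pearson(reference_profile, candidates[name]),
    reverse=True,
)
print(ranked)
```

A protein conserved in every species scores zero here because its profile carries no information about co-loss, which is why profiles with distinctive patterns of conservation and divergence are the useful ones.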

Subject(s)

Caenorhabditis elegans/genetics , Genetic Variation , Phylogeny , RNA, Small Interfering/genetics , Animals , Caenorhabditis elegans/classification , Caenorhabditis elegans Proteins/genetics , Eukaryota/classification , Eukaryota/genetics , Genome/genetics , MicroRNAs/genetics , Proteome , RNA Splicing

5.

Compaction of chromatin by diverse Polycomb group proteins requires localized regions of high charge.

Grau, Daniel J; Chapman, Brad A; Garlick, Joe D; Borowsky, Mark; Francis, Nicole J; Kingston, Robert E.

Genes Dev ; 25(20): 2210-21, 2011 Oct 15.

Article in English | MEDLINE | ID: mdl-22012622

ABSTRACT

Polycomb group (PcG) proteins are required for the epigenetic maintenance of developmental genes in a silent state. Proteins in the Polycomb-repressive complex 1 (PRC1) class of the PcG are conserved from flies to humans and inhibit transcription. One hypothesis for PRC1 mechanism is that it compacts chromatin, based in part on electron microscopy experiments demonstrating that Drosophila PRC1 compacts nucleosomal arrays. We show that this function is conserved between Drosophila and mouse PRC1 complexes and requires a region with an overrepresentation of basic amino acids. While the active region is found in the Posterior Sex Combs (PSC) subunit in Drosophila, it is unexpectedly found in a different PRC1 subunit, a Polycomb homolog called M33, in mice. We provide experimental support for the general importance of a charged region by predicting the compacting capability of PcG proteins from species other than Drosophila and mice and by testing several of these proteins using solution assays and microscopy. We infer that the ability of PcG proteins to compact chromatin in vitro can be predicted by the presence of domains of high positive charge and that PRC1 components from a variety of species conserve this highly charged region. This supports the hypothesis that compaction is a key aspect of PcG function.
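One simple way to look for "localized regions of high charge" in a protein sequence is a sliding-window net-charge scan. The sketch below assumes lysine and arginine count as +1, aspartate and glutamate as -1, and all other residues (including histidine) as 0. The window size, the scoring scheme, and the toy sequence are illustrative assumptions, not the prediction method used in the paper.

```python
from typing import List, Tuple

# Assumed per-residue charges at roughly neutral pH (histidine ignored).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_charges(sequence: str, window: int) -> List[int]:
    """Net charge of every window of the given length along the sequence."""
    if window <= 0 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    charges = [CHARGE.get(residue, 0) for residue in sequence.upper()]
    total = sum(charges[:window])
    totals = [total]
    # Slide the window by one residue: add the new one, drop the old one.
    for i in range(window, len(charges)):
        total += charges[i] - charges[i - window]
        totals.append(total)
    return totals


def most_charged_window(sequence: str, window: int) -> Tuple[int, int]:
    """Return (start index, net charge) of the most positive window."""
    totals = window_charges(sequence, window)
    best_start = max(range(len(totals)), key=totals.__getitem__)
    return best_start, totals[best_start]


toy_sequence = "MDEAGSKRKKRKLAAD"  # hypothetical sequence
print(most_charged_window(toy_sequence, 6))
```

For the toy sequence, the window starting at index 6 (`KRKKRK`) carries the highest net positive charge.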

Subject(s)

Chromatin/metabolism , Repressor Proteins/chemistry , Repressor Proteins/metabolism , Animals , Cell Line , Conserved Sequence/genetics , Drosophila melanogaster/classification , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Evolution, Molecular , Mice , Mutation , Phylogeny , Polycomb Repressive Complex 1 , Polycomb-Group Proteins , Repressor Proteins/genetics , Structure-Activity Relationship

6.

VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research.

Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J Carl; Dry, Jonathan R.

Nucleic Acids Res ; 44(11): e108, 2016 06 20.

Article in English | MEDLINE | ID: mdl-27060149

ABSTRACT

Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research.

Subject(s)

Computational Biology/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Alleles , Gene Frequency , Genetic Variation , Humans , INDEL Mutation , Loss of Heterozygosity , Lung Neoplasms/genetics , Neoplasms/genetics , ROC Curve , Research

7.

The 2015 Bioinformatics Open Source Conference (BOSC 2015).

Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica.

PLoS Comput Biol ; 12(2): e1004691, 2016 Feb.

Article in English | MEDLINE | ID: mdl-26914653

ABSTRACT

The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.

Subject(s)

Computational Biology/organization & administration , Congresses as Topic , Humans , Ireland

8.

Deep targeted sequencing of 12 breast cancer susceptibility regions in 4611 women across four different ethnicities.

Lindström, Sara; Ablorh, Akweley; Chapman, Brad; Gusev, Alexander; Chen, Gary; Turman, Constance; Eliassen, A Heather; Price, Alkes L; Henderson, Brian E; Le Marchand, Loic; Hofmann, Oliver; Haiman, Christopher A; Kraft, Peter.

Breast Cancer Res ; 18(1): 109, 2016 11 05.

Article in English | MEDLINE | ID: mdl-27814745

ABSTRACT

BACKGROUND: Although genome-wide association studies (GWASs) have identified thousands of disease susceptibility regions, the underlying causal mechanism in these regions is not fully known. It is likely that the GWAS signal originates from one or many as yet unidentified causal variants. METHODS: Using next-generation sequencing, we characterized 12 breast cancer susceptibility regions identified by GWASs in 2288 breast cancer cases and 2323 controls across four populations of African American, European, Japanese, and Hispanic ancestry. RESULTS: After genotype calling and quality control, we identified 137,530 single-nucleotide variants (SNVs); of those, 87.2 % had a minor allele frequency (MAF) <0.005. For SNVs with MAF >0.005, we calculated the smallest number of SNVs needed to obtain a posterior probability set (PPS) such that there is 90 % probability that the causal SNV is included. We found that the PPS for two regions, 2q35 and 11q13, contained less than 5 % of the original SNVs, dramatically decreasing the number of potentially causal SNVs. However, we did not find strong evidence supporting a causal role for any individual SNV. In addition, there were no significant gene-based rare SNV associations after correcting for multiple testing. CONCLUSIONS: This study illustrates some of the challenges faced in fine-mapping studies in the post-GWAS era, most importantly the large sample sizes needed to identify rare-variant associations or to distinguish the effects of strongly correlated common SNVs.
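The posterior probability set described above can be sketched as a greedy cumulative sum: rank SNVs by posterior probability and keep adding them until the set covers 90% of the probability mass. The sketch assumes that per-SNV posteriors have already been computed and that exactly one SNV in the region is causal. The function name `credible_set` and the toy posteriors are hypothetical.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.90) -> List[str]:
    """Smallest set of SNVs whose normalised posteriors sum to >= coverage."""
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posteriors must sum to a positive value")
    # Highest-probability SNVs first, so the set is as small as possible.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for snv, probability in ranked:
        chosen.append(snv)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return chosen


# Toy posteriors for five SNVs in one region (hypothetical values).
toy_posteriors = {
    "snv1": 0.60,
    "snv2": 0.25,
    "snv3": 0.10,
    "snv4": 0.03,
    "snv5": 0.02,
}

print(credible_set(toy_posteriors))
```

When many SNVs are in strong linkage disequilibrium their posteriors are similar, so the set stays large; this is the difficulty with strongly correlated common SNVs that the conclusions mention.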

Subject(s)

Breast Neoplasms/genetics , Ethnicity/genetics , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Adult , Case-Control Studies , Female , Gene Frequency , Genetic Variation , Genome-Wide Association Study , Humans , Middle Aged , Molecular Sequence Annotation , Nurses , Open Reading Frames , Polymorphism, Single Nucleotide

9.

Bioconda: sustainable and comprehensive software distribution for the life sciences.

Grüning, Björn; Dale, Ryan; Sjödin, Andreas; Chapman, Brad A; Rowe, Jillian; Tomkins-Tinch, Christopher H; Valieris, Renan; Köster, Johannes.

Nat Methods ; 15(7): 475-476, 2018 07.

Article in English | MEDLINE | ID: mdl-29967506

Subject(s)

Software , Computational Biology , User-Computer Interface

10.

Community-driven development for computational biology at Sprints, Hackathons and Codefests.

Möller, Steffen; Afgan, Enis; Banck, Michael; Bonnal, Raoul J P; Booth, Timothy; Chilton, John; Cock, Peter J A; Gumbel, Markus; Harris, Nomi; Holland, Richard; Kalas, Matús; Kaján, László; Kibukawa, Eri; Powel, David R; Prins, Pjotr; Quinn, Jacqueline; Sallou, Olivier; Strozzi, Francesco; Seemann, Torsten; Sloggett, Clare; Soiland-Reyes, Stian; Spooner, William; Steinbiss, Sascha; Tille, Andreas; Travis, Anthony J; Guimera, Roman; Katayama, Toshiaki; Chapman, Brad A.

BMC Bioinformatics ; 15 Suppl 14: S7, 2014.

Article in English | MEDLINE | ID: mdl-25472764

ABSTRACT

BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

Subject(s)

Computational Biology , Cooperative Behavior , Software , Communication , Internet

11.

GEMINI: integrative exploration of genetic variation and genome annotations.

Paila, Umadevi; Chapman, Brad A; Kirchner, Rory; Quinlan, Aaron R.

PLoS Comput Biol ; 9(7): e1003153, 2013.

Article in English | MEDLINE | ID: mdl-23874191

ABSTRACT

Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics.
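The approach of loading variants and annotations into one database and then composing queries can be illustrated with a small in-memory SQLite example. The table `toy_variants`, its columns, and the rows are hypothetical and do not reproduce GEMINI's actual schema or command-line interface; the sketch only shows how annotation-based filters combine in a single query.

```python
import sqlite3

# In-memory database standing in for a unified variant/annotation store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_variants ("
    "chrom TEXT, pos INTEGER, gene TEXT, impact TEXT, pop_af REAL)"
)
conn.executemany(
    "INSERT INTO toy_variants VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 1000, "GENE_A", "missense", 0.002),
        ("chr1", 2000, "GENE_A", "synonymous", 0.150),
        ("chr2", 3000, "GENE_B", "stop_gained", 0.0005),
    ],
)

# Combine a frequency filter with a functional-impact filter.
rows = conn.execute(
    "SELECT gene, pos, impact FROM toy_variants "
    "WHERE pop_af < ? AND impact != ? ORDER BY pos",
    (0.01, "synonymous"),
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Keeping genotypes and annotations in one relational store is what lets filters on frequency, impact, and sample attributes be expressed as ordinary query predicates rather than as a fixed set of built-in filters.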

Subject(s)

Databases, Genetic , Genetic Variation , Genome, Human , Genomics/methods , Software , Data Mining , Genotype , Humans

12.

The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons.

Ho Sui, Shannan J; Begley, Kimberly; Reilly, Dorothy; Chapman, Brad; McGovern, Ray; Rocca-Sera, Philippe; Maguire, Eamonn; Altschuler, Gabriel M; Hansen, Terah A A; Sompallae, Ramakrishna; Krivtsov, Andrei; Shivdasani, Ramesh A; Armstrong, Scott A; Culhane, Aedín C; Correll, Mick; Sansone, Susanna-Assunta; Hofmann, Oliver; Hide, Winston.

Nucleic Acids Res ; 40(Database issue): D984-91, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22121217

ABSTRACT

Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE), an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at http://discovery.hsci.harvard.edu.

Subject(s)

Databases, Genetic , Neoplastic Stem Cells/metabolism , Animals , Gene Expression Profiling , Humans , Mice , Systems Integration

13.

The genomic binding sites of a noncoding RNA.

Simon, Matthew D; Wang, Charlotte I; Kharchenko, Peter V; West, Jason A; Chapman, Brad A; Alekseyenko, Artyom A; Borowsky, Mark L; Kuroda, Mitzi I; Kingston, Robert E.

Proc Natl Acad Sci U S A ; 108(51): 20497-502, 2011 Dec 20.

Article in English | MEDLINE | ID: mdl-22143764

ABSTRACT

Long noncoding RNAs (lncRNAs) have important regulatory roles and can function at the level of chromatin. To determine where lncRNAs bind to chromatin, we developed capture hybridization analysis of RNA targets (CHART), a hybridization-based technique that specifically enriches endogenous RNAs along with their targets from reversibly cross-linked chromatin extracts. CHART was used to enrich the DNA and protein targets of endogenous lncRNAs from flies and humans. This analysis was extended to genome-wide mapping of roX2, a well-studied ncRNA involved in dosage compensation in Drosophila. CHART revealed that roX2 binds at specific genomic sites that coincide with the binding sites of proteins from the male-specific lethal complex that affects dosage compensation. These results reveal the genomic targets of roX2 and demonstrate how CHART can be used to study RNAs in a manner analogous to chromatin immunoprecipitation for proteins.

Subject(s)

Drosophila Proteins/genetics , Drosophila/genetics , Genomics , RNA, Untranslated/genetics , RNA-Binding Proteins/genetics , Amino Acid Motifs , Animals , Binding Sites , Chromatin/chemistry , Chromatin/genetics , Chromatin Immunoprecipitation , Dosage Compensation, Genetic , Male , Models, Genetic , Nucleic Acid Hybridization , Ribonuclease H/chemistry

14.

CloudMan as a platform for tool, data, and analysis distribution.

Afgan, Enis; Chapman, Brad; Taylor, James.

BMC Bioinformatics ; 13: 315, 2012 Nov 27.

Article in English | MEDLINE | ID: mdl-23181507

ABSTRACT

BACKGROUND: Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion. However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. RESULTS: CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. CONCLUSIONS: With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.

Subject(s)

Information Storage and Retrieval , Software

15.

Bio.Phylo: a unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython.

Talevich, Eric; Invergo, Brandon M; Cock, Peter J A; Chapman, Brad A.

BMC Bioinformatics ; 13: 209, 2012 Aug 21.

Article in English | MEDLINE | ID: mdl-22909249

ABSTRACT

BACKGROUND: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines. RESULTS: We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source. CONCLUSIONS: Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.

Subject(s)

Phylogeny , Software , Computational Biology/methods

16.

Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.

Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E.

BMC Bioinformatics ; 13: 42, 2012 Mar 19.

Article in English | MEDLINE | ID: mdl-22429538

ABSTRACT

BACKGROUND: A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. RESULTS: Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. CONCLUSIONS: Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. 
An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.

Subject(s)

Computing Methodologies , Genomics/methods , Animals , Computers , Humans , Sequence Alignment , Software

17.

Pairwise selection assembly for sequence-independent construction of long-length DNA.

Blake, William J; Chapman, Brad A; Zindal, Anuradha; Lee, Michael E; Lippow, Shaun M; Baynes, Brian M.

Nucleic Acids Res ; 38(8): 2594-602, 2010 May.

Article in English | MEDLINE | ID: mdl-20194119

ABSTRACT

The engineering of biological components has been facilitated by de novo synthesis of gene-length DNA. Biological engineering at the level of pathways and genomes, however, requires a scalable and cost-effective assembly of DNA molecules that are longer than approximately 10 kb, and this remains a challenge. Here we present the development of pairwise selection assembly (PSA), a process that involves hierarchical construction of long-length DNA through the use of a standard set of components and operations. In PSA, activation tags at the termini of assembly sub-fragments are reused throughout the assembly process to activate vector-encoded selectable markers. Marker activation enables stringent selection for a correctly assembled product in vivo, often obviating the need for clonal isolation. Importantly, construction via PSA is sequence-independent, and does not require primary sequence modification (e.g. the addition or removal of restriction sites). The utility of PSA is demonstrated in the construction of a completely synthetic 91-kb chromosome arm from Saccharomyces cerevisiae.

Subject(s)

DNA/chemical synthesis , Genetic Engineering/methods , Saccharomyces cerevisiae/genetics , Base Sequence , Chromosomes, Fungal , DNA/chemistry

18.

The Bioinformatics Open Source Conference (BOSC) 2013.

Harris, Nomi L; Cock, Peter J A; Chapman, Brad A; Goecks, Jeremy; Hotz, Hans-Rudolf; Lapp, Hilmar.

Bioinformatics ; 31(2): 299-300, 2015 Jan 15.

Article in English | MEDLINE | ID: mdl-25024288

Subject(s)

Biomedical Research/trends , Computational Biology/trends , Genomics/methods , Congresses as Topic , Humans

19.

Galaxy CloudMan: delivering cloud compute clusters.

Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James.

BMC Bioinformatics ; 11 Suppl 12: S4, 2010 Dec 21.

Article in English | MEDLINE | ID: mdl-21210983

ABSTRACT

BACKGROUND: Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. RESULTS: We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. CONCLUSIONS: The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.

Subject(s)

Computational Biology/methods , Software , Cluster Analysis , Internet

20.

Biopython: freely available Python tools for computational molecular biology and bioinformatics.

Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L.

Bioinformatics ; 25(11): 1422-3, 2009 Jun 01.

Article in English | MEDLINE | ID: mdl-19304878

ABSTRACT

SUMMARY: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. AVAILABILITY: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license.

Subject(s)

Computational Biology/methods , Software , Databases, Factual , Internet , Programming Languages
