ABSTRACT
Cell division, wherein 1 cell divides into 2 daughter cells, is fundamental to all living organisms. Cytokinesis, the final step in cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a specified spatiotemporal manner generates a cleavage furrow that physically separates the cytoplasm. Unique lipids with specific biophysical properties have been shown to localize to intercellular bridges (also called midbodies) connecting the 2 dividing cells; however, their biological roles and delivery mechanisms remain largely unknown. In this study, we show that ceramide phosphoethanolamine (CPE), the structural analog of sphingomyelin, has unique acyl chain anchors in Drosophila spermatocytes and is essential for meiotic cytokinesis. The head group of CPE is also important for spermatogenesis. We find that aberrant central spindle and contractile ring behavior but not mislocalization of phosphatidylinositol phosphates (PIPs) at the plasma membrane is responsible for the male meiotic cytokinesis defect in CPE-deficient animals. Further, we demonstrate the enrichment of CPE in multivesicular bodies marked by Rab7, which in turn localize to the cleavage furrow. Volume electron microscopy analysis using correlative light and focused ion beam scanning electron microscopy shows that CPE-enriched Rab7 positive endosomes are juxtaposed on contractile ring material. Correlative light and transmission electron microscopy reveal Rab7 positive endosomes as a multivesicular body-like organelle that releases its intraluminal vesicles in the vicinity of ingressing furrows. Genetic ablation of Rab7 or Rab35 or expression of dominant negative Rab11 results in significant meiotic cytokinesis defects. Further, we show that Rab11 function is required for localization of CPE positive endosomes to the cleavage furrow. Our results imply that endosomal delivery of CPE to ingressing membranes is crucial for meiotic cytokinesis.
Subject(s)
Cytokinesis , Sphingomyelins , Actomyosin/metabolism , Animals , Cytokinesis/genetics , Drosophila/genetics , Endosomes/metabolism , Male , Meiosis , Phosphatidylinositol Phosphates/metabolism
ABSTRACT
SUMMARY: A typical behavior inspired by the general principle 'similarity breeds connections' has been observed in different kinds of networks, such as social or biological ones. These networks are defined as homophilic, as nodes belonging to the same class preferentially interact with each other. In this work, we present HONTO (HOmophily Network TOol), a user-friendly open-source Python3 package designed to evaluate and analyze homophily in complex networks. The tool takes as input the network along with a partition of its nodes into classes and yields a matrix whose entries are the homophily/heterophily z-score values. To complement the analysis, the tool also provides z-score values of nodes that do not interact with any other node of the same class. Homophily/heterophily z-score values are presented as a heatmap allowing a visual at-a-glance interpretation of results. AVAILABILITY AND IMPLEMENTATION: The tool's source code is available at https://github.com/cumbof/honto under the MIT license, installable as a package from PyPI (pip install honto) and conda-forge (conda install -c conda-forge honto), and has a wrapper for the Galaxy platform available on the official Galaxy ToolShed (Blankenberg et al., 2014) at https://toolshed.g2.bx.psu.edu/view/fabio/honto.
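To make the idea of a class-by-class z-score matrix concrete, the following is a minimal sketch of one common way such scores can be computed: count the edges observed between each pair of classes and compare them with counts obtained after randomly shuffling the class labels. This is a generic label-permutation approach and an assumption made for illustration; the abstract does not state which null model HONTO uses, and the function names below (`class_pair_edge_counts`, `homophily_zscores`) are hypothetical and are not part of the HONTO package.

```python
import random
import statistics


def class_pair_edge_counts(edges, node_class, classes):
    """Count edges falling between each unordered pair of classes."""
    counts = {(a, b): 0 for i, a in enumerate(classes) for b in classes[i:]}
    for u, v in edges:
        a, b = sorted((node_class[u], node_class[v]), key=classes.index)
        counts[(a, b)] += 1
    return counts


def homophily_zscores(edges, node_class, n_permutations=500, seed=0):
    """Z-score of observed class-pair edge counts against a label-shuffling null.

    Hypothetical helper for illustration only; not the HONTO API.
    """
    rng = random.Random(seed)
    classes = sorted(set(node_class.values()))
    observed = class_pair_edge_counts(edges, node_class, classes)

    nodes = list(node_class)
    labels = [node_class[n] for n in nodes]
    null_counts = {pair: [] for pair in observed}
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # keeps class sizes, breaks class/edge association
        permuted = dict(zip(nodes, shuffled))
        for pair, count in class_pair_edge_counts(edges, permuted, classes).items():
            null_counts[pair].append(count)

    zscores = {}
    for pair, obs in observed.items():
        mean = statistics.fmean(null_counts[pair])
        sd = statistics.pstdev(null_counts[pair])
        zscores[pair] = (obs - mean) / sd if sd > 0 else 0.0
    return zscores


# Toy network: two fully connected groups of four nodes joined by one edge.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
node_class = {**{n: "A" for n in group_a}, **{n: "B" for n in group_b}}
edges = [(u, v) for g in (group_a, group_b) for i, u in enumerate(g) for v in g[i + 1:]]
edges.append(("a1", "b1"))

z = homophily_zscores(edges, node_class)
```

In this toy network the diagonal entries (`("A", "A")`, `("B", "B")`) come out positive and the off-diagonal entry (`("A", "B")`) negative, which is the pattern one would read as homophily in a heatmap.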
Subject(s)
Software , Humans
ABSTRACT
MOTIVATION: Pathogenic copy-number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require the integration and analysis of information from multiple scattered sources by experts. RESULTS: Here, we introduce the CNV-ClinViewer, an open-source web application for clinical evaluation and visual exploration of CNVs. The application enables real-time interactive exploration of large CNV datasets in a user-friendly interface and facilitates semi-automated clinical CNV interpretation following the ACMG guidelines by integrating the ClassifCNV tool. In combination with clinical judgment, the application enables clinicians and researchers to formulate novel hypotheses and guide their decision-making process. Consequently, the CNV-ClinViewer enhances patient care for clinical investigators and translational genomic research for basic scientists. AVAILABILITY AND IMPLEMENTATION: The web application is freely available at https://cnv-ClinViewer.broadinstitute.org and the open-source code can be found at https://github.com/LalResearchGroup/CNV-clinviewer.
Subject(s)
DNA Copy Number Variations , Software , Humans , Genomics , Phenotype , Genome, Human
ABSTRACT
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. New features have been added to facilitate the use of the materials in a classroom setting and to simplify the contribution flow for new materials, and a set of train-the-trainer lessons has been added. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.
Subject(s)
Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel
ABSTRACT
Bacillus anthracis Ser/Thr protein kinase PrkC is necessary for phenotypic memory and spore germination, and the loss of PrkC-dependent phosphorylation events affects spore development. During sporulation, Bacillus sp. can store 3-Phosphoglycerate (3-PGA) that will be required at the onset of germination, when ATP will be necessary. Phosphoglycerate mutase (Pgm) catalyzes the isomerization of 2-PGA and 3-PGA and is important for spore germination as a key metabolic enzyme that maintains the 3-PGA pool at later events. Therefore, regulation of Pgm is important for an efficient spore germination process and metabolic switching. While the increased expression of Pgm in B. anthracis decreases spore germination efficiency, it remains unexplored whether PrkC could directly influence Pgm activity. Here, we report the phosphorylation and regulation of Pgm by PrkC and its impact on Pgm stability and catalytic activity. Mass spectrometry revealed Pgm phosphorylation on seven threonine residues. In silico mutational analysis highlighted the role of the Thr459 residue in metal and substrate binding. Altogether, we demonstrated that PrkC-mediated Pgm phosphorylation negatively regulates its activity, which is essential to maintain Pgm in its apo-like isoform before germination. This study advances understanding of the role of Pgm regulation, which represents an important switch for B. anthracis resumption of metabolism and spore germination.
Subject(s)
Bacillus anthracis , Protein Kinases , Phosphorylation , Protein Kinases/metabolism , Bacillus anthracis/metabolism , Phosphoglycerate Mutase/metabolism , Threonine/metabolism , Spores, Bacterial/genetics , Spores, Bacterial/metabolism , Bacterial Proteins/metabolism
ABSTRACT
HIV-associated cognitive dysfunction during combination antiretroviral therapy (cART) involves mitochondrial dysfunction, but the impact of contemporary cART on chronic metabolic changes in the brain and in latent HIV infection is unclear. We interrogated mitochondrial function in a human microglia (hµglia) cell line harboring inducible HIV provirus and in SH-SY5Y cells after exposure to individual antiretroviral drugs or cART, using the MitoStress assay. cART-induced changes in protein expression, reactive oxygen species (ROS) production, mitochondrial DNA copy number, and cellular iron were also explored. Finally, we evaluated the ability of ROS scavengers or plasmid-mediated overexpression of the antioxidant iron-binding protein, Fth1, to reverse mitochondrial defects. Contemporary antiretroviral drugs, particularly bictegravir, depressed multiple facets of mitochondrial function by 20-30%, with the most pronounced effects in latently infected HIV+ hµglia and SH-SY5Y cells. Latently HIV-infected hµglia exhibited upregulated glycolysis. Increases in total and/or mitochondrial ROS, mitochondrial DNA copy number, and cellular iron accompanied mitochondrial defects in hµglia and SH-SY5Y cells. In SH-SY5Y cells, cART reduced mitochondrial iron-sulfur-cluster-containing supercomplex and subunit expression and increased Nox2 expression. Fth1 overexpression or pre-treatment with N-acetylcysteine prevented cART-induced mitochondrial dysfunction. Contemporary cART impairs mitochondrial bioenergetics in hµglia and SH-SY5Y cells, partly through cellular iron accumulation; some effects differ by HIV latency.
Subject(s)
HIV Infections , Neuroblastoma , Humans , Microglia/metabolism , HIV Infections/complications , HIV Infections/drug therapy , HIV Infections/metabolism , Reactive Oxygen Species/metabolism , Neuroblastoma/metabolism , Iron/metabolism , Mitochondria/metabolism , DNA, Mitochondrial/metabolism
ABSTRACT
BACKGROUND: Computational methods based on initial screening and prediction of peptides for desired functions have proven to be effective alternatives to the lengthy and expensive biochemical experimental methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in utilizing programming libraries, of access to computational resources, and of flexible pipelines is a big hurdle to adopting these advanced methods. RESULTS: To address the above-mentioned barriers, we have implemented the peptide design and analysis under Galaxy (PDAUG) package, a Galaxy-based, Python-powered collection of tools, workflows, and datasets for rapid in-silico peptide library analysis. In contrast to existing methods like standard programming libraries or rigid single-function web-based tools, PDAUG offers an integrated GUI-based toolset, providing flexibility to build and distribute reproducible pipelines and workflows without programming expertise. Finally, we demonstrate the usability of PDAUG in predicting anticancer properties of peptides using four different feature sets and assess the suitability of various ML algorithms. CONCLUSION: PDAUG offers tools for peptide library generation, data visualization, built-in and public database peptide sequence retrieval, peptide feature calculation, and machine learning (ML) modeling. Additionally, this toolset enables researchers to combine PDAUG with hundreds of compatible existing Galaxy tools for limitless analytic strategies.
Subject(s)
Peptide Library , Software , Algorithms , Machine Learning , Peptides/chemistry
ABSTRACT
SUMMARY: Literature exploration in PubMed on a large number of biomedical entities (e.g. genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among a set of entities based on text. SimText can be used for (i) text collection from PubMed and extraction of words with different text mining approaches, and (ii) interactive analysis and visualization of data using unsupervised learning techniques in an interactive app. AVAILABILITY AND IMPLEMENTATION: We developed SimText as an open-source R software and integrated it into Galaxy (https://usegalaxy.eu), an online data analysis platform with supporting self-learning training material available at https://training.galaxyproject.org. A command-line version of the toolset is available for download from GitHub (https://github.com/dlal-group/simtext) or as Docker image (https://hub.docker.com/r/dlalgroup/simtext/tags.). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Data Mining , Software , Data Mining/methods , PubMed , Data Interpretation, Statistical , Data Analysis
ABSTRACT
The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.
Subject(s)
Betacoronavirus/pathogenicity , Coronavirus Infections/virology , Pneumonia, Viral/virology , Public Health , Severe Acute Respiratory Syndrome/virology , COVID-19 , Data Analysis , Humans , Pandemics , SARS-CoV-2
ABSTRACT
Galaxy (https://galaxyproject.org) is a web-based computational workbench used by tens of thousands of scientists across the world to analyze large biomedical datasets. Since 2005, the Galaxy project has fostered a global community focused on achieving accessible, reproducible, and collaborative research. Together, this community develops the Galaxy software framework, integrates analysis tools and visualizations into the framework, runs public servers that make Galaxy available via a web browser, performs and publishes analyses using Galaxy, leads bioinformatics workshops that introduce and use Galaxy, and develops interactive training materials for Galaxy. Over the last two years, all aspects of the Galaxy project have grown: code contributions, tools integrated, users, and training materials. Key advances in Galaxy's user interface include enhancements for analyzing large dataset collections as well as interactive tools for exploratory data analysis. Extensions to Galaxy's framework include support for federated identity and access management and increased ability to distribute analysis jobs to remote resources. New community resources include large public servers in Europe and Australia, an increasing number of regional and local Galaxy communities, and substantial growth in the Galaxy Training Network.
Subject(s)
Software , Biomedical Research , Data Analysis , Datasets as Topic , Metabolomics/methods , Metagenomics/methods , Proteomics/methods , Reproducibility of Results , Single-Cell Analysis/methods
ABSTRACT
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
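The two quantities this abstract relies on, concordance between curators and consensus labels after filtering low-agreement events, can be sketched in a few lines. The code below is an illustrative assumption rather than the SVCurator implementation: concordance is taken to be the fraction of shared events on which two curators gave the same label, and an event is kept only if its most common label reaches a hypothetical agreement threshold (`min_agreement`, set to 0.7 here purely for the example). The curator names, event identifiers, and labels are made up.

```python
from collections import Counter


def pairwise_concordance(labels_a, labels_b):
    """Fraction of events labeled by both curators on which the labels agree."""
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        return None  # no overlap, concordance undefined
    agree = sum(labels_a[event] == labels_b[event] for event in shared)
    return agree / len(shared)


def consensus_labels(curator_labels, min_agreement=0.7):
    """Keep events whose majority label reaches the agreement threshold.

    min_agreement is a hypothetical cut-off chosen for this example.
    """
    votes = {}
    for labels in curator_labels.values():
        for event, label in labels.items():
            votes.setdefault(event, Counter())[label] += 1

    consensus = {}
    for event, counter in votes.items():
        label, count = counter.most_common(1)[0]
        if count / sum(counter.values()) >= min_agreement:
            consensus[event] = label
    return consensus


# Made-up labels from three curators for three putative SVs.
curators = {
    "curator_1": {"sv1": "DEL", "sv2": "INS", "sv3": "DEL"},
    "curator_2": {"sv1": "DEL", "sv2": "INS", "sv3": "INS"},
    "curator_3": {"sv1": "DEL", "sv2": "INS", "sv3": "DEL"},
}

concordance_1_2 = pairwise_concordance(curators["curator_1"], curators["curator_2"])
high_confidence = consensus_labels(curators)
```

Here curators 1 and 2 agree on two of three events, and `sv3` is dropped from the consensus because only two of three curators agree on its label, which falls below the example threshold.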
Subject(s)
Genome, Human , Genomic Structural Variation , Heuristics , Humans , INDEL Mutation
ABSTRACT
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.
Subject(s)
Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results
ABSTRACT
Research in population genetics and evolutionary biology has always provided a computational backbone for life sciences as a whole. Today, evolutionary and population biology reasoning is essential for the interpretation of large, complex datasets that are characteristic of all domains of today's life sciences, ranging from cancer biology to microbial ecology. This situation makes algorithms and software tools developed by our community more important than ever before. This means that we, developers of software tools for molecular evolutionary analyses, now have a shared responsibility to make these tools accessible using modern technological developments as well as to provide adequate documentation and training.
Subject(s)
Biological Evolution , Computational Biology , Software/standards
ABSTRACT
Complex biomedical analyses require the use of multiple software tools in concert and remain challenging for much of the biomedical research community. We introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource that currently supports the streamlined interaction of 20 bioinformatics tools and data resources. To facilitate integrative analysis by non-programmers, it offers a growing set of 'recipes', short workflows to guide investigators through high-utility analysis tasks.
Subject(s)
Algorithms , Chromosome Mapping/methods , Computational Biology/methods , Databases, Genetic , Genome, Human/genetics , Software , Data Mining , Humans , Internet , Systems Integration
ABSTRACT
High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report, we highlight recently added features enabling biomedical analyses on a large scale.
Subject(s)
Computational Biology/statistics & numerical data , Datasets as Topic/statistics & numerical data , User-Computer Interface , Biomedical Research , Computational Biology/methods , Databases, Genetic , Humans , Internet , Reproducibility of Results
ABSTRACT
The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother-child pairs of European ancestry (a total of 156 samples, each sequenced at ~20,000× per site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-associated heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the effective size of the germ-line mtDNA bottleneck at only ~30-35 (interquartile range from 9 to 141). Accounting for heteroplasmies, we estimated the mtDNA germ-line mutation rate at 1.3 × 10^-8 (interquartile range from 4.2 × 10^-9 to 4.1 × 10^-8) mutations per site per year, an order of magnitude higher than for nuclear DNA. Notably, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. This study also took advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. Our results can be used to predict the transmission of disease-causing mtDNA variants and illuminate evolutionary dynamics of the mitochondrial genome.
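Why a small bottleneck can turn a low maternal frequency into a high one in a child can be illustrated with a simple sampling model. The sketch below assumes, purely for illustration, that a child's heteroplasmy frequency is set by drawing N segregating mtDNA units at random from the mother (a single round of binomial sampling), in which case the variance of the child's frequency around the maternal frequency p is p(1 - p)/N. This textbook model is an assumption; the abstract does not describe the estimation procedure the authors used, and the function names and the example values (p = 0.10, N = 30) are chosen here only for demonstration.

```python
import random
import statistics


def expected_shift_variance(p, n_bottleneck):
    """Variance of child frequency under one round of binomial sampling."""
    return p * (1.0 - p) / n_bottleneck


def simulate_child_frequencies(p, n_bottleneck, n_children=20000, seed=0):
    """Draw child heteroplasmy frequencies by sampling n_bottleneck units.

    Illustrative model only: each sampled unit carries the variant
    independently with probability p (the maternal frequency).
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_children):
        mutant_units = sum(rng.random() < p for _ in range(n_bottleneck))
        freqs.append(mutant_units / n_bottleneck)
    return freqs


maternal_frequency = 0.10  # example value, not taken from the study
bottleneck_size = 30       # within the ~30-35 range quoted in the abstract

children = simulate_child_frequencies(maternal_frequency, bottleneck_size)
theory = expected_shift_variance(maternal_frequency, bottleneck_size)
empirical = statistics.pvariance(children)
share_doubled = sum(f >= 2 * maternal_frequency for f in children) / len(children)
```

Under these assumptions the simulated variance tracks p(1 - p)/N, and `share_doubled` gives the fraction of simulated children whose frequency is at least twice the mother's. Increasing `bottleneck_size` shrinks both quantities, which is the intuition behind the link between bottleneck size and drastic frequency shifts.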
Subject(s)
DNA, Mitochondrial/genetics , Germ Cells/metabolism , Inheritance Patterns/genetics , Maternal Age , Age Factors , Child , Disease/genetics , Female , Gene Frequency/genetics , Humans , INDEL Mutation/genetics , Reproducibility of Results , Sequence Analysis, DNA
ABSTRACT
UNLABELLED: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy's built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which contains the same openness, reproducibility and provenance that is afforded to Galaxy's analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time. AVAILABILITY AND IMPLEMENTATION: The Galaxy Data Manager framework is implemented in Python and has been integrated as part of the core Galaxy platform. Individual Data Manager tools can be defined locally or installed from a ToolShed, allowing the Galaxy community to define additional Data Manager tools as needed, with full versioning and dependency support.
Subject(s)
Software , Humans , Reproducibility of Results
ABSTRACT
Viral helicases are promising targets for the development of antiviral therapies. Given their vital function of unwinding double-stranded nucleic acids, inhibiting them blocks the viral replication cycle. Previous studies have elucidated key structural details of these helicases, including the location of substrate binding sites and flexible domains, and have identified potential inhibitors. Here we present a series of new Galaxy tools and workflows for performing and analyzing molecular dynamics simulations of viral helicases. We first validate them by demonstrating recapitulation of data from previous simulations of Zika (NS3) and SARS-CoV-2 (NSP13) helicases in apo form and in complex with inhibitors. We further demonstrate the utility and generalizability of these Galaxy workflows by applying them to new cases, proving their usefulness as a widely accessible method for exploring antiviral activity.