2.
Bioinformatics ; 37(17): 2563-2569, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33693556

ABSTRACT

MOTIVATION: The processing of k-mers (subsequences of length k) is at the foundation of many sequence processing algorithms in bioinformatics, including k-mer counting for genome size estimation, genome assembly, and taxonomic classification for metagenomics. Minimizers (ordered m-mers where m < k) are often used to group k-mers into bins as a first step in such processing. However, minimizers are known to generate bins of very different sizes, which can pose challenges for distributed and parallel processing, as well as generally increase memory requirements. Furthermore, although various minimizer orderings have been proposed, their practical value for improving tool efficiency has not yet been fully explored. RESULTS: We present Discount, a distributed k-mer counting tool based on Apache Spark, which we use to investigate the behaviour of various minimizer orderings in practice when applied to metagenomics data. Using this tool, we then introduce the universal frequency ordering, a new combination of frequency-sampled minimizers and universal k-mer hitting sets, which yields both evenly distributed binning and small bin sizes. We show that this ordering allows Discount to perform distributed k-mer counting on a large dataset in as little as 1/8 of the memory of comparable approaches, making it the most efficient out-of-core distributed k-mer counting method available. AVAILABILITY AND IMPLEMENTATION: Discount is GPL licensed and available at https://github.com/jtnystrom/discount. The data underlying this article are available in the article and in its online supplementary material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
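
The binning step described in this abstract can be illustrated with a minimal single-machine sketch. It is not Discount's implementation, which is distributed on Apache Spark and uses the universal frequency ordering. The sketch assumes a plain lexicographic ordering of m-mers, and the function names `minimizer` and `bin_kmers` are hypothetical. Even on a toy sequence it shows how one minimizer can attract most of the k-mers, which is the uneven bin size problem the abstract mentions.

```python
from collections import Counter, defaultdict


def minimizer(kmer, m):
    """Return the smallest m-mer inside kmer.

    Assumption: m-mers are ordered lexicographically. Other orderings
    (for example frequency-based ones) would replace the default
    comparison used by min().
    """
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def bin_kmers(seq, k, m):
    """Group every k-mer of seq into a bin keyed by its minimizer.

    Each bin is a Counter mapping k-mer -> number of occurrences, so
    counting happens bin by bin, as in minimizer-based k-mer counters.
    """
    bins = defaultdict(Counter)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        bins[minimizer(kmer, m)][kmer] += 1
    return dict(bins)


# Toy input (hypothetical, not taken from the article).
bins = bin_kmers("ACGTACGT", k=4, m=2)
for key, counts in sorted(bins.items()):
    print(key, dict(counts), "bin size:", sum(counts.values()))
```

With this toy input, four of the five k-mers fall into the bin for the minimizer `AC` and only one into the bin for `CG`, a small-scale version of the skew that alternative orderings aim to reduce.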

3.
Heliyon ; 6(8): e04618, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32904262

ABSTRACT

Multi-omics analyses, combining transcriptomics, genomics, proteomics, and so on, have led to important insights in many areas of biology and medicine. To support these analyses, software that can handle the difficulties associated with multi-omics datasets is crucial. Here, we describe Panomicon, a web-based, interactive analysis environment for multi-omics data. Building on Toxygates, a tool previously created to study single-omics data that features interactive clustering, heatmaps, and user data uploads, Panomicon introduces improvements for the storage and handling of additional omics types, as well as tools for the generation and visualization of interaction networks between different types of omics data. Panomicon is a new type of environment for the collaborative study of multi-omics data, both for users uploading data to our server and for groups wishing to host their own deployment of Panomicon. We demonstrate Panomicon's capabilities by revisiting a microRNA-mRNA interaction networks study in a non-small cell lung cancer dataset.

4.
Genome Biol ; 19(1): 112, 2018 08 17.
Article in English | MEDLINE | ID: mdl-30115128

ABSTRACT

BACKGROUND: Numerous scaffold-level sequences for wheat are now being released and, in this context, we report on a strategy for improving the overall assembly to a level comparable to that of the human genome. RESULTS: Using chromosome 7A of wheat as a model, sequence-finished megabase-scale sections of this chromosome were established by combining a new independent assembly using a bacterial artificial chromosome (BAC)-based physical map, BAC pool paired-end sequencing, chromosome-arm-specific mate-pair sequencing and Bionano optical mapping with the International Wheat Genome Sequencing Consortium RefSeq v1.0 sequence and its underlying raw data. The combined assembly results in 18 super-scaffolds across the chromosome. The value of finished genome regions is demonstrated for two approximately 2.5 Mb regions associated with yield and the grain quality phenotype of fructan carbohydrate grain levels. In addition, the 50 Mb centromere region analysis incorporates cytological data highlighting the importance of non-sequence data in the assembly of this complex genome region. CONCLUSIONS: Sufficient genome sequence information is shown to now be available for the wheat community to produce sequence-finished releases of each chromosome of the reference genome. The high-level completion identified that an array of seven fructosyl transferase genes underpins grain quality and that yield attributes are affected by five F-box-only-protein-ubiquitin ligase domain and four root-specific lipid transfer domain genes. The completed sequence also includes the centromere.


Subject(s)
Agriculture , Genome, Plant , Optical Phenomena , Physical Chromosome Mapping/methods , Triticum/genetics , Centromere/metabolism , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Plant/genetics , Fructans/analysis , Seeds/genetics
5.
Sci Rep ; 7(1): 1390, 2017 05 03.
Article in English | MEDLINE | ID: mdl-28469246

ABSTRACT

Toxygates was originally released as a user-friendly interface to enhance the accessibility of the large-scale toxicogenomics database, Open TG-GATEs, generated by the Japanese Toxicogenomics Project. Since the original release, significant new functionality has been added to enable users to perform sophisticated computational analysis with only modest bioinformatics skills. The new features include an orthologous mode for data comparison among different species, interactive clustering and heatmap visualisation, enrichment analysis of gene sets, and user data uploading. In a case study, we use these new functions to study the hepatotoxicity of peroxisome proliferator-activated receptor alpha (PPARα) agonist WY-14643. Our findings suggest that WY-14643 caused hypertrophy in the bile duct by intracellular Ca2+ dysregulation, which resulted in the induction of genes in a non-canonical WNT/Ca2+ signalling pathway. With this new release of Toxygates, we provide a suite of tools that allow anyone to carry out in-depth analysis of toxicogenomics in Open TG-GATEs, and of any other dataset that is uploaded.


Subject(s)
Computational Biology/methods , Toxicogenetics/methods , Animals , Bile Ducts/drug effects , Bile Ducts/metabolism , Cluster Analysis , Databases, Factual , Drug Discovery , Gene Expression , Humans , Hypertrophy/chemically induced , Hypertrophy/genetics , PPAR alpha/agonists , Pyrimidines/toxicity , Signal Transduction/drug effects , Software , User-Computer Interface
6.
PLoS One ; 9(6): e99030, 2014.
Article in English | MEDLINE | ID: mdl-24918583

ABSTRACT

Prioritising candidate genes for further experimental characterisation is an essential, yet challenging task in biomedical research. One way of achieving this goal is to identify specific biological themes that are enriched within the gene set of interest to obtain insights into the biological phenomena under study. Biological pathway data have been particularly useful in identifying functional associations of genes and/or gene sets. However, biological pathway information as compiled in varied repositories often differs in scope and content, preventing a more effective and comprehensive characterisation of gene sets. Here we describe a new approach to constructing biologically coherent gene sets from pathway data in major public repositories and employing them for functional analysis of large gene sets. We first revealed significant overlaps in gene content between different pathways and then defined a clustering method based on the shared gene content and the similarity of gene overlap patterns. We established the biological relevance of the constructed pathway clusters using independent quantitative measures and we finally demonstrated the effectiveness of the constructed pathway clusters in comparative functional enrichment analysis of gene sets associated with diverse human diseases gathered from the literature. The pathway clusters and gene mappings have been integrated into the TargetMine data warehouse and are likely to provide a concise, manageable and biologically relevant means of functional analysis of gene sets and to facilitate candidate gene prioritisation.
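
The idea of grouping pathways by shared gene content can be sketched as follows. The abstract does not specify the similarity measure or the clustering procedure, so this sketch swaps in two assumptions: the Jaccard index as the overlap measure and single-linkage grouping above a fixed threshold. The function names, pathway names, and gene names are all hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two gene sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pathways(pathways, threshold):
    """Group pathways whose gene sets overlap at or above threshold.

    Assumption: single-linkage grouping, i.e. two pathways end up in
    the same cluster if a chain of sufficiently similar pairs connects
    them. Implemented with a small union-find structure.
    """
    names = sorted(pathways)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(pathways[first], pathways[second]) >= threshold:
                parent[find(second)] = find(first)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Placeholder pathways and genes, for illustration only.
example = {
    "pathway1": {"geneA", "geneB", "geneC"},
    "pathway2": {"geneB", "geneC", "geneD"},
    "pathway3": {"geneX", "geneY"},
}
print(cluster_pathways(example, threshold=0.5))
```

Here `pathway1` and `pathway2` share two of four distinct genes (Jaccard index 0.5) and are merged, while `pathway3` stays on its own.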


Subject(s)
Biomedical Research , Cluster Analysis
7.
Bioinformatics ; 29(23): 3080-6, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24048354

ABSTRACT

MOTIVATION: In early stage drug development, it is desirable to assess the toxicity of compounds as quickly as possible. Biomarker genes can help predict whether a candidate drug will adversely affect a given individual, but they are often difficult to discover. In addition, the mechanism of toxicity of many drugs and common compounds is not yet well understood. The Japanese Toxicogenomics Project provides a large database of systematically collected microarray samples from rats (liver, kidney and primary hepatocytes) and human cells (primary hepatocytes) after exposure to 170 different compounds in different dosages and at different time intervals. However, until now, no intuitive user interface has been publicly available, making it time-consuming and difficult for individual researchers to explore the data. RESULTS: We present Toxygates, a user-friendly integrated analysis platform for this database. Toxygates combines a large microarray dataset with the ability to fetch semantic linked data, such as pathways, compound-protein interactions and orthologs, on demand. It can also perform pattern-based compound ranking with respect to the expression values of a set of relevant candidate genes. By using Toxygates, users can freely interrogate the transcriptome's response to particular compounds and conditions, which enables deep exploration of toxicity mechanisms.


Subject(s)
Biomarkers/analysis , Databases, Factual , Gene Expression Regulation/drug effects , Software , Toxicogenetics , Animals , Dose-Response Relationship, Drug , Glutathione/metabolism , Hepatocytes/drug effects , Humans , Kidney/drug effects , Liver/drug effects , Oligonucleotide Array Sequence Analysis/methods , Rats , User-Computer Interface