Results 1 - 20 of 206
1.
Proc Natl Acad Sci U S A ; 121(35): e2402435121, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39159372

ABSTRACT

Firmly anchored on observational data, giant radio lobes from massive galaxies hosting supermassive black holes can exert a major negative feedback effect, by endowing the intergalactic gas with significant magnetic pressure, hence retarding or preventing gas accretion onto less massive halos in the vicinity. Since massive galaxies are largely responsible for producing the giant radio lobes, this effect is expected to be stronger in more overdense large-scale environments, such as protoclusters, than in underdense regions, such as voids. We show that by redshift [Formula: see text] halos with masses up to [Formula: see text] are significantly hindered from accreting gas due to this effect for radio bubble volume filling fraction of [Formula: see text], respectively. Since the vast majority of the stars in the universe at [Formula: see text][Formula: see text] 2 to 3 form precisely in those halos, this negative feedback process is likely one major culprit for causing the global downturn in star formation in the universe. It also provides a natural explanation for the rather sudden flattening of the slope of the galaxy rest-frame UV luminosity function around [Formula: see text]. A cross-correlation between protoclusters and Faraday rotation measures may test the predicted magnetic field. Inclusion of this external feedback process in the next generation of cosmological simulations may be imperative.

2.
Nano Lett ; 24(1): 104-113, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-37943097

ABSTRACT

An optical meron is a type of nonplanar topological texture mainly observed in surface plasmon polaritons and at highly symmetric points of photonic crystals in reciprocal space. Here, we report Poynting-vector merons formed in the real space of a photonic crystal under Γ-point illumination. Optical merons can be utilized for subwavelength-resolution manipulation of nanoparticles, resembling a topological Hall effect on electrons via magnetic merons. In particular, staggered merons and antimerons impose strong radiation pressure on large gold nanoparticles (AuNPs), while focused hot spots in antimerons generate dominant optical gradient forces on small AuNPs. Synergistically, differently sized AuNPs in a still environment can be trapped or orbit in opposite directions, mimicking a coupled galaxy system. They can also be separated with a 10 nm precision when applying a flow velocity of >1 mm/s. Our study unravels a novel way to exploit topological textures for optical manipulation with deep-subwavelength precision and switchable topology in a lossless environment.

3.
Sensors (Basel) ; 24(14)2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39066055

ABSTRACT

The purpose of this study was to examine the validity of two wearable smartwatches (the Apple Watch 6 (AW) and the Galaxy Watch 4 (GW)) and smartphone applications (Apple Health for iPhone mobiles and Samsung Health for Android mobiles) for estimating step counts in daily life. A total of 104 healthy adults (36 AW, 25 GW, and 43 smartphone application users) were engaged in daily activities for 24 h while wearing an ActivPAL accelerometer on the thigh and a smartwatch on the wrist. The validities of the smartwatch and smartphone estimates of step counts were evaluated relative to criterion values obtained from an ActivPAL accelerometer. The strongest relationship between the ActivPAL accelerometer and the devices was found for the AW (r = 0.99, p < 0.001), followed by the GW (r = 0.82, p < 0.001), and the smartphone applications (r = 0.93, p < 0.001). For overall group comparisons, the MAPE (Mean Absolute Percentage Error) values (computed as the average absolute value of the group-level errors) were 6.4%, 10.5%, and 29.6% for the AW, GW, and smartphone applications, respectively. The results of the present study indicate that the AW and GW showed strong validity in measuring steps, while the smartphone applications did not provide reliable step counts in free-living conditions.
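The two agreement statistics reported in this abstract can be illustrated with a minimal Python sketch. It assumes the conventional per-observation definition of MAPE and the Pearson correlation coefficient; the study's own group-level computation may differ. The step counts and function names are hypothetical and are not taken from the study.

```python
import math


def mape(estimates, criterion):
    """Mean absolute percentage error of estimates relative to criterion values."""
    errors = [100.0 * abs(e - c) / c for e, c in zip(estimates, criterion)]
    return sum(errors) / len(errors)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily step counts: criterion device vs. a wrist-worn device.
criterion_steps = [8000, 10000, 12000, 6000]
watch_steps = [7600, 10500, 11400, 6300]

print(round(mape(watch_steps, criterion_steps), 2))
print(round(pearson_r(watch_steps, criterion_steps), 3))
```

A high correlation and a low MAPE measure different things: a device can track the criterion closely in rank order while still being biased in absolute terms, which is why the abstract reports both.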


Subject(s)
Accelerometry , Activities of Daily Living , Mobile Applications , Smartphone , Wearable Electronic Devices , Humans , Male , Female , Adult , Accelerometry/instrumentation , Accelerometry/methods , Young Adult , Monitoring, Ambulatory/methods , Monitoring, Ambulatory/instrumentation , Walking/physiology , Middle Aged
4.
BMC Bioinformatics ; 24(1): 263, 2023 Jun 23.
Article in English | MEDLINE | ID: mdl-37353753

ABSTRACT

BACKGROUND: Protein-protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein-protein interactions and produce high-quality multimeric structural models. RESULTS: Application of our method to the Human and Yeast genomes yields protein-protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV2's non-structural protein 3. We also produced models of SARS-CoV2's spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. CONCLUSIONS: The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.


Subject(s)
COVID-19 , Protein Interaction Mapping , Humans , RNA, Viral/metabolism , SARS-CoV-2 , Saccharomyces cerevisiae/metabolism
5.
BMC Bioinformatics ; 24(1): 446, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38012574

ABSTRACT

BACKGROUND: Galaxy is a web-based open-source platform for scientific analyses. Researchers use thousands of high-quality tools and workflows for their respective analyses in Galaxy. A tool recommender system predicts a collection of tools that can be used to extend an analysis. In this work, a tool recommender system is developed by training a transformer on workflows available on Galaxy Europe and its performance is compared to other neural networks such as recurrent, convolutional and dense neural networks. RESULTS: The transformer neural network achieves two times faster convergence, has significantly lower model usage (model reconstruction and prediction) time and shows a better generalisation that goes beyond training workflows than the older tool recommender system created using RNN in Galaxy. In addition, the transformer also outperforms CNN and DNN on several key indicators. It achieves a faster convergence time, lower model usage time, and higher quality tool recommendations than CNN. Compared to DNN, it converges faster to a higher precision@k metric (approximately 0.98 by transformer compared to approximately 0.9 by DNN) and shows higher quality tool recommendations. CONCLUSION: Our work shows a novel usage of transformers to recommend tools for extending scientific workflows. A more robust tool recommendation model, created using a transformer, having significantly lower usage time than RNN and CNN, higher precision@k than DNN, and higher quality tool recommendations than all three neural networks, will benefit researchers in creating scientifically significant workflows and exploratory data analysis in Galaxy. Additionally, the ability to train faster than all three neural networks imparts more scalability for training on larger datasets consisting of millions of tool sequences. Open-source scripts to create the recommendation model are available under MIT licence at https://github.com/anuprulez/galaxy_tool_recommendation_transformers.
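The precision@k metric mentioned in this abstract can be sketched in a few lines of Python. This is the common textbook definition (the share of the top k recommendations that are correct); the exact variant used in the paper is not stated in the abstract and may differ. The tool names and the function name are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended tools that appear in the relevant set.

    The denominator is always k, so a list shorter than k is penalised.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for tool in top_k if tool in relevant)
    return hits / k


# Hypothetical ranked recommendations and ground-truth next tools.
ranked = ["tool_a", "tool_b", "tool_c", "tool_d"]
truth = {"tool_a", "tool_c", "tool_x"}

print(precision_at_k(ranked, truth, 2))
```

Under this definition, a value near 0.98 would mean that almost every tool in the top k is one that actually follows the given tool sequence in some workflow.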


Subject(s)
Neural Networks, Computer , Software , Workflow , Data Analysis , Europe
6.
Genes Cells ; 27(12): 706-718, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36181413

ABSTRACT

Genome-editing using the CRISPR-Cas9 system has the potential to substantially accelerate crop breeding. Since off-target editing is one of the problems, a reliable method for comprehensively detecting off-target sites is needed. A number of in silico methods based on homology to the on-target sequence have been developed; however, prediction without false negatives is still under discussion. In this study, we performed a SITE-Seq analysis to predict potential off-target sites. SITE-Seq analysis is a comprehensive method that can detect double-strand breaks in vitro. Furthermore, we developed a systematic method using SITE-Seq in combination with the web-based Galaxy system (Galaxy for Cut Site Detection), which can perform reproducible analyses without command line operations. We conducted a SITE-Seq analysis of a rice genome targeted by OsFH15 gRNA-Cas9 as a model, and found 41 candidate off-target sites in the annotated regions. Detailed amplicon-sequencing revealed mutations at one off-target site in actual genome-edited rice. Since this off-target site has an uncommon protospacer adjacent motif, it is difficult to predict using in silico methods alone. Therefore, we propose a novel off-target assessment scheme for genome-edited crops that combines the prediction of off-target candidates by SITE-Seq and in silico programs and the validation of off-target sites by amplicon-sequencing.


Subject(s)
Oryza , Oryza/genetics , Internet
7.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32728687

ABSTRACT

Transcriptional switch (TS) is a widely observed phenomenon caused by changes in the relative expression of transcripts from the same gene, in spatial, temporal or other dimensions. TS has been associated with human diseases, plant development and stress responses. Its investigation is often hampered by a lack of suitable tools allowing comprehensive and flexible TS analysis for high-throughput RNA sequencing (RNA-Seq) data. Here, we present deepTS, a user-friendly web-based implementation that enables a fully interactive, multifunctional identification, visualization and analysis of TS events for large-scale RNA-Seq datasets from pairwise, temporal and population experiments. deepTS offers rich functionality to streamline RNA-Seq-based TS analysis for both model and non-model organisms and for those with or without reference transcriptome. The presented case studies highlight the capabilities of deepTS and demonstrate its potential for the transcriptome-wide TS analysis of pairwise, temporal and population RNA-Seq data. We believe deepTS will help research groups, regardless of their informatics expertise, perform accessible, reproducible and collaborative TS analyses of large-scale RNA-Seq data.


Subject(s)
Models, Genetic , RNA-Seq , RNA , Transcriptome , RNA/biosynthesis , RNA/genetics
8.
Expert Rev Proteomics ; 20(11): 251-266, 2023.
Article in English | MEDLINE | ID: mdl-37787106

ABSTRACT

INTRODUCTION: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include availability of software that would span diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and associated training resources to empower researcher-driven analyses. EXPERT OPINION: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.


Subject(s)
Proteomics , Humans , Computational Biology/methods , Mass Spectrometry/methods , Proteomics/methods , Software
9.
Microb Cell Fact ; 22(1): 227, 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37932726

ABSTRACT

BACKGROUND: Not changing the native constitution of genes prior to their expression by a heterologous host can affect the amount of proteins synthesized as well as their folding, hampering their activity and even cell viability. Over the past decades, several strategies have been developed to optimize the translation of heterologous genes by accommodating the difference in codon usage between species. While there have been a handful of studies assessing various codon optimization strategies, to the best of our knowledge, no research has been performed towards the evaluation and comparison of codon harmonization algorithms. To highlight their importance and encourage meaningful discussion, we compared different open-source codon harmonization tools pertaining to their in silico performance, and we investigated the influence of different gene-specific factors. RESULTS: In total, 27 genes were harmonized with four tools toward two different heterologous hosts. The difference in %MinMax values between the harmonized and the original sequences was calculated (ΔMinMax), and statistical analysis of the obtained results was carried out. It became clear that not all tools perform similarly, and the choice of tool should depend on the intended application. Almost all biological factors under investigation (GC content, RNA secondary structures and choice of heterologous host) had a significant influence on the harmonization results and thus must be taken into account. These findings were substantiated using a validation dataset consisting of 8 strategically chosen genes. CONCLUSIONS: Due to the size of the dataset, no complex models could be developed. However, this initial study showcases significant differences between the results of various codon harmonization tools. 
Although more elaborate investigation is needed, it is clear that biological factors such as GC content, RNA secondary structures and heterologous hosts must be taken into account when selecting the codon harmonization tool.


Subject(s)
Algorithms , Proteins , Codon , Proteins/genetics , Codon Usage , Biological Factors
10.
Funct Integr Genomics ; 22(6): 1433-1448, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36227427

ABSTRACT

Biological processes result from interactions among molecules and cell-to-cell communications. In the last 50 years, network theory has empowered advances in understanding molecular networks' structure and dynamics that regulate biological systems. Adopting a network data analysis point of view at more laboratories might enrich their research capacity to generate forward working hypotheses. This work briefly describes network theory origins and provides basic graph analysis principles in biological systems, specific centrality measurements, and the main models for network structures. Also, we describe a workflow employing user-friendly free platforms to process, construct, and analyze transcriptome data from a network perspective. With this assay, we expect to encourage the implementation of network theory analysis on biological data in everyday laboratory research.
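As one concrete instance of the centrality measurements this abstract refers to, the following Python sketch computes degree centrality for a small undirected network. It is an illustration only: the abstract does not say which measures or platforms the workflow uses, and the gene names and edges are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected simple graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical co-expression edges between genes.
edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"), ("geneB", "geneC")]
centrality = degree_centrality(edges)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])
```

Nodes with the highest degree centrality are candidate hubs; other measures such as betweenness or closeness weight a node's position differently and would need a dedicated graph library or a longer implementation.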


Subject(s)
Software , Transcriptome
11.
Brief Bioinform ; 21(2): 676-686, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30815667

ABSTRACT

A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, owing to the deficiencies of specially designed analytical systems, short reads unmapped to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that can facilitate the large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. By taking advantage of machine learning techniques, CAFU addresses the issue of accurately identifying the species origin of transcripts assembled using unmapped reads from mixed-species samples. CAFU also represents an innovation in that it provides a comprehensive collection of functions required for transcript confidence evaluation, coding potential calculation, sequence and expression characterization and function annotation. These functions and their dependencies have been integrated into a Galaxy framework that provides access to CAFU via a user-friendly interface, dramatically simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and Zea mays (maize) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.


Subject(s)
Computational Biology/methods , Sequence Analysis, RNA/methods , Genes, Plant , Humans , RNA, Messenger/genetics , Triticum/genetics , User-Computer Interface , Zea mays/genetics
12.
Stat Med ; 41(18): 3466-3478, 2022 08 15.
Article in English | MEDLINE | ID: mdl-35574857

ABSTRACT

In research synthesis, publication bias (PB) refers to the phenomenon that the publication of a study is associated with the direction and statistical significance of its results. Consequently, it may lead to biased (commonly optimistic) estimates of treatment effects. Visualization tools such as funnel plots have been widely used to investigate PB in univariate meta-analyses. The trim and fill procedure is a nonparametric method to identify and adjust for PB. It is popular among applied scientists due to its simplicity. However, most visualization tools and PB correction methods focus on univariate outcomes. For a meta-analysis with multiple outcomes, the conventional univariate trim and fill method can only account for different outcomes separately and thus may lead to inconsistent conclusions. In this article, we propose a bivariate trim and fill procedure to simultaneously account for PB in the presence of two outcomes that are possibly associated. Based on a recently developed galaxy plot for bivariate meta-analysis, the proposed procedure uses a data-driven imputation algorithm to detect and adjust PB. The method relies on the symmetry of the galaxy plot and assumes that some studies are suppressed based on a linear combination of outcomes. The method projects bivariate outcomes along a particular direction, uses the univariate trim and fill method to estimate the number of trimmed and filled studies, and yields consistent conclusions about PB. The proposed approach is validated using simulated data and is applied to a meta-analysis of the efficacy and safety of antidepressant drugs.
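The projection step described in this abstract can be sketched as follows. This Python fragment only reduces each pair of outcomes to a single score along a chosen direction; it does not implement the trim and fill estimator itself, and the fixed angle used here is an assumption, whereas the paper describes a data-driven procedure. The effect estimates and the function name are hypothetical.

```python
import math


def project_outcomes(outcomes, angle):
    """Project (y1, y2) pairs onto the unit vector (cos(angle), sin(angle))."""
    c, s = math.cos(angle), math.sin(angle)
    return [y1 * c + y2 * s for y1, y2 in outcomes]


# Hypothetical study-level effect estimates for two outcomes.
studies = [(0.30, -0.10), (0.45, 0.05), (0.10, -0.20)]
scores = project_outcomes(studies, math.pi / 4)
print([round(v, 3) for v in scores])
```

Once every study is reduced to one projected score, a univariate procedure can be applied to those scores, which is how a single direction yields one consistent conclusion for both outcomes.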


Subject(s)
Publication Bias , Humans
13.
Int J Mol Sci ; 23(9)2022 Apr 28.
Article in English | MEDLINE | ID: mdl-35563261

ABSTRACT

Nucleosomes are basic units of DNA packing in eukaryotes. Their structure is well conserved from yeast to human and consists of the histone octamer core and 147 bp DNA wrapped around it. Nucleosomes are bound to a majority of the eukaryotic genomic DNA, including its regulatory regions. Hence, they also play a major role in gene regulation. For the latter, their precise positioning on DNA is essential. In the present paper, we describe Galaxy dnpatterntools, a software package for nucleosome DNA sequence analysis and mapping. This software will be useful for computational biology practitioners to conduct more profound studies of gene regulatory mechanisms.


Subject(s)
Chromatin Assembly and Disassembly , Nucleosomes , DNA/metabolism , Humans , Nucleosomes/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Sequence Analysis, DNA
14.
BMC Bioinformatics ; 22(Suppl 15): 544, 2021 Nov 08.
Article in English | MEDLINE | ID: mdl-34749633

ABSTRACT

BACKGROUND: Improving the availability and usability of data and analytical tools is a critical precondition for further advancing modern biological and biomedical research. For instance, one of the many ramifications of the COVID-19 global pandemic has been to make even more evident the importance of having bioinformatics tools and data readily actionable by researchers through convenient access points and supported by adequate IT infrastructures. One of the most successful efforts in improving the availability and usability of bioinformatics tools and data is represented by the Galaxy workflow manager and its thriving community. In 2020 we introduced Laniakea, a software platform conceived to streamline the configuration and deployment of "on-demand" Galaxy instances over the cloud. By facilitating the set-up and configuration of Galaxy web servers, Laniakea provides researchers with a powerful and highly customisable platform for executing complex bioinformatics analyses. The system can be accessed through a dedicated and user-friendly web interface that allows the Galaxy web server's initial configuration and deployment. RESULTS: "Laniakea@ReCaS", the first instance of a Laniakea-based service, is managed by ELIXIR-IT and was officially launched in February 2020, after about one year of development and testing that involved several users. Researchers can request access to Laniakea@ReCaS through an open-ended call for use-cases. Ten project proposals have been accepted since then, totalling 18 Galaxy on-demand virtual servers that employ ~ 100 CPUs, ~ 250 GB of RAM and ~ 5 TB of storage and serve several different communities and purposes. Herein, we present eight use cases demonstrating the versatility of the platform. 
CONCLUSIONS: During this first year of activity, the Laniakea-based service emerged as a flexible platform that facilitated the rapid development of bioinformatics tools, the efficient delivery of training activities, and the provision of public bioinformatics services in different settings, including food safety and clinical research. Laniakea@ReCaS provides a proof of concept of how enabling access to appropriate, reliable IT resources and ready-to-use bioinformatics tools can considerably streamline researchers' work.


Subject(s)
COVID-19 , Cloud Computing , Computational Biology , Humans , SARS-CoV-2 , Software
15.
J Proteome Res ; 20(12): 5419-5423, 2021 12 03.
Article in English | MEDLINE | ID: mdl-34709836

ABSTRACT

Mass spectrometry-based proteomics is a high-throughput technology generating ever-larger amounts of data per project. However, storing, processing, and interpreting these data can be a challenge. A key element in simplifying this process is the development of interactive frameworks focusing on visualization that can greatly simplify both the interpretation of data and the generation of new knowledge. Here we present PeptideShaker Online, a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data, from raw file conversion to interactive visualization of the resulting data. Storage and processing of the data are performed via the versatile Galaxy platform (through SearchGUI, PeptideShaker, and moFF), while the interaction with the results happens via a locally installed web server, thus enabling researchers to process and interpret their own data without requiring advanced bioinformatics skills or direct access to compute-intensive infrastructures. The source code, additional documentation, and a fully functional demo are available at https://github.com/barsnes-group/peptide-shaker-online.


Subject(s)
Proteomics , Software , Computational Biology/methods , Internet , Mass Spectrometry , Proteomics/methods
16.
BMC Genomics ; 22(1): 114, 2021 Feb 10.
Article in English | MEDLINE | ID: mdl-33568057

ABSTRACT

BACKGROUND: Processing and analyzing whole genome sequencing (WGS) is computationally intense: a single Illumina MiSeq WGS run produces ~1 million 250-base-pair reads for each of 24 samples. This poses significant obstacles for smaller laboratories, or laboratories not affiliated with larger projects, which may not have dedicated bioinformatics staff or computing power to effectively use genomic data to protect public health. Building on the success of the cloud-based Galaxy bioinformatics platform (http://galaxyproject.org), already known for its user-friendliness and powerful WGS analytical tools, the Center for Food Safety and Applied Nutrition (CFSAN) at the U.S. Food and Drug Administration (FDA) created a customized 'instance' of the Galaxy environment, called GalaxyTrakr (https://www.galaxytrakr.org), for use by laboratory scientists performing food-safety regulatory research. The goal was to enable laboratories outside of the FDA internal network to (1) perform quality assessments of sequence data, (2) identify links between clinical isolates and positive food/environmental samples, including those at the National Center for Biotechnology Information sequence read archive (https://www.ncbi.nlm.nih.gov/sra/), and (3) explore new methodologies such as metagenomics. GalaxyTrakr hosts a variety of free and adaptable tools and provides the data storage and computing power to run the tools. These tools support coordinated analytic methods and consistent interpretation of results across laboratories. Users can create and share tools for their specific needs and use sequence data generated locally and elsewhere. RESULTS: In its first full year (2018), GalaxyTrakr processed over 85,000 jobs and went from 25 to 250 users, representing 53 different public and state health laboratories, academic institutions, international health laboratories, and federal organizations. By mid-2020, it had grown to 600 registered users and processed over 450,000 analytical jobs.
To illustrate how laboratories are making use of this resource, we describe how six institutions use GalaxyTrakr to quickly analyze and review their data. Instructions for participating in GalaxyTrakr are provided. CONCLUSIONS: GalaxyTrakr advances food safety by providing reliable and harmonized WGS analyses for public health laboratories and promoting collaboration across laboratories with differing resources. Anticipated enhancements to this resource will include workflows for additional foodborne pathogens, viruses, and parasites, as well as new tools and services.
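The data volume quoted in the BACKGROUND of this abstract can be made concrete with a short calculation. The figures below are the approximate ones given there (about 1 million reads of 250 base pairs for each of 24 samples); the variable names are illustrative, and the result counts sequenced bases only, not file sizes, which also depend on quality scores and compression.

```python
READS_PER_SAMPLE = 1_000_000  # approximate, from the abstract
READ_LENGTH_BP = 250
SAMPLES_PER_RUN = 24

bases_per_sample = READS_PER_SAMPLE * READ_LENGTH_BP
bases_per_run = bases_per_sample * SAMPLES_PER_RUN

print(f"{bases_per_sample:,} bases per sample")
print(f"{bases_per_run:,} bases per run")
```

That is roughly 250 million bases per sample and 6 billion bases per run, which gives a sense of why laboratories without dedicated computing resources benefit from a hosted platform.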


Subject(s)
Metagenomics , Public Health , Computational Biology , High-Throughput Nucleotide Sequencing , Humans , Whole Genome Sequencing
17.
Metabolomics ; 17(10): 91, 2021 09 25.
Article in English | MEDLINE | ID: mdl-34562172

ABSTRACT

INTRODUCTION: Inductively coupled plasma mass spectrometry (ICP-MS) experiments generate complex multi-dimensional data sets that require specialist data analysis tools. OBJECTIVE: Here we describe tools to facilitate analysis of the ionome composed of high-throughput elemental profiling data. METHODS: IonFlow is a Galaxy tool written in R for ionomics data analysis and is freely accessible at https://github.com/wanchanglin/ionflow. It is designed as a pipeline that can process raw data to enable exploration and interpretation using multivariate statistical techniques and network-based algorithms, including principal components analysis, hierarchical clustering, relevance network extraction and analysis, and gene set enrichment analysis. RESULTS AND CONCLUSION: The pipeline is described and tested on two benchmark data sets of the haploid S. cerevisiae ionome and of the human HeLa cell ionome.


Subject(s)
Saccharomyces cerevisiae , Cluster Analysis , HeLa Cells , Humans , Principal Component Analysis
18.
Stud Hist Philos Sci ; 88: 220-236, 2021 08.
Article in English | MEDLINE | ID: mdl-34224943

ABSTRACT

Galaxies are the basic structural element of the universe; galaxy formation theory seeks to explain how these structures came to be. I trace some of the foundational ideas in galaxy formation, with emphasis on the need for non-baryonic cold dark matter. Many elements of early theory did not survive contact with observations of low surface brightness galaxies, leading to the need for auxiliary hypotheses like feedback. The failure points often trace to the surprising predictive successes of an alternative to dark matter, the Modified Newtonian Dynamics (MOND). While dark matter models are flexible in accommodating observations, they do not provide the predictive capacity of MOND. If the universe is made of cold dark matter, why does MOND get any predictions right?


Subject(s)
Galaxies
19.
J Proteome Res ; 19(7): 2772-2785, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32396365

ABSTRACT

Multiomics approaches focused on mass spectrometry (MS)-based data, such as metaproteomics, utilize genomic and/or transcriptomic sequencing data to generate a comprehensive protein sequence database. These databases can be very large, containing millions of sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to sequences to generate peptide spectrum matches (PSMs). Here, we describe and evaluate a sectioning method for generating an enriched database for those protein sequences that are most likely present in the sample. Our evaluation demonstrates how this method helps to increase the sensitivity of PSMs while maintaining acceptable false discovery rate statistics, offering a flexible alternative to traditional large database searching, as well as previously described two-step database searching methods for large sequence database applications. Furthermore, implementation in the Galaxy platform provides access to an automated and customizable workflow for carrying out the method. Additionally, the results of this study provide valuable insights into the advantages and limitations offered by available methods aimed at addressing challenges of genome-guided, large database applications in proteomics. Relevant raw data has been made available at https://zenodo.org/ using data set identifier "3754789" and https://arcticdata.io/catalog using data set identifier "A2VX06340".


Subject(s)
Proteomics , Tandem Mass Spectrometry , Databases, Protein , Genomics , Peptides/genetics , Software
20.
BMC Genomics ; 21(Suppl 3): 163, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32241255

ABSTRACT

BACKGROUND: DNA methylation is a crucial epigenomic mechanism in various biological processes. Using whole-genome bisulfite sequencing (WGBS) technology, methylated cytosine sites can be revealed at the single nucleotide level. However, the WGBS data analysis process is usually complicated and challenging. RESULTS: To alleviate the associated difficulties, we integrated the WGBS data processing steps and downstream analysis into a two-phase approach. First, we set up the required tools in Galaxy and developed workflows to calculate the methylation level from raw WGBS data and generate a methylation status summary, the mtable. This computation environment is wrapped into the Docker container image DocMethyl, which allows users to rapidly deploy an executable environment without tedious software installation and library dependency problems. Next, the mtable files were uploaded to the web server EpiMOLAS_web to link with the gene annotation databases that enable rapid data retrieval and analyses. CONCLUSION: To our knowledge, the EpiMOLAS framework, consisting of DocMethyl and EpiMOLAS_web, is the first approach to include containerization technology and a web-based system for WGBS data analysis from raw data processing to downstream analysis. EpiMOLAS will help users cope with their WGBS data and also conduct reproducible analyses of publicly available data, thereby gaining insights into the mechanisms underlying complex biological phenomena. The Galaxy Docker image DocMethyl is available at https://hub.docker.com/r/lsbnb/docmethyl/. EpiMOLAS_web is publicly accessible at http://symbiosis.iis.sinica.edu.tw/epimolas/.


Subject(s)
Computational Biology/methods , DNA Methylation/genetics , Genome, Human/genetics , Whole Genome Sequencing/methods , CpG Islands/genetics , Humans , Internet , Software