Results 1 - 20 of 112
1.
BMC Bioinformatics ; 25(1): 200, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802733

ABSTRACT

BACKGROUND: The initial version of SEDA assists life science researchers without programming skills with the preparation of DNA and protein sequence FASTA files for multiple bioinformatics applications. However, the initial version of SEDA lacks a command-line interface for more advanced users and does not allow the creation of automated analysis pipelines. RESULTS: The present paper discusses the updates of the new SEDA release, including the addition of a complete command-line interface, new functionalities like gene annotation, a framework for automated pipelines, and improved integration in Linux environments. CONCLUSION: SEDA is an open-source Java application and can be installed using the different distributions available ( https://www.sing-group.org/seda/download.html ) as well as through a Docker image ( https://hub.docker.com/r/pegi3s/seda ). It is released under a GPL-3.0 license, and its source code is publicly accessible on GitHub ( https://github.com/sing-group/seda ). The software version at the time of submission is archived at Zenodo (version v1.6.0, http://doi.org/10.5281/zenodo.10201605 ).


Subjects
Computational Biology, Software, Computational Biology/methods, Data Analysis
2.
BMC Bioinformatics ; 25(1): 110, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38475691

ABSTRACT

BACKGROUND: The analysis of large and complex biological datasets in bioinformatics poses a significant challenge to achieving reproducible research outcomes due to inconsistencies and the lack of standardization in the analysis process. These issues can lead to discrepancies in results, undermining the credibility and impact of bioinformatics research and creating mistrust in the scientific process. To address these challenges, open science practices such as sharing data, code, and methods have been encouraged. RESULTS: CREDO, a Customizable, REproducible, DOcker file generator for bioinformatics applications, has been developed as a tool to mitigate reproducibility issues by building and distributing Docker containers with embedded bioinformatics tools. CREDO simplifies the process of generating Docker images, facilitating reproducibility and efficient research in bioinformatics. The crucial step in generating a Docker image is creating the Dockerfile, which requires incorporating heterogeneous packages and environments such as Bioconductor and Conda. CREDO stores all required package information and dependencies in a GitHub-compatible format to enhance Docker image reproducibility, allowing easy image creation from scratch. The user-friendly GUI and CREDO's ability to generate modular Docker images make it an ideal tool for life scientists to efficiently create Docker images. Overall, CREDO is a valuable tool for addressing reproducibility issues in bioinformatics research and promoting open science practices.


Subjects
Computational Biology, Software, Reproducibility of Results, Computational Biology/methods
3.
Molecules ; 29(13), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38999091

ABSTRACT

In the organic laboratory, the 13C nuclear magnetic resonance (NMR) spectrum of a newly synthesized compound remains an essential step in elucidating its structure. For the chemist, the interpretation of such a spectrum, which is a set of chemical-shift values, is made easier if he/she has a tool capable of predicting with sufficient accuracy the carbon-shift values from the structure he/she intends to prepare. As there are few open-source methods for accurately estimating this property, we applied our graph-machine approach to build models capable of predicting the chemical shifts of carbons. For this study, we focused on benzene compounds, building an optimized model derived from training a database of 10,577 chemical shifts originating from 2026 structures that contain up to ten types of non-carbon atoms, namely H, O, N, S, P, Si, and halogens. It provides a training root-mean-squared relative error (RMSRE) of 0.5%, i.e., a root-mean-squared error (RMSE) of 0.6 ppm, and a mean absolute error (MAE) of 0.4 ppm for estimating the chemical shifts of the 10k carbons. The predictive capability of the graph-machine model is also compared with that of three commercial packages on a dataset of 171 original benzenic structures (1012 chemical shifts). The graph-machine model proves to be very efficient in predicting chemical shifts, with an RMSE of 0.9 ppm, and compares favorably with the RMSEs of 3.4, 1.8, and 1.9 ppm computed with the ChemDraw v. 23.1.1.3, ACD v. 11.01, and MestReNova v. 15.0.1-35756 packages respectively. Finally, a Docker-based tool is proposed to predict the carbon chemical shifts of benzenic compounds solely from their SMILES codes.
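
The three error measures quoted in this abstract (RMSE, MAE, and RMSRE) can be illustrated with a minimal Python sketch. The shift values below are invented for illustration and are not taken from the paper, and the RMSRE is assumed here to be the error relative to the observed value, which the abstract does not spell out.

```python
import math

def rmse(observed, predicted):
    """Root-mean-squared error, in the same unit as the data (ppm here)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error, in the same unit as the data."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n

def rmsre(observed, predicted):
    """Root-mean-squared relative error (assumed relative to the observed value)."""
    n = len(observed)
    return math.sqrt(sum(((p - o) / o) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical 13C chemical shifts in ppm, for illustration only.
observed = [128.5, 137.9, 115.2]
predicted = [129.0, 137.0, 115.6]

print(f"RMSE  = {rmse(observed, predicted):.3f} ppm")
print(f"MAE   = {mae(observed, predicted):.3f} ppm")
print(f"RMSRE = {100 * rmsre(observed, predicted):.3f} %")
```

Because aromatic carbon shifts mostly lie above 100 ppm, a relative error of about 0.5% and an absolute error of about 0.6 ppm describe errors of similar size, which is consistent with the pairing of figures in the abstract.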

4.
BMC Bioinformatics ; 24(1): 49, 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36792982

ABSTRACT

BACKGROUND: A wide range of tools are available for the detection of copy number variants (CNVs) from whole-genome sequencing (WGS) data. However, none of them focus on clinically-relevant CNVs, such as those that are associated with known genetic syndromes. Such variants are often large in size, typically 1-5 Mb, but currently available CNV callers have been developed and benchmarked for the discovery of smaller variants. Thus, the ability of these programs to detect tens of real syndromic CNVs remains largely unknown. RESULTS: Here we present ConanVarvar, a tool which implements a complete workflow for the targeted analysis of large germline CNVs from WGS data. ConanVarvar comes with an intuitive R Shiny graphical user interface and annotates identified variants with information about 56 associated syndromic conditions. We benchmarked ConanVarvar and four other programs on a dataset containing real and simulated syndromic CNVs larger than 1 Mb. In comparison to other tools, ConanVarvar reports 10-30 times fewer false-positive variants without compromising sensitivity and is quicker to run, especially on large batches of samples. CONCLUSIONS: ConanVarvar is a useful instrument for primary analysis in disease sequencing studies, where large CNVs could be the cause of disease.


Subjects
DNA Copy Number Variations, Germ Cells, Whole Genome Sequencing, Workflow, High-Throughput Nucleotide Sequencing
5.
BMC Bioinformatics ; 24(1): 288, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37464285

ABSTRACT

BACKGROUND: PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence FASTA file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub ( https://github.com/marcelauliano/MitoHiFi ). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).


Subjects
Mitochondrial Genome, Phylogeny, RNA, Eukaryotes, DNA Sequence Analysis, High-Throughput Nucleotide Sequencing
6.
Brief Bioinform ; 22(3), 2021 May 20.
Article in English | MEDLINE | ID: mdl-32436933

ABSTRACT

Whole exome sequencing (WES) is a powerful approach for discovering sequence variants in cancer cells but its time effectiveness is limited by the complexity and issues of WES data analysis. Here we present iWhale, a customizable pipeline based on Docker and SCons, reliably detecting somatic variants by three complementary callers (MuTect2, Strelka2 and VarScan2). The results are combined to obtain a single variant call format file for each sample and variants are annotated by integrating a wide range of information extracted from several reference databases, ultimately allowing variant and gene prioritization according to different criteria. iWhale allows users to conduct a complex series of WES analyses with a powerful yet customizable and easy-to-use tool, running on most operating systems (macOS, GNU/Linux and Windows). iWhale code is freely available at https://github.com/alexcoppe/iWhale and the Docker image is downloadable from https://hub.docker.com/r/alexcoppe/iwhale.


Subjects
Computational Biology/methods, Mutation, Neoplasms/genetics, High-Throughput Nucleotide Sequencing/methods, Humans, Exome Sequencing
7.
Glob Chang Biol ; 29(15): 4440-4452, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37303068

ABSTRACT

Dynamic Global Vegetation Models (DGVMs) provide a state-of-the-art process-based approach to study the complex interplay between vegetation and its physical environment. For example, they help to predict how terrestrial plants interact with climate, soils, disturbance and competition for resources. We argue that there is untapped potential for the use of DGVMs in ecological and ecophysiological research. One fundamental barrier to realizing this potential is that many researchers with relevant expertise (ecology, plant physiology, soil science, etc.) lack access to the technical resources or awareness of the research potential of DGVMs. Here we present the Land Sites Platform (LSP): new software that facilitates single-site simulations with the Functionally Assembled Terrestrial Ecosystem Simulator, an advanced DGVM coupled with the Community Land Model. The LSP includes a Graphical User Interface and an Application Programming Interface, which improve the user experience and lower the technical thresholds for installing these model architectures and setting up model experiments. The software is distributed via version-controlled containers; researchers and students can run simulations directly on their personal computers or servers, with relatively low hardware requirements, and on different operating systems. Version 1.0 of the LSP supports site-level simulations. We provide input data for 20 established geo-ecological observation sites in Norway and workflows to add generic sites from public global datasets. The LSP makes standard model experiments with default data easily achievable (e.g., for educational or introductory purposes) while retaining flexibility for more advanced scientific uses. We further provide tools to visualize the model input and output, including simple examples to relate predictions to local observations. The LSP improves access to land surface and DGVM modelling as a building block of community cyberinfrastructure that may inspire new avenues for mechanistic ecosystem research across disciplines.


Subjects
Climate, Ecosystem, Humans, Plant Physiological Phenomena, Software, Plants
8.
Sensors (Basel) ; 23(17), 2023 Sep 04.
Article in English | MEDLINE | ID: mdl-37688118

ABSTRACT

There is a rapid increase in the number of edge devices in IoT solutions, generating vast amounts of data that need to be processed and analyzed efficiently. Traditional cloud-based architectures can face latency, bandwidth, and privacy challenges when dealing with this data flood. There is currently no unified approach to the creation of edge computing solutions. This work addresses this problem by exploring containerization for data processing solutions at the network's edge. The current approach involves creating a specialized application compatible with the device used. Another approach involves using containerization for deployment and monitoring. The heterogeneity of edge environments would greatly benefit from a universal modular platform. Our proposed edge computing-based framework implements a streaming extract, transform, and load pipeline for data processing and analysis using ZeroMQ as the communication backbone and containerization for scalable deployment. Results demonstrate the effectiveness of the proposed framework, making it suitable for time-sensitive IoT applications.

9.
Molecules ; 28(19), 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37836648

ABSTRACT

The refractive index (RI) of liquids is a key physical property of molecular compounds and materials. In addition to its ubiquitous role in physics, it is also exploited to impart specific optical properties (transparency, opacity, and gloss) to materials and various end-use products. Since few methods exist to accurately estimate this property, we have designed a graph machine model (GMM) capable of predicting the RI of liquid organic compounds containing up to 16 different types of atoms and effective in discriminating between stereoisomers. Using 8267 carefully checked RI values from the literature and the corresponding 2D organic structures, the GMM provides a training root mean square relative error of less than 0.5%, i.e., an RMSE of 0.004 for the estimation of the refractive index of the 8267 compounds. The GMM predictive ability is also compared to that obtained by several fragment-based approaches. Finally, a Docker-based tool is proposed to predict the RI of organic compounds solely from their SMILES code. The GMM developed is easy to apply, as shown by the video tutorials provided on YouTube.

10.
BMC Bioinformatics ; 23(1): 498, 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36402955

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) are a powerful method to detect associations between variants and phenotypes. A GWAS requires several complex computations with large data sets, and many steps may need to be repeated with varying parameters. Manual running of these analyses can be tedious, error-prone and hard to reproduce. RESULTS: The H3AGWAS workflow from the Pan-African Bioinformatics Network for H3Africa is a powerful, scalable and portable workflow implementing pre-association analysis, various association testing methods and post-association analysis of results. CONCLUSIONS: The workflow scales from laptop to cluster to cloud (e.g., SLURM, AWS Batch, Azure). All required software is containerised and can run under Docker or Singularity.


Subjects
Computational Biology, Genome-Wide Association Study, Workflow, Computational Biology/methods, Software, Phenotype
11.
Sensors (Basel) ; 22(13), 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35808390

ABSTRACT

Maritime Domain Awareness (MDA) is a strategic field of study that seeks to provide a coastal country with effective monitoring of its maritime resources and its Exclusive Economic Zone (EEZ). In this scope, a Maritime Monitoring System (MMS) aims to leverage active surveillance of military and non-military activities at sea using sensing devices such as radars, optronics, Automatic Identification Systems (AISs), and IoT, among others. However, deploying a nation-scale MMS imposes great challenges regarding the scalability and cybersecurity of this heterogeneous system. Aiming to address these challenges, this work explores the use of blockchain to leverage MMS cybersecurity and to ensure the integrity, authenticity, and availability of relevant navigation data. We propose a prototype built on a permissioned blockchain solution using Hyperledger Fabric, a robust, modular, and efficient open-source blockchain platform. We evaluate this solution's performance through a practical experiment where the prototype receives sensing data from a Software-Defined-Radio (SDR)-based low-cost AIS receiver built with a Raspberry Pi. In order to reduce scalability attrition, we developed a dockerized blockchain client that is easily deployed on a large scale. Furthermore, we determined, through extensive experimentation, the client's optimal hardware configuration, also aiming to reduce implementation and maintenance costs. The performance results provide a quantitative analysis of the blockchain technology overhead and its impact in terms of Quality of Service (QoS), demonstrating the feasibility and effectiveness of our solution in the scope of an MMS using AIS data.


Subjects
Blockchain, Computer Security, Humans, Physiologic Monitoring, Software, Technology
12.
BMC Bioinformatics ; 22(1): 298, 2021 Jun 03.
Article in English | MEDLINE | ID: mdl-34082707

ABSTRACT

BACKGROUND: RNA-Seq is a well-established technology extensively used for transcriptome profiling, allowing the analysis of coding and non-coding RNA molecules. However, this technology produces a vast amount of data requiring more sophisticated computational approaches for their analysis than other traditional technologies such as Real-Time PCR or microarrays, strongly discouraging non-expert users. For this reason, dozens of pipelines have been deployed for the analysis of RNA-Seq data. Although interesting, these present several limitations and their usage requires a technical background, which may be uncommon in small research laboratories. Therefore, the application of these technologies in such contexts is still limited and causes a clear bottleneck in knowledge advancement. RESULTS: Motivated by these considerations, we have developed RNAdetector, a new free, cross-platform and user-friendly RNA-Seq data analysis software that can be used locally or in cloud environments through an easy-to-use Graphical User Interface, allowing the analysis of coding and non-coding RNAs from RNA-Seq datasets of any sequenced biological species. CONCLUSIONS: RNAdetector is new software that fills an essential gap between the needs of biomedical and research labs to process RNA-Seq data and their common lack of technical background in performing such analysis, which usually relies on outsourcing such steps to third-party bioinformatics facilities or using expensive commercial software.


Subjects
Cloud Computing, Data Analysis, Computational Biology, High-Throughput Nucleotide Sequencing, RNA-Seq, RNA Sequence Analysis, Software
13.
BMC Bioinformatics ; 22(1): 85, 2021 Feb 24.
Article in English | MEDLINE | ID: mdl-33627090

ABSTRACT

BACKGROUND: Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDT). Reference samples and benchmark calls published by Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such methods is not uniform across the different genomic regions of interest and variant types. Several benchmarking methods such as hap.py, vcfeval, and vcflib are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such methods and experienced bioinformaticians to interpret the results. In addition, these methods are dependent on the hardware, operating system and other software libraries, making it impossible to reliably repeat the analytical assessment, when any of the underlying dependencies change in the assay. Here we present a scalable and reproducible, cloud-based benchmarking workflow that is independent of the laboratory and the technician executing the workflow, or the underlying compute hardware used to rapidly and continually assess the performance of LDT assays, across their regions of interest and reportable range, using a broad set of benchmarking samples. RESULTS: The benchmarking workflow was used to evaluate the performance characteristics for secondary analysis pipelines commonly used by Clinical Genomics laboratories in their LDT assays such as the GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. Five reference sample truth sets generated by Genome in a Bottle (GIAB) consortium, six samples from the Personal Genome Project (PGP) and several samples with validated clinically relevant variants from the Centers for Disease Control were used in this work. The performance characteristics were evaluated and compared for multiple reportable ranges, such as whole exome and the clinical exome. CONCLUSIONS: We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision and sensitivity for germline SNPs and InDels within a reportable range using whole exome or genome sequencing data. Combining these benchmarking results with validation using known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting.


Subjects
Benchmarking, High-Throughput Nucleotide Sequencing, Exome, Germ Cells, Single Nucleotide Polymorphism, Software, Workflow
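
The specificity, precision and sensitivity named in this abstract can be sketched in Python as follows. This is a simplified illustration and not the workflow described in the paper: the function name, the variant tuples and the site count are hypothetical, variants are matched by exact (chromosome, position, ref, alt) identity, and every assessed site is assumed to carry at most one variant.

```python
def benchmark_metrics(truth, calls, n_sites):
    """Compare a call set with a truth set over n_sites assessed positions.

    truth, calls: sets of (chrom, pos, ref, alt) tuples (hypothetical format).
    n_sites: number of positions in the reportable range (assumed known).
    """
    tp = len(truth & calls)        # variants present in both sets
    fp = len(calls - truth)        # called but absent from the truth set
    fn = len(truth - calls)        # in the truth set but not called
    tn = n_sites - tp - fp - fn    # remaining sites: correctly left uncalled
    return {
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }

# Invented example data, for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}

for name, value in benchmark_metrics(truth, calls, n_sites=1000).items():
    print(f"{name}: {value:.4f}")
```

Dedicated comparison tools such as those listed in the abstract also handle differing variant representations and genotype matching, which exact tuple matching does not attempt.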
14.
Genomics ; 112(1): 127-134, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30926570

ABSTRACT

Next generation sequencing techniques produce enormous amounts of data, but their analysis and visualization remain a big challenge. To address this, we have developed Genome Annotator Light (GAL), a Docker-based package for genome analysis and data visualization. GAL integrates several existing tools and in-house programs inside a Docker container for systematic analysis and visualization of genomes through a web browser. GAL takes a variety of input types ranging from raw FASTA files to fully annotated files, processes them through a standard annotation pipeline and visualizes the results in a web browser. Comparative genomic analysis is performed automatically within a given taxonomic class. GAL creates an interactive genome browser with clickable genomic feature tracks, a local BLAST-able database, a query page, and on-the-fly downstream data analysis using EMBOSS, among other features. Overall, GAL is extremely convenient, portable and platform independent. Fully integrated web resources can be easily created and deployed, e.g. www.eumicrobedb.org/cglab, for our in-house genomes. GAL is freely available at https://hub.docker.com/u/cglabiicb/.


Subjects
Genomics/methods, Software, Computer Graphics
15.
Sensors (Basel) ; 21(4), 2021 Feb 16.
Article in English | MEDLINE | ID: mdl-33669314

ABSTRACT

Containers virtually package a piece of software and share the host Operating System (OS) upon deployment. This makes them notably lightweight and suitable for dynamic service deployment at the network edge and on Internet of Things (IoT) devices for reduced latency and energy consumption. Data collection, computation, and now intelligence are included in a variety of IoT devices, which have very tight latency and energy consumption conditions. Recent studies satisfy the latency condition through containerized services deployment on IoT devices and gateways. They fail to account for the limited energy and computing resources of these devices, which limit scalability and concurrent services deployment. This paper aims to establish guidelines and identify critical factors for containerized services deployment on resource-constrained IoT devices. For this purpose, two container orchestration tools (i.e., Docker Swarm and Kubernetes) are tested and compared on a baseline IoT gateways testbed. Experiments use Deep Learning driven data analytics and Intrusion Detection System services, and evaluate the time it takes to prepare and deploy a container (creation time), Central Processing Unit (CPU) utilization for concurrent containers deployment, memory usage under different traffic loads, and energy consumption. The results indicate that container creation time and memory usage are decisive factors for containerized microservice architecture.

16.
Sensors (Basel) ; 21(12), 2021 Jun 10.
Article in English | MEDLINE | ID: mdl-34200575

ABSTRACT

Industrial Internet of Things (IIoT) applications are being used more and more frequently. Data collected by various sensors can be used to provide innovative digital services supporting increasing efficiency or cost reduction. The implementation of such applications requires the integration and analysis of heterogeneous data coming from a broad variety of sensors. To support these steps, this paper introduces OPAL, a software toolbox consolidating several software components for the semantically annotated integration and analysis of IoT-data. Data storage is realized in a standardized and INSPIRE-compliant way utilizing the SensorThings API. Supporting a broad variety of use cases, OPAL provides several import adapters to access data sources with various protocols (e.g., the OPC UA protocol, which is often used in industrial environments). In addition, a unified management and execution environment, called PERMA, is introduced to allow the programming language independent integration of algorithms.


Subjects
Internet of Things, Software, Algorithms
17.
Entropy (Basel) ; 23(7), 2021 Jul 19.
Article in English | MEDLINE | ID: mdl-34356455

ABSTRACT

With the advent of microservice-based software architectures, an increasing number of modern cloud environments and enterprises use operating system level virtualization, which is often referred to as container infrastructures. Docker Swarm is one of the most popular container orchestration infrastructures, providing high availability and fault tolerance. Occasionally, discovered container escape vulnerabilities allow adversaries to execute code on the host operating system and operate within the cloud infrastructure. We show that Docker Swarm is currently not secured against misbehaving manager nodes. This allows a high impact, high probability privilege escalation attack, which we refer to as leadership hijacking, the possibility of which is neglected by the current cloud security literature. Cloud lateral movement and defense evasion payloads allow an adversary to leverage the Docker Swarm functionality to control each and every host in the underlying cluster. We demonstrate an end-to-end attack, in which an adversary with access to an application running on the cluster achieves full control of the cluster. To reduce the probability of a successful high impact attack, container orchestration infrastructures must reduce the trust level of participating nodes and, in particular, incorporate adversary immune leader election algorithms.

18.
BMC Genomics ; 21(1): 193, 2020 Mar 02.
Article in English | MEDLINE | ID: mdl-32122303

ABSTRACT

BACKGROUND: Genome assemblies are foundational for understanding the biology of a species. They provide a physical framework for mapping additional sequences, thereby enabling characterization of, for example, genomic diversity and differences in gene expression across individuals and tissue types. Quality metrics for genome assemblies gauge both the completeness and contiguity of an assembly and help provide confidence in downstream biological insights. To compare quality across multiple assemblies, a set of common metrics is typically calculated and then compared to one or more gold standard reference genomes. While several tools exist for calculating individual metrics, applications providing comprehensive evaluations of multiple assembly features are, perhaps surprisingly, lacking. Here, we describe a new toolkit that integrates multiple metrics to characterize both assembly and gene annotation quality in a way that enables comparison across multiple assemblies and assembly types. RESULTS: Our application, named GenomeQC, is an easy-to-use and interactive web framework that integrates various quantitative measures to characterize genome assemblies and annotations. GenomeQC provides researchers with a comprehensive summary of these statistics and allows for benchmarking against gold standard reference assemblies. CONCLUSIONS: The GenomeQC web application is implemented in R/Shiny version 1.5.9 and Python 3.6 and is freely available at https://genomeqc.maizegdb.org/ under the GPL license. All source code and a containerized version of the GenomeQC pipeline are available in the GitHub repository https://github.com/HuffordLab/GenomeQC.


Subjects
Genomics/methods, Chromosome Mapping, Computational Biology/methods, High-Throughput Nucleotide Sequencing, Humans, Molecular Sequence Annotation, DNA Sequence Analysis, Software
19.
BMC Genomics ; 21(Suppl 3): 163, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32241255

ABSTRACT

BACKGROUND: DNA methylation is a crucial epigenomic mechanism in various biological processes. Using whole-genome bisulfite sequencing (WGBS) technology, methylated cytosine sites can be revealed at the single nucleotide level. However, the WGBS data analysis process is usually complicated and challenging. RESULTS: To alleviate the associated difficulties, we integrated the WGBS data processing steps and downstream analysis into a two-phase approach. First, we set up the required tools in Galaxy and developed workflows to calculate the methylation level from raw WGBS data and generate a methylation status summary, the mtable. This computation environment is wrapped into the Docker container image DocMethyl, which allows users to rapidly deploy an executable environment without tedious software installation and library dependency problems. Next, the mtable files were uploaded to the web server EpiMOLAS_web to link with the gene annotation databases that enable rapid data retrieval and analyses. CONCLUSION: To our knowledge, the EpiMOLAS framework, consisting of DocMethyl and EpiMOLAS_web, is the first approach to include containerization technology and a web-based system for WGBS data analysis from raw data processing to downstream analysis. EpiMOLAS will help users cope with their WGBS data and also conduct reproducible analyses of publicly available data, thereby gaining insights into the mechanisms underlying complex biological phenomena. The Galaxy Docker image DocMethyl is available at https://hub.docker.com/r/lsbnb/docmethyl/. EpiMOLAS_web is publicly accessible at http://symbiosis.iis.sinica.edu.tw/epimolas/.


Subjects
Computational Biology/methods, DNA Methylation/genetics, Human Genome/genetics, Whole Genome Sequencing/methods, CpG Islands/genetics, Humans, Internet, Software
20.
J Transl Med ; 18(1): 494, 2020 Dec 30.
Article in English | MEDLINE | ID: mdl-33380328

ABSTRACT

BACKGROUND: Tracking the genetic variability of Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) is a crucial challenge, mainly to identify target sequences in order to generate robust vaccines and neutralizing monoclonal antibodies, but also to track viral genetic temporal and geographic evolution and to mine for variants associated with reduced or increased disease severity. Several online tools and bioinformatic phylogenetic analyses have been released, but the main interest lies in the Spike protein, which is the pivotal element of current vaccine design, and in the Receptor Binding Domain (RBD), which accounts for most of the neutralizing antibody activity. METHODS: Here, we present an open-source bioinformatic protocol and a web portal focused on SARS-CoV-2 single mutations and minimal consensus sequence building as a companion vaccine design tool. Furthermore, we provide immunogenomic analyses to understand the impact of the most frequent RBD variations. RESULTS: Results on the whole GISAID sequence dataset at the time of writing (October 2020) reveal an emerging mutation, S477N, located on the central part of the Spike protein Receptor Binding Domain, the Receptor Binding Motif. Immunogenomic analyses revealed some variation in mutated epitope MHC compatibility, T-cell recognition, and B-cell epitope probability for the most frequent human HLAs. CONCLUSIONS: This work provides a framework able to track down SARS-CoV-2 genomic variability.


Subjects
COVID-19/virology, SARS-CoV-2/genetics, Coronavirus Spike Glycoprotein/genetics, Binding Sites/genetics, COVID-19/epidemiology, COVID-19 Vaccines/genetics, Computational Biology, Data Mining, Genetic Variation, Humans, Immunogenetic Phenomena, Molecular Models, Mutation, Pandemics/statistics & numerical data, Protein Domains, Virus Receptors, SARS-CoV-2/immunology, Software, Coronavirus Spike Glycoprotein/immunology, Translational Biomedical Research