Results 1 - 20 of 41
1.
BMC Bioinformatics ; 25(1): 110, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38475691

ABSTRACT

BACKGROUND: The analysis of large and complex biological datasets in bioinformatics poses a significant challenge to achieving reproducible research outcomes due to inconsistencies and the lack of standardization in the analysis process. These issues can lead to discrepancies in results, undermining the credibility and impact of bioinformatics research and creating mistrust in the scientific process. To address these challenges, open science practices such as sharing data, code, and methods have been encouraged. RESULTS: CREDO, a Customizable, REproducible, DOcker file generator for bioinformatics applications, has been developed as a tool to moderate reproducibility issues by building and distributing docker containers with embedded bioinformatics tools. CREDO simplifies the process of generating Docker images, facilitating reproducibility and efficient research in bioinformatics. The crucial step in generating a Docker image is creating the Dockerfile, which requires incorporating heterogeneous packages and environments such as Bioconductor and Conda. CREDO stores all required package information and dependencies in a Github-compatible format to enhance Docker image reproducibility, allowing easy image creation from scratch. The user-friendly GUI and CREDO's ability to generate modular Docker images make it an ideal tool for life scientists to efficiently create Docker images. Overall, CREDO is a valuable tool for addressing reproducibility issues in bioinformatics research and promoting open science practices.
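
The Dockerfile-assembly step described in this abstract can be illustrated with a minimal sketch. It is not CREDO's implementation: the function name `render_dockerfile`, the image tag, and the package pins are hypothetical, and the base image is assumed to already provide `conda` and R with BiocManager.

```python
def render_dockerfile(base_image, conda_pkgs, bioc_pkgs):
    """Return Dockerfile text that installs Conda and Bioconductor packages.

    Assumption: `base_image` already ships `conda` and R with BiocManager.
    """
    lines = [f"FROM {base_image}"]
    if conda_pkgs:
        # Pin every Conda version so that a rebuild resolves the same packages.
        pins = " ".join(
            f"{name}={version}" for name, version in sorted(conda_pkgs.items())
        )
        lines.append(f"RUN conda install -y -c conda-forge -c bioconda {pins}")
    for pkg in sorted(bioc_pkgs):
        # One RUN line per Bioconductor package keeps the layers modular.
        lines.append(f"RUN R -e \"BiocManager::install('{pkg}')\"")
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    "example/bioinfo-base:1.0",  # hypothetical image tag
    {"samtools": "1.17"},        # illustrative version pin
    ["DESeq2"],
)
print(dockerfile)
```

Keeping the package list as plain data, separate from the rendering step, is one way the same image could be regenerated from scratch later.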


Subjects
Computational Biology , Software , Reproducibility of Results , Computational Biology/methods
2.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37079732

ABSTRACT

MOTIVATION: The transition from evaluating a single time point to examining the entire dynamic evolution of a system is possible only in the presence of the proper framework. The strong variability of dynamic evolution makes the definition of an explanatory procedure for data fitting and clustering challenging. RESULTS: We developed CONNECTOR, a data-driven framework able to analyze and inspect longitudinal data in a straightforward and revealing way. When used to analyze tumor growth kinetics over time in 1599 patient-derived xenograft growth curves from ovarian and colorectal cancers, CONNECTOR allowed the aggregation of time-series data through an unsupervised approach in informative clusters. We give a new perspective of mechanism interpretation, specifically, we define novel model aggregations and we identify unanticipated molecular associations with response to clinically approved therapies. AVAILABILITY AND IMPLEMENTATION: CONNECTOR is freely available under GNU GPL license at https://qbioturin.github.io/connector and https://doi.org/10.17504/protocols.io.8epv56e74g1b/v1.


Subjects
Software , Humans , Animals , Cluster Analysis , Time Factors , Disease Models, Animal , Risk Assessment
3.
J Biomed Inform ; 148: 104546, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37984546

ABSTRACT

OBJECTIVE: Computational models are at the forefront of the pursuit of personalized medicine thanks to their descriptive and predictive abilities. In the presence of complex and heterogeneous data, patient stratification is a prerequisite for effective precision medicine, since disease development is often driven by individual variability and unpredictable environmental events. Herein, we present the GreatNector workflow as a valuable tool for (i) the analysis and clustering of patient-derived longitudinal data, and (ii) the simulation of the resulting model of patient-specific disease dynamics. METHODS: GreatNector is designed by combining an analytic strategy composed of CONNECTOR, a data-driven framework for the inspection of longitudinal data, and an unsupervised methodology to stratify the subjects with GreatMod, a quantitative modeling framework based on the Petri Net formalism and its generalizations. RESULTS: To illustrate GreatNector capabilities, we exploited longitudinal data of four immune cell populations collected from Multiple Sclerosis patients. Our main results report that the T-cell dynamics after alemtuzumab treatment separate non-responder from responder patients, and the patients in the non-responder group are characterized by an increase of the Th17 concentration around 36 months. CONCLUSION: GreatNector analysis was able to stratify individual patients into three model meta-patients whose dynamics suggested insight into patient-tailored interventions.


Subjects
Precision Medicine , Humans , Workflow , Computer Simulation , Precision Medicine/methods , Cluster Analysis
4.
Int J Mol Sci ; 25(1)2023 Dec 29.
Article in English | MEDLINE | ID: mdl-38203629

ABSTRACT

Among the several mechanisms accounting for endocrine resistance in breast cancer, autophagy has emerged as an important player. Previous reports have evidenced that tamoxifen (Tam) induces autophagy and activates transcription factor EB (TFEB), which regulates the expression of genes controlling autophagy and lysosomal biogenesis. However, the mechanisms by which this occurs have not been elucidated as yet. This investigation aims at dissecting how TFEB is activated and contributes to Tam resistance in luminal A breast cancer cells. TFEB was overexpressed and prominently nuclear in Tam-resistant MCF7 cells (MCF7-TamR) compared with their parental counterpart, and this was not dependent on alterations of its nucleo-cytoplasmic shuttling. Tam promoted the release of lysosomal Ca2+ through the major transient receptor potential cation channel mucolipin subfamily member 1 (TRPML1) and two-pore channels (TPCs), which caused the nuclear translocation and activation of TFEB. Consistently, inhibiting lysosomal calcium release restored the susceptibility of MCF7-TamR cells to Tam. Our findings demonstrate that Tam drives the nuclear relocation and transcriptional activation of TFEB by triggering the release of Ca2+ from the acidic compartment, and they suggest that lysosomal Ca2+ channels may represent new druggable targets to counteract the onset of autophagy-mediated endocrine resistance in luminal A breast cancer cells.


Subjects
Calcium , Neoplasms , Tamoxifen/pharmacology , Calcium, Dietary , Autophagy , Lysosomes
5.
BMC Bioinformatics ; 22(1): 209, 2021 Apr 22.
Article in English | MEDLINE | ID: mdl-33888059

ABSTRACT

BACKGROUND: Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search of one substructure within one graph, called target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited in the context of searching in a collection of target graphs, referred to as one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches provide a fast pruning of target graphs or parts of them that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a sufficient granularity level for performing a powerful filtering step. Features are memorized in data structures allowing an efficient access. Indexing size, querying time and filtering power are key points for the development of efficient subgraph searching solutions. RESULTS: An existing approach, GRAPES, has been shown to have good performance in terms of speed-up for both one-to-one and one-to-many cases. However, it suffers in the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation showing that GRAPES-DD has substantially reduced the memory utilization compared to GRAPES without worsening the searching time. 
CONCLUSION: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising thanks to their ability to compactly encode sets by exploiting their structure and regularity, and to manipulate entire sets of elements at once instead of exploring each element explicitly. Search strategies based on Decision Diagrams make indexing more affordable for biochemical graphs and beyond, allowing us to potentially deal with huge and ever-growing collections of biochemical and biological structures.
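
The filter-and-verification scheme summarized in this abstract can be sketched in a few lines. This is a simplified illustration, not GRAPES or GRAPES-DD: the indexed feature (counts of edges by endpoint labels) and all function names are assumptions chosen for brevity, the index is a plain dictionary rather than a Decision Diagram, and verification uses brute force.

```python
from collections import Counter
from itertools import permutations


def edge_features(labels, edges):
    """Index a graph by counting its edges per sorted pair of endpoint labels."""
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def passes_filter(query_feats, target_feats):
    """Cheap pruning: the target needs at least the query's count of every feature."""
    return all(target_feats[f] >= n for f, n in query_feats.items())


def contains(query, target):
    """Expensive verification: brute-force search for a label-preserving embedding."""
    q_labels, q_edges = query
    t_labels, t_edges = target
    t_edge_set = {frozenset(e) for e in t_edges}
    q_nodes = list(q_labels)
    for image in permutations(t_labels, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        if any(q_labels[n] != t_labels[mapping[n]] for n in q_nodes):
            continue
        if all(frozenset((mapping[u], mapping[v])) in t_edge_set for u, v in q_edges):
            return True
    return False


def search(query, targets):
    """One-to-many search: filter with the index, then verify the survivors."""
    q_feats = edge_features(*query)
    index = {name: edge_features(*graph) for name, graph in targets.items()}
    candidates = [name for name, feats in index.items() if passes_filter(q_feats, feats)]
    return [name for name in candidates if contains(query, targets[name])]


# Query: a C-O-C path. Graphs are (node -> label, list of undirected edges).
query = ({"a": "C", "b": "O", "c": "C"}, [("a", "b"), ("b", "c")])
targets = {
    "g1": ({1: "C", 2: "O", 3: "C", 4: "N"}, [(1, 2), (2, 3), (3, 4)]),
    "g2": ({1: "C", 2: "C"}, [(1, 2)]),                          # pruned by the filter
    "g3": ({1: "C", 2: "O", 3: "C", 4: "O"}, [(1, 2), (3, 4)]),  # passes filter, fails verification
}
print(search(query, targets))
```

The filter never discards a true match, because an embedding maps distinct query edges to distinct target edges with the same endpoint labels; it only reduces how often the exponential verification step runs.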


Subjects
Vitis , Abstracting and Indexing , Algorithms , Databases, Factual
6.
Br J Haematol ; 194(2): 378-381, 2021 07.
Article in English | MEDLINE | ID: mdl-34002365

ABSTRACT

Minimal residual disease (MRD) determined by classic polymerase chain reaction (PCR) methods is a powerful outcome predictor in mantle cell lymphoma (MCL). Nevertheless, some technical pitfalls can reduce the rate of molecular markers. Therefore, we applied the EuroClonality-NGS IGH (next-generation sequencing immunoglobulin heavy chain) method (previously published in acute lymphoblastic leukaemia) to 20 MCL patients enrolled in an Italian phase III trial sponsored by Fondazione Italiana Linfomi. Results from this preliminary investigation show that the EuroClonality-NGS IGH method is feasible in the MCL context, detecting a molecular IGH target in 19/20 investigated cases and allowing MRD monitoring also in those patients lacking a molecular marker for classical screening approaches.


Subjects
Gene Rearrangement , High-Throughput Nucleotide Sequencing , Immunoglobulin Heavy Chains/genetics , Lymphoma, Mantle-Cell/genetics , Biomarkers, Tumor/genetics , Genes, Immunoglobulin , High-Throughput Nucleotide Sequencing/methods , Humans , Italy/epidemiology , Lymphoma, Mantle-Cell/diagnosis , Lymphoma, Mantle-Cell/epidemiology , Neoplasm, Residual/diagnosis , Neoplasm, Residual/epidemiology , Neoplasm, Residual/genetics
7.
Int J Mol Sci ; 22(23)2021 Nov 25.
Article in English | MEDLINE | ID: mdl-34884559

ABSTRACT

BACKGROUND: Biological processes are based on complex networks of cells and molecules. Single cell multi-omics is a new tool aiming to provide new insights into the complex network of events controlling the functionality of the cell. METHODS: Since single cell technologies provide many sample measurements, they are the ideal environment for the application of Deep Learning and Machine Learning approaches. An autoencoder is composed of an encoder and a decoder sub-model. An autoencoder is a very powerful tool in data compression and noise removal. However, the decoder model remains a black box from which it is impossible to depict the contribution of the single input elements. We have recently developed a new class of autoencoders, called Sparsely Connected Autoencoders (SCA), which have the advantage of providing a controlled association between the input layer and the decoder module. This new architecture has the benefit that the decoder model is no longer a black box and can be used to depict new biologically interesting features from single cell data. RESULTS: Here, we show that the SCA hidden layer can capture new information usually hidden in single cell data, such as clustering on meta-features that are difficult (e.g., transcription factor expression) or technically impossible (e.g., miRNA expression) to depict in single cell RNAseq data. Furthermore, SCA representation of cell clusters has the advantage of simulating a conventional bulk RNAseq, which is a data transformation allowing the identification of similarity among independent experiments. CONCLUSIONS: In our opinion, SCA represents the bioinformatics version of a universal "Swiss-knife" for the extraction of hidden knowledgeable features from single cell omics data.
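
One common way to obtain a sparsely connected layer is to multiply the weights by a fixed 0/1 mask, so that each hidden node only sees a chosen subset of inputs. The sketch below illustrates that general idea in plain Python; it is an assumption about how such a layer can be built, not the authors' SCA code, and the gene-to-node mask is hypothetical.

```python
def masked_dense(x, weights, mask):
    """Compute hidden[j] = sum_i x[i] * weights[i][j] * mask[i][j].

    A zero in `mask` removes the connection from input i to hidden node j,
    so each hidden node can be read as a summary of its permitted inputs.
    """
    n_hidden = len(weights[0])
    return [
        sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        for j in range(n_hidden)
    ]


# Three input genes, two hidden nodes (e.g. two hypothetical regulators).
expression = [1.0, 2.0, 3.0]
weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
mask = [[1, 0], [1, 0], [0, 1]]  # genes 0 and 1 feed node 0; gene 2 feeds node 1
hidden = masked_dense(expression, weights, mask)
print(hidden)
```

Because the mask is fixed in advance, the meaning of each hidden node is known before training, which is what makes the layer interpretable.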


Subjects
Adenocarcinoma of Lung/pathology , Cluster Analysis , Computational Biology/methods , Lung Neoplasms/pathology , Machine Learning , Neural Networks, Computer , Single-Cell Analysis/methods , Adenocarcinoma of Lung/genetics , Humans , Lung Neoplasms/genetics , Exome Sequencing
8.
Int J Mol Sci ; 22(8)2021 Apr 19.
Article in English | MEDLINE | ID: mdl-33921709

ABSTRACT

BACKGROUND: Disruption of alternative splicing (AS) is frequently observed in cancer and might represent an important signature for tumor progression and therapy. Exon skipping (ES) represents one of the most frequent AS events, and in non-small cell lung cancer (NSCLC) MET exon 14 skipping was shown to be targetable. METHODS: We constructed neural networks (NN/CNN) specifically designed to detect MET exon 14 skipping events using RNAseq data. Furthermore, for discovery purposes we also developed a sparsely connected autoencoder to identify uncharacterized MET isoforms. RESULTS: The neural networks had a Met exon 14 skipping detection rate greater than 94% when tested on a manually curated set of 690 TCGA bronchus and lung samples. When globally applied to 2605 TCGA samples, we observed that the majority of false positives was characterized by a blurry coverage of exon 14, but interestingly they share a common coverage peak in the second intron and we speculate that this event could be the transcription signature of a LINE1 (Long Interspersed Nuclear Element 1)-MET (Mesenchymal Epithelial Transition receptor tyrosine kinase) fusion. CONCLUSIONS: Taken together, our results indicate that neural networks can be an effective tool to provide a quick classification of pathological transcription events, and sparsely connected autoencoders could represent the basis for the development of an effective discovery tool.


Subjects
Deep Learning , Exons/genetics , Genetic Variation/genetics , Humans , Neural Networks, Computer
9.
BMC Bioinformatics ; 21(Suppl 17): 550, 2020 Dec 14.
Article in English | MEDLINE | ID: mdl-33308135

ABSTRACT

BACKGROUND: Multiple Sclerosis (MS) is currently the leading cause of non-traumatic disability in young adults in Europe, with more than 700,000 EU cases. Although huge strides have been made over the years, MS etiology remains partially unknown. Furthermore, the presence of various endogenous and exogenous factors can greatly influence the immune response of different individuals, making it difficult to study and understand the disease. This becomes more evident in a personalized setting, when medical doctors have to choose the best therapy for patient well-being. From this perspective, the use of stochastic models, capable of taking into consideration all the fluctuations due to unknown factors and individual variability, is highly advisable. RESULTS: We propose a new model to study the immune response in relapsing remitting MS (RRMS), the most common form of MS, which is characterized by alternating episodes of symptom exacerbation (relapses) and periods of disease stability (remission). In this new model, both the peripheral lymph node/blood vessel and the central nervous system are explicitly represented. The model was created and analysed using Epimod, our recently developed general framework for modeling complex biological systems. Then the effectiveness of our model was shown by modeling the complex immunological mechanisms characterizing RRMS during its course and under DAC administration. CONCLUSIONS: Simulation results have proven the ability of the model to reproduce in silico the immune T cell balance characterizing the RRMS course and the DAC effects. Furthermore, they confirmed the importance of a timely intervention on the disease course.


Subjects
Immune System/physiology , Models, Biological , Multiple Sclerosis, Relapsing-Remitting/immunology , User-Computer Interface , Algorithms , Daclizumab/therapeutic use , Humans , Immunosuppressive Agents/therapeutic use , Multiple Sclerosis, Relapsing-Remitting/drug therapy , Multiple Sclerosis, Relapsing-Remitting/pathology , Stochastic Processes
10.
BMC Bioinformatics ; 21(Suppl 8): 344, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938370

ABSTRACT

BACKGROUND: Emerging and re-emerging infectious diseases such as Zika, SARS, COVID-19 and pertussis pose a compelling challenge for epidemiologists due to their significant impact on global public health. In this context, computational models and computer simulations are among the available research tools that epidemiologists can exploit to better understand the spreading characteristics of these diseases and to decide on vaccination policies, human interaction controls, and other social measures to counter, mitigate or simply delay the spread of infectious diseases. Nevertheless, the construction of mathematical models for these diseases and their solution remain challenging tasks, because little effort has been devoted to the definition of a general framework easily accessible even to researchers without advanced modelling and mathematical skills. RESULTS: In this paper we describe a new general modeling framework to study epidemiological systems, whose novelties and strengths are: (1) the use of a graphical formalism to simplify the model creation phase; (2) the implementation of an R package providing a friendly interface to access the analysis techniques implemented in the framework; (3) a high level of portability and reproducibility granted by the containerization of all analysis techniques implemented in the framework; (4) a well-defined schema and related infrastructure to allow users to easily integrate their own analysis workflow in the framework. Then, the effectiveness of this framework is shown through a case study in which we investigate pertussis epidemiology in Italy. CONCLUSIONS: We propose a new general modeling framework for the analysis of epidemiological systems, which exploits the Petri Net graphical formalism, the R environment, and Docker containerization to derive a tool easily accessible to any researcher, even without advanced mathematical and computational skills.
Moreover, the framework was implemented following the guidelines defined by the Reproducible Bioinformatics Project, so it guarantees reproducible analysis and simplifies the development of new user-defined workflows.
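
The Petri Net formalism mentioned in this abstract rests on a simple firing rule, shown here as a minimal sketch. It is not the framework's code: the dictionary layout, the function names, and the toy infection/recovery net are assumptions made for illustration.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in transition["inputs"].items())


def fire(marking, transition):
    """Return the new marking after consuming input tokens and producing output tokens."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, weight in transition["inputs"].items():
        new_marking[place] -= weight
    for place, weight in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


# Toy epidemic net: places are compartments, tokens are individuals.
infection = {"inputs": {"S": 1, "I": 1}, "outputs": {"I": 2}}
recovery = {"inputs": {"I": 1}, "outputs": {"R": 1}}

marking = {"S": 2, "I": 1, "R": 0}
after_infection = fire(marking, infection)
after_recovery = fire(after_infection, recovery)
print(after_infection, after_recovery)
```

Stochastic or ODE-based analyses can then be defined on top of this rule by attaching a rate to each transition.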


Subjects
Computational Biology/methods , Computer Simulation/standards , Vaccination/methods , Whooping Cough/epidemiology , Adolescent , Child , Humans , Reproducibility of Results
11.
BMC Infect Dis ; 20(1): 798, 2020 Oct 28.
Article in English | MEDLINE | ID: mdl-33115434

ABSTRACT

BACKGROUND: Severe acute respiratory syndrome coronavirus 2 (SARS-COV-2), the causative agent of the coronavirus disease 19 (COVID-19), is a highly transmittable virus. Since the first person-to-person transmission of SARS-CoV-2 was reported in Italy on February 21st, 2020, the number of people infected with SARS-COV-2 increased rapidly, mainly in northern Italian regions, including Piedmont. A strict lockdown was imposed on March 21st until May 4th when a gradual relaxation of the restrictions started. In this context, computational models and computer simulations are one of the available research tools that epidemiologists can exploit to understand the spread of the diseases and to evaluate social measures to counteract, mitigate or delay the spread of the epidemic. METHODS: This study presents an extended version of the Susceptible-Exposed-Infected-Removed-Susceptible (SEIRS) model accounting for population age structure. The infectious population is divided into three sub-groups: (i) undetected infected individuals, (ii) quarantined infected individuals and (iii) hospitalized infected individuals. Moreover, the strength of the government restriction measures and the related population response to these are explicitly represented in the model. RESULTS: The proposed model allows us to investigate different scenarios of the COVID-19 spread in Piedmont and the implementation of different infection-control measures and testing approaches. The results show that the implemented control measures have proven effective in containing the epidemic, mitigating the potential dangerous impact of a large proportion of undetected cases. We also forecast the optimal combination of individual-level measures and community surveillance to contain the new wave of COVID-19 spread after the re-opening work and social activities. 
CONCLUSIONS: Our model is an effective tool useful to investigate different scenarios and to inform policy makers about the potential impact of different control strategies. This will be crucial in the upcoming months, when very critical decisions about easing control measures will need to be taken.
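
For orientation, the basic SEIRS dynamics that the study extends can be written as a short simulation. This sketch covers only the textbook four-compartment model, without the age structure, the infectious sub-groups, or the restriction terms described in the abstract; the parameter values are illustrative and not taken from the paper.

```python
def simulate_seirs(s, e, i, r, beta, sigma, gamma, omega, days, dt=0.1):
    """Integrate the basic SEIRS equations with forward Euler steps.

    beta: transmission rate, sigma: 1/latent period,
    gamma: removal rate, omega: rate of immunity loss.
    Returns the list of (S, E, I, R) states, one per step.
    """
    n = s + e + i + r
    trajectory = [(s, e, i, r)]
    for _ in range(round(days / dt)):
        new_exposed = beta * s * i / n * dt
        new_infectious = sigma * e * dt
        new_removed = gamma * i * dt
        lost_immunity = omega * r * dt
        s += lost_immunity - new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed - lost_immunity
        trajectory.append((s, e, i, r))
    return trajectory


# Illustrative values only: 1000 people, 10 initially infectious.
trajectory = simulate_seirs(
    990.0, 0.0, 10.0, 0.0,
    beta=0.5, sigma=0.2, gamma=0.1, omega=0.01, days=200,
)
peak_infectious = max(state[2] for state in trajectory)
print(round(peak_infectious))
```

Scenario analysis of the kind described above amounts to rerunning such a simulation with different parameter values, for example a lower `beta` to represent stricter contact restrictions.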


Subjects
Communicable Disease Control/methods , Coronavirus Infections/epidemiology , Coronavirus Infections/prevention & control , Pandemics/prevention & control , Pneumonia, Viral/epidemiology , Pneumonia, Viral/prevention & control , Betacoronavirus/isolation & purification , COVID-19 , Carrier State/diagnosis , Carrier State/epidemiology , Coronavirus Infections/diagnosis , Coronavirus Infections/transmission , Disease Susceptibility/diagnosis , Disease Susceptibility/epidemiology , Humans , Italy/epidemiology , Models, Theoretical , Pneumonia, Viral/diagnosis , Pneumonia, Viral/transmission , Quarantine , SARS-CoV-2
12.
BMC Bioinformatics ; 20(Suppl 6): 623, 2019 Dec 10.
Article in English | MEDLINE | ID: mdl-31822261

ABSTRACT

BACKGROUND: Multiple Sclerosis (MS) is an immune-mediated inflammatory disease of the Central Nervous System (CNS) which damages the myelin sheath enveloping nerve cells, thus causing severe physical disability in patients. Relapsing Remitting Multiple Sclerosis (RRMS) is one of the most common forms of MS in adults and is characterized by a series of neurologic symptoms, followed by periods of remission. Recently, many treatments were proposed and studied to counteract RRMS progression. Among these drugs, daclizumab (commercial name Zinbryta), an antibody tailored against the Interleukin-2 receptor of T cells, exhibited promising results, but its efficacy was accompanied by an increased frequency of serious adverse events. Manifested side effects consisted of infections, encephalitis, and liver damage. Therefore, daclizumab has been withdrawn from the market worldwide. Another interesting case of RRMS regards its progression in pregnant women, where a lower incidence of relapses until delivery has been observed. RESULTS: In this paper we propose a new methodology for studying RRMS, which we implemented in GreatSPN, a state-of-the-art open-source suite for modelling and analyzing complex systems through the Petri Net (PN) formalism. This methodology exploits: (a) an extended Colored PN formalism to provide a compact graphical description of the system and to automatically derive a set of ODEs encoding the system dynamics and (b) Latin Hypercube Sampling with the PRCC index to calibrate ODE parameters for reproducing the real behaviours in healthy and MS subjects. To show the effectiveness of this methodology, a model of RRMS has been constructed and studied. Two different scenarios of RRMS were considered. In the former scenario the effect of daclizumab administration is investigated, while in the latter RRMS was studied in pregnant women. CONCLUSIONS: We propose a new computational methodology to study RRMS disease.
Moreover, we show that a model generated and calibrated according to this methodology is able to reproduce the expected behaviours.
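
Latin Hypercube Sampling, used in this methodology to explore ODE parameters, can be sketched as follows. The sketch covers the sampling step only (the PRCC index is not computed), and the function names and parameter ranges are hypothetical rather than taken from the paper or from GreatSPN.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims.

    Each dimension is split into n_samples equal strata, and every stratum
    receives exactly one point, in an independently shuffled order.
    """
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(k + rng.random()) / n_samples for k in strata])
    return [tuple(column[j] for column in columns) for j in range(n_samples)]


def scale(point, bounds):
    """Map a unit-cube point onto the given (low, high) parameter ranges."""
    return tuple(low + u * (high - low) for u, (low, high) in zip(point, bounds))


rng = random.Random(0)
unit_points = latin_hypercube(5, 2, rng)
bounds = [(0.1, 1.0), (0.01, 0.5)]  # hypothetical ranges for two rate parameters
samples = [scale(point, bounds) for point in unit_points]
for sample in samples:
    print(sample)
```

Compared with independent uniform draws, stratifying each dimension spreads a small number of model runs evenly across every parameter's range.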


Subjects
Computer Simulation , Multiple Sclerosis, Relapsing-Remitting , Computational Biology , Disease Progression , Female , Humans , Immunosuppressive Agents/therapeutic use , Multiple Sclerosis, Relapsing-Remitting/immunology , Multiple Sclerosis, Relapsing-Remitting/physiopathology , Pregnancy , Recurrence
13.
Bioinformatics ; 34(5): 871-872, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29069297

ABSTRACT

Summary: Short-read sequencing technology has been used for more than a decade now. However, the analysis of RNAseq and ChIPseq data is still computationally demanding, and simple access to raw data does not guarantee reproducibility of results between laboratories. To address these two aspects, we developed SeqBox, a cheap, efficient and reproducible RNAseq/ChIPseq hardware/software solution based on the NUC6I7KYK mini-PC (an Intel consumer game computer with a fast processor and a high performance SSD disk) and the Docker container platform. In SeqBox the analysis of RNAseq and ChIPseq data is supported by a friendly GUI. This gives access to fast and reproducible analysis also to scientists without scripting experience. Availability and implementation: Docker container images, docker4seq package and the GUI are available at http://www.bioinformatica.unito.it/reproducibile.bioinformatics.html. Contact: beccuti@di.unito.it. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin Immunoprecipitation/methods , Sequence Analysis, RNA/methods , Software , Computational Biology/methods , Reproducibility of Results
14.
Int J Mol Sci ; 21(1)2019 Dec 31.
Article in English | MEDLINE | ID: mdl-31906249

ABSTRACT

Recent improvements in the cost-effectiveness of high-throughput technologies have allowed RNA sequencing of total transcriptomes suitable for evaluating the expression and regulation of circRNAs, a relatively novel class of transcript isoforms with suggested roles in transcriptional and post-transcriptional gene expression regulation, as well as their possible use as biomarkers, due to their deregulation in various human diseases. A limited number of integrated workflows exist for prediction, characterization, and differential expression analysis of circRNAs, none of them complying with computational reproducibility requirements. We developed Docker4Circ for the complete analysis of circRNAs from RNA-Seq data. Docker4Circ runs a comprehensive analysis of circRNAs in human and model organisms, including: circRNA prediction; classification and annotation using six public databases; back-splice sequence reconstruction; internal alternative splicing of circularizing exons; alignment-free circRNA quantification from RNA-Seq reads; and differential expression analysis. Docker4Circ makes circRNA analysis easier and more accessible thanks to: (i) its R interface; (ii) encapsulation of computational tasks into Docker images; (iii) the availability of a user-friendly Java GUI; and (iv) no need for advanced bash scripting skills for correct use. Furthermore, Docker4Circ ensures a reproducible analysis since all its tasks are embedded into a Docker image following the guidelines provided by the Reproducible Bioinformatics Project.


Subjects
Databases, Nucleic Acid , RNA, Circular/genetics , RNA-Seq , Software , Animals , Humans
15.
BMC Bioinformatics ; 19(Suppl 10): 349, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30367595

ABSTRACT

BACKGROUND: Reproducibility of research is a key element in modern science and it is mandatory for any industrial application. It represents the ability to replicate an experiment independently of the location and the operator. Therefore, a study can be considered reproducible only if all used data are available and the exploited computational analysis workflow is clearly described. However, to reproduce a complex bioinformatics analysis today, the raw data and the list of tools used in the workflow may not be enough to guarantee the reproducibility of the results obtained. Indeed, different releases of the same tools and/or of the system libraries (exploited by such tools) might lead to subtle reproducibility issues. RESULTS: To address this challenge, we established the Reproducible Bioinformatics Project (RBP), a non-profit and open-source project whose aim is to provide a schema and an infrastructure, based on Docker images and an R package, to provide reproducible results in Bioinformatics. One or more Docker images are then defined for a workflow (typically one for each task), while the workflow implementation is handled via R functions embedded in a package available in a GitHub repository. Thus, a bioinformatician participating in the project must first integrate her/his workflow modules into Docker image(s), exploiting an Ubuntu Docker image developed ad hoc by RBP to make this task easier. Secondly, the workflow implementation must be realized in R according to an R-skeleton function made available by RBP to guarantee homogeneity and reusability among different RBP functions. Moreover, she/he has to provide the R vignette explaining the package functionality together with an example dataset which can be used to improve user confidence in the workflow utilization. CONCLUSIONS: The Reproducible Bioinformatics Project provides a general schema and an infrastructure to distribute robust and reproducible workflows.
Thus, it guarantees to final users the ability to repeat any analysis consistently, independently of the UNIX-like architecture used.


Subjects
Computational Biology/methods , Humans , MicroRNAs/genetics , Reproducibility of Results , Software , User-Computer Interface , Workflow
16.
BMC Bioinformatics ; 18(1): 516, 2017 Nov 23.
Article in English | MEDLINE | ID: mdl-29169317

ABSTRACT

BACKGROUND: Mantle Cell Lymphoma (MCL) is an aggressive B cell neoplasia accounting for about 6% of all lymphomas. The most common molecular marker of clonality in MCL, as in other B lymphoproliferative disorders, is the ImmunoGlobulin Heavy chain (IGH) rearrangement, occurring in B-lymphocytes. The patient-specific IGH rearrangement is extensively used to monitor the Minimal Residual Disease (MRD) after treatment through the standardized Allele-Specific Oligonucleotides Quantitative Polymerase Chain Reaction based technique. Recently, several studies have suggested that IGH monitoring through deep sequencing techniques can not only produce results comparable to Polymerase Chain Reaction-based methods, but might also surpass the classical technique in terms of feasibility and sensitivity. However, no standard bioinformatics tool is available at the moment for data analysis in this context. RESULTS: In this paper we present HashClone, an easy-to-use and reliable bioinformatics tool that provides B-cell clonality assessment and MRD monitoring over time by analyzing data from Next-Generation Sequencing (NGS) techniques. The HashClone strategy is composed of three steps: the first and second steps implement an alignment-free prediction method that identifies a set of putative clones belonging to the repertoire of the patient under study. In the third step the IGH variable region, diversity region, and joining region identification is obtained by the alignment of rearrangements with respect to the international ImMunoGenetics information system database. Moreover, a graphical user interface for HashClone execution and clonality visualization over time facilitates the use of the tool and the interpretation of the results. The HashClone performance was tested on NGS data derived from MCL patients to assess the major B-cell clone in the diagnostic samples and to monitor the MRD in real and artificial follow-up samples.
CONCLUSIONS: Our experiments show that in all the experimental settings, HashClone was able to correctly detect the major B-cell clones and to precisely follow them in several samples, showing better accuracy than the state-of-the-art tool.


Assuntos
Linfoma de Células B/genética , Neoplasia Residual/genética , Algoritmos , Alelos , Linfócitos B/patologia , Células Clonais , Humanos , Reprodutibilidade dos Testes
17.
Biochim Biophys Acta ; 1864(2): 211-8, 2016 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-26589354

RESUMO

The adduction of fumaric acid to the sulfhydryl group of certain cysteine (Cys) residues in proteins via a Michael-like reaction leads to the formation of S-(2-succino)cysteine (2SC) sites. Although its role remains to be fully understood, this post-translational Cys modification (protein succination) has been implicated in the pathogenesis of diabetes/obesity and fumarate hydratase-related diseases. In this study, theoretical approaches to address sequence- and 3D-structure-based features possibly underlying the specificity of protein succination have been applied to perform the first analysis of the available data on the succinated proteome. A total of 182 succinated proteins, 205 modifiable, and 1750 non-modifiable sites have been examined. The rate of 2SC sites per protein ranged from 1 to 3, and the overall relative abundance of modifiable sites was 10.8%. Modifiable and non-modifiable sites were not distinguishable when the hydrophobicity of the Cys-flanking peptides, the acid dissociation constant value of the sulfhydryl groups, and the secondary structure of the Cys-containing segments were compared. By contrast, significant differences were found when the accessibility of the sulphur atoms and the amino acid composition of the Cys-flanking peptides were analysed. Based on these findings, a sequence-based score function has been evaluated as a descriptor for Cys residues. In conclusion, our results indicate that modifiable and non-modifiable sites form heterogeneous subsets when features often discussed to describe Cys reactivity are examined. However, they also suggest that some differences exist, which may constitute the baseline for further investigations aimed at the development of predictive methods for 2SC sites in proteins.


Assuntos
Cisteína/análogos & derivados , Processamento de Proteína Pós-Traducional/genética , Proteínas/química , Proteoma , Aminoácidos/química , Aminoácidos/genética , Biologia Computacional , Cisteína/química , Cisteína/genética , Fumaratos/química , Humanos , Modelos Teóricos , Conformação Molecular , Proteínas/genética , Análise de Sequência de Proteína , Succinatos/química
18.
Plant J ; 84(1): 216-27, 2015 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-26252423

RESUMO

Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but have low recombination is particularly relevant.


Assuntos
Cromossomos Artificiais Bacterianos/genética , Genoma de Planta/genética , Hordeum/genética , Dados de Sequência Molecular
19.
BMC Bioinformatics ; 16 Suppl 9: S2, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26050971

RESUMO

BACKGROUND: RNA-Seq provides remarkable power in the area of biomarker discovery and disease characterization. Two crucial steps that affect RNA-Seq experiment results are Library Sample Preparation (LSP) and Bioinformatics Analysis (BA). This work describes an evaluation of the combined effect of LSP methods and BA tools in the detection of splice variants. RESULTS: Different LSPs (TruSeq unstranded/stranded, ScriptSeq, NuGEN) allowed the detection of a large common set of splice variants. However, each LSP also detected a small set of unique transcripts that are characterized by low coverage and/or FPKM. This effect was particularly evident using the low input RNA NuGEN v2 protocol. A benchmark dataset, in which synthetic reads as well as reads generated from standard (Illumina TruSeq 100) and low input (NuGEN) LSPs were spiked-in, was used to evaluate the effect of LSP on the statistical detection of alternative splicing events (AltDE). Statistical detection of AltDE was performed using Cuffdiff2 and RSEM-EBSeq as prototypes for splice variant quantification, and DEXSeq as the prototype for exon-level analysis. Exon-level analysis performed slightly better than splice variant quantification approaches, although at most only 50% of the spiked-in transcripts were detected. The performance of both splice variant quantification and exon-level analysis improved as the number of input reads increased. CONCLUSION: Data derived from NuGEN v2 were not the ideal input for AltDE, especially when the exon-level approach was used. We observed that the performance of both splice variant quantification and exon-level analysis was strongly dependent on the number of input reads. Moreover, the ribosomal RNA depletion protocol was less sensitive in detecting splicing variants, possibly due to the significant percentage of reads mapping to non-coding transcripts.
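
For readers unfamiliar with the FPKM measure mentioned above, the following minimal sketch shows the standard definition: fragments per kilobase of transcript per million mapped fragments. The function name, parameter names, and example numbers are illustrative assumptions and are not taken from the study or from any of the tools it evaluates.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 200 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
value = fpkm(200, 2000, 20_000_000)
```

Because the measure normalizes by both transcript length and library size, a transcript with few supporting fragments has a low FPKM regardless of sequencing depth, which matches the low-coverage unique transcripts described in the abstract.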


Assuntos
Processamento Alternativo/genética , Biologia Computacional/métodos , Biblioteca Gênica , Análise de Sequência de RNA/métodos , Éxons/genética , Humanos , RNA/genética , RNA Ribossômico/genética , RNA Ribossômico/metabolismo , Fluxo de Trabalho
20.
Bioinformatics ; 30(24): 3556-7, 2014 Dec 15.
Artigo em Inglês | MEDLINE | ID: mdl-25286921

RESUMO

SUMMARY: Chimera is a Bioconductor package that organizes, annotates, analyses and validates fusions reported by different fusion detection tools; the current implementation can deal with output from bellerophontes, chimeraScan, deFuse, fusionCatcher, FusionFinder, FusionHunter, FusionMap, mapSplice, Rsubread, tophat-fusion and STAR. The core of Chimera is a fusion data structure that can store fusion events detected with any of the aforementioned tools. Fusions can then be easily manipulated with standard R functions or through the set of functionalities specifically developed in Chimera to support the user in managing fusions and discriminating false-positive results.
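
The idea of one tool-independent record for fusion events can be sketched as follows. This is a Python illustration under stated assumptions, not Chimera's R data structure: the class, its field names, and both helper functions are hypothetical, as are the example calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionEvent:
    """One fusion call, normalized to the same fields whatever tool made it."""
    gene5p: str            # upstream (5') partner gene
    gene3p: str            # downstream (3') partner gene
    chrom5p: str
    break5p: int           # breakpoint position on the 5' partner
    chrom3p: str
    break3p: int           # breakpoint position on the 3' partner
    supporting_reads: int  # reads spanning or encompassing the junction
    tool: str              # name of the detection tool that reported the call


def filter_by_support(fusions, min_reads):
    """Keep only fusions backed by at least `min_reads` supporting reads."""
    return [f for f in fusions if f.supporting_reads >= min_reads]


def seen_by_multiple_tools(fusions, min_tools=2):
    """Return the gene pairs reported by at least `min_tools` distinct tools."""
    tools_per_pair = {}
    for f in fusions:
        tools_per_pair.setdefault((f.gene5p, f.gene3p), set()).add(f.tool)
    return {pair for pair, tools in tools_per_pair.items() if len(tools) >= min_tools}


# Hypothetical calls from two different detectors.
calls = [
    FusionEvent("BCR", "ABL1", "chr22", 23632600, "chr9", 133729451, 42, "toolA"),
    FusionEvent("BCR", "ABL1", "chr22", 23632600, "chr9", 133729451, 37, "toolB"),
    FusionEvent("GENE1", "GENE2", "chr1", 1000, "chr2", 2000, 1, "toolA"),
]
confident = filter_by_support(calls, min_reads=5)
recurrent = seen_by_multiple_tools(calls)
```

Once every detector's output is mapped onto one record type, filters such as a read-support threshold or agreement between tools can be written once and applied uniformly, which is one simple way to flag likely false positives.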


Assuntos
Fusão Gênica , Software , Animais , Anotação de Sequência Molecular