Results 1 - 20 of 48
1.
Sci Data; 11(1): 471, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724521

ABSTRACT

We present a de novo transcriptome of the mosquito vector Culex pipiens, assembled from sequences of insecticide-susceptible and insecticide-resistant larvae. The high quality of the assembly was confirmed by TransRate and BUSCO. A mapping percentage of up to 94.8% was obtained by aligning contigs to Nr, SwissProt, and TrEMBL, with 27,281 sequences mapping simultaneously to all three databases. A total of 14,966 ORFs were also functionally annotated using the eggNOG database. Among them, we identified ORF sequences of the main gene families involved in insecticide resistance. This resource therefore stands as a valuable reference for further studies of differential gene expression and for identifying genes of interest for genetic-based control tools.


Subject(s)
Culex; Insecticide Resistance; Larva; Transcriptome; Animals; Culex/genetics; Larva/genetics; Larva/growth & development; Insecticide Resistance/genetics; Mosquito Vectors/genetics; Open Reading Frames
2.
Sci Data; 10(1): 720, 2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37857654

ABSTRACT

Understanding the genomic underpinnings of thermal adaptation is a hot topic in eco-evolutionary studies of parasites. Marine heteroxenous parasites have complex life cycles encompassing a free-living larval stage, an ectothermic intermediate host and a homeothermic definitive host, thus representing compelling systems for the study of thermal adaptation. The Antarctic anisakid Contracaecum osculatum sp. D is a marine parasite able to survive and thrive both at very cold and warm temperatures within the environment and its hosts. Here, a de novo transcriptome of C. osculatum sp. D was generated for the first time, by performing RNA-Seq experiments on a set of individuals exposed to temperatures experienced by the nematode during its life cycle. The analysis generated 425,954,724 reads, which were assembled and then annotated. The high-quality assembly was validated, achieving over 88% mapping against the transcriptome. The transcriptome of this parasite will represent a valuable genomic resource for future studies aimed at disentangling the genomic architecture of thermal tolerance and metabolic pathways related to temperature stress.


Subject(s)
Nematoda; Parasites; Animals; Humans; Antarctic Regions; Nematoda/genetics; Temperature; Transcriptome
3.
Int J Mol Sci; 24(14), 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37511429

ABSTRACT

Molecular dynamics (MD) simulation is a widely employed computational technique for studying the dynamic behavior of molecular systems over time. By simulating macromolecular biological systems consisting of a drug, a receptor and a solvated environment with thousands of water molecules, MD allows realistic ligand-receptor binding interactions (lrbi) to be studied. In this study, we present MD-ligand-receptor (MDLR), state-of-the-art software designed to explore the intricate interactions between ligands and receptors over time using molecular dynamics trajectories. Unlike traditional static analysis tools, MDLR goes beyond simply taking a snapshot of lrbi, uncovering long-lasting molecular interactions and predicting the time-dependent inhibitory activity of specific drugs. With MDLR, researchers can gain insights into the dynamic behavior of complex ligand-receptor systems. Our pipeline is optimized for high-performance computing and can efficiently process vast molecular dynamics trajectories on multicore Linux servers or even multinode HPC clusters. In the latter case, MDLR allows the user to analyze large trajectories in a very short time. To facilitate the exploration and visualization of lrbi, we provide an intuitive Python notebook (Jupyter), which allows users to examine and interpret the results through various graphical representations.


Subject(s)
Molecular Dynamics Simulation; Software; Ligands; Protein Binding
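The idea of "long-lasting" interactions in the MDLR abstract above can be illustrated with a small sketch. This is not MDLR's implementation, which the abstract does not describe. It assumes that a contact is a receptor atom lying within a distance cutoff of any ligand atom, and that persistence is the fraction of trajectory frames in which that contact is present. The function names, the 4.0 cutoff and the 0.8 persistence threshold are all hypothetical.

```python
import numpy as np


def contact_persistence(ligand_xyz, receptor_xyz, cutoff=4.0):
    """Fraction of frames in which each receptor atom contacts the ligand.

    ligand_xyz:   array of shape (n_frames, n_ligand_atoms, 3)
    receptor_xyz: array of shape (n_frames, n_receptor_atoms, 3)
    cutoff:       contact distance in the same units as the coordinates
                  (hypothetical default, not taken from the paper)
    """
    # Pairwise receptor-ligand distances for every frame:
    # resulting shape is (n_frames, n_receptor_atoms, n_ligand_atoms).
    diff = receptor_xyz[:, :, None, :] - ligand_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # A receptor atom is in contact if any ligand atom is within the cutoff.
    in_contact = (dist <= cutoff).any(axis=2)
    # Averaging the boolean mask over frames gives the persistence.
    return in_contact.mean(axis=0)


def long_lasting(persistence, min_fraction=0.8):
    """Indices of receptor atoms whose contact persists in enough frames."""
    return np.flatnonzero(persistence >= min_fraction)
```

A static analysis corresponds to evaluating a single frame; averaging over frames is what separates transient contacts from persistent ones.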
4.
Sci Data; 10(1): 330, 2023 May 27.
Article in English | MEDLINE | ID: mdl-37244908

ABSTRACT

Dispersal is a key process in ecology and evolutionary biology, as it shapes biodiversity patterns over space and time. The tendency to disperse is unevenly distributed among individuals within populations, and individual personality can play a pivotal role in shaping this tendency. Here, we assembled and annotated the first de novo transcriptome of the head tissues of Salamandra salamandra from individuals representative of distinct behavioral profiles. We obtained 1,153,432,918 reads, which were successfully assembled and annotated. The high quality of the assembly was confirmed by three assembly validators. The alignment of contigs against the de novo transcriptome led to a mapping percentage higher than 94%. The homology annotation with DIAMOND led to 153,048 (blastx) and 95,942 (blastp) shared contigs, annotated on NR, Swiss-Prot and TrEMBL. The domain and site protein prediction led to 9850 GO-annotated contigs. This de novo transcriptome represents a reliable reference for comparative gene expression studies between alternative behavioral types, for comparative gene expression studies within Salamandra, and for whole transcriptome and proteome studies in amphibians.


Subject(s)
Salamandra; Transcriptome; Animals; Gene Expression Profiling; Genetic Association Studies; High-Throughput Nucleotide Sequencing; Larva/genetics; Molecular Sequence Annotation; Salamandra/genetics
5.
Front Cell Infect Microbiol; 13: 1079991, 2023.
Article in English | MEDLINE | ID: mdl-37009516

ABSTRACT

Introduction: Anisakis pegreffii is a sibling species within the A. simplex (s.l.) complex requiring marine homeothermic (mainly cetaceans) and heterothermic (crustaceans, fish, and cephalopods) organisms to complete its life cycle. It is also a zoonotic species, able to accidentally infect humans (anisakiasis). To investigate the molecular signals involved in this host-parasite interaction and pathogenesis, the proteomic composition of the extracellular vesicles (EVs) released by the third-stage larvae (L3) of A. pegreffii was characterized. Methods: Genetically identified L3 of A. pegreffii were maintained for 24 h at 37°C, and EVs were isolated by serial centrifugation and ultracentrifugation of culture media. Proteomic analysis was performed by shotgun analysis. Results and discussion: EVs showed a spherical structure (size 65-295 nm). Proteomic results were blasted against the A. pegreffii specific transcriptomic database, and 153 unique proteins were identified. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis predicted several proteins belonging to distinct metabolic pathways. A similarity search against a database of selected parasitic nematodes revealed that proteins associated with A. pegreffii EVs might be involved in parasite survival and adaptation, as well as in pathogenic processes. Further, possible links between A. pegreffii EV proteins and those of its human and cetacean hosts were predicted using the HPIDB database. The results described here expand knowledge of the proteins possibly involved in the host-parasite interactions between this parasite and its natural and accidental hosts.


Subject(s)
Anisakiasis; Anisakis; Fish Diseases; Parasites; Animals; Humans; Anisakis/genetics; Larva; Proteomics; Anisakiasis/etiology; Anisakiasis/parasitology; Fish Diseases/parasitology
6.
Int J Mol Sci; 25(1), 2023 Dec 23.
Article in English | MEDLINE | ID: mdl-38203410

ABSTRACT

Chronic exposure to ultraviolet (UV) radiation is known to induce the formation of DNA photo-adducts, including cyclobutane pyrimidine dimers (CPDs) and Dewar valence derivatives (DVs). While CPDs usually occur at higher frequency than DVs, recent studies have shown that the latter display superior selectivity and significant stability in interaction with the human DNA/topoisomerase 1 complex (TOP1). To investigate in depth the mechanism of interaction of DVs with TOP1, we report here four all-atom molecular dynamics simulations spanning one microsecond. These simulations focus on the stability and conformational changes of two DNA/TOP1-DV complexes in solution, and the data are compared with those of the biomimetic thymine dimer counterparts. Results from root-mean-square deviation (RMSD) and root-mean-square fluctuation (RMSF) analyses unequivocally confirmed the increased stability of the DNA/TOP1-DV complexes throughout the simulation. Detailed interaction analyses, uncovering the presence of salt bridges, hydrogen bonds, water-mediated interactions, and hydrophobic interactions, as well as pinpointing the non-covalent interactions within the complexes, enabled the identification of specific TOP1 residues involved in the interactions over time and suggested a potential TOP1 inhibition mechanism in action.


Subject(s)
DNA Topoisomerases, Type I; Molecular Dynamics Simulation; Humans; Biomimetics; DNA Adducts; Data Interpretation, Statistical; Pyrimidine Dimers
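The RMSD and RMSF analyses named in the abstract above follow standard definitions, sketched here with NumPy. This is a generic illustration, not the authors' analysis pipeline. It assumes the trajectory frames have already been superimposed on the reference structure, since no alignment is performed, and the function names are hypothetical.

```python
import numpy as np


def rmsd_per_frame(traj, ref):
    """RMSD of every frame with respect to a reference structure.

    traj: array of shape (n_frames, n_atoms, 3), assumed pre-aligned to ref
    ref:  array of shape (n_atoms, 3)
    """
    # Squared displacement of each atom, averaged over atoms, per frame.
    sq_disp = ((traj - ref) ** 2).sum(axis=2)
    return np.sqrt(sq_disp.mean(axis=1))


def rmsf_per_atom(traj):
    """RMSF of every atom around its time-averaged position."""
    mean_pos = traj.mean(axis=0)
    # Squared displacement from the mean, averaged over frames, per atom.
    sq_disp = ((traj - mean_pos) ** 2).sum(axis=2)
    return np.sqrt(sq_disp.mean(axis=0))
```

RMSD gives one value per frame and tracks global drift from the starting structure, while RMSF gives one value per atom and highlights locally flexible regions. A complex described as more stable would typically show lower and flatter curves in both.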
7.
Int J Mol Sci; 23(23), 2022 Nov 24.
Article in English | MEDLINE | ID: mdl-36498979

ABSTRACT

Human Topoisomerase I (hTop1p) is a ubiquitous enzyme that relaxes supercoiled DNA through a conserved mechanism involving transient breakage, rotation, and binding. hTop1p is the molecular target of the chemotherapeutic drug camptothecin (CPT). CPT causes the hTop1p-DNA complex to slow down the binding process and clash with the replicative machinery during the S phase of the cell cycle, forcing cells to activate the apoptotic response. This gives hTop1p a central role in cancer therapy. Recently, two artesunic acid derivatives (compounds c6 and c7) have been proposed as promising inhibitors of hTop1p with possible antitumor activity. We used several computational approaches to obtain in silico confirmation of the experimental data and to form a comprehensive dynamic description of the ligand-receptor system. We performed molecular docking analyses to verify the ability of the two new derivatives to access the enzyme-DNA interface, and a classical molecular dynamics simulation to assess the capacity of the two compounds to maintain a stable binding pose over time. Finally, we calculated the noncovalent interactions between the two new derivatives and the hTop1p receptor in order to propose a possible inhibitory mechanism like that adopted by CPT.


Subject(s)
DNA Topoisomerases, Type I; Topoisomerase I Inhibitors; Humans; DNA Topoisomerases, Type I/metabolism; Molecular Docking Simulation; Topoisomerase I Inhibitors/pharmacology; Topoisomerase I Inhibitors/chemistry; Enzyme Inhibitors/pharmacology; Camptothecin; DNA/chemistry; Molecular Dynamics Simulation
8.
Sci Data; 9(1): 619, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36229462

ABSTRACT

Understanding the genomic underpinnings of antipredatory behaviors is a hot topic in eco-evolutionary research. Yellow-bellied toads of the genus Bombina are textbook examples of the deimatic display, a time-structured behavior aimed at startling predators. Here, we generated the first de novo brain transcriptome of the Apennine yellow-bellied toad Bombina pachypus, a species showing inter-individual variation in the deimatic display. Through RNA-Seq experiments on a set of individuals showing distinct behavioral phenotypes, we generated 316,329,573 reads, which were assembled and annotated. The high quality of the assembly was confirmed by assembly validators and by aligning the contigs against the de novo transcriptome, with a mapping percentage higher than 91.0%. The homology annotation with DIAMOND (blastx) led to 77,391 contigs annotated on Nr, Swiss-Prot and TrEMBL, whereas the domain and site protein prediction made with InterProScan led to 4747 GO-annotated and 1025 KEGG-annotated contigs. The B. pachypus transcriptome described here will be a valuable resource for further studies on the genomic underpinnings of behavioral variation in amphibians.


Subject(s)
High-Throughput Nucleotide Sequencing; Transcriptome; Animals; Anura; Brain; Molecular Sequence Annotation; Sequence Analysis, RNA
9.
BMC Res Notes; 15(1): 223, 2022 Jun 25.
Article in English | MEDLINE | ID: mdl-35752825

ABSTRACT

OBJECTIVES: Anisakis pegreffii is a zoonotic parasite requiring marine organisms to complete its life history. Human infection (anisakiasis) occurs when the third-stage larvae (L3) are accidentally ingested with raw or undercooked infected fish or squid. A new de novo transcriptome of A. pegreffii was generated here to provide a robust body of data as a comprehensive "ready-to-use" resource for functional studies on genes and gene products of A. pegreffii involved in the molecular mechanisms of parasite-host interaction. DATA DESCRIPTION: An RNA-seq library of A. pegreffii L3 was newly generated using the Illumina TruSeq platform. It was combined with five other RNA-seq datasets previously gathered from L3 of the same species and stored in the NCBI SRA. The final dataset was analyzed by running three assembler programs and two validation tools. The use of a robust pipeline produced a high-confidence protein-coding transcriptome of A. pegreffii. These data represent a more robust and complete transcriptome of this species than the currently existing resources. This is important for understanding the adaptive and immunomodulatory genes implicated in the "cross talk" between the parasite and its hosts, including the accidental one (humans).


Subject(s)
Anisakiasis; Anisakis; Parasites; Animals; Anisakiasis/genetics; Anisakiasis/parasitology; Anisakis/genetics; Fishes/genetics; Larva/genetics; Parasites/genetics; Transcriptome
10.
Int J Mol Sci; 23(2), 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35055101

ABSTRACT

We report here the synthesis of novel thymine biomimetic photo-adducts bearing an alkane spacer between nucleobases and characterized by antimelanoma activity against two mutated cancer cell lines overexpressing human Topoisomerase 1 (TOP1), namely SKMEL28 and RPMI7951. Among them, Dewar Valence photo-adducts showed a higher selectivity index than the corresponding pyrimidine-(6-4)-pyrimidone and cyclobutane counterparts and were characterized by the highest affinity towards the TOP1/DNA complex, as evaluated by molecular docking analysis. The antimelanoma activity of the novel photo-adducts was retained after loading into UV photo-protective lignin nanoparticles, used as a stabilizing agent and efficient drug delivery system. Overall, these results support a combined antimelanoma and UV sunscreen strategy involving the use of photo-protective lignin nanoparticles for the controlled release of thymine dimers on the skin, followed by their sacrificial transformation into photo-adducts and the subsequent inhibition of melanoma and alerting of cellular UV repair machinery pathways.


Subject(s)
Antineoplastic Agents/chemistry; Antineoplastic Agents/pharmacology; Biological Mimicry; Drug Carriers/chemistry; Lignin; Nanoparticles; Thymine/chemistry; Biomimetics; Cell Line, Tumor; DNA Damage/drug effects; Drug Delivery Systems; Humans; Lignin/chemistry; Models, Molecular; Molecular Conformation; Molecular Structure; Nanoparticles/chemistry; Nanoparticles/ultrastructure; Photochemistry; Pyrimidine Dimers/chemistry; Solvents; Spectrum Analysis; Structure-Activity Relationship; Ultraviolet Rays
11.
Methods Mol Biol; 2284: 253-270, 2021.
Article in English | MEDLINE | ID: mdl-33835447

ABSTRACT

RNA editing by A-to-I deamination is a relevant co/posttranscriptional modification carried out by ADAR enzymes. In humans, it has pivotal cellular effects and its deregulation has been linked to a variety of human disorders including neurological and neurodegenerative diseases and cancer. Despite its biological relevance, the detection of RNA editing variants in large transcriptome sequencing experiments (RNAseq) is still a challenging computational task. To drastically reduce computing times, we have developed a novel REDItools version able to identify A-to-I events in huge amounts of RNAseq data employing High Performance Computing (HPC) infrastructures. Here we show how to use REDItools v2 in HPC systems.


Subject(s)
Computing Methodologies; RNA Editing/physiology; Sequence Analysis, RNA/methods; Animals; Computational Biology/methods; Databases, Genetic; Datasets as Topic; Genomics; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Nervous System Diseases/genetics; Neurodegenerative Diseases/genetics; Software; Transcriptome
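Because inosine is read as guanosine during sequencing, A-to-I editing shows up in RNAseq data as A-to-G mismatches at positions where the reference base is A. The sketch below shows that core idea on per-position base counts. It is not the REDItools algorithm or its interface. The input layout, the function name and both thresholds are hypothetical, and strand handling is omitted (on the opposite strand the same event appears as T-to-C), as are quality filtering and SNP exclusion.

```python
def editing_candidates(pileup, min_coverage=10, min_freq=0.1):
    """Return (position, G frequency) for candidate A-to-G sites.

    pileup: iterable of (position, reference_base, counts), where counts
            maps each observed base to its read count at that position
            (hypothetical input layout chosen for illustration)
    """
    hits = []
    for pos, ref, counts in pileup:
        # Only positions whose reference base is A can show A-to-G changes.
        if ref.upper() != "A":
            continue
        coverage = sum(counts.values())
        # Skip poorly covered positions, where frequencies are unreliable.
        if coverage < min_coverage:
            continue
        freq = counts.get("G", 0) / coverage
        if freq >= min_freq:
            hits.append((pos, freq))
    return hits
```

Each position is evaluated independently of all others, which is what makes this kind of scan a natural fit for the HPC parallelization the abstract describes.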
12.
Methods Mol Biol; 2284: 393-415, 2021.
Article in English | MEDLINE | ID: mdl-33835454

ABSTRACT

Since 1950, most studies of RNA have focused on its role in protein synthesis. Later insights showed that only a small portion of RNA codes for proteins, while the rest may have different functional roles. With the advent of Next Generation Sequencing (NGS), and in particular of RNA-seq technology, the cost of sequencing dropped. Among the NGS application areas, transcriptome analysis, that is, the analysis of transcripts in a cell and their quantification for a specific developmental stage or treatment condition, has been increasingly adopted in laboratories. As a consequence, in the last decade new insights were gained into both transcriptome complexity and the involvement of RNA molecules in cellular processes. On the computational side, bioinformatics research has developed new methods for analyzing RNA-seq data. The comparison of transcriptome profiles from several samples is often a difficult task for nonexpert programmers. In this chapter, we introduce RAP (RNA-Seq Analysis Pipeline), a completely automated web tool for transcriptome analysis. It is a user-friendly web tool implementing a detailed transcriptome workflow to detect differentially expressed genes and transcripts, identify spliced junctions and constitutive or alternative polyadenylation sites, and predict gene fusion events. Through the web interface, researchers can get all this information without any knowledge of the underlying High Performance Computing infrastructure.


Subject(s)
Internet; RNA-Seq/methods; Software; Animals; Computational Biology/methods; Data Analysis; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Humans; Polyadenylation; RNA-Seq/statistics & numerical data; Sequence Analysis, RNA/methods; Transcriptome; Exome Sequencing
13.
Nucleic Acids Res; 49(D1): D1012-D1019, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33104797

ABSTRACT

RNA editing is a relevant epitranscriptome phenomenon able to increase the transcriptome and proteome diversity of eukaryotic organisms. ADAR-mediated RNA editing is widespread in humans, in which millions of A-to-I changes modify thousands of primary transcripts. RNA editing has pivotal roles in the regulation of gene expression, the modulation of the innate immune response and the functioning of several neurotransmitter receptors. Massive transcriptome sequencing has fostered research in this field. Nonetheless, several aspects of RNA editing biology are still unknown and need to be elucidated. To support the study of A-to-I RNA editing, we have updated our REDIportal catalogue, raising its content to about 16 million events detected in 9642 human RNAseq samples from the GTEx project using a dedicated pipeline based on the HPC version of the REDItools software. REDIportal now allows searches at sample level, provides overviews of RNA editing profiles for each RNAseq experiment, implements a Gene View module to look at individual events in their genic context and hosts the CLAIRE database. Starting from this new version, REDIportal will collect non-human RNA editing changes for comparative genomics investigations. The database is freely available at http://srv00.recas.ba.infn.it/atlas/index.html.


Subject(s)
Computational Biology/methods; Databases, Genetic; Gene Expression Regulation; Proteome/genetics; RNA Editing/genetics; Transcriptome/genetics; Base Sequence/genetics; Data Curation/methods; Data Mining/methods; Gene Expression Profiling/methods; Genomics/methods; Humans; Internet; Proteomics/methods
14.
Sci Data; 7(1): 437, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-33328476

ABSTRACT

Stressful experiences are part of everyday life and animals have evolved physiological and behavioral responses aimed at coping with stress and maintaining homeostasis. However, repeated or intense stress can induce maladaptive reactions leading to behavioral disorders. Adaptations in the brain, mediated by changes in gene expression, have a crucial role in the stress response. Recent years have seen a tremendous increase in studies on the transcriptional effects of stress. The input raw data are freely available from public repositories and represent a wealth of information for further global and integrative retrospective analyses. We downloaded from the Sequence Read Archive 751 samples (SRA-experiments), from 18 independent BioProjects studying the effects of different stressors on the brain transcriptome in mice. We performed a massive bioinformatics re-analysis applying a single, standardized pipeline for computing differential gene expression. This data mining allowed the identification of novel candidate stress-related genes and specific signatures associated with different stress conditions. The large amount of computational results produced was systematized in the interactive "Stress Mice Portal".


Subject(s)
Brain/physiology; Gene Expression; Stress, Physiological; Stress, Psychological; Transcriptome; Animals; Computational Biology; Data Mining; Datasets as Topic; Female; Male; Mice
15.
BMC Bioinformatics; 21(Suppl 10): 353, 2020 Aug 21.
Article in English | MEDLINE | ID: mdl-32838738

ABSTRACT

BACKGROUND: RNA editing is a widespread co-/post-transcriptional mechanism that alters primary RNA sequences through the modification of specific nucleotides, and it can increase both transcriptome and proteome diversity. The automatic detection of RNA editing from RNA-seq data is computationally intensive and limited to small data sets, thus preventing a reliable genome-wide characterisation of this process. RESULTS: In this work we introduce HPC-REDItools, an upgraded tool for the accurate discovery of RNA-editing events from large dataset repositories. AVAILABILITY: https://github.com/BioinfoUNIBA/REDItools2 . CONCLUSIONS: HPC-REDItools is dramatically faster than the previous version, REDItools, enabling big-data analysis by means of an MPI-based implementation and scaling almost linearly with the number of available cores.


Subject(s)
Computing Methodologies; RNA Editing/genetics; Software; Algorithms; Base Sequence; Genome; Transcriptome/genetics
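Near-linear scaling of the kind reported above is generally achievable when a genome-wide scan can be cut into independent pieces. The sketch below shows one simple way to do that: split each chromosome into fixed-size intervals and hand them out to workers in round-robin order. The abstract does not describe how HPC-REDItools partitions its work, so the strategy, the function names and the parameters are assumptions. The code only builds the work lists and involves no MPI calls.

```python
def make_intervals(chrom_lengths, chunk_size):
    """Split each chromosome into half-open intervals of at most chunk_size.

    chrom_lengths: dict mapping chromosome name to its length in bases
    Returns a list of (chromosome, start, end) tuples with 0-based starts.
    """
    intervals = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            end = min(start + chunk_size, length)
            intervals.append((chrom, start, end))
    return intervals


def assign_round_robin(intervals, n_workers):
    """Give worker `rank` every n_workers-th interval, starting at `rank`."""
    return [intervals[rank::n_workers] for rank in range(n_workers)]
```

A static split like this balances well only when intervals cost roughly the same to process. With uneven read coverage, a dynamic scheme in which idle workers request the next interval would likely balance the load better.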
16.
BMC Bioinformatics; 21(Suppl 10): 352, 2020 Aug 21.
Article in English | MEDLINE | ID: mdl-32838759

ABSTRACT

BACKGROUND: The advent of Next Generation Sequencing (NGS) technologies and the concomitant reduction in sequencing costs allow unprecedented high-throughput profiling of biological systems in a cost-efficient manner. Modern biological experiments are becoming increasingly both data- and computationally intensive, and the wealth of publicly available biological data is bringing bioinformatics into the "Big Data" era. For these reasons, the effective application of High Performance Computing (HPC) architectures is increasingly recognized by bioinformaticians as well. Here we describe pilot programs for provisioning HPC resources to bioinformaticians, run by the Italian Node of ELIXIR (ELIXIR-IT) in collaboration with CINECA, the main Italian supercomputing center. RESULTS: Starting in April 2016, CINECA and ELIXIR-IT launched the pilot Call "ELIXIR-IT HPC@CINECA", offering streamlined access to HPC resources for bioinformatics. Resources are made available either through web front-ends to dedicated workflows developed at CINECA or by providing direct access to the High Performance Computing systems through a standard command-line interface tailored for bioinformatics data analysis. This offers the biomedical research community a production-scale environment, continuously updated with the latest versions of publicly available reference datasets and bioinformatic tools. Currently, 63 research projects have gained access to the HPC@CINECA program, for a total allocation of ~8 million CPU hours and, for data storage, ~100 TB of permanent and ~300 TB of temporary space. CONCLUSIONS: Three years after the beginning of the ELIXIR-IT HPC@CINECA program, we can appreciate its impact on the Italian bioinformatics community and draw some conclusions. Several Italian researchers who applied to the program have gained access to one of the top-ranking public scientific supercomputing facilities in Europe. Those investigators had the opportunity to substantially reduce computational turnaround times in their research projects and to process massive amounts of data, pursuing research approaches that would otherwise have been difficult or impossible to undertake. Moreover, by taking advantage of the wealth of documentation and training material provided by CINECA, participants had the opportunity to improve their skills in the usage of HPC systems and be better positioned to apply to similar EU programs of greater scale, such as PRACE. To illustrate the effective usage and impact of the resources awarded by the program in different research applications, we report five successful use cases, which have already published their findings in peer-reviewed journals.


Subject(s)
Computational Biology; Computing Methodologies; Software; Algorithms; Animals; Cell Line; Databases, Genetic; Gene Fusion; Genome; Humans; Prunus persica/genetics; RNA Editing; Swallows/genetics
17.
Gigascience; 9(5), 2020 May 01.
Article in English | MEDLINE | ID: mdl-32444882

ABSTRACT

BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or face severe limitations in managing big genomic data. RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF.


Subject(s)
Computational Biology/methods; Genetic Predisposition to Disease; Genetic Variation; Genome-Wide Association Study/methods; Software; Algorithms; Databases, Genetic; Genomics/methods; Humans; Machine Learning; Reproducibility of Results
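The general pattern behind the imbalance-aware ensemble described above can be sketched as follows: partition the abundant negative examples, train one base learner per partition together with all positives, and average the scores. This is a deliberately simplified illustration and not parSMURF itself. The base learner is replaced by a nearest-centroid scorer to keep the example dependency-free, and the oversampling, Bayesian hyper-parameter tuning and MPI distribution mentioned in the abstract are omitted. All names and defaults are hypothetical.

```python
import numpy as np


def centroid_scorer(X, y):
    """Stand-in base learner that scores by distance to the class centroids."""
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)

    def score(Z):
        d_pos = np.linalg.norm(Z - mu_pos, axis=1)
        d_neg = np.linalg.norm(Z - mu_neg, axis=1)
        # Approaches 1 near the positive centroid and 0 near the negative one.
        return d_neg / (d_pos + d_neg + 1e-12)

    return score


def partitioned_ensemble(X, y, n_partitions=5, seed=0):
    """Train one scorer per negative partition and average their outputs."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    # Shuffle the negatives so every partition is a random undersample.
    neg_idx = rng.permutation(np.flatnonzero(y == 0))
    scorers = []
    for part in np.array_split(neg_idx, n_partitions):
        # Every learner sees all positives but only one slice of negatives.
        idx = np.concatenate([pos_idx, part])
        scorers.append(centroid_scorer(X[idx], y[idx]))
    return lambda Z: np.mean([s(Z) for s in scorers], axis=0)
```

Each base learner trains on a far less skewed class ratio than the full data set, and the learners are independent of one another, so in principle they can be trained in parallel on different cores or nodes.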
18.
Mol Neurobiol; 57(5): 2301-2313, 2020 May.
Article in English | MEDLINE | ID: mdl-32020500

ABSTRACT

Autism spectrum disorder (ASD) is a heterogeneous neurodevelopmental condition with unknown etiology. Recent experimental evidence suggests the contribution of non-coding RNAs (ncRNAs) to the pathophysiology of ASD. In this work, we aimed to investigate the expression profile of the ncRNA class of circular RNAs (circRNAs) in the hippocampus of the BTBR T + tf/J (BTBR) mouse model and age-matched C57BL/6J (B6) mice. In parallel, we analyzed the BTBR hippocampal gene expression profile to evaluate possible correlations between the differential abundance of circular and linear gene products. From RNA sequencing data, we identified circRNAs highly modulated in BTBR mice. Thirteen circRNAs and their corresponding linear isoforms were validated by RT-qPCR analysis. The BTBR-regulated circCdh9 was further characterized in terms of molecular structure and expression, highlighting altered levels not only in the hippocampus, but also in the cerebellum, prefrontal cortex, and amygdala. Finally, gene expression analysis of the BTBR hippocampus pinpointed altered biological and molecular pathways relevant for the ASD phenotype. By comparison of circRNA and gene expression profiles, we identified 6 genes significantly regulated at the level of either circRNA or mRNA gene products, suggesting low overall correlation between circRNA and host gene expression. In conclusion, our results indicate a consistent deregulation of circRNA expression in the hippocampus of BTBR mice. ASD-related circRNAs should be considered in functional studies to identify their contribution to the etiology of the disorder. In addition, as abundant and highly stable molecules, circRNAs represent interesting potential biomarkers for autism.


Subject(s)
Autism Spectrum Disorder/metabolism; Disease Models, Animal; Hippocampus/metabolism; Mice, Inbred Strains/metabolism; Mice, Mutant Strains/metabolism; RNA, Circular/biosynthesis; RNA, Messenger/biosynthesis; Animals; Autism Spectrum Disorder/genetics; Brain Chemistry; Gene Expression Profiling; Gene Ontology; Humans; Male; Mice, Inbred C57BL; Mice, Inbred Strains/genetics; Mice, Mutant Strains/genetics; Reverse Transcriptase Polymerase Chain Reaction; Species Specificity
19.
Nucleic Acids Res; 47(1): 221-236, 2019 Jan 10.
Article in English | MEDLINE | ID: mdl-30462294

ABSTRACT

8-Oxo-7,8-dihydro-2'-deoxyguanosine (8-oxodG) is one of the major DNA modifications and a potent pre-mutagenic lesion prone to mispair with 2'-deoxyadenosine (dA). Several thousand residues of 8-oxodG are constitutively generated in the genome of mammalian cells, but their genomic distribution has not yet been fully characterized. Here, by using OxiDIP-Seq, a highly sensitive methodology that uses immuno-precipitation with efficient anti-8-oxodG antibodies combined with high-throughput sequencing, we report the genome-wide distribution of 8-oxodG in human non-tumorigenic epithelial breast cells (MCF10A), and mouse embryonic fibroblasts (MEFs). OxiDIP-Seq revealed sites of 8-oxodG accumulation overlapping with γH2AX ChIP-Seq signals within the gene body of transcribed long genes, particularly at the DNA replication origins contained therein. We propose that the presence of persistent single-stranded DNA, as a consequence of transcription-replication clashes at these sites, determines local vulnerability to DNA oxidation and/or its slow repair. This oxidatively-generated damage, likely in combination with other kinds of lesion, might contribute to the formation of DNA double strand breaks and activation of DNA damage response.


Subject(s)
DNA Damage/genetics , DNA Replication/genetics , Deoxyguanosine/analogs & derivatives , Histones/genetics , 8-Hydroxy-2'-Deoxyguanosine , Animals , Tumor Cell Line , Chromosome Mapping , DNA/chemistry , Single-Stranded DNA/genetics , Single-Stranded DNA/metabolism , Deoxyadenosines/genetics , Deoxyguanosine/genetics , Fibroblasts/metabolism , Genome/genetics , Humans , Mice , Oxidation-Reduction , Replication Origin/genetics
20.
Gigascience ; 7(10)2018 10 01.
Article in English | MEDLINE | ID: mdl-29860514

ABSTRACT

Background: Gene fusions derive from chromosomal rearrangements. The resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognoses and, in some cases, they can provide specific drug targets. To date, many efforts have been made to study gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive next-generation sequencing dataset for all existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events. Results: In our work, we extensively reanalyzed 935 paired-end RNA-sequencing experiments downloaded from the Cancer Cell Line Encyclopedia repository, aiming to detect novel putative cell-line-specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four gene fusion detection algorithms. The results were further prioritized by running a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the prediction tools yields a robust set of ∼1,700 in silico predicted novel candidates suitable for downstream analyses. Given the huge amount of data and information produced, the computational results have been organized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, which is further integrated with validated data from other well-known repositories. Using the intuitive query forms, users can easily access, navigate, filter, and select the putative gene fusions for further validation and study. They can also find suitable experimental models for a given fusion of interest.
Conclusions: We believe that the LiGeA resource can not only represent the first compendium of both known and putative novel gene fusion events across the catalog of human malignant cell lines, but also become a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.


Subject(s)
Data Analysis , Data Mining , Gene Fusion , High-Throughput Nucleotide Sequencing , Cell Line , Tumor Cell Line , Computational Biology/methods , Genetic Databases , Gene Rearrangement , Human Genome , Genomics/methods , Humans , Oncogene Fusion , Genetic Translocation , Web Browser