ABSTRACT
The 5' and 3' untranslated regions (UTRs) of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization, and message stability. Since 1996, we have developed and maintained UTRdb, a specialized database of UTR sequences. Here we present UTRdb 2.0, a major update of UTRdb featuring an extensive collection of eukaryotic 5' and 3' UTR sequences, including over 26 million entries from over 6 million genes and 573 species, enriched with a curated set of functional annotations. Annotations include CAGE tags and polyA signals to label the completeness of 5' and 3' UTRs, respectively. In addition, uORFs and IRES are annotated in 5' UTRs, as are experimentally validated miRNA targets in 3' UTRs. Further annotations include evolutionarily conserved blocks, Rfam motifs, ADAR-mediated RNA editing events, and m6A modifications. A web interface has been implemented that allows flexible selection and retrieval of specific subsets of UTRs according to a combination of criteria and also provides comprehensive download facilities. UTRdb 2.0 is accessible at http://utrdb.cloud.ba.infn.it/utrdb/.
Subject(s)
Nucleic Acid Databases , Eukaryota , Messenger RNA , Untranslated Regions , 3' Untranslated Regions/genetics , 5' Untranslated Regions , Eukaryota/genetics , Eukaryotic Cells/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism
ABSTRACT
The interrelationship between IgAs and microbiota diversity is still unclear. Here we show that BALB/c mice had higher abundance and diversity of IgAs than C57BL/6 mice and that this correlated with increased microbiota diversity. We show that polyreactive IgAs mediated the entrance of non-invasive bacteria to Peyer's patches, independently of CX3CR1(+) phagocytes. This allowed the induction of bacteria-specific IgA and the establishment of a positive feedback loop of IgA production. Cohousing of mice or fecal transplantation had little or no influence on IgA production and had only partial impact on microbiota composition. Germ-free BALB/c, but not C57BL/6, mice already had polyreactive IgAs that influenced microbiota diversity and selection after colonization. Together, these data suggest that genetic predisposition to produce polyreactive IgAs has a strong impact on the generation of antigen-specific IgAs and the selection and maintenance of microbiota diversity.
Subject(s)
Bacterial Antigens/immunology , Genetic Variation/immunology , Immunoglobulin A/immunology , Microbiota/immunology , Animals , Bacteria/classification , Bacteria/genetics , Bacteria/immunology , Bacterial DNA/chemistry , Bacterial DNA/genetics , Feces/microbiology , Flow Cytometry , Host-Pathogen Interactions/immunology , Immunization , Immunoglobulin A/blood , Immunoglobulin A/metabolism , Metagenomics/methods , Inbred BALB C Mice , Inbred C57BL Mice , Microbiota/genetics , Peyer's Patches/immunology , Peyer's Patches/metabolism , Peyer's Patches/microbiology , Phylogeny , 16S Ribosomal RNA/genetics , Salmonella typhimurium/genetics , Salmonella typhimurium/immunology , Salmonella typhimurium/physiology , Species Specificity
ABSTRACT
In mammals, RNA editing events involve the conversion of adenosine (A) to inosine (I) by ADAR enzymes or the hydrolytic deamination of cytosine (C) to uracil (U) by the APOBEC family of enzymes, mostly APOBEC1. RNA editing has a plethora of biological functions, and its deregulation has been associated with various human disorders. While the large-scale detection of A-to-I editing is quite straightforward using Illumina RNAseq technology, the identification of C-to-U events is a non-trivial task. This difficulty arises from the rarity of such events in eukaryotic genomes and the challenge of distinguishing them from background noise. Direct RNA sequencing by Oxford Nanopore Technology (ONT) permits the direct detection of Us on sequenced RNA reads. Surprisingly, using ONT reads from wild-type (WT) and APOBEC1-knock-out (KO) murine cell lines as well as in vitro synthesized RNA without any modification, we identified a systematic error affecting the accuracy of C base calls, thereby leading to incorrect identification of C-to-U events. To overcome this issue in direct RNA reads, here we introduce a novel machine learning strategy based on the Isolation Forest (iForest) algorithm, in which C-to-U editing events are treated as sequencing anomalies. Using in vitro synthesized and human ONT reads, our model optimizes the signal-to-noise ratio, improving the detection of C-to-U editing sites with high accuracy, over 90% in all samples tested. Our results suggest that iForest, known for its rapid implementation and minimal memory requirements, is a promising tool to denoise ONT reads and reliably identify RNA modifications.
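The idea of treating editing sites as anomalies can be illustrated with a deliberately simplified Isolation Forest. The sketch below is not the authors' model: it uses a single hypothetical feature (a per-site C-to-U mismatch fraction), builds every tree on the full data set rather than on subsamples, and all function names and values are assumptions made for illustration. It only shows why a site that deviates from the systematic-error background is isolated in fewer random splits and therefore receives a higher anomaly score.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively partition 1-D values at random split points until they are isolated."""
    if len(values) <= 1 or depth >= max_depth:
        return {"size": len(values)}
    low, high = min(values), max(values)
    if low == high:
        return {"size": len(values)}
    split = rng.uniform(low, high)
    return {
        "split": split,
        "left": build_tree([v for v in values if v < split], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= split], depth + 1, max_depth, rng),
    }


def path_length(value, node):
    """Number of splits needed to reach the leaf holding value, plus a leaf-size correction."""
    depth = 0
    while "split" in node:
        node = node["left"] if value < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def anomaly_scores(values, n_trees=100, seed=0):
    """Return one score in (0, 1] per value; scores closer to 1 mean easier isolation."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(values)))
    trees = [build_tree(values, 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(len(values))
    return [
        2.0 ** (-sum(path_length(v, t) for t in trees) / (n_trees * norm))
        for v in values
    ]


# Hypothetical mismatch fractions: 20 sites near a noise floor plus one deviating site.
background = [0.02 + 0.002 * i for i in range(20)]
candidate = 0.45
scores = anomaly_scores(background + [candidate])
candidate_score = scores[-1]
background_scores = scores[:-1]
```

In practice a library implementation with several features per site and subsampled trees would likely be preferable; the single-feature version is kept here only so that the scoring formula stays easy to follow.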
Subject(s)
RNA Editing , RNA , Mice , Animals , Humans , RNA/genetics , Base Sequence , APOBEC Deaminases/genetics , Mammals/genetics , RNA Sequence Analysis
ABSTRACT
Multiple acyl-CoA dehydrogenase deficiency (MADD) is a rare inborn error of metabolism affecting fatty acid and amino acid oxidation with an incidence of 1 in 200,000 live births. MADD has three clinical phenotypes: severe neonatal-onset with or without congenital anomalies, and a milder late-onset form. Clinical diagnosis is supported by urinary organic acid and blood acylcarnitine analysis using tandem mass spectrometry in newborn screening programs. MADD is an autosomal recessive trait caused by biallelic mutations in the ETFA, ETFB, and ETFDH genes encoding the alpha and beta subunits of the electron transfer flavoprotein (ETF) and ETF-coenzyme Q oxidoreductase enzymes. Despite significant advancements in sequencing techniques, many patients remain undiagnosed, impacting their access to clinical care and genetic counseling. In this report, we achieved a definitive molecular diagnosis in a newborn by combining whole-genome sequencing (WGS) with RNA sequencing (RNA-seq). Whole-exome sequencing and next-generation gene panels fail to detect variants, possibly affecting splicing, in deep intronic regions. Here, we report a unique deep intronic mutation in intron 1 of the ETFDH gene, c.35-959A>G, in a patient with early-onset lethal MADD, resulting in pseudo-exon inclusion. The identified variant is the third mutation reported in this region, highlighting ETFDH intron 1 vulnerability. It cannot be excluded that these intronic sequence features may be more common in other genes than is currently believed. This study highlights the importance of incorporating RNA analysis into genome-wide testing to reveal the functional consequences of intronic mutations.
Subject(s)
Electron-Transferring Flavoproteins , Introns , Iron-Sulfur Proteins , Multiple Acyl Coenzyme A Dehydrogenase Deficiency , Oxidoreductases Acting on CH-NH Group Donors , Humans , Multiple Acyl Coenzyme A Dehydrogenase Deficiency/genetics , Electron-Transferring Flavoproteins/genetics , Oxidoreductases Acting on CH-NH Group Donors/genetics , Iron-Sulfur Proteins/genetics , Introns/genetics , Newborn Infant , Mutation , Male , Female , Whole Genome Sequencing
ABSTRACT
Various next-generation sequencing (NGS) based strategies have been successfully used in the recent past for tracing origins and understanding the evolution of infectious agents, investigating the spread and transmission chains of outbreaks, as well as facilitating the development of effective and rapid molecular diagnostic tests and contributing to the hunt for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and for devising effective measures and strategies to mitigate the spread of the pandemic. Diverse approaches and sequencing methods can, as testified by the number of available sequences, be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In the current review, we provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also present an outline of current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our 'vademecum' for the production and handling of SARS-CoV-2-related sequencing data will contribute to this objective.
Subject(s)
COVID-19/virology , Viral Genome , High-Throughput Nucleotide Sequencing/methods , SARS-CoV-2/genetics , COVID-19/epidemiology , Humans , Pandemics
ABSTRACT
RNA editing is a relevant epitranscriptome phenomenon able to increase the transcriptome and proteome diversity of eukaryotic organisms. ADAR-mediated RNA editing is widespread in humans, in whom millions of A-to-I changes modify thousands of primary transcripts. RNA editing has pivotal roles in the regulation of gene expression, the modulation of the innate immune response, and the functioning of several neurotransmitter receptors. Massive transcriptome sequencing has fostered research in this field. Nonetheless, several aspects of RNA editing biology are still unknown and need to be elucidated. To support the study of A-to-I RNA editing, we have updated our REDIportal catalogue, raising its content to about 16 million events detected in 9642 human RNAseq samples from the GTEx project by using a dedicated pipeline based on the HPC version of the REDItools software. REDIportal now allows searches at sample level, provides overviews of RNA editing profiles for each RNAseq experiment, implements a Gene View module to look at individual events in their genic context, and hosts the CLAIRE database. Starting from this new version, REDIportal will begin collecting non-human RNA editing changes for comparative genomics investigations. The database is freely available at http://srv00.recas.ba.infn.it/atlas/index.html.
Subject(s)
Computational Biology/methods , Genetic Databases , Gene Expression Regulation , Proteome/genetics , RNA Editing/genetics , Transcriptome/genetics , Base Sequence/genetics , Data Curation/methods , Data Mining/methods , Gene Expression Profiling/methods , Genomics/methods , Humans , Internet , Proteomics/methods
ABSTRACT
Genome instability is a condition characterized by the accumulation of genetic alterations and is a hallmark of cancer cells. To uncover new genes and cellular pathways affecting endogenous DNA damage and genome integrity, we exploited a Synthetic Genetic Array (SGA)-based screen in yeast. Among the positive genes, we identified VID22, reported to be involved in DNA double-strand break repair. vid22Δ cells exhibit increased levels of endogenous DNA damage, chronic DNA damage response activation and accumulate DNA aberrations in sequences displaying high probabilities of forming G-quadruplexes (G4-DNA). If not resolved, these DNA secondary structures can block the progression of both DNA and RNA polymerases and correlate with chromosome fragile sites. Vid22 binds to and protects DNA at G4-containing regions both in vitro and in vivo. Loss of VID22 causes an increase in gross chromosomal rearrangement (GCR) events dependent on G-quadruplex forming sequences. Moreover, the absence of Vid22 causes defects in the correct maintenance of G4-DNA rich elements, such as telomeres and mtDNA, and hypersensitivity to the G4-stabilizing ligand TMPyP4. We thus propose that Vid22 is directly involved in genome integrity maintenance as a novel regulator of G4 metabolism.
Subject(s)
G-Quadruplexes , Genomic Instability , Membrane Proteins/physiology , Saccharomyces cerevisiae Proteins/physiology , Chromosome Aberrations , DNA Damage , Fungal Genome , Membrane Proteins/genetics , Membrane Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Telomere Homeostasis
ABSTRACT
BACKGROUND: The high-mobility group Hmga family of proteins are non-histone chromatin-interacting proteins which have been associated with a number of nuclear functions, including heterochromatin formation, replication, recombination, DNA repair, transcription, and formation of enhanceosomes. Due to its role based on dynamic interaction with chromatin, Hmga2 has a pathogenic role in diverse tumors and has been mainly studied in a cancer context; however, whether Hmga2 has similar physiological functions in normal cells remains less explored. Hmga2 was additionally shown to be required during the exit of embryonic stem cells (ESCs) from the ground state of pluripotency, to allow their transition into epiblast-like cells (EpiLCs), and here, we use that system to gain further understanding of normal Hmga2 function. RESULTS: We demonstrated that Hmga2 KO pluripotent stem cells fail to develop into EpiLCs. By using this experimental system, we studied the chromatin changes that take place upon the induction of EpiLCs and we observed that the loss of Hmga2 affects the histone mark H3K27me3, whose levels are higher in Hmga2 KO cells. Accordingly, a sustained expression of polycomb repressive complex 2 (PRC2), responsible for H3K27me3 deposition, was observed in KO cells. However, gene expression differences between differentiating wt vs Hmga2 KO cells did not show any significant enrichments of PRC2 targets. Similarly, endogenous Hmga2 association to chromatin in epiblast stem cells did not show any clear relationships with gene expression modification observed in Hmga2 KO. Hmga2 ChIP-seq confirmed that this protein preferentially binds to the chromatin regions associated with nuclear lamina. Starting from this observation, we demonstrated that nuclear lamina underwent severe alterations when Hmga2 KO or KD cells were induced to exit from the naïve state and this phenomenon is accompanied by a mislocalization of the heterochromatin mark H3K9me3 within the nucleus. 
As the nuclear lamina (NL) is involved in the organization of 3D chromatin structure, we explored the possible effects of Hmga2 loss on this organization. The analysis of Hi-C data in wt and Hmga2 KO cells allowed us to observe that inter-TAD (topologically associated domain) interactions in Hmga2 KO cells are different from those observed in wt cells. These differences clearly show a peculiar compartmentalization of inter-TAD interactions in chromatin regions that are or are not associated with the nuclear lamina. CONCLUSIONS: Overall, our results indicate that Hmga2 interacts with heterochromatic lamin-associated domains, and highlight a role for Hmga2 in the crosstalk between chromatin and nuclear lamina, affecting the establishment of inter-TAD interactions.
Subject(s)
Nuclear Envelope , Pluripotent Stem Cells , Chromatin/genetics , Chromatin/metabolism , HMGA2 Protein/genetics , HMGA2 Protein/metabolism , Heterochromatin/metabolism , Histones/genetics , Nuclear Envelope/metabolism , Pluripotent Stem Cells/metabolism , Polycomb Repressive Complex 2/genetics
ABSTRACT
Microcephalic Osteodysplastic Primordial Dwarfism type II (MOPDII) represents the most common form of primordial dwarfism. MOPDII clinical features include severe prenatal and postnatal growth retardation, postnatal severe microcephaly, hypotonia, and an increased risk for cerebrovascular disease and insulin resistance. Autosomal recessive biallelic loss-of-function genomic variants in the centrosomal pericentrin (PCNT) gene on chromosome 21q22 cause MOPDII. Over the past decade, exome sequencing (ES) and massive RNA sequencing have been effectively employed both to discover novel disease genes and to expand the genotypes of well-known diseases. In this paper we report the results of both RNA sequencing and ES of three patients affected by MOPDII, with the aim of exploring whether differentially expressed genes and previously uncharacterized gene variants, in addition to PCNT pathogenic variants, could be associated with the complex phenotype of this disease. We discovered a downregulation of key factors involved in growth, such as IGF1R, IGF2R, and RAF1, in all three investigated patients. Moreover, ES identified a shortlist of genes associated with deleterious, rare variants in MOPDII patients. Our results suggest that Next Generation Sequencing (NGS) technologies can be successfully applied for the molecular characterization of the complex genotypic background of MOPDII.
Subject(s)
Dwarfism , Microcephaly , Osteochondrodysplasias , Humans , Female , Pregnancy , Microcephaly/genetics , Exome/genetics , Transcriptome , Fetal Growth Retardation/genetics , Dwarfism/genetics , Osteochondrodysplasias/genetics , Genotype , Mutation
ABSTRACT
The Yes-associated protein (YAP), one of the major effectors of the Hippo pathway together with its related protein WW-domain-containing transcription regulator 1 (WWTR1; also known as TAZ), mediates a range of cellular processes from proliferation and death to morphogenesis. YAP and TAZ regulate a large number of target genes, acting as coactivators of DNA-binding transcription factors or as negative regulators of transcription by interacting with the nucleosome remodeling and histone deacetylase complexes. YAP is expressed in self-renewing embryonic stem cells (ESCs), although it is still debated whether it plays any crucial roles in the control of either stemness or differentiation. Here we show that the transient downregulation of YAP in mouse ESCs perturbs cellular homeostasis, leading to the inability to differentiate properly. Bisulfite genomic sequencing revealed that this transient knockdown caused a genome-wide alteration of the DNA methylation remodeling that takes place during the early steps of differentiation, suggesting that the phenotype we observed might be due to the dysregulation of some of the mechanisms that regulate ESC exit from pluripotency. By gene expression analysis, we identified two molecules that could have a role in the altered genome-wide methylation profile: the long noncoding RNA ephemeron, whose rapid upregulation is crucial for the transition of ESCs into epiblast, and the methyltransferase-like protein Dnmt3l, which, during embryonic development, cooperates with Dnmt3a and Dnmt3b to contribute to the de novo DNA methylation that governs early steps of ESC differentiation. These data suggest a new role for YAP in the governance of the epigenetic dynamics of exit from pluripotency.
Subject(s)
Signal Transducing Adaptor Proteins/metabolism , Cell Differentiation , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA Methylation , Mouse Embryonic Stem Cells/cytology , Signal Transducing Adaptor Proteins/genetics , Animals , DNA (Cytosine-5-)-Methyltransferases/genetics , Mice , Mouse Embryonic Stem Cells/metabolism , Signal Transduction , YAP-Signaling Proteins , DNA Methyltransferase 3B
ABSTRACT
Effective systems for the analysis of molecular data are fundamental for monitoring the spread of infectious diseases and studying pathogen evolution. The rapid identification of emerging viral strains and/or genetic variants potentially associated with novel phenotypic features is one of the most important objectives of genomic surveillance of human pathogens and represents one of the first lines of defense for the control of their spread. During the COVID-19 pandemic, several taxonomic frameworks have been proposed for the classification of SARS-CoV-2 isolates. These systems, which are typically based on phylogenetic approaches, represent essential tools for epidemiological studies and contribute to the study of the origin of the outbreak. Here, we propose an alternative, reproducible, and transparent phenetic method to study changes in SARS-CoV-2 genomic diversity over time. We suggest that our approach can complement other systems and facilitate the identification of biologically relevant variants in the viral genome. To demonstrate the validity of our approach, we present comparative genomic analyses of more than 175,000 genomes. Our method delineates 22 distinct SARS-CoV-2 haplogroups, which, based on the distribution of high-frequency genetic variants, fall into four major macrohaplogroups. We highlight biased spatiotemporal distributions of SARS-CoV-2 genetic profiles and show that seven of the 22 haplogroups (and all four haplogroup clusters) showed a broad geographic distribution within China by the time the outbreak was widely recognized, suggesting early emergence and widespread cryptic circulation of the virus well before its isolation in January 2020.
General patterns of genomic variability are remarkably similar within all major SARS-CoV-2 haplogroups, with UTRs consistently exhibiting the greatest variability, with s2m, a conserved secondary structure element of unknown function in the 3'-UTR of the viral genome showing evidence of a functional shift. Although several polymorphic sites that are specific to one or more haplogroups were predicted to be under positive or negative selection, overall our analyses suggest that the emergence of novel types is unlikely to be driven by convergent evolution and independent fixation of advantageous substitutions, or by selection of recombined strains. In the absence of extensive clinical metadata for most available genome sequences, and in the context of extensive geographic and temporal biases in the sampling, many questions regarding the evolution and clinical characteristics of SARS-CoV-2 isolates remain open. However, our data indicate that the approach outlined here can be usefully employed in the identification of candidate SARS-CoV-2 genetic variants of clinical and epidemiological importance.
Subject(s)
COVID-19/genetics , Molecular Evolution , Viral Genome , Genomics , Phylogeny , SARS-CoV-2/genetics , Humans
ABSTRACT
Over the last 5 years, a number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used for the analysis of the data (based on either the alignment or the assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.
Subject(s)
Computational Biology , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Molecular Sequence Data , DNA Sequence Analysis , Alleles , Chromosome Mapping , Human Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Reproducibility of Results , DNA Sequence Analysis/methods
ABSTRACT
SUMMARY: While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparison with other state-of-the-art methods, we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. AVAILABILITY AND IMPLEMENTATION: Galaxy: http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
SUMMARY: ITSoneWB (ITSone WorkBench) is a Galaxy-based bioinformatic environment where comprehensive and high-quality reference data are connected with established pipelines and new tools in an automated and easy-to-use service targeted at global taxonomic analysis of eukaryotic communities based on Internal Transcribed Spacer 1 variants high-throughput sequencing. AVAILABILITY AND IMPLEMENTATION: ITSoneWB has been deployed on the INFN-Bari ReCaS cloud facility and is freely available on the web at http://itsonewb.cloud.ba.infn.it/galaxy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Eukaryota , Software , Computational Biology , High-Throughput Nucleotide Sequencing , Data Accuracy
ABSTRACT
MOTIVATION: Clinical applications of genome re-sequencing technologies typically generate large amounts of data that need to be carefully annotated and interpreted to identify genetic variants potentially associated with pathological conditions. In this context, accurate and reproducible methods for the functional annotation and prioritization of genetic variants are of fundamental importance. RESULTS: In this article, we present VINYL, a flexible and fully automated system for the functional annotation and prioritization of genetic variants. Extensive analyses of both real and simulated datasets suggest that VINYL can identify clinically relevant genetic variants in a more accurate manner compared to equivalent state of the art methods, allowing a more rapid and effective prioritization of genetic variants in different experimental settings. As such we believe that VINYL can establish itself as a valuable tool to assist healthcare operators and researchers in clinical genomics investigations. AVAILABILITY AND IMPLEMENTATION: VINYL is available at http://beaconlab.it/VINYL and https://github.com/matteo14c/VINYL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
BACKGROUND: Breast cancer (BC) is the most common malignancy in women, in whom it accounts for 20% of total neoplasia incidence. Most BCs are considered sporadic, and a number of factors, including family history, age, hormonal cycles and diet, have been reported to be BC risk factors. The gut microbiota also plays a role in breast cancer development: its imbalance has been associated with various human diseases, including cancer, although a cause-effect relationship has never been proven. METHODS: The aim of this work was to characterize the breast tissue microbiome in 34 women affected by BC using an NGS-based method, analyzing the tumoral and the adjacent non-tumoral tissue of each patient. RESULTS: The healthy and tumor tissues differed in bacterial composition and richness: the number of Amplicon Sequence Variants (ASVs) was higher in healthy tissues than in tumor tissues (p = 0.001). Moreover, our analyses, which resolved taxa from phylum down to species level for each sample, revealed major differences in the two richest phyla, namely Proteobacteria and Actinobacteria. Notably, the levels of Actinobacteria and Proteobacteria were, respectively, higher and lower in healthy than in tumor tissues. CONCLUSIONS: Our study provides information about the microbial composition of breast tumor tissue as compared with closely adjacent healthy tissue (paired samples from the same woman). The differences found may have diagnostic and therapeutic implications; further studies are necessary to clarify whether they reflect a simple association or a contributing pathogenetic effect in BC. A comparison of results from similar studies does not point to a universal microbiome signature but rather to distinct signatures that depend on the geographic location of the cohorts.
Subject(s)
Breast Neoplasms/microbiology , Breast/microbiology , Dysbiosis/microbiology , Gastrointestinal Microbiome/genetics , Adult , Biodiversity , Female , Humans , Middle Aged , 16S Ribosomal RNA/analysis
ABSTRACT
BACKGROUND: Improving the availability and usability of data and analytical tools is a critical precondition for further advancing modern biological and biomedical research. For instance, one of the many ramifications of the COVID-19 global pandemic has been to make even more evident the importance of having bioinformatics tools and data readily actionable by researchers through convenient access points and supported by adequate IT infrastructures. One of the most successful efforts in improving the availability and usability of bioinformatics tools and data is represented by the Galaxy workflow manager and its thriving community. In 2020 we introduced Laniakea, a software platform conceived to streamline the configuration and deployment of "on-demand" Galaxy instances over the cloud. By facilitating the set-up and configuration of Galaxy web servers, Laniakea provides researchers with a powerful and highly customisable platform for executing complex bioinformatics analyses. The system can be accessed through a dedicated and user-friendly web interface that allows the Galaxy web server's initial configuration and deployment. RESULTS: "Laniakea@ReCaS", the first instance of a Laniakea-based service, is managed by ELIXIR-IT and was officially launched in February 2020, after about one year of development and testing that involved several users. Researchers can request access to Laniakea@ReCaS through an open-ended call for use-cases. Ten project proposals have been accepted since then, totalling 18 Galaxy on-demand virtual servers that employ ~ 100 CPUs, ~ 250 GB of RAM and ~ 5 TB of storage and serve several different communities and purposes. Herein, we present eight use cases demonstrating the versatility of the platform. 
CONCLUSIONS: During this first year of activity, the Laniakea-based service emerged as a flexible platform that facilitated the rapid development of bioinformatics tools, the efficient delivery of training activities, and the provision of public bioinformatics services in different settings, including food safety and clinical research. Laniakea@ReCaS provides a proof of concept of how enabling access to appropriate, reliable IT resources and ready-to-use bioinformatics tools can considerably streamline researchers' work.
Subject(s)
COVID-19 , Cloud Computing , Computational Biology , Humans , SARS-CoV-2 , Software
ABSTRACT
RNA editing is a widespread co-/post-transcriptional mechanism that modifies specific nucleotides in primary RNAs and plays important roles in molecular processes including the regulation of gene expression and/or the processing of noncoding RNAs. In recent years, the detection of editing sites has been improved through the availability of high-throughput RNA sequencing (RNA-Seq) technologies. Accurate bioinformatics pipelines are essential for the analysis of next-generation sequencing (NGS) data to ensure the correct identification of edited sites. Several pipelines, using various read mappers and variant callers with a wide range of adjustable parameters, are available for the detection of RNA editing events. In this review, we discuss some of the most recent and popular tools and provide guidelines for RNA-Seq data generation and analysis for the detection of RNA editing in massive transcriptome data. Using simulated and real data sets, we provide an overview of their behavior, emphasizing that detecting RNA editing in NGS data sets remains a challenging task.
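To make the detection task concrete, the following minimal Python sketch flags candidate editing sites from per-position base counts. It is an illustration only and not the method of any tool covered in the review: the function name `call_candidate_sites`, the input layout, and the thresholds `min_coverage` and `min_freq` are all assumptions. It relies on the fact that A-to-I editing is read as an A-to-G mismatch in RNA-Seq data.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: (chrom, pos, ref_base) -> counts of bases seen in reads.
Pileup = Dict[Tuple[str, int, str], Dict[str, int]]


def call_candidate_sites(
    pileup: Pileup, min_coverage: int = 10, min_freq: float = 0.1
) -> List[Tuple[str, int, float]]:
    """Return (chrom, pos, G frequency) for reference-A positions with enough G reads."""
    candidates = []
    for (chrom, pos, ref), counts in sorted(pileup.items()):
        if ref != "A":
            continue  # only A-to-G mismatches are considered in this sketch
        coverage = sum(counts.values())
        if coverage < min_coverage:
            continue  # too few reads to estimate a frequency
        freq = counts.get("G", 0) / coverage
        if freq >= min_freq:
            candidates.append((chrom, pos, freq))
    return candidates


# Toy data (invented for illustration).
example: Pileup = {
    ("chr1", 100, "A"): {"A": 15, "G": 5},  # 5 of 20 reads show G
    ("chr1", 200, "A"): {"A": 3, "G": 2},   # coverage below threshold
    ("chr1", 300, "C"): {"C": 20, "T": 1},  # reference base is not A
}
print(call_candidate_sites(example))
```

A frequency threshold alone cannot separate true editing from genomic SNPs or mapping errors, which is one reason detection remains difficult in practice.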
Subject(s)
Computational Biology/methods , Human Genome , RNA Editing , Transcriptome , High-Throughput Nucleotide Sequencing/methods , Humans , RNA Sequence Analysis/methods , Software
ABSTRACT
BACKGROUND: RNA editing is a widespread co-/post-transcriptional mechanism that alters primary RNA sequences through the modification of specific nucleotides, and it can increase both transcriptome and proteome diversity. The automatic detection of RNA editing from RNA-seq data is computationally intensive and has been limited to small data sets, preventing a reliable genome-wide characterisation of this process. RESULTS: In this work we introduce HPC-REDItools, an upgraded tool for the accurate discovery of RNA editing events in large dataset repositories. AVAILABILITY: https://github.com/BioinfoUNIBA/REDItools2 . CONCLUSIONS: HPC-REDItools is dramatically faster than the previous version, REDItools, enabling big-data analysis by means of an MPI-based implementation that scales almost linearly with the number of available cores.
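The near-linear scaling described above depends on splitting the genome into pieces that can be processed independently. A minimal sketch of that idea follows, assuming a simple round-robin assignment of fixed-size windows to workers. The names `make_windows` and `windows_for_rank` are hypothetical, and this is not claimed to be the partitioning scheme HPC-REDItools actually uses. In an MPI program, `rank` and `size` would be obtained from the communicator.

```python
from typing import Dict, List, Tuple

Window = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


def make_windows(chrom_lengths: Dict[str, int], window: int) -> List[Window]:
    """Split each chromosome into consecutive half-open [start, end) windows."""
    windows = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window):
            windows.append((chrom, start, min(start + window, length)))
    return windows


def windows_for_rank(windows: List[Window], rank: int, size: int) -> List[Window]:
    """Round-robin assignment: worker `rank` takes every `size`-th window."""
    return windows[rank::size]


# Toy chromosome lengths (invented for illustration).
chrom_lengths = {"chr1": 250, "chr2": 120}
all_windows = make_windows(chrom_lengths, window=100)

for rank in range(2):
    print(rank, windows_for_rank(all_windows, rank, size=2))
```

Because no window is shared between workers, each one can scan its own regions without communicating with the others until the results are gathered.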
Subject(s)
Computing Methodologies , RNA Editing/genetics , Software , Algorithms , Base Sequence , Genome , Transcriptome/genetics
ABSTRACT
BACKGROUND: The advent of Next Generation Sequencing (NGS) technologies and the concomitant reduction in sequencing costs allow unprecedented high-throughput profiling of biological systems in a cost-efficient manner. Modern biological experiments are becoming increasingly data- and computation-intensive, and the wealth of publicly available biological data is bringing bioinformatics into the "Big Data" era. For these reasons, the effective application of High Performance Computing (HPC) architectures is increasingly recognized by bioinformaticians as well. Here we describe pilot programs for provisioning HPC resources to bioinformaticians, run by the Italian Node of ELIXIR (ELIXIR-IT) in collaboration with CINECA, the main Italian supercomputing center. RESULTS: In April 2016, CINECA and ELIXIR-IT launched the pilot Call "ELIXIR-IT HPC@CINECA", offering streamlined access to HPC resources for bioinformatics. Resources are made available either through web front-ends to dedicated workflows developed at CINECA or by providing direct access to the High Performance Computing systems through a standard command-line interface tailored for bioinformatics data analysis. This offers the biomedical research community a production-scale environment, continuously updated with the latest versions of publicly available reference datasets and bioinformatics tools. Currently, 63 research projects have gained access to the HPC@CINECA program, for a total allocation of ~ 8 million CPU hours and, for data storage, ~ 100 TB of permanent and ~ 300 TB of temporary space. CONCLUSIONS: Three years after the beginning of the ELIXIR-IT HPC@CINECA program, we can appreciate its impact on the Italian bioinformatics community and draw some conclusions. Several Italian researchers who applied to the program have gained access to one of the top-ranking public scientific supercomputing facilities in Europe.
Those investigators had the opportunity to substantially reduce computational turnaround times in their research projects and to process massive amounts of data, pursuing research approaches that would otherwise have been difficult or impossible to undertake. Moreover, by taking advantage of the wealth of documentation and training material provided by CINECA, participants had the opportunity to improve their skills in using HPC systems and to become better positioned to apply to similar EU programs of greater scale, such as PRACE. To illustrate the effective usage and impact of the resources awarded by the program in different research applications, we report five successful use cases whose findings have already been published in peer-reviewed journals.