ABSTRACT
BACKGROUND: There is a need for novel treatments for neuroblastoma: despite the emergence of new biological and immune therapies, refractory pediatric neuroblastoma is still a medical challenge. Phytocannabinoids and their hemisynthetic derivatives have shown evidence supporting their anticancer potential. The aim of this research was to identify phytocannabinoids or hemisynthetic cannabinoids that reduce the viability of the SH-SY5Y neuroblastoma cell line. METHODS: Hexane and ethyl acetate extracts were produced from Cannabis sativa L. as raw material; then, Δ9-tetrahydrocannabinol (THC), its acid counterpart and cannabinol (CBN) were isolated. In addition, acetylated derivatives of THC and CBN were synthesized. The identity and purity of the compounds were determined by high-performance liquid chromatography (HPLC) and ¹H and ¹³C nuclear magnetic resonance (NMR). Their capacity to affect the viability of SH-SY5Y cells was then examined using the resazurin assay. Finally, to gain insight into the mechanism of action of the extracts, phytocannabinoids and acetylated derivatives, caspase 3/7 activity was determined in cells exposed to these compounds. RESULTS: The structure and purity of the isolated compounds were confirmed. The extracts, the phytocannabinoids and their acetylated counterparts inhibited the viability of SH-SY5Y cells, with CBN being the most potent of all the tested molecules (half-maximal inhibitory concentration, IC50, of 9.5 µM). CONCLUSION: Each of the evaluated molecules activated caspases 3/7, indicating that the cytotoxicity of the tested phytocannabinoids and their hemisynthetic derivatives is mediated, at least in part, by apoptosis.
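As a hedged illustration of how an IC50 such as the value reported for CBN is read from a dose-response curve, the sketch below assumes a two-parameter Hill model of viability (expressed as percent of untreated control) and estimates the IC50 by linear interpolation on log-concentration between the two doses that bracket 50% viability. The model, the helper names (`hill_viability`, `interpolate_ic50`) and the dose series are hypothetical; they are not the authors' data or fitting procedure.

```python
import math
from typing import Optional, Sequence


def hill_viability(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Percent viability under a two-parameter Hill model (100% at zero dose)."""
    return 100.0 / (1.0 + (conc_um / ic50_um) ** hill)


def interpolate_ic50(concs_um: Sequence[float],
                     viabilities: Sequence[float]) -> Optional[float]:
    """Estimate IC50 by linear interpolation on log10(concentration).

    Assumes ascending concentrations and viability that decreases with dose.
    Returns None when no pair of consecutive doses brackets 50% viability.
    """
    pairs = list(zip(concs_um, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(pairs, pairs[1:]):
        if v_lo >= 50.0 >= v_hi:
            if v_lo == v_hi:
                return c_lo
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Hypothetical dose series (µM); viabilities are generated from the model itself
# with a true IC50 of 9.5 µM, purely to exercise the interpolation.
doses = [1.0, 3.0, 10.0, 30.0, 100.0]
viability = [hill_viability(c, ic50_um=9.5) for c in doses]
estimated_ic50 = interpolate_ic50(doses, viability)
```

Here `estimated_ic50` lands close to, but not exactly at, 9.5 µM, because interpolating between two doses only approximates the sigmoid; fitting the whole curve (for example with a four-parameter logistic model) is the more usual choice when enough doses are available.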
Subject(s)
Cannabinoids , Cannabis , Caspase 3 , Cell Survival , Neuroblastoma , Plant Extracts , Humans , Cannabis/chemistry , Plant Extracts/pharmacology , Plant Extracts/chemistry , Cell Line, Tumor , Neuroblastoma/drug therapy , Cell Survival/drug effects , Caspase 3/metabolism , Caspase 3/drug effects , Cannabinoids/pharmacology , Cannabinoids/chemistry , Caspase 7/metabolism , Apoptosis/drug effects , Acetylation/drug effects , Chromatography, High Pressure Liquid
ABSTRACT
BACKGROUND: Epstein-Barr virus (EBV) is a human gammaherpesvirus etiologically linked to several benign and malignant diseases. EBV-associated malignancies exhibit an unusual global distribution that might be partly attributed to viral and host genetic backgrounds. OBJECTIVES: To assemble a new EBV genome (CEMO3) from a paediatric Burkitt's lymphoma from Rio de Janeiro State (Southeast Brazil); in addition, to perform a global phylogenetic analysis using complete EBV genomes, including CEMO3, and to investigate the genetic relationship of some South American (SA) genomes through EBV subgenomic targets. METHODS: CEMO3 was sequenced by next-generation sequencing, and its low-coverage regions and gaps were corrected by Sanger sequencing. CEMO3 and 67 EBV genomes representing diverse geographic regions were evaluated by maximum likelihood phylogenetic analysis. Further, the polymorphism of subgenomic regions of some SA EBV genomes was assessed. FINDINGS: Whole bulk tumour sequencing yielded 23,217 EBV-related reads, from which 172,713 base pairs of the new EBV genome CEMO3 were assembled. CEMO3 and most SA EBV genomes clustered within the SA subclade closely related to the African Raji strain, forming the South American/Raji clade. Notably, these Raji-related genomes exhibit significant genetic diversity, characterised by distinctive synapomorphies in some genes that are absent in the original Raji strain. CONCLUSION: CEMO3 is a newly assembled South American EBV genome. Although most EBV genomes from SA are Raji-related, they harbour high diversity that sets them apart from the original Raji strain.
Subject(s)
Epstein-Barr Virus Infections , Herpesvirus 4, Human , Child , Humans , Herpesvirus 4, Human/genetics , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/pathology , Phylogeny , Genome, Viral/genetics , Brazil
ABSTRACT
BACKGROUND: In living organisms, small heat shock proteins (sHSPs) are induced in response to stress situations. This protein family is large in plants and, in the case of tomato (Solanum lycopersicum), 33 genes have been identified, most of them related to the heat stress response and to the ripening process. Transcriptomic and proteomic studies have revealed complex expression patterns for these genes. In this work, we investigate the coregulation of these genes by performing a computational analysis of their promoter architecture to find regulatory motifs known as heat shock elements (HSEs). We leverage the presence of sHSP members that originated from tandem duplication events and analyze the diversity of promoter architecture across the whole sHSP family, focusing on the identification of HSEs. RESULTS: We searched for conserved genomic sequences in the promoter regions of tomato sHSPs and of several other proteins (mainly HSPs) that are functionally related to heat stress or to ripening. Several computational analyses were performed to build multiple sequence motifs and identify transcription factor binding sites (TFBS) homologous to HSF1AE and HSF21 in Arabidopsis. We also investigated the expression and interaction of these proteins under two heat stress situations, in whole tomato plants and in protoplasts, both in the presence and in the absence of heat shock transcription factor A2 (HsfA2). The results indicate that different sHSPs are up-regulated depending on the activation or repression of HsfA2, a key regulator of HSPs. Further, the analysis of protein-protein interactions between the sHSP family and other heat shock response proteins (Hsp70, Hsp90 and MBF1c) suggests that several sHSPs mediate an alternative stress response through a regulatory subnetwork that is not dependent on HsfA2.
CONCLUSIONS: Overall, this study identifies two regulatory motifs (HSF1AE and HSF21) associated with the sHSP family in tomato which are considered genomic HSEs. The study also suggests that, despite the apparent redundancy of these proteins, which has been linked to gene duplication, tomato sHSPs showed different up-regulation and different interaction patterns when analyzed under different stress situations.
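The study builds its own multiple-sequence motifs (HSF1AE, HSF21). As a much simpler, hedged illustration of what scanning a promoter for HSE-like sites involves, the sketch below uses the textbook HSE building block, inverted repeats of the pentamer nGAAn, reduced here to the two-unit cores GAAnnTTC and TTCnnGAA. Both cores are their own reverse complements, so scanning one strand also finds sites on the other. The pattern, the function names and the test sequences are assumptions for illustration, not the study's motifs.

```python
import re
from typing import List, Tuple

# Two-unit HSE-like cores (a simplified consensus assumed by this sketch):
# head-to-head GAAnnTTC or tail-to-tail TTCnnGAA. The lookahead lets
# overlapping matches be reported.
HSE_CORE = re.compile(r"(?=(GAA[ACGT]{2}TTC|TTC[ACGT]{2}GAA))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_hse_cores(promoter: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched core) pairs on the given strand."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in HSE_CORE.finditer(seq)]
```

A realistic scan would score candidate sites with position weight matrices and thresholds, as motif-discovery tools do, rather than rely on an exact regular expression.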
Subject(s)
Gene Expression Regulation, Plant , Heat-Shock Proteins, Small/genetics , Nucleotide Motifs , Plant Proteins/genetics , Regulatory Sequences, Nucleic Acid , Solanum lycopersicum/genetics , Gene Duplication , Heat-Shock Proteins, Small/metabolism , Heat-Shock Response , Solanum lycopersicum/growth & development , Solanum lycopersicum/metabolism , Plant Proteins/metabolism , Promoter Regions, Genetic , Protein Interaction Maps
ABSTRACT
Motivation: To attain acceptable sample misassignment rates, current approaches to multiplex single-molecule real-time sequencing require upstream quality improvement, which is obtained from multiple passes over the sequenced insert and significantly reduces the effective read length. To fully exploit the raw read length in multiplex applications, robust barcodes capable of dealing with the full single-pass error rates are needed. Results: We present a method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing. The manuscript focuses on the design of barcodes for full-length single-pass reads, impaired by challenging error rates on the order of 11%. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10⁻⁷ under the above conditions, and are designed to be compatible with the chemical constraints imposed by the sequencing process. Availability and Implementation: Software tools for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, are freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark . Contact: ezpeleta@cifasis-conicet.gov.ar.
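The NS-watermark decoder itself is the authors' software and is not reproduced here. To make the error model concrete, the sketch below swaps in a generic nearest-barcode baseline: each read is assigned to the barcode at minimum Levenshtein distance (a metric that counts insertions, deletions and substitutions), and the read is rejected when the best match is tied or exceeds a distance threshold. Comparing only a fixed-length read prefix is a simplification; the barcode set, the threshold and the function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical barcode set, for illustration only.
BARCODES: List[str] = ["ACGTACGTAC", "TTGCAATGCC", "GACTGACTGG"]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def assign_barcode(read: str, barcodes: Sequence[str], max_dist: int) -> Optional[int]:
    """Index of the closest barcode, or None if the match is ambiguous or too distant."""
    prefix = read[:max(len(bc) for bc in barcodes)]
    dists = sorted((levenshtein(prefix, bc), idx) for idx, bc in enumerate(barcodes))
    best_dist, best_idx = dists[0]
    if best_dist > max_dist:
        return None  # too many errors: discard rather than guess
    if len(dists) > 1 and dists[1][0] == best_dist:
        return None  # tie: reject to avoid misassignment
    return best_idx
```

For example, `assign_barcode("ACGTCGTACAAAA", BARCODES, max_dist=3)` returns 0: the read carries one deleted base of the first barcode, and the resulting shift costs two edits within the compared prefix.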
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Computer Simulation
ABSTRACT
Nucleic-acid barcoding is an enabling technique for many applications, but its use remains limited in emerging long-read sequencing technologies with intrinsically low raw accuracy. Here, we apply so-called NS-watermark barcodes, whose error correction capability was previously validated in silico, in a proof of concept where we synthesize 3840 NS-watermark barcodes and use them to asymmetrically tag and simultaneously sequence amplicons from two evolutionarily distant species (namely Bordetella pertussis and Drosophila mojavensis) on the ONT MinION platform. To our knowledge, this is the largest number of distinct, non-random tags ever sequenced in parallel and the first report of microarray-based synthesis as a source for large oligonucleotide pools for barcoding. We recovered the identity of more than 86% of the barcodes, with a crosstalk rate of 0.17% (i.e., one misassignment every 584 reads). This falls in the range of the index hopping rate of established, high-accuracy Illumina sequencing, despite the increased number of tags and the relatively low accuracy of both microarray-based synthesis and long-read sequencing. The robustness of NS-watermark barcodes, together with their scalable design and compatibility with low-cost massive synthesis, makes them promising for present and future sequencing applications requiring massive labeling, such as long-read single-cell RNA-Seq.
Subject(s)
DNA Barcoding, Taxonomic , High-Throughput Nucleotide Sequencing , DNA Barcoding, Taxonomic/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
ABSTRACT
BACKGROUND: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in bioinformatics research. The problem gets harder as the number of classes increases. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates of the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead either to computationally demanding explorations of a search space with thousands of dimensions or to classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. RESULTS: A novel bound on the maximum number of genes that can be handled by binary classifiers in binary-mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary-mediated multiclass classifiers for microarray data samples. CONCLUSIONS: Comprehensive experimental work shows that the bound is indeed useful for inducing accurate and sparse multiclass classifiers for microarray data samples.
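The bound itself is not reproduced here. To make "binary-mediated" concrete, the sketch below shows decoding in error-correcting output codes (ECOC), one common binary-mediated scheme: each class gets a binary codeword, one binary classifier is trained per codeword position, and a sample is assigned to the class whose codeword is nearest (in Hamming distance) to the vector of binary outputs. The code length is presumably what the abstract calls the dimension of the binary output domain. The 4-class, 7-bit code matrix and the function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical code matrix: row k is the codeword of class k, column j is the
# 0/1 target that binary classifier j learns for that class. Every pair of
# codewords differs in 4 positions, so one wrong binary classifier is outvoted.
CODE_MATRIX: List[List[int]] = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(binary_outputs: Sequence[int]) -> int:
    """Return the class whose codeword is nearest to the binary classifiers' outputs."""
    return min(range(len(CODE_MATRIX)),
               key=lambda k: hamming(CODE_MATRIX[k], binary_outputs))
```

With a minimum codeword distance of 4, `ecoc_decode([1, 0, 0, 1, 1, 1, 1])` still returns class 1 even though the first binary classifier erred; longer codewords allow more such errors to be absorbed.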
Subject(s)
Algorithms , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Computational Biology/methods , Humans , Neoplasms/genetics
ABSTRACT
The study of long non-coding RNAs (lncRNAs), transcripts longer than 200 nucleotides, is central to understanding the development and progression of many complex diseases. Unlike that of proteins, the functionality of lncRNAs is only subtly encoded in their primary sequence. Current in silico lncRNA annotation methods mostly rely on annotations inferred from interaction networks, but building these networks requires extensive experimental studies. In this work, we present FGGA-lnc, a graph-based machine learning method for the automatic Gene Ontology (GO) annotation of lncRNAs across the three GO subdomains. We build upon FGGA (factor graph GO annotation), a computational method originally developed to annotate protein sequences from non-model organisms. In FGGA-lnc, a coding-based approach is introduced to fuse the primary sequence and secondary structure information of lncRNA molecules. As a result, lncRNA sequences become sequences over a higher-order alphabet, allowing supervised learning methods to assess individual GO-term annotations. Raw GO annotations obtained in this way are unaware of the GO structure and therefore likely to be inconsistent with it; the message-passing algorithm embodied by factor graph models overcomes this problem. Evaluations of FGGA-lnc on lncRNA data from model and non-model organisms showed promising results, suggesting that it is a candidate to meet the huge demand for functional annotations arising from high-throughput sequencing technologies.
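The abstract does not spell out the coding scheme. One simple way to fuse primary sequence and secondary structure into a higher-order alphabet, assumed here purely for illustration, is to pair each nucleotide with its dot-bracket structure symbol (for instance from an RNA folding tool), giving 4 × 3 = 12 fused symbols, and then to count k-mers of fused symbols as features for a supervised classifier. The encoding and all names below are hypothetical, not FGGA-lnc's actual scheme.

```python
from collections import Counter
from itertools import product
from typing import List

NUCLEOTIDES = "ACGU"
STRUCTURE_SYMBOLS = "(.)"  # dot-bracket: paired-open, unpaired, paired-close

# Hypothetical higher-order alphabet: one symbol per (nucleotide, structure) pair.
ALPHABET = [n + s for n, s in product(NUCLEOTIDES, STRUCTURE_SYMBOLS)]
SYMBOL_INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}


def fuse(sequence: str, structure: str) -> List[int]:
    """Map a sequence and its dot-bracket structure to fused-symbol indices."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input
    return [SYMBOL_INDEX[n + s] for n, s in zip(rna, structure)]


def kmer_counts(fused: List[int], k: int = 2) -> Counter:
    """Count overlapping k-mers of fused symbols (candidate classifier features)."""
    return Counter(tuple(fused[i:i + k]) for i in range(len(fused) - k + 1))
```

For a hairpin such as `fuse("GGAUCC", "((..))")`, the same nucleotide G maps to different fused symbols depending on whether it opens a pair or is unpaired, which is the extra information a higher-order alphabet exposes to the learner.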
ABSTRACT
Single nucleotide variants (SNVs) occurring in a protein-coding gene may disrupt its function in multiple ways. Predicting this disruption has been recognized as an important problem in bioinformatics research. Many tools, hereafter p-tools, have been designed to perform these predictions, and many of them are now in common use in scientific research and even in clinical applications. This highlights the importance of understanding the semantics of their outputs. To shed light on this issue, two questions are formulated: (i) do p-tools provide similar predictions (inner consistency)? and (ii) are these predictions consistent with the literature (outer consistency)? To answer them, six p-tools are evaluated with exhaustive SNV datasets from the BRCA1 gene. Two indices, called $K_{all}$ and $K_{strong}$, are proposed to quantify the inner consistency of pairs of p-tools, while outer consistency is quantified by standard information retrieval metrics. The inner consistency analysis reveals that most p-tools are not consistent with each other, and the outer consistency analysis reveals that they are characterized by low prediction performance. Although this result highlights the need to improve the prediction performance of individual p-tools, the inner consistency results pave the way for the systematic design of truly diverse ensembles of p-tools that can overcome the limitations of individual members.
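The two inner-consistency indices proposed in the study are not defined in the abstract, so the sketch below uses plain pairwise agreement (the fraction of variants on which two tools give the same call) as a stand-in, and precision, recall and F1, which are standard information retrieval metrics, for outer consistency against literature labels. The binary "damaging" encoding, the example calls and the function names are hypothetical.

```python
from typing import Dict, Sequence


def pairwise_agreement(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Fraction of variants on which two p-tools make the same call.

    A simple stand-in for inner consistency; it is NOT either of the two
    indices proposed in the study.
    """
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("call vectors must be non-empty and of equal length")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


def precision_recall_f1(calls: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Outer consistency of one p-tool against literature labels (True = damaging)."""
    tp = sum(c and r for c, r in zip(calls, reference))
    fp = sum(c and not r for c, r in zip(calls, reference))
    fn = sum(r and not c for c, r in zip(calls, reference))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical calls for four SNVs (True = predicted or reported damaging).
tool_a = [True, True, False, False]
tool_b = [True, False, False, True]
literature = [True, False, True, False]
```

Raw agreement does not correct for agreement expected by chance, which is one reason a dedicated index can be preferable when the tools' call rates differ.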
Subject(s)
BRCA1 Protein , Computational Biology , Models, Genetic , Polymorphism, Single Nucleotide , BRCA1 Protein/genetics , BRCA1 Protein/metabolism , Humans
ABSTRACT
The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods for the automated GO-CC annotation of proteins suffer from inconsistent individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein-coding genes at the subcellular compartment or macromolecular complex levels. To boost the accuracy of GO-CC predictions, we exploit the protein localization knowledge contained in GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Owing to their graph-based design, FGGA-CC+ classifiers are fully interpretable, and their predictions are amenable to expert analysis. Promising results were obtained on protein annotation data from five model organisms. Additionally, FGGA-CC+ was successfully validated on a challenging subset of tandem-duplicated genes in tomato, a non-model organism. Overall, these results suggest that FGGA-CC+ classifiers can indeed help meet the huge demand for GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.
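The inconsistency problem mentioned above can be made concrete. Under the GO true-path rule, a protein annotated with a term is implicitly annotated with all of that term's ancestors, so a set of term-level predictions is consistent only if every predicted term's ancestors are also predicted. The sketch checks this on a toy hierarchy; the term names and edges are simplified placeholders rather than real GO-CC identifiers, and this post-hoc check is not how FGGA-CC+ works internally.

```python
from typing import Dict, List, Set

# Toy cellular-component hierarchy (hypothetical, simplified): term -> parents.
PARENTS: Dict[str, List[str]] = {
    "cellular_component": [],
    "organelle": ["cellular_component"],
    "nucleus": ["organelle"],
    "mitochondrion": ["organelle"],
    "nucleolus": ["nucleus"],
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable upward from `term` (excluding the term itself)."""
    seen: Set[str] = set()
    stack = list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def inconsistent_terms(predicted: Set[str]) -> Set[str]:
    """Predicted terms that have at least one ancestor missing from the prediction."""
    return {t for t in predicted if not ancestors(t) <= predicted}


def close_upward(predicted: Set[str]) -> Set[str]:
    """Naive repair: add every ancestor of every predicted term."""
    closed = set(predicted)
    for t in predicted:
        closed |= ancestors(t)
    return closed
```

Upward closure always trusts the child prediction; a hierarchical classifier that weighs the evidence for parent and child jointly could instead conclude that the child term is unsupported.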
Subject(s)
Arabidopsis/metabolism , Computational Biology/methods , Drosophila melanogaster/metabolism , Gene Ontology , Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Solanum lycopersicum/metabolism , Animals , Databases, Protein , Molecular Sequence Annotation , Proteins/analysis , Proteomics , Software
ABSTRACT
Interleukin-10 (IL10) is an immunoregulatory cytokine. Single nucleotide polymorphisms (SNPs) in the IL10 promoter have been associated with prognosis in adult classical Hodgkin lymphoma (cHL). We analyzed IL10 SNPs -1082 and -592 with respect to therapy response, gene expression and tumor microenvironment (TME) composition in 98 pediatric patients with cHL. Confirming previous findings, we found that the -1082AA/AG and -592AA/AC genotypes and the ATA haplotype were associated with unfavourable prognosis: progression-free survival (PFS) was shorter in -1082AA+AG (72.2%) than in GG patients (100%) (P = 0.024), and in -592AA (50%) and AC (74.2%) than in CC patients (87.0%) (P = 0.009). In multivariate analysis, the -592CC genotype and the ATA haplotype retained prognostic impact (HR: 0.41, 95% CI 0.2-0.86, P = 0.018; and HR: 3.06, 95% CI 1.03-9.12, P = 0.044, respectively). Our analysis further led to some new observations: (1) low IL10 mRNA expression was associated with the -1082GG genotype (P = 0.014); (2) IL10 promoter polymorphisms influence TME composition: -1082GG and -592CC carriers showed low numbers of infiltrating cells expressing the MAF transcription factor (20 vs. 78 and 49 vs. 108 cells/mm², respectively; P < 0.05), while the ATA haplotype (high expression) was associated with high numbers of MAF+ cells (P = 0.005). Specifically, -1082GG patients exhibited low percentages of CD68+MAF+ (M2-like) intratumoral macrophages (15.04% vs. 47.26%, P = 0.017). Considering ours as an independent validation cohort, our results support the clinical importance of IL10 polymorphisms across the full spectrum of cHL and advance the concept of genetic control of microenvironment composition as a basis for susceptibility and therapeutic response.
ABSTRACT
As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when the interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, focused mainly on annotation precision, and heuristic alternatives, focused mainly on scalability, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built from the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Then, starting from raw GO-term predictions, an iterative message-passing algorithm between the nodes of the factor graph is used to compute the marginal probabilities of target GO terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose, noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.
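To show what the message-passing step computes, the sketch below takes a toy hierarchy of two GO terms (a parent P and a child C), a hard factor encoding the rule "C annotated implies P annotated", and one noisy-evidence factor per term built from a raw classifier score. It computes the exact marginal probabilities by enumerating all joint assignments; on a tree-structured factor graph this is the same quantity that sum-product message passing returns, and enumeration is used only because the graph is tiny. The evidence model, the scores and the function names are hypothetical, not the paper's parameterization.

```python
from itertools import product
from typing import Dict, Tuple


def hierarchy_factor(parent: int, child: int) -> float:
    """Hard constraint: a child term cannot be annotated without its parent."""
    return 0.0 if (child == 1 and parent == 0) else 1.0


def evidence_factor(value: int, score: float) -> float:
    """Hypothetical noisy-observation factor from a raw classifier score in (0, 1)."""
    return score if value == 1 else 1.0 - score


def exact_marginals(score_parent: float, score_child: float) -> Tuple[float, float]:
    """P(parent=1) and P(child=1) by summing the factor product over all states."""
    weights: Dict[Tuple[int, int], float] = {}
    for parent, child in product((0, 1), repeat=2):
        weights[(parent, child)] = (
            hierarchy_factor(parent, child)
            * evidence_factor(parent, score_parent)
            * evidence_factor(child, score_child)
        )
    z = sum(weights.values())  # partition function
    p_parent = sum(w for (p, _), w in weights.items() if p == 1) / z
    p_child = sum(w for (_, c), w in weights.items() if c == 1) / z
    return p_parent, p_child


# Raw scores that violate the hierarchy: child likely, parent unlikely.
marg_parent, marg_child = exact_marginals(score_parent=0.4, score_child=0.9)
```

Here the marginals come out at about 0.87 for the parent and 0.78 for the child: strong evidence for the child pulls the parent up, and because the hard factor rules out "child without parent", the parent's marginal can never fall below the child's, so thresholding the marginals yields consistent annotations.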
Subject(s)
Drosophila melanogaster/genetics , Gene Ontology , Algorithms , Animals , Arabidopsis/genetics , Computational Biology , Solanum lycopersicum/genetics , Saccharomyces cerevisiae/genetics , Software
ABSTRACT
In plants, fruit maturation and oxidative stress can induce small heat shock protein (sHSP) synthesis to maintain cellular homeostasis. Although the tomato reference genome was published in 2012, the actual number and functionality of sHSP genes remain unknown. Using a transcriptomic (RNA-seq) and evolutionary genomic approach, putative sHSP genes in the Solanum lycopersicum (cv. Heinz 1706) genome were investigated. An sHSP gene family of 33 members was established. Remarkably, roughly half of the members of this family can be explained by nine independent tandem duplication events that evolutionarily determined their functional fates. Within a mitochondrial class subfamily, only one duplicated member, Solyc08g078700, retained its ancestral chaperone function, while the others, Solyc08g078710 and Solyc08g078720, likely degenerated under neutrality and lack the ancestral chaperone function. Functional conservation occurred within a cytosolic class I subfamily, whose four members, Solyc06g076570, Solyc06g076560, Solyc06g076540 and Solyc06g076520, account for ~57% of the total sHSP mRNA in the red ripe fruit. Subfunctionalization occurred within a new subfamily, whose two members, Solyc04g082720 and Solyc04g082740, show heterogeneous differential expression profiles during fruit ripening. These findings, involving the birth or death of some genes and the preferential or plastic expression of others during fruit ripening, highlight the importance of tandem duplication events in the expansion of the sHSP gene family in the tomato genome. Despite its evolutionary diversity, the sHSP gene family in the tomato genome seems to be endowed with a core set of four homeostasis genes, Solyc05g014280, Solyc03g082420, Solyc11g020330 and Solyc06g076560, which appear to provide baseline protection during both fruit ripening and heat shock stress in different tomato tissues.
Subject(s)
Gene Duplication , Genes, Plant , Heat-Shock Proteins, Small/genetics , Multigene Family , Solanum lycopersicum/genetics , Tandem Repeat Sequences , Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Plant , Heat-Shock Proteins, Small/classification , Heat-Shock Proteins, Small/metabolism , Solanum lycopersicum/metabolism , Molecular Sequence Annotation , Phylogeny , Protein Transport , Transcriptome
ABSTRACT
For many parallel applications of next-generation sequencing (NGS) technologies, short barcodes able to accurately multiplex a large number of samples are needed. To address these competing requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide fine granularity in barcode length (BCH) or have intrinsically poor error-correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary low-density parity-check (LDPC) codes is introduced. Simulation results show that, although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, quaternary LDPC codes may be particularly advantageous due to their lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10⁻² per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate on the order of 10⁻⁹, at the expense of a read loss rate only on the order of 10⁻⁶.
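The two error events quantified above can be illustrated with a plain bounded-distance decoder for mismatch errors: a read's barcode is accepted only if exactly one barcode lies within a chosen mismatch radius; otherwise the read is discarded (a read loss), while errors that carry a read inside the wrong barcode's radius cause a misidentification. This is not BCH or LDPC decoding (LDPC codes are typically decoded iteratively), and the barcode set and function names are hypothetical. The helper `prob_more_than_t_errors` gives the binomial probability that an n-nt barcode suffers more than t mismatches at per-base rate p, assuming independent errors.

```python
from math import comb
from typing import List, Optional, Sequence

# Hypothetical 8-nt barcode set, for illustration only.
BARCODES_8NT: List[str] = ["ACGTACGT", "TGCATGCA"]


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def bounded_distance_decode(read_barcode: str, barcodes: Sequence[str],
                            radius: int) -> Optional[int]:
    """Index of the unique barcode within `radius` mismatches, else None (read lost)."""
    hits = [i for i, bc in enumerate(barcodes)
            if hamming_distance(read_barcode, bc) <= radius]
    return hits[0] if len(hits) == 1 else None


def prob_more_than_t_errors(n: int, p: float, t: int) -> float:
    """P(more than t mismatches in an n-nt barcode) with independent per-base rate p."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))
```

For a 24-nt barcode at p = 10⁻², `prob_more_than_t_errors(24, 0.01, 0)` equals 1 − 0.99²⁴ ≈ 0.21, so a system that tolerated no mismatches would discard roughly one read in five at that error rate; correcting even a few errors per barcode lowers the loss probability sharply.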
Subject(s)
DNA Barcoding, Taxonomic/methods , Probability
ABSTRACT
Trypanosoma cruzi is divided into two phylogenetic lineages, T. cruzi I and T. cruzi II, which contain different spliced leader (SL) RNA gene promoter sequences: Class I SL gene promoter sequences are found in T. cruzi II, and Class II sequences in T. cruzi I. We analysed different SL RNA promoter sequences from the CL-Brener reference strain, which belongs to the T. cruzi II lineage, and detected sequences that differed within the highly conserved -80/+1 region. Indeed, many of these divergent SL promoters present features of T. cruzi I promoters, and some of them were grouped into the T. cruzi I clade by Bayesian analysis. The results presented herein show that sequence heterogeneity in the SL RNA gene promoter exists not only between T. cruzi strains but also within the CL-Brener strain. These CL-Brener "T. cruzi I-like" sequences could be considered a molecular trace of a hybrid origin of the SL RNA gene and new evidence for the presence of sequences of T. cruzi I origin in a T. cruzi II strain. The possible origins of these sequences are discussed.