1.
BMC Bioinformatics ; 25(1): 31, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38233808

ABSTRACT

Analyzing the interactions of circular RNAs (circRNAs) is a crucial step in understanding their functional impacts. While there are numerous visualization tools available for investigating circRNA interaction networks, these tools are typically limited to known circRNAs from specific databases. Moreover, these existing tools usually require complex installation procedures, which can be time-consuming and challenging for users. There is a lack of a user-friendly web application that facilitates interactive exploration and visualization of circRNA interaction networks. CircNetVis is an interactive online web application to enhance the analysis of human/mouse circRNA interactions. The tool accepts three different input formats for circRNAs: circRNA IDs from CircBase, circRNA coordinates (chromosome, start position, end position), and circRNA sequences in the FASTA format. It integrates multiple interaction networks for visualization and investigation of the interplay between circRNAs, microRNAs, mRNAs and RNA binding proteins. CircNetVis also enables users to interactively explore the interactions of unknown circRNAs that are not reported in existing databases. The tool can generate interactive plots and allows users to save results as output files for offline use. CircNetVis is implemented as a web application using R-shiny and is freely available for academic use at https://www.meb.ki.se/shiny/truvu/CircNetVis/.
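The three input formats described in the abstract can be told apart with a few pattern checks. The following Python sketch is illustrative only: the `classify_circrna_input` helper and the `chr:start-end` coordinate syntax are assumptions made for this example, not CircNetVis's actual input parser.

```python
import re

# Hypothetical patterns; CircNetVis's real input validation may differ.
CIRCBASE_ID = re.compile(r"^(hsa|mmu)_circ_\d+$")         # e.g. hsa_circ_0000001
COORDINATE = re.compile(r"^chr(\d+|[XYM]):(\d+)-(\d+)$")  # assumed chr:start-end syntax


def classify_circrna_input(text: str) -> str:
    """Return 'circbase_id', 'coordinates', 'fasta', or 'unknown' for one input record."""
    record = text.strip()
    if record.startswith(">"):
        # FASTA: a header line followed by at least one sequence line.
        lines = record.splitlines()
        sequence = "".join(lines[1:]).upper()
        if sequence and set(sequence) <= set("ACGTUN"):
            return "fasta"
        return "unknown"
    if CIRCBASE_ID.match(record):
        return "circbase_id"
    match = COORDINATE.match(record)
    # Require start < end so that reversed intervals are rejected.
    if match and int(match.group(2)) < int(match.group(3)):
        return "coordinates"
    return "unknown"
```

A front end could use such a check to route each record to the matching lookup (database ID, genomic interval, or sequence) before building the interaction network.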


Subject(s)
MicroRNAs , Circular RNA , Humans , Mice , Animals , MicroRNAs/genetics , MicroRNAs/metabolism , Messenger RNA/genetics , Software , Factual Databases , Gene Regulatory Networks
2.
Bioinformatics ; 38(5): 1287-1294, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34864849

ABSTRACT

MOTIVATION: RNA expression at isoform level is biologically more informative than at gene level and can potentially reveal cellular subsets and corresponding biomarkers that are not visible at gene level. However, due to the strong 3' bias sequencing protocol, mRNA quantification for high-throughput single-cell RNA sequencing such as Chromium Single Cell 3' 10× Genomics is currently performed at the gene level. RESULTS: We have developed an isoform-level quantification method for high-throughput single-cell RNA sequencing by exploiting the concepts of transcription clusters and isoform paralogs. The method, called Scasa, compares well in simulations against competing approaches including Alevin, Cellranger, Kallisto, Salmon, Terminus and STARsolo at both isoform- and gene-level expression. The reanalysis of a CITE-Seq dataset with isoform-based Scasa reveals a subgroup of CD14 monocytes missed by gene-based methods. AVAILABILITY AND IMPLEMENTATION: Implementation of Scasa including source code, documentation, tutorials and test data supporting this study is available at Github: https://github.com/eudoraleer/scasa and Zenodo: https://doi.org/10.5281/zenodo.5712503. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Software , Gene Expression Profiling/methods , RNA Sequence Analysis/methods , Protein Isoforms/genetics , Protein Isoforms/metabolism , Messenger RNA/genetics , RNA
3.
Sensors (Basel) ; 23(3)2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36772473

ABSTRACT

The expression abundance of transcripts in nondiseased breast tissue varies among individuals. The association study of genotypes and imaging phenotypes may help us to understand this individual variation. Since existing reports mainly focus on tumors or lesion areas, the heterogeneity of pathological image features and their correlations with RNA expression profiles for nondiseased tissue are not clear. The aim of this study is to discover the association between the nucleus features and the transcriptome-wide RNAs. We analyzed both microscopic histology images and RNA-sequencing data of 456 breast tissues from the Genotype-Tissue Expression (GTEx) project and constructed an automatic computational framework. We classified all samples into four clusters based on their nucleus morphological features and discovered feature-specific gene sets. The biological pathway analysis was performed on each gene set. The proposed framework evaluates the morphological characteristics of the cell nucleus quantitatively and identifies the associated genes. We found image features that capture population variation in breast tissue associated with RNA expressions, suggesting that the variation in expression pattern affects population variation in the morphological traits of breast tissue. This study provides a comprehensive transcriptome-wide view of imaging-feature-specific RNA expression for healthy breast tissue. Such a framework could also be used for understanding the connection between RNA expression and morphology in other tissues and organs. Pathway analysis indicated that the gene sets we identified were involved in specific biological processes, such as immune processes.


Subject(s)
Breast Neoplasms , Transcriptome , Humans , Female , Transcriptome/genetics , RNA/genetics , RNA Sequence Analysis , Genotype , Phenotype , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics
4.
BMC Genomics ; 23(1): 106, 2022 Feb 08.
Article in English | MEDLINE | ID: mdl-35135477

ABSTRACT

BACKGROUND: Circular RNA (circRNA), a class of RNA molecule with a loop structure, has recently attracted researchers due to its diverse biological functions and its potential as a biomarker of human diseases. Most of the current circRNA detection methods from RNA-sequencing (RNA-Seq) data utilize the mapping information of paired-end (PE) reads to eliminate false positives. However, much of the practical RNA-Seq data such as cross-linking immunoprecipitation sequencing (CLIP-Seq) data usually contain single-end (SE) reads. It is not clear how well these tools perform on SE RNA-Seq data. RESULTS: In this study, we present a systematic evaluation of six advanced RNA-Seq based methods and two CLIP-Seq based methods for detecting circRNAs from SE RNA-Seq data. The performances of the methods are rigorously assessed based on precision, sensitivity, F1 score, and true discovery rate. We investigate the impacts of read length, false positive ratio, sequencing depth and PE mapping information on the performances of the methods using simulated SE RNA-Seq datasets. The real datasets used in this study consist of four experimental RNA-Seq datasets with ≥100 bp read length and 124 CLIP-Seq samples from 45 studies that contain mostly short-read (≤50 bp) RNA-Seq data. The simulation study shows that the sensitivities of most of the methods can be improved by increasing either read length or sequencing depth, and that the levels of false positive rates significantly affect the precision of all methods. Furthermore, the PE mapping information can improve a method's precision but cannot always guarantee an increase in F1 score. Overall, no method is dominant for all SE RNA-Seq data. The RNA-Seq based methods perform better for the long-read datasets but are worse for the short-read datasets. In contrast, the CLIP-Seq based methods outperform the RNA-Seq based methods for all the short-read samples. Combining the results of these methods can significantly improve precision in the CLIP-Seq data. CONCLUSIONS: The results provide a systematic evaluation of circRNA detection methods on SE RNA-Seq data that would facilitate researchers' strategies in circRNA analysis.
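For reference, three of the evaluation metrics named in the abstract follow directly from counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the function name and the example counts are made up for illustration and are not taken from the study.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, sensitivity (recall), and F1 score from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


# Illustrative counts only: 80 true circRNAs detected, 20 false calls, 40 missed.
metrics = detection_metrics(tp=80, fp=20, fn=40)
```

Because F1 combines both quantities, a gain in precision can be offset by a loss in sensitivity, which is consistent with the abstract's remark that higher precision does not guarantee a higher F1 score.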


Subject(s)
Circular RNA , RNA , High-Throughput Nucleotide Sequencing , Humans , Immunoprecipitation , RNA/genetics , RNA-Seq , RNA Sequence Analysis
5.
BMC Bioinformatics ; 22(1): 495, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-34645386

ABSTRACT

BACKGROUND: Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential for serving as markers for diagnosis, prognosis, or therapeutic targets of cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict if a BSJ-supporting read is derived from a true circRNA or not, thus increasing the rate of FP circRNAs. RESULTS: We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient by using a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall on two simulated datasets and three experimental datasets of human cell-lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. CONCLUSIONS: With its better performance in both circRNA detection and computational time, Circall facilitates the analyses of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.


Subject(s)
Circular RNA , RNA , Humans , RNA/genetics , RNA Splicing , RNA-Seq , RNA Sequence Analysis
6.
Bioinformatics ; 36(3): 805-812, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31400221

ABSTRACT

MOTIVATION: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias correction step(s), which are based on biological considerations, such as GC content, and applied in single samples separately. The main problem is that not all biases are known. RESULTS: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets. AVAILABILITY AND IMPLEMENTATION: The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , RNA-Seq , Algorithms , Protein Isoforms/genetics , RNA Sequence Analysis , Software
7.
Am J Hematol ; 96(5): 580-588, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33625756

ABSTRACT

Molecular classification of acute myeloid leukemia (AML) aids prognostic stratification and clinical management. Our aim in this study is to identify transcriptome-wide mRNAs that are specific to each of the molecular subtypes of AML. We analyzed RNA-sequencing data of 955 AML samples from three cohorts, including the BeatAML project, the Cancer Genome Atlas, and a cohort of Swedish patients to provide a comprehensive transcriptome-wide view of subtype-specific mRNA expression. We identified 729 subtype-specific mRNAs, discovered in the BeatAML project and validated in the other two cohorts. Using unique proteomics data, we also validated the presence of subtype-specific mRNAs at the protein level, yielding a rich collection of potential protein-based biomarkers for the AML community. To enable the exploration of subtype-specific mRNA expression by the broader scientific community, we provide an interactive resource to the public.


Subject(s)
Acute Myeloid Leukemia/genetics , Messenger RNA/biosynthesis , Neoplasm RNA/biosynthesis , Transcriptome , Tumor Biomarkers , Neoplasm Genes , Humans , Acute Myeloid Leukemia/classification , Acute Myeloid Leukemia/metabolism , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Oncogene Fusion Proteins/biosynthesis , Oncogene Fusion Proteins/genetics , Proteome , Messenger RNA/genetics , Neoplasm RNA/genetics , RNA-Seq , Retrospective Studies , Sweden
8.
Bioinformatics ; 35(22): 4679-4687, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31028395

ABSTRACT

MOTIVATION: Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input materials from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so it generally has better data quality. Various methods exist for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data. RESULTS: Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for the scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION: The source codes and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Mutation , Gene Expression Profiling , Humans , RNA Sequence Analysis , Single-Cell Analysis , Software
9.
Bioinformatics ; 34(14): 2392-2400, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29490015

ABSTRACT

Motivation: RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study, we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. Results: We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16,562 isoform pairs from 4929 genes. Among those, 26% of the discovered patterns were significant (P<0.05), while the remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. Finally, the effects of drop-out events and expression levels of isoforms on ISOP's performance were investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. Availability and implementation: The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , Gene Expression , RNA Isoforms/genetics , RNA Sequence Analysis/methods , Software , Breast Neoplasms/genetics , Tumor Cell Line , Female , Humans
10.
PLoS Comput Biol ; 14(3): e1006018, 2018 03.
Article in English | MEDLINE | ID: mdl-29494588

ABSTRACT

Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS however, NMR data is predominantly being processed with commercial software. Meanwhile its data processing remains tedious and dependent on user interventions. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq).


Subject(s)
Magnetic Resonance Spectroscopy/methods , Metabolomics/methods , Algorithms , Liquid Chromatography/methods , Magnetic Resonance Imaging/methods , Software , Workflow
11.
BMC Genomics ; 19(1): 786, 2018 Nov 01.
Article in English | MEDLINE | ID: mdl-30382840

ABSTRACT

BACKGROUND: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While there are available methods, routine analyses of large numbers of samples are still limited due to high computational demands. RESULTS: We develop FuSeq, a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates. We apply FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results reveal high sensitivity and specificity in all datasets, and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is two-fold faster than FusionMap and orders of magnitude faster than the other methods. CONCLUSIONS: With the advantage of lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R, and available at https://github.com/nghiavtr/FuSeq for non-commercial uses.


Subject(s)
Gene Fusion , RNA/genetics , RNA Sequence Analysis , Algorithms , Tumor Cell Line , Computational Biology/methods , Nucleic Acid Databases , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Oncogene Fusion Proteins/genetics , Reproducibility of Results , RNA Sequence Analysis/methods
12.
Brief Bioinform ; 16(2): 216-31, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24162173

ABSTRACT

Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and its derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences.
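The shopping-basket example can be made concrete with a level-wise (Apriori-style) search. This is a minimal sketch for small inputs; the function name, the toy transactions, and the support threshold are illustrative assumptions, and the primer itself surveys a range of other algorithms.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    baskets = [frozenset(t) for t in transactions]
    # Level 1: candidate itemsets of size one.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into larger candidates, then prune any candidate
        # that has an infrequent subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Toy supermarket transactions (made up for this example).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
result = frequent_itemsets(baskets, min_support=2)
```

In a biological setting the "items" could be, for instance, genes or annotations and the "baskets" samples; the search logic stays the same.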


Subject(s)
Algorithms , Data Mining/statistics & numerical data , Animals , Cluster Analysis , Computational Biology , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Automated Pattern Recognition/statistics & numerical data , Single Nucleotide Polymorphism , Software
13.
Bioinformatics ; 32(14): 2128-35, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153638

ABSTRACT

MOTIVATION: Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is a bimodality in the cellular distribution even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property. RESULTS: We introduce a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework in order to perform differential expression analyses. The whole analytical procedure is called BPSC. The results from several real single-cell RNA-seq datasets indicate that ∼90% of the transcripts are well characterized by the beta-Poisson model; the model-fit from BPSC is better than the fit of the standard gamma-Poisson model in >80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used in bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data. AVAILABILITY AND IMPLEMENTATION: An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under GPL-3 license at https://github.com/nghiavtr/BPSC. CONTACT: yudi.pawitan@ki.se or mattias.rantalainen@ki.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
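To see why a beta-Poisson mixture can produce bimodal counts, one can simulate it. The sketch below assumes one common formulation in which each cell's Poisson rate is a maximum rate scaled by a beta-distributed fraction; the function names, parameter names, and values are illustrative and do not reproduce the BPSC parameterization or its fitting procedure.

```python
import math
import random


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_beta_poisson(n_cells, alpha, beta, max_rate, seed=0):
    """Simulate counts where each cell's rate is max_rate * Beta(alpha, beta)."""
    rng = random.Random(seed)
    return [
        sample_poisson(max_rate * rng.betavariate(alpha, beta), rng)
        for _ in range(n_cells)
    ]


# With alpha and beta below 1, the beta fraction concentrates near 0 and near 1,
# so the counts show a group of (almost) non-expressing cells next to expressing cells.
counts = sample_beta_poisson(n_cells=20000, alpha=0.5, beta=0.5, max_rate=10.0)
zero_fraction = sum(1 for c in counts if c == 0) / len(counts)
```

A plain Poisson model with the same mean would predict far fewer zero counts, which is the kind of mismatch the abstract attributes to non-expressing cells.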


Subject(s)
Gene Expression Profiling , RNA Sequence Analysis , Single-Cell Analysis , Computational Biology/methods , Theoretical Models , RNA
14.
Rapid Commun Mass Spectrom ; 31(17): 1396-1404, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28569011

ABSTRACT

RATIONALE: Using mass spectrometry, the analysis of known metabolite structures has become feasible in a systematic high-throughput fashion. Nevertheless, the identification of previously unknown structures remains challenging, partially because many unidentified variants originate from known molecules that underwent unexpected modifications. Here, we present a method for the discovery of unknown metabolite modifications and conjugate metabolite isoforms in a high-throughput fashion. METHODS: The method is based on user-controlled in-source fragmentation, which is used to induce loss of weakly bound modifications. This is followed by the comparison of product ions from in-source fragmentation and collision-induced dissociation (CID). Diagonal MS2-MS3 matching allows the detection of unknown metabolite modifications, as well as substructure similarities. As the method relies heavily on the advantages of in-source fragmentation and its ability to 'magically' elucidate unknown modifications, we have named it inSourcerer as a portmanteau of in-source and sorcerer. RESULTS: The method was evaluated using a set of 15 different cytokinin standards. Product ions from in-source fragmentation and CID were compared. Hierarchical clustering revealed that good matches are due to the presence of common substructures. Plant leaf extract, spiked with a mix of all 15 standards, was used to demonstrate the method's ability to detect these standards in a complex mixture, as well as confidently identify compounds already present in the plant material. CONCLUSIONS: Here we present a method that incorporates a classic liquid chromatography/mass spectrometry (LC/MS) workflow with fragmentation models and computational algorithms. The assumptions upon which the concept of the method was built were shown to be valid and the method showed that in-source fragmentation can be used to pinpoint structural similarities and indicate the occurrence of a modification.


Subject(s)
High-Throughput Screening Assays/methods , Mass Spectrometry/methods , Chemical Models , Computational Biology , Cytokinins/analysis , Cytokinins/chemistry , High-Throughput Screening Assays/standards , Mass Spectrometry/standards , Metabolome , Plant Extracts/chemistry , Plant Leaves/chemistry
15.
J Proteome Res ; 13(9): 4175-83, 2014 Sep 05.
Article in English | MEDLINE | ID: mdl-25004400

ABSTRACT

Spectral library searching is a popular approach for MS/MS-based peptide identification. Because the size of spectral libraries continues to grow, the performance of searching algorithms is an important issue. This technical note introduces a strategy based on a minimum shared peak count between two spectra to reduce the set of admissible candidate spectra when issuing a query. A theoretical validation through time complexity analysis and an experimental validation based on an implementation of the candidate reduction strategy show that the approach can achieve a reduction of the set of candidate spectra by (at least) an order of magnitude, resulting in a significant improvement in the speed of the search. Meanwhile, more than 99% of the positive search results are retained. This efficient strategy to drastically improve the speed of spectral library searching with a negligible loss of sensitivity can be applied to any current spectral library search tool, irrespective of the employed similarity metric.
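The candidate reduction idea can be sketched with an inverted index from binned peak m/z values to library spectra, so that only spectra sharing enough peaks with the query are passed on to the more expensive similarity scoring. The bin width, data layout, and function names below are assumptions for illustration and do not reproduce the authors' implementation.

```python
from collections import Counter, defaultdict


def bin_peaks(mz_values, bin_width=1.0):
    """Map peak m/z values to a set of integer bin indices."""
    return {int(mz / bin_width) for mz in mz_values}


def build_index(library, bin_width=1.0):
    """Inverted index: peak bin -> ids of library spectra with a peak in that bin."""
    index = defaultdict(set)
    for spectrum_id, mz_values in library.items():
        for peak_bin in bin_peaks(mz_values, bin_width):
            index[peak_bin].add(spectrum_id)
    return index


def candidate_spectra(query_mz, index, min_shared, bin_width=1.0):
    """Ids of library spectra sharing at least min_shared peak bins with the query."""
    shared = Counter()
    for peak_bin in bin_peaks(query_mz, bin_width):
        for spectrum_id in index.get(peak_bin, ()):
            shared[spectrum_id] += 1
    return {sid for sid, n in shared.items() if n >= min_shared}


# Toy library with made-up m/z values.
library = {
    "A": [100.1, 200.2, 300.3, 400.4],
    "B": [100.2, 200.1, 350.0, 450.0],
    "C": [110.0, 210.0, 310.0, 410.0],
}
index = build_index(library)
candidates = candidate_spectra([100.0, 200.3, 300.1], index, min_shared=2)
```

Here spectrum C is never scored because it shares no peak bin with the query. Fixed-width binning can miss matches that straddle a bin boundary, so a real tool would need a tolerance-aware comparison.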


Subject(s)
Protein Databases , Peptide Library , Proteomics/methods , Software , Algorithms , Data Mining , Humans , Proteins , Yeasts
16.
BMC Cancer ; 14: 594, 2014 Aug 16.
Article in English | MEDLINE | ID: mdl-25128202

ABSTRACT

BACKGROUND: Regions within solid tumours often experience oxygen deprivation, which is associated with resistance to chemotherapy and irradiation. The aim of this study was to evaluate the radiosensitising effect of gemcitabine and its main metabolite dFdU under normoxia versus hypoxia and to determine whether hypoxia-inducible factor 1 (HIF-1) is involved in the radiosensitising mechanism. METHODS: Stable expression of dominant negative HIF-1α (dnHIF) in MDA-MB-231 breast cancer cells, that ablated endogenous HIF-1 transcriptional activity, was validated by western blot and functionality was assessed by HIF-1α activity assay. Cells were exposed to varying oxygen environments and treated with gemcitabine or dFdU for 24 h, followed by irradiation. Clonogenicity was then assessed. Using radiosensitising conditions, cells were collected for cell cycle analysis. RESULTS: HIF-1 activity was significantly inhibited in cells stably expressing dnHIF. A clear radiosensitising effect under normoxia and hypoxia was observed for both gemcitabine and dFdU. No significant difference in radiobiological parameters between HIF-1 proficient and HIF-1 deficient MDA-MB-231 cells was demonstrated. CONCLUSIONS: For the first time, radiosensitisation by dFdU, the main metabolite of gemcitabine, was demonstrated under low oxygen conditions. No major role for functional HIF-1 protein in radiosensitisation by gemcitabine or dFdU could be shown.


Subject(s)
Deoxycytidine/analogs & derivatives , Deoxyuridine/pharmacology , Hypoxia-Inducible Factor 1/metabolism , Radiation-Sensitizing Agents/pharmacology , Breast Neoplasms , Cell Cycle/drug effects , Cell Cycle/radiation effects , Cell Hypoxia/drug effects , Cell Hypoxia/radiation effects , Tumor Cell Line , Deoxycytidine/pharmacology , Deoxyuridine/analogs & derivatives , Female , Humans , Hypoxia-Inducible Factor 1 alpha Subunit/metabolism , In Vitro Techniques , Gemcitabine
17.
Proteome Sci ; 12(1): 54, 2014.
Article in English | MEDLINE | ID: mdl-25429250

ABSTRACT

BACKGROUND: Mass spectrometry-based proteomics experiments generate spectra that are rich in information. Often only a fraction of this information is used for peptide/protein identification, whereas a significant proportion of the peaks in a spectrum remain unexplained. In this paper we explore how a specific class of data mining techniques termed "frequent itemset mining" can be employed to discover patterns in the unassigned data, and how such patterns can help us interpret the origin of the unexpected/unexplained peaks. RESULTS: First a model is proposed that describes the origin of the observed peaks in a mass spectrum. For this purpose we use the classical correlative database search algorithm. Peaks that support a positive identification of the spectrum are termed explained peaks. Next, frequent itemset mining techniques are introduced to infer which unexplained peaks are associated in a spectrum. The method is validated on two types of experimental proteomic data. First, peptide mass fingerprint data is analyzed to explain the unassigned peaks in a full scan mass spectrum. Interestingly, a large numbers of experimental spectra reveals several highly frequent unexplained masses, and pattern mining on these frequent masses demonstrates that subsets of these peaks frequently co-occur. Further evaluation shows that several of these co-occurring peaks indeed have a known common origin, and other patterns are promising hypothesis generators for further analysis. Second, the proposed methodology is validated on tandem mass spectrometral data using a public spectral library, where associations within the mass differences of unassigned peaks and peptide modifications are explored. The investigation of the found patterns illustrates that meaningful patterns can be discovered that can be explained by features of the employed technology and found modifications. 
CONCLUSIONS: This simple approach offers opportunities to monitor accumulating unexplained mass spectrometry data for emerging new patterns, with possible applications for the development of mass exclusion lists, for the refinement of quality control strategies and for a further interpretation of unexplained spectral peaks in mass spectrometry and tandem mass spectrometry.
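
The frequent itemset mining step described in this abstract can be illustrated with a minimal Apriori-style sketch. It is not the authors' implementation: the function name `frequent_itemsets`, the input layout (one set of rounded unexplained peak masses per spectrum), the support threshold and the mass values are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(spectra, min_support, max_size=3):
    """Return {frozenset of masses: count} for sets seen in >= min_support spectra.

    spectra: list of sets of rounded unexplained peak masses (hypothetical input).
    """
    frequent = {}
    # Level 1: count how many spectra contain each single mass.
    counts = Counter(m for s in spectra for m in set(s))
    current = {frozenset([m]): c for m, c in counts.items() if c >= min_support}
    size = 1
    while current:
        frequent.update(current)
        if size == max_size:
            break
        size += 1
        # Only masses that are still frequent can belong to a larger frequent set.
        items = sorted(set().union(*current))
        counts = Counter()
        for s in spectra:
            present = [m for m in items if m in s]
            for combo in combinations(present, size):
                # Apriori pruning: every (size - 1)-subset must itself be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(combo, size - 1)):
                    counts[frozenset(combo)] += 1
        current = {k: c for k, c in counts.items() if c >= min_support}
    return frequent


# Made-up masses, for illustration only.
spectra = [
    {500.1, 750.2, 900.3},
    {500.1, 750.2},
    {900.3, 750.2, 500.1},
    {900.3, 1200.4},
]
patterns = frequent_itemsets(spectra, min_support=3)
```

In this toy input each of the masses 500.1, 750.2 and 900.3 occurs in three spectra, but only the pair {500.1, 750.2} co-occurs in three, so it is the only multi-mass pattern retained at a support threshold of 3.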

18.
Phenomics ; 3(3): 217-227, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37325708

ABSTRACT

Alternative splicing exists in most multi-exonic genes, and exploring these complex alternative splicing events and their resultant isoform expressions is essential. However, RNA sequencing results are conventionally summarized into gene-level expression counts, mainly because reads map ambiguously to multiple highly similar regions. Transcript-level quantification and interpretation are often overlooked, and biological interpretations are often deduced from combined transcript information at the gene level. Here, for the brain, the tissue in which alternative splicing is most variable, we estimate isoform expressions in 1,191 samples collected by the Genotype-Tissue Expression (GTEx) Consortium using a powerful method that we previously developed. We perform genome-wide association scans on the isoform ratios per gene and identify isoform-ratio quantitative trait loci (irQTL), which could not be detected by studying gene-level expressions alone. By analyzing the genetic architecture of the irQTL, we show that isoform ratios regulate educational attainment via multiple tissues including the frontal cortex (BA9), cortex, cervical spinal cord, and hippocampus. These tissues are also associated with different neuro-related traits, including Alzheimer's disease or dementia, mood swings, sleep duration, alcohol intake, intelligence, and anxiety or depression. Mendelian randomization (MR) analysis revealed 1,139 pairs of isoforms and neuro-related traits with plausible causal relationships, showing much stronger causal effects than on general diseases measured in the UK Biobank (UKB). Our results highlight essential transcript-level biomarkers in the human brain for neuro-related complex traits and diseases, which could be missed by merely investigating overall gene expressions. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-023-00100-6.
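
The abstract does not say how the per-gene isoform ratios are computed. A minimal sketch, assuming the common definition (an isoform's expression divided by the summed expression of all isoforms of the same gene), might look as follows; the function name, the dictionary layout and the isoform identifiers are hypothetical.

```python
from collections import defaultdict


def isoform_ratios(expression, isoform_to_gene):
    """Return {isoform: ratio within its gene} for one sample.

    expression: {isoform id: expression value} (hypothetical layout).
    isoform_to_gene: {isoform id: gene id}.
    """
    # Sum the expression of all isoforms belonging to each gene.
    totals = defaultdict(float)
    for iso, value in expression.items():
        totals[isoform_to_gene[iso]] += value

    ratios = {}
    for iso, value in expression.items():
        total = totals[isoform_to_gene[iso]]
        # Leave the ratio undefined (None) when the gene is not expressed.
        ratios[iso] = value / total if total > 0 else None
    return ratios


# Made-up values, for illustration only.
expression = {"G1-a": 30.0, "G1-b": 10.0, "G2-a": 0.0}
isoform_to_gene = {"G1-a": "G1", "G1-b": "G1", "G2-a": "G2"}
ratios = isoform_ratios(expression, isoform_to_gene)
```

Two samples can share the same gene-level total while differing in these ratios, which is the kind of signal the abstract says gene-level analysis alone would miss.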

19.
Data Brief ; 47: 108932, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36819900

ABSTRACT

Salmonella enterica is one of the most common agents of foodborne bacterial illness, with poultry being an important reservoir. The indiscriminate use of antimicrobial compounds in poultry farming increasingly leads to antimicrobial resistance (AMR), which threatens the health of both animals and humans. Antimicrobial-resistant Salmonella enterica from poultry can spread to humans through direct contact with infected poultry or fecally contaminated environments. Antimicrobial-resistant S. enterica, especially fluoroquinolone-resistant nontyphoidal Salmonella, is listed as a global health concern by the World Health Organization (WHO). Here we report the whole-genome sequencing data and de novo genome assemblies of antimicrobial-resistant S. enterica strains S8 and S9 from the C. moschata carcass collected in Vietnam. Genomic DNA of S. enterica was extracted and subjected to whole-genome sequencing on the Illumina MiSeq platform. The genome size of antimicrobial-resistant S. enterica strain S8 is 4,707,459 bp with a GC content of 52.38%, containing 10 antimicrobial resistance genes. The genome size of antimicrobial-resistant S. enterica strain S9 is 4,923,944 bp with a GC content of 52.39%, containing 10 antimicrobial resistance genes. Our data provide insights into the antimicrobial resistance genes of S. enterica isolates from the C. moschata carcass, which help to understand the infection mechanism of antimicrobial-resistant S. enterica in humans.
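
The genome size and GC content reported above are standard assembly summary statistics. The sketch below shows one conventional way to compute them from assembled contigs; it is an illustration under stated assumptions, not the authors' pipeline. The function names are hypothetical, genome size is taken as the total contig length, and ambiguous bases such as N are excluded from the GC denominator.

```python
def gc_content(sequence):
    """Return GC content in percent, ignoring bases other than A, C, G, T."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt


def assembly_stats(contigs):
    """Return (total length in bp, GC content in percent) for a list of contigs."""
    size = sum(len(c) for c in contigs)
    # Concatenating is fine for a sketch; the GC value is length-weighted.
    return size, gc_content("".join(contigs))


# Toy contigs, for illustration only.
size_bp, gc_percent = assembly_stats(["ATGC", "GGCC", "NNAT"])
```

For these toy contigs the total length is 12 bp and the GC content is 60%, since 6 of the 10 unambiguous bases are G or C.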

20.
NPJ Precis Oncol ; 7(1): 32, 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-36964195

ABSTRACT

Despite some encouraging successes, predicting the therapy response of acute myeloid leukemia (AML) patients remains highly challenging due to tumor heterogeneity. Here we aim to develop and validate MDREAM, a robust ensemble-based prediction model for drug response in AML based on an integration of omics data, including mutations and gene expression, and large-scale drug testing. Briefly, MDREAM is first trained in the BeatAML cohort (n = 278), and then validated in the BeatAML validation set (n = 183) and two external cohorts: a Swedish AML cohort (n = 45) and a relapsed/refractory acute leukemia cohort (n = 12). The final prediction is based on 122 ensemble models, each corresponding to a drug. A confidence score metric is used to convey the uncertainty of predictions; among predictions with a confidence score >0.75, the validated proportion of good responders is 77%. The Spearman correlations between the predicted and the observed drug response are 0.68 (95% CI: [0.64, 0.68]) in the BeatAML validation set, -0.49 (95% CI: [-0.53, -0.44]) in the Swedish cohort and 0.59 (95% CI: [0.51, 0.67]) in the relapsed/refractory cohort. A web-based implementation of MDREAM is publicly available at https://www.meb.ki.se/shiny/truvu/MDREAM/.
