Results 1 - 20 of 31
1.
EJHaem ; 5(4): 721-727, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39157629

ABSTRACT

Background: Bone marrow (BM) evaluation is the de facto standard for diagnosis, molecular analysis, risk stratification, and therapy response assessment in acute myeloid leukemia (AML), but in patients with a high number of circulating blast cells, a peripheral blood (PB) sample could provide information similar to that from BM. However, there is no large-scale molecular study comparing the two specimens in terms of their gene expression profiles, cellular heterogeneities, and ex-vivo drug sensitivity. Methodology: We used (i) samples from the BEAT-AML cohort, each with detailed molecular data; (ii) cell-type deconvolution to estimate leukemic and immune cell proportions between specimen types; (iii) differential expression (DE) and drug-cell type association analysis; and (iv) logistic regression models to assess the association between induction therapy response, cell-type composition and first-line drug treatment. Results: We identified 207 patients with BM samples and 116 patients with PB samples. There was a total of 1271 DE genes (false discovery rate < 0.05) between BM and PB; the top enriched pathways in terms of DE genes belong to the immune system pathways. Aggregated ex-vivo drug response profiles from the two specimens were largely similar, as were the cellular components, except for the GMP-like cell type (17% in BM vs. 5% in PB, p-value = 2 × 10⁻⁷). Among the specimen-specific results, the GMP-like subtype was associated with multiple drug resistance in BM and the ProMono-like subtype in PB. Several cell types were associated with the response to induction therapy, but the impact of specimen type on the interaction of cell type and cytarabine-associated induction therapy was not statistically significant for most cell types. Conclusions: Even though there are molecular and cellular differences between BM and PB samples, they show many similarities in ex-vivo drug response profiles, indicating the clinical utility of the substantially less-invasive PB samples.

2.
Conserv Physiol ; 12(1): coae054, 2024.
Article in English | MEDLINE | ID: mdl-39139733

ABSTRACT

Pacific spiny dogfish, Squalus suckleyi, move to shallow coastal waters during critical reproductive life stages and are thus at risk of encountering hypoxic events, which occur more frequently in these areas. For effective conservation management, we need to fully understand the consequences of hypoxia on key marine species such as elasmobranchs. Because of their benthic lifestyle, we hypothesized that S. suckleyi are hypoxia tolerant and able to efficiently regulate oxygen consumption, and that anaerobic metabolism is supported by a broad range of metabolites including ketones, fatty acids and amino acids. Therefore, we studied oxygen consumption rates, ventilation frequency and amplitude, blood gases, acid-base regulation, and changes in plasma and tissue metabolites during progressive hypoxia. Our results show that critical oxygen levels (Pcrit) where oxyregulation is lost were indeed low (18.1% air saturation or 28.5 Torr at 13°C). However, many dogfish behaved as oxyconformers rather than oxyregulators. Arterial blood PO2 levels mostly decreased linearly with decreasing environmental PO2. Blood gases and acid-base status were dependent on open versus closed respirometry, but in both set-ups ventilation frequency increased. Hypoxia below Pcrit resulted in an up-regulation of anaerobic glycolysis, as evidenced by increased lactate levels in all tissues except brain. Elasmobranchs typically rely on ketone bodies as oxidative substrates, and decreased concentrations of acetoacetate and β-hydroxybutyrate were observed in white muscle of hypoxic and/or recovering fish. Furthermore, reductions in isoleucine, glutamate, glutamine and other amino acids were observed. After 6 hours of normoxic recovery, changes persisted and only lactate returned to normal in most tissues. This emphasizes the importance of using suitable bioindicators adjusted to preferred metabolic pathways of the target species in conservation physiology. We conclude that Pacific spiny dogfish can tolerate severe transient hypoxic events, but recovery is slow and negative impacts can be expected when hypoxia persists.

3.
BMC Bioinformatics ; 25(1): 31, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38233808

ABSTRACT

Analyzing the interactions of circular RNAs (circRNAs) is a crucial step in understanding their functional impacts. While there are numerous visualization tools available for investigating circRNA interaction networks, these tools are typically limited to known circRNAs from specific databases. Moreover, these existing tools usually require complex installation procedures which can be time-consuming and challenging for users. There is a lack of a user-friendly web application that facilitates interactive exploration and visualization of circRNA interaction networks. CircNetVis is an interactive online web application to enhance the analysis of human/mouse circRNA interactions. The tool allows three different input formats of circRNAs including circRNA IDs from CircBase, circRNA coordinates (chromosome, start position, end position), and circRNA sequences in the FASTA format. It integrates multiple interaction networks for visualization and investigation of the interplay between circRNA, microRNAs, mRNAs and RNA binding proteins. CircNetVis also enables users to interactively explore the interactions of unknown circRNAs which are not reported from previous databases. The tool can generate interactive plots and allows users to save results as output files for offline usage. CircNetVis is implemented as a web application using R-shiny and freely available for academic use at https://www.meb.ki.se/shiny/truvu/CircNetVis/ .


Subject(s)
MicroRNAs, Circular RNA, Humans, Mice, Animals, MicroRNAs/genetics, MicroRNAs/metabolism, Messenger RNA/genetics, Software, Factual Databases, Gene Regulatory Networks
4.
Phenomics ; 3(3): 217-227, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37325708

ABSTRACT

Alternative splicing exists in most multi-exonic genes, and exploring these complex alternative splicing events and their resultant isoform expressions is essential. However, RNA sequencing results are conventionally summarized into gene-level expression counts, mainly because of ambiguous mapping of reads to multiple highly similar regions. Transcript-level quantification and interpretation are often overlooked, and biological interpretations are often deduced based on combined transcript information at the gene level. Here, for the brain, the tissue with the most variable alternative splicing, we estimate isoform expressions in 1,191 samples collected by the Genotype-Tissue Expression (GTEx) Consortium using a powerful method that we previously developed. We perform genome-wide association scans on the isoform ratios per gene and identify isoform-ratio quantitative trait loci (irQTL), which could not be detected by studying gene-level expressions alone. By analyzing the genetic architecture of the irQTL, we show that isoform ratios regulate educational attainment via multiple tissues including the frontal cortex (BA9), cortex, cervical spinal cord, and hippocampus. These tissues are also associated with different neuro-related traits, including Alzheimer's or dementia, mood swings, sleep duration, alcohol intake, intelligence, and anxiety or depression. Mendelian randomization (MR) analysis revealed 1,139 pairs of isoforms and neuro-related traits with plausible causal relationships, showing much stronger causal effects than on general diseases measured in the UK Biobank (UKB). Our results highlight essential transcript-level biomarkers in the human brain for neuro-related complex traits and diseases, which could be missed by merely investigating overall gene expressions. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-023-00100-6.

5.
NPJ Precis Oncol ; 7(1): 32, 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-36964195

ABSTRACT

Despite some encouraging successes, predicting the therapy response of acute myeloid leukemia (AML) patients remains highly challenging due to tumor heterogeneity. Here we aim to develop and validate MDREAM, a robust ensemble-based prediction model for drug response in AML based on an integration of omics data, including mutations and gene expression, and large-scale drug testing. Briefly, MDREAM is first trained in the BeatAML cohort (n = 278), and then validated in the BeatAML (n = 183) and two external cohorts, including a Swedish AML cohort (n = 45) and a relapsed/refractory acute leukemia cohort (n = 12). The final prediction is based on 122 ensemble models, each corresponding to a drug. A confidence score metric is used to convey the uncertainty of predictions; among predictions with a confidence score >0.75, the validated proportion of good responders is 77%. The Spearman correlations between the predicted and the observed drug response are 0.68 (95% CI: [0.64, 0.68]) in the BeatAML validation set, -0.49 (95% CI: [-0.53, -0.44]) in the Swedish cohort and 0.59 (95% CI: [0.51, 0.67]) in the relapsed/refractory cohort. A web-based implementation of MDREAM is publicly available at https://www.meb.ki.se/shiny/truvu/MDREAM/ .

6.
Data Brief ; 47: 108932, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36819900

ABSTRACT

Salmonella enterica is one of the most common agents of foodborne bacterial illness, with poultry being an important reservoir. The indiscriminate use of antimicrobial compounds in poultry farming increasingly leads to antimicrobial resistance (AMR), which threatens the health of both animals and humans. Antimicrobial-resistant Salmonella enterica from poultry can spread to humans through direct contact with infected poultry or fecally contaminated environments. Antimicrobial-resistant S. enterica, especially fluoroquinolone-resistant nontyphoidal Salmonella, is on the World Health Organization (WHO) list of global health concerns. Here we report the whole-genome sequencing data and de novo genome assemblies of antimicrobial-resistant S. enterica strains S8 and S9 from C. moschata carcasses collected in Vietnam. Genomic DNA of S. enterica was extracted and subjected to whole-genome sequencing using the Illumina MiSeq platform. The genome size of antimicrobial-resistant S. enterica strain S8 is 4,707,459 bp with a GC content of 52.38%, containing 10 antimicrobial resistance genes. The genome size of antimicrobial-resistant S. enterica strain S9 is 4,923,944 bp with a GC content of 52.39%, containing 10 antimicrobial resistance genes. Our data provide insights into the antimicrobial resistance genes of S. enterica isolates from C. moschata carcasses, which help to understand the infection mechanism of antimicrobial-resistant S. enterica in humans.

7.
Sensors (Basel) ; 23(3), 2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36772473

ABSTRACT

The expression abundance of transcripts in nondiseased breast tissue varies among individuals. The association study of genotypes and imaging phenotypes may help us to understand this individual variation. Since existing reports mainly focus on tumors or lesion areas, the heterogeneity of pathological image features and their correlations with RNA expression profiles for nondiseased tissue are not clear. The aim of this study is to discover the association between the nucleus features and the transcriptome-wide RNAs. We analyzed both microscopic histology images and RNA-sequencing data of 456 breast tissues from the Genotype-Tissue Expression (GTEx) project and constructed an automatic computational framework. We classified all samples into four clusters based on their nucleus morphological features and discovered feature-specific gene sets. The biological pathway analysis was performed on each gene set. The proposed framework evaluates the morphological characteristics of the cell nucleus quantitatively and identifies the associated genes. We found image features that capture population variation in breast tissue associated with RNA expressions, suggesting that the variation in expression pattern affects population variation in the morphological traits of breast tissue. This study provides a comprehensive transcriptome-wide view of imaging-feature-specific RNA expression for healthy breast tissue. Such a framework could also be used for understanding the connection between RNA expression and morphology in other tissues and organs. Pathway analysis indicated that the gene sets we identified were involved in specific biological processes, such as immune processes.


Subject(s)
Breast Neoplasms, Transcriptome, Humans, Female, Transcriptome/genetics, RNA/genetics, RNA Sequence Analysis, Genotype, Phenotype, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/genetics
8.
Nat Commun ; 13(1): 6733, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36347843

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease, involving neuroinflammation and T cell infiltration in the central nervous system. However, the contribution of T cell responses to the pathology of the disease is not fully understood. Here we show, by flow cytometric analysis of blood and cerebrospinal fluid (CSF) samples of a cohort of 89 newly diagnosed ALS patients in Stockholm, Sweden, that T cell phenotypes at the time of diagnosis are good predictors of disease outcome. High frequency of CD4+FOXP3- effector T cells in blood and CSF is associated with poor survival, whereas high frequency of activated regulatory T (Treg) cells and high ratio between activated and resting Treg cells in blood are associated with better survival. Besides survival, phenotypic profiling of T cells could also predict disease progression rate. Single cell transcriptomics analysis of CSF samples shows clonally expanded CD4+ and CD8+ T cells in CSF, with characteristic gene expression patterns. In summary, T cell responses associate with and likely contribute to disease progression in ALS, supporting modulation of adaptive immunity as a viable therapeutic option.


Subject(s)
Amyotrophic Lateral Sclerosis, Neurodegenerative Diseases, Humans, Amyotrophic Lateral Sclerosis/diagnosis, Amyotrophic Lateral Sclerosis/genetics, Amyotrophic Lateral Sclerosis/pathology, CD8-Positive T-Lymphocytes/pathology, Neurodegenerative Diseases/metabolism, Regulatory T-Lymphocytes, Disease Progression
9.
Gigascience ; 11, 2022 09 29.
Article in English | MEDLINE | ID: mdl-36173247

ABSTRACT

An individualized cancer therapy is ideally chosen to target the cancer's driving biological pathways, but identifying such pathways is challenging because of their underlying heterogeneity and there is no guarantee that they are druggable. We hypothesize that a cancer with an activated druggable cancer-specific pathway (DCSP) is more likely to respond to the relevant drug. Here we develop and validate a systematic method to search for such DCSPs, by (i) introducing a pathway activation score (PAS) that integrates cancer-specific driver mutations and gene expression profile and drug-specific gene targets, (ii) applying the method to identify DCSPs from pan-cancer datasets, and (iii) analyzing the correlation between PAS and the response to relevant drugs. In total, 4,794 DCSPs from 23 different cancers have been discovered in the Genomics of Drug Sensitivity in Cancer database and validated in The Cancer Genome Atlas database. Supporting the hypothesis, for the DCSPs in acute myeloid leukemia, cancers with higher PASs are shown to have stronger drug response, and this is validated in the BeatAML cohort. All DCSPs are publicly available at https://www.meb.ki.se/shiny/truvu/DCSP/.


Subject(s)
Acute Myeloid Leukemia, Genomics/methods, Humans, Acute Myeloid Leukemia/drug therapy, Acute Myeloid Leukemia/genetics, Transcriptome
10.
NAR Genom Bioinform ; 4(3): lqac052, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35855322

ABSTRACT

Even though the role of DNA mutations in cancer is well recognized, current quantification of RNA expression, performed at either gene or isoform level, typically ignores the mutation status. Standard methods for estimating allele-specific expression (ASE) consider gene-level expression, but the functional impact of a mutation is best assessed at isoform level. Hence our goal is to quantify the mutant-allele expression at isoform level. We have developed and implemented a method, named MAX, for quantifying mutant-allele expression given a list of mutations. For a gene of interest, a mutant reference is constructed by incorporating all possible mutant versions of the wild-type isoforms in the transcriptome annotation. The mutant reference is then used for mapping the RNA-seq reads, which in principle works similarly for any quantification tool. We apply an alternating EM algorithm to the read-count data from the mapping step. In a simulation study, MAX performs well against standard isoform-quantification methods. MAX also achieves higher accuracy than conventional gene-based ASE methods such as ASEP. An analysis of a real dataset of acute myeloid leukemia reveals a subgroup of NPM1-mutated patients responding well to a kinase inhibitor. Our findings indicate that quantification of mutant-allele expression at isoform level is feasible and has potential added value for assessing the functional impact of DNA mutations in cancers.

11.
Front Genet ; 13: 820493, 2022.
Article in English | MEDLINE | ID: mdl-35251131

ABSTRACT

Several fusion genes are directly involved in the initiation and progression of cancers. Numerous bioinformatics tools have been developed to detect fusion events, but they are mainly based on RNA-seq data. Whole-exome sequencing (WES) is a powerful technology that is widely used for disease-related DNA variant detection. In this study, we build a novel analysis pipeline called Fuseq-WES to detect fusion genes at the DNA level based on WES data. The same method also applies to targeted panel sequencing data. We apply the method to real datasets of acute myeloid leukemia (AML) and prostate cancer patients. The results show that two of the main AML fusion genes discovered in RNA-seq data, PML-RARA and CBFB-MYH11, are detected in the WES data in 36% and 63% of the available samples, respectively. For the targeted deep sequencing of prostate cancer patients, detection of the TMPRSS2-ERG fusion, which is the most frequent chimeric alteration in prostate cancer, is 91% concordant with a manually curated procedure based on four other methods. In summary, the overall results indicate that it is challenging to detect fusion genes in WES data with a standard coverage of ~15-30x, where fusion candidates discovered in the RNA-seq data are often not detected in the WES data and vice versa. A subsampling study of the prostate data suggests that a coverage of at least 75x is necessary to achieve high accuracy.

12.
BMC Genomics ; 23(1): 106, 2022 Feb 08.
Article in English | MEDLINE | ID: mdl-35135477

ABSTRACT

BACKGROUND: Circular RNA (circRNA), a class of RNA molecule with a loop structure, has recently attracted researchers due to its diverse biological functions and its potential as a biomarker of human diseases. Most of the current circRNA detection methods from RNA-sequencing (RNA-Seq) data utilize the mapping information of paired-end (PE) reads to eliminate false positives. However, much of the practical RNA-Seq data, such as cross-linking immunoprecipitation sequencing (CLIP-Seq) data, usually contain single-end (SE) reads. It is not clear how well these tools perform on SE RNA-Seq data. RESULTS: In this study, we present a systematic evaluation of six advanced RNA-based methods and two CLIP-Seq based methods for detecting circRNAs from SE RNA-Seq data. The performances of the methods are rigorously assessed based on precision, sensitivity, F1 score, and true discovery rate. We investigate the impacts of read length, false positive ratio, sequencing depth and PE mapping information on the performances of the methods using simulated SE RNA-Seq datasets. The real datasets used in this study consist of four experimental RNA-Seq datasets with ≥100bp read length and 124 CLIP-Seq samples from 45 studies that contain mostly short-read (≤50bp) RNA-Seq data. The simulation study shows that the sensitivities of most of the methods can be improved by increasing either read length or sequencing depth, and that the levels of false positive rates significantly affect the precision of all methods. Furthermore, the PE mapping information can improve a method's precision but cannot always guarantee an increase in F1 score. Overall, no method is dominant for all SE RNA-Seq data. The RNA-based methods perform better for the long-read datasets but are worse for the short-read datasets. In contrast, the CLIP-Seq based methods outperform the RNA-Seq based methods for all the short-read samples. Combining the results of these methods can significantly improve precision in the CLIP-Seq data. CONCLUSIONS: The results provide a systematic evaluation of circRNA detection methods on SE RNA-Seq data that would facilitate researchers' strategies in circRNA analysis.


Subject(s)
Circular RNA, RNA, High-Throughput Nucleotide Sequencing, Humans, Immunoprecipitation, RNA/genetics, RNA-Seq, RNA Sequence Analysis
13.
Bioinformatics ; 38(5): 1287-1294, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34864849

ABSTRACT

MOTIVATION: RNA expression at isoform level is biologically more informative than at gene level and can potentially reveal cellular subsets and corresponding biomarkers that are not visible at gene level. However, due to the strong 3' bias sequencing protocol, mRNA quantification for high-throughput single-cell RNA sequencing such as Chromium Single Cell 3' 10× Genomics is currently performed at the gene level. RESULTS: We have developed an isoform-level quantification method for high-throughput single-cell RNA sequencing by exploiting the concepts of transcription clusters and isoform paralogs. The method, called Scasa, compares well in simulations against competing approaches including Alevin, Cellranger, Kallisto, Salmon, Terminus and STARsolo at both isoform- and gene-level expression. The reanalysis of a CITE-Seq dataset with isoform-based Scasa reveals a subgroup of CD14 monocytes missed by gene-based methods. AVAILABILITY AND IMPLEMENTATION: Implementation of Scasa including source code, documentation, tutorials and test data supporting this study is available at Github: https://github.com/eudoraleer/scasa and Zenodo: https://doi.org/10.5281/zenodo.5712503. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling, Software, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Protein Isoforms/genetics, Protein Isoforms/metabolism, Messenger RNA/genetics, RNA
14.
BMC Bioinformatics ; 22(1): 495, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-34645386

ABSTRACT

BACKGROUND: Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential to serve as a marker for diagnosis and prognosis, or as a therapeutic target, in cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict if a BSJ-supporting read is derived from a true circRNA or not, thus increasing the rate of FP circRNAs. RESULTS: We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient by using a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. CONCLUSIONS: With its better performance in circRNA detection and in computational time, Circall facilitates the analysis of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.


Subject(s)
Circular RNA, RNA, Humans, RNA/genetics, RNA Splicing, RNA-Seq, RNA Sequence Analysis
15.
Am J Hematol ; 96(5): 580-588, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33625756

ABSTRACT

Molecular classification of acute myeloid leukemia (AML) aids prognostic stratification and clinical management. Our aim in this study is to identify transcriptome-wide mRNAs that are specific to each of the molecular subtypes of AML. We analyzed RNA-sequencing data of 955 AML samples from three cohorts, including the BeatAML project, the Cancer Genome Atlas, and a cohort of Swedish patients to provide a comprehensive transcriptome-wide view of subtype-specific mRNA expression. We identified 729 subtype-specific mRNAs, discovered in the BeatAML project and validated in the other two cohorts. Using unique proteomics data, we also validated the presence of subtype-specific mRNAs at the protein level, yielding a rich collection of potential protein-based biomarkers for the AML community. To enable the exploration of subtype-specific mRNA expression by the broader scientific community, we provide an interactive resource to the public.


Subject(s)
Acute Myeloid Leukemia/genetics, Messenger RNA/biosynthesis, Neoplasm RNA/biosynthesis, Transcriptome, Tumor Biomarkers, Neoplasm Genes, Humans, Acute Myeloid Leukemia/classification, Acute Myeloid Leukemia/metabolism, Neoplasm Proteins/biosynthesis, Neoplasm Proteins/genetics, Oncogene Fusion Proteins/biosynthesis, Oncogene Fusion Proteins/genetics, Proteome, Messenger RNA/genetics, Neoplasm RNA/genetics, RNA-Seq, Retrospective Studies, Sweden
16.
Bioinformatics ; 36(3): 805-812, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31400221

ABSTRACT

MOTIVATION: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias correction step(s), which are based on biological considerations, such as GC content, and applied to single samples separately. The main problem is that not all biases are known. RESULTS: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets. AVAILABILITY AND IMPLEMENTATION: The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling, RNA-Seq, Algorithms, Protein Isoforms/genetics, RNA Sequence Analysis, Software
17.
Bioinformatics ; 35(22): 4679-4687, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31028395

ABSTRACT

MOTIVATION: Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input material from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so it generally has better data quality. There exist various methods for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data. RESULTS: Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION: The source codes and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Mutation, Gene Expression Profiling, Humans, RNA Sequence Analysis, Single-Cell Analysis, Software
18.
Front Genet ; 10: 1331, 2019.
Article in English | MEDLINE | ID: mdl-32010190

ABSTRACT

Detection of differentially expressed genes is a common task in single-cell RNA-seq (scRNA-seq) studies. Various methods based on both bulk-cell and single-cell approaches are in current use. Due to the unique distributional characteristics of single-cell data, it is important to compare these methods with rigorous statistical assessments. In this study, we assess the reproducibility of 9 tools for differential expression analysis in scRNA-seq data. These tools include four methods originally designed for scRNA-seq data, three popular methods originally developed for bulk-cell RNA-seq data that have also been applied in scRNA-seq analysis, and two general statistical tests. Instead of comparing the performance across all genes, we compare the methods in terms of the rediscovery rates (RDRs) of top-ranked genes, separately for highly and lowly expressed genes. Three real and one simulated scRNA-seq data sets are used for the comparisons. The results indicate that some widely used methods, such as edgeR and monocle, have worse RDR performance than the other methods, especially for the top-ranked genes. For highly expressed genes, many bulk-cell-based methods can perform similarly to the methods designed for scRNA-seq data. For the lowly expressed genes, however, performance varies substantially; edgeR and monocle are too liberal and have poor control of false positives, while DESeq2 is too conservative and consequently loses sensitivity compared to the other methods. BPSC, Limma, DEsingle, MAST, t-test and Wilcoxon have similar performances in the real data sets. Overall, the scRNA-seq based method BPSC performs well against the other methods, particularly when there is a sufficient number of cells.

19.
BMC Genomics ; 19(1): 786, 2018 Nov 01.
Article in English | MEDLINE | ID: mdl-30382840

ABSTRACT

BACKGROUND: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While methods are available, routine analyses of large numbers of samples are still limited by high computational demands. RESULTS: We develop FuSeq, a fast and accurate method to discover fusion genes. It uses quasi-mapping to quickly map the reads, extracts initial candidates from split reads and fusion equivalence classes of mapped reads, and finally applies multiple filters and statistical tests to obtain the final candidates. We apply FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results reveal high sensitivity and specificity in all datasets, and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is two-fold faster than FusionMap and orders of magnitude faster than the other methods. CONCLUSIONS: With its lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R, and is available at https://github.com/nghiavtr/FuSeq for non-commercial use.


Subject(s)
Gene Fusion , RNA/genetics , RNA Sequence Analysis , Algorithms , Tumor Cell Line , Computational Biology/methods , Nucleic Acid Databases , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Oncogene Fusion Proteins/genetics , Reproducibility of Results , RNA Sequence Analysis/methods
20.
Biol Direct ; 13(1): 14, 2018 07 16.
Article in English | MEDLINE | ID: mdl-30012197

ABSTRACT

BACKGROUND: Neuroblastoma is the most common pediatric malignancy with heterogeneous clinical behaviors, ranging from spontaneous regression to aggressive progression. Many studies have identified aberrations related to the pathogenesis and prognosis, broadly classifying neuroblastoma patients into high- and low-risk groups, but predicting tumor progression and clinically managing high-risk patients remain major challenges. RESULTS: We integrate gene-level expression, array-based comparative genomic hybridization and a functional gene-interaction network of 145 neuroblastoma patients to detect potential driver genes. The drivers are summarized into a driver-gene score (DGscore) for each patient, and we then validate its clinical relevance in terms of association with patient survival. Focusing on a subset of 48 clinically defined high-risk patients, we identify 193 recurrent regions of copy number alterations (CNAs), resulting in 274 altered genes whose copy-number gain or loss has a parallel impact on gene expression. Using a network enrichment analysis, we detect four common driver genes (ERCC6, HECTD2, KIAA1279 and EMX2) and 66 patient-specific driver genes. Patients with a high DGscore, who thus carry more copy-number-altered genes with correspondingly up- or down-regulated expression and functional implications, have worse survival than those with a low DGscore (P = 0.006). Furthermore, Cox proportional-hazards regression analysis shows that, adjusted for age, tumor stage and MYCN amplification, DGscore is the only significant prognostic factor for high-risk neuroblastoma patients (P = 0.008). CONCLUSIONS: Integration of genomic copy number alteration, expression and functional interaction-network data reveals clinically relevant and prognostic putative driver genes in high-risk neuroblastoma patients. The identified putative drivers are potential drug targets for individualized therapy.
REVIEWERS: This article was reviewed by Armand Valsesia, Susmita Datta and Aleksandra Gruca.
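
The idea of genes whose copy-number change has a parallel impact on expression can be sketched in Python as follows. This is a simplified illustration, not the study's DGscore: the copy-number coding (-1 loss, 0 neutral, +1 gain), the z-score cutoff, the function names, the gene names and all values are hypothetical assumptions, and the network enrichment step is omitted entirely.

```python
def concordant_genes(copy_number, expression_z, z_cutoff=1.0):
    """Genes whose expression moves in the same direction as their copy number.

    copy_number: dict gene -> -1 (loss), 0 (neutral) or +1 (gain)  [assumed coding]
    expression_z: dict gene -> expression z-score                  [assumed input]
    """
    concordant = set()
    for gene, cn in copy_number.items():
        z = expression_z.get(gene)
        if z is None:
            continue  # no expression measurement for this gene
        gained_and_up = cn > 0 and z >= z_cutoff
        lost_and_down = cn < 0 and z <= -z_cutoff
        if gained_and_up or lost_and_down:
            concordant.add(gene)
    return concordant


def toy_driver_score(copy_number, expression_z, driver_genes, z_cutoff=1.0):
    """Count of candidate driver genes with concordant copy number and expression.

    A hypothetical stand-in for a per-patient score; NOT the published DGscore.
    """
    hits = concordant_genes(copy_number, expression_z, z_cutoff)
    return len(hits & set(driver_genes))


# Hypothetical patient profile with placeholder gene names and invented values.
drivers = {"GENE_A", "GENE_B", "GENE_C"}
cn = {"GENE_A": 1, "GENE_B": -1, "GENE_C": 1, "GENE_D": 1}
expr = {"GENE_A": 2.1, "GENE_B": -1.5, "GENE_C": -0.3, "GENE_D": 1.8}

print(sorted(concordant_genes(cn, expr)))
print(toy_driver_score(cn, expr, drivers))
```

In this toy profile, GENE_C is gained but not up-regulated, so it does not count, and GENE_D is concordant but is not in the candidate driver set.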


Subject(s)
Comparative Genomic Hybridization/methods , Neuroblastoma/genetics , Animals , DNA Copy Number Variations/genetics , Gene Dosage/genetics , Gene Expression Profiling/methods , Neoplastic Gene Expression Regulation , Humans , Proportional Hazards Models