1.
Bioinformatics ; 40(Supplement_1): i189-i198, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940152

ABSTRACT

MOTIVATION: Multimodal profiling strategies promise to produce more informative insights into biomedical cohorts via the integration of the information each modality contributes. To perform this integration, however, the development of novel analytical strategies is needed. Multimodal profiling strategies often come at the expense of lower sample numbers, which can challenge methods to uncover shared signals across a cohort. Thus, factor analysis approaches are commonly used for the analysis of high-dimensional data in molecular biology; however, they typically do not yield representations that are directly interpretable, whereas many research questions center on the analysis of pathways associated with specific observations. RESULTS: We develop PathFA, a novel approach for multimodal factor analysis over the space of pathways. PathFA produces integrative and interpretable views across multimodal profiling technologies, which allow for the derivation of concrete hypotheses. PathFA combines a pathway-learning approach with integrative multimodal capability under a Bayesian procedure that is efficient, hyper-parameter free, and able to automatically infer observation noise from the data. We demonstrate strong performance on small sample sizes within our simulation framework and on matched proteomics and transcriptomics profiles from real tumor samples taken from the Swiss Tumor Profiler consortium. On a subcohort of melanoma patients, PathFA recovers pathway activity that has been independently associated with poor outcome. We further demonstrate the ability of this approach to identify pathways associated with the presence of specific cell-types as well as tumor heterogeneity. Our results show that PathFA captures known biology, making it well suited for analyzing multimodal sample cohorts. AVAILABILITY AND IMPLEMENTATION: The tool is implemented in Python and available at https://github.com/ratschlab/path-fa.


Subject(s)
Bayes Theorem , Humans , Proteomics/methods , Factor Analysis , Gene Expression Profiling/methods , Melanoma/metabolism , Algorithms , Computational Biology/methods
2.
Blood Adv ; 8(3): 766-779, 2024 02 13.
Article in English | MEDLINE | ID: mdl-38147624

ABSTRACT

It is still not fully understood how genetic haploinsufficiency in del(5q) myelodysplastic syndrome (MDS) contributes to malignant transformation of hematopoietic stem cells. We asked how compound haploinsufficiency for Csnk1a1 and Egr1 in the common deleted region on chromosome 5 affects hematopoietic stem cells. Additionally, Trp53 was disrupted as the most frequently comutated gene in del(5q) MDS using CRISPR/Cas9 editing in hematopoietic progenitors of wild-type (WT), Csnk1a1-/+, Egr1-/+, Csnk1a1/Egr1-/+ mice. A transplantable acute leukemia only developed in the Csnk1a1-/+Trp53-edited recipient. Isolated blasts were indefinitely cultured ex vivo and gave rise to leukemia after transplantation, providing a tool to study disease mechanisms or perform drug screenings. In a small-scale drug screening, the collaborative effect of Csnk1a1 haploinsufficiency and Trp53 sensitized blasts to the CSNK1 inhibitor A51 relative to WT or Csnk1a1 haploinsufficient cells. In vivo, A51 treatment significantly reduced blast counts in Csnk1a1 haploinsufficient/Trp53 acute leukemias and restored hematopoiesis in the bone marrow. Transcriptomics on blasts and their normal counterparts showed that the derived leukemia was driven by MAPK and Myc upregulation downstream of Csnk1a1 haploinsufficiency cooperating with a downregulated p53 axis. A collaborative effect of Csnk1a1 haploinsufficiency and p53 loss on MAPK and Myc upregulation was confirmed on the protein level. Downregulation of Myc protein expression correlated with efficient elimination of blasts in A51 treatment. The "Myc signature" closely resembled the transcriptional profile of patients with del(5q) MDS with TP53 mutation.


Subject(s)
Acute Myeloid Leukemia , Myelodysplastic Syndromes , Animals , Humans , Mice , Bone Marrow/metabolism , Chromosome Deletion , Haploinsufficiency , Acute Myeloid Leukemia/genetics , Acute Myeloid Leukemia/drug therapy , Myelodysplastic Syndromes/genetics , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
4.
Nat Methods ; 20(11): 1759-1768, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37770709

ABSTRACT

Understanding and predicting molecular responses in single cells upon chemical, genetic or mechanical perturbations is a core question in biology. Obtaining single-cell measurements typically requires the cells to be destroyed. This makes learning heterogeneous perturbation responses challenging as we only observe unpaired distributions of perturbed or non-perturbed cells. Here we leverage the theory of optimal transport and the recent advent of input convex neural architectures to present CellOT, a framework for learning the response of individual cells to a given perturbation by mapping these unpaired distributions. CellOT outperforms current methods at predicting single-cell drug responses, as profiled by scRNA-seq and a multiplexed protein-imaging technology. Further, we illustrate that CellOT generalizes well on unseen settings by (1) predicting the scRNA-seq responses of holdout patients with lupus exposed to interferon-β and patients with glioblastoma to panobinostat; (2) inferring lipopolysaccharide responses across different species; and (3) modeling the hematopoietic developmental trajectories of different subpopulations.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , RNA Sequence Analysis/methods , Gene Expression Profiling/methods
5.
medRxiv ; 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36945540

ABSTRACT

Background: Homologous Recombination Deficiency (HRD) is a pan-cancer predictive biomarker that identifies patients who benefit from therapy with PARP inhibitors (PARPi). However, testing for HRD is highly complex. Here, we investigated whether Deep Learning can predict HRD status solely based on routine Hematoxylin & Eosin (H&E) histology images in ten cancer types. Methods: We developed a fully automated deep learning pipeline with attention-weighted multiple instance learning (attMIL) to predict HRD status from histology images. A combined genomic scar HRD score, which integrated loss of heterozygosity (LOH), telomeric allelic imbalance (TAI) and large-scale state transitions (LST), was calculated from whole genome sequencing data for n=4,565 patients from two independent cohorts. The primary statistical endpoint was the Area Under the Receiver Operating Characteristic curve (AUROC) for the prediction of genomic scar HRD with a clinically used cutoff value. Results: We found that HRD status is predictable in tumors of the endometrium, pancreas and lung, reaching cross-validated AUROCs of 0.79, 0.58 and 0.66. Predictions generalized well to an external cohort with AUROCs of 0.93, 0.81 and 0.73, respectively. Additionally, an HRD classifier trained on breast cancer yielded an AUROC of 0.78 in internal validation and was able to predict HRD in endometrial, prostate and pancreatic cancer with AUROCs of 0.87, 0.84 and 0.67, indicating a shared HRD-like phenotype across tumor entities. Conclusion: In this study, we show that HRD is directly predictable from H&E slides using attMIL within and across ten different tumor types.
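
The AUROC used as the primary endpoint above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the study's evaluation code; the function name `auroc` and the toy labels and scores are assumptions made purely for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = HRD-positive case, 0 = HRD-negative case.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # prints 0.75
```

The quadratic loop is chosen for readability; for cohorts of thousands of patients a rank-based formulation would be the more practical choice.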

8.
Cell Rep ; 40(8): 111266, 2022 08 23.
Article in English | MEDLINE | ID: mdl-36001976

ABSTRACT

Mutations in the splicing factor SF3B1 are frequently occurring in various cancers and drive tumor progression through the activation of cryptic splice sites in multiple genes. Recent studies also demonstrate a positive correlation between the expression levels of wild-type SF3B1 and tumor malignancy. Here, we demonstrate that SF3B1 is a hypoxia-inducible factor (HIF)-1 target gene that positively regulates HIF1 pathway activity. By physically interacting with HIF1α, SF3B1 facilitates binding of the HIF1 complex to hypoxia response elements (HREs) to activate target gene expression. To further validate the relevance of this mechanism for tumor progression, we show that a reduction in SF3B1 levels via monoallelic deletion of Sf3b1 impedes tumor formation and progression via impaired HIF signaling in a mouse model for pancreatic cancer. Our work uncovers an essential role of SF3B1 in HIF1 signaling, thereby providing a potential explanation for the link between high SF3B1 expression and aggressiveness of solid tumors.


Subject(s)
Pancreatic Neoplasms , Signal Transduction , Animals , Tumor Cell Line , Hypoxia/metabolism , Hypoxia-Inducible Factor 1/metabolism , Hypoxia-Inducible Factor 1 alpha Subunit/genetics , Hypoxia-Inducible Factor 1 alpha Subunit/metabolism , Mice , Pancreatic Neoplasms/genetics , Phosphoproteins/genetics , Phosphoproteins/metabolism , RNA Splice Sites , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism
9.
J Comput Biol ; 29(8): 857-866, 2022 08.
Article in English | MEDLINE | ID: mdl-35776515

ABSTRACT

With the constant increase of large-scale genomic data projects, automated and high-throughput quality assessment becomes a crucial component of any analysis. Whereas small projects often have a more homogeneous design and a manageable structure allowing for a manual per-sample analysis of quality, large-scale studies tend to be much more heterogeneous and complex. Many quality metrics have been developed to assess the quality of an individual sample on the raw read level. Degradation effects are typically assessed based on the RNA integrity (RIN) score, or on postalignment data. In this study, we show that single commonly used quality criteria such as the RIN score alone are not sufficient to ensure RNA sample quality. We developed a new approach and provide an efficient tool that estimates RNA sample degradation by computing the 5'/3' bias based on all genes in an alignment-free manner. That enables degradation assessment right after data generation and not during the analysis procedure allowing for early intervention in the sample handling process. Our analysis shows that this strategy is fast, robust to annotation and differences in library size, and provides complementary quality information to RIN scores enabling the accurate identification of degraded samples.


Subject(s)
RNA Stability , RNA , Genomics , RNA/chemistry , RNA/genetics , RNA Sequence Analysis/methods
10.
Bioinformatics ; 38(18): 4293-4300, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35900151

ABSTRACT

MOTIVATION: Several recently developed single-cell DNA sequencing technologies enable whole-genome sequencing of thousands of cells. However, the ultra-low coverage of the sequenced data (<0.05× per cell) mostly limits their usage to the identification of copy number alterations in multi-megabase segments. Many tumors are not copy number-driven, and thus single-nucleotide variant (SNV)-based subclone detection may contribute to a more comprehensive view on intra-tumor heterogeneity. Due to the low coverage of the data, the identification of SNVs is only possible when superimposing the sequenced genomes of hundreds of genetically similar cells. Thus, we have developed a new approach to efficiently cluster tumor cells based on a Bayesian filtering approach of relevant loci and exploiting read overlap and phasing. RESULTS: We developed Single Cell Data Tumor Clusterer (SECEDO, lat. 'to separate'), a new method to cluster tumor cells based solely on SNVs, inferred on ultra-low coverage single-cell DNA sequencing data. We applied SECEDO to a synthetic dataset simulating 7250 cells and eight tumor subclones from a single patient and were able to accurately reconstruct the clonal composition, detecting 92.11% of the somatic SNVs, with the smallest clusters representing only 6.9% of the total population. When applied to five real single-cell sequencing datasets from a breast cancer patient, each consisting of ≈2000 cells, SECEDO was able to recover the major clonal composition in each dataset at the original coverage of 0.03×, achieving an Adjusted Rand Index (ARI) score of ≈0.6. The current state-of-the-art SNV-based clustering method achieved an ARI score of ≈0, even after merging cells to create higher coverage data (factor 10 increase), and was only able to match SECEDO's performance when pooling data from all five datasets, in addition to artificially increasing the sequencing coverage by a factor of 7.
Variant calling on the resulting clusters recovered more than twice as many SNVs as would have been detected if calling on all cells together. Further, the allelic ratio of the called SNVs on each subcluster was more than double relative to the allelic ratio of the SNVs called without clustering, thus demonstrating that calling variants on subclones, in addition to both increasing sensitivity of SNV detection and attaching SNVs to subclones, significantly increases the confidence of the called variants. AVAILABILITY AND IMPLEMENTATION: SECEDO is implemented in C++ and is publicly available at https://github.com/ratschlab/secedo. Instructions to download the data and the evaluation code to reproduce the findings in this paper are available at: https://github.com/ratschlab/secedo-evaluation. The code and data of the submitted version are archived at: https://doi.org/10.5281/zenodo.6516955. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
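
The Adjusted Rand Index (ARI) quoted in this abstract measures agreement between two clusterings, corrected for chance: 1 means identical partitions and values near 0 mean agreement no better than random. Below is a minimal sketch of the standard pair-counting formula; it is not taken from the SECEDO code base, and the function name and toy labels are assumptions for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Pairs of items placed together in both clusterings (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items placed together in each clustering on its own.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2           # upper bound on the agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings use a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy example: true subclone per cell vs. predicted cluster per cell.
true_subclones = [0, 0, 1, 1]
predicted_clusters = [0, 0, 1, 2]
print(adjusted_rand_index(true_subclones, predicted_clusters))
```

For the toy labels the score is about 0.57; relabeling the clusters (for example swapping 0 and 1 throughout one labeling) leaves the score unchanged, which is why ARI suits clustering evaluation.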


Subject(s)
High-Throughput Nucleotide Sequencing , Neoplasms , Humans , High-Throughput Nucleotide Sequencing/methods , Bayes Theorem , DNA Sequence Analysis , Genome , Base Sequence , Neoplasms/genetics , Single Nucleotide Polymorphism
11.
Cancer Cell ; 39(3): 288-293, 2021 03 08.
Article in English | MEDLINE | ID: mdl-33482122

ABSTRACT

The application and integration of molecular profiling technologies create novel opportunities for personalized medicine. Here, we introduce the Tumor Profiler Study, an observational trial combining a prospective diagnostic approach to assess the relevance of in-depth tumor profiling to support clinical decision-making with an exploratory approach to improve the biological understanding of the disease.


Subject(s)
Neoplasms/genetics , Neoplasms/metabolism , Clinical Decision-Making/methods , Computational Biology/methods , Clinical Decision Support Systems , Humans , Precision Medicine/methods , Prospective Studies
12.
Bioinformatics ; 36(Suppl_2): i919-i927, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381818

ABSTRACT

MOTIVATION: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able to universally match single-cell measurements carried out in one cell to its corresponding sibling in another technology are needed. RESULTS: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset achieving 90% and 78% cell-matching accuracy for each one of the samples, respectively. AVAILABILITY AND IMPLEMENTATION: https://github.com/ratschlab/scim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
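
The pairing step described above, matching cells across technologies by their latent representations, can be illustrated as a minimum-cost one-to-one assignment. The sketch below is not SCIM's implementation: it enumerates all assignments by brute force, which is only feasible for a handful of cells, whereas the abstract describes a scalable bipartite matching scheme. The function name, the Euclidean cost and the latent coordinates are all assumptions.

```python
from itertools import permutations
from math import dist


def match_cells(latent_a, latent_b):
    """Pair each cell in latent_a with a distinct cell in latent_b.

    Returns (pairs, cost), where pairs is a list of (index_a, index_b) tuples
    minimizing the summed Euclidean distance in the shared latent space.
    Brute force over all permutations: factorial time, tiny inputs only.
    """
    if len(latent_a) != len(latent_b):
        raise ValueError("this sketch assumes equally sized cell sets")

    best_cost = float("inf")
    best_perm = ()
    for perm in permutations(range(len(latent_b))):
        cost = sum(dist(latent_a[i], latent_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Hypothetical 2-D latent codes for three cells per technology.
rna_latent = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
cytof_latent = [(5.1, 4.9), (0.1, -0.1), (0.9, 1.2)]
pairs, cost = match_cells(rna_latent, cytof_latent)
print(pairs)  # prints [(0, 1), (1, 2), (2, 0)]
```

With realistic cell numbers one would replace the enumeration with a polynomial-time assignment or flow-based solver; the objective being minimized stays the same.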


Subject(s)
Single-Cell Analysis , Software , Algorithms , Gene Expression Profiling , Humans , RNA Sequence Analysis
13.
Bioinformatics ; 36(Suppl_1): i154-i160, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657388

ABSTRACT

MOTIVATION: Understanding the underlying mutational processes of cancer patients has been a long-standing goal in the community and promises to provide new insights that could improve cancer diagnoses and treatments. Mutational signatures are summaries of the mutational processes, and improving the derivation of mutational signatures can yield new discoveries previously obscured by technical and biological confounders. Results from existing mutational signature extraction methods depend on the size of available patient cohort and solely focus on the analysis of mutation count data without considering the exploitation of metadata. RESULTS: Here we present a supervised method that utilizes cancer type as metadata to extract more distinctive signatures. More specifically, we use a negative binomial non-negative matrix factorization and add a support vector machine loss. We show that mutational signatures extracted by our proposed method have a lower reconstruction error and are designed to be more predictive of cancer type than those generated by unsupervised methods. This design reduces the need for elaborate post-processing strategies in order to recover most of the known signatures unlike the existing unsupervised signature extraction methods. Signatures extracted by a supervised model used in conjunction with cancer-type labels are also more robust, especially when using small and potentially cancer-type limited patient cohorts. Finally, we adapted our model such that molecular features can be utilized to derive a corresponding mutational signature. We used APOBEC expression and MUTYH mutation status to demonstrate the possibilities that arise from this ability. We conclude that our method, which exploits available metadata, improves the quality of mutational signatures as well as helps derive more interpretable representations. AVAILABILITY AND IMPLEMENTATION: https://github.com/ratschlab/SNBNMF-mutsig-public.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms , Cohort Studies , Humans , Mutation , Neoplasms/genetics
14.
Nature ; 578(7793): 102-111, 2020 02.
Article in English | MEDLINE | ID: mdl-32025015

ABSTRACT

The discovery of drivers of cancer has traditionally focused on protein-coding genes. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers, raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
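
The abstract does not say which rule was used to combine significance levels across driver discovery methods. As a generic illustration of the idea only, the sketch below implements Fisher's method, which assumes the combined tests are independent; methods run on the same genomes are generally correlated, so this is explicitly not a reconstruction of the strategy in the paper. The function name and the example p-values are assumptions.

```python
from math import exp, log


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For even degrees of
    freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # this is X / 2

    term = 1.0   # running value of (X/2)^i / i!
    total = 1.0  # partial sum, starting with the i = 0 term
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


# Hypothetical p-values reported for one genomic element by two methods.
print(fisher_combine([0.05, 0.05]))
```

Two borderline p-values of 0.05 combine to roughly 0.017, and a single p-value is returned unchanged, which is a quick sanity check on the formula.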


Subject(s)
Human Genome/genetics , Mutation/genetics , Neoplasms/genetics , DNA Breaks , Genetic Databases , Neoplastic Gene Expression Regulation , Genome-Wide Association Study , Humans , INDEL Mutation
15.
Nature ; 578(7793): 129-136, 2020 02.
Article in English | MEDLINE | ID: mdl-32025019

ABSTRACT

Transcript alterations often result from somatic changes in cancer genomes. Various forms of RNA alterations have been described in cancer, including overexpression, altered splicing and gene fusions; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.


Subject(s)
Neoplastic Gene Expression Regulation , Neoplasms/genetics , RNA/genetics , DNA Copy Number Variations , Neoplasm DNA , Human Genome , Genomics , Humans , Transcriptome
16.
Cell ; 178(6): 1465-1477.e17, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31491388

ABSTRACT

Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters still remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity provides a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.


Subject(s)
Computational Biology/methods , Neoplastic Gene Expression Regulation/genetics , Neoplasms/genetics , Genetic Promoter Regions/genetics , Transcriptome/genetics , Genetic Databases , Humans , RNA-Seq/methods
17.
Cancer Cell ; 34(2): 211-224.e6, 2018 08 13.
Article in English | MEDLINE | ID: mdl-30078747

ABSTRACT

Our comprehensive analysis of alternative splicing across 32 The Cancer Genome Atlas cancer types from 8,705 patients detects alternative splicing events and tumor variants by reanalyzing RNA and whole-exome sequencing data. Tumors have up to 30% more alternative splicing events than normal samples. Association analysis of somatic variants with alternative splicing events confirmed known trans associations with variants in SF3B1 and U2AF1 and identified additional trans-acting variants (e.g., TADA1, PPP2R1A). Many tumors have thousands of alternative splicing events not detectable in normal samples; on average, we identified ≈930 exon-exon junctions ("neojunctions") in tumors not typically found in GTEx normals. From Clinical Proteomic Tumor Analysis Consortium data available for breast and ovarian tumor samples, we confirmed ≈1.7 neojunction- and ≈0.6 single nucleotide variant-derived peptides per tumor sample that are also predicted major histocompatibility complex-I binders ("putative neoantigens").


Subject(s)
Alternative Splicing , Neoplasms/genetics , Humans , Single Nucleotide Polymorphism , Quantitative Trait Loci , RNA Sequence Analysis , Exome Sequencing
18.
BMC Genomics ; 16: 198, 2015 Mar 17.
Article in English | MEDLINE | ID: mdl-25888292

ABSTRACT

BACKGROUND: Variation within splicing regulatory sequences often leads to differences in gene models among individuals within a species. Two alleles of the same gene may express transcripts with different exon/intron structures and consequently produce functionally different proteins. Matching genomic and transcriptomic data allows us to identify putative regulatory variants associated with changes in splicing patterns. RESULTS: Here we analyzed natural variation of splicing patterns in the transcriptomes of 81 natural strains of Drosophila melanogaster with known genotypes. We identified dozens of genotype-specific splicing patterns associated with putative cis-splicing quantitative trait loci (sQTL). The majority of changes can be explained by mutations in splice sites. Allelic imbalance in splicing patterns confirmed that the majority are regulated mainly by cis-genetic effects. Remarkably, allele-specific splicing changes often lead to qualitative changes in gene models, yielding many isoforms not previously annotated. The observed alterations are typically outside protein-coding regions or affect only very short protein segments. CONCLUSIONS: Overall, the sets of gene models appear to be flexible within D. melanogaster populations. The observed variation in splicing patterns is predicted to have limited effects on the encoded protein sequences. To our knowledge, this is the first sQTL mapping study in Drosophila.


Subject(s)
Drosophila melanogaster/genetics , Genetic Variation , Genetic Models , Alleles , Allelic Imbalance , Alternative Splicing , Animals , Exons , Gene Expression Profiling , Genotype , Open Reading Frames , Single Nucleotide Polymorphism , Quantitative Trait Loci , RNA Splice Sites , Transcriptome
19.
Pac Symp Biocomput ; : 44-55, 2015.
Article in English | MEDLINE | ID: mdl-25592567

ABSTRACT

We present a genome-wide analysis of splicing patterns of 282 kidney renal clear cell carcinoma patients in which we integrate data from whole-exome sequencing of tumor and normal samples, RNA-seq and copy number variation. We proposed a scoring mechanism to compare splicing patterns in tumor samples to normal samples in order to rank and detect tumor-specific isoforms that have a potential for new biomarkers. We identified a subset of genes that show introns only observable in tumor but not in normal samples, ENCODE and GEUVADIS samples. In order to improve our understanding of the underlying genetic mechanisms of splicing variation we performed a large-scale association analysis to find links between somatic or germline variants with alternative splicing events. We identified 915 cis- and trans-splicing quantitative trait loci (sQTL) associated with changes in splicing patterns. Some of these sQTL have previously been associated with being susceptibility loci for cancer and other diseases. Our analysis also allowed us to identify the function of several COSMIC variants showing significant association with changes in alternative splicing. This demonstrates the potential significance of variants affecting alternative splicing events and yields insights into the mechanisms related to an array of disease phenotypes.


Subject(s)
Renal Cell Carcinoma/genetics , Kidney Neoplasms/genetics , RNA Splicing , Alternative Splicing , Tumor Biomarkers/genetics , Computational Biology , Exome , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study , Humans , Introns , Quantitative Trait Loci , RNA Sequence Analysis
20.
Nucleic Acids Res ; 41(1): e7, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-22941663

ABSTRACT

The 1000 Genomes Project and many similar ongoing large-scale sequencing efforts require new methods to predict functional variants in both coding and non-coding regions in order to understand phenotype and genotype relationships. We report the design of a new model SInBaD (Sequence-Information-Based-Decision-model) which relies on nucleotide conservation information to evaluate any annotated human variant in all known exons, introns, splice junctions and promoter regions. SInBaD builds separate mathematical models for promoters, exons and introns, using the human disease mutations annotated in the Human Gene Mutation Database as the training dataset for functional variants. The ten-fold cross validation shows high prediction accuracy. Validations on test datasets demonstrate that variants predicted as functional have a significantly higher occurrence in cancer patients. We also applied our model to variants found in four different individual human genomes to identify a set of functional variants, which might be of interest for further studies. Scores for any possible variants for all annotated genes are available under http://tingchenlab.cmb.usc.edu/sinbad/. SInBaD supports the current standard format of genotyping, the Variant Call Format (VCF 4.0), making it easy to integrate it into any existing next-generation sequencing pipeline. The accuracy of SNP detection poses the only limitation to the use of SInBaD.


Subject(s)
Genetic Variation , Genomics/methods , Decision Support Techniques , Exons , Human Genome , HapMap Project , Humans , Introns , Mutation , Neoplasms/genetics , Genetic Promoter Regions , RNA Splice Sites , Reproducibility of Results , Sequence Alignment