ABSTRACT
Transcription of tRNA genes by RNA polymerase III (RNAPIII) is tuned by signaling cascades. The emerging notion of differential tRNA gene regulation implies the existence of additional regulatory mechanisms. However, tRNA gene-specific regulators have not been described. Decoding the local chromatin proteome of a native tRNA gene in yeast revealed reprogramming of the RNAPIII transcription machinery upon nutrient perturbation. Among the dynamic proteins, we identified Fpt1, a protein of unknown function that uniquely occupied RNAPIII-regulated genes. Fpt1 binding at tRNA genes correlated with the efficiency of RNAPIII eviction upon nutrient perturbation and required the transcription factors TFIIIB and TFIIIC but not RNAPIII. In the absence of Fpt1, eviction of RNAPIII was reduced, and the shutdown of ribosome biogenesis genes was impaired upon nutrient perturbation. Our findings provide support for a chromatin-associated mechanism required for RNAPIII eviction from tRNA genes and tuning the physiological response to changing metabolic demands.
Subject(s)
RNA Polymerase III , Saccharomyces cerevisiae Proteins , RNA Polymerase III/genetics , RNA Polymerase III/metabolism , Proteome/genetics , Proteome/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Chromatin/genetics , Chromatin/metabolism , Gene Expression Regulation, Fungal , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , RNA, Transfer/genetics , RNA, Transfer/metabolism , Transcription, Genetic
ABSTRACT
Oncogenic mutations are abundant in the tissues of healthy individuals, but rarely form tumours [1-3]. Yet, the underlying protection mechanisms are largely unknown. To resolve these mechanisms in mouse mammary tissue, we use lineage tracing to map the fate of wild-type and Brca1-/-;Trp53-/- cells, and find that both follow a similar pattern of loss and spread within ducts. Clonal analysis reveals that ducts consist of small repetitive units of self-renewing cells that give rise to short-lived descendants. This offers a first layer of protection as any descendants, including oncogenic mutant cells, are constantly lost, thereby limiting the spread of mutations to a single stem cell-descendant unit. Local tissue remodelling during consecutive oestrous cycles leads to the cooperative and stochastic loss and replacement of self-renewing cells. This process provides a second layer of protection, leading to the elimination of most mutant clones while enabling the minority that by chance survive to expand beyond the stem cell-descendant unit. This leads to fields of mutant cells spanning large parts of the epithelial network, predisposing it for transformation. Eventually, clone expansion becomes restrained by the geometry of the ducts, providing a third layer of protection. Together, these mechanisms act to eliminate most cells that acquire somatic mutations at the expense of driving the accelerated expansion of a minority of cells, which can colonize large areas, leading to field cancerization.
Subject(s)
Cell Transformation, Neoplastic , Mammary Glands, Animal , Mutation , Animals , Female , Mice , BRCA1 Protein/deficiency , BRCA1 Protein/genetics , BRCA1 Protein/metabolism , Cell Lineage/genetics , Cell Self Renewal/genetics , Cell Transformation, Neoplastic/genetics , Clone Cells/cytology , Clone Cells/metabolism , Clone Cells/pathology , Mammary Glands, Animal/cytology , Mammary Glands, Animal/pathology , Mammary Glands, Animal/metabolism , Tumor Suppressor Protein p53/deficiency , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Estrous Cycle , Stem Cells/cytology , Stem Cells/metabolism , Stem Cells/pathology
ABSTRACT
BACKGROUND: Data from microbiomes from multiple niches is often collected, but methods to analyse these often ignore associations between niches. One interesting case is that of the oral microbiome. Its composition is receiving increasing attention due to reports on its associations with general health. While the oral cavity includes different niches, multi-niche microbiome data analysis is conducted using a single niche at a time and, therefore, ignores other niches that could act as confounding variables. Understanding the interaction between niches would assist interpretation of the results, and help improve our understanding of multi-niche microbiomes. METHODS: In this study, we used a machine learning technique called latent Dirichlet allocation (LDA) on two microbiome datasets consisting of several niches. LDA was used on both individual niches and all niches simultaneously. On individual niches, LDA was used to decompose each niche into bacterial sub-communities unveiling their taxonomic structure. These sub-communities were then used to assess the relationship between microbial niches using the global test. On all niches simultaneously, LDA allowed us to extract meaningful microbial patterns. Sets of co-occurring operational taxonomic units (OTUs) comprising those patterns were then used to predict the original location of each sample. RESULTS: Our approach showed that the per-niche sub-communities displayed a strong association between supragingival plaque and saliva, as well as between the anterior and posterior tongue. In addition, the LDA-derived microbial signatures were able to predict the original sample niche illustrating the meaningfulness of our sub-communities. For the multi-niche oral microbiome dataset we had an overall accuracy of 76%, and per-niche sensitivity of up to 83%. 
Finally, for a second multi-niche microbiome dataset from the entire body, microbial niches from the oral cavity displayed stronger associations to each other than with those from other parts of the body, such as niches within the vagina and the skin. CONCLUSION: Our LDA-based approach produces sets of co-occurring taxa that can describe niche composition. LDA-derived microbial signatures can also be instrumental in summarizing microbiome data, for both descriptions as well as prediction.
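The overall accuracy and per-niche sensitivity reported above are standard summaries of a confusion matrix of true versus predicted sample niches. A minimal Python sketch of how such figures are computed is given here; the niche labels and counts are hypothetical toy values, not data from the study, and the function names are illustrative only.

```python
def overall_accuracy(confusion):
    """Fraction of all samples whose predicted niche equals the true niche."""
    correct = sum(confusion[niche][niche] for niche in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


def sensitivity(confusion, niche):
    """Fraction of samples truly from `niche` that were predicted as `niche`."""
    row = confusion[niche]
    return row[niche] / sum(row.values())


# Hypothetical counts, indexed as toy[true_niche][predicted_niche].
toy = {
    "saliva": {"saliva": 7, "plaque": 3, "tongue": 0},
    "plaque": {"saliva": 3, "plaque": 6, "tongue": 1},
    "tongue": {"saliva": 0, "plaque": 1, "tongue": 9},
}

if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(toy):.2f}")
    for niche in toy:
        print(f"sensitivity for {niche}: {sensitivity(toy, niche):.2f}")
```

Reporting sensitivity per niche alongside overall accuracy shows which niches are hard to tell apart, which a single pooled accuracy would hide.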
Subject(s)
Microbiota , Female , Humans , Mouth/microbiology , Bacteria/genetics , Saliva , Skin/microbiology
ABSTRACT
OBJECTIVES: To evaluate an artificial intelligence (AI)-assisted double reading system for detecting clinically relevant missed findings on routinely reported chest radiographs. METHODS: A retrospective study was performed in two institutions, a secondary care hospital and tertiary referral oncology centre. Commercially available AI software performed a comparative analysis of chest radiographs and radiologists' authorised reports using a deep learning and natural language processing algorithm, respectively. The AI-detected discrepant findings between images and reports were assessed for clinical relevance by an external radiologist, as part of the commercial service provided by the AI vendor. The selected missed findings were subsequently returned to the institution's radiologist for final review. RESULTS: In total, 25,104 chest radiographs of 21,039 patients (mean age 61.1 years ± 16.2 [SD]; 10,436 men) were included. The AI software detected discrepancies between imaging and reports in 21.1% (5289 of 25,104). After review by the external radiologist, 0.9% (47 of 5289) of cases were deemed to contain clinically relevant missed findings. The institution's radiologists confirmed 35 of 47 missed findings (74.5%) as clinically relevant (0.1% of all cases). Missed findings consisted of lung nodules (71.4%, 25 of 35), pneumothoraces (17.1%, 6 of 35) and consolidations (11.4%, 4 of 35). CONCLUSION: The AI-assisted double reading system was able to identify missed findings on chest radiographs after report authorisation. The approach required an external radiologist to review the AI-detected discrepancies. The number of clinically relevant missed findings by radiologists was very low. CLINICAL RELEVANCE STATEMENT: The AI-assisted double reader workflow was shown to detect diagnostic errors and could be applied as a quality assurance tool. Although clinically relevant missed findings were rare, there is potential impact given the common use of chest radiography. 
KEY POINTS:
• A commercially available double reading system supported by artificial intelligence was evaluated to detect reporting errors in chest radiographs (n=25,104) from two institutions.
• Clinically relevant missed findings were found in 0.1% of chest radiographs and consisted of unreported lung nodules, pneumothoraces and consolidations.
• Applying AI software as a secondary reader after report authorisation can assist in reducing diagnostic errors without interrupting the radiologist's reading workflow. However, the number of AI-detected discrepancies was considerable and required review by a radiologist to assess their relevance.
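The percentages in the RESULTS can be re-derived from the counts given in the abstract, which also makes clear that each percentage uses a different denominator. The short Python sketch below does this; the counts are taken from the abstract, while the variable and function names are illustrative.

```python
def pct(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


# Counts as reported in the abstract.
radiographs = 25_104          # all chest radiographs included
discrepancies = 5_289         # AI-detected image/report discrepancies
flagged_by_external = 47      # deemed clinically relevant by the external radiologist
confirmed = 35                # confirmed by the institution's radiologists
findings = {"lung nodules": 25, "pneumothoraces": 6, "consolidations": 4}

if __name__ == "__main__":
    print("discrepancies / radiographs:", pct(discrepancies, radiographs))
    print("flagged / discrepancies:", pct(flagged_by_external, discrepancies))
    print("confirmed / flagged:", pct(confirmed, flagged_by_external))
    print("confirmed / radiographs:", pct(confirmed, radiographs))
    for name, count in findings.items():
        print(f"{name} / confirmed:", pct(count, confirmed))
```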
Subject(s)
Artificial Intelligence , Radiography, Thoracic , Humans , Radiography, Thoracic/methods , Middle Aged , Male , Retrospective Studies , Female , Diagnostic Errors/prevention & control , Radiographic Image Interpretation, Computer-Assisted/methods , Deep Learning , Natural Language Processing , Algorithms , Aged
ABSTRACT
BACKGROUND: CRISPR screens provide large-scale assessment of cellular gene functions. Pooled libraries typically consist of several single guide RNAs (sgRNAs) per gene, for a large number of genes, which are transduced in such a way that every cell receives at most one sgRNA, resulting in the disruption of a single gene in that cell. This approach is often used to investigate effects on cellular fitness, by measuring sgRNA abundance at different time points. Comparing gene knockout effects between different cell populations is challenging due to variable cell-type-specific parameters and between-replicate variation. Failure to take those into account can lead to inflated or false discoveries. RESULTS: We propose a new, flexible approach called ShrinkCRISPR that can take into account multiple sources of variation. Impact on cellular fitness between conditions is inferred by using a mixed-effects model, which allows testing for gene-knockout effects while taking into account sgRNA-specific variation. Estimates are obtained using an empirical Bayesian approach. ShrinkCRISPR can be applied to a variety of experimental designs, including multiple factors. In simulation studies, we compared ShrinkCRISPR results with those of drugZ and MAGeCK, common methods used to detect differential effects on cell fitness. ShrinkCRISPR yielded as many true discoveries as drugZ using a paired screen design, and outperformed both drugZ and MAGeCK for an independent screen design. Although conservative, ShrinkCRISPR was the only approach that kept false discoveries under control at the desired level, for both designs. Using data from several publicly available screens, we showed that ShrinkCRISPR can take data for several time points into account simultaneously, helping to detect early and late differential effects.
CONCLUSIONS: ShrinkCRISPR is a robust and flexible approach, able to incorporate different sources of variation and to test for differential effects on cell fitness at the gene level. These features improve power to find effects on cell fitness, while keeping multiple testing under the correct control level and helping to improve reproducibility. ShrinkCRISPR can be applied to different study designs and incorporate multiple time points, making it a complete and reliable tool to analyze CRISPR screen data.
Subject(s)
CRISPR-Cas Systems , CRISPR-Cas Systems/genetics , Reproducibility of Results , Bayes Theorem , Gene Knockout Techniques
ABSTRACT
Statistical methods to test for effects of single nucleotide polymorphisms (SNPs) on exon inclusion exist but often rely on testing of associations between multiple exon-SNP pairs, with sometimes subsequent summarization of results at the gene level. Such approaches require heavy multiple testing corrections and detect mostly events with large effect sizes. We propose here a test to find spliceQTL (splicing quantitative trait loci) effects that takes all exons and all SNPs into account simultaneously. For any chosen gene, this score-based test looks for an association between the set of exon expressions and the set of SNPs, via a random-effects model framework. It is efficient to compute and can be used if the number of SNPs is larger than the number of samples. In addition, the test is powerful in detecting effects that are relatively small for individual exon-SNP pairs but are observed for many pairs. Furthermore, test results are more often replicated across datasets than pairwise testing results. This makes our test more robust to exon-SNP pair-specific effects, which do not extend to multiple pairs within the same gene. We conclude that the test we propose here offers more power and better replicability in the search for spliceQTL effects.
Subject(s)
Polymorphism, Single Nucleotide , Quantitative Trait Loci , Genome-Wide Association Study/methods
ABSTRACT
Sensor drift is a well-known disadvantage of electronic nose (eNose) technology and may affect the accuracy of diagnostic algorithms. Correction for this phenomenon is not routinely performed. The aim of this study was to investigate the influence of eNose sensor drift on the development of a disease-specific algorithm in a real-life cohort of patients with inflammatory bowel disease (IBD). In this multi-center cohort, patients undergoing colonoscopy collected a fecal sample prior to bowel lavage. Mucosal disease activity was assessed based on endoscopy. Controls underwent colonoscopy for various reasons and had no endoscopic abnormalities. Fecal eNose profiles were measured using Cyranose 320®. Fecal samples of 63 IBD patients and 63 controls were measured on four subsequent days. Sensor data displayed associations with date of measurement, which were reproducible across all samples irrespective of disease state, disease activity state, disease localization and diet of participants. Based on logistic regression, corrections for sensor drift improved the accuracy of differentiating between IBD patients and controls based on the significant differences of six sensors (p = 0.004; p < 0.001; p = 0.001; p = 0.028; p < 0.001 and p = 0.005), with an accuracy of 0.68. In this clinical study, short-term sensor drift affected fecal eNose profiles more profoundly than clinical features. These outcomes emphasize the importance of sensor drift correction to improve reliability and repeatability, both within and across eNose studies.
Subject(s)
Inflammatory Bowel Diseases , Volatile Organic Compounds , Humans , Breath Tests , Exhalation , Reproducibility of Results , Electronic Nose , Inflammatory Bowel Diseases/diagnosis
ABSTRACT
The progression of anchorage-dependent epithelial cells to anchorage-independent growth represents a critical hallmark of malignant transformation. Using an in vitro model of human papillomavirus (HPV)-induced transformation, we previously showed that acquisition of anchorage-independent growth is associated with marked (epi)genetic changes, including altered expression of microRNAs. However, the laborious nature of the conventional growth method in soft agar to measure this phenotype hampers a high-throughput analysis. We developed alternative functional screening methods using 96- and 384-well ultra-low attachment plates to systematically investigate microRNAs regulating anchorage-independent growth. SiHa cervical cancer cells were transfected with a microRNA mimic library (n = 2019) and evaluated for cell viability. We identified 84 microRNAs that consistently suppressed growth in three independent experiments. Further validation in three cell lines and comparison of growth in adherent and ultra-low attachment plates yielded 40 microRNAs that specifically reduced anchorage-independent growth. In conclusion, ultra-low attachment plates are a promising alternative for soft-agar assays to study anchorage-independent growth and are suitable for high-throughput functional screening. Anchorage independence suppressing microRNAs identified through our screen were successfully validated in three cell lines. These microRNAs may provide specific biomarkers for detecting and treating HPV-induced precancerous lesions progressing to invasive cancer, the most critical stage during cervical cancer development.
Subject(s)
Alphapapillomavirus , MicroRNAs , Papillomavirus Infections , Uterine Cervical Neoplasms , Agar , Alphapapillomavirus/genetics , Cell Transformation, Neoplastic/genetics , Female , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Papillomaviridae/genetics , Papillomavirus Infections/metabolism , Uterine Cervical Neoplasms/pathology
ABSTRACT
BACKGROUND: Coeliac disease (CD) has an estimated prevalence of ~1% in Europe, with a significant gap between undiagnosed and diagnosed CD. Active case finding may help to bridge this gap, yet the diagnostic yield of such active case finding in general practice by serological testing is unknown. OBJECTIVE: The aim of this study was to determine (1) the frequency of diagnosed CD in the general population, and (2) the yield of active case finding by general practitioners. METHODS: Electronic medical records of 207,200 patients registered in 49 general practices in The Netherlands in 2016 were analysed. An extensive search strategy, based on International Classification of Primary Care codes, free text and diagnostic test codes, was performed to search for CD- or gluten-related contacts. RESULTS: The incidence of CD diagnosis in general practice in 2016 was 0.01%. The prevalence of diagnosed CD reported in general practice in the Netherlands was 0.19%, considerably higher than previously reported in the general population. During the one-year course of the study, 0.95% of the population had a gluten-related contact with their GP; most of these contacts (72%) were prompted by gastrointestinal complaints. Serological testing was performed in 66% (n = 1296) of these patients and was positive in only 1.6% (n = 21). CONCLUSION: The number of diagnosed CD patients in the Netherlands is substantially higher than previously reported. This suggests that the gap between diagnosed and undiagnosed patients is smaller than generally assumed.
This may explain why, despite a high frequency of gluten-related consultations in general practice, the diagnostic yield of case finding by serological testing is low.
KEY POINTS:
• The diagnostic approach of GPs regarding CD and the diagnostic yield is largely unknown.
• Case finding in a primary health care practice has a low yield of 1.6%.
• CD testing was mostly prompted by consultation for gastrointestinal symptoms.
• There is heterogeneity in the types of serological test performed in primary care.
Subject(s)
Celiac Disease , General Practitioners , Celiac Disease/diagnosis , Celiac Disease/epidemiology , Humans , Incidence , Referral and Consultation , Serologic Tests
ABSTRACT
Resistance to chemotherapy is widely recognized as one of the major factors limiting therapeutic efficacy and influences clinical outcomes in patients with cancer. Many studies on various tumor types have focused on combining standard-of-care chemotherapy with immunotherapy. However, for cervical cancer, the role of neoadjuvant chemotherapy (NACT) on the local immune microenvironment is largely unexplored. We performed a pilot study on 13 primary cervical tumor samples, before and after NACT, to phenotype and enumerate tumor-infiltrating T-cell subpopulations using multiplex immunohistochemistry (CD3, CD8, FoxP3, Ki67, and Tbet) and automated co-expression analysis software. A significant decrease in proliferating (Ki67+) CD3+CD8- T cells and FoxP3+(CD3+CD8-) regulatory T cells was observed in the tumor stroma after cisplatin and paclitaxel treatment, with increased rates of cytotoxic CD8+ T cells, including activated and CD8+Tbet+ T cells. No effect was observed on the number of tumor-infiltrating T cells in the cervical tumor microenvironment after treatment with cisplatin only. Therefore, we conclude that patients treated with cisplatin and paclitaxel had more tumor-infiltrating T-cell modulation than patients treated with cisplatin monotherapy. These findings enhance our understanding of the immune-modulating effect of chemotherapy and warrant future combination of the standard-of-care therapy with immunotherapy to improve clinical outcome in patients with cervical cancer.
Subject(s)
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Lymphocytes, Tumor-Infiltrating/immunology , Neoadjuvant Therapy/methods , T-Lymphocytes, Regulatory/immunology , Uterine Cervical Neoplasms/drug therapy , Uterine Cervical Neoplasms/immunology , Adult , Chemotherapy, Adjuvant , Cisplatin/administration & dosage , Female , Follow-Up Studies , Humans , Male , Middle Aged , Paclitaxel/administration & dosage , Pilot Projects , Prognosis , Retrospective Studies , Uterine Cervical Neoplasms/pathology , Young Adult
ABSTRACT
Integrative analysis of copy number and gene expression data can help in understanding the cis and trans effect of copy number aberrations on transcription levels of genes involved in a pathway. To analyse how these copy number mediated gene-gene interactions differ between groups of samples we propose a new method, named dNET. Our method uses ridge regression to model the network topology involving one gene's expression level, its gene dosage and the expression levels of other genes in the network. The interaction parameters are estimated by fitting the model per gene for all samples together. However, instead of testing for differential network topology per gene, dNET tests for an overall difference in estimated parameters between two groups of samples and produces a single p-value. With the help of several simulation studies, we show that dNET can detect differential network nodes with high accuracy and low rate of false positives even in the presence of differential cis effects. We also apply dNET to publicly available TCGA cancer datasets and identify pathways where copy number mediated gene-gene interactions differ between samples with cancer stage lower than stage 3 and samples with cancer stage 3 or above.
Subject(s)
Computer Simulation , DNA Copy Number Variations , Gene Dosage , Gene Expression Regulation, Neoplastic , Models, Theoretical , Neoplasms/genetics , Gene Expression Profiling , Gene Regulatory Networks , Humans
ABSTRACT
BACKGROUND: Reproducibility of hits from independent CRISPR or siRNA screens is poor. This is partly due to data normalization primarily addressing technical variability within independent screens, and not the technical differences between them. RESULTS: We present "rscreenorm", a method that standardizes the functional data ranges between screens using assay controls, and subsequently performs a piecewise-linear normalization to make data distributions across all screens comparable. In simulation studies, rscreenorm reduces false positives. Using two multiple-cell lines siRNA screens, rscreenorm increased reproducibility between 27 and 62% for hits, and up to 5-fold for non-hits. Using publicly available CRISPR-Cas screen data, application of commonly used median centering yields merely 34% of overlapping hits, in contrast with rscreenorm yielding 84% of overlapping hits. Furthermore, rscreenorm yielded at most 8% discordant results, whilst median-centering yielded as much as 55%. CONCLUSIONS: Rscreenorm yields more consistent results and keeps false positive rates under control, improving reproducibility of genetic screens data analysis from multiple cell lines.
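As a rough illustration of the first step described above (standardizing data ranges between screens using assay controls), one common choice is to rescale each screen so that the median negative control maps to 0 and the median positive control maps to 1. The Python sketch below assumes that form. The abstract does not state the exact formula, the function name is hypothetical, and the subsequent piecewise-linear normalization is not shown.

```python
from statistics import median


def standardize_with_controls(values, neg_controls, pos_controls):
    """Rescale one screen's readouts onto a control-defined range.

    Assumed form (not taken from the abstract): the median negative control
    maps to 0 and the median positive control maps to 1.
    """
    neg = median(neg_controls)
    pos = median(pos_controls)
    if pos == neg:
        raise ValueError("controls have identical medians; cannot rescale")
    return [(v - neg) / (pos - neg) for v in values]


if __name__ == "__main__":
    # Two toy screens measured on different raw scales (hypothetical values).
    screen_a = standardize_with_controls([10, 6, 2], [10, 9, 11], [2, 1, 3])
    screen_b = standardize_with_controls([100, 60, 20], [100, 90, 110], [20, 10, 30])
    print(screen_a)
    print(screen_b)
```

After this rescaling, the two toy screens sit on the same control-defined range even though their raw values differ tenfold, which is the property that makes hits comparable across screens.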
Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Genetic Testing/methods , Genomics/methods , RNA, Small Interfering/genetics , Humans , Reproducibility of Results
ABSTRACT
In high-dimensional omics studies where multiple molecular profiles are obtained for each set of patients, there is often interest in identifying complex multivariate associations, for example, copy number regulated expression levels in a certain pathway or in a genomic region. To detect such associations, we present a novel approach to test for association between two sets of variables. Our approach generalizes the global test, which tests for association between a group of covariates and a single univariate response, to allow high-dimensional multivariate response. We apply the method to several simulated datasets as well as two publicly available datasets, where we compare the performance of multivariate global test (G2) with univariate global test. The method is implemented in R and will be available as a part of the globaltest package in R.
Subject(s)
Computational Biology/methods , Data Interpretation, Statistical , Computer Simulation , Gene Expression Profiling , Genomics , Humans , Multivariate Analysis , Software
ABSTRACT
BACKGROUND: Testing for association between RNA-Seq and other genomic data is challenging due to high variability of the former and high dimensionality of the latter. RESULTS: Using the negative binomial distribution and a random-effects model, we develop an omnibus test that overcomes both difficulties. It may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size. CONCLUSIONS: The proposed test can detect genetic and epigenetic alterations that affect gene expression. It can examine complex regulatory mechanisms of gene expression. The R package globalSeq is available from Bioconductor.
Subject(s)
Algorithms , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Prostatic Neoplasms/genetics , RNA/genetics , Sequence Analysis, RNA/methods , Humans , Male , Regression Analysis
ABSTRACT
BACKGROUND: It has been shown that a random-effects framework can be used to test the association between a gene's expression level and the number of DNA copies of a set of genes. This gene-set modelling framework was later applied to find associations between mRNA expression and microRNA expression, by defining the gene sets using target prediction information. METHODS AND RESULTS: Here, we extend the model introduced by Menezes et al. (2009) to consider the effect of not just copy number, but also of other molecular profiles such as methylation changes and loss of heterozygosity (LOH), on gene expression levels. We again consider sets of measurements, to improve robustness of results and increase the power to find associations. Our approach can be used genome-wide to find associations and yields a test to help separate true associations from noise. We apply our method to colon and to breast cancer samples, for which genome-wide copy number, methylation and gene expression profiles are available. Our findings include interesting gene expression-regulating mechanisms, which may involve only one of copy number or methylation, or both for the same samples. We are even able to find effects due to different molecular mechanisms in different samples. CONCLUSIONS: Our method can equally well be applied to cases where other types of molecular (high-dimensional) data are collected, such as LOH, SNP genotype and microRNA expression data. Computationally efficient, it represents a flexible and powerful tool to study associations between high-dimensional datasets. The method is freely available via the SIM BioConductor package.
Subject(s)
Breast Neoplasms/genetics , Colonic Neoplasms/genetics , Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , Transcriptome , Computer Simulation , DNA Methylation , Female , Gene Dosage , Genotype , Humans , Loss of Heterozygosity , Polymorphism, Single Nucleotide
ABSTRACT
Current microRNA target predictions are based on sequence information and empirically derived rules but do not make use of the expression of microRNAs and their targets. This study aimed to improve microRNA target predictions in a given biological context, using in silico predictions, microRNA and mRNA expression. We used target prediction tools to produce lists of predicted targets and used a gene set test designed to detect consistent effects of microRNAs on the joint expression of multiple targets. In a single test, association between microRNA expression and target gene set expression as well as the contribution of the individual target genes on the association are determined. The strongest negatively associated mRNAs as measured by the test were prioritized. We applied our integration method to a well-defined muscle differentiation model. Validation of our predictions in C2C12 cells confirmed predicted targets of known as well as novel muscle-related microRNAs. We further studied associations between microRNA-mRNA pairs in human prostate cancer, finding some pairs that have been recently experimentally validated by others. Using the same study, we showed the advantages of the global test over Pearson correlation and lasso. We conclude that our integrated approach successfully identifies regulated microRNAs and their targets.
Subject(s)
Gene Expression Regulation, Neoplastic , MicroRNAs/analysis , Myoblasts, Skeletal/metabolism , RNA, Messenger/analysis , Software , 3' Untranslated Regions , Algorithms , Animals , Cell Differentiation , Humans , Male , Mice , MicroRNAs/genetics , Myoblasts, Skeletal/cytology , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , RNA, Messenger/genetics , Transcriptome
ABSTRACT
BACKGROUND: A number of statistical models have been proposed for studying the association between gene expression and copy number data in integrated analysis. The next step is to compare association patterns between different groups of samples. RESULTS: We propose a method, named dSIM, to find differences in association between copy number and gene expression, when comparing two groups of samples. Firstly, we use ridge regression to correct for the baseline associations between copy number and gene expression. Secondly, the global test is applied to the corrected data in order to find differences in association patterns between two groups of samples. We show in a simulation study that dSIM detects differences even in small genomic regions. We also apply dSIM to two publicly available breast cancer datasets and identify chromosome arms where copy number-led gene expression regulation differs between estrogen receptor-positive and estrogen receptor-negative samples. In spite of differing genomic coverage, some selected arms are identified in both datasets. CONCLUSION: We developed a flexible and robust method for studying association differences between two groups of samples while integrating genomic data from different platforms. dSIM can be used with most types of microarray/sequencing data, including methylation and microRNA expression. The method is implemented in R and will be made part of the BioConductor package SIM.
Subject(s)
Computational Biology/methods , Gene Expression Profiling , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Gene Dosage/genetics , Humans , Receptors, Estrogen/metabolism
ABSTRACT
In the design of microarray or next-generation sequencing experiments it is crucial to choose the appropriate number of biological replicates. Often the number of differentially expressed genes and their effect sizes are small, and too few replicates will lead to insufficient power to detect them. On the other hand, too many replicates lead to unnecessarily high experimental costs. Power and sample size analysis can guide experimentalists in choosing the appropriate number of biological replicates. Several methods for power and sample size analysis have recently been proposed for microarray data. However, most of these are restricted to two-group comparisons and require user-defined effect sizes. Here we propose a pilot-data-based method for power and sample size analysis which can handle more general experimental designs and uses pilot data to obtain estimates of the effect sizes. The method can also handle χ²-distributed test statistics, which enables power and sample size calculations for a much wider class of models, including high-dimensional generalized linear models which are used, e.g., for RNA-seq data analysis. The performance of the method is evaluated using simulated and experimental data from several microarray and next-generation sequencing experiments. Furthermore, we compare our proposed method for estimation of the density of effect sizes from pilot data with a recently proposed method specific to two-group comparisons.
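The trade-off between replicate number and power can be made concrete with a textbook normal approximation for a single two-group comparison. The Python sketch below is only that simple case with a user-defined effect size, i.e. the restricted setting the abstract contrasts itself with. It is not the pilot-data-based method, it ignores multiple testing, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with equal group sizes."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)                # two-sided critical value
    ncp = effect / (sd * sqrt(2 / n_per_group))    # standardized mean difference
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)


def replicates_needed(effect, sd, target_power=0.8, alpha=0.05, max_n=10_000):
    """Smallest number of replicates per group reaching the target power, or None."""
    for n in range(2, max_n + 1):
        if two_group_power(effect, sd, n, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(n, round(two_group_power(effect=1.0, sd=1.0, n_per_group=n), 3))
    print("replicates per group for 80% power:", replicates_needed(1.0, 1.0))
```

With a zero effect the function returns the significance level itself, which is a quick sanity check on the formula.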
Subject(s)
Gene Expression Profiling/methods , Algorithms , Animals , Data Interpretation, Statistical , Genomics , High-Throughput Nucleotide Sequencing , Humans , Linear Models , Models, Genetic , Oligonucleotide Array Sequence Analysis , Sample Size , Sequence Analysis, RNA
ABSTRACT
This paper presents an efficient algorithm, based on a combination of Newton-Raphson and gradient ascent, for using the fused lasso regression method to construct a genome-based classifier. The characteristic structure of copy number data suggests that feature selection should take genomic location into account in order to produce more interpretable results for genome-based classifiers. The fused lasso penalty, an extension of the lasso penalty, encourages sparsity of both the coefficients and the differences between adjacent coefficients by penalizing the L1-norm of both at the same time, thereby making use of genomic location. The major advantage of the algorithm over existing fused lasso optimization techniques is its ability to predict binomial as well as survival responses efficiently. We apply our algorithm to two publicly available datasets in order to predict survival and binary outcomes.
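The fused lasso penalty described above can be written as λ1 Σ|β_j| + λ2 Σ|β_j − β_{j−1}|, with coefficients ordered by genomic location. The following minimal sketch only evaluates this penalty; it does not implement the Newton-Raphson and gradient ascent optimization, and the function name and the example values are hypothetical.

```python
def fused_lasso_penalty(coefs, lam1, lam2):
    """Fused lasso penalty for coefficients ordered by genomic location:
    lam1 * sum|b_j| (sparsity) + lam2 * sum|b_j - b_(j-1)| (smoothness)."""
    sparsity = sum(abs(b) for b in coefs)
    smoothness = sum(abs(b - a) for a, b in zip(coefs, coefs[1:]))
    return lam1 * sparsity + lam2 * smoothness


# Two hypothetical coefficient vectors with the same number of nonzero entries.
contiguous = [0.0, 0.0, 1.0, 1.0, 0.0]  # nonzero effects in adjacent positions
scattered = [1.0, 0.0, 1.0, 0.0, 0.0]   # nonzero effects in isolated positions

penalty_contiguous = fused_lasso_penalty(contiguous, lam1=1.0, lam2=2.0)
penalty_scattered = fused_lasso_penalty(scattered, lam1=1.0, lam2=2.0)
```

Both vectors have the same L1-norm, so a plain lasso penalty would not distinguish them. The difference term makes the contiguous pattern cheaper, which is how the penalty favors selecting neighboring genomic regions together.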
Subject(s)
Algorithms , Biometry/methods , Gene Dosage , Disease-Free Survival , Humans , Multiple Myeloma/epidemiology , Multiple Myeloma/genetics , Proportional Hazards Models , Urinary Bladder Neoplasms/epidemiology , Urinary Bladder Neoplasms/genetics
ABSTRACT
The primary objective of the prospective, randomized, multicenter, phase 3 biomarker Microarray Analysis in breast cancer to Tailor Adjuvant Drugs Or Regimens trial (MATADOR: ISRCTN61893718) is to generate a gene expression profile that can predict benefit from either docetaxel, doxorubicin, and cyclophosphamide (TAC) or dose-dense scheduled doxorubicin and cyclophosphamide (ddAC). Patients with a pT1-3, pN0-3 tumor were randomized 1:1 between ddAC and TAC. The primary endpoint was a gene profile-treatment interaction for recurrence-free survival (RFS). We observed 117 RFS events in 664 patients with a median follow-up of 7 years. Hallmark gene set analyses showed a significant association between enrichment in immune-related gene expression and favorable outcome after TAC in hormone receptor-negative, human epidermal growth factor receptor 2 (HER2)-negative breast cancer (BC) (triple-negative breast cancer [TNBC]). We validated this association on H&E slides: in TNBC patients treated with TAC, stromal tumor-infiltrating lymphocytes (sTILs) ≥20% were associated with longer RFS (hazard ratio 0.18, p = 0.01), whereas in patients treated with ddAC no difference in RFS was seen (hazard ratio 0.92, p = 0.86; p interaction = 0.02).