ABSTRACT
While intestinal Th17 cells are critical for maintaining tissue homeostasis, recent studies have implicated them in the development of extra-intestinal autoimmune diseases, including multiple sclerosis. However, the mechanisms by which tissue Th17 cells mediate these dichotomous functions remain unknown. Here, we characterized the heterogeneity, plasticity, and migratory phenotypes of tissue Th17 cells in vivo by combining fate mapping with profiling of the transcriptomes and TCR clonotypes of over 84,000 Th17 cells at homeostasis and during CNS autoimmune inflammation. Inter- and intra-organ single-cell analyses revealed a homeostatic, stem-like TCF1+ IL-17+ SLAMF6+ population that traffics to the intestine, where it is maintained by the microbiota, providing a ready reservoir for the IL-23-driven generation of encephalitogenic GM-CSF+ IFN-γ+ CXCR6+ T cells. Our study defines a direct in vivo relationship between non-pathogenic IL-17+ Th17 cells and pathogenic GM-CSF+ and IFN-γ+ Th17 populations and provides a mechanism by which homeostatic intestinal Th17 cells direct extra-intestinal autoimmune disease.
Subjects
Autoimmunity , Intestines/immunology , Stem Cells/metabolism , Th17 Cells/immunology , Animals , Cell Movement , Clone Cells , Encephalomyelitis, Autoimmune, Experimental/immunology , Granulocyte-Macrophage Colony-Stimulating Factor/metabolism , Homeostasis , Humans , Interferon-gamma/metabolism , Interleukin-17/metabolism , Mice, Inbred C57BL , Organ Specificity , RNA/metabolism , RNA-Seq , Receptors, Antigen, T-Cell/metabolism , Receptors, CXCR6/metabolism , Receptors, Interleukin/metabolism , Reproducibility of Results , Signaling Lymphocytic Activation Molecule Family/metabolism , Single-Cell Analysis , Spleen/metabolism
ABSTRACT
Widespread changes to DNA methylation and chromatin are well documented in cancer, but the fate of higher-order chromosomal structure remains obscure. Here we integrated topological maps for colon tumors and normal colons with epigenetic, transcriptional, and imaging data to characterize alterations to chromatin loops, topologically associated domains, and large-scale compartments. We found that spatial partitioning of the open and closed genome compartments is profoundly compromised in tumors. This reorganization is accompanied by compartment-specific hypomethylation and chromatin changes. Additionally, we identify a compartment at the interface between the canonical A and B compartments that is reorganized in tumors. Remarkably, similar shifts were evident in non-malignant cells that have accumulated excess divisions. Our analyses suggest that these topological changes repress stemness and invasion programs while inducing anti-tumor immunity genes and may therefore restrain malignant progression. Our findings call into question the conventional view that tumor-associated epigenomic alterations are primarily oncogenic.
Subjects
Chromatin/metabolism , Chromosomes/metabolism , Colorectal Neoplasms/genetics , Colorectal Neoplasms/metabolism , DNA Methylation , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic/genetics , Cell Division , Cellular Senescence/genetics , Chromatin Immunoprecipitation Sequencing , Chromosomes/genetics , Cohort Studies , Colorectal Neoplasms/mortality , Colorectal Neoplasms/pathology , Computational Biology , DNA Methylation/genetics , Epigenomics , HCT116 Cells , Humans , In Situ Hybridization, Fluorescence , Microscopy, Electron, Transmission , Molecular Dynamics Simulation , RNA-Seq , Spatial Analysis , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism
ABSTRACT
Nuclear hormone receptors (NRs) are ligand-binding transcription factors that are widely targeted therapeutically. Agonist binding triggers NR activation and subsequent degradation by ligand-dependent ubiquitin ligase machinery that has remained unknown. NR degradation is critical for therapeutic efficacy in malignancies that are driven by retinoic acid and estrogen receptors. Here, we demonstrate that the ubiquitin ligase UBR5 drives degradation of multiple agonist-bound NRs, including retinoic acid receptor alpha (RARA), retinoid X receptor alpha (RXRA), and the glucocorticoid, estrogen, liver X, progesterone, and vitamin D receptors. We present the high-resolution cryo-EM structure of full-length human UBR5 and a negative-stain model representing its interaction with RARA/RXRA. Agonist ligands induce sequential, mutually exclusive recruitment of nuclear coactivators (NCOAs) and UBR5 to chromatin to regulate transcriptional networks. Other pharmacological ligands, such as selective estrogen receptor degraders (SERDs), degrade their receptors through differential recruitment of UBR5 or RNF111. We establish the UBR5 transcriptional regulatory hub as a common mediator and regulator of NR-induced transcription.
Subjects
Chromatin , Transcription Factors , Humans , Ligands , Chromatin/genetics , Transcription Factors/metabolism , Receptors, Cytoplasmic and Nuclear/genetics , Ubiquitins , Ubiquitin-Protein Ligases/genetics
ABSTRACT
Unsupervised clustering of single-cell RNA-sequencing data enables the identification of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. We find that not addressing known sources of variability in a statistically rigorous manner can lead to overconfidence in the discovery of novel cell types. Here we extend a previous method, significance of hierarchical clustering, to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment of the clusters reported by any algorithm. Finally, we extend these approaches to account for batch structure. We benchmarked our approach against popular clustering workflows, demonstrating improved performance. To show practical utility, we applied our approach to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex, identifying several cases of over-clustering and recapitulating experimentally validated cell type definitions.
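The idea of testing whether a split is statistically supported can be illustrated with a simplified Monte Carlo test in the style of SigClust. The test asks how tightly 2-means splits the observed data, compared with how tightly it splits draws from a single Gaussian fitted to the same data. This sketch is not the authors' method, which builds the test into hierarchical clustering and extends it to batch structure. The function names, the Gaussian null with the sample covariance, and the plain Lloyd's 2-means are all assumptions made for illustration.

```python
import numpy as np


def cluster_index(x, labels):
    """Within-cluster sum of squares divided by total sum of squares (smaller = tighter split)."""
    total = ((x - x.mean(axis=0)) ** 2).sum()
    within = sum(((x[labels == k] - x[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return within / total


def two_means(x, rng, n_iter=100):
    """Plain Lloyd's algorithm with k = 2 and randomly chosen initial centers."""
    centers = x[rng.choice(len(x), size=2, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def split_p_value(x, n_sim=100, seed=0):
    """Monte Carlo p-value for H0: 'x is one Gaussian population' against a 2-cluster split."""
    rng = np.random.default_rng(seed)
    observed = cluster_index(x, two_means(x, rng))
    mean, cov = x.mean(axis=0), np.cov(x, rowvar=False)
    null = []
    for _ in range(n_sim):
        sim = rng.multivariate_normal(mean, cov, size=len(x))
        null.append(cluster_index(sim, two_means(sim, rng)))
    # Share of null splits that are at least as tight as the observed split.
    return (1 + sum(ci <= observed for ci in null)) / (n_sim + 1)
```

A small p-value means that a single Gaussian rarely produces a split as clean as the observed one. In that case the two clusters are treated as distinct populations rather than as an artifact of over-clustering.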
Subjects
Algorithms , Benchmarking , Humans , Animals , Mice , Cluster Analysis , RNA , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods
ABSTRACT
Engineered cellular therapy with CD19-targeting chimeric antigen receptor T cells (CAR-T) has revolutionized outcomes for patients with relapsed/refractory large B-cell lymphoma (LBCL), but the cellular and molecular features associated with response remain largely unresolved. We analyzed serial peripheral blood samples, ranging from the day of apheresis (day -28/baseline) to 28 days after CAR-T infusion, from 50 patients with LBCL treated with axicabtagene ciloleucel (axi-cel). We integrated single-cell RNA and TCR sequencing (scRNA-seq/scTCR-seq), flow cytometry, and mass cytometry (CyTOF) to characterize features associated with response to CAR-T. Pretreatment patient characteristics associated with response included the presence of B cells and an increased lymphocyte-to-monocyte ratio (ALC/AMC). Infusion products from responders were enriched for clonally expanded, highly activated CD8+ T cells. We extended these observations to 99 patients from the ZUMA-1 cohort and identified a subset of patients with elevated baseline B cells, 80% of whom were complete responders. We integrated B cell proportion ≥0.5% and ALC/AMC ≥1.2 into a two-factor predictive model and applied this model to the ZUMA-1 cohort. Estimated progression-free survival (PFS) at 1 year was 65% in patients meeting one or both criteria, versus 31% in patients meeting neither criterion. Our results suggest that patients' immunologic state at baseline affects the likelihood of response to CAR-T in two ways: by modulating the composition of the T cell apheresis product, and by promoting a more favorable circulating immune compartment prior to therapy. These baseline immunologic features are readily measured in the clinical setting prior to CAR-T and can be applied to predict response to therapy.
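The two-factor rule can be written as a small classifier. The sketch below is illustrative only. The thresholds (B cells ≥0.5%, ALC/AMC ≥1.2) and the "one or both versus neither" grouping come from the abstract. The class and function names, the group labels, and the assumption that the B-cell percentage is already computed (its denominator is not stated here) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    b_cell_pct: float  # B cells as a percentage (denominator assumed, not given in the abstract)
    alc: float         # absolute lymphocyte count
    amc: float         # absolute monocyte count, in the same units as alc


def criteria_met(p: Baseline) -> int:
    """Number of the two abstract-reported thresholds that a patient meets (0, 1 or 2)."""
    b_cell_ok = p.b_cell_pct >= 0.5
    ratio_ok = p.alc / p.amc >= 1.2
    return int(b_cell_ok) + int(ratio_ok)


def risk_group(p: Baseline) -> str:
    # The abstract contrasts patients meeting one or both criteria with those meeting neither.
    return "one_or_both" if criteria_met(p) >= 1 else "neither"
```

For example, a patient with 0.2% B cells and an ALC/AMC ratio of 1.3 meets one criterion, so `risk_group(Baseline(0.2, 1.3, 1.0))` places them in the "one or both" group.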
ABSTRACT
Although antibodies targeting specific tumor-expressed antigens are the standard of care for some cancers, the identification of cancer-specific targets amenable to antibody binding has remained a bottleneck in development of new therapeutics. To overcome this challenge, we developed a high-throughput platform that allows for the unbiased, simultaneous discovery of antibodies and targets based on phenotypic binding profiles. Applying this platform to ovarian cancer, we identified a wide diversity of cancer targets including receptor tyrosine kinases, adhesion and migration proteins, proteases and proteins regulating angiogenesis in a single round of screening using genomics, flow cytometry, and mass spectrometry. In particular, we identified BCAM as a promising candidate for targeted therapy in high-grade serous ovarian cancers. More generally, this approach provides a rapid and flexible framework to identify cancer targets and antibodies.
Subjects
Ovarian Neoplasms , Peptide Library , Humans , Female , Cell Line, Tumor , Antibodies , Ovarian Neoplasms/genetics , Antigens, Neoplasm
ABSTRACT
A central problem in spatial transcriptomics is detecting differentially expressed (DE) genes within cell types across tissue context. Challenges to learning DE include changing cell type composition across space and measurement pixels detecting transcripts from multiple cell types. Here, we introduce a statistical method, cell type-specific inference of differential expression (C-SIDE), that identifies cell type-specific DE in spatial transcriptomics, accounting for localization of other cell types. We model gene expression as an additive mixture across cell types of log-linear cell type-specific expression functions. C-SIDE's framework applies to many contexts: DE due to pathology, anatomical regions, cell-to-cell interactions and cellular microenvironment. Furthermore, C-SIDE enables statistical inference across multiple replicates. Simulations and validation experiments on Slide-seq, MERFISH and Visium datasets demonstrate that C-SIDE accurately identifies DE with valid uncertainty quantification. Last, we apply C-SIDE to identify plaque-dependent immune activity in Alzheimer's disease and cellular interactions between tumor and immune cells. We distribute C-SIDE within the R package https://github.com/dmcable/spacexr .
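The mean model described above is an additive mixture of log-linear, cell type-specific expression functions. It can be made concrete for one gene in one pixel. In this minimal sketch, each cell type k contributes a weight times exp(intercept + slope × covariate), and the sum is scaled by the pixel's total UMI count. The single covariate, the scaling by total UMIs, and all names are assumptions for illustration. The spacexr package's own parameterization may differ.

```python
import math
from typing import Sequence


def expected_counts(total_umi: float,
                    weights: Sequence[float],
                    intercepts: Sequence[float],
                    slopes: Sequence[float],
                    covariate: float) -> float:
    """Expected counts of one gene in one pixel under an additive mixture of
    log-linear cell type-specific expression functions (illustrative form)."""
    mixture = sum(w * math.exp(a + b * covariate)
                  for w, a, b in zip(weights, intercepts, slopes))
    return total_umi * mixture
```

In this sketch, cell type-specific DE for type k corresponds to a nonzero `slopes[k]`. The contributions of the other cell types present in the pixel stay in the mixture rather than being confounded with type k's expression.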
Subjects
Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods
ABSTRACT
Pharmacogenomic experiments allow for the systematic testing of drugs, at varying dosage concentrations, to study how genomic markers correlate with cell sensitivity to treatment. The first step in the analysis is to quantify the response of cell lines to variable dosage concentrations of the drugs being tested. The signal-to-noise ratio of these measurements can be low due to biological and experimental variability. However, the increasing availability of pharmacogenomic studies provides replicated data sets that can be leveraged to gain power. To do this, we formulate a hierarchical mixture model that estimates drug-specific mixture distributions, both to estimate cell sensitivity and to classify each drug's effect as either broad or targeted. We use this formulation to propose a unified approach that yields the posterior probability that a cell is susceptible to a drug, conditional on the drug having a targeted effect, or relative effect sizes, conditional on the drug having a broad effect. We demonstrate the usefulness of our approach via case studies. First, we assess pairwise agreement for cell lines/drugs within the intersection of two data sets and confirm the moderate pairwise agreement between many publicly available pharmacogenomic data sets. We then present an analysis that identifies sensitivity to the drug crizotinib for cells harboring EML4-ALK or NPM1-ALK gene fusions, as well as significantly down-regulated cell-matrix pathways associated with crizotinib sensitivity.
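For a drug with a targeted effect, the posterior probability that a cell is susceptible comes from Bayes' rule applied to a two-component mixture of response scores. The sketch below assumes Gaussian components with fixed, known parameters and a fixed prior. In the hierarchical model described above, these quantities would instead be estimated, and information would be shared across replicated data sets. The function name is hypothetical.

```python
from statistics import NormalDist


def posterior_sensitive(score: float,
                        prior_sensitive: float,
                        resistant: NormalDist,
                        sensitive: NormalDist) -> float:
    """P(cell is sensitive | response score) for a two-component mixture (Bayes' rule)."""
    num = prior_sensitive * sensitive.pdf(score)
    den = num + (1 - prior_sensitive) * resistant.pdf(score)
    return num / den
```

With a resistant component N(0, 1), a sensitive component N(2, 1), and a prior of 0.5, a score of 1 sits exactly between the two components and gets a posterior of 0.5. Scores further into the sensitive component approach 1.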
Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Crizotinib/therapeutic use , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Lung Neoplasms/genetics , Pharmacogenetics , Models, Statistical , Receptor Protein-Tyrosine Kinases/genetics , Receptor Protein-Tyrosine Kinases/therapeutic use
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) quantifies gene expression for individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the annotation of cells into known cell types. While this can be achieved using experimental techniques, such as fluorescence-activated cell sorting, these approaches are impractical for large numbers of cells. This motivates the development of data-driven cell-type annotation methods. We find that current approaches are limited by their reliance on known marker genes or by overfitting caused by systematic differences, or batch effects, between studies. Here, we present a statistical approach that leverages public data sets to combine information across thousands of genes, uses a latent variable model to define cell-type-specific barcodes and account for batch effect variation, and probabilistically annotates cell-type identity from a reference of known cell types. The barcoding approach also provides a new way to discover marker genes. Using a range of data sets, including those generated to represent imperfect real-world reference data, we demonstrate that our approach substantially outperforms current reference-based methods, particularly when predicting across studies.
Subjects
Gene Expression Profiling , Single-Cell Analysis , Gene Expression , Gene Expression Profiling/methods , Humans , RNA-Seq , Sequence Analysis, RNA/methods , Software
ABSTRACT
High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture the underlying structure of the data. An important motivation for applying these techniques is to discover low-dimensional embeddings that describe the separation of any underlying discrete latent structure in the data. These latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal, such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used methods for unsupervised dimensionality reduction. It finds linear transformations of the data that successively capture the largest share of the total variance. When the goal is detecting discrete structure, PCA is applied under the assumption that classes will be separated along directions of maximum variance. However, PCA will fail to find discrete latent structure accurately if this assumption does not hold. Visualization techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) attempt to mitigate these problems. They create a low-dimensional embedding in which, with high probability, similar objects are modeled by nearby points and dissimilar objects by distant points. However, because t-SNE and UMAP are computationally expensive, a PCA reduction is often performed before applying them, which makes them sensitive to PCA's shortcomings. In addition, t-SNE is limited to two or three dimensions as a visualization tool, which may not be adequate for retaining discriminatory information. Finally, when interpretable feature weights are needed, the linear transformations of PCA are preferable to the non-linear transformations provided by methods like t-SNE and UMAP.
Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information which optimally separates latent clusters using linear transformations that permit post hoc analysis to determine features that define these latent structures.
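The failure mode described above can be shown on a toy example. Two classes differ only along y, while x carries large but uninformative variance. The first principal component then follows x, whereas a linear discriminant direction follows y. This sketch uses a two-class Fisher discriminant with known labels, so it illustrates only the discriminant step. It does not reproduce iDA's iterative, unsupervised procedure. The data-generating settings are arbitrary choices made for illustration.

```python
import numpy as np


def first_pc(x):
    """Unit vector along the direction of maximum total variance."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]


def fisher_direction(x, labels):
    """Two-class Fisher discriminant: w proportional to S_w^{-1} (m1 - m0)."""
    x0, x1 = x[labels == 0], x[labels == 1]
    s_within = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    w = np.linalg.solve(s_within, x1.mean(axis=0) - x0.mean(axis=0))
    return w / np.linalg.norm(w)


rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n)
x = np.column_stack([
    rng.normal(0.0, 10.0, size=2 * n),                                # high variance, no class signal
    np.where(labels == 0, -2.0, 2.0) + rng.normal(0.0, 0.5, size=2 * n),  # low variance, separates classes
])
pc = first_pc(x)
lda = fisher_direction(x, labels)
print("PC1:", np.round(pc, 2), "Fisher:", np.round(lda, 2))
```

Projecting onto PC1 mixes the two classes, while projecting onto the Fisher direction separates them. The discriminant's weights also remain directly interpretable as feature loadings.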
Subjects
Algorithms , Humans , Principal Component Analysis
ABSTRACT
Population displacement may occur after natural disasters, permanently altering the demographic composition of the affected regions. Measuring this displacement is vital for both optimal postdisaster resource allocation and calculation of measures of public health interest such as mortality estimates. Here, we analyzed data generated by mobile phones and social media to estimate the weekly island-wide population at risk and within-island geographic heterogeneity of migration in Puerto Rico after Hurricane Maria. We compared these two data sources with population estimates derived from air travel records and census data. We observed a loss of population across all data sources throughout the study period; however, the magnitude and dynamics differ by data source. Census data predict a population loss of just over 129,000 from July 2017 to July 2018, a 4% decrease; air travel data predict a population loss of 168,295 for the same period, a 5% decrease; mobile phone-based estimates predict a loss of 235,375 from July 2017 to May 2018, an 8% decrease; and social media-based estimates predict a loss of 476,779 from August 2017 to August 2018, a 17% decrease. On average, municipalities with a smaller population lost a larger proportion of their population. Moreover, we infer that these municipalities experienced greater infrastructure damage, as measured by the proportion of unknown locations stemming from these regions. Finally, our analysis measures a general shift of population from rural to urban centers within the island. Passively collected data provide a promising supplement to current at-risk population estimation procedures; however, each data source has its own biases and limitations.
ABSTRACT
We introduce mirTarRnaSeq, an R/Bioconductor package for quantitative assessment of miRNA-mRNA relationships within sample cohorts. mirTarRnaSeq is a statistical package for exploring predicted or pre-hypothesized miRNA-mRNA relationships following target prediction. We present two use cases applying mirTarRnaSeq. First, to identify miRNA targets, we examined EBV miRNAs for interaction with the human and viral transcriptomes of stomach adenocarcinoma. This revealed enrichment of mRNA targets highly expressed in CD105+ endothelial cells, monocytes, CD4+ T cells, NK cells, CD19+ B cells, and CD34 cells. Next, to investigate miRNA-mRNA relationships in SARS-CoV-2 (COVID-19) infection over time, we used paired miRNA and RNA sequencing datasets of SARS-CoV-2-infected lung epithelial cells at three time points (4, 12, and 24 hours post-infection). mirTarRnaSeq identified evidence for human miRNAs targeting cytokine signaling and neutrophil regulation immune pathways from 4 to 24 hours after SARS-CoV-2 infection. Confirming the clinical relevance of these predictions, three of the immune-specific mRNA-miRNA relationships identified in human lung epithelial cells after SARS-CoV-2 infection were also observed to be differentially expressed in blood from patients with COVID-19. Overall, mirTarRnaSeq is a robust tool that can address a wide range of biological questions, providing improved prediction of miRNA-mRNA interactions.
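One simple way to quantify a predicted miRNA-mRNA relationship within a cohort is to check whether the miRNA and its predicted target have negatively correlated expression across samples. The plain-Python sketch below illustrates only that idea. It is not the mirTarRnaSeq API or its statistical model. The function names, the Pearson correlation, and the -0.5 cutoff are hypothetical choices.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def anticorrelated_pairs(mirna: Dict[str, List[float]],
                         mrna: Dict[str, List[float]],
                         predicted: List[Tuple[str, str]],
                         threshold: float = -0.5) -> List[Tuple[str, str, float]]:
    """Keep predicted (miRNA, mRNA) pairs whose expression is negatively correlated across samples."""
    hits = []
    for mir, gene in predicted:
        r = pearson(mirna[mir], mrna[gene])
        if r <= threshold:
            hits.append((mir, gene, r))
    return sorted(hits, key=lambda h: h[2])  # most negative first
```

A negative correlation is consistent with repression of the target, but it is not proof of it. Prediction algorithms supply the candidate pairs, and the cohort data only ranks them.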
Subjects
COVID-19 , MicroRNAs , COVID-19/genetics , Endothelial Cells , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , SARS-CoV-2
ABSTRACT
In the post-genomic era, thousands of putative noncoding regulatory regions have been identified, such as enhancers, promoters, long noncoding RNAs (lncRNAs), and a cadre of small peptides. These ever-growing catalogs require high-throughput assays to test their functionality at scale. Massively parallel reporter assays have greatly enhanced the understanding of noncoding DNA elements en masse. Here, we present a massively parallel RNA assay (MPRNA) that can assay 10,000 or more RNA segments for RNA-based functionality. We applied MPRNA to identify RNA-based nuclear localization domains harbored in lncRNAs. We examined a pool of 11,969 oligos densely tiling 38 human lncRNAs that were fused to a cytosolic transcript. After cell fractionation and barcode sequencing, we identified 109 unique RNA regions that significantly enriched this cytosolic transcript in the nucleus; among these, we identified a cytosine-rich motif. These nuclear enrichment sequences are highly conserved and over-represented in global nuclear fractionation sequencing data. Importantly, many of these regions were independently validated by single-molecule RNA fluorescence in situ hybridization. Overall, we demonstrate the utility of MPRNA for future investigation of RNA-based functionalities.
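After fractionation and barcode sequencing, a per-barcode nuclear enrichment score can be computed as a log ratio of library-size-normalized counts in the nuclear and cytosolic fractions. The sketch below uses counts per million with a pseudocount. The function name, the normalization, and the pseudocount are assumptions, and the statistical test that the authors used to call regions "significantly" enriched is not reproduced here.

```python
import math
from typing import Dict


def nuclear_enrichment(nuclear: Dict[str, int],
                       cytosolic: Dict[str, int],
                       pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(nuclear / cytosolic) per barcode after scaling each fraction to counts per million."""
    n_total = sum(nuclear.values())
    c_total = sum(cytosolic.values())
    scores = {}
    for barcode in nuclear.keys() | cytosolic.keys():
        n_cpm = nuclear.get(barcode, 0) / n_total * 1e6
        c_cpm = cytosolic.get(barcode, 0) / c_total * 1e6
        scores[barcode] = math.log2((n_cpm + pseudocount) / (c_cpm + pseudocount))
    return scores
```

Positive scores mark barcodes (and the lncRNA tiles they tag) that pull the reporter into the nucleus. In practice, replicate barcodes per tile would be combined before testing.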
Subjects
RNA, Long Noncoding/genetics , Cell Nucleus/genetics , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , In Situ Hybridization, Fluorescence , Sequence Analysis, RNA
ABSTRACT
Cell type specification during early nervous system development in Drosophila melanogaster requires precise regulation of gene expression in time and space. Resolving the programs driving neurogenesis has been a major challenge owing to the complexity and rapidity with which distinct cell populations arise. To resolve the cell type-specific gene expression dynamics in early nervous system development, we have sequenced the transcriptomes of purified neurogenic cell types across consecutive time points covering crucial events in neurogenesis. The resulting gene expression atlas comprises a detailed resource of global transcriptome dynamics that permits systematic analysis of how cells in the nervous system acquire distinct fates. We resolve known gene expression dynamics and uncover novel expression signatures for hundreds of genes among diverse neurogenic cell types, most of which remain unstudied. We also identified a set of conserved long noncoding RNAs (lncRNAs) that are regulated in a tissue-specific manner and exhibit spatiotemporal expression during neurogenesis with exquisite specificity. lncRNA expression is highly dynamic and demarcates specific subpopulations within neurogenic cell types. Our spatiotemporal transcriptome atlas provides a comprehensive resource for investigating the function of coding genes and noncoding RNAs during crucial stages of early neurogenesis.
Subjects
Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Nervous System/embryology , Neurogenesis/genetics , RNA, Long Noncoding/genetics , Animals , Cell Lineage , Drosophila melanogaster/metabolism , Flow Cytometry , Gene Expression Profiling , Gene Regulatory Networks , In Situ Hybridization, Fluorescence , Neuroglia/physiology , Phylogeny , Transcriptome
ABSTRACT
Quantifying the impact of natural disasters or epidemics is critical for guiding policy decisions and interventions. When the effects of an event are long-lasting and difficult to detect in the short term, the accumulated effects can be devastating. Mortality is one of the most reliably measured health outcomes, partly due to its unambiguous definition. As a result, excess mortality estimates are an increasingly effective approach for quantifying the effect of an event. However, indirect effects are often characterized by small but enduring increases in mortality rates, which presents a statistical challenge. This is compounded by sources of variability introduced by demographic changes, secular trends, seasonal and day-of-the-week effects, and natural variation. Here, we present a model that accounts for these sources of variability and characterizes concerning increases in mortality rates with smooth functions of time that provide statistical power. The model permits discontinuities in the smooth functions to model sudden increases due to direct effects. We implement a flexible estimation approach that permits both surveillance of concerning increases in mortality rates and careful characterization of the effect of a past event. We demonstrate our tools' utility by estimating excess mortality after hurricanes in the United States and Puerto Rico. We use Hurricane Maria as a case study to show appealing properties that are unique to our method compared with current approaches. Finally, we show the flexibility of our approach by detecting and quantifying the 2014 Chikungunya outbreak in Puerto Rico and the COVID-19 pandemic in the United States. We make our tools available through the excessmort R package available from https://cran.r-project.org/web/packages/excessmort/.
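The basic excess-mortality computation can be shown with a heavily simplified sketch. Expected weekly deaths are fitted on pre-event weeks using a log-linear trend plus one annual harmonic, and excess deaths are observed minus expected. This is not the excessmort model, which uses smooth functions of time with discontinuities, accounts for demographic change and day-of-week effects, and quantifies uncertainty. Here, ordinary least squares on log counts stands in for a proper count model, and all names are hypothetical.

```python
import numpy as np


def design(t, period=52.0):
    """Intercept, linear trend, and one annual harmonic for weekly data."""
    w = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w)])


def weekly_excess(counts, t, baseline):
    """Observed minus expected deaths, with the expected curve fitted on baseline weeks only."""
    X = design(t)
    coef, *_ = np.linalg.lstsq(X[baseline], np.log(counts[baseline]), rcond=None)
    expected = np.exp(X @ coef)
    return counts - expected
```

Summing `weekly_excess` over the weeks after an event gives a cumulative excess-death estimate. Fitting only on baseline weeks keeps the event itself from inflating the expected curve.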
Subjects
COVID-19 , Cyclonic Storms , Humans , Pandemics , Puerto Rico/epidemiology , United States/epidemiology
ABSTRACT
BACKGROUND: Quantifying the effect of natural disasters on society is critical for recovery of public health services and infrastructure. The death toll can be difficult to assess in the aftermath of a major disaster. In September 2017, Hurricane Maria caused massive infrastructural damage to Puerto Rico, but its effect on mortality remains contentious. The official death count is 64. METHODS: Using a representative, stratified sample, we surveyed 3299 randomly chosen households across Puerto Rico to produce an independent estimate of all-cause mortality after the hurricane. Respondents were asked about displacement, infrastructure loss, and causes of death. We calculated excess deaths by comparing our estimated post-hurricane mortality rate with official rates for the same period in 2016. RESULTS: From the survey data, we estimated a mortality rate of 14.3 deaths (95% confidence interval [CI], 9.8 to 18.9) per 1000 persons from September 20 through December 31, 2017. This rate yielded a total of 4645 excess deaths during this period (95% CI, 793 to 8498), equivalent to a 62% increase in the mortality rate as compared with the same period in 2016. However, this number is likely to be an underestimate because of survivor bias. The mortality rate remained high through the end of December 2017, and one third of the deaths were attributed to delayed or interrupted health care. Hurricane-related migration was substantial. CONCLUSIONS: This household-based survey suggests that the number of excess deaths related to Hurricane Maria in Puerto Rico is more than 70 times the official estimate. (Funded by the Harvard T.H. Chan School of Public Health and others.).
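The "more than 70 times" comparison follows from dividing the survey's excess-death estimate by the official count of 64. Applying the same ratio to the ends of the reported 95% CI shows how wide the corresponding range is:

```latex
\frac{4645}{64} \approx 72.6, \qquad \frac{793}{64} \approx 12.4, \qquad \frac{8498}{64} \approx 132.8
```

The headline multiple refers to the point estimate of 4645 excess deaths.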
Subjects
Cyclonic Storms , Disasters/statistics & numerical data , Health Services Accessibility/statistics & numerical data , Mortality , Adolescent , Adult , Age Distribution , Aged , Aged, 80 and over , Cause of Death , Child , Child, Preschool , Female , Humans , Male , Middle Aged , Mortality, Premature , Puerto Rico/epidemiology , Surveys and Questionnaires , Young Adult
ABSTRACT
As of mid-August 2020, more than 170 000 U.S. residents have died of coronavirus disease 2019 (COVID-19); however, the true number of deaths resulting from COVID-19, both directly and indirectly, is likely to be much higher. The proper attribution of deaths to this pandemic has a range of societal, legal, mortuary, and public health consequences. This article discusses the current difficulties of disaster death attribution and describes the strengths and limitations of relying on death counts from death certificates, estimations of indirect deaths, and estimations of excess mortality. Improving the tabulation of direct and indirect deaths on death certificates will require concerted efforts and consensus across medical institutions and public health agencies. In addition, actionable estimates of excess mortality will require timely access to standardized and structured vital registry data, which should be shared directly at the state level to ensure rapid response for local governments. Correct attribution of direct and indirect deaths and estimation of excess mortality are complementary goals that are critical to our understanding of the pandemic and its effect on human life.
Subjects
COVID-19/mortality , Pandemics , Registries , SARS-CoV-2 , Cause of Death/trends , Humans , Survival Rate/trends
ABSTRACT
The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics' public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance; such background reads arise because the experimental protocol is not perfectly specific. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning, the GC effect varies across experiments. The effect is strong enough that a substantial number of peaks are called differently when different laboratories perform experiments on the same cell line. However, accounting for GC-content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high-GC regions, which confounds real biological signals with unwanted variability. To address this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.
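The simplest possible GC correction illustrates both the idea and the pitfall described above. It stratifies genomic bins by GC content and divides each bin's coverage by the typical (median) coverage of bins with similar GC. This naive sketch is not the authors' method. Because binding sites are enriched in GC-rich regions, a correction like this can also absorb real signal, which is why the authors model GC effects on background and on signal separately. Names and the number of GC strata are assumptions.

```python
from statistics import median
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_normalized_coverage(seqs: List[str], coverage: List[float], n_bins: int = 10) -> List[float]:
    """Divide each genomic bin's coverage by the median coverage of bins in the same GC stratum."""
    strata = [min(int(gc_fraction(s) * n_bins), n_bins - 1) for s in seqs]
    by_stratum: Dict[int, List[float]] = {}
    for g, cov in zip(strata, coverage):
        by_stratum.setdefault(g, []).append(cov)
    expected = {g: median(vals) for g, vals in by_stratum.items()}
    return [cov / expected[g] if expected[g] > 0 else float("nan")
            for g, cov in zip(strata, coverage)]
```

A bin with three times the typical coverage for its GC level scores 3.0. A GC-rich bin with merely elevated background scores near 1.0.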
Subjects
Base Composition , DNA/metabolism , Sequence Analysis, DNA/methods , Algorithms , Binding Sites , Chromatin Immunoprecipitation , DNA/chemistry , False Positive Reactions , Genomics , High-Throughput Nucleotide Sequencing
ABSTRACT
With recent advances in sequencing technology, it is now feasible to measure DNA methylation at tens of millions of sites across the entire genome. In most applications, biologists are interested in detecting differentially methylated regions, composed of multiple sites with differing methylation levels among populations. However, current computational approaches for detecting such regions do not provide accurate statistical inference. A major challenge in reporting uncertainty is that detecting these regions involves a genome-wide scan, which must be accounted for. A further challenge is that sample sizes are limited by the cost of the technology. We have developed a new approach that overcomes these challenges and assesses uncertainty for differentially methylated regions in a rigorous manner. Region-level statistics are obtained by fitting a generalized least squares regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions. We develop an inferential approach, based on a pooled null distribution, that can be implemented even when as few as two samples per population are available. Here, we demonstrate the advantages of our method using both experimental data and Monte Carlo simulation. We find that the new method improves the specificity and sensitivity of lists of regions and accurately controls the false discovery rate.
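The region-level fit can be sketched as generalized least squares (GLS) with autoregressive errors between neighboring CpGs. The estimator is β̂ = (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹y. The sketch makes several simplifying assumptions. It uses a single AR(1) correlation with a known ρ, whereas the abstract describes a nested autoregressive structure. It applies an arcsine transform, arcsin(2p − 1), to methylation proportions, which is our choice. It assumes independence between samples, and it omits the pooled-null inference. Names are hypothetical.

```python
import numpy as np


def ar1_corr(n_sites, rho):
    """AR(1) correlation matrix: corr(site i, site j) = rho ** |i - j|."""
    idx = np.arange(n_sites)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def gls_region_effect(meth, group, rho=0.5):
    """GLS estimate of (intercept, group effect) for one candidate region.

    meth:  (n_samples, n_sites) methylation proportions in (0, 1)
    group: length-n_samples array of 0/1 population labels
    """
    n_samples, n_sites = meth.shape
    y = np.arcsin(2 * meth - 1).ravel()                    # assumed transform, stacked sample by sample
    X = np.column_stack([np.ones(n_samples * n_sites),
                         np.repeat(group, n_sites)])       # same stacking order as y
    sigma = np.kron(np.eye(n_samples), ar1_corr(n_sites, rho))  # AR(1) within sample, independent across
    sigma_inv = np.linalg.inv(sigma)
    return np.linalg.solve(X.T @ sigma_inv @ X, X.T @ sigma_inv @ y)
```

The group coefficient, on the transformed scale, plays the role of the region-level statistic. Modeling the correlation between neighboring CpGs keeps nearby sites from being counted as independent evidence.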
Subjects
DNA Methylation , Genomics/methods , Models, Statistical , Sequence Analysis, DNA/methods , Animals , Computer Simulation , Genomics/standards , Humans , Sequence Analysis, DNA/standards , Uncertainty
ABSTRACT
We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.
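Transcript quantifiers must divide reads that are compatible with several transcripts. The textbook approach is expectation-maximization (EM) over equivalence classes: sets of transcripts together with the number of reads compatible with exactly that set. The minimal sketch below shows only that general technique. It ignores transcript lengths, fragment models, and every bias correction, including the fragment GC-content correction highlighted above. It is not Salmon's dual-phase algorithm, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List


def em_abundance(eq_classes: Dict[FrozenSet[str], int],
                 transcripts: List[str],
                 n_iter: int = 200) -> Dict[str, float]:
    """Estimate transcript proportions from equivalence-class read counts with EM.

    Simplification: every transcript is treated as equally likely to produce a read
    (no effective-length, fragment, or bias terms).
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(eq_classes.values())
    for _ in range(n_iter):
        assigned = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads among its transcripts in proportion to theta.
        for members, count in eq_classes.items():
            norm = sum(theta[t] for t in members)
            for t in members:
                assigned[t] += count * theta[t] / norm
        # M-step: new proportions are the expected read shares.
        theta = {t: assigned[t] / total for t in transcripts}
    return theta
```

For example, suppose 30 reads are unique to A, 10 are unique to B, and 40 are shared. EM converges to proportions of 0.75 for A and 0.25 for B, so the shared reads are split in the same ratio as the unique evidence.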