ABSTRACT
Tumor immune cell compositions play a major role in response to immunotherapy, but the heterogeneity and dynamics of immune infiltrates in human cancer lesions remain poorly characterized. Here, we identify conserved intratumoral CD4 and CD8 T cell behaviors in scRNA-seq data from 25 melanoma patients. We discover a large population of CD8 T cells showing continuous progression from an early effector "transitional" into a dysfunctional T cell state. CD8 T cells that express a complete cytotoxic gene set are rare, and TCR sharing data suggest their independence from the transitional and dysfunctional cell states. Notably, we demonstrate that dysfunctional T cells are the major intratumoral proliferating immune cell compartment and that the intensity of the dysfunctional signature is associated with tumor reactivity. Our data demonstrate that CD8 T cells previously defined as exhausted are in fact a highly proliferating, clonal, and dynamically differentiating cell population within the human tumor microenvironment.
Subjects
CD8-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/metabolism , Melanoma/immunology , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/metabolism , Humans , Immunotherapy , Tumor-Infiltrating Lymphocytes/immunology , Programmed Cell Death 1 Receptor/immunology , Tumor Microenvironment/immunology
ABSTRACT
Chromosomes in proliferating metazoan cells undergo marked structural metamorphoses every cell cycle, alternating between highly condensed mitotic structures that facilitate chromosome segregation, and decondensed interphase structures that accommodate transcription, gene silencing and DNA replication. Here we use single-cell Hi-C (high-resolution chromosome conformation capture) analysis to study chromosome conformations in thousands of individual cells, and discover a continuum of cis-interaction profiles that finely position individual cells along the cell cycle. We show that chromosomal compartments, topological-associated domains (TADs), contact insulation and long-range loops, all defined by bulk Hi-C maps, are governed by distinct cell-cycle dynamics. In particular, DNA replication correlates with a build-up of compartments and a reduction in TAD insulation, while loops are generally stable from G1 to S and G2 phase. Whole-genome three-dimensional structural models reveal a radial architecture of chromosomal compartments with distinct epigenomic signatures. Our single-cell data therefore allow re-interpretation of chromosome conformation maps through the prism of the cell cycle.
Subjects
Cell Cycle/physiology , Mammalian Chromosomes/chemistry , Mammalian Chromosomes/metabolism , Genetic Epigenesis , Single-Cell Analysis/methods , Animals , Cell Compartmentation , Cell Cycle/genetics , Mammalian Chromosomes/genetics , Haploidy , Three-Dimensional Imaging , Mice , Mouse Embryonic Stem Cells/cytology , Reproducibility of Results
ABSTRACT
In epigenome-wide association studies (EWAS), different methylation profiles of distinct cell types may lead to false discoveries. We introduce ReFACTor, a method based on principal component analysis (PCA) and designed for the correction of cell type heterogeneity in EWAS. ReFACTor does not require knowledge of cell counts, and it provides improved estimates of cell type composition, resulting in improved power and control for false positives in EWAS. Corresponding software is available at http://www.cs.tau.ac.il/~heran/cozygene/software/refactor.html.
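As a rough illustration of the general idea of using principal components as a proxy for cell type composition, the following sketch runs a plain PCA on a simulated methylation matrix. It is not ReFACTor itself: the abstract does not describe the algorithm's steps, so the function name `top_principal_components`, the two-cell-type simulation, and all parameter values are assumptions made for this example.

```python
import numpy as np

def top_principal_components(methylation, k):
    """Return the top-k principal component scores of a samples x sites matrix."""
    # Center each site (column) so the components capture variation, not the mean.
    centered = methylation - methylation.mean(axis=0, keepdims=True)
    # Thin SVD: the columns of u * s are the per-sample PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

# Simulated toy data (hypothetical): 200 samples, 500 CpG sites, two cell types.
rng = np.random.default_rng(0)
n_samples, n_sites = 200, 500
proportion = rng.uniform(0.2, 0.8, size=n_samples)   # fraction of cell type A per sample
profile_a = rng.uniform(0.0, 1.0, size=n_sites)      # methylation levels in cell type A
profile_b = rng.uniform(0.0, 1.0, size=n_sites)      # methylation levels in cell type B
mixture = np.outer(proportion, profile_a) + np.outer(1.0 - proportion, profile_b)
methylation = mixture + rng.normal(0.0, 0.02, size=mixture.shape)

pcs = top_principal_components(methylation, k=2)
# Absolute correlation between the first component and the simulated proportion.
corr = abs(np.corrcoef(pcs[:, 0], proportion)[0, 1])
```

In this simulation the first component tracks the mixing proportion closely, which is why such components could be included as covariates in an association test when cell counts are unavailable.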
Subjects
DNA Methylation/genetics , Epigenomics/methods , Genetic Heterogeneity , Genome-Wide Association Study/methods , Principal Component Analysis , Algorithms , Computer Simulation , CpG Islands/genetics , Epigenomics/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Humans , Leukocytes/cytology , Leukocytes/metabolism
ABSTRACT
Genomic imprinting is an important regulatory mechanism that silences one of the parental copies of a gene. To systematically characterize this phenomenon, we analyze tissue specificity of imprinting from allelic expression data in 1582 primary tissue samples from 178 individuals from the Genotype-Tissue Expression (GTEx) project. We characterize imprinting in 42 genes, including both novel and previously identified genes. Tissue specificity of imprinting is widespread, and gender-specific effects are revealed in a small number of genes in muscle with stronger imprinting in males. IGF2 shows maternal expression in the brain instead of the canonical paternal expression elsewhere. Imprinting appears to have only a subtle impact on tissue-specific expression levels, with genes lacking a systematic expression difference between tissues with imprinted and biallelic expression. In summary, our systematic characterization of imprinting in adult tissues highlights variation in imprinting between genes, individuals, and tissues.
Subjects
Genomic Imprinting , Genomics , Adult , Alleles , Cluster Analysis , DNA Methylation , Nucleic Acid Databases , Female , Gene Expression Regulation , Genetic Variation , Genotype , Humans , Male , Organ Specificity/genetics , Single Nucleotide Polymorphism , Reproducibility of Results , Sex Factors
ABSTRACT
Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.
Subjects
Linkage Disequilibrium , Genetic Models , Single Nucleotide Polymorphism , Software , Algorithms , Genetic Markers , Population Genetics , Human Genome , Humans , Phylogeography/methods , Principal Component Analysis , Spain
ABSTRACT
Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data are used for numerous applications, including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has proven to be computationally difficult. To overcome this, many state-of-the-art methods estimate the probability of IBD between each pair of haplotypes separately. While computationally efficient, these methods fail to leverage the clique structure of IBD, resulting in less powerful IBD identification, especially for small IBD segments.
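The clique structure mentioned above follows from transitivity: at a single locus, if haplotypes A and B are IBD and B and C are IBD, then A and C should be IBD as well. The sketch below is only an illustration of that property, not a method from the abstract; the function names `ibd_components` and `missing_clique_pairs` and the toy pair list are hypothetical. It groups haplotypes by pairwise calls at one locus and reports pairs that transitivity implies but the pairwise calls missed.

```python
from itertools import combinations

def ibd_components(n_haplotypes, ibd_pairs):
    """Group haplotypes into connected components of the pairwise IBD graph."""
    parent = list(range(n_haplotypes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in ibd_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for h in range(n_haplotypes):
        groups.setdefault(find(h), []).append(h)
    return sorted(groups.values())

def missing_clique_pairs(n_haplotypes, ibd_pairs):
    """Return pairs implied by transitivity but absent from the pairwise calls."""
    called = {frozenset(pair) for pair in ibd_pairs}
    missing = []
    for group in ibd_components(n_haplotypes, ibd_pairs):
        for a, b in combinations(group, 2):
            if frozenset((a, b)) not in called:
                missing.append((a, b))
    return missing

# Toy pairwise calls at one locus (hypothetical): 0-1, 1-2 and 3-4 were called IBD.
pairs = [(0, 1), (1, 2), (3, 4)]
components = ibd_components(5, pairs)
missing = missing_clique_pairs(5, pairs)
```

Here haplotypes 0, 1 and 2 form one group, and the pair (0, 2) is flagged because a purely pairwise caller did not report it even though the other two calls imply it.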
Subjects
Asthma/genetics , Computational Biology/methods , Population Genetics , Human Genome , Haplotypes/genetics , Single Nucleotide Polymorphism/genetics , Asthma/epidemiology , Cohort Studies , Computer Simulation , Hispanic or Latino/genetics , Humans , Probability
ABSTRACT
MOTIVATION: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). RESULTS: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.
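To make the idea of HMM-based local ancestry inference concrete, here is a minimal sketch of Viterbi decoding for a single haplotype. It is a deliberately simplified toy and not LAMP-LD or LAMP-HAP: it assumes emissions depend only on per-population allele frequencies (so it ignores linkage disequilibrium, the window-based framework, and trio constraints), and the function name `viterbi_ancestry`, the frequencies, and the switch probability are all hypothetical.

```python
import numpy as np

def viterbi_ancestry(alleles, freqs, switch_prob):
    """Most likely ancestry path for one haplotype under a toy HMM.

    alleles: length-L sequence of 0/1 alleles.
    freqs: K x L array; freqs[k, j] is the frequency of allele 1 at SNP j in population k.
    switch_prob: probability of changing ancestry between adjacent SNPs (requires K >= 2).
    """
    alleles = np.asarray(alleles)
    freqs = np.asarray(freqs, dtype=float)
    n_pops, n_snps = freqs.shape
    # Emission log-probabilities from the allele frequencies.
    emit = np.log(np.where(alleles == 1, freqs, 1.0 - freqs))
    # Transitions: stay with probability 1 - switch_prob, otherwise switch uniformly.
    trans = np.full((n_pops, n_pops), switch_prob / (n_pops - 1))
    np.fill_diagonal(trans, 1.0 - switch_prob)
    log_trans = np.log(trans)

    score = np.log(1.0 / n_pops) + emit[:, 0]
    back = np.zeros((n_pops, n_snps), dtype=int)
    for j in range(1, n_snps):
        cand = score[:, None] + log_trans   # cand[i, k]: best score ending in i, then moving to k
        back[:, j] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emit[:, j]

    # Trace back the best path from the final SNP.
    path = np.empty(n_snps, dtype=int)
    path[-1] = score.argmax()
    for j in range(n_snps - 1, 0, -1):
        path[j - 1] = back[path[j], j]
    return path

# Toy example: allele 1 is common in population 0 and rare in population 1.
freqs = [[0.9] * 6, [0.1] * 6]
path = viterbi_ancestry([1, 1, 1, 0, 0, 0], freqs, switch_prob=0.1)
```

With these toy inputs the decoded path assigns the first three SNPs to population 0 and the last three to population 1, with a single ancestry switch in the middle.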
Subjects
Algorithms , Population Genetics , Hispanic or Latino/genetics , Gene Flow , Population Genetics/methods , Haplotypes , Humans , North American Indians/genetics , Linkage Disequilibrium , Markov Chains , Mexico , Puerto Rico , United States , White People/genetics
ABSTRACT
The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitations. Common to practically all the methods is the processing of single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each of the samples, and provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes: We demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed "binning") algorithm which operates on multiple samples simultaneously, leveraging on the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough.
Subjects
Computational Biology/methods , Metagenomics , Algorithms , Bacterial Proteins/metabolism , Cluster Analysis , Human Genome , Genome-Wide Association Study , Humans , Likelihood Functions , Statistical Models , Phenotype , Phylogeny , Principal Component Analysis , Probability , DNA Sequence Analysis
ABSTRACT
The origin and history of the Ashkenazi Jewish population have long been of great interest, and advances in high-throughput genetic analysis have recently provided a new approach for investigating these topics. We and others have argued on the basis of genome-wide data that the Ashkenazi Jewish population derives its ancestry from a combination of sources tracing to both Europe and the Middle East. It has been claimed, however, through a reanalysis of some of our data, that a large part of the ancestry of the Ashkenazi population originates with the Khazars, a Turkic-speaking group that lived to the north of the Caucasus region ~1,000 years ago. Because the Khazar population has left no obvious modern descendants that could enable a clear test for a contribution to Ashkenazi Jewish ancestry, the Khazar hypothesis has been difficult to examine using genetics. Furthermore, because only limited genetic data have been available from the Caucasus region, and because these data have been concentrated in populations that are genetically close to populations from the Middle East, the attribution of any signal of Ashkenazi-Caucasus genetic similarity to Khazar ancestry rather than shared ancestral Middle Eastern ancestry has been problematic. Here, through integration of genotypes from newly collected samples with data from several of our past studies, we have assembled the largest data set available to date for assessment of Ashkenazi Jewish genetic origins. This data set contains genome-wide single-nucleotide polymorphisms in 1,774 samples from 106 Jewish and non-Jewish populations that span the possible regions of potential Ashkenazi ancestry: Europe, the Middle East, and the region historically associated with the Khazar Khaganate. The data set includes 261 samples from 15 populations from the Caucasus region and the region directly to its north, samples that have not previously been included alongside Ashkenazi Jewish samples in genomic studies. 
Employing a variety of standard techniques for the analysis of population-genetic structure, we found that Ashkenazi Jews share the greatest genetic ancestry with other Jewish populations and, among non-Jewish populations, with groups from Europe and the Middle East. No particular similarity of Ashkenazi Jews to populations from the Caucasus is evident, particularly populations that most closely represent the Khazar region. Thus, analysis of Ashkenazi Jews together with a large sample from the region of the Khazar Khaganate corroborates the earlier results that Ashkenazi Jews derive their ancestry primarily from populations of the Middle East and Europe, that they possess considerable shared ancestry with other Jewish populations, and that there is no indication of a significant genetic contribution either from within or from north of the Caucasus region.
Subjects
Jews/genetics , Ancient Lands/ethnology , Europe/ethnology , Female , Population Genetics/methods , Genome-Wide Association Study , Ancient History , Medieval History , Humans , Jews/history , Male , Middle East/ethnology , Single Nucleotide Polymorphism/genetics
ABSTRACT
Embryonic development involves massive proliferation and differentiation of cell lineages. This must be supported by chromosome replication and epigenetic reprogramming, but how proliferation and cell fate acquisition are balanced in this process is not well understood. Here we use single cell Hi-C to map chromosomal conformations in post-gastrulation mouse embryo cells and study their distributions and correlations with matching embryonic transcriptional atlases. We find that embryonic chromosomes show a remarkably strong cell cycle signature. Despite that, replication timing, chromosome compartment structure, topological associated domains (TADs) and promoter-enhancer contacts are shown to be variable between distinct epigenetic states. About 10% of the nuclei are identified as primitive erythrocytes, showing exceptionally compact and organized compartment structure. The remaining cells are broadly associated with ectoderm and mesoderm identities, showing only mild differentiation of TADs and compartment structures, but more specific localized contacts in hundreds of ectoderm and mesoderm promoter-enhancer pairs. The data suggest that while fully committed embryonic lineages can rapidly acquire specific chromosomal conformations, most embryonic cells are showing plastic signatures driven by complex and intermixed enhancer landscapes.
Subjects
Gastrulation , Nucleic Acid Regulatory Sequences , Female , Pregnancy , Animals , Mice , Molecular Conformation , Genetic Promoter Regions/genetics , Chromosomes
ABSTRACT
Crosstalk between neighboring cells underlies many biological processes, including cell signaling, proliferation and differentiation. Current single-cell genomic technologies profile each cell separately after tissue dissociation, losing information on cell-cell interactions. In the present study, we present an approach for sequencing physically interacting cells (PIC-seq), which combines cell sorting of physically interacting cells (PICs) with single-cell RNA-sequencing. Using computational modeling, PIC-seq systematically maps in situ cellular interactions and characterizes their molecular crosstalk. We apply PIC-seq to interrogate diverse interactions including immune-epithelial PICs in neonatal murine lungs. Focusing on interactions between T cells and dendritic cells (DCs) in vitro and in vivo, we map T cell-DC interaction preferences, and discover regulatory T cells as a major T cell subtype interacting with DCs in mouse draining lymph nodes. Analysis of T cell-DC pairs reveals an interaction-specific program between pathogen-presenting migratory DCs and T cells. PIC-seq provides a direct and broadly applicable technology to characterize intercellular interaction-specific pathways at high resolution.
Subjects
Dendritic Cells/cytology , Gene Expression Profiling/methods , Single-Cell Analysis/methods , T-Lymphocytes/cytology , Algorithms , Animals , Newborn Animals , Cell Communication , Cultured Cells , Computational Biology , Dendritic Cells/chemistry , Female , Flow Cytometry , Lung/chemistry , Lung/cytology , Mice , RNA Sequence Analysis , T-Lymphocytes/chemistry
ABSTRACT
scRNA-seq profiles each represent a highly partial sample of mRNA molecules from a unique cell that can never be resampled, and robust analysis must separate the sampling effect from biological variance. We describe a methodology for partitioning scRNA-seq datasets into metacells: disjoint and homogeneous groups of profiles that could have been resampled from the same cell. Unlike clustering analysis, our algorithm specializes in obtaining granular as opposed to maximal groups. We show how to use metacells as building blocks for complex quantitative transcriptional maps while avoiding data smoothing. Our algorithms are implemented in the MetaCell R/C++ software package.
Subjects
RNA Sequence Analysis , Single-Cell Analysis , Software , Algorithms , CD8-Positive T-Lymphocytes/metabolism , Genomics/methods
ABSTRACT
AIMS: The European Collaborative Project on Inflammation and Vascular Wall Remodeling in Atherosclerosis - Intravascular Ultrasound (ATHEROREMO-IVUS) study was designed as an exploratory clinical study in order to investigate the associations between genetic variation, coronary atherosclerosis phenotypes, and plaque vulnerability as determined by IVUS. METHODS AND RESULTS: The ATHEROREMO-IVUS study was a prospective, observational study of 581 patients with stable angina pectoris or acute coronary syndrome (ACS) who were referred for coronary angiography to the Thoraxcenter, Rotterdam, enriched with 265 IBIS-2 participants (total population, n=846). Prior to catheterisation, blood samples were drawn for genetic analyses. During the catheterisation procedure, IVUS was performed in a non-culprit coronary artery. The primary endpoint was the presence of vulnerable plaque as determined by IVUS virtual histology (VH). In addition, we performed a genome-wide association study of plaque morphology. We observed strong signals associated with plaque morphology in several chromosomal regions: twelve SNPs (rs17300022, rs6904106, rs17177818, rs2248165, rs2477539, rs16865681, rs2396058, rs4753663, rs4082252, rs6932, rs12862206, rs6780676) in or near eight different genes (GNA12, NMBR, SFMBT2, CUL3, SESN3, SLC22A25, EFBN2, SEC62) were most significant. CONCLUSIONS: In conclusion, we found twelve SNPs in or in the proximity of eight genes, which were possibly associated with markers of vulnerable plaque. ClinicalTrials.gov Identifier: NCT01789411.
Subjects
Atherosclerosis , Coronary Artery Disease , Atherosclerotic Plaque , Coronary Angiography , Coronary Vessels , Genome-Wide Association Study , Heat-Shock Proteins , Humans , Membrane Transport Proteins , Prospective Studies , Interventional Ultrasonography
ABSTRACT
Modeling human genetic variation along the continuous geographic space is a new research direction that has been stirring interest in the community during the past few years. Multiple recent works suggested different probabilistic models for the relation between geography and genetic sequence, and applied them to geographic localization, detection of selection, and correction of confounding in Genome-Wide Association Studies (GWAS). Prior to these developments, continuous representations of genetic structure were produced almost exclusively using dimensionality reduction techniques, mostly principal component analysis (PCA). Although fast and effective in some tasks, PCA suffers from multiple disadvantages, primarily stemming from the lack of an explicit underlying genetic model. We begin this note by explaining the implicit spatio-genetic model that underlies PCA. Our presentation provides insights into some of the recently proposed spatial models; particularly, we show that two of these models can be formulated as modifications of PCA, each removing one of PCA's limitations in the context of genetic analysis. We build on one of the models to derive a nonsupervised procedure for the inference of spatial structure, and empirically demonstrate that it outperforms PCA in spatial inference. We then go on to review a few additional recent works in this unifying perspective.
Subjects
Genetic Variation , Genetic Models , Algorithms , Humans , Statistical Models , Principal Component Analysis
ABSTRACT
BACKGROUND: Disease risk and incidence differ between males and females, and sex is an important component of any investigation of the determinants of phenotypes or disease etiology. Further striking differences between men and women are known, for instance, at the metabolic level. The extent to which men and women vary at the level of the epigenome, however, is not well documented. DNA methylation is the best known epigenetic mechanism to date. RESULTS: In order to shed light on epigenetic differences, we compared autosomal DNA methylation levels between men and women in blood in a large prospective European cohort of 1799 subjects, and replicated our findings in three independent European cohorts. We identified and validated 1184 CpG sites to be differentially methylated between men and women and observed that these CpG sites were distributed across all autosomes. We showed that some of the differentially methylated loci also exhibit differential gene expression between men and women. Finally, we found that the differentially methylated loci are enriched among imprinted genes, and that they are concentrated in CpG island shores. CONCLUSION: Our epigenome-wide association study indicates that differences between men and women are so substantial that they should be considered in the design and analysis of future studies.