Results 1 - 19 of 19
1.
Am J Hum Genet ; 108(12): 2354-2367, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34822764

ABSTRACT

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.


Subjects
Genetic Variation, Genome-Wide Association Study, Genetic Models, Bayes Theorem, Female, Humans, Male, Phenotype
2.
Am J Hum Genet ; 106(5): 611-622, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32275883

ABSTRACT

Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining together both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.


Subjects
Disease/genetics, Genome-Wide Association Study, Phenotype, Asthma/genetics, Factual Databases, Female, Medical Genetics, Genotype, Humans, Male, Neoplasms/genetics, United Kingdom
3.
RNA ; 22(10): 1535-49, 2016 10.
Article in English | MEDLINE | ID: mdl-27492256

ABSTRACT

Myelodysplastic syndromes (MDS) are heterogeneous myeloid disorders with prevalent mutations in several splicing factors, but the splicing programs linked to specific mutations or MDS in general remain to be systematically defined. We applied RASL-seq, a sensitive and cost-effective platform, to interrogate 5502 annotated splicing events in 169 samples from MDS patients or healthy individuals. We found that splicing signatures associated with normal hematopoietic lineages are largely related to cell signaling and differentiation programs, whereas MDS-linked signatures are primarily involved in cell cycle control and DNA damage responses. Despite the shared roles of affected splicing factors in the 3' splice site definition, mutations in U2AF1, SRSF2, and SF3B1 affect divergent splicing programs, and interestingly, the affected genes fall into converging cancer-related pathways. A risk score derived from 11 splicing events appears to be independently associated with an MDS prognosis and AML transformation, suggesting potential clinical relevance of altered splicing patterns in MDS.


Subjects
Mutation, Myelodysplastic Syndromes/genetics, Phosphoproteins/genetics, RNA Splice Sites, RNA Splicing Factors/genetics, Serine-Arginine Splicing Factors/genetics, Splicing Factor U2AF/genetics, Adult, Aged, Aged 80 and over, Case-Control Studies, Female, Humans, Male, Middle Aged, Myelodysplastic Syndromes/pathology, RNA Splicing
4.
Blood ; 128(25): 2931-2940, 2016 12 22.
Article in English | MEDLINE | ID: mdl-27815263

ABSTRACT

ROR1 is an oncoembryonic orphan receptor found on chronic lymphocytic leukemia (CLL) B cells, but not on normal postpartum tissues. ROR1 is a receptor for Wnt5a that may complex with TCL1, a coactivator of AKT that is able to promote development of CLL. We found the CLL cells of a few patients expressed negligible ROR1 (ROR1Neg), but expressed TCL1A at levels comparable to those of samples that expressed ROR1 (ROR1Pos). Transcriptome analyses revealed that ROR1Neg cases generally could be distinguished from those that were ROR1Pos in unsupervised gene-expression clustering analysis. Gene-set enrichment analyses demonstrated that ROR1Neg CLL had lower expression and activation of AKT signaling pathways relative to ROR1Pos CLL, similar to what was noted for leukemia that respectively developed in TCL1 vs ROR1xTCL1 transgenic mice. In contrast to its effect on ROR1Pos CLL, Wnt5a did not enhance the proliferation, chemotaxis, or survival of ROR1Neg CLL. We examined the CLL cells from 1568 patients, which we randomly assigned to a training or validation set of 797 or 771 cases, respectively. Using recursive partitioning, we defined a threshold for ROR1 surface expression that could segregate samples of the training set into ROR1-Hi vs ROR1-Lo subgroups that differed significantly in their median treatment-free survival (TFS). Using this threshold, we found that ROR1-Hi cases had a significantly shorter median TFS and overall survival than ROR1-Lo cases in the validation set. These data demonstrate that expression of ROR1 may promote leukemia-cell activation and survival and enhance disease progression in patients with CLL.


Subjects
Disease Progression, Chronic B-Cell Lymphocytic Leukemia/genetics, Chronic B-Cell Lymphocytic Leukemia/pathology, Receptor Tyrosine Kinase-like Orphan Receptors/metabolism, Cell Movement/genetics, Cell Proliferation/genetics, Cell Survival/genetics, Disease-Free Survival, Gene Expression Profiling, Leukemic Gene Expression Regulation, Gene Regulatory Networks, Humans, Immunoglobulin Heavy Chains/genetics, Immunoglobulin Variable Region/genetics, Multivariate Analysis, Receptor Tyrosine Kinase-like Orphan Receptors/genetics
5.
Proc Natl Acad Sci U S A ; 112(23): E3050-7, 2015 Jun 09.
Article in English | MEDLINE | ID: mdl-26015570

ABSTRACT

Tumor-specific molecules are needed across diverse areas of oncology for use in early detection, diagnosis, prognosis and therapy. Large and growing public databases of transcriptome sequencing data (RNA-seq) derived from tumors and normal tissues hold the potential of yielding tumor-specific molecules, but because the data are new they have not been fully explored for this purpose. We have developed custom bioinformatic algorithms and used them with 296 high-grade serous ovarian (HGS-OvCa) tumor and 1,839 normal RNA-seq datasets to identify mRNA isoforms with tumor-specific expression. We rank prioritized isoforms by likelihood of being expressed in HGS-OvCa tumors and not in normal tissues and analyzed 671 top-ranked isoforms by high-throughput RT-qPCR. Six of these isoforms were expressed in a majority of the 12 tumors examined but not in 18 normal tissues. An additional 11 were expressed in most tumors and only one normal tissue, which in most cases was fallopian or colon. Of the 671 isoforms, the topmost 5% (n = 33) ranked based on having tumor-specific or highly restricted normal tissue expression by RT-qPCR analysis are enriched for oncogenic, stem cell/cancer stem cell, and early development loci--including ETV4, FOXM1, LSR, CD9, RAB11FIP4, and FGFRL1. Many of the 33 isoforms are predicted to encode proteins with unique amino acid sequences, which would allow them to be specifically targeted for one or more therapeutic strategies--including monoclonal antibodies and T-cell-based vaccines. The systematic process described herein is readily and rapidly applicable to the more than 30 additional tumor types for which sufficient amounts of RNA-seq already exist.


Subjects
Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/therapy, Messenger RNA/genetics, Transcriptome, Female, High-Throughput Nucleotide Sequencing, Humans, Ovarian Neoplasms/genetics, Real-Time Polymerase Chain Reaction
6.
PLoS Comput Biol ; 11(3): e1004105, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25768983

ABSTRACT

Mutations in the splicing factor SF3B1 are found in several cancer types and have been associated with various splicing defects. Using transcriptome sequencing data from chronic lymphocytic leukemia, breast cancer and uveal melanoma tumor samples, we show that hundreds of cryptic 3' splice sites (3'SSs) are used in cancers with SF3B1 mutations. We define the necessary sequence context for the observed cryptic 3' SSs and propose that cryptic 3'SS selection is a result of SF3B1 mutations causing a shift in the sterically protected region downstream of the branch point. While most cryptic 3'SSs are present at low frequency (<10%) relative to nearby canonical 3'SSs, we identified ten genes that preferred out-of-frame cryptic 3'SSs. We show that cancers with mutations in the SF3B1 HEAT 5-9 repeats use cryptic 3'SSs downstream of the branch point and provide both a mechanistic model consistent with published experimental data and affected targets that will guide further research into the oncogenic effects of SF3B1 mutation.


Subjects
Mutation/genetics, Mutation/physiology, Neoplasms/genetics, Phosphoproteins/genetics, RNA Splice Sites/genetics, U2 Small Nuclear Ribonucleoprotein/genetics, Transcriptome/genetics, Humans, Neoplasms/metabolism, RNA Splicing Factors, RNA Sequence Analysis
7.
BMC Bioinformatics ; 15: 173, 2014 Jun 08.
Article in English | MEDLINE | ID: mdl-24909518

ABSTRACT

BACKGROUND: Human disease often arises as a consequence of alterations in a set of associated genes rather than alterations to a set of unassociated individual genes. Most previous microarray-based meta-analyses identified disease-associated genes or biomarkers independent of genetic interactions. Therefore, in this study, we present the first meta-analysis method capable of taking gene combination effects into account to efficiently identify associated biomarkers (ABs) across different microarray platforms. RESULTS: We propose a new meta-analysis approach called MiningABs to mine ABs across different array-based datasets. The similarity between paired probe sequences is quantified as a bridge to connect these datasets together. The ABs can be subsequently identified from an "improved" common logit model (c-LM) by combining several sibling-like LMs in a heuristic genetic algorithm selection process. Our approach is evaluated with two sets of gene expression datasets: i) 4 esophageal squamous cell carcinoma and ii) 3 hepatocellular carcinoma datasets. Based on an unbiased reciprocal test, we demonstrate that each gene in a group of ABs is required to maintain high cancer sample classification accuracy, and we observe that ABs are not limited to genes common to all platforms. Investigating the ABs using Gene Ontology (GO) enrichment, literature survey, and network analyses indicated that our ABs are not only strongly related to cancer development but also highly connected in a diverse network of biological interactions. CONCLUSIONS: The proposed meta-analysis method called MiningABs is able to efficiently identify ABs from different independently performed array-based datasets, and we show its validity in cancer biology via GO enrichment, literature survey and network analyses. We postulate that the ABs may facilitate novel target and drug discovery, leading to improved clinical treatment. 
Java source code, tutorial, example and related materials are available at "http://sourceforge.net/projects/miningabs/".


Subjects
Data Mining/methods, Gene Expression Profiling, Gene Expression, Genetic Markers/genetics, Algorithms, Tumor Biomarkers/genetics, Gene Expression Profiling/methods, Humans, Neoplasms/genetics
8.
BMC Bioinformatics ; 11: 462, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20843365

ABSTRACT

BACKGROUND: Models of sequence evolution typically assume that different nucleotide positions evolve independently. This assumption is widely appreciated to be an over-simplification. The best known violations involve biases due to adjacent nucleotides. There have also been suggestions that biases exist at larger scales; however, this possibility has not been systematically explored. RESULTS: To address this, we have developed a method which identifies over- and under-represented substitution patterns and assesses their overall impact on the evolution of genome composition. Our method is designed to account for biases at smaller pattern sizes, removing their effects. We used this method to investigate context bias in the human lineage after the divergence from chimpanzee. We examined bias effects in substitution patterns between 2 and 5 bp long and found significant effects at all sizes. This included some individual three and four base pair patterns with relatively large biases. We also found that bias effects vary across the genome, differing between transposons and non-transposons, between different classes of transposons, and also near and far from genes. CONCLUSIONS: We found that nucleotides beyond the immediately adjacent one are responsible for substantial context effects, and that these biases vary across the genome.


Subjects
Human Genome, Animals, Base Composition, Base Sequence, Computational Biology, DNA Transposable Elements, Humans, Nucleotides/genetics, Single Nucleotide Polymorphism
9.
Nat Commun ; 11(1): 2928, 2020 06 10.
Article in English | MEDLINE | ID: mdl-32522985

ABSTRACT

Structural variants (SVs) and short tandem repeats (STRs) are important sources of genetic diversity but are not routinely analyzed in genetic studies because they are difficult to accurately identify and genotype. Because SVs and STRs range in size and type, it is necessary to apply multiple algorithms that incorporate different types of evidence from sequencing data and employ complex filtering strategies to discover a comprehensive set of high-quality and reproducible variants. Here we assemble a set of 719 deep whole genome sequencing (WGS) samples (mean 42×) from 477 distinct individuals which we use to discover and genotype a wide spectrum of SV and STR variants using five algorithms. We use 177 unique pairs of genetic replicates to identify factors that affect variant call reproducibility and develop a systematic filtering strategy to create one of the most complete and well characterized maps of SVs and STRs to date.


Subjects
Microsatellite Repeats/genetics, Whole Genome Sequencing/methods, Algorithms, Computational Biology, Genotype, Haplotypes/genetics, High-Throughput Nucleotide Sequencing, Humans
10.
Nat Commun ; 11(1): 2927, 2020 06 10.
Article in English | MEDLINE | ID: mdl-32522982

ABSTRACT

Structural variants (SVs) and short tandem repeats (STRs) comprise a broad group of diverse DNA variants which vastly differ in their sizes and distributions across the genome. Here, we identify genomic features of SV classes and STRs that are associated with gene expression and complex traits, including their locations relative to eGenes, likelihood of being associated with multiple eGenes, associated eGene types (e.g., coding, noncoding, level of evolutionary constraint), effect sizes, linkage disequilibrium with tagging single nucleotide variants used in GWAS, and likelihood of being associated with GWAS traits. We identify a set of high-impact SVs/STRs associated with the expression of three or more eGenes via chromatin loops and show that they are highly enriched for being associated with GWAS traits. Our study provides insights into the genomic properties of structural variant classes and short tandem repeats that are associated with gene expression and human traits.


Subjects
Microsatellite Repeats/genetics, Cell Line, Genetic Variation/genetics, Genome-Wide Association Study, Humans, Linkage Disequilibrium/genetics, Multifactorial Inheritance, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics
11.
Nat Commun ; 10(1): 4064, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31492854

ABSTRACT

Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss of function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.


Subjects
Adipocytes/metabolism, Biological Specimen Banks, Genetic Association Studies/methods, Genome-Wide Association Study/methods, 3T3-L1 Cells, Adipocytes/cytology, Animals, Cultured Cells, Type 3 Cyclic Nucleotide Phosphodiesterases/genetics, Genetic Predisposition to Disease/genetics, Humans, Mice, Obesity/genetics, Phenotype, Single Nucleotide Polymorphism, United Kingdom
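
As a rough illustration of the decomposition idea in this abstract, the sketch below computes only the leading singular component (a rank-1 truncated SVD) of a small matrix by power iteration. The toy values, the function name `leading_component`, and the variants-by-phenotypes layout are assumptions made for illustration; this is not the DeGAs implementation, which the abstract does not detail.

```python
import math
import random


def leading_component(matrix, iterations=200, seed=0):
    """Return (sigma, u, v): the top singular value and unit singular vectors.

    Power iteration on A^T A, i.e. a rank-1 truncated SVD of `matrix`.
    `matrix` is a list of equal-length rows (hypothetical summary statistics,
    e.g. rows = variants, columns = phenotypes).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])

    # Random unit-length starting vector over the columns.
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(iterations):
        # u = A v
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        # w = A^T u, then renormalize to get the next estimate of v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Final left vector and singular value from the converged v.
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


if __name__ == "__main__":
    toy_z = [[3.0, 4.0], [6.0, 8.0]]  # made-up association z-scores
    sigma, u, v = leading_component(toy_z)
    print(sigma, u, v)
```

Because `u` and `v` are unit length, their squared entries sum to one, so they can be read as the relative weight of each row (variant) or column (phenotype) in the component. Further components would require deflating the matrix or using a full SVD routine.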
12.
Nat Genet ; 51(10): 1506-1517, 2019 10.
Article in English | MEDLINE | ID: mdl-31570892

ABSTRACT

The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.


Subjects
Atrial Fibrillation/genetics, Atrial Fibrillation/pathology, Homeobox Protein Nkx-2.5/genetics, Homeobox Protein Nkx-2.5/metabolism, Single Nucleotide Polymorphism, Quantitative Trait Loci, Transcriptional Regulatory Elements, Adolescent, Adult, Aged, Aged 80 and over, Alleles, Child, Electrocardiography, Epigenomics, Female, Genetic Predisposition to Disease, Human Genome, Genome-Wide Association Study, Humans, Induced Pluripotent Stem Cells/metabolism, Induced Pluripotent Stem Cells/pathology, Male, Middle Aged, Cardiac Myocytes/metabolism, Cardiac Myocytes/pathology, Phenotype, Protein Binding, Transcriptome, Young Adult
13.
Nat Commun ; 9(1): 1612, 2018 04 24.
Article in English | MEDLINE | ID: mdl-29691392

ABSTRACT

Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. Here, we characterize the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and find 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We perform phenome-wide analyses and directly measure the effect in homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated as being protective against disease or associated with at least one phenotype in our study. We find several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.


Subjects
Nucleic Acid Databases, Proteins/genetics, Sequence Deletion, Nucleic Acid Databases/statistics & numerical data, Genome-Wide Association Study, Humans, Phenotype, United Kingdom
14.
Genetics ; 207(4): 1301-1312, 2017 12.
Article in English | MEDLINE | ID: mdl-29074555

ABSTRACT

Expression quantitative trait loci (eQTL) studies have typically used single-variant association analysis to identify genetic variants correlated with gene expression. However, this approach has several drawbacks: causal variants cannot be distinguished from nonfunctional variants in strong linkage disequilibrium, combined effects from multiple causal variants cannot be captured, and low-frequency (<5% MAF) eQTL variants are difficult to identify. While these issues possibly could be overcome by using sparse polygenic models, which associate multiple genetic variants with gene expression simultaneously, the predictive performance of these models for eQTL studies has not been evaluated. Here, we assessed the ability of three sparse polygenic models (Lasso, Elastic Net, and BSLMM) to identify causal variants, and compared their efficacy to single-variant association analysis and a fine-mapping model. Using simulated data, we determined that, while these methods performed similarly when there was one causal SNP present at a gene, BSLMM substantially outperformed single-variant association analysis for prioritizing causal eQTL variants when multiple causal eQTL variants were present (1.6- to 5.2-fold higher recall at 20% precision), and identified up to 2.3-fold more low frequency variants as the top eQTL SNP. Analysis of real RNA-seq and whole-genome sequencing data of 131 iPSC samples showed that the eQTL SNPs identified by BSLMM had a higher functional enrichment in DHS sites and were more often low-frequency than those identified with single-variant association analysis. Our study showed that BSLMM is a more effective approach than single-variant association analysis for prioritizing multiple causal eQTL variants at a single gene.


Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study/statistics & numerical data, Multifactorial Inheritance/genetics, Quantitative Trait Loci/genetics, Gene Expression/genetics, Genetic Variation, Humans, Linkage Disequilibrium, Single Nucleotide Polymorphism/genetics
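
The "recall at 20% precision" figure quoted in this abstract can be made concrete with a small sketch. It assumes variants are ranked by some score (hypothetical here), that the truly causal variants are known, as they would be in a simulation, and that the metric means the highest recall reached by any top-k cutoff whose precision is at least the target. The abstract does not spell out the exact definition the authors used, and the function name is illustrative.

```python
def recall_at_precision(scores, is_causal, min_precision=0.20):
    """Largest recall over all top-k cutoffs whose precision >= min_precision.

    scores    : per-variant ranking scores (higher = more likely causal)
    is_causal : matching 0/1 (or bool) labels, known from the simulation
    Ties in `scores` are broken by input order (an arbitrary choice here).
    """
    total_causal = sum(int(flag) for flag in is_causal)
    if total_causal == 0:
        raise ValueError("need at least one causal variant")

    # Rank variants from highest to lowest score.
    ranked = sorted(zip(scores, is_causal), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    true_positives = 0
    for k, (_, causal) in enumerate(ranked, start=1):
        true_positives += int(causal)
        precision = true_positives / k
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_causal)
    return best_recall


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]  # made-up model scores
    toy_labels = [1, 0, 0, 0, 1]            # made-up causal labels
    print(recall_at_precision(toy_scores, toy_labels, min_precision=0.20))
```

Comparing two methods on this metric then amounts to calling the function once per method on the same simulated labels and taking the ratio of the two recalls.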
15.
Nat Commun ; 8(1): 436, 2017 09 05.
Article in English | MEDLINE | ID: mdl-28874753

ABSTRACT

Efforts to identify driver mutations in cancer have largely focused on genes, whereas non-coding sequences remain relatively unexplored. Here we develop a statistical method based on characteristics known to influence local mutation rate and a series of enrichment filters in order to identify distal regulatory elements harboring putative driver mutations in breast cancer. We identify ten DNase I hypersensitive sites that are significantly mutated in breast cancers and associated with the aberrant expression of neighboring genes. A pan-cancer analysis shows that three of these elements are significantly mutated across multiple cancer types and have mutation densities similar to protein-coding driver genes. Functional characterization of the most highly mutated DNase I hypersensitive sites in breast cancer (using in silico and experimental approaches) confirms that they are regulatory elements and affect the expression of cancer genes. Our study suggests that mutations of regulatory elements in tumors likely play an important role in cancer development. Cancer driver mutations can occur within noncoding genomic sequences. Here, the authors develop a statistical approach to identify candidate noncoding driver mutations in DNase I hypersensitive sites in breast cancer and experimentally demonstrate they are regulatory elements of known cancer genes.


Subjects
Breast Neoplasms/genetics, Deoxyribonuclease I/metabolism, Nucleic Acid Regulatory Sequences/genetics, Chromatin/metabolism, Chromatin Assembly and Disassembly, Female, Neoplastic Gene Expression Regulation, Humans, Mutation/genetics, Reproducibility of Results, Sequence Deletion, Telomerase/metabolism
16.
Cancer Discov ; 7(4): 410-423, 2017 04.
Article in English | MEDLINE | ID: mdl-28188128

ABSTRACT

Recent studies have characterized the extensive somatic alterations that arise during cancer. However, the somatic evolution of a tumor may be significantly affected by inherited polymorphisms carried in the germline. Here, we analyze genomic data for 5,954 tumors to reveal and systematically validate 412 genetic interactions between germline polymorphisms and major somatic events, including tumor formation in specific tissues and alteration of specific cancer genes. Among germline-somatic interactions, we found germline variants in RBFOX1 that increased incidence of SF3B1 somatic mutation by 8-fold via functional alterations in RNA splicing. Similarly, 19p13.3 variants were associated with a 4-fold increased likelihood of somatic mutations in PTEN. In support of this association, we found that PTEN knockdown sensitizes the MTOR pathway to high expression of the 19p13.3 gene GNA11. Finally, we observed that stratifying patients by germline polymorphisms exposed distinct somatic mutation landscapes, implicating new cancer genes. This study creates a validated resource of inherited variants that govern where and how cancer develops, opening avenues for prevention research. Significance: This study systematically identifies germline variants that directly affect tumor evolution, either by dramatically increasing alteration frequency of specific cancer genes or by influencing the site where a tumor develops. Cancer Discovery; 7(4); 410-23. ©2017 AACR. See related commentary by Geeleher and Huang, p. 354. This article is highlighted in the In This Issue feature, p. 339.


Subjects
Human Genome, Genomics, Germ-Line Mutation/genetics, Neoplasms/genetics, GTP-Binding Protein alpha Subunits/genetics, Neoplastic Gene Expression Regulation, Humans, Neoplasms/pathology, PTEN Phosphohydrolase/genetics, Phosphoproteins/genetics, Genetic Polymorphism, RNA Splicing/genetics, RNA Splicing Factors/genetics, TOR Serine-Threonine Kinases/genetics
17.
Cell Stem Cell ; 20(4): 533-546.e7, 2017 04 06.
Article in English | MEDLINE | ID: mdl-28388430

ABSTRACT

In this study, we used whole-genome sequencing and gene expression profiling of 215 human induced pluripotent stem cell (iPSC) lines from different donors to identify genetic variants associated with RNA expression for 5,746 genes. We were able to predict causal variants for these expression quantitative trait loci (eQTLs) that disrupt transcription factor binding and validated a subset of them experimentally. We also identified copy-number variant (CNV) eQTLs, including some that appear to affect gene expression by altering the copy number of intergenic regulatory regions. In addition, we were able to identify effects on gene expression of rare genic CNVs and regulatory single-nucleotide variants and found that reactivation of gene expression on the X chromosome depends on gene chromosomal position. Our work highlights the value of iPSCs for genetic association analyses and provides a unique resource for investigating the genetic regulation of gene expression in pluripotent cells.


Subjects
Gene Expression Profiling/methods, Gene Expression Regulation, Genetic Variation, Induced Pluripotent Stem Cells/metabolism, Binding Sites/genetics, Cellular Reprogramming/genetics, Human X Chromosomes/genetics, DNA Copy Number Variations/genetics, Genetic Heterogeneity, Humans, Molecular Sequence Annotation, Quantitative Trait Loci/genetics, Nucleic Acid Regulatory Sequences/genetics, Transcription Factors/metabolism
18.
Stem Cell Reports ; 8(4): 1086-1100, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28410642

ABSTRACT

Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent with high genomic integrity (no or low numbers of somatic copy-number variants) as determined using high-throughput RNA-sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes demonstrate gene expression patterns that cluster by genetic background, and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.


Subjects
Cardiac Arrhythmias/genetics, Factual Databases, Genetic Association Studies, Genetic Variation, Induced Pluripotent Stem Cells/metabolism, Cardiac Myocytes/metabolism, Cardiac Arrhythmias/ethnology, Cardiac Arrhythmias/metabolism, Cardiac Arrhythmias/physiopathology, Cell Differentiation, Cell Line, Cellular Reprogramming/genetics, Genotype, High-Throughput Nucleotide Sequencing, Humans, Induced Pluripotent Stem Cells/cytology, Multigene Family, Cardiac Myocytes/cytology, Oligonucleotide Array Sequence Analysis, Phenotype, Single Nucleotide Polymorphism, Racial Groups
19.
PLoS One ; 8(9): e73956, 2013.
Article in English | MEDLINE | ID: mdl-24023918

ABSTRACT

Primary central nervous system lymphomas (PCNSL) have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein Barr virus (EBV) infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV), JC polyomavirus (JCV), and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses observed and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples.


Subjects
Central Nervous System Neoplasms/virology, Gene Expression Profiling, Lymphoma/virology, Viruses/genetics, Viruses/isolation & purification, Acquired Immunodeficiency Syndrome/complications, Aged, Central Nervous System Neoplasms/complications, Female, Humans, Lymphoma/complications, Male, Middle Aged