ABSTRACT
Complex structural variations (cxSVs) are often overlooked in genome analyses due to detection challenges. We developed ARC-SV, a probabilistic and machine-learning-based method that enables accurate detection and reconstruction of cxSVs from standard datasets. By applying ARC-SV across 4,262 genomes representing all continental populations, we identified cxSVs as a significant source of natural human genetic variation. Rare cxSVs have a propensity to occur in neural genes and loci that underwent rapid human-specific evolution, including those regulating corticogenesis. By performing single-nucleus multiomics in postmortem brains, we discovered cxSVs associated with differential gene expression and chromatin accessibility across various brain regions and cell types. Additionally, cxSVs detected in brains of psychiatric cases are enriched for linkage with psychiatric GWAS risk alleles detected in the same brains. Furthermore, our analysis revealed significantly decreased brain-region- and cell-type-specific expression of cxSV genes, specifically for psychiatric cases, implicating cxSVs in the molecular etiology of major neuropsychiatric disorders.
ABSTRACT
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (~30 tissues × ~15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
Subject(s)
Epigenome; Quantitative Trait Loci; Genome-Wide Association Study; Genomics; Phenotype; Polymorphism, Single Nucleotide
ABSTRACT
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype-Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. These included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations with body mass index.
Subject(s)
Disease/genetics; Multifactorial Inheritance/genetics; Population/genetics; RNA, Long Noncoding/genetics; Transcriptome; Coronary Artery Disease/genetics; Diabetes Mellitus, Type 1/genetics; Diabetes Mellitus, Type 2/genetics; Gene Expression Profiling; Genetic Variation; Humans; Inflammatory Bowel Diseases/genetics; Organ Specificity/genetics; Quantitative Trait Loci
ABSTRACT
Exitron splicing (EIS) creates a cryptic intron (called an exitron) within a protein-coding exon to increase proteome diversity. EIS is poorly characterized, but emerging evidence suggests a role for EIS in cancer. Through a systematic investigation of EIS across 33 cancers from 9,599 tumor transcriptomes, we discovered that EIS affected 63% of human coding genes and that 95% of those events were tumor specific. Notably, we observed a mutually exclusive pattern between EIS and somatic mutations in their affected genes. Functionally, we discovered that EIS altered known and novel cancer driver genes, causing gain or loss of function that promotes tumor progression. Importantly, we identified EIS-derived neoepitopes that bind to major histocompatibility complex (MHC) class I or II. Analysis of clinical data from a clear cell renal cell carcinoma cohort revealed an association between EIS-derived neoantigen load and checkpoint inhibitor response. Our findings establish the importance of considering EIS alterations when nominating cancer driver events and neoantigens.
Subject(s)
Epitopes/genetics; Exons/genetics; Gene Expression Profiling; Introns/genetics; Neoplasms/genetics; Oncogenes; RNA Splicing/genetics; Amino Acid Sequence; Cell Line; Cohort Studies; Humans; Mutation/genetics
ABSTRACT
Macroautophagy/autophagy is an essential catabolic process that targets a wide variety of cellular components including proteins, organelles, and pathogens. ATG7, a protein involved in the autophagy process, plays a crucial role in maintaining cellular homeostasis and can contribute to the development of diseases such as cancer. ATG7 initiates autophagy by facilitating the lipidation of the ATG8 proteins in the growing autophagosome membrane. The noncanonical isoform ATG7(2) is unable to perform ATG8 lipidation; however, its cellular regulation and function are unknown. Here, we uncovered a distinct regulation and function of ATG7(2) in contrast with ATG7(1), the canonical isoform. First, affinity-purification mass spectrometry analysis revealed that ATG7(2) establishes direct protein-protein interactions (PPIs) with metabolic proteins, whereas ATG7(1) primarily interacts with autophagy machinery proteins. Furthermore, we identified that ATG7(2) mediates a decrease in metabolic activity, highlighting a novel splice-dependent function of this important autophagy protein. We also found a divergent expression pattern of ATG7(1) and ATG7(2) across human tissues. In conclusion, our work uncovers the divergent patterns of expression, protein interactions, and function of ATG7(2) in contrast to ATG7(1). These findings suggest a molecular switch between main catabolic processes through isoform-dependent expression of a key autophagy gene.
Subject(s)
Autophagy; Energy Metabolism; Humans; Autophagosomes/metabolism; Autophagy-Related Proteins/metabolism; Microtubule-Associated Proteins/metabolism; Protein Isoforms/metabolism
ABSTRACT
Although the human gene annotation has been continuously improved over the past 2 decades, numerous studies have demonstrated the existence of a "dark proteome", consisting of proteins that are critical for biological processes but are not included in widely used gene catalogs. The Genotype-Tissue Expression project generated more than 15,000 RNA-seq datasets from multiple tissues, which modeled 30 million transcripts in the human genome. To provide a resource of high-confidence novel proteins from the dark proteome, we screened 50,000 mass spectrometry runs from over 900 projects to identify proteins translated from the Genotype-Tissue Expression transcript models with proteomic support. We also integrated 3.8 million common genetic variants from the gnomAD database to improve peptide identification. As a result, we identified 170,529 novel peptides with proteomic evidence, of which 6,048 passed the strictest standard we defined and were supported by PepQuery. We provide a user-friendly website (https://ncorf.genes.fun/) for researchers to check the evidence of novel peptides from their studies. The findings will improve our understanding of coding genes and facilitate genomic data interpretation in biomedical research.
Subject(s)
Proteogenomics; Humans; Proteogenomics/methods; Proteome/metabolism; Proteomics/methods; Peptides/genetics; Genome, Human
ABSTRACT
Autophagy, a highly conserved process of protein and organelle degradation, has emerged as a critical regulator in various diseases, including cancer progression. In the context of liver cancer, the predictive value of autophagy-related genes remains ambiguous. Leveraging chip datasets from the TCGA and GTEx databases, we identified 23 differentially expressed autophagy-related genes in liver cancer. Notably, five key autophagy genes, PRKAA2, BIRC5, MAPT, IGF1, and SPNS1, were highlighted as potential prognostic markers, with MAPT showing significant overexpression in clinical samples. In vitro cellular assays further demonstrated that MAPT promotes liver cancer cell proliferation, migration, and invasion by inhibiting autophagy and suppressing apoptosis. Subsequent in vivo studies further corroborated the pro-tumorigenic role of MAPT by suppressing autophagy. Collectively, our model based on the five key genes provides a promising tool for predicting liver cancer prognosis, with MAPT emerging as a pivotal factor in tumor progression through autophagy modulation.
Subject(s)
Autophagy; Liver Neoplasms; tau Proteins; Humans; Liver Neoplasms/genetics; Liver Neoplasms/pathology; Liver Neoplasms/metabolism; Autophagy/genetics; tau Proteins/genetics; tau Proteins/metabolism; Prognosis; Cell Line, Tumor; Survivin/genetics; Survivin/metabolism; Cell Proliferation; Animals; Insulin-Like Growth Factor I/genetics; Insulin-Like Growth Factor I/metabolism; Biomarkers, Tumor/genetics; Cell Movement; Mice; Apoptosis; Gene Expression Regulation, Neoplastic; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/pathology; Carcinoma, Hepatocellular/metabolism
ABSTRACT
BACKGROUND: African cattle represent a unique resource of genetic diversity in response to adaptation to numerous environmental challenges. Characterising the genetic landscape of indigenous African cattle and identifying genomic regions and genes of functional importance can contribute to targeted breeding and tackle the loss of genetic diversity. However, pinpointing the adaptive variant and determining underlying functional mechanisms of adaptation remains challenging. RESULTS: In this study, we use selection signatures from whole-genome sequence data of eight indigenous African cattle breeds in combination with gene expression and quantitative trait loci (QTL) databases to characterise genomic targets of artificial selection and environmental adaptation and to identify the underlying functional candidate genes. In general, the trait-association analyses of selection signatures suggest the innate and adaptive immune system and production traits as important selection targets. For example, a large genomic region, with selection signatures identified for all breeds except N'Dama, was located on BTA27, including multiple defensin DEFB coding-genes. Out of 22 analysed tissues, genes under putative selection were significantly enriched for those overexpressed in adipose tissue, blood, lung, testis and uterus. Our results further suggest that cis-eQTL are themselves selection targets; for most tissues, we found a positive correlation between allele frequency differences and cis-eQTL effect size, suggesting that positive selection acts directly on regulatory variants. CONCLUSIONS: By combining selection signatures with information on gene expression and QTL, we were able to reveal compelling candidate selection targets that did not stand out from selection signature results alone (e.g. GIMAP8 for tick resistance and NDUFS3 for heat adaptation). Insights from this study will help to inform breeding and maintain diversity of locally adapted, and hence important, breeds.
Subject(s)
Quantitative Trait Loci; Selection, Genetic; Animals; Cattle/genetics; Phenotype; Breeding; Polymorphism, Single Nucleotide; Adaptation, Physiological/genetics; Gene Frequency
ABSTRACT
Tissue gene expression studies are impacted by biological and technical sources of variation, which can be broadly classified into wanted and unwanted variation. The latter, if not addressed, results in misleading biological conclusions. Methods have been proposed to reduce unwanted variation, such as normalization and batch correction. A more accurate understanding of all causes of variation could significantly improve the ability of these methods to remove unwanted variation while retaining variation corresponding to the biological question of interest. We used 17,282 samples from 49 human tissues in the Genotype-Tissue Expression data set (v8) to investigate patterns and causes of expression variation. Transcript expression was transformed to z-scores, and only the most variable 2% of transcripts were evaluated and clustered based on coexpression patterns. Clustered gene sets were assigned to different biological or technical causes based on histologic appearances and metadata elements. We identified 522 variable transcript clusters (median: 11 per tissue) among the samples. Of these, 63% were confidently explained, 16% were likely explained, 7% were low confidence explanations, and 14% had no clear cause. Histologic analysis annotated 46 clusters. Other common causes of variability included sex, sequencing contamination, immunoglobulin diversity, and compositional tissue differences. Less common biological causes included death interval (Hardy score), disease status, and age. Technical causes included blood draw timing and harvesting differences. Many of the causes of variation in bulk tissue expression were identifiable in the Tabula Sapiens data set of single-cell expression. This is among the largest explorations of the underlying sources of tissue expression variation. It uncovered expected and unexpected causes of variable gene expression and demonstrated the utility of matched histologic specimens. 
It further demonstrated the value of acquiring meaningful tissue harvesting metadata elements to use for improved normalization, batch correction, and analysis of both bulk and single-cell RNA-seq data.
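The filtering step described above (keep only the most variable 2% of transcripts and express them as z-scores) can be sketched in a few lines. This is only an illustration under stated assumptions: the toy matrix `expr`, the helper names `zscores` and `most_variable`, and the choice to rank by standard deviation of the raw values before z-scoring are all hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev

def zscores(values):
    """Z-score one transcript's expression values across samples."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant transcript: no variation to scale
    return [(v - mu) / sd for v in values]

def most_variable(expr, fraction=0.02):
    """Return the ids of the top `fraction` of transcripts ranked by standard deviation."""
    ranked = sorted(expr, key=lambda t: pstdev(expr[t]), reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical toy matrix: 100 transcripts x 4 samples, with spread growing with i.
expr = {f"tx{i}": [5.0, 5.0 + 0.01 * i, 5.0 - 0.01 * i, 5.0] for i in range(100)}

top = most_variable(expr)                    # the 2 most variable transcripts
z = {t: zscores(expr[t]) for t in top}       # z-scored values, ready for clustering
```

In a real analysis the retained z-scored transcripts would then be clustered by coexpression; that step is omitted here because the abstract does not specify the clustering method.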
Subject(s)
Gene Expression Profiling; Humans; Organ Specificity; Cluster Analysis
ABSTRACT
Most disease-associated variants, although located in putatively regulatory regions, do not have detectable effects on gene expression. One explanation could be that we have not examined gene expression in the cell types or conditions that are most relevant for disease. Even large-scale efforts to study gene expression across tissues are limited to human samples obtained opportunistically or postmortem, mostly from adults. In this review we evaluate recent findings and suggest an alternative strategy, drawing on the dynamic and highly context-specific nature of gene regulation. We discuss new technologies that can extend the standard regulatory mapping framework to more diverse, disease-relevant cell types and states.
Subject(s)
Genetic Predisposition to Disease/genetics; Quantitative Trait Loci/genetics; Animals; Gene Expression/genetics; Gene Expression Regulation/genetics; Humans; Regulatory Sequences, Nucleic Acid/genetics
ABSTRACT
The spatial and temporal domain of a gene's expression can range from ubiquitous to highly specific. Quantifying the degree to which this expression is unique to a specific tissue or developmental timepoint can provide insight into the etiology of genetic diseases. However, quantifying specificity remains challenging as measures of specificity are sensitive to similarity between samples in the sample set. For example, in the Genotype-Tissue Expression project (GTEx), brain subregions are overrepresented at 13 of 54 (24%) unique tissues sampled. In this dataset, existing specificity measures have a decreased ability to identify genes specific to the brain relative to other organs. To solve this problem, we leverage sample similarity information to weight samples such that overrepresented tissues do not have an outsized effect on specificity estimates. We test this reweighting procedure on 4 measures of specificity (Z-score, Tau, Tsi and Gini) in the GTEx data and in single-cell datasets for zebrafish and mouse. For all of these measures, incorporating sample similarity information to weight samples results in greater stability of sets of genes called as specific and decreases the overall variance in the change of specificity estimates as sample sets become more unbalanced. Furthermore, the genes with the largest improvement in their specificity estimate's stability are those with functions related to the overrepresented sample types. Our results demonstrate that incorporating similarity information improves specificity estimates' stability to the choice of the sample set used to define the transcriptome, providing more robust and reproducible measures of specificity for downstream analyses.
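To make the overrepresentation problem concrete, the sketch below computes the standard Tau index and a weighted variant for a gene expressed only in 13 brain subregions out of 54 tissues. The weighting scheme used here (each brain subregion counts as 1/13 of a tissue, and the denominator becomes the total weight minus one) is an assumption chosen for illustration; the abstract does not give the authors' actual weights, and the function name `tau` is hypothetical.

```python
def tau(expr, weights=None):
    """Tau specificity index; with weights, each sample counts as `w` effective tissues.

    With all weights equal to 1 this reduces to the usual
    Tau = sum(1 - x_i / max(x)) / (n - 1).
    """
    if weights is None:
        weights = [1.0] * len(expr)
    denom = sum(weights) - 1.0
    if denom <= 0:
        raise ValueError("need more than one effective tissue")
    peak = max(expr)
    if peak == 0:
        return 0.0  # gene not expressed anywhere
    num = sum(w * (1.0 - x / peak) for x, w in zip(expr, weights))
    return num / denom

# Hypothetical gene: equally expressed in 13 brain subregions, silent in 41 other tissues.
expr = [10.0] * 13 + [0.0] * 41

unweighted = tau(expr)                       # 41/53, about 0.77
weights = [1.0 / 13] * 13 + [1.0] * 41       # assumed: brain counts as one tissue in total
weighted = tau(expr, weights)                # close to 1.0
```

Without weights the gene looks only moderately specific because the 13 similar brain samples are treated as independent tissues; once they are down-weighted to one effective tissue, the same expression pattern scores as fully specific.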
Subject(s)
Transcriptome; Zebrafish; Animals; Mice; Zebrafish/genetics
ABSTRACT
OBJECTIVE: This study aimed to determine the effect and mechanism of ZIC2 on immune infiltration in lung adenocarcinoma (LUAD). METHODS: ZIC2 expression in several normal tissues was analyzed using TCGA data, along with its correlation with the baseline characteristics of LUAD patients. Immune infiltration in LUAD patients was analyzed using the CIBERSORT algorithm, and the correlation between ZIC2 and immune cell composition was assessed. Additionally, the potential upstream regulatory mechanisms of ZIC2 were predicted to identify the possible miRNAs and lncRNAs that regulate ZIC2 in LUAD. In vitro and in vivo experiments were also conducted to confirm the potential effect of ZIC2 on the proliferation and invasion ability of LUAD cells. RESULTS: ZIC2 expression was decreased in various normal tissues, but increased in multiple tumors, including LUAD, and correlated with the prognosis of LUAD patients. Enrichment by GO and KEGG suggested a possible association of ZIC2 with the cell cycle and the p53 signaling pathway. ZIC2 expression was significantly correlated with resting memory CD4 T cells, M1 macrophages, and plasma cells, indicating that dysregulated ZIC2 expression in LUAD may directly influence immune infiltration. ZIC2 might be regulated by several different lncRNA-mediated ceRNA mechanisms. In vitro experiments validated the promotive effect of ZIC2 on the viability and invasion ability of LUAD cells. In vivo experiments confirmed that ZIC2 can accelerate tumor growth in nude mice. CONCLUSION: ZIC2, regulated by different lncRNA-mediated ceRNA mechanisms, may play a critical regulatory role in LUAD by mediating the composition of immune cells in the tumor microenvironment.
Subject(s)
Adenocarcinoma of Lung; Cell Proliferation; Computational Biology; Gene Expression Regulation, Neoplastic; Lung Neoplasms; MicroRNAs; RNA, Long Noncoding; Transcription Factors; Humans; Transcription Factors/genetics; Transcription Factors/metabolism; Adenocarcinoma of Lung/genetics; Adenocarcinoma of Lung/immunology; Adenocarcinoma of Lung/pathology; Lung Neoplasms/genetics; Lung Neoplasms/immunology; Lung Neoplasms/pathology; Animals; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; MicroRNAs/genetics; MicroRNAs/metabolism; Cell Proliferation/genetics; Cell Line, Tumor; Mice; Mice, Nude; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; RNA, Competitive Endogenous
ABSTRACT
BACKGROUND: Exfoliation syndrome (XFS) is an age-related systemic disorder characterized by excessive production and progressive accumulation of abnormal extracellular material, with pathognomonic ocular manifestations. It is the most common cause of secondary glaucoma, resulting in widespread global blindness. The largest global meta-analysis of XFS, in 123,457 multi-ethnic individuals from 24 countries, identified seven loci, with the strongest association signal in the chr15q22-25 region near LOXL1. Expression analyses have so far correlated coding and a few non-coding variants in the region with LOXL1 expression levels, but the functional effects of these variants are unclear. We hypothesize that analysis of the contribution of the genetically determined component of gene expression to XFS risk can provide a powerful method to elucidate potential roles of additional genes and clarify the biology that underlies XFS. RESULTS: Transcriptome-wide association studies (TWAS) using PrediXcan models trained in 48 GTEx tissues, leveraging results from the multi-ethnic and European ancestry GWAS, were performed. To eliminate the possibility of false-positive results due to linkage disequilibrium (LD) contamination, we (i) performed PrediXcan analysis in reduced models removing variants in LD with LOXL1 missense variants associated with XFS, and variants in LOXL1 models, in both multi-ethnic and European ancestry individuals; (ii) conducted conditional analysis of the significant signals in European ancestry individuals; (iii) filtered signals based on correlated gene expression, LD and shared eQTLs; and (iv) conducted expression validation analysis in human iris tissues. We observed twenty-eight genes in the chr15q22-25 region that showed statistically significant associations, which were whittled down to ten genes after statistical validations.
In experimental analysis, mRNA transcript levels for ARID3B, CD276, LOXL1, NEO1, SCAMP2, and UBL7 were significantly decreased in iris tissues from XFS patients compared to control samples. TWAS genes for XFS were significantly enriched for genes associated with inflammatory conditions. We also observed a higher incidence of XFS comorbidity with inflammatory and connective tissue diseases. CONCLUSION: Our results implicate a role for connective tissues and inflammation pathways in the etiology of XFS. Targeting the inflammatory pathway may be a potential therapeutic option to reduce progression in XFS.
Subject(s)
Exfoliation Syndrome; Humans; Exfoliation Syndrome/genetics; Exfoliation Syndrome/complications; Exfoliation Syndrome/metabolism; Amino Acid Oxidoreductases/genetics; RNA, Messenger; Mutation, Missense; Gene Expression; Polymorphism, Single Nucleotide; DNA-Binding Proteins/genetics; B7 Antigens/genetics
ABSTRACT
There is particular interest in transcriptome-wide association studies (TWAS), gene-level tests based on multi-SNP predictive models of gene expression, for identifying causal genes at loci associated with complex traits. However, interpretation of TWAS associations may be complicated by divergent effects of model SNPs on phenotype and gene expression. We developed an iterative modeling scheme for obtaining multi-SNP models of gene expression and applied this framework to generate expression models for 43 human tissues from the Genotype-Tissue Expression (GTEx) Project. We characterized the performance of single- and multi-SNP models for identifying causal genes in GWAS data for 46 circulating metabolites. We show that: (A) multi-SNP models captured more variation in expression than did the top cis-eQTL (median 2-fold improvement); (B) predicted expression based on multi-SNP models was associated (false discovery rate < 0.01) with metabolite levels for 826 unique gene-metabolite pairs, but, after stepwise conditional analyses, 90% were dominated by a single eQTL SNP; (C) among the 35% of associations where a SNP in the expression model was a significant cis-eQTL and metabolomic-QTL (met-QTL), 92% demonstrated colocalization between these signals, but interpretation was often complicated by incomplete overlap of QTLs in multi-SNP models; and (D) using a "truth" set of causal genes at 61 met-QTLs, the sensitivity was high (67%), but the positive predictive value was low, as only 8% of TWAS associations (19% when restricted to colocalized associations at met-QTLs) involved true causal genes. These results guide the interpretation of TWAS and highlight the need for corroborative data to provide confident assignment of causality.
Subject(s)
Gene Expression Regulation; Genetic Predisposition to Disease; Metabolome; Models, Genetic; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Transcriptome; Genome-Wide Association Study; Humans; Phenotype
ABSTRACT
Expression quantitative trait loci (eQTL) studies utilize regression models to explain the variance of gene expressions with genetic loci or single nucleotide polymorphisms (SNPs). However, regression models for eQTL are challenged by the presence of high dimensional non-sparse and correlated SNPs with small effects, and nonlinear relationships between responses and SNPs. Principal component analyses are commonly conducted for dimension reduction without considering responses. Because of that, this non-supervised learning method often does not work well when the focus is on discovery of the response-covariate relationship. We propose a new supervised structural dimensional reduction method for semiparametric regression models with high dimensional and correlated covariates; we extract low-dimensional latent features from a vast number of correlated SNPs while accounting for their relationships, possibly nonlinear, with gene expressions. Our model identifies important SNPs associated with gene expressions and estimates the association parameters via a likelihood-based algorithm. A GTEx data application on a cancer related gene is presented with 18 novel eQTLs detected by our method. In addition, extensive simulations show that our method outperforms the other competing methods in bias, efficiency, and computational cost.
Subject(s)
Polymorphism, Single Nucleotide; Quantitative Trait Loci; Humans; Quantitative Trait Loci/genetics; Likelihood Functions; Genome-Wide Association Study/methods
ABSTRACT
Adipose tissue is an important endocrine organ with a role in many cardiometabolic diseases. It is comprised of a heterogeneous collection of cell types that can differentially impact disease phenotypes. Cellular heterogeneity can also confound -omic analyses but is rarely taken into account in analysis of solid-tissue transcriptomes. Here, we investigate cell-type heterogeneity in two population-level subcutaneous adipose-tissue RNA-seq datasets (TwinsUK, n = 766 and the Genotype-Tissue Expression project [GTEx], n = 326) by estimating the relative proportions of four distinct cell types (adipocytes, macrophages, CD4+ T cells, and micro-vascular endothelial cells). We find significant cellular heterogeneity within and between the TwinsUK and GTEx adipose datasets. We find that adipose cell-type composition is heritable and confirm the positive association between adipose-resident macrophage proportion and obesity (high BMI), but we find a stronger BMI-independent association with dual-energy X-ray absorptiometry (DXA) derived body-fat distribution traits. We benchmark the impact of adipose-tissue cell composition on a range of standard analyses, including phenotype-gene expression association, co-expression networks, and cis-eQTL discovery. Our results indicate that it is critical to account for cell-type composition when combining adipose transcriptome datasets in co-expression analysis and in differential expression analysis with obesity-related traits. We applied gene expression by cell-type proportion interaction models (G × Cell) to identify 26 cell-type-specific expression quantitative trait loci (eQTLs) in 20 genes, including four autoimmune disease genome-wide association study (GWAS) loci. These results identify cell-specific eQTLs and demonstrate the potential of in silico deconvolution of bulk tissue to identify cell-type-restricted regulatory variants.
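The G × Cell idea can be illustrated with an ordinary least-squares fit of expression on genotype, cell-type proportion, and their product, where a non-zero interaction coefficient indicates an eQTL effect that scales with the abundance of that cell type. The sketch below is a minimal, dependency-free illustration; the toy data, the helper names `solve` and `fit_interaction`, and the absence of covariates or significance testing are simplifying assumptions and do not reflect the study's actual models.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_interaction(genotype, cell_prop, expression):
    """OLS fit of expression ~ 1 + G + C + G*C via the normal equations.

    Returns [intercept, beta_G, beta_C, beta_GxC].
    """
    rows = [[1.0, g, c, g * c] for g, c in zip(genotype, cell_prop)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical noise-free toy data: the genotype effect exists only in proportion
# to the cell-type fraction (true interaction coefficient 0.8, no main genotype effect).
genotype = [0, 1, 2, 0, 1, 2, 0, 1, 2]
cell_prop = [0.1, 0.1, 0.1, 0.3, 0.3, 0.3, 0.6, 0.6, 0.6]
expression = [2.0 + 0.5 * c + 0.8 * g * c for g, c in zip(genotype, cell_prop)]

beta = fit_interaction(genotype, cell_prop, expression)
```

On real data one would fit this per gene-variant pair with technical and biological covariates and test the interaction term for significance; the normal-equations solver is used here only to keep the example self-contained.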
Subject(s)
Adipose Tissue/pathology; Genetic Predisposition to Disease; Inflammation/pathology; Multifactorial Inheritance/genetics; Obesity/pathology; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Adipose Tissue/metabolism; Adult; Aged; Aged, 80 and over; Female; Gene Expression Profiling; Genome-Wide Association Study; Humans; Inflammation/genetics; Male; Middle Aged; Obesity/genetics; Phenotype; Transcriptome
ABSTRACT
Long non-coding RNAs (lncRNAs) play an important role in gene regulation and are increasingly being recognized as crucial mediators of disease pathogenesis. However, the vast majority of published transcriptome datasets lack high-quality lncRNA profiles compared to protein-coding genes (PCGs). Here we propose a framework that harnesses the correlative expression patterns between lncRNAs and PCGs to impute unknown lncRNA profiles. The lncRNA expression imputation (LEXI) framework enables characterization of the lncRNA transcriptome of samples lacking any lncRNA data using only their PCG profiles. We compare various machine learning and missing value imputation algorithms to implement LEXI and demonstrate the feasibility of this approach to impute the lncRNA transcriptome of normal and cancer tissues. Additionally, we determine the factors that influence imputation accuracy and provide guidelines for implementing this approach.
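The general idea of imputing a lncRNA level from protein-coding gene (PCG) profiles can be sketched with a k-nearest-neighbour predictor: find the reference samples whose PCG profiles are closest to the query sample and average their measured lncRNA values. The abstract does not name the algorithms that were compared, so k-nearest neighbours is used here purely as a generic stand-in; the function name `knn_impute` and all data values are hypothetical.

```python
import math

def knn_impute(train_pcg, train_lnc, query_pcg, k=3):
    """Predict one lncRNA level as the mean over the k reference samples
    whose PCG profiles are closest (Euclidean distance) to the query profile."""
    by_distance = sorted(
        (math.dist(profile, query_pcg), level)
        for profile, level in zip(train_pcg, train_lnc)
    )
    nearest = by_distance[:k]
    return sum(level for _, level in nearest) / len(nearest)

# Hypothetical reference set: PCG profiles (2 genes) with a measured lncRNA level.
train_pcg = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [5.0, 6.0], [5.2, 6.1], [4.9, 5.8]]
train_lnc = [0.5, 0.6, 0.4, 3.0, 3.2, 2.8]

# Query samples that have PCG data but no lncRNA measurement.
low = knn_impute(train_pcg, train_lnc, [1.0, 2.05])    # about 0.5
high = knn_impute(train_pcg, train_lnc, [5.0, 6.0])    # about 3.0
```

In practice the PCG features would number in the thousands and accuracy would be assessed by holding out samples with known lncRNA profiles.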
Subject(s)
Gene Expression Profiling; Proteins/genetics; RNA, Long Noncoding/genetics; Transcriptome; Algorithms; Cell Line; Datasets as Topic; Humans; Machine Learning
ABSTRACT
MCAM (CD146) is a cell surface adhesion molecule that has been reported to promote cancer development, progression and metastasis and is considered a potential tumor biomarker and therapeutic target. However, inconsistent reports exist, and its clinical value is yet to be confirmed. Here we took advantage of several large genomic data collections (Genotype-Tissue Expression, The Cancer Genome Atlas and Cancer Cell Line Encyclopedia) and comprehensively analyzed MCAM expression in thousands of normal and cancer samples and cell lines along with their clinical phenotypes and drug response information. Our results show that MCAM is very highly expressed in large vessel tissues, while the majority of tissues have low or minimal expression. Its expression is dramatically increased in a few tumors but significantly decreased in most other tumors relative to their paired normal tissues. Increased MCAM expression is associated with a higher tumor stage and worse patient survival for some less common tumors but not for major ones. Higher MCAM expression in primary tumors may be complicated by expression in tumor-associated or normal stromal blood vessels, whose significance may differ from that of expression in cancer cells. MCAM expression is weakly associated with the response to a few small-molecule drugs, and the association with targeted anti-BRAF agents suggests its involvement in that pathway, which warrants further investigation.
Subject(s)
Neoplasms/genetics; Antineoplastic Agents/pharmacology; Blood Vessels/metabolism; CD146 Antigen/genetics; Cell Line; Cell Line, Tumor; Databases, Genetic; Disease Progression; Gene Expression Regulation, Neoplastic; Humans; Neoplasms/pathology; Survival Analysis
ABSTRACT
OBJECTIVE: Pancreatic adenocarcinoma (PAAD) is a leading cause of cancer-related mortality in adults. Syndecan-4 (SDC4) is involved in cancer pathogenesis. Therefore, this study aimed to explore the expression and clinical significance of SDC4 in PAAD. METHODS: Differentially expressed genes (DEGs) between PAAD and normal pancreas were screened from the GTEx and TCGA databases, and the correlation between the DEGs and prognosis was analyzed. The prognostic value of the screened SDC4, SERPINE1, and SLC2A1 was evaluated using Kaplan-Meier curves, and SDC4 was subsequently selected as the best candidate. In addition, SDC4 expression was analyzed in PAAD tissues, other risk factors affecting postoperative survival were analyzed using Cox regression analysis, and SDC4-mediated pathway enrichment was identified by GSVA and GSEA. SDC4 expression in PAAD tissues and adjacent normal tissues of selected PAAD patients was detected by RT-qPCR and immunohistochemistry. The correlation between SDC4 and clinical features was evaluated by the χ² test. RESULTS: SDC4 was highly expressed in PAAD tissues. Elevated SDC4 was correlated with reduced overall survival. SDC4 enrichment pathways included spliceosome function, proteasome activity, pentose phosphate pathway, base excision repair, mismatch repair, DNA replication, oxidative phosphorylation, mitotic spindle formation, epithelial-mesenchymal transition, and G2M checkpoints. SDC4 was elevated in PAAD tissues of PAAD patients compared with adjacent normal tissues. High SDC4 expression was related to metastatic differentiation, TNM stage, lymphatic metastasis, and lower 3-year survival rate. SDC4 was an independent risk factor affecting postoperative survival. CONCLUSION: SDC4 was highly expressed in PAAD and was related to clinicopathological features and poor prognosis, suggesting that it might be an important index for PAAD early diagnosis and prognosis.
Subject(s)
Adenocarcinoma; Pancreatic Neoplasms; Adenocarcinoma/pathology; Gene Expression Regulation, Neoplastic; Humans; Pancreatic Neoplasms/pathology; Prognosis; Proteasome Endopeptidase Complex/genetics; Syndecan-4/genetics; Syndecan-4/metabolism
ABSTRACT
BACKGROUND: Previous genome-wide association studies (GWAS) identified genome-wide significant risk loci in chronic pancreatitis and investigated underlying disease-causing mechanisms by simple overlaps with expression quantitative trait loci (eQTLs), a procedure that may often result in false-positive conclusions. METHODS: We conducted a GWAS in 584 non-alcoholic chronic pancreatitis (NACP) patients and 6040 healthy controls. Next, we applied Bayesian colocalization analysis of identified genome-wide significant risk loci from both our recently published alcoholic chronic pancreatitis (ACP) dataset and the novel NACP dataset with pancreas eQTLs from the GTEx V8 European cohort to prioritize candidate causal genes, and extracted credible sets of shared causal variants. RESULTS: Variants at the CTRC (p = 1.22 × 10⁻²¹) and SPINK1 (p = 6.59 × 10⁻⁴⁷) risk loci reached genome-wide significance in NACP. CTRC risk variants colocalized with CTRC eQTLs in ACP (PP4 = 0.99, PP4/PP3 = 95.51) and NACP (PP4 = 0.99, PP4/PP3 = 95.46). For both diseases, the 95% credible set of shared causal variants consisted of rs497078 and rs545634. CLDN2-MORC4 risk variants colocalized with CLDN2 eQTLs in ACP (PP4 = 0.98, PP4/PP3 = 42.20) and NACP (PP4 = 0.67, PP4/PP3 = 7.18), probably driven by the shared causal variant rs12688220. CONCLUSIONS: A shared causal CTRC risk variant might unfold its pathogenic effect in ACP and NACP by reducing CTRC expression, while the CLDN2-MORC4 shared causal variant rs12688220 may modify ACP and NACP risk by increasing CLDN2 expression.
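The PP4 and PP4/PP3 values reported above are posterior probabilities from a Bayesian colocalization model, where H3 means both traits have a causal variant in the region but the variants differ, and H4 means they share one causal variant. The sketch below shows how such posteriors can be computed from per-SNP approximate Bayes factors, assuming the widely used single-causal-variant formulation with priors p1 = p2 = 1e-4 and p12 = 1e-5. The abstract does not state which software or priors were used, so these choices, the function name `coloc_posteriors`, and the toy Bayes factors are all assumptions.

```python
def coloc_posteriors(abf1, abf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of colocalization hypotheses H0..H4.

    abf1, abf2: per-SNP approximate Bayes factors for trait 1 (e.g. GWAS)
    and trait 2 (e.g. eQTL), listed for the same SNPs in the same order.
    """
    s1, s2 = sum(abf1), sum(abf2)
    s12 = sum(a * b for a, b in zip(abf1, abf2))  # same SNP causal for both traits
    support = {
        "H0": 1.0,                          # no association with either trait
        "H1": p1 * s1,                      # association with trait 1 only
        "H2": p2 * s2,                      # association with trait 2 only
        "H3": p1 * p2 * (s1 * s2 - s12),    # both traits, different causal SNPs
        "H4": p12 * s12,                    # both traits, one shared causal SNP
    }
    total = sum(support.values())
    return {h: v / total for h, v in support.items()}

# Hypothetical region of three SNPs where the first SNP drives both signals.
gwas_abf = [1e6, 1.0, 1.0]
eqtl_abf = [1e5, 1.0, 1.0]

pp = coloc_posteriors(gwas_abf, eqtl_abf)
ratio = pp["H4"] / pp["H3"]   # analogous to the PP4/PP3 ratio quoted in the abstract
```

A high PP4 together with a large PP4/PP3 ratio indicates that the data favour a shared causal variant over two distinct ones, which is how the CTRC and CLDN2 results above can be read.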