ABSTRACT
Cupping of the optic nerve head, a highly heritable trait, is a hallmark of glaucomatous optic neuropathy. Two key parameters are vertical cup-to-disc ratio (VCDR) and vertical disc diameter (VDD). However, manual assessment often suffers from poor accuracy and is time intensive. Here, we show convolutional neural network models can accurately estimate VCDR and VDD for 282,100 images from both UK Biobank and an independent study (Canadian Longitudinal Study on Aging), enabling cross-ancestry epidemiological studies and new genetic discovery for these optic nerve head parameters. Using the AI approach, we perform a systematic comparison of the distribution of VCDR and VDD and compare these with intraocular pressure and glaucoma diagnoses across various genetically determined ancestries, which provides an explanation for the high rates of normal tension glaucoma in East Asia. We then used the large number of AI gradings to conduct a more powerful genome-wide association study (GWAS) of optic nerve head parameters. Using the AI-based gradings increased estimates of heritability by ~50% for VCDR and VDD. Our GWAS identified more than 200 loci associated with both VCDR and VDD (double the number of loci from previous studies) and uncovered dozens of biological pathways; many of the loci we discovered also confer risk for glaucoma.
Subject(s)
Artificial Intelligence , Glaucoma/genetics , Optic Disk/diagnostic imaging , Adult , Aged , Algorithms , Female , Genome-Wide Association Study , Glaucoma/diagnosis , Glaucoma/pathology , Humans , Computer-Assisted Image Processing , Inheritance Patterns , Intraocular Pressure , Male , Middle Aged , Nerve Net , Optic Disk/pathology , Photography , Single Nucleotide Polymorphism , Risk Factors
ABSTRACT
Nicotiana benthamiana is predominantly distributed in arid habitats across northern Australia. However, none of six geographically isolated accessions shows obvious xerophytic morphological features. To investigate how these tender-looking plants withstand drought, we examined their responses to water deprivation, assessed phenotypic, physiological, and cellular responses, and analysed cuticular wax composition and wax biosynthesis gene expression profiles. Results showed that the Central Australia (CA) accession, globally known as a research tool, has evolved a drought escape strategy with early vigour, short life cycle, and weak, water loss-limiting responses. By contrast, a northern Queensland (NQ) accession responded to drought by slowing growth, inhibiting flowering, increasing leaf cuticle thickness, and altering cuticular wax composition. Under water stress, NQ increased the heat stability and water impermeability of its cuticle by extending the carbon backbone of cuticular long-chain alkanes from c. 25 to 33. This correlated with rapid upregulation of at least five wax biosynthesis genes. In CA, the alkane chain lengths (c. 25) and gene expression profiles remained largely unaltered. This study highlights complex genetic and environmental control over cuticle composition and provides evidence for divergence into at least two fundamentally different drought response strategies within the N. benthamiana species in < 1 million years.
Subject(s)
Droughts , Plant Gene Expression Regulation , Nicotiana , Waxes , Nicotiana/genetics , Nicotiana/physiology , Waxes/metabolism , Plant Leaves/physiology , Plant Leaves/anatomy & histology , Species Specificity , Water/metabolism , Plant Genes , Physiological Stress , Plant Epidermis/physiology , Flowers/physiology , Flowers/anatomy & histology , Phenotype , Alkanes/metabolism , Australia
ABSTRACT
BACKGROUND: Gastro-oesophageal reflux disease (GORD) is associated with idiopathic pulmonary fibrosis (IPF) in observational studies. It is not known if this association arises because GORD causes IPF or because IPF causes GORD, or because of confounding by factors, such as smoking, associated with both GORD and IPF. We used bidirectional Mendelian randomisation (MR), where genetic variants are used as instrumental variables to address issues of confounding and reverse causation, to examine how, if at all, GORD and IPF are causally related. METHODS: A bidirectional two-sample MR was performed to estimate the causal effect of GORD on IPF risk and of IPF on GORD risk, using genetic data from the largest GORD (78 707 cases and 288 734 controls) and IPF (4125 cases and 20 464 controls) genome-wide association meta-analyses currently available. RESULTS: GORD increased the risk of IPF, with an OR of 1.6 (95% CI 1.04-2.49; p=0.032). There was no evidence of a causal effect of IPF on the risk of GORD, with an OR of 0.999 (95% CI 0.997-1.000; p=0.245). CONCLUSIONS: We found that GORD increases the risk of IPF, but found no evidence that IPF increases the risk of GORD. GORD should be considered in future studies of IPF risk and interest in it as a potential therapeutic target should be renewed. The mechanisms underlying the effect of GORD on IPF should also be investigated.
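As an illustration of how a two-sample MR estimate of this kind can be formed from summary statistics, the following is a minimal sketch of the fixed-effect inverse-variance-weighted (IVW) estimator. The abstract does not state which MR estimator was used, so IVW is an assumption here, and the per-variant effect sizes are hypothetical values chosen only to make the example run.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure-to-outcome effect.

    Each argument is a list with one entry per genetic variant:
    variant-exposure effect, variant-outcome effect (log odds),
    and standard error of the variant-outcome effect.
    Returns (estimate, standard_error) on the log-odds scale.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical summary statistics for three variants (not from the study).
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.09, 0.08]
se = [0.02, 0.03, 0.025]

est, se_est = ivw_estimate(bx, by, se)
odds_ratio = math.exp(est)
ci = (math.exp(est - 1.96 * se_est), math.exp(est + 1.96 * se_est))
print(f"OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

For the reverse direction, the same function would be called with the roles of exposure and outcome swapped and a different set of instruments.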
Subject(s)
Gastroesophageal Reflux , Idiopathic Pulmonary Fibrosis , Humans , Gastroesophageal Reflux/complications , Gastroesophageal Reflux/genetics , Gastroesophageal Reflux/drug therapy , Genome-Wide Association Study , Idiopathic Pulmonary Fibrosis/genetics , Idiopathic Pulmonary Fibrosis/complications
ABSTRACT
OBJECTIVE: Gastro-oesophageal reflux disease (GERD) has heterogeneous aetiology primarily attributable to its symptom-based definitions. GERD genome-wide association studies (GWASs) have shown strong genetic overlaps with established risk factors such as obesity and depression. We hypothesised that the shared genetic architecture between GERD and these risk factors can be leveraged to (1) identify new GERD and Barrett's oesophagus (BE) risk loci and (2) explore potentially heterogeneous pathways leading to GERD and oesophageal complications. DESIGN: We applied multitrait GWAS models combining GERD (78 707 cases; 288 734 controls) and genetically correlated traits including education attainment, depression and body mass index. We also used multitrait analysis to identify BE risk loci. Top hits were replicated in 23andMe (462 753 GERD cases, 24 099 BE cases, 1 484 025 controls). We additionally dissected the GERD loci into obesity-driven and depression-driven subgroups. These subgroups were investigated to determine how they relate to tissue-specific gene expression and to risk of serious oesophageal disease (BE and/or oesophageal adenocarcinoma, EA). RESULTS: We identified 88 loci associated with GERD, with 59 replicating in 23andMe after multiple testing corrections. Our BE analysis identified seven novel loci. Additionally we showed that only the obesity-driven GERD loci (but not the depression-driven loci) were associated with genes enriched in oesophageal tissues and successfully predicted BE/EA. CONCLUSION: Our multitrait model identified many novel risk loci for GERD and BE. We present strong evidence for a genetic underpinning of disease heterogeneity in GERD and show that GERD loci associated with depressive symptoms are not strong predictors of BE/EA relative to obesity-driven GERD loci.
Subject(s)
Barrett Esophagus , Esophageal Neoplasms , Peptic Esophagitis , Gastroesophageal Reflux , Barrett Esophagus/complications , Barrett Esophagus/diagnosis , Barrett Esophagus/genetics , Esophageal Neoplasms/diagnosis , Esophageal Neoplasms/genetics , Gastroesophageal Reflux/complications , Gastroesophageal Reflux/diagnosis , Gastroesophageal Reflux/genetics , Genome-Wide Association Study , Humans , Obesity/complications , Obesity/genetics
ABSTRACT
Alcohol consumption is correlated positively with risk for breast cancer in observational studies, but observational studies are subject to reverse causation and confounding. The association with epithelial ovarian cancer (EOC) is unclear. We performed both observational Cox regression and two-sample Mendelian randomization (MR) analyses using data from various European cohort studies (observational) and publicly available cancer consortia (MR). These estimates were compared to World Cancer Research Fund (WCRF) findings. In our observational analyses, the multivariable-adjusted hazard ratio (HR) for a one standard drink/day increase was 1.06 (95% confidence interval [CI]: 1.04, 1.08) for breast cancer and 1.00 (0.92, 1.08) for EOC, both of which were consistent with previous WCRF findings. MR odds ratios (ORs) per genetically predicted one standard drink/day increase, estimated via 34 SNPs using MR-PRESSO, were 1.00 (0.93, 1.08) for breast cancer and 0.95 (0.85, 1.06) for EOC. Stratification by EOC subtype or estrogen receptor status in breast cancers made no meaningful difference to the results. For breast cancer, the CIs for the genetically derived estimates include the point estimate from observational studies, so they are not inconsistent with a small increase in risk. Our data provide additional evidence that alcohol intake is unlikely to have anything other than a very small effect on risk of EOC.
Subject(s)
Alcohol Drinking/adverse effects , Breast Neoplasms/epidemiology , Ovarian Epithelial Carcinoma/epidemiology , Ovarian Neoplasms/epidemiology , Causality , Cohort Studies , Female , Humans , Mendelian Randomization Analysis , Odds Ratio
ABSTRACT
The keratinocyte cancers (KC), basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) are the most common cancers in fair-skinned people. KC treatment represents the second highest cancer healthcare expenditure in Australia. Increasing our understanding of the genetic architecture of KC may provide new avenues for prevention and treatment. We first conducted a series of genome-wide association studies (GWAS) of KC across three European ancestry datasets from Australia, Europe and USA, and used linkage disequilibrium (LD) Score regression (LDSC) to estimate their pairwise genetic correlations. We employed a multiple-trait approach to map genes across the combined set of KC GWAS (total N = 47 742 cases, 634 413 controls). We also performed meta-analyses of BCC and SCC separately to identify trait specific loci. We found substantial genetic correlations (generally 0.5-1) between BCC and SCC suggesting overlapping genetic risk variants. The multiple trait combined KC GWAS identified 63 independent genome-wide significant loci, 29 of which were novel. Individual separate meta-analyses of BCC and SCC identified an additional 13 novel loci not found in the combined KC analysis. Three new loci were implicated using gene-based tests. New loci included common variants in BRCA2 (distinct to known rare high penetrance cancer risk variants), and in CTLA4, a target of immunotherapy in melanoma. We found shared and trait specific genetic contributions to BCC and SCC. Considering both, we identified a total of 79 independent risk loci, 45 of which are novel.
Subject(s)
Basal Cell Carcinoma/genetics , Squamous Cell Carcinoma/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Keratinocytes/metabolism , Quantitative Trait Loci , Skin Neoplasms/genetics , Alleles , Basal Cell Carcinoma/metabolism , Basal Cell Carcinoma/pathology , Squamous Cell Carcinoma/metabolism , Squamous Cell Carcinoma/pathology , Case-Control Studies , Computational Biology/methods , Gene Expression Profiling , Humans , Keratinocytes/pathology , Molecular Sequence Annotation , Odds Ratio , Single Nucleotide Polymorphism , Heritable Quantitative Trait , Skin Neoplasms/metabolism , Skin Neoplasms/pathology
ABSTRACT
Optic nerve head morphology is affected by several retinal diseases. We measured the vertical optic disc diameter (DD) of the UK Biobank (UKBB) cohort (N = 67 040) and performed the largest genome-wide association study (GWAS) of DD to date. We identified 81 loci (66 novel) for vertical DD. We then replicated the novel loci in International Glaucoma Genetic Consortium (IGGC, N = 22 504) and European Prospective Investigation into Cancer-Norfolk (N = 6005); in general the concordance in effect sizes was very high (correlation in effect size estimates 0.90): 44 of the 66 novel loci were significant at P < 0.05, with 19 remaining significant after Bonferroni correction. We identified another 26 novel loci in the meta-analysis of UKBB and IGGC data. Gene-based analyses identified an additional 57 genes. Human ocular tissue gene expression analysis showed that most of the identified genes are enriched in optic nerve head tissue. Some of the identified loci exhibited pleiotropic effects with vertical cup-to-disc ratio, intraocular pressure, glaucoma and myopia. These results can enhance our understanding of the genetics of optic disc morphology and shed light on the genetic findings for other ophthalmic disorders such as glaucoma and other optic nerve diseases.
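The replication step above applies a Bonferroni correction across the 66 novel loci, which corresponds to a per-locus threshold of 0.05/66 (about 7.6 × 10⁻⁴). A minimal sketch of that arithmetic follows; the function name and the example p-values are hypothetical and not taken from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the Bonferroni threshold and the p-values that pass it.

    The family-wise error rate alpha is divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    return threshold, [p for p in p_values if p < threshold]

# Threshold for 66 replication tests, as in the abstract.
threshold_66 = 0.05 / 66
print(f"per-locus threshold for 66 tests: {threshold_66:.2e}")

# Hypothetical replication p-values for four loci.
threshold, hits = bonferroni_significant([0.0001, 0.01, 0.04, 0.2])
print(threshold, hits)
```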
Subject(s)
Genome-Wide Association Study , Glaucoma/genetics , Optic Disk/anatomy & histology , Adult , Aged , Factual Databases , Female , Gene Expression , Glaucoma/metabolism , Humans , Male , Middle Aged , Optic Disk/metabolism , Single Nucleotide Polymorphism , Prospective Studies
ABSTRACT
There is considerable debate regarding the role that 25-hydroxyvitamin D [25(OH)D] concentrations play in cancer risk or mortality, with earlier studies drawing mixed conclusions. Using data from the UK Biobank (UKB), we evaluate whether genetically predicted 25(OH)D concentrations are associated with overall cancer susceptibility and cancer mortality using five 25(OH)D genetic markers. Data comprised 438 870 white British UKB participants aged 37-73, including 46 155 cancer cases and 6998 cancer-specific deaths. Participants with keratinocyte cancers and/or benign tumors were excluded from the analysis. Odds ratios were calculated per 20 nmol/L increase in genetically predicted 25(OH)D for cancer risk and cancer mortality. For individual cancer risks, estimates were meta-analyzed with publicly available data using a fixed-effect inverse-variance-weighted model. We demonstrated that genetically low plasma 25(OH)D concentrations were not associated with increased cancer risk or cancer mortality. Stratification by sex or cancer type did not reveal any meaningful differences, albeit with wider confidence intervals. Fixed-effect meta-analysis of our individual cancer risk estimates with those derived from publicly available cancer consortia data and previous studies further reinforced our null Mendelian randomization findings on prostate, lung, colorectal and breast cancers with tight confidence intervals; for ovarian and pancreatic cancers, our estimates were also not statistically significant but were less precise. Taken together, our results provide no genetic evidence for an association between vitamin D and overall cancer outcomes, with tight confidence intervals to exclude all but very small effect sizes.
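The fixed-effect inverse-variance-weighted model mentioned above pools estimates by weighting each one by the inverse of its variance. The following is a minimal sketch of that calculation on the log odds-ratio scale; the input numbers are hypothetical and do not come from the study or the consortia.

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Pool estimates with inverse-variance weights (fixed-effect model).

    Returns (pooled_estimate, pooled_standard_error).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical log odds ratios for one cancer type from two sources.
log_ors = [0.02, -0.01]
ses = [0.05, 0.03]

pooled, pooled_se = fixed_effect_meta(log_ors, ses)
low = math.exp(pooled - 1.96 * pooled_se)
high = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled OR = {math.exp(pooled):.3f}, 95% CI {low:.3f}-{high:.3f}")
```

Because the more precise source receives the larger weight, the pooled standard error is always smaller than the smallest input standard error, which is why pooling tightens the confidence intervals.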
Subject(s)
Mendelian Randomization Analysis , Neoplasms/blood , Neoplasms/genetics , Vitamin D/blood , Adult , Aged , Female , Genetic Predisposition to Disease , Genotype , Humans , Male , Middle Aged , Neoplasms/mortality , Neoplasms/pathology , Single Nucleotide Polymorphism , Risk Factors , Vitamin D/analogs & derivatives , White People
ABSTRACT
BACKGROUND: Depression is a clinically heterogeneous disorder. Previous large-scale genetic studies of depression have explored genetic risk factors of depression case-control status or aggregated sums of depressive symptoms, ignoring possible clinical or genetic heterogeneity. METHODS: We analyse data from 148 752 subjects of white British ancestry in the UK Biobank who completed nine items of a self-rated measure of current depressive symptoms: the Patient Health Questionnaire (PHQ-9). Genome-Wide Association analyses were conducted for nine symptoms and two composite measures. LD Score Regression was used to calculate SNP-based heritability (h²SNP) and genetic correlations (rg) across symptoms and to investigate genetic correlations with 25 external phenotypes. Genomic structural equation modelling was used to test the genetic factor structure across the nine symptoms. RESULTS: We identified nine genome-wide significant genomic loci (8 novel), with no overlap in loci across symptoms. h²SNP ranged from 6% (concentration problems) to 9% (appetite changes). Genetic correlations ranged from 0.54 to 0.96 (all p < 1.39 × 10⁻³) with 30 of 36 correlations being significantly smaller than one. A two-factor model provided the best fit to the genetic covariance matrix, with factors representing 'psychological' and 'somatic' symptoms. The genetic correlations with external phenotypes showed large variation across the nine symptoms. CONCLUSIONS: Patterns of SNP associations and genetic correlations differ across the nine symptoms, suggesting that current depressive symptoms are genetically heterogeneous. Our study highlights the value of symptom-level analyses in understanding the genetic architecture of a psychiatric trait. Future studies should investigate whether genetic heterogeneity is recapitulated in clinical symptoms of major depression.
Subject(s)
Depression/genetics , Genetic Heterogeneity , Genetic Loci , Genetic Predisposition to Disease , Aged , Aged 80 and over , Case-Control Studies , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Patient Health Questionnaire , Phenotype , Self Report , United Kingdom , White People/genetics
ABSTRACT
BACKGROUND: Frequency and quantity of alcohol consumption are metrics commonly used to measure alcohol consumption behaviors. Epidemiological studies indicate that these alcohol consumption measures are differentially associated with (mental) health outcomes and socioeconomic status (SES). The current study aims to elucidate to what extent genetic risk factors are shared between frequency and quantity of alcohol consumption, and how these alcohol consumption measures are genetically associated with four broad phenotypic categories: (i) SES; (ii) substance use disorders; (iii) other psychiatric disorders; and (iv) psychological/personality traits. METHODS: Genome-Wide Association analyses were conducted to test genetic associations with alcohol consumption frequency (N = 438 308) and alcohol consumption quantity (N = 307 098 regular alcohol drinkers) within UK Biobank. For the other phenotypes, we used genome-wide association studies summary statistics. Genetic correlations (rg) between the alcohol measures and other phenotypes were estimated using LD score regression. RESULTS: We found a substantial genetic correlation between the frequency and quantity of alcohol consumption (rg = 0.52). Nevertheless, both measures consistently showed opposite genetic correlations with SES traits, and many substance use, psychiatric, and psychological/personality traits. High alcohol consumption frequency was genetically associated with high SES and low risk of substance use disorders and other psychiatric disorders, whereas the opposite applies for high alcohol consumption quantity. CONCLUSIONS: Although the frequency and quantity of alcohol consumption show substantial genetic overlap, they consistently show opposite patterns of genetic associations with SES-related phenotypes. Future studies should carefully consider the potential influence of SES on the shared genetic etiology between alcohol and adverse (mental) health outcomes.
Subject(s)
Alcohol Drinking/genetics , Mental Health , Social Class , Adult , Aged , Alcoholism/genetics , Biological Specimen Banks , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Substance-Related Disorders/genetics , United Kingdom
ABSTRACT
Serum C-reactive protein (CRP), an important inflammatory marker, has been associated with age-related macular degeneration (AMD) in observational studies; however, the findings are inconsistent. It remains unclear whether the association between circulating CRP levels and AMD is causal. We used two-sample Mendelian randomization (MR) to evaluate the potential causal relationship between serum CRP levels and AMD risk. We derived genetic instruments for serum CRP levels in 418,642 participants of European ancestry from UK Biobank, and then conducted a genome-wide association study for 12,711 advanced AMD cases and 14,590 controls of European descent from the International AMD Genomics Consortium. Genetic variants which predicted elevated serum CRP levels were associated with advanced AMD (odds ratio [OR] per standard deviation increase in serum CRP levels: 1.31, 95% confidence interval [CI]: 1.19-1.44, P = 5.2 × 10⁻⁸). The OR for the increase in advanced AMD risk when moving from low (< 3 mg/L) to high (> 3 mg/L) CRP levels is 1.29 (95% CI: 1.17-1.41). Our results were unchanged in sensitivity analyses using MR models which make different modelling assumptions. Our findings were broadly similar across the different forms of AMD (intermediate AMD, choroidal neovascularization, and geographic atrophy). We used multivariable MR to adjust for the effects of other potential AMD risk factors including smoking, body mass index, blood pressure and cholesterol; this did not alter our findings. Our study provides strong genetic evidence that higher circulating CRP levels lead to increases in risk for all forms of AMD. These findings highlight the potential utility of circulating CRP as a biomarker in future trials aimed at modulating AMD risk via systemic therapies.
Subject(s)
C-Reactive Protein/genetics , Macular Degeneration/blood , Macular Degeneration/genetics , Mendelian Randomization Analysis , Aged , Aged 80 and over , C-Reactive Protein/metabolism , Case-Control Studies , Female , Genome-Wide Association Study , Genotype , Humans , Macular Degeneration/epidemiology , Male , Middle Aged , Single Nucleotide Polymorphism , Risk Factors
ABSTRACT
Genome-wide association studies (GWAS) have become an essential technology for exploring the genetic mechanisms of complex traits. To reduce computational complexity, it is well accepted to remove unrelated single nucleotide polymorphisms (SNPs) before GWAS, e.g., by using the iterative sure independence screening expectation-maximization Bayesian Lasso (ISIS EM-BLASSO) method. In this work, a modified version of ISIS EM-BLASSO is proposed, which reduces the number of SNPs by a screening methodology based on Pearson correlation and mutual information, then estimates the effects via EM-Bayesian Lasso (EM-BLASSO), and finally detects the true quantitative trait nucleotides (QTNs) through a likelihood ratio test. We call our method a two-stage mutual information based Bayesian Lasso (MBLASSO). Under three simulation scenarios, MBLASSO improves statistical power and retains higher effect estimation accuracy compared with three other algorithms. Moreover, MBLASSO performs best on model fitting, the accuracy of its detected associations is the highest, and 21 genes are detected only by MBLASSO in Arabidopsis thaliana datasets.
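To make the screening idea concrete, the following is a minimal sketch of ranking SNPs by absolute Pearson correlation and by mutual information with the phenotype. It is not the MBLASSO implementation: the abstract does not say how the two scores are combined or how the phenotype is discretised, so the union of the top-ranked SNPs and the median split used here are assumptions, and the data are hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def screen_snps(genotypes, phenotype, keep):
    """Keep the union of the top `keep` SNPs under each score.

    genotypes: dict mapping SNP name to a list of 0/1/2 allele counts.
    phenotype: list of quantitative trait values.
    The phenotype is median-split before computing mutual information
    (an assumption made for this sketch).
    """
    median = sorted(phenotype)[len(phenotype) // 2]
    binary = [int(v >= median) for v in phenotype]
    by_r = sorted(genotypes, reverse=True,
                  key=lambda s: abs(pearson(genotypes[s], phenotype)))
    by_mi = sorted(genotypes, reverse=True,
                   key=lambda s: mutual_information(genotypes[s], binary))
    return set(by_r[:keep]) | set(by_mi[:keep])

# Hypothetical data: snp_a tracks the trait, the others do not.
phenotype = [1.0, 1.2, 2.0, 2.1, 3.0, 3.2]
genotypes = {
    "snp_a": [0, 0, 1, 1, 2, 2],
    "snp_b": [0, 1, 2, 0, 1, 2],
    "snp_c": [1, 1, 1, 1, 1, 1],
}
selected = screen_snps(genotypes, phenotype, keep=1)
print(selected)
```

In a pipeline of the kind described, only the SNPs that survive this screen would be passed on to the effect-estimation stage.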
ABSTRACT
BACKGROUND: Whether body mass index (BMI) is causally associated with the risk of being diagnosed with or dying from any cancer remains unclear. Weight reduction has clinical importance for cancer control only if weight gain causes cancer development or death. We aimed to answer the question 'does genetically predicted BMI influence my risk of being diagnosed with or dying from any cancer?'. METHODS: We used a Mendelian randomisation (MR) approach to estimate the causal effect of BMI in 46,155 white-British participants aged between 40 and 69 years at recruitment (median age at follow-up 61 years) from the UK Biobank, who developed any type of cancer, among whom 6998 died from cancer. To derive MR instruments for BMI, we selected up to 390,628 cancer-free participants. RESULTS: For each standard deviation (4.78 units) increase in genetically predicted BMI, we estimated a causal odds ratio (COR) of 1.07 (1.02-1.12) and 1.28 (1.16-1.41) for overall cancer risk and mortality, respectively. The corresponding estimates were similar for males and females, and smokers and non-smokers. CONCLUSIONS: Higher genetically predicted BMI increases the risk of being diagnosed with or dying from any cancer. These data suggest that increased overall weight may causally increase overall cancer incidence and mortality among Europeans.
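Readers who want the odds ratios above on a per-BMI-unit scale rather than per standard deviation can rescale them, assuming the effect is linear on the log-odds scale. That linearity is an assumption of this sketch and not a claim made in the abstract, and the function name is hypothetical.

```python
import math

def rescale_odds_ratio(odds_ratio, from_units, to_units=1.0):
    """Rescale an odds ratio reported per `from_units` of exposure
    to one per `to_units`, assuming log-linearity in the exposure."""
    return math.exp(math.log(odds_ratio) * to_units / from_units)

# Reported: OR 1.07 per 4.78 BMI units (one standard deviation).
per_unit_risk = rescale_odds_ratio(1.07, from_units=4.78)
per_unit_mortality = rescale_odds_ratio(1.28, from_units=4.78)
print(f"risk OR per BMI unit: {per_unit_risk:.3f}")
print(f"mortality OR per BMI unit: {per_unit_mortality:.3f}")
```

Under this assumption, the overall cancer risk estimate corresponds to an odds ratio of roughly 1.014 per BMI unit.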
Subject(s)
Neoplasms/epidemiology , Neoplasms/mortality , Obesity/epidemiology , Adult , Aged , Body Mass Index , Female , Humans , Male , Mendelian Randomization Analysis , Middle Aged , Neoplasms/genetics , Obesity/genetics , Overweight/epidemiology , Overweight/genetics , United Kingdom , White People
ABSTRACT
BACKGROUND: Observational studies have shown that being taller is associated with greater cancer risk. However, the interpretation of such studies can be hampered by important issues such as confounding and reporting bias. METHODS: We used the UK Biobank resource to develop genetic predictors of height and applied these in a Mendelian randomisation framework to estimate the causal relationship between height and cancer. Up to 438,870 UK Biobank participants were considered in our analysis. We addressed two primary cancer outcomes, cancer incidence by age ~60 and cancer mortality by age ~60 (where age ~60 is the typical age of UK Biobank participants). RESULTS: We found that each genetically predicted 9 cm increase in height conferred an odds ratio of 1.10 (95% confidence interval 1.07-1.13) and 1.09 (1.02-1.16) for diagnosis of any cancer and death from any cancer, respectively. For both risk and mortality, the effect was larger in females than in males. CONCLUSIONS: Height increases the risk of being diagnosed with and dying from cancer. These findings from Mendelian randomisation analyses agree with observational studies and provide evidence that they were not likely to have been strongly affected by confounding or reporting bias.
Subject(s)
Biological Specimen Banks/statistics & numerical data , Body Height/physiology , Neoplasms/epidemiology , Case-Control Studies , Factual Databases/statistics & numerical data , Female , Humans , Male , Mendelian Randomization Analysis , Middle Aged , Mortality , Neoplasms/mortality , Registries/statistics & numerical data , Risk Factors , United Kingdom/epidemiology
ABSTRACT
BACKGROUND: Zinc-finger protein 384 (ZNF384) fusions are an emerging subtype of precursor B-cell acute lymphoblastic leukaemia (pre-B-ALL) and here we further characterised their prevalence, survival outcomes and transcriptome. METHODS: Bone marrow mononuclear cells from 274 BCR-ABL1-negative pre-B-ALL patients were immunophenotyped and their transcriptomes molecularly characterised. Transcriptomic data were analysed by principal component analysis and gene-set enrichment analysis to identify gene and pathway expression changes. RESULTS: We exclusively detect E1A-associated protein p300 (EP300)-ZNF384 in 5.7% of BCR-ABL1-negative adolescent/young adult (AYA)/adult pre-B-ALL patients. EP300-ZNF384 patients do not appear to be a high-risk subgroup. Transcriptomic analysis revealed that EP300-ZNF384 samples have a distinct gene expression profile that results in the up-regulation of Janus kinase/signal transducers and activators of transcription (JAK/STAT) and cell adhesion pathways and down-regulation of cell cycle and DNA repair pathways. CONCLUSIONS: Importantly, this report contributes to a better overview of the incidence of EP300-ZNF384 patients and shows that they have a distinct gene signature with concurrent up-regulation of the JAK-STAT pathway, reduced expression of B-cell regulators and reduced DNA repair capacity.
Subject(s)
E1A-Associated p300 Protein/genetics , Oncogene Fusion Proteins/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/epidemiology , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Trans-Activators/genetics , Transcriptome , Adolescent , Adult , Child , Female , Gene Expression Profiling , Leukemic Gene Expression Regulation , Gene Frequency , abl Genes/genetics , Humans , Janus Kinases/metabolism , Male , Precursor Cell Lymphoblastic Leukemia-Lymphoma/mortality , Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology , Recurrence , STAT Transcription Factors/metabolism , Signal Transduction/genetics , Survival Analysis , Young Adult
ABSTRACT
The kallikrein-related peptidase gene family (KLKs) comprises 15 genes located at 19q13.3-13.4. KLKs have chymotrypsin- and/or trypsin-like activity, but the tissue/organ expression profile of each KLK varies considerably. Thus, the role of KLKs in human biology is also very diverse, and the deregulation of their function results in a wide range of diseases. Here, we have cataloged the transcript (variants and fusions) and genetic (single nucleotide polymorphisms, small insertions/deletions, copy number variations (CNVs), and short tandem repeats) diversity at the KLK locus, providing a data set for researchers to explore the mechanisms through which KLK function may be deregulated. We reveal that the KLK locus hosts 85 fusion transcripts and 80 variant transcripts. Interestingly, some fusion transcripts comprise up to 6 KLK genes. Our analysis of genetic variation in 2504 individuals from the 1000 Genomes Project indicated that the KLK locus is rich in genetic diversity, with some fusion transcripts harboring over 1000 single nucleotide variations. We also found evidence from the literature linking 2387 KLK genetic variants with many types of diseases. Finally, genotyping data for 131 KLK genetic variants in the NCI-60 cancer cell lines are provided as a resource for the cancer and KLK field.
Subject(s)
Genetic Loci/genetics , Genetic Variation , Genomics , Kallikreins/genetics , Cluster Analysis , Humans , Messenger RNA/genetics , Messenger RNA/metabolism
ABSTRACT
SUMMARY: Circos plots are graphical outputs that display three-dimensional chromosomal interactions and fusion transcripts. However, the Circos plot tool is not an interactive visualization tool, but rather a figure generator. For example, it does not enable data to be added dynamically, nor does it provide information for specific data points interactively. Recently, an R-based Circos tool (RCircos) has been developed to integrate Circos into R, but similarly, RCircos can only be used to generate plots. Thus, we have developed a Circos plot tool (J-Circos) that is an interactive visualization tool: it can plot Circos figures, dynamically add data to the figure, and provide information for specific data points using mouse hover display and zoom in/out functions. J-Circos is written in Java, enabling it to be used on most operating systems (Windows, MacOS, Linux). Users can input data into J-Circos using flat data formats, as well as from the graphical user interface (GUI). J-Circos will enable biologists to better study more complex chromosomal interactions and fusion transcripts that are otherwise difficult to visualize from next-generation sequencing data. AVAILABILITY AND IMPLEMENTATION: J-Circos and its manual are freely available at http://www.australianprostatecentre.org/research/software/jcircos CONTACT: j.an@qut.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Chromosomes , Computer Graphics , Gene Fusion , Software
ABSTRACT
BACKGROUND: Strand-specific RNAseq data are now more common in RNAseq projects, and visualizing RNAseq data has become an important part of the analysis of sequencing data. The most widely used visualization tool is the UCSC genome browser, which introduced the custom track concept that enables researchers to simultaneously visualize gene expression at a particular locus from multiple experiments. The objective of our software tool is to provide a user-friendly interface for the visualization of RNAseq datasets. RESULTS: This paper introduces a visualization tool (RNASeqBrowser) that incorporates and extends the functionality of the UCSC genome browser. For example, RNASeqBrowser simultaneously displays read coverage, SNPs, InDels and raw read tracks alongside other BED and wiggle tracks, all dynamically built from the BAM file. Paired reads are also connected in the browser to enable easier identification of novel exon/intron borders and chimaeric transcripts. RNASeqBrowser also supports strand-specific RNAseq data, displaying reads above (positive-strand transcripts) or below (negative-strand transcripts) a central line. Finally, RNASeqBrowser was designed for ease of use by users with few bioinformatic skills, and incorporates the features of many genome browsers into one platform. CONCLUSIONS: The features of RNASeqBrowser are as follows. (1) RNASeqBrowser integrates the UCSC genome browser and NGS visualization tools such as IGV. It extends the functionality of the UCSC genome browser by adding several new types of tracks to show NGS data such as individual raw reads, SNPs and InDels. (2) RNASeqBrowser can dynamically generate RNA secondary structure, which is useful for identifying non-coding RNA such as miRNA. (3) Overlaying NGS wiggle data is helpful in displaying differential expression and is simple to implement in RNASeqBrowser. (4) NGS data accumulate a large number of raw reads. RNASeqBrowser therefore collapses exact duplicate reads to reduce visualization space, so that standard PCs can show many windows of individual NGS raw reads without much delay. (5) Multiple popup windows of individual raw reads provide users with more viewing space. This avoids the approach of existing tools (such as IGV), which squeeze all raw reads into one window, and is helpful for visualizing multiple datasets simultaneously. RNASeqBrowser and its manual are freely available at http://www.australianprostatecentre.org/research/software/rnaseqbrowser or http://sourceforge.net/projects/rnaseqbrowser/.
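The duplicate-collapsing step described in point (4) can be illustrated with a minimal Python sketch. It assumes that an "exact duplicate" means the same chromosome, start position, strand and sequence; the abstract does not define the criterion, and the `Read` type and `collapse_duplicates` function are hypothetical names, not part of RNASeqBrowser (which is a separate application).

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not RNASeqBrowser's data model)."""
    chrom: str
    start: int     # leftmost aligned position
    strand: str    # "+" or "-"
    sequence: str


def collapse_duplicates(reads: Iterable[Read]) -> List[Tuple[Read, int]]:
    """Return each distinct read once, paired with its number of copies.

    Assumption: two reads are exact duplicates when chromosome, start,
    strand and sequence are all identical.
    """
    counts = Counter(reads)
    # Sort by position so the collapsed reads can be drawn left to right.
    return sorted(counts.items(), key=lambda item: item[0])


if __name__ == "__main__":
    reads = [
        Read("chr1", 100, "+", "ACGT"),
        Read("chr1", 100, "+", "ACGT"),   # exact duplicate of the first read
        Read("chr1", 100, "-", "ACGT"),   # same position, opposite strand
        Read("chr1", 50, "+", "GGCC"),
    ]
    for read, copies in collapse_duplicates(reads):
        print(read, copies)
```

Keeping the copy count next to each collapsed read means a viewer could draw one glyph per distinct read and still report coverage.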
Subject(s)
Databases, Genetic , Genome , Sequence Analysis, RNA/methods , Software , Computational Biology/methods , INDEL Mutation/genetics , Internet
ABSTRACT
BACKGROUND: Fusion transcripts are found in many tissues and have the potential to create novel functional products. Here, we investigate the genomic sequences around fusion junctions to better understand the transcriptional mechanisms mediating fusion transcription/splicing. We analyzed data from prostate (cancer) cells, as previous studies have shown extensively that these cells readily undergo fusion transcription. RESULTS: We used the FusionMap program to identify high-confidence fusion transcripts from RNAseq data. The RNAseq datasets were from our (N = 8) and other (N = 14) clinical prostate tumors with adjacent non-cancer cells, and from the LNCaP prostate cancer cell line that was mock-, androgen- (DHT), and anti-androgen- (bicalutamide, enzalutamide) treated. In total, 185 fusion transcripts were identified from all RNAseq datasets. The majority (76%) of these fusion transcripts were 'read-through chimeras' derived from adjacent genes in the genome. Characterization of sequences at fusion loci was carried out using a combination of the FusionMap program, custom Perl scripts, and the RNAfold program. Our computational analysis indicated that most fusion junctions (76%) use the consensus GT-AG intron donor-acceptor splice site, and most fusion transcripts (85%) maintained the open reading frame. We assessed whether the parental genes of fusion transcripts have the potential to form complementary base pairs with each other, which might bring them into physical proximity. Our computational analysis of sequences flanking fusion junctions at parental loci indicates that these loci have a similar propensity to hybridize as non-fusion loci. The abundance of repetitive sequences at fusion and non-fusion loci was also investigated, given that SINE repeats are involved in aberrant gene transcription. We found few instances of repetitive sequences at both fusion and non-fusion junctions. Finally, RT-qPCR was performed on RNA from both clinical prostate tumors and adjacent non-cancer cells (N = 7), and from LNCaP cells treated as above, to validate the expression of seven fusion transcripts and their respective parental genes. We reveal that fusion transcript expression is similar to the expression of parental genes. CONCLUSIONS: Fusion transcripts maintain the open reading frame, and likely use the same transcriptional machinery as non-fusion transcripts, as they share many genomic features at splice/fusion junctions.
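The two junction properties reported above, use of the consensus GT-AG donor-acceptor pair and maintenance of the open reading frame, can be illustrated with a minimal Python sketch. It is not the authors' FusionMap or Perl pipeline; the function names, inputs and the modulo-3 frame test are assumptions made here for illustration.

```python
def uses_gt_ag(donor_intron_start: str, acceptor_intron_end: str) -> bool:
    """Check for the consensus GT...AG splice-site pair.

    donor_intron_start: intronic bases immediately downstream of the
        5' partner's junction (expected to begin with GT).
    acceptor_intron_end: intronic bases immediately upstream of the
        3' partner's junction (expected to end with AG).
    """
    return (donor_intron_start[:2].upper() == "GT"
            and acceptor_intron_end[-2:].upper() == "AG")


def keeps_reading_frame(upstream_cds_len: int, downstream_cds_offset: int) -> bool:
    """Check whether the 3' partner is still translated in its own frame.

    upstream_cds_len: coding bases contributed by the 5' partner
        before the junction.
    downstream_cds_offset: coding bases of the 3' partner that precede
        the junction in its normal transcript (and are lost in the fusion).
    Assumption: the frame is kept when both values agree modulo 3, so
    the first retained codon position of the 3' partner is unchanged.
    """
    return upstream_cds_len % 3 == downstream_cds_offset % 3


if __name__ == "__main__":
    print(uses_gt_ag("GTAAGT", "TTTCAG"))
    print(keeps_reading_frame(300, 99))
    print(keeps_reading_frame(301, 99))
```

In practice the flanking bases would be taken from the reference genome at the junction coordinates reported by a fusion caller, with reverse-complementing for genes on the minus strand; that step is omitted here.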
Subject(s)
Gene Expression Regulation, Neoplastic , Prostatic Neoplasms/genetics , Quantitative Trait Loci , RNA Splicing , Transcription, Genetic , Androgens/pharmacology , Antineoplastic Agents, Hormonal/pharmacology , Computational Biology/methods , Conserved Sequence , Datasets as Topic , Gene Expression Regulation, Neoplastic/drug effects , High-Throughput Nucleotide Sequencing , Humans , Male , Nucleotide Motifs , RNA Splice Sites , Repetitive Sequences, Nucleic Acid
ABSTRACT
miRDeep and its variants are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface that shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and saved in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are coded purely in Java. The application can be run on a standard personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools on our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star.
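The idea of measuring expression by the number of reads can be illustrated with a minimal Python sketch. It is not miRDeep*'s implementation (miRDeep* is a Java application). The interval layout, the `count_reads_per_mirna` name and the rule that a read counts only when it lies entirely within the miRNA interval on the same strand are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical interval layout: (chrom, start, end, strand), 0-based, half-open.
Interval = Tuple[str, int, int, str]


def count_reads_per_mirna(mirnas: Dict[str, Interval],
                          reads: Iterable[Interval]) -> Counter:
    """Count aligned reads per miRNA.

    Assumption: a read is assigned to a miRNA when it is on the same
    chromosome and strand and lies entirely within the miRNA interval.
    """
    counts = Counter({name: 0 for name in mirnas})
    for chrom, start, end, strand in reads:
        for name, (m_chrom, m_start, m_end, m_strand) in mirnas.items():
            if (chrom == m_chrom and strand == m_strand
                    and start >= m_start and end <= m_end):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    mirnas = {"mir-A": ("chr1", 100, 122, "+")}   # made-up example locus
    reads = [
        ("chr1", 100, 122, "+"),   # full-length match
        ("chr1", 101, 121, "+"),   # trimmed read, still inside
        ("chr1", 100, 122, "-"),   # wrong strand
        ("chr1", 90, 112, "+"),    # overhangs the interval
    ]
    print(count_reads_per_mirna(mirnas, reads))
```

The nested loop is adequate for a handful of loci; a genome-wide count would normally use an interval index instead.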