ABSTRACT
The homologous genes GTPBP1 and GTPBP2 encode GTP-binding proteins 1 and 2, which are involved in ribosomal homeostasis. Pathogenic variants in GTPBP2 were recently shown to be an ultra-rare cause of neurodegenerative or neurodevelopmental disorders (NDDs). Until now, no human phenotype has been linked to GTPBP1. Here, we describe individuals carrying bi-allelic GTPBP1 variants who display a phenotype identical to that associated with GTPBP2 variants, and we characterize the overall spectrum of GTP-binding protein (1/2)-related disorders. In this study, 20 individuals from 16 families with distinct NDDs and syndromic facial features were investigated by whole-exome (WES) or whole-genome (WGS) sequencing. To assess the functional impact of the identified genetic variants, semi-quantitative PCR, western blot, and ribosome profiling assays were performed in fibroblasts from affected individuals. We also investigated the effect of reducing expression of CG2017, an ortholog of human GTPBP1/2, in the fruit fly Drosophila melanogaster. Individuals with bi-allelic GTPBP1 or GTPBP2 variants presented with microcephaly, profound neurodevelopmental impairment, pathognomonic craniofacial features, and ectodermal defects. Abnormal vision and/or hearing, progressive spasticity, choreoathetoid movements, refractory epilepsy, and brain atrophy were part of the core phenotype of this syndrome. Cell line studies identified a loss-of-function (LoF) impact of the disease-associated variants but no significant abnormalities on ribosome profiling. Reduced expression of CG2017 isoforms was associated with locomotor impairment in Drosophila. In conclusion, bi-allelic GTPBP1 and GTPBP2 LoF variants cause an identical, distinct neurodevelopmental syndrome. Mutant CG2017 knockout flies display motor impairment, highlighting the conserved role of GTP-binding proteins in CNS development across species.
Subject(s)
GTP-Binding Proteins; Microcephaly; Nervous System Malformations; Neurodevelopmental Disorders; Animals; Humans; Drosophila melanogaster/genetics; GTP Phosphohydrolases/genetics; GTP-Binding Proteins/genetics; Neurodevelopmental Disorders/genetics; Phenotype; Drosophila Proteins/genetics
ABSTRACT
DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.
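The core idea described above, reference reads appearing inside homozygous alternate calls, can be illustrated with a minimal sketch. It assumes each call is reduced to a genotype string plus reference and alternate read counts; the function name, the tuple layout and the `min_depth` threshold are hypothetical. The abstract does not give the exact CHARR formula, so the published metric may apply additional normalization or filtering that is not shown here.

```python
from statistics import mean


def ref_read_fraction_in_hom_alt(calls, min_depth=10):
    """Mean fraction of reference reads among homozygous-alternate calls.

    calls: iterable of (genotype, ref_reads, alt_reads) tuples.
    min_depth is an illustrative filter, not a value taken from the source.
    """
    fractions = []
    for genotype, ref_reads, alt_reads in calls:
        depth = ref_reads + alt_reads
        # Only homozygous-alternate calls with enough reads are informative.
        if genotype == "1/1" and depth >= min_depth:
            fractions.append(ref_reads / depth)
    return mean(fractions) if fractions else float("nan")


# Toy variant-level data: two usable hom-alt calls, one heterozygous call,
# and one hom-alt call that is skipped because its depth is too low.
calls = [("1/1", 0, 30), ("1/1", 3, 27), ("0/1", 15, 15), ("1/1", 1, 4)]
estimate = ref_read_fraction_in_hom_alt(calls)
print(f"mean reference-read fraction in hom-alt calls: {estimate:.3f}")
```

In an uncontaminated sample this fraction should stay close to zero, so a raised value hints at reads from another genome; only per-variant genotype and allele-depth fields are needed, not the underlying read data.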
Subject(s)
DNA; Trout; Humans; Animals; Sequence Analysis, DNA/methods; Genotype; Homozygote; High-Throughput Nucleotide Sequencing/methods; Software
ABSTRACT
Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
Subject(s)
Molecular Sequence Annotation; Animals; Base Sequence; Mice; Whole Genome Sequencing
ABSTRACT
In clinical practice, antidepressant prescription is a trial-and-error process, which is time-consuming and discomforting for patients. This study investigated an in silico approach for ranking antidepressants based on their hypothetical likelihood of efficacy. We predicted the transcriptomic profile of citalopram remitters by performing an in silico transcriptome-wide association study on STAR*D GWAS data (N = 1163). The transcriptional profile of remitters was compared with 21 antidepressant-induced gene expression profiles in five human cell lines available in the connectivity-map database. Spearman correlation, Pearson correlation, and the Kolmogorov-Smirnov test were used to determine the similarity between antidepressant-induced profiles and remitter profiles; we then calculated the average rank of each antidepressant across the three methods and a p value for each rank by using a permutation procedure. The top-ranked drugs were those with a high positive correlation with the expression profiles of remitters, which may therefore have higher chances of efficacy in the tested patients. In MCF7 (breast cancer cell line), escitalopram had the highest average rank, with an average rank higher than expected by chance (p = 0.0014). In A375 (human melanoma) and PC3 (prostate cancer) cell lines, escitalopram and citalopram emerged as the second-highest ranked antidepressants, respectively (p = 0.0310 and 0.0276, respectively). In HA1E (kidney) and HT29 (colon cancer) cell types, citalopram and escitalopram did not fall among the top antidepressants. The correlation between citalopram remitters' and (es)citalopram-induced expression profiles in three cell lines suggests that our approach may be useful and that, with future improvements, it could be applied at the individual level to tailor treatment prescription.
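The ranking procedure described above can be sketched roughly as follows, under several assumptions: profiles are plain lists of per-gene scores, only Pearson and Spearman similarity are used (the Kolmogorov-Smirnov step is omitted for brevity), and the permutation null is built by shuffling the remitter profile across genes. The drug names and all data are synthetic placeholders, and the study's actual permutation scheme may differ.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def ranks(values):
    """Ranks 1..n, smallest value first (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        out[index] = float(rank)
    return out


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def average_ranks(remitter, drug_profiles):
    """Average rank of each drug across methods (1 = most similar)."""
    names = list(drug_profiles)
    methods = (pearson, spearman)
    totals = {name: 0.0 for name in names}
    for similarity in methods:
        # Negate so that the highest similarity receives rank 1.
        scores = [-similarity(remitter, drug_profiles[name]) for name in names]
        for name, rank in zip(names, ranks(scores)):
            totals[name] += rank
    return {name: totals[name] / len(methods) for name in names}


def permutation_p_value(remitter, drug_profiles, drug, n_perm=200, seed=0):
    """Share of shuffled remitter profiles ranking the drug at least as well."""
    rng = random.Random(seed)
    observed = average_ranks(remitter, drug_profiles)[drug]
    shuffled = list(remitter)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if average_ranks(shuffled, drug_profiles)[drug] <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic example: drug_a mimics the remitter profile, drug_c opposes it.
rng = random.Random(1)
remitter = [rng.gauss(0, 1) for _ in range(40)]
drug_profiles = {
    "drug_a": [x + rng.gauss(0, 0.1) for x in remitter],
    "drug_b": [rng.gauss(0, 1) for _ in remitter],
    "drug_c": [-x + rng.gauss(0, 0.1) for x in remitter],
}
avg = average_ranks(remitter, drug_profiles)
p_value = permutation_p_value(remitter, drug_profiles, "drug_a")
print(avg, p_value)
```

Averaging ranks rather than raw scores keeps the similarity measures on a common scale, which is presumably why the abstract reports an average rank per drug.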
Subject(s)
Antidepressive Agents/pharmacokinetics; Citalopram/administration & dosage; Depressive Disorder, Major/drug therapy; Selective Serotonin Reuptake Inhibitors/pharmacokinetics; Transcriptome/drug effects; Antidepressive Agents/chemistry; Antidepressive Agents/therapeutic use; Citalopram/pharmacokinetics; Computer Simulation; Depressive Disorder, Major/genetics; Drug Prescriptions; Gene Expression/drug effects; HT29 Cells; Humans; MCF-7 Cells; Selective Serotonin Reuptake Inhibitors/chemistry; Selective Serotonin Reuptake Inhibitors/therapeutic use; Transcriptome/genetics
ABSTRACT
The 3xTg-AD mouse is a widely used model in the study of Alzheimer's Disease (AD). It has been extensively characterized from both the anatomical and behavioral point of view, but poorly studied at the transcriptomic level. For the first time, we characterize the whole blood transcriptome of the 3xTg-AD mouse at three and six months of age and evaluate how its gene expression is modulated by transcranial direct current stimulation (tDCS). RNA-seq analysis revealed 183 differentially expressed genes (DEGs) that represent a direct signature of the genetic background of the mouse. Moreover, in the 6-month-old 3xTg-AD mice, we observed a high number of DEGs that could represent good peripheral biomarkers of AD symptomatology onset. Finally, tDCS was associated with gene expression changes in the 3xTg-AD, but not in the control mice. In conclusion, this study provides an in-depth molecular characterization of the 3xTg-AD mouse and suggests that blood gene expression can be used to identify new biomarkers of AD progression and treatment effects.
Subject(s)
Alzheimer Disease/genetics; Transcranial Direct Current Stimulation/adverse effects; Transcriptome/genetics; Alzheimer Disease/therapy; Amyloid beta-Peptides/metabolism; Amyloid beta-Protein Precursor/metabolism; Animals; Blood Cells/drug effects; Blood Cells/metabolism; Disease Models, Animal; Gene Expression Profiling/methods; Male; Mice; Mice, Inbred C57BL; Mice, Transgenic; Exome Sequencing/methods; tau Proteins/metabolism
ABSTRACT
Summary: Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses a serious challenge for variant interpretation. Here, we propose a new tool named Genomic vARiants FIltering by dEep Learning moDels in NGS (GARFIELD-NGS), which relies on deep learning models to distinguish false from true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71-0.98) and outperformed established hard filters. The method is also robust at coverage as low as 30X and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines, allowing different thresholds to be applied based on the desired level of sensitivity and specificity. Availability and implementation: GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Deep Learning; Genomics; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; INDEL Mutation; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods
ABSTRACT
The growth of publicly available data informing upon genetic variations, mechanisms of disease, and disease subphenotypes offers great potential for personalized medicine. Computational approaches are likely required to assess a large number of novel genetic variants. However, the integration of genetic, structural, and pathophysiological data still represents a challenge for computational predictions and their clinical use. We addressed these issues for alpha-1-antitrypsin deficiency, a disease mediated by mutations in the SERPINA1 gene encoding alpha-1-antitrypsin. We compiled a comprehensive database of SERPINA1 coding mutations and assigned them apparent pathological relevance based upon available data. "Benign" and "pathogenic" variations were used to assess performance of 31 pathogenicity predictors. Well-performing algorithms clustered the subset of variants known to be severely pathogenic with high scores. Eight new mutations identified in the ExAC database and achieving high scores were selected for characterization in cell models and showed secretory deficiency and polymer formation, supporting the predictive power of our computational approach. The behavior of the pathogenic new variants and consistent outliers were rationalized by considering the protein structural context and residue conservation. These findings highlight the potential of computational methods to provide meaningful predictions of the pathogenic significance of novel mutations and identify areas for further investigation.
Subject(s)
Computational Biology; alpha 1-Antitrypsin Deficiency/genetics; alpha 1-Antitrypsin/genetics; Alleles; Databases, Genetic; Endoplasmic Reticulum/genetics; Endoplasmic Reticulum/pathology; Exome/genetics; Female; Genetics, Population; Humans; Leukocyte Elastase/genetics; Male; Mutation, Missense/genetics; Exome Sequencing; alpha 1-Antitrypsin Deficiency/pathology
ABSTRACT
BACKGROUND: A-to-I RNA editing is a co-/post-transcriptional modification, catalyzed by ADAR enzymes, that deaminates adenosines (A) into inosines (I). Most known editing events are located within inverted ALU repeats, but they also occur in coding sequences and may alter the function of the encoded proteins. RNA editing contributes to transcriptomic diversity and is altered in cancer, autoimmune and neurological disorders. Emerging evidence indicates that the editing process could be influenced by genetic variation and by biological and environmental variables. RESULTS: We analyzed RNA editing levels in human blood using RNA-seq data from 459 healthy individuals and identified 2079 sites consistently edited in this tissue. As expected, analysis of gene expression revealed that ADAR is the major contributor to editing at these sites, explaining ~13% of the observed variability. After removing the ADAR effect, we found significant associations for 1122 genes, mainly involved in RNA processing. These genes were significantly enriched in genes encoding proteins that interact with ADARs, including 276 potential ADAR interactors and 9 direct ADAR partners. In addition, our analysis revealed several factors potentially influencing RNA editing in blood, including cell composition, age, body mass index, smoking and alcohol consumption. Finally, we identified genetic loci associated with editing levels, including known ADAR eQTLs and a small region on chromosome 7 containing LOC730338, a lincRNA gene that appears to modulate ADAR mRNA expression. CONCLUSIONS: Our data provide a detailed picture of the most relevant RNA editing events and their variability in human blood, giving interesting insights into potential mechanisms behind this post-transcriptional modification and its regulation in this tissue.
Subject(s)
RNA Editing; RNA, Messenger/metabolism; Adenosine Deaminase/genetics; B-Lymphocytes/cytology; B-Lymphocytes/metabolism; Cell Line; Chromosomes, Human, Pair 7; Humans; Principal Component Analysis; Protein Interaction Maps/genetics; Quantitative Trait Loci; RNA, Long Noncoding/genetics
ABSTRACT
Sialic acid acetylesterase (SIAE) removes acetyl moieties from the carbon 9 and 4 hydroxyl groups of sialic acid, and its association with autoimmunity has recently been debated. To gain new insights into this intriguing enzyme, we studied siae in zebrafish (Danio rerio). In this teleost, siae encodes a polypeptide with a high degree of sequence identity to its human and mouse counterparts. Upon transient expression in COS7 cells, zebrafish Siae behaves comparably to the human enzyme with respect to the pH optimum of enzyme activity, subcellular localization and glycosylation. In addition, as already observed for human SIAE, the glycosylated form of the zebrafish enzyme is released into the culture medium. In situ hybridization experiments demonstrate that the siae transcript is detectable throughout embryonic development, with more specific expression in the central nervous system, pronephric ducts and liver at the more advanced stages. In adult fish, siae mRNA is detectable in increasing amounts in heart, eye, muscle, liver, brain, kidney and ovary. These results provide novel information about Siae and point to zebrafish as an animal model for better understanding the biological role(s) of this rather puzzling enzyme in vertebrates, with regard to immune system function and the development of the central nervous system.
Subject(s)
Acetylesterase/metabolism; Genome; Zebrafish Proteins/metabolism; Acetylesterase/chemistry; Acetylesterase/genetics; Animals; COS Cells; Chlorocebus aethiops; Gene Expression Regulation, Developmental; Humans; Kidney/metabolism; Liver/metabolism; Nervous System/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Sequence Homology, Nucleic Acid; Zebrafish/genetics; Zebrafish/growth & development; Zebrafish/metabolism; Zebrafish Proteins/chemistry; Zebrafish Proteins/genetics
ABSTRACT
Filamin A is an X-linked, ubiquitous actin-binding protein whose mutations are associated with multiple disorders with limited genotype-phenotype correlations. While gain-of-function mutations cause various bone dysplasias, loss-of-function variants are the most common cause of periventricular nodular heterotopias with variable soft connective tissue involvement, as well as X-linked cardiac valvular dystrophy (XCVD). The term "Ehlers-Danlos syndrome (EDS) with periventricular heterotopias" has been used in females with neurological, cardiovascular, integument and joint manifestations, but this nosology is still a matter of debate. We report a clinical and molecular update on an Italian family with an X-linked recessive soft connective tissue disorder that was described, in 1975, as the first example of EDS type V of the Berlin nosology. The cutaneous phenotype of the index patient was close to classical EDS, and all males died of cardiac valvular dystrophy. Whole exome sequencing identified the novel c.1829-1G>C splice variation in FLNA in two affected cousins. The nucleotide change was predicted to abolish the canonical splice acceptor site of exon 13 and to activate a cryptic acceptor site 15 bp downstream, leading to an in-frame deletion of five amino acid residues (p.Phe611_Gly615del). The predicted in-frame deletion clusters with all the mutations previously identified in XCVD and falls within the N-terminal rod 1 domain of filamin A. Our findings expand the male-specific phenotype of FLNA mutations, which now includes classical-like EDS with lethal cardiac valvular dystrophy, and offer further insights into the genotype-phenotype correlations within this spectrum. © 2016 Wiley Periodicals, Inc.
Subject(s)
Ehlers-Danlos Syndrome/diagnosis; Ehlers-Danlos Syndrome/genetics; Filamins/genetics; Mutation; Phenotype; RNA Splice Sites; Child; Child, Preschool; Exome; Fatal Outcome; Female; Genes, X-Linked; Genetic Association Studies; High-Throughput Nucleotide Sequencing; Humans; Magnetic Resonance Imaging; Male; Middle Aged; Pedigree
ABSTRACT
The Ion Proton platform allows whole exome sequencing (WES) to be performed at low cost, providing rapid turnaround time and great flexibility. Products for WES on the Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from the GIAB consortium to assess performance in variant identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94%) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7%), 449 (13%) and 11 (19%) genes not fully represented, respectively. Overall, 515 protein coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Variant detection performance was highest at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5%, respectively. WES using HiQ chemistry showed ~71/97.5% sensitivity, ~37/2% FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low, medium or high-stringency filters reduced the number of false positives by 10.2, 21.2 and 40.4% for indels and 21.2, 41.9 and 68.2% for SNPs, respectively. Amplicon-based WES on the Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variant identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.
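The relation between the reported sensitivity, FDR and F1 score can be checked with a few lines of arithmetic. The sketch below uses the standard definitions (precision = 1 - FDR, F1 = harmonic mean of precision and sensitivity); the function name is illustrative, and because the abstract gives rounded, approximate inputs, the results only approximate the reported ~0.66 and ~0.98.

```python
def f1_from_sensitivity_and_fdr(sensitivity, fdr):
    """F1 score from sensitivity (recall) and false discovery rate."""
    precision = 1.0 - fdr  # FDR is the complement of precision
    return 2 * precision * sensitivity / (precision + sensitivity)


# Approximate values quoted for HiQ chemistry (indels, then SNPs).
indel_f1 = f1_from_sensitivity_and_fdr(0.71, 0.37)
snp_f1 = f1_from_sensitivity_and_fdr(0.975, 0.02)
print(f"indel F1 = {indel_f1:.3f}, SNP F1 = {snp_f1:.3f}")
```

The same function can be reused to see how a filter that lowers the FDR at some cost in sensitivity would shift the F1 score.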
Subject(s)
Exome/genetics; Genome, Human; High-Throughput Nucleotide Sequencing/standards; Polymorphism, Single Nucleotide/genetics; Semiconductors; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Analysis, DNA
ABSTRACT
The lysosomal hydrolase galactocerebrosidase (GALC) catalyzes the removal of galactose from galactosylceramide and from other sphingolipids. GALC deficiency is responsible for globoid cell leukodystrophy (GLD), or Krabbe's disease, an early lethal inherited neurodegenerative disorder characterized by the accumulation of the neurotoxic metabolite psychosine in the central nervous system (CNS). The poor outcome of current clinical treatments calls for novel model systems to investigate the biological impact of GALC down-regulation and to search for novel therapeutic strategies in GLD. Zebrafish (Danio rerio) represents an attractive vertebrate model for human diseases. Here, lysosomal GALC activity was demonstrated in the brain of zebrafish adults and embryos. Accordingly, we identified two GALC co-orthologs (named galca and galcb) dynamically co-expressed in the CNS during zebrafish development. Both genes encode lysosomal enzymes endowed with GALC activity. Single down-regulation of galca or galcb by specific antisense morpholino oligonucleotides results in a partial decrease of GALC activity in zebrafish embryos, whereas GALC activity was abrogated in double galca/galcb morphants. However, no psychosine accumulation was observed in galca/galcb double morphants. Nevertheless, double galca/galcb knockdown caused reduction and partial disorganization of the expression of the early neuronal marker neuroD and an increase of apoptotic events during CNS development. These observations provide new insights into the pathogenesis of GLD, indicating that GALC loss-of-function may have pathological consequences in the developing CNS independent of psychosine accumulation. They also underscore the potential of the zebrafish system for studying the pathogenesis of lysosomal neurodegenerative diseases, including GLD.
Subject(s)
Galactosylceramidase/physiology; Leukodystrophy, Globoid Cell/etiology; Zebrafish/metabolism; Animals; Brain/embryology; Brain/enzymology; Cloning, Molecular; Disease Models, Animal; Galactosylceramidase/genetics; Humans; Leukodystrophy, Globoid Cell/enzymology; Zebrafish/embryology
ABSTRACT
Sialic acid acetyl esterase (SIAE) removes acetyl moieties from the hydroxyl groups in positions 9 and 4 of sialic acid. Recently, its association with autoimmunity has been disputed. In order to gain new insights into human SIAE biology and to clarify its seemingly contradictory molecular properties, we combined in silico characterization, phylogenetic analysis and homology modeling with cellular studies in COS7 cells. Genomic and phylogenetic analysis revealed that in most tissues only the "long" isoform, originally referred to as lysosomal sialic acid esterase, is detected. Using the homology modeling approach, we predicted a model of the SIAE 3D structure, which fulfills the topological features of the SGNH-hydrolase family. In addition, the model and site-directed mutagenesis experiments allowed the definition of the residues involved in catalysis. SIAE transient expression revealed that the protein is glycosylated and is active in vitro as an esterase with a pH optimum of 8.4-8.5. Moreover, glycosylation influences the biological activity of the enzyme and is essential for release of SIAE into the culture medium. Consistent with these findings, co-localization experiments demonstrated the presence of SIAE in membranous structures corresponding to the endoplasmic reticulum and Golgi complex. Thus, at least in COS7 cells, SIAE behaves as a typical secreted enzyme, subject to glycosylation and located along the classical secretory route or in the extracellular space. In these environments, the enzyme could act on 9-O-acetylated sialic acid residues, contributing to the fine-tuning of the various functions played by this acidic sugar.
Subject(s)
Acetylesterase/metabolism; Acetylesterase/chemistry; Acetylesterase/genetics; Amino Acid Sequence; Animals; COS Cells; Chlorocebus aethiops; Humans; Molecular Sequence Data; Phylogeny; Protein Isoforms/chemistry; Protein Isoforms/genetics; Protein Isoforms/metabolism; Protein Structure, Tertiary; Protein Transport
ABSTRACT
The zebrafish (Danio rerio) is a very popular vertebrate model system; its embryos in particular represent a valuable tool for in vivo pharmacological assays, mainly because of the advantages zebrafish offer over other animal models. Erythropoietin is a glycoprotein hormone that acts principally on erythroid progenitors, stimulating their survival, proliferation and differentiation. Recombinant human erythropoietin (rhEPO) has been widely used in medicine to treat anemia and is one of the best-selling biotherapeutics worldwide. The recombinant molecule, industrially produced in CHO cells, has the same amino acid sequence as endogenous human erythropoietin, but differs in the glycosylation pattern. This may influence the efficacy and safety, particularly the immunogenicity, of the final product. We employed the zebrafish embryo as a vertebrate animal model to perform in vivo pharmacological assays. We conducted a functional analysis of rhEPO alpha Eprex® and two biosimilars, the erythropoietin alpha Binocrit® and zeta Retacrit®. By in silico analysis and 3D modeling we proved the interaction between recombinant human erythropoietin and the zebrafish endogenous erythropoietin receptor. We then treated zebrafish embryos with the 3 rhEPOs and investigated their effect on erythrocyte production with different assays. By real-time PCR we observed the relative upregulation of gata1 (2.4 ± 0.3 fold), embryonic α-Hb (1.9 ± 0.2 fold) and β-Hb (1.6 ± 0.1 fold) transcripts. A significant increase in Stat5 phosphorylation was also detected in embryos treated with rhEPOs when compared with the negative controls. Live imaging in tg (kdrl:EGFP; gata1:ds-red) embryos, o-dianisidine positive area quantification and cyanomethemoglobin content quantification revealed a 1.8 ± 0.3 fold increase in the amount of erythrocytes in embryos treated with rhEPOs when compared with the negative controls. Finally, we verified that recombinant human erythropoietins did not cause any inflammatory response in the treated embryos. Our data showed that the zebrafish embryo can be a valuable tool to study the in vivo effects of complex pharmacological compounds, such as recombinant human glycoproteins, allowing fast and reproducible pharmacological assays with excellent results.
Subject(s)
Biosimilar Pharmaceuticals/pharmacology; Erythropoietin/metabolism; Zebrafish/metabolism; Amino Acid Sequence; Animals; Computational Biology/methods; Embryo, Nonmammalian/drug effects; Embryo, Nonmammalian/metabolism; Epoetin Alfa/pharmacology; GATA1 Transcription Factor/metabolism; Humans; Models, Animal; Molecular Sequence Data; Receptors, Erythropoietin/metabolism; Recombinant Proteins/metabolism; Sequence Alignment; Up-Regulation/drug effects
ABSTRACT
As vast histological archives are digitised, there is a pressing need to be able to associate specific tissue substructures and incident pathology to disease outcomes without arduous annotation. Here, we learn self-supervised representations using a Vision Transformer, trained on 1.7 M histology images across 23 healthy tissues in 838 donors from the Genotype Tissue Expression consortium (GTEx). Using these representations, we can automatically segment tissues into their constituent tissue substructures and pathology proportions across thousands of whole slide images, outperforming other self-supervised methods (43% increase in silhouette score). Additionally, we can detect and quantify histological pathologies present, such as arterial calcification (AUROC = 0.93) and identify missing calcification diagnoses. Finally, to link gene expression to tissue morphology, we introduce RNAPath, a set of models trained on 23 tissue types that can predict and spatially localise individual RNA expression levels directly from H&E histology (mean genes significantly regressed = 5156, FDR 1%). We validate RNAPath spatial predictions with matched ground truth immunohistochemistry for several well characterised control genes, recapitulating their known spatial specificity. Together, these results demonstrate how self-supervised machine learning when applied to vast histological archives allows researchers to answer questions about tissue pathology, its spatial organisation and the interplay between morphological tissue variability and gene expression.
Subject(s)
Supervised Machine Learning; Humans; RNA/genetics; RNA/metabolism; Gene Expression Profiling/methods; Organ Specificity/genetics; Image Processing, Computer-Assisted/methods
ABSTRACT
OBJECTIVE: The HIV-1 transactivating factor (Tat) possesses features typical of both cell-adhesive and angiogenic growth factor (AGF) proteins, inducing endothelial cell (EC) adhesion and proangiogenic activation. Tat was exploited to investigate the events triggered by EC adhesion to substrate-bound AGF that lead to proangiogenic activation. METHODS AND RESULTS: Immobilized Tat induces actin cytoskeleton organization, formation of αvβ3 integrin+ focal adhesion plaques, and recruitment of vascular endothelial growth factor receptor-2 (VEGFR2) in the ventral plasma membrane of adherent ECs. Also, acceptor photobleaching fluorescence resonance energy transfer demonstrated that VEGFR2/αvβ3 coupling occurs at the basal aspect of Tat-adherent ECs. Cell membrane fractionation showed that a limited fraction of αvβ3 integrin and VEGFR2 does colocalize in lipid rafts at the basal aspect of Tat-adherent ECs. VEGFR2 undergoes phosphorylation and triggers pp60src/ERK1/2 activation. The use of lipid raft disrupting agents and second messenger inhibitors demonstrated that intact lipid rafts and the VEGFR2/pp60src/ERK1/2 pathway are both required for cytoskeleton organization and proangiogenic activation of Tat-adherent ECs. CONCLUSIONS: Substrate-immobilized Tat causes VEGFR2/αvβ3 complex formation and polarization at the basal aspect of adherent ECs, VEGFR2/pp60src/ERK1/2 phosphorylation, cytoskeleton organization, and proangiogenic activation. These results provide novel insights into the AGF/tyrosine kinase receptor/integrin cross-talk.
Subject(s)
Endothelial Cells/metabolism; Endothelium, Vascular/metabolism; HIV-1/metabolism; Integrin alphaVbeta3/metabolism; Vascular Endothelial Growth Factor Receptor-2/metabolism; tat Gene Products, Human Immunodeficiency Virus/metabolism; Cell Movement; Cells, Cultured; Endothelial Cells/cytology; Endothelial Cells/virology; Endothelium, Vascular/cytology; Endothelium, Vascular/virology; Focal Adhesions; Humans; Signal Transduction
ABSTRACT
OBJECTIVES: Major depressive disorder (MDD) is a psychiatric disorder with pathogenesis influenced by both genetic and environmental factors. To date, the molecular-level understanding of its aetiology remains unclear. Thus, we aimed to identify genetic variants and susceptibility genes for MDD with a genome-wide association study (GWAS) approach. METHODS: We performed a meta-analysis of GWASs and a gene-based analysis on two isolated populations from Northern Italy (cases/controls n = 166/472 and 33/320), followed by replication and polygenic risk score (PRS) analyses in independent Italian samples (cases n = 464, controls n = 339). RESULTS: We identified two novel MDD-associated genes, KCNQ5 (lead SNP rs867262, p = 3.82 × 10⁻⁹) and CTNNA2 (rs6729523, p = 1.25 × 10⁻⁸). The gene-based analysis revealed another six genes (p < 2.703 × 10⁻⁶): GRM7, CTNT4, SNRK, SRGAP3, TRAPPC9, and FHIT. The genome-wide significant SNPs were not replicated in the independent cohort, although 14 SNPs around CTNNA2 showed association with MDD and related phenotypes at the nominal significance level (p < 0.05). Furthermore, the PRS model developed in the discovery cohort discriminated cases and controls in the replication cohort. CONCLUSIONS: Our work suggests new possible genes associated with MDD, and the PRS analysis confirms the polygenic nature of this disorder. Future studies are required to better understand the role of these findings in MDD.
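A polygenic risk score is commonly computed as a weighted sum of risk-allele dosages, with weights taken from the discovery GWAS. The abstract does not describe how its PRS model was built, so the sketch below only illustrates that generic form; the SNP identifiers, weights and dosages are hypothetical, and treating a missing genotype as a dosage of zero is an assumption made for brevity.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    dosages: mapping of SNP id to allele dosage for one individual.
    weights: mapping of SNP id to effect size from a discovery analysis.
    SNPs missing from `dosages` contribute zero (an illustrative choice).
    """
    return sum(weight * dosages.get(snp, 0.0) for snp, weight in weights.items())


# Hypothetical effect sizes and one individual's genotypes.
weights = {"snp1": 0.2, "snp2": -0.1, "snp3": 0.05}
dosages = {"snp1": 2, "snp2": 1, "snp3": 0}
score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.2f}")
```

Scores computed this way for cases and controls in a replication cohort can then be compared to see whether the model discriminates between the two groups.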
Subject(s)
Depressive Disorder, Major , Humans , Depressive Disorder, Major/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Multifactorial Inheritance , Italy , Polymorphism, Single Nucleotide
ABSTRACT
DNA sample contamination is a major issue in clinical and research applications of whole genome and exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a new metric to estimate DNA sample contamination from variant-level whole genome and exome sequence data, CHARR, Contamination from Homozygous Alternate Reference Reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VDS format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole genome and exome sequencing datasets.
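The idea of measuring reference reads that infiltrate homozygous alternate calls can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout (`HomAltCall`), the function name `charr_like_score`, the depth and allele-frequency thresholds, and the normalization by reference allele frequency are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class HomAltCall:
    """One homozygous-alternate genotype call (hypothetical record layout)."""
    ref_reads: int   # reads supporting the reference allele (e.g. AD[0] in a VCF)
    alt_reads: int   # reads supporting the alternate allele (e.g. AD[1])
    ref_af: float    # population frequency of the reference allele (assumed input)


def charr_like_score(
    calls: Iterable[HomAltCall],
    min_depth: int = 10,        # assumed threshold, not taken from the source
    min_ref_af: float = 0.1,    # assumed threshold, not taken from the source
    max_ref_af: float = 0.9,    # assumed threshold, not taken from the source
) -> Optional[float]:
    """Average reference-read fraction at hom-alt sites, scaled by reference AF.

    In an uncontaminated sample, hom-alt calls should carry almost no
    reference reads, so the score stays near zero. Reads from a contaminating
    sample carry the reference allele with probability ~ref_af, so dividing
    by ref_af rescales the observed fraction toward a contamination estimate.
    """
    terms = []
    for call in calls:
        depth = call.ref_reads + call.alt_reads
        # Skip low-coverage sites and sites with extreme allele frequencies.
        if depth < min_depth:
            continue
        if not (min_ref_af <= call.ref_af <= max_ref_af):
            continue
        terms.append(call.ref_reads / (call.ref_af * depth))
    return mean(terms) if terms else None
```

Only per-variant allele depths and an allele frequency are needed here, which is why a metric of this kind can be computed from variant-level files without returning to the read-level BAM/CRAM data.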
ABSTRACT
BACKGROUND: Whole genome sequencing (WGS) is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because, although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline allowing comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving.
CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.