ABSTRACT
BACKGROUND: Understanding the impact of clonal hematopoiesis of indeterminate potential (CHIP) and mosaic chromosomal alterations (mCAs) on solid tumor risk and mortality can shed light on novel cancer pathways. METHODS: The authors analyzed whole genome sequencing data from the Trans-Omics for Precision Medicine Women's Health Initiative study (n = 10,866). They investigated the presence of CHIP and mCA and their association with the development and mortality of breast, lung, and colorectal cancers. RESULTS: CHIP was associated with higher risk of breast (hazard ratio [HR], 1.30; 95% confidence interval [CI], 1.03-1.64; p = .02) but not colorectal (p = .77) or lung cancer (p = .32). CHIP carriers who developed colorectal cancer also had a greater risk of advanced-stage disease (p = .01), but this was not seen in breast or lung cancer. CHIP was associated with increased colorectal cancer mortality both with (HR, 3.99; 95% CI, 2.41-6.62; p < .001) and without adjustment (HR, 2.50; 95% CI, 1.32-4.72; p = .004) for advanced-stage disease, and with borderline higher breast cancer mortality (HR, 1.53; 95% CI, 0.98-2.41; p = .06). Conversely, mCA (cell fraction [CF] >3%) did not correlate with cancer risk. With higher CFs (mCA >5%), autosomal mCA was associated with increased breast cancer risk (HR, 1.39; 95% CI, 1.06-1.83; p = .01). There was no association of mCA (>3%) with breast, colorectal, or lung cancer mortality, except for higher colon cancer mortality (HR, 2.19; 95% CI, 1.11-4.3; p = .02) with mCA >5%. CONCLUSIONS: CHIP and mCA (CF >5%) were each associated with higher breast cancer risk and colorectal cancer mortality. These data could inform on novel pathways that impact cancer risk and lead to better risk stratification.
Subject(s)
Breast Neoplasms; Chromosome Aberrations; Clonal Hematopoiesis; Colorectal Neoplasms; Mosaicism; Humans; Female; Clonal Hematopoiesis/genetics; Aged; Breast Neoplasms/genetics; Breast Neoplasms/mortality; Breast Neoplasms/pathology; Middle Aged; Colorectal Neoplasms/genetics; Colorectal Neoplasms/mortality; Colorectal Neoplasms/pathology; Incidence; Lung Neoplasms/genetics; Lung Neoplasms/mortality; Lung Neoplasms/pathology; Male; Neoplasms/genetics; Neoplasms/mortality; Neoplasms/pathology; Neoplasms/epidemiology; Whole Genome Sequencing
ABSTRACT
The global spread of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the associated disease COVID-19, requires therapeutic interventions that can be rapidly identified and translated to clinical care. Traditional drug discovery methods have a >90% failure rate and can take 10 to 15 years from target identification to clinical use. In contrast, drug repurposing can significantly accelerate translation. We developed a quantitative high-throughput screen to identify efficacious agents against SARS-CoV-2. From a library of 1,425 US Food and Drug Administration (FDA)-approved compounds and clinical candidates, we identified 17 hits that inhibited SARS-CoV-2 infection and analyzed their antiviral activity across multiple cell lines, including lymph node carcinoma of the prostate (LNCaP) cells and a physiologically relevant model of alveolar epithelial type 2 cells (iAEC2s). Additionally, we found that inhibitors of the Ras/Raf/MEK/ERK signaling pathway exacerbate SARS-CoV-2 infection in vitro. Notably, we discovered that lactoferrin, a glycoprotein found in secretory fluids including mammalian milk, inhibits SARS-CoV-2 infection in the nanomolar range in all cell models with multiple modes of action, including blockage of virus attachment to cellular heparan sulfate and enhancement of interferon responses. Given its safety profile, lactoferrin is a readily translatable therapeutic option for the management of COVID-19.
Subject(s)
Antiviral Agents/pharmacology; Immunologic Factors/pharmacology; Lactoferrin/pharmacology; SARS-CoV-2/drug effects; Virus Internalization/drug effects; Virus Replication/drug effects; Animals; COVID-19/immunology; COVID-19/prevention & control; COVID-19/virology; Caco-2 Cells; Cell Line, Tumor; Chlorocebus aethiops; Dose-Response Relationship, Drug; Drug Discovery; Drug Repositioning/methods; Epithelial Cells; Heparitin Sulfate/antagonists & inhibitors; Heparitin Sulfate/immunology; Heparitin Sulfate/metabolism; Hepatocytes; High-Throughput Screening Assays; Humans; SARS-CoV-2/growth & development; SARS-CoV-2/pathogenicity; Vero Cells; COVID-19 Drug Treatment
ABSTRACT
Carotid artery atherosclerotic disease (CAAD) is a risk factor for stroke. We used a genome-wide association study (GWAS) approach to discover genetic variants associated with CAAD in participants in the electronic Medical Records and Genomics (eMERGE) Network. We identified adult CAAD cases with unilateral or bilateral carotid artery stenosis and controls without evidence of stenosis from electronic health records at eight eMERGE sites. We performed GWAS with a model adjusting for age, sex, study site, and genetic principal components of ancestry. In eMERGE we found 1793 CAAD cases and 17,958 controls. Two loci reached genome-wide significance, on chr6 in LPA (rs10455872, odds ratio [OR] (95% confidence interval [CI]) = 1.50 (1.30-1.73), p = 2.1 × 10⁻⁸) and on chr7, an intergenic single nucleotide variant (SNV; rs6952610, OR (95% CI) = 1.25 (1.16-1.36), p = 4.3 × 10⁻⁸). The chr7 association remained significant in the presence of the LPA SNV as a covariate. The LPA SNV was also associated with coronary heart disease (CHD; 4199 cases and 11,679 controls) in this study (OR (95% CI) = 1.27 (1.13-1.43), p = 5 × 10⁻⁵) but the chr7 SNV was not (OR (95% CI) = 1.03 (0.97-1.09), p = .37). Both variants replicated in UK Biobank. Elevated lipoprotein(a) concentrations ([Lp(a)]) and LPA variants associated with elevated [Lp(a)] have previously been associated with CAAD and CHD, including rs10455872. With electronic health record phenotypes in eMERGE and UK Biobank, we replicated a previously known association and identified a novel locus associated with CAAD.
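As a rough consistency check on the figures reported above, a Wald z-score and two-sided p-value can be recovered from an odds ratio and its 95% confidence interval. The sketch below assumes the interval is symmetric on the log-odds scale, as it would be for a standard logistic-regression estimate; the function name is illustrative and is not taken from the study. Because the published interval is rounded, the result only approximates the reported p-value.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald p-value from an OR and its 95% CI.

    Assumes the CI is symmetric on the log-odds scale.
    """
    # Standard error of log(OR), recovered from the width of the CI
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values as reported in the abstract
z_lpa, p_lpa = p_from_or_ci(1.50, 1.30, 1.73)    # rs10455872 (LPA, chr6)
z_chr7, p_chr7 = p_from_or_ci(1.25, 1.16, 1.36)  # rs6952610 (chr7)
print(f"LPA:  z = {z_lpa:.2f}, p = {p_lpa:.1e}")
print(f"chr7: z = {z_chr7:.2f}, p = {p_chr7:.1e}")
```

Both recovered p-values fall in the same order of magnitude as the published ones, which is as much agreement as rounded interval bounds allow.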
Subject(s)
Carotid Stenosis; Genome-Wide Association Study; Electronic Health Records; Genetic Predisposition to Disease; Genomics; Humans; Lipoprotein(a)/genetics; Models, Genetic; Polymorphism, Single Nucleotide
ABSTRACT
The SH-SY5Y neuroblastoma cells are a widely used in vitro model approximating neurons for testing the target engagement of therapeutics designed for neurodegenerative diseases and pain disorders. However, their potential as a model for receptor-mediated delivery and uptake of novel modalities, such as antibody-drug conjugates, remains understudied. Investigation of the SH-SY5Y cell surfaceome will aid in greater in vitro to in vivo correlation of delivery and uptake, thereby accelerating drug discovery. So far, the majority of studies have focused on total cell proteomics from undifferentiated and differentiated SH-SY5Y cells. While some studies have investigated the expression of specific proteins in neuroblastoma tissue, a global approach for comparison of neuroblastoma cell surfaceome to the brain and dorsal root ganglion (DRG) neurons remains uninvestigated. Furthermore, an isoform-specific evaluation of cell surface proteins expressed on neuroblastoma cells remains unexplored. In this study, we define a bioinformatic workflow for the identification of high-confidence surface proteins expressed on brain and DRG neurons using tissue proteomic and transcriptomic data. We then delineate the SH-SY5Y cell surfaceome by surface proteomics and show that it significantly overlaps with the human brain and DRG neuronal surface proteome. We find that, for 32% of common surface proteins, SH-SY5Y-specific major isoforms are alternatively spliced, maintaining their protein-coding ability, and are predicted to localize to the cell surface. Validation of these isoforms using surface proteomics confirms a SH-SY5Y-specific alternative NRCAM (neuron-glia related cell adhesion molecule) isoform, which is absent in typical brain neurons, but present in neuroblastomas, making it a receptor of interest for neuroblastoma-specific therapeutics.
Subject(s)
Neuroblastoma; Humans; Neuroblastoma/therapy; Neuroblastoma/drug therapy; Cell Line, Tumor; Proteomics; Neurons/metabolism; Cell Differentiation/physiology; Membrane Proteins/metabolism
ABSTRACT
Bone mineral density (BMD) assessed by DXA is used to evaluate bone health. In children, total body (TB) measurements are commonly used; in older individuals, BMD at the lumbar spine (LS) and femoral neck (FN) is used to diagnose osteoporosis. To date, genetic variants in more than 60 loci have been identified as associated with BMD. To investigate the genetic determinants of TB-BMD variation along the life course and test for age-specific effects, we performed a meta-analysis of 30 genome-wide association studies (GWASs) of TB-BMD including 66,628 individuals overall and divided across five age strata, each spanning 15 years. We identified variants associated with TB-BMD at 80 loci, of which 36 have not been previously identified; overall, they explain approximately 10% of the TB-BMD variance when combining all age groups and influence the risk of fracture. Pathway and enrichment analysis of the association signals showed clustering within gene sets implicated in the regulation of cell growth and SMAD proteins, overexpressed in the musculoskeletal system, and enriched in enhancer and promoter regions. These findings reveal TB-BMD as a relevant trait for genetic studies of osteoporosis, enabling the identification of variants and pathways influencing different bone compartments. Only variants in ESR1 and close proximity to RANKL showed a clear effect dependency on age. This most likely indicates that the majority of genetic variants identified influence BMD early in life and that their effect can be captured throughout the life course.
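The abstract does not state which pooling model was used across the 30 studies. As a minimal sketch of how per-study GWAS estimates for a single variant are commonly combined, the example below assumes an inverse-variance weighted fixed-effect model; the function name and the input numbers are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]  # precision of each study
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical per-study effect sizes and standard errors for one variant
betas = [0.10, 0.30]
ses = [0.10, 0.10]
pooled, pooled_se = fixed_effect_meta(betas, ses)
print(f"pooled beta = {pooled:.3f}, SE = {pooled_se:.3f}")
```

With equal standard errors the pooled estimate is simply the mean of the two effects, and its standard error shrinks by a factor of the square root of two, which is why pooling many studies increases power.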
Subject(s)
Bone Density/genetics; Genome-Wide Association Study; Adolescent; Age Factors; Animals; Child; Child, Preschool; Genetic Loci; Humans; Infant; Infant, Newborn; Mice, Knockout; Polymorphism, Single Nucleotide/genetics; Quantitative Trait, Heritable; Regression Analysis
ABSTRACT
Nonalcoholic fatty liver disease (NAFLD) is a common cause of chronic liver disease. A single-nucleotide polymorphism (SNP), rs6834314, was associated with serum liver enzymes in the general population, presumably reflecting liver fat or injury. We studied rs6834314 and its nearest gene, 17-beta hydroxysteroid dehydrogenase 13 (HSD17B13), to identify associations with histological features of NAFLD and to characterize the functional role of HSD17B13 in NAFLD pathogenesis. The minor allele of rs6834314 was significantly associated with increased steatosis but decreased inflammation, ballooning, Mallory-Denk bodies, and liver enzyme levels in 768 adult Caucasians with biopsy-proven NAFLD and with cirrhosis in the general population. We found two plausible causative variants in the HSD17B13 gene. rs72613567, a splice-site SNP in high linkage disequilibrium with rs6834314 (r² = 0.94), generates splice variants and shows a similar pattern of association with NAFLD histology. Its minor allele generates simultaneous expression of exon 6-skipping and G-nucleotide insertion variants. Another SNP, rs62305723 (encoding a P260S mutation), is significantly associated with decreased ballooning and inflammation. Hepatic expression of HSD17B13 is 5.9-fold higher (P = 0.003) in patients with NAFLD. HSD17B13 is targeted to lipid droplets, requiring the conserved amino acid 22-28 sequence and amino acid 71-106 region. The protein has retinol dehydrogenase (RDH) activity, with enzymatic activity dependent on lipid droplet targeting and the cofactor binding site. The exon 6 deletion, G insertion, and naturally occurring P260S mutation all confer loss of enzymatic activity. Conclusion: We demonstrate the association of variants in HSD17B13 with specific features of NAFLD histology and identify the enzyme as a lipid droplet-associated RDH; our data suggest that HSD17B13 plays a role in NAFLD through its enzymatic activity.
Subject(s)
17-Hydroxysteroid Dehydrogenases/genetics; Non-alcoholic Fatty Liver Disease/genetics; 17-Hydroxysteroid Dehydrogenases/metabolism; Adult; Amino Acid Sequence; Cohort Studies; Female; HEK293 Cells; Hep G2 Cells; Humans; Liver/metabolism; Male; Middle Aged; Molecular Targeted Therapy; Non-alcoholic Fatty Liver Disease/metabolism; Polymorphism, Single Nucleotide; Retinoids/metabolism
ABSTRACT
BACKGROUND AND AIMS: Cirrhosis is characterized by extensive fibrosis of the liver and is a major cause of liver-related mortality. Cirrhosis is partially heritable but genetic contributions to cirrhosis have not been systematically explored. Here, we carry out association analyses with cirrhosis in two large biobanks and determine the effects of cirrhosis-associated variants on multiple human diseases/traits. METHODS: We carried out a genome-wide association analysis of cirrhosis as a diagnosis in UK Biobank (UKBB; 1088 cases vs. 407 873 controls) and then tested top-associating loci for replication with cirrhosis in a hospital-based cohort from the Michigan Genomics Initiative (MGI; 875 cases of cirrhosis vs. 30 346 controls). For replicating variants or variants previously associated with cirrhosis that also affected cirrhosis in UKBB or MGI, we determined single nucleotide polymorphism effects on all other diagnoses in UKBB (PheWAS), common metabolic traits/diseases and serum/plasma metabolites. RESULTS: Unbiased genome-wide association study identified variants in/near PNPLA3 and HFE, and candidate variant analysis identified variants in/near TM6SF2, MBOAT7, SERPINA1, HSD17B13, STAT4 and IFNL4 that reproducibly affected cirrhosis. Most affected liver enzyme concentrations and/or aspartate transaminase-to-platelet ratio index. PheWAS, metabolic trait and serum/plasma metabolite association analyses revealed effects of these variants on lipid, inflammatory and other processes including new effects on many human diseases and traits. CONCLUSIONS: We identified eight loci that reproducibly associate with population-based cirrhosis and define their diverse effects on human diseases and traits.
Subject(s)
Genetic Predisposition to Disease; Genome-Wide Association Study; Liver Cirrhosis; Genetic Pleiotropy; Humans; Liver Cirrhosis/genetics; Phenotype; Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Alterations in gene expression are key events in disease etiology and risk. Poor reproducibility in detecting differentially expressed genes across studies suggests individual genes may not be sufficiently informative for complex diseases, such as myocardial infarction (MI). Rather, dysregulation of the 'molecular network' may be critical for pathogenic processes. Such a dynamic network can be built from pairwise non-linear interactions. RESULTS: We investigate non-linear interactions represented in mRNA expression profiles that integrate genetic background and environmental factors. Using logistic regression, we test the association of individual GWAS-based candidate genes and non-linear interaction terms (between these mRNA expression levels) with MI. Based on microarray data in CATHGEN (CATHeterization in GENetics) and FHS (Framingham Heart Study), we find individual genes and pairs of mRNAs, encoded by 41 MI candidate genes, with significant interaction terms in the logistic regression model. Two pairs replicate between CATHGEN and FHS (CNNM2|GUCY1A3 and CNNM2|ZEB2). Analysis of RNAseq data from GTEx (Genotype-Tissue Expression) shows that 20 % of these disease-associated RNA pairs are co-expressed, further prioritizing significant interactions. Because edges in sparse co-expression networks formed solely by the 41 candidate genes are unlikely to represent direct physical interactions, we identify additional RNAs as links between network pairs of candidate genes. This approach reveals additional mRNAs and interaction terms significant in the context of MI, for example, the path CNNM2|ACSL5|SCARF1|GUCY1A3, characterized by the common themes of magnesium and lipid processing. CONCLUSIONS: The results of this study support a role for non-linear interactions between genes in MI and provide a basis for further study of MI systems biology. 
mRNA expression profiles encoded by a limited number of candidate genes yield sparse networks of MI-relevant interactions that can be expanded to include additional candidates by co-expression analysis. The non-linear interactions observed here inform our understanding of the clinical relevance of gene-gene interactions in the pathophysiology of MI, while providing a new strategy in developing clinical biomarker panels.
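The idea of testing a pairwise interaction term in a logistic regression can be illustrated with a small sketch. It shows only the form of such a model (main effects plus a product term), not the fitted model from the study; the function name and all coefficients are hypothetical.

```python
import math

def mi_probability(x1, x2, b0, b1, b2, b12):
    """Logistic model with two main effects and one interaction term.

    log-odds = b0 + b1*x1 + b2*x2 + b12*(x1*x2)
    x1 and x2 stand for the expression levels of two mRNAs.
    """
    log_odds = b0 + b1 * x1 + b2 * x2 + b12 * (x1 * x2)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients: no main effects, only an interaction effect
coef = dict(b0=-1.0, b1=0.0, b2=0.0, b12=0.8)
p_baseline = mi_probability(0.0, 0.0, **coef)
p_one_high = mi_probability(1.0, 0.0, **coef)
p_both_high = mi_probability(1.0, 1.0, **coef)
print(p_baseline, p_one_high, p_both_high)
```

In this toy setting, raising either mRNA alone leaves the predicted probability unchanged, and only the combination shifts it. A per-gene differential expression test would miss an effect of this kind, which is the motivation the abstract gives for examining pairs.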
Subject(s)
Computational Biology; Gene Expression Profiling; Myocardial Infarction/genetics; Nonlinear Dynamics; Gene Regulatory Networks; Genome-Wide Association Study; Humans; Hypercholesterolemia/complications; Hypercholesterolemia/genetics; Myocardial Infarction/complications; Myocardial Infarction/epidemiology; RNA, Messenger/genetics
ABSTRACT
Osteoprotegerin (OPG) is involved in bone homeostasis and tumor cell survival. Circulating OPG levels are also important biomarkers of various clinical traits, such as cancers and atherosclerosis. OPG levels were measured in serum or in plasma. In a meta-analysis of genome-wide association studies in up to 10 336 individuals of European and Asian origin, we discovered that variants >100 kb upstream of the TNFRSF11B gene encoding OPG and another new locus on chromosome 17q11.2 were significantly associated with OPG variation. We also identified a suggestive locus on chromosome 14q21.2 associated with the trait. Moreover, we estimated that over half of the heritability of OPG levels could be explained by all variants examined in our study. Our findings provide further insight into the genetic regulation of circulating OPG levels.
Subject(s)
Chromosomes, Human, Pair 14/chemistry; Chromosomes, Human, Pair 17/chemistry; Genetic Loci; Osteoprotegerin/genetics; Polymorphism, Genetic; Quantitative Trait, Heritable; Asian People; Female; Genome, Human; Genome-Wide Association Study; Humans; Male; Osteoprotegerin/blood; White People
ABSTRACT
Dementia with Lewy Bodies (DLB) is the second most common neurodegenerative disorder in the elderly. The development and progression of DLB remain unclear. In this study we used next generation sequencing to assess RNA expression profiles and cellular processes associated with DLB in the anterior cingulate cortex, a brain region affected by DLB pathology. The expression measurements were made in autopsy brain tissues from 8 DLB subjects and 10 age-matched controls using AmpliSeq technology with Ion Torrent sequencing. The analysis of RNA expression profiles revealed 490 differentially expressed genes, among which 367 genes were down-regulated and 123 were up-regulated. Functional enrichment analysis of genes differentially expressed in DLB indicated downregulation of genes associated with myelination, neurogenesis, and regulation of nervous system development. miRNA binding sites enriched in these mRNAs yielded a list of candidate miRNAs participating in DLB pathophysiology. Our study provides a comprehensive picture of the gene expression landscape in DLB, identifying key cellular processes associated with DLB pathology.
Subject(s)
Brain/metabolism; Lewy Body Disease/genetics; Aged; Brain/pathology; Case-Control Studies; Gene Expression Profiling; Gyrus Cinguli/metabolism; Gyrus Cinguli/pathology; High-Throughput Nucleotide Sequencing; Humans; Lewy Bodies/metabolism; Lewy Bodies/pathology; Lewy Body Disease/pathology; MicroRNAs/genetics; Nerve Degeneration/genetics; Nerve Degeneration/pathology; RNA, Messenger/genetics; Sequence Analysis, RNA
Subject(s)
Macular Pigment; Child; Humans; CD36 Antigens/genetics; Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Over the past 50,000 years, shifts in human-environmental or human-human interactions shaped genetic differences within and among human populations, including variants under positive selection. Shaped by environmental factors, such variants influence the genetics of modern health, disease, and treatment outcome. Because evolutionary processes tend to act on gene regulation, we test whether regulatory variants are under positive selection. We introduce a new approach to enhance detection of genetic markers undergoing positive selection, using conditional entropy to capture recent local selection signals. RESULTS: We use conditional logistic regression to compare our Adjusted Haplotype Conditional Entropy (H|H) measure of positive selection to existing positive selection measures. H|H and existing measures were applied to published regulatory variants acting in cis (cis-eQTLs), with conditional logistic regression testing whether regulatory variants undergo stronger positive selection than the surrounding gene. These cis-eQTLs were drawn from six independent studies of genotype and RNA expression. The conditional logistic regression shows that, overall, H|H is substantially more powerful than existing positive-selection methods in identifying cis-eQTLs against other single nucleotide polymorphisms (SNPs) in the same genes. When broken down by Gene Ontology, H|H predictions are particularly strong in some biological process categories, where regulatory variants are under strong positive selection compared to the bulk of the gene, distinct from those GO categories under overall positive selection. However, cis-eQTLs in a second group of genes lack positive selection signatures detectable by H|H, consistent with ancient short haplotypes compared to the surrounding gene (for example, in innate immunity GO:0042742); under such other modes of selection, H|H would not be expected to be a strong predictor.
These conditional logistic regression models are adjusted for minor allele frequency (MAF); otherwise, ascertainment bias is a major factor in all eQTL data sets. Relationships between Gene Ontology categories, positive selection and eQTL specificity were replicated with H|H in a single larger data set. Our measure, Adjusted Haplotype Conditional Entropy (H|H), was essential in generating all of the results above because it: 1) is a stronger overall predictor for eQTLs than comparable existing approaches, and 2) shows low sequential auto-correlation, overcoming problems with convergence of these conditional regression statistical models. CONCLUSIONS: Our new method, H|H, provides a consistently more robust signal associated with cis-eQTLs compared to existing methods. We interpret this to indicate that some cis-eQTLs are under positive selection compared to their surrounding genes. Conditional entropy indicative of a selective sweep is an especially strong predictor of eQTLs for genes in several biological processes of medical interest. Where conditional entropy is a weak or negative predictor of eQTLs, such as innate immune genes, this would be consistent with balancing selection acting on such eQTLs over long time periods. Different measures of selection may be needed for variant prioritization under other modes of evolutionary selection.
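The abstract does not define the adjustment used in H|H, so the sketch below illustrates only the generic quantity it builds on: the Shannon conditional entropy H(Y|X) = H(X,Y) - H(X), estimated from paired observations. The function names and the haplotype labels are hypothetical, and this is not the authors' adjusted measure.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(pairs):
    """H(Y | X) = H(X, Y) - H(X) for a list of (x, y) observations."""
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    return entropy(joint) - entropy(marginal_x)

# Hypothetical (haplotype at locus A, haplotype at locus B) observations
independent = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]
linked = [("A1", "B1"), ("A1", "B1"), ("A2", "B2"), ("A2", "B2")]
h_independent = conditional_entropy(independent)
h_linked = conditional_entropy(linked)
print(h_independent, h_linked)
```

When one haplotype fully determines the other, the conditional entropy drops to zero. That is the intuition behind using low conditional entropy as a signal of a recent sweep, in which a long haplotype rises in frequency before recombination breaks it up.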
Subject(s)
Computational Biology; Entropy; Gene Expression Profiling; Genetic Variation; Quantitative Trait Loci; Genetic Markers; Genome-Wide Association Study; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: We used RNA sequencing to analyze transcript profiles of ten autopsy brain regions from ten subjects. RNA sequencing techniques were designed to detect both coding and non-coding RNA, splice isoform composition, and allelic expression. Brain regions were selected from five subjects with a documented history of smoking and five non-smokers. Paired-end RNA sequencing was performed on SOLiD instruments to a depth of >40 million reads, using linearly amplified, ribosomally depleted RNA. Sequencing libraries were prepared with both poly-dT and random hexamer primers to detect all RNA classes, including long non-coding (lncRNA), intronic and intergenic transcripts, and transcripts lacking poly-A tails, providing additional data not previously available. The study was designed to generate a database of the complete transcriptomes across brain regions for gene network analyses and discovery of regulatory variants. RESULTS: Of 20,318 protein coding and 18,080 lncRNA genes annotated from GENCODE and lncipedia, 12 thousand protein coding and 2 thousand lncRNA transcripts were detectable at a conservative threshold. Of the aligned reads, 52% were exonic, 34% intronic and 14% intergenic. A majority of protein coding genes (65%) were expressed in all regions, whereas ncRNAs displayed a more restricted distribution. Profiles of RNA isoforms varied across brain regions and subjects at multiple gene loci, with neurexin 3 (NRXN3) a prominent example. Allelic RNA ratios deviating from unity were identified in >400 genes, detectable in both protein-coding and non-coding genes, indicating the presence of cis-acting regulatory variants. Mathematical modeling was used to identify RNAs stably expressed in all brain regions (serving as potential markers for normalizing expression levels), linked to basic cellular functions.
An initial differential expression analysis between smokers and nonsmokers implicated a number of genes, several previously associated with nicotine exposure. CONCLUSIONS: RNA sequencing identifies distinct and consistent differences in gene expression between brain regions, with non-coding RNA displaying greater diversity between brain regions than mRNAs. Numerous RNAs exhibit robust allele-selective expression, providing a means for discovery of cis-acting regulatory factors with potential clinical relevance.
Subject(s)
Alleles; Brain/metabolism; Gene Expression Profiling; RNA Isoforms/genetics; RNA, Untranslated/genetics; Sequence Analysis, RNA; Humans; Male; Polymorphism, Single Nucleotide; Smoking/genetics
ABSTRACT
Genetic factors strongly influence risk of common human diseases and treatment outcomes but the causative variants remain largely unknown; this gap has been called the 'missing heritability'. We propose several hypotheses that in combination have the potential to narrow the gap. First, given a multi-stage path from wellness to disease, we propose that common variants under positive evolutionary selection represent normal variation and gate the transition between wellness and an 'off-well' state, revealing adaptations to changing environmental conditions. In contrast, genome-wide association studies (GWAS) focus on deleterious variants conveying disease risk, accelerating the path from off-well to illness and finally specific diseases, while common 'normal' variants remain hidden in the noise. Second, epistasis (dynamic gene-gene interactions) likely assumes a central role in adaptations and evolution; yet, GWAS analyses currently are poorly designed to reveal epistasis. As gene regulation is germane to adaptation, we propose that epistasis among common normal regulatory variants, or between common variants and less frequent deleterious variants, can have strong protective or deleterious phenotypic effects. These gene-gene interactions can be highly sensitive to environmental stimuli and could account for large differences in drug response between individuals. Residing largely outside the protein-coding exome, common regulatory variants affect either transcription of coding and non-coding RNAs (regulatory SNPs, or rSNPs) or RNA functions and processing (structural RNA SNPs, or srSNPs). Third, with the vast majority of causative variants yet to be discovered, GWAS rely on surrogate markers, a confounding factor aggravated by the presence of more than one causative variant per gene and by epistasis. We propose that the confluence of these factors may be responsible to a large extent for the observed heritability gap.
Subject(s)
Disease/genetics; Exome; Genetic Predisposition to Disease; Inheritance Patterns/genetics; Open Reading Frames/genetics; Treatment Outcome; Epistasis, Genetic; Gene-Environment Interaction; Genetic Variation; Genome-Wide Association Study; Humans; Life Style
ABSTRACT
Structural genomics consortia established that protein crystallization is the primary obstacle to structure determination using x-ray crystallography. We previously demonstrated that crystallization propensity is systematically related to primary sequence, and we subsequently performed computational analyses showing that arginine is the most overrepresented amino acid in crystal-packing interfaces in the Protein Data Bank. Given the similar physicochemical characteristics of arginine and lysine, we hypothesized that multiple lysine-to-arginine (KR) substitutions should improve crystallization. To test this hypothesis, we developed software that ranks lysine sites in a target protein based on the redundancy-corrected KR substitution frequency in homologs. This software can be run interactively on the World Wide Web at https://www.pxengineering.org/. We demonstrate that three unrelated single-domain proteins can tolerate 5-11 KR substitutions with at most minor destabilization, and, for two of these three proteins, the construct with the largest number of KR substitutions exhibits significantly enhanced crystallization propensity. This approach rapidly produced a 1.9 Å crystal structure of a human protein domain refractory to crystallization with its native sequence. Structures from Bulk KR-substituted domains show that the engineered arginine residues frequently make hydrogen bonds across crystal-packing interfaces. We thus demonstrate that Bulk KR substitution represents a rational and efficient method for probabilistic engineering of protein surface properties to improve crystallization.
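The site-ranking idea described above can be sketched in a few lines. The example below is not the authors' software: it assumes a pre-computed alignment of equal-length sequences, scores each lysine by the plain fraction of homologs with arginine in the same column, and omits the redundancy correction mentioned in the abstract. The function name and the toy sequences are hypothetical.

```python
def rank_lysine_sites(target, homologs):
    """Rank lysine (K) positions in `target` by the fraction of aligned
    homologs carrying arginine (R) in the same alignment column.

    All sequences must be pre-aligned to the same length ('-' marks a gap).
    No redundancy correction is applied in this simplified version.
    """
    scores = []
    for i, residue in enumerate(target):
        if residue != "K":
            continue
        # Homolog residues in this column, ignoring gaps
        column = [h[i] for h in homologs if h[i] != "-"]
        frac_r = column.count("R") / len(column) if column else 0.0
        scores.append((i + 1, frac_r))  # report 1-based positions
    # Highest K-to-R substitution frequency first (stable for ties)
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical toy alignment
target = "MKAKLK"
homologs = ["MRAKLR", "MRAKLK", "MKA-LR"]
ranked = rank_lysine_sites(target, homologs)
print(ranked)
```

Sites where homologs already carry arginine are the ones most likely to tolerate the substitution, which is why they are ranked first as candidates.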
Subject(s)
Lysine; Proteins; Humans; Lysine/chemistry; Crystallization; Proteins/genetics; Amino Acids/chemistry; Crystallography, X-Ray; Arginine/metabolism
ABSTRACT
BACKGROUND: Measuring allelic RNA expression ratios is a powerful approach for detecting cis-acting regulatory variants, RNA editing, loss of heterozygosity in cancer, copy number variation, and allele-specific epigenetic gene silencing. Whole transcriptome RNA sequencing (RNA-Seq) has emerged as a genome-wide tool for identifying allelic expression imbalance (AEI), but numerous factors bias allelic RNA ratio measurements. Here, we compare RNA-Seq allelic ratios measured in nine different human brain regions with a highly sensitive and accurate SNaPshot measure of allelic RNA ratios, identifying factors affecting reliable allelic ratio measurement. Accounting for these factors, we subsequently surveyed the variability of RNA editing across brain regions and across individuals. RESULTS: We find that RNA-Seq allelic ratios from standard alignment methods correlate poorly with SNaPshot, but applying alternative alignment strategies and correcting for observed biases significantly improves correlations. Deploying these methods on a transcriptome-wide basis in nine brain regions from a single individual, we identified genes with AEI across all regions (SLC1A3, NHP2L1) and many others with region-specific AEI. In dorsolateral prefrontal cortex (DLPFC) tissues from 14 individuals, we found evidence for frequent regulatory variants affecting RNA expression in tens to hundreds of genes, depending on stringency for assigning AEI. Further, we find that the extent and variability of RNA editing are similar across brain regions and across individuals. CONCLUSIONS: These results identify critical factors affecting allelic ratios measured by RNA-Seq and provide a foundation for using this technology to screen allelic RNA expression on a transcriptome-wide basis. Using this technology as a screening tool reveals tens to hundreds of genes harboring frequent functional variants affecting RNA expression in the human brain. 
With respect to RNA editing, the similarities within and between individuals lead us to conclude that this post-transcriptional process is under heavy regulatory influence to maintain an optimal degree of editing for normal biological function.
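For readers unfamiliar with how an allelic expression ratio is scored, the following is a minimal sketch and not the authors' pipeline. It assumes per-SNP read counts for the reference and alternate alleles are already available, and it tests the observed ratio against an expected proportion with an exact two-sided binomial test. The function names and the `expected` parameter are hypothetical. The default of 0.5 could be replaced by an empirically measured ratio to account for mapping bias.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous SNP that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / total


def binomial_two_sided_p(ref_reads: int, alt_reads: int, expected: float = 0.5) -> float:
    """Exact two-sided binomial p-value for departure from `expected`.

    Sums the probability of every outcome that is no more likely than the
    observed one. Intended for modest read depths only: very large counts
    would overflow the float arithmetic used here.
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * expected**k * (1 - expected) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    ref, alt = 30, 10  # hypothetical read counts at one SNP
    print(allelic_ratio(ref, alt), binomial_two_sided_p(ref, alt))
```

A real AEI screen would also need to correct for multiple testing across SNPs and for the alignment biases the abstract describes. This sketch does neither.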
Subject(s)
Alleles , Brain/metabolism , Gene Expression Profiling , RNA/genetics , RNA Sequence Analysis , Adult , Complementary DNA/biosynthesis , Humans , Male , Middle Aged , Genetic Polymorphism/genetics , Prefrontal Cortex/metabolism , RNA Editing/genetics , Young Adult
ABSTRACT
It has been proposed that supertree approaches should be applied to large multilocus datasets to achieve computational tractability. Large datasets such as those derived from phylogenomics studies can be broken into many locus-specific tree searches and the resulting trees can be stitched together via a supertree method. Using simulated data, workers have reported that they can rapidly construct a supertree that is comparable to the results of heuristic tree search on the entire dataset. To test this assertion with organismal data, we compare tree length under the parsimony criterion and computational time for 20 multilocus datasets using supertree (SuperFine and SuperTriplets) and supermatrix (heuristic search in TNT) approaches. Tree length and computational times were compared among methods using the Wilcoxon matched-pairs signed rank test. Supermatrix searches produced significantly shorter trees than either supertree approach (SuperFine or SuperTriplets; P < 0.0002 in both cases). Moreover, the processing time of supermatrix search was significantly lower than SuperFine+locus-specific search (P < 0.01) but roughly equivalent to that of SuperTriplets+locus-specific search (P > 0.4, not significant). In conclusion, we show by using real rather than simulated data that there is no basis, either in time tractability or in tree length, for use of supertrees over heuristic tree search using a supermatrix for phylogenomics.
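The paired comparison described above can be illustrated with a short sketch of the Wilcoxon matched-pairs signed rank test. This is an assumption-laden illustration, not the authors' analysis: it uses the normal approximation without tie or continuity corrections, and the function name and example values are hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Return (W, two-sided p) for paired samples x and y.

    W is the smaller of the positive and negative rank sums. The p-value
    uses the normal approximation with no tie or continuity correction,
    so it is only a rough guide for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    return w, erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    # Hypothetical tree lengths for the same five datasets under two methods.
    supertree = [1010, 1212, 1515, 2020, 2525]
    supermatrix = [1009, 1210, 1512, 2016, 2520]
    print(wilcoxon_signed_rank(supertree, supermatrix))
```

For a published analysis, an established implementation with exact p-values for small samples would be preferable to this approximation.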
ABSTRACT
Overexpression represents a principal bottleneck in structural and functional studies of integral membrane proteins (IMPs). Although E. coli remains the leading organism for convenient and economical protein overexpression, many IMPs exhibit toxicity on induction in this host and give low yields of properly folded protein. Different mechanisms related to membrane biogenesis and IMP folding have been proposed to contribute to these problems, but there is limited understanding of the physical and physiological constraints on IMP overexpression and folding in vivo. Therefore, we used a variety of genetic, genomic, and microscopy techniques to characterize the physiological responses of Escherichia coli MG1655 cells to overexpression of a set of soluble proteins and IMPs, including constructs exhibiting different levels of toxicity and producing different levels of properly folded versus misfolded product on induction. Genetic marker studies coupled with transcriptomic results indicate only minor perturbations in many of the physiological systems implicated in previous studies of IMP biogenesis. Overexpression of either IMPs or soluble proteins tends to block execution of the standard stationary-phase transcriptional program, although these effects are consistently stronger for the IMPs included in our study. However, these perturbations are not an impediment to successful protein overexpression. We present evidence that, at least for the target proteins included in our study, there is no inherent obstacle to IMP overexpression in E. coli at moderate levels suitable for structural studies and that the biochemical and conformational properties of the proteins themselves are the major obstacles to success. 
Toxicity associated with target protein activity produces selective pressure leading to preferential growth of cells harboring expression-reducing and inactivating mutations, which can produce chemical heterogeneity in the target protein population, potentially contributing to the difficulties encountered in IMP crystallization.
Subject(s)
Escherichia coli Proteins/biosynthesis , Escherichia coli/growth & development , Membrane Proteins/biosynthesis , Protein Array Analysis/methods , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Bacterial Gene Expression Regulation , Genetic Vectors , Membrane Proteins/chemistry , Membrane Proteins/genetics , Protein Folding , Messenger RNA/genetics , Messenger RNA/metabolism , Transcription Factors/biosynthesis , Transcription Factors/genetics , Genetic Transcription
ABSTRACT
Human genome-wide association studies found single-nucleotide polymorphisms (SNPs) near LYPLAL1 (Lysophospholipase-like protein 1) that have sex-specific effects on fat distribution and metabolic traits. To determine whether altering LYPLAL1 affects obesity and metabolic disease, we created and characterized a mouse knockout (KO) of Lyplal1. We fed the experimental group of mice a high-fat, high-sucrose (HFHS) diet for 23 weeks, and the controls were fed a regular chow diet. Here, we show that CRISPR-Cas9 whole-body Lyplal1 KO mice fed an HFHS diet showed sex-specific differences in weight gain and fat accumulation compared with mice fed a chow diet. Female, but not male, KO mice weighed less than wild-type (WT) mice and had reduced body fat percentage, white fat mass, and adipocyte diameter, reductions not accounted for by changes in the metabolic rate. Female, but not male, KO mice had increased serum triglycerides and decreased aspartate aminotransferase and alanine aminotransferase. Lyplal1 KO mice of both sexes had reduced liver triglycerides and steatosis. These diet-specific effects resemble the effects of SNPs near LYPLAL1 in humans, suggesting that LYPLAL1 has an evolutionarily conserved sex-specific effect on adiposity. This murine model can be used to study this novel gene-by-sex-by-diet interaction to elucidate the metabolic effects of LYPLAL1 on human obesity.
Subject(s)
Genome-Wide Association Study , Lysophospholipase , Obesity , Animals , Female , Humans , Male , Mice , High-Fat Diet/adverse effects , Inbred C57BL Mice , Knockout Mice , Obesity/genetics , Obesity/metabolism , Triglycerides , Lysophospholipase/genetics
ABSTRACT
HIV infections are initiated by a limited number of variants that diverge into a diverse quasispecies swarm. During in utero mother-to-child transmission (IU MTCT), transmitted viral variants must pass through multiple unique environments, and our previously published data suggest a nonstochastic model of transmission. As an alternative to a stochastic model of viral transmission, we hypothesize that viral selection in the placental environment influences the character of the viral quasispecies when HIV-1 is transmitted in utero. To test this hypothesis, we used single-template amplification to isolate HIV-1 envelope gene (env) sequences from both peripheral plasma and the placentas of eight nontransmitting (NT) and nine IU-transmitting participants. Statistically significant compartmentalization between peripheral and placental HIV-1 env was detected in one of the eight NT cases and six of the nine IU MTCT cases. In addition, viral sequences isolated from IU MTCT placental tissue showed variation in env V1 loop lengths compared to matched maternal sequences, while NT placental env sequences did not. Finally, comparison of env sequences from NT and IU MTCT participants indicated statistically significant differences in Kyte-Doolittle hydropathy in the signal peptide, C2, V3, and C3 regions. Our working hypothesis is that the hydropathy differences in Env associated with IU MTCT alter viral cellular tropism or affinity, allowing HIV-1 to efficiently infect placentally localized cells.
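As background for the Kyte-Doolittle comparison mentioned above, the sketch below shows how a hydropathy score is typically computed from a protein sequence: each residue is mapped to its Kyte-Doolittle value and the values are averaged, either over the whole region or over a sliding window. The function names, the default window length, and the example peptide are illustrative assumptions. The abstract does not state which window or software the authors used.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982); higher is more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Average Kyte-Doolittle value over a sequence of standard residues."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Sliding-window mean hydropathy; one value per full window."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        mean_hydropathy(sequence[i : i + window])
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "MRVKGIRKNYQHLW"  # hypothetical signal-peptide-like fragment
    print(round(mean_hydropathy(peptide), 3))
    print([round(v, 2) for v in hydropathy_profile(peptide, window=5)])
```

Comparing regions between groups, as the abstract describes, would then amount to computing such a score for each sequence's signal peptide, C2, V3, or C3 region and applying a suitable statistical test.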