Results 1 - 20 of 519
1.
Nat Rev Genet ; 23(3): 154-168, 2022 03.
Article in English | MEDLINE | ID: mdl-34611352

ABSTRACT

Modern genome-scale methods that identify new genes, such as proteogenomics and ribosome profiling, have revealed, to the surprise of many, that overlap in genes, open reading frames and even coding sequences is widespread and functionally integrated into prokaryotic, eukaryotic and viral genomes. In parallel, the constraints that overlapping regions place on genome sequences and their evolution can be harnessed in bioengineering to build more robust synthetic strains and constructs. With a focus on overlapping protein-coding and RNA-coding genes, this Review examines their discovery, topology and biogenesis in the context of their genome biology. We highlight exciting new uses for sequence overlap to control translation, compress synthetic genetic constructs, and protect against mutation.


Subject(s)
Bioengineering , Genes, Overlapping/physiology , Genome/genetics , Animals , Bioengineering/methods , Bioengineering/trends , Chromosome Mapping , Humans , Organisms, Genetically Modified/genetics
2.
J Virol ; 98(4): e0024224, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38446633

ABSTRACT

Viral genomes frequently harbor overlapping genes, complicating the development of virus-vectored vaccines and gene therapies. This study introduces a novel conditional splicing system to precisely control the expression of such overlapping genes through recombinase-mediated conditional splicing. We refined site-specific recombinase (SSR) conditional splicing systems and explored their mechanisms. The systems demonstrated exceptional inducibility (116,700-fold increase) with negligible background expression, facilitating the conditional expression of overlapping genes in adenovirus-associated virus (AAV) and human immunodeficiency virus type 1. Notably, this approach enabled the establishment of stable AAV producer cell lines, encapsulating all necessary packaging genes. Our findings underscore the potential of the SSR-conditional splicing system to significantly advance vector engineering, enhancing the efficacy and scalability of viral-vector-based therapies and vaccines. IMPORTANCE: Regulating overlapping genes is vital for gene therapy and vaccine development using viral vectors. The regulation of overlapping genes presents challenges, including cytotoxicity and impacts on vector capacity and genome stability, which restrict stable packaging cell line development and broad application. To address these challenges, we present a "loxp-splice-loxp"-based conditional splicing system, offering a novel solution for conditional expression of overlapping genes and stable cell line establishment. This system may also regulate other cytotoxic genes, representing a significant advancement in cell engineering and gene therapy as well as biomass production.


Subject(s)
Dependovirus , Genes, Overlapping , Genes, Viral , Genetic Engineering , HIV-1 , RNA Splicing , Humans , Cell Line , Dependovirus/genetics , DNA Nucleotidyltransferases/genetics , DNA Nucleotidyltransferases/metabolism , Gene Expression Regulation, Viral , Genes, Overlapping/genetics , Genes, Viral/genetics , Genetic Engineering/methods , Genetic Therapy/methods , Genetic Vectors/genetics , HIV-1/genetics , RNA Splicing/genetics , Vaccines/biosynthesis , Vaccines/genetics , Viral Genome Packaging/genetics
3.
Nucleic Acids Res ; 51(13): 7094-7108, 2023 07 21.
Article in English | MEDLINE | ID: mdl-37260076

ABSTRACT

The development of synthetic biological circuits that maintain functionality over application-relevant time scales remains a significant challenge. Here, we employed synthetic overlapping sequences in which one gene is encoded or 'entangled' entirely within an alternative reading frame of another gene. In this design, the toxin-encoding relE was entangled within ilvA, which encodes threonine deaminase, an enzyme essential for isoleucine biosynthesis. A functional entanglement construct was obtained upon modification of the ribosome-binding site of the internal relE gene. Using this optimized design, we found that the selection pressure to maintain functional IlvA stabilized the production of burdensome RelE for >130 generations, which compares favorably with the most stable kill-switch circuits developed to date. This stabilizing effect was achieved through a complete alteration of the allowable landscape of mutations such that mutations inactivating the entangled genes were disfavored. Instead, the majority of lineages accumulated mutations within the regulatory region of ilvA. By reducing baseline relE expression, these more 'benign' mutations lowered circuit burden, which suppressed the accumulation of relE-inactivating mutations, thereby prolonging kill-switch function. Overall, this work demonstrates the utility of sequence entanglement paired with an adaptive laboratory evolution campaign to increase the evolutionary stability of burdensome synthetic circuits.


Subject(s)
Genes, Overlapping , Genetic Engineering , Binding Sites , Escherichia coli/genetics , Mutation , Ribosomes/genetics , Pseudomonas/genetics , Genetic Engineering/methods
4.
PLoS Pathog ; 18(2): e1010331, 2022 02.
Article in English | MEDLINE | ID: mdl-35202429

ABSTRACT

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
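
The per-pair quantities named in this abstract (overlap length and frameshift) can be sketched from ORF coordinates alone. This is a minimal illustration, not the authors' code. The `ORF` record, the 1-based inclusive coordinates, and the frameshift labels are assumptions made here; in particular, the antisense rule (`-0` when codon boundaries coincide on opposite strands) is one plausible reading of the abstract's notation.

```python
from typing import NamedTuple, Optional


class ORF(NamedTuple):
    """Hypothetical ORF record: 1-based inclusive forward-strand coordinates."""
    start: int
    end: int
    strand: int  # +1 or -1


def overlap_length(a: ORF, b: ORF) -> int:
    """Number of nucleotides shared by the two ORFs (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def frameshift(a: ORF, b: ORF) -> Optional[str]:
    """Label the reading frame of b relative to a, or None without overlap.

    Same strand: '+0', '+1' or '+2' (offset between first codon positions).
    Opposite strands: '-0', '-1' or '-2', where '-0' is assumed to mean that
    codon triplets on the two strands occupy the same nucleotides.
    """
    if overlap_length(a, b) == 0:
        return None
    if a.strand == b.strand:
        if a.strand == 1:
            shift = (b.start - a.start) % 3
        else:
            # On the reverse strand the first codon begins at `end`.
            shift = (a.end - b.end) % 3
        return f"+{shift}"
    fwd, rev = (a, b) if a.strand == 1 else (b, a)
    shift = (rev.end - fwd.start + 1) % 3
    return f"-{shift}"


if __name__ == "__main__":
    a = ORF(1, 30, 1)
    b = ORF(21, 50, 1)
    print(overlap_length(a, b), frameshift(a, b))
```

Counting OvRFs per genome then reduces to applying these two functions to every pair of annotated ORFs in a record.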


Subject(s)
Genes, Overlapping , Genome, Viral , Genes, Overlapping/genetics , Genome, Viral/genetics , Open Reading Frames/genetics
5.
PLoS Pathog ; 18(7): e1010739, 2022 07.
Article in English | MEDLINE | ID: mdl-35901192

ABSTRACT

Hepadnaviruses use extensively overlapping genes to expand their coding capacity; in particular, the precore/core genes encode the precore and core proteins, which have mostly identical sequences but distinct functions. The precore protein of the woodchuck hepatitis virus (WHV) is N-glycosylated, in contrast to the precore of the human hepatitis B virus (HBV) that lacks N-glycosylation. To explore the roles of the N-linked glycosylation sites in precore and core functions, we substituted T77 and T92 in the WHV precore/core N-glycosylation motifs (75NIT77 and 90NDT92) with the corresponding HBV residues (E77 and N92) to eliminate the sequons. Conversely, these N-glycosylation sequons were introduced into the HBV precore/core gene by E77T and N92T substitutions. We found that N-glycosylation increased the levels of secreted precore gene products from both HBV and WHV. However, the HBV core (HBc) protein carrying the E77T substitution was defective in supporting virion secretion, and during infection, the HBc E77T and N92T substitutions impaired the formation of the covalently closed circular DNA (cccDNA), the critical viral DNA molecule responsible for establishing and maintaining infection. In cross-species complementation assays, both HBc and WHV core (WHc) proteins supported all steps of intracellular replication of the heterologous virus while WHc, with or without the N-glycosylation sequons, failed to interact with HBV envelope proteins for virion secretion. Interestingly, WHc supported intracellular cccDNA amplification more efficiently than HBc in the context of either HBV or WHV. These findings reveal novel determinants of precore secretion and core functions and illustrate strong constraints during viral host adaptation resulting from their compact genome and extensive use of overlapping genes.


Subject(s)
Hepadnaviridae , Hepatitis B Virus, Woodchuck , Hepatitis B , DNA, Circular , DNA, Viral , Genes, Overlapping , Glycosylation , Hepadnaviridae/genetics , Hepatitis B/genetics , Hepatitis B virus/genetics , Host Adaptation , Humans , Virus Replication/genetics
6.
PLoS Pathog ; 17(3): e1009376, 2021 03.
Article in English | MEDLINE | ID: mdl-33720976

ABSTRACT

Hypervirulent K. pneumoniae (hvKp) is a distinct pathotype that causes invasive community-acquired infections in healthy individuals. Hypermucoviscosity (hmv) is a major phenotype associated with hvKp characterized by copious capsule production and poor sedimentation. Dissecting the individual functions of CPS production and hmv in hvKp has been hindered by the conflation of these two properties. Although hmv requires capsular polysaccharide (CPS) biosynthesis, other cellular factors may also be required and some fitness phenotypes ascribed to CPS may be distinctly attributed to hmv. To address this challenge, we systematically identified genes that impact capsule and hmv. We generated a condensed, ordered transposon library in hypervirulent strain KPPR1, then evaluated the CPS production and hmv phenotypes of the 3,733 transposon mutants, representing 72% of all open reading frames in the genome. We employed forward and reverse genetic screens to evaluate effects of novel and known genes on CPS biosynthesis and hmv. These screens expand our understanding of core genes that coordinate CPS biosynthesis and hmv, as well as identify central metabolism genes that distinctly impact CPS biosynthesis or hmv, specifically those related to purine metabolism, pyruvate metabolism and the TCA cycle. Six representative mutants, with varying effect on CPS biosynthesis and hmv, were evaluated for their impact on CPS thickness, serum resistance, host cell association, and fitness in a murine model of disseminating pneumonia. Altogether, these data demonstrate that hmv requires both CPS biosynthesis and other cellular factors, and that hmv and CPS may serve distinct functions during pathogenesis. The integration of hmv and CPS to the metabolic status of the cell suggests that hvKp may require certain nutrients to specifically cause deep tissue infections.


Subject(s)
Bacterial Capsules/physiology , Genetic Fitness/physiology , Klebsiella Infections , Klebsiella pneumoniae/genetics , Klebsiella pneumoniae/pathogenicity , Animals , Genes, Overlapping , Humans , Mice , Virulence/genetics , Viscosity
7.
Thorax ; 77(2): 115-122, 2022 02.
Article in English | MEDLINE | ID: mdl-34168019

ABSTRACT

RATIONALE: COPD can be assessed using multidimensional grading systems with components from three domains: pulmonary function tests, symptoms and systemic features. Clinically, measures may be used interchangeably, though it is not known if they share similar pathobiology. OBJECTIVE: To use RNA sequencing (RNA-seq) to determine if there is an overlap in the underlying biological mechanisms and consequences driving different components of the multidimensional grading systems. METHODS: Whole blood was collected for RNA-seq from current and former smokers in the Genetic Epidemiology of COPD study. We tested the overlap in gene expression and biological pathways associated with case-control status and quantitative COPD phenotypes within and between the three domains. RESULTS: In 2647 subjects, there were 3030 genes differentially expressed in any of the three domains or case-control status. There were five genes that overlapped between the three domains and case-control status, including G protein-coupled receptor 15 (GPR15), sestrin 1 (SESN1) and interferon-induced guanylate-binding protein 1 (GBP1), which were associated with longitudinal decline in FEV1. The overlap between the three domains was enriched for pathways related to cellular components. CONCLUSIONS: We identified gene sets and pathways that overlap between 12 COPD-related phenotypes and case-control status. There were no pathways represented in the overlap between the three domains and case-control status, but we identified multiple genes that demonstrated a consistent pattern of expression across several of the phenotypes. Patterns of gene expression correlation were generally similar to the correlation of clinical phenotypes in the PFT and symptom domains but not the systemic features.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Gene Expression , Genes, Overlapping , Humans , Phenotype , Pulmonary Disease, Chronic Obstructive/genetics , Sequence Analysis, RNA
8.
Nucleic Acids Res ; 48(W1): W558-W565, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32374885

ABSTRACT

Overlapping genes are commonplace in viruses and play an important role in their function and evolution. For these genes, molecular coevolution may be seen as a mechanism to decrease the evolutionary constraints of amino acid positions in the overlapping regions and to tolerate or compensate for unfavorable mutations. Tracing these mutational sites could help to gain insight into the direct or indirect effect of the mutations in the corresponding overlapping proteins. In the past, coevolution analysis has been used to identify residue pairs and coevolutionary signatures within or between proteins that served as markers of physical interactions and/or functional relationships. Coevolution in OVerlapped sequences by Tree analysis (COVTree) is a web server providing the online analysis of coevolving amino-acid pairs in overlapping genes, where residues might be located inside or outside the overlapping region. COVTree is designed to handle protein families with various characteristics, including those that typically display a small number of highly conserved sequences. It is based on BIS2, a fast version of the coevolution analysis tool Blocks in Sequences (BIS). COVTree provides a rich and interactive graphical interface to ease biological interpretation of the results and it is openly accessible at http://www.lcqb.upmc.fr/COVTree/.


Subject(s)
Evolution, Molecular , Genes, Overlapping , Software , Genes, Viral , Hepatitis B Surface Antigens/genetics , Hepatitis B virus/genetics , Sequence Alignment
9.
BMC Genomics ; 22(1): 888, 2021 Dec 11.
Article in English | MEDLINE | ID: mdl-34895142

ABSTRACT

BACKGROUND: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from other cellular organisms; in this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps. RESULTS: After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models, we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids. CONCLUSIONS: Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered, such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and opens up exciting possibilities for synthetic biology.


Subject(s)
Biological Factors , Genes, Overlapping , Amino Acid Sequence , Animals , Genome , Open Reading Frames
10.
Mol Biol Evol ; 37(8): 2440-2449, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32243542

ABSTRACT

Purifying (negative) natural selection is a hallmark of functional biological sequences, and can be detected in protein-coding genes using the ratio of nonsynonymous to synonymous substitutions per site (dN/dS). However, when two genes overlap the same nucleotide sites in different frames, synonymous changes in one gene may be nonsynonymous in the other, perturbing dN/dS. Thus, scalable methods are needed to estimate functional constraint specifically for overlapping genes (OLGs). We propose OLGenie, which implements a modification of the Wei-Zhang method. Assessment with simulations and controls from viral genomes (58 OLGs and 176 non-OLGs) demonstrates low false-positive rates and good discriminatory ability in differentiating true OLGs from non-OLGs. We also apply OLGenie to the unresolved case of HIV-1's putative antisense protein gene, showing significant purifying selection. OLGenie can be used to study known OLGs and to predict new OLGs in genome annotation. Software and example data are freely available at https://github.com/chasewnelson/OLGenie (last accessed April 10, 2020).
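
The premise that a change synonymous in one gene may be nonsynonymous in the overlapping gene can be checked with a few lines of code. The sketch below is not OLGenie and does not estimate dN/dS; it only classifies a single substitution in two sense-strand reading frames using the standard genetic code. The function name, the 0-based coordinates, and the toy sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect_in_frame(seq, pos, new_base, frame_offset):
    """Classify a substitution at 0-based `pos` for the sense-strand frame
    whose first codon starts at 0-based `frame_offset`.

    Returns 'synonymous', 'nonsynonymous', or None if `pos` is not covered
    by a complete codon in that frame.
    """
    codon_start = pos - (pos - frame_offset) % 3
    if codon_start < 0 or codon_start + 3 > len(seq):
        return None
    old = seq[codon_start:codon_start + 3]
    i = pos - codon_start
    new = old[:i] + new_base + old[i + 1:]
    same = CODON_TABLE[old] == CODON_TABLE[new]
    return "synonymous" if same else "nonsynonymous"


if __name__ == "__main__":
    seq = "ATGCTGGCATAA"  # toy sequence, frame 0 reads ATG CTG GCA TAA
    # G->C at position 5: CTG->CTC (Leu->Leu) in frame 0,
    # but TGG->TCG (Trp->Ser) in the +1 frame.
    print(effect_in_frame(seq, 5, "C", 0))
    print(effect_in_frame(seq, 5, "C", 1))
```

Because the same site is a third codon position in one frame and a second codon position in the other, a count of "synonymous" sites made for one gene in isolation overstates the number of sites that are free to vary.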


Subject(s)
Genes, Overlapping , Genetic Techniques , Selection, Genetic , Silent Mutation , Software , HIV-1/genetics
11.
Ann Diagn Pathol ; 52: 151734, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33838490

ABSTRACT

So-called oncocytic papillary renal cell carcinoma (OPRCC) is a poorly defined variant of papillary renal cell carcinoma. Since its first description, several studies have been published with conflicting results, and thus a precise definition is lacking. A cohort of 39 PRCCs composed of oncocytic cells was analyzed. Cases were divided into 3 groups based on copy number variation (CNV) pattern. The first group consisted of 23 cases with CNV equal to renal oncocytoma. The second group consisted of 7 cases with polysomy of chromosomes 7 and 17, and the last group of 9 cases included those with variable CNV. Epidemiologic, morphologic and immunohistochemical features varied among the groups. No particular histomorphologic features correlated with any of the genetic subgroups. Further, a combination of morphologic, immunohistochemical, and molecular-genetic features did not allow precise prediction of biologic behavior. Owing to variable CNV pattern in OPRCC, strict adherence to morphology and immunohistochemical profile is recommended, particularly in limited samples (i.e., core biopsy). Applying CNV pattern as a part of a diagnostic algorithm can be potentially misleading. OPRCC is a highly variable group of tumors, which might be misdiagnosed as renal oncocytoma. Using the term OPRCC as a distinct diagnostic entity is questionable, given its high heterogeneity.


Subject(s)
Adenoma, Oxyphilic/genetics , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Kidney Neoplasms/genetics , Oxyphil Cells/metabolism , Adenoma, Oxyphilic/diagnosis , Adenoma, Oxyphilic/pathology , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/metabolism , Biopsy, Large-Core Needle/standards , Carcinoma, Renal Cell/epidemiology , Chromosome Aberrations , DNA Copy Number Variations/genetics , Diagnosis, Differential , Diagnostic Errors , Female , Genes, Overlapping/genetics , Humans , Immunohistochemistry/methods , In Situ Hybridization, Fluorescence/methods , Kidney Neoplasms/diagnosis , Kidney Neoplasms/pathology , Male , Middle Aged , Neoplasm Staging/methods , Oxyphil Cells/pathology
12.
J Gen Virol ; 101(10): 1085-1089, 2020 10.
Article in English | MEDLINE | ID: mdl-32667280

ABSTRACT

Identification of the full complement of genes in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a crucial step towards gaining a fuller understanding of its molecular biology. However, short and/or overlapping genes can be difficult to detect using conventional computational approaches, whereas high-throughput experimental approaches - such as ribosome profiling - cannot distinguish translation of functional peptides from regulatory translation or translational noise. By studying regions showing enhanced conservation at synonymous sites in alignments of SARS-CoV-2 and related viruses (subgenus Sarbecovirus) and correlating the results with the conserved presence of an open reading frame (ORF) and a plausible translation mechanism, a putative new gene - ORF3c - was identified. ORF3c overlaps ORF3a in an alternative reading frame. A recently published ribosome profiling study confirmed that ORF3c is indeed translated during infection. ORF3c is conserved across the subgenus Sarbecovirus, and encodes a 40-41 amino acid predicted transmembrane protein.


Subject(s)
Betacoronavirus/genetics , Genes, Overlapping/genetics , Reading Frames/genetics , Amino Acid Sequence/genetics , COVID-19 , Coronavirus Infections/virology , Humans , Pandemics , Phylogeny , Pneumonia, Viral/virology , SARS-CoV-2 , Sequence Alignment , Viral Regulatory and Accessory Proteins/genetics , Viroporin Proteins
13.
Psychol Med ; 50(10): 1695-1705, 2020 07.
Article in English | MEDLINE | ID: mdl-31328717

ABSTRACT

BACKGROUND: Mounting evidence shows genetic overlap between multiple psychiatric disorders. However, the biological underpinnings of shared risk for psychiatric disorders are not yet fully uncovered. The identification of underlying biological mechanisms is crucial for the progress in the treatment of these disorders. METHODS: We applied gene-set analysis including 7372 gene sets, and 53 tissue-type specific gene-expression profiles to identify sets of genes that are involved in the etiology of multiple psychiatric disorders. We included genome-wide meta-association data of the five psychiatric disorders schizophrenia, bipolar disorder, major depressive disorder, autism spectrum disorder, and attention-deficit/hyperactivity disorder. The total dataset contained 159 219 cases and 262 481 controls. RESULTS: We identified 19 gene sets that were significantly associated with the five psychiatric disorders combined, of which we excluded five sets because their associations were likely driven by schizophrenia only. Conditional analyses showed independent effects of several gene sets that in particular relate to the synapse. In addition, we found independent effects of gene expression levels in the cerebellum and frontal cortex. CONCLUSIONS: We obtained novel evidence for shared biological mechanisms that act across psychiatric disorders and we showed that several gene sets that have been related to individual disorders play a role in a broader range of psychiatric disorders.


Subject(s)
Alleles , Genes, Overlapping , Genetic Heterogeneity , Genetic Testing , Mental Disorders/genetics , Attention Deficit Disorder with Hyperactivity/genetics , Autism Spectrum Disorder/genetics , Bipolar Disorder/genetics , Case-Control Studies , Depressive Disorder, Major/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide , Regression Analysis , Risk Factors , Schizophrenia/genetics , White People/genetics
14.
Nucleic Acids Res ; 46(D1): D186-D193, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069459

ABSTRACT

Gene overlap serves various regulatory functions at the transcriptional and post-transcriptional levels. Most current studies focus on protein-coding genes overlapping with non-protein-coding counterparts, the so-called natural antisense transcripts. Considerably less is known about the role of gene overlap in the case of two protein-coding genes. Here, we provide OverGeneDB, a database of human and mouse 5' end protein-coding overlapping genes. The database contains 582 human and 113 mouse gene pairs that are transcribed using overlapping promoters in at least one analyzed library. Gene pairs were identified based on the analysis of the transcription start site (TSS) coordinates in 73 human and 10 mouse organs, tissues and cell lines. Besides TSS data, resources for 26 human lung adenocarcinoma cell lines also contain RNA-Seq and ChIP-Seq data for seven histone modifications and RNA Polymerase II activity. The collected data revealed that the overlap region is rarely conserved between the studied species and tissues. In ∼50% of the overlapping genes, transcription started explicitly in the overlap regions. In the remaining half of overlapping genes, transcription was initiated both from overlapping and non-overlapping TSSs. OverGeneDB is accessible at http://overgenedb.amu.edu.pl.


Subject(s)
Databases, Genetic , Genes, Overlapping , Animals , Gene Expression , Histone Code , Humans , Mice , Multigene Family , Open Reading Frames , Promoter Regions, Genetic , RNA Polymerase II/metabolism , Sequence Analysis, RNA , Transcription Factors/metabolism , Transcription Initiation Site
15.
Mol Biol Evol ; 35(10): 2572-2581, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30099499

ABSTRACT

Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and find that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction suggesting there are limitations in our current understanding of RNA virus replication.
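
A randomization test of the kind described here can be sketched as follows. The abstract does not specify the randomization scheme, so this example assumes the simplest one: shuffling the nucleotides of the sequence, which preserves base composition. It then asks how often a shuffled sequence contains a stop-free run at least as long as the observed one. All names and parameters are hypothetical, and the published method may differ in how it preserves the known gene.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def longest_open_run(seq, frame):
    """Longest run of consecutive non-stop codons (in codons) in `frame` (0, 1, 2)."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def randomization_p_value(seq, frame, n_shuffles=1000, seed=0):
    """Empirical p-value that a shuffled sequence has a stop-free run at least
    as long as the observed one in the given frame.

    Assumption: plain nucleotide shuffling is used as the null model.
    """
    rng = random.Random(seed)
    observed = longest_open_run(seq, frame)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if longest_open_run("".join(bases), frame) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    toy = "ATGAAATAAGGGCCC"  # toy sequence, far too short for a real test
    print(longest_open_run(toy, 0), longest_open_run(toy, 1))
    print(randomization_p_value(toy, 1, n_shuffles=200))
```

An overlapping ORF whose length yields a small p-value under the null is a candidate functional gene. Only one genome sequence is needed, which is the property the abstract emphasizes for metagenomic data.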


Subject(s)
Genes, Overlapping , Genetic Techniques , Genome, Viral , Open Reading Frames , RNA Viruses/genetics
16.
BMC Pulm Med ; 19(1): 58, 2019 Mar 07.
Article in English | MEDLINE | ID: mdl-30845926

ABSTRACT

BACKGROUND: Airflow obstruction is a hallmark of chronic obstructive pulmonary disease (COPD), and is defined as either the ratio between forced expiratory volume in one second and forced vital capacity (FEV1/FVC) < 70% or < lower limit of normal (LLN). This study aimed to assess the overlap between genome-wide association studies (GWAS) on airflow obstruction using these two definitions in the same population stratified by smoking. METHODS: GWASes were performed in the LifeLines Cohort Study for both airflow obstruction definitions in never-smokers (NS = 5071) and ever-smokers (ES = 4855). The FEV1/FVC < 70% models were adjusted for sex, age, and height; FEV1/FVC < LLN models were not adjusted. Ever-smokers models were additionally adjusted for pack-years and current-smoking. The overlap in significantly associated SNPs between the two definitions and never/ever-smokers was assessed using several p-value thresholds. To quantify the agreement, the Pearson correlation coefficient was calculated between the p-values and ORs. Replication was performed in the Vlagtwedde-Vlaardingen study (NS = 432, ES = 823). The overlapping SNPs with p < 10⁻⁴ were validated in the Vlagtwedde-Vlaardingen and Rotterdam Study cohorts (NS = 1966, ES = 3134) and analysed for expression quantitative trait loci (eQTL) in lung tissue (n = 1087). RESULTS: In the LifeLines cohort, 96% and 93% of the never- and ever-smokers were classified concordantly based on the two definitions. 26% and 29% of the investigated SNPs were overlapping at p < 0.05 in never- and ever-smokers, respectively. At p < 10⁻⁴, the overlap was 4% and 6%, respectively, which could be chance findings as shown by simulation studies. The effect estimates of the SNPs of the two definitions correlated strongly, but the p-values showed more variation and correlated only moderately. Similar observations were made in the Vlagtwedde-Vlaardingen study. Two overlapping SNPs in never-smokers (NFYC and FABP7) had the same direction of effect in the validation cohorts and the NFYC SNP was an eQTL for NFYC-AS1. NFYC is a transcription factor that binds to several known COPD genes, and FABP7 may be involved in abnormal pulmonary development. CONCLUSIONS: The definition of airflow obstruction and the population under study may be important determinants of which SNPs are associated with airflow obstruction. The genes FABP7 and NFYC(-AS1) could play a role in airflow obstruction in never-smokers specifically.
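
The two agreement measures mentioned in the methods (overlap of significant SNPs at a p-value threshold, and Pearson correlation between result vectors) can be sketched as below. The abstract does not state how the overlap percentage was defined, so this example assumes it is the share of SNPs significant under either definition that are significant under both. The SNP identifiers and p-values are invented toy values, not study data.

```python
import math


def overlap_fraction(p_a, p_b, threshold):
    """Share of SNPs significant in either analysis that are significant in both.

    `p_a` and `p_b` map SNP identifiers to p-values from two analyses.
    Assumption: overlap is measured against the union of significant SNPs.
    """
    sig_a = {snp for snp, p in p_a.items() if p < threshold}
    sig_b = {snp for snp, p in p_b.items() if p < threshold}
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy p-values for four hypothetical SNPs under the two definitions.
    p_fixed = {"snp1": 0.01, "snp2": 0.20, "snp3": 0.03, "snp4": 0.60}
    p_lln = {"snp1": 0.02, "snp2": 0.04, "snp3": 0.30, "snp4": 0.70}
    print(overlap_fraction(p_fixed, p_lln, 0.05))
    snps = sorted(p_fixed)
    print(pearson([p_fixed[s] for s in snps], [p_lln[s] for s in snps]))
```

Repeating `overlap_fraction` over a series of thresholds shows how the agreement between the two definitions changes as the significance cutoff becomes stricter.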


Subject(s)
CCAAT-Binding Factor/genetics , Fatty Acid-Binding Protein 7/genetics , Genome-Wide Association Study , Pulmonary Disease, Chronic Obstructive/genetics , Smoking/genetics , Tumor Suppressor Proteins/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Cohort Studies , Female , Forced Expiratory Volume , Genes, Overlapping/genetics , Genetic Predisposition to Disease , Humans , Linear Models , Logistic Models , Lung/physiopathology , Male , Middle Aged , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Smoking/adverse effects , Spirometry , Vital Capacity , Young Adult
17.
Proc Natl Acad Sci U S A ; 113(44): E6840-E6848, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27791112

ABSTRACT

Neurons of the Statoacoustic Ganglion (SAG), which innervate the inner ear, originate as neuroblasts in the floor of the otic vesicle and subsequently delaminate and migrate toward the hindbrain before completing differentiation. In all vertebrates, locally expressed Fgf initiates SAG development by inducing expression of Neurogenin1 (Ngn1) in the floor of the otic vesicle. However, not all Ngn1-positive cells undergo delamination, nor has the mechanism controlling SAG delamination been elucidated. Here we report that Goosecoid (Gsc), best known for regulating cellular dynamics in the Spemann organizer, regulates delamination of neuroblasts in the otic vesicle. In zebrafish, Fgf coregulates expression of Gsc and Ngn1 in partially overlapping domains, with delamination occurring primarily in the zone of overlap. Loss of Gsc severely inhibits delamination, whereas overexpression of Gsc greatly increases delamination. Co-misexpression of Ngn1 and Gsc induces ectopic delamination of some cells from the medial wall of the otic vesicle but with a low incidence, suggesting the action of a local inhibitor. The medial marker Pax2a is required to restrict the domain of gsc expression, and misexpression of Pax2a is sufficient to block delamination and fully suppress the effects of Gsc. The opposing activities of Gsc and Pax2a correlate with repression or up-regulation, respectively, of E-cadherin (cdh1). These data resolve a genetic mechanism controlling delamination of otic neuroblasts. The data also elucidate a developmental role for Gsc consistent with a general function in promoting epithelial-to-mesenchymal transition (EMT).


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/genetics , Basic Helix-Loop-Helix Transcription Factors/metabolism , Ganglia, Parasympathetic/growth & development , Ganglia, Parasympathetic/metabolism , Goosecoid Protein/genetics , Goosecoid Protein/metabolism , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Neurogenesis/physiology , Organizers, Embryonic , Zebrafish Proteins/genetics , Zebrafish Proteins/metabolism , Animals , Cadherins/metabolism , Cell Differentiation/genetics , Ear, Inner/metabolism , Epithelial-Mesenchymal Transition/physiology , Ganglia, Parasympathetic/pathology , Gastrulation , Gene Expression Regulation, Developmental , Genes, Overlapping , Immunohistochemistry , Neural Stem Cells/metabolism , Neural Stem Cells/pathology , Neurogenesis/genetics , Organizers, Embryonic/pathology , PAX2 Transcription Factor/metabolism , Signal Transduction , Up-Regulation , Vestibulocochlear Nerve/growth & development , Vestibulocochlear Nerve/metabolism , Zebrafish/genetics , Zebrafish/metabolism
18.
Genes Dev ; 25(18): 1915-27, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21890647

ABSTRACT

Large intergenic noncoding RNAs (lincRNAs) are emerging as key regulators of diverse cellular processes. Determining the function of individual lincRNAs remains a challenge. Recent advances in RNA sequencing (RNA-seq) and computational methods allow for an unprecedented analysis of such transcripts. Here, we present an integrative approach to define a reference catalog of >8000 human lincRNAs. Our catalog unifies previously existing annotation sources with transcripts we assembled from RNA-seq data collected from ∼4 billion RNA-seq reads across 24 tissues and cell types. We characterize each lincRNA by a panorama of >30 properties, including sequence, structural, transcriptional, and orthology features. We found that lincRNA expression is strikingly tissue-specific compared with coding genes, and that lincRNAs are typically coexpressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes. We distinguish an additional subset of transcripts that have high evolutionary conservation but may include short ORFs and may serve as either lincRNAs or small peptides. Our integrated, comprehensive, yet conservative reference catalog of human lincRNAs reveals the global properties of lincRNAs and will facilitate experimental studies and further functional classification of these genes.


Subject(s)
Molecular Sequence Annotation/methods , RNA, Untranslated/genetics , Alternative Splicing , Enhancer Elements, Genetic/genetics , Gene Expression Regulation , Genes, Overlapping , Humans , RNA, Untranslated/classification , Sequence Homology, Nucleic Acid
19.
BMC Evol Biol ; 18(1): 21, 2018 02 12.
Article in English | MEDLINE | ID: mdl-29433444

ABSTRACT

BACKGROUND: Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveals many signals originated from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail. RESULTS: A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame -2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115. 
CONCLUSIONS: Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame -2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB.


Subject(s)
Arginine/metabolism , Escherichia coli O157/genetics , Escherichia coli O157/metabolism , Escherichia coli Proteins/metabolism , Genes, Overlapping , Open Reading Frames/genetics , Trans-Activators/metabolism , Transcription, Genetic , Base Sequence , Escherichia coli O157/growth & development , Escherichia coli Proteins/genetics , Genes, Bacterial , Green Fluorescent Proteins/metabolism , Mutation/genetics , Phylogeny , Promoter Regions, Genetic , Protein Biosynthesis , Recombinant Fusion Proteins/metabolism , Transcriptome/genetics
20.
J Oral Pathol Med ; 47(6): 547-556, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29193424

ABSTRACT

A cancer database is a systematic collection and analysis of information on various human cancers at the genomic and molecular level that can be utilized to understand various steps in carcinogenesis and for therapeutic advancement in the cancer field. Oral cancer is one of the leading causes of morbidity and mortality all over the world. The current research efforts in this field are aimed at cancer etiology and therapy. Advanced genomic technologies including microarrays, proteomics, transcriptomics, and gene sequencing development have culminated in generation of extensive data and subjection of several genes and microRNAs that are distinctively expressed and this information is stored in the form of various databases. Extensive data from various resources have brought the need for collaboration and data sharing to make effective use of this new knowledge. The current review provides comprehensive information of various publicly accessible databases that contain information pertinent to oral squamous cell carcinoma (OSCC) and databases designed exclusively for OSCC. The databases discussed in this paper are Protein-Coding Gene Databases and microRNA Databases. This paper also describes gene overlap in various databases, which will help researchers to reduce redundancy and focus on only those genes that are common to more than one database. We hope such introduction will promote awareness and facilitate the usage of these resources in the cancer research community, and researchers can explore the molecular mechanisms involved in the development of cancer, which can help in subsequent crafting of therapeutic strategies.


Subject(s)
Biomedical Research , Databases, Factual , Internet , Mouth Neoplasms/genetics , Carcinogenesis/genetics , Carcinoma, Squamous Cell/genetics , Gene Expression Regulation, Neoplastic , Genes, Neoplasm/genetics , Genes, Overlapping , Genes, Tumor Suppressor , Humans , MicroRNAs/genetics , Open Reading Frames/genetics , Research Personnel