Results 1 - 15 of 15
1.
J Mol Evol ; 91(5): 570-580, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37326679

ABSTRACT

Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
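
The idea of alternative reading frames described in this abstract can be illustrated with a minimal sketch. It assumes the standard genetic code and uppercase A/C/G/T input; the function names (`reverse_complement`, `translate`, `six_frame_translations`) and the frame labels "+1" to "-3" are illustrative choices and do not come from the article.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the opposite-strand sequence, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Translate all three frames on each strand (hypothetical labels +1..-3)."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frame_translations("ATGGCCAAATAG")
for name, protein in frames.items():
    print(name, protein)
```

Shifting the start by one or two nucleotides, or reading the opposite strand, yields six different amino acid sequences from the same twelve nucleotides.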

2.
Cell Syst ; 14(5): 343-345, 2023 05 17.
Article in English | MEDLINE | ID: mdl-37201506

ABSTRACT

Eukaryotic genomes are pervasively translated, but the properties of translated sequences outside of canonical genes are poorly understood. A new study in Cell Systems reveals a large translatome that is not under significant evolutionary constraint but is still an active part of diverse cellular systems.


Subjects
Biological Evolution , Genome
3.
FEMS Microbiol Rev ; 47(1), 2023 01 16.
Article in English | MEDLINE | ID: mdl-36725215

ABSTRACT

Annotating protein sequences according to their biological functions is one of the key steps in understanding microbial diversity, metabolic potentials, and evolutionary histories. However, even in the best-studied prokaryotic genomes, not all proteins can be characterized by classical in vivo, in vitro, and/or in silico methods, a challenge that is growing rapidly alongside the advent of next-generation sequencing technologies and their enormous extension of 'omics' data in public databases. These so-called hypothetical proteins (HPs) represent a huge knowledge gap and hidden potential for biotechnological applications. Opportunities for leveraging the available 'Big Data' have recently proliferated with the use of artificial intelligence (AI). Here, we review the aims and methods of protein annotation and explain the different principles behind machine and deep learning algorithms, including recent research examples, in order to assist both biologists wishing to apply AI tools in developing comprehensive genome annotations and computer scientists who want to contribute to this leading edge of biological research.


Subjects
Artificial Intelligence , Machine Learning , Algorithms , Prokaryotic Cells , Molecular Sequence Annotation
4.
iScience ; 25(2): 103844, 2022 Feb 18.
Article in English | MEDLINE | ID: mdl-35198897

ABSTRACT

The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.

6.
BMC Genomics ; 22(1): 888, 2021 Dec 11.
Article in English | MEDLINE | ID: mdl-34895142

ABSTRACT

BACKGROUND: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from other cellular organisms. In this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps.
RESULTS: After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models, we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids.
CONCLUSIONS: Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered, such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and opens up exciting possibilities for synthetic biology.


Subjects
Biological Factors , Overlapping Genes , Amino Acid Sequence , Animals , Genome , Open Reading Frames
7.
Virology ; 558: 145-151, 2021 06.
Article in English | MEDLINE | ID: mdl-33774510

ABSTRACT

At least six small alternative-frame open reading frames (ORFs) overlapping well-characterized SARS-CoV-2 genes have been hypothesized to encode accessory proteins. Researchers have used different names for the same ORF or the same name for different ORFs, resulting in erroneous homological and functional inferences. We propose standard names for these ORFs and their shorter isoforms, developed in consultation with the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. We recommend calling the 39 codon Spike-overlapping ORF ORF2b; the 41, 57, and 22 codon ORF3a-overlapping ORFs ORF3c, ORF3d, and ORF3b; the 33 codon ORF3d isoform ORF3d-2; and the 97 and 73 codon Nucleocapsid-overlapping ORFs ORF9b and ORF9c. Finally, we document conflicting usage of the name ORF3b in 32 studies, and consequent erroneous inferences, stressing the importance of reserving identical names for homologs. We recommend that authors referring to these ORFs provide lengths and coordinates to minimize ambiguity caused by prior usage of alternative names.


Subjects
Open Reading Frames , SARS-CoV-2/genetics , Coronavirus Spike Glycoprotein , Terminology as Topic , SARS-CoV-2/immunology , Coronavirus Spike Glycoprotein/classification , Coronavirus Spike Glycoprotein/genetics
8.
Elife ; 9, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33001029

ABSTRACT

Understanding the emergence of novel viruses requires an accurate and comprehensive annotation of their genomes. Overlapping genes (OLGs) are common in viruses and have been associated with pandemics but are still widely overlooked. We identify and characterize ORF3d, a novel OLG in SARS-CoV-2 that is also present in Guangxi pangolin-CoVs but not other closely related pangolin-CoVs or bat-CoVs. We then document evidence of ORF3d translation, characterize its protein sequence, and conduct an evolutionary analysis at three levels: between taxa (21 members of Severe acute respiratory syndrome-related coronavirus), between human hosts (3978 SARS-CoV-2 consensus sequences), and within human hosts (401 deeply sequenced SARS-CoV-2 samples). ORF3d has been independently identified and shown to elicit a strong antibody response in COVID-19 patients. However, it has been misclassified as the unrelated gene ORF3b, leading to confusion. Our results liken ORF3d to other accessory genes in emerging viruses and highlight the importance of OLGs.


Subjects
Betacoronavirus/genetics , Coronavirus Infections/virology , Molecular Evolution , Overlapping Genes , Viral Genes , Host Specificity/genetics , Open Reading Frames/genetics , Pandemics , Viral Pneumonia/virology , Viral Proteins/genetics , Amino Acid Sequence , Animals , Viral Antibodies/immunology , Antibody Specificity , Viral Antigens/biosynthesis , Viral Antigens/genetics , Viral Antigens/immunology , Betacoronavirus/pathogenicity , Betacoronavirus/physiology , COVID-19 , China/epidemiology , Chiroptera/virology , Coronavirus/genetics , Coronavirus Infections/epidemiology , Epitopes/genetics , Epitopes/immunology , Europe/epidemiology , Eutheria/virology , Viral Gene Expression Regulation , Genetic Variation , Haplotypes/genetics , Humans , Molecular Models , Mutation , Phylogeny , Viral Pneumonia/epidemiology , Protein Biosynthesis , Protein Conformation , Viral RNA/genetics , SARS-CoV-2 , Sequence Alignment , Nucleic Acid Sequence Homology , Viral Proteins/immunology
9.
Front Mol Biosci ; 7: 187, 2020.
Article in English | MEDLINE | ID: mdl-32923454

ABSTRACT

Many prokaryotic RNAs are transcribed from loci outside of annotated protein coding genes. Across bacterial species hundreds of short open reading frames antisense to annotated genes show evidence of both transcription and translation, for instance in ribosome profiling data. Determining the functional fraction of these protein products awaits further research, including insights from studies of molecular interactions and detailed evolutionary analysis. There are multiple lines of evidence, however, that many of these newly discovered proteins are of use to the organism. Condition-specific phenotypes have been characterized for a few. These proteins should be added to genome annotations, and the methods for predicting them standardized. Evolutionary analysis of these typically young sequences also may provide important insights into gene evolution. This research should be prioritized for its exciting potential to uncover large numbers of novel proteins with extremely diverse potential practical uses, including applications in synthetic biology and responding to pathogens.

10.
J Biol Chem ; 295(27): 8999-9011, 2020 07 03.
Article in English | MEDLINE | ID: mdl-32385111

ABSTRACT

Ribosome profiling (RIBO-Seq) has improved our understanding of bacterial translation, including finding many unannotated genes. However, protocols for RIBO-Seq and corresponding data analysis are not yet standardized. Here, we analyzed 48 RIBO-Seq samples from nine studies of Escherichia coli K12 grown in lysogeny broth medium and particularly focused on the size-selection step. We show that for conventional expression analysis, a size range between 22 and 30 nucleotides is sufficient to obtain protein-coding fragments, which has the advantage of removing many unwanted rRNA and tRNA reads. More specific analyses may require longer reads and a corresponding improvement in rRNA/tRNA depletion. There is no consensus about the appropriate sequencing depth for RIBO-Seq experiments in prokaryotes, and studies vary significantly in total read number. Our analysis suggests that 20 million reads that are not mapping to rRNA/tRNA are required for global detection of translated annotated genes. We also highlight the influence of drug-induced ribosome stalling, which causes bias at translation start sites. The resulting accumulation of reads at the start site may be especially useful for detecting weakly expressed genes. As different methods suit different questions, it may not be possible to produce a "one-size-fits-all" ribosome profiling data set. Therefore, experiments should be carefully designed in light of the scientific questions of interest. We propose some basic characteristics that should be reported with any new RIBO-Seq data sets. Careful attention to the factors discussed should improve prokaryotic gene detection and the comparability of ribosome profiling data sets.
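
The two numeric recommendations in this abstract (a 22 to 30 nucleotide size range and about 20 million reads not mapping to rRNA/tRNA) can be expressed as a small sketch. It assumes that the size range is inclusive and that reads are available as plain strings; the function names `size_select` and `depth_sufficient` are hypothetical and are not part of any published pipeline.

```python
MIN_LEN, MAX_LEN = 22, 30        # size range quoted in the abstract (assumed inclusive)
REQUIRED_READS = 20_000_000      # reads not mapping to rRNA/tRNA, as suggested in the abstract


def size_select(reads, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep only reads whose length falls inside the size-selection window."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def depth_sufficient(n_non_rrna_trna_reads, required=REQUIRED_READS):
    """Check whether the count of non-rRNA/tRNA reads reaches the suggested depth."""
    return n_non_rrna_trna_reads >= required


# Toy reads of length 21, 22, 30 and 31; only the middle two pass the filter.
reads = ["A" * 21, "C" * 22, "G" * 30, "T" * 31]
kept = size_select(reads)
print(len(kept), depth_sufficient(18_500_000))
```

As the abstract notes, more specific analyses may need longer reads, so the window is exposed as parameters rather than fixed.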


Subjects
Bacteria/genetics , Ribosomes/genetics , RNA Sequence Analysis/methods , Computational Biology/methods , Genetic Profile , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Protein Biosynthesis/genetics , Messenger RNA/genetics , Ribosomal RNA/metabolism
11.
Mol Biol Evol ; 37(8): 2440-2449, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32243542

ABSTRACT

Purifying (negative) natural selection is a hallmark of functional biological sequences, and can be detected in protein-coding genes using the ratio of nonsynonymous to synonymous substitutions per site (dN/dS). However, when two genes overlap the same nucleotide sites in different frames, synonymous changes in one gene may be nonsynonymous in the other, perturbing dN/dS. Thus, scalable methods are needed to estimate functional constraint specifically for overlapping genes (OLGs). We propose OLGenie, which implements a modification of the Wei-Zhang method. Assessment with simulations and controls from viral genomes (58 OLGs and 176 non-OLGs) demonstrates low false-positive rates and good discriminatory ability in differentiating true OLGs from non-OLGs. We also apply OLGenie to the unresolved case of HIV-1's putative antisense protein gene, showing significant purifying selection. OLGenie can be used to study known OLGs and to predict new OLGs in genome annotation. Software and example data are freely available at https://github.com/chasewnelson/OLGenie (last accessed April 10, 2020).
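
The problem that motivates this method, namely that a change which is synonymous in one gene may be nonsynonymous in an overlapping frame, can be shown with a short sketch. This is not OLGenie and not the Wei-Zhang method; it only classifies a single substitution in each same-strand frame, assuming the standard genetic code. The helper name `effect_in_frame` and the example sequence are invented for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect_in_frame(seq, pos, new_base, offset):
    """Classify a substitution at 0-based pos for the frame starting at offset.

    Returns None when pos is not covered by a complete codon in that frame.
    """
    start = pos - ((pos - offset) % 3)          # first base of the affected codon
    if start < 0 or start + 3 > len(seq):
        return None
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    old_aa = CODON_TABLE[seq[start:start + 3]]
    new_aa = CODON_TABLE[mutated[start:start + 3]]
    return "synonymous" if old_aa == new_aa else "nonsynonymous"


seq = "ATGCTGGCAAAA"
# G -> C at position 5: CTG -> CTC (Leu -> Leu) in frame 0,
# but TGG -> TCG (Trp -> Ser) in frame 1 and GGC -> CGC (Gly -> Arg) in frame 2.
for offset in range(3):
    print(offset, effect_in_frame(seq, 5, "C", offset))
```

Because the same substitution falls into different categories depending on the frame, a dN/dS estimate for one gene of an overlapping pair has to account for the constraint imposed by the other.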


Subjects
Overlapping Genes , Genetic Techniques , Genetic Selection , Silent Mutation , Software , HIV-1/genetics
12.
Front Microbiol ; 11: 377, 2020.
Article in English | MEDLINE | ID: mdl-32265854

ABSTRACT

Antisense transcription is well known in bacteria. However, translation of antisense RNAs is typically not considered, as the implied overlapping coding at a DNA locus is assumed to be highly improbable. Therefore, such overlapping genes are systematically excluded in prokaryotic genome annotation. Here we report an exceptional 603 bp long open reading frame completely embedded in antisense to the gene of the outer membrane protein ompA. An active σ70 promoter, transcription start site (TSS), Shine-Dalgarno motif and rho-independent terminator were experimentally validated, providing evidence that this open reading frame has all the structural features of a functional gene. Furthermore, ribosomal profiling revealed translation of the mRNA, the protein was detected in Western blots and a pH-dependent phenotype conferred by the protein was shown in competitive overexpression growth experiments of a translationally arrested mutant versus wild type. We designate this novel gene pop (pH-regulated overlapping protein-coding gene), thus adding another example to the growing list of overlapping, protein coding genes in bacteria.

13.
Biosystems ; 185: 104023, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31520875

ABSTRACT

The genetic code and its evolution have been studied by many different approaches. One approach is to compare the properties of the standard genetic code (SGC) to theoretical alternative codes in order to determine how optimal it is and from this infer whether or not it is likely that it has undergone a selective evolutionary process. Many different properties have been studied in this way in the literature. Less focus has been put on the alternative code sets which are used as a comparison to the standard code. Each implicitly represents an evolutionary hypothesis and the sets used differ greatly across the literature. Here we determine the influence of the comparison set on the results of the optimality calculation by using codes based upon different sub-structures of the SGC. With these results we can generalize the results to different evolutionary hypotheses. We find that the SGC's optimality is very robust, as no code set with no optimised properties is found. We therefore conclude that the optimality of the SGC is a robust feature across all evolutionary hypotheses. Our results provide important information for any future studies on the evolution of the standard genetic code. We also studied properties of the SGC concerning overlapping genes, which have recently been found to be more widespread than often believed. Although our results are not conclusive yet we find additional intriguing structures in the SGC that need explanation.
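
The general approach of scoring the standard genetic code against a set of alternative codes can be sketched as follows. Both the property measured (the fraction of single-nucleotide substitutions that leave the encoded amino acid unchanged) and the comparison set (codes made by fully shuffling the 64 assignments) are assumptions chosen for brevity. They are not the properties or code sets used in the article, whose point is precisely that the choice of comparison set matters.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
SGC = dict(zip(CODONS, AMINO))


def synonymous_fraction(code):
    """Fraction of single-base substitutions that do not change the encoded symbol."""
    same = total = 0
    for codon in CODONS:
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                total += 1
                same += code[codon] == code[neighbor]
    return same / total


def shuffled_code(rng):
    """One alternative code: the same 64 assignments, randomly permuted over codons."""
    assignments = list(AMINO)
    rng.shuffle(assignments)
    return dict(zip(CODONS, assignments))


def compare_to_random(n_codes=100, seed=0):
    """Score the standard code and n_codes shuffled codes with the same property."""
    rng = random.Random(seed)
    sgc_score = synonymous_fraction(SGC)
    random_scores = [synonymous_fraction(shuffled_code(rng)) for _ in range(n_codes)]
    return sgc_score, random_scores


sgc_score, random_scores = compare_to_random()
better = sum(score >= sgc_score for score in random_scores)
print(f"standard code: {sgc_score:.3f}; shuffled codes at least as good: {better}")
```

Replacing `shuffled_code` with a generator that preserves some sub-structure of the standard code would correspond to testing a different evolutionary hypothesis.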


Subjects
Amino Acids/genetics , Codon/genetics , Genetic Code/genetics , Open Reading Frames/genetics , Algorithms , Terminator Codon/genetics , Molecular Evolution , Genetic Models , Mutation , Protein Biosynthesis , Genetic Selection
14.
Sci Rep ; 8(1): 17875, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30552341

ABSTRACT

Only a few overlapping gene pairs are known in the best-analyzed bacterial model organism Escherichia coli. Automatic annotation programs usually annotate only one out of six reading frames at a locus, allowing only small overlaps between protein-coding sequences. However, both RNAseq and RIBOseq show signals corresponding to non-trivially overlapping reading frames in antisense to annotated genes, which may constitute protein-coding genes. The transcription and translation of the novel 264 nt gene asa, which overlaps in antisense to a putative TEGT (Testis-Enhanced Gene Transfer) transporter gene is detected in pathogenic E. coli, but not in two apathogenic E. coli strains. The gene in E. coli O157:H7 (EHEC) was further analyzed. An overexpression phenotype was identified in two stress conditions, i.e. excess in salt or arginine. For this, EHEC overexpressing asa was grown competitively against EHEC with a translationally arrested asa mutant gene. RT-qPCR revealed conditional expression dependent on growth phase, sodium chloride, and arginine. Two potential promoters were computationally identified and experimentally verified by reporter gene expression and determination of the transcription start site. The protein Asa was verified by Western blot. Close homologues of asa have not been found in protein databases, but bioinformatic analyses showed that it may be membrane associated, having a largely disordered structure.


Subjects
Escherichia coli O157/genetics , Escherichia coli Proteins/biosynthesis , Bacterial Gene Expression Regulation , Membrane Proteins/biosynthesis , Sodium Chloride/metabolism , Arginine/metabolism , Western Blotting , Escherichia coli O157/growth & development , Escherichia coli O157/metabolism , Escherichia coli Proteins/genetics , Gene Expression Profiling , Membrane Proteins/genetics , Genetic Promoter Regions , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Transcription Initiation Site
15.
PLoS One ; 12(9): e0184119, 2017.
Article in English | MEDLINE | ID: mdl-28902868

ABSTRACT

In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.
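
The length criterion mentioned in this abstract (open reading frames potentially encoding a protein of at least 30 amino acids) can be illustrated with a minimal ORF scan over both strands. The sketch assumes ATG as the only start codon and TAA/TAG/TGA as stops, and it does not restrict the search to intergenic regions; the study's actual ORF definition is not given in the abstract, and the function name `find_orfs` is hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite-strand sequence, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_aa=30):
    """Return (strand, start, end, n_aa) for ATG-to-stop ORFs with n_aa >= min_aa.

    Coordinates are 0-based and half-open, given on the strand that was scanned
    (for '-', relative to the reverse complement). n_aa excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start codon
                elif start is not None and codon in STOPS:
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:
                        orfs.append((strand, start, i + 3, n_aa))
                    start = None                   # resume scanning after the stop
    return orfs


example = "ATG" + "GCC" * 30 + "TAA"   # 31 codons followed by a stop
print(find_orfs(example))
```

Lowering `min_aa` makes the scan more permissive and is one way to explore how the threshold affects the number of candidate short genes.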


Subjects
Intergenic DNA/genetics , Escherichia coli O157/genetics , Bacterial Genes , Bacterial Genome , Conserved Sequence , Bacterial DNA/genetics , Genetic Association Studies , High-Throughput Nucleotide Sequencing , Open Reading Frames/genetics , Bacterial RNA/genetics , Transcriptome