Results 1 - 15 of 15
1.
BMC Genomics ; 25(Suppl 3): 834, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237856

ABSTRACT

BACKGROUND: Novel protein-coding genes were considered to be born by re-organization of pre-existing genes, such as gene duplication and gene fusion. However, recent progress of genome research revealed that more protein-coding genes than expected were born de novo, that is, gene origination by accumulating mutations in non-genic DNA sequences. Nonetheless, the in-depth process (scenario) for de novo origination is not well understood. RESULTS: We have conceived bioinformatic analysis for sketching a scenario for de novo origination of protein-coding genes. For each de novo protein-coding gene, we firstly identified an edge of a given phylogenetic tree where the gene was born based on parsimony. Then, from a multiple sequence alignment of the de novo gene and its orthologous regions, we constructed ancestral DNA sequences of the gene corresponding to both end nodes of the edge. We finally revealed statistical features observed in evolution between the two ancestral sequences. In the analysis of the Saccharomyces cerevisiae lineage, we have successfully sketched a putative scenario for de novo origination of protein-coding genes. (1) In the beginning was GC-rich genome regions. (2) Neutral mutations were accumulated in the regions. (3) ORFs were extended/combined, and then (4) translation signature (Kozak consensus sequence) was recruited. Interestingly, as the scenario progresses from (2) to (4), the specificity of mutations increases. CONCLUSION: To the best of our knowledge, this is the first report outlining a scenario of de novo origination of protein-coding genes. Our bioinformatic analysis can capture events that occur during a short evolutionary time by directly observing the evolution of the ancestral sequences from non-genic to genic. This property is suitable for the analysis of fast evolving de novo genes.
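
Illustrative sketch (not taken from the paper): the abstract states only that the birth edge was identified "based on parsimony". One simple reading, assumed here, is a single gain, so the gene is placed on the edge leading into the most recent common ancestor (MRCA) of all species that carry it. The toy tree, its node names, and the function names are hypothetical.

```python
# Toy rooted tree stored as child -> parent (hypothetical node names).
TOY_PARENT = {
    "A": "n1", "B": "n1",
    "n1": "n2", "C": "n2",
    "n2": "root", "D": "root",
}


def ancestors(node, parent):
    """Return the path from `node` up to the root, including `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def birth_edge(parent, carriers):
    """Return the (ancestor, descendant) edge above the MRCA of all carriers.

    Assumes a single gain of the gene; losses below the MRCA are allowed.
    """
    carriers = list(carriers)
    if not carriers:
        raise ValueError("at least one species must carry the gene")
    # Nodes that are ancestors of (or equal to) every carrier.
    common = set(ancestors(carriers[0], parent))
    for leaf in carriers[1:]:
        common &= set(ancestors(leaf, parent))
    # Walking upward from one carrier, the first shared node is the MRCA.
    mrca = next(n for n in ancestors(carriers[0], parent) if n in common)
    if mrca not in parent:
        raise ValueError("gene is inferred at the root; no birth edge in this tree")
    return parent[mrca], mrca
```

With the toy tree, a gene found in A, B and C but not in D is assigned to the edge between `root` and `n2`. The ancestral sequences at those two nodes are the ones the abstract describes comparing.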


Subjects
Molecular Evolution , Open Reading Frames , Phylogeny , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Computational Biology/methods , Mutation , Fungal Genome
2.
BMC Biol ; 21(1): 257, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37957718

ABSTRACT

BACKGROUND: Over evolutionary timescales, genomic loci can switch between functional and non-functional states through processes such as pseudogenization and de novo gene birth. Particularly, de novo gene birth is a widespread process, and many examples continue to be discovered across diverse evolutionary lineages. However, the general mechanisms that lead to functionalization are poorly understood, and estimated rates of de novo gene birth remain contentious. Here, we address this problem within a model that takes into account mutations and structural variation, allowing us to estimate the likelihood of emergence of new functions at non-functional loci. RESULTS: Assuming biologically reasonable mutation rates and mutational effects, we find that functionalization of non-genic loci requires the realization of strict conditions. This is in line with the observation that most de novo genes are localized to the vicinity of established genes. Our model also provides an explanation for the empirical observation that emerging proto-genes are often lost despite showing signs of adaptation. CONCLUSIONS: Our work elucidates the properties of non-genic loci that make them fertile for adaptation, and our results offer mechanistic insights into the process of de novo gene birth.


Subjects
Biological Evolution , Molecular Evolution , Mutation
3.
Mol Biol Evol ; 37(6): 1761-1774, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32101291

ABSTRACT

De novo protein-coding innovations sometimes emerge from ancestrally noncoding DNA, despite the expectation that translating random sequences is overwhelmingly likely to be deleterious. The "preadapting selection" hypothesis claims that emergence is facilitated by prior, low-level translation of noncoding sequences via molecular errors. It predicts that selection on polypeptides translated only in error is strong enough to matter and is strongest when erroneous expression is high. To test this hypothesis, we examined noncoding sequences located downstream of stop codons (i.e., those potentially translated by readthrough errors) in Saccharomyces cerevisiae genes. We identified a class of "fragile" proteins under strong selection to reduce readthrough, which are unlikely substrates for co-option. Among the remainder, sequences showing evidence of readthrough translation, as assessed by ribosome profiling, encoded C-terminal extensions with higher intrinsic structural disorder, supporting the preadapting selection hypothesis. The cryptic sequences beyond the stop codon, rather than spillover effects from the regular C-termini, are primarily responsible for the higher disorder. Results are robust to controlling for the fact that stronger selection also reduces the length of C-terminal extensions. These findings indicate that selection acts on 3' UTRs in Saccharomyces cerevisiae to purge potentially deleterious variants of cryptic polypeptides, acting more strongly in genes that experience more readthrough errors.


Subjects
Biological Adaptation , Molecular Evolution , Genetic Selection , Terminator Codon , Saccharomyces cerevisiae
4.
Dev Genes Evol ; 230(4): 279-294, 2020 07.
Article in English | MEDLINE | ID: mdl-32623522

ABSTRACT

Genome studies have uncovered many examples of essential gene loss, raising the question of how ancient genes transition from essentiality to dispensability. We explored this process for the deeply conserved E3 ubiquitin ligase Murine double minute (Mdm), which is lacking in Drosophila despite the conservation of its main regulatory target, the cellular stress response gene p53. Conducting gene expression and knockdown experiments in the red flour beetle Tribolium castaneum, we found evidence that Mdm has remained essential in insects where it is present. Using bioinformatics approaches, we confirm the absence of the Mdm gene family in Drosophila, mapping its loss to the stem lineage of schizophoran Diptera and Pipunculidae (big-headed flies), about 95-85 million years ago. Intriguingly, this gene loss event was preceded by the de novo origin of the gene Companion of reaper (Corp), a novel p53 regulatory factor that is characterized by functional similarities to vertebrate Mdm2 despite lacking E3 ubiquitin ligase protein domains. Speaking against a 1:1 compensatory gene gain/loss scenario, however, we found that hoverflies (Syrphidae) and pointed-wing flies (Lonchopteridae) possess both Mdm and Corp. This implies that the two p53 regulators have been coexisting for ~ 150 million years in select dipteran clades and for at least 50 million years in the lineage to Schizophora and Pipunculidae. Given these extensive time spans of Mdm/Corp coexistence, we speculate that the loss of Mdm in the lineage to Drosophila involved further acquisitions of compensatory gene activities besides the emergence of Corp. Combined with the previously noted reduction of an ancestral P53 contact domain in the Mdm homologs of crustaceans and insects, we conclude that the loss of the ancient Mdm gene family in flies was the outcome of incremental functional regression over long macroevolutionary time scales.


Subjects
Drosophila Proteins/genetics , Drosophila/genetics , Essential Genes/genetics , Proto-Oncogene Proteins c-mdm2/genetics , Tribolium/genetics , Tumor Suppressor Protein p53/metabolism , Ubiquitin-Protein Ligases/genetics , Animals , Molecular Evolution , Gene Knockdown Techniques , Genomics , Phylogeny , Proto-Oncogene Proteins c-mdm2/metabolism , Tribolium/embryology , Tumor Suppressor Protein p53/genetics
5.
Mol Phylogenet Evol ; 118: 54-57, 2018 01.
Article in English | MEDLINE | ID: mdl-28943376

ABSTRACT

Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. Intact ORF of this de novo sequence is shared not by all cetaceans, but only by delphinoids. However, this sequence is conservative in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans.


Subjects
Aquaporin 2/chemistry , Dolphins/classification , Molecular Evolution , Alternative Splicing , Animals , Aquaporin 2/genetics , Aquaporin 2/metabolism , Base Sequence , Dolphins/metabolism , Exons , Introns , Kidney/metabolism , Phylogeny , Protein Isoforms/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA/chemistry , RNA/isolation & purification , RNA/metabolism , Sequence Alignment , DNA Sequence Analysis
6.
Genome Biol ; 25(1): 183, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978079

ABSTRACT

BACKGROUND: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae. RESULTS: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging nearby genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface ( https://carvunislab.csb.pitt.edu/shiny/coexpression/ ) to efficiently query, visualize, and download our coexpression inferences. CONCLUSIONS: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.


Subjects
Fungal Gene Expression Regulation , Open Reading Frames , Saccharomyces cerevisiae , Genetic Transcription , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Molecular Evolution , Protein Biosynthesis
7.
bioRxiv ; 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39257767

ABSTRACT

Novel proteins can originate de novo from non-coding DNA and contribute to species-specific adaptations. It is challenging to conceive how de novo emerging proteins may integrate pre-existing cellular systems to bring about beneficial traits, given that their sequences are previously unseen by the cell. To address this apparent paradox, we investigated 26 de novo emerging proteins previously associated with growth benefits in yeast. Microscopy revealed that these beneficial emerging proteins preferentially localize to the endoplasmic reticulum (ER). Sequence and structure analyses uncovered a common protein organization among all ER-localizing beneficial emerging proteins, characterized by a short hydrophobic C-terminus immediately preceded by a transmembrane domain. Using genetic and biochemical approaches, we showed that ER localization of beneficial emerging proteins requires the GET and SND pathways, both of which are evolutionarily conserved and known to recognize transmembrane domains to promote post-translational ER insertion. The abundance of ER-localizing beneficial emerging proteins was regulated by conserved proteasome- and vacuole-dependent processes, through mechanisms that appear to be facilitated by the emerging proteins' C-termini. Consequently, we propose that evolutionarily conserved pathways can convergently govern the cellular processing of de novo emerging proteins with unique sequences, likely owing to common underlying protein organization patterns.

8.
Cell Syst ; 14(5): 363-381.e8, 2023 05 17.
Article in English | MEDLINE | ID: mdl-37164009

ABSTRACT

Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.


Subjects
Gene Expression Regulation , Ribosomes , Ribosomes/genetics , Ribosomes/metabolism , Saccharomyces cerevisiae/genetics , Phenotype
9.
Genome Biol Evol ; 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35668555

ABSTRACT

Proteins are the workhorses of the cell, yet they carry great potential for harm via misfolding and aggregation. Despite the dangers, proteins are sometimes born de novo from non-coding DNA. Proteins are more likely to be born from non-coding regions that produce peptides that do little to no harm when translated than from regions that produce harmful peptides. To investigate which newborn proteins are most likely to "first, do no harm", we estimate fitnesses from an experiment that competed Escherichia coli lineages that each expressed a unique random peptide. A variety of peptide metrics significantly predict lineage fitness, but this predictive power stems from simple amino acid frequencies rather than the ordering of amino acids. Amino acids that are smaller and that promote intrinsic structural disorder have more benign fitness effects. We validate that the amino acids that indicate benign effects in random peptides expressed in E. coli also do so in an independent dataset of random N-terminal tags in which it is possible to control for expression level. The same amino acids are also enriched in young animal proteins.

10.
Genes (Basel) ; 12(12), 2021 11 24.
Article in English | MEDLINE | ID: mdl-34946813

ABSTRACT

Microproteins (<100 amino acids) are receiving increasing recognition as important participants in numerous biological processes, but their evolutionary dynamics are poorly understood. SPAAR is a recently discovered microprotein that regulates muscle regeneration and angiogenesis through interactions with conserved signaling pathways. Interestingly, SPAAR does not belong to any known protein family and has known homologs exclusively among placental mammals. This lack of distant homology could be caused by challenges in homology detection of short sequences, or it could indicate a recent de novo emergence from a noncoding sequence. By integrating syntenic alignments and homology searches, we identify SPAAR orthologs in marsupials and monotremes, establishing that SPAAR has existed at least since the emergence of mammals. SPAAR shows substantial primary sequence divergence but retains a conserved protein structure. In primates, we infer two independent evolutionary events leading to the de novo origination of 5' elongated isoforms of SPAAR from a noncoding sequence and find evidence of adaptive evolution in this extended region. Thus, SPAAR may be of ancient origin, but it appears to be experiencing continual evolutionary innovation in mammals.


Subjects
Peptides/genetics , Long Noncoding RNA/genetics , Animals , Molecular Evolution , Female , Humans , Mammals/genetics , Mice , Opossums/genetics , Phylogeny , Placenta/metabolism , Platypus/genetics , Pregnancy , Primates/genetics
11.
Genes (Basel) ; 11(11), 2020 11 13.
Article in English | MEDLINE | ID: mdl-33202889

ABSTRACT

Plant-parasitic nematodes cause extensive annual yield losses to worldwide agricultural production. Most cultivated plants have no known resistance against nematodes and the few bearing a resistance gene can be overcome by certain species. Chemical methods that have been deployed to control nematodes have largely been banned from use due to their poor specificity and high toxicity. Hence, there is an urgent need for the development of cleaner and more specific control methods. Recent advances in nematode genomics, including in phytoparasitic species, provide an unprecedented opportunity to identify genes and functions specific to these pests. Using phylogenomics, we compared 61 nematode genomes, including 16 for plant-parasitic species and identified more than 24,000 protein families specific to these parasites. In the genome of Meloidogyne incognita, one of the most devastating plant parasites, we found ca. 10,000 proteins with orthologs restricted only to phytoparasitic species and no further homology in protein databases. Among these phytoparasite-specific proteins, ca. 1000 shared the same properties as known secreted effectors involved in essential parasitic functions. Of these, 68 were novel and showed strong expression during the endophytic phase of the nematode life cycle, based on both RNA-seq and RT-qPCR analyses. Besides effector candidates, transcription-related and neuro-perception functions were enriched in phytoparasite-specific proteins, revealing interesting targets for nematode control methods. This phylogenomics analysis constitutes a unique resource for the further understanding of the genetic basis of nematode adaptation to phytoparasitism and for the development of more efficient control methods.


Subjects
Helminth Proteins/genetics , Plants/parasitology , Tylenchoidea/genetics , Animals , Computer Simulation , Gene Expression Regulation , Gene Ontology , Horizontal Gene Transfer , Helminth Genome/genetics , Genomics/methods , Host-Parasite Interactions/genetics , Nematoda/genetics , Nematoda/pathogenicity , Phylogeny , Plant Diseases/parasitology , Tylenchoidea/pathogenicity
12.
Noncoding RNA ; 6(3), 2020 Sep 03.
Article in English | MEDLINE | ID: mdl-32899105

ABSTRACT

A small phylogenetically conserved sequence of 11,231 bp, termed FAM247, is repeated in human chromosome 22 by segmental duplications. This sequence forms part of diverse genes that span evolutionary time, the protein genes being the earliest as they are present in zebrafish and/or mice genomes, and the long noncoding RNA genes and pseudogenes the most recent as they appear to be present only in the human genome. We propose that the conserved sequence provides a nucleation site for new gene development at evolutionarily conserved chromosomal loci where the FAM247 sequences reside. The FAM247 sequence also carries information in its open reading frames that provides protein exon amino acid sequences; one exon plays an integral role in immune system regulation, specifically, the function of ubiquitin-specific protease (USP18) in the regulation of interferon. An analysis of this multifaceted sequence and the genesis of genes that contain it is presented.

13.
Genome Biol Evol ; 12(8): 1355-1366, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32589737

ABSTRACT

Taxonomically restricted genes (TRGs) are genes that are present only in one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNA, introns, or alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false-positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false-negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not been previously reported as de novo originated, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.


Subjects
Drosophila melanogaster/genetics , Molecular Evolution , Multigene Family , Animals , Open Reading Frames , Species Specificity
14.
Genetics ; 212(4): 1353-1366, 2019 08.
Article in English | MEDLINE | ID: mdl-31227545

ABSTRACT

Proteins are among the most important constituents of biological systems. Because all protein-coding genes have a noncoding ancestral form, the properties of noncoding sequences and how they shape the birth of novel proteins may influence the structure and function of all proteins. Differences between the properties of young proteins and random expectations from noncoding sequences have previously been interpreted as the result of natural selection. However, interpreting such deviations requires a yet-unattained understanding of the raw material of de novo gene birth and its relation to novel functional proteins. We mathematically show that the average properties and selective filtering of the "junk" polypeptides of which this raw material is composed are not the only factors influencing the properties of novel functional proteins. We find that in some biological scenarios, they also depend on the variance of the properties of junk polypeptides and their correlation with the rate of allelic turnover, which may itself depend on mutational biases. This suggests for instance that any property of polypeptides that accelerates their exploration of the sequence space could be overrepresented in novel functional proteins, even if it has a limited effect on adaptive value. To exemplify the use of our general theoretical results, we build a simple model that predicts the mean length and mean intrinsic disorder of novel functional proteins from the genomic GC content and a single evolutionary parameter. This work provides a theoretical framework that can guide the prediction and interpretation of results when studying the de novo emergence of protein-coding genes.
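
Illustrative sketch (not the authors' model): the abstract links genomic GC content to the mean length of novel proteins but gives no equations. A common back-of-envelope argument, assumed here, treats noncoding DNA as independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. The three stop codons (TAA, TAG, TGA) are AT-rich, so they become rarer as GC content rises and random open reading frames get longer. Function names are hypothetical.

```python
from math import inf


def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA (independent bases)."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    at = (1.0 - gc) / 2.0  # P(A) = P(T)
    g = gc / 2.0           # P(G) = P(C)
    # TAA contributes at^3; TAG and TGA contribute at^2 * g each.
    return at ** 3 + 2.0 * at ** 2 * g


def expected_sense_codons(gc: float) -> float:
    """Mean number of sense codons read before the first stop codon.

    Codons are treated as independent draws, so the count is geometric.
    """
    p = stop_codon_probability(gc)
    if p == 0.0:
        return inf  # gc == 1.0: no A or T, hence no stop codons
    return (1.0 - p) / p
```

At 50% GC the stop probability is 3/64, which gives about 20 sense codons on average before a stop. This ignores start codons, codon context, and selection, so it shows only the direction of the GC effect, not the paper's predictions.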


Subjects
Molecular Evolution , Genetic Models , Mutagenesis , Peptides/genetics , Animals , Bacteria/genetics , Base Composition , Bias , Eukaryota/genetics , Open Reading Frames/genetics , Peptides/chemistry , Protein Folding
15.
Elife ; 8, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31674305

ABSTRACT

The word function has many different meanings in molecular biology. Here we explore the use of this word (and derivatives like functional) in research papers about de novo gene birth. Based on an analysis of 20 abstracts we propose a simple lexicon that, we believe, will help scientists and philosophers discuss the meaning of function more clearly.


Subjects
Biological Factors/metabolism , Molecular Biology/methods , Terminology as Topic