Results 1 - 20 of 61
1.
bioRxiv ; 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39257767

ABSTRACT

Novel proteins can originate de novo from non-coding DNA and contribute to species-specific adaptations. It is challenging to conceive how de novo emerging proteins may integrate pre-existing cellular systems to bring about beneficial traits, given that their sequences are previously unseen by the cell. To address this apparent paradox, we investigated 26 de novo emerging proteins previously associated with growth benefits in yeast. Microscopy revealed that these beneficial emerging proteins preferentially localize to the endoplasmic reticulum (ER). Sequence and structure analyses uncovered a common protein organization among all ER-localizing beneficial emerging proteins, characterized by a short hydrophobic C-terminus immediately preceded by a transmembrane domain. Using genetic and biochemical approaches, we showed that ER localization of beneficial emerging proteins requires the GET and SND pathways, both of which are evolutionarily conserved and known to recognize transmembrane domains to promote post-translational ER insertion. The abundance of ER-localizing beneficial emerging proteins was regulated by conserved proteasome- and vacuole-dependent processes, through mechanisms that appear to be facilitated by the emerging proteins' C-termini. Consequently, we propose that evolutionarily conserved pathways can convergently govern the cellular processing of de novo emerging proteins with unique sequences, likely owing to common underlying protein organization patterns.

2.
BMC Genomics ; 25(Suppl 3): 834, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237856

ABSTRACT

BACKGROUND: Novel protein-coding genes were considered to be born by re-organization of pre-existing genes, such as gene duplication and gene fusion. However, recent progress in genome research revealed that more protein-coding genes than expected were born de novo, that is, by accumulating mutations in non-genic DNA sequences. Nonetheless, the in-depth process (scenario) for de novo origination is not well understood. RESULTS: We devised a bioinformatic analysis for sketching a scenario for de novo origination of protein-coding genes. For each de novo protein-coding gene, we first identified, based on parsimony, the edge of a given phylogenetic tree where the gene was born. Then, from a multiple sequence alignment of the de novo gene and its orthologous regions, we constructed ancestral DNA sequences of the gene corresponding to both end nodes of the edge. We finally revealed statistical features observed in evolution between the two ancestral sequences. In the analysis of the Saccharomyces cerevisiae lineage, we have successfully sketched a putative scenario for de novo origination of protein-coding genes. (1) In the beginning were GC-rich genome regions. (2) Neutral mutations accumulated in the regions. (3) ORFs were extended/combined, and then (4) a translation signature (Kozak consensus sequence) was recruited. Interestingly, as the scenario progresses from (2) to (4), the specificity of mutations increases. CONCLUSION: To the best of our knowledge, this is the first report outlining a scenario of de novo origination of protein-coding genes. Our bioinformatic analysis can capture events that occur during a short evolutionary time by directly observing the evolution of the ancestral sequences from non-genic to genic. This property is suitable for the analysis of fast-evolving de novo genes.


Subject(s)
Evolution, Molecular , Open Reading Frames , Phylogeny , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Computational Biology/methods , Mutation , Genome, Fungal
3.
Genome Biol Evol ; 16(8), 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39004885

ABSTRACT

New protein-coding genes can evolve from previously noncoding genomic regions through a process known as de novo gene emergence. Evidence suggests that this process has likely occurred throughout evolution and across the tree of life. Yet, confidently identifying de novo emerged genes remains challenging. Ancestral sequence reconstruction is a promising approach for inferring whether a gene has emerged de novo or not, as it allows us to inspect whether a given genomic locus ancestrally harbored protein-coding capacity. However, the use of ancestral sequence reconstruction in the context of de novo emergence is still in its infancy and its capabilities, limitations, and overall potential are largely unknown. Notably, it is difficult to formally evaluate the protein-coding capacity of ancestral sequences, particularly when new gene candidates are short. How well-suited is ancestral sequence reconstruction as a tool for the detection and study of de novo genes? Here, we address this question by designing an ancestral sequence reconstruction workflow incorporating different tools and sets of parameters and by introducing a formal criterion that allows us to estimate, within a desired level of confidence, when protein-coding capacity originated at a particular locus. Applying this workflow to ∼2,600 short, annotated budding yeast genes (<1,000 nucleotides), we found that ancestral sequence reconstruction robustly predicts an ancient origin for the most widely conserved genes, which constitute "easy" cases. For less robust cases, we calculated a randomization-based empirical P-value estimating whether the observed conservation between the extant and ancestral reading frame could be attributed to chance. This formal criterion allowed us to pinpoint a branch of origin for most of the less robust cases, identifying 49 genes that can unequivocally be considered de novo originated since the split of the Saccharomyces genus, including 37 Saccharomyces cerevisiae-specific genes. We find that for the remaining equivocal cases we cannot rule out different evolutionary scenarios including rapid evolution, multiple gene losses, or a recent de novo origin. Overall, our findings suggest that ancestral sequence reconstruction is a valuable tool to study de novo gene emergence but should be applied with caution and awareness of its limitations.
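
The randomization-based empirical P-value mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' workflow: the similarity measure (plain per-position identity), the null model (shuffling the ancestral sequence), and the names `identity` and `empirical_p_value` are all assumptions chosen only to show the general shape of such a test.

```python
import random


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def empirical_p_value(extant: str, ancestral: str,
                      n_shuffles: int = 999, seed: int = 0) -> float:
    """Estimate how often chance alone gives at least the observed identity.

    Hypothetical null model: shuffle the residues of the ancestral sequence,
    which keeps its composition but destroys residue order.
    """
    rng = random.Random(seed)
    observed = identity(extant, ancestral)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = list(ancestral)
        rng.shuffle(shuffled)
        if identity(extant, "".join(shuffled)) >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)
```

With 999 shuffles the smallest reportable value is 0.001, so the number of randomizations sets the resolution of the test. A sequence made of a single repeated residue always returns 1.0, because every shuffle reproduces the observed identity.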


Subject(s)
Evolution, Molecular , Saccharomyces cerevisiae/genetics , Phylogeny , Genome, Fungal , Genes, Fungal
4.
Genome Biol ; 25(1): 183, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978079

ABSTRACT

BACKGROUND: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origins, and lowly expressed. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae. RESULTS: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging nearby genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface ( https://carvunislab.csb.pitt.edu/shiny/coexpression/ ) to efficiently query, visualize, and download our coexpression inferences. CONCLUSIONS: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.


Subject(s)
Gene Expression Regulation, Fungal , Open Reading Frames , Saccharomyces cerevisiae , Transcription, Genetic , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Evolution, Molecular , Protein Biosynthesis
5.
Genome Biol Evol ; 16(7), 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38879874

ABSTRACT

For protein-coding genes to emerge de novo from non-genic DNA, the DNA sequence must gain an open reading frame (ORF) and the ability to be transcribed. The newborn de novo gene can further evolve to accumulate changes in its sequence. Consequently, it can also elongate or shrink with time. Existing literature shows that older de novo genes have longer ORFs, but it is not clear if they elongated with time or remained the same length since their inception. To address this question we developed a mathematical model of ORF elongation as a Markov-jump process, and show that ORFs tend to keep their length over short evolutionary timescales. We also show that if change occurs it is likely to be a truncation. Our genomics and transcriptomics data analyses of seven Drosophila melanogaster populations are also in agreement with the model's prediction. We conclude that selection could facilitate ORF length extension, which may explain why longer ORFs were observed in old de novo genes in studies analysing longer evolutionary time scales. Alternatively, shorter ORFs may be purged because they may be less likely to yield functional proteins.


Subject(s)
Drosophila melanogaster , Evolution, Molecular , Models, Genetic , Open Reading Frames , Animals , Drosophila melanogaster/genetics , Markov Chains
6.
Int J Mol Sci ; 25(7), 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38612464

ABSTRACT

Immunodominant alloantigens in pig sperm membranes include 15 known gene products and a previously undiscovered Mr 20,000 sperm membrane-specific protein (SMA20). Here we characterize SMA20 and identify it as the unannotated pig ortholog of PMIS2. A composite SMA20 cDNA encoded a 126 amino acid polypeptide comprising two predicted transmembrane segments and an N-terminal alanine- and proline (AP)-rich region with no apparent signal peptide. The Northern blots showed that the composite SMA20 cDNA was derived from a 1.1 kb testis-specific transcript. A BLASTp search retrieved no SMA20 match from the pig genome, but it did retrieve a 99% match to the Pmis2 gene product in warthog. Sequence identity to predicted PMIS2 orthologs from other placental mammals ranged from no more than 80% overall in Cetartiodactyla to less than 60% in Primates, with the AP-rich region showing the highest divergence, including, in the extreme, its absence in most rodents, including the mouse. SMA20 immunoreactivity localized to the acrosome/apical head of methanol-fixed boar spermatozoa but not live, motile cells. Ultrastructurally, the SMA20 AP-rich domain immunolocalized to the inner leaflet of the plasma membrane, the outer acrosomal membrane, and the acrosomal contents of ejaculated spermatozoa. Gene name search failed to retrieve annotated Pmis2 from most mammalian genomes. Nevertheless, individual pairwise interrogation of loci spanning Atp4a-Haus5 identified Pmis2 in all placental mammals, but not in marsupials or monotremes. We conclude that the gene encoding sperm-specific SMA20/PMIS2 arose de novo in Eutheria after divergence from Metatheria, whereupon rapid molecular evolution likely drove the acquisition of a species-divergent function unique to fertilization in placental mammals.


Subject(s)
Placenta , Semen , Male , Female , Pregnancy , Swine , Animals , Mice , DNA, Complementary , Spermatozoa , Eutheria , Alanine , Isoantigens/genetics , Fertilization/genetics
7.
medRxiv ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38076862

ABSTRACT

The orphan gene of SARS-CoV-2, ORF10, is the least studied gene in the virus responsible for the COVID-19 pandemic. Recent experimentation indicated ORF10 expression moderates innate immunity in vitro. However, whether ORF10 affects COVID-19 in humans remained unknown. We determine that the ORF10 sequence is identical to the Wuhan-Hu-1 ancestral haplotype in 95% of genomes across five variants of concern (VOC). Four ORF10 variants are associated with less virulent clinical outcomes in the human host: three of these affect ORF10 protein structure, one affects ORF10 RNA structural dynamics. RNA-Seq data from 2070 samples from diverse human cells and tissues reveals ORF10 accumulation is conditionally discordant from that of other SARS-CoV-2 transcripts. Expression of ORF10 in A549 and HEK293 cells perturbs immune-related gene expression networks, alters expression of the majority of mitochondrially-encoded genes of oxidative respiration, and leads to large shifts in levels of 14 newly-identified transcripts. We conclude ORF10 contributes to more severe COVID-19 clinical outcomes in the human host.

8.
BMC Biol ; 21(1): 257, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37957718

ABSTRACT

BACKGROUND: Over evolutionary timescales, genomic loci can switch between functional and non-functional states through processes such as pseudogenization and de novo gene birth. Particularly, de novo gene birth is a widespread process, and many examples continue to be discovered across diverse evolutionary lineages. However, the general mechanisms that lead to functionalization are poorly understood, and estimated rates of de novo gene birth remain contentious. Here, we address this problem within a model that takes into account mutations and structural variation, allowing us to estimate the likelihood of emergence of new functions at non-functional loci. RESULTS: Assuming biologically reasonable mutation rates and mutational effects, we find that functionalization of non-genic loci requires the realization of strict conditions. This is in line with the observation that most de novo genes are localized to the vicinity of established genes. Our model also provides an explanation for the empirical observation that emerging proto-genes are often lost despite showing signs of adaptation. CONCLUSIONS: Our work elucidates the properties of non-genic loci that make them fertile for adaptation, and our results offer mechanistic insights into the process of de novo gene birth.


Subject(s)
Biological Evolution , Evolution, Molecular , Mutation
9.
Mol Biol Evol ; 40(5), 2023 May 02.
Article in English | MEDLINE | ID: mdl-37139943

ABSTRACT

The formation of new genes during evolution is an important motor of functional innovation, but the rate at which new genes originate and the likelihood that they persist over longer evolutionary periods are still poorly understood questions. Two important mechanisms by which new genes arise are gene duplication and de novo formation from a previously noncoding sequence. Does the mechanism of formation influence the evolutionary trajectories of the genes? Proteins arisen by gene duplication retain the sequence and structural properties of the parental protein, and thus they may be relatively stable. Instead, de novo originated proteins are often species specific and thought to be more evolutionarily labile. Despite these differences, here we show that both types of genes share a number of similarities, including low sequence constraints in their initial evolutionary phases, high turnover rates at the species level, and comparable persistence rates in deeper branches, in both yeast and flies. In addition, we show that putative de novo proteins have an excess of substitutions between charged amino acids compared with the neutral expectation, which is reflected in the rapid loss of their initial highly basic character. The study supports high evolutionary dynamics of different kinds of new genes at the species level, in sharp contrast with the stability observed at later stages.


Subject(s)
Evolution, Molecular , Proteins , Proteins/genetics , Gene Duplication , Saccharomyces cerevisiae/genetics , Phylogeny
10.
Cell Syst ; 14(5): 363-381.e8, 2023 May 17.
Article in English | MEDLINE | ID: mdl-37164009

ABSTRACT

Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.


Subject(s)
Gene Expression Regulation , Ribosomes , Ribosomes/genetics , Ribosomes/metabolism , Saccharomyces cerevisiae/genetics , Phenotype
11.
Mol Biol Evol ; 40(4), 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37011142

ABSTRACT

New protein coding genes can emerge from genomic regions that previously did not contain any genes, via a process called de novo gene emergence. To synthesize a protein, DNA must be transcribed as well as translated. Both processes need certain DNA sequence features. Stable transcription requires promoters and a polyadenylation signal, while translation requires at least an open reading frame. We develop mathematical models based on mutation probabilities, and the assumption of neutral evolution, to find out how quickly genes emerge and are lost. We also investigate the effect of the order by which DNA features evolve, and if sequence composition is biased by mutation rate. We rationalize how genes are lost much more rapidly than they emerge, and how they preferentially arise in regions that are already transcribed. Our study not only answers some fundamental questions on the topic of de novo emergence but also provides a modeling framework for future studies.
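
As a rough illustration of how sequence composition bears on the chance of an open reading frame, the sketch below computes the probability that a random codon is a stop codon, and that a stretch of codons is stop-free, under a simple composition model. It is not the model from the paper: the assumptions of independent nucleotides, P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, and the function names are hypothetical and chosen for illustration only.

```python
def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG, or TGA.

    Assumes independent nucleotides with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an illustrative simplification).
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must lie between 0 and 1")
    at = (1.0 - gc) / 2.0  # probability of A, and likewise of T
    g = gc / 2.0           # probability of G
    taa = at * at * at
    tag = at * at * g
    tga = at * g * at
    return taa + tag + tga


def stop_free_probability(n_codons: int, gc: float) -> float:
    """Probability that n_codons consecutive random codons contain no stop."""
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return (1.0 - stop_codon_probability(gc)) ** n_codons
```

Under these assumptions a uniform composition (gc = 0.5) gives a stop probability of 3/64 per codon, and GC-rich sequence is less likely to contain stops, so long stop-free stretches arise more readily by chance there.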


Subject(s)
Evolution, Molecular , Genomics , Mutation , Open Reading Frames , Genome
12.
Plant Direct ; 7(3): e484, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36937792

ABSTRACT

Diploid plant genomes typically contain ~35,000 genes, almost all belonging to highly conserved gene families. Only a small fraction are lineage-specific, that is, found in only one or a few closely related species. Little is known about how genes arise de novo in plant genomes and how often this occurs; however, they are believed to be important for plant diversification and adaptation. We developed a pipeline to identify lineage-specific genes in Triticeae, using newly available genome assemblies of wheat, barley, and rye. Applying a set of stringent criteria, we identified 5942 candidate Triticeae-specific genes (TSGs), of which 2337 were validated as protein-coding genes in wheat. Differential gene expression analyses revealed that stress-induced wheat TSGs are strongly enriched in putative secreted proteins. Some were previously described to be involved in Triticeae non-host resistance and cold response. Additionally, we show that 1079 TSGs have sequence homology to transposable elements (TEs), ~68% of them deriving from regulatory non-coding regions of Gypsy retrotransposons. Most importantly, we demonstrate that these TSGs are enriched in transmembrane domains and are among the most highly expressed wheat genes overall. To summarize, we conclude that de novo gene formation is relatively rare and that Triticeae probably possess ~779 lineage-specific genes per haploid genome. TSGs, which respond to pathogen and environmental stresses, may be interesting candidates for future targeted resistance breeding in Triticeae. Finally, we propose that non-coding regions of TEs might provide important genetic raw material for the functional innovation of TM domains and the evolution of novel secreted proteins.

13.
Mol Biol Evol ; 40(3), 2023 Mar 04.
Article in English | MEDLINE | ID: mdl-36917489

ABSTRACT

Intergenic genomic regions have essential regulatory and structural roles that impose constraints on their sequences. But regions that do not currently encode proteins also carry the potential to do so in the future. De novo gene emergence, the evolution of novel genes out of previously noncoding sequences has now been established as a potent force for genomic novelty. Recently, it was shown that intergenic regions in the genome of Saccharomyces cerevisiae harbor pervasive cryptic potential to, if theoretically translated, form transmembrane domains (TM domains) more frequently than expected by chance given their nucleotide composition, a property that we refer to as TM-forming enrichment. The source and biological relevance of this property is unknown. Here, we expand the investigation into the TM-forming potential of intergenic regions to the entire Saccharomycotina budding yeast subphylum, in an effort to explain this property and understand its importance. We find pervasive but variable enrichment in TM-forming potential across the subphylum regardless of the composition and average size of intergenic regions. This cryptic property is evenly spread across the genome, cannot be explained by the hydrophobic content of the sequence, and does not appear to localize to regions containing regulatory motifs. This TM-forming enrichment specifically, and not the actual TM-forming potential, is associated, across genomes, with more TM domains in evolutionarily young genes. Our findings shed light on this newly discovered feature of yeast genomes and constitute a first step toward understanding its evolutionary importance.


Subject(s)
Saccharomycetales , Yeasts , DNA, Intergenic/genetics , Yeasts/genetics , Saccharomyces cerevisiae/genetics , Genomics , Genome , Saccharomycetales/genetics
14.
Trends Genet ; 39(4): 235-236, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36774242

ABSTRACT

Genes restricted to a given species or lineage are mysterious. Many emerged de novo from ancestral noncoding genomic regions rather than from pre-existing genes. A new study by Vakirlis and colleagues shows that, in humans, many of these are associated with phenotypic effects, accelerating our understanding of their functional importance.


Subject(s)
Evolution, Molecular , Hominidae , Animals , Humans , Genome , Genomics , CRISPR-Cas Systems
15.
Adv Sci (Weinh) ; 10(7): e2204140, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36638273

ABSTRACT

Newly originated de novo genes have been linked to the formation and function of the human brain. However, how a specific gene originates from ancestral noncoding DNAs and becomes involved in the preexisting network for functional outcomes remains elusive. Here, a human-specific de novo gene, SP0535, is identified that is preferentially expressed in the ventricular zone of the human fetal brain and plays an important role in cortical development and function. In human embryonic stem cell-derived cortical organoids, knockout of SP0535 compromises their growth and neurogenesis. In SP0535 transgenic (TG) mice, expression of SP0535 induces fetal cortex expansion and sulci and gyri-like structure formation. The progenitors and neurons in the SP0535 TG mouse cortex tend to proliferate and differentiate in ways that are unique to humans. SP0535 TG adult mice also exhibit improved cognitive ability and working memory. Mechanistically, SP0535 interacts with the membrane protein Na+/K+ ATPase subunit alpha-1 (ATP1A1) and releases Src from the ATP1A1-Src complex, allowing an increased level of Src phosphorylation that promotes cell proliferation. Thus, SP0535 is the first proven human-specific de novo gene that promotes cortical expansion and folding, and can function through incorporating into an existing conserved molecular network.


Subject(s)
Neurogenesis , Neurons , Mice , Animals , Humans , Mice, Transgenic , Neurogenesis/genetics
16.
Plant Mol Biol ; 111(1-2): 189-203, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36306001

ABSTRACT

De novo genes created in the plant mitochondrial genome have frequently been transferred into the nuclear genome via intergenomic gene transfer events. Therefore, plant mitochondria might be a source of de novo genes in the nuclear genome. However, the functions of de novo genes originating from mitochondria and their evolutionary fate remain unclear. Here, we revealed that an Arabidopsis thaliana-specific small coding gene derived from the mitochondrial genome regulates floral transition. We previously identified 49 candidate de novo genes that induce abnormal morphological changes on overexpression. We focused on a candidate gene derived from the mitochondrial genome (sORF2146) that encodes 66 amino acids. Comparative genomic analyses indicated that the mitochondrial sORF2146 emerged in the Brassica lineage as a de novo gene. The nuclear sORF2146 emerged following an intergenomic gene transfer event in A. thaliana after the divergence between Arabidopsis and Capsella. Although the nuclear and mitochondrial sORF2146 sequences are the same in A. thaliana, only the nuclear sORF2146 is transcribed. The nuclear sORF2146 product is localized in mitochondria, which may be associated with the pseudogenization of the mitochondrial sORF2146. To functionally characterize the nuclear sORF2146, we performed a transcriptomic analysis of transgenic plants overexpressing the nuclear sORF2146. Flowering transition-related genes were highly regulated in the transgenic plants. Subsequent phenotypic analyses demonstrated that the overexpression and knockdown of sORF2146 in transgenic plants resulted in delayed and early flowering, respectively. These findings suggest that a lineage-specific de novo gene derived from mitochondria has an important regulatory effect on floral transition.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Brassica , Arabidopsis/metabolism , Genome, Plant , Brassica/genetics , Gene Expression Profiling , Mitochondria/genetics , Mitochondria/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Gene Expression Regulation, Plant , Flowers/genetics , Flowers/metabolism
17.
Genome Biol Evol ; 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35668555

ABSTRACT

Proteins are the workhorses of the cell, yet they carry great potential for harm via misfolding and aggregation. Despite the dangers, proteins are sometimes born de novo from non-coding DNA. Proteins are more likely to be born from non-coding regions that produce peptides that do little to no harm when translated than from regions that produce harmful peptides. To investigate which newborn proteins are most likely to "first, do no harm", we estimate fitnesses from an experiment that competed Escherichia coli lineages that each expressed a unique random peptide. A variety of peptide metrics significantly predict lineage fitness, but this predictive power stems from simple amino acid frequencies rather than the ordering of amino acids. Amino acids that are smaller and that promote intrinsic structural disorder have more benign fitness effects. We validate that the amino acids that indicate benign effects in random peptides expressed in E. coli also do so in an independent dataset of random N-terminal tags in which it is possible to control for expression level. The same amino acids are also enriched in young animal proteins.

18.
Plant J ; 111(4): 1081-1095, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35748398

ABSTRACT

De novo genes are derived from non-coding sequences, and they can play essential roles in organisms. Cultivated peanut (Arachis hypogaea) is a major oil and protein crop derived from a cross between Arachis duranensis and Arachis ipaensis. However, few de novo genes have been documented in Arachis. Here, we identified 381 de novo genes in A. hypogaea cv. Tifrunner based on comparison with five closely related Arachis species. There are distinct differences in gene expression patterns and gene structures between conserved and de novo genes. The identified de novo genes originated from ancestral sequence regions associated with metabolic and biosynthetic processes, and they were subsequently integrated into existing regulatory networks. De novo paralogs and homoeologs were identified in A. hypogaea cv. Tifrunner. De novo paralogs and homoeologs with conserved expression have mismatching cis-acting elements under normal growth conditions. De novo genes potentially have pluripotent functions in responses to biotic stresses as well as in growth and development based on quantitative trait locus data. This work provides a foundation for future research examining gene birth processes and gene function in Arachis and related taxa.


Subject(s)
Arachis , Evolution, Molecular , Arachis/genetics , Arachis/metabolism , Quantitative Trait Loci/genetics
19.
Mol Biol Evol ; 39(1), 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34792602

ABSTRACT

All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.


Subject(s)
Escherichia coli , Evolution, Molecular , Escherichia coli/genetics , Open Reading Frames , Phylogeny , Reading Frames
20.
Genes (Basel) ; 12(12), 2021 Nov 24.
Article in English | MEDLINE | ID: mdl-34946813

ABSTRACT

Microproteins (<100 amino acids) are receiving increasing recognition as important participants in numerous biological processes, but their evolutionary dynamics are poorly understood. SPAAR is a recently discovered microprotein that regulates muscle regeneration and angiogenesis through interactions with conserved signaling pathways. Interestingly, SPAAR does not belong to any known protein family and has known homologs exclusively among placental mammals. This lack of distant homology could be caused by challenges in homology detection of short sequences, or it could indicate a recent de novo emergence from a noncoding sequence. By integrating syntenic alignments and homology searches, we identify SPAAR orthologs in marsupials and monotremes, establishing that SPAAR has existed at least since the emergence of mammals. SPAAR shows substantial primary sequence divergence but retains a conserved protein structure. In primates, we infer two independent evolutionary events leading to the de novo origination of 5' elongated isoforms of SPAAR from a noncoding sequence and find evidence of adaptive evolution in this extended region. Thus, SPAAR may be of ancient origin, but it appears to be experiencing continual evolutionary innovation in mammals.


Subject(s)
Peptides/genetics , RNA, Long Noncoding/genetics , Animals , Evolution, Molecular , Female , Humans , Mammals/genetics , Mice , Opossums/genetics , Phylogeny , Placenta/metabolism , Platypus/genetics , Pregnancy , Primates/genetics