1.
BMC Genomics ; 25(Suppl 3): 834, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237856

ABSTRACT

BACKGROUND: Novel protein-coding genes were long thought to arise by reorganization of pre-existing genes, for example through gene duplication and gene fusion. However, recent progress in genome research has revealed that more protein-coding genes than expected were born de novo, that is, by accumulation of mutations in non-genic DNA sequences. Nonetheless, the detailed process (scenario) of de novo origination is not well understood. RESULTS: We devised a bioinformatic analysis for sketching a scenario for de novo origination of protein-coding genes. For each de novo protein-coding gene, we first identified, based on parsimony, the edge of a given phylogenetic tree on which the gene was born. Then, from a multiple sequence alignment of the de novo gene and its orthologous regions, we constructed ancestral DNA sequences of the gene corresponding to both end nodes of the edge. Finally, we revealed statistical features of the evolution between the two ancestral sequences. In the analysis of the Saccharomyces cerevisiae lineage, we sketched a putative scenario for de novo origination of protein-coding genes: (1) In the beginning there were GC-rich genome regions. (2) Neutral mutations accumulated in these regions. (3) ORFs were extended/combined, and then (4) a translation signature (Kozak consensus sequence) was recruited. Interestingly, as the scenario progresses from (2) to (4), the specificity of mutations increases. CONCLUSION: To the best of our knowledge, this is the first report outlining a scenario of de novo origination of protein-coding genes. Our bioinformatic analysis can capture events that occur during a short evolutionary time by directly observing the evolution of the ancestral sequences from non-genic to genic. This property makes it suitable for the analysis of fast-evolving de novo genes.


Subjects
Molecular Evolution , Open Reading Frames , Phylogeny , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Computational Biology/methods , Mutation , Fungal Genome
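The parsimony step in the abstract above (locating the tree edge on which a gene was born) can be illustrated with a minimal sketch. The abstract does not specify the parsimony criterion, so this assumes the simplest one: a single gain with no regain, which places the birth on the edge leading into the smallest clade that contains every species carrying the gene. The nested-tuple tree encoding, the function names `leaves` and `origin_clade`, and the species labels in the usage note are all hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def origin_clade(tree, carriers):
    """Return the smallest clade containing every carrier species.

    Assumption (not from the abstract): the gene was gained exactly once,
    so its birth is placed on the edge leading into the returned clade.
    """
    carriers = set(carriers)
    if not carriers:
        raise ValueError("at least one carrier species is required")
    if not carriers <= leaves(tree):
        raise ValueError("every carrier must be a leaf of the tree")
    node = tree
    # Walk down while a single child still contains all carriers.
    while not isinstance(node, str):
        containing = [child for child in node if carriers <= leaves(child)]
        if not containing:
            break
        node = containing[0]
    return node
```

For example, with the hypothetical tree `((("Scer", "Spar"), "Smik"), "Sbay")` and carriers `{"Scer", "Spar"}`, the function returns the clade `("Scer", "Spar")`, meaning the gain is placed on the edge above that pair. Real analyses would also need to weigh gene loss, which this sketch ignores.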
2.
bioRxiv ; 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39257767

ABSTRACT

Novel proteins can originate de novo from non-coding DNA and contribute to species-specific adaptations. It is challenging to conceive how de novo emerging proteins may integrate into pre-existing cellular systems to bring about beneficial traits, given that their sequences are previously unseen by the cell. To address this apparent paradox, we investigated 26 de novo emerging proteins previously associated with growth benefits in yeast. Microscopy revealed that these beneficial emerging proteins preferentially localize to the endoplasmic reticulum (ER). Sequence and structure analyses uncovered a common protein organization among all ER-localizing beneficial emerging proteins, characterized by a short hydrophobic C-terminus immediately preceded by a transmembrane domain. Using genetic and biochemical approaches, we showed that ER localization of beneficial emerging proteins requires the GET and SND pathways, both of which are evolutionarily conserved and known to recognize transmembrane domains to promote post-translational ER insertion. The abundance of ER-localizing beneficial emerging proteins was regulated by conserved proteasome- and vacuole-dependent processes, through mechanisms that appear to be facilitated by the emerging proteins' C-termini. Consequently, we propose that evolutionarily conserved pathways can convergently govern the cellular processing of de novo emerging proteins with unique sequences, likely owing to common underlying protein organization patterns.

3.
Genome Biol ; 25(1): 183, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978079

ABSTRACT

BACKGROUND: Recent studies uncovered pervasive transcription and translation of thousands of noncanonical open reading frames (nORFs) outside of annotated genes. The contribution of nORFs to cellular phenotypes is difficult to infer using conventional approaches because nORFs tend to be short, of recent de novo origin, and expressed at low levels. Here we develop a dedicated coexpression analysis framework that accounts for low expression to investigate the transcriptional regulation, evolution, and potential cellular roles of nORFs in Saccharomyces cerevisiae. RESULTS: Our results reveal that nORFs tend to be preferentially coexpressed with genes involved in cellular transport or homeostasis but rarely with genes involved in RNA processing. Mechanistically, we discover that young de novo nORFs located downstream of conserved genes tend to leverage their neighbors' promoters through transcription readthrough, resulting in high coexpression and high expression levels. Transcriptional piggybacking also influences the coexpression profiles of young de novo nORFs located upstream of genes, but to a lesser extent and without detectable impact on expression levels. Transcriptional piggybacking influences, but does not determine, the transcription profiles of de novo nORFs emerging near genes. About 40% of nORFs are not strongly coexpressed with any gene but are transcriptionally regulated nonetheless and tend to form entirely new transcription modules. We offer a web browser interface ( https://carvunislab.csb.pitt.edu/shiny/coexpression/ ) to efficiently query, visualize, and download our coexpression inferences. CONCLUSIONS: Our results suggest that nORF transcription is highly regulated. Our coexpression dataset serves as an unprecedented resource for unraveling how nORFs integrate into cellular networks, contribute to cellular phenotypes, and evolve.


Subjects
Fungal Gene Expression Regulation , Open Reading Frames , Saccharomyces cerevisiae , Genetic Transcription , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Molecular Evolution , Protein Biosynthesis
4.
Genome Biol Evol ; 16(8), 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39004885

ABSTRACT

New protein-coding genes can evolve from previously noncoding genomic regions through a process known as de novo gene emergence. Evidence suggests that this process has likely occurred throughout evolution and across the tree of life. Yet, confidently identifying de novo emerged genes remains challenging. Ancestral sequence reconstruction is a promising approach for inferring whether a gene has emerged de novo or not, as it allows us to inspect whether a given genomic locus ancestrally harbored protein-coding capacity. However, the use of ancestral sequence reconstruction in the context of de novo emergence is still in its infancy and its capabilities, limitations, and overall potential are largely unknown. Notably, it is difficult to formally evaluate the protein-coding capacity of ancestral sequences, particularly when new gene candidates are short. How well-suited is ancestral sequence reconstruction as a tool for the detection and study of de novo genes? Here, we address this question by designing an ancestral sequence reconstruction workflow incorporating different tools and sets of parameters and by introducing a formal criterion that allows us to estimate, within a desired level of confidence, when protein-coding capacity originated at a particular locus. Applying this workflow to ∼2,600 short, annotated budding yeast genes (<1,000 nucleotides), we found that ancestral sequence reconstruction robustly predicts an ancient origin for the most widely conserved genes, which constitute "easy" cases. For less robust cases, we calculated a randomization-based empirical P-value estimating whether the observed conservation between the extant and ancestral reading frame could be attributed to chance. This formal criterion allowed us to pinpoint a branch of origin for most of the less robust cases, identifying 49 genes that can unequivocally be considered de novo originated since the split of the Saccharomyces genus, including 37 Saccharomyces cerevisiae-specific genes. We find that for the remaining equivocal cases we cannot rule out different evolutionary scenarios including rapid evolution, multiple gene losses, or a recent de novo origin. Overall, our findings suggest that ancestral sequence reconstruction is a valuable tool to study de novo gene emergence but should be applied with caution and awareness of its limitations.


Subjects
Molecular Evolution , Saccharomyces cerevisiae/genetics , Phylogeny , Fungal Genome , Fungal Genes
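The randomization-based empirical P-value mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the test statistic or the randomization scheme, so both are assumptions here: the statistic is plain per-position identity between two equal-length sequences, and the null distribution comes from shuffling the ancestral sequence. The function names `empirical_p_value` and `shuffled_identity_null` are hypothetical.

```python
import random


def empirical_p_value(observed, null_values):
    """Share of null draws at least as large as the observed value.

    The +1 in numerator and denominator counts the observation itself,
    so the estimate can never be exactly zero.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def shuffled_identity_null(extant, ancestral, n_draws=999, seed=0):
    """Compare observed identity with identities after shuffling `ancestral`.

    Assumption: both sequences have the same length and are already aligned.
    Returns (observed_identity, empirical_p_value).
    """
    rng = random.Random(seed)

    def identity(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)

    observed = identity(extant, ancestral)
    shuffled = list(ancestral)
    null_values = []
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # destroys positional signal, keeps composition
        null_values.append(identity(extant, shuffled))
    return observed, empirical_p_value(observed, null_values)
```

Shuffling preserves residue composition while removing positional conservation, which is one common choice of null model; the study may well have used a different randomization.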
5.
Genome Biol Evol ; 16(7), 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38879874

ABSTRACT

For protein-coding genes to emerge de novo from non-genic DNA, the DNA sequence must gain an open reading frame (ORF) and the ability to be transcribed. The newborn de novo gene can further evolve by accumulating changes in its sequence and, consequently, can elongate or shrink over time. Existing literature shows that older de novo genes have longer ORFs, but it is not clear whether they elongated over time or have kept the same length since their inception. To address this question, we developed a mathematical model of ORF elongation as a Markov-jump process and show that ORFs tend to keep their length over short evolutionary timescales. We also show that if a change occurs, it is likely to be a truncation. Our genomics and transcriptomics data analyses of seven Drosophila melanogaster populations agree with the model's prediction. We conclude that selection could facilitate ORF length extension, which may explain why longer ORFs were observed in old de novo genes in studies analysing longer evolutionary timescales. Alternatively, shorter ORFs may be purged because they may be less likely to yield functional proteins.


Subjects
Drosophila melanogaster , Molecular Evolution , Genetic Models , Open Reading Frames , Animals , Drosophila melanogaster/genetics , Markov Chains
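The idea of modelling ORF length as a Markov-jump process, as described in the abstract above, can be illustrated with a small Gillespie-style simulation. This is only a sketch and not the authors' model: the two event types, their rates, the uniform truncation point, and the 1 to 10 codon extension are all hypothetical choices, as are the function names `simulate_orf_length` and `fraction_unchanged`.

```python
import random


def simulate_orf_length(length, t_max, rate_truncate, rate_elongate, rng):
    """Simulate one ORF length trajectory (in codons) up to time t_max.

    Waiting times between jumps are exponential with the total event rate;
    each jump is a truncation or an elongation, chosen in proportion to its rate.
    """
    t = 0.0
    total = rate_truncate + rate_elongate
    while total > 0:
        t += rng.expovariate(total)  # time until the next jump
        if t > t_max:
            break
        if rng.random() < rate_truncate / total:
            # Hypothetical truncation: a new stop codon at a uniform earlier position.
            length = rng.randint(1, max(1, length - 1))
        else:
            # Hypothetical elongation: stop codon lost, ORF extends by 1 to 10 codons.
            length += rng.randint(1, 10)
    return length


def fraction_unchanged(n_runs, length, t_max, rate_truncate, rate_elongate, seed=0):
    """Fraction of simulated ORFs whose final length equals the starting length."""
    rng = random.Random(seed)
    same = sum(
        simulate_orf_length(length, t_max, rate_truncate, rate_elongate, rng) == length
        for _ in range(n_runs)
    )
    return same / n_runs
```

When the product of total rate and elapsed time is small, most simulated ORFs experience no jump at all, which mirrors the abstract's statement that ORFs tend to keep their length over short timescales. Setting the truncation rate above the elongation rate makes truncation the more likely change when one does occur.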
6.
Int J Mol Sci ; 25(7), 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38612464

ABSTRACT

Immunodominant alloantigens in pig sperm membranes include 15 known gene products and a previously undiscovered Mr 20,000 sperm membrane-specific protein (SMA20). Here we characterize SMA20 and identify it as the unannotated pig ortholog of PMIS2. A composite SMA20 cDNA encoded a 126 amino acid polypeptide comprising two predicted transmembrane segments and an N-terminal alanine- and proline (AP)-rich region with no apparent signal peptide. Northern blots showed that the composite SMA20 cDNA was derived from a 1.1 kb testis-specific transcript. A BLASTp search retrieved no SMA20 match from the pig genome, but it did retrieve a 99% match to the Pmis2 gene product in warthog. Sequence identity to predicted PMIS2 orthologs from other placental mammals ranged from no more than 80% overall in Cetartiodactyla to less than 60% in Primates, with the AP-rich region showing the highest divergence and, in the extreme, being absent in most rodents, including the mouse. SMA20 immunoreactivity localized to the acrosome/apical head of methanol-fixed boar spermatozoa but not of live, motile cells. Ultrastructurally, the SMA20 AP-rich domain immunolocalized to the inner leaflet of the plasma membrane, the outer acrosomal membrane, and the acrosomal contents of ejaculated spermatozoa. A gene name search failed to retrieve annotated Pmis2 from most mammalian genomes. Nevertheless, individual pairwise interrogation of loci spanning Atp4a-Haus5 identified Pmis2 in all placental mammals, but not in marsupials or monotremes. We conclude that the gene encoding sperm-specific SMA20/PMIS2 arose de novo in Eutheria after divergence from Metatheria, whereupon rapid molecular evolution likely drove the acquisition of a species-divergent function unique to fertilization in placental mammals.


Subjects
Placenta , Semen , Male , Female , Pregnancy , Swine , Animals , Mice , Complementary DNA , Spermatozoa , Eutheria , Alanine , Isoantigens/genetics , Fertilization/genetics
7.
medRxiv ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38076862

ABSTRACT

The orphan gene of SARS-CoV-2, ORF10, is the least studied gene in the virus responsible for the COVID-19 pandemic. Recent experimentation indicated that ORF10 expression moderates innate immunity in vitro. However, whether ORF10 affects COVID-19 in humans remained unknown. We determine that the ORF10 sequence is identical to the Wuhan-Hu-1 ancestral haplotype in 95% of genomes across five variants of concern (VOC). Four ORF10 variants are associated with less virulent clinical outcomes in the human host: three of these affect ORF10 protein structure, and one affects ORF10 RNA structural dynamics. RNA-Seq data from 2070 samples from diverse human cells and tissues reveal that ORF10 accumulation is conditionally discordant from that of other SARS-CoV-2 transcripts. Expression of ORF10 in A549 and HEK293 cells perturbs immune-related gene expression networks, alters expression of the majority of mitochondrially-encoded genes of oxidative respiration, and leads to large shifts in levels of 14 newly-identified transcripts. We conclude that ORF10 contributes to more severe COVID-19 clinical outcomes in the human host.

8.
BMC Biol ; 21(1): 257, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37957718

ABSTRACT

BACKGROUND: Over evolutionary timescales, genomic loci can switch between functional and non-functional states through processes such as pseudogenization and de novo gene birth. In particular, de novo gene birth is a widespread process, and many examples continue to be discovered across diverse evolutionary lineages. However, the general mechanisms that lead to functionalization are poorly understood, and estimated rates of de novo gene birth remain contentious. Here, we address this problem within a model that takes into account mutations and structural variation, allowing us to estimate the likelihood of emergence of new functions at non-functional loci. RESULTS: Assuming biologically reasonable mutation rates and mutational effects, we find that functionalization of non-genic loci requires strict conditions to be met. This is in line with the observation that most de novo genes are localized to the vicinity of established genes. Our model also provides an explanation for the empirical observation that emerging proto-genes are often lost despite showing signs of adaptation. CONCLUSIONS: Our work elucidates the properties of non-genic loci that make them fertile for adaptation, and our results offer mechanistic insights into the process of de novo gene birth.


Subjects
Biological Evolution , Molecular Evolution , Mutation
9.
Mol Biol Evol ; 40(5), 2023 May 02.
Article in English | MEDLINE | ID: mdl-37139943

ABSTRACT

The formation of new genes during evolution is an important motor of functional innovation, but the rate at which new genes originate and the likelihood that they persist over longer evolutionary periods are still poorly understood. Two important mechanisms by which new genes arise are gene duplication and de novo formation from a previously noncoding sequence. Does the mechanism of formation influence the evolutionary trajectories of the genes? Proteins that arose by gene duplication retain the sequence and structural properties of the parental protein, and thus they may be relatively stable. In contrast, de novo originated proteins are often species specific and thought to be more evolutionarily labile. Despite these differences, here we show that both types of genes share a number of similarities, including low sequence constraints in their initial evolutionary phases, high turnover rates at the species level, and comparable persistence rates in deeper branches, in both yeast and flies. In addition, we show that putative de novo proteins have an excess of substitutions between charged amino acids compared with the neutral expectation, which is reflected in the rapid loss of their initial highly basic character. The study supports high evolutionary dynamics of different kinds of new genes at the species level, in sharp contrast with the stability observed at later stages.


Subjects
Molecular Evolution , Proteins , Proteins/genetics , Gene Duplication , Saccharomyces cerevisiae/genetics , Phylogeny
10.
Cell Syst ; 14(5): 363-381.e8, 2023 May 17.
Article in English | MEDLINE | ID: mdl-37164009

ABSTRACT

Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.


Subjects
Gene Expression Regulation , Ribosomes , Ribosomes/genetics , Ribosomes/metabolism , Saccharomyces cerevisiae/genetics , Phenotype