Results 1 - 20 of 70
1.
Trends Genet ; 39(4): 235-236, 2023 04.
Article in English | MEDLINE | ID: mdl-36774242

ABSTRACT

Genes restricted to a given species or lineage are mysterious. Many emerged de novo from ancestral noncoding genomic regions rather than from pre-existing genes. A new study by Vakirlis and colleagues shows that, in humans, many of these are associated with phenotypic effects, accelerating our understanding of their functional importance.


Subject(s)
Evolution, Molecular , Hominidae , Animals , Humans , Genome , Genomics , CRISPR-Cas Systems
2.
Genome Res ; 2022 May 26.
Article in English | MEDLINE | ID: mdl-35618415

ABSTRACT

The unicellular yeast Schizosaccharomyces pombe (fission yeast) retains many of the splicing features observed in humans and is thus an excellent model to study the basic mechanisms of splicing. Nearly half the genes contain introns, but the impact of alternative splicing in gene regulation and proteome diversification remains largely unexplored. Here we leverage Oxford Nanopore Technologies native RNA sequencing (dRNA), as well as ribosome profiling data, to uncover the full range of polyadenylated transcripts and translated open reading frames. We identify 332 alternative isoforms affecting the coding sequences of 262 different genes, 97 of which occur at frequencies higher than 20%, indicating that functional alternative splicing in S. pombe is more prevalent than previously suspected. Intron retention events make up about 80% of the cases; these events may be involved in the regulation of gene expression and, in some cases, generate novel protein isoforms, as supported by ribosome profiling data in 18 of the intron retention isoforms. One example is the rpl22 gene, in which intron retention is associated with the translation of a protein of only 13 amino acids. We also find that lowly expressed transcripts tend to have longer poly(A) tails than highly expressed transcripts, highlighting an interdependence between poly(A) tail length and transcript expression level. Finally, we discover 214 novel transcripts that are not annotated, including 158 antisense transcripts, some of which also show translation evidence. The methodologies described in this work open new opportunities to study the regulation of splicing in a simple eukaryotic model.
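The 20% frequency cutoff can be made concrete with a short sketch. It assumes, for illustration only, that an isoform's frequency is the fraction of a gene's full-length reads assigned to that isoform; the function names, gene names, and read counts are hypothetical and do not reproduce the study's pipeline.

```python
from collections import Counter


def isoform_frequencies(read_assignments):
    """Return {gene: {isoform: fraction of that gene's reads}}.

    read_assignments: iterable of (gene, isoform) pairs, one per read.
    """
    per_gene = {}
    for gene, isoform in read_assignments:
        per_gene.setdefault(gene, Counter())[isoform] += 1
    freqs = {}
    for gene, counts in per_gene.items():
        total = sum(counts.values())
        freqs[gene] = {iso: n / total for iso, n in counts.items()}
    return freqs


def frequent_alternative_isoforms(freqs, canonical, threshold=0.20):
    """List (gene, isoform, frequency) for non-canonical isoforms above threshold."""
    hits = []
    for gene, isoforms in freqs.items():
        for iso, f in isoforms.items():
            if iso != canonical.get(gene) and f > threshold:
                hits.append((gene, iso, f))
    return sorted(hits)


# Hypothetical read assignments for two genes (100 reads each)
reads = ([("geneA", "canonical")] * 70 + [("geneA", "intron_retained")] * 30
         + [("geneB", "canonical")] * 95 + [("geneB", "exon_skipped")] * 5)
freqs = isoform_frequencies(reads)
print(frequent_alternative_isoforms(freqs, {"geneA": "canonical", "geneB": "canonical"}))
# [('geneA', 'intron_retained', 0.3)]
```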

3.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37139943

ABSTRACT

The formation of new genes during evolution is an important motor of functional innovation, but the rate at which new genes originate and the likelihood that they persist over longer evolutionary periods remain poorly understood. Two important mechanisms by which new genes arise are gene duplication and de novo formation from a previously noncoding sequence. Does the mechanism of formation influence the evolutionary trajectories of the genes? Proteins that arise by gene duplication retain the sequence and structural properties of the parental protein, and thus they may be relatively stable. In contrast, de novo originated proteins are often species-specific and are thought to be more evolutionarily labile. Despite these differences, here we show that both types of genes share a number of similarities, including low sequence constraints in their initial evolutionary phases, high turnover rates at the species level, and comparable persistence rates in deeper branches, in both yeast and flies. In addition, we show that putative de novo proteins have an excess of substitutions between charged amino acids compared with the neutral expectation, which is reflected in the rapid loss of their initial highly basic character. The study supports high evolutionary dynamics of different kinds of new genes at the species level, in sharp contrast with the stability observed at later stages.


Subject(s)
Evolution, Molecular , Proteins , Proteins/genetics , Gene Duplication , Saccharomyces cerevisiae/genetics , Phylogeny
4.
Proc Natl Acad Sci U S A ; 117(42): 26197-26205, 2020 10 20.
Article in English | MEDLINE | ID: mdl-33033229

ABSTRACT

MicroProteins are small, often single-domain proteins that are sequence-related to larger, often multidomain proteins. Here, we used a combination of comparative genomics and heterologous synthetic misexpression to isolate functional cereal microProtein regulators. Our approach identified LITTLE NINJA (LNJ), a microProtein that acts as a modulator of jasmonic acid (JA) signaling. Ectopic expression of LNJ in Arabidopsis resulted in stunted plants that resembled the decuple JAZ (jazD) mutant. In fact, comparing the transcriptomes of transgenic LNJ overexpressor plants and jazD revealed a large overlap of deregulated genes, suggesting that ectopic LNJ expression altered JA signaling. Transgenic Brachypodium plants with elevated LNJ expression levels showed deregulation of JA signaling as well and displayed reduced growth and enhanced production of side shoots (tillers). This tillering effect was transferable between grass species, and overexpression of LNJ in barley and rice caused similar traits. We used a clustered regularly interspaced short palindromic repeats (CRISPR) approach and created an LNJ-like protein in Arabidopsis by deleting parts of the coding sequence of the AFP2 gene that encodes a NINJA-domain protein. These afp2-crispr mutants were also stunted in size and resembled jazD. Thus, similar genome-engineering approaches can be exploited as a future tool to create LNJ proteins and produce cereals with altered architectures.


Subject(s)
Arabidopsis/metabolism , Cyclopentanes/pharmacology , Gene Expression Regulation, Plant , Hordeum/metabolism , Oryza/metabolism , Oxylipins/pharmacology , Plant Proteins/classification , Plant Proteins/metabolism , Arabidopsis/drug effects , Arabidopsis/genetics , Gene Expression Profiling , Hordeum/drug effects , Hordeum/genetics , Oryza/drug effects , Oryza/genetics , Plant Growth Regulators/pharmacology , Plant Proteins/genetics , Plants, Genetically Modified , Protein Isoforms , Repressor Proteins/genetics , Repressor Proteins/metabolism , Signal Transduction
5.
Trends Genet ; 35(3): 186-198, 2019 03.
Article in English | MEDLINE | ID: mdl-30606460

ABSTRACT

The translatome can be defined as the sum of the RNA sequences that are translated into proteins in the cell by the ribosomal machinery. Until recently, it was generally assumed that the translatome was essentially restricted to evolutionarily conserved proteins encoded by the set of annotated protein-coding genes. However, it has become increasingly clear that it also includes small regulatory open reading frames (ORFs), functional micropeptides, de novo proteins, and the pervasive translation of likely nonfunctional proteins. Many of these ORFs have been discovered thanks to the development of ribosome profiling, a technique to sequence ribosome-protected RNA fragments. To fully capture the diversity of translated ORFs, we propose a comprehensive classification that includes the new types of translated ORFs in addition to standard proteins.


Subject(s)
Evolution, Molecular , Open Reading Frames/genetics , Protein Biosynthesis , RNA/genetics , Computational Biology , Conserved Sequence/genetics , Gene Expression Regulation/genetics , Ribosomes/genetics
6.
Br J Cancer ; 127(2): 313-320, 2022 07.
Article in English | MEDLINE | ID: mdl-35449454

ABSTRACT

BACKGROUND: Molecular subtyping of bladder cancer has revealed that luminal tumors generally have a more favourable prognosis. However, some aggressive forms of variant histology, including micropapillary, are often classified as luminal. In previous work, we found that long non-coding RNA (lncRNA) expression profiles could identify a subgroup of luminal bladder tumors with less aggressive biology and better outcomes. OBJECTIVE: In the present study, we aimed to investigate whether lncRNA expression profiles could identify high-grade T1 (HGT1) micropapillary bladder cancer with differential outcome. DESIGN, SETTING, AND PARTICIPANTS: LncRNAs were quantified from RNA-seq data from an HGT1 bladder cancer cohort that was enriched for primary micropapillary cases (15/84). Unsupervised consensus clustering of variant lncRNAs identified a three-cluster solution, which was further characterised using a panel of micropapillary-associated biomarkers, molecular subtypes, gene signatures, and survival analysis. A single-sample genomic signature was trained using lasso-penalized logistic regression to classify micropapillary-like gene expression, as characterised by lncRNA clustering. The genomic classifier (GC) was tested on luminal tumors derived from the TCGA cohort (N = 202). OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Patient and tumor characteristics were compared between subgroups by using χ² tests and two-sided Wilcoxon rank-sum tests. Primary endpoints were overall, progression-free, and high-grade recurrence-free survival, calculated from the date of high-grade T1 disease at TURBT until the date of death from any cause, progression, or recurrence, respectively. Survival rates were estimated using weighted Kaplan-Meier (KM) curves. RESULTS AND LIMITATIONS: Primary micropapillary HGT1 showed decreased FGFR3, SHH, and p53 pathway activity relative to tumors with conventional urothelial carcinoma.
Many bladder cancer-associated lncRNAs were downregulated in micropapillary tumors, including UCA1, LINC00152, and MALAT1. Unsupervised consensus clustering resulted in a lncRNA cluster 1 (LC1) with worse prognosis that was enriched for primary micropapillary histology and the Luminal Unstable (LumU) molecular subtype. Interestingly, LC1 appeared to better identify aggressive HGT1 disease, compared to stratifying outcomes using primary histologic characteristics. A signature trained to identify LC1 cases showed good performance in the testing cohort, identifying seven cases with significantly worse survival (p < 0.001). Limitations include the retrospective nature of the study and the lack of a validation cohort. CONCLUSIONS: Using the lncRNA transcriptome we identified a subgroup of aggressive HGT1 bladder cancer that was enriched with micropapillary histology. These data suggest that lncRNAs can facilitate the identification of aggressive micropapillary-like tumors, potentially improving patient management.
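Survival curves of this kind can be illustrated with a minimal Kaplan-Meier estimator. This sketch is unweighted, whereas the study reports weighted Kaplan-Meier curves, and the follow-up times are invented; it only shows how each event lowers the survival estimate and how censored patients leave the risk set without counting as events.

```python
def kaplan_meier(times, events):
    """Unweighted Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (death, progression, recurrence) was observed,
            0 if the patient was censored at that time
    Returns [(time, survival probability)] at each time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Pool every observation recorded at the same time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up (months) for five patients; 0 marks censoring
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 2))
```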


Subject(s)
Carcinoma, Transitional Cell , RNA, Long Noncoding , Urinary Bladder Neoplasms , Biomarkers, Tumor/analysis , Biomarkers, Tumor/genetics , Carcinoma, Transitional Cell/genetics , Gene Expression Profiling/methods , Humans , Prognosis , RNA, Long Noncoding/genetics , Retrospective Studies , Urinary Bladder Neoplasms/pathology
7.
Exp Cell Res ; 391(1): 111940, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32156600

ABSTRACT

High throughput RNA sequencing techniques have revealed that a large fraction of the genome is transcribed into long non-coding RNAs (lncRNAs). Unlike canonical protein-coding genes, lncRNAs do not contain long open reading frames (ORFs) and tend to be poorly conserved across species. However, many of them contain small ORFs (sORFs) that exhibit translation signatures according to ribosome profiling or proteomics data. These sORFs are a source of putative novel proteins; some of them may confer a selective advantage and be maintained over time, a process known as de novo gene birth. Here we review the mechanisms by which randomly occurring sORFs in lncRNAs can become new functional proteins.
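A minimal sketch of how candidate sORFs can be enumerated in a transcript: scan the three forward frames for an ATG followed by the first in-frame stop codon. The 10-codon minimum and the toy sequence are assumptions for illustration; evidence of translation (ribosome profiling or proteomics) is not modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10):
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    An ORF starts at ATG and ends at the first in-frame stop codon (included).
    min_codons counts codons from the start codon up to, not including, the stop.
    Internal in-frame ATGs are not reported as separate ORFs.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy transcript: ATG + 10 alanine codons + TAA stop
transcript = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
print(find_orfs(transcript))  # [(2, 38)]
```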


Subject(s)
Evolution, Molecular , Genome , Open Reading Frames , Protein Biosynthesis , RNA, Long Noncoding/genetics , Ribosomes/genetics , Animals , Brain/metabolism , Humans , Liver/metabolism , Male , Molecular Sequence Annotation , Myocardium/metabolism , Organ Specificity , RNA, Long Noncoding/classification , RNA, Long Noncoding/metabolism , Ribosomes/classification , Ribosomes/metabolism , Testis/metabolism , Transcription, Genetic
8.
Mol Biol Evol ; 34(4): 843-856, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28087778

ABSTRACT

Phylostratigraphy is a computational framework for dating the emergence of DNA and protein sequences in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and about the impact of this limitation on phylostratigraphic inferences. Here, we re-assessed these simulations. We found that, even with a possible overall BLAST false negative rate between 11% and 15%, the large majority of sequences assigned to a recent evolutionary origin by phylostratigraphy is unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on previously reported findings, we repeated the original analyses but now excluded all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support phylostratigraphic inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis that does not appear to introduce significant biases into evolutionary pattern inferences.
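The core dating step can be sketched as follows: each gene is assigned to the oldest phylostratum in which a homolog is detected. The strata, gene names, and hit sets are hypothetical, and the sketch takes precomputed hits rather than running BLAST itself.

```python
# Phylostrata ordered from oldest to youngest along a hypothetical lineage
STRATA = ["cellular_organisms", "Eukaryota", "Fungi", "Saccharomycetaceae", "focal_species"]


def assign_phylostratum(gene, hits_by_stratum, strata=STRATA):
    """Return the oldest stratum in which `gene` has a detected homolog.

    hits_by_stratum maps a stratum name to the set of focal genes with at
    least one significant hit (e.g. a BLAST E-value below some cutoff) in
    species that branch off at that stratum. Genes with no hits outside the
    focal species fall into the youngest stratum.
    """
    for stratum in strata:
        if gene in hits_by_stratum.get(stratum, set()):
            return stratum
    return strata[-1]


# Hypothetical precomputed hits
hits = {
    "Eukaryota": {"geneX"},
    "Fungi": {"geneX"},
    "Saccharomycetaceae": {"geneX", "geneY"},
}
for g in ["geneX", "geneY", "geneZ"]:
    print(g, assign_phylostratum(g, hits))
```

Because the rule picks the oldest stratum with a hit, a BLAST false negative in the older strata can only make a gene appear younger, never older, which is the bias the simulation studies were concerned with.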


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Animals , Bias , Biological Evolution , Computer Simulation , Drosophila , Evolution, Molecular , Genome , Models, Genetic , Phylogeny , Time Factors
9.
Mol Ecol ; 27(3): 709-722, 2018 02.
Article in English | MEDLINE | ID: mdl-29319912

ABSTRACT

Hibernation is an adaptive strategy some mammals use to survive highly seasonal or unpredictable environments. We present the first investigation on the transcriptomics of hibernation in a natural population of primate hibernators: Crossley's dwarf lemurs (Cheirogaleus crossleyi). Using capture-mark-recapture techniques to track the same animals over a period of 7 months in Madagascar, we used RNA-seq to compare gene expression profiles in white adipose tissue (WAT) during three distinct physiological states. We focus on pathway analysis to assess the biological significance of transcriptional changes in dwarf lemur WAT and, by comparing and contrasting what is known in other model hibernating species, contribute to a broader understanding of genomic contributions of hibernation across Mammalia. The hibernation signature is characterized by a suppression of lipid biosynthesis, pyruvate metabolism and mitochondrial-associated functions, and an accumulation of transcripts encoding ribosomal components and iron-storage proteins. The data support a key role of pyruvate dehydrogenase kinase isoenzyme 4 (PDK4) in regulating the shift in fuel economy during periods of severe food deprivation. This pattern of PDK4 holds true across representative hibernating species from disparate mammalian groups, suggesting that the genetic underpinnings of hibernation may be ancestral to mammals.


Subject(s)
Animals, Wild/genetics , Animals, Wild/physiology , Cheirogaleidae/genetics , Cheirogaleidae/physiology , Hibernation/genetics , Transcriptome/genetics , Animals , Body Temperature , Carbohydrate Metabolism/genetics , Gene Expression Profiling , Iron/metabolism , Lipid Metabolism/genetics , Mitochondria/metabolism , Protein Biosynthesis/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism
10.
PLoS Genet ; 11(12): e1005721, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26720152

ABSTRACT

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes that have evidence of protein translation. Taken together, the data support a model in which frequently occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.


Subject(s)
Evolution, Molecular , Genes , Genome, Human , Pan troglodytes/genetics , Ribonucleoprotein, U1 Small Nuclear/genetics , Animals , Base Sequence , Female , Gene Expression , Humans , Macaca/genetics , Male , Mice , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Testis/physiology , Transcription Initiation Site
11.
Mol Biol Evol ; 32(9): 2263-72, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25931513

ABSTRACT

The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice the proportion of single-copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
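Alanine repeats of the kind discussed above can be located with a simple pattern match. This is a minimal sketch; the 5-residue minimum and the toy sequence are assumptions, and the LCR definition used in the study may be broader than pure homopolymer runs.

```python
import re


def homopolymer_runs(protein, residue="A", min_len=5):
    """Return (start, length) for each run of `residue` at least min_len long."""
    # Builds a pattern such as "A{5,}": the residue repeated min_len or more times
    pattern = re.compile(f"{re.escape(residue)}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(protein)]


# Toy sequence: the first alanine run (3 residues) is too short to report
seq = "MSAAAG" + "A" * 7 + "PQ" + "A" * 5 + "K"
print(homopolymer_runs(seq))  # [(6, 7), (15, 5)]
```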


Subject(s)
LIM-Homeodomain Proteins/genetics , Transcription Factors/genetics , Trinucleotide Repeat Expansion , Animals , Evolution, Molecular , Gene Duplication , Humans , Phylogeny , Transcriptional Activation
12.
BMC Evol Biol ; 15: 218, 2015 Oct 05.
Article in English | MEDLINE | ID: mdl-26438045

ABSTRACT

BACKGROUND: The high density of tandem repeat sequences (satellites) in nematode genomes and the availability of genome sequences from several species in the group offer a unique opportunity to better understand the evolutionary dynamics and the functional role of these sequences. We take advantage of the previously developed SATFIND program to study the satellites in four Caenorhabditis species and investigate these questions. METHODS: The identification and comparison of satellites is carried out in three steps. First we find all the satellites present in each species with the SATFIND program. Each satellite is defined by its length, number of repeats, and repeat sequence. Only satellites with at least ten repeats are considered. In the second step we build satellite families with a newly developed alignment program. Satellite families are defined by a consensus sequence and the number of satellites in the family. Finally we compare the consensus sequence of satellite families in different species. RESULTS: We give a catalog of individual satellites in each species. We have also identified satellite families with related sequences and compared them across species. We analyze the turnover of satellites: they increased in size through duplications of fragments of 100-300 bases. It appears that in many cases they have undergone an explosive expansion. In C. elegans we have identified a subset of large satellites that have strong affinity for the centromere protein CENP-A. We have also compared our results with those obtained from other species, including one nematode and three mammals. CONCLUSIONS: Most satellite families found in Caenorhabditis are species-specific, in particular those with long repeats. A subset of these satellites may facilitate the formation of kinetochores in mitosis. Other satellite families in C. elegans are either related to Helitron transposons or to meiotic pairing centers.
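The satellite definition above (a repeat sequence, its length, and at least ten consecutive repeats) can be illustrated with a naive exact-match scanner. This is not the SATFIND algorithm, which the abstract does not describe; it ignores mismatches and indels, which real satellite finders must tolerate, and the default maximum unit length of 10 bases is an arbitrary assumption.

```python
def find_satellites(seq, max_unit=10, min_copies=10):
    """Naive exact tandem-repeat scan (not SATFIND).

    Returns (start, unit, copies) for each run in which a unit of 1..max_unit
    bases is repeated at least min_copies times back to back. At each start
    position the shortest qualifying unit is reported, and scanning resumes
    after the end of the run.
    """
    seq = seq.upper()
    found = []
    i = 0
    while i < len(seq):
        hit = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit length
            copies = 1
            # Extend while the next k bases are an exact copy of the unit
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                hit = (i, unit, copies)
                break
        if hit:
            found.append(hit)
            i += len(hit[1]) * hit[2]  # skip past the satellite
        else:
            i += 1
    return found


# Hypothetical sequence: 12 tandem copies of ATG flanked by non-repetitive bases
print(find_satellites("GC" + "ATG" * 12 + "CCGT"))  # [(2, 'ATG', 12)]
```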


Subject(s)
Caenorhabditis/classification , Caenorhabditis/genetics , DNA, Helminth/genetics , Animals , Autoantigens/genetics , Biological Evolution , Caenorhabditis elegans/genetics , Centromere , Centromere Protein A , Chromosomal Proteins, Non-Histone/genetics , DNA, Satellite/genetics , Repetitive Sequences, Nucleic Acid , Species Specificity
13.
Genome Res ; 22(3): 478-85, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22128134

ABSTRACT

Insertions and deletions (indels), together with nucleotide substitutions, are major drivers of sequence evolution. An excess of deletions over insertions in genomic sequences, the so-called deletional bias, has been reported in a wide range of species, including mammals. However, this bias has not been found in the coding sequences of some mammalian species, such as human and mouse. To determine the strength of the deletional bias in mammals, and the influence of mutation and selection, we have quantified indels in both neutrally evolving noncoding sequences and protein-coding sequences, in six mammalian branches: human, macaque, ancestral primate, mouse, rat, and ancestral rodent. The results obtained with an improved algorithm for the placement of insertions in multiple alignments, Prank(+F), indicate that contrary to previous results, the only mammalian branch with a strong deletional bias is the rodent ancestral branch. We estimate that such a bias has resulted in an ~2.5% loss of mammalian syntenic sequence in the ancestor of the mouse and rat. Further, a comparison of coding and noncoding sequences shows that negative selection is acting more strongly against mutations generating amino acid insertions than against mutations resulting in amino acid deletions. The strength of selection against indels is found to be higher in the rodent branches than in the primate branches, consistent with the larger effective population sizes of the rodents.


Subject(s)
Mammals/genetics , Sequence Deletion , Amino Acid Sequence , Animals , Cattle , Evolution, Molecular , Humans , Macaca mulatta , Mice , Molecular Sequence Data , Mutagenesis, Insertional , Open Reading Frames , RNA, Untranslated , Rats , Rodentia/genetics , Sequence Alignment , Tandem Repeat Sequences
14.
Nucleic Acids Res ; 41(17): 8107-25, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23832230

ABSTRACT

Interferons (IFN) play a pivotal role in innate immunity, orchestrating a cell-intrinsic anti-pathogenic state and stimulating adaptive immune responses. The complex interplay between the primary response to IFNs and its modulation by positive and negative feedback loops is incompletely understood. Here, we implement the combination of high-resolution gene-expression profiling of nascent RNA with translational inhibition of secondary feedback by cycloheximide. Unexpectedly, this approach revealed a prominent role of negative feedback mechanisms during the immediate (≤60 min) IFNα response. In contrast, a more complex picture involving both negative and positive feedback loops was observed on IFNγ treatment. IFNγ-induced repression of genes associated with regulation of gene expression, cellular development, apoptosis and cell growth resulted from cycloheximide-resistant primary IFNγ signalling. In silico promoter analysis revealed significant overrepresentation of SP1/SP3-binding sites and/or GC-rich stretches. Although signal transducer and activator of transcription 1 (STAT1)-binding sites were not overrepresented, repression was lost in absence of STAT1. Interestingly, basal expression of the majority of these IFNγ-repressed genes was dependent on STAT1 in IFN-naïve fibroblasts. Finally, IFNγ-mediated repression was also found to be evident in primary murine macrophages. IFN-repressed genes include negative regulators of innate and stress response, and their decrease may thus aid the establishment of a signalling perceptive milieu.


Subject(s)
Gene Expression Regulation , Interferon-alpha/pharmacology , Interferon-gamma/pharmacology , Promoter Regions, Genetic , Transcription, Genetic , Animals , Cells, Cultured , Computer Simulation , Cycloheximide/pharmacology , Feedback, Physiological , Gene Expression Profiling , Gene Expression Regulation/drug effects , Macrophages/drug effects , Macrophages/metabolism , Mice , NIH 3T3 Cells , Protein Synthesis Inhibitors/pharmacology , Response Elements , STAT1 Transcription Factor/physiology , Thiouridine , Transcription, Genetic/drug effects
15.
BMC Genomics ; 15: 599, 2014 Jul 16.
Article in English | MEDLINE | ID: mdl-25030307

ABSTRACT

BACKGROUND: The recent increase in human polymorphism data, together with the availability of genome sequences from several primate species, provides an unprecedented opportunity to investigate how natural selection has shaped human evolution. RESULTS: We compared human branch-specific substitutions with variation data in the current human population to measure the impact of adaptive evolution on human protein coding genes. The use of single nucleotide polymorphisms (SNPs) with high derived allele frequencies (DAFs) minimized the influence of segregating slightly deleterious mutations and improved the estimation of the number of adaptive sites. Using DAF ≥ 60% we showed that the proportion of adaptive substitutions is 0.2% in the complete gene set. However, the percentage rose to 40% when we focused on genes that are specifically accelerated in the human branch with respect to the chimpanzee branch, or on genes that show signatures of adaptive selection at the codon level by the maximum likelihood based branch-site test. In general, neural genes are enriched in positive selection signatures. Genes with multiple lines of evidence of positive selection include taxilin beta, which is involved in motor nerve regeneration, and syntabulin, which is required for the formation of new presynaptic boutons. CONCLUSIONS: We combined several methods to detect adaptive evolution in human coding sequences at a genome-wide level. The use of variation data, in addition to sequence divergence information, uncovered previously undetected positive selection signatures in neural genes.
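The proportion of adaptive substitutions is commonly estimated with a McDonald-Kreitman-type ratio, alpha = 1 - (Ds * Pn) / (Dn * Ps), where Dn and Ds are nonsynonymous and synonymous substitutions on the human branch and Pn and Ps are nonsynonymous and synonymous SNPs. The abstract does not state its exact estimator, so this textbook form is an assumption, as are the counts below; the DAF filter mirrors the DAF ≥ 60% cutoff described above.

```python
def count_polymorphisms(snps, min_daf=0.60):
    """Count nonsynonymous (Pn) and synonymous (Ps) SNPs with DAF >= min_daf.

    snps: iterable of (kind, daf), kind 'N' or 'S', daf in [0, 1].
    Keeping only high-DAF SNPs reduces the contribution of segregating
    slightly deleterious variants.
    """
    pn = ps = 0
    for kind, daf in snps:
        if daf < min_daf:
            continue
        if kind == "N":
            pn += 1
        elif kind == "S":
            ps += 1
    return pn, ps


def mk_alpha(dn, ds, pn, ps):
    """McDonald-Kreitman-style estimate of the adaptive fraction of Dn."""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical SNPs and branch-specific substitution counts
snps = [("N", 0.90), ("N", 0.10), ("S", 0.70), ("S", 0.65), ("N", 0.60), ("S", 0.20)]
pn, ps = count_polymorphisms(snps)
print(pn, ps)                                # 2 2
print(mk_alpha(dn=40, ds=30, pn=pn, ps=ps))  # 0.25
```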


Subject(s)
Evolution, Molecular , Animals , Gene Frequency , Genetic Linkage , Genome, Human , Humans , Mammals/genetics , Polymorphism, Single Nucleotide , Selection, Genetic/genetics
16.
Mol Biol Evol ; 30(8): 1830-42, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23625888

ABSTRACT

Gene duplication is widely regarded as a major mechanism modeling genome evolution and function. However, the mechanisms that drive the evolution of the two, initially redundant, gene copies are still ill defined. Many gene duplicates experience evolutionary rate acceleration, but the relative contribution of positive selection and random drift to the retention and subsequent evolution of gene duplicates, and for how long the molecular clock may be distorted by these processes, remains unclear. Focusing on rodent genes that duplicated before and after the mouse and rat split, we find significantly increased sequence divergence after duplication in only one of the copies, which in nearly all cases corresponds to the novel daughter copy, independent of the mechanism of duplication. We observe that the evolutionary rate of the accelerated copy, measured as the ratio of nonsynonymous to synonymous substitutions, is on average 5-fold higher in the period spanning 4-12 My after the duplication than it was before the duplication. This increase can be explained, at least in part, by the action of positive selection according to the results of the maximum likelihood-based branch-site test. Subsequently, the rate decelerates until purifying selection completely returns to preduplication levels. Reversion to the original rates has already been accomplished 40.5 My after the duplication event, corresponding to a genetic distance of about 0.28 synonymous substitutions per site. Differences in tissue gene expression patterns parallel those of substitution rates, reinforcing the role of neofunctionalization in explaining the evolution of young gene duplicates.


Subject(s)
Evolution, Molecular , Gene Duplication , Genes, Duplicate , Animals , Chromosomal Position Effects , INDEL Mutation , Mice , Organ Specificity/genetics , Rats , Selection, Genetic
17.
Genome Biol Evol ; 16(7)2024 07 03.
Article in English | MEDLINE | ID: mdl-38934859

ABSTRACT

During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, it is not known how they are distributed in the population. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be shorter than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in the translation levels. These results suggest that variation in the translation levels of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.
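Whether an ORF counts as translated in all, some, or none of the cell lines depends on a per-sample detection rule. The sketch below assumes a hypothetical threshold of 10 in-frame ribosome profiling reads per cell line, which is not taken from the study, and labels each ORF by the fraction of cell lines that pass it.

```python
def classify_orfs(read_counts, min_reads=10):
    """Label ORFs by how many cell lines show translation evidence.

    read_counts maps an ORF id to a non-empty list of in-frame ribosome
    profiling read counts, one per cell line. An ORF is counted as translated
    in a cell line when its count reaches min_reads (a hypothetical cutoff).
    Returns {orf: (fraction of cell lines translated, label)}.
    """
    result = {}
    for orf, counts in read_counts.items():
        n_translated = sum(1 for c in counts if c >= min_reads)
        if n_translated == 0:
            label = "absent"
        elif n_translated == len(counts):
            label = "universal"
        else:
            label = "polymorphic"
        result[orf] = (n_translated / len(counts), label)
    return result


# Hypothetical counts for four cell lines
counts = {
    "orf_canonical": [50, 42, 61, 38],
    "orf_denovo": [12, 0, 3, 15],
    "orf_silent": [0, 1, 0, 2],
}
for orf, (frac, label) in classify_orfs(counts).items():
    print(orf, frac, label)
```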


Subject(s)
Open Reading Frames , Polymorphism, Genetic , Humans , Protein Biosynthesis , Cell Line , Evolution, Molecular , Ribosomes/genetics , Ribosomes/metabolism
18.
BMC Evol Biol ; 13: 47, 2013 Feb 20.
Article in English | MEDLINE | ID: mdl-23425224

ABSTRACT

BACKGROUND: Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve. RESULTS: To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. CONCLUSIONS: We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently.


Subject(s)
Evolution, Molecular , Protein Structure, Tertiary/genetics , Animals , Genome, Human , Humans , Mammals/genetics , Mice , Sequence Alignment , Sequence Analysis, Protein , Vertebrates/genetics
19.
Mol Biol Evol ; 29(3): 883-6, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22045997

ABSTRACT

Low-complexity sequences are extremely abundant in eukaryotic proteins for reasons that remain unclear. One hypothesis is that they contribute to the formation of novel coding sequences, facilitating the generation of novel protein functions. Here, we test this hypothesis by examining the content of low-complexity sequences in proteins of different age. We show that recently emerged proteins contain more low-complexity sequences than older proteins and that these sequences often form functional domains. These data are consistent with the idea that low-complexity sequences may play a key role in the emergence of novel genes.


Subject(s)
Amino Acid Motifs/genetics , Evolution, Molecular , Models, Genetic , Proteins/genetics , Amino Acid Sequence , Base Composition , Computational Biology , Humans , Phylogeny , Species Specificity
20.
Genome Res ; 20(6): 745-54, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20335526

ABSTRACT

Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences (and thus expected to be subject to the same mutational forces) but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.


Subject(s)
Amino Acids/genetics , Proteins/genetics , Repetitive Sequences, Amino Acid , Selection, Genetic , Amino Acid Sequence , Amino Acids/chemistry , Animals , Humans , Molecular Sequence Data , Proteins/chemistry , Sequence Homology, Amino Acid