1.
Trends Genet ; 39(4): 235-236, 2023 04.
Article in English | MEDLINE | ID: mdl-36774242

ABSTRACT

Genes restricted to a given species or lineage are mysterious. Many emerged de novo from ancestral noncoding genomic regions rather than from pre-existing genes. A new study by Vakirlis and colleagues shows that, in humans, many of these are associated with phenotypic effects, accelerating our understanding of their functional importance.


Subjects
Evolution, Molecular; Hominidae; Animals; Humans; Genome; Genomics; CRISPR-Cas Systems
2.
Genome Res ; 2022 May 26.
Article in English | MEDLINE | ID: mdl-35618415

ABSTRACT

The unicellular yeast Schizosaccharomyces pombe (fission yeast) retains many of the splicing features observed in humans and is thus an excellent model to study the basic mechanisms of splicing. Nearly half the genes contain introns, but the impact of alternative splicing in gene regulation and proteome diversification remains largely unexplored. Here we leverage Oxford Nanopore Technologies native RNA sequencing (dRNA), as well as ribosome profiling data, to uncover the full range of polyadenylated transcripts and translated open reading frames. We identify 332 alternative isoforms affecting the coding sequences of 262 different genes, 97 of which occur at frequencies higher than 20%, indicating that functional alternative splicing in S. pombe is more prevalent than previously suspected. Intron retention events make up about 80% of the cases; these events may be involved in the regulation of gene expression and, in some cases, generate novel protein isoforms, as supported by ribosome profiling data in 18 of the intron retention isoforms. One example is the rpl22 gene, in which intron retention is associated with the translation of a protein of only 13 amino acids. We also find that lowly expressed transcripts tend to have longer poly(A) tails than highly expressed transcripts, highlighting an interdependence between poly(A) tail length and transcript expression level. Finally, we discover 214 novel transcripts that are not annotated, including 158 antisense transcripts, some of which also show translation evidence. The methodologies described in this work open new opportunities to study the regulation of splicing in a simple eukaryotic model.

3.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37139943

ABSTRACT

The formation of new genes during evolution is an important motor of functional innovation, but the rate at which new genes originate and the likelihood that they persist over longer evolutionary periods are still poorly understood questions. Two important mechanisms by which new genes arise are gene duplication and de novo formation from a previously noncoding sequence. Does the mechanism of formation influence the evolutionary trajectories of the genes? Proteins arisen by gene duplication retain the sequence and structural properties of the parental protein, and thus they may be relatively stable. In contrast, de novo originated proteins are often species specific and thought to be more evolutionarily labile. Despite these differences, here we show that both types of genes share a number of similarities, including low sequence constraints in their initial evolutionary phases, high turnover rates at the species level, and comparable persistence rates in deeper branches, in both yeast and flies. In addition, we show that putative de novo proteins have an excess of substitutions between charged amino acids compared with the neutral expectation, which is reflected in the rapid loss of their initial highly basic character. The study supports high evolutionary dynamics of different kinds of new genes at the species level, in sharp contrast with the stability observed at later stages.


Subjects
Evolution, Molecular; Proteins; Proteins/genetics; Gene Duplication; Saccharomyces cerevisiae/genetics; Phylogeny
4.
Proc Natl Acad Sci U S A ; 117(42): 26197-26205, 2020 10 20.
Article in English | MEDLINE | ID: mdl-33033229

ABSTRACT

MicroProteins are small, often single-domain proteins that are sequence-related to larger, often multidomain proteins. Here, we used a combination of comparative genomics and heterologous synthetic misexpression to isolate functional cereal microProtein regulators. Our approach identified LITTLE NINJA (LNJ), a microProtein that acts as a modulator of jasmonic acid (JA) signaling. Ectopic expression of LNJ in Arabidopsis resulted in stunted plants that resembled the decuple JAZ (jazD) mutant. In fact, comparing the transcriptomes of transgenic LNJ overexpressor plants and jazD revealed a large overlap of deregulated genes, suggesting that ectopic LNJ expression altered JA signaling. Transgenic Brachypodium plants with elevated LNJ expression levels showed deregulation of JA signaling as well and displayed reduced growth and enhanced production of side shoots (tillers). This tillering effect was transferable between grass species, and overexpression of LNJ in barley and rice caused similar traits. We used a clustered regularly interspaced short palindromic repeats (CRISPR) approach and created an LNJ-like protein in Arabidopsis by deleting parts of the coding sequence of the AFP2 gene that encodes a NINJA-domain protein. These afp2-crispr mutants were also stunted in size and resembled jazD. Thus, similar genome-engineering approaches can be exploited as a future tool to create LNJ proteins and produce cereals with altered architectures.


Subjects
Arabidopsis/metabolism; Cyclopentanes/pharmacology; Gene Expression Regulation, Plant; Hordeum/metabolism; Oryza/metabolism; Oxylipins/pharmacology; Plant Proteins/classification; Plant Proteins/metabolism; Arabidopsis/drug effects; Arabidopsis/genetics; Gene Expression Profiling; Hordeum/drug effects; Hordeum/genetics; Oryza/drug effects; Oryza/genetics; Plant Growth Regulators/pharmacology; Plant Proteins/genetics; Plants, Genetically Modified; Protein Isoforms; Repressor Proteins/genetics; Repressor Proteins/metabolism; Signal Transduction
5.
Trends Genet ; 35(3): 186-198, 2019 03.
Article in English | MEDLINE | ID: mdl-30606460

ABSTRACT

The translatome can be defined as the sum of the RNA sequences that are translated into proteins in the cell by the ribosomal machinery. Until recently, it was generally assumed that the translatome was essentially restricted to evolutionarily conserved proteins encoded by the set of annotated protein-coding genes. However, it has become increasingly clear that it also includes small regulatory open reading frames (ORFs), functional micropeptides, de novo proteins, and the pervasive translation of likely nonfunctional proteins. Many of these ORFs have been discovered thanks to the development of ribosome profiling, a technique to sequence ribosome-protected RNA fragments. To fully capture the diversity of translated ORFs, we propose a comprehensive classification that includes the new types of translated ORFs in addition to standard proteins.


Subjects
Evolution, Molecular; Open Reading Frames/genetics; Protein Biosynthesis; RNA/genetics; Computational Biology; Conserved Sequence/genetics; Gene Expression Regulation/genetics; Ribosomes/genetics
6.
Br J Cancer ; 127(2): 313-320, 2022 07.
Article in English | MEDLINE | ID: mdl-35449454

ABSTRACT

BACKGROUND: Molecular subtyping of bladder cancer has revealed luminal tumors generally have a more favourable prognosis. However, some aggressive forms of variant histology, including micropapillary, are often classified luminal. In previous work, we found long non-coding RNA (lncRNA) expression profiles could identify a subgroup of luminal bladder tumors with less aggressive biology and better outcomes. OBJECTIVE: In the present study, we aimed to investigate whether lncRNA expression profiles could identify high-grade T1 micropapillary bladder cancer with differential outcome. DESIGN, SETTING, AND PARTICIPANTS: LncRNAs were quantified from RNA-seq data from a HGT1 bladder cancer cohort that was enriched for primary micropapillary cases (15/84). Unsupervised consensus clustering of variant lncRNAs identified a three-cluster solution, which was further characterised using a panel of micropapillary-associated biomarkers, molecular subtypes, gene signatures, and survival analysis. A single-sample genomic signature was trained using lasso-penalized logistic regression to classify micropapillary-like gene-expression, as characterised by lncRNA clustering. The genomic classifier (GC) was tested on luminal tumors derived from the TCGA cohort (N = 202). OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Patient and tumor characteristics were compared between subgroups by using χ² tests and two-sided Wilcoxon rank-sum tests. Primary endpoints were overall, progression-free and high-grade recurrence-free survival, calculated from the date of high-grade T1 disease at TURBT until the date of death from any cause, progression, or recurrence, respectively. Survival rates were estimated using weighted Kaplan-Meier (KM) curves. RESULTS AND LIMITATIONS: Primary micropapillary HGT1 showed decreased FGFR3, SHH, and p53 pathway activity relative to tumors with conventional urothelial carcinoma. Many bladder cancer-associated lncRNAs were downregulated in micropapillary tumors, including UCA1, LINC00152, and MALAT1. Unsupervised consensus clustering resulted in a lncRNA cluster 1 (LC1) with worse prognosis that was enriched for primary micropapillary histology and the Luminal Unstable (LumU) molecular subtype. Interestingly, LC1 appeared to better identify aggressive HGT1 disease, compared to stratifying outcomes using primary histologic characteristics. A signature trained to identify LC1 cases showed good performance in the testing cohort, identifying seven cases with significantly worse survival (p < 0.001). Limitations include the retrospective nature of the study and the lack of a validation cohort. CONCLUSIONS: Using the lncRNA transcriptome we identified a subgroup of aggressive HGT1 bladder cancer that was enriched with micropapillary histology. These data suggest that lncRNAs can facilitate the identification of aggressive micropapillary-like tumors, potentially improving patient management.


Subjects
Carcinoma, Transitional Cell; RNA, Long Noncoding; Urinary Bladder Neoplasms; Biomarkers, Tumor/analysis; Biomarkers, Tumor/genetics; Carcinoma, Transitional Cell/genetics; Gene Expression Profiling/methods; Humans; Prognosis; RNA, Long Noncoding/genetics; Retrospective Studies; Urinary Bladder Neoplasms/pathology
7.
Exp Cell Res ; 391(1): 111940, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32156600

ABSTRACT

High throughput RNA sequencing techniques have revealed that a large fraction of the genome is transcribed into long non-coding RNAs (lncRNAs). Unlike canonical protein-coding genes, lncRNAs do not contain long open reading frames (ORFs) and tend to be poorly conserved across species. However, many of them contain small ORFs (sORFs) that exhibit translation signatures according to ribosome profiling or proteomics data. These sORFs are a source of putative novel proteins; some of them may confer a selective advantage and be maintained over time, a process known as de novo gene birth. Here we review the mechanisms by which randomly occurring sORFs in lncRNAs can become new functional proteins.


Subjects
Evolution, Molecular; Genome; Open Reading Frames; Protein Biosynthesis; RNA, Long Noncoding/genetics; Ribosomes/genetics; Animals; Brain/metabolism; Humans; Liver/metabolism; Male; Molecular Sequence Annotation; Myocardium/metabolism; Organ Specificity; RNA, Long Noncoding/classification; RNA, Long Noncoding/metabolism; Ribosomes/classification; Ribosomes/metabolism; Testis/metabolism; Transcription, Genetic
8.
Mol Biol Evol ; 34(4): 843-856, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28087778

ABSTRACT

Phylostratigraphy is a computational framework for dating the emergence of DNA and protein sequences in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and its impact on phylostratigraphic inferences. Here, we re-assessed these simulations. We found that, even with a possible overall BLAST false negative rate between 11-15%, the large majority of sequences assigned to a recent evolutionary origin by phylostratigraphy is unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on previously reported findings, we repeated the original analyses but now excluded all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support phylostratigraphic inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis that does not appear to introduce significant biases into evolutionary pattern inferences.


Subjects
Computational Biology/methods; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Animals; Bias; Biological Evolution; Computer Simulation; Drosophila; Evolution, Molecular; Genome; Models, Genetic; Phylogeny; Time Factors
9.
Mol Ecol ; 27(3): 709-722, 2018 02.
Article in English | MEDLINE | ID: mdl-29319912

ABSTRACT

Hibernation is an adaptive strategy some mammals use to survive highly seasonal or unpredictable environments. We present the first investigation on the transcriptomics of hibernation in a natural population of primate hibernators: Crossley's dwarf lemurs (Cheirogaleus crossleyi). Using capture-mark-recapture techniques to track the same animals over a period of 7 months in Madagascar, we used RNA-seq to compare gene expression profiles in white adipose tissue (WAT) during three distinct physiological states. We focus on pathway analysis to assess the biological significance of transcriptional changes in dwarf lemur WAT and, by comparing and contrasting what is known in other model hibernating species, contribute to a broader understanding of genomic contributions of hibernation across Mammalia. The hibernation signature is characterized by a suppression of lipid biosynthesis, pyruvate metabolism and mitochondrial-associated functions, and an accumulation of transcripts encoding ribosomal components and iron-storage proteins. The data support a key role of pyruvate dehydrogenase kinase isoenzyme 4 (PDK4) in regulating the shift in fuel economy during periods of severe food deprivation. This pattern of PDK4 holds true across representative hibernating species from disparate mammalian groups, suggesting that the genetic underpinnings of hibernation may be ancestral to mammals.


Subjects
Animals, Wild/genetics; Animals, Wild/physiology; Cheirogaleidae/genetics; Cheirogaleidae/physiology; Hibernation/genetics; Transcriptome/genetics; Animals; Body Temperature; Carbohydrate Metabolism/genetics; Gene Expression Profiling; Iron/metabolism; Lipid Metabolism/genetics; Mitochondria/metabolism; Protein Biosynthesis/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism
10.
PLoS Genet ; 11(12): e1005721, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26720152

ABSTRACT

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.


Subjects
Evolution, Molecular; Genes; Genome, Human; Pan troglodytes/genetics; Ribonucleoprotein, U1 Small Nuclear/genetics; Animals; Base Sequence; Female; Gene Expression; Humans; Macaca/genetics; Male; Mice; Promoter Regions, Genetic; Regulatory Sequences, Nucleic Acid; Testis/physiology; Transcription Initiation Site
11.
Mol Biol Evol ; 32(9): 2263-72, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25931513

ABSTRACT

The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice the proportion of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.


Subjects
LIM-Homeodomain Proteins/genetics; Transcription Factors/genetics; Trinucleotide Repeat Expansion; Animals; Evolution, Molecular; Gene Duplication; Humans; Phylogeny; Transcriptional Activation
12.
BMC Evol Biol ; 15: 218, 2015 Oct 05.
Article in English | MEDLINE | ID: mdl-26438045

ABSTRACT

BACKGROUND: The high density of tandem repeat sequences (satellites) in nematode genomes and the availability of genome sequences from several species in the group offer a unique opportunity to better understand the evolutionary dynamics and the functional role of these sequences. We take advantage of the previously developed SATFIND program to study the satellites in four Caenorhabditis species and investigate these questions. METHODS: The identification and comparison of satellites is carried out in three steps. First we find all the satellites present in each species with the SATFIND program. Each satellite is defined by its length, number of repeats, and repeat sequence. Only satellites with at least ten repeats are considered. In the second step we build satellite families with a newly developed alignment program. Satellite families are defined by a consensus sequence and the number of satellites in the family. Finally we compare the consensus sequence of satellite families in different species. RESULTS: We give a catalog of individual satellites in each species. We have also identified satellite families with a related sequence and compare them in different species. We analyze the turnover of satellites: they increased in size through duplications of fragments of 100-300 bases. It appears that in many cases they have undergone an explosive expansion. In C. elegans we have identified a subset of large satellites that have strong affinity for the centromere protein CENP-A. We have also compared our results with those obtained from other species, including one nematode and three mammals. CONCLUSIONS: Most satellite families found in Caenorhabditis are species-specific; in particular those with long repeats. A subset of these satellites may facilitate the formation of kinetochores in mitosis. Other satellite families in C. elegans are either related to Helitron transposons or to meiotic pairing centers.


Subjects
Caenorhabditis/classification; Caenorhabditis/genetics; DNA, Helminth/genetics; Animals; Autoantigens/genetics; Biological Evolution; Caenorhabditis elegans/genetics; Centromere; Centromere Protein A; Chromosomal Proteins, Non-Histone/genetics; DNA, Satellite/genetics; Repetitive Sequences, Nucleic Acid; Species Specificity
13.
Genome Res ; 22(3): 478-85, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22128134

ABSTRACT

Insertions and deletions (indels), together with nucleotide substitutions, are major drivers of sequence evolution. An excess of deletions over insertions in genomic sequences (the so-called deletional bias) has been reported in a wide range of species, including mammals. However, this bias has not been found in the coding sequences of some mammalian species, such as human and mouse. To determine the strength of the deletional bias in mammals, and the influence of mutation and selection, we have quantified indels in both neutrally evolving noncoding sequences and protein-coding sequences, in six mammalian branches: human, macaque, ancestral primate, mouse, rat, and ancestral rodent. The results obtained with an improved algorithm for the placement of insertions in multiple alignments, Prank(+F), indicate that contrary to previous results, the only mammalian branch with a strong deletional bias is the rodent ancestral branch. We estimate that such a bias has resulted in an ~2.5% sequence loss of mammalian syntenic region in the ancestor of the mouse and rat. Further, a comparison of coding and noncoding sequences shows that negative selection is acting more strongly against mutations generating amino acid insertions than against mutations resulting in amino acid deletions. The strength of selection against indels is found to be higher in the rodent branches than in the primate branches, consistent with the larger effective population sizes of the rodents.


Subjects
Mammals/genetics; Sequence Deletion; Amino Acid Sequence; Animals; Cattle; Evolution, Molecular; Humans; Macaca mulatta; Mice; Molecular Sequence Data; Mutagenesis, Insertional; Open Reading Frames; RNA, Untranslated; Rats; Rodentia/genetics; Sequence Alignment; Tandem Repeat Sequences
14.
Nucleic Acids Res ; 41(17): 8107-25, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23832230

ABSTRACT

Interferons (IFN) play a pivotal role in innate immunity, orchestrating a cell-intrinsic anti-pathogenic state and stimulating adaptive immune responses. The complex interplay between the primary response to IFNs and its modulation by positive and negative feedback loops is incompletely understood. Here, we implement the combination of high-resolution gene-expression profiling of nascent RNA with translational inhibition of secondary feedback by cycloheximide. Unexpectedly, this approach revealed a prominent role of negative feedback mechanisms during the immediate (≤60 min) IFNα response. In contrast, a more complex picture involving both negative and positive feedback loops was observed on IFNγ treatment. IFNγ-induced repression of genes associated with regulation of gene expression, cellular development, apoptosis and cell growth resulted from cycloheximide-resistant primary IFNγ signalling. In silico promoter analysis revealed significant overrepresentation of SP1/SP3-binding sites and/or GC-rich stretches. Although signal transducer and activator of transcription 1 (STAT1)-binding sites were not overrepresented, repression was lost in absence of STAT1. Interestingly, basal expression of the majority of these IFNγ-repressed genes was dependent on STAT1 in IFN-naïve fibroblasts. Finally, IFNγ-mediated repression was also found to be evident in primary murine macrophages. IFN-repressed genes include negative regulators of innate and stress response, and their decrease may thus aid the establishment of a signalling perceptive milieu.


Subjects
Gene Expression Regulation; Interferon-alpha/pharmacology; Interferon-gamma/pharmacology; Promoter Regions, Genetic; Transcription, Genetic; Animals; Cells, Cultured; Computer Simulation; Cycloheximide/pharmacology; Feedback, Physiological; Gene Expression Profiling; Gene Expression Regulation/drug effects; Macrophages/drug effects; Macrophages/metabolism; Mice; NIH 3T3 Cells; Protein Synthesis Inhibitors/pharmacology; Response Elements; STAT1 Transcription Factor/physiology; Thiouridine; Transcription, Genetic/drug effects
15.
BMC Genomics ; 15: 599, 2014 Jul 16.
Article in English | MEDLINE | ID: mdl-25030307

ABSTRACT

BACKGROUND: The recent increase in human polymorphism data, together with the availability of genome sequences from several primate species, provides an unprecedented opportunity to investigate how natural selection has shaped human evolution. RESULTS: We compared human branch-specific substitutions with variation data in the current human population to measure the impact of adaptive evolution on human protein coding genes. The use of single nucleotide polymorphisms (SNPs) with high derived allele frequencies (DAFs) minimized the influence of segregating slightly deleterious mutations and improved the estimation of the number of adaptive sites. Using DAF ≥ 60% we showed that the proportion of adaptive substitutions is 0.2% in the complete gene set. However, the percentage rose to 40% when we focused on genes that are specifically accelerated in the human branch with respect to the chimpanzee branch, or on genes that show signatures of adaptive selection at the codon level by the maximum likelihood based branch-site test. In general, neural genes are enriched in positive selection signatures. Genes with multiple lines of evidence of positive selection include taxilin beta, which is involved in motor nerve regeneration, and syntabulin, which is required for the formation of new presynaptic boutons. CONCLUSIONS: We combined several methods to detect adaptive evolution in human coding sequences at a genome-wide level. The use of variation data, in addition to sequence divergence information, uncovered previously undetected positive selection signatures in neural genes.


Subjects
Evolution, Molecular; Animals; Gene Frequency; Genetic Linkage; Genome, Human; Humans; Mammals/genetics; Polymorphism, Single Nucleotide; Selection, Genetic/genetics
16.
Mol Biol Evol ; 30(8): 1830-42, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23625888

ABSTRACT

Gene duplication is widely regarded as a major mechanism modeling genome evolution and function. However, the mechanisms that drive the evolution of the two, initially redundant, gene copies are still ill defined. Many gene duplicates experience evolutionary rate acceleration, but the relative contribution of positive selection and random drift to the retention and subsequent evolution of gene duplicates, and for how long the molecular clock may be distorted by these processes, remains unclear. Focusing on rodent genes that duplicated before and after the mouse and rat split, we find significantly increased sequence divergence after duplication in only one of the copies, which in nearly all cases corresponds to the novel daughter copy, independent of the mechanism of duplication. We observe that the evolutionary rate of the accelerated copy, measured as the ratio of nonsynonymous to synonymous substitutions, is on average 5-fold higher in the period spanning 4-12 My after the duplication than it was before the duplication. This increase can be explained, at least in part, by the action of positive selection according to the results of the maximum likelihood-based branch-site test. Subsequently, the rate decelerates until purifying selection completely returns to preduplication levels. Reversion to the original rates has already been accomplished 40.5 My after the duplication event, corresponding to a genetic distance of about 0.28 synonymous substitutions per site. Differences in tissue gene expression patterns parallel those of substitution rates, reinforcing the role of neofunctionalization in explaining the evolution of young gene duplicates.


Subjects
Evolution, Molecular; Gene Duplication; Genes, Duplicate; Animals; Chromosomal Position Effects; INDEL Mutation; Mice; Organ Specificity/genetics; Rats; Selection, Genetic
17.
Genome Biol Evol ; 16(7)2024 07 03.
Article in English | MEDLINE | ID: mdl-38934859

ABSTRACT

During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, it is not known how they are distributed in the population. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in the translation levels. These results suggest that variations in the level of translation of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.


Subjects
Open Reading Frames; Polymorphism, Genetic; Humans; Protein Biosynthesis; Cell Line; Evolution, Molecular; Ribosomes/genetics; Ribosomes/metabolism
18.
Sci Adv ; 10(28): eadn3628, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38985879

ABSTRACT

The expression of tumor-specific antigens during cancer progression can trigger an immune response against the tumor. Here, we investigate if microproteins encoded by noncanonical open reading frames (ncORFs) are a relevant source of tumor-specific antigens. We analyze RNA sequencing data from 117 hepatocellular carcinoma (HCC) tumors and matched healthy tissue together with ribosome profiling and immunopeptidomics data. Combining human leukocyte antigen-epitope binding predictions and experimental validation experiments, we conclude that around 40% of the tumor-specific antigens in HCC are likely to be derived from ncORFs, including two peptides that can trigger an immune response in humanized mice. We identify a subset of 33 tumor-specific long noncoding RNAs expressing novel cancer antigens shared by more than 10% of the HCC samples analyzed, which, when combined, cover a large proportion of the patients. The results of the study open avenues for extending the range of anticancer vaccines.


Subjects
Neoplasm Antigens , Hepatocellular Carcinoma , Liver Neoplasms , Open Reading Frames , Humans , Liver Neoplasms/genetics , Liver Neoplasms/immunology , Neoplasm Antigens/genetics , Neoplasm Antigens/immunology , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/immunology , Animals , Mice , Cohort Studies , Long Noncoding RNA/genetics , Neoplastic Gene Expression Regulation , Micropeptides
19.
BMC Evol Biol ; 13: 47, 2013 Feb 20.
Article in English | MEDLINE | ID: mdl-23425224

ABSTRACT

BACKGROUND: Proteins are composed of a combination of discrete, well-defined sequence domains associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. However, little is currently known about how novel domains arise and how they subsequently evolve. RESULTS: To gain insights into the impact of recently emerged domains in protein evolution, we have identified all young human protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains in comparisons of human and mouse orthologous sequences. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. CONCLUSIONS: We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently.


Subjects
Molecular Evolution , Protein Tertiary Structure/genetics , Animals , Human Genome , Humans , Mammals/genetics , Mice , Sequence Alignment , Protein Sequence Analysis , Vertebrates/genetics
20.
Mol Biol Evol ; 29(3): 883-6, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22045997

ABSTRACT

Low-complexity sequences are extremely abundant in eukaryotic proteins for reasons that remain unclear. One hypothesis is that they contribute to the formation of novel coding sequences, facilitating the generation of novel protein functions. Here, we test this hypothesis by examining the content of low-complexity sequences in proteins of different age. We show that recently emerged proteins contain more low-complexity sequences than older proteins and that these sequences often form functional domains. These data are consistent with the idea that low-complexity sequences may play a key role in the emergence of novel genes.


Subjects
Amino Acid Motifs/genetics , Molecular Evolution , Genetic Models , Proteins/genetics , Amino Acid Sequence , Base Composition , Computational Biology , Humans , Phylogeny , Species Specificity