1.
Trends Genet ; 39(4): 235-236, 2023 04.
Article in English | MEDLINE | ID: mdl-36774242

ABSTRACT

Genes restricted to a given species or lineage are mysterious. Many emerged de novo from ancestral noncoding genomic regions rather than from pre-existing genes. A new study by Vakirlis and colleagues shows that, in humans, many of these are associated with phenotypic effects, accelerating our understanding of their functional importance.


Subject(s)
Molecular Evolution , Hominidae , Animals , Humans , Genome , Genomics , CRISPR-Cas Systems
2.
Genome Res ; 2022 May 26.
Article in English | MEDLINE | ID: mdl-35618415

ABSTRACT

The unicellular yeast Schizosaccharomyces pombe (fission yeast) retains many of the splicing features observed in humans and is thus an excellent model to study the basic mechanisms of splicing. Nearly half the genes contain introns, but the impact of alternative splicing in gene regulation and proteome diversification remains largely unexplored. Here we leverage Oxford Nanopore Technologies native RNA sequencing (dRNA), as well as ribosome profiling data, to uncover the full range of polyadenylated transcripts and translated open reading frames. We identify 332 alternative isoforms affecting the coding sequences of 262 different genes, 97 of which occur at frequencies higher than 20%, indicating that functional alternative splicing in S. pombe is more prevalent than previously suspected. Intron retention events make up about 80% of the cases; these events may be involved in the regulation of gene expression and, in some cases, generate novel protein isoforms, as supported by ribosome profiling data in 18 of the intron retention isoforms. One example is the rpl22 gene, in which intron retention is associated with the translation of a protein of only 13 amino acids. We also find that lowly expressed transcripts tend to have longer poly(A) tails than highly expressed transcripts, highlighting an interdependence between poly(A) tail length and transcript expression level. Finally, we discover 214 novel transcripts that are not annotated, including 158 antisense transcripts, some of which also show translation evidence. The methodologies described in this work open new opportunities to study the regulation of splicing in a simple eukaryotic model.

3.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37139943

ABSTRACT

The formation of new genes during evolution is an important motor of functional innovation, but the rate at which new genes originate and the likelihood that they persist over longer evolutionary periods are still poorly understood questions. Two important mechanisms by which new genes arise are gene duplication and de novo formation from a previously noncoding sequence. Does the mechanism of formation influence the evolutionary trajectories of the genes? Proteins that arise by gene duplication retain the sequence and structural properties of the parental protein, and thus they may be relatively stable. In contrast, proteins that originate de novo are often species-specific and are thought to be more evolutionarily labile. Despite these differences, here we show that both types of genes share a number of similarities, including low sequence constraints in their initial evolutionary phases, high turnover rates at the species level, and comparable persistence rates in deeper branches, in both yeast and flies. In addition, we show that putative de novo proteins have an excess of substitutions between charged amino acids compared with the neutral expectation, which is reflected in the rapid loss of their initial highly basic character. The study supports high evolutionary dynamics of different kinds of new genes at the species level, in sharp contrast with the stability observed at later stages.


Subject(s)
Molecular Evolution , Proteins , Proteins/genetics , Gene Duplication , Saccharomyces cerevisiae/genetics , Phylogeny
4.
Proc Natl Acad Sci U S A ; 117(42): 26197-26205, 2020 10 20.
Article in English | MEDLINE | ID: mdl-33033229

ABSTRACT

MicroProteins are small, often single-domain proteins that are sequence-related to larger, often multidomain proteins. Here, we used a combination of comparative genomics and heterologous synthetic misexpression to isolate functional cereal microProtein regulators. Our approach identified LITTLE NINJA (LNJ), a microProtein that acts as a modulator of jasmonic acid (JA) signaling. Ectopic expression of LNJ in Arabidopsis resulted in stunted plants that resembled the decuple JAZ (jazD) mutant. In fact, comparing the transcriptomes of transgenic LNJ overexpressor plants and jazD revealed a large overlap of deregulated genes, suggesting that ectopic LNJ expression altered JA signaling. Transgenic Brachypodium plants with elevated LNJ expression levels showed deregulation of JA signaling as well and displayed reduced growth and enhanced production of side shoots (tillers). This tillering effect was transferable between grass species, and overexpression of LNJ in barley and rice caused similar traits. We used a clustered regularly interspaced short palindromic repeats (CRISPR) approach and created an LNJ-like protein in Arabidopsis by deleting parts of the coding sequence of the AFP2 gene that encodes a NINJA-domain protein. These afp2-crispr mutants were also stunted in size and resembled jazD. Thus, similar genome-engineering approaches can be exploited as a future tool to create LNJ proteins and produce cereals with altered architectures.


Subject(s)
Arabidopsis/metabolism , Cyclopentanes/pharmacology , Plant Gene Expression Regulation , Hordeum/metabolism , Oryza/metabolism , Oxylipins/pharmacology , Plant Proteins/classification , Plant Proteins/metabolism , Arabidopsis/drug effects , Arabidopsis/genetics , Gene Expression Profiling , Hordeum/drug effects , Hordeum/genetics , Oryza/drug effects , Oryza/genetics , Plant Growth Regulators/pharmacology , Plant Proteins/genetics , Genetically Modified Plants , Protein Isoforms , Repressor Proteins/genetics , Repressor Proteins/metabolism , Signal Transduction
5.
Trends Genet ; 35(3): 186-198, 2019 03.
Article in English | MEDLINE | ID: mdl-30606460

ABSTRACT

The translatome can be defined as the sum of the RNA sequences that are translated into proteins in the cell by the ribosomal machinery. Until recently, it was generally assumed that the translatome was essentially restricted to evolutionarily conserved proteins encoded by the set of annotated protein-coding genes. However, it has become increasingly clear that it also includes small regulatory open reading frames (ORFs), functional micropeptides, de novo proteins, and the pervasive translation of likely nonfunctional proteins. Many of these ORFs have been discovered thanks to the development of ribosome profiling, a technique to sequence ribosome-protected RNA fragments. To fully capture the diversity of translated ORFs, we propose a comprehensive classification that includes the new types of translated ORFs in addition to standard proteins.


Subject(s)
Molecular Evolution , Open Reading Frames/genetics , Protein Biosynthesis , RNA/genetics , Computational Biology , Conserved Sequence/genetics , Gene Expression Regulation/genetics , Ribosomes/genetics
6.
Br J Cancer ; 127(2): 313-320, 2022 07.
Article in English | MEDLINE | ID: mdl-35449454

ABSTRACT

BACKGROUND: Molecular subtyping of bladder cancer has revealed luminal tumors generally have a more favourable prognosis. However, some aggressive forms of variant histology, including micropapillary, are often classified luminal. In previous work, we found long non-coding RNA (lncRNA) expression profiles could identify a subgroup of luminal bladder tumors with less aggressive biology and better outcomes. OBJECTIVE: In the present study, we aimed to investigate whether lncRNA expression profiles could identify high-grade T1 micropapillary bladder cancer with differential outcome. DESIGN, SETTING, AND PARTICIPANTS: LncRNAs were quantified from RNA-seq data from a HGT1 bladder cancer cohort that was enriched for primary micropapillary cases (15/84). Unsupervised consensus clustering of variant lncRNAs identified a three-cluster solution, which was further characterised using a panel of micropapillary-associated biomarkers, molecular subtypes, gene signatures, and survival analysis. A single-sample genomic signature was trained using lasso-penalized logistic regression to classify micropapillary-like gene-expression, as characterised by lncRNA clustering. The genomic classifier (GC) was tested on luminal tumors derived from the TCGA cohort (N = 202). OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Patient and tumor characteristics were compared between subgroups by using χ² tests and two-sided Wilcoxon rank-sum tests. Primary endpoints were overall, progression-free and high-grade recurrence-free survival, calculated from the date of high-grade T1 disease at TURBT until the date of death from any cause, progression, or recurrence, respectively. Survival rates were estimated using weighted Kaplan-Meier (KM) curves. RESULTS AND LIMITATIONS: Primary micropapillary HGT1 showed decreased FGFR3, SHH, and p53 pathway activity relative to tumors with conventional urothelial carcinoma. Many bladder cancer-associated lncRNAs were downregulated in micropapillary tumors, including UCA1, LINC00152, and MALAT1. Unsupervised consensus clustering resulted in a lncRNA cluster 1 (LC1) with worse prognosis that was enriched for primary micropapillary histology and the Luminal Unstable (LumU) molecular subtype. Interestingly, LC1 appeared to better identify aggressive HGT1 disease, compared to stratifying outcomes using primary histologic characteristics. A signature trained to identify LC1 cases showed good performance in the testing cohort, identifying seven cases with significantly worse survival (p < 0.001). Limitations include the retrospective nature of the study and the lack of a validation cohort. CONCLUSIONS: Using the lncRNA transcriptome we identified a subgroup of aggressive HGT1 bladder cancer that was enriched with micropapillary histology. These data suggest that lncRNAs can facilitate the identification of aggressive micropapillary-like tumors, potentially improving patient management.


Subject(s)
Transitional Cell Carcinoma , Long Noncoding RNA , Urinary Bladder Neoplasms , Tumor Biomarkers/analysis , Tumor Biomarkers/genetics , Transitional Cell Carcinoma/genetics , Gene Expression Profiling/methods , Humans , Prognosis , Long Noncoding RNA/genetics , Retrospective Studies , Urinary Bladder Neoplasms/pathology
7.
Exp Cell Res ; 391(1): 111940, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32156600

ABSTRACT

High throughput RNA sequencing techniques have revealed that a large fraction of the genome is transcribed into long non-coding RNAs (lncRNAs). Unlike canonical protein-coding genes, lncRNAs do not contain long open reading frames (ORFs) and tend to be poorly conserved across species. However, many of them contain small ORFs (sORFs) that exhibit translation signatures according to ribosome profiling or proteomics data. These sORFs are a source of putative novel proteins; some of them may confer a selective advantage and be maintained over time, a process known as de novo gene birth. Here we review the mechanisms by which randomly occurring sORFs in lncRNAs can become new functional proteins.


Subject(s)
Molecular Evolution , Genome , Open Reading Frames , Protein Biosynthesis , Long Noncoding RNA/genetics , Ribosomes/genetics , Animals , Brain/metabolism , Humans , Liver/metabolism , Male , Molecular Sequence Annotation , Myocardium/metabolism , Organ Specificity , Long Noncoding RNA/classification , Long Noncoding RNA/metabolism , Ribosomes/classification , Ribosomes/metabolism , Testis/metabolism , Genetic Transcription
8.
Mol Biol Evol ; 34(4): 843-856, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28087778

ABSTRACT

Phylostratigraphy is a computational framework for dating the emergence of DNA and protein sequences in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and its impact on phylostratigraphic inferences. Here, we re-assessed these simulations. We found that, even with a possible overall BLAST false negative rate of 11-15%, the large majority of sequences assigned to a recent evolutionary origin by phylostratigraphy are unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on previously reported findings, we repeated the original analyses but now excluded all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support phylostratigraphic inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis that does not appear to introduce significant biases into evolutionary pattern inferences.
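
As an illustration of the dating step described in this abstract, the following is a minimal sketch and not the authors' pipeline. It assumes homology search results are already summarised as the set of strata in which each gene has a hit; the stratum names, the function `assign_phylostratum`, and the toy genes are all hypothetical.

```python
# Hypothetical strata, ordered from oldest (first) to youngest (the focal species).
STRATA = ["cellular organisms", "Eukaryota", "Opisthokonta", "focal species"]


def assign_phylostratum(hit_strata, strata=STRATA):
    """Return the oldest stratum in which a homologue was detected.

    hit_strata: set of stratum names where a homology search found a hit.
    A gene with no hit in any older stratum is assigned to the youngest
    stratum, i.e. it is treated as specific to the focal species.
    """
    for stratum in strata:  # iterate oldest first
        if stratum in hit_strata:
            return stratum
    return strata[-1]


# Toy input (illustrative only, not data from the study).
genes = {
    "geneA": {"Eukaryota", "Opisthokonta"},
    "geneB": set(),
    "geneC": {"cellular organisms"},
}
ages = {gene: assign_phylostratum(hits) for gene, hits in genes.items()}
```

Under this scheme a missed remote homologue (a false negative) can only make a gene look younger than it is, which is why the sensitivity of the search tool matters for the inference.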


Subject(s)
Computational Biology/methods , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , Animals , Bias , Biological Evolution , Computer Simulation , Drosophila , Molecular Evolution , Genome , Genetic Models , Phylogeny , Time Factors
9.
Mol Ecol ; 27(3): 709-722, 2018 02.
Article in English | MEDLINE | ID: mdl-29319912

ABSTRACT

Hibernation is an adaptive strategy some mammals use to survive highly seasonal or unpredictable environments. We present the first investigation on the transcriptomics of hibernation in a natural population of primate hibernators: Crossley's dwarf lemurs (Cheirogaleus crossleyi). Using capture-mark-recapture techniques to track the same animals over a period of 7 months in Madagascar, we used RNA-seq to compare gene expression profiles in white adipose tissue (WAT) during three distinct physiological states. We focus on pathway analysis to assess the biological significance of transcriptional changes in dwarf lemur WAT and, by comparing and contrasting what is known in other model hibernating species, contribute to a broader understanding of genomic contributions of hibernation across Mammalia. The hibernation signature is characterized by a suppression of lipid biosynthesis, pyruvate metabolism and mitochondrial-associated functions, and an accumulation of transcripts encoding ribosomal components and iron-storage proteins. The data support a key role of pyruvate dehydrogenase kinase isoenzyme 4 (PDK4) in regulating the shift in fuel economy during periods of severe food deprivation. This pattern of PDK4 holds true across representative hibernating species from disparate mammalian groups, suggesting that the genetic underpinnings of hibernation may be ancestral to mammals.


Subject(s)
Wild Animals/genetics , Wild Animals/physiology , Cheirogaleidae/genetics , Cheirogaleidae/physiology , Hibernation/genetics , Transcriptome/genetics , Animals , Body Temperature , Carbohydrate Metabolism/genetics , Gene Expression Profiling , Iron/metabolism , Lipid Metabolism/genetics , Mitochondria/metabolism , Protein Biosynthesis/genetics , Messenger RNA/genetics , Messenger RNA/metabolism
10.
PLoS Genet ; 11(12): e1005721, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26720152

ABSTRACT

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence are still unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.


Subject(s)
Molecular Evolution , Genes , Human Genome , Pan troglodytes/genetics , U1 Small Nuclear Ribonucleoprotein/genetics , Animals , Base Sequence , Female , Gene Expression , Humans , Macaca/genetics , Male , Mice , Genetic Promoter Regions , Nucleic Acid Regulatory Sequences , Testis/physiology , Transcription Initiation Site
11.
Mol Biol Evol ; 32(9): 2263-72, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25931513

ABSTRACT

The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice as many as the number of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
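
The proportions quoted in this abstract can be checked with a few lines of arithmetic. This is only a worked example using the four counts given in the text; the helper `percent` and the variable names are ours.

```python
def percent(part, total):
    """Return part/total expressed as a percentage."""
    return 100.0 * part / total


# Counts taken from the abstract.
duplicates_with_lcr = percent(155, 550)   # duplicated TF genes containing LCRs
singletons_with_lcr = percent(15, 115)    # single-copy TF genes containing LCRs

# Ratio of the two proportions, i.e. how much more often duplicates carry LCRs.
fold_difference = duplicates_with_lcr / singletons_with_lcr
```

The two percentages round to 28% and 13%, and their ratio is a little over 2, consistent with the "about twice" stated in the abstract.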


Subject(s)
LIM-Homeodomain Proteins/genetics , Transcription Factors/genetics , Trinucleotide Repeat Expansion , Animals , Molecular Evolution , Gene Duplication , Humans , Phylogeny , Transcriptional Activation
12.
BMC Evol Biol ; 15: 218, 2015 Oct 05.
Article in English | MEDLINE | ID: mdl-26438045

ABSTRACT

BACKGROUND: The high density of tandem repeat sequences (satellites) in nematode genomes and the availability of genome sequences from several species in the group offer a unique opportunity to better understand the evolutionary dynamics and the functional role of these sequences. We take advantage of the previously developed SATFIND program to study the satellites in four Caenorhabditis species and investigate these questions. METHODS: The identification and comparison of satellites is carried out in three steps. First we find all the satellites present in each species with the SATFIND program. Each satellite is defined by its length, number of repeats, and repeat sequence. Only satellites with at least ten repeats are considered. In the second step we build satellite families with a newly developed alignment program. Satellite families are defined by a consensus sequence and the number of satellites in the family. Finally we compare the consensus sequence of satellite families in different species. RESULTS: We give a catalog of individual satellites in each species. We have also identified satellite families with a related sequence and compared them in different species. We analyze the turnover of satellites: they increased in size through duplications of fragments of 100-300 bases. It appears that in many cases they have undergone an explosive expansion. In C. elegans we have identified a subset of large satellites that have strong affinity for the centromere protein CENP-A. We have also compared our results with those obtained from other species, including one nematode and three mammals. CONCLUSIONS: Most satellite families found in Caenorhabditis are species-specific, in particular those with long repeats. A subset of these satellites may facilitate the formation of kinetochores in mitosis. Other satellite families in C. elegans are either related to Helitron transposons or to meiotic pairing centers.
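
The first step in the methods (finding satellites with at least ten repeats) can be illustrated with a deliberately naive scan. This is not SATFIND: it assumes exact, uninterrupted copies of a repeat unit of known length, and the function name `find_tandem_repeats` and the toy sequence are hypothetical.

```python
def find_tandem_repeats(seq, unit_len, min_repeats=10):
    """Find runs of exact tandem copies of a repeat unit of length unit_len.

    Returns a list of (start, unit, copies) tuples for every run with at
    least min_repeats consecutive copies. Mismatches are not tolerated,
    which is a simplifying assumption of this sketch.
    """
    results = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next window is an identical copy.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_repeats:
            results.append((i, unit, copies))
            i = j  # continue scanning after the run
        else:
            i += 1
    return results


# Toy sequence: 12 copies of "ACGT" flanked by unrelated bases.
toy = "GG" + "ACGT" * 12 + "TT"
satellites = find_tandem_repeats(toy, unit_len=4)
```

Each reported tuple carries the three properties the abstract uses to define a satellite: where it is, its repeat sequence, and its number of repeats (the total length follows from the last two).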


Subject(s)
Caenorhabditis/classification , Caenorhabditis/genetics , Helminth DNA/genetics , Animals , Autoantigens/genetics , Biological Evolution , Caenorhabditis elegans/genetics , Centromere , Centromere Protein A , Non-Histone Chromosomal Proteins/genetics , Satellite DNA/genetics , Nucleic Acid Repetitive Sequences , Species Specificity
13.
Genome Res ; 22(3): 478-85, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22128134

ABSTRACT

Insertions and deletions (indels), together with nucleotide substitutions, are major drivers of sequence evolution. An excess of deletions over insertions in genomic sequences, the so-called deletional bias, has been reported in a wide range of species, including mammals. However, this bias has not been found in the coding sequences of some mammalian species, such as human and mouse. To determine the strength of the deletional bias in mammals, and the influence of mutation and selection, we have quantified indels in both neutrally evolving noncoding sequences and protein-coding sequences, in six mammalian branches: human, macaque, ancestral primate, mouse, rat, and ancestral rodent. The results obtained with an improved algorithm for the placement of insertions in multiple alignments, Prank(+F), indicate that contrary to previous results, the only mammalian branch with a strong deletional bias is the rodent ancestral branch. We estimate that such a bias has resulted in an ~2.5% sequence loss of mammalian syntenic region in the ancestor of the mouse and rat. Further, a comparison of coding and noncoding sequences shows that negative selection is acting more strongly against mutations generating amino acid insertions than against mutations resulting in amino acid deletions. The strength of selection against indels is found to be higher in the rodent branches than in the primate branches, consistent with the larger effective population sizes of the rodents.
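
To make the notion of deletional bias concrete, here is a minimal sketch of how it could be summarised for one branch, assuming the inserted and deleted bases have already been counted from an alignment. The function `indel_summary` and the input numbers are hypothetical and are not taken from the study.

```python
def indel_summary(deleted_bp, inserted_bp, ancestral_bp):
    """Summarise indels on one branch.

    Returns a tuple (ratio, net_change) where ratio is deleted/inserted
    bases (values above 1 indicate a deletional bias) and net_change is
    the fractional change in sequence length relative to the ancestor
    (negative values indicate net sequence loss).
    """
    ratio = deleted_bp / inserted_bp
    net_change = (inserted_bp - deleted_bp) / ancestral_bp
    return ratio, net_change


# Toy counts, for illustration only.
ratio, net_change = indel_summary(deleted_bp=3_000, inserted_bp=1_000,
                                  ancestral_bp=100_000)
```

With these toy counts the ratio is 3 and the net change is a 2% loss; the abstract's ~2.5% figure is an estimate of this second kind of quantity for the rodent ancestral branch.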


Subject(s)
Mammals/genetics , Sequence Deletion , Amino Acid Sequence , Animals , Cattle , Molecular Evolution , Humans , Macaca mulatta , Mice , Molecular Sequence Data , Insertional Mutagenesis , Open Reading Frames , Untranslated RNA , Rats , Rodentia/genetics , Sequence Alignment , Tandem Repeat Sequences
14.
Nucleic Acids Res ; 41(17): 8107-25, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23832230

ABSTRACT

Interferons (IFN) play a pivotal role in innate immunity, orchestrating a cell-intrinsic anti-pathogenic state and stimulating adaptive immune responses. The complex interplay between the primary response to IFNs and its modulation by positive and negative feedback loops is incompletely understood. Here, we implement the combination of high-resolution gene-expression profiling of nascent RNA with translational inhibition of secondary feedback by cycloheximide. Unexpectedly, this approach revealed a prominent role of negative feedback mechanisms during the immediate (≤60 min) IFNα response. In contrast, a more complex picture involving both negative and positive feedback loops was observed on IFNγ treatment. IFNγ-induced repression of genes associated with regulation of gene expression, cellular development, apoptosis and cell growth resulted from cycloheximide-resistant primary IFNγ signalling. In silico promoter analysis revealed significant overrepresentation of SP1/SP3-binding sites and/or GC-rich stretches. Although signal transducer and activator of transcription 1 (STAT1)-binding sites were not overrepresented, repression was lost in the absence of STAT1. Interestingly, basal expression of the majority of these IFNγ-repressed genes was dependent on STAT1 in IFN-naïve fibroblasts. Finally, IFNγ-mediated repression was also found to be evident in primary murine macrophages. IFN-repressed genes include negative regulators of innate and stress response, and their decrease may thus aid the establishment of a signalling perceptive milieu.


Subject(s)
Gene Expression Regulation , Interferon-alpha/pharmacology , Interferon-gamma/pharmacology , Genetic Promoter Regions , Genetic Transcription , Animals , Cultured Cells , Computer Simulation , Cycloheximide/pharmacology , Physiological Feedback , Gene Expression Profiling , Gene Expression Regulation/drug effects , Macrophages/drug effects , Macrophages/metabolism , Mice , NIH 3T3 Cells , Protein Synthesis Inhibitors/pharmacology , Response Elements , STAT1 Transcription Factor/physiology , Thiouridine , Genetic Transcription/drug effects
15.
BMC Genomics ; 15: 599, 2014 Jul 16.
Article in English | MEDLINE | ID: mdl-25030307

ABSTRACT

BACKGROUND: The recent increase in human polymorphism data, together with the availability of genome sequences from several primate species, provides an unprecedented opportunity to investigate how natural selection has shaped human evolution. RESULTS: We compared human branch-specific substitutions with variation data in the current human population to measure the impact of adaptive evolution on human protein coding genes. The use of single nucleotide polymorphisms (SNPs) with high derived allele frequencies (DAFs) minimized the influence of segregating slightly deleterious mutations and improved the estimation of the number of adaptive sites. Using DAF ≥ 60% we showed that the proportion of adaptive substitutions is 0.2% in the complete gene set. However, the percentage rose to 40% when we focused on genes that are specifically accelerated in the human branch with respect to the chimpanzee branch, or on genes that show signatures of adaptive selection at the codon level by the maximum likelihood based branch-site test. In general, neural genes are enriched in positive selection signatures. Genes with multiple lines of evidence of positive selection include taxilin beta, which is involved in motor nerve regeneration, and syntabulin, which is required for the formation of new presynaptic boutons. CONCLUSIONS: We combined several methods to detect adaptive evolution in human coding sequences at a genome-wide level. The use of variation data, in addition to sequence divergence information, uncovered previously undetected positive selection signatures in neural genes.


Subject(s)
Molecular Evolution , Animals , Gene Frequency , Genetic Linkage , Human Genome , Humans , Mammals/genetics , Single Nucleotide Polymorphism , Genetic Selection/genetics
16.
Mol Biol Evol ; 30(8): 1830-42, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23625888

ABSTRACT

Gene duplication is widely regarded as a major mechanism modeling genome evolution and function. However, the mechanisms that drive the evolution of the two, initially redundant, gene copies are still ill defined. Many gene duplicates experience evolutionary rate acceleration, but the relative contribution of positive selection and random drift to the retention and subsequent evolution of gene duplicates, and for how long the molecular clock may be distorted by these processes, remains unclear. Focusing on rodent genes that duplicated before and after the mouse and rat split, we find significantly increased sequence divergence after duplication in only one of the copies, which in nearly all cases corresponds to the novel daughter copy, independent of the mechanism of duplication. We observe that the evolutionary rate of the accelerated copy, measured as the ratio of nonsynonymous to synonymous substitutions, is on average 5-fold higher in the period spanning 4-12 My after the duplication than it was before the duplication. This increase can be explained, at least in part, by the action of positive selection according to the results of the maximum likelihood-based branch-site test. Subsequently, the rate decelerates until purifying selection completely returns to preduplication levels. Reversion to the original rates has already been accomplished 40.5 My after the duplication event, corresponding to a genetic distance of about 0.28 synonymous substitutions per site. Differences in tissue gene expression patterns parallel those of substitution rates, reinforcing the role of neofunctionalization in explaining the evolution of young gene duplicates.


Subject(s)
Molecular Evolution , Gene Duplication , Duplicate Genes , Animals , Chromosomal Position Effects , INDEL Mutation , Mice , Organ Specificity/genetics , Rats , Genetic Selection
17.
Genome Biol Evol ; 16(7)2024 07 03.
Article in English | MEDLINE | ID: mdl-38934859

ABSTRACT

During evolution, new open reading frames (ORFs) with the potential to give rise to novel proteins continuously emerge. A recent compilation of noncanonical ORFs with translation signatures in humans has identified thousands of cases with a putative de novo origin. However, their distribution in the population is not known. Are they universally translated? Here, we use ribosome profiling data from 65 lymphoblastoid cell lines from individuals of Yoruba origin to investigate this question. We identify 2,587 de novo ORFs translated in at least one of the cell lines. In line with their de novo origin, the encoded proteins tend to be smaller than 100 amino acids and positively charged. We observe that the de novo ORFs are more polymorphic in the population than the set of canonical proteins, with a substantial fraction of them being translated in only some of the cell lines. Remarkably, this difference remains significant after controlling for differences in the translation levels. These results suggest that variations in the translation level of de novo ORFs could be a relevant source of intraspecies phenotypic diversity in humans.


Subject(s)
Open Reading Frames , Genetic Polymorphism , Humans , Protein Biosynthesis , Cell Line , Molecular Evolution , Ribosomes/genetics , Ribosomes/metabolism
18.
Sci Adv ; 10(28): eadn3628, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38985879

ABSTRACT

The expression of tumor-specific antigens during cancer progression can trigger an immune response against the tumor. Here, we investigate whether microproteins encoded by noncanonical open reading frames (ncORFs) are a relevant source of tumor-specific antigens. We analyze RNA sequencing data from 117 hepatocellular carcinoma (HCC) tumors and matched healthy tissue together with ribosome profiling and immunopeptidomics data. Combining human leukocyte antigen-epitope binding predictions and experimental validation, we conclude that around 40% of the tumor-specific antigens in HCC are likely to be derived from ncORFs, including two peptides that can trigger an immune response in humanized mice. We identify a subset of 33 tumor-specific long noncoding RNAs expressing novel cancer antigens shared by more than 10% of the HCC samples analyzed, which, when combined, cover a large proportion of the patients. The results of the study open avenues for extending the range of anticancer vaccines.


Subject(s)
Neoplasm Antigens , Hepatocellular Carcinoma , Liver Neoplasms , Open Reading Frames , Humans , Liver Neoplasms/genetics , Liver Neoplasms/immunology , Neoplasm Antigens/genetics , Neoplasm Antigens/immunology , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/immunology , Animals , Mice , Cohort Studies , Long Noncoding RNA/genetics , Neoplastic Gene Expression Regulation , Micropeptides
19.
bioRxiv ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39314370

ABSTRACT

A major scientific drive is to characterize the protein-coding genome, as it provides the primary basis for the study of human health. But the fundamental question remains: what has been missed in prior genomic analyses? Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for proteomics, genomics, and clinical science. However, the impact of ncORFs has been limited by the absence of a large-scale understanding of their contribution to the human proteome. Here, we report the collaborative efforts of stakeholders in proteomics, immunopeptidomics, Ribo-seq ORF discovery, and gene annotation to produce a consensus landscape of protein-level evidence for ncORFs. We show that at least 25% of a set of 7,264 ncORFs give rise to translated gene products, yielding over 3,000 peptides in a pan-proteome analysis encompassing 3.8 billion mass spectra from 95,520 experiments. With these data, we developed an annotation framework for ncORFs and created public tools for researchers through GENCODE and PeptideAtlas. This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, in diverse animals and plants where ncORFs are similarly observed.

20.
BMC Evol Biol ; 13: 47, 2013 Feb 20.
Article in English | MEDLINE | ID: mdl-23425224

ABSTRACT

BACKGROUND: Proteins are composed of a combination of discrete, well-defined sequence domains, associated with specific functions, that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. However, little is currently known about how novel domains arise and how they subsequently evolve. RESULTS: To gain insights into the impact of recently emerged domains in protein evolution, we have identified all young human protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains in comparisons of human and mouse orthologous sequences. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. CONCLUSIONS: We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different ages, and that the fastest evolving parts correspond to the domains that have been acquired more recently.


Subject(s)
Molecular Evolution , Protein Tertiary Structure/genetics , Animals , Human Genome , Humans , Mammals/genetics , Mice , Sequence Alignment , Protein Sequence Analysis , Vertebrates/genetics