1.
Mol Syst Biol ; 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38890548

ABSTRACT

Correlation is not causation: this simple and uncontroversial statement has far-reaching implications. Defining and applying causality in biomedical research has posed significant challenges to the scientific community. In this perspective, we attempt to connect the partly disparate fields of systems biology, causal reasoning, and machine learning to inform future approaches in the field of systems biology and molecular medicine.

2.
PLoS Comput Biol ; 20(2): e1011381, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38386685

ABSTRACT

Metabolic profiling (metabolomics) aims at measuring small molecules (metabolites) in complex samples like blood or urine for human health studies. While biomarker-based assessment often relies on a single molecule, metabolic profiling combines several metabolites to create a more complex and more specific fingerprint of the disease. However, in contrast to genomics, there is no unique metabolomics setup able to measure the entire metabolome. This challenge leads to tedious and resource consuming preliminary studies to be able to design the right metabolomics experiment. In that context, computer assisted metabolic profiling can be of strong added value to design metabolomics studies more quickly and efficiently. We propose a constraint-based modelling approach which predicts in silico profiles of metabolites that are more likely to be differentially abundant under a given metabolic perturbation (e.g. due to a genetic disease), using flux simulation. In genome-scale metabolic networks, the fluxes of exchange reactions, also known as the flow of metabolites through their external transport reactions, can be simulated and compared between control and disease conditions in order to calculate changes in metabolite import and export. These import/export flux differences would be expected to induce changes in circulating biofluid levels of those metabolites, which can then be interpreted as potential biomarkers or metabolites of interest. In this study, we present SAMBA (SAMpling Biomarker Analysis), an approach which simulates fluxes in exchange reactions following a metabolic perturbation using random sampling, compares the simulated flux distributions between the baseline and modulated conditions, and ranks predicted differentially exchanged metabolites as potential biomarkers for the perturbation. 
We show that there is a good fit between simulated metabolic exchange profiles and experimental differential metabolites detected in plasma, such as patient data from the disease database OMIM, and metabolic trait-SNP associations found in mGWAS studies. These biomarker recommendations can provide insight into the underlying mechanism or metabolic pathway perturbation lying behind observed metabolite differential abundances, and suggest new metabolites as potential avenues for further experimental analyses.


Subjects
Metabolome, Metabolomics, Humans, Metabolome/genetics, Genome, Metabolic Networks and Pathways, Biomarkers
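
The comparison step described in this abstract (sampled exchange fluxes in a baseline versus a perturbed condition, followed by a ranking of metabolites) can be sketched roughly as follows. This is a minimal illustration, not the SAMBA implementation: the abstract does not state the scoring function, so the z-score used here is an assumption, and the function name, metabolite names, and toy flux samples are all hypothetical. In a real workflow the samples would come from a flux sampler run on a genome-scale model.

```python
import random
import statistics


def rank_exchange_shifts(baseline, perturbed):
    """Rank metabolites by how far their sampled exchange flux shifts.

    baseline, perturbed: dict mapping metabolite name -> list of sampled
    exchange fluxes (hypothetical inputs for this sketch).
    Returns (metabolite, z_score) pairs sorted by decreasing |z_score|.
    """
    scores = []
    for metabolite in baseline.keys() & perturbed.keys():
        base = baseline[metabolite]
        pert = perturbed[metabolite]
        # Combined spread of the two sampled distributions (assumed score).
        spread = (statistics.pvariance(base) + statistics.pvariance(pert)) ** 0.5
        if spread == 0:
            continue  # no spread to normalise by; skipped in this sketch
        z = (statistics.fmean(pert) - statistics.fmean(base)) / spread
        scores.append((metabolite, z))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy data: export of "met_a" drops after the perturbation, "met_b" is unchanged.
rng = random.Random(0)
baseline = {
    "met_a": [rng.gauss(5.0, 1.0) for _ in range(1000)],
    "met_b": [rng.gauss(2.0, 1.0) for _ in range(1000)],
}
perturbed = {
    "met_a": [rng.gauss(1.0, 1.0) for _ in range(1000)],
    "met_b": [rng.gauss(2.0, 1.0) for _ in range(1000)],
}
ranking = rank_exchange_shifts(baseline, perturbed)
```

In this toy case "met_a" ranks first with a negative score, which would be read as a predicted decrease in export under the perturbation.
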
3.
Int J Mol Sci ; 25(5), 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38474241

ABSTRACT

Tandem repeats (TRs) in protein sequences are consecutive, highly similar sequence motifs. Some types of TRs fold into structural units that pack together in ensembles, forming either an (open) elongated domain or a (closed) propeller, where the last unit of the ensemble packs against the first one. Here, we examine TR proteins (TRPs) to see how their sequence, structure, and evolutionary properties favor them for a function as mediators of protein interactions. Our observations suggest that TRPs bind other proteins using large, structured surfaces like globular domains; in particular, open-structured TR ensembles are favored by flexible termini and the possibility to tightly coil against their targets. While, intuitively, open ensembles of TRs seem prone to evolve due to their potential to accommodate insertions and deletions of units, these evolutionary events are unexpectedly rare, suggesting that they are advantageous for the emergence of the ancestral sequence but are early fixed. We hypothesize that their flexibility makes it easier for further proteins to adapt to interact with them, which would explain their large number of protein interactions. We provide insight into the properties of open TR ensembles, which make them scaffolds for alternative protein complexes to organize genes, RNA and proteins.


Subjects
Proteins, Tandem Repeat Sequences, Proteins/chemistry, Amino Acid Sequence
4.
J Struct Biol ; 215(2): 107962, 2023 06.
Article in English | MEDLINE | ID: mdl-37031868

ABSTRACT

Nucleocytoplasmic large DNA viruses (NCLDVs, or giant viruses) stand out because of their relatively large genomes encoding hundreds of proteins. These species give us an unprecedented opportunity to study the emergence and evolution of repeats in protein sequences. On the one hand, as viruses, these species have a restricted set of functions, which can help us better define the functional landscape of repeats. On the other hand, given the particular use of the genetic machinery of the host, it is worth asking whether this allows the variations of genetic material that lead to repeats in non-viral species. To support research in the characterization of repeat protein evolution and function, we present here an analysis focused on the repeat proteins of giant viruses, namely tandem repeats (TRs), short repeats (SRs), and homorepeats (polyX). Proteins with large and short repeats are not very frequent in non-eukaryotic organisms because of the difficulties that their folding may entail; however, their presence in giant viruses highlights their advantage for performance in the protein environment of the eukaryotic host. The heterogeneous content of these TRs, SRs and polyX in some viruses hints at diverse needs. Comparisons to homologs suggest that some of these viruses make extensive use of the mechanisms that generate these repeats and also have the capacity to adopt genes with repeats. Giant viruses could be very good models for the study of the emergence and evolution of protein repeats.


Subjects
Giant Viruses, Viruses, Giant Viruses/genetics, Molecular Evolution, DNA Viruses/genetics, Proteins/genetics, Viruses/genetics, Eukaryota
5.
J Struct Biol ; 215(4): 108023, 2023 12.
Article in English | MEDLINE | ID: mdl-37652396

ABSTRACT

Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. Yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have named STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.


Subjects
Proteins, Tandem Repeat Sequences, Proteins/genetics, Proteins/chemistry, Tandem Repeat Sequences/genetics, Amino Acid Sequence
6.
Bioinformatics ; 38(21): 4851-4858, 2022 10 31.
Article in English | MEDLINE | ID: mdl-36106994

ABSTRACT

MOTIVATION: Poly-alanine (polyA) regions are protein stretches mostly composed of alanines. Despite their abundance in eukaryotic proteomes and their association with nine inherited human diseases, the structural and functional roles exerted by polyA stretches remain poorly understood. In this work we study how the amino acid context in which polyA regions are located in proteins influences their structure and function. RESULTS: We identified glycine and proline as the most abundant amino acids within polyA and in the flanking regions of polyA tracts, in human proteins as well as in 17 additional eukaryotic species. Our analyses indicate that the non-structuring nature of these two amino acids influences the α-helical conformations predicted for polyA, suggesting a relevant role in reducing the inherent aggregation propensity of long polyA. Then, we show how polyA position in protein N-termini relates to their function as transit peptides. PolyA placed just after the initial methionine is often predicted as part of mitochondrial transit peptides, whereas when placed in downstream positions, polyA are part of signal peptides. A few examples from known structures suggest that short polyA can emerge by alanine substitutions in α-helices, but evolution by insertion is observed for longer polyA. Our results showcase the importance of studying the sequence context of homorepeats as a mechanism to shape their structure-function relationships. AVAILABILITY AND IMPLEMENTATION: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Alanine, Poly A, Humans, Amino Acid Sequence, Proteome, Peptides/chemistry
7.
Brief Bioinform ; 21(2): 458-472, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30698641

ABSTRACT

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. SHORT ABSTRACT: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.


Subjects
Proteins/chemistry, Algorithms, Amino Acid Sequence, Protein Databases, Molecular Evolution, Protein Conformation, Protein Domains
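
One of the statistical measures of low complexity discussed in reviews of this kind is the Shannon entropy of the residue composition inside a sliding window. The sketch below shows that single measure only. The window size and entropy cutoff are illustrative assumptions, not values taken from the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(fragment):
    """Shannon entropy (bits per residue) of a sequence fragment."""
    counts = Counter(fragment)
    total = len(fragment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence, window=12, max_entropy=2.2):
    """Return (start, entropy) for every window at or below the entropy cutoff.

    The window size and cutoff are illustrative defaults for this sketch.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h <= max_entropy:
            hits.append((start, h))
    return hits
```

A homopolymer window scores 0 bits and a window of twelve distinct residues scores log2(12) bits, so the cutoff separates the two. As the abstract argues, a composition measure like this cannot tell an ordered repeat from a disordered biased region, which is why it is best combined with other tools.
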
8.
PLoS Comput Biol ; 17(2): e1008730, 2021 02.
Article in English | MEDLINE | ID: mdl-33571201

ABSTRACT

The correct identification of metabolic activity in tissues or cells under different conditions can be extremely elusive due to mechanisms such as post-transcriptional modification of enzymes or different rates in protein degradation, making it difficult to make predictions on the basis of gene expression alone. Context-specific metabolic network reconstruction can overcome some of these limitations by leveraging the integration of multi-omics data into genome-scale metabolic networks (GSMN). Using the experimental information, context-specific models are reconstructed by extracting from the generic GSMN the sub-network most consistent with the data, subject to biochemical constraints. One advantage is that these context-specific models have more predictive power since they are tailored to the specific tissue, cell or condition, containing only the reactions predicted to be active in such context. However, an important limitation is that there are usually many different sub-networks that optimally fit the experimental data. This set of optimal networks represents alternative explanations of the possible metabolic state. Ignoring the set of possible solutions reduces the ability to obtain relevant information about the metabolism and may bias the interpretation of the true metabolic states. In this work we formalize the problem of enumerating optimal metabolic networks and we introduce DEXOM, a unified approach for diversity-based enumeration of context-specific metabolic networks. We developed different strategies for this purpose and we performed an exhaustive analysis using simulated and real data.
In order to analyze the extent to which these results are biologically meaningful, we used the alternative solutions obtained with the different methods to measure: 1) the improvement of in silico predictions of essential genes in Saccharomyces cerevisiae using ensembles of metabolic networks; and 2) the detection of alternative enriched pathways in different human cancer cell lines. We also provide DEXOM as an open-source library compatible with COBRA Toolbox 3.0, available at https://github.com/MetExplore/dexom.


Subjects
Gene Expression Profiling, Metabolic Networks and Pathways/physiology, Post-Transcriptional RNA Processing, Saccharomyces cerevisiae/genetics, Algorithms, Tumor Cell Line, Computational Biology, Computer Simulation, False Positive Reactions, Genome, Humans, Biological Models, Statistical Models, Programming Languages, Software
9.
PLoS Comput Biol ; 17(9): e1009105, 2021 09.
Article in English | MEDLINE | ID: mdl-34492007

ABSTRACT

Over-representation analysis (ORA) is one of the commonest pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention. Using five publicly available datasets, we demonstrated that changes in parameters such as the background set, differential metabolite selection methods, and pathway database used can result in profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors that are specific to metabolomics data, such as the reliability of compound identification and the chemical bias of different analytical platforms also impacted ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both gain of false-positive pathways and loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.


Subjects
Metabolomics, Computational Biology/methods, Datasets as Topic, Metabolic Networks and Pathways, Reproducibility of Results
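
ORA is commonly computed as a right-tailed hypergeometric (one-sided Fisher) test, and a short sketch makes the sensitivity to the background set concrete. The code below is a generic illustration of that test, not the authors' pipeline. The function name and the toy counts are assumptions, and for simplicity the pathway size is held fixed while only the background size changes.

```python
from math import comb


def ora_p_value(background, pathway, selected, overlap):
    """Right-tailed hypergeometric p-value for pathway over-representation.

    background: number of metabolites in the background set
    pathway:    background metabolites annotated to the pathway
    selected:   metabolites called differential
    overlap:    differential metabolites that fall in the pathway
    """
    total = comb(background, selected)
    upper = min(pathway, selected)
    # Probability of observing `overlap` or more pathway members by chance.
    tail = sum(
        comb(pathway, k) * comb(background - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Same overlap, two background choices (toy numbers).
p_assay = ora_p_value(background=10, pathway=3, selected=4, overlap=2)
p_generic = ora_p_value(background=100, pathway=3, selected=4, overlap=2)
```

With the small, assay-sized background the overlap is unremarkable (p = 1/3), whereas the inflated background makes the same overlap look strongly enriched. This is the direction of the false-positive effect the abstract reports for non-assay-specific background sets.
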
10.
EMBO Rep ; 21(7): e49443, 2020 07 03.
Article in English | MEDLINE | ID: mdl-32350990

ABSTRACT

RNA modifications have recently emerged as an important layer of gene regulation. N6-methyladenosine (m6A) is the most prominent modification on eukaryotic messenger RNA and has also been found on noncoding RNA, including ribosomal and small nuclear RNA. Recently, several m6A methyltransferases were identified, uncovering the specificity of m6A deposition by structurally distinct enzymes. In order to discover additional m6A enzymes, we performed an RNAi screen to deplete annotated orthologs of human methyltransferase-like proteins (METTLs) in Drosophila cells and identified CG9666, the ortholog of human METTL5. We show that CG9666 is required for specific deposition of m6A on 18S ribosomal RNA via direct interaction with the Drosophila ortholog of human TRMT112, CG12975. Depletion of CG9666 yields a subsequent loss of the 18S rRNA m6A modification, which lies in the vicinity of the ribosome decoding center; however, this does not compromise rRNA maturation. Instead, a loss of CG9666-mediated m6A impacts fly behavior, providing an underlying molecular mechanism for the reported human phenotype in intellectual disability. Thus, our work expands the repertoire of m6A methyltransferases, demonstrates the specialization of these enzymes, and further addresses the significance of ribosomal RNA modifications in gene expression and animal behavior.


Subjects
Drosophila, Methyltransferases, Adenosine, Animals, Drosophila/genetics, Humans, Methyltransferases/genetics, Ribosomal RNA, 18S Ribosomal RNA/genetics, Walking
11.
Nucleic Acids Res ; 48(W1): W77-W84, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32421769

ABSTRACT

Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them are stand-alone applications, and currently there is no web-based tool which allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results of the search on a query sequence can be obtained. By developing the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs with additional information included to aid the interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/.


Subjects
Proteins/chemistry, Software, Amino Acids/analysis, Computer Graphics, Humans, Membrane Proteins/chemistry, Molecular Sequence Annotation, Protein Domains, Protein Sequence Analysis
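
The union and intersection of per-tool LCR predictions mentioned in this abstract amount to set operations on sequence positions. The sketch below shows one way to compute them offline. It does not use or reproduce the PlaToLoCo server's interface: the tool names, the coordinate convention (inclusive, 1-based), and the function names are assumptions made for illustration.

```python
def to_positions(regions):
    """Expand inclusive 1-based (start, end) regions into a set of positions."""
    return {pos for start, end in regions for pos in range(start, end + 1)}


def to_regions(positions):
    """Collapse a set of positions back into sorted inclusive (start, end) regions."""
    regions = []
    for pos in sorted(positions):
        if regions and pos == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], pos)  # extend the current region
        else:
            regions.append((pos, pos))  # start a new region
    return regions


def combine(predictions, mode="union"):
    """Combine per-tool region lists by 'union' or 'intersection'."""
    position_sets = [to_positions(regions) for regions in predictions.values()]
    if not position_sets:
        return []
    if mode == "union":
        merged = set.union(*position_sets)
    elif mode == "intersection":
        merged = set.intersection(*position_sets)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return to_regions(merged)


# Hypothetical output of two LCR detectors on the same query sequence.
predictions = {
    "tool_a": [(5, 20), (40, 50)],
    "tool_b": [(10, 25)],
}
```

Here the union gives regions 5-25 and 40-50, while the intersection keeps only 10-20, the stretch both hypothetical tools agree on.
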
12.
Int J Mol Sci ; 23(10), 2022 May 23.
Article in English | MEDLINE | ID: mdl-35628660

ABSTRACT

Huntington's disease (HD) is caused by the production of a mutant huntingtin (HTT) with an abnormally long poly-glutamine (polyQ) tract, forming aggregates and inclusions in neurons. Previous work by us and others has shown that an increase or decrease in polyQ-triggered aggregates can be passive simply due to the interaction of proteins with the aggregates. To search for proteins with active (functional) effects, which might be more effective in finding therapies and mechanisms of HD, we selected among the proteins that interact with HTT a total of 49 pairs of proteins that, while being paralogous to each other (and thus expected to have similar passive interaction with HTT), are located in different regions of the protein interaction network (suggesting participation in different pathways or complexes). Three of these 49 pairs contained members with opposite effects on HD, according to the literature. The negative members of the three pairs, MID1, IKBKG, and IKBKB, interact with PPP2CA and TUBB, which are known negative factors in HD, as well as with HSP90AA1 and RPS3. The positive members of the three pairs interact with HSPA9. Our results provide potential HD modifiers of functional relevance and reveal the dynamic aspect of paralog evolution within the interaction network.


Subjects
Huntington Disease, Humans, Huntington Disease/metabolism, I-kappa B Kinase/metabolism, Inclusion Bodies/metabolism, Neurons/metabolism, Protein Interaction Maps
13.
Biol Chem ; 402(8): 945-951, 2021 07 27.
Article in English | MEDLINE | ID: mdl-33660494

ABSTRACT

According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.


Subjects
Computational Biology, Proteins, Sequence Alignment
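
The core bookkeeping behind a search for absent motifs is small: with 20 standard amino acids there are 20^3 = 8,000 possible motifs of length three and 20^4 = 160,000 of length four, and the absent ones are those never observed in the dataset. The sketch below shows that step only, under the assumption that any k-mer not seen is reported. The statistical filtering the authors may have applied is not described in the abstract and is not reproduced, and the function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def avoided_motifs(sequences, k, alphabet=AMINO_ACIDS):
    """Return the sorted k-mers over `alphabet` that occur in none of the sequences."""
    observed = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    all_kmers = {"".join(p) for p in product(alphabet, repeat=k)}
    return sorted(all_kmers - observed)
```

For example, `avoided_motifs(["AAC"], 2, alphabet="AC")` reports the two dimers that the toy sequence never uses. On a real proteome the same call with `k=4` would list the candidate motifs to inspect further.
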
14.
Brief Bioinform ; 20(2): 463-470, 2019 03 22.
Article in English | MEDLINE | ID: mdl-29040399

ABSTRACT

Protein databases are growing steadily, driven by the spread of new, more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the inability to process and curate sequence entries as fast as they are created. To understand these trends and aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth mostly because of a focus on increasing the level of annotation of its sequences, its counterpart TrEMBL, much less limited by curation steps, is still in a phase of accelerated growth. However, we predict that before 2020, almost all entries deposited in UniProtKB will be homologous to known proteins. We propose that new sequencing projects can be made more useful if they are driven to sequencing voids, parts of the tree of life far from already sequenced species or model organisms. We show these voids are present in the Archaea and Eukarya domains of life. The approach to the certainty of the redundancy of new protein sequence entries leads to the consideration that most of the protein diversity on Earth has already been described, which we estimate to be around 3.75 million proteins, revising down the prediction we made a decade ago.


Subjects
Protein Databases, Proteins/analysis, Proteome/analysis, Protein Sequence Analysis/methods, Animals, Computational Biology, Humans, Knowledge Bases, Proteins/classification, Software
15.
Nucleic Acids Res ; 47(21): 10994-11006, 2019 12 02.
Article in English | MEDLINE | ID: mdl-31584084

ABSTRACT

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.


Subjects
DNA/genetics, Nucleic Acid Databases, Protein Databases, Scientific Experimental Error, Tandem Repeat Sequences/genetics, Animals, Gadus morhua/genetics, DNA Sequence Analysis
16.
Int J Mol Sci ; 22(4), 2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33572172

ABSTRACT

Low complexity regions (LCRs) are very frequent in protein sequences, generally having a lower propensity to form structured domains and tending to be much less evolutionarily conserved than globular domains. Their higher abundance in eukaryotes and in species with more cellular types agrees with a growing number of reports on their function in protein interactions regulated by post-translational modifications. LCRs facilitate the increase of regulatory and network complexity required with the emergence of organisms with more complex tissue distribution and development. Although the low conservation and structural flexibility of LCRs complicate their study, evolutionary studies of proteins across species have been used to evaluate their significance and function. To investigate how to apply this evolutionary approach to the study of LCR function in protein-protein interactions, we performed a detailed analysis for Huntingtin (HTT), a large protein that is a hub for interaction with hundreds of proteins, has a variety of LCRs, and for which partial structural information (in complex with HAP40) is available. We hypothesize that proteins RASA1, SYN2, and KAT2B may compete with HAP40 for their attachment to the core of HTT using similar LCRs. Our results illustrate how evolution might favor the interplay of LCRs with domains, and the possibility of detecting multiple modes of LCR-mediated protein-protein interactions with a large hub such as HTT when enough protein interaction data is available.


Subjects
Molecular Evolution, Huntingtin Protein/metabolism, Nuclear Proteins/metabolism, Amino Acid Motifs/genetics, Amino Acid Sequence/genetics, Animals, Humans, Huntingtin Protein/chemistry, Huntingtin Protein/genetics, Huntingtin Protein/ultrastructure, Electron Microscopy, Nuclear Proteins/chemistry, Nuclear Proteins/genetics, Nuclear Proteins/ultrastructure, Protein Binding/genetics, alpha-Helical Protein Conformation/genetics, Protein Domains/genetics, Protein Interaction Mapping, Protein Interaction Maps, Sequence Alignment, Synapsins/chemistry, Synapsins/metabolism, p120 GTPase Activating Protein/chemistry, p120 GTPase Activating Protein/metabolism, p300-CBP Transcription Factors/chemistry, p300-CBP Transcription Factors/metabolism
17.
J Struct Biol ; 212(2): 107608, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32896658

ABSTRACT

Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio, HEAT repeats and the β-propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRP detection and classification.


Subjects
Exons/genetics, Proteins/genetics, Tandem Repeat Sequences/genetics, Animals, Molecular Evolution, Humans, Introns/genetics
18.
BMC Evol Biol ; 20(1): 59, 2020 05 24.
Article in English | MEDLINE | ID: mdl-32448113

ABSTRACT

BACKGROUND: Polyglutamine regions (polyQ) are one of the most studied and prevalent homorepeats in eukaryotes. They have a particular length-dependent codon usage, which relates to a characteristic CAG-slippage mechanism. Pathologically expanded tracts of polyQ are known to form aggregates and are involved in the development of several human neurodegenerative diseases. The non-pathogenic function of polyQ is to mediate protein-protein interactions via a coiled-coil pairing with an interactor. They are usually located in a helical context. RESULTS: Here we study the stability of polyQ regions in evolution, using a set of 60 proteomes from four distinct taxonomic groups (Insecta, Teleostei, Sauria and Mammalia). The polyQ regions can be distinctly grouped in three categories based on their evolutionary stability: stable, unstable by length variation (inserted), and unstable by mutations (mutated). PolyQ regions in these categories can be significantly distinguished by their glutamine codon usage, and we show that the CAG-slippage mechanism is predominant in inserted polyQ of Sauria and Mammalia. The polyQ amino acid context is also influenced by the polyQ stability, with a higher proportion of proline residues around inserted polyQ. By studying the secondary structure of the sequences surrounding polyQ regions, we found that regarding the structural conformation around a polyQ, its stability category is more relevant than its taxonomic information. The protein-protein interaction capacity of a polyQ is also affected by its stability, as stable polyQ have more interactors than unstable polyQ. CONCLUSIONS: Our results show that apart from the sequence of a polyQ, information about its orthologous sequences is needed to assess its function. Codon usage, amino acid context, structural conformation and the protein-protein interaction capacity of polyQ from all studied taxa critically depend on the region stability. 
There are, however, some taxa-specific polyQ features that override this importance. We conclude that a taxa-driven evolutionary analysis is of the highest importance for the comprehensive study of any feature of polyglutamine regions.


Subjects
Molecular Evolution, Peptides/metabolism, Humans, Protein Interaction Mapping, Proteome
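
Glutamine is encoded by two codons, CAG and CAA, so the codon-usage signal discussed in this abstract can be measured by locating runs of glutamine codons in a coding sequence and computing the CAG share of each run. The sketch below does only that. The minimum run length is an illustrative assumption rather than the study's polyQ definition, the function name is hypothetical, and the input is assumed to be an in-frame DNA coding sequence.

```python
GLN_CODONS = {"CAG", "CAA"}  # the two glutamine codons


def polyq_codon_usage(cds, min_length=4):
    """Find glutamine codon runs in a coding sequence and report CAG usage.

    Returns a list of (first_codon_index, run_length, cag_fraction) for each
    run of at least `min_length` consecutive glutamine codons. The length
    cutoff is an illustrative assumption.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    runs = []
    start = None
    for index, codon in enumerate(codons + ["END"]):  # sentinel closes a final run
        if codon in GLN_CODONS:
            if start is None:
                start = index
        elif start is not None:
            length = index - start
            if length >= min_length:
                cag = sum(c == "CAG" for c in codons[start:index])
                runs.append((start, length, cag / length))
            start = None
    return runs
```

For the toy sequence `ATGCAGCAGCAACAGTAA`, the function reports one run of four glutamine codons starting at codon index 1, three of which are CAG. Comparing such fractions between orthologous regions is one way to look for the slippage signature the abstract describes.
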
19.
Bioinformatics ; 35(6): 1079-1081, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30165582

ABSTRACT

SUMMARY: Traitpedia is a collaborative database aimed to collect binary traits in a tabular form for a growing number of species. AVAILABILITY AND IMPLEMENTATION: Traitpedia can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/traitpedia. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software, Phenotype
20.
J Struct Biol ; 208(2): 86-91, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31408700

ABSTRACT

Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, and particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) repeats are widespread and many define regions with a function in protein interactions. For these reasons, we have developed an algorithm to quickly analyze local repeatability along protein sequences, that is, how close a protein fragment is to a perfect repeat. Using this algorithm we identified that the proteins of the yeast Saccharomyces cerevisiae are depleted in short repeats (approximate or not) of odd length, while the human proteins are not, that the fish Danio rerio has many proteins with repeats of length two and that the plant Arabidopsis thaliana has an unusually large number of repeats of length seven. Our method (REpeatability Scanner, RES, accessible at http://cbdm-01.zdv.uni-mainz.de/~munoz/res/) finds regions with approximate short repeats in protein sequences, and helps to characterize the variable use of LCRs and compositional bias in different organisms.


Subjects
Proteins/chemistry, Protein Sequence Analysis/methods, Algorithms, Amino Acid Sequence, Protein Databases, Molecular Evolution, Humans, Amino Acid Repetitive Sequences, Sequence Alignment
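
The notion of "how close a protein fragment is to a perfect repeat" can be made concrete with a simple self-comparison score. The sketch below is not the RES algorithm, whose scoring is not given in the abstract. It assumes one plausible measure, the fraction of residues identical to the residue one period downstream, and the function names, window size, and period are hypothetical choices.

```python
def repeatability(fragment, period):
    """Fraction of positions that match the residue `period` positions later.

    1.0 means the fragment is a perfect repeat with that period.
    """
    comparisons = len(fragment) - period
    if period < 1 or comparisons < 1:
        raise ValueError("fragment must be longer than a positive period")
    matches = sum(fragment[i] == fragment[i + period] for i in range(comparisons))
    return matches / comparisons


def scan(sequence, window, period):
    """Score every window of the sequence for a given repeat period."""
    return [
        repeatability(sequence[start:start + window], period)
        for start in range(len(sequence) - window + 1)
    ]
```

Scanning `QAQAQAKL` with a window of six and a period of two gives scores that fall from 1.0 to 0.5 as the window slides off the perfect QA repeat, which is the kind of local profile a repeatability scan is meant to expose.
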