ABSTRACT
MOTIVATION: Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. We examine different steps in the development life-cycle of peptide bioactivity binary predictors and identify key steps where automation can result not only in a more accessible method but also in more robust and interpretable evaluation, leading to more trustworthy models. RESULTS: We present a new automated method for drawing negative peptides that achieves better balance between specificity and generalization than current alternatives. We study the effect of homology-based partitioning for generating the training and testing data subsets and demonstrate that model performance is overestimated when no such homology correction is used, which indicates that prior studies may have overestimated their performance when applied to new peptide sequences. We also conduct a systematic analysis of different protein language models as peptide representation methods and find that they can serve as better descriptors than a naive alternative, but that there is no significant difference across models with different sizes or algorithms. Finally, we demonstrate that an ensemble of optimized traditional machine learning algorithms can compete with more complex neural network models, while being more computationally efficient. We integrate these findings into AutoPeptideML, an easy-to-use AutoML tool to allow researchers without a computational background to build new predictive models for peptide bioactivity in a matter of minutes. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML and a dedicated web-server at http://peptide.ucd.ie/AutoPeptideML. A static version of the software to ensure the reproduction of the results is available at https://zenodo.org/records/13363975.
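The idea behind homology-based partitioning can be illustrated with a minimal sketch. This is not the AutoPeptideML implementation: the function names, the similarity threshold, and the use of `difflib.SequenceMatcher` as a crude stand-in for alignment-based sequence identity are all assumptions made for illustration. Sequences are grouped by single-linkage clustering, and whole clusters are assigned to the test set, so that no test peptide is similar (above the threshold) to any training peptide.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Crude proxy for sequence identity (assumption); a real pipeline
    # would use an alignment tool instead.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def homology_split(seqs, threshold=0.6, test_fraction=0.3):
    """Split sequences so that similar ones never straddle train and test."""
    n = len(seqs)
    parent = list(range(n))  # union-find forest for single-linkage clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Link every pair whose similarity reaches the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Assign whole clusters (smallest first) to the test set until it is full.
    test_idx = []
    for members in sorted(clusters.values(), key=len):
        if len(test_idx) + len(members) <= test_fraction * n:
            test_idx.extend(members)

    chosen = set(test_idx)
    train = [seqs[i] for i in range(n) if i not in chosen]
    test = [seqs[i] for i in test_idx]
    return train, test
```

A random split, by contrast, can place near-identical peptides on both sides of the partition, which is one way performance on held-out data can come to be overestimated.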
Subjects
Algorithms , Machine Learning , Peptides , Peptides/chemistry , Software , Computational Biology/methods , Neural Networks, Computer , Databases, Protein
ABSTRACT
Legume seed protein is an important source of nutrition, but generally it is less digestible than animal protein. Poor protein digestibility in legume seeds and seedlings may partly reflect defenses against herbivores. Protein changes during germination typically increase proteolysis and digestibility, by lowering the levels of anti-nutrient protease inhibitors, activating proteases, and breaking down storage proteins (including allergens). Germinating legume sprouts also show striking increases in free amino acids (especially asparagine), but their roles in host defense or other processes are not known. While the net effect of germination is generally to increase the digestibility of legume seed proteins, the extent of improvement in digestibility is species- and strain-dependent. Further research is needed to highlight which changes contribute most to improved digestibility of sprouted seeds. Such knowledge could guide the selection of varieties that are more digestible and also guide the development of food preparations that are more digestible, potentially combining germination with other factors altering digestibility, such as heating and fermentation. Techniques to characterize the shifts in protein make-up, activity and degradation during germination need to draw on traditional analytical approaches, complemented by proteomic and peptidomic analysis of mass spectrometry-identified peptide breakdown products.
Subjects
Fabaceae , Animals , Fabaceae/metabolism , Germination , Proteolysis , Proteomics , Seeds/chemistry , Seedlings , Plant Proteins/metabolism , Vegetables/metabolism
ABSTRACT
Milk-derived peptides are known to confer anti-inflammatory effects. We hypothesised that milk-derived cell-penetrating peptides might modulate inflammation in useful ways. Using computational techniques, we identified and synthesised peptides from the milk protein Alpha-S1-casein that were predicted to be cell-penetrating using a machine learning predictor. We modified the interpretation of the prediction results to consider the effects of histidine. Peptides were then selected for testing to determine their cell penetrability and anti-inflammatory effects using HeLa cells and J774.2 mouse macrophage cell lines. The selected peptides all showed cell-penetrating behaviour, as judged using confocal microscopy of fluorescently labelled peptides. None of the peptides had an effect on either the NF-κB transcription factor or TNFα and IL-1β secretion. Thus, the identified milk-derived sequences have the ability to be internalised into the cell without affecting cell homeostatic mechanisms such as NF-κB activation. These peptides are worthy of further investigation for other potential bioactivities or as a naturally derived carrier to promote the cellular internalisation of other active peptides.
Subjects
Cell-Penetrating Peptides , NF-kappa B , Humans , Mice , Animals , NF-kappa B/metabolism , Cell-Penetrating Peptides/pharmacology , HeLa Cells , Milk/metabolism , Tumor Necrosis Factor-alpha/metabolism , Anti-Inflammatory Agents/pharmacology
ABSTRACT
During coronavirus infection, three non-structural proteins, nsp3, nsp4, and nsp6, are of great importance as they induce the formation of double-membrane vesicles where the replication and transcription of viral gRNA take place, and the interaction of nsp3 and nsp4 lumenal regions triggers membrane pairing. However, their structural states are not well understood. We investigated the interactions between nsp3 and nsp4 by predicting the structures of their lumenal regions individually and in complex using AlphaFold2 as implemented in ColabFold. The ColabFold prediction accuracy of the nsp3-nsp4 complex was increased compared to nsp3 alone and nsp4 alone. All cysteine residues in both lumenal regions were modelled to be involved in intramolecular disulphide bonds. A linker region in the nsp4 lumenal region emerged as crucial for the interaction, transitioning to a structured state when predicted in complex. The key interactions modelled between nsp3 and nsp4 appeared stable when the transmembrane regions of nsp3 and nsp4 were added to the modelling either alone or together. While molecular dynamics (MD) simulations demonstrated that the proposed model of the nsp3 lumenal region on its own is not stable, key interactions between nsp3 and nsp4 in the proposed complex model appeared stable after MD. Together, these observations suggest that the interaction is robust to different modelling conditions. Understanding the functional importance of the nsp4 linker region may have implications for the targeting of double-membrane vesicle formation in controlling coronavirus infection.
Subjects
SARS-CoV-2 , Viral Nonstructural Proteins , SARS-CoV-2/metabolism , Viral Nonstructural Proteins/genetics , Viral Nonstructural Proteins/metabolism , Protein Conformation
ABSTRACT
BACKGROUND: In bacteria, genes with related functions, such as those involved in the metabolism of the same compound or in infection processes, are often physically close on the genome and form groups called clusters. The enrichment of such clusters over various distantly related bacteria can be used to predict the roles of genes of unknown function that cluster with characterised genes. There is no obvious rule for defining a cluster, given the variability of clusters in size and intergenic distance and the ambiguity in what constitutes a "gene", since genes can gain and lose domains over time. Protein domains can cluster within a gene, or in adjacent genes of related function, and in both cases these are chromosomally clustered. Here, we model the distances between pairs of protein domain coding regions across a wide range of bacteria and archaea via a probabilistic two-component mixture model, without imposing arbitrary thresholds in terms of gene numbers or distances. RESULTS: We trained our model using matched gene ontology terms to label functionally related pairs and assessed the stability of the parameters of the model across 14,178 archaeal and bacterial strains. We found that the parameters of our mixture model are remarkably stable across bacteria and archaea, except for endosymbionts and obligate intracellular pathogens. Obligate pathogens have smaller genomes, and although they vary, on average do not show noticeably different clustering distances; the main difference in the parameter estimates is that a far greater proportion of the genes sharing ontology terms are clustered. This may reflect that these genomes are enriched for complexes encoded by clustered core housekeeping genes, as a proportion of the total genes. Given the overall stability of the parameter estimates, we then used the mean parameter estimates across the entire dataset to investigate which gene ontology terms are most frequently associated with clustered genes. CONCLUSIONS: Given the stability of the mixture model across species, it may be used to predict bacterial gene clusters that are shared across multiple species, in addition to giving insights into the evolutionary pressures on the chromosomal locations of genes in different species.
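To make the notion of a two-component mixture over pairwise distances concrete, the sketch below fits a mixture of two exponential distributions by expectation-maximisation. The abstract does not state which component distributions were used, so the exponential form, the function name, and the initialisation are assumptions chosen only to keep the example short; the fitted weight plays the role of the proportion of "clustered" pairs. All distances are assumed to be strictly positive.

```python
import math
import random


def fit_two_exponentials(distances, iterations=200):
    """EM for a two-component exponential mixture (illustrative choice).

    Returns (weight of the short-distance component,
             mean of the short component, mean of the long component).
    """
    n = len(distances)
    mean = sum(distances) / n
    weight = 0.5               # mixing proportion of the short component
    rate_short = 10.0 / mean   # start with a mean 10x below the overall mean
    rate_long = 1.0 / mean

    for _ in range(iterations):
        # E-step: responsibility of the short component, computed in log space
        # to avoid underflow for large distances.
        resp = []
        for x in distances:
            log_a = math.log(weight) + math.log(rate_short) - rate_short * x
            log_b = math.log(1.0 - weight) + math.log(rate_long) - rate_long * x
            top = max(log_a, log_b)
            a = math.exp(log_a - top)
            b = math.exp(log_b - top)
            resp.append(a / (a + b))

        # M-step: weighted maximum-likelihood updates.
        total = sum(resp)
        weight = min(max(total / n, 1e-9), 1.0 - 1e-9)
        rate_short = total / sum(r * x for r, x in zip(resp, distances))
        rate_long = (n - total) / sum((1.0 - r) * x for r, x in zip(resp, distances))

    return weight, 1.0 / rate_short, 1.0 / rate_long


if __name__ == "__main__":
    random.seed(0)
    # Synthetic distances: 30% "clustered" pairs, 70% background pairs.
    data = [random.expovariate(1 / 500) for _ in range(300)]
    data += [random.expovariate(1 / 50000) for _ in range(700)]
    print(fit_two_exponentials(data))
```

Because each pair receives a probability of belonging to the short-distance component, no hard cut-off in base pairs or gene counts is needed, which is the property the abstract emphasises.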
Subjects
Genome, Archaeal , Genome, Bacterial , Archaea/genetics , Bacteria/genetics , Cluster Analysis , Computer Simulation , Evolution, Molecular , Phylogeny , Protein Domains
ABSTRACT
Expression of the macrophage immunometabolism regulator gene (MACIR) is associated with severity of autoimmune disease pathology and with the regulation of macrophage biology through unknown mechanisms. The encoded 206 amino acid protein lacks homology to any characterized protein sequence and is a disordered protein according to structure prediction algorithms. To identify interactions of MACIR with proteins from all subcellular compartments, a membrane solubilization buffer is employed that, together with a high-affinity EF-hand-based pull-down method, increases the resolution of quantitative mass spectrometry analysis, with significant enrichment of interactions from membrane-bound nuclear and mitochondrial compartments compared to samples prepared with radioimmunoprecipitation assay buffer. A total of 63 significant interacting proteins are identified, and interactions with the nuclear transport receptor TNPO1 and the trafficking proteins UNC119 homolog A and B are validated by immunoprecipitation. Mutational analysis of two candidate nuclear localization signal motifs in the MACIR amino acid sequence shows that the interaction with TNPO1 is likely via a non-classical proline/tyrosine nuclear localization signal motif (aa98-117). It is shown that employing a highly specific, high-affinity pull-down method that performs efficiently in this glycerol- and detergent-rich buffer is a powerful approach for the analysis of uncharacterized protein interactomes.
Subjects
Macrophages , Membrane Proteins , Proteomics , Adaptor Proteins, Signal Transducing , Amino Acid Sequence , Humans , Immunoprecipitation , beta Karyopherins
ABSTRACT
Low-throughput experiments and high-throughput proteomic and genomic analyses have created enormous quantities of data that can be used to explore protein function and evolution. The ability to consolidate these data into an informative and intuitive format is vital to our capacity to comprehend these distinct but complementary sources of information. However, existing tools to visualize protein-related data are restricted by their presentation, sources of information, functionality or accessibility. We introduce ProViz, a powerful browser-based tool to aid biologists in building hypotheses and designing experiments by simplifying the analysis of functional and evolutionary features of proteins. Feature information is retrieved in an automated manner from resources describing protein modular architecture, post-translational modification, structure, sequence variation and experimental characterization of functional regions. These features are mapped to evolutionary information from precomputed multiple sequence alignments. Data are displayed in an interactive and information-rich yet intuitive visualization, accessible through a simple protein search interface. This allows users with limited bioinformatic skills to rapidly access data pertinent to their research. Visualizations can be further customized with user-defined data either manually or using a REST API. ProViz is available at http://proviz.ucd.ie/.
Subjects
Computational Biology/statistics & numerical data , Datasets as Topic/statistics & numerical data , User-Computer Interface , Amino Acid Sequence , Biomedical Research , Computational Biology/methods , Computer Graphics , Databases, Protein , Evolution, Molecular , Humans , Internet , Sequence Alignment
ABSTRACT
Tandem mass spectrometry (MS/MS) techniques, developed for protein identification, are increasingly being applied in the field of peptidomics. Using this approach, the set of protein fragments observed in a sample of interest can be determined to gain insights into important biological processes such as signaling and other bioactivities. As the peptidomics era progresses, there is a need for robust and convenient methods to inspect and analyze MS/MS-derived data. Here, we present Peptigram, a novel tool dedicated to the visualization and comparison of peptides detected by MS/MS. The principal advantage of Peptigram is that it provides visualizations at both the protein and peptide level, allowing users to simultaneously visualize the peptide distributions of one or more samples of interest, mapped to their parent proteins. In this way, rapid comparisons between samples can be made in terms of their peptide coverage and abundance. Moreover, Peptigram integrates and displays key sequence features from external databases and links with peptide analysis tools to offer the user a comprehensive peptide discovery resource. Here, we illustrate the use of Peptigram on a data set of milk hydrolysates. For convenience, Peptigram is implemented as a web application, and is freely available for academic use at http://bioware.ucd.ie/peptigram.
Subjects
Peptides/genetics , Proteomics/methods , Software , Databases, Protein , Internet , Peptides/classification , Tandem Mass Spectrometry
ABSTRACT
In this study, we carry out a systematic characterisation of the YIPF family of proteins with respect to their subcellular localisation profile, membrane topology and functional effects on the endomembrane system. YIPF proteins primarily localise to the Golgi complex and can be grouped into trans-Golgi-localising YIPFs (YIPF1 and YIPF2) and cis-Golgi-localising YIPFs (YIPF3, YIPF4 and YIPF5), with YIPF6 and YIPF7 showing a broader profile being distributed throughout the Golgi stack. YIPF proteins have a long soluble N-terminal region, which is orientated towards the cytosol, followed by 5 closely stacked transmembrane domains, and a C terminus, orientated towards the lumen of the Golgi. The significance of YIPF proteins for the maintenance of the morphology of the Golgi was tested by RNA interference, revealing a number of specific morphological changes to this organelle on their depletion. We propose a role for this family of proteins in regulating membrane dynamics in the endomembrane system.
Subjects
Membrane Proteins/metabolism , Golgi Apparatus/metabolism , HeLa Cells , Humans , Tumor Cells, Cultured
ABSTRACT
Identifying effective therapeutic drug combinations that modulate complex signaling pathways in platelets is central to the advancement of effective anti-thrombotic therapies. However, there is no systems model of the platelet that predicts responses to different inhibitor combinations. We developed an approach which goes beyond current inhibitor-inhibitor combination screening to efficiently consider other signaling aspects that may give insights into the behaviour of the platelet as a system. We investigated combinations of platelet inhibitors and activators. We evaluated three distinct strands of information, namely: activator-inhibitor combination screens (testing a panel of inhibitors against a panel of activators); inhibitor-inhibitor synergy screens; and activator-activator synergy screens. We demonstrated how these analyses may be efficiently performed, both experimentally and computationally, to identify particular combinations of most interest. Robust tests of activator-activator synergy and of inhibitor-inhibitor synergy required combinations to show significant excesses over the double doses of each component. Modeling identified multiple effects of an inhibitor of the P2Y12 ADP receptor, and complementarity between inhibitor-inhibitor synergy effects and activator-inhibitor combination effects. This approach accelerates the mapping of combination effects of compounds to develop combinations that may be therapeutically beneficial. We integrated the three information sources into a unified model that predicted the benefits of a triple drug combination targeting ADP, thromboxane and thrombin signaling.
Subjects
Blood Platelets/drug effects , Blood Platelets/physiology , Drug Discovery/methods , Models, Statistical , Platelet Activation/drug effects , Platelet Aggregation Inhibitors/administration & dosage , Cells, Cultured , Computer Simulation , Drug Antagonism , Drug Synergism , Drug Therapy, Combination , Humans
ABSTRACT
BACKGROUND: Epistasis (synergistic interaction) among SNPs governing gene expression is likely to arise within transcriptional networks. However, the power to detect it is limited by the large number of combinations to be tested and the modest sample sizes of most datasets. By limiting the interaction search first to cis-trans and then to cis-cis SNP pairs in which both SNPs had an independent effect on the expression of the most variable transcripts in the liver and brain, we greatly reduced the size of the search space. RESULTS: Within the cis-trans search space we discovered three transcripts with significant epistasis. Surprisingly, all interacting SNP pairs were located near each other on the chromosome (290 kb to 2.16 Mb apart). Despite their proximity, the interacting SNPs were outside the range of linkage disequilibrium (LD), which was absent between the pairs (r² < 0.01). Accordingly, we redefined the search space to detect cis-cis interactions, where a cis-SNP was located within 10 Mb of the target transcript. The results show evidence for the epistatic regulation of 50 transcripts across the tissues studied. Three transcripts, namely HLA-G, PSORS1C1 and HLA-DRB5, share common regulatory SNPs in the pre-frontal cortex, and their expression is significantly correlated. This pattern of epistasis is consistent with mediation via long-range chromatin structures rather than the binding of transcription factors in trans. Accordingly, some of the interactions map to regions of the genome known to physically interact in lymphoblastoid cell lines, while others map to known promoter and enhancer elements. SNPs involved in interactions appear to be enriched for promoter markers. CONCLUSIONS: In the context of gene expression and its regulation, our analysis indicates that the study of cis-cis or local epistatic interactions may have a more important role than interchromosomal interactions.
Subjects
Epistasis, Genetic , Genome, Human , Quantitative Trait Loci , Cerebellum/metabolism , Frontal Lobe/metabolism , Genome-Wide Association Study , Genotype , HLA-DRB5 Chains/genetics , HLA-G Antigens/genetics , Humans , Linkage Disequilibrium , Liver/metabolism , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Proteins/genetics , Visual Cortex/metabolism
ABSTRACT
Protein-protein and protein-peptide interactions are responsible for the vast majority of biological functions in vivo, but targeting these interactions with small molecules has historically been difficult. What is required are efficient combined computational and experimental screening methods to choose among a number of potential protein interfaces worthy of targeting lead macrocyclic compounds for further investigation. To achieve this, we have generated combinatorial 3D virtual libraries of short disulfide-bonded peptides and compared them to pharmacophore models of important protein-protein and protein-peptide structures, including short linear motifs (SLiMs), protein-binding peptides, and turn structures at protein-protein interfaces, built from 3D models available in the Protein Data Bank. We prepared a total of 372 reference pharmacophores, which were matched against 108,659 multiconformer cyclic peptides. After normalization to exclude nonspecific cyclic peptides, the top hits notably are enriched for mimetics of turn structures, including a turn at the interaction surface of human α thrombin, and also feature several protein-binding peptides. The top cyclic peptide hits also cover the critical "hot spot" interaction sites predicted from the interaction crystal structure. We have validated our method by testing cyclic peptides predicted to inhibit thrombin, a key protein in the blood coagulation pathway of important therapeutic interest, identifying a cyclic peptide inhibitor with lead-like activity. We conclude that protein interfaces most readily targetable by cyclic peptides and related macrocyclic drugs may be identified computationally among a set of candidate interfaces, accelerating the choice of interfaces against which lead compounds may be screened.
Subjects
Peptide Library , Peptides, Cyclic/chemistry , Peptides, Cyclic/pharmacology , Protein Interaction Domains and Motifs , Proteins/chemistry , Antithrombins/chemistry , Antithrombins/pharmacology , Combinatorial Chemistry Techniques , Databases, Protein , Disulfides/chemistry , Drug Evaluation, Preclinical/methods , Humans , Molecular Conformation , Peptidomimetics/pharmacology , Platelet Aggregation/drug effects , Reproducibility of Results , Structure-Activity Relationship , Surface Plasmon Resonance
ABSTRACT
BACKGROUND: Bioactive cyclic peptides derived from natural sources are well studied, particularly those derived from non-ribosomal synthetases in fungi or bacteria. Ribosomally synthesised bioactive disulphide-bonded loops represent a large, naturally enriched library of potential bioactive compounds, worthy of systematic investigation. RESULTS: We examined the distribution of short cyclic loops on the surface of a large number of proteins, especially membrane or extracellular proteins. Available three-dimensional structures highlighted a number of disulphide-bonded loops responsible for the majority of the likely binding interactions in a variety of protein complexes, due to their location at protein-protein interfaces. We find that disulphide-bonded loops at protein-protein interfaces may, but do not necessarily, show biological activity independent of their parent protein. Examining the conservation of short disulphide bonded loops in proteins, we find a small but significant increase in conservation inside these loops compared to surrounding residues. We identify a subset of these loops that exhibit a high relative conservation, particularly among peptide hormones. CONCLUSIONS: We conclude that short disulphide-bonded loops are found in a wide variety of biological interactions. They may retain biological activity outside their parent proteins. Such structurally independent peptides may be useful as biologically active templates for the development of novel modulators of protein-protein interactions.
Subjects
Computational Biology/methods , Disulfides/chemistry , Peptides, Cyclic/metabolism , Proteins/chemistry , Proteins/metabolism , Conserved Sequence , Extracellular Space/metabolism , Humans , Peptides, Cyclic/chemistry , Protein Binding
ABSTRACT
YAP (Yes-associated protein) is a potent oncogene and a major effector of the mammalian Hippo tumor suppressor pathway. In this review, our emphasis is on the structural basis of how YAP recognizes its various cellular partners. In particular, we discuss the role of LATS kinase and AMOTL1 junction protein, two key cellular partners of YAP that bind to its WW domain, in mediating cytoplasmic localization of YAP and thereby playing a key role in the regulation of its transcriptional activity. Importantly, the crystal structure of an amino-terminal domain of YAP in complex with the carboxy-terminal domain of TEAD transcription factor was only recently solved at atomic resolution, while the structure of WW domain of YAP in complex with a peptide containing the PPxY motif has been available for more than a decade. We discuss how such structural information may be exploited for the rational development of novel anti-cancer therapeutics harboring greater efficacy coupled with low toxicity. We also embark on a brief discussion of how recent in silico studies led to identification of the cardiac glycoside digitoxin as a potential modulator of WW domain-ligand interactions. Conversely, dobutamine was identified in a screen of known drugs as a compound that promotes cytoplasmic localization of YAP, thereby resulting in growth suppressing activity. Finally, we discuss how a recent study on the dynamics of WW domain folding on a biologically critical time scale may provide a tool to generate repertoires of WW domain variants for regulation of the Hippo pathway toward desired, non-oncogenic outputs.
Subjects
Adaptor Proteins, Signal Transducing/chemistry , Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Neoplasms/metabolism , Nuclear Proteins/chemistry , Nuclear Proteins/metabolism , Phosphoproteins/chemistry , Adaptor Proteins, Signal Transducing/metabolism , Animals , Drug Design , Humans , Molecular Targeted Therapy , Phosphoproteins/metabolism , Protein Folding , Signal Transduction/drug effects , Transcription Factors , YAP-Signaling Proteins
ABSTRACT
Little is known about the digestive process in infants. In particular, the chronological activity of enzymes across the course of digestion in the infant remains largely unknown. To create a temporal picture of how milk proteins are digested, enzyme activity was compared between intact human milk samples from three mothers and the gastric samples from each of their 4-12 day postpartum infants, 2 h after breast milk ingestion. The activities of 7 distinct enzymes are predicted in the infant stomach based on their observed cleavage pattern in peptidomics data. We found that the same patterns of cleavage were evident in both intact human milk and gastric milk samples, demonstrating that the enzyme activities that begin in milk persist in the infant stomach. However, the extent of enzyme activity is found to vary greatly between the intact milk and gastric samples. Overall, we observe that milk-specific proteins are cleaved at higher levels in the stomach compared to human milk. Notably, the enzymes we predict here only explain 78% of the cleavages uniquely observed in the gastric samples, highlighting that further investigation of the specific enzyme activities associated with digestion in infants is warranted.
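Predicting enzyme activity from cleavage patterns rests on matching the residues flanking each observed peptide terminus against known protease specificities. The sketch below is a deliberately simplified illustration, not the authors' method: it looks only at the P1 residue (the residue immediately N-terminal to the cleaved bond), and the rule table covers three enzymes with textbook P1 preferences (trypsin after K/R, chymotrypsin after F/Y/W, elastase after small residues such as A/V). The function name and the example sequence are hypothetical.

```python
# Simplified P1 preferences (assumption: real specificities also depend on
# neighbouring positions and are not this clear-cut).
P1_RULES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FYW"),
    "elastase": set("AV"),
}


def candidate_enzymes(protein, start, end):
    """List enzymes whose P1 rule matches each terminus of a peptide.

    `start` and `end` are 0-based positions of the peptide within `protein`,
    with `end` exclusive. Termini that coincide with the protein's own ends
    are skipped because no cleavage is needed to produce them.
    """
    sites = {}
    if start > 0:
        p1 = protein[start - 1]  # residue just before the peptide
        sites["N-terminal"] = sorted(e for e, r in P1_RULES.items() if p1 in r)
    if end < len(protein):
        p1 = protein[end - 1]    # last residue of the peptide
        sites["C-terminal"] = sorted(e for e, r in P1_RULES.items() if p1 in r)
    return sites


if __name__ == "__main__":
    parent = "MKTAYIAKQRQISFVK"  # made-up sequence for illustration
    print(candidate_enzymes(parent, 2, 10))
```

Termini that match no rule in such a table correspond to the cleavages that remain unexplained by the enzymes considered.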
Subjects
Gastric Mucosa/metabolism , Milk Proteins/metabolism , Milk, Human/metabolism , Peptides/metabolism , Cathepsin D/metabolism , Chymotrypsin/metabolism , Digestion , Endopeptidases/metabolism , Female , Fibrinolysin/metabolism , Humans , Infant, Newborn , Intubation, Gastrointestinal , Mass Spectrometry , Milk, Human/enzymology , Mothers , Pancreatic Elastase/metabolism , Pepsin A/metabolism , Peptides/analysis , Proteolysis , Proteomics/methods , Stomach/enzymology , Trypsin/metabolism
ABSTRACT
Raised blood pressure (BP) is a major risk factor for cardiovascular disease. Previous studies have identified 47 distinct genetic variants robustly associated with BP, but collectively these explain only a few percent of the heritability for BP phenotypes. To find additional BP loci, we used a bespoke gene-centric array to genotype an independent discovery sample of 25,118 individuals that combined hypertensive case-control and general population samples. We followed up four SNPs associated with BP at our p < 8.56 × 10⁻⁷ study-specific significance threshold and six suggestively associated SNPs in a further 59,349 individuals. We identified and replicated a SNP at LSP1/TNNT3, a SNP at MTHFR-NPPB independent (r² = 0.33) of previous reports, and replicated SNPs at AGT and ATP2B1 reported previously. An analysis of combined discovery and follow-up data identified SNPs significantly associated with BP at p < 8.56 × 10⁻⁷ at four further loci (NPR3, HFE, NOS3, and SOX6). The high number of discoveries made with modest genotyping effort can be attributed to using a large-scale yet targeted genotyping array and to the development of a weighting scheme that maximized power when meta-analyzing results from samples ascertained with extreme phenotypes, in combination with results from nonascertained or population samples. Chromatin immunoprecipitation and transcript expression data highlight potential gene regulatory mechanisms at the MTHFR and NOS3 loci. These results provide candidates for further study to help dissect mechanisms affecting BP and highlight the utility of studying SNPs and samples that are independent of those studied previously even when the sample size is smaller than that in previous studies.
Subjects
Genetic Loci , Hypertension/genetics , Oligonucleotide Array Sequence Analysis , Adult , Aged , Blood Pressure/genetics , Case-Control Studies , Female , Gene Expression Profiling , Gene Frequency , Genome-Wide Association Study , Haplotypes , Humans , Linkage Disequilibrium , Male , Methylenetetrahydrofolate Reductase (NADPH2)/genetics , Middle Aged , Plasma Membrane Calcium-Transporting ATPases/genetics , Polymorphism, Single Nucleotide , Receptors, Atrial Natriuretic Factor/genetics , Sequence Analysis, DNA
ABSTRACT
Cell-penetrating peptides (CPPs) are attracting much attention as a means of overcoming the inherently poor cellular uptake of various bioactive molecules. Here, we introduce CPPpred, a web server for the prediction of CPPs using an N-to-1 neural network. The server takes one or more peptide sequences, between 5 and 30 amino acids in length, as input and returns a prediction of how likely each peptide is to be cell penetrating. CPPpred was developed with redundancy-reduced training and test sets, offering an advantage over the only other currently available CPP prediction method.
Subjects
Cell-Penetrating Peptides/chemistry , Computational Biology , Neural Networks, Computer , Sequence Analysis, Protein , Software , Cell-Penetrating Peptides/metabolism , Databases, Protein , Humans , Internet
ABSTRACT
MOTIVATION: Peptides play important roles in signalling, regulation and immunity within an organism. Many have successfully been used as therapeutic products often mimicking naturally occurring peptides. Here we present PeptideLocator for the automated prediction of functional peptides in a protein sequence. RESULTS: We have trained a machine learning algorithm to predict bioactive peptides within protein sequences. PeptideLocator performs well on training data achieving an area under the curve of 0.92 when tested in 5-fold cross-validation on a set of 2202 redundancy reduced peptide containing protein sequences. It has predictive power when applied to antimicrobial peptides, cytokines, growth factors, peptide hormones, toxins, venoms and other peptides. It can be applied to refine the choice of experimental investigations in functional studies of proteins. AVAILABILITY AND IMPLEMENTATION: PeptideLocator is freely available for academic users at http://bioware.ucd.ie/.
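For readers unfamiliar with the reported metric, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative and unrelated to the PeptideLocator data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for positives and 0 for negatives; `scores` holds the
    predictor's output for each example (higher means more likely positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

In k-fold cross-validation the same calculation is applied to the held-out fold of each round, so a value of 0.92 means that a held-out positive outranked a held-out negative in roughly 92% of such pairs.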
Subjects
Algorithms , Peptides/chemistry , Sequence Analysis, Protein/methods , Antimicrobial Cationic Peptides/chemistry , Artificial Intelligence , Peptides/classification , Proteins/chemistry
ABSTRACT
Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, advocating the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.
Subjects
Amino Acid Motifs , Sequence Analysis, Protein/methods , Adaptor Proteins, Vesicular Transport/chemistry , Amino Acid Sequence , Conserved Sequence , Eukaryotic Initiation Factor-2/chemistry , Eukaryotic Initiation Factor-2/metabolism , Eukaryotic Initiation Factor-4E/metabolism , F-Box Proteins/chemistry , HeLa Cells , Humans , Molecular Sequence Data , Probability , Proteome/chemistry , Sequence Alignment
ABSTRACT
Milk is a hallmark of mammalian evolution: a unique food that has evolved with mammals. Despite the importance of this food, it is not known if variation in AA composition between different species is important to milk proteins or how it might affect the nutritional value of milk. As milk is the only food source for newborn mammals, it has long been speculated that milk proteins should be enriched in essential AA. However, no systematic analysis supports this assumption. Although many factors influence the overall nutritional value of milk, including total protein concentration, we focused here on the AA composition of milk proteins and investigated the possibility that selection drives compositional changes. We identified 9 major milk proteins present in 13 mammalian species and compared them with a large group of nonmilk proteins. Our results indicate heterogeneity in the AA composition of milk proteins, showing significant enrichment and depletion of certain AA in milk-specific proteins. Although high levels of particular AA appear to be consistently maintained, orthologous milk proteins display significant differences in AA composition across species, most notably among the caseins. Interspecies variation of milk composition is thought to be indicative of nutritional optimization to the requirements of the species. In accordance with this, our observations indicate that milk proteins may have adapted to the species-specific nutritional needs of the neonate.
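The basic quantity compared in such an analysis, the amino acid (AA) composition of a protein, can be computed as sketched below. The function names are hypothetical, and the set of essential AA used here is the nine conventionally listed for human nutrition (H, I, K, L, M, F, T, W, V); this is an assumption for illustration, since requirements differ between mammalian species and the abstract does not say which definition was applied.

```python
from collections import Counter

# Essential amino acids for humans (assumption; other species may differ).
ESSENTIAL = set("HIKLMFTWV")


def aa_composition(sequence):
    """Return the fraction of each amino acid in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def essential_fraction(sequence):
    """Return the fraction of residues that belong to the essential set."""
    composition = aa_composition(sequence)
    return sum(frac for aa, frac in composition.items() if aa in ESSENTIAL)


if __name__ == "__main__":
    toy = "AAKL"  # made-up sequence for illustration
    print(aa_composition(toy))
    print(essential_fraction(toy))
```

Comparing such fractions between milk proteins and a background set of nonmilk proteins, species by species, is the kind of enrichment and depletion analysis the abstract describes.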