ABSTRACT
We have developed a simple, fully in vitro selection procedure based on cell-free cotranslation using a highly stable and efficient in vitro virus (IVV). Cell-free cotranslation of tagged bait and prey proteins favors the formation of protein complexes and allows high-throughput analysis of protein-protein interactions (PPIs), because bait proteins are prepared in vitro rather than in vivo. Multiple selection rounds and a two-step purification in the IVV selection, followed by in vitro post-selection, help to decrease false positives. This simple IVV selection system based on cell-free cotranslation is applicable to high-throughput and comprehensive analysis of transcription factor networks.
Subjects
Protein Interaction Mapping/methods; Transcription Factors/metabolism; Animals; Cell-Free System/metabolism; DNA, Complementary/genetics; Gene Library; Humans; Protein Biosynthesis; Protein Interaction Maps; RNA, Messenger/genetics; Reverse Transcriptase Polymerase Chain Reaction/methods; Transcription Factors/analysis; Transcription Factors/genetics
ABSTRACT
Next-generation sequencing (NGS) has been applied to various kinds of omics studies, resulting in many biological and medical discoveries. However, high-throughput protein-protein interactome datasets derived from detection by sequencing are scarce, because protein-protein interaction analysis requires many cell manipulations to examine the interactions. The low reliability of the high-throughput data is also a problem. Here, we describe a cell-free display technology combined with NGS that can improve both the coverage and reliability of interactome datasets. The completely cell-free method provides high throughput and a large detection space, testing the interactions without using clones. The quantitative information provided by NGS reduces the number of false positives. The method is suitable for the in vitro detection of proteins that interact not only with the bait protein, but also with DNA, RNA and chemical compounds. Thus, it could become a universal approach for exploring the large space of protein sequences and interactome networks.
Subjects
High-Throughput Nucleotide Sequencing/methods; Protein Interaction Mapping/methods; Proteins/metabolism; Amino Acid Sequence; Animals; Base Sequence; Cell-Free System; Computational Biology; DNA, Complementary; Mice; Proteins/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; Real-Time Polymerase Chain Reaction/methods; Reproducibility of Results; Sequence Analysis, DNA
ABSTRACT
Although protein-RNA interactions (PRIs) are involved in various important cellular processes, compiled data on PRIs are still limited. This contrasts with protein-protein interactions, which have been intensively recorded in public databases and subjected to network-level analysis. Here, we introduce PRD, an online database that compiles PRIs dispersed across several sources, including the scientific literature. Currently, over 10,000 interactions have been stored in PRD using PSI-MI 2.5, which is a standard model for describing detailed molecular interactions, with an emphasis on gene-level data. Users can browse all recorded interactions and execute flexible keyword searches against the database via a web interface. Our database is not only a reference for PRIs, but will also be a valuable resource for studying the characteristics of PRI networks. AVAILABILITY: PRD can be freely accessed at http://pri.hgc.jp/
ABSTRACT
Protein-protein interactions (PPIs) are mediated through specific regions on proteins. Some proteins have two or more protein interacting regions (IRs), and some IRs are used competitively for interactions with different proteins. IRView currently contains data for 3417 IRs in human and mouse proteins. The data were obtained from different sources and combined with annotated region data from InterPro. Information on non-synonymous single nucleotide polymorphism sites and on regions that vary owing to alternative mRNA splicing is also included. The IRView web interface displays all IR data, including user-uploaded data, on reference sequences so that the positional relationship between IRs can be easily understood. IRView should be useful for analyzing underlying relationships between the proteins behind the PPI networks. AVAILABILITY: IRView is publicly available on the web at http://ir.hgc.jp/
Subjects
Databases, Protein; Protein Interaction Mapping; Proteins/analysis; Software; Alternative Splicing; Animals; Humans; Internet; Mice; Protein Structure, Tertiary
ABSTRACT
BACKGROUND: High-throughput methods for detecting protein-protein interactions enable us to obtain large interaction networks, and also allow us to computationally identify the associations of proteins as protein complexes. Although there are methods to extract protein complexes as sets of proteins from interaction networks, the extracted complexes may include false positives because these methods do not account for the structural limitations of the proteins and thus do not check that the proteins in the extracted complex can simultaneously bind to each other. In addition, there have been few attempts to gain deeper insights into the protein complexes, such as the topology of the protein-protein interactions (PPIs) or the domain-domain interactions (DDIs) that mediate the protein interactions. RESULTS: Here, we introduce a combinatorial approach for prediction of protein complexes focusing not only on determining member proteins in complexes but also on the DDI/PPI organization of the complexes. Our method analyzes complex candidates predicted by the existing methods. It searches for optimal combinations of domain-domain interactions in the candidates based on the assumption that the proteins in a candidate can form a true protein complex if each of the domains is used by a single protein interaction. This optimization problem was mathematically formulated and solved using binary integer linear programming. By using publicly available sets of yeast protein-protein interactions and domain-domain interactions, we succeeded in extracting protein complex candidates with an accuracy that is twice the average accuracy of the existing methods, MCL, MCODE, or clustering coefficient. Although tuning the parameters of each algorithm slightly improved its precision, our method still showed better precision for most values of the parameters. CONCLUSIONS: Our combinatorial approach can provide better accuracy for prediction of protein complexes and also enables identification of both the direct PPIs and the DDIs that mediate them in complexes.
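The constraint described in the abstract above (each domain may be used by only a single protein interaction) can be illustrated with a toy sketch. This is not the authors' formulation: the abstract does not state the objective function, so the sketch assumes the goal is to maximize the number of PPIs supported by a DDI, and it replaces the binary integer linear programming solver with plain exhaustive search, which is only feasible for very small candidates. The function name `best_ddi_assignment` and the domain labels such as `"A:a1"` are hypothetical.

```python
from itertools import product


def best_ddi_assignment(ppi_to_ddis):
    """Pick at most one DDI per PPI so that no domain instance is reused.

    ppi_to_ddis maps a PPI, e.g. ("A", "B"), to a list of candidate DDIs,
    each a pair of domain-instance labels, e.g. ("A:a1", "B:b1").
    Assumed objective (not from the source): maximize the number of PPIs
    that receive a DDI. Exhaustive search stands in for an ILP solver.
    """
    ppis = list(ppi_to_ddis)
    # For each PPI the options are "unsupported" (None) or one candidate DDI.
    options = [[None] + list(ppi_to_ddis[p]) for p in ppis]

    best, best_score = {}, -1
    for choice in product(*options):
        used = [dom for ddi in choice if ddi is not None for dom in ddi]
        if len(used) != len(set(used)):
            continue  # a domain instance would serve two interactions
        score = sum(ddi is not None for ddi in choice)
        if score > best_score:
            best_score = score
            best = {p: d for p, d in zip(ppis, choice) if d is not None}
    return best


# Protein A has two domains, so it can bind B and C at the same time.
candidate = {
    ("A", "B"): [("A:a1", "B:b1")],
    ("A", "C"): [("A:a1", "C:c1"), ("A:a2", "C:c1")],
}
assignment = best_ddi_assignment(candidate)
```

In this toy candidate both interactions can be supported only if the A-C interaction uses domain `a2`, because `a1` is already taken by the A-B interaction. A candidate in which every protein has a single domain but all pairs are claimed to interact would leave most PPIs unsupported, which is the kind of false positive the abstract describes.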
Subjects
Algorithms; Multiprotein Complexes/chemistry; Protein Interaction Domains and Motifs; Protein Interaction Mapping/methods; Proteins/chemistry; Proteins/metabolism; Cluster Analysis; Programming, Linear; Two-Hybrid System Techniques
ABSTRACT
Large-scale data sets of protein-protein interactions (PPIs) are a valuable resource for mapping and analysis of the topological and dynamic features of interactome networks. The currently available large-scale PPI data sets only contain information on interaction partners. The data presented in this study also include the sequences involved in the interactions (i.e., the interacting regions, IRs) suggested to correspond to functional and structural domains. Here we present the first large-scale IR data set obtained using mRNA display for 50 human transcription factors (TFs), including 12 transcription-related proteins. The core data set (966 IRs; 943 PPIs) displays a verification rate of 70%. Analysis of the IR data set revealed the existence of IRs that interact with multiple partners. Furthermore, these IRs were preferentially associated with intrinsic disorder. This finding supports the hypothesis that intrinsically disordered regions play a major role in the dynamics and diversity of TF networks through their ability to structurally adapt to and bind with multiple partners. Accordingly, this domain-based interaction resource represents an important step in refining protein interactions and networks at the domain level and in associating network analysis with biological structure and function.
Subjects
Gene Regulatory Networks; Protein Interaction Mapping/methods; Transcription Factors/genetics; Transcription Factors/metabolism; Binding Sites/genetics; Databases, Protein; Gene Expression Profiling; Humans; Models, Molecular; Protein Binding; Protein Conformation; Protein Structure, Tertiary; Proteomics; Transcription Factors/chemistry
ABSTRACT
BACKGROUND: A GC-compositional strand bias or GC-skew (=(C-G)/(C+G)), where C and G denote the numbers of cytosine and guanine residues, was recently reported near the transcription start sites (TSS) of Arabidopsis genes. However, it is unclear whether other eukaryotic species have equally prominent GC-skews, and the biological meaning of this trait remains unknown. RESULTS: Our study confirmed a significant GC-skew (C > G) in the TSS of Oryza sativa (rice) genes. The full-length cDNAs and genomic sequences from Arabidopsis and rice were compared using statistical analyses. Despite marked differences in the G+C content around the TSS in the two plants, the degrees of bias were almost identical. Although slight GC-skew peaks, including opposite skews (C < G), were detected around the TSS of genes in human and Drosophila, they were qualitatively and quantitatively different from those identified in plants. However, plant-like GC-skew in regions upstream of the translation initiation sites (TIS) in some fungi was identified following analyses of the expressed sequence tags and/or genomic sequences from other species. On the basis of our dataset, we estimated that > 70 and 68% of Arabidopsis and rice genes, respectively, had a strong GC-skew (> 0.33) in a 100-bp window (that is, the number of C residues was more than double the number of G residues in a +/-100-bp window around the TSS). The mean GC-skew value in the TSS of highly expressed genes in Arabidopsis was significantly greater than that of genes with low expression levels. Many of the GC-skew peaks were preferentially located near the TSS, so we examined the potential value of GC-skew as an index for TSS identification. Our results confirm that the GC-skew can be used to assist TSS prediction in plant genomes. CONCLUSION: The GC-skew (C > G) around the TSS is strictly conserved between monocot and eudicot plants (i.e., angiosperms in general), and a similar skew has been observed in some fungi. Highly expressed Arabidopsis genes had overall a more marked GC-skew in the TSS compared to genes with low expression levels. We therefore propose that the GC-skew around the TSS in some plants and fungi is related to transcription. It might be caused by mutations during transcription initiation or the frequent use of transcription factor-binding sites having a strand preference. In addition, GC-skew is a good candidate index for TSS prediction in plant genomes, where there is a lack of correlation between CpG islands and genes.
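As a minimal sketch of the quantity defined in the abstract above, the following computes GC-skew = (C-G)/(C+G) for a sequence and over sliding windows. The function names, the default window and step sizes, and the choice to return 0.0 for a window with no C or G are illustrative assumptions, not details taken from the study.

```python
def gc_skew(seq: str) -> float:
    """Return (C - G) / (C + G) for a DNA sequence.

    Returns 0.0 when the sequence contains neither C nor G
    (an assumed convention to avoid division by zero).
    """
    s = seq.upper()
    c = s.count("C")
    g = s.count("G")
    return (c - g) / (c + g) if (c + g) else 0.0


def windowed_gc_skew(seq: str, window: int = 100, step: int = 10):
    """Return a list of (start, skew) pairs for sliding windows along seq."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Three C per G gives (3 - 1) / (3 + 1) = 0.5, above the 0.33 threshold.
example = gc_skew("CCCG")
```

The 0.33 threshold quoted in the abstract corresponds to C being more than double G: (C-G)/(C+G) > 1/3 rearranges to C > 2G.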
Subjects
Arabidopsis/genetics; Gene Expression Regulation, Fungal; Gene Expression Regulation, Plant; Genes, Fungal; Genes, Plant; Transcription, Genetic; Base Composition; Base Sequence; CpG Islands; DNA, Complementary/metabolism; Genome; Genome, Plant; Models, Genetic; Models, Statistical; Mutation; Oryza/genetics; Plants/genetics; Protein Biosynthesis; ROC Curve; Transcription Initiation Site
ABSTRACT
A computer-based analysis was conducted to assess the characteristics of microsatellites in transcribed regions of rice and Arabidopsis. In addition, two mammals were analyzed in parallel for comparison. Our analyses confirmed a novel plant-specific feature in which there is a gradient in microsatellite density along the direction of transcription. It was also confirmed that pyrimidine-rich microsatellites are concentrated near the transcription start sites in the two plants, but not in the mammals. Our results suggest that microsatellites located at high frequency in the 5'-flanking regions of plant genes can potentially act as factors in regulating gene expression.