1.
Nucleic Acids Res ; 47(W1): W175-W182, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31127311

ABSTRACT

The discovery and development of DNA-editing nucleases (Zinc Finger Nucleases, TALENs, CRISPR/Cas systems) has given scientists the ability to precisely engineer or edit genomes as never before. Several different platforms, protocols and vectors for precision genome editing are now available, leading to the development of supporting web-based software. Here we present the Gene Sculpt Suite (GSS), which comprises three tools: (i) GTagHD, which automatically designs and generates oligonucleotides for use with the GeneWeld knock-in protocol; (ii) MEDJED, a machine learning method, which predicts the extent to which a double-stranded DNA break site will utilize the microhomology-mediated repair pathway; and (iii) MENTHU, a tool for identifying genomic locations likely to give rise to a single predominant microhomology-mediated end joining allele (PreMA) repair outcome. All tools in the GSS are freely available for download under the GPL v3.0 license and can be run locally on Windows, Mac and Linux systems capable of running R and/or Docker. The GSS is also freely available online at www.genesculpt.org.


Subject(s)
Databases, Genetic ; Gene Editing ; Genetic Engineering/methods ; Software ; Animals ; CRISPR-Cas Systems/genetics ; DNA Breaks, Double-Stranded ; Humans ; Transcription Activator-Like Effector Nucleases/genetics ; Zinc Finger Nucleases/genetics
2.
PLoS Genet ; 14(9): e1007652, 2018 09.
Article in English | MEDLINE | ID: mdl-30208061

ABSTRACT

One key problem in precision genome editing is the unpredictable plurality of sequence outcomes at the site of targeted DNA double stranded breaks (DSBs). This is due to the typical activation of the versatile Non-homologous End Joining (NHEJ) pathway. Such unpredictability limits the utility of somatic gene editing for applications including gene therapy and functional genomics. For germline editing work, the accurate reproduction of the identical alleles using NHEJ is a labor intensive process. In this study, we propose Microhomology-mediated End Joining (MMEJ) as a viable solution for improving somatic sequence homogeneity in vivo, capable of generating a single predictable allele at high rates (56% ~ 86% of the entire mutant allele pool). Using a combined dataset from zebrafish (Danio rerio) in vivo and human HeLa cell in vitro, we identified specific contextual sequence determinants surrounding genomic DSBs for robust MMEJ pathway activation. We then applied our observation to prospectively design MMEJ-inducing sgRNAs against a variety of proof-of-principle genes and demonstrated high levels of mutant allele homogeneity. MMEJ-based DNA repair at these target loci successfully generated F0 mutant zebrafish embryos and larvae that faithfully recapitulated previously reported, recessive, loss-of-function phenotypes. We also tested the generalizability of our approach in cultured human cells. Finally, we provide a novel algorithm, MENTHU (http://genesculpt.org/menthu/), for improved and facile prediction of candidate MMEJ loci. We believe that this MMEJ-centric approach will have a broader impact on genome engineering and its applications. For example, whereas somatic mosaicism hinders efficient recreation of knockout mutant allele at base pair resolution via the standard NHEJ-based approach, we demonstrate that F0 founders transmitted the identical MMEJ allele of interest at high rates. 
Most importantly, the ability to directly dictate the reading frame of an endogenous target will have important implications for gene therapy applications in human genetic diseases.


Subject(s)
DNA Breaks, Double-Stranded ; DNA End-Joining Repair/genetics ; Gene Editing/methods ; Models, Genetic ; Algorithms ; Alleles ; Animals ; Feasibility Studies ; Female ; Genetic Diseases, Inborn/genetics ; Genetic Diseases, Inborn/therapy ; Genetic Therapy/methods ; HeLa Cells ; Humans ; Male ; Mutagenesis, Site-Directed ; RNA, Guide, Kinetoplastida/genetics ; RNA, Guide, Kinetoplastida/metabolism ; Zebrafish
3.
Proteins ; 87(3): 198-211, 2019 03.
Article in English | MEDLINE | ID: mdl-30536635

ABSTRACT

RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequence or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein-RNA interfaces, there is a need for computational methods to identify RNA-binding residues in proteins. While most of the existing computational methods for predicting RNA-binding residues in RNA-binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP, and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA-specificity metric (RSM), for quantifying the RNA-specificity of the RNA binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by the state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in PDB are not representative of the protein-RNA interactions in nature, or (b) the current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences in RNA partner-specific versus partner-agnostic protein-RNA interactions, or both.


Subject(s)
Computational Biology ; Proteins/chemistry ; RNA-Binding Proteins/genetics ; RNA/genetics ; Amino Acid Sequence/genetics ; Base Sequence/genetics ; Binding Sites/genetics ; Models, Molecular ; Protein Binding/genetics ; Protein Conformation ; Proteins/genetics ; RNA/chemistry ; RNA-Binding Motifs/genetics ; RNA-Binding Proteins/chemistry ; Sequence Analysis, Protein ; Software
4.
Brief Bioinform ; 18(3): 458-466, 2017 05 01.
Article in English | MEDLINE | ID: mdl-27013645

ABSTRACT

Although many advanced and sophisticated ab initio approaches for modeling protein-protein complexes have been proposed in past decades, template-based modeling (TBM) remains the most accurate and widely used approach, given a reliable template is available. However, there are many different ways to exploit template information in the modeling process. Here, we systematically evaluate and benchmark a TBM method that uses conserved interfacial residue pairs as docking distance restraints [referred to as alpha carbon-alpha carbon (CA-CA)-guided docking]. We compare it with two other template-based protein-protein modeling approaches, including a conserved non-pairwise interfacial residue restrained docking approach [referred to as the ambiguous interaction restraint (AIR)-guided docking] and a simple superposition-based modeling approach. Our results show that, for most cases, the CA-CA-guided docking method outperforms both superposition with refinement and the AIR-guided docking method. We emphasize the superiority of the CA-CA-guided docking on cases with medium to large conformational changes, and interactions mediated through loops, tails or disordered regions. Our results also underscore the importance of a proper refinement of superimposition models to reduce steric clashes. In summary, we provide a benchmarked TBM protocol that uses conserved pairwise interface distance as restraints in generating realistic 3D protein-protein interaction models, when reliable templates are available. The described CA-CA-guided docking protocol is based on the HADDOCK platform, which allows users to incorporate additional prior knowledge of the target system to further improve the quality of the resulting models.


Subject(s)
Proteins/metabolism ; Models, Molecular ; Protein Binding
5.
Retrovirology ; 14(1): 40, 2017 Aug 22.
Article in English | MEDLINE | ID: mdl-28830558

ABSTRACT

BACKGROUND: Rev-like proteins are post-transcriptional regulatory proteins found in several retrovirus genera, including lentiviruses, betaretroviruses, and deltaretroviruses. These essential proteins mediate the nuclear export of incompletely spliced viral RNA, and act by tethering viral pre-mRNA to the host CRM1 nuclear export machinery. Although all Rev-like proteins are functionally homologous, they share less than 30% sequence identity. In the present study, we computationally assessed the extent of structural homology among retroviral Rev-like proteins within a phylogenetic framework. RESULTS: We undertook a comprehensive analysis of overall protein domain architecture and predicted secondary structural features for representative members of the Rev-like family of proteins. Similar patterns of α-helical domains were identified for Rev-like proteins within each genus, with the exception of deltaretroviruses, which were devoid of α-helices. Coiled-coil oligomerization motifs were also identified for most Rev-like proteins, with the notable exceptions of HIV-1, the deltaretroviruses, and some small ruminant lentiviruses. In Rev proteins of primate lentiviruses, the presence of predicted coiled-coil motifs segregated within specific primate lineages: HIV-1 descended from SIVs that lacked predicted coiled-coils in Rev whereas HIV-2 descended from SIVs that contained predicted coiled-coils in Rev. Phylogenetic ancestral reconstruction of coiled-coils for all Rev-like proteins predicted a single origin for the coiled-coil motif, followed by three losses of the predicted signal. The absence of a coiled-coil signal in HIV-1 was associated with replacement of canonical polar residues with non-canonical hydrophobic residues. However, hydrophobic residues were retained in the key 'a' and 'd' positions, and the α-helical region of HIV-1 Rev oligomerization domain could be modeled as a helical wheel with two predicted interaction interfaces. 
Moreover, the predicted interfaces mapped to the dimerization and oligomerization interfaces in HIV-1 Rev crystal structures. Helical wheel projections of other retroviral Rev-like proteins, including endogenous sequences, revealed similar interaction interfaces that could mediate oligomerization. CONCLUSIONS: Sequence-based computational analyses of Rev-like proteins, together with helical wheel projections of oligomerization domains, reveal a conserved homogeneous structural basis for oligomerization by retroviral Rev-like proteins.


Subject(s)
Gene Products, rev/chemistry ; Gene Products, rev/metabolism ; Models, Molecular ; Retroviridae/chemistry ; Retroviridae/metabolism ; Amino Acid Sequence ; Dimerization ; Genetic Variation ; Phylogeny ; Protein Structure, Secondary ; Retroviridae Proteins/chemistry ; Retroviridae Proteins/metabolism ; Sequence Homology, Amino Acid
6.
New Phytol ; 212(2): 444-60, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27265684

ABSTRACT

Heterodera glycines, the soybean cyst nematode, delivers effector proteins into soybean roots to initiate and maintain an obligate parasitic relationship. HgGLAND18 encodes a candidate H. glycines effector and is expressed throughout the infection process. We used a combination of molecular, genetic, bioinformatic and phylogenetic analyses to determine the role of HgGLAND18 during H. glycines infection. HgGLAND18 is necessary for pathogenicity in compatible interactions with soybean. The encoded effector strongly suppresses both basal and hypersensitive cell death innate immune responses, and immunosuppression requires the presence and coordination between multiple protein domains. The N-terminal domain in HgGLAND18 contains unique sequence similarity to domains of an immunosuppressive effector of Plasmodium spp., the malaria parasites. The Plasmodium effector domains functionally complement the loss of the N-terminal domain from HgGLAND18. In-depth sequence searches and phylogenetic analyses demonstrate convergent evolution between effectors from divergent parasites of plants and animals as the cause of sequence and functional similarity.


Subject(s)
Glycine max/immunology ; Glycine max/parasitology ; Immunity, Innate ; Plant Immunity ; Plasmodium/physiology ; Tylenchoidea/physiology ; Virulence Factors/metabolism ; Amino Acid Sequence ; Animals ; Genetic Complementation Test ; Mutation/genetics ; Plant Proteins/chemistry ; Plant Roots/parasitology ; Polymorphism, Genetic ; Protein Domains ; RNA Interference ; Repetitive Sequences, Nucleic Acid/genetics ; Tylenchoidea/pathogenicity ; Virulence
7.
Mol Cell ; 31(2): 294-301, 2008 Jul 25.
Article in English | MEDLINE | ID: mdl-18657511

ABSTRACT

Custom-made zinc-finger nucleases (ZFNs) can induce targeted genome modifications with high efficiency in cell types including Drosophila, C. elegans, plants, and humans. A bottleneck in the application of ZFN technology has been the generation of highly specific engineered zinc-finger arrays. Here we describe OPEN (Oligomerized Pool ENgineering), a rapid, publicly available strategy for constructing multifinger arrays, which we show is more effective than the previously published modular assembly method. We used OPEN to construct 37 highly active ZFN pairs which induced targeted alterations with high efficiencies (1%-50%) at 11 different target sites located within three endogenous human genes (VEGF-A, HoxB13, and CFTR), an endogenous plant gene (tobacco SuRA), and a chromosomally integrated EGFP reporter gene. In summary, OPEN provides an "open-source" method for rapidly engineering highly active zinc-finger arrays, thereby enabling broader practice, development, and application of ZFN technology for biological research and gene therapy.


Subject(s)
Endonucleases/metabolism ; Genetic Engineering/methods ; Zinc Fingers ; Base Sequence ; Endonucleases/toxicity ; Gene Targeting ; Green Fluorescent Proteins/genetics ; Humans ; K562 Cells ; Molecular Sequence Data ; Mutagenesis ; Mutation/genetics ; Protein Conformation
8.
Proteins ; 82(2): 250-67, 2014 Feb.
Article in English | MEDLINE | ID: mdl-23873600

ABSTRACT

Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by the partner-specific sequence homology-based protein-protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving Success Rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/.


Subject(s)
Molecular Docking Simulation ; Software ; Ligands ; Protein Interaction Domains and Motifs ; Protein Structure, Quaternary ; Receptors, Cell Surface/chemistry ; Sequence Homology, Amino Acid ; Structural Homology, Protein ; Thermodynamics
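
Illustrative note on record 8: the two evaluation measures defined in the DockRank abstract, Success Rate and Hit Rate, can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code. It assumes each docking case is given as a ranked list of booleans (True marks a near-native conformation, best rank first) and that Hit Rate is computed per case; the function names and the toy data are hypothetical.

```python
from typing import Dict, List


def success_rate(cases: Dict[str, List[bool]], m: int) -> float:
    """Percentage of cases with at least one near-native conformation in the top m."""
    if not cases:
        raise ValueError("no docking cases given")
    successes = sum(1 for ranked in cases.values() if any(ranked[:m]))
    return 100.0 * successes / len(cases)


def hit_rate(ranked: List[bool], m: int) -> float:
    """Percentage of one case's near-native conformations that fall in the top m."""
    total_near_native = sum(ranked)
    if total_near_native == 0:
        raise ValueError("case has no near-native conformations")
    return 100.0 * sum(ranked[:m]) / total_near_native


# Hypothetical toy data: True marks a near-native conformation, best rank first.
toy_cases = {
    "case_a": [False, True, False, True],
    "case_b": [False, False, False, True],
}

if __name__ == "__main__":
    print(success_rate(toy_cases, m=2))        # only case_a succeeds at m=2
    print(hit_rate(toy_cases["case_a"], m=2))  # one of two near-natives in top 2
```

With these definitions, a scoring function that moves near-native conformations toward the top of the list raises both measures for a fixed m.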
9.
Retrovirology ; 11: 115, 2014 Dec 23.
Article in English | MEDLINE | ID: mdl-25533001

ABSTRACT

BACKGROUND: The lentiviral Rev protein mediates nuclear export of intron-containing viral RNAs that encode structural proteins or serve as the viral genome. Following translation, HIV-1 Rev localizes to the nucleus and binds its cognate sequence, termed the Rev-responsive element (RRE), in incompletely spliced viral RNA. Rev subsequently multimerizes along the viral RNA and associates with the cellular Crm1 export machinery to translocate the RNA-protein complex to the cytoplasm. Equine infectious anemia virus (EIAV) Rev is functionally homologous to HIV-1 Rev, but shares very little sequence similarity and differs in domain organization. EIAV Rev also contains a bipartite RNA binding domain comprising two short arginine-rich motifs (designated ARM-1 and ARM-2) spaced 79 residues apart in the amino acid sequence. To gain insight into the topology of the bipartite RNA binding domain, a computational approach was used to model the tertiary structure of EIAV Rev. RESULTS: The tertiary structure of EIAV Rev was modeled using several protein structure prediction and model quality assessment servers. Two types of structures were predicted: an elongated structure with an extended central alpha helix, and a globular structure with a central bundle of helices. Assessment of models on the basis of biophysical properties indicated they were of average quality. In almost all models, ARM-1 and ARM-2 were spatially separated by >15 Å, suggesting that they do not form a single RNA binding interface on the monomer. A highly conserved canonical coiled-coil motif was identified in the central region of EIAV Rev, suggesting that an RNA binding interface could be formed through dimerization of Rev and juxtaposition of ARM-1 and ARM-2. In support of this, purified Rev protein migrated as a dimer in Blue native gels, and mutation of a residue predicted to form a key coiled-coil contact disrupted dimerization and abrogated RNA binding. 
In contrast, mutation of residues outside the predicted coiled-coil interface had no effect on dimerization or RNA binding. CONCLUSIONS: Our results suggest that EIAV Rev binding to the RRE requires dimerization via a coiled-coil motif to juxtapose two RNA binding motifs, ARM-1 and ARM-2.


Subject(s)
Gene Products, rev/chemistry ; Gene Products, rev/metabolism ; Infectious Anemia Virus, Equine/physiology ; Protein Multimerization ; RNA, Viral/metabolism ; Models, Molecular ; Protein Binding ; Protein Conformation
10.
Nat Methods ; 8(1): 67-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21151135

ABSTRACT

Engineered zinc-finger nucleases (ZFNs) enable targeted genome modification. Here we describe context-dependent assembly (CoDA), a platform for engineering ZFNs using only standard cloning techniques or custom DNA synthesis. Using CoDA-generated ZFNs, we rapidly altered 20 genes in Danio rerio, Arabidopsis thaliana and Glycine max. The simplicity and efficacy of CoDA will enable broad adoption of ZFN technology and make possible large-scale projects focused on multigene pathways or genome-wide alterations.


Subject(s)
Endonucleases/genetics ; Endonucleases/metabolism ; Protein Engineering ; Zinc Fingers/physiology ; Animals ; Arabidopsis/genetics ; DNA-Binding Proteins/genetics ; DNA-Binding Proteins/metabolism ; Genome ; Glycine max/genetics ; Zebrafish/genetics ; Zinc Fingers/genetics
11.
Proc Natl Acad Sci U S A ; 108(23): 9443-8, 2011 Jun 07.
Article in English | MEDLINE | ID: mdl-21606328

ABSTRACT

Telomerases constitute a group of specialized ribonucleoprotein enzymes that remediate chromosomal shrinkage resulting from the "end-replication" problem. Defects in telomere length regulation are associated with several diseases as well as with aging and cancer. Despite significant progress in understanding the roles of telomerase, the complete structure of the human telomerase enzyme bound to telomeric DNA remains elusive, with the detailed molecular mechanism of telomere elongation still unknown. By application of computational methods for distant homology detection, comparative modeling, and molecular docking, guided by available experimental data, we have generated a three-dimensional structural model of a partial telomerase elongation complex composed of three essential protein domains bound to a single-stranded telomeric DNA sequence in the form of a heteroduplex with the template region of the human RNA subunit, TER. This model provides a structural mechanism for the processivity of telomerase and offers new insights into elongation. We conclude that the RNA-DNA heteroduplex is constrained by the telomerase TEN domain through repeated extension cycles and that the TEN domain controls the process by moving the template ahead one base at a time by translation and rotation of the double helix. The RNA region directly following the template can bind complementarily to the newly synthesized telomeric DNA, while the template itself is reused in the telomerase active site during the next reaction cycle. This first structural model of the human telomerase enzyme provides many details of the molecular mechanism of telomerase and immediately provides an important target for rational drug design.


Subject(s)
DNA/chemistry ; Protein Structure, Tertiary ; Telomerase/chemistry ; Telomere/chemistry ; Amino Acid Sequence ; Binding Sites/genetics ; Catalytic Domain ; Computer Simulation ; DNA/genetics ; DNA/metabolism ; Humans ; Kinetics ; Models, Molecular ; Molecular Sequence Data ; Nucleic Acid Conformation ; Nucleic Acid Heteroduplexes/chemistry ; Nucleic Acid Heteroduplexes/genetics ; Nucleic Acid Heteroduplexes/metabolism ; Polymerization ; Protein Binding ; Protein Structure, Secondary ; RNA/chemistry ; RNA/genetics ; RNA/metabolism ; Sequence Homology, Amino Acid ; Telomerase/genetics ; Telomerase/metabolism ; Telomere/genetics ; Telomere/metabolism
12.
Nucleic Acids Res ; 39(Database issue): D277-82, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21071426

ABSTRACT

The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein-RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein-RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein-RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.


Subject(s)
Databases, Protein ; RNA-Binding Proteins/chemistry ; RNA/chemistry ; Amino Acids/chemistry ; Binding Sites ; Nucleic Acid Conformation ; Protein Conformation ; Ribonucleotides/chemistry ; User-Computer Interface
13.
Proc Natl Acad Sci U S A ; 107(26): 12028-33, 2010 Jun 29.
Article in English | MEDLINE | ID: mdl-20508152

ABSTRACT

We report here an efficient method for targeted mutagenesis of Arabidopsis genes through regulated expression of zinc finger nucleases (ZFNs), enzymes engineered to create DNA double-strand breaks at specific target loci. ZFNs recognizing the Arabidopsis ADH1 and TT4 genes were made by Oligomerized Pool ENgineering (OPEN), a publicly available, selection-based platform that yields high-quality zinc finger arrays. The ADH1 and TT4 ZFNs were placed under control of an estrogen-inducible promoter and introduced into Arabidopsis plants by floral-dip transformation. Primary transgenic Arabidopsis seedlings induced to express the ADH1 or TT4 ZFNs exhibited somatic mutation frequencies of 7% or 16%, respectively. The induced mutations were typically insertions or deletions (1-142 bp) that were localized at the ZFN cleavage site and likely derived from imprecise repair of chromosome breaks by nonhomologous end-joining. Mutations were transmitted to the next generation for 69% of primary transgenics expressing the ADH1 ZFNs and 33% of transgenics expressing the TT4 ZFNs. Furthermore, approximately 20% of the mutant-producing plants were homozygous for mutations at ADH1 or TT4, indicating that both alleles were disrupted. ADH1 and TT4 were chosen as targets for this study because of their selectable or screenable phenotypes (adh1, allyl alcohol resistance; tt4, lack of anthocyanins in the seed coat). However, the high frequency of observed ZFN-induced mutagenesis suggests that targeted mutations can readily be recovered by simply screening progeny of primary transgenic plants by PCR and DNA sequencing. Taken together, our results suggest that it should now be possible to obtain mutations in any Arabidopsis target gene regardless of its mutant phenotype.


Subject(s)
Arabidopsis Proteins/genetics ; Arabidopsis/genetics ; Deoxyribonucleases/genetics ; Mutagenesis, Site-Directed ; Zinc Fingers/genetics ; Alcohol Dehydrogenase/genetics ; Arabidopsis/metabolism ; Base Sequence ; DNA Repair ; DNA, Plant/genetics ; DNA, Plant/metabolism ; Deoxyribonucleases/metabolism ; Gene Targeting ; Genes, Plant ; Molecular Sequence Data ; Plants, Genetically Modified ; Protein Engineering ; Protoplasts/metabolism ; Saccharomyces cerevisiae/genetics ; Saccharomyces cerevisiae/metabolism
14.
BMC Bioinformatics ; 13: 41, 2012 Mar 18.
Article in English | MEDLINE | ID: mdl-22424103

ABSTRACT

BACKGROUND: Identification of the residues in protein-protein interaction sites has a significant impact in problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tend to be conserved even among remote structural homologs, we introduce PrISE, a family of local structural similarity-based computational methods for predicting protein-protein interface residues. RESULTS: We present a novel representation of the surface residues of a protein in the form of structural elements. Each structural element consists of a central residue and its surface neighbors. The PrISE family of interface prediction methods uses a representation of structural elements that captures the atomic composition and accessible surface area of the residues that make up each structural element. Each of the members of the PrISE methods identifies, for each structural element in the query protein, a collection of similar structural elements in its repository of structural elements and weights them according to their similarity with the structural element of the query protein. PrISE_L relies on the similarity between structural elements (i.e. local structural similarity). PrISE_G relies on the similarity between protein surfaces (i.e. general structural similarity). PrISE_C combines local structural similarity and general structural similarity to predict interface residues. These predictors label the central residue of a structural element in a query protein as an interface residue if a weighted majority of the structural elements that are similar to it are interface residues, and as a non-interface residue otherwise. The results of our experiments using three representative benchmark datasets show that PrISE_C outperforms PrISE_L and PrISE_G, and that PrISE_C is highly competitive with state-of-the-art structure-based methods for predicting protein-protein interface residues. 
Our comparison of PrISE_C with PredUs, a recently developed method for predicting interface residues of a query protein based on the known interface residues of its (global) structural homologs, shows that performance superior or comparable to that of PredUs can be obtained using only local surface structural similarity. PrISE_C is available as a Web server at http://prise.cs.iastate.edu/. CONCLUSIONS: Local surface structural similarity based methods offer a simple, efficient, and effective approach to predict protein-protein interface residues.


Subject(s)
Protein Interaction Domains and Motifs ; Proteins/chemistry ; Software ; Algorithms ; Models, Molecular ; Protein Conformation ; Proteins/metabolism
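
Illustrative note on record 14: the labeling rule stated in the PrISE abstract (a central residue is called an interface residue if a weighted majority of the similar structural elements are interface residues) can be sketched as below. This is a hedged sketch only. The abstract does not say how similarity weights are computed, so the weights here are plain numbers supplied by the caller, and the function name and input layout are hypothetical.

```python
from typing import List, Tuple


def label_central_residue(similar_elements: List[Tuple[float, bool]]) -> bool:
    """Weighted majority vote over similar structural elements.

    Each entry is (similarity_weight, is_interface) for one structural element
    retrieved from the repository. Returns True (interface) only when the
    interface-labeled weight exceeds half of the total weight.
    """
    interface_weight = sum(w for w, is_interface in similar_elements if is_interface)
    total_weight = sum(w for w, _ in similar_elements)
    return interface_weight > total_weight / 2


if __name__ == "__main__":
    # One strongly similar interface element outweighs two weaker non-interface ones.
    print(label_central_residue([(0.9, True), (0.3, False), (0.4, False)]))
    # A weakly similar interface element loses to a more similar non-interface one.
    print(label_central_residue([(0.2, True), (0.5, False)]))
```

Ties and empty inputs fall to the non-interface label in this sketch, which matches the abstract's "non-interface residue otherwise" wording but is otherwise a design choice of the example.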
15.
BMC Bioinformatics ; 13: 89, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22574904

ABSTRACT

BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. 
These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets constructed using several alternative definitions of interface residues and redundancy cutoffs, and of including evaluations on independent test sets in the comparisons.
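
The performance measures named in this abstract (Sensitivity, Specificity, Matthews Correlation Coefficient) can be computed from residue-level confusion counts. The following is a minimal sketch, not the authors' evaluation code. It assumes the common textbook definitions; the abstract does not define the measures, and some interface-prediction papers use "specificity" for TP/(TP+FP), so that value is returned separately as `precision`. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return common binary-classification measures from confusion counts.

    Assumed definitions (not taken from the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
      precision   = TP / (TP + FP)
      MCC         = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Any ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, denom),
    }


if __name__ == "__main__":
    # Illustrative counts only; they are not results from the paper.
    print(binary_metrics(tp=50, fp=10, tn=30, fn=10))
```

Because MCC uses all four cells of the confusion matrix, it is less sensitive than Sensitivity or Specificity alone to the trade-off described above, where a gain in one measure is paid for by a loss in the other.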


Subject(s)
Artificial Intelligence , RNA-Binding Proteins/chemistry , RNA/chemistry , Algorithms , Amino Acids/chemistry , Bayes Theorem , Humans , Position-Specific Scoring Matrices , Protein Conformation , RNA/metabolism , RNA-Binding Proteins/metabolism , Sequence Analysis, Protein , Support Vector Machine
16.
Plant Physiol ; 156(2): 466-73, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21464476

ABSTRACT

We performed targeted mutagenesis of a transgene and nine endogenous soybean (Glycine max) genes using zinc-finger nucleases (ZFNs). A suite of ZFNs was engineered using the recently described context-dependent assembly platform, a rapid, open-source method for generating zinc-finger arrays. Specific ZFNs targeting dicer-like (DCL) genes and other genes involved in RNA silencing were cloned into a vector under an estrogen-inducible promoter. A hairy-root transformation system was employed to investigate the efficiency of ZFN mutagenesis at each target locus. Transgenic roots exhibited somatic mutations localized at the ZFN target sites for seven out of nine targeted genes. We next introduced a ZFN into soybean via whole-plant transformation and generated independent mutations in the paralogous genes DCL4a and DCL4b. The dcl4b mutation showed efficient heritable transmission of the ZFN-induced mutation in the subsequent generation. These findings indicate that ZFN-based mutagenesis provides an efficient method for making mutations in duplicate genes that are otherwise difficult to study due to redundancy. We also developed a publicly accessible Web-based tool to identify sites suitable for engineering context-dependent assembly ZFNs in the soybean genome.


Subject(s)
Endonucleases/chemistry , Endonucleases/metabolism , Genes, Duplicate/genetics , Genes, Plant/genetics , Genetic Techniques , Glycine max/genetics , Mutagenesis/genetics , Zinc Fingers/genetics , Base Sequence , Green Fluorescent Proteins/metabolism , Inheritance Patterns/genetics , Internet , Molecular Sequence Data , Mutation/genetics , Plant Roots/genetics , Polymerase Chain Reaction , Transgenes/genetics
17.
Nucleic Acids Res ; 38(Web Server issue): W462-8, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20435679

ABSTRACT

ZiFiT (Zinc Finger Targeter) is a simple and intuitive web-based tool that provides an interface to identify potential binding sites for engineered zinc finger proteins (ZFPs) in user-supplied DNA sequences. In this updated version, ZiFiT identifies potential sites for ZFPs made by both the modular assembly and OPEN engineering methods. In addition, ZiFiT now integrates additional tools and resources including scoring schemes for modular assembly, an interface with the Zinc Finger Database (ZiFDB) of engineered ZFPs, and direct querying of NCBI BLAST servers for identifying potential off-target sites within a host genome. Taken together, these features facilitate design of ZFPs using reagents made available to the academic research community by the Zinc Finger Consortium. ZiFiT is freely available on the web without registration at http://bindr.gdcb.iastate.edu/ZiFiT/.


Subject(s)
DNA-Binding Proteins/chemistry , Protein Engineering , Software , Zinc Fingers , Binding Sites , DNA-Binding Proteins/metabolism , Internet , Sequence Analysis, DNA , User-Computer Interface
18.
BMC Bioinformatics ; 12: 244, 2011 Jun 17.
Article in English | MEDLINE | ID: mdl-21682895

ABSTRACT

BACKGROUND: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. RESULTS: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. 
The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. CONCLUSIONS: Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
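
The general idea of homology-based interface inference described in this abstract can be illustrated by copying known interface labels from a homolog onto a query through a pairwise alignment. The sketch below is an illustration of that idea only and is not the HomPPI algorithm: the function names, the gap character `-`, and the simple voting threshold are all assumptions, and the abstract's sequence similarity criteria for selecting homologs are not modeled.

```python
def transfer_interface_labels(aligned_query, aligned_homolog, homolog_labels):
    """Map a homolog's per-residue interface labels onto the query.

    aligned_query / aligned_homolog: equal-length gapped strings ('-' = gap).
    homolog_labels: one 0/1 label per ungapped homolog residue.
    Returns one entry per ungapped query residue; None where the query
    residue is aligned to a gap and no label can be transferred.
    """
    if len(aligned_query) != len(aligned_homolog):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    h = 0  # index into the ungapped homolog sequence
    for q_char, h_char in zip(aligned_query, aligned_homolog):
        h_label = None
        if h_char != "-":
            h_label = homolog_labels[h]
            h += 1
        if q_char != "-":
            labels.append(h_label)
    return labels


def vote(per_homolog_labels, threshold=0.5):
    """Combine transferred labels from several homologs (hypothetical rule).

    A query residue is predicted as interface when the fraction of homologs
    labelling it as interface is at least `threshold`; residues with no
    transferred label are predicted as non-interface.
    """
    n = len(per_homolog_labels[0])
    predictions = []
    for i in range(n):
        votes = [lab[i] for lab in per_homolog_labels if lab[i] is not None]
        predictions.append(bool(votes) and sum(votes) / len(votes) >= threshold)
    return predictions


if __name__ == "__main__":
    # Toy alignment; the sequences and labels are made up for illustration.
    first = transfer_interface_labels("ACD-EF", "A-DGEF", [1, 0, 1, 1, 0])
    second = [1, 1, 0, 0, 0]
    print(first)
    print(vote([first, second]))
```

A partner-specific variant in the spirit of PS-HomPPI would restrict the homologs to those whose labels come from complexes with a homolog of the target protein; that selection step is left out here.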


Subject(s)
Proteins/metabolism , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Humans , Proteins/chemistry , Sequence Homology
19.
BMC Bioinformatics ; 12: 489, 2011 Dec 22.
Article in English | MEDLINE | ID: mdl-22192482

ABSTRACT

BACKGROUND: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. RESULTS: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. 
CONCLUSIONS: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
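
The sequence encoding described in this abstract (a normalized 4-mer composition for the RNA and a normalized 3-mer composition over a 7-letter reduced alphabet for the protein) can be sketched as follows. This is an illustration under stated assumptions, not the RPISeq implementation: the abstract does not list the seven amino acid groups, so a commonly used conjoint-triad style grouping is assumed, and "normalized" is taken to mean dividing each count by the total number of counted k-mers. All function and constant names are hypothetical.

```python
from itertools import product

RNA_ALPHABET = "ACGU"
# ASSUMPTION: seven amino acid groups in the conjoint-triad style.
# The abstract only says "7-letter reduced alphabet".
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}
REDUCED_ALPHABET = "0123456"


def kmer_composition(symbols, alphabet, k):
    """Return k-mer frequencies over `alphabet`, in lexicographic k-mer order.

    K-mers containing symbols outside the alphabet are skipped.
    ASSUMPTION: normalization divides by the number of counted k-mers.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(symbols) - k + 1):
        kmer = symbols[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def encode_rna(seq):
    """4-mer composition of an RNA sequence (4**4 = 256 features)."""
    return kmer_composition(seq.upper().replace("T", "U"), RNA_ALPHABET, 4)


def encode_protein(seq):
    """3-mer composition over the reduced alphabet (7**3 = 343 features)."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in seq.upper())
    return kmer_composition(reduced, REDUCED_ALPHABET, 3)


def encode_pair(rna_seq, protein_seq):
    """Concatenate both encodings into one 599-dimensional feature vector."""
    return encode_rna(rna_seq) + encode_protein(protein_seq)


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    features = encode_pair("ACGUACGUGGCA", "MKVLAAGIRKDE")
    print(len(features))
```

A vector of this form could then be passed to any standard classifier, such as the SVM and Random Forest variants named above.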


Subject(s)
Algorithms , Protein Interaction Maps , Proteins/metabolism , RNA-Binding Proteins/metabolism , RNA/chemistry , Sequence Analysis, RNA , Animals , Databases, Protein , Drosophila melanogaster/metabolism , Escherichia coli/metabolism , Humans , Mice , RNA/metabolism , RNA Stability , Saccharomyces cerevisiae/metabolism , Software , Support Vector Machine
20.
BMC Genomics ; 12: 83, 2011 Jan 28.
Article in English | MEDLINE | ID: mdl-21276248

ABSTRACT

BACKGROUND: Zinc Finger Nucleases (ZFNs) have tremendous potential as tools to facilitate genomic modifications, such as precise gene knockouts or gene replacements by homologous recombination. ZFNs can be used to advance both basic research and clinical applications, including gene therapy. Recently, the ability to engineer ZFNs that target any desired genomic DNA sequence with high fidelity has improved significantly with the introduction of rapid, robust, and publicly available techniques for ZFN design such as the Oligomerized Pool ENgineering (OPEN) method. The motivation for this study is to make resources for genome modifications using OPEN-generated ZFNs more accessible to researchers by creating a user-friendly interface that identifies and provides quality scores for all potential ZFN target sites in the complete genomes of several model organisms. DESCRIPTION: ZFNGenome is a GBrowse-based tool for identifying and visualizing potential target sites for OPEN-generated ZFNs. ZFNGenome currently includes a total of more than 11.6 million potential ZFN target sites, mapped within the fully sequenced genomes of seven model organisms: S. cerevisiae, C. reinhardtii, A. thaliana, D. melanogaster, D. rerio, C. elegans, and H. sapiens. These sites can be visualized within the flexible GBrowse environment. Additional model organisms will be included in future updates. ZFNGenome provides information about each potential ZFN target site, including its chromosomal location and position relative to transcription initiation site(s). Users can query ZFNGenome using several different criteria (e.g., gene ID, transcript ID, target site sequence). Tracks in ZFNGenome also provide "uniqueness" and ZiFOpT (Zinc Finger OPEN Targeter) "confidence" scores that estimate the likelihood that a chosen ZFN target site will function in vivo. 
ZFNGenome is dynamically linked to ZiFDB, allowing users access to all available information about zinc finger reagents, such as the effectiveness of a given ZFN in creating double-stranded breaks. CONCLUSIONS: ZFNGenome provides a user-friendly interface that allows researchers to access resources and information regarding genomic target sites for engineered ZFNs in seven model organisms. This genome-wide database of potential ZFN target sites should greatly facilitate the utilization of ZFNs in both basic and clinical research. ZFNGenome is freely available at: http://bindr.gdcb.iastate.edu/ZFNGenome or at the Zinc Finger Consortium website: http://www.zincfingers.org/.


Subject(s)
Endonucleases/genetics , Zinc Fingers/genetics , Animals , Arabidopsis/enzymology , Binding Sites/genetics , Caenorhabditis elegans/enzymology , Chlamydomonas reinhardtii/enzymology , Databases, Genetic , Drosophila melanogaster/enzymology , Humans , Saccharomyces cerevisiae/enzymology , Software , Transcription Initiation Site , Zebrafish