Results 1 - 20 of 24
1.
Genome Res ; 22(6): 1173-83, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22454233

ABSTRACT

We developed PolyA-seq, a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts, and used it to globally map polyadenylation (polyA) sites in 24 matched tissues in human, rhesus, dog, mouse, and rat. We show that PolyA-seq is as accurate as existing RNA sequencing (RNA-seq) approaches for digital gene expression (DGE), enabling simultaneous mapping of polyA sites and quantitative measurement of their usage. In human, we confirmed 158,533 known sites and discovered 280,857 novel sites (FDR < 2.5%). On average 10% of novel human sites were also detected in matched tissues in other species. Most novel sites represent uncharacterized alternative polyA events and extensions of known transcripts in human and mouse, but primarily delineate novel transcripts in the other three species. A total of 69.1% of known human genes that we detected have multiple polyA sites in their 3'UTRs, with 49.3% having three or more. We also detected polyadenylation of noncoding and antisense transcripts, including constitutive and tissue-specific primary microRNAs. The canonical polyA signal was strongly enriched and positionally conserved in all species. In general, usage of polyA sites is more similar within the same tissues across different species than within a species. These quantitative maps of polyA usage in evolutionarily and functionally related samples constitute a resource for understanding the regulatory mechanisms underlying alternative polyadenylation.


Subjects
Mammals/genetics; Poly A/genetics; Polyadenylation/genetics; 3' Untranslated Regions; Animals; Chick Embryo; Dogs; Evolution, Molecular; High-Throughput Nucleotide Sequencing/methods; Humans; Macaca mulatta/genetics; Mice; MicroRNAs/genetics; Models, Genetic; RNA, Untranslated; Rats; Transcriptome
2.
BMC Genomics ; 11: 473, 2010 Aug 13.
Article in English | MEDLINE | ID: mdl-20707912

ABSTRACT

BACKGROUND: Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given that hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. RESULTS: Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. CONCLUSION: Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing.


Subjects
Gene Expression Profiling/methods; Gene Expression; Genetic Variation; Quantitative Trait Loci; Sequence Analysis, DNA/methods; Alleles; Alternative Splicing; Animals; Antisense Elements (Genetics)/genetics; Mice; Oligonucleotide Array Sequence Analysis; RNA, Untranslated/genetics; Transcription, Genetic
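Detecting allele-specific expression at a heterozygous site usually comes down to asking whether the reads carrying each allele depart from the 50:50 split expected without ASE. The abstract does not say which statistic the authors used, so the exact two-sided binomial test below is an illustrative assumption; the function names and the `alpha` cutoff are hypothetical.

```python
from math import comb
from typing import Tuple

def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials under p = 0.5.

    Sums the probabilities of all outcomes no more likely than the observed one.
    Integer weights keep the comparison exact even for deep coverage."""
    weights = [comb(n, i) for i in range(n + 1)]
    observed = weights[k]
    return sum(w for w in weights if w <= observed) / 2 ** n

def ase_call(ref_reads: int, alt_reads: int, alpha: float = 0.05) -> Tuple[float, bool, str]:
    """Return (p-value, significant?, preferentially expressed allele)."""
    n = ref_reads + alt_reads
    pval = binom_two_sided_half(ref_reads, n)
    if ref_reads > alt_reads:
        preferred = "ref"
    elif alt_reads > ref_reads:
        preferred = "alt"
    else:
        preferred = "none"
    return pval, pval < alpha, preferred

print(ase_call(30, 5))
```

Comparing the preferred allele from such calls with the direction of a cis-eQTL effect is one way to express the "agreement on the allele that is preferentially expressed" mentioned in the abstract.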
3.
PLoS One ; 5(7): e11779, 2010 Jul 26.
Article in English | MEDLINE | ID: mdl-20668672

ABSTRACT

Non-coding RNAs (ncRNAs) are an essential class of molecular species that have been difficult to monitor on high-throughput platforms due to frequent lack of polyadenylation. Using a polyadenylation-neutral amplification protocol and next-generation sequencing, we explore ncRNA expression in eleven human tissues. ncRNAs 7SL, U2, 7SK, and HBII-52 are expressed at levels far exceeding mRNAs. C/D and H/ACA box snoRNAs are associated with rRNA methylation and pseudouridylation, respectively: spleen expresses both, hypothalamus expresses mainly C/D box snoRNAs, and testes show enriched expression of both H/ACA box snoRNAs and the telomerase RNA component TERC. Within the snoRNA 14q cluster, 14q(I-6) is expressed at much higher levels than other cluster members. More reads align to mitochondrial than nuclear tRNAs. Many lincRNAs are actively transcribed, particularly those overlapping known ncRNAs. Within the Prader-Willi syndrome loci, the snoRNA HBII-85 (group I) cluster is highly expressed in hypothalamus, more so than in other tissues and more than groups II or III. Additionally, within the disease locus we find novel transcription across a 400,000 nt span in ovaries. This publicly available, genome-wide polyA-neutral expression compendium demonstrates the richness of ncRNA expression, including high and function-specific expression patterns.


Subjects
Genome, Human/genetics; RNA, Small Nucleolar/genetics; RNA, Untranslated/genetics; Gene Expression Profiling; Humans; Hypothalamus/metabolism; Polymerase Chain Reaction; Spleen/metabolism
4.
BMC Genomics ; 11: 244, 2010 Apr 16.
Article in English | MEDLINE | ID: mdl-20398377

ABSTRACT

BACKGROUND: DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number. RESULTS: We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and one-quarter as much telomeric sequence as a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual. CONCLUSION: The described assay outputs absolute copy number together with an error estimate (p-value), and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.


Subjects
Carcinoid Tumor/genetics; DNA Copy Number Variations; Lung Neoplasms/genetics; Mitochondria/genetics; Telomere; Animals; Cell Line, Tumor; Female; Humans; Male; Mice; Sequence Analysis, DNA/methods
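The core idea of reading copy number off sequence counts can be illustrated by comparing read counts in matched genomic bins between the sample and a reference assumed to be diploid. This is a simplified sketch, not the published assay: bins, the diploid reference, and median-ratio normalization are assumptions, and it omits the error model behind the reported p-values.

```python
from statistics import median
from typing import List, Optional

def copy_number(sample: List[int], reference: List[int], ref_ploidy: float = 2.0) -> List[float]:
    """Per-bin copy number from read counts in matched genomic bins."""
    raw: List[Optional[float]] = [s / r if r > 0 else None for s, r in zip(sample, reference)]
    usable = [x for x in raw if x is not None]
    scale = median(usable)  # assumes most bins sit at the reference ploidy
    if scale == 0:
        raise ValueError("median ratio is zero; cannot normalize")
    # bins with no reference coverage are undefined (NaN)
    return [float("nan") if x is None else ref_ploidy * x / scale for x in raw]

print(copy_number([100, 200, 100, 0], [100, 100, 100, 100]))  # [2.0, 4.0, 2.0, 0.0]
```

Median-ratio normalization is used here instead of dividing by total reads because, in a heavily amplified genome such as a tumor line, the amplified regions absorb a disproportionate share of reads and would bias a total-count normalizer.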
5.
Nat Methods ; 6(9): 647-9, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19668204

ABSTRACT

We developed a procedure for the preparation of whole transcriptome cDNA libraries depleted of ribosomal RNA from only 1 µg of total RNA. The method relies on a collection of short, computationally selected oligonucleotides, called 'not-so-random' (NSR) primers, to obtain full-length, strand-specific representation of nonribosomal RNA transcripts. In this study we validated the technique by profiling human whole brain and universal human reference RNA using ultra-high-throughput sequencing.


Subjects
Brain/metabolism; DNA, Complementary/genetics; Gene Expression Profiling/methods; Gene Library; Cloning, Molecular; Humans; RNA/genetics; RNA/metabolism
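The "computationally selected" primer set can be pictured as starting from every possible short k-mer and discarding any that would prime perfectly on ribosomal RNA. This toy sketch assumes hexamers, perfect-match binding only, and user-supplied rRNA sequences; the function names are hypothetical, and a real design would also weigh mismatches and other selection criteria.

```python
from itertools import product
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def select_nsr_primers(rrna_seqs: Iterable[str], k: int = 6) -> List[str]:
    """Keep every k-mer primer that has no perfect binding site in the given rRNA sequences."""
    rrna_kmers = set()
    for seq in rrna_seqs:
        seq = seq.upper().replace("U", "T")  # compare RNA in DNA letters
        for i in range(len(seq) - k + 1):
            rrna_kmers.add(seq[i:i + k])
    kept = []
    for letters in product("ACGT", repeat=k):
        primer = "".join(letters)
        # a primer anneals where the template carries its reverse complement
        if reverse_complement(primer) not in rrna_kmers:
            kept.append(primer)
    return kept
```

With real rRNA sequences the default `k=6` starts from all 4,096 hexamers and keeps only those with no perfect rRNA target, which is what lets the library be depleted of ribosomal RNA without a separate depletion step.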
6.
BMC Syst Biol ; 3: 80, 2009 Aug 05.
Article in English | MEDLINE | ID: mdl-19653913

ABSTRACT

BACKGROUND: A systems biology interpretation of genome-scale RNA interference (RNAi) experiments is complicated by scope, experimental variability and network signaling robustness. Over-representation approaches (ORA), such as the hypergeometric test or z-score, are an established statistical framework used to associate RNA interference effectors with biologically annotated gene sets or pathways. These methods, however, do not directly take advantage of our growing understanding of the interactome. Furthermore, these methods can miss partial pathway activation and may be biased by protein complexes. Here we present a novel ORA, protein interaction permutation analysis (PIPA), that takes advantage of canonical pathways and established protein interactions to identify pathways enriched for protein interactions connecting RNAi hits. RESULTS: We use PIPA to analyze genome-scale siRNA cell growth screens performed in HeLa and TOV cell lines. First we show that interacting gene pair siRNA hits are more reproducible than single gene hits. Using protein interactions, PIPA identifies enriched pathways not found using the standard hypergeometric analysis, including the FAK cytoskeletal remodeling pathway. Different branches of the FAK pathway are distinctly essential in HeLa versus TOV cell lines, while other portions are unaffected by siRNA perturbations. Enriched hits belong to protein interactions associated with cell cycle regulation, anti-apoptosis, and signal transduction. CONCLUSION: PIPA provides an analytical framework to interpret siRNA screen data by merging biologically annotated gene sets with the human interactome. As a result we identify pathways and signaling hypotheses that are statistically enriched for effects on cell growth in human cell lines. This method provides a complementary approach to standard gene set enrichment that utilizes the additional knowledge of specific interactions within biological gene sets.


Subjects
Genome, Human; Proteins/genetics; RNA, Small Interfering/genetics; Signal Transduction; Algorithms; Cell Line; Gene Expression; Humans; Protein Binding; Systems Biology
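The contrast drawn above, between a standard hypergeometric ORA and an interaction-aware permutation test, can be sketched in a few lines. The first function is the usual upper-tail hypergeometric p-value. The second is only a loose, hypothetical illustration of the PIPA idea (are a pathway's hits more interconnected than random subsets of the same pathway?); the null model and all names are assumptions, not the published method.

```python
import random
from math import comb
from typing import Iterable, Sequence, Set, Tuple

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k), X ~ Hypergeometric: N genes screened, K in the pathway, n hits."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

def count_links(genes: Set[str], edges: Sequence[Tuple[str, str]]) -> int:
    """Number of interactions with both partners in `genes`."""
    return sum(1 for u, v in edges if u in genes and v in genes)

def interaction_permutation_p(pathway: Iterable[str], hits: Iterable[str],
                              edges: Sequence[Tuple[str, str]],
                              n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: do the pathway's hits share more interactions than random
    pathway subsets of the same size? (hypothetical null model)"""
    members = sorted(set(pathway))
    hit_set = set(members) & set(hits)
    observed = count_links(hit_set, edges)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        sample = set(rng.sample(members, len(hit_set)))
        if count_links(sample, edges) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

The hypergeometric test only sees how many pathway genes are hits, whereas the permutation test also rewards hits that interact with one another, which is the kind of signal the abstract says a plain ORA misses.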
7.
BMC Genomics ; 10: 269, 2009 Jun 17.
Article in English | MEDLINE | ID: mdl-19534766

ABSTRACT

BACKGROUND: Housekeeping genes (HKG) are constitutively expressed in all tissues while tissue-enriched genes (TEG) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions. RESULTS: Here, we generate gene expression profiles of 42 normal human tissues on custom high-density microarrays to systematically identify 1,522 HKGs and 975 TEGs and compile a small subset of 20 housekeeping genes that are highly expressed in all tissues with lower variance than many commonly used HKGs. Cross-species comparison shows that both the functions and expression patterns of HKGs are conserved. TEGs are enriched with respect to both segmental duplication and copy number variation, while no such enrichment is observed for HKGs, suggesting that the high expression of HKGs is not due to high copy number. Analysis of genomic and epigenetic features of HKGs and TEGs reveals that the high expression of HKGs across different tissues is associated with decreased nucleosome occupancy at the transcription start site, as indicated by enhanced DNase hypersensitivity. Additionally, we systematically and quantitatively demonstrate the enrichment of CpG islands at HKG transcription start sites (TSS) and their depletion at TEG TSS. Histone methylation patterns differ significantly between HKGs and TEGs, suggesting that methylation contributes to the differential expression patterns as well. CONCLUSION: We have compiled a set of high-quality HKGs that should provide higher and more consistent expression when used as references in laboratory experiments than currently used HKGs.
The comparison of genomic features between HKGs and TEGs shows that HKGs are more conserved than TEGs in terms of functions, expression pattern and polymorphisms. In addition, our results identify chromatin structure and epigenetic features of HKGs and TEGs that are likely to play an important role in regulating their strikingly different expression patterns.


Subjects
Epigenesis, Genetic; Gene Expression Profiling; Genome, Human; Chromatin; Conserved Sequence; CpG Islands; DNA Methylation; Gene Dosage; Gene Duplication; Gene Expression Regulation; Humans; Oligonucleotide Array Sequence Analysis; Transcription Initiation Site
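The two definitions in the background above (expressed in all tissues with low variance versus much higher in one tissue than in the others) translate directly into a rule-based classifier over a per-tissue expression profile. The expression floor, coefficient-of-variation cap, and fold cutoff below are hypothetical placeholders; the abstract does not give the thresholds the authors applied to their 42-tissue data.

```python
from statistics import mean, pstdev
from typing import Sequence

def classify_profile(expr: Sequence[float], expressed_min: float = 100.0,
                     max_cv: float = 0.3, fold: float = 5.0) -> str:
    """Label a gene's per-tissue expression profile as 'HKG', 'TEG' or 'other'.

    Assumes expressed_min > 0 so the mean used for the CV is never zero."""
    # housekeeping: expressed above a floor in every tissue, with a low coefficient of variation
    if all(x >= expressed_min for x in expr) and pstdev(expr) / mean(expr) <= max_cv:
        return "HKG"
    # tissue-enriched: top tissue exceeds the runner-up by at least `fold`
    ranked = sorted(expr, reverse=True)
    if len(ranked) > 1 and ranked[0] >= fold * max(ranked[1], 1e-9):
        return "TEG"
    return "other"

print(classify_profile([200, 210, 190, 205]))  # HKG
print(classify_profile([1000, 50, 40, 30]))    # TEG
```

Ranking candidate reference genes by the same coefficient of variation is one simple way to express "highly expressed in all tissues with lower variance".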
8.
Bioinformatics ; 25(12): i281-8, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19478000

ABSTRACT

MOTIVATION: Our focus has been on detecting topological properties that are rare in real proteins, but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, successfully decreasing the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared to be equally un-protein-like and yet do not contain a mathematical knot. These topological features are commonly referred to as slip-knots and are caused by the same mechanisms that result in knotted models. Slip-knots are undetectable by the original Knotfind algorithm. We have generalized our algorithm to detect them, and analyzed CASP6 models built using the Rosetta loop modeling method. RESULTS: After analyzing known protein structures in the PDB, we found that slip-knots do occur in certain proteins, but are rare and fall into a small number of specific classes. Our group used this new Pokefind algorithm to distinguish between these rare real slip-knots and the numerous classes of slip-knots that we discovered in Rosetta models and models submitted by the various CASP7 servers. The goal of this work is to improve future models created by protein structure prediction methods. Both algorithms are able to detect un-protein-like features that current metrics such as GDT are unable to identify, so these topological filters can also be used as additional assessment tools.


Subjects
Computational Biology/methods; Protein Conformation; Proteins/chemistry; Software; Algorithms; Databases, Protein; Models, Molecular; Protein Folding
9.
PLoS Biol ; 6(5): e107, 2008 May 06.
Article in English | MEDLINE | ID: mdl-18462017

ABSTRACT

Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression resulted in the detection of more than 6,000 associations between SNP genotypes and liver gene expression traits, where many of the corresponding genes identified have already been implicated in a number of human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes being identified at a growing number of genetic loci that have been identified as key drivers of disease from genome-wide association studies of disease. 
By using an integrative genomics approach, we highlight how the gene RPS26 and not ERBB3 is supported by our data as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. We also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels in the process.


Subjects
Gene Expression Profiling; Genetic Predisposition to Disease/genetics; Liver/metabolism; Polymorphism, Single Nucleotide/genetics; Transcription, Genetic/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Animals; Child; Child, Preschool; Cholesterol, LDL/blood; Cholesterol, LDL/genetics; Coronary Artery Disease/genetics; Diabetes Mellitus, Type 1/genetics; Female; Genes, MHC Class II/genetics; Genome, Human; Genotype; Humans; Infant; Male; Mice; Middle Aged; Oligonucleotide Array Sequence Analysis; Quantitative Trait Loci/genetics; RNA, Messenger/analysis; RNA, Messenger/genetics
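Each of the SNP-expression associations described above is, at its simplest, a regression of a transcript's expression level on genotype. The sketch below assumes additive coding (0, 1 or 2 copies of one allele) and returns the least-squares slope and Pearson correlation. The abstract does not specify the authors' model, and real eQTL mapping adds covariates and multiple-testing control.

```python
from math import sqrt
from statistics import mean
from typing import Sequence, Tuple

def eqtl_association(genotypes: Sequence[int], expression: Sequence[float]) -> Tuple[float, float]:
    """Slope and Pearson r for expression ~ additive genotype (0/1/2 allele copies)."""
    gx, ey = mean(genotypes), mean(expression)
    sxy = sum((g - gx) * (e - ey) for g, e in zip(genotypes, expression))
    sxx = sum((g - gx) ** 2 for g in genotypes)
    syy = sum((e - ey) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    return sxy / sxx, sxy / sqrt(sxx * syy)

print(eqtl_association([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 3, 3]))  # (1.0, 1.0)
```

In a genome-wide scan this test is repeated for every SNP and transcript pair, which is why a threshold correcting for the very large number of tests is needed before calling associations.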
10.
Mol Biotechnol ; 34(1): 69-93, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16943573

ABSTRACT

By organizing and making widely accessible the increasing amounts of data from high-throughput analyses, protein interaction databases have become an integral resource for the biological community in relating sequence data to higher-order function. To provide a sense of the use and applicability of these databases, we describe each of the major comprehensive interaction databases as well as some of the more specialized ones. Content description, search/browse functionalities, and data presentation are discussed. A succinct explanation of database contents helps users quickly determine whether a database contains information applicable to their research interests. Broad levels of search/browse functions as well as descriptions/examples allow users to quickly find and access pertinent data. At this point, clear presentation of search results as well as the primary content is necessary. Many databases display information graphically or divide it into smaller, digestible parts over a number of tabbed/linked pages. In addition, cross-linking between the databases promotes interconnectivity of the data and is an added layer of relational data for the user. Overall, although these protein interaction databases are under continual improvement, their current state shows that much time and effort has gone into organizing and presenting these large sets of data describing protein interactions.


Subjects
Databases, Protein/classification; Documentation/methods; Information Storage and Retrieval/methods; Protein Interaction Mapping/methods; Proteins/classification; Proteins/metabolism; Terminology as Topic; Database Management Systems; Proteins/chemistry; Vocabulary, Controlled
11.
Bioinformatics ; 22(14): e252-9, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873480

ABSTRACT

MOTIVATION: Knots in polypeptide chains have been found in very few proteins, and consequently should be generally avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself, but rather seek only to correctly obtain the final native state. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models. Here we describe Knotfind, a simple algorithm for knot detection that is fast enough for structure prediction, where tens or hundreds of thousands of conformations may be sampled during the course of a prediction. We have used this algorithm to characterize knots in large populations of model structures generated for targets in CASP 5 and CASP 6 using the Rosetta homology-based modeling method. RESULTS: Analysis of CASP5 models suggested several possible avenues for introduction of knots into these models, and these insights were applied to structure prediction in CASP 6, resulting in a significant decrease in the proportion of knotted models generated. Additionally, using the knot detection algorithm on structures in the Protein Data Bank, a previously unreported deep trefoil knot was found in acetylornithine transcarbamylase. AVAILABILITY: The Knotfind algorithm is available in the Rosetta structure prediction program at http://www.rosettacommons.org.


Subjects
Algorithms; Models, Chemical; Models, Molecular; Proteins/chemistry; Proteins/ultrastructure; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Computer Simulation; Molecular Sequence Data; Pattern Recognition, Automated; Protein Conformation; Proteins/classification; Sequence Homology, Amino Acid
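A widely used way to test a Cα trace for knots is chain reduction, as in Taylor's method: an interior vertex is deleted whenever the triangle it forms with its two neighbours is not pierced by any other segment, so an unknotted chain eventually collapses to a straight segment. The abstract does not spell out Knotfind's exact procedure, so the Python sketch below is an assumed, simplified version of that general idea and not the Rosetta implementation; the function names and tolerance are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def segment_hits_triangle(p0: Vec, p1: Vec, a: Vec, b: Vec, c: Vec, eps: float = 1e-9) -> bool:
    """Moller-Trumbore ray/triangle test restricted to the segment p0 -> p1."""
    d = _sub(p1, p0)
    e1, e2 = _sub(b, a), _sub(c, a)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < eps:  # segment parallel to the triangle's plane: not counted as a crossing
        return False
    inv = 1.0 / det
    s = _sub(p0, a)
    u = inv * _dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = _cross(s, e1)
    v = inv * _dot(d, q)
    if v < 0.0 or u + v > 1.0:
        return False
    t = inv * _dot(e2, q)
    return 0.0 <= t <= 1.0

def reduce_chain(points: Sequence[Vec], max_passes: int = 1000) -> List[Vec]:
    """Repeatedly delete interior vertices whose triangle with its neighbours is not pierced."""
    pts = list(points)
    for _ in range(max_passes):
        changed = False
        i = 1
        while i < len(pts) - 1:
            a, b, c = pts[i - 1], pts[i], pts[i + 1]
            blocked = False
            for j in range(len(pts) - 1):
                # segment j joins pts[j] and pts[j+1]; skip the triangle's own edges
                # (j = i-1, i) and the segments sharing a vertex with it (j = i-2, i+1)
                if i - 2 <= j <= i + 1:
                    continue
                if segment_hits_triangle(pts[j], pts[j + 1], a, b, c):
                    blocked = True
                    break
            if blocked:
                i += 1
            else:
                del pts[i]  # straightening here cannot pass the chain through itself
                changed = True
        if not changed:
            break
    return pts

def possibly_knotted(ca_trace: Sequence[Vec]) -> bool:
    """Flag a C-alpha trace that cannot be reduced to a single straight segment."""
    return len(reduce_chain(ca_trace)) > 2
```

A chain that fails to reduce is only *possibly* knotted: this greedy deletion order can stall on a tangled but unknotted chain, and because protein chains have open ends, what counts as a knot is itself a convention. A production filter would handle both points more carefully.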
12.
Proc Natl Acad Sci U S A ; 103(14): 5361-6, 2006 Apr 04.
Article in English | MEDLINE | ID: mdl-16567638

ABSTRACT

We have developed a method that combines the ROSETTA de novo protein folding and refinement protocol with distance constraints derived from homologous structures to build homology models that are frequently more accurate than their templates. We test this method by building complete-chain models for a benchmark set of 22 proteins, each with 1 or 2 candidate templates, for a total of 39 test cases. We use structure-based and sequence-based alignments for each of the test cases. All atoms, including hydrogens, are represented explicitly. The resulting models contain approximately the same number of atomic overlaps as experimentally determined crystal structures and maintain good stereochemistry. The most accurate models can be identified by their energies, and in 22 of 39 cases a model that is more accurate than the template over aligned regions is one of the 10 lowest-energy models.


Subjects
Models, Theoretical; Proteins/chemistry; Protein Folding
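Distance constraints derived from a template can be illustrated as target distances between aligned residue pairs, scored by a penalty that grows as the model departs from them. The harmonic form, the 10 Å cutoff, the single point per residue, and the function names are all assumptions for illustration; the abstract does not state the functional form used with the ROSETTA protocol.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

def constraints_from_template(template: Sequence[Coord], alignment: Dict[int, int],
                              cutoff: float = 10.0) -> List[Tuple[int, int, float]]:
    """(query_i, query_j, target_distance) for aligned residue pairs close together in the template."""
    pairs = []
    aligned = sorted(alignment.items())  # (query index, template index)
    for a in range(len(aligned)):
        for b in range(a + 1, len(aligned)):
            (qi, ti), (qj, tj) = aligned[a], aligned[b]
            d0 = dist(template[ti], template[tj])
            if d0 <= cutoff:
                pairs.append((qi, qj, d0))
    return pairs

def constraint_energy(model: Sequence[Coord], constraints: Sequence[Tuple[int, int, float]],
                      weight: float = 1.0) -> float:
    """Harmonic penalty: weight * (d - d0)^2 summed over all constraints."""
    return sum(weight * (dist(model[i], model[j]) - d0) ** 2 for i, j, d0 in constraints)
```

In a protocol like the one described, such a term would be added to the folding energy so that fragment assembly and refinement stay near the template geometry while remaining free to improve on it.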
13.
Proteins ; 61 Suppl 7: 157-166, 2005.
Article in English | MEDLINE | ID: mdl-16187358

ABSTRACT

The Robetta server and revised automatic protocols were used to predict structures for CASP6 targets. Robetta is a publicly available protein structure prediction server (http://robetta.bakerlab.org/) that uses the Rosetta de novo and homology modeling structure prediction methods. We incorporated some of the lessons learned in the CASP5 experiment into the server prior to participating in CASP6. We additionally tested new ideas that were amenable to full automation with an eye toward improving the server. We find that the Robetta server shows the greatest promise for the more challenging targets. The most significant finding from CASP5, that automated protocols can be roughly comparable in ability with the better human-intervention predictors, is repeated here in CASP6.


Subjects
Computational Biology/methods; Proteomics/methods; Algorithms; Computer Simulation; Computers; Data Interpretation, Statistical; Databases, Protein; Dimerization; Models, Molecular; Monte Carlo Method; Protein Conformation; Protein Folding; Protein Structure, Secondary; Protein Structure, Tertiary; Reproducibility of Results; Sequence Alignment; Software
14.
Methods Enzymol ; 394: 244-60, 2005.
Article in English | MEDLINE | ID: mdl-15808223

ABSTRACT

RosettaNMR combines the Rosetta de novo structure prediction method with limited NMR experimental data for rapid estimation of protein structure. The de novo Rosetta algorithm predicts protein three-dimensional structures using only sequence information by combining short fragments selected from known protein structures on the basis of local sequence similarity. These fragments are assembled using a Monte Carlo strategy to generate models that reproduce empirical statistics describing nonlocal protein structure such as overall compactness, hydrophobic burial, and beta-strand pairing. By incorporating chemical shift, nuclear Overhauser enhancement, and/or residual dipolar coupling restraints that are insufficient on their own to determine the protein global fold, the RosettaNMR method correctly estimates the global fold of a variety of different proteins, generating models that are generally within 4 Å Cα root-mean-square deviation of the high-resolution experimental structures. Here we review the capabilities of the RosettaNMR approach, describe the underlying methods, and provide practical tips for applying the technique to structure estimation problems.


Subjects
Computational Biology; Magnetic Resonance Spectroscopy/methods; Proteins/chemistry; Algorithms; Data Interpretation, Statistical; Peptide Library
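The fragment-assembly step described above (swap in a short run of backbone torsions taken from a known structure, then keep or reject the move by a Monte Carlo criterion) can be shown on a toy problem. Everything here is a simplifying assumption: one torsion value per residue, a hypothetical fragment library keyed by start position, an extended starting chain, and a caller-supplied score standing in for Rosetta's empirical terms or NMR restraints.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

def fragment_assembly(
    n_res: int,
    fragments: Dict[int, Sequence[Sequence[float]]],
    score: Callable[[List[float]], float],
    steps: int = 2000,
    kT: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Monte Carlo fragment insertion on a per-residue torsion list.

    fragments[start] holds candidate torsion runs for the window beginning at
    residue `start` (a hypothetical, simplified fragment library)."""
    rng = random.Random(seed)
    current = [180.0] * n_res  # start from an extended chain
    current_score = score(current)
    best, best_score = list(current), current_score
    starts = sorted(fragments)
    for _ in range(steps):
        start = rng.choice(starts)
        frag = list(rng.choice(fragments[start]))[: n_res - start]
        trial = list(current)
        trial[start:start + len(frag)] = frag  # insert the fragment's torsions
        trial_score = score(trial)
        delta = trial_score - current_score
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score

# Toy usage: a hypothetical "restraint" score pulling torsions toward a target
target = [-60.0, -45.0, -60.0, -45.0, -60.0, -45.0]

def toy_score(t: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(t, target)) / 1000.0

frags = {i: [target[i:i + 3], [120.0, 120.0, 120.0]] for i in range(4)}
best, best_s = fragment_assembly(6, frags, toy_score, steps=500)
```

Adding restraint terms to `score` is, in spirit, how sparse NMR data can steer assembly toward the right fold without being sufficient to determine it alone.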
15.
Proteins ; 55(3): 656-77, 2004 May 15.
Article in English | MEDLINE | ID: mdl-15103629

ABSTRACT

A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm, Rosetta, for predicting conformations of structurally divergent regions in comparative models. Initial conformations for short segments are selected from the protein structure database, whereas longer segments are built up by using three- and nine-residue fragments drawn from the database and combined by using the Rosetta algorithm. A gap closure term in the potential, in combination with a modified Newton's method for gradient descent minimization, is used to ensure continuity of the peptide backbone. Conformations of variable regions are refined in the context of a fixed template structure using Monte Carlo minimization together with rapid repacking of side-chains to iteratively optimize backbone torsion angles and side-chain rotamers. For short loops, mean accuracies of 0.69, 1.45, and 3.62 Å are obtained for 4, 8, and 12 residue loops, respectively. In addition, the method can provide reasonable models of conformations of longer protein segments: predicted conformations of 3 Å root-mean-square deviation or better were obtained for 5 of 10 examples of segments ranging from 13 to 34 residues. In combination with a sequence alignment algorithm, this method generates complete, ungapped models of protein structures, including regions both similar to and divergent from a homologous structure.
This combined method was used to make predictions for 28 protein domains in the fourth Critical Assessment of Structure Prediction (CASP 4) and 59 domains in CASP 5, where the method ranked highly among comparative modeling and fold recognition methods. Model accuracy in these blind predictions is dominated by alignment quality, but in the context of accurate alignments, long protein segments can be accurately modeled. Notably, the method correctly predicted the local structure of a 39-residue insertion into a TIM barrel in CASP 5 target T0186.


Subjects
Algorithms; Models, Molecular; Structural Homology, Protein; Amino Acid Sequence; Protein Conformation; Proteins/chemistry; Reproducibility of Results
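The loop accuracies quoted above are root-mean-square deviations in ångströms. Because the loops are modeled in the context of a fixed template, a reasonable simplification is to compare model and native loop atoms in that shared frame without re-superimposing them. The sketch below makes that assumption explicit; global RMSDs elsewhere in these abstracts normally involve optimal superposition first, which is not shown here.

```python
from math import sqrt
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def loop_rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD over matched atoms, assuming both structures are already in the same (template) frame."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((p - q) ** 2 for a, b in zip(model, native) for p, q in zip(a, b))
    return sqrt(sq / len(model))

print(loop_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 0.0
```

Skipping superposition keeps loop errors from being hidden by a local refit, which matters when judging whether a rebuilt segment actually sits correctly relative to the rest of the template.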
17.
Proteins ; 53 Suppl 6: 457-68, 2003.
Article in English | MEDLINE | ID: mdl-14579334

ABSTRACT

We describe predictions of the structures of CASP5 targets using Rosetta. The Rosetta fragment insertion protocol was used to generate models for entire target domains without detectable sequence similarity to a protein of known structure and to build long loop insertions (and N- and C-terminal extensions) in cases where a structural template was available. Encouraging results were obtained both for the de novo predictions and for the long loop insertions; we describe here the successes as well as the failures in the context of current efforts to improve the Rosetta method. In particular, de novo predictions failed for large proteins that were incorrectly parsed into domains and for topologically complex (high contact order) proteins with swapping of segments between domains. However, for the remaining targets, at least one of the five submitted models had a long fragment with significant similarity to the native structure. A fully automated version of the CASP5 protocol produced results that were comparable to the human-assisted predictions for most of the targets, suggesting that automated genomic-scale, de novo protein structure prediction may soon be worthwhile. For the three targets where the human-assisted predictions were significantly closer to the native structure, we identify the steps that remain to be automated.


Subjects
Computational Biology/methods; Protein Conformation; Proteins/chemistry; Algorithms; Animals; Bacterial Proteins/chemistry; Computational Biology/trends; Ferredoxins/chemistry; Methyltransferases/chemistry; Models, Molecular; Protein Folding; Protein Structure, Secondary; Protein Structure, Tertiary; Reproducibility of Results
18.
Proteins ; 53 Suppl 6: 524-33, 2003.
Article in English | MEDLINE | ID: mdl-14579342

ABSTRACT

Robetta is a fully automated protein structure prediction server that uses the Rosetta fragment-insertion method. It combines template-based and de novo structure prediction methods in an attempt to produce high quality models that cover every residue of a submitted sequence. The first step in the procedure is the automatic detection of the locations of domains and selection of the appropriate modeling protocol for each domain. For domains matched to a homolog with an experimentally characterized structure by PSI-BLAST or Pcons2, Robetta uses a new alignment method, called K*Sync, to align the query sequence onto the parent structure. It then models the variable regions by allowing them to explore conformational space with fragments in a fashion similar to the de novo protocol, but in the context of the template. When no structural homolog is available, domains are modeled with the Rosetta de novo protocol, which allows the full length of the domain to explore conformational space via fragment-insertion, producing a large decoy ensemble from which the final models are selected. The Robetta server produced quite reasonable predictions for targets in the recent CASP-5 and CAFASP-3 experiments, some of which were at the level of the best human predictions.


Subjects
Computational Biology/methods; Protein Conformation; Proteins/chemistry; Algorithms; Models, Molecular; Protein Folding; Protein Structure, Tertiary
19.
Proteins ; 53(1): 76-87, 2003 Oct 01.
Article in English | MEDLINE | ID: mdl-12945051

ABSTRACT

We have improved the original Rosetta centroid/backbone decoy set by increasing the number of proteins and the frequency of near-native models, and by adding side chains and minimizing clashes. The new set consists of 1,400 model structures for 78 different and diverse protein targets and provides a challenging set for the testing and evaluation of scoring functions. We evaluated the extent to which a variety of all-atom energy functions could identify the native and close-to-native structures in the new decoy sets. Of various implicit solvent models, we found that a solvent-accessible surface area-based solvation model provided the best enrichment and discrimination of close-to-native decoys. The combination of this solvation treatment with Lennard-Jones terms and the original Rosetta energy provided better enrichment and discrimination than any of the individual terms. The results also highlight the differences in accuracy of NMR and X-ray crystal structures: a large energy gap was observed between native and non-native conformations for X-ray structures but not for NMR structures.


Subjects
Protein Conformation; Algorithms; Hydrogen Bonding; Proteins/chemistry; Solvents/chemistry
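One simple way to quantify "enrichment" of close-to-native decoys is to take the lowest-energy fraction of decoys, count how many are also in the lowest-RMSD fraction, and divide by the number expected if energy and RMSD were unrelated. The abstract does not give the paper's exact definition, so this formula and the default 10% fraction are assumptions.

```python
from typing import Sequence

def enrichment(energies: Sequence[float], rmsds: Sequence[float], frac: float = 0.1) -> float:
    """Overlap of the top-`frac` decoys by energy and by RMSD, relative to chance (frac * frac * N)."""
    n = len(energies)
    if n == 0 or n != len(rmsds):
        raise ValueError("need equal, non-zero numbers of energies and RMSDs")
    m = max(1, int(round(frac * n)))
    by_energy = set(sorted(range(n), key=lambda i: energies[i])[:m])
    by_rmsd = set(sorted(range(n), key=lambda i: rmsds[i])[:m])
    expected = m * m / n
    return len(by_energy & by_rmsd) / expected
```

A value near 1 means the energy function ranks decoys no better than chance; larger values mean its lowest-energy decoys are disproportionately close to native.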
20.
J Mol Biol ; 331(1): 281-99, 2003 Aug 01.
Article in English | MEDLINE | ID: mdl-12875852

ABSTRACT

Protein-protein docking algorithms provide a means to elucidate structural details for presently unknown complexes. Here, we present and evaluate a new method to predict protein-protein complexes from the coordinates of the unbound monomer components. The method employs a low-resolution, rigid-body, Monte Carlo search followed by simultaneous optimization of backbone displacement and side-chain conformations using Monte Carlo minimization. Up to 10^5 independent simulations are carried out, and the resulting "decoys" are ranked using an energy function dominated by van der Waals interactions, an implicit solvation model, and an orientation-dependent hydrogen bonding potential. Top-ranking decoys are clustered to select the final predictions. Small-perturbation studies reveal the formation of binding funnels in 42 of 54 cases using coordinates derived from the bound complexes and in 32 of 54 cases using independently determined coordinates of one or both monomers. Experimental binding affinities correlate with the calculated score function and explain the predictive success or failure of many targets. Global searches using one or both unbound components predict at least 25% of the native residue-residue contacts in 28 of the 32 cases where binding funnels exist. The results suggest that the method may soon be useful for generating models of biologically important complexes from the structures of the isolated components, but they also highlight the challenges that must be met to achieve consistent and accurate prediction of protein-protein interactions.


Subjects
Algorithms; Models, Molecular; Proteins/chemistry; Antigen-Antibody Complex/chemistry; Computer Simulation; Enzyme Inhibitors/chemistry; Enzymes/chemistry; Hydrogen Bonding; Hydrophobic and Hydrophilic Interactions; Monte Carlo Method; Protein Binding; Protein Conformation; Solvents
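The success criterion above ("at least 25% of the native residue-residue contacts") can be computed by listing interface contacts in the native complex and in a docked model, then measuring the overlap. This sketch represents each residue by a single point and uses a 5 Å cutoff; both are illustrative assumptions, since the abstract does not define how contacts were counted.

```python
from math import dist
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]

def interface_contacts(chain_a: Sequence[Coord], chain_b: Sequence[Coord],
                       cutoff: float = 5.0) -> Set[Tuple[int, int]]:
    """Residue pairs (i in A, j in B) whose representative points lie within `cutoff`."""
    return {(i, j) for i, a in enumerate(chain_a) for j, b in enumerate(chain_b)
            if dist(a, b) <= cutoff}

def fraction_native_contacts(native: Set[Tuple[int, int]], model: Set[Tuple[int, int]]) -> float:
    """Share of native interface contacts that the model reproduces."""
    if not native:
        return 0.0
    return len(native & model) / len(native)

native = interface_contacts([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [(0.0, 3.0, 0.0), (20.0, 0.0, 0.0)])
print(native)  # {(0, 0)}
print(fraction_native_contacts({(0, 0), (1, 1)}, {(0, 0)}) >= 0.25)  # True
```

Measuring contacts rather than RMSD credits a docked model for getting the interface partly right even when the relative orientation of the partners is still somewhat off.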