2.
Proc Natl Acad Sci U S A ; 111(17): 6131-8, 2014 Apr 29.
Article in English | MEDLINE | ID: mdl-24753594

ABSTRACT

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.


Subject(s)
DNA/genetics , Human Genome/genetics , Biological Evolution , Disease/genetics , Humans , Nucleic Acid Regulatory Sequences/genetics , Software
3.
Microb Drug Resist ; 19(6): 428-36, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23808957

ABSTRACT

The alarming rise of ciprofloxacin-resistant Pseudomonas aeruginosa has been reported in several clinical studies. Though the mutation of resistance genes and their role in drug resistance has been researched, the process by which the bacterium acquires high-level resistance is still not well understood. How does the genomic evolution of P. aeruginosa affect resistance development? Could the exposure of antibiotics to the bacteria enrich genomic variants that lead to the development of resistance, and if so, how are these variants distributed through the genome? To answer these questions, we performed 454 pyrosequencing and a whole genome analysis both before and after exposure to ciprofloxacin. The comparative sequence data revealed 93 unique resistance strain variation sites, which included a mutation in the DNA gyrase subunit A gene. We generated variation-distribution maps comparing the wild and resistant types, and isolated 19 candidates from three discrete resistance-associated high variability regions that had available transposon mutants, to perform a ciprofloxacin exposure assay. Of these region candidates with transposon disruptions, 79% (15/19) showed a reduction in the ability to gain high-level resistance, suggesting that genes within these high variability regions might enrich for certain functions associated with resistance development.


Subject(s)
DNA Gyrase/genetics , Bacterial Drug Resistance/genetics , Bacterial Genome , Mutation , Pseudomonas aeruginosa/genetics , Anti-Bacterial Agents/pharmacology , Ciprofloxacin/pharmacology , DNA Transposable Elements , Genetic Variation , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Pseudomonas aeruginosa/drug effects
4.
J Proteome Res ; 12(9): 4240-7, 2013 Sep 06.
Article in English | MEDLINE | ID: mdl-23875887

ABSTRACT

Peppy, the proteogenomic/proteomic search software, employs a novel method for assessing the match quality between an MS/MS spectrum and a theorized peptide sequence. The scoring system uses three score factors calculated with binomial probabilities: the probability that a fragment ion will randomly align with a peptide ion, the probability that the aligning ions will be selected from subsets of the most intense peaks, and the probability that the intensities of fragment ions identified as y-ions are greater than those of their counterpart b-ions. The scores produced by the method act as global confidence scores, which facilitate the accurate comparison of results and the estimation of false discovery rates. Peppy has been integrated into the meta-search engine PepArML to produce meaningful comparisons with Mascot, MSGF+, OMSSA, X!Tandem, k-Score and s-Score. For two of the four data sets examined with the PepArML analysis, Peppy exceeded the accuracy performance of the other scoring systems. Peppy is available for download at http://geneffects.com/peppy .
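The abstract names binomial probabilities as the basis of the score factors but does not give the formulas. As a minimal sketch of what the first factor could look like, assume n theoretical fragment ions, each aligning with a spectrum peak by chance with probability p, and k observed alignments; the function name `binomial_tail` and the numbers below are hypothetical and are not taken from Peppy.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of k or more chance matches out of n trials, X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs: 20 theoretical fragment ions, a 5% chance of a random
# alignment per ion, and 8 ions actually aligned with peaks.
chance_of_random_match = binomial_tail(20, 8, 0.05)
print(chance_of_random_match)
```

A smaller tail probability means the observed number of alignments is less likely to be random, which is the sense in which such a value can act as a confidence score.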


Subject(s)
Peptide Mapping , Software , Algorithms , Amino Acid Sequence , Blood Proteins/chemistry , Humans , Molecular Sequence Data , Peptide Fragments/chemistry , Protein Sequence Analysis , Tandem Mass Spectrometry
5.
J Proteome Res ; 12(6): 3019-25, 2013 Jun 07.
Article in English | MEDLINE | ID: mdl-23614390

ABSTRACT

Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs a decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy .
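The decoy-based thresholding mentioned above can be sketched as follows. This is a generic target-decoy estimate (decoy hits divided by target hits above a cutoff), assumed here for illustration; the abstract does not state which estimator Peppy uses, and `score_threshold_at_fdr` is a hypothetical name.

```python
def score_threshold_at_fdr(matches, max_fdr):
    """Return the lowest score cutoff whose accepted set keeps decoys/targets <= max_fdr.

    matches: iterable of (score, is_decoy) pairs, higher score = better match.
    Returns None if no cutoff satisfies the threshold.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR of everything accepted so far.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical search results: (score, came_from_decoy_database)
example = [(9.0, False), (8.0, False), (7.0, False),
           (6.0, True), (5.0, False), (4.0, True)]
print(score_threshold_at_fdr(example, 0.25))
```

Tied scores are not treated specially in this sketch; a production tool would need to decide how to handle them.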


Subject(s)
Molecular Sequence Annotation , Peptide Fragments/isolation & purification , Proteins/isolation & purification , Proteomics , Software , Algorithms , Amino Acid Sequence , Base Sequence , Cell Line , Protein Databases , Humans , Molecular Sequence Data , Tandem Mass Spectrometry
6.
Theor Biol Med Model ; 10: 23, 2013 Apr 03.
Article in English | MEDLINE | ID: mdl-23551850

ABSTRACT

BACKGROUND: It is a fascinating phenomenon that in genetically identical bacteria populations of Bacillus subtilis, a distinct DNA uptake phenotype called the competence phenotype may emerge in 10-20% of the population. Many aspects of the phenomenon are believed to be due to the variable expression of critical genes: a stochastic occurrence termed "noise" which has made the phenomenon difficult to examine directly by lab experimentation. METHODS: To capture and model noise in this system and further understand the emergence of competence both at the intracellular and culture levels in B. subtilis, we developed a novel multi-scale, agent-based model. At the intracellular level, our model recreates the regulatory network involved in the competence phenotype. At the culture level, we simulated growth conditions, with our multi-scale model providing feedback between the two levels. RESULTS: Our model predicted three potential sources of genetic "noise". First, the random spatial arrangement of molecules may influence the manifestation of the competence phenotype. In addition, the evidence suggests that there may be a type of epigenetic heritability to the emergence of competence, influenced by the molecular concentrations of key competence molecules inherited through cell division. Finally, the emergence of competence during the stationary phase may in part be due to the dilution effect of cell division upon protein concentrations. CONCLUSIONS: The competence phenotype was easily translated into an agent-based model - one with the ability to illuminate complex cell behavior. Models such as the one described in this paper can simulate cell behavior that is otherwise unobservable in vivo, highlighting their potential usefulness as research tools.


Subject(s)
Bacillus subtilis/physiology , Theoretical Models , Bacillus subtilis/genetics , Bacterial Genes , Protein Biosynthesis , Stochastic Processes , Genetic Transcription
7.
BMC Genomics ; 14: 141, 2013 Feb 28.
Article in English | MEDLINE | ID: mdl-23448259

ABSTRACT

BACKGROUND: Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome. RESULTS: We generated ~1 million high-resolution tandem mass (MS/MS) spectra for Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome, and the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing the confidence of the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt. CONCLUSIONS: The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.


Subject(s)
Genetic Databases , Human Genome , Molecular Sequence Annotation , Open Reading Frames/genetics , Cell Line , Chromosome Mapping , Computational Biology , Humans , Mass Spectrometry , DNA Sequence Analysis
8.
Anal Chem ; 84(21): 9008-14, 2012 Nov 06.
Article in English | MEDLINE | ID: mdl-23030679

ABSTRACT

Membrane proteomics, the large-scale analysis of membrane proteins, is often constrained by the difficulties of achieving fully resolvable separation and resistance to proteolysis, both of which could lead to low recovery and low identification rates of membrane proteins. Here, we introduce a novel integrated approach, GELFrEE Optimized FASP Technology (GOFAST) for large-scale and comprehensive membrane proteins analysis. Using an array of sample preparation techniques including gel-eluted liquid fraction entrapment electrophoresis (GELFrEE), filter-aided sample preparation (FASP), and microwave-assisted on-filter enzymatic digestion, we identified 2 090 proteins from the membrane fraction of a leukemia cell line (K562). Of these, 37% are annotated as membrane proteins according to gene ontology analysis, resulting in the largest membrane proteome of leukemia cells reported to date. Our approach combines the advantages of GELFrEE high-loading capacity, gel-free separation, efficient depletion of detergents, and microwave-assisted on-filter digestion, minimizing sample losses and maximizing MS-detectable sequence coverage of individual proteins. In addition, this approach also shows great potential for the identification of alternative splicing products.


Subject(s)
Analytic Sample Preparation Methods/methods , Electrophoresis/methods , Membrane Proteins/analysis , Proteome/analysis , Proteomics/methods , Filtration , Humans , K562 Cells , Membrane Proteins/chemistry , Membrane Proteins/isolation & purification , Protein Isoforms/analysis , Protein Isoforms/chemistry , Proteome/chemistry
9.
Genome Res ; 22(9): 1646-57, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955977

ABSTRACT

Data from the Encyclopedia of DNA Elements (ENCODE) project show over 9640 human genome loci classified as long noncoding RNAs (lncRNAs), yet only ~100 have been deeply characterized to determine their role in the cell. To measure the protein-coding output from these RNAs, we jointly analyzed two recent data sets produced in the ENCODE project: tandem mass spectrometry (MS/MS) data mapping expressed peptides to their encoding genomic loci, and RNA-seq data generated by ENCODE in long polyA+ and polyA- fractions in the cell lines K562 and GM12878. We used the machine-learning algorithm RuleFit3 to regress the peptide data against RNA expression data. The most important covariate for predicting translation was, surprisingly, the Cytosol polyA- fraction in both cell lines. LncRNAs are ~13-fold less likely to produce detectable peptides than similar mRNAs, indicating that ~92% of GENCODE v7 lncRNAs are not translated in these two ENCODE cell lines. Intersecting 9640 lncRNA loci with 79,333 peptides yielded 85 unique peptides matching 69 lncRNAs. Most cases were due to a coding transcript misannotated as lncRNA. Two exceptions were an unprocessed pseudogene and a bona fide lncRNA gene, both with open reading frames (ORFs) compromised by upstream stop codons. All potentially translatable lncRNA ORFs had only a single peptide match, indicating low protein abundance and/or false-positive peptide matches. We conclude that with very few exceptions, ribosomes are able to distinguish coding from noncoding transcripts and, hence, that ectopic translation and cryptic mRNAs are rare in the human lncRNAome.
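One way to read the relationship between the ~13-fold figure and the ~92% figure is the short calculation below. The abstract does not spell out its derivation, so this is an assumed interpretation: if lncRNAs produce detectable peptides at 1/13 the rate of comparable mRNAs, the remaining fraction is treated as untranslated.

```python
fold_reduction = 13  # lncRNAs vs. similar mRNAs, from the abstract

# Assumed interpretation: detection rate relative to similar mRNAs.
relative_translated = 1 / fold_reduction
relative_not_translated = 1 - relative_translated

print(f"{relative_not_translated:.1%}")  # prints 92.3%
```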


Subject(s)
Protein Biosynthesis , Long Noncoding RNA/genetics , Amino Acid Sequence , Base Sequence , Cell Line , Gene Expression , Gene Expression Profiling , Gene Expression Regulation , Humans , K562 Cells , Molecular Sequence Annotation , Molecular Sequence Data , Peptides/genetics , Long Noncoding RNA/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Sequence Alignment , Tandem Mass Spectrometry/methods
10.
Bioinformatics ; 27(6): 844-52, 2011 Mar 15.
Article in English | MEDLINE | ID: mdl-21389073

ABSTRACT

MOTIVATION: Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques, especially bottom-up and top-down mass spectrometry, provides complementary information. When integrated with background knowledge this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated. RESULTS: This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE), applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and for new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses. AVAILABILITY: Software, demo projects and source can be downloaded from http://pie.giddingslab.org/
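To illustrate the kind of search the abstract refers to, here is a generic simulated-annealing loop with a Metropolis acceptance rule. It is not the PIE's implementation: the energy function, the six candidate sites, the per-site evidence vector and the observed modification count are all toy assumptions chosen only to show how several data types can be folded into one score.

```python
import math
import random


def simulated_annealing(initial, energy, propose, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Minimise `energy` by proposing random changes and cooling geometrically."""
    rng = random.Random(seed)
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        cand_e = energy(candidate)
        # Metropolis rule: always accept improvements or ties, and sometimes
        # accept worse states, less often as the temperature drops.
        if cand_e <= current_e or rng.random() < math.exp((current_e - cand_e) / t):
            current, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Toy data (hypothetical): 6 candidate sites, intact mass suggests 2
# modifications, peptide-level evidence supports sites 1 and 4.
OBSERVED_COUNT = 2
EVIDENCE = (0, 1, 0, 0, 1, 0)


def energy(state):
    count_mismatch = abs(sum(state) - OBSERVED_COUNT)              # "top-down" term
    site_mismatch = sum(s != e for s, e in zip(state, EVIDENCE))   # "bottom-up" term
    return count_mismatch + site_mismatch


def flip_one_site(state, rng):
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


best_state, best_energy = simulated_annealing((0,) * 6, energy, flip_one_site)
print(best_state, best_energy)
```

Adding a new data type in this picture amounts to adding another term to `energy`, which is one plausible reading of the modular approach the abstract describes.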


Subject(s)
Mass Spectrometry/methods , Post-Translational Protein Processing , Proteins/chemistry , Software , Algorithms , Bacterial Proteins/analysis , Bacterial Proteins/chemistry , Escherichia coli/chemistry , Markov Chains , Monte Carlo Method , Protein Isoforms/analysis , Protein Isoforms/chemistry , Proteins/analysis , Proteomics/methods , Ribosomal Proteins/analysis , Ribosomal Proteins/chemistry
11.
Methods Mol Biol ; 694: 255-90, 2011.
Article in English | MEDLINE | ID: mdl-21082440

ABSTRACT

This chapter describes using the Protein Inference Engine (PIE) to integrate various types of data, especially top-down and bottom-up mass spectrometer (MS) data, to describe a protein's posttranslational modifications (PTMs). PTMs include cleavage events such as the N-terminal loss of methionine and residue modifications like phosphorylation. Modifications are key elements in many biological processes, but are difficult to study as no single, general method adequately characterizes a protein's PTMs; manually integrating data from several MS experiments is usually required. The PIE is designed to automate this process using a guess-and-refine process similar to how an expert manually integrates data. The PIE repeatedly "imagines" a possible modification set, evaluates it using available data, and then tries to improve on it. After many rounds of refinement, the resulting modification set is proposed as a candidate answer. Multiple candidate answers are generated to obtain both best and near-best answers. Near-best answers are crucial in allowing for proteins with more than one supported modification pattern (isoforms) and obtaining robust results given incomplete and inconsistent data. The goal of this chapter is to walk the reader through installing and using the downloadable version of PIE, both in general and by means of a specific, detailed example. The example integrates several types of experimental and background (prior) data. It is not a "perfect-world" scenario, but has been designed to illustrate several real-world difficulties that may be encountered when trying to analyze imperfect data.


Subject(s)
Computational Biology/methods , Electronic Data Processing/methods , Post-Translational Protein Processing , Proteins/metabolism , Software , Amino Acid Sequence , Mass Spectrometry , Molecular Sequence Data , Molecular Weight , Peptides/chemistry , Peptides/metabolism , Phosphorylation , Protein Isoforms/chemistry , Protein Isoforms/metabolism , Proteins/chemistry
12.
Antimicrob Agents Chemother ; 54(11): 4626-35, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20696867

ABSTRACT

Microbes have developed resistance to nearly every antibiotic, yet the steps leading to drug resistance remain unclear. Here we report a multistage process by which Pseudomonas aeruginosa acquires drug resistance following exposure to ciprofloxacin at levels ranging from 0.5× to 8× the initial MIC. In stage I, susceptible cells are killed en masse by the exposure. In stage II, a small, slow to nongrowing population survives antibiotic exposure that does not exhibit significantly increased resistance according to the MIC measure. In stage III, exhibited at 0.5× to 4× the MIC, a growing population emerges to reconstitute the population, and these cells display heritable increases in drug resistance of up to 50 times the original level. We studied the stage III cells by proteomic methods to uncover differences in the regulatory pathways that are involved in this phenotype, revealing upregulation of phosphorylation on two proteins, succinate-semialdehyde dehydrogenase (SSADH) and methylmalonate-semialdehyde dehydrogenase (MMSADH), and also revealing upregulation of a highly conserved protein of unknown function. Transposon disruption in the encoding genes for each of these targets substantially dampened the ability of cells to develop the stage III phenotype. Considering these results in combination with computational models of resistance and genomic sequencing results, we postulate that stage III heritable resistance develops from a combination of both genomic mutations and modulation of one or more preexisting cellular pathways.


Subject(s)
Anti-Infective Agents/pharmacology , Bacterial Proteins/metabolism , Ciprofloxacin/pharmacology , Bacterial Drug Resistance/physiology , Pseudomonas aeruginosa/drug effects , Pseudomonas aeruginosa/metabolism , Bacterial Proteins/genetics , Bacterial DNA/genetics , Bacterial Drug Resistance/genetics , Two-Dimensional Gel Electrophoresis , Methylmalonate-Semialdehyde Dehydrogenase (Acylating)/genetics , Methylmalonate-Semialdehyde Dehydrogenase (Acylating)/metabolism , Microbial Sensitivity Tests , Pseudomonas aeruginosa/genetics , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry , Succinate-Semialdehyde Dehydrogenase/genetics , Succinate-Semialdehyde Dehydrogenase/metabolism
13.
PLoS One ; 5(5): e9454, 2010 May 13.
Article in English | MEDLINE | ID: mdl-20485527

ABSTRACT

We report the application of agent-based modeling to examine the signal transduction network and receptor arrays for chemotaxis in Escherichia coli, which are responsible for regulating swimming behavior in response to environmental stimuli. Agent-based modeling is a stochastic and bottom-up approach, where individual components of the modeled system are explicitly represented, and bulk properties emerge from their movement and interactions. We present the Chemoscape model: a collection of agents representing both fixed membrane-embedded and mobile cytoplasmic proteins, each governed by a set of rules representing knowledge or hypotheses about their function. When the agents were placed in a simulated cellular space and then allowed to move and interact stochastically, the model exhibited many properties similar to the biological system including adaptation, high signal gain, and wide dynamic range. We found the agent based modeling approach to be both powerful and intuitive for testing hypotheses about biological properties such as self-assembly, the non-linear dynamics that occur through cooperative protein interactions, and non-uniform distributions of proteins in the cell. We applied the model to explore the role of receptor type, geometry and cooperativity in the signal gain and dynamic range of the chemotactic response to environmental stimuli. The model provided substantial qualitative evidence that the dynamic range of chemotactic response can be traced to both the heterogeneity of receptor types present, and the modulation of their cooperativity by their methylation state.


Subject(s)
Chemotaxis , Escherichia coli/cytology , Biological Models , Signal Transduction , Chemoreceptor Cells/metabolism , Computer Simulation , Escherichia coli Proteins/metabolism , Ligands , Methylation , Protein Multimerization , Cell Surface Receptors
14.
BMC Bioinformatics ; 10: 254, 2009 Aug 19.
Article in English | MEDLINE | ID: mdl-19691849

ABSTRACT

BACKGROUND: Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogenous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping). RESULTS: We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research. 
CONCLUSION: We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era.


Subject(s)
Computational Biology/methods , Database Management Systems , Systems Biology , Factual Databases , Information Storage and Retrieval/methods
15.
RNA ; 15(7): 1314-21, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19458034

ABSTRACT

Hydroxyl-selective electrophiles, including N-methylisatoic anhydride (NMIA) and 1-methyl-7-nitroisatoic anhydride (1M7), are broadly useful for RNA structure analysis because they react preferentially with the ribose 2'-OH group at conformationally unconstrained or flexible nucleotides. Each nucleotide in an RNA has the potential to form an adduct with these reagents to yield a comprehensive, nucleotide-resolution, view of RNA structure. However, it is possible that factors other than local structure modulate reactivity. To evaluate the influence of base identity on the intrinsic reactivity of each nucleotide, we analyze NMIA and 1M7 reactivity using four distinct RNAs, under both native and denaturing conditions. We show that guanosine and adenosine residues have identical intrinsic 2'-hydroxyl reactivities at pH 8.0 and are 1.4 and 1.7 times more reactive than uridine and cytidine, respectively. These subtle, but statistically significant, differences do not impact the ability of selective 2'-hydroxyl acylation analyzed by primer extension-based (SHAPE) methods to establish an RNA secondary structure or monitor RNA folding in solution because base-specific influences are much smaller than the reactivity differences between paired and unpaired nucleotides.


Subject(s)
Anhydrides/chemistry , Hydroxyl Radical/chemistry , RNA/chemistry , Ribose/chemistry , ortho-Aminobenzoates/chemistry , Acylation , HIV-1/genetics , Nucleic Acid Conformation , RNA/genetics , RNA/metabolism , Ribosomal RNA/genetics , Ribonuclease P/genetics
16.
RNA ; 14(10): 1979-90, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18772246

ABSTRACT

Analysis of the long-range architecture of RNA is a challenging experimental and computational problem. Local nucleotide flexibility, which directly reports underlying base pairing and tertiary interactions in an RNA, can be comprehensively assessed at single nucleotide resolution using high-throughput selective 2'-hydroxyl acylation analyzed by primer extension (hSHAPE). hSHAPE resolves structure-sensitive chemical modification information by high-resolution capillary electrophoresis and typically yields quantitative nucleotide flexibility information for 300-650 nucleotides (nt) per experiment. The electropherograms generated in hSHAPE experiments provide a wealth of structural information; however, significant algorithmic analysis steps are required to generate quantitative and interpretable data. We have developed a set of software tools called ShapeFinder to make possible rapid analysis of raw sequencer data from hSHAPE, and most other classes of nucleic acid reactivity experiments. The algorithms in ShapeFinder (1) convert measured fluorescence intensity to quantitative cDNA fragment amounts, (2) correct for signal decay over read lengths extending to 600 nts or more, (3) align reactivity data to the known RNA sequence, and (4) quantify per nucleotide reactivities using whole-channel Gaussian integration. The algorithms and user interface tools implemented in ShapeFinder create new opportunities for tackling ambitious problems involving high-throughput analysis of structure-function relationships in large RNAs.
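Step (4) above relies on integrating Gaussian peaks. As a minimal sketch, assuming each nucleotide's peak has already been fitted with an amplitude and a width (the fitting itself is omitted, and the function names are hypothetical rather than ShapeFinder's), the area of a Gaussian peak has the closed form amplitude × sigma × sqrt(2π), which can be checked against a numerical integral:

```python
import math


def gaussian_area(amplitude: float, sigma: float) -> float:
    """Closed-form area under amplitude * exp(-(x - mu)^2 / (2 sigma^2))."""
    return amplitude * abs(sigma) * math.sqrt(2 * math.pi)


def numeric_area(amplitude, center, sigma, lo, hi, n=20000):
    """Trapezoid-rule integral of the same peak, used only as a sanity check."""
    step = (hi - lo) / n
    ys = [amplitude * math.exp(-((lo + i * step - center) ** 2) / (2 * sigma ** 2))
          for i in range(n + 1)]
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


# Hypothetical fitted peaks, one per nucleotide: (amplitude, sigma)
fitted_peaks = [(2.0, 1.5), (0.4, 1.4), (3.1, 1.6)]
areas = [gaussian_area(a, s) for a, s in fitted_peaks]
print(areas)
```

Using the fitted area rather than the peak height keeps the per-nucleotide value comparable when peak widths drift along a long read.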


Subject(s)
Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , RNA Sequence Analysis/methods , Software , Algorithms , Base Sequence , Capillary Electrophoresis , Nucleotides/chemistry , RNA/isolation & purification
17.
PLoS Biol ; 6(4): e96, 2008 Apr 29.
Article in English | MEDLINE | ID: mdl-18447581

ABSTRACT

Replication and pathogenesis of the human immunodeficiency virus (HIV) is tightly linked to the structure of its RNA genome, but genome structure in infectious virions is poorly understood. We invent high-throughput SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) technology, which uses many of the same tools as DNA sequencing, to quantify RNA backbone flexibility at single-nucleotide resolution and from which robust structural information can be immediately derived. We analyze the structure of HIV-1 genomic RNA in four biologically instructive states, including the authentic viral genome inside native particles. Remarkably, given the large number of plausible local structures, the first 10% of the HIV-1 genome exists in a single, predominant conformation in all four states. We also discover that noncoding regions functioning in a regulatory role have significantly lower (p-value < 0.0001) SHAPE reactivities, and hence more structure, than do viral coding regions that function as the template for protein synthesis. By directly monitoring protein binding inside virions, we identify the RNA recognition motif for the viral nucleocapsid protein. Seven structurally homologous binding sites occur in a well-defined domain in the genome, consistent with a role in directing specific packaging of genomic RNA into nascent virions. In addition, we identify two distinct motifs that are targets for the duplex destabilizing activity of this same protein. The nucleocapsid protein destabilizes local HIV-1 RNA structure in ways likely to facilitate initial movement both of the retroviral reverse transcriptase from its tRNA primer and of the ribosome in coding regions. Each of the three nucleocapsid interaction motifs falls in a specific genome domain, indicating that local protein interactions can be organized by the long-range architecture of an RNA. 
High-throughput SHAPE reveals a comprehensive view of HIV-1 RNA genome structure, and further application of this technology will make possible newly informative analysis of any RNA in a cellular transcriptome.


Subject(s)
Viral Genome , HIV-1/genetics , Viral RNA/chemistry , Acylation , Amino Acid Sequence , Base Sequence , Binding Sites , DNA Primers/chemistry , Humans , Biological Models , Molecular Sequence Data , Nucleic Acid Conformation , Nucleocapsid Proteins/chemistry , Nucleocapsid Proteins/metabolism , Messenger RNA/chemistry , Messenger RNA/metabolism , Lysine Transfer RNA/chemistry , Lysine Transfer RNA/metabolism , Viral RNA/metabolism , Structure-Activity Relationship , Genetic Transcription
18.
Curr Protoc Bioinformatics ; Chapter 13: Unit 13.9, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18428684

ABSTRACT

Genome-based peptide fingerprint scanning (GFS) directly maps several types of protein mass spectral (MS) data to the loci in the genome that may have encoded the protein. This process can be used either for protein identification or for proteogenomic mapping, that is, gene finding and annotation based on proteomic data. Inputs to the program are one or more mass spectrometry files from peptide mass fingerprinting and/or tandem MS (MS/MS), along with one or more sequences to search them against; the output is the coordinates of any matches found. This unit describes the use of GFS and subsequent results analysis.
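
The general idea of mapping peptide masses back to genomic coordinates can be illustrated with a small Python sketch. This is not the GFS program or its algorithm; it is a toy illustration under stated assumptions: fully tryptic cleavage after K or R (no proline rule, no missed cleavages), unmodified monoisotopic neutral masses, and a fixed mass tolerance. The function names (`translate`, `tryptic_peptides`, `scan_genome`) and the `tol_da` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses (Da); one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def translate(dna):
    """Translate one reading frame; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def tryptic_peptides(protein):
    """Yield (start, end) residue indices of peptides cut after K/R or at stops."""
    start = 0
    for i, aa in enumerate(protein):
        if aa == "*":
            if i > start:
                yield start, i
            start = i + 1
        elif aa in "KR":
            yield start, i + 1
            start = i + 1
    if start < len(protein):
        yield start, len(protein)


def scan_genome(dna, observed_masses, tol_da=0.01):
    """Return (strand, start, end, peptide, mass) for peptides matching a mass.

    Coordinates are 0-based, end-exclusive, on the forward strand.
    """
    dna = dna.upper()
    n = len(dna)
    revcomp = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp)):
        for frame in range(3):
            protein = translate(seq[frame:])
            for s, e in tryptic_peptides(protein):
                pep = protein[s:e]
                if any(a not in RESIDUE_MASS for a in pep):
                    continue  # skip peptides containing unknown residues
                mass = sum(RESIDUE_MASS[a] for a in pep) + WATER
                if any(abs(mass - m) <= tol_da for m in observed_masses):
                    lo, hi = frame + 3 * s, frame + 3 * e
                    if strand == "-":
                        lo, hi = n - hi, n - lo  # map back to forward strand
                    hits.append((strand, lo, hi, pep, round(mass, 4)))
    return hits


# Toy sequence encoding M-A-K-G-S-R followed by a stop codon.
toy_dna = "ATGGCTAAAGGTTCTCGTTAA"
hits = scan_genome(toy_dna, [348.183])
```

Here the query mass corresponds to the peptide MAK, so the hit reports the forward-strand span covering its three codons. A real search would also need to handle modifications, missed cleavages, MS/MS fragment evidence, and scoring of chance matches, none of which this sketch attempts.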


Subject(s)
Algorithms , Mass Spectrometry/methods , Peptide Mapping/methods , Proteins/chemistry , Proteins/genetics , Sequence Analysis/methods , Software , Amino Acid Sequence , Base Sequence , Chromosome Mapping , Genetic Code , Molecular Sequence Data
20.
Bioinformatics ; 24(5): 674-81, 2008 Mar 01.
Article in English | MEDLINE | ID: mdl-18187442

ABSTRACT

MOTIVATION: The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification, we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum. MODEL: Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types, and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence; a correction factor is then added to account for the propensity of the model to favor longer peptides. An expectation value is calculated based on the model score to assess the significance of each peptide/spectrum match. RESULTS: We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identification results at a 1% false positive rate than the best of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types. AVAILABILITY: The program is freely available on ProteomeCommons via an open-source license. See http://bioinfo.unc.edu/downloads/ for the download link.
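
For readers unfamiliar with the Viterbi step mentioned in the abstract, the following Python sketch shows a generic log-space Viterbi decoder. It is not HMM_Score: the two state labels, the three intensity bins, and all probabilities are made-up values chosen only for illustration, and the length-correction factor and expectation value described above are not modeled.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path and its joint log-probability."""
    # best[t][s] = (log-prob of the best path ending in state s at step t,
    #               predecessor state on that path)
    best = [{s: (log_start[s] + log_emit[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, arg = max((prev[p][0] + log_trans[p][s], p) for p in states)
            column[s] = (score + log_emit[s][obs], arg)
        best.append(column)

    # Backtrack from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


def logs(table):
    """Convert a dict (or dict of dicts) of probabilities to logs."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: two ion-type states, three intensity bins.
states = ["b_ion", "y_ion"]
log_start = logs({"b_ion": 0.6, "y_ion": 0.4})
log_trans = logs({"b_ion": {"b_ion": 0.7, "y_ion": 0.3},
                  "y_ion": {"b_ion": 0.4, "y_ion": 0.6}})
log_emit = logs({"b_ion": {"low": 0.5, "medium": 0.4, "high": 0.1},
                 "y_ion": {"low": 0.1, "medium": 0.3, "high": 0.6}})

path, log_p = viterbi(["low", "medium", "high"], states,
                      log_start, log_trans, log_emit)
# path is ['b_ion', 'b_ion', 'y_ion']; exp(log_p) is about 0.01512
```

Working in log space avoids numerical underflow when the observation sequence is long, which is the usual reason to prefer sums of logs over products of probabilities in this kind of decoder.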


Subject(s)
Markov Chains , Peptides/chemistry , Algorithms , Theoretical Models , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry , Tandem Mass Spectrometry