Results 1 - 20 of 25
1.
Proc Natl Acad Sci U S A ; 111(17): 6131-8, 2014 Apr 29.
Article in English | MEDLINE | ID: mdl-24753594

ABSTRACT

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.


Subjects
DNA/genetics , Genome, Human/genetics , Biological Evolution , Disease/genetics , Humans , Regulatory Sequences, Nucleic Acid/genetics , Software
2.
Genome Res ; 22(9): 1646-57, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955977

ABSTRACT

Data from the Encyclopedia of DNA Elements (ENCODE) project show over 9640 human genome loci classified as long noncoding RNAs (lncRNAs), yet only ~100 have been deeply characterized to determine their role in the cell. To measure the protein-coding output from these RNAs, we jointly analyzed two recent data sets produced in the ENCODE project: tandem mass spectrometry (MS/MS) data mapping expressed peptides to their encoding genomic loci, and RNA-seq data generated by ENCODE in long polyA+ and polyA- fractions in the cell lines K562 and GM12878. We used the machine-learning algorithm RuleFit3 to regress the peptide data against RNA expression data. The most important covariate for predicting translation was, surprisingly, the Cytosol polyA- fraction in both cell lines. LncRNAs are ~13-fold less likely to produce detectable peptides than similar mRNAs, indicating that ~92% of GENCODE v7 lncRNAs are not translated in these two ENCODE cell lines. Intersecting 9640 lncRNA loci with 79,333 peptides yielded 85 unique peptides matching 69 lncRNAs. Most cases were due to a coding transcript misannotated as lncRNA. Two exceptions were an unprocessed pseudogene and a bona fide lncRNA gene, both with open reading frames (ORFs) compromised by upstream stop codons. All potentially translatable lncRNA ORFs had only a single peptide match, indicating low protein abundance and/or false-positive peptide matches. We conclude that with very few exceptions, ribosomes are able to distinguish coding from noncoding transcripts and, hence, that ectopic translation and cryptic mRNAs are rare in the human lncRNAome.


Subjects
Protein Biosynthesis , RNA, Long Noncoding/genetics , Amino Acid Sequence , Base Sequence , Cell Line , Gene Expression , Gene Expression Profiling , Gene Expression Regulation , Humans , K562 Cells , Molecular Sequence Annotation , Molecular Sequence Data , Peptides/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Alignment , Tandem Mass Spectrometry/methods
3.
J Proteome Res ; 12(6): 3019-25, 2013 Jun 07.
Article in English | MEDLINE | ID: mdl-23614390

ABSTRACT

Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs a decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy .
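
The decoy-based filtering mentioned in this abstract can be illustrated with a generic target-decoy calculation. The sketch below is not Peppy's code (Peppy is written in Java); the function name `score_threshold_at_fdr`, the `(score, is_decoy)` input format, and the decoys/targets estimate of the false discovery rate are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    matches: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    matches: (score, is_decoy) pairs, where a higher score is a better match.
    The FDR at a cutoff is estimated as decoys / targets among all matches
    scoring at or above that cutoff. Tied scores are not treated specially.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate stays in bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff
```

Matches scoring at or above the returned cutoff would be reported; a real pipeline would also need a policy for tied scores and for how the decoy database is built.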


Subjects
Molecular Sequence Annotation , Peptide Fragments/isolation & purification , Proteins/isolation & purification , Proteomics , Software , Algorithms , Amino Acid Sequence , Base Sequence , Cell Line , Databases, Protein , Humans , Molecular Sequence Data , Tandem Mass Spectrometry
4.
J Proteome Res ; 12(9): 4240-7, 2013 Sep 06.
Article in English | MEDLINE | ID: mdl-23875887

ABSTRACT

Peppy, the proteogenomic/proteomic search software, employs a novel method for assessing the match quality between an MS/MS spectrum and a theorized peptide sequence. The scoring system uses three score factors calculated with binomial probabilities: the probability that a fragment ion will randomly align with a peptide ion, the probability that the aligning ions will be selected from subsets of the most intense peaks, and the probability that the intensities of fragment ions identified as y-ions are greater than those of their counterpart b-ions. The scores produced by the method act as global confidence scores, which facilitate the accurate comparison of results and the estimation of false discovery rates. Peppy has been integrated into the meta-search engine PepArML to produce meaningful comparisons with Mascot, MSGF+, OMSSA, X!Tandem, k-Score and s-Score. For two of the four data sets examined with the PepArML analysis, Peppy exceeded the accuracy performance of the other scoring systems. Peppy is available for download at http://geneffects.com/peppy .
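
The first score factor described here, the probability that fragment ions align with peptide ions by chance, can be illustrated with a binomial tail probability. This is a minimal sketch under stated assumptions, not Peppy's scoring code: the names `binomial_tail` and `match_score`, the single per-ion random-alignment probability, and the negative log10 transform are hypothetical choices made for the example.

```python
import sys
from math import comb, log10


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_fragment_ions: int, n_aligned: int, p_random_alignment: float) -> float:
    """Score a peptide/spectrum match as -log10 of the chance-alignment probability.

    n_fragment_ions: number of theoretical fragment ions considered (assumed input).
    n_aligned: how many of them aligned with an observed peak.
    p_random_alignment: assumed probability that one ion aligns purely by chance.
    """
    prob = binomial_tail(n_fragment_ions, n_aligned, p_random_alignment)
    # Guard against log10(0) when the tail probability underflows.
    return -log10(max(prob, sys.float_info.min))
```

Under these assumptions, more aligned ions at the same chance probability give a smaller tail probability and therefore a higher score.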


Subjects
Peptide Mapping , Software , Algorithms , Amino Acid Sequence , Blood Proteins/chemistry , Humans , Molecular Sequence Data , Peptide Fragments/chemistry , Sequence Analysis, Protein , Tandem Mass Spectrometry
5.
BMC Genomics ; 14: 141, 2013 Feb 28.
Article in English | MEDLINE | ID: mdl-23448259

ABSTRACT

BACKGROUND: Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome. RESULTS: We generated ~1 million high-resolution tandem mass (MS/MS) spectra for Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome, and the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing the confidence of the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt. CONCLUSIONS: The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.


Subjects
Databases, Genetic , Genome, Human , Molecular Sequence Annotation , Open Reading Frames/genetics , Cell Line , Chromosome Mapping , Computational Biology , Humans , Mass Spectrometry , Sequence Analysis, DNA
6.
Theor Biol Med Model ; 10: 23, 2013 Apr 03.
Article in English | MEDLINE | ID: mdl-23551850

ABSTRACT

BACKGROUND: It is a fascinating phenomenon that in genetically identical bacteria populations of Bacillus subtilis, a distinct DNA uptake phenotype called the competence phenotype may emerge in 10-20% of the population. Many aspects of the phenomenon are believed to be due to the variable expression of critical genes: a stochastic occurrence termed "noise" which has made the phenomenon difficult to examine directly by lab experimentation. METHODS: To capture and model noise in this system and further understand the emergence of competence both at the intracellular and culture levels in B. subtilis, we developed a novel multi-scale, agent-based model. At the intracellular level, our model recreates the regulatory network involved in the competence phenotype. At the culture level, we simulated growth conditions, with our multi-scale model providing feedback between the two levels. RESULTS: Our model predicted three potential sources of genetic "noise". First, the random spatial arrangement of molecules may influence the manifestation of the competence phenotype. In addition, the evidence suggests that there may be a type of epigenetic heritability to the emergence of competence, influenced by the molecular concentrations of key competence molecules inherited through cell division. Finally, the emergence of competence during the stationary phase may in part be due to the dilution effect of cell division upon protein concentrations. CONCLUSIONS: The competence phenotype was easily translated into an agent-based model - one with the ability to illuminate complex cell behavior. Models such as the one described in this paper can simulate cell behavior that is otherwise unobservable in vivo, highlighting their potential usefulness as research tools.


Subjects
Bacillus subtilis/physiology , Models, Theoretical , Bacillus subtilis/genetics , Genes, Bacterial , Protein Biosynthesis , Stochastic Processes , Transcription, Genetic
7.
Anal Chem ; 84(21): 9008-14, 2012 Nov 06.
Article in English | MEDLINE | ID: mdl-23030679

ABSTRACT

Membrane proteomics, the large-scale analysis of membrane proteins, is often constrained by the difficulties of achieving fully resolvable separation and resistance to proteolysis, both of which could lead to low recovery and low identification rates of membrane proteins. Here, we introduce a novel integrated approach, GELFrEE Optimized FASP Technology (GOFAST), for large-scale and comprehensive membrane protein analysis. Using an array of sample preparation techniques including gel-eluted liquid fraction entrapment electrophoresis (GELFrEE), filter-aided sample preparation (FASP), and microwave-assisted on-filter enzymatic digestion, we identified 2,090 proteins from the membrane fraction of a leukemia cell line (K562). Of these, 37% are annotated as membrane proteins according to gene ontology analysis, resulting in the largest membrane proteome of leukemia cells reported to date. Our approach combines the advantages of GELFrEE high-loading capacity, gel-free separation, efficient depletion of detergents, and microwave-assisted on-filter digestion, minimizing sample losses and maximizing MS-detectable sequence coverage of individual proteins. In addition, this approach shows great potential for the identification of alternative splicing products.


Subjects
Analytic Sample Preparation Methods/methods , Electrophoresis/methods , Membrane Proteins/analysis , Proteome/analysis , Proteomics/methods , Filtration , Humans , K562 Cells , Membrane Proteins/chemistry , Membrane Proteins/isolation & purification , Protein Isoforms/analysis , Protein Isoforms/chemistry , Proteome/chemistry
8.
Bioinformatics ; 27(6): 844-52, 2011 Mar 15.
Article in English | MEDLINE | ID: mdl-21389073

ABSTRACT

MOTIVATION: Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques, especially bottom-up and top-down mass spectrometry, provides complementary information. When integrated with background knowledge, this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated. RESULTS: This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE), applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and for new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein, the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses. AVAILABILITY: Software, demo projects and source can be downloaded from http://pie.giddingslab.org/


Subjects
Mass Spectrometry/methods , Protein Processing, Post-Translational , Proteins/chemistry , Software , Algorithms , Bacterial Proteins/analysis , Bacterial Proteins/chemistry , Escherichia coli/chemistry , Markov Chains , Monte Carlo Method , Protein Isoforms/analysis , Protein Isoforms/chemistry , Proteins/analysis , Proteomics/methods , Ribosomal Proteins/analysis , Ribosomal Proteins/chemistry
10.
RNA ; 15(7): 1314-21, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19458034

ABSTRACT

Hydroxyl-selective electrophiles, including N-methylisatoic anhydride (NMIA) and 1-methyl-7-nitroisatoic anhydride (1M7), are broadly useful for RNA structure analysis because they react preferentially with the ribose 2'-OH group at conformationally unconstrained or flexible nucleotides. Each nucleotide in an RNA has the potential to form an adduct with these reagents to yield a comprehensive, nucleotide-resolution, view of RNA structure. However, it is possible that factors other than local structure modulate reactivity. To evaluate the influence of base identity on the intrinsic reactivity of each nucleotide, we analyze NMIA and 1M7 reactivity using four distinct RNAs, under both native and denaturing conditions. We show that guanosine and adenosine residues have identical intrinsic 2'-hydroxyl reactivities at pH 8.0 and are 1.4 and 1.7 times more reactive than uridine and cytidine, respectively. These subtle, but statistically significant, differences do not impact the ability of selective 2'-hydroxyl acylation analyzed by primer extension-based (SHAPE) methods to establish an RNA secondary structure or monitor RNA folding in solution because base-specific influences are much smaller than the reactivity differences between paired and unpaired nucleotides.


Subjects
Anhydrides/chemistry , Hydroxyl Radical/chemistry , RNA/chemistry , Ribose/chemistry , ortho-Aminobenzoates/chemistry , Acylation , HIV-1/genetics , Nucleic Acid Conformation , RNA/genetics , RNA/metabolism , RNA, Ribosomal/genetics , Ribonuclease P/genetics
11.
PLoS Biol ; 6(4): e96, 2008 Apr 29.
Article in English | MEDLINE | ID: mdl-18447581

ABSTRACT

Replication and pathogenesis of the human immunodeficiency virus (HIV) is tightly linked to the structure of its RNA genome, but genome structure in infectious virions is poorly understood. We invent high-throughput SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) technology, which uses many of the same tools as DNA sequencing, to quantify RNA backbone flexibility at single-nucleotide resolution and from which robust structural information can be immediately derived. We analyze the structure of HIV-1 genomic RNA in four biologically instructive states, including the authentic viral genome inside native particles. Remarkably, given the large number of plausible local structures, the first 10% of the HIV-1 genome exists in a single, predominant conformation in all four states. We also discover that noncoding regions functioning in a regulatory role have significantly lower (p-value < 0.0001) SHAPE reactivities, and hence more structure, than do viral coding regions that function as the template for protein synthesis. By directly monitoring protein binding inside virions, we identify the RNA recognition motif for the viral nucleocapsid protein. Seven structurally homologous binding sites occur in a well-defined domain in the genome, consistent with a role in directing specific packaging of genomic RNA into nascent virions. In addition, we identify two distinct motifs that are targets for the duplex destabilizing activity of this same protein. The nucleocapsid protein destabilizes local HIV-1 RNA structure in ways likely to facilitate initial movement both of the retroviral reverse transcriptase from its tRNA primer and of the ribosome in coding regions. Each of the three nucleocapsid interaction motifs falls in a specific genome domain, indicating that local protein interactions can be organized by the long-range architecture of an RNA. 
High-throughput SHAPE reveals a comprehensive view of HIV-1 RNA genome structure, and further application of this technology will make possible newly informative analysis of any RNA in a cellular transcriptome.


Subjects
Genome, Viral , HIV-1/genetics , RNA, Viral/chemistry , Acylation , Amino Acid Sequence , Base Sequence , Binding Sites , DNA Primers/chemistry , Humans , Models, Biological , Molecular Sequence Data , Nucleic Acid Conformation , Nucleocapsid Proteins/chemistry , Nucleocapsid Proteins/metabolism , RNA, Messenger/chemistry , RNA, Messenger/metabolism , RNA, Transfer, Lys/chemistry , RNA, Transfer, Lys/metabolism , RNA, Viral/metabolism , Structure-Activity Relationship , Transcription, Genetic
12.
Antimicrob Agents Chemother ; 54(11): 4626-35, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20696867

ABSTRACT

Microbes have developed resistance to nearly every antibiotic, yet the steps leading to drug resistance remain unclear. Here we report a multistage process by which Pseudomonas aeruginosa acquires drug resistance following exposure to ciprofloxacin at levels ranging from 0.5× to 8× the initial MIC. In stage I, susceptible cells are killed en masse by the exposure. In stage II, a small, slow to nongrowing population survives antibiotic exposure that does not exhibit significantly increased resistance according to the MIC measure. In stage III, exhibited at 0.5× to 4× the MIC, a growing population emerges to reconstitute the population, and these cells display heritable increases in drug resistance of up to 50 times the original level. We studied the stage III cells by proteomic methods to uncover differences in the regulatory pathways that are involved in this phenotype, revealing upregulation of phosphorylation on two proteins, succinate-semialdehyde dehydrogenase (SSADH) and methylmalonate-semialdehyde dehydrogenase (MMSADH), and also revealing upregulation of a highly conserved protein of unknown function. Transposon disruption in the encoding genes for each of these targets substantially dampened the ability of cells to develop the stage III phenotype. Considering these results in combination with computational models of resistance and genomic sequencing results, we postulate that stage III heritable resistance develops from a combination of both genomic mutations and modulation of one or more preexisting cellular pathways.


Subjects
Anti-Infective Agents/pharmacology , Bacterial Proteins/metabolism , Ciprofloxacin/pharmacology , Drug Resistance, Bacterial/physiology , Pseudomonas aeruginosa/drug effects , Pseudomonas aeruginosa/metabolism , Bacterial Proteins/genetics , DNA, Bacterial/genetics , Drug Resistance, Bacterial/genetics , Electrophoresis, Gel, Two-Dimensional , Methylmalonate-Semialdehyde Dehydrogenase (Acylating)/genetics , Methylmalonate-Semialdehyde Dehydrogenase (Acylating)/metabolism , Microbial Sensitivity Tests , Pseudomonas aeruginosa/genetics , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Succinate-Semialdehyde Dehydrogenase/genetics , Succinate-Semialdehyde Dehydrogenase/metabolism
13.
RNA ; 14(10): 1979-90, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18772246

ABSTRACT

Analysis of the long-range architecture of RNA is a challenging experimental and computational problem. Local nucleotide flexibility, which directly reports underlying base pairing and tertiary interactions in an RNA, can be comprehensively assessed at single nucleotide resolution using high-throughput selective 2'-hydroxyl acylation analyzed by primer extension (hSHAPE). hSHAPE resolves structure-sensitive chemical modification information by high-resolution capillary electrophoresis and typically yields quantitative nucleotide flexibility information for 300-650 nucleotides (nt) per experiment. The electropherograms generated in hSHAPE experiments provide a wealth of structural information; however, significant algorithmic analysis steps are required to generate quantitative and interpretable data. We have developed a set of software tools called ShapeFinder to make possible rapid analysis of raw sequencer data from hSHAPE, and most other classes of nucleic acid reactivity experiments. The algorithms in ShapeFinder (1) convert measured fluorescence intensity to quantitative cDNA fragment amounts, (2) correct for signal decay over read lengths extending to 600 nts or more, (3) align reactivity data to the known RNA sequence, and (4) quantify per nucleotide reactivities using whole-channel Gaussian integration. The algorithms and user interface tools implemented in ShapeFinder create new opportunities for tackling ambitious problems involving high-throughput analysis of structure-function relationships in large RNAs.


Subjects
Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Sequence Analysis, RNA/methods , Software , Algorithms , Base Sequence , Electrophoresis, Capillary , Nucleotides/chemistry , RNA/isolation & purification
14.
BMC Bioinformatics ; 10: 254, 2009 Aug 19.
Article in English | MEDLINE | ID: mdl-19691849

ABSTRACT

BACKGROUND: Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogenous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping). RESULTS: We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research. 
CONCLUSION: We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era.


Subjects
Computational Biology/methods , Database Management Systems , Systems Biology , Databases, Factual , Information Storage and Retrieval/methods
15.
Bioinformatics ; 24(5): 674-81, 2008 Mar 01.
Article in English | MEDLINE | ID: mdl-18187442

ABSTRACT

MOTIVATION: The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum. MODEL: Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence, then a correction factor is added to account for the propensity of the model to favor longer peptides. An expectation value is calculated based on the model score to assess the significance of each peptide/spectrum match. RESULTS: We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identification results at a 1% false positive rate than the best of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types. AVAILABILITY: The program is freely available on ProteomeCommons via an OpenSource license. See http://bioinfo.unc.edu/downloads/ for the download link.
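
The Viterbi step named in this abstract can be illustrated with a generic log-space implementation. This is a sketch of the textbook algorithm, not HMM_Score itself: the two states `A` and `B`, the symbols `x` and `y`, and all probabilities are hypothetical toy values, and the abstract's emission factors and length correction are not modelled. The sketch assumes a non-empty observation sequence and nonzero probabilities.

```python
from math import exp, log
from typing import Dict, List, Sequence, Tuple

Table = Dict[str, Dict[str, float]]


def to_log(table: Table) -> Table:
    """Convert a nested table of nonzero probabilities to log space."""
    return {row: {col: log(p) for col, p in cols.items()} for row, cols in table.items()}


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    log_start: Dict[str, float],
    log_trans: Table,
    log_emit: Table,
) -> Tuple[List[str], float]:
    """Return the most probable state path and its log joint probability."""
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        prev = best[-1]
        col: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Pick the best predecessor state for s at this position.
            q = max(states, key=lambda r: prev[r] + log_trans[r][s])
            col[s] = prev[q] + log_trans[q][s] + log_emit[s][obs]
            ptr[s] = q
        best.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy two-state model (hypothetical numbers, not trained parameters).
STATES = ["A", "B"]
LOG_START = {"A": log(0.5), "B": log(0.5)}
LOG_TRANS = to_log({"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}})
LOG_EMIT = to_log({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}})

path, log_p = viterbi(["x", "x", "y", "y"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
joint_p = exp(log_p)  # joint probability of the best path and the observations
```

Working in log space avoids numerical underflow on long sequences, which matters when many small probabilities are multiplied along a path.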


Subjects
Markov Chains , Peptides/chemistry , Algorithms , Models, Theoretical , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Tandem Mass Spectrometry
16.
BMC Microbiol ; 7: 63, 2007 Jul 02.
Article in English | MEDLINE | ID: mdl-17605819

ABSTRACT

BACKGROUND: Little is known regarding the extent or targets of phosphorylation in mycoplasmas, yet in many other bacterial species phosphorylation is known to play an important role in signaling and regulation of cellular processes. To determine the prevalence of phosphorylation in mycoplasmas, we examined the CHAPS-soluble protein fractions of Mycoplasma genitalium and Mycoplasma pneumoniae by two-dimensional gel electrophoresis (2-DE), using a combination of Pro-Q Diamond phosphoprotein stain and 33P labeling. Protein spots that were positive for phosphorylation were identified by peptide mass fingerprinting using MALDI-TOF-TOF mass spectrometry. RESULTS: We identified a total of 24 distinct phosphoproteins, about 3% and 5% of the total protein complement in M. pneumoniae and M. genitalium, respectively, indicating that phosphorylation occurs with prevalence similar to many other bacterial species. Identified phosphoproteins include pyruvate dehydrogenase E1 alpha and beta subunits, enolase, heat shock proteins DnaK and GroEL, elongation factor Tu, cytadherence accessory protein HMW3, P65, and several hypothetical proteins. These proteins are involved in energy metabolism, carbohydrate metabolism, translation/transcription and cytadherence. Interestingly, fourteen of the 24 phosphoproteins we identified (58%) were previously reported as putatively associated with a cytoskeleton-like structure that is present in the mycoplasmas, indicating a potential regulatory role for phosphorylation in this structure. CONCLUSION: This study has shown that phosphorylation in mycoplasmas is comparable to that of other bacterial species. Our evidence supports a link between phosphorylation and cytadherence and/or a cytoskeleton-like structure, since over half of the proteins identified as phosphorylated have been previously associated with these functions. This opens the door to further research into the purposes and mechanisms of phosphorylation for mycoplasmas.


Subjects
Bacterial Proteins/metabolism , Mycoplasma genitalium/metabolism , Phosphoproteins/metabolism , Pneumonia, Mycoplasma/metabolism , Bacterial Proteins/chemistry , Carbohydrate Metabolism , Electrophoresis, Gel, Two-Dimensional , Energy Metabolism , Peptide Elongation Factor Tu/chemistry , Peptide Elongation Factor Tu/metabolism , Peptide Mapping , Phosphoproteins/chemistry , Phosphopyruvate Hydratase/chemistry , Phosphopyruvate Hydratase/metabolism , Phosphorylation , Proteomics/methods , Pyruvate Dehydrogenase (Lipoamide)/chemistry , Pyruvate Dehydrogenase (Lipoamide)/metabolism , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
17.
BMC Evol Biol ; 6: 16, 2006 Feb 16.
Article in English | MEDLINE | ID: mdl-16483367

ABSTRACT

BACKGROUND: The Dscam gene in the fruit fly, Drosophila melanogaster, contains twenty-four exons, four of which are composed of tandem arrays that each undergo mutually exclusive alternative splicing (4, 6, 9 and 17), potentially generating 38,016 protein isoforms. This degree of transcript diversity has not been found in mammalian homologs of Dscam. We examined the molecular evolution of exons within this gene family to locate the point of divergence for this alternative splicing pattern. RESULTS: Using the fruit fly Dscam exons 4, 6, 9 and 17 as seed sequences, we iteratively searched sixteen genomes for homologs, and then performed phylogenetic analyses of the resulting sequences to examine their evolutionary history. We found homologs in the nematode, arthropod and vertebrate genomes, including homologs in several vertebrates where Dscam had not been previously annotated. Among these, only the arthropods contain homologs arranged in tandem arrays indicative of mutually exclusive splicing. We found no homologs to these exons within the Arabidopsis, yeast, tunicate or sea urchin genomes but homologs to several constitutive exons from fly Dscam were present within tunicate and sea urchin. Comparing the rate of turnover within the tandem arrays of the insect taxa (fruit fly, mosquito and honeybee), we found the variants within exons 4 and 17 are well conserved in number and spatial arrangement despite 248-283 million years of divergence. In contrast, the variants within exons 6 and 9 have undergone considerable turnover since these taxa diverged, as indicated by deeply branching taxon-specific lineages. CONCLUSION: Our results suggest that at least one Dscam exon array may be an ancient duplication that predates the divergence of deuterostomes from protostomes but that there is no evidence for the presence of arrays in the common ancestor of vertebrates. The different patterns of conservation and turnover among the Dscam exon arrays provide a striking example of how a gene can evolve in a modular fashion rather than as a single unit.


Subjects
Alternative Splicing/genetics , Evolution, Molecular , Exons/genetics , Multigene Family/genetics , Proteins/genetics , Amino Acid Sequence , Animals , Conserved Sequence , Drosophila melanogaster/genetics , Humans , Molecular Sequence Data , Phylogeny , Sequence Alignment , Sequence Homology, Amino Acid
19.
Microb Drug Resist ; 19(6): 428-36, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23808957

ABSTRACT

The alarming rise of ciprofloxacin-resistant Pseudomonas aeruginosa has been reported in several clinical studies. Though the mutation of resistance genes and their role in drug resistance has been researched, the process by which the bacterium acquires high-level resistance is still not well understood. How does the genomic evolution of P. aeruginosa affect resistance development? Could exposure of the bacteria to antibiotics enrich genomic variants that lead to the development of resistance, and if so, how are these variants distributed through the genome? To answer these questions, we performed 454 pyrosequencing and a whole genome analysis both before and after exposure to ciprofloxacin. The comparative sequence data revealed 93 unique resistance strain variation sites, which included a mutation in the DNA gyrase subunit A gene. We generated variation-distribution maps comparing the wild-type and resistant strains, and isolated 19 candidates from three discrete resistance-associated high variability regions that had available transposon mutants, to perform a ciprofloxacin exposure assay. Of these region candidates with transposon disruptions, 79% (15/19) showed a reduction in the ability to gain high-level resistance, suggesting that genes within these high variability regions might enrich for certain functions associated with resistance development.


Subjects
DNA Gyrase/genetics , Drug Resistance, Bacterial/genetics , Genome, Bacterial , Mutation , Pseudomonas aeruginosa/genetics , Anti-Bacterial Agents/pharmacology , Ciprofloxacin/pharmacology , DNA Transposable Elements , Genetic Variation , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Pseudomonas aeruginosa/drug effects
20.
Methods Mol Biol ; 694: 255-90, 2011.
Article in English | MEDLINE | ID: mdl-21082440

ABSTRACT

This chapter describes using the Protein Inference Engine (PIE) to integrate various types of data, especially top-down and bottom-up mass spectrometry (MS) data, to describe a protein's posttranslational modifications (PTMs). PTMs include cleavage events such as the N-terminal loss of methionine and residue modifications like phosphorylation. Modifications are key elements in many biological processes, but are difficult to study as no single, general method adequately characterizes a protein's PTMs; manually integrating data from several MS experiments is usually required. The PIE is designed to automate this process using a guess-and-refine process similar to how an expert manually integrates data. The PIE repeatedly "imagines" a possible modification set, evaluates it using available data, and then tries to improve on it. After many rounds of refinement, the resulting modification set is proposed as a candidate answer. Multiple candidate answers are generated to obtain both best and near-best answers. Near-best answers are crucial in allowing for proteins with more than one supported modification pattern (isoforms) and obtaining robust results given incomplete and inconsistent data. The goal of this chapter is to walk the reader through installing and using the downloadable version of PIE, both in general and by means of a specific, detailed example. The example integrates several types of experimental and background (prior) data. It is not a "perfect-world" scenario, but has been designed to illustrate several real-world difficulties that may be encountered when trying to analyze imperfect data.


Subjects
Computational Biology/methods , Electronic Data Processing/methods , Protein Processing, Post-Translational , Proteins/metabolism , Software , Amino Acid Sequence , Mass Spectrometry , Molecular Sequence Data , Molecular Weight , Peptides/chemistry , Peptides/metabolism , Phosphorylation , Protein Isoforms/chemistry , Protein Isoforms/metabolism , Proteins/chemistry