1.
J Proteome Res ; 23(6): 1983-1999, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38728051

ABSTRACT

In recent years, several deep learning-based methods have been proposed for predicting peptide fragment intensities. This study aims to provide a comprehensive assessment of six such methods, namely Prosit, DeepMass:Prism, pDeep3, AlphaPeptDeep, Prosit Transformer, and the method proposed by Guan et al. To this end, we evaluated the accuracy of the predicted intensity profiles for close to 1.7 million precursors (including both tryptic and HLA peptides) corresponding to more than 18 million experimental spectra procured from 40 independent submissions to the PRIDE repository that were acquired for different species using a variety of instruments and different dissociation types/energies. Specifically, for each method, distributions of similarity (measured by Pearson's correlation and normalized angle) between the predicted and the corresponding experimental b and y fragment intensities were generated. These distributions were used to ascertain the prediction accuracy and rank the prediction methods for particular types of experimental conditions. The effect of variables like precursor charge, length, and collision energy on the prediction accuracy was also investigated. In addition to prediction accuracy, the methods were evaluated in terms of prediction speed. The systematic assessment of these six methods may help in choosing the right method for MS/MS spectra prediction for particular needs.
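
The two similarity measures named in the abstract can be sketched as follows. This is a minimal illustration, not the study's code: the abstract does not define "normalized angle", so the formula 1 - 2*arccos(cosine)/pi used here is an assumption (a common spectral-angle convention), and the function names are hypothetical.

```python
import math

def pearson(predicted, observed):
    """Pearson correlation between two equal-length intensity vectors."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def normalized_spectral_angle(predicted, observed):
    """Assumed definition: 1 - 2*arccos(cosine similarity)/pi.

    Gives 1 for vectors pointing in the same direction and 0 for
    orthogonal non-negative vectors.
    """
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

Both functions expect the b and y fragment intensities of one spectrum in the same fragment order for the predicted and the experimental vector.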


Subject(s)
Deep Learning , Humans , Peptide Fragments/chemistry , Peptide Fragments/analysis , Tandem Mass Spectrometry/methods , Tandem Mass Spectrometry/statistics & numerical data , Proteomics/methods , Proteomics/statistics & numerical data
2.
Nucleic Acids Res ; 50(2): e11, 2022 01 25.
Article in English | MEDLINE | ID: mdl-34791389

ABSTRACT

The choice of guide RNA (gRNA) for CRISPR-based gene targeting is an essential step in gene editing applications, but the prediction of gRNA specificity remains challenging. Lack of transparency and focus on point estimates of efficiency disregarding the information on possible error sources in the model limit the power of existing Deep Learning-based methods. To overcome these problems, we present a new approach, a hybrid of Capsule Networks and Gaussian Processes. Our method predicts the cleavage efficiency of a gRNA with a corresponding confidence interval, which allows the user to incorporate information regarding possible model errors into the experimental design. We provide the first utilization of uncertainty estimation in computational gRNA design, which is a critical step toward accurate decision-making for future CRISPR applications. The proposed solution demonstrates acceptable confidence intervals for most test sets and shows regression quality similar to existing models. We introduce a set of criteria for gRNA selection based on off-target cleavage efficiency and its variance and present a collection of pre-computed gRNAs for human chromosome 22. Using Neural Network Interpretation methods, we show that our model rediscovers an established biological factor underlying cleavage efficiency, the importance of the seed region in gRNA.
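
How a predicted efficiency with an uncertainty estimate could feed a selection rule can be sketched as below. Everything here is an assumption for illustration: the abstract does not give the selection criteria, so the Gaussian 95% interval, the `max_upper` cutoff, and the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal

def interval(mean, variance, z=Z95):
    """Symmetric interval mean +/- z*sd for a Gaussian prediction."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width

def passes_off_target_filter(off_targets, max_upper=0.2):
    """Accept a gRNA only if every predicted off-target efficiency,
    taken at the upper end of its interval, stays at or below the cutoff.

    off_targets: list of (predicted_mean, predicted_variance) pairs.
    max_upper: hypothetical cutoff, not a value from the abstract.
    """
    return all(interval(m, v)[1] <= max_upper for m, v in off_targets)
```

Filtering on the upper interval bound rather than on the point estimate is one way to penalize off-target predictions the model is unsure about.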


Subject(s)
CRISPR-Cas Systems , Deep Learning , Gene Editing , Gene Targeting , RNA, Guide, Kinetoplastida/genetics , Algorithms , Gene Editing/methods , Gene Targeting/methods , Genomics/methods , Humans , Neural Networks, Computer , Reproducibility of Results
3.
Nucleic Acids Res ; 46(13): 6712-6725, 2018 07 27.
Article in English | MEDLINE | ID: mdl-29788454

ABSTRACT

Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ∼0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveals heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.


Subject(s)
Chromosomes, Human, Pair 21 , DNA, Ribosomal/chemistry , Genes, rRNA , Genetic Variation , Animals , Cell Line , Cloning, Molecular , DNA, Ribosomal/isolation & purification , DNA, Ribosomal Spacer/chemistry , Humans , Mice , Nucleic Acid Conformation , Nucleolus Organizer Region/chemistry , RNA, Ribosomal/chemistry , RNA, Ribosomal/metabolism , Sequence Analysis, DNA
4.
Proteomics ; 19(14): e1800367, 2019 07.
Article in English | MEDLINE | ID: mdl-30908818

ABSTRACT

Mass spectrometry-based proteomics starts with identifications of peptides and proteins, which provide the bases for forming the next-level hypotheses whose "validations" are often employed for forming even higher level hypotheses and so forth. Scientifically meaningful conclusions are thus attainable only if the number of falsely identified peptides/proteins is accurately controlled. For this reason, RAId has continued to be developed over the past decade. RAId employs rigorous statistics for peptide/protein identification, hence assigning accurate P-values/E-values that can be used confidently to control the number of falsely identified peptides and proteins. The RAId web service is a versatile tool built to identify peptides and proteins from tandem mass spectrometry data. In addition to recognizing various spectrum file formats, the web service offers four peptide scoring functions and a choice of three statistical methods for assigning P-values/E-values to identified peptides. Users may upload their own protein database or use one of the available knowledge-integrated organismal databases that contain annotated information such as single amino acid polymorphisms, post-translational modifications, and their disease associations. The web service also provides a friendly interface to display, sort using different criteria, and download the identified peptides and proteins. The RAId web service is freely available at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid.


Subject(s)
Databases, Protein , Mass Spectrometry/methods , Proteomics/methods , Computational Biology
5.
Nucleic Acids Res ; 44(22): 10898-10911, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27466388

ABSTRACT

Specific structures in mRNA modulate translation rate and thus can affect protein folding. Using the protein structures from two eukaryotes and three prokaryotes, we explore the connections between the protein compactness, inferred from solvent accessibility, and mRNA structure, inferred from mRNA folding energy (ΔG). In both prokaryotes and eukaryotes, the ΔG value of the most stable 30 nucleotide segment of the mRNA (ΔGmin) strongly, positively correlates with protein solvent accessibility. Thus, mRNAs containing exceptionally stable secondary structure elements typically encode compact proteins. The correlations between ΔG and protein compactness are much more pronounced in predicted ordered parts of proteins compared to the predicted disordered parts, indicative of an important role of mRNA secondary structure elements in the control of protein folding. Additionally, ΔG correlates with the mRNA length and the evolutionary rate of synonymous positions. The correlations are partially independent and were used to construct multiple regression models which explain about half of the variance of protein solvent accessibility. These findings suggest a model in which the mRNA structure, particularly exceptionally stable RNA structural elements, act as gauges of protein co-translational folding by reducing ribosome speed when the nascent peptide needs time to form and optimize the core structure.


Subject(s)
Protein Folding , RNA, Messenger/physiology , Animals , Base Composition , Humans , Kinetics , Linear Models , Models, Molecular , Nucleic Acid Conformation , Protein Biosynthesis , Protein Structure, Secondary , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , RNA Stability , RNA, Messenger/chemistry , Thermodynamics , Transcriptome
6.
Bioinformatics ; 32(17): i552-i558, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587674

ABSTRACT

MOTIVATION: Target-specific hybridization depends on oligo-probe characteristics that improve hybridization specificity and minimize genome-wide cross-hybridization. Interplay between specific hybridization and genome-wide cross-hybridization has been insufficiently studied, despite its crucial role in efficient probe design and in data analysis. RESULTS: In this study, we defined hybridization specificity as the ratio between oligo target-specific hybridization and oligo genome-wide cross-hybridization. A microarray database, derived from a Genomic Comparison Hybridization (GCH) experiment performed on the Affymetrix platform, contains two different types of probes. Oligo-probes of the first type do not have a specific target on the genome, and their hybridization signals are derived from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target on the genomic DNA, and their signals combine specific and cross-hybridization components in a total signal. A comparative analysis of the hybridization specificity of oligo-probes, as well as their nucleotide sequences and thermodynamic features, was performed on the database. The comparison revealed that hybridization specificity was negatively affected by low stability of the fully-paired oligo-target duplex, stable probe self-folding, G-rich content (including GGG motifs), low sequence complexity and nucleotide composition symmetry. CONCLUSION: Filtering out the probes with the defined 'negative' characteristics significantly increases specific hybridization and dramatically decreases genome-wide cross-hybridization. Selected oligo-probes have, on average, two times higher hybridization specificity than the probes that were filtered from the analysis by applying the suggested cutoff thresholds to the described parameters. A new approach for efficient oligo-probe design is described in our study.
CONTACT: shabalin@ncbi.nlm.nih.gov or olga.matveeva@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
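
The specificity ratio and a sequence-based probe filter of the kind described above might look like the sketch below. It covers only two of the listed characteristics (G content and GGG motifs); the cutoff `max_g_fraction` and all function names are hypothetical, since the abstract does not state the thresholds used.

```python
def g_fraction(probe):
    """Fraction of G nucleotides in the probe sequence."""
    probe = probe.upper()
    return probe.count("G") / len(probe)

def has_ggg(probe):
    """True if the probe contains at least one GGG motif."""
    return "GGG" in probe.upper()

def hybridization_specificity(specific_signal, cross_signal):
    """Ratio of target-specific signal to genome-wide cross-hybridization signal."""
    return specific_signal / cross_signal

def keep_probe(probe, max_g_fraction=0.35):
    """Keep probes without GGG motifs whose G fraction is at most the cutoff.

    max_g_fraction is an illustrative value, not a threshold from the study.
    """
    return not has_ggg(probe) and g_fraction(probe) <= max_g_fraction
```

A fuller filter would also need duplex stability, self-folding and sequence-complexity terms, which require thermodynamic parameters not given in the abstract.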


Subject(s)
Genome , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Signal-To-Noise Ratio , DNA Probes , Gene Expression Profiling , Genomics , Oligonucleotides , Sensitivity and Specificity
7.
RNA Biol ; 14(12): 1649-1654, 2017 12 02.
Article in English | MEDLINE | ID: mdl-28722509

ABSTRACT

Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in the mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of a mRNA and the protein it encodes, and the double encoding of protein sequence and folding in the mRNA.


Subject(s)
Adaptation, Biological , Nucleic Acid Conformation , Protein Biosynthesis , Protein Folding , RNA, Messenger/chemistry , RNA, Messenger/genetics , Animals , Biological Evolution , Humans , RNA Stability , Selection, Genetic , Structure-Activity Relationship
8.
Nucleic Acids Res ; 42(11): 7132-44, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24792168

ABSTRACT

Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.


Subject(s)
Evolution, Molecular , Protein Isoforms/genetics , Transcription Initiation, Genetic , Transcription Termination, Genetic , Animals , Genetic Variation , Humans , Mice , Proteome , Transcriptome
9.
medRxiv ; 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38313291

ABSTRACT

Objective: To investigate the relationship between vaccination rates and excess mortality during distinct waves of SARS-CoV-2 variant-specific infections, while considering a state's GDP per capita. Methods: We ranked U.S. states by vaccination rates and GDP and employed the CDC's excess mortality model for regression and odds ratio analysis. Results: Regression analysis reveals that both vaccination and GDP are significant factors related to mortality when considering the entire U.S. population. Notably, in wealthier states (with GDP above $65,000), excess mortality is primarily driven by slow vaccination rates, while in less affluent states, low GDP plays a major role. Odds ratio analysis demonstrates an almost twofold increase in mortality linked to the Delta and Omicron BA.1 virus variants in states with the slowest vaccination rates compared to those with the fastest (OR 1.8, 95% CI 1.7-1.9, p < 0.01). However, this gap disappeared in the post-Omicron BA.1 period. Conclusion: The interplay between slow vaccination and low GDP per capita drives high mortality.
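
For readers unfamiliar with the statistic reported above, an odds ratio with a 95% confidence interval can be computed from a 2x2 table as sketched below. The abstract does not say how its interval was obtained; the Wald (log-scale) interval used here is an assumption, and the function name is hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: events in the exposed group      b: non-events in the exposed group
    c: events in the unexposed group    d: non-events in the unexposed group
    Returns (odds_ratio, lower, upper); z defaults to the 95% level.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With large counts the interval narrows, which is consistent with the tight 1.7-1.9 range quoted for state-level mortality data.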

10.
J Am Soc Mass Spectrom ; 35(6): 1138-1155, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38740383

ABSTRACT

Having fast, accurate, and broad spectrum methods for the identification of microorganisms is of paramount importance to public health, research, and safety. Bottom-up mass spectrometer-based proteomics has emerged as an effective tool for the accurate identification of microorganisms from microbial isolates. However, one major hurdle that limits the deployment of this tool for routine clinical diagnosis, and other areas of research such as culturomics, is the instrument time required for the mass spectrometer to analyze a single sample, which can take ∼1 h per sample, when using mass spectrometers that are presently used in most institutes. To address this issue, in this study, we employed, for the first time, tandem mass tags (TMTs) in multiplex identifications of microorganisms from multiple TMT-labeled samples in one MS/MS experiment. A difficulty encountered when using TMT labeling is the presence of interference in the measured intensities of TMT reporter ions. To correct for interference, we employed in the proposed method a modified version of the expectation maximization (EM) algorithm that redistributes the signal from ion interference back to the correct TMT-labeled samples. We have evaluated the sensitivity and specificity of the proposed method using 94 MS/MS experiments (covering a broad range of protein concentration ratios across TMT-labeled channels and experimental parameters), containing a total of 1931 true positive TMT-labeled channels and 317 true negative TMT-labeled channels. The results of the evaluation show that the proposed method has an identification sensitivity of 93-97% and a specificity of 100% at the species level. 
Furthermore, as a proof of concept, using an in-house-generated data set composed of some of the most common urinary tract pathogens, we demonstrated that by using the proposed method the mass spectrometer time required per sample, using a 1 h LC-MS/MS run, can be reduced to 10 and 6 min when samples are labeled with TMT-6 and TMT-10, respectively. The proposed method can also be used along with Orbitrap mass spectrometers that have faster MS/MS acquisition rates, like the recently released Orbitrap Astral mass spectrometer, to further reduce the mass spectrometer time required per sample.
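
The per-sample instrument times and the evaluation metrics quoted above follow from simple arithmetic, sketched here. The sketch assumes each TMT channel carries exactly one sample; the function names are hypothetical and the code is not part of the published method.

```python
def minutes_per_sample(run_minutes, channels):
    """Instrument time per sample when one run is shared by all channels."""
    return run_minutes / channels

def sensitivity(true_positives, false_negatives):
    """Fraction of truly present channels that were identified."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of truly absent channels that were correctly left unidentified."""
    return true_negatives / (true_negatives + false_positives)
```

For example, `minutes_per_sample(60, 6)` gives 10.0 and `minutes_per_sample(60, 10)` gives 6.0, matching the TMT-6 and TMT-10 figures in the abstract.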


Subject(s)
Algorithms , Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Humans , Bacteria/isolation & purification , Bacteria/chemistry , Bacterial Proteins/analysis , Bacterial Proteins/chemistry , Bacterial Proteins/isolation & purification
11.
Mol Biol Evol ; 27(8): 1745-9, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20360214

ABSTRACT

Comparison of expression levels, expression breadth and evolutionary rates of intronless and intron-containing mammalian genes shows that intronless genes are expressed at lower levels, tend to be tissue specific, and evolve significantly faster than spliced genes. By contrast, monomorphic spliced genes that are not subject to detectable alternative splicing and polymorphic alternatively spliced genes show similar, statistically indistinguishable patterns of expression and evolution. Alternative splicing is most common in ancient genes, whereas intronless genes appear to have relatively recent origins. These results imply tight coupling between different stages of gene expression, in particular, transcription, splicing, and nucleocytosolic transport of transcripts, and suggest that formation of intronless genes is an important route of evolution of novel tissue-specific functions in animals.


Subject(s)
Biological Evolution , Introns , Mammals/genetics , Animals , Humans , Molecular Sequence Data , RNA Splicing
12.
Hum Mol Genet ; 18(6): 1037-51, 2009 Mar 15.
Article in English | MEDLINE | ID: mdl-19103668

ABSTRACT

The mu-opioid receptor (OPRM1) is the principal receptor target for both endogenous and exogenous opioid analgesics. There are substantial individual differences in human responses to painful stimuli and to opiate drugs that are attributed to genetic variations in OPRM1. In searching for new functional variants, we employed comparative genome analysis and obtained evidence for the existence of an expanded human OPRM1 gene locus with new promoters, alternative exons and regulatory elements. Examination of polymorphisms within the human OPRM1 gene locus identified a strong association between single nucleotide polymorphism (SNP) rs563649 and individual variations in pain perception. SNP rs563649 is located within a structurally conserved internal ribosome entry site (IRES) in the 5'-UTR of novel exon 13-containing OPRM1 isoforms (MOR-1K) and affects both mRNA levels and translation efficiency of these variants. Furthermore, rs563649 exhibits very strong linkage disequilibrium throughout the entire OPRM1 gene locus and thus affects the functional contribution of the corresponding haplotype that includes other functional OPRM1 SNPs. Our results provide evidence for an essential role for MOR-1K isoforms in nociceptive signaling and suggest that genetic variations in alternative OPRM1 isoforms may contribute to individual differences in opiate responses.


Subject(s)
Polymorphism, Single Nucleotide/genetics , Receptors, Opioid, mu/genetics , Adolescent , Adult , Alleles , Animals , Base Sequence , Cohort Studies , Exons/genetics , Female , Genetic Predisposition to Disease , Haplotypes , Humans , Introns/genetics , Mice , Molecular Sequence Data , Nucleic Acid Conformation , Pain/genetics , Protein Isoforms/genetics , RNA Splicing/genetics
13.
Sci Rep ; 11(1): 2997, 2021 02 04.
Article in English | MEDLINE | ID: mdl-33542373

ABSTRACT

The rDNA clusters and flanking sequences on human chromosomes 13, 14, 15, 21 and 22 represent large gaps in the current genomic assembly. The organization and the degree of divergence of the human rDNA units within an individual nucleolar organizer region (NOR) are only partially known. To address this lacuna, we previously applied transformation-associated recombination (TAR) cloning to isolate individual rDNA units from chromosome 21. That approach revealed an unexpectedly high level of heterogeneity in human rDNA, raising the possibility of corresponding variations in ribosome dynamics. We have now applied the same strategy to analyze an entire rDNA array end-to-end from a copy of chromosome 22. Sequencing of TAR isolates provided the entire NOR sequence, including proximal and distal junctions that may be involved in nucleolar function. Comparison of the newly sequenced rDNAs to reference sequence for chromosomes 22 and 21 revealed variants that are shared in human rDNA in individuals from different ethnic groups, many of them at high frequency. Analysis infers comparable intra- and inter-individual divergence of rDNA units on the same and different chromosomes, supporting the concerted evolution of rDNA units. The results provide a route to investigate further the role of rDNA variation in nucleolar formation and in the empirical associations of nucleoli with pathology.


Subject(s)
Chromosomes, Human, Pair 22/genetics , DNA, Ribosomal/genetics , Genome, Human/genetics , Nucleolus Organizer Region/genetics , Cell Nucleolus/genetics , Cloning, Molecular , Genetic Heterogeneity , Genomics , Humans , Molecular Sequence Annotation , Ribosomes/genetics
14.
Nature ; 429(6991): 558-62, 2004 Jun 03.
Article in English | MEDLINE | ID: mdl-15175752

ABSTRACT

New alleles become fixed owing to random drift of nearly neutral mutations or to positive selection of substantially advantageous mutations. After decades of debate, the fraction of fixations driven by selection remains uncertain. Within 9,390 genes, we analysed 28,196 codons at which rat and mouse differ from each other at two nucleotide sites and 1,982 codons with three differences. At codons where rat-mouse divergence involved two non-synonymous substitutions, both of them occurred in the same lineage, either rat or mouse, in 64% of cases; however, independent substitutions would occur in the same lineage with a probability of only 50%. All three non-synonymous substitutions occurred in the same lineage for 46% of codons, instead of the 25% expected. Furthermore, comparison of 12 pairs of prokaryotic genomes also shows clumping of multiple non-synonymous substitutions in the same lineage. This pattern cannot be explained by correlated mutation or episodes of relaxed negative selection, but instead indicates that positive selection acts at many sites of rapid, successive amino acid replacement.
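
The null expectations of 50% and 25% quoted above can be reproduced by enumerating all assignments of independent substitutions to the two lineages. The sketch assumes each substitution is equally likely to fall on the rat or the mouse lineage, which is what those figures imply; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def same_lineage_probability(k):
    """Probability that k independent substitutions all fall on one lineage,
    assuming each lands on either lineage with probability 1/2."""
    outcomes = list(product(("rat", "mouse"), repeat=k))
    all_same = sum(1 for outcome in outcomes if len(set(outcome)) == 1)
    return Fraction(all_same, len(outcomes))
```

`same_lineage_probability(2)` returns `Fraction(1, 2)` and `same_lineage_probability(3)` returns `Fraction(1, 4)`, the baselines against which the observed 64% and 46% are compared.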


Subject(s)
Amino Acid Substitution/genetics , Evolution, Molecular , Genetic Variation/genetics , Mutation/genetics , Selection, Genetic , Alleles , Animals , Codon/genetics , Humans , Mice , Rats , Sequence Alignment
15.
BMC Genomics ; 10: 162, 2009 Apr 16.
Article in English | MEDLINE | ID: mdl-19371439

ABSTRACT

BACKGROUND: Alternative splicing (AS) in protein-coding sequences has emerged as an important mechanism of regulation and diversification of animal gene function. By contrast, the extent and roles of alternative events including AS and alternative transcription initiation (ATI) within the 5'-untranslated regions (5'UTRs) of mammalian genes are not well characterized. RESULTS: We evaluated the abundance, conservation and evolution of putative regulatory control elements, namely, upstream start codons (uAUGs) and open reading frames (uORFs), in the 5'UTRs of human and mouse genes impacted by alternative events. For genes with alternative 5'UTRs, the fraction of alternative sequences (those present in a subset of the transcripts) is much greater than that in the corresponding coding sequence, conceivably, because 5'UTRs are not bound by constraints on protein structure that limit AS in coding regions. Alternative regions of mammalian 5'UTRs evolve faster and are subject to a weaker purifying selection than constitutive portions. This relatively weak selection results in over-abundance of uAUGs and uORFs in the alternative regions of 5'UTRs compared to constitutive regions. Nevertheless, even in alternative regions, uORFs evolve under a stronger selection than the rest of the sequences, indicating that some of the uORFs are conserved regulatory elements; some of the non-conserved uORFs could be involved in species-specific regulation. CONCLUSION: The findings on the evolution and selection in alternative and constitutive regions presented here are consistent with the hypothesis that alternative events, namely, AS and ATI, in 5'UTRs of mammalian genes are likely to contribute to the regulation of translation.
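
Locating uAUGs and uORFs in a 5'UTR sequence can be sketched as below. The definition of a uORF used here (a uAUG followed by an in-frame stop codon that lies entirely within the 5'UTR) is an assumption; the abstract does not state the exact criteria, and other definitions allow uORFs that overlap the main coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def _as_dna(sequence):
    """Upper-case the sequence and write U as T so RNA and DNA input both work."""
    return sequence.upper().replace("U", "T")

def find_uaugs(utr5):
    """0-based start positions of every ATG/AUG in the 5'UTR."""
    utr5 = _as_dna(utr5)
    return [i for i in range(len(utr5) - 2) if utr5[i:i + 3] == "ATG"]

def find_uorfs(utr5):
    """(start, end) of each uORF: a uAUG with an in-frame stop inside the 5'UTR.

    end is exclusive and includes the stop codon.
    """
    utr5 = _as_dna(utr5)
    uorfs = []
    for start in find_uaugs(utr5):
        # Walk codon by codon from the uAUG until the first in-frame stop.
        for j in range(start + 3, len(utr5) - 2, 3):
            if utr5[j:j + 3] in STOP_CODONS:
                uorfs.append((start, j + 3))
                break
    return uorfs
```

Counting these elements separately in alternative and constitutive 5'UTR regions is the kind of comparison the abstract reports.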


Subject(s)
5' Untranslated Regions/genetics , Alternative Splicing , Evolution, Molecular , Animals , Codon, Initiator , Comparative Genomic Hybridization , Conserved Sequence , Gene Expression Regulation , Humans , Mice , Open Reading Frames , Protein Biosynthesis , Regulatory Sequences, Nucleic Acid , Selection, Genetic , Sequence Analysis, RNA
16.
Nucleic Acids Res ; 35(8): e63, 2007.
Article in English | MEDLINE | ID: mdl-17426130

ABSTRACT

Current literature describes several methods for the design of efficient siRNAs with 19 perfectly matched base pairs and 2 nt overhangs. Using four independent databases totaling 3336 experimentally verified siRNAs, we compared how well several of these methods predict siRNA cleavage efficiency. According to receiver operating characteristics (ROC) and correlation analyses, the best programs were BioPredsi, ThermoComposition and DSIR. We also studied individual parameters that significantly and consistently correlated with siRNA efficacy in different databases. As a result of this work, we developed a new method that uses linear regression fitting with local duplex stability, nucleotide position-dependent preferences and total G/C content of siRNA duplexes as input parameters. The new method's ability to discriminate between efficient and inefficient siRNAs is comparable with that of the best methods identified, but its parameters are more obviously related to the mechanisms of siRNA action than those of BioPredsi. This permits insight into the underlying physical features and the relative importance of the parameters. The new method of predicting siRNA efficiency is faster than ThermoComposition because it does not employ time-consuming RNA secondary structure calculations, and it has far fewer parameters than DSIR. It is available as a web tool called 'siRNA scales'.
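
One of the three input parameters, total G/C content, and a single-predictor least-squares fit can be sketched as follows. This is a toy illustration only: the published method combines several parameters whose values are not given in the abstract, and the function names are hypothetical.

```python
def gc_content(sequence):
    """Fraction of G and C nucleotides in a sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, `xs` would hold a feature such as `gc_content(duplex)` for each siRNA and `ys` the measured efficacies; extending this to several features requires a multiple-regression solver.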


Subject(s)
RNA, Small Interfering/chemistry , Software , Algorithms , Base Composition , Databases, Genetic , Internet , Linear Models , Nucleotides/chemistry
17.
BMC Genomics ; 9: 505, 2008 Oct 27.
Article in English | MEDLINE | ID: mdl-18954448

ABSTRACT

BACKGROUND: Existing scientific literature is a rich source of biological information such as disease markers. Integration of this information with data analysis may help researchers to identify possible controversies and to form useful hypotheses for further validations. In the context of proteomics studies, the era of individualized proteomics may be approached through consideration of amino acid substitutions/modifications as well as information from disease studies. Integration of such information with peptide searches facilitates speedy, dynamic information retrieval that may significantly benefit clinical laboratory studies. DESCRIPTION: We have integrated from various sources annotated single amino acid polymorphisms, post-translational modifications, and their documented disease associations (if they exist) into one enhanced database per organism. We have also augmented our peptide identification software RAId_DbS to take into account this information while analyzing a tandem mass spectrum. In principle, one may choose to respect or ignore the correlation of amino acid polymorphisms/modifications within each protein. The former leads to targeted searches and avoids scoring of unnecessary polymorphism/modification combinations; the latter explores possible polymorphisms in a controlled fashion. To facilitate new discoveries, RAId_DbS also allows users to conduct searches permitting novel polymorphisms as well as to search a knowledge database created by the users. CONCLUSION: We have finished constructing enhanced databases for 17 organisms. The web link to RAId_DbS and the enhanced databases is http://www.ncbi.nlm.nih.gov/CBBResearch/qmbp/RAId_DbS/index.html. The relevant databases and binaries of RAId_DbS for Linux, Windows, and Mac OS X are available for download from the same web page.


Subject(s)
Databases, Nucleic Acid/organization & administration , Internet , Mass Spectrometry/methods , Peptides/analysis , Animals , Computational Biology/methods , Humans , National Library of Medicine (U.S.) , Proteomics/methods , Software , United States
18.
Nucleic Acids Res ; 34(8): 2428-37, 2006.
Article in English | MEDLINE | ID: mdl-16682450

ABSTRACT

Single-stranded mRNA molecules form secondary structures through complementary self-interactions. Several hypotheses have been proposed on the relationship between the nucleotide sequence, encoded amino acid sequence and mRNA secondary structure. We performed the first transcriptome-wide in silico analysis of the human and mouse mRNA foldings and found a pronounced periodic pattern of nucleotide involvement in mRNA secondary structure. We show that this pattern is created by the structure of the genetic code, and the dinucleotide relative abundances are important for the maintenance of mRNA secondary structure. Although synonymous codon usage contributes to this pattern, it is intrinsic to the structure of the genetic code and manifests itself even in the absence of synonymous codon usage bias at the 4-fold degenerate sites. While all codon sites are important for the maintenance of mRNA secondary structure, degeneracy of the code allows regulation of stability and periodicity of mRNA secondary structure. We demonstrate that the third degenerate codon sites contribute most strongly to mRNA stability. These results convincingly support the hypothesis that redundancies in the genetic code allow transcripts to satisfy requirements for both protein structure and RNA structure. Our data show that selection may be operating on synonymous codons to maintain a more stable and ordered mRNA secondary structure, which is likely to be important for transcript stability and translation. We also demonstrate that functional domains of the mRNA [5'-untranslated region (5'-UTR), CDS and 3'-UTR] preferentially fold onto themselves, while the start codon and stop codon regions are characterized by relaxed secondary structures, which may facilitate initiation and termination of translation.


Subject(s)
Genetic Code , RNA, Messenger/chemistry , 3' Untranslated Regions , 5' Untranslated Regions , Animals , Base Pairing , Base Sequence , Codon, Initiator , Codon, Terminator , Computational Biology , Conserved Sequence , Humans , Mice , Nucleic Acid Conformation , RNA Stability , RNA, Messenger/metabolism
19.
PLoS One ; 13(6): e0199162, 2018.
Article in English | MEDLINE | ID: mdl-29928000

ABSTRACT

Off-target interaction of oligoprobes with partially complementary nucleotide sequences is a problem for many bio-techniques. The goal of the study was to identify oligoprobe sequence characteristics that control the ratio between on-target and off-target hybridization. To understand the complex interplay between specific and genome-wide off-target (cross-hybridization) signals, we analyzed a database derived from genomic comparison hybridization experiments performed with an Affymetrix tiling array. The database included two types of probes with signals derived from (i) a combination of specific signal and cross-hybridization and (ii) genomic cross-hybridization only. All probes from the database were grouped into bins according to their sequence characteristics, and the two hybridization signals were averaged separately within each bin. For selection of specific probes, we analyzed the following sequence characteristics: vulnerability to self-folding, nucleotide composition bias, numbers of G nucleotides and GGG-blocks, and occurrence of the probe's k-mers in the human genome. Increases in bin ranges for these characteristics are accompanied by a decrease in hybridization specificity (the ratio between specific and cross-hybridization signals). However, both averaged hybridization signals grow as the probes' binding energy increases, with the specific signal increasing significantly faster than the cross-hybridization signal. The same trend is evident for the S function, which serves as a combined evaluation of probe binding energy and occurrence of the probe's k-mers in the genome. Application of S allows extracting a larger number of specific probes than using binding energy alone. Thus, we showed that high values of specific and cross-hybridization signals are not mutually exclusive for probes with high values of binding energy and S. In this study, the application of a new set of sequence characteristics allows detection of probes that are highly specific to their targets for array design and other bio-techniques that require selection of specific probes.
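
Some of the sequence characteristics listed above can be computed directly from a probe sequence. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `probe_features` is hypothetical, a "GGG-block" is assumed to mean a maximal run of three or more consecutive G nucleotides, and composition bias is approximated by the fraction of the most frequent base. Self-folding, binding energy and the S function are not covered because the abstract does not define them.

```python
import re
from collections import Counter


def probe_features(seq: str, k: int = 3) -> dict:
    """Simple sequence features for an oligonucleotide probe.

    Assumptions (not from the paper): a GGG-block is a maximal run of
    >= 3 consecutive G's; composition bias is the fraction of the most
    frequent base; k-mers are all overlapping substrings of length k.
    """
    seq = seq.upper()
    counts = Counter(seq)
    ggg_runs = re.findall(r"G{3,}", seq)  # non-overlapping maximal G runs
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {
        "length": len(seq),
        "g_count": counts["G"],
        "ggg_blocks": len(ggg_runs),
        "max_base_fraction": max(counts.values()) / len(seq) if seq else 0.0,
        "kmers": kmers,
    }


if __name__ == "__main__":
    print(probe_features("ACGGGTTGGGGA"))
```

Counting how often each returned k-mer occurs in a reference genome would require a separate genome index, which is outside the scope of this sketch.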


Subject(s)
Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis/methods , Base Sequence , Databases, Genetic , Genome , Humans
20.
J Am Soc Mass Spectrom ; 29(8): 1721-1737, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29873019

ABSTRACT

Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 publicly available MS/MS data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
