Results 1 - 20 of 41
1.
J Proteome Res ; 23(6): 1983-1999, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38728051

ABSTRACT

In recent years, several deep learning-based methods have been proposed for predicting peptide fragment intensities. This study aims to provide a comprehensive assessment of six such methods, namely Prosit, DeepMass:Prism, pDeep3, AlphaPeptDeep, Prosit Transformer, and the method proposed by Guan et al. To this end, we evaluated the accuracy of the predicted intensity profiles for close to 1.7 million precursors (including both tryptic and HLA peptides) corresponding to more than 18 million experimental spectra procured from 40 independent submissions to the PRIDE repository that were acquired for different species using a variety of instruments and different dissociation types/energies. Specifically, for each method, distributions of similarity (measured by Pearson's correlation and normalized angle) between the predicted and the corresponding experimental b and y fragment intensities were generated. These distributions were used to ascertain the prediction accuracy and rank the prediction methods for particular types of experimental conditions. The effect of variables like precursor charge, length, and collision energy on the prediction accuracy was also investigated. In addition to prediction accuracy, the methods were evaluated in terms of prediction speed. The systematic assessment of these six methods may help in choosing the right method for MS/MS spectra prediction for particular needs.


Subjects
Deep Learning , Humans , Peptide Fragments/chemistry , Peptide Fragments/analysis , Tandem Mass Spectrometry/methods , Tandem Mass Spectrometry/statistics & numerical data , Proteomics/methods , Proteomics/statistics & numerical data
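
The two similarity measures named in the abstract of entry 1 can be sketched as follows. This is a minimal illustration, not the study's code: the abstract does not define "normalized angle", so the form 1 - 2·arccos(cosine similarity)/π is an assumption, and the function names and intensity values are hypothetical.

```python
import math


def pearson(predicted, observed):
    """Pearson's correlation between two equal-length intensity vectors."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def normalized_angle(predicted, observed):
    """Assumed definition: 1 - 2*theta/pi, where theta is the angle between
    the vectors. Gives 1 for identical direction and 0 for orthogonal
    non-negative vectors."""
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical b/y fragment intensities for one precursor.
predicted = [0.10, 0.55, 0.90, 0.30]
observed = [0.12, 0.50, 1.00, 0.25]
print(pearson(predicted, observed), normalized_angle(predicted, observed))
```

Both functions assume non-constant, non-zero vectors; a production version would need to handle empty spectra and missing fragments.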
2.
Nucleic Acids Res ; 50(2): e11, 2022 01 25.
Article in English | MEDLINE | ID: mdl-34791389

ABSTRACT

The choice of guide RNA (gRNA) for CRISPR-based gene targeting is an essential step in gene editing applications, but the prediction of gRNA specificity remains challenging. Lack of transparency and focus on point estimates of efficiency disregarding the information on possible error sources in the model limit the power of existing Deep Learning-based methods. To overcome these problems, we present a new approach, a hybrid of Capsule Networks and Gaussian Processes. Our method predicts the cleavage efficiency of a gRNA with a corresponding confidence interval, which allows the user to incorporate information regarding possible model errors into the experimental design. We provide the first utilization of uncertainty estimation in computational gRNA design, which is a critical step toward accurate decision-making for future CRISPR applications. The proposed solution demonstrates acceptable confidence intervals for most test sets and shows regression quality similar to existing models. We introduce a set of criteria for gRNA selection based on off-target cleavage efficiency and its variance and present a collection of pre-computed gRNAs for human chromosome 22. Using Neural Network Interpretation methods, we show that our model rediscovers an established biological factor underlying cleavage efficiency, the importance of the seed region in gRNA.


Subjects
CRISPR-Cas Systems , Deep Learning , Gene Editing , Gene Targeting , RNA, Guide, Kinetoplastida/genetics , Algorithms , Gene Editing/methods , Gene Targeting/methods , Genomics/methods , Humans , Neural Networks, Computer , Reproducibility of Results
3.
Nucleic Acids Res ; 46(13): 6712-6725, 2018 07 27.
Article in English | MEDLINE | ID: mdl-29788454

ABSTRACT

Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ∼0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveals heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.


Subjects
Chromosomes, Human, Pair 21 , DNA, Ribosomal/chemistry , Genes, rRNA , Genetic Variation , Animals , Cell Line , Cloning, Molecular , DNA, Ribosomal/isolation & purification , DNA, Ribosomal Spacer/chemistry , Humans , Mice , Nucleic Acid Conformation , Nucleolus Organizer Region/chemistry , RNA, Ribosomal/chemistry , RNA, Ribosomal/metabolism , Sequence Analysis, DNA
4.
Proteomics ; 19(14): e1800367, 2019 07.
Article in English | MEDLINE | ID: mdl-30908818

ABSTRACT

Mass spectrometry-based proteomics starts with the identification of peptides and proteins, which provides the basis for forming next-level hypotheses whose "validations" are often used to form even higher-level hypotheses, and so forth. Scientifically meaningful conclusions are thus attainable only if the number of falsely identified peptides/proteins is accurately controlled. For this reason, RAId has been under continued development over the past decade. RAId employs rigorous statistics for peptide/protein identification, assigning accurate P-values/E-values that can be used confidently to control the number of falsely identified peptides and proteins. The RAId web service is a versatile tool built to identify peptides and proteins from tandem mass spectrometry data. Besides recognizing various spectrum file formats, the web service offers four peptide scoring functions and a choice of three statistical methods for assigning P-values/E-values to identified peptides. Users may upload their own protein database or use one of the available knowledge-integrated organismal databases that contain annotated information such as single amino acid polymorphisms, post-translational modifications, and their disease associations. The web service also provides a friendly interface to display, sort by different criteria, and download the identified peptides and proteins. The RAId web service is freely available at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid.


Subjects
Databases, Protein , Mass Spectrometry/methods , Proteomics/methods , Computational Biology
5.
Nucleic Acids Res ; 44(22): 10898-10911, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27466388

ABSTRACT

Specific structures in mRNA modulate translation rate and thus can affect protein folding. Using the protein structures from two eukaryotes and three prokaryotes, we explore the connections between the protein compactness, inferred from solvent accessibility, and mRNA structure, inferred from mRNA folding energy (ΔG). In both prokaryotes and eukaryotes, the ΔG value of the most stable 30 nucleotide segment of the mRNA (ΔGmin) correlates strongly and positively with protein solvent accessibility. Thus, mRNAs containing exceptionally stable secondary structure elements typically encode compact proteins. The correlations between ΔG and protein compactness are much more pronounced in predicted ordered parts of proteins compared to the predicted disordered parts, indicative of an important role of mRNA secondary structure elements in the control of protein folding. Additionally, ΔG correlates with the mRNA length and the evolutionary rate of synonymous positions. The correlations are partially independent and were used to construct multiple regression models which explain about half of the variance of protein solvent accessibility. These findings suggest a model in which the mRNA structure, particularly exceptionally stable RNA structural elements, acts as a gauge of protein co-translational folding by reducing ribosome speed when the nascent peptide needs time to form and optimize the core structure.


Subjects
Protein Folding , RNA, Messenger/physiology , Animals , Base Composition , Humans , Kinetics , Linear Models , Models, Molecular , Nucleic Acid Conformation , Protein Biosynthesis , Protein Structure, Secondary , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , RNA Stability , RNA, Messenger/chemistry , Thermodynamics , Transcriptome
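
The ΔGmin quantity in the abstract of entry 5 (the folding energy of the most stable 30-nucleotide segment) amounts to a sliding-window minimum. The sketch below shows only that scan. The energy function is passed in as a parameter because the abstract does not say which folding program was used, and `toy_energy` is a hypothetical stand-in that is not a thermodynamic model.

```python
def min_window_energy(sequence, energy_fn, window=30):
    """Return the lowest energy over all contiguous windows of the sequence.

    energy_fn is any callable mapping a segment to a folding energy (more
    negative = more stable); a real analysis would plug in an RNA folding
    program here.
    """
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    return min(
        energy_fn(sequence[start:start + window])
        for start in range(len(sequence) - window + 1)
    )


def toy_energy(segment):
    """Hypothetical placeholder, NOT a folding energy: -1 per G or C."""
    return -float(sum(base in "GC" for base in segment))


# Hypothetical mRNA with a G-rich stretch in the middle.
mrna = "A" * 40 + "G" * 30 + "U" * 40
print(min_window_energy(mrna, toy_energy))
```

Passing the energy function as an argument keeps the window scan independent of whichever folding model is chosen.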
6.
Bioinformatics ; 32(17): i552-i558, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587674

ABSTRACT

MOTIVATION: Target-specific hybridization depends on oligo-probe characteristics that improve hybridization specificity and minimize genome-wide cross-hybridization. Interplay between specific hybridization and genome-wide cross-hybridization has been insufficiently studied, despite its crucial role in efficient probe design and in data analysis. RESULTS: In this study, we defined hybridization specificity as a ratio between oligo target-specific hybridization and oligo genome-wide cross-hybridization. A microarray database, derived from a Genomic Comparison Hybridization (GCH) experiment performed using the Affymetrix platform, contains two different types of probes. The first type of oligo-probes does not have a specific target on the genome and their hybridization signals are derived from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target on the genomic DNA and their signals are derived from specific and cross-hybridization components combined in a total signal. A comparative analysis of hybridization specificity of oligo-probes, as well as their nucleotide sequences and thermodynamic features, was performed on the database. The comparison has revealed that hybridization specificity was negatively affected by low stability of the fully-paired oligo-target duplex, stable probe self-folding, G-rich content, including GGG motifs, low sequence complexity and nucleotide composition symmetry. CONCLUSION: Filtering out the probes with defined 'negative' characteristics significantly increases specific hybridization and dramatically decreases genome-wide cross-hybridization. Selected oligo-probes have two times higher hybridization specificity on average, compared to the probes that were filtered from the analysis by applying suggested cutoff thresholds to the described parameters. A new approach for efficient oligo-probe design is described in our study. CONTACT: shabalin@ncbi.nlm.nih.gov or olga.matveeva@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Signal-To-Noise Ratio , DNA Probes , Gene Expression Profiling , Genomics , Oligonucleotides , Sensitivity and Specificity
7.
RNA Biol ; 14(12): 1649-1654, 2017 12 02.
Article in English | MEDLINE | ID: mdl-28722509

ABSTRACT

Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains, suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in the mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of an mRNA and the protein it encodes, and the double encoding of protein sequence and folding in the mRNA.


Subjects
Adaptation, Biological , Nucleic Acid Conformation , Protein Biosynthesis , Protein Folding , RNA, Messenger/chemistry , RNA, Messenger/genetics , Animals , Biological Evolution , Humans , RNA Stability , Selection, Genetic , Structure-Activity Relationship
8.
Nucleic Acids Res ; 42(11): 7132-44, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24792168

ABSTRACT

Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.


Subjects
Evolution, Molecular , Protein Isoforms/genetics , Transcription Initiation, Genetic , Transcription Termination, Genetic , Animals , Genetic Variation , Humans , Mice , Proteome , Transcriptome
9.
medRxiv ; 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-38313291

ABSTRACT

Aim: This study investigates factors influencing pandemic mortality rates across U.S. states during different waves of SARS-CoV-2 infection from February 2020 to April 2023, given that over one million people died from COVID-19 in the country. Methods: We performed statistical analyses and used linear regression models to estimate age-adjusted and unadjusted excess mortality as functions of life expectancy, vaccination rates, and GDP per capita in U.S. states. Results and Discussion: States with lower life expectancy and lower GDP per capita experienced significantly higher mortality rates during the pandemic, underscoring the critical role of underlying health conditions and healthcare infrastructure, as reflected in these factors. When categorizing states by vaccination rates, significant differences in GDP per capita and pre-pandemic life expectancy emerged between states with lower and higher vaccination rates, likely explaining mortality disparities before mass vaccination. During the Delta and Omicron BA.1 waves, when vaccines were widely available, the mortality gap widened, and states with lower vaccination rates experienced nearly double the mortality compared to states with higher vaccination rates (Odds Ratio 1.8, 95% CI 1.7-1.9, p < 0.01). This disparity disappeared during the later Omicron variants, likely because the levels of combined immunity from vaccination and widespread infection across state populations became comparable. We showed that vaccination rates were the only significant factor influencing age-adjusted mortality, highlighting the substantial impact of age-specific demographics on both life expectancy and GDP across states. Conclusion: The study underscores the critical role of high vaccination rates in reducing excess deaths across all states, regardless of economic status. Vaccination rates proved more decisive than GDP per capita in reducing excess deaths. 
Additionally, states with lower pre-pandemic life expectancy faced greater challenges, reflecting the combined effects of healthcare quality, demographic variations, and social determinants of health. These findings call for comprehensive public health strategies that address both immediate interventions, like vaccination, and long-term improvements in healthcare infrastructure and social conditions.
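
The abstract of entry 9 reports an odds ratio with a 95% confidence interval but does not state how it was computed. One conventional way to obtain such a figure from a 2x2 table is the Wald interval on the log odds ratio, sketched below as an assumption rather than as the study's procedure. The counts are invented for illustration and are not data from the study.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: events and non-events in the exposed group
    c, d: events and non-events in the unexposed group
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented counts, for illustration only.
estimate, lower, upper = odds_ratio_wald(20, 80, 10, 90)
print(f"OR {estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

The Wald interval requires all four cells to be non-zero; sparse tables usually call for a continuity correction or an exact method.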

10.
J Am Soc Mass Spectrom ; 35(6): 1138-1155, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38740383

ABSTRACT

Having fast, accurate, and broad spectrum methods for the identification of microorganisms is of paramount importance to public health, research, and safety. Bottom-up mass spectrometer-based proteomics has emerged as an effective tool for the accurate identification of microorganisms from microbial isolates. However, one major hurdle that limits the deployment of this tool for routine clinical diagnosis, and other areas of research such as culturomics, is the instrument time required for the mass spectrometer to analyze a single sample, which can take ∼1 h per sample, when using mass spectrometers that are presently used in most institutes. To address this issue, in this study, we employed, for the first time, tandem mass tags (TMTs) in multiplex identifications of microorganisms from multiple TMT-labeled samples in one MS/MS experiment. A difficulty encountered when using TMT labeling is the presence of interference in the measured intensities of TMT reporter ions. To correct for interference, we employed in the proposed method a modified version of the expectation maximization (EM) algorithm that redistributes the signal from ion interference back to the correct TMT-labeled samples. We have evaluated the sensitivity and specificity of the proposed method using 94 MS/MS experiments (covering a broad range of protein concentration ratios across TMT-labeled channels and experimental parameters), containing a total of 1931 true positive TMT-labeled channels and 317 true negative TMT-labeled channels. The results of the evaluation show that the proposed method has an identification sensitivity of 93-97% and a specificity of 100% at the species level. 
Furthermore, as a proof of concept, using an in-house-generated data set composed of some of the most common urinary tract pathogens, we demonstrated that by using the proposed method the mass spectrometer time required per sample, using a 1 h LC-MS/MS run, can be reduced to 10 and 6 min when samples are labeled with TMT-6 and TMT-10, respectively. The proposed method can also be used along with Orbitrap mass spectrometers that have faster MS/MS acquisition rates, like the recently released Orbitrap Astral mass spectrometer, to further reduce the mass spectrometer time required per sample.


Subjects
Algorithms , Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Humans , Bacteria/isolation & purification , Bacteria/chemistry , Bacterial Proteins/analysis , Bacterial Proteins/chemistry , Bacterial Proteins/isolation & purification
11.
Mol Biol Evol ; 27(8): 1745-9, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20360214

ABSTRACT

Comparison of expression levels and breadth and evolutionary rates of intronless and intron-containing mammalian genes shows that intronless genes are expressed at lower levels, tend to be tissue specific, and evolve significantly faster than spliced genes. By contrast, monomorphic spliced genes that are not subject to detectable alternative splicing and polymorphic alternatively spliced genes show similar, statistically indistinguishable patterns of expression and evolution. Alternative splicing is most common in ancient genes, whereas intronless genes appear to have relatively recent origins. These results imply tight coupling between different stages of gene expression, in particular, transcription, splicing, and nucleocytosolic transport of transcripts, and suggest that formation of intronless genes is an important route of evolution of novel tissue-specific functions in animals.


Subjects
Biological Evolution , Introns , Mammals/genetics , Animals , Humans , Molecular Sequence Data , RNA Splicing
12.
Hum Mol Genet ; 18(6): 1037-51, 2009 Mar 15.
Article in English | MEDLINE | ID: mdl-19103668

ABSTRACT

The mu-opioid receptor (OPRM1) is the principal receptor target for both endogenous and exogenous opioid analgesics. There are substantial individual differences in human responses to painful stimuli and to opiate drugs that are attributed to genetic variations in OPRM1. In searching for new functional variants, we employed comparative genome analysis and obtained evidence for the existence of an expanded human OPRM1 gene locus with new promoters, alternative exons and regulatory elements. Examination of polymorphisms within the human OPRM1 gene locus identified a strong association between single nucleotide polymorphism (SNP) rs563649 and individual variations in pain perception. SNP rs563649 is located within a structurally conserved internal ribosome entry site (IRES) in the 5'-UTR of novel exon 13-containing OPRM1 isoforms (MOR-1K) and affects both mRNA levels and translation efficiency of these variants. Furthermore, rs563649 exhibits very strong linkage disequilibrium throughout the entire OPRM1 gene locus and thus affects the functional contribution of the corresponding haplotype that includes other functional OPRM1 SNPs. Our results provide evidence for an essential role for MOR-1K isoforms in nociceptive signaling and suggest that genetic variations in alternative OPRM1 isoforms may contribute to individual differences in opiate responses.


Subjects
Polymorphism, Single Nucleotide/genetics , Receptors, Opioid, mu/genetics , Adolescent , Adult , Alleles , Animals , Base Sequence , Cohort Studies , Exons/genetics , Female , Genetic Predisposition to Disease , Haplotypes , Humans , Introns/genetics , Mice , Molecular Sequence Data , Nucleic Acid Conformation , Pain/genetics , Protein Isoforms/genetics , RNA Splicing/genetics
13.
Sci Rep ; 11(1): 2997, 2021 02 04.
Article in English | MEDLINE | ID: mdl-33542373

ABSTRACT

The rDNA clusters and flanking sequences on human chromosomes 13, 14, 15, 21 and 22 represent large gaps in the current genomic assembly. The organization and the degree of divergence of the human rDNA units within an individual nucleolar organizer region (NOR) are only partially known. To address this lacuna, we previously applied transformation-associated recombination (TAR) cloning to isolate individual rDNA units from chromosome 21. That approach revealed an unexpectedly high level of heterogeneity in human rDNA, raising the possibility of corresponding variations in ribosome dynamics. We have now applied the same strategy to analyze an entire rDNA array end-to-end from a copy of chromosome 22. Sequencing of TAR isolates provided the entire NOR sequence, including proximal and distal junctions that may be involved in nucleolar function. Comparison of the newly sequenced rDNAs to reference sequence for chromosomes 22 and 21 revealed variants that are shared in human rDNA in individuals from different ethnic groups, many of them at high frequency. Analysis infers comparable intra- and inter-individual divergence of rDNA units on the same and different chromosomes, supporting the concerted evolution of rDNA units. The results provide a route to investigate further the role of rDNA variation in nucleolar formation and in the empirical associations of nucleoli with pathology.


Subjects
Chromosomes, Human, Pair 22/genetics , DNA, Ribosomal/genetics , Genome, Human/genetics , Nucleolus Organizer Region/genetics , Cell Nucleolus/genetics , Cloning, Molecular , Genetic Heterogeneity , Genomics , Humans , Molecular Sequence Annotation , Ribosomes/genetics
14.
Nature ; 429(6991): 558-62, 2004 Jun 03.
Article in English | MEDLINE | ID: mdl-15175752

ABSTRACT

New alleles become fixed owing to random drift of nearly neutral mutations or to positive selection of substantially advantageous mutations. After decades of debate, the fraction of fixations driven by selection remains uncertain. Within 9,390 genes, we analysed 28,196 codons at which rat and mouse differ from each other at two nucleotide sites and 1,982 codons with three differences. At codons where rat-mouse divergence involved two non-synonymous substitutions, both of them occurred in the same lineage, either rat or mouse, in 64% of cases; however, independent substitutions would occur in the same lineage with a probability of only 50%. All three non-synonymous substitutions occurred in the same lineage for 46% of codons, instead of the 25% expected. Furthermore, comparison of 12 pairs of prokaryotic genomes also shows clumping of multiple non-synonymous substitutions in the same lineage. This pattern cannot be explained by correlated mutation or episodes of relaxed negative selection, but instead indicates that positive selection acts at many sites of rapid, successive amino acid replacement.


Subjects
Amino Acid Substitution/genetics , Evolution, Molecular , Genetic Variation/genetics , Mutation/genetics , Selection, Genetic , Alleles , Animals , Codon/genetics , Humans , Mice , Rats , Sequence Alignment
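
The null expectations quoted in the abstract of entry 14 (50% for two substitutions, 25% for three) follow from assigning each substitution independently to one of the two lineages. The sketch below reproduces them by enumeration, assuming each lineage is equally likely for every substitution; that equal-probability assumption is implied by the quoted figures rather than stated, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def same_lineage_probability(n_substitutions, lineages=("rat", "mouse")):
    """Probability that all independent substitutions fall in one lineage,
    assuming every lineage is equally likely for each substitution."""
    assignments = list(product(lineages, repeat=n_substitutions))
    same = sum(1 for assignment in assignments if len(set(assignment)) == 1)
    return Fraction(same, len(assignments))


for n in (2, 3):
    print(n, same_lineage_probability(n))
```

In closed form this is 2 · (1/2)^n for two lineages, which is the baseline against which the observed 64% and 46% are compared.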
15.
BMC Genomics ; 10: 162, 2009 Apr 16.
Article in English | MEDLINE | ID: mdl-19371439

ABSTRACT

BACKGROUND: Alternative splicing (AS) in protein-coding sequences has emerged as an important mechanism of regulation and diversification of animal gene function. By contrast, the extent and roles of alternative events including AS and alternative transcription initiation (ATI) within the 5'-untranslated regions (5'UTRs) of mammalian genes are not well characterized. RESULTS: We evaluated the abundance, conservation and evolution of putative regulatory control elements, namely, upstream start codons (uAUGs) and open reading frames (uORFs), in the 5'UTRs of human and mouse genes impacted by alternative events. For genes with alternative 5'UTRs, the fraction of alternative sequences (those present in a subset of the transcripts) is much greater than that in the corresponding coding sequence, conceivably, because 5'UTRs are not bound by constraints on protein structure that limit AS in coding regions. Alternative regions of mammalian 5'UTRs evolve faster and are subject to a weaker purifying selection than constitutive portions. This relatively weak selection results in over-abundance of uAUGs and uORFs in the alternative regions of 5'UTRs compared to constitutive regions. Nevertheless, even in alternative regions, uORFs evolve under a stronger selection than the rest of the sequences, indicating that some of the uORFs are conserved regulatory elements; some of the non-conserved uORFs could be involved in species-specific regulation. CONCLUSION: The findings on the evolution and selection in alternative and constitutive regions presented here are consistent with the hypothesis that alternative events, namely, AS and ATI, in 5'UTRs of mammalian genes are likely to contribute to the regulation of translation.


Subjects
5' Untranslated Regions/genetics , Alternative Splicing , Evolution, Molecular , Animals , Codon, Initiator , Comparative Genomic Hybridization , Conserved Sequence , Gene Expression Regulation , Humans , Mice , Open Reading Frames , Protein Biosynthesis , Regulatory Sequences, Nucleic Acid , Selection, Genetic , Sequence Analysis, RNA
16.
Nucleic Acids Res ; 35(8): e63, 2007.
Article in English | MEDLINE | ID: mdl-17426130

ABSTRACT

Current literature describes several methods for the design of efficient siRNAs with 19 perfectly matched base pairs and 2 nt overhangs. Using four independent databases totaling 3336 experimentally verified siRNAs, we compared how well several of these methods predict siRNA cleavage efficiency. According to receiver operating characteristics (ROC) and correlation analyses, the best programs were BioPredsi, ThermoComposition and DSIR. We also studied individual parameters that significantly and consistently correlated with siRNA efficacy in different databases. As a result of this work we developed a new method which utilizes linear regression fitting with local duplex stability, nucleotide position-dependent preferences and total G/C content of siRNA duplexes as input parameters. The new method's ability to discriminate between efficient and inefficient siRNAs is comparable with that of the best methods identified, but its parameters are more obviously related to the mechanisms of siRNA action in comparison with BioPredsi. This permits insight into the underlying physical features and relative importance of the parameters. The new method of predicting siRNA efficiency is faster than that of ThermoComposition because it does not employ time-consuming RNA secondary structure calculations, and it has far fewer parameters than DSIR. It is available as a web tool called 'siRNA scales'.


Subjects
RNA, Small Interfering/chemistry , Software , Algorithms , Base Composition , Databases, Genetic , Internet , Linear Models , Nucleotides/chemistry
17.
BMC Genomics ; 9: 505, 2008 Oct 27.
Article in English | MEDLINE | ID: mdl-18954448

ABSTRACT

BACKGROUND: Existing scientific literature is a rich source of biological information such as disease markers. Integration of this information with data analysis may help researchers to identify possible controversies and to form useful hypotheses for further validation. In the context of proteomics studies, the era of individualized proteomics may be approached through consideration of amino acid substitutions/modifications as well as information from disease studies. Integration of such information with peptide searches facilitates speedy, dynamic information retrieval that may significantly benefit clinical laboratory studies. DESCRIPTION: We have integrated from various sources annotated single amino acid polymorphisms, post-translational modifications, and their documented disease associations (if they exist) into one enhanced database per organism. We have also augmented our peptide identification software RAId_DbS to take into account this information while analyzing a tandem mass spectrum. In principle, one may choose to respect or ignore the correlation of amino acid polymorphisms/modifications within each protein. The former leads to targeted searches and avoids scoring of unnecessary polymorphism/modification combinations; the latter explores possible polymorphisms in a controlled fashion. To facilitate new discoveries, RAId_DbS also allows users to conduct searches permitting novel polymorphisms as well as to search a knowledge database created by the users. CONCLUSION: We have finished constructing enhanced databases for 17 organisms. The web link to RAId_DbS and the enhanced databases is http://www.ncbi.nlm.nih.gov/CBBResearch/qmbp/RAId_DbS/index.html. The relevant databases and binaries of RAId_DbS for Linux, Windows, and Mac OS X are available for download from the same web page.


Subjects
Databases, Nucleic Acid/organization & administration , Internet , Mass Spectrometry/methods , Peptides/analysis , Animals , Computational Biology/methods , Humans , National Library of Medicine (U.S.) , Proteomics/methods , Software , United States
18.
Nucleic Acids Res ; 34(8): 2428-37, 2006.
Article in English | MEDLINE | ID: mdl-16682450

ABSTRACT

Single-stranded mRNA molecules form secondary structures through complementary self-interactions. Several hypotheses have been proposed on the relationship between the nucleotide sequence, encoded amino acid sequence and mRNA secondary structure. We performed the first transcriptome-wide in silico analysis of the human and mouse mRNA foldings and found a pronounced periodic pattern of nucleotide involvement in mRNA secondary structure. We show that this pattern is created by the structure of the genetic code, and the dinucleotide relative abundances are important for the maintenance of mRNA secondary structure. Although synonymous codon usage contributes to this pattern, it is intrinsic to the structure of the genetic code and manifests itself even in the absence of synonymous codon usage bias at the 4-fold degenerate sites. While all codon sites are important for the maintenance of mRNA secondary structure, degeneracy of the code allows regulation of stability and periodicity of mRNA secondary structure. We demonstrate that the third degenerate codon sites contribute most strongly to mRNA stability. These results convincingly support the hypothesis that redundancies in the genetic code allow transcripts to satisfy requirements for both protein structure and RNA structure. Our data show that selection may be operating on synonymous codons to maintain a more stable and ordered mRNA secondary structure, which is likely to be important for transcript stability and translation. We also demonstrate that functional domains of the mRNA [5'-untranslated region (5'-UTR), CDS and 3'-UTR] preferentially fold onto themselves, while the start codon and stop codon regions are characterized by relaxed secondary structures, which may facilitate initiation and termination of translation.


Subjects
Genetic Code , RNA, Messenger/chemistry , 3' Untranslated Regions , 5' Untranslated Regions , Animals , Base Pairing , Base Sequence , Codon, Initiator , Codon, Terminator , Computational Biology , Conserved Sequence , Humans , Mice , Nucleic Acid Conformation , RNA Stability , RNA, Messenger/metabolism
19.
PLoS One ; 13(6): e0199162, 2018.
Article in English | MEDLINE | ID: mdl-29928000

ABSTRACT

Off-target interaction of oligoprobes with partially complementary nucleotide sequences is a problem for many bio-techniques. The goal of the study was to identify oligoprobe sequence characteristics that control the ratio between on-target and off-target hybridization. To understand the complex interplay between specific and genome-wide off-target (cross-hybridization) signals, we analyzed a database derived from genomic comparison hybridization experiments performed with an Affymetrix tiling array. The database included two types of probes with signals derived from (i) a combination of specific signal and cross-hybridization and (ii) genomic cross-hybridization only. All probes from the database were grouped into bins according to their sequence characteristics, and the two hybridization signals were averaged separately within each bin. For selection of specific probes, we analyzed the following sequence characteristics: vulnerability to self-folding, nucleotide composition bias, numbers of G nucleotides and GGG-blocks, and occurrence of the probe's k-mers in the human genome. Increases in bin ranges for these characteristics are accompanied by a decrease in hybridization specificity, defined as the ratio between specific and cross-hybridization signals. However, both averaged hybridization signals grow with increasing probe binding energy, with the specific signal increasing significantly faster than the cross-hybridization signal. The same trend is evident for the S function, which serves as a combined evaluation of probe binding energy and occurrence of the probe's k-mers in the genome. Application of S allows extracting a larger number of specific probes than using binding energy alone. Thus, we showed that high values of specific and cross-hybridization signals are not mutually exclusive for probes with high values of binding energy and S.
In this study, the application of a new set of sequence characteristics allows detection of probes that are highly specific to their targets for array design and other bio-techniques that require selection of specific probes.
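
Some of the sequence characteristics listed in this abstract are simple enough to sketch in a few lines. The example below is illustrative only: it assumes a "GGG-block" means a maximal run of three or more consecutive G's and that k-mer occurrence is a plain count of matching windows in a reference string. Both definitions, and all function names, are assumptions rather than the paper's method, and the S function and binding energy are not reproduced because the abstract does not define them.

```python
import re


def probe_features(probe: str) -> dict:
    """Compute simple composition features for one probe sequence."""
    seq = probe.upper()
    length = len(seq)
    g_count = seq.count("G")
    # Assumed definition: each maximal run of >= 3 consecutive G's is one block.
    ggg_blocks = len(re.findall(r"G{3,}", seq))
    gc_fraction = (g_count + seq.count("C")) / length if length else 0.0
    return {"g_count": g_count, "ggg_blocks": ggg_blocks, "gc_fraction": gc_fraction}


def kmer_hits(probe: str, reference: str, k: int) -> int:
    """Count windows of length k in `reference` that match any k-mer of `probe`.

    Both sequences are assumed to use the same letter case.
    """
    probe_kmers = {probe[i:i + k] for i in range(len(probe) - k + 1)}
    return sum(
        1
        for i in range(len(reference) - k + 1)
        if reference[i:i + k] in probe_kmers
    )


# Toy sequences, not real probes or genomic data.
features = probe_features("ACGGGTTGGGGA")
hits = kmer_hits("ACGT", "ACGACGT", 3)
```

For a real genome, a linear scan like `kmer_hits` would be far too slow; a precomputed k-mer index would be the usual substitute.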


Subjects
Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis/methods , Base Sequence , Databases, Genetic , Genome , Humans
20.
J Am Soc Mass Spectrom ; 29(8): 1721-1737, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29873019

ABSTRACT

Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
