ABSTRACT
In animals, microRNAs frequently form families with related sequences. The functional relevance of miRNA families and the relative contribution of family members to target repression have remained, however, largely unexplored. Here, we used the Caenorhabditis elegans miR-58 miRNA family, composed primarily of the four highly abundant members miR-58.1, miR-80, miR-81, and miR-82, as a model to investigate the redundancy of miRNA family members and their impact on target expression in an in vivo setting. We found that miR-58 family members repress largely overlapping sets of targets in a predominantly additive fashion. Progressive deletions of miR-58 family members lead to cumulative up-regulation of target protein and RNA levels. Phenotypic defects could only be observed in the family quadruple mutant, which also showed the strongest change in target protein levels. Interestingly, although the seed sequences of miR-80 and miR-58.1 differ in a single nucleotide, predicted canonical miR-80 targets were efficiently up-regulated in the mir-58.1 single mutant, indicating functional redundancy of distinct members of this miRNA family. At the aggregate level, target binding leads mainly to mRNA degradation, although we also observed some degree of translational inhibition, particularly in the single miR-58 family mutants. These results provide a framework for understanding how miRNA family members interact to regulate target mRNAs.
Subjects
Caenorhabditis elegans/genetics , MicroRNAs/genetics , RNA Stability/genetics , Messenger RNA/genetics , Up-Regulation , Animals , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Epigenetic Repression , MicroRNAs/metabolism , Proteomics , Messenger RNA/metabolism , RNA Sequence Analysis , Transcriptome
ABSTRACT
The analysis and management of MS data, especially those generated by data-independent MS acquisition, exemplified by SWATH-MS, pose significant challenges for proteomics bioinformatics. The large size and vast amount of information inherent to these data sets need to be properly structured to enable an efficient and straightforward extraction of the signals used to identify specific target peptides. Standard XML-based formats are not well suited to large MS data files, for example, those generated by SWATH-MS, and compromise high-throughput data processing and storage. We developed mzDB, an efficient file format for large MS data sets. It relies on the SQLite software library and consists of a standardized and portable server-less single-file database. An optimized 3D indexing approach is adopted, where the LC-MS coordinates (retention time and m/z), along with the precursor m/z for SWATH-MS data, are used to query the database for data extraction. In comparison with XML formats, mzDB saves ~25% of storage space and improves access times by a factor of 2 up to 2000, depending on the particular data access. Similarly, mzDB also shows slightly to significantly lower access times in comparison with other formats such as mz5. Both C++ and Java implementations, converting raw or XML formats to mzDB and providing access methods, will be released under a permissive license. mzDB can be easily accessed by the SQLite C library and its drivers for all major languages, and browsed with existing dedicated GUIs. The mzDB described here can boost existing mass spectrometry data analysis pipelines, offering unprecedented performance in terms of efficiency, portability, compactness, and flexibility.
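The query pattern described in this abstract (selecting signals by retention time, m/z and, for SWATH-MS data, precursor m/z from a single-file SQLite database) can be illustrated with a minimal sketch. The table name `peak`, its columns, and the helper functions are hypothetical and are not the actual mzDB schema or API; a plain composite index stands in for the optimized 3D indexing of the published format.

```python
import sqlite3

# Hypothetical demo peaks: (rt, mz, precursor_mz, intensity).
# precursor_mz is None for MS1 signals and holds the isolation-window
# centre for SWATH-like MS2 signals. None of these values are real data.
DEMO_PEAKS = [
    (10.0, 500.25, None, 1000.0),
    (10.5, 500.26, None, 1200.0),
    (12.0, 700.10, None, 300.0),
    (10.2, 350.10, 512.5, 80.0),
    (10.3, 350.11, 512.5, 95.0),
    (10.3, 350.11, 637.5, 40.0),
]


def build_demo_db(peaks):
    """Create an in-memory, server-less SQLite database of peaks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE peak ("
        " rt REAL NOT NULL,"
        " mz REAL NOT NULL,"
        " precursor_mz REAL,"
        " intensity REAL NOT NULL)"
    )
    # Assumption: an ordinary composite index, not mzDB's own indexing.
    conn.execute("CREATE INDEX peak_idx ON peak (precursor_mz, rt, mz)")
    conn.executemany("INSERT INTO peak VALUES (?, ?, ?, ?)", peaks)
    conn.commit()
    return conn


def extract_region(conn, rt_range, mz_range, precursor_window=None):
    """Return (rt, mz, intensity) rows inside the requested ranges.

    If precursor_window is given, only signals whose precursor m/z falls
    inside that window are returned (the third query dimension).
    """
    sql = (
        "SELECT rt, mz, intensity FROM peak"
        " WHERE rt BETWEEN ? AND ? AND mz BETWEEN ? AND ?"
    )
    params = [*rt_range, *mz_range]
    if precursor_window is not None:
        sql += " AND precursor_mz BETWEEN ? AND ?"
        params += list(precursor_window)
    sql += " ORDER BY rt, mz"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db(DEMO_PEAKS)
    print(extract_region(conn, (9.5, 11.0), (500.0, 501.0)))
    print(extract_region(conn, (10.0, 10.5), (350.0, 350.2), (500.0, 525.0)))
```

Because the whole data set lives in one SQLite file (here, in memory), any language with an SQLite driver could issue the same range query.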
Subjects
Database Management Systems , Mass Spectrometry/methods , Datasets as Topic , Epithelial Cells/metabolism , Humans , Proteome/analysis
ABSTRACT
Quality control is increasingly recognized as a crucial aspect of mass spectrometry based proteomics. Several recent papers discuss relevant parameters for quality control and present applications to extract these from the instrumental raw data. What has been missing, however, is a standard data exchange format for reporting these performance metrics. We therefore developed the qcML format, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative). In addition to the XML format, we also provide tools for the calculation of a wide range of quality metrics as well as a database format and interconversion tools, so that existing LIMS systems can easily add relational storage of the quality control data to their existing schema. We here describe the qcML specification, along with possible use cases and an illustrative example of the subsequent analysis possibilities. All information about qcML is available at http://code.google.com/p/qcml.
Subjects
Mass Spectrometry/standards , Software , Protein Databases , Programming Languages , Proteomics/standards , Quality Control
ABSTRACT
Selected reaction monitoring (SRM) MS is a highly selective and sensitive technique to quantify protein abundances in complex biological samples. To enhance the pace of large SRM studies, a validated, robust method to fully automate absolute quantification and to substitute for interactive evaluation would be valuable. To address this demand, we present Ariadne, a Matlab software tool. To quantify monitored targets, Ariadne exploits metadata imported from the transition lists, and targets can be filtered according to mProphet output. Signal processing and statistical learning approaches are combined to compute peptide quantifications. To robustly estimate absolute abundances, the external calibration curve method is applied, ensuring linearity over the measured dynamic range. Ariadne was benchmarked against mProphet and Skyline by comparing its quantification performance on three different dilution series, featuring either noisy/smooth traces without background or smooth traces with complex background. Results, evaluated as efficiency, linearity, accuracy, and precision of quantification, showed that Ariadne's performance is independent of data smoothness and of the presence of complex background, that Ariadne outperforms mProphet on the noisier data set, and that it improves on Skyline's accuracy and precision 2-fold for the lowest-abundance dilution with complex background. Remarkably, Ariadne could statistically distinguish all the different abundances from each other, discriminating dilutions as low as 0.1 and 0.2 fmol. These results suggest that Ariadne offers reliable and automated analysis of large-scale SRM differential expression studies.
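The external calibration curve method mentioned in this abstract can be sketched generically: fit a straight line to the responses measured for standards of known amount, check linearity, then invert the line to estimate an unknown amount. This is a plain least-squares illustration in Python, not Ariadne's Matlab implementation; the function names and all numeric values in the usage example, apart from the 0.1 and 0.2 fmol dilution levels, are assumptions.

```python
def fit_calibration(amounts, responses):
    """Ordinary least-squares line: response = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(amounts, responses, slope, intercept):
    """Coefficient of determination, used here as a simple linearity check."""
    mean_y = sum(responses) / len(responses)
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(amounts, responses))
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    return 1.0 - ss_res / ss_tot


def quantify(response, slope, intercept):
    """Invert the calibration line to estimate the amount for a response."""
    return (response - intercept) / slope


if __name__ == "__main__":
    # Hypothetical dilution series (fmol) and peak areas, for illustration.
    amounts = [0.1, 0.2, 1.0, 10.0]
    areas = [10.0, 15.0, 55.0, 505.0]
    slope, intercept = fit_calibration(amounts, areas)
    print(r_squared(amounts, areas, slope, intercept))
    print(quantify(55.0, slope, intercept))
```

An estimate is only meaningful for responses that fall inside the range covered by the standards, which is why linearity over the measured dynamic range matters.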
Subjects
Automation , Proteins/chemistry , Software , Algorithms , Reproducibility of Results
ABSTRACT
As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC-MS data, which are typically made available in standard XML-based portable formats. The structures that are currently employed to manage these data can be highly inefficient, especially when dealing with high-throughput profile data. LC-MS datasets are usually accessed through 2D range queries. Optimizing this type of operation could dramatically reduce the complexity of data analysis. We propose a novel data structure for LC-MS datasets, called mzRTree, which embodies a scalable index based on the R-tree data structure. mzRTree can be efficiently created from the XML-based data formats and it is suitable for handling very large datasets. We experimentally show that, on all range queries, mzRTree outperforms other known structures used for LC-MS data, even on the queries for which those structures are optimized. mzRTree is also more space efficient. As a result, mzRTree reduces data analysis computational costs for very large profile datasets.
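A 2D range query over LC-MS data asks for every signal whose retention time and m/z both fall inside a rectangle. The sketch below shows the bounding-box pruning idea that R-tree style indexes rely on, using a single static level of leaves. It is a deliberately simplified illustration, not mzRTree and not a full R-tree (no hierarchy, no insertion or node splitting); the names `Leaf`, `build_index`, and `range_query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """A group of peaks plus the bounding box that encloses them."""
    rt_min: float
    rt_max: float
    mz_min: float
    mz_max: float
    peaks: list


def build_index(peaks, leaf_size=4):
    """Pack (rt, mz, intensity) peaks, sorted by rt then mz, into leaves."""
    ordered = sorted(peaks)
    leaves = []
    for start in range(0, len(ordered), leaf_size):
        chunk = ordered[start:start + leaf_size]
        leaves.append(Leaf(
            rt_min=min(p[0] for p in chunk),
            rt_max=max(p[0] for p in chunk),
            mz_min=min(p[1] for p in chunk),
            mz_max=max(p[1] for p in chunk),
            peaks=chunk,
        ))
    return leaves


def range_query(leaves, rt_lo, rt_hi, mz_lo, mz_hi):
    """Return all peaks with rt in [rt_lo, rt_hi] and mz in [mz_lo, mz_hi]."""
    hits = []
    for leaf in leaves:
        # Skip leaves whose bounding box cannot overlap the query rectangle.
        if (leaf.rt_max < rt_lo or leaf.rt_min > rt_hi
                or leaf.mz_max < mz_lo or leaf.mz_min > mz_hi):
            continue
        # Overlapping leaves still need a per-peak check.
        hits.extend(p for p in leaf.peaks
                    if rt_lo <= p[0] <= rt_hi and mz_lo <= p[1] <= mz_hi)
    return hits


if __name__ == "__main__":
    # Hypothetical grid of peaks: 10 retention times x 3 m/z values.
    demo = [(float(rt), mz, 1.0)
            for rt in range(10) for mz in (400.0, 500.0, 600.0)]
    index = build_index(demo)
    print(range_query(index, 2.0, 4.0, 450.0, 650.0))
```

A real R-tree nests such bounding boxes hierarchically, so that whole subtrees can be skipped instead of scanning every leaf.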