1.
Mol Cell Proteomics ; 23(2): 100708, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38154689

ABSTRACT

In the era of open-modification search engines, more posttranslational modifications than ever can be detected by LC-MS/MS-based proteomics. This development can switch proteomics research into a higher gear, as PTMs are key in many cellular pathways important in cell proliferation, migration, metastasis, and aging. However, despite these advances in modification identification, statistical methods for PTM-level quantification and differential analysis have yet to catch up. This absence can partly be explained by statistical challenges inherent to the data, such as the confounding of PTM intensities with its parent protein abundance. Therefore, we have developed msqrob2PTM, a new workflow in the msqrob2 universe capable of differential abundance analysis at the PTM and at the peptidoform level. The latter is important for validating PTMs found as significantly differential. Indeed, as our method can deal with multiple PTMs per peptidoform, there is a possibility that significant PTMs stem from one significant peptidoform carrying another PTM, hinting that it might be the other PTM driving the perceived differential abundance. Our workflows can flag both differential peptidoform abundance (DPA) and differential peptidoform usage (DPU). This enables a distinction between direct assessment of differential abundance of peptidoforms (DPA) and differences in the relative usage of peptidoforms corrected for corresponding protein abundances (DPU). For DPA, we directly model the log2-transformed peptidoform intensities, while for DPU, we correct for parent protein abundance by an intermediate normalization step which calculates the log2-ratio of the peptidoform intensities to their summarized parent protein intensities. We demonstrated the utility and performance of msqrob2PTM by applying it to datasets with known ground truth, as well as to biological PTM-rich datasets. 
Our results show that msqrob2PTM is on par with, or surpassing the performance of, the current state-of-the-art methods. Moreover, msqrob2PTM is currently unique in providing output at the peptidoform level.
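
The difference between the DPA and DPU inputs described in this abstract can be sketched in a few lines. This is only an illustration and not the msqrob2PTM implementation: the per-sample median used to summarize the parent protein, the function names, and the toy intensities are all assumptions made for the example.

```python
import math
from statistics import median


def log2_transform(rows):
    """Log2-transform intensities; missing or non-positive values become None."""
    return [[math.log2(x) if x is not None and x > 0 else None for x in row]
            for row in rows]


def summarize_protein(log2_rows):
    """Per-sample median over all peptidoforms of one protein (assumed summary)."""
    summary = []
    for sample in zip(*log2_rows):
        observed = [v for v in sample if v is not None]
        summary.append(median(observed) if observed else None)
    return summary


def usage_values(log2_row, protein_summary):
    """DPU input: log2(peptidoform / protein), a difference on the log2 scale."""
    return [p - q if p is not None and q is not None else None
            for p, q in zip(log2_row, protein_summary)]


# Hypothetical protein: three peptidoforms (rows) measured in two samples (columns).
raw = [[8.0, 16.0], [4.0, 8.0], [2.0, 32.0]]
dpa_input = log2_transform(raw)                  # modeled directly for DPA
protein = summarize_protein(dpa_input)           # per-sample protein summary
dpu_input = usage_values(dpa_input[2], protein)  # third peptidoform, protein-corrected
```

In this toy case the third peptidoform rises by 4 log2 units between samples, but only by 2 units once the change in its parent protein is subtracted, which is the distinction between abundance and usage.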


Subject(s)
Proteomics; Tandem Mass Spectrometry; Proteomics/methods; Chromatography, Liquid; Protein Processing, Post-Translational; Proteins
2.
J Proteome Res ; 22(2): 350-358, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36648107

ABSTRACT

Reliable peptide identification is key in mass spectrometry (MS) based proteomics. To this end, the target decoy approach (TDA) has become the cornerstone for extracting a set of reliable peptide-to-spectrum matches (PSMs) that will be used in downstream analysis. Indeed, TDA is now the default method to estimate the false discovery rate (FDR) for a given set of PSMs, and users typically view it as a universal solution for assessing the FDR in the peptide identification step. However, the TDA also relies on a minimal set of assumptions, which are typically never verified in practice. We argue that a violation of these assumptions can lead to poor FDR control, which can be detrimental to any downstream data analysis. We here therefore first clearly spell out these TDA assumptions, and introduce TargetDecoy, a Bioconductor package with all the necessary functionality to control the TDA quality and its underlying assumptions for a given set of PSMs.
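
For orientation, the FDR estimate that the target decoy approach produces can be sketched as follows. This is the commonly used estimator for a concatenated target-decoy search (decoys divided by targets above a score threshold), written here as an assumption; it is not the API of the TargetDecoy package, and the scores are invented. The estimator is only meaningful if a decoy hit is as likely as an incorrect target hit, which is the kind of assumption the abstract says should be checked.

```python
def tda_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs; higher scores are better.
    Returns None when no target PSM passes the threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return None
    # Each accepted decoy is taken to stand in for one incorrect accepted target.
    return min(1.0, decoys / targets)


# Hypothetical PSM list: (score, is_decoy)
psms = [(9.1, False), (8.7, False), (8.2, True),
        (7.9, False), (7.5, False), (6.0, True)]
estimated_fdr = tda_fdr(psms, threshold=7.0)  # 1 decoy / 4 targets
```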


Subject(s)
Peptides; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Peptides/analysis; Proteomics/methods; Data Analysis; Quality Control; Databases, Protein; Algorithms
3.
Mol Cell Proteomics ; 19(7): 1209-1219, 2020 07.
Article in English | MEDLINE | ID: mdl-32321741

ABSTRACT

Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis because of peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored toward their specific applications.


Subject(s)
Mass Spectrometry/methods; Peptides/metabolism; Proteome/metabolism; Proteomics/methods; Chromatography, Liquid; Databases, Protein; Humans; Software
4.
Int J Mol Sci ; 22(3), 2021 Jan 21.
Article in English | MEDLINE | ID: mdl-33494376

ABSTRACT

As a major group of algae, diatoms are responsible for a substantial part of the primary production on the planet. Pennate diatoms have a predominantly benthic lifestyle and are the most species-rich diatom group, with members of the raphid clades being motile and generally having heterothallic sexual reproduction. It was recently shown that the model species Seminavis robusta uses multiple sexual cues during mating, including cyclo(l-Pro-l-Pro) as an attraction pheromone. Elaboration of the pheromone-detection system is a key aspect in elucidating pennate diatom life-cycle regulation that could yield novel fundamental insights into diatom speciation. This study reports the synthesis and bio-evaluation of seven novel pheromone analogs containing small structural alterations to the cyclo(l-Pro-l-Pro) pheromone. Toxicity, attraction, and interference assays were applied to assess their potential activity as a pheromone. Most of our analogs show a moderate-to-good bioactivity and low-to-no phytotoxicity. The pheromone activity of azide- and diazirine-containing analogs was unaffected and induced a similar mating behavior as the natural pheromone. These results demonstrate that the introduction of confined structural modifications can be used to develop a chemical probe based on the diazirine- and/or azide-containing analogs to study the pheromone-detection system of S. robusta.


Subject(s)
Diatoms/metabolism; Pheromones/metabolism; Sex Attractants/metabolism; Biosynthetic Pathways; Molecular Structure; Pheromones/chemistry; Reproduction; Sex Attractants/chemistry
5.
BMC Genomics ; 21(1): 733, 2020 Oct 22.
Article in English | MEDLINE | ID: mdl-33092529

ABSTRACT

BACKGROUND: Microorganisms are not only indispensable to ecosystem functioning, they are also keystones for emerging technologies. In the last 15 years, the number of studies on environmental microbial communities has increased exponentially due to advances in sequencing technologies, but the large amount of data generated remains difficult to analyze and interpret. Recently, metabarcoding analysis has shifted from clustering reads using Operational Taxonomical Units (OTUs) to Amplicon Sequence Variants (ASVs). Differences between these methods can seriously affect the biological interpretation of metabarcoding data, especially in ecosystems with high microbial diversity, as the methods are benchmarked based on low diversity datasets. RESULTS: In this work we have thoroughly examined the differences in community diversity, structure, and complexity between the OTU and ASV methods. We have examined culture-based mock and simulated datasets as well as soil- and plant-associated bacterial and fungal environmental communities. Four key findings were revealed. First, analysis of microbial datasets at family level guaranteed both consistency and adequate coverage when using either method. Second, the performance of both methods used are related to community diversity and sample sequencing depth. Third, differences in the method used affected sample diversity and number of detected differentially abundant families upon treatment; this may lead researchers to draw different biological conclusions. Fourth, the observed differences can mostly be attributed to low abundant (relative abundance < 0.1%) families, thus extra care is recommended when studying rare species using metabarcoding. The ASV method used outperformed the adopted OTU method concerning community diversity, especially for fungus-related sequences, but only when the sequencing depth was sufficient to capture the community complexity. CONCLUSIONS: Investigation of metabarcoding data should be done with care. 
Correct biological interpretation depends on several factors, including in-depth sequencing of the samples, choice of the most appropriate filtering strategy for the specific research goal, and use of family level for data clustering.


Subject(s)
Microbiota; Soil; Bacteria/genetics; Fungi/genetics; Humans; Microbiota/genetics; Soil Microbiology
6.
Anal Chem ; 92(9): 6278-6287, 2020 05 05.
Article in English | MEDLINE | ID: mdl-32227882

ABSTRACT

Missing values are a major issue in quantitative data-dependent mass spectrometry-based proteomics. We therefore present an innovative solution to this key issue by introducing a hurdle model, which is a mixture between a binomial peptide count and a peptide intensity-based model component. It enables dramatically enhanced quantification of proteins with many missing values without having to resort to harmful assumptions for missingness. We demonstrate the superior performance of our method by comparing it with state-of-the-art methods in the field.


Subject(s)
Proteomics/methods; Research Design; Chromatography, High Pressure Liquid; Mass Spectrometry; Models, Theoretical; Peptides/analysis; Proteome/analysis
7.
J Proteome Res ; 17(6): 2182-2191, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29733654

ABSTRACT

A20 is a negative regulator of NF-κB signaling; it controls inflammatory responses and ensures tissue homeostasis. A20 is thought to restrict NF-κB activation both by its ubiquitin-editing activity as well as by its nonenzymatic activities. Besides its role in NF-κB signaling, A20 also acts as a protective factor inhibiting apoptosis and necroptosis. Because of the ability of A20 to both ubiquitinate and deubiquitinate substrates, and its involvement in many cellular processes, we hypothesized that deletion of A20 might generally impact on protein levels, thereby disrupting cellular signaling. We performed a differential proteomics study on bone marrow-derived macrophages (BMDMs) from control and myeloid-specific A20 knockout mice, both in untreated conditions and after LPS or TNF treatment, and demonstrated A20-dependent changes in protein expression. Several inflammatory proteins were found up-regulated in the absence of A20, even without an inflammatory stimulus, but, depending on the treatment and the treatment time, more proteins were found regulated. Together these protein changes may affect normal signaling events, which may disturb tissue homeostasis and induce (autoimmune) inflammation, in agreement with A20's proposed identity as a susceptibility gene for inflammatory disease. We further verify that immune-responsive gene 1 (IRG1) is up-regulated in the absence of A20 and that its levels are transcriptionally regulated.


Subject(s)
Hydro-Lyases/metabolism; Proteomics/methods; Tumor Necrosis Factor alpha-Induced Protein 3/deficiency; Animals; Gene Expression Regulation/drug effects; Hydro-Lyases/antagonists & inhibitors; Lipopolysaccharides/pharmacology; Macrophages/metabolism; Mice; Mice, Knockout; Transcription, Genetic; Tumor Necrosis Factor alpha-Induced Protein 3/physiology; Tumor Necrosis Factor-alpha/pharmacology; Up-Regulation
8.
Mol Cell Proteomics ; 15(2): 657-68, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26566788

ABSTRACT

Peptide intensities from mass spectra are increasingly used for relative quantitation of proteins in complex samples. However, numerous issues inherent to the mass spectrometry workflow turn quantitative proteomic data analysis into a crucial challenge. We and others have shown that modeling at the peptide level outperforms classical summarization-based approaches, which typically also discard a lot of proteins at the data preprocessing step. Peptide-based linear regression models, however, still suffer from unbalanced datasets due to missing peptide intensities, outlying peptide intensities and overfitting. Here, we further improve upon peptide-based models by three modular extensions: ridge regression, improved variance estimation by borrowing information across proteins with empirical Bayes and M-estimation with Huber weights. We illustrate our method on the CPTAC spike-in study and on a study comparing wild-type and ArgP knock-out Francisella tularensis proteomes. We show that the fold change estimates of our robust approach are more precise and more accurate than those from state-of-the-art summarization-based methods and peptide-based regression models, which leads to an improved sensitivity and specificity. We also demonstrate that ionization competition effects come already into play at very low spike-in concentrations and confirm that analyses with peptide-based regression methods on peptide intensity values aggregated by charge state and modification status (e.g. MaxQuant's peptides.txt file) are slightly superior to analyses on raw peptide intensity values (e.g. MaxQuant's evidence.txt file).
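
Of the three extensions named in this abstract, M-estimation with Huber weights is the easiest to show in isolation. The sketch below applies it to a simple location estimate through iteratively reweighted averaging; the paper uses it inside a peptide-based regression model, so this is a simplified stand-in. The tuning constant 1.345 and the MAD-based scale are conventional choices assumed here, and the data are invented.

```python
from statistics import median


def huber_weights(residuals, scale, k=1.345):
    """Weight 1 for small residuals; weight k*scale/|r| for outlying ones."""
    bound = k * scale
    return [1.0 if abs(r) <= bound else bound / abs(r) for r in residuals]


def huber_location(values, k=1.345, iterations=50):
    """Robust location estimate by iteratively reweighted averaging."""
    mu = median(values)
    # Median absolute deviation, rescaled to match the normal standard deviation.
    scale = 1.4826 * median([abs(v - mu) for v in values])
    if scale == 0:
        return mu
    for _ in range(iterations):
        weights = huber_weights([v - mu for v in values], scale, k)
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mu


# Hypothetical log2 intensities for one peptide, with one outlying observation.
intensities = [10.0, 10.2, 9.8, 10.1, 9.9, 15.0]
plain_mean = sum(intensities) / len(intensities)
robust_estimate = huber_location(intensities)
```

The outlying value 15.0 receives a small weight, so the robust estimate stays close to 10 while the plain mean is pulled upward.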


Subject(s)
Peptides/genetics; Proteome/genetics; Proteomics/methods; Bayes Theorem; Linear Models; Mass Spectrometry; Proteomics/statistics & numerical data
9.
BMC Bioinformatics ; 18(1): 535, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29191167

ABSTRACT

BACKGROUND: In the search for novel causal mutations, public and/or private variant databases are nearly always used to facilitate the search as they result in a massive reduction of putative variants in one step. Practically, variant filtering is often done by either using all variants from the variant database (called the absence-approach, i.e. it is assumed that disease-causing variants do not reside in variant databases) or by using the subset of variants with an allelic frequency > 1% (called the 1%-approach). We investigate the validity of these two approaches in terms of false negatives (the true disease-causing variant does not pass all filters) and false positives (a harmless mutation passes all filters and is erroneously retained in the list of putative disease-causing variants) and compare them with a novel approach which we named the quantile-based approach. This approach applies variable instead of static frequency thresholds and the calculation of these thresholds is based on prior knowledge of disease prevalence, inheritance models, database size and database characteristics. RESULTS: Based on real-life data, we demonstrate that the quantile-based approach outperforms the absence-approach in terms of false negatives. At the same time, this quantile-based approach deals more appropriately with the variable allele frequencies of disease-causing alleles in variant databases relative to the 1%-approach and as such allows a better control of the number of false positives. We also introduce an alternative application for variant database usage and the quantile-based approach. If disease-causing variants in variant databases deviate substantially from theoretical expectancies calculated with the quantile-based approach, their association between genotype and phenotype had to be reconsidered in 12 out of 13 cases. 
CONCLUSIONS: We developed a novel method and demonstrated that this so-called quantile-based approach is a highly suitable method for variant filtering. In addition, the quantile-based approach can also be used for variant flagging. For user friendliness, lookup tables and easy-to-use R calculators are provided.


Subject(s)
Databases, Genetic; Genetic Association Studies; Alleles; Congenital Abnormalities/genetics; Congenital Abnormalities/pathology; Gene Frequency; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide
10.
Anal Chem ; 89(8): 4461-4467, 2017 04 18.
Article in English | MEDLINE | ID: mdl-28350455

ABSTRACT

Standard data analysis pipelines for digital PCR estimate the concentration of a target nucleic acid by digitizing the end-point fluorescence of the parallel micro-PCR reactions, using an automated hard threshold. While it is known that misclassification has a major impact on the concentration estimate and substantially reduces accuracy, the uncertainty of this classification is typically ignored. We introduce a model-based clustering method to estimate the probability that the target is present (absent) in a partition conditional on its observed fluorescence and the distributional shape in no-template control samples. This methodology acknowledges the inherent uncertainty of the classification and provides a natural measure of precision, both at individual partition level and at the level of the global concentration. We illustrate our method on genetically modified organism, inhibition, dynamic range, and mutation detection experiments. We show that our method provides concentration estimates of similar accuracy or better than the current standard, along with a more realistic measure of precision. The individual partition probabilities and diagnostic density plots further allow for some quality control. An R implementation of our method, called Umbrella, is available, providing a more objective and automated data analysis procedure for absolute dPCR quantification.
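
The standard hard-threshold pipeline that this abstract contrasts with can be sketched as below; it is not the Umbrella method. The sketch assumes the usual Poisson correction, in which a positive fraction p gives a mean of -ln(1 - p) target copies per partition. The threshold, partition volume, and fluorescence values are invented for illustration.

```python
import math


def dpcr_hard_threshold(fluorescence, threshold, partition_volume_ul):
    """Return (copies per partition, copies per microliter) from end-point data."""
    n = len(fluorescence)
    positives = sum(1 for f in fluorescence if f > threshold)
    if positives == n:
        raise ValueError("all partitions are positive; the estimate is undefined")
    # Poisson correction: P(partition is negative) = exp(-lambda).
    copies_per_partition = -math.log(1.0 - positives / n)
    return copies_per_partition, copies_per_partition / partition_volume_ul


# Hypothetical run: four partitions, one above the threshold.
readings = [1200.0, 1100.0, 5300.0, 1000.0]
lam, concentration = dpcr_hard_threshold(readings, threshold=3000.0,
                                         partition_volume_ul=0.001)
```

Each partition is classified as positive or negative with certainty here, so a reading close to the threshold contributes exactly as much as a clear-cut one; that ignored uncertainty is what the model-based clustering in the abstract is meant to address.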


Subject(s)
Models, Theoretical; Polymerase Chain Reaction/methods; DNA, Plant/analysis; DNA, Plant/metabolism; Plants, Genetically Modified/genetics; Polymerase Chain Reaction/standards; Quality Control
11.
Bioinformatics ; 31(1): 94-101, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25178459

ABSTRACT

MOTIVATION: In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. RESULTS: A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is imbedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. AVAILABILITY: The VirVarSeq is available, together with a user's guide and test data, at sourceforge: http://sourceforge.net/projects/virtools/?source=directory.
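
How quality scores can separate likely errors from real low-frequency variants is sketched below. The Phred relation (error probability 10^(-Q/10)) and the ASCII offset of 33 are standard for current Illumina fastq files. The fixed cutoff `min_q` and the rule that all three bases of a codon must pass are assumptions for this example; the abstract describes the Q-cpileup threshold as adaptive per sample, which is not reproduced here.

```python
def decode_quality(quality_string, offset=33):
    """Convert a fastq quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def codon_passes(codon_quals, min_q):
    """Keep a codon only if all three of its bases reach the quality cutoff."""
    return len(codon_quals) == 3 and all(q >= min_q for q in codon_quals)


# Hypothetical codon whose third base has a low quality score.
quals = decode_quality("II5")                    # 'I' is Q40, '5' is Q20
keep = codon_passes(quals, min_q=30)             # rejected because of the Q20 base
worst_error = error_probability(min(quals))      # 1 in 100 for Q20
```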


Subject(s)
Algorithms; Genetic Variation/genetics; Genomics/methods; Hepacivirus/genetics; Hepatitis C/genetics; High-Throughput Nucleotide Sequencing/methods; Software; Genome, Viral; Hepatitis C/virology; Humans
12.
Plant Physiol ; 167(3): 800-16, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25604532

ABSTRACT

Although the response of plants exposed to severe drought stress has been studied extensively, little is known about how plants adapt their growth under mild drought stress conditions. Here, we analyzed the leaf and rosette growth response of six Arabidopsis (Arabidopsis thaliana) accessions originating from different geographic regions when exposed to mild drought stress. The automated phenotyping platform WIWAM was used to impose stress early during leaf development, when the third leaf emerges from the shoot apical meristem. Analysis of growth-related phenotypes showed differences in leaf development between the accessions. In all six accessions, mild drought stress reduced both leaf pavement cell area and number without affecting the stomatal index. Genome-wide transcriptome analysis (using RNA sequencing) of early developing leaf tissue identified 354 genes differentially expressed under mild drought stress in the six accessions. Our results indicate the existence of a robust response over different genetic backgrounds to mild drought stress in developing leaves. The processes involved in the overall mild drought stress response comprised abscisic acid signaling, proline metabolism, and cell wall adjustments. In addition to these known severe drought-related responses, 87 genes were found to be specific for the response of young developing leaves to mild drought stress.


Subject(s)
Arabidopsis/physiology; Droughts; Ecotype; Plant Leaves/physiology; Stress, Physiological; Arabidopsis/genetics; Arabidopsis/growth & development; Cell Wall/metabolism; Gene Expression Profiling; Gene Expression Regulation, Plant; Gene Ontology; Gene Regulatory Networks; Genes, Plant; Phenotype; Plant Leaves/anatomy & histology; Seedlings/growth & development
13.
PLoS Genet ; 9(8): e1003693, 2013.
Article in English | MEDLINE | ID: mdl-23966873

ABSTRACT

Revealing QTLs with a minor effect in complex traits remains difficult. Initial strategies had limited success because of interference by major QTLs and epistasis. New strategies focused on eliminating major QTLs in subsequent mapping experiments. Since genetic analysis of superior segregants from natural diploid strains usually also reveals QTLs linked to the inferior parent, we have extended this strategy for minor QTL identification by eliminating QTLs in both parent strains and repeating the QTL mapping with pooled-segregant whole-genome sequence analysis. We first mapped multiple QTLs responsible for high thermotolerance in a natural yeast strain, MUCL28177, compared to the laboratory strain, BY4742. Using single and bulk reciprocal hemizygosity analysis we identified MKT1 and PRP42 as causative genes in QTLs linked to the superior and inferior parent, respectively. We subsequently downgraded both parents by replacing their superior allele with the inferior allele of the other parent. QTL mapping using pooled-segregant whole-genome sequence analysis with the segregants from the cross of the downgraded parents, revealed several new QTLs. We validated the two most-strongly linked new QTLs by identifying NCS2 and SMD2 as causative genes linked to the superior downgraded parent and we found an allele-specific epistatic interaction between PRP42 and SMD2. Interestingly, the related function of PRP42 and SMD2 suggests an important role for RNA processing in high thermotolerance and underscores the relevance of analyzing minor QTLs. Our results show that identification of minor QTLs involved in complex traits can be successfully accomplished by crossing parent strains that have both been downgraded for a single QTL. 
This novel approach has the advantage of maintaining all relevant genetic diversity as well as enough phenotypic difference between the parent strains for the trait-of-interest and thus maximizes the chances of successfully identifying additional minor QTLs that are relevant for the phenotypic difference between the original parents.


Subject(s)
Cell Cycle Proteins/genetics; Quantitative Trait Loci/genetics; RNA Processing, Post-Transcriptional/genetics; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Alleles; Chromosome Mapping; Genetic Linkage; Genetic Variation; Hot Temperature; RNA/genetics
14.
BMC Bioinformatics ; 16: 379, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26554718

ABSTRACT

BACKGROUND: Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. RESULTS: For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. 
Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. CONCLUSIONS: We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
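
The three calling modes described in this abstract differ only in how the estimated SNV probability and the variant frequency are thresholded, which can be sketched as follows. This is not the QQ-SNV code: the probabilities are taken as given rather than computed by the logistic regression model, the nearest-rank percentile and the inclusive comparisons are assumptions, and all numbers are invented.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


def call_snvs(candidates, mode="D", error_frequencies=None):
    """Return positions called as SNVs under mode 'D', 'HS', or 'HS-P80'."""
    cutoff = {"D": 0.5, "HS": 0.0001, "HS-P80": 0.0001}[mode]
    called = [c for c in candidates if c["prob"] >= cutoff]
    if mode == "HS-P80":
        # Overrule calls whose frequency lies below the 80th percentile
        # of the error-frequency distribution.
        p80 = nearest_rank_percentile(error_frequencies, 80)
        called = [c for c in called if c["freq"] >= p80]
    return [c["pos"] for c in called]


# Hypothetical candidates: position, estimated SNV probability, observed frequency.
candidates = [
    {"pos": 101, "prob": 0.9, "freq": 0.02},
    {"pos": 202, "prob": 0.01, "freq": 0.006},
    {"pos": 303, "prob": 0.001, "freq": 0.0003},
]
error_freqs = [0.0001, 0.0002, 0.0003, 0.0004, 0.001]

default_calls = call_snvs(candidates, "D")
sensitive_calls = call_snvs(candidates, "HS")
balanced_calls = call_snvs(candidates, "HS-P80", error_freqs)
```

In this toy input the high-sensitivity mode adds two low-probability candidates, and the percentile rule then removes the one whose frequency is indistinguishable from the error background.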


Subject(s)
HIV Infections/genetics; HIV-1/genetics; Hepacivirus/genetics; Hepatitis C/genetics; High-Throughput Nucleotide Sequencing/methods; Polymorphism, Single Nucleotide/genetics; Software; Algorithms; Cluster Analysis; Computer Simulation; Genome, Viral; HIV Infections/virology; Hepatitis C/virology; Humans; Plasmids/genetics; Regression Analysis
15.
BMC Bioinformatics ; 16: 59, 2015 Feb 22.
Article in English | MEDLINE | ID: mdl-25887734

ABSTRACT

BACKGROUND: Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. RESULTS: Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. CONCLUSIONS: ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. 
They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection.


Subject(s)
Algorithms; Genetic Variation/genetics; Hepacivirus/genetics; Hepatitis C/genetics; High-Throughput Nucleotide Sequencing/methods; Mutation/genetics; Software; Cluster Analysis; Genome, Viral; Genomics/methods; Hepatitis C/virology; Humans; Sensitivity and Specificity; Sequence Analysis, DNA/methods
16.
J Proteome Res ; 14(6): 2457-65, 2015 Jun 05.
Article in English | MEDLINE | ID: mdl-25827922

ABSTRACT

Quantitative label-free mass spectrometry is increasingly used to analyze the proteomes of complex biological samples. However, the choice of appropriate data analysis methods remains a major challenge. We therefore provide a rigorous comparison between peptide-based models and peptide-summarization-based pipelines. We show that peptide-based models outperform summarization-based pipelines in terms of sensitivity, specificity, accuracy, and precision. We also demonstrate that the predefined FDR cutoffs for the detection of differentially regulated proteins can become problematic when differentially expressed (DE) proteins are highly abundant in one or more samples. Care should therefore be taken when data are interpreted from samples with spiked-in internal controls and from samples that contain a few very highly abundant proteins. We do, however, show that specific diagnostic plots can be used for assessing differentially expressed proteins and the overall quality of the obtained fold change estimates. Finally, our study also illustrates that imputation under the "missing by low abundance" assumption is beneficial for the detection of differential expression in proteins with low abundance, but it negatively affects moderately to highly abundant proteins. Hence, imputation strategies that are commonly implemented in standard proteomics software should be used with care.


Subject(s)
Data Interpretation, Statistical , Guidelines as Topic , Models, Chemical , Peptides/chemistry , Proteomics , ROC Curve
17.
Genome Res ; 22(5): 975-84, 2012 May.
Article in English | MEDLINE | ID: mdl-22399573

ABSTRACT

High ethanol tolerance is an exquisite characteristic of the yeast Saccharomyces cerevisiae, which enables this microorganism to dominate in natural and industrial fermentations. Up to now, ethanol tolerance has only been analyzed in laboratory yeast strains with moderate ethanol tolerance. The genetic basis of the much higher ethanol tolerance in natural and industrial yeast strains is unknown. We have applied pooled-segregant whole-genome sequence analysis to map all quantitative trait loci (QTL) determining high ethanol tolerance. We crossed a highly ethanol-tolerant segregant of a Brazilian bioethanol production strain with a laboratory strain with moderate ethanol tolerance. Out of 5974 segregants, we pooled 136 segregants tolerant to at least 16% ethanol and 31 segregants tolerant to at least 17%. Scoring of SNPs using whole-genome sequence analysis of DNA from the two pools and parents revealed three major loci and additional minor loci. The latter were more pronounced or only present in the 17% pool compared to the 16% pool. In the locus with the strongest linkage, we identified three closely located genes affecting ethanol tolerance: MKT1, SWS2, and APJ1, with SWS2 being a negative allele located in between two positive alleles. SWS2 and APJ1 probably contained significant polymorphisms only outside the ORF, and lower expression of APJ1 may be linked to higher ethanol tolerance. This work has identified the first causative genes involved in high ethanol tolerance of yeast. It also reveals the strong potential of pooled-segregant sequence analysis using relatively small numbers of selected segregants for identifying QTL on a genome-wide scale.


Subject(s)
Ethanol/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Chromosome Mapping , Gene Knockout Techniques , Genes, Fungal , Genetic Association Studies , Genetic Linkage , Genome, Fungal , Microbial Viability , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Sequence Analysis, DNA
19.
Bioinformatics ; 30(17): 2494-5, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24794933

ABSTRACT

MOTIVATION: Recently, De Neve et al. proposed a modification of the Wilcoxon-Mann-Whitney (WMW) test for assessing differential expression based on RT-qPCR data. Their test, referred to as the unified WMW (uWMW) test, incorporates a robust and intuitive normalization and quantifies the probability that the expression from one treatment group exceeds the expression from another treatment group. However, no software package for this test was available yet. RESULTS: We have developed a Bioconductor package for analyzing RT-qPCR data with the uWMW test. The package also provides graphical tools for visualizing the effect sizes. AVAILABILITY AND IMPLEMENTATION: The unifiedWMWqPCR package and its user documentation can be obtained through Bioconductor.
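As an illustration of the quantity mentioned in this abstract, the probability that expression in one group exceeds expression in another can be estimated from all between-group pairs. The sketch below is only the plain Wilcoxon-Mann-Whitney probabilistic index, with ties counted as one half; it does not reproduce the normalization of the uWMW test or the interface of the unifiedWMWqPCR package. The function name `probabilistic_index` and the example values are hypothetical.

```python
from itertools import product


def probabilistic_index(group_a, group_b):
    """Estimate P(A < B) + 0.5 * P(A == B) from two samples.

    This is the ordinary WMW effect size, not the uWMW statistic:
    no normalization against reference features is applied here.
    """
    pairs = list(product(group_a, group_b))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    # Score each between-group pair: 1 if b is larger, 0.5 for a tie, else 0.
    score = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a, b in pairs)
    return score / len(pairs)


if __name__ == "__main__":
    # Hypothetical expression values for two treatment groups.
    control = [1.2, 1.9, 2.4, 2.8]
    treated = [2.1, 3.0, 3.3, 3.9]
    print(probabilistic_index(control, treated))
```

A value near 0.5 suggests no systematic difference between the groups, while values near 0 or 1 indicate that one group tends to have the higher expression.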


Subject(s)
Real-Time Polymerase Chain Reaction/methods , Software , Humans , MicroRNAs/analysis , Neuroblastoma/genetics , Statistics, Nonparametric
20.
Nature ; 458(7238): 623-6, 2009 Apr 02.
Article in English | MEDLINE | ID: mdl-19270679

ABSTRACT

Owing to the present global biodiversity crisis, the biodiversity-stability relationship and the effect of biodiversity on ecosystem functioning have become major topics in ecology. Biodiversity is a complex term that includes taxonomic, functional, spatial and temporal aspects of organismic diversity, with species richness (the number of species) and evenness (the relative abundance of species) considered among the most important measures. With few exceptions (see, for example, ref. 6), the majority of studies of biodiversity-functioning and biodiversity-stability theory have predominantly examined richness. Here we show, using microbial microcosms, that initial community evenness is a key factor in preserving the functional stability of an ecosystem. Using experimental manipulations of both richness and initial evenness in microcosms with denitrifying bacterial communities, we found that the stability of the net ecosystem denitrification in the face of salinity stress was strongly influenced by the initial evenness of the community. Therefore, when communities are highly uneven, or there is extreme dominance by one or a few species, their functioning is less resistant to environmental stress. Further unravelling how evenness influences ecosystem processes in natural and humanized environments constitutes a major future conceptual challenge.
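To make the distinction between richness and evenness concrete, the sketch below computes both for a vector of species counts. It uses Pielou's evenness (Shannon entropy divided by its maximum for the observed richness), which is one common evenness measure and is an assumption here; the abstract does not state which evenness index the study used. The function name `pielou_evenness` and the example counts are hypothetical.

```python
import math


def pielou_evenness(counts):
    """Return Pielou's evenness J = H / ln(S) for species counts.

    H is the Shannon entropy of the relative abundances and S is the
    richness (number of species with a non-zero count). J is 1 for a
    perfectly even community and approaches 0 under strong dominance.
    """
    positive = [c for c in counts if c > 0]
    if len(positive) < 2:
        raise ValueError("evenness needs at least two species present")
    total = sum(positive)
    shannon = -sum((c / total) * math.log(c / total) for c in positive)
    return shannon / math.log(len(positive))


if __name__ == "__main__":
    # Two hypothetical communities with the same richness (4 species).
    even_community = [25, 25, 25, 25]
    uneven_community = [97, 1, 1, 1]
    print(pielou_evenness(even_community))
    print(pielou_evenness(uneven_community))
```

Both example communities have the same richness, so any difference in the printed values reflects evenness alone.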


Subject(s)
Bacteria/metabolism , Biodiversity , Models, Biological , Selection, Genetic , Bacteria/genetics , Nitrates/metabolism , Nitrites/metabolism , RNA, Ribosomal, 16S/genetics , Stress, Physiological