Results 1 - 20 of 128
1.
BMC Bioinformatics ; 25(1): 59, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38321386

ABSTRACT

The prediction of interactions between novel drugs and biological targets is a vital step in the early stage of the drug discovery pipeline. Many deep learning approaches have been proposed over the last decade, with a substantial fraction of them sharing the same underlying two-branch architecture. Their distinction is limited to the use of different types of feature representations and branches (multi-layer perceptrons, convolutional neural networks, graph neural networks and transformers). In contrast, the strategy used to combine the outputs (embeddings) of the branches has remained mostly the same. The same general architecture has also been used extensively in the area of recommender systems, where the choice of an aggregation strategy is still an open question. In this work, we investigate the effectiveness of three different embedding aggregation strategies in the area of drug-target interaction (DTI) prediction. We formally define these strategies and prove their universal approximator capabilities. We then present experiments that compare the different strategies on benchmark datasets from the area of DTI prediction, showcasing conditions under which specific strategies could be the obvious choice.


Subjects
Benchmarking , Drug Discovery , Electric Power Supplies , Neural Networks, Computer
2.
Ecol Lett ; 27(5): e14433, 2024 May.
Article in English | MEDLINE | ID: mdl-38712704

ABSTRACT

The negative diversity-invasion relationship observed in microbial invasion studies is commonly explained by competition between the invader and resident populations. However, whether this relationship is affected by invader-resident cooperative interactions is unknown. Using ecological and mathematical approaches, we examined the survival and functionality of Aminobacter niigataensis MSH1 to mineralize 2,6-dichlorobenzamide (BAM), a groundwater micropollutant affecting drinking water production, in sand microcosms when inoculated together with synthetic assemblies of resident bacteria. The assemblies varied in richness and in strains that interacted pairwise with MSH1, including cooperative and competitive interactions. While overall, the negative diversity-invasion relationship was retained, residents engaging in cooperative interactions with the invader had a positive impact on MSH1 survival and functionality, highlighting the dependency of invasion success on community composition. No correlation existed between community richness and the delay in BAM mineralization by MSH1. The findings suggest that the presence of cooperative residents can alleviate the negative diversity-invasion relationship.


Subjects
Microbiota , Benzamides , Microbial Interactions , Phyllobacteriaceae/physiology , Groundwater/microbiology , Biodiversity
3.
Bioinformatics ; 38(4): 1144-1145, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34788379

ABSTRACT

SUMMARY: In combinatorial biotechnology, it is crucial for screening experiments to sufficiently cover the design space. In the BioCCP.jl package (Julia), we provide functions for minimum sample size determination based on the mathematical framework coined the Coupon Collector Problem. AVAILABILITY AND IMPLEMENTATION: BioCCP.jl, including source code, documentation and Pluto notebooks, is available at https://github.com/kirstvh/BioCCP.jl.


Subjects
Documentation , Software , Bees , Animals
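Note on entry 3: the Coupon Collector Problem named in the abstract can be illustrated with a small Python sketch. It assumes every design is drawn uniformly at random, which is the textbook setting and may be simpler than the cases the package covers. The function names are hypothetical and are not the BioCCP.jl API.

```python
from fractions import Fraction
import math


def expected_samples(n_designs: int) -> float:
    """Expected number of uniform random draws needed to see every design once.

    Classic coupon collector result: n * H_n, with H_n the n-th harmonic number.
    """
    harmonic = sum(Fraction(1, k) for k in range(1, n_designs + 1))
    return float(n_designs * harmonic)


def coverage_probability(n_designs: int, n_samples: int) -> float:
    """Probability that n_samples uniform draws cover all n_designs designs.

    Inclusion-exclusion over the number of designs that are missed.
    """
    total = Fraction(0)
    for missed in range(n_designs + 1):
        sign = -1 if missed % 2 else 1
        hit_fraction = Fraction(n_designs - missed, n_designs)
        total += sign * math.comb(n_designs, missed) * hit_fraction ** n_samples
    return float(total)


def min_samples(n_designs: int, target: float = 0.95) -> int:
    """Smallest sample size whose coverage probability reaches the target."""
    n_samples = n_designs  # fewer draws than designs can never cover them all
    while coverage_probability(n_designs, n_samples) < target:
        n_samples += 1
    return n_samples
```

Under the uniform assumption, `min_samples(n, 0.95)` gives a screening size that covers the whole design space with at least 95% probability. Unequal design frequencies would require a different calculation.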
4.
Crit Rev Food Sci Nutr ; 63(25): 7837-7851, 2023.
Article in English | MEDLINE | ID: mdl-35297716

ABSTRACT

Dietary diversity is an established public health principle, and its measurement is essential for studies of diet quality and food security. However, conventional between food group scores fail to capture the nutritional variability and ecosystem services delivered by dietary richness and dissimilarity within food groups, or the relative distribution (i.e., evenness or moderation) of e.g., species or varieties across whole diets. Summarizing food biodiversity in an all-encompassing index is problematic. Therefore, various diversity indices have been proposed in ecology, yet these require methodological adaption for integration in dietary assessments. In this narrative review, we summarize the key conceptual issues underlying the measurement of food biodiversity at an edible species level, assess the ecological diversity indices previously applied to food consumption and food supply data, discuss their relative suitability, and potential amendments for use in (quantitative) dietary intake studies. Ecological diversity indices are often used without justification through the lens of nutrition. To illustrate: (i) dietary species richness fails to account for the distribution of foods across the diet or their functional traits; (ii) evenness indices, such as the Gini-Simpson index, require widely accepted relative abundance units (e.g., kcal, g, cups) and evidence-based moderation weighting factors; and (iii) functional dissimilarity indices are constructed based on an arbitrary selection of distance measures, cutoff criteria, and number of phylogenetic, nutritional, and morphological traits. Disregard for these limitations can lead to counterintuitive results and ambiguous or incorrect conclusions about the food biodiversity within diets or food systems. 
To ensure comparability and robustness of future research, we advocate food biodiversity indices that: (i) satisfy key axioms; (ii) can be extended to account for disparity between edible species; and (iii) are used in combination, rather than in isolation. Supplemental data for this article is available online at https://doi.org/10.1080/10408398.2022.2051163.


Subjects
Biodiversity , Diet , Humans , Eating , Phylogeny
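Note on entry 4: the contrast the abstract draws between species richness and evenness-sensitive indices can be seen in a short Python sketch. The function name, the food names and the gram amounts are illustrative assumptions, and the choice of grams as the abundance unit is exactly the kind of decision the review says needs justification.

```python
import math


def diversity_indices(amounts: dict[str, float]) -> dict[str, float]:
    """Richness, Gini-Simpson and Shannon indices from per-species amounts."""
    consumed = {species: a for species, a in amounts.items() if a > 0}
    if not consumed:
        return {"richness": 0, "gini_simpson": 0.0, "shannon": 0.0}
    total = sum(consumed.values())
    shares = [a / total for a in consumed.values()]
    return {
        "richness": len(consumed),                        # count of species eaten
        "gini_simpson": 1.0 - sum(p * p for p in shares),  # 1 - sum of squared shares
        "shannon": -sum(p * math.log(p) for p in shares),  # natural-log entropy
    }


# Two hypothetical diets with the same four species (amounts in grams).
even_diet = {"rice": 100, "beans": 100, "kale": 100, "mango": 100}
skewed_diet = {"rice": 397, "beans": 1, "kale": 1, "mango": 1}

even = diversity_indices(even_diet)
skewed = diversity_indices(skewed_diet)
```

Both diets have a richness of 4, but the Gini-Simpson and Shannon values of the skewed diet are much lower, which is the distribution effect that richness alone cannot capture.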
5.
Brief Bioinform ; 21(1): 262-271, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30329015

ABSTRACT

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models. The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package: https://github.com/aatapa/RLScore.
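Note on entry 5: a minimal NumPy sketch of two-step kernel ridge regression, assuming the common formulation in which a ridge regression is applied first across the row objects and then across the column objects. The function name, the linear kernels and the random data are illustrative assumptions; this is not the RLScore API, and the cross-validation shortcuts from the paper are not shown.

```python
import numpy as np


def two_step_krr_fit(K, G, Y, lam_u=1.0, lam_v=1.0):
    """Dual parameters A solving (K + lam_u I) A (G + lam_v I) = Y.

    K: (n, n) kernel between row vertices, G: (m, m) kernel between column
    vertices, Y: (n, m) label matrix. Only n x n and m x m systems are solved,
    never an (n*m) x (n*m) system over all edges.
    """
    n, m = Y.shape
    left = np.linalg.solve(K + lam_u * np.eye(n), Y)      # ridge step over rows
    A = np.linalg.solve(G + lam_v * np.eye(m), left.T).T  # ridge step over columns
    return A


# Toy data with linear kernels (purely illustrative).
rng = np.random.default_rng(0)
X_u = rng.normal(size=(6, 3))
X_v = rng.normal(size=(4, 2))
K = X_u @ X_u.T
G = X_v @ X_v.T
Y = rng.normal(size=(6, 4))

A = two_step_krr_fit(K, G, Y, lam_u=0.5, lam_v=2.0)
F_train = K @ A @ G  # fitted values for all training pairs
```

For a new row vertex with kernel vector `k` (length n) and a new column vertex with kernel vector `g` (length m), the prediction in this sketch is `k @ A @ g`.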

6.
Clin Chem ; 68(7): 906-916, 2022 07 03.
Article in English | MEDLINE | ID: mdl-35266984

ABSTRACT

BACKGROUND: Synthetic cannabinoid receptor agonists (SCRAs) are amongst the largest groups of new psychoactive substances (NPS). Their often high activity at the CB1 cannabinoid receptor frequently results in intoxication, imposing serious health risks. Hence, continuous monitoring of these compounds is important, but challenged by the rapid emergence of novel analogues that are missed by traditional targeted detection strategies. We addressed this need by performing an activity-based, universal screening on a large set (n = 968) of serum samples from patients presenting to the emergency department with acute recreational drug or NPS toxicity. METHODS: We assessed the performance of an activity-based method in detecting newly circulating SCRAs compared with liquid chromatography coupled to high-resolution mass spectrometry. Additionally, we developed and evaluated machine learning models to reduce the screening workload by automating interpretation of the activity-based screening output. RESULTS: Activity-based screening delivered outstanding performance, with a sensitivity of 94.6% and a specificity of 98.5%. Furthermore, the developed machine learning models allowed accurate distinction between positive and negative patient samples in an automatic manner, closely matching the manual scoring of samples. The performance of the model depended on the predefined threshold, e.g., at a threshold of 0.055, sensitivity and specificity were both 94.0%. CONCLUSION: The activity-based bioassay is an ideal candidate for untargeted screening of novel SCRAs. The combination of this universal screening assay and a machine learning approach for automated sample scoring is a promising complement to conventional analytical methods in clinical practice.


Subjects
Cannabinoids , Illicit Drugs , Cannabinoid Receptor Agonists/pharmacology , Chromatography, Liquid/methods , Humans , Machine Learning
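Note on entry 6: the abstract reports that sensitivity and specificity depend on the chosen score threshold. The Python sketch below shows that dependence on made-up scores. The data and the function name are hypothetical and unrelated to the study's samples, and the function assumes that both classes are present.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model scores and reference labels (1 = positive sample).
scores = [0.01, 0.03, 0.06, 0.20, 0.04, 0.90]
labels = [0, 0, 1, 1, 1, 1]

strict = sensitivity_specificity(scores, labels, threshold=0.055)
lenient = sensitivity_specificity(scores, labels, threshold=0.02)
```

Lowering the threshold raises sensitivity at the cost of specificity, so any reported pair of values is tied to one operating point.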
7.
Environ Sci Technol ; 56(2): 1352-1364, 2022 01 18.
Article in English | MEDLINE | ID: mdl-34982540

ABSTRACT

Bioaugmentation often involves an invasion process requiring the establishment and activity of a foreign microbe in the resident community of the target environment. Interactions with resident micro-organisms, either antagonistic or cooperative, are believed to impact invasion. However, few studies have examined the variability of interactions between an invader and resident species of its target environment, and none of them considered a bioremediation context. Aminobacter sp. MSH1 mineralizing the groundwater micropollutant 2,6-dichlorobenzamide (BAM), is proposed for bioaugmentation of sand filters used in drinking water production to avert BAM contamination. We examined the nature of the interactions between MSH1 and 13 sand filter resident bacteria in dual and triple species assemblies in sand microcosms. The residents affected MSH1-mediated BAM mineralization without always impacting MSH1 cell densities, indicating effects on cell physiology rather than on cell number. Exploitative competition explained most of the effects (70%), but indications of interference competition were also found. Two residents improved BAM mineralization in dual species assemblies, apparently in a mutual cooperation, and overruled negative effects by others in triple species systems. The results suggest that sand filter communities contain species that increase MSH1 fitness. This opens doors for assisting bioaugmentation through co-inoculation with "helper" bacteria originating from and adapted to the target environment.


Subjects
Groundwater , Phyllobacteriaceae , Water Purification , Bacteria , Benzamides , Biodegradation, Environmental , Water Purification/methods
8.
Theor Appl Genet ; 134(12): 3845-3861, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34387711

ABSTRACT

KEY MESSAGE: The deep scoping method incorporates the use of a gene bank together with different population layers to reintroduce genetic variation into the breeding population, thus maximizing the long-term genetic gain without reducing the short-term genetic gain or increasing the total financial cost. Genomic prediction is often combined with truncation selection to identify superior parental individuals that can pass on favorable quantitative trait locus (QTL) alleles to their offspring. However, truncation selection reduces genetic variation within the breeding population, causing a premature convergence to a sub-optimal genetic value. In order to also increase genetic gain in the long term, different methods have been proposed that better preserve genetic variation. However, when the genetic variation of the breeding population has already been reduced as a result of prior intensive selection, even those methods will not be able to avert such premature convergence. Pre-breeding provides a solution for this problem by reintroducing genetic variation into the breeding population. Unfortunately, as pre-breeding often relies on a separate breeding population to increase the genetic value of wild specimens before introducing them in the elite population, it comes with an increased financial cost. In this paper, on the basis of a simulation study, we propose a new method that reintroduces genetic variation in the breeding population on a continuous basis without the need for a separate pre-breeding program or a larger population size. This way, we are able to introduce favorable QTL alleles into an elite population and maximize the genetic gain in the short as well as in the long term without increasing the financial cost.


Subjects
Genetic Variation , Plant Breeding , Quantitative Trait Loci , Alleles , Haploidy , Hordeum/genetics , Models, Genetic , Plant Breeding/methods
9.
Proc Natl Acad Sci U S A ; 115(1): 127-132, 2018 01 02.
Article in English | MEDLINE | ID: mdl-29255049

ABSTRACT

Biodiversity is key for human and environmental health. Available dietary and ecological indicators are not designed to assess the intricate relationship between food biodiversity and diet quality. We applied biodiversity indicators to dietary intake data and assessed their associations with the diet quality of women and young children. Data from 24-hour diet recalls (55% in the wet season) of n = 6,226 participants (34% women) in rural areas from seven low- and middle-income countries were analyzed. Mean adequacies of vitamin A, vitamin C, folate, calcium, iron, and zinc and diet diversity score (DDS) were used to assess diet quality. Associations of biodiversity indicators with nutrient adequacy were quantified using multilevel models, receiver operating characteristic curves, and test sensitivity and specificity. A total of 234 different species were consumed, of which <30% were consumed in more than one country. Nine species were consumed in all countries and provided, on average, 61% of total energy intake and a significant contribution of micronutrients in the wet season. Compared with Simpson's index of diversity and functional diversity, species richness (SR) showed stronger associations and better diagnostic properties with micronutrient adequacy. For every additional species consumed, dietary nutrient adequacy increased by 0.03 (P < 0.001). Diets with higher nutrient adequacy were mostly obtained when both SR and DDS were maximal. Adding SR to the minimum cutoff for minimum diet diversity improved the ability to detect diets with higher micronutrient adequacy in women but not in children. Dietary SR is recommended as the most appropriate measure of food biodiversity in diets.


Subjects
Eating , Food Preferences , Micronutrients , Nutritive Value , Rural Population , Child , Child, Preschool , Female , Humans , Infant , Male
10.
Chaos ; 31(1): 013136, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33754763

ABSTRACT

Discrete dynamical systems such as cellular automata are of increasing interest to scientists in a variety of disciplines since they are simple models of computation capable of simulating complex phenomena. For this reason, the problem of reversibility of such systems is very important and, therefore, recurrently taken up by researchers. Unfortunately, the study of reversibility is remarkably hard, especially in the case of two- or higher-dimensional cellular automata. In this paper, we propose a novel and simple method that allows us to completely resolve the reversibility problem of a wide class of linear cellular automata on finite triangular grids with null boundary conditions.
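Note on entry 10: for a linear cellular automaton over two states on a finite grid, the global update is a linear map over GF(2), so the automaton is reversible exactly when its transition matrix is invertible modulo 2. The Python sketch below checks this for the one-dimensional elementary rules 90 and 150 with null boundaries. These rules are a simpler stand-in chosen for illustration: the paper concerns triangular grids, and this is not the method it proposes.

```python
import numpy as np


def transition_matrix(n_cells: int, rule: int) -> np.ndarray:
    """Transition matrix over GF(2) of a 1D linear CA with null boundaries.

    Rule 90: next state = left XOR right.
    Rule 150: next state = left XOR self XOR right.
    """
    if rule not in (90, 150):
        raise ValueError("only the linear rules 90 and 150 are handled here")
    T = np.zeros((n_cells, n_cells), dtype=np.uint8)
    for i in range(n_cells):
        if i > 0:
            T[i, i - 1] = 1
        if i < n_cells - 1:
            T[i, i + 1] = 1
        if rule == 150:
            T[i, i] = 1
    return T


def is_invertible_gf2(matrix: np.ndarray) -> bool:
    """Gaussian elimination modulo 2; True when the matrix has full rank."""
    M = (matrix % 2).astype(np.uint8)
    n = M.shape[0]
    for col in range(n):
        pivot_rows = [r for r in range(col, n) if M[r, col]]
        if not pivot_rows:
            return False  # no pivot in this column: the matrix is singular
        pivot = pivot_rows[0]
        if pivot != col:
            M[[col, pivot]] = M[[pivot, col]]  # swap rows
        for r in range(col + 1, n):
            if M[r, col]:
                M[r] ^= M[col]  # row addition in GF(2) is XOR
    return True


def is_reversible(n_cells: int, rule: int) -> bool:
    return is_invertible_gf2(transition_matrix(n_cells, rule))
```

In this sketch, reversibility depends on the number of cells: for example, rule 90 comes out reversible on 4 cells but not on 5.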

11.
Entropy (Basel) ; 23(6), 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34199402

ABSTRACT

Shannon's entropy measure is a popular means for quantifying ecological diversity. We explore how one can use information-theoretic measures (that are often called indices in ecology) on joint ensembles to study the diversity of species interaction networks. We leverage the little-known balance equation to decompose the network information into three components describing the species abundance, specificity, and redundancy. This balance reveals that there exists a fundamental trade-off between these components. The decomposition can be straightforwardly extended to analyse networks through time as well as space, leading to the corresponding notions for alpha, beta, and gamma diversity. Our work aims to provide an accessible introduction for ecologists. To this end, we illustrate the interpretation of the components on numerous real networks. The corresponding code is made available to the community in the specialised Julia package EcologicalNetworks.jl.
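Note on entry 11: a small Python sketch of information-theoretic measures on a joint ensemble, using the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y) on a normalised interaction matrix. It is meant only to make the quantities concrete: it does not reproduce the paper's full balance equation, and the function names are assumptions rather than the EcologicalNetworks.jl API.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits, ignoring zero-probability cells."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def network_information(interactions: np.ndarray) -> dict[str, float]:
    """Marginal entropies, joint entropy and mutual information of a network.

    interactions[i, j] is the interaction strength between row species i
    and column species j (for example, visit counts).
    """
    P = interactions / interactions.sum()  # joint distribution over species pairs
    h_rows = entropy(P.sum(axis=1))        # diversity of row species
    h_cols = entropy(P.sum(axis=0))        # diversity of column species
    h_joint = entropy(P.ravel())           # diversity of interactions
    return {
        "H_rows": h_rows,
        "H_cols": h_cols,
        "H_joint": h_joint,
        "mutual_information": h_rows + h_cols - h_joint,
    }


specialised = network_information(np.array([[1.0, 0.0], [0.0, 1.0]]))
generalised = network_information(np.array([[1.0, 1.0], [1.0, 1.0]]))
```

The fully specialised network has one bit of mutual information, while the fully generalised one has none, even though both have identical marginal entropies.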

12.
Environ Res ; 183: 108619, 2020 04.
Article in English | MEDLINE | ID: mdl-31836206

ABSTRACT

Black carbon is often used as an indicator for combustion-related air pollution. In urban environments, on-road black carbon concentrations have a large spatial variability, suggesting that the personal exposure of a cyclist to black carbon can heavily depend on the route that is chosen to reach a destination. In this paper, we describe the development of a cyclist routing procedure that minimizes personal exposure to black carbon. Firstly, a land use regression model for predicting black carbon concentrations in an urban environment is developed using mobile monitoring data, collected by cyclists. The optimal model is selected and validated using a spatially stratified cross-validation scheme. The resulting model is integrated in a dedicated routing procedure that minimizes personal exposure to black carbon during cycling. The best model obtains a coefficient of multiple correlation of R=0.520. Simulations with the black carbon exposure minimizing routing procedure indicate that the inhaled amount of black carbon is reduced by 1.58% on average as compared to the shortest-path route, with extreme cases where a reduction of up to 13.35% is obtained. Moreover, we observed that the average exposure to black carbon and the exposure to local peak concentrations on a route are competing objectives, and propose a parametrized cost function for the routing problem that allows for a gradual transition from routes that minimize average exposure to routes that minimize peak exposure.


Subjects
Air Pollutants , Air Pollution , Carbon , Environmental Monitoring , Environmental Exposure , Particulate Matter , Soot
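Note on entry 12: the idea of exposure-minimizing routing can be sketched in Python as a shortest-path search with a different edge cost. The sketch assumes that exposure on a street segment is proportional to its length times the predicted black carbon concentration (constant speed and breathing rate). The toy graph, the cost functions and all names are hypothetical, and the parametrized average-versus-peak cost function from the paper is not reproduced.

```python
import heapq


def least_cost_path(graph, source, target, cost):
    """Dijkstra over graph[node] = [(neighbour, length_m, bc_concentration), ...]."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length, bc in graph.get(node, []):
            candidate = dist + cost(length, bc)
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def distance_cost(length, bc):
    return length            # classic shortest path


def exposure_cost(length, bc):
    return length * bc       # assumed proxy for inhaled black carbon


# Toy network: a short busy road A-B and a longer quiet detour via C.
graph = {
    "A": [("B", 100.0, 10.0), ("C", 80.0, 2.0)],
    "C": [("B", 80.0, 2.0)],
}

shortest = least_cost_path(graph, "A", "B", distance_cost)
cleanest = least_cost_path(graph, "A", "B", exposure_cost)
```

Here the shortest route follows the busy road, while the exposure-weighted route takes the detour, trading 60 m of extra distance for a lower cumulative exposure under the stated assumption.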
13.
Cytometry A ; 95(7): 782-791, 2019 07.
Article in English | MEDLINE | ID: mdl-31099963

ABSTRACT

Recent years have seen an increased interest in employing data analysis techniques for the automated identification of cell populations in the field of cytometry. These techniques highly depend on the use of a distance metric, a function that quantifies the distances between single-cell measurements. In most cases, researchers simply use the Euclidean distance metric. In this article, we exploit the availability of single-cell labels to find an optimal Mahalanobis distance metric derived from the data. We show that such a Mahalanobis distance metric results in an improved identification of cell populations compared with the Euclidean distance metric. Once determined, it can be used for the analysis of multiple samples that were measured under the same experimental setup. We illustrate this approach for cytometry data from two different origins, that is, flow cytometry applied to microbial cells and mass cytometry for the analysis of human blood cells. We also illustrate that such a distance metric results in an improved identification of cell populations when clustering methods are employed. Generally, these results imply that the performance of data analysis techniques can be improved by using a more advanced distance metric. © 2019 International Society for Advancement of Cytometry.


Subjects
Flow Cytometry/methods , Machine Learning , Pattern Recognition, Automated/methods , Algorithms , Bacteria/cytology , Blood Cells/cytology , Cluster Analysis , Humans , Microbiota , Single-Cell Analysis
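Note on entry 13: a Mahalanobis distance generalises the Euclidean distance through a positive semi-definite matrix M, d(x, y) = sqrt((x - y)^T M (x - y)). The Python sketch below only evaluates the distance for a given M. The article learns M from single-cell labels, which is not shown here, and the diagonal example matrix is an arbitrary illustrative choice.

```python
import numpy as np


def mahalanobis(x, y, M) -> float:
    """Distance sqrt((x - y)^T M (x - y)) for a positive semi-definite M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))


cell_a = [0.0, 0.0]
cell_b = [3.0, 4.0]

# With the identity matrix the metric reduces to the Euclidean distance.
euclidean = mahalanobis(cell_a, cell_b, np.eye(2))

# A hypothetical M that down-weights both channels by their variance (9 and 16).
rescaled = mahalanobis(cell_a, cell_b, np.diag([1 / 9, 1 / 16]))
```

Off-diagonal entries in M would additionally account for correlations between measurement channels.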
14.
Environ Sci Technol ; 53(24): 14459-14469, 2019 12 17.
Article in English | MEDLINE | ID: mdl-31682110

ABSTRACT

Many disciplines rely on testing combinations of compounds, materials, proteins, or bacterial species to drive scientific discovery. It is time-consuming and expensive to determine experimentally, via trial-and-error or random selection approaches, which of the many possible combinations will lead to desirable outcomes. Hence, there is a pressing need for more rational and efficient experimental design approaches to reduce experimental effort. In this work, we demonstrate the potential of machine learning methods for the in silico selection of promising co-culture combinations in the application of bioaugmentation. We use the example of pollutant removal in drinking water treatment plants, which can be achieved using co-cultures of a specialized pollutant degrader with combinations of bacterial isolates. To reduce the experimental effort needed to discover high-performing combinations, we propose a data-driven experimental design. Based on a dataset of mineralization performance for all pairs of 13 bacterial species co-cultured with MSH1, we built a Gaussian process regression model to predict the Gompertz mineralization parameters of the co-cultures of two and three species, based on the single-strain parameters. We subsequently used this model in a Bayesian optimization scheme to suggest potentially high-performing combinations of bacteria. We achieved good performance with this approach, both for predicting mineralization parameters and for selecting effective co-cultures, despite the limited dataset. As a novel application of Bayesian optimization in bioremediation, this experimental design approach has promising applications for highlighting co-culture combinations for in vitro testing in various settings, to lessen the experimental burden and perform more targeted screenings.


Subjects
Water Purification , Bayes Theorem , Biodegradation, Environmental , Coculture Techniques , Computer Simulation
15.
Nucleic Acids Res ; 45(7): e51, 2017 04 20.
Article in English | MEDLINE | ID: mdl-27986855

ABSTRACT

In microRNA (miRNA) target prediction, typically two levels of information need to be modeled: the number of potential miRNA binding sites present in a target mRNA and the genomic context of each individual site. Single model structures insufficiently cope with this complex training data structure, consisting of feature vectors of unequal length as a consequence of the varying number of miRNA binding sites in different mRNAs. To circumvent this problem, we developed a two-layered, stacked model, in which the influence of binding site context is separately modeled. Using logistic regression and random forests, we applied the stacked model approach to a unique data set of 7990 probed miRNA-mRNA interactions, hereby including the largest number of miRNAs in model training to date. Compared to lower-complexity models, a particular stacked model, named miSTAR (miRNA stacked model target prediction; www.mi-star.org), displays a higher general performance and precision on top scoring predictions. More importantly, our model outperforms published and widely used miRNA target prediction algorithms. Finally, we highlight flaws in cross-validation schemes for evaluation of miRNA target prediction models and adopt a more fair and stringent approach.


Subjects
3' Untranslated Regions , MicroRNAs/metabolism , Models, Genetic , Algorithms , Binding Sites , Humans , Machine Learning , RNA, Messenger/metabolism , Software
16.
Cytometry A ; 93(2): 201-212, 2018 02.
Article in English | MEDLINE | ID: mdl-29266796

ABSTRACT

The analysis of microbial populations is fundamental, not only for developing a deeper understanding of microbial communities but also for their engineering in biotechnological applications. Many methods have been developed to study their characteristics and over the last few decades, molecular analysis tools, such as DNA sequencing, have been used with considerable success to identify the composition of microbial populations. Recently, flow cytometric fingerprinting has emerged as a promising and powerful method to analyze bacterial populations. So far, these methods have primarily been used to observe shifts in the composition of microbial communities of natural samples. In this article, we apply a flow cytometric fingerprinting method to discriminate among 29 Lactobacillus strains. Our results indicate that it is possible to discriminate among 27 Lactobacillus strains by staining with SYBR green I and that the discriminatory power can be increased by combined SYBR green I and propidium iodide staining. Furthermore, we illustrate the impact of physiological changes on the fingerprinting method by demonstrating how flow cytometric fingerprinting is able to discriminate the different growth phases of a microbial culture. The sensitivity of the method is assessed by its ability to detect changes in the relative abundance of a mix of polystyrene beads down to 1.2%. When a mix of bacteria was used, the sensitivity was between 1.2% and 5%. The presented data demonstrate that flow cytometric fingerprinting is a sensitive and reproducible technique with the potential to be applied as a method for the dereplication of bacterial isolates. © 2017 International Society for Advancement of Cytometry.


Subjects
DNA Fingerprinting/methods , Flow Cytometry/methods , Lactobacillus/genetics , Microbiota/genetics
17.
Neural Comput ; 30(8): 2245-2283, 2018 08.
Article in English | MEDLINE | ID: mdl-29894652

ABSTRACT

Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
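Note on entry 17: the closed-form efficiency of Kronecker kernel ridge regression can be illustrated with a NumPy sketch. It assumes the squared-loss dual system (G ⊗ K + λI) vec(A) = vec(Y) with column-major vec, and solves it through the eigendecompositions of the two small kernel matrices instead of forming the Kronecker product. The names, the linear kernels and the random data are illustrative assumptions, not code from the paper.

```python
import numpy as np


def kronecker_krr_fit(K, G, Y, lam=1.0):
    """Dual parameters A with (G kron K + lam I) vec(A) = vec(Y).

    Uses K = U diag(sigma) U^T and G = V diag(s) V^T, so that the large
    system becomes an element-wise division in the joint eigenbasis.
    """
    sigma, U = np.linalg.eigh(K)
    s, V = np.linalg.eigh(G)
    projected = U.T @ Y @ V                            # labels in the eigenbasis
    filtered = projected / (np.outer(sigma, s) + lam)  # spectral filter
    return U @ filtered @ V.T


# Toy data with linear kernels (purely illustrative).
rng = np.random.default_rng(1)
X_u = rng.normal(size=(5, 3))
X_v = rng.normal(size=(4, 2))
K = X_u @ X_u.T
G = X_v @ X_v.T
Y = rng.normal(size=(5, 4))
lam = 1.0

A_fast = kronecker_krr_fit(K, G, Y, lam)

# Naive reference: build the full (n*m) x (n*m) system explicitly.
n, m = Y.shape
A_naive = np.linalg.solve(
    np.kron(G, K) + lam * np.eye(n * m), Y.flatten(order="F")
).reshape((n, m), order="F")
```

The two solutions agree up to numerical precision, but the eigendecomposition route only ever factorises n x n and m x m matrices.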

18.
Chaos ; 28(12): 123124, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30599525

ABSTRACT

Theoretical and experimental research studies have shown that ecosystems governed by non-transitive competition networks tend to maintain high levels of biodiversity. The theoretical body of work, however, has mainly focused on competition networks in which the outcomes of competition events are predetermined and hence deterministic, and where all species are identical up to their competitive relationships, an assumption that may limit the applicability of theoretical results to real-life situations. In this paper, we aim to probe the robustness of the link between biodiversity and non-transitive competition by introducing a three-dimensional winning probability parameter space, making the outcomes of competition events in a three-species in silico ecosystem uncertain. While two degenerate points in this parameter space have been the subject of previous studies, we investigate the remaining settings, which equip the species with distinct competitive abilities. We find that the impact of this modification depends on the spatial dimension of the system. When the system is well mixed, it collapses to monoculture, as is also the case in the non-transitive deterministic setting. In one dimension, chaotic patterns emerge, which tend to maintain biodiversity, and a power law relates the time that species manage to coexist to the degree of uncertainty regarding competition event outcomes. In two dimensions, the formation of spiral wave patterns ensures that biodiversity is maintained for moderate degrees of uncertainty, while considerable deviations from the non-transitive deterministic setting have strong negative effects on species coexistence. It can hence be concluded that non-transitive competition can still produce coexistence when the assumption of deterministic competition is abandoned. 
When the system collapses to monoculture, one observes a "survival of the strongest" law, as the species that has the highest probability of defeating its competitors has the best odds to become the sole survivor.

19.
Chaos ; 26(12): 123121, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28039986

ABSTRACT

Biodiversity has a critical impact on ecosystem functionality and stability, and thus the current biodiversity crisis has motivated many studies of the mechanisms that sustain biodiversity, a notable example being non-transitive or cyclic competition. We therefore extend existing microscopic models of communities with cyclic competition by incorporating resource dependence in demographic processes, characteristics of natural systems often oversimplified or overlooked by modellers. The spatially explicit nature of our individual-based model of three interacting species results in the formation of stable spatial structures, which have significant effects on community functioning, in agreement with experimental observations of pattern formation in microbial communities.


Subjects
Population Dynamics , Biodiversity , Computer Simulation , Ecosystem , Models, Biological
20.
J Proteome Res ; 14(4): 1792-8, 2015 Apr 03.
Article in English | MEDLINE | ID: mdl-25714903

ABSTRACT

A growing number of proteogenomics and metaproteomics studies indicate potential limitations of the application of the "decoy" database paradigm used to separate correct peptide identifications from incorrect ones in traditional shotgun proteomics. We therefore propose a binary classifier called Nokoi that allows fast yet reliable decoy-free separation of correct from incorrect peptide-to-spectrum matches (PSMs). Nokoi was trained on a very large collection of heterogeneous data using ranks supplied by the Mascot search engine to label correct and incorrect PSMs. We show that Nokoi outperforms Mascot and achieves a performance very close to that of Percolator at substantially higher processing speeds.


Subjects
Algorithms , Peptides/isolation & purification , Proteomics/methods , Software , Databases, Protein , Logistic Models , Machine Learning