Results 1 - 20 of 36
1.
PLoS Comput Biol ; 20(8): e1011831, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39102416

ABSTRACT

Bacteriophages (phages) are viruses that infect bacteria. Many of them produce specific enzymes called depolymerases to break down external polysaccharide structures. Accurate annotation and domain identification of these depolymerases are challenging due to their inherent sequence diversity. Hence, we present DepoScope, a machine learning tool that combines a fine-tuned ESM-2 model with a convolutional neural network to identify depolymerase sequences and their enzymatic domains precisely. To accomplish this, we curated a dataset from the INPHARED phage genome database, created a polysaccharide-degrading domain database, and applied sequential filters to construct a high-quality dataset, which is subsequently used to train DepoScope. Our work is the first approach that combines sequence-level predictions with amino-acid-level predictions for accurate depolymerase detection and functional domain identification. In that way, we believe that DepoScope can greatly enhance our understanding of phage-host interactions at the level of depolymerases.


Subject(s)
Bacteriophages , Computational Biology , Bacteriophages/genetics , Bacteriophages/enzymology , Computational Biology/methods , Molecular Sequence Annotation , Viral Proteins/genetics , Viral Proteins/metabolism , Viral Proteins/chemistry , Neural Networks, Computer , Machine Learning , Software , Protein Domains , Genome, Viral/genetics , Carboxylic Ester Hydrolases/genetics , Carboxylic Ester Hydrolases/metabolism , Carboxylic Ester Hydrolases/chemistry
2.
Vet Res ; 55(1): 15, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38317242

ABSTRACT

This study investigated the role of causative infectious agents in ulceration of the non-glandular part of the porcine stomach (pars oesophagea). In total, 150 stomachs from slaughter pigs were included: 75 from pigs that received a meal feed and 75 from pigs that received an equivalent pelleted feed with a smaller particle size. The pars oesophagea was macroscopically examined after slaughter. (q)PCR assays for H. suis, F. gastrosuis and H. pylori-like organisms were performed, as well as 16S rRNA sequencing for pars oesophagea microbiome analyses. All 150 pig stomachs showed lesions. F. gastrosuis was detected in 115 cases (77%) and H. suis in 117 cases (78%), with 92 cases (61%) of co-infection; H. pylori-like organisms were detected in one case. Higher infectious loads of H. suis increased the odds of severe gastric lesions (OR = 1.14, p = 0.038), while the presence of H. suis infection in the pyloric gland zone increased the probability of pars oesophageal erosions [16.4% (95% CI 0.6 to 32.2%)]. The causal effect of H. suis was mediated by decreased pars oesophageal microbiome diversity [-1.9% (95% CI -5.0 to 1.2%)], increased abundances of Veillonella and Campylobacter spp., and decreased abundances of Lactobacillus, Escherichia-Shigella, and Enterobacteriaceae spp. Higher infectious loads of F. gastrosuis in the pars oesophagea decreased the odds of severe gastric lesions (OR = 0.8, p = 0.0014). Feed pelleting had no significant impact on the prevalence of severe gastric lesions (OR = 1.72, p = 0.28). H. suis infections are a risk factor for ulceration of the porcine pars oesophagea, probably mediated through alterations in pars oesophageal microbiome diversity and composition.


Subject(s)
Fusobacterium , Helicobacter Infections , Helicobacter heilmannii , Microbiota , Stomach Ulcer , Swine Diseases , Animals , Swine , Stomach Ulcer/microbiology , Stomach Ulcer/pathology , Stomach Ulcer/veterinary , RNA, Ribosomal, 16S , Swine Diseases/microbiology , Helicobacter Infections/veterinary , Helicobacter Infections/microbiology , Gastric Mucosa
3.
Bioinformatics ; 38(4): 1144-1145, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34788379

ABSTRACT

SUMMARY: In combinatorial biotechnology, it is crucial for screening experiments to sufficiently cover the design space. In the BioCCP.jl package (Julia), we provide functions for minimum sample size determination based on the mathematical framework coined the Coupon Collector Problem. AVAILABILITY AND IMPLEMENTATION: BioCCP.jl, including source code, documentation and Pluto notebooks, is available at https://github.com/kirstvh/BioCCP.jl.
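The sample-size question behind the Coupon Collector Problem can be sketched in a few lines. The Python below is only an illustration, not the BioCCP.jl API (which is written in Julia): the function names are hypothetical, and it assumes every module in the design space is equally likely to be drawn, which a real library may not satisfy.

```python
from fractions import Fraction
from math import comb


def expected_samples(n_modules: int) -> float:
    """Expected number of uniform draws needed to see every module at least once.

    Classic result: n * H_n, where H_n is the n-th harmonic number.
    """
    if n_modules < 1:
        raise ValueError("n_modules must be at least 1")
    harmonic = sum(Fraction(1, k) for k in range(1, n_modules + 1))
    return float(n_modules * harmonic)


def coverage_probability(n_modules: int, n_samples: int) -> float:
    """Probability that n_samples uniform draws cover all modules (inclusion-exclusion)."""
    return sum(
        (-1) ** j * comb(n_modules, j) * (1 - j / n_modules) ** n_samples
        for j in range(n_modules + 1)
    )


def minimum_sample_size(n_modules: int, target: float = 0.95) -> int:
    """Smallest number of draws whose coverage probability reaches the target."""
    n_samples = n_modules  # fewer draws than modules can never cover the space
    while coverage_probability(n_modules, n_samples) < target:
        n_samples += 1
    return n_samples
```

The alternating sum in `coverage_probability` is fine for small design spaces; for large ones a numerically stabler formulation would be needed.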


Subject(s)
Documentation , Software , Bees , Animals
4.
Brief Bioinform ; 21(1): 262-271, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30329015

ABSTRACT

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models. The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package: https://github.com/aatapa/RLScore.

5.
Clin Chem ; 68(7): 906-916, 2022 07 03.
Article in English | MEDLINE | ID: mdl-35266984

ABSTRACT

BACKGROUND: Synthetic cannabinoid receptor agonists (SCRAs) are amongst the largest groups of new psychoactive substances (NPS). Their often high activity at the CB1 cannabinoid receptor frequently results in intoxication, imposing serious health risks. Hence, continuous monitoring of these compounds is important, but challenged by the rapid emergence of novel analogues that are missed by traditional targeted detection strategies. We addressed this need by performing an activity-based, universal screening on a large set (n = 968) of serum samples from patients presenting to the emergency department with acute recreational drug or NPS toxicity. METHODS: We assessed the performance of an activity-based method in detecting newly circulating SCRAs compared with liquid chromatography coupled to high-resolution mass spectrometry. Additionally, we developed and evaluated machine learning models to reduce the screening workload by automating interpretation of the activity-based screening output. RESULTS: Activity-based screening delivered outstanding performance, with a sensitivity of 94.6% and a specificity of 98.5%. Furthermore, the developed machine learning models allowed accurate distinction between positive and negative patient samples in an automatic manner, closely matching the manual scoring of samples. The performance of the model depended on the predefined threshold, e.g., at a threshold of 0.055, sensitivity and specificity were both 94.0%. CONCLUSION: The activity-based bioassay is an ideal candidate for untargeted screening of novel SCRAs. The combination of this universal screening assay and a machine learning approach for automated sample scoring is a promising complement to conventional analytical methods in clinical practice.
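How sensitivity and specificity depend on a predefined threshold can be illustrated with a minimal sketch. The function name and the example scores are hypothetical; the abstract does not describe the model's score scale, so the threshold of 0.055 is reused here only as an illustrative value.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive.

    scores: predicted probabilities or scores, one per sample.
    labels: true classes, 1 for positive and 0 for negative.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label and predicted_positive:
            tp += 1
        elif label:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores for six samples (three negatives followed by three positives).
example_scores = [0.01, 0.03, 0.06, 0.20, 0.90, 0.04]
example_labels = [0, 0, 0, 1, 1, 1]
sens, spec = sensitivity_specificity(example_scores, example_labels, threshold=0.055)
```

Lowering the threshold trades specificity for sensitivity, which is why a single operating point has to be fixed before the two figures can be reported.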


Subject(s)
Cannabinoids , Illicit Drugs , Cannabinoid Receptor Agonists/pharmacology , Chromatography, Liquid/methods , Humans , Machine Learning
6.
Limnol Oceanogr ; 67(8): 1647-1669, 2022 Aug.
Article in English | MEDLINE | ID: mdl-36247386

ABSTRACT

Plankton imaging systems supported by automated classification and analysis have improved ecologists' ability to observe aquatic ecosystems. Today, we are on the cusp of reliably tracking plankton populations with a suite of lab-based and in situ tools, collecting imaging data at unprecedentedly fine spatial and temporal scales. But these data have potential well beyond examining the abundances of different taxa; the individual images themselves contain a wealth of information on functional traits. Here, we outline traits that could be measured from image data, suggest machine learning and computer vision approaches to extract functional trait information from the images, and discuss promising avenues for novel studies. The approaches we discuss are data agnostic and are broadly applicable to imagery of other aquatic or terrestrial organisms.

7.
Entropy (Basel) ; 23(6)2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34199402

ABSTRACT

Shannon's entropy measure is a popular means for quantifying ecological diversity. We explore how one can use information-theoretic measures (that are often called indices in ecology) on joint ensembles to study the diversity of species interaction networks. We leverage the little-known balance equation to decompose the network information into three components describing the species abundance, specificity, and redundancy. This balance reveals that there exists a fundamental trade-off between these components. The decomposition can be straightforwardly extended to analyse networks through time as well as space, leading to the corresponding notions for alpha, beta, and gamma diversity. Our work aims to provide an accessible introduction for ecologists. To this end, we illustrate the interpretation of the components on numerous real networks. The corresponding code is made available to the community in the specialised Julia package EcologicalNetworks.jl.
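The information-theoretic quantities on a joint ensemble can be sketched as follows. This is not the EcologicalNetworks.jl API, and it does not reproduce the abstract's three-component balance equation, which is not spelled out here. It only computes the standard marginal entropies, joint entropy, and mutual information of a normalised interaction matrix, using the identity I(X;Y) = H(X) + H(Y) - H(X,Y). All names are hypothetical.

```python
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def joint_measures(interactions):
    """Entropy measures for a non-negative interaction matrix (rows x columns).

    The matrix is normalised to a joint distribution over (row species,
    column species) pairs before the measures are computed.
    """
    total = sum(sum(row) for row in interactions)
    if total <= 0:
        raise ValueError("interaction matrix must contain at least one interaction")
    joint = [[value / total for value in row] for row in interactions]
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    h_rows = entropy(row_marginal)
    h_cols = entropy(col_marginal)
    h_joint = entropy(p for row in joint for p in row)
    return {
        "H_rows": h_rows,
        "H_cols": h_cols,
        "H_joint": h_joint,
        "mutual_information": h_rows + h_cols - h_joint,
    }


specialised = joint_measures([[1, 0], [0, 1]])  # each species has one partner
generalist = joint_measures([[1, 1], [1, 1]])   # every pair interacts equally
```

In the fully specialised network the mutual information equals the marginal entropy, while in the fully generalist network it drops to zero, which is the intuition behind reading mutual information as a specificity measure.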

8.
Sensors (Basel) ; 20(11)2020 May 28.
Article in English | MEDLINE | ID: mdl-32481619

ABSTRACT

The study of the dynamic responses of plants to short-term environmental changes is becoming increasingly important in basic plant science, phenotyping, breeding, crop management, and modelling. These short-term variations are crucial in plant adaptation to new environments and, consequently, in plant fitness and productivity. Scalable, versatile, accurate, and low-cost data-logging solutions are necessary to advance these fields and complement existing sensing platforms such as high-throughput phenotyping. However, current data-logging and sensing platforms do not meet the requirements to monitor these responses. Therefore, a new modular data-logging platform was designed, named Gloxinia. Different sensor boards are interconnected depending upon the needs, with the potential to scale to hundreds of sensors in a distributed sensor system. To demonstrate the architecture, two sensor boards were designed: one for single-ended measurements and one for lock-in-amplifier-based measurements, named Sylvatica and Planalta, respectively. To evaluate the performance of the system in small setups, a small-scale trial was conducted in a growth chamber. Expected plant dynamics were successfully captured, indicating proper operation of the system. Though a large-scale trial was not performed, we expect the system to scale very well to larger setups. Additionally, the platform is open-source, enabling other users to easily build upon our work and perform application-specific optimisations.


Subject(s)
Plant Breeding , Plant Physiological Phenomena , Plants , Software
9.
Environ Sci Technol ; 53(24): 14459-14469, 2019 12 17.
Article in English | MEDLINE | ID: mdl-31682110

ABSTRACT

Many disciplines rely on testing combinations of compounds, materials, proteins, or bacterial species to drive scientific discovery. It is time-consuming and expensive to determine experimentally, via trial-and-error or random selection approaches, which of the many possible combinations will lead to desirable outcomes. Hence, there is a pressing need for more rational and efficient experimental design approaches to reduce experimental effort. In this work, we demonstrate the potential of machine learning methods for the in silico selection of promising co-culture combinations in the application of bioaugmentation. We use the example of pollutant removal in drinking water treatment plants, which can be achieved using co-cultures of a specialized pollutant degrader with combinations of bacterial isolates. To reduce the experimental effort needed to discover high-performing combinations, we propose a data-driven experimental design. Based on a dataset of mineralization performance for all pairs of 13 bacterial species co-cultured with MSH1, we built a Gaussian process regression model to predict the Gompertz mineralization parameters of the co-cultures of two and three species, based on the single-strain parameters. We subsequently used this model in a Bayesian optimization scheme to suggest potentially high-performing combinations of bacteria. We achieved good performance with this approach, both for predicting mineralization parameters and for selecting effective co-cultures, despite the limited dataset. As a novel application of Bayesian optimization in bioremediation, this experimental design approach has promising applications for highlighting co-culture combinations for in vitro testing in various settings, to lessen the experimental burden and perform more targeted screenings.


Subject(s)
Water Purification , Bayes Theorem , Biodegradation, Environmental , Coculture Techniques , Computer Simulation
10.
Nucleic Acids Res ; 45(7): e51, 2017 04 20.
Article in English | MEDLINE | ID: mdl-27986855

ABSTRACT

In microRNA (miRNA) target prediction, typically two levels of information need to be modeled: the number of potential miRNA binding sites present in a target mRNA and the genomic context of each individual site. Single model structures cope poorly with this complex training data structure, which consists of feature vectors of unequal length as a consequence of the varying number of miRNA binding sites in different mRNAs. To circumvent this problem, we developed a two-layered, stacked model, in which the influence of binding site context is separately modeled. Using logistic regression and random forests, we applied the stacked model approach to a unique data set of 7990 probed miRNA-mRNA interactions, thereby including the largest number of miRNAs in model training to date. Compared to lower-complexity models, a particular stacked model, named miSTAR (miRNA stacked model target prediction; www.mi-star.org), displays a higher general performance and precision on top-scoring predictions. More importantly, our model outperforms published and widely used miRNA target prediction algorithms. Finally, we highlight flaws in cross-validation schemes for evaluation of miRNA target prediction models and adopt a fairer and more stringent approach.


Subject(s)
3' Untranslated Regions , MicroRNAs/metabolism , Models, Genetic , Algorithms , Binding Sites , Humans , Machine Learning , RNA, Messenger/metabolism , Software
11.
Neural Comput ; 30(8): 2245-2283, 2018 08.
Article in English | MEDLINE | ID: mdl-29894652

ABSTRACT

Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.

12.
J Proteome Res ; 14(4): 1792-8, 2015 Apr 03.
Article in English | MEDLINE | ID: mdl-25714903

ABSTRACT

A growing number of proteogenomics and metaproteomics studies indicate potential limitations of the application of the "decoy" database paradigm used to separate correct peptide identifications from incorrect ones in traditional shotgun proteomics. We therefore propose a binary classifier called Nokoi that allows fast yet reliable decoy-free separation of correct from incorrect peptide-to-spectrum matches (PSMs). Nokoi was trained on a very large collection of heterogeneous data using ranks supplied by the Mascot search engine to label correct and incorrect PSMs. We show that Nokoi outperforms Mascot and achieves a performance very close to that of Percolator at substantially higher processing speeds.


Subject(s)
Algorithms , Peptides/isolation & purification , Proteomics/methods , Software , Databases, Protein , Logistic Models , Machine Learning
13.
Sci Adv ; 10(3): eadi3621, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38241375

ABSTRACT

Design in synthetic biology is typically goal oriented, aiming to repurpose or optimize existing biological functions, augmenting biology with new-to-nature capabilities, or creating life-like systems from scratch. While the field has seen many advances, bottlenecks in the complexity of the systems built are emerging and designs that function in the lab often fail when used in real-world contexts. Here, we propose an open-ended approach to biological design, with the novelty of designed biology being at least as important as how well it fulfils its goal. Rather than solely focusing on optimization toward a single best design, designing with novelty in mind may allow us to move beyond the diminishing returns we see in performance for most engineered biology. Research from the artificial life community has demonstrated that embracing novelty can automatically generate innovative and unexpected solutions to challenging problems beyond local optima. Synthetic biology offers the ideal playground to explore more creative approaches to biological design.


Subject(s)
Biological Evolution , Synthetic Biology , Longitudinal Studies
14.
Nat Commun ; 15(1): 3640, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38684714

ABSTRACT

Careful consideration of how we approach design is crucial to all areas of biotechnology. However, choosing or developing an effective design methodology is not always easy as biology, unlike most areas of engineering, is able to adapt and evolve. Here, we put forward that design and evolution follow a similar cyclic process and therefore all design methods, including traditional design, directed evolution, and even random trial and error, exist within an evolutionary design spectrum. This contrasts with conventional views that often place these methods at odds and provides a valuable framework for unifying engineering approaches for challenging biological design problems.


Subject(s)
Directed Molecular Evolution , Bioengineering/methods , Biological Evolution , Biotechnology/methods , Directed Molecular Evolution/methods , Synthetic Biology/methods
15.
Nat Commun ; 15(1): 4355, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38778023

ABSTRACT

Phages are increasingly considered promising alternatives to target drug-resistant bacterial pathogens. However, their often-narrow host range can make it challenging to find matching phages against bacteria of interest. Current computational tools do not accurately predict interactions at the strain level in a way that is relevant and properly evaluated for practical use. We present PhageHostLearn, a machine learning system that predicts strain-level interactions between receptor-binding proteins and bacterial receptors for Klebsiella phage-bacteria pairs. We evaluate this system both in silico and in the laboratory, in the clinically relevant setting of finding matching phages against bacterial strains. PhageHostLearn reaches a cross-validated ROC AUC of up to 81.8% in silico and maintains this performance in laboratory validation. Our approach provides a framework for developing and evaluating phage-host prediction methods that are useful in practice, which we believe to be a meaningful contribution to the machine-learning-guided development of phage therapeutics and diagnostics.
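For readers unfamiliar with the ROC AUC figure quoted above, the metric can be computed directly from its pairwise definition: the probability that a randomly chosen interacting pair scores higher than a randomly chosen non-interacting pair. The sketch below is not PhageHostLearn's evaluation code; the function name and example values are hypothetical, and the cross-validation scheme that produces the held-out scores is not shown.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    The quadratic loop is fine for small examples; large data sets would
    normally use a rank-based formula instead.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up scores for four phage-bacterium pairs (1 = interaction observed).
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value such as 81.8% should be read on that scale.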


Subject(s)
Bacteriophages , Host Specificity , Klebsiella , Machine Learning , Bacteriophages/physiology , Klebsiella/virology , Computer Simulation
16.
PLOS Digit Health ; 3(7): e0000533, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39052668

ABSTRACT

BACKGROUND: Disability progression is a key milestone in the disease evolution of people with multiple sclerosis (PwMS). Prediction models of the probability of disability progression have not yet reached the level of trust needed to be adopted in the clinic. A common benchmark to assess model development in multiple sclerosis is also currently lacking. METHODS: Data of adult PwMS with a follow-up of at least three years from 146 MS centers, spread over 40 countries and collected by the MSBase consortium, were used. With basic inclusion criteria for quality requirements, this represents a total of 15,240 PwMS. External validation was performed and repeated five times to assess the significance of the results. Transparent Reporting for Individual Prognosis Or Diagnosis (TRIPOD) guidelines were followed. Confirmed disability progression after two years was predicted, with a confirmation window of six months. Only routinely collected variables were used, such as the expanded disability status scale, treatment, relapse information, and MS course. To learn the probability of disability progression, state-of-the-art machine learning models were investigated. The discrimination performance of the models is evaluated with the area under the receiver operating characteristic curve (ROC-AUC) and under the precision-recall curve (AUC-PR), and their calibration via the Brier score and the expected calibration error. All our preprocessing and model code is available at https://gitlab.com/edebrouwer/ms_benchmark, making this task an ideal benchmark for predicting disability progression in MS. FINDINGS: Machine learning models achieved a ROC-AUC of 0.71 ± 0.01, an AUC-PR of 0.26 ± 0.02, a Brier score of 0.1 ± 0.01 and an expected calibration error of 0.07 ± 0.04. The history of disability progression was identified as being more predictive for future disability progression than the treatment or relapse history.
CONCLUSIONS: Good discrimination and calibration performance on an external validation set is achieved, using only routinely collected variables. This suggests machine-learning models can reliably inform clinicians about the future occurrence of progression and are mature for a clinical impact study.
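The two calibration measures named in the methods, the Brier score and the expected calibration error (ECE), can be sketched as below. This is not taken from the linked repository: the function names are hypothetical, and the ECE binning (ten equal-width probability bins) is an assumption, since the abstract does not state which scheme was used.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if not probabilities:
        raise ValueError("need at least one prediction")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(probabilities)


def expected_calibration_error(probabilities, outcomes, n_bins=10):
    """Weighted mean gap between average confidence and observed event rate per bin.

    Assumes equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    """
    if not probabilities:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    n_total = len(probabilities)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        mean_confidence = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / n_total * abs(mean_confidence - observed_rate)
    return ece
```

A model that always predicts 0.8 for patients who never progress would have an ECE of 0.8 and a Brier score of 0.64, whereas perfectly confident, correct predictions give 0 for both.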

17.
Front Plant Sci ; 14: 1187573, 2023.
Article in English | MEDLINE | ID: mdl-37588419

ABSTRACT

Electrochemical impedance spectroscopy has emerged over the past decade as an efficient, non-destructive method to investigate various (eco-)physiological and morphological properties of plants. This work reviews the state-of-the-art of impedance spectra modeling for plant applications. In addition to covering the traditional, widely-used representations of electrochemical impedance spectra, we also consider the more recent machine-learning-based approaches.

18.
Front Plant Sci ; 14: 1299208, 2023.
Article in English | MEDLINE | ID: mdl-38293629

ABSTRACT

Historically, plant and crop sciences have been quantitative fields that intensively use measurements and modeling. Traditionally, researchers choose between two dominant modeling approaches: mechanistic plant growth models or data-driven, statistical methodologies. At the intersection of both paradigms, a novel approach referred to as "simulation intelligence", has emerged as a powerful tool for comprehending and controlling complex systems, including plants and crops. This work explores the transformative potential for the plant science community of the nine simulation intelligence motifs, from understanding molecular plant processes to optimizing greenhouse control. Many of these concepts, such as surrogate models and agent-based modeling, have gained prominence in plant and crop sciences. In contrast, some motifs, such as open-ended optimization or program synthesis, still need to be explored further. The motifs of simulation intelligence can potentially revolutionize breeding and precision farming towards more sustainable food production.

19.
Sci Rep ; 12(1): 12594, 2022 07 22.
Article in English | MEDLINE | ID: mdl-35869238

ABSTRACT

Plants are complex organisms subject to variable environmental conditions, which influence their physiology and phenotype dynamically. We propose to interpret plants as reservoirs in physical reservoir computing. The physical reservoir computing paradigm originates from computer science; instead of relying on Boolean circuits to perform computations, any substrate that exhibits complex non-linear and temporal dynamics can serve as a computing element. Here, we present the first application of physical reservoir computing with plants. In addition to investigating classical benchmark tasks, we show that Fragaria × ananassa (strawberry) plants can solve environmental and eco-physiological tasks using only eight leaf thickness sensors. Although the results indicate that plants are not suitable for general-purpose computation, they are well suited to eco-physiological tasks such as photosynthetic rate and transpiration rate. Having the means to investigate the information processing by plants improves quantification and understanding of integrative plant responses to dynamic changes in their environment. This first demonstration of physical reservoir computing with plants is key for transitioning towards a holistic view of phenotyping and early stress detection in precision agriculture applications, since physical reservoir computing enables us to analyse plant responses in a general way: environmental changes are processed by plants to optimise their phenotype.


Subject(s)
Fragaria , Agriculture , Fragaria/physiology , Photosynthesis , Plant Leaves
20.
Viruses ; 14(6)2022 06 17.
Article in English | MEDLINE | ID: mdl-35746800

ABSTRACT

Receptor-binding proteins (RBPs) of bacteriophages initiate the infection of their corresponding bacterial host and act as the primary determinant for host specificity. The ever-increasing amount of sequence data enables the development of predictive models for the automated identification of RBP sequences. However, the development of such models is challenged by the inconsistent or missing annotation of many phage proteins. Recently developed tools have started to bridge this gap but are not specifically focused on RBP sequences, for which many different annotations are available. We have developed two parallel approaches to alleviate the complex identification of RBP sequences in phage genomic data. The first combines known RBP-related hidden Markov models (HMMs) from the Pfam database with custom-built HMMs to identify phage RBPs based on protein domains. The second approach consists of training an extreme gradient boosting classifier that can accurately discriminate between RBPs and other phage proteins. We explained how these complementary approaches can reinforce each other in identifying RBP sequences. In addition, we benchmarked our methods against the recently developed PhANNs tool. Our best performing model reached a precision-recall area-under-the-curve of 93.8% and outperformed PhANNs on an independent test set, reaching an F1-score of 84.0% compared to 69.8%.
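The F1-score used for the comparison on the independent test set is the harmonic mean of precision and recall. A minimal sketch follows, with a hypothetical function name and made-up labels; it is not the authors' benchmarking code.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for binary labels (1 = RBP, 0 = other protein).

    Undefined ratios (no predicted or no actual positives) are reported as 0.0.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Five made-up proteins: two true positives, one false positive, one false negative.
precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

Because F1 ignores true negatives, it is a reasonable summary when RBPs are a small minority of all phage proteins, which is presumably why it is reported alongside the precision-recall curve.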


Subject(s)
Bacteriophage Receptors , Bacteriophages , Bacteriophages/genetics , Bacteriophages/metabolism , Carrier Proteins/metabolism , Protein Binding , Proteins/metabolism