1.
Bioinformatics ; 37(10): 1360-1366, 2021 Jun 16.
Article in English | MEDLINE | ID: mdl-33444437

ABSTRACT

MOTIVATION: Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes) but for an unknown number of individuals and haplotypes. RESULTS: The problem of single individual haplotyping was first formalized by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of 'haplotyping' metagenomic samples, with a new formalization of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalizes the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm. AVAILABILITY AND IMPLEMENTATION: Our reference implementation of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) is open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
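The idea of a pairwise SNV co-occurrence matrix walked by a greedy traversal can be illustrated with a minimal Python sketch. It is not the Hansel or Gretel code. The read encoding (a dict from SNV position to observed base), the function names and the scoring rule (sum of pairwise counts with every base already on the path) are hypothetical simplifications chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

# Hypothetical read encoding: SNV position -> base observed at that position.
Read = Dict[int, str]


def build_pairwise_matrix(reads: List[Read]) -> Counter:
    """Count (i, a, j, b): base a at SNV i co-occurs with base b at SNV j (i < j) on one read."""
    counts: Counter = Counter()
    for read in reads:
        for i, j in combinations(sorted(read), 2):
            counts[(i, read[i], j, read[j])] += 1
    return counts


def start_score(counts: Counter, pos: int, base: str) -> int:
    """Evidence for `base` at the first SNV: all recorded pairs that begin with it."""
    return sum(c for (i, a, _, _), c in counts.items() if i == pos and a == base)


def pair_score(counts: Counter, path: Dict[int, str], pos: int, base: str) -> int:
    """Evidence for `base` at `pos`: co-occurrences with every base already on the path."""
    return sum(counts[(i, a, pos, base)] for i, a in path.items())


def greedy_haplotype(counts: Counter, positions: List[int], alphabet: str = "ACGT") -> str:
    """Walk the SNVs left to right, taking the best-supported base (ties: alphabet order)."""
    path: Dict[int, str] = {}
    for pos in positions:
        if not path:
            path[pos] = max(alphabet, key=lambda b: start_score(counts, pos, b))
        else:
            path[pos] = max(alphabet, key=lambda b: pair_score(counts, path, pos, b))
    return "".join(path[p] for p in positions)


if __name__ == "__main__":
    # Toy reads drawn from two haplotypes, ACG and TGA, over SNV positions 1..3.
    reads = [{1: "A", 2: "C"}, {2: "C", 3: "G"}, {1: "A", 2: "C", 3: "G"},
             {1: "T", 2: "G"}, {2: "G", 3: "A"}]
    print(greedy_haplotype(build_pairwise_matrix(reads), [1, 2, 3]))  # prints ACG
```

The greedy walk follows only the better-supported haplotype. Recovering further haplotypes would need repeated traversals, for example after down-weighting the evidence consumed by each recovered path; that step is deliberately left out of this sketch.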

2.
Expert Rev Proteomics ; 13(5): 495-511, 2016 05.
Article in English | MEDLINE | ID: mdl-27031651

ABSTRACT

With the current expanded technical capabilities to perform mass spectrometry-based biomedical proteomics experiments, an improved focus on the design of experiments is crucial. Because ignoring the importance of good design clearly leads to an unprecedented rate of false discoveries that would poison our results, more and more tools are being developed to help researchers design proteomic experiments. In this review, we apply statistical thinking to go through the entire proteomics workflow for biomarker discovery and validation, and relate the considerations that should be made at the level of hypothesis building, technology selection, experimental design and the optimization of experimental parameters.


Subject(s)
Mass Spectrometry/methods, Proteomics/methods, Research Design, Humans, Proteomics/statistics & numerical data, Proteomics/trends
3.
Proteomics ; 14(4-5): 353-66, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24323524

ABSTRACT

Machine learning is a subdiscipline of artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, provided that enough data are available to train, and subsequently evaluate, an algorithm. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present here an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.


Subject(s)
Artificial Intelligence, Computational Biology, Proteomics/methods, Reference Standards, Research Design
4.
Bioinformatics ; 29(15): 1913-4, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23709496

ABSTRACT

SUMMARY: We present PIUS, a tool that identifies peptides from tandem mass spectrometry data by analyzing the six-frame translation of a complete genome. It differs from earlier studies that have performed such a genomic search in two ways: (i) it considers a larger search space and (ii) it is designed for natural peptide identification rather than proteomics. Unlike other peptidomics tools designed for genome-wide searches, PIUS does not limit the analysis to a set of sequences that match a list of de novo reconstructions. AVAILABILITY: Source code, executables and a detailed technical report are freely available at http://dtai.cs.kuleuven.be/ml/systems/pius. CONTACT: eduardo.costa@cs.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Peptides/chemistry, Software, Tandem Mass Spectrometry, Algorithms, Animals, Cell Line, Databases, Protein, Genome, Genomics, Mice, Peptides/analysis, Proteomics/methods, Sequence Analysis, Protein
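The six-frame translation that PIUS searches can be sketched in a few lines of Python: the three forward reading frames plus the three frames of the reverse complement, each translated with the standard genetic code (NCBI translation table 1). This sketch only generates the search space; how PIUS matches spectra against it is not shown, and the function names are hypothetical.

```python
from itertools import product
from typing import List

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; '*' marks stops, 'X' marks codons with ambiguous bases."""
    return "".join(CODON_TABLE.get(seq[k:k + 3], "X") for k in range(0, len(seq) - 2, 3))


def six_frames(genome: str) -> List[str]:
    """Forward frames +1, +2, +3 followed by reverse-complement frames -1, -2, -3."""
    genome = genome.upper()
    rc = reverse_complement(genome)
    return [translate(genome[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]


if __name__ == "__main__":
    for frame in six_frames("ATGGCCTAA"):
        print(frame)
```

Stop codons appear as `*`, so each frame splits naturally into stretches that can be scanned for candidate peptides.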
5.
Mol Inform ; 42(3): e2200232, 2023 03.
Article in English | MEDLINE | ID: mdl-36529710

ABSTRACT

Maximum common substructures (MCS) have received much attention in the chemoinformatics community. They are typically used as a similarity measure between molecules: they show high predictive performance in classification tasks while remaining easily explainable substructures. In the present work, we applied the Pairwise Maximum Common Subgraph Feature Generation (PMCSFG) algorithm to automatically detect toxicophores (structural alerts) and to compute fingerprints based on MCS. We compare our MCS-based fingerprints with 12 well-known chemical fingerprints when used as features in machine learning models. We provide an experimental evaluation and discuss the usefulness of the different methods on mutagenicity data. The features generated by the MCS method achieve state-of-the-art performance when predicting mutagenicity, while being more interpretable than traditional chemical fingerprints.


Subject(s)
Algorithms, Mutagens, Mutagens/chemistry, Mutagenesis, Machine Learning
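The abstract does not spell out how the MCS-based fingerprints are built. A common construction, assumed here, is one bit per substructure, set to 1 when a molecule contains it. The Python sketch below takes the substructures as given (finding them, as PMCSFG does, is not shown), encodes molecules as hypothetical labeled graphs without bond orders or hydrogens, and tests containment with a simple backtracking subgraph match.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy encoding: (atom index -> element label, set of undirected bonds).
Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]


def _bonded(edges: Set[Tuple[int, int]], a: int, b: int) -> bool:
    return (a, b) in edges or (b, a) in edges


def contains(mol: Graph, pattern: Graph) -> bool:
    """True if `pattern` maps injectively into `mol`, preserving element labels and bonds."""
    m_atoms, m_edges = mol
    p_atoms, p_edges = pattern
    order = list(p_atoms)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(order):
            return True
        p = order[len(mapping)]
        used = set(mapping.values())
        for m, label in m_atoms.items():
            if m in used or label != p_atoms[p]:
                continue
            # Every pattern bond to an already-mapped atom must also exist in the molecule.
            if all(_bonded(m_edges, m, mapping[q]) for q in mapping if _bonded(p_edges, p, q)):
                mapping[p] = m
                if extend(mapping):
                    return True
                del mapping[p]  # backtrack and try the next candidate atom
        return False

    return extend({})


def fingerprint(mol: Graph, substructures: List[Graph]) -> List[int]:
    """One bit per substructure: 1 if the molecule contains it, else 0."""
    return [1 if contains(mol, s) else 0 for s in substructures]


if __name__ == "__main__":
    nitromethane = ({0: "C", 1: "N", 2: "O", 3: "O"}, {(0, 1), (1, 2), (1, 3)})
    nitro = ({0: "N", 1: "O", 2: "O"}, {(0, 1), (0, 2)})  # toy nitro-group pattern
    c_o = ({0: "C", 1: "O"}, {(0, 1)})
    print(fingerprint(nitromethane, [nitro, c_o]))  # prints [1, 0]
```

Because each bit names an explicit substructure, a model trained on such fingerprints can be inspected bit by bit, which is the interpretability advantage the abstract refers to.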
6.
Science ; 355(6327): 820-826, 2017 02 24.
Article in English | MEDLINE | ID: mdl-28219971

ABSTRACT

It is still not possible to predict whether a given molecule will have a perceived odor or what olfactory percept it will produce. We therefore organized the crowd-sourced DREAM Olfaction Prediction Challenge. Using a large olfactory psychophysical data set, teams developed machine-learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models accurately predicted odor intensity and pleasantness and also successfully predicted 8 of 19 rated semantic descriptors ("garlic," "fish," "sweet," "fruit," "burnt," "spices," "flower," and "sour"). Regularized linear models performed nearly as well as random forest-based ones, with a predictive accuracy that closely approaches a key theoretical limit. These models help to predict the perceptual qualities of virtually any molecule with high accuracy and also reverse-engineer the smell of a molecule.


Subject(s)
Odorants, Olfactory Perception, Smell, Adult, Datasets as Topic, Humans, Male, Models, Biological
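The abstract compares regularized linear models with random forests without saying which penalty was used. As one example of a regularized linear model, the Python sketch below fits ridge (L2-penalized) regression in closed form, w = (Z^T Z + lam * D)^-1 Z^T y, where Z is the descriptor matrix with a leading column of ones and D is the identity with its intercept entry zeroed. The descriptors and target (for example a pleasantness rating) are hypothetical, and this is not any challenge team's model.

```python
from typing import List


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X: List[List[float]], y: List[float], lam: float) -> List[float]:
    """Minimise ||y - Z w||^2 + lam * ||w[1:]||^2; returns [intercept, w_1, ..., w_p].

    The intercept (first entry) is not penalised.
    """
    Z = [[1.0] + [float(v) for v in row] for row in X]
    d = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j and i > 0 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    return _solve(A, b)


def ridge_predict(w: List[float], x: List[float]) -> float:
    """Linear prediction: intercept plus weighted descriptors."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

Increasing `lam` shrinks the descriptor weights toward zero, trading a little bias for lower variance when descriptors are numerous and correlated, which is the usual reason a penalized linear model can stay competitive with a random forest.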
7.
J R Soc Interface ; 12(104): 20141289, 2015 Mar 06.
Article in English | MEDLINE | ID: mdl-25652463

ABSTRACT

There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robot Scientist 'Eve' designed to make drug discovery more economical. A Robot Scientist is a laboratory automation system that uses artificial intelligence (AI) techniques to discover scientific knowledge through cycles of experimentation. Eve integrates and automates library-screening, hit-confirmation, and lead generation through cycles of quantitative structure activity relationship learning and testing. Using econometric modelling we demonstrate that the use of AI to select compounds economically outperforms standard drug screening. For further efficiency Eve uses a standardized form of assay to compute Boolean functions of compound properties. These assays can be quickly and cheaply engineered using synthetic biology, enabling more targets to be assayed for a given budget. Eve has repositioned several drugs against specific targets in parasites that cause tropical diseases. One validated discovery is that the anti-cancer compound TNP-470 is a potent inhibitor of dihydrofolate reductase from the malaria-causing parasite Plasmodium vivax.


Subject(s)
Drug Design, Drug Repositioning, Rare Diseases/drug therapy, Technology, Pharmaceutical/trends, Algorithms, Antineoplastic Agents/therapeutic use, Automation, Drug Evaluation, Preclinical, Humans, Malaria, Vivax/drug therapy, Models, Statistical, Plasmodium vivax/drug effects, Quantitative Structure-Activity Relationship, Regression Analysis, Reproducibility of Results, Software, Tropical Medicine
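As an illustration of an assay read out as a Boolean function of compound properties, the Python sketch below flags a compound as a selective hit when it inhibits growth of a strain carrying the parasite target AND NOT of a strain carrying the human counterpart. The two-strain setup, the 0.5 relative-growth cutoff and all names are illustrative assumptions, not details taken from the abstract.

```python
from typing import Dict, List

# Hypothetical readout: growth of each engineered strain with the compound,
# expressed in the same units as that strain's untreated control.
Readout = Dict[str, float]


def inhibited(growth: float, control: float, cutoff: float = 0.5) -> bool:
    """Boolean property: relative growth below `cutoff` counts as inhibition."""
    return growth / control < cutoff


def selective_hit(readout: Readout, controls: Readout) -> bool:
    """Boolean function of two properties: inhibits parasite target AND NOT human target."""
    parasite = inhibited(readout["parasite_target"], controls["parasite_target"])
    human = inhibited(readout["human_target"], controls["human_target"])
    return parasite and not human


def screen(library: Dict[str, Readout], controls: Readout) -> List[str]:
    """Names of compounds whose readout satisfies the Boolean function, in library order."""
    return [name for name, readout in library.items() if selective_hit(readout, controls)]
```

Other Boolean combinations (for example requiring inhibition of two parasite targets) would only change the body of `selective_hit`; the readout-to-flag step stays the same.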
8.
J Biomed Semantics ; 4 Suppl 1: S7, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23734675

ABSTRACT

The theory of probability is widely used in biomedical research for data analysis and modelling. In previous work the probabilities of the research hypotheses have been recorded as experimental metadata. The ontology HELO is designed to support probabilistic reasoning, and provides semantic descriptors for reporting on research that involves operations with probabilities. HELO explicitly links research statements such as hypotheses, models, laws, conclusions, etc. to the associated probabilities of these statements being true. HELO enables the explicit semantic representation and accurate recording of probabilities in hypotheses, as well as the inference methods used to generate and update those hypotheses. We demonstrate the utility of HELO on three worked examples: changes in the probability of the hypothesis that sirtuins regulate human life span; changes in the probability of hypotheses about gene functions in the S. cerevisiae aromatic amino acid pathway; and the use of active learning in drug design (quantitative structure activity relation learning), where a strategy for the selection of compounds with the highest probability of improving on the best known compound was used. HELO is open source and available at https://github.com/larisa-soldatova/HELO.
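HELO is an ontology, so it records probabilities rather than computing them. To illustrate the kind of record it supports (a statement linked to its probability and to the updates made to it), the Python sketch below stores a hypothesis with its probability history and updates it with Bayes' rule. The class, its fields and the choice of Bayes' rule as the inference method are assumptions for illustration, not HELO's own representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Hypothesis:
    """Hypothetical record linking a research statement to its probability of being true."""
    statement: str
    probability: float
    # (evidence label, posterior probability) for every update, oldest first
    history: List[Tuple[str, float]] = field(default_factory=list)

    def update(self, evidence: str, p_e_given_h: float, p_e_given_not_h: float) -> float:
        """Bayes' rule: P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|not H)(1 - P(H)))."""
        prior = self.probability
        joint_h = p_e_given_h * prior
        joint_not_h = p_e_given_not_h * (1.0 - prior)
        posterior = joint_h / (joint_h + joint_not_h)
        self.probability = posterior
        self.history.append((evidence, posterior))
        return posterior
```

Keeping the full history, rather than only the latest value, mirrors the abstract's emphasis on recording how a hypothesis's probability changes as evidence accumulates.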
