Search | BVS Regional Portal

1.

Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks.

Segura-Ortiz, Adrián; García-Nieto, José; Aldana-Montes, José F; Navas-Delgado, Ismael.

Comput Biol Med ; 179: 108850, 2024 Sep.

Article in English | MEDLINE | ID: mdl-39013340

ABSTRACT

BACKGROUND AND OBJECTIVE: Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques, through a carefully refined multi-objective evolutionary algorithm guided by various objective functions, linked to the biological context at hand. METHODS: MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation of MO-GENECI compared its performance to individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS: MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner. CONCLUSIONS: MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice due to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.

Subjects

Algorithms , Gene Regulatory Networks , Computational Biology/methods , Humans , Bayes Theorem , Software
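
The abstract of entry 1 evaluates inferred networks with AUROC. As a rough illustration of that metric only (not the evaluation code of MO-GENECI), the sketch below computes AUROC from edge confidence scores and a binary gold standard using the rank-sum (Mann-Whitney) identity. The function name `auroc` and the toy data are assumptions made for this example.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum identity; labels are 1 (true edge) or 0 (no edge)."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical confidences for four candidate edges and their gold labels.
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The value is the fraction of (true edge, non-edge) pairs in which the true edge receives the higher confidence, with ties counted as one half.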

2.

A deep learning LSTM-based approach for forecasting annual pollen curves: Olea and Urticaceae pollen types as a case study.

Picornell, Antonio; Hurtado, Sandro; Antequera-Gómez, María Luisa; Barba-González, Cristóbal; Ruiz-Mata, Rocío; de Gálvez-Montañez, Enrique; Recio, Marta; Trigo, María Del Mar; Aldana-Montes, José F; Navas-Delgado, Ismael.

Comput Biol Med ; 168: 107706, 2024 01.

Article in English | MEDLINE | ID: mdl-37989073

ABSTRACT

Airborne pollen can trigger allergic rhinitis and other respiratory diseases in the sensitised population, which makes it one of the most relevant biological contaminants. Therefore, implementing accurate forecast systems is a priority for public health. The current forecast models are generally useful, but they falter when long time series of data are managed. The emergence of new computational techniques such as the LSTM algorithms could constitute a significant improvement for the pollen risk assessment. In this study, several LSTM variants were applied to forecast monthly pollen integrals in Málaga (southern Spain) using meteorological variables as predictors. Olea and Urticaceae pollen types were modelled as proxies of different annual pollen curves, using data from the period 1992-2022. The aims of this study were to determine the LSTM variants with the highest accuracy when forecasting monthly pollen integrals as well as to compare their performance with the traditional pollen forecast methods. The results showed that the CNN-LSTM models were the most accurate when forecasting the monthly pollen integrals for both pollen types. Moreover, the traditional forecast methods were outperformed by all the LSTM variants. These findings highlight the importance of implementing LSTM models in pollen forecasting for public health and research applications.

Subjects

Deep Learning , Olea , Urticaceae , Pollen , Spain

3.

SALON ontology for the formal description of sequence alignments.

Benítez-Hidalgo, Antonio; Aldana-Montes, José F; Navas-Delgado, Ismael; Roldán-García, María Del Mar.

BMC Bioinformatics ; 24(1): 69, 2023 Feb 27.

Article in English | MEDLINE | ID: mdl-36849882

ABSTRACT

BACKGROUND: Information provided by high-throughput sequencing platforms allows the collection of content-rich data about biological sequences and their context. Sequence alignment is a bioinformatics approach to identifying regions of similarity in DNA, RNA, or protein sequences. However, there is no consensus about the specific common terminology and representation for sequence alignments. Thus, automatically linking the wide existing knowledge about the sequences with the alignments is challenging. RESULTS: The Sequence Alignment Ontology (SALON) defines a helpful vocabulary for representing and semantically annotating pairwise and multiple sequence alignments. SALON is an OWL 2 ontology that supports automated reasoning for alignments validation and retrieving complementary information from public databases under the Open Linked Data approach. This will reduce the effort needed by scientists to interpret the sequence alignment results. CONCLUSIONS: SALON defines a full range of controlled terminology in the domain of sequence alignments. It can be used as a mediated schema to integrate data from different sources and validate acquired knowledge.

Subjects

Computational Biology , Sequence Alignment , Amino Acid Sequence , Consensus , Databases, Factual

4.

Ensemble-based genetic algorithm explainer with automized image segmentation: A case study on melanoma detection dataset.

Nematzadeh, Hossein; García-Nieto, José; Navas-Delgado, Ismael; Aldana-Montes, José F.

Comput Biol Med ; 155: 106613, 2023 03.

Article in English | MEDLINE | ID: mdl-36764157

ABSTRACT

Explainable Artificial Intelligence (XAI) makes AI understandable to the human user, particularly when the model is complex and opaque. Local Interpretable Model-agnostic Explanations (LIME) has an image explainer package that is used to explain deep learning models. The image explainer of LIME needs some parameters to be manually tuned by the expert in advance, including the number of top features to be seen and the number of superpixels in the segmented input image. This parameter tuning is a time-consuming task. Hence, with the aim of developing an image explainer that automizes image segmentation, this paper proposes Ensemble-based Genetic Algorithm Explainer (EGAE) for melanoma cancer detection that automatically detects and presents the informative sections of the image to the user. EGAE has three phases. First, the sparsity of chromosomes in GAs is determined heuristically. Then, multiple GAs are executed consecutively. These GAs differ in the number of superpixels in the input image, which results in different chromosome lengths. Finally, the results of the GAs are ensembled using consensus and majority voting. This paper also introduces how Euclidean distance can be used to calculate the distance between the actual explanation (delineated by experts) and the calculated explanation (computed by the explainer) for accuracy measurement. Experimental results on a melanoma dataset show that EGAE automatically detects informative lesions, and that it also efficiently improves the accuracy of explanation in comparison with LIME. The Python code for EGAE, the ground truths delineated by clinicians, and the melanoma detection dataset are available at https://github.com/KhaosResearch/EGAE.

Subjects

Artificial Intelligence , Melanoma , Humans , Oxides
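
The ensembling step and the Euclidean distance measure described in the abstract of entry 4 can be sketched roughly as follows. This is an assumption-laden illustration, not the EGAE code: explanations are modelled here as flat binary pixel masks, "consensus" is taken to mean that every GA selected the pixel, and all function names are hypothetical.

```python
import math


def majority_vote(masks):
    """Keep a pixel when more than half of the masks select it."""
    n = len(masks)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*masks)]


def consensus_vote(masks):
    """Keep a pixel only when every mask selects it (assumed meaning of consensus)."""
    return [1 if all(column) else 0 for column in zip(*masks)]


def euclidean_distance(mask_a, mask_b):
    """Euclidean distance between two equally sized masks."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mask_a, mask_b)))


if __name__ == "__main__":
    # Three hypothetical explanations of a four-pixel image.
    masks = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 1, 0]]
    expert = [1, 1, 1, 0]  # hypothetical expert-delineated ground truth
    print(majority_vote(masks), consensus_vote(masks))
    print(euclidean_distance(majority_vote(masks), expert))
```

A smaller distance to the expert mask indicates an explanation closer to the ground truth under this simplified representation.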

5.

GENECI: A novel evolutionary machine learning consensus-based approach for the inference of gene regulatory networks.

Segura-Ortiz, Adrián; García-Nieto, José; Aldana-Montes, José F; Navas-Delgado, Ismael.

Comput Biol Med ; 155: 106653, 2023 03.

Article in English | MEDLINE | ID: mdl-36803795

ABSTRACT

Gene regulatory networks define the interactions between DNA products and other substances in cells. Increasing knowledge of these networks improves the level of detail with which the processes that trigger different diseases are described and fosters the development of new therapeutic targets. These networks are usually represented by graphs, and the primary sources for their correct construction are usually time series from differential expression data. The inference of networks from this data type has been approached differently in the literature. Mostly, computational learning techniques have been implemented, which have finally shown some specialization in specific datasets. For this reason, the need arises to create new and more robust strategies for reaching a consensus based on previous results to gain a particular capacity for generalization. This paper presents GENECI (GEne NEtwork Consensus Inference), an evolutionary machine learning approach that acts as an organizer for constructing ensembles to process the results of the main inference techniques reported in the literature and to optimize the consensus network derived from them, according to their confidence levels and topological characteristics. After its design, the proposal was confronted with datasets collected from academic benchmarks (DREAM challenges and IRMA network) to quantify its accuracy. Subsequently, it was applied to a real-world biological network of melanoma patients whose results could be contrasted with medical research collected in the literature. Finally, it has been proved that its ability to optimize the consensus of several networks leads to outstanding robustness and accuracy, gaining a certain generalization capacity after facing the inference of multiple datasets. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.

Subjects

Gene Regulatory Networks , Software , Humans , Consensus , Machine Learning , Time Factors , Algorithms
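
The idea of building a consensus network from the confidence levels of several inference techniques, mentioned in the abstract of entry 5, can be illustrated with a weighted average of edge confidences. The sketch below is only that general idea under stated assumptions: it does not reproduce the GENECI package or its API, the weights are fixed by hand rather than evolved, and the technique names "A" and "B" are placeholders.

```python
def consensus_network(predictions, weights):
    """Weighted average of edge confidences across inference techniques.

    predictions: {technique: {(regulator, target): confidence in [0, 1]}}
    weights:     {technique: non-negative weight}
    Edges that a technique does not report count as confidence 0.
    """
    if set(predictions) != set(weights):
        raise ValueError("predictions and weights must cover the same techniques")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    consensus = {}
    for technique, edges in predictions.items():
        share = weights[technique] / total  # normalise so shares sum to 1
        for edge, confidence in edges.items():
            consensus[edge] = consensus.get(edge, 0.0) + share * confidence
    return consensus


if __name__ == "__main__":
    predictions = {
        "A": {("g1", "g2"): 0.8, ("g2", "g3"): 0.2},
        "B": {("g1", "g2"): 0.4},
    }
    print(consensus_network(predictions, {"A": 0.75, "B": 0.25}))
```

In an evolutionary setting, a weight vector like the one passed here would be the individual being optimized, scored by objective functions on the resulting consensus network.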

6.

FIMED: Flexible management of biomedical data.

Hurtado, Sandro; García-Nieto, José; Navas-Delgado, Ismael; Aldana-Montes, José F.

Comput Methods Programs Biomed ; 212: 106496, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34740063

ABSTRACT

BACKGROUND AND OBJECTIVES: In the last decade, clinical trial management systems have become an essential support tool for data management and analysis in clinical research. However, these clinical tools have design limitations, since they are currently not able to cover the needs of adaptation to the continuous changes in the practice of the trials due to the heterogeneous and dynamic nature of the clinical research data. These systems are usually proprietary solutions provided by vendors for specific tasks. In this work, we propose FIMED, a software solution for the flexible management of clinical data from multiple trials, moving towards personalized medicine, which can contribute positively by improving clinical researchers' quality and ease in clinical trials. METHODS: This tool allows a dynamic and incremental design of patients' profiles in the context of clinical trials, providing a flexible user interface that hides the complexity of using databases. Clinical researchers will be able to define personalized data schemas according to their needs and clinical study specifications. Thus, FIMED allows the incorporation of separate clinical data analysis from multiple trials. RESULTS: The efficiency of the software has been demonstrated by a real-world use case for a clinical assay in Melanoma disease, which has been anonymized to provide a user demonstration. FIMED currently provides three data analysis and visualization components, guaranteeing a clinical exploration for gene expression data: heatmap visualization, clusterheatmap visualization, as well as gene regulatory network inference and visualization. An instance of this tool is freely available on the web at https://khaos.uma.es/fimed. It can be accessed with a demo user account, "researcher", using the password "demo". CONCLUSION: This paper shows FIMED as a flexible and user-friendly way of managing multidimensional clinical research data. Hence, without loss of generality, FIMED is flexible enough to be used in the context of any other disease where clinical data and assays are involved.

Subjects

Data Management , Software , Databases, Factual , Gene Regulatory Networks , Humans , Internet , User-Computer Interface

7.

Sequoya: multiobjective multiple sequence alignment in Python.

Benítez-Hidalgo, Antonio; Nebro, Antonio J; Aldana-Montes, José F.

Bioinformatics ; 36(12): 3892-3893, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32315391

ABSTRACT

MOTIVATION: Multiple sequence alignment (MSA) consists of finding the optimal alignment of three or more biological sequences to identify highly conserved regions that may be the result of similarities and relationships between the sequences. MSA is an optimization problem with NP-hard complexity (non-deterministic polynomial-time hardness), because the time needed to find optimal alignments rises exponentially along with the number of sequences and their length. Furthermore, the problem becomes multiobjective when more than one score is considered to assess the quality of an alignment, such as maximizing the percentage of totally conserved columns and minimizing the number of gaps. Our motivation is to provide a Python tool for solving MSA problems using evolutionary algorithms, a nonexact stochastic optimization approach that has proven to be effective at solving multiobjective problems. RESULTS: The software tool we have developed, called Sequoya, is written in the Python programming language, which offers a broad set of libraries for data analysis, visualization and parallelism. Thus, Sequoya offers a graphical tool to visualize the progress of the optimization in real time, the ability to guide the search toward a preferred region at run time, parallel support to distribute the computation among nodes in a distributed computing system, and a graphical component to assist in the analysis of the solutions found at the end of the optimization. AVAILABILITY AND IMPLEMENTATION: Sequoya can be freely obtained from the Python Package Index (pip) or, alternatively, it can be downloaded from GitHub at https://github.com/benhid/Sequoya. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Software , Biological Evolution , Programming Languages , Sequence Alignment
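
The two alignment quality scores named in the abstract of entry 7 (percentage of totally conserved columns, to be maximized, and number of gaps, to be minimized) can be sketched as follows. This is a minimal illustration that assumes an alignment is a list of equal-length strings with "-" as the gap symbol; it does not use or reproduce the Sequoya API, and the function names are hypothetical.

```python
def totally_conserved_columns(alignment):
    """Percentage of columns in which every sequence has the same non-gap residue."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = sum(
        1
        for column in zip(*alignment)
        if column[0] != "-" and len(set(column)) == 1
    )
    return 100.0 * conserved / length


def gap_count(alignment):
    """Total number of gap symbols in the alignment."""
    return sum(row.count("-") for row in alignment)


if __name__ == "__main__":
    alignment = ["AC-GT", "ACAGT", "AC-GA"]  # toy alignment
    print(totally_conserved_columns(alignment), gap_count(alignment))
```

Because one score is maximized and the other minimized, a multiobjective optimizer would typically return a set of trade-off alignments rather than a single best one.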

8.

Inference of gene regulatory networks with multi-objective cellular genetic algorithm.

García-Nieto, José; Nebro, Antonio J; Aldana-Montes, José F.

Comput Biol Chem ; 80: 409-418, 2019 Jun.

Article in English | MEDLINE | ID: mdl-31128452

ABSTRACT

Reverse engineering of biochemical networks remains an important open challenge in computational systems biology. The goal of model inference is to obtain, based on time-series gene expression data, the sparse topological structure and parameters that quantitatively capture and reproduce the dynamics of biological systems. In this paper, we propose a multi-objective approach for the inference of S-System structures for Gene Regulatory Networks (GRNs) based on Pareto dominance and Pareto optimality theoretical concepts instead of the conventional single-objective evaluation of Mean Squared Error (MSE). Our motivation is that, using a multi-objective formulation for the GRN, it is possible to optimize the sparse topology of a given GRN as well as the kinetic order and rate constant parameters in a decoupled S-System, yet avoiding the use of additional penalty weights. A flexible and robust Multi-Objective Cellular Evolutionary Algorithm is adapted to perform the tasks of parameter learning and network topology inference for the proposed approach. The resulting software, called MONET, is evaluated on real-based academic and synthetic time-series of gene expression taken from the DREAM3 challenge and the IRMA in vivo datasets. The ability to reproduce biological behavior and robustness to noise is assessed and compared. The results obtained are competitive and indicate that the proposed approach offers advantages over previously used methods. In addition, MONET is able to provide experts with a set of trade-off solutions involving GRNs with different topologies and MSEs.

Subjects

Algorithms , Gene Regulatory Networks , Systems Biology/methods , Escherichia coli/genetics , Galactose/metabolism , Glucose/metabolism , Models, Genetic , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
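
The Pareto dominance concept on which the abstract of entry 8 relies can be shown in a few lines. The sketch below assumes two minimized objectives per candidate network, written as (MSE, number of edges); the exact objectives used by MONET may differ, and the candidate values are invented for illustration.

```python
def dominates(a, b):
    """True when a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (MSE, edge count) pairs for five candidate networks.
    candidates = [(0.10, 12), (0.20, 8), (0.15, 12), (0.30, 5), (0.25, 9)]
    print(non_dominated(candidates))
```

The surviving points are the trade-off solutions: each one is either sparser or more accurate than any alternative, which is why no penalty weight is needed to combine the objectives.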

9.

VIGLA-M: visual gene expression data analytics.

Navas-Delgado, Ismael; García-Nieto, José; López-Camacho, Esteban; Rybinski, Maciej; Lavado, Rocio; Berciano Guerrero, Miguel Ángel; Aldana-Montes, José F.

BMC Bioinformatics ; 20(Suppl 4): 150, 2019 Apr 18.

Article in English | MEDLINE | ID: mdl-30999846

ABSTRACT

BACKGROUND: The analysis of gene expression levels is used in many clinical studies to know how patients evolve or to find new genetic biomarkers that could help in clinical decision making. However, the techniques and software available for these analyses are not intended for physicians, but for geneticists. Enabling physicians to make initial discoveries on these data would nevertheless benefit clinical assay development. RESULTS: Melanoma is a highly immunogenic tumor. Therefore, in recent years physicians have incorporated immune system altering drugs into their therapeutic arsenal against this disease, revolutionizing the treatment of patients with an advanced stage of the cancer. This has led us to explore and deepen our knowledge of the immunology surrounding melanoma, in order to optimize the approach. Within this project we have developed a database for collecting relevant clinical information for melanoma patients, including the storage of patient gene expression levels obtained from the NanoString platform (several samples are taken from each patient). The Immune Profiling Panel is used in this case. This database is being exploited through the analysis of the different expression profiles of the patients. This analysis is being done with Python, and a parallel version of the algorithms is available with Apache Spark to provide scalability as needed. CONCLUSIONS: VIGLA-M, the visual analysis tool for gene expression levels in melanoma patients, is available at http://khaos.uma.es/melanoma/. The platform with real clinical data can be accessed with a demo user account, physician, using password physician_test_7634 (if you encounter any problems, contact us at this email address: khaos@lcc.uma.es). The initial results of the analysis of gene expression levels using these tools are providing first insights into the patients' evolution. These results are promising, but larger scale tests must be developed once new patients have been sequenced, to discover new genetic biomarkers.

Subjects

Algorithms , Data Science , Gene Expression Regulation , Cluster Analysis , Databases, Factual , Gene Expression Profiling , Gene Regulatory Networks , Humans , Melanoma/genetics

10.

M2Align: parallel multiple sequence alignment with a multi-objective metaheuristic.

Zambrano-Vega, Cristian; Nebro, Antonio J; García-Nieto, José; Aldana-Montes, José F.

Bioinformatics ; 33(19): 3011-3017, 2017 Oct 01.

Article in English | MEDLINE | ID: mdl-28541404

ABSTRACT

MOTIVATION: Multiple sequence alignment (MSA) is an NP-complete optimization problem found in computational biology, where the time complexity of finding an optimal alignment rises exponentially along with the number of sequences and their lengths. Additionally, to assess the quality of an MSA, a number of objectives can be taken into account, such as maximizing the sum-of-pairs, maximizing the totally conserved columns, minimizing the number of gaps, or maximizing structural information based scores such as STRIKE. An approach to deal with MSA problems is to use multi-objective metaheuristics, which are non-exact stochastic optimization methods that can produce high quality solutions to complex problems having two or more objectives to be optimized at the same time. Our motivation is to provide a multi-objective metaheuristic for MSA that can run in parallel taking advantage of multi-core-based computers. RESULTS: The software tool we propose, called M2Align (Multi-objective Multiple Sequence Alignment), is a parallel and more efficient version of the three-objective optimizer for sequence alignments MO-SAStrE, capable of reducing the algorithm computing time by exploiting the computing capabilities of common multi-core CPU clusters. Our performance evaluation over datasets of the benchmark BAliBASE (v3.0) shows that significant time reductions can be achieved by using up to 20 cores. Even in sequential executions, M2Align is faster than MO-SAStrE, thanks to the encoding method used for the alignments. AVAILABILITY AND IMPLEMENTATION: M2Align is an open source project hosted in GitHub, where the source code and sample datasets can be freely obtained: https://github.com/KhaosResearch/M2Align. CONTACT: antonio@lcc.uma.es. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Sequence Alignment/methods , Software , Algorithms
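
The sum-of-pairs objective mentioned in the abstract of entry 10 adds up a pairwise score over every pair of sequences in every column. The sketch below uses a deliberately simplified scoring scheme (match, mismatch and gap constants chosen for illustration); real tools typically use substitution matrices such as PAM or BLOSUM, and this is not the M2Align implementation.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    The match/mismatch/gap values are illustrative assumptions.
    A pair of two gaps contributes nothing.
    """
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


if __name__ == "__main__":
    print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```

The cost of one evaluation grows quadratically with the number of sequences, which is one reason parallel evaluation of candidate alignments pays off.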

11.

Biological Web Service Repositories Review.

Urdidiales-Nieto, David; Navas-Delgado, Ismael; Aldana-Montes, José F.

Mol Inform ; 36(5-6)2017 05.

Article in English | MEDLINE | ID: mdl-27783459

ABSTRACT

Web services play a key role in bioinformatics enabling the integration of database access and analysis of algorithms. However, Web service repositories do not usually publish information on the changes made to their registered Web services. Dynamism is directly related to the changes in the repositories (services registered or unregistered) and at service level (annotation changes). Thus, users, software clients or workflow based approaches lack enough relevant information to decide when they should review or re-execute a Web service or workflow to get updated or improved results. The dynamism of the repository could be a measure for workflow developers to re-check service availability and annotation changes in the services of interest to them. This paper presents a review on the most well-known Web service repositories in the life sciences including an analysis of their dynamism. Freshness is introduced in this paper, and has been used as the measure for the dynamism of these repositories.

Subjects

Biological Science Disciplines , Computational Biology , Databases, Factual , Data Curation , Information Storage and Retrieval , Internet

12.

Molecular Docking Optimization in the Context of Multi-Drug Resistant and Sensitive EGFR Mutants.

García-Godoy, María Jesús; López-Camacho, Esteban; García-Nieto, José; Nebro, Antonio J; Aldana-Montes, José F.

Molecules ; 21(11)2016 Nov 19.

Article in English | MEDLINE | ID: mdl-27869781

ABSTRACT

The human Epidermal Growth Factor Receptor (EGFR) plays an important role in signaling pathways, such as cell proliferation and migration. Mutations like G719S, L858R, T790M, G719S/T790M or T790M/L858R can alter its conformation, and, therefore, drug responses from lung cancer patients. In this context, candidate drugs are being tested and in silico studies are necessary to know how these mutations affect the ligand binding site. This problem can be tackled by using a multi-objective approach applied to the molecular docking problem. According to the literature, few studies are related to the application of multi-objective approaches by minimizing two or more objectives in drug discovery. In this study, we have used four algorithms (NSGA-II, GDE3, SMPSO and MOEA/D) to minimize two objectives: the ligand-receptor intermolecular energy and the RMSD score. We have prepared a set of instances that includes the wild-type EGFR kinase domain and the same receptor with somatic mutations, and then we assessed the performance of the algorithms by applying a quality indicator to evaluate the convergence and diversity of the reference fronts. The MOEA/D algorithm yields the best solutions to these docking problems. The obtained solutions were analyzed, showing promising results to predict candidate EGFR inhibitors by using this multi-objective approach.

Subjects

Drug Resistance, Multiple/genetics , Drug Resistance, Neoplasm/genetics , ErbB Receptors/chemistry , ErbB Receptors/genetics , Molecular Docking Simulation , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Algorithms , Binding Sites , Humans , Ligands , Molecular Conformation , Molecular Dynamics Simulation , Protein Binding , Quantitative Structure-Activity Relationship
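
One of the two objectives minimized in the abstract of entry 12 is the RMSD score. As a rough illustration, the sketch below computes the root-mean-square deviation between two ligand poses given as matched lists of 3D atom coordinates. It assumes the atoms are already paired and applies no superposition or symmetry correction, which docking software may handle differently; the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # hypothetical reference pose
    docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]     # same ligand shifted by 1 along z
    print(rmsd(reference, docked))
```

Minimizing RMSD together with the intermolecular energy rewards poses that are both energetically favourable and close to a reference conformation.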

13.

Dione: An OWL representation of ICD-10-CM for classifying patients' diseases.

Roldán-García, María Del Mar; García-Godoy, María Jesús; Aldana-Montes, José F.

J Biomed Semantics ; 7(1): 62, 2016 10 13.

Article in English | MEDLINE | ID: mdl-27737720

ABSTRACT

BACKGROUND: Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) has been designed as standard clinical terminology for annotating Electronic Health Records (EHRs). EHRs textual information is used to classify patients' diseases into an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) category (usually by an expert). Improving the accuracy of classification is the main purpose of using ontologies and OWL representations at the core of classification systems. In the last few years some ontologies and OWL representations for representing ICD-10-CM categories have been developed. However, they were not designed to be the basis for an automatic classification tool nor do they model ICD-10-CM inclusion terms as Web Ontology Language (OWL) axioms, which enables automatic classification. In this context we have developed Dione, an OWL representation of ICD-10-CM. RESULTS: Dione is the first OWL representation of ICD-10-CM, which is logically consistent, whose axioms define the ICD-10-CM inclusion terms by means of a methodology based on SNOMED CT/ICD-10-CM mappings. The ICD-10-CM exclusions are handled with these mappings. Dione currently contains 391,669 classes, 391,720 entity annotation axioms and 11,795 owl:equivalentClass axioms which have been constructed using 104,646 relationships extracted from the SNOMED CT/ICD-10-CM and BioPortal mappings included in Dione using the owl:intersectionOf and the owl:someValuesFrom statements. The resulting OWL representation has been classified and its consistency tested with the ELK reasoner. We have also taken three clinical records from the Virgen de la Victoria Hospital (Málaga, Spain) which have been manually annotated using SNOMED CT. These annotations have been included as instances to be classified by the reasoner. The classified instances show that Dione could be a promising ICD-10-CM OWL representation to support the classification of patients' diseases. CONCLUSIONS: Dione is a first step towards the automatic classification of patients' diseases by using SNOMED CT annotations embedded in Electronic Health Records (EHRs). The purpose of Dione is to standardise and formalise a medical terminology, thereby enabling new kinds of tools and new sets of functionalities to be developed. This in turn assists health specialists by providing classified information from EHRs and enables the automatic annotation of patients' diseases with ICD-10-CM codes.

Subjects

Biological Ontologies , Disease/classification , Humans , Internet

14.

ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome.

Carmona, Rosario; Zafra, Adoración; Seoane, Pedro; Castro, Antonio J; Guerrero-Fernández, Darío; Castillo-Castillo, Trinidad; Medina-García, Ana; Cánovas, Francisco M; Aldana-Montes, José F; Navas-Delgado, Ismael; Alché, Juan de Dios; Claros, M Gonzalo.

Front Plant Sci ; 6: 625, 2015.

Article in English | MEDLINE | ID: mdl-26322066

ABSTRACT

Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. The result is a reproductive transcriptome comprising 72,846 contigs with an average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species.

15.

kpath: integration of metabolic pathway linked data.

Navas-Delgado, Ismael; García-Godoy, María Jesús; López-Camacho, Esteban; Rybinski, Maciej; Reyes-Palomares, Armando; Medina, Miguel Ángel; Aldana-Montes, José F.

Database (Oxford) ; 2015: bav053, 2015.

Article in English | MEDLINE | ID: mdl-26055101

ABSTRACT

In the last few years, the Life Sciences domain has experienced a rapid growth in the amount of available biological databases. The heterogeneity of these databases makes data integration a challenging issue. Some integration challenges are locating resources, relationships, data formats, synonyms or ambiguity. The Linked Data approach partially solves the heterogeneity problems by introducing a uniform data representation model. Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. This article introduces kpath, a database that integrates information related to metabolic pathways. kpath also provides a navigational interface that enables not only the browsing, but also the deep use of the integrated data to build metabolic networks based on existing disperse knowledge. This user interface has been used to showcase relationships that can be inferred from the information available in several public databases.

Subjects

Metabolome; User-Computer Interface

16.

jMetalCpp: optimizing molecular docking problems with a C++ metaheuristic framework.

López-Camacho, Esteban; García Godoy, María Jesús; Nebro, Antonio J; Aldana-Montes, José F.

Bioinformatics ; 30(3): 437-8, 2014 Feb 01.

Article in English | MEDLINE | ID: mdl-24273242

ABSTRACT

MOTIVATION: Molecular docking is a method for structure-based drug design and structural molecular biology, which attempts to predict the position and orientation of a small molecule (ligand) in relation to a protein (receptor) to produce a stable complex with a minimum binding energy. One of the most widely used software packages for this purpose is AutoDock, which incorporates three metaheuristic techniques. We propose the integration of AutoDock with jMetalCpp, an optimization framework, thereby providing both single- and multi-objective algorithms that can be used to effectively solve docking problems. RESULTS: The resulting combination of AutoDock + jMetalCpp allows users of the former to easily use the metaheuristics provided by the latter. In this way, biologists have at their disposal a richer set of optimization techniques than those already provided in AutoDock. Moreover, designers of metaheuristic techniques can use molecular docking for case studies, which can lead to more efficient algorithms oriented to solving the target problems. AVAILABILITY AND IMPLEMENTATION: jMetalCpp software adapted to AutoDock is freely available as a C++ source code at http://khaos.uma.es/AutodockjMetal/.
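To make the idea of docking as energy minimization concrete, here is a toy Python sketch of a single-objective metaheuristic: a simple stochastic local search that perturbs a ligand position and keeps only improvements. It does not use AutoDock or jMetalCpp and is not one of their algorithms. The energy function, the `TARGET` position and all parameter values are hypothetical stand-ins, and a real docking pose would also include orientation and torsion angles.

```python
import random

# Hypothetical position at which the toy energy is minimal.
TARGET = (1.0, -2.0, 0.5)


def toy_energy(pose):
    """Stand-in for a binding-energy function: squared distance to TARGET."""
    return sum((p - t) ** 2 for p, t in zip(pose, TARGET))


def local_search(energy, start, steps=2000, sigma=0.1, seed=0):
    """Minimize `energy` by Gaussian perturbation, accepting only improvements."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    best = list(start)
    best_e = energy(best)
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, sigma) for x in best]
        e = energy(candidate)
        if e < best_e:  # elitist acceptance: never move to a worse pose
            best, best_e = candidate, e
    return best, best_e


start = (0.0, 0.0, 0.0)
best, best_e = local_search(toy_energy, start)
print("start energy:", toy_energy(start))
print("final energy:", best_e)
```

Because only improving candidates are accepted, the final energy can never exceed the starting energy. Population-based and multi-objective algorithms of the kind the abstract mentions replace this single-pose loop with a set of candidate poses and richer selection rules.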

Subjects

Molecular Docking Simulation/methods; Software; Algorithms; Drug Design; Humans; Ligands; Proteins/chemistry; Proteins/metabolism

17.

Sharing and executing linked data queries in a collaborative environment.

García Godoy, María Jesús; López-Camacho, Esteban; Navas-Delgado, Ismael; Aldana-Montes, José F.

Bioinformatics ; 29(13): 1663-70, 2013 Jul 01.

Article in English | MEDLINE | ID: mdl-23620361

ABSTRACT

MOTIVATION: Life Sciences have emerged as a key domain in the Linked Data community because of the diversity of data semantics and formats available through a great variety of databases and web technologies. Thus, they have been used as the perfect domain for applications in the web of data. Unfortunately, bioinformaticians are not exploiting the full potential of this already available technology, and experts in Life Sciences have real difficulty discovering, understanding and working out how to take advantage of these interlinked (integrated) data. RESULTS: In this article, we present Bioqueries, a wiki-based portal that is aimed at community building around biological Linked Data. This tool has been designed to aid bioinformaticians in developing SPARQL queries to access biological databases exposed as Linked Data, and also to help biologists gain a deeper insight into the potential use of this technology. This public space offers several services and a collaborative infrastructure to stimulate the consumption of biological Linked Data and, therefore, contribute to implementing the benefits of the web of data in this domain. Bioqueries currently contains 215 query entries grouped by database and theme, 230 registered users and 44 end points that contain biological Resource Description Framework information. AVAILABILITY: The Bioqueries portal is freely accessible at http://bioqueries.uma.es. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
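For readers unfamiliar with how a SPARQL query reaches an end point, the Python sketch below builds the GET request URL defined by the SPARQL 1.1 Protocol, which carries the query text in a `query` parameter. The end point address and the query are hypothetical placeholders, not entries from Bioqueries, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical end point URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# A generic query: list up to ten triples from the default graph.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_get_url(endpoint, query):
    """Return a GET URL with the percent-encoded query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

Sending this URL with an `Accept` header such as `application/sparql-results+json` is the usual way to ask an end point for machine-readable results, although the formats actually supported depend on the end point.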

Subjects

Databases, Factual; Software; Biological Science Disciplines; Cooperative Behavior; Internet

18.

Transparent mediation-based access to multiple yeast data sources using an ontology driven interface.

Briache, Abdelaali; Marrakchi, Kamar; Kerzazi, Amine; Navas-Delgado, Ismael; Rossi Hassani, Badr D; Lairini, Khalid; Aldana-Montes, José F.

BMC Bioinformatics ; 13 Suppl 1: S7, 2012 Jan 25.

Article in English | MEDLINE | ID: mdl-22372975

ABSTRACT

BACKGROUND: Saccharomyces cerevisiae is recognized as a model system representing a simple eukaryote whose genome can be easily manipulated. The information scientists seek on its biological entities (proteins, genes, RNAs, etc.) is scattered across several data sources such as SGD, Yeastract, CYGD-MIPS, BioGrid and PhosphoGrid. Because of the heterogeneity of these sources, querying them separately and then manually combining the returned results is a complex and time-consuming task for biologists, most of whom are not bioinformatics experts. It also limits the use that can be made of the available data. RESULTS: To provide transparent and simultaneous access to yeast sources, we have developed YeastMed, an XML and mediator-based system. In this paper, we present our approach to developing this system, which takes advantage of SB-KOM to perform the required query transformation and of a set of Data Services to reach the integrated data sources. The system is composed of a set of modules that depend heavily on XML and Semantic Web technologies. User queries are expressed in terms of a domain ontology through a simple form-based web interface. CONCLUSIONS: YeastMed is the first mediation-based system specifically designed for integrating yeast data sources. It was conceived mainly to help biologists find relevant data from multiple data sources simultaneously. It has a biologist-friendly, easy-to-use interface. The system is available at http://www.khaos.uma.es/yeastmed/.

Subjects

Biological Ontologies; Computational Biology/methods; Data Mining/methods; Internet; Saccharomyces cerevisiae; User-Computer Interface; Databases, Factual; Saccharomyces cerevisiae/cytology; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism

19.

Social pathway annotation: extensions of the systems biology metabolic modelling assistant.

Navas-Delgado, Ismael; Real-Chicharro, Alejandro; Medina, Miguel Ángel; Sánchez-Jiménez, Francisca; Aldana-Montes, José F.

Brief Bioinform ; 12(6): 576-87, 2011 Nov.

Article in English | MEDLINE | ID: mdl-20965999

ABSTRACT

High-throughput experiments have produced large amounts of heterogeneous data in the life sciences. These data are usually represented in different formats (and sometimes in technical documents) on the Web. Inevitably, life science researchers have to deal with all these data and different formats to perform their daily research, but it is simply not possible for a single human mind to analyse all these data. The integration of data in the life sciences is a key component in the analysis of biological processes. These data may contain errors, but the curation of the vast amount of data generated in the 'omic' era cannot be done by individual researchers. To address this problem, community-driven tools could be used to assist with data analysis. In this article, we focus on a tool with social networking capabilities built on top of the SBMM (Systems Biology Metabolic Modelling) Assistant to enable the collaborative improvement of metabolic pathway models (the application is freely available at http://sbmm.uma.es/SPA).

Subjects

Computational Biology/methods; Systems Biology/methods; Databases, Factual; Internet; Metabolic Networks and Pathways; Software; User-Computer Interface

20.

KA-SB: from data integration to large scale reasoning.

Roldán-García, María del Mar; Navas-Delgado, Ismael; Kerzazi, Amine; Chniber, Othmane; Molina-Castro, Joaquín; Aldana-Montes, José F.

BMC Bioinformatics ; 10 Suppl 10: S5, 2009 Oct 01.

Article in English | MEDLINE | ID: mdl-19796402

ABSTRACT

BACKGROUND: The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires access to dispersed, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data. METHODS: KA-SB is a querying and analysis system for end users based on combining a data integration solution with a reasoner. Thus, the tool has been created with a process divided into two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a (persistent and high performance) reasoner (DBOWL). This information can be further analyzed later (by means of querying and reasoning). RESULTS: In this paper we present a novel system that combines the use of a mediation system with the reasoning capabilities of a large scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. This tool uses a graphical query interface that shows a graphical representation of the ontology and allows users to build queries easily by clicking on the ontology concepts. CONCLUSION: These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main memory-based reasoners. We propose a process for creating persistent and scalable knowledgebases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool (http://khaos.uma.es/KA-SB), which uses the BioPax Level 3 ontology as the integration schema, and integrates the UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases.

Subjects

Computational Biology/methods; Information Storage and Retrieval/methods; Software; Database Management Systems; Databases, Factual; Internet
