Results 1 - 16 of 16
1.
Brief Bioinform ; 25(Supplement_1)2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39266450

ABSTRACT

In an environment, microbes often work in communities to achieve most of their essential functions, including the production of essential nutrients. Microbial biofilms are communities of microbes that attach to a nonliving or living surface by embedding themselves into a self-secreted matrix of extracellular polymeric substances. These communities work together to enhance their colonization of surfaces, produce essential nutrients, and achieve their essential functions for growth and survival. They often consist of diverse microbes including bacteria, viruses, and fungi. Biofilms play a critical role in influencing plant phenotypes and human microbial infections. Understanding how these biofilms impact plant health, human health, and the environment is important for analyzing genotype-phenotype-driven rule-of-life functions. Such fundamental knowledge can be used to precisely control the growth of biofilms on a given surface. Metagenomics is a powerful tool for analyzing biofilm genomes through function-based gene and protein sequence identification (functional metagenomics) and sequence-based function identification (sequence metagenomics). Metagenomic sequencing enables a comprehensive sampling of all genes in all organisms present within a biofilm sample. However, the complexity of biofilm metagenomic study warrants the increasing need to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) Guiding Principles for scientific data management. This will ensure that scientific findings can be more easily validated by the research community. This study proposes a dockerized, self-learning bioinformatics workflow to increase the community adoption of metagenomics toolkits in a metagenomics and meta-transcriptomics investigation. Our biofilm metagenomics workflow self-learning module includes integrated learning resources with an interactive dockerized workflow.
This module will allow learners to analyze resources that are beneficial for aggregating knowledge about biofilm marker genes, proteins, and metabolic pathways as they define the composition of specific microbial communities. Cloud and dockerized technology can allow novice learners, even those with minimal knowledge of computer science, to use complicated bioinformatics tools. Our cloud-based, dockerized workflow splits biofilm microbiome metagenomics analyses into four easy-to-follow submodules. A variety of tools are built into each submodule. As students navigate these submodules, they learn about each tool used to accomplish the task. The downstream analysis is conducted using processed data obtained from online resources or raw data processed via Nextflow pipelines. This analysis takes place within Vertex AI's Jupyter notebook instance with R and Python kernels. Subsequently, results are stored and visualized in Google Cloud storage buckets, alleviating the computational burden on local resources. The result is a comprehensive tutorial that guides bioinformaticians of any skill level through the entire workflow. It enables them to comprehend and implement the necessary processes involved in this integrated workflow from start to finish. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.


Subject(s)
Biofilms , Metagenomics , Biofilms/growth & development , Metagenomics/methods , Microbiota/genetics , Cloud Computing , Humans , Computational Biology/methods
2.
Nucleic Acids Res ; 45(D1): D1117-D1122, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27924016

ABSTRACT

Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems, with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process.


Subject(s)
Computational Biology/methods , Databases, Factual , Software , Database Management Systems , Genomics/methods , Molecular Sequence Annotation , Proteomics/methods , Reproducibility of Results , Web Browser , Workflow
5.
Biofilm ; 8: 100210, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39221168

ABSTRACT

Priority question exercises are increasingly used to frame and set future research, innovation and development agendas. They can provide an important bridge between the discoveries, data and outputs generated by researchers, and the information required by policy makers and funders. Microbial biofilms present huge scientific, societal and economic opportunities and challenges. In order to identify key priorities that will help to advance the field, here we review questions from a pool submitted by the international biofilm research community and from practitioners working across industry, the environment and medicine. To avoid bias we used computational approaches to group questions and manage a voting and selection process. The outcome of the exercise is a set of 78 unique questions, categorized in six themes: (i) Biofilm control, disruption, prevention, management, treatment (13 questions); (ii) Resistance, persistence, tolerance, role of aggregation, immune interaction, relevance to infection (10 questions); (iii) Model systems, standards, regulatory, policy education, interdisciplinary approaches (15 questions); (iv) Polymicrobial, interactions, ecology, microbiome, phage (13 questions); (v) Clinical focus, chronic infection, detection, diagnostics (13 questions); and (vi) Matrix, lipids, capsule, metabolism, development, physiology, ecology, evolution environment, microbiome, community engineering (14 questions). The questions presented are intended to highlight opportunities, stimulate discussion and provide focus for researchers, funders and policy makers, informing future research, innovation and development strategy for biofilms and microbial communities.

6.
Nucleic Acids Res ; 39(Web Server issue): W528-32, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21546552

ABSTRACT

The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet.


Subject(s)
Genomics/methods , Software , Databases, Genetic , Internet , Systems Integration , Workflow
7.
Res Sq ; 2023 Sep 08.
Article in English | MEDLINE | ID: mdl-37720037

ABSTRACT

Initially, research disciplines operated independently, but the emergence of trans-disciplinary sciences led to convergence research, impacting graduate programs and research laboratories, especially in bioengineering and material engineering as presented here. The current graduate curriculum fails to efficiently prepare students for multidisciplinary and convergence research, thus creating a gap between the students and research laboratory expectations. We present a convergence training framework for graduate students, incorporating problem-based learning under the guidance of senior scientists and collaboration with postdoctoral researchers. This case study serves as a template for transdisciplinary convergent training projects, bridging the expertise gap and fostering successful convergence learning experiences in computational biointerface (material-biology interface). The 18-month Advanced Data Science Workshop, initiated in 2019, involves project-based learning, online training modules, and data collection. A pilot solution utilized Jupyter notebooks on Google Colaboratory and culminated in a face-to-face workshop where project presentations and finalization occurred. The program started with 9 experts in four diverse fields creating 14 curated projects in data science (Artificial Intelligence/Machine Learning), material science, biofilm engineering, and biointerface. These were integrated into convergence research through webinars by the experts. The experts chose 8 of the 14 projects to be part of an all-day in-person workshop, where over 20 learners formed eight teams that tackled complex problems at the interface of digital image processing, gene expression analysis, and material prediction. Each team comprised students and postdoctoral researchers or research scientists from diverse domains including computer science, materials science, and biofilm research.
Some projects were selected for presentation at the international IEEE Bioinformatics conference in 2022, with three resulting Machine Learning (ML) models submitted as a journal paper. Students engaged in problem discussions, collaborated with experts from different disciplines, and received guidance in decomposing learning objectives. Based on learner feedback, this successful experience allows for consolidation and integration of convergence research via problem-based learning into the curriculum. Three bioengineering participants, who received training in data science and engineering, have obtained bioinformatics jobs in the biotechnology industry.

8.
Microorganisms ; 11(1)2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36677411

ABSTRACT

A significant amount of literature is available on biocorrosion, which makes manual extraction of crucial information such as genes and proteins a laborious task. Despite the fast growth of biology-related corrosion studies, there is a limited number of gene collections relating to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting the essential information from unstructured text. We present a text mining workflow that extracts biocorrosion-associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built with the Named Entity Recognition (NER) method and a Convolutional Neural Network (CNN) model. With PubMed and PMCID as inputs, the workflow identified 227 genes belonging to several Desulfovibrio species. To validate their functions, Gene Ontology (GO) enrichment and biological network analysis were performed using UniProtKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests the involvement of the extracted gene set in biocorrosion. Finally, the dataset was validated through manual curation, yielding a set of genes similar to that of our workflow; among these, hysB and hydA, and sat and dsrB were identified as the metal ion binding and sulfur metabolism genes, respectively. The identified genes were mapped to the pangenome of 63 SRB genomes, which yielded the distribution of these genes across the 63 SRB based on amino acid sequence similarity, and were further categorized as core and accessory gene families.
SRB's role in biocorrosion involves the transfer of electrons from the metal surface via a hydrogen medium to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes might participate in removing hydrogen from the metals through electron transfer. Moreover, the production of corrosive sulfide from sulfur metabolism indirectly contributes to the localized pitting of the metals. After corroborating the text mining results with SRB biocorrosion mechanisms, we suggest that the text mining framework could be utilized for gene/protein extraction and could significantly reduce manual curation time.

9.
Front Microbiol ; 14: 1086021, 2023.
Article in English | MEDLINE | ID: mdl-37125195

ABSTRACT

The growth and survival of an organism in a particular environment depends heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that rely on sulfate reduction for their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA G20) as a model SRB to categorize the essential genes based on their key metabolic pathways. Herein, we report a feedback loop framework for gene-of-interest discovery, from bio-problem to gene set of interest, leveraging expert annotation with computational prediction. The defined bio-problem was applied to retrieve the genes of SRB from literature databases (PubMed and PubMed Central), and the genes were annotated to the genome of OA G20. The retrieved gene list was further used to enrich protein-protein interactions and was corroborated with the pangenome analysis to categorize the enriched gene sets and their respective pathways as essential or non-essential. Interestingly, the sat gene (dde_2265) from sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis. Furthermore, pangenome analysis demonstrated the gene distribution, where 69.83% of the 116 enriched genes were mapped under "persistent," implying the essentiality of these genes. Likewise, 21.55% of the enriched genes, notably the formate dehydrogenases and metallic hydrogenases, appeared under "shell." Our methodology suggests that semi-automated text mining and network analysis may play a crucial role in deciphering previously unexplored genes and key mechanisms, which can help to generate a baseline prior to performing any experimental studies.

10.
J Mol Biol ; 435(2): 167895, 2023 01 30.
Article in English | MEDLINE | ID: mdl-36463932

ABSTRACT

Micrograph comparison remains useful in bioscience. This technology provides researchers with a quick snapshot of experimental conditions. But sometimes a two-condition comparison relies on researchers' eyes to draw conclusions. Our Bioimage Analysis, Statistic, and Comparison (BASIN) software provides an objective and reproducible comparison leveraging inferential statistics to bridge image data with other modalities. Users have access to machine learning-based object segmentation. BASIN provides several data points such as images' object counts, intensities, and areas. Hypothesis testing may also be performed. To improve BASIN's accessibility, we implemented it using R Shiny and provided both an online and an offline version. We used BASIN to process 498 image pairs involving five bioscience topics. Our framework supported either direct claims or extrapolations 57% of the time. Analysis results were manually curated to determine BASIN's accuracy, which was shown to be 78%. Additionally, each BASIN version's initial release shows an average 82% FAIR compliance score.


Subject(s)
Biofilms , Biological Science Disciplines , Image Processing, Computer-Assisted , Machine Learning , Software , Image Processing, Computer-Assisted/methods , Workflow , Datasets as Topic , Biological Science Disciplines/methods
11.
Nucleic Acids Res ; 36(Database issue): D959-65, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18063570

ABSTRACT

PlantGDB (http://www.plantgdb.org/) is a genomics database encompassing sequence data for green plants (Viridiplantae). PlantGDB provides annotated transcript assemblies for >100 plant species, with transcripts mapped to their cognate genomic context where available, integrated with a variety of sequence analysis tools and web services. For 14 plant species with emerging or complete genome sequence, PlantGDB's genome browsers (xGDB) serve as a graphical interface for viewing, evaluating and annotating transcript and protein alignments to chromosome or bacterial artificial chromosome (BAC)-based genome assemblies. Annotation is facilitated by the integrated yrGATE module for community curation of gene models. Novel web services at PlantGDB include Tracembler, an iterative alignment tool that generates contigs from GenBank trace file data, and BioExtract Server, a web-based server for executing custom sequence analysis workflows. PlantGDB also hosts a plant genomics research outreach portal (PGROP) that facilitates access to a large number of resources for research and training.


Subject(s)
Databases, Genetic , Genome, Plant , Genes, Plant , Genomics , Internet , Plant Proteins/chemistry , Plant Proteins/genetics , RNA, Messenger/chemistry , Sequence Alignment , Software , User-Computer Interface
12.
J Mol Neurosci ; 44(1): 53-8, 2011 May.
Article in English | MEDLINE | ID: mdl-21416271

ABSTRACT

Autosomal recessive spastic ataxia of Charlevoix-Saguenay is a distinct form of hereditary early-onset spastic ataxia caused by cerebellum and spinal cord degeneration. The SACS gene has been demonstrated to be responsible for the disease through worldwide description of different mutations. We report here a computational analysis of a novel SACS gene mutation identified in a Tunisian family, using a workflow implemented on the BioExtract Server. Several online computational tools are currently available to explore the effect of newly identified mutations in humans and other organisms. Such analysis is time-consuming and generates a batch of files that researchers need to extract and save. The BioExtract Server workflow described here offers an easy way to execute the required tools together, avoiding the need to enter queries independently in each web tool or service.


Subject(s)
Computational Biology/methods , DNA Mutational Analysis/methods , Heat-Shock Proteins/genetics , Mutation , Computer Simulation , Computer Systems , DNA Mutational Analysis/instrumentation , Humans , Muscle Spasticity/genetics , Pedigree , Phenotype , Spinocerebellar Ataxias/congenital , Spinocerebellar Ataxias/genetics , Tunisia
13.
Int J Plant Genomics ; 2011: 923035, 2011.
Article in English | MEDLINE | ID: mdl-22253616

ABSTRACT

The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over time, sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein.

14.
Article in English | MEDLINE | ID: mdl-20150665

ABSTRACT

Many in silico investigations in bioinformatics require access to multiple, distributed data sources and analytic tools. The requisite data sources may include large public data repositories, community databases, and project databases for use in domain-specific research. Different data sources frequently utilize distinct query languages and return results in unique formats, and therefore researchers must either rely upon a small number of primary data sources or become familiar with multiple query languages and formats. Similarly, the associated analytic tools often require specific input formats and produce unique outputs which make it difficult to utilize the output from one tool as input to another. The BioExtract Server (http://bioextract.org) is a Web-based data integration application designed to consolidate, analyze, and serve data from heterogeneous biomolecular databases in the form of a mash-up. The basic operations of the BioExtract Server allow researchers, via their Web browsers, to specify data sources, flexibly query data sources, apply analytic tools, download result sets, and store query results for later reuse. As a researcher works with the system, their "steps" are saved in the background. At any time, these steps can be preserved long-term as a workflow simply by providing a workflow name and description.


Subject(s)
Biopolymers/chemistry , Data Mining/methods , Database Management Systems , Databases, Factual , Information Dissemination/methods , Internet , Software , Biopolymers/classification , Biopolymers/physiology , Computational Biology/methods , Workflow
15.
Int J Comput Biol Drug Des ; 1(3): 302-12, 2008.
Article in English | MEDLINE | ID: mdl-20054995

ABSTRACT

Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed service designed to provide researchers with the web-based ability to query multiple data sources, save results as searchable data sets, and execute analytic tools. As the researcher works with the system, their tasks are saved in the background. At any time, these steps can be saved as a workflow that can then be executed again and/or modified later.


Subject(s)
Computational Biology/methods , Computer Communication Networks , Computational Biology/statistics & numerical data , Computer Systems , Databases, Genetic , Internet , Software Design
16.
Plant Physiol ; 139(2): 610-8, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16219921

ABSTRACT

PlantGDB (http://www.plantgdb.org/) is a database of plant molecular sequences. Expressed sequence tag (EST) sequences are assembled into contigs that represent tentative unique genes. EST contigs are functionally annotated with information derived from known protein sequences that are highly similar to the putative translation products. Tentative Gene Ontology terms are assigned to match those of the similar sequences identified. Genome survey sequences are assembled similarly. The resulting genome survey sequence contigs are matched to ESTs and conserved protein homologs to identify putative full-length open reading frame-containing genes, which are subsequently provisionally classified according to established gene family designations. For Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), the exon-intron boundaries for gene structures are annotated by spliced alignment of ESTs and full-length cDNAs to their respective complete genome sequences. Unique genome browsers have been developed to present all available EST and cDNA evidence for current transcript models (for Arabidopsis, see the AtGDB site at http://www.plantgdb.org/AtGDB/; for rice, see the OsGDB site at http://www.plantgdb.org/OsGDB/). In addition, a number of bioinformatic tools have been integrated at PlantGDB that enable researchers to carry out sequence analyses on-site using both their own data and data residing within the database.


Subject(s)
Databases, Genetic , Genome, Plant , Plants/genetics , Computational Biology , DNA, Plant/genetics , Expressed Sequence Tags , Genomics , Repetitive Sequences, Nucleic Acid , Software