1.
PeerJ Comput Sci ; 7: e527, 2021.
Article in English | MEDLINE | ID: mdl-34013039

ABSTRACT

Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the parallel execution control design is to manage workflow data for efficient executions while enabling user steering support. Data access for high scalability is typically transaction-oriented, while for data analysis, it is online analytical-oriented so that managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB's principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB's overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
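The hybrid workload described in this abstract, transactional task scheduling alongside analytical steering queries on the same data, can be illustrated with a minimal single-node sketch. This is not SchalaDB or d-Chiron code: it uses Python's built-in `sqlite3` in-memory database as a stand-in for a parallel and distributed in-memory DBMS, and the `task` table, its columns, and the helper names `make_store`, `claim_next_task`, and `steering_summary` are hypothetical.

```python
import sqlite3


def make_store(n_tasks):
    """Create an in-memory task table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT NOT NULL, worker TEXT)"
    )
    conn.executemany(
        "INSERT INTO task (id, status) VALUES (?, 'READY')",
        [(i,) for i in range(1, n_tasks + 1)],
    )
    conn.commit()
    return conn


def claim_next_task(conn, worker):
    """Transactional side: move the lowest-numbered READY task to RUNNING.

    A real parallel engine would rely on the DBMS's concurrency control
    so that two workers never claim the same task.
    """
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM task WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to schedule
        conn.execute(
            "UPDATE task SET status = 'RUNNING', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
        return row[0]


def steering_summary(conn):
    """Analytical side: count tasks per status, as a user steering query might."""
    cursor = conn.execute("SELECT status, COUNT(*) FROM task GROUP BY status")
    return dict(cursor.fetchall())


conn = make_store(5)
first = claim_next_task(conn, "worker-a")
second = claim_next_task(conn, "worker-b")
summary = steering_summary(conn)
```

Both kinds of access hit the same table here, which is the point of the sketch; it says nothing about the distribution, partitioning, or availability techniques the paper evaluates.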

2.
Environ Microbiol ; 22(11): 4557-4570, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32700350

ABSTRACT

Cyanobacteria of the genus Synechococcus are major contributors to global primary productivity and are found in a wide range of aquatic ecosystems. This Synechococcus collective (SC) is metabolically diverse, with some lineages thriving in polar and nutrient-rich locations and others in tropical or riverine waters. Although many studies have discussed the ecology and evolution of the SC, there is a paucity of knowledge on its taxonomic structure. Thus, we present a new taxonomic classification framework for the SC based on recent advances in microbial genomic taxonomy. Phylogenomic analyses of 1085 cyanobacterial genomes demonstrate that organisms classified as Synechococcus are polyphyletic at the order rank. The SC is classified into 15 genera, which are placed into five distinct orders within the phylum Cyanobacteria: (i) Synechococcales (Cyanobium, Inmanicoccus, Lacustricoccus gen. nov., Parasynechococcus, Pseudosynechococcus, Regnicoccus, Synechospongium gen. nov., Synechococcus and Vulcanococcus); (ii) Cyanobacteriales (Limnothrix); (iii) Leptococcales (Brevicoccus and Leptococcus); (iv) Thermosynechococcales (Stenotopis and Thermosynechococcus) and (v) Neosynechococcales (Neosynechococcus). The newly proposed classification is consistent with habitat distribution patterns (seawater, freshwater, brackish and thermal environments) and reflects the ecological and evolutionary relationships of the SC.


Subjects
Genome, Bacterial/genetics , Synechococcus/classification , Synechococcus/genetics , Ecosystem , Fresh Water/microbiology , Genomics , Iron/metabolism , Phylogeny , Saline Waters , Seawater/microbiology , Synechococcus/metabolism
3.
PeerJ ; 6: e5551, 2018.
Article in English | MEDLINE | ID: mdl-30186700

ABSTRACT

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.

4.
Article in English | MEDLINE | ID: mdl-26454874

ABSTRACT

A new open-access database, Brazilian Marine Biodiversity (BaMBa) (https://marinebiodiversity.lncc.br), was developed to maintain large datasets from the Brazilian marine environment. Essentially, any environmental information can be added to BaMBa. Certified datasets obtained from integrated holistic studies, comprising physical-chemical parameters, -omics, microbiology, benthic and fish surveys, can be deposited in the new database, enabling scientific, industrial and governmental policies and actions to be undertaken on marine resources. Many databases exist; however, BaMBa is the only integrated database resource that is both supported by a government initiative and dedicated exclusively to marine data. BaMBa is linked to the Information System on Brazilian Biodiversity (SiBBr, http://www.sibbr.gov.br/) and will offer opportunities for improved governance of marine resources and scientists' integration. Database URL: http://marinebiodiversity.lncc.br.


Subjects
Aquatic Organisms , Biota/physiology , Databases, Factual , Animals , Aquatic Organisms/classification , Aquatic Organisms/physiology , Brazil
5.
OMICS ; 18(8): 524-38, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24960463

ABSTRACT

A key focus in 21st-century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluating similarities and differences among species, and thus allowing the transfer of annotation from known/curated proteins to new/non-annotated ones. Here we used a sensitive profile-based methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was applied to five protozoan proteomes, showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The inferred core protozoan proteome comprised 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) using COG, 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis) and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones); (ii) using KOG, 21.45% (151/704) belong to category J and 13.92% (98/704) to category O. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself to applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
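The category shares quoted in this abstract can be recomputed from the reported counts. The short check below is only a sketch: the counts and percentages are taken from the abstract, while the dictionary layout and the helper name `percent` are illustrative choices.

```python
# (database, category) -> (groups in category, core groups, percentage as reported)
reported = {
    ("COG", "J"): (132, 418, 31.58),
    ("COG", "O"): (41, 418, 9.81),
    ("KOG", "J"): (151, 704, 21.45),
    ("KOG", "O"): (98, 704, 13.92),
}


def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


# Recompute each share from the raw counts.
recomputed = {key: percent(part, total) for key, (part, total, _) in reported.items()}

# Keys whose recomputed share differs from the value stated in the abstract.
mismatches = [key for key, (_, _, stated) in reported.items() if recomputed[key] != stated]
```

With the figures above, `mismatches` stays empty, so the stated percentages agree with the counts.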


Subjects
Genes, Protozoan , Protozoan Proteins/genetics , Archaea/genetics , Bacteria/genetics , Databases, Protein , Entamoeba histolytica/genetics , Genomics , Global Health , Humans , Leishmania major/genetics , Molecular Sequence Annotation , Phylogeny , Plasmodium falciparum/genetics , Sequence Homology, Nucleic Acid , Trypanosoma brucei brucei/genetics , Trypanosoma cruzi/genetics
6.
BMC Res Notes ; 7: 132, 2014 Mar 07.
Article in English | MEDLINE | ID: mdl-24606808

ABSTRACT

BACKGROUND: The STINGRAY system was conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. FINDINGS: STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. CONCLUSION: STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system may run faster on Sanger data, since large NGS datasets could slow down MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.


Subjects
Computational Biology/methods , Genomics/methods , Software , Workflow , Databases, Factual/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Internet , Phylogeny , Reproducibility of Results
7.
Int J Data Min Bioinform ; 4(3): 256-80, 2010.
Article in English | MEDLINE | ID: mdl-20681479

ABSTRACT

Bioinformatics experiments are typically composed of programs in pipelines manipulating an enormous quantity of data. An interesting approach for managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies in trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over scripting languages and point out challenges to WfMS in distributed environments.


Subjects
Genome, Protozoan , Genomics/methods , Workflow , Metabolic Networks and Pathways/genetics , Science , Sequence Homology
8.
Nucleic Acids Res ; 36(Database issue): D547-52, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17981844

ABSTRACT

ProtozoaDB (http://www.biowebdb.org/protozoadb) is being developed to initially host both genomics and post-genomics data from Plasmodium falciparum, Entamoeba histolytica, Trypanosoma brucei, T. cruzi and Leishmania major, but will hopefully host other protozoan species as more genomes are sequenced. It is based on the Genomics Unified Schema and offers a modern Web-based interface for user-friendly data visualization and exploration. This database is not intended to duplicate other similar efforts such as GeneDB, PlasmoDB, TcruziDB or even TDRtargets, but to be complementary by providing further analyses with emphasis on distant similarities (HMM-based) and phylogeny-based annotations including orthology analysis. ProtozoaDB will be progressively linked to the above-mentioned databases, focusing on performing a multi-source dynamic combination of information through advanced interoperable Web tools such as Web services. Providing Web services will also allow third-party software to retrieve and use data from ProtozoaDB in automated pipelines (workflows) or other interoperable Web technologies, promoting better information reuse and integration. We also expect ProtozoaDB to catalyze the development of local and regional bioinformatics capabilities (research and training), and therefore promote/enhance scientific advancement in developing countries.


Subjects
Databases, Genetic , Genome, Protozoan , Animals , Computer Graphics , Entamoeba histolytica/genetics , Genomics , Internet , Leishmania major/genetics , Plasmodium falciparum/genetics , Protozoan Proteins/chemistry , Software , Trypanosoma brucei brucei/genetics , Trypanosoma cruzi/genetics , User-Computer Interface