Search | BVS - Ministry of Health

mSigSDK - private, at scale, computation of mutation signatures.

Ge, Aaron; Zhang, Tongwu; Martins, Yasmmin Côrtes; Landi, Maria Teresa; Park, Brian; Chen, Kailing; Balasubramanian, Jeya; Almeida, Jonas S.

ArXiv ; 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38327678

ABSTRACT

In our previous work, we demonstrated that it is feasible to perform analysis on mutation signature data without the need for downloads or installations and analyze individual patient data at scale without compromising privacy. Building on this foundation, we developed an in-browser Software Development Kit (a JavaScript SDK), mSigSDK, to facilitate the orchestration of distributed data processing workflows and graphic visualization of mutational signature analysis results. We strictly adhered to modern web computing standards, particularly the modularization standards set by the ECMAScript ES6 framework (JavaScript modules). Our approach allows for the computation to be entirely performed by secure delegation to the computational resources of the user's own machine (in-browser), without any downloads or installations. The mSigSDK was developed primarily as a companion library to the mSig Portal resource of the National Cancer Institute Division of Cancer Epidemiology and Genetics (NIH/NCI/DCEG), with a focus on FAIR extensibility as components of other researchers' own data science constructs. Anticipated extensions include the programmatic operation of other mutation signature API ecosystems such as SIGNAL and COSMIC, advancing towards a data commons for mutational signature research (Grossman et al., 2016).

PPIntegrator: semantic integrative system for protein-protein interaction and application for host-pathogen datasets.

Martins, Yasmmin Côrtes; Ziviani, Artur; Cerqueira E Costa, Maiana de Oliveira; Cavalcanti, Maria Cláudia Reis; Nicolás, Marisa Fabiana; de Vasconcelos, Ana Tereza Ribeiro.

Bioinform Adv ; 3(1): vbad067, 2023.

Article in English | MEDLINE | ID: mdl-37359724

ABSTRACT

Summary: Semantic web standards have shown importance in the last 20 years in promoting data formalization and interlinking between the existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years for the biological area, such as the broadly used Gene Ontology that contains metadata to annotate gene function and subcellular location. Another important subject in the biological area is protein-protein interactions (PPIs) which have applications like protein function inference. Current PPI databases have heterogeneous exportation methods that challenge their integration and analysis. Presently, several initiatives of ontologies covering some concepts of the PPI domain are available to promote interoperability across datasets. However, the efforts to stimulate guidelines for automatic semantic data integration and analysis for PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host-pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases and a triplification and data fusion module to describe the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host-pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrated some critical queries to analyze this kind of data and highlight the importance and usage of the semantic data generated by our system. Availability and implementation: https://github.com/YasCoMa/ppintegrator, https://github.com/YasCoMa/ppi_validation_process and https://github.com/YasCoMa/predprin.

The gene regulatory network of Staphylococcus aureus ST239-SCCmecIII strain Bmb9393 and assessment of genes associated with the biofilm in diverse backgrounds.

Costa, Maiana de Oliveira Cerqueira E; do Nascimento, Ana Paula Barbosa; Martins, Yasmmin Cortes; Dos Santos, Marcelo Trindade; Figueiredo, Agnes Marie de Sá; Perez-Rueda, Ernesto; Nicolás, Marisa Fabiana.

Front Microbiol ; 13: 1049819, 2022.

Article in English | MEDLINE | ID: mdl-36704545

ABSTRACT

Introduction: Staphylococcus aureus is one of the most prevalent and relevant pathogens responsible for a wide spectrum of hospital-associated or community-acquired infections. In addition, methicillin-resistant Staphylococcus aureus may display multidrug resistance profiles that complicate treatment and increase the mortality rate. The ability to produce biofilm, particularly in device-associated infections, promotes chronic and potentially more severe infections originating from the primary site. Understanding the complex mechanisms involved in planktonic and biofilm growth is critical to identifying regulatory connections and ways to overcome the global health problem of multidrug-resistant bacteria. Methods: In this work, we apply literature-based and comparative genomics approaches to reconstruct the gene regulatory network of the high biofilm-producing strain Bmb9393, belonging to one of the highly disseminating successful clones, the Brazilian epidemic clone. To the best of our knowledge, we describe for the first time the topological properties and network motifs for the Staphylococcus aureus pathogen. We performed this analysis using the ST239-SCCmecIII Bmb9393 strain. In addition, we analyzed transcriptomes available in the literature to construct a set of genes differentially expressed in the biofilm, covering different stages of the biofilms and genetic backgrounds of the strains. Results and discussion: The Bmb9393 gene regulatory network comprises 1,803 regulatory interactions between 64 transcription factors and the non-redundant set of 1,151 target genes with the inclusion of 19 new regulons compared to the N315 transcriptional regulatory network published in 2011. In the Bmb9393 network, we found 54 feed-forward loop motifs, where the most prevalent were coherent type 2 and incoherent type 2. The non-redundant set of differentially expressed genes in the biofilm consisted of 1,794 genes with functional categories relevant for adaptation to the variable microenvironments established throughout the biofilm formation process. Finally, we mapped the set of genes with altered expression in the biofilm in the Bmb9393 gene regulatory network to depict how different growth modes can alter the regulatory systems. The data revealed 45 transcription factors and 876 shared target genes. Thus, the gene regulatory network model provided represents the most up-to-date model for Staphylococcus aureus, and the set of genes altered in the biofilm provides a global view of their influence on biofilm formation from distinct experimental perspectives and different strain backgrounds.
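
The feed-forward loop motif mentioned in the abstract above is a three-node pattern in which a regulator X controls Y, Y controls Z, and X also controls Z directly. As a minimal sketch of how such motifs can be counted in a directed regulatory network, the Python below enumerates them from an edge list. It is not the authors' pipeline: the function name `find_feed_forward_loops` and the toy gene names are hypothetical, and the sketch ignores the activation/repression signs that would be needed to classify loops as coherent or incoherent.

```python
def find_feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z.

    `edges` is an iterable of (regulator, target) pairs. Self-regulation
    edges are ignored, so x, y and z are always three distinct nodes.
    """
    targets = {}
    for regulator, target in edges:
        if regulator != target:  # skip autoregulation
            targets.setdefault(regulator, set()).add(target)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            # z must be regulated by y and also directly by x
            for z in targets.get(y, set()):
                if z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Toy network with hypothetical names, not data from the paper.
toy_edges = [
    ("tfA", "tfB"),
    ("tfB", "geneC"),
    ("tfA", "geneC"),
    ("tfB", "geneD"),
]
print(find_feed_forward_loops(toy_edges))
```

In the toy network, `tfA` regulates `geneC` both directly and through `tfB`, so that triple is the only motif reported; `geneD` is reached only through `tfB` and does not close a loop.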

Large-Scale Protein Interactions Prediction by Multiple Evidence Analysis Associated With an In-Silico Curation Strategy.

Martins, Yasmmin Côrtes; Ziviani, Artur; Nicolás, Marisa Fabiana; de Vasconcelos, Ana Tereza Ribeiro.

Front Bioinform ; 1: 731345, 2021.

Article in English | MEDLINE | ID: mdl-36303787

ABSTRACT

Predicting the physical or functional associations through protein-protein interactions (PPIs) represents an integral approach for inferring novel protein functions and discovering new drug targets during repositioning analysis. Recent advances in high-throughput data generation and multi-omics techniques have enabled large-scale PPI predictions, thus promoting several computational methods based on different levels of biological evidence. However, integrating multiple results and strategies to optimize, extract interaction features automatically and scale up the entire PPI prediction process is still challenging. Most procedures do not offer an in-silico validation process to evaluate the predicted PPIs. In this context, this paper presents the PredPrIn scientific workflow that enables PPI prediction based on multiple lines of evidence, including the structure, sequence, and functional annotation categories, by combining boosting and stacking machine learning techniques. We also present a pipeline (PPIVPro) for the validation process based on cellular co-localization filtering and a focused search of PPI evidence in scientific publications. Thus, our combined approach provides a means for large-scale training or prediction of new PPIs and a strategy to evaluate the prediction quality. PredPrIn and PPIVPro are publicly available at https://github.com/YasCoMa/predprin and https://github.com/YasCoMa/ppi_validation_process.
