Search | BVS IEC

1.

CEBS update: curated toxicology database with enhanced tools for data integration.

Martini, Cari; Liu, Ying Frances; Gong, Hui; Sayers, Nicole; Segura, German; Fostel, Jennifer.

Nucleic Acids Res ; 50(D1): D1156-D1163, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34751388

ABSTRACT

The Chemical Effects in Biological Systems database (CEBS) contains extensive toxicology study results and metadata from the Division of the National Toxicology Program (NTP) and other studies of environmental health interest. This resource grants public access to search and collate data from over 10 250 studies for 12 750 test articles (chemicals, environmental agents). CEBS has made considerable strides over the last 5 years to integrate growing internal data repositories into data warehouses and data marts to better serve the public with high quality curated datasets. This effort includes harmonizing legacy terms and metadata to current standards, mapping test articles to external identifiers, and aligning terms to OBO (Open Biological and Biomedical Ontology) Foundry ontologies. The data are made available through the CEBS Homepage (https://cebs.niehs.nih.gov/cebs/), guided search applications, flat files on FTP (file transfer protocol), and APIs (application programming interface) for user access and to provide a bridge for computational tools. The user interface is intuitive with a single search bar to query keywords related to study metadata, publications, and data availability. Results are consolidated to single pages for each test article with NTP conclusions, publications, individual studies, data collections, and links to related test articles and projects available together.

Subjects

Databases, Factual , Systems Biology/classification , Toxicogenetics/classification , Toxicology/classification , Database Management Systems , Humans , Proteomics/classification

2.

BioMedR: an R/CRAN package for integrated data analysis pipeline in biomedical study.

Dong, Jie; Zhu, Min-Feng; Yun, Yong-Huan; Lu, Ai-Ping; Hou, Ting-Jun; Cao, Dong-Sheng.

Brief Bioinform ; 22(1): 474-484, 2021 01 18.

Article in English | MEDLINE | ID: mdl-31885044

ABSTRACT

BACKGROUND: With the increasing development of biotechnology and information technology, publicly available data in chemistry and biology are undergoing explosive growth. The wealth of information in these resources needs to be extracted and then transformed into useful knowledge by various data mining methods. However, a main computational challenge is how to effectively represent or encode the molecular objects under investigation, such as chemicals, proteins, DNAs and even complicated interactions, when data mining methods are employed. To further explore these complicated data, an integrated toolkit to represent different types of molecular objects and support various data mining algorithms is urgently needed. RESULTS: We developed a freely available R/CRAN package, called BioMedR, for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR can calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules, 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling, more than 6000 DNA descriptors from nucleotide sequences and six types of interaction descriptors using three different combining strategies. Moreover, the package implements five similarity calculation methods and four powerful clustering algorithms as well as several useful auxiliary tools, with the aim of building an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling. CONCLUSION: BioMedR provides a comprehensive and uniform R package to link up different representations of molecular objects with each other and will benefit cheminformatics/bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/.

Subjects

Computational Biology/methods , Database Management Systems , Data Management/methods , Databases, Chemical , Databases, Genetic , Humans
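
To make the idea of a sequence-derived protein descriptor in the abstract above concrete, here is a minimal Python sketch of amino acid composition, a common descriptor of this kind. It is an illustration only: the function name and the example sequence are hypothetical, it is not BioMedR code (BioMedR is an R package), and the abstract does not say whether or how BioMedR exposes this particular descriptor.

```python
# Hypothetical illustration: amino acid composition as a fixed-length
# numeric descriptor of a protein sequence (20 values that sum to 1).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    sequence = sequence.upper()
    if not sequence or any(aa not in AMINO_ACIDS for aa in sequence):
        raise ValueError("sequence must contain only the 20 standard amino acids")
    return {aa: sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS}


# Made-up example sequence of eight residues.
descriptor = amino_acid_composition("MKVLAAGA")
```

The result always has the same length regardless of sequence length, which is what lets descriptors of this kind feed directly into similarity or clustering methods.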

3.

VirusCircBase: a database of virus circular RNAs.

Cai, Zena; Fan, Yunshi; Zhang, Zheng; Lu, Congyu; Zhu, Zhaozhong; Jiang, Taijiao; Shan, Tongling; Peng, Yousong.

Brief Bioinform ; 22(2): 2182-2190, 2021 03 22.

Article in English | MEDLINE | ID: mdl-32349124

ABSTRACT

Circular RNAs (circRNAs) are covalently closed long noncoding RNAs critical in diverse cellular activities and multiple human diseases. Several cancer-related viral circRNAs have been identified in double-stranded DNA viruses (dsDNA), yet no systematic study of viral circRNAs has been reported. Herein, we have performed a systematic survey of 11 924 circRNAs from 23 viral species by computational prediction of viral circRNAs from viral-infection-related RNA sequencing data. Besides the dsDNA viruses, our study has also revealed many circRNAs in single-stranded RNA viruses and retro-transcribing viruses, such as the Zika virus, the Influenza A virus, the Zaire ebolavirus, and the Human immunodeficiency virus 1. Most viral circRNAs had reverse complementary sequences or repeated sequences at the flanking sequences of the back-splice sites. Most viral circRNAs were expressed only in a specific cell line or tissue in a specific species. Functional enrichment analysis indicated that the viral circRNAs from dsDNA viruses were involved in KEGG pathways associated with cancer. All viral circRNAs presented in the current study were stored and organized in VirusCircBase, which is freely available at http://www.computationalbiology.cn/ViruscircBase/home.html and is the first virus circRNA database. VirusCircBase forms the fundamental atlas for the further exploration and investigation of viral circRNAs in the context of public health.

Subjects

Database Management Systems , RNA, Circular/genetics , RNA, Viral/genetics , Viruses/genetics , Humans

4.

Expert-augmented machine learning.

Gennatas, Efstathios D; Friedman, Jerome H; Ungar, Lyle H; Pirracchio, Romain; Eaton, Eric; Reichmann, Lara G; Interian, Yannet; Luna, José Marcio; Simone, Charles B; Auerbach, Andrew; Delgado, Elier; van der Laan, Mark J; Solberg, Timothy D; Valdes, Gilmer.

Proc Natl Acad Sci U S A ; 117(9): 4571-4577, 2020 03 03.

Article in English | MEDLINE | ID: mdl-32071251

ABSTRACT

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.

Subjects

Expert Systems , Machine Learning/standards , Medical Informatics/methods , Data Management/methods , Database Management Systems , Medical Informatics/standards
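
The rule-filtering step described in the abstract above can be sketched roughly as follows. This is a toy illustration, not the authors' EAML implementation: the rule names, the risk values, the use of rank differences as the disagreement measure, and the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    empirical_risk: float  # relative risk observed in the data (made-up values)
    clinician_risk: float  # relative risk assessed by clinicians (made-up values)


def rank(values):
    """Return the 0-based rank of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def filter_rules(rules, max_rank_gap):
    """Keep rules whose empirical and clinician-assessed ranks roughly agree."""
    empirical = rank([r.empirical_risk for r in rules])
    assessed = rank([r.clinician_risk for r in rules])
    return [r for r, e, c in zip(rules, empirical, assessed)
            if abs(e - c) <= max_rank_gap]


rules = [
    Rule("age > 80", 2.1, 2.0),
    Rule("low blood pressure", 1.8, 1.7),
    Rule("miscoded variable", 0.4, 1.9),  # data and experts strongly disagree
    Rule("normal labs", 0.5, 0.5),
]
kept = filter_rules(rules, max_rank_gap=1)
kept_names = [r.name for r in kept]
```

In this toy data only the "miscoded variable" rule is dropped, mirroring the abstract's observation that large disagreements can point to problems in the training data.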

5.

Data management system for diabetes clinical trials: a pre-post evaluation study.

Nourani, Aynaz; Ayatollahi, Haleh; Solaymani-Dodaran, Masoud.

BMC Med Inform Decis Mak ; 23(1): 14, 2023 01 20.

Article in English | MEDLINE | ID: mdl-36670481

ABSTRACT

BACKGROUND: A data management system for diabetes clinical trials is used to support clinical data management processes. The purpose of this study was to evaluate the quality and usability of this system from the users' perspectives. METHODS: This study was conducted in 2020, and the pre-post evaluation method was used to examine the quality and usability of the designed system. Initially, a questionnaire was designed and distributed among the researchers who were involved in the diabetes clinical trials (n = 30) to investigate their expectations. Then, the researchers were asked to use the system and explain their perspectives about it by completing two questionnaires. RESULTS: There were no statistically significant differences between the users' perspectives about the information quality, service quality, achievements, and communication before and after using the system. However, in terms of the system quality (P = 0.042) and users' autonomy (P = 0.026), the users' expectations were greater than the system performance. The system usability was at a good level based on the users' opinions. CONCLUSION: It seems that the designed system largely met the users' expectations in most areas. However, the system quality and users' autonomy need further attention. In addition, the system should be used in multicenter trials and re-evaluated by a larger group of users.

Subjects

Data Management , Diabetes Mellitus , Humans , Diabetes Mellitus/therapy , Surveys and Questionnaires , Clinical Trials as Topic , Database Management Systems

6.

Evaluation of ontology structural metrics based on public repository data.

Franco, Manuel; Vivo, Juana María; Quesada-Martínez, Manuel; Duque-Ramos, Astrid; Fernández-Breis, Jesualdo Tomás.

Brief Bioinform ; 21(2): 473-485, 2020 03 23.

Article in English | MEDLINE | ID: mdl-30715146

ABSTRACT

The development and application of biological ontologies have increased significantly in recent years. These ontologies can be retrieved from different repositories, which do not provide much information about quality aspects of the ontologies. In the past years, some ontology structural metrics have been proposed, but their validity as measurement instruments has not been sufficiently studied to date. In this work, we evaluate a set of reproducible and objective ontology structural metrics. Given the lack of standard methods for this purpose, we have applied an evaluation method based on the stability and goodness of the classifications of ontologies produced by each metric on an ontology corpus. The evaluation has been done using ontology repositories as corpora. More concretely, we have used 119 ontologies from the OBO Foundry repository and 78 ontologies from AgroPortal. First, we study the correlations between the metrics. Second, we study whether the clusters for a given metric are stable and have a good structure. The results show that the existing correlations are not biasing the evaluation, there are no metrics generating unstable clusterings and all the metrics evaluated provide at least reasonable clustering structure. Furthermore, our work makes it possible to review and suggest the most reliable ontology structural metrics in terms of stability and goodness of their classifications. Availability: http://sele.inf.um.es/ontology-metrics.

Subjects

Biological Ontologies , Database Management Systems , Public Sector

7.

Community curation of bioinformatics software and data resources.

Ison, Jon; Ménager, Hervé; Brancotte, Bryan; Jaaniso, Erik; Salumets, Ahto; Racek, Tomás; Lamprecht, Anna-Lena; Palmblad, Magnus; Kalas, Matús; Chmura, Piotr; Hancock, John M; Schwämmle, Veit; Ienasescu, Hans-Ioan.

Brief Bioinform ; 21(5): 1697-1705, 2020 09 25.

Article in English | MEDLINE | ID: mdl-31624831

ABSTRACT

The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, including individual developers and projects, through major service providers and research infrastructures, can describe their own bioinformatics resources and share these via bio.tools.

Subjects

Community Participation , Computational Biology/methods , Software , Computational Biology/standards , Database Management Systems , Europe , Humans

8.

Molecular Biology Information Service: an innovative medical library-based bioinformatics support service for biomedical researchers.

Chattopadhyay, Ansuman; Iwema, Carrie L; Epstein, Barbara A; Lee, Adrian V; Levine, Arthur S.

Brief Bioinform ; 21(3): 876-884, 2020 05 21.

Article in English | MEDLINE | ID: mdl-30949666

ABSTRACT

Biomedical researchers are increasingly reliant on obtaining bioinformatics training in order to conduct their research. Here we present a model that academic institutions may follow to provide such training for their researchers, based on the Molecular Biology Information Service (MBIS) of the Health Sciences Library System, University of Pittsburgh (Pitt). The MBIS runs a four-facet service with the following goals: (1) identify, procure and implement commercially licensed bioinformatics software, (2) teach hands-on workshops using bioinformatics tools to solve research questions, (3) provide in-person and email consultations on software/databases and (4) maintain a web portal providing overall guidance on the access and use of bioinformatics resources and MBIS-created webtools. This paper describes these facets of MBIS activities from 2006 to 2018, including outcomes from a survey measuring attitudes of Pitt researchers about MBIS service and performance.

Subjects

Biomedical Research , Computational Biology/methods , Libraries, Medical/organization & administration , Research Personnel , Database Management Systems , Internet , Organizational Objectives , Software

9.

MENDA: a comprehensive curated resource of metabolic characterization in depression.

Pu, Juncai; Yu, Yue; Liu, Yiyun; Tian, Lu; Gui, Siwen; Zhong, Xiaogang; Fan, Chu; Xu, Shaohua; Song, Xuemian; Liu, Lanxiang; Yang, Lining; Zheng, Peng; Chen, Jianjun; Cheng, Ke; Zhou, Chanjuan; Wang, Haiyang; Xie, Peng.

Brief Bioinform ; 21(4): 1455-1464, 2020 07 15.

Article in English | MEDLINE | ID: mdl-31157825

ABSTRACT

Depression is a seriously disabling psychiatric disorder with a significant burden of disease. Metabolic abnormalities have been widely reported in depressed patients and animal models. However, there are few systematic efforts that integrate meaningful biological insights from these studies. Herein, available metabolic knowledge in the context of depression was integrated to provide a systematic and panoramic view of metabolic characterization. After screening more than 10 000 citations from five electronic literature databases and five metabolomics databases, we manually curated 5675 metabolite entries from 464 studies, including human, rat, mouse and non-human primate, to develop a new metabolite-disease association database, called MENDA (http://menda.cqmu.edu.cn:8080/index.php). A standardized data extraction process was used for data collection, a multi-faceted annotation scheme was developed, and a user-friendly search engine and web interface were integrated for database access. To facilitate data analysis and interpretation based on MENDA, we also proposed a systematic analytical framework, including data integration and biological function analysis. Case studies were provided that identified the consistently altered metabolites using the vote-counting method, and that captured the underlying molecular mechanism using pathway and network analyses. Collectively, we provided a comprehensive curation of metabolic characterization in depression. Our model of a specific psychiatric disorder may be replicated to study other complex diseases.

Subjects

Computational Biology/methods , Database Management Systems , Depression/metabolism , Metabolomics , Animals , Humans , Models, Animal
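
The vote-counting method mentioned in the abstract above can be illustrated with a short Python sketch. It assumes the simplest form of vote counting (+1 for each study reporting an increase, -1 for each reporting a decrease); the metabolite names, the findings, and the cut-off of two net votes are hypothetical and are not taken from MENDA.

```python
from collections import defaultdict


def vote_count(findings):
    """Sum +1 for each 'up' report and -1 for each 'down' report per metabolite."""
    votes = defaultdict(int)
    for metabolite, direction in findings:
        if direction == "up":
            votes[metabolite] += 1
        elif direction == "down":
            votes[metabolite] -= 1
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return dict(votes)


# One tuple per study-level finding (made-up data).
findings = [
    ("metabolite_A", "up"), ("metabolite_A", "up"), ("metabolite_A", "down"),
    ("metabolite_B", "down"), ("metabolite_B", "down"),
    ("metabolite_C", "up"), ("metabolite_C", "down"),
]
scores = vote_count(findings)

# Hypothetical cut-off: call a metabolite consistently altered
# when its net vote is at least 2 in either direction.
consistent = {m: s for m, s in scores.items() if abs(s) >= 2}
```

Plain vote counting ignores effect sizes and sample sizes, so a sketch like this is best read as a screening step rather than a full meta-analysis.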

10.

The global dissemination of bacterial infections necessitates the study of reverse genomic epidemiology.

Ruan, Zhi; Yu, Yunsong; Feng, Ye.

Brief Bioinform ; 21(2): 741-750, 2020 03 23.

Article in English | MEDLINE | ID: mdl-30715167

ABSTRACT

Whole genome sequencing (WGS) has revolutionized the genotyping of bacterial pathogens and is expected to become the new gold standard for tracing the transmissions of bacterial infectious diseases for public health purposes. Traditional genomic epidemiology often uses WGS as a verification tool, namely, when a common source or epidemiological link is suspected, the collected isolates are sequenced for the determination of clonal relationships. However, increasingly frequent international travel and food transportation, and the associated potential for the cross-border transmission of bacterial pathogens, often lead to an absence of information on bacterial transmission routes. Here we introduce the concept of 'reverse genomic epidemiology', i.e. when isolates are inspected by genome comparisons to be sufficiently similar to one another, they are assumed to be a consequence of infection from a common source. Through BacWGSTdb (http://bacdb.org/BacWGSTdb/), a database we have developed for bacterial genome typing and source tracking, we have found that almost all of the 20 bacterial species analyzed exhibit the phenomenon of cross-border clonal dissemination. Five networks were further identified in which isolates sharing nearly identical genomes were collected from at least five different countries. Three of these have been documented as real infectious disease outbreaks, therefore demonstrating the feasibility and authority of reverse genomic epidemiology. Our survey and proposed strategy would be of potential value in establishing a global surveillance system for tracing bacterial transmissions and outbreaks; the related database and techniques require urgent standardization.

Subjects

Bacterial Infections/epidemiology , Genome, Bacterial , Global Health , Molecular Epidemiology/methods , Bacterial Infections/genetics , Database Management Systems , Disease Outbreaks , Humans , Whole Genome Sequencing
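
The idea of reverse genomic epidemiology in the abstract above, grouping isolates by genome similarity first and inferring a common source afterwards, can be sketched as single-linkage clustering on pairwise distances. Everything specific here is an assumption for illustration: the isolate names, countries, SNP distances, and the threshold of 10 SNPs are made up, and this is not how BacWGSTdb is implemented.

```python
def cluster_isolates(isolates, distances, max_snps):
    """Group isolates linked by pairwise distances <= max_snps (single linkage)."""
    parent = {name: name for name in isolates}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in distances.items():
        if snps <= max_snps:
            parent[find(a)] = find(b)

    clusters = {}
    for name in isolates:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical isolates mapped to their country of collection.
isolates = {"iso1": "CN", "iso2": "US", "iso3": "DE", "iso4": "CN"}
# Hypothetical pairwise SNP distances.
distances = {
    ("iso1", "iso2"): 3, ("iso2", "iso3"): 8, ("iso1", "iso3"): 9,
    ("iso1", "iso4"): 250, ("iso2", "iso4"): 248, ("iso3", "iso4"): 251,
}

clusters = cluster_isolates(isolates, distances, max_snps=10)
# Clusters whose members come from more than one country suggest
# cross-border dissemination of a clone.
cross_border = [c for c in clusters if len({isolates[i] for i in c}) > 1]
```

A suitable distance threshold depends on the species and the typing scheme, so the value used here should not be read as a recommendation.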

11.

FishNET: An automated relational database for zebrafish colony management.

Cantu Gutierrez, Abiud; Cantu Gutierrez, Manuel; Rhyner, Alexander M; Ruiz, Oscar E; Eisenhoffer, George T; Wythe, Joshua D.

PLoS Biol ; 17(6): e3000343, 2019 06.

Article in English | MEDLINE | ID: mdl-31220074

ABSTRACT

The zebrafish Danio rerio is a powerful model system to study the genetics of development and disease. However, maintenance of zebrafish husbandry records is both time intensive and laborious, and a standardized way to manage and track the large number of unique lines in a given laboratory or centralized facility has not been embraced by the field. Here, we present FishNET, an intuitive, open-source, relational database for managing data and information related to zebrafish husbandry and maintenance. By creating a "virtual facility," FishNET enables users to remotely inspect the rooms, racks, tanks, and lines within a given facility. Importantly, FishNET scales from one laboratory to an entire facility with several laboratories to multiple facilities, generating a cohesive laboratory and community-based platform. Automated data entry eliminates confusion regarding line nomenclature and streamlines maintenance of individual lines, while flexible query forms allow researchers to retrieve database records based on user-defined criteria. FishNET also links associated embryonic and adult biological samples with data, such as genotyping results or confocal images, to enable robust and efficient colony management and storage of laboratory information. A shared calendar function with email notifications and automated reminders for line turnover, automated tank counts, and census reports promote communication with both end users and administrators. The expected benefits of FishNET are improved vivaria efficiency, increased quality control for experimental numbers, and flexible data reporting and retrieval. FishNET's easy, intuitive record management and open-source, end-user-modifiable architecture provide an efficient solution to real-time zebrafish colony management for users throughout a facility and institution and, in some cases, across entire research hubs.

Subjects

Animal Husbandry/methods , Zebrafish , Animal Husbandry/standards , Animals , Data Management/methods , Database Management Systems , Databases, Factual , Laboratories , Software

12.

Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.

Konopka, Tomasz; Ng, Sandra; Smedley, Damian.

PLoS Comput Biol ; 17(8): e1009283, 2021 08.

Article in English | MEDLINE | ID: mdl-34379637

ABSTRACT

Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation.

Subjects

Knowledge Bases , Learning , Systems Integration , User-Computer Interface , Algorithms , Database Management Systems , Humans

13.

RESCRIPt: Reproducible sequence taxonomy reference database management.

Robeson, Michael S; O'Rourke, Devon R; Kaehler, Benjamin D; Ziemski, Michal; Dillon, Matthew R; Foster, Jeffrey T; Bokulich, Nicholas A.

PLoS Comput Biol ; 17(11): e1009581, 2021 11.

Article in English | MEDLINE | ID: mdl-34748542

ABSTRACT

Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.

Subjects

Database Management Systems , Databases, Genetic/statistics & numerical data , Software , Animals , Classification , Computational Biology , DNA Barcoding, Taxonomic , Databases, Nucleic Acid , Genomics , Humans , Metagenome , Metagenomics , Microbiota/genetics , Phylogeny , RNA, Ribosomal, 16S/genetics , Sequence Analysis

14.

CGRdb2.0: A Python Database Management System for Molecules, Reactions, and Chemical Data.

Gimadiev, Timur; Nugmanov, Ramil; Khakimova, Aigul; Fatykhova, Adeliya; Madzhidov, Timur; Sidorov, Pavel; Varnek, Alexandre.

J Chem Inf Model ; 62(9): 2015-2020, 2022 05 09.

Article in English | MEDLINE | ID: mdl-34843251

ABSTRACT

This work introduces CGRdb2.0, an open-source database management system for molecules, reactions, and chemical data. CGRdb2.0 is a Python package connecting to a PostgreSQL database that enables native searches for molecules and reactions without complicated SQL syntax. The library provides out-of-the-box implementations for similarity and substructure searches for molecules, as well as similarity and substructure searches for reactions in two ways: based on reaction components and based on the Condensed Graph of Reaction approach, the latter significantly accelerating the performance. In benchmarking studies with the RDKit database cartridge, we demonstrate that CGRdb2.0 performs searches faster for smaller data sets, while allowing for interactive access to the retrieved data.

Subjects

Benchmarking , Database Management Systems , Databases, Factual

15.

AnFiSA: An open-source computational platform for the analysis of sequencing data for rare genetic disease.

Bouzinier, M A; Etin, D; Trifonov, S I; Evdokimova, V N; Ulitin, V; Shen, J; Kokorev, A; Ghazani, A A; Chekaluk, Y; Albertyn, Z; Giersch, A; Morton, C C; Abraamyan, F; Bendapudi, P K; Sunyaev, S; Krier, J B.

J Biomed Inform ; 133: 104174, 2022 09.

Article in English | MEDLINE | ID: mdl-35998814

ABSTRACT

Despite genomic sequencing rapidly transforming from being a bench-side tool to a routine procedure in a hospital, there is a noticeable lack of genomic analysis software that supports both clinical and research workflows as well as crowdsourcing. Furthermore, most existing software packages are not forward-compatible in regards to supporting ever-changing diagnostic rules adopted by the genetics community. Regular updates of genomics databases pose challenges for reproducible and traceable automated genetic diagnostics tools. Lastly, most of the software tools score low on explainability amongst clinicians. We have created a fully open-source variant curation tool, AnFiSA, with the intention to invite and accept contributions from clinicians, researchers, and professional software developers. The design of AnFiSA addresses the aforementioned issues via the following architectural principles: using a multidimensional database management system (DBMS) for genomic data to address reproducibility, curated decision trees adaptable to changing clinical rules, and a crowdsourcing-friendly interface to address difficult-to-diagnose cases. We discuss how we have chosen our technology stack and describe the design and implementation of the software. Finally, we show in detail how selected workflows can be implemented using the current version of AnFiSA by a medical geneticist.

Subjects

Genomics , Software , Computational Biology/methods , Database Management Systems , Databases, Genetic , Genomics/methods , Reproducibility of Results , Workflow

16.

Practical implications of using non-relational databases to store large genomic data files and novel phenotypes.

Moreira Souza, André; Weigert, Rodrigo de Andrade Santos; Machado de Sousa, Elaine Parros; Tassoni Andrietta, Lucas; Ventura, Ricardo Vieira.

J Anim Breed Genet ; 139(1): 100-112, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34459042

ABSTRACT

The objective of our study was to provide practical directions on the storage of genomic information and novel phenotypes (treated here as unstructured data) using a non-relational database. The MongoDB technology was assessed for this purpose, enabling frequent data transactions involving numerous individuals under genetic evaluation. Our study investigated different genomic (Illumina Final Report, PLINK, 0125, FASTQ, and VCF formats) and phenotypic (including media files) information, using both real and simulated datasets. Advantages of our centralized database concept include the sublinear running time for queries after increasing the number of samples/markers exponentially, in addition to the comprehensive management of distinct data formats while searching for specific genomic regions. A comparison of our non-relational and generic solution with an existing relational approach (developed for tabular data types using 2 bits to store genotypes) showed that the relational schema achieved a shorter import time when handling 50M SNPs (PLINK format). Our experimental results also reinforce that data conversion is a costly step required to load genomic data into both relational and non-relational database systems and therefore must be carefully treated for large applications.

Subjects

Database Management Systems , Information Storage and Retrieval , Animals , Genomics , Genotype , Phenotype
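
The abstract above mentions a relational approach that uses 2 bits to store each genotype. A minimal Python sketch of what such packing can look like follows. The code assignment (0, 1, 2 for allele counts and 3 for missing) and the bit order are assumptions chosen for the example; they do not follow any particular file format or the schema evaluated in the study.

```python
MISSING = 3  # hypothetical code for a missing genotype call


def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2 or MISSING) into bytes, four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, code in enumerate(genotypes):
        if code not in (0, 1, 2, MISSING):
            raise ValueError(f"invalid genotype code: {code!r}")
        # The first genotype of each group of four goes in the lowest two bits.
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, count):
    """Recover the first `count` genotype codes from packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


genotypes = [0, 1, 2, MISSING, 2, 2]
packed = pack_genotypes(genotypes)
restored = unpack_genotypes(packed, len(genotypes))
```

Six genotypes fit in two bytes here, a quarter of the space needed at one byte per genotype. The number of genotypes has to be stored separately because the final byte may be only partly used.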

17.

Propedia: a database for protein-peptide identification based on a hybrid clustering algorithm.

Martins, Pedro M; Santos, Lucianna H; Mariano, Diego; Queiroz, Felippe C; Bastos, Luana L; Gomes, Isabela de S; Fischer, Pedro H C; Rocha, Rafael E O; Silveira, Sabrina A; de Lima, Leonardo H F; de Magalhães, Mariana T Q; Oliveira, Maria G A; de Melo-Minardi, Raquel C.

BMC Bioinformatics ; 22(1): 1, 2021 Jan 02.

Article in English | MEDLINE | ID: mdl-33388027

ABSTRACT

BACKGROUND: Protein-peptide interactions play a fundamental role in a wide variety of biological processes, such as cell signaling, regulatory networks, immune responses, and enzyme inhibition. Peptides are characterized by low toxicity and small interface areas; therefore, they are good targets for therapeutic strategies, rational drug planning and protein inhibition. Approximately 10% of the ethical pharmaceutical market is protein/peptide-based. Furthermore, it is estimated that 40% of protein interactions are mediated by peptides. Despite the fast increase in the volume of biological data, particularly on sequences and structures, there remains a lack of broad and comprehensive protein-peptide databases and tools that allow the retrieval, characterization and understanding of protein-peptide recognition and consequently support peptide design. RESULTS: We introduce Propedia, a comprehensive and up-to-date database with a web interface that permits clustering, searching and visualizing of protein-peptide complexes according to varied criteria. Propedia comprises over 19,000 high-resolution structures from the Protein Data Bank including structural and sequence information from protein-peptide complexes. The main advantage of Propedia over other peptide databases is that it allows a more comprehensive analysis of similarity and redundancy. It was constructed based on a hybrid clustering algorithm that compares and groups peptides by sequences, interface structures and binding sites. Propedia is available through a graphical, user-friendly and functional interface where users can retrieve and analyze complexes and download each search data set. We performed case studies and verified the utility of Propedia scores for ranking promising interacting peptides. In a study involving predicting peptides to inhibit SARS-CoV-2 main protease, we showed that Propedia scores related to similarity between different peptide complexes with SARS-CoV-2 main protease are in agreement with molecular dynamics free energy calculation. CONCLUSIONS: Propedia is a database and tool to support structure-based rational design of peptides for special purposes. Protein-peptide interactions can be useful for predicting, classifying and scoring complexes or for designing new molecules as well. Propedia is up-to-date as a ready-to-use webserver with a friendly and resourceful interface and is available at: https://bioinfo.dcc.ufmg.br/propedia.

Subjects

Database Management Systems , Databases, Protein , Peptides/chemistry , Proteins/chemistry , Algorithms , Humans

18.

PDB-tools web: A user-friendly interface for the manipulation of PDB files.

Jiménez-García, Brian; Teixeira, João M C; Trellet, Mikael; Rodrigues, João P G L M; Bonvin, Alexandre M J J.

Proteins ; 89(3): 330-335, 2021 03.

Article in English | MEDLINE | ID: mdl-33111403

ABSTRACT

The Protein Data Bank (PDB) file format remains a popular format used and supported by many software packages to represent coordinates of macromolecular structures. However, it suffers from drawbacks such as error-prone manual editing. Because of that, various software toolkits have been developed to facilitate its editing and manipulation, but, to date, there is no online tool available for this purpose. Here we present PDB-Tools Web, a flexible online service for manipulating PDB files. It offers a rich and user-friendly graphical user interface that allows users to mix-and-match more than 40 individual tools from the pdb-tools suite. These can be combined in a few clicks into complex pipelines, which can be saved and uploaded. The resulting processed PDB files can be visualized online and downloaded. The web server is freely available at https://wenmr.science.uu.nl/pdbtools.

Subjects

Database Management Systems , Databases, Protein , User-Computer Interface , Internet , Models, Molecular , Protein Conformation , Proteins/chemistry

19.

Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases.

Rifaioglu, Ahmet Sureyya; Atas, Heval; Martin, Maria Jesus; Cetin-Atalay, Rengul; Atalay, Volkan; Dogan, Tunca.

Brief Bioinform ; 20(5): 1878-1912, 2019 09 27.

Article in English | MEDLINE | ID: mdl-30084866

ABSTRACT

The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins along with the experimentally verified bio-interaction information to generate predictive models. Lately, sophisticated machine learning techniques have been applied in VS to elevate the predictive performance. The objective of this study is to examine and discuss the recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented amount of research studies considering the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges, and suggest future directions. We believe that this survey will provide insight to the researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.

Subjects

Database Management Systems , Deep Learning , Drug Discovery , Computer Simulation

20.

Big data management challenges in health research-a literature review.

Wang, Xiaoming; Williams, Carolyn; Liu, Zhen Hua; Croghan, Joe.

Brief Bioinform ; 20(1): 156-167, 2019 01 18.

Article in English | MEDLINE | ID: mdl-28968677

ABSTRACT

Big data management for information centralization (i.e. making data of interest findable) and integration (i.e. making related data connectable) in health research is a defining challenge in biomedical informatics. While essential to create a foundation for knowledge discovery, optimized solutions to deliver high-quality and easy-to-use information resources are not thoroughly explored. In this review, we identify the gaps between current data management approaches and the need for new capacity to manage big data generated in advanced health research. Focusing on these unmet needs and well-recognized problems, we introduce state-of-the-art concepts, approaches and technologies for data management from computing academia and industry to explore improvement solutions. We explain the potential and significance of these advances for biomedical informatics. In addition, we discuss specific issues that have a great impact on technical solutions for developing the next generation of digital products (tools and data) to facilitate the raw-data-to-knowledge process in health research.

Subjects

Big Data , Computational Biology/methods , Computational Biology/statistics & numerical data , Computational Biology/trends , Database Management Systems/statistics & numerical data , Database Management Systems/trends , Humans , Knowledge Bases , Machine Learning , Research/statistics & numerical data
