Search | Secretaria de Estado da Saúde

1.

EnzymeML: seamless data flow and modeling of enzymatic data.

Lauterbach, Simone; Dienhart, Hannah; Range, Jan; Malzacher, Stephan; Spöring, Jan-Dirk; Rother, Dörte; Pinto, Maria Filipa; Martins, Pedro; Lagerman, Colton E; Bommarius, Andreas S; Høst, Amalie Vang; Woodley, John M; Ngubane, Sandile; Kudanga, Tukayi; Bergmann, Frank T; Rohwer, Johann M; Iglezakis, Dorothea; Weidemann, Andreas; Wittig, Ulrike; Kettner, Carsten; Swainston, Neil; Schnell, Santiago; Pleiss, Jürgen.

Nat Methods ; 20(3): 400-402, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36759590

ABSTRACT

The design of biocatalytic reaction systems is highly complex owing to the dependency of the estimated kinetic parameters on the enzyme, the reaction conditions, and the modeling method. Consequently, reproducibility of enzymatic experiments and reusability of enzymatic data are challenging. We developed the XML-based markup language EnzymeML to enable storage and exchange of enzymatic data such as reaction conditions, the time course of the substrate and the product, kinetic parameters and the kinetic model, thus making enzymatic data findable, accessible, interoperable and reusable (FAIR). The feasibility and usefulness of the EnzymeML toolbox are demonstrated in six scenarios, for which data and metadata of different enzymatic reactions are collected and analyzed. EnzymeML serves as a seamless communication channel between experimental platforms, electronic lab notebooks, tools for modeling of enzyme kinetics, publication platforms and enzymatic reaction databases. EnzymeML is open and transparent, and invites the community to contribute. All documents and codes are freely available at https://enzymeml.org.

Subjects

Data Management; Metadata; Reproducibility of Results; Databases, Factual; Kinetics

2.

The Importance, Challenges, and Possible Solutions for Sharing Proteomics Data While Safeguarding Individuals' Privacy.

Shome, Mahasish; MacKenzie, Tim M G; Subbareddy, Smitha R; Snyder, Michael P.

Mol Cell Proteomics ; 23(3): 100731, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38331191

ABSTRACT

Proteomics data sharing has profound benefits at the individual level as well as at the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, the reluctance of researchers with regard to data sharing is evident, as many share only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially making the dataset useless. This behavior can be explained by a lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can accelerate the norm of data sharing for the field of proteomics, as has been the standard in genomics for decades. In this article, we have put together various repository options available for proteomics data. We have also added pros and cons of those repositories to help researchers select the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Data sets that would be useless without personal identifiers need to be shared in a controlled access repository so that only authorized researchers can access the data and personal identifiers are kept safe.

Subjects

Privacy; Proteomics; Humans; Genomics; Metadata; Information Dissemination

3.

MetaboLights: open data repository for metabolomics.

Yurekten, Ozgur; Payne, Thomas; Tejera, Noemi; Amaladoss, Felix Xavier; Martin, Callum; Williams, Mark; O'Donovan, Claire.

Nucleic Acids Res ; 52(D1): D640-D646, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971328

ABSTRACT

MetaboLights is a global database for metabolomics studies including the raw experimental data and the associated metadata. The database is cross-species and cross-technique and covers metabolite structures and their reference spectra as well as their biological roles and locations where available. MetaboLights is the recommended metabolomics repository for a number of leading journals and ELIXIR, the European infrastructure for life science information. In this article, we describe the continued growth and diversity of submissions and the significant developments in recent years. In particular, we highlight MetaboLights Labs, our new Galaxy Project instance with repository-scale standardized workflows, and how data public on MetaboLights are being reused by the community. Metabolomics resources and data are available under the EMBL-EBI's Terms of Use at https://www.ebi.ac.uk/metabolights and under Apache 2.0 at https://github.com/EBI-Metabolights.

Subjects

Databases, Genetic; Metabolomics; Metabolomics/methods; Metadata; Internet

4.

SilkMeta: a comprehensive platform for sharing and exploiting pan-genomic and multi-omic silkworm data.

Lu, Kunpeng; Pan, Yifei; Shen, Jianghong; Yang, Lin; Zhan, Chengyu; Liang, Shubo; Tai, Shuaishuai; Wan, Linrong; Li, Tian; Cheng, Tingcai; Ma, Bi; Pan, Guoqing; He, Ningjia; Lu, Cheng; Westhof, Eric; Xiang, Zhonghuai; Han, Min-Jin; Tong, Xiaoling; Dai, Fangyin.

Nucleic Acids Res ; 52(D1): D1024-D1032, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37941143

ABSTRACT

The silkworm Bombyx mori is a domesticated insect that serves as an animal model for research and agriculture. The silkworm super-pan-genome dataset, which we published last year, is a unique resource for the study of global genomic diversity and phenotype-genotype association. Here we present SilkMeta (http://silkmeta.org.cn), a comprehensive database covering the available silkworm pan-genome and multi-omics data. The database contains 1082 short-read genomes, 546 long-read assembled genomes, 1168 transcriptomes, 294 phenotype characterizations (phenome), tens of millions of variations (variome), 7253 long non-coding RNAs (lncRNAs), 18 717 full length transcripts and a set of population statistics. We have compiled publications on functional genomics research and genetic stock deciphering (mutant map). A range of bioinformatics tools is also provided for data visualization and retrieval. The large batch of omics data and tools were integrated in twelve functional modules that provide useful strategies and data for comparative and functional genomics research. The interactive bioinformatics platform SilkMeta will benefit not only the silkworm but also the insect biology communities.

Subjects

Bombyx; Genome, Insect; Animals; Bombyx/genetics; Computational Biology; Genomics; Metadata; Multiomics

5.

DDBJ update in 2023: the MetaboBank for metabolomics data and associated metadata.

Ara, Takeshi; Kodama, Yuichi; Tokimatsu, Toshiaki; Fukuda, Asami; Kosuge, Takehide; Mashima, Jun; Tanizawa, Yasuhiro; Tanjo, Tomoya; Ogasawara, Osamu; Fujisawa, Takatomo; Nakamura, Yasukazu; Arita, Masanori.

Nucleic Acids Res ; 52(D1): D67-D71, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971299

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample information along with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute (EBI). Besides INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open for domestic life science researchers to analyze large-scale sequence data. This paper reports recent updates on the archival databases and the services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample in its metadata description making it suitable for multi-omics large studies. Its collaboration with MetaboLights at EBI brings synergy in locating and reusing public data.

Subjects

Databases, Nucleic Acid; Metabolomics; Metadata; Humans; Computational Biology; Genomics; Internet; Japan; Multiomics/methods

6.

IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata.

Camargo, Antonio Pedro; Call, Lee; Roux, Simon; Nayfach, Stephen; Huntemann, Marcel; Palaniappan, Krishnaveni; Ratner, Anna; Chu, Ken; Mukherjee, Supratim; Reddy, T B K; Chen, I-Min A; Ivanova, Natalia N; Eloe-Fadrosh, Emiley A; Woyke, Tanja; Baltrus, David A; Castañeda-Barba, Salvador; de la Cruz, Fernando; Funnell, Barbara E; Hall, James P J; Mukhopadhyay, Aindrila; Rocha, Eduardo P C; Stalder, Thibault; Top, Eva; Kyrpides, Nikos C.

Nucleic Acids Res ; 52(D1): D164-D173, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37930866

ABSTRACT

Plasmids are mobile genetic elements found in many clades of Archaea and Bacteria. They drive horizontal gene transfer, impacting ecological and evolutionary processes within microbial communities, and hold substantial importance in human health and biotechnology. To support plasmid research and provide scientists with data of an unprecedented diversity of plasmid sequences, we introduce the IMG/PR database, a new resource encompassing 699 973 plasmid sequences derived from genomes, metagenomes and metatranscriptomes. IMG/PR is the first database to provide data on plasmids that were systematically identified from diverse microbiome samples. IMG/PR plasmids are associated with rich metadata that includes geographical and ecosystem information, host taxonomy, similarity to other plasmids, functional annotation, and presence of genes involved in conjugation and antibiotic resistance. The database offers diverse methods for exploring its extensive plasmid collection, enabling users to navigate plasmids through metadata-centric queries, plasmid comparisons and BLAST searches. The web interface for IMG/PR is accessible at https://img.jgi.doe.gov/pr. Plasmid metadata and sequences can be downloaded from https://genome.jgi.doe.gov/portal/IMG_PR.

Subjects

Metagenome; Microbiota; Humans; Metadata; Software; Databases, Genetic; Plasmids/genetics

7.

Expression Atlas update: insights from sequencing data at both bulk and single cell level.

George, Nancy; Fexova, Silvie; Fuentes, Alfonso Munoz; Madrigal, Pedro; Bi, Yalan; Iqbal, Haider; Kumbham, Upendra; Nolte, Nadja Francesca; Zhao, Lingyun; Thanki, Anil S; Yu, Iris D; Marugan Calles, Jose C; Erdos, Karoly; Vilmovsky, Liora; Kurri, Sandeep R; Vathrakokoili-Pournara, Anna; Osumi-Sutherland, David; Prakash, Ananth; Wang, Shengbo; Tello-Ruiz, Marcela K; Kumari, Sunita; Ware, Doreen; Goutte-Gattat, Damien; Hu, Yanhui; Brown, Nick; Perrimon, Norbert; Vizcaíno, Juan Antonio; Burdett, Tony; Teichmann, Sarah; Brazma, Alvis; Papatheodorou, Irene.

Nucleic Acids Res ; 52(D1): D107-D114, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37992296

ABSTRACT

Expression Atlas (www.ebi.ac.uk/gxa) and its newest counterpart the Single Cell Expression Atlas (www.ebi.ac.uk/gxa/sc) are EMBL-EBI's knowledgebases for gene and protein expression and localisation in bulk and at single cell level. These resources aim to allow users to investigate gene and protein expression in normal tissue (baseline) or in response to perturbations such as disease or changes to genotype (differential) across multiple species. Users are invited to search for genes or metadata terms across species or biological conditions in a standardised, consistent interface. Alongside these data, new features in Single Cell Expression Atlas allow users to query metadata through our new cell type wheel search. At the experiment level, data can be explored through two types of dimensionality reduction plots, t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), overlaid with either clustering or metadata information to assist users' understanding. Data are also visualised as marker gene heatmaps identifying genes that help confer cluster identity. For some data, additional visualisations are available as interactive cell level anatomograms and cell type gene expression heatmaps.

Subjects

Databases, Genetic; Gene Expression Profiling; Proteomics; Genotype; Metadata; Single-Cell Analysis; Internet; Humans; Animals

8.

pyM2aia: Python interface for mass spectrometry imaging with focus on deep learning.

Cordes, Jonas; Enzlein, Thomas; Hopf, Carsten; Wolf, Ivo.

Bioinformatics ; 40(3)2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38445753

ABSTRACT

SUMMARY: Python is the most commonly used language for deep learning (DL). Existing Python packages for mass spectrometry imaging (MSI) data are not optimized for DL tasks. We, therefore, introduce pyM2aia, a Python package for MSI data analysis with a focus on memory-efficient handling, processing and convenient data-access for DL applications. pyM2aia provides interfaces to its parent application M2aia, which offers interactive capabilities for exploring and annotating MSI data in imzML format. pyM2aia utilizes the image input and output routines, data formats, and processing functions of M2aia, ensures data interchangeability, and enables the writing of readable and easy-to-maintain DL pipelines by providing batch generators for typical MSI data access strategies. We showcase the package in several examples, including imzML metadata parsing, signal processing, ion-image generation, and, in particular, DL model training and inference for spectrum-wise approaches, ion-image-based approaches, and approaches that use spectral and spatial information simultaneously. AVAILABILITY AND IMPLEMENTATION: Python package, code and examples are available at (https://m2aia.github.io/m2aia).

Subjects

Deep Learning; Software; Mass Spectrometry/methods; Language; Metadata

9.

Standardized metadata for biological samples could unlock the potential of collections.

Brlík, Vojtech.

Nature ; 629(8012): 531, 2024 May.

Article in English | MEDLINE | ID: mdl-38745091

Subjects

Metadata; Metadata/standards; Animals; Humans; Specimen Handling/standards; Biological Specimen Banks/standards

10.

IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata.

Camargo, Antonio Pedro; Nayfach, Stephen; Chen, I-Min A; Palaniappan, Krishnaveni; Ratner, Anna; Chu, Ken; Ritter, Stephan J; Reddy, T B K; Mukherjee, Supratim; Schulz, Frederik; Call, Lee; Neches, Russell Y; Woyke, Tanja; Ivanova, Natalia N; Eloe-Fadrosh, Emiley A; Kyrpides, Nikos C; Roux, Simon.

Nucleic Acids Res ; 51(D1): D733-D743, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36399502

ABSTRACT

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad of ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotations are complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.

Subjects

Databases, Genetic; Genome, Viral; Metadata; Metagenomics; Software

11.

OMD Curation Toolkit: a workflow for in-house curation of public omics datasets.

Piquer-Esteban, Samuel; Arnau, Vicente; Diaz, Wladimiro; Moya, Andrés.

BMC Bioinformatics ; 25(1): 184, 2024 May 09.

Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these present limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a python3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources. CONCLUSIONS: Thus, it offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used and benefit investigators in developing novel omics meta-analyses based on sequencing data.

Subjects

Data Curation; Software; Workflow; Data Curation/methods; Metadata; Databases, Genetic; Genomics/methods; Computational Biology/methods

12.

Micro-Meta App: an interactive tool for collecting microscopy metadata based on community specifications.

Rigano, Alessandro; Ehmsen, Shannon; Öztürk, Serkan Utku; Ryan, Joel; Balashov, Alexander; Hammer, Mathias; Kirli, Koray; Boehm, Ulrike; Brown, Claire M; Bellve, Karl; Chambers, James J; Cosolo, Andrea; Coleman, Robert A; Faklaris, Orestis; Fogarty, Kevin E; Guilbert, Thomas; Hamacher, Anna B; Itano, Michelle S; Keeley, Daniel P; Kunis, Susanne; Lacoste, Judith; Laude, Alex; Ma, Willa Y; Marcello, Marco; Montero-Llopis, Paula; Nelson, Glyn; Nitschke, Roland; Pimentel, Jaime A; Weidtkamp-Peters, Stefanie; Park, Peter J; Alver, Burak H; Grunwald, David; Strambio-De-Castillia, Caterina.

Nat Methods ; 18(12): 1489-1495, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34862503

ABSTRACT

For quality, interpretation, reproducibility and sharing value, microscopy images should be accompanied by detailed descriptions of the conditions that were used to produce them. Micro-Meta App is an intuitive, highly interoperable, open-source software tool that was developed in the context of the 4D Nucleome (4DN) consortium and is designed to facilitate the extraction and collection of relevant microscopy metadata as specified by the recent 4DN-BINA-OME tiered-system of Microscopy Metadata specifications. In addition to substantially lowering the burden of quality assurance, the visual nature of Micro-Meta App makes it particularly suited for training purposes.

Subjects

Metadata; Microscopy, Confocal/instrumentation; Microscopy, Confocal/methods; Microscopy, Fluorescence/instrumentation; Microscopy, Fluorescence/methods; Mobile Applications; Programming Languages; Software; Animals; Cell Line; Computational Biology/methods; Humans; Image Processing, Computer-Assisted; Mice; Pattern Recognition, Automated; Quality Control; Reproducibility of Results; User-Computer Interface; Workflow

13.

OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies.

Moore, Josh; Allan, Chris; Besson, Sébastien; Burel, Jean-Marie; Diel, Erin; Gault, David; Kozlowski, Kevin; Lindner, Dominik; Linkert, Melissa; Manz, Trevor; Moore, Will; Pape, Constantin; Tischer, Christian; Swedlow, Jason R.

Nat Methods ; 18(12): 1496-1498, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34845388

ABSTRACT

The rapid pace of innovation in biological imaging and the diversity of its applications have prevented the establishment of a community-agreed standardized data format. We propose that complementing established open formats such as OME-TIFF and HDF5 with a next-generation file format such as Zarr will satisfy the majority of use cases in bioimaging. Critically, a common metadata format used in all these vessels can deliver truly findable, accessible, interoperable and reusable bioimaging data.

Subjects

Computational Biology/instrumentation; Computational Biology/standards; Metadata; Microscopy/instrumentation; Microscopy/standards; Software; Benchmarking; Computational Biology/methods; Data Compression; Databases, Factual; Information Storage and Retrieval; Internet; Microscopy/methods; Programming Languages; SARS-CoV-2

14.

Learning representations for gene ontology terms by jointly encoding graph structure and textual node descriptors.

Zhao, Lingling; Sun, Huiting; Cao, Xinyi; Wen, Naifeng; Wang, Junjie; Wang, Chunyu.

Brief Bioinform ; 23(5)2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35901452

ABSTRACT

Measuring the semantic similarity between Gene Ontology (GO) terms is a fundamental step in numerous functional bioinformatics applications. To fully exploit the metadata of GO terms, word embedding-based methods have been proposed recently to map GO terms to low-dimensional feature vectors. However, these representation methods commonly overlook the key information hidden in the whole GO structure and the relationship between GO terms. In this paper, we propose a novel representation model for GO terms, named GT2Vec, which jointly considers the GO graph structure obtained by graph contrastive learning and the semantic description of GO terms based on BERT encoders. Our method is evaluated on a protein similarity task on a collection of benchmark datasets. The experimental results demonstrate the effectiveness of jointly encoding graph structure and textual node descriptors to learn vector representations for GO terms.

Subjects

Computational Biology; Semantics; Computational Biology/methods; Gene Ontology; Metadata

15.

A quality control portal for sequencing data deposited at the European genome-phenome archive.

Fernández-Orth, Dietmar; Rueda, Manuel; Singh, Babita; Moldes, Mauricio; Jene, Aina; Ferri, Marta; Vasallo, Claudia; Fromont, Lauren A; Navarro, Arcadi; Rambla, Jordi.

Brief Bioinform ; 23(3)2022 May 13.

Article in English | MEDLINE | ID: mdl-35438138

ABSTRACT

Since its launch in 2008, the European Genome-Phenome Archive (EGA) has been leading the archiving and distribution of human identifiable genomic data. In this regard, one of the community concerns is the potential usability of the stored data: as of now, data submitters are not mandated to perform any quality control (QC) before uploading their data and associated metadata information. Here, we present a new File QC Portal developed at EGA, along with QC reports performed and created for 1 694 442 files [Fastq, sequence alignment map (SAM)/binary alignment map (BAM)/CRAM and variant call format (VCF)] submitted at EGA. QC reports allow anonymous EGA users to view summary-level information regarding the files within a specific dataset, such as quality of reads, alignment quality, number and type of variants and other features. Researchers benefit from being able to assess the quality of data prior to the data access decision, thereby increasing the reusability of data (https://ega-archive.org/blog/data-upcycling-powered-by-ega/).

Subjects

Genome; Genomics; High-Throughput Nucleotide Sequencing; Humans; Metadata; Quality Control; Software

16.

GEOfetch: a command-line tool for downloading data and standardized metadata from GEO and SRA.

Khoroshevskyi, Oleksandr; LeRoy, Nathan; Reuter, Vincent P; Sheffield, Nathan C.

Bioinformatics ; 39(3)2023 Mar 01.

Article in English | MEDLINE | ID: mdl-36857584

ABSTRACT

MOTIVATION: The Gene Expression Omnibus (GEO) has become an important source of biological data for secondary analysis. However, there is no simple, programmatic way to download data and metadata from GEO in a standardized annotation format. RESULTS: To address this, we present GEOfetch, a command-line tool that downloads and organizes data and metadata from GEO and SRA. GEOfetch formats the downloaded metadata as a Portable Encapsulated Project, providing a universal format for the reanalysis of public data. AVAILABILITY AND IMPLEMENTATION: GEOfetch is available on Bioconda and the Python Package Index (PyPI).

Subjects

Gene Expression; Metadata; Computational Biology

17.

PEMT: a patent enrichment tool for drug discovery.

Gadiya, Yojana; Zaliani, Andrea; Gribbon, Philip; Hofmann-Apitius, Martin.

Bioinformatics ; 39(1)2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36322820

ABSTRACT

MOTIVATION: Drug discovery practitioners in industry and academia use semantic tools to extract information from online scientific literature to generate new insights into targets, therapeutics and diseases. However, due to complexities in access and analysis, patent-based literature is often overlooked as a source of information. As drug discovery is a highly competitive field, naturally, tools that tap into patent literature can provide any actor in the field an advantage in terms of better informed decision-making. Hence, we aim to facilitate access to patent literature through the creation of an automatic tool for extracting information from patents described in existing public resources. RESULTS: Here, we present PEMT, a novel patent enrichment tool that takes advantage of public databases like ChEMBL and SureChEMBL to extract relevant patent information linked to chemical structures and/or gene names described through FAIR principles and metadata annotations. PEMT aims at supporting drug discovery and research by establishing a patent landscape around genes of interest. The pharmaceutical focus of the tool is mainly due to the subselection of International Patent Classification codes, but in principle, it can be used for other patent fields, provided that a link between a concept and chemical structure is investigated. Finally, we demonstrate a use-case in rare diseases by generating a gene-patent list based on the epidemiological prevalence of these diseases and exploring their underlying patent landscapes. AVAILABILITY AND IMPLEMENTATION: PEMT is an open-source Python tool and its source code and PyPI package are available at https://github.com/Fraunhofer-ITMP/PEMT and https://pypi.org/project/PEMT/, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Metadata; Software; Databases, Factual

18.

Metadata retrieval from sequence databases with ffq.

Gálvez-Merchán, Ángel; Min, Kyung Hoi Joseph; Pachter, Lior; Booeshaghi, A Sina.

Bioinformatics ; 39(1)2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36610997

ABSTRACT

MOTIVATION: Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction. RESULTS: We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access. AVAILABILITY AND IMPLEMENTATION: ffq is free and open source, and the code can be found here: https://github.com/pachterlab/ffq.

Subjects

Metadata; Software; Databases, Nucleic Acid

19.

Privacy preserving identification of population stratification for collaborative genomic research.

Dervishi, Leonard; Li, Wenbiao; Halimi, Anisa; Jiang, Xiaoqian; Vaidya, Jaideep; Ayday, Erman.

Bioinformatics ; 39(39 Suppl 1): i168-i176, 2023 Jun 30.

Article in English | MEDLINE | ID: mdl-37387172

ABSTRACT

The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying the presence of genetic differences among individuals due to subpopulations. One of the common methods used to group genomes of individuals based on ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework which utilizes PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server-based scheme, we initially let the server train a global PCA model on a publicly available genomic dataset which contains individuals from multiple populations. The global PCA model is later used by each collaborator (client) to reduce the dimensionality of its local data. After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then aligns the local PCA results to identify the genetic differences among collaborators' datasets. Our results on real genomic data show that the proposed framework can perform population stratification analysis with high accuracy while preserving the privacy of the research participants.

Subjects

Genomics; Privacy; Humans; Chromosome Mapping; Metadata; Principal Component Analysis
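
Illustrative sketch for record 19: the abstract describes a server that fits a global PCA model on public data, and clients that project their local data with that model and add noise before sharing. The Python below is a minimal illustration of that data flow only, not the authors' implementation. The function names, the synthetic genotype matrices, and the Laplace noise scale are all assumptions made for the example; the noise is not calibrated to any privacy budget, and the server-side alignment step is omitted.

```python
import numpy as np


def fit_global_pca(public_genotypes, n_components=2):
    """Server side (hypothetical helper): fit PCA on a public panel via SVD."""
    mean = public_genotypes.mean(axis=0)
    centered = public_genotypes - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project_with_noise(local_genotypes, mean, components, noise_scale, rng):
    """Client side (hypothetical helper): project local data, then perturb it.

    noise_scale is an illustrative value, not a calibrated LDP parameter.
    """
    projected = (local_genotypes - mean) @ components.T
    noise = rng.laplace(loc=0.0, scale=noise_scale, size=projected.shape)
    return projected + noise


# Synthetic stand-ins for genotype matrices (samples x variants, values 0/1/2).
rng = np.random.default_rng(0)
public = rng.integers(0, 3, size=(200, 50)).astype(float)
local = rng.integers(0, 3, size=(30, 50)).astype(float)

mean, components = fit_global_pca(public, n_components=2)
noisy = project_with_noise(local, mean, components, noise_scale=0.5, rng=rng)
print(noisy.shape)
```

Only the noisy low-dimensional coordinates would leave the client in this sketch; the raw genotype matrix stays local.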

20.

Microbiome Toolbox: methodological approaches to derive and visualize microbiome trajectories.

Banjac, Jelena; Sprenger, Norbert; Dogra, Shaillay Kumar.

Bioinformatics ; 39(1)2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36469345

ABSTRACT

MOTIVATION: The gut microbiome changes rapidly under the influence of different factors such as age, dietary changes or medications, to name just a few. To analyze and understand such changes, we present a Microbiome Toolbox. We implemented several methods for analysis and exploration to provide interactive visualizations for easy comprehension and reporting of longitudinal microbiome data. RESULTS: Based on the abundance of microbiome features such as taxa as well as functional capacity modules, and with the corresponding metadata per sample, the Microbiome Toolbox includes methods for (i) data analysis and exploration, (ii) data preparation including dataset-specific preprocessing and transformation, (iii) best feature selection for log-ratio denominators, (iv) two-group analysis, (v) microbiome trajectory prediction with feature importance over time, (vi) spline and linear regression statistical analysis for testing universality across different groups and differentiation of two trajectories, (vii) longitudinal anomaly detection on the microbiome trajectory and (viii) simulated intervention to return anomaly back to a reference trajectory. AVAILABILITY AND IMPLEMENTATION: The software tools are open source and implemented in Python. For developers interested in additional functionality of the Microbiome Toolbox, it is modular, allowing for further extension with custom methods and analysis. The code, Python package and the link to the interactive dashboard of Microbiome Toolbox are available on GitHub https://github.com/JelenaBanjac/microbiome-toolbox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Microbiota; Software; Metadata
