Results 1 - 20 of 27
1.
Nucleic Acids Res ; 52(D1): D672-D678, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37941124

ABSTRACT

The Reactome Knowledgebase (https://reactome.org), an Elixir and GCBR core biological data resource, provides manually curated molecular details of a broad range of normal and disease-related biological processes. Processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Here we review progress towards annotation of the entire human proteome, targeted annotation of disease-causing genetic variants of proteins and of small-molecule drugs in a pathway context, and towards supporting explicit annotation of cell- and tissue-specific pathways. Finally, we briefly discuss issues involved in making Reactome more fully interoperable with other related resources such as the Gene Ontology and maintaining the resulting community resource network.


Subject(s)
Knowledge Bases , Metabolic Networks and Pathways , Signal Transduction , Humans , Metabolic Networks and Pathways/genetics , Proteome/genetics
2.
Nucleic Acids Res ; 48(D1): D77-D83, 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31665515

ABSTRACT

Expression Atlas is EMBL-EBI's resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Gene Expression Profiling , Software , Gene Expression Profiling/methods , Organ Specificity , Single-Cell Analysis/methods , User-Computer Interface
4.
Nucleic Acids Res ; 47(D1): D711-D715, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30357387

ABSTRACT

ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data from a variety of technologies assaying functional modalities of a genome, such as gene expression or promoter occupancy. The number of experiments based on sequencing technologies, in particular RNA-seq experiments, has been increasing over the last few years, and submissions of sequencing data have overtaken microarray experiments in the last 12 months. Additionally, there is a significant increase in experiments that investigate single cells rather than bulk samples, known as single-cell RNA-seq. To accommodate these trends, we have substantially changed our submission tool, Annotare, which, along with raw and processed data, collects all metadata necessary to interpret these experiments. Selected datasets are re-processed and loaded into our sister resource, the value-added Expression Atlas (and its component Single Cell Expression Atlas), which not only enables users to interpret the data easily but also serves as a test for data quality. With an increasing number of studies that combine different assay modalities (multi-omics experiments), a new, more general archival resource, the BioStudies Database, has been developed, which will eventually supersede ArrayExpress. Data submissions will continue unchanged; all existing ArrayExpress data will be incorporated into BioStudies, and the existing accession numbers and application programming interfaces will be maintained.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Single-Cell Analysis/methods , Software , Databases, Genetic , RNA-Seq/methods
5.
Nucleic Acids Res ; 46(D1): D246-D251, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29165655

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions.


Subject(s)
Databases, Genetic , Animals , Gene Expression Profiling , Humans , Mammals/genetics , Mammals/metabolism , Oligonucleotide Array Sequence Analysis , Plants/genetics , Plants/metabolism , Proteomics , Sequence Analysis, RNA , Species Specificity , User-Computer Interface
6.
Nucleic Acids Res ; 46(D1): D1181-D1189, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29165610

ABSTRACT

Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.


Subject(s)
Databases, Genetic , Gene Expression Regulation, Plant , Genomics/methods , Knowledge Bases , Plants/genetics , Epigenesis, Genetic , Gene Ontology , Genetic Research , Genetic Variation , Genome, Plant , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Plants/metabolism , Software , User-Computer Interface
7.
Nucleic Acids Res ; 45(D1): D1029-D1039, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27799469

ABSTRACT

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX.
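The homology-based projection described in this abstract can be pictured with a small sketch. It assumes a precomputed ortholog table and keeps a pathway only when at least one of its reference genes has an ortholog in the target species; the function name, the gene identifiers and the retention rule are illustrative assumptions, not the pipeline Plant Reactome actually runs.

```python
from typing import Dict, List, Set


def project_pathways(
    reference_pathways: Dict[str, Set[str]],
    orthologs: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Project reference-species pathways onto a target species.

    reference_pathways maps a pathway name to its reference gene IDs.
    orthologs maps a reference gene ID to target-species ortholog IDs.
    A pathway is kept only if at least one of its genes has an ortholog
    (an assumed rule, chosen here for simplicity).
    """
    projected: Dict[str, Set[str]] = {}
    for pathway, genes in reference_pathways.items():
        target_genes = {t for g in genes for t in orthologs.get(g, [])}
        if target_genes:
            projected[pathway] = target_genes
    return projected


# Hypothetical identifiers, for illustration only.
rice_pathways = {
    "sucrose biosynthesis": {"rice_gene_A", "rice_gene_B"},
    "lysine degradation": {"rice_gene_C"},
}
rice_to_maize = {
    "rice_gene_A": ["maize_gene_1"],
    "rice_gene_B": ["maize_gene_2", "maize_gene_3"],
}

maize_pathways = project_pathways(rice_pathways, rice_to_maize)
print(maize_pathways)
```

In this toy input, "lysine degradation" is dropped because its only gene has no ortholog in the table; a real projection would likely apply a stricter coverage threshold.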


Subject(s)
Computational Biology/methods , Databases, Genetic , Plants/genetics , Plants/metabolism , Search Engine , Genomics/methods , Metabolic Networks and Pathways , Signal Transduction , Systems Biology/methods , User-Computer Interface , Web Browser
8.
Nucleic Acids Res ; 45(D1): D985-D994, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27899665

ABSTRACT

We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.
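The target-centric and disease-centric workflows amount to reading the same association records from two directions. The sketch below shows one possible in-memory layout, assuming each record is a (target, disease, score) triple; the record shape, the names and the scores are made up for illustration and do not reflect the platform's real data model or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Association = Tuple[str, str, float]  # (target, disease, score), assumed shape
Ranked = Dict[str, List[Tuple[str, float]]]


def build_indexes(associations: Iterable[Association]) -> Tuple[Ranked, Ranked]:
    """Index associations by target and by disease, best score first."""
    by_target = defaultdict(list)
    by_disease = defaultdict(list)
    for target, disease, score in associations:
        by_target[target].append((disease, score))
        by_disease[disease].append((target, score))
    # Sort each list so the strongest association comes first.
    for index in (by_target, by_disease):
        for key in index:
            index[key].sort(key=lambda pair: pair[1], reverse=True)
    return dict(by_target), dict(by_disease)


# Made-up records, for illustration only.
records = [
    ("TARGET_A", "disease_x", 0.9),
    ("TARGET_A", "disease_y", 0.4),
    ("TARGET_B", "disease_x", 0.7),
]

by_target, by_disease = build_indexes(records)
print(by_target["TARGET_A"])     # target-centric view
print(by_disease["disease_x"])   # disease-centric view
```

Keeping both indexes over the same records is what makes switching between the two views cheap: no record is stored twice in a different form, only referenced from two lookups.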


Subject(s)
Computational Biology/methods , Molecular Targeted Therapy , Search Engine , Software , Databases, Factual , Humans , Molecular Targeted Therapy/methods , Reproducibility of Results , Web Browser , Workflow
9.
Bioinformatics ; 33(14): 2218-2220, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28369191

ABSTRACT

MOTIVATION: The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in the European Nucleotide Archive (ENA), using Representational State Transfer. RESULTS: The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1 PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. AVAILABILITY AND IMPLEMENTATION: The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api. The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library. CONTACT: rnaseq@ebi.ac.uk; rpetry@ebi.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
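The quantification units named in this abstract (FPKM, TPM and raw counts) are related by simple formulas. The sketch below derives the first two from raw counts using the standard textbook definitions; the gene names, counts and lengths are invented, and nothing here is taken from the RNASeq-er pipeline itself.

```python
from typing import Dict


def fpkm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Fragments per kilobase of exon per million fragments mapped."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    # count / (length in kb * total in millions) == count * 1e9 / (length_bp * total)
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    denom = sum(rates.values())
    if denom == 0:
        raise ValueError("no mapped fragments")
    return {g: rate / denom * 1e6 for g, rate in rates.items()}


# Invented values, for illustration only.
raw_counts = {"gene_a": 100, "gene_b": 300, "gene_c": 600}
gene_lengths = {"gene_a": 1000, "gene_b": 2000, "gene_c": 3000}

fpkm_values = fpkm(raw_counts, gene_lengths)
tpm_values = tpm(raw_counts, gene_lengths)
print(fpkm_values)
print(tpm_values)
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM values, whose sum varies with the length distribution of the expressed genes.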


Subject(s)
Computational Biology/methods , Eukaryota/genetics , Sequence Analysis, RNA/methods , Software , Transcriptome , Animals , Databases, Genetic , Gene Expression , Gene Ontology , Humans , Internet
10.
Nucleic Acids Res ; 44(D1): D746-52, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26481351

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Plants/metabolism , Proteins/metabolism , Proteomics , Animals , Cell Line, Tumor , Humans , Plants/genetics , User-Computer Interface
11.
Nucleic Acids Res ; 44(D1): D1133-40, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553803

ABSTRACT

Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to approximately 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.


Subject(s)
Databases, Genetic , Genome, Plant , Plants/metabolism , Gene Expression , Genetic Variation , Genomics , Internet , Metabolic Networks and Pathways , Molecular Sequence Annotation , Plants/genetics
12.
Nucleic Acids Res ; 43(Database issue): D1113-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361974

ABSTRACT

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42,000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Genomics , High-Throughput Nucleotide Sequencing , Internet , Software
13.
Nucleic Acids Res ; 42(Database issue): D926-32, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304889

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user.
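A 'contrast' as defined here compares two sets of biological replicates. A minimal sketch of the effect size for one gene in one contrast is a log2 fold change of the group means; the pseudocount, the function name and the expression values are assumptions for illustration, and the abstract does not say which statistical methods Expression Atlas applies.

```python
import math
from statistics import mean
from typing import Sequence


def log2_fold_change(
    test: Sequence[float],
    reference: Sequence[float],
    pseudocount: float = 1.0,
) -> float:
    """Log2 ratio of mean expression in test versus reference replicates.

    The pseudocount (an assumed convention) avoids division by zero
    when a gene is not detected in the reference group.
    """
    return math.log2((mean(test) + pseudocount) / (mean(reference) + pseudocount))


# Invented expression values for one gene, three replicates per group.
treated = [30, 34, 29]
control = [7, 8, 6]

lfc = log2_fold_change(treated, control)
print(lfc)
```

A fold change alone says nothing about significance; a real analysis would pair it with a statistical test that accounts for variability between replicates.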


Subject(s)
Databases, Genetic , Gene Expression Profiling , Genomics , Humans , Internet , Oligonucleotide Array Sequence Analysis , Proteins/genetics , Proteins/metabolism , RNA Isoforms/metabolism , Sequence Analysis, RNA
14.
BMC Genomics ; 16 Suppl 8: S2, 2015.
Article in English | MEDLINE | ID: mdl-26110515

ABSTRACT

BACKGROUND: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based. RESULTS: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome. CONCLUSIONS: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.


Subject(s)
Computational Biology , Genome, Human , Molecular Sequence Annotation , Protein Isoforms/metabolism , Software , Alternative Splicing , Databases, Genetic , Humans , Protein Isoforms/genetics , Transcriptome
15.
Nucleic Acids Res ; 40(Database issue): D1077-81, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22064864

ABSTRACT

Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we have made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19,014 biological conditions in 136,551 assays from 5598 independent studies.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Atlases as Topic , Genomics , Humans , MicroRNAs/metabolism , Molecular Sequence Annotation , Sequence Analysis, RNA , User-Computer Interface
16.
Nat Genet ; 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39198675

ABSTRACT

The complex and dynamic cellular composition of the human endometrium remains poorly understood. Previous endometrial single-cell atlases profiled few donors and lacked consensus in defining cell types. We introduce the Human Endometrial Cell Atlas (HECA), a high-resolution single-cell reference atlas (313,527 cells) combining published and new endometrial single-cell transcriptomics datasets of 63 women with and without endometriosis. HECA assigns consensus and identifies previously unreported cell types, mapped in situ using spatial transcriptomics and validated using a new independent single-nuclei dataset (312,246 nuclei, 63 donors). In the functionalis, we identify intricate stromal-epithelial cell coordination via transforming growth factor beta (TGFβ) signaling. In the basalis, we define signaling between fibroblasts and an epithelial population expressing progenitor markers. Integration of HECA with large-scale endometriosis genome-wide association study data pinpoints decidualized stromal cells and macrophages as most likely dysregulated in endometriosis. The HECA is a valuable resource for studying endometrial physiology and disorders, and for guiding microphysiological in vitro systems development.

17.
Curr Protoc ; 3(4): e722, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37053306

ABSTRACT

Pathway databases provide descriptions of the roles of proteins, nucleic acids, lipids, carbohydrates, and other molecular entities within their biological cellular contexts. Pathway-centric views of these roles may allow for the discovery of unexpected functional relationships in data such as gene expression profiles and somatic mutation catalogues from tumor cells. For this reason, there is a high demand for high-quality pathway databases and their associated tools. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, New York University Langone Health, the European Bioinformatics Institute, and Oregon Health & Science University) is one such pathway database. Reactome collects detailed information on biological pathways and processes in humans from the primary literature. Reactome content is manually curated, expert-authored, and peer-reviewed and spans the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm, and other model organisms. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Browsing a Reactome pathway
Basic Protocol 2: Exploring Reactome annotations of disease and drugs
Basic Protocol 3: Finding the pathways involving a gene or protein
Alternate Protocol 1: Finding the pathways involving a gene or protein using UniProtKB (SwissProt), Ensembl, or Entrez gene identifier
Alternate Protocol 2: Using advanced search
Basic Protocol 4: Using the Reactome pathway analysis tool to identify statistically overrepresented pathways
Basic Protocol 5: Using the Reactome pathway analysis tool to overlay expression data onto Reactome pathway diagrams
Basic Protocol 6: Comparing inferred model organism and human pathways using the Species Comparison tool
Basic Protocol 7: Comparing tissue-specific expression using the Tissue Distribution tool.
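Basic Protocol 4 refers to identifying statistically overrepresented pathways. One common way to frame such a test is a one-sided hypergeometric tail probability, sketched below; this is a generic textbook formulation with invented numbers, not necessarily the statistic the Reactome analysis tool computes, and it omits multiple-testing correction.

```python
from math import comb


def hypergeom_pvalue(universe: int, pathway_size: int, sample_size: int, overlap: int) -> float:
    """P(X >= overlap) when sample_size genes are drawn without replacement
    from a universe in which pathway_size genes belong to the pathway."""
    upper = min(pathway_size, sample_size)
    total = comb(universe, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented numbers: 20 genes in the universe, 5 in the pathway,
# 4 genes submitted by the user, 2 of which fall in the pathway.
p_value = hypergeom_pvalue(universe=20, pathway_size=5, sample_size=4, overlap=2)
print(p_value)
```

Because one such test is run per pathway, the resulting p-values would normally be adjusted for multiple testing before any pathway is reported as overrepresented.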


Subject(s)
Metabolic Networks and Pathways , Zebrafish , Humans , Animals , Mice , Rats , Zebrafish/metabolism , Databases, Protein , Proteins/metabolism , Signal Transduction
18.
Nat Commun ; 11(1): 3400, 2020 Jul 07.
Article in English | MEDLINE | ID: mdl-32636365

ABSTRACT

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project generated a vast amount of whole-genome cancer sequencing resource data. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we provide a user's guide to the five publicly available online data exploration and visualization tools introduced in the PCAWG marker paper. These tools are ICGC Data Portal, UCSC Xena, Chromothripsis Explorer, Expression Atlas, and PCAWG-Scout. We detail use cases and analyses for each tool, show how they incorporate outside resources from the larger genomics ecosystem, and demonstrate how the tools can be used together to understand the biology of cancers more deeply. Together, the tools enable researchers to query the complex genomic PCAWG data dynamically and integrate external information, enabling and enhancing interpretation.


Subject(s)
Computational Biology/methods , Genome, Human , Neoplasms/genetics , Chromothripsis , Data Analysis , Databases, Genetic , Genomics , Humans , Internet , Mutation , Software , User-Computer Interface , Whole Genome Sequencing
19.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subject(s)
Databases, Protein , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
20.
Nat Commun ; 10(1): 3512, 2019 Aug 05.
Article in English | MEDLINE | ID: mdl-31383865

ABSTRACT

The amount of omics data in the public domain is increasing every year. Modern science has become a data-intensive discipline. Innovative solutions for data management, data sharing, and for discovering novel datasets are therefore increasingly required. In 2016, we released the first version of the Omics Discovery Index (OmicsDI) as a light-weight system to aggregate datasets across multiple public omics data resources. OmicsDI aggregates genomics, transcriptomics, proteomics, metabolomics and multiomics datasets, as well as computational models of biological processes. Here, we propose a set of novel metrics to quantify the attention and impact of biomedical datasets. A complete framework (now integrated into OmicsDI) has been implemented in order to provide and evaluate those metrics. Finally, we propose a set of recommendations for authors, journals and data resources to promote an optimal quantification of the impact of datasets.


Subject(s)
Access to Information , Datasets as Topic , Information Dissemination , Computational Biology/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Humans , Metabolomics/statistics & numerical data , Proteomics/statistics & numerical data