Non-redundant compendium of human ncRNA genes in GeneCards.

Belinky, Frida; Bahir, Iris; Stelzer, Gil; Zimmerman, Shahar; Rosen, Naomi; Nativ, Noam; Dalah, Irina; Iny Stein, Tsippi; Rappaport, Noa; Mituyama, Toutai; Safran, Marilyn; Lancet, Doron.

Bioinformatics ; 29(2): 255-61, 2013 Jan 15.

Article in English | MEDLINE | ID: mdl-23172862

ABSTRACT

MOTIVATION: Non-coding RNA (ncRNA) genes are increasingly acknowledged for their importance in the human genome. However, there is no comprehensive non-redundant database for all such human genes. RESULTS: We leveraged the effective platform of GeneCards, the human gene compendium, together with the power of fRNAdb and additional primary sources, to judiciously unify all ncRNA gene entries obtainable from 15 different primary sources. Overlapping entries were clustered to unified locations based on an algorithm employing genomic coordinates. This allowed GeneCards' gamut of relevant entries to rise ~5-fold, resulting in ~80,000 human non-redundant ncRNAs, belonging to 14 classes. Such 'grand unification' within a regularly updated data structure will assist future ncRNA research. AVAILABILITY AND IMPLEMENTATION: All of these non-coding RNAs are included among the ~122,500 entries in GeneCards V3.09, along with pertinent annotation, automatically mined by its built-in pipeline from 100 data sources. This information is available at www.genecards.org. CONTACT: Frida.Belinky@weizmann.ac.il SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
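
The abstract states only that overlapping entries were clustered by genomic coordinates; it does not give the algorithm. As a rough illustration of the general idea, the following Python sketch merges entries that overlap on the same chromosome and strand. The `Entry` type, the source names, the coordinates, and the overlap rule (any shared base) are all assumptions made for this example, not details of the GeneCards pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    source: str   # hypothetical primary-source name
    chrom: str
    strand: str
    start: int    # 1-based, inclusive (assumed convention)
    end: int


def cluster_by_coordinates(entries):
    """Group entries that overlap on the same chromosome and strand."""
    by_locus = defaultdict(list)
    for e in entries:
        by_locus[(e.chrom, e.strand)].append(e)

    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        # Sweep entries left to right, extending the open cluster on overlap.
        for e in sorted(by_locus[key], key=lambda x: (x.start, x.end)):
            if current and e.start <= current_end:
                current.append(e)
                current_end = max(current_end, e.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [e], e.end
        if current:
            clusters.append(current)
    return clusters


# Toy data, invented for illustration only.
entries = [
    Entry("sourceA", "chr1", "+", 100, 200),
    Entry("sourceB", "chr1", "+", 150, 260),
    Entry("sourceC", "chr1", "+", 400, 450),
    Entry("sourceA", "chr1", "-", 120, 180),
    Entry("sourceB", "chr2", "+", 100, 200),
]
clusters = cluster_by_coordinates(entries)
for c in clusters:
    span = (min(e.start for e in c), max(e.end for e in c))
    print(c[0].chrom, c[0].strand, span, [e.source for e in c])
```

In this toy input the two overlapping plus-strand entries on chr1 collapse into one unified location, while the minus-strand entry at similar coordinates stays separate because strand is part of the grouping key.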

Subject(s)

Databases, Genetic , RNA, Untranslated/genetics , Algorithms , Cluster Analysis , Genes , Genome, Human , Genomics , Humans , Internet , Molecular Sequence Annotation

In-silico human genomics with GeneCards.

Stelzer, Gil; Dalah, Irina; Stein, Tsippi Iny; Satanower, Yigeal; Rosen, Naomi; Nativ, Noam; Oz-Levi, Danit; Olender, Tsviya; Belinky, Frida; Bahir, Iris; Krug, Hagit; Perco, Paul; Mayer, Bernd; Kolker, Eugene; Safran, Marilyn; Lancet, Doron.

Hum Genomics ; 5(6): 709-17, 2011 Oct.

Article in English | MEDLINE | ID: mdl-22155609

ABSTRACT

Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on in-house and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.

Subject(s)

Databases, Genetic , Genes/genetics , Genome, Human , Genomics , Computational Biology , Humans

Omics data management and annotation.

Harel, Arye; Dalah, Irina; Pietrokovski, Shmuel; Safran, Marilyn; Lancet, Doron.

Methods Mol Biol ; 719: 71-96, 2011.

Article in English | MEDLINE | ID: mdl-21370079

ABSTRACT

Technological Omics breakthroughs, including next generation sequencing, bring avalanches of data which need to undergo effective data management to ensure integrity, security, and maximal knowledge-gleaning. Data management system requirements include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, hardware and software platform definition, as well as robustness. Relevant solutions elaborated by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols facilitating data sharing and managing. In project planning, special consideration has to be made when choosing relevant Omics annotation sources, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as file-based systems and relational databases, the latter typically used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter.

Subject(s)

Computational Biology/methods , Information Management/methods , Molecular Sequence Annotation/methods , Animals , Computational Biology/standards , Data Display , Databases, Genetic , Humans , Information Management/standards , Molecular Sequence Annotation/standards , Quality Control , Research Personnel , Software

GeneCards Version 3: the human gene integrator.

Safran, Marilyn; Dalah, Irina; Alexander, Justin; Rosen, Naomi; Iny Stein, Tsippi; Shmoish, Michael; Nativ, Noam; Bahir, Iris; Doniger, Tirza; Krug, Hagit; Sirota-Madi, Alexandra; Olender, Tsviya; Golan, Yaron; Stelzer, Gil; Harel, Arye; Lancet, Doron.

Database (Oxford) ; 2010: baq020, 2010 Aug 05.

Article in English | MEDLINE | ID: mdl-20689021

ABSTRACT

GeneCards (www.genecards.org) is a comprehensive, authoritative compendium of annotative information about human genes, widely used for nearly 15 years. Its gene-centric content is automatically mined and integrated from over 80 digital sources, resulting in a web-based deep-linked card for each of >73,000 human gene entries, encompassing the following categories: protein coding, pseudogene, RNA gene, genetic locus, cluster and uncategorized. We now introduce GeneCards Version 3, featuring a speedy and sophisticated search engine and a revamped, technologically enabling infrastructure, catering to the expanding needs of biomedical researchers. A key focus is on gene-set analyses, which leverage GeneCards' unique wealth of combinatorial annotations. These include the GeneALaCart batch query facility, which tabulates user-selected annotations for multiple genes and GeneDecks, which identifies similar genes with shared annotations, and finds set-shared annotations by descriptor enrichment analysis. Such set-centric features address a host of applications, including microarray data analysis, cross-database annotation mapping and gene-disorder associations for drug targeting. We highlight the new Version 3 database architecture, its multi-faceted search engine, and its semi-automated quality assurance system. Data enhancements include an expanded visualization of gene expression patterns in normal and cancer tissues, an integrated alternative splicing pattern display, and augmented multi-source SNPs and pathways sections. GeneCards now provides direct links to gene-related research reagents such as antibodies, recombinant proteins, DNA clones and inhibitory RNAs and features gene-related drugs and compounds lists. We also portray the GeneCards Inferred Functionality Score annotation landscape tool for scoring a gene's functional information status. Finally, we delineate examples of applications and collaborations that have benefited from the GeneCards suite. 
Database URL: www.genecards.org.

Subject(s)

Databases, Genetic , Genome, Human , Alternative Splicing , Databases, Protein , Gene Expression , Gene Regulatory Networks , Genetic Diseases, Inborn/genetics , Humans , Internet , Mutation , Polymorphism, Single Nucleotide , Protein Interaction Mapping , Search Engine

GeneDecks: paralog hunting and gene-set distillation with GeneCards annotation.

Stelzer, Gil; Inger, Aron; Olender, Tsviya; Iny-Stein, Tsippi; Dalah, Irina; Harel, Arye; Safran, Marilyn; Lancet, Doron.

OMICS ; 13(6): 477-87, 2009 Dec.

Article in English | MEDLINE | ID: mdl-20001862

ABSTRACT

Sophisticated genomic navigation strongly benefits from a capacity to establish a similarity metric among genes. GeneDecks is a novel analysis tool that provides such a metric by highlighting shared descriptors between pairs of genes, based on the rich annotation within the GeneCards compendium of human genes. The current implementation addresses information about pathways, protein domains, Gene Ontology (GO) terms, mouse phenotypes, mRNA expression patterns, disorders, drug relationships, and sequence-based paralogy. GeneDecks has two modes: (1) Paralog Hunter, which seeks functional paralogs based on combinatorial similarity of attributes; and (2) Set Distiller, which ranks descriptors by their degree of sharing within a given gene set. GeneDecks enables the elucidation of unsuspected putative functional paralogs, and a refined scrutiny of various gene-sets (e.g., from high-throughput experiments) for discovering relevant biological patterns.
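
The two modes described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the gene names and descriptors are invented, descriptor ranking is done by plain counting, and gene similarity uses the Jaccard index as a stand-in, since the abstract does not specify the scoring GeneDecks actually uses.

```python
from collections import Counter

# Hypothetical toy annotations: gene -> set of descriptors.
annotations = {
    "GENE_A": {"pathway:P1", "domain:D1", "go:G1"},
    "GENE_B": {"pathway:P1", "domain:D1", "go:G2"},
    "GENE_C": {"pathway:P1", "go:G3"},
    "GENE_D": {"pathway:P2", "go:G4"},
}


def rank_shared_descriptors(gene_set, annotations):
    """Set Distiller-like view: rank descriptors by how many genes share them."""
    counts = Counter(d for g in gene_set for d in annotations.get(g, ()))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def jaccard(a, b):
    """Fraction of descriptors two genes have in common (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query, annotations):
    """Paralog Hunter-like view: rank other genes by descriptor similarity."""
    q = annotations[query]
    scores = [(g, jaccard(q, d)) for g, d in annotations.items() if g != query]
    return sorted(scores, key=lambda kv: (-kv[1], kv[0]))


ranked = rank_shared_descriptors(["GENE_A", "GENE_B", "GENE_C"], annotations)
similar = most_similar("GENE_A", annotations)
print(ranked[:2])
print(similar)
```

With this toy data the descriptor shared by all three genes ranks first, and GENE_B, which shares two of GENE_A's three descriptors, comes out as the closest candidate.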

Subject(s)

Databases, Genetic , Information Storage and Retrieval/methods , Software , Algorithms , Animals , Base Sequence , Database Management Systems , Humans , Mice , Molecular Sequence Data , Pattern Recognition, Automated , Sequence Analysis, DNA

GIFtS: annotation landscape analysis with GeneCards.

Harel, Arye; Inger, Aron; Stelzer, Gil; Strichman-Almashanu, Liora; Dalah, Irina; Safran, Marilyn; Lancet, Doron.

BMC Bioinformatics ; 10: 348, 2009 Oct 23.

Article in English | MEDLINE | ID: mdl-19852797

ABSTRACT

BACKGROUND: Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. RESULTS: We present the GeneCards Inferred Functionality Score (GIFtS) which allows a quantitative assessment of a gene's annotation status, by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS value, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source, and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a given gene measured by GIFtS was correlated (for GIFtS>30) with the number of publications for a gene, and with the seniority of this entry in the HGNC database. CONCLUSION: GIFtS can be a valuable tool for computational procedures which analyze lists of large sets of genes resulting from wet-lab or computational research. GIFtS may also assist the scientific community with identification of groups of uncharacterized genes for diverse applications, such as delineation of novel functions and charting unexplored areas of the human genome.
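
The abstract does not give the GIFtS formula. Purely to illustrate what a source-based annotation score with adjustable weights might look like, the Python sketch below assumes the score is the weighted percentage of data sources that hold information on a gene. The function name, the source list, and the weights are hypothetical and should not be read as the published definition.

```python
def annotation_score(gene_sources, all_sources, weights=None):
    """Weighted percentage (0-100) of sources that annotate a gene.

    Assumed scoring rule for illustration; sources without an explicit
    weight count as 1.0.
    """
    weights = weights or {}
    total = sum(weights.get(s, 1.0) for s in all_sources)
    if total == 0:
        return 0.0
    present = sum(weights.get(s, 1.0) for s in all_sources if s in gene_sources)
    return 100.0 * present / total


# Hypothetical source categories and two toy genes.
all_sources = ["go", "pathways", "interactions", "phenotypes", "publications"]
well_annotated = {"go", "pathways", "interactions", "publications"}
poorly_annotated = {"publications"}

high = annotation_score(well_annotated, all_sources)
low = annotation_score(poorly_annotated, all_sources)
reweighted = annotation_score(poorly_annotated, all_sources, {"publications": 3.0})
print(high, low, reweighted)
```

Raising the weight of one category, as in the last call, shows how experimenting with weights shifts the score of a sparsely annotated gene without changing the underlying data.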

Subject(s)

Cluster Analysis , Computational Biology/methods , Software , Databases, Genetic , Gene Expression Profiling , Genes
