ABSTRACT
Cancerogenesis is driven by mutations that lead to aberrant functioning of a complex network of molecular interactions and simultaneously affect multiple cellular functions. Therefore, the successful application of bioinformatics and systems biology methods to the analysis of high-throughput data in cancer research depends heavily on the availability of global and detailed reconstructions of signalling networks amenable to computational analysis. We present here the Atlas of Cancer Signalling Network (ACSN), an interactive and comprehensive map of molecular mechanisms implicated in cancer. The resource includes tools for map navigation, visualization and analysis of molecular data in the context of signalling network maps. Constructing and updating ACSN involves careful manual curation of the molecular biology literature and participation of experts in the corresponding fields. The cancer-oriented content of ACSN is completely original and covers major mechanisms involved in cancer progression, including DNA repair, cell survival, apoptosis, cell cycle, EMT and cell motility. Cell signalling mechanisms are depicted in detail, together creating a seamless 'geographic-like' map of molecular interactions frequently deregulated in cancer. The map can be browsed with the NaviCell web interface, which uses the Google Maps engine and the semantic zooming principle. The associated web blog provides a forum for commenting on and curating the ACSN content. ACSN allows users to upload heterogeneous omics data on top of the maps for visualization and functional analysis. We suggest several scenarios for applying ACSN in cancer research, particularly for visualizing high-throughput data, ranging from small interfering RNA-based screening results or mutation frequencies to innovative ways of exploring transcriptomes and phosphoproteomes. Integration and analysis of these data in the context of ACSN may help interpret their biological significance and formulate mechanistic hypotheses. ACSN may also support patient stratification, prediction of treatment response and resistance to cancer drugs, as well as the design of novel treatment strategies.
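ACSN offers functional analysis of user data on the maps. One common way to score whether a user gene list (for instance, siRNA screening hits) is over-represented in a map module is a one-sided hypergeometric test. The sketch below assumes that test; the function name and parameters are hypothetical and do not describe the ACSN or NaviCell API.

```python
from math import comb


def hypergeometric_enrichment_p(universe_size: int, module_size: int,
                                query_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe_size: number of genes represented on the map (N)
    module_size:   number of those genes in one map module (K)
    query_size:    number of genes in the user's list (n)
    overlap:       number of listed genes that fall in the module (x)
    """
    total = comb(universe_size, query_size)
    upper = min(module_size, query_size)
    # Sum the probabilities of observing k = overlap, ..., upper module genes.
    tail = sum(
        comb(module_size, k) * comb(universe_size - module_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, `hypergeometric_enrichment_p(2000, 150, 100, 20)` gives the probability of seeing 20 or more of 100 listed genes in a 150-gene module by chance when 2000 genes are on the map; these numbers are illustrative only. When many modules are tested, the resulting p-values would also need a multiple-testing correction.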
ABSTRACT
In recent years, an increasing number of projects have investigated tumor genome structure using microarray-based techniques such as array comparative genomic hybridization (array-CGH) or single nucleotide polymorphism (SNP) arrays. Forthcoming studies will need to integrate these earlier results and compare their findings with existing sets of copy number data for validation. These sets also form the basis for many comparative retrospective analyses. However, exploiting this mass of data requires a homogeneous preparation of the copy number data, so that they can be compared with one another, and their integration into a unified bioinformatics environment with ad hoc analysis tools and interfaces. To our knowledge, no such data integration has been proposed yet. Biologists and clinicians involved in cancer research therefore urgently need such an integrative tool, which motivated us to build a database of array-CGH and other DNA copy number data for tumors (ACTuDB). When available, the associated clinical, transcriptome and loss of heterozygosity data were also integrated into ACTuDB. ACTuDB currently contains about 1500 genomic profiles of tumors and cell lines covering bladder, brain, breast, colon, liver, lymphoma, neuroblastoma, mouth and pancreas, together with data from replication timing experiments. The array-CGH data were processed using ad hoc algorithms (probe mapping, breakpoint detection, gain or loss status assignment and visualization) developed at Institut Curie. The database is available from http://bioinfo.curie.fr/actudb/ and can be browsed with a user-friendly interface. This database will be a useful resource for the genomic profiling of tumors, a field of highly active research. We invite research groups involved in tumor genome profiling to submit their data to ACTuDB.
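Gain or loss status assignment and breakpoint detection can be illustrated with a deliberately simple sketch. It assumes probes are already mapped and sorted along one chromosome, that each probe carries a (possibly smoothed) log2 ratio, and that status is assigned by fixed thresholds. The thresholds (±0.2) and function names are hypothetical; this is not the Institut Curie processing pipeline, which relies on dedicated breakpoint-detection algorithms.

```python
from typing import List


def assign_status(log2_ratios: List[float], gain_threshold: float = 0.2,
                  loss_threshold: float = -0.2) -> List[int]:
    """Map each probe's log2 ratio to -1 (loss), 0 (no change) or +1 (gain)."""
    status = []
    for value in log2_ratios:
        if value >= gain_threshold:
            status.append(1)
        elif value <= loss_threshold:
            status.append(-1)
        else:
            status.append(0)
    return status


def breakpoints(status: List[int]) -> List[int]:
    """Indices i where the status of probe i differs from that of probe i - 1."""
    return [i for i in range(1, len(status)) if status[i] != status[i - 1]]


ratios = [0.05, 0.01, 0.35, 0.40, -0.02, -0.5, -0.45]
status = assign_status(ratios)
```

Here `status` is `[0, 0, 1, 1, 0, -1, -1]` and `breakpoints(status)` returns `[2, 4, 5]`: a gained segment starting at probe 2, a return to normal at probe 4 and a loss starting at probe 5.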
Subject(s)
Databases, Genetic , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Data Interpretation, Statistical , Gene Dosage , Gene Expression Profiling , Humans , Neoplasms/diagnosis , Nucleic Acid Hybridization
ABSTRACT
Studying the molecular stratification of breast carcinoma is a real challenge given the extreme heterogeneity of these tumors. Many patients are now treated following recommendations established at several NIH and St Gallen consensus conferences. However, a significant fraction of these breast cancer patients do not need adjuvant chemotherapy, while other patients receive ineffective therapies. High-density gene expression arrays have been designed in an attempt to establish expression profiles that could be used as prognostic indicators or as predictive markers of response to treatment. This review discusses the potential value of these new indicators, but also the current weaknesses of these new genomic and bioinformatic approaches. The combined analysis of transcriptomic and genomic alteration data from relatively large numbers of well-annotated tumor specimens may offer an opportunity to overcome the current difficulties in validating recently published, non-overlapping gene lists as prognostic or therapeutic indicators. There is also hope of identifying and deciphering signal transduction pathways driving tumor progression, using newly developed algorithms and semi-quantitative parameters obtained in simplified in vitro or in vivo models of specific transduction pathways.
Subject(s)
Breast Neoplasms/classification , Breast Neoplasms/drug therapy , Animals , Antineoplastic Agents/therapeutic use , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/classification , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/classification , Carcinoma, Intraductal, Noninfiltrating/pathology , Female , Gene Expression Profiling , Humans , Mice , Mice, Transgenic , Models, Animal , Mutation/genetics , Neoplasm Metastasis , Neoplasm Staging , Neoplastic Stem Cells/pathology
ABSTRACT
MOTIVATION: The identification of recurrent genomic alterations can provide insight into the initiation and progression of genetic diseases such as cancer. Array-CGH can identify chromosomal regions that have been gained or lost, with a resolution of approximately 1 Mb for cutting-edge techniques. The extraction of discrete profiles from raw array-CGH data has been studied extensively, but subsequent steps in the analysis require flexible, efficient algorithms, particularly if the number of available profiles exceeds a few tens or the number of array probes exceeds a few thousand. RESULTS: We propose two algorithms for computing minimal and minimal constrained regions of gain and loss from discretized CGH profiles. The second of these algorithms can handle additional constraints describing relevant regions of copy number change. We have validated these algorithms on two public array-CGH datasets. AVAILABILITY: From the authors, upon request. CONTACT: celine@lri.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
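The input to these algorithms is a set of discretized profiles (-1 loss, 0 no change, +1 gain at each probe). As a simplified illustration of what a recurrent region looks like on such input, the sketch below reports contiguous runs of probes altered in at least a given number of profiles. This frequency-threshold approach is an assumption chosen for clarity; it is not the minimal or minimal constrained region algorithm proposed in the paper, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def recurrent_regions(profiles: List[List[int]], alteration: int,
                      min_profiles: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) probe-index runs where at least `min_profiles`
    profiles carry `alteration` (+1 for gain, -1 for loss).

    All profiles are assumed to cover the same probes in the same order.
    """
    if not profiles:
        return []
    n_probes = len(profiles[0])
    counts = [sum(1 for p in profiles if p[i] == alteration) for i in range(n_probes)]
    regions = []
    start: Optional[int] = None
    for i, count in enumerate(counts):
        if count >= min_profiles and start is None:
            start = i  # a recurrent run begins
        elif count < min_profiles and start is not None:
            regions.append((start, i - 1))  # the run ended at the previous probe
            start = None
    if start is not None:
        regions.append((start, n_probes - 1))  # run extends to the last probe
    return regions


profiles = [
    [0, 1, 1, 1, 0, -1],
    [0, 0, 1, 1, 1, -1],
    [1, 1, 1, 0, 0, 0],
]
```

With these three toy profiles, gains shared by at least two profiles span probes 1 to 3 (`recurrent_regions(profiles, 1, 2)` returns `[(1, 3)]`), and raising the threshold to three narrows the region to probe 2 alone.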
Subject(s)
Algorithms , Computer Simulation , Databases, Genetic , Oligonucleotide Array Sequence Analysis/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Chromosome Mapping , Colonic Neoplasms/genetics , Colonic Neoplasms/metabolism , Female , Gene Expression Profiling/methods , Humans , Neoplasms/genetics , Neoplasms/metabolism , Reproducibility of Results
ABSTRACT
MOTIVATION: The eXtensible Markup Language (XML) is an emerging standard for structuring documents, notably for the World Wide Web. In this paper, the authors present XML and examine its use as a data language for bioinformatics. In particular, XML is compared with other languages, and some potential uses of XML in bioinformatics applications are presented. The authors propose adopting XML for data interchange between databases and other data sources. Finally, the discussion is illustrated by a test case: a pedigree data model in XML. CONTACT: Emmanuel.Barillot@infobiogen.fr
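To make the idea of a pedigree data model in XML concrete, the sketch below writes and reads back a small pedigree with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`pedigree`, `family`, `individual`, `sex`, `father`, `mother`) are hypothetical and do not reproduce the model described in the paper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def build_pedigree(family_id: str, individuals: List[Dict[str, str]]) -> str:
    """Serialize a family as XML; parents are referenced by individual id."""
    root = ET.Element("pedigree", {"family": family_id})
    for person in individuals:
        attrs = {"id": person["id"], "sex": person["sex"]}
        # Founders have no recorded parents, so those attributes are omitted.
        for parent in ("father", "mother"):
            if person.get(parent):
                attrs[parent] = person[parent]
        ET.SubElement(root, "individual", attrs)
    return ET.tostring(root, encoding="unicode")


def parents_of(xml_text: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Parse the XML back into a map: individual id -> (father id, mother id)."""
    root = ET.fromstring(xml_text)
    return {
        person.get("id"): (person.get("father"), person.get("mother"))
        for person in root.iter("individual")
    }


family = [
    {"id": "1", "sex": "M"},
    {"id": "2", "sex": "F"},
    {"id": "3", "sex": "F", "father": "1", "mother": "2"},
]
xml_text = build_pedigree("F1", family)
print(xml_text)
```

Because parents are stored as id references rather than by nesting, the same structure can represent individuals with several mates or consanguinity loops, which a strict tree of nested elements could not.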
Subject(s)
Computational Biology , Information Storage and Retrieval , Internet , Programming Languages , Humans
ABSTRACT
MOTIVATION: Producing a meaningful representation of a pedigree is not straightforward when it includes consanguinity loops, individuals with multiple mates or several related families. RESULTS: We show that finding a perfectly meaningful representation of a pedigree is equivalent to the interval graph sandwich problem, and we propose an algorithm for drawing pedigrees.
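In the interval graph sandwich problem, one is given a set of mandatory edges E1 and a larger set of permitted edges E2 ⊇ E1 on the same vertices, and must find intervals whose overlap graph E satisfies E1 ⊆ E ⊆ E2. The sketch below only checks whether a candidate interval assignment is a valid solution; how pedigree individuals and relationships are turned into E1 and E2 is the subject of the paper and is not reproduced here. Function names are hypothetical, intervals are treated as closed, and the permitted set is passed as `mandatory` (E1) plus `optional` (E2 minus E1).

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Interval = Tuple[float, float]
Edge = FrozenSet[str]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the closed intervals [start, end] intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_sandwich_solution(intervals: Dict[str, Interval],
                         mandatory: Set[Edge],
                         optional: Set[Edge]) -> bool:
    """True if the overlap graph E of `intervals` satisfies E1 <= E <= E2,
    with E1 = mandatory and E2 = mandatory | optional."""
    for u, v in combinations(intervals, 2):
        edge = frozenset((u, v))
        if overlaps(intervals[u], intervals[v]):
            # Every overlap must be a permitted edge (E must be a subset of E2).
            if edge not in mandatory and edge not in optional:
                return False
        elif edge in mandatory:
            # A mandatory edge is missing (E1 would not be a subset of E).
            return False
    return True
```

For instance, with intervals A = (0, 2), B = (1, 3) and C = (4, 5), the only overlap is A-B, so the assignment is valid when A-B is mandatory and no other edge is required, but invalid if A-C is also declared mandatory.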
Subject(s)
Algorithms , Computer Graphics , Pedigree
ABSTRACT
XML is a new language designed to solve one of the biggest problems of the World Wide Web: its main language, HTML, is not extensible. In this article, the authors discuss the current successes and limitations of the World Wide Web, briefly explain the basics of XML and present the benefits of using XML as a data-exchange language. Finally, they discuss real-life applications that have been developed using XML, with a focus on biology.
Subject(s)
Internet , Programming Languages , Science , Software
ABSTRACT
Genome mapping projects now produce very dense maps with up to several thousand markers per chromosome. Moreover, synteny plays an increasing role in mapping: enriching poor maps with data from the maps of closely related genomes (in evolutionary terms) is a high-reward task. We propose a map viewer adapted to this situation: MappetShow gives a clear view of very dense maps and efficiently compares several maps. MappetShow is based on non-linear viewing and is written in Java. A map description language isolates the software from the data sources. The software was readily used with data from sources as different as an Object Request Broker, an Object-Oriented Database and a flat data stream. MappetShow can be browsed at http://www.infobiogen.fr/services/Mappet. More generally, we discuss how to use the non-linear viewing concept in molecular biology data visualization.
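Non-linear viewing magnifies the region around a focus point while compressing the rest of the map, so dense marker clusters near the focus become legible. One standard transform for this is the Sarkar-Brown graphical fisheye function G(x) = (d + 1)x / (dx + 1), applied to normalized distances x from the focus. The sketch below applies it to marker positions on a one-dimensional map. It illustrates the concept only, is not taken from MappetShow (which is written in Java), and the distortion value d = 3 is an arbitrary choice.

```python
from typing import List


def fisheye(x: float, distortion: float) -> float:
    """Sarkar-Brown fisheye: maps a normalized distance x in [0, 1] onto [0, 1]."""
    return (distortion + 1) * x / (distortion * x + 1)


def distort_positions(positions: List[float], focus: float, length: float,
                      distortion: float = 3.0) -> List[float]:
    """Move marker positions in [0, length] so the area around `focus` is magnified.

    Distances are normalized separately on each side of the focus, so the
    focus and both map ends stay fixed.
    """
    out = []
    for p in positions:
        if p >= focus:
            span = length - focus
            if span == 0:
                out.append(focus)  # focus sits at the map end; nothing to stretch
            else:
                out.append(focus + fisheye((p - focus) / span, distortion) * span)
        else:
            span = focus  # positive, since 0 <= p < focus
            out.append(focus - fisheye((focus - p) / span, distortion) * span)
    return out
```

With a 100-unit map and the focus at 50, `distort_positions([0, 40, 50, 60, 100], 50, 100)` moves the markers at 40 and 60 to 25 and 75, while the focus and the map ends stay in place: markers near the focus gain screen space and distant ones are compressed.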
Subject(s)
Chromosome Mapping/methods , Genome, Human , Software , Chromosome Mapping/statistics & numerical data , Computer Systems , Databases, Factual , Human Genome Project , Humans , Nonlinear Dynamics
ABSTRACT
The DBcat (http://www.infobiogen.fr/services/dbcat) is a comprehensive catalog of biological databases, maintained and curated at Infobiogen. It contains 500 databases classified by application domain. The DBcat is a structured flat-file library, which can be searched by means of an SRS server or a dedicated Web interface. The files are available for download from the Infobiogen anonymous ftp server.
Subject(s)
Biology , Databases, Factual , Information Storage and Retrieval
ABSTRACT
One of the current issues in genetic epidemiology is detecting susceptibility genes in the genome. It is now common to undertake systematic genome screening using approaches based on a measure of haplotype sharing in sib pairs. Here, we compare the efficiency of two statistics, the maximum likelihood score (MLS) and the nonparametric linkage score (NPLa), on the simulated data provided for GAW11. A question often raised is whether it is better to use a single-step or a two-step strategy. For the simulated model, we show that the answer is not clear-cut. With either strategy, the power to detect susceptibility genes in a single replicate with MLS or NPL is extremely low. With two replicates, only one of the four simulated loci could be detected with reasonable power. When gametic disequilibrium is suspected, methods testing for both linkage and association may be more powerful.
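For affected sib pairs, the MLS compares the estimated proportions z0, z1, z2 of pairs sharing 0, 1 or 2 alleles identical by descent (IBD) with the proportions 1/4, 1/2, 1/4 expected under no linkage. Assuming fully informative markers, so that the IBD status of every pair is known, the unrestricted maximum likelihood estimate is z_i = n_i / N and the score reduces to the sum of n_i log10(z_i / α_i). The sketch below implements only this simplified case; real genome screens infer IBD probabilistically from marker data and often constrain z to Holmans' possible triangle. The function name is hypothetical.

```python
from math import log10
from typing import Sequence

# Expected proportions of sib pairs sharing 0, 1 or 2 alleles IBD under no linkage.
PRIOR_IBD = (0.25, 0.5, 0.25)


def mls_fully_informative(ibd_counts: Sequence[int]) -> float:
    """MLS (lod score, log10 units) from observed counts of pairs with IBD = 0, 1, 2.

    Uses the unrestricted maximum likelihood estimate z_i = n_i / N.
    """
    n_total = sum(ibd_counts)
    if n_total == 0:
        return 0.0
    lod = 0.0
    for n_i, alpha_i in zip(ibd_counts, PRIOR_IBD):
        if n_i > 0:  # terms with n_i = 0 contribute nothing
            z_i = n_i / n_total
            lod += n_i * log10(z_i / alpha_i)
    return lod
```

For 100 pairs split as (10, 50, 40), the score is 10 log10(0.4) + 40 log10(1.6), about 4.19, whereas counts exactly matching the null proportions, (25, 50, 25), give 0.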
Subject(s)
Genetic Linkage , Genetic Predisposition to Disease , Models, Genetic , Genetic Testing , Genome , Humans , Likelihood Functions , Lod Score , Statistics, Nonparametric
ABSTRACT
SUMMARY: We developed a collaborative pedigree environment called CoPE. This environment includes a Java program for drawing pedigrees and a standardized system for pedigree storage. Unlike other existing pedigree programs, this software is intended specifically for epidemiologists: it allows customized automatic drawing of large numbers of pedigrees and remote, distributed consultation of pedigrees. AVAILABILITY: At http://www.infobiogen.fr/services/CoPE
Subject(s)
Pedigree , Software
ABSTRACT
MOTIVATION: The scientific community urgently needs to standardize the exchange of biological data. This is helped by the use of a common protocol and the definition of shared data structures. We have based our standardization work on CORBA, a technology that has become a standard in recent years and allows interoperability between distributed objects. RESULTS: We have defined an IDL specification for genome maps and present it to the scientific community. We have implemented CORBA servers based on this IDL to distribute RHdb and HuGeMap maps. The IDL will co-evolve with the needs of the mapping community. AVAILABILITY: The standard IDL for genome maps is available at http://corba.ebi.ac.uk/RHdb/EUCORBA/MapIDL.html. The IORs to browse maps from Infobiogen and EBI are at http://www.infobiogen.fr/services/Hugemap/IOR and http://corba.ebi.ac.uk/RHdb/EUCORBA/IOR. CONTACT: manu@infobiogen.fr, tome@ebi.ac.uk
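An IDL specification fixes the shared data structures and operations that every map server exposes. As a rough analogue in Python, and not the actual genome map IDL (whose content is not reproduced here), the sketch below defines a minimal map model with named, positioned markers and one interval query. All class, field and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Marker:
    name: str
    position: float  # expressed in the map's own unit


@dataclass
class GenomeMap:
    name: str
    chromosome: str
    unit: str  # e.g. "cM" for a genetic map, "cR" for a radiation hybrid map
    markers: List[Marker] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep markers ordered by position so interval queries can use bisection.
        self.markers = sorted(self.markers, key=lambda m: m.position)

    def markers_between(self, start: float, end: float) -> List[Marker]:
        """Markers whose position lies in the closed interval [start, end]."""
        positions = [m.position for m in self.markers]
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return self.markers[lo:hi]
```

The point of agreeing on such a structure is that a client written against it can query an RH map and a genetic map in the same way; only the `unit` field tells them apart.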
Subject(s)
Chromosome Mapping , Computer Systems , Animals , Databases, Factual , Genome , Human Genome Project , Humans , Programming Languages , Software , User-Computer Interface
ABSTRACT
The DBcat (http://www.infobiogen.fr/services/dbcat) is a comprehensive catalog of biological databases, maintained and curated on a daily basis at GIS Infobiogen. It contains more than 400 databases classified by application domain. The DBcat is a structured flat-file library, which can be searched by means of an SRS server or a dedicated Web interface. The files are available for download from the Infobiogen anonymous ftp server.
Subject(s)
Biology , Catalogs as Topic , Databases, Factual , Information Storage and Retrieval , Internet
ABSTRACT
With so many databases available for research in the Human Genome Project, it is crucial to relate information from different resources efficiently. For that purpose, we maintain Virgil, a database of rich links for data browsing, data analysis and database interconnection. The current version of Virgil contains more than 40 000 rich links from five major databases: SWISS-PROT, GenBank, PDB, GDB and OMIM. Materials described in this paper are available from http://www.infobiogen.fr/services/virgil/
Subject(s)
Computer Communication Networks , Database Management Systems , Databases, Factual , Human Genome Project , Information Storage and Retrieval , Animals , Humans
ABSTRACT
The HuGeMap database stores the major genetic and physical maps of the human genome. HuGeMap is accessible on the Web at http://www.infobiogen.fr/services/Hugemap and through a CORBA server. A standard genome map data format for the interconnection of genome map databases was defined in collaboration with the EBI. The HuGeMap CORBA server provides this interconnection using the interface definition language IDL. Two graphical user interfaces were developed for the visualization of the HuGeMap data: ZoomMap (http://www.infobiogen.fr/services/zomit/ZoomMap.html) for navigation by zooming and data transformation via magic lenses, and MappetShow (http://www.infobiogen.fr/services/Mappet) for visualizing and comparing maps.
Subject(s)
Chromosome Mapping , Databases, Factual , Genome, Human , Animals , Humans , Information Storage and Retrieval , Internet , User-Computer Interface
ABSTRACT
MOTIVATION: Links between biological objects are frequently used by researchers in biology. However, many of the links found in public databases are insufficiently documented and difficult to retrieve. Virgil introduces the idea of a rich link, i.e. the link itself together with the related pieces of information. Virgil was developed to collect, manage and distribute such links. RESULTS: At present, Virgil is a prototype database that contains rich links between GDB genes and GenBank sequences. The Virgil data model is rich enough to comprehensively describe a link between two biological objects. Two different means of accessing the information were developed: a schema-driven Web interface and a CORBA server. AVAILABILITY: http://www.infobiogen.fr/services/virgil/home.html CONTACT: Frederic.Achard@infobiogen.fr
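A rich link couples the link itself with the information that documents it. The sketch below shows one possible way to model this and to index links by their source object. The field names and the placeholder identifiers are hypothetical; they are not the Virgil data model and do not refer to real database entries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RichLink:
    """A link between two database objects plus the information documenting it."""
    source_db: str
    source_id: str
    target_db: str
    target_id: str
    relation: str  # what the link means, e.g. "gene has sequence"
    evidence: str  # how the link was established
    origin: str    # where the link was found


def index_by_source(links: List[RichLink]) -> Dict[Tuple[str, str], List[RichLink]]:
    """Group links by the (database, identifier) pair of their source object."""
    index: Dict[Tuple[str, str], List[RichLink]] = defaultdict(list)
    for link in links:
        index[(link.source_db, link.source_id)].append(link)
    return dict(index)


# Placeholder identifiers, not real database entries.
links = [
    RichLink("GDB", "GENE_X", "GenBank", "SEQ_1", "gene has sequence", "curated", "example"),
    RichLink("GDB", "GENE_X", "GenBank", "SEQ_2", "gene has sequence", "automatic", "example"),
    RichLink("GDB", "GENE_Y", "GenBank", "SEQ_3", "gene has sequence", "curated", "example"),
]
index = index_by_source(links)
```

Keeping evidence and origin on each link, rather than a bare pair of identifiers, is what lets a user decide, for instance, to trust only curated links when browsing from a gene to its sequences.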
Subject(s)
Database Management Systems , Databases, Factual , Human Genome Project/organization & administration , Computer Communication Networks , Data Collection , Gene Library , Software Design
ABSTRACT
Database interconnection requires the development of links between related objects from different databases. We built a database of links, called Virgil, to manage and distribute rich (documented) links between GDB genes and GenBank human sequences. Virgil contains 18 667 unique links. In addition to a simple Web form for ad hoc queries, we propose a generic Web interface and a prototype CORBA server for link distribution. Materials described in this paper are available from http://www.infobiogen.fr/services/virgil/home.html
Subject(s)
Computer Communication Networks , Databases, Factual , Genome, Human , Humans
ABSTRACT
The HuGeMap database stores the major genetic and physical maps of the human genome. It is also interconnected with the radiation hybrid mapping database RHdb. HuGeMap is accessible through a Web server for interactive browsing at http://www.infobiogen.fr/services/Hugemap, as well as through a CORBA server for programmatic access. HuGeMap is intended as an attempt to build open, interconnected databases, that is, databases that distribute their objects worldwide in compliance with a recognized distribution standard. Maps can be displayed and compared with a Java applet (http://babbage.infobiogen.fr:15000/Mappet/Show.html) that queries the HuGeMap ORB server as well as the RHdb ORB server at the EBI.
Subject(s)
Chromosome Mapping , Databases, Factual , Genome, Human , Computer Communication Networks , Computer Graphics , Humans , Restriction Mapping , User-Computer Interface
ABSTRACT
MOTIVATION: The difficulty of visualizing and browsing biological databases has become a crucial problem. Scientists can no longer interact directly with the huge amount of available data, yet future breakthroughs in biology depend on this interaction. We propose a new metaphor for biological data visualization and browsing that allows intuitive navigation in very large databases. Our approach is based on navigation and visualization with zooming, semantic zooming and portals, and on data transformation via magic lenses. We think that these new visualization and navigation techniques should be applied globally to a federation of biological databases. RESULTS: We have implemented a generic tool, called Zomit, that provides an application programming interface for developing servers for such navigation and visualization, and a generic, architecture-independent client (a Java™ applet) that queries such servers. As an illustration of the capabilities of our approach, we have developed ZoomMap, a prototype browser for the HuGeMap human genome map database. AVAILABILITY: Zomit and ZoomMap are available at http://www.infobiogen.fr/services/zomit.
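Semantic zooming means that the kind of object displayed, not just its size, changes with the zoom level. The sketch below picks a representation for a genome map view from the current resolution in base pairs per pixel. The three levels, their thresholds and the function name are hypothetical and are not taken from Zomit or ZoomMap.

```python
from typing import List, Tuple

# Hypothetical levels: (maximum base pairs per pixel, representation shown).
LEVELS: List[Tuple[float, str]] = [
    (10.0, "sequence"),
    (10_000.0, "markers"),
    (float("inf"), "cytogenetic bands"),
]


def representation_for(view_width_bp: float, view_width_px: int) -> str:
    """Choose what to draw from the current resolution (base pairs per pixel)."""
    if view_width_px <= 0:
        raise ValueError("view width in pixels must be positive")
    bp_per_px = view_width_bp / view_width_px
    for max_bp_per_px, representation in LEVELS:
        if bp_per_px <= max_bp_per_px:
            return representation
    return LEVELS[-1][1]
```

In a 1000-pixel window, showing 5 kb selects the sequence, 1 Mb selects individual markers and 100 Mb falls back to cytogenetic bands.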
Subject(s)
Biological Science Disciplines/trends , Databases, Factual/trends , User-Computer Interface , Humans , Metaphor
ABSTRACT
Database interoperation is becoming a bottleneck for the research community in biology. In this paper, we first discuss the question of interoperability and give a brief overview of CORBA. We then explain an example in some detail: a simple but realistic data bank of STSs. The Object Request Broker is the medium of communication between an object server (the data bank) and a client (possibly a genome center). Since CORBA enables easy development of networked applications, we intend this paper to provide an incentive for the bioinformatics community to develop distributed objects.
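The key idea behind the ORB example is that the client depends only on an agreed interface, while the server is free to store the data bank however it likes. The sketch below imitates that separation inside a single Python process using `typing.Protocol`. It is not CORBA and involves no network communication, and the STS fields, class names and primer strings are hypothetical dummy values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class STS:
    name: str
    forward_primer: str
    reverse_primer: str
    product_size: int  # PCR product length in base pairs


class STSBank(Protocol):
    """The agreed interface, playing the role an IDL definition plays in CORBA."""

    def get(self, name: str) -> Optional[STS]: ...

    def names(self) -> List[str]: ...


class InMemorySTSBank:
    """One possible server-side implementation; clients never see its internals."""

    def __init__(self, records: Iterable[STS]) -> None:
        self._records = {record.name: record for record in records}

    def get(self, name: str) -> Optional[STS]:
        return self._records.get(name)

    def names(self) -> List[str]:
        return sorted(self._records)


def total_product_size(bank: STSBank) -> int:
    """Client code written only against the STSBank interface."""
    total = 0
    for name in bank.names():
        record = bank.get(name)
        if record is not None:
            total += record.product_size
    return total


# Dummy records for illustration only.
bank = InMemorySTSBank([
    STS("STS_A", "ACGTACGTAC", "TGCATGCATG", 150),
    STS("STS_B", "GGCCAATTGG", "CCAATTGGCC", 200),
])
```

Replacing `InMemorySTSBank` with an implementation backed by a database or a remote server would leave `total_product_size` unchanged; in CORBA, the ORB and IDL-generated stubs provide that substitution across the network.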