ABSTRACT
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of understanding the mechanistic causes of chemical toxicity and predicting the toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open-source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases.
Subjects
Computational Biology/methods; Databases, Factual; Ecotoxicology/methods; United States Environmental Protection Agency; Algorithms; Databases, Factual/standards; Databases, Factual/supply & distribution; Ecotoxicology/organization & administration; Environmental Pollutants/toxicity; Humans; Software; United States; United States Environmental Protection Agency/organization & administration
ABSTRACT
SUMMARY: The Distributed Structure-Searchable Toxicity (DSSTox) ARYEXP and GEOGSE files are newly published, structure-annotated files of the chemical-associated and chemical exposure-related summary experimental content contained in the ArrayExpress Repository and Gene Expression Omnibus (GEO) Series (based on data extracted on September 20, 2008). ARYEXP and GEOGSE contain 887 and 1064 unique chemical substances mapped to 1835 and 2381 chemical exposure-related experiment accession IDs, respectively. The standardized files allow one to assess, compare and search the chemical content in each resource, in the context of the larger DSSTox toxicology data network, as well as across large public cheminformatics resources such as PubChem (http://pubchem.ncbi.nlm.nih.gov). AVAILABILITY: Data files and documentation may be accessed online at http://epa.gov/ncct/dsstox/.
Subjects
Computational Biology/methods; Databases, Factual; Gene Expression Profiling/methods; Toxicogenetics/methods; Databases, Genetic; Gene Expression; Genomics/methods; Oligonucleotide Array Sequence Analysis/methods; Software
ABSTRACT
ACToR (Aggregated Computational Toxicology Resource) is a database and set of software applications that bring into one central location many types and sources of data on environmental chemicals. Currently, the ACToR chemical database contains information on chemical structure, in vitro bioassays and in vivo toxicology assays derived from more than 150 sources including the U.S. Environmental Protection Agency (EPA), Centers for Disease Control (CDC), U.S. Food and Drug Administration (FDA), National Institutes of Health (NIH), state agencies, corresponding government agencies in Canada, Europe and Japan, universities, the World Health Organization (WHO) and non-governmental organizations (NGOs). At the EPA National Center for Computational Toxicology, ACToR helps manage large data sets being used in a high-throughput environmental chemical screening and prioritization program called ToxCast.
Subjects
Computational Biology/methods; Databases, Factual/standards; Environmental Pollutants/toxicity; Computational Biology/standards; Computational Biology/statistics & numerical data; Computational Biology/trends; Databases, Factual/statistics & numerical data; Databases, Factual/trends; Environmental Exposure/adverse effects; Environmental Exposure/standards; Environmental Exposure/statistics & numerical data; Environmental Pollutants/chemistry; Government Agencies/standards; Government Agencies/statistics & numerical data; Government Agencies/trends; United States; United States Environmental Protection Agency/standards; United States Environmental Protection Agency/statistics & numerical data; United States Environmental Protection Agency/trends
ABSTRACT
A publicly available toxicogenomics capability for supporting predictive toxicology and meta-analysis depends on availability of gene expression data for chemical treatment scenarios, the ability to locate and aggregate such information by chemical, and broad data coverage within chemical, genomics, and toxicological information domains. This capability also depends on common genomics standards, protocol description, and functional linkages of diverse public Internet data resources. We present a survey of public genomics resources from these vantage points and conclude that, despite progress in many areas, the current state of the majority of public microarray databases is inadequate for supporting these objectives, particularly with regard to chemical indexing. To begin to address these inadequacies, we focus chemical annotation efforts on experimental content contained in the two primary public genomic resources: ArrayExpress and Gene Expression Omnibus. Automated scripts and extensive manual review were employed to transform free-text experiment descriptions into a standardized, chemically indexed inventory of experiments in both resources. These files, which include top-level summary annotations, allow for identification of current chemical-associated experimental content, as well as chemical-exposure-related (or "Treatment") content of greatest potential value to toxicogenomics investigation. With these chemical-index files, it is possible for the first time to assess the breadth and overlap of chemical study space represented in these databases, and to begin to assess the sufficiency of data with shared protocols for chemical similarity inferences. Chemical indexing of public genomics databases is a first important step toward integrating chemical, toxicological and genomics data into predictive toxicology.