Results 1 - 20 of 58
1.
Methods Mol Biol ; 1613: 333-353, 2017.
Article in English | MEDLINE | ID: mdl-28849567

ABSTRACT

A robust bioinformatics capability is widely acknowledged as central to realizing the promises of toxicogenomics. Successful application of toxicogenomic approaches, such as DNA microarrays, inextricably relies on appropriate data management, the ability to extract knowledge from massive amounts of data, and the availability of functional information for data interpretation. At the FDA's National Center for Toxicological Research (NCTR), we are developing a public microarray data management and analysis software package, called ArrayTrack, that is also used in the routine review of genomic data submitted to the FDA. ArrayTrack stores a full range of information related to DNA microarrays and clinical and nonclinical studies, as well as the digested data derived from proteomics and metabonomics experiments. In addition, ArrayTrack provides a rich collection of functional information about genes, proteins, and pathways drawn from various public biological databases to facilitate data interpretation. Many data analysis and visualization tools are available in ArrayTrack for individual platform data analysis, multiple omics data integration, and integrated analysis of omics data with study data. Importantly, gene expression data, functional information, and analysis methods are fully integrated so that the data analysis and interpretation process is simplified and enhanced. Using ArrayTrack, users can select an analysis method from the ArrayTrack toolbox and apply it to selected microarray data; the analysis results can then be linked directly to individual gene, pathway, and Gene Ontology analysis. ArrayTrack is publicly available online (http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/index.htm), and the prospective user can also request a local installation version by contacting the authors.


Subjects
Computational Biology/methods; Computational Biology/organization & administration; Data Mining; Databases, Genetic; Genomics; Information Storage and Retrieval; Metabolomics; Software; Toxicogenetics; United States; United States Food and Drug Administration; Web Browser
3.
BMC Bioinformatics ; 17(1): 213, 2016 May 13.
Article in English | MEDLINE | ID: mdl-27177941

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large-scale comparative and evolutionary studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. METHODS: We report a novel procedure to analyse NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. RESULTS: The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of the fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. CONCLUSION: The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data and to identify gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.


Subjects
Algorithms; Data Mining/methods; High-Throughput Nucleotide Sequencing/methods; Biomarkers/analysis; Cluster Analysis; Models, Theoretical; Polymorphism, Single Nucleotide/genetics; Salmonella/classification; Salmonella/genetics; Serotyping
4.
Int J Environ Res Public Health ; 13(4): 373, 2016 Mar 26.
Article in English | MEDLINE | ID: mdl-27023590

ABSTRACT

Flavonoids are frequently used as dietary supplements in the absence of research evidence regarding health benefits or toxicity. Furthermore, ingested doses could far exceed those received from diet in the course of normal living. Some flavonoids exhibit binding to estrogen receptors (ERs), with consequential vigilance by regulatory authorities at the U.S. EPA and FDA. Regulatory authorities must consider both beneficial claims and potential adverse effects, warranting the increase in research that has spanned almost two decades. Here, we report pathway enrichment of 14 targets from the Comparative Toxicogenomics Database (CTD) and the Herbal Ingredients' Targets (HIT) database for 22 flavonoids that bind ERs. The selected flavonoids are confirmed ER binders from our earlier studies, and were found here to be mainly involved in three types of biological processes: ER regulation, estrogen metabolism and synthesis, and apoptosis. Besides cancers, we conjecture that the flavonoids may affect several diseases via apoptosis pathways. Diseases such as amyotrophic lateral sclerosis, viral myocarditis, and non-alcoholic fatty liver disease could be implicated. More generally, apoptosis processes may be importantly evolved biological functions of flavonoids that bind ERs, and high-dose ingestion of those flavonoids could adversely disrupt the cellular apoptosis process.


Subjects
Environmental Pollutants/toxicity; Flavonoids/toxicity; Receptors, Estrogen/metabolism; Environmental Pollutants/metabolism; Flavonoids/metabolism; Humans; Protein Binding/drug effects; Receptors, Estrogen/genetics
5.
BMC Public Health ; 16: 279, 2016 Mar 19.
Article in English | MEDLINE | ID: mdl-26993983

ABSTRACT

BACKGROUND: Both adolescent substance use and adolescent depression are major public health problems, and have the tendency to co-occur. Thousands of articles on adolescent substance use or depression have been published. It is labor intensive and time consuming to extract huge amounts of information from the cumulated collections. Topic modeling offers a computational tool to find relevant topics by capturing meaningful structure among collections of documents. METHODS: In this study, a total of 17,723 abstracts from PubMed published from 2000 to 2014 on adolescent substance use and depression were downloaded as objects, and Latent Dirichlet allocation (LDA) was applied to perform text mining on the dataset. Word clouds were used to visually display the content of topics and demonstrate the distribution of vocabularies over each topic. RESULTS: The LDA topics recaptured the search keywords in PubMed, and further discovered relevant issues, such as intervention program, association links between adolescent substance use and adolescent depression, such as sexual experience and violence, and risk factors of adolescent substance use, such as family factors and peer networks. Using trend analysis to explore the dynamics of proportion of topics, we found that brain research was assessed as a hot issue by the coefficient of the trend test. CONCLUSIONS: Topic modeling has the ability to segregate a large collection of articles into distinct themes, and it could be used as a tool to understand the literature, not only by recapturing known facts but also by discovering other relevant topics.


Subjects
Data Mining/methods; Depression/epidemiology; Substance-Related Disorders/epidemiology; Adolescent; Adolescent Behavior; Humans
6.
J Genet ; 94(4): 731-40, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26690529

ABSTRACT

Single-nucleotide polymorphisms (SNPs) determined based on SNP arrays from the international HapMap consortium (HapMap) and the genetic variants detected in the 1000 Genomes Project (1KGP) can serve as two references for genome-wide association studies (GWAS). We conducted comparative analyses to provide a means for assessing concerns regarding SNP array-based GWAS findings as well as for realistically bounding expectations for next-generation sequencing (NGS)-based GWAS. We calculated and compared base composition, transition-to-transversion ratio, minor allele frequency, and heterozygous rate for SNPs from HapMap and 1KGP for the 622 common individuals. We analysed the genotype discordance between HapMap and 1KGP to assess consistency in the SNPs from the two references. In 1KGP, 90.58% of the 36,817,799 SNPs detected were not measured in HapMap. More SNPs with minor allele frequencies less than 0.01 were found in 1KGP than in HapMap. The two references have low discordance (generally smaller than 0.02) in genotypes of common SNPs, with most discordance coming from heterozygous SNPs. Our study demonstrated that SNP array-based GWAS findings were reliable and useful, although they explained only a small portion of genetic variance. NGS can detect not only common but also rare variants, supporting the expectation that NGS-based GWAS will be able to incorporate a much larger portion of genetic variance than SNP array-based GWAS.
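
The per-SNP statistics named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the two-letter genotype strings (for example "AG"), and the sample-keyed dictionaries are all assumptions made for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition stays within purines (A<->G) or within pyrimidines (C<->T)."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps):
    """Transition-to-transversion ratio for a list of (ref, alt) base pairs."""
    snps = list(snps)
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    return ti / tv  # raises ZeroDivisionError if there are no transversions


def minor_allele_frequency(genotypes):
    """MAF at one biallelic site; genotypes are two-letter strings such as 'AG'."""
    counts = Counter(allele for gt in genotypes for allele in gt)
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / sum(counts.values())


def heterozygous_rate(genotypes):
    """Fraction of genotypes whose two alleles differ."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def discordance(calls_a, calls_b):
    """Fraction of shared samples whose unordered genotypes disagree.

    calls_a and calls_b map sample ID -> genotype for the same SNP
    (hypothetical layout, e.g. one dict per reference).
    """
    shared = calls_a.keys() & calls_b.keys()
    differing = sum(1 for s in shared if sorted(calls_a[s]) != sorted(calls_b[s]))
    return differing / len(shared)
```

Comparing sorted alleles treats "AG" and "GA" as the same genotype, so only genuine call differences count towards discordance.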


Subjects
Genome, Human/genetics; Polymorphism, Single Nucleotide/genetics; Chromosome Mapping/methods; Gene Frequency/genetics; Genotype; HapMap Project; Humans; Linkage Disequilibrium/genetics
7.
BMC Bioinformatics ; 16 Suppl 13: S8, 2015.
Article in English | MEDLINE | ID: mdl-26424364

ABSTRACT

BACKGROUND: Topic modelling is an active research field in machine learning. While mainly used to build models from unstructured textual data, it offers an effective means of data mining where samples represent documents, and different biological endpoints or omics data represent words. Latent Dirichlet Allocation (LDA) is the most commonly used topic modelling method across a wide range of technical fields. However, model development can be arduous and tedious, and requires burdensome and systematic sensitivity studies in order to find the best set of model parameters. Often, time-consuming subjective evaluations are needed to compare models. Currently, research has yielded no easy way to choose the proper number of topics in a model beyond a major iterative approach. METHODS AND RESULTS: Based on analysis of variation of statistical perplexity during topic modelling, a heuristic approach is proposed in this study to estimate the most appropriate number of topics. Specifically, the rate of perplexity change (RPC) as a function of the number of topics is proposed as a suitable selector. We test the stability and effectiveness of the proposed method for three markedly different types of ground-truth datasets: Salmonella next-generation sequencing, pharmacological side effects, and textual abstracts on computational biology and bioinformatics (TCBB) from PubMed. CONCLUSION: The proposed RPC-based method is demonstrated to choose the best number of topics in three numerical experiments of widely different data types, and for databases of very different sizes. The work required was markedly less arduous than if full systematic sensitivity studies had been carried out with number of topics as a parameter. We understand that additional investigation is needed to substantiate the method's theoretical basis, and to establish its generalizability in terms of dataset characteristics.
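
A minimal sketch of an RPC-style selector follows. The abstract does not give the formula or the selection rule, so both are assumptions here: RPC is taken as the absolute change in perplexity divided by the change in topic number between consecutive candidates, and the chosen candidate is the first one at which RPC stops decreasing. The perplexity values are expected to come from already fitted models.

```python
def rate_of_perplexity_change(topic_counts, perplexities):
    """Return [(t_i, RPC_i)] for i >= 1.

    Assumed definition: RPC_i = |P_i - P_(i-1)| / (t_i - t_(i-1)),
    with topic_counts sorted in increasing order.
    """
    rpc = []
    for i in range(1, len(topic_counts)):
        step = topic_counts[i] - topic_counts[i - 1]
        change = abs(perplexities[i] - perplexities[i - 1])
        rpc.append((topic_counts[i], change / step))
    return rpc


def select_topic_number(topic_counts, perplexities):
    """Pick the first candidate whose RPC is smaller than the next one's.

    This stopping rule is an illustrative assumption, not taken from the abstract.
    Falls back to the largest candidate if RPC never turns upward.
    """
    rpc = rate_of_perplexity_change(topic_counts, perplexities)
    for (topics, rate), (_, next_rate) in zip(rpc, rpc[1:]):
        if rate < next_rate:
            return topics
    return rpc[-1][0]
```

The appeal of a rule like this is that it needs only one perplexity value per candidate topic number, rather than a full sensitivity study over other model parameters.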


Subjects
Computational Biology/methods; Data Mining/methods; Heuristics/physiology; Databases, Factual; High-Throughput Nucleotide Sequencing
8.
Chem Res Toxicol ; 28(9): 1784-95, 2015 Sep 21.
Article in English | MEDLINE | ID: mdl-26308263

ABSTRACT

Bisphenol A (BPA) replacement compounds are released to the environment and cause widespread human exposure. However, a lack of thorough safety evaluations on the BPA replacement compounds has raised public concerns. We assessed the endocrine disruption potential of BPA replacement compounds in the market to assist their safety evaluations. A literature search was conducted to ascertain the BPA replacement compounds in use. Available experimental estrogenic activity data of these compounds were extracted from the Estrogenic Activity Database (EADB) to assess their estrogenic potential. An in silico model was developed to predict the estrogenic activity of compounds lacking experimental data. Molecular dynamics (MD) simulations were performed to understand the mechanisms by which the estrogenic compounds bind to and activate the estrogen receptor (ER). Forty-five BPA replacement compounds were identified in the literature. Seven were more estrogenic and five less estrogenic than BPA, while six were nonestrogenic in EADB. A two-tier in silico model was developed based on molecular docking to predict the estrogenic activity of the 27 compounds lacking data. Eleven were predicted as ER binders and 16 as nonbinders. MD simulations revealed hydrophobic contacts and hydrogen bonds as the main interactions between ER and the estrogenic compounds.


Subjects
Benzhydryl Compounds/toxicity; Endocrine Disruptors/toxicity; Estrogens/pharmacology; Phenols/toxicity; Computer Simulation; Databases, Chemical; Molecular Dynamics Simulation
9.
Toxicol Sci ; 143(2): 333-48, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25349334

ABSTRACT

One endocrine disruption mechanism is through binding to nuclear receptors such as the androgen receptor (AR) and estrogen receptor (ER) in target cells. The concentration of a chemical in serum is important for its entry into the target cells to bind the receptors, which is regulated by the serum proteins. Human sex hormone-binding globulin (SHBG) is the major transport protein in serum that can bind androgens and estrogens and thus change a chemical's availability to enter the target cells. Sequestration of an androgen or estrogen in the serum can alter the chemical-elicited AR- and ER-mediated responses. To better understand chemical-induced endocrine activity, we developed a competitive binding assay using human pregnancy plasma and measured the binding to human SHBG for 125 structurally diverse chemicals, most of which were known to bind AR and ER. Eighty-seven chemicals were able to bind human SHBG in the assay, whereas 38 chemicals were nonbinders. Binding data for human SHBG are compared with those for rat α-fetoprotein, ER, and AR. Knowing the binding profiles between serum and nuclear receptors will improve assessment of a chemical's potential for endocrine disruption. The SHBG binding data reported here represent the largest data set of structurally diverse chemicals tested for human SHBG binding. Using the SHBG binding data together with AR and ER binding data could enable better evaluation of the endocrine disrupting potential of chemicals through AR- and ER-mediated responses, since sequestration in serum could be considered.


Subjects
Endocrine Disruptors/chemistry; Receptors, Androgen/chemistry; Receptors, Estrogen/chemistry; Sex Hormone-Binding Globulin/chemistry; alpha-Fetoproteins/chemistry; Binding, Competitive; Endocrine Disruptors/metabolism; Humans; Ligands; Models, Molecular; Protein Binding; Receptors, Androgen/metabolism; Receptors, Estrogen/metabolism; Sex Hormone-Binding Globulin/metabolism; Structure-Activity Relationship; alpha-Fetoproteins/metabolism
10.
BMC Bioinformatics ; 15 Suppl 11: S4, 2014.
Article in English | MEDLINE | ID: mdl-25349983

ABSTRACT

BACKGROUND: Endocrine disrupting chemicals (EDCs) are exogenous compounds that interfere with the endocrine system of vertebrates, often through direct or indirect interactions with nuclear receptor proteins. Estrogen receptors (ERs) are particularly important protein targets and many EDCs are ER binders, capable of altering normal homeostatic transcription and signaling pathways. An estrogenic xenobiotic can bind ER as either an agonist or antagonist to increase or inhibit transcription, respectively. The receptor conformations in the complexes of ER bound with agonists and antagonists are different and dependent on interactions with co-regulator proteins that vary across tissue type. Assessment of chemical endocrine disruption potential depends not only on binding affinity to ERs, but also on changes that may alter the receptor conformation and its ability to subsequently bind DNA response elements and initiate transcription. Using both agonist and antagonist conformations of the ERα, we developed an in silico approach that can be used to differentiate agonist versus antagonist status of potential binders. METHODS: The approach combined separate molecular docking models for ER agonist and antagonist conformations. The ability of this approach to differentiate agonists and antagonists was first evaluated using true agonists and antagonists extracted from the crystal structures available in the Protein Data Bank (PDB), and then further validated using a larger set of ligands from the literature. The usefulness of the approach was demonstrated with enrichment analysis in data sets with a large number of decoy ligands. RESULTS: The performance of individual agonist and antagonist docking models was found to be comparable to similar models in the literature. When combined in a competitive docking approach, they provided the ability to discriminate agonists from antagonists with good accuracy, as well as the ability to efficiently select true agonists and antagonists from decoys during enrichment analysis. CONCLUSION: This approach enables evaluation of potential ER biological function changes caused by chemicals bound to the receptor which, in turn, allows the assessment of a chemical's endocrine disrupting potential. The approach can be used not only by regulatory authorities to perform risk assessments on potential EDCs but also by the industry in drug discovery projects to screen for potential agonists and antagonists.
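
The idea of letting two docking models compete can be sketched as a simple decision rule. This is an illustration under stated assumptions, not the published method: it assumes each ligand has already been docked into both receptor conformations, that a lower (more negative) score means a better predicted fit, and that a hypothetical cutoff separates binders from decoys. The function name, the cutoff value, and the tie-breaking choice are all invented for this sketch.

```python
def classify_ligand(agonist_score, antagonist_score, binding_cutoff=-6.0):
    """Label a ligand from its docking scores in two ER conformations.

    Assumptions (not from the abstract): lower score = better fit,
    and binding_cutoff is a hypothetical threshold for calling a binder.
    """
    best = min(agonist_score, antagonist_score)
    if best > binding_cutoff:
        return "non-binder"  # neither conformation accommodates the ligand well
    # The conformation with the better score wins; ties go to agonist arbitrarily.
    return "agonist" if agonist_score <= antagonist_score else "antagonist"


if __name__ == "__main__":
    # Made-up scores for three hypothetical ligands.
    for name, scores in {"lig1": (-9.2, -7.1), "lig2": (-6.5, -10.3), "lig3": (-3.0, -4.0)}.items():
        print(name, classify_ligand(*scores))
```

In practice the scores would come from a docking program, and the cutoff would be calibrated against known binders and decoys, as the enrichment analysis in the abstract implies.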


Subjects
Endocrine Disruptors/chemistry; Estrogen Receptor Antagonists/chemistry; Estrogen Receptor alpha/agonists; Estrogen Receptor alpha/antagonists & inhibitors; Estrogens/chemistry; Molecular Docking Simulation/methods; Computer Simulation; Endocrine Disruptors/metabolism; Estrogen Receptor Antagonists/metabolism; Estrogen Receptor alpha/chemistry; Estrogen Receptor alpha/metabolism; Estrogens/metabolism; Ligands
11.
BMC Bioinformatics ; 15 Suppl 11: S6, 2014.
Article in English | MEDLINE | ID: mdl-25350283

ABSTRACT

BACKGROUND: Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina HiSeq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. METHODS: In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2 followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. RESULTS: We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all of 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean-only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean-only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean-only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean-only SNVs were highly enriched in some metabolic pathways. We found that adhesion is the top disease term associated with SNV-1 and that Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean-only SNVs are in genes that are associated with the drug term of adenosine. CONCLUSION: We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
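
The consensus and population-specific filtering described in the METHODS can be expressed with set operations. The sketch below is illustrative only: it assumes each SNV is keyed by a (chromosome, position, ref, alt) tuple and that variant calls have already been parsed into Python sets; it does not read the output formats of SAMtools or SOAPsnp, and the function names are hypothetical.

```python
def consensus_snvs(calls_pipeline_a, calls_pipeline_b):
    """Keep only SNVs reported by both calling pipelines for one individual."""
    return calls_pipeline_a & calls_pipeline_b


def population_specific(per_individual, reference_snvs):
    """Split population-only SNVs into 'at least one' and 'all individuals' sets.

    per_individual: non-empty list of consensus SNV sets, one per individual.
    reference_snvs: set of SNVs seen in the comparison populations.
    Returns (in_any, in_all), analogous to SNV-1 and SNV-35 in the abstract.
    """
    in_any = set().union(*per_individual) - reference_snvs
    in_all = set.intersection(*per_individual) - reference_snvs
    return in_any, in_all
```

Because `in_all` is always a subset of `in_any`, the second count can never exceed the first, which matches the relationship between the SNV-35 and SNV-1 totals reported above.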


Subjects
Asian People/genetics; Polymorphism, Single Nucleotide; Disease/genetics; Gene Ontology; Genetic Association Studies; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Korea; Mutation; Sequence Alignment; Sequence Analysis, DNA; Software
12.
Chem Res Toxicol ; 27(9): 1528-36, 2014 Sep 15.
Article in English | MEDLINE | ID: mdl-25083553

ABSTRACT

Toxicogenomics (TGx) endeavors to elucidate the underlying molecular mechanisms through exploring gene expression profiles in response to toxic substances. Recently, RNA-Seq has increasingly been regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from the complex TGx data. Considering read counts as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to a word-by-document matrix used in text mining. Topic modeling, which aims to discover the latent structures in text corpora, would be helpful for exploring RNA-Seq-based TGx data. In this study, topic modeling was applied to a typical RNA-Seq-based TGx data set to discover hidden functional modules. The RNA-Seq-based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated with compounds sharing the same modes of action (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with MoAs and then confirmed by compound and pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq-based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discover functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
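
The analogy between read counts and word counts can be made concrete with a short sketch. It shows one possible way to turn per-sample gene counts into "documents" and a word-by-document matrix; the function names, the optional scaling factor, and the example gene symbols are illustrative assumptions, and the abstract does not describe the authors' exact transformation.

```python
def counts_to_document(gene_counts, scale=1.0):
    """Turn {gene: read_count} into a token list, one token per (scaled) read.

    scale is a hypothetical down-weighting factor to keep documents short.
    """
    tokens = []
    for gene, count in gene_counts.items():
        tokens.extend([gene] * int(round(count * scale)))
    return tokens


def word_by_document_matrix(samples):
    """Build a shared vocabulary and one count row per sample.

    samples: {sample_name: {gene: read_count}}.
    Returns (vocabulary, {sample_name: [count per vocabulary gene]}).
    """
    vocabulary = sorted({gene for counts in samples.values() for gene in counts})
    matrix = {
        name: [counts.get(gene, 0) for gene in vocabulary]
        for name, counts in samples.items()
    }
    return vocabulary, matrix
```

Either representation can then be handed to an LDA implementation: token lists suit libraries that expect tokenized text, while the count matrix suits those that accept a document-term matrix directly.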


Subjects
RNA/chemistry; Toxicogenetics; Cluster Analysis; Oligonucleotide Array Sequence Analysis; Peroxisome Proliferator-Activated Receptors/genetics; Peroxisome Proliferator-Activated Receptors/metabolism; Receptors, Estrogen/genetics; Receptors, Estrogen/metabolism; Sequence Analysis, RNA; Signal Transduction; Transcriptome
13.
Int J Environ Res Public Health ; 11(9): 8709-42, 2014 Aug 26.
Article in English | MEDLINE | ID: mdl-25162709

ABSTRACT

The estrogen receptors (ERs) are a group of versatile receptors. They regulate an enormous range of processes starting in early life and continuing through sexual reproduction, development, and end of life. This review provides a background and structural perspective for the ERs as part of the nuclear receptor superfamily and discusses ER versatility and promiscuity. The wide repertoire of ER actions is mediated mostly through ligand-activated transcription factors and many DNA response elements in most tissues and organs. Their versatility, however, comes with the drawback of promiscuous interactions with structurally diverse exogenous chemicals, with potential for a wide range of adverse health outcomes. Even when interacting with endogenous hormones, ER actions can have adverse effects in disease progression. Finally, we discuss how nature controls ER specificity and how the subtle differences in receptor subtypes are exploited in pharmaceutical design to achieve binding specificity and subtype selectivity for the desired biological response. The intent of this review is to complement the large body of literature, with emphasis on the most recent developments in selective ER ligands.


Subjects
Receptors, Estrogen/metabolism; Transcription Factors/metabolism; Humans; Ligands; Receptors, Estrogen/chemistry; Receptors, Estrogen/classification
14.
Genome Biol ; 15(12): 523, 2014 Dec 03.
Article in English | MEDLINE | ID: mdl-25633159

ABSTRACT

BACKGROUND: The gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in the accrual of enormous amounts of data, predictive models, and biomarkers. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment? RESULTS: We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints, and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined. CONCLUSIONS: Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples, while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.


Subjects
Gene Expression Profiling/methods; Genetic Markers; RNA/analysis; Sequence Analysis, RNA; Algorithms; Animals; Computational Biology/methods; Humans; Models, Genetic; Oligonucleotide Array Sequence Analysis; Rats
15.
BMC Bioinformatics ; 14 Suppl 14: S6, 2013.
Article in English | MEDLINE | ID: mdl-24266910

ABSTRACT

BACKGROUND: An important mechanism of endocrine activity is chemicals entering target cells via transport proteins and then interacting with hormone receptors such as the estrogen receptor (ER). α-Fetoprotein (AFP) is a major transport protein in rodent serum that can bind and sequester estrogens, thus preventing their entry into the target cell, where they could otherwise induce ER-mediated endocrine activity. Recently, we reported rat AFP binding affinities for a large set of structurally diverse chemicals, including 53 binders and 72 non-binders. However, the lack of three-dimensional (3D) structures of rat AFP hinders further understanding of the structural dependence for binding. Therefore, a 3D structure of rat AFP was built using homology modeling in order to elucidate rat AFP-ligand binding modes through docking analyses and molecular dynamics (MD) simulations. METHODS: Homology modeling was first applied to build a 3D structure of rat AFP. Molecular docking and Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) scoring were then used to examine potential rat AFP ligand binding modes. MD simulations and free energy calculations were performed to refine models of binding modes. RESULTS: A rat AFP tertiary structure was first obtained using homology modeling and MD simulations. The rat AFP-ligand binding modes of 13 structurally diverse, representative binders were calculated using molecular docking, MM-GBSA ranking, and MD simulations. The key residues for rat AFP-ligand binding were postulated through analyzing the binding modes. CONCLUSION: The optimized 3D rat AFP structure and associated ligand binding modes shed light on rat AFP-ligand binding interactions that, in turn, provide a means to estimate binding affinity of unknown chemicals. Our results will assist in the evaluation of the endocrine disruption potential of chemicals.


Subjects
alpha-Fetoproteins/chemistry; Amino Acid Sequence; Animals; Binding Sites; Ligands; Models, Molecular; Molecular Dynamics Simulation; Molecular Sequence Data; Protein Structure, Tertiary; Rabbits; Rats; Sequence Alignment; alpha-Fetoproteins/metabolism
16.
Toxicol Sci ; 135(2): 277-91, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23897986

ABSTRACT

Endocrine-active chemicals can potentially have adverse effects on both humans and wildlife. They can interfere with the body's endocrine system through direct or indirect interactions with many protein targets. Estrogen receptors (ERs) are one of the major targets, and many endocrine disruptors are estrogenic and affect the normal estrogen signaling pathways. However, ERs can also serve as therapeutic targets for various medical conditions, such as menopausal symptoms, osteoporosis, and ER-positive breast cancer. Because of the decades-long interest in the safety and therapeutic utility of estrogenic chemicals, a large number of chemicals have been assayed for estrogenic activity, but these data exist in various sources and different formats that restrict the ability of regulatory and industry scientists to utilize them fully for assessing risk-benefit. To address this issue, we have developed an Estrogenic Activity Database (EADB; http://www.fda.gov/ScienceResearch/BioinformaticsTools/EstrogenicActivityDatabaseEADB/default.htm) and made it freely available to the public. EADB contains 18,114 estrogenic activity data points collected for 8212 chemicals tested in 1284 binding, reporter gene, cell proliferation, and in vivo assays in 11 different species. The chemicals cover a broad chemical structure space and the data span a wide range of activities. A set of tools allow users to access EADB and evaluate potential endocrine activity of chemicals. As a case study, a classification model was developed using EADB for predicting ER binding of chemicals.


Subjects
Chemical Databases , Endocrine Disruptors/toxicity , Endocrine Glands/drug effects , Estrogens/pharmacology , Animals , Humans
17.
Sci China Life Sci ; 56(2): 110-8, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23393026

ABSTRACT

Realizing personalized medicine requires integrating diverse data types with bioinformatics. At present, the most vital data are individuals' genomic information generated by advanced next-generation sequencing (NGS) technologies. These technologies continue to advance in terms of both decreasing cost and increasing sequencing speed, with a concomitant increase in the amount and complexity of the data. The prodigious data, together with the requisite computational pipelines for data analysis and interpretation, are stressors to IT infrastructure and to the scientists conducting the work alike. Bioinformatics is increasingly becoming the rate-limiting step, with numerous challenges to be overcome in translating NGS data for personalized medicine. We review some key bioinformatics tasks, issues, and challenges in the context of IT requirements, data quality, analysis tools and pipelines, and validation of biomarkers.


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Precision Medicine/statistics & numerical data , Statistical Data Interpretation , Database Management Systems , Human Genome , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Information Dissemination , Internet , DNA Sequence Analysis/statistics & numerical data , Software
18.
Chem Res Toxicol ; 25(11): 2553-66, 2012 Nov 19.
Article in English | MEDLINE | ID: mdl-23013281

ABSTRACT

Endocrine disrupting chemicals interfere with the endocrine system in animals, including humans, to exert adverse effects. One of the mechanisms of endocrine disruption is through the binding of receptors such as the estrogen receptor (ER) in target cells. The concentration of any chemical in serum is important for its entry into the target cells to bind the receptors. α-Fetoprotein (AFP) is a major transport protein in rodent serum that can bind with estrogens and thus change a chemical's availability for entrance into the target cell. Sequestration of an estrogen in the serum can alter the chemical's potential for disrupting estrogen receptor-mediated responses. To better understand endocrine disruption, we developed a competitive binding assay using rat amniotic fluid, which contains very high levels of AFP, and measured the binding to the rat AFP for 125 structurally diverse chemicals, most of which are known to bind ER. Fifty-three chemicals were able to bind the rat AFP in the assay, while 72 chemicals were determined to be nonbinders. Observations from closely examining the relationship between the binding data and structures of the tested chemicals are rationally explained in a manner consistent with proposed binding regions of rat AFP in the literature. The data reported here represent the largest data set of structurally diverse chemicals tested for rat AFP binding. The data assist in elucidating binding interactions and mechanisms between chemicals and rat AFP and, in turn, assist in the evaluation of the endocrine disrupting potential of chemicals.


Subjects
Organic Compounds/pharmacology , alpha-Fetoproteins/metabolism , Animals , Competitive Binding/drug effects , Drug Dose-Response Relationship , Female , Molecular Structure , Organic Compounds/chemistry , Rats , Sprague-Dawley Rats , Structure-Activity Relationship , alpha-Fetoproteins/chemistry
19.
PLoS One ; 7(9): e44483, 2012.
Article in English | MEDLINE | ID: mdl-22970228

ABSTRACT

During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e., the odds ratio) and the minor allele frequencies are low.
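
As an illustration of the concordance figures quoted in this abstract, the following minimal Python sketch computes genotype concordance between technical replicates. It is not the authors' pipeline: the `NO_CALL` marker, the function names, and the choice to exclude SNPs with a missing call in either replicate are assumptions made for the example.

```python
from itertools import combinations

NO_CALL = "NN"  # hypothetical marker for a missing genotype call


def concordance(calls_a, calls_b):
    """Fraction of SNPs called identically in two replicates.

    SNPs with a no-call in either replicate are excluded (an assumption).
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")
    compared = [(a, b) for a, b in zip(calls_a, calls_b)
                if a != NO_CALL and b != NO_CALL]
    if not compared:
        raise ValueError("no SNPs were called in both replicates")
    return sum(a == b for a, b in compared) / len(compared)


def mean_pairwise_concordance(replicates):
    """Average concordance over all pairs of technical replicates."""
    pairs = list(combinations(replicates, 2))
    return sum(concordance(a, b) for a, b in pairs) / len(pairs)
```

A replicate whose pairwise concordance with its siblings is clearly lower than the rest is the kind of low-quality array the abstract says was found only by comparing replicates.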


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Genotype , Humans , Reproducibility of Results
20.
BMC Bioinformatics ; 12 Suppl 10: S3, 2011 Oct 18.
Article in English | MEDLINE | ID: mdl-22166133

ABSTRACT

BACKGROUND: Genomic biomarkers play an increasing role in both preclinical and clinical application. Development of genomic biomarkers with microarrays is an area of intensive investigation. However, despite sustained and continuing effort, developing microarray-based predictive models (i.e., genomic biomarkers) capable of reliable prediction for an observed or measured outcome (i.e., endpoint) of unknown samples in preclinical and clinical practice remains a considerable challenge. No straightforward guidelines exist for selecting a single model that will perform best when presented with unknown samples. In the second phase of the MicroArray Quality Control (MAQC-II) project, 36 analysis teams produced a large number of models for 13 preclinical and clinical endpoints. Before external validation was performed, each team nominated one model per endpoint (referred to here as 'nominated models') from which MAQC-II experts selected 13 'candidate models' to represent the best model for each endpoint. Both the nominated and candidate models from MAQC-II provide benchmarks to assess other methodologies for developing microarray-based predictive models. METHODS: We developed a simple ensemble method by taking a number of the top performing models from cross-validation and developing an ensemble model for each of the MAQC-II endpoints. We compared the ensemble models with both nominated and candidate models from MAQC-II using blinded external validation. RESULTS: For 10 of the 13 MAQC-II endpoints originally analyzed by the MAQC-II data analysis team from the National Center for Toxicological Research (NCTR), the ensemble models achieved equal or better predictive performance than the NCTR nominated models. Additionally, the ensemble models had performance comparable to the MAQC-II candidate models. Most ensemble models also had better performance than the nominated models generated by five other MAQC-II data analysis teams that analyzed all 13 endpoints.
CONCLUSIONS: Our findings suggest that an ensemble method can often attain a higher average predictive performance in an external validation set than a corresponding "optimized" model method. Using an ensemble method to determine a final model is a potentially important supplement to the good modeling practices recommended by the MAQC-II project for developing microarray-based genomic biomarkers.
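
The general idea of building an ensemble from the top cross-validated models can be sketched in Python as follows. The abstract does not state how the selected models were combined, so the majority vote used here, along with the function names and data layout, is an assumption for illustration only.

```python
from collections import Counter


def select_top_models(cv_scores, k):
    """Return the names of the k models with the highest cross-validation score."""
    return sorted(cv_scores, key=cv_scores.get, reverse=True)[:k]


def ensemble_predict(predictions, members):
    """Majority vote per sample across the selected models' predicted labels.

    `predictions` maps a model name to its list of predicted labels
    (hypothetical layout). An odd number of members avoids ties for
    binary endpoints.
    """
    n_samples = len(predictions[members[0]])
    voted = []
    for i in range(n_samples):
        votes = Counter(predictions[m][i] for m in members)
        voted.append(votes.most_common(1)[0][0])
    return voted
```

In use, `cv_scores` would hold one cross-validation score per trained model, and the voted labels would then be compared against the held-out external validation set.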


Subjects
Genetic Models , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Toxicogenetics/methods , Gene Expression Profiling/methods , Humans , Meta-Analysis as Topic , Quality Control