ABSTRACT
The antibody gene mutator activation-induced cytidine deaminase (AID) promiscuously damages oncogenes, leading to chromosomal translocations and tumorigenesis. Why nonimmunoglobulin loci are susceptible to AID activity is unknown. Here, we study AID-mediated lesions in the context of nuclear architecture and the B cell regulome. We show that AID targets are not randomly distributed across the genome but are predominantly grouped within super-enhancers and regulatory clusters. Unexpectedly, in these domains, AID deaminates active promoters and eRNA(+) enhancers that are interconnected, in some instances over megabases of linear chromatin. Using genome editing, we demonstrate that 3D-linked targets cooperate to recruit AID-mediated breaks. Furthermore, a comparison of hypermutation in mouse B cells, AID-induced kataegis in human lymphomas, and translocations in MEFs reveals that AID damages different genes in different cell types. Yet, in all cases, the targets are predominantly associated with topologically complex, highly transcribed super-enhancers, demonstrating that these compartments are key mediators of AID recruitment.
Subjects
B-Lymphocytes/metabolism , Carcinogenesis , Cytidine Deaminase/genetics , Enhancer Elements, Genetic , Animals , DNA Damage , Humans , Lymphoma/metabolism , Mice
ABSTRACT
A key finding of the ENCODE project is that the enhancer landscape of mammalian cells undergoes marked alterations during ontogeny. However, the nature and extent of these changes are unclear. As part of the NIH Mouse Regulome Project, we here combined DNaseI hypersensitivity, ChIP-seq, and ChIA-PET technologies to map the promoter-enhancer interactomes of pluripotent ES cells and differentiated B lymphocytes. We confirm that enhancer usage varies widely across tissues. Unexpectedly, we find that this feature extends to broadly transcribed genes, including Myc and Pim1 cell-cycle regulators, which associate with an entirely different set of enhancers in ES and B cells. By means of high-resolution CpG methylomes, genome editing, and digital footprinting, we show that these enhancers recruit lineage-determining factors. Furthermore, we demonstrate that the turning on and off of enhancers during development correlates with promoter activity. We propose that organisms rely on a dynamic enhancer landscape to control basic cellular functions in a tissue-specific manner.
Subjects
B-Lymphocytes/metabolism , Embryonic Stem Cells/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Promoter Regions, Genetic , Regulon , Animals , Cell Lineage , Cells, Cultured , CpG Islands , DNA Methylation , Genetic Techniques , Mice , Organ Specificity , RNA, Long Noncoding/genetics , Transcription Factors/metabolism , Transcription, Genetic
ABSTRACT
Fifty years ago, Vincent Allfrey and colleagues discovered that lymphocyte activation triggers massive acetylation of chromatin. However, the molecular mechanisms driving epigenetic accessibility are still unknown. We here show that stimulated lymphocytes decondense chromatin in three differentially regulated steps. First, chromatin is repositioned away from the nuclear periphery in response to global acetylation. Second, histone nanodomain clusters decompact into mononucleosome fibers through a mechanism that requires Myc and continual energy input. Single-molecule imaging shows that this step lowers transcription factor residence time and non-specific collisions during sampling for DNA targets. Third, chromatin interactions shift from predominantly long range to predominantly short range, and CTCF-mediated loops and contact domains double in number. This architectural change facilitates cognate promoter-enhancer contacts and also requires Myc and continual ATP production. Our results thus define the nature and transcriptional impact of chromatin decondensation and reveal an unexpected role for Myc in the establishment of nuclear topology in mammalian cells.
Subjects
B-Lymphocytes/metabolism , Cell Cycle , Cell Nucleus/metabolism , Chromatin Assembly and Disassembly , Chromatin/metabolism , Histones/metabolism , Lymphocyte Activation , Proto-Oncogene Proteins c-myc/metabolism , Acetyl Coenzyme A/metabolism , Acetylation , Adenosine Triphosphate/metabolism , Animals , B-Lymphocytes/immunology , Cell Line , Chromatin/chemistry , Chromatin/genetics , DNA Methylation , Epigenesis, Genetic , Genotype , Histones/chemistry , Immunity, Humoral , Methylation , Mice, Inbred C57BL , Mice, Knockout , Nucleic Acid Conformation , Phenotype , Protein Interaction Domains and Motifs , Protein Processing, Post-Translational , Proto-Oncogene Proteins c-myc/chemistry , Proto-Oncogene Proteins c-myc/genetics , Single Molecule Imaging , Structure-Activity Relationship , Time Factors , Transcription, Genetic
ABSTRACT
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
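The abstract does not specify which statistic the TCRD/Pharos enrichment calculations use. As a minimal sketch, assuming a one-sided hypergeometric over-representation test (a common choice for this kind of analysis, not necessarily what Pharos implements), the P value for observing at least k annotated targets in a query subset of n targets, drawn from a background of N targets of which K carry the annotation, can be computed as below. The function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation P value, P(X >= k), X ~ Hypergeometric.

    Hypothetical helper, not part of TCRD or Pharos.
    N: targets in the background set
    K: background targets carrying the annotation (e.g. a disease association)
    n: targets in the user's query subset
    k: query targets carrying the annotation
    """
    total = comb(N, n)
    # Sum the probabilities of observing i = k, k+1, ..., min(n, K) annotated
    # targets. math.comb returns 0 when its second argument exceeds the first,
    # so impossible draws contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical example: 4 of 4 query targets annotated vs 5 of 20 in the background.
print(hypergeom_enrichment_p(k=4, n=4, K=5, N=20))
```

Exact integer arithmetic via `math.comb` avoids the underflow that multiplying many small probabilities can cause. For large backgrounds, a library implementation of the hypergeometric survival function would be the more practical choice.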
Subjects
Databases, Factual , Molecular Targeted Therapy , Proteome , Humans , Biological Products , Drug Discovery , Internet , Proteome/drug effects
ABSTRACT
Epidemiology studies evaluate associations between the metabolome and disease risk. Urine is a common biospecimen for such studies because of its wide availability and non-invasive collection. Evaluating the robustness of urinary metabolomic profiles under varying preanalytical conditions is therefore of interest. Here we evaluate the impact of sample handling conditions on urine metabolome profiles relative to the gold standard condition (no preservative, no refrigeration storage, single freeze-thaw). Conditions tested included borate or chlorhexidine preservatives and various storage conditions and freeze/thaw cycles. We demonstrate that sample handling conditions affect metabolite levels, with borate showing the largest impact: 125 of 1048 metabolites were altered (adjusted P < 0.05). When simulating a case-control study with expected inconsistencies in sample handling, we predicted the number of false positive altered metabolites to be low (< 11). Predicted false positives increased substantially (≥ 63) when cases were simulated to undergo alternate handling. Finally, we demonstrate that the effects of sample handling on the urinary metabolome were markedly smaller than those on the serum metabolome. While changes in urine metabolites incurred by sample handling are generally small, we recommend implementing consistent handling conditions and evaluating the robustness of measurements for metabolites showing significant associations with disease outcomes.
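The abstract reports metabolites as altered at an adjusted P < 0.05 but does not name the multiple-testing correction. Assuming the Benjamini-Hochberg false discovery rate procedure (an assumption, chosen only because it is widely used in metabolomics), the adjustment and the resulting count of altered metabolites could be sketched as follows. The helper names and P values are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P values (q values), returned in input order.

    Assumed correction; the abstract only states "adjusted P < 0.05".
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so the q values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_altered(pvalues: list[float], alpha: float = 0.05) -> int:
    """Number of metabolites whose adjusted P value falls below alpha."""
    return sum(q < alpha for q in benjamini_hochberg(pvalues))


# Hypothetical P values for four metabolites under one handling condition.
print(count_altered([0.01, 0.04, 0.03, 0.5]))  # prints 1
```

Applied per handling condition against the gold-standard condition, a count like this corresponds to the "125 of 1048" figure in the abstract, under the stated assumption about the correction method.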
ABSTRACT
MOTIVATION: Functional interpretation of high-throughput metabolomic and transcriptomic results is a crucial step in generating insight from experimental data. However, pathway and functional information for genes and metabolites is distributed among many siloed resources, limiting the scope of analyses that rely on a single knowledge source. RESULTS: RaMP-DB 2.0 is a web interface, relational database, API and R package designed for straightforward and comprehensive functional interpretation of metabolomic and multi-omic data. RaMP-DB 2.0 has been upgraded with an expanded breadth and depth of functional and chemical annotations (ClassyFire, LIPID MAPS, SMILES, InChIs, etc.), and new data types related to metabolites and lipids have been incorporated. To streamline entity resolution across multiple source databases, we have implemented a new semi-automated process, thereby lessening the burden of harmonization and supporting more frequent updates. The associated RaMP-DB 2.0 R package now supports queries on pathways, common reactions (e.g. metabolite-enzyme relationships), chemical functional ontologies, chemical classes and chemical structures, as well as enrichment analyses on pathways (multi-omic) and chemical classes. Lastly, the RaMP-DB web interface has been completely redesigned using the Angular framework. AVAILABILITY AND IMPLEMENTATION: The code used to build all components of RaMP-DB 2.0 is freely available on GitHub at https://github.com/ncats/ramp-db, https://github.com/ncats/RaMP-Client/ and https://github.com/ncats/RaMP-Backend. The RaMP-DB web application can be accessed at https://rampdb.nih.gov/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Metabolomics , Software , Databases, Factual , Gene Expression Profiling , Knowledge Bases , Proteins
ABSTRACT
Prebiotic galactooligosaccharides (GOS) reduce anxiety-like behaviors in mice and humans. However, the biological pathways behind these behavioral changes are not well understood. To begin to study these pathways, we used C57BL/6 mice that were fed a standard diet with or without GOS supplementation for 3 weeks prior to testing in the open field. After behavioral testing, colonic contents and serum were collected for bacteriome (16S rRNA gene sequencing, colonic contents only) and metabolome (UPLC-MS, colonic contents and serum) analyses. As expected, GOS significantly reduced anxiety-like behavior (i.e., increased time in the center) and decreased cytokine gene expression (Tnfa and Ccl2) in the prefrontal cortex. Notably, time in the center of the open field was significantly correlated with serum methyl-indole-3-acetic acid (methyl-IAA). This metabolite is a methylated form of indole-3-acetic acid (IAA), which is derived from bacterial metabolism of tryptophan. Sequencing analyses showed that GOS significantly increased Lachnospiraceae UCG006 and Akkermansia; these taxa are known to metabolize both GOS and tryptophan. To determine the extent to which methyl-IAA can affect anxiety-like behavior, mice were intraperitoneally injected with methyl-IAA. Mice given methyl-IAA showed reduced anxiety-like behavior in the open field, along with lower Tnfa expression in the prefrontal cortex. Methyl-IAA was also found to reduce TNF-α (as well as CCL2) production by LPS-stimulated BV2 microglia. Together, these data support a novel pathway through which GOS reduces anxiety-like behaviors in mice and suggest that the bacterial metabolite methyl-IAA reduces microglial cytokine and chemokine production, which in turn reduces anxiety-like behavior.
Subjects
Anxiety , Gastrointestinal Microbiome , Mice, Inbred C57BL , Microglia , Oligosaccharides , Prefrontal Cortex , Tryptophan , Animals , Anxiety/metabolism , Mice , Microglia/metabolism , Tryptophan/metabolism , Gastrointestinal Microbiome/drug effects , Gastrointestinal Microbiome/physiology , Male , Prefrontal Cortex/metabolism , Oligosaccharides/metabolism , Oligosaccharides/pharmacology , Oligosaccharides/administration & dosage , Behavior, Animal/drug effects , Prebiotics/administration & dosage , Colon/metabolism , Tumor Necrosis Factor-alpha/metabolism , Chemokine CCL2/metabolism
ABSTRACT
The United States has a complex regulatory scheme for marketing drugs. Understanding drug regulatory status is a daunting task that requires integrating data from many sources, including the United States Food and Drug Administration (FDA), US government publications, and other processes related to drug development. At NCATS, we created Inxight Drugs (https://drugs.ncats.io), a web resource that attempts to address this challenge in a systematic manner. NCATS Inxight Drugs incorporates and unifies a wealth of data, including those supplied by the FDA and from independent public sources. The database offers a substantial amount of manually curated literature data unavailable from other sources. Currently, the database contains 125 036 product ingredients, including 2566 US approved drugs, 6242 marketed drugs, and 9684 investigational drugs. All substances are rigorously defined according to the ISO 11238 standard to comply with existing regulatory standards for unique drug substance identification. A special emphasis was placed on capturing manually curated and referenced data on treatment modalities and semantic relationships between substances. A supplementary resource, 'Novel FDA Drug Approvals', features regulatory details of newly approved FDA drugs. The database is regularly updated using the NCATS Stitcher data integration tool, which automates data aggregation, and supports full data access through a RESTful API.
Subjects
Databases, Factual , Databases, Pharmaceutical , Pharmaceutical Preparations/classification , United States Food and Drug Administration , Humans , National Center for Advancing Translational Sciences (U.S.) , Translational Research, Biomedical/classification , United States
ABSTRACT
BACKGROUND: Identifying individuals with a higher risk of developing severe coronavirus disease 2019 (COVID-19) outcomes will inform targeted and more intensive clinical monitoring and management. To date, there is mixed evidence regarding the impact of preexisting autoimmune disease (AID) diagnosis and/or immunosuppressant (IS) exposure on developing severe COVID-19 outcomes. METHODS: A retrospective cohort of adults diagnosed with COVID-19 was created in the National COVID Cohort Collaborative enclave. Two outcomes, life-threatening disease and hospitalization, were evaluated by using logistic regression models with and without adjustment for demographics and comorbidities. RESULTS: Of the 2 453 799 adults diagnosed with COVID-19, 191 520 (7.81%) had a preexisting AID diagnosis and 278 095 (11.33%) had a preexisting IS exposure. Logistic regression models adjusted for demographics and comorbidities demonstrated that individuals with a preexisting AID (odds ratio [OR], 1.13; 95% confidence interval [CI]: 1.09-1.17; P < .001), IS exposure (OR, 1.27; 95% CI: 1.24-1.30; P < .001), or both (OR, 1.35; 95% CI: 1.29-1.40; P < .001) were more likely to have a life-threatening disease. These results were consistent when hospitalization was evaluated. A sensitivity analysis evaluating specific IS revealed that tumor necrosis factor inhibitors were protective against life-threatening disease (OR, 0.80; 95% CI: .66-.96; P = .017) and hospitalization (OR, 0.80; 95% CI: .73-.89; P < .001). CONCLUSIONS: Patients with preexisting AID, IS exposure, or both are more likely to have a life-threatening disease or hospitalization. These patients may thus require tailored monitoring and preventative measures to minimize negative consequences of COVID-19.
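The odds ratios above come from logistic regression models adjusted for demographics and comorbidities, which cannot be reproduced from the abstract alone. As an illustrative sketch only, the code below computes an unadjusted odds ratio with its Wald 95% confidence interval from a 2 × 2 table, and converts a logistic regression coefficient β into an odds ratio (OR = exp(β)). All counts and coefficients are hypothetical.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be nonzero.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


def coefficient_to_or(beta: float) -> float:
    """In logistic regression, exp(beta) is the odds ratio for a one-unit change."""
    return math.exp(beta)


# Hypothetical counts: 20 of 100 exposed vs 10 of 100 unexposed patients with the outcome.
print(odds_ratio_wald_ci(20, 80, 10, 90))
```

An OR below 1 whose confidence interval excludes 1, as reported for tumor necrosis factor inhibitors, is what the abstract interprets as a protective association.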
Subjects
Autoimmunity , COVID-19 , Adult , Humans , COVID-19/epidemiology , Retrospective Studies , Hospitalization , Immunosuppressive Agents/therapeutic use
ABSTRACT
Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.
Subjects
Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Single-Cell Analysis/methods , Animals , Humans
ABSTRACT
Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Of interest, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of the microbiome, it is still challenging to analyse and interpret omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded by the large size of the datasets, pose a tremendous challenge to mechanistically elucidating these complex communities. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have recently been proposed to improve this understanding. Here, we systematically illustrate the network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods for microbial studies at multiple layers, from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective on future directions in support of the understanding of microbial communities.
Subjects
High-Throughput Screening Assays/methods , Metabolomics/methods , Metagenomics/methods , Microbiota , Proteomics/methods , Transcriptome , Humans
ABSTRACT
BACKGROUND: The United Nations recently made a call to address the challenges of an estimated 300 million persons worldwide living with a rare disease through the collection, analysis, and dissemination of disaggregated data. Epidemiologic information (EI) regarding the prevalence and incidence of rare diseases is sparse, and current paradigms of identifying, extracting, and curating EI rely upon time-intensive, error-prone manual processes. These limitations hamper a clear understanding of the variation in epidemiology and outcomes for rare disease patients. This in turn challenges the public health of rare disease patients through a lack of the information necessary to prioritize research, policy decisions, therapeutic development, and health system allocations. METHODS: In this study, we developed a newly curated epidemiology corpus for Named Entity Recognition (NER), a deep learning framework, and a novel rare disease epidemiologic information pipeline named EpiPipeline4RD, consisting of a web interface and a RESTful API. For the corpus creation, we programmatically gathered a representative sample of rare disease epidemiologic abstracts, used weakly supervised machine learning techniques to label the dataset, and manually validated the labeled dataset. For the deep learning framework, we fine-tuned the BioBERT model on our dataset for NER. We measured the performance of our BioBERT model for epidemiology entity recognition quantitatively with precision, recall, and F1, and qualitatively through a comparison with Orphanet. We demonstrated the ability of our pipeline to gather, identify, and extract epidemiology information from rare disease abstracts through three case studies. RESULTS: We developed a deep learning model that extracts EI with overall F1 scores of 0.817 and 0.878, evaluated at the entity level and token level, respectively, and that achieved qualitative results comparable to Orphanet's collection paradigm.
Additionally, case studies of the rare diseases classic homocystinuria, GRACILE syndrome, and phenylketonuria demonstrated adequate recall of abstracts with epidemiology information, high precision of epidemiology information extraction by our deep learning model, and the increased efficiency of EpiPipeline4RD compared with a manual curation paradigm. CONCLUSIONS: EpiPipeline4RD demonstrated high performance of EI extraction from rare disease literature to augment manual curation processes. This automated information curation paradigm will not only effectively empower development of the NIH Genetic and Rare Diseases Information Center (GARD) but also support the public health of the rare disease community.
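The entity-level and token-level F1 scores quoted above differ in what counts as a correct prediction. The sketch below illustrates the two conventions, assuming exact span matching at the entity level and BIO-style tags at the token level. The label names LOC and STAT, the helper names, and the example sentence are hypothetical and are not taken from EpiPipeline4RD. A prediction with the right label but a slightly wrong boundary counts as a full error at the entity level but still earns partial credit at the token level, which is one way a token-level score can exceed an entity-level one.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def entity_level(gold: set, pred: set) -> tuple[float, float, float]:
    """Entities are (start, end, label) spans, end exclusive; only exact matches count."""
    tp = len(gold & pred)
    return precision_recall_f1(tp, len(pred - gold), len(gold - pred))


def token_level(gold_tags: list[str], pred_tags: list[str], outside: str = "O"):
    """Each non-"O" token tag is scored on its own."""
    tp = fp = fn = 0
    for g, p in zip(gold_tags, pred_tags):
        if p != outside and p == g:
            tp += 1
        else:
            if p != outside:
                fp += 1  # predicted an entity tag that is wrong or spurious
            if g != outside:
                fn += 1  # missed (or mislabeled) a gold entity token
    return precision_recall_f1(tp, fp, fn)


# Hypothetical 7-token sentence; the predicted STAT span overruns by one token.
gold_tags = ["B-LOC", "I-LOC", "O", "O", "O", "B-STAT", "O"]
pred_tags = ["B-LOC", "I-LOC", "O", "O", "O", "B-STAT", "I-STAT"]
gold_spans = {(0, 2, "LOC"), (5, 6, "STAT")}
pred_spans = {(0, 2, "LOC"), (5, 7, "STAT")}
print(entity_level(gold_spans, pred_spans))  # (0.5, 0.5, 0.5)
print(token_level(gold_tags, pred_tags))
```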
Subjects
Acidosis, Lactic , Cholestasis , Humans , Rare Diseases/diagnosis , Rare Diseases/epidemiology , Public Health , Information Storage and Retrieval
ABSTRACT
In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets based on the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos in order to empower users to find new areas of study in the druggable genome.
Subjects
Databases, Factual , Genome, Human , Neurodegenerative Diseases/genetics , Proteomics/methods , Software , Virus Diseases/genetics , Animals , Anticonvulsants/chemistry , Anticonvulsants/therapeutic use , Antiviral Agents/chemistry , Antiviral Agents/therapeutic use , Biological Products/chemistry , Biological Products/therapeutic use , Data Mining/statistics & numerical data , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/genetics , Humans , Internet , Machine Learning/statistics & numerical data , Mice , Mice, Knockout , Molecular Targeted Therapy/methods , Neurodegenerative Diseases/classification , Neurodegenerative Diseases/drug therapy , Neurodegenerative Diseases/virology , Protein Interaction Mapping , Proteome/agonists , Proteome/antagonists & inhibitors , Proteome/genetics , Proteome/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/therapeutic use , Virus Diseases/classification , Virus Diseases/drug therapy , Virus Diseases/virology
ABSTRACT
Understanding the molecular underpinnings of disease severity and progression in human studies is necessary to develop metabolism-related preventative strategies for severe COVID-19. Metabolites and metabolic pathways that predispose individuals to severe disease are not well understood. In this study, we generated comprehensive plasma metabolomic profiles in >550 patients from the Longitudinal EMR and Omics COVID-19 Cohort. Samples were collected before (n = 441), during (n = 86), and after (n = 82) COVID-19 diagnosis, representing 555 distinct patients, most of whom had a single timepoint. Regression models adjusted for demographics, risk factors, and comorbidities were used to determine metabolites associated with predisposition to and/or persistent effects of COVID-19 severity, and metabolite changes that were transient or lingering over the disease course. Sphingolipids/phospholipids were negatively associated with severity and exhibited lingering elevations after disease, while modified nucleotides were positively associated with severity and had lingering decreases after disease. Cytidine and uridine metabolites, which were positively and negatively associated with COVID-19 severity, respectively, were acutely elevated, reflecting the particular importance of pyrimidine metabolism in active COVID-19. This is the first large metabolomics study using COVID-19 plasma samples collected before, during, and/or after disease. Our results lay the groundwork for identifying putative biomarkers and preventive strategies for severe COVID-19.
Subjects
COVID-19 , Nucleotides , Humans , Kynurenine , COVID-19 Testing , Prospective Studies , Phospholipids
ABSTRACT
Consortium-based research is crucial for producing reliable, high-quality findings, but existing tools for consortium studies have important drawbacks with respect to data protection, ease of deployment, and analytical rigor. To address these concerns, we developed COnsortium of METabolomics Studies (COMETS) Analytics to support and streamline consortium-based analyses of metabolomics and other -omics data. The application requires no specialized expertise and can be run locally to guarantee data protection or through a Web-based server for convenience and speed. Unlike other Web-based tools, COMETS Analytics enables standardized analyses to be run across all cohorts, using an algorithmic, reproducible approach to diagnose, document, and fix model issues. This eliminates the time-consuming and potentially error-prone step of manually customizing models by cohort, helping to accelerate consortium-based projects and enhancing analytical reproducibility. We demonstrated that the application scales well by performing 2 data analyses in 45 cohort studies that together comprised measurements of 4,647 metabolites in up to 134,742 participants. COMETS Analytics performed well in this test, as judged by the minimal errors that analysts had in preparing data inputs and the successful execution of all models attempted. As metabolomics gathers momentum among biomedical and epidemiologic researchers, COMETS Analytics may be a useful tool for facilitating large-scale consortium-based research.
Subjects
Academies and Institutes/organization & administration , Data Analysis , Epidemiologic Studies , Metabolomics/methods , Algorithms , Humans , Internet , Software Design
ABSTRACT
In the event of an outbreak due to an emerging pathogen, time is of the essence to contain or to mitigate the spread of the disease. Drug repositioning is one of the strategies that has the potential to deliver therapeutics relatively quickly. The SARS-CoV-2 pandemic has shown that integrating critical data resources to drive drug-repositioning studies, involving host-host, host-pathogen, and drug-target interactions, remains a time-consuming effort that translates to a delay in the development and delivery of a life-saving therapy. Here, we describe a workflow we designed for a semiautomated integration of rapidly emerging data sets that can be generally adopted in a broad network pharmacology research setting. The workflow was used to construct a COVID-19 focused multimodal network that integrates 487 host-pathogen, 63 278 host-host protein, and 1221 drug-target interactions. The resultant Neo4j graph database named "Neo4COVID19" is made publicly accessible via a web interface and via API calls based on the Bolt protocol. Details for accessing the database are provided on a landing page (https://neo4covid19.ncats.io/). We believe that our Neo4COVID19 database will be a valuable asset to the research community and will catalyze the discovery of therapeutics to fight COVID-19.
Subjects
COVID-19 , Drug Repositioning , Humans , Network Pharmacology , Pandemics , SARS-CoV-2 , Workflow
ABSTRACT
Membrane permeability plays an important role in oral drug absorption. Caco-2 and Madin-Darby Canine Kidney (MDCK) cell culture systems have been widely used for assessing intestinal permeability. Because most drugs are absorbed passively, the Parallel Artificial Membrane Permeability Assay (PAMPA) has gained popularity in early drug discovery as a low-cost, high-throughput alternative to costly, labor-intensive cell-based assays. At the National Center for Advancing Translational Sciences (NCATS), PAMPA pH 5 is employed as one of the Tier I absorption, distribution, metabolism, and elimination (ADME) assays. In this study, we developed a quantitative structure-activity relationship (QSAR) model using our ~6500-compound PAMPA pH 5 permeability dataset. Along with ensemble decision tree-based methods such as Random Forest and eXtreme Gradient Boosting, we employed a deep neural network and a graph convolutional neural network to model PAMPA pH 5 permeability. The classification models trained on a balanced training set provided accuracies ranging from 71% to 78% on the external set. Of the four classifiers, the graph convolutional neural network, which operates directly on molecular graphs, offered the best classification performance. Additionally, an ~85% correlation was obtained between PAMPA pH 5 permeability and in vivo oral bioavailability in mice and rats. These results suggest that data from this assay (experimental or predicted) can be used to rank-order compounds for preclinical in vivo testing with a high degree of confidence, reducing cost and attrition as well as accelerating the drug discovery process. Additionally, experimental data for 486 compounds (PubChem AID: 1645871) and the best models have been made publicly available (https://opendata.ncats.nih.gov/adme/).
Subjects
Betamethasone/pharmacokinetics , Dexamethasone/pharmacokinetics , Ranitidine/pharmacokinetics , Verapamil/pharmacokinetics , Administration, Oral , Animals , Betamethasone/administration & dosage , Biological Availability , Caco-2 Cells , Cell Membrane Permeability/drug effects , Cells, Cultured , Dexamethasone/administration & dosage , Dogs , Dose-Response Relationship, Drug , Humans , Hydrogen-Ion Concentration , Madin Darby Canine Kidney Cells , Mice , Molecular Structure , Neural Networks, Computer , Ranitidine/administration & dosage , Rats , Structure-Activity Relationship , Verapamil/administration & dosage
ABSTRACT
BACKGROUND: Assigning chromatin states genome-wide (e.g. promoters, enhancers, etc.) is commonly performed to improve functional interpretation of these states. However, computational methods to assign chromatin state suffer from two drawbacks: they typically require data from multiple assays, which may not be practically feasible to obtain, and they depend on peak calling algorithms, which require careful parameterization and often exclude the majority of the genome. To address these drawbacks, we propose Self-Organizing Map with Variable Neighborhoods (SOM-VN), a novel learning technique built upon the Self-Organizing Map (SOM), which learns a set of representative shapes from a single genome-wide chromatin accessibility dataset and associates each shape with a chromatin state in which a particular regulatory element (RE) is prevalent. These shapes can then be used to assign chromatin state using our workflow. RESULTS: We validate the performance of the SOM-VN workflow on 14 different samples of varying quality, namely one assay each of the A549 and GM12878 cell lines and two each of the H1 and HeLa cell lines, primary B-cells, and brain, heart, and stomach tissue. We show that SOM-VN learns shapes that are (1) non-random, (2) associated with known chromatin states, (3) generalizable across sets of chromosomes, and (4) associated with magnitude and multimodality. We compare the accuracy of SOM-VN chromatin states against the Clustering Aggregation Tool (CAGT), an unsupervised method that learns chromatin accessibility signal shapes but does not associate these shapes with REs, and we show that overall precision and recall are higher when learning shapes using SOM-VN than when using CAGT. We further compare enhancer state assignments from SOM-VN in signals above a set threshold to enhancer state assignments from Predicting Enhancers from ATAC-seq Data (PEAS), a deep learning method that assigns enhancer chromatin states to peaks.
We show that the precision-recall area under the curve for the assignment of enhancer states is comparable to that of PEAS. CONCLUSIONS: Our work shows that the SOM-VN workflow can learn relationships between REs and chromatin accessibility signal shape, which is an important step toward the goal of assigning and comparing enhancer states across multiple experiments and phenotypic states.
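SOM-VN's variable-neighborhood scheme is not described here in enough detail to reproduce. For orientation only, the sketch below trains a classic Kohonen SOM on fixed-length signal profiles and then assigns each profile to its closest learned shape. It uses a Gaussian neighborhood that shrinks linearly over training, which is a swapped-in simplification and not SOM-VN. The toy "peak" and "flat" profiles, the grid size, and the function names are hypothetical stand-ins for chromatin accessibility windows.

```python
import numpy as np


def train_som(data, grid_shape=(3, 3), epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a classic SOM; returns one prototype shape per map node.

    Simplified stand-in: a Gaussian neighborhood that shrinks over time,
    not SOM-VN's variable neighborhoods.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n_nodes = rows * cols
    # (row, col) position of every node on the 2-D map.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize prototypes with randomly chosen input profiles.
    weights = data[rng.choice(len(data), size=n_nodes, replace=True)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            x = data[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighborhood radius shrinks
            # Best-matching unit: node whose prototype is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighborhood on the map grid around the BMU.
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull every prototype toward x, weighted by its neighborhood value.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign(data, weights):
    """Index of the closest learned shape for each profile."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


# Hypothetical toy profiles: 20 "peak" windows and 20 "flat" windows of 5 bins.
rng = np.random.default_rng(1)
peak = np.array([0.0, 1.0, 4.0, 1.0, 0.0]) + rng.normal(0, 0.05, (20, 5))
flat = np.full((20, 5), 2.0) + rng.normal(0, 0.05, (20, 5))
profiles = np.vstack([peak, flat])
shapes = train_som(profiles, grid_shape=(2, 2), epochs=30)
print(assign(profiles, shapes))
```

In a workflow like the one described, each learned shape would then be labeled with the chromatin state whose RE it most often overlaps. That labeling step is omitted here because the abstract does not specify it.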
Subjects
Chromatin , Enhancer Elements, Genetic , Promoter Regions, Genetic , Adult , Algorithms , Child, Preschool , Chromatin/genetics , HeLa Cells , Humans , Young Adult
ABSTRACT
After publication of this supplement article [1], the authors requested that the grant ID in the Funding section be corrected from NSF grant IIS-7811367 to NSF grant IIS-1902617. Therefore, the correct 'Funding' section of this article should read: We thank the National Science Foundation (NSF grant IIS-1902617) for the financial support of ICIBM 2019. This article has not received sponsorship for publication.
ABSTRACT
An amendment to this paper has been published and can be accessed via the original article.