ABSTRACT
Classical hydrocarbon scaffolds have long assisted in bringing new molecules to the market for a variety of applications, but one notable omission is that of tetraasteranes, which are homologues of cubanes belonging to a class of polycyclic hydrocarbon cage compounds. Tetraasteranes exhibit potential as scaffolds in drug discovery due to their identical cyclobutane structures and rigid conformation resembling cubanes. Based on the studies of the physical and chemical properties of tetraasteranes by density functional theory, three series of compounds were designed as homologues of cubanes by the substitution of cubane scaffolds in pharmaceuticals with tetraasteranes. Their potential for pharmaceutical applications was evaluated in silico by molecular docking and dynamics simulations. Their pharmacokinetic and physicochemical properties were studied by the ADMET (absorption, distribution, metabolism, excretion, and toxicity) analysis. The results indicate that tetraasteranes may be scaffolds as novel bioisosteres of cubanes, as well as hydrogen bond donors or acceptors, which enhance the affinity between ligands and receptors with more stable binding behavior and feasible tolerability in ADMET. All these findings provide new opportunities for tetraasteranes to serve as effective pharmaceutical scaffolds for drug discovery and to accelerate the drug discovery process by repurposing both new and old commercial compounds.
ABSTRACT
Human glutaminyl cyclase (hQC) inhibitors have great potential as anti-Alzheimer's disease (AD) agents because they reduce the toxic pyroform of β-amyloid in the brains of AD patients. A four-dimensional quantitative structure-activity relationship (4D-QSAR) model of N-substituted ureas/thioureas was established with satisfactory predictive ability and statistical reliability (Q2 = 0.521, R2 = 0.933, R2pred = 0.619). Using the developed 4D-QSAR model, a set of new N-substituted ureas/thioureas was designed and evaluated for their absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. The results of molecular dynamics (MD) simulations, principal component analysis (PCA), free energy landscape (FEL), dynamic cross-correlation matrix (DCCM), and molecular mechanics generalized Born Poisson-Boltzmann surface area (MM-PBSA) free energy calculations revealed that the designed compounds remained stable in the protein binding pocket and that compounds b ∼ f (-35.1 to -44.55 kcal/mol) showed more favorable binding free energies than compound 14 (-33.51 kcal/mol). The findings of this work provide a theoretical foundation for further research and experimental validation of urea/thiourea derivatives as hQC inhibitors.
Subject(s)
Aminoacyltransferases, Enzyme Inhibitors, Molecular Dynamics Simulation, Quantitative Structure-Activity Relationship, Thiourea, Urea, Humans, Thiourea/chemistry, Thiourea/pharmacology, Thiourea/analogs & derivatives, Urea/chemistry, Urea/analogs & derivatives, Urea/pharmacology, Aminoacyltransferases/antagonists & inhibitors, Aminoacyltransferases/metabolism, Enzyme Inhibitors/chemistry, Enzyme Inhibitors/pharmacology, Molecular Structure, Drug Design
ABSTRACT
Graph-based pangenomes are gaining popularity over linear pangenomes because they store more comprehensive information about variations. However, traditional linear genome browsers have their own advantages, especially the tremendous resources accumulated historically. With the fast-growing number of individual genomes and annotations available, there is increasing demand for a genome browser that visualizes genome annotations for many individuals together with a graph-based pangenome. Here we report PPanG, a precise pangenome browser enabling nucleotide-level comparison of individual genome annotations together with a graph-based pangenome. Nine rice genomes with annotations are provided by default as potential references, and any individual genome can be selected as the reference. Our pangenome browser provides insights into genome variations at different levels, from base to gene, and reveals how the structure of a gene can differ between individuals. PPanG can be applied to any species with multiple individual genomes available and is available at https://cgm.sjtu.edu.cn/PPanG.
Subject(s)
Genomics, Genomics/methods, Oryza/genetics, Molecular Sequence Annotation, Plant Genome, Genetic Variation, Software, Web Browser, Genetic Databases, Nucleotides/genetics, Genome
Subject(s)
Arachis, Plant Genome, Arachis/genetics, Arachis/microbiology, Plant Genome/genetics, Tetraploidy
ABSTRACT
BACKGROUND: The epidermal growth factor receptor (EGFR) protein has been intensively studied as a therapeutic target for non-small cell lung cancer (NSCLC). Aminobenzimidazole derivatives, as fourth-generation EGFR inhibitors, have achieved promising results and overcome EGFR mutations at C797S, del19 and T790M in NSCLC. OBJECTIVE: To understand the quantitative structure-activity relationship (QSAR) of aminobenzimidazole derivatives as EGFRdel19 T790M C797S inhibitors, four-dimensional QSAR (4D-QSAR) and multivariate image analysis QSAR (MIA-QSAR) were performed on the data of 45 known aminobenzimidazole derivatives. METHODS: The 4D-QSAR descriptors were acquired by calculating the association energies between probes and aligned conformational ensemble profiles (CEP), and the regression models were established by partial least squares (PLS). To further understand and verify the 4D-QSAR model, an MIA-QSAR model was constructed by using chemical structure images to generate descriptors, followed by PLS regression. Furthermore, molecular docking and averaged noncovalent interactions (aNCI) analysis were performed to further understand the interactions between ligands and the EGFR targets; the results were in good agreement with the 4D-QSAR model. RESULTS: The established 4D-QSAR and MIA-QSAR models have strong stability and good external prediction ability. CONCLUSION: These results will provide theoretical guidance for the research and development of aminobenzimidazole derivatives as new EGFRdel19 T790M C797S inhibitors.
Subject(s)
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, Quantitative Structure-Activity Relationship, Molecular Docking Simulation, ErbB Receptors/genetics, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/chemistry, Mutation, Antineoplastic Drug Resistance
ABSTRACT
In this study, the visible-light-driven [2 + 2] photocycloaddition of 1,4-dihydropyrazines in solution is reported. N,N'-Diacyl-1,4-dihydropyrazines with different substituents showed completely different reactivity under irradiation by a 430 nm blue light-emitting diode (LED) lamp. N,N'-Diacetyl-1,4-dihydropyrazine and N,N'-dipropionyl-1,4-dihydropyrazine were the only compounds capable of undergoing a [2 + 2] photocycloaddition reaction, yielding syn-dimers and cage-dimers (known as 3,6,9,12-tetraazatetraasteranes) with overall yields of 76 and 83%, respectively. The substituent-reactivity effect on the [2 + 2] photocycloaddition of N,N'-diacyl-1,4-dihydropyrazines was investigated by density functional theory calculations. The results show that the substituents have little influence on the Gibbs free energy of the [2 + 2] photocycloaddition and mainly affect the excitation energy, reaction sites, and triplet excited-state structures of 1,4-dihydropyrazines, which are closely related to whether the reaction occurs. The results offer insights into the photochemical reactivity of 1,4-dihydropyrazines and an approach for constructing dimers of N,N'-diacyl-1,4-dihydropyrazines through a solution-based visible-light-driven [2 + 2] photocycloaddition, especially for the construction of 3,6,9,12-tetraazatetraasteranes. Compared with the solid-state [2 + 2] photocycloaddition of 1,4-dihydropyrazine, this photocycloaddition will be an efficient and environmentally friendly method for synthesizing tetraazatetraasteranes, with the advantages of milder reaction conditions, simple operation, adjustable reaction amounts by omitting the cocrystal growth step, etc.
ABSTRACT
In our research on novel anticancer agents, a series of N6-hydrazone purine derivatives were designed and synthesized by analysis of a pharmacophore model for ATP-competitive inhibitors. Activity screening showed that N6-hydrazone purine derivatives 21 and 26 not only had potential antiproliferative activity against the A549 and MCF-7 cell lines comparable to that of Vandetanib as a positive control but also had moderate antiplatelet aggregation activity. To investigate the possible targets, a molecular docking study was carried out on fourteen kinases associated with anticancer and antiplatelet aggregation activities. The results indicated that compounds 21 and 26 had the potential to target VEGFR-2, PI3Kα, EGFR, and HER2 kinases. The kinase inhibition assay showed that compound 26 could target VEGFR-2, PI3Kα, and EGFR (IC50 = 0.822, 3.040 and 6.625 µM). All results indicated that compound 26 is an encouraging framework for a potential new multi-target anticancer agent with potential antiplatelet aggregation activity.
Subject(s)
Antineoplastic Agents, Vascular Endothelial Growth Factor Receptor-2, Humans, Structure-Activity Relationship, Vascular Endothelial Growth Factor Receptor-2/metabolism, Molecular Docking Simulation, Cell Proliferation, Hydrazones/pharmacology, Antitumor Drug Screening Assays, Antineoplastic Agents/pharmacology, ErbB Receptors/metabolism, Purines/pharmacology, Drug Design, Protein Kinase Inhibitors/pharmacology, Molecular Structure
ABSTRACT
Lactococcus lactis (L. lactis) is a well isolated and cultured lactic acid bacterium, but genome-based analysis of this taxon would be incomplete if it relied on isolate genomes alone, because there are still uncultured strains in some ecological niches. In this study, we recovered 93 high-quality metagenome-assembled genomes (MAGs) of L. lactis from food and human gut metagenomes with a culture-independent method. We then constructed a unified genome catalog of L. lactis by integrating these MAGs with 70 publicly available isolate genomes. With this comprehensive resource, we assessed the genomic diversity and phylogenetic relationships to further explore the genetic and functional properties of L. lactis. An open pangenome of L. lactis was generated using our genome catalog, consisting of 13,066 genes in total, of which 5,448 genes were not identified in the isolate genomes. The core genome-based phylogenetic analysis showed that the L. lactis strains we collected were separated into two main subclades corresponding to two subspecies, with some uncultured phylogenetic lineages discovered. The species disparity was also indicated in PCA based on accessory genes of our pangenome. These analyses shed further light on the unexpectedly high diversity within the taxon at both genome and gene levels and gave clues about its population structure and evolution.
Lactococcus lactis has a long history of safe use in food fermentations and is considered one of the important probiotic microorganisms. Obtaining the complete genetic information of L. lactis is important to the food and health industry. However, it can naturally inhabit many environments other than dairy products, including drain water and human gut samples. Here we presented an open pan-genome of L. lactis constructed from 163 high-quality genomes obtained from various environments, including MAGs recovered from environmental metagenomes and isolate genomes. This study expanded the genetic information of L. lactis by about one third, including more than 5,000 novel genes found in uncultured strains. This more complete gene repertoire of L. lactis is crucial to further understanding its genetic and functional properties. These properties may be harnessed to impart additional value to dairy fermentation or other industries.
ABSTRACT
Pangenomic studies might improve the completeness of the human reference genome (GRCh38) and promote precision medicine. Here, we use an automated pipeline for human pangenomic analysis to build a gastric cancer pan-genome from 185 paired deep sequencing datasets (370 samples), and characterize the gene presence-absence variations (PAVs) at the whole-genome level. The genes ACOT1, GSTM1, SIGLEC14 and UGT2B17 are identified as highly absent genes in the gastric cancer population. A set of genes is predicted from sequences that do not align to GRCh38. We successfully locate one of the predicted genes, GC0643, on chromosome 9q34.2. Overexpression of GC0643 significantly inhibits cell growth, cell migration and invasion, and cell cycle progression, and induces apoptosis in cancer cells. The tumor suppressor functions can be reversed by shGC0643 knockdown. GC0643 has been accepted by the NCBI database (GenBank: MW194843.1). Collectively, the robust pan-genome strategy provides a deeper understanding of gene PAVs in the human cancer genome.
Subject(s)
Stomach Neoplasms, Asian People/genetics, China, Human Genome, Humans, Lectins/genetics, Cell Surface Receptors/genetics, Stomach Neoplasms/genetics
ABSTRACT
The concept of the pan-genome, which is the collection of all genomes from a population, has shown great potential in genomics studies, especially in crop sciences. The rice pan-genome constructed from second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice reference genome (NipRG), but it is still disadvantaged by incompleteness and loss of genomic contexts. Third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a high-quality rice pan-genome construction method that introduces a series of new steps to deal with long-read data, including unmapped sequence block filtering, redundancy removal, and sequence block elongation. Compared to NipRG, the long-read sequencing-based pan-genome constructed from 105 rice accessions, which contains 604 Mb of novel sequences, is much more comprehensive than the one constructed from ∼3000 rice genomes sequenced with short reads. Repetitive sequences are the main components of the novel sequences, which partially explains the differences between the pan-genomes based on TGS and SGS. With six wild rice accessions added, there are about 879 Mb of novel sequences and 19,000 novel genes in the rice pan-genome in total. In addition, we have created high-quality reference genomes for all representative rice populations, including five gapless reference genomes. This study has made significant progress in our understanding of the rice pan-genome, and this pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies.
Subject(s)
Oryza, Genome, Genomics/methods, High-Throughput Nucleotide Sequencing, Oryza/genetics, DNA Sequence Analysis
ABSTRACT
The analysis of microbiome data has several technical challenges. In particular, count matrices contain a large proportion of zeros, some of which are biological, whereas others are technical. Furthermore, the measurements suffer from unequal sequencing depth, overdispersion, and data redundancy. These nuisance factors introduce substantial noise. We propose an accurate and robust method, mbDenoise, for denoising microbiome data. Assuming a zero-inflated probabilistic PCA (ZIPPCA) model, mbDenoise uses variational approximation to learn the latent structure and recovers the true abundance levels using the posterior, borrowing information across samples and taxa. mbDenoise outperforms state-of-the-art methods to extract the signal for downstream analyses.
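The distinction between the two kinds of zeros can be illustrated with a much simpler model than ZIPPCA. The sketch below assumes a plain zero-inflated Poisson for a single count; it is not the mbDenoise model, and the function name and parameter values are illustrative assumptions. It returns the posterior probability that an observed zero comes from the excess-zero component rather than from the count distribution.

```python
import math


def prob_excess_zero(pi: float, lam: float) -> float:
    """Posterior probability that an observed zero is an excess zero.

    Assumes a zero-inflated Poisson (an illustrative simplification):
      pi  : mixing weight of the excess-zero component, in [0, 1]
      lam : mean of the Poisson count component, >= 0
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    # Probability of a zero arising from the Poisson component.
    poisson_zero = (1.0 - pi) * math.exp(-lam)
    # Bayes' rule: weight of the excess-zero component among all zeros.
    return pi / (pi + poisson_zero)


if __name__ == "__main__":
    # A taxon with a high expected count: a zero is almost surely an excess zero.
    print(prob_excess_zero(pi=0.2, lam=10.0))
    # A taxon with a low expected count: a zero is ambiguous.
    print(prob_excess_zero(pi=0.2, lam=0.1))
```

In this toy setting, the larger the expected count, the less plausible it is that a zero arose from ordinary sampling, which is the intuition behind treating zeros differently depending on the fitted latent structure.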
Subject(s)
Microbiota, Statistical Models, Principal Component Analysis, Research Design
ABSTRACT
SARS-CoV-2 belongs to the coronavirus family. Comparing genomic features of viral genomes across the coronavirus family can improve our understanding of SARS-CoV-2. Here we present the first pan-genome analysis of 3,932 whole genomes of 101 species from 4 genera of the coronavirus family. We found a total of 181 genes in the pan-genome of the coronavirus family, among which only 3 genes, the S gene, M gene and N gene, are highly conserved. We also constructed a pan-genome from 23,539 whole genomes of SARS-CoV-2. There are 13 genes in total in the SARS-CoV-2 pan-genome, and all 13 are core genes for SARS-CoV-2. The pan-genome of coronaviruses shows a lower level of diversity than the pan-genomes of other RNA viruses, which contain no core gene. The three highly conserved genes in the coronavirus family, which are also core genes in the SARS-CoV-2 pan-genome, could be potential targets in developing nucleic acid diagnostic reagents with a decreased possibility of cross-reaction with other coronavirus species.
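The split between core and accessory genes can be sketched as a simple count over gene presence sets. The example below is a minimal illustration, not the pipeline used in the study: the function name, the `core_fraction` threshold, and the toy genomes are assumptions, and the gene labels are placeholders only.

```python
from typing import Dict, List, Set, Tuple


def split_pan_genome(
    genomes: Dict[str, Set[str]], core_fraction: float = 1.0
) -> Tuple[List[str], List[str]]:
    """Split the pan-genome into core and accessory genes.

    genomes       : genome name -> set of gene names present in that genome
    core_fraction : assumed threshold; a gene is 'core' when it is present
                    in at least this fraction of genomes (1.0 = all genomes)
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    n = len(genomes)
    counts: Dict[str, int] = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = sorted(g for g, c in counts.items() if c / n >= core_fraction)
    accessory = sorted(g for g, c in counts.items() if c / n < core_fraction)
    return core, accessory


# Hypothetical toy input: three genomes with placeholder gene labels.
toy_genomes = {
    "genome_1": {"S", "M", "N", "ORF3a"},
    "genome_2": {"S", "M", "N", "HE"},
    "genome_3": {"S", "M", "N"},
}

core_genes, accessory_genes = split_pan_genome(toy_genomes)
print("core:", core_genes)
print("accessory:", accessory_genes)
```

Lowering `core_fraction` (for example to 0.95) gives a "soft core" definition that tolerates missing annotations in a few genomes; which threshold the authors used for "highly conserved" is not stated in the abstract.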
Subject(s)
Coronaviridae/genetics, Viral Genome, Phylogeny
ABSTRACT
The well-established functions of UHRF1 converge on DNA-related biological processes, as exemplified by DNA methylation maintenance and DNA damage repair during the cell cycle. However, the potential effect of UHRF1 on RNA metabolism is largely unexplored. Here, we revealed that UHRF1 serves as a novel regulator of alternative RNA splicing. The protein interactome of UHRF1 identified various splicing factors. Among them, SF3B3 could interact with UHRF1 directly and participate in UHRF1-regulated alternative splicing events. Furthermore, we interrogated the RNA interactome of UHRF1, and surprisingly, we identified U snRNAs, the canonical spliceosome components, in the purified UHRF1 complex. Unexpectedly, we found that H3R2 methylation status determines the binding preference of U snRNAs, especially U2 snRNAs. The involvement of U snRNAs in the UHRF1-containing complex and their binding preference for specific chromatin configurations imply a finely orchestrated mechanism at play. Our results provide resources and pinpoint the molecular basis of UHRF1-mediated alternative RNA splicing, which will help improve our understanding of the physiological and pathological roles of UHRF1 in disease development.
Subject(s)
Alternative Splicing, CCAAT-Enhancer-Binding Proteins/metabolism, Histones/metabolism, RNA Splicing Factors/metabolism, Small Nuclear RNA/genetics, Ubiquitin-Protein Ligases/metabolism, CCAAT-Enhancer-Binding Proteins/genetics, Humans, Methylation, Multiprotein Complexes, Nucleic Acid Conformation, Protein Binding, Small Nuclear RNA/metabolism, Ubiquitin-Protein Ligases/genetics
ABSTRACT
Flavonoids are promising natural compounds with antioxidant activity and acetylcholinesterase (AChE) inhibitory activity for treating Alzheimer's disease (AD). In the present study, in line with our interest in flavonoid derivatives as AChE inhibitors, a four-dimensional quantitative structure-activity relationship (4D-QSAR) molecular model was proposed. The data required to perform the 4D-QSAR analysis include 52 compounds reported in the literature, usually analogs, and their measured biological activities in a common assay. The model was generated by a complete 4D-QSAR program written by our group. The best model was found after multiple trials. It had good predictive ability, with a cross-validation correlation coefficient Q2 = 0.77, an internal validation correlation coefficient R2 = 0.954, and an external validation correlation coefficient R2pred = 0.715. Molecular docking analysis was also carried out to better understand the interactions between flavonoids and the AChE targets, and the results were in good agreement with the 4D-QSAR model. Based on the information provided by the 4D-QSAR model and molecular docking analysis, ideas for optimizing the structures of flavonoids as AChE inhibitors were put forward, which may provide theoretical guidance for the research and development of new AChE inhibitors.
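The difference between the fitted R2 and the cross-validated Q2 can be made concrete with a small sketch. The code below assumes a one-descriptor ordinary least-squares model and leave-one-out cross-validation, with Q2 defined as 1 - PRESS/SS_tot; this is a common convention, not necessarily the exact procedure or regression method used in the study, and the descriptor and activity values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(y: Sequence[float]) -> float:
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Fitted R2: every compound is used both to fit and to evaluate."""
    slope, intercept = fit_line(x, y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / total_sum_of_squares(y)


def q_squared_loo(x: Sequence[float], y: Sequence[float]) -> float:
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    xs: List[float] = list(x)
    ys: List[float] = list(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Made-up descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1]
print("R2 =", r_squared(descriptor, activity))
print("Q2 =", q_squared_loo(descriptor, activity))
```

For a least-squares model, the leave-one-out Q2 cannot exceed the fitted R2, so a large gap between the two is often read as a sign of overfitting. The external R2pred is computed in the same spirit, but on compounds held out of model building entirely.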
Subject(s)
Cholinesterase Inhibitors/chemistry, Flavonoids/chemistry, Molecular Models, Quantitative Structure-Activity Relationship
ABSTRACT
With the development of genome-wide association studies, how to gain information from a large scale of data has become an issue of common concern, since traditional methods are not fully developed to solve problems such as identifying loci-to-loci interactions (also known as epistasis). Previous epistatic studies mainly focused on local information with a single outcome (phenotype), while in this paper, we developed a two-stage global search algorithm, Greedy Equivalence Search with Local Modification (GESLM), to implement a global search of directed acyclic graph in order to identify genome-wide epistatic interactions with multiple outcome variables (phenotypes) in a case-control design. GESLM integrates the advantages of score-based methods and constraint-based methods to learn the phenotype-related Bayesian network and is powerful and robust to find the interaction structures that display both genetic associations with phenotypes and gene interactions. We compared GESLM with some common phenotype-related loci detecting methods in simulation studies. The results showed that our method improved the accuracy and efficiency compared with others, especially in an unbalanced case-control study. Besides, its application on the UK Biobank dataset suggested that our algorithm has great performance when handling genome-wide association data with more than one phenotype.
Subject(s)
Algorithms, Genome-Wide Association Study, Phenotype, Single Nucleotide Polymorphism, Bayes Theorem, Datasets as Topic, Humans
ABSTRACT
Interpreting functional analysis results derived from environmental samples using direct sequencing meta-omics data, including metagenomics and meta-transcriptomics data, is challenging due to their complexity. Visualization of functional analysis results can help researchers discover relevant biological insights. Despite the availability of many R packages, interactive and comprehensive graphic systems for displaying functional terms and corresponding genes in meta-omics analysis results are lacking. Here, we present ivTerm, an R-shiny package with a user-friendly graphical interface that enables users to inspect functional annotations, compare results across multiple experiments, create customized charts, and download these charts. It provides various basic and innovative chart types to visualize functional terms and the genes involved. Users can also browse term descriptions, which are obtained automatically from the database web servers. Two examples, a metagenome analysis dataset for the human gut and a meta-transcriptome dataset for coral symbiomes, are given to show the usage of ivTerm. Finally, we compared ivTerm with existing tools with similar functions, such as GOplot, ViSEAGO, and Chordomics. ivTerm is convenient and efficient for biologists to gain an integrated view and develop deep insights through interactive analysis of meta-omics data. It can accelerate the process of developing insights from complex meta-omics data. The code for ivTerm is freely available at https://github.com/SJTU-CGM/ivTerm.
Subject(s)
Computational Biology/methods, Computer Graphics, Data Visualization, Software, Statistical Data Interpretation, Factual Databases, Gene Expression Profiling, Gene Regulatory Networks, Genomics/methods, Humans, Metabolomics/methods, Metagenome, Transcriptome
ABSTRACT
BACKGROUND: Current taxonomic classification tools use exact string matching algorithms that are effective to tackle the data from the next generation sequencing technology. However, the unique error patterns in the third generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS: We developed a Classification tool using Discriminative K-mers and Approximate Matching algorithm (CDKAM). This approximate matching method was used for searching k-mers, which included two phases, a quick mapping phase and a dynamic programming phase. Simulated datasets as well as real TGS datasets have been tested to compare the performance of CDKAM with existing methods. We showed that CDKAM performed better in many aspects, especially when classifying TGS data with average length 1000-1500 bases. CONCLUSIONS: CDKAM is an effective program with higher accuracy and lower memory requirement for TGS metagenome sequence classification. It produces a high species-level accuracy.
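The two-phase idea (a quick lookup followed by dynamic programming) can be sketched in a few lines. The code below is an illustrative toy, not the CDKAM implementation: it assumes an index that maps each discriminative k-mer to a single taxon, tries an exact lookup first, and falls back to a brute-force edit-distance search only when the lookup fails. The function names, the `max_dist` threshold, the voting scheme, and the toy index are all assumptions.

```python
from typing import Dict, Iterator, Optional


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def classify_kmer(kmer: str, index: Dict[str, str], max_dist: int = 1) -> Optional[str]:
    """Phase 1: exact lookup. Phase 2: nearest reference k-mer within max_dist."""
    if kmer in index:
        return index[kmer]
    best_taxon, best_dist = None, max_dist + 1
    for ref, taxon in index.items():  # brute force; a real tool would prefilter
        d = edit_distance(kmer, ref)
        if d < best_dist:
            best_taxon, best_dist = taxon, d
    return best_taxon


def classify_read(read: str, index: Dict[str, str], k: int, max_dist: int = 1) -> Optional[str]:
    """Assign the taxon that collects the most k-mer votes, or None if no k-mer matches."""
    votes: Dict[str, int] = {}
    for km in kmers(read, k):
        taxon = classify_kmer(km, index, max_dist)
        if taxon is not None:
            votes[taxon] = votes.get(taxon, 0) + 1
    if not votes:
        return None
    return max(votes, key=lambda t: votes[t])


# Hypothetical index: each k-mer is assumed to occur in exactly one taxon.
toy_index = {"ACGTA": "taxon_A", "TTGCA": "taxon_B"}
print(classify_read("ACGAA", toy_index, k=5))  # one substitution away from ACGTA
print(classify_read("GGGGG", toy_index, k=5))  # too far from both reference k-mers
```

Exact matching alone would discard the first read because of a single error, which is the failure mode the abstract attributes to exact string matching on error-prone long reads. Tolerating a small edit distance recovers it at the cost of extra computation.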
Subject(s)
Algorithms, High-Throughput Nucleotide Sequencing/methods, Humans
ABSTRACT
A high serine content in body fluid was identified in a portion of patients with gastric cancer, but its biological significance was not clear. Here, we investigated the biological effect of serine on gastric cancer cells. Serine was added to the culture medium of MGC803 and HGC27 cancer cells, and its influence on multiple biological functions, such as cell growth, migration and invasion, and drug resistance, was analyzed. We examined the global transcriptomic profiles of these cells cultured with high serine content. Both the MGC803 and HGC27 cell lines originated from male patients; however, their basal gene expression patterns were very different. The finding of cell differentiation-associated genes, ALPI, KRT18, TM4SF1, KRT81, A2M, MT1E, MUC16, BASP1, TUSC3, and PRSS21, in MGC803 cells suggested that this cell line was more poorly differentiated than the HGC27 cell line. When the serine concentration in the medium was increased to 150 mg/ml, the two gastric cancer cell lines responded differently, particularly in cell growth, cell migration and invasion, and 5-FU resistance. In an animal experiment, administration of a high concentration of serine promoted cancer cell metastasis to local lymph nodes. Taken together, we characterized the basal gene expression profiles of MGC803 and HGC27. The HGC27 cells were more differentiated than MGC803 cells. MGC803 cells were more sensitive to changes in serine content. Our results suggested that the responsiveness of cancer cells to microenvironmental change is associated with their genetic background.