ABSTRACT
Reliable and ultra-fast DNA and RNA sequencing has been achieved with the emergence of high-throughput sequencing technology. By combining the results of DNA and RNA sequencing of tumor cells from cancer patients, neoantigens that potentially stimulate the immune response of either CD4+ or CD8+ T cells can be identified. However, due to the abundance of somatic mutations and the highly polymorphic nature of human leukocyte antigen (HLA), it is challenging to predict neoantigens accurately. Moreover, compared with HLA-I presented peptides, HLA-II presented peptides are more variable in length, making the prediction of HLA-II loaded neoantigens even harder. A number of computational approaches have been proposed to address this issue, but none of them considers the DNA origin of the neoantigens from the perspective of the 3D genome. Here we investigated the DNA origins of immune-positive and immune-negative HLA-II neoantigens in the context of the 3D genome and discovered that chromatin 3D architecture plays an important role in more effective HLA-II neoantigen prediction. We believe that 3D genome information will help to increase the precision of HLA-II neoantigen discovery and eventually benefit precision and personalized medicine in cancer immunotherapy.
Subject(s)
Antigens, Neoplasm , Humans , Antigens, Neoplasm/immunology , Antigens, Neoplasm/genetics , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class II/immunology , Neoplasms/immunology , Neoplasms/genetics , Genome, Human , Chromatin/genetics , Computational Biology/methods
ABSTRACT
Mammalian DNA replication is initiated at numerous replication origins, which are clustered into thousands of replication domains (RDs) across the genome. However, it remains unclear whether the replication origins within each RD are activated stochastically or preferentially near certain chromatin features. To understand how DNA replication in single human cells is regulated at the sub-RD level, we directly visualized and quantitatively characterized the spatiotemporal organization, morphology, and in situ epigenetic signatures of individual replication foci (RFi) across S-phase at superresolution using stochastic optical reconstruction microscopy. Importantly, we revealed a hierarchical radial pattern of RFi propagation dynamics that reverses directionality from early to late S-phase and is diminished upon caffeine treatment or CTCF knockdown. Together with simulation and bioinformatic analyses, our findings point to a "CTCF-organized REplication Propagation" (CoREP) model, which suggests a nonrandom selection mechanism for replication activation at the sub-RD level during early S-phase, mediated by CTCF-organized chromatin structures. Collectively, these findings offer critical insights into the key involvement of local epigenetic environment in coordinating DNA replication across the genome and have broad implications for our conceptualization of the role of multiscale chromatin architecture in regulating diverse cell nuclear dynamics in space and time.
Subject(s)
CCCTC-Binding Factor/metabolism , Chromatin/metabolism , DNA Replication , CCCTC-Binding Factor/genetics , Chromatin/genetics , Epigenomics , Humans , S Phase
ABSTRACT
MOTIVATION: The mutations of cancers can encode the seeds of their own destruction, in the form of T-cell recognizable immunogenic peptides, also known as neoantigens. It is computationally challenging, however, to accurately prioritize potential neoantigen candidates according to their ability to activate the T-cell immune response, especially when somatic mutations are abundant. Although a few neoantigen prioritization methods have been proposed to address this issue, an advanced machine learning model specifically designed for this problem is still lacking. Moreover, none of the existing methods considers the original DNA loci of the neoantigens from the perspective of the 3D genome, which may provide key information for inferring neoantigen immunogenicity. RESULTS: In this study, we discovered that the DNA loci of immunopositive and immunonegative MHC-I neoantigens have distinct spatial distribution patterns across the genome. We therefore used the 3D genome information along with an ensemble pMHC-I coding strategy and developed a group feature selection-based deep sparse neural network model (DNN-GFS) that is optimized for neoantigen prioritization. DNN-GFS demonstrated increased neoantigen prioritization power compared with existing sequence-based approaches. We also developed a webserver named deepAntigen (http://yishi.sjtu.edu.cn/deepAntigen) that implements DNN-GFS as well as other machine learning methods. We believe that this work provides a new perspective toward more accurate neoantigen prediction, which will eventually contribute to personalized cancer immunotherapy. AVAILABILITY AND IMPLEMENTATION: Data and implementation are available on the webserver: http://yishi.sjtu.edu.cn/deepAntigen. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Antigens, Neoplasm , Neoplasms , Antigens, Neoplasm/genetics , Genome , Humans , Immunotherapy , Neoplasms/genetics , T-Lymphocytes
ABSTRACT
High-order chromatin structure plays a non-negligible role in gene regulation. However, the mechanism, especially the sequence dependence, of the formation of varied chromatin structures in different cells remains to be elucidated. As the nucleotide distributions in the human and mouse genomes are highly uneven, we identified CGI (CpG island) forest and prairie genomic domains based on the CGI densities of a species, dividing the genome into two sequentially, epigenetically, and transcriptionally distinct regions. These two megabase-sized domains also spatially segregate to different extents in different cell types. Forests and prairies show enhanced segregation from each other in development, differentiation, and senescence, while the multi-scale forest-prairie spatial intermingling is cell-type specific and increases in differentiation, helping to define cell identity. We propose that the phase separation of the 1D mosaic sequence in space serves as a potential driving force and, together with cell-type-specific epigenetic marks and transcription factors, shapes the chromatin structure in different cell types. The mosaicity of the genomes of different species in terms of forests and prairies could relate to observations in their biological processes, such as development and aging. In this way, we provide a bottom-up theory to explain the chromatin structural and epigenetic changes in different processes.
Subject(s)
Base Sequence/physiology , Cell Physiological Phenomena/genetics , Chromatin Assembly and Disassembly/physiology , Chromatin/chemistry , Molecular Conformation , Nucleic Acid Conformation , Animals , Binding Sites/genetics , Chemical Fractionation , Chromatin/metabolism , CpG Islands , Epigenesis, Genetic/physiology , Gene Expression Regulation , Genes, Essential/genetics , Genome, Human , Humans , Mice , Regulatory Elements, Transcriptional/genetics , Transcription Factors/metabolism
ABSTRACT
The relationship between DNA methylation and chromatin structure is still largely unknown. By analyzing a large set of published sequencing data, we observed a long-range power law correlation of DNA methylation with cell class-specific scaling exponents in the range of tens of kilobases. We showed that such cell class-specific scaling exponents are caused by different patchiness of DNA methylation in different cells. By modeling the chromatin structure using high-resolution chromosome conformation capture data and mapping the methylation level onto the modeled structure, we demonstrated that the patchiness of DNA methylation is related to chromatin structure. The scaling exponents of the power law correlation are thus a display of the spatial organization of chromatin. Besides the long-range correlation, we also showed that the local correlation of DNA methylation is associated with nucleosome positioning. The local correlation of partially methylated domains is different from that of nonpartially methylated domains, suggesting that their chromatin structures differ at the scale of several hundred base pairs (covering a few nucleosomes). Our study provides a novel, to our knowledge, view of the spatial organization of chromatin structure from a perspective of DNA methylation, in which both long-range and local correlations of DNA methylation along the genome reflect the spatial organization of chromatin.
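To make the notion of a "scaling exponent" concrete, the following is a minimal sketch of how an exponent can be read off a power-law correlation by fitting a straight line in log-log space. The function name `fit_scaling_exponent`, the convention C(d) ∝ d^(-γ), and the synthetic data are all assumptions made for illustration; the abstract does not state how the exponents were actually estimated.

```python
import math

def fit_scaling_exponent(distances, correlations):
    """Return gamma such that correlation ~ distance**(-gamma).

    Hypothetical helper: ordinary least-squares slope of
    log(correlation) against log(distance), with the sign flipped.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in correlations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var

# Made-up data: a correlation that decays exactly as d**(-0.5)
# over genomic distances of 1 kb to 50 kb.
distances = [1000 * k for k in range(1, 51)]
correlations = [2.0 * d ** -0.5 for d in distances]
gamma = fit_scaling_exponent(distances, correlations)
```

On real methylation data the correlation would be noisy and the fit would normally be restricted to the distance range where the log-log plot is approximately linear.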
Subject(s)
Chromatin/metabolism , DNA Methylation/physiology , Adrenal Glands/metabolism , Animals , Brain/metabolism , Cluster Analysis , Embryonic Stem Cells/metabolism , Female , Fourier Analysis , Gene Expression Profiling , Humans , Mice , Models, Genetic , Models, Molecular , Neoplasms/genetics , Neoplasms/metabolism , Ovary/metabolism , Pancreas/metabolism
ABSTRACT
Constructing Markov state models from large-scale molecular dynamics simulation trajectories is a promising approach to dissect the kinetic mechanisms of complex chemical and biological processes. Combined with transition path theory, Markov state models can be applied to identify all pathways connecting any conformational states of interest. However, the identified pathways can be too complex to comprehend, especially for multi-body processes where numerous parallel pathways with comparable flux probability often coexist. Here, we have developed a path lumping method to group these parallel pathways into metastable path channels for analysis. We define the similarity between two pathways as the intercrossing flux between them and then apply the spectral clustering algorithm to lump these pathways into groups. We demonstrate the power of our method by applying it to two systems: a 2D-potential consisting of four metastable energy channels and the hydrophobic collapse process of two hydrophobic molecules. In both cases, our algorithm successfully reveals the metastable path channels. We expect this path lumping algorithm to be a promising tool for revealing unprecedented insights into the kinetic mechanisms of complex multi-body processes.
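The lumping step described above can be illustrated with a small sketch. It assumes a symmetric pathway-pathway similarity matrix is already available (the values below are invented, and the abstract does not give the formula for the intercrossing flux), and it only handles the two-group special case of spectral clustering, splitting pathways by the sign of the second eigenvector of the normalized similarity matrix. The name `spectral_bipartition` is hypothetical and is not the authors' implementation.

```python
import math

def spectral_bipartition(similarity, n_iter=500):
    """Split items into two groups (two-cluster spectral clustering).

    Uses deflated power iteration to find the second eigenvector of
    S = D^-1/2 W D^-1/2 and labels items by its sign. Assumes W is
    symmetric, non-negative, and every row sum is positive.
    """
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    s = [[similarity[i][j] / math.sqrt(degree[i] * degree[j])
          for j in range(n)] for i in range(n)]
    # The leading eigenvector of S (eigenvalue 1) is proportional to sqrt(degree).
    norm = math.sqrt(sum(degree))
    u1 = [math.sqrt(d) / norm for d in degree]
    # Arbitrary deterministic start vector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        # Apply (S + I) / 2 so that all eigenvalues are non-negative.
        v = [0.5 * (sum(s[i][j] * v[j] for j in range(n)) + v[i])
             for i in range(n)]
        # Remove the component along the trivial leading eigenvector.
        proj = sum(a * b for a, b in zip(v, u1))
        v = [a - proj * b for a, b in zip(v, u1)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    return [1 if a > 0 else 0 for a in v]

# Invented similarity between four pathways: 0 and 1 exchange a lot of
# flux, as do 2 and 3, while the two pairs barely interact.
flux_similarity = [
    [0.0, 5.0, 0.1, 0.1],
    [5.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 5.0],
    [0.1, 0.1, 5.0, 0.0],
]
labels = spectral_bipartition(flux_similarity)
```

Here pathways 0 and 1 end up in one group and pathways 2 and 3 in the other. Lumping into more than two path channels would need several eigenvectors followed by a clustering step such as k-means, which this sketch omits.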
ABSTRACT
Molecules with Möbius topology have drawn increasing attention from scientists in a variety of fields, such as organic chemistry, inorganic chemistry, and material science. However, synthetic difficulties and the lack of functionality impede their fundamental understanding and practical applications. Here, we report the facile synthesis of an aggregation-induced-emission (AIE)-active macrocycle (TPE-ET) and investigate its analogous triply and singly twisted Möbius topologies. Because of the twisted and flexible nature of the tetraphenylethene units, the macrocycle adjusts its conformations so as to accommodate different guest molecules in its crystals. Moreover, theoretical studies including topological and electronic calculations reveal the energetically favorable interconversion process between triply and singly twisted topologies.
ABSTRACT
Protein-ligand recognition plays key roles in many biological processes. One of the most fascinating questions about protein-ligand recognition is its underlying mechanism, which often results from a combination of induced fit and conformational selection. In this study, we have developed a three-pronged approach of Markov State Models, Molecular Dynamics simulations, and flux analysis to determine the contribution of each model. Using this approach, we have quantified the recognition mechanism of the choline binding protein (ChoX) to be ~90% conformational selection dominant under experimental conditions. This is achieved by recovering all the necessary parameters for the flux analysis in combination with available experimental data. Our results also suggest that ChoX has several metastable conformational states, of which an apo-closed state is dominant, consistent with previous experimental findings. Our methodology holds great potential to be widely applied to understand the recognition mechanisms underlying many fundamental biological processes.
Subject(s)
Choline/chemistry , Choline/metabolism , Membrane Transport Proteins/chemistry , Membrane Transport Proteins/metabolism , Molecular Dynamics Simulation , Markov Chains , Protein Binding , Protein Conformation , Thermodynamics
ABSTRACT
Molecular design of small-molecule inhibitors targeting the programmed cell death-1 (PD-1)/programmed cell death ligand-1 (PD-L1) pathway has become an active research area following the clinical success of cancer immunotherapy. In recent years, machine learning (ML) methods have been shown to accelerate drug design. However, the black-box character of ML methods obscures model interpretation and ligand optimization. Herein, five explainable ML models were constructed by integrating five ML models with the SHAP method; these ML models were pretrained with >4000 molecules, and their R² ranged from 0.835 to 0.86 on the test set. Subsequently, the explainable ML models were employed to identify the relationship between fragments and the bioactivity of a small-molecule inhibitor, BMS-1166, leading to the modification of BMS-1166 into 60 novel compounds. After consensus docking and ADMET testing, 3 small molecules (C27, C52 and C54) with better docking scores and lower toxicity than BMS-1166 were selected. Finally, the improved binding affinity of C27 and C52 to the PD-L1 dimer was validated by MD simulation. Overall, this work proposes an efficient protocol, based on explainable ML models, for the rational design of small-molecule inhibitors targeting the PD-1/PD-L1 pathway.
ABSTRACT
Schizophrenia is a polygenic complex disease with a heritability as high as 80%, yet the mechanism of polygenic interaction in its pathogenesis remains unclear. Studying the interaction and regulation of schizophrenia susceptibility genes is crucial for unraveling the pathogenesis of schizophrenia and developing antipsychotic drugs. Therefore, we developed a bioinformatics method named GRACI (Gene Regulation Analysis based on Causal Inference) based on the principles of information theory, a causal inference model, and high-order chromatin 3D conformation. GRACI captures the interaction and regulatory relationships between schizophrenia susceptibility genes by analyzing genotyping data. Two datasets, comprising 1459 and 2065 samples respectively, were analyzed, and the gene networks from both datasets were constructed. GRACI showed superior accuracy compared with widely adopted methods for detecting gene-gene interactions and intergenic regulation. This result was further substantiated by its correlation with chromatin high-order conformation patterns. Using GRACI, we identified three potential genes (KCNN3, KCNH1, and KCND3) that are directly associated with schizophrenia pathogenesis. Furthermore, the results of GRACI on the standalone dataset illustrated the method's applicability to other complex diseases. GRACI download: https://github.com/liuliangjie19/GRACI.
Subject(s)
Chromatin , Computational Biology , Genetic Predisposition to Disease , Schizophrenia , Schizophrenia/genetics , Humans , Chromatin/genetics , Gene Regulatory Networks , Multifactorial Inheritance
ABSTRACT
Markov models and master equations are a powerful means of modeling dynamic processes like protein conformational changes. However, these models are often difficult to understand because of the enormous number of components and connections between them. Therefore, a variety of methods have been developed to facilitate understanding by coarse-graining these complex models. Here, we employ Bayesian model comparison to determine which of these coarse-graining methods provides the models that are most faithful to the original set of states. We find that the Bayesian agglomerative clustering engine and the hierarchical Nyström expansion graph (HNEG) typically provide the best performance. Surprisingly, the original Perron cluster cluster analysis (PCCA) method often provides the next best results, outperforming the newer PCCA+ method and the most probable paths algorithm. We also show that the differences between the models are qualitatively significant, rather than being minor shifts in the boundaries between states. The performance of the methods correlates well with the entropy of the resulting coarse-grainings, suggesting that finding states with more similar populations (i.e., avoiding low population states that may just be noise) gives better results.
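The link between entropy and balanced state populations can be illustrated with a short sketch. It assumes that "entropy of a coarse-graining" means the Shannon entropy of the macrostate populations, which the abstract does not spell out; the function name and the example populations are likewise invented for illustration.

```python
import math

def population_entropy(populations):
    """Shannon entropy (in nats) of a set of state populations.

    Hypothetical helper: populations are normalized first, and empty
    states contribute nothing to the sum.
    """
    total = sum(populations)
    probs = [p / total for p in populations if p > 0]
    return -sum(p * math.log(p) for p in probs)

# Made-up coarse-grainings with four macrostates each.
balanced = [0.25, 0.25, 0.25, 0.25]   # similar populations
skewed = [0.97, 0.01, 0.01, 0.01]     # three low-population states

h_balanced = population_entropy(balanced)
h_skewed = population_entropy(skewed)
```

Under this definition the balanced partition reaches the maximum possible entropy for four states, log(4), while the partition dominated by one state with three near-empty states scores much lower.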
Subject(s)
Algorithms , Computational Biology , Dipeptides/chemistry , Markov Chains , Neurofilament Proteins/chemistry , Peptide Fragments/chemistry , beta-Lactamases/chemistry , Alanine/chemistry , Cluster Analysis , Entropy , Kinetics , Models, Molecular , Protein Conformation , beta-Lactamases/metabolism
ABSTRACT
Amyloid fibrillation of proteins is associated with a great variety of pathologic conditions. Development of new molecules that can monitor amyloidosis kinetics and inhibit fibril formation is of great diagnostic and therapeutic value. In this work, we have developed a biocompatible molecule that functions as an ex situ monitor and an in situ inhibitor for protein fibrillation, using insulin as a model protein. 1,2-Bis[4-(3-sulfonatopropoxyl)phenyl]-1,2-diphenylethene salt (BSPOTPE) is nonemissive when it is dissolved with native insulin in an incubation buffer but starts to fluoresce when it is mixed with preformed insulin fibril, enabling ex situ monitoring of amyloidogenesis kinetics and high-contrast fluorescence imaging of protein fibrils. Premixing BSPOTPE with insulin, on the other hand, inhibits the nucleation process and impedes the protofibril formation. Increasing the dose of BSPOTPE boosts its inhibitory potency. Theoretical modeling using molecular dynamics simulations and docking reveals that BSPOTPE is prone to binding to partially unfolded insulin through hydrophobic interaction of the phenyl rings of BSPOTPE with the exposed hydrophobic residues of insulin. Such binding is assumed to have stabilized the partially unfolded insulin and obstructed the formation of the critical oligomeric species in the protein fibrillogenesis process.
Subject(s)
Amyloid/antagonists & inhibitors , Amyloid/metabolism , Insulin/metabolism , Stilbenes/pharmacology , Amyloid/chemistry , Amyloidosis/diagnosis , Animals , Cattle , Insulin/chemistry , Models, Molecular , Protein Conformation , Spectrometry, Fluorescence
ABSTRACT
There is a strong demand for methods that can efficiently reconstruct valid super-resolution intact genome 3D structures from sparse and noisy single-cell Hi-C data. Here, we develop the Single-Cell Chromosome Conformation Calculator (Si-C) within the Bayesian theory framework and apply this approach to reconstruct intact genome 3D structures from single-cell Hi-C data of eight G1-phase haploid mouse ES cells. The inferred 100-kb and 10-kb structures consistently reproduce the known conserved features of chromatin organization revealed by independent imaging experiments. The analysis of the 10-kb resolution 3D structures reveals cell-to-cell varying domain structures in individual cells and hyperfine structures in domains, such as loops. An average of 0.2 contact reads per divided bin is sufficient for Si-C to obtain reliable structures. The valid super-resolution structures constructed by Si-C demonstrate the potential for visualizing and investigating interactions between all chromatin loci at the genome scale in individual cells.
Subject(s)
Chromatin/metabolism , Chromosomes/metabolism , Embryonic Stem Cells/metabolism , Genome , Single-Cell Analysis/methods , Animals , Bayes Theorem , Chromatin/chemistry , Chromatin/genetics , Chromosomes/chemistry , Chromosomes/genetics , G1 Phase , Haploidy , In Situ Hybridization, Fluorescence , Mice , Molecular Conformation
ABSTRACT
Chiral amplification in liquid crystals (LCs) is a well-known strategy. However, knowledge of the underlying mechanism is still lacking; in particular, how it is realized at the nano scale remains to be revealed. Here, we provide a systematic exploration of the chiral amplification of chiral aggregation-induced emission (AIE) molecules in LCs, from direct visualization of their co-assemblies at the nano scale to theoretical calculation of the molecular packing modes at the single-molecule level. Using AFM imaging, we directly visualized the co-assembly formed by chiral AIE molecules and LCs at the nano scale: the chiral AIE molecules self-assembled into helical fibers that serve as a helical template for LCs to bind, while the LCs bound helically to the fibers to form the co-assembly, giving the morphology of pearled necklaces or thick rods. Theoretical calculation suggested that the chiral AIE molecules were packed into left-handed helical fibers with a large volume of empty space between neighboring molecules, which provided the binding sites for LCs. Structural analysis showed that the π-π stacking between aromatic groups from LCs and TPE groups and the σ-π hyperconjugation between LC aromatic groups and cholesterol aliphatic groups play an important role in stabilizing the binding of LCs in the confined space on the surface of the helical assemblies.
ABSTRACT
BACKGROUND: High-throughput sequencing technology has yielded reliable and ultra-fast sequencing of DNA and RNA. By combining the results of DNA and RNA sequencing of tumor cells from cancer patients, one can identify potential neoantigens that stimulate the T cell immune response. However, when somatic mutations are abundant, it is computationally challenging to efficiently prioritize the identified neoantigen candidates according to their ability to activate the T cell immune response. METHODS: Numerous prioritization or prediction approaches have been proposed to address this issue, but none of them considers the original DNA loci of the neoantigens from the perspective of the 3D genome. Based on our previous discoveries, we propose to investigate the distribution of neoantigens with different immunogenicity in the 3D genome and to incorporate this information into neoantigen prediction. RESULTS: We traced the DNA origins of the immuno-positive and immuno-negative neoantigens in the context of the 3D genome and discovered that the DNA loci of immuno-positive and immuno-negative neoantigens have very different distribution patterns. Specifically, compared with the background 3D genome, the DNA loci of immuno-positive neoantigens tend to be located in specific regions of the 3D genome. We therefore incorporated this information into neoantigen prediction and demonstrated the effectiveness of this approach. CONCLUSION: We believe that 3D genome information will help to increase the precision of neoantigen prioritization and discovery and eventually benefit precision and personalized medicine in cancer immunotherapy.
Subject(s)
Antigens, Neoplasm/chemistry , Chromatin/chemistry , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Precision Medicine , Protein Conformation
ABSTRACT
Amphiphile self-assembly is an essential bottom-up approach of fabricating advanced functional materials. Self-assembled materials with desired structures are often obtained through thermodynamic control. Here, we demonstrate that the selection of kinetic pathways can lead to drastically different self-assembled structures, underlining the significance of kinetic control in self-assembly. By constructing kinetic network models from large-scale molecular dynamics simulations, we show that two largely similar amphiphiles, 1-[11-oxo-11-(pyren-1-ylmethoxy)-undecyl]pyridinium bromide (PYR) and 1-(11-((5a1,8a-dihydropyren-1-yl)methylamino)-11-oxoundecyl)pyridinium bromide (PYN), prefer distinct kinetic assembly pathways. While PYR prefers an incremental growth mechanism and forms a nanotube, PYN favors a hopping growth pathway leading to a vesicle. Such preference was found to originate from the subtle difference in the distributions of hydrophobic and hydrophilic groups in their chemical structures, which leads to different rates of the adhesion process among the aggregating micelles. Our results are in good agreement with experimental results, and accentuate the role of kinetics in the rational design of amphiphile self-assembly.
ABSTRACT
How chromosomes fold into 3D structures and how genome functions are affected or even controlled by their spatial organization remain challenging questions. Hi-C experiments have provided important structural insights into chromosomes, and Hi-C data are used here to construct a 3D chromatin structure that is characterized by two spatially segregated chromatin compartments, A and B. By mapping a plethora of genome features onto the constructed 3D chromatin model, we show vividly the close connection between genome properties and the spatial organization of chromatin. We are able to dissect the whole chromatin into two types of chromatin domains that have clearly different Hi-C contact patterns as well as different sizes of chromatin loops. The two chromatin types can be regarded as the basic units of chromatin compartments A and B, respectively, and they also spatially segregate from each other as the two chromatin compartments do. Therefore, the chromatin loops segregate in space according to their sizes, suggesting an excluded volume or entropic effect in chromatin compartmentalization as well as in chromosome positioning. Taken together, these results provide clues to the folding principles of chromosomes, their spatial organization, and the resulting clustering of many genome features in 3D space.
Subject(s)
Chromatin/genetics , Chromosome Segregation/genetics , Chromosomes/genetics , Genome, Human , Cell Compartmentation/genetics , Chromatin/ultrastructure , Chromosomes/ultrastructure , Humans , Models, Genetic
ABSTRACT
The conformational dynamics of multibody systems plays crucial roles in many important problems. Markov state models (MSMs) are powerful kinetic network models that can predict long-time-scale dynamics using many short molecular dynamics simulations. Although MSMs have been successfully applied to conformational changes of individual proteins, the analysis of multibody systems is still a challenge because of the complexity of the dynamics that occur on a mixture of drastically different time scales. In this work, we have developed a new algorithm, automatic state partitioning for multibody systems (APM), for constructing MSMs to elucidate the conformational dynamics of multibody systems. The APM algorithm effectively addresses different time scales in the multibody systems by directly incorporating dynamics into geometric clustering when identifying the metastable conformational states. We have applied the APM algorithm to a 2D potential that can mimic a protein-ligand binding system and the aggregation of two hydrophobic particles in water and have shown that it can yield tremendous enhancements in the computational efficiency of MSM construction and the accuracy of the models.
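For readers unfamiliar with MSMs, the sketch below shows the generic final step of MSM construction: estimating a transition probability matrix from a trajectory that has already been assigned to discrete states. It is not the APM algorithm, whose state-partitioning details are not given in the abstract; the function name, the toy trajectory, and the simple row-normalized estimator are assumptions chosen for illustration.

```python
def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition matrix from a discrete trajectory.

    Hypothetical helper: counts transitions i -> j separated by `lag`
    frames, then divides each row by its total count. Rows with no
    observed transitions are left as zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Toy trajectory over two states (0 = unbound, 1 = bound); made-up data.
dtraj = [0, 0, 1, 1, 0, 0, 1, 1]
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
```

For this toy trajectory, state 0 is followed by 0 twice and by 1 twice, so the first row is [0.5, 0.5]; state 1 is followed by 0 once and by 1 twice, giving [1/3, 2/3]. Production MSM codes typically use estimators that also enforce detailed balance, which this sketch does not attempt.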