ABSTRACT
In flies, Centrosomin (Cnn) forms a phosphorylation-dependent scaffold that recruits proteins to the mitotic centrosome, but how Cnn assembles into a scaffold is unclear. We show that scaffold assembly requires conserved leucine zipper (LZ) and Cnn-motif 2 (CM2) domains that co-assemble into a 2:2 complex in vitro. We solve the crystal structure of the LZ:CM2 complex, revealing that both proteins form helical dimers that assemble into an unusual tetramer. A slightly longer version of the LZ can form micron-scale structures with CM2, whose assembly is stimulated by Plk1 phosphorylation in vitro. Mutating individual residues that perturb LZ:CM2 tetramer assembly perturbs the formation of these micron-scale assemblies in vitro and Cnn-scaffold assembly in vivo. Thus, Cnn molecules have an intrinsic ability to form large, LZ:CM2-interaction-dependent assemblies that are critical for mitotic centrosome assembly. These studies provide the first atomic insight into a molecular interaction required for mitotic centrosome assembly.
Subject(s)
Centrosome/chemistry , Centrosome/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/cytology , Drosophila melanogaster/metabolism , Mitosis , Amino Acid Sequence , Animals , Drosophila melanogaster/chemistry , Homeodomain Proteins/metabolism , Models, Molecular , Phosphorylation , Protein Domains , Protein Serine-Threonine Kinases/metabolism , Sequence Alignment
ABSTRACT
Histone modifications, known as histone marks, are pivotal in regulating gene expression within cells. The vast array of potential combinations of histone marks presents a considerable challenge in decoding the regulatory mechanisms solely through biological experimental approaches. To overcome this challenge, we have developed a method called CatLearning. It utilizes a modified convolutional neural network architecture with a specialized adaptation Residual Network to quantitatively interpret histone marks and predict gene expression. This architecture integrates long-range histone information up to 500Kb and learns chromatin interaction features without 3D information. By using only one histone mark, CatLearning achieves a high level of accuracy. Furthermore, CatLearning predicts gene expression by simulating changes in histone modifications at enhancers and throughout the genome. These findings help comprehend the architecture of histone marks and develop diagnostic and therapeutic targets for diseases with epigenetic changes.
Subject(s)
Histone Code , Histones , Humans , Histones/metabolism , Histones/genetics , Chromatin/metabolism , Chromatin/genetics , Epigenesis, Genetic , Neural Networks, Computer , Computational Biology/methods , Gene Expression Regulation
ABSTRACT
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two layers of convolutional neural network and one layer of long short-term memory. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions.
Subject(s)
Computational Biology , DNA-Binding Proteins , Deep Learning , RNA-Binding Proteins , RNA-Binding Proteins/metabolism , DNA-Binding Proteins/metabolism , Computational Biology/methods , Neural Networks, Computer , Humans
ABSTRACT
Discovering precursor microRNAs (pre-miRNAs) is the core of microRNA (miRNA) discovery. Many tools based on traditional sequence and structural features have been published to discover miRNAs. However, in practical applications such as genome annotation, their actual performance has been very low. The problem is more severe in plants, where, unlike in animals, pre-miRNAs are much more complex and difficult to identify. A huge gap exists between animals and plants in the available software for miRNA discovery and in species-specific miRNA information. Here, we present miWords, a composite deep learning system of transformers and convolutional neural networks that treats the genome as a pool of sentences made of words with specific occurrence preferences and contexts, to accurately identify pre-miRNA regions across plant genomes. A comprehensive benchmark was performed involving >10 software tools of different types and many experimentally validated datasets. miWords emerged as the best performer, exceeding 98% accuracy with a performance lead of ~10%. miWords was also evaluated across the Arabidopsis genome, where it again outperformed the compared tools. As a demonstration, miWords was run across the tea genome, reporting 803 pre-miRNA regions, all validated by small RNA-seq reads from multiple samples, most of which were functionally supported by degradome sequencing data. miWords is freely available as stand-alone source code at https://scbb.ihbt.res.in/miWords/index.php.
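The idea of reading a genome as sentences made of words can be illustrated with a small tokenization sketch. This is not the miWords implementation: the function names, the choice of overlapping k-mers as "words", and the values of `k` and `stride` are all assumptions made for illustration.

```python
def kmer_words(sequence, k=3, stride=1):
    """Split a nucleotide sequence into k-mer 'words' (hypothetical helper).

    With stride=1 the words overlap; k and stride are illustrative choices.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocabulary(sentences):
    """Assign an integer id to each distinct word, in order of first appearance.

    Id 0 is left unused so it can serve as a padding token.
    """
    vocab = {}
    for words in sentences:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


# A short made-up sequence, treated as one "sentence".
sentence = kmer_words("ATGGCTAAG", k=3)
vocab = build_vocabulary([sentence])
token_ids = [vocab[word] for word in sentence]
```

Integer ids of this kind are what an embedding layer in a transformer or CNN would typically consume; how miWords actually builds its words and contexts is described in the paper, not here.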
Subject(s)
Arabidopsis , Deep Learning , MicroRNAs , Animals , MicroRNAs/genetics , MicroRNAs/chemistry , Software , Genomics , Genome, Plant , Arabidopsis/genetics
ABSTRACT
Classifying epitopes is essential because they can be applied in various fields, including therapeutics, diagnostics and peptide-based vaccines. To determine the epitope or peptide recognized by an antibody, epitope mapping with peptides is the most extensively used method. However, this method is more time-consuming and less efficient than currently available methods. The ability to retrieve protein sequence data through laboratory procedures has led to the development of computational models that predict epitope binding based on machine learning and deep learning (DL). Such prediction has also become a crucial part of developing effective cancer immunotherapies. This paper proposes an architecture to generalize this case, since many studies are still trying to solve a low-performance classification problem. The proposed DL models, called MITNet and MITNet-Fusion, use a fusion architecture that combines a Transformer with a convolutional neural network (CNN). Combining these two architectures enriches the feature space used to correlate epitope labels in a binary classification setting. The selected epitope-T-cell receptor (TCR) interactions are GILG, GLCT and NLVP, acquired from three databases: IEDB, VDJdb and McPAS-TCR. The input data were encoded for the DL architecture using amino acid composition, dipeptide composition, spectrum descriptor, and the combination of all these features, called AADIP composition. To ensure consistency, fivefold cross-validation was performed using the area under the curve metric. Results showed that GILG, GLCT and NLVP received scores of 0.85, 0.87 and 0.86, respectively. These results were compared with prior architectures and outperformed other similar deep learning models.
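Two of the encodings named above, amino acid composition and dipeptide composition, have standard definitions that can be sketched in a few lines. The sketch below is an illustration under assumptions, not the authors' code: the function names are hypothetical, the spectrum descriptor is omitted, and the example peptide is chosen only for demonstration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(peptide):
    """Fraction of each of the 20 residues in the peptide (20 values)."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return [peptide.count(a) / len(peptide) for a in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Fraction of each adjacent residue pair in the peptide (400 values)."""
    if len(peptide) < 2:
        raise ValueError("peptide must have at least two residues")
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]


def encode(peptide):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(peptide) + dipeptide_composition(peptide)


# Illustrative nine-residue peptide starting with GILG.
features = encode("GILGFVFTL")
```

Both halves of the vector sum to 1, so peptides of different lengths map to vectors on the same scale, which is one reason composition features are a common baseline encoding.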
Subject(s)
Epitopes, T-Lymphocyte , Neural Networks, Computer , Amino Acid Sequence , Peptides/chemistry , Receptors, Antigen, T-Cell
ABSTRACT
Antimicrobial peptides (AMPs) are small proteins that can fight various microorganisms in nature; they play an indispensable role in maintaining the health of organisms and fortifying defenses against diseases. Nevertheless, experimental approaches for AMP identification still demand substantial human and material resources. Alternatively, computational approaches can help researchers predict AMPs effectively and promptly. In this study, we present a novel AMP predictor called iAMP-Attenpred. As far as we know, this is the first work that not only employs the popular BERT model from the field of natural language processing (NLP) for AMP feature encoding, but also combines multiple models to discover AMPs. First, we treat each amino acid from preprocessed AMP and non-AMP sequences as a word and input it into a pre-trained BERT model for feature extraction. Next, the features obtained from BERT are fed to a composite model composed of a one-dimensional CNN, a BiLSTM and an attention mechanism to better discriminate features. Finally, a flatten layer and several fully connected layers are used for the final classification of AMPs. Experimental results reveal that, compared with existing predictors, iAMP-Attenpred achieves better performance on indicators such as accuracy and precision. This further demonstrates that using BERT to capture effective feature information from peptide sequences and combining multiple deep learning models are effective and meaningful for predicting AMPs.
Subject(s)
Amino Acids , Antimicrobial Peptides , Humans , Amino Acid Sequence , Natural Language Processing , Research Personnel
ABSTRACT
Increasing studies have proved that microRNAs (miRNAs) are critical biomarkers in the development of human complex diseases. Identifying disease-related miRNAs is beneficial to disease prevention, diagnosis and remedy. Based on the assumption that similar miRNAs tend to associate with similar diseases, various computational methods have been developed to predict novel miRNA-disease associations (MDAs). However, selecting proper features for similarity calculation is a challenging task because of data deficiencies in biomedical science. In this study, we propose a deep learning-based computational method named MAGCN to predict potential MDAs without using any similarity measurements. Our method predicts novel MDAs based on known lncRNA-miRNA interactions via graph convolution networks with multichannel attention mechanism and convolutional neural network combiner. Extensive experiments show that the average area under the receiver operating characteristic values obtained by our method under 2-fold, 5-fold and 10-fold cross-validations are 0.8994, 0.9032 and 0.9044, respectively. When compared with five state-of-the-art methods, MAGCN shows improvement in terms of prediction accuracy. In addition, we conduct case studies on three diseases to discover their related miRNAs, and find that all the top 50 predictions for all the three diseases have been supported by established databases. The comprehensive results demonstrate that our method is a reliable tool in detecting new disease-related miRNAs.
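The evaluation protocol mentioned above, area under the ROC curve averaged over k-fold cross-validation, can be sketched as follows. This is a generic illustration, not the MAGCN evaluation code: the function names are hypothetical, folds are contiguous rather than shuffled, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise ranking definition.

    AUC equals the probability that a randomly chosen positive is scored
    above a randomly chosen negative, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up labels and scores, only to exercise the functions.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
folds = list(k_fold_indices(5, 2))
```

In a real experiment a model would be trained on each `train` split, scored on the matching `test` split, and the per-fold AUC values averaged; samples would normally be shuffled before splitting.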
Subject(s)
MicroRNAs , RNA, Long Noncoding , Humans , Algorithms , Computational Biology/methods , Databases, Genetic , MicroRNAs/genetics , RNA, Long Noncoding/genetics , Deep Learning
ABSTRACT
Human leukocyte antigen (HLA) molecules play a critical role in immunotherapy because of their capacity to recognize and bind exogenous antigens such as peptides and subsequently deliver them to immune cells. Predicting the binding between peptides and HLA molecules (pHLA) can expedite the screening of immunogenic peptides and facilitate vaccine design. However, traditional experimental methods are time-consuming and inefficient. In this study, an efficient deep learning-based method was developed for predicting peptide-HLA binding, which treats peptide sequences as linguistic entities. It combines the architectures of textCNN and BiLSTM to create a deep neural network model called APEX-pHLA. This model operates without limitations related to HLA class I allele variants and peptide segment lengths, enabling efficient encoding of sequence features for both HLA and peptide segments. On the independent test set, the model achieved Accuracy, ROC_AUC, F1, and MCC of 0.9449, 0.9850, 0.9453, and 0.8899, respectively. Similarly, on an external test set, the results were 0.9803, 0.9574, 0.8835, and 0.7863, respectively. These results outperformed fifteen methods previously reported in the literature. The accurate prediction capability of APEX-pHLA in peptide-HLA binding might provide valuable insights for future HLA vaccine design.
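For readers unfamiliar with the reported metrics, the sketch below shows how Accuracy, F1 and the Matthews correlation coefficient (MCC) follow from a binary confusion matrix. The function name is hypothetical and the counts are invented for illustration; they are not taken from the APEX-pHLA test sets.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1 and MCC from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Invented counts: 90 true positives, 10 false positives,
# 85 true negatives, 15 false negatives.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
```

MCC uses all four cells of the confusion matrix, so it can be noticeably lower than accuracy or F1 on the same predictions. ROC_AUC is not shown here because it requires the ranked scores rather than thresholded counts.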
Subject(s)
Histocompatibility Antigens Class I , Peptides , Protein Binding , Humans , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class I/metabolism , Peptides/chemistry , Peptides/immunology , Deep Learning , HLA Antigens/immunology , HLA Antigens/genetics , Neural Networks, Computer , Computational Biology/methods
ABSTRACT
N6-methyladenosine (m6A) is the most prevalent, abundant, and conserved internal modification in eukaryotic messenger RNAs (mRNAs) and plays a crucial role in cellular processes. Although more than ten methods have been developed for m6A detection over the past decades, there remains room to improve predictive accuracy and efficiency. In this paper, we propose an improved method for predicting m6A modification sites, called Deepm6A-MT, based on a bi-directional gated recurrent unit (Bi-GRU) and convolutional neural networks (CNN). Deepm6A-MT has two input channels. One uses an embedding layer followed by the Bi-GRU and then the CNN, and the other uses one-hot encoding, dinucleotide one-hot encoding, and nucleotide chemical property codes. We trained and evaluated Deepm6A-MT by both 5-fold cross-validation and an independent test. The empirical tests showed that Deepm6A-MT achieved state-of-the-art performance. In addition, we conducted cross-species and cross-tissue tests to further verify the effectiveness and efficiency of Deepm6A-MT. Finally, for the convenience of academic research, we deployed Deepm6A-MT as a web server, accessible at http://www.biolscience.cn/Deepm6A-MT/.
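The three encodings named for the second input channel can be sketched as below. This is an illustration under assumptions rather than the Deepm6A-MT code: the function names are hypothetical, and the chemical property table is a widely used convention (ring structure, functional group, hydrogen bonding) that the abstract itself does not spell out.

```python
NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = [a + b for a in NUCLEOTIDES for b in NUCLEOTIDES]  # 16 pairs

# Assumed convention: (purine ring, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def normalize(seq):
    """Upper-case the sequence and read DNA-style T as U."""
    return seq.upper().replace("T", "U")


def one_hot(seq):
    """One 4-dimensional indicator vector per nucleotide."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES]
            for base in normalize(seq)]


def dinucleotide_one_hot(seq):
    """One 16-dimensional indicator vector per adjacent nucleotide pair."""
    s = normalize(seq)
    return [[1 if s[i:i + 2] == d else 0 for d in DINUCLEOTIDES]
            for i in range(len(s) - 1)]


def chemical_property(seq):
    """One 3-dimensional chemical property code per nucleotide."""
    return [list(NCP[base]) for base in normalize(seq)]


# Short made-up window centred on an adenosine.
window = "GGACU"
encoded = (one_hot(window), dinucleotide_one_hot(window),
           chemical_property(window))
```

A sequence of length L yields L rows for the one-hot and chemical property encodings and L - 1 rows for the dinucleotide encoding, so a real pipeline would need to pad or align them before concatenation; how Deepm6A-MT does that is not stated in the abstract.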
Subject(s)
Adenosine , Deep Learning , Adenosine/analogs & derivatives , Adenosine/metabolism , Adenosine/genetics , Adenosine/chemistry , Humans , Animals , Neural Networks, Computer , RNA, Messenger/genetics , RNA, Messenger/metabolism , Computational Biology/methods
ABSTRACT
The multipotent stem cells of our body have been largely harnessed in biotherapeutics. However, as they are derived from multiple anatomical sources, from different tissues, human mesenchymal stem cells (hMSCs) are a heterogeneous population showing ambiguity in their in vitro behavior. Intra-clonal population heterogeneity has also been identified and pre-clinical mechanistic studies suggest that these cumulatively depreciate the therapeutic effects of hMSC transplantation. Although various biomarkers identify these specific stem cell populations, recent artificial intelligence-based methods have capitalized on the cellular morphologies of hMSCs, opening a new approach to understand their attributes. A robust and rapid platform is required to accommodate and eliminate the heterogeneity observed in the cell population, to standardize the quality of hMSC therapeutics globally. Here, we report our primary findings of morphological heterogeneity observed within and across two sources of hMSCs namely, stem cells from human exfoliated deciduous teeth (SHEDs) and human Wharton jelly mesenchymal stem cells (hWJ MSCs), using real-time single-cell images generated on immunophenotyping by imaging flow cytometry (IFC). We used the ImageJ software for identification and comparison between the two types of hMSCs using statistically significant morphometric descriptors that are biologically relevant. To expand on these insights, we have further applied deep learning methods and successfully report the development of a Convolutional Neural Network-based image classifier. In our research, we introduced a machine learning methodology to streamline the entire procedure, utilizing convolutional neural networks and transfer learning for binary classification, achieving an accuracy rate of 97.54%. We have also critically discussed the challenges, comparisons between solutions and future directions of machine learning in hMSC classification in biotherapeutics.
Subject(s)
Machine Learning , Mesenchymal Stem Cells , Single-Cell Analysis , Humans , Mesenchymal Stem Cells/cytology , Single-Cell Analysis/methods , Immunophenotyping/methods , Flow Cytometry/methods , Tooth, Deciduous/cytology , Image Processing, Computer-Assisted/methods , Wharton Jelly/cytology , Cells, Cultured
ABSTRACT
Interactions of biological molecules in organisms are considered to be primary factors for the lifecycle of that organism. Various important biological functions are dependent on such interactions and among different kinds of interactions, the protein DNA interactions are very important for the processes of transcription, regulation of gene expression, DNA repairing and packaging. Thus, keeping the knowledge of such interactions and the sites of those interactions is necessary to study the mechanism of various biological processes. As experimental identification through biological assays is quite resource-demanding, costly and error-prone, scientists opt for the computational methods for efficient and accurate identification of such DNA-protein interaction sites. Thus, herein, we propose a novel and accurate method namely DeepDBS for the identification of DNA-binding sites in proteins, using primary amino acid sequences of proteins under study. From protein sequences, deep representations were computed through a one-dimensional convolution neural network (1D-CNN), recurrent neural network (RNN) and long short-term memory (LSTM) network and were further used to train a Random Forest classifier. Random Forest with LSTM-based features outperformed the other models, as well as the existing state-of-the-art methods with an accuracy score of 0.99 for self-consistency test, 10-fold cross-validation, 5-fold cross-validation, and jackknife validation while 0.92 for independent dataset testing. It is concluded based on results that the DeepDBS can help accurate and efficient identification of DNA binding sites (DBS) in proteins.
ABSTRACT
There is growing evidence for the role of DNA methylation (DNAm) quantitative trait loci (mQTLs) in the genetics of complex traits, including psychiatric disorders. However, due to extensive linkage disequilibrium (LD) of the genome, it is challenging to identify causal genetic variations that drive DNAm levels by population-based genetic association studies. This limits the utility of mQTLs for fine-mapping risk loci underlying psychiatric disorders identified by genome-wide association studies (GWAS). Here we present INTERACT, a deep learning model that integrates convolutional neural networks with transformer, to predict effects of genetic variations on DNAm levels at CpG sites in the human brain. We show that INTERACT-derived DNAm regulatory variants are not confounded by LD, are concentrated in regulatory genomic regions in the human brain, and are convergent with mQTL evidence from genetic association analysis. We further demonstrate that predicted DNAm regulatory variants are enriched for heritability of brain-related traits and improve polygenic risk prediction for schizophrenia across diverse ancestry samples. Finally, we applied predicted DNAm regulatory variants for fine-mapping schizophrenia GWAS risk loci to identify potential novel risk genes. Our study shows the power of a deep learning approach to identify functional regulatory variants that may elucidate the genetic basis of complex traits.
Subject(s)
Brain Chemistry , DNA Methylation , Deep Learning , Schizophrenia , Brain , CpG Islands , Genome-Wide Association Study , Humans , Neural Networks, Computer , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Schizophrenia/genetics
ABSTRACT
Short-length antimicrobial peptides (AMPs) have been demonstrated to have intensified antimicrobial activities against a wide spectrum of microbes. Therefore, exploration of novel and promising short AMPs is highly essential in developing various types of antimicrobial drugs or treatments. In addition to experimental approaches, computational methods have been developed to improve screening efficiency. Although existing computational methods have achieved satisfactory performance, there is still much room for model improvement. In this study, we proposed iAMP-DL, an efficient hybrid deep learning architecture, for predicting short AMPs. The model was constructed using two well-known deep learning architectures: the long short-term memory architecture and convolutional neural networks. To fairly assess the performance of the model, we compared our model with existing state-of-the-art methods using the same independent test set. Our comparative analysis shows that iAMP-DL outperformed other methods. Furthermore, to assess the robustness and stability of our model, the experiments were repeated 10 times to observe the variation in prediction efficiency. The results demonstrate that iAMP-DL is an effective, robust, and stable framework for detecting promising short AMPs. Another comparative study of different negative data sampling methods also confirms the effectiveness of our method and demonstrates that it can also be used to develop a robust model for predicting AMPs in general. The proposed framework was also deployed as an online web server with a user-friendly interface to support the research community in identifying short AMPs.
Subject(s)
Antimicrobial Peptides , Deep Learning , Antimicrobial Peptides/chemistry , Antimicrobial Peptides/pharmacology , Neural Networks, Computer , Computational Biology/methods , Antimicrobial Cationic Peptides/chemistry , Antimicrobial Cationic Peptides/pharmacology
ABSTRACT
BACKGROUND: Mild cognitive impairment (MCI) is the transition stage between the cognitive decline expected in normal aging and more severe cognitive decline such as dementia. The early diagnosis of MCI plays an important role in human healthcare. Current methods of MCI detection include cognitive tests to screen for executive function impairments, possibly followed by neuroimaging tests. However, these methods are expensive and time-consuming. Several studies have demonstrated that MCI and dementia can be detected by machine learning technologies from different modality data. This study proposes a multi-stream convolutional neural network (MCNN) model to predict MCI from face videos. RESULTS: The total effective data are 48 facial videos from 45 participants, including 35 videos from normal cognitive participants and 13 videos from MCI participants. The videos are divided into several segments. Then, the MCNN captures the latent facial spatial features and facial dynamic features of each segment and classifies the segment as MCI or normal. Finally, the aggregation stage produces the final detection results of the input video. We evaluate 27 MCNN model combinations including three ResNet architectures, three optimizers, and three activation functions. The experimental results showed that the ResNet-50 backbone with Swish activation function and Ranger optimizer produces the best results with an F1-score of 89% at the segment level. However, the ResNet-18 backbone with Swish and Ranger achieves the F1-score of 100% at the participant level. CONCLUSIONS: This study presents an efficient new method for predicting MCI from facial videos. Studies have shown that MCI can be detected from facial videos, and facial data can be used as a biomarker for MCI. This approach is very promising for developing accurate models for screening MCI through facial data. 
It demonstrates that automated, non-invasive, and inexpensive MCI screening methods are feasible and do not require highly subjective paper-and-pencil questionnaires. Evaluation of 27 model combinations also found that ResNet-50 with Swish is more stable for different optimizers. Such results provide directions for hyperparameter tuning to further improve MCI predictions.
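The step from segment-level predictions to a single result per video can be sketched as a simple vote. The abstract does not say how the aggregation stage works, so majority voting, the 0.5 threshold, the tie rule and the function name below are all assumptions made for illustration.

```python
def aggregate_video(segment_probs, threshold=0.5):
    """Combine per-segment MCI probabilities into one video-level label.

    Assumed rule: a segment votes 'MCI' when its probability reaches the
    threshold, and the video is labelled 'MCI' only on a strict majority.
    Ties are resolved as 'normal'.
    """
    if not segment_probs:
        raise ValueError("at least one segment is required")
    mci_votes = sum(1 for p in segment_probs if p >= threshold)
    return "MCI" if 2 * mci_votes > len(segment_probs) else "normal"


# Made-up segment probabilities for two hypothetical videos.
video_a = aggregate_video([0.9, 0.7, 0.2])
video_b = aggregate_video([0.1, 0.6, 0.3])
```

Aggregating over segments can let a few misclassified segments be outvoted, which is one possible reason a model's participant-level score can exceed its segment-level score.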
Subject(s)
Cognitive Dysfunction , Neural Networks, Computer , Cognitive Dysfunction/diagnosis , Humans , Aged , Machine Learning , Male , Female , Face/diagnostic imaging , Video Recording/methods
ABSTRACT
O-linked β-N-acetylglucosamine (O-GlcNAc) is a post-translational modification (i.e., O-GlcNAcylation) on serine/threonine residues of proteins, regulating a plethora of physiological and pathological events. As a dynamic process, O-GlcNAc functions in a site-specific manner. However, the experimental identification of the O-GlcNAc sites remains challenging in many scenarios. Herein, by leveraging the recent progress in cataloguing experimentally identified O-GlcNAc sites and advanced deep learning approaches, we establish an ensemble model, O-GlcNAcPRED-DL, a deep learning-based tool, for the prediction of O-GlcNAc sites. In brief, to make a benchmark O-GlcNAc data set, we extracted the information on O-GlcNAc from the recently constructed database O-GlcNAcAtlas, which contains thousands of experimentally identified and curated O-GlcNAc sites on proteins from multiple species. To overcome the imbalance between positive and negative data sets, we selected five groups of negative data sets in humans and mice to construct an ensemble predictor based on connection of a convolutional neural network and bidirectional long short-term memory. By taking into account three types of sequence information, we constructed four network frameworks, with the systematically optimized parameters used for the models. The thorough comparison analysis on two independent data sets of humans and mice and six independent data sets from other species demonstrated remarkably increased sensitivity and accuracy of the O-GlcNAcPRED-DL models, outperforming other existing tools. Moreover, a user-friendly Web server for O-GlcNAcPRED-DL has been constructed, which is freely available at http://oglcnac.org/pred_dl.
Subject(s)
Deep Learning , Humans , Animals , Mice , Proteins/metabolism , Protein Processing, Post-Translational , Acetylglucosamine/chemistry , N-Acetylglucosaminyltransferases/metabolism
ABSTRACT
Efficient and high-accuracy filtering of cryo-electron microscopy (cryo-EM) micrographs is an emerging challenge with the growing speed of data collection and sizes of datasets. Convolutional neural networks (CNNs) are machine learning models that have been proven successful in many computer vision tasks, and have been previously applied to cryo-EM micrograph filtering. In this work, we demonstrate that two strategies, fine-tuning models from pretrained weights and including the power spectrum of micrographs as input, can greatly improve the attainable prediction accuracy of CNN models. The resulting software package, Miffi, is open-source and freely available for public use (https://github.com/ando-lab/miffi).
Subject(s)
Cryoelectron Microscopy , Image Processing, Computer-Assisted , Neural Networks, Computer , Software , Cryoelectron Microscopy/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Algorithms
ABSTRACT
Deep learning is gaining importance due to its wide range of applications. Many researchers have utilized deep learning (DL) models for the automated diagnosis of cancer patients. This paper provides a systematic review of DL models for automated diagnosis of cancer patients. Initially, various DL models for cancer diagnosis are presented. Five major categories of cancer are considered (breast, lung, liver, brain and cervical cancer), because these categories have a very high incidence and a high mortality rate. A comparative analysis of different types of DL models for the diagnosis of cancer at early stages is drawn from research articles published from 2016 to 2022. This analysis finds that most researchers achieved appreciable accuracy by implementing convolutional neural network models. These studies utilized pretrained models for automated diagnosis of cancer patients. Various shortcomings of the existing DL-based automated cancer diagnosis models are also presented. Finally, future directions are discussed to facilitate further research on automated diagnosis of cancer patients.
Subject(s)
Deep Learning , Diagnosis, Computer-Assisted , Neoplasms , Humans , Lung , Neural Networks, Computer , Tomography, X-Ray Computed , Neoplasms/diagnosis
ABSTRACT
BACKGROUND: Alternative splicing is a pivotal mechanism of post-transcriptional modification that contributes to transcriptome plasticity and proteome diversity in metazoan cells. Although many splicing regulations around the exon/intron regions are known, the relationship between promoter-bound transcription factors and downstream alternative splicing remains largely unexplored. RESULTS: In this study, we present computational approaches to unravel the regulatory relationship between promoter-bound transcription factor binding sites (TFBSs) and splicing patterns. We curated a fine dataset that includes DNase I hypersensitive site sequencing and transcriptomes across fifteen human tissues from ENCODE. Specifically, we proposed different representations of TF binding context and splicing patterns to examine the associations between the promoter and downstream splicing events. While machine learning models demonstrated potential in predicting splicing patterns based on TFBS occupancies, careful examination using different cross-validation methods revealed limitations in generalizing predictions of the splicing forms of singleton genes across diverse tissues. We further investigated the association between alterations in individual TFBSs at promoters and shifts in exon splicing efficiency. Our results demonstrate that convolutional neural network (CNN) models, trained on TF binding changes in the promoters, can predict the changes in splicing patterns. Furthermore, a systematic in silico substitution analysis on the CNN models highlighted several potential splicing regulators. Notably, through empirical validation with K562 CTCFL shRNA knock-down data, we showed the significant role of CTCFL in splicing regulation.
CONCLUSION: In conclusion, our findings highlight the potential role of promoter-bound TFBSs in influencing the regulation of downstream splicing patterns and provide insights for discovering alternative splicing regulations.
Subject(s)
Alternative Splicing , Deep Learning , Promoter Regions, Genetic , Transcription Factors , Humans , Binding Sites , Transcription Factors/metabolism , Transcription Factors/genetics , Computational Biology/methods , Exons/genetics
ABSTRACT
Attending to heartbeats for interoceptive awareness initiates distinct electrophysiological responses synchronized with the R-peaks of an electrocardiogram (ECG), such as the heartbeat-evoked potential (HEP). Beyond HEP, this study proposes heartbeat-related spectral perturbation (HRSP), a time-frequency map of the R-peak locked electroencephalogram (EEG), and explores its characteristics in identifying interoceptive attention states using a classification approach. HRSPs of EEG brain components specified by independent component analysis (ICA) were used for the offline and online classification of interoceptive states. A convolutional neural network (CNN) designed specifically for HRSP was applied to publicly available data from a binary-state experiment (attending to self-heartbeats and white noise) and data from our four-state classification experiment (attending to self-heartbeats, white noise, time passage, and toe) with diverse input feature conditions of HRSP. From the dynamic state perspective, we evaluated the primary frequency bands of HRSP and the minimal number of averaging epochs required to reflect changing interoceptive attention states without compromising accuracy. We also assessed the utility of group ICA and models for classifying HRSP in new participants. The CNN for trial-by-trial HRSP with actual R-peaks demonstrated significantly higher classification accuracy than HRSP with sham, i.e., randomly positioned, R-peaks. Gradient-weighted class activation mapping highlighted the prominent role of theta and alpha bands between 200 and 600 ms post-R-peak, features that were absent in classifications using sham HRSPs. Online classification benefits from employing a group ICA and classification model, ensuring reliable accuracy without individual EEG precollection. These results suggest HRSP's potential to reflect interoceptive attention states, proposing transformative implications for clinical applications.
Subject(s)
Attention , Electroencephalography , Heart Rate , Interoception , Humans , Electroencephalography/methods , Attention/physiology , Heart Rate/physiology , Male , Adult , Female , Young Adult , Interoception/physiology , Neural Networks, Computer , Brain/physiology , Evoked Potentials/physiology
ABSTRACT
Hippocampal atrophy (tissue loss) has become a fundamental outcome parameter in clinical trials on Alzheimer's disease. To accurately estimate hippocampus volume and track its volume loss, a robust and reliable segmentation is essential. Manual hippocampus segmentation is considered the gold standard but is extensive, time-consuming, and prone to rater bias. Therefore, it is often replaced by automated programs like FreeSurfer, one of the most commonly used tools in clinical research. Recently, deep learning-based methods have also been successfully applied to hippocampus segmentation. All of these approaches are based on clinically used T1-weighted whole-brain MR images with approximately 1 mm isotropic resolution. However, such T1 images show low contrast-to-noise ratios (CNRs), particularly for many hippocampal substructures, limiting delineation reliability. To overcome these limitations, high-resolution T2-weighted scans are suggested for better visualization and delineation, as they show higher CNRs and usually allow for higher resolutions. Unfortunately, such time-consuming T2-weighted sequences are not feasible in a clinical routine. We propose an automated hippocampus segmentation pipeline that leverages deep learning with T2-weighted MR images to enhance hippocampus segmentation of clinical T1-weighted images, based on a series of 3D convolutional neural networks and a specifically acquired multi-contrast dataset. This dataset consists of corresponding pairs of T1- and high-resolution T2-weighted images, with the T2 images used only to create more accurate manual ground truth annotations and to train the segmentation network. The T2-based ground truth labels were also used to evaluate all experiments by comparing the masks visually and by various quantitative measures. We compared our approach with four established state-of-the-art hippocampus segmentation algorithms (FreeSurfer, ASHS, HippoDeep, HippMapp3r) and demonstrated superior segmentation performance.
Moreover, we found that the automated segmentation of T1-weighted images benefits from the T2-based ground truth data. In conclusion, this work showed the beneficial use of high-resolution, T2-based ground truth data for training an automated, deep learning-based hippocampus segmentation and provides the basis for a reliable estimation of hippocampal atrophy in clinical studies.
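The abstract does not name the quantitative measures that were used. As an illustration only, the sketch below assumes the Dice overlap coefficient, a common choice for comparing a predicted mask with a ground truth mask, together with a voxel-count volume estimate at the 1 mm isotropic resolution mentioned above. All function names and the toy masks are hypothetical.

```python
def dice(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total


def volume_mm3(mask, voxel_mm=(1.0, 1.0, 1.0)):
    """Volume of a binary mask: number of foreground voxels times the volume of one voxel."""
    dx, dy, dz = voxel_mm
    return sum(1 for v in mask if v) * dx * dy * dz


def atrophy_percent(vol_baseline, vol_followup):
    """Relative volume loss between two time points, in percent of the baseline volume."""
    return 100.0 * (vol_baseline - vol_followup) / vol_baseline


# Toy flattened masks (hypothetical data): 4 ground truth voxels, 4 predicted, 3 shared.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]

print(dice(pred, truth))          # 2 * 3 / (4 + 4) = 0.75
print(volume_mm3(truth))          # 4 voxels of 1 mm^3 each = 4.0
print(atrophy_percent(4.0, 3.0))  # 25.0
```

Because the volume estimate is a direct function of the mask, any systematic segmentation error carries over into the atrophy estimate, which is why segmentation accuracy matters for longitudinal studies.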