Results 1-20 of 1,798
1.
Annu Rev Cell Dev Biol ; 35: 683-701, 2019 10 06.
Article in English | MEDLINE | ID: mdl-31424964

ABSTRACT

Expansion microscopy (ExM) is a physical form of magnification that increases the effective resolving power of any microscope. Here, we describe the fundamental principles of ExM, as well as how recently developed ExM variants build upon and apply those principles. We examine applications of ExM in cell and developmental biology for the study of nanoscale structures as well as ExM's potential for scalable mapping of nanoscale structures across large sample volumes. Finally, we explore how the unique anchoring and hydrogel embedding properties enable postexpansion molecular interrogation in a purified chemical environment. ExM promises to play an important role complementary to emerging live-cell imaging techniques, because of its relative ease of adoption and modification and its compatibility with tissue specimens up to at least 200 µm thick.


Subjects
Developmental Biology/methods , Microscopy/methods , Animals , Antibodies , Humans , Hydrogels/chemistry , Image Processing, Computer-Assisted , Luminescent Proteins , Microscopy/instrumentation , Microscopy/trends , Molecular Conformation
2.
Annu Rev Neurosci ; 44: 109-128, 2021 07 08.
Article in English | MEDLINE | ID: mdl-34236891

ABSTRACT

Animals operate in complex environments, and salient social information is encoded in the nervous system and then processed to initiate adaptive behavior. This encoding involves biological embedding, the process by which social experience affects the brain to influence future behavior. Biological embedding is an important conceptual framework for understanding social decision-making in the brain, as it encompasses multiple levels of organization that regulate how information is encoded and used to modify behavior. The framework we emphasize here is that social stimuli provoke short-term changes in neural activity that lead to changes in gene expression on longer timescales. This process, simplified as "neurons are for today and genes are for tomorrow," enables the assessment of the valence of a social interaction, an appropriate and rapid response, and subsequent modification of neural circuitry to change future behavioral inclinations in anticipation of environmental changes. We review recent research on the neural and molecular basis of biological embedding in the context of social interactions, with a special focus on the honeybee.


Subjects
Brain , Social Interaction , Animals , Neurons , Social Behavior
3.
Proc Natl Acad Sci U S A ; 121(11): e2309469121, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38442181

ABSTRACT

The early-life environment can profoundly shape the trajectory of an animal's life, even years or decades later. One mechanism proposed to contribute to these early-life effects is DNA methylation. However, the frequency and functional importance of DNA methylation in shaping early-life effects on adult outcomes are poorly understood, especially in natural populations. Here, we integrate prospectively collected data on fitness-associated variation in the early environment with DNA methylation estimates at 477,270 CpG sites in 256 wild baboons. We find highly heterogeneous relationships between the early-life environment and DNA methylation in adulthood: aspects of the environment linked to resource limitation (e.g., low-quality habitat, early-life drought) are associated with many more CpG sites than other types of environmental stressors (e.g., low maternal social status). Sites associated with early resource limitation are enriched in gene bodies and putative enhancers, suggesting they are functionally relevant. Indeed, by deploying a baboon-specific, massively parallel reporter assay, we show that a subset of windows containing these sites is capable of regulatory activity, and that, for 88% of early drought-associated sites in these regulatory windows, enhancer activity is DNA methylation-dependent. Together, our results support the idea that DNA methylation patterns contain a persistent signature of the early-life environment. However, they also indicate that not all environmental exposures leave an equivalent mark and suggest that socioenvironmental variation at the time of sampling is more likely to be functionally important. Thus, multiple mechanisms must converge to explain early-life effects on fitness-related traits.


Subjects
Adverse Childhood Experiences , DNA Methylation , Animals , Nucleotide Motifs , Biological Assay , Papio/genetics
4.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493342

ABSTRACT

Dynamic compartmentalization of eukaryotic DNA into active and repressed states enables diverse transcriptional programs to arise from a single genetic blueprint, whereas its dysregulation can be strongly linked to a broad spectrum of diseases. While single-cell Hi-C experiments allow chromosome conformation profiling across many cells, they are still expensive and not widely available to most labs. Here, we propose an alternative approach, scENCORE, to computationally reconstruct chromatin compartments from the more affordable and widely accessible single-cell epigenetic data. First, scENCORE constructs a long-range epigenetic correlation graph to mimic chromatin interaction frequencies, where nodes represent genome bins and edges represent their correlations. Then, it learns node embeddings to cluster genome regions into A/B compartments and aligns different graphs to quantify chromatin conformation changes across conditions. Benchmarking against cell-type-matched Hi-C experiments demonstrates that scENCORE can robustly reconstruct A/B compartments in a cell-type-specific manner. Furthermore, our chromatin conformation switching studies highlight extensive compartment-switching events that may introduce substantial regulatory and transcriptional changes in psychiatric disease. In summary, scENCORE allows accurate and cost-effective A/B compartment reconstruction to delineate higher-order chromatin structure heterogeneity in complex tissues.


Subjects
Chromatin , Chromosomes , Chromatin/genetics , DNA , Molecular Conformation , Epigenesis, Genetic
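The first scENCORE step described above, a correlation graph whose nodes are genome bins and whose edge weights are correlations across cells, can be illustrated with a small pure-Python sketch. It assumes a hypothetical bins × cells matrix of epigenetic signal. scENCORE itself learns node embeddings to cluster the bins; here, as a plainly substituted stand-in, bins are split by the sign of the graph's leading eigenvector, in the spirit of the eigenvector-based A/B calling long used for Hi-C. Which group is called "A" is arbitrary in this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length signal vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_graph(bins):
    """Dense weighted adjacency: node = genome bin, edge weight = correlation across cells."""
    k = len(bins)
    return [[pearson(bins[i], bins[j]) for j in range(k)] for i in range(k)]


def leading_eigenvector(m, iters=200):
    """Power iteration; a correlation matrix is positive semidefinite, so this
    converges to the eigenvector of the largest eigenvalue."""
    v = [1.0 / (i + 1) for i in range(len(m))]  # non-uniform start vector
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in m]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def ab_compartments(bins):
    """Stand-in compartment call (not scENCORE's embedding step): split bins by eigenvector sign."""
    v = leading_eigenvector(correlation_graph(bins))
    return ["A" if x >= 0 else "B" for x in v]


# Hypothetical toy data: 4 bins x 4 cells, two anti-correlated groups of bins.
toy = [[1, 2, 3, 4], [2, 3, 4, 5.5], [4, 3, 2, 1], [5, 3, 2, 0]]
print(ab_compartments(toy))
```

Bins 0 and 1 land in one group and bins 2 and 3 in the other; the labels may come out as A/B or B/A because an eigenvector's sign is arbitrary.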
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38487845

ABSTRACT

B cell epitope prediction methods fall into two groups: linear sequence-based predictors and conformational epitope predictors, which typically use the measured or predicted protein structure. Most linear predictors translate the sequence into biologically based representations and apply machine learning to these representations. We here present CALIBER ('Conformational And LInear B cell Epitopes pRediction') and show that a bidirectional long short-term memory network with random projection produces more accurate predictions (test set AUC = 0.789) than all current linear methods. The same predictor, when combined with an Evolutionary Scale Modeling-2 projection, also improves on the state of the art for conformational epitopes (AUC = 0.776). Including the graph of 3D distances between residues did not increase prediction accuracy. However, long-range sequence information was essential for high accuracy. While the same model structure was applicable to linear and conformational epitopes, separate training was required for each. Combining the two slightly increased the linear accuracy (AUC 0.775 versus 0.768) and reduced the conformational accuracy (AUC = 0.769).


Subjects
Epitopes, B-Lymphocyte , Epitopes, B-Lymphocyte/chemistry , Molecular Conformation
6.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39175133

ABSTRACT

Target identification is one of the crucial tasks in drug research and development, as it helps uncover the mechanism of action of herbs/drugs and discover new therapeutic targets. Although multiple herb target prediction algorithms have been proposed, the incompleteness of clinical knowledge and the limitations of unsupervised models mean that accurate identification of herb targets still faces major data and modeling challenges. To address this, we proposed a deep learning-based target prediction framework, HTINet2, built from three key modules: traditional Chinese medicine (TCM) and clinical knowledge graph embedding, residual graph representation learning, and supervised target prediction. In the first module, we constructed a large-scale knowledge graph covering the TCM properties and clinical treatment knowledge of herbs, and designed a deep knowledge embedding component to learn embeddings of herbs and targets. In the remaining two modules, we designed a residual-like graph convolution network to capture the deep interactions between herbs and targets, and used a Bayesian personalized ranking loss for supervised training and target prediction. Finally, we conducted comprehensive experiments: comparison with baselines showed the excellent performance of HTINet2 (HR@10 increased by 122.7% and NDCG@10 by 35.7%), ablation experiments showed the positive effect of the HTINet2 modules, and case studies based on the knowledge base, literature, and molecular docking demonstrated the reliability of the predicted targets of Artemisia annua and Coptis chinensis.


Subjects
Drugs, Chinese Herbal , Medicine, Chinese Traditional , Neural Networks, Computer , Drugs, Chinese Herbal/chemistry , Drugs, Chinese Herbal/pharmacology , Algorithms , Humans , Deep Learning , Bayes Theorem
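HTINet2 is trained with a Bayesian personalized ranking (BPR) loss. In its standard form, BPR asks that, for a given herb, a known target outscore a sampled non-target, and minimizes -ln sigmoid(s(h, t_pos) - s(h, t_neg)). The sketch below assumes dot-product scores over hypothetical embedding dictionaries; HTINet2's actual scoring function, negative sampling, and regularization are not given in the abstract.

```python
import math


def dot(u, v):
    """Dot-product score between a herb embedding and a target embedding."""
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bpr_loss(triples, herb_emb, target_emb):
    """Mean BPR loss over (herb, positive_target, negative_target) triples.

    The loss shrinks as each known target is scored further above the
    sampled non-target for the same herb.
    """
    total = 0.0
    for herb, pos, neg in triples:
        margin = dot(herb_emb[herb], target_emb[pos]) - dot(herb_emb[herb], target_emb[neg])
        total += -log_sigmoid(margin)
    return total / len(triples)


# Hypothetical 2-d embeddings, for illustration only.
herbs = {"herb_1": [1.0, 0.0]}
targets = {"known_target": [2.0, 0.0], "non_target": [-1.0, 0.0]}
print(bpr_loss([("herb_1", "known_target", "non_target")], herbs, targets))
```

With a score margin of 3, the loss is ln(1 + e^-3), about 0.049; swapping the positive and negative targets raises it to about 3.049. Training would adjust the embeddings to push such margins up.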
7.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38517692

ABSTRACT

Graph learning models have received increasing attention in the computational analysis of single-cell RNA sequencing (scRNA-seq) data. Compared with conventional deep neural networks, graph neural networks and language models have exhibited superior performance by extracting graph-structured data from raw gene count matrices. Established clustering approaches based on deep neural networks generally focus on temporal expression patterns while ignoring inherent interactions at the gene and cell levels, which could be regarded as spatial dynamics in single-cell data. Both gene-gene and cell-cell interactions can boost the performance of cell type detection within a multi-view modeling framework. In this study, spatiotemporal embeddings and cell graphs are extracted to capture spatial dynamics at the molecular level. To enhance the accuracy of cell type detection, this study proposes the scHybridBERT architecture, which performs multi-view modeling of scRNA-seq data using the extracted spatiotemporal patterns. In scHybridBERT, graph learning models process the cell graphs and a Performer model processes the spatiotemporal embeddings. Experimental results on benchmark scRNA-seq datasets indicate that scHybridBERT enhances the accuracy of single-cell clustering by integrating spatiotemporal embeddings and cell graphs.


Subjects
Benchmarking , Gene Expression Regulation , Cell Communication , Cluster Analysis , Learning
8.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38517698

ABSTRACT

High-throughput genomic and proteomic scanning approaches allow investigators to quantify genes (or gene products) genome-wide under given disease conditions, which plays an essential role in the discovery of disease mechanisms. These high-throughput approaches often generate large lists of genes of interest (GOIs), such as differentially expressed genes/proteins. However, researchers must perform manual triage and validation to find the most promising, biologically plausible links between known disease genes and GOIs (disease signals) for further study. To address this challenge, we proposed a network-based strategy, DDK-Linker, that facilitates the exploration of disease signals hidden in omics data by linking GOIs to known disease genes. Specifically, it reconstructs gene distances in the protein-protein interaction (PPI) network through six network methods (random walk with restart, DeepWalk, Node2Vec, LINE, HOPE, Laplacian) to discover disease signals in omics data that lie at shorter distances from disease genes. Furthermore, drawing on a knowledge base we established, abundant bioinformatics annotations are provided for each candidate disease signal. To assist omics data interpretation and ease usage, we have implemented this strategy as an application that users can access through a website or download as an R package. We believe DDK-Linker will accelerate the exploration of disease genes and drug targets in a variety of omics data, such as genomics, transcriptomics and proteomics data, and provide clues for complex disease mechanism and pharmacological research. DDK-Linker is freely accessible at http://ddklinker.ncpsb.org.cn/.


Subjects
Proteomics , Software , Proteomics/methods , Genomics/methods , Computational Biology/methods , Protein Interaction Maps
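Random walk with restart (RWR), the first of the six network methods listed for DDK-Linker, can be sketched in a few lines. A walker on the undirected PPI graph moves to a uniformly chosen neighbour and, with probability `restart`, jumps back to the seed (known disease) genes; the steady-state visiting probabilities then measure how close every other gene is to the seeds. The restart probability of 0.3, the convergence tolerance, and the toy graph are assumptions for illustration, not DDK-Linker's settings.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores on an undirected graph.

    edges: iterable of (u, v) pairs; seeds: collection of nodes (all present
    in the graph) to which the walker restarts. Returns {node: probability}.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    nodes = sorted(nbrs)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(max_iter):
        # Restart mass returns to the seeds; the rest spreads to neighbours.
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            share = (1 - restart) * p[u] / len(nbrs[u])
            for v in nbrs[u]:
                nxt[v] += share
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path-shaped interaction network with one seed disease gene.
scores = random_walk_with_restart([("G1", "G2"), ("G2", "G3"), ("G3", "G4")], {"G1"})
print(sorted(scores, key=scores.get, reverse=True))
```

Candidate genes of interest could then be ranked by their score, with higher scores indicating closer network proximity to the disease genes; on the toy path above the ranking is G1, G2, G3, G4.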
9.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38695119

ABSTRACT

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness by being context dependent. It relies on recent advances in deep learning architectures that employ self-supervised learning to leverage enormous amounts of unlabelled data to generate contextual embeddings, which are vector representations for words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We propose the E-score between two residues as the cosine similarity between their embedding vector representations. Thorough testing on a wide variety of reference multiple sequence alignments indicates that the alignments produced using the new E-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method proposes a change in the way alignments are computed, with far-reaching implications in all areas of textual data that use sequence similarity. The program to compute alignments based on various E-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.


Subjects
Algorithms , Computational Biology , Sequence Alignment , Sequence Alignment/methods , Computational Biology/methods , Software , Sequence Analysis, Protein/methods , Amino Acid Sequence , Proteins/chemistry , Proteins/genetics , Deep Learning , Databases, Protein
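The E-score defined above is the cosine similarity of two residue embeddings, and it takes the place that a BLOSUM lookup would occupy in an alignment algorithm. The sketch below shows that substitution in a plain Needleman-Wunsch global alignment. It assumes one embedding vector per residue position (in practice these would come from a protein language model; here they are tiny hypothetical vectors) and a hypothetical linear gap penalty of -0.5; the gap model and alignment details of the published tool are not described in the abstract.

```python
import math


def e_score(u, v):
    """E-score as described in the abstract: cosine similarity of two residue embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def global_alignment_score(emb_a, emb_b, gap=-0.5):
    """Needleman-Wunsch global alignment score with E-scores as substitution scores.

    emb_a, emb_b: one embedding vector per residue position. Because each
    position has its own context-dependent vector, the same amino acid can
    score differently in different contexts, unlike a fixed BLOSUM entry.
    gap: linear gap penalty (hypothetical value).
    """
    n, m = len(emb_a), len(emb_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # leading gaps in sequence b
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # leading gaps in sequence a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + e_score(emb_a[i - 1], emb_b[j - 1]),  # align two residues
                dp[i - 1][j] + gap,  # residue of a against a gap
                dp[i][j - 1] + gap,  # residue of b against a gap
            )
    return dp[n][m]


# Hypothetical 3-d per-residue embeddings.
seq_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
seq_b = seq_a + [[1.0, 1.0, 0.0]]
print(global_alignment_score(seq_a, seq_b))
```

Here the three matching positions each contribute an E-score of 1.0 and the extra residue in `seq_b` costs one gap, giving 2.5.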
10.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581416

ABSTRACT

The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large, directed and highly sparse graphs, which impedes the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlated, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.


Subjects
Gene Regulatory Networks , Liver Neoplasms , Humans , Systems Biology/methods , Transcriptome , Algorithms , Computational Biology/methods
11.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581422

ABSTRACT

Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.


Subjects
Chromatin , Neural Networks, Computer , Reproducibility of Results
12.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38856171

ABSTRACT

The identification of protein complexes from protein interaction networks is crucial in the understanding of protein function, cellular processes and disease mechanisms. Existing methods commonly rely on the assumption that protein interaction networks are highly reliable, yet in reality, there is considerable noise in the data. In addition, these methods fail to account for the regulatory roles of biomolecules during the formation of protein complexes, which is crucial for understanding the generation of protein interactions. To this end, we propose a SpatioTemporal constrained RNA-protein heterogeneous network for Protein Complex Identification (STRPCI). STRPCI first constructs a multiplex heterogeneous protein information network to capture deep semantic information by extracting spatiotemporal interaction patterns. Then, it uses a dual-view aggregator to aggregate heterogeneous neighbor information from different layers. Finally, through contrastive learning, STRPCI collaboratively optimizes the protein embedding representations under different spatiotemporal interaction patterns. Based on the protein embedding similarity, STRPCI reweights the protein interaction network and identifies protein complexes with a core-attachment strategy. By considering the spatiotemporal constraints and biomolecular regulatory factors of protein interactions, STRPCI measures the tightness of interactions, thus mitigating the impact of noisy data on complex identification. Evaluation results on four real PPI networks demonstrate the effectiveness and strong biological significance of STRPCI. The source code implementation of STRPCI is available from https://github.com/LI-jasm/STRPCI.


Subjects
Protein Interaction Maps , RNA , RNA/metabolism , RNA/chemistry , Proteins/metabolism , Proteins/chemistry , Computational Biology/methods , Algorithms , Protein Interaction Mapping/methods , Humans
13.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605638

ABSTRACT

Recent advances in single-cell RNA sequencing technology have eased analyses of signaling networks of cells. Recently, cell-cell interaction has been studied using various link prediction approaches on graph-structured data. These approaches make assumptions about the likelihood of node interaction and therefore perform well only on some specific networks. Subgraph-based methods have solved this problem and outperformed other approaches by extracting local subgraphs from a given network. In this work, we present a novel method, called Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures the latent and explicit attributes of undirected, attributed graphs constructed from the gene expression profiles of individual cells. High-dimensional and sparse single-cell RNA-seq data make converting the data into a graphical format a daunting task. We overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell communication network is built from a cell-cell similarity matrix learned from gene expression data. We performed experiments on six datasets extracted from human and mouse pancreas tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches and the state-of-the-art link prediction method, WLNM, with a ROC AUC of 0.99 and 99% prediction accuracy. The datasets can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133 and the code is publicly available on GitHub at https://github.com/sheenahora/SEGCECO and on Code Ocean at https://codeocean.com/capsule/8244724/tree.


Subjects
Cell Communication , Signal Transduction , Humans , Animals , Mice , Cell Communication/genetics , Learning , Neural Networks, Computer , Gene Expression
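The subgraph-based link prediction idea mentioned in the SEGCECO abstract (shared by WLNM, the method it is compared against) begins by extracting, for each candidate pair of cells, the local subgraph around both cells. The sketch below is a generic h-hop enclosing-subgraph extractor, not SEGCECO's own code. It assumes an undirected graph stored as a dict of neighbour sets with sortable node labels; the hop count `h` and the toy cell graph are hypothetical.

```python
from collections import deque


def h_hop_nodes(adj, source, h):
    """All nodes within h hops of source, found by breadth-first search."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if seen[u] == h:
            continue  # do not expand beyond h hops
        for v in adj.get(u, ()):
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return set(seen)


def enclosing_subgraph(adj, x, y, h=1):
    """Induced subgraph on nodes within h hops of either endpoint of the candidate link (x, y).

    The candidate edge itself is dropped, as is usual when training link
    predictors, so a model cannot read the answer directly from its input.
    """
    nodes = h_hop_nodes(adj, x, h) | h_hop_nodes(adj, y, h)
    edges = set()
    for u in nodes:
        for v in adj.get(u, ()):
            if v in nodes and u < v:
                edges.add((u, v))
    edges.discard((min(x, y), max(x, y)))
    return nodes, edges


# Hypothetical chain of five cells.
cells = {"c1": {"c2"}, "c2": {"c1", "c3"}, "c3": {"c2", "c4"},
         "c4": {"c3", "c5"}, "c5": {"c4"}}
print(enclosing_subgraph(cells, "c2", "c3", h=1))
```

Each extracted subgraph would then be encoded (for example with node attributes from gene expression) and passed to a graph neural network that classifies whether the candidate link exists.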
14.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38300515

ABSTRACT

Accurate cell type annotation in single-cell RNA-sequencing data is essential for advancing biological and medical research, particularly in understanding disease progression and tumor microenvironments. However, existing methods are constrained by single feature extraction approaches, lack of adaptability to immune cell types with similar molecular profiles but distinct functions and a failure to account for the impact of cell label noise on model accuracy, all of which compromise the precision of annotation. To address these challenges, we developed a supervised approach called scMMT. We proposed a novel feature extraction technique to uncover more valuable information. Additionally, we constructed a multi-task learning framework based on the GradNorm method to enhance the recognition of challenging immune cells and reduce the impact of label noise by facilitating mutual reinforcement between cell type annotation and protein prediction tasks. Furthermore, we introduced logarithmic weighting and label smoothing mechanisms to enhance the recognition ability of rare cell types and prevent model overconfidence. Through comprehensive evaluations on multiple public datasets, scMMT has demonstrated state-of-the-art performance in various aspects including cell type annotation, rare cell identification, dropout and label noise resistance, protein expression prediction and low-dimensional embedding representation.


Subjects
Biomedical Research , Deep Learning , Humans , Molecular Sequence Annotation , Single-Cell Gene Expression Analysis , Disease Progression
15.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279649

ABSTRACT

The identification of human-herpesvirus protein-protein interactions (PPIs) is an essential entry point for understanding the mechanisms of viral infection, especially in malignant tumor patients with common herpesvirus infection. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion to predict human-herpesvirus PPIs is still limited. Here, we established a multi-modal embedding feature fusion-based LightGBM method to predict human-herpesvirus PPIs. In particular, we applied document and graph embedding approaches to represent sequence, network and function modal features of human and herpesviral proteins. Training our LightGBM models on our compiled non-rigorous and rigorous benchmarking datasets, we obtained significantly better performance than with individual-modal features. Furthermore, our model outperformed traditional feature encoding-based machine learning methods and state-of-the-art deep learning-based methods on various benchmarking datasets. In a transfer learning step, we show that our model, trained on a human-herpesvirus PPI dataset without cytomegalovirus data, can reliably predict human-cytomegalovirus PPIs, indicating that our method can comprehensively capture multi-modal fusion features of protein interactions across various herpesvirus subtypes. The implementation of our method is available at https://github.com/XiaodiYangpku/MultimodalPPI/.


Subjects
Benchmarking , Cytomegalovirus , Humans , Machine Learning , Natural Language Processing
16.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38324623

ABSTRACT

Recent advances in spatially resolved transcriptomics (SRT) have brought ever-increasing opportunities to characterize the expression landscape in the context of tissue spatiality. Nevertheless, multiple challenges remain in accurately detecting spatial functional regions in tissue. Here, we present a novel contrastive learning framework, SPAtially Contrastive variational AutoEncoder (SpaCAE), which contrasts the transcriptomic signals of each spot and its spatial neighbors to achieve fine-grained detection of tissue structures. By employing a graph embedding variational autoencoder and incorporating a deep contrastive strategy, SpaCAE achieves a balance between spatial local information and global expression information, enabling effective learning of representations with spatial constraints. In particular, SpaCAE provides a graph deconvolutional decoder to address the smoothing effect of local spatial structure on the self-supervised learning of expression, an aspect often overlooked by current graph neural networks. We demonstrated that SpaCAE achieves effective performance on SRT data generated from multiple technologies for spatial domain identification and data denoising, making it a remarkable tool for obtaining novel insights from SRT studies.


Subjects
Gene Expression Profiling , Transcriptome , Neural Networks, Computer
17.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38754407

ABSTRACT

Predicting cancer drug response using both genomics and drug features has shown some success compared to using genomics features alone. However, there has been limited research on how best to combine or fuse the two types of features. Using a visible neural network with two deep learning branches for gene and drug features as the base architecture, we experimented with different fusion functions and fusion points. Our experiments show that injecting multiplicative relationships between gene and drug latent features into the original concatenation-based architecture DrugCell significantly improved the overall predictive performance and outperformed other baseline models. We also show that different fusion methods respond differently to different fusion points, indicating that the relationship between drug features and the different hierarchical biological levels of gene features is optimally captured by different methods. Considering both predictive performance and runtime speed, tensor product partial is the best-performing fusion function for combining late-stage representations of drug and gene features to predict cancer drug response.


Subjects
Antineoplastic Agents , Genotype , Neoplasms , Neural Networks, Computer , Humans , Neoplasms/genetics , Neoplasms/drug therapy , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/pharmacology , Deep Learning , Genomics/methods , Computational Biology/methods
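The contrast drawn above between DrugCell's concatenation and multiplicative fusion can be made concrete with a few small functions. This is a sketch assuming plain Python lists as latent vectors. The tensor-product variant shown is the generic flattened outer product (optionally with a constant 1 appended to each vector so the unimodal features survive), not necessarily the "tensor product partial" function evaluated in the study, whose exact definition is not given in the abstract.

```python
def concat_fusion(gene, drug):
    """Concatenation, the fusion used by the original DrugCell architecture."""
    return list(gene) + list(drug)


def tensor_product_fusion(gene, drug):
    """Flattened outer product: one entry per (gene feature, drug feature) pair.

    A linear layer over this vector can weight every product g_i * d_j,
    which a linear layer over a plain concatenation cannot express.
    """
    return [g * d for g in gene for d in drug]


def tensor_product_with_bias(gene, drug):
    """Outer product after appending a constant 1 to each vector, so the
    fused vector keeps the original gene and drug features as well as
    their pairwise products."""
    return tensor_product_fusion(list(gene) + [1.0], list(drug) + [1.0])


# Hypothetical latent vectors from the gene and drug branches.
gene_latent = [0.5, 2.0]
drug_latent = [3.0, -1.0]
print(concat_fusion(gene_latent, drug_latent))
print(tensor_product_fusion(gene_latent, drug_latent))
```

The trade-off is size: concatenation yields len(gene) + len(drug) features, while the outer product yields len(gene) × len(drug), which grows quickly for wide latent layers and is one reason a reduced or partial variant can be attractive when runtime matters.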
18.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426324

ABSTRACT

Emerging clinical evidence suggests that intricate associations between circular ribonucleic acids (circRNAs) and microRNAs (miRNAs) are critical regulatory factors in various pathological processes and play a critical role in many complex human diseases. Nonetheless, identifying these associations through wet-lab experiments is error-prone and labor-intensive, and the numerous existing computational methods for inferring novel circRNA-miRNA associations (CMAs) rely on only a single type of correlation data. Considering the inadequacy of existing machine learning models, we propose a new model named BGF-CMAP, which combines the gradient boosting decision tree with natural language processing and graph embedding methods to infer associations between circRNAs and miRNAs. Specifically, BGF-CMAP extracts sequence attribute features with Word2vec and interaction behavior features with two homogeneous graph embedding algorithms, large-scale information network embedding and graph factorization. Comprehensive experimental analyses revealed that BGF-CMAP successfully predicted the complex relationship between circRNAs and miRNAs with an accuracy of 82.90% and an area under the receiver operating characteristic curve of 0.9075. Furthermore, 23 of the top 30 predicted miRNA-associated circRNAs were confirmed in relevant experimental studies, showing that the BGF-CMAP model is superior to others. BGF-CMAP can serve as a helpful model to provide a scientific theoretical basis for the study of CMA prediction.


Subjects
MicroRNAs , Humans , MicroRNAs/genetics , RNA, Circular/genetics , ROC Curve , Machine Learning , Algorithms , Computational Biology/methods
19.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38975896

ABSTRACT

Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. This approach integrates Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), each configured independently to mine global and local node embedding representations in depth. An advanced gated multi-head attention mechanism is then employed to capture the attention weights of the dual embedding representations, thereby facilitating the integration of node features. In addition, extra node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to the task of protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.


Subjects
Computational Biology , DNA-Binding Proteins , DNA , Neural Networks, Computer , Binding Sites , DNA/metabolism , DNA/chemistry , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , Computational Biology/methods , Algorithms , Protein Binding
20.
Proc Natl Acad Sci U S A ; 120(31): e2305001120, 2023 08.
Article in English | MEDLINE | ID: mdl-37490534

ABSTRACT

Real-world networks are neither regular nor random, a fact elegantly explained by mechanisms such as the Watts-Strogatz or the Barabási-Albert models, among others. Both mechanisms naturally create shortcuts and hubs, which, while enhancing the network's connectivity, also might yield several undesired navigational effects: they tend to be overused during geodesic navigational processes (making the networks fragile) and provide suboptimal routes for diffusive-like navigation. Why, then, are networks with complex topologies ubiquitous? Here, we unveil that these models also entropically generate network bypasses: alternative routes to shortest paths which are topologically longer but easier to navigate. We develop a mathematical theory that elucidates the emergence and consolidation of network bypasses and measure their navigability gain. We apply our theory to a wide range of real-world networks and find that they sustain complexity by different amounts of network bypasses. At the top of this complexity ranking is the human brain, which underscores the importance of these results for understanding the plasticity of complex systems.


Subjects
Brain , Humans , Diffusion