1.
Nucleic Acids Res ; 52(3): e16, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38088228

ABSTRACT

Functional molecular module analysis (i.e., of gene-miRNA co-modules and gene-miRNA-lncRNA triple-layer modules) can dissect the complex regulation underlying etiology or phenotypes. However, current module detection methods neither make appropriate use of multi-omics data nor effectively model the cross-layer regulation of heterogeneous molecules, which loses critical genetic information and degrades detection performance. In this study, we propose a heterogeneous network co-clustering framework (HetFCM) to detect functional co-modules. HetFCM introduces an attributed heterogeneous network to jointly model the interplay and multi-type attributes of different molecules, and applies multiple variational graph autoencoders to the network to generate cross-layer association matrices. It then performs adaptive weighted co-clustering on the association matrices and attribute data to identify co-modules of heterogeneous molecules. An empirical study on human and maize datasets shows that HetFCM finds co-modules with denser topology and more significant functions, which are associated with human breast cancer (subtypes) and maize phenotypes (i.e., lipid storage, drought tolerance and oil content). HetFCM is a useful tool for detecting co-modules and can be applied to multi-layer functional modules, yielding novel insights into molecular mechanisms. We also developed a user-friendly module detection and analysis tool and shared it at http://www.sdu-idea.cn/FMDTool.


Subject(s)
Breast Neoplasms , Cluster Analysis , Gene Regulatory Networks , Zea mays , Female , Humans , Breast Neoplasms/genetics , Gene Expression Profiling/methods , MicroRNAs/genetics , Phenotype , Zea mays/genetics
2.
Nucleic Acids Res ; 52(D1): D183-D193, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37956336

ABSTRACT

Transcription factors (TFs), transcription co-factors (TcoFs) and their target genes perform essential functions in diseases and biological processes. KnockTF 2.0 (http://www.licpathway.net/KnockTF/index.html) aims to provide comprehensive gene expression profile datasets before/after T(co)F knockdown/knockout across multiple tissue/cell types of different species. Compared with KnockTF 1.0, KnockTF 2.0 has the following improvements: (i) Newly added T(co)F knockdown/knockout datasets in mice, Arabidopsis thaliana and Zea mays and also an expanded scale of datasets in humans. Currently, KnockTF 2.0 stores 1468 manually curated RNA-seq and microarray datasets associated with 612 TFs and 172 TcoFs disrupted by different knockdown/knockout techniques, which are 2.5 times larger than those of KnockTF 1.0. (ii) Newly added (epi)genetic annotations for T(co)F target genes in humans and mice, such as super-enhancers, common SNPs, methylation sites and chromatin interactions. (iii) Newly embedded and updated search and analysis tools, including T(co)F Enrichment (GSEA), Pathway Downstream Analysis and Search by Target Gene (BLAST). KnockTF 2.0 is a comprehensive update of KnockTF 1.0, which provides more T(co)F knockdown/knockout datasets and (epi)genetic annotations across multiple species than KnockTF 1.0. KnockTF 2.0 facilitates not only the identification of functional T(co)Fs and target genes but also the investigation of their roles in the physiological and pathological processes.


Subject(s)
Databases, Genetic , Transcription Factors , Transcriptome , Animals , Humans , Mice , Transcription Factors/genetics , Transcription Factors/metabolism , Internet , Gene Targeting , Arabidopsis , Zea mays
3.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37000166

ABSTRACT

Discovering cooperative driver pathways helps researchers study the pathogenesis of cancer. However, most discovery methods focus mainly on genomics data and neglect known pathway information and other related multi-omics data; thus they cannot faithfully decipher the carcinogenic process. We propose CDPMiner (Cooperative Driver Pathways Miner) to discover cooperative driver pathways by multiplex network embedding, which can jointly model relational and attribute information of multi-type molecules. CDPMiner first uses the pathway topology to quantify the weight of genes in different pathways, and optimizes the relations between genes and pathways. It then constructs an attributed multiplex network consisting of microRNAs, long noncoding RNAs, genes and pathways, embeds the network through deep joint matrix factorization to mine more essential information for pathway-level analysis, and reconstructs the pathway interaction network. Finally, CDPMiner leverages the reconstructed network and mutation data to define the driver weight between pathways and so discover cooperative driver pathways. Experimental results on breast invasive carcinoma and stomach adenocarcinoma datasets show that CDPMiner can effectively fuse multi-omics data to discover more driver pathways, which indeed cooperatively trigger cancers and are valuable for carcinogenesis analysis. An ablation study supports CDPMiner's fusion of multi-omics data for a more comprehensive analysis of cancer.


Subject(s)
Algorithms , Breast Neoplasms , Humans , Female , Genomics/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Mutation , Carcinogenesis/genetics
4.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36715986

ABSTRACT

MOTIVATION: Predicting the associations between human microbes and drugs (MDAs) is a critical step in drug development and precision medicine. Since discovering these associations through wet experiments is time-consuming and labor-intensive, computational methods have become an effective way to tackle this problem. Recently, graph contrastive learning (GCL) approaches have shown great advantages in learning the embeddings of nodes from heterogeneous biological graphs (HBGs). However, most GCL-based approaches do not fully capture the rich structure information in HBGs. In addition, few MDA prediction methods can screen out the most informative negative samples for effectively training the classifier. Therefore, the accuracy of MDA predictions still needs to be improved. RESULTS: In this study, we propose a novel approach that employs Structure-enhanced Contrastive learning and a Self-paced negative sampling strategy for Microbe-Drug Association predictions (SCSMDA). First, SCSMDA constructs the similarity networks of microbes and drugs, as well as their different meta-path-induced networks. SCSMDA then uses the representations of microbes and drugs learned from the meta-path-induced networks to enhance, through a contrastive learning strategy, their embeddings learned from the similarity networks, which yields discriminative representations. After that, we adopt the self-paced negative sampling strategy to select the most informative negative samples to train the MLP classifier. Lastly, SCSMDA predicts potential microbe-drug associations with the trained MLP classifier. Extensive results on three public datasets indicate that SCSMDA significantly outperforms other baseline methods on the MDA prediction task. Case studies for two common drugs further demonstrate the effectiveness of SCSMDA in finding novel MDAs. AVAILABILITY: The source code is publicly available on GitHub https://github.com/Yue-Yuu/SCSMDA-master.


Subject(s)
Drug Development , Precision Medicine , Humans , Software
5.
BMC Bioinformatics ; 25(1): 203, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816718

ABSTRACT

BACKGROUND: Molecular biology is crucial for drug discovery, protein design, and human health. Due to the vastness of the drug-like chemical space, depending on biomedical experts to manually design molecules is exceedingly expensive. Utilizing generative methods with deep learning technology offers an effective approach to streamline the search space for molecular design and save costs. This paper introduces a novel E(3)-equivariant score-based diffusion framework for 3D molecular generation via SDEs, aiming to address the constraints of unified Gaussian diffusion methods. Within the proposed framework EMDS, the complete diffusion is decomposed into separate diffusion processes for distinct components of the molecular feature space, while the modeling processes also capture the complex dependency among these components. Moreover, angle and torsion angle information is integrated into the networks to enhance the modeling of atom coordinates and utilize spatial information more effectively. RESULTS: Experiments on the widely utilized QM9 dataset demonstrate that our proposed framework significantly outperforms the state-of-the-art methods in all evaluation metrics for 3D molecular generation. Additionally, ablation experiments are conducted to highlight the contribution of key components in our framework, demonstrating the effectiveness of the proposed framework and the performance improvements of incorporating angle and torsion angle information for molecular generation. Finally, the comparative results of distribution show that our method is highly effective in generating molecules that closely resemble the actual scenario. CONCLUSION: Through the experiments and comparative results, our framework clearly outperforms previous 3D molecular generation methods, exhibiting significantly better capacity for modeling chemically realistic molecules. The excellent performance of EMDS in 3D molecular generation brings novel and encouraging opportunities for tackling challenging biomedical molecule and protein scenarios.


Subject(s)
Deep Learning , Models, Molecular , Computational Biology/methods , Algorithms , Drug Design , Drug Discovery/methods
6.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35380603

ABSTRACT

Predicting differentially expressed genes (DEGs) from epigenetics signal data is key to understanding how epigenetics controls cell functional heterogeneity through gene regulation. This knowledge can help develop 'epigenetics drugs' for complex diseases such as cancers. Most existing machine learning-based methods fall short in prediction accuracy, interpretability or training speed. To address these problems, we propose a Multiple Self-Attention model for predicting DEGs on Epigenetic data (Epi-MSA). Epi-MSA first uses convolutional neural networks to embed information from neighboring bins, and then employs multiple self-attention encoders on the data of different input epigenetics factors to learn which gene locations are important for predicting DEGs. Next, it trains a soft attention module to pick out which epigenetics factors are significant. The attention mechanism makes the model interpretable, and because self-attention consists purely of matrix operations, the model can be computed in parallel, which speeds up training. Experiments on datasets from the Roadmap Epigenome Project and BluePrint Data Analysis Portal (BDAP) show that Epi-MSA performs better than existing competitive methods and has a smaller standard deviation, which indicates that Epi-MSA is effective and stable. In addition, Epi-MSA has good interpretability, which is confirmed by comparing its attention weight matrix with existing biological knowledge.


Subject(s)
Epigenomics , Neoplasms , Epigenesis, Genetic , Epigenomics/methods , Humans , Machine Learning , Neural Networks, Computer
7.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35696639

ABSTRACT

With the development of high-throughput genotyping technology, detecting single nucleotide polymorphism (SNP)-SNP interactions (SSIs) has become an essential way of understanding disease susceptibility. Various methods have been proposed to detect SSIs. However, given the complexity of diseases and the bias of individual SSI detectors, these single-detector-based methods generally do not scale to real genome-wide data and give unfavorable results. We propose a novel ensemble learning-based approach (ELSSI) that can significantly reduce the bias of individual detectors and their computational load. ELSSI randomly divides SNPs into different subsets and evaluates them with multi-type detectors in parallel. In particular, ELSSI introduces a four-stage pipeline (generate, score, switch and filter) that iteratively generates new SNP combination subsets from SNP subsets, scores each combination subset with individual detectors, switches high-scoring combinations to other detectors for re-scoring, and then filters out combinations with low scores. This pipeline enables ELSSI to detect high-order SSIs in large genome-wide datasets. Experimental results on various simulated and real genome-wide datasets show that ELSSI is superior to state-of-the-art methods in detecting SSIs, especially high-order ones. ELSSI can be run on moderate PCs on the Internet and flexibly accommodates new detectors. The code of ELSSI is available at https://www.sdu-idea.cn/codes.php?name=ELSSI.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome , Genome-Wide Association Study/methods
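
The four-stage pipeline named in the ELSSI abstract (generate, score, switch and filter) can be pictured with a minimal Python sketch of a single round. This is not the authors' implementation: the function names, the two toy detectors, and the `keep_top` and `threshold` parameters are all assumptions made for illustration, and real detectors would score combinations against genotype and phenotype data.

```python
from itertools import combinations


def generate(snp_subset, order=2):
    """Stage 1: enumerate candidate SNP combinations of a given order."""
    return list(combinations(sorted(snp_subset), order))


def score(combos, detector):
    """Stage 2: score each combination with one detector."""
    return {combo: detector(combo) for combo in combos}


def switch(scored, other_detector, keep_top):
    """Stage 3: pass the highest-scoring combinations to a different detector."""
    top = sorted(scored, key=scored.get, reverse=True)[:keep_top]
    return {combo: other_detector(combo) for combo in top}


def filter_low(scored, threshold):
    """Stage 4: discard combinations whose score falls below the threshold."""
    return {combo: s for combo, s in scored.items() if s >= threshold}


# Hypothetical stand-ins for real interaction detectors (illustration only).
def detector_a(combo):
    return sum(combo)


def detector_b(combo):
    return combo[0] * combo[1]


candidates = generate([4, 1, 3, 2])
rescored = switch(score(candidates, detector_a), detector_b, keep_top=2)
survivors = filter_low(rescored, threshold=10)
```

In this toy run the SNPs are plain integers; an iterative version would feed the survivors back into `generate` with a higher `order`, which is one plausible reading of how high-order combinations are built up.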
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34929742

ABSTRACT

MOTIVATION: Accumulating evidence indicates that microRNA (miRNA) plays a crucial role in the pathogenesis and progression of various complex diseases. Inferring disease-associated miRNAs is important for exploring the etiology, diagnosis and treatment of human diseases. As biological experiments are time-consuming and labor-intensive, developing effective computational methods has become indispensable for identifying associations between miRNAs and diseases. RESULTS: We present an Ensemble learning framework with Resampling method for MiRNA-Disease Association (ERMDA) prediction to discover potential disease-related miRNAs. First, a resampling strategy is proposed to build multiple different balanced training subsets, addressing the challenge of sample imbalance within the database. Then, ERMDA extracts miRNA and disease feature representations by integrating miRNA-miRNA similarities, disease-disease similarities and experimentally verified miRNA-disease association information. Next, a feature selection approach is applied to reduce redundant information and increase the diversity among these subsets. Lastly, ERMDA constructs an individual learner on each subset to yield primitive outcomes, and a soft voting method makes the final decision based on the prediction results of the individual learners. A series of experimental results demonstrates that ERMDA outperforms other state-of-the-art methods on both balanced and unbalanced testing sets. In addition, case studies conducted on three human diseases further confirm ERMDA's capability for identifying potential disease-related miRNAs. In conclusion, these experimental results demonstrate that our method can serve as an effective and reliable tool for researchers to explore the regulatory role of miRNAs in complex diseases.


Subject(s)
Disease/genetics , Genetic Association Studies , Machine Learning , MicroRNAs/genetics , Algorithms , Computational Biology , Genetic Predisposition to Disease/genetics , Humans
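
Two ideas in the ERMDA abstract, building balanced training subsets by resampling and combining individual learners by soft voting, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, undersampling the negatives is an assumed resampling scheme (the abstract does not specify one), and the learners themselves are left out.

```python
import random
from statistics import mean


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Pair all positives with an equal-sized random draw of negatives.

    Assumes there are at least as many negatives as positives.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled_negatives = rng.sample(negatives, len(positives))
        subsets.append((list(positives), sampled_negatives))
    return subsets


def soft_vote(probabilities):
    """Soft voting: average the class probabilities of the individual learners."""
    return mean(probabilities)


# Toy data: 5 known associations versus 50 unlabeled pairs treated as negatives.
known = [("miRNA%d" % i, "diseaseA") for i in range(5)]
unknown = [("miRNA%d" % i, "diseaseB") for i in range(50)]
subsets = balanced_subsets(known, unknown, n_subsets=3)

# Three hypothetical learners each report P(association) for one candidate pair.
final_probability = soft_vote([0.9, 0.6, 0.3])
```

Each subset would train one learner; averaging probabilities instead of counting hard votes lets a confident learner outweigh two uncertain ones.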
9.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36070619

ABSTRACT

MOTIVATION: Circular RNA (circRNA) is a class of noncoding RNA with high conservation and stability that is considered an important disease biomarker and drug target. Accumulating evidence indicates that circRNA plays a crucial role in the pathogenesis and progression of many complex diseases. As biological experiments are time-consuming and labor-intensive, developing an accurate computational prediction method has become indispensable for identifying disease-related circRNAs. RESULTS: We presented a hybrid graph representation learning framework, named GraphCDA, for predicting potential circRNA-disease associations. Firstly, the circRNA-circRNA similarity network and disease-disease similarity network were constructed to characterize the relationships of circRNAs and diseases, respectively. Secondly, a hybrid graph embedding model combining Graph Convolutional Networks and Graph Attention Networks was introduced to learn the feature representations of circRNAs and diseases simultaneously. Finally, the learned representations were concatenated and used to build the prediction model for identifying circRNA-disease associations. A series of experimental results demonstrated that GraphCDA outperformed other state-of-the-art methods on several public databases. Moreover, GraphCDA achieved good performance when only a small number of known circRNA-disease associations were used as the training set. In addition, case studies conducted on several human diseases further confirmed the capability of GraphCDA for predicting potential disease-related circRNAs. In conclusion, extensive experimental results indicated that GraphCDA could serve as a reliable tool for exploring the regulatory role of circRNAs in complex diseases.


Subject(s)
Computational Biology , RNA, Circular , Biomarkers , Computational Biology/methods , Humans , Polymers
10.
Sensors (Basel) ; 24(11)2024 May 21.
Article in English | MEDLINE | ID: mdl-38894061

ABSTRACT

Structural health monitoring for roads is an important task that supports inspection of transportation infrastructure. This paper explores deep learning techniques for crack detection in road images and proposes an automatic pixel-level semantic road crack image segmentation method based on a Swin transformer. This method employs Swin-T as the backbone network to extract feature information from crack images at various levels and utilizes the texture unit to extract the texture and edge characteristic information of cracks. The refinement attention module (RAM) and panoramic feature module (PFM) then merge these diverse features, ultimately refining the segmentation results. This method is called FetNet. We collect four public real-world datasets and conduct extensive experiments, comparing FetNet with various deep-learning methods. FetNet achieves the highest precision of 90.4%, a recall of 85.3%, an F1 score of 87.9%, and a mean intersection over union of 78.6% on the Crack500 dataset. The experimental results show that the FetNet approach surpasses other advanced models in terms of crack segmentation accuracy and exhibits excellent generalizability for use in complex scenes.
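
The four figures reported for FetNet (precision, recall, F1 score and mean intersection over union) are standard pixel-level segmentation metrics. The Python sketch below shows how they are commonly computed for a binary crack/background mask. It uses the textbook definitions, not code from the paper; the flattened toy masks are invented for illustration, and averaging the IoU over the two classes is an assumption about how the mean is taken.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over flattened binary masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Precision, recall, F1 and two-class mean IoU.

    Assumes both classes occur, so no denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_crack = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miou": (iou_crack + iou_background) / 2,
    }


# Toy flattened masks: 1 marks a crack pixel, 0 marks background.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
metrics = segmentation_metrics(predicted, reference)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a quick sanity check on any reported triple.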

11.
BMC Bioinformatics ; 24(1): 163, 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37085776

ABSTRACT

MOTIVATION: Gene regulatory networks (GRNs) arise from the intricate interactions between transcription factors (TFs) and their target genes during the growth and development of organisms. The inference of GRNs can unveil the underlying gene interactions in living systems and facilitate the investigation of the relationship between gene expression patterns and phenotypic traits. Although several machine-learning models have been proposed for inferring GRNs from single-cell RNA sequencing (scRNA-seq) data, some of these models, such as Boolean and tree-based networks, suffer from sensitivity to noise and may encounter difficulties in handling the high noise and dimensionality of actual scRNA-seq data, as well as the sparse nature of gene regulation relationships. Thus, inferring large-scale information from GRNs remains a formidable challenge. RESULTS: This study proposes a multilevel, multi-structure framework called a pseudo-Siamese GRN (PSGRN) for inferring large-scale GRNs from time-series expression datasets. Based on the pseudo-Siamese network, we applied a gated recurrent unit to capture the time features of each TF and target matrix and learn the spatial features of the matrices after merging by applying the DenseNet framework. Finally, we applied a sigmoid function to evaluate interactions. We constructed two maize sub-datasets, including gene expression levels and GRNs, using existing open-source maize multi-omics data and compared them to other GRN inference methods, including GENIE3, GRNBoost2, nonlinear ordinary differential equations, CNNC, and DGRNS. Our results show that PSGRN outperforms state-of-the-art methods. This study proposed a new framework: a PSGRN that allows GRNs to be inferred from scRNA-seq data, elucidating the temporal and spatial features of TFs and their target genes. The results show the model's robustness and generalization, laying a theoretical foundation for maize genotype-phenotype associations with implications for breeding work.


Subject(s)
Gene Regulatory Networks , Plant Breeding , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism , Algorithms
12.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33367541

ABSTRACT

In disease research, the study of gene-disease correlation has always been an important topic. With the emergence of large-scale connected data sets in biology, we use known correlations between entities, which may come from different sets, to build a biological heterogeneous network. We propose a new network embedding representation algorithm to calculate the correlation between diseases and genes, and use the correlation score to predict pathogenic genes. We then conduct several experiments to compare our method with other state-of-the-art methods. The results reveal that our method achieves better performance than the traditional methods.


Subject(s)
Algorithms , Computational Biology , Gene Regulatory Networks , Humans
13.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33839743

ABSTRACT

MOTIVATION: Identifying the proteins that interact with drugs can reduce the cost and time of drug development. Existing computational methods focus on integrating drug-related and protein-related data from multiple sources to predict candidate drug-target interactions (DTIs). However, multi-scale neighboring node sequences and various kinds of drug and protein similarities are neither fully explored nor considered in decision making. RESULTS: We propose a drug-target interaction prediction method, DTIP, to encode and integrate multi-scale neighboring topologies and multiple kinds of similarities, associations and interactions related to drugs and proteins. First, we construct a three-layer heterogeneous network to represent interactions and associations across drug, protein and disease nodes, and propose a learning framework based on a fully-connected autoencoder to learn the nodes' low-dimensional feature representations within the heterogeneous network. Second, multi-scale neighboring sequences of drug and protein nodes are formulated by random walks, and a module based on a bidirectional gated recurrent unit is designed to learn the neighboring sequential information and integrate the low-dimensional features of nodes. Finally, we propose attention mechanisms at the feature level, neighboring topological level and similarity level to learn more informative features, topologies and similarities. The prediction results are obtained by integrating neighboring topologies, similarities and feature attributes using a multi-layer CNN. Comprehensive experimental results on a public dataset demonstrated the effectiveness of our innovative features and modules. Comparison with other state-of-the-art methods and case studies of five drugs further validated DTIP's ability to discover potential candidate drug-related proteins.


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Models, Theoretical , Pharmaceutical Preparations/metabolism , Proteins/metabolism , Drug Development/methods , Humans , Pharmaceutical Preparations/chemistry , Protein Binding , Proteins/chemistry , Reproducibility of Results , Support Vector Machine
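
The DTIP abstract says that neighboring node sequences are formulated by random walks over a heterogeneous drug-protein-disease network. A minimal Python sketch of that step follows. It is not the authors' code: the tiny network, the function names, and the reading of "multi-scale" as walks of several lengths are assumptions made for illustration, and the downstream recurrent module is omitted.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Walk `length` nodes from `start`, moving to a uniformly chosen neighbor."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break  # dead end: return the shorter walk
        walk.append(rng.choice(neighbors))
    return walk


def multi_scale_sequences(adjacency, start, lengths, seed=0):
    """One walk per requested length (assumed meaning of 'multi-scale')."""
    rng = random.Random(seed)
    return {n: random_walk(adjacency, start, n, rng) for n in lengths}


# Hypothetical three-layer network with one node per layer.
network = {
    "drugA": ["protein1", "disease1"],
    "protein1": ["drugA", "disease1"],
    "disease1": ["drugA", "protein1"],
}
sequences = multi_scale_sequences(network, "drugA", lengths=(2, 4))
```

Each returned sequence is an ordered list of node names, the kind of input a sequence model such as a bidirectional GRU could consume after the names are mapped to feature vectors.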
14.
Bioinformatics ; 38(19): 4581-4588, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35997558

ABSTRACT

MOTIVATION: High-resolution annotation of gene functions is a central task in functional genomics. Multiple proteoforms translated from alternatively spliced isoforms of a single gene are the actual function performers and greatly increase functional diversity. The specific functions of different isoforms can decipher the molecular basis of various complex diseases at a finer granularity. Multi-instance learning (MIL)-based solutions have been developed to distribute gene(bag)-level Gene Ontology (GO) annotations to isoforms (instances), but they simply presume that a particular annotation of the gene is attributable to only one isoform, neglect the hierarchical structures and semantics of massive GO terms (labels), or can only handle dozens of terms. RESULTS: We propose an effective approach, IsofunGO, to differentiate massive functions of isoforms by GO embedding. In particular, IsofunGO first introduces an attributed hierarchical network to model massive GO terms, and a GO network embedding strategy to learn compact representations of GO terms and project the GO annotations of genes into compressed ones. This strategy not only explores and preserves the hierarchy between GO terms but also greatly reduces the prediction load. Next, it develops an attention-based MIL network to fuse genomics and transcriptomics data of isoforms and predict isoform functions by referring to the compressed annotations. Extensive experiments on benchmark datasets demonstrate the efficacy of IsofunGO. Both the GO embedding and the attention mechanism boost performance and interpretability. AVAILABILITY AND IMPLEMENTATION: The code of IsofunGO is available at http://www.sdu-idea.cn/codes.php?name=IsofunGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Semantics , Gene Ontology , Molecular Sequence Annotation , Protein Isoforms/genetics
15.
Methods ; 205: 18-28, 2022 09.
Article in English | MEDLINE | ID: mdl-35690250

ABSTRACT

Genome-phenome association (GPA) prediction can promote the understanding of the biological mechanisms behind the complex pathology of phenotypes (i.e., traits and diseases). Traditional heterogeneous network-based GPA approaches overwhelmingly need to project heterogeneous data onto a homogeneous network for data fusion and prediction, and such projections lose heterogeneous network structure information. Matrix factorization-based data fusion methods can avoid such projection by integrating multi-type data in a coherent way, but they typically perform linear factorization and cannot mine the nonlinear relationships between molecules, which compromises the accuracy of GPA analysis. Furthermore, most of them cannot selectively combine network topology and node attribute information in a principled way. In this paper, we propose a weighted deep matrix factorization-based solution (WDGPA) to predict GPAs by selectively and differentially fusing a heterogeneous molecular network and diverse attributes of nodes. WDGPA first assigns weights to inter/intra-relational data matrices and attribute data matrices, and performs deep matrix factorization on these matrices of the heterogeneous network in a cooperative manner to obtain nonlinear representations of the different nodes. In addition, it performs low-rank representation learning on the attribute data with the shared nonlinear representations. In this way, both the network topology and node attributes are jointly mined to explore the representations of molecules and the complex interplay between molecules and phenotypes. WDGPA then uses the representational vectors of gene and phenotype nodes to predict GPAs. Experimental results on maize and human datasets confirm that WDGPA outperforms competitive methods by a large margin under different evaluation protocols.


Subject(s)
Algorithms , Genome , Humans , Phenotype
16.
Methods ; 198: 65-75, 2022 02.
Article in English | MEDLINE | ID: mdl-34555529

ABSTRACT

Epistasis between single nucleotide polymorphisms (SNPs) plays an important role in elucidating the missing heritability of complex diseases. Diverse approaches have been devised for detecting SNP interactions, but they typically neglect the important and useful connections between SNPs and other bio-molecules (i.e., miRNAs and lncRNAs). To comprehensively model these disease-related molecules, a heterogeneous bio-molecular network-based solution, EpiHNet, is introduced for detecting high-order SNP interactions. EpiHNet first uses case/control data to construct an SNP statistical network, and uses meta-path-based similarity on a heterogeneous network composed of SNPs, genes, lncRNAs, miRNAs and diseases to define a second, SNP relational network. The SNP relational network can explore and exploit different associations between molecules and diseases to complement the SNP statistical network and search for significantly associated SNPs. Next, EpiHNet integrates these two networks into a composite network and applies modularity-based clustering with a fast search strategy to divide SNP nodes into different clusters. After that, it detects SNP interactions based on SNP combinations derived from each cluster. Synthetic experiments on diverse two-locus and three-locus disease models show that EpiHNet outperforms competitive baselines, even without the heterogeneous network. On real WTCCC breast cancer data, EpiHNet also yields strong results in detecting high-order SNP interactions.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Algorithms , Case-Control Studies , Cluster Analysis , Genome-Wide Association Study/methods , Humans , Polymorphism, Single Nucleotide
17.
Sensors (Basel) ; 23(13)2023 Jun 22.
Article in English | MEDLINE | ID: mdl-37447674

ABSTRACT

Accurate equipment operation trend prediction plays an important role in ensuring the safe operation of equipment and reducing maintenance costs. Monitoring equipment vibration and predicting the time series of the vibration trend is therefore an effective means of preventing equipment failures. To reduce the error of equipment operation trend prediction, this paper proposes a prediction method that combines signal decomposition with an Informer prediction model. Vibration signals are noisy, which makes it difficult to obtain intrinsic characteristics when raw data are used directly for prediction. To address this, the original signal is first decomposed using the variational mode decomposition (VMD) algorithm optimized by the improved sparrow search algorithm (ISSA) to obtain the intrinsic mode functions (IMFs) for different frequencies, and their fuzzy entropy is calculated. The improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) is then used to decompose the components with the largest fuzzy entropy into a series of intrinsic mode components. To exploit the strength of the Informer model in processing long time series, all subsequences are input into the Informer model, and the outputs are reconstructed to obtain the predicted equipment operation trend. The experimental results indicate that the proposed method can effectively improve the accuracy of equipment operation trend prediction compared to other models.


Subject(s)
Deep Learning , Vibration , Algorithms , Entropy , Equipment Failure
18.
Bioinformatics ; 37(24): 4818-4825, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34282449

ABSTRACT

MOTIVATION: Alternative splicing creates considerable proteomic diversity and complexity from a relatively limited genome. Proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions of this gene, and thus reflect the functional knowledge of genes at a finer granularity. Recently, some computational approaches have been proposed to differentiate isoform functions using sequence and expression data. However, their performance is far from satisfactory, mainly due to the imbalance and lack of annotations at the isoform level and the difficulty of modeling gene-isoform relations. RESULTS: We propose a deep multi-instance learning-based framework (DMIL-IsoFun) to differentiate the functions of isoforms. DMIL-IsoFun first introduces a multi-instance learning convolutional neural network, trained with isoform sequences and gene-level annotations, to extract feature vectors and initialize the annotations of isoforms. It then uses a class-imbalance graph convolutional network to refine the annotations of individual isoforms based on the isoform co-expression network and the extracted features. Extensive experimental results show that DMIL-IsoFun improves the Smin and Fmax of state-of-the-art solutions by at least 29.6% and 40.8%, respectively. The effectiveness of DMIL-IsoFun is further confirmed on a testbed of human multiple-isoform genes and on maize isoforms related to photosynthesis. AVAILABILITY AND IMPLEMENTATION: The code and data are available at http://www.sdu-idea.cn/codes.php?name=DMIL-Isofun. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
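
Illustrative sketch (not taken from the paper): in multi-instance learning a gene is treated as a bag and its isoforms as instances, with labels known only for the bag. A common way to connect the two levels is max pooling: a gene is credited with a function if at least one of its isoforms scores highly for it. The abstract does not say which pooling DMIL-IsoFun uses, so the max rule, the gene and isoform names, and the term labels below are all assumptions for illustration.

```python
def gene_level_scores(isoform_scores):
    """Max-pool per-isoform scores into one score per (gene, term)."""
    pooled = {}
    for gene, isoforms in isoform_scores.items():
        terms = {t for scores in isoforms.values() for t in scores}
        pooled[gene] = {
            t: max(scores.get(t, 0.0) for scores in isoforms.values())
            for t in terms
        }
    return pooled


def responsible_isoform(isoform_scores, gene, term):
    """Return the isoform that contributes the pooled score for a term."""
    isoforms = isoform_scores[gene]
    return max(isoforms, key=lambda iso: isoforms[iso].get(term, 0.0))


# Hypothetical per-isoform scores, e.g. outputs of a sequence model.
scores = {
    "geneA": {
        "geneA-iso1": {"term_x": 0.9, "term_y": 0.1},
        "geneA-iso2": {"term_x": 0.2, "term_y": 0.7},
    }
}

pooled = gene_level_scores(scores)
carrier_x = responsible_isoform(scores, "geneA", "term_x")
carrier_y = responsible_isoform(scores, "geneA", "term_y")
```

Training against gene-level labels through such a pooling step is what lets isoform-level scores be learned without isoform-level annotations; here the two terms are attributed to different isoforms of the same gene.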


Subject(s)
Alternative Splicing , Proteomics , Humans , Protein Isoforms/genetics , Neural Networks, Computer , Molecular Sequence Annotation
19.
Brief Bioinform ; 20(6): 1972-1980, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30020404

ABSTRACT

A rooted phylogenetic network is an important structure for describing evolutionary relationships. Computing the distance (topological dissimilarity) between two rooted phylogenetic networks is a fundamental task in phylogenetic analysis. During the past few decades, several polynomial-time computable metrics have been described. Here, we give a comprehensive review and analysis of those metrics, including the correlation among metrics and the distribution of distance values computed by each metric. Moreover, we describe the software and website CDRPN (Computing Distance for Rooted Phylogenetic Networks) for measuring the topological dissimilarity between rooted phylogenetic networks. AVAILABILITY: http://bioinformatics.imu.edu.cn/distance/. CONTACT: guomaozu@bucea.edu.cn.
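
Illustrative sketch (not taken from the paper): one simple polynomial-time dissimilarity between rooted networks compares their hardwired clusters, that is, the set of leaves reachable from each node. The abstract does not name the metrics it reviews, so this is offered only as an example of the kind of computation involved, not as one of the reviewed metrics or as what CDRPN implements. The adjacency-map representation and node names are assumptions.

```python
def hardwired_clusters(children, root):
    """Collect the leaf set below every node reachable from the root."""
    memo = {}

    def leaves_below(node):
        if node not in memo:
            kids = children.get(node, [])
            if not kids:
                memo[node] = frozenset([node])  # a leaf is its own cluster
            else:
                memo[node] = frozenset().union(
                    *(leaves_below(k) for k in kids))
        return memo[node]

    leaves_below(root)
    return set(memo.values())


def cluster_distance(net1, root1, net2, root2):
    """Number of clusters present in exactly one of the two networks."""
    c1 = hardwired_clusters(net1, root1)
    c2 = hardwired_clusters(net2, root2)
    return len(c1 ^ c2)


# A tree ((a,b),c) and a network in which leaf b sits below a
# reticulation node h that has two parents, x and y.
tree = {"root": ["x", "c"], "x": ["a", "b"]}
network = {"root": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"]}

d_same = cluster_distance(tree, "root", tree, "root")
d_diff = cluster_distance(tree, "root", network, "root")
```

The two structures differ only in the cluster {b, c}, which the reticulation introduces, so the raw (unnormalized) count is 1.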


Subject(s)
Computational Biology , Phylogeny , Algorithms
20.
Bioinformatics ; 36(1): 303-310, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31250882

ABSTRACT

MOTIVATION: Alternative splicing contributes to the functional diversity of protein species, and the proteoforms translated from alternatively spliced isoforms of a gene actually execute its biological functions. Computational prediction of gene functions has been studied for decades. However, how to distinguish the functional annotations of isoforms, which are essential for understanding developmental abnormalities and cancers, is rarely explored. The main bottleneck is that functional annotations of isoforms are generally unavailable, and functional genomic databases universally store annotations at the gene level. RESULTS: We propose IsoFun to accomplish Isoform Function prediction based on bi-random walks on a heterogeneous network. IsoFun first constructs an isoform functional association network based on the expression profiles of isoforms derived from multiple RNA-seq datasets. Next, IsoFun uses the available Gene Ontology (GO) annotations of genes, gene-gene interactions and the relations between genes and isoforms to construct a heterogeneous network. IsoFun then performs a tailored bi-random walk on the heterogeneous network to predict the associations between GO terms and isoforms, thus predicting the GO annotations of isoforms. Experimental results show that IsoFun significantly outperforms the state-of-the-art algorithms and improves the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) by 17% and 44% at the gene-level, respectively. We further validated the performance of IsoFun on the genes ADAM15 and BCL2L1. IsoFun accurately differentiates the functions of the respective isoforms of these two genes. AVAILABILITY AND IMPLEMENTATION: The code of IsoFun is available at http://mlda.swu.edu.cn/codes.php?name=IsoFun. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
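
Illustrative sketch (not taken from the paper): a generic bi-random walk propagates an initial association matrix over two networks at once, walking on the row side and the column side in each step and restarting from the initial matrix with probability 1 - alpha. The abstract calls IsoFun's walk "tailored" without giving details, so the update rule below is the textbook form, and the matrices, `alpha` and step count are assumptions chosen for illustration.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (rows of zeros stay zero)."""
    out = []
    for row in matrix:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def bi_random_walk(p_left, r0, p_right, alpha=0.5, steps=50):
    """Iterate R <- alpha * P_left @ R @ P_right + (1 - alpha) * R0."""
    r = [row[:] for row in r0]
    for _ in range(steps):
        walked = matmul(matmul(p_left, r), p_right)
        r = [[alpha * walked[i][j] + (1 - alpha) * r0[i][j]
              for j in range(len(r0[0]))]
             for i in range(len(r0))]
    return r


# Hypothetical toy setting: three isoforms in a chain, two related terms.
p_iso = row_normalize([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]])
p_term = row_normalize([[9, 1],
                        [1, 9]])
# Isoform 0 starts with term 0, isoform 2 with term 1, isoform 1 with none.
r0 = [[1.0, 0.0],
      [0.0, 0.0],
      [0.0, 1.0]]

scores = bi_random_walk(p_iso, r0, p_term, alpha=0.5, steps=50)
```

After propagation the unannotated middle isoform receives equal, non-zero scores for both terms from its two neighbours, while the annotated isoforms keep their original term as the top-scoring one.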


Subject(s)
Algorithms , Computational Biology , Protein Isoforms , Computational Biology/methods , Gene Ontology , Molecular Sequence Annotation , Protein Isoforms/genetics , Protein Isoforms/metabolism