Results 1 - 20 of 157
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38271483

ABSTRACT

The advent of single-cell sequencing technologies has revolutionized cell biology studies. However, integrative analyses of diverse single-cell data face serious challenges, including technological noise, sample heterogeneity, and different modalities and species. To address these problems, we propose scCorrector, a variational autoencoder-based model that can integrate single-cell data from different studies and map them into a common space. Specifically, we designed a Study Specific Adaptive Normalization for each study in the decoder to implement these features. scCorrector achieves competitive and robust performance compared with state-of-the-art methods and brings novel insights under various circumstances (e.g. various batches, multi-omics, cross-species, and development stages). In addition, the integration of single-cell data and spatial data makes it possible to transfer information between different studies, which greatly expands the narrow range of genes covered by MERFISH technology. In summary, scCorrector can efficiently integrate multi-study single-cell datasets, thereby providing broad opportunities to tackle challenges emerging from noisy resources.

2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34471921

ABSTRACT

A graph, which contains a set of objects and relationships, is a natural data structure for describing complex systems. Ubiquitous real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, succeeds in vast bioinformatics scenarios with data represented in the Euclidean domain. However, rich relational information between biological elements is retained in non-Euclidean biomedical graphs, which are not learning-friendly for classic machine learning methods. Graph representation learning aims to embed a graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently raised widespread interest in both the machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from the molecular level to the genomics, pharmaceutical and healthcare systems levels. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.


Subjects
Computational Biology , Neural Networks, Computer , Algorithms , Computational Biology/methods , Knowledge , Machine Learning
3.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36484687

ABSTRACT

MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent research works have provided evidence that such cell-type-specific binding is determined by TF's intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. However, computational prediction and characterization of cell-type-specific and shared binding sites is rarely studied. RESULTS: In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features, in which one is based on XGBoost and another is based on convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, each of which was further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform other competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites by excluding or including DNase signals. Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment. AVAILABILITY AND IMPLEMENTATION: The source code is available at: https://github.com/turningpoint1988/CSSBS. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin , Transcription Factors , Humans , Protein Binding/genetics , Binding Sites/genetics , Transcription Factors/metabolism , Chromatin Immunoprecipitation Sequencing , Computational Biology/methods
4.
PLoS Comput Biol ; 19(8): e1011344, 2023 08.
Article in English | MEDLINE | ID: mdl-37651321

ABSTRACT

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize biometric information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE outperforms other competing methods significantly. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.


Subjects
Algorithms , RNA, Circular , Humans , RNA, Circular/genetics , Semantics
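
Of the similarity measures named in the abstract above, Jaccard similarity is the simplest to state: the size of the intersection of two sets divided by the size of their union. The following is a minimal Python sketch on made-up data. The circRNA names, the disease sets, and the choice to compare circRNAs by their associated diseases are illustrative assumptions, not details taken from iCircDA-NEAE.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity |A & B| / |A | B| between two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical toy data: diseases associated with each circRNA.
associations = {
    "circ_A": {"glioma", "gastric cancer", "breast cancer"},
    "circ_B": {"glioma", "breast cancer"},
    "circ_C": {"asthma"},
}

sim_ab = jaccard_similarity(associations["circ_A"], associations["circ_B"])
sim_ac = jaccard_similarity(associations["circ_A"], associations["circ_C"])
```

Here `circ_A` and `circ_B` share two of three distinct diseases, while `circ_A` and `circ_C` share none.
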
5.
World J Surg Oncol ; 22(1): 49, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38331878

ABSTRACT

BACKGROUND: TMPRSS2-ERG (T2E) fusion is highly related to aggressive clinical features in prostate cancer (PC), which guides individual therapy. However, current fusion prediction tools lack sufficient accuracy, and biomarkers cannot be applied to individuals across different platforms due to their quantitative nature. This study aims to identify a transcriptome signature to detect the T2E fusion status of PC at the individual level. METHODS: Based on 272 high-throughput mRNA expression profiles from the Sboner dataset, we developed a rank-based algorithm to identify a qualitative signature to detect T2E fusion in PC. The signature was validated in 1223 samples from three external datasets (Setlur, Clarissa, and TCGA). RESULTS: A signature, composed of five mRNAs coupled to ERG (five ERG-mRNA pairs, 5-ERG-mRPs), was developed to distinguish T2E fusion status in PC. 5-ERG-mRPs reached 84.56% accuracy in the Sboner dataset, which was verified in the Setlur dataset (n = 455, accuracy = 82.20%) and the Clarissa dataset (n = 118, accuracy = 81.36%). Besides, for 495 samples from TCGA, two subtypes classified by 5-ERG-mRPs showed a higher level of significance in various T2E fusion features than subtypes obtained through current fusion prediction tools, such as STAR-Fusion. CONCLUSIONS: Overall, 5-ERG-mRPs can robustly detect T2E fusion in PC at the individual level and can be used on any gene measurement platform without specific normalization procedures. Hence, 5-ERG-mRPs may serve as an auxiliary tool for PC patient management.


Subjects
Prostatic Neoplasms , Transcriptome , Male , Humans , Oncogene Proteins, Fusion/genetics , Oncogene Proteins, Fusion/metabolism , Oncogene Proteins, Fusion/therapeutic use , Prostatic Neoplasms/drug therapy , RNA, Messenger/genetics , Transcriptional Regulator ERG/genetics , Transcriptional Regulator ERG/metabolism , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism , Serine Endopeptidases/therapeutic use
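
The abstract above attributes the platform independence of the signature to its qualitative, rank-based nature. A rough Python sketch of that idea follows: each gene pair is reduced to a within-sample comparison, so any rescaling that preserves ordering leaves the call unchanged. The partner gene names, the direction of the comparison (ERG above its partner), and the majority-vote rule are all assumptions made for illustration; the abstract does not specify them.

```python
def pair_votes(expression, partner_genes, anchor="ERG"):
    """Count pairs in which the anchor gene is expressed above its partner."""
    return sum(1 for gene in partner_genes if expression[anchor] > expression[gene])


def classify_fusion(expression, partner_genes, anchor="ERG"):
    """Majority vote over within-sample pairwise comparisons (assumed rule)."""
    votes = pair_votes(expression, partner_genes, anchor)
    if votes > len(partner_genes) / 2:
        return "fusion-positive"
    return "fusion-negative"


# Hypothetical partner genes and expression values (arbitrary units).
partners = ["G1", "G2", "G3", "G4", "G5"]
sample = {"ERG": 9.1, "G1": 4.0, "G2": 7.5, "G3": 10.2, "G4": 3.3, "G5": 8.0}

# An order-preserving rescaling, standing in for a different platform's scale.
rescaled = {gene: 2.0 * value + 100.0 for gene, value in sample.items()}

call_original = classify_fusion(sample, partners)
call_rescaled = classify_fusion(rescaled, partners)
```

Because only the orderings inside one sample matter, `call_original` and `call_rescaled` agree without any cross-sample normalization.
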
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33498086

ABSTRACT

Transcription factors (TFs) play an important role in regulating gene expression; thus, identification of the regions bound by them has become a fundamental step for molecular and cellular biology. In recent years, an increasing number of deep learning (DL) based methods have been proposed for predicting TF binding sites (TFBSs) and have achieved impressive prediction performance. However, these methods mainly focus on predicting the sequence specificity of TF-DNA binding, which is equivalent to a sequence-level binary classification task, and fail to identify motifs and TFBSs accurately. In this paper, we developed a fully convolutional network coupled with global average pooling (FCNA), which by contrast is equivalent to a nucleotide-level binary classification task, to roughly locate TFBSs and accurately identify motifs. Experimental results on human ChIP-seq datasets show that FCNA outperforms other competing methods significantly. Besides, we find that the regions located by FCNA can be used by motif discovery tools to further refine the prediction performance. Furthermore, we observe that FCNA can accurately identify TF-DNA binding motifs across different cell lines and infer indirect TF-DNA bindings.


Subjects
Chromatin Immunoprecipitation Sequencing , Neural Networks, Computer , Response Elements , Sequence Analysis, DNA , Sequence Analysis, Protein , Transcription Factors , A549 Cells , Amino Acid Motifs , Humans , MCF-7 Cells , Transcription Factors/genetics , Transcription Factors/metabolism
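
The contrast the abstract draws between sequence-level and nucleotide-level prediction can be illustrated with a toy example: a convolutional filter produces one score per position, global average pooling turns those scores into a single sequence-level value, and the per-position scores remain available for locating a site. The Python sketch below uses a single hand-set filter for the made-up motif "TGA". It is not FCNA, whose layers and learned weights are not described in the abstract.

```python
# One-hot encoding over the alphabet order A, C, G, T.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def scan(sequence, kernel):
    """Slide one filter along the one-hot sequence; return a score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(sequence) - width + 1):
        total = 0.0
        for offset in range(width):
            base = ONE_HOT[sequence[start + offset]]
            total += sum(w * x for w, x in zip(kernel[offset], base))
        scores.append(max(total, 0.0))  # ReLU activation
    return scores


def global_average_pool(scores):
    """Collapse per-position scores into one sequence-level score."""
    return sum(scores) / len(scores)


# Hypothetical filter rewarding "TGA" (one weight row per motif position).
kernel = [(0, 0, 0, 1), (0, 0, 1, 0), (1, 0, 0, 0)]
seq = "CCCTGACCC"

position_scores = scan(seq, kernel)
sequence_score = global_average_pool(position_scores)
best_start = max(range(len(position_scores)), key=position_scores.__getitem__)
```

The pooled value answers "does this sequence contain a site?", while `best_start` points at where the filter responded most strongly.
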
7.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33005921

ABSTRACT

DNA/RNA motif mining is the foundation of gene function research. It plays an extremely important role in identifying DNA- or RNA-protein binding sites, which helps to understand the mechanisms of gene regulation and management. For the past few decades, researchers have been working on designing new efficient and accurate algorithms for motif mining. These algorithms can be roughly divided into two categories: the enumeration approach and the probabilistic method. In recent years, machine learning methods have made great progress, and algorithms based on deep learning in particular have achieved good performance. Existing deep learning methods in motif mining can be roughly divided into three types of models: convolutional neural network (CNN) based models, recurrent neural network (RNN) based models, and hybrid CNN-RNN based models. We introduce the application of deep learning in the field of motif mining in terms of data preprocessing, the features of existing deep learning architectures, and the differences between the basic deep learning models. Through the analysis and comparison of existing deep learning methods, we found that more complex models tend to perform better than simple ones when data are sufficient, and that the current methods are relatively simple compared with those in other fields such as computer vision, natural language processing (NLP), computer games, etc. Therefore, it is necessary to conduct a summary of motif mining by deep learning, which can help researchers understand this field.


Subjects
DNA/genetics , Neural Networks, Computer , Nucleotide Motifs , RNA/genetics , Sequence Analysis, DNA , Sequence Analysis, RNA
8.
Brief Bioinform ; 22(2): 2085-2095, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32232320

ABSTRACT

Effectively representing Medical Subject Headings (MeSH) headings (terms) such as diseases and drugs as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree structure into a relationship network and applied several graph embedding algorithms to it to represent these terms. Specifically, the relationship network consists of nodes (MeSH headings) and edges (relationships) and can be constructed from the tree numbers. Then, five graph embedding algorithms, DeepWalk, LINE, SDNE, LAP and HOPE, were implemented on the relationship network to represent MeSH headings as vectors. In order to evaluate the performance of the proposed methods, we carried out node classification and relationship prediction tasks. The results show that the MeSH headings characterized by graph embedding algorithms can not only be treated as an independent carrier for representation, but can also be utilized as additional information to enhance the representation ability of vectors. Thus, they can serve as an input and continue to play a significant role in any computational models related to disease, drug, microbe, etc. Besides, our method may inspire relevant researchers to study the representation of terms from this network perspective.


Subjects
Algorithms , Medical Subject Headings , Computer Simulation , Drug Delivery Systems , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , Semantics
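
The abstract says the relationship network is built from MeSH tree numbers but does not spell out the rule. One plausible reading, sketched below in Python, links each heading to the heading that owns the parent tree number, obtained by dropping the last dot-separated segment. The headings and tree numbers here are invented placeholders in MeSH's dotted format, and the parent-child rule itself is an assumption rather than the authors' documented procedure.

```python
def build_edges(tree_numbers):
    """Link each heading to the heading owning its parent tree number.

    tree_numbers maps a heading name to the list of its tree numbers;
    a heading may appear in several places in the hierarchy.
    """
    # Reverse index: tree number -> heading that owns it.
    owner = {}
    for heading, numbers in tree_numbers.items():
        for number in numbers:
            owner[number] = heading

    edges = set()
    for heading, numbers in tree_numbers.items():
        for number in numbers:
            if "." not in number:
                continue  # top-level node, no parent inside the tree
            parent_number = number.rsplit(".", 1)[0]
            if parent_number in owner:
                edges.add((owner[parent_number], heading))
    return edges


# Hypothetical headings and tree numbers (placeholders, not real MeSH entries).
toy_tree = {
    "Root": ["X01"],
    "Child": ["X01.100"],
    "Other": ["X02.300"],
    "Grandchild": ["X01.100.200", "X02.300.400"],
}

edges = build_edges(toy_tree)
```

A heading with several tree numbers, like `Grandchild` here, ends up with several parents, which is what turns the tree into a more general network suitable for graph embedding.
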
9.
PLoS Comput Biol ; 18(3): e1009941, 2022 03.
Article in English | MEDLINE | ID: mdl-35263332

ABSTRACT

Transcription factors (TFs) play an important role in regulating gene expression; thus, the identification of the sites bound by them has become a fundamental step for molecular and cellular biology. In this paper, we developed a deep learning framework, named FCNsignal, that leverages existing fully convolutional neural networks (FCN) to predict TF-DNA binding signals at the base-resolution level. The proposed FCNsignal can simultaneously achieve the following tasks: (i) modeling the base-resolution signals of binding regions; (ii) discriminating binding or non-binding regions; (iii) locating TF-DNA binding regions; (iv) predicting binding motifs. Besides, FCNsignal can also be used to predict opening regions across the whole genome. The experimental results on 53 TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that our proposed framework outperforms some existing state-of-the-art methods. In addition, we explored using the trained FCNsignal to locate all potential TF-DNA binding regions on a whole chromosome and to make predictions for DNA sequences of arbitrary length, and the results show that our framework can find most of the known binding regions and accept sequences of arbitrary length. Furthermore, we demonstrated the potential ability of our framework in discovering causal disease-associated single-nucleotide polymorphisms (SNPs) through a series of experiments.


Subjects
Deep Learning , Binding Sites , Chromatin Immunoprecipitation Sequencing , Protein Binding , Transcription Factors/metabolism
10.
PLoS Comput Biol ; 18(10): e1010572, 2022 10.
Article in English | MEDLINE | ID: mdl-36206320

ABSTRACT

In recent years, major advances have been made in various chromosome conformation capture technologies to further satisfy the needs of researchers for high-quality, high-resolution contact interactions. Discriminating the loops from genome-wide contact interactions is crucial for dissecting three-dimensional (3D) genome structure and function. Here, we present a deep learning method to predict genome-wide chromatin loops, called DLoopCaller, by combining accessible chromatin landscapes and raw Hi-C contact maps. Available orthogonal data (ChIA-PET/HiChIP and Capture Hi-C) were used to generate positive samples with a wider contact matrix, which makes it possible to find more potential genome-wide chromatin loops. The experimental results demonstrate that DLoopCaller effectively improves the accuracy of predicting genome-wide chromatin loops compared to the state-of-the-art method Peakachu. Moreover, compared to two of the most popular loop callers, HiCCUPS and Fit-Hi-C, DLoopCaller identifies some unique interactions. We conclude that a combination of chromatin landscapes on the one-dimensional genome contributes to understanding the 3D genome organization, and the identified chromatin loops reveal cell-type specificity and transcription factor motif co-enrichment across different cell lines and species.


Subjects
Chromatin , Deep Learning , Chromatin/genetics , Genome/genetics , Chromosomes , Transcription Factors/genetics
11.
Mol Ther ; 30(4): 1775-1786, 2022 04 06.
Article in English | MEDLINE | ID: mdl-35121109

ABSTRACT

Many biological studies show that the mutation and abnormal expression of microRNAs (miRNAs) could cause a variety of diseases. As an important biomarker for disease diagnosis, miRNA helps in understanding pathogenesis and could promote the identification, diagnosis and treatment of diseases. However, the pathogenic mechanism by which miRNAs affect these diseases has not been fully understood. Therefore, predicting the potential miRNA-disease associations is of great importance for the development of clinical medicine and drug research. In this study, we proposed a novel deep learning model based on a hierarchical graph attention network for predicting miRNA-disease associations (HGANMDA). Firstly, we constructed a miRNA-disease-lncRNA heterogeneous graph based on known miRNA-disease associations, miRNA-lncRNA associations and disease-lncRNA associations. Secondly, the node-layer attention was applied to learn the importance of neighbor nodes based on different meta-paths. Thirdly, the semantic-layer attention was applied to learn the importance of different meta-paths. Finally, a bilinear decoder was employed to reconstruct the connections between miRNAs and diseases. The extensive experimental results indicated that our model achieved good performance and satisfactory results in predicting miRNA-disease associations.


Subjects
MicroRNAs , RNA, Long Noncoding , Algorithms , Computational Biology/methods , MicroRNAs/genetics , RNA, Long Noncoding/genetics
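
The last step in the abstract, a bilinear decoder, has a compact general form: given an embedding for a miRNA and one for a disease, the score is the miRNA vector times a weight matrix times the disease vector. The Python sketch below shows that form with a sigmoid on top to yield a value between 0 and 1. The sigmoid, the two-dimensional embeddings, and the weight values are assumptions for illustration; in a trained model the embeddings would come from the attention layers and the matrix would be learned.

```python
import math


def bilinear_score(z_mirna, weight, z_disease):
    """Return sigmoid(z_mirna^T * weight * z_disease)."""
    # weight * z_disease, one entry per row of the weight matrix.
    projected = [sum(w * d for w, d in zip(row, z_disease)) for row in weight]
    logit = sum(m * p for m, p in zip(z_mirna, projected))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical embeddings and weight matrix.
z_m = [1.0, 0.0]
z_d = [0.0, 1.0]
W = [[0.0, 2.0],
     [0.0, 0.0]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]

score_learned = bilinear_score(z_m, W, z_d)
score_identity = bilinear_score(z_m, identity, z_d)
```

With the identity matrix the decoder reduces to a plain dot product, so the two orthogonal vectors score 0.5. The off-diagonal weight in `W` lets the decoder relate different embedding dimensions, which a dot product cannot do.
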
12.
Bioinformatics ; 36(13): 4038-4046, 2020 07 01.
Article in English | MEDLINE | ID: mdl-31793982

ABSTRACT

MOTIVATION: Emerging evidence indicates that circular RNA (circRNA) plays a crucial role in human disease. Using circRNA as a biomarker gives rise to a new perspective on diagnosing diseases and understanding disease pathogenesis. However, detection of circRNA-disease associations by biological experiments alone is often blind, limited to small scale, costly and time-consuming. Therefore, there is an urgent need for reliable computational methods to rapidly infer the potential circRNA-disease associations on a large scale and to provide the most promising candidates for biological experiments. RESULTS: In this article, we propose an efficient computational method based on multi-source information combined with a deep convolutional neural network (CNN) to predict circRNA-disease associations. The method first fuses multi-source information including disease semantic similarity, disease Gaussian interaction profile kernel similarity and circRNA Gaussian interaction profile kernel similarity, then extracts hidden deep features through the CNN, and finally sends them to the extreme learning machine classifier for prediction. The 5-fold cross-validation results show that the proposed method achieves 87.21% prediction accuracy with 88.50% sensitivity and an area under the curve of 86.67% on the CIRCR2Disease dataset. In comparison with the state-of-the-art SVM classifier and other feature extraction methods on the same dataset, the proposed model achieves the best results. In addition, we also obtained experimental support for prediction results by searching published literature. As a result, 7 of the top 15 circRNA-disease pairs with the highest scores were confirmed by literature. These results demonstrate that the proposed model is a suitable method for predicting circRNA-disease associations and can provide reliable candidates for biological experiments.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets explored in this work are available at https://github.com/look0012/circRNA-Disease-association. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks, Computer , RNA, Circular , Algorithms , Humans
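
Gaussian interaction profile (GIP) kernel similarity, named in the abstract above, is commonly defined as follows: each entity's interaction profile is its row (or column) of the binary association matrix, and the similarity between two entities is exp(-gamma * squared distance between their profiles), with gamma set to a constant divided by the mean squared norm of all profiles. The Python sketch below follows that common definition on a made-up association matrix. The bandwidth constant of 1 and the toy data are assumptions, and the paper's exact settings may differ.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over rows of a 0/1 association matrix."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalized by mean profile norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical circRNA-by-disease association matrix (rows are circRNAs).
circ_profiles = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

K = gip_kernel(circ_profiles)
```

Transposing the matrix and calling the same function would give the disease-side kernel.
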
13.
Cancer Cell Int ; 21(1): 47, 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33514366

ABSTRACT

BACKGROUND: The incidence of multiple primary malignant tumors (MPMTs) is rising due to the development of screening technologies, significant treatment advances and increased aging of the population. For patients with a prior cancer history, identifying the tumor origin of the second malignant lesion has important prognostic and therapeutic implications and still represents a difficult problem in clinical practice. METHODS: In this study, we evaluated the performance of a 90-gene expression assay and explored its potential diagnostic utility for MPMTs across a broad spectrum of tumor types. Thirty-five MPMT patients from Sir Run Run Shaw Hospital, College of Medicine, Zhejiang University and Fudan University Shanghai Cancer Center were enrolled; 73 MPMT specimens met all quality control criteria and were analyzed by the 90-gene expression assay. RESULTS: For each clinical specimen, the tumor type predicted by the 90-gene expression assay was compared with its pathological diagnosis, with an overall accuracy of 93.2% (68 of 73, 95% confidence interval 0.84-0.97). For histopathological subgroup analysis, the 90-gene expression assay achieved an overall accuracy of 95.0% (38 of 40; 95% CI 0.82-0.99) for well-moderately differentiated tumors and 92.0% (23 of 25; 95% CI 0.82-0.99) for poorly or undifferentiated tumors, with no statistically significant difference (p-value > 0.5). For squamous cell carcinoma specimens, the overall accuracy of gene expression assay also reached 87.5% (7 of 8; 95% CI 0.47-0.99) for identifying the tumor origins. CONCLUSIONS: The 90-gene expression assay provides flexibility and accuracy in identifying the tumor origin of MPMTs. Future incorporation of the 90-gene expression assay in pathological diagnosis will assist oncologists in applying precise treatments, leading to improved care and outcomes for MPMT patients.
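
The accuracies in the abstract above are proportions with 95% confidence intervals, for example 68 of 73 specimens. The abstract does not say which interval method was used. As one common choice, the Python sketch below computes a Wilson score interval. It is offered only to show how such an interval is derived from the counts, and its bounds need not match the reported ones exactly.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: 68 correct predictions out of 73 specimens.
accuracy = 68 / 73
low, high = wilson_interval(68, 73)
```

Small subgroups such as 7 of 8 produce much wider intervals, which is why the abstract's subgroup bounds span a larger range.
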

14.
Bioinformatics ; 34(1): 33-40, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968797

ABSTRACT

Motivation: A promoter is a short region of DNA responsible for initiating transcription of a particular gene in the genome. Promoters have various types with different functions. Owing to their importance in biological processes, it is highly desirable to develop computational tools for the timely identification of promoters and their types. Such a challenge has become particularly critical and urgent in facing the avalanche of DNA sequences discovered in the postgenomic age. Although some prediction methods have been developed, they can only be used to discriminate a specific type of promoter from non-promoters. None of them has the ability to identify the types of promoters. This is because different types of promoters may share quite similar consensus sequence patterns, and promoters of the same type may have considerably different consensus sequences. Results: To overcome this difficulty, using the multi-window-based PseKNC (pseudo K-tuple nucleotide composition) approach to incorporate the short-, middle-, and long-range sequence information, we have developed a two-layer seamless predictor named 'iPromoter-2L'. The first layer serves to identify a query DNA sequence as a promoter or non-promoter, and the second layer to predict which of the following six types the identified promoter belongs to: σ24, σ28, σ32, σ38, σ54 and σ70. Availability and implementation: For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the powerful new predictor has been established at http://bioinformatics.hitsz.edu.cn/iPromoter-2L/. It is anticipated that iPromoter-2L will become a very useful high throughput tool for genome analysis. Contact: bliu@hit.edu.cn or dshuang@tongji.edu.cn or kcchou@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Escherichia coli/genetics , Genomics/methods , Promoter Regions, Genetic , Sequence Analysis, DNA/methods , Software , DNA, Bacterial/metabolism , DNA-Directed RNA Polymerases/metabolism , Escherichia coli/enzymology , Escherichia coli Proteins/metabolism , Genome, Bacterial
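
PseKNC, as its expanded name in the abstract suggests, starts from K-tuple nucleotide composition: the normalized frequency of every length-k substring of a DNA sequence. The Python sketch below computes only that plain composition vector. The additional pseudo components and the multi-window scheme mentioned in the abstract are deliberately left out, since the abstract does not give enough detail to reproduce them.

```python
from itertools import product


def kmer_composition(sequence, k):
    """Normalized frequency of each k-tuple over A, C, G, T, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # windows containing other symbols (e.g. N) are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical query sequence; k=2 gives a 16-dimensional feature vector.
features = kmer_composition("ACGTAC", 2)
```

A vector like `features` could be fed to any classifier. In a two-layer design, the same features would first feed a promoter versus non-promoter model and then a model that assigns the promoter type.
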
15.
Bioinformatics ; 34(22): 3835-3842, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29878118

ABSTRACT

Motivation: Identification of enhancers and their strength is important because they play a critical role in controlling gene expression. Although some bioinformatics tools have been developed, they are limited to discriminating enhancers from non-enhancers. Recently, a two-layer predictor called 'iEnhancer-2L' was developed that can be used to predict the enhancer's strength as well. However, its prediction quality needs further improvement to enhance the practical application value. Results: A new predictor called 'iEnhancer-EL' was proposed that contains two layer predictors: the first one (for identifying enhancers) is formed by fusing an array of six key individual classifiers, and the second one (for their strength) is formed by fusing an array of ten key individual classifiers. All these key classifiers were selected from 171 elementary classifiers formed by SVM (Support Vector Machine) based on kmer, subsequence profile and PseKNC (Pseudo K-tuple Nucleotide Composition), respectively. Rigorous cross-validations have indicated that the proposed predictor is remarkably superior to the existing state-of-the-art one in this area. Availability and implementation: A web server for the iEnhancer-EL has been established at http://bioinformatics.hitsz.edu.cn/iEnhancer-EL/, by which users can easily get their desired results without the need to go through the mathematical details. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Regulatory Sequences, Nucleic Acid , Software , Nucleotides
16.
Bioinformatics ; 34(18): 3086-3093, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29684124

ABSTRACT

Motivation: DNA replication is the key to genetic information transmission, and it is initiated from the replication origins. Identifying the replication origins is crucial for understanding the mechanism of DNA replication. Although several discriminative computational predictors were proposed to identify DNA replication origins of yeast species, they could only be used to identify very tiny parts (250 or 300 bp) of the replication origins. Besides, none of the existing predictors could successfully capture the 'GC asymmetry bias' of yeast species reported by experimental observations. Hence it is not surprising that their power is so limited. To grasp the GC asymmetry feature and make the prediction able to cover the entire replication regions of yeast species, we develop a new predictor called 'iRO-3wPseKNC'. Results: Rigorous cross validations on the benchmark datasets from four yeast species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis and Pichia pastoris) have indicated that the proposed predictor is very powerful for predicting the entire DNA replication origins. Availability and implementation: The web-server for the iRO-3wPseKNC predictor is available at http://bioinformatics.hitsz.edu.cn/iRO-3wPseKNC/, by which users can easily get their desired results without the need to go through the mathematical details. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
DNA/genetics , Replication Origin , Ascomycota/genetics , DNA Replication , Fungal Proteins/genetics , Software
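
The abstract refers to a 'GC asymmetry bias' without defining how it is measured. A widely used measure of strand asymmetry is GC skew, (G - C) / (G + C), computed in windows along a sequence. The Python sketch below uses it as a stand-in to make the idea concrete. Whether iRO-3wPseKNC encodes the bias this way is not stated in the abstract, and the window and step sizes are arbitrary choices.

```python
def gc_skew(sequence, window, step):
    """GC skew (G - C) / (G + C) for each window along the sequence."""
    values = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C get a skew of 0 by convention here.
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


# Hypothetical sequence: G-rich first half, C-rich second half.
skew_profile = gc_skew("GGGGCCCC", window=4, step=4)
```

A sign change along `skew_profile` marks a point where the strand composition flips, which is the kind of signal an asymmetry-aware feature would expose to a predictor.
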
17.
Bioinformatics ; 33(14): i243-i251, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881989

ABSTRACT

MOTIVATION: The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the huge amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. RESULTS: We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. AVAILABILITY AND IMPLEMENTATION: CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8. CONTACT: dshuang@tongji.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Computational Biology/methods , Nucleotide Motifs , Promoter Regions, Genetic , Transcription Factors/metabolism , Area Under Curve , Binding Sites , Humans , K562 Cells , Protein Binding , ROC Curve
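
The abstract's remark that the AUC sub-problems are piece-wise constant follows from how AUC is defined: it is the fraction of (positive, negative) pairs in which the positive sequence scores higher, so it changes only when some pair swaps order. The Python sketch below computes AUC in that pairwise form on invented motif scores. It illustrates the criterion only and is not the CDAUC optimization procedure.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical motif scores for bound (positive) and unbound (negative) sequences.
bound = [0.9, 0.8, 0.4]
unbound = [0.5, 0.3]

base_auc = auc(bound, unbound)

# Nudging a score without crossing any other score leaves the AUC unchanged.
nudged_auc = auc([0.9, 0.8, 0.45], unbound)

# Pushing it past a negative score flips one pair and the AUC jumps.
flipped_auc = auc([0.9, 0.8, 0.6], unbound)
```

The step from `nudged_auc` to `flipped_auc` is the jump between two constant pieces, so an exact search only needs to examine the points where an ordering flips.
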
18.
Int J Mol Sci ; 19(10)2018 Oct 15.
Artigo em Inglês | MEDLINE | ID: mdl-30326663

RESUMO

Gene regulatory network (GRN) inference can understand the growth and development of animals and plants, and reveal the mystery of biology. Many computational approaches have been proposed to infer GRN. However, these inference approaches have hardly met the need of modeling, and the reducing redundancy methods based on individual information theory method have bad universality and stability. To overcome the limitations and shortcomings, this thesis proposes a novel algorithm, named HSCVFNT, to infer gene regulatory network with time-delayed regulations by utilizing a hybrid scoring method and complex-valued flexible neural network (CVFNT). The regulations of each target gene can be obtained by iteratively performing HSCVFNT. For each target gene, the HSCVFNT algorithm utilizes a novel scoring method based on time-delayed mutual information (TDMI), time-delayed maximum information coefficient (TDMIC) and time-delayed correlation coefficient (TDCC), to reduce the redundancy of regulatory relationships and obtain the candidate regulatory factor set. Then, the TDCC method is utilized to create time-delayed gene expression time-series matrix. Finally, a complex-valued flexible neural tree model is proposed to infer the time-delayed regulations of each target gene with the time-delayed time-series matrix. Three real time-series expression datasets from (Save Our Soul) SOS DNA repair system in E. coli and Saccharomyces cerevisiae are utilized to evaluate the performance of the HSCVFNT algorithm. As a result, HSCVFNT obtains outstanding F-scores of 0.923, 0.8 and 0.625 for SOS network and (In vivo Reverse-Engineering and Modeling Assessment) IRMA network inference, respectively, which are 5.5%, 14.3% and 72.2% higher than the best performance of other state-of-the-art GRN inference methods and time-delayed methods.


Subjects
Algorithms , Computational Biology , Gene Regulatory Networks , Bayes Theorem , Computational Biology/methods , DNA Repair , Escherichia coli/genetics , Neural Networks, Computer , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Sensitivity and Specificity
19.
BMC Bioinformatics ; 18(Suppl 16): 543, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297304

ABSTRACT

BACKGROUND: Accumulating biological and clinical reports indicate that imbalance of the microbial community is closely associated with the occurrence and development of various complex human diseases. Identifying potential microbe-disease associations, which could provide a better understanding of disease pathology and improve disease diagnosis and prognosis, has attracted increasing attention. However, hardly any computational models have been developed for large-scale microbe-disease association prediction. RESULTS: In this article, based on the assumption that microbes with similar functions tend to share similar association or non-association patterns with similar diseases and vice versa, we propose the model of Network Consistency Projection for Human Microbe-Disease Association prediction (NCPHMDA), which integrates known microbe-disease associations and Gaussian interaction profile kernel similarity for microbes and diseases. NCPHMDA yielded AUCs of 0.9039 and 0.7953 and an average AUC of 0.8918 in global leave-one-out cross validation, local leave-one-out cross validation and 5-fold cross validation, respectively. Furthermore, colon cancer, asthma and type 2 diabetes were taken as independent case studies, in which 9, 9 and 8 of the top 10 predicted microbes, respectively, were confirmed by recently published clinical literature. CONCLUSION: NCPHMDA is a non-parametric, universal, network-based method that can simultaneously predict associated microbes for the investigated diseases and does not require negative samples. It is anticipated that NCPHMDA will become an effective biological resource for guiding clinical experiments.
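The Gaussian interaction profile kernel similarity mentioned in this abstract can be sketched as follows. The code uses the commonly cited definition, in which each row of a binary association matrix is an interaction profile and the kernel bandwidth is normalized by the mean squared norm of the profiles. The choice `gamma_prime=1.0`, the function name `gip_kernel`, and the toy matrix are assumptions for illustration; the abstract does not give these details, and the network consistency projection step itself is not shown.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows.

    profiles: 2-D array, one interaction profile per row
              (e.g. one row per disease, one column per microbe).
    gamma_prime: bandwidth parameter (assumed to be 1.0 here).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distance between every pair of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


if __name__ == "__main__":
    # Hypothetical 3-disease x 3-microbe association matrix (1 = known association).
    associations = np.array([[1, 0, 1],
                             [1, 0, 1],
                             [0, 1, 0]])
    disease_similarity = gip_kernel(associations)    # rows are disease profiles
    microbe_similarity = gip_kernel(associations.T)  # columns are microbe profiles
    print(disease_similarity)
    print(microbe_similarity)
```

Because the kernel is built only from known associations, it needs no negative samples, which is consistent with the property the abstract highlights.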


Subjects
Computer Simulation/trends , Host Microbial Interactions/physiology , Algorithms , Humans , Prognosis
20.
Biomed Eng Online ; 16(1): 89, 2017 Jul 06.
Article in English | MEDLINE | ID: mdl-28679415

ABSTRACT

BACKGROUND: Visual inspection of cardiotocography traces by obstetricians and midwives is the gold standard for monitoring the wellbeing of the foetus during antenatal care. However, inter- and intra-observer variability is high, with only a 30% positive predictive value for the classification of pathological outcomes. This has a significant negative impact on the perinatal foetus and often results in cardio-pulmonary arrest, brain and vital organ damage, cerebral palsy, hearing, visual and cognitive defects and, in severe cases, death. This paper shows that machine learning applied to foetal heart rate signals provides direct information about the foetal state and, when used as a decision support tool, helps to filter the subjective opinions of medical practitioners. The primary aim is to provide a proof of concept that demonstrates how machine learning can be used to objectively determine when medical intervention, such as caesarean section, is required and to help avoid preventable perinatal deaths. METHODS: This is evidenced using an open dataset that comprises 506 controls (normal vaginal deliveries) and 46 cases (caesarean due to pH ≤ 7.20-acidosis, n = 18; pH > 7.20 and pH < 7.25-foetal deterioration, n = 4; or clinical decision without evidence of pathological outcome measures, n = 24). Several machine-learning algorithms are trained and validated using binary classifier performance measures. RESULTS: The findings show that deep learning classification achieves sensitivity = 94%, specificity = 91%, area under the curve = 99%, F-score = 100%, and mean square error = 1%. CONCLUSIONS: The results demonstrate that machine learning significantly improves the efficiency of detecting caesarean section and normal vaginal deliveries from foetal heart rate signals compared with obstetrician and midwife predictions and with systems reported in previous studies.
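The binary classifier performance measures named in this abstract (sensitivity, specificity, positive predictive value, F-score) all derive from a confusion matrix. The sketch below shows the standard formulas. The counts in the usage example are hypothetical and chosen only to match the dataset's class sizes (46 cases, 506 controls); they are not results from the paper, and the function name `classifier_metrics` is an assumption.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Standard binary classification measures from confusion-matrix counts.

    tp: cases correctly flagged          fn: cases missed
    tn: controls correctly cleared       fp: controls wrongly flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    # Positive predictive value is undefined if nothing was flagged.
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    # F-score (F1): harmonic mean of PPV and sensitivity.
    f_score = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (46 cases, 506 controls).
    metrics = classifier_metrics(tp=40, fn=6, tn=460, fp=46)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With classes this imbalanced, positive predictive value can stay low even when sensitivity and specificity are both high, which is why it is worth reporting alongside them.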


Subjects
Cardiotocography , Cesarean Section/classification , Contraceptive Devices, Female/classification , Heart Rate, Fetal , Machine Learning , Signal Processing, Computer-Assisted , Adult , Discriminant Analysis , Female , Humans , Nonlinear Dynamics , Pregnancy , Young Adult