1.
Comput Biol Med ; 175: 108487, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38653064

ABSTRACT

Drug repurposing is promising in multiple scenarios, such as controlling emerging viral outbreaks and reducing the cost of drug discovery. Traditional graph-based drug repurposing methods are limited in fast, large-scale virtual screens, as they constrain the counts for drugs and targets and fail to predict novel viruses or drugs. Moreover, though deep learning has been proposed for drug repurposing, only a few methods have been used, including a group of pre-trained deep learning models for embedding generation and transfer learning. Hence, we propose DeepSeq2Drug to tackle the shortcomings of previous methods. We leverage multi-modal embeddings and an ensemble strategy to complement the numbers of drugs and viruses and to guarantee novel predictions. This framework (including the expanded version) involves four modal types: six NLP models, four CV models, four graph models, and two sequence models. In detail, we first build a pipeline and calculate the predictive performance of each pair of viral and drug embeddings. Then, we select the best embedding pairs and apply an ensemble strategy to conduct anti-viral drug repurposing. To validate the effect of the proposed ensemble model, a monkeypox virus (MPV) case study is conducted to reflect the potential predictive capability. This framework could be a benchmark method for further pre-trained deep learning optimization and anti-viral drug repurposing tasks. We further build software to make the proposed model easier to reuse. The code and software are freely available at http://deepseq2drug.cs.cityu.edu.hk.


Subjects
Antiviral Agents , Deep Learning , Drug Repositioning , Drug Repositioning/methods , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , Humans , Software , Benchmarking
2.
Genes (Basel) ; 14(12)2023 Dec 18.
Article in English | MEDLINE | ID: mdl-38137057

ABSTRACT

Tea is an important cash crop worldwide, and its nutritional value has led to its high economic benefits. Tea anthracnose is a common disease of tea plants that seriously affects food safety and yield and has a far-reaching impact on the sustainable development of the tea industry. In this study, phenotypic analysis and pathogenicity analysis were performed on knockout and complement strains of HTF2 (the transcriptional regulator of tea anthracnose homeobox), and the pathogenic mechanism of these strains was explored via RNA-seq. The MoHox1 gene sequence of the rice blast fungus was indexed, and the anthracnose genome was searched for CfHTF2. Evolutionary analysis recently reported the affinity of HTF2 for C. fructicola and C. higginsianum. The loss of CfHTF2 slowed the vegetative growth and spore-producing capacity of C. fructicola and weakened its pathogenicity and its resistance to adverse conditions. The transcriptome sequencing of wild-type N425 and CfHTF2 deletion mutants was performed, and a total of 3144 differentially expressed genes (DEGs) were obtained, 1594 of which were upregulated and 1550 of which were downregulated. GO and KEGG enrichment analyses showed that the DEGs were mainly enriched in signaling pathways such as the biosynthesis of secondary metabolites. In conclusion, this study lays a foundation for further study of the pathogenic mechanism of tea anthracnose and provides a molecular basis for the analysis of the pathogenic molecular mechanism of CfHTF2.


Subjects
Camellia sinensis , Osmoregulation , Spores, Fungal , Phylogeny , Plant Diseases/genetics , Plant Diseases/microbiology , Camellia sinensis/genetics , Camellia sinensis/metabolism , Tea/genetics
3.
Signal Image Video Process ; : 1-9, 2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37362228

ABSTRACT

Speech quality is frequently affected by a variety of factors in online conferencing applications, such as background noise, reverberation, packet loss and network jitter. In real scenarios, it is impossible to obtain a clean reference signal for evaluating the quality of the conferencing speech. Therefore, an effective non-intrusive speech quality assessment (NISQA) method is necessary. In this paper, we propose a new network framework for NISQA based on ResNet and BiLSTM. ResNet is utilized to extract local features, while BiLSTM is used to integrate representative features with long-term time dependencies and sequential characteristics. Considering that ResNet may result in the loss of context information when applied to the NISQA task, we propose a variant of ResNet which can preserve the time series information of the conferencing speech. The experimental results demonstrate that the proposed method has a high correlation with the mean opinion score of clean, noisy and processed speech.
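
The abstract reports agreement with the mean opinion score (MOS) as a correlation. As a minimal sketch of how such agreement is commonly measured, the following computes the Pearson correlation between predicted scores and MOS labels. The abstract does not say which correlation measure was used, so Pearson is an assumption here, and the function name and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the paper.
    mos = [1.5, 2.0, 3.0, 4.0, 4.5]
    predicted = [1.8, 2.1, 2.9, 3.7, 4.6]
    print(round(pearson(mos, predicted), 3))
```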

4.
Adv Mater ; 35(15): e2203547, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36649977

ABSTRACT

Sodium storage batteries are one of the ever-increasing next-generation large-scale energy storage systems owing to the abundant resources and low cost. However, their viability is severely hampered by dendrite-related hazards on anodes. Herein, a novel ultrathin (8 µm) exterior-nonporous separator composed of honeycomb-structured fibers is prepared for homogeneous Na deposition and suppressed dendrite penetration. The unhindered ion transmission greatly benefits from honeycomb-structured fibers with huge electrolyte uptake (376.7%) and the polymer's inherent transport ability. Additionally, polar polymer chains consisting of polyethersulfone and polyvinylidene customize the highly aggregated solvation structure of electrolytes via substantial solvent immobilization, facilitating ion-conductivity-enhanced inorganic-rich solid-electrolyte interphase with remarkable interface endurance. With the reliable mechanical strength of the separator, the assembled sodium-ion full cell delivers significantly improved energy density and high safety, enabling stable operation under cutting and rolling. The as-prepared separator can further be generalized to lithium-based batteries for which apparent dendrite inhibition and cyclability are accessible and demonstrates its potential for practical application.

5.
Entropy (Basel) ; 25(1)2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36673285

ABSTRACT

With the development of image recovery models, especially those based on adversarial and perceptual losses, the detailed texture portions of images are being recovered more naturally. However, these restored images are similar but not identical in detail texture to their reference images. With traditional image quality assessment methods, results with better subjective perceived quality often score lower in objective scoring. Assessment methods suffer from subjective and objective inconsistencies. This paper proposes a regional differential information entropy (RDIE) method for image quality assessment to address this problem. This approach allows better assessment of similar but not identical textural details and achieves good agreement with perceived quality. Neural networks are used to reshape the process of calculating information entropy, improving the speed and efficiency of the operation. Experiments conducted with this study's image quality assessment dataset and the PIPAL dataset show that the proposed RDIE method yields a high degree of agreement with people's average opinion scores compared with other image quality assessment metrics, proving that RDIE can better quantify the perceived quality of images.
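
The abstract does not give the RDIE formula, so the following is only an illustrative sketch of the underlying idea: taking the Shannon entropy of pixel differences within one image region. The function names and the flat-list region representation are assumptions, and this is not the authors' neural-network implementation.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def region_difference_entropy(restored, reference):
    """Entropy of per-pixel differences over one region.

    Both arguments are flat lists of integer pixel intensities covering the
    same region (a hypothetical representation chosen for this sketch).
    """
    if len(restored) != len(reference):
        raise ValueError("regions must have the same number of pixels")
    diffs = [a - b for a, b in zip(restored, reference)]
    return shannon_entropy(diffs)
```

Identical regions give zero entropy, while a region whose differences are spread over many values gives a higher one.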

6.
Entropy (Basel) ; 24(9)2022 Aug 25.
Article in English | MEDLINE | ID: mdl-36141076

ABSTRACT

Graph neural networks (GNNs) with feature propagation have demonstrated their power in handling unstructured data. However, feature propagation is also a smoothing process that tends to make all node representations similar as the number of propagation steps increases. To address this problem, we propose a novel Block-Based Adaptive Decoupling (BBAD) Framework to produce effective deep GNNs by utilizing backbone networks. In this framework, each block contains a shallow GNN with feature propagation and transformation decoupled. We also introduce layer regularizations and flexible receptive fields to automatically adjust the propagation depth and to provide different aggregation hops for each node, respectively. We prove that the traditional coupled GNNs are more likely to suffer from over-smoothing when they become deep. We also demonstrate the diversity of outputs from different blocks of our framework. In the experiments, we conduct semi-supervised and fully supervised node classifications on benchmark datasets, and the results verify that our method not only improves the performance of various backbone networks, but is also superior to existing deep graph neural networks while using fewer parameters.
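
The over-smoothing effect described above can be seen on a toy graph. The sketch below is not the BBAD framework; it assumes a simple mean-aggregation propagation rule with self-loops and scalar node features, and shows that the spread between node values shrinks as propagation steps accumulate.

```python
def propagate(features, neighbors):
    """One propagation step: each node averages itself and its neighbors."""
    out = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        out.append(sum(features[j] for j in group) / len(group))
    return out


def spread(features):
    """Gap between the largest and smallest node value."""
    return max(features) - min(features)


# Hypothetical 4-node path graph 0-1-2-3 with scalar features.
neighbors = [[1], [0, 2], [1, 3], [2]]
x = [1.0, 0.0, 0.0, -1.0]

spreads = []
for _ in range(20):
    spreads.append(spread(x))  # spread before this propagation step
    x = propagate(x, neighbors)

if __name__ == "__main__":
    for step in (0, 5, 10, 19):
        print(step, round(spreads[step], 4))
```

Because every update is an average of existing values, the spread can never grow, which is why stacking many propagation steps drives node representations together.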

7.
Molecules ; 27(18)2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36144703

ABSTRACT

Predicting products of organic chemical reactions is useful in chemical sciences, especially when one or more reactants are new organics. However, the performance of traditional learning models heavily relies on high-quality labeled data. In this work, to utilize unlabeled data for better prediction performance, we propose a method that combines semi-supervised learning with graph convolutional neural networks for chemical reaction prediction. First, we propose a Mean Teacher Weisfeiler-Lehman Network to find the reaction centers. Then, we construct the candidate product set. Finally, we use an Improved Weisfeiler-Lehman Difference Network to rank candidate products. Experimental results demonstrate that, with 400k labeled data, our framework can improve the top-5 accuracy by 0.7% using 35k unlabeled data. When the proportion of unlabeled data increases, the performance gain can be larger. For example, with 80k labeled data and 35k unlabeled data, the performance gain with our framework can be 1.8%.


Subjects
Neural Networks, Computer , Supervised Machine Learning , Organic Chemicals
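
The abstract above reports gains in top-5 accuracy over ranked candidate products. A minimal sketch of that metric follows, assuming each reaction comes with a ranked candidate list and one true product. The function name and the SMILES-like strings are hypothetical placeholders, not data from the paper.

```python
def top_k_accuracy(ranked_candidates, true_products, k=5):
    """Fraction of reactions whose true product is among the top-k candidates.

    ranked_candidates: list of candidate lists, best candidate first.
    true_products: list with the correct product for each reaction.
    """
    if len(ranked_candidates) != len(true_products) or not true_products:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_products)
        if truth in ranked[:k]
    )
    return hits / len(true_products)


if __name__ == "__main__":
    # Placeholder strings for illustration only.
    ranked = [["CCO", "CC=O", "CC(=O)O"], ["c1ccccc1", "C1CCCCC1"]]
    truth = ["CC=O", "CCN"]
    print(top_k_accuracy(ranked, truth, k=2))
```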
8.
PeerJ Comput Sci ; 7: e656, 2021.
Article in English | MEDLINE | ID: mdl-34435100

ABSTRACT

Data prediction and imputation are important parts of marine animal movement trajectory analysis as they can help researchers understand animal movement patterns and address missing data issues. Compared with traditional methods, deep learning methods can usually provide enhanced pattern extraction capabilities, but their applications in marine data analysis are still limited. In this research, we propose a composite deep learning model to improve the accuracy of marine animal trajectory prediction and imputation. The model extracts patterns from the trajectories with an encoder network and reconstructs the trajectories using these patterns with a decoder network. We also use attention mechanisms to highlight certain extracted patterns for the decoder. In addition, we feed these patterns into a second decoder for prediction and imputation. Therefore, our approach couples unsupervised learning (the encoder and the first decoder) with supervised learning (the encoder and the second decoder). Experimental results demonstrate that our approach can reduce errors by at least 10% on average compared with other methods.

9.
PeerJ Comput Sci ; 7: e592, 2021.
Article in English | MEDLINE | ID: mdl-34179456

ABSTRACT

Motion analysis is important in video surveillance systems and background subtraction is useful for moving object detection in such systems. However, most of the existing background subtraction methods do not work well for surveillance systems in the evening because objects are usually dark and reflected light is usually strong. To resolve these issues, we propose a framework that utilizes a Weber contrast descriptor, a texture feature extractor, and a light detection unit to extract the features of foreground objects. We propose a local pattern enhancement method. For the light detection unit, our method utilizes the finding that lighted areas in the evening usually have a low saturation in hue-saturation-value and hue-saturation-lightness color spaces. Finally, we update the background model and the foreground objects in the framework. This approach improves foreground object detection in night videos and does not need a large data set for pre-training.
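
Two of the ingredients named in the abstract have simple textbook forms: Weber contrast, (I - I_b) / I_b, and the saturation channel of the HSV color space. The sketch below illustrates both with Python's standard `colorsys` module. The 0.2 saturation threshold and the function names are assumptions for illustration, not values from the paper, and only the HSV (not HSL) check is shown.

```python
import colorsys


def weber_contrast(intensity, background):
    """Weber contrast (I - I_b) / I_b for a positive background intensity."""
    if background <= 0:
        raise ValueError("background intensity must be positive")
    return (intensity - background) / background


def is_low_saturation(r, g, b, threshold=0.2):
    """True if an 8-bit RGB pixel has HSV saturation below `threshold`.

    The threshold is a hypothetical value chosen for this sketch.
    """
    _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return saturation < threshold


if __name__ == "__main__":
    print(weber_contrast(150, 100))          # brighter than background
    print(is_low_saturation(250, 250, 240))  # near-white, lamp-like pixel
    print(is_low_saturation(200, 30, 30))    # strongly colored pixel
```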

10.
J Hepatobiliary Pancreat Sci ; 28(8): 659-670, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33053264

ABSTRACT

BACKGROUND/PURPOSE: To explore the risk factors of splenic vessel preservation in laparoscopic distal pancreatectomy (LDP) and to guide the appropriate selection of surgical methods through three-dimensional (3D) reconstruction. METHODS: Patients with benign or low-grade malignant tumors of the pancreatic body and tail who underwent LDP at Ningbo Medical Center Lihuili Hospital from January 2014 to September 2019 were selected for quantitative analysis of the anatomical data of the patients' pancreas, tumors, splenic vessels and spleens by 3D reconstruction. According to the final surgical methods, the patients were divided into the laparoscopic spleen-preserving distal pancreatectomy with splenic vessel preservation (lap-SVP) group and the non-lap-SVP group. Clinical data of the two groups were compared to assess the risk factors for surgical failure of lap-SVP, and a logistic regression model was applied to predict the choice of surgical methods. RESULTS: A total of 218 patients were included in the study, including 144 in the lap-SVP group and 74 in the non-lap-SVP group. Multivariate analysis confirms that large tumor volume, large contact area between the pancreas to be resected and the splenic vein, and large maximum ratio of the circumference of the splenic vessel embedded in the pancreas to be resected to the circumference of the splenic vessel are independent risk factors for surgical failure of lap-SVP (OR > 1, P < .05). The prediction accuracy of lap-SVP operation by the logistic regression reaches up to 80.9%. CONCLUSIONS: 3D reconstruction can provide an essential basis for the surgical method selection of laparoscopic distal pancreatectomy.


Subjects
Imaging, Three-Dimensional , Laparoscopy , Pancreatectomy , Pancreatic Neoplasms , Humans , Pancreatectomy/methods , Pancreatic Neoplasms/diagnostic imaging , Pancreatic Neoplasms/surgery , Splenic Artery/surgery , Treatment Outcome
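
For readers less familiar with the "OR > 1" notation in the results above, the standard relationship between a logistic regression coefficient and its odds ratio is sketched below. The symbols are assumptions introduced here: $x_i$ are the predictors, $\beta_i$ their fitted coefficients, and $y = 1$ denotes the modeled outcome.

```latex
\[
  \Pr(y = 1 \mid x) = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{i} \beta_i x_i)\bigr)},
  \qquad
  \mathrm{OR}_i = e^{\beta_i}
\]
```

Under this form, $\mathrm{OR}_i > 1$ holds exactly when $\beta_i > 0$, meaning that a one-unit increase in $x_i$ multiplies the odds of the outcome by a factor greater than one, with the other predictors held fixed.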
11.
PeerJ Comput Sci ; 6: e311, 2020.
Article in English | MEDLINE | ID: mdl-33816962

ABSTRACT

Recently, object detection methods have developed rapidly and have been widely used in many areas. In many scenarios, helmet wearing detection is very useful, because people are required to wear helmets for their safety when they work on construction sites or cycle in the streets. However, for the problem of helmet wearing detection in complex scenes such as construction sites and workshops, the detection accuracy of current approaches still needs to be improved. In this work, we analyze the mechanism and performance of several detection algorithms and identify two feasible base algorithms that have complementary advantages. We use one base algorithm to detect relatively large heads and helmets. We use the other base algorithm to detect relatively small heads, and we add another convolutional neural network to detect whether there is a helmet above each head. Then, we integrate these two base algorithms with an ensemble method. In this method, we first propose an approach to merge information of heads and helmets from the base algorithms, and then propose a linear function to estimate the confidence score of the identified heads and helmets. Experiments on a benchmark data set show that our approach increases the precision and recall for base algorithms, and the mean Average Precision of our approach is 0.93, which is better than many other approaches. With GPU acceleration, our approach can achieve real-time processing on contemporary computers, which is useful in practice.

12.
Article in English | MEDLINE | ID: mdl-30475727

ABSTRACT

Transcription factors (TFs) are the major components of human gene regulation. In particular, they bind onto specific DNA sequences and regulate neighborhood genes in different tissues at different developmental stages. Non-synonymous single nucleotide polymorphisms on their protein-coding sequences could result in undesired consequences in humans. Therefore, it is necessary to develop methods for predicting any abnormality among those non-synonymous single nucleotide polymorphisms. To address this, we have developed and compared different strategies to predict deleterious non-synonymous single nucleotide polymorphisms (also known as missense mutations) on the protein-coding sequences of human TFs. Taking advantage of evolutionary conservation signals, we have developed and compared different classifiers with different feature sets as computed from different evolutionarily related sequence collections. The results indicate that the classic ensemble algorithm, Adaboost with decision stumps, with orthologous sequence collection, has performed the best (namely, TFmedic). We have further compared TFmedic with other state-of-the-art methods (i.e., PolyPhen-2 and SIFT) on PolyPhen-2's own datasets, demonstrating that TFmedic can outperform the others. As applications, we have further applied TFmedic to all possible missense mutations on all human transcription factors; the proteome-wide results reveal interesting insights, consistent with the existing physicochemical knowledge. A case study with the actual 3D structure is conducted, revealing how TFmedic can contribute to protein-DNA binding complex studies.


Subjects
Machine Learning , Polymorphism, Single Nucleotide/genetics , Transcription Factors/genetics , Algorithms , Computational Biology , Data Mining , Humans , Mutation, Missense/genetics
13.
iScience ; 15: 332-341, 2019 May 31.
Article in English | MEDLINE | ID: mdl-31103852

ABSTRACT

The early detection of cancers has the potential to save many lives. A recent attempt has been demonstrated to be successful. However, we note several critical limitations. Given the central importance and broad impact of early cancer detection, we aspire to address those limitations. We explore different supervised learning approaches for multiple cancer type detection and observe significant improvements; for instance, one of our approaches (i.e., CancerA1DE) can double the existing sensitivity from 38% to 77% for the earliest cancer detection (i.e., Stage I) at the 99% specificity level. For Stage II, it can even reach up to about 90% across multiple cancer types. In addition, CancerA1DE can also double the existing sensitivity from 30% to 70% for detecting breast cancers at the 99% specificity level. Data and model analysis are conducted to reveal the underlying reasons. A website is built at http://cancer.cs.cityu.edu.hk/.
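
The results above are stated as sensitivity "at the 99% specificity level". A minimal sketch of how such a figure can be computed from classifier scores follows: pick the lowest threshold that keeps at least 99% of the healthy samples below or at it, then count the cancer samples above it. The function name, the score convention (higher means more likely cancer) and the threshold rule are assumptions, not the paper's evaluation code.

```python
import math


def sensitivity_at_specificity(pos_scores, neg_scores, specificity=0.99):
    """Sensitivity at the lowest threshold meeting the target specificity.

    A sample is called positive when its score is strictly above the
    threshold. The threshold is chosen so that at least `specificity` of
    the negative samples score at or below it.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    neg_sorted = sorted(neg_scores)
    # Small epsilon guards against floating-point round-up in the product.
    needed = math.ceil(specificity * len(neg_sorted) - 1e-9)
    threshold = neg_sorted[needed - 1] if needed > 0 else float("-inf")
    true_positives = sum(1 for s in pos_scores if s > threshold)
    return true_positives / len(pos_scores)
```

With 100 negative scores 0 to 99 and a 99% target, the threshold lands on 98, so exactly one negative is misclassified and any positive scoring above 98 is detected.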

14.
IEEE Trans Cybern ; 47(2): 415-424, 2017 Feb.
Article in English | MEDLINE | ID: mdl-26887021

ABSTRACT

Protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner. In this paper, we describe the PBM motif model building problem. We apply several evolutionary computation methods and compare their performance with the interior point method, demonstrating their performance advantages. In addition, given the PBM domain knowledge, we propose and describe a novel method called kmerGA which makes domain-specific assumptions to exploit PBM data properties to build more accurate models than the other models built. The effectiveness and robustness of kmerGA are supported by comprehensive performance benchmarking on more than 200 datasets, time complexity analysis, convergence analysis, parameter analysis, and case studies. To demonstrate its utility further, kmerGA is applied to two real world applications: 1) PBM rotation testing and 2) ChIP-Seq peak sequence prediction. The results support the biological relevance of the models learned by kmerGA, and thus its real world applicability.


Subjects
Binding Sites , Computational Biology/methods , Protein Array Analysis/methods , Transcription Factors/chemistry , Transcription Factors/metabolism , Algorithms , Databases, Protein , Models, Theoretical , Protein Binding , Transcription Factors/genetics
15.
IEEE Trans Nanobioscience ; 16(1): 43-50, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27893398

ABSTRACT

Understanding genome-wide protein-DNA interaction signals forms the basis for further focused studies in gene regulation. In particular, the chromatin immunoprecipitation with massively parallel DNA sequencing technology (ChIP-Seq) can enable us to measure the in vivo genome-wide occupancy of the DNA-binding protein of interest in a single run. Multiple ChIP-Seq runs thus have the inherent potential to let us decipher the combinatorial occupancies of multiple DNA-binding proteins. To handle the genome-wide signal profiles from those multiple runs, we propose to integrate regularized regression functions (i.e., LASSO, Elastic Net, and Ridge Regression) into the well-established SignalRanker and FullSignalRanker frameworks, resulting in six additional probabilistic models for inference on multiple normalized genome-wide signal profiles. The corresponding model training algorithms are devised with computational complexity analysis. Comprehensive benchmarking is conducted to demonstrate and compare the performance of nine related probabilistic models on the ENCODE ChIP-Seq datasets. The results indicate that the regularized SignalRanker models, in contrast to the original SignalRanker models, can demonstrate excellent inference performance comparable to the FullSignalRanker models with low model complexities and time complexities. Such a feature is especially valuable in the context of the rapidly growing genome-wide signal profile data in recent years.


Subjects
Chromatin Immunoprecipitation/methods , Genomics/methods , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Algorithms , Genome/genetics , High-Throughput Nucleotide Sequencing , Humans , K562 Cells
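
The three regularizers named in the abstract above differ only in their penalty term. The textbook forms are sketched below for a generic linear model. The notation ($X$, $y$, $w$, $\lambda$, $\alpha$, $n$) is assumed here for illustration, and the abstract does not state how the penalties are embedded in the SignalRanker objectives.

```latex
\[
  \hat{w} = \arg\min_{w} \; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \lambda\, P(w),
\]
\[
  P_{\text{Ridge}}(w) = \tfrac{1}{2}\lVert w \rVert_2^2, \qquad
  P_{\text{LASSO}}(w) = \lVert w \rVert_1, \qquad
  P_{\text{Elastic Net}}(w) = \alpha \lVert w \rVert_1 + \tfrac{1-\alpha}{2}\lVert w \rVert_2^2 .
\]
```

With $0 \le \alpha \le 1$, the Elastic Net penalty reduces to LASSO at $\alpha = 1$ and to Ridge at $\alpha = 0$. The $\ell_1$ term is what drives coefficients exactly to zero, which is one common route to lower model complexity.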
16.
Article in English | MEDLINE | ID: mdl-27045826

ABSTRACT

Transcription factor binding sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8 to 10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build TFBS (also known as DNA motif) models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement when choosing di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating their biological applicability.


Subjects
Binding Sites , DNA/chemistry , DNA/metabolism , Nucleotide Motifs , Protein Array Analysis/methods , Transcription Factors/metabolism , Algorithms , Chromatin Immunoprecipitation , Computational Biology/methods , Protein Binding , Transcription Factors/chemistry
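
To give a sense of scale for "all possible DNA k-mers" in the abstract above: over the four-letter DNA alphabet there are 4^k distinct k-mers, so 65,536 for k = 8 and 1,048,576 for k = 10. The short sketch below enumerates them lazily. The function name is hypothetical, and strand symmetry (reverse complements) is deliberately ignored.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Lazily yield every string of length k over the given alphabet."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return ("".join(letters) for letters in product(alphabet, repeat=k))


if __name__ == "__main__":
    print(list(all_kmers(2))[:5])
    print(sum(1 for _ in all_kmers(8)))
```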
17.
Bioinformatics ; 32(3): 321-4, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26411866

ABSTRACT

MOTIVATION: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, as the first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in humans. RESULTS: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found statistically enriched on the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. AVAILABILITY AND IMPLEMENTATION: The identified motif pair data is compressed and available in the supplementary materials associated with this manuscript. CONTACT: kc.w@cityu.edu.hk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin/chemistry , Regulatory Elements, Transcriptional , Sequence Analysis, DNA/methods , Binding Sites , Chromatin/metabolism , Deoxyribonuclease I , Genomics , Humans , K562 Cells , Nucleotide Motifs , Promoter Regions, Genetic , Transcription Factors/metabolism
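
The abstract above defines motif pairing multiplicity as the number of motifs paired with a given motif on chromatin interactions. A minimal sketch of that count follows, assuming the pairs are available as unordered tuples of motif identifiers. The identifiers and the function name are hypothetical, and repeated pairs are counted once.

```python
from collections import defaultdict


def pairing_multiplicity(motif_pairs):
    """Map each motif to the number of distinct motifs it is paired with."""
    partners = defaultdict(set)
    for a, b in motif_pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(others) for motif, others in partners.items()}


if __name__ == "__main__":
    # Hypothetical motif identifiers for illustration only.
    pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M1"), ("M3", "M4")]
    print(pairing_multiplicity(pairs))
```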
18.
Article in English | MEDLINE | ID: mdl-26671811

ABSTRACT

With the prevalence of chromatin immunoprecipitation (ChIP) with sequencing (ChIP-Seq) technology, massive ChIP-Seq data has been accumulated. The ChIP-Seq technology measures the genome-wide occupancy of DNA-binding proteins in vivo. It is well-known that different DNA-binding protein occupancies may result in a gene being regulated in different conditions (e.g. different cell types). To fully understand a gene's function, it is essential to develop probabilistic models on multiple ChIP-Seq profiles for deciphering the gene transcription causalities. In this work, we propose and describe two probabilistic models. Assuming the conditional independence of different DNA-binding proteins' occupancies, the first method (SignalRanker) is developed as an intuitive method for ChIP-Seq genome-wide signal profile inference. Unfortunately, such an assumption may not always hold in some gene regulation cases. Thus, we propose and describe another method (FullSignalRanker) which does not make the conditional independence assumption. The proposed methods are compared with other existing methods on ENCODE ChIP-Seq datasets, demonstrating their regression and classification ability. The results suggest that FullSignalRanker is the best-performing method for recovering the signal ranks on the promoter and enhancer regions. In addition, FullSignalRanker is also the best-performing method for peak sequence classification. We envision that SignalRanker and FullSignalRanker will become important in the era of next generation sequencing. The FullSignalRanker program is available on the following website: http://www.cs.toronto.edu/~wkc/FullSignalRanker/.


Subjects
DNA/chemistry , DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Transcription Factors/chemistry , Transcription Factors/genetics , Algorithms , Base Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Pattern Recognition, Automated , Reference Values , Reproducibility of Results , Sensitivity and Specificity
19.
Nucleic Acids Res ; 43(21): 10180-9, 2015 Dec 02.
Article in English | MEDLINE | ID: mdl-26527718

ABSTRACT

The protein-DNA interactions between transcription factors and transcription factor binding sites are essential activities in gene regulation. To decipher the binding codes, it is a long-standing challenge to understand the binding mechanism across different transcription factor DNA binding families. Past computational learning studies usually focus on learning and predicting the DNA binding residues on the protein side. Taking into account both sides (protein and DNA), we propose and describe a computational study for learning the specificity-determining residue-nucleotide interactions of different known DNA-binding domain families. The proposed learning models are compared to state-of-the-art models comprehensively, demonstrating their competitive learning performance. In addition, we propose and describe two applications which demonstrate how the learnt models can provide meaningful insights into protein-DNA interactions across different DNA binding families.


Subjects
DNA-Binding Proteins/chemistry , DNA/chemistry , Sequence Analysis, Protein/methods , Binding Sites , Computational Biology/methods , DNA/metabolism , DNA-Binding Proteins/metabolism , Humans , Machine Learning , Models, Molecular , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , Protein Structure, Tertiary , Sequence Analysis, DNA
20.
Bioinformatics ; 31(1): 17-24, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25192742

ABSTRACT

MOTIVATION: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-Seq) measures the genome-wide occupancy of transcription factors in vivo. Different combinations of DNA-binding protein occupancies may result in a gene being expressed in different tissues or at different developmental stages. To fully understand the functions of genes, it is essential to develop probabilistic models on multiple ChIP-Seq profiles to decipher the combinatorial regulatory mechanisms by multiple transcription factors. RESULTS: In this work, we describe a probabilistic model (SignalSpider) to decipher the combinatorial binding events of multiple transcription factors. Compared with similar existing methods, we found SignalSpider performs better in clustering promoter and enhancer regions. Notably, SignalSpider can learn higher-order combinatorial patterns from multiple ChIP-Seq profiles. We have applied SignalSpider on the normalized ChIP-Seq profiles from the ENCODE consortium and learned model instances. We observed different higher-order enrichment and depletion patterns across sets of proteins. Those clustering patterns are supported by Gene Ontology (GO) enrichment, evolutionary conservation and chromatin interaction enrichment, offering biological insights for further focused studies. We also proposed a specific enrichment map visualization method to reveal the genome-wide transcription factor combinatorial patterns from the models built, which extend our existing fine-scale knowledge on gene regulation to a genome-wide level. AVAILABILITY AND IMPLEMENTATION: The matrix-algebra-optimized executables and source code are available at the authors' websites: http://www.cs.toronto.edu/~wkc/SignalSpider.


Subjects
Chromatin Immunoprecipitation , Computational Biology/methods , DNA-Binding Proteins/metabolism , High-Throughput Nucleotide Sequencing/methods , Pattern Recognition, Automated , Software , Transcription Factors/metabolism , Cluster Analysis , DNA-Binding Proteins/genetics , Humans , Regulatory Sequences, Nucleic Acid , Transcription Factors/genetics