1.
Bioinformatics ; 34(20): 3446-3453, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-29757349

ABSTRACT

Motivation: Transcription factors (TFs) bind to the promoter region of a gene to control gene expression. Identifying precise TF binding sites (TFBSs) is essential for understanding the detailed mechanisms of TF-mediated gene regulation. However, there is a shortage of computational approaches that can predict TFBSs at single-base-pair resolution. Results: In this paper, we propose DeepSNR, a Deep Learning algorithm for predicting TF binding location at Single Nucleotide Resolution de novo from DNA sequence. DeepSNR adopts a novel deconvolutional network (deconvNet) model and is inspired by the similarity to image segmentation by deconvNet. The proposed deconvNet architecture is constructed on top of 'DeepBind', and we trained the entire model using TF-specific data from ChIP-exonuclease (ChIP-exo) experiments. DeepSNR has been shown to outperform motif search-based methods on several evaluation metrics. We have also demonstrated the usefulness of DeepSNR in the regulatory analysis of TFBSs as well as in improving TFBS prediction specificity using ChIP-seq data. Availability and implementation: DeepSNR is available as open source in the GitHub repository (https://github.com/sirajulsalekin/DeepSNR). Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Algorithms , Base Pairing , Binding Sites , Humans , Protein Binding , Software , Transcription Factors/chemistry
2.
BMC Bioinformatics ; 18(1): 313, 2017 Jun 23.
Article in English | MEDLINE | ID: mdl-28645323

ABSTRACT

BACKGROUND: Identifying disease-correlated features early, before a large number of molecules show significant abundance changes due to disease progression, is very advantageous to biologists developing biomarkers for early disease diagnosis. Disease-correlated features have a relatively low level of abundance change at early stages. Finding them in high-throughput data using existing bioinformatic tools is a challenging task, since the technology suffers from limited dynamic range and significant noise. Most existing biomarker discovery algorithms can only detect molecules with high abundance changes, frequently missing early disease diagnostic markers. RESULTS: We present a new statistic called the early response index (ERI) to prioritize disease-correlated molecules as potential early biomarkers. Instead of classification accuracy, ERI measures the average classification accuracy improvement attainable by a feature when it is combined with other features for classification. ERI is more sensitive to abundance changes than other ranking statistics. We have shown that ERI significantly outperforms SAM and Localfdr in detecting early responding molecules in a proteomics study of a mouse model of multiple sclerosis. Importantly, ERI was able to detect many disease-relevant proteins before those algorithms detected them at a later time point. CONCLUSIONS: The ERI method is more sensitive for significant feature detection during the early stage of disease development. It potentially has a higher specificity for biomarker discovery, and can be used to identify the critical time frame for disease intervention.


Subjects
Biomarkers/metabolism , Multiple Sclerosis/diagnosis , Proteomics/methods , Algorithms , Animals , Central Nervous System/metabolism , Early Diagnosis , Mice , Multiple Sclerosis/metabolism , Multiple Sclerosis/pathology , Proteome/metabolism , Time Factors
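
The idea of scoring a feature by the average accuracy gain it brings when paired with other features can be sketched as follows. This is only an illustration of that one sentence of the abstract, not the published ERI formula: the nearest-centroid classifier, the leave-one-out evaluation, the pairwise combination scheme, and the names `loo_accuracy` and `improvement_index` are all assumptions made for the example.

```python
def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`.

    Assumes every class has at least two samples, so a centroid can still be
    computed when one sample is held out.
    """
    labels = sorted(set(y))  # fixed order makes tie-breaking deterministic
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in labels:
            # Centroid of this class, computed without the held-out sample i.
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(r[c] for r in rows) / len(rows) for c in cols]
            dist[label] = sum((X[i][c] - m) ** 2 for c, m in zip(cols, centroid))
        if min(dist, key=dist.get) == y[i]:
            correct += 1
    return correct / len(X)


def improvement_index(X, y, f):
    """Average accuracy gain from adding feature `f` to each other single feature."""
    others = [g for g in range(len(X[0])) if g != f]
    gains = [loo_accuracy(X, y, [f, g]) - loo_accuracy(X, y, [g]) for g in others]
    return sum(gains) / len(gains)


# Toy data: feature 0 separates the classes, feature 1 carries no information.
X = [[0, 0], [0, 0], [0, 0], [10, 0], [10, 0], [10, 0]]
y = ["a", "a", "a", "b", "b", "b"]
print(improvement_index(X, y, 0), improvement_index(X, y, 1))
```

In this toy case the informative feature receives a positive score and the uninformative one receives zero, which is the ranking behaviour the abstract describes.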
3.
Rapid Commun Mass Spectrom ; 29(19): 1841-8, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26331936

ABSTRACT

RATIONALE: Without accurate peak linking/alignment, only the expression levels of a small percentage of proteins can be compared across multiple samples in Liquid Chromatography/Mass Spectrometry/Tandem Mass Spectrometry (LC/MS/MS) due to the selective nature of tandem MS peptide identification. This greatly hampers biomedical research that aims at finding biomarkers for disease diagnosis, treatment, and the understanding of disease mechanisms. A recent algorithm, PeakLink, has allowed the accurate linking of LC/MS peaks without tandem MS identifications to their corresponding peaks with identifications across multiple samples collected from different instruments, tissues and labs, which greatly enhances the ability to compare proteins. However, PeakLink cannot be implemented practically for large numbers of samples based on existing software architectures, because it requires access to peak elution profiles from multiple LC/MS/MS samples simultaneously. METHODS: We propose a new architecture based on parallel processing, which extracts LC/MS peak features and saves them in database files to enable the implementation of PeakLink for multiple samples. The software has been deployed in High-Performance Computing (HPC) environments. The core part of the software, MZDASoft Parallel Peak Extractor (PPE), can be downloaded with a user and developer's guide, and it can be run on HPC centers directly. The quantification applications, MZDASoft TandemQuant and MZDASoft PeakLink, are written in Matlab and compiled with a Matlab runtime compiler. A sample script that incorporates all necessary processing steps of MZDASoft for LC/MS/MS quantification in a parallel processing environment is available. The project webpage is http://compgenomics.utsa.edu/zgroup/MZDASoft. RESULTS: The proposed architecture enables the implementation of PeakLink for multiple samples. Significantly more (100%-500%) proteins can be compared over multiple samples with better quantification accuracy in test cases. CONCLUSION: MZDASoft enables large-scale comparison of protein expression levels over multiple samples with much larger protein comparison coverage and better quantification accuracy. It is an efficient implementation based on parallel processing which can be used to process large amounts of data.


Subjects
Chromatography, Liquid/methods , Proteomics/methods , Software , Tandem Mass Spectrometry/methods , Algorithms , Proteins/analysis
4.
Proteome Sci ; 9 Suppl 1: S9, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-22166063

ABSTRACT

BACKGROUND: Transcriptional regulation by transcription factors (TFs) controls the timing and abundance of mRNA transcription. Due to the limitations of current proteomics technologies, large-scale measurements of the protein-level activities of TFs are usually infeasible, making computational reconstruction of transcriptional regulatory networks a difficult task. RESULTS: We propose here a novel Bayesian non-negative factor model for TF-mediated regulatory networks. In particular, the non-negative TF activities and the sample clustering effect are modeled as the factors from a Dirichlet process mixture of rectified Gaussian distributions, and the sparse regulatory coefficients are modeled as the loadings from a sparse distribution that constrains its sparsity using knowledge from databases; meanwhile, a Gibbs sampling solution was developed to infer the underlying network structure and the unknown TF activities simultaneously. The developed approach has been applied to a simulated system and to breast cancer gene expression data. The results show that the proposed method was able to systematically uncover the TF-mediated transcriptional regulatory network structure, the regulatory coefficients, the TF protein-level activities and the sample clustering effect. The regulation target prediction result is highly consistent with the prior knowledge, and the sample clustering result shows superior performance over a previous molecular-based clustering method. CONCLUSIONS: The results demonstrated the validity and effectiveness of the proposed approach in reconstructing transcriptional networks mediated by TFs through simulated systems and real data.

5.
Front Phys ; 8, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33274189

ABSTRACT

The epitranscriptome is an exciting area that studies different types of modifications in transcripts, and the prediction of such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications imposes critical challenges for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN) based model, which is trained in an unsupervised fashion on entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences. MR-GAN was then applied to extract embeddings of the sequences in a training dataset we created for eight epitranscriptome modifications, including m6A, m1A, m1G, m2G, m5C, m5U, 2'-O-Me, Pseudouridine (Ψ) and Dihydrouridine (D), of which the positive samples are very limited. Prediction models were trained based on the embeddings extracted by MR-GAN. We compared the prediction performance with the one-hot encoding of the training sequences and with SRAMP, a state-of-the-art m6A site prediction algorithm, and demonstrated that the learned embeddings outperform one-hot encoding by a significant margin, with up to 15% improvement. Using MR-GAN, we also investigated the sequence motifs for each modification type and uncovered known motifs as well as new motifs that could not be obtained from the sequences directly. The results demonstrated that transcriptome features extracted using unsupervised learning could lead to high precision for predicting multiple types of epitranscriptome modifications, even when the data size is small and extremely imbalanced.
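
For readers unfamiliar with the one-hot baseline mentioned in the abstract, a minimal sketch is given here. The alphabet order, the mapping of T to U, and the all-zero row for unknown symbols are assumptions made for this example; the abstract does not state how the authors encoded their sequences, and the name `one_hot` is hypothetical.

```python
# Assumed alphabet order; the abstract does not specify one.
ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA (or DNA) sequence as a list of 4-element rows, one per base."""
    rows = []
    for base in seq.upper().replace("T", "U"):  # treat DNA T as RNA U (assumption)
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:  # unknown symbols such as N stay all-zero (assumption)
            row[idx] = 1
        rows.append(row)
    return rows


print(one_hot("GAN"))
```

Each position becomes an independent indicator vector, so this encoding carries no information about sequence context, which is the limitation that learned embeddings are meant to address.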

6.
Mol Inform ; 36(4), 2017 Apr.
Article in English | MEDLINE | ID: mdl-28000384

ABSTRACT

In the past decades, a few synergistic feature selection algorithms have been published, including Cooperative Index (CI) and K-Top Scoring Pair (k-TSP). These algorithms consider the synergistic behavior of features when they are included in a feature panel. Although promising results have been shown for these algorithms, there is a lack of a comprehensive and fair comparison with other feature selection algorithms across a large number of microarray datasets in terms of classification accuracy and computational complexity. There is a need to evaluate their performance and to reduce the complexity of such algorithms. We compared the performance of synergistic feature selection algorithms with 11 other commonly used algorithms on 22 binary-class microarray gene expression datasets. The evaluation confirms that synergistic algorithms such as CI and k-TSP gradually increase classification performance as more features are used in the classifiers. In addition, to cut down computational cost, we propose a new feature selection ranking score called Positive Synergy Index (PSI). Testing results show that features selected using PSI, as well as those selected by synergistic feature selection algorithms, provide better performance than all other methods, while PSI has a computational complexity significantly lower than that of other synergistic algorithms.


Subjects
Algorithms , Microarray Analysis , Humans , Neoplasms/metabolism , Neoplasms/pathology , Support Vector Machine
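
As background on how a pair-based method such as k-TSP can capture synergy between two features, the sketch below computes the standard top-scoring-pair score for one feature pair: the absolute difference between the two classes in the frequency with which feature i is smaller than feature j. It illustrates the general TSP idea only; it is not the CI or PSI score from the abstract, whose formulas are not given there, and the function name `tsp_score` and the 0/1 label convention are assumptions.

```python
def tsp_score(X, y, i, j):
    """Return |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.

    X is a list of samples (lists of feature values); y holds labels 0 or 1.
    Assumes both classes are present.
    """
    probs = []
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        probs.append(sum(row[i] < row[j] for row in rows) / len(rows))
    return abs(probs[0] - probs[1])


# Toy data: the ordering of features 0 and 1 flips between the two classes.
X = [[1, 2], [1, 3], [5, 2], [6, 1]]
y = [0, 0, 1, 1]
print(tsp_score(X, y, 0, 1))
```

A pair scores highly only when the relative ordering of the two features changes between classes, which neither feature need reveal on its own.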
7.
PLoS One ; 8(10): e72951, 2013.
Article in English | MEDLINE | ID: mdl-24115998

ABSTRACT

In liquid chromatography-mass spectrometry (LC-MS), parts of LC peaks are often corrupted by co-eluting peptides, which results in increased quantification variance. In this paper, we propose to apply accurate LC peak boundary detection to remove the corrupted part of LC peaks. Accurate LC peak boundary detection is achieved by checking the consistency of intensity patterns within peptide elution time ranges. In addition, we remove peptides with erroneous mass assignments through a model fitness check, which compares observed intensity patterns to theoretically constructed ones. The proposed algorithm can significantly improve the accuracy and precision of peptide ratio measurements.


Subjects
Chromatography, Liquid/methods , Mass Spectrometry/methods , Peptides/analysis , Algorithms
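
One simple way to compare an observed intensity pattern with a theoretically constructed one, as the model fitness check in the abstract does, is a cosine similarity with a cut-off. The sketch below is an assumption-laden illustration: the abstract does not name the similarity measure or threshold the authors used, and `cosine_similarity`, `passes_fitness_check` and the 0.9 default are hypothetical.

```python
import math


def cosine_similarity(observed, theoretical):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(
        sum(t * t for t in theoretical)
    )
    return dot / norm if norm else 0.0  # all-zero input counts as no match


def passes_fitness_check(observed, theoretical, threshold=0.9):
    """Keep a peptide only if its observed pattern resembles the model pattern.

    The threshold value is an arbitrary choice for this example.
    """
    return cosine_similarity(observed, theoretical) >= threshold


print(passes_fitness_check([100, 60, 20], [100, 58, 22]))
print(passes_fitness_check([100, 0, 0], [0, 0, 100]))
```

Because cosine similarity ignores overall scale, the check responds to the shape of the pattern rather than to absolute abundance.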