Results 1 - 17 of 17
1.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34607350

ABSTRACT

Identifying cis-regulatory motifs from genomic sequencing data (e.g. ChIP-seq and CLIP-seq) is crucial in identifying transcription factor (TF) binding sites and inferring gene regulatory mechanisms for any organism. Since 2015, deep learning (DL) methods have been widely applied to identify TF binding sites and predict motif patterns, with the strengths of offering a scalable, flexible and unified computational approach for highly accurate predictions. As far as we know, 20 DL methods have been developed. However, without a clear and systematic assessment, users will struggle to choose the most appropriate tool for their specific studies. In this manuscript, we evaluated 20 DL methods for cis-regulatory motif prediction using 690 ENCODE ChIP-seq, 126 cancer ChIP-seq and 55 RNA CLIP-seq data. Four metrics were investigated, including the accuracy of motif finding, the performance of DNA/RNA sequence classification, algorithm scalability and tool usability. The assessment results demonstrated the high complementarity of the existing DL methods. It was determined that the most suitable model should primarily depend on the data size and type and the method's outputs.


Subjects
Deep Learning , Algorithms , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation , Transcription Factors/genetics , Transcription Factors/metabolism
2.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36208174

ABSTRACT

Multiple types of non-canonical nucleic acid structures play essential roles in DNA recombination and replication, transcription, and genomic instability and have been associated with several human diseases. Thus, an increasing number of experimental and bioinformatics methods have been developed to identify these structures. To date, most reviews have focused on the features of non-canonical DNA/RNA structure formation, experimental approaches to mapping these structures, and the association of these structures with diseases. In addition, two reviews of computational algorithms for the prediction of non-canonical nucleic acid structures have been published. One of these reviews focused only on computational approaches for G4 detection until 2020. The other mainly summarized the computational tools for predicting cruciform, H-DNA and Z-DNA, in which the algorithms discussed were published before 2012. Since then, several experimental and computational methods have been developed. However, a systematic review including the conformation, sequencing mapping methods and computational prediction strategies for these structures has not yet been published. The purpose of this review is to provide an updated overview of conformation, current sequencing technologies and computational identification methods for non-canonical nucleic acid structures, as well as their strengths and weaknesses. We expect that this review will aid in understanding how these structures are characterised and how they contribute to related biological processes and diseases.


Subjects
G-Quadruplexes , Humans , RNA/genetics , RNA/chemistry , Nucleic Acid Conformation , R-Loop Structures , DNA/genetics
3.
J Theor Biol ; 463: 92-98, 2019 02 21.
Article in English | MEDLINE | ID: mdl-30528447

ABSTRACT

MOTIVATION: In vivo discovery of G-quadruplex-forming sequences would provide the most relevant G-quadruplexes along a genomic DNA or an RNA molecule; however, it is difficult to perform due to the small size of G-quadruplexes, the existence of different topologies, and the additional influence of environmental factors and ligands present during experimentation. In vitro discovery, on the other hand, is not only unable to simulate in vivo conditions but is also impractical for large sequences due to limited resources. The immediate solution continues to be computational prediction, although it is not always in agreement with experimental findings. This is often due to features that are not conventionally accepted for G-quadruplexes, such as disrupted G-tracts or extremely long loops. RESULTS: Here, we propose a novel tool for the discovery of putative G-quadruplexes with better accuracy through consideration of the features of previously missed G-quadruplex-forming sequences. Compared against a set of experimentally confirmed sequences, a sensitivity as high as 99% and a Youden's J statistic as high as 0.91 are achieved, an improvement over other computational approaches. More importantly, we showed that allowing a single atypical G-tract, which includes a mismatched or a bulging non-guanine nucleotide, and a single loop of extreme size benefits the overall prediction. AVAILABILITY AND IMPLEMENTATION: The Python code may be found at http://github.com/odoluca/G4Catchall and the web application at http://homes.ieu.edu.tr/odoluca/G4Catchall.


Subjects
Base Sequence , G-Quadruplexes , Software , Computational Biology/methods , DNA/chemistry , Nucleic Acid Conformation , Nucleotide Motifs
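
Illustrative sketch for record 3: pattern-based G-quadruplex prediction of the kind the abstract discusses can be approximated with a regular expression. The pattern below is only the conventional baseline (four runs of at least three guanines separated by loops of 1 to 7 nucleotides). It is not the G4Catchall expression, which according to the abstract additionally tolerates one atypical G-tract and one loop of extreme size. The function names and the counts passed to `youden_j` are hypothetical, chosen only for illustration.

```python
import re

# Conventional baseline pattern: four runs of >=3 guanines separated by
# loops of 1-7 nucleotides. Illustrative only; NOT the G4Catchall expression.
CANONICAL_G4 = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def find_putative_g4(seq):
    """Return (start, end, match) tuples for baseline G4 motifs on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in CANONICAL_G4.finditer(seq)]


def youden_j(tp, fn, tn, fp):
    """Youden's J statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


if __name__ == "__main__":
    # Toy sequence and made-up confusion-matrix counts (not from the paper).
    print(find_putative_g4("TTGGGAGGGTGGGTAGGGAA"))
    print(youden_j(tp=90, fn=10, tn=80, fp=20))
```

Only the forward strand is searched here; a fuller scan would also search the reverse complement (or an equivalent C-rich pattern).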
4.
Subcell Biochem ; 89: 139-155, 2018.
Article in English | MEDLINE | ID: mdl-30378022

ABSTRACT

Fungal peroxisomes are characterized by a number of specific biological functions. To understand the physiology and biochemistry of these organelles, knowledge of the proteome content is crucial. Here, we address different strategies to predict peroxisomal proteins by bioinformatics approaches. These tools range from simple text searches to network-based learning strategies. A complication of this analysis is the existence of cryptic peroxisomal proteins, which are overlooked in conventional bioinformatics queries. These include proteins where targeting information results from transcriptional and posttranscriptional alterations. In addition, proteins with low-efficiency targeting motifs that are predominantly localized in the cytosol, and proteins lacking any canonical targeting information, can play important roles within peroxisomes. Many of these proteins are so far unpredictable. Detection and characterization of these cryptic peroxisomal proteins revealed the presence of novel peroxisomal enzymatic reaction networks in fungi.


Subjects
Fungal Proteins/metabolism , Fungi/chemistry , Fungi/cytology , Peroxisomes/metabolism , Proteomics , Fungi/enzymology , Peroxisomes/chemistry , Peroxisomes/enzymology , Protein Transport , Proteome/chemistry , Proteome/metabolism
5.
Adv Exp Med Biol ; 870: 291-318, 2015.
Article in English | MEDLINE | ID: mdl-26387106

ABSTRACT

Short, linear motifs (SLiMs) in proteins are functional microdomains consisting of contiguous residue segments along the protein sequence, typically not more than 10 consecutive amino acids in length with fewer than 5 defined positions. Many positions are 'degenerate', thus offering flexibility in terms of the amino acid types allowed at those positions. Their short length and degenerate nature confer evolutionary plasticity, meaning that SLiMs often evolve convergently. Further, SLiMs have a propensity to occur within intrinsically unstructured protein segments, and this confers versatile functionality to unstructured regions of the proteome. SLiMs mediate multiple types of protein interactions based on domain-peptide recognition and guide functions including posttranslational modifications, subcellular localization of proteins, and ligand binding. SLiMs thus behave as modular interaction units that confer versatility to protein function, and SLiM-mediated interactions are increasingly being recognized as therapeutic targets. In this chapter we start with a brief description of the properties of SLiMs and their interactions and then move on to discuss algorithms and tools, including several web-based methods, that enable the discovery of novel SLiMs (de novo motif discovery) as well as the prediction of novel occurrences of known SLiMs. Individual amino acid sequences as well as sets of protein sequences can be scanned using these methods to obtain statistically overrepresented sequence patterns. Lists of putatively functional SLiMs are then assembled based on parameters such as evolutionary sequence conservation, disorder scores, structural data, gene ontology terms and other contextual information that helps to assess the functional credibility or significance of these motifs. These bioinformatics methods should certainly guide experiments aimed at motif discovery.


Subjects
Amino Acid Motifs , Computational Biology , Intrinsically Disordered Proteins/chemistry , Algorithms , Amino Acid Sequence , Molecular Sequence Data , Protein Conformation , Sequence Homology, Amino Acid
6.
J Integr Plant Biol ; 56(10): 1020-31, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24783971

ABSTRACT

The RNA-binding glycine-rich protein (RB-GRP) family is characterized by the presence of a glycine-rich domain arranged in (Gly)n-X repeats and an RNA-recognition motif (RRM). RB-GRPs participate in varied physiological and biochemical processes especially in the stress response of plants. In this study, a total of 23 RB-GRPs distributed on 10 chromosomes were identified in maize (Zea mays L.), and they were divided into four subgroups according to their conserved domain architecture. Five pairs of paralogs were identified, while none of them was located on the same chromosomal region, suggesting that segmental duplication is predominant in the duplication events of the RB-GRPs in maize. Comparative analysis of RB-GRPs in maize, Arabidopsis (Arabidopsis thaliana L.), rice (Oryza sativa L.), and wheat (Triticum aestivum) revealed that two exclusive subgroups were only identified in maize. Expression of eight ZmRB-GRPs was significantly regulated by at least two kinds of stresses. In addition, cis-elements predicted in the promoter regions of the ZmRB-GRPs also indicated that these ZmRB-GRPs would be involved in stress response of maize. The preliminary genome-wide analysis of the RB-GRPs in maize would provide useful information for further study on the function of the ZmRB-GRPs.


Subjects
Plant Proteins/genetics , RNA-Binding Proteins/genetics , Zea mays/genetics , Chromosome Mapping , Chromosomes, Plant , Evolution, Molecular , Gene Duplication , Genome, Plant , Multigene Family , Phylogeny , Plant Proteins/metabolism , Promoter Regions, Genetic , RNA-Binding Proteins/metabolism , Stress, Physiological , Zea mays/metabolism
7.
Bio Protoc ; 14(13): e5023, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39007158

ABSTRACT

In recent years, the increase in genome sequencing across diverse plant species has provided a significant advantage for phylogenomics studies, allowing the analysis of one of the most diverse gene families in plants: nucleotide-binding leucine-rich repeat receptors (NLRs). However, due to the sequence diversity of the NLR gene family, identifying key molecular features and functionally conserved sequence patterns is challenging through multiple sequence alignment. Here, we present a step-by-step protocol for a computational pipeline designed to identify evolutionarily conserved motifs in plant NLR proteins. In this protocol, we use a large-scale NLR dataset, including 1,862 NLR genes annotated from monocot and dicot species, to predict conserved sequence motifs, such as the MADA and EDVID motifs, within the coiled-coil (CC)-NLR subfamily. Our pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species. Key features • Phylogenomics analysis of plant NLR immune receptor family. • Identification of functionally conserved sequence patterns among plant NLRs.

8.
Biology (Basel) ; 13(8)2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39194501

ABSTRACT

Small open reading frames (sORFs; <300 nucleotides or <100 amino acids) are widespread across all genomes, and an increasing variety of them appear to be translating from non-genic regions. Over the past few decades, peptides produced from sORFs have been identified as functional in various organisms, from bacteria to humans. Despite recent advances in next-generation sequencing and proteomics, accurate annotation and classification of sORFs remain a rate-limiting step toward reliable and high-throughput detection of small proteins from non-genic regions. Additionally, the cost of computational methods utilizing machine learning is lower than that of biological experiments, and they can be employed to detect sORFs, laying the groundwork for biological experiments. We present D-sORF, a machine-learning framework that integrates the statistical nucleotide context and motif information around the start codon to predict coding sORFs. D-sORF scores directly for coding identity and requires only the underlying genomic sequence, without incorporating parameters such as the conservation, which, in the case of sORFs, may increase the dispersion of scores within the significantly less conserved non-genic regions. D-sORF achieves 94.74% precision and 92.37% accuracy for small ORFs (using the 99 nt medium length window). When D-sORF is applied to sORFs associated with ribosomes, the identification of transcripts producing peptides (annotated by the Ensembl IDs) is similar to or superior to experimental methodologies based on ribosome-sequencing (Ribo-Seq) profiling. In parallel, the recognition of putative negative data, such as the intron-containing transcripts that associate with ribosomes, remains remarkably low, indicating that D-sORF could be efficiently applied to filter out false-positive sORFs from Ribo-Seq data because of the non-productive ribosomal binding or noise inherent in these protocols.
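
Illustrative sketch for record 8: before any classifier can score candidates, small ORFs have to be enumerated and the sequence around each start codon extracted. The code below shows one simple way to do that under stated assumptions: only the forward strand is scanned, only ATG is treated as a start codon, the length cutoff counts the stop codon, and the `flank` window size is arbitrary. It is not the D-sORF model, and `find_sorfs` is a hypothetical helper name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_nt=300, flank=10):
    """Enumerate forward-strand ORFs (ATG..stop) shorter than max_nt.

    Returns one dict per ORF with its coordinates and the sequence window
    around the start codon (flank nt on each side, clipped at sequence ends).
    """
    seq = seq.upper()
    results = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3  # end coordinate includes the stop codon
                if end - start < max_nt:
                    results.append({
                        "start": start,
                        "end": end,
                        "context": seq[max(0, start - flank):start + 3 + flank],
                    })
                break
    return results
```

Every ATG is reported, so an in-frame internal ATG yields a nested ORF; whether to keep only the longest ORF per stop codon is a design choice left open here.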

9.
Bioinformation ; 18(1): 19-25, 2022.
Article in English | MEDLINE | ID: mdl-35815200

ABSTRACT

Hepatitis E virus (HEV) is a major causative agent of acute hepatitis in developing countries. The Norway rat HEV genome consists of six open reading frames (ORFs), i.e., ORF1, ORF2, ORF3, ORF4, ORF5 and ORF6. The protein encoded by the additional reading frame ORF5 is attributed to the life cycle of rat HEV. The ORF5 protein's function remains undetermined. Therefore, it is of interest to analyze the ORF5 protein for its physicochemical properties, primary structure, secondary structure, tertiary structure and functional characteristics using bioinformatics tools. Analysis of the ORF5 protein revealed it to be highly unstable and hydrophilic, with a basic pI. The ORF5 protein consisted mostly of Arg, Pro, Ser, Leu and Gly. The 3D structural homology model of the ORF5 protein showed a mixed α/β structural fold with a predominance of coils. Structural analysis revealed the presence of clefts, pores and a tunnel. These data will help in the sequence, structure and functional annotation of ORF5.

10.
Viruses ; 15(1)2022 12 29.
Article in English | MEDLINE | ID: mdl-36680141

ABSTRACT

SARS-CoV-2 Omicron (B.1.1.529) lineages rapidly became dominant in various countries reflecting its enhanced transmissibility and ability to escape neutralizing antibodies. Although T cells induced by ancestral SARS-CoV-2-based vaccines also recognize Omicron variants, we showed in our previous study that there was a marked loss of T cell cross-reactivity to spike epitopes harboring Omicron BA.1 mutations. The emerging BA.4/BA.5 subvariants carry other spike mutations than the BA.1 variant. The present study aims to investigate the impact of BA.4/BA.5 spike mutations on T cell cross-reactivity at the epitope level. Here, we focused on universal T-helper epitopes predicted to be presented by multiple common HLA class II molecules for broad population coverage. Fifteen universal T-helper epitopes of ancestral spike, which contain mutations in the Omicron BA.4/BA.5 variants, were identified utilizing a bioinformatic tool. T cells isolated from 10 subjects, who were recently vaccinated with mRNA-based BNT162b2, were tested for functional cross-reactivity between epitopes of ancestral SARS-CoV-2 spike and the Omicron BA.4/BA.5 spike counterparts. Reduced T cell cross-reactivity in one or more vaccinees was observed against 87% of the tested 15 non-conserved CD4+ T cell epitopes. These results should be considered for vaccine boosting strategies to protect against Omicron BA.4/BA.5 and future SARS-CoV-2 variants.


Subjects
BNT162 Vaccine , COVID-19 , Humans , COVID-19/prevention & control , SARS-CoV-2/genetics , T-Lymphocytes , Mutation , Antibodies, Neutralizing , COVID-19 Vaccines , Epitopes, T-Lymphocyte/genetics , Spike Glycoprotein, Coronavirus/genetics , Antibodies, Viral
11.
Viruses ; 14(7)2022 07 19.
Article in English | MEDLINE | ID: mdl-35891550

ABSTRACT

The Omicron BA.1 variant can readily infect people with vaccine-induced or naturally acquired SARS-CoV-2 immunity, facilitated by escape from neutralizing antibodies. In contrast, T-cell reactivity against the Omicron BA.1 variant seems relatively well preserved. Here, we studied the preexisting T cells elicited by either vaccination with the mRNA-based BNT162b2 vaccine or by natural infection with ancestral SARS-CoV-2 for their cross-reactive potential to 20 selected CD4+ T-cell epitopes of the spike protein harboring Omicron BA.1 mutations. Although the overall memory CD4+ T-cell responses primed by the ancestral spike protein were still generally preserved, we show here that there is also a clear loss of memory CD4+ T-cell cross-reactivity to immunodominant epitopes across the spike protein due to Omicron BA.1 mutations. Complete or partial loss of preexisting T-cell responsiveness was observed against 60% of 20 nonconserved CD4+ T-cell epitopes predicted to be presented by a broad set of common HLA class II alleles. Monitoring such mutations in circulating strains helps predict which virus variants may escape previously induced cellular immunity and could be of concern.


Subjects
COVID-19 , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , T-Lymphocytes , Antibodies, Neutralizing , Antibodies, Viral , BNT162 Vaccine , COVID-19/immunology , COVID-19/prevention & control , Epitopes, T-Lymphocyte/genetics , Humans , Membrane Glycoproteins , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , T-Lymphocytes/immunology , Viral Envelope Proteins/genetics
12.
mSystems ; 6(4): e0052621, 2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34254822

ABSTRACT

Much of our knowledge of bacterial transcription initiation has been derived from studying the promoters of Escherichia coli and Bacillus subtilis. Given the expansive diversity across the bacterial phylogeny, it is unclear how much of this knowledge can be applied to other organisms. Here, we report on bioinformatic analyses of promoter sequences of the primary σ factor (σ70) by leveraging publicly available transcription start site (TSS) sequencing data sets for nine bacterial species spanning five phyla. This analysis identifies previously unreported differences in the -35 and -10 elements of σ70-dependent promoters in several groups of bacteria. We found that Actinobacteria and Betaproteobacteria σ70-dependent promoters lack the TTG triad in their -35 element, which is predicted to be conserved across the bacterial phyla. In addition, the majority of the Alphaproteobacteria σ70-dependent promoters analyzed lacked the thymine at position -7 that is highly conserved in other phyla. Bioinformatic examination of the Alphaproteobacteria σ70-dependent promoters identifies a significant overrepresentation of essential genes and ones encoding proteins with common cellular functions downstream of promoters containing an A, C, or G at position -7. We propose that transcription of many σ70-dependent promoters in Alphaproteobacteria depends on the transcription factor CarD, which is an essential protein in several members of this phylum. Our analysis expands the knowledge of promoter architecture across the bacterial phylogeny and provides new information that can be used to engineer bacteria for use in medical, environmental, agricultural, and biotechnological processes. IMPORTANCE Transcription of DNA to RNA by RNA polymerase is essential for cells to grow, develop, and respond to stress. Understanding the process and control of transcription is important for health, disease, the environment, and biotechnology. 
Decades of research on a few bacteria have identified promoter DNA sequences that are recognized by the σ subunit of RNA polymerase. We used bioinformatic analyses to reveal previously unreported differences in promoter DNA sequences across the bacterial phylogeny. We found that many Actinobacteria and Betaproteobacteria promoters lack a sequence in their -35 DNA recognition element that was previously assumed to be conserved and that Alphaproteobacteria lack a thymine residue at position -7, also previously assumed to be conserved. Our work reports important new information about bacterial transcription, illustrates the benefits of studying bacteria across the phylogenetic tree, and proposes new lines of future investigation.
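
Illustrative sketch for record 12: the comparison of a single promoter position across species (for example, the thymine at position -7) comes down to tallying bases at that position over a set of upstream sequences. The code below assumes each sequence ends at position -1, immediately upstream of the transcription start site (+1), so position -7 is the seventh base from the 3' end. Real analyses would typically first align promoters on the -10 element, since its spacing from the TSS varies. `base_frequencies_at` is a hypothetical helper name.

```python
from collections import Counter


def base_frequencies_at(promoters, position=-7):
    """Tally base frequencies at a position given relative to the TSS (+1).

    Assumes every sequence ends at position -1, so position -7 is the
    seventh base from the 3' end. Sequences too short to reach the
    requested position are skipped.
    """
    if position >= 0:
        raise ValueError("position must be negative (upstream of the TSS)")
    counts = Counter(
        seq[position].upper() for seq in promoters if len(seq) >= -position
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in "ACGT"}
```

Comparing the returned frequency of T between species groups gives a first look at whether the position is conserved in a given data set.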

13.
BMC Mol Cell Biol ; 22(1): 9, 2021 Jan 28.
Article in English | MEDLINE | ID: mdl-33509084

ABSTRACT

BACKGROUND: Leucine-rich-repeat receptor-like kinases (LRR-RLKs) play central roles in sensing various signals to regulate plant development and environmental responses. The extracellular domains (ECDs) of plant LRR-RLKs contain LRR motifs, consisting of highly conserved residues and variable residues, and are responsible for ligand perception as a receptor or co-receptor. However, there are few comprehensive studies on the ECDs of LRR-RLKs due to the difficulty in effectively identifying the divergent LRR repeats. RESULTS: In the current study, an efficient LRR motif prediction program, the "Phyto-LRR prediction" program, was developed based on the position-specific scoring matrix algorithm (PSSM) with some optimizations. This program was trained by 16-residue plant-specific LRR-highly conserved segments (HCS) from LRR-RLKs of 17 represented land plant species and a database containing more than 55,000 predicted LRRs based on this program was constructed. Both the prediction tool and database are freely available at http://phytolrr.com/ for website usage and at http://github.com/phytolrr for local usage. The LRR-RLKs were classified into 18 subgroups (SGs) according to the maximum-likelihood phylogenetic analysis of kinase domains (KDs) of the sequences. Based on the database and the SGs, the characteristics of the LRR motifs in the ECDs of the LRR-RLKs were examined, such as the arrangement of the LRRs, the solvent accessibility, the variable residues, and the N-glycosylation sites, revealing a comprehensive profile of the plant LRR-RLK ectodomains. CONCLUSION: The "Phyto-LRR prediction" program is effective in predicting the LRR segments in plant LRR-RLKs, which, together with the database, will facilitate the exploration of plant LRR-RLKs functions. Based on the database, comprehensive sequential characteristics of the plant LRR-RLK ectodomains were profiled and analyzed.


Subjects
Plant Proteins/metabolism , Protein Serine-Threonine Kinases/metabolism , Proteins/metabolism , Amino Acid Motifs , Amino Acid Sequence , Databases, Protein , Glycosylation , Leucine-Rich Repeat Proteins , Plant Proteins/chemistry , Protein Domains , Protein Serine-Threonine Kinases/chemistry , Proteins/chemistry , Species Specificity
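
Illustrative sketch for record 13: a position-specific scoring matrix (PSSM) trained on aligned segments can be slid along a protein sequence to score every window. The code below is a generic log-odds PSSM with a uniform background and a pseudocount, both of which are assumptions. It does not reproduce the optimizations of the "Phyto-LRR prediction" program, and the toy training segments in the usage note are 4 residues long rather than the 16-residue segments described in the abstract.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_segments, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) against a uniform background."""
    width = len(aligned_segments[0])
    if any(len(s) != width for s in aligned_segments):
        raise ValueError("all training segments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        column = [s[col] for s in aligned_segments]
        denom = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / denom) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of PSSM width; return (start, score) pairs.

    Residues outside the 20 standard amino acids contribute a score of 0.
    """
    width = len(pssm)
    return [
        (i, sum(pssm[j].get(sequence[i + j], 0.0) for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]
```

For example, `max(scan("AAALSNLAAA", build_pssm(["LSNL", "LSGL", "LTNL"])), key=lambda h: h[1])` picks the window starting at index 3; in practice a score threshold would decide which windows are reported as repeats.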
14.
Methods Mol Biol ; 2141: 37-72, 2020.
Article in English | MEDLINE | ID: mdl-32696352

ABSTRACT

Short linear motifs (SLiMs) are important mediators of interactions between intrinsically disordered regions of proteins and their interaction partners. Here, we detail instructions for the computational prediction of SLiMs in disordered protein regions, using the main tools of the SLiMSuite package: (1) SLiMProb identifies and calculates enrichment of predefined motifs in a set of proteins; (2) SLiMFinder predicts SLiMs de novo in a set of proteins, accounting for evolutionary relationships; (3) QSLiMFinder increases SLiMFinder sensitivity by focusing SLiM prediction on a specific query protein/region; (4) CompariMotif compares predicted SLiMs to known SLiMs or other SLiM predictions to identify common patterns. For each tool, command-line and online server examples are provided. Detailed notes provide additional advice on different applications of SLiMSuite, including batch running of multiple datasets and conservation masking using alignments of predicted orthologues.


Subjects
Computational Biology/methods , Intrinsically Disordered Proteins/chemistry , Software , Amino Acid Motifs , Amino Acid Sequence , Search Engine , Sequence Alignment
15.
Ther Adv Vaccines ; 2(3): 77-89, 2014 May.
Article in English | MEDLINE | ID: mdl-24790732

ABSTRACT

Major histocompatibility complex class I (MHC-I) presented peptide epitopes provide a 'window' into the changes occurring in a cell. Conventionally, these peptides are generated by proteolysis of endogenously synthesized proteins in the cytosol, loaded onto MHC-I molecules, and presented on the cell surface for surveillance by CD8(+) T cells. MHC-I restricted processing and presentation alerts the immune system to any infectious or tumorigenic processes unfolding intracellularly and provides potential targets for a cytotoxic T cell response. Therefore, therapeutic vaccines based on MHC-I presented peptide epitopes could, theoretically, induce CD8(+) T cell responses that have tangible clinical impacts on tumor eradication and patient survival. Three major methods have been used to identify MHC-I restricted epitopes for inclusion in peptide-based vaccines for cancer: genetic, motif prediction and, more recently, immunoproteomic analysis. Although the first two methods are capable of identifying T cell stimulatory epitopes, these have significant disadvantages and may not accurately represent epitopes presented by a tumor cell. In contrast, immunoproteomic methods can overcome these disadvantages and identify naturally processed and presented tumor associated epitopes that induce more clinically relevant tumor specific cytotoxic T cell responses. In this review, we discuss the importance of using the naturally presented MHC-I peptide repertoire in formulating peptide vaccines, the recent application of peptide-based vaccines in a variety of cancers, and highlight the pros and cons of the current state of peptide vaccines.

16.
Bioinformation ; 5(2): 49-51, 2010 Jul 06.
Article in English | MEDLINE | ID: mdl-21346861

ABSTRACT

Locating transcription factor binding sites in genomic sequences is a key step in deciphering transcription networks. Currently available software for site search is mostly server-based, limiting the range and flexibility of this type of analysis. xFITOM is a fully customizable program, written in C++, for locating binding sites in genomic sequences. Through an easy-to-use interface, xFITOM allows users an unprecedented degree of flexibility in site search. Among other features, it enables users to define motifs by mixing real sites and IUPAC consensus sequences, to search the annotated sequences of unfinished genomes and to choose among 11 different search algorithms. AVAILABILITY: xFITOM is available for download at: http://research.umbc.edu/~erill.
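
Illustrative sketch for record 16: defining a motif by an IUPAC consensus sequence amounts to expanding each ambiguity code into the set of bases it stands for. The code below does this with a regular expression and reports all forward-strand matches, including overlapping ones. It is a generic illustration in Python, not xFITOM's C++ implementation or any of its 11 search algorithms; `iupac_to_regex` and `find_sites` are hypothetical names.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Compile an IUPAC consensus into a lookahead regex (allows overlaps)."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(genome, consensus):
    """Return (start, site) for every forward-strand match of the consensus."""
    pattern = iupac_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(genome.upper())]
```

For example, `find_sites("GGTATAATGG", "TATWAT")` returns `[(2, "TATAAT")]`. An exact consensus match is the strictest form of site search; scoring against a weight matrix built from real sites is the usual softer alternative.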

17.
Bioinformation ; 3(10): 415-8, 2009 Jul 27.
Article in English | MEDLINE | ID: mdl-19759861

ABSTRACT

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Currently, genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts different types of weight matrix based on a user-defined aligned motif sequence set and motif width. To retrieve known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the matrix can be converted into different file formats according to user requirements. It provides the possibility to identify conserved motifs in co-regulated genes or the whole genome. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known SOS-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. AVAILABILITY: http://203.190.147.116/dmatrix/
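
Illustrative sketch for record 17: constructing a weight matrix from a user-defined set of aligned motif sequences and a motif width starts with per-column base counts, which can then be converted to frequencies. The code below shows that first step in Python. It is an assumption about what a "simple statistical approach" might look like, not the D-MATRIX algorithm (which is implemented in C#), and the optional pseudocount is an added illustration.

```python
BASES = "ACGT"


def count_matrix(sites, width):
    """Position count matrix: one {base: count} dict per motif column."""
    if any(len(s) != width for s in sites):
        raise ValueError("every aligned site must match the motif width")
    return [
        {b: sum(s[col].upper() == b for s in sites) for b in BASES}
        for col in range(width)
    ]


def frequency_matrix(counts, pseudocount=0.0):
    """Convert counts to per-column frequencies, optionally smoothed."""
    freqs = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        freqs.append({b: (column[b] + pseudocount) / total for b in BASES})
    return freqs
```

With the toy sites `["ACGT", "ACGA", "TCGT", "ACGT"]` and width 4, the first column counts are A=3 and T=1, giving a frequency of 0.75 for A without smoothing.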
