Results 1 - 11 of 11
1.
Nucleic Acids Res ; 45(8): e65, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28082394

ABSTRACT

Our current knowledge of eukaryotic promoters indicates that they have a complex architecture, often composed of numerous functional motifs. Most known promoters include multiple, and in some cases mutually exclusive, transcription start sites (TSSs). Moreover, TSS selection depends on cell/tissue, developmental stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences from a wide spectrum of plant genomes. The tool was developed using large promoter collections from ppdb and PlantProm DB. It uses eighteen significant compositional and signal features of plant promoter sequences, selected in this study, which feed an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy than the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80 and F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.
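TSSPlant's accuracy is reported as the Matthews correlation coefficient (MCC) and the F1-score. As a minimal sketch, assuming the standard definitions of both metrics from confusion-matrix counts (true/false positives and negatives), they can be computed as below. The function names and counts are hypothetical and are not taken from the TSSPlant evaluation.

```python
import math

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, not results from the paper.
tp, fp, tn, fn = 90, 10, 85, 15
print(round(mcc(tp, fp, tn, fn), 3), round(f1_score(tp, fp, fn), 3))
```

MCC uses all four cells of the confusion matrix, whereas the F1-score ignores true negatives, so the two metrics can rank tools differently when negatives dominate the test set.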


Subject(s)
Genome, Plant , Neural Networks, Computer , Plant Proteins/genetics , Promoter Regions, Genetic , RNA Polymerase II/genetics , Transcription Initiation Site , Arabidopsis/genetics , Arabidopsis/metabolism , Gene Expression , Oryza/genetics , Oryza/metabolism , Plant Proteins/metabolism , RNA Polymerase II/metabolism , Sequence Analysis, DNA , Software
2.
Bioinformatics ; 33(3): 334-340, 2017 02 01.
Article in English | MEDLINE | ID: mdl-27694198

ABSTRACT

Motivation: The computational search for promoters in prokaryotes remains an attractive problem in bioinformatics. Despite the attention it has received for many years, the problem has not been addressed satisfactorily. In any bacterial genome, the transcription start site is chosen mostly by the sigma (σ) factor proteins, which control gene activation. The majority of published bacterial promoter prediction tools target σ70 promoters in Escherichia coli. Moreover, no σ-specific classification of promoters is available for prokaryotes other than E. coli. Results: Here, we introduce bTSSfinder, a novel tool that predicts putative promoters for five classes of σ factors in Cyanobacteria (σA, σC, σH, σG and σF) and for five classes of σ factors in E. coli (σ70, σ38, σ32, σ28 and σ24). Compared with currently available tools, bTSSfinder achieves higher accuracy (MCC = 0.86, F1-score = 0.93) than the next best tool (MCC = 0.59, F1-score = 0.79) and covers multiple classes of promoters. Availability and Implementation: bTSSfinder is available standalone and online at http://www.cbrc.kaust.edu.sa/btssfinder. Contacts: ilham.shahmuradov@kaust.edu.sa or vladimir.bajic@kaust.edu.sa. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Cyanobacteria/genetics , Escherichia coli/genetics , Promoter Regions, Genetic , Software , Transcription Initiation Site , DNA-Directed RNA Polymerases/metabolism , Genome, Bacterial , Sigma Factor/metabolism , Transcription, Genetic , Transcriptional Activation
3.
Bioinformatics ; 31(21): 3544-5, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26142184

ABSTRACT

UNLABELLED: Gene transcription is mostly conducted through interactions of various transcription factors and their binding sites on DNA (regulatory elements, REs). Today, we are still far from understanding the real regulatory content of promoter regions. Computer methods for identification of REs remain a widely used tool for studying and understanding transcriptional regulation mechanisms. The Nsite, NsiteH and NsiteM programs perform searches for statistically significant (non-random) motifs of known human, animal and plant one-box and composite REs in a single genomic sequence, in a pair of aligned homologous sequences and in a set of functionally related sequences, respectively. AVAILABILITY AND IMPLEMENTATION: Pre-compiled executables built under commonly used operating systems are available for download by visiting http://www.molquest.kaust.edu.sa and http://www.softberry.com. CONTACT: solovictor@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
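The abstract says the Nsite programs search for statistically significant (non-random) motifs but does not give the statistic used. As a hedged illustration only, and not Nsite's algorithm, one simple way to judge whether a motif is over-represented is to assume independent nucleotides with given frequencies, compute the expected number of matches, and use a Poisson tail probability as an approximate p-value. All names and the example sequence are hypothetical.

```python
import math

def motif_probability(motif, freqs):
    """Probability that one random position matches `motif`, assuming independent bases."""
    p = 1.0
    for base in motif:
        p *= freqs[base]
    return p

def count_occurrences(motif, seq):
    """Count (possibly overlapping) exact matches of `motif` in `seq`."""
    w = len(motif)
    return sum(1 for i in range(len(seq) - w + 1) if seq[i:i + w] == motif)

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), used as an approximate p-value."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def motif_p_value(motif, seq, freqs):
    """Return (observed count, expected count, approximate p-value)."""
    positions = max(len(seq) - len(motif) + 1, 0)
    lam = positions * motif_probability(motif, freqs)
    observed = count_occurrences(motif, seq)
    return observed, lam, poisson_tail(observed, lam)

# Hypothetical example: base composition estimated from the sequence itself.
seq = "GCTATAAAAGGCTATAAATCCGTATAAAAGCT"
freqs = {b: seq.count(b) / len(seq) for b in "ACGT"}
print(motif_p_value("TATAAA", seq, freqs))
```

The Poisson approximation is rough for self-overlapping motifs, whose matches cluster; it is used here only because it keeps the sketch short.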


Subject(s)
Promoter Regions, Genetic , Software , Animals , Binding Sites , Genomics , Humans , Nucleotide Motifs , Plants/genetics , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA , Transcription Factors/metabolism
4.
Biomedicines ; 11(9)2023 Aug 23.
Article in English | MEDLINE | ID: mdl-37760783

ABSTRACT

The principal aim of the current study was to investigate the relationship between the miR-149 T>C (rs2292832) and miR-196a2 C>T (rs11614913) small non-coding RNA polymorphisms and the risk of developing colorectal cancer (CRC) in the Azerbaijani population. The study included 120 patients diagnosed with CRC and 125 healthy individuals. Peripheral blood samples were collected from all subjects in EDTA tubes, and DNA was extracted by the salting-out method. Polymorphisms were determined using the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method. When men and women were analyzed together, no statistically significant association with CRC risk was found for the heterozygous TC (OR = 0.66; 95% CI = 0.37-1.15; p = 0.142) and mutant CC (OR = 1.23; 95% CI = 0.62-2.45; p = 0.550) genotypes or the mutant C allele (OR = 1.03; 95% CI = 0.72-1.49; p = 0.859) of the miR-149 gene, or for the CT (OR = 1.23; 95% CI = 0.69-2.20; p = 0.485) and mutant TT (OR = 1.29; 95% CI = 0.67-2.47; p = 0.452) genotypes or the mutant T allele (OR = 1.17; 95% CI = 0.82-1.67; p = 0.388) of the miR-196a2 gene. However, among women, the miR-149 TC genotype (OR = 0.43; 95% CI = 0.19-1.01; p = 0.048) was associated with a reduced risk of CRC, whereas the miR-196a2 CT genotype (OR = 2.77; 95% CI = 1.13-6.79; p = 0.025) was associated with an increased risk of CRC. Our findings indicate that miR-149 T>C (rs2292832) might play a protective role in the development of CRC in female patients, whereas the miR-196a2 (rs11614913) polymorphism is associated with an increased risk of CRC in women in the Azerbaijani population, highlighting the importance of gender dimorphism in cancer etiology.
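The odds ratios (OR) and 95% confidence intervals above are derived from case-control genotype counts that the abstract does not report, and the abstract does not state how the intervals were computed. As a hedged illustration only, the Woolf (logit) interval for a single 2x2 table (carriers of a genotype versus carriers of the reference genotype, in cases and controls) can be sketched as follows; the function name and counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval for a 2x2 table.

    a: cases carrying the genotype, b: cases with the reference genotype,
    c: controls carrying the genotype, d: controls with the reference genotype.
    z = 1.96 gives an approximate 95% interval.
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell count: add a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, not the genotype counts of the study.
print(odds_ratio_ci(30, 20, 25, 35))
```

Because the interval is symmetric on the log scale, an upper bound close to 1 (as in 0.19-1.01) means the estimate sits right at the edge of conventional significance.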

5.
Front Plant Sci ; 14: 1039211, 2023.
Article in English | MEDLINE | ID: mdl-36993855

ABSTRACT

Pomegranate has a unique evolutionary history, given that different cultivars have eight or nine bivalent chromosomes with possible crossability between the two classes. Therefore, it is important to study chromosome evolution in pomegranate to understand the dynamics of its population. Here, we de novo assembled the Azerbaijani cultivar "Azerbaijan guloyshasi" (AG2017; 2n = 16) and re-sequenced six cultivars to track the evolution of pomegranate and to compare it with previously published de novo assembled and re-sequenced cultivars. High synteny was observed between AG2017, Bhagawa (2n = 16), Tunisia (2n = 16), and Dabenzi (2n = 18), but these four cultivars diverged from the cultivar Taishanhong (2n = 18) with several rearrangements, indicating the presence of two major chromosome evolution events. No major presence/absence variations were observed, as >99% of the five genomes aligned across the cultivars, while >99% of the pan-genic content was represented by Tunisia and Taishanhong only. We also revisited the divergence between soft- and hard-seeded cultivars using less structured population genomic data than previous studies, to refine the selected genomic regions and detect global migration routes for pomegranate. We report a unique admixture between soft- and hard-seeded cultivars that can be exploited to improve the diversity, quality, and adaptability of local pomegranate varieties around the world. Our study adds to the body of knowledge on the evolution of the pomegranate genome and its implications for the population structure of global pomegranate diversity, as well as for planning breeding programs aimed at developing improved cultivars.

6.
Biomolecules ; 12(11)2022 11 01.
Article in English | MEDLINE | ID: mdl-36358962

ABSTRACT

Alternative splicing is an important means of generating the protein diversity necessary for cellular functions. Hence, there is growing interest in assessing the structural and functional impact of alternative protein isoforms. Typically, experimental studies determine the structures of the canonical proteins and ignore the other isoforms. Therefore, there is still a large gap between the abundant sequence information and the meager structural data on these isoforms. During the last decade, significant progress has been achieved in the development of bioinformatics tools for the structural and functional annotation of proteins. Moreover, the appearance of the AlphaFold program opened up the possibility of modeling high-confidence structures for a large number of isoforms. In this study, using state-of-the-art tools, we performed an in silico analysis of 58 eukaryotic proteomes. The evaluated structural states included structured domains, intrinsically disordered regions, aggregation-prone regions, and tandem repeats. Among other things, we found that the isoforms have fewer signal peptides, transmembrane regions, or tandem repeat regions than their canonical counterparts. This could change protein function and/or cellular localization. AlphaFold modeling demonstrated that isoforms differing from the canonical sequences can frequently still fold into similar structures, though with significant structural rearrangements that can lead to changes in their functions. Based on the modeling, we propose a classification of the structural differences between canonical proteins and isoforms. Altogether, we conclude that a majority of isoforms, like the canonical proteins, are under selective pressure for their functional roles.


Subject(s)
Computational Biology , Proteome , Proteome/genetics , Protein Isoforms/genetics , Protein Isoforms/chemistry , Alternative Splicing
7.
BMC Genomics ; 11: 646, 2010 Nov 19.
Article in English | MEDLINE | ID: mdl-21092114

ABSTRACT

BACKGROUND: mRNA polyadenylation is an essential step of pre-mRNA processing in eukaryotes. Accurate prediction of the pre-mRNA 3'-end cleavage/polyadenylation sites is important for defining gene boundaries and understanding gene expression mechanisms. RESULTS: A total of 28,761 mapped human poly(A) sites were classified into three classes containing different known forms of the polyadenylation signal (PAS) or none of them (PAS-strong, PAS-weak and PAS-less, respectively), and a new computer program, POLYAR, was developed to predict poly(A) sites of each class. When searching for PAS-strong poly(A) sites in human sequences, POLYAR had significantly higher prediction sensitivity (80.8% versus 65.7%) and specificity (66.4% versus 51.7%) than polya_svm, to date the most accurate computer program for predicting poly(A) sites. However, when a similar search was conducted for PAS-weak and PAS-less poly(A) sites, both programs had very low prediction accuracy, which indicates that our knowledge of the factors involved in determining poly(A) sites is not sufficient to identify such polyadenylation regions. CONCLUSIONS: We present a new classification of polyadenylation sites into three classes and a novel computer program, POLYAR, for predicting poly(A) sites/regions of each class. In tests, POLYAR predicts PAS-strong poly(A) sites with high accuracy; its efficiency in searching for PAS-weak and PAS-less poly(A) sites is not very high but is comparable to that of other available programs. These findings suggest that additional characteristics of such poly(A) sites remain to be elucidated. POLYAR, including a stand-alone version for download, is available at http://cub.comsats.edu.pk/polyapredict.htm.
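The abstract does not list which hexamers define each class or where they must lie relative to the cleavage site. As a hedged sketch, assume that AATAAA and ATTAAA (the two most common PAS hexamers) mark PAS-strong sites, that an illustrative set of single-base variants of AATAAA marks PAS-weak sites, and that the search window is the 40 nt upstream of the cleavage site. Under those assumptions, the three-way classification could look like this. The hexamer lists, the window size and the function name are hypothetical and are not POLYAR's definitions.

```python
# Hypothetical hexamer sets, chosen for illustration only.
STRONG_PAS = {"AATAAA", "ATTAAA"}
WEAK_PAS = {"AGTAAA", "TATAAA", "CATAAA", "GATAAA", "AATATA",
            "AATACA", "AATAGA", "ACTAAA", "AAGAAA", "AATGAA"}

def classify_polya_site(seq: str, site: int, window: int = 40) -> str:
    """Classify a poly(A) site by the PAS hexamers in the `window` nt upstream.

    `seq` is in sense orientation and `site` is the 0-based index of the
    cleavage position; only bases before `site` are searched.
    """
    upstream = seq[max(0, site - window):site].upper()
    hexamers = {upstream[i:i + 6] for i in range(len(upstream) - 5)}
    if hexamers & STRONG_PAS:
        return "PAS-strong"
    if hexamers & WEAK_PAS:
        return "PAS-weak"
    return "PAS-less"

print(classify_polya_site("GGGAATAAAGGGGGGGGGGG", 15))
```

Checking the strong set first means a site with both a strong and a weak hexamer is assigned to the strong class, which mirrors the ordering of the three classes in the abstract.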


Subject(s)
Computational Biology/methods , Poly A/genetics , Software , 5' Untranslated Regions/genetics , Base Sequence , Humans , Introns/genetics , Polyadenylation/genetics
8.
Nucleic Acids Res ; 31(1): 114-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519961

ABSTRACT

PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s) (TSS) from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries, including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides the DNA sequences of the promoter regions (-200 : +51) with the TSS at the fixed position +201, taxonomic/promoter type classification of promoters, and Nucleotide Frequency Matrices (NFM) for the promoter elements TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition differs between dicots and monocots, as well as between TATA and TATA-less promoters. The database serves as a learning set for developing plant promoter prediction programs. One such program (TSSP), based on discriminant analysis, has been created by Softberry Inc., and the application of a support vector machine approach to promoter identification is under development. PlantProm DB is available at http://mendel.cs.rhul.ac.uk/ and http://www.softberry.com/.
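Nucleotide frequency matrices such as those for the TATA-box are commonly turned into log-odds scores and slid along a sequence to find the best match. The sketch below shows that common approach under stated assumptions: the toy matrix is built from a hypothetical "TATAAA" consensus (it is not a PlantProm DB matrix), the background is uniform, and the pseudocount is arbitrary. It also maps the -200 : +51 coordinates to 1-based string positions, assuming there is no position 0, so that the TSS (+1) lands at position 201 and the region is 251 nt long.

```python
import math

BASES = "ACGT"

def consensus_frequencies(consensus, p_match=0.85):
    """Toy frequency matrix: p_match for the consensus base, the rest shared equally."""
    rest = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else rest) for b in BASES} for c in consensus]

def log_odds_matrix(freqs, background=0.25, pseudocount=0.01):
    """Convert per-position nucleotide frequencies to log2-odds scores vs. a uniform background."""
    matrix = []
    for column in freqs:
        total = sum(column[b] + pseudocount for b in BASES)
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix

def best_hit(seq, matrix):
    """Return (score, 0-based start) of the highest-scoring window; seq must be A/C/G/T only."""
    width = len(matrix)
    best = (-math.inf, -1)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col[base] for col, base in zip(matrix, window))
        best = max(best, (score, start))
    return best

def promoter_index(coord):
    """Map a -200..+51 coordinate (no position 0) to a 1-based index; TSS (+1) -> 201."""
    if coord == 0 or not -200 <= coord <= 51:
        raise ValueError("coordinate must be in -200..-1 or +1..+51")
    return coord + 201 if coord < 0 else coord + 200

tata = log_odds_matrix(consensus_frequencies("TATAAA"))  # hypothetical matrix
print(best_hit("GGGGTATAAAGGGG", tata), promoter_index(1))
```

Log-odds scoring lets per-position scores be summed, which is why frequency matrices are usually converted before scanning rather than multiplied as raw probabilities.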


Subject(s)
Databases, Nucleic Acid , Genes, Plant , Promoter Regions, Genetic , RNA Polymerase II/genetics , Response Elements , Sequence Analysis, DNA
9.
Methods Mol Biol ; 674: 57-83, 2010.
Article in English | MEDLINE | ID: mdl-20827586

ABSTRACT

Promoter sequences are the main regulatory elements of gene expression. Their recognition by computer algorithms is fundamental for understanding gene expression patterns, cell specificity and development. This chapter describes advanced approaches to identifying promoters in animal, plant and bacterial sequences. We also discuss an approach to identifying statistically significant regulatory motifs in genomic sequences.


Subject(s)
Computational Biology/methods , Gene Expression Regulation/genetics , Promoter Regions, Genetic/genetics , Algorithms , Animals , Bacteria/genetics , Base Sequence , DNA/genetics , DNA/metabolism , Humans , Mice , Molecular Sequence Data , Plants/genetics , Rats , Sequence Homology, Nucleic Acid , Software , Transcription Factors/metabolism
10.
Bioinformatics ; 19(15): 1964-71, 2003 Oct 12.
Article in English | MEDLINE | ID: mdl-14555630

ABSTRACT

UNLABELLED: In this paper we propose a new method for recognition of prokaryotic promoter regions with transcription startpoints. The method is based on the Sequence Alignment Kernel, a function reflecting the quantitative measure of match between two sequences. This kernel function is then used in a Dual SVM, which performs the recognition. Several recognition methods have been trained and tested on a positive data set, consisting of 669 sigma70-promoter regions of Escherichia coli with known transcription startpoints, and on two negative data sets of 709 examples each, taken from coding and non-coding regions of the same genome. The results show that our method performs well, achieving an average error rate of 16.5% on positive & coding negative data and 18.6% on positive & non-coding negative data. AVAILABILITY: The demo version of our method is accessible from our website http://mendel.cs.rhul.ac.uk/
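The abstract does not define the Sequence Alignment Kernel or the Dual SVM training in detail. As a hedged sketch of the general idea only, the example below assumes a normalised Needleman-Wunsch global alignment score as the kernel and swaps in a dual-form kernel perceptron for the SVM. Both keep one weight per training sequence and classify with the sign of a weighted sum of kernel values. The scoring parameters, function names and toy sequences are hypothetical, and a normalised alignment score is not guaranteed to be a positive semidefinite kernel.

```python
import math

def nw_score(x, y, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

def alignment_kernel(x, y):
    """Alignment score normalised so that K(x, x) == 1; assumes non-empty sequences."""
    return nw_score(x, y) / math.sqrt(nw_score(x, x) * nw_score(y, y))

def train_kernel_perceptron(seqs, labels, kernel, epochs=20):
    """Dual-form perceptron: learns one weight (alpha) per training sequence."""
    gram = [[kernel(a, b) for b in seqs] for a in seqs]
    alpha = [0] * len(seqs)
    for _ in range(epochs):
        mistakes = 0
        for i, y_i in enumerate(labels):
            f = sum(a * y * gram[j][i] for j, (a, y) in enumerate(zip(alpha, labels)))
            if y_i * f <= 0:  # misclassified (or on the boundary): strengthen this example
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(x, seqs, labels, alpha, kernel):
    """Sign of the dual decision function sum_j alpha_j * y_j * K(s_j, x)."""
    f = sum(a * y * kernel(s, x) for a, y, s in zip(alpha, labels, seqs) if a)
    return 1 if f > 0 else -1

# Hypothetical toy data: +1 = "promoter-like", -1 = "background".
seqs = ["TATAAT", "TATAAA", "TTTAAT", "GCGCGC", "GCGGGC", "CCGCGC"]
labels = [1, 1, 1, -1, -1, -1]
alpha = train_kernel_perceptron(seqs, labels, alignment_kernel)
print(predict("TATATT", seqs, labels, alpha, alignment_kernel))
```

A real SVM would instead choose the dual weights by solving a margin-maximising quadratic program; the perceptron is used here only because it needs no external library.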


Subject(s)
Algorithms , Artificial Intelligence , Escherichia coli/genetics , Gene Expression Profiling/methods , Pattern Recognition, Automated , Promoter Regions, Genetic/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Reproducibility of Results , Sensitivity and Specificity
11.
Plant Mol Biol ; 52(5): 923-34, 2003 Jul.
Article in English | MEDLINE | ID: mdl-14558655

ABSTRACT

Pairwise comparison of whole plastid and draft nuclear genomic sequences of Arabidopsis thaliana and Oryza sativa L. ssp. indica shows that rice nuclear genomic sequences contain homologues of plastid DNA covering about 94 kb (83%) of the plastid genome and including one or more full-length intact (without mutations resulting in premature stop codons) homologues of 26 known protein-coding (KPC) plastid genes. By contrast, only about 20 kb (16%) of chloroplast DNA, including a single intact plastid-derived KPC gene, is present in the nucleus of A. thaliana. Sixteen rice plastid genes have at least one nuclear copy without any mutation or with only synonymous substitutions. Nuclear copies of the other ten plastid genes contain both synonymous and non-synonymous substitutions. Multiple ESTs were also found for 25 of the 26 KPC genes, as well as putative promoters for some of them. The pattern of substitutions shows that some nuclear homologues of plastid genes may be functional and/or under positive natural selection. A similar comparative analysis of rice chromosome 1 revealed 27 contigs containing plastid-derived sequences, totalling about 84 kb and covering two thirds of the chloroplast DNA, with intact nuclear copies of 26 different KPC genes. One of these contigs, AP003280, includes almost 57 kb (45%) of the chloroplast genome with intact copies of 22 KPC genes. At the same time, we observed that the relative locations of homologues in the plastid DNA and in the nuclear genome differ significantly.
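The abstract defines an intact copy as one without mutations resulting in premature stop codons, and separates copies with only synonymous substitutions from those with non-synonymous ones. As a minimal sketch, assuming the plastid gene and its nuclear copy are already aligned in frame without indels and that the standard genetic code applies, one could check intactness and count differing codons as follows. The function names and toy sequences are hypothetical, and the sketch counts differing codons rather than individual nucleotide substitutions.

```python
from itertools import product

# Standard genetic code built from the TCAG ordering (first codon position varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds):
    """Translate an in-frame coding sequence; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))

def compare_copy(plastid_cds, nuclear_cds):
    """Compare a gap-free, in-frame nuclear copy with its plastid source gene."""
    plastid_cds, nuclear_cds = plastid_cds.upper(), nuclear_cds.upper()
    if len(plastid_cds) != len(nuclear_cds) or len(plastid_cds) % 3:
        raise ValueError("expects equal-length, in-frame sequences without indels")
    # A stop codon anywhere before the last codon counts as premature.
    intact = "*" not in translate(nuclear_cds)[:-1]
    synonymous = nonsynonymous = 0
    for i in range(0, len(plastid_cds), 3):
        source, copy = plastid_cds[i:i + 3], nuclear_cds[i:i + 3]
        if source == copy:
            continue
        # Classify each differing codon by whether the encoded amino acid changes.
        if CODON_TABLE.get(source, "X") == CODON_TABLE.get(copy, "X"):
            synonymous += 1
        else:
            nonsynonymous += 1
    return {"intact": intact, "synonymous_codons": synonymous,
            "nonsynonymous_codons": nonsynonymous}

# Hypothetical toy sequences (Met-Ala-Lys-stop), not real plastid genes.
print(compare_copy("ATGGCTAAATAA", "ATGGCCAAATAA"))
```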


Subject(s)
Arabidopsis/genetics , Cell Nucleus/genetics , Genome, Plant , Oryza/genetics , Plastids/genetics , Chromosomes, Plant/genetics , DNA, Chloroplast/genetics , Gene Dosage , Genes, Plant/genetics , Nuclear Proteins/genetics , Plant Proteins/genetics