Results 1 - 20 of 31
1.
EMBO J ; 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39284910

ABSTRACT

Transcription factors (TFs) regulate gene expression by binding with varying strengths to DNA via their DNA-binding domain. Additionally, some TFs also interact with RNA, which modulates transcription factor binding to chromatin. However, whether RNA-mediated TF binding results in differential transcriptional outcomes remains unknown. In this study, we demonstrate that estrogen receptor α (ERα), a ligand-activated TF, interacts with RNA in a ligand-dependent manner. Defects in RNA binding lead to genome-wide loss of ERα recruitment, particularly at weaker ERα-motifs. Furthermore, ERα mobility in the nucleus increases in the absence of its RNA-binding capacity. Unexpectedly, this increased mobility coincides with robust polymerase loading and transcription of ERα-regulated genes that harbor low-strength motifs. However, highly stable binding of ERα on chromatin negatively impacts ligand-dependent transcription. Collectively, our results suggest that RNA interactions spatially confine ERα on low-affinity sites to fine-tune gene transcription.

2.
Front Genet ; 15: 1424085, 2024.
Article in English | MEDLINE | ID: mdl-38952710

ABSTRACT

Motivation: The interaction between DNA motifs (DNA motif pairs) influences gene expression through partnership or competition in the process of gene regulation. Potential chromatin interactions between different DNA motifs have been implicated in various diseases. However, current methods for identifying DNA motif pairs rely on the recognition of single DNA motifs or probabilities, which may result in local optimal solutions and can be sensitive to the choice of initial values. A method for precisely identifying DNA motif pairs is still lacking. Results: Here, we propose a novel computational method for predicting DNA Motif Pairs based on Composite Heterogeneous Graph (MPCHG). This approach leverages a composite heterogeneous graph model to identify DNA motif pairs on paired sequences. Compared with the existing methods, MPCHG has greatly improved the accuracy of motif prediction. Furthermore, the predicted DNA motifs demonstrate higher DNase accessibility than the background sequences. Notably, the two DNA motifs forming a pair exhibit functional consistency. Importantly, the interacting TF pairs obtained from predicted DNA motif pairs were significantly enriched with known interacting TF pairs, suggesting their potential contribution to chromatin interactions. Collectively, we believe that these identified DNA motif pairs hold substantial implications for revealing gene transcriptional regulation under long-range chromatin interactions.

3.
Int J Mol Sci ; 25(3), 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38339181

ABSTRACT

The concept of cis-regulatory modules located in gene promoters represents today's vision of the organization of gene transcriptional regulation. Such modules are a combination of two or more single, short DNA motifs. The bioinformatic identification of such modules belongs to so-called NP-hard problems with extreme computational complexity, and therefore, simplifications, assumptions, and heuristics are usually deployed to tackle the problem. In practice, this requires, first, many parameters to be set before the search, and second, it leads to the identification of locally optimal results. Here, a novel method, BestCRM, is presented, aimed at identifying the cis-regulatory elements in gene promoters based on an exhaustive search of all the feasible modules' configurations. All required parameters are automatically estimated using positive and negative datasets. To be computationally efficient, the search is accelerated using a multidimensional hash function, allowing the search to complete in a few hours on a regular laptop (for example, an Intel i7 CPU, 3.2 GHz, 32 GB RAM). Tests on an established benchmark and real data show better performance of BestCRM compared to the available methods according to several metrics like specificity, sensitivity, AUC, etc. A great practical advantage of the method is its minimum number of input parameters: apart from positive and negative promoters, only a desired level of module presence in promoters is required.


Subject(s)
Algorithms, Nucleic Acid Regulatory Sequences, Genetic Promoter Regions, Nucleic Acid Regulatory Sequences/genetics, Gene Expression Regulation, Computational Biology/methods
4.
Ann Bot ; 131(1): 87-108, 2023 02 07.
Article in English | MEDLINE | ID: mdl-34874999

ABSTRACT

BACKGROUND AND AIMS: Diploid and polyploid Urochloa (including Brachiaria, Panicum and Megathyrsus species) C4 tropical forage grasses originating from Africa are important for food security and the environment, often being planted in marginal lands worldwide. We aimed to characterize the nature of their genomes, the repetitive DNA and the genome composition of polyploids, leading to a model of the evolutionary pathways within the group including many apomictic species. METHODS: Some 362 forage grass accessions from international germplasm collections were studied, and ploidy was determined using an optimized flow cytometry method. Whole-genome survey sequencing and molecular cytogenetic analysis were used to identify chromosomes and genomes in Urochloa accessions belonging to the 'brizantha' and 'humidicola' agamic complexes and U. maxima. KEY RESULTS: Genome structures are complex and variable, with multiple ploidies and genome compositions within the species, and no clear geographical patterns. Sequence analysis of nine diploid and polyploid accessions enabled identification of abundant genome-specific repetitive DNA motifs. In situ hybridization with a combination of repetitive DNA and genomic DNA probes identified evolutionary divergence and allowed us to discriminate the different genomes present in polyploids. CONCLUSIONS: We suggest a new coherent nomenclature for the genomes present. We develop a model of evolution at the whole-genome level in diploid and polyploid accessions showing processes of grass evolution. We support the retention of narrow species concepts for Urochloa brizantha, U. decumbens and U. ruziziensis, and do not consider diploids and polyploids of single species as cytotypes. 
The results and model will be valuable in making rational choices of parents for new hybrids, assist in use of the germplasm for breeding and selection of Urochloa with improved sustainability and agronomic potential, and assist in measuring and conserving biodiversity in grasslands.


Subject(s)
Brachiaria, Poaceae, Poaceae/genetics, Brachiaria/genetics, Polyploidy, Ploidies, Genomics
5.
Front Plant Sci ; 13: 1026364, 2022.
Article in English | MEDLINE | ID: mdl-36483968

ABSTRACT

Structural chromosome rearrangements involving translocations, fusions and fissions lead to evolutionary variation between species and potentially reproductive isolation and variation in gene expression. While the wheats (Triticeae, Poaceae) and oats (Aveneae) all maintain a basic chromosome number of x=7, genomes of oats show frequent intergenomic translocations, in contrast to wheats where these translocations are relatively rare. We aimed to show genome structural diversity and genome relationships in tetraploid, hexaploid and octoploid Avena species and amphiploids, establishing patterns of intergenomic translocations across different oat taxa using fluorescence in situ hybridization (FISH) with four well-characterized repetitive DNA sequences: pAs120, AF226603, Ast-R171 and Ast-T116. In A. agadiriana (2n=4x=28), the selected probes hybridized to all chromosomes indicating that this species originated from one (autotetraploid) or closely related ancestors with the same genomes. Hexaploid amphiploids were confirmed as having the genomic composition AACCDD, while octoploid amphiploids showed three different genome compositions: AACCCCDD, AAAACCDD or AABBCCDD. The A, B, C, and D genomes of oats differ significantly in their involvement in non-centromeric, intercalary translocations. There was a predominance of distal intergenomic translocations from the C- into the D-genome chromosomes. Translocations from A- to C-, or D- to C-genome chromosomes were less frequent, proving that at least some of the translocations in oat polyploids are non-reciprocal. Rare translocations from A- to D-, D- to A- and C- to B-genome chromosomes were also visualized. The fundamental research has implications for exploiting genomic biodiversity in oat breeding through introgression from wild species potentially with contrasting chromosomal structures and hence deleterious segmental duplications or large deletions in amphiploid parental lines.

6.
Int J Mol Sci ; 23(17), 2022 Sep 05.
Article in English | MEDLINE | ID: mdl-36077578

ABSTRACT

CRISPR-Cas systems empower prokaryotes with adaptive immunity against invasive mobile genetic elements. At the first step of CRISPR immunity adaptation, short DNA fragments from the invaders are integrated into CRISPR arrays at the leader-proximal end. To date, the mechanism of recognition of the leader-proximal end remains largely unknown. Here, in the Sulfolobus islandicus subtype I-A system, we show that mutations destroying the proximal region reduce CRISPR adaptation in vivo. We identify that a stem-loop structure is present on the leader-proximal end, and we demonstrate that Cas1 preferentially binds the stem-loop structure in vitro. Moreover, we demonstrate that the integrase activity of Cas1 is modulated by interacting with a CRISPR-associated factor Csa3a. When translocated to the CRISPR array, the Csa3a-Cas1 complex is separated by Csa3a binding to the leader-distal motif and Cas1 binding to the leader-proximal end. Mutation at the leader-distal motif reduces CRISPR adaptation efficiency, further confirming the in vivo function of leader-distal motif. Together, our results suggest a general model for binding of Cas1 protein to a leader motif and modulation of integrase activity by an accessory factor.


Subject(s)
CRISPR-Associated Proteins, Sulfolobus, CRISPR-Associated Proteins/metabolism, CRISPR-Cas Systems, Integrases/metabolism, Nucleotide Motifs, Sulfolobus/genetics, Sulfolobus/metabolism
7.
Trends Plant Sci ; 27(12): 1206-1208, 2022 12.
Article in English | MEDLINE | ID: mdl-36100536

ABSTRACT

Advanced machine learning (ML) algorithms produce highly accurate models of gene expression, uncovering novel regulatory features in nucleotide sequences involving multiple cis-regulatory regions across whole genes and structural properties. These broaden our understanding of gene regulation and point to new principles to test and adopt in the field of plant science.


Subject(s)
Plant Gene Expression Regulation, Plant Genes, Plant Gene Expression Regulation/genetics, Machine Learning, Algorithms, Nucleic Acid Regulatory Sequences
8.
Mol Cell ; 82(18): 3398-3411.e11, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35863348

ABSTRACT

Regulatory elements activate promoters by recruiting transcription factors (TFs) to specific motifs. Notably, TF-DNA interactions often depend on cooperativity with colocalized partners, suggesting an underlying cis-regulatory syntax. To explore TF cooperativity in mammals, we analyze ∼500 mouse and human primary cells by combining an atlas of TF motifs, footprints, ChIP-seq, transcriptomes, and accessibility. We uncover two TF groups that colocalize with most expressed factors, forming stripes in hierarchical clustering maps. The first group includes lineage-determining factors that occupy DNA elements broadly, consistent with their key role in tissue-specific transcription. The second one, dubbed universal stripe factors (USFs), comprises ∼30 SP, KLF, EGR, and ZBTB family members that recognize overlapping GC-rich sequences in all tissues analyzed. Knockouts and single-molecule tracking reveal that USFs impart accessibility to colocalized partners and increase their residence time. Mammalian cells have thus evolved a TF superfamily with overlapping DNA binding that facilitates chromatin accessibility.


Subject(s)
Chromatin, Transcription Factors, Animals, Binding Sites, Chromatin/genetics, DNA/genetics, Humans, Mammals/genetics, Mammals/metabolism, Mice, Knockout Mice, Protein Binding, Transcription Factors/metabolism
9.
Plants (Basel) ; 10(10), 2021 Sep 28.
Article in English | MEDLINE | ID: mdl-34685852

ABSTRACT

Gene duplication and the preservation of both copies during evolution is an intriguing evolutionary phenomenon. Their preservation is related to the function they perform. The central component of centromere specification and function is the centromere-specific histone H3 (CENH3). Some cereal species (maize, rice) have one copy of the gene encoding this protein, while some (wheat, barley, rye) have two. Therefore, they represent a good model for a comparative study of the functional activity of the duplicated CENH3 genes and their protein products. We determined the organization of the CENH3 locus in rye (Secale cereale L.) and identified the functional motifs in the vicinity of the CENH3 genes. We compared the expression of these genes at different stages of plant development and the loading of their products, the CENH3 proteins, into nucleosomes during mitosis and meiosis. Using extended chromatin fibers, we revealed patterns of loading CENH3 proteins into polynucleosomal domains in centromeric chromatin. Our results indicate no sign of neofunctionalization, subfunctionalization or specialization in the gene copies. The influence of negative selection on the coding part of the genes led them to preserve their conserved function. The advantage of having two functional genes appears as the gene-dosage effect.

10.
BMC Bioinformatics ; 22(1): 278, 2021 May 26.
Article in English | MEDLINE | ID: mdl-34039269

ABSTRACT

BACKGROUND: The investigation of molecular alterations associated with the conservation and variation of DNA methylation in eukaryotes is gaining interest in the biomedical research community. Among the different determinants of methylation stability, the DNA composition of the CpG surrounding regions has been shown to have a crucial role in the maintenance and establishment of methylation statuses. This aspect has been previously characterized in a quantitative manner by inspecting the nucleotide composition in the region. Research in this field still lacks a qualitative perspective, linked to the identification of certain sequences (or DNA motifs) related to particular DNA methylation phenomena. RESULTS: Here we present a novel computational strategy based on short DNA motif discovery in order to characterize sequence patterns related to aberrant CpG methylation events. We provide our framework as a user-friendly, Shiny-based application, CpGmotifs, to easily retrieve and characterize DNA patterns related to CpG methylation in the human genome. Our tool supports the functional interpretation of deregulated methylation events by predicting transcription factor binding sites (TFBS) encompassing the identified motifs. CONCLUSIONS: CpGmotifs is open-source software. Its source code is available on GitHub at https://github.com/Greco-Lab/CpGmotifs and a ready-to-use Docker image is provided on DockerHub at https://hub.docker.com/r/grecolab/cpgmotifs .


Subject(s)
DNA Methylation, Human Genome, CpG Islands, Humans, Nucleotide Motifs, Software
11.
G3 (Bethesda) ; 11(1), 2021 01 18.
Article in English | MEDLINE | ID: mdl-33561247

ABSTRACT

Homologous recombination is a key pathway found in nearly all bacterial taxa. The recombination complex not only allows bacteria to repair DNA double-strand breaks but also promotes adaptation through the exchange of DNA between cells. In Proteobacteria, this process is mediated by the RecBCD complex, which relies on the recognition of a DNA motif named Chi to initiate recombination. The Chi motif has been characterized in Escherichia coli and analogous sequences have been found in several other species from diverse families, suggesting that this mode of action is widespread across bacteria. However, the sequences of Chi-like motifs are known for only five bacterial species: E. coli, Haemophilus influenzae, Bacillus subtilis, Lactococcus lactis, and Staphylococcus aureus. In this study, we detected putative Chi motifs in a large dataset of Proteobacteria and identified four additional motifs sharing high sequence similarity and similar properties to the Chi motif of E. coli in 85 species of Proteobacteria. Most Chi motifs were detected in Enterobacteriaceae and this motif appears well conserved in this family. However, we did not detect Chi motifs for the majority of Proteobacteria, suggesting that different motifs are used in these species. Altogether these results substantially expand our knowledge on the evolution of Chi motifs and on the recombination process in bacteria.


Subject(s)
Escherichia coli, Genetic Recombination, Bacterial DNA, Escherichia coli/genetics, Exodeoxyribonuclease V, Exodeoxyribonucleases/genetics, Proteobacteria
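
As a rough illustration of what locating a known Chi site involves, the sketch below scans both strands of a sequence for the E. coli Chi octamer 5'-GCTGGTGG-3'. It is not the detection procedure used in the study, which the abstract does not describe; the function names and the toy sequence are hypothetical.

```python
# Sketch only: exact-match scan for the E. coli Chi octamer on both strands.
# The helper names and the toy genome below are illustrative assumptions.
CHI_ECOLI = "GCTGGTGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(genome, motif=CHI_ECOLI):
    """List (position, strand) for every exact occurrence of the motif.

    Positions are 0-based offsets on the forward strand; "-" hits are
    found by searching for the motif's reverse complement.
    """
    genome = genome.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


toy_genome = "AAGCTGGTGGTTCCACCAGCAA"
hits = find_motif(toy_genome)
```

On the toy sequence, one forward-strand site and one reverse-strand site are reported. Discovering previously unknown Chi-like motifs, as the study does, would need a statistical search rather than exact matching.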
12.
J R Soc Interface ; 17(171): 20200600, 2020 10.
Article in English | MEDLINE | ID: mdl-33023397

ABSTRACT

Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model that can use information on exact positions of the transcription start sites and condition-dependent expression profiles. The central idea of this model is to improve the probabilistic representation of the promoter DNA sequences by incorporating covariates summarizing expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). A dedicated trans-dimensional Markov chain Monte Carlo algorithm adjusts the width and palindromic properties of the corresponding position-weight matrices, the number of parameters to describe exact position relative to the transcription start site, and chooses the expression covariates relevant for each motif. All parameters are estimated simultaneously, for many motifs and many expression covariates. The method is applied to a dataset of transcription start sites and expression profiles available for Listeria monocytogenes. The results validate the approach and provide a new global view of the transcription regulatory network of this important pathogen. Remarkably, a previously unreported motif is found in promoter regions of ribosomal protein genes, suggesting a role in the regulation of growth.


Subject(s)
Listeria monocytogenes, Algorithms, Listeria monocytogenes/genetics, Markov Chains, Statistical Models, Genetic Promoter Regions, Transcriptome
13.
Curr Protoc Nucleic Acid Chem ; 82(1): e115, 2020 09.
Article in English | MEDLINE | ID: mdl-32931657

ABSTRACT

Custom-built DNA nanostructures are now used in applications such as biosensing, molecular computation, biomolecular analysis, and drug delivery. While the functionality and biocompatibility of DNA make DNA nanostructures useful in such applications, the field faces a challenge in making biostable DNA nanostructures. Being a natural material, DNA is most suited for biological applications, but is also easily degraded by nucleases. Several methods have been employed to study the nuclease degradation rates and enhancement of nuclease resistance. This protocol describes the use of gel electrophoresis to analyze the extent of nuclease degradation of DNA nanostructures and to report degradation times, kinetics of nuclease digestion, and evaluation of biostability enhancement factors. © 2020 Wiley Periodicals LLC. Basic Protocol: Timed analysis of nuclease degradation of DNA nanostructures Support Protocol: Calculating biostability enhancement factors.


Subject(s)
DNA/chemistry, Deoxyribonuclease I/chemistry, Polyacrylamide Gel Electrophoresis/methods, Nanostructures
14.
PeerJ Comput Sci ; 6: e278, 2020.
Article in English | MEDLINE | ID: mdl-33816929

ABSTRACT

Application of deep neural networks is a rapidly expanding field now reaching many disciplines including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with known functional role, extracting those sequences from whole-genome annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the obtained networks perform well on validation sets, they often perform poorly when applied to whole genomes, in which the ratio of positive to negative examples can be very different from that in the training set. We here address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, when using training sets providing good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved or not.
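
The effect of class ratio described in this abstract can be illustrated with simple arithmetic. The sketch below assumes a classifier whose sensitivity and specificity stay fixed and computes its precision for different fractions of positive examples; the function name and all numbers are hypothetical and are not taken from the study.

```python
# Sketch only: how precision depends on the fraction of positive examples
# when sensitivity and specificity are held fixed (hypothetical values).
def precision(sensitivity, specificity, positive_fraction):
    """Expected precision = TP / (TP + FP) for a given class balance."""
    true_pos = sensitivity * positive_fraction
    false_pos = (1.0 - specificity) * (1.0 - positive_fraction)
    return true_pos / (true_pos + false_pos)


# Balanced validation set versus a genome-wide scan where positives are rare.
balanced = precision(0.9, 0.9, positive_fraction=0.5)
genome_wide = precision(0.9, 0.9, positive_fraction=0.001)
```

With sensitivity and specificity both at 0.9, precision is about 0.9 on a balanced set but falls below 0.01 when only one window in a thousand is a true positive. This is one way to see why a network that looks good on validation data can perform poorly genome-wide.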

15.
J Theor Biol ; 461: 41-50, 2019 01 14.
Article in English | MEDLINE | ID: mdl-30336158

ABSTRACT

In 1932, Paul Erdős asked whether a random walk constructed from a binary sequence can achieve the lowest possible deviation (lowest discrepancy), for the sequence itself and for all its subsequences formed by homogeneous arithmetic progressions. Although maintaining low discrepancy is impossible for infinite sequences, as recently proven by Terence Tao, attempts were made to construct such sequences with finite lengths. We recognize that such constructed sequences (we call these "Erdős sequences") exhibit certain hallmarks of randomness at the local level: they show roughly equal frequencies of short subsequences, and at the same time exclude trivial periodic patterns. For the human DNA we examine the frequency of a set of Erdős motifs of length 10 using three nucleotide-to-binary mappings. The particular length-10 Erdős sequence is derived from the length-11 Mathias sequence and is identical with the first 10 digits of the Thue-Morse sequence, underscoring the fact that both are deficient in periodicities. Our calculations indicate that: (1) the purine (A and G)/pyrimidine (C and T)-based Erdős motifs are greatly underrepresented in the human genome, (2) the strong (G and C)/weak (A and T)-based Erdős motifs are slightly overrepresented, (3) the densities of the two are negatively correlated, (4) the Erdős motifs based on all three mappings being combined are slightly underrepresented, and (5) the strong/weak-based Erdős motifs are greatly overrepresented in the human messenger RNA sequences.


Subject(s)
Base Sequence/genetics, Nucleotide Motifs/genetics, Computational Biology, DNA/genetics, Human Genome/genetics, Humans, RNA/genetics, Messenger RNA/genetics
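
The ingredients named in this abstract (the Thue-Morse prefix, discrepancy along homogeneous arithmetic progressions, and nucleotide-to-binary mappings) can be sketched in a few lines. The code below is a minimal illustration, not the authors' pipeline: which nucleotide class maps to 1, the handling of ambiguous bases, and all function names are assumptions.

```python
# Sketch only: Thue-Morse prefix, its discrepancy, and motif counting in DNA.
# Which class maps to "1" is an assumption; the abstract does not say.
PURINE_PYRIMIDINE = {"A": "1", "G": "1", "C": "0", "T": "0"}
STRONG_WEAK = {"G": "1", "C": "1", "A": "0", "T": "0"}


def thue_morse(n):
    """First n Thue-Morse terms: parity of the number of 1-bits of the index."""
    return [bin(i).count("1") % 2 for i in range(n)]


def max_discrepancy(bits):
    """Largest |partial sum| of the +1/-1 walk along d, 2d, 3d, ... for all d."""
    signs = [1 if b else -1 for b in bits]
    worst = 0
    for step in range(1, len(signs) + 1):
        total = 0
        for pos in range(step, len(signs) + 1, step):  # 1-based positions
            total += signs[pos - 1]
            worst = max(worst, abs(total))
    return worst


def to_binary(dna, mapping):
    """Map nucleotides to '0'/'1'; other symbols become 'x' so no motif spans them."""
    return "".join(mapping.get(base, "x") for base in dna.upper())


def count_motif(binary, motif):
    """Count occurrences of motif in binary, allowing overlaps."""
    count, start = 0, binary.find(motif)
    while start != -1:
        count += 1
        start = binary.find(motif, start + 1)
    return count


motif = "".join(str(b) for b in thue_morse(10))
toy_dna = "CAGTACTGAC"  # hypothetical fragment, chosen to contain the motif
n_hits = count_motif(to_binary(toy_dna, PURINE_PYRIMIDINE), motif)
```

The ten-term prefix has discrepancy 1 under this definition, which is the property the abstract attributes to the length-10 Erdős sequence. Assessing under- or overrepresentation in a real genome would additionally require an expected count under a background model, which is not attempted here.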
16.
iScience ; 7: 198-211, 2018 Sep 28.
Article in English | MEDLINE | ID: mdl-30267681

ABSTRACT

Although the existing works on DNA motif discovery on DNA sequences are plethoric, mechanistic knowledge to infer DNA motifs from protein sequences across multiple DNA-binding domain families without conducting any wet-lab experiments is still lacking. Therefore, the k-spectrum recognition modeling is proposed to address the issues at the highest possible resolutions. The k-spectrum model can capture DNA motif patterns from protein sequences at the resolution in which local sequence context and nucleotide dependency can be taken into account completely. Multiple evaluation metrics are adopted and measured on millions of k-mer binding intensities from 92 proteins across 5 DNA-binding families (i.e., bHLH, bZIP, ETS, Forkhead, and Homeodomain), demonstrating its competitive edges. In addition, it not only can contribute to DNA motif recognition modeling but also can help prioritize the observed or even unobserved binding of single nucleotide variants on transcription factor binding sites in a genome-wide manner.

17.
Methods Mol Biol ; 1867: 15-28, 2018.
Article in English | MEDLINE | ID: mdl-30155812

ABSTRACT

Cys2His2 zinc-finger proteins (C2H2-ZFPs) constitute the largest class of human transcription factors (TFs) and also the least characterized one. Determining the DNA sequence preferences of C2H2-ZFPs is an important first step toward elucidating their roles in transcriptional regulation. Among the most promising approaches for obtaining the sequence preferences of C2H2-ZFPs are those that combine machine-learning predictions with in vivo binding maps of these proteins. Here, we provide a protocol and guidelines for predicting the DNA-binding preferences of C2H2-ZFPs from their amino acid sequences using a machine learning-based recognition code. This protocol also describes the tools and steps to combine these predictions with ChIP-seq data to remove inaccuracies, identify the zinc-finger domains within each C2H2-ZFP that engage with DNA in vivo, and pinpoint the genomic binding sites of the C2H2-ZFPs.


Subject(s)
CYS2-HIS2 Zinc Fingers, Chromatin Immunoprecipitation/methods, Computational Biology/methods, DNA-Binding Proteins/metabolism, DNA/metabolism, High-Throughput Nucleotide Sequencing/methods, Nucleotide Motifs, Binding Sites, DNA/genetics, DNA-Binding Proteins/genetics, Gene Expression Regulation, Human Genome, Humans, Position-Specific Scoring Matrices, Protein Binding, Transcriptional Regulatory Elements, DNA Sequence Analysis/methods, Software
18.
Methods Mol Biol ; 1811: 1-9, 2018.
Article in English | MEDLINE | ID: mdl-29926442

ABSTRACT

The founding of structural DNA nanotechnology is described, birth pangs and all, by the originator of the field. The excitement of the invention, the characters, and the roles are evident as a true celebration of scientific research.


Subject(s)
DNA/chemistry, DNA/genetics, Nanotechnology/history, 20th Century History, 21st Century History, Humans, Molecular Models, Nanostructures/chemistry, Nucleic Acid Conformation
19.
Genom Data ; 14: 24-31, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28840100

ABSTRACT

The nucleotide binding site-leucine rich repeat (NBS-LRR) proteins play an important role in the defense mechanisms against pathogens. Using a bioinformatics approach, we identified and annotated 104 NBS-LRR genes in chickpea. Phylogenetic analysis points to their diversification into two families, namely TIR-NBS-LRR and non-TIR-NBS-LRR. Gene architecture revealed intron gain/loss events in this resistance gene family during their independent evolution into two families. Comparative genomics analysis elucidated its evolutionary relationship with other Fabaceae species. Around 50% of NBS-LRRs reside in macro-syntenic blocks, underlining positional conservation along with sequence conservation of NBS-LRR genes in chickpea. Transcriptome sequencing data provided evidence for their transcription and tissue-specific expression. Four cis-regulatory elements that commonly occur in resistance genes, namely WBOX, DRE, CBF, and GCC boxes, were present in the promoter regions of these genes. Further, the findings will provide a strong background to use candidate disease resistance NBS-encoding genes and identify their specific roles in chickpea.

20.
J Bioinform Comput Biol ; 15(4): 1750014, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28571483

ABSTRACT

Identification of transcription factor binding sites or biological motifs is an important step in deciphering the mechanisms of gene regulation. It is a classic problem that has eluded a satisfactory and efficient solution. In this paper, we devise a three-phase algorithm to mine for biologically significant motifs. In the first phase, we generate all the possible string motifs; this phase is followed by a filtering process in which we discard all motifs that do not meet the constraints. In the final phase, motifs are scored and ranked using a combination of stochastic techniques and [Formula: see text]-value. We show that our method outperforms some very well-known motif discovery tools, e.g. MEME and Weeder, on well-established benchmark data suites. We also apply the algorithm to the non-coding regions of M. tuberculosis and report significant motifs of size 10 with excellent [Formula: see text]-values in a fraction of the time MEME and MoSDi did. In fact, among the best 10 motifs ([Formula: see text]-value wise) in the non-coding regions of M. tuberculosis reported by the tools MEME, MoSDi and ours, five were discovered by our approach, including the third and the fourth best ones. Our approach achieved this in 1/17 and 1/6 of the time taken by MEME and MoSDi, respectively.


Subject(s)
Algorithms, Bacterial Proteins/genetics, Computational Biology/methods, Mycobacterium tuberculosis/genetics, Nucleotide Motifs, DNA Sequence Analysis/methods, Transcription Factors/metabolism, Bacterial Proteins/metabolism, Binding Sites, Mycobacterium tuberculosis/metabolism, Transcription Factors/genetics
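
The three phases outlined in this abstract (generate, filter, score and rank) can be sketched generically. The version below is a simplified stand-in, not the published algorithm: the filtering constraint (a minimum number of sequences containing an exact copy of the motif) and the ranking score (that same count) are assumptions made for illustration, and all names are hypothetical.

```python
# Sketch only: a generic generate / filter / rank pipeline for string motifs.
# The constraint and the score are placeholder assumptions, not the paper's.
from itertools import product


def generate_motifs(k, alphabet="ACGT"):
    """Phase 1: enumerate every string of length k over the alphabet."""
    return ("".join(letters) for letters in product(alphabet, repeat=k))


def support(motif, sequences):
    """Number of sequences that contain the motif as an exact substring."""
    return sum(motif in seq for seq in sequences)


def mine_motifs(sequences, k, min_support):
    """Phases 2 and 3: drop motifs below min_support, then rank the rest."""
    scored = []
    for motif in generate_motifs(k):
        hits = support(motif, sequences)
        if hits >= min_support:  # phase 2: assumed filtering constraint
            scored.append((motif, hits))
    # Phase 3: highest support first, ties broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [motif for motif, _ in scored]


toy_sequences = ["ACGTAC", "TTACGA", "GGACGT"]  # hypothetical input
ranked = mine_motifs(toy_sequences, k=3, min_support=2)
```

Enumerating all 4^k strings is feasible only for short motifs (about a million candidates at k = 10), which is one reason an early filtering phase matters. A realistic scorer would also tolerate mismatches and assess statistical significance, as the abstract indicates.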