Results 1 - 20 of 27
1.
Int J Mol Sci ; 25(7)2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38612878

ABSTRACT

We developed a procedure for locating genes on Drosophila melanogaster polytene chromosomes and described three types of chromosome structures (gray bands, black bands, and interbands), which differed markedly in morphological and genetic properties. This was achieved through the use of our original methods of molecular and genetic analysis, electron microscopy, and bioinformatics data processing. Analysis of the genome-wide distribution of these properties led us to a bioinformatics model of the Drosophila genome organization, in which the genome was divided into two groups of genes. One was constituted by 6562 genes that are expressed in most cell types during the life cycle and perform basic cellular functions (the so-called "housekeeping genes"). The other one was made up of 3162 genes that are expressed only at particular stages of development ("developmental genes"). These two groups of genes are so different that we may state that the genome has two types of genetic organization. They differ in the timings of their expression, chromatin packaging levels, the composition of activating and deactivating proteins, the sizes of the genes, the lengths of their introns, the organization of their promoter regions, the locations of origin recognition complexes (ORCs), and DNA replication timings.


Subjects
Drosophila , Essential Genes , Animals , Drosophila/genetics , Drosophila melanogaster/genetics , Chromatin , Introns
2.
Mol Syst Biol ; 18(2): e9816, 2022 02.
Article in English | MEDLINE | ID: mdl-35156763

ABSTRACT

The core promoter plays a central role in setting metazoan gene expression levels, but how exactly it "computes" expression remains poorly understood. To dissect its function, we carried out a comprehensive structure-function analysis in Drosophila. First, we performed a genome-wide bioinformatic analysis, providing an improved picture of the sequence motifs architecture. We then measured synthetic promoters' activities of ~3,000 mutational variants with and without an external stimulus (hormonal activation), at large scale and with high accuracy using robotics and a dual luciferase reporter assay. We observed a strong impact on activity of the different types of mutations, including knockout of individual sequence motifs and motif combinations, variations of motif strength, nucleosome positioning, and flanking sequences. A linear combination of the individual motif features largely accounts for the combinatorial effects on core promoter activity. These findings shed new light on the quantitative assessment of gene expression in metazoans.


Subjects
Computational Biology , Drosophila , Animals , Drosophila/genetics , Genome , Genetic Promoter Regions
3.
Genomics ; 112(3): 2107-2118, 2020 05.
Article in English | MEDLINE | ID: mdl-31816430

ABSTRACT

Despite the broad variety of available microRNA (miRNA) prediction tools, their application to the discovery and annotation of novel miRNA genes in domestic species is still limited. In this study we designed a comprehensive pipeline (eMIRNA) for miRNA identification in the yet poorly annotated porcine genome and demonstrated the usefulness of implementing a motif search positional refinement strategy for the accurate determination of precursor miRNA boundaries. The small RNA fraction from gluteus medius skeletal muscle of 48 Duroc gilts was sequenced and used for the prediction of novel miRNA loci. Additionally, we selected the human miRNA annotation for a homology-based search of porcine miRNAs with orthologous genes in the human genome. A total of 20 novel expressed miRNAs were identified in the porcine muscle transcriptome and 27 additional novel porcine miRNAs were also detected by homology-based search using the human miRNA annotation. The existence of three selected novel miRNAs (ssc-miR-483, ssc-miR-484 and ssc-miR-200a) was further confirmed by reverse transcription quantitative real-time PCR analyses in the muscle and liver tissues of Göttingen minipigs. In summary, the eMIRNA pipeline presented in the current work allowed us to expand the catalogue of porcine miRNAs and showed better performance than other commonly used miRNA prediction approaches. More importantly, the flexibility of our pipeline makes its application possible in other yet poorly annotated non-model species.


Subjects
Genome , Genomics/methods , Machine Learning , MicroRNAs/genetics , MicroRNAs/metabolism , Sus scrofa/genetics , Algorithms , Animals , Genetic Loci , Liver/metabolism , MicroRNAs/chemistry , Molecular Sequence Annotation , Skeletal Muscle/metabolism , Nucleotide Motifs , RNA Precursors/chemistry , RNA-Seq , Nucleic Acid Sequence Homology , Sus scrofa/metabolism , Transcriptome
4.
Sensors (Basel) ; 21(23)2021 Dec 01.
Article in English | MEDLINE | ID: mdl-34884039

ABSTRACT

Numerous approaches exist for disaggregating power consumption data, referred to as non-intrusive load monitoring (NILM). Whereas NILM is primarily used for energy monitoring, we intend to disaggregate a household's power consumption to detect human activity in the residence. Therefore, this paper presents a novel approach for NILM, which uses pattern recognition on the raw power waveform of the smart meter measurements to recognize individual household appliance actions. The presented NILM approach is capable of (near) real-time appliance action detection in a streaming setting, using edge computing. It is unique in our approach that we quantify the disaggregating uncertainty using continuous pattern correlation instead of binary device activity states. Further, we outline using the disaggregated appliance activity data for human activity recognition (HAR). To evaluate our approach, we use a dataset collected from actual households. We show that the developed NILM approach works, and the disaggregation quality depends on the pattern selection and the appliance type. In summary, we demonstrate that it is possible to detect human activity within the residence using a motif-detection-based NILM approach applied to smart meter measurements.


Subjects
Human Activities , Psychological Recognition , Humans , Uncertainty
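The idea in entry 4 of scoring a pattern match continuously, rather than as a binary on/off state, can be illustrated with a sliding correlation. This is only a minimal sketch: the abstract does not say which correlation measure the authors use, so Pearson correlation is an assumption here, and the function names `pearson` and `sliding_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def sliding_correlation(signal, pattern):
    """Score the pattern against every window of the signal (assumed measure: Pearson)."""
    w = len(pattern)
    return [pearson(signal[i:i + w], pattern) for i in range(len(signal) - w + 1)]
```

Each window receives a score between -1 and 1, so a downstream step could treat values near 1 as confident appliance actions and intermediate values as uncertain ones.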
5.
BMC Genomics ; 20(Suppl 5): 424, 2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31167665

ABSTRACT

BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report more spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than the (l,d)-motif model. This model is called the (l,d1,d2)-motif model (LDDMS) and is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. RESULTS: We did both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern-driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d1,d2) instances. CONCLUSIONS: We proposed the LDDMS model, which is more practically relevant. We also proposed three exact efficient algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS).


Subjects
Algorithms , Amino Acid Motifs , Nucleotide Motifs , Computational Biology , Humans , Theoretical Models , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods
6.
Entropy (Basel) ; 21(8)2019 Aug 16.
Article in English | MEDLINE | ID: mdl-33267515

ABSTRACT

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To effectively and efficiently discover TFBSs in the thousand or more DNA sequences generated by a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and further filter the cluster subsets. Then, we use Affinity Propagation (AP) clustering on the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that the AP-ChIP algorithm is able to make an almost accurate prediction of TFBSs in a reasonable time. Also, the validity of the AP-ChIP algorithm is tested on a real ChIP-Seq data set.

7.
BMC Bioinformatics ; 19(1): 228, 2018 06 18.
Article in English | MEDLINE | ID: mdl-29914360

ABSTRACT

BACKGROUND: Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. RESULTS: We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. CONCLUSIONS: We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.


Subjects
Algorithms , Computational Biology/methods , DNA/analysis , DNA/genetics , Nucleotide Motifs , DNA Sequence Analysis/methods , DNA/chemistry , Humans
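The qPMS definition given in entry 7 (l-length strings that occur in at least qt of the t input sequences with up to d mismatches) can be written down directly as an exhaustive search. The sketch below is not SamSelect or any published qPMS algorithm; it enumerates every candidate string, so it is only usable for very small l, and the function names are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif: str, seq: str, d: int) -> bool:
    """True if some l-mer of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def qpms_bruteforce(seqs, l, d, q, alphabet="ACGT"):
    """Return every l-length string that occurs in at least q*t of the t sequences."""
    need = q * len(seqs)
    found = []
    for cand in product(alphabet, repeat=l):  # all |alphabet|**l candidates
        motif = "".join(cand)
        if sum(occurs(motif, s, d) for s in seqs) >= need:
            found.append(motif)
    return found
```

Because the cost grows with both the number of candidates and t, the sketch also makes the abstract's observation concrete: every candidate must be checked against all t sequences, which is why running on a smaller sample set D' is attractive.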
8.
BMC Bioinformatics ; 18(1): 504, 2017 Nov 21.
Article in English | MEDLINE | ID: mdl-29157200

ABSTRACT

BACKGROUND: The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods. RESULTS: The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures. CONCLUSIONS: Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.


Subjects
Algorithms , Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Base Pairing , Base Sequence , Endogenous Retroviruses/genetics , Humans , Nucleotides/chemistry , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , RNA/genetics , RNA Folding , Transfer RNA/chemistry , Viral RNA/chemistry , Viral RNA/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism , Thermodynamics
9.
BMC Bioinformatics ; 17(1): 216, 2016 May 18.
Article in English | MEDLINE | ID: mdl-27188396

ABSTRACT

BACKGROUND: In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. RESULTS: We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. CONCLUSIONS: We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .


Subjects
Nucleotide Motifs , RNA/chemistry , RNA Sequence Analysis/methods , Algorithms , Entropy , Humans
10.
BMC Bioinformatics ; 17 Suppl 9: 266, 2016 Jul 19.
Article in English | MEDLINE | ID: mdl-27454113

ABSTRACT

BACKGROUND: The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets. RESULTS: In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences. CONCLUSIONS: The proposed algorithm RefSelect can be used to solve the problem that many pattern-driven PMS algorithms present execution time instability. RefSelect requires a small amount of storage space and is capable of selecting reference sequences efficiently and effectively. Also, the parallel version of RefSelect is provided for handling large data sets.


Subjects
Computational Biology/methods , Proteins/chemistry , Algorithms , Amino Acid Motifs , Protein Domains , Proteins/genetics , Protein Sequence Analysis , Software
11.
Front Bioinform ; 4: 1341479, 2024.
Article in English | MEDLINE | ID: mdl-38379813

ABSTRACT

In the past, several methods have been developed for predicting the single-label subcellular localization of messenger RNA (mRNA). However, only limited methods are designed to predict the multi-label subcellular localization of mRNA. Furthermore, the existing methods are slow and cannot be implemented at a transcriptome scale. In this study, a fast and reliable method has been developed for predicting the multi-label subcellular localization of mRNA that can be implemented at a genome scale. Machine learning-based methods have been developed using mRNA sequence composition, where the XGBoost-based classifier achieved an average area under the receiver operator characteristic (AUROC) of 0.709 (0.668-0.732). In addition to alignment-free methods, we developed alignment-based methods using motif search techniques. Finally, a hybrid technique that combines the XGBoost model and the motif-based approach has been developed, achieving an average AUROC of 0.742 (0.708-0.816). Our method-MRSLpred-outperforms the existing state-of-the-art classifier in terms of performance and computation efficiency. A publicly accessible webserver and a standalone tool have been developed to facilitate researchers (webserver: https://webs.iiitd.edu.in/raghava/mrslpred/).

12.
Med Biol Eng Comput ; 60(2): 511-530, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35020123

ABSTRACT

The detection of inimitable patterns (motifs) occurring in a set of biological sequences could lead to new biological discoveries. Its application in the recognition of transcription factors and their binding sites has demonstrated its importance for understanding gene function, human diseases, and drug design. The literature identifies (ℓ, d) motif search as the widely studied problem in PMS (Planted Motif Search). This paper proposes an efficient optimization algorithm named "Freezing FireFly (FFF)" to solve the (ℓ, d) motif search problem. A new freezing strategy, applied both locally and globally, was added to increase the performance of the basic Firefly algorithm. It freezes the best possible outcoming positions even in the less bright ones. The performance of the proposed algorithm is evaluated on simulated and real datasets. The experimental results show that the proposed algorithm resolves the instance (50, 21) within 1.47 min in the simulated dataset. For real (such as ChIP-seq (Chromatin Immunoprecipitation)) and synthetic datasets, the proposed algorithm runs much faster in comparison to existing state-of-the-art optimization algorithms, including SamSelect, TraverStringRef, PMS8, qPMS9, AlignACE, FMGA, and GSGA.


Subjects
Algorithms , Humans , Binding Sites , Computational Biology , Freezing
13.
Front Plant Sci ; 13: 938545, 2022.
Article in English | MEDLINE | ID: mdl-35968123

ABSTRACT

The position weight matrix (PWM) is the traditional motif model representing transcription factor (TF) binding sites. It assumes that the positions contribute independently to TF binding affinity, although this hypothesis does not fit the data perfectly. This explains why PWM hits are missing in a substantial fraction of ChIP-seq peaks. To study various modes of the direct binding of plant TFs, we compiled a benchmark collection of 111 ChIP-seq datasets for Arabidopsis thaliana, and applied the traditional PWM and two alternative motif models, BaMM and SiteGA, which assume dependencies between positions. Varying the stringency of the recognition thresholds for the models suggested that the hits of the PWM, BaMM, and SiteGA models are associated with sites of high/medium, any, and low affinity, respectively. At the medium recognition threshold, about 60% of ChIP-seq peaks contain PWM hits consisting of conserved core consensuses, while BaMM and SiteGA provide hits for an additional 15% of peaks in which a weaker core consensus is compensated through intra-motif dependencies. The presence/absence of these dependencies in the motifs of alternative/traditional models was confirmed by the dependency logo DepLogo visualizing the position-wise partitioning of the alignments of predicted sites. We exemplify the detailed analysis of ChIP-seq profiles for plant TFs CCA1, MYC2, and SEP3. Gene ontology (GO) enrichment analysis revealed that among the three motif models, SiteGA had the highest portions of genes with significantly enriched GO terms among all predicted genes. We showed that both alternative motif models extend the sites predicted by the traditional PWM to a greater degree for TFs MYC2/SEP3, which have condition/tissue-specific functions, than for TF CCA1, which has housekeeping functions. Overall, the combined application of standard and alternative motif models is beneficial to detect various modes of the direct TF-DNA interactions in the maximal portion of ChIP-seq loci.
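The independence assumption behind the PWM model in entry 13 is easiest to see in code: the score of a candidate site is just the sum of per-position terms. The following is a minimal sketch of a generic log-odds PWM, not the implementation used in the study; the pseudocount, the uniform background of 0.25, and the function names are assumptions made for illustration.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds: each position contributes independently."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(pwm, seq, threshold):
    """Return (offset, score) for every window scoring at or above the threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        s = score(pwm, seq[i:i + w])
        if s >= threshold:
            hits.append((i, s))
    return hits
```

Raising or lowering `threshold` corresponds to the stringency of the recognition threshold discussed in the abstract. Models such as BaMM and SiteGA differ precisely in that their scores are not a plain sum over independent positions.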

14.
Front Big Data ; 4: 806014, 2021.
Article in English | MEDLINE | ID: mdl-35281988

ABSTRACT

Temporal networks are graphs where each edge is linked with a timestamp, denoting when an interaction between two nodes happens. According to the most recently proposed definitions of the problem, motif search in temporal networks consists of finding and counting all connected temporal graphs Q (called motifs) occurring in a larger temporal network T, such that matched target edges follow the same chronological order imposed by edges in Q. In the last few years, several algorithms have been proposed to solve motif search, but most of them are limited to very small or specific motifs due to the computational complexity of the problem. In this paper, we present MODIT (MOtif DIscovery in Temporal Networks), an algorithm for counting motifs of any size in temporal networks, inspired by a very recent algorithm for subgraph isomorphism in temporal networks, called TemporalRI. Experiments show that for big motifs (more than 3 nodes and 3 edges) MODIT can efficiently retrieve them in reasonable time (up to a few hours) in many networks of medium and large size and outperforms state-of-the-art algorithms.

15.
Genes (Basel) ; 11(7)2020 07 02.
Article in English | MEDLINE | ID: mdl-32630768

ABSTRACT

The mTOR signaling controls essential biological functions including proliferation, growth, metabolism, autophagy, ageing, and others. Hyperactivation of mTOR signaling leads to a plethora of human disorders; thus, mTOR is an attractive drug target. The discovery of mTOR signaling started from isolation of rapamycin in 1975 and cloning of TOR genes in 1993. In the past 27 years, numerous research groups have contributed significantly to advancing our understanding of mTOR signaling and mTOR biology. Notably, a variety of experimental approaches have been employed in these studies to identify key mTOR pathway members that shape up the mTOR signaling we know today. Technique development drives mTOR research, while canonical biochemical and yeast genetics lay the foundation for mTOR studies. Here in this review, we summarize major experimental approaches used in the past in delineating mTOR signaling, including biochemical immunoprecipitation approaches, genetic approaches, immunofluorescence microscopic approaches, hypothesis-driven studies, protein sequence or motif search driven approaches, and bioinformatic approaches. We hope that revisiting these distinct types of experimental approaches will provide a blueprint for major techniques driving mTOR research. More importantly, we hope that thinking and reasonings behind these experimental designs will inspire future mTOR research as well as studies of other protein kinases beyond mTOR.


Subjects
Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , TOR Serine-Threonine Kinases/metabolism , Genetic Techniques , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/genetics , TOR Serine-Threonine Kinases/genetics
16.
Bioinform Biol Insights ; 13: 1177932218821365, 2019.
Article in English | MEDLINE | ID: mdl-30670918

ABSTRACT

Heat stress transcription factors (HSFs) regulate transcriptional response to a large number of environmental influences, such as temperature fluctuations and chemical compound applications. Plant HSFs represent a large and diverse gene family. The HSF members vary substantially both in gene expression patterns and molecular functions. HEATSTER is a web resource for mining, annotating, and analyzing members of the different classes of HSFs in plants. A web-interface allows the identification and class assignment of HSFs, intuitive searches in the database and visualization of conserved motifs, and domains to classify novel HSFs.

17.
Front Plant Sci ; 8: 709, 2017.
Article in English | MEDLINE | ID: mdl-28523013

ABSTRACT

Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stress, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).
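The GXGGRRKK linker motif mentioned in entry 17 (where X can be any amino acid) maps naturally onto a regular expression. The sketch below is only an illustration of that one pattern, not the authors' K-segment definition or search procedure. Restricting X to the 20 standard one-letter codes is an assumption, and `find_linker` is a hypothetical helper name.

```python
import re

# X may be any amino acid. A bare "." would also match gap or stop symbols,
# so this sketch restricts X to the 20 standard one-letter codes (assumption).
LINKER = re.compile(r"G[ACDEFGHIKLMNPQRSTVWY]GGRRKK")


def find_linker(protein: str):
    """Return (start, matched_text) for each GXGGRRKK occurrence."""
    return [(m.start(), m.group()) for m in LINKER.finditer(protein.upper())]
```

For example, `find_linker("MSSSSSGAGGRRKKEKKG")` reports one match starting at index 6.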

18.
J Mol Biol ; 429(23): 3587-3605, 2017 11 24.
Article in English | MEDLINE | ID: mdl-28988954

ABSTRACT

Coarse-grained models represent attractive approaches to analyze and simulate ribonucleic acid (RNA) molecules, for example, for structure prediction and design, as they simplify the RNA structure to reduce the conformational search space. Our structure prediction protocol RAGTOP (RNA-As-Graphs Topology Prediction) represents RNA structures as tree graphs and samples graph topologies to produce candidate graphs. However, for a more detailed study and analysis, construction of atomic models from coarse-grained models is required. Here we present our graph-based fragment assembly algorithm (F-RAG) to convert candidate three-dimensional (3D) tree graph models, produced by RAGTOP, into atomic structures. We use our related RAG-3D utilities to partition graphs into subgraphs and search for structurally similar atomic fragments in a data set of RNA 3D structures. The fragments are edited and superimposed using common residues, full atomic models are scored using RAGTOP's knowledge-based potential, and the geometries of top-scoring models are optimized. To evaluate our models, we assess all-atom RMSDs and Interaction Network Fidelity (a measure of residue interactions) with respect to experimentally solved structures and compare our results to other fragment assembly programs. For a set of 50 RNA structures, we obtain atomic models with reasonable geometries and interactions, particularly good for RNAs containing junctions. Additional improvements to our protocol and databases are outlined. These results provide a good foundation for further work on RNA structure prediction and design applications.


Subjects
Algorithms , Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Nucleic Acid Databases , Humans , Molecular Models
19.
Methods Mol Biol ; 1468: 121-38, 2017.
Article in English | MEDLINE | ID: mdl-27662874

ABSTRACT

Transcriptional enhancers are DNA regulatory elements that are bound by transcription factors and act to positively regulate the expression of nearby or distally located target genes. Enhancers have many features that have been discovered using genomic analyses. Recent studies have shown that active enhancers recruit RNA polymerase II (Pol II) and are transcribed, producing enhancer RNAs (eRNAs). GRO-seq, a method for identifying the location and orientation of all actively transcribing RNA polymerases across the genome, is a powerful approach for monitoring nascent enhancer transcription. Furthermore, the unique pattern of enhancer transcription can be used to identify enhancers in the absence of any information about the underlying transcription factors. Here, we describe the computational approaches required to identify and analyze active enhancers using GRO-seq data, including data pre-processing, alignment, and transcript calling. In addition, we describe protocols and computational pipelines for mining GRO-seq data to identify active enhancers, as well as known transcription factor binding sites that are transcribed. Furthermore, we discuss approaches for integrating GRO-seq-based enhancer data with other genomic data, including target gene expression and function. Finally, we describe molecular biology assays that can be used to confirm and explore further the function of enhancers that have been identified using genomic assays. Together, these approaches should allow the user to identify and explore the features and biological functions of new cell type-specific enhancers.


Subjects
Computational Biology/methods , Data Mining/methods , Genetic Enhancer Elements , Humans , RNA Polymerase II/metabolism , Long Noncoding RNA/genetics , Sequence Alignment , RNA Sequence Analysis , Genetic Transcription
20.
J Comput Biol ; 23(7): 615-23, 2016 07.
Article in English | MEDLINE | ID: mdl-27152692

ABSTRACT

Motif finding is an important and challenging problem in many biological applications such as discovering promoters, enhancers, locus control regions, transcription factors, and more. The (l, d)-planted motif search, PMS, is one of several variations of the problem. In this problem, there are n given sequences over alphabets of size [Formula: see text], each of length m, and two given integers l and d. The problem is to find a motif m of length l, where in each sequence there is at least an l-mer at a Hamming distance of [Formula: see text] of m. In this article, we propose ET-Motif, an algorithm that can solve the PMS problem in [Formula: see text] time and [Formula: see text] space. The time bound can be further reduced by a factor of m with [Formula: see text] space. In case the suffix tree that is built for the input sequences is balanced, the problem can be solved in [Formula: see text] time and [Formula: see text] space. Similarly, the time bound can be reduced by a factor of m using [Formula: see text] space. Moreover, the variations of the problem, namely the edit distance PMS and edited PMS (Quorum), can be solved using ET-Motif with simple modifications but with higher upper bounds on space and time. For edit distance PMS, the time and space bounds will be increased by [Formula: see text], while for edited PMS the increase will be of [Formula: see text] in the time bound.


Subjects
DNA Sequence Analysis/methods , Algorithms , Computational Biology/methods