1.
J Biomol Struct Dyn ; : 1-9, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38165232

ABSTRACT

The interphase chromatin structure is extremely complex, precise and dynamic. Experimental methods can only show the frequency of interaction of the various parts of the chromatin. Therefore, it is extremely important to develop theoretical methods to predict the chromatin structure. In this publication, we implemented an extended version of the SBS model described by Barbieri et al. and created the ChroMC program that is easy to use and freely available (https://github.com/regulomics/chroMC) to other users. We also describe the necessary factors for the effective modeling of the chromatin structure in Drosophila melanogaster. We compared results of chromatin structure predictions using two methods: Monte Carlo and Molecular Dynamics. Our simulations suggest that incorporating black, non-reactive chromatin is necessary for successful prediction of chromatin structure, while the loop extrusion model with a long range attraction potential or Lennard-Jones (with local attraction force) as well as using Hi-C data as input are not essential for the basic structure reconstruction. We also proposed a new way to calculate the similarity of the properties of contact maps including the calculation of local similarity. Communicated by Ramaswamy H. Sarma.

4.
Sci Rep ; 11(1): 15668, 2021 08 02.
Article in English | MEDLINE | ID: mdl-34341417

ABSTRACT

Genome-wide studies have uncovered specific genetic alterations, transcriptomic patterns and epigenetic profiles associated with different glioma types. We have recently created a unique atlas encompassing genome-wide profiles of open chromatin, histone H3K27ac and H3Kme3 modifications, DNA methylation and transcriptomes of 33 glioma samples of different grades. Here, we intersected genome-wide atlas data with topologically associating domains (TADs) and demonstrated that the chromatin organization and epigenetic landscape of enhancers have a strong impact on genes differentially expressed in WHO low grade versus high grade gliomas. We identified TADs enriched in glioma grade-specific genes and/or epigenetic marks. We found the set of transcription factors, including REST, E2F1 and NFKB1, that are most likely to regulate gene expression in multiple TADs, containing specific glioma-related genes. Moreover, many genes associated with the cell-matrix adhesion Gene Ontology group, in particular 14 PROTOCADHERINs, were found to be regulated by long-range contacts with enhancers. Presented results demonstrate the existence of epigenetic differences associated with chromatin organization driving differential gene expression in gliomas of different malignancy.


Subjects
Chromatin; Epigenesis, Genetic; Glioma; Chromosomes; Enhancer Elements, Genetic; Evolution, Molecular; Humans
5.
Int J Mol Sci ; 22(15)2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34360860

ABSTRACT

Maps of Hi-C contacts between promoters and enhancers can be analyzed as networks, with cis-regulatory regions as nodes and their interactions as edges. We checked if in the published promoter-enhancer network of mouse embryonic stem (ES) cells the differences in the node type (promoter or enhancer) and the node degree (number of regions interacting with a given promoter or enhancer) are reflected by sequence composition or sequence similarity of the interacting nodes. We used counts of all k-mers (k = 4) to analyze the sequence composition and the Euclidean distance between the k-mer count vectors (k-mer distance) as the measure of sequence (dis)similarity. The results we obtained with 4-mers are interpretable in terms of dinucleotides. Promoters are GC-rich as compared to enhancers, which is known. Enhancers are enriched in scaffold/matrix attachment regions (S/MARs) patterns and depleted of CpGs. Furthermore, we show that promoters are more similar to their interacting enhancers than vice-versa. Most notably, in both promoters and enhancers, the GC content and the CpG count increase with the node degree. As a consequence, enhancers of higher node degree become more similar to promoters, whereas higher degree promoters become less similar to enhancers. We confirmed the key results also for human keratinocytes.


Subjects
Enhancer Elements, Genetic; Gene Regulatory Networks; Models, Genetic; Mouse Embryonic Stem Cells/metabolism; Animals; Base Composition; Computational Biology; CpG Islands; Humans; Keratinocytes/metabolism; Mice
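
The k-mer distance described in this abstract can be illustrated with a minimal sketch. It assumes raw (unnormalized) counts of all 256 DNA 4-mers and a plain Euclidean distance; the abstract does not say whether the authors normalize counts or merge reverse complements, and the function names `kmer_counts` and `kmer_distance` are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_counts(seq, k=4):
    """Count every DNA k-mer in seq, returned in fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    return [counts[kmer] for kmer in kmers]


def kmer_distance(seq_a, seq_b, k=4):
    """Euclidean distance between the k-mer count vectors of two sequences."""
    a = kmer_counts(seq_a, k)
    b = kmer_counts(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Identical sequences give a distance of 0, and the distance grows as the k-mer composition of the two regions diverges. With raw counts the distance also depends on sequence length, so regions of similar length are the natural use case for this sketch.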
6.
Int J Mol Sci ; 22(15)2021 Jul 29.
Article in English | MEDLINE | ID: mdl-34360892

ABSTRACT

The explosive development of next-generation sequencing-based technologies has allowed us to take an unprecedented look at many molecular signatures of the non-coding genome. In particular, the ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing) technique is now very commonly used to assess the proteins associated with different non-coding DNA regions genome-wide. While the analysis of such data related to transcription factor binding is relatively straightforward, many modified histone variants, such as H3K27me3, are very important for the process of gene regulation but are very difficult to interpret. We propose a novel method, called HERON (HiddEn MaRkov mOdel based peak calliNg), for genome-wide data analysis that is able to detect DNA regions enriched for a certain feature, even in difficult settings of weakly enriched long DNA domains. We demonstrate the performance of our method both on simulated and experimental data.


Subjects
Chromatin Immunoprecipitation Sequencing/methods; DNA/genetics; DNA/metabolism; Genome, Human; Histones/genetics; Histones/metabolism; Adult; Algorithms; Gene Expression; Gene Expression Regulation; Hippocampus/embryology; Hippocampus/metabolism; Histone Code/genetics; Humans; Liver/metabolism; Methylation; Normal Distribution; Protein Binding
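
The general idea of hidden Markov model based domain calling can be sketched as follows. This is not HERON's implementation: the two states, the Gaussian emissions, the fixed parameters and the names `viterbi` and `enriched_domains` are all assumptions made for illustration, since the abstract does not specify the model. The sketch decodes the most likely state path over binned coverage and reports runs of the enriched state.

```python
from math import log, pi


def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * log(2 * pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(signal, means, sds, stay=0.9):
    """Most likely state path for an HMM with Gaussian emissions.

    Assumed setup: uniform start probabilities, probability `stay` of
    remaining in a state, the rest shared equally among other states.
    """
    if not signal:
        return []
    n = len(means)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[log(stay) if i == j else log(switch) for j in range(n)]
                 for i in range(n)]
    score = [log(1.0 / n) + log_gauss(signal[0], means[s], sds[s])
             for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for x in signal[1:]:
        new_score, pointers = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + log_gauss(x, means[s], sds[s]))
        score = new_score
        back.append(pointers)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # trace back from the last bin
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def enriched_domains(path, enriched_state=1):
    """Half-open (start, end) bin intervals covered by the enriched state."""
    domains, start = [], None
    for i, state in enumerate(path):
        if state == enriched_state and start is None:
            start = i
        elif state != enriched_state and start is not None:
            domains.append((start, i))
            start = None
    if start is not None:
        domains.append((start, len(path)))
    return domains
```

Because switching states carries a cost, a single moderately elevated bin is absorbed into the background, while a sustained run of elevated bins is reported as one domain. That smoothing is what makes this family of models attractive for long, weakly enriched regions.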
7.
NAR Genom Bioinform ; 3(3): lqab069, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34327330

ABSTRACT

Despite a great increase in the amount of data from genome-wide association studies (GWAS) and whole-genome sequencing (WGS), the genetic background of the partially heritable Alzheimer's disease (AD) is not yet fully understood. Machine learning methods are expected to help researchers in the analysis of the large number of SNPs possibly associated with the disease onset. To date, a number of such approaches have been applied to genotype-based classification of AD patients and healthy controls using GWAS data, with reported accuracies of 0.65-0.975. However, since the estimated influence of genotype on sporadic AD occurrence is lower than that, these very high classification accuracies may potentially be a result of overfitting. We have explored the possibilities of applying feature selection and classification using random forests to WGS and GWAS data from two datasets. Our results suggest that this approach is prone to overfitting if feature selection is performed before division of data into the training and testing set. Therefore, we recommend avoiding selection of features used to build the model based on data included in the testing set. We suggest that for currently available dataset sizes the expected classifier performance is between 0.55 and 0.7 (AUC) and higher accuracies reported in literature are likely a result of overfitting.
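
The recommendation above (split first, then select features) can be sketched in a few lines. The example assumes a simple filter score, the absolute difference of per-class feature means, as a stand-in for the random-forest based selection used in the paper; all function names are hypothetical.

```python
def mean_difference_scores(rows, labels):
    """Score each feature by |mean in cases (1) - mean in controls (0)|."""
    cases = [r for r, y in zip(rows, labels) if y == 1]
    controls = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        case_mean = sum(r[j] for r in cases) / len(cases)
        control_mean = sum(r[j] for r in controls) / len(controls)
        scores.append(abs(case_mean - control_mean))
    return scores


def select_top_features(rows, labels, k):
    """Indices of the k highest-scoring features on the given rows."""
    scores = mean_difference_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def split_then_select(rows, labels, test_idx, k):
    """Leak-free selection: rank features using training rows only."""
    held_out = set(test_idx)
    train = [i for i in range(len(rows)) if i not in held_out]
    return select_top_features([rows[i] for i in train],
                               [labels[i] for i in train], k)


# Toy genotype matrix: feature 0 separates classes in the training rows,
# feature 1 looks informative only because of one held-out sample.
rows = [[2, 0], [0, 0], [2, 0], [0, 0], [0, 9], [0, 0]]
labels = [1, 0, 1, 0, 1, 0]
leaky = select_top_features(rows, labels, k=1)               # sees test rows
clean = split_then_select(rows, labels, test_idx=[4, 5], k=1)
```

In this toy case the leaky selection picks feature 1, which was chosen because of a test sample, whereas the leak-free selection picks feature 0. Any performance estimate computed on the test rows after the leaky selection would be optimistically biased, which is the overfitting mechanism the abstract warns about.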

8.
Nat Commun ; 12(1): 3621, 2021 06 15.
Article in English | MEDLINE | ID: mdl-34131149

ABSTRACT

Chromatin structure and accessibility, and combinatorial binding of transcription factors to regulatory elements in genomic DNA control transcription. Genetic variations in genes encoding histones, epigenetics-related enzymes or modifiers affect chromatin structure/dynamics and result in alterations in gene expression contributing to cancer development or progression. Gliomas are brain tumors frequently associated with epigenetics-related gene deregulation. We perform whole-genome mapping of chromatin accessibility, histone modifications, DNA methylation patterns and transcriptome analysis simultaneously in multiple tumor samples to unravel epigenetic dysfunctions driving gliomagenesis. Based on the results of the integrative analysis of the acquired profiles, we create an atlas of active enhancers and promoters in benign and malignant gliomas. We explore these elements and intersect with Hi-C data to uncover molecular mechanisms instructing gene expression in gliomas.


Subjects
Chromatin; Glioma/genetics; Regulatory Sequences, Nucleic Acid; Binding Sites; Brain Neoplasms/genetics; Chromatin Immunoprecipitation; DNA/metabolism; DNA Methylation; DNA-Binding Proteins/metabolism; Enhancer of Zeste Homolog 2 Protein; Epigenesis, Genetic; Epigenomics; Forkhead Box Protein M1; Gene Expression; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Glioblastoma; Histone Code; Histones; Humans; Promoter Regions, Genetic; Transcription Factors/metabolism
9.
PeerJ ; 9: e10558, 2021.
Article in English | MEDLINE | ID: mdl-33981483

ABSTRACT

MOTIVATION: Computational analysis of chromosomal contact data is currently gaining popularity with the rapid advance in experimental techniques providing access to a growing body of data. An important problem in this area is the identification of long range contacts between distinct chromatin regions. Such loops were shown to exist at different scales, either mediating relatively short range interactions between enhancers and promoters or providing interactions between much larger, distant chromosome domains. A proper statistical analysis as well as availability to a wide research community are crucial in a tool for this task. RESULTS: We present HiCEnterprise, a first freely available software tool for identification of long range chromatin contacts not only between small regions, but also between chromosomal domains. It implements four different statistical tests for identification of significant contacts for user defined regions or domains as well as necessary functions for input, output and visualization of chromosome contacts. AVAILABILITY: The software and the corresponding documentation are available at: github.com/regulomics/HiCEnterprise. SUPPLEMENTARY INFORMATION: Supplemental data are available in the online version of the article and at the website regulomics.mimuw.edu.pl/wp/hicenterprise.

10.
Methods ; 181-182: 80-85, 2020 10 01.
Article in English | MEDLINE | ID: mdl-31445092

ABSTRACT

Recent years have brought us a great wealth of new types of experimental data on different aspects of chromatin state, from chromosome conformation assays, through super-resolution microscopic imaging to epigenetic modifications and lamina interaction assays. This rapid increase in data availability has motivated many novel approaches to 3D modeling of chromosomes, their conformations and dynamic behavior. Even though there are many tools already developed for molecular visualization in the field of structural bioinformatics, they are usually optimized for visualization of smaller molecules (like proteins) and much shorter trajectories. We have developed a novel approach to visualization of long trajectories of large polymers, typical in the field of chromatin modeling. Our software, called QChromosomeVisualizer (QCV), allows for quick visualization of long simulations containing thousands or even millions of frames and generating good-looking still images and animations including spherical 360 videos that can be viewed in VR headsets. We believe that tools of this kind will be helpful for the broader community of researchers interested in modeling by allowing them to create new and clearer ways to communicate their results.


Subjects
Chromosomes/chemistry; Computational Biology/methods; Data Visualization; Imaging, Three-Dimensional/methods; Software; Chromatin/chemistry; Molecular Conformation; Polymers/chemistry; Virtual Reality
11.
J Theor Biol ; 486: 110091, 2020 02 07.
Article in English | MEDLINE | ID: mdl-31790679

ABSTRACT

Gene regulatory networks are a popular tool for modelling important biological phenomena, such as cell differentiation or oncogenesis. Efficient identification of the causal connections between genes, their products and regulating transcription factors, is key to understanding how defects in their function may trigger diseases. Modelling approaches should keep up with the ever more detailed descriptions of the biological phenomena at play, as provided by new experimental findings and technical improvements. In recent years, we have seen great improvements in mapping of specific binding sites of many transcription factors to distinct regulatory regions. Recent gene regulatory network models use binding measurements, but usually only to define gene-to-gene interactions, ignoring regulatory module structure. Moreover, the current huge amount of transcriptomic data, and exploration of all possible cis-regulatory arrangements which can lead to the same transcriptomic response, make manual model building both tedious and time-consuming. In our paper, we propose a method to specify possible regulatory connections in a given Boolean network, based on transcription factor binding evidence. This is implemented by an algorithm which expands a regular Boolean network model into a "cis-regulatory" Boolean network model. This expanded model explicitly defines regulatory regions as additional nodes in the network, and adds new, valuable biological insights to the system dynamics. The expanded model can automatically be compared with expression data, and, for each node, a regulatory function consistent with the experimental data can be found. The resulting models are usually more constrained (by biologically-motivated metadata), and can then be inspected in in silico simulations. The fully automated method for model identification has been implemented in Python, and the expansion algorithm in R. The method resorts to the Z3 Satisfiability Modulo Theories (SMT) solver, and is similar to the RE:IN application (Yordanov et al., 2016). It is available at https://github.com/regulomics/expansion-network.


Subjects
Computational Biology; Gene Regulatory Networks; Algorithms; Binding Sites; Computer Simulation
12.
J Comput Biol ; 26(4): 305-314, 2019 04.
Article in English | MEDLINE | ID: mdl-30810370

ABSTRACT

Studying the three-dimensional structure of chromosomes is an emerging field flourishing in recent years because of rapid development of experimental approaches for studying chromosomal contacts. This has led to numerous studies providing results of segmentation of chromosome sequences of different species into so-called topologically associating domains (TADs). As the number of such studies grows steadily and many of them make claims about the perceived differences between TAD structures observed in different conditions, there is a growing need for good measures of similarity (or dissimilarity) between such segmentations. We provide here a bipartite (BP) score, which is a relatively simple distance metric based on the bipartite matching between two segmentations. In this article, we provide the rationale behind choosing specifically this function and show its results on several different data sets, both simulated and experimental. We show not only that the BP score is a proper metric satisfying the triangle inequality, but also that it provides good granularity of scores for typical situations occurring between different TAD segmentations. We also introduce a local variant of the BP metric and show that in actual comparisons between experimental data sets, the local BP score correlates with the observed changes in gene expression and genome methylation. In summary, we consider the BP score a good foundation for analyzing the dynamics of chromosome structures. The methodology we present in this study could be used by many researchers in their ongoing analyses, making it a popular and useful tool.


Subjects
Chromosomes, Human/chemistry; Computational Biology/methods; Algorithms; Chromatin/chemistry; Humans; Molecular Conformation
13.
PeerJ ; 6: e5692, 2018.
Article in English | MEDLINE | ID: mdl-30364537

ABSTRACT

BACKGROUND: Bayesian networks are directed acyclic graphical models widely used to represent the probabilistic relationships between random variables. They have been applied in various biological contexts, including gene regulatory networks and protein-protein interactions inference. Generally, learning Bayesian networks from experimental data is NP-hard, leading to widespread use of heuristic search methods giving suboptimal results. However, in cases when the acyclicity of the graph can be externally ensured, it is possible to find the optimal network in polynomial time. While our previously developed tool BNFinder implements a polynomial-time algorithm, reconstructing networks from large amounts of experimental data still leads to excessively long computations on a single CPU. RESULTS: In the present paper we propose a parallelized algorithm designed for multi-core and distributed systems and its implementation in the improved version of BNFinder, a tool for learning optimal Bayesian networks. The new algorithm has been tested on different simulated and experimental datasets showing that it has much better efficiency of parallelization than the previous version. BNFinder gives comparable results in terms of accuracy with respect to current state-of-the-art inference methods, giving significant advantage in cases when external information such as a list of regulators or prior edge probability can be introduced, particularly for datasets with static gene expression observations. CONCLUSIONS: We show that the new method can be used to reconstruct networks in the size range of thousands of genes making it practically applicable to whole genome datasets of prokaryotic systems and large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets.

14.
BMC Cancer ; 18(1): 23, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29301499

ABSTRACT

BACKGROUND: The Grainyhead-like (GRHL) transcription factors have been linked to many different types of cancer. However, no previous study has attempted to investigate potential correlations in expression of different GRHL genes in this context. Furthermore, there is very little information concerning damaging mutations and/or single nucleotide polymorphisms in GRHL genes that may be linked to cancer. METHODS: DNA and RNA were extracted from human non-melanoma skin cancers (NMSC) and adjacent normal tissues (n = 33 pairs of samples). The expression of GRHL genes was measured by quantitative real time PCR. Regulation of GRHL expression by miRNA was studied using cell transfection methods and a dual-luciferase reporter system. Targeted deep sequencing of GRHL genes in tumor samples and control tissues was employed to search for mutations and single nucleotide polymorphisms. Single marker rs141193530 was genotyped with pyrosequencing in an additional NMSC replication cohort (n = 176). Appropriate statistical and bioinformatic methods were used to analyze and interpret results. RESULTS: We discovered that the expression of two genes - GRHL1 and GRHL3 - is reduced in a coordinated manner in tumor samples, in comparison to the control healthy skin samples obtained from the same individuals. It is possible that both GRHL1 and GRHL3 are regulated, at least to some extent, by different strands of the same oncogenic microRNA - miR-21, which would at least partially explain the observed correlation. No de novo mutations in the GRHL genes were detected in the examined tumor samples. However, some single nucleotide polymorphisms in the GRHL genes occur at significantly altered frequencies in the examined group of NMSC patients. CONCLUSIONS: Non-melanoma skin cancer growth is accompanied by coordinated reduced expression of epidermal differentiation genes: GRHL1 and GRHL3, which may be regulated by miR-21-3p and -5p, respectively. Some potentially damaging single nucleotide polymorphisms in GRHL genes occur with altered frequencies in NMSC patients, and they may in particular impair the expression of the GRHL3 gene or the functioning of the encoded protein. The presence of these polymorphisms may indicate an increased risk of NMSC development in affected people.


Subjects
DNA-Binding Proteins/genetics; MicroRNAs/genetics; Repressor Proteins/genetics; Skin Neoplasms/genetics; Transcription Factors/genetics; Cell Differentiation/genetics; Epidermis/growth & development; Epidermis/pathology; Female; Gene Expression Regulation, Neoplastic/genetics; Humans; Male; Mutation; Polymorphism, Single Nucleotide/genetics; Skin Neoplasms/pathology
15.
Nucleic Acids Res ; 46(4): 1724-1740, 2018 02 28.
Article in English | MEDLINE | ID: mdl-29216379

ABSTRACT

Endothelial cells (ECs) differentiate from mesodermal progenitors during vasculogenesis. By comparing changes in chromatin interactions between human umbilical vein ECs, embryonic stem cells and mesendoderm cells, we identified regions exhibiting EC-specific compartmentalization and changes in the degree of connectivity within topologically associated domains (TADs). These regions were characterized by EC-specific transcription, binding of lineage-determining transcription factors and cohesin. In addition, we identified 1200 EC-specific long-range interactions (LRIs) between TADs. Most of the LRIs were connected between regions enriched for H3K9me3 involving pericentromeric regions, suggesting their involvement in establishing compartmentalization of heterochromatin during differentiation. Second, we provide evidence that EC-specific LRIs correlate with changes in the hierarchy of chromatin aggregation. Despite these rearrangements, the majority of chromatin domains fall within a pre-established hierarchy conserved throughout differentiation. Finally, we investigated the effect of hypoxia on chromatin organization. Although hypoxia altered the expression of hundreds of genes, minimal effect on chromatin organization was seen. Nevertheless, 70% of hypoxia-inducible genes were situated within a TAD bound by HIF1α, suggesting that transcriptional responses to hypoxia largely depend on pre-existing chromatin organization. Collectively our results show that large structural rearrangements establish chromatin architecture required for functional endothelium and this architecture remains largely unchanged in response to hypoxia.


Subjects
Chromatin/metabolism; Human Umbilical Vein Endothelial Cells/metabolism; Cell Cycle Proteins/metabolism; Cell Differentiation; Cell Hypoxia; Cells, Cultured; Chromosomal Proteins, Non-Histone/metabolism; Epigenesis, Genetic; Heterochromatin; Humans; Transcription, Genetic; Cohesins
16.
BMC Med Genomics ; 10(Suppl 1): 34, 2017 05 24.
Article in English | MEDLINE | ID: mdl-28589862

ABSTRACT

BACKGROUND: Many genetic diseases are caused by mutations in non-coding regions of the genome. These mutations are frequently found in enhancer sequences, causing disruption to the regulatory program of the cell. Enhancers are short regulatory sequences in the non-coding part of the genome that are essential for the proper regulation of transcription. While the experimental methods for identification of such sequences are improving every year, our understanding of the rules behind the enhancer activity has not progressed much in the last decade. This is especially true in the case of tissue-specific enhancers, where there are clear problems in predicting specificity of enhancer activity. RESULTS: We show a random-forest based machine learning approach capable of matching the performance of the current state-of-the-art methods for enhancer prediction. We then show that, like other published methods, it frequently cross-predicts enhancers as active in different tissues, making it less useful for predicting tissue-specific activity. We further show that the problem is related to the fact that the enhancer predicting models exhibit a bias towards predicting gene promoters as active enhancers. Finally, we show that using a two-step classifier can lead to lower cross-prediction between tissues. CONCLUSIONS: We provide whole-genome predictions of human heart and brain enhancers obtained with the two-step classifier.


Subjects
Enhancer Elements, Genetic/genetics; Genomics/methods; Promoter Regions, Genetic/genetics; Base Sequence; Histones/genetics; Humans; Organ Specificity
17.
Mol Carcinog ; 56(11): 2414-2423, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28543713

ABSTRACT

The involvement of Grainyhead-like (GRHL) transcription factors in various cancers is well documented. However, little is known about their role in clear cell renal cell carcinoma (ccRCC). We discovered that the expression of two of these factors, GRHL1 and GRHL2, is downregulated in ccRCC samples, and their expression is correlated with the expression of the VHL gene. This suggests a functional link between the GRHL transcription factors and one of the best known tumor suppressors. Although the GRHL genes are not mutated in ccRCC, some of the single nucleotide polymorphisms in these genes may indicate an increased risk of ccRCC development and/or may allow assessment of patients' prognoses and prediction of their responses to various forms of therapy. Silencing of GRHL2 expression in a non-tumorigenic kidney cell line results in increased cell proliferation, increased resistance to apoptosis, as well as changes in the levels of selected proteins involved in the pathogenesis of ccRCC. These changes support a potential role for GRHL2 as a suppressor of ccRCC.


Subjects
Carcinoma, Renal Cell/genetics; DNA-Binding Proteins/genetics; Gene Expression Regulation, Neoplastic; Kidney Neoplasms/genetics; Kidney/pathology; Transcription Factors/genetics; Carcinoma, Renal Cell/pathology; Cell Line, Tumor; Female; Gene Silencing; Humans; Kidney/metabolism; Kidney Neoplasms/pathology; Male; Polymorphism, Single Nucleotide; Repressor Proteins/genetics
18.
Nucleic Acids Res ; 45(6): 3116-3129, 2017 04 07.
Article in English | MEDLINE | ID: mdl-27994035

ABSTRACT

ATP-dependent chromatin remodeling complexes are important regulators of gene expression in Eukaryotes. In plants, SWI/SNF-type complexes have been shown to be critical for transcriptional control of key developmental processes, growth and stress responses. To gain insight into mechanisms underlying these roles, we performed whole genome mapping of the SWI/SNF catalytic subunit BRM in Arabidopsis thaliana, combined with transcript profiling experiments. Our data show that BRM occupies thousands of sites in the Arabidopsis genome, most of which are located within or close to genes. Among identified direct BRM transcriptional targets almost equal numbers were up- and downregulated upon BRM depletion, suggesting that BRM can act as both activator and repressor of gene expression. Interestingly, in addition to genes showing a canonical pattern of BRM enrichment near the transcription start site, many other genes showed a transcription termination site-centred BRM occupancy profile. We found that BRM-bound 3′ gene regions have promoter-like features, including presence of TATA boxes and high H3K4me3 levels, and possess high antisense transcriptional activity which is subjected to both activation and repression by the SWI/SNF complex. Our data suggest that binding to gene terminators and controlling transcription of non-coding RNAs is another way through which the SWI/SNF complex regulates expression of its targets.


Subjects
Adenosine Triphosphatases/metabolism; Arabidopsis Proteins/metabolism; Arabidopsis/genetics; Gene Expression Regulation, Plant; Promoter Regions, Genetic; Terminator Regions, Genetic; 3' Flanking Region; Arabidopsis/metabolism; Binding Sites; RNA, Antisense/biosynthesis; RNA, Messenger/biosynthesis; Transcription, Genetic
19.
J Comput Biol ; 24(3): 193-199, 2017 Mar.
Article in English | MEDLINE | ID: mdl-27710048

ABSTRACT

Here, we provide a new software tool, called FastBill, for prediction of evolutionarily conserved cis-regulatory modules. It improves on the previous version of our program, called Billboard, by improving the statistical significance calculation. It is also faster than the original Billboard, allowing for large-scale analyses, including multiple informant species. We illustrate the utility of FastBill by performing a large-scale computational experiment of enhancer prediction in the promoter area of more than 150 Drosophila melanogaster genes that possess annotated experimentally verified enhancers. FastBill is written in Python and is freely available for download as a standalone tool.


Subjects
Drosophila melanogaster/genetics; Drosophila/genetics; Enhancer Elements, Genetic; Genes, Insect; Promoter Regions, Genetic; Software; Animals; Drosophila/classification; Evolution, Molecular; Molecular Sequence Annotation; Phylogeny
20.
Int J Genomics ; 2015: 563482, 2015.
Article in English | MEDLINE | ID: mdl-26558255

ABSTRACT

Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate new, modified reference sequence that is closer to the actual sequenced genome and has a full coverage. In this paper, we describe our approach and test its implementation called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on sourceforge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.
