Results 1 - 20 of 82
1.
Nucleic Acids Res ; 51(D1): D564-D570, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350659

ABSTRACT

We present an update of EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets, and products, which is openly accessible at http://epifactors.autosome.org. The updated version of EpiFactors contains information on 902 proteins, including 101 histones and protamines, and, as a main update, a newly curated collection of 124 lncRNAs involved in epigenetic regulation. The number of publications concerning the role of lncRNA in epigenetics is rapidly growing. Yet a resource that compiles, integrates, organizes, and presents curated information on lncRNAs in epigenetics has been missing. EpiFactors fills this gap and provides data on epigenetic regulators in an accessible and user-friendly form. For 820 of the genes in EpiFactors, we include expression estimates across multiple cell types assessed by CAGE-Seq in the FANTOM5 project. In addition, the updated EpiFactors contains information on 73 protein complexes involved in epigenetic regulation. Our resource is practical for a wide range of users, including biologists, bioinformaticians and molecular/systems biologists.


Subjects
Databases, Genetic , Epigenesis, Genetic , Humans , Histones/genetics , Histones/metabolism , Protamines , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
2.
Mol Genet Genomics ; 298(3): 555-566, 2023 May.
Article in English | MEDLINE | ID: mdl-36856825

ABSTRACT

The cancer syndrome polymerase proofreading-associated polyposis (PPAP) results from germline mutations in the POLE and POLD1 genes. Mutations in the exonuclease domain of these genes are associated with hyper- and ultra-mutated tumors with a predominance of base substitutions resulting from faulty proofreading during DNA replication. When a new variant is identified by gene testing of POLE and POLD1, it is important to verify whether the variant is associated with PPAP or not, to guide genetic counseling of mutation carriers. In 2015, we reported the likely pathogenic (class 4) germline POLE c.1373A > T p.(Tyr458Phe) variant and we have now characterized this variant to verify that it is a class 5 pathogenic variant. For this purpose, we investigated (1) the mutator phenotype in tumors from two carriers, (2) the mutation frequency in cell-based mutagenesis assays, and (3) structural consequences based on protein modeling. Whole-exome sequencing of two tumors identified an ultra-mutator phenotype with a predominance of base substitutions, the majority of which are C > T. A SupF mutagenesis assay revealed increased mutation frequency in cells overexpressing the variant of interest as well as in isogenic cells encoding the variant. Moreover, a yeast-based exonuclease repair assay supported a defect in proofreading activity. Lastly, we present a homology model of human POLE to demonstrate the structural consequences leading to the pathogenic impact of the p.(Tyr458Phe) mutation. The three lines of evidence, taken together with updated co-segregation and previously published data, allow the germline variant POLE c.1373A > T p.(Tyr458Phe) to be reclassified as a class 5 variant. That means the variant is associated with PPAP.


Subjects
DNA Polymerase II , Neoplasms , Humans , DNA Polymerase II/genetics , DNA Polymerase II/chemistry , DNA Polymerase II/metabolism , Poly-ADP-Ribose Binding Proteins/genetics , Neoplasms/genetics , Mutation , Exonucleases/genetics , Exonucleases/metabolism
3.
Insect Mol Biol ; 31(6): 810-820, 2022 12.
Article in English | MEDLINE | ID: mdl-36054587

ABSTRACT

The protein vitellogenin (Vg) plays a central role in lipid transportation in most egg-laying animals. High Vg levels correlate with stress resistance and lifespan potential in honey bees (Apis mellifera). Vg is the primary circulating zinc-carrying protein in honey bees. Zinc is an essential metal ion in numerous biological processes, including the function and structure of many proteins. Measurements of Zn2+ suggest a variable number of ions per Vg molecule in different animal species, but the molecular implications of zinc-binding by this protein are not well understood. We used inductively coupled plasma mass spectrometry to determine that, on average, each honey bee Vg molecule binds 3 Zn2+ ions. Our full-length protein structure and sequence analysis revealed seven potential zinc-binding sites. These are located in the β-barrel and α-helical subdomains of the N-terminal domain, the lipid binding site, and the cysteine-rich C-terminal region of unknown function. Interestingly, two potential zinc-binding sites in the β-barrel can support a proposed role for this structure in DNA-binding. Overall, our findings suggest that honey bee Vg binds zinc at several functional regions, indicating that Zn2+ ions are important for many of the activities of this protein. In addition to being potentially relevant for other egg-laying species, these insights provide a platform for studies of metal ions in bee health, which is of global interest due to recent declines in pollinator numbers.


Subjects
Insect Proteins , Vitellogenins , Bees , Animals , Vitellogenins/metabolism , Insect Proteins/metabolism , Zinc , Binding Sites , Lipids
4.
Biostatistics ; 21(3): 625-639, 2020 07 01.
Article in English | MEDLINE | ID: mdl-30698663

ABSTRACT

We present model-based analysis for ChIA-PET (MACPET), which analyzes paired-end read sequences provided by ChIA-PET for finding binding sites of a protein of interest. MACPET uses information from both tags of each PET and searches for binding sites in a two-dimensional space, while taking into account different noise levels in different genomic regions. MACPET shows favorable results compared with MACS in terms of motif occurrence and spatial resolution. Furthermore, significant binding sites discovered by MACPET are involved in a higher number of significant three-dimensional interactions than those discovered by MACS. MACPET is freely available on Bioconductor.

Keywords: ChIA-PET; MACPET; Model-based clustering; Paired-end tags; Peak-calling algorithm.


Subjects
Chromatin Immunoprecipitation Sequencing , Chromatin Immunoprecipitation , Genome , High-Throughput Nucleotide Sequencing , Models, Biological , Protein Binding , Sequence Analysis, DNA , Humans
5.
BMC Bioinformatics ; 21(1): 134, 2020 Apr 06.
Article in English | MEDLINE | ID: mdl-32252623

ABSTRACT

BACKGROUND: Diseases like cancer will lead to changes in gene expression, and it is relevant to identify key regulatory genes that can be linked directly to these changes. This can be done by computing a Regulatory Impact Factor (RIF) score for relevant regulators. However, this computation is based on estimating correlated patterns of gene expression, often Pearson correlation, and an assumption about a set of specific regulators, normally transcription factors. This study explores alternative measures of correlation, using the Fisher and Sobolev metrics, and an extended set of regulators, including epigenetic regulators and long non-coding RNAs (lncRNAs). Data on prostate cancer have been used to explore the effect of these modifications. RESULTS: A tool for computation of RIF scores with alternative correlation measures and extended sets of regulators was developed and tested on gene expression data for prostate cancer. The study showed that the Fisher and Sobolev metrics lead to improved identification of well-documented regulators of gene expression in prostate cancer, and the sets of identified key regulators showed improved overlap with previously defined gene sets of relevance to cancer. The extended set of regulators led to the identification of several interesting candidates for further studies, including lncRNAs. Several key processes were identified as important, including spindle assembly and the epithelial-mesenchymal transition (EMT). CONCLUSIONS: The study has shown that using alternative metrics of correlation can improve the performance of tools based on correlation of gene expression in genomic data. The Fisher and Sobolev metrics should also be considered in other correlation-based applications.


Subjects
Computational Biology/methods , Epigenesis, Genetic , Transcription Factors/metabolism , Databases, Genetic , Epithelial-Mesenchymal Transition , Gene Expression Regulation, Neoplastic , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , RNA, Long Noncoding/metabolism , Transcription Factors/genetics
6.
Nucleic Acids Res ; 45(D1): D737-D743, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794045

ABSTRACT

Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.


Subjects
Databases, Genetic , Gene Expression Profiling/methods , Genomics/methods , Mammals/genetics , Software , Web Browser , Animals , Computational Biology , Humans , Search Engine
7.
BMC Bioinformatics ; 19(1): 533, 2018 Dec 19.
Article in English | MEDLINE | ID: mdl-30567492

ABSTRACT

BACKGROUND: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression. RESULTS: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes. CONCLUSION: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.
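
As a rough illustration of the two statistical metrics named in the abstract above, the sketch below computes Pearson and Spearman correlation between two expression profiles using only the Python standard library. It is not the authors' software; the function names and the expression values are hypothetical, and the geometrical metrics (Sobolev and Fisher) are not attempted here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values for one lncRNA and one annotated gene
# across five tissues (illustrative numbers only).
lncrna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expr = [1.0, 4.0, 9.0, 16.0, 25.0]

pearson_r = pearson(lncrna_expr, gene_expr)
spearman_r = spearman(lncrna_expr, gene_expr)
```

In this toy case the relationship is monotonic but not linear, so the rank-based Spearman value is higher than the Pearson value, which is one way the choice of metric can change which genes count as co-expressed.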


Subjects
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Molecular Sequence Annotation , RNA, Long Noncoding/metabolism , Software , High-Throughput Nucleotide Sequencing , Humans , RNA, Long Noncoding/genetics
8.
BMC Cancer ; 18(1): 478, 2018 04 27.
Article in English | MEDLINE | ID: mdl-29703166

ABSTRACT

BACKGROUND: The relationship between cholesterol and prostate cancer has been extensively studied for decades, where high levels of cellular cholesterol are generally associated with cancer progression and less favorable outcomes. However, the role of in vivo cellular cholesterol synthesis in this process is unclear, and data on the transcriptional activity of cholesterol synthesis pathway genes in tissue from prostate cancer patients are inconsistent. METHODS: A common problem with cancer tissue data from patient cohorts is the presence of heterogeneous tissue which confounds molecular analysis of the samples. In this study we present a general method to minimize systematic confounding from stroma tissue in any prostate cancer cohort comparing prostate cancer and normal samples. In particular we use samples assessed by histopathology to identify genes enriched and depleted in prostate stroma. These genes are then used to assess stroma content in tissue samples from other prostate cancer cohorts where no histopathology is available. Differential expression analysis is performed by comparing cancer and normal samples where the average stroma content has been balanced between the sample groups. In total we analyzed seven patient cohorts with prostate cancer consisting of 1713 prostate cancer and 230 normal tissue samples. RESULTS: When stroma confounding was minimized, differential gene expression analysis over all cohorts showed robust and consistent downregulation of nearly all genes in the cholesterol synthesis pathway. Additional Gene Ontology analysis also identified cholesterol synthesis as the most significantly altered metabolic pathway in prostate cancer at the transcriptional level. CONCLUSION: The surprising observation that cholesterol synthesis genes are downregulated in prostate cancer is important for our understanding of how prostate cancer cells regulate cholesterol levels in vivo. 
Moreover, we show that tissue heterogeneity explains the lack of consistency in previous expression analysis of cholesterol synthesis genes in prostate cancer.


Subjects
Cholesterol/biosynthesis , Gene Expression Regulation, Enzymologic , Gene Expression Regulation, Neoplastic , Lipid Metabolism/genetics , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Biosynthetic Pathways/genetics , Cohort Studies , Down-Regulation , Humans , Male , Models, Biological , Prostatic Neoplasms/pathology , Reproducibility of Results , Stromal Cells/metabolism , Stromal Cells/pathology , Transcription, Genetic
9.
RNA Biol ; 15(6): 829-831, 2018.
Article in English | MEDLINE | ID: mdl-29671387

ABSTRACT

The genetic alphabet consists of four letters: C, A, G, and T in DNA and C, A, G, and U in RNA. Triplets of these four letters jointly encode 20 different amino acids out of which proteins of all organisms are built. This system is universal and is found in all kingdoms of life. However, bases in DNA and RNA can be chemically modified. In DNA, around 10 different modifications are known, and those have been studied intensively over the past 20 years. Scientific studies on DNA modifications and proteins that recognize them gave rise to the large field of epigenetic and epigenomic research. The outcome of this intense research field is the discovery that development, ageing, and stem-cell-dependent regeneration but also several diseases including cancer are largely controlled by the epigenetic state of cells. Consequently, this research has already led to the first FDA-approved drugs that exploit the gained knowledge to combat disease. In recent years, the ~150 modifications found in RNA have become the focus of intense research. Here we provide a perspective on necessary and expected developments in the fast-expanding area of RNA modifications, termed epitranscriptomics.


Subjects
DNA, Neoplasm , Epigenesis, Genetic , Epigenomics/standards , Gene Expression Profiling/standards , Gene Expression Regulation, Neoplastic , Neoplasms , RNA, Neoplasm , Transcriptome , DNA, Neoplasm/genetics , DNA, Neoplasm/metabolism , Europe , Gene Expression Profiling/methods , Humans , Neoplasms/genetics , Neoplasms/metabolism , RNA, Neoplasm/genetics , RNA, Neoplasm/metabolism
10.
Clin Genet ; 92(4): 405-414, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28195393

ABSTRACT

BACKGROUND: Many families with a high burden of colorectal cancer fulfil the clinical criteria for Lynch Syndrome. However, in about half of these families, no germline mutation in the mismatch repair genes known to be associated with this disease can be identified. The aim of this study was to find the genetic cause for the increased colorectal cancer risk in these unsolved cases. MATERIALS AND METHODS: To reach the aim, we designed a gene panel targeting 112 previously known or candidate colorectal cancer susceptibility genes to screen 274 patient samples for mutations. Mutations were validated by Sanger sequencing and, where possible, segregation analysis was performed. RESULTS: We identified 73 interesting variants, of which 17 were pathogenic and 19 were variants of unknown clinical significance in well-established cancer susceptibility genes. In addition, 37 potentially pathogenic variants in candidate colorectal cancer susceptibility genes were detected. CONCLUSION: In conclusion, we found a promising DNA variant in more than 25% of the patients, which shows that gene panel testing is a more effective method to identify germline variants in CRC patients compared to a single gene approach.


Subjects
Colorectal Neoplasms, Hereditary Nonpolyposis/diagnosis , Colorectal Neoplasms/diagnosis , Genetic Predisposition to Disease , Neoplasm Proteins/genetics , Adult , Aged , Aged, 80 and over , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , DNA Mismatch Repair/genetics , Female , Germ-Line Mutation , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged
11.
BMC Bioinformatics ; 17(1): 296, 2016 Jul 29.
Article in English | MEDLINE | ID: mdl-27473391

ABSTRACT

BACKGROUND: The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to three major categories: biological process, molecular function and cellular component. It has become widely used in many bioinformatics applications for annotating genes and measuring their semantic similarity, rather than their sequence similarity. Generally speaking, semantic similarity measures involve the GO tree topology, information content of GO terms, or a combination of both. RESULTS: Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity) which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or molecular function. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, the TopoICSim similarity was also tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally we used an online benchmarking resource which evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests. CONCLUSIONS: The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations. 
An R script for TopoICSim is available at http://bigr.medisin.ntnu.no/tools/TopoICSim.R .
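
The abstract above refers to the information content of GO terms. A commonly used definition, assumed here and not taken from the TopoICSim paper, is IC(t) = -log p(t), where p(t) is the fraction of annotations attached to term t or any of its descendants. The following sketch applies that definition to a toy ontology; the term names, the graph and the annotation counts are all hypothetical, and this is not the TopoICSim algorithm itself.

```python
import math

# Toy ontology as child -> list of parents (hypothetical term names).
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "C": 5}


def ancestors(term):
    """Return all ancestors of a term, excluding the term itself."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(p(t)), with p(t) counting a term plus its descendants."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor of it.
        for node in ancestors(term) | {term}:
            totals[node] += count
    root_total = totals["root"]
    return {term: -math.log(totals[term] / root_total) for term in PARENTS}


ic = information_content()
```

Under this definition the root carries no information and more specific terms score higher, which is the property that path-based measures can exploit.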


Subjects
Computational Biology/methods , Gene Ontology , Algorithms , Humans , Molecular Sequence Annotation , Semantics , Vocabulary, Controlled
12.
BMC Bioinformatics ; 17(1): 459, 2016 Nov 14.
Article in English | MEDLINE | ID: mdl-27842491

ABSTRACT

BACKGROUND: Transcription factors are key proteins in the regulation of gene transcription. An important step in this process is the opening of chromatin in order to make genomic regions available for transcription. Data on DNase I hypersensitivity has previously been used to label a subset of transcription factors as Pioneers, Settlers and Migrants to describe their potential role in this process. These labels represent an interesting hypothesis on gene regulation and possibly a useful approach for data analysis, and therefore we wanted to expand the set of labeled transcription factors to include as many known factors as possible. We have used a well-annotated dataset of 1175 transcription factors as input to supervised machine learning methods, using the subset with previously assigned labels as training set. We then used the final classifier to label the additional transcription factors according to their potential role as Pioneers, Settlers and Migrants. The full set of labeled transcription factors was used to investigate associated properties and functions of each class, including an analysis of interaction data for transcription factors based on DNA co-binding and protein-protein interactions. We also used the assigned labels to analyze a previously published set of gene lists associated with a time course experiment on cell differentiation. RESULTS: The analysis showed that the classification of transcription factors with respect to their potential role in chromatin opening largely was determined by how they bind to DNA. Each subclass of transcription factors was enriched for properties that seemed to characterize the subclass relative to its role in gene regulation, with very general functions for Pioneers, whereas Migrants to a larger extent were associated with specific processes. 
Further analysis showed that the expanded classification is a useful resource for analyzing other datasets on transcription factors with respect to their potential role in gene regulation. The analysis of transcription factor interaction data showed complementary differences between the subclasses, where transcription factors labeled as Pioneers often interact with other transcription factors through DNA co-binding, whereas Migrants to a larger extent use protein-protein interactions. The analysis of time course data on cell differentiation indicated a shift in the regulatory program associated with Pioneer-like transcription factors during differentiation. CONCLUSIONS: The expanded classification is an interesting resource for analyzing data on gene regulation, as illustrated here on transcription factor interaction data and data from a time course experiment. The potential regulatory function of transcription factors seems largely to be determined by how they bind DNA, but is also influenced by how they interact with each other through cooperativity and protein-protein interactions.


Subjects
Gene Expression Regulation , Transcription Factors/metabolism , Chromatin/genetics , Chromatin/metabolism , DNA/genetics , DNA/metabolism , Genomics , Humans , Transcription Factors/genetics
13.
Nucleic Acids Res ; 41(5): 2846-56, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23325852

ABSTRACT

Genome-wide gene expression analyses of the human somatic cell cycle have indicated that the set of cycling genes differs between primary and cancer cells. By identifying genes that have cell cycle-dependent expression in HaCaT human keratinocytes and comparing these with previously identified cell cycle genes, we have identified three distinct groups of cell cycle genes. First, housekeeping genes enriched for known cell cycle functions; second, cell type-specific genes enriched for HaCaT-specific functions; and third, Polycomb-regulated genes. These Polycomb-regulated genes are specifically upregulated during DNA replication, and consistent with being epigenetically silenced in other cell cycle phases, these genes have lower expression than other cell cycle genes. We also find similar patterns in foreskin fibroblasts, indicating that replication-dependent expression of Polycomb-silenced genes is a prevalent but unrecognized regulatory mechanism.


Subjects
Cell Cycle/genetics , DNA Replication , Polycomb-Group Proteins/physiology , Up-Regulation , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Cell Line , CpG Islands , Fibroblasts/metabolism , Gene Expression Profiling , Genes, Essential , Histones/physiology , Humans , Keratinocytes/metabolism , Keratinocytes/physiology , Least-Squares Analysis , Models, Genetic , Oligonucleotide Array Sequence Analysis , Promoter Regions, Genetic , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcription, Genetic , Transcriptome
14.
Nucleic Acids Res ; 41(Web Server issue): W133-41, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23632163

ABSTRACT

The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser enables a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.


Subjects
Genomics/methods , Software , Data Interpretation, Statistical , Genome , Internet
15.
BMC Genomics ; 15: 192, 2014 Mar 14.
Article in English | MEDLINE | ID: mdl-24625193

ABSTRACT

BACKGROUND: Gene duplication and horizontal gene transfer are common processes in bacterial and archaeal genomes, and are generally assumed to result in either diversification or loss of the redundant gene copies. However, a recent analysis of the genome of the soil bacterium Azotobacter vinelandii DJ revealed an abundance of highly similar homologs among carbohydrate metabolism genes. In many cases these multiple genes did not appear to be the result of recent duplications, or to function only as a means of stimulating expression by increasing gene dosage, as the homologs were located in varying functional genetic contexts. Based on these initial findings we here report in-depth bioinformatic analyses focusing specifically on highly similar intra-genome homologs, or synologs, among carbohydrate metabolism genes, as well as an analysis of the general occurrence of very similar synologs in prokaryotes. RESULTS: Approximately 900 bacterial and archaeal genomes were analysed for the occurrence of synologs, both in general and among carbohydrate metabolism genes specifically. This showed that large numbers of highly similar synologs among carbohydrate metabolism genes are very rare in bacterial and archaeal genomes, and that the A. vinelandii DJ genome contains an unusually large amount of such synologs. The majority of these synologs were found to be non-tandemly organized and localized in varying but metabolically relevant genomic contexts. The same observation was made for other genomes harbouring high levels of such synologs. It was also shown that highly similar synologs generally constitute a very small fraction of the protein-coding genes in prokaryotic genomes. The overall synolog fraction of the A. vinelandii DJ genome was well above the data set average, but not nearly as remarkable as the levels observed when only carbohydrate metabolism synologs were considered. 
CONCLUSIONS: Large numbers of highly similar synologs are rare in bacterial and archaeal genomes, both in general and among carbohydrate metabolism genes. However, A. vinelandii and several other soil bacteria harbour large numbers of highly similar carbohydrate metabolism synologs which seem not to result from recent duplication or transfer events. These genes may confer adaptive benefits with respect to certain lifestyles and environmental factors, most likely due to increased regulatory flexibility and/or increased gene dosage.


Subjects
Azotobacter vinelandii/genetics , Bacterial Proteins/genetics , Carbohydrate Metabolism/genetics , Adaptation, Physiological , Archaeal Proteins/genetics , Conserved Sequence , Genome, Bacterial , Proteome/genetics , Pseudomonas/genetics , Sequence Homology, Amino Acid
16.
BMC Genomics ; 15: 120, 2014 Mar 26.
Article in English | MEDLINE | ID: mdl-24669905

ABSTRACT

BACKGROUND: Deciphering the most common modes by which chromatin regulates transcription, and how this is related to cellular status and processes is an important task for improving our understanding of human cellular biology. The FANTOM5 and ENCODE projects represent two independent large scale efforts to map regulatory and transcriptional features to the human genome. Here we investigate chromatin features around a comprehensive set of transcription start sites in four cell lines by integrating data from these two projects. RESULTS: Transcription start sites can be distinguished by chromatin states defined by specific combinations of both chromatin mark enrichment and the profile shapes of these chromatin marks. The observed patterns can be associated with cellular functions and processes, and they also show association with expression level, location relative to nearby genes, and CpG content. In particular we find a substantial number of repressed inter- and intra-genic transcription start sites enriched for active chromatin marks and Pol II, and these sites are strongly associated with immediate-early response processes and cell signaling. Associations between start sites with similar chromatin patterns are validated by significant correlations in their global expression profiles. CONCLUSIONS: The results confirm the link between chromatin state and cellular function for expressed transcripts, and also indicate that active chromatin states at repressed transcripts may poise transcripts for rapid activation during immune response.


Subjects
Chromatin/metabolism , Transcription Initiation Site , Algorithms , Cell Line, Tumor , Chromatin/genetics , Chromatin Immunoprecipitation , Cluster Analysis , Computational Biology , CpG Islands , Gene Expression Regulation , Gene Library , HeLa Cells , Hep G2 Cells , Histones/chemistry , Histones/metabolism , Humans , K562 Cells , Principal Component Analysis , Transcription Factors/chemistry , Transcription Factors/metabolism
17.
Environ Microbiol ; 16(2): 545-58, 2014 Feb.
Article in English | MEDLINE | ID: mdl-23827055

ABSTRACT

It is well established that micro-organisms colonize a variety of extreme environments, including habitats like oil reservoirs deep inside the Earth's crust. Here, we present the results of a comparative high-coverage DNA sequencing study of metagenomes derived from two different oil reservoirs, both located about 2.5 km subseafloor below the Norwegian Sea. A previously reported bioinformatic analysis of DNA sequence data derived from one of the reservoirs (Well I) indicated that the community is dominated by bacterial species with a smaller fraction of Archaea. Here, we report results of a similar analysis from another reservoir (Well II) located in the same geographical area but, according to available geological knowledge, lacking direct physical contact with Well I. Interestingly, the Well II community is largely dominated by Archaea with a subordinate fraction of Bacteria. Comparison of the two datasets showed that large fractions of the sequences are extremely similar, both with respect to identity (typically above 98%) and gene organization. We therefore conclude that both wells contain essentially the same organisms, but in different relative abundances. Assuming that the communities have been distinct for long timescales because of physical separation, the results also indicate that microbial growth in the reservoirs is extremely slow.


Subjects
Archaea/classification , Bacteria/classification , Metagenome , Oil and Gas Fields/microbiology , Phylogeny , Archaea/genetics , Archaea/isolation & purification , Bacteria/genetics , Bacteria/isolation & purification , Base Sequence , Ecosystem , Oceans and Seas , 16S Ribosomal RNA/genetics , DNA Sequence Analysis
18.
BMC Bioinformatics ; 14: 9, 2013 Jan 16.
Article in English | MEDLINE | ID: mdl-23323883

ABSTRACT

BACKGROUND: Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated large amounts of data that have the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of these data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming. RESULTS: Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results. CONCLUSIONS: We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench have been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin.
MotifLab is freely available at http://www.motiflab.org.


Subjects
Transcriptional Regulatory Elements , DNA Sequence Analysis/methods , Software , Transcription Factors/metabolism , Algorithms , Binding Sites , Colforsin/pharmacology , Gene Expression Regulation , Humans , Nucleotide Motifs , Phylogeny
19.
BMC Med ; 11: 163, 2013 Jul 12.
Article in English | MEDLINE | ID: mdl-23849224

ABSTRACT

BACKGROUND: Vitamin D insufficiency has been implicated in autoimmunity. ChIP-seq experiments using immune cell lines have shown that vitamin D receptor (VDR) binding sites are enriched near regions of the genome associated with autoimmune diseases. We aimed to investigate VDR binding in primary CD4+ cells from healthy volunteers. METHODS: We extracted CD4+ cells from nine healthy volunteers. Each sample underwent VDR ChIP-seq. Our results were analyzed in relation to published ChIP-seq and RNA-seq data in the Genomic HyperBrowser. We used MEME-ChIP for de novo motif discovery. 25-Hydroxyvitamin D (25(OH)D) levels were measured using liquid chromatography-tandem mass spectrometry, and samples were divided into vitamin D sufficient (25(OH)D ≥75 nmol/L) and insufficient/deficient (25(OH)D <75 nmol/L) groups. RESULTS: We found that the amount of VDR binding is correlated with the serum level of 25-hydroxyvitamin D (r = 0.92, P = 0.0005). In vivo VDR binding sites are enriched for autoimmune disease-associated loci, especially when 25(OH)D levels were sufficient (25(OH)D ≥75: 3.13-fold, P < 0.0001; 25(OH)D <75: 2.76-fold, P < 0.0001; 25(OH)D ≥75 enrichment versus 25(OH)D <75 enrichment: P = 0.0002). VDR binding was also enriched near genes associated specifically with T-regulatory and T-helper cells in the 25(OH)D ≥75 group. MEME-ChIP did not identify any VDR-like motifs underlying our VDR ChIP-seq peaks. CONCLUSION: Our results show a direct correlation between in vivo 25-hydroxyvitamin D levels and the number of VDR binding sites, although our sample size is relatively small. Our study further implicates VDR binding as important in gene-environment interactions underlying the development of autoimmunity and provides a biological rationale for setting the 25-hydroxyvitamin D sufficiency threshold at 75 nmol/L. Our results also suggest that VDR binding in response to physiological levels of vitamin D occurs predominantly in a VDR motif-independent manner.
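
The grouping and correlation steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `split_by_sufficiency` and `pearson_r` are hypothetical, and only the 75 nmol/L cut-off is taken from the abstract. The example assumes one serum 25(OH)D value and one VDR binding-site count per sample.

```python
from math import sqrt

# Cut-off in nmol/L, as stated in the abstract.
SUFFICIENCY_THRESHOLD = 75.0


def split_by_sufficiency(levels, threshold=SUFFICIENCY_THRESHOLD):
    """Return (sufficient, insufficient) lists of sample indices.

    A sample counts as sufficient when its 25(OH)D level is >= threshold.
    """
    sufficient = [i for i, v in enumerate(levels) if v >= threshold]
    insufficient = [i for i, v in enumerate(levels) if v < threshold]
    return sufficient, insufficient


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

Calling `pearson_r(levels, site_counts)` on per-sample measurements would give a coefficient comparable in kind to the r reported above; no values from the study are reproduced here.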


Subjects
Autoimmune Diseases/blood , CD4-Positive T-Lymphocytes/metabolism , Protein Array Analysis/methods , Calcitriol Receptors/blood , Vitamin D/analogs & derivatives , Amino Acid Motifs , Amino Acid Sequence , Autoimmune Diseases/genetics , Autoimmune Diseases/pathology , Binding Sites/genetics , CD4-Positive T-Lymphocytes/pathology , Genomics/methods , Humans , Primary Cell Culture , Calcitriol Receptors/genetics , Vitamin D/blood
20.
Nucleic Acids Res ; 39(4): e25, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21113027

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) is rapidly becoming the method of choice for discovering cell-specific transcription factor binding locations genome-wide. By aligning sequenced tags to the genome, binding locations appear as peaks in the tag profile. Several programs have been designed to identify such peaks, but program evaluation has been difficult due to the lack of benchmark data sets. We have created benchmark data sets for three transcription factors by manually evaluating a selection of potential binding regions that cover typical variation in peak size and appearance. Performance of five programs on this benchmark showed, first, that external control or background data was essential to limit the number of false positive peaks from the programs. However, >80% of these peaks could be manually filtered out by visual inspection alone, without using additional background data, showing that peak shape information is not fully exploited in the evaluated programs. Second, none of the programs returned peak regions that corresponded to the actual resolution in ChIP-seq data. Our results showed that ChIP-seq peaks should be narrowed down to 100-400 bp, which is sufficient to identify unique peaks and binding sites. Based on these results, we propose a meta-approach that gives improved peak definitions.
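
The recommendation to narrow peaks to 100-400 bp can be sketched as follows. This is only an illustration under stated assumptions, not the meta-approach proposed in the paper, which the abstract does not describe. The function name `narrow_peak`, the use of a summit position, and the half-open coordinate convention are all hypothetical choices.

```python
def narrow_peak(start, end, summit, min_width=100, max_width=400):
    """Resize a peak so its width falls within [min_width, max_width] bp.

    Coordinates are half-open: the peak covers positions start..end-1.
    Peaks already within the range are returned unchanged. Wider peaks are
    cut to max_width and narrower peaks are grown to min_width, in both
    cases centered on the summit (an assumed choice, not from the paper).
    """
    if not (start <= summit < end):
        raise ValueError("summit must lie inside the peak")
    width = end - start
    if min_width <= width <= max_width:
        return start, end
    target = max_width if width > max_width else min_width
    # Clamp at zero so the window never extends before the sequence start.
    new_start = max(0, summit - target // 2)
    return new_start, new_start + target
```

For example, a 2000 bp region from 1000 to 3000 with its summit at 1800 would be cut to the 400 bp window from 1600 to 2000.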


Subjects
Chromatin Immunoprecipitation , High-Throughput Nucleotide Sequencing , Software , Transcription Factors/metabolism , Benchmarking , Binding Sites