Results 1 - 20 of 82
1.
BMC Med Genomics ; 16(1): 126, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37296477

ABSTRACT

BACKGROUND: Hereditary genetic mutations causing predisposition to colorectal cancer are accountable for approximately 30% of all colorectal cancer cases. However, only a small fraction of these are high penetrant mutations occurring in DNA mismatch repair genes, causing one of several types of familial colorectal cancer (CRC) syndromes. Most of the mutations are low-penetrant variants, contributing to an increased risk of familial colorectal cancer, and they are often found in additional genes and pathways not previously associated with CRC. The aim of this study was to identify such variants, both high-penetrant and low-penetrant ones. METHODS: We performed whole exome sequencing on constitutional DNA extracted from blood of 48 patients suspected of familial colorectal cancer and used multiple in silico prediction tools and available literature-based evidence to detect and investigate genetic variants. RESULTS: We identified several causative and some potentially causative germline variants in genes known for their association with colorectal cancer. In addition, we identified several variants in genes not typically included in relevant gene panels for colorectal cancer, including CFTR, PABPC1 and TYRO3, which may be associated with an increased risk for cancer. CONCLUSIONS: Identification of variants in additional genes that potentially can be associated with familial colorectal cancer indicates a larger genetic spectrum of this disease, not limited only to mismatch repair genes. Usage of multiple in silico tools based on different methods and combined through a consensus approach increases the sensitivity of predictions and narrows down a large list of variants to the ones that are most likely to be significant.


Subjects
Hereditary Nonpolyposis Colorectal Neoplasms , Colorectal Neoplasms , Humans , Hereditary Nonpolyposis Colorectal Neoplasms/diagnosis , Hereditary Nonpolyposis Colorectal Neoplasms/genetics , Exome Sequencing , Genetic Predisposition to Disease , Pedigree , Germ-Line Mutation , Germ Cells , Colorectal Neoplasms/genetics , Colorectal Neoplasms/diagnosis
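The consensus approach mentioned in the abstract above can be illustrated with a simple majority vote over tool predictions. This is only a minimal sketch: the tool names, the boolean "damaging" calls, and the `min_fraction` threshold are assumptions made for illustration, and the study's actual combination rule is not described in the abstract.

```python
from typing import Mapping


def consensus_call(predictions: Mapping[str, bool], min_fraction: float = 0.5) -> bool:
    """Return True when more than `min_fraction` of the tools flag the variant.

    `predictions` maps a (hypothetical) tool name to True if that tool
    predicts the variant to be damaging.
    """
    if not predictions:
        raise ValueError("at least one tool prediction is required")
    damaging = sum(1 for flagged in predictions.values() if flagged)
    return damaging / len(predictions) > min_fraction


# Hypothetical tool names and calls, for illustration only.
variant_calls = {"tool_a": True, "tool_b": True, "tool_c": False}
print(consensus_call(variant_calls))
```

A strict majority is used here so that a tie between tools does not count as a consensus; a real workflow would likely weight tools or use calibrated scores instead.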
2.
Mol Genet Genomics ; 298(3): 555-566, 2023 May.
Article in English | MEDLINE | ID: mdl-36856825

ABSTRACT

The cancer syndrome polymerase proofreading-associated polyposis results from germline mutations in the POLE and POLD1 genes. Mutations in the exonuclease domain of these genes are associated with hyper- and ultra-mutated tumors with a predominance of base substitutions resulting from faulty proofreading during DNA replication. When a new variant is identified by gene testing of POLE and POLD1, it is important to verify whether the variant is associated with PPAP or not, to guide genetic counseling of mutation carriers. In 2015, we reported the likely pathogenic (class 4) germline POLE c.1373A > T p.(Tyr458Phe) variant and we have now characterized this variant to verify that it is a class 5 pathogenic variant. For this purpose, we investigated (1) mutator phenotype in tumors from two carriers, (2) mutation frequency in cell-based mutagenesis assays, and (3) structural consequences based on protein modeling. Whole-exome sequencing of two tumors identified an ultra-mutator phenotype with a predominance of base substitutions, the majority of which are C > T. A SupF mutagenesis assay revealed increased mutation frequency in cells overexpressing the variant of interest as well as in isogenic cells encoding the variant. Moreover, exonuclease repair yeast-based assay supported defect in proofreading activity. Lastly, we present a homology model of human POLE to demonstrate structural consequences leading to pathogenic impact of the p.(Tyr458Phe) mutation. The three lines of evidence, taken together with updated co-segregation and previously published data, allow the germline variant POLE c.1373A > T p.(Tyr458Phe) to be reclassified as a class 5 variant. That means the variant is associated with PPAP.


Subjects
DNA Polymerase II , Neoplasms , Humans , DNA Polymerase II/genetics , DNA Polymerase II/chemistry , DNA Polymerase II/metabolism , Poly-ADP-Ribose Binding Proteins/genetics , Neoplasms/genetics , Mutation , Exonucleases/genetics , Exonucleases/metabolism
3.
Nucleic Acids Res ; 51(D1): D564-D570, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350659

ABSTRACT

We present an update of EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets, and products, which is openly accessible at http://epifactors.autosome.org. The updated version of EpiFactors contains information on 902 proteins, including 101 histones and protamines, and, as a main update, a newly curated collection of 124 lncRNAs involved in epigenetic regulation. The number of publications concerning the role of lncRNA in epigenetics is rapidly growing. Yet a resource that compiles, integrates, organizes, and presents curated information on lncRNAs in epigenetics is missing. EpiFactors fills this gap and provides data on epigenetic regulators in an accessible and user-friendly form. For 820 of the genes in EpiFactors, we include expression estimates across multiple cell types assessed by CAGE-Seq in the FANTOM5 project. In addition, the updated EpiFactors contains information on 73 protein complexes involved in epigenetic regulation. Our resource is practical for a wide range of users, including biologists, bioinformaticians and molecular/systems biologists.


Subjects
Genetic Databases , Genetic Epigenesis , Humans , Histones/genetics , Histones/metabolism , Protamines , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism
4.
PLoS One ; 17(10): e0275621, 2022.
Article in English | MEDLINE | ID: mdl-36282866

ABSTRACT

Mitochondrial activity in cancer cells has been central to cancer research since Otto Warburg first published his thesis on the topic in 1956. Although Warburg proposed that oxidative phosphorylation in the tricarboxylic acid (TCA) cycle was perturbed in cancer, later research has shown that oxidative phosphorylation is activated in most cancers, including prostate cancer (PCa). However, more detailed knowledge on mitochondrial metabolism and metabolic pathways in cancers is still lacking. In this study we expand our previously developed method for analyzing functional homologous proteins (FunHoP), which can provide a more detailed view of metabolic pathways. FunHoP uses results from differential expression analysis of RNA-Seq data to improve pathway analysis. By adding information on subcellular localization based on experimental data and computational predictions we can use FunHoP to differentiate between mitochondrial and non-mitochondrial processes in cancerous and normal prostate cell lines. Our results show that mitochondrial pathways are upregulated in PCa and that splitting metabolic pathways into mitochondrial and non-mitochondrial counterparts using FunHoP adds to the interpretation of the metabolic properties of PCa cells.


Subjects
Mitochondrial Genes , Prostatic Neoplasms , Male , Humans , Up-Regulation , Tumor Cell Line , Oxidative Phosphorylation , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Tricarboxylic Acids
5.
Insect Mol Biol ; 31(6): 810-820, 2022 12.
Article in English | MEDLINE | ID: mdl-36054587

ABSTRACT

The protein vitellogenin (Vg) plays a central role in lipid transportation in most egg-laying animals. High Vg levels correlate with stress resistance and lifespan potential in honey bees (Apis mellifera). Vg is the primary circulating zinc-carrying protein in honey bees. Zinc is an essential metal ion in numerous biological processes, including the function and structure of many proteins. Measurements of Zn2+ suggest a variable number of ions per Vg molecule in different animal species, but the molecular implications of zinc-binding by this protein are not well understood. We used inductively coupled plasma mass spectrometry to determine that, on average, each honey bee Vg molecule binds 3 Zn2+ ions. Our full-length protein structure and sequence analysis revealed seven potential zinc-binding sites. These are located in the β-barrel and α-helical subdomains of the N-terminal domain, the lipid binding site, and the cysteine-rich C-terminal region of unknown function. Interestingly, two potential zinc-binding sites in the β-barrel can support a proposed role for this structure in DNA-binding. Overall, our findings suggest that honey bee Vg binds zinc at several functional regions, indicating that Zn2+ ions are important for many of the activities of this protein. In addition to being potentially relevant for other egg-laying species, these insights provide a platform for studies of metal ions in bee health, which is of global interest due to recent declines in pollinator numbers.


Subjects
Insect Proteins , Vitellogenins , Bees , Animals , Vitellogenins/metabolism , Insect Proteins/metabolism , Zinc , Binding Sites , Lipids
6.
iScience ; 25(6): 104451, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35707723

ABSTRACT

High secretion of the metabolites citrate and spermine is a unique hallmark of normal prostate epithelial cells, and is reduced in aggressive prostate cancer. However, the identity of the genes controlling this biological process is mostly unknown. In this study, we have created a gene signature of 150 genes connected to citrate and spermine secretion in the prostate. We have computationally integrated metabolic measurements with multiple transcriptomics datasets from the public domain, including 3826 tissue samples from prostate and prostate cancer. The accuracy of the signature is validated by its unique enrichment in prostate samples and prostate epithelial tissue compartments. The signature highlights the genes AZGP1 and ANPEP and the metallothioneins, which have zinc-binding properties not previously studied in the prostate, and the expression of these genes is reduced in more aggressive cancer lesions. However, the absence of signature enrichment in common prostate model systems can make it challenging to study these genes mechanistically.

7.
BMC Med Genomics ; 14(1): 214, 2021 08 31.
Article in English | MEDLINE | ID: mdl-34465341

ABSTRACT

BACKGROUND: Detection of copy number variation (CNV) in genes associated with disease is important in genetic diagnostics, and next generation sequencing (NGS) technology provides data that can be used for CNV detection. However, CNV detection based on NGS data is in general not often used in diagnostic labs as the data analysis is challenging, especially with data from targeted gene panels. Wet lab methods like MLPA (MRC Holland) are widely used, but are expensive, time consuming and have gene-specific limitations. Our aim has been to develop a bioinformatic tool for CNV detection from NGS data in medical genetic diagnostic samples. RESULTS: Our computational pipeline for detection of CNVs in NGS data from targeted gene panels utilizes coverage depth of the captured regions and calculates a copy number ratio score for each region. This is computed by comparing the mean coverage of the sample with the mean coverage of the same region in other samples, defined as a pool. The pipeline selects pools for comparison dynamically from previously sequenced samples, using the pool with an average coverage depth that is nearest to that of the sample. A sliding window-based approach is used to analyze each region, where length of sliding window and sliding distance can be chosen dynamically to increase or decrease the resolution. This helps in detecting CNVs in small or partial exons. With this pipeline we have correctly identified the CNVs in 36 positive control samples, with sensitivity of 100% and specificity of 91%. We have detected whole gene level deletion/duplication, single/multi exonic level deletion/duplication, partial exonic deletion and mosaic deletion. Since its implementation in mid-2018 it has proven its diagnostic value with more than 45 CNV findings in routine tests. CONCLUSIONS: With this pipeline as part of our diagnostic practices it is now possible to detect partial, single or multi-exonic, and intragenic CNVs in all genes in our target panel. This has helped our diagnostic lab to expand the portfolio of genes where we offer CNV detection, which previously was limited by the availability of MLPA kits.


Subjects
DNA Copy Number Variations
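The copy number ratio described in the abstract above (mean sample coverage divided by mean pool coverage, evaluated in sliding windows) might be sketched as follows. This is a minimal illustration and not the published pipeline: the function name, the per-position depth lists, and the default `window` and `step` values are assumptions, and no normalisation for overall sequencing depth is applied.

```python
from statistics import mean
from typing import List, Sequence


def window_ratios(
    sample_depth: Sequence[float],
    pool_depths: Sequence[Sequence[float]],
    window: int = 3,
    step: int = 1,
) -> List[float]:
    """Copy number ratio per sliding window across one captured region.

    sample_depth: per-position coverage of the test sample.
    pool_depths:  per-position coverage of each pool sample (same length).
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    # Mean coverage of the pool at every position of the region.
    pool_mean = [mean(column) for column in zip(*pool_depths)]
    ratios = []
    for start in range(0, len(sample_depth) - window + 1, step):
        sample_cov = mean(sample_depth[start:start + window])
        pool_cov = mean(pool_mean[start:start + window])
        # A window with no pool coverage gives an undefined ratio.
        ratios.append(sample_cov / pool_cov if pool_cov > 0 else float("nan"))
    return ratios


# Toy region: the sample has half the pool coverage in positions 2 to 4.
sample = [100, 100, 50, 50, 50, 100]
pool = [[100] * 6, [100] * 6]
print(window_ratios(sample, pool))
```

In this toy input the window that lies entirely inside the low-coverage stretch has a ratio of 0.5, the value expected for a heterozygous deletion; a shorter window or smaller step raises the resolution at the cost of noisier ratios.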
8.
F1000Res ; 10, 2021.
Article in English | MEDLINE | ID: mdl-34249331

ABSTRACT

Background: Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information. Description of work: We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser. Conclusion: We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users.


Subjects
Ecosystem , Metadata , Genome , Genomics , Software
9.
BMC Res Notes ; 14(1): 162, 2021 Apr 30.
Article in English | MEDLINE | ID: mdl-33931103

ABSTRACT

OBJECTIVE: Properties of gene products can be described or annotated with Gene Ontology (GO) terms. But for many genes we have limited information about their products, for example with respect to function. This is particularly true for long non-coding RNAs (lncRNAs), where the function in most cases is unknown. However, it has been shown that annotation as described by GO terms to some extent can be predicted by enrichment analysis on properties of co-expressed genes. RESULTS: GAPGOM integrates two relevant algorithms, lncRNA2GOA and TopoICSim, into a user-friendly R package. Here lncRNA2GOA does annotation prediction by co-expression, whereas TopoICSim estimates similarity between GO graphs, which can be used for benchmarking of prediction performance, but also for comparison of GO graphs in general. The package provides an improved implementation of the original tools, with substantial improvements in performance and documentation, unified interfaces, and additional features.


Subjects
Benchmarking , Computational Biology , Algorithms , Gene Ontology , Molecular Sequence Annotation
10.
Genomics Proteomics Bioinformatics ; 19(5): 848-859, 2021 10.
Article in English | MEDLINE | ID: mdl-33741524

ABSTRACT

Cytoscape is often used for visualization and analysis of metabolic pathways. For example, based on KEGG data, a reader for KEGG Markup Language (KGML) is used to load files into Cytoscape. However, although multiple genes can be responsible for the same reaction, the KGML-reader KEGGScape only presents the first listed gene in a network node for a given reaction. This can lead to incorrect interpretations of the pathways. Our new method, FunHoP, shows all possible genes in each node, making the pathways more complete. FunHoP collapses all genes in a node into one measurement using read counts from RNA-seq. Assuming that activity for an enzymatic reaction mainly depends upon the gene with the highest number of reads, and weighting the reads on gene length and ratio, a new expression value is calculated for the node as a whole. Differential expression at node level is then applied to the networks. Using prostate cancer as model, we integrate RNA-seq data from two patient cohorts with metabolism data from literature. Here we show that FunHoP gives more consistent pathways that are easier to interpret biologically. Code and documentation for running FunHoP can be found at https://github.com/kjerstirise/FunHoP.


Subjects
Metabolic Networks and Pathways , Software , Humans , Metabolic Networks and Pathways/genetics
11.
Cancer Inform ; 19: 1176935120965542, 2020.
Article in English | MEDLINE | ID: mdl-33116353

ABSTRACT

The k-Nearest Neighbor (kNN) classifier represents a simple and very general approach to classification. Still, the performance of kNN classifiers can often compete with more complex machine-learning algorithms. The core of kNN depends on a "guilt by association" principle where classification is performed by measuring the similarity between a query and a set of training patterns, often computed as distances. The relative performance of kNN classifiers is closely linked to the choice of distance or similarity measure, and it is therefore relevant to investigate the effect of using different distance measures when comparing biomedical data. In this study on classification of cancer data sets, we have used both common and novel distance measures, including the novel distance measures Sobolev and Fisher, and we have evaluated the performance of kNN with these distances on 4 cancer data sets of different type. We find that the performance when using the novel distance measures is comparable to the performance with more well-established measures, in particular for the Sobolev distance. We define a robust ranking of all the distance measures according to overall performance. Several distance measures show robust performance in kNN over several data sets, in particular the Hassanat, Sobolev, and Manhattan measures. Some of the other measures show good performance on selected data sets but seem to be more sensitive to the nature of the classification data. It is therefore important to benchmark distance measures on similar data prior to classification to identify the most suitable measure in each case.
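The dependence of kNN on the chosen distance measure, as discussed in the abstract above, can be shown with a classifier that takes the distance as a parameter. This is a minimal sketch under stated assumptions: only the Manhattan and Euclidean distances are implemented, the Sobolev, Fisher, and Hassanat measures from the study are not reproduced, and the toy data are invented for illustration.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(
    query: Vector,
    train_x: Sequence[Vector],
    train_y: Sequence[Hashable],
    k: int = 3,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> Hashable:
    """Label the query by majority vote among its k nearest training patterns."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy data: two well-separated classes.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict((0.5, 0.5), points, labels, distance=manhattan))
```

Passing the distance as a function makes it straightforward to benchmark several measures on the same data, which is the comparison the abstract recommends before settling on one.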

12.
Sci Adv ; 6(37), 2020 09.
Article in English | MEDLINE | ID: mdl-32917713

ABSTRACT

Intestinal epithelial homeostasis is maintained by adult intestinal stem cells, which, alongside Paneth cells, appear after birth in the neonatal period. We aimed to identify regulators of neonatal intestinal epithelial development by testing a small library of epigenetic modifier inhibitors in Paneth cell-skewed organoid cultures. We found that lysine-specific demethylase 1A (Kdm1a/Lsd1) is absolutely required for Paneth cell differentiation. Lsd1-deficient crypts, devoid of Paneth cells, are still able to form organoids without a requirement of exogenous or endogenous Wnt. Mechanistically, we find that LSD1 enzymatically represses genes that are normally expressed only in fetal and neonatal epithelium. This gene profile is similar to what is seen in repairing epithelium, and we find that Lsd1-deficient epithelium has superior regenerative capacities after irradiation injury. In summary, we found an important regulator of neonatal intestinal development and identified a druggable target to reprogram intestinal epithelium toward a reparative state.


Subjects
Intestinal Mucosa , Paneth Cells , Cell Differentiation/genetics , Histone Demethylases/genetics , Humans , Newborn Infant , Organoids
13.
PLoS One ; 15(7): e0235613, 2020.
Article in English | MEDLINE | ID: mdl-32634176

ABSTRACT

Germline variants inactivating the mismatch repair (MMR) genes MLH1, MSH2, MSH6 and PMS2 cause Lynch syndrome that implies an increased cancer risk, where colon and endometrial cancer are the most frequent. Identification of these pathogenic variants is important to identify endometrial cancer patients with inherited increased risk of new cancers, in order to offer them lifesaving surveillance. However, several other genes are also part of the MMR pathway. It is therefore relevant to search for variants in additional genes that may be associated with cancer risk by including all known genes involved in the MMR pathway. Next-generation sequencing was used to screen 22 genes involved in the MMR pathway in constitutional DNA extracted from full blood from 199 unselected endometrial cancer patients. Bioinformatic pipelines were developed for identification and functional annotation of variants, using several different software tools and custom programs. This facilitated identification of 22 exonic, 4 UTR and 9 intronic variants that could be classified according to pathogenicity. This study has identified several germline variants in genes of the MMR pathway that potentially may be associated with an increased risk for cancer, in particular endometrial cancer, and therefore are relevant for further investigation. We have also developed bioinformatics strategies to analyse targeted sequencing data, including low quality data and genomic regions outside of the protein coding exons of the relevant genes.


Subjects
DNA Mismatch Repair , Endometrial Neoplasms/pathology , Mismatch Repair Endonuclease PMS2/genetics , MutL Protein Homolog 1/genetics , MutS Homolog 2 Protein/genetics , Hereditary Nonpolyposis Colorectal Neoplasms/genetics , Hereditary Nonpolyposis Colorectal Neoplasms/pathology , DNA Copy Number Variations , Neoplasm DNA/blood , Neoplasm DNA/chemistry , Neoplasm DNA/metabolism , Endometrial Neoplasms/genetics , Exons , Female , High-Throughput Nucleotide Sequencing , Humans , Introns , Risk Factors , Untranslated Regions/genetics
14.
BMC Bioinformatics ; 21(1): 134, 2020 Apr 06.
Article in English | MEDLINE | ID: mdl-32252623

ABSTRACT

BACKGROUND: Diseases like cancer will lead to changes in gene expression, and it is relevant to identify key regulatory genes that can be linked directly to these changes. This can be done by computing a Regulatory Impact Factor (RIF) score for relevant regulators. However, this computation is based on estimating correlated patterns of gene expression, often Pearson correlation, and an assumption about a set of specific regulators, normally transcription factors. This study explores alternative measures of correlation, using the Fisher and Sobolev metrics, and an extended set of regulators, including epigenetic regulators and long non-coding RNAs (lncRNAs). Data on prostate cancer have been used to explore the effect of these modifications. RESULTS: A tool for computation of RIF scores with alternative correlation measures and extended sets of regulators was developed and tested on gene expression data for prostate cancer. The study showed that the Fisher and Sobolev metrics lead to improved identification of well-documented regulators of gene expression in prostate cancer, and the sets of identified key regulators showed improved overlap with previously defined gene sets of relevance to cancer. The extended set of regulators lead to identification of several interesting candidates for further studies, including lncRNAs. Several key processes were identified as important, including spindle assembly and the epithelial-mesenchymal transition (EMT). CONCLUSIONS: The study has shown that using alternative metrics of correlation can improve the performance of tools based on correlation of gene expression in genomic data. The Fisher and Sobolev metrics should be considered also in other correlation-based applications.


Subjects
Computational Biology/methods , Genetic Epigenesis , Transcription Factors/metabolism , Genetic Databases , Epithelial-Mesenchymal Transition , Neoplastic Gene Expression Regulation , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Long Noncoding RNA/metabolism , Transcription Factors/genetics
15.
BMC Med Genomics ; 13(1): 6, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31914996

ABSTRACT

BACKGROUND: Prostate cancer (PCa) has the highest incidence rates of cancers in men in western countries. Unlike several other types of cancer, PCa has few genetic drivers, which has led researchers to look for additional epigenetic and transcriptomic contributors to PCa development and progression. Especially datasets on DNA methylation, the most commonly studied epigenetic marker, have recently been measured and analysed in several PCa patient cohorts. DNA methylation is most commonly associated with downregulation of gene expression. However, positive associations of DNA methylation to gene expression have also been reported, suggesting a more diverse mechanism of epigenetic regulation. Such additional complexity could have important implications for understanding prostate cancer development but has not been studied at a genome-wide scale. RESULTS: In this study, we have compared three sets of genome-wide single-site DNA methylation data from 870 PCa and normal tissue samples with multi-cohort gene expression data from 1117 samples, including 532 samples where DNA methylation and gene expression have been measured on the exact same samples. Genes were classified according to their corresponding methylation and expression profiles. A large group of hypermethylated genes was robustly associated with increased gene expression (UPUP group) in all three methylation datasets. These genes demonstrated distinct patterns of correlation between DNA methylation and gene expression compared to the genes showing the canonical negative association between methylation and expression (UPDOWN group). This indicates a more diversified role of DNA methylation in regulating gene expression than previously appreciated. Moreover, UPUP and UPDOWN genes were associated with different compartments - UPUP genes were related to the structures in nucleus, while UPDOWN genes were linked to extracellular features. 
CONCLUSION: We identified a robust association between hypermethylation and upregulation of gene expression when comparing samples from prostate cancer and normal tissue. These results challenge the classical view where DNA methylation is always associated with suppression of gene expression, which underlines the importance of considering corresponding expression data when assessing the downstream regulatory effect of DNA methylation.


Subjects
DNA Methylation , Neoplasm DNA , Genetic Epigenesis , Neoplastic Gene Expression Regulation , Prostatic Neoplasms , Up-Regulation , Neoplasm DNA/genetics , Neoplasm DNA/metabolism , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology
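The UPUP and UPDOWN grouping named in the abstract above can be pictured as a classification on the direction of two changes between cancer and normal tissue. The sketch below is an assumption-laden illustration: the function name, the use of a methylation difference and a log2 fold change as inputs, and the "OTHER" label are hypothetical, and the significance testing a real analysis would need is omitted.

```python
def classify_gene(delta_methylation: float, log2_fold_change: float) -> str:
    """Group a gene by the direction of methylation and expression change.

    delta_methylation: methylation in cancer minus methylation in normal tissue.
    log2_fold_change:  log2(expression in cancer / expression in normal tissue).
    """
    if delta_methylation <= 0:
        return "OTHER"  # not hypermethylated; outside the two groups sketched here
    if log2_fold_change > 0:
        return "UPUP"  # hypermethylated and upregulated
    if log2_fold_change < 0:
        return "UPDOWN"  # hypermethylated and downregulated (canonical pattern)
    return "OTHER"


print(classify_gene(0.2, 1.5))   # UPUP
print(classify_gene(0.2, -1.5))  # UPDOWN
```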
16.
Biostatistics ; 21(3): 625-639, 2020 07 01.
Article in English | MEDLINE | ID: mdl-30698663

ABSTRACT

We present model-based analysis for ChIA-PET (MACPET), which analyzes paired-end read sequences provided by ChIA-PET for finding binding sites of a protein of interest. MACPET uses information from both tags of each PET and searches for binding sites in a two-dimensional space, while taking into account different noise levels in different genomic regions. MACPET shows favorable results compared with MACS in terms of motif occurrence and spatial resolution. Furthermore, significant binding sites discovered by MACPET are involved in a higher number of significant three-dimensional interactions than those discovered by MACS. MACPET is freely available on Bioconductor.

Keywords: ChIA-PET; MACPET; Model-based clustering; Paired-end tags; Peak-calling algorithm.


Subjects
Chromatin Immunoprecipitation Sequencing , Chromatin Immunoprecipitation , Genome , High-Throughput Nucleotide Sequencing , Biological Models , Protein Binding , DNA Sequence Analysis , Humans
17.
Clin Epigenetics ; 11(1): 193, 2019 12 12.
Article in English | MEDLINE | ID: mdl-31831061

ABSTRACT

Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most researched epigenetic mark involved in various processes in the human cell, including gene regulation and development of diseases, such as cancer. Increasing numbers of DNA methylation sequencing datasets from human genome are produced using various platforms-from methylated DNA precipitation to the whole genome bisulfite sequencing. Many of those datasets are fully accessible for repeated analyses. Sequencing experiments have become routine in laboratories around the world, while analysis of outcoming data is still a challenge among the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often not clear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable to complete all the essential steps to achieve the result. In the case of DNA methylation sequencing analysis, the goal of such pipeline is to map sequencing reads, calculate methylation levels, and distinguish differentially methylated positions and/or regions. The objective of this review is to describe basic principles and steps in the analysis of DNA methylation sequencing data that in particular have been used for mammalian genomes, and more importantly to present and discuss the most pronounced computational pipelines that can be used to analyze such data. 
We aim to provide a good starting point for scientists with limited experience in computational analyses of DNA methylation and hydroxymethylation data, and recommend a few tools that are powerful, but still easy enough to use for their own data analysis.


Subjects
Computational Biology/methods , DNA Methylation , DNA Sequence Analysis/methods , Data Analysis , Genetic Epigenesis , Human Genome , Humans , Software
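The "calculate methylation levels" step named in the abstract above is commonly expressed, for bisulfite sequencing, as the fraction of reads supporting methylation at a cytosine. The following is a minimal sketch of that ratio only; the function name and the `min_coverage` cut-off are assumptions for illustration, and read mapping and differential methylation testing are outside its scope.

```python
from typing import Optional


def methylation_level(
    methylated_reads: int, unmethylated_reads: int, min_coverage: int = 1
) -> Optional[float]:
    """Fraction of reads that support methylation at one cytosine.

    Returns None when coverage is below `min_coverage`, because the ratio
    is undefined or unreliable for poorly covered positions.
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage < min_coverage:
        return None
    return methylated_reads / coverage


print(methylation_level(8, 2))  # 0.8
```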
18.
BMC Bioinformatics ; 19(1): 533, 2018 Dec 19.
Article in English | MEDLINE | ID: mdl-30567492

ABSTRACT

BACKGROUND: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression. RESULTS: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes. CONCLUSION: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.


Subjects
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Molecular Sequence Annotation , RNA, Long Noncoding/metabolism , Software , High-Throughput Nucleotide Sequencing , Humans , RNA, Long Noncoding/genetics
19.
F1000Res ; 7, 2018.
Article in English | MEDLINE | ID: mdl-30271575

ABSTRACT

The Norwegian e-Infrastructure for Life Sciences (NeLS) has been developed by ELIXIR Norway to provide its users with a system enabling data storage, sharing, and analysis in a project-oriented fashion. The system is available through easy-to-use web interfaces, including the Galaxy workbench for data analysis and workflow execution. Users confident with a command-line interface and programming may also access it through Secure Shell (SSH) and application programming interfaces (APIs). NeLS has been in production since 2015, with training and support provided by the help desk of ELIXIR Norway. Through collaboration with NorSeq, the national consortium for high-throughput sequencing, an integrated service is offered so that sequencing data generated in a research project is provided to the involved researchers through NeLS. Sensitive data, such as individual genomic sequencing data, are handled using the TSD (Services for Sensitive Data) platform provided by Sigma2 and the University of Oslo. NeLS integrates national e-infrastructure storage and computing resources, and is also integrated with the SEEK platform in order to store large data files produced by experiments described in SEEK. In this article, we outline the architecture of NeLS and discuss possible directions for further development.


Subjects
Biological Science Disciplines , Database Management Systems , High-Throughput Nucleotide Sequencing , Humans , Information Dissemination/methods , Information Storage and Retrieval/methods , Norway
20.
Front Microbiol ; 9: 1416, 2018.
Article in English | MEDLINE | ID: mdl-30008706

ABSTRACT

Shiga toxin-producing Escherichia coli (STEC) cause both sporadic infections and outbreaks of enteric disease in humans, with symptoms ranging from asymptomatic carriage to severe disease like haemolytic uremic syndrome (HUS). Bacterial virulence factors like subtypes of the Shiga toxin (Stx) and the locus of enterocyte effacement (LEE) pathogenicity island, as well as host factors like young age, are strongly associated with development of HUS. However, these factors alone do not accurately differentiate between strains that cause HUS and those that do not cause severe disease, which is important in the context of diagnosis, treatment, and infection control. We have used RNA sequencing to compare transcriptomes of 30 stx2a- and eae-positive STEC strains of non-O157 serogroups isolated from children <5 years of age. The strains were from children with HUS (HUS group, n = 15) and children with asymptomatic or mild disease (non-HUS group, n = 15), and were either induced with mitomycin C or left non-induced, to reveal potential differences in gene expression levels between groups. When the HUS and non-HUS groups were compared for differential expression of protein-encoding gene families, 399 of 6,119 gene families were differentially expressed (log2 fold change ≥ 1, FDR < 0.05) in the non-induced condition, whereas only one gene family was differentially expressed in the induced condition. Gene ontology and cluster analysis showed that several fimbrial operons, as well as a putative type VI secretion system (T6SS), were more highly expressed in the HUS group than in the non-HUS group, indicating a role of these in the virulence of STEC strains causing severe disease.
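
The selection criterion quoted in the abstract (log2 fold change ≥ 1, FDR < 0.05) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the FDR is assumed to be a Benjamini-Hochberg adjusted p-value, the fold-change cutoff is assumed to apply to the absolute log2 ratio, and the gene-family names, expression means, and p-values are invented for the example.

```python
from math import log2
from typing import Dict, List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(
    table: Dict[str, Tuple[float, float, float]],
    min_abs_log2fc: float = 1.0,
    max_fdr: float = 0.05,
) -> List[str]:
    """Names whose |log2(mean_a / mean_b)| and adjusted p pass both cutoffs.

    table maps a name to (mean in group A, mean in group B, raw p-value).
    """
    names = list(table)
    fdr = benjamini_hochberg([table[name][2] for name in names])
    selected = []
    for name, q in zip(names, fdr):
        mean_a, mean_b, _ = table[name]
        if abs(log2(mean_a / mean_b)) >= min_abs_log2fc and q < max_fdr:
            selected.append(name)
    return selected


# Hypothetical gene families: (mean HUS, mean non-HUS, raw p-value).
example = {
    "family_1": (80.0, 20.0, 0.001),
    "family_2": (30.0, 10.0, 0.01),
    "family_3": (100.0, 95.0, 0.6),
    "family_4": (50.0, 20.0, 0.2),
}
hits = differentially_expressed(example)
```

In this invented table, `family_4` has a large fold change but fails the FDR cutoff, and `family_3` fails both, so only the first two families are selected.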

...