1.
Brief Bioinform ; 21(5): 1523-1530, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31624847

ABSTRACT

The generation and systematic collection of genome-wide data is ever-increasing. This vast amount of data has enabled researchers to study relations between a variety of genomic and epigenomic features, including genetic variation, gene regulation and phenotypic traits. Such relations are typically investigated by comparatively assessing genomic co-occurrence. Technically, this corresponds to assessing the similarity of pairs of genome-wide binary vectors. A variety of similarity measures have been proposed for this problem in other fields like ecology. However, while several of these measures have been employed for assessing genomic co-occurrence, their appropriateness for the genomic setting has never been investigated. We show that the choice of similarity measure may strongly influence results and propose two alternative modelling assumptions that can be used to guide this choice. On both simulated and real genomic data, the Jaccard index is strongly altered by dataset size and should be used with caution. The Forbes coefficient (fold change) and tetrachoric correlation are less influenced by dataset size, but one should be aware of increased variance for small datasets. All results on simulated and real data can be inspected and reproduced at https://hyperbrowser.uio.no/sim-measure.
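
As a rough illustration of two of the measures named in this abstract, the sketch below computes the Jaccard index and the Forbes coefficient (observed overlap divided by the overlap expected under independence) for a pair of binary vectors. The function names and the toy vectors are assumptions made for this example; they are not taken from the paper or from the HyperBrowser code.

```python
def contingency(x, y):
    """Count positions covered by both vectors, only x, only y, or neither."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)
    only_x = sum(1 for a, b in zip(x, y) if a and not b)
    only_y = sum(1 for a, b in zip(x, y) if b and not a)
    neither = len(x) - both - only_x - only_y
    return both, only_x, only_y, neither


def jaccard(x, y):
    """Intersection size divided by union size."""
    both, only_x, only_y, _ = contingency(x, y)
    union = both + only_x + only_y
    return both / union if union else 0.0


def forbes(x, y):
    """Observed overlap divided by the overlap expected by chance."""
    both, only_x, only_y, neither = contingency(x, y)
    n = both + only_x + only_y + neither
    if n == 0:
        return float("nan")
    expected = (both + only_x) * (both + only_y) / n
    return both / expected if expected else float("nan")


# Hypothetical toy tracks: 1 marks a genome position covered by the feature.
track_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
track_b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(jaccard(track_a, track_b))
print(forbes(track_a, track_b))
```

Padding both vectors with further uncovered positions leaves the Jaccard index unchanged but raises the Forbes coefficient, because only the latter depends on the total vector length.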


Subject(s)
Genomics/methods; Algorithms; Datasets as Topic; Gene Expression Regulation; Genetic Variation; Humans
2.
BMC Bioinformatics ; 22(1): 498, 2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34654363

ABSTRACT

BACKGROUND: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using L1-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are however issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. RESULTS: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. With our method, mRNA data are demonstrated to provide highly useful prior information for protein-protein interaction networks. CONCLUSIONS: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.
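
The idea that network edges correspond to non-zero entries of the inverse covariance matrix can be illustrated on a tiny example. The sketch below is not the graphical lasso and not the tailoredGlasso package: it inverts a known 3 x 3 covariance matrix exactly, with no penalization, and reads the edges off the result. The matrix values and helper names are assumptions chosen for illustration.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination on exact fractions."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edges(precision):
    """Pairs (i, j), i < j, whose precision-matrix entry is non-zero."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if precision[i][j] != 0]


# Hypothetical covariance of three genes forming a chain 0 - 1 - 2.
covariance = [[Fraction(3, 4), Fraction(2, 4), Fraction(1, 4)],
              [Fraction(2, 4), Fraction(4, 4), Fraction(2, 4)],
              [Fraction(1, 4), Fraction(2, 4), Fraction(3, 4)]]

precision = invert(covariance)
network = edges(precision)
print(network)
```

Genes 0 and 2 are marginally correlated (their covariance entry is non-zero), yet the precision matrix has a zero in that position, so no direct edge is drawn between them. With real high-dimensional data the exact inverse is unavailable or unstable, which is the situation the penalized estimators in the abstract address.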


Subject(s)
Algorithms; Gene Regulatory Networks; Genomics; Normal Distribution; Protein Interaction Maps
3.
PLoS Comput Biol ; 15(2): e1006731, 2019 02.
Article in English | MEDLINE | ID: mdl-30779737

ABSTRACT

Graph-based representations are considered to be the future for reference genomes, as they allow integrated representation of the steadily increasing data on individual variation. Currently available tools allow de novo assembly of graph-based reference genomes, alignment of new read sets to the graph representation as well as certain analyses like variant calling and haplotyping. We here present a first method for calling ChIP-Seq peaks on read data aligned to a graph-based reference genome. The method is a graph generalization of the peak caller MACS2, and is implemented in an open source tool, Graph Peak Caller. By using the existing tool vg to build a pan-genome of Arabidopsis thaliana, we validate our approach by showing that Graph Peak Caller with a pan-genome reference graph can trace variants within peaks that are not part of the linear reference genome, and find peaks that in general are more motif-enriched than those found by MACS2.


Subject(s)
Chromatin Immunoprecipitation/methods; Genomics/methods; Sequence Analysis, DNA/methods; Algorithms; Arabidopsis/genetics; Genome/genetics; Protein Binding; Software; Transcription Factors
4.
Stat Med ; 39(25): 3549-3568, 2020 11 10.
Article in English | MEDLINE | ID: mdl-32851696

ABSTRACT

In many statistical regression and prediction problems, it is reasonable to assume monotone relationships between certain predictor variables and the outcome. Genomic effects on phenotypes are, for instance, often assumed to be monotone. However, in some settings, it may be reasonable to assume a partially linear model, where some of the covariates can be assumed to have a linear effect. One example is a prediction model using both high-dimensional gene expression data, and low-dimensional clinical data, or when combining continuous and categorical covariates. We study methods for fitting the partially linear monotone model, where some covariates are assumed to have a linear effect on the response, and some are assumed to have a monotone (potentially nonlinear) effect. Most existing methods in the literature for fitting such models are subject to the limitation that they have to be provided the monotonicity directions a priori for the different monotone effects. We here present methods for fitting partially linear monotone models which perform both automatic variable selection, and monotonicity direction discovery. The proposed methods perform comparably to, or better than, existing methods, in terms of estimation, prediction, and variable selection performance, in simulation experiments in both classical and high-dimensional data settings.


Subject(s)
Algorithms; Computer Simulation; Linear Models; Regression Analysis
5.
BMC Bioinformatics ; 18(1): 263, 2017 May 18.
Article in English | MEDLINE | ID: mdl-28521770

ABSTRACT

BACKGROUND: It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. RESULTS: We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. CONCLUSION: More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .
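
To make the notion of an offset-based interval on a graph reference more concrete, here is a minimal sketch of one way such an interval could be represented: a start position, an end position, and the ordered path of blocks between them. The class names, fields, and block identifiers are hypothetical and are not the API of the offsetbasedgraph package.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Position:
    """A coordinate given as a block identifier plus an offset in that block."""
    block_id: str
    offset: int


@dataclass(frozen=True)
class GraphInterval:
    """An interval following an ordered path of blocks through the graph."""
    start: Position
    end: Position              # end offset is exclusive
    blocks: Tuple[str, ...]    # path from the start block to the end block

    def __post_init__(self):
        if not self.blocks:
            raise ValueError("interval must cover at least one block")
        if self.start.block_id != self.blocks[0]:
            raise ValueError("start position must lie in the first block")
        if self.end.block_id != self.blocks[-1]:
            raise ValueError("end position must lie in the last block")

    def length(self, block_sizes: Dict[str, int]) -> int:
        """Number of base pairs covered along the path."""
        if len(self.blocks) == 1:
            return self.end.offset - self.start.offset
        first = block_sizes[self.blocks[0]] - self.start.offset
        middle = sum(block_sizes[b] for b in self.blocks[1:-1])
        return first + middle + self.end.offset


# Hypothetical graph: a main path with one alternative locus in the middle.
block_sizes = {"main_1": 100, "alt_1": 40, "main_2": 50}
gene = GraphInterval(
    start=Position("main_1", 90),
    end=Position("main_2", 5),
    blocks=("main_1", "alt_1", "main_2"),
)
print(gene.length(block_sizes))
```

Storing the block path explicitly is what lets the same gene be described unambiguously when it runs through an alternative locus rather than the main path.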


Subject(s)
Computer Graphics; Genome, Human; Genomics/methods; Algorithms; Genetic Loci; Humans; Internet; RNA, Messenger/genetics; RNA, Messenger/metabolism; Sequence Analysis, DNA; Software
6.
Nucleic Acids Res ; 41(10): 5164-74, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23571755

ABSTRACT

The study of chromatin 3D structure has recently gained much focus owing to novel techniques for detecting genome-wide chromatin contacts using next-generation sequencing. A deeper understanding of the architecture of the DNA inside the nucleus is crucial for gaining insight into fundamental processes such as transcriptional regulation, genome dynamics and genome stability. Chromatin conformation capture-based methods, such as Hi-C and ChIA-PET, are now paving the way for routine genome-wide studies of chromatin 3D structure in a range of organisms and tissues. However, appropriate methods for analyzing such data are lacking. Here, we propose a hypothesis test and an enrichment score of 3D co-localization of genomic elements that handles intra- or interchromosomal interactions, both separately and jointly, and that adjusts for biases caused by structural dependencies in the 3D data. We show that maintaining structural properties during resampling is essential to obtain valid estimation of P-values. We apply the method on chromatin states and a set of mutated regions in leukemia cells, and find significant co-localization of these elements, with varying enrichment scores, supporting the role of chromatin 3D structure in shaping the landscape of somatic mutations in cancer.


Subject(s)
Chromatin/chemistry; Cell Line, Tumor; Chromosomes, Human/chemistry; Data Interpretation, Statistical; Genome; Humans; Leukemia/genetics; Mutation; Nucleic Acid Conformation; Sequence Analysis, DNA
7.
Nucleic Acids Res ; 41(Web Server issue): W133-41, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23632163

ABSTRACT

The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.


Subject(s)
Genomics/methods; Software; Data Interpretation, Statistical; Genome; Internet
8.
PLoS Comput Biol ; 7(12): e1002292, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22144885

ABSTRACT

Integration of retroviral vectors in the human genome follows non-random patterns that favor insertional deregulation of gene expression and may cause risks of insertional mutagenesis when used in clinical gene therapy. Understanding how viral vectors integrate into the human genome is a key issue in predicting these risks. We provide a new statistical method to compare retroviral integration patterns. We identified the positions where vectors derived from the Human Immunodeficiency Virus (HIV) and the Moloney Murine Leukemia Virus (MLV) show different integration behaviors in human hematopoietic progenitor cells. Non-parametric density estimation was used to identify candidate comparative hotspots, which were then tested and ranked. We found 100 significant comparative hotspots, distributed throughout the chromosomes. HIV hotspots were wider and contained more genes than MLV ones. A Gene Ontology analysis of HIV targets showed enrichment of genes involved in antigen processing and presentation, reflecting the high HIV integration frequency observed at the MHC locus on chromosome 6. Four histone modifications/variants had a different mean density in comparative hotspots (H2AZ, H3K4me1, H3K4me3, H3K9me1), while gene expression within the comparative hotspots did not differ from background. These findings suggest the existence of epigenetic or nuclear three-dimensional topology contexts guiding retroviral integration to specific chromosome areas.


Subject(s)
Genetic Vectors/genetics; Genome, Human; HIV/genetics; Models, Genetic; Moloney murine leukemia virus/genetics; Virus Integration; Antigens, CD34/genetics; Chromosomes, Human, Pair 6; Genetic Loci; HLA Antigens/genetics; Hematopoietic Stem Cells; Histones/genetics; Humans; Reproducibility of Results
9.
Stat Appl Genet Mol Biol ; 10(1)2011 Aug 29.
Article in English | MEDLINE | ID: mdl-23089821

ABSTRACT

The lasso is one of the most commonly used methods for high-dimensional regression, but can be unstable and lacks satisfactory asymptotic properties for variable selection. We propose to use weighted lasso with integrated relevant external information on the covariates to guide the selection towards more stable results. Weighting the penalties with external information gives each regression coefficient a covariate specific amount of penalization and can improve upon standard methods that do not use such information by borrowing knowledge from the external material. The method is applied to two cancer data sets, with gene expressions as covariates. We find interesting gene signatures, which we are able to validate. We discuss various ideas on how the weights should be defined and illustrate how different types of investigations can utilize our method exploiting different sources of external data. Through simulations, we show that our method outperforms the lasso and the adaptive lasso when the external information is from relevant to partly relevant, in terms of both variable selection and prediction.
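
The effect of covariate-specific penalties can be seen in a small worked example. The sketch below fits a weighted lasso by plain coordinate descent, minimizing (1/2n) * ||y - Xb||^2 + lam * sum_j w_j * |b_j|. It is a generic textbook-style solver written for this illustration, not the authors' software, and the data, weights, and function names are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for the lasso with a separate penalty weight per covariate."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of covariate j with the residual that excludes covariate j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam * weights[j]) / z if z else 0.0
    return beta


# Hypothetical data: two orthogonal covariates with identical true effects.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.0, 0.0, 0.0, -2.0]  # y = 1 * x1 + 1 * x2

equal = weighted_lasso(X, y, lam=0.5, weights=[1.0, 1.0])
guided = weighted_lasso(X, y, lam=0.5, weights=[0.2, 3.0])
print(equal)
print(guided)
```

With equal weights both coefficients are shrunk by the same amount. When the hypothetical external information gives the first covariate a small weight and the second a large one, the first is barely shrunk and the second is removed from the model, even though the data support both equally.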


Subject(s)
Computational Biology/methods; Regression Analysis; Software; Computer Simulation; Disease Progression; Gene Dosage; Gene Expression Regulation, Neoplastic; Genes, Neoplasm; Genome-Wide Association Study/methods; Head and Neck Neoplasms/genetics; Head and Neck Neoplasms/pathology; Humans; Predictive Value of Tests; Reproducibility of Results; Statistics, Nonparametric; Survival Analysis
10.
PLoS Genet ; 5(11): e1000719, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19911042

ABSTRACT

Integrative analysis of gene dosage, expression, and ontology (GO) data was performed to discover driver genes in the carcinogenesis and chemoradioresistance of cervical cancers. Gene dosage and expression profiles of 102 locally advanced cervical cancers were generated by microarray techniques. Fifty-two of these patients were also analyzed with the Illumina expression method to confirm the gene expression results. An independent cohort of 41 patients was used for validation of gene expressions associated with clinical outcome. Statistical analysis identified 29 recurrent gains and losses and 3 losses (on 3p, 13q, 21q) associated with poor outcome after chemoradiotherapy. The intratumor heterogeneity, assessed from the gene dosage profiles, was low for these alterations, showing that they had emerged prior to many other alterations and probably were early events in carcinogenesis. Integration of the alterations with gene expression and GO data identified genes that were regulated by the alterations and revealed five biological processes that were significantly overrepresented among the affected genes: apoptosis, metabolism, macromolecule localization, translation, and transcription. Four genes on 3p (RYBP, GBE1) and 13q (FAM48A, MED4) correlated with outcome at both the gene dosage and expression level and were satisfactorily validated in the independent cohort. These integrated analyses yielded 57 candidate drivers of 24 genetic events, including novel loci responsible for chemoradioresistance. Further mapping of the connections among genetic events, drivers, and biological processes suggested that each individual event stimulates specific processes in carcinogenesis through the coordinated control of multiple genes. The present results may provide novel therapeutic opportunities of both early and advanced stage cervical cancers.


Subject(s)
Gene Dosage; Gene Expression Regulation, Neoplastic; Uterine Cervical Neoplasms/genetics; Adult; Aged; Cohort Studies; Female; Genes, Neoplasm; Humans; Kaplan-Meier Estimate; Middle Aged; Oligonucleotide Array Sequence Analysis; Proportional Hazards Models; Regression Analysis; Uterine Cervical Neoplasms/drug therapy; Uterine Cervical Neoplasms/pathology; Uterine Cervical Neoplasms/radiotherapy
11.
BMC Genomics ; 12: 353, 2011 Jul 07.
Article in English | MEDLINE | ID: mdl-21736759

ABSTRACT

BACKGROUND: Transcription factors in disease-relevant pathways represent potential drug targets, by impacting a distinct set of pathways that may be modulated through gene regulation. The influence of transcription factors is typically studied on a per disease basis, and no current resources provide a global overview of the relations between transcription factors and disease. Furthermore, existing pipelines for related large-scale analysis are tailored for particular sources of input data, and there is a need for generic methodology for integrating complementary sources of genomic information. RESULTS: We here present a large-scale analysis of multiple diseases versus multiple transcription factors, with a global map of over- and under-representation of 446 transcription factors in 1010 diseases. This map, referred to as the differential disease regulome, provides a first global statistical overview of the complex interrelationships between diseases, genes and controlling elements. The map is visualized using the Google map engine, due to its very large size, and provides a range of detailed information in a dynamic presentation format. The analysis is achieved through a novel methodology that performs a pairwise, genome-wide comparison on the Cartesian product of two distinct sets of annotation tracks, e.g. all combinations of one disease and one TF. The methodology was also used to extend with maps using alternative data sets related to transcription and disease, as well as data sets related to Gene Ontology classification and histone modifications. We provide a web-based interface that allows users to generate other custom maps, which could be based on precisely specified subsets of transcription factors and diseases, or, in general, on any categorical genome annotation tracks as they are improved or become available. CONCLUSION: We have created a first resource that provides a global overview of the complex relations between transcription factors and disease. As the accuracy of the disease regulome depends mainly on the quality of the input data, forthcoming ChIP-seq based binding data for many TFs will provide improved maps. We further believe our approach to genome analysis could allow an advance from the current typical situation of one-time integrative efforts to reproducible and upgradable integrative analysis. The differential disease regulome and its associated methodology is available at http://hyperbrowser.uio.no.


Subject(s)
Disease/genetics; Genomics/methods; Transcription Factors/genetics; Transcription Factors/metabolism; Computer Graphics; Humans; Internet; Molecular Sequence Annotation
12.
BMC Genomics ; 9: 258, 2008 May 30.
Article in English | MEDLINE | ID: mdl-18513391

ABSTRACT

BACKGROUND: Oligoarrays have become an accessible technique for exploring the transcriptome, but it is presently unclear how absolute transcript data from this technique compare to the data achieved with tag-based quantitative techniques, such as massively parallel signature sequencing (MPSS) and serial analysis of gene expression (SAGE). By use of the TransCount method we calculated absolute transcript concentrations from spotted oligoarray intensities, enabling direct comparisons with tag counts obtained with MPSS and SAGE. The tag counts were converted to number of transcripts per cell by assuming that the sum of all transcripts in a single cell was 5 × 10^5. Our aim was to investigate whether the less resource demanding and more widespread oligoarray technique could provide data that were correlated to and had the same absolute scale as those obtained with MPSS and SAGE. RESULTS: A total of 1,777 unique transcripts were detected in common for the three technologies and served as the basis for our analyses. The correlations involving the oligoarray data were not weaker than, but similar to, the correlation between the MPSS and SAGE data, both when the entire concentration range was considered and at high concentrations. The data sets were more strongly correlated at high transcript concentrations than at low concentrations. On an absolute scale, the number of transcripts per cell and gene was generally higher based on oligoarrays than on MPSS and SAGE, and ranged from 1.6 to 9,705 for the 1,777 overlapping genes. The MPSS data were on the same scale as the SAGE data, ranging from 0.5 to 3,180 (MPSS) and 9 to 1,268 (SAGE) transcripts per cell and gene. The sum of all transcripts per cell for these genes was 3.8 × 10^5 (oligoarrays), 1.1 × 10^5 (MPSS) and 7.6 × 10^4 (SAGE), whereas the corresponding sum for all detected transcripts was 1.1 × 10^6 (oligoarrays), 2.8 × 10^5 (MPSS) and 3.8 × 10^5 (SAGE). CONCLUSION: The oligoarrays and TransCount provide quantitative transcript concentrations that are correlated to MPSS and SAGE data, but the absolute scale of the measurements differs across the technologies. The discrepancy raises the question of whether the sum of all transcripts within a single cell might be higher than the 5 × 10^5 suggested in the literature and used to convert tag counts to transcripts per cell. If so, this may explain the apparent higher transcript detection efficiency of the oligoarrays, and has to be clarified before absolute transcript concentrations can be interchanged across the technologies. The ability to obtain transcript concentrations from oligoarrays opens up the possibility of efficient generation of universal transcript databases with low resource demands.
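
The tag-count conversion mentioned in this abstract can be sketched as a simple proportional rescaling. The code below assumes that each gene's share of the total tag count is multiplied by the assumed total of 5 × 10^5 transcripts per cell; the abstract states the assumed total but not the exact formula, so the rescaling rule, gene names, and counts here are assumptions for illustration.

```python
TRANSCRIPTS_PER_CELL = 5e5  # total assumed in the abstract


def tags_to_transcripts_per_cell(tag_counts, total_per_cell=TRANSCRIPTS_PER_CELL):
    """Rescale raw tag counts so that they sum to total_per_cell."""
    total_tags = sum(tag_counts.values())
    if total_tags == 0:
        raise ValueError("no tags counted")
    return {gene: count * total_per_cell / total_tags
            for gene, count in tag_counts.items()}


# Hypothetical tag counts from a tag-based experiment (1,000 tags in total).
tag_counts = {"gene_a": 50, "gene_b": 150, "gene_c": 800}

per_cell = tags_to_transcripts_per_cell(tag_counts)
print(per_cell)
```

Because every estimate scales linearly with the assumed total, doubling that total doubles every per-cell value, which is why the abstract's question about whether the true total exceeds 5 × 10^5 matters for comparing the technologies on an absolute scale.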


Subject(s)
Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Animals; Expressed Sequence Tags; Mice; RNA, Messenger/genetics; RNA, Messenger/metabolism; Retina/metabolism
13.
BMC Med Genomics ; 11(1): 24, 2018 03 07.
Article in English | MEDLINE | ID: mdl-29514638

ABSTRACT

BACKGROUND: Using high-dimensional penalized regression we studied genome-wide DNA-methylation in bone biopsies of 80 postmenopausal women in relation to their bone mineral density (BMD). The women showed BMD varying from severely osteoporotic to normal. Global gene expression data from the same individuals were available, and since DNA-methylation often affects gene expression, the overall aim of this paper was to include both of these omics data sets into an integrated analysis. METHODS: The classical penalized regression uses one penalty, but we incorporated individual penalties for each of the DNA-methylation sites. These individual penalties were guided by the strength of association between DNA-methylations and gene transcript levels. DNA-methylations that were highly associated with one or more transcripts got lower penalties and were therefore favored compared to DNA-methylations showing less association with expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylations and their corresponding cis gene, as well as the association between DNA-methylations and trans-located genes. Two integrating penalized methods were used: first, an adaptive group-regularized ridge regression, and secondly, variable selection was performed through a modified version of the weighted lasso. RESULTS: When information from gene expressions was integrated, predictive performance was considerably improved, in terms of predictive mean square error, compared to classical penalized regression without data integration. We found a 14.7% improvement in the ridge regression case and a 17% improvement for the lasso case. Our version of the weighted lasso with data integration found a list of 22 interesting methylation sites. Several corresponded to genes that are known to be important in bone formation. Using BMD as response and these 22 methylation sites as covariates, least square regression analyses resulted in R² = 0.726, compared with an average R² = 0.438 for 10,000 randomly selected groups of DNA-methylations with group size 22. CONCLUSIONS: Two recent types of penalized regression methods were adapted to integrate DNA-methylation and their association to gene expression in the analysis of bone mineral density. In both cases predictions clearly benefit from including the additional information on gene expressions.


Subject(s)
Bone Density/genetics; DNA Methylation; Data Analysis; Gene Expression Profiling; Postmenopause/genetics; Postmenopause/physiology; Cohort Studies; Female; Genomics; Humans; Multivariate Analysis; Regression Analysis
14.
Nucleic Acids Res ; 33(17): e143, 2005 Oct 04.
Article in English | MEDLINE | ID: mdl-16204447

ABSTRACT

A method providing absolute transcript concentrations from spotted microarray intensity data is presented. Number of transcripts per µg total RNA, mRNA or per cell, are obtained for each gene, enabling comparisons of transcript levels within and between tissues. The method is based on Bayesian statistical modelling incorporating available information about the experiment from target preparation to image analysis, leading to realistically large confidence intervals for estimated concentrations. The method was validated in experiments using transcripts at known concentrations, showing accuracy and reproducibility of estimated concentrations, which were also in excellent agreement with results from quantitative real-time PCR. We determined the concentration for 10,157 genes in cervix cancers and a pool of cancer cell lines and found values in the range of 10^5 to 10^10 transcripts per µg total RNA. The precision of our estimates was sufficiently high to detect significant concentration differences between two tumours and between different genes within the same tumour, comparisons that are not possible with standard intensity ratios. Our method can be used to explore the regulation of pathways and to develop individualized therapies, based on absolute transcript concentrations. It can be applied broadly, facilitating the construction of the transcriptome, continuously updating it by integrating future data.


Subject(s)
Genomics/methods; Oligonucleotide Array Sequence Analysis/methods; RNA, Messenger/analysis; RNA, Neoplasm/analysis; Bayes Theorem; Cell Line, Tumor; Female; Humans; Transcription, Genetic; Uterine Cervical Neoplasms/genetics
15.
Epigenetics ; 12(8): 674-687, 2017 08.
Article in English | MEDLINE | ID: mdl-28650214

ABSTRACT

DNA methylation affects expression of associated genes and may contribute to the missing genetic effects from genome-wide association studies of osteoporosis. To improve insight into the mechanisms of postmenopausal osteoporosis, we combined transcript profiling with DNA methylation analyses in bone. RNA and DNA were isolated from 84 bone biopsies of postmenopausal donors varying markedly in bone mineral density (BMD). In all, 2529 CpGs in the top 100 genes most significantly associated with BMD were analyzed. The methylation levels at 63 CpGs differed significantly between healthy and osteoporotic women at 10% false discovery rate (FDR). Five of these CpGs at 5% FDR could explain 14% of BMD variation. To test whether blood DNA methylation reflects the situation in bone (as shown for other tissues), an independent cohort was selected and BMD association was demonstrated in blood for 13 of the 63 CpGs. Four transcripts representing inhibitors of bone metabolism (MEPE, SOST, WIF1, and DKK1) showed correlation to a high number of methylated CpGs, at 5% FDR. Our results link DNA methylation to the genetic influence modifying the skeleton, and the data suggest a complex interaction between CpG methylation and gene regulation. This is the first study in the hitherto largest number of postmenopausal women to demonstrate a strong association among bone CpG methylation, transcript levels, and BMD/fracture. This new insight may have implications for evaluation of osteoporosis stage and susceptibility.


Subject(s)
DNA Methylation; Osteoporosis, Postmenopausal/genetics; Adaptor Proteins, Signal Transducing/genetics; Adaptor Proteins, Signal Transducing/metabolism; Aged; Aged, 80 and over; Blood Cells/metabolism; Bone Density/genetics; Bone Morphogenetic Proteins/genetics; Bone Morphogenetic Proteins/metabolism; Bone and Bones/metabolism; Case-Control Studies; CpG Islands; Extracellular Matrix Proteins/genetics; Extracellular Matrix Proteins/metabolism; Female; Genetic Markers/genetics; Glycoproteins/genetics; Glycoproteins/metabolism; Humans; Intercellular Signaling Peptides and Proteins/genetics; Intercellular Signaling Peptides and Proteins/metabolism; Middle Aged; Phosphoproteins/genetics; Phosphoproteins/metabolism; Repressor Proteins/genetics; Repressor Proteins/metabolism
16.
Gigascience ; 6(7): 1-12, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28459977

ABSTRACT

Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no.


Subject(s)
Datasets as Topic/standards , Epigenesis, Genetic , Epigenomics/methods , Genome, Human , Software , Whole Genome Sequencing/methods , Epigenomics/standards , Humans , Whole Genome Sequencing/standards
17.
J Steroid Biochem Mol Biol ; 95(1-5): 105-11, 2005 May.
Article in English | MEDLINE | ID: mdl-16023338

ABSTRACT

Intratumoral levels of E1 (oestrone), E1S (oestrone sulphate) and E2 (oestradiol) are significantly reduced by treatment with the aromatase inhibitor anastrozole regardless of treatment response. The purpose of the present pilot study was to look for additional markers of biochemical response to aromatase inhibitors at the mRNA expression level. Whole genome expression was studied using microarray analysis of breast cancer tissue from 12 patients with locally advanced tumors, both before and following 15 weeks of treatment with the aromatase inhibitor anastrozole (Arimidex). Intratumoral mRNA levels for a subset of genes coding for steroid metabolizing enzymes, hormone receptors and some growth mediators involved in cell cycle control were analysed by quantitative RT-PCR. There was a correlation between the two methods for some but not all genes. The mRNA expression levels of the different genes were correlated to each other and to the intratumoral levels of E1, E2 and E1S, before and after the treatment. Notably, a correlation of the E1/E2 metabolic ratio to the mRNA levels of CYP19A1 was observed before treatment (r=0.745, p<0.005). Whole genome expression analysis of these 12 breast cancer patients revealed similar tumor classification to previously published larger studies. Tumors with no or low expression of ESR1 (oestrogen receptor) clustered together and were characterized by a strong basal-like signature highly expressing keratins 5/17, cadherin 3, frizzled and apolipoprotein D, among others. The luminal epithelial tumor cluster, on the other hand, highly expressed ESR1, GATA binding protein 3 and N-acetyl transferase. An evident ERBB2 cluster was observed due to the marked over-expression of the ERBB2 gene and GRB7 and PPARBP in this patient material. Using significance analysis of microarrays (SAM), we identified 298 genes significantly differentially expressed between the partial response and progressive disease groups.


Subject(s)
Antineoplastic Agents, Hormonal/therapeutic use , Aromatase Inhibitors/therapeutic use , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Gene Expression/drug effects , Nitriles/therapeutic use , Triazoles/therapeutic use , Anastrozole , Antineoplastic Agents, Hormonal/pharmacology , Aromatase Inhibitors/pharmacology , Biomarkers, Tumor/genetics , Female , Humans , Nitriles/pharmacology , Oligonucleotide Array Sequence Analysis , Triazoles/pharmacology
18.
Genome Biol ; 11(12): R121, 2010.
Article in English | MEDLINE | ID: mdl-21182759

ABSTRACT

The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.
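
As a rough illustration of what a pairwise relation between two tracks can look like, the sketch below counts the base pairs shared by two segment tracks and compares that count with a simple null model. It is not the Genomic HyperBrowser's implementation: the function names (`overlap_bp`, `circular_shift`, `shift_test`), the single linear toy genome, and the circular-shift null model are all assumptions chosen for brevity.

```python
import random

def overlap_bp(track_a, track_b):
    """Base pairs covered by both tracks.

    Each track is a sorted list of non-overlapping half-open (start, end)
    segments on one chromosome (an assumed toy representation).
    """
    i = j = total = 0
    while i < len(track_a) and j < len(track_b):
        a_start, a_end = track_a[i]
        b_start, b_end = track_b[j]
        total += max(0, min(a_end, b_end) - max(a_start, b_start))
        # Advance whichever segment ends first.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return total

def circular_shift(track, offset, genome_len):
    """Shift every segment by `offset`, wrapping around the chromosome end."""
    shifted = []
    for start, end in track:
        s = (start + offset) % genome_len
        e = s + (end - start)
        if e <= genome_len:
            shifted.append((s, e))
        else:
            # Segment crosses the end: split it into two pieces.
            shifted.append((s, genome_len))
            shifted.append((0, e - genome_len))
    return sorted(shifted)

def shift_test(track_a, track_b, genome_len, n_iter=1000, seed=0):
    """Monte Carlo p-value for 'overlap at least as large as observed'
    under random circular shifts of track_b (assumed null model)."""
    rng = random.Random(seed)
    observed = overlap_bp(track_a, track_b)
    hits = 0
    for _ in range(n_iter):
        offset = rng.randrange(1, genome_len)
        shifted = circular_shift(track_b, offset, genome_len)
        if overlap_bp(track_a, shifted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)

if __name__ == "__main__":
    a = [(100, 200), (500, 600)]
    b = [(150, 250), (550, 650)]
    print(shift_test(a, b, genome_len=1000))
```

A different null model (for example, one that preserves segment lengths but randomizes the gaps between segments) could give a different p-value for the same observed overlap, so the null model should be chosen to match the biological question.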


Subject(s)
Computational Biology/methods , Genome , Genomics/methods , Sequence Analysis/methods , Software , Base Pairing , Exons , Gene Expression , Histones/metabolism , Models, Biological , Nucleic Acid Denaturation , Polymorphism, Single Nucleotide
19.
Bioinformatics ; 21(23): 4272-9, 2005 Dec 01.
Article in English | MEDLINE | ID: mdl-16216830

ABSTRACT

MOTIVATION: Missing values are problematic for the analysis of microarray data. Imputation methods have been compared in terms of the similarity between imputed and true values in simulation experiments, but not in terms of their influence on the final analysis. The focus has been on values missing at random, although entries are also missing not at random. RESULTS: We investigate the influence of imputation on the detection of differentially expressed genes from cDNA microarray data. We apply ANOVA for microarrays and SAM and look at the differentially expressed genes that are lost because of imputation. We show that this new measure provides useful information that the traditional root mean squared error cannot capture. We also show that the type of missingness matters: imputing 5% missing not at random has the same effect as imputing 10-30% missing at random. We propose a new method for imputation (LinImp), fitting a simple linear model for each channel separately, and compare it with the widely used KNNimpute method. For 10% missing at random, KNNimpute leads to twice as many lost differentially expressed genes as LinImp. AVAILABILITY: The R package for LinImp is available at http://folk.uio.no/idasch/imp.
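
The two ways of judging an imputation method mentioned above (root mean squared error on the hidden values versus the differentially expressed genes that are lost) can be sketched as follows. This is a toy illustration only: the row-mean imputer stands in for a real method and is neither LinImp nor KNNimpute, and all function names (`mask_at_random`, `impute_row_mean`, `rmse`, `lost_genes`) are hypothetical.

```python
import math
import random

def mask_at_random(matrix, fraction, seed=0):
    """Hide a given fraction of cells completely at random (returned as None)."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = set(rng.sample(cells, int(round(fraction * len(cells)))))
    masked = [[None if (i, j) in hidden else v for j, v in enumerate(row)]
              for i, row in enumerate(matrix)]
    return masked, hidden

def impute_row_mean(masked):
    """Placeholder imputer: fill each missing cell with its gene (row) mean."""
    imputed = []
    for row in masked:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed

def rmse(truth, imputed, hidden):
    """Root mean squared error over the hidden cells only."""
    if not hidden:
        return 0.0
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))

def lost_genes(de_complete, de_imputed):
    """Genes called differentially expressed on the complete data
    but no longer called after imputation."""
    return set(de_complete) - set(de_imputed)

if __name__ == "__main__":
    truth = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0], [5.0, 1.0, 5.0, 1.0]]
    masked, hidden = mask_at_random(truth, fraction=0.25, seed=1)
    print(rmse(truth, impute_row_mean(masked), hidden))
    # The gene lists would come from a test such as SAM; here they are made up.
    print(lost_genes({"g1", "g2", "g3"}, {"g2", "g3", "g4"}))
```

In a real comparison the two gene lists would be produced by running the same differential expression test on the complete and on the imputed data; the sketch only shows how the two measures are computed once those inputs exist.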


Subject(s)
Computational Biology/methods , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Analysis of Variance , Cluster Analysis , DNA, Complementary/metabolism , Data Interpretation, Statistical , Gene Expression Profiling , Likelihood Functions , Linear Models , Mathematical Computing , Models, Genetic , Models, Statistical , Models, Theoretical , Multigene Family , Normal Distribution , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA , Software , Statistics as Topic
20.
Bioinformatics ; 21(6): 821-2, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15531610

ABSTRACT

SUMMARY: CGH-Explorer is a program for visualization and statistical analysis of microarray-based comparative genomic hybridization (array-CGH) data. The program has preprocessing facilities, tools for graphical exploration of individual arrays or groups of arrays, and tools for statistical identification of regions of amplification and deletion.


Subject(s)
DNA Mutational Analysis/methods , Gene Expression Profiling/methods , In Situ Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Computer Graphics , Gene Dosage , Genetic Variation/genetics