Results 1 - 20 of 29
1.
BMC Bioinformatics ; 22(1): 498, 2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34654363

ABSTRACT

BACKGROUND: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using L1-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are however issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. RESULTS: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. With our method, mRNA data are demonstrated to provide highly useful prior information for protein-protein interaction networks. CONCLUSIONS: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.


Subject(s)
Algorithms; Gene Regulatory Networks; Genomics; Normal Distribution; Protein Interaction Maps
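
For readers unfamiliar with the estimators compared in the abstract above, the graphical lasso is commonly written as a penalized log-likelihood problem. The notation below is an assumption of this sketch rather than something taken from the record: S is the sample covariance matrix, Θ the inverse covariance (precision) matrix, λ the penalty parameter, and P a hypothetical matrix of edge-specific penalty weights built from prior information. The specific form of the tailored graphical lasso is not given in the abstract and is not reproduced here.

```latex
% Unweighted graphical lasso: a single L1 penalty on the precision matrix
\hat{\Theta}_{\mathrm{glasso}}
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta)
            - \lambda \lVert \Theta \rVert_1 \right\}

% Weighted variant: the penalty is scaled entrywise by the weight matrix P
\hat{\Theta}_{\mathrm{weighted}}
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta)
            - \lambda \lVert P \circ \Theta \rVert_1 \right\}
```

Here ∘ denotes the entrywise product. Setting every entry of P to 1 recovers the unweighted problem, which illustrates how an unweighted estimator can be a special case of a weighted one.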
2.
Lifetime Data Anal ; 27(4): 710-736, 2021 10.
Article in English | MEDLINE | ID: mdl-34618267

ABSTRACT

Due to rapid developments in machine learning, and in particular neural networks, a number of new methods for time-to-event predictions have been developed in the last few years. As neural networks are parametric models, it is more straightforward to integrate parametric survival models in the neural network framework than the popular semi-parametric Cox model. In particular, discrete-time survival models, which are fully parametric, are interesting candidates to extend with neural networks. The likelihood for discrete-time survival data may be parameterized by the probability mass function (PMF) or by the discrete hazard rate, and both of these formulations have been used to develop neural network-based methods for time-to-event predictions. In this paper, we review and compare these approaches. More importantly, we show how the discrete-time methods may be adopted as approximations for continuous-time data. To this end, we introduce two discretization schemes, corresponding to equidistant times or equidistant marginal survival probabilities, and two ways of interpolating the discrete-time predictions, corresponding to piecewise constant density functions or piecewise constant hazard rates. Through simulations and study of real-world data, the methods based on the hazard rate parametrization are found to perform slightly better than the methods that use the PMF parametrization. Inspired by these investigations, we also propose a continuous-time method by assuming that the continuous-time hazard rate is piecewise constant. The method, named PC-Hazard, is found to be highly competitive with the aforementioned methods in addition to other methods for survival prediction found in the literature.


Subject(s)
Neural Networks, Computer; Humans; Proportional Hazards Models
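
The two discrete-time parametrizations mentioned in the abstract above, and one way of interpolating them to continuous time, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the time grid, and the example hazards are all assumptions made here. It relies only on the standard identities S(t_j) = ∏(1 − h_k) and f_j = h_j · S(t_{j−1}).

```python
import math


def survival_from_hazards(hazards):
    """Discrete survival: S(t_j) is the product of (1 - h_k) for k <= j."""
    surv, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        surv.append(s)
    return surv


def pmf_from_hazards(hazards):
    """Probability mass function: f_j = h_j * S(t_{j-1}), with S(t_0) = 1."""
    pmf, s_prev = [], 1.0
    for h in hazards:
        pmf.append(h * s_prev)
        s_prev *= 1.0 - h
    return pmf


def interpolate_const_hazard(hazards, grid, t):
    """Survival at continuous time t, assuming a constant hazard rate inside
    each grid interval. grid[0] is time zero and len(grid) == len(hazards) + 1.
    Each discrete hazard must be below 1. For t beyond the last grid point the
    survival at the last grid point is returned."""
    s = 1.0
    for j, h in enumerate(hazards):
        left, right = grid[j], grid[j + 1]
        if t >= right:
            # The whole interval has passed: apply the full discrete hazard.
            s *= 1.0 - h
        elif t > left:
            # Rate chosen so that survival over the full interval equals 1 - h.
            rate = -math.log(1.0 - h) / (right - left)
            s *= math.exp(-rate * (t - left))
            break
        else:
            break
    return s


hazards = [0.1, 0.2, 0.5]          # illustrative discrete hazards
grid = [0.0, 1.0, 2.0, 3.0]        # illustrative equidistant time grid
print(survival_from_hazards(hazards))
print(pmf_from_hazards(hazards))
print(interpolate_const_hazard(hazards, grid, 0.5))
```

The interpolated curve agrees with the discrete survival function at every grid point, so the choice of interpolation only affects predictions between grid points.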
3.
Appl Environ Microbiol ; 86(6)2020 03 02.
Article in English | MEDLINE | ID: mdl-31953333

ABSTRACT

The relative importance of host-specific selection or environmental factors in determining the composition of the intestinal microbiome in wild vertebrates remains poorly understood. Here, we used metagenomic shotgun sequencing of individual specimens to compare the levels of intra- and interspecific variation of intestinal microbiome communities in two ecotypes (NEAC and NCC) of Atlantic cod (Gadus morhua) that have distinct behavior and habitats and three Gadidae species that occupy a range of ecological niches. Interestingly, we found significantly diverged microbiomes between the two Atlantic cod ecotypes. Interspecific patterns of variation are more variable, with significantly diverged communities for most species' comparisons, apart from the comparison between coastal cod (NCC) and Norway pout (Trisopterus esmarkii), whose community compositions are not significantly diverged. The absence of consistent species-specific microbiomes suggests that external environmental factors, such as temperature, diet, or a combination thereof, comprise major drivers of the intestinal community composition of codfishes. IMPORTANCE: The composition of the intestinal microbial community associated with teleost fish is influenced by a diversity of factors, ranging from internal factors (such as host-specific selection) to external factors (such as niche occupation). These factors are often difficult to separate, as differences in niche occupation (e.g., diet, temperature, or salinity) may correlate with distinct evolutionary trajectories. Here, we investigate four gadoid species with contrasting levels of evolutionary separation and niche occupation. Using metagenomic shotgun sequencing, we observed distinct microbiomes between two Atlantic cod (Gadus morhua) ecotypes (NEAC and NCC) with distinct behavior and habitats. In contrast, interspecific patterns of variation were more variable. For instance, we did not observe interspecific differentiation between the microbiomes of coastal cod (NCC) and Norway pout (Trisopterus esmarkii), whose lineages underwent evolutionary separation over 20 million years ago. The observed pattern of microbiome variation in these gadoid species is therefore most parsimoniously explained by differences in niche occupation.


Subject(s)
Bacteria/genetics; Ecotype; Gadiformes/microbiology; Gastrointestinal Microbiome/genetics; Metagenome; Animals; Bacteria/isolation & purification; Female; Gadus morhua/microbiology; Male; Norway
4.
Environ Microbiol ; 21(7): 2576-2594, 2019 07.
Article in English | MEDLINE | ID: mdl-31091345

ABSTRACT

Atlantic cod (Gadus morhua) is an ecologically important species with a widespread distribution in the North Atlantic Ocean, yet little is known about the diversity of its intestinal microbiome in its natural habitat. No geographical differentiation in this microbiome was observed based on 16S rRNA amplicon analyses, yet such a finding may result from an inherent lack of power of this method to resolve fine-scaled biological complexity. Here, we use metagenomic shotgun sequencing to investigate the intestinal microbiome of 19 adult Atlantic cod individuals from two coastal populations in Norway, located 470 km apart. Resolving the species community to unprecedented resolution, we identify two abundant species, Photobacterium iliopiscarium and Photobacterium kishitanii, which comprise over 50% of the classified reads. Interestingly, the intestinal P. kishitanii strains have functionally intact lux genes, and the high abundance of this species suggests that fish intestines form an important part of its ecological niche. These observations support a hypothesis that bioluminescence plays an ecological role in the marine food web. Despite our improved taxonomical resolution, we identify no geographical differences in bacterial community structure, indicating that the intestinal microbiome of these coastal cod is colonized by a limited number of closely related bacterial species with a broad geographical distribution.


Subject(s)
Bacteria/isolation & purification; Gadus morhua/microbiology; Gastrointestinal Microbiome; Intestines/microbiology; Animals; Atlantic Ocean; Bacteria/classification; Bacteria/genetics; Metagenome; Norway; Photobacterium/genetics; RNA, Ribosomal, 16S/genetics
5.
Eur J Popul ; 35(1): 87-99, 2019 Feb 15.
Article in English | MEDLINE | ID: mdl-30976269

ABSTRACT

Life expectancies at birth are routinely computed from period life tables. When mortality is falling, such period life expectancies will typically underestimate real life expectancies, that is, life expectancies for birth cohorts. Hence, it becomes problematic to compare period life expectancies between countries when they have different historical mortality developments. For instance, life expectancies for countries in which the longevity improved early (like Norway and Sweden) are difficult to compare with those in countries where it improved later (like Italy and Japan). To get a fair comparison between the countries, one should consider cohort data. Since cohort life expectancies can only be computed for cohorts that were born more than a hundred years ago, in this paper we suggest that for younger cohorts one may consider the expected number of years lost up to a given age. Contrary to the results based on period data, our cohort results then indicate that Italian women may expect to lose more years than women in Norway and Sweden, while there are no indications that Japanese women will lose fewer years than women in Scandinavia. The large differences seen for period data may just be an artefact due to the distortion that period life tables imply in times of changing mortality.
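
The quantity proposed in the abstract above, the expected number of years lost up to a given age, equals that age minus the expected number of years lived before it. A minimal sketch of the calculation from one-year death probabilities follows. The function name, the input layout, and the assumption that deaths occur on average mid-year are choices made here for illustration, not details from the paper.

```python
def expected_years_lost(death_probs, upto_age):
    """Expected years lost before `upto_age` for a newborn.

    death_probs[x] is the probability of dying between exact ages x and x + 1,
    given survival to age x (taken from a cohort or period life table).
    """
    alive = 1.0          # probability of being alive at the start of the year
    years_lived = 0.0    # expected years lived before upto_age
    for x in range(upto_age):
        q = death_probs[x]
        # Survivors contribute a full year; those who die are assumed to
        # contribute half a year on average (an assumption of this sketch).
        years_lived += alive * (1.0 - q) + alive * q * 0.5
        alive *= 1.0 - q
    return upto_age - years_lived


# Illustrative numbers only: a constant 50% yearly death probability.
print(expected_years_lost([0.5, 0.5], 2))  # 0.875
```

Because the sum stops at a fixed age, the measure can be computed for cohorts that have not yet died out, which is the practical advantage the abstract points to.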

6.
Biometrics ; 71(3): 696-703, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25854648

ABSTRACT

Standard use of Cox regression requires collection of covariate information for all individuals in a cohort even when only a small fraction of them experiences the event of interest (fail). This may be very expensive for large cohorts. Further, in biomarker studies, it will imply a waste of valuable biological material that one may want to save for future studies. A nested case-control study offers a useful alternative. For this design, covariate information is only needed for the failing individuals (cases) and a sample of controls selected from the cases' at-risk sets. Methods based on martingale residuals are useful for checking the fit of Cox's regression model for cohort data, but similar methods have so far not been developed for nested case-control data. In this article, it is described how one may define martingale residuals for nested case-control data, and it is shown how plots and tests based on cumulative sums of martingale residuals may be used to check model fit. The plots and tests may be obtained using available software.


Subject(s)
Algorithms; Case-Control Studies; Data Interpretation, Statistical; Proportional Hazards Models; Regression Analysis; Computer Simulation; Humans; Models, Statistical; Reproducibility of Results; Sensitivity and Specificity
7.
Stat Med ; 34(29): 3866-87, 2015 Dec 20.
Article in English | MEDLINE | ID: mdl-26278111

ABSTRACT

When it comes to clinical survival trials, regulatory restrictions usually require the application of methods that solely utilize baseline covariates and the intention-to-treat principle. Thereby, much potentially useful information is lost, as collection of time-to-event data often goes hand in hand with collection of information on biomarkers and other internal time-dependent covariates. However, there are tools to incorporate information from repeated measurements in a useful manner that can help to shed more light on the underlying treatment mechanisms. We consider dynamic path analysis, a model for mediation analysis in the presence of a time-to-event outcome and time-dependent covariates to investigate direct and indirect effects in a study of different lipid-lowering treatments in patients with previous myocardial infarctions. Further, we address the question whether survival in itself may produce associations between the treatment and the mediator in dynamic path analysis and give an argument that because of linearity of the assumed additive hazard model, this is not the case. We further elaborate on our view that, when studying mediation, we are actually dealing with underlying processes rather than single variables measured only once during the study period. This becomes apparent in results from various models applied to the study of lipid-lowering treatments as well as our additionally conducted simulation study, where we clearly observe that discarding information on repeated measurements can lead to potentially erroneous conclusions.


Subject(s)
Clinical Trials as Topic/statistics & numerical data; Data Interpretation, Statistical; Research Design/statistics & numerical data; Survival Analysis; Clinical Trials as Topic/standards; Computer Simulation; Humans; Proportional Hazards Models; Research Design/standards; Time Factors; Treatment Outcome
8.
Nucleic Acids Res ; 41(10): 5164-74, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23571755

ABSTRACT

The study of chromatin 3D structure has recently gained much focus owing to novel techniques for detecting genome-wide chromatin contacts using next-generation sequencing. A deeper understanding of the architecture of the DNA inside the nucleus is crucial for gaining insight into fundamental processes such as transcriptional regulation, genome dynamics and genome stability. Chromatin conformation capture-based methods, such as Hi-C and ChIA-PET, are now paving the way for routine genome-wide studies of chromatin 3D structure in a range of organisms and tissues. However, appropriate methods for analyzing such data are lacking. Here, we propose a hypothesis test and an enrichment score of 3D co-localization of genomic elements that handles intra- or interchromosomal interactions, both separately and jointly, and that adjusts for biases caused by structural dependencies in the 3D data. We show that maintaining structural properties during resampling is essential to obtain valid estimation of P-values. We apply the method on chromatin states and a set of mutated regions in leukemia cells, and find significant co-localization of these elements, with varying enrichment scores, supporting the role of chromatin 3D structure in shaping the landscape of somatic mutations in cancer.


Subject(s)
Chromatin/chemistry; Cell Line, Tumor; Chromosomes, Human/chemistry; Data Interpretation, Statistical; Genome; Humans; Leukemia/genetics; Mutation; Nucleic Acid Conformation; Sequence Analysis, DNA
9.
BMC Public Health ; 15: 1082, 2015 Oct 23.
Article in English | MEDLINE | ID: mdl-26498223

ABSTRACT

BACKGROUND: Multi-state models, as an extension of traditional models in survival analysis, have proved to be a flexible framework for analysing the transitions between various states of sickness absence and work over time. In this paper we study a cohort of work rehabilitation participants and analyse their subsequent sickness absence using Norwegian registry data on sickness benefits. Our aim is to study how detailed individual covariate information from questionnaires explains differences in sickness absence and work, and to use methods from causal inference to assess the effect of interventions to reduce sickness absence. Examples of the latter are to evaluate the use of partial versus full time sick leave and to estimate the effect of a cooperation agreement on a more inclusive working life. METHODS: Covariate-adjusted transition intensities are estimated using Cox proportional hazards and Aalen additive hazards models, while the effects of interventions are assessed using methods of inverse probability weighting and G-computation. RESULTS: Results from covariate-adjusted analyses show great differences in sickness absence and work for patients with assumed high risk and low risk covariate characteristics, for example based on age, type of work, income, health score and type of diagnosis. Causal analyses show small effects of partial versus full time sick leave and a positive effect of having a cooperation agreement, with about 5 percentage points higher probability of returning to work. CONCLUSIONS: Detailed covariate information is important for explaining transitions between different states of sickness absence and work, also for patient specific cohorts. Methods for causal inference can provide the needed tools for going from covariate specific estimates to population average effects in multi-state models, and identify causal parameters with a straightforward interpretation based on interventions.


Subject(s)
Absenteeism; Models, Biological; Return to Work; Sick Leave; Adult; Employment/statistics & numerical data; Female; Humans; Male; Middle Aged; Occupational Medicine; Registries; Rehabilitation; Return to Work/statistics & numerical data; Risk Factors; Sick Leave/statistics & numerical data; Survival Analysis; Work
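
Inverse probability weighting, one of the causal methods named in the abstract above, can be illustrated in a much simpler setting than the paper's multi-state models. The sketch below assumes a single binary intervention, a binary outcome, and a known probability of receiving the intervention given covariates. The function name, the record layout, and the use of a normalized (Hajek-type) estimator are all assumptions of this example.

```python
def ipw_risk_difference(records):
    """Weighted difference in outcome probability, intervention vs. none.

    records: iterable of (treated, outcome, propensity) where `treated` and
    `outcome` are 0/1 and `propensity` is P(treated = 1 | covariates),
    strictly between 0 and 1.
    """
    num1 = den1 = num0 = den0 = 0.0
    for treated, outcome, propensity in records:
        if treated:
            w = 1.0 / propensity            # weight for those who received it
            num1 += w * outcome
            den1 += w
        else:
            w = 1.0 / (1.0 - propensity)    # weight for those who did not
            num0 += w * outcome
            den0 += w
    # Normalizing by the sum of weights keeps each estimate between 0 and 1.
    return num1 / den1 - num0 / den0


# Illustrative data: everyone has the same propensity, so the weights cancel
# and the result equals the plain difference in proportions.
data = [(1, 1, 0.5), (1, 1, 0.5), (1, 0, 0.5),
        (0, 1, 0.5), (0, 0, 0.5), (0, 0, 0.5)]
print(ipw_risk_difference(data))
```

When propensities differ between individuals, the weights rebalance the two groups so that the comparison refers to the whole population rather than to covariate-specific strata.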
10.
Lifetime Data Anal ; 21(4): 517-41, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25608704

ABSTRACT

In a nested case-control study, controls are selected for each case from the individuals who are at risk at the time at which the case occurs. We say that the controls are matched on study time. To adjust for possible confounding, it is common to match on other variables as well. The standard analysis of nested case-control data is based on a partial likelihood which compares the covariates of each case to those of its matched controls. It has been suggested that one may break the matching of nested case-control data and analyse them as case-cohort data using an inverse probability weighted (IPW) pseudo likelihood. Further, when some covariates are available for all individuals in the cohort, multiple imputation (MI) makes it possible to use all available data in the cohort. In the paper we review the standard method and the IPW and MI approaches, and compare their performance using simulations that cover a range of scenarios, including one and two endpoints.


Subject(s)
Case-Control Studies; Biostatistics; Cohort Studies; Computer Simulation; Humans; Likelihood Functions; Probability; Proportional Hazards Models; Survival Analysis
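
The sampling step described at the start of the abstract above, selecting controls for each case from the individuals at risk when the case occurs, can be sketched as follows. This is a simplified illustration under stated assumptions: everyone enters the study at time zero, there is no additional matching, and the function name and data layout are invented for the example.

```python
import random


def sample_nested_case_control(subjects, m, seed=0):
    """Draw up to m controls per case from that case's risk set.

    subjects: list of (subject_id, time, is_case), where `time` is the event
    time for cases and the censoring time for everyone else.
    Returns a list of (case_id, [control_ids]).
    """
    rng = random.Random(seed)
    sampled_sets = []
    for case_id, case_time, is_case in subjects:
        if not is_case:
            continue
        # At risk: still under observation at the case's event time.
        # Individuals who become cases later are eligible as controls.
        at_risk = [sid for sid, t, _ in subjects
                   if t >= case_time and sid != case_id]
        controls = rng.sample(at_risk, min(m, len(at_risk)))
        sampled_sets.append((case_id, controls))
    return sampled_sets


cohort = [("a", 1.0, True), ("b", 2.0, False),
          ("c", 3.0, True), ("d", 4.0, False)]
print(sample_nested_case_control(cohort, m=1))
```

Each sampled set is matched on study time by construction, which is what the partial likelihood in the standard analysis exploits.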
11.
Stat Appl Genet Mol Biol ; 12(5): 637-52, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-23942354

ABSTRACT

Genomics studies frequently involve clustering of molecular data to identify groups, but common clustering methods such as K-means clustering and hierarchical clustering do not determine the number of clusters. Methods for estimating the number of clusters typically focus on identifying the global structure in the data; however, the discovery of substructures within clusters may also be of great biological interest. We propose a novel method, Partitioning Algorithm based on Recursive Thresholding (PART), that recursively uncovers distinct subgroups in the groups already identified. Outliers are common in high-dimensional genomics data and may mask the presence of substructure within a cluster. A crucial feature of the algorithm is the introduction of tentative splits of clusters to isolate outliers that might otherwise halt the recursion prematurely. The method is demonstrated on simulated data as well as on a wide range of real data sets from gene expression microarrays, where the correct clusters were known in advance. When subclusters are present and the variance is large or varies between the clusters, the proposed method performs better than two established global methods on simulated data. On the real data sets the overall performance of PART is superior to the global methods when used in combination with hierarchical clustering. The method is implemented in the R package clusterGenomics and is freely available from CRAN (The Comprehensive R Archive Network).


Subject(s)
Gene Expression Profiling; Neoplasms/genetics; Software; Algorithms; Cluster Analysis; Computer Simulation; Data Interpretation, Statistical; Genomics; Humans; Models, Biological; Models, Statistical; Neoplasms/metabolism; Transcriptome
13.
Biom J ; 53(2): 202-16, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21308723

ABSTRACT

Survival prediction from high-dimensional genomic data is dependent on a proper regularization method. With an increasing number of such methods proposed in the literature, comparative studies are called for and some have been performed. However, there is currently no consensus on which prediction assessment criterion should be used for time-to-event data. Without a firm knowledge about whether the choice of evaluation criterion may affect the conclusions made as to which regularization method performs best, these comparative studies may be of limited value. In this paper, four evaluation criteria are investigated: the log-rank test for two groups, the area under the time-dependent ROC curve (AUC), an R²-measure based on the Cox partial likelihood, and an R²-measure based on the Brier score. The criteria are compared according to how they rank six widely used regularization methods that are based on the Cox regression model, namely univariate selection, principal components regression (PCR), supervised PCR, partial least squares regression, ridge regression, and the lasso. Based on our application to three microarray gene expression data sets, we find that the results obtained from the widely used log-rank test deviate from the other three criteria studied. For future studies, where one also might want to include non-likelihood or non-model-based regularization methods, we argue in favor of AUC and the R²-measure based on the Brier score, as these do not suffer from the arbitrary splitting into two groups nor depend on the Cox partial likelihood.


Subject(s)
Gene Expression Regulation; Oligonucleotide Array Sequence Analysis; Algorithms; Area Under Curve; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Humans; Lymphoma, Large B-Cell, Diffuse/genetics; Models, Statistical; Neuroblastoma/genetics; Polymerase Chain Reaction; Prognosis; Proportional Hazards Models; ROC Curve; Regression Analysis; Survival
14.
Lifetime Data Anal ; 17(3): 445-60, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21046240

ABSTRACT

We present a hierarchical frailty model based on distributions derived from non-negative Lévy processes. The model may be applied to data with several levels of dependence, such as family data or other general clusters, and is an alternative to additive frailty models. We present several parametric examples of the model, and properties such as expected values, variance and covariance. The model is applied to a case-cohort sample of age at onset for melanoma from the Swedish Multi-Generation Register, organized in nuclear families of parents and one or two children. We compare the genetic component of the total frailty variance to the common environmental term, and estimate the effect of birth cohort and gender.


Subject(s)
Melanoma/genetics; Models, Genetic; Models, Statistical; Age of Onset; Case-Control Studies; Cohort Studies; Family; Female; Humans; Male
15.
Waste Manag ; 126: 623-631, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33866138

ABSTRACT

Our society generates extensive amounts of municipal solid waste (MSW), which are mainly incinerated for volume reduction and energy recovery. However, MSW incineration generates hazardous air pollution control (APC) residues that must be treated and deposited in appropriate landfills. An alternative to landfilling is material recovery, leading to regeneration of valuable products and reducing hazardous waste amounts. The chemical composition of APC residues, stemming from MSW, makes the waste attractive for metal and salt recovery, but its variation makes the development of material recovery processes challenging. This study investigates results from 895 X-ray fluorescence analyses of fly ash and dry scrubber residue samples originating from Norway and Sweden between 2006 and 2020 to explore variation in chemical composition within and between different incineration plants. The average relative standard deviation of elemental concentration in APC residue was estimated at 30% within plants. The variation in elemental concentration between grate fired incineration plants is about half of the average variation within the plants. The study also clarifies compositional differences between APC residues originating from fluidized bed incinerators and grate incinerators. Also, reported concentrations of APC residues from countries other than Sweden and Norway showed significant differences in chemical composition. The presented variations clarify the importance of holistic approaches for waste valorization processes which can substitute stabilization processes for landfilling.


Subject(s)
Metals, Heavy; Refuse Disposal; Coal Ash/analysis; Incineration; Metals, Heavy/analysis; Norway; Solid Waste/analysis; Sweden
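
The within-plant variation reported in the abstract above is expressed as a relative standard deviation (RSD), that is, the standard deviation divided by the mean, in percent. A minimal sketch of that calculation, averaged over elements for one plant, is given below. The element names and concentrations are made-up illustrative values, and the function names are assumptions of this example rather than anything from the study.

```python
import statistics


def relative_std_percent(values):
    """Relative standard deviation: 100 * sample standard deviation / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def average_rsd(concentrations_by_element):
    """Average the per-element RSD over all elements measured at one plant."""
    rsds = [relative_std_percent(v) for v in concentrations_by_element.values()]
    return statistics.mean(rsds)


# Hypothetical repeated measurements (arbitrary units) at a single plant.
plant = {
    "Zn": [8.0, 10.0, 12.0],
    "Pb": [9.0, 10.0, 11.0],
}
print(relative_std_percent(plant["Zn"]))
print(average_rsd(plant))
```

Because the RSD is dimensionless, it allows elements present at very different concentration levels to be averaged on a common scale.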
16.
Commun Biol ; 3(1): 153, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32242091

ABSTRACT

Somatic copy number alterations are a frequent sign of genome instability in cancer. A precise characterization of the genome architecture would reveal underlying instability mechanisms and provide an instrument for outcome prediction and treatment guidance. Here we show that the local spatial behavior of copy number profiles conveys important information about this architecture. Six filters were defined to characterize regional traits in copy number profiles, and the resulting Copy Aberration Regional Mapping Analysis (CARMA) algorithm was applied to tumors in four breast cancer cohorts (n = 2919). The derived motifs represent a layer of information that complements established molecular classifications of breast cancer. A score reflecting presence or absence of motifs provided a highly significant independent prognostic predictor. Results were consistent between cohorts. The nonsite-specific occurrence of the detected patterns suggests that CARMA captures underlying replication and repair defects and could have a future potential in treatment stratification.


Subject(s)
Biomarkers, Tumor/genetics; Breast Neoplasms/genetics; DNA Copy Number Variations; Gene Dosage; Genomic Instability; Algorithms; Breast Neoplasms/mortality; Breast Neoplasms/therapy; Clinical Decision-Making; Databases, Genetic; Female; Gene Expression Profiling; Humans; Middle Aged; Predictive Value of Tests; Prognosis; Risk Assessment; Risk Factors; Transcriptome
17.
BMC Bioinformatics ; 10: 413, 2009 Dec 13.
Article in English | MEDLINE | ID: mdl-20003386

ABSTRACT

BACKGROUND: Survival prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods make use of genomic data alone without considering established clinical covariates that often are available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions, but there is a lack of systematic studies on the topic. Also, for the widely used Cox regression model, it is not obvious how to handle such combined models. RESULTS: We propose a way to combine classical clinical covariates with genomic data in a clinico-genomic prediction model based on the Cox regression model. The prediction model is obtained by a simultaneous use of both types of covariates, but applying dimension reduction only to the high-dimensional genomic variables. We describe how this can be done for seven well-known prediction methods: variable selection, unsupervised and supervised principal components regression and partial least squares regression, ridge regression, and the lasso. We further perform a systematic comparison of the performance of prediction models using clinical covariates only, genomic data only, or a combination of the two. The comparison is done using three survival data sets containing both clinical information and microarray gene expression data. Matlab code for the clinico-genomic prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/clinico-genomic/. CONCLUSIONS: Based on our three data sets, the comparison shows that established clinical covariates will often lead to better predictions than what can be obtained from genomic data alone. In the cases where the genomic models are better than the clinical, ridge regression is used for dimension reduction. We also find that the clinico-genomic models tend to outperform the models based on only genomic data. Further, for all three data sets, clinico-genomic models combined with ridge regression give better predictions than models based on the clinical covariates alone.


Subject(s)
Computational Biology/methods; Genomics/methods; Gene Expression Profiling; Survival Analysis
19.
BMC Med Genomics ; 11(1): 24, 2018 03 07.
Article in English | MEDLINE | ID: mdl-29514638

ABSTRACT

BACKGROUND: Using high-dimensional penalized regression we studied genome-wide DNA-methylation in bone biopsies of 80 postmenopausal women in relation to their bone mineral density (BMD). The women showed BMD varying from severely osteoporotic to normal. Global gene expression data from the same individuals were available, and since DNA-methylation often affects gene expression, the overall aim of this paper was to include both of these omics data sets into an integrated analysis. METHODS: The classical penalized regression uses one penalty, but we incorporated individual penalties for each of the DNA-methylation sites. These individual penalties were guided by the strength of association between DNA-methylations and gene transcript levels. DNA-methylations that were highly associated with one or more transcripts got lower penalties and were therefore favored compared to DNA-methylations showing less association with expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylations and their corresponding cis genes, as well as the association between DNA-methylations and trans-located genes. Two integrating penalized methods were used: first, an adaptive group-regularized ridge regression, and secondly, variable selection was performed through a modified version of the weighted lasso. RESULTS: When information from gene expressions was integrated, predictive performance was considerably improved, in terms of predictive mean square error, compared to classical penalized regression without data integration. We found a 14.7% improvement in the ridge regression case and a 17% improvement for the lasso case. Our version of the weighted lasso with data integration found a list of 22 interesting methylation sites. Several corresponded to genes that are known to be important in bone formation. Using BMD as response and these 22 methylation sites as covariates, least square regression analyses resulted in R²=0.726, compared with an average R²=0.438 for 10000 randomly selected groups of DNA-methylations with group size 22. CONCLUSIONS: Two recent types of penalized regression methods were adapted to integrate DNA-methylation and their association to gene expression in the analysis of bone mineral density. In both cases predictions clearly benefit from including the additional information on gene expressions.


Subject(s)
Bone Density/genetics; DNA Methylation; Data Analysis; Gene Expression Profiling; Postmenopause/genetics; Postmenopause/physiology; Cohort Studies; Female; Genomics; Humans; Multivariate Analysis; Regression Analysis
20.
Front Microbiol ; 9: 1561, 2018.
Article in English | MEDLINE | ID: mdl-30057577

ABSTRACT

Atlantic cod (Gadus morhua) provides an interesting species for the study of host-microbe interactions because it lacks the MHC II complex that is involved in the presentation of extracellular pathogens. Nonetheless, little is known about the diversity of its microbiome in natural populations. Here, we use high-throughput sequencing of the 16S rRNA V4 region, amplified with the primer design of the Earth Microbiome Project (EMP), to investigate the microbial composition in gut content and mucosa of 22 adult individuals from two coastal populations in Norway, located 470 km apart. We identify a core microbiome of 23 OTUs (97% sequence similarity) in all individuals that comprises 93% of the total number of reads. The most abundant orders are classified as Vibrionales, Fusobacteriales, Clostridiales, and Bacteroidales. While mucosal samples show significantly lower diversity than gut content samples, no differences in OTU community composition are observed between the two geographically separated populations. All specimens share a limited number of abundant OTUs. Moreover, the most abundant OTU consists of a single oligotype (order Vibrionales, genus Photobacterium) that represents nearly 50% of the reads in both locations. Our results suggest that these microbiomes comprise a limited number of species or that the EMP V4 primers do not yield sufficient resolution to confidently separate these communities. Our study contributes to a growing body of literature that shows limited spatial differentiation of the intestinal microbiomes in marine fish based on 16S rRNA sequencing, highlighting the need for multi-gene approaches to provide more insight into the diversity of these communities.
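
The notion of a core microbiome used in the abstract above, OTUs found in all individuals together with the share of reads they account for, can be sketched in a few lines. The data layout, the function name, and the read counts are illustrative assumptions. The study's actual pipeline (OTU clustering at 97% similarity, oligotyping) is not reproduced here.

```python
def core_otus(counts):
    """Return (core OTU set, fraction of all reads assigned to core OTUs).

    counts: dict mapping sample name -> dict mapping OTU name -> read count.
    An OTU is 'core' if it has a positive count in every sample.
    """
    samples = list(counts.values())
    core = set(samples[0])
    for sample in samples:
        core &= {otu for otu, n in sample.items() if n > 0}
    total_reads = sum(sum(sample.values()) for sample in samples)
    core_reads = sum(n for sample in samples
                     for otu, n in sample.items() if otu in core)
    return core, core_reads / total_reads


# Hypothetical read counts for two individuals.
table = {
    "fish_1": {"OTU_A": 90, "OTU_B": 10},
    "fish_2": {"OTU_A": 80, "OTU_C": 20},
}
print(core_otus(table))  # ({'OTU_A'}, 0.85)
```

A small core set that captures most reads, as in this toy table, is the pattern the abstract reports for both sampling locations.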
