1.
BMC Bioinformatics ; 22(1): 498, 2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34654363

ABSTRACT

BACKGROUND: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using $L_1$-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are however issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. RESULTS: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. With our method, mRNA data are demonstrated to provide highly useful prior information for protein-protein interaction networks. CONCLUSIONS: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.
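
The abstract above notes that, under a Gaussian graphical model, the network is read off the non-zero entries of the inverse covariance (precision) matrix. The following is a minimal Python sketch of that read-out step only. It is not the tailoredGlasso R package or its estimation procedure; the function name `edges_from_precision`, the tolerance `tol`, and the toy matrix values are assumptions made for illustration.

```python
import numpy as np

def edges_from_precision(theta, names, tol=1e-8):
    """List (node_i, node_j, partial_correlation) for every non-zero
    off-diagonal entry of a precision matrix theta.

    Hypothetical helper for illustration; tol decides what counts as zero.
    """
    theta = np.asarray(theta, dtype=float)
    d = np.sqrt(np.diag(theta))
    # Partial correlation of i and j given all other variables:
    # -theta_ij / sqrt(theta_ii * theta_jj)
    partial = -theta / np.outer(d, d)
    edges = []
    p = theta.shape[0]
    for i in range(p):
        for j in range(i + 1, p):
            if abs(theta[i, j]) > tol:
                edges.append((names[i], names[j], float(partial[i, j])))
    return edges

# Toy precision matrix with made-up values: A-B and B-C are linked, A-C is not.
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])
edges = edges_from_precision(theta, ["A", "B", "C"])
```

In practice the precision matrix would come from a sparse estimator such as the graphical lasso, so that exact zeros (and hence missing edges) arise from the penalization rather than from a hand-written matrix.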


Subjects
Algorithms, Gene Regulatory Networks, Genomics, Normal Distribution, Protein Interaction Maps
2.
Commun Biol ; 3(1): 153, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32242091

ABSTRACT

Somatic copy number alterations are a frequent sign of genome instability in cancer. A precise characterization of the genome architecture would reveal underlying instability mechanisms and provide an instrument for outcome prediction and treatment guidance. Here we show that the local spatial behavior of copy number profiles conveys important information about this architecture. Six filters were defined to characterize regional traits in copy number profiles, and the resulting Copy Aberration Regional Mapping Analysis (CARMA) algorithm was applied to tumors in four breast cancer cohorts (n = 2919). The derived motifs represent a layer of information that complements established molecular classifications of breast cancer. A score reflecting presence or absence of motifs provided a highly significant independent prognostic predictor. Results were consistent between cohorts. The nonsite-specific occurrence of the detected patterns suggests that CARMA captures underlying replication and repair defects and could have a future potential in treatment stratification.


Subjects
Tumor Biomarkers/genetics, Breast Neoplasms/genetics, DNA Copy Number Variations, Gene Dosage, Genomic Instability, Algorithms, Breast Neoplasms/mortality, Breast Neoplasms/therapy, Clinical Decision-Making, Genetic Databases, Female, Gene Expression Profiling, Humans, Middle Aged, Predictive Value of Tests, Prognosis, Risk Assessment, Risk Factors, Transcriptome
3.
Stat Appl Genet Mol Biol ; 12(5): 637-52, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-23942354

ABSTRACT

Genomics studies frequently involve clustering of molecular data to identify groups, but common clustering methods such as K-means clustering and hierarchical clustering do not determine the number of clusters. Methods for estimating the number of clusters typically focus on identifying the global structure in the data; however, the discovery of substructures within clusters may also be of great biological interest. We propose a novel method, Partitioning Algorithm based on Recursive Thresholding (PART), that recursively uncovers distinct subgroups in the groups already identified. Outliers are common in high-dimensional genomics data and may mask the presence of substructure within a cluster. A crucial feature of the algorithm is the introduction of tentative splits of clusters to isolate outliers that might otherwise halt the recursion prematurely. The method is demonstrated on simulated data as well as a wide range of real data sets from gene expression microarrays, where the correct clusters were known in advance. When subclusters are present and the variance is large or varies between the clusters, the proposed method performs better than two established global methods on simulated data. On the real data sets the overall performance of PART is superior to the global methods when used in combination with hierarchical clustering. The method is implemented in the R package clusterGenomics and is freely available from CRAN (The Comprehensive R Archive Network).


Subjects
Gene Expression Profiling, Neoplasms/genetics, Software, Algorithms, Cluster Analysis, Computer Simulation, Statistical Data Interpretation, Genomics, Humans, Biological Models, Statistical Models, Neoplasms/metabolism, Transcriptome
4.
Nucleic Acids Res ; 41(10): 5164-74, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23571755

ABSTRACT

The study of chromatin 3D structure has recently gained much attention owing to novel techniques for detecting genome-wide chromatin contacts using next-generation sequencing. A deeper understanding of the architecture of the DNA inside the nucleus is crucial for gaining insight into fundamental processes such as transcriptional regulation, genome dynamics and genome stability. Chromatin conformation capture-based methods, such as Hi-C and ChIA-PET, are now paving the way for routine genome-wide studies of chromatin 3D structure in a range of organisms and tissues. However, appropriate methods for analyzing such data are lacking. Here, we propose a hypothesis test and an enrichment score of 3D co-localization of genomic elements that handles intra- or interchromosomal interactions, both separately and jointly, and that adjusts for biases caused by structural dependencies in the 3D data. We show that maintaining structural properties during resampling is essential to obtain valid estimates of P-values. We apply the method to chromatin states and a set of mutated regions in leukemia cells, and find significant co-localization of these elements, with varying enrichment scores, supporting the role of chromatin 3D structure in shaping the landscape of somatic mutations in cancer.


Subjects
Chromatin/chemistry, Tumor Cell Line, Human Chromosomes/chemistry, Statistical Data Interpretation, Genome, Humans, Leukemia/genetics, Mutation, Nucleic Acid Conformation, DNA Sequence Analysis
5.
Biom J ; 53(2): 202-16, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21308723

ABSTRACT

Survival prediction from high-dimensional genomic data is dependent on a proper regularization method. With an increasing number of such methods proposed in the literature, comparative studies are called for and some have been performed. However, there is currently no consensus on which prediction assessment criterion should be used for time-to-event data. Without a firm knowledge about whether the choice of evaluation criterion may affect the conclusions made as to which regularization method performs best, these comparative studies may be of limited value. In this paper, four evaluation criteria are investigated: the log-rank test for two groups, the area under the time-dependent ROC curve (AUC), an R²-measure based on the Cox partial likelihood, and an R²-measure based on the Brier score. The criteria are compared according to how they rank six widely used regularization methods that are based on the Cox regression model, namely univariate selection, principal components regression (PCR), supervised PCR, partial least squares regression, ridge regression, and the lasso. Based on our application to three microarray gene expression data sets, we find that the results obtained from the widely used log-rank test deviate from the other three criteria studied. For future studies, where one also might want to include non-likelihood or non-model-based regularization methods, we argue in favor of AUC and the R²-measure based on the Brier score, as these do not suffer from the arbitrary splitting into two groups nor depend on the Cox partial likelihood.
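
One of the criteria named above, the R²-measure based on the Brier score, can be illustrated with a small Python sketch. This is a simplified version under a loud assumption: there is no censoring, whereas real time-to-event data would need censoring handled (for example by inverse-probability-of-censoring weighting). The function names, the single evaluation time `t`, and the toy predictions are hypothetical, and the form R² = 1 - BS(model) / BS(null) is one common definition rather than necessarily the exact one used in the paper.

```python
import numpy as np

def brier_score(event_times, surv_prob, t):
    """Mean squared gap between the observed status at time t
    (1 if still event-free, 0 otherwise) and the predicted S(t).
    Assumes every event time is observed (no censoring)."""
    status = (np.asarray(event_times, dtype=float) > t).astype(float)
    return float(np.mean((status - np.asarray(surv_prob, dtype=float)) ** 2))

def brier_r2(event_times, surv_prob, t):
    """R^2-type measure: 1 - BS(model) / BS(null), where the null model
    predicts the same covariate-free survival fraction for everyone.
    Undefined if nobody or everybody has had the event by time t."""
    times = np.asarray(event_times, dtype=float)
    null_prob = np.full(times.shape, np.mean(times > t))
    return 1.0 - brier_score(times, surv_prob, t) / brier_score(times, null_prob, t)

# Toy data: four subjects, hypothetical model predictions of S(4).
times = [1.0, 2.0, 5.0, 8.0]
pred = [0.2, 0.3, 0.7, 0.9]
bs = brier_score(times, pred, t=4.0)
r2 = brier_r2(times, pred, t=4.0)
```

Because the score only needs a predicted survival probability per subject, it can be computed for any prediction method, which matches the abstract's point that it does not depend on the Cox partial likelihood or on splitting subjects into two groups.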


Subjects
Gene Expression Regulation, Oligonucleotide Array Sequence Analysis, Algorithms, Area Under Curve, Breast Neoplasms/genetics, Breast Neoplasms/pathology, Humans, Diffuse Large B-Cell Lymphoma/genetics, Statistical Models, Neuroblastoma/genetics, Polymerase Chain Reaction, Prognosis, Proportional Hazards Models, ROC Curve, Regression Analysis, Survival
6.
Lifetime Data Anal ; 17(3): 445-60, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21046240

ABSTRACT

We present a hierarchical frailty model based on distributions derived from non-negative Lévy processes. The model may be applied to data with several levels of dependence, such as family data or other general clusters, and is an alternative to additive frailty models. We present several parametric examples of the model, and properties such as expected values, variance and covariance. The model is applied to a case-cohort sample of age at onset for melanoma from the Swedish Multi-Generation Register, organized in nuclear families of parents and one or two children. We compare the genetic component of the total frailty variance to the common environmental term, and estimate the effect of birth cohort and gender.


Subjects
Melanoma/genetics, Genetic Models, Statistical Models, Age of Onset, Case-Control Studies, Cohort Studies, Family, Female, Humans, Male
7.
Lifetime Data Anal ; 14(2): 179-95, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18188699

ABSTRACT

Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120-S127, 2002) uses a reformulation of the Cox likelihood to a Poisson type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches, as it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables like disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, obtaining a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised version outperform other Cox PLS methods.


Subjects
Genomics/methods, Least-Squares Analysis, Proportional Hazards Models, Breast Neoplasms/genetics, Computer Simulation, Female, Gene Expression Profiling, Humans, Oligonucleotide Array Sequence Analysis/methods, Survival Analysis
8.
Breast Cancer Res ; 9(3): R30, 2007.
Article in English | MEDLINE | ID: mdl-17504517

ABSTRACT

INTRODUCTION: Gene expression profiling of breast carcinomas has increased our understanding of the heterogeneous biology of this disease and promises to impact clinical care. The aim of this study was to evaluate the prognostic value of gene expression-based classification along with established prognostic markers and mutation status of the TP53 gene (tumor protein p53) in a group of breast cancer patients with long-term (12 to 16 years) follow-up. METHODS: The clinical and histopathological parameters of 200 breast cancer patients were studied for their effects on clinical outcome using univariate/multivariate Cox regression. The prognostic impact of mutations in the TP53 gene, identified using temporal temperature gradient gel electrophoresis and sequencing, was also evaluated. Eighty of the samples were analyzed for gene expression using 42K cDNA microarrays and the patients were assigned to five previously defined molecular expression groups. The strength of the gene expression-based classification versus standard markers was evaluated by adding this variable to the Cox regression model used to analyze all samples. RESULTS: Both univariate and multivariate analysis showed that TP53 mutation status, tumor size and lymph node status were the strongest predictors of breast cancer survival for the whole group of patients. Analyses of the patients with gene expression data showed that TP53 mutation status, gene expression-based classification, tumor size and lymph node status were significant predictors of survival. Breast cancer cases in the 'basal-like' and 'ERBB2+' gene expression subgroups had a very high mortality in the first two years, while the 'highly proliferating luminal' cases developed the disease more slowly, showing highest mortality after 5 to 8 years. The TP53 mutation status showed strong association with the 'basal-like' and 'ERBB2+' subgroups, and tumors with mutation had a characteristic gene expression pattern. CONCLUSION: TP53 mutation status and gene expression-based groups are important survival markers of breast cancer, and these molecular markers may provide prognostic information that complements clinical variables. The study adds experience and knowledge to an ongoing characterization and classification of the disease.


Subjects
Breast Neoplasms/epidemiology, Breast Neoplasms/genetics, Gene Expression Profiling, Tumor Suppressor Protein p53/genetics, Adult, Aged, Analysis of Variance, Breast Neoplasms/mortality, Male Breast Neoplasms/genetics, Male Breast Neoplasms/mortality, Female, Follow-Up Studies, erbB-2 Genes, p53 Genes, Genetic Markers, Humans, Male, Middle Aged, Multivariate Analysis, Oligonucleotide Array Sequence Analysis, Prognosis, Regression Analysis, Survival Analysis, Time Factors
9.
Lifetime Data Anal ; 13(2): 211-40, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17334924

ABSTRACT

We study non-Markov multistage models under dependent censoring regarding estimation of stage occupation probabilities. The individual transition and censoring mechanisms are linked together through covariate processes that affect both the transition intensities and the censoring hazard for the corresponding subjects. In order to adjust for the dependent censoring, an additive hazard regression model is applied to the censoring times, and all observed counting and "at risk" processes are subsequently given an inverse probability of censoring weighted form. We examine the bias of the Datta-Satten and Aalen-Johansen estimators of stage occupation probability, and also consider the variability of these estimators by studying their estimated standard errors and mean squared errors. Results from different simulation studies of frailty models indicate that the Datta-Satten estimator is approximately unbiased, whereas the Aalen-Johansen estimator either under- or overestimates the stage occupation probability due to the dependent nature of the censoring process. However, in our simulations, the mean squared error of the latter estimator tends to be slightly smaller than that of the former estimator. Studies on development of nephropathy among diabetics and on blood platelet recovery among bone marrow transplant patients are used as demonstrations on how the two estimation methods work in practice. Our analyses show that the Datta-Satten estimator performs well in estimating stage occupation probability, but that the censoring mechanism has to be quite selective before a deviation from the Aalen-Johansen estimator is of practical importance.


Subjects
Blood Platelets/metabolism, Bone Marrow Transplantation/physiology, Diabetic Nephropathies/epidemiology, Chronic Kidney Failure/epidemiology, Markov Chains, Bone Marrow Transplantation/statistics & numerical data, Denmark/epidemiology, Female, Humans, Male, New South Wales/epidemiology, United States/epidemiology
10.
Lifetime Data Anal ; 12(2): 143-67, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16817006

ABSTRACT

In this article we introduce a general approach to dynamic path analysis. This is an extension of classical path analysis to the situation where variables may be time-dependent and where the outcome of main interest is a stochastic process. In particular we will focus on the survival and event history analysis setting where the main outcome is a counting process. Our approach will be especially fruitful for analyzing event history data with internal time-dependent covariates, where an ordinary regression analysis may fail. The approach enables us to describe how the effect of a fixed covariate works partly directly and partly indirectly through internal time-dependent covariates. For the sequence of event times, we define a sequence of path analysis models. At each event time, ordinary linear regression is used to estimate the relation between the covariates, while the additive hazard model is used for the regression of the counting process on the covariates. The methodology is illustrated using data from a randomized trial on survival for patients with liver cirrhosis.


Subjects
Statistical Models, Survival Analysis, Aged, Hormonal Antineoplastic Agents/therapeutic use, Denmark, Female, Humans, Liver Cirrhosis/drug therapy, Male, Middle Aged, Prednisone/therapeutic use, Stochastic Processes, Time Factors
11.
Biom J ; 48(3): 381-98, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16845903

ABSTRACT

We propose a method for analysis of recurrent event data using information on previous occurrences of the event as a time-dependent covariate. The focus is on understanding how to analyze the effect of such a dynamic covariate while at the same time ensuring that the effects of treatment and other fixed covariates are unbiasedly estimated. By applying an additive regression model for the intensity of the recurrent events, concepts like direct, indirect and total effects of the fixed covariates may be defined in an analogous way as for traditional path analysis. Theoretical considerations as well as simulations are presented, and a data set on recurrent bladder tumors is used to illustrate the methodology.


Subjects
Biometry/methods, Statistical Data Interpretation, Statistical Models, Local Neoplasm Recurrence/mortality, Proportional Hazards Models, Risk Assessment/methods, Urinary Bladder Neoplasms/mortality, Computer Simulation, Factual Databases, Humans, Reproducibility of Results, Risk Factors, Sensitivity and Specificity
12.
Stat Methods Med Res ; 11(2): 183-202, 2002 Apr.
Article in English | MEDLINE | ID: mdl-12040696

ABSTRACT

Multi-state models are used to describe situations where individuals may move among a finite number of states defined by specific conditions of health, including death. The transition intensities of the models are described by proportional hazards models, and it is reviewed how estimation of the regression parameters and the baseline transition intensities may be performed when only nested case-control data are available for all or some of the transitions. The regression parameter estimates and the estimates of baseline transition intensities are combined to give estimates of the integrated transition intensities for specified covariate histories, and from these estimates covariate-dependent Markov transition probabilities are derived.
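
The final step described above, going from integrated transition intensities to Markov transition probabilities, is commonly done with a product integral: the transition matrix over an interval is the ordered product of (I + dA) over the increments dA of the cumulative intensity matrix. The Python sketch below illustrates that construction under stated assumptions. The three-state layout (healthy, diseased, dead), the helper names, and the increment values are hypothetical, and the nested case-control estimation that would produce the increments is not shown.

```python
import numpy as np

def transition_probabilities(increments):
    """Product integral of cumulative-intensity increments:
    P = (I + dA_1) (I + dA_2) ... (I + dA_m), in time order.
    Each dA_k is a square matrix whose rows sum to zero."""
    increments = [np.asarray(dA, dtype=float) for dA in increments]
    k = increments[0].shape[0]
    P = np.eye(k)
    for dA in increments:
        P = P @ (np.eye(k) + dA)
    return P

def increment(h01, h02, h12):
    """Hypothetical illness-death layout: 0 healthy, 1 diseased, 2 dead
    (absorbing). Off-diagonal entries are intensity increments; the
    diagonal makes each row sum to zero."""
    return np.array([[-(h01 + h02), h01, h02],
                     [0.0, -h12, h12],
                     [0.0, 0.0, 0.0]])

# Two event times with made-up increments for one covariate history.
P = transition_probabilities([increment(0.1, 0.05, 0.2),
                              increment(0.2, 0.0, 0.1)])
```

Each row of `P` sums to one, and entry `P[i, j]` is the estimated probability of being in state j at the end of the interval given state i at the start. Covariate dependence would enter through the increments, which in a proportional hazards setting are the baseline increments scaled by the relative risk for the specified covariate history.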


Subjects
Biometry, Case-Control Studies, Markov Chains, Analysis of Variance, Humans, Lung Neoplasms/mortality, Mining, Biological Models, Radiation-Induced Neoplasms/mortality, Probability, Proportional Hazards Models, Regression Analysis, Risk Factors, Uranium