Results 1 - 20 of 32
1.
BMC Bioinformatics ; 25(1): 147, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605284

ABSTRACT

BACKGROUND: Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may lack power or may not apply easily to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches. RESULTS: In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypotheses. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. The heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. As a convenient alternative, we also propose Z-REG-FDR, an approximate version of REG-FDR that uses only Z-statistics of association between genotype and expression for each gene-SNP pair. The performance of Z-REG-FDR is evaluated using both simulated and real data. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed. CONCLUSION: Our results demonstrate that the Z-REG-FDR method performs favorably compared to other methods in terms of statistical power and control of FDR. It can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics due to its fast computation and ability to be fit using only summary data.


Subject(s)
Genomics , Quantitative Trait Loci , Computer Simulation , Bayes Theorem , Genotype
2.
Biostatistics ; 24(2): 388-405, 2023 04 14.
Article in English | MEDLINE | ID: mdl-33948626

ABSTRACT

The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet Multinomial observations to compare expressed isoform proportions in a data set to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.


Subject(s)
Bayes Theorem , Humans , Protein Isoforms/genetics , Protein Isoforms/analysis , Protein Isoforms/metabolism , Sequence Analysis, RNA/methods
3.
Dis Esophagus ; 36(4), 2023 Mar 30.
Article in English | MEDLINE | ID: mdl-36222072

ABSTRACT

Few predictors of response to topical corticosteroid (tCS) treatment have been identified in eosinophilic esophagitis (EoE). We aimed to determine whether baseline gene expression predicts histologic response to tCS treatment for EoE. We analyzed prospectively collected samples from incident EoE cases who were treated with tCS for 8 weeks in a development cohort (prospective study) or in an independent validation cohort (clinical trial). Whole transcriptome RNA expression was determined from a baseline (pre-treatment) RNA-later preserved esophageal biopsy. Baseline expression was compared between histologic responders (<15 eos/hpf) and non-responders (≥15 eos/hpf), and differential correlation was used to assess baseline gene expression by response status. In 87 EoE cases analyzed in the development set, there were no differentially expressed genes associated with treatment response (at false discovery rate = 0.1). However, differential correlation identified a module of 22 genes with statistically significantly high pairwise correlation in non-responders (mean correlation coefficient = 0.7) compared to low correlation in responders (coefficient = 0.3). When this 22-gene module was applied to the 89 EoE cases in the independent cohort, it was not validated to predict tCS response at the 15 eos/hpf threshold (mean correlation coefficient = 0.32 in responders and 0.25 in non-responders). Exploration of other thresholds also did not validate any modules. Though we identified a 22-gene differential correlation module measured pre-treatment that was strongly associated with subsequent histologic response to tCS in EoE, this was not validated in an independent population. Alternative methods to predict steroid response should be explored.


Subject(s)
Eosinophilic Esophagitis , Humans , Eosinophilic Esophagitis/drug therapy , Eosinophilic Esophagitis/genetics , Eosinophilic Esophagitis/complications , Prospective Studies , Glucocorticoids/therapeutic use , Steroids/therapeutic use , Gene Expression
4.
Biostatistics ; 19(3): 391-406, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29029013

ABSTRACT

Expression quantitative trait locus (eQTL) analyses identify genetic markers associated with the expression of a gene. Most up-to-date eQTL studies consider the connection between genetic variation and expression in a single tissue. Multi-tissue analyses have the potential to improve findings in a single tissue, and elucidate the genotypic basis of differences between tissues. In this article, we develop a hierarchical Bayesian model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL explicitly captures patterns of variation in the presence or absence of eQTL, as well as the heterogeneity of effect sizes across tissues. We devise an efficient Expectation-Maximization (EM) algorithm for model fitting. Inferences concerning eQTL detection and the configuration of eQTL across tissues are derived from the adaptive thresholding of local false discovery rates, and maximum a posteriori estimation, respectively. We also provide theoretical justification of the adaptive procedure. We investigate the MT-eQTL model through an extensive analysis of a 9-tissue data set from the GTEx initiative.


Subject(s)
Biostatistics/methods , Gene Expression , Genomics/methods , Genotyping Techniques/methods , Models, Statistical , Quantitative Trait Loci , Bayes Theorem , Humans
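
The abstract of entry 4 mentions adaptive thresholding of local false discovery rates. A minimal sketch of one common form of that idea follows: hypotheses are sorted by local FDR and rejected for as long as the running mean of the rejected values stays at or below the target level. The function name `adaptive_lfdr_threshold` and the example values are illustrative assumptions, not code from MT-eQTL.

```python
def adaptive_lfdr_threshold(lfdr, alpha=0.05):
    """Return indices of rejected hypotheses.

    Hypotheses are taken in order of increasing local FDR; the rejected
    set grows while its average local FDR stays <= alpha. Because the
    values are sorted, the running mean never decreases, so stopping at
    the first violation is safe.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    rejected = []
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        if running_sum / rank > alpha:
            break
        rejected.append(i)
    return rejected


# Hypothetical local FDR values for five gene-SNP pairs.
example_lfdr = [0.01, 0.9, 0.03, 0.2, 0.02]
example_rejected = adaptive_lfdr_threshold(example_lfdr, alpha=0.05)
```

The threshold is adaptive in the sense that it is not a fixed cutoff on each local FDR: it depends on how many small values are present in the data.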
5.
BMC Bioinformatics ; 19(1): 95, 2018 03 09.
Article in English | MEDLINE | ID: mdl-29523079

ABSTRACT

BACKGROUND: Expression quantitative trait loci (eQTL) analysis identifies genetic markers associated with the expression of a gene. Most existing eQTL analyses and methods investigate association in a single, readily available tissue, such as blood. Joint analysis of eQTL in multiple tissues has the potential to improve, and expand the scope of, single-tissue analyses. Large-scale collaborative efforts such as the Genotype-Tissue Expression (GTEx) program are currently generating high quality data in a large number of tissues. However, computational constraints limit genome-wide multi-tissue eQTL analysis. RESULTS: We develop an integrative method under a hierarchical Bayesian framework for eQTL analysis in a large number of tissues. The model fitting procedure is highly scalable, and the computing time is a polynomial function of the number of tissues. Multi-tissue eQTLs are identified through a local false discovery rate approach, which rigorously controls the false discovery rate. Using simulation and GTEx real data studies, we show that the proposed method has superior performance to existing methods in terms of computing time and the power of eQTL discovery. CONCLUSIONS: We provide a scalable method for eQTL analysis in a large number of tissues. The method enables the identification of eQTL with different configurations and facilitates the characterization of tissue specificity.


Subject(s)
Gene Expression Regulation , Organ Specificity/genetics , Quantitative Trait Loci/genetics , Algorithms , Bayes Theorem , Computer Simulation , Genome-Wide Association Study , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , ROC Curve
6.
Biometrics ; 74(2): 616-625, 2018 06.
Article in English | MEDLINE | ID: mdl-29073327

ABSTRACT

The study of expression Quantitative Trait Loci (eQTL) is an important problem in genomics and biomedicine. While detection (testing) of eQTL associations has been widely studied, less work has been devoted to the estimation of eQTL effect size. To reduce false positives, detection methods frequently rely on linear modeling of rank-based normalized or log-transformed gene expression data. Unfortunately, these approaches do not correspond to the simplest model of eQTL action, and thus yield estimates of eQTL association that can be uninterpretable and inaccurate. In this article, we propose a new, log-of-linear model for eQTL action, termed ACME, that captures allelic contributions to cis-acting eQTLs in an additive fashion, yielding effect size estimates that correspond to a biologically coherent model of cis-eQTLs. We describe a non-linear least-squares algorithm to fit the model by maximum likelihood, and obtain corresponding p-values. We perform careful investigation of the model using a combination of simulated data and data from the Genotype Tissue Expression (GTEx) project. Our results reveal little evidence for dominance effects, a parsimonious result that accords with a simple biological model for allele-specific expression and supports use of the ACME model. We show that Type-I error is well-controlled under our approach in a realistic setting, so that rank-based normalizations are unnecessary. Furthermore, we show that such normalizations can be detrimental to power and estimation accuracy under the proposed model. We then show, through effect size analyses of whole-genome cis-eQTLs in the GTEx data, that using standard normalizations instead of ACME noticeably affects the ranking and sign of estimates.


Subject(s)
Linear Models , Quantitative Trait Loci , Algorithms , Alleles , Gene Expression , Humans , Statistics as Topic
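
One way to read the "log-of-linear" idea in the abstract of entry 6, as a minimal sketch: assume expected expression is additive in allele count on the original scale, and the logarithm is taken afterwards. The function names, the parameter values, and the omission of covariates and noise are all illustrative assumptions; this is not the ACME implementation.

```python
from math import log


def log_of_linear_mean(genotype, beta0, beta1):
    """Log of an expression level that is linear in allele count (0, 1, 2)."""
    return log(beta0 + beta1 * genotype)


def log_linear_mean(genotype, a, b):
    """Conventional alternative: the log expression itself is linear in allele count."""
    return a + b * genotype


# Hypothetical parameters: baseline expression 10, each alternative allele adds 5.
means = [log_of_linear_mean(g, 10.0, 5.0) for g in (0, 1, 2)]
step_first_allele = means[1] - means[0]   # log(15 / 10)
step_second_allele = means[2] - means[1]  # log(20 / 15)
```

Under this reading, equal allelic contributions on the original scale produce unequal steps on the log scale, which is why a straight-line fit to log-transformed expression would not recover the additive effect directly.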
7.
Genet Epidemiol ; 39(2): 77-88, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25417853

ABSTRACT

Genomewide association studies (GWAS) sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple single-nucleotide polymorphisms (SNPs) simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow-up studies. Current multi-SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA-dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights. It estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing an SNP prioritization that best identifies underlying true signals, we show the following: our method easily outperforms a single-marker analysis; when additive-only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive-only effects; and when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation.


Subject(s)
Genes, Dominant/genetics , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Genetic Loci/genetics , Genome-Wide Association Study , Heterozygote , Humans , Phenotype , ROC Curve
8.
Biometrics ; 71(4): 1185-94, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26243050

ABSTRACT

We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO-penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), scaled sparse linear regression, and a selection method based on recently developed testing procedures for the LASSO.


Subject(s)
Models, Statistical , Animals , Bayes Theorem , Biometry/methods , Breast Neoplasms/genetics , Cholesterol, HDL/blood , Cholesterol, HDL/genetics , Computer Simulation , Databases, Factual/statistics & numerical data , Female , Genome-Wide Association Study/statistics & numerical data , Humans , Linear Models , Logistic Models , Mice , Regression Analysis
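
The permutation-based choice of a LASSO penalty described in entry 8 can be sketched roughly as follows. The sketch assumes a linear model with an unpenalized intercept and the objective (1/2n)||y - Xb||^2 + lam * ||b||_1, for which the smallest penalty that selects no variables has a closed form. Taking the median over permutations is an assumption made for this illustration; the abstract does not state the summary used, and the function names are hypothetical.

```python
import random
import statistics


def lambda_zero(X, y):
    """Smallest penalty at which the LASSO selects no variables.

    X is a list of rows, y a list of responses. With an unpenalized
    intercept this equals max_j |x_j' (y - mean(y))| / n.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_centered = [v - y_mean for v in y]
    n_features = len(X[0])
    return max(
        abs(sum(X[i][j] * y_centered[i] for i in range(n))) / n
        for j in range(n_features)
    )


def permutation_selection(X, y, n_perm=100, seed=0):
    """Median, over random permutations of y, of the no-selection penalty."""
    rng = random.Random(seed)
    penalties = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks any real association between X and y
        penalties.append(lambda_zero(X, y_perm))
    return statistics.median(penalties)
```

Each permutation needs only one pass over the columns of X rather than a full LASSO fit, which is consistent with the abstract's description of the procedure as computationally efficient.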
9.
Genet Epidemiol ; 36(5): 451-62, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22549815

ABSTRACT

Significance testing one SNP at a time has proven useful for identifying genomic regions that harbor variants affecting human disease. But after an initial genome scan has identified a "hit region" of association, single-locus approaches can falter. Local linkage disequilibrium (LD) can make both the number of underlying true signals and their identities ambiguous. Simultaneous modeling of multiple loci should help. However, it is typically applied ad hoc: conditioning on the top SNPs, with limited exploration of the model space and no assessment of how sensitive model choice was to sampling variability. Formal alternatives exist but are seldom used. Bayesian variable selection is coherent but requires specifying a full joint model, including priors on parameters and the model space. Penalized regression methods (e.g., LASSO) appear promising but require calibration, and, once calibrated, lead to a choice of SNPs that can be misleadingly decisive. We present a general method for characterizing uncertainty in model choice that is tailored to reprioritizing SNPs within a hit region under strong LD. Our method, LASSO local automatic regularization resample model averaging (LLARRMA), combines LASSO shrinkage with resample model averaging and multiple imputation, estimating for each SNP the probability that it would be included in a multi-SNP model in alternative realizations of the data. We apply LLARRMA to simulations based on case-control genome-wide association studies data, and find that when there are several causal loci and strong LD, LLARRMA identifies a set of candidates that is enriched for true signals relative to single locus analysis and to the recently proposed method of Stability Selection.


Subject(s)
Genome-Wide Association Study/methods , Algorithms , Bayes Theorem , Calibration , Case-Control Studies , Chromosome Mapping , Computer Simulation , Genotype , Humans , Models, Genetic , Models, Statistical , Models, Theoretical , Molecular Epidemiology/methods , ROC Curve , Regression Analysis
10.
Bernoulli (Andover) ; 19(1): 275-294, 2013.
Article in English | MEDLINE | ID: mdl-24194673

ABSTRACT

We investigate the maximal size of distinguished submatrices of a Gaussian random matrix. Of interest are submatrices whose entries have an average greater than or equal to a positive constant, and submatrices whose entries are well fit by a two-way ANOVA model. We identify size thresholds and associated (asymptotic) probability bounds for both large-average and ANOVA-fit submatrices. Probability bounds are obtained when the matrix and submatrices of interest are square and, in rectangular cases, when the matrix and submatrices of interest have fixed aspect ratios. Our principal result is an almost sure interval concentration result for the size of large average submatrices in the square case.

11.
J Clin Oncol ; 41(26): 4192-4199, 2023 Sep 10.
Article in English | MEDLINE | ID: mdl-37672882

ABSTRACT

PURPOSE: To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression-based "intrinsic" subtypes luminal A, luminal B, HER2-enriched, and basal-like. METHODS: A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen. RESULTS: The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for the combined model (subtype and tumor size) was a significant improvement on either the clinicopathologic model or subtype model alone. The intrinsic subtype model predicted neoadjuvant chemotherapy efficacy with a negative predictive value for pCR of 97%. CONCLUSION: Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer. The prognostic properties of the continuous risk score will be of value for the management of node-negative breast cancers. The subtypes and risk score can also be used to assess the likelihood of efficacy from neoadjuvant chemotherapy.

12.
Breast Cancer Res Treat ; 133(3): 865-80, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22048815

ABSTRACT

Breast cancer is a heterogeneous disease with known expression-defined tumor subtypes. DNA copy number studies have suggested that tumors within gene expression subtypes share similar DNA copy number aberrations (CNA) and that CNA can be used to further sub-divide expression classes. To gain further insights into the etiologies of the intrinsic subtypes, we classified tumors according to gene expression subtype and next identified subtype-associated CNA using a novel method called SWITCHdna, using a training set of 180 tumors and a validation set of 359 tumors. Fisher's exact tests, Chi-square approximations, and Wilcoxon rank-sum tests were performed to evaluate differences in CNA by subtype. To assess the functional significance of loss of a specific chromosomal region, individual genes were knocked down by shRNA, and drug sensitivity and DNA repair foci assays were performed. Most tumor subtypes exhibited specific CNA. The Basal-like subtype was the most distinct, with common losses of the regions containing RB1, BRCA1, INPP4B, and the greatest overall genomic instability. One Basal-like subtype-associated CNA was loss of 5q11-35, which contains at least three genes important for BRCA1-dependent DNA repair (RAD17, RAD50, and RAP80); these genes were predominantly lost as a pair, or all three simultaneously. Loss of two or three of these genes was associated with significantly increased genomic instability and poor patient survival. RNAi knockdown of RAD17, or RAD17/RAD50, in immortalized human mammary epithelial cell lines caused increased sensitivity to a PARP inhibitor and carboplatin, and inhibited BRCA1 foci formation in response to DNA damage. These data suggest a possible genetic cause for genomic instability in Basal-like breast cancers and a biological rationale for the use of DNA repair inhibitor related therapeutics in this breast cancer subtype.


Subject(s)
Breast Neoplasms/genetics , DNA Copy Number Variations , Genomic Instability , Neoplasms, Basal Cell/genetics , Acid Anhydride Hydrolases , Breast Neoplasms/drug therapy , Breast Neoplasms/mortality , Cell Cycle Proteins/genetics , DNA Repair Enzymes/genetics , DNA-Binding Proteins/genetics , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genes, BRCA1 , Humans , Neoplasms, Basal Cell/drug therapy , Neoplasms, Basal Cell/mortality , Survival Analysis
13.
Bioinformatics ; 27(5): 678-85, 2011 Mar 01.
Article in English | MEDLINE | ID: mdl-21183584

ABSTRACT

MOTIVATION: DNA copy number gains and losses are commonly found in tumor tissue, and some of these aberrations play a role in tumor genesis and development. Although high resolution DNA copy number data can be obtained using array-based techniques, no single method is widely used to distinguish between recurrent and sporadic copy number aberrations. RESULTS: Here we introduce Discovering Copy Number Aberrations Manifested In Cancer (DiNAMIC), a novel method for assessing the statistical significance of recurrent copy number aberrations. In contrast to competing procedures, the testing procedure underlying DiNAMIC is carefully motivated, and employs a novel cyclic permutation scheme. Extensive simulation studies show that DiNAMIC controls false positive discoveries in a variety of realistic scenarios. We use DiNAMIC to analyze two publicly available tumor datasets, and our results show that DiNAMIC detects multiple loci that have biological relevance. AVAILABILITY: Source code implemented in R, as well as text files containing examples and sample datasets are available at http://www.bios.unc.edu/research/genomic_software/DiNAMIC.


Subject(s)
DNA Copy Number Variations , Neoplasms/genetics , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology/methods , DNA, Neoplasm/genetics , Humans
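
The cyclic permutation scheme named in entry 13 can be illustrated with a small sketch. It assumes a samples-by-markers matrix of copy number values, uses the largest column sum as the test statistic for recurrent gains, and builds the null distribution by rotating each sample's marker vector by an independent random offset. The statistic, the p-value formula, and all function names are assumptions chosen for illustration; the DiNAMIC software may differ in its details.

```python
import random


def cyclic_shift(row, k):
    """Rotate a list left by k positions, wrapping around the end."""
    k %= len(row)
    return row[k:] + row[:k]


def max_column_sum(matrix):
    """Largest per-marker sum across samples."""
    return max(sum(column) for column in zip(*matrix))


def cyclic_shift_pvalue(matrix, n_perm=200, seed=0):
    """Permutation p-value for the most recurrent marker.

    Each row (sample) is rotated independently, which preserves the
    within-sample ordering of values along the genome while breaking
    the alignment of markers across samples.
    """
    rng = random.Random(seed)
    observed = max_column_sum(matrix)
    n_markers = len(matrix[0])
    exceed = 0
    for _ in range(n_perm):
        shifted = [cyclic_shift(row, rng.randrange(n_markers)) for row in matrix]
        if max_column_sum(shifted) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)
```

Rotation, unlike a full shuffle of each row, keeps neighbouring markers together, so the serial correlation typical of copy number profiles is retained under the null.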
14.
BMC Genomics ; 11: 574, 2010 Oct 18.
Article in English | MEDLINE | ID: mdl-20955544

ABSTRACT

BACKGROUND: Analysis of microarray experiments often involves testing for the overrepresentation of pre-defined sets of genes among lists of genes deemed individually significant. Most popular gene set testing methods assume the independence of genes within each set, an assumption that is seriously violated, as extensive correlation between genes is a well-documented phenomenon. RESULTS: We conducted a meta-analysis of over 200 datasets from the Gene Expression Omnibus in order to demonstrate the practical impact of strong gene correlation patterns that are highly consistent across experiments. We show that a common independence assumption-based gene set testing procedure produces very high false positive rates when applied to data sets for which treatment groups have been randomized, and that gene sets with high internal correlation are more likely to be declared significant. A reanalysis of the same datasets using an array resampling approach properly controls false positive rates, leading to more parsimonious and high-confidence gene set findings, which should facilitate pathway-based interpretation of the microarray data. CONCLUSIONS: These findings call into question many of the gene set testing results in the literature and argue strongly for the adoption of resampling based gene set testing criteria in the peer reviewed biomedical literature.


Subject(s)
Databases, Genetic , Oligonucleotide Array Sequence Analysis/methods , Animals , Gene Expression Regulation , Humans , Mice , Publications
15.
Bioinformatics ; 25(4): 482-9, 2009 Feb 15.
Article in English | MEDLINE | ID: mdl-19091771

ABSTRACT

MOTIVATION: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the number of transcripts and markers are on the order of 10^5 and 10^5-10^6, respectively. RESULTS: We propose a new method, FastMap, for fast and efficient eQTL mapping in homozygous inbred populations with binary allele calls. FastMap exploits the discrete nature and structure of the measured single nucleotide polymorphisms (SNPs). In particular, SNPs are organized into a Hamming distance-based tree that minimizes the number of arithmetic operations required to calculate the association of a SNP by making use of the association of its parent SNP in the tree. FastMap's tree can be used to perform both single marker mapping and haplotype association mapping over an m-SNP window. These performance enhancements also permit permutation-based significance testing. AVAILABILITY: The FastMap program and source code are available at the website: http://cebc.unc.edu/fastmap86.html.


Subject(s)
Homozygote , Quantitative Trait Loci/genetics , Software , Algorithms , Animals , Gene Expression Profiling , Genome , Genotype , Mice
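
The idea in entry 15 of reusing a parent SNP's association when computing a child's can be sketched as follows. With binary allele calls, many association statistics depend on the sum of expression values over strains carrying allele 1, and that sum can be updated from the parent by touching only the strains whose allele differs. The choice of statistic and the function names are illustrative assumptions, not the FastMap implementation.

```python
def group_sum(genotype, expr):
    """Sum of expression over strains carrying allele 1 (full pass)."""
    return sum(y for g, y in zip(genotype, expr) if g == 1)


def flipped_strains(parent_geno, child_geno):
    """Indices where child and parent alleles differ; computed once per tree edge."""
    return [i for i, (a, b) in enumerate(zip(parent_geno, child_geno)) if a != b]


def update_sum(parent_sum, child_geno, flips, expr):
    """Child's group sum from the parent's, in time proportional to len(flips)."""
    total = parent_sum
    for i in flips:
        # A strain that moved into the allele-1 group adds its value;
        # one that moved out subtracts it.
        total += expr[i] if child_geno[i] == 1 else -expr[i]
    return total


# Hypothetical data: five strains, one transcript.
parent = [0, 1, 1, 0, 1]
child = [1, 1, 0, 0, 1]
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
flips = flipped_strains(parent, child)
child_sum = update_sum(group_sum(parent, expression), child, flips, expression)
```

The flip list depends only on the genotypes, so it can be reused for every transcript, and the per-transcript cost of each SNP is proportional to its Hamming distance from the parent rather than to the number of strains.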
16.
N Engl J Med ; 355(6): 560-9, 2006 Aug 10.
Article in English | MEDLINE | ID: mdl-16899776

ABSTRACT

BACKGROUND: Gene-expression-profiling studies of primary breast tumors performed by different laboratories have resulted in the identification of a number of distinct prognostic profiles, or gene sets, with little overlap in terms of gene identity. METHODS: To compare the predictions derived from these gene sets for individual samples, we obtained a single data set of 295 samples and applied five gene-expression-based models: intrinsic subtypes, 70-gene profile, wound response, recurrence score, and the two-gene ratio (for patients who had been treated with tamoxifen). RESULTS: We found that most models had high rates of concordance in their outcome predictions for the individual samples. In particular, almost all tumors identified as having an intrinsic subtype of basal-like, HER2-positive and estrogen-receptor-negative, or luminal B (associated with a poor prognosis) were also classified as having a poor 70-gene profile, activated wound response, and high recurrence score. The 70-gene and recurrence-score models, which are beginning to be used in the clinical setting, showed 77 to 81 percent agreement in outcome classification. CONCLUSIONS: Even though different gene sets were used for prognostication in patients with breast cancer, four of the five tested showed significant agreement in the outcome predictions for individual patients and are probably tracking a common set of biologic phenotypes.


Subject(s)
Breast Neoplasms/genetics , Gene Expression , Models, Genetic , Analysis of Variance , Breast Neoplasms/mortality , Female , Gene Expression Profiling , Humans , Phenotype , Prognosis , Proportional Hazards Models , Receptor, ErbB-2 , Receptors, Estrogen , Survival Analysis
17.
Bioinformatics ; 24(9): 1154-60, 2008 May 01.
Article in English | MEDLINE | ID: mdl-18325927

ABSTRACT

MOTIVATION: Gene-expression microarrays are currently being applied in a variety of biomedical applications. This article considers the problem of how to merge datasets arising from different gene-expression studies of a common organism and phenotype. Of particular interest is how to merge data from different technological platforms. RESULTS: The article makes two contributions to the problem. The first is a simple cross-study normalization method, which is based on linked gene/sample clustering of the given datasets. The second is the introduction and description of several general validation measures that can be used to assess and compare cross-study normalization methods. The proposed normalization method is applied to three existing breast cancer datasets, and is compared to several competing normalization methods using the proposed validation measures. AVAILABILITY: The supplementary materials and XPN Matlab code are publicly available at website: https://genome.unc.edu/xpn


Subject(s)
Algorithms , Gene Expression Profiling/methods , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis/methods , Data Interpretation, Statistical , Gene Expression Profiling/standards , Oligonucleotide Array Sequence Analysis/standards , Reference Values
18.
Genome Biol ; 20(1): 52, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30845957

ABSTRACT

We propose a statistical boosting method, termed I-Boost, to integrate multiple types of high-dimensional genomics data with clinical data for predicting survival time. I-Boost provides substantially higher prediction accuracy than existing methods. By applying I-Boost to The Cancer Genome Atlas, we show that the integration of multiple genomics platforms with clinical variables improves the prediction of survival time over the use of clinical variables alone; gene expression values are typically more prognostic of survival time than other genomics data types; and gene modules/signatures are at least as prognostic as the collection of individual gene expression data.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genomics/methods , Neoplasms/mortality , Software , Humans , Models, Statistical , Neoplasms/genetics , Prognosis , Survival Rate
19.
Ann Appl Stat ; 12(2): 1180-1203, 2018 Jun.
Article in English | MEDLINE | ID: mdl-31871518

ABSTRACT

Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. We introduce a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative search procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. We investigate the performance of DCM by applying it to simulated data as well as to recent experimental datasets in genomics and brain imaging.
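
The quantity at the centre of the abstract of entry 19, the difference in average pairwise correlation of a variable set between two conditions, can be computed as in the sketch below. It covers only the score for a fixed candidate set; the iterative search and the hypothesis tests that DCM uses to update the set are not shown. The data layout (a dict mapping variable names to lists of observations) and the function names are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_corr(data, var_set):
    """Average correlation over all unordered pairs of variables in var_set."""
    pairs = list(combinations(var_set, 2))
    return sum(pearson(data[i], data[j]) for i, j in pairs) / len(pairs)


def differential_correlation(data1, data2, var_set):
    """Average pairwise correlation in condition 1 minus that in condition 2."""
    return mean_pairwise_corr(data1, var_set) - mean_pairwise_corr(data2, var_set)


# Hypothetical toy data: three variables, four observations per condition.
condition1 = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 3, 5, 7]}
condition2 = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, -1, -1, 1]}
score = differential_correlation(condition1, condition2, ["a", "b", "c"])
```

A large positive score indicates a set whose members move together in the first condition but not in the second, which is the kind of set the abstract says DCM searches for.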

20.
J Mach Learn Res ; 18, 2018 Apr.
Article in English | MEDLINE | ID: mdl-30853860

ABSTRACT

Community detection is the process of grouping strongly connected nodes in a network. Many community detection methods for unweighted networks have a theoretical basis in a null model. Communities discovered by these methods therefore have interpretations in terms of statistical significance. In this paper, we introduce a null model for weighted networks called the continuous configuration model. First, we propose a community extraction algorithm for weighted networks which incorporates iterative hypothesis testing under the null. We prove a central limit theorem for edge-weight sums and asymptotic consistency of the algorithm under a weighted stochastic block model. We then incorporate the algorithm in a community detection method called CCME. To benchmark the method, we provide a simulation framework involving the null to plant "background" nodes in weighted networks with communities. We show that the empirical performance of CCME on these simulations is competitive with existing methods, particularly when overlapping communities and background nodes are present. To further validate the method, we present two real-world networks with potential background nodes and analyze them with CCME, yielding results that reveal macro-features of the corresponding systems.
