Results 1 - 18 of 18
1.
Biostatistics ; 23(4): 1133-1149, 2022 10 14.
Article in English | MEDLINE | ID: mdl-35094048

ABSTRACT

Genomic data sets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes), giving rise to dense latent variation. This latent variation presents both challenges and opportunities for classification. While some of these latent variables may be partially correlated with the phenotype of interest and thus helpful, others may be uncorrelated and merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. To address these challenges, we propose the cross-residualization classifier (CRC). Through an adjustment and ensemble procedure, the CRC estimates and residualizes out the latent variation, trains a classifier on the residuals, and then reintegrates the latent variation in a final ensemble classifier. Thus, the latent variables are accounted for without discarding any potentially predictive information. We apply the method to simulated data and a variety of genomic data sets from multiple platforms. In general, we find that the CRC performs well relative to existing classifiers and sometimes offers substantial gains.
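A minimal sketch of the residualize-then-ensemble idea follows. The use of PCA for the latent fit, logistic regression as the base learner, and simple probability averaging are assumptions made for illustration only; the published CRC differs in its details.

```python
# Sketch of a residualize-then-ensemble classifier (not the published CRC code).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))          # samples x genes (toy data)
y = rng.integers(0, 2, size=100)         # binary phenotype

# 1. Estimate dense latent variation with a low-rank fit (here: PCA, an assumption).
pca = PCA(n_components=5).fit(X)
scores = pca.transform(X)                        # latent variable estimates
X_resid = X - pca.inverse_transform(scores)      # residualized features

# 2. Train one classifier on the residuals (weaker, more direct effects) ...
clf_resid = LogisticRegression(max_iter=1000).fit(X_resid, y)
# ... and one on the latent scores (dense variation, possibly phenotype-correlated).
clf_latent = LogisticRegression(max_iter=1000).fit(scores, y)

# 3. Ensemble: average predicted probabilities so neither source of information is discarded.
p = 0.5 * (clf_resid.predict_proba(X_resid)[:, 1]
           + clf_latent.predict_proba(scores)[:, 1])
y_hat = (p > 0.5).astype(int)
```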


Subject(s)
Algorithms, Genomics, Genomics/methods, Humans
2.
Bioinformatics ; 38(21): 4934-4940, 2022 10 31.
Article in English | MEDLINE | ID: mdl-36063034

ABSTRACT

MOTIVATION: High-throughput fluorescent microscopy is a popular class of techniques for studying tissues and cells through automated imaging and feature extraction of hundreds to thousands of samples. Like other high-throughput assays, these approaches can suffer from unwanted noise and technical artifacts that obscure the biological signal. In this work, we consider how an experimental design incorporating multiple levels of replication enables the removal of technical artifacts from such image-based platforms. RESULTS: We develop a general approach to remove technical artifacts from high-throughput image data that leverages an experimental design with multiple levels of replication. To illustrate the methods, we consider microenvironment microarrays (MEMAs), a high-throughput platform designed to study cellular responses to microenvironmental perturbations. In application to MEMAs, our approach removes unwanted spatial artifacts and thereby enhances the biological signal. This approach has broad applicability to diverse biological assays. AVAILABILITY AND IMPLEMENTATION: Raw data are available on Synapse (syn2862345); analysis code is on GitHub (gjhunt/mema_norm); a reproducible Docker image is available on Docker Hub (gjhunt/mema_norm). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
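The following toy sketch illustrates the general replicate idea on simulated arrays: the across-replicate mean estimates the shared spot effect, and replicate-specific residual structure is treated as a technical artifact and removed. The column-wise artifact and the crude column-mean "smoother" are illustrative assumptions, not the gjhunt/mema_norm method.

```python
# Toy sketch: using replicate arrays to separate biology from spatial artifacts.
import numpy as np

rng = np.random.default_rng(1)
n_rep, n_row, n_col = 4, 20, 20
biology = rng.normal(size=(n_row, n_col))                # shared per-spot signal
artifact = rng.normal(size=(n_rep, 1, n_col)) * 1.5      # column-wise gradients per replicate
data = biology + artifact + rng.normal(scale=0.3, size=(n_rep, n_row, n_col))

# Replicate mean estimates the shared (biological) spot effect;
# per-replicate residuals capture replicate-specific technical structure.
shared = data.mean(axis=0)
resid = data - shared

# Remove the smooth spatial part of each replicate's residual
# (here crudely: per-column means stand in for a spatial smoother).
col_effect = resid.mean(axis=1, keepdims=True)
corrected = data - col_effect

# Replicate-specific artifacts shrink, so each corrected array is closer to the biology.
print(np.abs(data - biology).mean(), np.abs(corrected - biology).mean())
```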


Subject(s)
Artifacts, High-Throughput Screening Assays, Microarray Analysis, Research Design
3.
Proc Natl Acad Sci U S A ; 116(20): 9775-9784, 2019 05 14.
Article in English | MEDLINE | ID: mdl-31028141

ABSTRACT

Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.


Subject(s)
Meta-Analysis as Topic, RNA Sequence Analysis, Single-Cell Analysis, Software, Algorithms, Animals, Embryonic Development, Factor Analysis, Gene Expression, Humans, Mice
4.
Nucleic Acids Res ; 47(12): 6073-6083, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31114909

ABSTRACT

The Nanostring nCounter gene expression assay uses molecular barcodes and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction. These counts need to be normalized to adjust for the amount of sample, variations in assay efficiency and other factors. Most users adopt the normalization approach described in the nSolver analysis software, which involves background correction based on the observed values of negative control probes, a within-sample normalization using the observed values of positive control probes and normalization across samples using reference (housekeeping) genes. Here we present a new normalization method, Removing Unwanted Variation-III (RUV-III), which makes vital use of technical replicates and suitable control genes. We also propose an approach using pseudo-replicates when technical replicates are not available. The effectiveness of RUV-III is illustrated on four different datasets. We also offer suggestions on the design and analysis of studies involving this technology.
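A simplified, RUV-style sketch of how technical replicates and control genes can be combined is shown below. The number of unwanted factors, the SVD-based factor estimate, and the least-squares removal step are illustrative simplifications, not the published RUV-III algorithm.

```python
# Simplified sketch of replicate-based removal of unwanted variation
# in the spirit of RUV-III (toy illustration, not the reference implementation).
import numpy as np

def ruv_like_correct(Y, replicate_ids, control_idx, k=2):
    """Y: samples x genes (log scale); replicate_ids maps each sample to its
    technical-replicate group; control_idx indexes assumed control genes;
    k is an assumed number of unwanted factors."""
    Y = np.asarray(Y, dtype=float)
    # Within-replicate-group residuals: any variation here is technical.
    resid = Y.copy()
    for g in np.unique(replicate_ids):
        rows = replicate_ids == g
        resid[rows] -= Y[rows].mean(axis=0)
    # Estimate unwanted factors W from the control-gene columns of those residuals.
    U, s, _ = np.linalg.svd(resid[:, control_idx], full_matrices=False)
    W = U[:, :k] * s[:k]
    # Estimate each gene's loading on W by least squares, then remove the fit.
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)
    return Y - W @ alpha

# Tiny usage example with simulated technical replicates.
rng = np.random.default_rng(2)
Y = rng.normal(size=(12, 200))
rep = np.repeat(np.arange(6), 2)      # six samples, each assayed twice
corrected = ruv_like_correct(Y, rep, control_idx=np.arange(50), k=2)
```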


Subject(s)
Gene Expression Profiling/methods, Lung Adenocarcinoma/genetics, Lung Adenocarcinoma/metabolism, Dendritic Cells/metabolism, Humans, Inflammatory Bowel Diseases/genetics, Inflammatory Bowel Diseases/metabolism, Lung Neoplasms/genetics, Lung Neoplasms/metabolism, Lymphocyte Activation/genetics, Single Molecule Imaging
5.
Bioinformatics ; 35(12): 2093-2099, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30407492

ABSTRACT

MOTIVATION: Cell type composition of tissues is important in many biological processes. To help understand cell type composition using gene expression data, methods of estimating (deconvolving) cell type proportions have been developed. Such estimates are often used to adjust for confounding effects of cell type in differential expression analysis (DEA). RESULTS: We propose dtangle, a new cell type deconvolution method. dtangle works on a range of DNA microarray and bulk RNA-seq platforms. It estimates cell type proportions using publicly available, often cross-platform, reference data. We evaluate dtangle on 11 benchmark datasets, showing that dtangle is competitive with published deconvolution methods, is robust to outliers and selection of tuning parameters, and is fast. As a case study, we investigate the human immune response to Lyme disease. dtangle's estimates reveal a temporal trend consistent with previous findings and are important covariates for DEA across disease status. AVAILABILITY AND IMPLEMENTATION: dtangle is available on CRAN (cran.r-project.org/package=dtangle) and on GitHub (dtangle.github.io). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
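As a point of reference, the deconvolution problem itself can be sketched as a non-negative least-squares fit of a bulk profile to reference cell-type profiles. dtangle's actual estimator works differently, so the code below only illustrates the task being solved.

```python
# Toy sketch of reference-based cell-type deconvolution (not dtangle's estimator).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
n_genes, n_types = 500, 3
reference = rng.gamma(2.0, 2.0, size=(n_genes, n_types))         # pure cell-type profiles
true_p = np.array([0.6, 0.3, 0.1])
bulk = reference @ true_p + rng.normal(scale=0.1, size=n_genes)   # observed mixed sample

# Estimate proportions by non-negative least squares, then renormalize to sum to 1.
p_hat, _ = nnls(reference, bulk)
p_hat /= p_hat.sum()
print(np.round(p_hat, 3))   # approximately recovers [0.6, 0.3, 0.1]
```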


Subject(s)
Software, Humans, Oligonucleotide Array Sequence Analysis
6.
J Pediatr Orthop ; 40(9): 487-491, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32931690

ABSTRACT

BACKGROUND: The vast majority of pediatric distal-third tibial shaft fractures can be treated with closed reduction and casting. If conservative measures fail, then these fractures are usually treated with 2 antegrade flexible intramedullary nails. A postoperative cast is usually applied because of the tenuous fixation of the 2 nails. Recent studies have described the use of 4 nails to increase the stability of the fixation, a technique that may preclude the need for postoperative casting. The purpose of this biomechanical study is to quantify the relative increase in stiffness and load to failure when using 4 versus 2 nails to surgically stabilize these fractures. METHODS: Short, oblique osteotomies were created in the distal third of small fourth-generation tibial sawbones and stabilized with 2 (double) or 4 (quadruple) flexible intramedullary nails. After pilot testing, 5 models per fixation method were tested cyclically in axial compression, torsion, and 4-point bending in valgus and recurvatum. At the end of the study, each model was loaded to failure in valgus. Stiffness values were calculated, and yield points were recorded. The data were compared using Student's t tests. Results are presented as mean±SD. The level of significance was set at P≤0.05. RESULTS: Stiffness in valgus 4-point bending was 624±231 and 336±162 N/mm in the quadruple-nail and double-nail groups, respectively (P=0.04). There were no statistically significant differences in any other mode of testing. CONCLUSIONS: The quadruple-nail construct was almost 2 times as stiff as the double-nail construct in resisting valgus deformation. This provides biomechanical support for a previously published study describing the clinical success of this fixation construct.
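The statistical comparison described can be sketched as follows: stiffness is the slope of the linear region of a load-displacement curve, and the two fixation groups are compared with a two-sample Student's t test. All numbers below are hypothetical illustrations, not the study's measurements.

```python
# Sketch: stiffness as the slope (N/mm) of a load-displacement curve,
# with groups compared by a two-sample Student's t test.
import numpy as np
from scipy import stats

def stiffness(displacement_mm, load_n):
    """Least-squares slope of load vs. displacement, in N/mm."""
    slope, _intercept = np.polyfit(displacement_mm, load_n, 1)
    return slope

# Stiffness from one toy load-displacement sweep.
disp = np.linspace(0.0, 2.0, 20)
load = 600.0 * disp + np.random.default_rng(9).normal(scale=20.0, size=disp.size)
print(round(stiffness(disp, load), 1))           # ~600 N/mm

# Hypothetical per-specimen valgus bending stiffness (N/mm), five models per group.
quadruple = np.array([620.0, 410.0, 830.0, 560.0, 700.0])
double = np.array([300.0, 250.0, 480.0, 310.0, 340.0])
t, p = stats.ttest_ind(quadruple, double)        # Student's t test (equal variances)
print(f"t = {t:.2f}, p = {p:.3f}")
```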


Subject(s)
Bone Nails, Intramedullary Fracture Fixation/instrumentation, Tibia/injuries, Tibial Fractures/surgery, Biomechanical Phenomena, Child, Diaphyses/injuries, Diaphyses/surgery, Intramedullary Fracture Fixation/methods, Humans, Male, Prosthesis Design, Tibia/surgery, Tibial Fractures/physiopathology
7.
Biostatistics ; 17(1): 16-28, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26286812

ABSTRACT

When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g., when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to studying an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.


Subject(s)
Statistical Data Interpretation, Gene Expression/genetics, Genetic Variation/genetics, Microarray Analysis/methods, Humans
8.
Nucleic Acids Res ; 43(16): e106, 2015 Sep 18.
Article in English | MEDLINE | ID: mdl-25990733

ABSTRACT

Due to their relatively low cost per sample and broad, gene-centric coverage of CpGs across the human genome, Illumina's 450k arrays are widely used in large-scale differential methylation studies. However, by their very nature, large studies are particularly susceptible to the effects of unwanted variation. The effects of unwanted variation have been extensively documented in gene expression array studies and numerous methods have been developed to mitigate these effects. However, there has been much less research focused on the appropriate methodology to use for accounting for unwanted variation in methylation array studies. Here we present a novel two-stage approach using RUV-inverse in a differential methylation analysis of 450k data and show that it outperforms existing methods.


Subject(s)
DNA Methylation, Oligonucleotide Array Sequence Analysis/methods, Adolescent, Aged 80 and over, Aging/genetics, Female, Genetic Variation, Humans, Infant, Newborn Infant, Male, Neoplasms/genetics, Smoking/genetics
9.
Gut ; 64(11): 1721-31, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25385008

ABSTRACT

OBJECTIVE: Differences in gastric cancer (GC) clinical outcomes between patients in Asian and non-Asian countries have historically been attributed to variability in clinical management. However, recent international Phase III trials suggest that even with standardised treatments, GC outcomes differ by geography. Here, we investigated gene expression differences between Asian and non-Asian GCs, and whether these molecular differences might influence clinical outcome. DESIGN: We compared gene expression profiles of 1016 GCs from six Asian and three non-Asian GC cohorts, using a two-stage meta-analysis design and a novel biostatistical method (RUV-4) to adjust for technical variation between cohorts. We further validated our findings by computerised immunohistochemical analysis on two independent tissue microarray (TMA) cohorts from Asian and non-Asian localities (n=665). RESULTS: Gene signatures differentially expressed between Asian and non-Asian GCs were related to immune function and inflammation. Non-Asian GCs were significantly enriched in signatures related to T-cell biology, including CTLA-4 signalling. Similarly, in the TMA cohorts, non-Asian GCs showed significantly higher expression of T-cell markers (CD3, CD45R0, CD8) and lower expression of the immunosuppressive T-regulatory cell marker FOXP3 compared to Asian GCs (p<0.05). Inflammatory cell markers CD66b and CD68 also exhibited significant cohort differences (p<0.05). Exploratory analyses revealed a significant relationship between tumour immunity factors, geographic locality-specific prognosis, and postchemotherapy outcomes. CONCLUSIONS: Analyses of >1600 GCs suggest that Asian and non-Asian GCs exhibit distinct tumour immunity signatures related to T-cell function. These differences may influence geographical differences in clinical outcome, and the design of future trials, particularly in immuno-oncology.


Subject(s)
Adenocarcinoma/genetics, Adenocarcinoma/immunology, Stomach Neoplasms/genetics, Stomach Neoplasms/immunology, Transcriptome, Adenocarcinoma/drug therapy, Adult, Aged, Aged 80 and over, Asian People/genetics, Female, Humans, Male, Middle Aged, Retrospective Studies, Stomach Neoplasms/drug therapy, Treatment Outcome, Young Adult
10.
Anal Chem ; 87(7): 3606-15, 2015 Apr 07.
Article in English | MEDLINE | ID: mdl-25692814

ABSTRACT

Metabolomics experiments are inevitably subject to a component of unwanted variation, due to factors such as batch effects, long runs of samples, and confounding biological variation. Although the removal of this unwanted variation is a vital step in the analysis of metabolomics data, it is considered a gray area in which there is a recognized need to develop a better understanding of the procedures and statistical methods required to achieve statistically relevant optimal biological outcomes. In this paper, we discuss the causes of unwanted variation in metabolomics experiments, review commonly used metabolomics approaches for handling this unwanted variation, and present a statistical approach for the removal of unwanted variation to obtain normalized metabolomics data. The advantages and performance of the approach relative to several widely used metabolomics normalization approaches are illustrated through two metabolomics studies, and recommendations are provided for choosing and assessing the most suitable normalization method for a given metabolomics experiment. Software for the approach is made freely available.


Subject(s)
Mass Spectrometry/methods, Metabolomics/methods, Software, Humans, Principal Component Analysis
11.
Biostatistics ; 13(3): 539-52, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22101192

ABSTRACT

Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method "Remove Unwanted Variation, 2-step" (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise but substantial challenges remain.
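A compact sketch of the control-gene idea: the unwanted factors are estimated from the negative-control columns (here by SVD) and then regressed out of all genes. This is a simplification for illustration, not the exact RUV-2 procedure.

```python
# Toy illustration of factor analysis restricted to negative control genes
# (simplified for exposition; not the published RUV-2 implementation).
import numpy as np

def remove_unwanted(Y, control_idx, k=2):
    """Y: samples x genes (log expression); control_idx: assumed negative controls;
    k: assumed number of unwanted factors."""
    Yc = Y[:, control_idx] - Y[:, control_idx].mean(axis=0)
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    W = U[:, :k] * s[:k]                           # estimated unwanted factors
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)  # each gene's loading on W
    return Y - W @ alpha

rng = np.random.default_rng(8)
Y = rng.normal(size=(30, 1000))
Y_adj = remove_unwanted(Y, control_idx=np.arange(100), k=2)
```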


Subject(s)
Statistical Data Interpretation, Gene Expression Profiling/methods, Oligonucleotide Array Sequence Analysis/methods, Female, Humans, Male
12.
Nat Biotechnol ; 41(1): 82-95, 2023 01.
Article in English | MEDLINE | ID: mdl-36109686

ABSTRACT

Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
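The pseudo-replicate construction can be sketched as follows: when no technical replicates exist, samples sharing the same biological label within each batch are averaged into pseudo-samples, and pseudo-samples with the same label across batches are then treated as replicates by a RUV-style correction. Toy code only; the labels and grouping are illustrative assumptions, not the published PRPS/RUV-III implementation.

```python
# Sketch of building pseudo-samples / pseudo-replicates from biological labels.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
expr = pd.DataFrame(rng.normal(size=(8, 100)))           # samples x genes (toy)
meta = pd.DataFrame({"batch": ["A", "A", "A", "A", "B", "B", "B", "B"],
                     "subtype": ["luminal", "luminal", "basal", "basal"] * 2})

# One pseudo-sample per (batch, subtype); the same subtype across batches forms a
# pseudo-replicate set whose differences are assumed to be purely technical.
pseudo = expr.groupby([meta["batch"], meta["subtype"]]).mean()
replicate_set = pseudo.index.get_level_values("subtype")
print(pseudo.shape, list(replicate_set))
```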


Subject(s)
Neoplasms, RNA, Humans, Gene Expression Profiling/methods, RNA Sequence Analysis, Neoplasms/genetics
13.
Nat Commun ; 13(1): 1358, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35292647

ABSTRACT

Transcriptome deconvolution aims to estimate the cellular composition of an RNA sample from its gene expression data, which in turn can be used to correct for composition differences across samples. The human brain is unique in its transcriptomic diversity and comprises a complex mixture of cell types, including transcriptionally similar subtypes of neurons. Here, we carry out a comprehensive evaluation of deconvolution methods for human brain transcriptome data, and assess the tissue-specificity of our key observations by comparison with human pancreas and heart. We evaluate eight transcriptome deconvolution approaches and nine cell-type signatures, testing the accuracy of deconvolution using in silico mixtures of single-cell RNA-seq data, RNA mixtures, and nearly 2000 human brain samples. Our results identify the main factors that drive deconvolution accuracy for brain data, and highlight the importance of biological factors influencing cell-type signatures, such as brain region and in vitro cell culturing.


Subject(s)
RNA, Transcriptome, Brain, Gene Expression Profiling/methods, Humans, Organ Specificity, RNA Sequence Analysis/methods, Transcriptome/genetics
14.
J Neurosci ; 30(23): 7917-27, 2010 Jun 09.
Article in English | MEDLINE | ID: mdl-20534840

ABSTRACT

Previous work has characterized the properties of neurotransmitter release at excitatory and inhibitory synapses, but we know remarkably little about the properties of monoamine release, because these neuromodulators do not generally produce a fast ionotropic response. Since dopamine and serotonin neurons can also release glutamate in vitro and in vivo, we have used the vesicular monoamine transporter VMAT2 and the vesicular glutamate transporter VGLUT1 to compare the localization and recycling of synaptic vesicles that store, respectively, monoamines and glutamate. First, VMAT2 segregates partially from VGLUT1 in the boutons of midbrain dopamine neurons, indicating the potential for distinct release sites. Second, endocytosis after stimulation is slower for VMAT2 than VGLUT1. During the stimulus, however, the endocytosis of VMAT2 (but not VGLUT1) accelerates dramatically in midbrain dopamine but not hippocampal neurons, indicating a novel, cell-specific mechanism to sustain high rates of release. On the other hand, we find that in both midbrain dopamine and hippocampal neurons, a substantially smaller proportion of VMAT2 than VGLUT1 is available for evoked release, and VMAT2 shows considerably more dispersion along the axon after exocytosis than VGLUT1. Even when expressed in the same neuron, the two vesicular transporters thus target to distinct populations of synaptic vesicles, presumably due to their selection of distinct recycling pathways.


Subject(s)
Dopamine/metabolism, Neurons/metabolism, Synaptic Vesicles/metabolism, Vesicular Glutamate Transport Protein 1/metabolism, Vesicular Monoamine Transport Proteins/metabolism, Animals, Newborn Animals, Western Blotting, Cultured Cells, Electrophysiology, Endocytosis/physiology, Exocytosis/physiology, Hippocampus/cytology, Hippocampus/metabolism, Mesencephalon/cytology, Mesencephalon/metabolism, Rats
15.
Medicine (Baltimore) ; 100(40): e27422, 2021 Oct 08.
Article in English | MEDLINE | ID: mdl-34622851

ABSTRACT

ABSTRACT: As severe acute respiratory syndrome coronavirus 2 continues to spread, easy-to-use risk models that predict hospital mortality can assist in clinical decision making and triage. We aimed to develop a risk score model for in-hospital mortality in patients hospitalized with 2019 novel coronavirus (COVID-19) that was robust across hospitals and used clinical factors that are readily available and measured in a standard way across hospitals. In this retrospective observational study, we developed a risk score model using data collected by trained abstractors for patients in 20 diverse hospitals across the state of Michigan (Mi-COVID19) who were discharged between March 5, 2020 and August 14, 2020. Patients who tested positive for severe acute respiratory syndrome coronavirus 2 during hospitalization or were discharged with an ICD-10 code for COVID-19 (U07.1) were included. We employed an iterative forward selection approach to consider the inclusion of 145 potential risk factors available at hospital presentation. Model performance was externally validated with patients from 19 hospitals in the Mi-COVID19 registry not used in model development. We shared the model in an easy-to-use online application that allows the user to predict in-hospital mortality risk for a patient if they have any subset of the variables in the final model. Two thousand one hundred and ninety-three patients in the Mi-COVID19 registry met our inclusion criteria. The derivation and validation sets ultimately included 1690 and 398 patients, respectively, with mortality rates of 19.6% and 18.6%, respectively. The average age of participants in the study after exclusions was 64 years, and the participants were 48% female, 49% Black, and 87% non-Hispanic. Our final model includes the patient's age, first recorded respiratory rate, first recorded pulse oximetry, highest creatinine level on day of presentation, and the hospital's COVID-19 mortality rate. No other factors showed sufficient incremental model improvement to warrant inclusion. The areas under the receiver operating characteristic curve for the derivation and validation sets were .796 (95% confidence interval, .767-.826) and .829 (95% confidence interval, .782-.876), respectively. We conclude that the risk of in-hospital mortality in COVID-19 patients can be reliably estimated using a few factors, which are measured in a standard way and available to physicians very early in a hospital encounter.
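An iterative forward-selection loop of the general kind described can be sketched as follows: candidate predictors are added one at a time according to validation AUC, stopping when the gain becomes negligible. The synthetic data, logistic regression learner, and 0.005 stopping threshold are illustrative assumptions, not the Mi-COVID19 protocol.

```python
# Sketch of forward selection of predictors by validation AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

selected, best_auc = [], 0.5
while True:
    gains = {}
    for j in range(X.shape[1]):
        if j in selected:
            continue
        cols = selected + [j]
        model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
        gains[j] = roc_auc_score(y_va, model.predict_proba(X_va[:, cols])[:, 1])
    if not gains:
        break
    j_best = max(gains, key=gains.get)
    if gains[j_best] - best_auc < 0.005:          # stop when the improvement is negligible
        break
    selected.append(j_best)
    best_auc = gains[j_best]

print(selected, round(best_auc, 3))
```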


Subject(s)
COVID-19/mortality, Hospital Mortality/trends, Age Factors, Aged, Aged 80 and over, Body Mass Index, Comorbidity, Creatinine/blood, Female, Health Behavior, Humans, Logistic Models, Male, Michigan/epidemiology, Middle Aged, Oximetry, Prognosis, ROC Curve, Racial Groups, Retrospective Studies, Risk Assessment, Risk Factors, SARS-CoV-2, Severity of Illness Index, Sex Factors, Socioeconomic Factors
16.
J Bioinform Comput Biol ; 18(1): 2040004, 2020 02.
Article in English | MEDLINE | ID: mdl-32336251

ABSTRACT

MOTIVATION: In single-cell RNA-sequencing (scRNA-seq) experiments, RNA transcripts are extracted and measured from isolated cells to understand gene expression at the cellular level. Measurements from this technology are affected by many technical artifacts, including batch effects. In analogous bulk gene expression experiments, external references, e.g. synthetic gene spike-ins often from the External RNA Controls Consortium (ERCC), may be incorporated to the experimental protocol for use in adjusting measurements for technical artifacts. In scRNA-seq experiments, the use of external spike-ins is controversial due to dissimilarities with endogenous genes and uncertainty about sufficient precision of their introduction. Instead, endogenous genes with highly stable expression could be used as references within scRNA-seq to help normalize the data. First, however, a specific notion of stable expression at the single-cell level needs to be formulated; genes could be stable in absolute expression, in proportion to cell volume, or in proportion to total gene expression. Different types of stable genes will be useful for different normalizations and will need different methods for discovery. RESULTS: We compile gene sets whose products are associated with cellular structures and record these gene sets for future reuse and analysis. We find that genes whose final products are associated with the cytosolic ribosome have expressions that are highly stable with respect to the total RNA content. Notably, these genes appear to be stable in bulk measurements as well. SUPPLEMENTARY INFORMATION: Supplementary data are available through GitHub (johanngb/sc-stable).
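One concrete version of "stable in proportion to total gene expression" can be sketched by normalizing each cell's counts to its total and ranking genes by the variability of that proportion across cells. The coefficient-of-variation criterion below is an illustrative assumption, not the paper's exact procedure.

```python
# Sketch: rank genes by stability of their per-cell expression proportion.
import numpy as np

rng = np.random.default_rng(5)
counts = rng.poisson(lam=rng.gamma(2.0, 3.0, size=(1, 300)), size=(200, 300))  # cells x genes

totals = counts.sum(axis=1, keepdims=True)             # total RNA content per cell
prop = counts / np.maximum(totals, 1)                  # per-cell gene proportions

mean_p = prop.mean(axis=0)
cv = prop.std(axis=0) / np.maximum(mean_p, 1e-12)      # coefficient of variation across cells
stable_idx = np.argsort(cv)[:20]                       # most proportion-stable genes
print(stable_idx)
```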


Subject(s)
Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Animals, Computational Biology/methods, Factual Databases, Humans, Mice
17.
J Comput Graph Stat ; 29(4): 929-941, 2020.
Article in English | MEDLINE | ID: mdl-34531645

ABSTRACT

Proper data transformation is an essential part of analysis. Choosing appropriate transformations for variables can enhance visualization, improve efficacy of analytical methods, and increase data interpretability. However, determining appropriate transformations of variables from high-content imaging data poses new challenges: such data produce hundreds of covariates from each of thousands of images in a corpus, each covariate has a different distribution and potentially needs a different transformation, and determining an appropriate transformation for each by hand is infeasible. In this paper, we explore simple, robust, and automatic transformations of high-content image data. A central application of our work is to microenvironment microarray bio-imaging data from the NIH LINCS program. We show that our robust transformations enhance visualization and improve the discovery of substantively relevant latent effects. These transformations enhance analysis of image features individually and also improve data integration approaches when combining multiple features. We anticipate that the advantages of this work will likely also be realized in the analysis of data from other high-content and highly multiplexed technologies like Cell Painting or Cyclic Immunofluorescence. Software and further analysis can be found at gjhunt.github.io/rr.
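A minimal sketch of choosing a per-feature transformation automatically is shown below; the skewness-based log1p rule is a simple stand-in chosen for illustration, not the transformation family used in the paper.

```python
# Sketch: pick a transformation per image feature based on a robustness heuristic.
import numpy as np
from scipy import stats

def auto_transform(x):
    """Return a transformed copy of feature x and the name of the transform used."""
    x = np.asarray(x, dtype=float)
    if np.all(x >= 0) and stats.skew(x) > 1.0:     # nonnegative and strongly right-skewed
        return np.log1p(x), "log1p"
    return x, "identity"

rng = np.random.default_rng(6)
features = {"area": rng.lognormal(mean=3.0, sigma=1.0, size=1000),   # right-skewed
            "eccentricity": rng.beta(2.0, 2.0, size=1000)}           # roughly symmetric

for name, vals in features.items():
    _, used = auto_transform(vals)
    print(name, "->", used)
```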

18.
Eval Rev ; 42(4): 458-488, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30442034

ABSTRACT

BACKGROUND: When conducting a randomized controlled trial, it is common to specify in advance the statistical analyses that will be used to analyze the data. Typically, these analyses will involve adjusting for small imbalances in baseline covariates. However, this poses a dilemma, as adjusting for too many covariates can hurt precision more than it helps, and it is often unclear which covariates are predictive of outcome prior to conducting the experiment. OBJECTIVES: This article aims to produce a covariate adjustment method that allows for automatic variable selection, so that practitioners need not commit to any specific set of covariates prior to seeing the data. RESULTS: In this article, we propose the "leave-one-out potential outcomes" estimator. We leave out each observation and then impute that observation's treatment and control potential outcomes using a prediction algorithm such as a random forest. In addition to allowing for automatic variable selection, this estimator is unbiased under the Neyman-Rubin model, generally performs at least as well as the unadjusted estimator, and the experimental randomization largely justifies the statistical assumptions made.
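A minimal sketch of the leave-one-out idea under a Bernoulli(p) assignment: for each observation, both potential outcomes are imputed from the remaining data with a random forest, and the imputations are combined with an inverse-probability weighting that keeps the estimate unbiased regardless of prediction quality. The forest settings and simulated data are illustrative assumptions, not the published implementation.

```python
# Sketch of a leave-one-out estimator that imputes both potential outcomes.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n, p = 80, 0.5
X = rng.normal(size=(n, 5))
T = rng.binomial(1, p, size=n)                    # randomized treatment assignment
Y = X[:, 0] + 1.0 * T + rng.normal(size=n)        # true average effect = 1.0 (toy data)

tau_i = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                      # leave observation i out
    Xk, Tk, Yk = X[keep], T[keep], Y[keep]
    rf_t = RandomForestRegressor(n_estimators=50, random_state=0).fit(Xk[Tk == 1], Yk[Tk == 1])
    rf_c = RandomForestRegressor(n_estimators=50, random_state=0).fit(Xk[Tk == 0], Yk[Tk == 0])
    t_hat = rf_t.predict(X[i:i + 1])[0]           # imputed treated outcome for unit i
    c_hat = rf_c.predict(X[i:i + 1])[0]           # imputed control outcome for unit i
    m_hat = (1 - p) * t_hat + p * c_hat
    tau_i[i] = (T[i] - p) / (p * (1 - p)) * (Y[i] - m_hat)

print(round(tau_i.mean(), 2))                     # estimate of the average treatment effect
```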
