Search | Virtual Health Library

Towards cross-application model-agnostic federated cohort discovery.

Dobbins, Nicholas J; Morris, Michele; Sadhu, Eugene; MacFadden, Douglas; Nazaire, Marc-Danie; Simons, William; Weber, Griffin; Murphy, Shawn; Visweswaran, Shyam.

J Am Med Inform Assoc ; 2024 Aug 07.

Article in English | MEDLINE | ID: mdl-39110920

ABSTRACT

OBJECTIVES: To demonstrate that 2 popular cohort discovery tools, Leaf and the Shared Health Research Information Network (SHRINE), are readily interoperable. Specifically, we adapted Leaf to interoperate and function as a node in a federated data network that uses SHRINE and dynamically generate queries for heterogeneous data models. MATERIALS AND METHODS: SHRINE queries are designed to run on the Informatics for Integrating Biology & the Bedside (i2b2) data model. We created functionality in Leaf to interoperate with a SHRINE data network and dynamically translate SHRINE queries to other data models. We randomly selected 500 past queries from the SHRINE-based national Evolve to Next-Gen Accrual to Clinical Trials (ENACT) network for evaluation, and an additional 100 queries to refine and debug Leaf's translation functionality. We created a script for Leaf to convert the terms in the SHRINE queries into equivalent structured query language (SQL) concepts, which were then executed on 2 other data models. RESULTS AND DISCUSSION: 91.1% of the generated queries for non-i2b2 models returned counts within 5% (or ±5 patients for counts under 100) of i2b2, with 91.3% recall. Of the 8.9% of queries that exceeded the 5% margin, 77 of 89 (86.5%) were due to errors introduced by the Python script or the extract-transform-load process, which are easily fixed in a production deployment. The remaining errors were due to Leaf's translation function, which was later fixed. CONCLUSION: Our results support that cohort discovery applications such as Leaf and SHRINE can interoperate in federated data networks with heterogeneous data models.
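
The agreement criterion in the abstract above (counts within 5%, or within ±5 patients for counts under 100) can be sketched as a small check. This is only an illustration, not the authors' evaluation script: the function name `counts_agree` is hypothetical, and it assumes that "counts under 100" refers to the i2b2 reference count and that both bounds are inclusive.

```python
def counts_agree(reference: int, candidate: int) -> bool:
    """Return True if a translated query's count is within tolerance
    of the i2b2 reference count.

    Assumptions (not stated in the abstract): the "under 100" rule is
    keyed on the reference count, and both bounds are inclusive.
    """
    diff = abs(candidate - reference)
    if reference < 100:
        # Small cohorts: absolute tolerance of 5 patients.
        return diff <= 5
    # Larger cohorts: relative tolerance of 5%, in integer arithmetic
    # to avoid floating-point rounding at the boundary.
    return diff * 100 <= 5 * reference


def agreement_rate(pairs: list[tuple[int, int]]) -> float:
    """Fraction of (reference, candidate) pairs that agree."""
    if not pairs:
        raise ValueError("no count pairs supplied")
    return sum(counts_agree(r, c) for r, c in pairs) / len(pairs)
```

Under these assumptions, a reference count of 1000 tolerates candidate counts from 950 to 1050, while a reference count of 50 tolerates 45 to 55.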

RNA-SeQC: RNA-seq metrics for quality control and process optimization.

DeLuca, David S; Levin, Joshua Z; Sivachenko, Andrey; Fennell, Timothy; Nazaire, Marc-Danie; Williams, Chris; Reich, Michael; Winckler, Wendy; Getz, Gad.

Bioinformatics ; 28(11): 1530-2, 2012 Jun 01.

Article in English | MEDLINE | ID: mdl-22539670

ABSTRACT

UNLABELLED: RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3'/5' bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis. AVAILABILITY AND IMPLEMENTATION: See www.genepattern.org to run online, or www.broadinstitute.org/rna-seqc/ for a command line tool.

Subjects

High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Gene Expression Profiling , Gene Library , Internet , Quality Control , RNA/genetics , RNA, Ribosomal/genetics

Joint modeling and registration of cell populations in cohorts of high-dimensional flow cytometric data.

Pyne, Saumyadipta; Lee, Sharon X; Wang, Kui; Irish, Jonathan; Tamayo, Pablo; Nazaire, Marc-Danie; Duong, Tarn; Ng, Shu-Kay; Hafler, David; Levy, Ronald; Nolan, Garry P; Mesirov, Jill; McLachlan, Geoffrey J.

PLoS One ; 9(7): e100334, 2014.

Article in English | MEDLINE | ID: mdl-24983991

ABSTRACT

In biomedical applications, an experimenter encounters different potential sources of variation in data such as individual samples, multiple experimental conditions, and multivariate responses of a panel of markers such as from a signaling network. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. While computational methods can identify cell populations in individual samples, without the ability to automatically match them across samples, it is difficult to compare and characterize the populations in typical experiments, such as those responding to various stimulations or distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template--used for registering populations across samples, and classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts. Software for fitting the JCM models has been implemented in an R package, EMMIX-JCM, available from http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/.

Subjects

Computational Biology/methods , Flow Cytometry , Software , Algorithms , Cluster Analysis , Computer Simulation , Humans

GenePattern flow cytometry suite.

Spidlen, Josef; Barsky, Aaron; Breuer, Karin; Carr, Peter; Nazaire, Marc-Danie; Hill, Barbara Allen; Qian, Yu; Liefeld, Ted; Reich, Michael; Mesirov, Jill P; Wilkinson, Peter; Scheuermann, Richard H; Sekaly, Rafick-Pierre; Brinkman, Ryan R.

Source Code Biol Med ; 8(1): 14, 2013 Jul 03.

Article in English | MEDLINE | ID: mdl-23822732

ABSTRACT

BACKGROUND: Traditional flow cytometry data analysis is largely based on interactive and time-consuming analysis of a series of two-dimensional representations of up to 20-dimensional data. Recent technological advances have increased the amount of data generated by the technology and outpaced the development of data analysis approaches. While there are advanced tools available, including many R/BioConductor packages, these are only accessible programmatically and therefore out of reach for most experimentalists. GenePattern is a powerful genomic analysis platform with over 200 tools for analysis of gene expression, proteomics, and other data. A web-based interface provides easy access to these tools and allows the creation of automated analysis pipelines enabling reproducible research. RESULTS: In order to bring advanced flow cytometry data analysis tools to experimentalists without programmatic skills, we developed the GenePattern Flow Cytometry Suite. It contains 34 open source GenePattern flow cytometry modules covering methods from basic processing of flow cytometry standard (i.e., FCS) files to advanced algorithms for automated identification of cell populations, normalization and quality assessment. Internally, these modules leverage functionality developed in R/BioConductor. Using the GenePattern web-based interface, they can be connected to build analytical pipelines. CONCLUSIONS: GenePattern Flow Cytometry Suite brings advanced flow cytometry data analysis capabilities to users with minimal computer skills. Functionality previously available only to skilled bioinformaticians is now easily accessible from a web browser.
