SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.

Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San

Affiliation

Kuksa PP; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Lee CY; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Amlie-Wolf A; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Gangadharan P; Genomics and Computational Biology Graduate Group.
Mlynarski EE; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Chou YF; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Lin HJ; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Issen H; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Greenfest-Allen E; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Valladares O; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Leung YY; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Wang LS; Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Bioinformatics; 36(12): 3879-3881, 2020 06 01.

Article in En | MEDLINE | ID: mdl-32330239

ABSTRACT

SUMMARY:

We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.

AVAILABILITY AND IMPLEMENTATION:

SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.

CONTACT:

lswang@pennmedicine.upenn.edu

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
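
The integration step described in the summary, overlapping GWAS variants with functional genomics annotations, can be illustrated with a toy sketch. This is not SparkINFERNO's API or algorithm: the record layouts, the function names (`build_index`, `overlaps`, `annotate`), the sample data and the 5e-8 significance cutoff are all assumptions made for illustration, and a plain in-memory sorted index stands in for the Spark and Giggle machinery the abstract mentions.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical toy inputs; real summary statistics and annotation tracks are far larger.
variants = [
    {"id": "var1", "chrom": "chr1", "pos": 150, "pvalue": 3e-9},
    {"id": "var2", "chrom": "chr1", "pos": 900, "pvalue": 1e-8},
    {"id": "var3", "chrom": "chr2", "pos": 75, "pvalue": 0.2},
]

# Assumed layout: (chrom, start, end, annotation type, tissue), half-open [start, end).
annotations = [
    ("chr1", 100, 200, "enhancer", "brain"),
    ("chr1", 120, 180, "tf_binding", "liver"),
    ("chr2", 50, 100, "enhancer", "blood"),
]


def build_index(intervals):
    """Group intervals by chromosome and sort each group by start position."""
    index = defaultdict(list)
    for chrom, start, end, kind, tissue in intervals:
        index[chrom].append((start, end, kind, tissue))
    for chrom in index:
        index[chrom].sort()
    return index


def overlaps(index, chrom, pos):
    """Return (type, tissue) for every interval on chrom that contains pos."""
    ivs = index.get(chrom, [])
    starts = [iv[0] for iv in ivs]
    # Only intervals starting at or before pos can contain it.
    hi = bisect_right(starts, pos)
    return [(kind, tissue) for start, end, kind, tissue in ivs[:hi] if pos < end]


def annotate(variants, annotations, p_threshold=5e-8):
    """Annotate variants below the (assumed) significance threshold."""
    index = build_index(annotations)
    return {
        v["id"]: overlaps(index, v["chrom"], v["pos"])
        for v in variants
        if v["pvalue"] < p_threshold
    }


result = annotate(variants, annotations)
print(result)
```

In this sketch a significant variant with no overlapping annotation is kept with an empty list, so that "significant but unannotated" stays distinguishable from "not significant". A distributed version would partition the same lookup across workers rather than hold one index in memory.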

Subjects

Genome-Wide Association Study; Quantitative Trait Loci; Algorithms; Genomics; Software


Full text: 1 Database: MEDLINE Main subject: Quantitative Trait Loci / Genome-Wide Association Study Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2020 Document type: Article
