SeqHBase: a big data toolset for family based sequencing data analysis.
He, Min; Person, Thomas N; Hebbring, Scott J; Heinzen, Ethan; Ye, Zhan; Schrodi, Steven J; McPherson, Elizabeth W; Lin, Simon M; Peissig, Peggy L; Brilliant, Murray H; O'Rawe, Jason; Robison, Reid J; Lyon, Gholson J; Wang, Kai.
Affiliation
  • He M; Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA; Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA; Department of Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, M
  • Person TN; Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA.
  • Hebbring SJ; Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA; Department of Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, Wisconsin, USA.
  • Heinzen E; College of Science and Engineering, University of Minnesota-Twin Cities, Minnesota, Minnesota, USA.
  • Ye Z; Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA.
  • Schrodi SJ; Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA; Department of Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, Wisconsin, USA.
  • McPherson EW; Department of Medical Genetics Services, Marshfield Clinic, Marshfield, Wisconsin, USA.
  • Lin SM; The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, USA.
  • Peissig PL; Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA.
  • Brilliant MH; Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA; Department of Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, Wisconsin, USA.
  • O'Rawe J; Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, New York, USA.
  • Robison RJ; Utah Foundation for Biomedical Research, Provo, Utah, USA.
  • Lyon GJ; Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, New York, USA; Utah Foundation for Biomedical Research, Provo, Utah, USA.
  • Wang K; Utah Foundation for Biomedical Research, Provo, Utah, USA; Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California, USA; Department of Psychiatry, University of Southern California, Los Angeles, California, USA.
J Med Genet; 52(4): 282-8, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25587064
ABSTRACT

BACKGROUND:

Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis.

METHODS:

Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation).
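
The three inheritance patterns named above can be illustrated with a minimal sketch for a single parent-child trio. This is not SeqHBase's implementation: the function names, the genotype encoding (count of alternate alleles: 0, 1 or 2), the dictionary keys and the simplified rules (carrier parents for the homozygous case, gene-level grouping for the compound heterozygous case) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical encoding: each genotype is the number of alternate alleles
# (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate).


def is_de_novo(child, father, mother):
    """Child carries the variant; neither parent does."""
    return child >= 1 and father == 0 and mother == 0


def is_inherited_homozygous(child, father, mother):
    """Child is homozygous alternate; both parents are heterozygous carriers."""
    return child == 2 and father == 1 and mother == 1


def compound_het_genes(variants):
    """Return genes where the child has at least one heterozygous variant
    inherited only from the father and at least one only from the mother.

    `variants` is a list of dicts with the (assumed) keys
    'id', 'gene', 'child', 'father' and 'mother'.
    """
    paternal = defaultdict(list)
    maternal = defaultdict(list)
    for v in variants:
        if v["child"] != 1:
            continue
        if v["father"] >= 1 and v["mother"] == 0:
            paternal[v["gene"]].append(v["id"])
        elif v["mother"] >= 1 and v["father"] == 0:
            maternal[v["gene"]].append(v["id"])
    # Keep only genes hit on both the paternal and the maternal side.
    return {
        gene: (paternal[gene], maternal[gene])
        for gene in paternal
        if gene in maternal
    }
```

A genotype of 0 in a parent is only meaningful if that site was actually sequenced in the parent. The abstract lists BAM-derived coverage at every site as an input; a fuller version of this sketch would check parental coverage before accepting a de novo call.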

RESULTS:

We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times scaled almost linearly with the number of data nodes. With 20 data nodes, SeqHBase took about 5 seconds to analyse WES familial data and approximately 1 minute to analyse WGS familial data.

CONCLUSIONS:

These results demonstrate SeqHBase's high efficiency and scalability, which are necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders.

Full text: 1 Database: MEDLINE Main subject: Software / Sequence Analysis, DNA / Genomics Limits: Humans Language: English Journal: J Med Genet Year: 2015 Document type: Article