Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer.

Voskanian, Alin; Katsonis, Panagiotis; Lichtarge, Olivier; Pejaver, Vikas; Radivojac, Predrag; Mooney, Sean D; Capriotti, Emidio; Bromberg, Yana; Wang, Yanran; Miller, Max; Martelli, Pier Luigi; Savojardo, Castrense; Babbi, Giulia; Casadio, Rita; Cao, Yue; Sun, Yuanfei; Shen, Yang; Garg, Aditi; Pal, Debnath; Yu, Yao; Huff, Chad D; Tavtigian, Sean V; Young, Erin; Neuhausen, Susan L; Ziv, Elad; Pal, Lipika R; Andreoletti, Gaia; Brenner, Steven E; Kann, Maricel G.

Hum Mutat ; 40(9): 1612-1622, 2019 09.

Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.

Subject(s)

Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic or Latino/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Exome Sequencing

HIVE-heptagon: A sensible variant-calling algorithm with post-alignment quality controls.

Simonyan, Vahan; Chumakov, Konstantin; Donaldson, Eric; Karagiannis, Konstantinos; Lam, Phuc VinhNguyen; Dingerdissen, Hayley; Voskanian, Alin.

Genomics ; 109(3-4): 131-140, 2017 07.

Article in English | MEDLINE | ID: mdl-28188908

ABSTRACT

Advances in high-throughput sequencing (HTS) technologies have greatly increased the availability of genomic data and potential discovery of clinically significant genomic variants. However, numerous issues still exist with the analysis of these data, including data complexity, the absence of formally agreed upon best practices, and inconsistent reproducibility. Toward a more robust and reproducible variant-calling paradigm, we propose a series of selective noise filtrations and post-alignment quality control (QC) techniques that may reduce the rate of false variant calls. We have implemented both novel and refined post-alignment QC mechanisms to augment existing pre-alignment QC measures. These techniques can be used independently or in combination to identify and correct issues caused during data generation or early analysis stages. The adoption of these procedures by the broader scientific community is expected to improve the identification of clinically significant variants both in terms of computational efficiency and in the confidence of the results. AVAILABILITY: https://hive.biochemistry.gwu.edu/.

Subject(s)

Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Genetic , Quality Control , Genomics/methods , Humans , Reproducibility of Results , Sequence Analysis, DNA/methods

High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis.

Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-26989153

ABSTRACT

The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , User-Computer Interface , Computational Biology , Mutation/genetics , Poliovirus/genetics , Poliovirus Vaccines/immunology , Proteomics , Recombination, Genetic , Sequence Alignment , Statistics as Topic
