Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-32368312

ABSTRACT

BACKGROUND: Understanding the genetic basis of cancer risk is a major international endeavor. The emergence of next-generation sequencing (NGS) in the late 2000s further accelerated the discovery of cancer susceptibility genes. Targeted NGS-based multigene testing panels have proven a viable option for comprehensive analysis of cancer susceptibility genes, provided they accurately and robustly detect a wide range of clinically relevant variants in the targeted genes. METHODS: We developed and validated a targeted NGS-based test for hereditary cancer risk assessment on Illumina's NGS platform, analyzing the protein-coding regions of 35 hereditary cancer genes with a bioinformatics pipeline that follows standard practices in the field. This 35-gene hereditary cancer panel is designed to identify germline cancer-causing mutations for 8 cancers: breast, ovarian, prostate, uterine, colorectal, pancreatic, and stomach cancers, and melanoma. The panel was validated using well-characterized DNA specimens from the NIGMS Human Genetic Cell Repository, extracted from the blood of individuals whose genetic variants had previously been characterized by the 1000 Genomes Project and the Coriell Catalog. RESULTS: The 35-gene hereditary cancer panel shows high sensitivity (99.9%) and specificity (100%) across 4820 variants, including single nucleotide variants (SNVs) and small insertions and deletions (indels; up to 25 bp). Reproducibility and repeatability are 99.8% and 100%, respectively. CONCLUSIONS: Targeted NGS-based multigene testing panels are considered a viable option for comprehensive analysis of cancer susceptibility genes. In the present study, we developed and validated a 35-gene NGS panel for testing 8 common cancers. The performance of the panel was assessed across a broad range of variants in the 35 genes to support clinical use.
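The sensitivity and specificity figures above follow the standard confusion-matrix definitions. A minimal sketch, using hypothetical counts rather than the study's actual data:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true variants the assay detects: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of true reference (non-variant) positions reported as reference: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only; not the study's confusion matrix.
tp, fn = 995, 1
tn, fp = 100_000, 0
print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # sensitivity = 99.9%
print(f"specificity = {specificity(tn, fp):.1%}")  # specificity = 100.0%
```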

2.
Proc Natl Acad Sci U S A ; 114(30): 8059-8064, 2017 07 25.
Article in English | MEDLINE | ID: mdl-28674023

ABSTRACT

The HLA gene complex on human chromosome 6 is one of the most polymorphic regions in the human genome and contributes in large part to the diversity of the immune system. Accurate typing of HLA genes with short-read sequencing data has historically been difficult due to the sequence similarity between the polymorphic alleles. Here, we introduce an algorithm, xHLA, that iteratively refines the mapping results at the amino acid level to achieve 99-100% four-digit typing accuracy for both class I and II HLA genes, taking only about 3 min to process a 30× whole-genome BAM file on a desktop computer.


Subject(s)
Histocompatibility Testing/methods , Algorithms , Benchmarking , Humans
3.
PLoS One ; 9(7): e102383, 2014.
Article in English | MEDLINE | ID: mdl-25025225

ABSTRACT

BACKGROUND: Isoniazid (INH) is a highly effective antibiotic central to the treatment of Mycobacterium tuberculosis (MTB). INH-resistant MTB clinical isolates are frequently mutated in the katG gene and the inhA promoter region, but 10 to 37% of INH-resistant clinical isolates have no detectable alterations in currently known gene targets associated with INH resistance. We aimed to identify novel genes associated with INH resistance in these latter isolates. METHODOLOGY/PRINCIPAL FINDINGS: INH-resistant clinical isolates of MTB were pre-screened for mutations in the katG, inhA, kasA and ndh genes and in the regulatory regions of inhA and ahpC. Twelve INH-resistant isolates with no mutations in these regions and 17 INH-susceptible MTB isolates were subjected to whole genome sequencing. After phylogenetically related variants and synonymous mutations were excluded, further analysis revealed mutations in 60 genes and 4 intergenic regions associated with INH resistance. Sanger sequencing of 45 of these genes confirmed that mutations in 40 genes occurred only in INH-resistant isolates and not in INH-susceptible isolates. The ratios of non-synonymous to synonymous mutations (dN/dS ratios) for the INH-resistance-associated mutations identified in this study were 1.234 for INH-resistant and 0.654 for INH-susceptible isolates, strongly suggesting that these mutations are indeed associated with INH resistance. CONCLUSION: The novel targets associated with INH resistance described in this study may be important for developing improved molecular detection strategies.
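The abstract reports dN/dS ratios but does not say how synonymous and non-synonymous sites were counted. A minimal Nei-Gojobori-style sketch, assuming the site counts are already available as inputs and offering an optional Jukes-Cantor correction; all counts are hypothetical:

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple substitutions at the same site (needs p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p-distance must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, syn_diffs: float,
          nonsyn_sites: float, syn_sites: float,
          correct: bool = True) -> float:
    """dN/dS from observed differences and precomputed (assumed) site counts."""
    p_n = nonsyn_diffs / nonsyn_sites  # proportion of non-synonymous sites that changed
    p_s = syn_diffs / syn_sites        # proportion of synonymous sites that changed
    if correct:
        p_n, p_s = jukes_cantor(p_n), jukes_cantor(p_s)
    return p_n / p_s


# Hypothetical counts: twice as many changes per non-synonymous site as per synonymous site.
print(round(dn_ds(24, 4, 3000, 1000, correct=False), 3))  # 2.0
```

By convention, a ratio above 1 is read as a signal of positive selection and a ratio below 1 as purifying selection, which is the contrast the abstract draws between the resistant and susceptible isolates.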


Subject(s)
Antitubercular Agents/pharmacology , Drug Resistance, Bacterial/genetics , Isoniazid/pharmacology , Mycobacterium tuberculosis/drug effects , Mutation , Mycobacterium tuberculosis/genetics , Phylogeny , Promoter Regions, Genetic
4.
BMC Genomics ; 15: 516, 2014 Jun 24.
Article in English | MEDLINE | ID: mdl-24962530

ABSTRACT

BACKGROUND: The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls when detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing. RESULTS: Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that TS could generate false negative indels under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling on the same data with the open-source variant callers GATK and SAMtools showed that false negatives could be minimised with appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer-associated errors, without compromising sensitivity. In our best-case scenario, which used the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and a 29% false discovery rate (FDR) in indel calling across all 23 samples, which is good performance for mutation screening with the PGM. CONCLUSIONS: New versions of TS, BWA and GATK show improved indel calling sensitivity and specificity over their older counterparts. However, the TS variant caller exhibits lower sensitivity than GATK and SAMtools.
Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the usage of this technology for future clinical genetic testing.
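Quality-by-Depth normalises a call's variant quality by read depth, so that high-depth sites cannot pass on raw QUAL alone. A minimal sketch of a QD filter: it assumes QD = QUAL / total site depth (GATK's own QD annotation normalises by the depth of samples carrying the variant), and the 2.0 cutoff mirrors a commonly used GATK hard-filter value rather than a threshold reported in this study. Coordinates and values are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    chrom: str
    pos: int
    qual: float  # Phred-scaled variant confidence (the VCF QUAL field)
    depth: int   # read depth at the site


def quality_by_depth(call: IndelCall) -> float:
    """QUAL normalised by depth; returns 0.0 when depth is zero."""
    if call.depth <= 0:
        return 0.0
    return call.qual / call.depth


def filter_by_qd(calls: list[IndelCall], min_qd: float = 2.0) -> list[IndelCall]:
    """Keep calls whose QD meets the (assumed) threshold."""
    return [c for c in calls if quality_by_depth(c) >= min_qd]


# Hypothetical calls: a confident indel and a low-QD call at high depth.
calls = [
    IndelCall("chr17", 1000, qual=450.0, depth=150),  # QD = 3.0
    IndelCall("chr17", 2000, qual=90.0, depth=200),   # QD = 0.45
]
kept = filter_by_qd(calls)
print([c.pos for c in kept])  # [1000]
```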


Subject(s)
DNA Mutational Analysis/methods , Genes, BRCA1 , Genes, BRCA2 , INDEL Mutation , Sequence Analysis, DNA/methods , DNA Mutational Analysis/standards , Genetic Testing/methods , Genetic Testing/standards , Humans , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA/standards , Workflow
5.
PLoS Comput Biol ; 8(12): e1002798, 2012.
Article in English | MEDLINE | ID: mdl-23236268

ABSTRACT

Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converges to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation, as well as the limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure, providing a model capable of predicting the tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of the input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but that together they allow accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model.
We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal predictions and 50% for spatial predictions. To our knowledge, this is the first genome-wide approach to predict tissue-specific gene expression in metazoan development, and our results suggest that integrative models of this type will become more prevalent in the future.


Subject(s)
Chromatin/metabolism , Gene Expression , Models, Theoretical , Transcription Factors/metabolism , Algorithms , Animals , Bayes Theorem , Drosophila/genetics , Enhancer Elements, Genetic
6.
PLoS One ; 7(9): e45798, 2012.
Article in English | MEDLINE | ID: mdl-23029247

ABSTRACT

The emergence of benchtop sequencers has made clinical genetic testing using next-generation sequencing more feasible. Ion Torrent's PGM™ is one such benchtop sequencer that shows clinical promise in detecting single nucleotide variations (SNVs) and microindel variations (indels). However, the large number of false positive indels caused by the high frequency of homopolymer sequencing errors has impeded the PGM™'s use for clinical genetic testing. We performed an extensive analysis of PGM™ data from sequencing reads of the well-characterized genome of Escherichia coli strain DH10B and from sequences of the BRCA1 and BRCA2 genes in six germline samples. Three commonly used variant detection tools, SAMtools, Dindel, and GATK's Unified Genotyper, all had substantial false positive rates for indels. By incorporating filters on two major measures, we dramatically reduced false positive rates without sacrificing sensitivity. The two measures were B-Allele Frequency (BAF) and VARiation of the Width of gaps and inserts (VARW) per indel position. A BAF threshold applied to indels detected by UnifiedGenotyper removed ~99% of the indel errors detected in both the DH10B and BRCA sequences. The optimum BAF threshold for BRCA sequences was determined by requiring 100% detection sensitivity and a minimum false discovery rate, using variants detected by Sanger sequencing as the reference. This left 15 indel errors, 7 of which were removed by selecting a VARW threshold of zero. VARW-specific errors increased in frequency with higher read depth in the BRCA datasets, suggesting that homopolymer-associated indel errors cannot be reduced by increasing the depth of coverage. Thus, a VARW threshold is likely to be important in reducing indel errors in data with higher coverage.
In conclusion, BAF and VARW thresholds provide simple and effective filtering criteria that can improve the specificity of indel detection in PGM™ data without compromising sensitivity.
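The BAF threshold selection described above (100% sensitivity against Sanger-confirmed variants, then minimum FDR) can be sketched as follows. This assumes BAF is the fraction of reads supporting the indel allele. Once every reference indel is retained, true positives are fixed, so raising the threshold can only remove false positives; the largest threshold that keeps all reference indels therefore minimises FDR. Indel IDs and values are hypothetical, and VARW is not modelled here:

```python
def b_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at the site supporting the indel allele."""
    return alt_reads / total_reads if total_reads else 0.0


def choose_baf_threshold(bafs: dict[str, float], truth: set[str]) -> float:
    """Largest threshold that still keeps every reference (e.g. Sanger-confirmed) indel."""
    missing = truth - set(bafs)
    if missing:
        # 100% sensitivity is impossible if a true indel was never called.
        raise ValueError(f"reference indels not called: {sorted(missing)}")
    return min(bafs[k] for k in truth)


def apply_threshold(bafs: dict[str, float], threshold: float) -> set[str]:
    """Indels whose BAF meets the threshold."""
    return {k for k, b in bafs.items() if b >= threshold}


# Hypothetical calls: three true indels and two low-BAF errors.
bafs = {"indel_a": 0.48, "indel_b": 0.05, "indel_c": 0.52,
        "indel_d": 0.12, "indel_e": 0.35}
truth = {"indel_a", "indel_c", "indel_e"}
t = choose_baf_threshold(bafs, truth)
print(t, sorted(apply_threshold(bafs, t)))  # 0.35 ['indel_a', 'indel_c', 'indel_e']
```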


Subject(s)
DNA Mutational Analysis/instrumentation , INDEL Mutation , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Escherichia coli/genetics , False Positive Reactions , Gene Frequency , Genome, Bacterial , Haploidy , Humans , Polymorphism, Single Nucleotide , Sensitivity and Specificity , Software
7.
J Mol Diagn ; 14(6): 602-12, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22921312

ABSTRACT

In a clinical setting, next-generation sequencing (NGS) approaches for the enrichment and resequencing of DNA targets may have limitations in throughput, cost, or accuracy. We evaluated an NGS workflow for targeted DNA sequencing for mutation detection. Targeted sequence data of the BRCA1 and BRCA2 genes, generated with a PCR-based, multiplexed NGS approach on the SOLiD 4 (n = 24) and Ion Torrent PGM (n = 20) next-generation sequencers, were evaluated against sequence data obtained by Sanger sequencing. The overall sensitivities of SOLiD and PGM were 97.8% (95% CI = 94.7 to 100.0) and 98.9% (95% CI = 96.8 to 100.0), respectively. The specificity of the SOLiD platform was high, at 100.0% (95% CI = 99.3 to 100.0). PGM correctly identified all 3 indels but also called 68 false-positive indels. Equimolar normalization of amplicons was not necessary for successful NGS. Both platforms are highly amenable to scale-up, potentially reducing the reagent cost for BRCA testing to
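The abstract gives 95% confidence intervals for sensitivity and specificity but does not state how they were computed. As an illustration only, the Wilson score interval is one standard choice for a binomial proportion; the counts below are hypothetical:

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clip guards against tiny floating-point overshoot at p = 0 or p = 1.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical: 45 of 46 Sanger-confirmed variants detected.
low, high = wilson_interval(45, 46)
print(f"sensitivity {45 / 46:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Unlike the simple Wald interval, the Wilson interval stays within [0, 1] and behaves sensibly when the observed proportion is at or near 100%, which is common for sensitivity and specificity estimates.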

Subject(s)
BRCA2 Protein/genetics , DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Mutation , DNA Mutational Analysis/economics , Genes, BRCA2 , High-Throughput Nucleotide Sequencing/economics , Humans , Sensitivity and Specificity
8.
Ann N Y Acad Sci ; 1158: 215-23, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19348643

ABSTRACT

In the DREAM2 community-wide experiment on regulatory network inference, one of the challenges was to identify which genes, in a list of 200, are direct regulatory targets of the transcription factor BCL6. The organizers of the challenge defined targets based on gene expression and chromatin immunoprecipitation experiments (ChIP-chip). The expression data were publicly available; the ChIP-chip data were not. In order to assess the likelihood that a gene is a BCL6 target, we used three classes of information: expression-level differences, over-representation of sequence motifs in promoter regions, and gene ontology annotations. A weight was attached to each analysis based on how well it identified BCL6-bound genes as defined by publicly available ChIP-chip data. By the organizers' criteria, our group, GenomeSingapore, performed best. However, our retrospective analysis indicates that this success was dominated by a gene expression analysis that was predicated on a regulatory model known to be favored by the organizers. We also noted that the 200-gene test set was enriched only in genes that are upregulated, while genes bound by BCL6 are enriched in both upregulated and downregulated genes. Together, these observations suggest possible model biases in the selection of the gold-standard gene set and imply that our success was attained in part by adhering to the same assumptions. We argue that model biases of this type are unavoidable in the inference of regulatory networks and, for that reason, we suggest that future community-wide experiments of this type should focus on the prediction of data, rather than models.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Transcription Factors/metabolism , Algorithms , Animals , Chromatin Immunoprecipitation , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Humans , Models, Biological , Oligonucleotide Array Sequence Analysis , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , ROC Curve , Repressor Proteins/genetics , Repressor Proteins/metabolism
9.
J Comput Biol ; 16(2): 357-68, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19193152

ABSTRACT

We have developed a method for inferring condition-specific targets of transcription factors by ranking genes both by gene expression change and by predicted transcription factor occupancy. The average of these two ranks, used as a test statistic, allows target genes to be inferred in a stringent manner. The method complements chromatin immunoprecipitation experiments by predicting targets under many conditions for which ChIP experiments have not been performed. We used the method to predict targets of 102 yeast transcription factors in approximately 1600 expression microarray experiments. The reliability of the method is suggested by the strong enrichment of genes previously shown to be bound, by the validation of binding to novel targets, by the way transcription factors with similar specificities can be functionally distinguished, and by the greater-than-expected number of regulatory network motifs, such as auto-regulatory interactions, that arise from new, predicted interactions. Combining ChIP data with the targets inferred from this analysis yields a high-confidence regulatory network that includes many novel interactions. Interestingly, we find only a weak association between conditions in which we can infer the activity of a transcription factor and conditions in which the transcription factor gene itself is regulated. Thus, methods that rely on transcription factor regulation to help define regulatory interactions may miss regulatory relationships that are detected by the method reported here.
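The average-rank statistic described above can be sketched in a few lines. This sketch assumes expression change is ranked by absolute log ratio (rank 1 = largest), occupancy is ranked by predicted score (rank 1 = highest), and ties are broken by gene name; it does not reproduce how the study assessed significance. Gene names and values are hypothetical:

```python
def rank_desc(scores: dict[str, float]) -> dict[str, int]:
    """Rank 1 = highest score; ties broken by gene name so results are deterministic."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def average_rank(expr_change: dict[str, float],
                 occupancy: dict[str, float]) -> dict[str, float]:
    """Mean of the expression-change rank and the occupancy rank; lower = stronger candidate."""
    genes = set(expr_change) & set(occupancy)
    r_expr = rank_desc({g: abs(expr_change[g]) for g in genes})  # magnitude of change
    r_occ = rank_desc({g: occupancy[g] for g in genes})
    return {g: (r_expr[g] + r_occ[g]) / 2 for g in genes}


# Hypothetical log ratios and predicted occupancy scores for four genes.
expr = {"G1": 3.1, "G2": -2.5, "G3": 0.2, "G4": 1.0}
occ = {"G1": 0.9, "G2": 0.1, "G3": 0.8, "G4": 0.85}
scores = average_rank(expr, occ)
print(sorted(scores, key=scores.get))  # ['G1', 'G4', 'G2', 'G3']
```

A gene scores well only if it ranks highly on both lists, which is what makes the combined statistic more stringent than either ranking alone.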


Subject(s)
Gene Expression Regulation , Promoter Regions, Genetic , Transcription Factors/metabolism , Algorithms , Chromatin Immunoprecipitation , Cluster Analysis , Oligonucleotide Array Sequence Analysis/methods , Reproducibility of Results , Transcription Factors/genetics
10.
Mol Cell ; 32(6): 878-87, 2008 Dec 26.
Article in English | MEDLINE | ID: mdl-19111667

ABSTRACT

The sequence specificity of DNA-binding proteins is the primary mechanism by which the cell recognizes genomic features. Here, we describe systematic determination of yeast transcription factor DNA-binding specificities. We obtained binding specificities for 112 DNA-binding proteins representing 19 distinct structural classes. One-third of the binding specificities have not been previously reported. Several binding sequences have striking genomic distributions relative to transcription start sites, supporting their biological relevance and suggesting a role in promoter architecture. Among these are Rsc3 binding sequences, containing the core CGCG, which are found preferentially approximately 100 bp upstream of transcription start sites. Mutation of RSC3 results in a dramatic increase in nucleosome occupancy in hundreds of proximal promoters containing a Rsc3 binding element, but has little impact on promoters lacking Rsc3 binding sequences, indicating that Rsc3 plays a broad role in targeting nucleosome exclusion at yeast promoters.
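Locating the Rsc3 core sequence CGCG relative to transcription start sites amounts to a motif scan whose hits are reported as offsets from the TSS. A minimal single-promoter sketch; the fragment and TSS position are hypothetical, and a genome-wide analysis would aggregate offsets across many promoters to look for enrichment near -100 bp:

```python
def motif_offsets(seq: str, tss_index: int, motif: str = "CGCG") -> list[int]:
    """Start positions of motif matches relative to the TSS (negative = upstream).

    Overlapping matches are reported. CGCG is its own reverse complement, so a
    single-strand scan also finds every match on the opposite strand.
    """
    seq, motif = seq.upper(), motif.upper()
    offsets = []
    start = seq.find(motif)
    while start != -1:
        offsets.append(start - tss_index)
        start = seq.find(motif, start + 1)  # advance by one to allow overlaps
    return offsets


# Hypothetical promoter fragment; the TSS is taken to be at index 20.
promoter = "AAA" + "CGCG" + "T" * 10 + "CGCG" + "AAAA"
print(motif_offsets(promoter, tss_index=20))  # [-17, -3]
```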


Subject(s)
DNA-Binding Proteins/metabolism , Nucleosomes/metabolism , Promoter Regions, Genetic , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Transcription Factors/genetics , Base Sequence , Binding Sites , Genes, Fungal , Molecular Sequence Data , Mutation/genetics , Phylogeny , Reproducibility of Results , Sequence Homology, Amino Acid , Transcription Factors/metabolism
11.
Cell ; 133(6): 1106-17, 2008 Jun 13.
Article in English | MEDLINE | ID: mdl-18555785

ABSTRACT

Transcription factors (TFs) and their specific interactions with targets are crucial for specifying gene-expression programs. To gain insights into the transcriptional regulatory networks in embryonic stem (ES) cells, we use chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) to map the locations of 13 sequence-specific TFs (Nanog, Oct4, STAT3, Smad1, Sox2, Zfx, c-Myc, n-Myc, Klf4, Esrrb, Tcfcp2l1, E2f1, and CTCF) and 2 transcription regulators (p300 and Suz12). These factors are known to play different roles in ES-cell biology as components of the LIF and BMP signaling pathways, self-renewal regulators, and key reprogramming factors. Our study provides insights into the integration of the signaling pathways into the ES-cell-specific transcription circuitries. Intriguingly, we find specific genomic regions extensively targeted by different TFs. Collectively, the comprehensive mapping of TF-binding sites identifies important features of the transcriptional regulatory networks that define ES-cell identity.


Subject(s)
Embryonic Stem Cells/metabolism , Gene Regulatory Networks , Signal Transduction , Animals , Base Sequence , Binding Sites , Chromatin Immunoprecipitation , Genome , Kruppel-Like Factor 4 , Mice , Multiprotein Complexes , Transcription Factors/metabolism
12.
PLoS One ; 2(8): e776, 2007 Aug 22.
Article in English | MEDLINE | ID: mdl-17712424

ABSTRACT

BACKGROUND: Cellular signaling involves a sequence of events, from ligand binding to membrane receptors, through transcription factor activation, to the induction of mRNA expression. The transcriptional regulatory system plays a pivotal role in the control of gene expression. Here we present a novel computational approach to the study of gene regulation circuits. METHODOLOGY: Based on the concept of a finite state machine, which provides a discrete view of gene regulation, we developed a novel sequential logic model (SLM) to decipher the control mechanisms of dynamic transcriptional regulation of gene expression. The SLM technique is also used to systematically analyze the dynamic function of transcriptional inputs and the dependency and cooperativity (such as synergy) among binding sites with respect to when, how much, and how fast the gene of interest is expressed. PRINCIPAL FINDINGS: SLM is verified using a well-studied set of expression data on endo16 of Strongylocentrotus purpuratus (sea urchin) during embryonic midgut development. A dynamic regulatory mechanism for endo16 expression controlled by three binding sites, UI, R, and Otx, is identified and shown to be consistent with experimental findings. Furthermore, we show that during the transition from specification to differentiation in the wild-type endo16 expression profile, SLM reveals that three binary activities are not sufficient to explain the transcriptional regulation of endo16 expression, and that additional binding-site activities are required. Further analyses suggest a detailed mechanism for the R switch activity, in which an indirect dependency occurs between UI activity and the R switch during the specification-to-differentiation stage. CONCLUSIONS/SIGNIFICANCE: The sequential logic formalism simplifies regulatory network dynamics by moving from a continuous to a discrete representation of gene activation in time.
In effect, our SLM is non-parametric and model-independent, yet provides rich biological insight. Demonstrating the efficacy of this approach on endo16 is a promising step toward further application of the proposed method.
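What distinguishes sequential from combinational logic is memory: the next state depends on both the current binding-site inputs and the current state. The toy finite state machine below illustrates only that idea. Its transition rule is hypothetical and is not the endo16 mechanism inferred in the study; UI, R and Otx are simply reused as input names:

```python
from typing import NamedTuple


class Inputs(NamedTuple):
    ui: bool   # binary activity of the UI site
    r: bool    # binary activity of the R site
    otx: bool  # binary activity of the Otx site


def next_state(on: bool, x: Inputs) -> bool:
    """Hypothetical transition rule for illustration only."""
    if not on:
        return x.ui and x.otx  # switching on requires UI and Otx together
    return x.ui or x.r         # once on, either UI or R maintains expression


def simulate(initial: bool, trace: list[Inputs]) -> list[bool]:
    """Expression state (on/off) at each time step, starting from `initial`."""
    states = [initial]
    for x in trace:
        states.append(next_state(states[-1], x))
    return states


trace = [Inputs(True, False, False), Inputs(True, False, True),
         Inputs(False, True, False), Inputs(False, False, True)]
print(simulate(False, trace))  # [False, False, True, True, False]
```

At the third step, R alone keeps the gene on only because it was already on. The same input applied to an off gene leaves it off, and that history dependence is what a purely combinational model cannot express.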


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Models, Genetic , Transcription, Genetic , Animals , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Cell Differentiation , Enhancer Elements, Genetic , Morphogenesis/physiology , Mutagenesis , Sea Urchins/physiology
13.
In Silico Biol ; 7(1): 61-75, 2007.
Article in English | MEDLINE | ID: mdl-17688428

ABSTRACT

p53 is probably the most important tumor suppressor known. Over the years, information about this gene has increased dramatically. We have built a comprehensive knowledgebase of p53 that aims to help wet-lab biologists formulate their experiments, newcomers learn what they need about the gene, and bioinformaticians make new discoveries through data analysis. Using the curated information, including mutations, transcription factors, transcriptional targets, and single nucleotide polymorphisms, we performed extensive bioinformatics analysis and made several new discoveries about p53. We identified point missense mutations that are over-represented in cancers but lack functional studies. By assessing how well tag SNPs selected from HapMap for six p53 transcriptional targets capture SNPs obtained from the National Institute of Environmental Health Sciences (NIEHS) Environmental Genome Project, and vice versa, we conclude that NIEHS data are a better source for tag SNP selection for these genes in future association studies. Analysis of microRNA regulation in the transcriptional network of the p53 gene reveals potentially important regulatory relationships between oncogenic microRNAs and transcription factors of p53. By mapping transcription factors of p53 to pathways involved in the cell cycle and apoptosis, we identified distinctive transcriptional controls of p53 in these two physiological states.
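Tag SNP "capture" is usually scored by linkage disequilibrium: a target SNP counts as captured if its r² with at least one tag SNP reaches a threshold. A minimal sketch, assuming r² is approximated by the squared Pearson correlation of unphased genotype dosages (0/1/2) and using the common r² ≥ 0.8 convention; the abstract does not state the study's exact criterion, and the genotypes are hypothetical:

```python
def r_squared(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation of genotype dosages; 0.0 if either SNP is monomorphic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def tag_coverage(tags: dict[str, list[int]], targets: dict[str, list[int]],
                 threshold: float = 0.8) -> tuple[float, list[str]]:
    """Fraction of target SNPs with r^2 >= threshold to at least one tag SNP."""
    captured = [name for name, g in targets.items()
                if any(r_squared(g, t) >= threshold for t in tags.values())]
    return len(captured) / len(targets), captured


# Hypothetical genotypes for six individuals.
tags = {"tag1": [0, 1, 2, 1, 0, 2]}
targets = {"snpA": [0, 1, 2, 1, 0, 2],   # identical to tag1
           "snpB": [2, 1, 0, 1, 2, 0],   # perfectly anti-correlated, r^2 = 1
           "snpC": [0, 0, 1, 2, 1, 0]}   # uncorrelated with tag1
frac, captured = tag_coverage(tags, targets)
print(f"{frac:.2f}", captured)  # 0.67 ['snpA', 'snpB']
```

Swapping the roles of `tags` and `targets` gives the "vice versa" comparison the abstract describes, in which each SNP source is used to select tags for the other.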


Subject(s)
Genes, p53 , MicroRNAs/genetics , Mutation, Missense , Point Mutation , Polymorphism, Genetic , Tumor Suppressor Protein p53/metabolism , Apoptosis , Codon , Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Transcription, Genetic