ABSTRACT
Bacterial leaf streak of rice, caused by Xanthomonas oryzae pv. oryzicola (Xoc), is an increasingly important yield constraint in this staple crop. A mesophyll colonizer, Xoc differs from X. oryzae pv. oryzae (Xoo), which invades xylem to cause bacterial blight of rice. Both produce multiple distinct TAL effectors, type III-delivered proteins that transactivate effector-specific host genes. A TAL effector finds its target(s) via a partially degenerate code whereby the modular effector amino acid sequence identifies nucleotide sequences to which the protein binds. Virulence contributions of some Xoo TAL effectors have been shown, and their relevant targets, susceptibility (S) genes, identified, but the role of TAL effectors in leaf streak is uncharacterized. We used host transcript profiling to compare leaf streak to blight and to probe functions of Xoc TAL effectors. We found that Xoc and Xoo induce almost completely different host transcriptional changes. Roughly one in three genes upregulated by the pathogens is preceded by a candidate TAL effector binding element. Experimental analysis of the 44 such genes predicted to be Xoc TAL effector targets verified nearly half, and identified most others as false predictions. None of the Xoc targets is a known bacterial blight S gene. Mutational analysis revealed that Tal2g, which activates two genes, contributes to lesion expansion and bacterial exudation. Use of designer TAL effectors discriminated a sulfate transporter gene as the S gene. Across all targets, basal expression tended to be higher than genome-average, and induction moderate. Finally, machine learning applied to real vs. falsely predicted targets yielded a classifier that recalled 92% of the real targets with 88% precision, providing a tool for better target prediction in the future. Our study expands the number of known TAL effector targets, identifies a new class of S gene, and improves our ability to predict functional targeting.
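The "partially degenerate code" mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simplified table of commonly reported associations between repeat-variable diresidues (RVDs) and nucleotides, and a T immediately upstream of each site. The table, the function name `find_candidate_ebes`, the RVD list, and the promoter fragment are all illustrative assumptions, not the prediction method or data used in the study.

```python
# Simplified RVD-to-nucleotide preferences (an illustrative subset, not a
# full binding model; real predictors weight each association).
RVD_CODE = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "HG": "T",
    "NN": "GA",    # degenerate: G or A
    "NS": "ACGT",  # degenerate: any nucleotide
    "N*": "CT",    # degenerate: C or T
}


def find_candidate_ebes(rvds, promoter, max_mismatches=0):
    """Return (start, site, mismatches) for candidate sites on the forward strand.

    Assumption: each site must be preceded by a T (position 0).
    """
    promoter = promoter.upper()
    n = len(rvds)
    hits = []
    for start in range(1, len(promoter) - n + 1):
        if promoter[start - 1] != "T":
            continue
        site = promoter[start:start + n]
        # Count positions where the base is not allowed by the RVD at that repeat.
        mismatches = sum(base not in RVD_CODE[rvd] for rvd, base in zip(rvds, site))
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits


rvds = ["HD", "NI", "NG", "NN", "NS"]  # hypothetical effector
promoter = "GGTCATGAAATCATATGC"        # hypothetical promoter fragment
hits = find_candidate_ebes(rvds, promoter)
relaxed_hits = find_candidate_ebes(rvds, promoter, max_mismatches=2)
```

Because several RVDs accept more than one base, a single effector matches many sequences, which is one reason a purely code-based scan tends to overpredict targets.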
Subjects
Bacterial Proteins/genetics , Plant Genes , Host-Pathogen Interactions/genetics , Oryza/microbiology , Plant Diseases/genetics , Xanthomonas/genetics , Amino Acid Sequence , Base Sequence , DNA Mutational Analysis , Disease Resistance , Plant Gene Expression Regulation , Gene Knockout Techniques , Oligonucleotide Array Sequence Analysis , Plant Leaves/microbiology , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Identifying regions of loss-of-heterozygosity (LOH) in a tumor sample is a challenging problem. State-of-the-art computational approaches can infer LOH from single-nucleotide polymorphism (SNP) array data, but calling precise boundaries is complicated by normal-cell contamination and markers that are homozygous in the germline and therefore non-informative. More recently, the focus has shifted to pinpointing the loci recurrently affected by LOH events across multiple tumors. Recurrent LOH regions often harbor genes important for tumor suppression. Here, we propose a method that infers LOH rates across an entire sample set on an SNP-by-SNP basis. Our method achieves this by leveraging the straightforward principle that, by definition, LOH depletes heterozygotes, thereby disrupting Hardy-Weinberg equilibrium. We apply a statistical test for such LOH-influenced disruptions, and derive a maximum-likelihood estimator for the LOH rate based on the observed number of heterozygotes. This accounts for LOH in both its hemizygous deletion and copy-neutral forms, and does not make use of matched normal genotypes. Power simulations show high levels of sensitivity for the statistical test, and application to a control normal-tissue data set demonstrates a low false-discovery rate. We apply the method to three large publicly available tumor SNP array data sets, where it is able to localize tumor-suppressor gene targets of the LOH events. Inferred LOH rates are quite concordant across platforms/laboratories and between cell lines and tumors, but in a tumor type-dependent fashion. Finally, we produce rate estimates that are generally higher than previously published, and provide evidence that the latter are likely underestimates.
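The principle that LOH depletes heterozygotes relative to Hardy-Weinberg expectation can be made concrete with a small sketch. It assumes a simplified model in which a fraction r of heterozygous samples lose one allele at random, so allele frequencies are preserved on average and the heterozygote fraction becomes 2pq(1 - r). Under that assumption the maximum-likelihood estimate is one minus the ratio of observed to expected heterozygotes. The function name and genotype counts are hypothetical, and the estimator in the paper may differ, for example in how it handles normal-cell contamination.

```python
def estimate_loh_rate(n_aa, n_ab, n_bb):
    """Estimate the LOH rate at one SNP from genotype counts across tumors.

    Assumed model: P(AA) = p^2 + r*p*q, P(AB) = 2*p*q*(1 - r),
    P(BB) = q^2 + r*p*q, where r is the LOH rate.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)     # allele frequency of A
    expected_het = 2 * p * (1 - p) * n  # heterozygotes expected under HWE
    if expected_het == 0:
        return 0.0                      # monomorphic SNP: non-informative
    # Clip at zero: an excess of heterozygotes is not evidence of LOH.
    return max(0.0, 1 - n_ab / expected_het)


# Hypothetical counts: 50 AA, 20 AB, 30 BB tumors.
rate = estimate_loh_rate(50, 20, 30)
```

With these counts the allele frequency is 0.6, so 48 heterozygotes are expected but only 20 are observed, giving an estimated rate of about 0.58.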
Subjects
Tumor Biomarkers/genetics , Linkage Disequilibrium/genetics , Loss of Heterozygosity/genetics , Neoplasms/genetics , Tumor Cell Line , Genetic Databases , False Positive Reactions , Tumor Suppressor Genes , Human Genome/genetics , Humans , Likelihood Functions , Neoplasms/enzymology , Class 2 Receptor-Like Protein Tyrosine Phosphatases/genetics
ABSTRACT
Cancer is a disease driven by a combination of inherited risk alleles coupled with the acquisition of somatic mutations, including amplification and deletion of genomic DNA. Potential relationships between the inherited and somatic aspects of the disease have only rarely been examined on a genome-wide level. Applying a novel integrative analysis of SNP and copy number measurements, we queried the tumor and normal-tissue genomes of 178 glioblastoma patients from the Cancer Genome Atlas project for preferentially amplified alleles, under the hypothesis that oncogenic germline variants will be selectively amplified in the tumor environment. Selected alleles are revealed by allelic imbalance in amplification across samples. This general approach is based on genetic principles and provides a method for identifying important tumor-related alleles. We find that SNP alleles that are most significantly overrepresented in amplicons tend to occur in genes involved with regulation of kinase and transferase activity, and many of these genes are known contributors to gliomagenesis. The analysis also implicates variants in synapse genes. By incorporating gene expression data, we demonstrate synergy between preferential allelic amplification and expression in DOCK4 and EGFR. Our results support the notion that combining germline and tumor genetic data can identify regions relevant to cancer biology.
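Allelic imbalance in amplification across samples can be tested in several ways, and the abstract does not name the statistic used. As one hedged illustration, the sketch below applies an exact two-sided binomial test to a single SNP: among heterozygous patients whose tumors amplify the locus, it asks whether one allele is amplified more often than the 50% expected by chance. The function name and the counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under the null success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Hypothetical SNP: 10 heterozygous patients with amplification,
# 9 of whom preferentially amplify allele A.
p_value = binomial_two_sided_p(9, 10)
```

In a genome-wide scan such a test would be repeated at every SNP, so the resulting p-values would need correction for multiple testing.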
Subjects
Alleles , Gene Amplification/genetics , Germ Cells/metabolism , Glioblastoma/genetics , Genetic Selection , ErbB Receptors/genetics , GTPase-Activating Proteins/genetics , Human Genome/genetics , Glioblastoma/pathology , Humans , Odds Ratio , Single Nucleotide Polymorphism/genetics
ABSTRACT
BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase 2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
Subjects
Tumor Biomarkers , Genetic Testing/methods , Genomics/methods , Neoplasms/genetics , Oncogenes , DNA Copy Number Variations , Genetic Testing/standards , Genomics/standards , Humans , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Mutation , Neoplasms/diagnosis , Single Nucleotide Polymorphism , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, more than 25,000 of which have less than 20% allele frequency, with 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
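The effect of admixing Sample A with Sample B can be approximated with simple dilution arithmetic. The sketch below assumes that a variant is absent from Sample B and that both samples contribute the same number of genome copies per unit of DNA at the locus, so the expected allele frequency scales linearly with the fraction of Sample A in the mix. The function names and the mixing ratio are illustrative assumptions, not the protocol used by the consortium.

```python
def admixed_vaf(vaf_in_a, fraction_a):
    """Expected allele frequency after diluting Sample A into Sample B.

    Assumes the variant is absent from Sample B and equal copy number
    at the locus in both samples.
    """
    if not 0 <= fraction_a <= 1:
        raise ValueError("fraction_a must be between 0 and 1")
    return vaf_in_a * fraction_a


def fraction_a_for_target(vaf_in_a, target_vaf):
    """Fraction of Sample A needed to reach a target allele frequency."""
    if target_vaf > vaf_in_a:
        raise ValueError("dilution cannot raise the allele frequency")
    return target_vaf / vaf_in_a


# Hypothetical example: a variant at 1% in Sample A, mixed 1 part A to 49 parts B.
diluted = admixed_vaf(0.01, 1 / 50)
needed = fraction_a_for_target(0.01, 0.0002)
```

Under these assumptions a 1% variant diluted 1:49 lands at 0.02%, the order of magnitude the abstract cites as suitable for liquid biopsy panels.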
Subjects
Alleles , Tumor Biomarkers , Gene Frequency , Genetic Testing/methods , Genetic Variation , Genomics/methods , Neoplasms/genetics , Tumor Cell Line , DNA Copy Number Variations , Genetic Heterogeneity , Genetic Testing/standards , Genomics/standards , Humans , Neoplasms/diagnosis , Workflow
ABSTRACT
Xanthomonas spp. reduce crop yields and quality worldwide. During infection of their plant hosts, many strains secrete transcription activator-like (TAL) effectors, which enter the host cell nucleus and activate specific corresponding host genes at effector binding elements (EBEs) in the promoter. TAL effectors may contribute to disease by activating the expression of susceptibility genes or trigger resistance associated with the hypersensitive reaction (HR) by activating an executor resistance (R) gene. The rice bacterial leaf streak pathogen X. oryzae pv. oryzicola (Xoc) is known to suppress host resistance, and no host R gene has been identified against it, despite considerable effort. To further investigate Xoc suppression of host resistance, we conducted a screen of effectors from BLS256 and identified Tal2a as an HR elicitor in rice when delivered heterologously by a strain of the closely related rice bacterial blight pathogen X. oryzae pv. oryzae (Xoo) or by the soybean pathogen X. axonopodis pv. glycines. The HR required the Tal2a activation domain, suggesting an executor R gene. Tal2a activity was differentially distributed among geographically diverse Xoc isolates, being largely conserved among Asian isolates. We identified four genes induced by Tal2a in next-generation RNA sequencing experiments and confirmed them using quantitative real-time reverse transcription-polymerase chain reaction (qPCR). However, neither individual nor collective activation of these genes by designer TAL effectors resulted in HR. A tal2a knockout mutant of BLS256 showed virulence comparable with the wild-type, but plasmid-based overexpression of tal2a at different levels in the wild-type reduced virulence in a directly corresponding way. Overall, the results reveal that host resistance suppression by Xoc plays a critical role in pathogenesis. 
Further, the dose-dependent avirulence activity of Tal2a and the apparent lack of a single canonical target that accounts for HR point to a novel, activation domain-dependent mode of action, which might involve, for example, a non-coding gene or a specific pattern of activation across multiple targets.
Subjects
Bacterial Proteins/metabolism , Disease Resistance , Oryza/immunology , Oryza/microbiology , Plant Diseases/microbiology , Transcription Activator-Like Effectors/metabolism , Xanthomonas/metabolism , Bacterial Proteins/chemistry , Geography , Oryza/enzymology , Oryza/genetics , Protein Domains , Transcription Activator-Like Effectors/chemistry , Ubiquitin Thiolesterase , Virulence , Xanthomonas/pathogenicity
ABSTRACT
Transcription activator-like (TAL) effectors from Xanthomonas citri subsp. malvacearum (Xcm) are essential for bacterial blight of cotton (BBC). Here, by combining transcriptome profiling with TAL effector-binding element (EBE) prediction, we show that GhSWEET10, encoding a functional sucrose transporter, is induced by Avrb6, a TAL effector determining Xcm pathogenicity. Activation of GhSWEET10 by designer TAL effectors (dTALEs) restores virulence of Xcm avrb6 deletion strains, whereas silencing of GhSWEET10 compromises cotton susceptibility to infections. A BBC-resistant line carrying an unknown recessive b6 gene bears the same EBE as the susceptible line, but Avrb6-mediated induction of GhSWEET10 is reduced, suggesting a unique mechanism underlying b6-mediated resistance. We show via an extensive survey of GhSWEET transcriptional responsiveness to different Xcm field isolates that additional GhSWEETs may also be involved in BBC. These findings advance our understanding of the disease and resistance in cotton and may facilitate the development of cotton with improved resistance to BBC.
Subjects
Gossypium/physiology , Membrane Transport Proteins/genetics , Plant Diseases/genetics , Plant Proteins/genetics , Transcription Activator-Like Effectors/metabolism , Xanthomonas/pathogenicity , Disease Resistance/genetics , Plant Gene Expression Regulation/physiology , Gossypium/microbiology , Membrane Transport Proteins/metabolism , Plant Proteins/metabolism , Genetic Promoter Regions/genetics
ABSTRACT
Xanthomonas oryzae pv. oryzicola (Xoc) causes the increasingly important disease bacterial leaf streak of rice (BLS) in part by type III delivery of repeat-rich transcription activator-like (TAL) effectors to upregulate host susceptibility genes. By pathogen whole genome, single molecule, real-time sequencing and host RNA sequencing, we compared TAL effector content and rice transcriptional responses across 10 geographically diverse Xoc strains. TAL effector content is surprisingly conserved overall, yet distinguishes Asian from African isolates. Five TAL effectors are conserved across all strains. In a prior laboratory assay in rice cv. Nipponbare, only two contributed to virulence in strain BLS256, but the strict conservation indicates all five may be important, in different rice genotypes or in the field. Concatenated and aligned, TAL effector content across strains largely reflects relationships based on housekeeping genes, suggesting predominantly vertical transmission. Rice transcriptional responses did not reflect these relationships, and on average, only 28% of genes upregulated and 22% of genes downregulated by a strain are up- and downregulated (respectively) by all strains. However, when only known TAL effector targets were considered, the relationships resembled those of the TAL effectors. Toward identifying new targets, we used the TAL effector-DNA recognition code to predict effector binding elements in promoters of genes upregulated by each strain, but found that for every strain, all upregulated genes had at least one. Filtering with a classifier we developed previously decreases the number of predicted binding elements across the genome, suggesting that it may reduce false positives among upregulated genes.
Applying this filter and eliminating genes for which upregulation did not strictly correlate with presence of the corresponding TAL effector, we generated testable numbers of candidate targets for four of the five strictly conserved TAL effectors.
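The second filtering step described here, keeping only genes whose upregulation strictly tracks the presence of the corresponding TAL effector, can be sketched as a set comparison. The function name, strain labels, and gene names below are hypothetical, and the sketch takes the predicted targets as given rather than reproducing the classifier.

```python
def strictly_correlated_targets(upregulated_by_strain, strains_with_effector,
                                predicted_targets):
    """Keep predicted targets induced by exactly the strains carrying the effector.

    upregulated_by_strain maps each strain to the set of genes it upregulates.
    """
    carriers = set(strains_with_effector)
    kept = []
    for gene in sorted(predicted_targets):
        induced_in = {
            strain for strain, genes in upregulated_by_strain.items()
            if gene in genes
        }
        # Strict correlation: induced in every carrier and in no other strain.
        if induced_in == carriers:
            kept.append(gene)
    return kept


# Hypothetical data for three strains and one effector present in S1 and S2.
upregulated = {
    "S1": {"geneA", "geneB", "geneC"},
    "S2": {"geneA", "geneB"},
    "S3": {"geneB"},
}
candidates = strictly_correlated_targets(
    upregulated, ["S1", "S2"], {"geneA", "geneB", "geneC"}
)
```

For an effector conserved in every strain, the carrier set is the full strain set, so a candidate survives only if all strains upregulate it.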
ABSTRACT
Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. 
An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
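The Matthews correlation coefficient (MCC) reported in this abstract summarizes a binary confusion matrix in a single value between -1 and 1. The sketch below shows the standard formula applied to per-residue labels, where 1 marks an RNA-binding residue. The function names and the toy labels are illustrative and are not taken from the RB44 or RB111 test sets.

```python
from math import sqrt


def confusion_counts(true_labels, predicted_labels):
    """Return (tp, fp, tn, fn) for two equal-length sequences of 0/1 labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy example: 8 residues, 1 = RNA-binding.
true_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
score = mcc(*confusion_counts(true_labels, predicted))
```

Because interface residues are usually a minority of a protein, MCC is often preferred over accuracy, which a predictor could inflate by labeling every residue as non-binding.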