ABSTRACT
BACKGROUND: The beauty and power of the CRISPR-Cas9 endonuclease genome editing system lies in its RNA programmability: Cas9 can be guided to any genomic locus complementary to a 20-nt single guide RNA (sgRNA), where it cleaves double-stranded DNA and allows desired mutations to be introduced. Unfortunately, it has been repeatedly reported that the sgRNA can also guide Cas9 to off-target sites whose DNA sequence is homologous to the sgRNA. RESULTS: Using the human genome and Streptococcus pyogenes Cas9 (SpCas9) as an example, this article mathematically analyzed the probability of off-target homology for sgRNAs and found that, for a genome as large as the human genome, potential off-target homologies are unavoidable in sgRNA selection. A highly efficient computational algorithm was developed for whole-genome sgRNA design and off-target homology search. By using a dynamically constructed, sequence-indexed database and a simplified sequence alignment method, the algorithm achieves high efficiency while guaranteeing that all existing potential off-target homologies are identified. With this algorithm, 1,876,775 sgRNAs were designed for the 19,153 human mRNA genes, and only two of them were found to be free of off-target homology. CONCLUSIONS: Using the novel and efficient sgRNA homology search algorithm introduced in this article, genome-wide sgRNA design and off-target analysis were performed, and the results confirmed the mathematical analysis: it is almost impossible for an sgRNA sequence to escape potential off-target homologies. Future innovations in CRISPR-Cas9 gene editing technology need to focus on eliminating Cas9 off-target activity.
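The abstract does not state the probability model or the search algorithm, so the following Python sketch is illustrative only and is not the authors' method. It assumes a uniform random genome (each base A, C, G, T independent with probability 1/4), under which the expected number of sites within m mismatches of an n-nt guide, over s strands of a genome of size G, is E = s · G · p_PAM · Σ_{k=0..m} C(n, k) · 3^k / 4^n. For the search, it swaps in a standard pigeonhole seed index as a stand-in for the "sequence-indexed database": if a site has at most m mismatches, at least one of m + 1 disjoint guide segments must match exactly, so exact k-mer lookups yield a complete candidate set that is then verified by Hamming distance. All function names and parameters (expected_random_hits, find_homologies, pam_prob, max_mm) are hypothetical; the sketch counts mismatches only (no DNA/RNA bulges) and ignores the PAM during search.

```python
from __future__ import annotations

from collections import defaultdict
from math import comb


def expected_random_hits(genome_size: float, guide_len: int = 20, max_mm: int = 3,
                         pam_prob: float = 1.0, strands: int = 2) -> float:
    """Expected sites within max_mm mismatches of a fixed guide, assuming a
    uniform i.i.d. random genome (toy model, not the article's analysis)."""
    # Probability that one fixed window matches with <= max_mm mismatches.
    p = sum(comb(guide_len, k) * 3 ** k for k in range(max_mm + 1)) / 4 ** guide_len
    return strands * genome_size * p * pam_prob


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the genome to the list of its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def split_points(length: int, parts: int) -> list[tuple[int, int]]:
    """Split range(length) into `parts` contiguous (offset, size) segments."""
    bounds = [length * i // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1] - bounds[i]) for i in range(parts)]


def find_homologies(genome: str, guide: str, max_mm: int = 3) -> list[tuple[int, str, int]]:
    """Return sorted (start, strand, mismatches) for every site within max_mm.

    Pigeonhole seeding makes the candidate set complete; each candidate is
    then verified by a full Hamming-distance comparison.
    """
    n = len(guide)
    segments = split_points(n, max_mm + 1)
    indexes: dict[int, dict[str, list[int]]] = {}  # built lazily per segment size
    hits = set()
    for strand, query in (("+", guide), ("-", revcomp(guide))):
        for offset, size in segments:
            if size not in indexes:
                indexes[size] = build_kmer_index(genome, size)
            for pos in indexes[size].get(query[offset:offset + size], []):
                start = pos - offset  # align the seed hit back to the guide start
                if 0 <= start <= len(genome) - n:
                    mm = hamming(genome[start:start + n], query)
                    if mm <= max_mm:
                        hits.add((start, strand, mm))
    return sorted(hits)
```

Under this toy model, expected_random_hits(3.1e9, max_mm=3) gives roughly 180 expected sites per 20-nt guide, or roughly 11 if an NGG PAM is also required (pam_prob=1/16). This is consistent in direction with the abstract's conclusion, although the article's own definition of off-target homology may differ.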
Subjects
Algorithms; CRISPR-Cas Systems; RNA, Guide, Kinetoplastida/genetics; Sequence Analysis, RNA/methods; Streptococcus pyogenes/genetics; Genome, Human; Genomics; Humans; RNA Editing
ABSTRACT
BACKGROUND: The green tea polyphenol epigallocatechin-3-gallate (EGCG) has been shown in experimental studies to inhibit cancer through its antioxidant activity and through modulation of cellular functions by binding specific proteins. Using computational analysis and functional genomic approaches, we previously identified a set of protein-coding genes and microRNAs whose expression was significantly modulated by EGCG treatment in tobacco carcinogen-induced lung adenocarcinoma in A/J mice. However, the extent to which these genes are involved in EGCG's cancer inhibition remains unclear. RESULTS: In this study, we further used statistical methods and literature review to analyze these data together with The Cancer Genome Atlas (TCGA) lung adenocarcinoma datasets for additional data mining. Assuming that, if a gene mediates EGCG's cancer inhibition, the change in its expression caused by EGCG should be opposite to the change that occurs during carcinogenesis, we identified Myb and Peg3 as the primary candidate genes involved in the cancer-inhibitory activity. Further analysis suggested that the regulation of Myb could be mediated by an EGCG-upregulated microRNA, miR-449c-5p. CONCLUSIONS: Although the actions of EGCG involve multiple targets and pathways, further mining of the existing genomic datasets revealed that the upregulation of Myb and Peg3 is likely the key anti-cancer event of EGCG in vivo.
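The selection rule stated in the RESULTS (a mediating gene's EGCG-induced expression change should oppose its carcinogenesis-associated change) can be sketched as a simple filter on log fold changes. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name opposing_genes, the min_abs_lfc threshold, and the use of log2 fold changes (tumor vs. normal, and EGCG-treated vs. untreated) are all hypothetical, and significance testing and the TCGA cross-species mapping are omitted.

```python
from __future__ import annotations


def opposing_genes(tumor_lfc: dict[str, float], egcg_lfc: dict[str, float],
                   min_abs_lfc: float = 1.0) -> list[str]:
    """Return genes whose EGCG-induced change opposes the tumor-associated change.

    tumor_lfc: hypothetical log2 fold change, tumor vs. normal tissue.
    egcg_lfc:  hypothetical log2 fold change, EGCG-treated vs. untreated tumor.
    """
    selected = []
    for gene in sorted(tumor_lfc.keys() & egcg_lfc.keys()):
        t, e = tumor_lfc[gene], egcg_lfc[gene]
        # Both changes must be large enough and point in opposite directions.
        if abs(t) >= min_abs_lfc and abs(e) >= min_abs_lfc and t * e < 0:
            selected.append(gene)
    return selected
```

For example, with made-up values tumor = {"GeneA": 2.0, "GeneB": -1.5} and egcg = {"GeneA": -1.1, "GeneB": -2.0}, only GeneA is returned, because GeneB changes in the same direction under both conditions.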