ABSTRACT
BACKGROUND: The domestic dog, Canis lupus familiaris, is both a companion animal for humans and an animal model in cancer research, because dogs develop cancers spontaneously, as humans do. Despite the social and biological importance of dogs, the catalogue of canine genomic variations and transcripts remains relatively incomplete. RESULTS: We developed CanISO, a new database that holds a large collection of transcriptome profiles and genomic variations for domestic dogs. CanISO provides 87,692 novel transcript isoforms and 60,992 known isoforms from whole transcriptome sequencing of canine tumors (N = 157) and their matched normal tissues (N = 64). CanISO also provides genomic variation information for 210,444 unique germline single nucleotide polymorphisms (SNPs) from whole exome sequencing of 183 dogs, with a query system that searches gene- and transcript-level information as well as covered SNPs. Transcriptome profiles can be compared with the corresponding human transcript isoforms at the tissue level, or between sample groups to identify tumor-specific gene expression and alternative splicing patterns. CONCLUSIONS: CanISO is expected to increase understanding of the dog genome and transcriptome, as well as their functional associations with humans, such as shared and distinct mechanisms of cancer. CanISO is publicly available at https://www.kobic.re.kr/caniso/ .
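Comparing tumor-specific alternative splicing between sample groups, as CanISO supports, is often framed in terms of isoform usage: each isoform's share of its gene's total expression. The minimal sketch below assumes per-sample transcript-level expression (for example TPM) for a single gene and uses hypothetical isoform IDs; it illustrates the idea and is not CanISO's own pipeline, which the abstract does not describe.

```python
from statistics import mean


def isoform_fractions(expression):
    """Convert one sample's isoform expression (e.g. TPM) for a single gene into fractions summing to 1."""
    total = sum(expression.values())
    if total == 0:
        return {isoform: 0.0 for isoform in expression}
    return {isoform: value / total for isoform, value in expression.items()}


def delta_isoform_usage(tumor_samples, normal_samples):
    """Mean isoform fraction in tumor samples minus mean fraction in normal samples, per isoform."""
    tumor = [isoform_fractions(sample) for sample in tumor_samples]
    normal = [isoform_fractions(sample) for sample in normal_samples]
    isoforms = tumor[0].keys()
    return {
        isoform: mean(f[isoform] for f in tumor) - mean(f[isoform] for f in normal)
        for isoform in isoforms
    }


# Hypothetical isoforms of one gene: tumors shift usage from tx1 toward tx2.
tumors = [{"tx1": 10, "tx2": 30}, {"tx1": 20, "tx2": 20}]
normals = [{"tx1": 30, "tx2": 10}]
print(delta_isoform_usage(tumors, normals))  # {'tx1': -0.375, 'tx2': 0.375}
```

A positive difference means an isoform makes up a larger share of the gene's output in tumors. A real analysis would add a statistical test across samples; this sketch reports only the effect size.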
Subject(s)
Neoplasms , Wolves , Dogs , Animals , Humans , Transcriptome , Wolves/genetics , Genome , Genomics , Neoplasms/genetics , Neoplasms/veterinary , Protein Isoforms/genetics
ABSTRACT
Three-dimensional (3D) genome organization is tightly coupled with gene regulation in various biological processes and diseases. In cancer, various types of large-scale genomic rearrangements can disrupt the 3D genome, leading to oncogenic gene expression. However, unraveling the pathogenicity of the 3D cancer genome remains a challenge, because closer examination has been greatly limited by the lack of tools specialized for disorganized higher-order chromatin structure. Here, we updated the 3D-genome Interaction Viewer and database 3DIV by uniformly processing ~230 billion raw Hi-C reads to extend its contents to the 3D cancer genome. The updates to 3DIV are as follows: (i) a collection of 401 samples, comprising 220 cancer cell line/tumor Hi-C datasets, 153 normal cell line/tissue Hi-C datasets and 28 promoter capture Hi-C datasets; (ii) live interactive manipulation of the 3D cancer genome to simulate the impact of structural variations; and (iii) reconstruction of Hi-C contact maps in a user-defined chromosome order to investigate the 3D genome of complex genomic rearrangements. In summary, the updated 3DIV will be the most comprehensive resource for exploring the gene regulatory effects of both the normal and the cancer 3D genome. '3DIV' is freely available at http://3div.kr.
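Reconstructing a contact map in a user-defined chromosome order amounts to permuting the bins of a contact matrix and, for an inverted segment, reversing the bins inside it. The minimal sketch below assumes a small dense matrix and segments given as hypothetical named bin ranges; it only relocates observed contacts and does not model the new contacts a real rearrangement would create at its breakpoints, so it should not be read as 3DIV's implementation.

```python
def reorder_contact_map(matrix, segments, order):
    """
    Re-index a square Hi-C contact matrix to follow a user-defined segment order.

    matrix:   list of lists of contact counts between bins.
    segments: dict name -> (start_bin, end_bin), half-open bin ranges.
    order:    list of (segment_name, inverted) pairs describing the new arrangement.
    """
    new_index = []
    for name, inverted in order:
        start, end = segments[name]
        bins = list(range(start, end))
        if inverted:
            bins.reverse()  # an inversion flips the bin order inside the segment
        new_index.extend(bins)
    # Observed contacts are carried over unchanged; only their positions move.
    return [[matrix[i][j] for j in new_index] for i in new_index]


contacts = [
    [1, 2, 3],
    [2, 4, 5],
    [3, 5, 6],
]
segments = {"A": (0, 1), "B": (1, 3)}
print(reorder_contact_map(contacts, segments, [("B", False), ("A", False)]))
# [[4, 5, 2], [5, 6, 3], [2, 3, 1]]
```

Passing `("B", True)` instead places segment B first in reversed orientation, which is one simple way to preview an inversion.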
Subject(s)
Computational Biology , Databases, Genetic , Genomics , Neoplasms/genetics , Computational Biology/methods , Epigenomics/methods , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study/methods , Genomics/methods , Humans , Software
ABSTRACT
High-throughput screening based on CRISPR-Cas9 libraries has become an attractive and powerful technique for identifying target genes for functional studies. However, public data remain hard to access because of the lack of user-friendly utilities and up-to-date resources covering experiments from third parties. Here, we describe iCSDB, an integrated database of CRISPR screening experiments using human cell lines. We compiled two major sources of CRISPR-Cas9 screens: the DepMap portal and BioGRID ORCS. The DepMap portal is itself an integrated database that includes three large-scale CRISPR screening projects. We additionally aggregated CRISPR screens from BioGRID ORCS, a collection of screening results from PubMed articles. Currently, iCSDB contains 1375 genome-wide screens across 976 human cell lines, covering 28 tissues and 70 cancer types. Importantly, batch effects arising from different CRISPR libraries were removed, and the screening scores were converted into a single metric that estimates knockout efficiency. Clinical and molecular information was also integrated to help users readily select cell lines of interest. Furthermore, we implemented various interactive tools and viewers that help users choose, examine and compare screen results at both the gene and guide RNA levels. iCSDB is available at https://www.kobic.re.kr/icsdb/.
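The abstract states that library batch effects were removed and scores were converted into a single metric, but it does not give the method. As a deliberately simple stand-in, the sketch below converts gene-level scores to z-scores within each library so that screens with different dynamic ranges share one scale. The library and gene names are hypothetical, and this is not iCSDB's actual correction procedure.

```python
from statistics import mean, pstdev


def standardize_by_library(scores):
    """
    scores: dict library_name -> dict gene -> raw gene-level screen score.
    Returns the same structure with scores converted to within-library z-scores,
    so screens run with different libraries share one scale.
    """
    standardized = {}
    for library, gene_scores in scores.items():
        values = list(gene_scores.values())
        mu, sd = mean(values), pstdev(values)
        standardized[library] = {
            gene: (score - mu) / sd if sd > 0 else 0.0
            for gene, score in gene_scores.items()
        }
    return standardized


raw = {
    "libA": {"GENE1": 1.0, "GENE2": 3.0},    # hypothetical small dynamic range
    "libB": {"GENE1": 10.0, "GENE2": 30.0},  # hypothetical large dynamic range
}
print(standardize_by_library(raw))
# {'libA': {'GENE1': -1.0, 'GENE2': 1.0}, 'libB': {'GENE1': -1.0, 'GENE2': 1.0}}
```

Z-scoring assumes each library covers a broadly comparable set of genes; if that assumption fails, a reference-gene-based scaling would be a more defensible choice.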
Subject(s)
CRISPR-Cas Systems/genetics , Databases, Genetic , Gene Editing/methods , Gene Targeting/methods , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics/methods , Cell Line, Tumor , Humans , Internet , Web Browser
ABSTRACT
BACKGROUND & AIMS: Squalene epoxidase (SQLE), a rate-limiting enzyme in cholesterol biosynthesis, has been suggested to be a proto-oncogene. Paradoxically, SQLE is degraded by excess cholesterol, and low SQLE is associated with aggressive colorectal cancer (CRC). Therefore, we studied the functional consequences of SQLE reduction in CRC progression. METHODS: Gene and protein expression data and clinical features of CRCs were obtained from public databases and from 293 human tissues analyzed by immunohistochemistry. In vitro studies were performed to investigate the mechanisms of CRC progression mediated by SQLE reduction. Mice were fed a 2% high-cholesterol diet or a control diet before and after cecal implantation of SQLE-knockdown or control CRC cells. Metastatic dissemination and circulating cancer stem cells were assessed by in vivo tracking and flow cytometry analysis, respectively. RESULTS: In vitro studies showed that SQLE reduction helped cancer cells overcome constraints by inducing the epithelial-mesenchymal transition required to generate cancer stem cells. Surprisingly, SQLE interacted with GSK3β and p53. Active GSK3β contributed to the stability of SQLE, thereby increasing cellular cholesterol content, whereas SQLE depletion disrupted the GSK3β/p53 complex, resulting in a metastatic phenotype. This was confirmed in a spontaneous CRC metastasis mouse model, in which SQLE reduction, by a high-cholesterol regimen or genetic knockdown, strikingly promoted CRC aggressiveness through the production of migratory cancer stem cells. CONCLUSIONS: We showed that SQLE reduction caused by cholesterol accumulation aggravates CRC progression via activation of the β-catenin oncogenic pathway and deactivation of the p53 tumor suppressor pathway. Our findings provide new insights into the link between cholesterol and CRC, identifying SQLE as a key regulator of CRC aggressiveness and a prognostic biomarker.
Subject(s)
Cholesterol/metabolism , Colorectal Neoplasms/pathology , Squalene Monooxygenase/metabolism , Adult , Aged , Animals , Cell Line, Tumor , Colon/pathology , Disease Models, Animal , Female , Gene Knockdown Techniques , Glycogen Synthase Kinase 3 beta/metabolism , Humans , Intestinal Mucosa/pathology , Male , Mice , Middle Aged , Neoplastic Stem Cells/pathology , Oxidation-Reduction , Proto-Oncogene Mas , Rectum/pathology , Squalene Monooxygenase/genetics , Tumor Suppressor Protein p53/metabolism , Young Adult , beta Catenin/metabolism
ABSTRACT
Fusion genes represent an important class of biomarkers and therapeutic targets in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data (ChimerSeq) and text mining of publications (ChimerPub) with extensive manual annotations (ChimerKB). In this update, all three modules have been substantially enhanced by incorporating the recent flood of deep sequencing data and related publications. ChimerSeq now covers all 10 565 patients in the TCGA project, compiling computational results from two reliable programs, STAR-Fusion and FusionScan, together with several public resources. In sum, ChimerSeq includes 65 945 fusion candidates, 21 106 of which were predicted by multiple programs (ChimerSeq-Plus). ChimerPub has been upgraded by applying a deep learning method for text mining followed by extensive manual curation, which yielded 1257 fusion genes, including 777 with experimental support (ChimerPub-Plus). ChimerKB includes 1597 fusion genes with publication support, experimental evidence and breakpoint information. Importantly, we implemented several new features to aid estimation of functional significance, including a fusion structure viewer with domain information, a gene expression plot comparing fusion-positive and fusion-negative patients, and a STRING network viewer. The user interface was also greatly enhanced by applying responsive web design. ChimerDB 4.0 is available at http://www.kobic.re.kr/chimerdb/.
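ChimerSeq-Plus keeps the fusion candidates predicted by multiple programs. A minimal sketch of such a consensus step is shown below, assuming each program's output has already been reduced to hypothetical (sample, 5' gene, 3' gene) tuples; the sample IDs are illustrative, and a production pipeline would likely also compare breakpoint coordinates rather than gene pairs alone.

```python
from collections import defaultdict


def consensus_fusions(calls, min_programs=2):
    """
    calls: dict program_name -> iterable of (sample_id, five_prime_gene, three_prime_gene).
    Returns the fusion candidates reported by at least `min_programs` programs, sorted.
    """
    support = defaultdict(set)  # candidate -> set of programs that reported it
    for program, fusions in calls.items():
        for fusion in fusions:
            support[fusion].add(program)
    return sorted(f for f, programs in support.items() if len(programs) >= min_programs)


calls = {
    "STAR-Fusion": [("S1", "EML4", "ALK"), ("S2", "BCR", "ABL1")],
    "FusionScan": [("S1", "EML4", "ALK"), ("S3", "TMPRSS2", "ERG")],
}
print(consensus_fusions(calls))  # [('S1', 'EML4', 'ALK')]
```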
Subject(s)
Biomarkers, Tumor/genetics , Computational Biology , Data Management , Databases, Genetic , Neoplasms/genetics , Data Mining , Humans , Neoplasms/therapy , Software , User-Computer Interface
ABSTRACT
SUMMARY: Predictive biomarkers for patient stratification play critical roles in realizing the paradigm of precision medicine. Molecular characteristics such as somatic mutations and expression signatures are the primary source of putative biomarker genes for patient stratification. However, evaluating such candidate biomarkers is still cumbersome and requires multistep procedures, especially when using massive public omics data. Here, we present an interactive web application that dynamically divides patients from large cohorts (e.g. The Cancer Genome Atlas, TCGA) into two groups according to the mutation status, copy number variation or gene expression of query genes. It further lets users examine the prognostic value of the resulting patient groups by survival analysis, as well as their association with clinical features and previously annotated molecular subtypes, supported by rich, interactive visualization. Importantly, custom omics data with clinical information are also supported. AVAILABILITY AND IMPLEMENTATION: CaPSSA (Cancer Patient Stratification and Survival Analysis) runs in a web browser and is freely available without restrictions at http://www.kobic.re.kr/capssa/. The source code is available at https://github.com/yjjang/capssa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
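The stratify-then-test workflow described above can be sketched with a median split on a query gene's expression followed by a two-group log-rank test. The median cutoff is an assumption (the abstract does not say which cutoffs CaPSSA offers), the patient IDs are hypothetical, and the test below is the textbook log-rank statistic rather than CaPSSA's code; real analyses would normally use an established survival library.

```python
import math
from statistics import median


def split_by_median(expression):
    """expression: dict patient -> value of the query gene. Returns (high, low) patient sets."""
    cutoff = median(expression.values())
    high = {p for p, value in expression.items() if value >= cutoff}
    return high, set(expression) - high


def logrank_test(times, events, group):
    """
    Two-group log-rank test.
    times:  dict patient -> follow-up time.
    events: dict patient -> 1 if the event (e.g. death) was observed, 0 if censored.
    group:  set of patients in group 1; every other patient is in group 2.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({times[p] for p in times if events[p]}):
        at_risk = [p for p in times if times[p] >= t]
        n = len(at_risk)
        n1 = sum(1 for p in at_risk if p in group)
        d = sum(1 for p in at_risk if times[p] == t and events[p])
        d1 = sum(1 for p in at_risk if times[p] == t and events[p] and p in group)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of group-1 events at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


expression = {"P1": 5.0, "P2": 4.0, "P3": 1.0, "P4": 2.0}
times = {"P1": 1, "P2": 2, "P3": 3, "P4": 4}
events = {"P1": 1, "P2": 1, "P3": 1, "P4": 1}
high, low = split_by_median(expression)
chi2, p = logrank_test(times, events, high)
print(sorted(high), chi2, p)
```

For this toy input the high-expression group {P1, P2} dies first, and the statistic works out by hand to 49/17 (about 2.88).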
Subject(s)
Biomarkers, Tumor/genetics , Neoplasms/genetics , Oncogenes , DNA Copy Number Variations , Humans , Mutation , Software , Survival Analysis
ABSTRACT
Three-dimensional (3D) chromatin structure is an emerging paradigm for understanding gene regulation mechanisms. Hi-C (high-throughput chromatin conformation capture), a method for detecting long-range chromatin interactions, allows extensive genome-wide investigation of 3D chromatin structure. However, broad application of Hi-C data has been hindered by the complexity of processing Hi-C data and the large size of raw sequencing data. To overcome these limitations, we constructed a database named 3DIV (a 3D-genome Interaction Viewer and database) that provides a list of long-range chromatin interaction partners for a queried locus, with genomic and epigenomic annotations. 3DIV is the first database of its kind to collect all publicly available human Hi-C data, providing 66 billion uniformly processed raw Hi-C read pairs obtained from 80 different human cell/tissue types. Unlike other databases, 3DIV provides chromatin interaction frequencies normalized against genomic distance-dependent background signals, together with a dynamic browsing visualization tool for the listed interactions, which could greatly advance the interpretation of chromatin interactions. '3DIV' is available at http://kobic.kr/3div.
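Normalizing interaction frequencies against a distance-dependent background is commonly expressed as an observed/expected ratio. The minimal sketch below assumes equal-sized bins on one chromosome and takes "expected" to be the mean count over all bin pairs at the same distance; 3DIV's actual background model is not specified in the abstract and may differ.

```python
from statistics import mean


def distance_normalize(matrix):
    """
    matrix: square list of lists of contact counts for one chromosome (equal-sized bins).
    Returns observed/expected ratios, where "expected" for a bin pair is the mean
    count over all bin pairs at the same genomic distance (same diagonal offset).
    """
    n = len(matrix)
    expected = [mean(matrix[i][i + k] for i in range(n - k)) for k in range(n)]
    return [
        [
            matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


contacts = [
    [10, 4, 1],
    [4, 10, 2],
    [1, 2, 10],
]
ratios = distance_normalize(contacts)
print(round(ratios[0][1], 3), round(ratios[1][2], 3))  # 1.333 0.667
```

A ratio above 1 flags a bin pair that contacts more often than is typical at that distance, which is what makes distal interactions comparable with proximal ones.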
Subject(s)
Chromatin/genetics , Databases, Genetic , Genome, Human , Software , Chromatin/ultrastructure , Databases, Nucleic Acid , Epigenesis, Genetic , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Imaging, Three-Dimensional , Internet , Molecular Sequence Annotation , Nucleic Acid Conformation , Polymorphism, Single Nucleotide
ABSTRACT
Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curation. In this update, database coverage was enhanced considerably by adding two new modules: The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB is a manually curated knowledgebase of 1066 fusion genes with experimental evidence, compiled from public fusion gene resources. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive fusion candidates from deep sequencing data. Importantly, we analyzed RNA-Seq data from the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphical representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/.
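The abstract does not describe how ChimerPub mines PubMed abstracts. As an illustration only, a naive baseline can match two gene-symbol-like tokens joined by a separator and keep pairs whose partners are both known gene symbols. The regular expression and the gene dictionary are hypothetical; such a baseline misses prose like "fusion of EML4 and ALK" and cannot tell fusions from other hyphenated symbol pairs, which is why manual curation matters.

```python
import re

# Hypothetical pattern: two gene-symbol-like tokens joined by "-", "/" or "::",
# as in "EML4-ALK", "BCR/ABL1" or "TMPRSS2::ERG".
FUSION_PATTERN = re.compile(r"\b([A-Z][A-Z0-9]{1,9})(?:-|/|::)([A-Z][A-Z0-9]{1,9})\b")


def find_fusion_mentions(text, known_genes):
    """Return unique (5' partner, 3' partner) pairs whose symbols are both in known_genes."""
    pairs = []
    for left, right in FUSION_PATTERN.findall(text):
        pair = (left, right)
        # The dictionary filter drops non-gene matches such as "RT-PCR".
        if left in known_genes and right in known_genes and left != right and pair not in pairs:
            pairs.append(pair)
    return pairs


abstract = "We detected EML4-ALK and KIF5B-RET fusions; EML4-ALK was validated by RT-PCR."
genes = {"EML4", "ALK", "KIF5B", "RET"}
print(find_fusion_mentions(abstract, genes))  # [('EML4', 'ALK'), ('KIF5B', 'RET')]
```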
Subject(s)
Data Mining , Databases, Genetic , Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Transcriptome , Computational Biology/methods , Gene Expression Profiling/methods , Humans , Software , User-Computer Interface
ABSTRACT
SUMMARY: Deep sequencing of small RNAs has become routine in recent years, but no dedicated viewer has yet been available for exploring sequence features together with the secondary structure and expression of microRNAs (miRNAs). We present a highly interactive application that visualizes the sequence alignment, secondary structure and normalized read counts in synchronized multipanel windows. This helps users easily examine the relationships between precursor structure and the sequences and abundance of the final products, and should thereby facilitate studies of miRNA biogenesis and regulation. The project manager handles multiple samples in multiple groups. Read alignments are imported in BAM format. Implemented features include sorting, zooming, highlighting, editing, filtering, saving and exporting. Currently, miRseqViewer supports 84 organisms whose annotation is available in miRBase. AVAILABILITY AND IMPLEMENTATION: miRseqViewer, implemented in Java, is available at https://github.com/insoo078/mirseqviewer or at http://msv.kobic.re.kr. CONTACT: sanghyuk@ewha.ac.kr.
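The abstract mentions normalized read counts without naming the normalization. Reads per million (RPM), a common choice for small RNA-seq, is assumed in the minimal sketch below; tallying the raw counts from a BAM file (for example with a BAM parser such as pysam) is not shown, and the miRNA names are illustrative.

```python
def reads_per_million(counts, total_mapped=None):
    """
    counts: dict miRNA -> raw read count in one sample (e.g. tallied from a BAM file).
    total_mapped: library size used for scaling; defaults to the sum of the given counts.
    Returns reads per million (RPM) for each miRNA.
    """
    total = total_mapped if total_mapped is not None else sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


sample = {"hsa-miR-21-5p": 750, "hsa-miR-155-5p": 250}
print(reads_per_million(sample))  # {'hsa-miR-21-5p': 750000.0, 'hsa-miR-155-5p': 250000.0}
print(reads_per_million(sample, total_mapped=2_000_000))  # {'hsa-miR-21-5p': 375.0, 'hsa-miR-155-5p': 125.0}
```

Passing the total number of mapped reads, rather than relying on the default, keeps values comparable across samples whose non-miRNA read fractions differ.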
Subject(s)
Computational Biology/methods , Computer Graphics , Databases, Nucleic Acid , MicroRNAs/genetics , Sequence Analysis, RNA/methods , Software , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment
ABSTRACT
Biogenesis and molecular function are two key subjects in microRNA (miRNA) research. Deep sequencing has become the principal technique for cataloging the miRNA repertoire and generating expression profiles in an unbiased manner. Here, we describe the miRGator v3.0 update (http://mirgator.kobic.re.kr), which compiles publicly available miRNA deep sequencing data and implements several novel tools to facilitate exploration of these massive data. The miR-seq browser lets users examine short read alignments together with secondary structure and read count information in concurrent windows. Features such as sequence editing, sorting, ordering, and import and export of user data should be of great utility for studying iso-miRs, miRNA editing and modifications. miRNA-target relationships are essential for understanding miRNA function. Coexpression of miRNAs and their target mRNAs, based on miRNA-seq and RNA-seq data from the same samples, is visualized in heat-map and network views, where users can investigate inverse correlations of expression alongside target relationships compiled from various databases of predicted and validated targets. By keeping datasets and analytic tools up-to-date, miRGator should continue to serve as an integrated resource for investigating miRNA biogenesis and function.
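Screening miRNA-target pairs for the inverse expression correlation described above can be sketched with a Pearson coefficient computed across matched samples. The threshold of -0.5, the miRNA and gene names, and the choice of Pearson (rather than, say, a rank correlation) are assumptions for illustration; the abstract does not state which statistic miRGator uses.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length value lists (0.0 if either is constant, by assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / norm if norm > 0 else 0.0


def inversely_correlated_targets(mirna_expr, mrna_expr, targets, threshold=-0.5):
    """
    mirna_expr, mrna_expr: dict name -> expression values across the same samples, in the same order.
    targets: iterable of (miRNA, mRNA) pairs taken from predicted/validated target databases.
    Returns (miRNA, mRNA, r) for pairs whose correlation is at or below `threshold`.
    """
    hits = []
    for mirna, gene in targets:
        r = pearson(mirna_expr[mirna], mrna_expr[gene])
        if r <= threshold:
            hits.append((mirna, gene, r))
    return hits


mirna_expr = {"miR-X": [1, 2, 3, 4]}
mrna_expr = {"GENE1": [8, 6, 4, 2], "GENE2": [1, 3, 2, 4]}
print(inversely_correlated_targets(mirna_expr, mrna_expr, [("miR-X", "GENE1"), ("miR-X", "GENE2")]))
```

Only GENE1 is reported here: its expression falls as miR-X rises (r = -1), whereas GENE2 rises with it (r = 0.8).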
Subject(s)
Databases, Nucleic Acid , MicroRNAs/chemistry , MicroRNAs/metabolism , RNA, Messenger/metabolism , High-Throughput Nucleotide Sequencing , Internet , RNA, Messenger/chemistry , Sequence Analysis, RNA , Transcriptome
ABSTRACT
A wave of new technologies has created opportunities for the cost-effective generation of high-throughput profiles of biological systems, foreshadowing a "data-driven science" era. The large variety of data available from biological research is also a rich resource that can be used for innovative endeavors. However, we are facing considerable challenges in big data deposition, integration, and translation due to the complexity of biological data and its production at unprecedented exponential rates. To address these problems, in 2020, the Korean government officially announced a national strategy to collect and manage the biological data produced through national R&D fund allocations and provide the collected data to researchers. To this end, the Korea Bioinformation Center (KOBIC) developed a new biological data repository, the Korea BioData Station (K-BDS), for sharing data from individual researchers and research programs to create a data-driven biological study environment. The K-BDS is dedicated to providing free open access to a suite of featured data resources in support of worldwide activities in both academia and industry.
ABSTRACT
Despite substantial advances in disease genetics, studies to date have largely focused on individuals of European descent, which limits the discovery of novel functional genetic variants in other ethnic groups. To alleviate the paucity of East Asian population genome resources, we established the Korean Variant Archive 2 (KOVA 2), which comprises 1896 whole-genome sequences and 3409 whole-exome sequences from healthy individuals of Korean ethnicity. This is the largest genome database of the ethnic Korean population to date, surpassing the 1909 Korean individuals deposited in gnomAD. The variants in KOVA 2 displayed all the known genetic features of variants in previous genome databases, and we compiled Korean-specific runs of homozygosity, positively selected intervals and structural variants. In doing so, we found loci, such as ADH1A/1B and UHRF1BP1, that are strongly selected in the Korean population relative to other East Asian populations. Our analysis of allele ages revealed a correlation between variant functionality and evolutionary age. The data can be browsed and downloaded from a public website ( https://www.kobic.re.kr/kova/ ). We anticipate that KOVA 2 will serve as a valuable resource for genetic studies involving East Asian populations.
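Runs of homozygosity (ROH) are long stretches of consecutive homozygous genotypes in one individual. The abstract does not say how KOVA 2 called them, so the minimal sketch below is a strict illustration with hypothetical thresholds (`min_snps`, `min_length`): any heterozygous or missing call ends a run. Dedicated tools, such as PLINK's `--homozyg` option, use windowed scans that tolerate a few heterozygous or missing calls.

```python
def is_homozygous(genotype):
    """True for called homozygous genotypes such as '0/0', '1/1' or '1|1'; False for hets and missing calls."""
    alleles = genotype.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) == 1


def runs_of_homozygosity(positions, genotypes, min_snps=3, min_length=1000):
    """
    positions: sorted variant positions on one chromosome for one individual.
    genotypes: genotype strings aligned with positions.
    Returns (start, end) of maximal stretches of consecutive homozygous calls with at
    least `min_snps` variants spanning at least `min_length` bp. Any heterozygous or
    missing call ends a stretch (no tolerance for genotyping errors).
    """
    runs, current = [], []
    # A trailing heterozygous sentinel flushes the final stretch.
    for pos, genotype in list(zip(positions, genotypes)) + [(None, "0/1")]:
        if is_homozygous(genotype):
            current.append(pos)
            continue
        if len(current) >= min_snps and current[-1] - current[0] >= min_length:
            runs.append((current[0], current[-1]))
        current = []
    return runs


positions = [100, 600, 1500, 2000, 2500, 3000, 9000]
genotypes = ["0/0", "1/1", "0/0", "0/1", "1/1", "1|1", "0/0"]
print(runs_of_homozygosity(positions, genotypes))  # [(100, 1500), (2500, 9000)]
```

Population-level summaries such as population-specific ROH would then aggregate these per-individual intervals across samples.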
Subject(s)
Asian People , Exome , Humans , Asian People/genetics , Republic of Korea , Polymorphism, Single Nucleotide
ABSTRACT
Human pluripotent stem cell (hPSC)-derived organoids and cells have characteristics similar to those of human organs and tissues. Thus, in vitro human organoids and cells serve as a superior alternative to conventional cell lines and animal models in drug development and regenerative medicine. A quantitative evaluation method is needed for simple and reproducible analysis of organoid and cell quality that compensates for the shortcomings of existing experimental validation studies. Here, using the GTEx database, we construct a quantitative calculation system to assess similarity to human organs. To evaluate our system, we generate hPSC-derived organoids and cells and measure their organ similarity. To make the system accessible to researchers, we develop a web-based user interface that presents similarity to the appropriate organs as percentages. This program could therefore provide valuable information for generating high-quality organoids and cells, as well as a strategy to guide proper lineage-oriented differentiation.
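The abstract says organ similarity is computed from GTEx and reported as percentages, but it does not give the formula. The sketch below is a hypothetical scheme, not the authors' algorithm: correlate a sample's expression over a gene panel with each organ's reference profile, clip negative correlations to zero, and rescale the rest to sum to 100. The organ profiles and gene panel are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / norm if norm > 0 else 0.0


def organ_similarity_percent(sample, organ_profiles):
    """
    sample: expression values over a fixed gene panel for one organoid or cell sample.
    organ_profiles: dict organ -> reference profile over the same genes (e.g. GTEx tissue medians).
    Negative correlations are clipped to 0, and the rest are rescaled to sum to 100.
    """
    scores = {organ: max(pearson(sample, ref), 0.0) for organ, ref in organ_profiles.items()}
    total = sum(scores.values())
    return {organ: (100 * s / total if total > 0 else 0.0) for organ, s in scores.items()}


references = {"liver": [2, 4, 6, 8], "heart": [4, 3, 2, 1], "lung": [1, 3, 2, 4]}
print(organ_similarity_percent([1, 2, 3, 4], references))
```

With these toy profiles the correlations are 1.0 (liver), 0.8 (lung) and -1.0 (heart), giving roughly 55.6% liver, 44.4% lung and 0% heart. Under this scheme the percentages are relative to the organs included, not an absolute measure of maturity.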
Subject(s)
Algorithms , Cell Differentiation/genetics , Organ Specificity/genetics , Organoids/metabolism , Pluripotent Stem Cells/metabolism , Transcriptome/genetics , Cell Culture Techniques/methods , Cell Line , Gene Expression Profiling/methods , Humans , Organoids/cytology , Pluripotent Stem Cells/cytology , RNA-Seq/methods , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Functional analyses of genes are crucial for unveiling biological responses, genetic engineering, and developing new medicines. However, functional analyses have largely been restricted to model organisms, representing a major hurdle for functional studies and industrial applications. To resolve this, comparative genome analyses can be used to provide clues to gene functions as well as their evolutionary history. To this end, we present Prometheus, a web-based omics portal that contains more than 17,215 sequences from prokaryotic and eukaryotic genomes. This portal supports interkingdom comparative analyses via a domain architecture-based gene identification system and Gene Search, and users can easily and rapidly identify single or entire gene sets in specific pathways. Bioinformatics tools for further analyses are provided in Prometheus or through Bio-Express, a cloud-based bioinformatics analysis platform. Prometheus is a new paradigm for comparative analyses of large amounts of genomic information.
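Domain architecture-based gene identification compares the ordered list of protein domains in each candidate with that of a query. The minimal sketch below assumes the domain lists have already been produced by an annotation step (for example an HMM search against a domain database, not shown) and uses hypothetical protein IDs; it illustrates the idea rather than Prometheus's own system.

```python
def match_by_architecture(query, proteins, ordered=True):
    """
    query: domain names in N- to C-terminal order.
    proteins: dict protein_id -> domain names in N- to C-terminal order.
    ordered=True requires an identical ordered architecture;
    ordered=False only requires the same set of domain types (copy number ignored).
    """
    if ordered:
        return sorted(pid for pid, arch in proteins.items() if arch == query)
    return sorted(pid for pid, arch in proteins.items() if set(arch) == set(query))


proteins = {
    "prot1": ["SH3", "SH2", "kinase"],
    "prot2": ["SH2", "SH3", "kinase"],
    "prot3": ["kinase"],
}
print(match_by_architecture(["SH3", "SH2", "kinase"], proteins))                 # ['prot1']
print(match_by_architecture(["SH3", "SH2", "kinase"], proteins, ordered=False))  # ['prot1', 'prot2']
```

Requiring the same domain order is stricter and tends to favor true functional counterparts across kingdoms, while the set-based mode catches rearranged homologs at the cost of more false matches.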
Subject(s)
Genomics/methods , Software , Animals , Archaea/genetics , Bacteria/genetics , Fungi/genetics , Humans , Metabolomics/methods , Plants/genetics , Sequence Alignment/methods
ABSTRACT
Database URL: GEMiCCL is available at https://www.kobic.kr/GEMICCL/.
Subject(s)
Cell Line, Tumor , Databases, Genetic , Gene Expression Regulation, Neoplastic , Mutation , Neoplasms , Polymorphism, Single Nucleotide , Software , Animals , Humans , Neoplasms/genetics , Neoplasms/metabolism
ABSTRACT
Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome consisting of a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Gene prediction identified 87,603 genes, most of which are supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species: H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships from divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication (WGD) were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicate that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus, which may affect its unique floral morphology.
Subject(s)
Flowers/growth & development , Genome, Plant , Hibiscus/genetics , Polyploidy , DNA-Binding Proteins/genetics , Hibiscus/physiology , Multigene Family , RNA-Binding Proteins/genetics , Transcriptome
ABSTRACT
BACKGROUND: Deep sequencing techniques provide a remarkable opportunity for comprehensive understanding of tumorigenesis at the molecular level. As omics studies become widespread, integrative approaches need to be developed to move from simply cataloguing mutations and gene expression changes to dissecting the molecular nature of carcinogenesis at the systems level and understanding the complex networks that lead to cancer development. RESULTS: Here, we describe a high-throughput, multi-dimensional sequencing study of primary lung adenocarcinoma tumors and adjacent normal tissues from six Korean female never-smoker patients. Our data encompass results from exome-seq, RNA-seq, small RNA-seq and MeDIP-seq. We identified and validated novel genetic aberrations, including 47 somatic mutations and 19 fusion transcripts. One of the fusions involves the c-RET gene, which was recently reported to form fusion genes that may act as drivers of carcinogenesis in lung cancer patients. We also characterized gene expression profiles and integrated them with genomic aberrations and gene regulatory information into functional networks. The most prominent gene network module that emerged indicates that disturbances in G2/M transition and mitotic progression are causally linked to tumorigenesis in these patients. Results from the analysis also strongly suggest that several novel microRNA-target interactions are key regulatory elements of the gene network. CONCLUSIONS: Our study not only provides an overview of the alterations occurring in lung adenocarcinoma at multiple levels from genome to transcriptome and epigenome, but also offers a model for integrative genomics analysis and proposes potential target pathways for the control of lung adenocarcinoma.