ABSTRACT
During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet needs for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. The KoNA is available at https://www.kobic.re.kr/kona/.
Subject(s)
Databases, Nucleic Acid; Republic of Korea; Humans; High-Throughput Nucleotide Sequencing/methods
ABSTRACT
After the first outbreak, SARS-CoV-2 infection has continued to occur due to the emergence of new variants. Limited information is available on the comparative evaluation of the evolutionary characteristics of SARS-CoV-2 among different countries over time, and on their relatedness to epidemiological and socio-environmental factors within those countries. We assessed comparative Bayesian evolutionary characteristics for SARS-CoV-2 in eight countries from 2020 to 2022 using BEAST version 2.6.7. Additionally, the relatedness between virus evolution factors and both epidemiological and socio-environmental factors was analyzed using Pearson's correlation coefficient. The estimated substitution rates in the gene encoding the S protein of SARS-CoV-2 increased continuously from 2020 to 2022 and separated into two distinct groups in 2022 (p value < 0.05). Effective population size (Ne) generally decreased over time. Notably, the rates of change of the substitution rates were negatively correlated with the cumulative vaccination rates in 2021. A strict and rapid vaccination policy in the United Arab Emirates dramatically reduced the evolution of the virus compared to other countries. Also, the average yearly temperature in each country was negatively correlated with the substitution rates. The changes in six epitopes of SARS-CoV-2 were related to various socio-environmental factors. We identified comparative evolutionary traits of the virus and their association with epidemiological and socio-environmental factors, especially cumulative vaccination rates and average temperature.
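The correlation step in this abstract can be illustrated with a minimal sketch of Pearson's correlation coefficient. The function name and the per-country numbers below are hypothetical placeholders chosen for illustration; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values (NOT from the study): cumulative vaccination
# rate (%) and change in substitution rate for five imaginary countries.
vaccination_rate = [10.0, 25.0, 40.0, 60.0, 80.0]
rate_change = [0.9, 0.7, 0.6, 0.3, 0.1]

r = pearson_r(vaccination_rate, rate_change)
print(f"r = {r:.3f}")  # a negative r indicates an inverse relationship
```

A coefficient near -1 would correspond to the kind of negative correlation the abstract reports between vaccination rates and changes in substitution rates; significance testing is left out of this sketch.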
ABSTRACT
Alterations in DNA methylation play an important pathophysiological role in the development and progression of colorectal cancer. We comprehensively profiled DNA methylation alterations in 165 Korean patients with colorectal cancer (CRC), and conducted an in-depth investigation of cancer-specific methylation patterns. Our analysis of the tumor samples revealed a significant presence of hypomethylated probes, primarily within the gene body regions; few hypermethylated sites were observed, which were mostly enriched in promoter-like and CpG island regions. The CpG Island Methylator Phenotype-High (CIMP-H) subtype exhibited notable enrichment of microsatellite instability-high (MSI-H). Additionally, our findings indicated a significant correlation between methylation of the MLH1 gene and MSI-H status. Furthermore, we found that CIMP-H had a higher tendency to affect the right side of the colon and was slightly more prevalent among older patients. Through our methylome profile analysis, we successfully verified the methylation patterns and clinical characteristics of Korean patients with CRC. This valuable dataset lays a strong foundation for exploring novel molecular insights and potential therapeutic targets for the treatment of CRC. [BMB Reports 2024; 57(2): 110-115].
Subject(s)
Colorectal Neoplasms; DNA Methylation; Humans; DNA Methylation/genetics; Microsatellite Instability; Mutation; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Republic of Korea; CpG Islands/genetics; Phenotype
ABSTRACT
A wave of new technologies has created opportunities for the cost-effective generation of high-throughput profiles of biological systems, foreshadowing a "data-driven science" era. The large variety of data available from biological research is also a rich resource that can be used for innovative endeavors. However, we are facing considerable challenges in big data deposition, integration, and translation due to the complexity of biological data and its production at unprecedented exponential rates. To address these problems, in 2020, the Korean government officially announced a national strategy to collect and manage the biological data produced through national R&D fund allocations and provide the collected data to researchers. To this end, the Korea Bioinformation Center (KOBIC) developed a new biological data repository, the Korea BioData Station (K-BDS), for sharing data from individual researchers and research programs to create a data-driven biological study environment. The K-BDS is dedicated to providing free open access to a suite of featured data resources in support of worldwide activities in both academia and industry.
ABSTRACT
BACKGROUND: Systematic in vitro loss-of-function screens provide valuable resources that can facilitate the discovery of drugs targeting cancer vulnerabilities. RESULTS: We develop a deep learning-based method to predict tumor-specific vulnerabilities in patient samples by leveraging a wealth of in vitro screening data. Acquired dependencies of tumors are inferred in cases in which one allele is disrupted by inactivating mutations or in association with oncogenic mutations. Nucleocytoplasmic transport by Ran GTPase is identified as a common vulnerability in Her2-positive breast cancers. Vulnerability to loss of Ku70/80 is predicted for tumors that are defective in homologous recombination and rely on nonhomologous end joining for DNA repair. Our experimental validation for Ran, Ku70/80, and a proteasome subunit using patient-derived cells shows that they can be targeted specifically in particular tumors that are predicted to be dependent on them. CONCLUSION: This approach can be applied to facilitate the development of precision therapeutic targets for different tumors.
Subject(s)
Breast Neoplasms/genetics; Computational Biology/methods; Deep Learning; Models, Biological; Molecular Targeted Therapy; Computer Simulation; Humans; Point Mutation
ABSTRACT
Despite the improved 5-year survival rate of breast cancer, triple-negative breast cancer (TNBC) remains a challenge due to the lack of effective targeted therapy and its higher rates of recurrence and metastasis than other subtypes. To identify novel druggable targets and to understand its unique biology, we tried to establish 24 patient-derived xenografts (PDXs) of TNBC. The overall success rate of PDX implantation was 45%, much higher than that of estrogen receptor (ER)-positive cases. Immunohistochemical analysis revealed conserved ER/PR/Her2 negativity (with two exceptions) between the original and PDX tumors. Genomic analysis of 10 primary tumor-PDX pairs with Ion AmpliSeq CCP revealed a high degree of variant conservation (85.0%-96.9%) between primary tumors and PDXs. Further analysis showed 44 rare variants with a predicted high impact in 36 genes including Trp53, Pten, Notch1, and Col1a1. Among them, we confirmed a frequent Notch1 variant. Furthermore, RNA-seq analysis of 24 PDXs revealed 594 gene fusions, of which 163 were in-frame, including AZGP1-GJC3 and NF1-AARSD1. Finally, western blot analysis of oncogenic signaling proteins supported the molecular diversity of TNBC PDXs. Overall, our report provides a molecular basis for the usefulness of the TNBC PDX model in preclinical studies.
Subject(s)
Biomarkers, Tumor/genetics; Neoplasm Recurrence, Local/genetics; Oncogene Proteins, Fusion/genetics; Triple Negative Breast Neoplasms/genetics; Adipokines; Alanine-tRNA Ligase/genetics; Animals; Carrier Proteins/genetics; Cell Line, Tumor; Connexins/genetics; Female; Glycoproteins/genetics; Humans; Mice; Nerve Tissue Proteins/genetics; Neurofibromin 1/genetics; Polymorphism, Single Nucleotide; Receptor, Notch1/genetics; Sequence Analysis, RNA; Xenograft Model Antitumor Assays
ABSTRACT
Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes.
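The connectivity measures named in this abstract (outdegree and number of downstream genes) can be sketched on a toy directed network. The edge list and gene labels below are hypothetical and stand in for a regulator-to-target network; they do not come from the study.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each regulator to the set of its direct targets."""
    adjacency = defaultdict(set)
    for regulator, target in edges:
        adjacency[regulator].add(target)
    return adjacency


def out_degree(adjacency, gene):
    """Number of direct targets of a gene (0 if it regulates nothing)."""
    return len(adjacency.get(gene, ()))


def downstream_genes(adjacency, gene):
    """All genes reachable from `gene` by following regulatory edges (BFS)."""
    seen = set()
    queue = deque([gene])
    while queue:
        current = queue.popleft()
        for target in adjacency.get(current, ()):
            if target not in seen and target != gene:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy network: (regulator, target) pairs.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
network = build_adjacency(toy_edges)

print(out_degree(network, "A"))                # direct targets of A
print(sorted(downstream_genes(network, "A")))  # everything downstream of A
```

In this toy network, gene A has two direct targets but four downstream genes, which illustrates why the two measures are reported separately.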
Subject(s)
Gene Expression Regulation, Neoplastic/genetics; Genes, Neoplasm/genetics; Models, Genetic; Neoplasms/genetics; Regulatory Elements, Transcriptional/genetics; Signal Transduction/genetics; Animals; Computer Simulation; Genetic Variation/genetics; Humans; Neoplasm Proteins/genetics
ABSTRACT
BACKGROUND: One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, it is difficult to observe recurring noncoding mutations owing to the limited number of whole-genome sequenced samples. Hence, a method to predict potentially recurrent mutations is needed. RESULTS: In this work, we developed a random forest classifier that predicts regulatory mutations that may recur based on the features of the mutations repeatedly appearing in a given cohort. With breast cancer as a model, we profiled 35 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motif was disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for machine learning was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of our random forest classifier was evaluated by cross-validation. The variable importance of each feature in the classification of mutations was investigated. Our statistical recurrence model for the random forest classifier showed an area under the curve (AUC) of ~0.78 in predicting recurrent mutations. Chromatin accessibility at the mutation sites, the distance from the mutations to known cancer risk loci, and the role of the target genes in the regulatory or protein interaction network were among the most important variables. CONCLUSIONS: Our methods make it possible to characterize recurrent regulatory mutations using a limited number of whole-genome samples and, based on this characterization, to predict potential driver mutations whose recurrence is not found in the given samples but is likely to be observed with additional samples.
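The AUC reported in this abstract measures how well classifier scores rank recurrent mutations above non-recurrent ones. A minimal sketch of that metric follows, computed as the fraction of positive-negative pairs ranked correctly. The labels and scores are hypothetical, and the random forest itself is not reproduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = recurrent mutation, 0 = non-recurrent mutation,
# with made-up classifier scores.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(example_labels, example_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of about 0.78 sits between the two.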
Subject(s)
Breast Neoplasms/genetics; Genomics/methods; Mutation; Chromatin/genetics; Female; Genome; Humans; Models, Statistical; Transcription Factors/genetics
ABSTRACT
Recurrence is a hallmark of cancer-driving mutations. Recurrent mutations can arise at the same site or affect the same gene at different sites. Here we identified a set of mutations arising in individual samples and altering different cis-regulatory elements that converge on a common gene via chromatin interactions. The mutations and genes identified in this fashion showed strong relevance to cancer, in contrast to noncoding mutations with site-specific recurrence only. We developed a prediction method that identifies potentially recurrent mutations on the basis of the features shared by mutations whose recurrence is observed in a given cohort. Our method was capable of accurately predicting recurrent mutations at the level of target genes but not mutations recurring at the same site. We experimentally validated predicted mutations in distal regulatory regions of the TERT gene. In conclusion, we propose a novel approach to discovering potential cancer-driving mutations in noncoding regions.
Subject(s)
Chromatin; DNA Mutational Analysis/methods; Mutation; Neoplasms/genetics; Chromatin/chemistry; Cohort Studies; DNA, Neoplasm; Enhancer Elements, Genetic; Genetic Testing/methods; Humans; Regulatory Sequences, Nucleic Acid
ABSTRACT
Global network modeling of distal regulatory interactions is essential in understanding the overall architecture of gene expression programs. Here, we developed a Bayesian probabilistic model and computational method for global causal network construction with breast cancer as a model. Whereas physical regulator binding was well supported by gene expression causality in general, distal elements in intragenic regions or loci distant from the target gene exhibited particularly strong functional effects. Modeling the action of long-range enhancers was critical in recovering true biological interactions with increased coverage and specificity overall and unraveling regulatory complexity underlying tumor subclasses and drug responses in particular. Transcriptional cancer drivers and risk genes were discovered based on the network analysis of somatic and genetic cancer-related DNA variants. Notably, we observed that the risk genes were functionally downstream of the cancer drivers and were selectively susceptible to network perturbation by tumorigenic changes in their upstream drivers. Furthermore, cancer risk alleles tended to increase the susceptibility of the transcription of their associated genes. These findings suggest that transcriptional cancer drivers selectively induce a combinatorial misregulation of downstream risk genes, and that genetic risk factors, mostly residing in distal regulatory regions, increase transcriptional susceptibility to upstream cancer-driving somatic changes.
Subject(s)
Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Genes, Neoplasm; Transcription, Genetic; Bayes Theorem; Cell Line, Tumor; Enhancer Elements, Genetic; Gene Expression Regulation, Neoplastic/drug effects; Genetic Variation; Genomics/methods; Humans; MCF-7 Cells; Risk; Transcription Factors/metabolism
ABSTRACT
The study aimed to investigate whether a combination of the P3-based Guilty Knowledge Test (GKT) and reality monitoring (RM) could distinguish between individuals who are guilty, witnesses, or informed, and whether using both tests provided more accurate information than the use of either measure alone. Participants were 45 males who were randomly and evenly assigned to three groups (i.e., guilty, witness, and informed). The guilty group committed a mock crime in which they intentionally crashed their vehicle into another vehicle in a virtual environment (VE). As those in the witness group drove their own vehicles, they observed the guilty group's vehicle crash into another vehicle. The informed group read an account and saw screenshots of the accident. All participants were instructed to insist that they were innocent. Subsequently, they performed the P3-based GKT and wrote an account of the accident for the RM analysis. A higher P3 amplitude corresponded to how well the participants recognized the presented stimulus, and a higher RM score corresponded to how well the participants reported vivid sensory information and how little uncertain information they reported. Findings for the P3-based GKT indicated that the informed group showed lower P3 amplitude when presented with the probe stimulus than did the guilty and witness groups. Regarding the RM analysis, the informed group obtained higher RM scores on visual, temporal, and spatial details and lower scores on cognitive operations than the guilty and witness groups. Finally, discriminant analysis revealed that the combination of the P3-based GKT and RM more accurately distinguished between the three groups than the use of either measure alone. The findings suggest that RM may compensate for a weakness of the P3-based GKT. More specifically, it may offset the GKT's susceptibility to the leakage of information about the crime, thereby helping protect innocent individuals who have information about a crime from being perceived as guilty.
ABSTRACT
An easily applicable empirical formula was derived for use in the assessment of the photoneutron dose at the maze entrance of a 15 MV medical accelerator treatment room. The neutron dose equivalent rates around the Varian medical accelerator head calculated with the Monte Carlo code MCNPX were used as the source term in producing the base data. The dose equivalents were validated by measurements with bubble detectors. Irradiation geometry conditions expected to yield higher neutron dose rates in the maze were selected: a 20 × 20 cm² irradiation field, a gantry rotation plane parallel to the maze walls, and photon beams directed at the wall opposite the maze entrance. The neutron dose equivalents at the maze entrance were computed for 697 arbitrary single-bend maze configurations by extending the Monte Carlo calculations down to the maze entrance. Then, the empirical formula was derived by a multiple regression fit to the neutron dose equivalents at the maze entrance for all the different maze configurations. The goodness of the empirical formula was evaluated by applying it to seven operating medical accelerators of different makes. When the source terms were fixed, the neutron doses estimated from the authors' formula agreed better with the corresponding MCNPX simulations than the results of the Kersey method. In addition, compared with the Wu-McGinley formula, the authors' formula provided better estimates for mazes longer than 8.5 m. There are, however, discrepancies between the measured dose rates and the values estimated from the authors' formula, particularly for machines other than Varian models. Further efforts are needed to characterize the neutron field at the maze entrance to reduce the discrepancies. Furthermore, neutron source terms for machines other than Varian models should be simulated or measured and incorporated into the formula for accurate extended application to a variety of models.
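The abstract compares its formula with the Kersey method but does not spell either one out. As a hedged sketch, the function below implements the commonly quoted form of the Kersey estimate, H = H0 · (S0/S1) · (d0/d1)² · 10^(−d2/TVD), with a tenth-value distance (TVD) of 5 m. This form, the parameter names, and the sample numbers are assumptions for illustration; this is not the authors' empirical formula, which is not given in the abstract.

```python
def kersey_dose_equivalent(h0, d0, d1, d2, area_ratio=1.0, tvd=5.0):
    """Neutron dose equivalent at the maze entrance (assumed Kersey form).

    h0         -- neutron dose equivalent at reference distance d0 from the source
    d0         -- reference distance from the source (m)
    d1         -- distance from the isocenter to the inner maze point (m)
    d2         -- maze length from the inner maze point to the entrance (m)
    area_ratio -- ratio of inner maze entrance area to maze cross-section (S0/S1)
    tvd        -- tenth-value distance along the maze (m); 5 m is assumed here
    """
    if d0 <= 0 or d1 <= 0 or tvd <= 0 or d2 < 0:
        raise ValueError("distances must be positive (d2 may be zero)")
    inverse_square = (d0 / d1) ** 2          # attenuation from source to inner maze
    maze_attenuation = 10 ** (-d2 / tvd)     # one decade lost per tenth-value distance
    return h0 * area_ratio * inverse_square * maze_attenuation


# Hypothetical inputs for illustration only (not measurements from the study).
h_entrance = kersey_dose_equivalent(h0=1.0, d0=1.41, d1=6.0, d2=8.0)
print(f"relative dose equivalent at maze entrance: {h_entrance:.3e}")
```

Because the maze term is a fixed exponential in d2, an estimate of this form cannot adapt to maze geometry beyond its length and cross-section, which is one plausible reason a regression fit over many configurations could agree better with simulation.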
Subject(s)
Algorithms; Models, Theoretical; Particle Accelerators/instrumentation; Radiation Monitoring/methods; Radiation Protection/methods; Computer Simulation; Neutrons; Radiation Dosage
ABSTRACT
OBJECTIVE: S100B is a neurotrophic factor that is involved in neuroplasticity. Neuroplasticity is disrupted in depression; however, treatment with antidepressants can restore neuroplasticity. S100B has previously been used as a biological marker for neuropathology and neuroplasticity; therefore, in this study, we compared serum S100B levels in depressive patients to those of normal controls. In addition, we compared the serum S100B levels of antidepressant responders to those of nonresponders. METHODS: Thirty-five normal controls and 59 depressive patients were enrolled in this study. Depressive patients entered a 6-week clinical trial that included treatment with antidepressants. The serum S100B levels and clinical assessments, which included Hamilton depression rating scores, were measured at baseline and after 6 weeks of treatment with antidepressants. The differences in the serum S100B levels between depressive patients and normal controls and between antidepressant responders and nonresponders were then compared. RESULTS: There were no significant differences in the serum S100B levels of normal controls and depressive patients. In addition, 30 of the depressive patients responded to antidepressant treatment while 29 did not. Finally, the responders had significantly higher baseline serum S100B levels than the nonresponders. CONCLUSION: The results of this study suggest that the baseline serum S100B level is associated with the subsequent response to antidepressants. In addition, the high baseline serum S100B level that was observed in depressive patients may enhance neuroplasticity, which results in a favorable therapeutic response to antidepressants.
ABSTRACT
Changes in P300 amplitude were used as an indicator of reactivity to smoking-related stimuli in smokers. The amplitude of P300, a component of event-related brain potentials (ERPs), elicited by 10 smoking-related (craving), 10 antismoking (aversive), and 10 neutral stimuli was recorded in smokers (n=10) and nonsmokers (n=10). Electroencephalography (EEG) data were obtained with the Laxtha EEG-monitoring device in the EEG recording room and were recorded at F3, F4, C3, and C4. A three-way repeated-measures analysis of variance (ANOVA) was computed on the P300 amplitudes. The factors were group (smokers, nonsmokers), stimulus (craving, aversive, neutral), and electrode location (F3, F4, C3, and C4). The main effect of stimulus was significant, but group did not interact significantly with the other factors. An interesting observation was the similarity between P300 waveforms for craving and aversive stimuli in smokers. These findings could indicate that the antismoking-related response is similar to the smoking-related one.