Results 1 - 20 of 31
1.
Genomics & Informatics ; : e40-2023.
Article in English | WPRIM | ID: wpr-1000717

ABSTRACT

Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with a large number of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline’s performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming and the removal of chimeric or unclassifiable reads, whereas DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4%. Because HmmUFOtu is a closed-reference clustering method, it facilitates merging separately processed datasets; OTUs shared between the two datasets showed a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot showed a cohesive mixture of the two datasets, the third dimension revealed a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data, and highlights the strengths of HmmUFOtu and its potential for dataset merging.
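Because closed-reference OTU IDs point at the same reference sequences in every run, separately processed count tables can be joined directly on their keys. The sketch below assumes each table is a plain mapping from OTU ID to total read count and that the reported correlation is Pearson's; the OTU IDs and counts are hypothetical toy data, not the published tables. It shows the retention-rate arithmetic, a merge, and the log-scale correlation over shared OTUs.

```python
import math
from collections import Counter


def retention_rate(read_pairs_in: int, read_pairs_kept: int) -> float:
    """Fraction of input read pairs that survive trimming, chimera removal and classification."""
    return read_pairs_kept / read_pairs_in


def merge_otu_tables(*tables: dict) -> dict:
    """Sum counts per OTU ID; meaningful only because closed-reference IDs are shared across runs."""
    merged = Counter()
    for table in tables:
        merged.update(table)  # Counter.update adds counts instead of replacing them
    return dict(merged)


def pearson(xs: list, ys: list) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def shared_otu_log_correlation(table_a: dict, table_b: dict) -> float:
    """Correlate log10 total abundance over the OTU IDs present in both tables."""
    shared = sorted(table_a.keys() & table_b.keys())
    xs = [math.log10(table_a[otu]) for otu in shared]
    ys = [math.log10(table_b[otu]) for otu in shared]
    return pearson(xs, ys)


# Hypothetical per-run totals; OTU_9 and OTU_7 appear in only one run each.
run1 = {"OTU_1": 10, "OTU_2": 100, "OTU_3": 1000, "OTU_9": 5}
run2 = {"OTU_1": 100, "OTU_2": 1000, "OTU_3": 10000, "OTU_7": 7}
merged = merge_otu_tables(run1, run2)
r = shared_otu_log_correlation(run1, run2)
```

OTUs found in only one run are kept in the merged table but excluded from the correlation, since a log abundance is undefined for a zero count.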

2.
Gut and Liver ; : 85-91, 2021.
Article in English | WPRIM | ID: wpr-874566

ABSTRACT

Background/Aims: Risk prediction models using a deep neural network (DNN) have not been reported for predicting the risk of advanced colorectal neoplasia (ACRN). The aim of this study was to compare DNN models with simple clinical score models for predicting the risk of ACRN in colorectal cancer screening. Methods: Databases of screening colonoscopy from Kangbuk Samsung Hospital (n=121,794) and Kyung Hee University Hospital at Gangdong (n=3,728) were used to develop DNN-based prediction models. Two DNN models, the Asian-Pacific Colorectal Screening (APCS) model and the Korean Colorectal Screening (KCS) model, were developed and compared with two simple score models built using logistic regression to predict the risk of ACRN. The areas under the receiver operating characteristic curves (AUCs) of the models were compared in internal and external validation databases. Results: In the internal validation set, the AUCs of DNN model 1 and the APCS score model were 0.713 and 0.662 (p […] 0.1). Conclusions: Simple score models for the risk prediction of ACRN are as useful as DNN-based models when input variables are limited. However, further studies are warranted on predicting the risk of ACRN in colorectal cancer screening, because DNN-based models are still being improved.
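The model comparison above rests on the area under the ROC curve. As a minimal sketch with hypothetical risk scores (not the study's data), the AUC can be computed as the Mann–Whitney probability that a randomly chosen subject with ACRN receives a higher score than a randomly chosen subject without it, with ties counted as one half. The abstract does not say how its p-values were obtained, so no significance test is shown.

```python
def auc(pos_scores: list, neg_scores: list) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    Pairwise O(n_pos * n_neg) loop: fine for illustration, too slow for
    cohorts of 100,000+ subjects, where a rank-based formula is preferable.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical risk scores from one model.
acrn = [0.9, 0.3]     # subjects with advanced colorectal neoplasia
no_acrn = [0.5, 0.1]  # subjects without
print(auc(acrn, no_acrn))  # 3 of 4 pairs ranked correctly -> 0.75
```

Because the AUC depends only on the ranking of scores, a simple integer score and a DNN output can be compared on the same footing.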

3.
Genomics & Informatics ; : e26-2020.
Article in English | WPRIM | ID: wpr-890708

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has been widely applied to provide insights into cell-by-cell expression differences within a given bulk sample. Accordingly, numerous analysis methods have been developed. Because these analyses involve many cells and genes simultaneously, the efficiency of the methods is crucial. The conventional cell type annotation method is laborious and subjective. Here we propose a semi-automatic method that calculates a normalized score for each cell type based on a user-supplied list of cell type–specific marker genes. The method was applied to a publicly available scRNA-seq dataset of a mouse cardiac non-myocyte cell pool. Annotating the 35 t-distributed stochastic neighbor embedding (t-SNE) clusters into 12 cell types was straightforward, and its accuracy was evaluated by constructing a co-expression network for each cell type. Gene Ontology analysis was congruent with the annotated cell types, and the corollary regulatory network analysis identified upstream transcription factors with well-supported literature evidence. The source code is available as an R script upon request.
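The abstract does not give the formula for its normalized cell-type score. As an illustrative assumption only, one common way to score clusters against a marker list is to z-score each gene's mean expression across clusters, average the z-scores of each cell type's markers, and label each cluster with the highest-scoring type. The function names, gene choices, and expression values below are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expr: dict) -> dict:
    """Z-score each gene's per-cluster mean expression across clusters."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def annotate_clusters(expr: dict, markers: dict) -> list:
    """Label each cluster with the cell type whose markers have the highest mean z-score.

    expr:    gene -> list of mean expression values, one per cluster
    markers: cell type -> list of user-supplied marker genes
    """
    z = zscore_rows(expr)
    n_clusters = len(next(iter(expr.values())))
    labels = []
    for c in range(n_clusters):
        scores = {}
        for cell_type, genes in markers.items():
            present = [g for g in genes if g in z]  # ignore markers absent from the data
            scores[cell_type] = mean([z[g][c] for g in present]) if present else float("-inf")
        labels.append(max(scores, key=scores.get))
    return labels


# Hypothetical per-cluster means for three clusters.
expr = {
    "Pecam1": [10, 0, 0], "Cdh5": [8, 1, 0],
    "Col1a1": [0, 9, 1], "Dcn": [1, 7, 0],
    "Ptprc": [0, 0, 12],
}
markers = {
    "endothelial": ["Pecam1", "Cdh5"],
    "fibroblast": ["Col1a1", "Dcn"],
    "immune": ["Ptprc"],
}
labels = annotate_clusters(expr, markers)  # ["endothelial", "fibroblast", "immune"]
```

Z-scoring per gene keeps a single highly expressed marker from dominating a cell type's score.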

4.
Genomics & Informatics ; : e26-2020.
Article in English | WPRIM | ID: wpr-898412

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has been widely applied to provide insights into cell-by-cell expression differences within a given bulk sample. Accordingly, numerous analysis methods have been developed. Because these analyses involve many cells and genes simultaneously, the efficiency of the methods is crucial. The conventional cell type annotation method is laborious and subjective. Here we propose a semi-automatic method that calculates a normalized score for each cell type based on a user-supplied list of cell type–specific marker genes. The method was applied to a publicly available scRNA-seq dataset of a mouse cardiac non-myocyte cell pool. Annotating the 35 t-distributed stochastic neighbor embedding (t-SNE) clusters into 12 cell types was straightforward, and its accuracy was evaluated by constructing a co-expression network for each cell type. Gene Ontology analysis was congruent with the annotated cell types, and the corollary regulatory network analysis identified upstream transcription factors with well-supported literature evidence. The source code is available as an R script upon request.

5.
Genomics & Informatics ; : 128-135, 2017.
Article in English | WPRIM | ID: wpr-192020

ABSTRACT

As next-generation sequencing technologies have advanced, enormous amounts of whole-genome sequence information for various species have been released. However, it is still difficult to assemble whole genomes precisely because of the inherent limitations of short-read sequencing technologies. In particular, plant genomes are far more complex than those of microorganisms or animals because of whole-genome duplications, repeat insertions, Numt insertions, and other events. In this study, we describe a new method for detecting misassembled sequence regions of Brassica rapa using genotyping-by-sequencing followed by MadMapper clustering. The misassembly candidate regions were cross-checked with BAC clone paired-end library sequences that had been mapped to the reference genome. The results were further verified with gene synteny relations between Brassica rapa and Arabidopsis thaliana. We conclude that this method will help detect misassembled regions and can be applied to incompletely assembled reference genomes of a variety of species.


Subject(s)
Animals , Arabidopsis , Brassica rapa , Clone Cells , Genome , Methods , Synteny
6.
Genomics & Informatics ; : 178-182, 2017.
Article in English | WPRIM | ID: wpr-192013

ABSTRACT

Next-generation sequencing (NGS) technology has become widespread in genomics research. Many software programs and automated pipelines are available to analyze NGS data, which eases the burden on traditional scientists who are not familiar with computer programming. However, downstream analyses, such as finding differentially expressed genes or visualizing linkage disequilibrium maps and genome-wide association study (GWAS) data, remain a challenge. Here, we introduce a dockerized web application written in R using the Shiny platform to visualize pre-analyzed RNA sequencing and GWAS data. In addition, we have integrated a genome browser based on the JBrowse platform and an automated intermediate parsing process required for custom track construction, so that users can easily build and navigate their personal genome tracks with in-house datasets. This application will help scientists perform a series of downstream analyses and gain a more integrative understanding of various types of genomic data by interactively visualizing them with customizable options.


Subject(s)
Humans , Dataset , Genome , Genome-Wide Association Study , Genomics , Linkage Disequilibrium , Sequence Analysis, RNA
7.
Genomics & Informatics ; : 29-33, 2016.
Article in English | WPRIM | ID: wpr-193407

ABSTRACT

A retron is a bacterial retroelement that encodes an RNA gene and a reverse transcriptase (RT). Once transcribed, the RNA serves as the template and primer for reverse transcription by the RT. The resulting DNA is covalently linked to the upstream part of the RNA; this chimera, called multicopy single-stranded DNA (msDNA), is an extrachromosomal DNA found in many bacterial species. Based on the conserved features of the eight known msDNA sequences, we developed a detection method and applied it to scan the National Center for Biotechnology Information (NCBI) RefSeq bacterial genome sequences. Among 16,844 bacterial sequences possessing a retron-type RT domain, we identified 48 unique types of msDNA. Currently, the biological role of msDNA is not well understood. Our method will be a useful tool for studying the distribution, evolution, and physiological role of msDNA.


Subject(s)
Biotechnology , Chimera , DNA , DNA, Single-Stranded , Genome, Bacterial , Retroelements , Reverse Transcription , RNA , RNA-Directed DNA Polymerase
8.
Genomics & Informatics ; : 90-95, 2016.
Article in English | WPRIM | ID: wpr-117342

ABSTRACT

Nuclear mitochondrial DNA segment (Numt) insertion refers to the well-known transfer of mitochondrial DNA into a eukaryotic nuclear genome. However, the phenomenon is not well understood, especially in plants, and Numt insertion patterns vary from species to species and across kingdoms. In this study, we surveyed these patterns in nine plant species and found several clues. First, when the mitochondrial genome is relatively large, the proportion of longer Numts is also larger than that of shorter ones. Second, whole-genome duplication events increase the proportion of shorter Numts in the size distribution. Third, Numt insertions are enriched in exon regions. This analysis may help in understanding plant evolution.


Subject(s)
DNA, Mitochondrial , Exons , Genome , Genome, Mitochondrial , Plants
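The third observation in the Numt abstract above, that insertions are enriched in exons, amounts to comparing the share of Numt sequence falling in exons with the share of the genome that is exonic. A minimal sketch, assuming half-open (start, end) intervals on a single sequence, no overlaps within each interval list, and hypothetical coordinates; the abstract does not state how enrichment was tested.

```python
def overlap_bp(intervals_a: list, intervals_b: list) -> int:
    """Total base pairs shared by two lists of half-open (start, end) intervals."""
    total = 0
    for s1, e1 in intervals_a:
        for s2, e2 in intervals_b:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def exon_fold_enrichment(numts: list, exons: list, genome_length: int) -> float:
    """(fraction of Numt bp lying in exons) / (fraction of genome bp that is exonic)."""
    numt_bp = sum(e - s for s, e in numts)
    exon_bp = sum(e - s for s, e in exons)
    observed = overlap_bp(numts, exons) / numt_bp
    expected = exon_bp / genome_length
    return observed / expected


# Toy genome of 1,000 bp: 10% exonic, but half of the single Numt lies in the exon.
fold = exon_fold_enrichment(numts=[(50, 150)], exons=[(0, 100)], genome_length=1000)
```

A fold value above 1 indicates more Numt sequence in exons than expected from exon coverage alone; here it is 5.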
9.
Genomics & Informatics ; : 211-215, 2016.
Article in English | WPRIM | ID: wpr-172198

ABSTRACT

Changes in alternative splicing patterns affect the quantity of functional proteins, leading to phenotypic variation. Splicing quantitative trait loci (sQTLs) are among the main genetic elements affecting splicing patterns. Here, we report the results of a genome-wide sQTL analysis across 141 strains of Arabidopsis thaliana using publicly available next-generation sequencing datasets. We found 1,694 candidate sQTLs in Arabidopsis thaliana at a false discovery rate of 0.01. Furthermore, 25 of the candidate sQTLs overlapped with previously examined trait-associated single nucleotide polymorphisms (SNPs). In summary, this sQTL analysis provides new insight into the genetic elements affecting alternative splicing patterns in Arabidopsis thaliana and into the mechanisms underlying previously reported trait-associated SNPs.


Subject(s)
Alternative Splicing , Arabidopsis , Dataset , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
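The sQTL abstract above reports its 1,694 candidates at a false discovery rate of 0.01 but does not name the FDR procedure. Assuming the widely used Benjamini–Hochberg step-up procedure, selection at that level can be sketched as follows; the p-values are toy numbers.

```python
def benjamini_hochberg(pvalues: list, alpha: float = 0.01) -> list:
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Every test ranked at or below k_max is rejected, even if its own p exceeds its threshold.
    return sorted(order[:k_max])


# Toy p-values for five SNP-splicing tests.
hits = benjamini_hochberg([0.001, 0.2, 0.004, 0.5, 0.003], alpha=0.01)  # [0, 2, 4]
```

The step-up form controls the expected fraction of false discoveries among the reported sQTLs, which is looser than the family-wise control of a Bonferroni cut-off.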
10.
Genomics & Informatics ; : 76-80, 2015.
Article in English | WPRIM | ID: wpr-216095

ABSTRACT

Type 2 diabetes mellitus is a complex metabolic disorder associated with multiple genetic, developmental, and environmental factors. Recent advances in gene expression microarray technologies, as well as network-based analysis methodologies, provide groundbreaking opportunities to study type 2 diabetes mellitus. In the present study, we used previously published gene expression microarray datasets of human skeletal muscle samples collected from 20 insulin-sensitive individuals before and after insulin treatment to construct an insulin-mediated regulatory network. Using a motif discovery method implemented in iRegulon, a Cytoscape app, we identified 25 candidate regulons whose motifs were enriched among the promoters of 478 up-regulated and 82 down-regulated genes. We then searched for a hierarchical network of the candidate regulators in which the conditional combination of their expression changes could explain those of their target genes. Using Genomica, a software tool for regulatory network construction, we obtained a hierarchical network of eight regulons that were used to map the insulin downstream signaling network. Taken together, the results illustrate the benefits of combining fundamentally different methods, such as motif-based regulatory factor discovery and expression level–based construction of the regulatory network of their target genes, for understanding insulin-induced biological processes and signaling pathways.


Subject(s)
Humans , Biological Phenomena , Dataset , Diabetes Mellitus, Type 2 , Gene Expression , Insulin , Methods , Muscle, Skeletal , Regulon , Transcription Factors
11.
Genomics & Informatics ; : 165-170, 2014.
Article in English | WPRIM | ID: wpr-61847

ABSTRACT

Genome-wide association (GWA) studies have found many important genetic variants that affect various traits. Because these studies can be used to investigate untyped but causal variants through linkage disequilibrium (LD), it would be useful to explore the haplotypes of single-nucleotide polymorphisms (SNPs) within the same LD block as significant associations, based on high-density variants from population references. Here, we sought to build a catalog of haplotypes affecting body mass index (BMI) through an integrative analysis of previously published whole-genome next-generation sequencing (NGS) data from 7 representative Korean individuals and previously known Korean GWA signals. We selected 435 SNPs that were significantly associated with BMI in the GWA analysis and searched 53 LD ranges near those SNPs. Using the NGS data, haplotypes were phased within these LD ranges. A total of 44 possible haplotype blocks for Korean BMI were cataloged. Although the current dataset is small, this study provides new insights that may help to identify important haplotypes for traits and low variants near significant SNPs. Furthermore, a more comprehensive catalog can be built as larger datasets become available.


Subject(s)
Body Mass Index , Dataset , Genome-Wide Association Study , Haplotypes , Korea , Linkage Disequilibrium , Polymorphism, Single Nucleotide
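The BMI haplotype catalog above depends on linkage disequilibrium between SNPs on phased haplotypes. As a sketch of the standard pairwise measures D and r² computed from phased data, assuming biallelic SNPs coded 0/1 and toy haplotypes (seven diploid genomes would give 14 haplotypes per SNP pair); the abstract does not say which LD measure or block definition was used.

```python
def ld_stats(haplotypes: list) -> tuple:
    """Return (D, r2) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n             # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n             # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # undefined for monomorphic SNPs; report 0
    return d, r2


perfect = ld_stats([(0, 0), (0, 0), (1, 1), (1, 1)])     # (0.25, 1.0)
independent = ld_stats([(0, 0), (0, 1), (1, 0), (1, 1)])  # (0.0, 0.0)
```

An r² near 1 means one SNP almost perfectly tags the other, which is what lets a typed GWA signal stand in for an untyped causal variant in the same block.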
12.
Genomics & Informatics ; : 171-180, 2014.
Article in English | WPRIM | ID: wpr-61846

ABSTRACT

Aberrant DNA methylation, an epigenetic marker of cancer, influences tumor development and progression. We downloaded publicly available DNA methylation and gene expression datasets of matched cancer and normal pairs from the Cancer Genome Atlas Data Portal and performed a systematic computational analysis. This study had three aims: to screen for genes that show hypermethylation and downregulation in colorectal cancers, to identify differentially methylated regions in one of these genes, SFRP1, and to test whether the SFRP genes affect survival. Our results show that 31 hypermethylated genes had a negative correlation with gene expression. Among them, SFRP1 showed a differential methylation pattern at each methylation site. We also show that SFRP1 may be a potential biomarker for colorectal cancer survival.


Subject(s)
Colorectal Neoplasms , Computer Simulation , Dataset , DNA Methylation , Epigenomics , Gene Expression , Genome , Methylation , Survival Analysis
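The first aim of the colorectal cancer study above, finding genes that are hypermethylated, downregulated, and negatively correlated with their own expression, can be sketched as a simple filter. This assumes per-gene beta values and expression values for matched tumor/normal samples; the thresholds (mean beta gain of at least 0.2, Pearson r of at most −0.5) and the toy data are illustrative choices, not the study's criteria.

```python
import math
from statistics import mean


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_genes(meth_tumor, meth_normal, expr_tumor, expr_normal,
                 min_delta_beta=0.2, max_r=-0.5):
    """Genes hypermethylated and downregulated in tumors, with methylation-expression r <= max_r."""
    hits = []
    for gene in meth_tumor:
        gain = mean(meth_tumor[gene]) - mean(meth_normal[gene])
        down = mean(expr_tumor[gene]) < mean(expr_normal[gene])
        # Correlate over all samples, tumor and normal pooled (matched order assumed).
        r = pearson(meth_tumor[gene] + meth_normal[gene], expr_tumor[gene] + expr_normal[gene])
        if gain >= min_delta_beta and down and r <= max_r:
            hits.append(gene)
    return hits


# Toy data: three matched tumor/normal pairs per gene.
meth_t = {"SFRP1": [0.8, 0.7, 0.9], "CONTROL": [0.5, 0.5, 0.6]}
meth_n = {"SFRP1": [0.2, 0.3, 0.1], "CONTROL": [0.5, 0.6, 0.5]}
expr_t = {"SFRP1": [1.0, 2.0, 0.5], "CONTROL": [5.0, 5.1, 4.9]}
expr_n = {"SFRP1": [8.0, 7.0, 9.0], "CONTROL": [5.0, 4.9, 5.1]}
hits = screen_genes(meth_t, meth_n, expr_t, expr_n)  # ["SFRP1"]
```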
13.
Genomics & Informatics ; : 101-101, 2013.
Article in English | WPRIM | ID: wpr-58528

ABSTRACT

No abstract available.

14.
Genomics & Informatics ; : 135-141, 2013.
Article in English | WPRIM | ID: wpr-58523

ABSTRACT

Gene set analysis is a powerful tool for interpreting genome-wide association study results and is gaining popularity. Comparing the gene sets obtained for a variety of traits measured in a single genetic epidemiology dataset may give insights into the biological mechanisms underlying these traits. Based on previously published single nucleotide polymorphism (SNP) genotype data for 8,842 individuals enrolled in the Korea Association Resource project, we performed a series of systematic genome-wide association analyses for 49 quantitative traits covering basic epidemiological, anthropometric, and blood chemistry parameters. Each analysis result was then subjected to gene set analysis based on Gene Ontology (GO) terms using the gene set analysis software GSA-SNP, identifying a set of GO terms significantly associated with each trait (corrected p < 0.05). Pairwise comparison of the traits in terms of the semantic similarity of their GO sets revealed surprising cases in which phenotypically uncorrelated traits showed high similarity in terms of biological pathways. For example, the pH level was related to 7 other traits that showed low phenotypic correlations with it. A literature survey implies that these traits may be regulated partly by common pathways that involve neuronal or nervous systems.


Subject(s)
Genome-Wide Association Study , Genotype , Hydrogen-Ion Concentration , Korea , Molecular Epidemiology , Neurons , Polymorphism, Single Nucleotide , Semantics
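The study above compares traits by the semantic similarity of their significant GO term sets. Proper GO semantic similarity measures use the ontology graph (for example, the information content of shared ancestor terms); the sketch below deliberately swaps in plain Jaccard overlap of term sets, which ignores the graph, just to show the all-pairs comparison structure. Trait names and term labels are placeholders.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_similarity(trait_terms: dict) -> dict:
    """Similarity for every unordered trait pair, keyed by (trait1, trait2) in sorted order."""
    return {
        (t1, t2): jaccard(trait_terms[t1], trait_terms[t2])
        for t1, t2 in combinations(sorted(trait_terms), 2)
    }


# Placeholder term sets standing in for each trait's significant GO terms.
sims = pairwise_similarity({
    "pH": {"term_a", "term_b", "term_c"},
    "BMI": {"term_b", "term_c", "term_d"},
    "height": {"term_x"},
})
```

Pairs with high set similarity but low phenotypic correlation are the "surprising cases" the abstract describes; a graph-aware measure would also credit terms that are related but not identical.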
15.
Genomics & Informatics ; : 163-163, 2013.
Article in English | WPRIM | ID: wpr-11260

ABSTRACT

No abstract available.

16.
Genomics & Informatics ; : 1-1, 2013.
Article in English | WPRIM | ID: wpr-177972

ABSTRACT

No abstract available.

17.
Genomics & Informatics ; : 59-59, 2013.
Article in English | WPRIM | ID: wpr-164845

ABSTRACT

No abstract available.

18.
Genomics & Informatics ; : 123-127, 2012.
Article in English | WPRIM | ID: wpr-57571

ABSTRACT

Gene set analysis (GSA) is useful for interpreting a genome-wide association study (GWAS) result in terms of biological mechanisms. We compared the performance of two GSA implementations, GSA-SNP and i-GSEA4GWAS, which accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, under the same input and parameter settings. GSA runs were made with two sets of p-values from a Korean type 2 diabetes mellitus GWAS: 259,188 and 1,152,947 SNPs from the original and imputed genotype datasets, respectively. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively, whereas GSA-SNP reported 94 and 38 hits for the same two datasets. Similar but weaker trends were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets. The very large number of hits from i-GSEA4GWAS for the imputed dataset was probably an artifact of the scaling step in its algorithm. The decrease in hits from GSA-SNP for the imputed dataset may be because it relies on Z-statistics, which are sensitive to variations in the background level of association. Judicious evaluation of GSA outcomes, perhaps based on multiple programs, is recommended.


Subject(s)
Artifacts , Diabetes Mellitus, Type 2 , Genome , Genome-Wide Association Study , Genotype , Hand , Polymorphism, Single Nucleotide
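The GSA comparison above attributes GSA-SNP's drop in hits to its Z-statistic being sensitive to the background level of association. Assuming a PAGE-style form, Z = (mean score in the set − mean score over all genes) / (SD over all genes / √n), with gene scores such as −log10 p (GSA-SNP's exact gene-score definition may differ), the sensitivity is easy to see: the set is always judged against the genome-wide mean and SD.

```python
import math
from statistics import mean, pstdev


def gene_set_z(gene_scores: dict, gene_set: set) -> float:
    """PAGE-style Z for one gene set (assumed form, for illustration).

    gene_scores: gene -> score, e.g. -log10 of the gene's summary p-value
    gene_set:    genes in the set; genes without a score are skipped
    """
    all_scores = list(gene_scores.values())
    mu, sigma = mean(all_scores), pstdev(all_scores)
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    return (mean(members) - mu) / (sigma / math.sqrt(len(members)))


gene_set = {"g1", "g2"}
z_before = gene_set_z({"g1": 4.0, "g2": 4.0, "g3": 0.0, "g4": 0.0}, gene_set)  # about 1.41
# Same set scores, but one background gene's association strengthens.
z_after = gene_set_z({"g1": 4.0, "g2": 4.0, "g3": 2.0, "g4": 0.0}, gene_set)   # about 1.28
```

The set's own scores are unchanged between the two calls, yet Z falls because the background mean and SD moved, mirroring the abstract's explanation for fewer hits on the imputed data.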
19.
Genomics & Informatics ; : 213-213, 2012.
Article in English | WPRIM | ID: wpr-11763

ABSTRACT

No abstract available.

20.
Genomics & Informatics ; : 234-238, 2012.
Article in English | WPRIM | ID: wpr-11759

ABSTRACT

Genetic epidemiology studies have established that natural variation in gene expression profiles is heritable and has a genetic basis. A number of proximal and remote DNA variants associated with expression phenotypes, known as expression quantitative trait loci (eQTLs), have been identified, first in Epstein-Barr virus-transformed lymphoblastoid cell lines and later in other cell and tissue types. Integrating eQTL information with network analysis of transcription modules may lead to a better understanding of gene expression regulation. Because these network modules are relevant to biological or disease pathways, these findings may be useful in predicting disease susceptibility.


Subject(s)
Cell Line , Disease Susceptibility , DNA , Gene Expression Regulation , Metagenomics , Molecular Epidemiology , Phenotype , Quantitative Trait Loci , Transcriptome
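An eQTL, as described in the review above, is typically detected by regressing a gene's expression on genotype dosage (0, 1 or 2 copies of an allele) across individuals. A minimal sketch of that single-SNP test, using ordinary least squares without covariates on hypothetical toy data; real eQTL studies add covariates and multiple-testing correction, both omitted here.

```python
import math
from statistics import mean


def eqtl_regression(dosages: list, expression: list) -> tuple:
    """Fit expression = a + b * dosage by OLS; return (slope b, t statistic for b)."""
    n = len(dosages)
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, expression))
    if rss == 0:
        return slope, math.inf  # perfect fit: standard error is zero
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se


# Toy data: each extra allele copy raises expression by about one unit.
slope, t = eqtl_regression([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
```

The additive dosage coding assumes each allele copy shifts expression by the same amount; the t statistic is then compared against a t distribution with n − 2 degrees of freedom.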