Search | VHL Search Portal

1.

Surveying mutation density patterns around specific genomic features.

Yu, Hui; Ness, Scott; Li, Chung-I; Bai, Yongsheng; Mao, Peng; Guo, Yan.

Genome Res ; 32(10): 1930-1940, 2022 10.

Article in English | MEDLINE | ID: mdl-36100435

ABSTRACT

Mutation density patterns reveal unique biological properties of specific genomic regions and shed light on the mechanisms of carcinogenesis. Although previous studies reported insightful mutation density patterns associated with certain genomic regions such as transcription start sites and DNA replication origins, a tool that can systematically investigate mutational spatial patterns is still lacking. Thus, we developed MutDens, a bioinformatic tool for comprehensive analysis of mutation density patterns around genomic features, namely, genomic positions, in humans and model species. By scanning the bidirectional vicinity regions of given positions, MutDens systematically characterizes the mutation density for single-base substitution mutational classes after adjusting for total mutation burden and local nucleotide proportion. Analysis results using MutDens not only verified the previously reported transcriptional strand bias around transcription start sites and replicative strand bias around DNA replication origins, but also identified novel mutation density patterns around other genomics features, such as enhancers and retrotransposon insertion polymorphism sites. To our knowledge, MutDens is the first tool that systematically calculates, examines, and compares mutation density patterns, thus providing a valuable avenue for investigating the mutational landscapes associated with important genomic features.

Subject(s)

Genomics , Replication Origin , Humans , Mutation , Transcription Initiation Site , DNA
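
Illustrative note on entry 1: the idea of scanning the bidirectional vicinity of given positions can be sketched as a simple binned count. This is a minimal sketch and not MutDens itself; the function name `density_profile`, its parameters, and the single-chromosome integer coordinates are assumptions made for illustration, and the adjustments for total mutation burden and local nucleotide proportion described in the abstract are deliberately omitted.

```python
from collections import Counter


def density_profile(mutations, features, flank=1000, bin_size=100):
    """Mutations per base pair per feature, binned by offset from each feature.

    mutations, features: integer coordinates on one chromosome (hypothetical inputs).
    The window is [-flank, +flank) around every feature position, and flank is
    assumed to be a multiple of bin_size. Bins run from upstream to downstream.
    """
    n_bins = 2 * flank // bin_size
    counts = Counter()
    for f in features:
        for m in mutations:
            offset = m - f
            if -flank <= offset < flank:
                # Shift the offset so the most upstream bin has index 0.
                counts[(offset + flank) // bin_size] += 1
    scale = bin_size * max(len(features), 1)
    return [counts[b] / scale for b in range(n_bins)]
```

Comparing such profiles between strands or mutational classes would require the normalization steps the authors describe, which this sketch does not attempt.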

2.

Somatic mutation effects diffused over microRNA dysregulation.

Yu, Hui; Jiang, Limin; Li, Chung-I; Ness, Scott; Piccirillo, Sara G M; Guo, Yan.

Bioinformatics ; 39(9), 2023 09 02.

Article in English | MEDLINE | ID: mdl-37624931

ABSTRACT

MOTIVATION: As an important player in transcriptome regulation, microRNAs may effectively diffuse somatic mutation impacts to broad cellular processes and ultimately manifest disease and dictate prognosis. Previous studies that tried to correlate mutation with gene expression dysregulation neglected to adjust for the disparate multitudes of false positives associated with unequal sample sizes and uneven class balancing scenarios. RESULTS: To properly address this issue, we developed a statistical framework to rigorously assess the extent of mutation impact on microRNAs in relation to a permutation-based null distribution of a matching sample structure. Carrying out the framework in a pan-cancer study, we ascertained 9008 protein-coding genes with statistically significant mutation impacts on miRNAs. Of these, the collective miRNA expression for 83 genes showed significant prognostic power in nine cancer types. For example, in lower-grade glioma, 10 genes' mutations broadly impacted miRNAs, all of which showed prognostic value with the corresponding miRNA expression. Our framework was further validated with functional analysis and augmented with rich features including the ability to analyze miRNA isoforms; aggregative prognostic analysis; advanced annotations such as mutation type, regulator alteration, somatic motif, and disease association; and instructive visualization such as mutation OncoPrint, Ideogram, and interactive mRNA-miRNA network. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in MutMix, at http://innovebioinfo.com/Database/TmiEx/MutMix.php.

Subject(s)

Glioma , MicroRNAs , Humans , Diffusion , MicroRNAs/genetics , Mutation , RNA, Messenger
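
Illustrative note on entry 2: a permutation-based null distribution with a matching sample structure can be sketched by shuffling the mutation labels, which keeps the numbers of mutated and wild-type samples fixed. This is a generic permutation test under stated assumptions, not the authors' framework; the function name `permutation_p_value`, the mean-difference statistic, and the inputs are hypothetical.

```python
import random


def permutation_p_value(expression, mutated, n_perm=1000, seed=0):
    """Empirical p-value for the absolute mean difference in expression
    between mutated and wild-type samples (both groups must be non-empty)."""

    def abs_mean_diff(labels):
        mut = [x for x, is_mut in zip(expression, labels) if is_mut]
        wt = [x for x, is_mut in zip(expression, labels) if not is_mut]
        return abs(sum(mut) / len(mut) - sum(wt) / len(wt))

    observed = abs_mean_diff(mutated)
    rng = random.Random(seed)
    labels = list(mutated)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels preserves the group sizes, so the null
        # distribution matches the observed class imbalance.
        rng.shuffle(labels)
        if abs_mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the group sizes are preserved in every permutation, genes with very different mutation frequencies are each compared against a null that reflects their own imbalance.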

3.

Controlling the confounding effect of metabolic gene expression to identify actual metabolite targets in microsatellite instability cancers.

Li, Chung-I; Yeh, Yu-Min; Tsai, Yi-Shan; Huang, Tzu-Hsuan; Shen, Meng-Ru; Lin, Peng-Chan.

Hum Genomics ; 17(1): 18, 2023 03 06.

Article in English | MEDLINE | ID: mdl-36879264

ABSTRACT

BACKGROUND: The metabolome is the best representation of cancer phenotypes. Gene expression can be considered a confounding covariate affecting metabolite levels. Data integration across metabolomics and genomics to establish the biological relevance of cancer metabolism is challenging. This study aimed to eliminate the confounding effect of metabolic gene expression to reflect actual metabolite levels in microsatellite instability (MSI) cancers. METHODS: In this study, we propose a new strategy using covariate-adjusted tensor classification in high dimensions (CATCH) models to integrate metabolite and metabolic gene expression data to classify MSI and microsatellite stability (MSS) cancers. We used datasets from the Cancer Cell Line Encyclopedia (CCLE) phase II project and treated metabolomic data as tensor predictors and data on gene expression of metabolic enzymes as confounding covariates. RESULTS: The CATCH model performed well, with high accuracy (0.82), sensitivity (0.66), specificity (0.88), precision (0.65), and F1 score (0.65). Seven metabolite features adjusted for metabolic gene expression, namely, 3-phosphoglycerate, 6-phosphogluconate, cholesterol ester, lysophosphatidylethanolamine (LPE), phosphatidylcholine, reduced glutathione, and sarcosine, were found in MSI cancers. Only one metabolite, hippurate, was present in MSS cancers. The gene expression of phosphofructokinase 1 (PFKP), which is involved in the glycolytic pathway, was related to 3-phosphoglycerate. ALDH4A1 and GPT2 were associated with sarcosine. LPE was associated with the expression of CHPT1, which is involved in lipid metabolism. The glycolysis, nucleotide, glutamate, and lipid metabolic pathways were enriched in MSI cancers. CONCLUSIONS: We propose an effective CATCH model for predicting MSI cancer status. By controlling the confounding effect of metabolic gene expression, we identified cancer metabolic biomarkers and therapeutic targets. In addition, we provided the possible biology and genetics of MSI cancer metabolism.

Subject(s)

Microsatellite Instability , Neoplasms , Humans , Sarcosine , Glyceric Acids , Neoplasms/genetics , Biomarkers, Tumor/genetics , Gene Expression

4.

Deep neural network based tissue deconvolution of circulating tumor cell RNA.

Yan, Fengyao; Jiang, Limin; Ye, Fei; Ping, Jie; Bowley, Tetiana Y; Ness, Scott A; Li, Chung-I; Marchetti, Dario; Tang, Jijun; Guo, Yan.

J Transl Med ; 21(1): 783, 2023 11 04.

Article in English | MEDLINE | ID: mdl-37925448

ABSTRACT

Prior research has shown that the deconvolution of cell-free RNA can uncover the tissue origin. The conventional deconvolution approaches rely on constructing a reference tissue-specific gene panel, which cannot capture the inherent variation present in actual data. To address this, we have developed a novel method that utilizes a neural network framework to leverage the entire training dataset. Our approach involved training a model that incorporated 15 distinct tissue types. Through one semi-independent and two completely independent validations, including deconvolution using a semi in silico dataset, deconvolution with custom normal tissue mixture RNA-seq data, and deconvolution of longitudinal circulating tumor cell RNA-seq (ctcRNA) data from a cancer patient with metastatic tumors, we demonstrate the efficacy and advantages of the deep-learning approach, which arise from effectively capturing the inherent variability present in the dataset, thus leading to enhanced accuracy. Sensitivity analyses reveal that neural network models are less susceptible to the presence of missing data, making them more suitable for real-world applications. Moreover, by leveraging the concept of organotropism, we applied our approach to trace the migration of circulating tumor cell-derived RNA (ctcRNA) in a cancer patient with metastatic tumors, thereby highlighting the potential clinical significance of early detection of cancer metastasis.

Subject(s)

Neoplastic Cells, Circulating , RNA , Humans , Neural Networks, Computer , RNA-Seq , Sequence Analysis, RNA

5.

OrchidBase 5.0: updates of the orchid genome knowledgebase.

Chen, You-Yi; Li, Chung-I; Hsiao, Yu-Yun; Ho, Sau-Yee; Zhang, Zhe-Bin; Liao, Chien-Chi; Lee, Bing-Ru; Lin, Shao-Ting; Wu, Wan-Lin; Wang, Jeen-Shing; Zhang, Diyang; Liu, Ke-Wei; Liu, Ding-Kun; Zhao, Xue-Wei; Li, Yuan-Yuan; Ke, Shi-Jie; Zhou, Zhuang; Huang, Ming-Zhong; Wu, Yong-Shu; Peng, Dong-Hui; Lan, Si-Ren; Chen, Hong-Hwa; Liu, Zhong-Jian; Wu, Wei-Sheng; Tsai, Wen-Chieh.

BMC Plant Biol ; 22(1): 557, 2022 Dec 02.

Article in English | MEDLINE | ID: mdl-36456919

ABSTRACT

Containing the largest number of species, the orchid family provides not only materials for studying plant evolution and environmental adaptation, but also economically and culturally important ornamental plants for human society. Previously, we collected genome and transcriptome information of Dendrobium catenatum, Phalaenopsis equestris, and Apostasia shenzhenica, which belong to two different subfamilies of Orchidaceae, and developed user-friendly tools to explore the orchid genetic sequences in OrchidBase 4.0. OrchidBase 4.0 offers the opportunity for the plant science community to compare orchid genomes and transcriptomes and retrieve orchid sequences for further study. In the year 2022, two whole-genome sequences of Orchidoideae species, Platanthera zijinensis and Platanthera guangdongensis, were de novo sequenced, assembled and analyzed. In addition, systemic transcriptomes from these two species were also established. Therefore, we included these datasets to develop the new version, OrchidBase 5.0. In addition, three new functions including synteny, gene order, and miRNA information were also developed for orchid genome comparisons and miRNA characterization. OrchidBase 5.0 extended the genetic information to three orchid subfamilies (including five orchid species) and provided new tools for orchid researchers to analyze orchid genomes and transcriptomes. The online resources can be accessed at https://cosbi.ee.ncku.edu.tw/orchidbase5/.

Subject(s)

MicroRNAs , Orchidaceae , Gene Order , Knowledge Bases , MicroRNAs/genetics , Orchidaceae/genetics , Synteny

6.

Validation of a Deep Learning-based Automatic Detection Algorithm for Measurement of Endotracheal Tube-to-Carina Distance on Chest Radiographs.

Huang, Min-Hsin; Chen, Chi-Yeh; Horng, Ming-Huwi; Li, Chung-I; Hsu, I-Lin; Su, Che-Min; Sun, Yung-Nien; Lai, Chao-Han.

Anesthesiology ; 137(6): 704-715, 2022 12 01.

Article in English | MEDLINE | ID: mdl-36129686

ABSTRACT

BACKGROUND: Improper endotracheal tube (ETT) positioning is frequently observed and potentially hazardous in the intensive care unit. The authors developed a deep learning-based automatic detection algorithm detecting the ETT tip and carina on portable supine chest radiographs to measure the ETT-carina distance. This study investigated the hypothesis that the algorithm might be more accurate than frontline critical care clinicians in ETT tip detection, carina detection, and ETT-carina distance measurement. METHODS: A deep learning-based automatic detection algorithm was developed using 1,842 portable supine chest radiographs of 1,842 adult intubated patients, where two board-certified intensivists worked together to annotate the distal ETT end and tracheal bifurcation. The performance of the deep learning-based algorithm was assessed in 4-fold cross-validation (1,842 radiographs), external validation (216 radiographs), and an observer performance test (462 radiographs) involving 11 critical care clinicians. The performance metrics included the errors from the ground truth in ETT tip detection, carina detection, and ETT-carina distance measurement. RESULTS: During 4-fold cross-validation and external validation, the median errors (interquartile range) of the algorithm in ETT-carina distance measurement were 3.9 (1.8 to 7.1) mm and 4.2 (1.7 to 7.8) mm, respectively. During the observer performance test, the median errors (interquartile range) of the algorithm were 2.6 (1.6 to 4.8) mm, 3.6 (2.1 to 5.9) mm, and 4.0 (1.7 to 7.2) mm in ETT tip detection, carina detection, and ETT-carina distance measurement, significantly superior to that of 6, 10, and 7 clinicians (all P < 0.05), respectively. The algorithm outperformed 7, 3, and 0, 9, 6, and 4, and 5, 5, and 3 clinicians (all P < 0.005) regarding the proportions of chest radiographs within 5 mm, 10 mm, and 15 mm error in ETT tip detection, carina detection, and ETT-carina distance measurement, respectively. No clinician was significantly more accurate than the algorithm in any comparison. CONCLUSIONS: A deep learning-based algorithm can match or even outperform frontline critical care clinicians in ETT tip detection, carina detection, and ETT-carina distance measurement.

Subject(s)

Deep Learning , Adult , Humans , Trachea , Intubation, Intratracheal , Radiography , Mediastinum

7.

OrchidBase 4.0: a database for orchid genomics and molecular biology.

Hsiao, Yu-Yun; Fu, Chih-Hsiung; Ho, Sau-Yee; Li, Chung-I; Chen, You-Yi; Wu, Wan-Lin; Wang, Jeen-Shing; Zhang, Di-Yang; Hu, Wen-Qi; Yu, Xia; Sun, Wei-Hong; Zhou, Zhuang; Liu, Ke-Wei; Huang, Laiqiang; Lan, Si-Ren; Chen, Hong-Hwa; Wu, Wei-Sheng; Liu, Zhong-Jian; Tsai, Wen-Chieh.

BMC Plant Biol ; 21(1): 371, 2021 Aug 12.

Article in English | MEDLINE | ID: mdl-34384382

ABSTRACT

BACKGROUND: The orchid family is the largest family of the monocotyledons and an economically important ornamental plant worldwide. Given the pivotal role of this plant to humans, botanical researchers and breeding communities should have access to valuable genomic and transcriptomic information of this plant. Previously, we established OrchidBase, which contains expressed sequence tags (ESTs) from different tissues and developmental stages of Phalaenopsis as well as biotic and abiotic stress-treated Phalaenopsis. The database includes floral transcriptomic sequences from 10 orchid species across all five subfamilies of Orchidaceae. DESCRIPTION: Recently, the whole-genome sequences of Apostasia shenzhenica, Dendrobium catenatum, and Phalaenopsis equestris were de novo assembled and analyzed. These datasets were used to develop OrchidBase 4.0, including genomic and transcriptomic data for these three orchid species. OrchidBase 4.0 offers information for gene annotation, gene expression with fragments per kilobase of transcript per million mapped reads (FPKM), KEGG pathways and BLAST search. In addition, assembled genome sequences and the locations of genes and miRNAs can be visualized in the genome browser. The online resources in OrchidBase 4.0 can be accessed by browsing or using BLAST. Users can also download the assembled scaffold sequences and the predicted gene and protein sequences of these three orchid species. CONCLUSIONS: OrchidBase 4.0 is the first database that contains the whole-genome sequences and annotations of multiple orchid species. OrchidBase 4.0 is available at http://orchidbase.itps.ncku.edu.tw/.

Subject(s)

Databases, Genetic , Orchidaceae/genetics , Genome, Plant

8.

The ancestral duplicated DL/CRC orthologs, PeDL1 and PeDL2, function in orchid reproductive organ innovation.

Chen, You-Yi; Hsiao, Yu-Yun; Li, Chung-I; Yeh, Chuan-Ming; Mitsuda, Nobutaka; Yang, Hong-Xing; Chiu, Chi-Chou; Chang, Song-Bin; Liu, Zhong-Jian; Tsai, Wen-Chieh.

J Exp Bot ; 72(15): 5442-5461, 2021 07 28.

Article in English | MEDLINE | ID: mdl-33963755

ABSTRACT

Orchid gynostemium, the fused organ of the androecium and gynoecium, and ovule development are unique developmental processes. Two DROOPING LEAF/CRABS CLAW (DL/CRC) genes, PeDL1 and PeDL2, were identified from the Phalaenopsis orchid genome and functionally characterized. Phylogenetic analysis indicated that the most recent common ancestor of orchids contained the duplicated DL/CRC-like genes. Temporal and spatial expression analysis indicated that PeDL genes are specifically expressed in the gynostemium and at the early stages of ovule development. Both PeDLs could partially complement an Arabidopsis crc-1 mutant. Virus-induced gene silencing (VIGS) of PeDL1 and PeDL2 affected the number of protuberant ovule initials differentiated from the placenta. Transient overexpression of PeDL1 in Phalaenopsis orchids caused abnormal development of ovule and stigmatic cavity of gynostemium. PeDL1, but not PeDL2, could form a heterodimer with Phalaenopsis equestris CINCINNATA 8 (PeCIN8). Paralogous retention and subsequent divergence of the gene sequences of PeDL1 and PeDL2 in P. equestris might result in the differentiation of function and protein behaviors. These results reveal that the ancestral duplicated DL/CRC-like genes play important roles in orchid reproductive organ innovation.

Subject(s)

Gene Expression Regulation, Plant , Orchidaceae , Genitalia/metabolism , Orchidaceae/genetics , Orchidaceae/metabolism , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism

9.

Power and sample size calculations for high-throughput sequencing-based experiments.

Li, Chung-I; Samuels, David C; Zhao, Ying-Yong; Shyr, Yu; Guo, Yan.

Brief Bioinform ; 19(6): 1247-1255, 2018 11 27.

Article in English | MEDLINE | ID: mdl-28605403

ABSTRACT

Power/sample size (power) analysis estimates the likelihood of successfully finding the statistical significance in a data set. There has been a growing recognition of the importance of power analysis in the proper design of experiments. Power analysis is complex, yet necessary for the success of large studies. It is important to design a study that produces statistically accurate and reliable results. Power computation methods have been well established for both microarray-based gene expression studies and genotyping microarray-based genome-wide association studies. High-throughput sequencing (HTS) has greatly enhanced our ability to conduct biomedical studies at the highest possible resolution (per nucleotide). However, the complexity of power computations is much greater for sequencing data than for the simpler genotyping array data. Research on methods of power computations for HTS-based studies has been recently conducted but is not yet well known or widely used. In this article, we describe the power computation methods that are currently available for a range of HTS-based studies, including DNA sequencing, RNA-sequencing, microbiome sequencing and chromatin immunoprecipitation sequencing. Most importantly, we review the methods of power analysis for several types of sequencing data and guide the reader to the relevant methods for each data type.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Chromatin Immunoprecipitation , Genome-Wide Association Study , Heterozygote , Humans , Microbiota , Mutation , Poisson Distribution , Sequence Analysis, RNA/methods
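
Illustrative note on entry 9: the relationship between effect size, significance level, power, and sample size can be sketched with the textbook normal approximation for comparing two group means. This is a generic formula shown for orientation only, not one of the HTS-specific methods the article reviews; the function name `samples_per_group` and its parameters are assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Samples per group for a two-sided, two-sample z-test.

    delta: difference in means to detect; sigma: common standard deviation.
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    # Round up, since a fractional sample is not possible.
    return math.ceil(n)
```

Count-based sequencing data are usually modeled with Poisson or negative binomial distributions and require multiple-testing control, so dedicated methods are needed in practice.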

10.

Identification of active miRNA promoters from nuclear run-on RNA sequencing.

Liu, Qi; Wang, Jing; Zhao, Yue; Li, Chung-I; Stengel, Kristy R; Acharya, Pankaj; Johnston, Gretchen; Hiebert, Scott W; Shyr, Yu.

Nucleic Acids Res ; 45(13): e121, 2017 Jul 27.

Article in English | MEDLINE | ID: mdl-28460090

ABSTRACT

The genome-wide identification of microRNA transcription start sites (miRNA TSSs) is essential for understanding how miRNAs are regulated in development and disease. In this study, we developed mirSTP (mirna transcription Start sites Tracking Program), a probabilistic model for identifying active miRNA TSSs from nascent transcriptomes generated by global run-on sequencing (GRO-seq) and precision run-on sequencing (PRO-seq). MirSTP takes advantage of characteristic bidirectional transcription signatures at active TSSs in GRO/PRO-seq data, and provides accurate TSS prediction for human intergenic miRNAs at a high resolution. MirSTP performed better than existing generalized and experiment specific methods, in terms of the enrichment of various promoter-associated marks. MirSTP analysis of 27 human cell lines in 183 GRO-seq and 28 PRO-seq experiments identified TSSs for 480 intergenic miRNAs, indicating a wide usage of alternative TSSs. By integrating predicted miRNA TSSs with matched ENCODE transcription factor (TF) ChIP-seq data, we connected miRNAs into the transcriptional circuitry, which provides a valuable source for understanding the complex interplay between TF and miRNA. With mirSTP, we not only predicted TSSs for 72 miRNAs, but also identified 12 primary miRNAs with significant RNA polymerase pausing alterations after JQ1 treatment; each miRNA was further validated through BRD4 binding to its predicted promoter. MirSTP is available at http://bioinfo.vanderbilt.edu/mirSTP/.

Subject(s)

MicroRNAs/genetics , Promoter Regions, Genetic , Sequence Analysis, RNA/methods , Algorithms , Cell Line , DNA, Intergenic/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , MicroRNAs/metabolism , Models, Statistical , RNA, Nuclear/genetics , RNA, Nuclear/metabolism , Sequence Analysis, RNA/statistics & numerical data , Software , Transcription Initiation Site

11.

RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.

Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu.

BMC Bioinformatics ; 19(1): 191, 2018 05 30.

Article in English | MEDLINE | ID: mdl-29843589

ABSTRACT

BACKGROUND: One of the most important and often neglected components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments. Thus, additional issues should be carefully addressed, including the false discovery rate for multiple statistical tests and the widely distributed read counts and dispersions for different genes. RESULTS: To solve these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-seq data. Datasets from previous, similar experiments such as The Cancer Genome Atlas (TCGA) can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in the R language and can be installed from the Bioconductor website. A user-friendly web graphic interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/. CONCLUSIONS: RnaSeqSampleSize provides a convenient and powerful way to estimate power and sample size for an RNA-seq experiment. It is also equipped with several unique features, including estimation for genes or pathways of interest, power curve visualization, and parameter optimization.

Subject(s)

Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , High-Throughput Nucleotide Sequencing , Models, Statistical , Sample Size , Software

12.

Bivariate Poisson models with varying offsets: an application to the paired mitochondrial DNA dataset.

Su, Pei-Fang; Mau, Yu-Lin; Guo, Yan; Li, Chung-I; Liu, Qi; Boice, John D; Shyr, Yu.

Stat Appl Genet Mol Biol ; 16(1): 47-58, 2017 03 01.

Article in English | MEDLINE | ID: mdl-28248637

ABSTRACT

To assess the effect of chemotherapy on mitochondrial genome mutations in cancer survivors and their offspring, a study sequenced the full mitochondrial genome and determined the mitochondrial DNA (mtDNA) heteroplasmic mutation rate. To build a model for counts of heteroplasmic mutations in mothers and their offspring, bivariate Poisson regression was used to examine the relationship between mutation count and clinical information while accounting for the paired correlation. However, if the sequencing depth is not adequate, only a limited fraction of the mtDNA will be available for variant calling. The classical bivariate Poisson regression model treats the offset term as equal within pairs; thus, it cannot be applied directly. In this research, we propose an extended bivariate Poisson regression model that has a more general offset term to adjust for the length of the accessible genome for each observation. We evaluate the performance of the proposed method with comprehensive simulations, and the results show that the regression model provides unbiased parameter estimates. The use of the model is also demonstrated using the paired mtDNA dataset.

Subject(s)

DNA, Mitochondrial/genetics , Models, Biological , Antineoplastic Agents/pharmacology , Base Sequence , Cancer Survivors , Computer Simulation , DNA, Mitochondrial/drug effects , Databases, Nucleic Acid , Genome, Mitochondrial/genetics , Humans , Mutation Rate , Regression Analysis

13.

Sample size calculation based on generalized linear models for differential expression analysis in RNA-seq data.

Li, Chung-I; Shyr, Yu.

Stat Appl Genet Mol Biol ; 15(6): 491-505, 2016 12 01.

Article in English | MEDLINE | ID: mdl-27866174

ABSTRACT

As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study's optimal sample size is now a vital step in experimental design. Current methods for calculating a study's required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates. To address this limitation, we propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, all without requiring complicated mathematical approximations or formulas. Even more attractive, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, we apply it to three real-world studies, and introduce our online calculator developed to determine the optimal sample size for an RNA-seq study.

Subject(s)

Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Linear Models , Proteomics , RNA/chemistry , Sample Size

14.

Practicability of detecting somatic point mutation from RNA high throughput sequencing data.

Sheng, Quanhu; Zhao, Shilin; Li, Chung-I; Shyr, Yu; Guo, Yan.

Genomics ; 107(5): 163-9, 2016 05.

Article in English | MEDLINE | ID: mdl-27046520

ABSTRACT

Traditionally, somatic mutations are detected by examining DNA sequence. The maturity of sequencing technology has allowed researchers to screen for somatic mutations in the whole genome. Increasingly, researchers have become interested in identifying somatic mutations through RNAseq data. With this motivation, we evaluated the practicability of detecting somatic mutations from RNAseq data. Current somatic mutation calling tools were designed for DNA sequencing data. To increase performance on RNAseq data, we developed a somatic mutation caller GLMVC based on bias reduced generalized linear model for both DNA and RNA sequencing data. Through comparison with MuTect and Varscan we showed that GLMVC performed better for somatic mutation detection using exome sequencing or RNAseq data. GLMVC is freely available for download at the following website: https://github.com/shengqh/GLMVC/wiki.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Point Mutation/genetics , Sequence Analysis, RNA/methods , Algorithms , Computational Biology , Exome/genetics , Genomics , Humans , Software

15.

Transfer RNA detection by small RNA deep sequencing and disease association with myelodysplastic syndromes.

Guo, Yan; Bosompem, Amma; Mohan, Sanjay; Erdogan, Begum; Ye, Fei; Vickers, Kasey C; Sheng, Quanhu; Zhao, Shilin; Li, Chung-I; Su, Pei-Fang; Jagasia, Madan; Strickland, Stephen A; Griffiths, Elizabeth A; Kim, Annette S.

BMC Genomics ; 16: 727, 2015 Sep 24.

Article in English | MEDLINE | ID: mdl-26400237

ABSTRACT

BACKGROUND: Although advances in sequencing technologies have popularized the use of microRNA (miRNA) sequencing (miRNA-seq) for the quantification of miRNA expression, questions remain concerning the optimal methodologies for analysis and utilization of the data. The construction of a miRNA sequencing library selects RNA by length rather than type. However, as we have previously described, miRNAs represent only a subset of the species obtained by size selection. Consequently, the libraries obtained for miRNA sequencing also contain a variety of additional species of small RNAs. This study looks at the prevalence of these other species obtained from bone marrow aspirate specimens and explores the predictive value of these small RNAs in the determination of response to therapy in myelodysplastic syndromes (MDS). METHODS: Paired pre- and post-treatment bone marrow aspirate specimens were obtained from patients with MDS who were treated with either azacytidine or decitabine (24 pre-treatment specimens, 23 post-treatment specimens) with 22 additional non-MDS control specimens. Total RNA was extracted from these specimens and submitted for next generation sequencing after an additional size exclusion step to enrich for small RNAs. The species of small RNAs were enumerated, single nucleotide variants (SNVs) identified, and finally the differential expression of tRNA-derived species (tDRs) in the specimens correlated with disease status and response to therapy. RESULTS: Using miRNA sequencing data generated from bone marrow aspirate samples of patients with known MDS (N = 47) and controls (N = 23), we demonstrated that transfer RNA (tRNA) fragments (specifically tRNA halves, tRHs) are one of the most common species of small RNA isolated from size selection. Using tRNA expression values extracted from miRNA sequencing data, we identified six tRNA fragments that are differentially expressed between MDS and normal samples. Using the elastic net method, we identified four tRNA-derived small RNAs (tDRs) that together can explain 67% of the variation in treatment response for MDS patients. Similar analysis of specifically mitochondrial tDRs (mt-tDRs) identified 13 mt-tDRs which distinguished disease status in the samples and a single mt-tDR which predicted response. Finally, 14 SNVs within the tDRs were found in at least 20% of the MDS samples and were not observed in any of the control specimens. DISCUSSION: This study highlights the prevalence of tDRs in RNA-seq studies focused on small RNAs. The potential etiologies of these species, both technical and biologic, are discussed as well as important challenges in the interpretation of tDR data. CONCLUSIONS: Our analysis results suggest that tRNA fragments can be accurately detected through miRNA sequencing data and that the expression of these species may be useful in the diagnosis of MDS and the prediction of response to therapy.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Myelodysplastic Syndromes/genetics , RNA, Transfer/genetics , Aged , Base Sequence , Female , Gene Expression Regulation , Humans , Male , MicroRNAs/genetics , Myelodysplastic Syndromes/diagnosis , Myelodysplastic Syndromes/pathology , RNA, Transfer/isolation & purification

16.

MitoSeek: extracting mitochondria information and performing high-throughput mitochondria sequencing analysis.

Guo, Yan; Li, Jiang; Li, Chung-I; Shyr, Yu; Samuels, David C.

Bioinformatics ; 29(9): 1210-1, 2013 May 01.

Article in English | MEDLINE | ID: mdl-23471301

ABSTRACT

MOTIVATION: Exome capture kits have capture efficiencies that range from 40 to 60%. A significant amount of off-target reads are from the mitochondrial genome. These unintentionally sequenced mitochondrial reads provide unique opportunities to study the mitochondria genome. RESULTS: MitoSeek is an open-source software tool that can reliably and easily extract mitochondrial genome information from exome and whole genome sequencing data. MitoSeek evaluates mitochondrial genome alignment quality, estimates relative mitochondrial copy numbers and detects heteroplasmy, somatic mutation and structural variants of the mitochondrial genome. MitoSeek can be set up to run in parallel or serial on large exome sequencing datasets. AVAILABILITY: https://github.com/riverlee/MitoSeek

Subject(s)

DNA, Mitochondrial/chemistry , Genome, Mitochondrial , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Chromosome Mapping , Data Mining , Exome , Sequence Alignment

17.

Sample size determination for paired right-censored data based on the difference of Kaplan-Meier estimates.

Su, Pei-Fang; Li, Chung-I; Shyr, Yu.

Comput Stat Data Anal ; 74: 39-51, 2014 Jun 01.

Article in English | MEDLINE | ID: mdl-24567661

ABSTRACT

Sample size determination is essential to planning clinical trials. Jung (2008) established a sample size calculation formula for paired right-censored data based on the logrank test, which has been well-studied for comparing independent survival outcomes. An alternative to rank-based methods for independent right-censored data, advocated by Pepe and Fleming (1989), tests for differences between integrated weighted Kaplan-Meier estimates and is more sensitive to the magnitude of difference in survival times between groups. In this paper, we employ the concept of the Pepe-Fleming method to determine an adequate sample size by calculating differences between Kaplan-Meier estimators considering pair-wise correlation. We specify a positive stable frailty model for the joint distribution of paired survival times. We evaluate the performance of the proposed method by simulation studies and investigate the impacts of the accrual times, follow-up times, loss to follow-up rate, and sensitivity of power under misspecification of the model. The results show that ignoring the pair-wise correlation results in overestimating the required sample size. Furthermore, the proposed method is applied to two real-world studies, and the R code for sample size calculation is made available to users.

18.

A computed tomography radiomics-based model for predicting osteoporosis after breast cancer treatment.

Lai, Yu-Hsuan; Tsai, Yi-Shan; Su, Pei-Fang; Li, Chung-I; Chen, Helen H W.

Phys Eng Sci Med ; 47(1): 239-248, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38190012

ABSTRACT

Many treatments against breast cancer decrease the level of estrogen in blood, resulting in bone loss, osteoporosis and fragility fractures in breast cancer patients. This retrospective study aimed to evaluate a novel opportunistic screening for cancer treatment-induced bone loss (CTIBL) in breast cancer patients using CT radiomics. Between 2011 and 2021, a total of 412 female breast cancer patients who received treatment and were followed up in our institution, had post-treatment dual-energy X-ray absorptiometry (DXA) examination of the lumbar vertebrae and had post-treatment chest CT scan that encompassed the L1 vertebra, were included in this study. Results indicated that the T-score of L1 vertebra had a strongly positive correlation with the average T-score of L1-L4 vertebrae derived from DXA (r = 0.91, p < 0.05). On multivariable analysis, four clinical variables (age, body weight, menopause status, aromatase inhibitor exposure duration) and three radiomic features extracted from the region of interest of L1 vertebra (original_firstorder_RootMeanSquared, wavelet.HH_glcm_InverseVariance, and wavelet.LL_glcm_MCC) were selected for building predictive models of L1 T-score and bone health. The predictive model combining clinical and radiomic features showed the greatest adjusted R2 value (0.557), sensitivity (83.6%), specificity (74.2%) and total accuracy (79.4%) compared to models that relied solely on clinical data, radiomic features, or Hounsfield units. In conclusion, the clinical-radiomic predictive model may be used as an opportunistic screening tool for early identification of breast cancer survivors at high risk of CTIBL based on non-contrast CT images of the L1 vertebra, thereby facilitating early intervention for osteoporosis.

Subject(s)

Bone Diseases, Metabolic , Breast Neoplasms , Osteoporosis , Humans , Female , Bone Density , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/drug therapy , Retrospective Studies , Radiomics , Osteoporosis/chemically induced , Osteoporosis/diagnostic imaging , Tomography, X-Ray Computed/methods

19.

Wolfberry genome database: integrated genomic datasets for studying molecular biology.

Cao, You-Long; Chen, You-Yi; Li, Yan-Long; Li, Chung-I; Lin, Shao-Ting; Lee, Bing-Ru; Hsieh, Chun-Lin; Hsiao, Yu-Yun; Fan, Yun-Fang; Luo, Qing; Zhao, Jian-Hua; Yin, Yue; An, Wei; Shi, Zhi-Gang; Chow, Chi-Nga; Chang, Wen-Chi; Huang, Chun-Lin; Chang, Wei-Hung; Liu, Zhong-Jian; Wu, Wei-Sheng; Tsai, Wen-Chieh.

Front Plant Sci ; 15: 1310346, 2024.

Article in English | MEDLINE | ID: mdl-38444537

ABSTRACT

Wolfberry, also known as goji berry or Lycium barbarum, is a highly valued fruit with significant health benefits and nutritional value. For more efficient and comprehensive usage of published L. barbarum genomic data, we established the Wolfberry database. The utility of the Wolfberry Genome Database (WGDB) is highlighted through the Genome browser, which enables the user to explore the L. barbarum genome, browse specific chromosomes, and access gene sequences. Gene annotation features provide comprehensive information about gene functions, locations, expression profiles, pathway involvement, protein domains, and regulatory transcription factors. The transcriptome feature allows the user to explore gene expression patterns using transcripts per kilobase million (TPM) and fragments per kilobase per million mapped reads (FPKM) metrics. The Metabolism pathway page provides insights into metabolic pathways and the involvement of the selected genes. In addition to the database content, we also introduce six analysis tools developed for the WGDB. These tools offer functionalities for gene function prediction, nucleotide and amino acid BLAST analysis, protein domain analysis, GO annotation, and gene expression pattern analysis. The WGDB is freely accessible at https://cosbi7.ee.ncku.edu.tw/Wolfberry/. Overall, WGDB serves as a valuable resource for researchers interested in the genomics and transcriptomics of L. barbarum. Its user-friendly web interface and comprehensive data facilitate the exploration of gene functions, regulatory mechanisms, and metabolic pathways, ultimately contributing to a deeper understanding of wolfberry and its potential applications in agronomy and nutrition.

20.

Sample size calculation based on exact test for assessing differential expression analysis in RNA-seq data.

Li, Chung-I; Su, Pei-Fang; Shyr, Yu.

BMC Bioinformatics ; 14: 357, 2013 Dec 06.

Article in English | MEDLINE | ID: mdl-24314022

ABSTRACT

BACKGROUND: Sample size calculation is an important issue in the experimental design of biomedical research. For RNA-seq experiments, the sample size calculation method based on the Poisson model has been proposed; however, when there are biological replicates, RNA-seq data could exhibit variation significantly greater than the mean (i.e. over-dispersion). The Poisson model cannot appropriately model the over-dispersion, and in such cases, the negative binomial model has been used as a natural extension of the Poisson model. Because the field currently lacks a sample size calculation method based on the negative binomial model for assessing differential expression analysis of RNA-seq data, we propose a method to calculate the sample size. RESULTS: We propose a sample size calculation method based on the exact test for assessing differential expression analysis of RNA-seq data. CONCLUSIONS: The proposed sample size calculation method is straightforward and not computationally intensive. Simulation studies to evaluate the performance of the proposed sample size method are presented; the results indicate our method works well, with achievement of desired power.

Subject(s)

Gene Expression Regulation , RNA/biosynthesis , RNA/genetics , Sequence Analysis, RNA/methods , Base Sequence , Computer Simulation/statistics & numerical data , Likelihood Functions , Models, Statistical , Poisson Distribution , RNA/antagonists & inhibitors , Random Allocation , Research Design/statistics & numerical data , Research Design/trends , Sample Size , Sequence Analysis, RNA/statistics & numerical data , Sequence Analysis, RNA/trends , User-Computer Interface
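
Note on over-dispersion (illustrative, not taken from the article): the abstract of record 20 contrasts the Poisson model, whose variance equals its mean, with the negative binomial model. A minimal sketch of that contrast follows, assuming the common parameterization in which the negative binomial variance is mu + dispersion * mu^2. The function names and the dispersion value are hypothetical, and this is not the authors' sample size method.

```python
def poisson_variance(mu: float) -> float:
    """Variance of a Poisson count with mean mu (equal to the mean)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return mu


def negative_binomial_variance(mu: float, dispersion: float) -> float:
    """Variance of a negative binomial count with mean mu.

    Assumes the parameterization var = mu + dispersion * mu**2;
    dispersion = 0 recovers the Poisson case.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if dispersion < 0:
        raise ValueError("dispersion must be non-negative")
    return mu + dispersion * mu ** 2


def overdispersion_ratio(mu: float, dispersion: float) -> float:
    """How many times larger the negative binomial variance is than the Poisson variance."""
    return negative_binomial_variance(mu, dispersion) / poisson_variance(mu)


if __name__ == "__main__":
    # Illustrative dispersion value only; real values are estimated from data.
    dispersion = 0.5
    for mu in (5.0, 50.0, 500.0):
        ratio = overdispersion_ratio(mu, dispersion)
        print(f"mean={mu:7.1f}  variance ratio (NB / Poisson) = {ratio:.1f}")
```

Under this assumption the variance ratio grows with the mean, which is why a Poisson-based calculation would tend to understate the variability of highly expressed genes when biological replicates are present.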
