Results 1 - 6 of 6
1.
J Formos Med Assoc ; 121(9): 1728-1738, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35168836

ABSTRACT

BACKGROUND: There is a growing need to build medical big data from the electronic health records collected at different hospitals. Errors inevitably occur in this process, and ways to correct them should be explored. METHODS: Electronic health records of 9,197,817 patients and 53,081,148 visits, totaling about 500 million records for 2006-2016, were transmitted from eight hospitals into an integrated database. We randomly selected 10% of patients, accumulated the primary keys for their tabulated data, and compared the key numbers in the transmitted data with those of the raw data. Errors were identified based on statistical testing and clinical reasoning. RESULTS: Data were recorded in 1573 tables. Among these, 58 (3.7%) had different key numbers, with a maximum difference of 16.34/1000. Statistically significant differences (P < 0.05) were found in 34 (58.6%), of which 15 were caused by changes in diagnostic codes, wrong accounts, or modified orders. For the rest, the differences were related to the accumulation of hospital visits over time. Of the remaining 24 tables (41.4%) without significant differences, three were revised because of incorrect computer programming or wrong accounts. For the rest, the programming was correct and the absolute differences were negligible. The applicability was confirmed using the data of 2,730,883 patients and 15,647,468 patient-visits transmitted during 2017-2018, in which 10 (3.5%) tables were corrected. CONCLUSION: A significant amount of inconsistent data arises during the transmission of big data from diverse sources. Systematic validation is essential. Comparing the number of tabulated records using the primary keys allows us to rapidly identify and correct these scattered errors.
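
The key-count comparison described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout, and the table names are hypothetical, and the statistical test is left out because the abstract does not say which test was used.

```python
def compare_key_counts(raw_counts, transmitted_counts):
    """Compare primary-key counts per table between raw and transmitted data.

    Both arguments map a table name to the number of primary keys found for
    the sampled patients (hypothetical layout, assumed for illustration).
    Returns, for each table in the raw data, the two counts, their absolute
    difference, and that difference per 1000 raw keys.
    """
    report = {}
    for table, raw_n in raw_counts.items():
        sent_n = transmitted_counts.get(table, 0)
        diff = abs(sent_n - raw_n)
        per_1000 = 1000.0 * diff / raw_n if raw_n else 0.0
        report[table] = {
            "raw": raw_n,
            "transmitted": sent_n,
            "diff": diff,
            "per_1000": per_1000,
        }
    return report


def tables_with_differences(report):
    """List the tables whose key counts differ, as candidates for review."""
    return sorted(name for name, row in report.items() if row["diff"] > 0)
```

Tables returned by `tables_with_differences` would then be candidates for the statistical testing and clinical reasoning that the abstract describes.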


Subject(s)
Big Data , Biomedical Research , Databases, Factual , Electronic Health Records , Humans , Multi-Institutional Systems
2.
Nucleic Acids Res ; 43(Database issue): D862-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25398902

ABSTRACT

We previously presented YM500, which is an integrated database for miRNA quantification, isomiR identification, arm switching discovery and novel miRNA prediction from 468 human smRNA-seq datasets. In this updated YM500v2 database (http://ngs.ym.edu.tw/ym500/), we focus on the cancer miRNome to make the database more disease-oriented. New miRNA-related algorithms developed after YM500 were included in YM500v2, and, more significantly, more than 8000 cancer-related smRNA-seq datasets (including those of primary tumors, paired normal tissues, PBMC, recurrent tumors, and metastatic tumors) were incorporated into YM500v2. Novel miRNAs (miRNAs not included in the miRBase R21) were not only predicted by three independent algorithms but also cleaned by a new in silico filtration strategy and validated by wetlab data such as Cross-Linked ImmunoPrecipitation sequencing (CLIP-seq) to reduce the false-positive rate. A new 'Meta-analysis' function additionally allows users to identify, in real time, differentially expressed miRNAs and arm-switching events according to user-defined sample groups and dozens of clinical criteria curated by proficient clinicians. The cancer miRNAs identified hold potential for both basic research and biotech applications.


Subject(s)
Databases, Nucleic Acid , MicroRNAs/chemistry , MicroRNAs/metabolism , Neoplasms/genetics , Gene Expression Profiling , Humans , Internet , Sequence Analysis, RNA
3.
Nucleic Acids Res ; 42(Database issue): D1048-54, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214964

ABSTRACT

Exome sequencing (exome-seq) has aided in the discovery of a huge number of mutations in cancers, yet challenges remain in converting oncogenomics data into information that is interpretable and accessible for clinical care. We constructed DriverDB (http://ngs.ym.edu.tw/driverdb/), a database which incorporates 6079 cases of exome-seq data, annotation databases (such as dbSNP, 1000 Genome and Cosmic) and published bioinformatics algorithms dedicated to driver gene/mutation identification. We provide two points of view, 'Cancer' and 'Gene', to help researchers visualize the relationships between cancers and driver genes/mutations. The 'Cancer' section summarizes the results calculated for driver genes by eight computational methods for a specific cancer type/dataset and provides three levels of biological interpretation for understanding the relationships between driver genes. The 'Gene' section is designed to visualize the mutation information of a driver gene in five different aspects. Moreover, a 'Meta-Analysis' function is provided so researchers may identify driver genes in user-defined samples. The novel driver genes/mutations identified hold potential for both basic research and biotech applications.


Subject(s)
Databases, Nucleic Acid , Exome , Genes, Neoplasm , Mutation , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation
4.
BMC Genomics ; 16 Suppl 2: S2, 2015.
Article in English | MEDLINE | ID: mdl-25708300

ABSTRACT

BACKGROUND: Identification of genes with ascending or descending monotonic expression patterns over time or stages of stem cells is an important issue in time-series microarray data analysis. We propose a method named Monotonic Feature Selector (MFSelector) based on a concept of total discriminating error (DEtotal) to identify monotonic genes. MFSelector considers various time stages in stage order (i.e., Stage One vs. other stages, Stages One and Two vs. remaining stages and so on) and computes DEtotal of each gene. MFSelector can successfully identify genes with monotonic characteristics. RESULTS: We have demonstrated the effectiveness of MFSelector on two synthetic data sets and two stem cell differentiation data sets: embryonic stem cell neurogenesis (ESCN) and embryonic stem cell vasculogenesis (ESCV) data sets. We have also performed extensive quantitative comparisons of the three monotonic gene selection approaches. Some of the monotonic marker genes such as OCT4, NANOG, BLBP, discovered from the ESCN dataset exhibit consistent behavior with that reported in other studies. The role of monotonic genes found by MFSelector in either stemness or differentiation is validated using information obtained from Gene Ontology analysis and other literature. We justify and demonstrate that descending genes are involved in the proliferation or self-renewal activity of stem cells, while ascending genes are involved in differentiation of stem cells into variant cell lineages. CONCLUSIONS: We have developed a novel system, easy to use even with no pre-existing knowledge, to identify gene sets with monotonic expression patterns in multi-stage as well as in time-series genomics matrices. The case studies on ESCN and ESCV have helped to get a better understanding of stemness and differentiation. The novel monotonic marker genes discovered from a data set are found to exhibit consistent behavior in another independent data set, demonstrating the utility of the proposed method. 
The MFSelector R function and data sets can be downloaded from: http://microarray.ym.edu.tw/tools/MFSelector/.
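
The stage-ordered splitting described in this abstract can be illustrated with a short sketch. The abstract does not define DEtotal, so the error measure below is an assumption made for illustration only: at each split, it counts the samples that fall on the wrong side of the other group's extreme value, then sums over all splits. The function name `de_total` is hypothetical, and the code is not the published MFSelector R function.

```python
def de_total(values, stages, ascending=True):
    """Sum an assumed discriminating error over all stage-ordered splits.

    values: expression values of one gene, one per sample.
    stages: integer stage label of each sample (same length as values).
    For each cut (Stage One vs. the rest, Stages One and Two vs. the rest,
    and so on), count early samples that are not below every late sample,
    plus late samples that are not above every early sample. A perfectly
    monotonic gene scores 0 under this assumed definition.
    """
    if len(values) != len(stages):
        raise ValueError("values and stages must have the same length")
    # Flip the sign so a descending pattern can be scored as an ascending one.
    sign = 1.0 if ascending else -1.0
    ordered = sorted(set(stages))
    total = 0
    for cut in ordered[:-1]:
        early = [sign * v for v, s in zip(values, stages) if s <= cut]
        late = [sign * v for v, s in zip(values, stages) if s > cut]
        lowest_late = min(late)
        highest_early = max(early)
        total += sum(1 for v in early if v >= lowest_late)
        total += sum(1 for v in late if v <= highest_early)
    return total
```

Under this assumed definition, genes would be ranked by their score, with lower values indicating a cleaner monotonic pattern across stages.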


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Stem Cells/metabolism , Algorithms , Cell Differentiation/genetics , Cell Lineage/genetics , Cluster Analysis , Homeodomain Proteins/genetics , Humans , Internet , Nanog Homeobox Protein , Neovascularization, Physiologic/genetics , Neurogenesis/genetics , Octamer Transcription Factor-3/genetics , Stem Cells/cytology , Time Factors
5.
Nucleic Acids Res ; 41(Database issue): D285-94, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203880

ABSTRACT

MicroRNAs (miRNAs) are small RNAs ∼22 nt in length that are involved in the regulation of a variety of physiological and pathological processes. Advances in high-throughput small RNA sequencing (smRNA-seq), one of the next-generation sequencing applications, have reshaped the miRNA research landscape. In this study, we established an integrative database, YM500 (http://ngs.ym.edu.tw/ym500/), containing analysis pipelines and analysis results for 609 human and mouse smRNA-seq datasets, including public data from the Gene Expression Omnibus (GEO) and some private sources. YM500 collects analysis results for miRNA quantification, for isomiR identification (incl. RNA editing), for arm switching discovery, and, more importantly, for novel miRNA predictions. Wetlab validation on >100 miRNAs confirmed high correlation between miRNA profiling and RT-qPCR results (R = 0.84). This database allows researchers to search these four different types of analysis results via our interactive web interface. YM500 allows researchers to define the criteria of isomiRs, and also integrates information from dbSNP to help researchers distinguish isomiRs from SNPs. A user-friendly interface is provided to integrate miRNA-related information and existing evidence from hundreds of sequencing datasets. The identified novel miRNAs and isomiRs hold potential for both basic research and biotech applications.


Subject(s)
Databases, Nucleic Acid , MicroRNAs/chemistry , MicroRNAs/metabolism , Internet , Sequence Analysis, RNA , Transcriptome , User-Computer Interface
6.
J Ethnopharmacol ; 173: 370-82, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26239152

ABSTRACT

ETHNOPHARMACOLOGICAL RELEVANCE: Four traditional Chinese herbal remedies (CHR), Buyang Huanwu decoction (BHD), Xuefu Zhuyu decoction (XZD), Tianma Gouteng decoction (TGD) and Shengyu decoction (SYD), are widely used clinically to treat brain-related dysfunction with different syndromes/patterns based on traditional Chinese medicine (TCM) principles, yet their neuroprotective mechanisms are still unclear. MATERIALS AND METHODS: Mice were subjected to an acute ischemic stroke to examine the efficacy and molecular mechanisms of action underlying these CHR. RESULTS: CHR treatment significantly enhanced the survival rate of stroke mice, with BHD being the most effective CHR. All CHR were superior to recombinant tissue-type plasminogen activator (rt-PA) treatment in ameliorating brain function, infarction, and neurological deficits in stroke mice, which paralleled improvements in blood-brain barrier damage, inflammation, apoptosis, and neurogenesis. Transcriptome analyses reveal that a total of 774 ischemia-induced probe sets were significantly modulated by the four CHR, including 52 commonly upregulated genes and 54 commonly downregulated ones. Among them, activation of neurogenesis-associated signaling pathways and down-regulation of inflammation and apoptosis pathways are key common mechanisms in ischemic stroke protection by all CHR. In addition, levels of plasma CX3CL1 and S100a9 in patients could be used as biomarkers for therapeutic evaluation before functional recovery can be observed. CONCLUSION: Our results suggest that using CHR, a combinatory cocktail therapy, is a better way than rt-PA for treating cerebral ischemia-associated diseases through modulating a common as well as a specific group of genes/pathways, which may partially explain the syndrome differentiation and treatment principle in TCM.


Subject(s)
Drugs, Chinese Herbal/therapeutic use , Infarction, Middle Cerebral Artery/drug therapy , Neuroprotective Agents/therapeutic use , Animals , Calgranulin B/genetics , Chemokine CXCL1/genetics , Drug Therapy, Combination , Drugs, Chinese Herbal/pharmacology , Gene Expression Profiling , Infarction, Middle Cerebral Artery/genetics , Male , Medicine, Chinese Traditional , Mice, Inbred ICR , Neurogenesis/drug effects , Neuroprotective Agents/pharmacology , Phytotherapy