1.
BMC Syst Biol ; 12(Suppl 7): 119, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547775

ABSTRACT

BACKGROUND: Accumulation of amyloid β-peptide (Aβ) is implicated in the pathogenesis and development of Alzheimer's disease (AD). Neuron-enriched miRNAs are aberrantly regulated and may be associated with the pathogenesis of AD. However, whether miRNAs are involved in the accumulation of Aβ in AD, and by what molecular mechanism, remains unclear. Therefore, we conducted a systematic identification of the role of miRNAs in Aβ deposition and examined the molecular mechanism of the target miRNAs in SH-SY5Y cells subjected to Aβ-induced cytotoxicity. RESULTS: Statistical analysis of microarray data covering 2588 mature miRNA probes identified 155 significantly upregulated and 50 significantly downregulated miRNAs under the filter condition |log2 fold change| ≥ 0.585 and P < 0.05. PCR results showed that the expression trends of six selected miRNAs (miR-6845-3p, miR-4487, miR-4534, miR-3622-3p, miR-1233-3p, miR-6760-5p) were consistent with the gene chip results. Notably, Aβ25-35 downregulated hsa-miR-4487 and upregulated hsa-miR-6845-3p in SH-SY5Y cells, changes associated with Aβ-mediated pathophysiology. Increasing hsa-miR-4487 could inhibit cell apoptosis, and decreasing hsa-miR-6845-3p could attenuate the axon damage mediated by Aβ25-35 in SH-SY5Y cells. CONCLUSIONS: Together, these findings suggest that dysregulation of hsa-miR-4487 and hsa-miR-6845-3p contributes to the Aβ25-35-associated pathogenesis of AD by triggering cell apoptosis and synaptic dysfunction. These findings may help in understanding the pathogenesis of AD and in developing its clinical diagnosis and treatment. Further validation studies will test the miRNA signature as a prognostic tool associated with clinical outcomes in AD.
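The filter condition quoted in the abstract can be illustrated with a minimal Python sketch. It assumes the condition means an absolute log2 fold change of at least 0.585 (roughly a 1.5-fold change) together with P < 0.05; the function name and the probe values below are hypothetical and are not taken from the study.

```python
import math

# Thresholds quoted in the abstract; 0.585 is approximately log2(1.5).
LOG2_FC_CUTOFF = 0.585
P_CUTOFF = 0.05

def classify_probe(log2_fc, p_value):
    """Label one probe as 'up', 'down', or 'ns' (not significant)."""
    if p_value < P_CUTOFF and abs(log2_fc) >= LOG2_FC_CUTOFF:
        return "up" if log2_fc > 0 else "down"
    return "ns"

# Hypothetical probes: (log2 fold change, p-value). Values are made up.
probes = {
    "probe_a": (1.20, 0.001),   # large increase, significant
    "probe_b": (-0.90, 0.020),  # large decrease, significant
    "probe_c": (0.30, 0.001),   # significant but below the fold-change cutoff
    "probe_d": (2.00, 0.200),   # large change but not significant
}
calls = {name: classify_probe(fc, p) for name, (fc, p) in probes.items()}
print(calls)
```

Both conditions must hold at once, so a probe with a small but highly significant change is still filtered out.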


Subject(s)
Alzheimer Disease/genetics , Alzheimer Disease/pathology , Amyloid beta-Peptides/toxicity , MicroRNAs/genetics , Peptide Fragments/toxicity , Apoptosis/drug effects , Apoptosis/genetics , Axons/drug effects , Axons/pathology , Cell Line, Tumor , Cell Survival/drug effects , Cell Survival/genetics , Humans , Transcriptome/drug effects
2.
BMC Med Genomics ; 11(Suppl 5): 106, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453959

ABSTRACT

BACKGROUND: Non-small cell lung cancer (NSCLC) accounts for more than 80% of lung cancers. Early-stage NSCLC can be treated by complete resection with a good prognosis. However, most cases are detected at a late stage of the disease. The average survival rate of patients with invasive lung cancer is only about 4%. Adenocarcinoma in situ (AIS) is an intermediate subtype of lung adenocarcinoma that exhibits early-stage growth patterns but can become invasive. METHODS: In this study, we used RNA-seq data from normal, AIS, and invasive lung cancer tissues to identify a gene module that represents the distinguishing characteristics of AIS, referred to as AIS-specific genes. Two differential expression analysis algorithms were employed to identify the AIS-specific genes. Then, the best-performing subset of AIS-specific genes for early lung cancer prediction was selected by random forest. Finally, the performance of early lung cancer prediction was assessed using random forest, support vector machine (SVM) and artificial neural networks (ANNs) on four independent early lung cancer datasets, including one tumor-educated blood platelets (TEPs) dataset. RESULTS: Based on the differential expression analysis, 107 AIS-specific genes, consisting of 93 protein-coding genes and 14 long non-coding RNAs (lncRNAs), were identified. The significant functions associated with these genes include angiogenesis and ECM-receptor interaction, which are highly related to cancer development and contribute to the smoking-free lung cancers. Moreover, 12 of the AIS-specific lncRNAs are involved in lung cancer progression by potentially regulating the ECM-receptor interaction pathway. The feature selection by random forest identified 20 of the AIS-specific genes as early stage lung cancer signatures using the dataset obtained from The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples. 
Of the 20 signatures, two were lncRNAs, BLACAT1 and CTD-2527I21.15, which have been reported to be associated with bladder cancer, colorectal cancer and breast cancer. In blind classification on three independent tissue sample datasets, these signature genes consistently yielded about 98% accuracy for distinguishing early stage lung cancer from normal cases. However, the prediction accuracy for the blood platelet samples was only 64.35% (sensitivity 78.1%, specificity 50.59%, and AUROC 0.747). CONCLUSIONS: The comparison of AIS with normal and invasive tumor tissue revealed disease-specific genes and offered new insights into the mechanism underlying AIS progression into an invasive tumor. These genes can also serve as signatures for early diagnosis of lung cancer with high accuracy. The expression profile of gene signatures identified from tissue cancer samples yielded remarkable early cancer prediction for tissue samples but relatively lower accuracy for blood platelet samples.
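The platelet figures can be related to one another with a short Python sketch. The abstract does not state the class proportions, so the sketch treats the positive fraction as an assumption; under equal class sizes, accuracy equals the mean of sensitivity and specificity, which comes out close to the reported 64.35%. The function names are illustrative only.

```python
def accuracy(sensitivity, specificity, positive_fraction):
    """Overall accuracy as a class-weighted mix of sensitivity and specificity."""
    return sensitivity * positive_fraction + specificity * (1.0 - positive_fraction)

def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (accuracy under equal class sizes)."""
    return (sensitivity + specificity) / 2.0

# Sensitivity and specificity as reported for the blood platelet dataset.
sens, spec = 0.781, 0.5059

bal = balanced_accuracy(sens, spec)
# ASSUMPTION: equal numbers of cancer and normal samples (not stated in the abstract).
acc_equal_classes = accuracy(sens, spec, positive_fraction=0.5)
print(f"balanced accuracy = {bal:.4f}")
```

If the true class proportions differ from one half, the weighted `accuracy` function gives the corresponding value instead.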


Subject(s)
Adenocarcinoma in Situ/pathology , Lung Neoplasms/pathology , Adenocarcinoma in Situ/genetics , Area Under Curve , Databases, Genetic , Disease Progression , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/genetics , Machine Learning , Neoplasm Staging , Open Reading Frames/genetics , RNA, Long Noncoding/genetics , ROC Curve , Transcriptome
3.
BMC Med Genomics ; 11(Suppl 5): 104, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30454048

ABSTRACT

BACKGROUND: Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. It is well known that somatic mutation plays an essential role in cancer development. Hence, we propose that a prognostic prediction model that integrates somatic mutations with gene expression can improve survival prediction for cancer patients and also reveal the genetic mutations associated with survival. METHOD: Differential expression analysis was used to identify breast cancer related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to select survival related genes. DAVID was used for enrichment analysis on the somatically mutated gene set. The performance of the survival predictors was assessed by a Cox regression model and the concordance index (C-index). RESULTS: We investigated the genome-wide gene expression profile and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as breast cancer survival risk gene candidates (log rank p < 0.0001 and c-index = 0.636). Multiple breast cancer survival related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B and SDC1. A further genetic algorithm (GA) step revealed an optimal gene set consisting of 88 genes with a higher c-index (log rank p < 0.0001 and c-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved a similar performance (log rank p < 0.0001 and c-index = 0.614). Moreover, we revealed 25 functional annotations, 15 gene ontology terms and 14 pathways that were significantly enriched in the genes that showed distinct mutation patterns in the different survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway had an important role in breast cancer prognosis. 
CONCLUSIONS: Our study indicated that the expression levels of the gene signatures remain the effective indicators for breast cancer survival prediction. Combining the gene expression information with other types of features derived from somatic mutations can further improve the performance of survival prediction. The pathways that were associated with survival risk suggested by our study can be further investigated for improving cancer patient survival.
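The concordance index used to score the survival predictors can be sketched in a few lines of Python. This is a generic implementation of Harrell's C-index for right-censored data, not the authors' code; the toy survival times, event flags, and risk scores are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    A pair (i, j) is comparable when subject i had an observed event and a
    shorter follow-up time than subject j. It is concordant when subject i
    also has the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: follow-up time, event flag (1 = death observed), risk score.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported values of 0.614 to 0.656 in context.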


Subject(s)
Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Breast Neoplasms/pathology , Female , Forkhead Transcription Factors/genetics , Gene Expression Regulation, Neoplastic , Genome, Human , Humans , Mutation , Proportional Hazards Models , Receptor, Melatonin, MT2/genetics , Survival Analysis , Transcriptome
4.
Genes (Basel) ; 9(5)2018 05 14.
Article in English | MEDLINE | ID: mdl-29757968

ABSTRACT

The authors wish to make the following change to their paper [...].

5.
Genes (Basel) ; 9(1)2018 Jan 05.
Article in English | MEDLINE | ID: mdl-29303984

ABSTRACT

Lung cancer is the second most commonly diagnosed carcinoma and is the leading cause of cancer death. Although significant progress has been made towards its understanding and treatment, unraveling the complexities of lung cancer is still hampered by a lack of comprehensive knowledge on the mechanisms underlying the disease. High-throughput and multidimensional genomic data have shed new light on cancer biology. In this study, we developed a network-based approach integrating somatic mutations, the transcriptome, DNA methylation, and protein-DNA interactions to reveal the key regulators in lung adenocarcinoma (LUAD). By combining Bayesian network analysis with tissue-specific transcription factor (TF) and targeted gene interactions, we inferred 15 disease-related core regulatory networks in co-expression gene modules associated with LUAD. Through target gene set enrichment analysis, we identified a set of key TFs, including known cancer genes that potentially regulate the disease networks. These TFs were significantly enriched in multiple cancer-related pathways. Specifically, our results suggest that hepatitis viruses may contribute to lung carcinogenesis, highlighting the need for further investigations into the roles that viruses play in treating lung cancer. Additionally, 13 putative regulatory long non-coding RNAs (lncRNAs), including three that are known to be associated with lung cancer, and nine novel lncRNAs were revealed by our study. These lncRNAs and their target genes exhibited high interaction potentials and demonstrated significant expression correlations between normal lung and LUAD tissues. We further extended our study to include 16 solid-tissue tumor types and determined that the majority of these lncRNAs have putative regulatory roles in multiple cancers, with a few showing lung-cancer specific regulations. 
Our study provides a comprehensive investigation of transcription factor and lncRNA regulation in the context of LUAD regulatory networks and yields new insights into the regulatory mechanisms underlying LUAD. The novel key regulatory elements discovered by our research offer new targets for rational drug design and accompanying therapeutic strategies.

6.
Hum Genomics ; 10 Suppl 2: 21, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27461004

ABSTRACT

BACKGROUND: Chronic inflammation has been widely considered to be the major risk factor of coronary heart disease (CHD). The goal of our study was to explore the possible association with CHD for inflammation-related single nucleotide polymorphisms (SNPs) involved in cytosine-phosphate-guanine (CpG) dinucleotides. A total of 784 CHD patients and 739 non-CHD controls were recruited from Zhejiang Province, China. Using the Sequenom MassARRAY platform, we measured the genotypes of six inflammation-related CpG-SNPs: IL1B rs16944, IL1R2 rs2071008, PLA2G7 rs9395208, FAM5C rs12732361, CD40 rs1800686, and CD36 rs2065666. Allele and genotype frequencies were compared between CHD and non-CHD individuals using the CLUMP22 software with 10,000 Monte Carlo simulations. RESULTS: Allelic tests showed that PLA2G7 rs9395208 and CD40 rs1800686 were significantly associated with CHD. Moreover, IL1B rs16944, PLA2G7 rs9395208, and CD40 rs1800686 were shown to be associated with CHD under the dominant model. Further gender-based subgroup tests showed that one SNP (CD40 rs1800686) and two SNPs (FAM5C rs12732361 and CD36 rs2065666) were associated with CHD in females and males, respectively. The age-based subgroup tests indicated that PLA2G7 rs9395208, IL1B rs16944, and CD40 rs1800686 were associated with CHD among individuals younger than 55, younger than 65, and over 65, respectively. CONCLUSIONS: All six inflammation-related CpG-SNPs (rs16944, rs2071008, rs12732361, rs2065666, rs9395208, and rs1800686) were associated with CHD in the combined or subgroup tests, suggesting an important role of inflammation in the risk of CHD.


Subject(s)
Coronary Disease/genetics , CpG Islands/genetics , Genetic Predisposition to Disease/genetics , Inflammation/genetics , Polymorphism, Single Nucleotide , 1-Alkyl-2-acetylglycerophosphocholine Esterase/genetics , Aged , Asian People/genetics , CD36 Antigens/genetics , CD40 Antigens/genetics , China , Coronary Disease/ethnology , DNA-Binding Proteins/genetics , Female , Gene Frequency , Genetic Predisposition to Disease/ethnology , Genotype , Humans , Inflammation/ethnology , Interleukin-1beta/genetics , Linkage Disequilibrium , Male , Middle Aged , Odds Ratio , Receptors, Interleukin-1 Type II/genetics , Risk Factors
7.
Hum Genomics ; 10 Suppl 2: 22, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27461247

ABSTRACT

BACKGROUND: Snail is a typical transcription factor that could induce epithelial-mesenchymal transition (EMT) and cancer progression. There are some related reports about the clinical significance of snail protein expression in gastric cancer. However, the published results were not completely consistent. This study was aimed to investigate snail expression and clinical significance in gastric cancer. RESULTS: A systematic review of PubMed, CNKI, Weipu, and Wanfang database before March 2015 was conducted. We established an inclusion criterion according to subjects, method of detection, and results evaluation of snail protein. Meta-analysis was conducted using RevMan4.2 software. And merged odds ratio (OR) and 95 % CI (95 % confidence interval) were calculated. Also, forest plots and funnel plot were used to assess the potential of publication bias. A total of 10 studies were recruited. The meta-analysis was conducted to evaluate the positive rate of snail protein expression. OR and 95 % CI for different groups were listed below: (1) gastric cancer and para-carcinoma tissue [OR = 6.15, 95 % CI (4.70, 8.05)]; (2) gastric cancer and normal gastric tissue [OR = 17.00, 95 % CI (10.08, 28.67)]; (3) non-lymph node metastasis and lymph node metastasis [OR = 0.40, 95 % CI (0.18, 0.93)]; (4) poor differentiated cancer, highly differentiated cancer, and moderate cancer [OR = 3.34, 95 % CI (2.22, 5.03)]; (5) clinical stage TI + TII and stage TIII + TIV [OR = 0.38, 95 % CI (0.23, 0.60)]; (6) superficial muscularis and deep muscularis [OR = 0.18, 95 % CI (0.11, 0.31)]. CONCLUSIONS: Our results indicated that the increase of snail protein expression may play an important role in the carcinogenesis, progression, and metastasis of gastric cancer. And this result might provide instruction for the diagnosis, therapy, and prognosis of gastric cancer.
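The odds ratios and confidence intervals listed above can be read with the help of a small Python sketch. It computes an odds ratio with a Wald (Woolf) interval from a single 2x2 table; this is a textbook calculation, not the pooled RevMan estimate used in the meta-analysis, and the cell counts are hypothetical. Because such intervals are symmetric on the log scale, the geometric midpoint of the limits should sit near the point estimate, which gives a quick consistency check on the reported values.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table [[a, b], [c, d]] (Woolf's method)."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def geometric_midpoint(lower, upper):
    """Midpoint of an interval on the log scale."""
    return math.sqrt(lower * upper)

# Hypothetical counts: 60/100 positive in one group, 20/100 in the other.
or_hat, ci_low, ci_high = odds_ratio_ci(60, 40, 20, 80)

# Consistency check on two intervals reported in the abstract.
mid_1 = geometric_midpoint(4.70, 8.05)    # reported OR = 6.15
mid_2 = geometric_midpoint(10.08, 28.67)  # reported OR = 17.00
print(round(or_hat, 2), round(ci_low, 2), round(ci_high, 2))
```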


Subject(s)
Gastric Mucosa/metabolism , Gene Expression Regulation, Neoplastic , Snail Family Transcription Factors/genetics , Stomach Neoplasms/genetics , Gene Regulatory Networks , Humans , Lymphatic Metastasis , Neoplasm Invasiveness , Neoplasm Staging , Odds Ratio , Prognosis , Signal Transduction/genetics , Snail Family Transcription Factors/metabolism , Stomach/pathology , Stomach Neoplasms/diagnosis , Stomach Neoplasms/metabolism
8.
BMC Genomics ; 15 Suppl 11: I1, 2014.
Article in English | MEDLINE | ID: mdl-25558922

ABSTRACT

Synergistically integrating multi-layer genomic data at systems level not only can lead to deeper insights into the molecular mechanisms related to disease initiation and progression, but also can guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform and alternative splicing information. Systems biology, on the other hand, studies complex biological systems, particularly through the systematic study of complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and implicated biological functions at cellular or organism level. The prospectively emerging field can be referred to as systems genomics or genomic systems biology. The Mid-South Bioinformatics Centre (MBC) and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and University of Arkansas for Medical Sciences are particularly interested in promoting education and research advancement in this prospectively emerging field. Based on past investigations and research outcomes, MBC is further utilizing differential gene and isoform/exon expression from RNA-seq and co-regulation from ChIP-seq specific for different phenotypes, in combination with protein-protein interactions and protein-DNA interactions, to construct high-level gene networks for an integrative genome-phenome investigation at systems biology level.


Subject(s)
Genetic Research , Genomics , Systems Biology
9.
BMC Bioinformatics ; 15 Suppl 17: I1, 2014.
Article in English | MEDLINE | ID: mdl-25559210

ABSTRACT

Advances in high-throughput technologies have rapidly produced more and more data from DNAs and RNAs to proteins, especially large volumes of genome-scale data. However, connection of the genomic information to cellular functions and biological behaviours relies on the development of effective approaches at higher systems level. In particular, advances in RNA-Seq technology have helped the study of the transcriptome, the RNA expressed from the genome, while systems biology on the other hand provides more comprehensive pictures, in which genes and proteins actively interact to produce cellular behaviours and physiological phenotypes. As biological interactions mediate many biological processes that are essential for cellular function or disease development, it is important to systematically identify genomic information including genetic mutations from GWAS (genome-wide association study), differentially expressed genes, bidirectional promoters, intrinsic disordered proteins (IDP) and protein interactions to gain deep insights into the underlying mechanisms of gene regulations and networks. Furthermore, bidirectional promoters can co-regulate many biological pathways, where the roles of bidirectional promoters can be studied systematically for identifying co-regulating genes at interactive network level. Combining information from different but related studies can ultimately help reveal the landscape of molecular mechanisms underlying complex diseases such as cancer.


Subject(s)
Computational Biology/methods , Genome, Human , Neoplasms/genetics , Neoplasms/pathology , Transcriptome , Translational Research, Biomedical , Genomics , Humans , Phenotype
10.
BMC Bioinformatics ; 15 Suppl 17: S2, 2014.
Article in English | MEDLINE | ID: mdl-25559354

ABSTRACT

BACKGROUND: Kidney Renal Clear Cell Carcinoma (KIRC) is one of the fatal genitourinary diseases and accounts for most malignant kidney tumours. KIRC has shown resistance to radiotherapy and chemotherapy. Like many types of cancers, there is no curative treatment for metastatic KIRC. Using advanced sequencing technologies, The Cancer Genome Atlas (TCGA) project of NIH/NCI-NHGRI has produced large-scale sequencing data, which provide unprecedented opportunities to reveal new molecular mechanisms of cancer. We combined differentially expressed genes, pathways and network analyses to gain new insights into the underlying molecular mechanisms of the disease development. RESULTS: Following the experimental design for obtaining significant genes and pathways, a comprehensive analysis of 537 KIRC patients' sequencing data provided by TCGA was performed. Differentially expressed genes were obtained from the RNA-Seq data. Pathway and network analyses were performed. We identified 186 differentially expressed genes with significant p-value and large fold changes (P < 0.01, |log(FC)| > 5). The study not only confirmed a number of differentially expressed genes identified in literature reports, but also provided new findings. We performed hierarchical clustering analysis utilizing the genome-wide gene expression and the differentially expressed genes that were identified in this study. We revealed distinct groups of differentially expressed genes that can aid the identification of subtypes of the cancer. The hierarchical clustering analysis based on gene expression profile and differentially expressed genes suggested four subtypes of the cancer. We found enriched distinct Gene Ontology (GO) terms associated with these groups of genes. Based on these findings, we built a support vector machine based supervised-learning classifier to predict unknown samples, and the classifier achieved high accuracy and robust classification results. 
In addition, we identified a number of pathways (P < 0.04) that were significantly influenced by the disease. We found that some of the identified pathways have been implicated in cancers in the literature, while others have not been reported in the cancer before. The network analysis leads to the identification of significantly disrupted pathways and associated genes involved in the disease development. Furthermore, this study can provide a viable alternative in identifying effective drug targets. CONCLUSIONS: Our study identified a set of differentially expressed genes and pathways in kidney renal clear cell carcinoma, and represents a comprehensive computational approach to analyzing large-scale next-generation sequencing data. The pathway and network analyses suggested that information from distinctly expressed genes can be utilized in the identification of aberrant upstream regulators. Identification of distinctly expressed genes and altered pathways is important in effective biomarker identification for early cancer diagnosis and treatment planning. Combining differentially expressed genes with pathway and network analyses using intelligent computational approaches provides an unprecedented opportunity to identify upstream disease causal genes and effective drug targets.


Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks , Kidney Neoplasms/genetics , Kidney/metabolism , Signal Transduction , Carcinoma, Renal Cell/pathology , Case-Control Studies , Cluster Analysis , Gene Expression Regulation, Neoplastic , Humans , Kidney Neoplasms/pathology , Support Vector Machine
11.
BMC Bioinformatics ; 15 Suppl 17: S3, 2014.
Article in English | MEDLINE | ID: mdl-25559433

ABSTRACT

BACKGROUND: Combining information from different studies is an important and useful practice in bioinformatics, including genome-wide association study, rare variant data analysis and other set-based analyses. Many statistical methods have been proposed to combine p-values from independent studies. However, it is known that there is no uniformly most powerful test under all conditions; therefore, finding a powerful test in specific situation is important and desirable. RESULTS: In this paper, we propose a new statistical approach to combining p-values based on gamma distribution, which uses the inverse of the p-value as the shape parameter in the gamma distribution. CONCLUSIONS: Simulation study and real data application demonstrate that the proposed method has good performance under some situations.
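The abstract does not give the formula for the proposed gamma-based statistic, so it is not reproduced here. As background only, the Python sketch below shows Fisher's classical method for combining independent p-values, a widely used baseline for this kind of problem. Its statistic, -2 times the sum of the log p-values, follows a chi-square distribution with 2k degrees of freedom under the null, and a chi-square with 2 degrees of freedom is itself a gamma distribution with shape 1 and scale 2. The function name is illustrative.

```python
import math

def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form survival function of a chi-square distribution with
    an even number of degrees of freedom (2k), so no external library is needed.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # Fisher statistic divided by 2
    # P(chi2_{2k} > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total

combined = fisher_combined_pvalue([0.05, 0.05])
print(f"{combined:.4f}")
```

For two p-values the result reduces to p1·p2·(1 − ln(p1·p2)), which is a convenient hand check.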


Subject(s)
Genome-Wide Association Study , Models, Statistical , Case-Control Studies , Computer Simulation , Humans
12.
BMC Bioinformatics ; 15 Suppl 17: S5, 2014.
Article in English | MEDLINE | ID: mdl-25559614

ABSTRACT

BACKGROUND: Diabetes mellitus of type 2 (T2D), also known as noninsulin-dependent diabetes mellitus (NIDDM) or adult-onset diabetes, is a common disease. It is estimated that more than 300 million people worldwide suffer from T2D. In this study, we investigated T2D, pre-diabetic and healthy human (no diabetes) bloodstream samples using genomic, genealogical, and phenomic information. We identified differentially expressed genes and pathways. The study has provided deeper insights into the development of T2D, and provided useful information for further effective prevention and treatment of the disease. RESULTS: A total of 142 bloodstream samples were collected, including 47 healthy humans, 22 pre-diabetic and 73 T2D patients. Whole genome scale gene expression profiles were obtained using the Agilent Oligo chips that contain over 20,000 human genes. We identified 79 significantly differentially expressed genes that have fold change ≥ 2. We mapped those genes and pinpointed their locations on human chromosomes. Amongst them, 3 genes were not mapped well on the human genome, but the remaining 76 differentially expressed genes were well mapped on the human genome. We found that the most abundant differentially expressed genes are on chromosome one, which contains 9 of those genes, followed by chromosome two, which contains 7 of the 76 differentially expressed genes. We performed gene ontology (GO) functional analysis of those 79 differentially expressed genes and found that genes involved in the regulation of cell proliferation were among the most common pathways related to T2D. The expression of the 79 genes was combined with clinical information that includes age, sex, and race to construct an optimal discriminant model. The overall performance of the model reached 95.1% accuracy, with 91.5% accuracy on identifying healthy humans, 100% accuracy on pre-diabetic patients and 95.9% accuracy on T2D patients. 
The higher performance on identifying pre-diabetic patients resulted from more significant changes in gene expression in this particular group, which implies that these patients were undergoing profound genetic changes towards disease development. CONCLUSION: Differentially expressed genes were distributed across chromosomes, and are more abundant on chromosomes 1 and 2 than on the rest of the human genome. We found that regulation of cell proliferation plays an important role in T2D disease development. The predictive model developed in this study utilized the 79 significant genes in combination with age, sex, and racial information to distinguish pre-diabetic, T2D, and healthy humans. The study has provided not only a deeper understanding of the molecular mechanisms of the disease but also useful information for pathway analysis and effective drug target identification.
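The relationship between the per-group accuracies and the overall accuracy can be checked with a short Python sketch. It assumes that each per-group accuracy is the fraction of that group's samples classified correctly, which the abstract implies but does not state; the variable names are illustrative.

```python
# Group sizes and per-group accuracies as reported in the abstract.
group_sizes = {"healthy": 47, "pre-diabetic": 22, "T2D": 73}
group_accuracy = {"healthy": 0.915, "pre-diabetic": 1.000, "T2D": 0.959}

# ASSUMPTION: per-group accuracy = correctly classified samples / group size,
# so the implied number of correct calls is the nearest whole number.
correct = {g: round(group_accuracy[g] * n) for g, n in group_sizes.items()}

total_samples = sum(group_sizes.values())
overall = sum(correct.values()) / total_samples
print(correct, f"overall = {overall:.3f}")
```

Under this reading, the implied counts give an overall accuracy that agrees with the reported 95.1% after rounding.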


Subject(s)
Biomarkers/blood , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Models, Statistical , Prediabetic State/genetics , Signal Transduction , Adult , Case-Control Studies , Chromosomes, Human , Diabetes Mellitus, Type 2/blood , Genome, Human , Humans , Oligonucleotide Array Sequence Analysis , Prediabetic State/blood , RNA, Messenger/genetics
13.
BMC Med Genomics ; 6 Suppl 1: S10, 2013.
Article in English | MEDLINE | ID: mdl-23369200

ABSTRACT

BACKGROUND: Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. RESULTS: In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. CONCLUSIONS: A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression.


Subject(s)
Genome, Human , Support Vector Machine , Brain/metabolism , Computational Biology , Databases, Genetic , Gene Expression , Humans , Liver/metabolism , Oligonucleotide Array Sequence Analysis , Promoter Regions, Genetic , ROC Curve , Sequence Analysis, DNA
14.
BMC Genomics ; 12 Suppl 5: I1, 2011 Dec 23.
Article in English | MEDLINE | ID: mdl-22369358

ABSTRACT

This is an editorial report of the supplement to BMC Genomics that includes 15 papers selected from the BIOCOMP'10 - The 2010 International Conference on Bioinformatics & Computational Biology as well as other sources with a focus on genomics studies. BIOCOMP'10 was held on July 12-15 in Las Vegas, Nevada. The congress covered a large variety of research areas, and genomics was one of the major focuses because of the fast development in this field. We set out to launch a supplement to BMC Genomics with manuscripts selected from this congress and invited submissions. With a rigorous peer review process, we selected 15 manuscripts that showed work in cutting-edge genomics fields and proposed innovative methodology. We hope this supplement presents the current computational and statistical challenges faced in genomics studies, and shows the enormous promises and opportunities in the genomic future.


Subject(s)
Gene Regulatory Networks , Genomics , Computational Biology , Peer Review, Research , Precision Medicine
15.
BMC Syst Biol ; 5 Suppl 3: S1, 2011.
Article in English | MEDLINE | ID: mdl-22784615

ABSTRACT

BACKGROUND: For RNA-seq data, the aggregated count of the short reads from the same gene is used to approximate the gene expression level. The count data can be modelled as samples from Poisson distributions with possibly different parameters. To detect differentially expressed genes under two conditions, statistical methods for detecting the difference of two Poisson means are used. When the expression level of a gene is low, i.e., the count is small, it is usually more difficult to detect the mean differences, and therefore statistical methods which are more powerful for low expression levels are particularly desirable. In the statistical literature, several methods have been proposed to compare two Poisson means (rates). In this paper, we compare these methods by using simulated and real RNA-seq data. RESULTS: Through simulation study and real data analysis, we find that the Wald test with the data being log-transformed is more powerful than other methods, including the likelihood ratio test, which has similar power to the variance stabilizing transformation test; both are more powerful than the conditional exact test and Fisher exact test. CONCLUSIONS: When the count data in RNA-seq can be reasonably modelled as Poisson distributed, the Wald-Log test is more powerful and should be used to detect the differentially expressed genes.
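A Wald test on log-transformed Poisson counts can be sketched as follows in Python. The abstract does not spell out the statistic, so this uses one common textbook form, z = [ln(x1/t1) − ln(x0/t0)] / sqrt(1/x1 + 1/x0), where x is the read count and t an exposure such as library size; whether this matches the paper's exact definition is an assumption, and the function name and example counts are hypothetical.

```python
import math

def wald_log_test(x1, x0, t1=1.0, t0=1.0):
    """Two-sided Wald test for equal Poisson rates on the log scale.

    x1, x0: observed counts (must be positive for the logarithm to exist).
    t1, t0: exposures (for example library sizes); equal by default.
    Returns (z statistic, two-sided p-value from the normal approximation).
    """
    if x1 <= 0 or x0 <= 0:
        raise ValueError("counts must be positive for the log transform")
    log_rate_ratio = math.log(x1 / t1) - math.log(x0 / t0)
    se = math.sqrt(1.0 / x1 + 1.0 / x0)  # delta-method standard error
    z = log_rate_ratio / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Hypothetical gene: 30 reads in one condition, 10 in the other, equal exposure.
z_stat, p_val = wald_log_test(30, 10)
print(f"z = {z_stat:.3f}, p = {p_val:.4f}")
```

Zero counts need separate handling (for example a continuity correction), which this sketch deliberately leaves out.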


Subject(s)
Computational Biology/methods , Gene Expression Regulation , Sequence Analysis, RNA/methods , Models, Statistical , Poisson Distribution
16.
BMC Syst Biol ; 5 Suppl 3: I1, 2011.
Article in English | MEDLINE | ID: mdl-22784614

ABSTRACT

We present a report of the BIOCOMP'10 - The 2010 International Conference on Bioinformatics & Computational Biology and other related work in the area of systems biology.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Systems Biology/methods , Humans , Oligonucleotide Array Sequence Analysis
17.
BMC Syst Biol ; 5 Suppl 3: S13, 2011.
Article in English | MEDLINE | ID: mdl-22784619

ABSTRACT

BACKGROUND: The primary objectives of this paper are: 1.) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p small n", or "high-dimension, low-sample size") microarray problem by eliminating those features (or probes) that do not contribute to the "best" chromosome bio-markers for lung cancer, and 2.) to quantitatively measure and verify (by an independent means) the efficacy of this PLS process. A secondary objective is to integrate these significant improvements in diagnostic and prognostic biomedical applications into the clinical research arena. That is, to devise a framework for converting SLT results into direct, useful clinical information for patient care or pharmaceutical research. We, therefore, propose and preliminarily evaluate a process whereby PLS, K-PLS, and Support Vector Machines (SVM) may be integrated with the accepted and well understood traditional biostatistical "gold standard", Cox Proportional Hazard model and Kaplan-Meier survival analysis methods. Specifically, this new combination will be illustrated with both PLS and Kaplan-Meier followed by PLS and Cox Hazard Ratios (CHR) and can be easily extended for both the K-PLS and SVM paradigms. Finally, these previously described processes are contained in the Fine Feature Selection (FFS) component of our overall feature reduction/evaluation process, which consists of the following components: 1.) coarse feature reduction, 2.) fine feature selection and 3.) classification (as described in this paper) and prediction. RESULTS: Our results for PLS and K-PLS showed that these techniques, as part of our overall feature reduction process, performed well on noisy microarray data. The best performance was a good 0.794 Area Under a Receiver Operating Characteristic (ROC) Curve (AUC) for classification of recurrence prior to or after 36 months and a strong 0.869 AUC for classification of recurrence prior to or after 60 months. Kaplan-Meier curves for the classification groups were clearly separated, with p-values below 4.5e-12 for both 36 and 60 months. CHRs were also good, with ratios of 2.846341 (36 months) and 3.996732 (60 months). CONCLUSIONS: SLT techniques such as PLS and K-PLS can effectively address difficult problems with analyzing biomedical data such as microarrays. The combinations with established biostatistical techniques demonstrated in this paper allow these methods to move from academic research and into clinical practice.
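
The AUC figures quoted in this abstract summarize how well a classifier's scores rank positive cases above negative ones. The sketch below shows one standard way to compute an AUC from labels and scores, using its equivalence to the Mann-Whitney pairwise-comparison statistic. The function name `roc_auc` and the toy data are assumptions for illustration and have no connection to the study's data or software.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive, e.g. early recurrence).
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked correctly.
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example (made-up values): 3 of the 4 pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for small clinical cohorts; a rank-based formulation would be preferable for large data sets.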


Subject(s)
Computational Biology/methods , Oligonucleotide Array Sequence Analysis , Humans , Kaplan-Meier Estimate , Least-Squares Analysis , Lung Neoplasms/genetics , Proportional Hazards Models , Risk Assessment , Support Vector Machine
18.
BMC Genomics ; 11 Suppl 3: I1, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143775

ABSTRACT

Significant interest exists in establishing synergistic research in bioinformatics, systems biology and intelligent computing. Supported by the United States National Science Foundation (NSF), International Society of Intelligent Biological Medicine (http://www.ISIBM.org), International Journal of Computational Biology and Drug Design (IJCBDD) and International Journal of Functional Informatics and Personalized Medicine, the ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (ISIBM IJCBS 2009) attracted more than 300 papers and 400 researchers and medical doctors world-wide. It was the only inter/multidisciplinary conference aimed to promote synergistic research and education in bioinformatics, systems biology and intelligent computing. The conference committee was very grateful for the valuable advice and suggestions from honorary chairs, steering committee members and scientific leaders including Dr. Michael S. Waterman (USC, Member of United States National Academy of Sciences), Dr. Chih-Ming Ho (UCLA, Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Wing H. Wong (Stanford, Member of United States National Academy of Sciences), Dr. Ruzena Bajcsy (UC Berkeley, Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Qu Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Andrzej Niemierko (Harvard), Dr. A. Keith Dunker (Indiana), Dr. Brian D. Athey (Michigan), Dr. Weida Tong (FDA, United States Department of Health and Human Services), Dr. Cathy H. Wu (Georgetown), Dr. Dong Xu (Missouri), Drs. Arif Ghafoor and Okan K Ersoy (Purdue), Dr. Mark Borodovsky (Georgia Tech, President of ISIBM), Dr. Hamid R. Arabnia (UGA, Vice-President of ISIBM), and other scientific leaders. The committee presented the 2009 ISIBM Outstanding Achievement Awards to Dr. Joydeep Ghosh (UT Austin), Dr. Aidong Zhang (Buffalo) and Dr. Zhi-Hua Zhou (Nanjing) for their significant contributions to the field of intelligent biological medicine.


Subject(s)
Computational Biology , Precision Medicine , Systems Biology , Genomics , Humans
19.
BMC Genomics ; 11 Suppl 3: S15, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143782

ABSTRACT

BACKGROUND: Significant interest exists in establishing radiologic imaging as a valid biomarker for assessing the response of cancer to a variety of treatments. To address this problem, we have chosen to study patients with metastatic colorectal carcinoma to learn whether statistical learning theory can improve the performance of radiologists using CT in predicting patient response to therapy compared with the more traditional RECIST (Response Evaluation Criteria in Solid Tumors) standard. RESULTS: Predictions of survival after 8 months in 38 patients with metastatic colorectal carcinoma using the Support Vector Machine (SVM) technique improved by 30% when using additional information compared to WHO (World Health Organization) or RECIST measurements alone. With both Logistic Regression (LR) and SVM, there was no significant difference in performance between WHO and RECIST. The SVM and LR techniques also demonstrated that one radiologist consistently outperformed another. CONCLUSIONS: This preliminary research study has demonstrated that SLT algorithms, properly used in a clinical setting, have the potential to address questions and criticisms associated with both RECIST and WHO scoring methods. We also propose that tumor heterogeneity, shape, etc. obtained from CT and/or MRI scans be added to the SLT feature vector for processing.


Subject(s)
Carcinoma/diagnostic imaging , Carcinoma/secondary , Colorectal Neoplasms/diagnostic imaging , Tomography, X-Ray Computed , Area Under Curve , Biomarkers, Tumor , Carcinoma/drug therapy , Carcinoma/mortality , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/mortality , Colorectal Neoplasms/pathology , Humans , Logistic Models , Odds Ratio , ROC Curve , Software , Survival Analysis
20.
BMC Genomics ; 11 Suppl 3: S2, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143784

ABSTRACT

BACKGROUND: Short interfering RNAs (siRNAs) can be used to knock down gene expression in functional genomics. For a target gene of interest, many siRNA molecules may be designed, but their efficiency of expression inhibition often varies. RESULTS: To facilitate gene functional studies, we have developed a new machine learning method to predict siRNA potency based on random forests and support vector machines. Since there were many potential sequence features, random forests were used to select the most relevant features affecting gene expression inhibition. Support vector machine classifiers were then constructed using the selected sequence features for predicting siRNA potency. Interestingly, gene expression inhibition is significantly affected by the nucleotide dimer and trimer compositions of the siRNA sequence. CONCLUSIONS: The findings in this study should help design potent siRNAs for functional genomics, and might also provide further insights into the molecular mechanism of RNA interference.
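
The dimer and trimer compositions mentioned in this abstract can be turned into numeric feature vectors before any feature selection or classification step. The following is a minimal sketch of that feature extraction only. The function name `kmer_composition`, the conversion of T to U, and the example sequence are assumptions for illustration; the abstract does not specify the authors' exact encoding, and the random forest and SVM stages are not shown.

```python
from itertools import product


def kmer_composition(seq, k):
    """Return the relative frequency of every RNA k-mer in seq.

    The result is a list ordered like itertools.product("ACGU", repeat=k),
    so k=2 gives 16 dimer features and k=3 gives 64 trimer features.
    Windows containing characters outside A, C, G, U are skipped.
    """
    # Assumption: accept DNA-style input by mapping T to U.
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)

    for i in range(total):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[km] / total for km in kmers]


if __name__ == "__main__":
    sirna = "GCAUGCUAGCUAGCUAGCAUU"  # made-up 21-nt sequence
    features = kmer_composition(sirna, 2) + kmer_composition(sirna, 3)
    print(len(features))  # 16 dimer + 64 trimer features
```

Concatenating the dimer and trimer vectors yields a fixed-length representation for each sequence, which is the shape that tree-based feature selection and SVM classifiers generally expect.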


Subject(s)
Algorithms , Artificial Intelligence , RNA, Small Interfering/chemistry , Gene Knockdown Techniques , RNA Interference , RNA, Small Interfering/classification , RNA, Small Interfering/metabolism
...