Results 1 - 20 of 61
1.
Hum Genomics ; 10 Suppl 2: 21, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27461004

ABSTRACT

BACKGROUND: Chronic inflammation has been widely considered the major risk factor for coronary heart disease (CHD). The goal of our study was to explore possible associations between CHD and inflammation-related single nucleotide polymorphisms (SNPs) involving cytosine-phosphate-guanine (CpG) dinucleotides. A total of 784 CHD patients and 739 non-CHD controls were recruited from Zhejiang Province, China. Using the Sequenom MassARRAY platform, we genotyped six inflammation-related CpG-SNPs: IL1B rs16944, IL1R2 rs2071008, PLA2G7 rs9395208, FAM5C rs12732361, CD40 rs1800686, and CD36 rs2065666. Allele and genotype frequencies were compared between CHD and non-CHD individuals using the CLUMP22 software with 10,000 Monte Carlo simulations. RESULTS: Allelic tests showed that PLA2G7 rs9395208 and CD40 rs1800686 were significantly associated with CHD. Moreover, IL1B rs16944, PLA2G7 rs9395208, and CD40 rs1800686 were associated with CHD under the dominant model. Gender-based subgroup tests showed that one SNP (CD40 rs1800686) was associated with CHD in females and two SNPs (FAM5C rs12732361 and CD36 rs2065666) were associated with CHD in males. Age-based subgroup tests indicated that PLA2G7 rs9395208, IL1B rs16944, and CD40 rs1800686 were associated with CHD among individuals younger than 55, younger than 65, and older than 65, respectively. CONCLUSIONS: All six inflammation-related CpG-SNPs (rs16944, rs2071008, rs12732361, rs2065666, rs9395208, and rs1800686) were associated with CHD in the combined or subgroup tests, suggesting an important role of inflammation in CHD risk.
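The allele and genotype comparisons above use a Monte Carlo significance test. As a rough illustration only, the sketch below computes a Pearson chi-square statistic for a 2 x K case-control table, estimates its p-value by shuffling case/control labels with row and column totals held fixed, and collapses genotype counts under a dominant model. It is not CLUMP22's implementation; the function names, which allele is treated as the risk allele, and the genotype counts (chosen only so that they sum to the study's 784 cases and 739 controls) are hypothetical.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for a 2 x K table given as [case_counts, control_counts]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty columns
                stat += (observed - expected) ** 2 / expected
    return stat


def dominant_model(genotypes):
    """Collapse (ref/ref, ref/alt, alt/alt) counts into (ref/ref, carriers of the hypothetical risk allele alt)."""
    ref_ref, ref_alt, alt_alt = genotypes
    return [ref_ref, ref_alt + alt_alt]


def monte_carlo_p(table, n_sim=10_000, seed=0):
    """Monte Carlo p-value: shuffle case/control labels, keeping row and column totals fixed."""
    rng = random.Random(seed)
    observed = chi_square(table)
    k = len(table[0])
    col_totals = [table[0][j] + table[1][j] for j in range(k)]
    # One category label (allele or genotype class) per individual.
    labels = [j for j in range(k) for _ in range(col_totals[j])]
    n_cases = sum(table[0])
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        cases = [0] * k
        for j in labels[:n_cases]:  # the first n_cases labels become the "cases"
            cases[j] += 1
        controls = [col_totals[j] - cases[j] for j in range(k)]
        if chi_square([cases, controls]) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one estimate, never exactly zero


cases = dominant_model((300, 350, 134))    # hypothetical CHD genotype counts
controls = dominant_model((320, 320, 99))  # hypothetical control genotype counts
print(round(chi_square([cases, controls]), 3), monte_carlo_p([cases, controls], n_sim=1000))
```

The label shuffle means the p-value comes from the empirical distribution of the statistic rather than from the asymptotic chi-square distribution, which is what makes this kind of test usable for sparse tables.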


Subjects
Coronary Disease/genetics , CpG Islands/genetics , Genetic Predisposition to Disease/genetics , Inflammation/genetics , Polymorphism, Single Nucleotide , 1-Alkyl-2-acetylglycerophosphocholine Esterase/genetics , Aged , Asian People/genetics , CD36 Antigens/genetics , CD40 Antigens/genetics , China , Coronary Disease/ethnology , DNA-Binding Proteins/genetics , Female , Gene Frequency , Genetic Predisposition to Disease/ethnology , Genotype , Humans , Inflammation/ethnology , Interleukin-1beta/genetics , Linkage Disequilibrium , Male , Middle Aged , Odds Ratio , Receptors, Interleukin-1 Type II/genetics , Risk Factors
2.
Hum Genomics ; 10 Suppl 2: 22, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27461247

ABSTRACT

BACKGROUND: Snail is a typical transcription factor that can induce epithelial-mesenchymal transition (EMT) and cancer progression. Several reports have examined the clinical significance of Snail protein expression in gastric cancer, but their results have not been fully consistent. This study aimed to investigate Snail expression and its clinical significance in gastric cancer. RESULTS: A systematic review of the PubMed, CNKI, Weipu, and Wanfang databases up to March 2015 was conducted. Inclusion criteria were established according to the subjects, the detection method, and the evaluation of Snail protein results. Meta-analysis was conducted using RevMan 4.2 software, and merged odds ratios (ORs) with 95 % confidence intervals (95 % CIs) were calculated. Forest plots were drawn, and a funnel plot was used to assess potential publication bias. A total of 10 studies were included. The meta-analysis evaluated the positive rate of Snail protein expression. ORs and 95 % CIs for the compared groups were: (1) gastric cancer vs. para-carcinoma tissue [OR = 6.15, 95 % CI (4.70, 8.05)]; (2) gastric cancer vs. normal gastric tissue [OR = 17.00, 95 % CI (10.08, 28.67)]; (3) no lymph node metastasis vs. lymph node metastasis [OR = 0.40, 95 % CI (0.18, 0.93)]; (4) poorly differentiated vs. well and moderately differentiated cancer [OR = 3.34, 95 % CI (2.22, 5.03)]; (5) clinical stage TI + TII vs. stage TIII + TIV [OR = 0.38, 95 % CI (0.23, 0.60)]; (6) superficial muscularis vs. deep muscularis [OR = 0.18, 95 % CI (0.11, 0.31)]. CONCLUSIONS: Our results indicate that increased Snail protein expression may play an important role in the carcinogenesis, progression, and metastasis of gastric cancer, and may inform its diagnosis, therapy, and prognosis.
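The merged odds ratios above were computed in RevMan, and the abstract does not state which pooling model was used. As a hedged illustration, the sketch below pools 2 x 2 tables with the fixed-effect inverse-variance method on the log-odds-ratio scale (Woolf variance), which is one standard option but not necessarily the one used in the study. The study counts are hypothetical, and the 0.5 continuity correction for zero cells is an assumption.

```python
import math


def log_or_and_variance(a, b, c, d):
    """Log odds ratio and Woolf variance for the 2 x 2 table [[a, b], [c, d]]."""
    if 0 in (a, b, c, d):
        # Assumed Haldane-style continuity correction when any cell is zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(studies, z=1.959964):
    """Fixed-effect inverse-variance pooled OR with a 95% CI.

    Each study is (a, b, c, d): Snail-positive and Snail-negative counts in the
    first group (e.g. tumour tissue), then in the comparison group.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for table in studies:
        log_or, variance = log_or_and_variance(*table)
        weight = 1 / variance  # more precise studies get more weight
        total_weight += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical counts for two studies.
print(pooled_odds_ratio([(40, 10, 15, 35), (30, 20, 12, 38)]))
```

A random-effects model (for example DerSimonian-Laird) would widen the interval when studies disagree; which model suits these 10 studies is not stated in the abstract.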


Subjects
Gastric Mucosa/metabolism , Gene Expression Regulation, Neoplastic , Snail Family Transcription Factors/genetics , Stomach Neoplasms/genetics , Gene Regulatory Networks , Humans , Lymphatic Metastasis , Neoplasm Invasiveness , Neoplasm Staging , Odds Ratio , Prognosis , Signal Transduction/genetics , Snail Family Transcription Factors/metabolism , Stomach/pathology , Stomach Neoplasms/diagnosis , Stomach Neoplasms/metabolism
3.
BMC Bioinformatics ; 15 Suppl 17: S3, 2014.
Article in English | MEDLINE | ID: mdl-25559433

ABSTRACT

BACKGROUND: Combining information from different studies is an important and useful practice in bioinformatics, including genome-wide association studies, rare-variant data analysis, and other set-based analyses. Many statistical methods have been proposed to combine p-values from independent studies. However, no test is uniformly most powerful under all conditions, so finding a powerful test for a specific situation is important and desirable. RESULTS: In this paper, we propose a new statistical approach to combining p-values based on the gamma distribution, which uses the inverse of the p-value as the shape parameter in the gamma distribution. CONCLUSIONS: A simulation study and a real data application demonstrate that the proposed method performs well in some situations.
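The abstract leaves the exact transform implicit. For orientation, a common gamma-based scheme (Lancaster's generalization of Fisher's method) maps each p-value p_i to the upper quantile of a Gamma(a_i, θ) distribution and sums the results; because independent gamma variables with a common scale add, the sum follows Gamma(Σ a_i, θ) under the null. With every a_i = 1 and θ = 2 this reduces to Fisher's method. The sketch below implements that general construction, not the paper's specific choice of shape parameters. It restricts shapes to positive integers so the gamma survival function has a closed (Erlang) form and only the standard library is needed; the bisection-based quantile and all function names are assumptions.

```python
import math


def gamma_sf(x, shape, scale):
    """Survival function of Gamma(shape, scale) for a positive integer shape (Erlang form)."""
    if not isinstance(shape, int) or shape < 1:
        raise ValueError("this sketch only supports positive integer shapes")
    if x <= 0:
        return 1.0
    y = x / scale
    term, total = 1.0, 1.0
    for n in range(1, shape):
        term *= y / n  # builds y**n / n! incrementally
        total += term
    return math.exp(-y) * total


def gamma_isf(p, shape, scale, tol=1e-12):
    """Upper quantile x with gamma_sf(x) == p, for p in (0, 1], found by bisection."""
    lo, hi = 0.0, float(scale)
    while gamma_sf(hi, shape, scale) > p:  # grow the bracket until it contains the root
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if gamma_sf(mid, shape, scale) > p:
            lo = mid  # root lies to the right because gamma_sf is decreasing
        else:
            hi = mid
    return (lo + hi) / 2


def gamma_combine(pvalues, shapes, scale=2.0):
    """Combined p-value: T = sum of Gamma(a_i, scale) upper quantiles, referred to Gamma(sum a_i, scale)."""
    t = sum(gamma_isf(p, a, scale) for p, a in zip(pvalues, shapes))
    return gamma_sf(t, sum(shapes), scale)


print(gamma_combine([0.01, 0.20, 0.03], shapes=[1, 1, 1]))  # equivalent to Fisher's method
print(gamma_combine([0.01, 0.20, 0.03], shapes=[2, 1, 1]))  # first study weighted more heavily
```

In Lancaster's scheme the shapes act as study weights, so a study assigned a larger shape contributes more to the combined statistic.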


Subjects
Genome-Wide Association Study , Models, Statistical , Case-Control Studies , Computer Simulation , Humans
4.
BMC Bioinformatics ; 15 Suppl 17: S5, 2014.
Article in English | MEDLINE | ID: mdl-25559614

ABSTRACT

BACKGROUND: Type 2 diabetes mellitus (T2D), also known as non-insulin-dependent diabetes mellitus (NIDDM) or adult-onset diabetes, is a common disease. It is estimated that more than 300 million people worldwide suffer from T2D. In this study, we investigated bloodstream samples from T2D, pre-diabetic, and healthy (non-diabetic) humans using genomic, genealogical, and phenomic information. We identified differentially expressed genes and pathways. The study provides deeper insights into the development of T2D and useful information for more effective prevention and treatment of the disease. RESULTS: A total of 142 bloodstream samples were collected: 47 from healthy humans, 22 from pre-diabetic patients, and 73 from T2D patients. Whole-genome-scale gene expression profiles were obtained using Agilent oligo chips containing over 20,000 human genes. We identified 79 significantly differentially expressed genes with fold change ≥ 2 and mapped their locations on the human chromosomes. Three of these genes did not map well to the human genome, but the remaining 76 differentially expressed genes mapped well. Chromosome 1 carried the most differentially expressed genes (9), followed by chromosome 2 (7 of the 76). Gene ontology (GO) functional analysis of the 79 differentially expressed genes showed that genes involved in the regulation of cell proliferation were among the most common pathways related to T2D. The expression of the 79 genes was combined with clinical information including age, sex, and race to construct an optimal discriminant model. The model reached 95.1% overall accuracy, with 91.5% accuracy in identifying healthy humans, 100% for pre-diabetic patients, and 95.9% for T2D patients. The higher performance in identifying pre-diabetic patients resulted from more significant changes in gene expression within this group, which implies that these patients were undergoing profound genetic changes toward disease development. CONCLUSION: Differentially expressed genes were distributed across chromosomes and were more abundant on chromosomes 1 and 2 than on the rest of the human genome. We found that regulation of cell proliferation plays an important role in T2D development. The predictive model developed in this study uses the 79 significant genes in combination with age, sex, and racial information to distinguish pre-diabetic, T2D, and healthy humans. The study provides a deeper understanding of the molecular mechanisms of the disease as well as useful information for pathway analysis and effective drug target identification.
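As a quick consistency check on the reported accuracies, the per-class percentages can be converted back into counts. The counts this produces (43 of 47 healthy, 22 of 22 pre-diabetic, 70 of 73 T2D) are back-calculated from the rounded percentages rather than stated in the abstract. They are the only integer counts that round to those percentages for the given group sizes, and together they reproduce the 95.1% overall accuracy. The helper name is illustrative.

```python
def back_calculate(percent, n):
    """All integer counts k in 0..n whose percentage k/n rounds to `percent` (one decimal place)."""
    return [k for k in range(n + 1) if round(100 * k / n, 1) == percent]


# Reported per-class accuracy (%) and group size from the abstract.
groups = {"healthy": (91.5, 47), "pre-diabetic": (100.0, 22), "T2D": (95.9, 73)}
counts = {name: back_calculate(pct, n) for name, (pct, n) in groups.items()}

correct = sum(c[0] for c in counts.values())  # each list holds exactly one candidate here
total = sum(n for _, n in groups.values())
overall = round(100 * correct / total, 1)
print(correct, total, overall)  # 135 142 95.1
```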


Subjects
Biomarkers/blood , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Models, Statistical , Prediabetic State/genetics , Signal Transduction , Adult , Case-Control Studies , Chromosomes, Human , Diabetes Mellitus, Type 2/blood , Genome, Human , Humans , Oligonucleotide Array Sequence Analysis , Prediabetic State/blood , RNA, Messenger/genetics
5.
BMC Bioinformatics ; 15 Suppl 17: I1, 2014.
Article in English | MEDLINE | ID: mdl-25559210

ABSTRACT

Advances in high-throughput technologies have rapidly produced ever more data, from DNA and RNA to proteins, especially large volumes of genome-scale data. However, connecting genomic information to cellular functions and biological behaviours relies on the development of effective approaches at the higher systems level. In particular, advances in RNA-Seq technology have helped the study of the transcriptome, the RNA expressed from the genome, while systems biology provides more comprehensive pictures of how genes and proteins interact to produce cellular behaviours and physiological phenotypes. Because biological interactions mediate many processes essential for cellular function or disease development, it is important to systematically identify genomic information, including genetic mutations from GWAS (genome-wide association studies), differentially expressed genes, bidirectional promoters, intrinsically disordered proteins (IDPs), and protein interactions, to gain deep insights into the underlying mechanisms of gene regulation and networks. Furthermore, bidirectional promoters can co-regulate many biological pathways, so their roles can be studied systematically to identify co-regulated genes at the interactive network level. Combining information from different but related studies can ultimately help reveal the landscape of molecular mechanisms underlying complex diseases such as cancer.


Subjects
Computational Biology/methods , Genome, Human , Neoplasms/genetics , Neoplasms/pathology , Transcriptome , Translational Research, Biomedical , Genomics , Humans , Phenotype
6.
BMC Bioinformatics ; 15 Suppl 17: S2, 2014.
Article in English | MEDLINE | ID: mdl-25559354

ABSTRACT

BACKGROUND: Kidney renal clear cell carcinoma (KIRC) is one of the fatal genitourinary diseases and accounts for most malignant kidney tumours. KIRC has been shown to be resistant to radiotherapy and chemotherapy. As with many types of cancer, there is no curative treatment for metastatic KIRC. Using advanced sequencing technologies, The Cancer Genome Atlas (TCGA) project of NIH/NCI-NHGRI has produced large-scale sequencing data, which provide unprecedented opportunities to reveal new molecular mechanisms of cancer. We combined differentially expressed genes, pathways, and network analyses to gain new insights into the molecular mechanisms underlying disease development. RESULTS: Following the experimental design for obtaining significant genes and pathways, we performed a comprehensive analysis of sequencing data from 537 KIRC patients provided by TCGA. Differentially expressed genes were obtained from the RNA-Seq data, and pathway and network analyses were performed. We identified 186 differentially expressed genes with significant p-values and large fold changes (P < 0.01, |log(FC)| > 5). The study not only confirmed a number of differentially expressed genes reported in the literature but also provided new findings. We performed hierarchical clustering analysis using both genome-wide gene expression and the differentially expressed genes identified in this study, and revealed distinct groups of differentially expressed genes that can aid in identifying subtypes of the cancer. The hierarchical clustering based on the gene expression profile and differentially expressed genes suggested four subtypes of the cancer, and we found distinct Gene Ontology (GO) terms enriched in these groups of genes. Based on these findings, we built a supervised classifier based on support vector machines to predict unknown samples; the classifier achieved high accuracy and robust classification results. In addition, we identified a number of pathways (P < 0.04) that were significantly influenced by the disease. Some of these pathways have been implicated in cancers in the literature, while others have not been reported in cancer before. The network analysis led to the identification of significantly disrupted pathways and associated genes involved in disease development. Furthermore, this study can provide a viable alternative for identifying effective drug targets. CONCLUSIONS: Our study identified a set of differentially expressed genes and pathways in kidney renal clear cell carcinoma and represents a comprehensive computational approach to analyzing large-scale next-generation sequencing data. The pathway and network analyses suggested that information from distinctly expressed genes can be used to identify aberrant upstream regulators. Identifying distinctly expressed genes and altered pathways is important for effective biomarker identification in early cancer diagnosis and treatment planning. Combining differentially expressed genes with pathway and network analyses using intelligent computational approaches provides an unprecedented opportunity to identify upstream disease-causal genes and effective drug targets.
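The gene selection rule quoted above (P < 0.01 and |log(FC)| > 5) is a two-threshold filter. The abstract gives neither the logarithm base nor the test that produced the p-values, so the sketch below assumes log2 fold changes and takes precomputed p-values as input. The function name, gene names, and values are made up for illustration.

```python
import math


def select_degs(genes, p_cutoff=0.01, lfc_cutoff=5.0):
    """Genes passing both thresholds, sorted by absolute log2 fold change (largest first).

    `genes` maps a gene name to (mean_tumour, mean_normal, p_value); means must be positive.
    """
    selected = []
    for name, (tumour, normal, p) in genes.items():
        lfc = math.log2(tumour / normal)  # assumed log base 2
        if p < p_cutoff and abs(lfc) > lfc_cutoff:
            selected.append((name, lfc, p))
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)


toy = {
    "GENE_A": (640.0, 10.0, 1e-6),  # log2 FC = 6: passes
    "GENE_B": (2.0, 128.0, 1e-4),   # log2 FC = -6: passes (down-regulated)
    "GENE_C": (640.0, 10.0, 0.05),  # log2 FC = 6 but p too high
    "GENE_D": (40.0, 10.0, 1e-8),   # significant but log2 FC = 2
}
print([name for name, _, _ in select_degs(toy)])  # ['GENE_A', 'GENE_B']
```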


Subjects
Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks , Kidney Neoplasms/genetics , Kidney/metabolism , Signal Transduction , Carcinoma, Renal Cell/pathology , Case-Control Studies , Cluster Analysis , Gene Expression Regulation, Neoplastic , Humans , Kidney Neoplasms/pathology , Support Vector Machine
7.
BMC Genomics ; 15 Suppl 11: I1, 2014.
Article in English | MEDLINE | ID: mdl-25558922

ABSTRACT

Synergistically integrating multi-layer genomic data at the systems level can not only lead to deeper insights into the molecular mechanisms of disease initiation and progression but also guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing of both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform, and alternative splicing information. Systems biology, on the other hand, studies complex biological systems, particularly through the systematic study of complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and their implicated biological functions at the cellular or organism level. This emerging field can be referred to as systems genomics or genomic systems biology. The Mid-South Bioinformatics Centre (MBC) and the Joint Bioinformatics Ph.D. Program of the University of Arkansas at Little Rock and the University of Arkansas for Medical Sciences are particularly interested in promoting education and research in this emerging field. Building on past investigations and research outcomes, MBC is further using differential gene and isoform/exon expression from RNA-seq and phenotype-specific co-regulation from ChIP-seq, in combination with protein-protein and protein-DNA interactions, to construct high-level gene networks for an integrative genome-phenome investigation at the systems biology level.


Subjects
Genetic Research , Genomics , Systems Biology
8.
BMC Genomics ; 12 Suppl 5: I1, 2011 Dec 23.
Article in English | MEDLINE | ID: mdl-22369358

ABSTRACT

This is an editorial report on the supplement to BMC Genomics that includes 15 papers selected from BIOCOMP'10 - The 2010 International Conference on Bioinformatics & Computational Biology, as well as from other sources, with a focus on genomics studies. BIOCOMP'10 was held on July 12-15 in Las Vegas, Nevada. The congress covered a wide variety of research areas, and genomics was one of its major focuses because of the rapid development of this field. We set out to launch a supplement to BMC Genomics with manuscripts selected from this congress and invited submissions. Through a rigorous peer review process, we selected 15 manuscripts that presented work in cutting-edge genomics fields and proposed innovative methodology. We hope this supplement presents the current computational and statistical challenges faced in genomics studies and shows the enormous promise and opportunities of the genomic future.


Subjects
Gene Regulatory Networks , Genomics , Computational Biology , Peer Review, Research , Precision Medicine
9.
BMC Genomics ; 11 Suppl 3: S2, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143784

ABSTRACT

BACKGROUND: Short interfering RNAs (siRNAs) can be used to knock down gene expression in functional genomics. For a target gene of interest, many siRNA molecules may be designed, but their efficiency of expression inhibition often varies. RESULTS: To facilitate gene functional studies, we have developed a new machine learning method to predict siRNA potency based on random forests and support vector machines. Because there were many potential sequence features, random forests were used to select the most relevant features affecting gene expression inhibition. Support vector machine classifiers were then constructed using the selected sequence features to predict siRNA potency. Interestingly, gene expression inhibition is significantly affected by the nucleotide dimer and trimer composition of the siRNA sequence. CONCLUSIONS: The findings of this study should help in designing potent siRNAs for functional genomics and might also provide further insights into the molecular mechanism of RNA interference.
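The dimer and trimer composition mentioned above can be encoded as a fixed-length feature vector. The sketch below computes overlapping dinucleotide and trinucleotide frequencies over the RNA alphabet (16 + 64 = 80 features). Normalising by the number of overlapping windows, the function name, and the example sequence are assumptions; the random-forest feature selection and SVM steps described in the abstract are not reproduced.

```python
from itertools import product


def kmer_composition(seq, ks=(2, 3), alphabet="ACGU"):
    """Overlapping k-mer frequencies for each k in ks, in a fixed lexicographic order."""
    seq = seq.upper().replace("T", "U")  # also accept DNA-style input
    features = []
    for k in ks:
        kmers = ["".join(p) for p in product(alphabet, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        windows = len(seq) - k + 1
        for i in range(max(windows, 0)):
            word = seq[i:i + k]
            if word in counts:  # skip windows containing other symbols
                counts[word] += 1
        total = max(windows, 1)  # avoid division by zero for very short input
        features.extend(counts[km] / total for km in kmers)
    return features


vec = kmer_composition("GCAAGCUGACCCUGAAGUUCA")  # hypothetical 21-nt siRNA strand
print(len(vec))  # 80
```

A fixed k-mer order matters here: each position in the vector must mean the same dimer or trimer for every siRNA so that a downstream classifier can compare them.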


Subjects
Algorithms , Artificial Intelligence , RNA, Small Interfering/chemistry , Gene Knockdown Techniques , RNA Interference , RNA, Small Interfering/classification , RNA, Small Interfering/metabolism
10.
BMC Genomics ; 11 Suppl 3: S15, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143782

ABSTRACT

BACKGROUND: There is significant interest in establishing radiologic imaging as a valid biomarker for assessing the response of cancer to a variety of treatments. To address this problem, we studied patients with metastatic colorectal carcinoma to learn whether statistical learning theory (SLT) can improve the performance of radiologists using CT in predicting patient response to therapy, compared with the more traditional RECIST (Response Evaluation Criteria in Solid Tumors) standard. RESULTS: Using the Support Vector Machine (SVM) technique, predictions of survival after 8 months in 38 patients with metastatic colorectal carcinoma improved by 30% when additional information was used, compared with WHO (World Health Organization) or RECIST measurements alone. With both Logistic Regression (LR) and SVM, there was no significant difference in performance between WHO and RECIST. The SVM and LR techniques also demonstrated that one radiologist consistently outperformed another. CONCLUSIONS: This preliminary study has demonstrated that SLT algorithms, properly used in a clinical setting, have the potential to address questions and criticisms associated with both the RECIST and WHO scoring methods. We also propose that features such as tumor heterogeneity and shape obtained from CT and/or MRI scans be added to the SLT feature vector for processing.
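For context on the two baseline scoring methods compared above, RECIST (version 1.0) sums the longest diameters of target lesions, whereas the WHO criteria sum the products of two perpendicular diameters. The sketch below classifies a follow-up scan using the commonly cited thresholds (RECIST: partial response at a decrease of at least 30%, progression at an increase of at least 20%; WHO: at least 50% and at least 25%). It is a simplification, not a clinical implementation: it compares against baseline rather than the nadir, ignores new and non-target lesions, and the function names and lesion measurements are hypothetical.

```python
def response_category(baseline, follow_up, pr_drop, pd_rise):
    """Classify by relative change in a tumour-burden sum (simplified: baseline reference only)."""
    if follow_up == 0:
        return "CR"  # all target lesions gone
    change = (follow_up - baseline) / baseline
    if change <= -pr_drop:
        return "PR"
    if change >= pd_rise:
        return "PD"
    return "SD"


def recist(lesions_base, lesions_follow):
    """RECIST 1.0 style: sum of longest diameters (mm); each lesion is a pair of diameters."""
    base = sum(max(d) for d in lesions_base)
    follow = sum(max(d) for d in lesions_follow)
    return response_category(base, follow, pr_drop=0.30, pd_rise=0.20)


def who(lesions_base, lesions_follow):
    """WHO style: sum of products of the two perpendicular diameters (mm^2)."""
    base = sum(a * b for a, b in lesions_base)
    follow = sum(a * b for a, b in lesions_follow)
    return response_category(base, follow, pr_drop=0.50, pd_rise=0.25)


baseline = [(40, 30), (20, 15)]   # hypothetical lesions (mm)
follow_up = [(30, 20), (14, 10)]
print(recist(baseline, follow_up), who(baseline, follow_up))  # SD PR
```

The example shows why the two scores can disagree on the same scan: a 27% drop in summed diameters stays below the RECIST partial-response cut-off, while the corresponding 51% drop in summed areas crosses the WHO one.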


Subjects
Carcinoma/diagnostic imaging , Carcinoma/secondary , Colorectal Neoplasms/diagnostic imaging , Tomography, X-Ray Computed , Area Under Curve , Biomarkers, Tumor , Carcinoma/drug therapy , Carcinoma/mortality , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/mortality , Colorectal Neoplasms/pathology , Humans , Logistic Models , Odds Ratio , ROC Curve , Software , Survival Analysis
11.
BMC Genomics ; 11 Suppl 3: I1, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143775

ABSTRACT

Significant interest exists in establishing synergistic research in bioinformatics, systems biology, and intelligent computing. Supported by the United States National Science Foundation (NSF), the International Society of Intelligent Biological Medicine (http://www.ISIBM.org), the International Journal of Computational Biology and Drug Design (IJCBDD), and the International Journal of Functional Informatics and Personalized Medicine, the ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (ISIBM IJCBS 2009) attracted more than 300 papers and 400 researchers and medical doctors worldwide. It was the only inter/multidisciplinary conference aimed at promoting synergistic research and education in bioinformatics, systems biology, and intelligent computing. The conference committee was very grateful for the valuable advice and suggestions from honorary chairs, steering committee members, and scientific leaders including Dr. Michael S. Waterman (USC, Member of United States National Academy of Sciences), Dr. Chih-Ming Ho (UCLA, Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Wing H. Wong (Stanford, Member of United States National Academy of Sciences), Dr. Ruzena Bajcsy (UC Berkeley, Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Qu Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Andrzej Niemierko (Harvard), Dr. A. Keith Dunker (Indiana), Dr. Brian D. Athey (Michigan), Dr. Weida Tong (FDA, United States Department of Health and Human Services), Dr. Cathy H. Wu (Georgetown), Dr. Dong Xu (Missouri), Drs. Arif Ghafoor and Okan K Ersoy (Purdue), Dr. Mark Borodovsky (Georgia Tech, President of ISIBM), Dr. Hamid R. Arabnia (UGA, Vice-President of ISIBM), and other scientific leaders. The committee presented the 2009 ISIBM Outstanding Achievement Awards to Dr. Joydeep Ghosh (UT Austin), Dr. Aidong Zhang (Buffalo), and Dr. Zhi-Hua Zhou (Nanjing) for their significant contributions to the field of intelligent biological medicine.


Subjects
Computational Biology , Precision Medicine , Systems Biology , Genomics , Humans
12.
BMC Genomics ; 10 Suppl 1: S1, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19594868

ABSTRACT

BACKGROUND: Protein-DNA interactions are involved in many biological processes essential for cellular function. To understand the molecular mechanism of protein-DNA recognition, it is necessary to identify the DNA-binding residues in DNA-binding proteins. However, structural data are available for only a few hundred protein-DNA complexes. With the rapid accumulation of sequence data, accurately predicting DNA-binding residues directly from amino acid sequence has become an important but challenging task. RESULTS: A new machine learning approach has been developed in this study for predicting DNA-binding residues from amino acid sequence data. The approach used both labelled data instances collected from the available structures of protein-DNA complexes and the abundant unlabelled data found in protein sequence databases. The evolutionary information contained in the unlabelled sequence data was represented as position-specific scoring matrices (PSSMs) and several new descriptors. The sequence-derived features were then used to train random forests (RFs), which can handle a large number of input variables and avoid model overfitting. The use of evolutionary information was found to significantly improve classifier performance. The RF classifier was further evaluated on a separate test dataset, and the predicted DNA-binding residues were examined in the context of three-dimensional structures. CONCLUSION: The results suggest that the RF-based approach predicts DNA-binding residues more accurately than previous studies. A new web server called BindN-RF http://bioinfo.ggc.org/bindn-rf/ has thus been developed to make the RF classifier accessible to the biological research community.
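The abstract describes residue-level features derived from PSSMs. A common way to turn an L x 20 PSSM into per-residue classifier inputs is a sliding window centred on each residue, with zero padding at the termini. The sketch below shows only that encoding. The window size, the zero padding, the function name, and the toy matrix are assumptions; BindN-RF's additional descriptors and random-forest training are not reproduced.

```python
def window_features(pssm, window=11):
    """Per-residue feature vectors from a PSSM given as a list of L rows of scores.

    Each residue gets the rows of its `window` neighbours concatenated, with
    all-zero rows standing in for positions beyond the sequence ends.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    width = len(pssm[0]) if pssm else 20
    zero_row = [0.0] * width
    features = []
    for i in range(len(pssm)):
        vec = []
        for j in range(i - half, i + half + 1):
            vec.extend(pssm[j] if 0 <= j < len(pssm) else zero_row)
        features.append(vec)
    return features


# Hypothetical 5-residue PSSM: row i filled with the value i + 1.
toy_pssm = [[float(i + 1)] * 20 for i in range(5)]
feats = window_features(toy_pssm, window=3)
print(len(feats), len(feats[0]))  # 5 60
```

Each resulting row (window x 20 values) becomes one training instance, labelled as binding or non-binding from the structural data.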


Subjects
Artificial Intelligence , Computational Biology/methods , DNA-Binding Proteins/metabolism , Sequence Analysis, Protein/methods , Algorithms , Binding Sites , ROC Curve , Software
13.
BMC Genomics ; 10 Suppl 1: I1, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19594867

ABSTRACT

The advent of high-throughput next-generation sequencing technologies has fostered enormous potential applications of supercomputing techniques in genome sequencing, epigenetics, metagenomics, personalized medicine, and the discovery of non-coding RNAs and protein-binding sites. To this end, the 2008 International Conference on Bioinformatics and Computational Biology (Biocomp), part of the 2008 World Congress on Computer Science, Computer Engineering and Applied Computing (Worldcomp), was designed to promote synergistic inter/multidisciplinary research and education in response to current research trends and advances. The conference attracted more than two thousand scientists, medical doctors, engineers, professors, and students, who gathered in Las Vegas, Nevada, USA, during July 14-17, and was a great success. Supported by the International Society of Intelligent Biological Medicine (ISIBM), the International Journal of Computational Biology and Drug Design (IJCBDD), the International Journal of Functional Informatics and Personalized Medicine (IJFIPM), and leading research laboratories from Harvard, M.I.T., Purdue, UIUC, UCLA, Georgia Tech, UT Austin, U. of Minnesota, U. of Iowa, etc., the conference received thousands of research papers. Each submitted paper was reviewed by at least three reviewers, and accepted papers were required to address the reviewers' comments. Finally, the review board and the committee selected only 19 high-quality research papers for inclusion in this supplement to BMC Genomics, based solely on the peer reviews. The conference committee was very grateful for the Plenary Keynote Lectures given by Dr. Brian D. Athey (University of Michigan Medical School), Dr. Vladimir N. Uversky (Indiana University School of Medicine), Dr. David A. Patterson (Member of United States National Academy of Sciences and National Academy of Engineering, University of California at Berkeley), and Anousheh Ansari (Prodea Systems, Space Ambassador). The conference theme of promoting synergistic research and education was successfully achieved.


Subjects
Computational Biology/methods , Computational Biology/trends , Congresses as Topic
14.
BMC Genomics ; 10 Suppl 1: S2, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19594879

ABSTRACT

BACKGROUND: Time-series gene expression array data have become a useful resource for investigating gene functions and the interactions between genes. However, gene expression array data are always mixed with noise, and many linear models omit nonlinear regulatory relationships. Because of these practical limitations, inference of gene regulatory models from expression data is still far from satisfactory. RESULTS: In this study, we present a model-based computational approach, the Slice Pattern Model (SPM), to identify gene regulatory networks from time-series gene expression array data. To evaluate the stability and reliability of our model, an artificial gene network was tested with both the traditional linear model and SPM. SPM can handle multiple transcriptional time lags and reconstructs the gene network more accurately. Using SPM, a gene expression dataset of 17 time points from the yeast cell cycle was used to reconstruct the regulatory network. Under the reliability threshold theta = 55%, 18 relationships between genes were identified and the transcriptional regulatory network was reconstructed. Results from previous studies confirm that most of the gene relationships identified by SPM are correct. CONCLUSION: With the help of pattern recognition and similarity analysis, the effect of noise is limited in the SPM method. In addition, a genetic algorithm is introduced to optimize the parameters of the gene network model, based on a statistical method in our experiments. The experimental results demonstrate that the gene regulatory model reconstructed using SPM is more stable and reliable than models derived from the traditional linear model.
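SPM itself is not specified here in enough detail to reproduce. To illustrate two ideas the abstract relies on, a pattern-based comparison that tolerates small noisy fluctuations and a search over transcriptional time lags with a reliability threshold, the sketch below discretises each series into up/down/flat steps and reports the lag at which a target's pattern best matches a regulator's. A lag is accepted only if the match fraction reaches theta = 0.55, the threshold quoted in the abstract. The discretisation rule, the flat tolerance, the function names, and the toy series are hypothetical, and this is not the SPM algorithm.

```python
def step_pattern(series, flat_tol=0.1):
    """Encode consecutive changes as +1 (up), -1 (down) or 0 (change within flat_tol)."""
    pattern = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        pattern.append(0 if abs(delta) <= flat_tol else (1 if delta > 0 else -1))
    return pattern


def best_lag(regulator, target, max_lag=3, theta=0.55, flat_tol=0.1):
    """Return (lag, score) of the best regulator -> target match, or None if below theta."""
    reg = step_pattern(regulator, flat_tol)
    tgt = step_pattern(target, flat_tol)
    best = None
    for lag in range(max_lag + 1):
        pairs = list(zip(reg, tgt[lag:]))  # target responds `lag` steps after the regulator
        if not pairs:
            continue
        score = sum(r == t for r, t in pairs) / len(pairs)
        if best is None or score > best[1]:
            best = (lag, score)
    return best if best is not None and best[1] >= theta else None


regulator = [0, 1, 2, 1, 0, 1, 3, 2, 1, 0]  # hypothetical expression time series
target = [5, 5] + regulator[:-2]            # same pattern, delayed by two time points
print(best_lag(regulator, target))  # (2, 1.0)
```

Comparing the direction of change rather than raw values is one simple way to reduce sensitivity to noise in amplitude; the threshold then decides whether a candidate relationship is reported at all.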


Subjects
Computational Biology/methods , Gene Regulatory Networks , Models, Genetic , Algorithms , Linear Models , Models, Statistical , Pattern Recognition, Automated
15.
BMC Genomics ; 10 Suppl 1: S3, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19594880

ABSTRACT

INTRODUCTION: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods classify such data most accurately, several publicly available peak detection algorithms for Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, how different feature selection methods and classification models affect classification performance has not been addressed. With the application of intelligent computing, much progress has been made in developing feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare feature selection methods and learning classifiers applied to MALDI-MS data and to provide a reference for subsequent analyses of MS proteomics data. RESULTS: We compared a well-known feature selection method, Support Vector Machine Recursive Feature Elimination (SVMRFE), with a recently developed method, Gradient based Leave-one-out Gene Selection (GLGS), which performs effectively in microarray data analysis. We also compared several learning classifiers: the K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), an uncorrelated normal-based quadratic Bayes classifier (denoted UDC), Support Vector Machines (SVMs), and a distance metric learning method, the Large Margin Nearest Neighbor classifier (LMNN), based on the Mahalanobis distance. For the comparison, we conducted a comprehensive experimental study using three types of MALDI-MS data. CONCLUSION: For feature selection, SVMRFE outperformed GLGS in classification. Among the learning classifiers, when the classification models derived from the best training runs were compared, SVMs performed best with respect to expected testing accuracy. However, the distance metric learning method LMNN outperformed SVMs and the other classifiers when the best testing results were evaluated. The optimal classification model based on LMNN is therefore worth investigating in future studies.
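The SVMRFE procedure named above repeatedly trains a linear SVM and discards the feature with the smallest squared weight. A minimal Python sketch follows, assuming a linear SVM trained by a simple Pegasos-style subgradient method with no bias term (a simplification that presumes roughly centered features; the abstract does not describe the paper's own SVM implementation). The function names `train_linear_svm` and `svm_rfe` and the parameters `lam` and `epochs` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos-style
    subgradient steps, visiting samples in a fixed order.
    Labels must be +1/-1. No bias term: assumes roughly centered features."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink weights: subgradient of the L2 penalty.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge-loss subgradient applies only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(X, y, n_keep=1, lam=0.01, epochs=200):
    """Recursive feature elimination: retrain on the surviving features and
    drop the one with the smallest squared weight, until n_keep remain.
    Returns (kept feature indices, eliminated indices with the last-eliminated first)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, lam, epochs)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated[::-1]
```

Removing one feature per round is the textbook form; with thousands of MS peaks, practical implementations often drop a fraction of the surviving features per round to reduce training time.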


Subjects
Artificial Intelligence , Models, Statistical , Pattern Recognition, Automated/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Algorithms , Computational Biology/methods , Oligonucleotide Array Sequence Analysis , Proteomics
16.
BMC Bioinformatics ; 9 Suppl 6: S7, 2008 May 28.
Article in English | MEDLINE | ID: mdl-18541060

ABSTRACT

BACKGROUND: Activities of drug molecules can be predicted by QSAR (quantitative structure-activity relationship) models, which avoid the high cost and long cycle times of traditional experimental methods. Because drug molecules with positive activity are far fewer than those with negative activity, it is important to predict molecular activities with this imbalance in mind. RESULTS: Here, asymmetric bagging and feature selection are applied to the problem: asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities in order to handle the class imbalance. At the same time, the features extracted from the structures of drug molecules affect the prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experiments on a data set of molecular activities show that asBagging improves the AUC and sensitivity of activity prediction, and that PRIFEAB, with feature selection, further improves prediction ability. CONCLUSION: Asymmetric bagging can improve the prediction accuracy of drug molecule activities, and this accuracy can be further improved by feature selection to select relevant features from drug molecule data sets.
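Asymmetric bagging, as described above, keeps every minority (positive) example in each bag and bootstraps only the majority (negative) class down to the same size. A minimal sketch follows; it assumes labels of +1/-1 and aggregation by majority vote, and it uses a nearest-centroid classifier as a hypothetical stand-in for the SVM base learner. `asymmetric_bagging`, `nearest_centroid_fit`, and `n_models` are illustrative names, not the paper's code, and PRIFEAB's feature selection step is not shown.

```python
import random


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in base learner (the paper uses SVMs)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign x to the class whose centroid is closest (squared Euclidean).
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def asymmetric_bagging(X, y, base_fit, n_models=11, seed=0):
    """Each bag keeps all positives (+1) and bootstraps an equal number of
    negatives (-1); the ensemble predicts by majority vote."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi != 1]
    models = []
    for _ in range(n_models):
        # Sample negatives with replacement, as many as there are positives.
        bag = pos + [rng.choice(neg) for _ in range(len(pos))]
        models.append(base_fit([X[i] for i in bag], [y[i] for i in bag]))

    def predict(x):
        votes = sum(1 if m(x) == 1 else -1 for m in models)
        return 1 if votes > 0 else -1  # an odd n_models avoids ties
    return predict
```

Because every bag is balanced, each component classifier sees the rare positive class at equal weight, while the variation across bootstrapped negative subsets is what bagging averages out.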


Subjects
Algorithms , Artificial Intelligence , Models, Chemical , Pattern Recognition, Automated/methods , Pharmaceutical Preparations/chemistry , Quantitative Structure-Activity Relationship , Computer Simulation
17.
BMC Bioinformatics ; 9 Suppl 6: S8, 2008 May 28.
Article in English | MEDLINE | ID: mdl-18541061

ABSTRACT

BACKGROUND: Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because a data set contains only a few observations but thousands of measured genes. Dimension reduction is often used to handle such high-dimensional problems, but it is hampered by the large number of redundant features in microarray data sets. RESULTS: Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel redundancy metric based on DIScriminative Contribution (DISC) is proposed, which estimates feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes label information into account and directly estimates the redundancy between the discriminative abilities of two given features. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and improves the performance of dimension reduction. Experimental results on two microarray data sets show that REDISC is effective and reliably improves the generalization performance of dimension reduction, and hence of the classifier used. CONCLUSION: For tumor classification, dimension reduction that eliminates redundant genes before feature extraction performs better than feature extraction alone, and supervised redundant gene elimination is superior to commonly used unsupervised methods such as linear correlation coefficients.
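The abstract does not give the DISC formula, so the sketch below shows only the unsupervised baseline it is compared against: greedy redundancy elimination by linear (Pearson) correlation. It assumes each gene already has a relevance score (for example from a t-test); `eliminate_redundant`, `pearson`, and the 0.9 threshold are hypothetical. REDISC would replace the correlation test with the supervised DISC metric.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def eliminate_redundant(genes, scores, threshold=0.9):
    """genes: one expression vector per gene; scores: relevance per gene.
    Visit genes from most to least relevant and keep a gene only if its
    absolute correlation with every already-kept gene is below threshold."""
    order = sorted(range(len(genes)), key=lambda g: scores[g], reverse=True)
    kept = []
    for g in order:
        if all(abs(pearson(genes[g], genes[k])) < threshold for k in kept):
            kept.append(g)
    return kept
```

Note that this test ignores class labels entirely: two genes with highly correlated expression are treated as redundant even if they separate the tumor classes in different ways, which is the limitation the DISC metric is designed to address.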


Subjects
Algorithms , Biomarkers, Tumor/analysis , Diagnosis, Computer-Assisted/methods , Gene Expression Profiling/methods , Neoplasm Proteins/analysis , Neoplasms/diagnosis , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Humans , Reproducibility of Results , Sensitivity and Specificity
18.
BMC Genomics ; 9 Suppl 1: S9, 2008.
Article in English | MEDLINE | ID: mdl-18366622

ABSTRACT

BACKGROUND: Many protein regions and some entire proteins have no definite tertiary structure, existing instead as dynamic, disordered ensembles under different physicochemical conditions. Identifying these disordered regions is important for protein production, protein structure prediction and determination, and protein function annotation. A number of disorder prediction programs and web services have been developed since the first predictor was designed by Dunker's lab in 1997. However, most of these software packages use a predefined threshold to classify residues as ordered or disordered, whereas in many situations users need to select ordered or disordered residues at different sensitivity and specificity levels. RESULTS: Here we benchmark a state-of-the-art disorder predictor, DISpro, on a large protein disorder dataset created from the Protein Data Bank and systematically evaluate the relationship between sensitivity and specificity. We also extend its functionality to allow users to trade off specificity against sensitivity by setting different decision thresholds. Moreover, we compare DISpro with seven other automated disorder predictors on the 95 protein targets used in the seventh Critical Assessment of Techniques for Protein Structure Prediction (CASP7). DISpro ranks among the best predictors. CONCLUSION: The evaluation and extension of DISpro make it a more valuable tool for structural and functional genomics.
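The threshold trade-off described above can be illustrated by sweeping a decision threshold over per-residue disorder scores. This minimal sketch assumes the predictor outputs a score in [0, 1] for each residue, with label 1 meaning disordered and 0 meaning ordered; `sens_spec` is a hypothetical helper, not part of DISpro.

```python
def sens_spec(probs, labels, threshold):
    """Sensitivity and specificity when residues with score >= threshold
    are called disordered (label 1) and the rest ordered (label 0)."""
    tp = sum(1 for p, l in zip(probs, labels) if p >= threshold and l == 1)
    fn = sum(1 for p, l in zip(probs, labels) if p < threshold and l == 1)
    tn = sum(1 for p, l in zip(probs, labels) if p < threshold and l == 0)
    fp = sum(1 for p, l in zip(probs, labels) if p >= threshold and l == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # disordered residues recovered
    specificity = tn / (tn + fp) if tn + fp else 0.0  # ordered residues recovered
    return sensitivity, specificity
```

Lowering the threshold calls more residues disordered, so sensitivity can only rise or stay level while specificity can only fall or stay level; a user who needs high-confidence disorder calls would choose a higher threshold.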


Subjects
Neural Networks, Computer , Protein Conformation , Proteins/genetics , Proteomics/methods , Software , Caspase 7/genetics , Sensitivity and Specificity
19.
BMC Genomics ; 9 Suppl 2: S8, 2008 Sep 16.
Article in English | MEDLINE | ID: mdl-18831799

ABSTRACT

BACKGROUND: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical conditions. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, as well as with roles in diseases characterized by protein misfolding and aggregation. RESULTS: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), DisEMBL (also based on neural networks), and GlobPlot (based on disorder propensity). CONCLUSION: We find that adding features derived from the physicochemical properties of amino acids (such as hydrophobicity and complexity) and using an ensemble method are beneficial. The IUP predictor is a viable alternative tool for identifying IUP regions and proteins.
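As an illustration of a physicochemical feature of the kind mentioned above, the sketch below computes a sliding-window mean hydropathy per residue using the Kyte-Doolittle scale, and combines several predictors by simple score averaging. Both choices are assumptions: the abstract does not specify the scale, window size, or ensemble scheme actually used, and `window_hydropathy`, `average_ensemble`, and `half_window` are hypothetical names.

```python
# Kyte-Doolittle hydropathy values (one example of a physicochemical scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def window_hydropathy(seq, half_window=2):
    """Mean hydropathy in a window centred on each residue; the window is
    truncated at the sequence ends."""
    values = [KD[aa] for aa in seq]
    feats = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_window), min(len(values), i + half_window + 1)
        feats.append(sum(values[lo:hi]) / (hi - lo))
    return feats


def average_ensemble(score_lists):
    """Average per-residue disorder scores from several predictors (soft voting)."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]
```

Low mean hydropathy over a window is a classic signal of disorder, which is why windowed averages of such scales are common per-residue inputs to disorder classifiers.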


Subjects
Computational Biology/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Artificial Intelligence , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Pattern Recognition, Automated/methods , Protein Structure, Tertiary
20.
BMC Genomics ; 9 Suppl 1: S13, 2008.
Article in English | MEDLINE | ID: mdl-18366602

ABSTRACT

BACKGROUND: Several classification and feature selection methods have been studied for identifying differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree, and Random Forest methods have been used in recent studies. The accuracy of these methods has been estimated with validation methods such as v-fold cross-validation. However, there is a lack of comparison between these methods to find a better framework for the classification, clustering, and analysis of microarray gene expression results. RESULTS: In this study, we compared the efficiency of several classification methods: SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree, and Random Forest. V-fold cross-validation was used to calculate the accuracy of the classifiers. Some common clustering methods, including K-means, DBC, and EM clustering, were applied to the datasets and their efficiency was analysed. Further, the efficiency of feature selection methods, including support vector machine recursive feature elimination (SVM-RFE), Chi Squared, and CSF, was compared. In each case, these methods were applied to eight different binary (two-class) microarray datasets. We evaluated the class prediction efficiency of each gene list in training and test cross-validation using supervised classifiers. CONCLUSIONS: We presented a study comparing some commonly used classification, clustering, and feature selection methods. We applied these methods to eight publicly available datasets and compared how they performed in class prediction of test datasets. We found that the choice of feature selection method, the number of genes in the gene list, and the number of cases (samples) substantially influence classification success. Error rates and accuracies of several classification algorithms were obtained using the features chosen by these methods. The results revealed the importance of feature selection for accurately classifying new samples, and showed how an integrated feature selection and classification algorithm performs and that it is capable of identifying significant genes.
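The v-fold cross-validation used above to estimate classifier accuracy can be sketched as follows. This is a minimal illustration, assuming the samples are shuffled with a fixed seed and split into v folds whose sizes differ by at most one; the 1-nearest-neighbour classifier is a hypothetical stand-in for any of the compared classifiers, and all function names are illustrative.

```python
import random


def v_fold_indices(n, v, seed=0):
    """Shuffle indices 0..n-1 and split them into v folds whose sizes differ by at most one."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds, start = [], 0
    for k in range(v):
        size = n // v + (1 if k < n % v else 0)
        folds.append(order[start:start + size])
        start += size
    return folds


def one_nn_fit(X_train, y_train):
    """Hypothetical stand-in classifier: 1-nearest neighbour (squared Euclidean)."""
    def predict(x):
        best = min(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X_train[i])))
        return y_train[best]
    return predict


def cross_val_accuracy(X, y, fit, v=5, seed=0):
    """Each fold is held out once as the test set; fit(X_train, y_train)
    must return a predict(x) function. Returns overall accuracy."""
    correct = 0
    for test_idx in v_fold_indices(len(X), v, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

Shuffling before splitting matters for microarray data, where samples are often stored grouped by class; contiguous folds taken from such an ordering could leave a class missing from a training set.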


Subjects
Artificial Intelligence , Gene Expression Profiling/classification , Gene Expression/genetics , Oligonucleotide Array Sequence Analysis/methods , Databases, Genetic , Gene Expression Profiling/methods