Results 1 - 7 of 7
1.
Oncol Lett ; 18(2): 1597-1606, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31423227

ABSTRACT

Traditional clinical features are not sufficient to accurately judge the prognosis of endometrioid endometrial adenocarcinoma (EEA). Molecular biological characteristics and traditional clinical features are particularly important in the prognosis of EEA. The aim of the present study was to establish a predictive model that considers genes and clinical features for the prognosis of EEA. The clinical and RNA sequencing expression data of EEA were derived from samples from The Cancer Genome Atlas (TCGA) and Peking University People's Hospital (PKUPH; Beijing, China). Samples from TCGA were used as the training set, and samples from the PKUPH were used as the testing set. Variable selection using Random Forests (VSURF) was used to select the genes and clinical features on the basis of TCGA samples. The RF classification method was used to establish the prediction model. Kaplan-Meier curves were tested with the log-rank test. The results from this study demonstrated that on the basis of TCGA samples, 11 genes and the grade were selected as the input features. In the training set, the out-of-bag (OOB) error of RF model-1, which was established using the '11 genes', was 0.15; the OOB error of RF model-2, which was established using the 'grade', was 0.39; and the OOB error of RF model-3, established using the '11 genes and grade', was 0.15. In the testing set, the classification accuracy of RF model-1, model-2 and model-3 was 71.43, 66.67 and 80.95%, respectively. In conclusion, to the best of our knowledge, the VSURF was used to select features relevant to EEA prognosis, and an EEA predictive model combining genes and traditional features was established for the first time in the present study. The prediction accuracy of the RF model on the basis of the 11 genes and grade was markedly higher than that of the RF models established by either the 11 genes or grade alone.
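
To make the out-of-bag (OOB) error reported above concrete, here is a minimal Python sketch of how an OOB estimate arises from bagging. It is not the authors' VSURF pipeline or RF model: one-feature threshold stumps stand in for full decision trees, and the toy data, the function names `fit_stump` and `oob_error`, and all parameter values are assumptions made for illustration.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier by minimizing training errors."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            preds = [1 if sign * (x - t) >= 0 else 0 for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0

def oob_error(xs, ys, n_trees=50, seed=0):
    """Estimate the error using only learners that did not see each sample."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # per-sample votes for class 0 and class 1
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        in_bag = set(idx)
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # sample i is out-of-bag for this learner
                votes[i][stump(xs[i])] += 1
    scored = [(i, v) for i, v in enumerate(votes) if sum(v) > 0]
    wrong = sum((1 if v[1] > v[0] else 0) != ys[i] for i, v in scored)
    return wrong / len(scored)

# Toy data (made up): one feature, two classes split at x = 10.
xs = [float(i) for i in range(20)]
ys = [0] * 10 + [1] * 10
print(oob_error(xs, ys))
```

Because each sample is scored only by learners whose bootstrap sample excluded it, the OOB error behaves like a built-in validation estimate and needs no separate hold-out split of the training set.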

2.
Article in English | MEDLINE | ID: mdl-26955049

ABSTRACT

Hi-C technology, a chromosome conformation capture (3C) based method, has been developed to capture genome-wide interactions at a given resolution. The next challenge is to computationally reconstruct the 3D structure of the genome from the 3C-derived data. Several existing methods have been proposed to obtain a consensus structure or ensemble structures. These methods can be categorized as probabilistic models or restraint-based models. In this paper, we propose a method, named ShRec3D+, to infer a consensus 3D structure of a genome from Hi-C data. The method is a two-step algorithm based on the ChromSDE and ShRec3D methods. First, the conversion factor used to convert interaction frequency data into a distance-weighted graph is corrected by golden section search. Second, a shortest-path algorithm and a multi-dimensional scaling (MDS) algorithm are applied to compute the 3D coordinates of a set of genomic loci from the distance graph. We validate the accuracy of ShRec3D+ on both simulated data and publicly available Hi-C data. Our test results indicate that our method successfully corrects the parameter at a given resolution, is more accurate than ShRec3D, and is more efficient and robust than ChromSDE.
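
The two steps described above can be sketched in Python as follows. This is an illustrative outline, not the ShRec3D+ implementation: the power-law conversion `distance = frequency ** (-alpha)`, the function names, and the toy matrix are assumptions, the objective that the published method minimizes with golden section search is not given in the abstract (a stand-in quadratic is used), and the final MDS embedding is omitted.

```python
import math

def freq_to_dist(freq, alpha):
    """Convert contact frequencies to distances; zero contacts become missing edges."""
    n = len(freq)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(n):
            if i != j and freq[i][j] > 0:
                dist[i][j] = freq[i][j] ** (-alpha)
    return dist

def shortest_paths(dist):
    """Floyd-Warshall: fill in missing distances with shortest-path lengths."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def golden_section_min(f, lo, hi, tol=1e-6):
    """Locate the minimizer of a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2

# Toy contact map for three loci (made up); loci 0 and 2 have no observed contact.
freq = [[0, 4, 0],
        [4, 0, 4],
        [0, 4, 0]]
completed = shortest_paths(freq_to_dist(freq, alpha=0.5))
print(completed[0][2])  # 1.0, i.e. 0.5 + 0.5 routed through locus 1

# Stand-in objective with a known minimum at 0.7, used only to exercise the search.
print(golden_section_min(lambda a: (a - 0.7) ** 2, 0.0, 2.0))
```

The shortest-path step matters because sparse contact maps leave many locus pairs without a measured frequency; routing through intermediate loci yields a complete distance matrix that an MDS step could then embed in three dimensions.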


Subjects
Factor Analysis, Statistical; Genome; Genomics/methods; Algorithms; Animals; Chromosomes/chemistry; Chromosomes/genetics; Computer Simulation; Databases, Genetic; Humans; Image Processing, Computer-Assisted; Mice; Nucleic Acid Conformation
3.
Genomics Proteomics Bioinformatics ; 4(4): 245-52, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17531800

ABSTRACT

Computational analysis is essential for transforming the masses of microarray data into a mechanistic understanding of cancer. Here we present a method for finding gene functional modules of cancer from microarray data and have applied it to colon cancer. First, a colon cancer gene network and a normal colon tissue gene network were constructed using correlations between the genes. Then the modules that tended to have a homogeneous functional composition were identified by splitting up the network. Analysis of both networks revealed that they are scale-free. Comparison of the gene functional modules for colon cancer and normal tissues showed that the modules' functions changed with their structures.
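
As a rough illustration of the first step described above, the following Python sketch builds a gene network by linking genes whose expression profiles are strongly correlated. The abstract does not state the correlation measure or cutoff, so the use of Pearson correlation, the threshold of 0.8, the placeholder gene names, and the toy expression values are all assumptions; the module-splitting step is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(expr, threshold=0.8):
    """Return edges between genes whose absolute correlation meets the threshold."""
    genes = sorted(expr)
    edges = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                edges.add((g, h))
    return edges

# Toy expression profiles over four samples (made up; names are placeholders).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
print(correlation_network(expr))  # {('geneA', 'geneB')}
```

Counting how many edges touch each gene gives the degree distribution, which is the quantity one would inspect when checking whether a network looks scale-free.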


Subjects
Algorithms; Colonic Neoplasms/metabolism; Gene Regulatory Networks; Oligonucleotide Array Sequence Analysis/methods; Humans
4.
Sci China C Life Sci ; 49(3): 293-304, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16856499

ABSTRACT

Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selecting tumor-specific genes and analyzing their differential expression in tumor tissues. First, a variation of the Relief algorithm, the "RFE_Relief algorithm", was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues from their normal counterparts. After tissue-specific genes were removed, cross-validation experiments were employed to demonstrate the common deregulated expression of the selected genes in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared across different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.
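
For readers unfamiliar with Relief, the following Python sketch shows the basic two-class version of the algorithm on which such variants build. It is not the RFE_Relief variant proposed in the paper, whose details the abstract does not give. As a simplifying assumption, every sample is visited once instead of drawing random samples, and the function name `relief` and the toy data are made up.

```python
def relief(X, y):
    """Basic two-class Relief: reward features that separate nearest misses
    and penalize features that separate nearest hits."""
    n, p = len(X), len(X[0])
    # Scale differences by each feature's range so features are comparable.
    spans = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(p)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(p))

    w = [0.0] * p
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))     # nearest same-class sample
        miss = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class sample
        for f in range(p):
            w[f] += (diff(X[i], X[miss], f) - diff(X[i], X[hit], f)) / n
    return w

# Toy data (made up): feature 0 tracks the class label, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.6]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y)
print(weights)  # feature 0 receives the larger weight
```

Genes would then be ranked by weight, and a classifier such as a support vector machine could be trained on the top-ranked subset.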


Subjects
Gene Expression Profiling/methods; Gene Expression; Neoplasms/genetics; Algorithms; Artificial Intelligence; DNA, Neoplasm/genetics; Databases, Nucleic Acid; Female; Gene Expression Profiling/statistics & numerical data; Humans; Male; Oligonucleotide Array Sequence Analysis; Oncogenes
5.
Interdiscip Sci ; 7(4): 391-6, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26298581

ABSTRACT

Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score), which is based on graph embedding, has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we propose a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy exclusion. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used a support vector machine as the classifier to classify the samples. Compared with MFA score, the t-test and Fisher score, it achieved higher classification accuracy.
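
As background for the comparison above, here is a minimal Python sketch of the classical Fisher score, one of the baseline filter methods the abstract mentions. It does not implement MFA score or MFA score+, whose definitions are not given here; the function name `fisher_scores` and the toy data are assumptions for illustration.

```python
def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter of the class means
    divided by the pooled within-class scatter."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for f in range(p):
        col = [row[f] for row in X]
        mu = sum(col) / n
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mu_c = sum(vals) / len(vals)
            between += len(vals) * (mu_c - mu) ** 2
            within += sum((v - mu_c) ** 2 for v in vals)
        scores.append(between / within if within > 0 else float("inf"))
    return scores

# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.6]]
y = [0, 0, 0, 1, 1, 1]
print(fisher_scores(X, y))  # feature 0 scores far higher than feature 1
```

A filter method of this kind scores each feature independently, so two highly correlated genes can both rank near the top; that redundancy is the issue a redundancy-exclusion step is meant to address.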


Subjects
Databases, Genetic; Algorithms; Animals; Humans; Support Vector Machine
6.
Biomark Insights ; 9: 67-76, 2014.
Article in English | MEDLINE | ID: mdl-25210421

ABSTRACT

High-throughput gene expression microarrays can be examined by machine-learning algorithms to identify gene signatures that recognize the biological characteristics of specific human diseases, including cancer, with high sensitivity and specificity. A previous study compared 20 gastric cancer (GC) samples against 20 normal tissue (NT) samples and identified 1,519 differentially expressed genes (DEGs). In this study, the Classification Information Index (CII), Information Gain Index (IGI), and RELIEF algorithms are used to mine the previously reported gene expression profiling data. In all, 29 of these genes are identified by all three algorithms and are treated as GC candidate biomarkers. Three biomarkers, COL1A2, ATP4B, and HADHSC, are selected and further examined using quantitative real-time polymerase chain reaction (qRT-PCR) and immunohistochemistry (IHC) staining in two independent sets of GC and normal adjacent tissue (NAT) samples. Our study shows that COL1A2 and HADHSC are the two best biomarkers from the microarray data, distinguishing all GC from the NT, whereas ATP4B is diagnostically significant in lab tests because of its wider range of fold-changes in expression. Herein, a data-mining model applicable to small sample sizes is presented and discussed. Our results suggest that this mining model may be useful in small sample-size studies to identify putative biomarkers and potential biological features of GC.
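
The consensus step described above, keeping only genes that every algorithm selects, can be sketched in Python as a set intersection over per-method top-k lists. The abstract does not say how each algorithm's selection was cut off, so the top-k rule, the function names, the placeholder gene names `g1` to `g5`, and the scores are all assumptions and are not data from the study.

```python
def top_k(scores, k):
    """Return the k highest-scoring gene names (ties broken by name)."""
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return set(ranked[:k])

def consensus(score_tables, k):
    """Keep only the genes that appear in the top k of every method."""
    selected = [top_k(scores, k) for scores in score_tables]
    return set.intersection(*selected)

# Made-up scores from three hypothetical ranking methods.
method_a = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.2, "g5": 0.1}
method_b = {"g1": 0.5, "g2": 0.9, "g3": 0.1, "g4": 0.8, "g5": 0.3}
method_c = {"g1": 0.7, "g2": 0.6, "g3": 0.2, "g4": 0.1, "g5": 0.9}
print(consensus([method_a, method_b, method_c], k=3))  # {'g1', 'g2'} (set order may vary)
```

Requiring agreement across methods trades recall for robustness, which is a reasonable choice when the sample size is small and any single ranking is noisy.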

7.
Oncol Rep ; 28(3): 1036-42, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22752057

ABSTRACT

Colon cancer is the third most common cancer and one of the leading causes of cancer-related death in the world. Therefore, identification of biomarkers with potential for recognizing its biological characteristics is a key problem for the early diagnosis of colon cancer patients. In this study, we used a random forest approach to discover biomarkers based on a set of oligonucleotide microarray data of colon cancer. Real-time PCR was used to validate the relative expression levels of the biomarkers selected by our approach. Furthermore, ROC curves were used to analyze the sensitivity and specificity of each biomarker in both training and test sample sets. Finally, we analyzed the clinical significance of each biomarker based on its differential expression. A single classifier consisting of 4 genes (IL8, WDR77, MYL9 and VIP) was selected by random forests, with an average sensitivity and specificity of 83.75% and 76.15%, respectively. The differential expression level of each biomarker was validated by real-time PCR in 48 test colon cancer samples compared to the matched normal tissues. Patients with high expression of IL8 and WDR77, and low expression of MYL9 and VIP had a significantly reduced median survival rate compared to colon cancer patients. The results indicate that our approach can be employed for biomarker identification based on microarray data. These 4 genes identified by our approach have the potential to act as clinical biomarkers for the early diagnosis of colon cancer.
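
The sensitivity, specificity, and ROC analysis mentioned above can be illustrated with a short Python sketch. The scores and labels below are made up and are not the study's data; the function names and the decision threshold of 0.5 are assumptions. The AUC is computed as the fraction of tumor-normal pairs in which the tumor sample scores higher, which equals the area under the ROC curve.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Made-up biomarker scores: label 1 = tumor, label 0 = matched normal tissue.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(sensitivity_specificity(scores, labels, threshold=0.5))  # (0.75, 0.75)
print(auc(scores, labels))  # 0.9375
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is why the two are usually reported together.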


Subjects
Algorithms; Biomarkers, Tumor/genetics; Colonic Neoplasms/metabolism; Data Interpretation, Statistical; Gene Expression; Area Under Curve; Biomarkers, Tumor/metabolism; Cluster Analysis; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Interleukin-8/genetics; Interleukin-8/metabolism; Male; Middle Aged; Myosin Light Chains/genetics; Myosin Light Chains/metabolism; Oligonucleotide Array Sequence Analysis; ROC Curve; Real-Time Polymerase Chain Reaction; Transcription Factors/genetics; Transcription Factors/metabolism; Vasoactive Intestinal Peptide/genetics; Vasoactive Intestinal Peptide/metabolism