Results 1 - 20 of 85
1.
Nucleic Acids Res ; 50(D1): D1208-D1215, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34792145

ABSTRACT

DNA methylation has growing potential as a biomarker because of its involvement in disease. DNA methylation data have also grown substantially in volume during the past 5 years. To facilitate access to these fragmented data, we proposed DiseaseMeth version 3.0 based on DiseaseMeth version 2.0; the number of diseases included increased from 88 to 162, and the number of high-throughput profile samples increased from 32 701 to 49 949. Experimentally confirmed associations were extended by 448 pairs obtained by manual literature mining of 1472 papers in PubMed. The search, analyze and tools sections were updated to improve performance. In particular, FunctionSearch now provides functional enrichment of genes from localized GO and KEGG annotation. We have also developed a unified analysis pipeline for identifying differentially DNA methylated genes (DMGs) from the original data stored in the database; 22 718 DMGs were found in 99 diseases. These DMGs can be applied to disease evaluation using two self-developed online tools, Methylation Disease Correlation and Cancer Prognosis & Co-Methylation. All query results can be downloaded and can also be displayed through a box plot, heatmap or network module, depending on the search section used. DiseaseMeth version 3.0 is freely available at http://diseasemeth.edbc.org/.


Subjects
DNA Methylation/genetics , Databases, Factual , Gene Expression Profiling/classification , Genetic Diseases, Inborn/classification , Biomarkers, Tumor/genetics , Genetic Diseases, Inborn/genetics , Humans , Neoplasms/classification , Neoplasms/genetics , PubMed
2.
Nucleic Acids Res ; 50(D1): D1164-D1171, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634794

ABSTRACT

Drug response in many diseases varies dramatically owing to complex genomic and functional features and contexts. The cellular diversity of human tissues, especially tumors, is one of the major factors contributing to differences in drug response between samples. With the accumulation of single-cell RNA sequencing (scRNA-seq) data, it is now possible to study the response to different drug treatments at single-cell resolution. Here, we present CeDR Atlas (available at https://ngdc.cncb.ac.cn/cedr), a knowledgebase reporting computational inference of cellular drug response for hundreds of cell types from various tissues. We took advantage of the high-throughput profiling of drug-induced gene expression available through the Connectivity Map resource (CMap) as well as hundreds of scRNA-seq datasets covering cells from a wide variety of organs/tissues, diseases, and conditions. Currently, CeDR maintains the results for more than 582 single cell data objects for human, mouse and cell lines, including about 140 phenotypes and 1250 tissue-cell combination types. All the results can be explored and searched by keywords for drugs, cell types, tissues, diseases, and signature genes. Overall, CeDR fine-maps drug response at cellular resolution and sheds light on the design of combinatorial treatments, drug resistance and even drug side effects.


Subjects
Biomarkers, Pharmacological , Databases, Factual , Neoplasms/drug therapy , Software , Animals , Gene Expression Profiling/classification , Humans , Knowledge Bases , Mice , Neoplasms/classification , RNA-Seq/classification , Single-Cell Analysis/classification , Exome Sequencing/classification
3.
Nucleic Acids Res ; 49(17): e99, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34214174

ABSTRACT

Although transcriptomics technologies have evolved rapidly in the past decades, integrative analysis of mixed microarray and RNA-seq data remains challenging because of the inherent difference in variability between them. Here, Rank-In is proposed to correct the nonbiological effects across the two technologies, enabling freely blended data for consolidated analysis. Rank-In was rigorously validated on public cell and tissue samples tested by both technologies. On the two reference samples of the SEQC project, Rank-In not only perfectly classified the 44 profiles but also achieved the best accuracy of 0.9 in predicting TaqMan-validated DEGs. More importantly, on 327 glioblastoma (GBM) profiles and 248, 523 heterogeneous colon cancer profiles respectively, only Rank-In successfully discriminated every single cancer profile from normal controls, whereas the other methods could not. Furthermore, on mixed seq-array GBM profiles of different sizes, Rank-In robustly reproduced a median DEG overlap ranging from 0.74 to 0.83 among top genes, whereas the others never exceeded 0.72. As the first effective method enabling cross-technology analysis of mixed data, Rank-In accepts hybrids of array and seq profiles for integrative study of large/small, paired/unpaired and balanced/imbalanced samples, opening the possibility of reducing the sampling space of clinical cancer patients. Rank-In can be accessed at http://www.badd-cao.net/rank-in/index.html.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , RNA-Seq/methods , Cluster Analysis , Colonic Neoplasms/diagnosis , Colonic Neoplasms/genetics , Diagnosis, Differential , Gene Expression Profiling/classification , Glioblastoma/diagnosis , Glioblastoma/genetics , Humans , Internet , Neoplasms/diagnosis , Reproducibility of Results , Sensitivity and Specificity
4.
Cancer Med ; 10(11): 3782-3793, 2021 06.
Article in English | MEDLINE | ID: mdl-33987975

ABSTRACT

Relapsed acute lymphoblastic leukaemia (ALL) remains a prevalent paediatric cancer and one of the most common causes of mortality from malignancy in children. Tailoring the intensity of therapy according to early stratification is a promising strategy but remains a major challenge due to heterogeneity and subtyping difficulty. In this study, we subgroup B-precursor ALL patients by gene expression profiles, using non-negative matrix factorization and minimum description length which unsupervisedly determines the number of subgroups. Within each of the four subgroups, logistic and Cox regression with elastic net regularization are used to build models predicting minimal residual disease (MRD) and relapse-free survival (RFS) respectively. Measured by area under the receiver operating characteristic curve (AUC), subgrouping improves prediction of MRD in one subgroup which mostly overlaps with subtype TCF3-PBX1 (AUC = 0·986 in the training set and 1·0 in the test set), compared to a global model published previously. The models predicting RFS displayed acceptable concordance in training set and discriminate high-relapse-risk patients in three subgroups of the test set (Wilcoxon test p = 0·048, 0·036, and 0·016). Genes playing roles in the models are specific to different subgroups. The improvement of subgrouped MRD prediction and the differences of genes in prediction models of subgroups suggest that the heterogeneity of B-precursor ALL can be handled by subgrouping according to gene expression profiles to improve the prediction accuracy.


Subjects
Gene Expression Profiling , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Adolescent , Child , Child, Preschool , Disease-Free Survival , Female , Gene Expression Profiling/classification , Humans , Infant , Logistic Models , Male , Neoplasm, Residual , Precursor Cell Lymphoblastic Leukemia-Lymphoma/classification , Proportional Hazards Models , ROC Curve , Recurrence , Young Adult
5.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-34020547

ABSTRACT

Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among different cancers. In this study, we proposed a fusion feature selection framework based on an ensemble method, named Fisher score and Gradient Boosting Decision Tree (FS-GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. Joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. FS-GBDT achieved the highest indicators, outperforming the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and this subset was classified into several functional modules. Functional modules can be used as disease markers in place of single genes, which are difficult to find reproducibly in gene chip applications, and to study the core mechanisms of cancer.
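
The Fisher score stage of a framework like this can be sketched as follows. This is a minimal illustration and not the authors' FS-GBDT implementation: the abstract does not state the formula, so the common definition is assumed (between-class scatter divided by within-class scatter, computed per gene). The names `fisher_score` and `rank_genes` and the toy expression values are hypothetical, and the gradient boosting stage is omitted.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one gene: between-class scatter / within-class scatter.

    Assumed definition (not taken from the abstract):
    sum_c n_c * (mean_c - mean)^2  /  sum_c n_c * var_c
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        # No within-class spread: perfectly separated or constant gene.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_genes(expression, labels):
    """Return gene names sorted from most to least discriminative."""
    scores = {gene: fisher_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two tumor and two normal samples.
labels = ["tumor", "tumor", "normal", "normal"]
expression = {
    "gene_a": [5.0, 5.2, 1.0, 1.2],  # class means far apart, little spread
    "gene_b": [2.0, 3.0, 2.1, 2.9],  # class means identical
}
ranking = rank_genes(expression, labels)
```

In a two-stage design, the top-ranked genes from this filter would then be handed to a second selector such as a tree ensemble.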


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Support Vector Machine , Cluster Analysis , Decision Trees , Gene Expression Profiling/classification , Gene Ontology , Humans , Neoplasms/pathology , Reproducibility of Results
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33876181

ABSTRACT

Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. It is a challenging task to select highly discriminative feature genes, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population, so as to prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. We compare seven other algorithms and demonstrate the superiority of the VNLHHO in terms of the classification accuracy, fitness value and AUC value in feature selection for gene expression data.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Machine Learning , Neoplasms/genetics , Animals , Cluster Analysis , Databases, Factual/statistics & numerical data , Gene Expression Profiling/classification , Gene Expression Regulation, Neoplastic , Humans , Internet , Models, Genetic , Mutation , Neoplasms/classification , Reproducibility of Results
7.
Biol Reprod ; 97(3): 353-364, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-29025079

ABSTRACT

Early mammalian embryonic transcriptomes are dynamic throughout the process of preimplantation development. Cataloging of primate transcriptomics during early development has been accomplished in humans, but global characterization of transcripts is lacking in the rhesus macaque: a key model for human reproductive processes. We report here the systematic classification of individual macaque transcriptomes using RNA-Seq technology from the germinal vesicle stage oocyte through the blastocyst stage embryo. Major differences in gene expression were found between sequential stages, with the 4- to 8-cell stages showing the highest level of differential gene expression. Analysis of putative transcription factor binding sites also revealed a striking increase in key regulatory factors in 8-cell embryos, indicating a strong likelihood of embryonic genome activation occurring at this stage. Furthermore, clustering analyses of gene co-expression throughout this period resulted in distinct groups of transcripts significantly associated to the different embryo stages assayed. The sequence data provided here along with characterizations of major regulatory transcript groups present a comprehensive atlas of polyadenylated transcripts that serves as a useful resource for comparative studies of preimplantation development in humans and other species.


Subjects
Blastocyst/physiology , Gene Expression Profiling/classification , Gene Expression Profiling/methods , Oocytes/physiology , Transcriptome/genetics , Transcriptome/physiology , Animals , Binding Sites , Chromosome Mapping , Cluster Analysis , DNA, Complementary/genetics , Embryonic Development/genetics , Female , Gene Expression Regulation, Developmental/genetics , Macaca mulatta , Pregnancy , RNA/genetics , Transcription Factors/metabolism
8.
Diagn Cytopathol ; 44(11): 867-873, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27534929

ABSTRACT

BACKGROUND: The gene expression classifier (GEC; Afirma-Veracyte) has proven to be an effective triage modality in the management of thyroid nodules. We evaluate our institutional experience with GEC, specifically examining performance as a first line testing strategy versus in conjunction with repeat fine needle aspiration (FNA), usage trends based on clinical setting, and performance related to diagnostic categories of The Bethesda System for Reporting Thyroid Cytology (TBSRTC). METHODS: All nodules undergoing GEC analysis from 1/2011 to 12/2015 at the Hospital of the University of Pennsylvania were identified using electronic database search methods. Corresponding cytologic diagnoses, GEC results, origin of the sample (in-house vs. satellite site), number and diagnosis of prior FNA's, and clinical and histologic follow-up were collected. RESULTS: The cohort included 294 nodules. Of these, 145 (49%) were classified as benign, 136 (46%) as suspicious, and 13 (5%) as quantity insufficient by GEC. Surgical resection was performed in 130 (130/294-44%) cases (107, 82% "suspicious" by GEC); final histopathologic diagnosis was benign in 85 (65%) and malignant in 45 (35%) cases. Three false negative diagnoses were identified in the setting of GEC analysis as a first line testing strategy. Most cases with GEC as a first line testing strategy came from satellite clinical sites (112, 66%). CONCLUSIONS: The GEC showed improved performance characteristics when coupled with a repeat FNA. It continues to be of low specificity and positive predictive value in oncocytic follicular lesions. Diagn. Cytopathol. 2016;44:867-873. © 2016 Wiley Periodicals, Inc.


Subjects
Endoscopic Ultrasound-Guided Fine Needle Aspiration/standards , Gene Expression Profiling/standards , Molecular Diagnostic Techniques/standards , Thyroid Nodule/pathology , Biomarkers/metabolism , Endoscopic Ultrasound-Guided Fine Needle Aspiration/statistics & numerical data , Gene Expression Profiling/classification , Gene Expression Profiling/statistics & numerical data , Hospitals, University/statistics & numerical data , Humans , Molecular Diagnostic Techniques/classification , Molecular Diagnostic Techniques/statistics & numerical data , Reproducibility of Results , Sensitivity and Specificity , Thyroid Nodule/metabolism
9.
J Comput Biol ; 23(7): 603-14, 2016 07.
Article in English | MEDLINE | ID: mdl-27104372

ABSTRACT

Similarity (or conversely distance) measures are at the heart of most bioinformatic applications. When the similarity involves only a small subset of features out of many, global similarity measures may be significantly affected by noise. Selecting only a subset of (putatively relevant) features for comparison is a widespread solution to the problem albeit affected by arbitrariness and manual intervention. The problem is becoming more and more important due to the increasing amount of experimental data available. In recent years measures based on ranking similarities between two datasets have been proposed. Here, we use one of the proposed rank similarity measures, sharing some aspects with the fraction enrichment score used for protein structure prediction and the gene set enrichment analysis, and test its performance in classifying experiments. The discrimination ability of the similarity measures based on the overlap of ranked genes tested here compares well or better with standard measures of similarity. This conclusion supports the use of rank-based proximity measures to gain further insight in dataset comparisons, particularly on expression data obtained by different technologies (e.g., RNA-seq and microarrays).
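
The idea of comparing datasets by the overlap of their ranked genes can be illustrated with a short sketch. It is a simplified top-k overlap, assumed here for illustration only; it is not the specific measure evaluated in the article, whose definition the abstract does not give. The function names and the toy values are hypothetical.

```python
def top_k(profile, k):
    """Return the set of the k genes with the highest values in a profile."""
    ranked = sorted(profile, key=profile.get, reverse=True)
    return set(ranked[:k])


def rank_overlap(profile_a, profile_b, k):
    """Fraction of the top-k genes shared by two profiles (0.0 to 1.0).

    Only genes measured in both profiles are ranked, so the raw scales
    (for example read counts versus log intensities) never need to match.
    """
    shared = set(profile_a) & set(profile_b)
    a = {g: profile_a[g] for g in shared}
    b = {g: profile_b[g] for g in shared}
    k = min(k, len(shared))
    if k == 0:
        return 0.0
    return len(top_k(a, k) & top_k(b, k)) / k


# Hypothetical toy profiles on very different scales.
rnaseq = {"g1": 900, "g2": 450, "g3": 30, "g4": 5}
array = {"g1": 11.2, "g2": 7.1, "g3": 9.8, "g4": 6.0}

overlap_top2 = rank_overlap(rnaseq, array, 2)
overlap_top3 = rank_overlap(rnaseq, array, 3)
```

Because only the ordering of genes within each profile is used, a measure of this kind is insensitive to the scale differences between platforms, which is the property the abstract highlights.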


Subjects
Gene Expression Profiling/classification , Oligonucleotide Array Sequence Analysis/classification , Proteins/genetics , Algorithms , Computational Biology/methods
10.
Am J Ophthalmol ; 162: 20-27.e1, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26596399

ABSTRACT

PURPOSE: To determine whether any conventional clinical prognostic factors for metastasis from uveal melanoma retain prognostic significance in multivariate models incorporating gene expression profile (GEP) class of the tumor cells. DESIGN: Prospective, interventional case series with a prognostic model. METHODS: Single-institution study of GEP testing and other conventional prognostic factors for metastasis and metastatic death in 299 patients with posterior uveal melanoma evaluated by fine-needle aspiration biopsy (FNAB) at the time of or shortly prior to initial treatment. Univariate prognostic significance of all evaluated potential prognostic variables (patient age, largest linear basal diameter of tumor [LBD], tumor thickness, intraocular location of tumor, melanoma cytomorphologic subtype, and GEP class) was performed by comparison of Kaplan-Meier event rate curves and univariate Cox proportional hazards modeling. Multivariate prognostic significance of combinations of significant prognostic factors identified by univariate analysis was performed using step-up and step-down Cox proportional hazards modeling. RESULTS: GEP class was the strongest prognostic factor for metastatic death in this series. However, tumor LBD, tumor thickness, and intraocular tumor location also proved to be significant individual prognostic factors in this study. On multivariate analysis, a 2-term model that incorporated GEP class and largest basal diameter was associated with strong independent significance of each of the factors. CONCLUSION: Although GEP test is the most robust prognostic indicator in uveal melanoma and early studies of mostly larger tumors found that no clinicopathologic factors had significant prognostic value independent of GEP, our single-center study, which included a substantial proportion of smaller tumors, showed that both GEP and LBD of the tumor are independent prognostic factors for metastasis and metastatic death in multivariate analysis.


Subjects
Melanoma/diagnosis , Melanoma/genetics , Transcriptome/genetics , Uveal Neoplasms/diagnosis , Uveal Neoplasms/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Biopsy, Fine-Needle , Female , Gene Expression Profiling/classification , Genes, Neoplasm , Humans , Male , Melanoma/classification , Melanoma/mortality , Middle Aged , Neoplasm Proteins/genetics , Prognosis , Proportional Hazards Models , Prospective Studies , Survival Rate , Uveal Neoplasms/classification , Uveal Neoplasms/mortality
11.
PLoS One ; 10(11): e0141874, 2015.
Article in English | MEDLINE | ID: mdl-26562156

ABSTRACT

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.


Subjects
Algorithms , Gene Expression Profiling/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Wounds and Injuries/genetics , Cluster Analysis , Computer Simulation , Gene Expression Profiling/classification , Gene Expression Profiling/methods , Humans , Markov Chains , Models, Genetic , Models, Statistical , Monte Carlo Method , Oligonucleotide Array Sequence Analysis/classification , Oligonucleotide Array Sequence Analysis/methods , Outcome Assessment, Health Care/methods , Outcome Assessment, Health Care/statistics & numerical data
12.
Plant Physiol ; 169(4): 2684-99, 2015 12.
Article in English | MEDLINE | ID: mdl-26438786

ABSTRACT

A plethora of diverse programmed cell death (PCD) processes has been described in living organisms. In animals and plants, different forms of PCD play crucial roles in development, immunity, and responses to the environment. While the molecular control of some animal PCD forms such as apoptosis is known in great detail, we still know comparatively little about the regulation of the diverse types of plant PCD. In part, this deficiency in molecular understanding is caused by the lack of reliable reporters to detect PCD processes. Here, we addressed this issue by using a combination of bioinformatics approaches to identify commonly regulated genes during diverse plant PCD processes in Arabidopsis (Arabidopsis thaliana). Our results indicate that the transcriptional signatures of developmentally controlled cell death are largely distinct from the ones associated with environmentally induced cell death. Moreover, different cases of developmental PCD share a set of cell death-associated genes. Most of these genes are evolutionary conserved within the green plant lineage, arguing for an evolutionary conserved core machinery of developmental PCD. Based on this information, we established an array of specific promoter-reporter lines for developmental PCD in Arabidopsis. These PCD indicators represent a powerful resource that can be used in addition to established morphological and biochemical methods to detect and analyze PCD processes in vivo and in planta.


Subjects
Apoptosis/genetics , Arabidopsis Proteins/genetics , Arabidopsis/genetics , Gene Expression Profiling/methods , Arabidopsis/growth & development , Arabidopsis Proteins/classification , Computational Biology/methods , Gene Expression Profiling/classification , Gene Expression Regulation, Developmental/drug effects , Gene Expression Regulation, Developmental/radiation effects , Gene Expression Regulation, Plant/drug effects , Gene Expression Regulation, Plant/radiation effects , Hydrogen Peroxide/pharmacology , Microscopy, Confocal , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Oxidants/pharmacology , Plants, Genetically Modified , Reverse Transcriptase Polymerase Chain Reaction , Sodium Chloride/pharmacology , Transcriptome/drug effects , Transcriptome/radiation effects , Ultraviolet Rays
13.
OMICS ; 19(8): 471-7, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26230532

ABSTRACT

High-throughput assays from genomics, proteomics, metabolomics, and next generation sequencing produce massive omics datasets that are challenging to analyze in biological or clinical contexts. Thus far, there is no publicly available program for converting quantitative omics data into input formats to be used in off-the-shelf robust phylogenetic programs. To the best of our knowledge, this is the first report on the creation of two Windows-based programs, OmicsTract and SynpExtractor, to address this gap. We note, by way of introduction to the development of these programs, that one particularly useful bioinformatics inferential model is the phylogenetic cladogram. Cladograms are multidimensional tools that show the relatedness between subgroups of healthy and diseased individuals and the latter's shared aberrations; they also reveal some characteristics of a disease that would not otherwise be apparent by other analytical methods. OmicsTract and SynpExtractor were written for the respective tasks of (1) accommodating advanced phylogenetic parsimony analysis (through the standard programs MIX [from PHYLIP] and TNT), and (2) extracting shared aberrations at the cladogram nodes. OmicsTract converts comma-delimited data tables by assigning each data point a binary value ("0" for normal states and "1" for abnormal states), then outputs the converted data tables in the proper input file formats for MIX or with embedded commands for TNT. SynpExtractor uses outfiles from MIX and TNT to extract the shared aberrations of each node of the cladogram, matching them with identifying labels from the dataset and exporting them into a comma-delimited file. Labels may be gene identifiers in gene-expression datasets or m/z values in mass spectrometry datasets. By automating these steps, OmicsTract and SynpExtractor offer an opportunity for rapid and standardized phylogenetic analyses of omics data; their model can also be extended to next generation sequencing (NGS) data. We make OmicsTract and SynpExtractor publicly and freely available for non-commercial use in order to strengthen and build capacity for the phylogenetic paradigm of omics analysis.
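
The binary conversion described above ("0" for normal states, "1" for abnormal states) can be sketched in a few lines. This is not OmicsTract itself: the abstract does not say how a normal state is decided, so this sketch assumes a value is normal when it lies within the range spanned by the normal specimens for that feature. The function name, the column names, and the toy table are hypothetical, and writing MIX or TNT input files is left out.

```python
import csv
import io

# Hypothetical comma-delimited table: one feature per row, one specimen per column.
TOY_CSV = """feature,normal_1,normal_2,case_1,case_2
gene_a,1.0,1.4,1.2,3.0
gene_b,5.0,6.0,4.1,5.5
"""


def binarize_table(csv_text, normal_columns):
    """Map every data point to 0 (normal state) or 1 (abnormal state).

    Assumption: the normal range of a feature is [min, max] over the
    columns listed in normal_columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    result = {}
    for row in reader:
        normals = [float(row[col]) for col in normal_columns]
        low, high = min(normals), max(normals)
        result[row["feature"]] = {
            col: 0 if low <= float(val) <= high else 1
            for col, val in row.items()
            if col != "feature"
        }
    return result


binary = binarize_table(TOY_CSV, ["normal_1", "normal_2"])
```

Each specimen column of the resulting 0/1 matrix is the kind of character string a parsimony program takes as input, with shared "1" states playing the role of shared aberrations.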


Subjects
Gene Expression Profiling/classification , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Software , Algorithms , Datasets as Topic , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Information Dissemination , Information Storage and Retrieval , Male , Metabolomics/methods , Prostate/metabolism , Prostate/pathology , Prostatic Neoplasms/pathology
14.
Ren Fail ; 37(7): 1219-24, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26156684

ABSTRACT

OBJECTIVE: We attempt to explore the pathogenesis of diabetic nephropathy (DN) and the specific genes with aberrant expression in it. METHODS: The gene expression profile of GSE1009 was downloaded from the Gene Expression Omnibus database, including 3 normal function glomeruli and DN glomeruli from cadaveric donor kidneys. The differentially expressed genes (DEGs) were analyzed and the aberrant gene-related functions were predicted by informatics methods. The protein-protein interaction (PPI) networks for DEGs were constructed and the functional sub-network was screened. RESULTS: A total of 416 DEGs were found in DN samples compared with normal controls, including 404 up-regulated genes and 12 down-regulated genes. DEGs were involved in the process of combination to saccharides and the decline of tissue repairing ability of the organisms. The genes VEGFA, ACTG1 and HSP90AA1 had high degree in the PPI network. The main biological processes of genes in the sub-network were related to cell proliferation and cell membrane receptor signaling. CONCLUSION: Significant nodes in the PPI network provide new insights into the mechanism of DN. VEGFA, ACTG1 and HSP90AA1 may be potential targets in DN treatment.


Subjects
Computational Biology , Diabetic Nephropathies/genetics , Gene Expression Profiling/classification , Oligonucleotide Array Sequence Analysis/methods , Databases, Factual , Down-Regulation , Humans , Linear Models , Up-Regulation
15.
Comput Biol Med ; 64: 292-8, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25712072

ABSTRACT

Micro-array data are typically characterized by high dimensional features with a small number of samples. Several problems in identifying genes causing diseases from micro-array data can be transformed into the problem of classifying the features extracted from gene expression in micro-array data. However, too many features can cause low prediction accuracy as well as high computational complexity. Dimensional reduction is a method to eliminate irrelevant features to improve the prediction accuracy. Typically, the eigenvalues or dimensional data variance from principal component analysis are used as criteria to select relevant features. This approach is simple but not efficient since it does not concern the degree of data overlap in each dimension in the feature space. A new method to select relevant features based on degree of dimensional data overlap with proper feature selection was introduced. Furthermore, our study concentrated on small sized data sets which usually occur in reality. The experimental results signified that this new approach can achieve substantially higher prediction accuracy when compared with other methods.


Subjects
Computational Biology/methods , Gene Expression Profiling/classification , Gene Expression Profiling/methods , Algorithms , Humans , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis , Principal Component Analysis , ROC Curve , Support Vector Machine
16.
Am J Ophthalmol ; 159(2): 248-56, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25448994

ABSTRACT

PURPOSE: To determine the frequency of discordant gene expression profile (GEP) classification of posterior uveal melanomas sampled at 2 tumor sites by fine-needle aspiration biopsy (FNAB). DESIGN: Prospective single-institution longitudinal study performed in conjunction with a multicenter validation study of the prognostic value of GEP class of posterior uveal melanoma cells for metastasis and metastatic death. METHODS: FNAB aspirates of 80 clinically diagnosed primary choroidal and ciliochoroidal melanomas were obtained from 2 tumor sites prior to or at the time of initial ocular tumor treatment and submitted for independent GEP testing and classification. Frequency of discordant GEP classification of these specimens was determined. RESULTS: Using the support vector machine learning algorithm favored by the developer of the GEP test employed in this study, 9 of the 80 cases (11.3% [95% confidence interval: 9.0%-13.6%]) were clearly discordant. If cases with a failed classification at 1 site or a low confidence class assignment by the support vector machine algorithm at 1 or both sites are also regarded as discordant, then this frequency rises to 13 of the 80 cases (16.3% [95% confidence interval: 13.0%-19.6%]). CONCLUSION: Sampling of a clinically diagnosed posterior uveal melanoma at a single site for prognostic GEP testing is associated with a substantial probability of misclassification. Two-site sampling of such tumors with independent GEP testing of each specimen may be advisable to lessen the probability of underestimating an individual patient's prognostic risk of metastasis and metastatic death.


Subjects
Choroid Neoplasms/classification , Gene Expression Profiling/classification , Gene Frequency , Melanoma/classification , Neoplasm Proteins/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Biopsy, Fine-Needle , Brachytherapy , Choroid Neoplasms/genetics , Choroid Neoplasms/mortality , Choroid Neoplasms/pathology , Female , Humans , Male , Melanoma/genetics , Melanoma/mortality , Melanoma/secondary , Middle Aged , Polymerase Chain Reaction , Prognosis , Prospective Studies , Transcriptome
17.
Otolaryngol Clin North Am ; 47(4): 573-93, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25041959

ABSTRACT

Thyroid fine-needle aspiration biopsies are cytologically indeterminate in 15% to 30% of cases. When cytologically indeterminate thyroid nodules undergo diagnostic surgery, approximately three-quarters prove to be histologically benign. The Afirma Gene Expression Classifier (GEC) achieves a negative predictive value of at least 94% for indeterminate nodules. Most Afirma GEC benign nodules can be clinically observed, as suggested by the National Comprehensive Cancer Network Thyroid Carcinoma Guideline. More than half of the benign nodules with indeterminate cytology (Bethesda categories III/IV) can be identified as GEC benign and removed from the surgical pool to prevent unnecessary diagnostic surgery.
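
For readers unfamiliar with the metric, the negative predictive value cited above is the fraction of test-negative (here, GEC-benign) calls that are truly benign. The sketch below shows only that arithmetic; the function name and the counts are hypothetical and are not taken from the abstract.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN): share of negative calls that are truly negative."""
    called_negative = true_negatives + false_negatives
    if called_negative == 0:
        raise ValueError("no negative calls, so NPV is undefined")
    return true_negatives / called_negative


# Hypothetical counts chosen for illustration only (not study data):
# 94 truly benign and 6 malignant nodules among 100 classifier-benign calls.
npv = negative_predictive_value(true_negatives=94, false_negatives=6)
```

Note that NPV depends on how common malignancy is in the tested population, so a value obtained in one cohort does not automatically carry over to another.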


Subjects
Gene Expression Profiling/methods , Thyroid Nodule/diagnosis , Thyroid Nodule/genetics , Biopsy, Fine-Needle , Cytodiagnosis/methods , DNA Mutational Analysis , Gene Expression Profiling/classification , Gene Expression Regulation, Neoplastic , Humans , Immunohistochemistry , Sensitivity and Specificity , Thyroid Gland/pathology , Thyroid Nodule/pathology , Thyroidectomy/economics
18.
Proc Natl Acad Sci U S A ; 111(23): E2423-30, 2014 Jun 10.
Article in English | MEDLINE | ID: mdl-24912181

ABSTRACT

To modulate the expression of genes involved in nitrogen assimilation, the cyanobacterial PII-interacting protein X (PipX) interacts with the global transcriptional regulator NtcA and the signal transduction protein PII, a protein found in all three domains of life as an integrator of signals of the nitrogen and carbon balance. PipX can form alternate complexes with NtcA and PII, and these interactions are stimulated and inhibited, respectively, by 2-oxoglutarate, providing a mechanistic link between PII signaling and NtcA-regulated gene expression. Here, we demonstrate that PipX is involved in a much wider interaction network. The effect of pipX alleles on transcript levels was studied by RNA sequencing of S. elongatus strains grown in the presence of either nitrate or ammonium, followed by multivariate analyses of relevant mutant/control comparisons. As a result of this process, 222 genes were classified into six coherent groups of differentially regulated genes, two of which, containing either NtcA-activated or NtcA-repressed genes, provided further insights into the function of NtcA-PipX complexes. The remaining four groups suggest the involvement of PipX in at least three NtcA-independent regulatory pathways. Our results pave the way to uncover new regulatory interactions and mechanisms in the control of gene expression in cyanobacteria.


Subjects
Bacterial Proteins/genetics , DNA-Binding Proteins/genetics , Gene Expression Regulation, Bacterial , Synechococcus/genetics , Transcription Factors/genetics , Ammonium Compounds/metabolism , Ammonium Compounds/pharmacology , Bacterial Proteins/metabolism , Base Sequence , DNA-Binding Proteins/metabolism , Gene Expression Profiling/classification , Ketoglutaric Acids/pharmacology , Models, Genetic , Molecular Sequence Data , Multivariate Analysis , Mutation , Nitrates/metabolism , Nitrates/pharmacology , Nitrogen/metabolism , Nitrogen/pharmacology , Nucleotide Motifs/genetics , PII Nitrogen Regulatory Proteins/genetics , PII Nitrogen Regulatory Proteins/metabolism , Promoter Regions, Genetic/genetics , Protein Binding/drug effects , Sequence Homology, Nucleic Acid , Synechococcus/metabolism , Transcription Factors/metabolism , Transcription Initiation Site
19.
ScientificWorldJournal ; 2014: 593503, 2014.
Article in English | MEDLINE | ID: mdl-24790574

ABSTRACT

Relative Expression Analysis (RXA) uses ordering relationships within a small collection of genes and has been successfully applied to classification using microarray data. As checking all possible subsets of genes is computationally infeasible, RXA algorithms require feature selection and multiple restrictive assumptions. Our main contribution is a specialized evolutionary algorithm (EA) for top-scoring pairs, called EvoTSP, which allows finding more advanced gene relations. We managed to unify the major variants of relative expression algorithms through the EA and to introduce weights to the top-scoring pairs. Experimental validation of EvoTSP on publicly available microarray datasets showed that the proposed solution significantly outperforms other relative expression algorithms in terms of accuracy and allows exploring a much larger solution space.
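
To make the idea of ordering relationships concrete, the following is a minimal sketch of the classic top-scoring pair score, which ranks a gene pair (i, j) by how differently the event "expression of i is below expression of j" occurs in the two classes. It is not EvoTSP: the exhaustive search over pairs, the function names, and the toy data are all assumptions made for illustration, and the evolutionary search and pair weights described in the abstract are not reproduced.

```python
from itertools import combinations


def tsp_score(samples, labels, i, j):
    """Return |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for genes i and j."""
    freq = {}
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        # Fraction of samples in this class where gene i is expressed below gene j.
        freq[cls] = sum(r[i] < r[j] for r in rows) / len(rows)
    return abs(freq[0] - freq[1])


def top_scoring_pair(samples, labels):
    """Exhaustively search all gene pairs and return the highest-scoring one."""
    n_genes = len(samples[0])
    return max(
        combinations(range(n_genes), 2),
        key=lambda pair: tsp_score(samples, labels, *pair),
    )


# Hypothetical toy data: 4 samples (rows) x 3 genes (columns), two classes.
samples = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.5],
    [7.0, 1.0, 7.5],
    [8.0, 2.0, 1.5],
]
labels = [0, 0, 1, 1]

best_pair = top_scoring_pair(samples, labels)
```

Because the score depends only on the within-sample ordering of two genes, it is unchanged by any monotone per-sample normalization. The exhaustive search is quadratic in the number of genes, and the cost grows quickly for larger gene subsets, which is the infeasibility the abstract refers to.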


Subjects
Algorithms , Computational Biology/methods , Evolution, Molecular , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Gene Expression Profiling/classification , Gene Expression Profiling/statistics & numerical data , Genetic Fitness , Genetic Variation , Mutation , Oligonucleotide Array Sequence Analysis/classification , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Recombination, Genetic , Selection, Genetic
20.
BMC Bioinformatics ; 14: 350, 2013 Dec 03.
Article in English | MEDLINE | ID: mdl-24299119

ABSTRACT

BACKGROUND: Drosophila melanogaster has been established as a model organism for investigating the developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into the gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison. RESULTS: We present a computational framework to perform anatomical keywords annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions including max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach. CONCLUSION: In our experiment, the three pooling functions perform comparably well in feature dimension reduction. The undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and image-level scheme leads to consistent performance improvement in keywords annotation.
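
The three pooling functions named in the abstract can be sketched as follows. Each one collapses the sparse codes of all local patches in an image into a single feature vector with one entry per dictionary atom. This is an illustrative sketch only: the function names and toy values are hypothetical, and pooling the raw (rather than absolute) code values is an assumption, since the abstract does not specify that detail.

```python
import math


def max_pool(codes):
    """Largest code value per dictionary atom across all patches."""
    return [max(column) for column in zip(*codes)]


def average_pool(codes):
    """Mean code value per dictionary atom across all patches."""
    return [sum(column) / len(column) for column in zip(*codes)]


def sqrt_pool(codes):
    """Square root of the mean squared code value per dictionary atom."""
    return [math.sqrt(sum(v * v for v in column) / len(column)) for column in zip(*codes)]


# Hypothetical sparse codes: 2 patches (rows) x 2 dictionary atoms (columns).
patch_codes = [
    [0.0, 1.0],
    [2.0, 7.0],
]

max_features = max_pool(patch_codes)
avg_features = average_pool(patch_codes)
sqrt_features = sqrt_pool(patch_codes)
```

All three produce a vector whose length equals the dictionary size, regardless of how many patches the image contains, which is what lets images of different sizes share one feature space.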


Subjects
Drosophila melanogaster/cytology , Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Genome, Insect/genetics , Models, Genetic , Molecular Sequence Annotation/methods , Animals , Cell Differentiation/genetics , Cell Division/genetics , Computational Biology/classification , Computational Biology/methods , Drosophila melanogaster/embryology , Gene Expression Profiling/classification , Gene Expression Profiling/methods , High-Throughput Screening Assays , Molecular Sequence Annotation/classification , Predictive Value of Tests , Support Vector Machine