1.
Bioinformatics ; 40(Supplement_1): i511-i520, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940121

ABSTRACT

MOTIVATION: Identifying cancer genes remains a significant challenge in cancer genomics research. Annotated gene sets encode functional associations among multiple genes, and cancer genes have been shown to cluster in hallmark signaling pathways and biological processes. The knowledge of annotated gene sets is critical for discovering cancer genes but remains to be fully exploited. RESULTS: Here, we present the DIsease-Specific Hypergraph neural network (DISHyper), a hypergraph-based computational method that integrates the knowledge from multiple types of annotated gene sets to predict cancer genes. First, our benchmark results demonstrate that DISHyper outperforms the existing state-of-the-art methods and highlight the advantages of employing hypergraphs for representing annotated gene sets. Second, we validate the accuracy of DISHyper-predicted cancer genes using functional validation results and multiple independent functional genomics data. Third, our model predicts 44 novel cancer genes, and subsequent analysis shows their significant associations with multiple types of cancers. Overall, our study provides a new perspective for discovering cancer genes and reveals previously undiscovered cancer genes. AVAILABILITY AND IMPLEMENTATION: DISHyper is freely available for download at https://github.com/genemine/DISHyper.
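
The hypergraph representation mentioned in this abstract can be illustrated by encoding annotated gene sets as an incidence matrix, with one hyperedge per gene set. The following is a minimal sketch in Python, not DISHyper's actual code; the function names, gene-set names, and gene symbols are placeholders chosen for illustration.

```python
def build_incidence(gene_sets):
    """Encode gene sets as a hypergraph incidence matrix.

    gene_sets maps a (hypothetical) set name to a collection of gene symbols.
    Returns (genes, set_names, H) where H[i][j] == 1 if gene i belongs to set j.
    """
    genes = sorted({g for members in gene_sets.values() for g in members})
    set_names = sorted(gene_sets)
    row_of = {g: i for i, g in enumerate(genes)}
    H = [[0] * len(set_names) for _ in genes]
    for j, name in enumerate(set_names):
        for g in gene_sets[name]:
            H[row_of[g]][j] = 1
    return genes, set_names, H


def node_degrees(H):
    """Number of gene sets (hyperedges) each gene belongs to."""
    return [sum(row) for row in H]


def edge_degrees(H):
    """Number of genes contained in each gene set (hyperedge)."""
    return [sum(col) for col in zip(*H)]


# Toy input: two made-up gene sets sharing one gene.
toy_sets = {"SET_A": {"TP53", "MYC"}, "SET_B": {"MYC", "KRAS", "EGFR"}}
genes, set_names, H = build_incidence(toy_sets)
```

Unlike an ordinary graph edge, a hyperedge can join any number of genes at once, which is why a gene-by-set matrix is a natural starting point for this kind of model.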


Subjects
Neoplasms , Neural Networks, Computer , Humans , Neoplasms/genetics , Computational Biology/methods , Genomics/methods , Genes, Neoplasm , Molecular Sequence Annotation/methods , Databases, Genetic
2.
Bioinformatics ; 39(39 Suppl 1): i368-i376, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387178

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single cell clustering. RESULTS: We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy. AVAILABILITY AND IMPLEMENTATION: All source codes of CellBRF are freely available at https://github.com/xuyp-csu/CellBRF.


Subjects
Benchmarking , Random Forest , Cell Differentiation , Cluster Analysis
3.
Mol Neurodegener ; 16(1): 32, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33957936

ABSTRACT

INTRODUCTION: Passive immunotherapies targeting Aβ continue to be evaluated as Alzheimer's disease (AD) therapeutics, but there remains debate over the mechanisms by which these immunotherapies work. Besides the amount of preexisting Aβ deposition and the type of deposit (compact or diffuse), there is little data concerning what factors, independent of those intrinsic to the antibody, might influence efficacy. Here we (i) explored how constitutive priming of the underlying innate activation states by Il10 and Il6 might influence passive Aβ immunotherapy and (ii) evaluated transcriptomic data generated in the AMP-AD initiative to inform how these two cytokines and their receptors' mRNA levels are altered in human AD and an APP mouse model. METHODS: rAAV2/1 encoding EGFP, Il6 or Il10 were delivered by somatic brain transgenesis to neonatal (P0) TgCRND8 APP mice. Then, at 2 months of age, the mice were treated bi-weekly with a high-affinity anti-Aβ1-16 mAb5 monoclonal antibody or control mouse IgG until 6 months of age. rAAV-mediated transgene expression, amyloid accumulation, Aβ levels and gliosis were assessed. Extensive transcriptomic data was used to evaluate the mRNA expression levels of IL10 and IL6 and their receptors in the postmortem human AD temporal cortex and in the brains of TgCRND8 mice, the latter at multiple ages. RESULTS: Priming TgCRND8 mice with Il10 increases Aβ loads and blocks efficacy of subsequent mAb5 passive immunotherapy, whereas priming with Il6 reduces Aβ loads by itself and subsequent Aβ immunotherapy shows only a slightly additive effect. Transcriptomic data shows that (i) there are significant increases in the mRNA levels of Il6 and Il10 receptors in the TgCRND8 mouse model and temporal cortex of humans with AD and (ii) there is a great deal of variance in individual mouse brain and the human temporal cortex of these interleukins and their receptors. 
CONCLUSIONS: The underlying immune activation state can markedly affect the efficacy of passive Aβ immunotherapy. These results have important implications for ongoing human AD immunotherapy trials, as they indicate that underlying immune activation states within the brain, which may be highly variable, may influence the ability of passive immunotherapy to alter Aβ deposition.


Subjects
Alzheimer Disease/immunology , Amyloid beta-Peptides/antagonists & inhibitors , Antibodies, Monoclonal/pharmacology , Immunity, Innate/drug effects , Immunization, Passive/methods , Animals , Humans , Interleukin-10/immunology , Interleukin-6/immunology , Mice , Mice, Transgenic
4.
Alzheimers Dement ; 17(6): 984-1004, 2021 06.
Article in English | MEDLINE | ID: mdl-33480174

ABSTRACT

Intron retention (IR) has been implicated in the pathogenesis of complex diseases such as cancers; its association with Alzheimer's disease (AD) remains unexplored. We performed genome-wide analysis of IR through integrating genetic, transcriptomic, and proteomic data of AD subjects and mouse models from the Accelerating Medicines Partnership-Alzheimer's Disease project. We identified 4535 and 4086 IR events in 2173 human and 1736 mouse genes, respectively. Quantitation of IR enabled the identification of differentially expressed genes that conventional exon-level approaches did not reveal. There were significant correlations of intron expression within innate immune genes, like HMBOX1, with AD in humans. Peptides with a high probability of translation from intron-retained mRNAs were identified using mass spectrometry. Further, we established AD-specific intron expression Quantitative Trait Loci, and identified splicing-related genes that may regulate IR. Our analysis provides a novel resource for the search for new AD biomarkers and pathological mechanisms.


Subjects
Alzheimer Disease , Autopsy , Brain/pathology , Disease Models, Animal , Genomics , Introns/genetics , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Animals , Homeodomain Proteins/genetics , Humans , Mice , Proteomics , Quantitative Trait Loci , Transcriptome
5.
EMBO Rep ; 22(1): e50535, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33319461

ABSTRACT

Alternative splicing (AS) leads to transcriptome diversity in eukaryotic cells and is one of the key regulators driving cellular differentiation. Although AS is of crucial importance for normal hematopoiesis and hematopoietic malignancies, its role in early hematopoietic development is still largely unknown. Here, by using high-throughput transcriptomic analyses, we show that pervasive and dynamic AS takes place during hematopoietic development of human pluripotent stem cells (hPSCs). We identify a splicing factor switch that occurs during the differentiation of mesodermal cells to endothelial progenitor cells (EPCs). Perturbation of this switch selectively impairs the emergence of EPCs and hemogenic endothelial progenitor cells (HEPs). Mechanistically, an EPC-induced alternatively spliced isoform of NUMB dictates EPC specification by controlling NOTCH signaling. Furthermore, we demonstrate that the splicing factor SRSF2 regulates splicing of the EPC-induced NUMB isoform, and the SRSF2-NUMB-NOTCH splicing axis regulates EPC generation. The identification of this splicing factor switch provides a new molecular mechanism to control cell fate and lineage specification.


Subjects
Cell Lineage , Pluripotent Stem Cells , Serine-Arginine Splicing Factors/genetics , Cell Differentiation , Cell Lineage/genetics , Hematopoiesis/genetics , Hematopoietic Stem Cells , Humans , Membrane Proteins , Nerve Tissue Proteins
6.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32427285

ABSTRACT

Advances in sequencing technologies facilitate personalized disease-risk profiling and clinical diagnosis. In recent years, great progress has been made in noninvasive diagnoses based on cell-free DNAs (cfDNAs). This approach exploits the fact that dead cells release DNA fragments into the circulation, and some DNA fragments carry information that indicates their tissues-of-origin (TOOs). Based on the signals used for identifying the TOOs of cfDNAs, the existing methods can be classified into three categories: cfDNA mutation-based methods, methylation pattern-based methods and cfDNA fragmentation pattern-based methods. In cfDNA mutation-based methods, the SNP information or the detected mutations in driver genes of certain diseases are employed to identify the TOOs of cfDNAs. Methylation pattern-based methods are developed to identify the TOOs of cfDNAs based on the tissue-specific methylation patterns. In cfDNA fragmentation pattern-based methods, cfDNA fragmentation patterns, such as nucleosome positioning or preferred end coordinates of cfDNAs, are used to predict the TOOs of cfDNAs. In this paper, the strategies and challenges in each category are reviewed. Furthermore, the representative applications based on the TOOs of cfDNAs, including noninvasive prenatal testing, noninvasive cancer screening, transplantation rejection monitoring and parasitic infection detection, are also reviewed. Moreover, the challenges and future work in identifying the TOOs of cfDNAs are discussed. Our research provides a comprehensive picture of the development and challenges in identifying the TOOs of cfDNAs, which may help bioinformatics researchers develop new methods to improve the identification of the TOOs of cfDNAs.


Subjects
Cell-Free Nucleic Acids/genetics , Neoplasms/diagnosis , Biomarkers, Tumor/genetics , Cell-Free Nucleic Acids/blood , DNA Methylation , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasms/genetics
7.
Front Genet ; 11: 586, 2020.
Article in English | MEDLINE | ID: mdl-32733531

ABSTRACT

Intron retention (IR) is an alternative splicing mode whereby introns, rather than being spliced out as usual, are retained in mature mRNAs. It was previously considered a consequence of mis-splicing and received very limited attention. Only recently has IR become of interest for transcriptomic data analysis owing to its recognized roles in gene expression regulation and associations with complex diseases. In this article, we first review the function of IR in regulating gene expression in a number of biological processes, such as neuron differentiation and activation of CD4+ T cells. Next, we briefly review its association with diseases, such as Alzheimer's disease and cancers. Then, we describe state-of-the-art methods for IR detection, including RNA-seq analysis tools IRFinder and iREAD, highlighting their underlying principles and discussing their advantages and limitations. Finally, we discuss the challenges for IR detection and potential ways in which IR detection methods could be improved.

8.
J Bioinform Comput Biol ; 18(3): 2040009, 2020 06.
Article in English | MEDLINE | ID: mdl-32698720

ABSTRACT

Clustering analysis of gene expression data is essential for understanding complex biological data, and is widely used in important biological applications such as the identification of cell subpopulations and disease subtypes. In commonly used methods such as hierarchical clustering (HC) and consensus clustering (CC), holistic expression profiles of all genes are often used to assess the similarity between samples for clustering. While these methods have been proven successful in identifying sample clusters in many areas, they do not provide information about which gene sets (functions) contribute most to the clustering, thus limiting the interpretability of the resulting cluster. We hypothesize that integrating prior knowledge of annotated gene sets would not only achieve satisfactory clustering performance but also, more importantly, enable potential biological interpretation of clusters. Here we report ClusterMine, an approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets in functional annotation databases such as Gene Ontology. In addition to the cluster membership of each sample as provided by conventional approaches, it also outputs gene sets that most likely contribute to the clustering, thus facilitating biological interpretation. We compare ClusterMine with conventional approaches on nine real-world experimental datasets that represent different application scenarios in biology. We find that ClusterMine achieves better performances and that the gene sets prioritized by our method are biologically meaningful. ClusterMine is implemented as an R package and is freely available at: www.genemine.org/clustermine.php.


Subjects
Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Algorithms , Cell Cycle/genetics , Databases, Genetic , Humans , Molecular Sequence Annotation , Neoplasms/genetics , Neoplasms/pathology , Pluripotent Stem Cells/physiology , Sensory Receptor Cells/physiology
9.
BMC Genomics ; 21(1): 128, 2020 Feb 06.
Article in English | MEDLINE | ID: mdl-32028886

ABSTRACT

BACKGROUND: Intron retention (IR) has been traditionally overlooked as 'noise' and received negligible attention in the field of gene expression analysis. In recent years, IR has become an emerging field for interrogating transcriptomes because it has been recognized to carry out important biological functions such as gene expression regulation and it has been found to be associated with complex diseases such as cancers. However, methods for detecting IR today are limited. Thus, there is a need to develop novel methods to improve IR detection. RESULTS: Here we present iREAD (intron REtention Analysis and Detector), a tool to detect IR events genome-wide from high-throughput RNA-seq data. The command line interface for iREAD is implemented in Python. iREAD takes as input a BAM file, representing the transcriptome, and a text file containing the intron coordinates of a genome. It then 1) counts all reads that overlap intron regions, 2) detects IR events by analyzing the features of reads such as depth and distribution patterns, and 3) outputs a list of retained introns into a tab-delimited text file. iREAD provides significant added value in detecting IR compared with output from IRFinder with a higher AUC on all datasets tested. Both methods showed low false positive rates and high false negative rates in different regimes, indicating that use together is generally beneficial. The output from iREAD can be directly used for further exploratory analysis such as differential intron expression and functional enrichment. The software is freely available at https://github.com/genemine/iread. CONCLUSION: Being complementary to existing tools, iREAD provides a new and generic tool to interrogate poly-A enriched transcriptomic data of intron regions. Intron retention analysis provides a complementary approach for understanding transcriptome.
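
The first step described in this abstract, counting reads that overlap intron regions, can be sketched as follows. This is an illustrative Python example and not iREAD's implementation: it works on plain coordinate tuples rather than a BAM file, and the single `min_reads` cutoff is a hypothetical stand-in for the depth and distribution features the tool actually analyzes.

```python
def count_intron_reads(introns, reads):
    """Count reads overlapping each intron.

    introns and reads are lists of (chrom, start, end) tuples using
    half-open coordinates, so a read ending exactly at an intron start
    does not count as overlapping.
    """
    counts = {intron: 0 for intron in introns}
    for r_chrom, r_start, r_end in reads:
        for intron in introns:
            chrom, start, end = intron
            if chrom == r_chrom and r_start < end and start < r_end:
                counts[intron] += 1
    return counts


def call_retained(counts, min_reads=2):
    """Flag introns whose read count reaches a hypothetical cutoff."""
    return [intron for intron, n in counts.items() if n >= min_reads]


# Toy data: two introns and four reads (coordinates are made up).
introns = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [
    ("chr1", 90, 110),   # overlaps the first intron
    ("chr1", 150, 160),  # lies inside the first intron
    ("chr1", 200, 250),  # starts where the first intron ends: no overlap
    ("chr2", 100, 200),  # different chromosome
]
counts = count_intron_reads(introns, reads)
retained = call_retained(counts)
```

The nested loop is quadratic and only suitable for toy inputs; a genome-wide tool would use an interval index or sorted sweep instead.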


Subjects
Introns , RNA-Seq , Software , Algorithms , Animals , Humans , Mice
10.
J Craniofac Surg ; 30(7): 2174-2177, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31425405

ABSTRACT

BACKGROUND: Underdevelopment of nose and chin in East Asians is quite common. Rhinoplasty and mentoplasty are effective procedures to solve the above-depicted defects and can achieve remarkable cosmetic effects. An autologous costal cartilage graft has become an ideal material for rhinoplasty, especially for revision surgery. However, many problems in the clinical application of costal cartilage remain unresolved. This study is to investigate application strategies of autologous costal cartilage grafts in rhino- and mentoplasty. METHODS: The methods involved are as follows: application of an integrated cartilage scaffold; comprehensive application of diced cartilage; and chin augmentation of an autologous costal cartilage graft. RESULTS: In this study, satisfactory facial contour appearance was immediately achieved in 28 patients after surgery; 21 patients had satisfactory appearance of the nose and chin during the 6- to 18-month follow-up. Cartilage resorption was not observed. Two patients had nasal tip skin redness and were cured after treatment. CONCLUSION: This procedure can be used to effectively solve: curvature of the costal cartilage segment itself; warping of the carved costal cartilage; and effective use of the costal cartilage segment. The procedure has achieved satisfactory outcomes, and its application is worth extending to clinical practice.


Subjects
Costal Cartilage/transplantation , Genioplasty , Rhinoplasty , Adolescent , Adult , Autografts/surgery , Chin/surgery , Female , Humans , Nose/surgery , Young Adult
11.
J Proteome Res ; 14(9): 3484-91, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26216192

ABSTRACT

Alternative splicing allows a single gene to produce multiple transcript-level splice isoforms from which the translated proteins may show differences in their expression and function. Identifying the major functional or canonical isoform is important for understanding gene and protein functions. Identification and characterization of splice isoforms is a stated goal of the HUPO Human Proteome Project and of neXtProt. Multiple efforts have catalogued splice isoforms as "dominant", "principal", or "major" isoforms based on expression or evolutionary traits. In contrast, we recently proposed highest connected isoforms (HCIs) as a new class of canonical isoforms that have the strongest interactions in a functional network and revealed their significantly higher (differential) transcript-level expression compared to nonhighest connected isoforms (NCIs) regardless of tissues/cell lines in the mouse. HCIs and their expression behavior in the human remain unexplored. Here we identified HCIs for 6157 multi-isoform genes using a human isoform network that we constructed by integrating a large compendium of heterogeneous genomic data. We present examples for pairs of transcript isoforms of ABCC3, RBM34, ERBB2, and ANXA7. We found that functional networks of isoforms of the same gene can show large differences. Interestingly, differential expression between HCIs and NCIs was also observed in the human on an independent set of 940 RNA-seq samples across multiple tissues, including heart, kidney, and liver. Using proteomic data from normal human retina and placenta, we showed that HCIs are a promising indicator of expressed protein isoforms exemplified by NUDFB6 and M6PR. Furthermore, we found that a significant percentage (20%, p = 0.0003) of human and mouse HCIs are homologues, suggesting their conservation between species. 
Our identified HCIs expand the repertoire of canonical isoforms and are expected to facilitate studying main protein products, understanding gene regulation, and possibly evolution. The network is available through our web server as a rich resource for investigating isoform functional relationships (http://guanlab.ccmb.med.umich.edu/hisonet). All MS/MS data were available at ProteomeXchange Web site (http://www.proteomexchange.org) through their identifiers (retina: PXD001242, placenta: PXD000754).


Subjects
Alternative Splicing , Chromosomes, Human, Pair 17 , Protein Isoforms/genetics , Proteins/genetics , Proteome , Animals , Humans , Mice , Protein Isoforms/chemistry , Proteins/chemistry , RNA, Messenger/genetics , Sequence Analysis, RNA
12.
Yi Chuan ; 37(2): 165-173, 2015 Feb.
Article in Chinese | MEDLINE | ID: mdl-25665643

ABSTRACT

Aging is associated with many complex diseases such as cancer and neurodegenerative diseases. Recently, many age-related DNA methylation biomarkers in peripheral whole blood have been identified. These biomarkers may reflect DNA methylation changes derived from changes in the number of a specific leukocyte cell type during aging. To clarify the source of these age-related DNA methylation changes, we analysed DNA methylation profile of peripheral whole blood from three independent cohorts of healthy subjects and identified age-related DNA methylation CpG sites (arCpGs) using the Spearman's rank test with high reproducibility (Hypergeometric test, P=1.65 × 10⁻¹¹). Using a deconvolution algorithm, we found that the proportion of myeloid lineage cells was increased while that of lymphoid lineage cells was decreased in the peripheral whole blood with age (Spearman's rank correlation test, P<0.05, r ≤ 0.22). The CpG sites, whose methylation levels were significantly different in myeloid cells and lymphoid cells, were preferentially recognized as arCpGs in peripheral whole blood. Moreover, the arCpGs in CD4+ T cells significantly overlapped with that in peripheral whole blood (Hypergeometric test, P=6.14 × 10⁻¹²) and 99.1% of the overlapping arCpGs had consistent positive or negative correlations with age. Though the arCpGs in CD14+ monocytes did not significantly overlap with that in peripheral whole blood (Hypergeometric test, P=0.232), 90.1% of 51 overlapping arCpGs were correlated with age in CD14+ monocytes, peripheral whole blood, and CD4+ T cells consistently. In summary, most of the methylation changes in arCpGs identified in peripheral whole blood come from common or specific DNA methylation changes in leukocyte subtypes, while part of them reflect alterations in the number of specific cell types of leukocytes.
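
The hypergeometric test used in this abstract to assess overlap between two sets of CpG sites can be written out directly. The sketch below is a generic Python illustration of the upper-tail probability, not the authors' code; the function name and the toy numbers are assumptions, and none of the study's actual counts are reproduced.

```python
from math import comb


def hypergeom_overlap_pvalue(population, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of sites tested
    size_a:     sites in the first list
    size_b:     sites in the second list
    overlap:    sites observed in both lists
    """
    upper = min(size_a, size_b)
    total = comb(population, size_b)
    tail = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 sites, lists of 4 and 3 sites, 2 shared.
p_toy = hypergeom_overlap_pvalue(10, 4, 3, 2)
```

A small p-value means the two lists share more sites than would be expected if one list had been drawn at random from the tested population.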


Subjects
DNA Methylation , Leukocytes, Mononuclear/metabolism , Myeloid Cells/metabolism , Adolescent , Adult , Age Factors , Aged , Cell Lineage , CpG Islands , Female , Humans , Male , Middle Aged
13.
Blood ; 124(20): 3155-64, 2014 Nov 13.
Article in English | MEDLINE | ID: mdl-25208887

ABSTRACT

Plasminogen is the precursor of the serine protease plasmin, a central enzyme of the fibrinolytic system. Plasma levels of plasminogen vary by almost 2-fold among healthy individuals, yet little is known about its heritability or genetic determinants in the general population. In order to identify genetic factors affecting the natural variation of plasminogen levels, we performed a genome-wide association study and linkage analysis in a sample of 3456 young healthy individuals who participated in the Genes and Blood Clotting Study (GABC) or the Trinity Student Study (TSS). Heritability of plasminogen levels was 48.1% to 60.0%. Tobacco smoking and female sex were associated with higher levels of plasminogen. In the meta-analysis, 11 single-nucleotide polymorphisms (SNPs) in 2 regions reached genome-wide significance (P < 5.0E-8). Of these, 9 SNPs were near the PLG or LPA genes on Chr6q26, whereas 2 were on Chr19q13 and 5' upstream of SIGLEC14. These 11 SNPs represented 4 independent signals and collectively explained 6.8% of plasminogen level variation in the study populations. The strongest association was observed for a nonsynonymous SNP in the PLG gene (R523W). Individuals bearing an additional copy of this allele had an average decrease of 13.4% in plasma plasminogen level.


Subjects
Apolipoproteins A/genetics , Lectins/genetics , Plasminogen/analysis , Plasminogen/genetics , Receptors, Cell Surface/genetics , Smoking/blood , Adolescent , Adult , Cohort Studies , Female , Gene Deletion , Genetic Linkage , Genetic Variation , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide , Young Adult
14.
Trends Genet ; 30(8): 340-7, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24951248

ABSTRACT

The vast majority of multi-exon genes in humans undergo alternative splicing, which greatly increases the functional diversity of protein species. Predicting functions at the isoform level is essential to further our understanding of developmental abnormalities and cancers, which frequently exhibit aberrant splicing and dysregulation of isoform expression. However, determination of isoform function is very difficult, and efforts to predict isoform function have been limited in the functional genomics field. Deep sequencing of RNA now provides an unprecedented amount of expression data at the transcript level. We describe here emerging computational approaches that integrate such large-scale whole-transcriptome sequencing (RNA-seq) data for predicting the functions of alternatively spliced isoforms, and we discuss their applications in developmental and cancer biology. We outline future directions for isoform function prediction, emphasizing the need for heterogeneous genomic data integration and tissue-specific, dynamic isoform-level network modeling, which will allow the field to realize its full potential.


Subjects
Alternative Splicing , Computational Biology , Genomics , Proteins/genetics , Animals , Humans , Protein Isoforms
15.
Article in English | MEDLINE | ID: mdl-23602956

ABSTRACT

Wavelength selection is a critical step for improving prediction performance on spectral data. Because vibrational and rotational spectra consist of continuous spectral bands, we propose a novel method of wavelength interval selection based on random frog, called interval random frog (iRF). To obtain all the possible continuous intervals, spectra are first divided into intervals by moving a window of fixed width over the whole spectrum. These overlapping intervals are ranked by applying random frog coupled with PLS, and the optimal ones are chosen. This method has been applied to two near-infrared spectral datasets, displaying higher efficiency in wavelength interval selection than other methods. The source code of iRF can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
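
The interval-generation step described in this abstract, sliding a fixed-width window across the spectrum, can be sketched in a few lines. This Python example only illustrates that windowing step under the assumption that the window advances one wavelength at a time; the function name is hypothetical, and the ranking by random frog coupled with PLS is not shown.

```python
def moving_window_intervals(n_wavelengths, width):
    """Return all overlapping intervals of a fixed width.

    Each interval is a half-open (start, stop) pair of wavelength indices,
    and consecutive intervals are shifted by one index (an assumption).
    """
    if width < 1 or width > n_wavelengths:
        raise ValueError("width must be between 1 and n_wavelengths")
    return [(s, s + width) for s in range(n_wavelengths - width + 1)]


# Toy example: a spectrum of 6 wavelengths and a window of width 3.
toy_intervals = moving_window_intervals(6, 3)
```

A spectrum with n wavelengths and window width w yields n - w + 1 candidate intervals, which then become the units to be ranked in place of individual wavelengths.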


Subjects
Algorithms , Optical Phenomena , Spectrum Analysis/methods , Animals , Calibration , Least-Squares Analysis , Milk , Multivariate Analysis , Nicotiana
16.
Anal Chim Acta ; 740: 20-6, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22840646

ABSTRACT

The identification of disease-relevant genes represents a challenge in microarray-based disease diagnosis where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven to be quite promising for variable selection. However, the design and application of an RJMCMC algorithm requires, for example, special criteria for prior distributions. Also, the simulation from joint posterior distributions of models is computationally intensive, and may even be mathematically intractable. These disadvantages may limit the applications of RJMCMC algorithms. Therefore, the development of algorithms that possess the advantages of RJMCMC methods and are also efficient and easy to follow for selecting disease-associated genes is required. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and the estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top 2 ranked genes are Z50753 and U00968 for the colon dataset, and Y10871_at and Z22536_at for the estrogen dataset. (The source codes with GNU General Public License Version 2.0 are freely available to non-commercial users at: http://code.google.com/p/randomfrog/.)


Subjects
Breast Neoplasms/genetics , Colonic Neoplasms/genetics , Disease/classification , Disease/genetics , Markov Chains , Monte Carlo Method , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Colon/metabolism , Estrogens/genetics , Gene Expression Profiling , Humans
17.
Article in English | MEDLINE | ID: mdl-21339535

ABSTRACT

Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, here we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) for selecting informative genes. The rationale for performing margin influence analysis lies in the fact that the margin of support vector machines is an important factor which underlies the generalization performance of SVM models. Briefly, MIA could reveal genes which have statistically significant influence on the margin by using Mann-Whitney U test. The reason for using the Mann-Whitney U test rather than two-sample t test is that Mann-Whitney U test is a nonparametric test method without any distribution-related assumptions and is also a robust method. Using two publicly available cancerous microarray data sets, it is demonstrated that MIA could typically select a small number of margin-influencing genes and further achieves comparable classification accuracy compared to those reported in the literature. The distinguished features and outstanding performance may make MIA a good alternative for gene selection of high dimensional microarray data. (The source code in MATLAB with GNU General Public License Version 2.0 is freely available at http://code.google.com/p/mia2009/).


Subjects
Gene Expression Profiling/methods , Support Vector Machine , Databases, Genetic , Genetics, Population , Humans , Oligonucleotide Array Sequence Analysis/methods
18.
Analyst ; 136(7): 1456-63, 2011 Apr 07.
Article in English | MEDLINE | ID: mdl-21321685

ABSTRACT

Selecting a small subset of informative genes plays an important role in accurate prediction of clinical tumor samples. Based on model population analysis, a novel variable selection method, called noise incorporated subwindow permutation analysis (NISPA), is proposed in this study to work with support vector machines (SVMs). The essence of NISPA is that one noise variable is added into each sampled sub-dataset, and the distribution of variable importance of the added noise is then computed and serves as the common reference for evaluating the experimental variables. Further, by using the non-parametric Mann-Whitney U test, a P value can be assigned to each variable which describes to what extent the distributions of the gene variable and the noise variable are different. According to the computed P values, all the variables can be ranked and then a small subset of informative variables can be determined to build the model. Moreover, with NISPA, we are the first to divide the variables into a more detailed classification of informative, uninformative (noise) and interfering variables in comparison with other methods. In this study, two microarray datasets are employed to evaluate the performance of NISPA. The results show that the prediction errors of SVM classifiers could be significantly reduced by variable selection using NISPA. It is concluded that NISPA is a good alternative variable selection algorithm.
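
The comparison at the core of this abstract, testing whether a gene's importance distribution differs from that of an added noise variable, relies on the Mann-Whitney U test. The Python sketch below computes the U statistic and a two-sided P value from the normal approximation without tie or continuity correction; it is a generic illustration rather than NISPA's code, and the importance values are invented.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """U statistic for x versus y: each pair with x > y adds 1, each tie adds 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def two_sided_p_normal(u, n_x, n_y):
    """Two-sided P value from the normal approximation.

    No tie or continuity correction is applied, so this is only a rough
    approximation for small samples or heavily tied data.
    """
    mean = n_x * n_y / 2
    sd = sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u - mean) / sd
    return erfc(abs(z) / sqrt(2))


# Invented importance scores for one gene and for the noise reference.
gene_importance = [3, 4, 5]
noise_importance = [1, 2, 3]
u_stat = mann_whitney_u(gene_importance, noise_importance)
p_value = two_sided_p_normal(u_stat, len(gene_importance), len(noise_importance))
```

Repeating this comparison for every variable against the same noise reference gives one P value per variable, which is the quantity the abstract describes using for ranking.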


Subjects
Algorithms , Gene Expression Profiling/methods , Colon/metabolism , Colonic Neoplasms/genetics , Databases, Factual , Estrogens/genetics , Humans , Models, Genetic , Software