Results 1 - 20 of 21
1.
Brief Bioinform ; 23(5), 2022 09 20.
Article in English | MEDLINE | ID: mdl-35849093

ABSTRACT

The coronavirus disease 2019 pandemic has alerted people to the threat posed by viruses. Vaccination is the most effective way to prevent the disease from spreading. The interaction between antibodies and antigens clears infectious organisms from the host. Identifying B-cell epitopes is critical in vaccine design, development of disease diagnostics and antibody production. However, traditional experimental methods to determine epitopes are time-consuming and expensive, and the predictive performance of existing in silico methods is not satisfactory. This paper develops a general framework to predict variable-length linear B-cell epitopes specific to human-adapted viruses with machine learning approaches based on the ProtVec representation of peptides and the physicochemical properties of amino acids. QR decomposition is incorporated during the embedding process, which enables our models to handle variable-length sequences. Experimental results on large immune epitope datasets validate that our proposed model's performance is superior to the state-of-the-art methods in terms of AUROC (0.827) and AUPR (0.831) on the testing set. Moreover, sequence analysis also provides the results of the viral category for the corresponding predicted epitopes with high precision. Therefore, this framework is shown to reliably identify linear B-cell epitopes of human-adapted viruses given protein sequences and could provide assistance for potential future pandemics and epidemics.


Subjects
COVID-19 , Viruses , Amino Acids , Epitope Mapping/methods , Epitopes, B-Lymphocyte , Humans , Machine Learning , Peptides/chemistry
2.
IEEE Trans Cybern ; 52(11): 12231-12244, 2022 Nov.
Article in English | MEDLINE | ID: mdl-33961570

ABSTRACT

The rapid emergence of high-dimensional data in various areas has brought new challenges to current ensemble clustering research. To deal with the curse of dimensionality, recently considerable efforts in ensemble clustering have been made by means of different subspace-based techniques. However, besides the emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimilarity metrics. It remains a surprisingly open problem in ensemble clustering how to create and aggregate a large population of diversified metrics, and furthermore, how to jointly investigate the multilevel diversity in the large populations of metrics, subspaces, and clusters in a unified framework. To tackle this problem, this article proposes a novel multidiversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, which are then coupled with random subspaces to form a large set of metric-subspace pairs. Based on the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings can be thereby constructed. Furthermore, an entropy-based criterion is utilized to explore the cluster wise diversity in ensembles, based on which three specific ensemble clustering algorithms are presented by incorporating three types of consensus functions. Extensive experiments are conducted on 30 high-dimensional datasets, including 18 cancer gene expression datasets and 12 image/speech datasets, which demonstrate the superiority of our algorithms over the state of the art. The source code is available at https://github.com/huangdonghere/MDEC.


Subjects
Benchmarking , Neoplasms , Algorithms , Cluster Analysis , Humans , Neoplasms/genetics , Software
3.
Genome Biol ; 22(1): 226, 2021 08 16.
Article in English | MEDLINE | ID: mdl-34399797

ABSTRACT

Chromatin interactions play important roles in regulating gene expression. However, the availability of genome-wide chromatin interaction data is limited. We develop a computational method, chromatin interaction neural network (ChINN), to predict chromatin interactions between open chromatin regions using only DNA sequences. ChINN predicts CTCF- and RNA polymerase II-associated and Hi-C chromatin interactions. ChINN shows good across-sample performances and captures various sequence features for chromatin interaction prediction. We apply ChINN to 6 chronic lymphocytic leukemia (CLL) patient samples and a published cohort of 84 CLL open chromatin samples. Our results demonstrate extensive heterogeneity in chromatin interactions among CLL patient samples.


Subjects
Chromatin , Machine Learning , Neural Networks, Computer , Base Sequence , Computational Biology , Genome , Humans , Leukemia/genetics
4.
Bioinformatics ; 37(16): 2432-2440, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33609108

ABSTRACT

MOTIVATION: Synthetic Lethality (SL) plays an increasingly critical role in the targeted anticancer therapeutics. In addition, identifying SL interactions can create opportunities to selectively kill cancer cells without harming normal cells. Given the high cost of wet-lab experiments, in silico prediction of SL interactions as an alternative can be a rapid and cost-effective way to guide the experimental screening of candidate SL pairs. Several matrix factorization-based methods have recently been proposed for human SL prediction. However, they are limited in capturing the dependencies of neighbors. In addition, it is also highly challenging to make accurate predictions for new genes without any known SL partners. RESULTS: In this work, we propose a novel graph contextualized attention network named GCATSL to learn gene representations for SL prediction. First, we leverage different data sources to construct multiple feature graphs for genes, which serve as the feature inputs for our GCATSL method. Second, for each feature graph, we design node-level attention mechanism to effectively capture the importance of local and global neighbors and learn local and global representations for the nodes, respectively. We further exploit multi-layer perceptron (MLP) to aggregate the original features with the local and global representations and then derive the feature-specific representations. Third, to derive the final representations, we design feature-level attention to integrate feature-specific representations by taking the importance of different feature graphs into account. Extensive experimental results on three datasets under different settings demonstrated that our GCATSL model outperforms 14 state-of-the-art methods consistently. In addition, case studies further validated the effectiveness of our proposed model in identifying novel SL pairs. 
AVAILABILITY AND IMPLEMENTATION: Python code and the dataset are freely available on GitHub (https://github.com/longyahui/GCATSL) and Zenodo (https://zenodo.org/record/4522679) under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
BMC Bioinformatics ; 21(1): 126, 2020 Mar 27.
Article in English | MEDLINE | ID: mdl-32216744

ABSTRACT

BACKGROUND: Accumulated evidence shows that the abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Accurately identifying disease-associated lncRNAs helps to study the mechanism of lncRNAs in diseases and to explore new therapies. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most of the existing models ignore the interference of noisy and redundant information among these data resources. RESULTS: To improve the ability of LDA prediction models, we implemented a random forest and feature selection based LDA prediction model (RFLDA for short). First, RFLDA integrates the experiment-supported miRNA-disease associations (MDAs) and LDAs, the disease semantic similarity (DSS), the lncRNA functional similarity (LFS) and the lncRNA-miRNA interactions (LMI) as input features. Then, RFLDA chooses the most useful features to train the prediction model by feature selection based on the random forest variable importance score, which takes into account not only the effect of an individual feature on prediction results but also the joint effects of multiple features. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. With an area under the receiver operating characteristic curve (AUC) of 0.976 and an area under the precision-recall curve (AUPR) of 0.779 under 5-fold cross-validation, RFLDA performs better than several state-of-the-art LDA prediction models. Moreover, case studies on three cancers demonstrate that 43 of the 45 lncRNAs predicted by RFLDA are validated by experimental data, and the other two predicted lncRNAs are supported by other LDA prediction models. CONCLUSIONS: Cross-validation and case studies indicate that RFLDA has an excellent ability to identify potential disease-associated lncRNAs.


Subjects
Algorithms , Disease/genetics , RNA, Long Noncoding/metabolism , Area Under Curve , Computational Biology/methods , Computer Simulation , Humans , MicroRNAs/metabolism , Neoplasms/genetics , ROC Curve , Regression Analysis , Risk Factors
6.
BMC Bioinformatics ; 20(1): 624, 2019 Dec 03.
Article in English | MEDLINE | ID: mdl-31795954

ABSTRACT

BACKGROUND: A large body of evidence shows that miRNA regulates the expression of its target genes at the post-transcriptional level and that the dysregulation of miRNA is related to many complex human diseases. Accurately discovering disease-related miRNAs is conducive to exploring the pathogenesis and treatment of diseases. However, because experimental methods are time-consuming and expensive, predicting miRNA-disease associations with computational models has become a more economical and effective means. RESULTS: Inspired by the work of predecessors, we proposed an improved computational model based on random forest (RF) for identifying miRNA-disease associations (IRFMDA). First, the integrated similarity of diseases was calculated by combining the semantic similarity and Gaussian interaction profile kernel (GIPK) similarity of diseases, and the integrated similarity of miRNAs by combining the functional similarity and GIPK similarity of miRNAs. Then, the integrated similarity of diseases and the integrated similarity of miRNAs were combined to represent each miRNA-disease relationship pair. Next, the miRNA-disease relationship pairs contained in the HMDD (v2.0) database were considered positive samples, and the randomly constructed miRNA-disease relationship pairs not included in HMDD (v2.0) were considered negative samples. Next, feature selection based on the variable importance score of RF was performed to choose the more useful features to represent samples, optimizing the model's ability to infer miRNA-disease associations. Finally, an RF regression model was trained on the reduced sample space to score the unknown miRNA-disease associations. The AUCs of IRFMDA under local leave-one-out cross-validation (LOOCV), global LOOCV and 5-fold cross-validation reached 0.8728, 0.9398 and 0.9363, respectively, which were better than those of several excellent models for predicting miRNA-disease associations.
Moreover, case studies on oesophageal cancer, lymphoma and lung cancer showed that 94 (oesophageal cancer), 98 (lymphoma) and 100 (lung cancer) of the top 100 disease-associated miRNAs predicted by IRFMDA were supported by the experimental data in the dbDEMC (v2.0) database. CONCLUSIONS: Cross-validation and case studies demonstrated that IRFMDA is an excellent miRNA-disease association prediction model, and can provide guidance and help for experimental studies on the regulatory mechanism of miRNAs in complex human diseases in the future.
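The GIPK similarity named in this abstract can be sketched as follows. This is a minimal illustration, assuming the commonly used form of the kernel (a Gaussian of the squared distance between 0/1 interaction profiles, with the bandwidth normalised by the mean squared profile norm); the abstract does not give the exact formula used by IRFMDA, and the function name, the `gamma_prime` parameter and the toy matrix are hypothetical.

```python
import math

def gipk_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix.

    Assumed form: K[i][j] = exp(-gamma * ||row_i - row_j||^2), where
    gamma = gamma_prime / (mean squared norm of all rows).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim

# Hypothetical toy data: 3 miRNAs x 4 diseases, 1 = known association.
toy = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
K = gipk_similarity(toy)
```

Rows with identical association profiles get similarity 1, and the value decays as the profiles diverge. In a pipeline like the one described, such a matrix would presumably be combined with semantic or functional similarity before feature construction.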


Subjects
Algorithms , Computational Biology/methods , Computer Simulation , Genetic Association Studies , Genetic Predisposition to Disease , MicroRNAs/genetics , Area Under Curve , Humans , MicroRNAs/metabolism , Neoplasms/genetics , Risk Factors
7.
J Chem Inf Model ; 59(7): 3316-3329, 2019 07 22.
Article in English | MEDLINE | ID: mdl-31140800

ABSTRACT

Solute-solvent interactions are critical for biomolecular stability and recognition. Explicit solvent molecular dynamics (MD) simulations are routinely used to probe such interactions. However, detailed analyses and interpretation of the hydration patterns seen in MD simulations can be both complex and time-consuming. A variety of approaches/tools to compute and interrogate hydration properties in structural ensembles of proteins, nucleic acids, or in general any molecule are available and are complemented here with a new and free software package ("JAL"). Central to "JAL" is an intuitive atom centric approach of computing hydration properties. In addition to the standard metrics commonly used to understand hydration, "JAL" introduces two nonstandard utilities: a program to rapidly compute buried waters in an MD trajectory and a new method to compute multiwater bridges around a solute. We demonstrate the utility of the package by probing the hydration characteristics of the tumor suppressor protein p53 and the translation initiation factor eif4E. "JAL" is hosted online and can be accessed for free at http://mspc.bii.a-star.edu.sg/minhn/jal.html .


Subjects
Tumor Suppressor Protein p53/chemistry , Water/chemistry , Models, Molecular , Molecular Dynamics Simulation , Protein Conformation , Solvents
8.
Nucleic Acids Res ; 47(4): 1637-1652, 2019 02 28.
Article in English | MEDLINE | ID: mdl-30649466

ABSTRACT

The DNA binding domain (DBD) of the tumor suppressor p53 is the site of several oncogenic mutations. A subset of these mutations lowers the unfolding temperature of the DBD. Unfolding leads to the exposure of a hydrophobic β-strand and nucleates aggregation, which results in pathologies through loss of function and dominant negative/gain of function effects. Inspired by the hypothesis that structural changes associated with events initiating unfolding in the DBD are likely to present opportunities for inhibition, we investigate the dynamics of the wild type (WT) and some aggregating mutants through extensive all-atom explicit-solvent MD simulations. Simulations reveal differential conformational sampling between the WT and the mutants of a turn region (S6-S7) that is contiguous to a known aggregation-prone region (APR). The conformational properties of the S6-S7 turn appear to be modulated by a network of interacting residues. We speculate that changes that take place in this network as a result of the mutational stress result in the events that destabilize the DBD and initiate unfolding. These perturbations also result in the emergence of a novel pocket that appears to have druggable characteristics. FDA approved drugs are computationally screened against this pocket.


Subjects
DNA-Binding Proteins/chemistry , Mutant Proteins/chemistry , Small Molecule Libraries/chemistry , Tumor Suppressor Protein p53/chemistry , DNA-Binding Proteins/genetics , Drug Evaluation, Preclinical/methods , Humans , Hydrophobic and Hydrophilic Interactions/drug effects , Models, Molecular , Molecular Dynamics Simulation , Mutant Proteins/genetics , Mutation/genetics , Protein Conformation/drug effects , Protein Domains/drug effects , Protein Domains/genetics , Protein Unfolding/drug effects , Tumor Suppressor Protein p53/genetics
9.
J Biomol Struct Dyn ; 36(16): 4366-4377, 2018 Dec.
Article in English | MEDLINE | ID: mdl-29237328

ABSTRACT

HIV polyprotein Gag is increasingly found to contribute to protease inhibitor resistance. Despite its role in viral maturation and in developing drug resistance, there remain gaps in the knowledge of the role of certain Gag subunits (e.g. p6), and that of non-cleavage mutations in drug resistance. As p6 is flexible, it poses a problem for structural experiments, and is hence often omitted in experimental Gag structural studies. Nonetheless, as p6 is an indispensable component for viral assembly and maturation, we have modeled the full length Gag structure based on several experimentally determined constraints and studied its structural dynamics. Our findings suggest that p6 can mechanistically modulate Gag conformations. In addition, the full length Gag model reveals that allosteric communication between the non-cleavage site mutations and the first Gag cleavage site could possibly result in protease drug resistance, particularly in the absence of mutations in Gag cleavage sites. Our study provides a mechanistic understanding to the structural dynamics of HIV-1 Gag, and also proposes p6 as a possible drug target in anti-HIV therapy.


Subjects
Drug Resistance, Viral/genetics , HIV Protease/genetics , HIV-1/genetics , Mutation , gag Gene Products, Human Immunodeficiency Virus/genetics , Allosteric Regulation , Binding Sites/genetics , Drug Resistance, Viral/drug effects , HIV Infections/prevention & control , HIV Infections/virology , HIV Protease/chemistry , HIV Protease/metabolism , HIV Protease Inhibitors/chemistry , HIV Protease Inhibitors/metabolism , HIV Protease Inhibitors/pharmacology , HIV-1/drug effects , HIV-1/physiology , Humans , Models, Molecular , Protein Binding , Protein Conformation , gag Gene Products, Human Immunodeficiency Virus/antagonists & inhibitors , gag Gene Products, Human Immunodeficiency Virus/chemistry , gag Gene Products, Human Immunodeficiency Virus/metabolism
10.
PLoS One ; 11(10): e0165049, 2016.
Article in English | MEDLINE | ID: mdl-27764199

ABSTRACT

Extracellular signals are captured and transmitted by signaling proteins inside a cell. An important type of cellular response to these signals is the cell fate decision, e.g., apoptosis. However, the underlying mechanisms of cell fate regulation are still unclear, so comprehensive and detailed kinetic models are not yet available. Alternatively, data-driven models are promising for bridging signaling data with the phenotypic measurements of cell fates. The traditional linear model for data-driven modeling of signaling pathways has its limitations because it assumes that a cell fate is proportional to the activities of signaling proteins, which is unlikely in complex biological systems. Therefore, we propose a power-law model to relate the activities of all the measured signaling proteins to the probabilities of cell fates. In our experiments, we compared our nonlinear power-law model with the linear model on three cancer datasets with phosphoproteomics and cell fate measurements, which demonstrated that the nonlinear model has superior performance in cell fate prediction. By in silico simulation of virtual protein knock-down, the proposed model is able to reveal drug effects, which can complement traditional approaches such as binding affinity analysis. Moreover, our model is able to capture cell line specific information to distinguish one cell line from another in cell fate prediction. Our results show that the power-law data-driven model is able to perform better in cell fate prediction and provide more insights into the signaling pathways for cancer cell fates than the linear model.
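To make the contrast between a linear and a power-law relation concrete, the sketch below fits y = c * x^a for a single protein by ordinary least squares in log-log space. It is an illustration only: the abstract does not state the exact functional form or fitting procedure of the proposed model, which relates all measured proteins to fate probabilities, and the function name and synthetic readings here are hypothetical.

```python
import math

def fit_power_law(x, y):
    """Least-squares fit of y = c * x**a via the line log y = log c + a * log x.

    All values in x and y must be strictly positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx               # exponent = slope in log-log space
    c = math.exp(my - a * mx)   # prefactor = exp(intercept)
    return c, a

# Hypothetical synthetic readings: activity of one signaling protein
# versus the fraction of cells undergoing a given fate.
activity = [0.5, 1.0, 2.0, 4.0]
fate_prob = [0.05 * v ** 1.5 for v in activity]
c, a = fit_power_law(activity, fate_prob)
```

An exponent different from 1 is exactly what a proportional (linear) model cannot express. With several proteins, the same log transform would turn a product of powers into a multiple linear regression.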


Subjects
Neoplasms/pathology , Phosphoproteins/metabolism , Proteomics/methods , Cell Differentiation , Computer Simulation , Humans , Models, Biological , Neoplasms/metabolism , Nonlinear Dynamics , Signal Transduction
11.
BMC Cancer ; 15: 828, 2015 Oct 31.
Article in English | MEDLINE | ID: mdl-26520397

ABSTRACT

BACKGROUND: Despite advances in therapeutics, outcomes for hepatocellular carcinoma (HCC) remain poor and there is an urgent need for efficacious systemic therapy. Unfortunately, drugs that are successful in preclinical studies often fail in the clinical setting, and we hypothesize that this is due to functional differences between primary tumors and commonly used preclinical models. In this study, we attempt to test this hypothesis by comparing tumor morphology and gene expression profiles between primary tumors, xenografts and HCC cell lines. METHODS: HepG2 cells and tumor cells from patient tumor explants were injected subcutaneously (ectopically) into the flank and orthotopically into the liver parenchyma of Mus musculus SCID mice. The mice were euthanized after two weeks. RNA was extracted from the tumors, and gene expression profiling was performed using the GeneChip Human Genome U133 Plus 2.0. Principal component analyses (PCA) and construction of dendrograms were conducted using Partek Genomics Suite. RESULTS: PCA showed that the commonly used HepG2 cell line model and its xenograft counterparts were vastly different from all fresh primary tumors. Expression profiles of primary tumors were also significantly divergent from their counterpart patient-derived xenograft (PDX) models, regardless of the site of implantation. Xenografts from the same primary tumors were more likely to cluster together regardless of site of implantation, although heat maps showed distinct differences in gene expression profiles between orthotopic and ectopic models. CONCLUSIONS: The data presented here challenge the utility of routinely used preclinical models. Models using HepG2 were vastly different from primary tumors and PDXs, suggesting that this model is not clinically representative.
Surprisingly, site of implantation (orthotopic versus ectopic) resulted in limited impact on gene expression profiles, and in both scenarios xenografts differed significantly from the original primary tumors, challenging the long-held notion that orthotopic PDX model is the gold standard preclinical model for HCC.


Subjects
Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/pathology , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Liver Neoplasms/genetics , Liver Neoplasms/pathology , Transcriptome , Animals , Cluster Analysis , Computational Biology/methods , Disease Models, Animal , Hep G2 Cells , Heterografts , Humans , Mice
12.
J Bioinform Comput Biol ; 13(3): 1541002, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25669329

ABSTRACT

A major goal of personalized anti-cancer therapy is to increase the drug effects while reducing the side effects as much as possible. A novel therapeutic strategy called synthetic lethality (SL) provides a great opportunity to achieve this goal. SL arises if mutations of both genes lead to cell death while mutation of either single gene does not. Hence, the SL partner of a gene mutated only in cancer cells could be a promising drug target, and the identification of SL pairs of genes is of great significance to the pharmaceutical industry. In this paper, we propose a hybridized method to predict SL pairs of genes. We combine a data-driven model with knowledge of signalling pathways to simulate the influence of single-gene knock-down and double-gene knock-down on cell death. A pair of genes is considered an SL candidate when double knock-down increases the probability of cell death significantly, but single knock-down does not. The single gene knock-down is confirmed according to the human essential genes database. Our validation against the literature shows that the predicted SL candidates agree well with wet-lab experiments. A few novel reliable SL candidates are also predicted by our model.
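The selection rule stated in this abstract can be written as a small predicate. This is a sketch under stated assumptions: the abstract does not say how "significantly" is quantified, so the fixed `min_gain` threshold, the function name and the example probabilities are all hypothetical stand-ins for whatever the simulated knock-down model and significance test actually produce.

```python
def is_sl_candidate(p_base, p_single_a, p_single_b, p_double, min_gain=0.2):
    """Return True if only the double knock-down raises cell-death probability.

    p_base      -- predicted death probability with no knock-down
    p_single_a  -- predicted probability with gene A knocked down
    p_single_b  -- predicted probability with gene B knocked down
    p_double    -- predicted probability with both genes knocked down
    min_gain    -- hypothetical threshold standing in for "significantly"
    """
    single_a_gain = p_single_a - p_base
    single_b_gain = p_single_b - p_base
    double_gain = p_double - p_base
    return (double_gain >= min_gain
            and single_a_gain < min_gain
            and single_b_gain < min_gain)

# Hypothetical predictions for one gene pair.
print(is_sl_candidate(0.10, 0.12, 0.15, 0.60))
```

A pair in which one single knock-down already causes a large increase is rejected, which matches the idea that such a gene behaves like an essential gene and not like an SL partner.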


Subjects
Gene Knockdown Techniques/methods , Models, Genetic , Neoplasms/genetics , Precision Medicine/methods , Algorithms , Cell Death/genetics , Computer Simulation , Genes, Essential , Humans , Neoplasms/pathology , Signal Transduction/genetics
13.
Cancer Inform ; 13(Suppl 3): 71-80, 2014.
Article in English | MEDLINE | ID: mdl-25452682

ABSTRACT

A major goal in cancer medicine is to find selective drugs with reduced side effects. A pair of genes is called synthetic lethal (SL) if mutations of both genes kill a cell while mutation of either gene alone does not. Hence, a gene in an SL interaction with a cancer-specific mutated gene will be a promising drug target with anti-cancer selectivity. Wet-lab screening is still so costly that even for yeast only a small fraction of gene pairs has been covered. Computational methods are therefore important for large-scale discovery of SL interactions. Most existing approaches focus on individual features or machine-learning methods, which are prone to noise or overfitting. In this paper, we propose an approach named MetaSL for predicting yeast SL, which integrates 17 genomic and proteomic features and the outputs of 10 classification methods. MetaSL thus combines the strengths of existing methods and achieves the highest area under the Receiver Operating Characteristic (ROC) curve (AUC) of 87.1% among all competitors on yeast data. Moreover, through orthologous mapping from yeast to human genes, we then predicted several lists of candidate SL pairs in human cancer. Our method and predictions would thus shed light on mechanisms of SL and lead to discovery of novel anti-cancer drugs. In addition, all the experimental results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/MetaSL.

14.
BMC Syst Biol ; 8: 107, 2014 Sep 11.
Article in English | MEDLINE | ID: mdl-25208583

ABSTRACT

BACKGROUND: The regulatory mechanism of recombination is one of the most fundamental problems in genomics, with wide applications in genome wide association studies (GWAS), birth-defect diseases, molecular evolution, cancer research, etc. Recombination events cluster into short genomic regions called "recombination hotspots". Recently, a zinc finger protein PRDM9 was reported to regulate recombination hotspots in human and mouse genomes. In addition, a 13-mer motif contained in the binding sites of PRDM9 is found to be enriched in human hotspots. However, this 13-mer motif only covers a fraction of hotspots, indicating that PRDM9 is not the only regulator of recombination hotspots. Therefore, the challenge of discovering other regulators of recombination hotspots becomes significant. Furthermore, recombination is a complex process. Hence, multiple proteins acting as machinery, rather than individual proteins, are more likely to carry out this process in a precise and stable manner. Therefore, the extension of the prediction of individual trans-regulators to protein complexes is also highly desired. RESULTS: In this paper, we introduce a pipeline to identify genes and protein complexes associated with recombination hotspots. First, we prioritize proteins associated with hotspots based on their preference of binding to hotspots and coldspots. Second, using the above identified genes as seeds, we apply the Random Walk with Restart algorithm (RWR) to propagate their influences to other proteins in protein-protein interaction (PPI) networks. Hence, many proteins without DNA-binding information will also be assigned a score to implicate their roles in recombination hotspots. Third, we construct sub-PPI networks induced by top genes ranked by RWR for various species (e.g., yeast, human and mouse) and detect protein complexes in those sub-PPI networks. 
CONCLUSIONS: The GO term analysis shows that our prioritizing methods and the RWR algorithm are capable of identifying novel genes associated with recombination hotspots. The trans-regulators predicted by our pipeline are enriched with epigenetic functions (e.g., histone modifications), demonstrating the epigenetic regulatory mechanisms of recombination hotspots. The identified protein complexes also provide us with candidates to further investigate the molecular machineries for recombination hotspots. Moreover, the experimental data and results are available on our web site http://www.ntu.edu.sg/home/zhengjie/data/RecombinationHotspot/NetPipe/.
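The Random Walk with Restart step in the pipeline above can be illustrated with a minimal pure-Python sketch. It uses the standard iteration p_next = (1 - r) * W * p + r * p0 on a column-normalised adjacency matrix; the restart probability, tolerance, function name and toy network are assumptions chosen for illustration and are not taken from the paper.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node of a network by its proximity to a set of seed nodes.

    adj   -- symmetric 0/1 adjacency matrix as a list of lists
    seeds -- set of seed node indices (e.g. the prioritised genes)
    """
    n = len(adj)
    # Column-normalise so each node spreads its score evenly over its neighbours.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i]
               for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical toy PPI network: a path 0 - 1 - 2 - 3 with node 0 as the seed.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seeds={0})
```

Proteins with no direct DNA-binding evidence still receive a nonzero score through their neighbours, which is the property the abstract relies on when ranking genes for the sub-PPI networks.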


Subjects
Computational Biology/methods , Fungal Proteins/genetics , Fungal Proteins/metabolism , Meiosis/genetics , Recombination, Genetic , Regulatory Sequences, Nucleic Acid/genetics , Yeasts/cytology , Algorithms , Animals , Gene Expression Regulation , Gene Ontology , Humans , Mice , Odds Ratio , Protein Interaction Mapping , Yeasts/genetics , Yeasts/metabolism
15.
Genome Res ; 24(7): 1064-74, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24709820

ABSTRACT

Integrating the genotype with epigenetic marks holds the promise of better understanding the biology that underlies the complex interactions of inherited and environmental components that define the developmental origins of a range of disorders. The quality of the in utero environment significantly influences health over the lifecourse. Epigenetics, and in particular DNA methylation marks, have been postulated as a mechanism for the enduring effects of the prenatal environment. Accordingly, neonate methylomes contain molecular memory of the individual in utero experience. However, interindividual variation in methylation can also be a consequence of DNA sequence polymorphisms that result in methylation quantitative trait loci (methQTLs) and, potentially, the interaction between fixed genetic variation and environmental influences. We surveyed the genotypes and DNA methylomes of 237 neonates and found 1423 punctuate regions of the methylome that were highly variable across individuals, termed variably methylated regions (VMRs), against a backdrop of homogeneity. MethQTLs were readily detected in neonatal methylomes, and genotype alone best explained ∼25% of the VMRs. We found that the best explanation for 75% of VMRs was the interaction of genotype with different in utero environments, including maternal smoking, maternal depression, maternal BMI, infant birth weight, gestational age, and birth order. Our study sheds new light on the complex relationship between biological inheritance as represented by genotype and individual prenatal experience and suggests the importance of considering both fixed genetic variation and environmental factors in interpreting epigenetic variation.


Subjects
DNA Methylation , Environment , Epigenesis, Genetic , Gene-Environment Interaction , Genetic Heterogeneity , Genotype , Transcriptome , Computational Biology/methods , CpG Islands , Epigenomics/methods , Female , Humans , Infant, Newborn , Male , Polymorphism, Single Nucleotide , Pregnancy , Quantitative Trait Loci , Risk Factors
16.
Int J Data Min Bioinform ; 7(4): 416-35, 2013.
Article in English | MEDLINE | ID: mdl-23798225

ABSTRACT

Pre-processing algorithms (PPA) and gene-selection methods (GSM) are commonly employed to select Differentially Expressed Genes (DEGs) from microarray data. Previous studies established that different combinations of PPAs and GSMs are intrinsically different in their performance to select biologically relevant DEGs. In this study, we evaluated eight combinations of PPAs and GSMs for their ability to select DEGs for prioritising gene-networks. Although the different combinations yielded dissimilar DEG-lists, all DEG-lists selected could segregate tumour from normal. Nevertheless, the DEG-list selected significantly impacted the prioritisation of cancer-associated gene-networks; hence the initial choice of PPA and GSM is crucial for subsequent interactome investigations.


Subjects
Gene Regulatory Networks , Neoplasms/genetics , Transcriptome , Algorithms , Humans , Protein Array Analysis
17.
Proteome Sci ; 10 Suppl 1: S11, 2012 Jun 21.
Article in English | MEDLINE | ID: mdl-22759569

ABSTRACT

The regulatory mechanism of recombination is a fundamental problem in genomics, with wide applications in genome-wide association studies, birth-defect diseases, molecular evolution, cancer research, etc. In mammalian genomes, recombination events cluster into short genomic regions called "recombination hotspots". Recently, a 13-mer motif enriched in hotspots is identified as a candidate cis-regulatory element of human recombination hotspots; moreover, a zinc finger protein, PRDM9, binds to this motif and is associated with variation of recombination phenotype in human and mouse genomes, thus is a trans-acting regulator of recombination hotspots. However, this pair of cis and trans-regulators covers only a fraction of hotspots, thus other regulators of recombination hotspots remain to be discovered. In this paper, we propose an approach to predicting additional trans-regulators from DNA-binding proteins by comparing their enrichment of binding sites in hotspots. Applying this approach on newly mapped mouse hotspots genome-wide, we confirmed that PRDM9 is a major trans-regulator of hotspots. In addition, a list of top candidate trans-regulators of mouse hotspots is reported. Using GO analysis we observed that the top genes are enriched with function of histone modification, highlighting the epigenetic regulatory mechanisms of recombination hotspots.

18.
Nucleic Acids Res ; 40(2): e16, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22121227

ABSTRACT

An R-loop is a structure formed co-transcriptionally between the nascent RNA transcript and the DNA template, leaving the non-transcribed DNA strand unpaired. This structure can be involved in hyper-mutation and dsDNA breaks in mammalian immunoglobulin (Ig) genes, oncogenes and neurodegenerative disease-related genes. R-loops have not yet been studied at the genome scale. To identify R-loops, we developed a computational algorithm and mapped R-loop forming sequences (RLFS) onto 66,803 sequences defined by UCSC as 'known' genes. We found that ∼59% of these transcribed sequences contain at least one RLFS. We created R-loopDB (http://rloop.bii.a-star.edu.sg/), a database that collects all RLFS identified within over half of the human genes and links to the UCSC Genome Browser for information integration and visualisation across a variety of bioinformatics sources. We found that many oncogenes and tumour suppressors (e.g. Tp53, BRCA1, BRCA2, Kras and Ptprd) and neurodegenerative disease-related genes (e.g. ATM, Park2, Ptprd and GLDC) could be prone to significant R-loop formation. Our findings suggest that R-loops provide a novel level of RNA-DNA interactome complexity, playing key roles in gene expression control, mutagenesis, recombination, chromosomal rearrangement, alternative splicing, DNA editing and epigenetic modification. RLFSs could be used as a novel source of prospective therapeutic targets.


Subjects
DNA/chemistry , RNA/chemistry , Algorithms , Alternative Splicing , Base Pairing , Nucleic Acid Databases , Genetic Epigenesis , Genes , Human Genome , Humans , Genetic Models , Mutation , Neoplasms/genetics , Neurodegenerative Diseases/genetics , Genetic Recombination , Genetic Transcription
19.
PLoS One ; 6(7): e21502, 2011.
Article in English | MEDLINE | ID: mdl-21799737

ABSTRACT

BACKGROUND: Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the networks of phenome and interactome to elucidate gene-phenotype associations of diseases. METHODOLOGY/PRINCIPAL FINDINGS: We proposed a technique called RWPCN (Random Walker on Protein Complex Network) for predicting and prioritizing disease genes. The basis of RWPCN is a protein complex network constructed using existing human protein complexes and a protein interaction network. To prioritize candidate disease genes for the query disease phenotypes, we compute the associations between the protein complexes and the query phenotypes in their respective protein complex and phenotype networks. We tested RWPCN on predicting gene-phenotype associations using leave-one-out cross-validation; our method was observed to outperform existing approaches. We also applied RWPCN to predict novel disease genes for two representative diseases, namely breast cancer and diabetes. CONCLUSIONS/SIGNIFICANCE: Guilt-by-association prediction and prioritization of disease genes can be enhanced by fully exploiting the underlying modular organizations of both the disease phenome and the protein interactome. Our RWPCN uses a novel protein complex network as a basis for interrogating the human phenome-interactome network. As the protein complex network can capture the underlying modularity in the biological interaction networks better than simple protein interaction networks, RWPCN was found to detect and prioritize disease genes better than traditional approaches that used only protein-phenotype associations.
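
The random-walk idea behind this kind of prioritization can be illustrated with a generic random walk with restart on a small unweighted network. This is a minimal sketch, not the RWPCN implementation: the abstract gives no details of the walk, so the restart probability, the toy graph, and the function name are all assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    adj: dict mapping each node to a list of its neighbours (undirected).
    seeds: nodes known to be associated with the query phenotype.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # An isolated node keeps its own probability mass.
                nxt[u] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network A - B - C - D with A as the only seed (hypothetical data).
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In a setting like the one the abstract describes, the nodes would be protein complexes rather than single proteins, and candidates would be ranked by their final scores.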


Subjects
Computational Biology/methods , Disease/genetics , Phenotype , Protein Interaction Maps/genetics , Algorithms , Human Genome/genetics , Humans
20.
BMC Genomics ; 12 Suppl 3: S24, 2011 Nov 30.
Article in English | MEDLINE | ID: mdl-22369099

ABSTRACT

BACKGROUND: Lung cancer is the leading cause of cancer deaths in the world. The most common type of lung cancer is lung adenocarcinoma (AC). The genetic mechanisms underlying the early stages and progression of lung AC are poorly understood. There is currently no clinically applicable gene test for early diagnosis or for assessing AC aggressiveness. Among the major reasons for the lack of reliable diagnostic biomarkers are the extraordinary heterogeneity of the cancer cells, complex and poorly studied interactions of the AC cells with adjacent tissue and the immune system, gene variation across patient cohorts, measurement variability, small sample sizes and sub-optimal analytical methods. We suggest that gene expression profiling of the primary tumours and adjacent tissues (PT-AT), handled with a rational statistical and bioinformatics strategy of biomarker prediction and validation, could provide significant progress in the identification of clinical biomarkers of AC. To minimise sample-to-sample variability, repeated multivariate measurements in the same object (organ or tissue, e.g. PT-AT in lung) across patients should be designed, but prediction and validation on the genome scale with a small sample size is a great methodological challenge. RESULTS: To analyse PT-AT relationships efficiently in the statistical modelling, we propose an Extreme Class Discrimination (ECD) feature selection method that identifies a sub-set of the most discriminative variables (e.g. expressed genes). Our method consists of a paired Cross-normalization (CN) step followed by a modified sign Wilcoxon test with multivariate adjustment carried out for each variable. Using an Affymetrix U133A microarray paired dataset of 27 AC patients, we reviewed the global reprogramming of the transcriptome in human lung AC tissue versus normal lung tissue, which is associated with about 2,300 genes discriminating the tissues with 100% accuracy. Cluster analysis applied to these genes resulted in four distinct gene groups, which we classified as associated with (i) up-regulated mitotic cell cycle genes in lung AC, (ii) silenced/suppressed genes specific for normal lung tissue, (iii) cell communication and cell motility and (iv) immune system features. Genes related to mutagenesis, specific lung cancers, early stages of AC development, tumour aggressiveness, and metabolic pathway alterations and adaptations of cancer cells are strongly enriched in the AC PT-AT discriminative gene set. Two AC diagnostic biomarkers, SPP1 and CENPA, were successfully validated on an RT-PCR tissue array. The ECD method was systematically compared with several alternative methods and performed better; it was also validated by comparing the predicted gene set with a literature meta-signature. CONCLUSIONS: We developed a method that identifies and selects highly discriminative variables from high-dimensional data spaces of potential biomarkers, based on a statistical analysis of paired samples when the number of samples is small. This method provides superior selection in comparison to conventional methods and can be widely used in different applications. Our method revealed at least 2,300 patho-biologically essential genes associated with the global transcriptional reprogramming of human lung epithelial cells and lung AC aggressiveness. This gene set includes many previously published AC biomarkers, reflecting inherent disease complexity, and specifies the mechanisms of carcinogenesis in lung AC. SPP1, CENPA and many other PT-AT discriminative genes could be considered as prospective diagnostic and prognostic biomarkers of lung AC.
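
The paired-sample testing step can be illustrated with the standard Wilcoxon signed-rank statistic computed for one gene across tumour/adjacent-tissue pairs. This sketch deliberately swaps in the textbook test: the cross-normalization step and the authors' modification with multivariate adjustment are not described in the abstract and are not reproduced here. The function name and expression values are hypothetical.

```python
def signed_rank_statistic(tumour, adjacent):
    """Return (W_plus, W_minus) for paired measurements of one gene.

    Pairs with zero difference are dropped; tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [t - a for t, a in zip(tumour, adjacent) if t != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical log-expression values for one gene in four patients.
tumour = [5.0, 7.0, 6.0, 9.0]
adjacent = [4.0, 5.0, 6.5, 5.0]
print(signed_rank_statistic(tumour, adjacent))
```

A gene whose rank sum falls almost entirely on one side is consistently higher (or lower) in tumour than in adjacent tissue, which is the kind of variable a discriminative feature selection would retain.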


Subjects
Adenocarcinoma/genetics , Computational Biology/methods , Lung Neoplasms/genetics , Lung/metabolism , Adenocarcinoma/diagnosis , Algorithms , Tumor Biomarkers/analysis , Tumor Biomarkers/genetics , Cluster Analysis , Factual Databases , Discriminant Analysis , Humans , Lung Neoplasms/diagnosis , Oligonucleotide Array Sequence Analysis , Prognosis