1.
PLoS Comput Biol ; 17(11): e1009161, 2021 11.
Article in English | MEDLINE | ID: mdl-34762640

ABSTRACT

Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, protein function prediction, inferring condition-specifically altered sub-networks, and prioritizing disease genes. Despite the popularity of network propagation, there is a lack of comparative analyses of different algorithms on real data and little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for the identification of optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a 'topology bias' caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that minimizing the bias-variance tradeoff can be utilized for selecting optimal parameters. The application to real multi-omics data demonstrated that optimal parameters could also be obtained by either maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we exemplified the utility and robustness of network propagation on multi-omics datasets for identifying ageing-associated genes in brain and liver tissues of rats and for elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and it presents strategies for how to use network propagation algorithms to optimally address a specific research question at hand.
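To make the idea of propagating input scores across connected nodes concrete, here is a minimal sketch of one common variant, random walk with restart on a column-normalized adjacency matrix. The abstract does not say which normalizations or propagation methods were compared, so the function names, the `restart` parameter, and the toy graph are illustrative assumptions rather than the authors' implementation.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix so that it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate(adj, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p <- (1 - restart) * W p + restart * p0.

    `restart` is a hypothetical name for the probability of jumping back to
    the input scores; smaller values spread the signal further.
    """
    total = float(sum(seed_scores))
    if total <= 0:
        raise ValueError("seed_scores must contain positive mass")
    if not 0 < restart <= 1:
        raise ValueError("restart must lie in (0, 1]")
    w = column_normalize(adj)
    n = len(seed_scores)
    p0 = [s / total for s in seed_scores]
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 with all input signal placed on node 0.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
smoothed = propagate(path_graph, [1.0, 0.0, 0.0])
```

In a setting like the one described, `restart` would be the parameter to tune, for example by scanning a grid of values and keeping the one that maximizes agreement between replicates or between omics layers.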


Subjects
Algorithms, Computational Biology/methods, Aging/genetics, Aging/metabolism, Animals, Bias, Brain/metabolism, Computational Biology/statistics & numerical data, Statistical Data Interpretation, Disease Progression, Gene Expression Profiling/statistics & numerical data, Gene Regulatory Networks, Genomics/statistics & numerical data, Humans, Liver/metabolism, Male, Prostatic Neoplasms/etiology, Prostatic Neoplasms/genetics, Prostatic Neoplasms/metabolism, Protein Interaction Maps, Proteomics/statistics & numerical data, Messenger RNA/genetics, Messenger RNA/metabolism, Rats, Systems Biology
2.
Medicine (Baltimore) ; 100(37): e27257, 2021 Sep 17.
Article in English | MEDLINE | ID: mdl-34664875

ABSTRACT

Nasopharyngeal carcinoma (NPC) is one of the most prevalent head and neck cancers in Southeast Asia. Further studies on the mechanisms of NPC occurrence and development are needed. In this study, we analyzed the microarray datasets GSE12452 and GSE53819, comprising 28 normal samples and 49 nasopharyngeal carcinoma samples, downloaded from the Gene Expression Omnibus (GEO). R software, STRING, CMap, and various databases were used to screen differentially expressed genes (DEGs), construct the protein-protein interaction (PPI) network, and perform small-molecule compound analysis, among other tasks. In total, 424 DEGs were selected from the datasets. DEGs were mainly enriched in extracellular matrix organization, cilium organization, PI3K-Akt signaling pathway, collagen-containing extracellular matrix, and extracellular matrix-receptor interaction, among others. The top 10 upregulated and top 10 downregulated hub genes were identified as hub DEGs. Piperlongumine, apigenin, menadione, 1,4-chrysenequinone, and chrysin were identified as potential drugs to prevent and treat NPC. In addition, the effect of the genes CDK1, CDC45, RSPH4A, and ZMYND10 on survival in NPC was validated in the GEPIA database. The data revealed novel aberrantly expressed genes and pathways in NPC by bioinformatics analysis, potentially providing novel insights into the molecular mechanisms governing NPC progression. Although further studies are needed, the results suggest that the expression levels of CDK1, CDC45, RSPH4A, and ZMYND10 probably affect the survival of NPC patients.


Subjects
Computational Biology/methods, Nasopharyngeal Neoplasms/genetics, Bibliometrics, Computational Biology/statistics & numerical data, Gene Expression Profiling/instrumentation, Gene Expression Profiling/methods, Humans, Nasopharyngeal Neoplasms/pathology
3.
Medicine (Baltimore) ; 100(24): e26271, 2021 Jun 18.
Article in English | MEDLINE | ID: mdl-34128858

ABSTRACT

BACKGROUND: Thymic epithelial tumors (TETs), originating from the thymic epithelial cells, are the most common primary neoplasms of the anterior mediastinum. Emerging evidence has demonstrated that competing endogenous RNAs (ceRNAs) exert a crucial effect on tumor development. Hence, it is urgent to understand the regulatory mechanism of ceRNAs in TETs and its impact on tumor prognosis. METHODS: TET datasets were obtained from UCSC Xena as the training cohort, and differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs), and miRNAs (DEmiRNAs) across pathologic types (A, AB, B, and TC) were identified with the DESeq2 package. The clusterProfiler package was used to carry out Gene Ontology and Kyoto Encyclopedia of Genes and Genomes functional analysis on the DEmRNAs. Subsequently, the lncRNA-miRNA-mRNA regulatory network was constructed to screen the key DEmRNAs. After the key DEmRNAs were verified in an external cohort from the Gene Expression Omnibus database, their associated ceRNA modules were subjected to Kaplan-Meier (K-M) and Cox regression analysis to establish their prognostic significance for TETs. Lastly, this prognostic significance was validated by receiver operating characteristic (ROC) curves and the area under the curve. RESULTS: A total of 463 DEmRNAs, 87 DElncRNAs, and 20 DEmiRNAs were obtained from the intersection of differentially expressed genes across the pathological types of TETs. Functional enrichment analysis showed that the DEmRNAs were closely related to cell proliferation and tumor development. After lncRNA-miRNA-mRNA network construction and external cohort validation, 4 DEmRNAs (DOCK11, MCAM, MYO10, and WASF3) were identified, and their associated ceRNA modules were significantly associated with prognosis. These modules contained 3 lncRNAs (LINC00665, NR2F1-AS1, and RP11-285A1.1), 4 mRNAs (DOCK11, MCAM, MYO10, and WASF3), and 4 miRNAs (hsa-mir-143, hsa-mir-141, hsa-mir-140, and hsa-mir-3199). Meanwhile, ROC curves verified the predictive accuracy of the screened ceRNA modules for the prognosis of TETs. CONCLUSION: Our study revealed that ceRNA modules might play a crucial role in the progression of TETs. The mRNA-associated ceRNA modules could effectively predict the prognosis of TETs and might serve as potential prognostic and therapeutic markers for TET patients.
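The validation step relies on the area under the ROC curve. As a reminder of what that number measures, the sketch below computes it directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and risk scores are made-up toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome labels (1 = event) and model risk scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; the quadratic loop is fine for small cohorts but would be replaced by a rank-based formula for large ones.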


Subjects
Computational Biology/statistics & numerical data, MicroRNAs/analysis, Glandular and Epithelial Neoplasms/genetics, Long Noncoding RNA/analysis, Messenger RNA/analysis, Thymus Neoplasms/genetics, Tumor Biomarkers/genetics, Cohort Studies, Computational Biology/methods, Datasets as Topic, Disease Progression, Neoplastic Gene Expression Regulation/genetics, Gene Ontology, Gene Regulatory Networks, Humans, Predictive Value of Tests, Prognosis, Proportional Hazards Models, ROC Curve
4.
Sci Rep ; 11(1): 5146, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33664338

ABSTRACT

Multi-modal molecular profiling data in bulk tumors or single cells are accumulating at a fast pace. There is a great need for developing statistical and computational methods to reveal molecular structures in complex data types toward biological discoveries. Here, we introduce Nebula, a novel Bayesian integrative clustering analysis for high dimensional multi-modal molecular data to identify directly interpretable clusters and associated biomarkers in a unified and biologically plausible framework. To facilitate computational efficiency, a variational Bayes approach is developed to approximate the joint posterior distribution to achieve model inference in high-dimensional settings. We describe a pan-cancer data analysis of genomic, epigenomic, and transcriptomic alterations in close to 9000 tumor samples across canonical oncogenic signaling pathways, immune and stemness phenotype, with comparisons to state-of-the-art clustering methods. We demonstrate that Nebula has the unique advantage of revealing patterns on the basis of shared pathway alterations, offering biological and clinical insights beyond tumor type and histology in the pan-cancer analysis setting. We also illustrate the utility of Nebula in single cell data for immune cell decomposition in peripheral blood samples.


Subjects
Carcinogenesis/genetics, Computational Biology/statistics & numerical data, Genomics/statistics & numerical data, Neoplasms/genetics, Bayes Theorem, Cluster Analysis, Epigenomics, Humans, Statistical Models, Neoplasms/pathology, Transcriptome/genetics
5.
Ann N Y Acad Sci ; 1493(1): 3-28, 2021 06.
Article in English | MEDLINE | ID: mdl-33410160

ABSTRACT

Translational medicine describes a bench-to-bedside approach that eventually converts findings from basic scientific studies into real-world clinical research. It encompasses new treatments, advanced equipment, medical procedures, preventive and diagnostic approaches creating a bridge between basic studies and clinical research. Despite considerable investment in basic science, improvements in technology, and increased knowledge of the biology of human disease, translation of laboratory findings into substantial therapeutic progress has been slower than expected, and the return on investment has been limited in terms of clinical efficacy. In this review, we provide a fresh perspective on some experimental and computational approaches for translational medicine. We cover the analysis, visualization, and modeling of high-dimensional data, with a focus on single-cell technologies, sequence, and structure analysis. Current challenges, limitations, and future directions, with examples from cancer and fibrotic disease, will be discussed.


Subjects
Big Data, Translational Biomedical Research/methods, Computational Biology/methods, Computational Biology/statistics & numerical data, Computer Simulation, Data Mining, Epigenome, Female, Fibrosis/diagnosis, Fibrosis/therapy, Gene Expression Profiling/methods, Gene Expression Profiling/statistics & numerical data, Genome-Wide Association Study/methods, Genome-Wide Association Study/statistics & numerical data, Humans, Machine Learning, Male, Neoplasms/diagnosis, Neoplasms/etiology, Neoplasms/therapy, Prostatic Neoplasms/diagnosis, Prostatic Neoplasms/etiology, Prostatic Neoplasms/therapy, Proteome, Single-Cell Analysis/methods, Single-Cell Analysis/statistics & numerical data, Translational Biomedical Research/statistics & numerical data, Whole Genome Sequencing/methods, Whole Genome Sequencing/statistics & numerical data
6.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with the addition of 65 351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subjects
Computational Biology/statistics & numerical data, Protein Databases/statistics & numerical data, Protein Domains, Proteins/chemistry, Amino Acid Sequence, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/virology, Computational Biology/methods, Epidemics, Humans, Internet, Molecular Sequence Annotation, Proteins/genetics, Proteins/metabolism, SARS-CoV-2/genetics, SARS-CoV-2/metabolism, SARS-CoV-2/physiology, Protein Sequence Analysis/methods, Amino Acid Sequence Homology, Viral Proteins/chemistry, Viral Proteins/genetics, Viral Proteins/metabolism
7.
Nucleic Acids Res ; 49(D1): D261-D265, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33137182

ABSTRACT

ADP-ribosylation is a protein modification responsible for biological processes such as DNA repair, RNA regulation, cell cycle and biomolecular condensate formation. Dysregulation of ADP-ribosylation is implicated in cancer, neurodegeneration and viral infection. We developed ADPriboDB (adpribodb.leunglab.org) to facilitate studies in uncovering insights into the mechanisms and biological significance of ADP-ribosylation. ADPriboDB 2.0 serves as a one-stop repository comprising 48 346 entries and 9097 ADP-ribosylated proteins, of which 6708 were newly identified since the original database release. In this updated version, we provide information regarding the sites of ADP-ribosylation in 32 946 entries. The wealth of information allows us to interrogate existing databases or newly available data. For example, we found that ADP-ribosylated substrates are significantly associated with the recently identified human protein interaction networks associated with SARS-CoV-2, which encodes a conserved protein domain called macrodomain that binds and removes ADP-ribosylation. In addition, we create a new interactive tool to visualize the local context of ADP-ribosylation, such as structural and functional features as well as other post-translational modifications (e.g. phosphorylation, methylation and ubiquitination). This information provides opportunities to explore the biology of ADP-ribosylation and generate new hypotheses for experimental testing.


Subjects
Adenosine Diphosphate Ribose/metabolism, Computational Biology/statistics & numerical data, Protein Databases/statistics & numerical data, Proteins/metabolism, ADP-Ribosylation, Binding Sites, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/virology, Computational Biology/methods, Humans, Protein Domains, Post-Translational Protein Processing, Proteins/chemistry, SARS-CoV-2/metabolism, SARS-CoV-2/physiology, Viral Proteins/chemistry, Viral Proteins/metabolism
8.
Medicine (Baltimore) ; 99(32): e21702, 2020 Aug 07.
Article in English | MEDLINE | ID: mdl-32769939

ABSTRACT

Hepatocellular carcinoma (HCC) is a malignant tumor with an unsatisfactory prognosis. Abnormal gene expression is significantly associated with the initiation and poor prognosis of HCC. The aim of the present study was to identify molecular biomarkers related to the initiation and development of HCC via bioinformatics analysis, so as to provide a molecular basis for individualized treatment of hepatocellular carcinoma. Three datasets (GSE101685, GSE112790, and GSE121248) from the GEO database were used for the bioinformatics analysis. Differentially expressed genes (DEGs) of HCC and normal liver samples were obtained using GEO2R online tools. Gene Ontology term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted via the Database for Annotation, Visualization, and Integrated Discovery online bioinformatics tool. The protein-protein interaction (PPI) network was constructed with the Search Tool for the Retrieval of Interacting Genes database, and hub genes were visualized with Cytoscape. Survival analysis and RNA sequencing expression analysis were conducted with UALCAN and Gene Expression Profiling Interactive Analysis. A total of 115 shared DEGs were identified, including 30 upregulated genes and 85 downregulated genes in HCC samples. The P53 signaling pathway and cell cycle were the major enriched pathways for the upregulated DEGs, whereas metabolism-related pathways were the major enriched pathways for the downregulated DEGs. The PPI network was established with 105 nodes and 249 edges, and 3 significant modules were identified via molecular complex detection. Additionally, 17 candidate genes from these 3 modules were significantly correlated with HCC patient survival, and 15 of the 17 genes exhibited high expression levels in HCC samples. Moreover, 4 hub genes (CCNB1, CDK1, RRM2, BUB1B) were identified for further KEGG pathway reanalysis and were enriched in 2 pathways, the P53 signaling pathway and the cell cycle pathway. Overexpression of CCNB1, CDK1, RRM2, and BUB1B in HCC samples was correlated with poor survival in HCC patients; these genes could be potential therapeutic targets for HCC.
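The "shared DEGs" step amounts to intersecting per-dataset DEG lists. A minimal sketch follows, assuming each dataset has already been reduced to a mapping from gene symbol to direction of change; the dataset contents and gene labels are hypothetical placeholders, and requiring a consistent direction across datasets is an assumption, since the abstract does not state how conflicts were handled.

```python
def shared_degs(*datasets):
    """Return genes present in every dataset with the same direction of change."""
    if not datasets:
        return {}
    common = set(datasets[0])
    for d in datasets[1:]:
        common &= set(d)
    first = datasets[0]
    return {
        gene: first[gene]
        for gene in sorted(common)
        if all(d[gene] == first[gene] for d in datasets)
    }


# Hypothetical per-dataset results: gene symbol -> "up" or "down".
set_a = {"GENE1": "up", "GENE2": "down", "GENE3": "up", "GENE4": "down"}
set_b = {"GENE1": "up", "GENE2": "down", "GENE3": "down"}
set_c = {"GENE1": "up", "GENE2": "down", "GENE3": "up", "GENE5": "up"}

shared = shared_degs(set_a, set_b, set_c)
n_up = sum(1 for direction in shared.values() if direction == "up")
n_down = sum(1 for direction in shared.values() if direction == "down")
```

Counting the "up" and "down" entries of the result gives the kind of split reported in the abstract (upregulated versus downregulated shared genes).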


Subjects
Tumor Biomarkers/analysis, Hepatocellular Carcinoma/diagnosis, Computational Biology/statistics & numerical data, Mass Screening/standards, Prognosis, Hepatocellular Carcinoma/mortality, Hepatocellular Carcinoma/physiopathology, China/epidemiology, Cluster Analysis, Gene Expression/genetics, Humans, Liver Neoplasms/epidemiology, Liver Neoplasms/mortality, Liver Neoplasms/pathology, Mass Screening/methods, Protein Interaction Maps/genetics, Survival Analysis
9.
Medicine (Baltimore) ; 99(21): e20470, 2020 May 22.
Article in English | MEDLINE | ID: mdl-32481352

ABSTRACT

Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cancer. A growing number of studies have found that the occurrence of ccRCC is associated with genetic changes, but the molecular mechanism remains unclear. The present study aimed to identify enrichment trends of differentially expressed genes (DEGs) in ccRCC using a series of bioinformatics approaches, which may benefit the treatment of ccRCC and suggest directions for research. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used to obtain the enrichment trends of DEGs from GSE53757 and GSE16449. The Draw Venn Diagram tool was used to identify DEGs co-expressed in both datasets. Protein-protein interaction (PPI) analysis of the DEGs was performed using Cytoscape with the Search Tool for the Retrieval of Interacting Genes (STRING) and Molecular Complex Detection (MCODE). Kaplan-Meier Plotter analysis of the top 15 upregulated and top 15 downregulated genes was performed in Gene Expression Profiling Interactive Analysis (GEPIA). Then, the expression levels of hub genes that correlated significantly with overall survival in ccRCC patients were compared between normal renal tissue and ccRCC tissue at different pathological stages using UALCAN, based on The Cancer Genome Atlas (TCGA) database. In this study, we obtained 167 co-expressed DEGs, including 72 upregulated DEGs and 95 downregulated DEGs. We identified 11 hub genes that were significantly correlated with overall survival in ccRCC patients. Among them, KIF23, APLN, ADCY1, GREB1, TLR4, IRF8, CXCL1, and CXCL2 deserve particular attention.


Subjects
Clear Cell Adenocarcinoma/genetics, Renal Cell Carcinoma/genetics, Computational Biology/statistics & numerical data, Gene Expression Profiling/methods, Observational Studies as Topic/standards, Clear Cell Adenocarcinoma/diagnosis, Tumor Biomarkers/analysis, Tumor Biomarkers/genetics, Renal Cell Carcinoma/diagnosis, Computational Biology/methods, Factual Databases/statistics & numerical data, Epidemiology/instrumentation, Gene Expression Profiling/instrumentation, Gene Expression Profiling/statistics & numerical data, Humans, Kaplan-Meier Estimate
10.
BMC Cancer ; 20(1): 490, 2020 Jun 02.
Article in English | MEDLINE | ID: mdl-32487193

ABSTRACT

BACKGROUND: Stomach cancer (SC) is a cancer derived from the stomach mucous membrane. Because early-stage symptoms are non-specific or absent, newly diagnosed SC cases have usually reached an advanced stage and are thus difficult to cure. Therefore, in this study, we aimed to develop an integrated database of SC. METHODS: SC-related genes were identified through literature mining and by analyzing the publicly available microarray datasets. Using the RNA-seq, miRNA-seq and clinical data downloaded from The Cancer Genome Atlas (TCGA), the Kaplan-Meier (KM) survival curves for all the SC-related genes were generated and analyzed. The miRNAs (miRanda, miRTarget2, PicTar, PITA and TargetScan databases), SC-related miRNAs (HMDD and miR2Disease databases), single nucleotide polymorphisms (SNPs, dbSNP database), and SC-related SNPs (ClinVar database) were also retrieved from the indicated databases. Moreover, gene-disease (OMIM and GAD databases), copy number variation (CNV, DGV database), methylation (PubMeth database), drug (WebGestalt database), and transcription factor (TF, TRANSFAC database) analyses were performed for the differentially expressed genes (DEGs). RESULTS: In total, 9990 SC-related genes (including 8347 up-regulated genes and 1643 down-regulated genes) were identified, among which 65 genes were further confirmed as SC-related genes by enrichment analysis. In addition, 457 miRNAs, 20 SC-related miRNAs, 1570 SNPs, 108 SC-related SNPs, 419 TFs, 44,605 CNVs, 3404 drug-associated genes, 63 genes with methylation, and KM survival curves of 20,264 genes were obtained. By integrating these datasets, an integrated database of stomach cancer, designated SCDb (available at http://www.stomachcancerdb.org/), was established. CONCLUSIONS: As a comprehensive resource for human SC, the SCDb database will be very useful for future SC-related research and will thus promote the understanding of the pathogenesis of SC.
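For readers unfamiliar with how a Kaplan-Meier curve is built from follow-up data, the sketch below implements the product-limit estimator in plain Python. The follow-up times and event flags are toy values, and nothing here reflects how SCDb itself computes its curves.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time for each subject
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort: the subject at time 2 is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Comparing two such curves, for instance high versus low expression of a gene, would additionally need a test such as the log-rank test, which is not shown here.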


Subjects
Computational Biology/methods, Genetic Databases/statistics & numerical data, Datasets as Topic, Neoplastic Gene Expression Regulation, Stomach Neoplasms/genetics, Computational Biology/statistics & numerical data, Gene Regulatory Networks, Humans, Kaplan-Meier Estimate, MicroRNAs/metabolism, Oligonucleotide Array Sequence Analysis/statistics & numerical data, Single Nucleotide Polymorphism, RNA-Seq/statistics & numerical data, Stomach Neoplasms/mortality, Stomach Neoplasms/pathology
11.
Exp Cell Res ; 391(1): 111923, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32135166

ABSTRACT

Growing evidence illustrates the shortcomings in the current understanding of the full complexity of the proteome. Previously overlooked small open reading frames (sORFs) and their encoded microproteins have filled important gaps, exerting their function as biologically relevant regulators. The characterization of the full small proteome has potential applications in many fields. Continuous development of techniques and tools has led to improved sORF discovery, with candidates originating from bioinformatics analyses, sequencing routines, or proteomics approaches. In this mini review, we discuss the ongoing trends in these three fields and suggest some strategies for further characterization of high-potential candidates.


Subjects
Computational Biology/statistics & numerical data, Computer Neural Networks, Open Reading Frames, Protein Biosynthesis, Proteome/genetics, Ribosomes/genetics, Animals, Computational Biology/methods, High-Throughput Nucleotide Sequencing, Humans, Plants/genetics, Protein Sorting Signals/genetics, Proteome/classification, Proteome/metabolism, Ribosomes/classification, Ribosomes/metabolism, Software
12.
J Pak Med Assoc ; 70(3): 427-431, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32207419

ABSTRACT

OBJECTIVE: To study the orthologs of five congenital hypothyroidism genes involved in the development of the thyroid gland: NIS, PAX8, DUOX2, FOXE1, and NKX2-1. METHODS: The study was conducted at INMOL Cancer Hospital, Lahore, in September 2017. Gene orthologs, phylogenetic trees, and domains of NIS, PAX8, DUOX2, FOXE1, and NKX2-1 were studied using different bioinformatics tools, including FASTA, BLAST, ENSEMBL, UniProt, and MultiAlin, to find the important domains involved in the mutations of these genes. RESULTS: The genes showed consensus sequences / motifs involved in congenital hypothyroidism. Phylogenetic results showed that these genes shared some common motifs. Phylogenetic trees revealed sub-clusters with high protein homology. CONCLUSIONS: Genes involved in congenital hypothyroidism were found to have consensus sequence motifs.


Subjects
Congenital Hypothyroidism/genetics, Dual Oxidases/genetics, Forkhead Transcription Factors/genetics, PAX8 Transcription Factor/genetics, Symporters/genetics, Thyroid Nuclear Factor 1/genetics, Computational Biology/methods, Computational Biology/statistics & numerical data, Humans, Mutation, Phylogeny, Thyroid Gland/metabolism
13.
Curr Protein Pept Sci ; 21(11): 1044-1053, 2020.
Article in English | MEDLINE | ID: mdl-32039677

ABSTRACT

Accumulating evidence demonstrates that miRNAs serve as critical biomarkers in various complex human diseases. Thus, identifying potential miRNA-disease associations has become a hot research topic for providing a better understanding of disease pathology, including cell carcinoma, cell proliferation and mevalonate pathway. Recently, based on various biological datasets, more and more computational prediction methods have been designed to uncover disease-related miRNAs for further experimental validation. Due to the fact that different limitations exist in previous computational methods, we proposed the model of Decision Template-based MiRNA-Disease Association prediction (DTMDA) to prioritize potential related miRNAs for diseases of interest. By integrating miRNA functional similarity network, miRNA Gaussian interaction profile kernel similarity network, two disease semantic similarity networks and disease Gaussian interaction profile kernel similarity network, we trained five multi-label K nearest neighbors-based core classifiers.
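Several of the similarity networks named above are Gaussian interaction profile (GIP) kernels. The abstract gives no formula, so the sketch below assumes the commonly used definition, K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma set to a bandwidth constant divided by the mean squared norm of the profiles; the `gamma_prime` parameter name and the toy association matrix are illustrative assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary matrix.

    profiles[i] is the interaction profile of entity i, e.g. a miRNA's
    0/1 vector of known associations with each disease.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy association matrix: 3 miRNAs (rows) by 3 diseases (columns).
toy_profiles = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
toy_kernel = gip_kernel(toy_profiles)
```

Transposing the association matrix and calling the same function would give the disease-side kernel, which is how one matrix can yield both of the GIP networks mentioned in the abstract.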


Subjects
Breast Neoplasms/diagnosis, Squamous Cell Carcinoma/diagnosis, Colonic Neoplasms/diagnosis, Computational Biology/statistics & numerical data, Esophageal Neoplasms/diagnosis, Kidney Neoplasms/diagnosis, MicroRNAs/genetics, Algorithms, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Squamous Cell Carcinoma/genetics, Squamous Cell Carcinoma/metabolism, Squamous Cell Carcinoma/pathology, Tumor Cell Line, Cell Proliferation, Colonic Neoplasms/genetics, Colonic Neoplasms/metabolism, Colonic Neoplasms/pathology, Esophageal Neoplasms/genetics, Esophageal Neoplasms/metabolism, Esophageal Neoplasms/pathology, Female, Genetic Association Studies, Human Genome, Humans, Kidney Neoplasms/genetics, Kidney Neoplasms/metabolism, Kidney Neoplasms/pathology, Male, MicroRNAs/metabolism, Statistical Models, Neoplasm RNA/genetics, Neoplasm RNA/metabolism
14.
J Comput Biol ; 27(9): 1337-1340, 2020 09.
Article in English | MEDLINE | ID: mdl-31905016

ABSTRACT

The increasing availability of complex data in biology and medicine has promoted the use of machine learning in classification tasks to address important problems in translational and fundamental science. Two important obstacles, however, may limit the unraveling of the full potential of machine learning in these fields: the lack of generalization of the resulting models and the limited number of labeled data sets in some applications. To address these important problems, we developed an unsupervised ensemble algorithm called strategy for unsupervised multiple method aggregation (SUMMA). By virtue of being an ensemble method, SUMMA is more robust to generalization than the predictions it combines. By virtue of being unsupervised, SUMMA does not require labeled data. SUMMA receives as input predictions from a diversity of models and estimates their classification performance even when labeled data are unavailable. It then uses these performance estimates to combine these different predictions into an ensemble model. SUMMA can be applied to a variety of binary classification problems in bioinformatics including but not limited to gene network inference, cancer diagnostics, drug response prediction, somatic mutation, and differential expression calling. In this application note, we introduce the R/PY-SUMMA packages, available in R or Python, that implement the SUMMA algorithm.
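The general idea of weighting predictors without labels can be illustrated with a toy sketch. This is not the SUMMA algorithm and does not use the R/PY-SUMMA packages: it is a simplified stand-in that assumes agreement between rank-transformed predictions is a usable proxy for performance, and takes the leading eigenvector of their covariance matrix as the weights. All function names and data are hypothetical.

```python
import math


def to_ranks(scores):
    """Rank-transform scores (1 = lowest); ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def covariance(rows):
    """Sample covariance matrix between rows of equal length."""
    n = len(rows[0])
    means = [sum(r) / n for r in rows]
    return [
        [
            sum((ra[k] - ma) * (rb[k] - mb) for k in range(n)) / (n - 1)
            for rb, mb in zip(rows, means)
        ]
        for ra, ma in zip(rows, means)
    ]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration started from the all-ones vector."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def ensemble_scores(predictions):
    """Combine predictors using label-free weights (illustrative only)."""
    rank_rows = [to_ranks(p) for p in predictions]
    weights = leading_eigenvector(covariance(rank_rows))
    n_items = len(rank_rows[0])
    return [
        sum(w * row[k] for w, row in zip(weights, rank_rows))
        for k in range(n_items)
    ]


# Two predictors that broadly agree and one that is mostly noise.
toy_predictions = [
    [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    [0.8, 0.9, 0.6, 0.3, 0.2, 0.1],
    [0.1, 0.9, 0.2, 0.8, 0.3, 0.7],
]
combined = ensemble_scores(toy_predictions)
```

In this toy case the noisy third predictor receives a weight near zero, so the combined score still separates the first three items from the last three even though no labels were supplied.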


Subjects
Computational Biology/statistics & numerical data, Gene Regulatory Networks/genetics, Unsupervised Machine Learning/statistics & numerical data, Algorithms, Statistical Models
15.
J Cell Mol Med ; 23(12): 8381-8391, 2019 12.
Article in English | MEDLINE | ID: mdl-31576674

ABSTRACT

The eutopic endometrium has been suggested to play a crucial role in the pathogenesis of adenomyosis. However, the specific genes in eutopic endometrium responsible for the pathogenesis of adenomyosis remain to be elucidated. We aimed to identify differentially expressed genes (DEGs) and molecular pathways/networks in eutopic endometrium from adenomyosis patients and to provide new insight into disease mechanisms at the transcriptome level. RNA sequencing (RNA-Seq) was performed on 12 eutopic endometrium samples from adenomyosis and control groups. Differentially expressed genes in adenomyosis were validated by quantitative real-time PCR (qPCR) and immunochemistry. Functional annotations of the DEGs were analysed with Ingenuity Pathway Analysis (IPA). Quantitative DNA methylation analysis of CEBPB was performed with the MassArray system. A total of 373 differentially expressed genes were identified in the adenomyosis eutopic endometrium compared to matched controls. Bioinformatic analysis predicted that IL-6 signalling and ERK/MAPK signalling were activated in adenomyosis endometrium. We also found that the increased expression and DNA hypomethylation of CEBPB were associated with adenomyosis. Our results revealed key pathways and networks in eutopic endometrium of adenomyosis. The study is the first to propose the association between C/EBPβ and adenomyosis and can improve the understanding of the pathogenesis of adenomyosis.


Subjects
Adenomyosis/genetics, Endometrium/metabolism, Exome Sequencing/methods, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Transcriptome, Adenomyosis/metabolism, Adenomyosis/physiopathology, Adult, Computational Biology/methods, Computational Biology/statistics & numerical data, Endometrium/pathology, Female, Gene Regulatory Networks, Humans, Interleukin-6/genetics, Interleukin-6/metabolism, MAP Kinase Signaling System/genetics, Middle Aged, Signal Transduction/genetics
16.
J Toxicol Environ Health A ; 82(17): 935-943, 2019.
Article in English | MEDLINE | ID: mdl-31524549

ABSTRACT

MicroRNAs (miRNAs) are involved in various crucial biological processes, including regulation of cell differentiation, proliferation, and migration, and are closely associated with tumor development. This study aimed to investigate miR-130b expression levels in lung cancer patient tissues. Two Gene Expression Omnibus (GEO) datasets, GSE48414 and GSE74190, and two The Cancer Genome Atlas (TCGA) datasets, TCGA LUAD and TCGA LUSC, were accessed to obtain information for differential expression analysis and clinical-pathological correlation analysis. The results showed that miR-130b expression levels were significantly increased in lung cancer compared to normal tissues. Data also demonstrated that confounding factors such as tumor clinical stage and tumor invasion depth markedly affected miR-130b expression levels in cancer patients. A total of 169 target genes modified by miR-130b expression were identified using four online target gene prediction websites. Further enrichment analysis indicated that these 169 target genes were significantly enriched in several cancer-related biological processes and signaling pathways, including wound healing, cell proliferation, Wnt signaling, Ras signaling, and mTOR signaling. It was also of interest to examine the seven sites in the promoter region of the miR-130b encoding gene in lung cancer patients and then compare methylation at these loci with miR-130b expression. The correlation analysis between encoding gene methylation and miR-130b expression in TCGA datasets revealed that decreased methylation in the promoter region was significantly associated with elevated miR-130b expression. This phenomenon was markedly dependent upon smoking history and clinical-pathological features. In conclusion, data indicated that alterations in DNA methylation of the promoter region of the miR-130b encoding gene were associated with disturbances in miR-130b expression in lung cancer patients, suggesting that the DNA methylation process and miR-130b expression may serve as biomarkers for detection of lung cancer.


Subjects
Squamous Cell Carcinoma/genetics, Squamous Cell Carcinoma/physiopathology, DNA Methylation/genetics, Neoplastic Gene Expression Regulation, Lung Neoplasms/genetics, Lung Neoplasms/physiopathology, MicroRNAs/genetics, Computational Biology/statistics & numerical data, Humans
17.
PLoS One ; 14(8): e0221068, 2019.
Article in English | MEDLINE | ID: mdl-31437182

ABSTRACT

Clustering homologous sequences based on their similarity is a problem that appears in many bioinformatics applications. The fact that sequences cluster is ultimately the result of their phylogenetic relationships. Despite this observation and the natural ways in which a tree can define clusters, most applications of sequence clustering do not use a phylogenetic tree and instead operate on pairwise sequence distances. Due to advances in large-scale phylogenetic inference, we argue that tree-based clustering is under-utilized. We define a family of optimization problems that, given an arbitrary tree, return the minimum number of clusters such that all clusters adhere to constraints on their heterogeneity. We study three specific constraints, limiting (1) the diameter of each cluster, (2) the sum of its branch lengths, or (3) chains of pairwise distances. These three problems can be solved in time that increases linearly with the size of the tree, and for two of the three criteria, the algorithms have been known in the theoretical computer science literature. We implement these algorithms in a tool called TreeCluster, which we test on three applications: OTU clustering for microbiome data, HIV transmission clustering, and divide-and-conquer multiple sequence alignment. We show that, by using tree-based distances, TreeCluster generates more internally consistent clusters than alternatives and improves the effectiveness of downstream applications. TreeCluster is available at https://github.com/niemasd/TreeCluster.
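
As an illustration of the first constraint (a cap on cluster diameter), the following is a minimal Python sketch of a greedy bottom-up partition of a rooted tree. It is not TreeCluster's code: the function name `max_diameter_clusters`, the dictionary-based tree representation, and the example tree are assumptions made for this sketch, and the sort at each node means it does not claim the linear running time stated in the abstract.

```python
def max_diameter_clusters(tree, root, threshold):
    """Greedy bottom-up partition of the leaves of a rooted tree.

    `tree` maps each internal node to a list of (child, branch_length) pairs;
    nodes absent from the mapping are treated as leaves (hypothetical format).
    Every returned cluster has pairwise leaf-to-leaf path length <= threshold.
    """
    clusters = []

    def visit(node):
        # Returns (height, leaves) for the part of the subtree still attached to node.
        children = tree.get(node, [])
        if not children:
            return 0.0, [node]
        parts = []
        for child, length in children:
            height, leaves = visit(child)
            parts.append((height + length, leaves))
        parts.sort(key=lambda part: part[0], reverse=True)
        # While the two deepest attached parts would form a path longer than
        # the threshold, detach the deepest one as its own cluster.
        while len(parts) > 1 and parts[0][0] + parts[1][0] > threshold:
            clusters.append(sorted(parts.pop(0)[1]))
        merged = [leaf for _, leaves in parts for leaf in leaves]
        return parts[0][0], merged

    clusters.append(sorted(visit(root)[1]))
    return sorted(clusters)


# Toy tree ((A:1,B:1):1,(C:1,D:1):1), written as a dictionary.
example_tree = {
    "root": [("x", 1.0), ("y", 1.0)],
    "x": [("A", 1.0), ("B", 1.0)],
    "y": [("C", 1.0), ("D", 1.0)],
}
print(max_diameter_clusters(example_tree, "root", 2.5))
```

With a threshold of 2.5 the toy tree splits into the two sibling pairs, because A and C are 4 branch-length units apart; raising the threshold to 4 keeps all four leaves together.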


Subjects
Algorithms, Computational Biology/statistics & numerical data, HIV/genetics, Microbiota/genetics, Phylogeny, Sequence Alignment/statistics & numerical data, Base Sequence, Cluster Analysis, Computational Biology/methods, HIV/classification, HIV Infections/epidemiology, HIV Infections/transmission, HIV Infections/virology, Humans, Software
18.
BMC Genomics ; 20(1): 539, 2019 Jul 02.
Article in English | MEDLINE | ID: mdl-31266446

ABSTRACT

BACKGROUND: Long non-coding RNA (lncRNA) expression data have been increasingly used in finding diagnostic and prognostic biomarkers in cancer studies. Existing differential analysis tools for RNA sequencing do not effectively accommodate low-abundance genes, as commonly observed in lncRNAs. RESULTS: We investigated the statistical distribution of normalized counts for low-expression genes in lncRNAs and mRNAs, and proposed a new tool, lncDIFF, based on the underlying distribution pattern to detect differentially expressed (DE) lncRNAs. lncDIFF adopts the generalized linear model with zero-inflated Exponential quasi-likelihood to estimate group effect on normalized counts, and employs the likelihood ratio test to detect differentially expressed genes. The proposed method and tool are applicable to data processed with standard RNA-Seq preprocessing and normalization pipelines. Simulation results showed that lncDIFF was able to detect DE genes with more power and lower false discovery rate regardless of the data pattern, compared to DESeq2, edgeR, limma, zinbwave, DEsingle, and ShrinkBayes. In the analysis of a head and neck squamous cell carcinoma dataset, lncDIFF also appeared to have higher sensitivity in identifying novel lncRNA genes with relatively large fold change and prognostic value. CONCLUSIONS: lncDIFF is a powerful differential analysis tool for low-abundance non-coding RNA expression data. This method is compatible with various existing RNA-Seq quantification and normalization tools. lncDIFF is implemented in an R package available at https://github.com/qianli10000/lncDIFF.
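
To make the idea of a zero-inflated exponential likelihood ratio test concrete, here is a deliberately simplified Python sketch. It is not lncDIFF's method or API: it fits a plain zero-inflated exponential distribution by maximum likelihood to each group (no generalized linear model, no quasi-likelihood, no covariates), and the function names and the chi-square reference distribution with 2 degrees of freedom are assumptions of this sketch.

```python
import math


def zie_loglik(values):
    """Maximised log-likelihood of a zero-inflated exponential sample."""
    n = len(values)
    positives = [v for v in values if v > 0]
    n_pos = len(positives)
    n_zero = n - n_pos
    loglik = 0.0
    if n_zero:
        # Zeros contribute log(pi), with pi estimated as the zero fraction.
        loglik += n_zero * math.log(n_zero / n)
    if n_pos:
        rate = n_pos / sum(positives)  # MLE of the exponential rate
        loglik += n_pos * math.log(n_pos / n)  # log(1 - pi) for each positive
        loglik += n_pos * math.log(rate) - rate * sum(positives)
    return loglik


def likelihood_ratio_test(group_a, group_b):
    """Compare separate per-group fits against one pooled fit."""
    stat = 2.0 * (
        zie_loglik(group_a) + zie_loglik(group_b) - zie_loglik(group_a + group_b)
    )
    stat = max(stat, 0.0)  # guard against tiny negative rounding errors
    # Chi-square survival function with 2 degrees of freedom (assumed here
    # because both the zero fraction and the rate may differ between groups).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Made-up normalized counts for one low-abundance gene in two groups.
stat, p_value = likelihood_ratio_test([0, 0, 1.0, 2.0], [0, 10.0, 20.0, 30.0])
print(stat, p_value)
```

The closed-form p-value relies on the fact that the chi-square survival function with 2 degrees of freedom is exp(-x/2); with small samples this asymptotic approximation should be treated with caution.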


Subjects
Computational Biology/statistics & numerical data, Long Noncoding RNA/genetics, Software, Area Under Curve, Computational Biology/methods, Neoplastic Gene Expression Regulation, Head and Neck Neoplasms/genetics, Humans, Likelihood Functions, Linear Models, Genetic Models, Squamous Cell Carcinoma of Head and Neck/genetics
19.
Blood Adv ; 3(12): 1837-1847, 2019 06 25.
Article in English | MEDLINE | ID: mdl-31208955

ABSTRACT

Patients with myelodysplastic syndromes (MDS) or acute myeloid leukemia (AML) are generally older and have more comorbidities. Therefore, identifying personalized treatment options for each patient early and accurately is essential. To address this, we developed a computational biology modeling (CBM) and digital drug simulation platform that relies on somatic gene mutations and gene CNVs found in malignant cells of individual patients. Drug treatment simulations based on unique patient-specific disease networks were used to generate treatment predictions. To evaluate the accuracy of the genomics-informed computational platform, we conducted a pilot prospective clinical study (NCT02435550) enrolling confirmed MDS and AML patients. Blinded to the empirically prescribed treatment regimen for each patient, genomic data from 50 evaluable patients were analyzed by CBM to predict patient-specific treatment responses. CBM accurately predicted treatment responses in 55 of 61 (90%) simulations, with 33 of 61 true positives, 22 of 61 true negatives, 3 of 61 false positives, and 3 of 61 false negatives, resulting in a sensitivity of 94%, a specificity of 88%, and an accuracy of 90%. Laboratory validation further confirmed the accuracy of CBM-predicted activated protein networks in 17 of 19 (89%) samples from 11 patients. Somatic mutations in the TET2, IDH1/2, ASXL1, and EZH2 genes were discovered to be highly informative of MDS response to hypomethylating agents. In sum, analyses of patient cancer genomics using the CBM platform can be used to predict precision treatment responses in MDS and AML patients.
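
The performance figures above can be checked against the four reported counts using the standard confusion-matrix definitions. The short Python sketch below does this; the function name `confusion_metrics` is an assumption of this sketch. Under these definitions the counts give a specificity of 88.0% and an accuracy of 90.2%, matching the abstract, and a sensitivity of 91.7%, so the reported 94% presumably rests on a definition or denominator that the abstract does not state.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard sensitivity, specificity and accuracy from confusion counts."""
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "accuracy": (tp + tn) / total,  # correct predictions among all
    }


# Counts as reported in the abstract: 33 TP, 22 TN, 3 FP, 3 FN (61 simulations).
metrics = confusion_metrics(tp=33, tn=22, fp=3, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```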


Subjects
Computational Biology/methods, Genomics/instrumentation, Acute Myeloid Leukemia/genetics, Myelodysplastic Syndromes/genetics, Adult, Aged, Aged 80 and over, Computational Biology/statistics & numerical data, DNA Copy Number Variations/genetics, DNA Methylation/drug effects, DNA-Binding Proteins/genetics, Dioxygenases, Enhancer of Zeste Homolog 2 Protein/genetics, Female, Humans, Isocitrate Dehydrogenase/genetics, Acute Myeloid Leukemia/therapy, Male, Middle Aged, Mutation, Myelodysplastic Syndromes/therapy, Non-Randomized Controlled Trials as Topic, Precision Medicine/instrumentation, Predictive Value of Tests, Prospective Studies, Proto-Oncogene Proteins/genetics, Repressor Proteins/genetics, Sensitivity and Specificity, Transcription Factors/genetics, Treatment Outcome
20.
Gene ; 706: 188-200, 2019 Jul 20.
Article in English | MEDLINE | ID: mdl-31085273

ABSTRACT

Due to the rapid development of DNA microarray technology, large amounts of microarray data have been generated, and classifying these data has proven useful for cancer diagnosis, treatment and prevention. However, microarray data classification is still a challenging task, since gene expression data often contain a huge number of genes but a small number of samples. As a result, a computational method for reducing the dimension of microarray data is necessary. In this paper, we introduce a computational gene selection model for microarray data classification via adaptive hypergraph embedded dictionary learning (AHEDL). Specifically, a dictionary is learned from the feature space of the original high-dimensional microarray data, and this learned dictionary is used to represent the original genes with a reconstruction coefficient matrix. Then we use ℓ2,1-norm regularization to impose row sparsity on the coefficient matrix for selecting discriminative genes. Meanwhile, in order to capture the local manifold geometrical structure of the original microarray data in a high-order manner, a hypergraph is adaptively learned and embedded into the model. An iterative updating algorithm is designed for solving the optimization problem. To validate the efficacy of the proposed model, we conducted experiments on six publicly available microarray data sets, and the results demonstrate that AHEDL outperforms other state-of-the-art methods in terms of microarray data classification.
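
The row-sparsity step can be illustrated in isolation. The ℓ2,1 norm of a matrix is the sum of the Euclidean norms of its rows, so penalizing it pushes entire rows toward zero, and the rows that keep a large norm mark the selected features. The Python sketch below shows only this scoring step, not the AHEDL optimization; the function names, the assumption that each row corresponds to one gene, and the toy coefficient matrix are all illustrative.

```python
import math


def row_norms(matrix):
    """Euclidean (l2) norm of every row."""
    return [math.sqrt(sum(x * x for x in row)) for row in matrix]


def l21_norm(matrix):
    """l2,1 norm: the sum of the row-wise l2 norms."""
    return sum(row_norms(matrix))


def select_rows(matrix, k):
    """Indices of the k rows with the largest l2 norm, largest first."""
    norms = row_norms(matrix)
    ranked = sorted(range(len(matrix)), key=lambda i: norms[i], reverse=True)
    return ranked[:k]


# Toy coefficient matrix: one row per (hypothetical) gene.
coefficients = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0: this gene is effectively dropped
    [1.0, 0.0],  # row norm 1
]
print(l21_norm(coefficients))        # 6.0
print(select_rows(coefficients, 2))  # [0, 2]
```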


Subjects
Computational Biology/methods, Microarray Analysis/methods, Algorithms, Big Data, Computational Biology/statistics & numerical data, Data Analysis, Humans, Microarray Analysis/statistics & numerical data