Results 1 - 17 of 17
1.
Bioinform Adv ; 4(1): vbae009, 2024.
Article in English | MEDLINE | ID: mdl-38736682

ABSTRACT

Motivation: Post-market unexpected Adverse Drug Reactions (ADRs) are associated with significant costs, in both financial burden and human health. Due to the high cost and time required to run clinical trials, there is significant interest in accurate computational methods that can aid in the prediction of ADRs for new drugs. As a machine learning task, ADR prediction is made more challenging due to a high degree of class imbalance and existing methods do not successfully balance the requirement to detect the minority cases (true positives for ADR), as measured by the Area Under the Precision-Recall (AUPR) curve with the ability to separate true positives from true negatives [as measured by the Area Under the Receiver Operating Characteristic (AUROC) curve]. Surprisingly, the performance of most existing methods is worse than a naïve method that attributes ADRs to drugs according to the frequency with which the ADR has been observed over all other drugs. The existing advanced methods applied do not lead to substantial gains in predictive performance. Results: We designed a rigorous evaluation to provide an unbiased estimate of the performance of ADR prediction methods: Nested Cross-Validation and a hold-out set were adopted. Among the existing methods, Kernel Regression (KR) performed best in AUPR but had a disadvantage in AUROC, relative to other methods, including the naïve method. We proposed a novel method that combines non-negative matrix factorization with kernel regression, called VKR. This novel approach matched or exceeded the performance of existing methods, overcoming the weakness of the existing methods. Availability: Code and data are available on https://github.com/YezhaoZhong/VKR.
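
The naïve baseline mentioned in the abstract above (attributing an ADR to a drug according to how often that ADR has been observed over all other drugs) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `naive_adr_scores`, the binary drug-by-ADR matrix layout, and the toy values are all assumptions made for the example.

```python
def naive_adr_scores(matrix, drug_index):
    """Score every ADR for one drug by its frequency over all OTHER drugs.

    `matrix` is a list of rows, one per drug; entry 1 means the ADR in that
    column has been observed for the drug, 0 means it has not (assumed layout).
    """
    others = [row for i, row in enumerate(matrix) if i != drug_index]
    n_adrs = len(matrix[0])
    # Fraction of the other drugs for which each ADR has been observed.
    return [sum(row[j] for row in others) / len(others) for j in range(n_adrs)]


# Toy drug-by-ADR matrix with made-up values (4 drugs, 3 ADRs).
toy = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],  # treated as the "new" drug to score
]
scores = naive_adr_scores(toy, drug_index=3)
print(scores)
```

Because the score ignores every property of the drug itself, all new drugs receive the same ranking of ADRs, which is what makes it a useful lower bar for the learned methods.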

2.
Neural Netw ; 156: 205-217, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36274527

ABSTRACT

The scarcity of high-quality annotations in many application scenarios has recently led to an increasing interest in devising learning techniques that combine unlabeled data with labeled data in a network. In this work, we focus on the label propagation problem in multilayer networks. Our approach is inspired by the heat diffusion model, which shows usefulness in machine learning problems such as classification and dimensionality reduction. We propose a novel boundary-based heat diffusion algorithm that guarantees a closed-form solution with an efficient implementation. We experimentally validated our method on synthetic networks and five real-world multilayer network datasets representing scientific coauthorship, spreading drug adoption among physicians, two bibliographic networks, and a movie network. The results demonstrate the benefits of the proposed algorithm, where our boundary-based heat diffusion dominates the performance of the state-of-the-art methods.


Subjects
Hot Temperature, Supervised Machine Learning, Algorithms, Machine Learning
3.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1203-1213, 2022.
Article in English | MEDLINE | ID: mdl-33064647

ABSTRACT

Semi-Supervised Learning (SSL) is an approach to machine learning that makes use of unlabeled data for training together with a small amount of labeled data. In the context of molecular biology and pharmacology, one can take advantage of unlabeled data, for instance to identify drugs and targets where only a few genes are known to be associated with a specific drug target and are treated as labeled data. Labeling the genes requires laboratory verification and validation. This process is usually very time-consuming and expensive. Thus, it is useful to estimate the functional role of drugs from unlabeled data using computational methods. To develop such a model, we used openly available data resources to create two bipartite graphs: (i) drugs and genes, and (ii) genes and diseases. We constructed the genetic embedding graph from the two bipartite graphs using Tensor Factorization methods. We integrated the genetic embedding graph with the publicly available protein functional association network. Our results show the usefulness of the integration by effectively predicting drug labels.


Subjects
Proteins, Supervised Machine Learning, Proteins/genetics, Proteins/metabolism
4.
J Mol Evol ; 88(7): 549-561, 2020 09.
Article in English | MEDLINE | ID: mdl-32617614

ABSTRACT

Phylogenetic models of the evolution of protein-coding sequences can provide insights into the selection pressures that have shaped them. In the application of these models synonymous nucleotide substitutions, which do not alter the encoded amino acid, are often assumed to have limited functional consequences and used as a proxy for the neutral rate of evolution. The ratio of nonsynonymous to synonymous substitution rates is then used to categorize the selective regime that applies to the protein (e.g., purifying selection, neutral evolution, diversifying selection). Here, we extend the Muse and Gaut model of codon evolution to explore the extent of purifying selection acting on substitutions between synonymous stop codons. Using a large collection of coding sequence alignments, we estimate that a high proportion (approximately 57%) of mammalian genes are affected by selection acting on stop codon preference. This proportion varies substantially by codon, with UGA stop codons far more likely to be conserved. Genes with evidence of selection acting on synonymous stop codons have distinctive characteristics, compared to unconserved genes with the same stop codon, including longer [Formula: see text] untranslated regions (UTRs) and shorter mRNA half-life. The coding regions of these genes are also much more likely to be under strong purifying selection pressure. Our results suggest that the preference for UGA stop codons found in many multicellular eukaryotes is selective rather than mutational in origin.


Subjects
Terminator Codon, Molecular Evolution, Mammals/genetics, Genetic Models, Animals, Humans, Phylogeny
5.
BMC Bioinformatics ; 20(1): 462, 2019 Sep 09.
Article in English | MEDLINE | ID: mdl-31500564

ABSTRACT

BACKGROUND: Determining the association between a tumor sample and a gene is demanding because conducting genetic experiments is costly. Moreover, a discovered association between a tumor sample and a gene further requires clinical verification and validation. This entire process is time-consuming and expensive. For this reason, predicting the association between tumor samples and genes remains a challenge in biomedicine. RESULTS: Here we present a computational model based on a heat diffusion algorithm which can predict the association between tumor samples and genes. We proposed a 2-layered graph. In the first layer, we constructed a graph of tumor samples and genes where these two types of nodes are connected by a "hasGene" relationship. In the second layer, the gene nodes are connected by an "interaction" relationship. We applied the heat diffusion algorithm to nine different variants of genetic interaction networks extracted from the STRING and BioGRID databases. The heat diffusion algorithm predicted the links between tumor samples and genes with a mean AUC-ROC score of 0.84. This score was obtained by using weighted genetic interactions of the fusion or co-occurrence channels from the STRING database. For the unweighted genetic interactions from the BioGRID database, the algorithm predicted the links with an AUC-ROC score of 0.74. CONCLUSIONS: We demonstrate that the gene-gene interaction scores could improve the power of the heat diffusion model to predict the links between tumor samples and genes. We showed the efficient runtime of the heat diffusion algorithm on various genetic interaction networks. We statistically validated the quality of our predictions of the links between tumor samples and genes.


Subjects
Algorithms, Neoplasm Genes, Neoplasms/genetics, Area Under Curve, DNA Methylation/genetics, Factual Databases, Diffusion, Genetic Epistasis, Gene Regulatory Networks, Humans, ROC Curve, Reproducibility of Results
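
To make the heat diffusion idea in entry 5 concrete, here is a generic sketch of heat spreading over a weighted graph. It is not the paper's algorithm, whose exact variant the abstract does not specify; it assumes the textbook model df/dt = -(D - A) f, integrated with simple explicit Euler steps. The function name `heat_diffusion`, the three-node graph, and the edge weights are hypothetical.

```python
def heat_diffusion(adj, heat, t=1.0, steps=100):
    """Approximate f(t) = exp(-t (D - A)) f(0) with explicit Euler steps.

    `adj` is a symmetric weighted adjacency matrix (list of lists) and
    `heat` is the initial heat on each node.
    """
    n = len(adj)
    dt = t / steps
    f = list(heat)
    for _ in range(steps):
        # Each node exchanges heat with its neighbours in proportion to the
        # edge weight and the current heat difference.
        f = [
            f[i] + dt * sum(adj[i][j] * (f[j] - f[i]) for j in range(n))
            for i in range(n)
        ]
    return f


# Hypothetical toy graph: tumor sample s -- gene g1 -- gene g2.
# The s-g1 edge plays the role of a "hasGene" link; the g1-g2 edge is a
# weighted gene-gene interaction (weight 0.5 is made up).
adj = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
result = heat_diffusion(adj, heat=[1.0, 0.0, 0.0], t=1.0, steps=100)
print(result)
```

Heat placed on the sample node flows to g1 directly and reaches g2 only through the interaction edge, so the heat remaining on each gene after time t can be read as a ranking score for candidate sample-gene links.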
6.
Sci Rep ; 9(1): 10436, 2019 07 18.
Article in English | MEDLINE | ID: mdl-31320740

ABSTRACT

Identifying the unintended effects of drugs (side effects) is a very important issue in pharmacological studies. The laboratory verification of associations between drugs and side effects requires costly, time-intensive research. Thus, an approach to predicting drug side effects based on known side effects, using a computational model, is highly desirable. To provide such a model, we used openly available data resources to model drugs and side effects as a bipartite graph. The drug-drug network is constructed using the word2vec model where the edges between drugs represent the semantic similarity between them. We integrated the bipartite graph and the semantic similarity graph using a matrix factorization method and a diffusion based model. Our results show the effectiveness of this integration by computing weighted (i.e., ranked) predictions of initially unknown links between side effects and drugs.


Subjects
Drug-Related Side Effects and Adverse Reactions/etiology, Pharmaceutical Preparations/chemistry, Algorithms, Computational Biology/methods, Computer Simulation, Diffusion, Drug Discovery/methods, Humans, Semantics
7.
Neuroscience ; 349: 64-75, 2017 05 04.
Article in English | MEDLINE | ID: mdl-28257890

ABSTRACT

Fragile X mental retardation protein (FMRP), an important RNA-binding protein responsible for fragile X syndrome, is involved in posttranscriptional control of gene expression that links with brain development and synaptic functions. Here, we reveal a novel role of FMRP in pre-mRNA alternative splicing, a general event of posttranscriptional regulation. Using co-immunoprecipitation and immunofluorescence assays, we identified that FMRP interacts with an alternative-splicing-associated protein, RNA-binding protein 14 (RBM14), in an RNA-dependent fashion, and the two proteins partially colocalize in the nuclei of hippocampal neurons. We show that the relative skipping/inclusion ratio of the micro-exon L in the Protrudin gene and exon 10 in the Tau gene decreased in the hippocampus of Fmr1 knockout (KO) mice. Knockdown of either FMRP or RBM14 alters the relative skipping/inclusion ratio of Protrudin and Tau in cultured Neuro-2a cells, similar to that in the Fmr1 KO mice. Furthermore, overexpression of FMRP leads to an opposite pattern of splicing, which can be offset by RBM14 knockdown. RNA immunoprecipitation assays indicate that FMRP promotes RBM14's binding to the mRNA targets. In addition, overexpression of the long form of Protrudin or the short form of Tau promotes protrusion growth of the retinoic acid-treated, neuronal-differentiated Neuro-2a cells. Together, these data suggest a novel function of FMRP in the regulation of pre-mRNA alternative splicing through RBM14 that may be associated with normal brain function and FMRP-related neurological disorders.


Subjects
Alternative Splicing/genetics, Fragile X Mental Retardation Protein/metabolism, RNA Precursors/genetics, Animals, Cultured Cells, Fragile X Mental Retardation Protein/genetics, Hippocampus/metabolism, Immunoprecipitation/methods, Knockout Mice, Neurons/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism
8.
Mol Neurobiol ; 54(4): 2585-2594, 2017 05.
Article in English | MEDLINE | ID: mdl-26993298

ABSTRACT

Fragile X mental retardation protein (FMRP), associated with fragile X syndrome, is known as an RNA-binding protein that regulates gene expression at the post-transcriptional level in the brain. FMRP is also involved in microRNA (miRNA) biogenesis during the processing of precursor miRNA (pre-miRNA) into mature miRNA. However, there is no description of the effect of FMRP on primary miRNA (pri-miRNA) processing. Here, we uncover a novel role of FMRP in pri-miRNA processing via controlling Drosha translation. We show that the expression of DROSHA protein, but not of its messenger RNA (mRNA) transcripts, is downregulated in both the hippocampus of Fmr1-knockout mice and FMRP-knockdown Neuro-2a cells. Overexpression or knockdown of FMRP does not alter Drosha mRNA stability. Immunoprecipitation and polysome analyses demonstrate that FMRP binds to the Drosha mRNA and enhances its translation. Additionally, we show that loss of FMRP in Fmr1-deficient mice results in the accumulation of three of six analyzed pri-miRNAs and the reduction of the corresponding pre-miRNAs and mature miRNAs. Thus, our data suggest that FMRP is involved in pri-miRNA processing via enhancing DROSHA expression, which may play an important role in fragile X syndrome.


Subjects
Fragile X Mental Retardation Protein/metabolism, MicroRNAs/genetics, Protein Biosynthesis/genetics, Post-Transcriptional RNA Processing/genetics, Ribonuclease III/genetics, Animals, Tumor Cell Line, Down-Regulation/genetics, Gene Knockdown Techniques, Knockout Mice, MicroRNAs/metabolism, Protein Binding/genetics, RNA Stability/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Ribonuclease III/metabolism, Up-Regulation/genetics
9.
PLoS One ; 11(10): e0164880, 2016.
Article in English | MEDLINE | ID: mdl-27741311

ABSTRACT

Nonnegative Matrix Factorization (NMF) has proved to be an effective method for unsupervised clustering analysis of gene expression data. By the nonnegativity constraint, NMF provides a decomposition of the data matrix into two matrices that have been used for clustering analysis. However, the decomposition is not unique. This allows different clustering results to be obtained, resulting in different interpretations of the decomposition. To alleviate this problem, some existing methods directly enforce uniqueness to some extent by adding regularization terms in the NMF objective function. Alternatively, various normalization methods have been applied to the factor matrices; however, the effects of the choice of normalization have not been carefully investigated. Here we investigate the performance of NMF for the task of cancer class discovery, under a wide range of normalization choices. After extensive evaluations, we observe that the maximum norm showed the best performance, although the maximum norm has not previously been used for NMF. Matlab codes are freely available from: http://maths.nuigalway.ie/~haixuanyang/pNMF/pNMF.htm.


Subjects
Neoplasms/genetics, Algorithms, Cluster Analysis, Neoplastic Gene Expression Regulation, Humans, MicroRNAs/metabolism, Neoplasms/metabolism, Neoplasms/pathology
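
The non-uniqueness discussed in entry 9 can be illustrated with a small sketch. Rescaling a column of W and the matching row of H by inverse factors leaves the product W H unchanged, yet it can change which factor dominates a sample. The code below assumes, purely for illustration, that the max norm is applied to the columns of W and that each sample is assigned to the factor with the largest entry in its column of H; the paper may normalize differently. All names and matrix values are hypothetical.

```python
def max_norm_rescale(W, H):
    """Divide each column of W by its largest entry (its max norm for
    nonnegative data) and multiply the matching row of H by the same value,
    so that the product W H is unchanged."""
    k = len(H)
    # Guard against an all-zero column by falling back to a scale of 1.0.
    scales = [max(row[c] for row in W) or 1.0 for c in range(k)]
    W2 = [[row[c] / scales[c] for c in range(k)] for row in W]
    H2 = [[scales[c] * v for v in H[c]] for c in range(k)]
    return W2, H2


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def cluster_labels(H):
    """Assign each sample (column of H) to the factor with the largest entry."""
    return [max(range(len(H)), key=lambda k: H[k][s]) for s in range(len(H[0]))]


# Made-up nonnegative factors: 3 genes x 2 factors, 2 factors x 3 samples.
W = [[2.0, 0.0], [1.0, 4.0], [0.0, 2.0]]
H = [[1.0, 0.5, 0.0], [0.0, 0.3, 1.0]]
W2, H2 = max_norm_rescale(W, H)
print(cluster_labels(H), cluster_labels(H2))
```

In this toy case the middle sample changes cluster after rescaling even though the reconstruction is identical, which is why the choice of normalization matters for the clustering result.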
10.
J R Soc Interface ; 11(99), 2014 Oct 06.
Article in English | MEDLINE | ID: mdl-25142519

ABSTRACT

We aimed to test the proposal that progressive combinations of multiple promoter elements acting in concert may be responsible for the full range of phases observed in plant circadian output genes. In order to allow reliable selection of informative phase groupings of genes for our purpose, intrinsic cyclic patterns of expression were identified using a novel, non-biased method for the identification of circadian genes. Our non-biased approach identified two dominant, inherent orthogonal circadian trends underlying publicly available microarray data from plants maintained under constant conditions. Furthermore, these trends were highly conserved across several plant species. Four phase-specific modules of circadian genes were generated by projection onto these trends and, in order to identify potential combinatorial promoter elements that might classify genes into these groups, we used a Random Forest pipeline which merged data from multiple decision trees to look for the presence of element combinations. We identified a number of regulatory motifs which aggregated into coherent clusters capable of predicting the inclusion of genes within each phase module with very high fidelity and these motif combinations changed in a consistent, progressive manner from one phase module group to the next, providing strong support for our hypothesis.


Subjects
Circadian Clocks/genetics, Plant Gene Expression Regulation/physiology, Gene Regulatory Networks/genetics, Plant Genes/genetics, Plant Physiological Phenomena, Genetic Promoter Regions/genetics
11.
Bioinformatics ; 30(15): 2235-6, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-24659104

ABSTRACT

SUMMARY: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. CONTACT: alberto@cs.rhul.ac.uk AVAILABILITY: GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines.


Subjects
Data Mining/methods, Gene Ontology, Internet, Semantics, Software, Proteins/genetics, Controlled Vocabulary
12.
Onco Targets Ther ; 6: 925-9, 2013.
Article in English | MEDLINE | ID: mdl-23926435

ABSTRACT

OBJECTIVES: We aimed to evaluate the efficacy and safety of combination therapy of Endostar (recombinant human endostatin) and S-1 combined with oxaliplatin (SOX) in patients with advanced gastric cancer. METHODS: In this randomized, controlled trial, 165 late-stage gastric cancer patients were assigned to the experimental arm with Endostar in combination with SOX (80 patients) and the control arm with SOX alone (85 patients). The end points of this study included progression-free survival, response rate, and disease-control rate. RESULTS: There was no statistically significant difference in response rate between the experimental arm and the control arm (53.8% vs 42.4%, P=0.188). The difference in disease-control rate was also statistically insignificant between the two arms (85.0% vs 72.9%, P=0.188). Progression-free survival in the experimental arm was significantly higher than that in the control arm (15.0 months vs 12.0 months, P=0.0001). Common adverse events included immunosuppression, gastrointestinal distress, and neuropathy. There was no statistical difference in the incidences of adverse events. CONCLUSION: Combination therapy of Endostar and SOX provides therapeutic benefits to advanced gastric cancer patients, with tolerable adverse effects.

13.
Nat Methods ; 10(3): 221-7, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.


Subjects
Computational Biology/methods, Molecular Biology/methods, Molecular Sequence Annotation, Proteins/physiology, Algorithms, Animals, Protein Databases, Exoribonucleases/classification, Exoribonucleases/genetics, Exoribonucleases/physiology, Forecasting, Humans, Proteins/chemistry, Proteins/classification, Proteins/genetics, Species Specificity
14.
Cell ; 150(5): 1068-81, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22939629

ABSTRACT

Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.


Subjects
Multiprotein Complexes/analysis, Protein Interaction Maps, Proteins/chemistry, Proteomics/methods, Humans, Tandem Mass Spectrometry
15.
PLoS One ; 7(8): e39681, 2012.
Article in English | MEDLINE | ID: mdl-22879875

ABSTRACT

The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. [A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/].


Subjects
Algorithms, Arabidopsis/genetics, Computational Biology/methods, Gene Expression Profiling/methods, Saccharomyces cerevisiae/genetics, Gene Regulatory Networks/genetics, Fungal Genes/genetics, Plant Genes/genetics, Molecular Sequence Annotation, ROC Curve
16.
Bioinformatics ; 28(10): 1383-9, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22522134

ABSTRACT

MOTIVATION: Several measures have been recently proposed for quantifying the functional similarity between gene products according to well-structured controlled vocabularies where biological terms are organized in a tree or in a directed acyclic graph (DAG) structure. However, existing semantic similarity measures ignore two important facts. First, when calculating the similarity between two terms, they disregard the descendants of these terms. While this makes no difference when the ontology is a tree, we shall show that it has important consequences when the ontology is a DAG; this is the case, for example, with the Gene Ontology (GO). Second, existing similarity measures do not model the inherent uncertainty which comes from the fact that our current knowledge of the gene annotation and of the ontology structure is incomplete. Here, we propose a novel approach based on downward random walks that can be used to improve any of the existing similarity measures to exhibit these two properties. The approach is computationally efficient: random walks do not need to be simulated, as we provide formulas to calculate their stationary distributions. RESULTS: To show that our approach can potentially improve any semantic similarity measure, we test it on six different semantic similarity measures: three commonly used measures by Resnik (1999), Lin (1998), and Jiang and Conrath (1997); and three recently proposed measures: simUI, simGIC by Pesquita et al. (2008); GraSM by Couto et al. (2007); and Couto and Silva (2011). We applied these improved measures to the GO annotations of the yeast Saccharomyces cerevisiae, and tested how they correlate with sequence similarity, mRNA co-expression and protein-protein interaction data. Our results consistently show that the use of downward random walks leads to more reliable similarity measures.


Subjects
Algorithms, Semantics, Controlled Vocabulary, Molecular Sequence Annotation, Multiprotein Complexes/genetics, Proteins/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Terminology as Topic, Uncertainty
17.
IEEE Trans Syst Man Cybern B Cybern ; 39(2): 417-30, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19095552

ABSTRACT

Heat-diffusion models have been successfully applied to various domains such as classification and dimensionality-reduction tasks in manifold learning. One critical local approximation technique is employed to weigh the edges in the graph constructed from data points. This approximation technique is based on an implicit assumption that the data are distributed evenly. However, this assumption is not valid in most cases, so the approximation is not accurate in these cases. To solve this challenging problem, we propose a volume-based heat-diffusion model (VHDM). In VHDM, the volume is theoretically justified by handling the input data that are unevenly distributed on an unknown manifold. We also propose a novel volume-based heat-diffusion classifier (VHDC) based on VHDM. One of the advantages of VHDC is that its computational complexity is linear in the number of edges, given a constructed graph. Moreover, we give an analysis of the stability of VHDC with respect to its three free parameters, and we demonstrate the connection between VHDC and some other classifiers. Experiments show that VHDC performs better than the Parzen window approach, K nearest neighbor, and the HDC without volumes in prediction accuracy and outperforms some recently proposed transductive-learning algorithms. The enhanced performance of VHDC shows the validity of introducing the volume. The experiments also confirm the stability of VHDC with respect to its three free parameters.
