Your browser doesn't support javascript.
loading
Graph Traversal Edit Distance and Extensions.
Ebrahimpour Boroojeny, Ali; Shrestha, Akash; Sharifi-Zarchi, Ali; Gallagher, Suzanne Renick; Sahinalp, S Cenk; Chitsaz, Hamidreza.
Afiliación
  • Ebrahimpour Boroojeny A; Department of Computer Science, Colorado State University, Fort Collins, Colorado.
  • Shrestha A; Department of Computer Science, Colorado State University, Fort Collins, Colorado.
  • Sharifi-Zarchi A; Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
  • Gallagher SR; Department of Computer Science, Colorado State University, Fort Collins, Colorado.
  • Sahinalp SC; National Cancer Institute, NIH, Bethesda, Maryland.
  • Chitsaz H; Department of Computer Science, Colorado State University, Fort Collins, Colorado.
J Comput Biol ; 27(3): 317-329, 2020 03.
Article en En | MEDLINE | ID: mdl-32058803
ABSTRACT
Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.
Asunto(s)
Palabras clave

Texto completo: 1 Colección: 01-internacional Banco de datos: MEDLINE Asunto principal: Programación Lineal / Biología Computacional Límite: Animals / Humans Idioma: En Revista: J Comput Biol Asunto de la revista: BIOLOGIA MOLECULAR / INFORMATICA MEDICA Año: 2020 Tipo del documento: Article

Texto completo: 1 Colección: 01-internacional Banco de datos: MEDLINE Asunto principal: Programación Lineal / Biología Computacional Límite: Animals / Humans Idioma: En Revista: J Comput Biol Asunto de la revista: BIOLOGIA MOLECULAR / INFORMATICA MEDICA Año: 2020 Tipo del documento: Article